Working with v2 Tweet Formats

Igor Brigadir edited this page Apr 6, 2021 · 1 revision

The new v2 Tweet formats are very different from v1.1 and require some extra handling: data that was previously included inline in the objects now lives separately in an includes object. Each "page" of results therefore has to be processed as a whole.
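As a rough illustration of why a page has to be processed as a whole, the sketch below joins the objects in includes back onto each tweet in data. It assumes a response page shaped like a search result (a data list plus an includes object keyed by users and media). The helper names index_includes and flatten_page, and the "author" and "media" keys added to each tweet, are hypothetical choices for this example and not part of twarc or the Twitter API.

```python
def index_includes(page):
    """Build lookup tables for the expanded objects in one page of results."""
    includes = page.get("includes", {})
    return {
        "users": {u["id"]: u for u in includes.get("users", [])},
        "media": {m["media_key"]: m for m in includes.get("media", [])},
    }


def flatten_page(page):
    """Return copies of the tweets in a page with author and media inlined.

    Assumes page["data"] is a list of tweets, as in search results.
    """
    lookup = index_includes(page)
    flattened = []
    for tweet in page.get("data", []):
        tweet = dict(tweet)  # shallow copy so the original page is untouched

        # Attach the full user object referenced by author_id, if it was expanded.
        author = lookup["users"].get(tweet.get("author_id"))
        if author is not None:
            tweet["author"] = author  # hypothetical key name

        # Attach the media objects referenced by attachments.media_keys.
        keys = tweet.get("attachments", {}).get("media_keys", [])
        media = [lookup["media"][k] for k in keys if k in lookup["media"]]
        if media:
            tweet["media"] = media  # hypothetical key name

        flattened.append(tweet)
    return flattened


# A tiny hand-written page, not real API output.
page = {
    "data": [
        {
            "id": "1",
            "text": "hello",
            "author_id": "10",
            "attachments": {"media_keys": ["3_1"]},
        }
    ],
    "includes": {
        "users": [{"id": "10", "username": "example"}],
        "media": [{"media_key": "3_1", "type": "photo"}],
    },
}
flat = flatten_page(page)
```

The same pattern would extend to places, polls, and referenced tweets by adding more lookup tables. Note that a tweet on one page can only be joined with the includes from that same page.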

The lists on this page enumerate every expansion and field available in the docs. Passing the FULL_PAYLOAD_PARAMS dictionary as URL parameters should ensure you get the full payload (all available fields).

twarc should include an option that lets you return either the original JSON or an "atomic" version, like the implementation planned here: https://github.com/twitterdev/search-tweets-python/blob/v2/scripts/search_tweets.py#L127

This is the list of expansions and fields:

EXPANSIONS = [
    "author_id",
    "in_reply_to_user_id",
    "referenced_tweets.id",
    "referenced_tweets.id.author_id",
    "entities.mentions.username",
    "attachments.poll_ids",
    "attachments.media_keys",
    "geo.place_id",
]

USER_FIELDS = [
    "created_at",
    "description",
    "entities",
    "id",
    "location",
    "name",
    "pinned_tweet_id",
    "profile_image_url",
    "protected",
    "public_metrics",
    "url",
    "username",
    "verified",
    "withheld",
]

TWEET_FIELDS = [
    "attachments",
    "author_id",
    "context_annotations",
    "conversation_id",
    "created_at",
    "entities",
    "geo",
    "id",
    "in_reply_to_user_id",
    "lang",
    "public_metrics",
    # "non_public_metrics", # private
    # "organic_metrics", # private
    # "promoted_metrics", # private
    "text",
    "possibly_sensitive",
    "referenced_tweets",
    "reply_settings",
    "source",
    "withheld",
]

MEDIA_FIELDS = [
    "duration_ms",
    "height",
    "media_key",
    "preview_image_url",
    "type",
    "url",
    "width",
    # "non_public_metrics", # private
    # "organic_metrics", # private
    # "promoted_metrics", # private
    "public_metrics",
]

POLL_FIELDS = ["duration_minutes", "end_datetime", "id", "options", "voting_status"]

PLACE_FIELDS = [
    "contained_within",
    "country",
    "country_code",
    "full_name",
    "geo",
    "id",
    "name",
    "place_type",
]

FULL_PAYLOAD_PARAMS = {
    "expansions": ",".join(EXPANSIONS),
    "user.fields": ",".join(USER_FIELDS),
    "tweet.fields": ",".join(TWEET_FIELDS),
    "media.fields": ",".join(MEDIA_FIELDS),
    "poll.fields": ",".join(POLL_FIELDS),
    "place.fields": ",".join(PLACE_FIELDS),
}
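A minimal sketch of how a dictionary like this might be turned into URL parameters, using only the standard library. To stay self-contained, it uses an abbreviated stand-in (EXAMPLE_PARAMS) instead of the full dictionary. The recent search endpoint URL, the build_search_url helper, and the example query are assumptions for illustration. Authentication and the actual HTTP request are left out.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Abbreviated stand-in for FULL_PAYLOAD_PARAMS, built the same way.
EXAMPLE_PARAMS = {
    "expansions": ",".join(["author_id", "attachments.media_keys"]),
    "tweet.fields": ",".join(["created_at", "public_metrics"]),
    "user.fields": ",".join(["username", "verified"]),
}

# Assumed endpoint for this example; other v2 endpoints take the same parameters.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_url(query, params, base_url=SEARCH_URL):
    """Combine a search query with expansion/field parameters into one URL."""
    merged = {"query": query, **params}
    # urlencode percent-encodes the commas and colons; the API receives
    # the original comma-separated values after decoding.
    return f"{base_url}?{urlencode(merged)}"


url = build_search_url("from:twitterdev", EXAMPLE_PARAMS)

# Round-trip check: decoding the query string recovers the original values.
decoded = parse_qs(urlsplit(url).query)
```

With an HTTP client, the same dictionary could be passed directly as the request's query parameters instead of building the URL by hand.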