# Demo: Extra Data From Twitter

In order to get alt-text data from images in Tweets, we're going to have to look at how to get extra data from Twitter.

_Note: You don't really need to understand this whole process; you can just copy and paste the final code pieces to use them yourself. We are including this explanation in case you want to know how it works._

The examples here are based on examples from [this website](https://dev.to/twitterdev/a-comprehensive-guide-for-using-the-twitter-api-v2-using-tweepy-in-python-15d9).

But first, let's do our normal Tweepy set-up.

## Normal Tweepy Set-Up

In [1]:
import tweepy

(optional) use the fake version of tweepy, so you don’t have to use real twitter developer access passwords

In [2]:
# Load all your developer access passwords into Python
# TODO: Put your twitter account's special developer access passwords below:
bearer_token = "n4tossfgsafs_fake_bearer_token_isa53#$%$"
consumer_key = "sa@#4@fdfdsa_fake_consumer_key_$%DSG#%DG"
consumer_secret = "45adf$T$A_fake_consumer_secret_JESdsg"
access_token = "56sd5Ss4tsea_fake_access_token_%YE%hDsdr"
access_token_secret = "j^$dr_fake_consumer_key_^A5s#DR5s"

In [3]:
# Give the tweepy code your developer access passwords so
# it can perform twitter actions
client = tweepy.Client(
   bearer_token=bearer_token,
   consumer_key=consumer_key, consumer_secret=consumer_secret,
   access_token=access_token, access_token_secret=access_token_secret
)

## Get media (including image) data

If we want to get media (including image) data from tweets when using `search_recent_tweets`, we have to include:
- `expansions='attachments.media_keys'` which tells Tweepy to get the media information for the tweet
- `media_fields=['preview_image_url', 'height', 'width']` which tells Tweepy which information to get for each piece of media.
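
Since the goal is alt-text, one possible variation is to also request the `alt_text` media field. The sketch below only collects the two settings in a plain dictionary, so it runs without Twitter access. The name `extra_media_settings` is made up for illustration, and it assumes your Tweepy version and API access level return `alt_text`.

```python
# Hypothetical settings dictionary (not part of the original demo)
extra_media_settings = {
    # ask Twitter to include the media attached to each tweet
    "expansions": ["attachments.media_keys"],
    # which details to include for each piece of media;
    # 'alt_text' is an assumption added here for the alt-text use case
    "media_fields": ["preview_image_url", "height", "width", "alt_text"],
}

# With a configured client, these could be passed along with a query:
# client.search_recent_tweets(query="dog has:images", **extra_media_settings)
print(extra_media_settings["media_fields"])
```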

Let's do a search for tweets that include the word dog, and have an image, and are not retweets (so we don't just get the same tweet for all the times it was retweeted):

In [4]:
query = "dog -is:retweet has:images"

tweet_search_results = client.search_recent_tweets(
                                    query=query,
                                    expansions='attachments.media_keys', #tell twitter to download the media related to this tweet
                                    media_fields=['preview_image_url', 'height', 'width']  # when getting the media, make sure to include this info
                                    )



Now, when our search comes back, it has both the Tweet information and the information about media (including images) in those Tweets. 

Unfortunately, the Tweet info and the media info come back in two separate parts of `tweet_search_results`:
- `tweet_search_results.data` has the list of tweets
- `tweet_search_results.includes['media']` has a list of the pieces of media in the tweets



In [7]:
display(tweet_search_results.data)

[<Tweet id=1584802143131029504 text="Favorite macOS Ventura update so far: this new printer preview dog(?) in Xcode.\n\nI accidentally open this instead of 'Open Quickly...' on a daily basis. Nice to have a new friend greeting me. 🐶 https://t.co/kK0sJMlQYV">,
 <Tweet id=1584802141184876544 text='Not much light on the dog walk this morning, looking forward to the clocks changing. 🐕🌒 https://t.co/iedHiOFw7X'>,
 <Tweet id=1584802130141106177 text="#NowPlaying on #personalfavorites internet radio: Hair of the Dog by Nazareth #ListenLive 'Alexa, Play Personal Favorites Now' -or- https://t.co/cGpACL8uW1. #AutumnVibes #October #Halloween\n Buy this Personal Favorite! https://t.co/8UvoY14DK7 https://t.co/VzRPgF3HOC">,
 <Tweet id=1584802129381888002 text='Want to learn more about $TAG? Wait no longer, the tea is about to be spilled!\n\nDog Tag ($TAG) is an ERC-20 governance token for the CollarQuest metaverse. TAG holders will be able to claim rewards if they stake their tokens, play the game, 

In [12]:
display(tweet_search_results.includes['media'])

[<Media media_key=3_1584799486039277569 type=photo>,
 <Media media_key=3_1584802135358717953 type=photo>,
 <Media media_key=3_1584802128966688769 type=photo>,
 <Media media_key=3_1584802127238631424 type=photo>,
 <Media media_key=3_1584802096259518464 type=photo>,
 <Media media_key=3_1584802061530824705 type=photo>,
 <Media media_key=3_1584802054589263872 type=photo>,
 <Media media_key=3_1584801972791779330 type=photo>,
 <Media media_key=3_1584801194295189504 type=photo>,
 <Media media_key=3_1584802023605563393 type=photo>,
 <Media media_key=3_1584801968148594688 type=photo>,
 <Media media_key=3_1584801977392893953 type=photo>,
 <Media media_key=3_1584801988855975936 type=photo>,
 <Media media_key=3_1584802001287843840 type=photo>]

The results don't directly tell us which piece of media belongs to which tweet. Instead, each piece of media has a special id called the `media_key`, and each tweet has a list of the `media_key`s for the media attached to it.
- for a `tweet` in `tweet_search_results.data`, the media keys are in `tweet.data['attachments']['media_keys']`
- for a piece of `media` in `tweet_search_results.includes['media']`, the media key is in `media['media_key']`

So, when we look at a tweet and its media keys, we will want to look up the media information that goes with each key. Looking something up by key is easiest to do with a dictionary in Python, so we will make a dictionary where the keys are `media_key`s and the values are the media information.
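
Here is a small sketch of the shape of such a dictionary. The keys and values are made up, and plain dictionaries stand in for the Tweepy `Media` objects that the real results contain.

```python
# Made-up stand-in values, only to illustrate the shape of the dictionary.
# In the real results each value is a tweepy Media object, not a plain dict.
example_media_lookup = {
    "3_111": {"media_key": "3_111", "type": "photo", "height": 628, "width": 1480},
    "3_222": {"media_key": "3_222", "type": "photo", "height": 900, "width": 1200},
}

# Looking up one piece of media by its key
print(example_media_lookup["3_111"]["width"])  # → 1480
```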

Below is the code to do this (using several Python shorthand tricks at once):

In [14]:
media_lookup = {m["media_key"]: m for m in tweet_search_results.includes['media']}

display(media_lookup)

{'3_1584799486039277569': <Media media_key=3_1584799486039277569 type=photo>,
 '3_1584802135358717953': <Media media_key=3_1584802135358717953 type=photo>,
 '3_1584802128966688769': <Media media_key=3_1584802128966688769 type=photo>,
 '3_1584802127238631424': <Media media_key=3_1584802127238631424 type=photo>,
 '3_1584802096259518464': <Media media_key=3_1584802096259518464 type=photo>,
 '3_1584802061530824705': <Media media_key=3_1584802061530824705 type=photo>,
 '3_1584802054589263872': <Media media_key=3_1584802054589263872 type=photo>,
 '3_1584801972791779330': <Media media_key=3_1584801972791779330 type=photo>,
 '3_1584801194295189504': <Media media_key=3_1584801194295189504 type=photo>,
 '3_1584802023605563393': <Media media_key=3_1584802023605563393 type=photo>,
 '3_1584801968148594688': <Media media_key=3_1584801968148594688 type=photo>,
 '3_1584801977392893953': <Media media_key=3_1584801977392893953 type=photo>,
 '3_1584801988855975936': <Media media_key=3_1584801988855975936
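
The one-line dictionary comprehension used for `media_lookup` is shorthand for a loop. As a sketch of the longer form, here are both versions side by side, with made-up stand-in dictionaries in place of the real `Media` objects:

```python
# Stand-in media list (made-up values); the real list is
# tweet_search_results.includes['media']
media_list = [
    {"media_key": "3_111", "type": "photo"},
    {"media_key": "3_222", "type": "photo"},
]

# Long form: start with an empty dictionary and add one entry per piece of media
media_lookup_long = {}
for m in media_list:
    media_lookup_long[m["media_key"]] = m

# Short form (dictionary comprehension), as used in the demo
media_lookup_short = {m["media_key"]: m for m in media_list}

print(media_lookup_long == media_lookup_short)  # → True
```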

Now we can choose a tweet, find the media keys for that tweet, and then look up the media information for each of those keys.

In [17]:
# get the first tweet
first_tweet = tweet_search_results.data[0]

print("displaying info for tweet: " + first_tweet.text)

# get the media keys for the first tweet
first_tweet_media_keys = first_tweet.data['attachments']['media_keys']

# loop through the media keys
for media_key in first_tweet_media_keys:
    # lookup the info about this particular media_key
    media_info = media_lookup[media_key]
    
    # print out some info about this piece of media
    print("  type: " + media_info.type)
    print("  height: " + str(media_info.height))
    print("  width: " + str(media_info.width))
    print()

displaying info for tweet: Favorite macOS Ventura update so far: this new printer preview dog(?) in Xcode.

I accidentally open this instead of 'Open Quickly...' on a daily basis. Nice to have a new friend greeting me. 🐶 https://t.co/kK0sJMlQYV
  type: photo
  height: 628
  width: 1480
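
The loop above assumes every tweet has an `attachments` entry, which holds for this search because of `has:images`. For a search without that filter, a tweet with no media would likely have no `attachments` entry at all. Below is a minimal sketch of a more defensive lookup, using plain dictionaries as stand-ins for the Tweepy objects. The helper name `media_for_tweet` and the example data are made up for illustration.

```python
def media_for_tweet(tweet_data, lookup):
    """Return the media info for one tweet, or an empty list if it has none."""
    # .get(...) returns the fallback instead of raising KeyError when a key is missing
    media_keys = tweet_data.get("attachments", {}).get("media_keys", [])
    # skip any key that is missing from the lookup dictionary
    return [lookup[key] for key in media_keys if key in lookup]


# Made-up stand-in data shaped like tweet.data and the media lookup dictionary
example_lookup = {
    "3_111": {"media_key": "3_111", "type": "photo", "height": 628, "width": 1480},
}
tweet_with_image = {"id": "1", "text": "dog photo", "attachments": {"media_keys": ["3_111"]}}
tweet_without_image = {"id": "2", "text": "just text"}

print(media_for_tweet(tweet_with_image, example_lookup))
print(media_for_tweet(tweet_without_image, example_lookup))  # → []
```

With real results, `tweet.data` could be passed in as `tweet_data`.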


## Get user information
User information works the same way as media information, though there will only be one author per tweet. We have to set an expansion and tell Tweepy which user fields to download:

In [19]:
query = "dog -is:retweet has:images"

tweet_search_results = client.search_recent_tweets(
                                    query=query,
                                    expansions='author_id', #tell twitter to download the author related to this tweet
                                    user_fields=['profile_image_url']  # when getting the author, make sure to include this info
                                    )

Then we make a lookup dictionary for the user information:

In [20]:
user_lookup = {u["id"]: u for u in tweet_search_results.includes['users']}

display(user_lookup)

{1727995530: <User id=1727995530 name=Syed Haris Ahmed 🇵🇰 username=syedharisahmed4>,
 1072162801: <User id=1072162801 name=Stephanie Cowburn username=StephCowburn>,
 19609660: <User id=19609660 name=Tom Gillispie -- NATURE NEEDS OUR HELP 🦮🌎🌊🏈 username=EDITORatWORK>,
 1564817655802089472: <User id=1564817655802089472 name=jenn 🍂 username=jennitaliass>,
 1542487286058819586: <User id=1542487286058819586 name=实体_店_会_所_ 西湖 滨江 拱墅 绍兴 西安 北郊 未央 雁塔 曲江 陕西 咸阳 武汉 武昌 username=LeanaNeyland>,
 769963290680233984: <User id=769963290680233984 name=ハルにゃん@取引用アカウント username=harunyancyuuu>,
 7681732: <User id=7681732 name=Eileen username=NZNeep>,
 1563865803946336256: <User id=1563865803946336256 name=Mark Zz 🧪 🧪 🧪 🧪 username=MarkZz86437584>,
 1097114082849771522: <User id=1097114082849771522 name=The Garden House username=TheGardenHouse5>,
 1344888147549814785: <User id=1344888147549814785 name=Paul Storm username=PaulSto91558029>}

Then we can find the author id of a tweet in `tweet.author_id`, and look it up in the `user_lookup` dictionary:

In [23]:
first_tweet = tweet_search_results.data[0]

print("displaying info for tweet: " + first_tweet.text)

# get the author id for the first tweet
first_tweet_author_id = first_tweet.author_id

author = user_lookup[first_tweet_author_id]

# look up info about the author:
print("  author name: " + author.name)
print("  author username: " + author.username)
print("  author profile image: " + author.profile_image_url)



displaying info for tweet: Sometimes I feel sorry for the mainstream media in this country. It seems that not only Ducky, but all of you have been bitten by the dog. What is the level of our media? https://t.co/FfMaygdlHk
  author name: Syed Haris Ahmed 🇵🇰
  author username: syedharisahmed4
  author profile image: https://pbs.twimg.com/profile_images/1564543192103587842/lcoBoC_c_normal.jpg
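
Media and author information can also be used together. Tweepy accepts a list of expansions, so a single search could request both. That call is shown only as a comment below, since it needs real developer access. The rest of the sketch uses made-up stand-in dictionaries (all names and values are hypothetical) to show how the two lookups might be combined.

```python
# With a configured client, a combined request might look like this:
# tweet_search_results = client.search_recent_tweets(
#     query="dog -is:retweet has:images",
#     expansions=["attachments.media_keys", "author_id"],
#     media_fields=["preview_image_url", "height", "width"],
#     user_fields=["profile_image_url"],
# )

# Made-up stand-in data with the same overall shape as the results
example_tweets = [
    {"text": "dog photo", "author_id": 42, "attachments": {"media_keys": ["3_111"]}},
]
example_media = {"3_111": {"media_key": "3_111", "type": "photo"}}
example_users = {42: {"id": 42, "username": "example_user"}}

summaries = []
for tweet in example_tweets:
    # look up the author by id, and each piece of media by its key
    author = example_users[tweet["author_id"]]
    media_types = [example_media[key]["type"] for key in tweet["attachments"]["media_keys"]]
    summaries.append(author["username"] + ": " + tweet["text"] + " " + str(media_types))

print(summaries)
```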
