# Twitter Text Report
Kaitlin Cochran

### Driving Question
Are people using #Beatles to talk about the band's newly released album, or is the hashtag mostly being used for things unrelated to the release?


I queried #Beatles because I wanted to see who was using the hashtag and how many of the tweets related to the new album. I didn't add a language filter to the query because I expected to get more results by including tweets written in languages other than English.

In [201]:
import requests
import pandas as pd
import urllib.parse
import json

Getting the bearer token from a text file in the same directory:

In [202]:
bearer_token = pd.read_csv("twitterAPI.txt", sep = "\t", header = 0)

Creating a header for the API call

In [203]:
header = {'Authorization' : 'Bearer {}'.format(bearer_token['Bearer_Token'].iloc[0])}

Defining the endpoint url I will be using to gather tweets

In [204]:
endpoint_url = 'https://api.twitter.com/2/tweets/search/recent'

Defining the query I will be using to pull tweets from Twitter

In [205]:
query = urllib.parse.quote('(#beatles)')

Defining tweet fields I would like to pull for each tweet

In [206]:
tweet_fields = 'public_metrics,created_at,author_id,lang'

Defining the expansions so that the author ID and geo place ID are also pulled for each tweet

In [207]:
expansions = 'author_id,geo.place_id'

Defining the final API url that will be used to access Twitter and pull the tweets I wish to see

In [208]:
my_api_url = endpoint_url + '?query={}&max_results=100&tweet.fields={}&expansions={}&user.fields={}'.format(query, tweet_fields, expansions, 'username')
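
As an alternative to formatting the query string by hand, the same URL can be assembled from a dictionary of parameters. This is a minimal sketch using only the standard library; `build_search_url` is a hypothetical helper name that does not appear in the notebook. Note that `urlencode` also percent-encodes the commas in the field lists, which decode back to the same values.

```python
from urllib.parse import urlencode

ENDPOINT_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_url(query, next_token=None):
    """Assemble the recent-search URL from a dict of query parameters."""
    params = {
        "query": query,  # raw text; urlencode handles the escaping
        "max_results": 100,
        "tweet.fields": "public_metrics,created_at,author_id,lang",
        "expansions": "author_id,geo.place_id",
        "user.fields": "username",
    }
    if next_token is not None:
        params["next_token"] = next_token  # only added for later pages
    return ENDPOINT_URL + "?" + urlencode(params)


first_page_url = build_search_url("(#beatles)")
print(first_page_url)
```

Keeping the parameters in a dictionary means a later page only needs one extra key rather than a second format string.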

Using the requests package to make the GET API call

In [209]:
response = requests.request("GET", my_api_url, headers = header)

Ensuring that tweet data came through

In [None]:
response.text

In [211]:
responseDict = json.loads(response.text)

In [212]:
responseDict.keys()

dict_keys(['data', 'includes', 'meta'])
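
Printing `response.text` confirms the call by eye; a programmatic check can catch an error response before the body is used. The following is a sketch, assuming a successful call returns status 200 with a non-empty `data` list. `parse_search_response` is a hypothetical helper, and the sample body is made up for illustration.

```python
import json


def parse_search_response(status_code, body_text):
    """Return the parsed body, or raise if the call did not return tweets."""
    if status_code != 200:
        raise RuntimeError(f"API call failed with status {status_code}: {body_text}")
    body = json.loads(body_text)
    if not body.get("data"):
        raise ValueError("Response contained no tweets")
    return body


# Made-up body with the same top-level keys as the real response.
sample_body = (
    '{"data": [{"id": "1", "text": "THE #BEATLES"}], '
    '"includes": {}, "meta": {"result_count": 1}}'
)
parsed = parse_search_response(200, sample_body)
print(sorted(parsed.keys()))
```

With the notebook's objects this would be called as `parse_search_response(response.status_code, response.text)`.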

Creating a Data Frame with the new tweet data

In [213]:
responseDf = pd.DataFrame(responseDict['data'])

In [231]:
responseDf[1:10]

Unnamed: 0,author_id,public_metrics,id,text,created_at,lang,geo
1,922987038063828992,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",1450970171976261634,RT @izumiman1961: THE #BEATLES https://t.co/GT...,2021-10-20T23:40:21.000Z,und,
2,922987038063828992,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",1450970113260150788,RT @izumiman1961: THE #BEATLES https://t.co/hK...,2021-10-20T23:40:07.000Z,und,
3,922987038063828992,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",1450970065403219968,RT @izumiman1961: THE #BEATLES https://t.co/Uw...,2021-10-20T23:39:55.000Z,und,
4,922987038063828992,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",1450969987481423876,RT @izumiman1961: #JamesMcCartney #DhaniHarris...,2021-10-20T23:39:37.000Z,und,
5,1078805978731237376,"{'retweet_count': 2, 'reply_count': 0, 'like_c...",1450969866752696320,RT @planet_beatles: Fact or Fiction? Ringo fir...,2021-10-20T23:39:08.000Z,en,
6,146676719,"{'retweet_count': 63, 'reply_count': 0, 'like_...",1450969840617869313,RT @the_stylophone: THE BEATLES playing the St...,2021-10-20T23:39:02.000Z,en,
7,133131352,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",1450969603673362443,RT @BeatlesArchive2: John Lennon (+ George)\nT...,2021-10-20T23:38:05.000Z,en,
8,987106751227936768,"{'retweet_count': 16, 'reply_count': 0, 'like_...",1450969488107655172,RT @TheBeatlesPix: The #Beatles July 10 1964 a...,2021-10-20T23:37:38.000Z,en,
9,814919222690201601,"{'retweet_count': 16, 'reply_count': 0, 'like_...",1450969230812319749,RT @TheBeatlesPix: The #Beatles July 10 1964 a...,2021-10-20T23:36:36.000Z,en,
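
The `public_metrics` column holds one dictionary per tweet, which is hard to sort or sum. One option is `pd.json_normalize`, which expands nested dictionaries into their own columns. This is a sketch on made-up records shaped like the API's `data` list; the counts are illustrative and not taken from the collected tweets.

```python
import pandas as pd

# Hypothetical records shaped like responseDict['data'].
sample_tweets = [
    {"id": "101", "lang": "en", "text": "The #Beatles ...",
     "public_metrics": {"retweet_count": 16, "reply_count": 0,
                        "like_count": 0, "quote_count": 0}},
    {"id": "102", "lang": "und", "text": "THE #BEATLES ...",
     "public_metrics": {"retweet_count": 1, "reply_count": 0,
                        "like_count": 0, "quote_count": 0}},
]

# Nested keys become columns such as 'public_metrics.retweet_count'.
flat_df = pd.json_normalize(sample_tweets)
print(flat_df.columns.tolist())
print(flat_df["public_metrics.retweet_count"].sum())
```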


Using the "meta" key to ensure the "next_token" field is not blank

In [215]:
responseDict['meta']

{'newest_id': '1450970576663818240',
 'oldest_id': '1450944762161340416',
 'result_count': 100,
 'next_token': 'b26v89c19zqg8o3fpdv5st43a32sz4edswg4o8jdfk1a5'}

"Flipping to the next page" of tweets using pagination and the "next_token" field

In [216]:
url_2 = my_api_url + '&next_token={}'.format(responseDict['meta']['next_token'])

Making the second API call with the new API url

In [217]:
response2 = requests.request("GET", url_2, headers = header)

In [None]:
response2.text

In [219]:
responseDict2 = json.loads(response2.text)

Getting the last "next_token" for a third "page" of tweets

In [220]:
responseDict2['meta']

{'newest_id': '1450944689956433937',
 'oldest_id': '1450929354842722317',
 'result_count': 100,
 'next_token': 'b26v89c19zqg8o3fpdv5st3hwsl4d9myezp6o7412r4vx'}

Making the last API url for the last page of tweets

In [221]:
url_3 = my_api_url + '&next_token={}'.format(responseDict2['meta']['next_token'])

Making the final API call 

In [222]:
response3 = requests.request("GET", url_3, headers = header)

In [None]:
response3.text
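
The three calls above repeat the same step: request a page, read `next_token` from `meta`, and request again. That pattern can be written as a loop. The sketch below takes the page-fetching step as a function argument so that it runs without network access. `collect_pages`, `fake_fetch`, and the token strings are hypothetical stand-ins; in the notebook, the fetch function would wrap the `requests` call and `json.loads`.

```python
def collect_pages(fetch_page, max_pages=3):
    """Follow next_token from page to page and return all tweets found."""
    tweets = []
    next_token = None  # the first request carries no token
    for _ in range(max_pages):
        page = fetch_page(next_token)
        tweets.extend(page.get("data", []))
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None:  # no further pages available
            break
    return tweets


# Stand-in for the API: maps the token sent to the page returned.
FAKE_PAGES = {
    None: {"data": [{"id": "1"}, {"id": "2"}], "meta": {"next_token": "tok_a"}},
    "tok_a": {"data": [{"id": "3"}], "meta": {"next_token": "tok_b"}},
    "tok_b": {"data": [{"id": "4"}], "meta": {}},
}


def fake_fetch(next_token):
    return FAKE_PAGES[next_token]


all_tweets = collect_pages(fake_fetch)
print(len(all_tweets))  # 4
```

The loop stops either when `max_pages` is reached or when a page arrives without a `next_token`, so it also handles queries that return fewer pages than expected.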


Making the second page of tweets into a Data Frame

In [233]:
responseDf_2 = pd.DataFrame(responseDict2['data'])
responseDf_2[1:10]

Unnamed: 0,id,text,public_metrics,lang,created_at,author_id
1,1450944074949746695,"@BBCNews ""Take the vaccine or feel the wrath o...","{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,2021-10-20T21:56:39.000Z,1239199214
2,1450943958935343113,RT @TheBeatlesPix: The #Beatles July 10 1964 a...,"{'retweet_count': 16, 'reply_count': 0, 'like_...",en,2021-10-20T21:56:11.000Z,1189845246529736704
3,1450943282675023875,RT @LOVEstaff: 今日もありがとうございました！\n\n#SMAP の #隠れ名...,"{'retweet_count': 22, 'reply_count': 0, 'like_...",ja,2021-10-20T21:53:30.000Z,124366415
4,1450943030274543616,RT @60sPsychJukebox: October 20th 1965: The #...,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",en,2021-10-20T21:52:30.000Z,26514883
5,1450942918966104073,RT @izumiman1961: THE #BEATLES https://t.co/Zf...,"{'retweet_count': 3, 'reply_count': 0, 'like_c...",und,2021-10-20T21:52:03.000Z,1294932089033416704
6,1450942798467776513,RT @izumiman1961: THE #BEATLES https://t.co/lz...,"{'retweet_count': 2, 'reply_count': 0, 'like_c...",und,2021-10-20T21:51:35.000Z,922987038063828992
7,1450942789529661440,RT @izumiman1961: THE #BEATLES https://t.co/j2...,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",und,2021-10-20T21:51:32.000Z,922987038063828992
8,1450942704943132677,RT @izumiman1961: THE #BEATLES https://t.co/KG...,"{'retweet_count': 2, 'reply_count': 0, 'like_c...",und,2021-10-20T21:51:12.000Z,922987038063828992
9,1450942695036252160,RT @izumiman1961: THE #BEATLES https://t.co/EX...,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",und,2021-10-20T21:51:10.000Z,922987038063828992


Making the third page of tweets into a Data Frame

In [234]:
responseDict3 = json.loads(response3.text)
responseDf_3 = pd.DataFrame(responseDict3['data'])
responseDf_3[1:10]

Unnamed: 0,created_at,public_metrics,lang,text,author_id,id
1,2021-10-20T20:54:44.000Z,"{'retweet_count': 16, 'reply_count': 0, 'like_...",en,RT @TheBeatlesPix: The #Beatles July 10 1964 a...,133131352,1450928495169884171
2,2021-10-20T20:54:24.000Z,"{'retweet_count': 37, 'reply_count': 0, 'like_...",en,RT @BeatlesArchive2: The #Beatles Mad Day Out ...,981312863611518976,1450928407810908168
3,2021-10-20T20:54:00.000Z,"{'retweet_count': 21, 'reply_count': 0, 'like_...",en,RT @BeatlesArchive2: John Lennon George Harris...,1393020265936867330,1450928309936734212
4,2021-10-20T20:53:05.000Z,"{'retweet_count': 18, 'reply_count': 0, 'like_...",en,RT @BeatlesArchive2: George Harrison \nThe #Be...,1393020265936867330,1450928078243385348
5,2021-10-20T20:51:50.000Z,"{'retweet_count': 16, 'reply_count': 0, 'like_...",en,RT @TheBeatlesPix: The #Beatles July 10 1964 a...,1361739437563068421,1450927764136239108
6,2021-10-20T20:50:02.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,20th Oct 2014:\nThe childhood home of George H...,2543507118,1450927310178332681
7,2021-10-20T20:48:35.000Z,"{'retweet_count': 8, 'reply_count': 0, 'like_c...",en,RT @BeatlesEveryDay: I never realized what a k...,1040732834,1450926946410450946
8,2021-10-20T20:48:01.000Z,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",en,RT @LDissected: Roll 27-A\n\nMaxwell Silver Ha...,1294545061653262336,1450926803086839812
9,2021-10-20T20:47:51.000Z,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",en,RT @LDissected: Roll 26-A\n\nThe time is now 5...,1294545061653262336,1450926761877721091


Creating one large Data Frame using the concat() function

In [227]:
frames = [responseDf, responseDf_2, responseDf_3]
finalDataframe = pd.concat(frames)

Previewing the final Data Frame to check that it has all of the fields I asked for in the API calls (with three pages of up to 100 tweets each, it should contain up to 300 rows)

In [235]:
finalDataframe[1:10]

Unnamed: 0,author_id,public_metrics,id,text,created_at,lang,geo
1,922987038063828992,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",1450970171976261634,RT @izumiman1961: THE #BEATLES https://t.co/GT...,2021-10-20T23:40:21.000Z,und,
2,922987038063828992,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",1450970113260150788,RT @izumiman1961: THE #BEATLES https://t.co/hK...,2021-10-20T23:40:07.000Z,und,
3,922987038063828992,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",1450970065403219968,RT @izumiman1961: THE #BEATLES https://t.co/Uw...,2021-10-20T23:39:55.000Z,und,
4,922987038063828992,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",1450969987481423876,RT @izumiman1961: #JamesMcCartney #DhaniHarris...,2021-10-20T23:39:37.000Z,und,
5,1078805978731237376,"{'retweet_count': 2, 'reply_count': 0, 'like_c...",1450969866752696320,RT @planet_beatles: Fact or Fiction? Ringo fir...,2021-10-20T23:39:08.000Z,en,
6,146676719,"{'retweet_count': 63, 'reply_count': 0, 'like_...",1450969840617869313,RT @the_stylophone: THE BEATLES playing the St...,2021-10-20T23:39:02.000Z,en,
7,133131352,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",1450969603673362443,RT @BeatlesArchive2: John Lennon (+ George)\nT...,2021-10-20T23:38:05.000Z,en,
8,987106751227936768,"{'retweet_count': 16, 'reply_count': 0, 'like_...",1450969488107655172,RT @TheBeatlesPix: The #Beatles July 10 1964 a...,2021-10-20T23:37:38.000Z,en,
9,814919222690201601,"{'retweet_count': 16, 'reply_count': 0, 'like_...",1450969230812319749,RT @TheBeatlesPix: The #Beatles July 10 1964 a...,2021-10-20T23:36:36.000Z,en,
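
A slice like this shows the columns but not the row count, and `pd.concat` keeps each page's original index, so the same index labels repeat once per page. The following sketch shows one way to check both, using small made-up frames in place of the three pages. Passing `ignore_index=True` is a suggested variation, not what the notebook ran.

```python
import pandas as pd

# Made-up stand-ins for responseDf, responseDf_2 and responseDf_3.
page_1 = pd.DataFrame({"id": ["1", "2"], "lang": ["und", "en"]})
page_2 = pd.DataFrame({"id": ["3", "4"], "lang": ["en", "ja"]})
page_3 = pd.DataFrame({"id": ["5", "6"], "lang": ["en", "en"]})

# ignore_index=True rebuilds the index as 0..n-1 across all pages.
combined = pd.concat([page_1, page_2, page_3], ignore_index=True)

print(combined.shape)            # (rows, columns)
print(combined.index.is_unique)  # True once the index is rebuilt
print(combined["id"].is_unique)  # checks that no tweet appears twice
```

For the real data, the first element of `shape` would be the number to compare against the expected 300 rows.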


Exporting the data to a CSV file

In [229]:
finalDataframe.to_csv(r"C:\Users\kathm\EMAT22110\twitterData.csv")

## Conclusions

### Quality of data / weaknesses and limitations
- The data may not be exactly what this project was intended for, but I found it fascinating that so many people are still interested in the Beatles. I thought it would be interesting to see where people were tweeting from, so I added the "geo.place_id" expansion. This went poorly: fewer than 10 tweets actually had a value for this field. Sampling more than 300 tweets might turn up more geo tags, but I also think the Beatles as a topic doesn't give people much reason to tag a location. Tweets about a tour, for example, might be tagged more often as fans follow a band across a country or around the world.
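
The number of geo-tagged tweets can be counted directly rather than estimated by eye. This is a sketch on made-up records, assuming that a tagged tweet carries a `geo` dictionary with a `place_id` and that an untagged tweet has no `geo` key at all. The place ID below is invented.

```python
def count_geo_tagged(tweets):
    """Count tweets that carry a non-empty 'geo' entry."""
    return sum(1 for tweet in tweets if tweet.get("geo"))


# Hypothetical records shaped like the API's 'data' list.
sample_tweets = [
    {"id": "1", "text": "THE #BEATLES"},
    {"id": "2", "text": "#Beatles fans here",
     "geo": {"place_id": "hypothetical_place"}},
    {"id": "3", "text": "The #Beatles 1964"},
]

tagged = count_geo_tagged(sample_tweets)
print(f"{tagged} of {len(sample_tweets)} tweets have a geo tag")
```

On the combined Data Frame, `finalDataframe['geo'].notna().sum()` would give the equivalent count.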

### Alternate approaches / next steps
- Because I couldn't find a current album release that I cared about, I chose the Beatles: they were releasing a new version of their album "Let It Be", and the hashtag returned more than the number of tweets I was looking for. If more albums were coming out, or a new single were trending, I think the results would be more focused on the release rather than general tweets about the Beatles. As a next step, I need to find a more specific research question, or research a certain artist and how their music has trended over time.
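
One way to make the driving question measurable is to flag tweets whose text mentions the release. This is a rough sketch: the keyword list is an assumption about how people might refer to the album, and simple substring matching will miss some relevant tweets and catch some unrelated ones, so the result is an estimate at best. The sample texts are made up.

```python
# Assumed phrases that might signal a tweet about the release.
ALBUM_KEYWORDS = ("let it be", "new album", "reissue", "special edition")


def mentions_album(text, keywords=ALBUM_KEYWORDS):
    """Return True if the tweet text contains any release-related keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


sample_texts = [
    "The #Beatles July 10 1964",
    "Listening to the Let It Be special edition #Beatles",
    "Fact or Fiction? #Beatles trivia",
]

related = [text for text in sample_texts if mentions_album(text)]
print(f"{len(related)} of {len(sample_texts)} sample tweets mention the album")
```

Applied to the `text` column of the collected tweets, the share of matches would give a first, approximate answer to how much of the #Beatles traffic is about the album.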