<div class="alert alert-block alert-info">
    <h1>
        <font color='darkcyan' style='font-family:calibri'>Eddie L - Twitter Text Report</font>
    </h1>
    <p>
</div>

<font color='darkcyan' style='font-family:calibri'>
<h3>
    <strong><em>Nothing</em></strong> is more important for an independent music artist or band than the day they release an album.
</h3>

Historically, there was no globally recognized release day for music. However, on July 10th, 2015, <strong><em>Friday</em></strong> was crowned the global release day for more than <u>45 major recorded music markets worldwide.</u>

For this report, I will use the Twitter API v2 to construct a query that examines the efficacy of adhering to this guideline and releasing music on Global Release Day.

Using the data I gather, I ask the question <em>"What does it mean to stray from the status quo?"</em> Could deviating from the norm actually earn the artists who dare to do so more publicity?

In [1118]:
import pandas as pd
import json
import requests
import urllib.parse

<font color='darkcyan' style='font-family:calibri'>
Establishing the endpoint for my search, as well as grabbing the bearer token from a file in the same directory.
</font>

In [1119]:
endpoint = 'https://api.twitter.com/2/tweets/search/recent'
bt = pd.read_csv('Twitter_Token_9-22.txt', header = 0)

<font color='darkcyan' style='font-family:calibri'>
Establishing the header for my API call. 
</font>

In [1120]:
header = {'Authorization': 'Bearer {}'.format(bt['Bearer_Token'][0])}
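
The header can also be built by a small helper that strips stray whitespace and rejects an empty token. This is only a minimal sketch: the function name `build_auth_header` and the environment variable `TWITTER_BEARER_TOKEN` are hypothetical and do not appear in the notebook, which reads the token from a file instead.

```python
import os

def build_auth_header(token):
    """Return the Authorization header dict for a bearer token."""
    token = token.strip()
    if not token:
        raise ValueError("bearer token is empty")
    return {"Authorization": "Bearer {}".format(token)}

# TWITTER_BEARER_TOKEN is a hypothetical variable name; the fallback is a placeholder.
example_header = build_auth_header(
    os.environ.get("TWITTER_BEARER_TOKEN", "placeholder-token")
)
```

Stripping the token guards against a trailing newline in the token file, which would otherwise end up inside the header value.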

<font color='darkcyan' style='font-family:calibri'>
My driving question concerns albums released on Friday. However, data on albums <strong>NOT</strong> released on Friday are <em>equally valuable</em> in this case!

While constructing my query, I also realized that people use many different slang terms to refer to the release of new music. I made sure to include plenty of synonyms in order to collect as much data as I could.
    
Additionally, I restricted my query to tweets in the English language. <strong>Data that isn't in English isn't inherently useless, I just wouldn't be able to use it.</strong>
    
Finally, I realized that plenty of people engage in conversations on Twitter daily about new music, and many of those posts are retweets. Collecting retweets would be <strong>harmful</strong> for my use case because the same tweet could appear in my results more than once. For this reason <em>I chose to exclude retweets from my query.</em> 


In [1121]:
query_1_text = '(album(Friday OR Thursday OR Monday OR Tuesday OR Wednesday OR Saturday OR Sunday) (dropping OR drop OR release OR releasing OR coming (coming out)) lang:en -is:retweet)'
query_1_encoded = urllib.parse.quote(query_1_text)
tweet_fields = 'text,public_metrics,created_at'
expansions = 'author_id'
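
One way to keep a query like this readable is to assemble it from lists of terms. The sketch below is an illustration only: the helper `build_query` and the list names are hypothetical, and the quoted phrase `"coming out"` is an assumption about the intended meaning of that term rather than a copy of the query above.

```python
import urllib.parse

DAYS = ["Friday", "Thursday", "Monday", "Tuesday", "Wednesday", "Saturday", "Sunday"]
# The quoted phrase "coming out" is an assumption; the notebook's query spells this term differently.
RELEASE_TERMS = ["dropping", "drop", "release", "releasing", '"coming out"']

def build_query(keyword, days, terms):
    """Join each list with OR inside parentheses; adjacent groups are ANDed together."""
    day_group = "(" + " OR ".join(days) + ")"
    term_group = "(" + " OR ".join(terms) + ")"
    return "{} {} {} lang:en -is:retweet".format(keyword, day_group, term_group)

sketch_query = build_query("album", DAYS, RELEASE_TERMS)
# Percent-encode the query so it can be placed in a URL.
sketch_query_encoded = urllib.parse.quote(sketch_query)
print(sketch_query)
```

Keeping the synonyms in a list makes it easy to add or remove a term without hunting for the right parenthesis.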

<font color='darkcyan' style='font-family:calibri'>
I chose to include the tweet fields that are relevant to my case. 
    
<em>Text</em> was probably the most important one, in my opinion. This is because what individuals are tweeting is stored in the text. If I want to know <strong>anything</strong> about what or when an album is coming out, I am going to need the text of all of these tweets.
    
<em>Public metrics</em> were also crucial. I want to investigate the correlation between the day an album is released and how much attention it gets. If a Twitter user has tweeted about their album dropping on Sunday, and the public metrics show that it has received no traction, then that is valuable data for my case. 
    
<em>Created at</em> was another field that is needed for my investigation. If I can see when a tweet about a specific album or artist was made, I can potentially correlate that to the release or announcement of new music. 
    
Additionally, the author ID and username are helpful for this report. This is so I can view the users who are sending out these tweets. 


In [1122]:
query_1_url = endpoint + '?query={}&tweet.fields={}&expansions={}&user.fields={}'.format(query_1_encoded, tweet_fields, expansions, 'username')
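
The same kind of URL can be assembled from a dictionary of parameters, which encodes every value in one step. This is a sketch with a shortened placeholder query, not the notebook's actual request; the names `endpoint_sketch`, `params`, and `url_sketch` are made up for the example.

```python
import urllib.parse

endpoint_sketch = "https://api.twitter.com/2/tweets/search/recent"
params = {
    "query": "album (Friday OR Monday) lang:en -is:retweet",  # shortened placeholder query
    "tweet.fields": "text,public_metrics,created_at",
    "expansions": "author_id",
    "user.fields": "username",
}
# quote_via=urllib.parse.quote encodes spaces as %20 rather than "+".
url_sketch = endpoint_sketch + "?" + urllib.parse.urlencode(
    params, quote_via=urllib.parse.quote
)
print(url_sketch)
```

`requests.get` also accepts such a dictionary through its `params` argument, which avoids building the string by hand.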

<font color='darkcyan' style='font-family:calibri'>
    Sending the request to Twitter using the url for my query. I added the <strong>user.fields</strong> parameter to select the 'username' so that it is delivered with each returned tweet.
</font>

In [1123]:
response = requests.get(query_1_url, headers = header)

<font color='darkcyan' style='font-family:calibri'>
In order to make sense of this raw data, it must first be parsed from JSON into a Python dictionary. After that, the data can be worked with in Python. 
</font>

In [1124]:
response_dict = json.loads(response.text)

<font color='darkcyan' style='font-family:calibri'>
The contents of a JSON object are written as a series of key:value pairs. The keys in this dictionary are "data", "includes" and "meta".
 
<em>Data</em>, in this case, are the tweets that we have gathered using our query. 
    
<em>Meta</em> is simply <strong>information about the data.</strong> This object contains the number of results returned in the current request, as well as pagination details.
    
<em>Includes</em> circles back to the expansions that I requested in my query. <strong>Expansions</strong> allow users to request additional data objects to be returned within the "includes" response object. In this case, I requested the "author_id".

In [1125]:
response_dict.keys()

dict_keys(['data', 'includes', 'meta'])
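
To make the three keys concrete, here is a minimal sketch that parses a hand-written payload with the same overall shape. Every value in it (ids, names, the token) is invented for illustration and is not data returned by Twitter.

```python
import json

# Invented payload that mimics the structure of a search response.
sample_text = """
{
  "data": [
    {"id": "1", "author_id": "10", "text": "example tweet one"},
    {"id": "2", "author_id": "11", "text": "example tweet two"}
  ],
  "includes": {
    "users": [
      {"id": "10", "name": "Example One", "username": "example_one"},
      {"id": "11", "name": "Example Two", "username": "example_two"}
    ]
  },
  "meta": {"result_count": 2, "next_token": "example-token"}
}
"""
sample_dict = json.loads(sample_text)
tweets = sample_dict["data"]                         # list of tweet objects
users = sample_dict["includes"]["users"]             # objects added by the expansion
result_count = sample_dict["meta"]["result_count"]   # number of tweets on this page
next_token = sample_dict["meta"].get("next_token")   # None when there is no further page
print(list(sample_dict.keys()), result_count, next_token)
```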

<font color='darkcyan' style='font-family:calibri'>
The initial response contains <strong><em>a lot</em></strong> of data that needs parsing. 
    
The next few blocks of code parse it. In order to yield a sensible dataframe, <strong>each column should contain one response field/variable.</strong>
    
However, some columns contain objects that hold further response fields. These must be broken down into columns of their own so that the data can be presented in the <strong>best</strong> way.


In [1126]:
includes_df = pd.DataFrame(response_dict['includes']['users'])

In [1127]:
includes_df.keys()

Index(['id', 'name', 'username'], dtype='object')

In [1128]:
includes_df

Unnamed: 0,id,name,username
0,2567479412,Katherine (90s trend remix),mk_buschlen
1,265759395,lover boy .,obeyy_meeeeee
2,24036264,HotNewHipHop,HotNewHipHop
3,1555218163901632512,Naomi Wisniewska,WisniewskaNaomi
4,3522445514,Cashmere Sweater,LoudboiCash
5,614806927,Notorious R-h-i-n-e,Rhine_Reynolds1
6,1093943265798295552,christa👩🏿‍🦰 | #THRILLER40,blondedcob
7,2432387833,Paul Meany,paulmeany
8,1318538212650983424,Composer Magazine,Composer_Mag
9,21306646,Hand Drawn Dracula,HandDrawnDrac


In [1129]:
response_df = pd.DataFrame(response_dict['data'])

In [1130]:
response_df.keys()

Index(['id', 'created_at', 'author_id', 'edit_history_tweet_ids',
       'public_metrics', 'text'],
      dtype='object')

In [1131]:
response_df

Unnamed: 0,id,created_at,author_id,edit_history_tweet_ids,public_metrics,text
0,1585050951232417792,2022-10-25T23:29:30.000Z,2567479412,[1585050951232417792],"{'retweet_count': 0, 'reply_count': 0, 'like_c...",Have not stopped listening to @taylorswift13 #...
1,1585050072168833024,2022-10-25T23:26:00.000Z,265759395,[1585050072168833024],"{'retweet_count': 0, 'reply_count': 0, 'like_c...",@dvsn album coming out Friday too 😩
2,1585047993094381570,2022-10-25T23:17:44.000Z,24036264,[1585047993094381570],"{'retweet_count': 2, 'reply_count': 0, 'like_c...",The R&amp;B star is steady sharing singles whi...
3,1585045073011830785,2022-10-25T23:06:08.000Z,1555218163901632512,[1585045073011830785],"{'retweet_count': 0, 'reply_count': 0, 'like_c...",@officialblue @BlueItalia Blue boy band your a...
4,1585043977698672643,2022-10-25T23:01:47.000Z,3522445514,[1585043977698672643],"{'retweet_count': 0, 'reply_count': 0, 'like_c...",Twisted tea out next Friday! Until then please...
5,1585041856362348544,2022-10-25T22:53:21.000Z,614806927,[1585041856362348544],"{'retweet_count': 0, 'reply_count': 0, 'like_c...",I’m so fuckin hype for Drake &amp; 21’s collab...
6,1585040226216742912,2022-10-25T22:46:53.000Z,1093943265798295552,[1585040226216742912],"{'retweet_count': 0, 'reply_count': 1, 'like_c...",Preorder ‘Bout Mine’ dropping this Friday! Bou...
7,1585036365359910912,2022-10-25T22:31:32.000Z,2432387833,[1585036365359910912],"{'retweet_count': 7, 'reply_count': 3, 'like_c...",@mutemath Live album from our last show coming...
8,1585035989621383173,2022-10-25T22:30:03.000Z,1318538212650983424,[1585035989621383173],"{'retweet_count': 1, 'reply_count': 0, 'like_c...",We're loving this single from Amir Yaghmai - a...
9,1585035566617223168,2022-10-25T22:28:22.000Z,21306646,[1585035566617223168],"{'retweet_count': 1, 'reply_count': 0, 'like_c...",🎃👻 This Friday! @wavelengthmusic presents BONN...
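
Because the users arrive in a separate table, each tweet has to be matched to its author through `author_id`. A minimal sketch of that join, using small made-up frames rather than the data above:

```python
import pandas as pd

# Made-up tweets and users; note that one user wrote two of the tweets.
tweets = pd.DataFrame({
    "id": ["t1", "t2", "t3"],
    "author_id": ["u2", "u1", "u2"],
    "text": ["first example", "second example", "third example"],
})
users = pd.DataFrame({
    "id": ["u1", "u2"],
    "username": ["alice_example", "bob_example"],
})

# Rename the user id so both tables share the join key, then left-join onto the tweets.
users_renamed = users.rename(columns={"id": "author_id"})
tweets_with_users = tweets.merge(
    users_renamed[["author_id", "username"]], on="author_id", how="left"
)
print(tweets_with_users)
```

Joining on the key, instead of relying on row order, stays correct when one author appears in several tweets or when the two tables have different lengths.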


In [1132]:
public_metrics_df = pd.DataFrame(list(response_df['public_metrics']))

In [1133]:
public_metrics_df.keys()

Index(['retweet_count', 'reply_count', 'like_count', 'quote_count'], dtype='object')

In [1134]:
public_metrics_df

Unnamed: 0,retweet_count,reply_count,like_count,quote_count
0,0,0,0,0
1,0,0,0,0
2,2,0,13,0
3,0,0,0,0
4,0,0,0,0
5,0,0,1,0
6,0,1,0,0
7,7,3,34,1
8,1,0,2,0
9,1,0,1,0
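
The nested metrics can also be flattened and attached to their tweets in one pass. The sketch below uses `pd.json_normalize` on two made-up rows; it is an alternative illustration, not the approach taken in the cells above.

```python
import pandas as pd

# Made-up rows with a nested public_metrics dictionary.
tweets = pd.DataFrame({
    "id": ["t1", "t2"],
    "public_metrics": [
        {"retweet_count": 2, "reply_count": 0, "like_count": 13, "quote_count": 0},
        {"retweet_count": 7, "reply_count": 3, "like_count": 34, "quote_count": 1},
    ],
})

# Turn each dictionary into its own set of columns.
metrics = pd.json_normalize(tweets["public_metrics"].tolist())
# Place the new columns next to the remaining tweet columns.
flat = pd.concat([tweets.drop(columns="public_metrics"), metrics], axis=1)
print(flat)
```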


<font color='darkcyan' style='font-family:calibri'>
<strong><em>Pagination</em></strong> is the next big step in this process. 

Twitter returns a limited number of results per query "page": 10 by default, and at most 100 when the <strong>max_results</strong> parameter is set. However, Twitter grants access to more through <strong><em>pagination.</em></strong>
    
The key to this process lies within the <em>meta</em> object. It contains a <strong>"next_token"</strong> key, whose value is the token used to request the next page of results. 
    
In the following function, the "next_token" is read from each page of results once its tweets have been gathered, and it is then passed along with the next request. This can continue for as long as further pages exist; the final page carries no "next_token".

In [1135]:
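def twt_recent_search(query, num_pages, header):
    """Collect up to num_pages pages of results, following next_token."""
    response_list = []
    next_token = None
    for i in range(num_pages):
        # The first request has no token; later requests pass the previous page's token.
        if next_token:
            this_query = query + "&next_token={}".format(next_token)
        else:
            this_query = query
        
        this_response = requests.get(this_query, headers = header)
        print(this_response.status_code)
        this_response_dict = json.loads(this_response.text)
        response_list.append(this_response_dict)
        # The last page has no "next_token" key, so stop instead of raising a KeyError.
        next_token = this_response_dict.get('meta', {}).get('next_token')
        if not next_token:
            break
        
    return response_list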
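
The token-following logic can be tried out without calling the API. In this sketch the pages are faked with a dictionary, and `collect_pages`, `fake_fetch`, and `fake_pages` are hypothetical names used only for the illustration.

```python
def collect_pages(fetch_page, max_pages):
    """Follow next_token from page to page until it is missing or max_pages is reached."""
    pages = []
    token = None
    for _ in range(max_pages):
        page = fetch_page(token)
        pages.append(page)
        token = page.get("meta", {}).get("next_token")
        if token is None:
            break  # the last page carries no next_token
    return pages

# Three invented pages; the final one has no next_token.
fake_pages = {
    None: {"data": [{"id": "1"}], "meta": {"result_count": 1, "next_token": "a"}},
    "a": {"data": [{"id": "2"}], "meta": {"result_count": 1, "next_token": "b"}},
    "b": {"data": [{"id": "3"}], "meta": {"result_count": 1}},
}

def fake_fetch(token):
    """Stand-in for an HTTP request: look the page up by its token."""
    return fake_pages[token]

collected = collect_pages(fake_fetch, max_pages=10)
print(len(collected))
```

Asking for 10 pages still returns only the three that exist, because the loop stops as soon as a page has no token.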

<font color='darkcyan' style='font-family:calibri'>
Collecting additional responses by calling the function once and requesting 30 pages.

<strong><em>At the default of 10 tweets per page, this results in a total of 300 tweets.</em></strong>


In [1136]:
my_responses = twt_recent_search(query_1_url, 30, header)

200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200


<font color='darkcyan' style='font-family:calibri'>
After collecting 300 tweets, we are left with a sequence of <em>JSON dictionaries</em>, one per page.

This sequence must be converted into a dataframe.

In [1137]:
results = pd.DataFrame.from_records(my_responses)

In [1138]:
data_list = list(results['data'])

<font color='darkcyan' style='font-family:calibri'>
Creating a dataframe for each item in our list.
</font>

In [1139]:
data_list_of_dfs = [pd.DataFrame(x) for x in data_list]

<font color='darkcyan' style='font-family:calibri'>
Using <strong><em>concatenation</em></strong>, combine the dataframe for each page into one. 
</font>

In [1140]:
final_df = pd.concat(data_list_of_dfs, ignore_index=True)
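
Concatenating per-page frames keeps each page's own row labels unless `ignore_index=True` is passed. A short sketch with two made-up pages shows the difference:

```python
import pandas as pd

# Two invented pages of results, each numbered from 0 by default.
page_one = pd.DataFrame({"id": ["t1", "t2"]})
page_two = pd.DataFrame({"id": ["t3", "t4"]})

stacked = pd.concat([page_one, page_two])                        # labels repeat per page
renumbered = pd.concat([page_one, page_two], ignore_index=True)  # labels run across all rows

print(list(stacked.index))
print(list(renumbered.index))
```

Repeated labels matter later on, because pandas aligns on the index when a column is assigned from another frame.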

<font color='darkcyan' style='font-family:calibri'>
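The first-page dataframes built above cannot be reused here, because they describe only the first 10 tweets. <strong><em>All that needs to be done is to extract the same fields from every page and add them as columns to the combined dataframe.</em></strong>
</font>

In [1141]:
# Build the metric columns from every page; public_metrics_df and includes_df only cover page one.
final_df = final_df.reset_index(drop=True)
all_metrics = pd.DataFrame(final_df['public_metrics'].tolist())
final_df['retweets'] = all_metrics['retweet_count']
final_df['replies'] = all_metrics['reply_count']
final_df['likes'] = all_metrics['like_count']
final_df['quotes'] = all_metrics['quote_count']
# Look up each username by author_id rather than by row position.
all_users = pd.concat([pd.DataFrame(r['includes']['users']) for r in my_responses]).drop_duplicates(subset='id')
final_df['username'] = final_df['author_id'].map(all_users.set_index('id')['username'])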

<font color='darkcyan' style='font-family:calibri'>
<strong>Exporting the final dataframe to a CSV file.</strong>
</font>

In [1142]:
final_df.to_csv(r"C:\Users\possi\data-fa22\final_df.csv")

<font color='darkcyan' style='font-family:calibri'>
<h2>Conclusions</h2>
    
<strong><em>What does it mean to stray from the status quo?</em></strong>
    
Well...it means nothing to some, but everything to most!
    
Artists everywhere depend heavily on Global Release Day. Popular artists still tend to stick to the Friday release date, and lesser-known artists have more to lose, so they are less likely to stray from this trend to avoid the risk of losing traction. 
    
While viewing my data, I noticed that Taylor Swift's new album <em>Midnights</em> was the most-discussed subject. Sure enough, this album was released on Friday, October 21st, 2022. A reissue of the Beatles' album <em>Revolver</em> was also released on a recent Friday.
  
Most small artists stuck with a Friday release date as well. I even noticed a tweet berating a small artist for releasing an album on a Friday, almost as if fans are aware that they aren't in a position to take risks. Those who didn't release on Friday barely got any attention at all.
    
However, a fair number of big artists were able to successfully shake things up and release albums on days other than Friday. Paul Meany of Mutemath announced a live album releasing on a Monday, which gained an unexpected amount of popularity.
    
To conclude, it looks like for those with a decent following, straying from the status quo may just be the answer. However, that is not a risk many starving artists can afford to take.     
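
A comparison like this could be put on firmer footing by grouping tweets by the day they mention and summarising their likes. The sketch below is one possible approach, assuming a dataframe with `text` and `likes` columns; the rows are invented for illustration, and `first_day_mentioned` is a hypothetical helper.

```python
import re
import pandas as pd

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
day_pattern = re.compile(r"\b(" + "|".join(DAYS) + r")\b", flags=re.IGNORECASE)

def first_day_mentioned(text):
    """Return the first weekday named in the text, or None if there is none."""
    match = day_pattern.search(text)
    return match.group(1).capitalize() if match else None

# Invented rows, not taken from the collected tweets.
sample = pd.DataFrame({
    "text": [
        "Album dropping this friday!",
        "New album out Friday",
        "Live album coming Monday",
        "Cannot wait for the album",
    ],
    "likes": [10, 20, 34, 5],
})
sample["day_mentioned"] = sample["text"].apply(first_day_mentioned)
# Rows without a weekday are left out of the grouping.
likes_by_day = sample.groupby("day_mentioned")["likes"].agg(["count", "mean"])
print(likes_by_day)
```

A mentioned weekday is only a rough proxy for the actual release day, so any pattern found this way would still need checking against the tweets themselves.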
    
<h3>Quality of data // weaknesses and limitations</h3>
Overall, I am pleased with the quality of the data that my query was able to return. 
    
I was given a wide variety of data from big and small artists, as well as their fans. Being able to see the usernames of who tweeted, what they tweeted about, and researching the relevance it had to my topic was so interesting!
   
A huge weakness of my data is how recent it is. I would've originally loved to have done a study on certain genres of albums, and the success they have throughout different times of the year, but that wasn't possible with Essential access, since the recent search endpoint cannot go back further than seven days. 
    
If I were able to look back further at this data, I would be able to see if releasing an album on Friday made as much of a difference prior to it being declared Global Release Day.
    
Additionally, my data is heavily skewed towards English-speaking artists, because I restricted the query to English-language tweets. A human being (me) collected this data, so this is to be expected, especially since English is the language I speak the majority of the time. 
    
Also, this data is limited because of the platform! Twitter is very popular, but it would be easier to assess the impact that Global Release Day has had on artists using something more specialized. People are selective about what they discuss online. There are definitely albums that are seldom talked about on Twitter right now but are very popular among those who don't go online very much.
<h3>Alternative approaches // potential next steps</h3>
As I said before, an alternative approach to studying this phenomenon would be to declare a start_time for my data. This way, I could study how Global Release Day impacted various artists in announcing and releasing their music. 
    
I could also filter my results further to only see tweets from verified users. This would give me a more refined pool of data that would theoretically only consist of announcements from verified artists. 
    
However, this would be biased heavily towards big artists. Additionally, I wouldn't be able to gather data on what fans are discussing regarding an album's release, which is crucial for determining the efficacy of the release date. 
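
Both next steps could be expressed as request parameters. The sketch below assumes the `start_time` parameter and the `is:verified` operator behave as described in the API v2 documentation at the time of this report; the helper `build_search_params`, the date, and the shortened query are placeholders.

```python
from datetime import datetime, timezone

def build_search_params(query, start):
    """Return query parameters with start_time as a UTC ISO 8601 timestamp."""
    start_utc = start.astimezone(timezone.utc)
    return {
        "query": query,
        "start_time": start_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "tweet.fields": "text,public_metrics,created_at",
    }

# Illustrative values only: a shortened query and an arbitrary start date.
params = build_search_params(
    "album Friday lang:en -is:retweet is:verified",
    datetime(2022, 10, 20, tzinfo=timezone.utc),
)
print(params["start_time"])
```

How far back `start_time` may reach depends on the endpoint and access level, so this would not on its own remove the recency limitation described above.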