# Twitter Data Collection Assignment

## Setup

Assessment 1: Collecting Data


This assessment is a practical assessment that evaluates your understanding of metrics and value, and your ability to use Python and its libraries to locate and extract data.  In this assessment you will be extracting data using an API.  The API to be used is the Twitter API.    You are to assume the role of a data analyst for a software company that has made a software product, such as Discord, Wordle, or Tinder for example. 
You will first identify what metric the Twitter stream will be used for, and describe the data.  
You will then write a python program that collects data from across multiple twitter accounts (in class we looked at one account that of a named Twitter account, in this example you will have to extend your code to look at the tweet database from a keyword perspective).  The tweets collected should be about one software or consumer product that your company has made.


Weighting

This assessment is worth 25% of your final grade.

Deliverable

You are to submit three files:-

(1) A word/pdf document answering questions 1, 2 and the screen shot from Q3.

(2) a labelled .py python program file, using extensive use of comments.  Clearly indicate where code has been used from other sources.

(3) a one page print screen or preview of sample data/output collected in (2) above



Question 1

What metric would you use this data for?  In your answer name the metric, and explain the value it will bring.

Question 2

Document the meta-data for this metric, i.e. data source, data type, volume, velocity, variety, ethical or legislative considerations.

Question 3

You are to write a python program to extract data from Twitter.

The key attributes of the tweets (at the time of writing - this may change - check online for up to date attributes or better still check your dataset collected) pulled out are :

text: the text of the tweet itself
created_at: the date of creation
favorite_count, retweet_count: the number of favourites and retweets
favorited, retweeted: boolean stating whether the authenticated user (you) have favourited or retweeted this tweet
lang: acronym for the language (e.g. “en” for english)
id: the tweet identifier
place, coordinates, geo: geo-location information if available
user: the author’s full profile
entities: list of entities like URLs, @-mentions, hashtags and symbols
in_reply_to_user_id: user identifier if the tweet is a reply to a specific user
in_reply_to_status_id: status identifier id the tweet is a reply to a specific status
In this extraction, we are interested in the favorite_count and retweet_count.  Pull off all tweets, sort by favorite_count and retweet_count, and print/output to screen the tweet text and the counts of the top 10 favourited tweets.

There is a lot of help online, for example https://fcpython.com/blog/scraping-twitter-tweepy-python. ;

Note: it is assumed that you will use code shared openly online, in tutorials and on gitHub.  You may use this code, however at least 25% of the program must be your own code.  Clearly distinguish your code from that found online.  You should properly attribute the copyright to its author.   You must not copy code from your classmates.  You should fully understand all code submitted, and be able to explain each line of code.

In [1]:
# Local
#!pip install -r requirements.txt
# Remote option
#!pip install -r https://raw.githubusercontent.com/mrzakiakkari/reposiroty-name/requirements.txt
#Options: --quiet --user

In [2]:
from configparser import ConfigParser
from pandas import DataFrame
import csv
import pandas
import tweepy

In [3]:
config_filepath = "config.ini"
config_parser = ConfigParser()

In [4]:
config_parser.read(config_filepath)

['config.ini']

In [5]:
access_token = config_parser["Twitter"]["AccessToken"]
access_token_secret = config_parser["Twitter"]["AccessTokenSecret"]
consumer_key = config_parser["Twitter"]["ApiKey"]
consumer_secret = config_parser["Twitter"]["ApiKeySecret"]

In [6]:
o_auth_handler = tweepy.OAuthHandler(consumer_key, consumer_secret)
o_auth_handler.set_access_token(access_token, access_token_secret)
tweepy_api = tweepy.API(o_auth_handler, wait_on_rate_limit=True)

In [41]:
screen_name = "Discord"

In [42]:
tweets = tweepy_api.user_timeline(
    screen_name=screen_name,
    count=200,  # 200 is the maximum allowed count
    include_rts=False,
    tweet_mode="extended"
)  # Necessary to keep full_text otherwise only the first 140 words are extracted

In [43]:
for info in tweets[:3]:
    print("ID: {}".format(info.id))
    print(info.created_at)
    print(info.full_text)
    print("\n")

ID: 1499814795448655872
2022-03-04 18:31:27+00:00
@the_lemon_cat Can you reach out to our support team so they can look into this issue further?: https://t.co/CLfpGOYyn0


ID: 1499796582342049795
2022-03-04 17:19:04+00:00
@Titoinzim Olá, Antônio. Lamentamos por qualquer transtorno, mas não foi possível compreender sobre o que se refere. Seria possível contatar nossa equipe de suporte? https://t.co/CLfpGOYyn0


ID: 1499727524493934595
2022-03-04 12:44:40+00:00
@Thibautgameur Bonjour, Veuillez contacter notre équipe Confiance &amp; Sécurité à https://t.co/rgJ9BU8Xu3 en utilisant l'e-mail associé à ce compte. Dans la partie "Type de rapport", vous devrez sélectionner "Faire appel d'une action que l'équipe Confiance &amp; Sécurité a prise sur mon compte".




In [44]:
len(tweets)

200

In [45]:
screen_names = ["discord", "tinder"]

all_tweets = []
all_tweets.extend(tweets)
oldest_id = tweets[-1].id
for screen_name in screen_names:
    while True:
        tweets = tweepy_api.user_timeline(
            screen_name=screen_name,
            count=200,# 200 is the maximum allowed count
            include_rts=False,
            max_id=oldest_id - 1,
            # Necessary to keep full_text
            # otherwise only the first 140 words are extracted
            tweet_mode='extended')
        if len(tweets) == 0:
            break
        oldest_id = tweets[-1].id
        all_tweets.extend(tweets)
        print('N of tweets downloaded till now {}'.format(len(all_tweets)))

N of tweets downloaded till now 400
N of tweets downloaded till now 600
N of tweets downloaded till now 800
N of tweets downloaded till now 1000
N of tweets downloaded till now 1200
N of tweets downloaded till now 1400
N of tweets downloaded till now 1600
N of tweets downloaded till now 1800
N of tweets downloaded till now 2000
N of tweets downloaded till now 2200
N of tweets downloaded till now 2400
N of tweets downloaded till now 2600
N of tweets downloaded till now 2800
N of tweets downloaded till now 3000
N of tweets downloaded till now 3200
N of tweets downloaded till now 3250
N of tweets downloaded till now 3439
N of tweets downloaded till now 3638
N of tweets downloaded till now 3837
N of tweets downloaded till now 4022
N of tweets downloaded till now 4180
N of tweets downloaded till now 4362
N of tweets downloaded till now 4550
N of tweets downloaded till now 4745
N of tweets downloaded till now 4936
N of tweets downloaded till now 5123
N of tweets downloaded till now 5311
N of

In [50]:
tweets_list: list = [[
    tweet.id_str, 
    tweet.user.screen_name, 
    tweet.created_at,
    tweet.favorited,
    tweet.retweeted,
    tweet.lang,
    tweet.place,
    tweet.coordinates, 
    tweet.geo,
    tweet.user,
    tweet.entities,
    tweet.in_reply_to_user_id,
    tweet.in_reply_to_status_id,
    tweet.favorite_count, 
    tweet.retweet_count,
    tweet.full_text.encode("utf-8").decode("utf-8")
] for _, tweet in enumerate(all_tweets)]

In [52]:
tweet_columns: list = [
    "id", "screen_name", "created_at", "favourited", "retweeted", "language",
    "place", "coordinates", "geo", "user", "entities", "user identifier",
    "status identifier", "favorite_count", "retweet_count", "text"
]
dataframe: DataFrame = DataFrame(tweets_list, columns=tweet_columns)
dataframe.to_csv('./assets/twitter-Churn-ie.csv', index=False)
dataframe.head(3)

Unnamed: 0,id,screen_name,created_at,favourited,retweeted,language,place,coordinates,geo,user,entities,user identifier,status identifier,favorite_count,retweet_count,text
0,1499814795448655872,discord,2022-03-04 18:31:27+00:00,False,False,en,,,,User(_api=<tweepy.api.API object at 0x00000285...,"{'hashtags': [], 'symbols': [], 'user_mentions...",1.341047e+18,1.499666e+18,0,0,@the_lemon_cat Can you reach out to our suppor...
1,1499796582342049795,discord,2022-03-04 17:19:04+00:00,False,False,pt,,,,User(_api=<tweepy.api.API object at 0x00000285...,"{'hashtags': [], 'symbols': [], 'user_mentions...",46682840.0,1.499691e+18,0,0,"@Titoinzim Olá, Antônio. Lamentamos por qualqu..."
2,1499727524493934595,discord,2022-03-04 12:44:40+00:00,False,False,fr,,,,User(_api=<tweepy.api.API object at 0x00000285...,"{'hashtags': [], 'symbols': [], 'user_mentions...",9.652699e+17,1.499466e+18,0,0,"@Thibautgameur Bonjour, Veuillez contacter not..."


In [49]:
dataframe

Unnamed: 0,id,screen_name,created_at,favourited,retweeted,language,place,coordinates,geo,user,entities,user identifier,status identifier,favorite_count,retweet_count,text
0,1499814795448655872,discord,2022-03-04 18:31:27+00:00,False,False,en,,,,User(_api=<tweepy.api.API object at 0x00000285...,"{'hashtags': [], 'symbols': [], 'user_mentions...",1.341047e+18,1.499666e+18,0,0,@the_lemon_cat Can you reach out to our suppor...
1,1499796582342049795,discord,2022-03-04 17:19:04+00:00,False,False,pt,,,,User(_api=<tweepy.api.API object at 0x00000285...,"{'hashtags': [], 'symbols': [], 'user_mentions...",4.668284e+07,1.499691e+18,0,0,"@Titoinzim Olá, Antônio. Lamentamos por qualqu..."
2,1499727524493934595,discord,2022-03-04 12:44:40+00:00,False,False,fr,,,,User(_api=<tweepy.api.API object at 0x00000285...,"{'hashtags': [], 'symbols': [], 'user_mentions...",9.652699e+17,1.499466e+18,0,0,"@Thibautgameur Bonjour, Veuillez contacter not..."
3,1499706214464335875,discord,2022-03-04 11:19:59+00:00,False,False,fr,,,,User(_api=<tweepy.api.API object at 0x00000285...,"{'hashtags': [], 'symbols': [], 'user_mentions...",1.017560e+18,1.499446e+18,0,0,@Akira13345 Désolé pour la confusion ! Vous po...
4,1499602614916374532,discord,2022-03-04 04:28:19+00:00,False,False,en,,,,User(_api=<tweepy.api.API object at 0x00000285...,"{'hashtags': [], 'symbols': [], 'user_mentions...",1.435930e+18,1.499576e+18,0,0,@PUSHER666YT Hi! We'll need to take a look at ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6218,968605969004380161,Tinder,2018-02-27 21:57:13+00:00,False,False,en,,,,User(_api=<tweepy.api.API object at 0x00000285...,"{'hashtags': [{'text': 'RepresentLove', 'indic...",2.689653e+08,9.685998e+17,6,0,@helenakkim Thank you for supporting the Inter...
6219,968605111923585024,Tinder,2018-02-27 21:53:49+00:00,False,False,en,,,,User(_api=<tweepy.api.API object at 0x00000285...,"{'hashtags': [{'text': 'RepresentLove', 'indic...",9.638269e+17,9.685980e+17,2,0,@jam3892011 Thank you for supporting the Inter...
6220,968602778074365952,Tinder,2018-02-27 21:44:32+00:00,False,False,en,,,,User(_api=<tweepy.api.API object at 0x00000285...,"{'hashtags': [{'text': 'RepresentLove', 'indic...",1.365621e+09,9.685933e+17,1,0,@osczs Thank you for supporting the Interracia...
6221,968602359319347201,Tinder,2018-02-27 21:42:52+00:00,False,False,en,,,,User(_api=<tweepy.api.API object at 0x00000285...,"{'hashtags': [{'text': 'RepresentLove', 'indic...",3.480634e+08,9.685988e+17,9,1,@Laydee_Chezella Thank you for supporting the ...


In [23]:
dataframe=dataframe[["text", "favorite_count", "retweet_count"]]

In [39]:
dataframe.sort_values(
    by=["text", "favorite_count", "retweet_count"],
    ascending=[False, True, True])[["text", "favorite_count"]].head(10)

Unnamed: 0,text,favorite_count
2679,🚨 new feature alert 🚨\n\nscheduled events lets...,7741
216,you see this at 3am what you doing https://t.c...,62065
264,which one are you joining https://t.co/wmvC9t740f,49228
3154,what if we made an app for people who hate gaming,90990
2913,we’ve officially made it https://t.co/09z26LzVYl,132380
2426,we just added backgrounds you can use when vid...,11078
2290,watching arcane rn this is sick i wish league ...,59321
2043,update: snowsgiving sounds have now been made ...,2043
873,to celebrate the 5th anniversary of nitro we'r...,146521
1528,to all of the users who never leave “do not di...,89162
