## DataFrame Mini Project: Twitter Data


## Table of Contents
* Required Libraries
* Connect to the API
* Call the API
* Read Dumped Data


# Required Libraries

Twitter provides an API for extracting tweets and user data. This notebook uses the tweepy library to access it.

In [1]:
!pip install tweepy

Collecting tweepy
  Downloading tweepy-4.14.0-py3-none-any.whl (98 kB)
Collecting oauthlib<4,>=3.2.0 (from tweepy)
  Downloading oauthlib-3.2.2-py3-none-any.whl (151 kB)
Collecting requests-oauthlib<2,>=1.2.0 (from tweepy)
  Downloading requests_oauthlib-1.3.1-py2.py3-none-any.whl (23 kB)
Installing collected packages: oauthlib, requests-oauthlib, tweepy
Successfully installed oauthlib-3.2.2 requests-oauthlib-1.3.1 tweepy-4.14.0


In [2]:
import tweepy

In [3]:
from pathlib import Path
import pandas as pd
import json
import os
from tqdm import tqdm

# Connect to the API

In [None]:
CONSUMER_KEY = os.environ['CONSUMER_KEY']
CONSUMER_SECRET = os.environ['CONSUMER_SECRET']
ACCESS_TOKEN = os.environ['ACCESS_TOKEN']
ACCESS_TOKEN_SECRET = os.environ['ACCESS_TOKEN_SECRET']
BEARER_TOKEN = os.environ['BEARER_TOKEN']
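
Indexing `os.environ` directly raises a bare `KeyError` on the first missing variable. As a minimal sketch, a small helper could report every missing credential at once. `load_credentials` is a hypothetical name and not part of tweepy; it takes the environment mapping as a parameter so it can be tried without real keys.

```python
import os

# Names of the environment variables read in the cell above.
REQUIRED_VARS = (
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
    "BEARER_TOKEN",
)


def load_credentials(env=None, names=REQUIRED_VARS):
    """Return {name: value} for all names, or raise listing every missing one.

    Hypothetical helper for illustration; it is not part of tweepy.
    """
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Demonstration with a fake environment (placeholder values, not real keys).
fake_env = {name: "placeholder" for name in REQUIRED_VARS}
creds = load_credentials(fake_env)
print(sorted(creds))
```

In the notebook itself, `load_credentials()` with no arguments would read from `os.environ`.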

In [None]:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Call the API


In [None]:
dump_path = Path('./data/twitter_data/')
# Create the dump directory along with any missing parents (e.g. ./data)
dump_path.mkdir(parents=True, exist_ok=True)

# Fetch 2 pages of up to 10 tweets each and dump every tweet to its own file
for page in tqdm(tweepy.Cursor(
    api.search_tweets,
    tweet_mode='extended',
    q="#machine_learning",
    count=10,
    # lang="en",
).pages(2)):
    for tweet in page:
        json_data = tweet._json  # raw API payload as a dict
        with open(dump_path / f'{json_data["id"]}.json', 'w') as f:
            json.dump(json_data, f)
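
The loop above writes one file per tweet, named after its `id`, so fetching the same tweet again overwrites its file instead of creating a duplicate. The sketch below shows that pattern without calling the API. It assumes made-up dictionaries in place of `tweet._json` and a temporary directory in place of `./data/twitter_data/`; `dump_tweets` is a hypothetical helper name.

```python
import json
import tempfile
from pathlib import Path


def dump_tweets(tweet_dicts, dump_path):
    """Write each tweet dict to <dump_path>/<id>.json; return the number of writes."""
    dump_path = Path(dump_path)
    dump_path.mkdir(parents=True, exist_ok=True)
    written = 0
    for data in tweet_dicts:
        with open(dump_path / f'{data["id"]}.json', "w") as f:
            json.dump(data, f)
        written += 1
    return written


# Made-up payloads standing in for tweet._json; the third repeats id 1.
fake_tweets = [
    {"id": 1, "full_text": "first"},
    {"id": 2, "full_text": "second"},
    {"id": 1, "full_text": "first, fetched again"},
]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "twitter_data"
    n_written = dump_tweets(fake_tweets, target)
    files = sorted(p.name for p in target.iterdir())
    with open(target / "1.json") as f:
        reloaded = json.load(f)

print(n_written, files)
```

Three writes leave two files on disk, because the repeated `id` replaces the earlier dump.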

# Read Dumped Data

In [18]:
data_path = Path('./data/twitter_data')

In [19]:
rows = []
for file_path in tqdm(data_path.iterdir()):
    if file_path.is_dir():
        continue
    with open(file_path) as f:
        d = json.load(f)
    user = d['user']
    rows.append(dict(
        name=user['name'],
        followers=user['followers_count'],
        following=user['friends_count'],
        # +1 in the denominator avoids division by zero for accounts following nobody
        follower_following_ratio=user['followers_count'] / (user['friends_count'] + 1),
        # extended-mode payloads carry 'full_text'; fall back to 'text' otherwise
        text=d.get('full_text') or d.get('text'),
        hashtags=[item['text'] for item in d['entities']['hashtags']],
        likes=d['favorite_count'],
        retweets=d['retweet_count'],
    ))

20it [00:00, 104.62it/s]
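
Each row flattens a nested tweet payload into a handful of columns. The ratio divides by `friends_count + 1`, so an account that follows nobody yields its follower count rather than a division by zero. As a sketch, the same extraction can be written as a standalone function and tried on a single payload. The sample values below are made up; the key names are the ones read in the cell above, and `tweet_to_row` is a hypothetical name.

```python
def tweet_to_row(d):
    """Flatten one tweet payload (a dict) into a flat row for a DataFrame."""
    user = d["user"]
    return {
        "name": user["name"],
        "followers": user["followers_count"],
        "following": user["friends_count"],
        # +1 keeps the ratio defined when friends_count is 0
        "follower_following_ratio": user["followers_count"] / (user["friends_count"] + 1),
        # prefer the untruncated text when it is present
        "text": d.get("full_text") or d.get("text"),
        "hashtags": [item["text"] for item in d["entities"]["hashtags"]],
        "likes": d["favorite_count"],
        "retweets": d["retweet_count"],
    }


# Made-up payload containing only the fields the function reads.
sample = {
    "user": {"name": "Example Bot", "followers_count": 991, "friends_count": 0},
    "full_text": "Hello #machine_learning",
    "entities": {"hashtags": [{"text": "machine_learning"}]},
    "favorite_count": 0,
    "retweet_count": 5,
}

row = tweet_to_row(sample)
print(row["follower_following_ratio"], row["hashtags"])
```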


In [16]:
df = pd.DataFrame(rows)

In [30]:
pd.set_option('display.min_rows', 20)
pd.set_option('display.max_colwidth', 50)

In [31]:
df

| | name | followers | following | follower_following_ratio | text | hashtags | likes | retweets |
|---|---|---|---|---|---|---|---|---|
| 0 | Lucifer AI | 37 | 194 | 0.189744 | #30DaysOfCodechallenge\nDay22\nNot done much t... | [30DaysOfCodechallenge, 30Daysofcode, machine_... | 0 | 5 |
| 1 | Coding Buddy | 9 | 2 | 3.0 | RT @lucifer_twtt: #30DaysOfCodechallenge\nDay2... | [30DaysOfCodechallenge, 30Daysofcode] | 0 | 5 |
| 2 | AI Bot by uCloudify.com | 991 | 0 | 991.0 | RT @lucifer_twtt: #30DaysOfCodechallenge\nDay2... | [30DaysOfCodechallenge, 30Daysofcode] | 0 | 5 |
| 3 | #30DaysOfCode | 2318 | 1 | 1159.0 | RT @lucifer_twtt: #30DaysOfCodechallenge\nDay2... | [30DaysOfCodechallenge, 30Daysofcode] | 0 | 5 |
| 4 | PyBot | 902 | 1 | 451.0 | RT @lucifer_twtt: #30DaysOfCodechallenge\nDay2... | [30DaysOfCodechallenge, 30Daysofcode] | 0 | 5 |
| 5 | Mr Data Scientist | 10965 | 270 | 40.461255 | RT @lucifer_twtt: #30DaysOfCodechallenge\nDay2... | [30DaysOfCodechallenge, 30Daysofcode] | 0 | 5 |
| 6 | SUPER WRITERS | 178 | 441 | 0.402715 | We can complete your;\n#Homework \n#Machine_Le... | [Homework, Machine_Learning, Data_Science, Ass... | 0 | 2 |
| 7 | PyBot | 902 | 1 | 451.0 | RT @superwriterz: We can complete your;\n#Home... | [Homework, Machine_Learning, Data_Science, Ass... | 0 | 2 |
| 8 | Xeron Bot | 2309 | 1 | 1154.5 | RT @superwriterz: We can complete your;\n#Home... | [Homework, Machine_Learning, Data_Science, Ass... | 0 | 2 |
| 9 | //InsertUsefulComment | 45 | 212 | 0.211268 | RT @SmitterHane: Just use it\n#DataScience #Co... | [DataScience, CodeNewbie, code, 100DaysOfCode,... | 0 | 24 |
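
Several rows in the output above are retweets of the same post, recognisable by the `RT @` prefix in `text`. One possible next step, sketched here on a small hand-made frame rather than on the real data, is to flag retweets and count hashtag occurrences with `explode`. The frame `df_demo` and its values are invented for illustration; only the column names match the DataFrame built above.

```python
import pandas as pd

# Hand-made stand-in for the real DataFrame (values are invented).
df_demo = pd.DataFrame({
    "name": ["A", "B", "C"],
    "text": [
        "Original post #x #y",
        "RT @a: Original post #x",
        "RT @a: Original post #x",
    ],
    "hashtags": [["x", "y"], ["x"], ["x"]],
    "retweets": [5, 5, 5],
})

# Flag rows whose text starts with the retweet prefix.
df_demo["is_retweet"] = df_demo["text"].str.startswith("RT @")

# Keep only original tweets.
originals = df_demo[~df_demo["is_retweet"]]

# One row per (tweet, hashtag) pair, then count each hashtag.
hashtag_counts = df_demo.explode("hashtags")["hashtags"].value_counts()

print(originals["name"].tolist())
print(hashtag_counts.to_dict())
```

The same three statements should apply to `df` unchanged, since they rely only on the `text` and `hashtags` columns.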
