# Python Problem
Write a Python script to fetch all the tweets (as many as the Twitter API allows) posted by the @midasIIITD Twitter handle and dump the responses into a JSON Lines file. Then parse the JSON Lines file to display the following for every tweet in tabular format:

* The text of the tweet.
* Date and time of the tweet.
* The number of favorites/likes.
* The number of retweets.
* The number of images present in the tweet (None if there are no images).


---



## Installing the Required Libraries

```python
# !pip install jsonlines   # uncomment to install in a fresh notebook environment
```



## Importing Libraries and Setting Up API Credentials

```python
import jsonlines
import tweepy
import pandas as pd

# Fill in the keys and tokens from your Twitter developer app
credentials = {'consumer_key': '',
               'consumer_secret': '',
               'access_token': '',
               'access_token_secret': ''}

# OAuth 1.0a user-context authentication
auth = tweepy.OAuthHandler(credentials['consumer_key'], credentials['consumer_secret'])
auth.set_access_token(credentials['access_token'], credentials['access_token_secret'])
api = tweepy.API(auth)
```
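Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. A minimal alternative sketch, assuming the four values are exported as environment variables; the variable names in `ENV_NAMES` and the helper `load_credentials` are hypothetical and not part of tweepy:

```python
import os
from typing import Dict

# Hypothetical environment variable names; adjust to whatever your setup uses
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}

def load_credentials() -> Dict[str, str]:
    """Build the credentials dict from environment variables, failing loudly if any is missing."""
    missing = [env for env in ENV_NAMES.values() if not os.environ.get(env)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: os.environ[env] for key, env in ENV_NAMES.items()}
```

The returned dict has the same keys as `credentials`, so it can be passed to `tweepy.OAuthHandler` and `set_access_token` exactly as in the cell above.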

## Scraping Tweets from @midasIIITD and Writing Them to a JSON Lines File

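```python
# Fetch the most recent tweets; a single request returns at most `count` tweets
tweets = api.user_timeline(screen_name='midasiiitd', count=100)
tweets = [status._json for status in tweets]   # raw JSON dict behind each tweepy Status object

# Write one tweet per line; the context manager closes the file when done
with jsonlines.open('output.jsonl', mode='w') as writer:
    writer.write_all(tweets)
```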
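A single `user_timeline` request returns only one page of tweets (the v1.1 endpoint caps `count` at 200, and it can return only up to 3,200 of a user's most recent tweets in total). Collecting as many as the API allows means repeating the request with `max_id` set just below the oldest id seen so far. The sketch below shows that loop with a hypothetical `fetch_page` callable and a fake in-memory timeline, so it runs without network access. With tweepy, `fetch_page` might wrap a call such as `api.user_timeline(screen_name='midasiiitd', count=200, max_id=...)`, and `tweepy.Cursor(api.user_timeline, ...)` should perform this kind of pagination for you.

```python
from typing import Any, Callable, Dict, List, Optional

Tweet = Dict[str, Any]

def fetch_all(fetch_page: Callable[[Optional[int]], List[Tweet]],
              limit: int = 3200) -> List[Tweet]:
    """Collect tweets page by page, walking backwards in time with max_id.

    fetch_page(max_id) is a hypothetical stand-in for one timeline request: it
    returns the newest tweets with id <= max_id (or the newest overall when
    max_id is None), newest first.
    """
    collected: List[Tweet] = []
    max_id: Optional[int] = None
    while len(collected) < limit:
        page = fetch_page(max_id)
        if not page:                      # an empty page means the timeline is exhausted
            break
        collected.extend(page)
        # Ask for strictly older tweets next, so the oldest one is not fetched twice
        max_id = min(t["id"] for t in page) - 1
    return collected[:limit]

# Fake timeline of 450 tweets (ids 450 down to 1), served 200 per "request"
FAKE_TIMELINE = [{"id": i, "text": f"tweet {i}"} for i in range(450, 0, -1)]

def fake_page(max_id: Optional[int]) -> List[Tweet]:
    eligible = [t for t in FAKE_TIMELINE if max_id is None or t["id"] <= max_id]
    return eligible[:200]

demo_tweets = fetch_all(fake_page)
print(len(demo_tweets))  # 450
```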

## Parsing Tweets with the `summarize` Function

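`summarize` takes a complete tweet object (the dict stored on one line of the file) and returns a smaller dict containing only the fields needed for the table.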

```python
def summarize(tweet):
    """Reduce a full tweet dict to the fields shown in the table."""
    new_tweet = {label: tweet[label]
                 for label in ["text", "created_at", "favorite_count", "retweet_count"]}

    # Attached images are listed under extended_entities.media;
    # the key is absent when the tweet has no media at all
    media = tweet.get("extended_entities", {}).get("media", [])
    photo_count = sum(1 for m in media if m.get("type") == "photo")

    # Report None instead of 0 when the tweet has no images
    new_tweet["media_count"] = photo_count or None
    return new_tweet
```
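Why `extended_entities` rather than `entities`: in v1.1 tweet objects, `entities.media` holds only the first attached photo, while `extended_entities.media` lists every attached item (photos, a video, or an animated GIF). The trimmed, hand-made dicts below are hypothetical stand-ins for real API output, and `count_photos` is an illustrative helper that mirrors the photo counting done in `summarize`:

```python
from typing import Any, Dict, Optional

def count_photos(tweet: Dict[str, Any]) -> Optional[int]:
    """Number of photos attached to a v1.1 tweet dict, or None if there are none."""
    media = tweet.get("extended_entities", {}).get("media", [])
    n = sum(1 for m in media if m.get("type") == "photo")
    return n or None

# Hypothetical, heavily trimmed tweet dicts (not real API responses)
two_photos = {
    "entities": {"media": [{"type": "photo"}]},                  # only the first photo
    "extended_entities": {"media": [{"type": "photo"}, {"type": "photo"}]},
}
video_only = {"extended_entities": {"media": [{"type": "video"}]}}
text_only = {"entities": {"hashtags": []}}                       # no media keys at all

print(count_photos(two_photos), count_photos(video_only), count_photos(text_only))
# 2 None None
```

Counting from `entities` instead would report 1 for `two_photos`, which is why the count is taken from `extended_entities`.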

## Reading the JSON Lines File and Building the Table

The file is read line by line by iterating over a `jsonlines` reader. Each tweet is passed through `summarize`, and every field of the result is appended to the matching column list in the dictionary `d`. Finally, `d` is used to build a pandas DataFrame, which is displayed as a table.

```python
# One list per table column
d = {"text": [], "created_at": [], "favorite_count": [], "retweet_count": [], "media_count": []}

with jsonlines.open('output.jsonl') as reader:   # read the JSON Lines file
    for tweet in reader:                          # the reader yields one parsed tweet per line
        row = summarize(tweet)
        for column, value in row.items():
            d[column].append(value)
```
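JSON Lines is simply one JSON value per line, so the `jsonlines` package is a convenience rather than a requirement. A minimal standard-library sketch of the same write/read round trip, with hypothetical helper names `write_jsonl` and `read_jsonl` and made-up rows, assuming each line holds one dict:

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable, Iterator

def write_jsonl(path: str, objs: Iterable[Dict[str, Any]]) -> None:
    """Write each object as JSON on its own line."""
    with open(path, "w", encoding="utf-8") as fp:
        for obj in objs:
            # json.dumps escapes newlines inside strings, so each object stays on one line
            fp.write(json.dumps(obj) + "\n")

def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-blank line."""
    with open(path, encoding="utf-8") as fp:
        for line in fp:
            if line.strip():              # tolerate blank lines
                yield json.loads(line)

# Round trip two made-up rows and rebuild per-column lists, as in the cell above
rows = [{"text": "first", "retweet_count": 1}, {"text": "second\nline", "retweet_count": 0}]
path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
write_jsonl(path, rows)

columns: Dict[str, list] = {"text": [], "retweet_count": []}
for row in read_jsonl(path):
    for key, value in row.items():
        columns[key].append(value)
print(columns)
```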

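```python
df = pd.DataFrame(d)
# Cast to object first so missing values are shown as None (not NaN) on recent pandas versions
df = df.astype(object).where(pd.notnull(df), None)
df.sample(35)    # display a random sample of 35 tweets; use `df` to show every tweet
```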

| | created_at | favorite_count | media_count | retweet_count | text |
|---|---|---|---|---|---|
| 72 | Tue Feb 05 11:55:41 +0000 2019 | 1 | | 1 | Thanks, Karan Uppal and @RatnRajiv for all you... |
| 42 | Thu Feb 21 16:27:54 +0000 2019 | 0 | | 13 | RT @kdnuggets: #AI for Social Good study - how... |
| 11 | Sat Mar 16 09:20:29 +0000 2019 | 1 | | 1 | IEEE BigMM 2019 - Call for Workshop Proposals.... |
| 31 | Sun Mar 03 14:55:31 +0000 2019 | 6 | | 2 | Considering several requests to extend the dea... |
| 52 | Mon Feb 18 05:37:59 +0000 2019 | 0 | | 6 | RT @CornellDyson: Digital ag is Cornell’s newe... |
| 25 | Fri Mar 08 13:15:34 +0000 2019 | 8 | | 4 | We are in the process of finalizing the shortl... |
| 54 | Sun Feb 17 09:02:28 +0000 2019 | 3 | | 1 | Looking forward to your participation in Multi... |
| 60 | Wed Feb 13 18:56:13 +0000 2019 | 0 | 1.0 | 5 | RT @kdnuggets: Using BERT for state-of-the-art... |
| 27 | Wed Mar 06 11:12:30 +0000 2019 | 0 | 1.0 | 22 | RT @kdnuggets: Python Data Science for Beginne... |
| 46 | Wed Feb 20 05:38:29 +0000 2019 | 4 | | 2 | Deepak Gupta, has joined @Google today. \nEarl... |
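The `created_at` column holds strings such as `Tue Feb 05 11:55:41 +0000 2019`, which sort alphabetically rather than chronologically. A small standard-library sketch for turning them into timezone-aware `datetime` objects; the format string is inferred from the values in the table, it assumes English weekday and month names (the default C locale), and `parse_created_at` is a hypothetical helper:

```python
from datetime import datetime, timezone

# Inferred from the sample values, e.g. "Tue Feb 05 11:55:41 +0000 2019"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def parse_created_at(value: str) -> datetime:
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)

# Three created_at values copied from the table above
stamps = ["Tue Feb 05 11:55:41 +0000 2019",
          "Sat Mar 16 09:20:29 +0000 2019",
          "Thu Feb 21 16:27:54 +0000 2019"]
ordered = sorted(parse_created_at(s) for s in stamps)
for dt in ordered:
    print(dt.isoformat())
# 2019-02-05T11:55:41+00:00
# 2019-02-21T16:27:54+00:00
# 2019-03-16T09:20:29+00:00
```

For the whole DataFrame, `pd.to_datetime(df['created_at'], format=TWITTER_TIME_FORMAT)` should produce the equivalent datetime column.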
