# Python Problem
Write a Python script that fetches all the tweets (as many as the Twitter API allows) posted by the @midasIIITD Twitter handle and dumps the responses into a JSON Lines file. Then parse this JSON Lines file and display the following for every tweet in tabular format:

* The text of the tweet.
* Date and time of the tweet.
* The number of favorites/likes.
* The number of retweets.
* The number of images present in the tweet. If the tweet has no images, return None.


---



## Installing the required Libraries

In [1]:
#!pip install jsonlines



## Importing Libraries & Setting up API Credentials

In [0]:
import jsonlines
import tweepy
import pandas as pd

credentials = {'consumer_key':'',
               'consumer_secret':'',
               'access_token':'',
               'access_token_secret':''}

auth = tweepy.OAuthHandler(credentials['consumer_key'], credentials['consumer_secret'])
auth.set_access_token(credentials['access_token'], credentials['access_token_secret'])
api = tweepy.API(auth)
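
The `credentials` dictionary above is left blank on purpose, so the keys have to come from somewhere. One option is to read them from environment variables instead of pasting them into the notebook. The following is a minimal sketch under that assumption; the variable names in `ENV_NAMES` and the helper `load_credentials` are hypothetical and not part of the original notebook.

```python
import os

# Hypothetical environment variable names; use whatever naming your setup prefers.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Build the credentials dict from a mapping, failing early if a value is missing."""
    missing = [name for name in ENV_NAMES.values() if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: environ[name] for key, name in ENV_NAMES.items()}
```

The returned dictionary has the same four keys as `credentials`, so it could be passed to the `tweepy.OAuthHandler` setup unchanged.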

## Scraping Tweets from @midasIIITD and Writing Them to a JSONlines File

In [0]:
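# Fetch the most recent tweets in a single request (count=100 caps the page size)
tweets = api.user_timeline(screen_name='midasiiitd', count=100)
tweets = [tweet._json for tweet in tweets]   # keep the raw JSON payload of each tweet
with open('output.jsonl', 'w') as fp:
    writer = jsonlines.Writer(fp)
    writer.write_all(tweets)   # one JSON object per line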
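
The cell above issues a single request, while the problem statement asks for as many tweets as the API allows. The v1.1 user timeline endpoint returns at most 200 tweets per request and reaches back about 3,200 tweets, so collecting more requires paging backwards with `max_id`. The sketch below shows that loop in isolation. `fetch_all` and `fetch_page` are hypothetical names, and the tweepy wiring is shown only as a comment because it needs live credentials.

```python
def fetch_all(fetch_page, limit=3200):
    """Collect tweets page by page, walking backwards through a timeline.

    fetch_page(max_id) must return a list of tweet dicts no newer than max_id,
    or the newest tweets when max_id is None.
    """
    collected = []
    max_id = None
    while len(collected) < limit:
        page = fetch_page(max_id)
        if not page:          # an empty page means the timeline is exhausted
            break
        collected.extend(page)
        # Ask the next request for tweets strictly older than the oldest one seen.
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected[:limit]


# Hypothetical wiring to the `api` object (an assumption, not run here):
# def fetch_page(max_id):
#     kwargs = {"screen_name": "midasiiitd", "count": 200}
#     if max_id is not None:
#         kwargs["max_id"] = max_id
#     return [status._json for status in api.user_timeline(**kwargs)]
#
# tweets = fetch_all(fetch_page)
```

tweepy also ships a `tweepy.Cursor` helper that handles this paging internally; the explicit loop is shown here only to make the mechanism visible.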

## Parsing Tweets using the Summarize Function

`summarize` takes the complete tweet JSON as input and returns a reduced dictionary containing only the fields needed for the table.

In [0]:
def summarize(tweet):
    new_tweet = {}

    # Copy the fields that are shown as-is in the table
    for label in ["text", "created_at", "favorite_count", "retweet_count"]:
        new_tweet[label] = tweet[label]

    new_tweet["media_count"] = 0    # initialize counter

    try:    # 'extended_entities' is absent when the tweet has no media
        for m in tweet['extended_entities']['media']:
            if m["type"] == 'photo':    # compare by value, not identity
                new_tweet["media_count"] += 1   # increment counter
    except KeyError:
        new_tweet["media_count"] = None

    if new_tweet["media_count"] == 0:   # media present, but none of it is a photo
        new_tweet["media_count"] = None

    return new_tweet
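
The image-counting rule can be checked in isolation on hand-written input. The sketch below uses a hypothetical helper, `count_photos`, and two made-up tweet dictionaries that only mimic the `extended_entities` layout; neither comes from the real timeline.

```python
def count_photos(tweet):
    """Return the number of photos attached to a tweet dict, or None if there are none."""
    media = tweet.get("extended_entities", {}).get("media", [])
    photos = sum(1 for item in media if item.get("type") == "photo")
    return photos or None   # map a count of 0 to None, as the task requires


# Made-up tweets for illustration only
with_photos = {
    "text": "two pictures and a clip",
    "extended_entities": {
        "media": [{"type": "photo"}, {"type": "photo"}, {"type": "video"}]
    },
}
text_only = {"text": "no media here"}

print(count_photos(with_photos))  # 2
print(count_photos(text_only))    # None
```

Using `.get` with defaults avoids the exception path entirely, at the cost of silently treating malformed entries as "not a photo".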

## Read JSONlines File and get Parsed JSON to form the Table

The JSON Lines file is parsed line by line using an iterator. Each parsed tweet is summarized, and its fields are appended to the dictionary `d`, which holds one list per column. The dictionary `d` is then used to build the DataFrame, which is displayed as a table.

In [0]:
d = {"text": [], "created_at": [], "favorite_count": [], "retweet_count": [], "media_count": []}

with open('output.jsonl', 'r') as fp:   # read the JSONlines file
    with jsonlines.Reader(fp) as reader:
        for tweet in reader:    # iterate line by line; the loop ends with the file
            summary = summarize(tweet)
            for column in summary:
                d[column].append(summary[column])
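
For readers unfamiliar with the format, a JSON Lines file is simply one JSON object per line. The sketch below reproduces the write-then-read round trip with only the standard library, using an in-memory buffer in place of `output.jsonl` and two made-up records. It illustrates the file format and the column-building step, and does not replace the `jsonlines` package used above.

```python
import io
import json

# Made-up records for illustration only
records = [
    {"text": "first", "retweet_count": 1},
    {"text": "second", "retweet_count": 0},
]

buffer = io.StringIO()   # stands in for output.jsonl
for record in records:
    buffer.write(json.dumps(record) + "\n")   # one JSON object per line

buffer.seek(0)
columns = {"text": [], "retweet_count": []}
for line in buffer:
    row = json.loads(line)          # parse a single line independently
    for name in columns:
        columns[name].append(row[name])

print(columns)  # {'text': ['first', 'second'], 'retweet_count': [1, 0]}
```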

In [13]:
df = pd.DataFrame(d)
df.sample(35)    #Print Table with sample of 35 tweets

| | created_at | favorite_count | media_count | retweet_count | text |
|---|---|---|---|---|---|
| 65 | Thu Feb 07 15:49:08 +0000 2019 | 0 | None | 1 | RT @NilayShri: @midasIIITD @the_dhumketu this ... |
| 22 | Mon Mar 11 06:22:12 +0000 2019 | 1 | None | 1 | @ACMMM19 @ACM_MM2018 @acmmm17 @sigmm @ACM Less... |
| 96 | Mon Jan 28 12:11:29 +0000 2019 | 0 | None | 8 | RT @kdnuggets: Top 16 #OpenSource #DeepLearnin... |
| 29 | Sun Mar 03 17:09:48 +0000 2019 | 1 | None | 0 | @NilayShri @NilayShri, Certain thing! The next... |
| 81 | Thu Jan 31 06:57:04 +0000 2019 | 1 | None | 1 | CFP for @ACMMM19 has been posted on WikiCFP. K... |
| 28 | Sun Mar 03 19:36:04 +0000 2019 | 0 | None | 35 | RT @stanfordnlp: Useful feature of our Python ... |
| 57 | Sat Feb 16 16:57:51 +0000 2019 | 0 | None | 1 | RT @debanjanbhucs: https://t.co/qNFzJ7ZHki |
| 43 | Thu Feb 21 06:39:27 +0000 2019 | 2 | None | 1 | @IIITDelhi has initiated PhD Admission 2019 pr... |
| 47 | Tue Feb 19 17:16:24 +0000 2019 | 0 | None | 15 | RT @ACMMM19: We are pleased to announce the mu... |
| 89 | Tue Jan 29 19:31:35 +0000 2019 | 2 | None | 1 | If you are attending @RealAAAI 2019 then visit... |
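
The `created_at` values in the table are plain strings in Twitter's timestamp layout. If real date and time objects are wanted, for sorting or filtering, they can be parsed with the standard library. This is a minimal sketch; `TWITTER_TIME_FORMAT` and `parse_created_at` are hypothetical names, and the format string is inferred from the values shown in the table.

```python
from datetime import datetime

# Layout inferred from values such as "Thu Feb 07 15:49:08 +0000 2019"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Convert a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


stamp = parse_created_at("Thu Feb 07 15:49:08 +0000 2019")
print(stamp.isoformat())  # 2019-02-07T15:49:08+00:00
```

The same format string could plausibly be passed to `pd.to_datetime(df["created_at"], format=TWITTER_TIME_FORMAT)` to convert the whole column at once.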
