# Problem statement
Write a Python script that fetches all the tweets (as many as the Twitter API allows) posted by the midas@IIITD Twitter handle and dumps the responses into a JSON Lines file. The second part of the script should parse that JSON Lines file and display the following for every tweet in a tabular format:

The text of the tweet.
Date and time of the tweet.
The number of favorites/likes.
The number of retweets.
The number of images present in the tweet. If there is no image, return None.

Importing libraries

In [4]:
import tweepy
from tweepy import OAuthHandler
import json
import os
import pandas as pd

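Creating the app on the Twitter developer portal and getting the authentication keys

tweepy's `OAuthHandler` uses OAuth 1.0a user authentication (a consumer key and secret plus an access token and secret), not OAuth 2.
The tokens obtained from the developer portal were copied below.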

In [5]:
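# The real keys are redacted; replace these placeholders with your own values.
consumer_token = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"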
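
Hard-coded keys are easy to leak when a notebook is shared. As a minimal sketch of one alternative, the keys could be read from environment variables with the `os` module imported earlier. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_credentials` are hypothetical choices for illustration, not something tweepy or the Twitter API prescribes.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

The returned values could then be assigned to `consumer_token`, `consumer_secret`, `access_token` and `access_token_secret`.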

With these credentials, the OAuth handler authenticates our app against the Twitter API, after which we can start requesting Twitter data.

The `api` object created below is used for all data access.

In [186]:
auth = OAuthHandler(consumer_token, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)


Let's create a function that takes `screen_name` (the user name of the account to fetch, in our case `midasIIITD`) as a parameter and returns all the tweets the API makes available.

In [187]:
def FetchTweets(screen_name):
    total_tweets=[]
    #Twitter returns at most 200 tweets per call, so start with the newest 200.
    #Note: this first call does not pass tweet_mode='extended', so these tweets
    #carry a (possibly truncated) "text" field rather than "full_text".
    tweets=api.user_timeline(screen_name=screen_name,count=200)
    total_tweets.extend(tweets)
    #To get older tweets, refer to the pagination part of https://developer.twitter.com/en/docs/tweets/timelines/guides/working-with-timelines
    #The timeline is sorted reverse-chronologically and max_id is inclusive,
    #so we subtract 1 from the oldest id to avoid fetching that tweet twice.
    while(len(tweets)>0):
        #computed inside the loop so an empty first page cannot raise IndexError
        maxId=total_tweets[-1].id -1
        tweets=api.user_timeline(screen_name=screen_name,count=200,max_id=maxId,tweet_mode='extended')
        total_tweets.extend(tweets)

    return total_tweets
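
The `max_id` pagination used in `FetchTweets` can be tried out without network access. The following is a minimal sketch, assuming a hypothetical `fetch_page(count, max_id)` callable that stands in for `api.user_timeline`, and a fake timeline of 450 items made up for illustration. It is not tweepy's API, only the same paging logic in isolation.

```python
def fetch_all(fetch_page, page_size=200):
    """Collect every item by walking backwards through a timeline with max_id.

    fetch_page(count, max_id) is a hypothetical callable returning a list of
    dicts with an "id" key, newest first, restricted to ids <= max_id
    (or the newest items when max_id is None).
    """
    collected = []
    max_id = None
    while True:
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:
            break
        collected.extend(page)
        # max_id is inclusive, so step just below the oldest id seen so far.
        max_id = page[-1]["id"] - 1
    return collected


# Stand-in for the API: 450 fake tweets with ids 450..1, newest first.
FAKE_TIMELINE = [{"id": i} for i in range(450, 0, -1)]


def fake_fetch_page(count, max_id=None):
    eligible = [t for t in FAKE_TIMELINE if max_id is None or t["id"] <= max_id]
    return eligible[:count]


all_items = fetch_all(fake_fetch_page)
print(len(all_items))  # 450
```

Stopping on the first empty page mirrors the `while(len(tweets)>0)` condition, and subtracting 1 from the oldest id is what prevents the boundary tweet from appearing twice.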
        

Using the function to fetch tweets

In [188]:
MIDAS_tweets = FetchTweets("midasIIITD")

Now we have to store them in a JSON file. The path where the file will be stored is given below.

In [190]:
file_path= r'../Python_problem/tweets.json'

Let's create a function to store the tweets in a JSON file.

In [191]:
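def intoJsonFile(Tweets):
    #each tweepy Status object keeps the raw API response in its _json attribute
    json_tweets=[]
    for tweet in Tweets:
        json_tweets.append(tweet._json)
    #Note: this writes all tweets as one JSON array to the global file_path,
    #not as one JSON object per line
    with open(file_path, 'w', encoding='utf8') as f :
        json.dump(json_tweets, f, sort_keys = True,indent = 2)
    return json_tweets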
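
The problem statement asks for a JSON Lines file, whereas `json.dump` on a list writes a single JSON array. As a minimal sketch of the JSON Lines variant, each record can be written as one JSON object per line and read back line by line. The helper names `write_jsonl` and `read_jsonl` are hypothetical and not part of the original script.

```python
import json


def write_jsonl(records, path):
    """Write one JSON object per line (the JSON Lines convention)."""
    with open(path, "w", encoding="utf8") as f:
        for record in records:
            # json.dumps escapes newlines inside strings, so each record
            # stays on a single line.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

With the notebook's variables, this could be called as `write_jsonl(Final_Tweets, path)` for a path of your choice, since `Final_Tweets` is already a list of plain dicts.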
        
    

Let's use the above function to store the tweets in the JSON file.

In [192]:
Final_Tweets=intoJsonFile(MIDAS_tweets)

In [252]:
print(Final_Tweets[0]["text"])
print(Final_Tweets[0]["created_at"])
print(Final_Tweets[199]["retweeted_status"]["favorite_count"])
print(Final_Tweets[0]["retweet_count"])
#print(Final_Tweets[]['media'])



RT @IIITDelhi: We are delighted to share that IIIT-Delhi is ranked 55 by NIRF this year. We have moved up by 11 positions compared to the p…
Tue Apr 09 16:45:07 +0000 2019
20
9


In [255]:
print((Final_Tweets[0]["created_at"]))

Tue Apr 09 16:45:07 +0000 2019
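
The `created_at` value is a plain string. As a minimal sketch, it can be converted into a timezone-aware `datetime` with the standard library. The format string below is inferred from the output shown above, and the helper name `parse_created_at` is hypothetical. The `%a` and `%b` directives depend on the locale, so this assumes English day and month names.

```python
from datetime import datetime

# Format inferred from strings such as "Tue Apr 09 16:45:07 +0000 2019".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


parsed = parse_created_at("Tue Apr 09 16:45:07 +0000 2019")
print(parsed.isoformat())  # 2019-04-09T16:45:07+00:00
```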


We are done with the first part. Let's parse the JSON file using the following function.

In [233]:
def jsonParser(file_path):
    with open(file_path, encoding='utf8') as json_file:
        tweets_by_MIDAS = json.load(json_file)
    data=[]
    #iterate over the tweets loaded from the file, not the in-memory Final_Tweets list
    for i in tweets_by_MIDAS:
        dictionary={}
        #tweets fetched with tweet_mode='extended' carry "full_text" instead of "text"
        if "text" in i:
            dictionary["Text"]=i["text"]
        else:
            dictionary["Text"]=i["full_text"]
        #for a retweet, report the likes of the original tweet
        if "retweeted_status" in i:
            dictionary["Likes count"]=i["retweeted_status"]["favorite_count"]
        else:
            dictionary["Likes count"]=i["favorite_count"]
        dictionary["Tweeted_on"] =i["created_at"]

        dictionary["Retweet count"]=i["retweet_count"]
        if 'media' in i['entities']:

            image_count = 0

            #get the media information about the tweet
            tweet_media = i['extended_entities']['media']

            #go through all the media and count the items whose type is photo
            for media_item in tweet_media:
                if media_item['type'] == 'photo':
                    image_count += 1
            dictionary['Image count'] = image_count

        #No media: the image count is set to 0
        #(the problem statement asks for None; 0 is what the table below shows)
        else:
            dictionary['Image count'] = 0

        data.append(dictionary)
    return data
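
The image-counting step can also be isolated into a small function that is easy to check by hand. This is a minimal sketch, assuming the same `extended_entities` / `media` / `type` layout that the parser relies on. The helper name `count_photos` and the sample tweets are hypothetical, made up for illustration rather than taken from the fetched data.

```python
def count_photos(tweet):
    """Count media items of type 'photo' in a tweet dict (0 if there are none)."""
    media = tweet.get("extended_entities", {}).get("media", [])
    return sum(1 for item in media if item.get("type") == "photo")


# Hypothetical sample tweets, trimmed to the fields the function reads.
tweet_with_photos = {
    "extended_entities": {
        "media": [{"type": "photo"}, {"type": "photo"}, {"type": "video"}]
    }
}
tweet_without_media = {"entities": {"hashtags": []}}

print(count_photos(tweet_with_photos))    # 2
print(count_photos(tweet_without_media))  # 0
```

Using `.get` with defaults avoids a `KeyError` for tweets that have no media at all. Returning `None` instead of 0 for such tweets, as the problem statement words it, would be a one-line change.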
            

Using the above function to build the rows of the table

In [244]:
dataset= jsonParser(file_path)

Storing the rows in a DataFrame

In [260]:
import pandas as pd
df= pd.DataFrame(dataset)

df=df[["Text","Tweeted_on","Likes count","Retweet count","Image count"]]
df

,Text,Tweeted_on,Likes count,Retweet count,Image count
0,RT @IIITDelhi: We are delighted to share that ...,Tue Apr 09 16:45:07 +0000 2019,38,9,0
1,RT @Harvard: Professor Jelani Nelson founded A...,Tue Apr 09 05:04:27 +0000 2019,94,35,0
2,RT @emnlp2019: For anyone interested in submit...,Tue Apr 09 05:04:11 +0000 2019,33,12,0
3,RT @multimediaeval: Announcing the 2019 MediaE...,Mon Apr 08 19:38:09 +0000 2019,20,15,0
4,"Many Congratulations to @midasIIITD student, S...",Mon Apr 08 07:08:12 +0000 2019,15,2,0
5,@midasIIITD thanks all students who have appea...,Mon Apr 08 03:27:42 +0000 2019,5,0,0
6,"@himanchalchandr Meanwhile, complete CV/NLP ta...",Sun Apr 07 14:17:29 +0000 2019,0,0,0
7,@sayangdipto123 Submit as per the guideline ag...,Sun Apr 07 14:17:09 +0000 2019,0,0,0
8,We request all students whose interview are sc...,Sun Apr 07 11:43:24 +0000 2019,1,1,0
9,"Other queries: ""none of the Tweeter Apis give ...",Sun Apr 07 06:55:19 +0000 2019,5,2,0
