# Task
You have to write a Python script that fetches all the tweets (as many as the Twitter
API allows) posted by the midas@IIITD Twitter handle and dumps the responses into a JSON Lines file.
The other part of your script should parse these JSON Lines files and display the
following for every tweet in a tabular format.

● The text of the tweet.

● Date and time of the tweet.

● The number of favorites/likes.

● The number of retweets.

● The number of images present in the tweet. If there are no images, return None.

### Importing the required libraries

In [1]:
import json
import tweepy                 #python library for twitter
from tweepy import Cursor    
from tqdm import tqdm        #noting the progress of for loop
import jsonlines             #read/write the files
import pandas as pd           #to create the table

### API keys and access tokens

In [2]:
ACCESS_TOKEN = 'xxx'
ACCESS_SECRET = 'xxx'
CONSUMER_KEY = 'xxx'
CONSUMER_SECRET = 'xxx'

### Authenticating with Twitter credentials

In [3]:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

### Creating the API object to connect to Twitter with my credentials

In [4]:
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, compression=True)

### Testing the syntax
    -Cursor is used to iterate through timelines, user lists, pages, etc.
    -items is set to 1 because a single tweet is enough to check that the call works

In [5]:
for status in tweepy.Cursor(api.home_timeline, tweet_mode="extended",entities = "extended").items(1):
    x = status._json
    print(x)

{'created_at': 'Tue Apr 09 06:08:58 +0000 2019', 'id': 1115496805561540608, 'id_str': '1115496805561540608', 'full_text': '#Estimator #ComputerVision https://t.co/lsxLY6uyMw', 'truncated': False, 'display_text_range': [0, 50], 'entities': {'hashtags': [{'text': 'Estimator', 'indices': [0, 10]}, {'text': 'ComputerVision', 'indices': [11, 26]}], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/lsxLY6uyMw', 'expanded_url': 'https://deepai.org/publication/defogging-kinect-simultaneous-estimation-of-object-region-and-depth-in-foggy-scenes', 'display_url': 'deepai.org/publication/de…', 'indices': [27, 50]}]}, 'source': '<a href="https://deepai.org/" rel="nofollow">ArXiv Daily</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 945496245530869760, 'id_str': '945496245530869760', 'name': 'arXiv Daily', 'screen_name': 'arXiv_Daily', 'location': '1.5

### Finding the location of required information

In [6]:
print(x.keys())
print("------")
print(x["user"].keys())

dict_keys(['created_at', 'id', 'id_str', 'full_text', 'truncated', 'display_text_range', 'entities', 'source', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place', 'contributors', 'is_quote_status', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'possibly_sensitive', 'possibly_sensitive_appealable', 'lang'])
------
dict_keys(['id', 'id_str', 'name', 'screen_name', 'location', 'description', 'url', 'entities', 'protected', 'followers_count', 'friends_count', 'listed_count', 'created_at', 'favourites_count', 'utc_offset', 'time_zone', 'geo_enabled', 'verified', 'statuses_count', 'lang', 'contributors_enabled', 'is_translator', 'is_translation_enabled', 'profile_background_color', 'profile_background_image_url', 'profile_background_image_url_https', 'profile_background_tile', 'profile_image_url', 'profile_image_url_https', 'profile_link_color', 'profile_sideba

### Notes:  
    -tweet['full_text'] gives text of the tweet
    -tweet['retweet_count'] gives number of retweets
    -tweet['favorite_count'] gives number of favorites
    -tweet['created_at'] gives date and time
    -tweet['extended_entities']['media'] is a list of the media attached to the tweet, if any
    -counting the items in that list whose 'type' is 'photo' gives the number of images
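
The last two notes can be sketched as a small helper. This is a minimal illustration, not the notebook's own code: the name `count_photos` and the sample dictionaries are hypothetical and hand-written, and only the keys mentioned in the notes are assumed to exist.

```python
def count_photos(tweet):
    """Count the photos attached to a tweet dict.

    Returns None when the tweet has no 'extended_entities' key at all,
    and 0 when media is attached but none of it is a photo.
    """
    if "extended_entities" not in tweet:
        return None
    media = tweet["extended_entities"].get("media", [])
    # 'media' is a list, so the type has to be checked item by item
    return sum(1 for item in media if item.get("type") == "photo")


# Hypothetical, hand-written tweet dicts (not real API responses)
with_photos = {
    "extended_entities": {
        "media": [{"type": "photo"}, {"type": "photo"}, {"type": "video"}]
    }
}
text_only = {"full_text": "no media here"}

print(count_photos(with_photos))
print(count_photos(text_only))
```

Because `media` is a list, indexing it with `['type']` directly would raise a `TypeError`; the check has to run on each item.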

# Part 1: Fetching all the tweets (as many as allowed by the Twitter API) posted by the midas@IIITD Twitter handle and dumping the responses into a JSON Lines file

### Getting information about the user:

In [8]:
item = api.get_user(screen_name='midasIIITD')
print("name: " + item.name)
print("screen_name: " + item.screen_name)
print("description: " + item.description)
print("statuses_count: " + str(item.statuses_count))
print("friends_count: " + str(item.friends_count))
print("followers_count: " + str(item.followers_count))

name: MIDAS IIITD
screen_name: midasIIITD
description: MIDAS is a group of researchers at IIIT-Delhi who study, analyze, and build different multimedia systems for society leveraging multimodal information.
statuses_count: 341
friends_count: 43
followers_count: 284


### Making a dictionary of tweets

     - used Cursor to page through the tweets on the user timeline
     - tweet_mode was set to "extended" to get the full text of each tweet and not just the first 140 characters
     - entities was set to "extended" to get the ids of all media files present and not just the first
     - used a dictionary keyed by tweet number: each value is already a JSON-serializable dict that can be written as one line of the .jsonl file, and lookup by key is fast

In [9]:
tweets = {}
# text = []
tweet_count = 0
#iterating on the timeline of user "midasIIITD" and adding all the tweets to the dictionary
for status in tqdm(Cursor(api.user_timeline, screen_name='midasIIITD',tweet_mode="extended",entities = "extended").items()):
    tweet_count += 1
    tweets[tweet_count] = status._json
#     text.append(status.full_text)

#     if tweet_count==100:
#         break

341it [00:14, 23.56it/s]


### Checking the dictionary formed

In [10]:
tweets[1].keys()
for key,matter in tweets[1].items():
    print(key,matter)

created_at Tue Apr 09 05:04:27 +0000 2019
id 1115480571323371520
id_str 1115480571323371520
full_text RT @Harvard: Professor Jelani Nelson founded AddisCoder, a program that teaches students in Ethiopia how to code https://t.co/0sM06p4qxw
truncated False
display_text_range [0, 136]
entities {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'Harvard', 'name': 'Harvard University', 'id': 39585367, 'id_str': '39585367', 'indices': [3, 11]}], 'urls': [{'url': 'https://t.co/0sM06p4qxw', 'expanded_url': 'https://hrvd.me/nelson4101t', 'display_url': 'hrvd.me/nelson4101t', 'indices': [113, 136]}]}
source <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>
in_reply_to_status_id None
in_reply_to_status_id_str None
in_reply_to_user_id None
in_reply_to_user_id_str None
in_reply_to_screen_name None
user {'id': 1021355762575073281, 'id_str': '1021355762575073281', 'name': 'MIDAS IIITD', 'screen_name': 'midasIIITD', 'location': 'New Delhi, India', 'description': 'MIDAS 

### Here tweets is a dictionary of dictionaries
### Saving the tweets dictionary in .jsonl format
    -each tweet gets saved in a line

In [11]:
with jsonlines.open('output1.jsonl', mode='w') as writer:
    for k in tweets.keys():
        writer.write(tweets[k])    #one tweet per line
#the with block closes the writer on exit, so no explicit close() is needed
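
To show what "each tweet gets saved in a line" means on disk, here is a minimal sketch of the same round trip using only the standard library. It is not a replacement for the `jsonlines` cells; the helper names `write_jsonl` and `read_jsonl`, the temporary file, and the sample records are all hypothetical.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write each record as one JSON document per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # json.dumps escapes embedded newlines, so a record never spans lines
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records standing in for tweet dicts
sample = [{"id": 1, "full_text": "first"}, {"id": 2, "full_text": "second\nline"}]
path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")

write_jsonl(path, sample)
round_tripped = read_jsonl(path)

with open(path, encoding="utf-8") as f:
    line_count = sum(1 for _ in f)

print(round_tripped == sample)
print(line_count)
```

The second sample record contains a newline in its text to show that the file still holds exactly one line per record.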

# Part 2: Parsing the JSON Lines file to display the required fields for every tweet in a tabular format

### Reading the .jsonl file using jsonlines library

In [12]:
tweet = []
with jsonlines.open('output1.jsonl') as reader:
    for obj in reader:
        tweet.append(obj)    #each line is one tweet dictionary
#the with block closes the reader on exit, so no explicit close() is needed

In [13]:
len(tweet)

341

### Creating a pandas table

In [14]:
table = pd.DataFrame(columns = ['Text','Date', 'Time','Favorites Count','Retweet Count','Number of Images'])

### Checking the format of date-time column

In [15]:
tweet[1]["created_at"]

'Tue Apr 09 05:04:11 +0000 2019'
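
As an alternative to splitting this string by hand, it can be parsed with `datetime.strptime`. The sketch below is an illustration only: the constant `TWITTER_TIME_FORMAT` and the function `parse_created_at` are names introduced here, and the `%a`/`%b` directives assume an English locale.

```python
from datetime import datetime

# Format of the 'created_at' field as printed above, e.g.
# 'Tue Apr 09 05:04:11 +0000 2019'
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(created_at):
    """Turn a 'created_at' string into a timezone-aware datetime."""
    return datetime.strptime(created_at, TWITTER_TIME_FORMAT)


stamp = parse_created_at("Tue Apr 09 05:04:11 +0000 2019")
print(stamp.date().isoformat())
print(stamp.time().isoformat())
```

A parsed `datetime` can be sorted and compared directly, which the split strings cannot.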

### For loop to convert the list of tweets into a pandas table

In [16]:
for ix in range(len(tweet)):
    i = tweet[ix]
    no_of_images = None                               #initialising with None
    if "extended_entities" in i:                      #calculating the number of images in the tweet
        no_of_images = 0
        for x in i['extended_entities']['media']:
            if x['type']=='photo':                    #increment the variable if media type is a photo
                no_of_images+=1

    date_time = i['created_at'].split()                      #splitting the date-time column
    date = date_time[1]+" "+date_time[2]+" "+ date_time[-1]  #date column: month, day, year
    time = date_time[3]+date_time[4]                         #time column: clock time followed by the UTC offset

    table.loc[ix] = [i["full_text"],date,time,i['favorite_count'],i['retweet_count'],no_of_images]   #adding the rows
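
The same rows can also be collected in a plain list first, which avoids growing the table one row at a time. The sketch below is one possible arrangement under stated assumptions, not the notebook's method: `tweet_to_row` and `sample_tweets` are hypothetical names, and the sample tweet is hand-written with only the keys the loop reads.

```python
def tweet_to_row(tweet):
    """Flatten one tweet dict into the six columns used by the table."""
    _, month, day, clock, offset, year = tweet["created_at"].split()
    if "extended_entities" in tweet:
        images = sum(
            1 for m in tweet["extended_entities"]["media"] if m["type"] == "photo"
        )
    else:
        images = None
    return {
        "Text": tweet["full_text"],
        "Date": f"{month} {day} {year}",
        "Time": clock + offset,
        "Favorites Count": tweet["favorite_count"],
        "Retweet Count": tweet["retweet_count"],
        "Number of Images": images,
    }


# Hypothetical, hand-written tweet dict (not a real API response)
sample_tweets = [
    {
        "created_at": "Tue Apr 09 05:04:11 +0000 2019",
        "full_text": "example text",
        "favorite_count": 0,
        "retweet_count": 11,
    }
]

rows = [tweet_to_row(t) for t in sample_tweets]
print(rows[0])
```

Passing the finished list to `pd.DataFrame(rows)` builds the table in a single call, with the dictionary keys as column names.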

### Printing the table

In [17]:
table

| | Text | Date | Time | Favorites Count | Retweet Count | Number of Images |
|---|---|---|---|---|---|---|
| 0 | RT @Harvard: Professor Jelani Nelson founded A... | Apr 09 2019 | 05:04:27+0000 | 0 | 33 | |
| 1 | RT @emnlp2019: For anyone interested in submit... | Apr 09 2019 | 05:04:11+0000 | 0 | 11 | |
| 2 | RT @multimediaeval: Announcing the 2019 MediaE... | Apr 08 2019 | 19:38:09+0000 | 0 | 15 | |
| 3 | Many Congratulations to @midasIIITD student, S... | Apr 08 2019 | 07:08:12+0000 | 13 | 2 | 1 |
| 4 | @midasIIITD thanks all students who have appea... | Apr 08 2019 | 03:27:42+0000 | 5 | 0 | 1 |
| 5 | @himanchalchandr Meanwhile, complete CV/NLP ta... | Apr 07 2019 | 14:17:29+0000 | 0 | 0 | |
| 6 | @sayangdipto123 Submit as per the guideline ag... | Apr 07 2019 | 14:17:09+0000 | 0 | 0 | |
| 7 | We request all students whose interview are sc... | Apr 07 2019 | 11:43:24+0000 | 1 | 1 | |
| 8 | Other queries: "none of the Tweeter Apis give ... | Apr 07 2019 | 06:55:19+0000 | 5 | 2 | |
| 9 | Other queries: "do we have to make two differe... | Apr 07 2019 | 06:53:38+0000 | 4 | 1 | |
