# Python Problem Solution
Solution to the problem given by Midas - IIITD. <br>

The task is to fetch all the tweets done by midas@IIITD twitter handle and dump the responses into JSONlines file.

Then the file must be parsed to display the tweets in tabular format.

# Importing libraries

Installing the python-twitter and jsonlines libraries

In [1]:
!pip install python-twitter
!pip install jsonlines

Collecting python-twitter
[?25l  Downloading https://files.pythonhosted.org/packages/b3/a9/2eb36853d8ca49a70482e2332aa5082e09b3180391671101b1612e3aeaf1/python_twitter-3.5-py2.py3-none-any.whl (67kB)
[K    100% |████████████████████████████████| 71kB 2.7MB/s 
Installing collected packages: python-twitter
Successfully installed python-twitter-3.5
Collecting jsonlines
  Downloading https://files.pythonhosted.org/packages/4f/9a/ab96291470e305504aa4b7a2e0ec132e930da89eb3ca7a82fbe03167c131/jsonlines-1.2.0-py2.py3-none-any.whl
Installing collected packages: jsonlines
Successfully installed jsonlines-1.2.0


Importing the required libraries for fetching tweets as well as generating jsonlines file as well as other important ones.

In [0]:
#Importing libraries

import twitter
import jsonlines
import pandas as pd

import sys
import io

# Defining keys and constants

Add required keys below

In [0]:
CONSUMER_KEY = 'consumer_key_here'
CONSUMER_SECRET = 'consumer_secret_here'
ACCESS_TOKEN = 'access_token_here'
ACCESS_TOKEN_SECRET = 'access_token_secret_here'

Name of twitter handle from where to fetch tweets and output path of jsonlines file

In [0]:
MIDAS_SCREEN_NAME = 'midasIIITD'

output_path = "output.jsonl"

# Fetching tweets

Creating a API Client to interact with the twitter API

Using the python-twitter library it becomes simpler to interact with the Twitter API through pre - defined functions.

The below block will initialize the client using the various keys.

tweet_mode = 'extended' ensures that the entire tweet text is fetched and not truncated.

In [0]:
api = twitter.Api(consumer_key = CONSUMER_KEY,
                  consumer_secret=CONSUMER_SECRET,
                  access_token_key=ACCESS_TOKEN,
                  access_token_secret=ACCESS_TOKEN_SECRET,
                  tweet_mode='extended')

The function fetch_tweets will fetch all the tweets (as many as possible) from the given screen_name

In [0]:
def fetch_tweets(screen_name):
  tweet_list = api.GetUserTimeline(screen_name=screen_name, count=200) #fetch first 200 tweets
  print(tweet_list)
  earliest_tweet = min(tweet_list, key=lambda x: x.id).id  #find id of earliest tweet fetched
  
  #repeat above process till all tweets have been fetched
  while True:
    tweets = api.GetUserTimeline(screen_name=screen_name, count=200, max_id = earliest_tweet)
    new_earliest_tweet = min(tweets, key=lambda x: x.id).id
    
    #break loop if no more tweets left
    if new_earliest_tweet == earliest_tweet or not tweets :
      break
    else:
      earliest_tweet = new_earliest_tweet
      tweet_list += tweets
      
  return tweet_list  #returns all the tweets fetched

Fetch tweets for required screen name

Here we have found 297 tweets

In [8]:
tweets = fetch_tweets(MIDAS_SCREEN_NAME)
print(len(tweets))


[Status(ID=1108281874164658182, ScreenName=midasIIITD, Created=Wed Mar 20 08:19:24 +0000 2019, Text='@IEEEBigMM19 is also available on Facebook now. \nLIKE its Facebook page https://t.co/B3Q0zmmzXb  to get the regular updates. \nCheck more details at https://t.co/w9ZymoPisk \n\n#IEEE #BigMM19 #Big #Multimedia #Singapore'), Status(ID=1108196492139999233, ScreenName=midasIIITD, Created=Wed Mar 20 02:40:07 +0000 2019, Text='RT @IEEEBigMM19: BigMM 2019 : IEEE BigMM 2019 – Call for Workshop Proposals  \n\nhttps://t.co/I4vqf8FE6K …  \nWhen: Sep 11, 2019 - Sep 13, 201…'), Status(ID=1107468609914208256, ScreenName=midasIIITD, Created=Mon Mar 18 02:27:47 +0000 2019, Text='BigMM 2019 : IEEE BigMM 2019 – Call for Workshop Proposals\n\nhttps://t.co/oUq2G0UgKN\n\nWhen: Sep 11, 2019 - Sep 13, 2019\nWhere: Singapore\nSubmission Deadline: Apr 1, 2019\nNotification Due: Apr 10, 2019\n\n#IEEE #BigMM #Workshop #Proposal #Singapore #Multimedia'), Status(ID=1107285980082569218, ScreenName=midasIIITD, Creat

Showing the contents of one of the tweets in the form of a dict.

In [9]:
print(tweets[10].AsDict())

{'created_at': 'Wed Mar 13 04:06:04 +0000 2019', 'full_text': 'RT @ACMMM19: The paper deadline is approaching. 1 April abstract is due. Authors will have until 8 April to upload the final PDF version of…', 'hashtags': [], 'id': 1105681404845670401, 'id_str': '1105681404845670401', 'lang': 'en', 'retweet_count': 13, 'retweeted_status': {'created_at': 'Tue Mar 12 10:19:41 +0000 2019', 'favorite_count': 14, 'full_text': 'The paper deadline is approaching. 1 April abstract is due. Authors will have until 8 April to upload the final PDF version of their paper submission. For complete submission information, see https://t.co/zKyqPAxErM @sigmm @TheOfficialACM @acmmmsys @ACMTVX @ACMICMR @euromm', 'hashtags': [], 'id': 1105413041456328705, 'id_str': '1105413041456328705', 'lang': 'en', 'retweet_count': 13, 'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>', 'urls': [{'expanded_url': 'http://www.acmmm.org/2019/call-for-papers', 'url': 'https://t.co/zKyqPAxErM'}], 'use

Using the jsonlines library to dump the response in a JSONlines file output.jsonl

In [0]:
fp = open(output_path, "w")
with jsonlines.Writer(fp) as writer:
    for t in tweets:
      writer.write(t.AsDict())
fp.close()

# Parsing and Displaying tweets

Parse the output.jsonl file using library

Put data required in a pandas DataFrame
The fields are- 
- The text of the tweet.
- Date and time of the tweet.
- The number of favorites/likes.
- The number of retweets.
- Number of Images present in Tweet. If no image returns None.


In [0]:
#Read output file
fp = open(output_path, "r")
reader = jsonlines.Reader(fp)

#Create new df
df = pd.DataFrame(columns = ['text', 'date_time', 'favorites', 'retweets', 'num_images'])

#Fill table from file
for obj in reader.iter(type=dict, skip_invalid=True):
  row = {
      'text': obj["full_text"],
      'date_time': obj["created_at"],
      'favorites': obj["favorite_count"] if "favorite_count" in obj else 0,
      'retweets': obj["retweet_count"] if "retweet_count" in obj else 0,
      'num_images': len(obj["media"]) if 'media' in obj else None
  }
  df = df.append(row, ignore_index = True)

Displaying the data in tabular format

In [14]:
df.head(100)

Unnamed: 0,text,date_time,favorites,retweets,num_images
0,@IEEEBigMM19 is also available on Facebook now...,Wed Mar 20 08:19:24 +0000 2019,1,1,
1,RT @IEEEBigMM19: BigMM 2019 : IEEE BigMM 2019 ...,Wed Mar 20 02:40:07 +0000 2019,0,4,
2,BigMM 2019 : IEEE BigMM 2019 – Call for Worksh...,Mon Mar 18 02:27:47 +0000 2019,6,3,
3,"Congratulations @midasIIITD team, Rohan, Prady...",Sun Mar 17 14:22:04 +0000 2019,15,4,
4,We have emailed the task details to all shortl...,Sat Mar 16 14:06:56 +0000 2019,6,0,
5,IEEE BigMM 2019 - Call for Workshop Proposals....,Sat Mar 16 09:20:29 +0000 2019,1,1,
6,"Congratulations! Arijit, Ramit, @debanjanbhucs...",Sat Mar 16 09:14:58 +0000 2019,7,2,
7,We will be releasing a very interesting task t...,Sat Mar 16 05:13:14 +0000 2019,7,2,
8,RT @hcdiiitd: Last day to register for #Portfo...,Wed Mar 13 17:09:44 +0000 2019,0,2,
9,@ACMMM19 @sigmm @TheOfficialACM @acmmmsys @ACM...,Wed Mar 13 04:11:24 +0000 2019,1,0,1


Saving the output in .csv format

In [0]:
df.to_csv('tweet-output.csv', index = False)