# Twitter tweets parser - MIDAS Internship Challenge


## Explanation - 1
The code block below imports third-party libraries such as **requests** and **pandas**, which simplify development, as well as built-in Python libraries such as **os** and **json**.


### What is the code block doing
The following code block imports the built-in and third-party libraries required for the project.

In [85]:
# Import the necessary third-party and built-in Python libraries
import requests
import collections
import os
import oauth2
from requests.auth import HTTPBasicAuth
import json
import pandas as pd
import uuid
import jsonlines

## Explanation - 2
### Intuition
The intuition behind the code block below is to create a new **Data Type** that solves our problem with ease. We can consider the class `MidasTweetParser` as a new **Data Type**, with its instance variables as the state of that data type and its methods as the operations on it.
    
The instance variable *self.tabular_data* is a pandas DataFrame, so that we can show the data in tabular form with minimal extra code.

We have stored our API keys as environment variables so that they are not hard-coded in the notebook.
To keep the code clean and readable, we have created an exception class, `TwitterAPIException`, to handle the case when any of the Twitter APIs fail.
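
The two ideas above can be sketched in isolation. This is only an illustration, not the notebook's own code: `load_credentials` and `ensure_ok` are hypothetical helper names, and `TwitterAPIException` is repeated here so that the snippet runs on its own. The environment variable names are the ones the parser reads.

```python
import os


class TwitterAPIException(Exception):
    """Same idea as the notebook's exception class, repeated so this sketch runs alone."""


def load_credentials(environ=os.environ):
    """Return (key, secret) from the environment, or fail with a readable message."""
    names = ("TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_TOKEN_SECRET")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        # Assumption: we prefer one descriptive error over a bare KeyError.
        raise TwitterAPIException("Missing environment variables: " + ", ".join(missing))
    return environ[names[0]], environ[names[1]]


def ensure_ok(status_code, action):
    """Raise TwitterAPIException unless the HTTP status code is 200."""
    if status_code != 200:
        raise TwitterAPIException("{0}, error code: {1}".format(action, status_code))
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without touching the real environment.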

To dump data into a JSONlines file, which is a file where JSON objects are written one per line, each delimited by a `\n`, we have used the *jsonlines* Python module. The same result could also be achieved with basic file operations.
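
As a rough sketch of the "basic file operations" alternative, the snippet below writes and reads a JSONlines file using only the standard library. The helper names `append_json_line` and `read_json_lines`, the temporary path, and the sample records are all made up for illustration.

```python
import json
import os
import tempfile


def append_json_line(path, record):
    """Append one JSON object to the file, followed by a newline."""
    # Mode "a" creates the file when it does not exist yet.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def read_json_lines(path):
    """Read the file back as a list of Python objects, one per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical sample records, not real tweets.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
append_json_line(demo_path, {"id": 1, "text": "first"})
append_json_line(demo_path, {"id": 2, "text": "second\nline"})
records = read_json_lines(demo_path)
```

Because `json.dumps` escapes newlines inside strings, each record still occupies exactly one line of the file, which is what keeps the format line-delimited.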


### What is the code block doing
The code block below implements the logic for this problem: parsing tweets, dumping them into a JSONlines file, and populating the data frame so that they can later be shown in tabular form.

It also handles the cases where our requests to the Twitter APIs fail for any reason.
The constructor of `class MidasTweetParser` takes the directory in which the *JSONlines* file should be created and the file name; the defaults are the folder **"Midas_Tweets"** and the file name **midas_tweets_record**.
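
The parsing step can be previewed without calling Twitter at all. The sketch below pulls the same fields out of a single tweet dictionary; `summarise_tweet` is a hypothetical helper, and `sample_tweet` is a hand-written payload containing only the keys used here, not real API output.

```python
def summarise_tweet(tweet):
    """Pick out the fields the table needs from one tweet dictionary."""
    # "extended_entities" is absent when a tweet has no media attached.
    entities = tweet.get("extended_entities") or {}
    return {
        "created_at": tweet.get("created_at"),
        "text": tweet.get("text"),
        "favorite_count": int(tweet.get("favorite_count") or 0),
        "retweet_count": int(tweet.get("retweet_count") or 0),
        # Every media entry is counted as an image, as the parser does.
        "images_count": len(entities.get("media", [])),
    }


# Hypothetical tweet payload for illustration only.
sample_tweet = {
    "created_at": "Mon Apr 08 10:00:00 +0000 2019",
    "text": "Hello from the lab",
    "favorite_count": 3,
    "retweet_count": 1,
    "extended_entities": {"media": [{"type": "photo"}, {"type": "photo"}]},
}
summary = summarise_tweet(sample_tweet)
```

Falling back to `0` and to an empty media list means a tweet with missing keys yields zero counts instead of raising an error.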

In [86]:
class TwitterAPIException(Exception):
    """
     An exception to handle the case when Twitter's APIs fail.
    """
    pass

In [87]:
class MidasTweetParser:
    def __init__(self, directory_name=None, file_name=None):
        """
         Constructor for the class. The intuition behind the class is to keep the code clean.
         A pandas DataFrame is used so that we can easily show our tweet collection in tabular form.
        """
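        # NOTE: the app-only OAuth 2.0 token endpoint expects the app's consumer key and
        # secret, so these environment variables must hold those values despite their names.
        self.consumer_key = os.environ["TWITTER_ACCESS_TOKEN"]
        self.consumer_secret = os.environ["TWITTER_ACCESS_TOKEN_SECRET"]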
        
        if directory_name:
            self.tweets_dir = directory_name
        else:
            self.tweets_dir = "Midas_Tweets"
            
        if file_name:
            # Keep only the part before the first dot (drops any extension)
            self.file_name = file_name.split(".")[0]
        else:
            self.file_name = "midas_tweets_record"
            
        self.request_token_url = "https://api.twitter.com/oauth2/token?grant_type=client_credentials"
        self.tweets_url = "https://api.twitter.com/1.1/statuses/user_timeline.json"
        self.tweets_url_params = {"screen_name":"midasiiitd", "page": 1}
        self.tabular_data = pd.DataFrame()
    
    def _get_bearer_token(self):
        """ 
         Method to complete the OAuth flow, to get bearer token for the following requests.
        """
        response = requests.post(self.request_token_url, auth=HTTPBasicAuth(self.consumer_key, self.consumer_secret))
        if response.status_code != 200:
            raise TwitterAPIException("{0}, error code: {1}".format("Could not fetch Bearer token from Twitter", response.status_code))
        response_data = json.loads(response.content.decode())
        return response_data["access_token"]
    
    def _handle_tweets(self, tweets):
        """
          Tweets are fetched in batches of 20 to save memory.
          This method handles each batch:
          
          1. Parse each tweet object to get the required fields.
          2. Append each tweet object to the JSONlines file.
          3. Add a row to the pandas data frame, which shows the collection of tweets in tabular form.
        """
        print("Handling {0} tweets from page: {1}".format(len(tweets), self.tweets_url_params["page"]))
        for tweet in tweets:
            date_time = tweet.get("created_at")
            text = tweet.get("text")
            favourite_likes = int(tweet.get("favorite_count"))
            retweet_count = int(tweet.get("retweet_count"))
            entities = tweet.get("extended_entities", None)
            images_count = 0
            if entities:
                images_count = len(entities.get("media", []))

            data_frame_row = {
                "Time and Date of Creation": pd.to_datetime(date_time),
                "Number of Retweets":retweet_count,
                "Number of Images Present":images_count,
                "Favourite Count":favourite_likes,
                "Text":text,
            }
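            # DataFrame.append was removed in pandas 2.0, so build a one-row frame and concatenate it.
            self.tabular_data = pd.concat([self.tabular_data, pd.DataFrame([data_frame_row])], ignore_index=True)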
            self._dump_tweet(tweet)
            
    def _dump_tweet(self, tweet):
        """
         Method to dump tweets into a single file
        """
        if not os.path.exists(self.tweets_dir):
            os.mkdir(self.tweets_dir)
            
        tweet_id = tweet.get("id", uuid.uuid4())
        path = "{0}/{1}.jsonl".format(self.tweets_dir, self.file_name)
        
        if os.path.exists(path):
            mode = 'a' # append to the file if it already exists.
        else:
            mode = 'w' # create the file if it does not exist.
        with jsonlines.open(path, mode=mode) as writer:
            writer.write(tweet)
        
    def _build_bearer_token(self, token):
        """
         This code block is used to build the format for the access token that Twitter accepts.
         eg: Bearer AADKJKJDKF.....090KKKM09232M
        """
        return "{0} {1}".format("Bearer", token)
    
    def _show_tabular_data(self):
        return self.tabular_data
    
    def _get_total_number_tweets(self):
        return self.tabular_data.shape[0]
    
    def _get_json_for_tweets(self):
        """
         1. This method uses the bearer token received via Twitter's OAuth APIs to fetch tweets
            for the required Twitter handle, 'midasiiitd' for this problem, in batches of 20.
            
         2. The method returns the response of the Twitter API parsed into Python objects (a list of tweet dictionaries).
        """
        bearer_token = self._get_bearer_token()
        authorization_header = self._build_bearer_token(bearer_token)
        response = requests.get(self.tweets_url,
                                headers={"Authorization":authorization_header}, params=self.tweets_url_params)
        if response.status_code != 200:
            raise TwitterAPIException("{0}, error code: {1}".format("Could not fetch tweets from Twitter", response.status_code))
        return json.loads(response.content)
    
    def run(self):
        print("Starting parser........")
        tweets_data = self._get_json_for_tweets()
        while tweets_data:
            self._handle_tweets(tweets_data)
            self.tweets_url_params["page"] += 1
            tweets_data = self._get_json_for_tweets()
        
        print("\nSuccessfully populated the table and dumped the JSON for each tweet in folder /{0}, file: {1}.jsonl\n".format(
            self.tweets_dir, self.file_name))
        print("-----------------\nPrinting the tabular data\n-----------------")
        total_tweets = self._get_total_number_tweets()
        print("Total number of tweets made by 'midasiiitd': {0}".format(total_tweets))
        return self._show_tabular_data()

In [88]:
# Initialise the object
tweets_parser = MidasTweetParser(directory_name="MIDAS_INTERNSIP_PROJECT", file_name="tweets") 

In [None]:
# Run the parser
tweets_parser.run()

Starting parser........
Handling 20 tweets from page: 1
Handling 20 tweets from page: 2
Handling 20 tweets from page: 3
Handling 20 tweets from page: 4
Handling 20 tweets from page: 5
Handling 20 tweets from page: 6
Handling 20 tweets from page: 7
Handling 20 tweets from page: 8
Handling 20 tweets from page: 9
Handling 20 tweets from page: 10


## Conclusion
As we can see, by using a single class we have successfully created a parser that fetches tweets from Twitter, shows them in tabular form, and dumps each tweet into a **JSONLINES** file.

**By - Akshay Sharma, email: akshay.sharma09695@gmail.com**