<h1><center>Twitter Data Analysis</center></h1>

# Problem Statement

[DATA COLLECTION](#DATA-COLLECTION)
1. Collect a random sample of 10K tweets using the Twitter API and store them in a MongoDB instance.
2. From these collected tweets, parse the 5 most frequently occurring [named entities](Entity Recognition & Sentiment Analysis.ipynb) (a named entity can be a person, location, product, etc.).
3. Now, collect the [latest news from various news source APIs](Entity Recognition & Sentiment Analysis.ipynb) featuring the named entities you got from Step 2 (use at least one API/library other than Twitter's to collect this data).

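As a rough illustration of Step 2, the sketch below counts entity mentions across tweets and keeps the most frequent ones. It assumes the entities have already been extracted per tweet by some NER tool (the extraction itself is not shown); the function name `top_entities`, the lower-casing, the once-per-tweet counting and the sample data are all illustrative choices, not part of the original task.

```python
from collections import Counter

def top_entities(entities_per_tweet, n=5):
    """Return the n most frequent entities as (entity, count) pairs."""
    counts = Counter()
    for entities in entities_per_tweet:
        # Normalise case and count each entity once per tweet,
        # so a single repetitive tweet cannot dominate the ranking.
        counts.update({e.strip().lower() for e in entities})
    return counts.most_common(n)

# Hypothetical, hand-written entities for three tweets
sample = [["London", "BBC"], ["london", "Apple"], ["Apple", "London", "London"]]
print(top_entities(sample, n=2))
```

Counting once per tweet measures how many tweets mention an entity; counting every occurrence instead would only require dropping the set conversion.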

[ANALYSIS](#ANALYSIS)
4. Perform a [Sentiment Analysis](Entity Recognition & Sentiment Analysis.ipynb) on the data collected in Steps 1 and 3, and compare the Twitter and news sentiments for the common named entities. Also, do a qualitative comparison of the predicted sentiment versus the original sentiment of the tweets and news articles.
5. You should also perform [temporal](Temporal Analysis Final.ipynb), [spatial](Spatial Analysis.ipynb) and [content analysis](Content Analysis.ipynb) on the collected data.
6. Report the results you found in Steps 4 and 5 using graphs. Brownie points for cool interactive visualisations.


[APPLICATION](#APPLICATION)
7. Set up a web application on Heroku or a DigitalOcean Droplet with a user interface where we can input a named entity and get the comparison between the news and Twitter sentiments as an output.
8. Put all your code, along with the MongoDB collection, in a GitHub repository and share the link with us. Also, maintain a README.md explaining your codebase and the approach you took.

_**Please note:**_ The analysis is carried out in different Jupyter Notebooks. Please click on the links in the above cell to access the notebooks.

_**For better and easier access:**_
1. [Entity Recognition & Sentiment Analysis](Entity Recognition & Sentiment Analysis.ipynb)
2. [Content Analysis](Content Analysis.ipynb)
3. [Spatial Analysis](Spatial Analysis.ipynb)
4. [Temporal Analysis](Temporal Analysis Final.ipynb)
5. The [web application](https://analysis-twitter.herokuapp.com) has been hosted on Heroku.

## DATA COLLECTION

## (A) Collecting 10,000 random tweets and storing them in a MongoDB database

In [1]:
#importing necessary packages
import tweepy
import pymongo
import json

In [None]:
#Variables that contain the user credentials to access the Twitter API.
#They are read from environment variables (the variable names are assumed) so that no secret is hard-coded in the notebook.
import os

consumer_key = os.environ['TWITTER_CONSUMER_KEY']
consumer_secret = os.environ['TWITTER_CONSUMER_SECRET']
access_token = os.environ['TWITTER_ACCESS_TOKEN']
access_token_secret = os.environ['TWITTER_ACCESS_TOKEN_SECRET']

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

In [2]:
#Connection URI for the MongoDB database 'EngTweetsDb' (the client itself is created later)
MONGO_HOST = 'mongodb://localhost/EngTweetsDb'

In [None]:
#Creating a class StreamListener inheriting from 'tweepy.StreamListener' and overriding __init__, on_connect, on_error, on_data
class StreamListener(tweepy.StreamListener):
    #Class provided by tweepy to access the Twitter Streaming API.

    def __init__(self, api=None):
        super().__init__(api)
        self.num_tweets = 0
        #Open the MongoDB connection once instead of once per tweet
        self.collection = pymongo.MongoClient(MONGO_HOST).EngTweetsDb.en_tweets_col

    def on_connect(self):
        print("You are now connected to the streaming API.")

    def on_error(self, status_code):
        print('An error has occurred: ' + repr(status_code))
        #Returning False disconnects the stream
        return False

    def on_data(self, data):
        try:
            datajson = json.loads(data)
            created_at = datajson['created_at']
            print("Tweet collected at " + str(created_at))
            self.num_tweets += 1
            print('Total number of tweets in the db collection= %s' % self.collection.count_documents({}))
            #Store tweets until the limit for this run is reached, then stop the stream
            if self.num_tweets < 12:
                self.collection.insert_one(datajson)
                return True
            else:
                return False

        except Exception as e:
            #Messages without 'created_at' (e.g. delete notices) raise a KeyError and are skipped
            print(e)

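The stopping logic in `on_data` can be tried without a live stream or a database. The sketch below is a stand-in rather than tweepy code: the hypothetical `TweetCollector` class keeps tweets in a list instead of MongoDB, skips messages that have no `created_at` field, and returns `False` once its limit is reached, which is the same signal the listener uses to stop the stream. Unlike the cell above, this variant stores exactly `limit` tweets before stopping.

```python
import json

class TweetCollector:
    """Offline stand-in for the stream listener (hypothetical helper, not part of tweepy)."""

    def __init__(self, limit):
        self.limit = limit
        self.stored = []  # stands in for the MongoDB collection

    def on_data(self, raw):
        """Handle one raw message; return False when collection should stop."""
        message = json.loads(raw)
        if 'created_at' not in message:
            # Not a tweet (for example a delete notice): skip it and keep streaming
            return True
        self.stored.append(message)
        return len(self.stored) < self.limit

# Simulated stream: one tweet, one delete notice, then two more tweets
raw_messages = [
    '{"created_at": "Wed Apr 18 10:00:00 +0000 2018", "text": "first"}',
    '{"delete": {"status": {"id": 1}}}',
    '{"created_at": "Wed Apr 18 10:00:01 +0000 2018", "text": "second"}',
    '{"created_at": "Wed Apr 18 10:00:02 +0000 2018", "text": "third"}',
]

collector = TweetCollector(limit=2)
for raw in raw_messages:
    if collector.on_data(raw) is False:
        break

print(len(collector.stored))
```

Checking for `created_at` explicitly, rather than relying on a caught `KeyError`, keeps genuine errors visible while still ignoring non-tweet messages.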

In [None]:
#Set up the listener. The 'wait_on_rate_limit=True' is needed to help with Twitter API rate limiting.
listener = StreamListener(api=tweepy.API(wait_on_rate_limit=True)) 

#Creating a stream
streamer = tweepy.Stream(auth=auth, listener=listener)

In [None]:
#Sampling the streaming tweets on the basis of language
print("Tracking...")
streamer.sample(languages=["en"])

In [None]:
#Checking the records stored in 'en_tweets_col' collection of 'EngTweetsDb'
col = pymongo.MongoClient(MONGO_HOST)["EngTweetsDb"]["en_tweets_col"]
#count_documents({}) counts every document in the collection
col.count_documents({})
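
Each stored document keeps `created_at` as a plain string. For time-based work on the collection it can be converted to a timezone-aware `datetime`. The sketch below assumes the timestamp layout seen in Twitter's tweet JSON (for example `Wed Apr 18 10:00:00 +0000 2018`) and an English locale for the day and month names; the helper name `parse_created_at` is illustrative.

```python
from datetime import datetime, timezone

# Assumed layout of the 'created_at' field, e.g. "Wed Apr 18 10:00:00 +0000 2018"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def parse_created_at(value):
    """Convert a tweet's 'created_at' string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)

ts = parse_created_at("Wed Apr 18 10:00:00 +0000 2018")
print(ts.isoformat())
```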