
<img src="https://drive.google.com/uc?id=1x5IgSr-SDhlPhx7VAKDdo5-3Jv9kz7N0">

Collect text data using Twitter APIs.
--------------------------------------------------

Many free APIs let us collect data and use it to solve problems. Here we focus on the Twitter API, since tweets feed many NLP applications, such as product review mining and sentiment analysis.

Problem
------------
You want to collect text data using Twitter APIs.

Solution
------------
Twitter holds a huge volume of data, and much of it is valuable; social media
marketers make their living from it. An enormous number of tweets are posted
every day, and every tweet has a story to tell. Collected and analyzed, this
data can give a business considerable insight into how its company, products,
and services are perceived.

How It Works
-------------------
Log in to the Twitter developer portal

Create your own app in the Twitter developer portal, and get the keys
mentioned below. Once you have these credentials, you can start pulling
data. Keys needed:

> • consumer key: Key that identifies your application to the authentication server (Twitter, Facebook, etc.).

> • consumer secret: Password the application uses to authenticate with the authentication server (Twitter, Facebook, etc.).

> • access token: Key given to the client after the keys above have been authenticated successfully.

> • access token secret: Password for the access token.
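
Rather than pasting these four values into the notebook, one option is to read them from environment variables. The sketch below assumes variable names such as `TWITTER_CONSUMER_KEY`; these names and the `load_credentials` helper are illustrative only, not something Twitter or Tweepy prescribes.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values can then be passed to `tweepy.OAuthHandler` and `set_access_token` in place of hard-coded strings.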

Useful links :
-----------------
https://iag.me/socialmedia/how-to-create-a-twitter-app-in-8-easy-steps/

https://developer.twitter.com/en/docs/tweets/sample-realtime/overview/GET_statuse_sample

In [None]:
# Once all the credentials are in place, use the code below to fetch the data.

# Install tweepy
# !pip install tweepy

# Import the libraries (pandas is used later to store the tweets)
import tweepy
import pandas as pd

# credentials  --> put your credentials here
consumer_key = "your_consumer_key_here"
consumer_secret = "your_consumer_secret_here"
access_token = "your_access_token_here"
access_token_secret = "your_access_token_secret"

# calling API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Provide the query you want to pull the data. For example,
# pulling data for "bollywood stars" or "US unemployment"  or "Modi Covid19 lockdown"
query = "US unemployment"

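# Fetching tweets
# Note: in Tweepy 4.x, api.search was renamed to api.search_tweets.
# Retweets are excluded with the "-filter:retweets" search operator.
Tweets = api.search(query + " -filter:retweets", count=10, lang='en', tweet_mode='extended')

# The call above pulls up to 10 tweets matching the term "US unemployment".
# The API returns only English tweets because the language given is 'en',
# and the search operator excludes retweets.
# With tweet_mode='extended', the untruncated text is in tweet.full_text.
# Possible language codes: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes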
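
To inspect what came back, each result's text can be printed. With `tweet_mode='extended'` the untruncated text is exposed as `full_text` rather than `text`; the `tweet_text` helper below is a hypothetical convenience that handles both cases, and the stand-in objects exist only so the snippet runs without credentials.

```python
from types import SimpleNamespace


def tweet_text(tweet):
    """Return full_text when present (extended mode), else text."""
    full = getattr(tweet, "full_text", None)
    return full if full is not None else tweet.text


# Made-up stand-ins for the objects returned by the search call.
sample_results = [
    SimpleNamespace(full_text="Jobless claims fell again this week."),
    SimpleNamespace(text="Unemployment numbers are out."),
]

for tweet in sample_results:
    print(tweet_text(tweet))
```

With real results, the loop would iterate over `Tweets` instead of `sample_results`.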


Getting the Tweets + Some Attributes
---
In this section, we will get some tweets plus some of their related attributes and store them in a structured format.

A single api.search call returns at most 100 tweets. We want more than that, so api.search alone is not enough. Instead we use tweepy.Cursor, which pages through the results for us and lets us collect as many tweets as the API will serve.

In practice, the cursor keeps fetching tweets until we stop it by breaking out of the loop.
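
The idea behind a cursor can be sketched without touching the network. In the toy version below, `fetch_page` is a made-up stand-in for one search request and `iterate_items` plays the role of the cursor; neither is part of Tweepy. With the real library, passing a limit, as in `.items(1000)`, caps the number of items in the same spirit as `islice` here.

```python
from itertools import islice

PAGE_SIZE = 100  # mirrors the per-request maximum of the search call


def fetch_page(page_number):
    """Pretend search request: returns PAGE_SIZE fake tweet ids."""
    start = page_number * PAGE_SIZE
    return list(range(start, start + PAGE_SIZE))


def iterate_items():
    """Yield items one by one, requesting a new page whenever one runs out."""
    page_number = 0
    while True:
        for item in fetch_page(page_number):
            yield item
        page_number += 1


# Take the first 250 items, which spans three pages.
first_items = list(islice(iterate_items(), 250))
print(len(first_items), first_items[-1])
```

The caller never sees page boundaries; it only decides when to stop consuming items.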

In [None]:
# start by creating an empty DataFrame with the columns we'll need
df = pd.DataFrame(columns = ['Tweets', 'User', 'User_statuses_count', 
                             'user_followers', 'User_location', 'User_verified',
                             'fav_count', 'rt_count', 'tweet_date'])

In [None]:
# Next, let's define a function as follows.

def stream(data, file_name):
    i = 0
    for tweet in tweepy.Cursor(api.search, q=data, count=100, lang='en').items():
        print(i, end='\r')
        df.loc[i, 'Tweets'] = tweet.text
        df.loc[i, 'User'] = tweet.user.name

        # number of tweets the user has posted
        df.loc[i, 'User_statuses_count'] = tweet.user.statuses_count

        # more code goes here: fill the remaining columns the same way

        i += 1
        # stop after 1000 tweets; otherwise the cursor keeps paging
        if i == 1000:
            break

    # save the result; {} is replaced by the file name passed to the function
    df.to_excel('{}.xlsx'.format(file_name))

Let's look at this function from the inside out:
--

> First, we followed the same methodology of getting each tweet in a for loop, but this time from tweepy.Cursor.

> <font color='green'>Inside tweepy.Cursor</font>, we pass our api.search and the attributes we want:

q = data: the text to search for. It is whatever we pass into the stream function, just as we passed "US unemployment" in the previous example.

count = 100: the number of tweets each api.search request returns; 100 is the maximum.

lang = 'en': restricts the results to tweets in English.

Now, since we put our api.search into tweepy.Cursor, it does not stop at the first 100 tweets. It keeps requesting more results for as long as we keep iterating; that's why we use i as a counter to stop the loop after 1000 iterations.

> Next, on each iteration I fill one row of the DataFrame with the attributes I am interested in, using the .loc method in Pandas and my i counter as the row index.

The attributes I am passing into each column are self-explanatory. The Twitter API documentation lists the other attributes that are available, so you can experiment with those.
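
As an illustration, the remaining columns could be filled from the attributes below. The attribute names (`followers_count`, `location`, `verified`, `favorite_count`, `retweet_count`, `created_at`) are the ones the Twitter v1.1 tweet and user objects expose; the `tweet_to_row` helper and the stand-in tweet are assumptions made for this sketch.

```python
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Map one tweet object to the DataFrame columns used in this notebook."""
    return {
        "Tweets": tweet.text,
        "User": tweet.user.name,
        "User_statuses_count": tweet.user.statuses_count,
        "user_followers": tweet.user.followers_count,
        "User_location": tweet.user.location,
        "User_verified": tweet.user.verified,
        "fav_count": tweet.favorite_count,
        "rt_count": tweet.retweet_count,
        "tweet_date": tweet.created_at,
    }


# A made-up stand-in tweet so the sketch runs without API access.
fake_tweet = SimpleNamespace(
    text="Example tweet",
    favorite_count=3,
    retweet_count=1,
    created_at="2020-05-01 12:00:00",
    user=SimpleNamespace(
        name="Example User",
        statuses_count=42,
        followers_count=10,
        location="Somewhere",
        verified=False,
    ),
)

row = tweet_to_row(fake_tweet)
print(row["user_followers"], row["rt_count"])
```

Inside the loop, each value can be written with `df.loc[i, column] = value`, the same way the first three columns are filled.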

> Finally, I save the result into an Excel file using "df.to_excel". I use a placeholder {} instead of hard-coding the file name inside the function because I want to be able to name the file myself when I run the function.

Now, I can just call my function as follows, looking for tweets about <i>"Some Text of your Choice"</i> again and naming my file "my_tweets."

In [None]:
stream(data='US unemployment', file_name='my_tweets')


In [None]:
df.head()

Let's Analyze Some Tweets
--

In [None]:
# Import TextBlob. It has a built-in sentiment property.
from textblob import TextBlob

# The sentiment property returns a named tuple of the form 
# Sentiment(polarity,subjectivity). The polarity score is a float 
# within the range [-1.0, 1.0]. 
# The subjectivity is a float within the range [0.0, 1.0] 
# where 0.0 is very objective and 1.0 is very subjective.

> I would like to add an extra column to this DataFrame that indicates the <font color='green'>sentiment of a tweet.</font>

> We will also need another column with the <font color='green'>tweets stripped of useless symbols</font>, so that the sentiment analyzer runs on the cleaned-up tweets and gives more reliable results.

Let's start by writing our tweet-cleaning function:

In [None]:
import re

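def clean_tweet(tweet):
    # Replace @mentions, non-alphanumeric characters, and URLs with spaces,
    # then collapse the remaining whitespace. The raw string avoids invalid-escape warnings.
    return ' '.join(re.sub(r'(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)', ' ', tweet).split())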
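
A quick check on a made-up tweet shows what the pattern removes: @mentions, URLs, and any character that is not a letter, digit, space, or tab. The snippet repeats the function so that it runs on its own, written with a raw string so the backslashes reach the regex engine unchanged.

```python
import re


def clean_tweet(tweet):
    # Replace mentions, non-alphanumeric characters, and URLs with a space,
    # then collapse the runs of whitespace left behind.
    pattern = r'(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)'
    return ' '.join(re.sub(pattern, ' ', tweet).split())


sample = "@alice Check this out!!! https://t.co/xyz #jobs"
print(clean_tweet(sample))  # Check this out jobs
```

Note that a hashtag loses its `#` but keeps its word, so `#jobs` survives as `jobs`.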

In [None]:
# Let's also write our sentiment analyzer function:
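
One possible analyzer, offered as a sketch rather than the notebook's intended answer: take the polarity TextBlob reports and map it to a label. Treating a polarity of exactly 0 as neutral is an assumption, and the function names `label_from_polarity` and `analyze_sentiment` are made up here.

```python
def label_from_polarity(polarity):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


def analyze_sentiment(tweet):
    """Label a (cleaned) tweet using TextBlob's polarity score."""
    # Imported here so the labelling logic above is usable without TextBlob.
    from textblob import TextBlob

    return label_from_polarity(TextBlob(tweet).sentiment.polarity)


print(label_from_polarity(0.35), label_from_polarity(-0.2), label_from_polarity(0.0))
```

Applied to the DataFrame, this might look like `df['clean_tweet'] = df['Tweets'].apply(clean_tweet)` followed by `df['Sentiment'] = df['clean_tweet'].apply(analyze_sentiment)`; both new column names are placeholders.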







In [None]:
# Now let's create our new columns:






In [None]:
# Let's look at some random rows to make sure our functions worked correctly.






In [None]:
# find no. of positive sentiments



In [None]:
# find no. of negative sentiments



In [None]:
# find no. of neutral sentiments
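
Counting the three classes amounts to a tally over the labels. The sketch below uses a plain list of made-up labels with `collections.Counter`; on a DataFrame, calling `value_counts()` on the sentiment column (whatever it ends up being called) gives the same tallies.

```python
from collections import Counter

# Made-up labels standing in for the sentiment column.
labels = ["Positive", "Neutral", "Negative", "Positive", "Neutral", "Positive"]

counts = Counter(labels)
for label in ("Positive", "Negative", "Neutral"):
    print(label, counts[label])
```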



**`Just in case`**

+ Create your Twitter account.
+ Start tweeting today.