This project creates and updates a Google Sheet with relevant tweets and a basic sentiment analysis of each tweet.

**The problem**
I was monitoring a few companies as part of a larger project and needed regular updates on their activity and on what the public thought about them. Reading the news and occasionally checking social media was not enough, so I needed a more efficient way to see and analyse what was going on.

**The solution** 
I wrote several pieces of code to analyse online newspapers and social media. This particular code scrapes Twitter and returns structured information about the latest tweets.

**The risks**  
- I couldn't scrape tweets that were older than 2 weeks.
- The keywords I used were basic, so, combined with the noise on the platform, I could not fully trust the results.

**The tools**

I used several libraries: 

1. [Aylien](https://aylien.com)  
Aylien is a company specialised in NLP products. They provide an API with a daily limit on calls. Their NLP library gave me more precise results than the TwitterSearch library, as well as a confidence percentage.

2. Google Sheets  
I used the Google Sheets API because I like to look at my results directly in a spreadsheet whenever I can: it gives me a clear view of the results and lets me share them quickly with others. The API setup has changed a little since I wrote this code, but the concept is the same. More information can be found in the [Google Sheets API documentation](https://developers.google.com/sheets/api/).

3. TwitterSearch  
There are a lot of API wrappers for Twitter out there, but I like this one in particular because of the clarity of its documentation and the functions it provides. [This library](https://pypi.org/project/TwitterSearch/) was created by the Technical University of Munich.
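
Because the Aylien API limits the number of calls per day, it can help to read the polarity and its confidence from a single response rather than calling the API once for each value. The following is only a minimal sketch: it assumes the client exposes `Sentiment({'text': ...})` and returns a dictionary with `polarity` and `polarity_confidence` keys, as the code in this notebook does. `FakeSentimentClient` and `sentiment_summary` are hypothetical names used so the example runs without credentials.

```python
# Minimal sketch: fetch polarity and confidence with a single Sentiment call.
# Assumption: the response is a dict with 'polarity' and 'polarity_confidence'.


class FakeSentimentClient:
    """Hypothetical stand-in for the Aylien client so the sketch runs offline."""

    def __init__(self):
        self.calls = 0  # counts how many API calls would have been made

    def Sentiment(self, params):
        self.calls += 1
        # Fixed, made-up response used only for illustration.
        return {
            "text": params["text"],
            "polarity": "positive",
            "polarity_confidence": 0.9,
        }


def sentiment_summary(client, text):
    """Return (polarity, confidence) from one call instead of two."""
    result = client.Sentiment({"text": text})
    return result["polarity"], result["polarity_confidence"]


fake_client = FakeSentimentClient()
polarity, confidence = sentiment_summary(fake_client, "Great product launch")
```

With a real client, passing `client_aylien` instead of `fake_client` would halve the number of sentiment calls made per tweet.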


In [None]:
# import libraries

import re

import gspread
import numpy as np
import pandas as pd
from aylienapiclient import textapi
from oauth2client.service_account import ServiceAccountCredentials
from textblob import TextBlob
from TwitterSearch import TwitterSearch, TwitterSearchException, TwitterSearchOrder

In [None]:
#setup


client_aylien = textapi.Client("******", "********************")



ts = TwitterSearch(
                consumer_key = '**********',
                consumer_secret = '**************',
                access_token = '****************',
                access_token_secret = '***************',
                tweet_mode = 'extended'
                )


#Google Sheet API 

client_secret = r'C:\Users\Ethel Karskens\Projects\MyProject_perso.json'

# use creds to create a client to interact with the Google Drive API
scope = ['https://spreadsheets.google.com/feeds']
creds = ServiceAccountCredentials.from_json_keyfile_name(client_secret , scope)
client = gspread.authorize(creds)


In [None]:
# function that searches Twitter for a keyword and appends every tweet with more than t retweets to the Google Sheet

def TwitterDf(t, keyword):
    
    #opening the Google Sheet
    sh = client.open("NewsScrapy")
    sheet_3 = sh.get_worksheet(2) 
    #starting from the first row
    sheet_3.resize(1) 
    
    # remove mentions, URLs and special characters from the tweet
    def clean_tweet(tweet):
        return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

    # estimate the sentiment polarity of the text with Aylien
    def aylien_sentimentPolarity(text):
        sentiment = client_aylien.Sentiment({'text': text})
        return sentiment['polarity']

    # degree of confidence Aylien has in the sentiment polarity
    def aylien_sentimentConfidence(text):
        sentiment = client_aylien.Sentiment({'text': text})
        return sentiment['polarity_confidence']

    # simplify the TextBlob polarity into three categories:
    # 1 (positive), 0 (neutral), -1 (negative)
    def analize_sentiment(tweet):
        analysis = TextBlob(clean_tweet(tweet))
        if analysis.sentiment.polarity > 0:
            return 1
        elif analysis.sentiment.polarity == 0:
            return 0
        else:
            return -1
    
    #scraping the tweets from Twitter 
    try:
        tso = TwitterSearchOrder() # create a TwitterSearchOrder object
        tso.set_result_type('recent')
        tso.set_language('en')      
        tso.set_keywords(keyword)   
        tso.set_include_entities(True) 
        
        # loop through the collected tweets and extract the relevant data
        for tweet in ts.search_tweets_iterable(tso):

            # keep only tweets retweeted more than t times
            if int(tweet['retweet_count']) > t:

                report_sheet = [tweet['user']['screen_name'],
                                tweet['user']['name'],
                                tweet['user']['id'],
                                tweet['user']['followers_count'],
                                tweet['user']['location'],
                                tweet['text'],
                                tweet['entities']['hashtags'],
                                tweet['retweet_count'],
                                tweet['geo'],
                                tweet['user']['time_zone'],
                                tweet['created_at'],
                                analize_sentiment(tweet['text']),
                                aylien_sentimentPolarity(clean_tweet(tweet['text'])),
                                aylien_sentimentConfidence(clean_tweet(tweet['text']))]

                # append the row to the Google Sheet
                sheet_3.append_row(report_sheet)

                # track progress by displaying the ID of each processed tweet
                display(tweet['id'])
               
                
 
    # print the error if the Twitter search goes wrong
    except TwitterSearchException as e: 
        print(e)
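
The cleaning and labelling helpers defined inside `TwitterDf` can be tried on their own, without any API credentials. The sketch below reuses the same regular expression as `clean_tweet`. `polarity_to_label` is a hypothetical name for the three-category mapping, applied here to a plain number rather than to a TextBlob result, and the sample tweet is made up for illustration.

```python
import re

# Same pattern as in clean_tweet: mentions, special characters, URLs.
TWEET_NOISE = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")


def clean_tweet(tweet):
    """Replace mentions, URLs and special characters with spaces, then collapse whitespace."""
    return " ".join(TWEET_NOISE.sub(" ", tweet).split())


def polarity_to_label(polarity):
    """Map a numeric polarity score to 1 (positive), 0 (neutral) or -1 (negative)."""
    if polarity > 0:
        return 1
    if polarity == 0:
        return 0
    return -1


cleaned = clean_tweet("@user Loving the new #phone! https://t.co/abc123")
print(cleaned)  # Loving the new phone
print(polarity_to_label(0.4), polarity_to_label(0.0), polarity_to_label(-0.2))  # 1 0 -1
```

Note that the pattern removes the `#` sign but keeps the hashtag word, and that a handle containing an underscore is only partly removed, because `_` is not part of the mention pattern.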