# SentiPy Tutorial

### *An Application for Twitter Sentiment Analytics*

**SentiPy** provides models for analyzing users' sentiment from their tweets. The models are based on **word embeddings** and a **convolutional neural network** (CNN).

In [1]:
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the notebook
# rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
# Note: Matplotlib 3.6+ renames this style to 'seaborn-v0_8-darkgrid'.
plt.style.use('seaborn-darkgrid')

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

# 2. Twitter Analytics

In [2]:
import tweepy
from sentipy.io import connect

auth = connect("login.ini")
api = tweepy.API(auth)
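
The internals of `connect` are not shown in this tutorial. As a rough sketch only, a helper of this kind could read the four OAuth credentials from an INI file with the standard `configparser` module. The function name `read_credentials`, the `[twitter]` section name, and the key names are assumptions made for illustration and may not match the actual layout of `login.ini`.

```python
import configparser


def read_credentials(path, section="twitter"):
    """Read OAuth credentials from an INI file (hypothetical layout)."""
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse.
    if not parser.read(path):
        raise FileNotFoundError(f"Could not read {path}")
    keys = ("consumer_key", "consumer_secret",
            "access_token", "access_token_secret")
    return {key: parser[section][key] for key in keys}


# With the credentials loaded, Tweepy authentication typically looks like:
#   auth = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
#   auth.set_access_token(creds["access_token"], creds["access_token_secret"])
```

Keeping credentials in a separate file avoids hard-coding secrets in the notebook; the file itself should stay out of version control.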

You can print the tweets from your home timeline with:

In [3]:
public_tweets = api.home_timeline()
for tweet in public_tweets:
    print(tweet.text)

#COVID19 impact on #education so far:

⛔️363 million students out of school worldwide.

➡️15 countries have ordered… https://t.co/rP6psNIXU3
...or students from other parts of the US who cannot afford to make it home.
Dear Parents of college students: A lot of campuses are closing with #coronavirus and most will be closed in 2-3 we… https://t.co/bYRj57FHm5
THIS HAS NOT BEEN THOUGHT THROUGH
A lot of students are from countries that have massive outbreaks already. How will they be able to afford expensive… https://t.co/iGrBNs6Edq
🔴BREAKING!

One in five students around the world are out of school due to #COVID19.

.@UNESCO mobilizes education… https://t.co/KjgRWHnFa2
RT @futuresfestival: Un escape game à  la découverte du patrimoine ! Une expérience proposée par @ArtGraphPat spécialiste du numérique et l…
RT @CanteIshta: A #Chauvigny (86), un relevé 3D du château d'Harcourt est lancée, en vue de l'étude de ce monument des XIIe-XVe s., par un…
Dans la famille #Surface, @pressecitron deman

### 2.2. Search for Hashtag

In [8]:
# Define the search term, the start date, and the number of tweets as variables
search_words = "#airbus -filter:retweets"  # Exclude retweets
date_since = "2020-01-01"
num_tweets = 100

In [9]:
import pandas as pd

# Collect tweets
data = {"user": [],
        "date": [],
        "location": [],
        "lang": [],
        "text": []}

# Query the search endpoint
# (Tweepy 4.x renames api.search to api.search_tweets)
tweets = tweepy.Cursor(api.search,
                       q=search_words,
                       lang="en",
                       since=date_since).items(num_tweets)

# Keep only the relevant fields
for (i, tweet) in enumerate(tweets):
    data["user"].append(tweet.user.screen_name)
    data["location"].append(tweet.user.location)
    data["date"].append(tweet.created_at)
    data["text"].append(tweet.text)
    data["lang"].append(tweet.lang)
    # print("Tweet n°{}: {}\n".format(i+1, tweet.text))

# Convert to a DataFrame
df = pd.DataFrame(data)
df.head()

| | user | date | location | lang | text |
|---|---|---|---|---|---|
| 0 | JaccoJackson | 2020-03-10 15:52:30 | Lagos, nigeria | en | Stay home, stock up on medicines and food, DON... |
| 1 | deluxeVIPdining | 2020-03-10 15:40:45 | | en | FBO Sponsor Lunch at Signature Luton ✈✈ \n\n#N... |
| 2 | czech_trader_ | 2020-03-10 15:34:05 | Hlavní město Praha | en | $BA $AIR #airbus - scalped https://t.co/T7zSQ8... |
| 3 | 78tiger | 2020-03-10 15:26:20 | Los Angeles | en | Last Of The Giants: Final #Airbus #A380 Convoy... |
| 4 | EDDC_Radar | 2020-03-10 14:27:59 | EDDC | en | (#3C152F) as flight #AIB279A at 31075 ft headi... |


In [10]:
print(len(df["text"]))

100
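
To experiment with the collection step without API access, the same field extraction can be tried on stand-in objects. The sketch below is an illustration only: `tweets_to_records` is a hypothetical helper, and the fake tweet (user name, location, text) is made up. It mimics just the attributes read in the loop above.

```python
from datetime import datetime
from types import SimpleNamespace


def tweets_to_records(tweets):
    """Flatten tweet-like objects into one dict per tweet."""
    return [
        {
            "user": tweet.user.screen_name,
            "date": tweet.created_at,
            "location": tweet.user.location,
            "lang": tweet.lang,
            "text": tweet.text,
        }
        for tweet in tweets
    ]


# Stand-in object exposing the same attributes as the tweets used above
# (all values are invented for the example).
fake_tweet = SimpleNamespace(
    user=SimpleNamespace(screen_name="example_user", location="Toulouse"),
    created_at=datetime(2020, 3, 10, 12, 0, 0),
    lang="en",
    text="Example tweet about #airbus",
)
records = tweets_to_records([fake_tweet])
```

Passing `records` to `pd.DataFrame(records)` gives a frame with the same five columns as `df`. Building one dict per tweet keeps each row's fields together, which makes it harder for the columns to drift out of step than with five parallel lists.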


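The tweets contain many abbreviations and Twitter-specific elements (hashtags, URLs) that may influence the model. Let's tokenize the text and replace these elements with placeholder tokens such as `<hashtag>` and `<url>`: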

In [11]:
from sentipy.tokenizer import tokenizer_tweets

df["text"] = [tokenizer_tweets(text) for text in df["text"]]
df.head()

| | user | date | location | lang | text |
|---|---|---|---|---|---|
| 0 | JaccoJackson | 2020-03-10 15:52:30 | Lagos, nigeria | en | `[stay, home, ,, stock, up, on, medicines, and,...` |
| 1 | deluxeVIPdining | 2020-03-10 15:40:45 | | en | `[fbo, sponsor, lunch, at, signature, luton, ✈,...` |
| 2 | czech_trader_ | 2020-03-10 15:34:05 | Hlavní město Praha | en | `[$, ba, $, air, <hashtag>, airbus, -, scalped,...` |
| 3 | 78tiger | 2020-03-10 15:26:20 | Los Angeles | en | `[last, of, the, giants, :, final, <hashtag>, a...` |
| 4 | EDDC_Radar | 2020-03-10 14:27:59 | EDDC | en | `[(, <hashtag>, 3c152f, ), as, flight, <hashtag...` |


In [15]:
print(df["text"][4])

['(', '<hashtag>', '3c152f', ')', 'as', 'flight', '<hashtag>', 'aib279a', 'at', '31075', 'ft', 'heading', 'southwest', 'bound', 'over', 'rehfelde', ',', 'brandenburg', '(', 'germany', ')', '.', 'at', ':', '…', '<url>']
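
The output above suggests that the tokenizer lowercases the text, splits off punctuation, and replaces hashtag markers and URLs with placeholder tokens. The implementation of `tokenizer_tweets` is not shown here, so the following is only a minimal regex-based sketch of that behavior. The name `simple_tokenize` and the patterns are assumptions, and the real tokenizer may handle more cases (mentions, numbers, emoticons).

```python
import re

# Illustrative patterns only; not taken from sentipy.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(?=\w)")              # '#' directly before a word
TOKEN_RE = re.compile(r"<\w+>|\w+|[^\w\s]")      # placeholder, word, or symbol


def simple_tokenize(text):
    """Lowercase a tweet and split it into tokens with placeholders."""
    text = URL_RE.sub(" <url> ", text)           # replace URLs first
    text = HASHTAG_RE.sub(" <hashtag> ", text)   # keep the tag word itself
    return TOKEN_RE.findall(text.lower())


tokens = simple_tokenize("(#3C152F) as flight #AIB279A")
# tokens == ['(', '<hashtag>', '3c152f', ')', 'as', 'flight',
#            '<hashtag>', 'aib279a']
```

Replacing URLs before hashtags matters in this sketch: a URL fragment such as `page#section` would otherwise be split by the hashtag rule.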
