# Sentiment Analysis
Mining tweet-stream sentiment real time could be challenging: tweets are short, multilingual, truncated/incomplete, contain abbreviations, symbols and emoji. Emojis could be very helpful, we can mine them separately and use as sentiment indicators on the stream.

#### Emoji
[Starter model using polar queries](EmojiSentiment.ipynb)

In [1]:
esent = {}
with open('sentiment.txt','r') as source:
    for line in source.readlines():
        emoticon, sentiment = line.strip().split()
        esent[emoticon] = float(sentiment)

In [2]:
sample  = 'Hola 😋 cómo estás?'
for char in sample:
    if char in esent:
        print(esent[char])

1.0


#### Text
Starter model using [TextBlob-NLTK](http://textblob.readthedocs.io/en/dev/_modules/textblob/en/sentiments.html)

In [3]:
#!pip install textblob
#!python -m textblob.download_corpora

In [4]:
import nltk
# >>>nltk.download('stopwords')

nltk.corpus.stopwords.words('spanish')[:10]

['de', 'la', 'que', 'el', 'en', 'y', 'a', 'los', 'del', 'se']

In [5]:
nltk.corpus.stopwords.words('french')[:10]

['au', 'aux', 'avec', 'ce', 'ces', 'dans', 'de', 'des', 'du', 'elle']

In [6]:
stopwords = { word:1 for word in nltk.corpus.stopwords.words('english') }

# for lang in ['danish', 'dutch', 'finnish', 'french', 'german', 'hungarian', 'italian',
#             'norwegian', 'portuguese', 'russian', 'spanish', 'swedish', 'turkish']:
#    for word in nltk.corpus.stopwords.words('spanish'): stopwords[word] = 1

In [7]:
import re
import sys
import json
from textblob import TextBlob
from collections import Counter
from client import TwitterClient

def score(e, count):
    if count == 1 or abs(esent[e]) == 1: return esent[e]
    # scale up the signal for repeated emoticon
    s = abs(esent[e]) ** 1/count
    if esent[e] < 0: return - s
    return s


def extract(data):
    try:
        obj = json.loads(data)
        if 'text' in obj:
            # extract hashtags
            hashtags = ['#{}'.format(re.sub('\W+', '', term)) for term in obj['text'].split() if term[0] == '#']
            # extract emoji
            emo = Counter([c for c in obj['text'] if c in esent])
            polarity = 0.
            # remove links and all non-letter chacracters
            words = re.sub('(@\S+)|(https?\://\S+)|([\W\d_]+)', ' ', obj['text'].lower()).split()
            # remove stopwords
            words = [word for word in words if word not in stopwords]
            if obj['lang'] == 'en':
                # calculate sentiment score
                polarity = TextBlob(' '.join(words)).sentiment.polarity
            if len(emo) > 0:
                # get emoji sentiment scores
                sentiment = [score(e, c) for e, c in emo.items() if esent[e] != 0]
                # calculate tweet sentiment score as average of text and emoji scores
                if polarity != 0:
                    sentiment.append(polarity)
                if len(sentiment) > 0:
                    polarity = sum(sentiment)/len(sentiment)
            if polarity > 0.25 or polarity < -0.25:
                print('\n-- {} -------------------------------------------'.format(obj['lang']))
                print('Words: {}'.format(words))
                print('Emoji: {}'.format(list(emo.keys())))
                print('Hashtags: {}'.format(hashtags))
                print('Sentiment: {:.3f}'.format(polarity))
                print(obj['text'])
    except:
        print('------------------------------------------------------------------------')
        print('Error: {}'.format(sys.exc_info()))

twitter = TwitterClient()
twitter.stream('', geo = True, broadcast = extract, count = 100)


-- in -------------------------------------------
Words: ['awak', 'pernah', 'pakai', 'peeling', 'produk', 'bila', 'stop', 'muka', 'jadi', 'breakout', 'jom', 'detox', 'muka', 'awak', 'dengan', 'mary', 'kay', 'botani']
Emoji: ['👉']
Hashtags: []
Sentiment: 1.000
👉Awak pernah pakai peeling produk,  bila stop muka jadi breakout? Jom detox muka awak dengan MARY KAY,  100% botani… https://t.co/SFAoKGfMTt

-- ja -------------------------------------------
Words: ['水餃子も大好きですね']
Emoji: ['💗']
Hashtags: []
Sentiment: 1.000
@sezuna1168 水餃子も大好きですね💗

-- und -------------------------------------------
Words: []
Emoji: ['♀', '\U0001f937']
Hashtags: []
Sentiment: 0.261
@TanevinG 🤷🏻‍♀️🤷🏻‍♀️🤷🏻‍♀️

-- en -------------------------------------------
Words: ['kingsmoovy', 'sorry', 'hear', 'man', 'condolences']
Emoji: []
Hashtags: []
Sentiment: -0.500
@Small @KingSmoovy Sorry to hear that man, My condolences.

-- en -------------------------------------------
Words: ['dream', 'team', 'happy', 'birthday', 'deb