# Emoji Sentiment
To estimate emoji sentiment we run one query with a set of strongly positive keywords and another with a set of strongly negative keywords, then calculate, for each emoji appearing in the results, the share of positive and negative occurrences ( $\frac{positive + negative}{total} = 1$ ). To avoid mixed signals we only consider short messages ( fewer than 10 words ). Scores are stored in a relational database ( here MySQL ) and updated each time the job runs.
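
As a worked instance of this scoring, take the counts that appear in the run further down, where 😂 is seen 7 times with the positive keyword and once with the negative one. The symbols $p$, $n$ and $s$ are introduced here only for illustration; $s$ is the sentiment value printed in the summary cell:

$$
p = \frac{positive}{total} = \frac{7}{8} = 0.875, \qquad
n = \frac{negative}{total} = \frac{1}{8} = 0.125, \qquad
p + n = 1, \qquad
s = p - n = \frac{positive - negative}{total} = \frac{6}{8} = 0.75
$$

Since $p + n = 1$, the sentiment $s$ always falls between -1 ( only negative occurrences ) and 1 ( only positive occurrences ).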

Let's get the [full emoji list](https://unicode.org/emoji/charts/full-emoji-list.html) and [import it into a local database with sentiment scores initialized to 0](../python-api/README.ipynb).

In [1]:
import os
import re
import sys
import time
import json
import client

score = {}
with open('emo-list.txt','r') as source:
    score = { chr(int(c)):{ 'total':0, 'positive':0, 'negative':0 }
              for c in source.read().strip().split() }

labels = ('positive','negative')

options = [
    ('best','worst'),
    # TODO
    ('success','failure')]

insert = """
INSERT INTO emoji (code, <label>, total)
    VALUES('{}', {}, {})
    ON DUPLICATE KEY UPDATE
    <label> = <label> + VALUES(<label>),
    total = total + VALUES(total);
    """

update = """
UPDATE emoji
    SET sentiment = (positive - negative)/total
    WHERE total > 0;
    """

def extract(label):
    # build a stream callback that counts emoji under the given label
    def func(data):
        try:
            obj = json.loads(data)
            # only short messages ( fewer than 10 words ) to avoid mixed signal
            if 'text' in obj and len(obj['text'].split()) < 10:
                # a set, so each emoji is counted once per message
                emo = { c for c in obj['text'].lower() if c in score }
                for e in emo:
                    score[e]['total'] += 1
                    score[e][label] += 1
                    print('{}\tPositive: {}\tNegative: {}\tTotal: {}'\
                        .format(e, score[e]['positive'], score[e]['negative'], score[e]['total']))
        except Exception:
            # report and skip malformed messages, but let Ctrl-C through
            print('Error: {}'.format(sys.exc_info()))

    return func

twitter = client.TwitterClient()
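
The `insert` template above mixes two kinds of placeholders: `<label>` for the column name and `{}` for the values. Below is a minimal sketch of how it could be rendered into statements. It assumes the `code` column holds the integer code point stored in `emo-list.txt`, and that a run for one label adds that label's count to `total`. `render_inserts` is a hypothetical helper, not part of the job, and the `sample` counts are made up for illustration.

```python
# Same template as in the cell above.
insert = """
INSERT INTO emoji (code, <label>, total)
    VALUES('{}', {}, {})
    ON DUPLICATE KEY UPDATE
    <label> = <label> + VALUES(<label>),
    total = total + VALUES(total);
    """

labels = ('positive', 'negative')


def render_inserts(score, label):
    """Return one INSERT statement per emoji counted under `label`.

    Hypothetical helper: assumes `code` is the integer code point and that
    the increment for `total` equals the increment for `label`.
    """
    if label not in labels:
        # the label becomes a column name, so only known values are accepted
        raise ValueError('unknown label: {}'.format(label))
    template = insert.replace('<label>', label)
    return [template.format(ord(e), s[label], s[label])
            for e, s in score.items() if s[label] > 0]


# Illustrative counts only.
sample = {
    '😂': {'total': 8, 'positive': 7, 'negative': 1},
    '🗽': {'total': 1, 'positive': 1, 'negative': 0},
}

for statement in render_inserts(sample, 'negative'):
    print(statement)
```

Only emoji with a non-zero count for the label produce a statement, so the negative pass over `sample` yields a single row. With a real database driver the values would normally be passed as query parameters rather than formatted into the string; the column name still has to be substituted from a fixed list as done here.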

### Positive

In [2]:
# get 1000 tweets and extract ones with emoji
twitter.stream('best', broadcast = extract('positive'), count = 1000)

😍	Positive: 1	Negative: 0	Total: 1
🗽	Positive: 1	Negative: 0	Total: 1
💍	Positive: 1	Negative: 0	Total: 1
😂	Positive: 1	Negative: 0	Total: 1
🤷	Positive: 1	Negative: 0	Total: 1
♀	Positive: 1	Negative: 0	Total: 1
💕	Positive: 1	Negative: 0	Total: 1
‼	Positive: 1	Negative: 0	Total: 1
🙏	Positive: 1	Negative: 0	Total: 1
🙌	Positive: 1	Negative: 0	Total: 1
😩	Positive: 1	Negative: 0	Total: 1
😂	Positive: 2	Negative: 0	Total: 2
🐐	Positive: 1	Negative: 0	Total: 1
📽	Positive: 1	Negative: 0	Total: 1
😂	Positive: 3	Negative: 0	Total: 3
🏴	Positive: 1	Negative: 0	Total: 1
💕	Positive: 2	Negative: 0	Total: 2
😍	Positive: 2	Negative: 0	Total: 2
💯	Positive: 1	Negative: 0	Total: 1
😭	Positive: 1	Negative: 0	Total: 1
😂	Positive: 4	Negative: 0	Total: 4
😂	Positive: 5	Negative: 0	Total: 5
😍	Positive: 3	Negative: 0	Total: 3
😩	Positive: 2	Negative: 0	Total: 2
😛	Positive: 1	Negative: 0	Total: 1
💕	Positive: 3	Negative: 0	Total: 3
❤	Positive: 1	Negative: 0	Total: 1
😭	Positive: 2	Negative: 0	Total: 2
😍	Positive: 4	Negati

### Negative

In [3]:
twitter.stream('worst', broadcast = extract('negative'), count = 1000)

🙃	Positive: 1	Negative: 1	Total: 2
♂	Positive: 0	Negative: 1	Total: 1
🤦	Positive: 0	Negative: 1	Total: 1
😩	Positive: 3	Negative: 1	Total: 4
🤔	Positive: 0	Negative: 1	Total: 1
😩	Positive: 3	Negative: 2	Total: 5
😂	Positive: 7	Negative: 1	Total: 8
💯	Positive: 2	Negative: 1	Total: 3
😩	Positive: 3	Negative: 3	Total: 6
😩	Positive: 3	Negative: 4	Total: 7


In [4]:
for e in sorted(score, key = lambda e: score[e]['total'], reverse = True):
    if score[e]['total'] > 0:
        positive = score[e]['positive']/score[e]['total']
        negative = score[e]['negative']/score[e]['total']
        print('{}\tTotal: {}\tPositive: {:.2f}\tNegative: {:.2f}\tSentiment: {:.2f}'\
            .format(e, score[e]['total'], positive, negative, positive - negative))

😂	Total: 8	Positive: 0.88	Negative: 0.12	Sentiment: 0.75
😩	Total: 7	Positive: 0.43	Negative: 0.57	Sentiment: -0.14
😍	Total: 5	Positive: 1.00	Negative: 0.00	Sentiment: 1.00
🙌	Total: 3	Positive: 1.00	Negative: 0.00	Sentiment: 1.00
💕	Total: 3	Positive: 1.00	Negative: 0.00	Sentiment: 1.00
❤	Total: 3	Positive: 1.00	Negative: 0.00	Sentiment: 1.00
💯	Total: 3	Positive: 0.67	Negative: 0.33	Sentiment: 0.33
😭	Total: 2	Positive: 1.00	Negative: 0.00	Sentiment: 1.00
🙃	Total: 2	Positive: 0.50	Negative: 0.50	Sentiment: 0.00
🤘	Total: 1	Positive: 1.00	Negative: 0.00	Sentiment: 1.00
💖	Total: 1	Positive: 1.00	Negative: 0.00	Sentiment: 1.00
♂	Total: 1	Positive: 0.00	Negative: 1.00	Sentiment: -1.00
♀	Total: 1	Positive: 1.00	Negative: 0.00	Sentiment: 1.00
🙏	Total: 1	Positive: 1.00	Negative: 0.00	Sentiment: 1.00
💀	Total: 1	Positive: 1.00	Negative: 0.00	Sentiment: 1.00
‼	Total: 1	Positive: 1.00	Negative: 0.00	Sentiment: 1.00
📷	Total: 1	Positive: 1.00	Negative: 0.00	Sentiment: 1.00
🏴	Total: 1	Positive: 1.00	Neg

This is the data we are going to store and update with each run. Each run uses a pair of strongly polar keywords, preferably short and frequent, where negation with *not* is unlikely: wonderful terrible; meilleur pire; erfolg fehler; tesoro basura; ispiratore depressiva; люблю ненавижу; ...

In [5]:
!cat emo-job.py

#!/var/python

import os
import re
import sys
import time
import json
import client
from datetime import datetime

score = {}
with open('emo-list.txt','r') as source:
    score = { chr(int(c)):{ 'total':0, 'positive':0, 'negative':0 }
              for c in source.read().strip().split() }

labels = ('positive','negative')
stats = { 'total':0 }

options = [
    ('best','worst'),
    ('awesome','awful'),
    ('wonderful','terrible'),
    ('success','failure'),
    ('perfect','fault'),
    ('tesoro','basura'),
    ('люблю','ненавижу')]

insert = """
INSERT INTO emo_sent (code, span, <label>, total)
    VALUES('{}', '{}', {}, {})
    ON DUPLICATE KEY UPDATE
    <label> = <label> + VALUES(<label>),
    total = total + VALUES(total);
    """

update = """
UPDATE emo_sent a INNER JOIN (
    SELECT span, IF(negative > 0 AND positive > 0, positive/negative, 1) as ratio
    FROM emo_job_stats) as b USING(span)
SET a.sentiment = (a.positive - a.negative * b.

In [6]:
!cat emo-job.sh

#!/bin/bash

########################################################################
# job calculating current and updating existing emoji sentiment score
# run: emo-job.sh <MINUTES RUN>
########################################################################
source ~/.local.cnf
cd ~/projects/twitter/python

# numeric test: inside [ ] a bare '>' is a redirection, not a comparison
if [ "${1:-0}" -gt 0 ] 2>/dev/null; then
    timer=$1
else
    timer=5
fi

((opt = (RANDOM % 10)))

mysql -u root -p$MYSQL_ROOT_PASS -B --disable-column-names -e \
"SELECT code FROM $DATABASE.emoji WHERE composite = 1" > emo-list.txt

echo "----- Positive index: $opt ----- Run: $timer minutes ------------"
python emo-job.py $timer 0 $opt
mysql -u root -p$MYSQL_ROOT_PASS $DATABASE < emo-job-update.sql

echo "----- Negative index: $opt ----- Run: $timer minutes ------------"
python emo-job.py $timer 1 $opt
mysql -u root -p$MYSQL_ROOT_PASS $DATABASE < emo-job-update.sql

mysql -u root -p$MYSQL_ROOT_PASS -B --disable-column-names -e "
SELECT a.chars, b.sentiment FROM emoji a

The data is available at the [Project data API](../python-api/README.ipynb). In our emo-sent job we use runtime as a proxy for the number of messages; however, as it turns out, positive content strongly dominates, with a ratio of about 10 to 1. For that reason we also track this ratio and use its current value in the sentiment calculation.
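
To illustrate why the ratio matters, here is a minimal sketch of one possible correction. It is an assumption, not the formula used by the job: negative counts are weighted by the overall positive/negative ratio before the usual difference is taken, and the result is normalised by the weighted total. The fallback to 1 when either side is empty mirrors the `IF(...)` in the update query above. The names `balance_ratio` and `balanced_sentiment` are hypothetical.

```python
def balance_ratio(positive_total, negative_total):
    """Overall positive/negative ratio; 1 when either side is empty."""
    if positive_total > 0 and negative_total > 0:
        return positive_total / negative_total
    return 1.0


def balanced_sentiment(positive, negative, ratio):
    """Sentiment in [-1, 1] after weighting negative counts by `ratio`.

    Assumed normalisation: positive count plus weighted negative count.
    """
    weighted = negative * ratio
    mass = positive + weighted
    if mass == 0:
        return 0.0
    return (positive - weighted) / mass


# Illustrative totals with positive content dominating about 10 to 1.
ratio = balance_ratio(1000, 100)

print(balanced_sentiment(10, 1, ratio))  # 0.0: matches the background ratio
print(balanced_sentiment(10, 0, ratio))  # 1.0: never seen with a negative keyword
```

Under this weighting an emoji that appears 10 times with positive keywords and once with negative ones is scored as neutral, because that split is exactly what the background ratio predicts. Without the correction the same counts would give a sentiment of about 0.82.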