# Twitch Chat Analyzer

<p align="left"><img src="./img/me.jpeg" width="300"></p>

Developed by [Danilo Santitto](https://github.com/Warcreed)

## Project Goal

The main goal of this project is to provide a useful tool for keeping track of events related to live chat on Twitch using **Sentiment Analysis**.

## What is Twitch?

<img src="./img/logo twitch.png">

**Twitch** is a video live streaming service operated by Twitch Interactive, a subsidiary of **Amazon**. It was introduced in June 2011 as a spin-off of the streaming platform **Justin.tv**.

## Mods where are you??

**Moderators** are the people a streamer relies on to prevent their chat from becoming a jungle of frustrated **monkeys**.

<img src="./img/monkey_typewriter.jpg">

This tool is intended to help moderators and streamers keep track of the interactions between the streamer and their audience by using Sentiment Analysis.

## Project structure

<img src="./img/twitch_chat _analyzer_workflow.svg">

## Data Source

<img src="./img/sample_chat.png" width="300">

## IRC (Internet Relay Chat)

Internet Relay Chat (IRC) is a real-time text messaging protocol. It allows full-duplex communication between two users as well as simultaneous conversations among groups of people gathered in discussion "rooms", called "channels".

[Twitch IRC](https://dev.twitch.tv/docs/irc)

<img src="./img/IRChat1.png" width="300">

<img src="./img/IRCNetwork.png">

IRC is an open network protocol that runs over TCP, optionally secured with Transport Layer Security (TLS). An IRC server, often called an IRCd, can also connect to other IRC servers, forming a communication network that users access through a client. Many IRC servers do not require users to authenticate, but each user must choose a nickname that is unique across the IRC network.
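Once a client has joined a channel, each chat message arrives as a single `PRIVMSG` line. The sketch below shows one way to pull the nickname, channel and text out of such a line. It is not the project's ingestion code: the `ChatMessage` type, the `parse_privmsg` helper and the sample line are hypothetical, and the regular expression assumes the common `:nick!user@host PRIVMSG #channel :text` layout, with an optional leading `@tags` block.

```python
import re
from typing import NamedTuple, Optional


class ChatMessage(NamedTuple):
    """Hypothetical container for the fields we care about."""
    nickname: str
    channel: str
    text: str


# Assumed layout: [@tags ]:<nick>!<user>@<host> PRIVMSG #<channel> :<text>
_PRIVMSG = re.compile(
    r"^(?:@\S+ )?:(?P<nick>[^!\s]+)!\S+ PRIVMSG #(?P<channel>\S+) :(?P<text>.*)$"
)


def parse_privmsg(line: str) -> Optional[ChatMessage]:
    """Return the parsed chat message, or None for any other IRC line."""
    match = _PRIVMSG.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return ChatMessage(match["nick"], match["channel"], match["text"])


raw = ":alice!alice@alice.tmi.twitch.tv PRIVMSG #somechannel :PogChamp what a play"
print(parse_privmsg(raw))
print(parse_privmsg("PING :tmi.twitch.tv"))  # not a chat message -> None
```

Lines that are not chat messages, such as keep-alive `PING` lines, return `None` so the caller can handle them separately.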

## Data Ingestion
Data ingestion is the process of obtaining and importing data for immediate use or storage in a database.

<img src="./img/data_ingestion.png" width="500">

## IRC PircBotX Demo

## Kafka Connect

Kafka Connect, an open source component of Apache Kafka®, is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems. 

<img src="./img/connect_vs_flume.jpg">

## Why Kafka Connect?

Mainly because it fits naturally into the Kafka ecosystem, which this project already uses. I wanted to explore it further!

When configured correctly, both Apache Kafka and Apache Flume are highly reliable and can offer zero data loss guarantees.

## Custom Kafka Connector Code

## Data Streaming
Streaming data is data that is continuously generated by different sources.

<img src="./img/data_streaming.png" width="700">

Originally developed at LinkedIn, **Apache Kafka** is an open source, community-driven distributed event streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on the abstraction of a distributed commit log.
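To make the commit log idea concrete, here is a toy, single-process model in plain Python. It is not the Kafka API: the `CommitLog` class and the consumer names are hypothetical, and replication, partitions and persistence are left out. It only illustrates that records are appended in order, and that each consumer reads from its own offset without removing anything from the log.

```python
from typing import Dict, List


class CommitLog:
    """Toy append-only log with per-consumer read offsets (illustrative only)."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}

    def append(self, record: str) -> int:
        """Append a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer: str, max_records: int = 10) -> List[str]:
        """Return the next records for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = CommitLog()
for text in ["hello chat", "PogChamp", "gg"]:
    log.append(text)

print(log.poll("sentiment-job", max_records=2))  # ['hello chat', 'PogChamp']
print(log.poll("sentiment-job"))                 # ['gg']
print(log.poll("archiver"))                      # ['hello chat', 'PogChamp', 'gg']
```

Because reading does not consume records, a second consumer can replay the whole log independently, which is the main difference from a classic message queue.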

## Data Processing
Data processing takes raw data and transforms it to enrich it or give it new meaning.

<img src="./img/data_processing.png" width="700">

## Spark, Spark Streaming and Spark SQL

Spark is a unified analytics engine for large-scale data processing. It can analyze large volumes of data using distributed systems. It is fast because it performs operations in memory, and it is fault-tolerant.

<img src="./img/spark_streaming_sql.png" width="700">

## Spark MLlib.... or not?

Not having found a good dataset to train my system with, I had to look [somewhere else](https://www.youtube.com/watch?v=-bzWSJG93P8)...

<img src="./img/darth_vader.jpg" width="600">

<img src="./img/vader_vs_mllib.jpg">

## Vader Sentiment Analysis

VADER (**V**alence **A**ware **D**ictionary and s**E**ntiment **R**easoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. VADER ships with a ready-made lexicon and rule set, so it can be used without training a model. A **lexicon** is a list of lexical features (e.g., words) which are generally labelled according to their semantic orientation as either positive or negative.

## Vader Demo Live

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

print(analyzer.polarity_scores("This is a bad day to go hiking on Etna"))
```

```
{'neg': 0.28, 'neu': 0.72, 'pos': 0.0, 'compound': -0.5423}
```
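The `compound` value is the summed valence of the matched words squashed into the range -1 to 1. The sketch below reproduces that normalization outside the library, assuming the formula `x / sqrt(x² + alpha)` with `alpha = 15` used by the `vaderSentiment` package. It also assumes that "bad" is the only lexicon hit in the sentence above and that its valence is -2.5, which is consistent with the score shown. The standalone `normalize` function here is a re-implementation for illustration, not a call into the library.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1] (assumed VADER formula)."""
    value = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, value))


# Assumed single lexicon hit "bad" with valence -2.5
print(round(normalize(-2.5), 4))  # -0.5423
```

The same formula explains why a word with a small valence yields a compound score close to zero: the constant `alpha` dominates the denominator.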


## Vader Demo Live with custom classes

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def get_sentiment_analyzer_en(phrase):
    """Map VADER polarity scores for an English phrase to a custom sentiment class."""
    polarity = analyzer.polarity_scores(phrase)
    if polarity["compound"] >= 0.05:
        # Overall positive: strong, mixed with some negativity (ironic), or plain.
        if polarity['pos'] - polarity["neu"] > 0.1:
            return 'very_positive'
        elif 0 <= abs(polarity['pos'] - polarity["neu"]) <= 0.6:
            if polarity['neg'] > 0.05:
                return 'ironic'
        return 'positive_opinion'
    elif polarity["compound"] <= -0.05:
        # Overall negative: strong, mixed with some positivity (ironic), or plain.
        if polarity['neg'] - polarity["neu"] > 0.1:
            return 'very_negative'
        elif 0 <= abs(polarity['neg'] - polarity["neu"]) <= 0.6:
            if polarity['pos'] > 0.05:
                return 'ironic'
        return 'negative_opinion'
    else:
        # Near-zero compound: positive and negative words cancel out, or few hits.
        if polarity["pos"] > 0 and polarity["neg"] > 0:
            return "ironic"
        elif polarity['neu'] - polarity["pos"] < 0.4:
            return "positive_opinion"
        elif polarity['neu'] - polarity["neg"] < 0.4:
            return 'negative_opinion'
        return 'neutral_opinion'

# Example: a negative event ("banned") combined with a positive emoticon (":D")
print(get_sentiment_analyzer_en("trump got banned from twitch :D"))
```

```
ironic
```
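For comparison, the conventional way to read a VADER result is a plain three-way split on the `compound` score, using the same ±0.05 thresholds as the outer branches of the function above. A minimal sketch follows; the function name `classify_compound` is hypothetical. The custom classes refine this baseline by also looking at the `pos`, `neg` and `neu` proportions.

```python
def classify_compound(compound: float) -> str:
    """Baseline three-way split on the compound score (thresholds of +/-0.05)."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


print(classify_compound(-0.5423))  # negative
print(classify_compound(0.0))      # neutral
```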


<img src="./img/palpatine.jpg">

## Vader Demo Live with custom lexicon

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

twitch_emotes = {
    '<3': 0.4,
    '4head': 1,
    'babyrage': -0.7,
    'biblethump': -0.7,
    'blessrng': 0.3,
    'bloodtrail': 0.7,
    'coolstorybob': -1,
    'residentsleeper': -1,
    'kappa': 0.3,
    'lul': -0.1,
    'pogchamp': 1.5,
    'heyguys': 1,
    'wutface': -1.5,
    'kreygasm': 1,
    'seemsgood': 0.7,
    'kappapride': 0.7,
    'feelsgoodman': 1,
    'notlikethis': -1
}

analyzer = SentimentIntensityAnalyzer()
analyzer.lexicon.update(twitch_emotes)

print(analyzer.polarity_scores("kappa"))
```

```
{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.0772}
```


## Data Indexing and Elasticsearch

Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene.

<img src="./img/data_indexing.png" width="500">

## Data Visualization and Kibana

Kibana is an open source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of the content indexed on an Elasticsearch cluster. Users can create bar, line and scatter plots, or pie charts and maps on top of large volumes of data.

<img src="./img/sample_dashboard_kibana.png" width="900">

# Thanks for your attention 