# <font color=blue><center>Live Dashboard - Twitter Sentiment Analysis</center></font>
## Agenda
### Architecture
- Overview of data flow
- Tech Stack
- End result

### Environment Setup
- AWS EC2 instance and security group creation
- Docker installation and running
- Usage of docker-compose and starting all the tools
- How to access tools in local machine

### Model Creation
- Dataset exploration and bucketizing
- Stratified sampling and dataset splitting
- Feature extraction and pipeline creation
- Model training and evaluation
- Saving model and application

### Extraction
- Streaming data from Twitter API using NiFi
- Creating Kafka topic and publishing messages to it

### Transformation and Load
- Schema extraction from the stream of Tweets
- Reading data from Kafka as Streaming Dataframe
- Extraction and cleansing of Twitter data
- Sentiment analysis of tweets
- Continuous data load to MongoDB

### Visualization
- Scatter graph and table definitions in Python Dash, refreshed at intervals
- Graph and table app callbacks

### Code walkthrough
- Schema Generator
- SentimentAnalyzer
- StreamListener
- SentimentVisualizer

## <font color=blue>Architecture</font>
### Overview of data flow
#### Data Flow Architecture
![Data flow architecture](live_dashboard_-_twitter_sentiment_analysis.png)
### Tech Stack
* AWS EC2
* Docker
* Jupyter Lab
* Spark Structured Streaming and MLlib
* NiFi
* Kafka
* Python
* MongoDB
* Plotly
* Dash

### End result
#### NiFi Processor Setup
![NiFi processor setup](nifi.PNG)
#### Tweets Stored in MongoDB
![Tweets stored in MongoDB](mongo.PNG)
#### Live Dashboard
![Live dashboard](dash.PNG)

## <font color=blue>Environment Setup</font>
### AWS EC2 instance and security group creation
- t2.xlarge instance
- 32 GB of storage recommended
- Security group allowing inbound traffic on ports 4000-38888
- Connect to the EC2 instance via SSH
 <code>ssh -i "D:\path\to\private\key.pem" user@Public_DNS</code>
 <br/>Example:<code>ssh -i "D:\Users\pyerravelly\Desktop\twitter_analysis.pem" ec2-user@ec2-54-203-235-65.us-west-2.compute.amazonaws.com</code><br/>
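- Port forwarding (add one <code>-L local_port:localhost:remote_port</code> option per tool)
 <code>ssh -i "D:\path\to\private\key.pem" user@Public_DNS -L local_port:localhost:remote_port</code>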
 <br/>Example:<code>ssh -i "D:\Users\pyerravelly\Desktop\twitter_analysis.pem" ec2-user@ec2-34-208-254-29.us-west-2.compute.amazonaws.com -L 2081:localhost:2041 -L 4888:localhost:4888 -L 2080:localhost:2080 -L 8050:localhost:8050 -L 4141:localhost:4141</code><br/>
- Copy files from the local machine to EC2
  <code>scp -r -i "D:\path\to\private\key.pem" D:\path\to\local\folder user@Public_DNS:/path/to/remote/folder</code>
  <br/>Example:<code>scp -r -i "D:\Users\pyerravelly\Desktop\twitter_analysis.pem" D:\Users\pyerravelly\Downloads\spark-standalone-cluster-on-docker-master\build\docker\docker-exp ec2-user@ec2-34-208-254-29.us-west-2.compute.amazonaws.com:/home/ec2-user/docker_exp</code>

### Docker installation and running

- Commands to install Docker

<code>sudo yum update -y</code>
<br/><code>sudo yum install docker</code>
<br/><code>sudo gpasswd -a $USER docker</code>
<br/><code>newgrp docker</code>
<br/>Start Docker: <code>sudo systemctl start docker</code>
<br/>Stop Docker: <code>sudo systemctl stop docker</code>

### Usage of docker-compose and starting all the tools

- Commands to install docker-compose

<code>sudo curl -L "https://github.com/docker/compose/releases/download/1.29.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose</code>
<br/><code>sudo chmod +x /usr/local/bin/docker-compose</code>
<br/>Start all tools (run from the directory containing the compose file): <code>docker-compose up -d</code>

### How to access tools from the local machine
- List running Docker containers: <code>docker ps</code>
- Open a shell in a Docker container: <code>docker exec -i -t docker_kafka_1 bash</code>
- NiFi: http://localhost:2080/nifi/
- Mongo Express: http://localhost:4141/
- Jupyter Lab: http://localhost:4888/lab?

## <font color=blue>Model Creation</font>
### Classification
- Supervised machine learning algorithms that predict which category (class) an item belongs to
- Trained on examples described by features (the inputs) and a label (the known category)

### Dataset exploration and bucketizing
- Dataset: Amazon product reviews, http://jmcauley.ucsd.edu/data/amazon/
- Fields used:
  - reviewText: text of the review
  - overall: rating of the product
  - summary: summary of the review
- Bucketize the overall rating to label each review as positive or negative
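A minimal plain-Python sketch of the bucketizing rule, written out so the bucket boundaries are easy to see. It assumes the `overall` rating (1.0 to 5.0) is the bucketized column and that ratings of 4 and above count as positive; that threshold is an assumption, not taken from the original notebook. In Spark MLlib the same rule would be expressed with `pyspark.ml.feature.Bucketizer(splits=[-float("inf"), 4.0, float("inf")], inputCol="overall", outputCol="label")`.

```python
import bisect
import math

# Assumed split points: bucket 0 = [-inf, 4.0) -> negative, bucket 1 = [4.0, inf] -> positive.
SPLITS = [-math.inf, 4.0, math.inf]


def bucketize(value: float, splits: list[float] = SPLITS) -> float:
    """Return the bucket index of `value`, using Bucketizer's [lower, upper) rule."""
    if value == splits[-1]:
        # The last bucket also includes its upper bound.
        return float(len(splits) - 2)
    index = bisect.bisect_right(splits, value) - 1
    if index < 0 or index >= len(splits) - 1:
        raise ValueError(f"{value} is outside the split range")
    return float(index)


reviews = [{"overall": 5.0}, {"overall": 4.0}, {"overall": 3.0}, {"overall": 1.0}]
labels = [bucketize(r["overall"]) for r in reviews]
print(labels)  # [1.0, 1.0, 0.0, 0.0]
```

A two-bucket split keeps the task binary, which matches the BinaryClassificationEvaluator used later; a three-way split (for example, treating 3 as neutral) would need a multiclass evaluator instead.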

### Stratified sampling and dataset splitting
- Partition the data into homogeneous strata by label and sample each stratum so that negative labels are well represented and the model stays sensitive to them
- Split the stratified sample into 80% training data and 20% test data
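The sampling and splitting steps can be sketched in plain Python as follows. This is an illustration under assumptions: each row is a dict with a `label` key, the per-label `fractions` are hypothetical (the notebook's sampling ratios are not given), and the 80/20 split is done inside each stratum so train and test keep the same label mix. In PySpark, `DataFrame.sampleBy(col, fractions, seed)` and `DataFrame.randomSplit([0.8, 0.2], seed)` cover the same ground; note that `sampleBy` samples each row independently, so its stratum sizes are approximate, whereas this sketch draws exact counts.

```python
import random
from collections import defaultdict


def stratified_split(rows, fractions, label_key="label", train_frac=0.8, seed=42):
    """Sample each label stratum by its fraction, then split every stratum into train/test."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[label_key]].append(row)

    train, test = [], []
    for label, members in strata.items():
        # Exact number of rows drawn from this stratum; labels missing from `fractions` are dropped.
        k = round(len(members) * fractions.get(label, 0.0))
        sample = rng.sample(members, k)
        cut = int(len(sample) * train_frac)
        train.extend(sample[:cut])
        test.extend(sample[cut:])
    return train, test


# Hypothetical imbalanced data: 100 positive reviews, 20 negative reviews.
rows = [{"id": i, "label": 1.0} for i in range(100)] + [
    {"id": 100 + i, "label": 0.0} for i in range(20)
]
# Keep 20% of positives and all negatives so both classes end up equally represented.
train, test = stratified_split(rows, fractions={1.0: 0.2, 0.0: 1.0})
print(len(train), len(test))  # 32 8
```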

### Feature extraction and pipeline creation
- Tokenizing, removing stop words, TF-IDF (count vectorization, IDF) and logistic regression
- Create a pipeline to train the model
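To make the feature-extraction stages concrete, here is a plain-Python sketch of tokenizing, stop-word removal, term counting and IDF weighting. It uses the smoothed formula that Spark MLlib's `IDF` documents, idf(t) = log((m + 1) / (df(t) + 1)) with m the number of documents, but the stop-word list is a small hypothetical one. In the Spark notebook these stages would be `Tokenizer`, `StopWordsRemover`, `CountVectorizer` and `IDF`, followed by `LogisticRegression`, chained with `pyspark.ml.Pipeline(stages=[...])`.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (Spark's StopWordsRemover ships a full English list).
STOP_WORDS = {"a", "an", "and", "is", "it", "the", "this"}


def tokenize(text):
    """Lowercase and split on whitespace, like Spark's Tokenizer."""
    return text.lower().split()


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return (one {term: tf * idf} dict per document, the idf table)."""
    token_lists = [remove_stop_words(tokenize(doc)) for doc in docs]
    m = len(token_lists)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    # Smoothed IDF, same formula as Spark MLlib's IDF.
    idf = {term: math.log((m + 1) / (n + 1)) for term, n in doc_freq.items()}
    # Term frequency is a raw count, as CountVectorizer produces.
    vectors = [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return vectors, idf


docs = ["This product is great", "This product is bad", "great great value"]
vectors, idf = tf_idf(docs)
print(vectors[2])  # "great" is weighted by its count (2) times its idf
```

A term that appears in every document gets idf = log(1) = 0, which is how IDF down-weights words that carry no discriminating signal.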

### Model training and evaluation
- Train the model on the prepared training data
- Evaluate the model on the test data with BinaryClassificationEvaluator
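Spark's `BinaryClassificationEvaluator` reports area under the ROC curve by default (`metricName="areaUnderROC"`). As an illustration of what that number means, the sketch below computes AUC in plain Python as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, counting ties as half. It is a quadratic-time teaching version, not how Spark computes the metric.

```python
def area_under_roc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores (probability of the positive class) and true labels.
print(area_under_roc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUC of 0.5 means the model ranks no better than chance, and 1.0 means every positive review is scored above every negative one.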

### Saving model and application
- Save the classification model
- Example application of the saved model
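In Spark the fitted pipeline would typically be persisted with `model.write().overwrite().save(path)` and reloaded with `PipelineModel.load(path)`. The plain-Python sketch below only illustrates the save, reload and apply cycle for a logistic-regression model; the JSON format, file name, weights and vocabulary are all hypothetical, and the features are simple term values rather than a full TF-IDF vector.

```python
import json
import math
import os
import tempfile


def save_model(path, weights, intercept):
    """Persist hypothetical logistic-regression parameters as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"weights": weights, "intercept": intercept}, f)


def load_model(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict(model, features):
    """Return (probability of positive, label) for a sparse {term: value} feature dict."""
    z = model["intercept"] + sum(
        model["weights"].get(term, 0.0) * value for term, value in features.items()
    )
    # Numerically stable sigmoid: avoid math.exp overflow for large |z|.
    if z >= 0:
        probability = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        probability = e / (1.0 + e)
    return probability, 1.0 if probability >= 0.5 else 0.0


path = os.path.join(tempfile.mkdtemp(), "sentiment_model.json")
save_model(path, weights={"great": 1.5, "bad": -2.0}, intercept=0.1)
model = load_model(path)
print(predict(model, {"great": 1.0}))  # label 1.0 (positive)
print(predict(model, {"bad": 1.0}))    # label 0.0 (negative)
```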

## <font color=blue>Extraction</font>
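### NiFi
- Processor: a component that performs one task in the data flow, such as fetching tweets or publishing them to Kafka
- Connection: a queue that links the output of one processor to the input of the next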

### Twitter App Creation
- Go to https://developer.twitter.com/en/portal/projects-and-apps to create an app and generate its API keys and access tokens

### Streaming data from Twitter API using NiFi
- NiFi setup

### Kafka
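- Topic: a named stream of messages
- Publish: producers write messages to a topic
- Subscribe: consumers read messages from a topic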

### Creating Kafka topic and publishing messages to it
- Topic creation through the CLI
- Publishing tweets via NiFi

#### Commands
<code>docker ps</code> get the Kafka container name

<code>docker exec -i -t docker_kafka_1 bash</code> open a shell in the Kafka container

<code>kafka-topics.sh --create --topic tweets --partitions 1 --replication-factor 1 --if-not-exists --zookeeper zookeeper:2181</code> create a topic named tweets

<code>kafka-console-consumer.sh --bootstrap-server localhost:29092 --topic tweets --from-beginning --max-messages 30</code> read up to 30 messages from the beginning of the topic

## <font color=blue>Transformation and Load</font>
### Read Streaming Data and Cleansing
- Schema extraction from the stream of Tweets
- Reading data from Kafka as Streaming Dataframe
- Extraction and cleansing of Twitter data
- Sentiment analysis of each tweet
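A plain-Python sketch of the extraction and cleansing step for a single message. It assumes each Kafka message value is Twitter API v1.1 style JSON with `id_str`, `created_at`, `text` and `user.screen_name` fields; the cleaning rules (drop a leading `RT`, URLs and @mentions, keep hashtag words, collapse whitespace) are illustrative choices, not necessarily the notebook's. In the Spark job the same work would be done column-wise, for example with `pyspark.sql.functions.from_json` against the extracted schema and `regexp_replace`.

```python
import json
import re

RETWEET_RE = re.compile(r"^RT\s+")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+:?")  # also swallows the ':' after a retweeted handle
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Strip the retweet marker, URLs and mentions; keep hashtag words without '#'."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.replace("#", "")
    return WHITESPACE_RE.sub(" ", text).strip()


def extract_tweet(raw_message):
    """Parse one message value (a JSON string) into the fields used downstream."""
    tweet = json.loads(raw_message)
    return {
        "id": tweet["id_str"],
        "created_at": tweet["created_at"],
        "user": tweet.get("user", {}).get("screen_name"),
        "text": clean_text(tweet["text"]),
    }


# Hypothetical message, shaped like a v1.1 tweet.
raw = json.dumps({
    "id_str": "1",
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "text": "RT @user: Loving this #phone https://t.co/abc",
    "user": {"screen_name": "example_user"},
})
print(extract_tweet(raw)["text"])  # Loving this phone
```

Cleaning before scoring matters because the model was trained on review text, which contains no handles or short links; leaving them in would feed the pipeline tokens it has never seen.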

### Writing data to MongoDB
- Continuous data load to MongoDB
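Spark Structured Streaming can hand each micro-batch to user code through `writeStream.foreachBatch(func)`, where `func` receives the batch DataFrame and a batch id. Because a micro-batch may be replayed after a failure, an idempotent write keyed on the tweet id avoids duplicate documents; with pymongo that would be `collection.update_one({"_id": ...}, {"$set": doc}, upsert=True)`. The sketch below shows only that upsert logic, using a plain dict as a hypothetical stand-in for the MongoDB collection so it runs without a database; the function and field names are assumptions.

```python
def write_batch(collection, rows, batch_id):
    """foreachBatch-style writer: upsert every row by its tweet id.

    `collection` is a dict standing in for a MongoDB collection; merging the new
    fields into any existing document mirrors update_one(..., {"$set": doc}, upsert=True).
    """
    for row in rows:
        doc = {**row, "batch_id": batch_id}
        existing = collection.get(row["id"], {})
        collection[row["id"]] = {**existing, **doc}


# Hypothetical micro-batch of scored tweets.
tweets = {}
batch = [
    {"id": "1", "text": "love it", "sentiment": 1.0},
    {"id": "2", "text": "awful", "sentiment": 0.0},
]
write_batch(tweets, batch, batch_id=0)
write_batch(tweets, batch, batch_id=0)  # the same batch replayed after a restart
print(len(tweets))  # 2, not 4
```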

## <font color=blue>Visualization</font>
- Scatter graph and table definitions, refreshed at intervals, using Python Plotly and Dash
- Graph and table update callbacks in the Dash app
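In Dash, periodic refresh is usually driven by a `dcc.Interval` component whose `n_intervals` property is an `Input` to a callback that returns a new `figure` for a `dcc.Graph` and new `data` for a `dash_table.DataTable`. The sketch below keeps Dash itself out of the code and shows only a hypothetical function such a callback could call: it turns the most recent tweets (assumed to be dicts with `created_at`, `text` and a 0/1 `sentiment`, read from MongoDB oldest first) into a Plotly figure dict and table rows. Dash accepts a plain dict with `data` and `layout` keys as a figure.

```python
def build_dashboard(tweets, limit=20):
    """Return (figure_dict, table_rows) for the newest `limit` tweets (input ordered oldest first)."""
    recent = tweets[-limit:]
    colors = ["green" if t["sentiment"] == 1.0 else "red" for t in recent]
    figure = {
        "data": [
            {
                "type": "scatter",
                "mode": "markers",
                "x": [t["created_at"] for t in recent],
                "y": [t["sentiment"] for t in recent],
                "text": [t["text"] for t in recent],  # shown on hover
                "marker": {"color": colors},
            }
        ],
        "layout": {"title": "Live tweet sentiment", "yaxis": {"range": [-0.5, 1.5]}},
    }
    # Newest tweet first in the table.
    table_rows = [
        {
            "created_at": t["created_at"],
            "text": t["text"],
            "sentiment": "positive" if t["sentiment"] == 1.0 else "negative",
        }
        for t in reversed(recent)
    ]
    return figure, table_rows


sample = [
    {"created_at": "2021-05-01 10:00:00", "text": "love it", "sentiment": 1.0},
    {"created_at": "2021-05-01 10:00:05", "text": "awful", "sentiment": 0.0},
]
figure, rows = build_dashboard(sample)
print(rows[0])  # the newest tweet, labelled negative
```

In the app, a callback decorated with something like `@app.callback(Output("live-graph", "figure"), Output("live-table", "data"), Input("interval", "n_intervals"))` (component ids hypothetical) would query MongoDB and return `build_dashboard(...)`; keeping the formatting logic in a plain function makes it testable without a running server.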

## <font color=blue>Code walkthrough</font>
- Schema Generator
- SentimentAnalyzer
- StreamListener
- SentimentVisualizer