# Project Part 1

B) An exploratory analysis section that has the histograms, charts, tables, etc. that are the output from your exploratory analysis.

## Introduction/Background

In this notebook, I cover the exploratory data analysis (EDA) portion of my natural language processing project,
which aims to train a deep learning model to recognize emotion. Using the _Lexicon/AFINN_
dataset, a list of English words, each rated for sentiment intensity
from -5 (__Most Negative__) to +5 (__Most Positive__), I will examine the depth of the data provided by
the _Statements_ dataset. The _Statements_ dataset is a compiled list of English statements,
many of which are not complete sentences, each labeled with the emotion that best fits it.

## Exploratory Data Analysis

* https://neptune.ai/blog/exploratory-data-analysis-natural-language-processing-tools
* https://regenerativetoday.com/exploratory-data-analysis-of-text-data-including-visualization-and-sentiment-analysis/
* https://medium.com/swlh/text-summarization-guide-exploratory-data-analysis-on-text-data-4e22ce2dd6ad  
* https://www.kdnuggets.com/2019/05/complete-exploratory-data-analysis-visualization-text-data.html  


In [16]:
import matplotlib, numpy, nltk
import sklearn, gensim, wordcloud
import textblob, spacy, textstat
import pandas as pd
import seaborn as sns
from tqdm import tqdm, tqdm_pandas

In [17]:
nltk.download('stopwords')
from nltk.corpus import stopwords

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\cjens\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [41]:
statements = pd.read_csv("dataset/Statements/Data-all.csv")
lexiconAFINN = pd.read_csv("dataset/Lexicons/afinn.csv")

In [42]:
statements.head(70)

Unnamed: 0,Statement,Emotion
0,a boyfriend with whom i split up with came ove...,anger
1,a certain friend tried to push me off a seat i...,anger
2,a father of children killed in an accident,sadness
3,a few monthe ago,anger
4,a friend of mine suggested that i become a fil...,joy
...,...,...
65,i acknowledge that i am not actually fat by de...,fear
66,i act as head of family when he is far too you...,joy
67,i acted like a little girl by acting cute towa...,anger
68,i acted withdrawn and cold towards others in s...,sadness


In [43]:
lexiconAFINN.head(5)

Unnamed: 0.1,Unnamed: 0,word,value
0,1,abandon,-2
1,2,abandoned,-2
2,3,abandons,-2
3,4,abducted,-2
4,5,abduction,-2
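
The `Unnamed: 0` columns in both tables look like a saved row index that was read back as data. The sketch below reproduces that effect on a hypothetical two-row frame (not the real lexicon file) and shows how `index_col=0` avoids it. That the project files were written this way is an assumption, not something the notebook confirms.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for the lexicon (illustration only)
toy = pd.DataFrame({"word": ["abandon", "abandoned"], "value": [-2, -2]})

# to_csv() writes the row index by default as an unnamed first column
csv_text = toy.to_csv()

# Reading the text back without index_col surfaces that column as "Unnamed: 0"
default_read = pd.read_csv(io.StringIO(csv_text))
print(list(default_read.columns))

# index_col=0 treats the first column as the index instead of as data
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed_read.columns))
```

If the real files were loaded this way, the column positions would shift, so any later positional call such as `statements.insert(2, ...)` would need its position adjusted.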


In [44]:
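# Map each lexicon word to its AFINN value
afinnDict = dict(zip(lexiconAFINN.word, lexiconAFINN.value))
# Score each statement as the sum of the AFINN values of its whitespace-separated words
stmtAfinnValues = statements.Statement.apply(lambda stmt: sum(afinnDict[w] for w in stmt.split() if w in afinnDict))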

In [45]:
stmtAfinnValues

0        1
1        0
2       -5
3        0
4        4
        ..
19995    0
19996    0
19997    3
19998    0
19999    4
Name: Statement, Length: 20000, dtype: int64
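
To make the scoring step concrete, here is a minimal sketch of the same idea on toy data. The three-word lexicon, the sample statements, and the helper name `afinn_score` are all hypothetical and chosen only for illustration. Each statement's score is the sum of the lexicon values of its whitespace-separated words, and words missing from the lexicon contribute 0.

```python
import pandas as pd

# Hypothetical toy lexicon and statements, for illustration only
toy_lexicon = {"happy": 3, "killed": -3, "accident": -2}
toy_statements = pd.Series([
    "a father of children killed in an accident",
    "i felt happy",
    "a few months ago",
])

def afinn_score(statement, lexicon):
    """Sum the lexicon values of the whitespace-separated words in a statement."""
    return sum(lexicon.get(word, 0) for word in statement.split())

# Apply the scorer to every statement; the result is a Series of integer scores
toy_scores = toy_statements.apply(lambda s: afinn_score(s, toy_lexicon))
print(toy_scores.tolist())
```

Because the match is an exact, case-sensitive lookup on whitespace-split tokens, a word with attached punctuation (for example `happy.`) would not be found. A score of 0 can therefore mean either a neutral statement or one with no lexicon words at all.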

In [46]:
type(stmtAfinnValues)

pandas.core.series.Series

In [47]:
statements.insert(2, "Intensity", stmtAfinnValues)
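
As a sketch of how the new column might be inspected, the example below inserts an intensity column into a hypothetical four-row frame and averages it per emotion label. The rows and values are made up and the per-emotion mean is only one possible summary, not a result from the real dataset.

```python
import pandas as pd

# Hypothetical rows shaped like the statements table; values are made up
toy = pd.DataFrame({
    "Statement": ["s1", "s2", "s3", "s4"],
    "Emotion": ["anger", "joy", "joy", "sadness"],
})
toy_intensity = pd.Series([-2, 4, 2, -5])

# insert(loc, column, value) adds the column in place at position loc
toy.insert(1, "Intensity", toy_intensity)
print(list(toy.columns))

# Mean intensity for each emotion label
mean_by_emotion = toy.groupby("Emotion")["Intensity"].mean()
print(mean_by_emotion.to_dict())
```

Note that `DataFrame.insert` modifies the frame in place and raises an error if the column already exists, so re-running the cell in the same session fails unless the column is dropped first.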