# Read a comma-separated (CSV) file: the Sentiment Analysis: Emotions in Text dataset

The code below reads the **Sentiment Analysis: Emotions in Text** dataset, a single CSV file of tweets, each annotated with one of 13 emotion labels (neutral, worry, happiness, sadness, love, surprise, fun, relief, hate, empty, enthusiasm, boredom, anger).

The code extracts the emotion label and the tweet text from each row and returns them as a list of dictionaries, which is later saved in JSON format.

The data originates from: https://www.crowdflower.com/wp-content/uploads/2016/07/text_emotion.csv

In [3]:
import csv

def read_twitter_emotions(file_name):
    """Read the Twitter Emotions in Text CSV file and return a list of {"text", "class"} dicts."""
    print("Reading", file_name)
    data = []
    # newline="" is what the csv module docs recommend; the with-block closes the file
    with open(file_name, "r", newline="") as csvfile:
        # csv.DictReader returns each row as a dictionary; if fieldnames are not given,
        # it takes the field names from the first line of the file
        for i, line in enumerate(csv.DictReader(csvfile, delimiter=",")):
            if i % 1000 == 999:
                print(i + 1, "tweets")
            # the original columns are tweet_id, sentiment, author and content
            one_example = {"text": line["content"], "class": line["sentiment"]}
            data.append(one_example)
    return data
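To see what `csv.DictReader` hands to the loop without downloading the dataset, the sketch below applies the same row-to-dictionary mapping to a tiny in-memory CSV. The two rows are made up for illustration; only the column names (`tweet_id`, `sentiment`, `author`, `content`) follow the comment in `read_twitter_emotions`. The helper name `rows_to_examples` is hypothetical.

```python
import csv
import io

# Hypothetical two-row sample with the same header as text_emotion.csv
sample_csv = (
    "tweet_id,sentiment,author,content\n"
    '1,worry,user_a,"Stuck in traffic, going to be late"\n'
    "2,happiness,user_b,Finally finished my exams!\n"
)

def rows_to_examples(csvfile):
    """Map each CSV row to {"text": ..., "class": ...}, like read_twitter_emotions does."""
    # No fieldnames given, so DictReader uses the first line as the header
    reader = csv.DictReader(csvfile)
    return [{"text": row["content"], "class": row["sentiment"]} for row in reader]

examples = rows_to_examples(io.StringIO(sample_csv))
for ex in examples:
    print(ex)
```

The first tweet is quoted because it contains a comma; the csv module handles this correctly, which a plain `line.split(",")` would not.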

In [4]:
data = read_twitter_emotions("data/text_emotion.csv")
print("Examples:", len(data))
print("First example:", data[0])

Reading data/text_emotion.csv
1000 tweets
2000 tweets
3000 tweets
4000 tweets
5000 tweets
6000 tweets
7000 tweets
8000 tweets
9000 tweets
10000 tweets
11000 tweets
12000 tweets
13000 tweets
14000 tweets
15000 tweets
16000 tweets
17000 tweets
18000 tweets
19000 tweets
20000 tweets
21000 tweets
22000 tweets
23000 tweets
24000 tweets
25000 tweets
26000 tweets
27000 tweets
28000 tweets
29000 tweets
30000 tweets
31000 tweets
32000 tweets
33000 tweets
34000 tweets
35000 tweets
36000 tweets
37000 tweets
38000 tweets
39000 tweets
40000 tweets
Examples: 40000
First example: {'text': '@tiffanylue i know  i was listenin to bad habit earlier and i started freakin at his part =[', 'class': 'empty'}


### Label statistics: how many times each label appears in the data

In [5]:
from collections import Counter

label_counter = Counter()
for example in data:
    # Counter.update() expects an iterable of items, hence the one-element list.
    # Building the counter in a single call would be more efficient:
    # label_counter = Counter([item["class"] for item in data])
    label_counter.update([example["class"]])
print("Labels:", label_counter.most_common(20))

Labels: [('neutral', 8638), ('worry', 8459), ('happiness', 5209), ('sadness', 5165), ('love', 3842), ('surprise', 2187), ('fun', 1776), ('relief', 1526), ('hate', 1323), ('empty', 827), ('enthusiasm', 759), ('boredom', 179), ('anger', 110)]
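The comment in the cell above suggests building the counter in one call. A minimal sketch on a small hypothetical list in the same format as `data` confirms that both forms give the same counts, and adds each label's share of the examples, which makes the class imbalance (8638 `neutral` versus 110 `anger` in the real data) easier to read.

```python
from collections import Counter

# Hypothetical stand-in for `data`; the real list has 40000 entries
data = [
    {"text": "t1", "class": "neutral"},
    {"text": "t2", "class": "worry"},
    {"text": "t3", "class": "neutral"},
    {"text": "t4", "class": "anger"},
]

# Incremental version, as in the notebook cell
label_counter = Counter()
for example in data:
    label_counter.update([example["class"]])

# Single-call version suggested in the comment (a generator works as well as a list)
label_counter_fast = Counter(item["class"] for item in data)
assert label_counter == label_counter_fast

# Print each label with its absolute count and its share of all examples
total = sum(label_counter.values())
for label, count in label_counter.most_common():
    print(f"{label:<10} {count:>5}  {count / total:.1%}")
```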


--> Open question: does the label "empty" mean an empty feeling, or an empty (missing) annotation?
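One way to approach this question is to read a random handful of tweets carrying the label and judge them by eye. The sketch below collects texts for one label from a hypothetical sample in the `data` format; `examples_with_label` is an illustrative helper, not part of the notebook. On the real `data`, calling it with `"empty"` would return tweets like the `@tiffanylue` example printed earlier.

```python
import random

def examples_with_label(data, label, k=5, seed=0):
    """Return up to k randomly chosen texts whose class equals `label` (hypothetical helper)."""
    texts = [ex["text"] for ex in data if ex["class"] == label]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(texts, min(k, len(texts)))

# Hypothetical sample in the same format as `data`
data = [
    {"text": "feeling so empty today", "class": "empty"},
    {"text": "lunch was great", "class": "happiness"},
    {"text": "@someone ok", "class": "empty"},
]

for text in examples_with_label(data, "empty"):
    print(text)
```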

### Save data into JSON for later use

In [6]:
import json

print(data[0])
with open("data/text_emotion.json", "wt") as f:
    json.dump(data, f, indent=2)  # indent: data will be pretty-printed with that indent level

{'text': '@tiffanylue i know  i was listenin to bad habit earlier and i started freakin at his part =[', 'class': 'empty'}
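For later use, `json.load` reads the file back into the same list of dictionaries. The sketch below writes a small hypothetical sample to a temporary directory (standing in for `data/text_emotion.json`) and checks that the round trip preserves it.

```python
import json
import os
import tempfile

# Hypothetical sample in the same format as `data`
data = [
    {"text": "Stuck in traffic, going to be late", "class": "worry"},
    {"text": "Finally finished my exams!", "class": "happiness"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "text_emotion.json")  # stand-in for data/text_emotion.json
    with open(path, "wt") as f:
        json.dump(data, f, indent=2)
    with open(path, "rt") as f:
        loaded = json.load(f)

assert loaded == data  # the list of {"text", "class"} dicts survives unchanged
print("Reloaded", len(loaded), "examples")
```

By default `json.dump` escapes non-ASCII characters (`ensure_ascii=True`); `json.load` restores them, so the reloaded texts are identical either way.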
