# Sentiment analysis of Tweets

The directory `data/Tweets` contains a dataset of tweets by airline customers. Each tweet contains either a positive (Label=1) or a negative (Label=0) comment.

`train.csv` contains the annotated data, while `test.csv` contains only the tweet text.

Your task is to train a classifier that predicts the class of the tweets in `test.csv`. You should generate a comma-separated file `tweet_results.csv` with a **single column** named `Label` containing the predicted class of the corresponding tweet.
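A minimal sketch of writing a file in the required format, assuming the predictions are an array aligned row by row with `test.csv`. `write_results` is a hypothetical helper, not part of any library; it checks the length against the number of test rows and reads the file back to confirm that `Label` is the only column.

```python
import numpy as np
import pandas as pd


def write_results(predictions, n_test_rows, path="tweet_results.csv"):
    """Write one predicted label per test tweet as a single-column CSV.

    Hypothetical helper: `n_test_rows` should be len(test.csv) so that a
    misaligned prediction array is caught before the file is submitted.
    """
    predictions = np.asarray(predictions)
    if predictions.shape != (n_test_rows,):
        raise ValueError(
            f"expected {n_test_rows} predictions, got array of shape {predictions.shape}"
        )
    # Only the `Label` column is written; index=False avoids an extra unnamed column
    pd.DataFrame({"Label": predictions}).to_csv(path, index=False)
    # Read the file back to confirm the header is exactly ["Label"]
    written = pd.read_csv(path)
    if list(written.columns) != ["Label"]:
        raise ValueError(f"unexpected columns: {list(written.columns)}")
    return written
```

In this notebook it would be called as `write_results(predicted_classes, len(test_data))`.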

The test data was originally annotated, and the accuracy of your predictions will be evaluated against this ground truth.
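Assuming "accuracy" has its usual meaning, for $N$ test tweets with true labels $y_i$ and predicted labels $\hat{y}_i$ it is the fraction of correct predictions. Written with the confusion-matrix counts (true/false positives and negatives, with Label=1 as the positive class):

$$
\text{accuracy} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i = y_i\right] = \frac{TP + TN}{TP + TN + FP + FN}
$$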

This notebook should contain a full description of your experiments, the code to generate the classifier **and** the result file, as well as the rationale for your choices. If you prefer to split your work across different source files and/or notebooks, then use this notebook as a guide to the rest of your submitted material.

# Add your code below

***


import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Load training data
train_data = pd.read_csv('data/Tweets/train.csv')

# Feature extraction with TF-IDF vectorization (as in exercise 1, the whole training set is used to fit the vectorizer and the classifier)
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_data['text'])
y_train = train_data['Label']

# Train a linear-kernel SVM; a linear kernel is a common choice for high-dimensional, sparse TF-IDF features
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train, y_train)

# Load test data
test_data = pd.read_csv('data/Tweets/test.csv')

# Feature extraction for test data
X_test = vectorizer.transform(test_data['text'])

# Predict classes for test data
predicted_classes = svm_classifier.predict(X_test)

# Generate the results file: the task requires a single column named `Label`,
# so the tweet text is not written to the file
results_df = pd.DataFrame({'Label': predicted_classes})
results_df.to_csv('tweet_results.csv', index=False)
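The cell above fits the classifier on the whole of `train.csv`, so the notebook does not estimate how accurate the predictions are likely to be. A minimal sketch of a hold-out estimate follows. It assumes a 20% stratified validation split (an arbitrary choice) and keeps the same TF-IDF plus linear SVM configuration. `holdout_accuracy` is a hypothetical helper name. A scikit-learn pipeline is used so that the vectorizer is fitted only on the training portion and no vocabulary or IDF statistics leak from the validation tweets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def holdout_accuracy(texts, labels, test_size=0.2, random_state=0):
    """Estimate the accuracy of TF-IDF + linear SVM on a held-out split."""
    # Stratify so both splits keep the same positive/negative proportion
    X_tr, X_val, y_tr, y_val = train_test_split(
        texts, labels, test_size=test_size, random_state=random_state, stratify=labels
    )
    # The pipeline fits TfidfVectorizer on X_tr only, then the SVM on its output
    model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
    model.fit(X_tr, y_tr)
    return accuracy_score(y_val, model.predict(X_val))
```

Usage: `holdout_accuracy(train_data['text'], train_data['Label'])`. Once the settings are chosen, the final model can still be retrained on all of `train.csv`, as the cell above does. `sklearn.model_selection.cross_val_score` on the same pipeline would give a less split-dependent estimate.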


Comparing the predicted labels with the corresponding tweets in `test.csv` suggests that the classifier assigns the expected sentiment in most cases. This is a qualitative impression from manual inspection, not a measured accuracy. Some errors arise because words are used metaphorically or with a context-specific meaning, which a TF-IDF bag-of-words representation does not capture. These cases lead to false positives (negative tweets labelled 1) or false negatives (positive tweets labelled 0).
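To go beyond manual inspection, the false positives and false negatives on a labelled held-out split of `train.csv` can be counted and listed. A minimal sketch, assuming Label=1 is the positive class as stated in the task. `error_report` is a hypothetical helper; it uses scikit-learn's `confusion_matrix` and returns the misclassified tweets so they can be read directly.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix


def error_report(texts, y_true, y_pred):
    """Count and list misclassified tweets, treating Label=1 as the positive class."""
    # With labels=[0, 1], rows are true labels and columns are predicted labels,
    # so ravel() yields the cells in the order tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    frame = pd.DataFrame(
        {"text": list(texts), "true": list(y_true), "predicted": list(y_pred)}
    )
    # False positive: negative tweet (0) predicted as positive (1)
    false_pos = frame[(frame["true"] == 0) & (frame["predicted"] == 1)]
    # False negative: positive tweet (1) predicted as negative (0)
    false_neg = frame[(frame["true"] == 1) & (frame["predicted"] == 0)]
    counts = {"tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp)}
    return counts, false_pos, false_neg
```

It would be applied to a validation split, for example `error_report(X_val, y_val, model.predict(X_val))`, where `X_val`, `y_val` and `model` are hypothetical names for a held-out split from `train_test_split` and a classifier fitted on the remaining rows.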