# Homework: Entity-Level Sentiment Analysis with Twitter Dataset

## What is the Twitter Dataset?

This dataset focuses on entity-level sentiment analysis of tweets. It aims to determine the sentiment expressed towards specific entities mentioned in Twitter messages.

## Motivation

Twitter is a platform where users share their opinions in real time. By analyzing these messages, we can gain insights into public perception of specific entities and into trends related to them.

## Problem Statement
The task is to predict the sentiment (Positive, Negative, or Neutral) that a message expresses toward a given entity. Messages labeled Irrelevant, meaning they do not concern the entity, are treated as Neutral.

## What Do We Expect from You in This Assignment?

We expect you to use Deep Learning and NLP techniques to analyze the messages in the dataset and correctly classify them into the sentiment categories (Positive, Negative, Neutral).

## Dataset

The dataset has been shared along with the homework *(twitter_training.csv)*.

## If you have any questions about the homework, you can contact us at the following e-mail addresses:

*   burcusunturlu@gmail.com
*   ozgeflzcn@gmail.com

## 1 - Import Libraries

Main libraries you will likely need to build your model (feel free to use any other libraries you find helpful):

*   pandas
*   NumPy
*   scikit-learn (sklearn)
*   NLTK
*   Keras

## 2 - Importing the Data (65 points)

## 2.1 - Loading the Data


*   Import the dataset from the file.


In [None]:
import pandas as pd

# Read the CSV file; it has no header row, so supply the column names
columns = ['tweet_id', 'entity', 'sentiment', 'tweet_content']
data = pd.read_csv('/content/twitter_training.csv', names=columns, header=None)

# Replace 'Irrelevant' sentiment with 'Neutral'
data['sentiment'] = data['sentiment'].replace('Irrelevant', 'Neutral')

# Look at your data
data.head()

## 2.2 - Exploratory Data Analysis (EDA) (20 points)

Please investigate your data as follows:
* Understand the classes. Visualize the distribution of sentiment classes within the dataset.
* Check the distributions of the other columns (e.g., tweets per entity).
* Check for null values.
* Drop unnecessary columns (e.g., unrelated metadata).
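
The checks above can be sketched roughly as follows. This is a minimal illustration on a made-up five-row frame, not the real dataset; the column names follow the loading cell, and treating `tweet_id` as the unnecessary column is an assumption you should verify yourself.

```python
import pandas as pd

# Toy stand-in for the real file (made-up rows); column names follow the loading cell.
data = pd.DataFrame({
    "tweet_id": [1, 2, 3, 4, 5],
    "entity": ["Microsoft", "Microsoft", "Nvidia", "Nvidia", "Amazon"],
    "sentiment": ["Positive", "Negative", "Neutral", "Positive", "Neutral"],
    "tweet_content": ["love it", "hate it", None, "great card", "ok"],
})

# Class balance: absolute counts and relative shares.
class_counts = data["sentiment"].value_counts()
class_share = data["sentiment"].value_counts(normalize=True)

# Missing values per column, then drop rows that have no tweet text.
null_counts = data.isnull().sum()
data = data.dropna(subset=["tweet_content"])

# Assumption: tweet_id carries no sentiment signal, so it is a candidate to drop.
data = data.drop(columns=["tweet_id"])

print(class_counts)
print(null_counts)
```

If matplotlib is installed, `class_counts.plot(kind="bar")` is one quick way to visualize the class distribution.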

## 2.3 - Data Preparation (25 points)

* Clean the tweets: remove irrelevant characters (e.g., URLs, mentions) and normalize the text (lowercasing, removing punctuation, etc.).
* Remove or keep stopwords, and justify your choice.
* Tokenize the tweets.
* Lemmatize the tokens.
* Vectorize the text.
* Analyze word counts and detect outliers.
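
A possible starting point for the cleaning and tokenization steps, using only the standard library. The `clean_tweet` and `tokenize` helpers and the tiny `STOPWORDS` set are hypothetical names introduced for illustration; in practice you would probably take the stopword list from `nltk.corpus.stopwords`.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list used only for this sketch.
STOPWORDS = {"the", "a", "an", "of", "is", "to", "and"}

def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)                    # remove mentions
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace

def tokenize(text, remove_stopwords=True):
    """Split a cleaned tweet on whitespace, optionally dropping stopwords."""
    tokens = clean_tweet(text).split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

tokens = tokenize("@user I LOVE the new features of Windows!! https://t.co/abc")
print(tokens, len(tokens))
```

Lemmatization is left out here because NLTK's `WordNetLemmatizer` needs the `wordnet` corpus downloaded first (`nltk.download("wordnet")`). The token count printed at the end is the quantity you would collect per tweet for the word count analysis.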

## 2.4 - TF (Term Frequency) - IDF (Inverse Document Frequency) (15 points)

* Explain TF & IDF.
* Apply TF & IDF methods.
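
To make the two quantities concrete, here is a small worked sketch using one common textbook definition: TF is a term's count divided by the document length, and IDF is the natural log of the number of documents divided by the number of documents containing the term. The three toy documents and the helper names are made up for illustration.

```python
import math
from collections import Counter

# Three toy tokenized documents (made-up data).
docs = [
    ["good", "game", "good"],
    ["bad", "game"],
    ["good", "update"],
]

def tf(term, doc):
    """Term frequency: share of the document's tokens equal to `term`."""
    return Counter(doc)[term] / len(doc)

def idf(term, docs):
    """Inverse document frequency; assumes `term` occurs in at least one document."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "bad" appears in one document, "game" in two, so "bad" is weighted higher.
print(tfidf("bad", docs[1], docs), tfidf("game", docs[1], docs))
```

Note that scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalizes each row by default, so its numbers will differ from this textbook version.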

## 2.5 - Train/Test Split (5 points)

* Prepare the target variables and split the data into training and testing sets.
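
One way to prepare the targets and split the data is sketched below with scikit-learn. The twelve toy texts, the 75/25 ratio, and the fixed `random_state` are assumptions for illustration; `stratify` keeps the class proportions the same in both sets.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Made-up texts and labels, four per class.
texts = [f"tweet {i}" for i in range(12)]
labels = ["Positive", "Negative", "Neutral"] * 4

# Encode string labels as integers (alphabetical: Negative=0, Neutral=1, Positive=2).
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# Stratified split so each class is represented proportionally in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    texts, y, test_size=0.25, random_state=42, stratify=y
)

print(len(X_train), len(X_test), list(encoder.classes_))
```

Integer labels like these work with the `sparse_categorical_crossentropy` loss in Keras; if you prefer `categorical_crossentropy`, one-hot encode them first.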

## 3 - Training Deep Learning Models (30 points)

* Import relevant libraries.
* Explain the difference between Neural Networks (NN) and Convolutional Neural Networks (CNN).

In [None]:
from keras.models import Sequential
from keras.layers import Conv1D, GlobalMaxPooling1D, Flatten
from keras.layers import Dense, Input, Embedding, Dropout, Activation
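
As a hint for the NN versus CNN comparison, the NumPy sketch below contrasts a dense layer with a 1D convolution on a single embedded tweet. All sizes are made up, and biases and activations are omitted to keep the weight counts easy to read.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, emb_dim = 10, 4
x = rng.normal(size=(seq_len, emb_dim))  # one tweet: 10 tokens, 4-dim embeddings

# Dense layer: every input value gets its own weight for every output unit.
units = 8
W_dense = rng.normal(size=(seq_len * emb_dim, units))
dense_out = x.reshape(-1) @ W_dense  # shape (8,)

# 1D convolution: one small kernel slides over token positions and is reused.
kernel_size, filters = 3, 8
W_conv = rng.normal(size=(kernel_size, emb_dim, filters))
conv_out = np.stack([
    np.einsum("ke,kef->f", x[i:i + kernel_size], W_conv)
    for i in range(seq_len - kernel_size + 1)
])  # shape (positions, filters) = (8, 8)

print("dense weights:", W_dense.size, "conv weights:", W_conv.size)
```

The dense layer's weight count grows with the sequence length, while the convolution's does not, because the same kernel is applied at every position and so detects local word patterns wherever they occur.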

## 3.1 - Training NN models

* Construct NN models ranging from basic (e.g., one layer) to complex (more layers).
* Experiment with different optimizers, regularization methods, dropout rates, and normalization techniques.
* Evaluate each trial on the test data.
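
A minimal sketch of how such a progression could be organized in Keras. The `build_nn` helper, the feature size of 1000, the layer widths, and the random stand-in data are all assumptions for illustration, not a recommended architecture.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Input

# Hypothetical sizes: 1000 TF-IDF features, 3 sentiment classes.
num_features, num_classes = 1000, 3

def build_nn(hidden_units=(), dropout=0.0, optimizer="adam"):
    """Compiled feed-forward classifier; empty hidden_units gives a one-layer baseline."""
    model = Sequential()
    model.add(Input(shape=(num_features,)))
    for units in hidden_units:
        model.add(Dense(units, activation="relu"))
        if dropout > 0:
            model.add(Dropout(dropout))
    model.add(Dense(num_classes, activation="softmax"))
    model.compile(optimizer=optimizer,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

baseline = build_nn()
deeper = build_nn(hidden_units=(128, 64), dropout=0.3)

# Random stand-in data, only to show the fit/evaluate calls.
X = np.random.rand(32, num_features).astype("float32")
y = np.random.randint(0, num_classes, size=32)
deeper.fit(X, y, epochs=1, batch_size=8, verbose=0)
loss, acc = deeper.evaluate(X, y, verbose=0)
```

Wrapping the architecture in a function makes it easier to loop over optimizers and dropout rates and to record the test metrics of each trial side by side.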

## 4 - Testing with Your Own Input (5 points)

* Test the trained model with your own input sentences to predict the sentiment toward a given entity.

In [None]:
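# Try a sentence related to an entity; replace it with your own example
sentence = "I love the new features of Windows!!"
entity = "Microsoft"  # specify the entity
# predict() is the helper you define for your trained model; it is expected
# to return the raw prediction and the sentiment label
tmp_pred, tmp_sentiment = predict(sentence, entity)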
print(f"The sentiment of the sentence about {entity}: \n***\n{sentence}\n***\nis {tmp_sentiment}.")
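
The cell above assumes a `predict(sentence, entity)` helper that returns a pair of raw prediction and label. One possible shape for it is sketched below. To keep the sketch small and runnable, a scikit-learn `LogisticRegression` stands in for your trained Keras model, the training sentences are made up, and prefixing the entity to the sentence is just one assumed way of making the prediction entity-aware.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up training set, only so the sketch runs end to end.
train_texts = [
    "Microsoft love the new update", "Microsoft great features",
    "Nvidia terrible driver crash", "Amazon hate the late delivery",
    "Nvidia released a driver today", "Amazon opened a warehouse",
]
train_labels = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Stand-in classifier; in the homework this would be your trained deep learning model.
model = LogisticRegression(max_iter=1000).fit(X_train, train_labels)

def predict(sentence, entity):
    """Return (class probabilities, predicted label) for a sentence about an entity."""
    features = vectorizer.transform([f"{entity} {sentence}"])
    probs = model.predict_proba(features)[0]
    label = model.classes_[int(np.argmax(probs))]
    return probs, label

probs, label = predict("I love the new features of Windows!!", "Microsoft")
print(label, probs)
```

Whatever model you use, the input must go through the same preprocessing and vectorizer that were fitted on the training data, and the helper must be defined before the test cell runs.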

## 5 - Bonus - Training CNN Models (20 points)

* Construct CNN models from basic (e.g., one layer) to complex (more layers included).
* Experiment with different optimizers, regularization methods, dropout rates, and normalization techniques.
* Evaluate each trial on the test data.
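
A basic one-convolution model might look like the sketch below, built from the layers imported in section 3. The vocabulary size, sequence length, layer sizes, and random stand-in data are assumptions for illustration; the input is expected to be padded sequences of integer token ids rather than TF-IDF vectors.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Conv1D, Dense, Dropout, Embedding, GlobalMaxPooling1D, Input

# Hypothetical sizes: 5000-word vocabulary, tweets padded to 40 tokens, 3 classes.
vocab_size, max_len, num_classes = 5000, 40, 3

model = Sequential([
    Input(shape=(max_len,)),
    Embedding(input_dim=vocab_size, output_dim=32),        # token ids -> dense vectors
    Conv1D(filters=64, kernel_size=3, activation="relu"),  # detects 3-token patterns
    GlobalMaxPooling1D(),                                  # strongest response per filter
    Dropout(0.3),
    Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random stand-in data, only to show the fit/predict calls.
X = np.random.randint(0, vocab_size, size=(16, max_len))
y = np.random.randint(0, num_classes, size=16)
model.fit(X, y, epochs=1, batch_size=8, verbose=0)
probs = model.predict(X[:2], verbose=0)
```

More complex variants can stack additional `Conv1D` layers or combine several kernel sizes; compare each variant on the same test split.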

## Additional Notes

* Ensure that all model code and visualizations are well commented.
* Include explanations for key steps like tokenization, vectorization, and model selection.
* Please complete your homework using this notebook.