#Name Bhavesh Kumar Bohara
#MML2022013

Implement RNN using the below link:
https://www.kaggle.com/code/pushpakgote/natural-language-processing-with-rnn

This assignment is intended to help you learn the implementation of RNN.
Your task is to review and understand the code (only Sentiment Analysis) for predicting sentiment in the IMDB dataset. Create a new notebook file demonstrating your understanding of the code with the IMDB dataset.
After completing the first task, create another notebook and build an RNN model for the Airline Sentiment Analysis data (the attached dataset) to predict the sentiment of the tweet about the airline. Split the dataset into training and testing datasets to evaluate the model's accuracy.
Additionally, write a function that takes a text file as input, where each line contains one tweet, and outputs the sentiment predicted by the model alongside each tweet. Please note that the model should classify tweets into Positive, Negative, and Neutral categories.

Deadline: 11:59 PM, 12th March 2023. Submissions after the deadline will not be evaluated.


Note:
Please include code with only the necessary documentation to help others understand your thought process. We encourage you to use your own style and approach when writing the code. We will evaluate your submission based on the clarity of your documentation, the quality of your code, and the accuracy of your output.
Submit only two notebook (.ipynb) files. Don't upload a zip file or anything else.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

# Load the dataset
df = pd.read_csv('/content/drive/MyDrive/Airline_Sentiment_Analysis - Airline_Sentiment_Analysis.csv')
df.head()

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,negativereason,negativereason_confidence,airline,airline_sentiment_gold,name,negativereason_gold,retweet_count,text,tweet_coord,tweet_created,tweet_location,user_timezone
0,5.703061e+17,neutral,1.0,,,Virgin America,,cairdin,,0,@VirginAmerica What @dhepburn said.,,2015-02-24 11:35:52 -0800,,Eastern Time (US & Canada)
1,5.703011e+17,positive,0.3486,,0.0,Virgin America,,jnardino,,0,@VirginAmerica plus you've added commercials t...,,2015-02-24 11:15:59 -0800,,Pacific Time (US & Canada)
2,5.703011e+17,neutral,0.6837,,,Virgin America,,yvonnalynn,,0,@VirginAmerica I didn't today... Must mean I n...,,2015-02-24 11:15:48 -0800,Lets Play,Central Time (US & Canada)
3,5.70301e+17,negative,1.0,Bad Flight,0.7033,Virgin America,,jnardino,,0,@VirginAmerica it's really aggressive to blast...,,2015-02-24 11:15:36 -0800,,Pacific Time (US & Canada)
4,5.703008e+17,negative,1.0,Can't Tell,1.0,Virgin America,,jnardino,,0,@VirginAmerica and it's a really big bad thing...,,2015-02-24 11:14:45 -0800,,Pacific Time (US & Canada)


Preprocess the dataset

In [None]:
# Keep only the tweet text and its label; .copy() avoids pandas' SettingWithCopyWarning
df = df[['text', 'airline_sentiment']].copy()

# Convert sentiment labels to integer class ids (any other label would become NaN)
df['airline_sentiment'] = df['airline_sentiment'].map({'positive': 2, 'neutral': 1, 'negative': 0})

# Tokenize the text
tokenizer = Tokenizer(num_words=10000, oov_token='<OOV>')
tokenizer.fit_on_texts(df['text'])
sequences = tokenizer.texts_to_sequences(df['text'])

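# Pad every sequence to length 100: zeros are prepended by default (padding='pre'),
# and tweets longer than 100 tokens are cut at the end (truncating='post')
padded_sequences = pad_sequences(sequences, maxlen=100, truncating='post')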

# Split the dataset into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(padded_sequences, df['airline_sentiment'], test_size=0.2, random_state=42)
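
To make the padding step concrete, here is a minimal NumPy sketch of what `pad_sequences(..., maxlen=N, truncating='post')` does with its default `padding='pre'`. The helper name `pad_pre_truncate_post` and the toy sequences are made up for illustration; this is not the Keras implementation, only a re-creation of the same behaviour on small inputs.

```python
import numpy as np

def pad_pre_truncate_post(sequences, maxlen):
    """Hypothetical helper: left-pad with zeros, cut overlong sequences at the end."""
    out = np.zeros((len(sequences), maxlen), dtype=np.int32)
    for row, seq in enumerate(sequences):
        kept = list(seq)[:maxlen]                  # truncating='post': drop the tail
        if kept:
            out[row, maxlen - len(kept):] = kept   # padding='pre': zeros go in front
    return out

# Toy token-id sequences (not taken from the airline dataset)
toy = [[5, 8], [3, 1, 4, 1, 5, 9]]
padded_toy = pad_pre_truncate_post(toy, maxlen=4)
print(padded_toy)
```

The short sequence gains leading zeros and the long one loses its last two ids, which is why the final words of a very long tweet never reach the model.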


Build and train the RNN model

In [None]:
# Build the RNN model
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 16, input_length=100),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(3, activation='softmax')
])

# Compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
# Note: the test split doubles as validation data here, so it is not a fully held-out set
history = model.fit(X_train, y_train, epochs=10, batch_size=128, validation_data=(X_test, y_test))


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
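
The model is compiled with `sparse_categorical_crossentropy`, which takes integer labels (0, 1, 2) directly instead of one-hot vectors. A small NumPy sketch of the idea follows: for each sample, take the predicted probability of the true class and average the negative logs. The function below and the `eps` clipping value are assumptions for illustration, not the Keras source.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Illustrative loss: mean of -log(probability assigned to the true class)."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)  # avoid log(0)
    y_true = np.asarray(y_true, dtype=int)
    picked = probs[np.arange(len(y_true)), y_true]  # probability of each true class
    return float(-np.log(picked).mean())

# Two made-up softmax outputs over (negative, neutral, positive)
toy_probs = [[0.7, 0.2, 0.1],
             [0.1, 0.3, 0.6]]
toy_labels = [0, 2]  # integer class ids, as produced by the label mapping
toy_loss = sparse_categorical_crossentropy(toy_labels, toy_probs)
print(round(toy_loss, 4))
```

This is why the labels can stay as a single integer column and the final `Dense(3, activation='softmax')` layer still trains correctly.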


Evaluate the model's accuracy

In [None]:
# Evaluate the model's accuracy
_, accuracy = model.evaluate(X_test, y_test)
print('Accuracy: %.2f%%' % (accuracy*100))


Accuracy: 77.32%
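
A single accuracy figure does not show which of the three classes the model confuses. The sketch below builds a confusion matrix and per-class recall in plain NumPy on made-up labels; the function names and toy data are hypothetical and no results for the airline model are implied.

```python
import numpy as np

def confusion_matrix_3class(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly (0 if class absent)."""
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals, out=np.zeros(len(cm)), where=totals > 0)

# Made-up labels: 0 = negative, 1 = neutral, 2 = positive
toy_true = [0, 0, 0, 1, 2, 2]
toy_pred = [0, 0, 1, 1, 0, 2]
toy_cm = confusion_matrix_3class(toy_true, toy_pred)
toy_recall = per_class_recall(toy_cm)
print(toy_cm)
print(toy_recall)
```

In this notebook the same functions could be fed `y_test.to_numpy()` and `np.argmax(model.predict(X_test), axis=1)`.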


Write a function to predict the sentiment of tweets in a text file

In [None]:
def predict_sentiments(filename):
    """Print each tweet in `filename` (one tweet per line) with its predicted sentiment."""
    # Read the tweets, stripping surrounding whitespace and skipping blank lines
    with open(filename, 'r', encoding='utf-8') as f:
        tweets = [line.strip() for line in f if line.strip()]

    if not tweets:
        print('No tweets found in', filename)
        return

    # Tokenize and pad exactly as was done for the training data
    sequences = tokenizer.texts_to_sequences(tweets)
    padded_sequences = pad_sequences(sequences, maxlen=100, truncating='post')

    # Predict class probabilities and take the most likely class for each tweet
    predictions = model.predict(padded_sequences)
    predicted_ids = np.argmax(predictions, axis=1)

    # Map numeric class ids back to sentiment labels
    label_map = {0: 'Negative', 1: 'Neutral', 2: 'Positive'}
    predicted_sentiments = [label_map[int(class_id)] for class_id in predicted_ids]

    # Print the tweets and predicted sentiments
    for tweet, sentiment in zip(tweets, predicted_sentiments):
        print(tweet, '-', sentiment)
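
The file handling and label mapping can be tried without the trained model. The sketch below writes a small temporary file, reads it back while skipping blank lines, and maps a stand-in probability matrix to labels. `read_tweets`, `to_labels`, the sample tweets, and `fake_probs` are all hypothetical; in the notebook the probabilities would come from `model.predict`.

```python
import os
import tempfile
import numpy as np

LABELS = np.array(['Negative', 'Neutral', 'Positive'])  # index = class id

def read_tweets(path):
    """Return the non-empty lines of a text file, one tweet per line."""
    with open(path, 'r', encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]

def to_labels(probabilities):
    """Pick the most likely class per row and translate it to a label."""
    return LABELS[np.argmax(probabilities, axis=1)].tolist()

# Write a made-up input file containing one blank line
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("great flight, thanks!\n\nmy bag is still missing\n")
    sample_path = tmp.name

sample_tweets = read_tweets(sample_path)
os.remove(sample_path)

# Stand-in for model.predict(...): one row of class probabilities per tweet
fake_probs = np.array([[0.1, 0.2, 0.7],
                       [0.8, 0.1, 0.1]])
sample_labels = to_labels(fake_probs)

for tweet, label in zip(sample_tweets, sample_labels):
    print(tweet, '-', label)
```

Skipping blank lines matters because an empty line would otherwise be padded to an all-zero sequence and still receive a sentiment label.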


