
# **Loading dataset**

In [2]:
import pandas as pd
df = pd.read_csv('Tweets.csv')

# Keep only the relevant columns (.copy() avoids a possible SettingWithCopyWarning below)
df = df[['text', 'airline_sentiment']].copy()

# Map sentiment labels to binary classes: 0 for negative or neutral, 1 for positive
df['label'] = df['airline_sentiment'].map({'negative': 0, 'neutral': 0, 'positive': 1})
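
Merging neutral into class 0 changes the class balance, so it can help to check the distribution before training. The following is a minimal sketch using only the standard library. The `sentiments` list is a made-up stand-in for `df['airline_sentiment']`, not data from `Tweets.csv`, and `label_distribution` is a hypothetical helper. It also flags any label missing from the mapping, which `Series.map` would silently turn into `NaN`.

```python
from collections import Counter

# Same mapping as the cell above
LABEL_MAP = {'negative': 0, 'neutral': 0, 'positive': 1}

def label_distribution(sentiments, label_map=LABEL_MAP):
    """Return (class fractions, set of labels missing from the mapping)."""
    unmapped = {s for s in sentiments if s not in label_map}
    counts = Counter(label_map[s] for s in sentiments if s in label_map)
    total = sum(counts.values())
    fractions = {label: n / total for label, n in counts.items()} if total else {}
    return fractions, unmapped

# Hypothetical sample, for illustration only
sentiments = ['negative', 'neutral', 'positive', 'negative', 'mixed']
fractions, unmapped = label_distribution(sentiments)
print(fractions)  # {0: 0.75, 1: 0.25}
print(unmapped)   # {'mixed'}
```

On the real DataFrame, `df['label'].value_counts(normalize=True)` gives the same fractions in one call.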


In [3]:
df.head()

                                                text airline_sentiment  label
0                @VirginAmerica What @dhepburn said.           neutral      0
1  @VirginAmerica plus you've added commercials t...          positive      1
2  @VirginAmerica I didn't today... Must mean I n...           neutral      0
3  @VirginAmerica it's really aggressive to blast...          negative      0
4  @VirginAmerica and it's a really big bad thing...          negative      0

# **Importing required libraries**

In [6]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Embedding, LSTM, Dense, Flatten, Input
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from transformers import BertTokenizer, TFBertModel


# **Splitting the data into training and testing sets**

In [5]:
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)
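
The split above is purely random, so the share of positive tweets can differ between the training and test sets. scikit-learn's `train_test_split` accepts a `stratify` argument (for example `stratify=df['label']`) that keeps the class proportions aligned. The idea can be sketched in plain Python; `stratified_split` is a hypothetical helper and the labels are toy data, not the real dataset.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # shuffle within each class
        n_test = round(len(indices) * test_size)  # per-class test quota
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels: 80% class 0, 20% class 1 (illustrative only)
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)
test_pos = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), test_pos)  # 80 20 0.2
```

Switching to a stratified split would change the reported accuracy, so treat it as an alternative rather than what this notebook ran.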

In [None]:
# LSTM Model

# # Tokenize and pad sequences
# tokenizer = Tokenizer(num_words=5000, oov_token="<OOV>")
# tokenizer.fit_on_texts(train_data['text'])
# X_train_seq = pad_sequences(tokenizer.texts_to_sequences(train_data['text']), maxlen=100)
# X_test_seq = pad_sequences(tokenizer.texts_to_sequences(test_data['text']), maxlen=100)

# # Build LSTM model
# model = Sequential()
# model.add(Embedding(input_dim=5000, output_dim=64, input_length=100))
# model.add(LSTM(100))
# model.add(Dense(1, activation='sigmoid'))

# # Compile the model
# model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# # Train the model
# model.fit(X_train_seq, train_data['label'], epochs=5, batch_size=32, validation_split=0.2)

# # Evaluate the model
# loss, accuracy = model.evaluate(X_test_seq, test_data['label'])
# print(f"Accuracy: {accuracy}")


# **Integrating a pre-trained BERT model from Transformers**

In [7]:
# Load pre-trained BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_model = TFBertModel.from_pretrained('bert-base-uncased')

# Tokenize and pad sequences using BERT tokenizer
X_train_seq = [tokenizer.encode(text, add_special_tokens=True, max_length=100, truncation=True) for text in train_data['text']]
X_test_seq = [tokenizer.encode(text, add_special_tokens=True, max_length=100, truncation=True) for text in test_data['text']]

# Pad sequences to a consistent length
X_train_seq = pad_sequences(X_train_seq, maxlen=100, dtype="long", value=0, truncating="post", padding="post")
X_test_seq = pad_sequences(X_test_seq, maxlen=100, dtype="long", value=0, truncating="post", padding="post")

# Build a model with BERT embeddings
input_layer = Input(shape=(100,), dtype='int32')
bert_embedding = bert_model(input_layer)[0]
flat = Flatten()(bert_embedding)
dense_layer = Dense(256, activation='relu')(flat)
output_layer = Dense(1, activation='sigmoid')(dense_layer)

model = Model(inputs=input_layer, outputs=output_layer)

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(X_train_seq, train_data['label'], epochs=5, batch_size=16, validation_split=0.2)

# Evaluate the model
loss, accuracy = model.evaluate(X_test_seq, test_data['label'])
print(f"Accuracy: {accuracy}")


Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing TFBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions w

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Accuracy: 0.8432376980781555
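
The model above receives only token IDs, so BERT also attends to the padding positions. A common remedy is to pass an attention mask alongside the IDs; the Hugging Face tokenizer can return one directly when called with `padding='max_length'`. The sketch below only illustrates what post-padding and the matching mask look like, in plain Python. `pad_with_mask` is a hypothetical helper and the token IDs are illustrative.

```python
def pad_with_mask(token_ids, maxlen, pad_id=0):
    """Post-truncate and post-pad token_ids; return (ids, attention_mask)."""
    ids = list(token_ids[:maxlen])
    mask = [1] * len(ids)          # 1 marks a real token
    padding = maxlen - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding

# Illustrative IDs only; real ones come from tokenizer.encode(...)
ids, mask = pad_with_mask([101, 2023, 2003, 102], maxlen=6)
print(ids)   # [101, 2023, 2003, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```

With a mask available, the Keras model would need a second input and a call such as `bert_model(input_ids, attention_mask=mask)`. That wiring is an assumption about how one might extend the notebook and was not part of the run reported above.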


# **Testing**

In [10]:
# Predict using the BERT-based model
dl_predictions = model.predict(X_test_seq)
dl_predictions_classes = (dl_predictions > 0.5).astype("int32")

# Testing
for i in range(5):
    # Convert BERT predictions to classes
    predicted_class = dl_predictions_classes[i][0]

    # Convert numeric label to text label
    predicted_label = "Positive" if predicted_class == 1 else "Negative"

    print(f"Review: {test_data['text'].iloc[i]}")
    print(f"True Label: {test_data['label'].iloc[i]}")
    print(f"Predicted Label: {predicted_label}\n")


Review: @SouthwestAir you're my early frontrunner for best airline! #oscars2016
True Label: 1
Predicted Label: Negative

Review: @USAirways how is it that my flt to EWR was Cancelled Flightled yet flts to NYC from USAirways are still flying?
True Label: 0
Predicted Label: Negative

Review: @JetBlue what is going on with your BDL to DCA flights yesterday and today?! Why is every single one getting delayed?
True Label: 0
Predicted Label: Negative

Review: @JetBlue do they have to depart from Washington, D.C.??
True Label: 0
Predicted Label: Negative

Review: @JetBlue I can probably find some of them. Are the ticket #s on there?
True Label: 0
Predicted Label: Negative
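
All five predictions above are Negative, including the one tweet whose true label is 1. Because negative and neutral were merged into class 0, accuracy alone may say little about the positive class, so a majority-class baseline and per-class recall are useful comparisons. Here is a standard-library sketch using the five samples printed above; `majority_baseline` and `recall_per_class` are hypothetical helpers.

```python
from collections import Counter

def majority_baseline(y_true):
    """Accuracy of always predicting the most common class."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)

def recall_per_class(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals, hits = Counter(y_true), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}

# The five samples shown above: one positive, four class 0, all predicted 0
y_true = [1, 0, 0, 0, 0]
y_pred = [0, 0, 0, 0, 0]
print(majority_baseline(y_true))         # 0.8
print(recall_per_class(y_true, y_pred))  # {1: 0.0, 0: 1.0}
```

On the full test set, scikit-learn's `classification_report(test_data['label'], dl_predictions_classes)`, already imported earlier, reports the same kind of per-class figures.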



# **Predicting sentiment for user input**

In [21]:
# Prompt the user twice and classify each input
for _ in range(2):
    user_input = input("Enter something: ")

    # Tokenize and pad the user input using BERT tokenizer
    user_input_seq = [tokenizer.encode(user_input, add_special_tokens=True, max_length=100, truncation=True)]
    user_input_seq = pad_sequences(user_input_seq, maxlen=100, dtype="long", value=0, truncating="post", padding="post")

    # Predict sentiment for user input using the trained BERT-based model
    user_prediction = model.predict(user_input_seq)
    # print(user_prediction)
    user_prediction_class = (user_prediction > 0.5).astype("int32")

    # Display the prediction
    predicted_label = "Positive" if user_prediction_class[0][0] == 1 else "Negative"
    print(f"Predicted Sentiment: {predicted_label}\n")


Enter something: I am frustrated
Predicted Sentiment: Negative

Enter something: You seems angry
Predicted Sentiment: Negative
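
The encode, pad, predict, and threshold steps in the loop above are the same ones used on the test set, so they could be wrapped in a single function. The sketch below passes the tokenizer and model in as callables so that it runs without BERT. `predict_sentiment`, `fake_encode`, and `fake_predict` are hypothetical names, and the two stubs merely stand in for `tokenizer.encode` and `model.predict`.

```python
def predict_sentiment(text, encode, predict, maxlen=100, threshold=0.5):
    """Encode text, post-pad to maxlen, and map the model score to a label."""
    ids = list(encode(text))[:maxlen]
    ids = ids + [0] * (maxlen - len(ids))  # pad ID 0, as in the cells above
    score = predict([ids])[0][0]           # batch of one, single sigmoid output
    return "Positive" if score > threshold else "Negative"

# Stubs for illustration only. In the notebook these would wrap the BERT
# tokenizer and the trained Keras model (which expects a NumPy array).
def fake_encode(text):
    return [101] + [len(word) for word in text.split()] + [102]

seen = []
def fake_predict(batch):
    seen.extend(batch)             # record what the "model" received
    return [[0.9] for _ in batch]  # fixed score, not a real prediction

print(predict_sentiment("great flight", fake_encode, fake_predict, maxlen=8))  # Positive
print(seen[0])  # [101, 5, 6, 102, 0, 0, 0, 0]
```

Keeping preprocessing in one place makes it harder for the interactive path and the test-set path to drift apart, for example if `maxlen` changes.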

