# **Text Classification Practical Project**

In [None]:
!pip install tensorflow



## **Import Libraries**

In [7]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
import string
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from keras.models import Model
from keras.layers import LSTM, Activation, Dense, Dropout, Input, Embedding, SpatialDropout1D
from keras.optimizers import RMSprop
# Tokenizer and pad_sequences are imported from tensorflow.keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import pad_sequences
from keras.utils import to_categorical
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.models import Sequential
from sklearn.metrics import confusion_matrix
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 255)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


## **Dataset Reading**

In [None]:
data_path="/content/IMDB Dataset.csv"

In [None]:
df=pd.read_csv(data_path)
df.head()

Unnamed: 0,review,sentiment
0,"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.<br /><br />The first thing that struck me about Oz was its brutality and unflinching scenes of v...",positive
1,"A wonderful little production. <br /><br />The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece. <br /><br />The actors are extremely well chosen-...",positive
2,"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater and watching a light-hearted comedy. The plot is simplistic, but the dialogue is witty and the characters are likable (even the well b...",positive
3,"Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.<br /><br />This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.<br...",negative
4,"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portrait about human relations. This is a movie that seems to be telling us what money, power and success do to people in the different situ...",positive


### **Total rows and columns**

In [None]:
df.shape

(50000, 2)

### **Work with the first 10,000 rows only**

In [None]:
# .copy() makes df an independent frame, so later in-place edits do not act on a slice
df = df.iloc[:10000].copy()
df.shape

(10000, 2)

In [None]:
df.review[0]

"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.<br /><br />The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.<br /><br />It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.<br /><br />I would say the main appeal of the show is due to the fa

### **Balanced Dataset**

In [None]:
df['sentiment'].value_counts()

Unnamed: 0_level_0,count
sentiment,Unnamed: 1_level_1
positive,5028
negative,4972


### **Check for null values and remove duplicates**

In [None]:
df.isnull().sum()

Unnamed: 0,0
review,0
sentiment,0


In [None]:
df.drop_duplicates(inplace=True)

In [None]:
df.duplicated().sum()

0

### **Basic Preprocessing**


*   Remove HTML tags
*   Convert to lowercase
*   Remove stopwords
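
The three steps listed above can also be sketched as one function. This is a minimal illustration, not the notebook's own code: `DEMO_STOPWORDS` is a tiny hypothetical stand-in for NLTK's English stopword list, and `preprocess` is a name introduced here.

```python
import re

TAG_RE = re.compile(r"<.*?>")

# Hypothetical stand-in for nltk's stopwords.words('english')
DEMO_STOPWORDS = {"the", "a", "an", "is", "was", "of", "this", "and"}

def preprocess(text, stopwords=DEMO_STOPWORDS):
    """Strip HTML tags, lowercase, then drop stopwords."""
    text = TAG_RE.sub("", text)          # 1. remove HTML tags
    text = text.lower()                  # 2. lowercase
    tokens = [t for t in text.split() if t not in stopwords]  # 3. stopwords
    return " ".join(tokens)

sample = "This was a <b>GREAT</b> film.<br /><br />The plot is thin."
cleaned = preprocess(sample)
print(cleaned)
```

Because tags are replaced with an empty string, sentences separated only by `<br />` are fused (`film.the`), which is the same effect visible in the cleaned reviews in this notebook. Substituting a space instead of an empty string would avoid that.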



In [None]:
import re

def remove_tags(raw_text):
    # Non-greedy match strips anything between '<' and '>', e.g. <br />
    cleaned_text = re.sub(r'<.*?>', '', raw_text)
    return cleaned_text

In [None]:
df['review'] = df['review'].apply(remove_tags)

In [None]:
df.head()

Unnamed: 0,review,sentiment
0,"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.The first thing that struck me about Oz was its brutality and unflinching scenes of violence, whi...",positive
1,"A wonderful little production. The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece. The actors are extremely well chosen- Michael Sheen not only ...",positive
2,"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater and watching a light-hearted comedy. The plot is simplistic, but the dialogue is witty and the characters are likable (even the well b...",positive
3,"Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.OK, first of al...",negative
4,"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portrait about human relations. This is a movie that seems to be telling us what money, power and success do to people in the different situ...",positive


In [None]:
df['review'][0]

"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.I would say the main appeal of the show is due to the fact that it goes where other shows wo

In [None]:
df['review'] = df['review'].apply(lambda x:x.lower())

In [None]:
df['review'][0]

"one of the other reviewers has mentioned that after watching just 1 oz episode you'll be hooked. they are right, as this is exactly what happened with me.the first thing that struck me about oz was its brutality and unflinching scenes of violence, which set in right from the word go. trust me, this is not a show for the faint hearted or timid. this show pulls no punches with regards to drugs, sex or violence. its is hardcore, in the classic use of the word.it is called oz as that is the nickname given to the oswald maximum security state penitentary. it focuses mainly on emerald city, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. em city is home to many..aryans, muslims, gangstas, latinos, christians, italians, irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.i would say the main appeal of the show is due to the fact that it goes where other shows wo

### **Remove stopwords**

In [None]:
from nltk.corpus import stopwords
import nltk

nltk.download('stopwords')
# A set gives constant-time membership checks, unlike scanning a list for every token
sw_set = set(stopwords.words('english'))

df['review'] = df['review'].apply(lambda x: " ".join(item for item in x.split() if item not in sw_set))


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [None]:
df['review'][0]

"one reviewers mentioned watching 1 oz episode hooked. right, exactly happened me.the first thing struck oz brutality unflinching scenes violence, set right word go. trust me, show faint hearted timid. show pulls punches regards drugs, sex violence. hardcore, classic use word.it called oz nickname given oswald maximum security state penitentary. focuses mainly emerald city, experimental section prison cells glass fronts face inwards, privacy high agenda. em city home many..aryans, muslims, gangstas, latinos, christians, italians, irish more....so scuffles, death stares, dodgy dealings shady agreements never far away.i would say main appeal show due fact goes shows dare. forget pretty pictures painted mainstream audiences, forget charm, forget romance...oz mess around. first episode ever saw struck nasty surreal, say ready it, watched more, developed taste oz, got accustomed high levels graphic violence. violence, injustice (crooked guards who'll sold nickel, inmates who'll kill order g

In [None]:
df.head()

Unnamed: 0,review,sentiment
0,"one reviewers mentioned watching 1 oz episode hooked. right, exactly happened me.the first thing struck oz brutality unflinching scenes violence, set right word go. trust me, show faint hearted timid. show pulls punches regards drugs, sex violence. ha...",positive
1,"wonderful little production. filming technique unassuming- old-time-bbc fashion gives comforting, sometimes discomforting, sense realism entire piece. actors extremely well chosen- michael sheen ""has got polari"" voices pat too! truly see seamless edit...",positive
2,"thought wonderful way spend time hot summer weekend, sitting air conditioned theater watching light-hearted comedy. plot simplistic, dialogue witty characters likable (even well bread suspected serial killer). may disappointed realize match point 2: r...",positive
3,"basically there's family little boy (jake) thinks there's zombie closet & parents fighting time.this movie slower soap opera... suddenly, jake decides become rambo kill zombie.ok, first going make film must decide thriller drama! drama movie watchable...",negative
4,"petter mattei's ""love time money"" visually stunning film watch. mr. mattei offers us vivid portrait human relations. movie seems telling us money, power success people different situations encounter. variation arthur schnitzler's play theme, director ...",positive


In [None]:
X=df['review']
y=df['sentiment']

In [None]:
X.head()

Unnamed: 0,review
0,"one reviewers mentioned watching 1 oz episode hooked. right, exactly happened me.the first thing struck oz brutality unflinching scenes violence, set right word go. trust me, show faint hearted timid. show pulls punches regards drugs, sex violence. ha..."
1,"wonderful little production. filming technique unassuming- old-time-bbc fashion gives comforting, sometimes discomforting, sense realism entire piece. actors extremely well chosen- michael sheen ""has got polari"" voices pat too! truly see seamless edit..."
2,"thought wonderful way spend time hot summer weekend, sitting air conditioned theater watching light-hearted comedy. plot simplistic, dialogue witty characters likable (even well bread suspected serial killer). may disappointed realize match point 2: r..."
3,"basically there's family little boy (jake) thinks there's zombie closet & parents fighting time.this movie slower soap opera... suddenly, jake decides become rambo kill zombie.ok, first going make film must decide thriller drama! drama movie watchable..."
4,"petter mattei's ""love time money"" visually stunning film watch. mr. mattei offers us vivid portrait human relations. movie seems telling us money, power success people different situations encounter. variation arthur schnitzler's play theme, director ..."


In [None]:
y.head()

Unnamed: 0,sentiment
0,positive
1,positive
2,positive
3,negative
4,positive


### **Label Encoding to convert string to numeric**

In [None]:
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()

y = encoder.fit_transform(y)

In [None]:
y

array([1, 1, 1, ..., 0, 0, 1])
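
To confirm which integer stands for which sentiment, the encoder's `classes_` attribute can be inspected. The sketch below uses a small hypothetical label list rather than the notebook's `y`; `LabelEncoder` sorts class names, so `negative` maps to 0 and `positive` to 1.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy labels, standing in for df['sentiment']
demo_labels = ["positive", "negative", "positive"]

demo_encoder = LabelEncoder()
encoded = demo_encoder.fit_transform(demo_labels)

# classes_ holds the sorted class names; position == encoded integer
label_map = {str(name): idx for idx, name in enumerate(demo_encoder.classes_)}
print(label_map)
print(encoded.tolist())
```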

### **Train-Test Split**

In [None]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=1)

In [None]:
X_train.shape

(7986,)

In [None]:
X_test.shape

(1997,)

In [None]:
X_train.head()

Unnamed: 0,review
6713,"i've waiting superhero movie like long time. ""mystery men"" takes place among classic comic-strip spoofs tv like ""batman"" ""captain nice"" cartoons like ""underdog"" ""super chicken."" spirit lives them: comic tongue-in-cheek tone; courage aim heroic life ri..."
1178,"movie excellent acted, excellent directed overall excellent story. ive real life experiance boy like 'radio'. football program town, weve mentally challenged boy every year practice, travel, fun football team. movie really true identify 100%. boy like..."
4707,"movie makes want throw every time see it. take first movie, reverse plot (ariel wants leave sea, daughter wants go sea), take characters give new animals new names, throw crappy animation biggest suck factor, possible, get little mermaid 2. basically ..."
6772,"first saw movie elementary school, back 1960s. fascinated character played ingrid bergman introduction french quarter new orleans. first part movie best comes back exact revenge father's wife daughter (her mother driven disgrace). time meets wonderful..."
7461,show made persons iq lower 80. jokes show lame. deserted island anything better watch garbage.... hate accent behavior stupid jokes pranks try perform...it really pisses viewers gave reba 6.7 voting...sure knew people iq lower 80 know many them! peopl...


In [None]:
X_test.head()

Unnamed: 0,review
5333,"8 simple rules dating teenage daughter auspicious start. supremely-talented tom shadyac involved project. meant comedy would nothing less spectacular, that's exactly happened: show remains one freshest, funniest, wittiest shows made long time. every l..."
4113,"one imdb reviewer puts it, ""...imagine 2001: space odyssey desert"" far brief summarisation expect piece cinema (i deeply hesitate use word ""film""). lecture philosophical views creationism, mythos surrounding humanities existence, after, been, be. mayb..."
6853,"although ""better"" first mulva (which say much anyways, would rather watch paint dry) still sucks. favor avoid anything low budget pictures guys. suckered buying dvds support indy filmmakers boy regret it. even officially ""released"" yet (not bootlegs-b..."
3219,"film worst film, ranks high me. slasher movie be. takes place university seems handful students. teachers dumber sack hammers. filled good catholic priest, sexually repressed humor. bad hair, bad clothes. dialogue cliched hard believe able predict lin..."
7399,"astounding film. well showing actual footage key events failed coup oust chavez, given background picture describes class-divided society. many rich, appears, choice people's democratic choice, willing use military regime change. 'be careful say front..."


### **Applying Bag of Words**

In [None]:
# Applying BoW
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()

In [None]:
X_train_bow = cv.fit_transform(X_train).toarray()
X_test_bow = cv.transform(X_test).toarray()

In [None]:
X_train_bow

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [None]:
X_test_bow

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [None]:
y_train

array([1, 1, 0, ..., 0, 0, 1])

### **Naive Bayes**

In [None]:
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train_bow,y_train)

In [None]:
y_pred = classifier.predict(X_test_bow)


In [None]:
from sklearn.metrics import accuracy_score,confusion_matrix
accuracy_score(y_test,y_pred)

0.6324486730095142

In [None]:
confusion_matrix(y_test,y_pred)

array([[717, 235],
       [499, 546]])
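
The confusion matrix above can be unpacked into per-class metrics. scikit-learn puts true labels on rows and predicted labels on columns, so with 0 = negative and 1 = positive the four cells are TN, FP, FN, TP. A short worked check, using only the numbers printed above:

```python
import numpy as np

# Confusion matrix reported above for GaussianNB (rows: true, columns: predicted)
cm = np.array([[717, 235],
               [499, 546]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision_pos = tp / (tp + fp)   # of reviews predicted positive, share truly positive
recall_pos = tp / (tp + fn)      # of truly positive reviews, share found

print(f"accuracy={accuracy:.4f} precision={precision_pos:.4f} recall={recall_pos:.4f}")
```

The accuracy matches the score printed earlier, and the low recall shows that the model misses close to half of the positive reviews.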

### **Random Forest Classifier**

In [None]:
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier()
classifier.fit(X_train_bow,y_train)


In [None]:
y_pred = classifier.predict(X_test_bow)

In [None]:
accuracy_score(y_test,y_pred)

0.8457686529794692

In [None]:
confusion_matrix(y_test,y_pred)

array([[807, 145],
       [163, 882]])

In [None]:
cv= CountVectorizer(max_features=3000)
X_train_bow = cv.fit_transform(X_train).toarray()
X_test_bow = cv.transform(X_test).toarray()

rf=RandomForestClassifier()
rf.fit(X_train_bow,y_train)
y_pred=rf.predict(X_test_bow)
accuracy_score(y_test,y_pred)

0.8327491236855283

### **N-Grams**

In [None]:
cv= CountVectorizer(max_features=5000,ngram_range=(2,2))
X_train_bow = cv.fit_transform(X_train).toarray()
X_test_bow = cv.transform(X_test).toarray()
rf=RandomForestClassifier()
rf.fit(X_train_bow,y_train)
y_pred=rf.predict(X_test_bow)
accuracy_score(y_test,y_pred)

0.7516274411617426

### **TF-IDF (Term Frequency - Inverse Document Frequency)**

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
tf=TfidfVectorizer()
X_train_tf=tf.fit_transform(X_train).toarray()
X_test_tf=tf.transform(X_test).toarray()

rf=RandomForestClassifier()
rf.fit(X_train_tf,y_train)
y_pred=rf.predict(X_test_tf)
accuracy_score(y_test,y_pred)

0.8397596394591887
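
For reference, with its default settings (`smooth_idf=True`) `TfidfVectorizer` computes the inverse document frequency as `ln((1 + n) / (1 + df)) + 1`, multiplies it by the raw term count, and then L2-normalizes each row. A small sketch of the IDF part only, with hypothetical document counts:

```python
import math

def smoothed_idf(n_docs, doc_freq):
    """IDF as used by scikit-learn when smooth_idf=True."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

# Hypothetical corpus of 4 documents
idf_common = smoothed_idf(4, 4)   # term appears in every document
idf_rare = smoothed_idf(4, 1)     # term appears in one document

print(idf_common, idf_rare)
```

A word that appears in every document gets the minimum weight of 1, while rarer words are weighted up.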

In [None]:
X_train.head()

Unnamed: 0,review
6713,"i've waiting superhero movie like long time. ""mystery men"" takes place among classic comic-strip spoofs tv like ""batman"" ""captain nice"" cartoons like ""underdog"" ""super chicken."" spirit lives them: comic tongue-in-cheek tone; courage aim heroic life ri..."
1178,"movie excellent acted, excellent directed overall excellent story. ive real life experiance boy like 'radio'. football program town, weve mentally challenged boy every year practice, travel, fun football team. movie really true identify 100%. boy like..."
4707,"movie makes want throw every time see it. take first movie, reverse plot (ariel wants leave sea, daughter wants go sea), take characters give new animals new names, throw crappy animation biggest suck factor, possible, get little mermaid 2. basically ..."
6772,"first saw movie elementary school, back 1960s. fascinated character played ingrid bergman introduction french quarter new orleans. first part movie best comes back exact revenge father's wife daughter (her mother driven disgrace). time meets wonderful..."
7461,show made persons iq lower 80. jokes show lame. deserted island anything better watch garbage.... hate accent behavior stupid jokes pranks try perform...it really pisses viewers gave reba 6.7 voting...sure knew people iq lower 80 know many them! peopl...


In [None]:
X_test.head()

Unnamed: 0,review
5333,"8 simple rules dating teenage daughter auspicious start. supremely-talented tom shadyac involved project. meant comedy would nothing less spectacular, that's exactly happened: show remains one freshest, funniest, wittiest shows made long time. every l..."
4113,"one imdb reviewer puts it, ""...imagine 2001: space odyssey desert"" far brief summarisation expect piece cinema (i deeply hesitate use word ""film""). lecture philosophical views creationism, mythos surrounding humanities existence, after, been, be. mayb..."
6853,"although ""better"" first mulva (which say much anyways, would rather watch paint dry) still sucks. favor avoid anything low budget pictures guys. suckered buying dvds support indy filmmakers boy regret it. even officially ""released"" yet (not bootlegs-b..."
3219,"film worst film, ranks high me. slasher movie be. takes place university seems handful students. teachers dumber sack hammers. filled good catholic priest, sexually repressed humor. bad hair, bad clothes. dialogue cliched hard believe able predict lin..."
7399,"astounding film. well showing actual footage key events failed coup oust chavez, given background picture describes class-divided society. many rich, appears, choice people's democratic choice, willing use military regime change. 'be careful say front..."


In [None]:
y_train

array([1, 1, 0, ..., 0, 0, 1])

In [None]:
y_test

array([1, 1, 0, ..., 1, 0, 0])

### **Word2Vec with Random Forest**

In [None]:

from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

In [None]:
# df2 is an alias of df (not a copy), so columns added to df2 also appear on df
df2 = df
df2['review'][0]

"one reviewers mentioned watching 1 oz episode hooked. right, exactly happened me.the first thing struck oz brutality unflinching scenes violence, set right word go. trust me, show faint hearted timid. show pulls punches regards drugs, sex violence. hardcore, classic use word.it called oz nickname given oswald maximum security state penitentary. focuses mainly emerald city, experimental section prison cells glass fronts face inwards, privacy high agenda. em city home many..aryans, muslims, gangstas, latinos, christians, italians, irish more....so scuffles, death stares, dodgy dealings shady agreements never far away.i would say main appeal show due fact goes shows dare. forget pretty pictures painted mainstream audiences, forget charm, forget romance...oz mess around. first episode ever saw struck nasty surreal, say ready it, watched more, developed taste oz, got accustomed high levels graphic violence. violence, injustice (crooked guards who'll sold nickel, inmates who'll kill order g

In [None]:
df2['tokenized_review'] = df2['review'].apply(lambda x: x.lower().split())

In [None]:
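# Train Word2Vec on the tokenized reviews. Note that this uses every review,
# including those that later land in the test split.
word2vec_model = Word2Vec(sentences=df['tokenized_review'], vector_size=100, window=5, min_count=1, workers=4)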

# Step 4: Create feature vectors for each review
def get_review_vector(review):
    # Get the word vectors for words in the review
    vectors = [word2vec_model.wv[word] for word in review if word in word2vec_model.wv]
    # Average the vectors; handle cases where there are no known words
    return np.mean(vectors, axis=0) if vectors else np.zeros(word2vec_model.vector_size)

# Apply the function to create feature vectors
df['feature_vector'] = df['tokenized_review'].apply(get_review_vector)

# Convert feature vectors into a usable format for modeling
X = np.vstack(df['feature_vector'])
y = df['sentiment']

# Step 5: Encode the target variable
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

# Step 6: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)

# Step 7: Train a Random Forest classifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Step 8: Make predictions and evaluate the model
y_pred = rf_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

Accuracy: 0.72
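
The averaging step in `get_review_vector` can be checked on a toy example. The sketch below replaces the trained model with a hypothetical two-dimensional lookup table (`toy_vectors`) so the arithmetic is visible; `average_vector` is a name introduced here.

```python
import numpy as np

# Hypothetical 2-d "embeddings", standing in for word2vec_model.wv
toy_vectors = {
    "good": np.array([1.0, 0.0]),
    "movie": np.array([0.0, 1.0]),
}

def average_vector(tokens, vectors, dim=2):
    """Mean of the known word vectors; zeros if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim)

review_vec = average_vector(["good", "movie", "unseen"], toy_vectors)
empty_vec = average_vector(["unseen"], toy_vectors)

print(review_vec, empty_vec)
```

Averaging discards word order, so "not good" and "good not" produce the same vector. That is one plausible reason this approach scores below the bag-of-words models here.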


# **Using Hugging Face Pretrained Model to Predict Sentiment**

In [None]:
# Install required libraries
!pip install transformers datasets torch pandas scikit-learn

# 1. Import Required Libraries
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

# 2. Load the Dataset
# Upload the CSV file in Colab
#from google.colab import files
#uploaded = files.upload()

# Load the dataset
data = pd.read_csv("/content/IMDB Dataset.csv")  # Replace with the actual file name
data = data.iloc[:1000]  # Use only the first 1,000 rows
data.head()

# 3. Preprocess the Dataset
# Map sentiment labels to integers
data['label'] = data['sentiment'].map({'positive': 1, 'negative': 0})

# Extract reviews and labels
reviews = data['review'].tolist()
labels = data['label'].tolist()

# Split the dataset into training and test sets
train_texts, test_texts, train_labels, test_labels = train_test_split(
    reviews, labels, test_size=0.2, random_state=42
)

# 4. Load Pre-Trained Tokenizer
model_name = "distilbert-base-uncased"  # Pre-trained model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenize the data
train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=128)
test_encodings = tokenizer(test_texts, truncation=True, padding=True, max_length=128)

# 5. Prepare the Dataset for Hugging Face
class SentimentDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

train_dataset = SentimentDataset(train_encodings, train_labels)
test_dataset = SentimentDataset(test_encodings, test_labels)

# 6. Load the Pre-Trained Model
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# 7. Define Training Arguments
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",  # called eval_strategy in newer transformers releases
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
    report_to="none",  # Disable W&B logging
)

# 8. Train the Model
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

trainer.train()

# 9. Evaluate the Model
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

# 10. Make Predictions
# New reviews for prediction
new_reviews = ["This movie was fantastic!", "I did not like the film at all."]

# Tokenize new reviews
new_encodings = tokenizer(new_reviews, truncation=True, padding=True, max_length=128, return_tensors="pt").to(model.device)

# Make predictions (inference only, so gradient tracking is disabled)
model.eval()
with torch.no_grad():
    outputs = model(**new_encodings)
predictions = torch.argmax(outputs.logits, dim=1)

# Map predictions to sentiment labels
sentiment_map = {0: "negative", 1: "positive"}
predicted_sentiments = [sentiment_map[pred.item()] for pred in predictions]

print("Predicted Sentiments:", predicted_sentiments)




Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss
1,0.415,0.422337
2,0.2657,0.392241
3,0.2192,0.385131


Evaluation Results: {'eval_loss': 0.3851306438446045, 'eval_runtime': 43.9647, 'eval_samples_per_second': 4.549, 'eval_steps_per_second': 0.296, 'epoch': 3.0}
Predicted Sentiments: ['positive', 'negative']
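
The evaluation results above report only the loss. One way to get accuracy as well is to pass a metrics function to the `Trainer` through its `compute_metrics` argument. The function below is a minimal sketch, and the toy logits are hypothetical values used to exercise it outside the `Trainer`.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Return accuracy from (logits, labels), as the Trainer supplies them."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

# Hypothetical logits for four reviews, and their true labels
toy_logits = np.array([[2.0, -1.0], [0.1, 0.9], [1.5, 0.2], [-0.3, 0.4]])
toy_labels = np.array([0, 1, 1, 1])

metrics = compute_metrics((toy_logits, toy_labels))
print(metrics)
```

With `Trainer(..., compute_metrics=compute_metrics)`, the returned key should appear in the evaluation output with an `eval_` prefix.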


## **New reviews for prediction**

In [None]:
new_reviews = ["This movie was bad", "I like the film very much."]
new_encodings = tokenizer(new_reviews, truncation=True, padding=True, max_length=128, return_tensors="pt").to(model.device)
with torch.no_grad():  # inference only
    outputs = model(**new_encodings)
predictions = torch.argmax(outputs.logits, dim=1)
sentiment_map = {0: "negative", 1: "positive"}
predicted_sentiments = [sentiment_map[pred.item()] for pred in predictions]
print("Predicted Sentiments:", predicted_sentiments)

Predicted Sentiments: ['negative', 'positive']


In [None]:
new_reviews = ["This movie was good and excellent!", "I like the film very much."]
new_encodings = tokenizer(new_reviews, truncation=True, padding=True, max_length=128, return_tensors="pt").to(model.device)
with torch.no_grad():  # inference only
    outputs = model(**new_encodings)
predictions = torch.argmax(outputs.logits, dim=1)
sentiment_map = {0: "negative", 1: "positive"}
predicted_sentiments = [sentiment_map[pred.item()] for pred in predictions]
print("Predicted Sentiments:", predicted_sentiments)

Predicted Sentiments: ['positive', 'positive']


### **Make a Function to pass new reviews**

In [None]:
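def predict_sentiment(reviews, model, tokenizer):
    """Return 'negative' or 'positive' for each review in the list."""
    # Tokenize and move the tensors to the same device as the model
    encodings = tokenizer(reviews, truncation=True, padding=True, max_length=128, return_tensors="pt").to(model.device)
    model.eval()
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**encodings)
    predictions = torch.argmax(outputs.logits, dim=1)
    sentiment_map = {0: "negative", 1: "positive"}
    predicted_sentiments = [sentiment_map[pred.item()] for pred in predictions]
    return predicted_sentiments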


In [None]:
new_reviews = ["This movie was good and excellent!", "I like the film very much."]
predicted_sentiments = predict_sentiment(new_reviews, model, tokenizer)
print("Predicted Sentiments:", predicted_sentiments)


Predicted Sentiments: ['positive', 'positive']
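
`argmax` returns only the winning class. If a confidence score is also wanted, the logits can be passed through a softmax. The sketch below uses plain Python and hypothetical logit values; on real model output, `torch.softmax(outputs.logits, dim=1)` does the same.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one review: [negative, positive]
probs = softmax([-1.2, 2.3])
predicted_index = probs.index(max(probs))

print(probs, predicted_index)
```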


# **Create a loop to auto-generate reviews and predict sentiments**

In [None]:
def predict_sentiment(reviews, model, tokenizer, batch_size=10):
    """Predict sentiments for a list of reviews, batch_size reviews at a time."""
    predicted_sentiments = []
    sentiment_map = {0: "negative", 1: "positive"}
    model.eval()

    # Process reviews in batches
    for i in range(0, len(reviews), batch_size):
        batch = reviews[i:i+batch_size]
        # Tokenize and move the tensors to the same device as the model
        encodings = tokenizer(batch, truncation=True, padding=True, max_length=128, return_tensors="pt").to(model.device)
        with torch.no_grad():  # inference only, no gradients needed
            outputs = model(**encodings)
        predictions = torch.argmax(outputs.logits, dim=1)
        batch_sentiments = [sentiment_map[pred.item()] for pred in predictions]
        predicted_sentiments.extend(batch_sentiments)

    return predicted_sentiments

# Example usage
# Generate a list of 100 sample reviews
new_reviews = [f"Review {i}: This movie was amazing!" if i % 2 == 0 else f"Review {i}: I did not enjoy this movie." for i in range(1, 101)]

# Predict sentiments for all reviews
predicted_sentiments = predict_sentiment(new_reviews, model, tokenizer, batch_size=10)

# Print the results
for review, sentiment in zip(new_reviews, predicted_sentiments):
    print(f"Review: {review} | Sentiment: {sentiment}")


Review: Review 1: I did not enjoy this movie. | Sentiment: negative
Review: Review 2: This movie was amazing! | Sentiment: positive
Review: Review 3: I did not enjoy this movie. | Sentiment: negative
Review: Review 4: This movie was amazing! | Sentiment: positive
Review: Review 5: I did not enjoy this movie. | Sentiment: negative
Review: Review 6: This movie was amazing! | Sentiment: positive
Review: Review 7: I did not enjoy this movie. | Sentiment: negative
Review: Review 8: This movie was amazing! | Sentiment: positive
Review: Review 9: I did not enjoy this movie. | Sentiment: negative
Review: Review 10: This movie was amazing! | Sentiment: positive
Review: Review 11: I did not enjoy this movie. | Sentiment: negative
Review: Review 12: This movie was amazing! | Sentiment: positive
Review: Review 13: I did not enjoy this movie. | Sentiment: negative
Review: Review 14: This movie was amazing! | Sentiment: positive
Review: Review 15: I did not enjoy this movie. | Sentiment: negative
Re