<a id="1"></a>
# <div style="text-align:center; background: #03045e; padding: 7px; border-radius:10px 10px; font-size: 1.5em; color: #48cae4; cursor: pointer;font-family: cursive;"><b> 1. Introduction </b></div>

Sentiment analysis is one of the most common and impactful applications of Natural Language Processing (NLP).  
It enables machines to understand the emotional tone behind text ‚Äî a core component in tasks like product reviews, social media monitoring, and customer feedback analysis.

In this notebook, we dive into the IMDB dataset, which contains 50,000 labeled movie reviews, to build and compare two classic NLP pipelines:

- A **Logistic Regression** model with **TF-IDF** features  
- A **BiLSTM** model powered by **Word2Vec** embeddings

Through this exploration, you‚Äôll see how even traditional models can produce meaningful results in sentiment classification ‚Äî and how preprocessing, feature extraction, and model selection all come together in an end-to-end NLP workflow.

üîÑ In a follow-up notebook (**NLP with IMDB: Classic Models vs. Transformers**), we will compare these results with predictions from a **Transformer-based model (DeBERTa)**, implemented in a separate notebook (**NLP with IMDB: Transformers - DeBERTa**).


![Simple RNN](https://miro.medium.com/v2/resize:fit:828/format:webp/1*3ltsv1uzGR6UBjZ6CUs04A.jpeg)

# <div style="text-align:center; background: #03045e; padding: 7px; border-radius:10px 10px; font-size: 1.5em; color: #e3f2fd; cursor: pointer;font-family: cursive;"><b> 2. Importing Libraries </b></div>

In this section, we import all the libraries required for data processing, feature extraction, and model building. These libraries help us perform tasks like loading and manipulating data, tokenizing text, building neural networks.


In [1]:
# Core libraries
import numpy as np
import pandas as pd

# NLP preprocessing
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('punkt')
nltk.download('omw-1.4')
nltk.download('stopwords')

# Scikit-learn for splitting data, evaluation and models 
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression


# TensorFlow/Keras for deep learning (BiLSTM)
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    LSTM, Bidirectional, Dense, Dropout, Masking
)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping


[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /usr/share/nltk_data...
[nltk_data] Downloading package stopwords to /usr/share/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


# <div style="text-align:center; background: #03045e; padding: 7px; border-radius:10px 10px; font-size: 1.5em; color: #e3f2fd; cursor: pointer;font-family: cursive;"><b> 3. Loading the Dataset </b></div>

In this step, we load the IMDB dataset, which contains 50,000 movie reviews labeled as either positive or negative.  
Each review is stored as a text entry along with its sentiment label.

We use `pandas.read_csv()` to load the dataset into a DataFrame for further processing.


In [2]:
path = '/kaggle/input/imdb-dataset-of-50k-movie-reviews/IMDB Dataset.csv'
df = pd.read_csv(path)

# Show the first 5 rows
df.head()

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
3,Basically there's a family where a little boy ...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive


# <div style="text-align:center; background: #03045e; padding: 7px; border-radius:10px 10px; font-size: 1.5em; color: #e3f2fd; cursor: pointer;font-family: cursive;"><b> 4. Data Preprocessing </b></div>

Text data in its raw form usually contains noise such as HTML tags, punctuation, numbers, links, and stopwords. These elements do not contribute meaningfully to the model and may negatively affect its performance.

In this step, we define a `preprocess_text()` function to clean the reviews by:

- Converting all text to lowercase.
- Removing HTML tags.
- Removing URLs.
- Removing non-alphabetic characters.
- Removing extra spaces.
- Removing stopwords (commonly used words like "the", "is", etc. that carry little meaning in classification tasks).

Finally, we apply this function to the review column and create a new column called `clean_review`.

**Why preprocessing is important in NLP:**
- It reduces noise in the data.
- Helps models focus on meaningful patterns.
- Improves accuracy and generalization of NLP models.


In [3]:
# Define English stopwords
stop_words = set(stopwords.words('english'))

# Define preprocessing function
def preprocess_text(text):
    # Convert to lowercase
    text = text.lower()
    # Remove HTML tags
    text = re.sub(r'<.*?>', '', text)
    # Remove URLs
    text = re.sub(r'http\S+|www\S+', '', text)
    # Remove non-alphabetic characters
    text = re.sub(r'[^a-z\s]', '', text)
    # Remove extra spaces
    text = re.sub(r'\s+', ' ', text).strip()
    # Remove stopwords
    tokens = text.split()
    tokens = [word for word in tokens if word not in stop_words]
    return " ".join(tokens)

# Apply preprocessing to the 'review' column
df['clean_review'] = df['review'].apply(preprocess_text)

# Show sample after preprocessing
df[['review', 'clean_review']].head()


Unnamed: 0,review,clean_review
0,One of the other reviewers has mentioned that ...,one reviewers mentioned watching oz episode yo...
1,A wonderful little production. <br /><br />The...,wonderful little production filming technique ...
2,I thought this was a wonderful way to spend ti...,thought wonderful way spend time hot summer we...
3,Basically there's a family where a little boy ...,basically theres family little boy jake thinks...
4,"Petter Mattei's ""Love in the Time of Money"" is...",petter matteis love time money visually stunni...


### Convert Sentiment Labels

The sentiment labels in the dataset are in text form: "positive" or "negative".  
To make them usable for machine learning models, we map:

- "negative" ‚Üí 0  
- "positive" ‚Üí 1

This numeric format is required for most classification algorithms.


In [4]:
# Keep only Positive and Negative samples
df['sentiment'] = df['sentiment'].map({'negative': 0, 'positive': 1})

### Split the Dataset

We divide the dataset into three parts:

- **Training set (64%)**: Used to train the model.
- **Validation set (16%)**: Used to tune model parameters and prevent overfitting.
- **Test set (20%)**: Used to evaluate final performance (will be used in a separate notebook).

We use `train_test_split` from scikit-learn and apply stratified sampling to maintain equal class distribution across splits.


In [5]:
# Step 1: Split into train (80%) and test (20%)
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42, stratify=df['sentiment'])

# Step 2: Split train into train (80% of 80%) and val (20% of 80%) ‚Üí 64% train, 16% val
train_df, val_df = train_test_split(train_df, test_size=0.2, random_state=42, stratify=train_df['sentiment'])

# Show sizes
print(f"Train size: {len(train_df)}")
print(f"Validation size: {len(val_df)}")
print(f"Test size: {len(test_df)}")


Train size: 32000
Validation size: 8000
Test size: 10000


# <div style="text-align:center; background: #03045e; padding: 7px; border-radius:10px 10px; font-size: 1.5em; color: #e3f2fd; cursor: pointer;font-family: cursive;"><b> 5. Tokenization</b></div>

After cleaning the text, we perform **tokenization**, which is the process of splitting each review (a string) into individual words (called tokens).

We use `nltk.word_tokenize` to convert each cleaned review into a list of words.  
This results in a new column `tokens` that holds the tokenized version of each review.

### Why is Tokenization Important in NLP?

- It transforms raw text into structured units (words or subwords) that can be processed by models.
- Most NLP models operate on word-level inputs, so tokenization is a **critical first step**.
- It enables later steps like embedding, padding, and sequence modeling (e.g., using RNNs or Transformers).

Tokenization is the bridge between raw text and numerical representation of language.


In [6]:
# Apply word_tokenize to the cleaned review column
train_df['tokens'] = train_df['clean_review'].apply(word_tokenize)
val_df['tokens'] = val_df['clean_review'].apply(word_tokenize)
test_df['tokens'] = test_df['clean_review'].apply(word_tokenize)

# Show sample tokens
train_df[['clean_review', 'tokens']].head()


Unnamed: 0,clean_review,tokens
26680,oh yes agree others describe appalling acting ...,"[oh, yes, agree, others, describe, appalling, ..."
16648,basic hook lincoln slow slowness represents th...,"[basic, hook, lincoln, slow, slowness, represe..."
29967,utter trash im huge fan cusacks sole reason wa...,"[utter, trash, im, huge, fan, cusacks, sole, r..."
34122,meet cosmo jason priestley nerdy young bookie ...,"[meet, cosmo, jason, priestley, nerdy, young, ..."
823,dont know people criticise show muchit great f...,"[dont, know, people, criticise, show, muchit, ..."


# <div style="text-align:center; background: #03045e; padding: 7px; border-radius:10px 10px; font-size: 1.5em; color: #e3f2fd; cursor: pointer;font-family: cursive;"><b> 6. Text Representation Methods </b></div>

Before feeding text into machine learning or deep learning models, we must first convert it into a numerical format. This step is known as **text representation**, and it‚Äôs one of the most important steps in any NLP pipeline.

In this section, we explore two common techniques:
- **TF-IDF Feature Extraction**: A classical method that assigns importance to words based on their frequency.
- **Word2Vec Embedding**: A neural embedding technique that captures the semantic meaning of words.

Each method has its advantages and is suited to different types of models.


## 6.1 TF-IDF Feature Extraction

TF-IDF (Term Frequency ‚Äì Inverse Document Frequency) is a classic technique used to convert text data into numerical features.

We use the `TfidfVectorizer` from Scikit-learn to:
- Join tokens back into full sentences (as TF-IDF works on raw text).
- Fit the vectorizer on the training data.
- Transform all text into a matrix of numerical values.

### Why use TF-IDF?

- It highlights **important words** in each document while down-weighting common words across all documents.
- It produces a **sparse matrix** representation useful for traditional ML models like Logistic Regression or SVM.
- It is **fast and interpretable**, making it suitable for baseline models.

TF-IDF doesn‚Äôt understand word meaning or context, but it‚Äôs often surprisingly effective for many NLP tasks.


In [7]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Join the tokens back into full sentences (as TF-IDF expects raw text input)
train_texts = train_df['tokens'].apply(lambda x: ' '.join(x))
val_texts = val_df['tokens'].apply(lambda x: ' '.join(x))
test_texts = test_df['tokens'].apply(lambda x: ' '.join(x))

# Initialize TF-IDF vectorizer with a maximum of 5000 features
tfidf = TfidfVectorizer(max_features=5000)

# Fit the vectorizer on training data and transform it
X_train_tfidf = tfidf.fit_transform(train_texts)

# Transform validation and test sets using the fitted vectorizer
X_val_tfidf = tfidf.transform(val_texts)
X_test_tfidf = tfidf.transform(test_texts)


## 6.2 Word2Vec Embedding

Unlike TF-IDF, Word2Vec learns **dense vector representations** for each word based on its context in the training data.  
This allows us to capture **semantic meaning**‚Äîwords with similar meaning tend to have similar vectors.

We use Gensim's `Word2Vec` to:
- Train an embedding model on our tokenized training data.
- Convert each review into a sequence of word vectors using a helper function.

### Why use Word2Vec?

- It captures **semantic relationships** between words (e.g., king - man + woman ‚âà queen).
- The vectors are **dense and compact**, suitable for neural networks.
- It enables deep models like BiLSTM to process richer information than sparse counts.

### Importance of Embedding in NLP:

- Embedding layers transform words into a format the model can learn from.
- Good embeddings boost model performance by providing meaningful context.
- They serve as the **foundation of modern NLP architectures** (including Transformers, BERT, etc.).

In short, Word2Vec gives your model a better understanding of language structure and meaning.


In [8]:
from gensim.models import Word2Vec

# Train a Word2Vec model on the tokenized text
w2v_model = Word2Vec(sentences=train_df['tokens'], vector_size=200, window=6, min_count=2)

# Function to convert tokens to sequence of word vectors
def tokens_to_sequence(tokens):
    vectors = [w2v_model.wv[word] for word in tokens if word in w2v_model.wv]
    return vectors

# Apply tokens_to_sequence to convert tokens to sequences of word vectors
X_train_seq_w2v = train_df['tokens'].apply(tokens_to_sequence).tolist()
X_val_seq_w2v = val_df['tokens'].apply(tokens_to_sequence).tolist()
X_test_seq_w2v = test_df['tokens'].apply(tokens_to_sequence).tolist()

# <div style="text-align:center; background: #03045e; padding: 7px; border-radius:10px 10px; font-size: 1.5em; color: #e3f2fd; cursor: pointer;font-family: cursive;"><b> 7. Model Training </b></div>

A model is a mathematical structure that learns patterns from data.  
Training a model means teaching it to make predictions by learning from labeled examples (inputs and outputs).


## 7.1 Logistic Regression

We begin by training a simple yet powerful baseline model ‚Äî Logistic Regression ‚Äî using the TF-IDF features.


**Logistic Regression** is a linear model used for binary classification tasks.  
It predicts the probability that a given input belongs to a certain class (e.g., positive or negative).

### Why use Logistic Regression in NLP?
- It is **fast**, **interpretable**, and **easy to implement**.
- It often provides **strong baselines** for text classification when combined with good feature extraction (like TF-IDF).
- Despite being simple, it works surprisingly well on many real-world NLP tasks ‚Äî especially when:
  - The dataset is not huge.
  - Interpretability is important.
  - You need fast prototyping or lightweight deployment.

While LSTMs and Transformers are more powerful for context and deep semantics, Logistic Regression still holds value for:
- Quick experimentation.
- Low-resource environments.
- Tasks where deep models are overkill.

### Explanation of hyperparameters:
- `max_iter=250`: The maximum number of iterations to converge (sometimes TF-IDF + large data needs more steps).
- `C=0.1`: Inverse of regularization strength ‚Äî lower means stronger regularization (helps avoid overfitting).
- `penalty='l2'`: L2 regularization adds a penalty for large weights (encourages simpler models).

### Why evaluate on the validation set?

Validation accuracy shows how well the model generalizes.  
If it's much lower than training accuracy ‚Üí possible **overfitting**.  
If both are close ‚Üí the model is likely generalizing well.


In [9]:
# Initialize Logistic Regression model
log_reg = LogisticRegression(max_iter=250, C=0.1, penalty='l2')
# Train the model
log_reg.fit(X_train_tfidf, train_df['sentiment'])

# Predict on training set
log_train_preds = log_reg.predict(X_train_tfidf)
train_accuracy = accuracy_score(train_df['sentiment'], log_train_preds)

# Predict on validation set
log_val_preds = log_reg.predict(X_val_tfidf)
val_accuracy = accuracy_score(val_df['sentiment'], log_val_preds)

# Print results
print("Logistic Regression Accuracy:")
print(f"Train Accuracy: {train_accuracy:.4f}")
print(f"Validation Accuracy: {val_accuracy:.4f}")


Logistic Regression Accuracy:
Train Accuracy: 0.8768
Validation Accuracy: 0.8698


## 7.2 BiLSTM (Bidirectional LSTM)

Recurrent Neural Networks (RNNs) are designed to handle sequential data.  
Long Short-Term Memory (LSTM) is a special type of RNN that solves the vanishing gradient problem, allowing the model to learn long-term dependencies.

**BiLSTM** extends LSTM by processing the input sequence in both forward and backward directions.  
This gives the model more context, especially useful in NLP where both past and future words can influence meaning.


### Sequence Padding

Before feeding data into the BiLSTM model, we need to make sure that all input sequences have the same shape.  
This is done using `pad_sequences()`:

- `max_len = 280`: the maximum number of tokens per review.
- `vector_size = 200`: the size of each word vector from Word2Vec.
- Padding and truncation are applied **after** the sequence (`post`) for consistency.

This step produces a 3D tensor of shape `(samples, max_len, vector_size)`, which is required for LSTM input.


In [10]:
# Define max sequence length and vector size (same as Word2Vec vector size)
max_len = 280
vector_size = 200

# Pad sequences to make them all of the same shape (max_len x vector_size)
X_train_seq_padded = pad_sequences(X_train_seq_w2v, maxlen=max_len,
                                   dtype='float32',padding='post',
                                   truncating='post', value=0.0)
X_val_seq_padded = pad_sequences(X_val_seq_w2v, maxlen=max_len,
                                 dtype='float32',padding='post',
                                 truncating='post', value=0.0)
X_test_seq_padded = pad_sequences(X_test_seq_w2v, maxlen=max_len,
                                  dtype='float32',padding='post',
                                  truncating='post', value=0.0)


### BiLSTM Model Architecture

The model is built using Keras `Sequential` API:

- `Masking`: ignores padding (zeros) in the input.
- `Bidirectional(LSTM)`: two stacked BiLSTM layers to capture forward and backward context.
- `Dropout`: added after each layer to reduce overfitting.
- `Dense` layers: fully connected layers for classification.
- Final `Dense(1, activation='sigmoid')`: outputs probability of positive sentiment.

We compile the model with:
- `binary_crossentropy` loss (since it‚Äôs a binary classification task).
- `Adam` optimizer with learning rate `0.001`.

### Why BiLSTM for NLP?

- Text is **sequential** ‚Äî the meaning of a word depends on its position and neighboring words.
- LSTM captures dependencies from **past tokens** (left-to-right).
- BiLSTM captures **both past and future tokens** (left-to-right and right-to-left).
- This makes it more powerful than regular LSTM in tasks like sentiment analysis, named entity recognition, and more.

While Transformers have surpassed BiLSTM in many benchmarks, BiLSTM still performs well, especially:
- When data is not massive.
- When model size or training time is limited.
- In low-resource or real-time applications.


In [11]:
# Build the BiLSTM model
model = Sequential([
    Masking(mask_value=0.0, input_shape=(max_len, vector_size)), 
    Bidirectional(LSTM(256, return_sequences=True, 
                       kernel_regularizer=regularizers.l2(0.0005))),
    Dropout(0.4),
    Bidirectional(LSTM(128, kernel_regularizer=regularizers.l2(0.0005))),
    Dropout(0.4),
    Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.0005)),
    Dropout(0.4),
    Dense(32, activation='relu', kernel_regularizer=regularizers.l2(0.0005)),
    Dropout(0.4),
    Dense(1, activation='sigmoid') 
])

# Compile the model
model.compile(loss='binary_crossentropy',
              optimizer= Adam(learning_rate=0.001),
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train_seq_padded,
                    train_df['sentiment'],
                    epochs=12,
                    batch_size=64,
                    validation_data=(X_val_seq_padded, val_df['sentiment']))


  super().__init__(**kwargs)


Epoch 1/12
[1m500/500[0m [32m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m[37m[0m [1m58s[0m 103ms/step - accuracy: 0.7502 - loss: 0.9726 - val_accuracy: 0.8475 - val_loss: 0.5257
Epoch 2/12
[1m500/500[0m [32m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m[37m[0m [1m47s[0m 94ms/step - accuracy: 0.8548 - loss: 0.4926 - val_accuracy: 0.8406 - val_loss: 0.4448
Epoch 3/12
[1m500/500[0m [32m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m[37m[0m [1m46s[0m 93ms/step - accuracy: 0.8646 - loss: 0.4209 - val_accuracy: 0.8869 - val_loss: 0.3466
Epoch 4/12
[1m500/500[0m [32m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m[37m[0m [1m46s[0m 92ms/step - accuracy: 0.8751 - loss: 0.3650 - val_accuracy: 0.8839 - val_loss: 0.3334
Epoch 5/12
[1m500/500[0m [32m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m[37m[0m [1m46s[0m 92ms/step - accuracy: 0.8797 - loss: 0.3473 - val_accuracy: 0.883

# <div style="text-align:center; background: #03045e; padding: 7px; border-radius:10px 10px; font-size: 1.5em; color: #e3f2fd; cursor: pointer;font-family: cursive;"><b> 8. Export Predictions for Evaluation </b></div>

After generating predictions and probabilities from both models, we save the results into two CSV files:

- `logistic_preds.csv`: Contains predictions from the Logistic Regression model.
- `bilstm_preds.csv`: Contains predictions from the BiLSTM model.

Each file includes:
- The original review text.
- The true sentiment label.
- The predicted label.
- The probability for both negative and positive classes.

These files will be used in a **separate notebook** titled:  
**"NLP with IMDB: Classic Models vs. Transformers"**,  
which is dedicated to analyzing and comparing the performance of different models (Logistic Regression, BiLSTM, and DeBERTa).

We separated the notebooks because:
- Running both BiLSTM and DeBERTa in the same Kaggle notebook caused memory/GPU issues.
- We had to **restart the kernel** to free resources.
- This modular approach ensures smooth execution and better resource management.


In [12]:
# Logistic Regression Predictions
logistic_probs = log_reg.predict_proba(X_test_tfidf)
logistic_preds = log_reg.predict(X_test_tfidf)

logistic_results_df = pd.DataFrame({
    'review': test_df['review'].values,
    'true_label': test_df['sentiment'].values,
    'predicted_label': logistic_preds,
    'prob_negative': logistic_probs[:, 0],
    'prob_positive': logistic_probs[:, 1]
})

logistic_results_df.to_csv("logistic_preds.csv", index=False)
print("Saved logistic_preds.csv")


# BiLSTM Predictions
bilstm_probs = model.predict(X_test_seq_padded)
bilstm_preds = (bilstm_probs > 0.5).astype(int).flatten()

bilstm_results_df = pd.DataFrame({
    'review': test_df['review'].values,
    'true_label': test_df['sentiment'].values,
    'predicted_label': bilstm_preds,
    'prob_negative': 1 - bilstm_probs.flatten(),
    'prob_positive': bilstm_probs.flatten()
})

bilstm_results_df.to_csv("bilstm_preds.csv", index=False)
print("Saved bilstm_preds.csv")


Saved logistic_preds.csv
[1m313/313[0m [32m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m[37m[0m [1m8s[0m 23ms/step
Saved bilstm_preds.csv


# <div style="text-align:center; background: #03045e; padding: 7px; border-radius:10px 10px; font-size: 1.5em; color: #e3f2fd; cursor: pointer;font-family: cursive;"><b> 9. Conclusion </b></div>

In this notebook, we explored two classic approaches to sentiment analysis using the IMDB movie reviews dataset:

- **Logistic Regression** with TF-IDF features  
- **BiLSTM** with Word2Vec embeddings

We applied essential NLP steps ‚Äî including text preprocessing, tokenization, and feature extraction ‚Äî then trained both models and saved their predictions for further analysis.

---

Natural Language Processing (NLP) enables machines to understand and interpret human language, and even with relatively simple models, we can uncover meaningful insights from text data.  
This notebook demonstrates that classical methods like Logistic Regression and BiLSTM still serve as strong baselines and foundational tools in modern NLP workflows.

---

In the next phase of this project, we will compare the results from these models with a modern Transformer-based model (**DeBERTa**) in a separate notebook:  
**"NLP with IMDB: Classic Models vs. Transformers"**

This comparative analysis will help us better understand how traditional approaches stack up against state-of-the-art deep learning models in real-world NLP applications.


# <div style="text-align:center; background: #03045e; padding: 7px; border-radius:10px 10px; font-size: 1.5em; color: #e3f2fd; cursor: pointer;font-family: cursive;"><b> End Code ‚ò∫ </b></div>