# Sentiment Analysis of IMDb Reviews

This project classifies the sentiment of IMDb movie reviews using two kinds of text representation: static embeddings (Word2Vec) and contextual embeddings (BERT). The comparison evaluates how effective each representation is for sentiment classification, and the results can guide model selection for other natural language processing tasks.
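
A static embedding such as Word2Vec is a lookup table: each word has exactly one vector, whatever words surround it. The sketch below illustrates this with a hypothetical toy table (`TOY_VECTORS`, made-up 3-dimensional values rather than real pretrained vectors). It also averages word vectors into one review vector, which is a common way to feed Word2Vec features to a classifier; that pooling choice is an assumption here, not something this notebook confirms. A contextual model like BERT would instead give "bank" different vectors in the two sentences, because every token's vector is computed from the whole input. That part is not shown, since it requires downloading a model.

```python
import re

import numpy as np

# Hypothetical 3-dimensional vectors; real Word2Vec vectors are learned and much larger.
TOY_VECTORS = {
    "the": np.array([0.1, 0.0, 0.0]),
    "bank": np.array([0.5, 0.2, -0.1]),
    "river": np.array([0.0, 0.9, 0.1]),
    "loan": np.array([0.8, -0.3, 0.2]),
    "approved": np.array([0.3, 0.1, 0.6]),
}
DIM = 3


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def token_vectors(text, table=TOY_VECTORS):
    """Static lookup: each known token maps to one fixed vector; unknown tokens are skipped."""
    return [(tok, table[tok]) for tok in tokenize(text) if tok in table]


def review_vector(text, table=TOY_VECTORS, dim=DIM):
    """Average the static vectors of known tokens (a zero vector if none are known)."""
    vecs = [vec for _, vec in token_vectors(text, table)]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


# "bank" receives the identical vector in both sentences, whatever its meaning.
bank_in_s1 = dict(token_vectors("We sat on the river bank."))["bank"]
bank_in_s2 = dict(token_vectors("The bank approved the loan."))["bank"]
print(np.array_equal(bank_in_s1, bank_in_s2))  # True
print(review_vector("The bank approved the loan."))
```

Averaging discards word order, so a review vector built this way cannot distinguish "not good, bad" from "good, not bad"; this is one of the limitations the contextual model is expected to address.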

### 1. Setup and Installation
#### Library Installation
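First, install the required Python libraries. They cover data manipulation (pandas, NumPy, SciPy), machine learning (scikit-learn), word embeddings (gensim), transformer models (transformers, PyTorch), and visualization (matplotlib, seaborn). The second command installs PyTorch wheels built for CUDA 12.1 from the PyTorch package index; change the `--index-url` to match your CUDA version if it differs: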

In [None]:
%pip install scipy pandas numpy transformers scikit-learn matplotlib seaborn threadpoolctl joblib gensim ipywidgets tqdm
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
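
After installing, a quick check can confirm that every package is importable in the current kernel. This is a minimal sketch, not part of the original notebook; the helper name `missing_packages` is hypothetical. Note that some import names differ from the pip names (`scikit-learn` is imported as `sklearn`).

```python
import importlib.util

# Import names for the packages installed above (pip "scikit-learn" imports as "sklearn").
REQUIRED = [
    "scipy", "pandas", "numpy", "transformers", "sklearn", "matplotlib", "seaborn",
    "threadpoolctl", "joblib", "gensim", "ipywidgets", "tqdm",
    "torch", "torchvision", "torchaudio",
]


def missing_packages(names=REQUIRED):
    """Return the top-level import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")

if "torch" not in missing:
    import torch

    # True only when a CUDA-capable GPU and a compatible driver are visible to PyTorch.
    print("CUDA available:", torch.cuda.is_available())
```

If a package was just installed or upgraded with `%pip`, the kernel may need to be restarted before the new version is picked up.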

#### Import Libraries
Import the required libraries, select the compute device (the GPU if one is available), and fix random seeds for reproducibility:

In [None]:
import pandas as pd
import numpy as np
import torch
from torch import nn, optim, utils
from transformers import BertTokenizer, BertForSequenceClassification
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, confusion_matrix,
    roc_curve, auc, matthews_corrcoef, precision_recall_curve, average_precision_score,
)
import matplotlib.pyplot as plt
import seaborn as sns
import re
from gensim import downloader as api
from tqdm.auto import tqdm
from sklearn.feature_extraction.text import TfidfVectorizer

# Setup device and seeds for reproducibility
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
np.random.seed(42)
torch.manual_seed(42)
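
The cell above seeds NumPy and PyTorch (`torch.manual_seed` seeds the CPU and all CUDA devices). A small helper can additionally seed Python's built-in `random` module and, optionally, ask cuDNN for deterministic kernels. This is a sketch under those assumptions; `set_seed` and `deterministic_cudnn` are hypothetical names, not part of the original notebook.

```python
import random

import numpy as np


def set_seed(seed: int = 42, deterministic_cudnn: bool = False) -> None:
    """Seed Python, NumPy and (if installed) PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; Python and NumPy are still seeded
    torch.manual_seed(seed)  # seeds the CPU and every CUDA device
    if deterministic_cudnn:
        # Trade some GPU speed for repeatable cuDNN kernel selection.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


# Re-seeding should reproduce exactly the same draws.
set_seed(42)
first = (random.random(), np.random.rand())
set_seed(42)
second = (random.random(), np.random.rand())
print(first == second)  # True
```

Even with these settings, some GPU operations can remain nondeterministic, so small run-to-run differences in BERT fine-tuning are still possible. Hugging Face Transformers also provides `transformers.set_seed`, which covers similar ground.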

### 2. Problem Definition and Hypothesis
#### Objective
The main goal is to measure how well sentiment classifiers built on static embeddings and on contextual embeddings classify IMDb movie reviews, and thereby to understand how each method's approach to text representation affects the results.
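
"How well" is quantified with the metrics imported from `sklearn.metrics` earlier. To make their definitions explicit, the sketch below computes accuracy, precision, recall, F1 and the Matthews correlation coefficient by hand from confusion-matrix counts. It assumes labels are `1` for a positive review and `0` for a negative one; in the notebook itself, the scikit-learn functions should be used.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and MCC for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    # Undefined ratios (zero denominators) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / n if n else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical labels and predictions for six reviews.
print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Accuracy alone can look good on a skewed test set; F1 and MCC also account for the balance between false positives and false negatives.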

#### Hypothesis
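We hypothesize that contextual embeddings (BERT) will outperform static embeddings (Word2Vec). Word2Vec assigns each word a single vector regardless of where it appears, whereas BERT computes each token's representation from the entire input, so it can capture context-dependent cues such as negation ("not good") and words whose meaning shifts with context.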
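
Comparing the two models' scores alone does not show whether a gap is larger than chance. One option, which the original does not specify and is assumed here, is an exact McNemar test on both models' predictions for the same test reviews. It considers only reviews where exactly one model is correct; under the null hypothesis of equal error rates, each such disagreement is equally likely to favor either model. The function name `mcnemar_exact` and the predictions below are hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions from two classifiers.

    b = reviews model A gets right and model B gets wrong;
    c = reviews model A gets wrong and model B gets right.
    Under H0, b ~ Binomial(b + c, 0.5). Returns (b, c, p_value).
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    k = min(b, c)
    # Probability of a split at least this lopsided in one direction, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical predictions from two models on the same eight reviews.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
pred_bert = [1, 0, 1, 1, 0, 1, 0, 1]
pred_w2v = [1, 1, 0, 1, 0, 0, 0, 1]
b, c, p = mcnemar_exact(y_true, pred_bert, pred_w2v)
print(b, c, p)  # 3 0 0.25
```

With only three disagreements the difference is not significant, which illustrates why the test should be run on the full held-out set rather than a handful of examples.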