## Sentiment Analysis with a Basic RNN: Assignment Overview

In this assignment, you will develop a basic Recurrent Neural Network (RNN) to perform sentiment analysis on movie reviews. Your goal is to classify each review as either positive or negative based on its content. This task will introduce you to the fundamentals of sequence modeling with RNNs and the process of working with text data in natural language processing (NLP). 

### Detailed Steps:

#### 1. **Dataset Preparation**
- **Task**: Utilize the IMDB movie reviews dataset, a popular resource for sentiment analysis.
- **Instructions**:
    1. Load the IMDB dataset. You can use PyTorch's `torchtext.datasets` or another convenient loader. The dataset comes as 25,000 reviews in a training set and 25,000 in a testing set. Split the training set further into 20,000 reviews for training and 5,000 for validation.
    2. Preprocess the data: Tokenize the text reviews, turning each review into a list of words. PyTorch's `torchtext.data.utils.get_tokenizer` can be helpful here.
    3. Build a vocabulary from the training dataset. This involves mapping each unique word to an integer index. Keep the model manageable by limiting the vocabulary to the most frequent words, and reserve an index for unknown words so that anything outside the vocabulary can still be encoded.
    4. Convert the reviews into sequences of integers using this vocabulary. Apply the same conversion to your training, validation, and testing sets.
    5. Choose a fixed sequence length, then pad shorter reviews and truncate longer ones so that every input has the same size.
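
The five instructions above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the required implementation: the regex tokenizer is a crude stand-in for a real tokenizer, the `<pad>` and `<unk>` tokens and all function names are hypothetical, and the five toy reviews stand in for the IMDB data.

```python
import random
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens, given ids 0 and 1

def tokenize(text):
    """Lowercase and split into word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocab(token_lists, max_size):
    """Map the most frequent tokens to integer ids, reserving 0 and 1 for PAD and UNK."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {PAD: 0, UNK: 1}
    for tok, _ in counts.most_common(max_size - len(vocab)):
        vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab, max_len):
    """Convert tokens to ids, then truncate or right-pad to exactly max_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))

def train_val_split(examples, n_val, seed=0):
    """Shuffle a copy of the examples and hold out n_val of them for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]

# Toy (label, review) pairs standing in for the IMDB training set
reviews = [
    (1, "A great movie, truly great acting!"),
    (0, "Terrible plot and terrible pacing."),
    (1, "I loved it."),
    (0, "Not great."),
    (1, "Great fun."),
]
train, val = train_val_split(reviews, n_val=1)          # 20,000 / 5,000 on the real data
train_tokens = [tokenize(text) for _, text in train]
vocab = build_vocab(train_tokens, max_size=10)          # built from training data only
encoded_train = [encode(toks, vocab, max_len=6) for toks in train_tokens]
encoded_val = [encode(tokenize(text), vocab, max_len=6) for _, text in val]
```

On the real dataset, `max_size` and `max_len` are choices you would justify yourself, for example by looking at word frequencies and the distribution of review lengths.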

#### 2. **Model Definition**
- **Task**: Define a basic RNN model in PyTorch for sentiment analysis.
- **Instructions**:
    1. Create a model class that inherits from `torch.nn.Module`. Incorporate at least one RNN layer (`torch.nn.RNN`, `torch.nn.GRU`, or `torch.nn.LSTM`) to handle the sequential nature of text data.
    2. Include an embedding layer (`torch.nn.Embedding`) to transform integer tokens into dense vector representations before feeding them into the RNN layer.
    3. Your model should output a sentiment score that can be interpreted as the probability that the review is positive. A final fully connected layer followed by a sigmoid activation function achieves this.
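
One possible shape for such a model is sketched below. The class name `SentimentRNN`, the choice of a GRU, and every layer size are illustrative assumptions rather than requirements of the assignment.

```python
import torch
from torch import nn

class SentimentRNN(nn.Module):
    """Embedding -> GRU -> linear -> sigmoid. All sizes are illustrative assumptions."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_layers=1, pad_idx=0):
        super().__init__()
        # padding_idx keeps the embedding of the padding token fixed at zero
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.rnn = nn.GRU(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) tensor of integer token ids
        embedded = self.embedding(token_ids)       # (batch, seq_len, embed_dim)
        _, hidden = self.rnn(embedded)             # hidden: (num_layers, batch, hidden_dim)
        logits = self.fc(hidden[-1]).squeeze(1)    # last layer's final state -> (batch,)
        return torch.sigmoid(logits)               # probability that each review is positive

model = SentimentRNN(vocab_size=1000)
batch = torch.randint(0, 1000, (4, 20))            # 4 fake reviews of 20 token ids each
probs = model(batch)
```

Swapping in `nn.LSTM` requires one change, because it returns a `(hidden, cell)` tuple as its second output. Note also that with right-padded inputs the final hidden state is computed after the padding tokens; `torch.nn.utils.rnn.pack_padded_sequence` is one way to avoid that.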

#### 3. **Training the Model**
- **Task**: Implement the training loop for your RNN model on the sentiment analysis task.
- **Instructions**:
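    1. Choose a suitable loss function, such as binary cross-entropy (`torch.nn.BCELoss`), since this is a binary classification task. `BCELoss` expects probabilities, so it pairs with a sigmoid output; if your model returns raw logits instead, use `torch.nn.BCEWithLogitsLoss`.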
    2. Select an optimizer (e.g., `torch.optim.Adam`) to update your model's weights during training.
    3. Develop a training loop where you feed batches of data to the model, calculate the loss, and update the model parameters using the optimizer.
    4. Validate your model's performance on a separate portion of the dataset to monitor its generalization ability.
    5. Experiment with different hyperparameters (e.g., learning rate, number of RNN layers) to find a configuration that performs well on the validation set.
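
A plain PyTorch version of this loop might look like the sketch below. It assumes synthetic random data and a deliberately tiny model (`TinyRNN`, a hypothetical name) so that it runs on its own; with PyTorch Lightning or the book's tools, most of this bookkeeping is handled for you.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 64 "reviews" of 12 token ids each, with random 0/1 labels
VOCAB_SIZE, SEQ_LEN = 50, 12
inputs = torch.randint(1, VOCAB_SIZE, (64, SEQ_LEN))
labels = torch.randint(0, 2, (64,)).float()        # BCELoss needs float targets
train_loader = DataLoader(TensorDataset(inputs[:48], labels[:48]), batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(inputs[48:], labels[48:]), batch_size=16)

class TinyRNN(nn.Module):
    """Small illustrative model: embedding -> RNN -> linear -> sigmoid."""

    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, 16, padding_idx=0)
        self.rnn = nn.RNN(16, 32, batch_first=True)
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        _, hidden = self.rnn(self.embedding(x))
        return torch.sigmoid(self.fc(hidden[-1])).squeeze(1)

model = TinyRNN()
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def run_epoch(loader, train):
    """Run one pass over loader; update weights only when train is True."""
    model.train(train)
    total_loss, n_examples = 0.0, 0
    with torch.set_grad_enabled(train):
        for x, y in loader:
            loss = loss_fn(model(x), y)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * len(y)
            n_examples += len(y)
    return total_loss / n_examples                 # mean loss per example

history = []
for epoch in range(3):
    train_loss = run_epoch(train_loader, train=True)
    val_loss = run_epoch(val_loader, train=False)
    history.append((train_loss, val_loss))
    print(f"epoch {epoch}: train loss {train_loss:.4f}, val loss {val_loss:.4f}")
```

Because the labels here are random, the validation loss is not expected to improve; on real data, comparing the two curves in `history` is what reveals overfitting.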

#### 4. **Evaluation and Analysis**
- **Task**: Assess your trained model's performance on the testing set.
- **Instructions**:
    1. After training, use your model to predict sentiments for the reviews in the testing set. Calculate the accuracy or other relevant metrics to evaluate performance.
    2. Analyze the results: Consider examining specific cases where the model performs well or poorly to gain insights into its strengths and weaknesses.
    3. Reflect on the limitations of your approach and potential improvements. For instance, consider how more complex models or different preprocessing steps might impact performance.
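
As a starting point for the evaluation and error analysis, the sketch below computes accuracy and confusion counts from predicted probabilities, and ranks the mistakes by how confident the model was. The function names, the 0.5 threshold, and the six example predictions are assumptions made for illustration.

```python
def evaluate(probs, labels, threshold=0.5):
    """Return accuracy and confusion counts for binary predictions."""
    preds = [1 if p >= threshold else 0 for p in probs]
    pairs = list(zip(preds, labels))
    counts = {
        "tp": sum(p == 1 and y == 1 for p, y in pairs),
        "tn": sum(p == 0 and y == 0 for p, y in pairs),
        "fp": sum(p == 1 and y == 0 for p, y in pairs),
        "fn": sum(p == 0 and y == 1 for p, y in pairs),
    }
    counts["accuracy"] = (counts["tp"] + counts["tn"]) / len(labels)
    return counts

def most_confident_errors(probs, labels, k=3, threshold=0.5):
    """Indices of misclassified examples, most confident mistakes first."""
    errors = [
        (abs(p - y), i)
        for i, (p, y) in enumerate(zip(probs, labels))
        if (p >= threshold) != (y == 1)
    ]
    return [i for _, i in sorted(errors, reverse=True)[:k]]

# Made-up model outputs and true labels for six test reviews
probs = [0.9, 0.2, 0.7, 0.4, 0.95, 0.1]
labels = [1, 0, 0, 1, 0, 0]

metrics = evaluate(probs, labels)
worst = most_confident_errors(probs, labels, k=2)
print(metrics)
print(worst)
```

Reading the reviews at the indices in `worst` is a direct way to find cases such as negation or sarcasm where the model is confidently wrong.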

This assignment will give you practical experience with sequence modeling using RNNs and the foundational steps involved in processing and analyzing text data for sentiment analysis.

### Notes and Resources:

- Section 12.1 in *Inside Deep Learning: Math, Algorithms, Models* introduces most of what you need for Steps 1-3.
- You can use PyTorch Lightning, as we've been doing, or the tools that come with the book.
- The code for each chapter and the `idlmam.py` file you'll need to import from can be found in the [book's GitHub repo](https://github.com/EdwardRaff/Inside-Deep-Learning).
- This assignment will be broken into three deliverables:
    1. Data preparation.  Complete Step 1.
    2. Model formulation and training.  Add Steps 2 and 3.
    3. Evaluation and Analysis.  Add Step 4.
- The final submission will be graded on both technical correctness and presentation. Your work should be in a well-formatted Jupyter notebook with narration and a summary.