# Base2 Model: Lightweight Text Sentiment Analysis Using PyTorch

## Introduction
The **Base2 model** improves upon the initial sentiment analysis approach by using a **neural network** with **LSTM** (Long Short-Term Memory) layers, which are better suited to sequential data such as text. Unlike the previous model based on `TextBlob`, this model is designed to handle more complex patterns in text while staying **computationally efficient** through a **lightweight tokenizer** and a **custom vocabulary** built directly from the dataset.

### Why Use LSTM?
1. **Sequential Learning**: LSTM is ideal for processing text data as it can learn the relationships between words in a sentence and capture the context necessary to understand sentiment.
2. **Memory Capabilities**: LSTM can "remember" important information across longer sequences, which is particularly useful when analyzing longer text entries.
3. **Efficiency with Lightweight Components**: By using a basic tokenizer and a custom vocabulary built from the data, we reduce computational load and memory usage while still benefiting from deep learning performance.

### Model Architecture
The architecture of this model includes the following layers:
1. **Embedding Layer**: We use a randomly initialized embedding layer, which is learned during training. This reduces the complexity compared to pre-trained embeddings (like GloVe) while still providing effective representations.
2. **LSTM Layer**: This layer processes the sequential data (text) and learns the underlying patterns that contribute to sentiment.
3. **Fully Connected Layer**: This final layer maps the LSTM's output to sentiment categories (positive, negative, neutral).
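4. **Softmax Activation**: Converts the output to probabilities for each sentiment class. In the code below, the model itself returns raw scores (logits); `nn.CrossEntropyLoss` applies log-softmax internally during training, and prediction takes the index of the largest logit.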
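
As a minimal sketch of what the softmax step computes, the snippet below converts three raw class scores into probabilities using only the standard library. The logit values are hypothetical, and the class order (0 = negative, 1 = positive, 2 = neutral) follows the label convention used later in this notebook.

```python
import math

def softmax(logits):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability without changing the result.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (negative, positive, neutral)
logits = [0.2, 1.5, -0.3]
probs = softmax(logits)
predicted_class = probs.index(max(probs))
print([round(p, 3) for p in probs], predicted_class)
```

Because softmax preserves the ordering of the scores, taking the largest logit and taking the largest probability select the same class.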

### Key Benefits of the Lightweight Model
- **Reduced Complexity**: By learning its own embeddings instead of loading pre-trained ones, and by using a custom tokenizer and vocabulary, the model keeps memory use and setup time low.
- **Performance**: The LSTM can still capture word order and context, and the model runs on a GPU when one is available.
- **Scalability**: It can be extended to larger datasets or integrated with additional features like personalized music recommendations.

## Dataset
The dataset consists of labeled text entries (e.g., movie reviews, product reviews, social media posts) with sentiment labels:
- **Positive**
- **Negative**
- **Neutral**

The text data is preprocessed by:
1. **Tokenization**: Using a basic tokenizer that lowercases the text and splits it on whitespace. Punctuation stays attached to the adjacent word.
2. **Padding and Truncation**: Ensuring all input sequences have the same length (50 tokens in the code below).
3. **Custom Vocabulary**: A lightweight vocabulary built from the dataset itself, with reserved indices for padding (`<pad>`) and unknown words (`<unk>`).
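
The three preprocessing steps can be sketched on a toy corpus as follows. This is an illustration only: the helper names `tokenize` and `encode` and the two-sentence corpus are assumptions made for the example, not the notebook's own functions.

```python
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; punctuation stays attached to words.
    return text.lower().split()

def encode(text, vocab, max_len):
    # Map tokens to indices (unknown words fall back to <unk>), then pad or truncate.
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(text)]
    ids = ids[:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

# Toy corpus, for illustration only
corpus = ["The food was delicious!", "The service was terrible."]
counts = Counter(tok for sentence in corpus for tok in tokenize(sentence))

# Reserve 0 for padding and 1 for unknown words, then add words by frequency.
vocab = {"<pad>": 0, "<unk>": 1}
for word, _ in counts.most_common():
    vocab[word] = len(vocab)

print(tokenize("The food was delicious!"))  # ['the', 'food', 'was', 'delicious!']
print(encode("The food was great", vocab, max_len=6))
```

Note that "great" never appears in the toy corpus, so it maps to the `<unk>` index, and the encoded sequence is right-padded with the `<pad>` index up to `max_len`.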

## Training Process
The data is split into training and test sets (80/20), and the model learns to predict sentiment from the text input. The notebook reports the average training loss after each epoch and measures accuracy on the test set once training finishes. The metrics are:
- **Accuracy**: The percentage of correct predictions.
- **Loss**: Measures how far the predicted sentiments are from the actual labels, minimized during training.
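
To make the two metrics concrete, here is a small standard-library sketch. The function names and sample values are hypothetical; `cross_entropy` computes the per-example quantity that a cross-entropy loss averages over a batch.

```python
import math

def accuracy(predicted, actual):
    """Fraction of predictions that match the labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class for one example."""
    return -math.log(probs[label])

print(accuracy([1, 0, 2, 1], [1, 0, 1, 1]))        # 0.75
print(cross_entropy([0.1, 0.8, 0.1], label=1))     # about 0.223
print(cross_entropy([1/3, 1/3, 1/3], label=1))     # ln(3), about 1.099
```

A useful reference point: with three classes, a model that assigns equal probability to every class has a loss of ln(3) ≈ 1.10, so a training loss that stays close to that value suggests the model is doing little better than guessing.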

## Music Recommendation System
Once the sentiment is predicted, the app can provide personalized music recommendations based on the sentiment:
- **Positive Sentiment**: Upbeat and energizing music.
- **Negative Sentiment**: Relaxing or comforting music.
- **Neutral Sentiment**: Balanced or neutral music.

The system can deliver these recommendations in real-time, enhancing user experience based on their detected emotional state.
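
One simple way to wire the predicted label to a recommendation is a lookup table. The sketch below is hypothetical: the names `LABEL_NAMES`, `MUSIC_BY_SENTIMENT`, and `recommend_music` are illustrative, and only the label convention (0 = negative, 1 = positive, 2 = neutral) comes from the notebook.

```python
# Label indices follow the notebook's convention: 0 = negative, 1 = positive, 2 = neutral.
LABEL_NAMES = {0: "negative", 1: "positive", 2: "neutral"}

# Hypothetical mapping from sentiment to a music category.
MUSIC_BY_SENTIMENT = {
    "positive": "upbeat and energizing",
    "negative": "relaxing or comforting",
    "neutral": "balanced or neutral",
}

def recommend_music(label_index):
    """Return the sentiment name and a music category for a predicted class index."""
    sentiment = LABEL_NAMES[label_index]
    return sentiment, MUSIC_BY_SENTIMENT[sentiment]

print(recommend_music(0))  # ('negative', 'relaxing or comforting')
```

A real application would replace the category strings with playlists or track identifiers from whatever music service it integrates with.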

## Next Steps for Improvement
1. **Transformer Models**: In future iterations, models like **BERT** or **GPT** could replace LSTM to achieve higher accuracy in sentiment prediction.
2. **Multimodal Analysis**: We can combine text-based sentiment analysis with other inputs, such as **facial expressions** or **voice analysis**, to detect emotions more comprehensively.
3. **Personalized Recommendations**: The app can learn individual preferences over time and adjust its music recommendations for each user, making the experience more personalized.

## Conclusion
The **Base2 model** enhances the previous sentiment analysis by utilizing **LSTM** in a lightweight, efficient manner. This model balances **performance** and **efficiency** by using a simpler tokenizer and custom vocabulary while retaining the power of deep learning. It can be further improved and scaled by integrating advanced models or additional input modalities.


In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from sklearn.model_selection import train_test_split
from collections import Counter

In [2]:
# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


In [27]:
# Dummy data: (text, label) pairs; labels: 0 = negative, 1 = positive, 2 = neutral
data = [
    ("The food was delicious!", 1),
    ("This movie was so boring.", 0),
    ("I had a fantastic time at the concert.", 1),
    ("The service was terrible.", 0),
    ("The weather is perfect today.", 1),
    ("I'm not sure I like this new phone.", 2),
    ("This book is a real page-turner!", 1),
    ("The customer service was unhelpful.", 0),
    ("I'm feeling stressed out.", 2),
    ("The view from the mountain was breathtaking.", 1),
    ("This coffee is too strong.", 0),
    ("I'm really enjoying this new hobby.", 1),
    ("The flight was delayed for hours.", 2),
    ("I'm so excited for my upcoming trip.", 1),
    ("The game was a complete disaster.", 0),
    ("The presentation was well-organized.", 1),
    ("I'm disappointed with the outcome.", 2),
    ("I'm feeling very happy right now.", 1),
    ("The restaurant was overpriced.", 0),
    ("The meeting was productive.", 1),
    ("I'm feeling a bit under the weather.", 2),
    ("The show was hilarious!", 1),
    ("The product is faulty.", 0),
    ("The journey was enjoyable.", 1),
    ("I'm not impressed with the quality.", 2),
    ("The party was a huge success.", 1),
    ("The music was too loud.", 0),
    ("I'm looking forward to the weekend.", 1),
    ("The website is difficult to navigate.", 2),
    ("This is the best day ever!", 1),
    ("The design is outdated.", 0),
    ("I'm feeling very relaxed.", 1),
    ("The service was slow and inefficient.", 2),
    ("I love the new design!", 1),
    ("The food was bland and tasteless.", 0),
    ("I'm feeling energized and motivated.", 1),
    ("The instructions were confusing.", 2),
    ("The scenery was picturesque.", 1),
    ("The product is overpriced for what it offers.", 0),
    ("I'm feeling optimistic about the future.", 1),
    ("The communication was poor.", 2),
    ("This is a great opportunity.", 1),
    ("The movie was predictable and cliche.", 0),
    ("I'm feeling refreshed and rejuvenated.", 1),
    ("The event was poorly organized.", 2),
    ("I'm so proud of myself.", 1),
    ("The service was rude and unprofessional.", 0),
    ("I'm feeling grateful for everything I have.", 1),
    ("The website is unreliable.", 2),
    ("The team worked together effectively.", 1),
    ("The results were disappointing.", 0),
    ("I'm feeling calm and collected.", 1),
    ("The product is of low quality.", 2),
    ("The book was insightful and thought-provoking.", 1),
    ("The weather was unpleasant.", 0),
    ("I'm feeling happy and content.", 1),
    ("The customer service was excellent.", 2),
    ("The experience was unforgettable.", 1),
    ("The movie was predictable.", 0),
    ("I'm feeling inspired and motivated.", 1),
    ("The service was slow and inattentive.", 2),
    ("The meal was delicious and well-presented.", 1),
    ("The product is defective.", 0),
    ("I'm feeling optimistic and hopeful.", 1),
    ("The website is user-friendly.", 2),
    ("The scenery was stunning.", 1),
    ("The product is not worth the price.", 0),
    ("I'm feeling peaceful and serene.", 1),
    ("The communication was clear and concise.", 2),
    ("The event was a resounding success.", 1),
    ("The music was loud and distracting.", 0),
    ("I'm feeling excited for what the future holds.", 1),
    ("The website is slow to load.", 2),
    ("The team worked hard and achieved their goals.", 1),
    ("The outcome was unsatisfactory.", 0),
    ("I'm feeling relaxed and at ease.", 1),
    ("The product is durable and reliable.", 2),
    ("The concert was amazing!", 1),
    ("The food was cold and unappetizing.", 0),
    ("I'm feeling happy and carefree.", 1),
    ("The service was friendly and helpful.", 2),
    ("The product is innovative and groundbreaking.", 1),
    ("The experience was underwhelming.", 0),
    ("I'm feeling energized and ready for the day.", 1),
    ("The website is easy to use.", 2),
    ("The view was breathtaking.", 1),
    ("The product is overpriced.", 0),
    ("I'm feeling content and grateful.", 1),
    ("The communication was efficient and timely.", 2),
    ("The event was well-attended.", 1),
    ("The music was too loud and repetitive.", 0),
    ("I'm feeling optimistic about my future.", 1),
    ("The website is mobile-friendly.", 2),
    ("The team worked effectively to solve the problem.", 1),
    ("The results were disappointing.", 0),
    ("I'm feeling calm and focused.", 1),
    ("The product is of poor quality.", 2),
    ("The presentation was engaging and informative.", 1),
    ("The weather was stormy and unpleasant.", 0),
    ("I'm feeling happy and fulfilled.", 1),
    ("The customer service was exceptional.", 2),
    ("The experience was memorable and enjoyable.", 1),
    ("The movie was boring and predictable.", 0),
    ("I'm feeling inspired and creative.", 1),
    ("The service was slow and disorganized.", 2),
    ("The meal was delicious and well-prepared.", 1),
    ("The product is faulty and unreliable.", 0),
    ("I'm feeling hopeful for the future.", 1),
    ("The website is easy to navigate.", 2),
    ("The scenery was beautiful.", 1),
    ("The product is overpriced for the quality.", 0),
    ("I'm feeling peaceful and at ease.", 1),
    ("The communication was clear and effective.", 2),
    ("The event was a success.", 1),
    ("The music was too loud and aggressive.", 0),
    ("I'm feeling excited about the possibilities.", 1),
    ("The website is slow to load and unresponsive.", 2),
    ("The team worked together well.", 1),
    ("The outcome was disappointing.", 0),
    ("I'm feeling calm and collected.", 1),
    ("The product is of poor quality and unreliable.", 2),
    ("The presentation was well-organized and informative.", 1),
    ("The weather was terrible.", 0),
    ("I'm feeling happy and content.", 1),
    ("The customer service was excellent.", 2),
    ("The experience was unforgettable.", 1),
    ("The movie was poorly acted and directed.", 0),
    ("I'm feeling inspired and motivated.", 1),
    ("The service was slow and inefficient.", 2),
    ("The meal was delicious and well-presented.", 1),
    ("The product is faulty and defective.", 0),
    ("I'm feeling optimistic and hopeful.", 1),
    ("The website is user-friendly and easy to navigate.", 2),
    ("The scenery was stunning and breathtaking.", 1),
    ("The product is overpriced for what it offers.", 0),
    ("I'm feeling peaceful and serene.", 1),
    ("The communication was clear and concise.", 2),
    ("The event was a success.", 1),
    ("The music was too loud and repetitive.", 0),
    ("I'm feeling excited for what the future holds.", 1),
    ("The website is mobile-friendly and easy to use.", 2),
    ("The team worked effectively to achieve their goals.", 1),
    ("The results were disappointing.", 0),
    ("I'm feeling calm and focused.", 1),
    ("The product is of poor quality and unreliable.", 2),
    ("The presentation was engaging and informative.", 1),
    ("The weather was unpleasant.", 0),
    ("I'm feeling happy and fulfilled.", 1),
    ("The customer service was exceptional.", 2),
    ("The experience was memorable and enjoyable.", 1),
    ("The movie was poorly written.", 0),
    ("I'm feeling inspired and creative.", 1),
    ("The service was slow and disorganized.", 2),
    ("The meal was delicious and well-prepared.", 1),
    ("The product is faulty and defective.", 0),
    ("I'm feeling hopeful for the future.", 1),
    ("The website is easy to navigate and user-friendly.", 2),
    ("The scenery was beautiful and breathtaking.", 1),
    ("The product is overpriced for the quality.", 0),
    ("I'm feeling peaceful and at ease.", 1),
    ("The communication was clear and effective.", 2),
    ("The event was a success.", 1),
    ("The music was too loud and aggressive.", 0),
    ("I'm feeling excited about the possibilities.", 1),
    ("The website is slow to load and unresponsive.", 2),
    ("The team worked together well to achieve their goals.", 1),
    ("The outcome was disappointing.", 0),
    ("I'm feeling calm and collected.", 1),
    ("The product is of poor quality and unreliable.", 2),
    ("The presentation was well-organized and informative.", 1),
    ("The weather was terrible.", 0),
    ("I'm feeling happy and content.", 1),
    ("The customer service was excellent.", 2),
    ("The experience was unforgettable.", 1),
    ("The movie was poorly acted.", 0),
    ("I'm feeling inspired and motivated.", 1),
    ("The service was slow and inefficient.", 2),
    ("The meal was delicious and well-presented.", 1),
    ("The product is faulty and defective.", 0),
    ("I'm feeling optimistic and hopeful.", 1),
    ("The website is user-friendly and easy to navigate.", 2),
    ("The scenery was stunning and breathtaking.", 1),
    ("The product is overpriced for what it offers.", 0),
    ("I'm feeling peaceful and serene.", 1),
    ("The communication was clear and concise.", 2),
    ("The event was a success.", 1),
    ("The music was too loud and repetitive.", 0),
    ("I'm feeling excited for what the future holds.", 1),
    ("The website is mobile-friendly and easy to use.", 2),
    ("The team worked effectively to achieve their goals.", 1),
    ("The results were disappointing.", 0),
    ("I'm feeling calm and focused.", 1),
    ("The product is of poor quality and unreliable.", 2),
    ("The presentation was engaging and informative.", 1),
    ("The weather was unpleasant.", 0),
    ("I'm feeling happy and fulfilled.", 1),
    ("The customer service was exceptional.", 2),
    ("The experience was memorable and enjoyable.", 1),
    ("The movie was poorly written.", 0),
    ("I'm feeling inspired and creative.", 1),
    ("The service was slow and disorganized.", 2),
    ("The meal was delicious and well-prepared.", 1),
    ("The product is faulty and defective.", 0),
    ("I'm feeling hopeful for the future.", 1),
    ("The website is easy to navigate and user-friendly.", 2),
    ("The scenery was beautiful and breathtaking.", 1),
    ("The product is overpriced for the quality.", 0),
    ("I'm feeling peaceful and at ease.", 1),
    ("The communication was clear and effective.", 2),
    ("The event was a success.", 1),
    ("The music was too loud and aggressive.", 0),
    ("I'm feeling excited about the possibilities.", 1),
    ("The website is slow to load and unresponsive.", 2),
    ("The team worked together well to achieve their goals.", 1),
    ("The outcome was disappointing.", 0),
    ("I'm feeling calm and collected.", 1),
    ("The product is of poor quality and unreliable.", 2),
    ("The presentation was well-organized and informative.", 1),
    ("The weather was terrible.", 0),
    ("I'm feeling happy and content.", 1),
    ("The customer service was excellent.", 2),
    ("The experience was unforgettable.", 1),
    ("The movie was poorly acted.", 0),
    ("I'm feeling inspired and motivated.", 1),
    ("The service was slow and inefficient.", 2),
    ("The meal was delicious and well-presented.", 1),
    ("The product is faulty and defective.", 0),
    ("I'm feeling optimistic and hopeful.", 1),
    ("The website is user-friendly and easy to navigate.", 2),
    ("The scenery was stunning and breathtaking.", 1),
    ("The product is overpriced for what it offers.", 0),
    ("I'm feeling peaceful and serene.", 1),
    ("The communication was clear and concise.", 2),
    ("The event was a success.", 1),
    ("The music was too loud and repetitive.", 0),
    ("I'm feeling excited for what the future holds.", 1),
    ("The website is mobile-friendly and easy to use.", 2),
    ("The team worked effectively to achieve their goals.", 1),
    ("The results were disappointing.", 0),
    ("I'm feeling calm and focused.", 1),
    ("The product is of poor quality and unreliable.", 2)
]

In [28]:
# Basic whitespace tokenizer
def basic_tokenizer(text):
    return text.lower().split()

# Build custom vocabulary based on dataset
def build_vocab(dataset, tokenizer, max_vocab_size=50000):
    word_freq = Counter()
    for sentence, _ in dataset:
        tokens = tokenizer(sentence)
        word_freq.update(tokens)
    
    # Create vocab dict with most common words and assign indices
    vocab = {word: idx+2 for idx, (word, _) in enumerate(word_freq.most_common(max_vocab_size))}
    vocab["<pad>"] = 0  # Padding token
    vocab["<unk>"] = 1  # Unknown token
    return vocab

# Build vocabulary from data
vocab = build_vocab(data, basic_tokenizer)

In [29]:
# Dataset class for PyTorch
class SentimentDataset(Dataset):
    def __init__(self, data, tokenizer, vocab, max_len=50):
        self.data = data
        self.tokenizer = tokenizer
        self.vocab = vocab
        self.max_len = max_len
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        text, label = self.data[idx]
        tokens = self.tokenizer(text)
        indices = [self.vocab.get(token, self.vocab["<unk>"]) for token in tokens]
        if len(indices) < self.max_len:
            indices += [self.vocab["<pad>"]] * (self.max_len - len(indices))
        else:
            indices = indices[:self.max_len]
        return torch.tensor(indices), torch.tensor(label)

In [30]:
# Dataset and Dataloader
train_data, test_data = train_test_split(data, test_size=0.2)
train_dataset = SentimentDataset(train_data, basic_tokenizer, vocab, max_len=50)
test_dataset = SentimentDataset(test_data, basic_tokenizer, vocab, max_len=50)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32)


In [31]:
# Model definition
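class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        # Embeddings are randomly initialized and learned during training
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        embedded = self.embedding(x)         # (batch, seq_len, embedding_dim)
        lstm_out, _ = self.lstm(embedded)    # (batch, seq_len, hidden_dim)
        # Classify from the output at the final timestep. With inputs padded to
        # max_len, this position is usually a <pad> token rather than the last word.
        output = self.fc(lstm_out[:, -1, :])
        return output                        # raw logits, shape (batch, output_dim)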
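
Because every input is right-padded to 50 tokens while the sentences are much shorter, `lstm_out[:, -1, :]` reads the LSTM state after a long run of padding. A possible variant, sketched below under the assumption that padding is always trailing and uses index 0, classifies from the output at the last real token instead. The class name `PaddingAwareSentimentLSTM` is hypothetical, and this variant has not been evaluated on the data in this notebook.

```python
import torch
import torch.nn as nn

class PaddingAwareSentimentLSTM(nn.Module):
    """Sketch: like SentimentLSTM, but reads the output at the last non-pad token."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, pad_idx=0):
        super().__init__()
        self.pad_idx = pad_idx
        # padding_idx keeps the <pad> embedding fixed at zero during training
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Count real tokens per sequence (at least 1 so the index stays valid)
        lengths = (x != self.pad_idx).sum(dim=1).clamp(min=1)
        lstm_out, _ = self.lstm(self.embedding(x))
        # Select the output at position length-1 for each sequence in the batch
        batch_idx = torch.arange(x.size(0), device=x.device)
        last_real = lstm_out[batch_idx, lengths - 1, :]
        return self.fc(last_real)

# Toy usage with hypothetical sizes
demo_model = PaddingAwareSentimentLSTM(vocab_size=10, embedding_dim=8, hidden_dim=16, output_dim=3)
batch = torch.tensor([[2, 3, 4, 0, 0], [5, 6, 0, 0, 0]])
print(demo_model(batch).shape)  # torch.Size([2, 3])
```

Since the LSTM is unidirectional, the output at the last real token does not depend on how much padding follows it. `torch.nn.utils.rnn.pack_padded_sequence` is another common way to achieve the same effect.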


In [36]:
# Parameters
vocab_size = len(vocab)
embedding_dim = 50  # Reduced dimensionality
hidden_dim = 128
output_dim = 3  # For positive, negative, neutral

# Model, loss, and optimizer
model = SentimentLSTM(vocab_size, embedding_dim, hidden_dim, output_dim).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [37]:
# Training loop
for epoch in range(100):
    model.train()
    epoch_loss = 0
    for text, label in train_loader:
        text, label = text.to(device), label.to(device)
        optimizer.zero_grad()
        predictions = model(text)
        loss = criterion(predictions, label)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    
    print(f"Epoch {epoch+1}: Loss = {epoch_loss/len(train_loader)}")

Epoch 1: Loss = 1.0643535554409027
Epoch 2: Loss = 1.0398998061815898
Epoch 3: Loss = 1.0375523567199707
Epoch 4: Loss = 1.0369211435317993
Epoch 5: Loss = 1.0401580035686493
Epoch 6: Loss = 1.0348348716894786
Epoch 7: Loss = 1.0404391884803772
Epoch 8: Loss = 1.0372857451438904
Epoch 9: Loss = 1.0380591849486034
Epoch 10: Loss = 1.0367820759614308
Epoch 11: Loss = 1.0374658306439717
Epoch 12: Loss = 1.0381077031294506
Epoch 13: Loss = 1.0389847854773204
Epoch 14: Loss = 1.0384693443775177
Epoch 15: Loss = 1.0359791815280914
Epoch 16: Loss = 1.0388829211393993
Epoch 17: Loss = 1.0367944935957591
Epoch 18: Loss = 1.0376139481862385
Epoch 19: Loss = 1.0357813636461894
Epoch 20: Loss = 1.0363734165827434
Epoch 21: Loss = 1.038479596376419
Epoch 22: Loss = 1.0358784596125286
Epoch 23: Loss = 1.0362372001012166
Epoch 24: Loss = 1.0376643041769664
Epoch 25: Loss = 1.0375853975613911
Epoch 26: Loss = 1.0369685292243958
Epoch 27: Loss = 1.0375063220659893
Epoch 28: Loss = 1.0368236303329468
...

In [38]:
# Evaluation
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for text, label in test_loader:
        text, label = text.to(device), label.to(device)
        outputs = model(text)
        _, predicted = torch.max(outputs, 1)
        total += label.size(0)
        correct += (predicted == label).sum().item()

print(f"Test Accuracy: {100 * correct / total}%")

Test Accuracy: 46.93877551020408%
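
An accuracy figure is easier to interpret next to a trivial baseline. The sketch below computes the accuracy of always predicting the most frequent training label. The function name and the label lists are hypothetical; with the notebook's variables, the lists could be built as `[label for _, label in train_data]` and `[label for _, label in test_data]`.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(label == majority_label for label in test_labels)
    return majority_label, hits / len(test_labels)

# Hypothetical label lists, for illustration only
train_labels = [1, 1, 1, 0, 0, 2, 1, 0, 2, 1]
test_labels = [1, 0, 1, 2, 1]
print(majority_baseline_accuracy(train_labels, test_labels))  # (1, 0.6)
```

If the model's test accuracy is at or below this baseline, it has likely learned little beyond the class frequencies.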


In [39]:
# Sample Input Sentiment Analysis
def analyze_sentiment(text, max_len=50):
    # Apply the same preprocessing as SentimentDataset: tokenize, index, pad/truncate
    tokens = basic_tokenizer(text)
    indices = [vocab.get(token, vocab["<unk>"]) for token in tokens]
    if len(indices) < max_len:
        indices += [vocab["<pad>"]] * (max_len - len(indices))
    else:
        indices = indices[:max_len]
    input_tensor = torch.tensor(indices).unsqueeze(0).to(device)  # add batch dimension
    
    model.eval()
    with torch.no_grad():
        output = model(input_tensor)
        _, predicted = torch.max(output, 1)
    return predicted.item()  # class index: 0, 1, or 2


In [40]:
# Testing with new input
sample_input = "The music was too loud and repetitive"
sentiment = analyze_sentiment(sample_input)
print(f"Predicted Sentiment: {sentiment}")  # 1 -> Positive, 0 -> Negative, 2 -> Neutral

Predicted Sentiment: 1
