# Sentiment Analysis with Neural DSL

This tutorial demonstrates how to build a sentiment analysis model using Neural DSL and LSTMs.

## Overview
- Build an LSTM-based sentiment classifier
- Process text data with embeddings
- Train on movie reviews (IMDB dataset)
- Evaluate and interpret results

## Setup

In [None]:
import os
import sys
import numpy as np
import matplotlib.pyplot as plt

from neural.parser.parser import create_parser, ModelTransformer
from neural.code_generation.code_generator import generate_code

## Define the Model

In [None]:
dsl_code = """
network SentimentAnalyzer {
  input: (None, 200)
  
  layers:
    Embedding(input_dim=20000, output_dim=128)
    Dropout(rate=0.2)
    LSTM(units=128, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)
    LSTM(units=64, dropout=0.2, recurrent_dropout=0.2)
    Dense(units=64, activation="relu")
    Dropout(rate=0.5)
    Dense(units=32, activation="relu")
    Dropout(rate=0.5)
    Output(units=1, activation="sigmoid")

  loss: "binary_crossentropy"
  optimizer: Adam(learning_rate=0.001)
  metrics: ["accuracy"]

  train {
    epochs: 10
    batch_size: 128
    validation_split: 0.2
  }
}
"""

with open('sentiment_analyzer.neural', 'w') as f:
    f.write(dsl_code)

print("Sentiment analysis model defined!")
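
To get a feel for the size of this network, the trainable parameter count can be worked out by hand. The sketch below assumes the DSL layers map one-to-one onto standard Keras `Embedding`, `LSTM`, and `Dense` layers with bias terms, which this tutorial does not state explicitly. The helper names are illustrative and not part of Neural DSL.

```python
def embedding_params(input_dim, output_dim):
    # One output_dim-sized vector per vocabulary entry
    return input_dim * output_dim

def lstm_params(input_size, units):
    # Four gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_size + units) + units)

def dense_params(input_size, units):
    # Weight matrix plus one bias per unit
    return input_size * units + units

# Layer sizes taken from the DSL definition above (Dropout has no parameters)
layer_params = {
    "Embedding(20000, 128)": embedding_params(20000, 128),
    "LSTM(128)": lstm_params(128, 128),
    "LSTM(64)": lstm_params(128, 64),
    "Dense(64)": dense_params(64, 64),
    "Dense(32)": dense_params(64, 32),
    "Output(1)": dense_params(32, 1),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name:<24}{count:>12,}")
print(f"{'Total':<24}{total_params:>12,}")  # Total: 2,747,265
```

Under these assumptions the embedding table holds about 2.56 million of the roughly 2.75 million parameters, so vocabulary size and embedding width dominate the model's footprint.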

## Compile the Model

In [None]:
!neural compile sentiment_analyzer.neural --backend tensorflow --output sentiment_analyzer_tf.py
print("Model compiled successfully!")

## Load and Prepare IMDB Data

In [None]:
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.preprocessing import sequence
    
    # Load IMDB dataset
    max_features = 20000
    maxlen = 200
    
    print("Loading IMDB dataset...")
    (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(
        num_words=max_features
    )
    
    print(f"Training sequences: {len(x_train)}")
    print(f"Test sequences: {len(x_test)}")
    
    # Pad sequences
    print("Padding sequences...")
    x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
    x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
    
    print(f"x_train shape: {x_train.shape}")
    print(f"x_test shape: {x_test.shape}")
    
    # Display distribution
    print(f"\nPositive samples in training: {np.sum(y_train)} ({np.mean(y_train)*100:.1f}%)")
    print(f"Negative samples in training: {len(y_train) - np.sum(y_train)} ({(1-np.mean(y_train))*100:.1f}%)")
    
except ImportError:
    print("TensorFlow not installed. Install with: pip install tensorflow")
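
With its default arguments, `pad_sequences` pads and truncates at the front of each sequence: short reviews get leading zeros and long reviews keep their last `maxlen` tokens. The pure-Python sketch below mimics that default behaviour for illustration only. `pad_pre` is a hypothetical helper, not a Keras function, and the token ids are made up.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic the pad_sequences defaults: padding='pre', truncating='pre'."""
    seq = list(seq)[-maxlen:]                     # keep only the last maxlen tokens
    return [value] * (maxlen - len(seq)) + seq    # left-pad with the fill value

short_review = [1, 14, 22]
long_review = [1, 14, 22, 16, 43, 530, 973]

padded_short = pad_pre(short_review, maxlen=5)
padded_long = pad_pre(long_review, maxlen=5)

print(padded_short)  # [0, 0, 1, 14, 22]
print(padded_long)   # [22, 16, 43, 530, 973]
```

Keeping the end of a long review means the tokens closest to the LSTM's final state are real words rather than padding.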

## Visualize Sample Reviews

In [None]:
try:
    # Get word index
    word_index = keras.datasets.imdb.get_word_index()
    reverse_word_index = {v: k for k, v in word_index.items()}
    
    def decode_review(encoded_review):
        return ' '.join([reverse_word_index.get(i - 3, '?') for i in encoded_review])
    
    # Display a few reviews
    for i in range(3):
        print(f"\n{'='*80}")
        print(f"Review {i+1} - Sentiment: {'Positive' if y_train[i] == 1 else 'Negative'}")
        print(f"{'='*80}")
        print(decode_review(x_train[i])[:500])
        
except Exception as e:
    print(f"Error displaying reviews: {e}")
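
The `i - 3` in `decode_review` exists because `imdb.load_data` reserves the first ids by default: 0 for padding, 1 for the start of a sequence, and 2 for out-of-vocabulary words, so every word's rank is shifted up by 3. The sketch below reproduces that convention on a toy vocabulary. The vocabulary and the `toy_encode` and `toy_decode` helpers are made up for illustration.

```python
# Toy vocabulary in the same shape as imdb.get_word_index(): word -> frequency rank
toy_word_index = {"the": 1, "film": 2, "was": 3, "great": 4}
toy_reverse_index = {rank: word for word, rank in toy_word_index.items()}

PAD, START, OOV, INDEX_FROM = 0, 1, 2, 3

def toy_encode(words):
    # Start marker first, then rank + 3 for known words and 2 for unknown ones
    ids = [START]
    for w in words:
        ids.append(toy_word_index[w] + INDEX_FROM if w in toy_word_index else OOV)
    return ids

def toy_decode(encoded):
    # Same rule as decode_review: shift back by 3, reserved ids show up as '?'
    return " ".join(toy_reverse_index.get(i - INDEX_FROM, "?") for i in encoded)

encoded = [PAD, PAD] + toy_encode(["the", "film", "was", "superb"])
decoded = toy_decode(encoded)

print(encoded)  # [0, 0, 1, 4, 5, 6, 2]
print(decoded)  # ? ? ? the film was ?
```

The leading `?` marks in decoded reviews are therefore padding and the start marker, not missing data.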

## Visualize Model Architecture

In [None]:
!neural visualize sentiment_analyzer.neural --format html
print("Visualization saved!")

## Train the Model

In [None]:
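# Option 1: Run the generated code in this kernel, so any variables it defines
# (for example a `model` object, if the script creates one) stay available below
# exec(open('sentiment_analyzer_tf.py').read())

# Option 2: Use the CLI. This runs in a separate process, so nothing it creates
# is available to the evaluation cells below
!neural run sentiment_analyzer_tf.py --backend tensorflow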

## Evaluate Model

In [None]:
# Example evaluation (adapt based on your training)
# test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
# print(f'\nTest accuracy: {test_acc:.4f}')
# print(f'Test loss: {test_loss:.4f}')
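
Accuracy alone hides which class the errors fall in. As a sketch of how the evaluation could be extended, the block below derives a confusion matrix, precision, and recall from predicted probabilities. The labels and probabilities are made-up values for illustration, not results from this model, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix counts and derived metrics for a binary classifier."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up labels and sigmoid outputs, for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.91, 0.20, 0.45, 0.78, 0.62, 0.08, 0.88, 0.35]

metrics = binary_metrics(y_true, y_prob)
print(metrics)
```

With a trained Keras model in the kernel, the same helper could be called as `binary_metrics(y_test, model.predict(x_test).ravel())`.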

## Make Predictions

In [None]:
# predictions = model.predict(x_test[:10])

# for i in range(10):
#     prob = predictions[i][0]  # scalar sigmoid output for review i
#     sentiment = "Positive" if prob > 0.5 else "Negative"
#     actual = "Positive" if y_test[i] == 1 else "Negative"
#     confidence = prob if prob > 0.5 else 1 - prob
#     
#     print(f"\nReview {i+1}:")
#     print(f"Predicted: {sentiment} ({confidence:.2%} confidence)")
#     print(f"Actual: {actual}")
#     print(decode_review(x_test[i])[:200])

## Test Custom Reviews

In [None]:
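import re

def predict_sentiment(review_text, model, max_features=20000, maxlen=200):
    # Tokenize: lowercase and drop punctuation so "fantastic!" matches "fantastic"
    tokens = re.findall(r"[a-z0-9']+", review_text.lower())
    # Encode the way imdb.load_data does by default: word ranks are shifted by 3
    # because 0 = padding, 1 = start of sequence, 2 = out-of-vocabulary
    encoded = [1]
    for word in tokens:
        rank = word_index.get(word)
        idx = rank + 3 if rank is not None else 2
        encoded.append(idx if idx < max_features else 2)
    padded = sequence.pad_sequences([encoded], maxlen=maxlen)
    
    # Predict
    prediction = float(model.predict(padded)[0][0])
    sentiment = "Positive" if prediction > 0.5 else "Negative"
    confidence = prediction if prediction > 0.5 else 1 - prediction
    
    return sentiment, confidence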

# Test with custom reviews
test_reviews = [
    "This movie was absolutely fantastic! I loved every minute of it.",
    "Terrible waste of time. The plot was confusing and the acting was poor.",
    "It was okay, nothing special but not terrible either."
]

# for review in test_reviews:
#     sentiment, confidence = predict_sentiment(review, model)
#     print(f"\nReview: {review}")
#     print(f"Sentiment: {sentiment} ({confidence:.2%} confidence)")
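
Predictions on custom text depend on vocabulary overlap: any word missing from the IMDB word index, or ranked beyond `max_features`, collapses to the out-of-vocabulary id. The sketch below reports which tokens of a review would be lost that way. It uses a small vocabulary with made-up ranks in place of the real `word_index`, and `oov_report` is a hypothetical helper.

```python
import re

def oov_report(review_text, word_index, max_features=20000, index_from=3):
    """Return the tokens that would map to the OOV id, and their share of the text."""
    tokens = re.findall(r"[a-z0-9']+", review_text.lower())
    unknown = [
        w for w in tokens
        if w not in word_index or word_index[w] + index_from >= max_features
    ]
    rate = len(unknown) / len(tokens) if tokens else 0.0
    return unknown, rate

# Made-up ranks for illustration; the real ones come from imdb.get_word_index()
toy_word_index = {
    "this": 11, "movie": 17, "was": 13, "fantastic": 774,
    "i": 10, "loved": 444, "it": 9,
}

unknown, rate = oov_report(
    "This movie was absolutely fantastic! I loved it.", toy_word_index
)
print(unknown, f"{rate:.1%}")  # ['absolutely'] 12.5%
```

A high out-of-vocabulary share suggests the model is seeing mostly placeholder tokens, so its confidence on that review deserves less trust.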

## Hyperparameter Optimization

In [None]:
!neural compile sentiment_analyzer.neural --backend tensorflow --hpo

## Debug with NeuralDbg

In [None]:
print("To debug, run in terminal:")
print("neural debug sentiment_analyzer.neural --backend tensorflow --dashboard --port 8050")

## Summary

In this tutorial, we:
1. Built an LSTM-based sentiment classifier
2. Processed text data with embeddings
3. Trained on IMDB movie reviews
4. Made predictions on custom text

## Next Steps
- Try bidirectional LSTMs
- Experiment with GRU layers
- Use pre-trained word embeddings (Word2Vec, GloVe)
- Build multi-class sentiment classifiers
- Explore attention mechanisms