# Minimal Movie Recommendation System

A streamlined movie recommendation system that fine-tunes GPT-2 on a small sample movie dataset and generates recommendation text from a user's request.

## What this system does:
1. **Fine-tunes** a pretrained GPT-2 model on a 25-movie sample dataset
2. **Generates** movie recommendations based on user input
3. **Provides** a simple search interface for movie preferences

## How to search for movies:
- **By Genre**: "action movies", "horror films", "romantic comedies"
- **By Year**: "movies from 1990s", "films from 2000s"
- **By Style**: "family-friendly movies", "thriller films", "animated movies"
- **Combined**: "action movies from 1990s", "romantic comedies for date night"

## Best search practices:
- Keep it simple: "comedy movies" works better than "extremely funny comedic films"
- Use common genres: action, comedy, drama, horror, romance, sci-fi, thriller
- Include time periods: "1980s", "1990s" (the sample dataset covers 1942 to 1999, so "2000s" has no matching titles)
- Be specific: "horror movies for Halloween" vs just "movies"
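
The notebook passes the request text straight into the GPT-2 prompt, so nothing checks whether a query actually names a genre or a decade. As a rough illustration of the practices above, the sketch below splits a request into known genres and an optional decade. `KNOWN_GENRES` and `parse_request` are hypothetical names introduced here and are not part of the notebook.

```python
import re

# Hypothetical vocabulary: the common genres listed above.
KNOWN_GENRES = ["action", "comedy", "drama", "horror", "romance", "sci-fi", "thriller"]

def parse_request(request):
    """Split a free-text request into known genres and an optional decade."""
    text = request.lower()
    # Naive substring match against the genre vocabulary.
    genres = [g for g in KNOWN_GENRES if g in text]
    # Match a four-digit decade such as "1990s".
    match = re.search(r"\b((?:19|20)\d0)s\b", text)
    decade = int(match.group(1)) if match else None
    return {"genres": genres, "decade": decade}

print(parse_request("action movies from 1990s"))
# {'genres': ['action'], 'decade': 1990}
print(parse_request("horror movies for Halloween"))
# {'genres': ['horror'], 'decade': None}
```

With this naive substring match, "romantic comedies" yields no genre at all, which is one reason plain genre words such as "romance" or "comedy" are the safer choice.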

In [None]:
# Setup - Install required packages
!pip install torch transformers datasets pandas numpy

# Import essential libraries
import torch
import pandas as pd
import numpy as np
from transformers import GPT2LMHeadModel, GPT2Tokenizer, TrainingArguments, Trainer, DataCollatorForLanguageModeling
from datasets import Dataset
import warnings
warnings.filterwarnings('ignore')

print("✅ Setup complete!")
print(f"🔥 CUDA available: {torch.cuda.is_available()}")
print(f"🎯 Device: {'GPU' if torch.cuda.is_available() else 'CPU'}")


In [None]:
# Simple Movie Data Processing
def create_movie_data():
    """Create sample movie data for training"""
    movies = [
        ['The Matrix', 1999, 'Action,Sci-Fi', 136],
        ['Titanic', 1997, 'Drama,Romance', 194],
        ['The Godfather', 1972, 'Crime,Drama', 175],
        ['Pulp Fiction', 1994, 'Crime,Drama', 154],
        ['Forrest Gump', 1994, 'Drama,Romance', 142],
        ['The Lion King', 1994, 'Animation,Family', 88],
        ['Jurassic Park', 1993, 'Adventure,Sci-Fi', 127],
        ['Terminator 2', 1991, 'Action,Sci-Fi', 137],
        ['Home Alone', 1990, 'Comedy,Family', 103],
        ['Ghost', 1990, 'Drama,Romance', 127],
        ['Goodfellas', 1990, 'Crime,Drama', 146],
        ['Back to the Future', 1985, 'Adventure,Comedy,Sci-Fi', 116],
        ['The Silence of the Lambs', 1991, 'Crime,Drama,Thriller', 118],
        ['Raiders of the Lost Ark', 1981, 'Action,Adventure', 115],
        ['E.T.', 1982, 'Family,Sci-Fi', 115],
        ['Casablanca', 1942, 'Drama,Romance,War', 102],
        ['Star Wars', 1977, 'Action,Adventure,Fantasy', 121],
        ['The Shawshank Redemption', 1994, 'Drama', 142],
        ["Schindler's List", 1993, 'Biography,Drama,History', 195],
        ['Jaws', 1975, 'Adventure,Drama,Thriller', 124],
        ['The Exorcist', 1973, 'Horror', 122],
        ['Halloween', 1978, 'Horror,Thriller', 91],
        ['A Nightmare on Elm Street', 1984, 'Horror', 101],
        ['Friday the 13th', 1980, 'Horror,Mystery,Thriller', 95],
        ['The Shining', 1980, 'Drama,Horror', 146]
    ]

    df = pd.DataFrame(movies, columns=['title', 'year', 'genres', 'runtime'])
    return df

def create_training_texts(df):
    """Create training texts from movie data"""
    texts = []

    for _, row in df.iterrows():
        title = row['title']
        year = row['year']
        genres = row['genres'].replace(',', ', ')
        runtime = row['runtime']

        # Create different text formats
        texts.append(f"I recommend {title}, a {genres.lower()} movie from {year}.")
        texts.append(f"Looking for {genres.lower()} movies? Try {title} from {year}.")
        texts.append(f"{title} is a great {year} {genres.lower()} film.")
        texts.append(f"For {genres.lower()} fans, {title} ({year}) is perfect.")

    return texts

# Load and process data
movie_df = create_movie_data()
training_texts = create_training_texts(movie_df)

print(f"✅ Loaded {len(movie_df)} movies")
print(f"✅ Created {len(training_texts)} training texts")
print(f"\n📖 Sample training text: {training_texts[0]}")

✅ Loaded 25 movies
✅ Created 100 training texts

📖 Sample training text: I recommend The Matrix, a action, sci-fi movie from 1999.
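
The sample text above reads "a action, sci-fi movie", because the templates always use the article "a". One possible way to avoid teaching the model that pattern is to pick the article from the first letter of the genre string. This is a minimal sketch; `with_article` and `recommend_sentence` are hypothetical helpers, and the vowel check is a rough heuristic rather than a full grammar rule.

```python
def with_article(phrase):
    """Prefix a phrase with 'a' or 'an' based on its first letter (rough heuristic)."""
    article = "an" if phrase and phrase[0].lower() in "aeiou" else "a"
    return f"{article} {phrase}"

def recommend_sentence(title, year, genres):
    """Build one training sentence; `genres` is comma-separated, as in the sample data."""
    genre_text = genres.replace(",", ", ").lower()
    return f"I recommend {title}, {with_article(genre_text)} movie from {year}."

print(recommend_sentence("The Matrix", 1999, "Action,Sci-Fi"))
# I recommend The Matrix, an action, sci-fi movie from 1999.
print(recommend_sentence("Titanic", 1997, "Drama,Romance"))
# I recommend Titanic, a drama, romance movie from 1997.
```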


In [None]:
# Model Training - Minimal Setup
import os
os.environ["WANDB_DISABLED"] = "true"  # Disable wandb logging

class MovieRecommendationModel:
    def __init__(self):
        self.tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
        self.model = GPT2LMHeadModel.from_pretrained('gpt2')
        self.tokenizer.pad_token = self.tokenizer.eos_token
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model.to(self.device)

    def prepare_data(self, texts):
        """Prepare training data"""
        def tokenize(examples):
            return self.tokenizer(examples['text'], truncation=True, padding='max_length', max_length=128)

        dataset = Dataset.from_dict({'text': texts})
        tokenized = dataset.map(tokenize, batched=True)
        # Optional: DataCollatorForLanguageModeling(mlm=False) rebuilds labels from input_ids
        tokenized = tokenized.map(lambda x: {'labels': x['input_ids']}, batched=True)
        return tokenized

    def train(self, texts, epochs=1):
        """Train the model"""
        dataset = self.prepare_data(texts)

        # Minimal training arguments - only core parameters
        training_args = TrainingArguments(
            output_dir='./movie_model',
            num_train_epochs=epochs,
            per_device_train_batch_size=2,
            learning_rate=5e-5,
            logging_steps=10,
            save_steps=100,
            report_to=[]  # Disable all logging including wandb
        )

        trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=dataset,
            data_collator=DataCollatorForLanguageModeling(tokenizer=self.tokenizer, mlm=False)
        )

        trainer.train()

    def generate_recommendation(self, user_input):
        """Generate movie recommendation"""
        prompts = [
            f"Looking for {user_input}? I recommend",
            f"If you like {user_input}, try",
            f"For {user_input}, watch"
        ]

        recommendations = []

        for prompt in prompts:
            inputs = self.tokenizer(prompt, return_tensors='pt').to(self.device)

            # Make sure dropout is disabled before sampling
            self.model.eval()
            with torch.no_grad():
                outputs = self.model.generate(
                    **inputs,  # input_ids together with the attention mask
                    max_length=100,
                    temperature=0.7,
                    do_sample=True,
                    pad_token_id=self.tokenizer.eos_token_id
                )

            generated_text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            recommendation = generated_text[len(prompt):].strip()
            recommendations.append(f"{prompt} {recommendation}")

        return recommendations

# Initialize and train model
print("🚀 Initializing model...")
model = MovieRecommendationModel()

print("🔥 Training model (this may take a few minutes)...")
model.train(training_texts, epochs=1)

print("✅ Model training complete!")
print("🎬 Ready to generate movie recommendations!")

🚀 Initializing model...
🔥 Training model (this may take a few minutes)...


`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


| Step | Training Loss |
|------|---------------|
| 10 | 3.8324 |
| 20 | 2.7799 |
| 30 | 2.3185 |
| 40 | 1.78 |
| 50 | 1.6433 |


✅ Model training complete!
🎬 Ready to generate movie recommendations!
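
The training log stops at step 50, which matches the setup above: 25 movies times 4 text templates gives 100 training texts, and a batch size of 2 for one epoch gives 50 optimizer steps. The sketch below reproduces that arithmetic. It assumes a single device and no gradient accumulation, and `optimizer_steps` is a hypothetical helper, not a Trainer function.

```python
import math

def optimizer_steps(num_examples, batch_size, epochs):
    """Estimate optimizer steps, assuming one device and no gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs

num_texts = 25 * 4  # 25 movies, 4 text templates per movie
print(num_texts)                                                 # 100
print(optimizer_steps(num_texts, batch_size=2, epochs=1))        # 50
```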


In [None]:
# Movie Recommendation System - Simple Interface

def get_movie_recommendations(user_request):
    """Get movie recommendations based on user input"""
    print(f"\n🎬 Finding movies for: '{user_request}'")
    print("🤖 AI Recommendations:")
    print("-" * 40)

    try:
        recommendations = model.generate_recommendation(user_request)

        for i, rec in enumerate(recommendations, 1):
            # Clean up the text
            clean_rec = rec.replace('\n', ' ').strip()
            print(f"   {i}. {clean_rec}")

        print("-" * 40)
        return recommendations

    except Exception as e:
        print(f"❌ Error: {e}")
        return []

# Clean Movie Recommendation Interface

def get_clean_recommendations(user_request):
    """Get clean, easy-to-read movie recommendations"""
    print(f"\n🎯 RECOMMENDATIONS FOR: {user_request.upper()}")
    print("=" * 40)

    try:
        recommendations = model.generate_recommendation(user_request)

        # Extract and display clean movie info
        movies_found = []
        for rec in recommendations:
            text = rec.lower()

            # Look for movies from our training data
            for _, row in movie_df.iterrows():
                movie_title = row['title'].lower()
                if movie_title in text:
                    movie_info = {
                        'title': row['title'],
                        'year': row['year'],
                        'genres': row['genres'].replace(',', ' • ')
                    }
                    if movie_info not in movies_found:
                        movies_found.append(movie_info)
                    break

        # Display clean movie recommendations
        if movies_found:
            for i, movie in enumerate(movies_found[:3], 1):
                print(f"🎬 {i}. {movie['title']} ({movie['year']})")
                print(f"   📁 {movie['genres']}")
                print()
        else:
            # Fallback: clean AI text
            for i, rec in enumerate(recommendations[:3], 1):
                clean_text = rec.split('?')[1] if '?' in rec else rec
                clean_text = clean_text.replace('I recommend', '').replace('try', '').replace('watch', '')
                clean_text = clean_text.strip().strip(',').strip('.')
                if clean_text:
                    print(f"🎬 {i}. {clean_text}")
            print()

        print("=" * 40)
        return movies_found if movies_found else recommendations

    except Exception as e:
        print(f"❌ Error: {e}")
        return []

# Test the system with various movie types
print("🎯 TESTING THE RECOMMENDATION SYSTEM")
print("=" * 45)

# Test with different movie preferences including action
test_requests = [
    "action movies",
    "action movies from 1990s",
    "horror movies",
    "romantic comedies",
    "sci-fi films",
    "comedy movies",
    "thriller movies"
]

for request in test_requests:
    get_movie_recommendations(request)

print("\n✅ System is working! Ready for your movie requests.")
print("🎬 Try the interactive system in the next cell!")

# Test the clean system
print("🎯 TESTING CLEAN RECOMMENDATION SYSTEM")
print("=" * 45)

# Test with popular movie types
test_requests = [
    "action movies",
    "horror movies",
    "romantic comedies",
    "sci-fi films"
]

for request in test_requests:
    get_clean_recommendations(request)

print("✅ Clean system working! Try the interactive version in the next cell!")

🎯 TESTING THE RECOMMENDATION SYSTEM

🎬 Finding movies for: 'action movies'
🤖 AI Recommendations:
----------------------------------------
   1. Looking for action movies? I recommend The Legend from 1991.  The Legend is an action, romance, thriller. The movie is perfect. (1991). The Legend is perfect. (1992.  A great 1980 romantic. (1993). (1977).  The Last of Us (1992). (1993). (1993). (1993).  The Legend of Ron & Cher (1983). (1982). (1982.) (1993). (1982).  This movie is perfect
   2. If you like action movies, try The Godfather from 1973. It is a great 1973. If you had any interest, you will love. (1982) (1990) (1979) (1989) (1973) (1987) (1990) (1993) (1990) (1982) (1987) (1982) (1987) (1986) (1987) (1990) (1987) (1982) (1990) (1982) (1982) (1983) (1982) (1983
   3. For action movies, watch the first two from 1993. The movie is perfect. It, uh, scares, comedy. It is perfect. The best. movie from 1993. The best horror movies from 1993. Yes, 1990. The best horror movies from 1990. 1
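
The raw generations above keep going until `max_length` is reached and drift into repeated years. One simple mitigation is to keep only the first sentence of each continuation. The sketch below shows that idea under stated assumptions: `first_sentence` is a hypothetical helper, and treating a period followed by whitespace as the end of a sentence is a heuristic that can cut abbreviated titles short.

```python
import re

def first_sentence(generated, prompt):
    """Return the text after `prompt`, cut at the first sentence-ending period."""
    if generated.startswith(prompt):
        continuation = generated[len(prompt):].strip()
    else:
        continuation = generated.strip()
    # A period followed by whitespace or the end of the text ends the sentence.
    match = re.search(r"\.(?:\s|$)", continuation)
    return continuation[:match.start() + 1] if match else continuation

prompt = "Looking for action movies? I recommend"
raw = prompt + " The Legend from 1991.  The Legend is an action, romance, thriller. The movie is perfect."
print(first_sentence(raw, prompt))
# The Legend from 1991.
```

Generation-side settings are another option that this sketch does not cover; `generate()` accepts arguments such as `max_new_tokens`, `repetition_penalty`, and `no_repeat_ngram_size`, whose best values here are untested.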

In [None]:
# 🎬 CLEAN MOVIE RECOMMENDATION SYSTEM 🎬

print("🎬 MOVIE RECOMMENDATION SYSTEM 🎬")
print("=" * 50)
print("✨ Get clean, easy-to-read movie recommendations!")
print("=" * 50)

def clean_movie_recommender():
    """Clean, minimalistic movie recommendation interface"""

    print("\n🎯 POPULAR SEARCHES:")
    print("• action movies    • horror movies    • comedy movies")
    print("• sci-fi films     • romance movies   • thriller movies")
    print("• drama movies     • animated movies  • adventure movies")

    print("\n💡 TIP: Try adding years like 'action movies from 1990s'")
    print("=" * 50)

    while True:
        try:
            # Get user input
            user_request = input("\n🎬 What movies do you want? (or 'quit'): ").strip()

            # Check if user wants to quit
            if user_request.lower() in ['quit', 'exit', 'stop', 'q']:
                print("\n👋 Thanks for using the system!")
                break

            # Check if input is empty
            if not user_request:
                print("🤔 Please tell me what movies you want!")
                continue

            # Get recommendations
            print(f"\n🎯 RECOMMENDATIONS FOR: {user_request.upper()}")
            print("=" * 50)

            recommendations = model.generate_recommendation(user_request)

            # Extract and display clean movie info
            movies_found = []
            for rec in recommendations:
                # Try to extract movie information from generated text
                text = rec.lower()

                # Look for movie patterns in our training data
                for _, row in movie_df.iterrows():
                    movie_title = row['title'].lower()
                    if movie_title in text:
                        movie_info = {
                            'title': row['title'],
                            'year': row['year'],
                            'genres': row['genres'].replace(',', ' • ')
                        }
                        if movie_info not in movies_found:
                            movies_found.append(movie_info)
                        break

            # If we found specific movies, display them cleanly
            if movies_found:
                for i, movie in enumerate(movies_found[:3], 1):
                    print(f"🎬 {i}. {movie['title']} ({movie['year']})")
                    print(f"   📁 {movie['genres']}")
                    print()
            else:
                # Fallback: show clean AI-generated recommendations
                print("🤖 AI SUGGESTIONS:")
                for i, rec in enumerate(recommendations[:3], 1):
                    # Clean up the generated text
                    clean_text = rec.split('?')[1] if '?' in rec else rec
                    clean_text = clean_text.replace('I recommend', '').replace('try', '').replace('watch', '')
                    clean_text = clean_text.strip().strip(',').strip('.')
                    if clean_text:
                        print(f"🎬 {i}. {clean_text}")
                print()

            print("=" * 50)
            print("💡 Want more? Just type another movie type!")

        except KeyboardInterrupt:
            print("\n\n👋 Thanks for using the system!")
            break
        except Exception as e:
            print(f"❌ Error: {e}")
            print("🔄 Try again with a simpler request!")

# 🚀 START THE CLEAN SYSTEM
print("\n🚀 Starting Clean Movie Recommendation System...")

# Call the clean function
clean_movie_recommender()

print("\n🎉 Session ended! Run again for more recommendations!")

🎬 MOVIE RECOMMENDATION SYSTEM 🎬
✨ Get clean, easy-to-read movie recommendations!

🚀 Starting Clean Movie Recommendation System...

🎯 POPULAR SEARCHES:
• action movies    • horror movies    • comedy movies
• sci-fi films     • romance movies   • thriller movies
• drama movies     • animated movies  • adventure movies

💡 TIP: Try adding years like 'action movies from 1990s'

🎬 What movies do you want? (or 'quit'): action movies

🎯 RECOMMENDATIONS FOR: ACTION MOVIES
🎬 1. A Nightmare on Elm Street (1984)
   📁 Horror

🎬 2. Star Wars (1977)
   📁 Action • Adventure • Fantasy

🎬 3. The Shawshank Redemption (1994)
   📁 Drama

💡 Want more? Just type another movie type!


👋 Thanks for using the system!

🎉 Session ended! Run again for more recommendations!
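
In the session above, the request "action movies" returned a horror film and a drama, because the interface accepts any dataset title that appears in the generated text. A possible refinement is to keep only matches whose genres are named in the request. This is a minimal sketch with a few rows copied from the sample dataset; `filter_by_genre` is a hypothetical helper, not part of the notebook.

```python
# A few rows copied from the sample dataset above.
MOVIES = [
    {"title": "A Nightmare on Elm Street", "year": 1984, "genres": "Horror"},
    {"title": "Star Wars", "year": 1977, "genres": "Action,Adventure,Fantasy"},
    {"title": "The Shawshank Redemption", "year": 1994, "genres": "Drama"},
    {"title": "Terminator 2", "year": 1991, "genres": "Action,Sci-Fi"},
]

def filter_by_genre(movies, request):
    """Keep only movies with at least one genre named in the request."""
    text = request.lower()
    kept = []
    for movie in movies:
        genres = [g.strip().lower() for g in movie["genres"].split(",")]
        if any(g in text for g in genres):
            kept.append(movie)
    return kept

for movie in filter_by_genre(MOVIES, "action movies"):
    print(f"{movie['title']} ({movie['year']})")
# Star Wars (1977)
# Terminator 2 (1991)
```

When the request names no recognizable genre, the function returns an empty list, so a caller would probably want to fall back to the unfiltered matches in that case.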


# 🎬 How to Use This System

## 🚀 Quick Start:
1. **Run all cells** from top to bottom
2. **Type a request** at the `What movies do you want?` prompt in the last cell
3. **Type `quit`** to end the session

## 🔍 Best Search Terms:

### **By Genre:**
- `"action movies"` - Action films
- `"horror movies"` - Scary films
- `"romantic comedies"` - Funny love stories
- `"sci-fi films"` - Science fiction
- `"drama movies"` - Dramatic films
- `"comedy films"` - Funny movies
- `"thriller movies"` - Suspenseful films

### **By Time Period:**
- `"movies from 1980s"` - Films from the 80s
- `"movies from 1990s"` - Films from the 90s
- `"movies from 2000s"` - Films from the 2000s

### **By Style:**
- `"family-friendly movies"` - Good for all ages
- `"animated films"` - Cartoon movies
- `"adventure movies"` - Action-adventure films

### **Combined Searches:**
- `"action movies from 1990s"` - 90s action films
- `"horror movies for Halloween"` - Scary movies for Halloween
- `"romantic movies for date night"` - Romance films for couples
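
Time-period searches only match titles when the dataset contains films from that decade. The sketch below illustrates this with a few (title, year) pairs copied from the sample dataset; `titles_from_decade` is a hypothetical helper and is not how the notebook itself handles decades.

```python
import re

# A few (title, year) pairs copied from the sample dataset.
SAMPLE = [("The Matrix", 1999), ("Home Alone", 1990), ("Jaws", 1975), ("The Shining", 1980)]

def titles_from_decade(movies, request):
    """Return titles whose year falls in the decade named in the request, e.g. '1990s'."""
    match = re.search(r"\b(\d{3}0)s\b", request)
    if not match:
        return []
    start = int(match.group(1))
    return [title for title, year in movies if start <= year < start + 10]

print(titles_from_decade(SAMPLE, "movies from 1990s"))
# ['The Matrix', 'Home Alone']
print(titles_from_decade(SAMPLE, "movies from 2000s"))
# []
```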

## ✅ What You'll Get:
- **Up to 3 recommendations** for each search
- **Titles matched against the sample dataset**, or the raw generated text when no title matches
- **Results generated on demand** once the model has been trained

## 🎯 Tips for Better Results:
- **Keep it simple**: "comedy movies" works better than "extremely hilarious films"
- **Use common genres**: action, comedy, drama, horror, romance, sci-fi
- **Include time periods**: "1980s", "1990s" (the sample dataset covers 1942 to 1999, so "2000s" has no matching titles)
- **Be specific**: "horror movies" is better than just "movies"

## 🔄 To Try Different Movies:
1. Run the last cell again if the session has ended
2. Type a new request at the prompt
3. Press Enter
4. Get new recommendations!

**Example**: Type `action movies from 1980s` at the prompt and press Enter.