🎬 FilmFind - AI-Powered Semantic Movie Discovery Engine

Discover movies that match your mood, not just your keywords.

FilmFind is an AI-powered movie and TV series recommendation system that understands natural language queries, interprets emotional intent, and delivers personalized recommendations using semantic search, hybrid embeddings, and LLM-powered re-ranking.

🎯 Overview

FilmFind goes beyond traditional movie recommendation systems by:

Understanding Natural Language: Ask in plain English like "dark sci-fi movies like Interstellar with less romance"
Semantic Search: Uses vector embeddings to understand themes, tones, and emotions
Hybrid Intelligence: Combines semantic similarity, metadata filtering, and LLM reasoning
Explainable AI: Each recommendation comes with reasoning and match scores
Multi-Signal Scoring: Balances semantic similarity, popularity, ratings, and recency

Example Queries

"Shows like Stranger Things but with more horror elements"
"Movies about F1 racing with intense competition and personal rivalries"
"Mystery and magical adventures like Harry Potter with school settings"
"Lighthearted sitcoms like Friends about a group of friends navigating life and relationships"
"Series like Dark"

✨ Key Features

🧠 Intelligent Query Understanding

Natural language processing (NLP) to extract intent, themes, and constraints
Emotion-aware classification across 8 emotional dimensions
Reference title detection and similarity matching
Multi-language support (English, Hindi, Korean, Telugu, etc.)
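The sketch below is a deliberately simple, rule-based take on reference-title and constraint detection. FilmFind itself relies on NLP and an LLM for this step. The regex patterns, `QueryInterpretation`, and `interpret` are hypothetical. They only illustrate the output shape, which mirrors the `reference_movies` and `excluded` fields of the `/api/search` response.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns: "like <title>" ends at a connective word or at the end
# of the query, and "less/no/without <word>" marks a theme to exclude.
_REFERENCE = re.compile(r"\blike\s+(.+?)(?=\s+(?:with|but|about|and)\b|$)", re.IGNORECASE)
_EXCLUDE = re.compile(r"\b(?:less|no|without)\s+(\w+)", re.IGNORECASE)


@dataclass
class QueryInterpretation:
    reference_movies: list[str] = field(default_factory=list)
    excluded: list[str] = field(default_factory=list)


def interpret(query: str) -> QueryInterpretation:
    """Extract reference titles and excluded themes from a free-text query."""
    refs = [m.strip() for m in _REFERENCE.findall(query)]
    excluded = [w.lower() for w in _EXCLUDE.findall(query)]
    return QueryInterpretation(reference_movies=refs, excluded=excluded)
```

Patterns like these break on phrasings such as "something in the vein of Dark", which is why an LLM-based parser suits real queries better.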

🔍 Advanced Search & Retrieval

Semantic Vector Search: FAISS-powered similarity search
Hybrid Embeddings: Combines plot, themes, genres, cast, and emotional vectors
Multi-Signal Ranking: Balances semantic similarity, popularity, ratings, and metadata
Smart Filtering: Year range, language, genre, streaming services, runtime
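At its simplest, multi-signal ranking is a weighted sum of normalized signals. This sketch assumes hypothetical weights (semantic 0.6, rating 0.2, popularity 0.1, recency 0.1) and a linear 50-year recency decay. FilmFind's actual weights live in `app/core/constants.py` and may differ.

```python
from dataclasses import dataclass

# Hypothetical weights (they sum to 1.0); the real ones live in app/core/constants.py.
WEIGHTS = {"semantic": 0.6, "rating": 0.2, "popularity": 0.1, "recency": 0.1}


@dataclass
class Candidate:
    title: str
    semantic: float    # cosine similarity from vector search, in [0, 1]
    rating: float      # TMDB vote average, 0-10
    popularity: float  # raw TMDB popularity, unbounded
    year: int


def score(c: Candidate, max_popularity: float, current_year: int) -> float:
    """Weighted sum of signals, each scaled to [0, 1] first."""
    signals = {
        "semantic": c.semantic,
        "rating": c.rating / 10.0,
        "popularity": c.popularity / max_popularity if max_popularity else 0.0,
        # Linear decay: a title 50+ years old gets no recency credit.
        "recency": min(1.0, max(0.0, 1.0 - (current_year - c.year) / 50.0)),
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def rank(candidates: list[Candidate], current_year: int) -> list[Candidate]:
    """Sort candidates by combined score, best first."""
    max_pop = max((c.popularity for c in candidates), default=0.0)
    return sorted(candidates, key=lambda c: score(c, max_pop, current_year), reverse=True)
```

Popularity is divided by the batch maximum because TMDB popularity has no fixed upper bound.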

🤖 LLM-Powered Re-Ranking

Uses Groq API (Llama 3.1 70B) or Ollama for intelligent re-ranking
Contextual understanding of nuanced queries
Generates human-readable explanations for each recommendation
Cost-optimized with caching and free tier APIs
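The Groq free tier allows about 30 requests per minute, so caching re-ranking results per query helps keep LLM usage inside that limit. This is a minimal sketch with these assumptions:

- The LLM is wrapped as a plain `prompt -> text` function that replies with a JSON list.
- `rerank`, the prompt wording, and the in-process dict cache are hypothetical.
- The project itself caches in Redis.

```python
import hashlib
import json
from typing import Callable

# Any function that sends a prompt to the LLM and returns its text reply,
# e.g. a thin wrapper around a Groq or Ollama client (hypothetical here).
LLMFn = Callable[[str], str]

_cache: dict[str, list[dict]] = {}  # in-process stand-in for a Redis cache


def _cache_key(query: str, ids: list[int]) -> str:
    # Normalize the query and ignore candidate order so equivalent requests share a key.
    payload = json.dumps({"q": query.strip().lower(), "ids": sorted(ids)})
    return "rerank:" + hashlib.sha256(payload.encode()).hexdigest()


def rerank(query: str, candidates: list[dict], llm: LLMFn) -> list[dict]:
    """Let the LLM reorder candidates and explain each pick; cache per query."""
    key = _cache_key(query, [c["id"] for c in candidates])
    if key in _cache:
        return _cache[key]
    listing = "\n".join(f'{c["id"]}: {c["title"]}' for c in candidates)
    prompt = (
        f"User query: {query}\nCandidates:\n{listing}\n"
        'Reply with only a JSON list of {"id": <int>, "reason": <str>}, best match first.'
    )
    ranked = json.loads(llm(prompt))
    by_id = {c["id"]: c for c in candidates}
    # Keep the model's order and explanations, but drop any id it invented.
    result = [{**by_id[r["id"]], "reason": r["reason"]} for r in ranked if r["id"] in by_id]
    _cache[key] = result
    return result
```

Passing the LLM in as a function keeps the re-ranker testable and lets Groq and Ollama be swapped without touching this code.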

📊 Rich Metadata Integration

10,000+ movies and TV shows from TMDB
Cast, crew, keywords, genres, ratings, popularity
Streaming availability (Netflix, Prime, Disney+, etc.)
Posters, backdrops, trailers

🌟 What Makes FilmFind Unique

Comparison with Existing Platforms

Feature	Letterboxd	FilmCrave	MOVIERECS.AI	FilmFind
Natural Language Queries	❌ Basic search	❌ No	⚠️ Simple prompts	✅ Deep intent extraction
Semantic Vector Search	❌	❌	❌	✅ FAISS-powered
Emotion-Aware Matching	⚠️ User tags only	❌	❌	✅ 8-dimensional emotion vectors
LLM Re-Ranking	❌	❌	❌	✅ RAG with Llama 3.1
Complex Multi-Condition Queries	❌	❌	❌	✅ Fully supported
Explainable Recommendations	❌	❌	❌	✅ XAI with reasoning
Multi-Language Support	❌	❌	❌	✅ 10+ languages
Streaming Provider Filters	⚠️ Limited	❌	❌	✅ Full integration
Cost	Social only	Paid	Limited free	✅ 100% Free Tier

Core Differentiators

Emotion-Aware Engine: Scores movies across 8 emotional dimensions (Joy, Fear, Sadness, Awe, Thrill, Hope, Dark tone, Romance)
Hybrid Vector Embeddings: Combines semantic, emotional, genre, and cast vectors into a unified representation
LLM Query Rewrite: Transforms queries into optimized search vectors with theme extraction
Multi-Agent System: Specialized agents for intent, emotion, filtering, retrieval, and re-ranking
Explainable AI: Every recommendation includes thematic similarity %, emotional match %, and reasoning
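One plausible way to compute the "emotional match %" is the cosine similarity between the 8-dimensional emotion profiles of a query and a title. This sketch rests on that assumption. The dimension keys and `emotional_match_pct` are hypothetical. Scores are assumed non-negative, so the result falls between 0 and 100.

```python
import math

# The eight dimensions listed above, in a fixed order (key names are hypothetical).
EMOTIONS = ("joy", "fear", "sadness", "awe", "thrill", "hope", "dark_tone", "romance")


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def emotional_match_pct(query: dict[str, float], movie: dict[str, float]) -> float:
    """Cosine similarity of two emotion profiles, as a percentage (missing dims = 0)."""
    q = [query.get(e, 0.0) for e in EMOTIONS]
    m = [movie.get(e, 0.0) for e in EMOTIONS]
    return round(100 * cosine(q, m), 1)
```

Cosine compares the shape of the profiles and ignores their magnitude. A mildly dark, awe-heavy film therefore still matches a strongly dark, awe-heavy query.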

🏗️ Architecture

FilmFind uses a multi-layered AI pipeline with the following components:

User Query → NLP Understanding → Semantic Retrieval → Multi-Signal Scoring → LLM Re-Ranking → Explainable Output

High-Level Architecture

Key Components

Data Pipeline
- TMDB API integration for movie metadata
- Embedding generation using sentence-transformers
- Vector database (FAISS) for similarity search
- PostgreSQL/Supabase for metadata storage
Intelligence Layer
- NLP Engine: Query parsing and intent extraction
- Embedding Service: Semantic vector generation
- Vector Search: FAISS similarity retrieval
- Scoring Engine: Multi-signal ranking
- LLM Re-Ranker: Contextual re-ranking with Groq/Ollama
API & Backend
- FastAPI REST API
- Redis caching (Upstash)
- Background jobs for data updates
- Rate limiting and monitoring
Frontend
- Next.js 14+ with App Router
- TailwindCSS + ShadCN UI
- Real-time search with debouncing
- Responsive design
Infrastructure
- AWS ECS (Docker) for backend
- Vercel for frontend
- AWS RDS PostgreSQL
- AWS S3 + CloudFront
- Upstash Redis
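The intelligence layer's retrieval step is nearest-neighbour search over normalized embeddings. This pure-Python sketch does exact inner-product search on L2-normalized vectors, which is the same as cosine similarity. In FAISS, the same ranking comes from `faiss.normalize_L2` plus an `IndexFlatIP`. The function names and the dict-based index here are illustrative only, and far slower than FAISS.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def build_index(embeddings: dict[int, list[float]]) -> dict[int, list[float]]:
    # Normalize once at build time, as faiss.normalize_L2 would before IndexFlatIP.add.
    return {movie_id: normalize(vec) for movie_id, vec in embeddings.items()}


def top_k(query: list[float], index: dict[int, list[float]], k: int = 10) -> list[tuple[int, float]]:
    """Exact inner-product search; with unit-length vectors this is cosine similarity."""
    q = normalize(query)
    scored = ((movie_id, sum(a * b for a, b in zip(q, vec))) for movie_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```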

📊 System Diagrams

1. System Architecture

Complete end-to-end architecture showing all components and data flow.

2. Flow Diagram

High-level flow of data through the system from query to recommendations.

3. Detailed Flow Chart

Step-by-step processing pipeline with all validation and filtering stages.

4. Sequence Diagram

Interaction sequence between all system components during a search request.

🛠️ Tech Stack

Backend

Python 3.11+: Core language
FastAPI: High-performance async API framework
SQLAlchemy: ORM for database operations
PostgreSQL: Primary database (AWS RDS or Supabase)
FAISS: Vector similarity search (Facebook AI)
Redis: Caching layer (Upstash free tier)
APScheduler: Background job scheduling

AI/ML

sentence-transformers/all-mpnet-base-v2: Semantic embeddings (768-dim)
Groq API: LLM for query understanding and re-ranking (free tier: 30 req/min)
Ollama: Local LLM alternative (Llama 3.2, unlimited)
spaCy: NLP for text processing and entity extraction
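A hybrid embedding can be sketched as a weighted concatenation of unit-length sub-vectors. Examples of such parts are a 768-dim plot vector from all-mpnet-base-v2, a genre multi-hot vector, and the 8-dim emotion vector. The part names and weights below are assumptions, and cast is left out for brevity.

```python
import math

# Hypothetical per-part weights: plot text dominates, genres and emotion refine.
PART_WEIGHTS = {"plot": 1.0, "genres": 0.5, "emotion": 0.5}


def _unit(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def hybrid_embedding(parts: dict[str, list[float]]) -> list[float]:
    """Concatenate weighted, unit-length sub-vectors, then renormalize the whole."""
    out: list[float] = []
    for name, weight in PART_WEIGHTS.items():
        out.extend(weight * x for x in _unit(parts[name]))
    return _unit(out)
```

When every part is non-zero, the cosine similarity of two hybrid vectors equals a weighted average of the per-part cosines. Each part's weight in that average is proportional to the square of its weight in `PART_WEIGHTS`, so those weights directly control how much each signal matters.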

Frontend

Next.js 14+: React framework with App Router
TypeScript: Type-safe JavaScript
TailwindCSS: Utility-first CSS framework
ShadCN UI: Beautiful accessible components
Zustand: Lightweight state management

DevOps & Infrastructure (FREE Tier)

Docker: Containerization
GitHub Actions: CI/CD pipeline (2,000 min/month free)
AWS ECS: Container orchestration (750 hours/month free for 12 months)
Vercel: Frontend hosting (free forever for hobby projects)
AWS RDS: Database (t3.micro, 750 hours/month free for 12 months)
AWS S3: Object storage (5GB free)
AWS CloudFront: CDN (1TB transfer/month free)
Sentry: Error monitoring (5k events/month free)

Data Sources

TMDB API: Movie metadata, cast, crew, keywords (free tier)
IMDb Datasets: Additional ratings and metadata (free on Kaggle)

🛠️ Backend Architecture Highlights

TMDB Service Module (`app/services/TMDB/`)

Following the Single Responsibility Principle, we've split the TMDB service into three focused modules:

1. TMDBAPIClient - HTTP Communication Layer

from app.services.TMDB import TMDBAPIClient

client = TMDBAPIClient(api_key="your_key")
movie = client.get_movie(movie_id=550)
popular = client.get_popular_movies(page=1)

✅ Handles all TMDB API requests
✅ Built-in rate limiting (40 requests/10s)
✅ Automatic error handling
✅ Uses HTTPClient utility for retry logic

2. TMDBDataValidator - Data Validation & Cleaning

from app.services.TMDB import TMDBDataValidator

validator = TMDBDataValidator()
# raw_data: a raw TMDB movie payload, e.g. as fetched by TMDBAPIClient.get_movie()
is_valid = validator.validate_movie(raw_data)
cleaned = validator.clean_movie_data(raw_data)

✅ Validates required fields
✅ Normalizes data structure
✅ Handles missing/invalid dates
✅ Extracts genres, cast, keywords
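As an illustration of these checks, here is a hypothetical validation and cleaning pass over a TMDB movie-details payload. It uses plain functions rather than the `TMDBDataValidator` class. The required fields and output keys are assumptions. The input keys follow TMDB's movie details response: `genres` as `{id, name}` objects, `vote_average`, and `release_date` as `YYYY-MM-DD`.

```python
from datetime import date
from typing import Any, Optional

REQUIRED_FIELDS = ("id", "title")  # hypothetical minimum for a usable record


def validate_movie(raw: dict[str, Any]) -> bool:
    """A record is usable only if every required field is present and non-empty."""
    return all(raw.get(name) not in (None, "") for name in REQUIRED_FIELDS)


def parse_release_date(value: Any) -> Optional[date]:
    """TMDB sends 'YYYY-MM-DD'; empty or malformed values become None."""
    if not value:
        return None
    try:
        return date.fromisoformat(value)
    except (TypeError, ValueError):
        return None


def clean_movie_data(raw: dict[str, Any]) -> dict[str, Any]:
    """Normalize a validated payload into a flat, predictable dict."""
    return {
        "id": int(raw["id"]),
        "title": raw["title"].strip(),
        "overview": (raw.get("overview") or "").strip(),
        "release_date": parse_release_date(raw.get("release_date")),
        "genres": [g["name"] for g in (raw.get("genres") or []) if g.get("name")],
        "rating": float(raw.get("vote_average") or 0.0),
    }
```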

3. TMDBService - High-Level Facade

from app.services.TMDB import TMDBService

with TMDBService() as service:
    movie = service.fetch_movie(550)           # Fetch + validate + clean
    popular = service.fetch_popular_movies()   # Batch fetch with validation
    genres = service.get_all_genres()

✅ Simple interface to complex operations
✅ Automatic validation and cleaning
✅ Context manager for resource cleanup
✅ Batch operations with pagination

Reusable Utilities (`app/utils/`)

We've built a comprehensive utilities module following SOLID principles:

HTTPClient - Generic HTTP wrapper with retry logic

from app.utils import HTTPClient

client = HTTPClient(base_url="https://api.example.com", timeout=30)
data = client.get_json("/endpoint", params={"key": "value"})

✅ Automatic retry with exponential backoff
✅ Built-in logging and error handling
✅ Context manager support
✅ Reusable across all services

RateLimiter - API rate limiting utility

from app.utils import RateLimiter

limiter = RateLimiter(max_requests=30, time_window=60)
limiter.check_and_wait()  # Automatically waits if limit exceeded

✅ Sliding window algorithm
✅ Configurable limits
✅ Intended for single-threaded use (no locking, so not thread-safe)
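A sliding-window limiter can be built from a log of recent request timestamps. The class below sketches the idea behind `RateLimiter` and is not its actual code. The injectable `clock` and `sleep` parameters are hypothetical; they let the behaviour be tested without real waiting. The class holds no lock, so as written it should not be shared across threads.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `time_window`-second window."""

    def __init__(self, max_requests: int, time_window: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_requests = max_requests
        self.time_window = time_window
        self._clock, self._sleep = clock, sleep
        self._stamps: deque = deque()  # timestamps of requests still inside the window

    def check_and_wait(self) -> None:
        now = self._clock()
        # Forget requests that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.time_window:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_requests:
            # Block until the oldest request leaves the window, then drop it.
            self._sleep(self.time_window - (now - self._stamps[0]))
            self._stamps.popleft()
            now = self._clock()
        self._stamps.append(now)
```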

Logger Setup - Consistent logging across the app

from app.utils import setup_logger, get_logger

setup_logger("logs/app.log", "INFO")
logger = get_logger(__name__)
logger.info("Application started")

✅ Console + file logging
✅ Log rotation (10 MB)
✅ Color-coded output

Retry Decorator - Exponential backoff for any function

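import json
from urllib.request import urlopen
from app.utils import retry_with_backoff

@retry_with_backoff(max_retries=3, initial_delay=1.0)
def fetch_data(url: str) -> dict:
    with urlopen(url, timeout=30) as resp:  # any call that can fail transiently
        return json.load(resp)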

✅ Configurable retries
✅ Exponential backoff
✅ Custom exception handling
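For reference, an exponential-backoff decorator along the lines of `retry_with_backoff` could look like the sketch below. It is not the project's own implementation. `max_retries` and `initial_delay` match the usage above. `factor`, `exceptions`, and the injectable `sleep` are hypothetical extras.

```python
import functools
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(max_retries: int = 3, initial_delay: float = 1.0,
                       factor: float = 2.0, exceptions: tuple = (Exception,),
                       sleep: Callable[[float], None] = time.sleep):
    """Retry on `exceptions`, waiting initial_delay, then delay * factor, and so on."""
    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            delay = initial_delay
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # out of retries: surface the last error
                    sleep(delay)
                    delay *= factor
            raise ValueError("max_retries must be >= 0")
        return wrapper
    return decorator
```

With the defaults, a call that keeps failing waits 1 s, 2 s, and 4 s before the fourth and final error propagates.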

Constants Module (`app/core/constants.py`)

Centralized constants for better maintainability:

🔗 API URLs (TMDB, Groq, Ollama)
🎯 LLM models and configurations
📊 Scoring weights and dimensions
🗄️ Cache TTLs and key prefixes
🌍 Supported languages and genres
⚙️ All magic numbers and strings in one place

Design Patterns Used

✅ Single Responsibility Principle - Each class has one clear purpose
✅ Dependency Injection - Services don't create their dependencies
✅ Facade Pattern - Simple interfaces to complex subsystems
✅ Strategy Pattern - Multiple ingestion strategies
✅ Decorator Pattern - Retry logic via decorators
✅ Context Manager - Proper resource cleanup

🚀 Getting Started

Prerequisites

Python 3.11 or higher
Node.js 18+ and npm/yarn
PostgreSQL 14+ (or Supabase account)
Redis (local or Upstash account)
TMDB API key (free)
Groq API key (free tier)

Installation

1. Clone the Repository

git clone https://github.com/dheerajram13/FilmFind.git
cd FilmFind

2. Backend Setup

cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create .env file
cat > .env << EOF
TMDB_API_KEY=your_tmdb_key
GROQ_API_KEY=your_groq_key
DATABASE_URL=postgresql://user:password@localhost:5432/filmfind
REDIS_URL=redis://localhost:6379
VECTOR_MODEL=sentence-transformers/all-mpnet-base-v2
LLM_PROVIDER=groq
EOF
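The backend's `config.py` uses Pydantic for settings. As a dependency-free illustration of the same idea, the sketch below reads the variables from the `.env` example into a typed object and fails fast when a required key is missing. `Settings` and `load_settings` are hypothetical names. The sketch reads only the process environment, so a loader such as python-dotenv or Pydantic settings is still needed to read the `.env` file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    tmdb_api_key: str
    database_url: str
    groq_api_key: str = ""
    redis_url: str = "redis://localhost:6379"
    vector_model: str = "sentence-transformers/all-mpnet-base-v2"
    llm_provider: str = "groq"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, failing fast on missing keys."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "groq")
    if provider not in ("groq", "ollama"):
        raise ValueError(f"LLM_PROVIDER must be 'groq' or 'ollama', got {provider!r}")
    # GROQ_API_KEY only matters when Groq is the active provider.
    required = ["TMDB_API_KEY", "DATABASE_URL"] + (["GROQ_API_KEY"] if provider == "groq" else [])
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        tmdb_api_key=env["TMDB_API_KEY"],
        database_url=env["DATABASE_URL"],
        groq_api_key=env.get("GROQ_API_KEY", ""),
        redis_url=env.get("REDIS_URL", Settings.redis_url),
        vector_model=env.get("VECTOR_MODEL", Settings.vector_model),
        llm_provider=provider,
    )
```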

3. Frontend Setup

cd ../frontend

# Install dependencies
npm install

# Create .env.local file
cat > .env.local << EOF
NEXT_PUBLIC_API_URL=http://localhost:8000
EOF

4. Database Setup

cd ../backend

# Run migrations
alembic upgrade head

# Optional: Seed with sample data
python scripts/seed_data.py

5. Data Ingestion

# Fetch movies from TMDB
python scripts/ingest_tmdb.py --limit 10000

# Generate embeddings
python scripts/generate_embeddings.py

# Build vector index
python scripts/build_index.py

6. Run the Application

# Terminal 1: Start backend
cd backend
uvicorn app.main:app --reload --port 8000

# Terminal 2: Start frontend
cd frontend
npm run dev

Open http://localhost:3000 in your browser.

📁 Project Structure

filmfind/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   │   ├── routes/
│   │   │   │   ├── search.py          # Search endpoints
│   │   │   │   ├── movies.py          # Movie detail endpoints
│   │   │   │   └── filters.py         # Filter endpoints
│   │   │   └── dependencies.py        # Dependency injection
│   │   ├── core/
│   │   │   ├── config.py              # Environment settings (Pydantic)
│   │   │   ├── constants.py           # Application constants ✨
│   │   │   ├── database.py            # Database connection
│   │   │   └── cache.py               # Redis cache wrapper
│   │   ├── models/
│   │   │   ├── movie.py               # Movie ORM models (SQLAlchemy)
│   │   │   └── user.py                # User models (optional)
│   │   ├── services/
│   │   │   ├── TMDB/                   # TMDB Service Module ✅ Module 1.1
│   │   │   │   ├── __init__.py         # Module exports
│   │   │   │   ├── tmdb_client.py      # API HTTP client (SRP)
│   │   │   │   ├── tmdb_validator.py   # Data validation & cleaning (SRP)
│   │   │   │   └── tmdb_service.py     # High-level facade
│   │   │   ├── embedding_service.py    # Embedding generation
│   │   │   ├── vector_search.py        # FAISS vector search
│   │   │   ├── query_parser.py         # NLP query parsing
│   │   │   ├── reranker.py             # LLM re-ranking
│   │   │   └── scoring_engine.py       # Multi-signal scoring
│   │   ├── schemas/
│   │   │   ├── search.py              # Search request/response (Pydantic)
│   │   │   └── movie.py               # Movie schemas (Pydantic)
│   │   ├── utils/                     # Reusable utilities ✨
│   │   │   ├── rate_limiter.py        # Rate limiting utility
│   │   │   ├── http_client.py         # HTTP client with retry
│   │   │   ├── logger.py              # Logging setup
│   │   │   └── retry.py               # Retry decorator
│   │   └── main.py                    # FastAPI app entry point
│   ├── scripts/
│   │   ├── ingest_tmdb.py             # Data ingestion ✅ Module 1.1
│   │   ├── generate_embeddings.py     # Embedding generation
│   │   └── build_index.py             # Vector index builder
│   ├── tests/
│   │   ├── test_search.py             # Search endpoint tests
│   │   └── test_embeddings.py         # Embedding tests
│   ├── data/
│   │   ├── raw/                       # Raw TMDB JSON data
│   │   ├── processed/                 # Cleaned data
│   │   └── embeddings/                # Vector embeddings
│   ├── logs/                          # Application logs
│   ├── requirements.txt               # Python dependencies
│   ├── .env.example                   # Environment template
│   ├── Dockerfile                     # Docker configuration
│   └── README.md                      # Backend documentation
│
├── frontend/
│   ├── app/
│   │   ├── page.tsx                   # Home page
│   │   ├── search/
│   │   │   └── page.tsx               # Search page
│   │   ├── movie/[id]/
│   │   │   └── page.tsx               # Movie detail page
│   │   └── layout.tsx                 # Root layout
│   ├── components/
│   │   ├── SearchBar.tsx              # Search input component
│   │   ├── MovieCard.tsx              # Movie card component
│   │   ├── FilterPanel.tsx            # Filter sidebar
│   │   └── ui/                        # ShadCN UI components
│   ├── lib/
│   │   ├── api.ts                     # API client
│   │   └── utils.ts                   # Utility functions
│   ├── hooks/
│   │   └── useSearch.ts               # Search hook
│   ├── package.json
│   └── next.config.js
│
├── images/                            # Architecture diagrams
│   ├── System Archeitecture.png
│   ├── Flow Diagram.png
│   ├── Flow Chart.png
│   └── Sequence-diagram.png
│
├── docs/
│   ├── architecture.md                # Architecture documentation
│   ├── api.md                         # API documentation
│   └── deployment.md                  # Deployment guide
│
├── .github/
│   └── workflows/
│       └── ci-cd.yml                  # GitHub Actions workflow
│
├── docker-compose.yml                 # Docker compose configuration
├── plan.md                            # Implementation plan
├── Project Overview                   # Technical design doc
└── README.md                          # This file

📚 API Documentation

Base URL

Development: http://localhost:8000
Production: https://api.filmfind.com

Endpoints

1. Search Movies

POST /api/search
Content-Type: application/json

{
  "query": "dark sci-fi movies like Interstellar with less romance",
  "limit": 10,
  "filters": {
    "year_min": 2010,
    "year_max": 2024,
    "language": "en",
    "genres": ["Science Fiction"]
  }
}

Response:

{
  "results": [
    {
      "id": 157336,
      "title": "Interstellar",
      "overview": "The adventures of a group of explorers...",
      "rating": 8.4,
      "match_score": 0.95,
      "similarity_explanation": "Strong thematic match: space exploration, time dilation...",
      "poster_url": "https://image.tmdb.org/...",
      "genres": ["Science Fiction", "Drama"],
      "release_date": "2014-11-07"
    }
  ],
  "count": 10,
  "query_interpretation": {
    "themes": ["space", "dark", "science fiction"],
    "excluded": ["romance"],
    "reference_movies": ["Interstellar"]
  }
}

2. Get Similar Movies

GET /api/similar/{movie_id}?limit=10

3. Get Movie Details

GET /api/movie/{movie_id}

4. Filter Movies

POST /api/filter
Content-Type: application/json

{
  "genres": ["Thriller", "Mystery"],
  "year_min": 2015,
  "rating_min": 7.0,
  "language": "en",
  "streaming_providers": ["Netflix", "Prime Video"]
}
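On the server side, a filter request like the one above reduces to a predicate applied to each title's metadata. The sketch makes three assumptions:

- List filters (genres, streaming providers) use "any of" semantics.
- Titles without a release date fail year constraints.
- The movie dict carries `language` and `providers` keys, whose names are hypothetical.

`MovieFilter` is likewise an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MovieFilter:
    """Fields mirror the POST /api/filter body; every field is optional."""
    genres: list[str] = field(default_factory=list)
    year_min: Optional[int] = None
    year_max: Optional[int] = None
    rating_min: Optional[float] = None
    language: Optional[str] = None
    streaming_providers: list[str] = field(default_factory=list)

    def matches(self, movie: dict) -> bool:
        year = int(movie["release_date"][:4]) if movie.get("release_date") else None
        if self.year_min is not None and (year is None or year < self.year_min):
            return False
        if self.year_max is not None and (year is None or year > self.year_max):
            return False
        if self.rating_min is not None and movie.get("rating", 0.0) < self.rating_min:
            return False
        if self.language and movie.get("language") != self.language:
            return False
        # "Any of" semantics for list filters (an assumption).
        if self.genres and not set(self.genres) & set(movie.get("genres", [])):
            return False
        if self.streaming_providers and not set(self.streaming_providers) & set(movie.get("providers", [])):
            return False
        return True
```

Usage: `[m for m in movies if movie_filter.matches(m)]`. At 10,000+ titles, pushing the same conditions into a SQL `WHERE` clause would usually be the better choice.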

5. Trending Movies

GET /api/trending?limit=20&time_window=week

For complete API documentation, see docs/api.md or visit /docs (Swagger UI) when running the backend.

🗓️ Future Enhancements

Mobile app (React Native)
Episode-level recommendations for TV shows
Real-time collaborative filtering
Multi-user social recommendations
Integration with more streaming services
Podcast and documentary support

🎯 Success Metrics

✅ Search response time < 500ms
✅ 90%+ relevant results for test queries
✅ Frontend Lighthouse score > 90
✅ API uptime > 99%
✅ 10,000+ movies indexed
✅ Support for 10+ languages
✅ Zero monthly costs (within free tier limits)
✅ Cache hit rate > 70%
✅ LLM calls within Groq free tier (30 req/min)

💡 Usage Examples

Example 1: Reference-Based Search

Query: "Shows like Stranger Things but with more horror elements"

FilmFind understands:

Reference: Stranger Things
Enhancement: More horror/darker tone
Themes: Supernatural, group of kids, 80s setting
Recommended: The Twilight Zone, Locke & Key, Dark, Archive 81

Example 2: Sports Drama Search

Query: "Movies about F1 racing with intense competition and personal rivalries"

FilmFind understands:

Themes: Formula 1, racing, competition, rivalry
Tone: Intense, dramatic
Sport: Motorsport/F1
Recommended: Rush, Ford v Ferrari, Senna, Grand Prix, Days of Thunder

Example 3: Fantasy Mystery Search

Query: "Mystery and magical adventures like Harry Potter with school settings"

FilmFind understands:

Reference: Harry Potter
Themes: Magic, mystery, coming-of-age
Setting: School/academy
Genre: Fantasy + Mystery
Recommended: The Chronicles of Narnia, Percy Jackson, His Dark Materials, The Magicians, A Discovery of Witches

Example 4: Sitcom Search

Query: "Lighthearted sitcoms like Friends about a group of friends navigating life and relationships"

FilmFind understands:

Reference: Friends
Themes: Friendship, relationships, comedy of life
Tone: Lighthearted, feel-good
Genre: Sitcom
Recommended: How I Met Your Mother, New Girl, Brooklyn Nine-Nine, The Big Bang Theory, Modern Family

🤝 Contributing

We welcome contributions! Here's how you can help:

Getting Started

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Guidelines

Write clear, commented code
Follow PEP 8 for Python code
Use TypeScript for frontend code
Write tests for new features
Update documentation as needed

Areas for Contribution

🐛 Bug fixes
✨ New features
📝 Documentation improvements
🎨 UI/UX enhancements
🔧 Performance optimizations
🧪 Test coverage
🌍 Internationalization

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

TMDB for the comprehensive movie database API
Groq for providing free tier LLM API access
Sentence Transformers for excellent embedding models
FastAPI for the amazing Python framework
Next.js for the powerful React framework
ShadCN UI for beautiful UI components

👨‍💻 Author

Dheeraj Srirama

🌐 Portfolio: dheerajsrirama.netlify.app
💼 LinkedIn: dheerajsrirama
🐙 GitHub: @dheerajram13
📧 Email: sriramadheeraj@gmail.com

📞 Support

If you have any questions, issues, or suggestions:

🐛 Issues: GitHub Issues

⭐ Show Your Support

If you find FilmFind useful, please consider:

Giving it a ⭐ on GitHub
Sharing it with others
Contributing to the project
Reporting bugs and suggesting features
