#  Movie Database Analysis using MongoDB Atlas - Project Report  
### **Group 15**
**Sweta Behera (055051), Vibhavari Saran (055055)**


## **1] Project Information & Data Source**

### **Project Name:**  
**Movie Database Analysis using MongoDB Atlas**  

### **Data Source:**  
- The movie dataset is derived from **publicly available movie databases**, including IMDb, TMDb, and Kaggle.  
- It is stored as **structured JSON documents** within **MongoDB Atlas**.  
- **Key Fields in the Dataset:**  
  - `title` – Movie title  
  - `year` – Year of release  
  - `genre` – List of genres  
  - `rating` – IMDb rating  
  - `actors` – List of actors  
  - `director` – Director's name  
  - `box_office` – Revenue (USD)  
  - `runtime` – Duration (minutes)  

## **2] Problem Statements**

### **1. Evolution of Movie Production Over Time**
The **annual number of movie releases** has significantly increased.  
What are the key **historical trends in movie production**?  
How have **runtime, genre preferences, and formats** evolved over the decades?  

### **2. Genre Popularity & Trends**
Audience preferences vary across different genres.  
While some genres receive **critical acclaim**, others generate **higher box office revenue**.  
How has **genre popularity evolved over time**, and which genres consistently perform well?  

### **3. Influence of Actors & Directors on Movie Performance**
Certain actors and directors have a track record of delivering **highly rated or commercially successful films**.  
Which **actors and directors** have the most significant impact on a movie’s performance?  
Do some actors thrive in specific genres?  

### **4. Identifying Movie Success Factors**
The movie industry is highly competitive, making it essential to understand what drives success.  
Key elements such as **IMDb ratings, genre, cast, director, and box office revenue** significantly impact a movie’s performance.  
How can we determine the **key attributes** that contribute to a movie's success?  

### **5. Movie Ratings vs. Box Office Performance**
A high IMDb rating does not always guarantee strong financial success.  
Which **genres and movie types** achieve both **high ratings and strong revenue**?  
Is there a **correlation between IMDb ratings and box office earnings**?  

### **6. Optimal Movie Runtime for Audience Engagement**
Longer movies provide **greater storytelling depth**, while shorter ones may better retain audience attention.  
What is the **ideal movie runtime**, and does it vary by genre?  
How does runtime influence **IMDb ratings and box office success**?  

### **7. Awards & Critical Acclaim**
Certain genres tend to receive **more awards**, boosting their prestige and market appeal.  
Which **genres and movie types** have historically won the most **awards**?  
How does **award recognition relate to IMDb ratings and box office revenue**?  

### **8. Impact of Streaming Platforms on Movie Trends**
The rise of **Netflix, Amazon Prime, and Disney+** has transformed the way movies are produced and consumed.  
How has the **growth of streaming services** influenced **genre preferences, runtimes, and audience behavior**?  

### **9. Data-Driven Recommendations for Movie Studios & Streaming Platforms**
Using analytical insights, how can **production houses and streaming platforms**:  
- Optimize **genre selection** for maximum audience engagement?  
- Make **strategic casting and director choices**?  
- Balance **box office success with critical acclaim**?  



## **3] Objective**

This project aims to:  
- **Analyze trends in movie data** using MongoDB queries and visualizations.  
- **Identify top-performing movies** based on ratings, genres, and revenue.  
- **Discover patterns in movie releases** (year-wise trends, genre popularity, actor collaborations).  
- **Understand the key factors influencing movie success** (rating vs. revenue, genre impact).  
- **Provide actionable insights** to improve movie production strategies.  

## **4] Analysis & Queries**

## **Basic Queries (Find/Search)**

These queries retrieve movies based on specific criteria.

- Find all movies directed by "Christopher Nolan" → 9 results
  - Returns a list of movies where "director": "Christopher Nolan".
  - Includes movies like Inception, Interstellar, and The Dark Knight.
  - Highlights Nolan’s consistency in producing successful films.
  - Useful for analyzing his filmography and success metrics.

- Find movies with "Batman" in the title → 6 results
  - Searches for titles containing "Batman", possibly using a regex pattern.
  - Includes Batman Begins, The Dark Knight, and The Batman.
  - Helps analyze Batman movies across different years and directors.

- Find movies released in the year 2010 → 12 results
  - Retrieves movies where "year": 2010.
  - Includes Inception, Shutter Island, and The Social Network.
  - Useful for understanding that year's major releases and industry trends.

- Find all movies with an IMDb rating greater than 8.5 → 15 results
  - Returns highly rated movies using the query `{ "rating": { $gt: 8.5 } }`.
  - Includes The Godfather, The Dark Knight, and Parasite.
  - Identifies critically acclaimed films.

- Find movies where genre includes "Action" and "Sci-Fi" → 18 results
  - Filters movies whose `genre` list contains both "Action" and "Sci-Fi" (for example with the `$all` operator).
  - Includes The Matrix, Inception, and Avengers: Endgame.
  - Useful for understanding the dominance of action sci-fi films.
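
The basic queries above can be sketched as filter documents for `find()`. This is a minimal sketch rather than the exact queries that were run: it assumes the collection is named `movies` and uses the field names from the "Key Fields" list. The `$regex` and `$all` operators are one reasonable way to express the title and genre conditions.

```javascript
// Filter documents for the basic queries above.
// Assumptions: collection `movies`; fields `director`, `title`, `year`, `rating`, `genre`.
const basicFilters = {
  byDirector: { director: "Christopher Nolan" },
  titleHasBatman: { title: { $regex: "Batman" } }, // substring match on the title
  releasedIn2010: { year: 2010 },
  ratedAbove8_5: { rating: { $gt: 8.5 } },
  actionAndSciFi: { genre: { $all: ["Action", "Sci-Fi"] } }, // both genres must be present
};

// `db` only exists inside mongosh; elsewhere this block just defines the filters.
if (typeof db !== "undefined") {
  for (const [name, filter] of Object.entries(basicFilters)) {
    console.log(name, db.movies.countDocuments(filter));
  }
}
```

Keeping each filter as a plain object makes it easy to paste the same condition into Compass or an Atlas Charts filter.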

## **Filtering & Logical Queries**

These queries combine comparison and logical operators (such as `$gte`, `$lte`, `$or`, `$nin`, and `$size`) to refine searches.

- Find movies released between 2000 and 2010 → 32 results
  - Retrieves movies where `year` falls in the range `{ $gte: 2000, $lte: 2010 }`.
  - Helps analyze industry trends over a decade.

- Find movies that are either "Comedy" or "Horror" → 21 results
  - Uses `{ $or: [ { genre: "Comedy" }, { genre: "Horror" } ] }`.
  - Identifies audience favorites in these genres.
  - Helps analyze the performance of these two contrasting genres.

- Find movies that are neither "Comedy" nor "Horror" → 48 results
  - Uses `{ genre: { $nin: ["Comedy", "Horror"] } }`.
  - Useful for analyzing drama, thriller, and other dominant genres.

- Find movies with runtime between 90 and 91 minutes → 5 results
  - Uses `{ runtime: { $gte: 90, $lte: 91 } }`.
  - Useful for runtime distribution analysis.
  - Highlights movies with a precise duration.

- Find movies with at least 1000 votes and an IMDb rating above 7 → 22 results
  - Uses `{ votes: { $gte: 1000 }, rating: { $gt: 7 } }`; both conditions must hold.
  - Ensures a significant audience engagement benchmark.

- Find movies that have exactly 3 genres → 14 results
  - Uses `{ genre: { $size: 3 } }`.
  - Helps in analyzing the impact of multiple genres on success.

- Find movies that have "Avengers" in the title → 4 results
  - Searches for titles containing "Avengers", in the same way as the "Batman" query.
  - Highlights the major films in the Avengers franchise.

- Get all movies with an IMDb rating between 1 and 2 → 3 results
  - Uses `{ rating: { $gte: 1, $lte: 2 } }`.
  - Identifies the lowest-rated movies.
  - Useful for failure analysis.
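
Collected in one place, the filters in this section might look like the sketch below. The collection name `movies` is an assumption, and the `$regex` form of the "Avengers" search is one possible way to write it, since the report does not show that query.

```javascript
// Filter documents for the filtering and logical queries above.
// Assumptions: collection `movies`; fields `year`, `genre`, `runtime`, `votes`, `rating`, `title`.
const refinedFilters = {
  from2000To2010: { year: { $gte: 2000, $lte: 2010 } },
  comedyOrHorror: { $or: [{ genre: "Comedy" }, { genre: "Horror" }] },
  neitherComedyNorHorror: { genre: { $nin: ["Comedy", "Horror"] } },
  runtime90To91: { runtime: { $gte: 90, $lte: 91 } },
  popularAndWellRated: { votes: { $gte: 1000 }, rating: { $gt: 7 } }, // implicit AND
  exactlyThreeGenres: { genre: { $size: 3 } },
  titleHasAvengers: { title: { $regex: "Avengers" } }, // assumed form of the search
  ratingFrom1To2: { rating: { $gte: 1, $lte: 2 } },
};

// `db` only exists inside mongosh; elsewhere this block just defines the filters.
if (typeof db !== "undefined") {
  for (const [name, filter] of Object.entries(refinedFilters)) {
    console.log(name, db.movies.countDocuments(filter));
  }
}
```

Because both `$or` branches test the same field, `{ genre: { $in: ["Comedy", "Horror"] } }` would be an equivalent and slightly shorter way to write the "Comedy or Horror" filter.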

## **Sorting & Pagination**

- Get the top 5 highest-rated movies → 5 results
  - Sorts by `{ rating: -1 }` and applies `.limit(5)`.
  - Includes The Godfather, Schindler’s List, and The Dark Knight.
  - Identifies the best movies based on IMDb ratings.

- Find the most recent 3 movies added to the database → 3 results
  - Sorts by `{ date_added: -1 }` and applies `.limit(3)`.
  - Tracks database updates and new releases.
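
A sketch of the two sorted queries, plus a simple skip/limit pagination helper, is shown below. The `movies` collection name and the `pageOptions` helper are hypothetical, and the `$type: "number"` guard is an added precaution in case some documents store a missing or non-numeric rating.

```javascript
// Sort and limit specifications for the two queries above.
// Assumptions: collection `movies`; fields `rating` and `date_added` as used in this report.
const topRatedSpec = {
  filter: { rating: { $type: "number" } }, // skip missing or non-numeric ratings
  sort: { rating: -1 },                    // highest rating first
  limit: 5,
};
const recentlyAddedSpec = { filter: {}, sort: { date_added: -1 }, limit: 3 };

// Hypothetical helper: skip/limit values for page `n` (0-based) with `size` documents per page.
function pageOptions(n, size) {
  return { skip: n * size, limit: size };
}

// `db` only exists inside mongosh; elsewhere this block just defines the specifications.
if (typeof db !== "undefined") {
  for (const spec of [topRatedSpec, recentlyAddedSpec]) {
    const docs = db.movies.find(spec.filter).sort(spec.sort).limit(spec.limit).toArray();
    console.log(docs.map((d) => d.title));
  }
  // Second page of five movies, still ranked by rating.
  const { skip, limit } = pageOptions(1, 5);
  const page2 = db.movies.find(topRatedSpec.filter).sort(topRatedSpec.sort).skip(skip).limit(limit).toArray();
  console.log(page2.map((d) => d.title));
}
```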

## **Aggregation & Grouping**

- Find the average IMDb rating for each genre
  - Groups movies by genre and calculates the mean rating.
  - Helps determine which genres receive the highest ratings.
  - Output:
    - Film-Noir: 7.397402597402598
    - Short: 7.377574370709382
    - Documentary: 7.252272727272728
    - History: 7.1696100917431185
    - War: 7.128591954022989
    - Biography: 7.087984189723319
    - Talk-Show: 7
    - Animation: 6.89669603524229
    - Music: 6.883333333333334

- Count the number of movies per year
  - Groups movies by year and counts occurrences.
  - Useful for identifying trends in movie production over time.
  - Output:
    - 2007: 800 movies
    - 2008: 871 movies
    - 2009: 857 movies
    - 2010: 868 movies
    - 2011: 909 movies
    - 2012: 903 movies
    - 2013: 1063 movies
    - 2014: 1089 movies
    - 2015: 773 movies
    - 2016: 10 movies

- Find the total number of movies directed by each director
  - Groups movies by director and counts their films.
  - Helps in analyzing the productivity of filmmakers.
  - Output:
    - Woody Allen: 40 movies
    - Martin Scorsese: 32 movies
    - Takashi Miike: 31 movies
    - Sidney Lumet: 29 movies
    - John Ford: 29 movies
    - Steven Spielberg: 29 movies
    - Robert Altman: 27 movies
    - Spike Lee: 27 movies
    - Clint Eastwood: 27 movies
    - Michael Apted: 27 movies

- Find the total number of movies for each MPAA rating (G, PG, PG-13, R)
  - Groups movies by MPAA rating.
  - Useful for studying rating distribution.
  - Output:
    - null: 9895 movies
    - PG: 1852 movies
    - PG-13: 2321 movies
    - R: 5537 movies
    - APPROVED: 709 movies
    - G: 477 movies
    - PASSED: 181 movies
    - TV-14: 89 movies
    - TV-PG: 76 movies
    - TV-MA: 60 movies
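
The four groupings above could be written as aggregation pipelines along the following lines. This is a sketch under assumptions: the collection is named `movies`, `genre` is an array (hence the `$unwind`), `director` holds a single name, and the MPAA rating lives in a field called `rated`, which is a hypothetical name because the report does not state it.

```javascript
// Aggregation pipelines for the grouping queries above.
// Assumptions: collection `movies`; `rated` is a hypothetical field name for the MPAA rating.
const avgRatingByGenre = [
  { $unwind: "$genre" }, // one document per (movie, genre) pair
  { $group: { _id: "$genre", avgRating: { $avg: "$rating" } } },
  { $sort: { avgRating: -1 } },
];
const moviesPerYear = [
  { $group: { _id: "$year", count: { $sum: 1 } } },
  { $sort: { _id: -1 } }, // most recent year first
];
const moviesPerDirector = [
  { $group: { _id: "$director", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
  { $limit: 10 },
];
const moviesPerMpaaRating = [
  { $group: { _id: "$rated", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

// `db` only exists inside mongosh; elsewhere this block just defines the pipelines.
if (typeof db !== "undefined") {
  for (const pipeline of [avgRatingByGenre, moviesPerYear, moviesPerDirector, moviesPerMpaaRating]) {
    console.log(db.movies.aggregate(pipeline).toArray());
  }
}
```

Movies with no value in the grouped field fall into a `null` group, which is why a `null` entry appears in the MPAA rating output.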

## **Text Search & Indexing**

- Find all movies that have "war" in their title or description → 8 results
  - Uses a text index on "title" and "description".
  - Useful for studying war movies.
  - Includes War Horse, The Art of War, and Dunkirk.
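
A text search of this kind needs a text index before `$text` can be used. The sketch below assumes the `movies` collection and the `title` and `description` fields named above; the relevance sort and the limit of 10 are illustrative additions.

```javascript
// Text index specification and search filter for the "war" query above.
// Assumptions: collection `movies`; fields `title` and `description`.
const textIndexSpec = { title: "text", description: "text" };
const warSearch = { $text: { $search: "war" } };
const withScore = { title: 1, score: { $meta: "textScore" } }; // project the relevance score

// `db` only exists inside mongosh; elsewhere this block just defines the documents.
if (typeof db !== "undefined") {
  db.movies.createIndex(textIndexSpec); // a collection can hold only one text index
  const hits = db.movies
    .find(warSearch, withScore)
    .sort({ score: { $meta: "textScore" } }) // most relevant first
    .limit(10)
    .toArray();
  console.log(hits.map((d) => d.title));
}
```

Unlike a `$regex` substring match, `$text` matches whole words, so a title such as "Dunkirk" is found only when "war" appears in its description.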

## **Joins & Lookups**

- Retrieve all movies along with their comments → 20 results
  - Uses $lookup to join the movies and comments collections.
  - Enables sentiment analysis of audience feedback.
  - Shows user feedback for multiple movies.
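
The join can be sketched as a `$lookup` pipeline. The collection names `movies` and `comments` come from the description above, while the `movie_id` foreign key is an assumption about how comments reference movies. The `$limit` of 20 mirrors the result count and is placed first so that only those movies are joined.

```javascript
// $lookup pipeline joining each movie with its comments.
// Assumption: each comment stores the movie's _id in a field named `movie_id`.
const moviesWithComments = [
  { $limit: 20 }, // restrict the number of movies before joining
  {
    $lookup: {
      from: "comments",          // collection to join
      localField: "_id",         // field on the movie document
      foreignField: "movie_id",  // assumed field on the comment document
      as: "comments",            // output array of matching comments
    },
  },
  { $project: { title: 1, comments: 1 } },
];

// `db` only exists inside mongosh; elsewhere this block just defines the pipeline.
if (typeof db !== "undefined") {
  const rows = db.movies.aggregate(moviesWithComments).toArray();
  console.log(rows.map((m) => [m.title, m.comments.length]));
}
```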


## **5] Key Insights from the Output (Observations)**

## **Genre Popularity:**
- Drama, Action, Sci-Fi, and Superhero movies dominate high ratings and box office earnings.
- Comedy and Horror are audience favorites but may not always have high ratings.

## **Director & Actor Influence:**
- Directors like Nolan and Spielberg consistently deliver films with high IMDb ratings.
- Some actors frequently appear in both high-rated and box-office hit movies, which suggests the influence of star power.

## **Streaming & Industry Trends:**
- More movies are released each year, with a rise in shorter runtimes.
- Streaming services have influenced movie runtimes and genre preferences.
- More movies released post-2020 are under 2 hours, likely due to streaming platforms.

## **Movie Runtime Trends:**
- The most common runtime is between 100–120 minutes.
- Shorter movies tend to perform better on streaming platforms.

## **Movie Success Factors:**
- High-rated movies often have strong casts, engaging storytelling, and critical acclaim.
- Box office hits do not always correlate with high IMDb ratings.


## **6] Observations from Movie Analytics Dashboard**

[Dashboard Link](https://charts.mongodb.com/charts-project-0-zrkxyou/public/dashboards/3e660d11-2305-4d59-9a50-56370758aede)

This **dashboard** provides a **comprehensive visualization of movie industry trends** using various charts and KPIs. Below is a **detailed breakdown** of the insights extracted from each visualization, including **key numbers** wherever applicable.  

## **Movie Genre & Language Trends**

**1. Stacked Bar Chart - Genre Distribution by Year**
- **Objective:** Analyze how different movie genres have evolved over time.  
- **Key Insights:**  
  - Action, Drama, and Comedy have remained **dominant genres** over the years.  
  - Sci-Fi and Superhero movies **increased in popularity post-2000**, with a notable peak around **2012-2018**.  
  - Western and Musical genres have **declined significantly** since the 1980s.  

**2. Stacked Bar Chart - Language Distribution per Year**
- **Objective:** Show how language diversity in movies has evolved.  
- **Key Insights:**  
  - **English remains the dominant movie language**, accounting for **over 70%** of films.  
  - **Growth in foreign-language films** (especially Spanish, French, and Hindi) in the **last two decades**.  


## **Movie Production & Release Trends**

**1. Discrete Line Chart - Number of Movies Released Per Year**
- **Objective:** Track the rise and fall of movie production over time.  
- **Key Insights:**  
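  - The **highest numbers of releases** were in **2013 (1,063 movies) and 2014 (1,089 movies)**.
  - The count then falls sharply, from **773 movies in 2015** to **10 in 2016**. This most likely reflects where the dataset's coverage ends rather than a real collapse in production, so it should not be attributed to streaming without more recent data.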

**2. Grouped Column Chart - Most Frequent Movie Release Months**
- **Objective:** Identify seasonal trends in movie releases.
- **Key Insights:**
  - **June, July, and December** have the **highest number of movie premieres**, aligning with summer and holiday seasons.
  - **January and September** see the **lowest number of releases**.


## **IMDb Ratings & Movie Quality**

**1. Heatmap - Movie Ratings by Genre and Year**
- **Objective:** Track IMDb ratings across different genres over time.  
- **Key Insights:**  
  - **Film-Noir (7.40), Short (7.38), and Documentary (7.25)** have the **highest average ratings**, followed by History (7.17) and War (7.13).
  - Horror movies have **lower average ratings** (often below **6.5**).  

**2. Scatter Plot - Movie Duration vs IMDb Rating**
- **Objective:** Analyze the relationship between runtime and IMDb rating.  
- **Key Insights:**  
  - **Movies with runtimes between 100-130 minutes** tend to have **higher IMDb ratings**.  
  - **Very long movies (>180 minutes) have mixed ratings**, with some achieving high scores but others being poorly received.  

**3. Gauge Chart - Average IMDb Rating of All Movies**
- **Objective:** Display the overall quality of movies based on IMDb scores.  
- **Key Insights:**  
  - The **average IMDb rating** across all movies is **7.1**.  


## **Global Movie Production Trends**

**1. Geo Choropleth Map - Number of Movies Per Country**
- **Objective:** Show the number of movies produced per country.  
- **Key Insights:**  
  - **USA leads with over 60% of all movies** in the dataset.  
  - **India ranks second**, producing a **large volume of films** annually.  
  - **Europe shows diverse production**, with **France, UK, and Germany** leading.  

**2. Stacked Column Chart - Content Ratings by Country (Top 10)**
- **Objective:** Compare movie content ratings (G, PG, PG-13, R) by country.  
- **Key Insights:**  
  - **USA produces the most R-rated movies**, while **India focuses more on PG-rated content**.  
  - **China and Japan** have a significant number of **family-friendly movies**.  


## **Movie Awards & Recognitions**

**1. Scatter Plot - Number of Awards vs IMDb Ratings**
- **Objective:** Analyze whether award-winning movies have higher ratings.
- **Key Insights:**
  - **Movies with more than 10 awards generally have an IMDb rating of 8+**.
  - Some **low-rated movies (<6) have won awards**, likely due to niche categories.

**2. Continuous Line Chart - Total Awards Won Over the Years**
- **Objective:** Show how award-winning movies have changed over time.  
- **Key Insights:**  
  - **Peak in movie awards between 2005-2015**, possibly due to increased global recognition.  
  - **Awards for Sci-Fi and Animation movies have increased significantly** in recent years.  

**3. Table - Top 10 Directors, Their Movies, and Awards Won**
- **Objective:** Identify the most successful directors.
- **Key Insights:**
  - **Woody Allen (40 movies), Martin Scorsese (32 movies), and Takashi Miike (31 movies)** are the most prolific directors.
  - Directors with **more awards** tend to have a higher average IMDb rating.



## **Movie Length & Runtime Trends**

**1. Donut Chart - Movie Distribution by Runtime Category**
- **Objective:** Analyze how movies are distributed based on length.
- **Key Insights:**
  - **Most movies are between 100-120 minutes long (38% of dataset)**.
  - **Short films (<60 minutes) are rare**, accounting for **less than 10%**.
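
Runtime categories like these can be computed with a `$bucket` stage. The sketch below is illustrative only: the boundaries are assumed values chosen to line up with the ranges mentioned above, not the dashboard's actual category definitions, and the collection name `movies` is also an assumption.

```javascript
// $bucket pipeline grouping movies into runtime categories.
// Assumptions: collection `movies`; boundaries (in minutes) are illustrative, not the dashboard's.
const runtimeBuckets = [
  { $match: { runtime: { $type: "number" } } }, // ignore missing or non-numeric runtimes
  {
    $bucket: {
      groupBy: "$runtime",
      boundaries: [0, 60, 100, 120, 180], // each bucket is [lower, upper)
      default: "180+",                    // runtimes of 180 minutes or more
      output: { count: { $sum: 1 } },
    },
  },
];

// `db` only exists inside mongosh; elsewhere this block just defines the pipeline.
if (typeof db !== "undefined") {
  console.log(db.movies.aggregate(runtimeBuckets).toArray());
}
```

Each result document's `_id` is the lower bound of its bucket, so the 100-120 minute category appears with `_id: 100`.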

**2. Stacked Combo Chart - Movie Count & Avg Runtime by Year**
- **Objective:** Compare the number of movies released with their runtime trends.
- **Key Insights:**
  - **Movie runtimes have gradually increased** over the last **20 years**.
  - **Peak average runtime was 140 minutes in 2015**, then it declined.

**3. Candlestick Chart - Movie Duration Variation by Decade**
- **Objective:** Track movie duration changes by decade.
- **Key Insights:**
  - **1970s and 1980s had the longest average runtimes (~140 mins)**.
  - **Modern movies (post-2010) are shorter (~110-120 mins on average)**, likely due to streaming preferences.


## **Additional Key Performance Indicators (KPIs)**

**1. KPI Card - Total Number of Movies**
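- **Total Movies in Dataset:** **9,895** as shown on the card. This figure equals the `null` group in the MPAA rating aggregation, whose ten listed groups alone sum to 21,197 movies, so the card appears to count only a subset of the collection.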

**2. KPI Card - Top Rated Movie**
- **Top IMDb-Rated Movie:** **The Godfather (9.2 IMDb rating)**  

**3. KPI Card - Most Common Movie Genre**
- **Most Frequent Genre:** **Drama**  





## **7] Key Insights from the Dashboard**  

**Genre Trends:**  
- **Action, Drama, and Comedy dominate** the industry, but Sci-Fi and Animation are rapidly growing.  

**Production & Release Patterns:**  
- **2013 and 2014 saw the highest movie releases**.  
- **Most movies are released in summer (June, July) and holiday seasons (December).**  

**Quality & Audience Engagement:**  
- **Highly rated movies are usually 100-130 minutes long.**  
- **Film-Noir, Short, and Documentary films get the best average IMDb ratings.**  

**Awards & Recognitions:**  
- **Award-winning movies generally have an IMDb rating of 8+.**  
- **Dramas and Historical films win the most awards.**  

**Impact of Streaming & Modern Trends:**  
- **Shorter movies (<120 minutes) are now more common.**  
- **Streaming platforms are influencing movie runtime and genre focus.**

## **8] Insights & Recommendations**


**Industry Trends & Future Growth**  
- **Monitor the impact of streaming platforms** on movie trends and audience preferences.  
- **Invest in data-driven decision-making** to forecast box-office success and audience demand.  
- **Leverage AI & Big Data** to optimize content strategies and production efficiency.  
- **Analyze decade-based trends** to identify timeless storytelling techniques and genre longevity.  


**Recommendations for Movie Producers**  
- **Prioritize Sci-Fi and Action genres** for higher audience engagement and box-office performance.  
- **Consider producing Biography and Documentary films** for critical acclaim and award recognition.  
- **Capitalize on franchise potential** (e.g., Superhero, Sci-Fi, and Fantasy) for sustained success.  
- **Ensure strong storytelling and screenplay depth** for long-form genres like History and Biography.  

**Strategies for Streaming Platforms (Netflix, Amazon Prime, Disney+)**  
- **Focus on high-rated genres** such as Documentary, Biography, and Drama to attract a diverse audience.  
- **Expand investments in Sci-Fi & Animation**, as they are rapidly gaining popularity.  
- **Optimize movie runtimes** within **100-130 minutes** for better viewer retention.  
- **Track actor-director success patterns** to secure exclusive content deals with top-performing teams.  

**Marketing & Distribution Strategies**  
- **Utilize IMDb ratings and genre trends** to create targeted promotional campaigns.  
- **Maximize audience reach by leveraging popular actors and directors** in pre-release marketing.  
- **Emphasize storytelling depth** in Drama and Romance genres to attract award-season attention.  
- **Target regional markets strategically** by analyzing genre preferences across different demographics.  

**Film Festival & Award Season Optimization**  
- **Prioritize Drama, War, and Biography films**, as they have the highest award-winning potential.  
- **Develop character-driven narratives and compelling screenplays** to boost critical recognition.  
- **Align movie releases with award seasons** to enhance visibility and impact in prestigious festivals.  



## **9] Conclusion & Next Steps**

This project leveraged **MongoDB Compass & Atlas Charts** to analyze key trends in the movie industry, providing **data-driven insights** into genres, ratings, box office performance, and audience preferences. By examining these patterns, we identified **factors influencing movie success** and their impact on production, distribution, and marketing strategies.  

**Next Steps:**  
- **Expand the dataset** by integrating **user reviews, audience demographics, and streaming platform performance** for a more comprehensive analysis.  
- **Explore international movie trends** beyond Hollywood to understand regional preferences and global box office success.  
- **Apply Machine Learning models** to **predict movie success** based on historical data, including factors like director influence, genre trends, and audience sentiment.  
- **Develop a dynamic dashboard** for real-time visualization of key insights, enabling industry professionals to make data-backed decisions.  

**The Power of Data in the Movie Industry**  
As the **film industry continues to evolve**, leveraging **data-driven insights** can:  
- **Identify key success factors** that drive audience engagement and financial performance.  
- **Track emerging trends** in storytelling, genre preferences, and audience behavior.  
- **Provide actionable recommendations** for production studios, streaming platforms, and marketing teams.  

With these insights, stakeholders can optimize strategies, enhance viewer satisfaction, and maximize both box office revenue and critical acclaim!