Skip to content

RohitV33/Netflix_SQL_Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 

Repository files navigation

🎬 Netflix Movies and TV Shows Data Analysis using SQL


📌 Overview

This project presents a comprehensive analysis of Netflix’s Movies and TV Shows dataset using SQL.
The objective is to extract meaningful insights and answer real-world business questions related to content distribution, ratings, genres, countries, and keyword-based categorization.

This project demonstrates strong SQL skills such as data cleaning, aggregation, string manipulation, window functions, and analytical querying, making it suitable for Data Analyst roles.


🎯 Objectives

  • Analyze the distribution of content types (Movies vs TV Shows)
  • Identify the most common ratings for movies and TV shows
  • Explore content based on release years, countries, and durations
  • Analyze genres and actor appearances
  • Categorize content based on keywords such as kill and violence

📂 Dataset

  • Source: Kaggle – Netflix Movies and TV Shows Dataset
  • Format: CSV
  • Content: Movies and TV shows available on Netflix

🗂️ Schema

DROP TABLE IF EXISTS netflix;

CREATE TABLE netflix
(
    show_id      VARCHAR(5),
    type         VARCHAR(10),
    title        VARCHAR(250),
    director     VARCHAR(550),
    casts        VARCHAR(1050),
    country      VARCHAR(550),
    date_added   VARCHAR(55),
    release_year INT,
    rating       VARCHAR(15),
    duration     VARCHAR(15),
    listed_in    VARCHAR(250),
    description  VARCHAR(550)
);

🧠 Business Problems and Solutions

  1. Count the Number of Movies vs TV Shows
SELECT 
   type,
   COUNT(*)
FROM netflix
GROUP BY 1;

Objective: Determine the distribution of content types on Netflix.

  1. Find the Most Common Rating for Movies and TV Shows
WITH RatingCounts AS (
    SELECT 
        type,
        rating,
        COUNT(*) AS rating_count
    FROM netflix
    GROUP BY type, rating
),

RankedRatings AS (
    SELECT 
        type,
        rating,
        rating_count,
        RANK() OVER (PARTITION BY type ORDER BY rating_count DESC) AS rank
    FROM RatingCounts
)
SELECT 
    type,
    rating AS most_frequent_rating
FROM RankedRatings
WHERE rank = 1;

Objective: Identify the most frequently occurring rating for each content type.

  1. List All Movies Released in 2020
SELECT *
FROM netflix
WHERE release_year = 2020;

Objective: Retrieve all movies released in the year 2020.

  1. Find the Top 5 Countries with the Most Content on Netflix
SELECT *
FROM (
   SELECT 
       UNNEST(STRING_TO_ARRAY(country, ',')) AS country,
       COUNT(*) AS total_content
   FROM netflix
   GROUP BY 1
)
WHERE country IS NOT NULL
ORDER BY total_content DESC
LIMIT 5;

Objective: Identify the top 5 countries with the highest number of content items.

  1. Identify the Longest Movie
SELECT *
FROM netflix
WHERE type = 'Movie'
ORDER BY SPLIT_PART(duration, ' ', 1)::INT DESC;

Objective: Find the movie with the longest duration.

  1. Find Content Added in the Last 5 Years
SELECT *
FROM netflix
WHERE TO_DATE(date_added, 'Month DD, YYYY') 
      >= CURRENT_DATE - INTERVAL '5 years';

Objective: Retrieve content added to Netflix in the last five years.

  1. Find All Movies and TV Shows by Director Rajiv Chilaka
SELECT *
FROM (
    SELECT 
        *,
        UNNEST(STRING_TO_ARRAY(director, ',')) AS director_name
    FROM netflix
) 
WHERE director_name = 'Rajiv Chilaka';

Objective: List all content directed by Rajiv Chilaka.

  1. List All TV Shows with More Than 5 Seasons
SELECT *
FROM netflix
WHERE type = 'TV Show'
  AND SPLIT_PART(duration, ' ', 1)::INT > 5;

Objective: Identify TV shows with more than five seasons.

  1. Count the Number of Content Items in Each Genre
SELECT 
    UNNEST(STRING_TO_ARRAY(listed_in, ',')) AS genre,
    COUNT(*) AS total_content
FROM netflix
GROUP BY 1;

Objective: Analyze content distribution across genres.

  1. Average Content Release by India (Top 5 Years)
SELECT  
   EXTRACT(YEAR FROM TO_DATE(date_added,'Month DD, YYYY')) AS year,
   COUNT(*) AS total_release,
   ROUND(
       COUNT(*)::NUMERIC /
       (SELECT COUNT(*) FROM netflix WHERE country = 'India')::NUMERIC * 100, 2
   ) AS avg_content
FROM netflix
WHERE country = 'India'
GROUP BY year
ORDER BY avg_content DESC
LIMIT 5;

Objective: Identify the top 5 years with the highest average content releases from India.

  1. List All Documentary Movies
SELECT *
FROM netflix
WHERE listed_in ILIKE '%documentaries%';

Objective: Retrieve all movies classified as documentaries.

  1. Find Content Without a Director
SELECT *
FROM netflix
WHERE director IS NULL;

Objective: Identify content with missing director information.

  1. Find Movies Featuring Salman Khan in the Last 10 Years
SELECT *
FROM netflix
WHERE casts ILIKE '%Salman Khan%'
  AND release_year > EXTRACT(YEAR FROM CURRENT_DATE) - 10;

Objective: Find movies featuring Salman Khan released in the last 10 years.

  1. Top 10 Actors in Indian Movies
SELECT 
    UNNEST(STRING_TO_ARRAY(casts, ',')) AS actor,
    COUNT(*) AS total_content
FROM netflix
WHERE country ILIKE '%India%'
GROUP BY actor
ORDER BY total_content DESC
LIMIT 10;

Objective: Identify the top 10 actors with the most appearances in Indian movies.

  1. Categorize Content Based on Keywords (Kill & Violence)
SELECT 
 category,
 COUNT(*) AS content_count
FROM (
 SELECT 
     CASE 
         WHEN description ILIKE '%kill%' 
           OR description ILIKE '%violence%' THEN 'Bad'
         ELSE 'Good'
     END AS category
 FROM netflix
) categorized_content
GROUP BY category;

Objective: Categorize content as Good or Bad based on keyword presence.

📊 Key Insights

Movies dominate Netflix’s content library

India is one of the top content-producing countries

Keyword-based analysis helps classify content nature

Advanced SQL techniques enable deeper analytical insights

🛠️ Tools & Skills Used

PostgreSQL

SQL (CTEs, Window Functions, String Functions)

Data Cleaning & Analysis

Business-Oriented Problem Solving

👨‍💻 Author

Rohit Verma

Aspiring Data Analyst | SQL | Data Analytics

📌 This project is part of my data analytics portfolio.

⭐ If you found this project helpful, feel free to star the repository!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors