Welcome to my Netflix SQL Data Analysis project. In this project, I've worked on 15 different problems using SQL to explore and analyze the Netflix dataset. Each query focuses on a real-world use case or business insight.
- Analyze the distribution of content types (movies vs TV shows).
- Identify the most common ratings for movies and TV shows.
- List and analyze content based on release years, countries, and durations.
- Explore and categorize content based on specific criteria and keywords.
The data for this project is sourced from the Kaggle dataset:
- Dataset Link: Movies Dataset
DROP TABLE IF EXISTS netflix;
CREATE TABLE netflix
(
show_id VARCHAR(5),
type VARCHAR(10),
title VARCHAR(250),
director VARCHAR(550),
casts VARCHAR(1050),
country VARCHAR(550),
date_added VARCHAR(55),
release_year INT,
rating VARCHAR(15),
duration VARCHAR(15),
listed_in VARCHAR(250),
description VARCHAR(550)
);SELECT
type,
COUNT(*)
FROM netflix
GROUP BY 1;Objective: Determine the distribution of content types on Netflix.
SELECT
type,
rating
FROM
( SELECT
Type,
rating,
COUNT(*),
RANK() OVER(PARTITION BY type ORDER BY COUNT(*) DESC) AS Ranking
FROM netflix
GROUP BY 1,2
) as t1
WHERE
ranking = 1Objective: Identify the most frequently occurring rating for each type of content.
SELECT *
FROM netflix
WHERE release_year = 2020;Objective: Retrieve all movies released in a specific year.
SELECT
UNNEST(STRING_TO_ARRAY(country,',')) AS new_country,
COUNT(show_id) AS total_content
FROM netflix
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5Objective: Identify the top 5 countries with the highest number of content items.
SELECT
*
FROM netflix
WHERE type = 'Movie'
ORDER BY SPLIT_PART(duration, ' ', 1)::INT DESC;Objective: Find the movie with the longest duration.
SELECT *
FROM netflix
WHERE TO_DATE(date_added, 'Month DD, YYYY') >= CURRENT_DATE - INTERVAL '5 years';
Objective: Retrieve content added to Netflix in the last 5 years.
SELECT *
FROM (
SELECT
*,
UNNEST(STRING_TO_ARRAY(director, ',')) AS director_name
FROM netflix
) AS t
WHERE director_name = 'Rajiv Chilaka';Objective: List all content directed by 'Rajiv Chilaka'.
SELECT *
FROM netflix
WHERE type = 'TV Show'
AND SPLIT_PART(duration, ' ', 1)::INT > 5;Objective: Identify TV shows with more than 5 seasons.
SELECT
UNNEST(STRING_TO_ARRAY(listed_in, ',')) AS genre,
COUNT(*) AS total_content
FROM netflix
GROUP BY 1;Objective: Count the number of content items in each genre.
return top 5 year with highest avg content release!
SELECT
country,
release_year,
COUNT(show_id) AS total_release,
ROUND(
COUNT(show_id)::numeric /
(SELECT COUNT(show_id) FROM netflix WHERE country = 'India')::numeric * 100, 2
) AS avg_release
FROM netflix
WHERE country = 'India'
GROUP BY country, release_year
ORDER BY avg_release DESC
LIMIT 5;Objective: Calculate and rank years by the average number of content releases by India.
SELECT *
FROM netflix
WHERE listed_in LIKE '%Documentaries';Objective: Retrieve all movies classified as documentaries.
SELECT *
FROM netflix
WHERE director IS NULL;Objective: List content that does not have a director.
SELECT *
FROM netflix
WHERE casts LIKE '%Salman Khan%'
AND release_year > EXTRACT(YEAR FROM CURRENT_DATE) - 10;Objective: Count the number of movies featuring 'Salman Khan' in the last 10 years.
SELECT
UNNEST(STRING_TO_ARRAY(casts, ',')) AS actor,
COUNT(*)
FROM netflix
WHERE country = 'India'
GROUP BY actor
ORDER BY COUNT(*) DESC
LIMIT 10;Objective: Identify the top 10 actors with the most appearances in Indian-produced movies.
SELECT
category,
COUNT(*) AS content_count
FROM (
SELECT
CASE
WHEN description ILIKE '%kill%' OR description ILIKE '%violence%' THEN 'Bad'
ELSE 'Good'
END AS category
FROM netflix
) AS categorized_content
GROUP BY category;Objective: Categorize content as 'Bad' if it contains 'kill' or 'violence' and 'Good' otherwise. Count the number of items in each category.
Praveen S.
Aspiring Data Analyst | B.Sc. in
- Load the
netflix_titles.csvdataset into your SQL environment. - Use the queries from
Solution.sqlto run the analysis. - Modify parameters like years, names, or keywords to customize results.
-
Data cleaning with
SPLIT_PART,UNNEST, andILIKE -
Ranking and partitioning with
RANK() OVER (PARTITION BY …) -
Text analysis within SQL
-
Real-world analytics from entertainment data
