Welcome to my Netflix SQL Data Analysis project. In this project, I've worked on 15 different problems using SQL to explore and analyze the Netflix dataset. Each query focuses on a real-world use case or business insight.
- Analyze the distribution of content types (movies vs TV shows).
- Identify the most common ratings for movies and TV shows.
- List and analyze content based on release years, countries, and durations.
- Explore and categorize content based on specific criteria and keywords.
The data for this project is sourced from the Kaggle dataset:
- Dataset Link: Movies Dataset
DROP TABLE IF EXISTS netflix;
CREATE TABLE netflix
(
    show_id      VARCHAR(5),
    type         VARCHAR(10),
    title        VARCHAR(250),
    director     VARCHAR(550),
    casts        VARCHAR(1050),
    country      VARCHAR(550),
    date_added   VARCHAR(55),
    release_year INT,
    rating       VARCHAR(15),
    duration     VARCHAR(15),
    listed_in    VARCHAR(250),
    description  VARCHAR(550)
);SELECT 
    type,
    COUNT(*)
FROM netflix
GROUP BY 1;Objective: Determine the distribution of content types on Netflix.
SELECT
	type,
	rating
FROM
(   SELECT 
	Type,
	rating,
	COUNT(*),
	RANK() OVER(PARTITION BY type ORDER BY COUNT(*) DESC) AS Ranking
    FROM netflix
    GROUP BY 1,2
) as t1
WHERE 
	 ranking = 1Objective: Identify the most frequently occurring rating for each type of content.
SELECT * 
FROM netflix
WHERE release_year = 2020;Objective: Retrieve all movies released in a specific year.
SELECT 
	UNNEST(STRING_TO_ARRAY(country,',')) AS new_country,
	COUNT(show_id) AS total_content
FROM netflix
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5Objective: Identify the top 5 countries with the highest number of content items.
SELECT 
    *
FROM netflix
WHERE type = 'Movie'
ORDER BY SPLIT_PART(duration, ' ', 1)::INT DESC;Objective: Find the movie with the longest duration.
SELECT *
FROM netflix
WHERE TO_DATE(date_added, 'Month DD, YYYY') >= CURRENT_DATE - INTERVAL '5 years';
Objective: Retrieve content added to Netflix in the last 5 years.
SELECT *
FROM (
    SELECT 
        *,
        UNNEST(STRING_TO_ARRAY(director, ',')) AS director_name
    FROM netflix
) AS t
WHERE director_name = 'Rajiv Chilaka';Objective: List all content directed by 'Rajiv Chilaka'.
SELECT *
FROM netflix
WHERE type = 'TV Show'
  AND SPLIT_PART(duration, ' ', 1)::INT > 5;Objective: Identify TV shows with more than 5 seasons.
SELECT 
    UNNEST(STRING_TO_ARRAY(listed_in, ',')) AS genre,
    COUNT(*) AS total_content
FROM netflix
GROUP BY 1;Objective: Count the number of content items in each genre.
return top 5 year with highest avg content release!
SELECT 
    country,
    release_year,
    COUNT(show_id) AS total_release,
    ROUND(
        COUNT(show_id)::numeric /
        (SELECT COUNT(show_id) FROM netflix WHERE country = 'India')::numeric * 100, 2
    ) AS avg_release
FROM netflix
WHERE country = 'India'
GROUP BY country, release_year
ORDER BY avg_release DESC
LIMIT 5;Objective: Calculate and rank years by the average number of content releases by India.
SELECT * 
FROM netflix
WHERE listed_in LIKE '%Documentaries';Objective: Retrieve all movies classified as documentaries.
SELECT * 
FROM netflix
WHERE director IS NULL;Objective: List content that does not have a director.
SELECT * 
FROM netflix
WHERE casts LIKE '%Salman Khan%'
  AND release_year > EXTRACT(YEAR FROM CURRENT_DATE) - 10;Objective: Count the number of movies featuring 'Salman Khan' in the last 10 years.
SELECT 
    UNNEST(STRING_TO_ARRAY(casts, ',')) AS actor,
    COUNT(*)
FROM netflix
WHERE country = 'India'
GROUP BY actor
ORDER BY COUNT(*) DESC
LIMIT 10;Objective: Identify the top 10 actors with the most appearances in Indian-produced movies.
SELECT 
    category,
    COUNT(*) AS content_count
FROM (
    SELECT 
        CASE 
            WHEN description ILIKE '%kill%' OR description ILIKE '%violence%' THEN 'Bad'
            ELSE 'Good'
        END AS category
    FROM netflix
) AS categorized_content
GROUP BY category;Objective: Categorize content as 'Bad' if it contains 'kill' or 'violence' and 'Good' otherwise. Count the number of items in each category.
Praveen S.
Aspiring Data Analyst  | B.Sc. in
- Load the netflix_titles.csvdataset into your SQL environment.
- Use the queries from Solution.sqlto run the analysis.
- Modify parameters like years, names, or keywords to customize results.
- 
Data cleaning with SPLIT_PART,UNNEST, andILIKE
- 
Ranking and partitioning with RANK() OVER (PARTITION BY …)
- 
Text analysis within SQL 
- 
Real-world analytics from entertainment data 
