Mids5/Netflix-Data-Analysis

Netflix Data Analysis

A data analysis project that explores a Netflix dataset to derive insights about titles, trends, and viewing preferences. It uses exploratory data analysis (EDA), visualizations, and possibly statistical techniques to describe what Netflix’s content looks like and how it has evolved.



Project Overview

This project explores Netflix data (titles, metadata) to reveal insights such as:

  • Which types of content (movies vs TV shows) are more common
  • Trends over time (e.g. releases per year)
  • Genre distribution
  • Possible correlations between metadata features (rating, genre, release year)

The analysis is done using a Jupyter Notebook (NetflixData Analysis.ipynb) with the dataset mymoviedb.csv.


Motivation

  • Netflix has become a major content provider worldwide, and its catalog offers rich data for studying trends.
  • Knowing how content is distributed over time, genre, and other features can shed light on user preferences, market direction, and content strategy.
  • Visualization and data analysis help in making data-driven observations rather than just assumptions.

Data

  • File: mymoviedb.csv
  • Source: (If you have a specific source or scraped API, link or mention it here)
  • Contents / Features: Likely includes fields such as title, type (Movie/TV Show), director, cast, country, date added, release year, rating, duration, genre(s), etc.
  • Notebook: NetflixData Analysis.ipynb, where the data cleaning, exploration, and visualization are done.
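
Since the exact columns are only described as likely above, a quick first step is to load the file and check what it actually contains. The snippet below is a minimal sketch of that check using pandas. The helper name summarize_columns and the sample rows are illustrative assumptions, not taken from the notebook; in practice the frame would come from pd.read_csv("mymoviedb.csv").

```python
import io

import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing-value count, distinct-value count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "unique": df.nunique(),  # NaN is not counted as a distinct value
        }
    )


# Made-up stand-in rows; the column names are assumptions about the real file.
sample_csv = io.StringIO(
    "title,type,release_year,rating\n"
    "Example A,Movie,2019,PG-13\n"
    "Example B,TV Show,2020,\n"
    "Example C,Movie,2020,TV-MA\n"
)

df = pd.read_csv(sample_csv)  # in the notebook: pd.read_csv("mymoviedb.csv")
summary = summarize_columns(df)
print(summary)
```

The summary makes it easy to see which columns have missing values before deciding how to handle them.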

Analysis Plan

The notebook performs the following kinds of steps and analyses:

  1. Data Cleaning & Preprocessing

    • Handling missing or null values
    • Parsing dates, converting types
    • Splitting / normalizing genre information
  2. Exploratory Data Analysis (EDA)

    • Counts of Movies vs TV Shows
    • Distribution of content across time (e.g. by release year or added date)
    • Countries and content origin
    • Genre popularity
  3. Visualization

    • Bar plots for counts by category (type, genre)
    • Time series or histograms for release years
    • Possibly heatmaps or correlation analyses
  4. Insights / Observations

    • What categories are growing the most over time?
    • Which genres are most common / least common?
    • Are there any outlier years or unusual patterns?
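
The cleaning and EDA steps above can be sketched roughly as follows. This is not the notebook's code: the column names (type, director, date_added, release_year, genre), the helper clean_catalog, and the sample rows are all assumptions made for illustration, and should be adapted to the real columns in mymoviedb.csv.

```python
import pandas as pd


def clean_catalog(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: fill missing directors, parse dates, split genres."""
    out = df.copy()
    # Handle missing values with an explicit placeholder.
    out["director"] = out["director"].fillna("Unknown")
    # Parse dates; values that cannot be parsed become NaT instead of raising.
    out["date_added"] = pd.to_datetime(out["date_added"], errors="coerce")
    # Convert years to a nullable integer type.
    out["release_year"] = pd.to_numeric(
        out["release_year"], errors="coerce"
    ).astype("Int64")
    # Split the comma-separated genre string into a list of trimmed names.
    out["genres"] = [
        [g.strip() for g in str(raw).split(",") if g.strip()] if pd.notna(raw) else []
        for raw in out["genre"]
    ]
    return out


# Made-up rows standing in for the real dataset.
raw = pd.DataFrame(
    {
        "title": ["Example A", "Example B", "Example C"],
        "type": ["Movie", "TV Show", "Movie"],
        "director": ["Director X", None, "Director Y"],
        "date_added": ["2021-09-25", "2020-01-05", "not a date"],
        "release_year": ["2019", "2020", "2020"],
        "genre": ["Drama, Comedy", "Drama", None],
    }
)

catalog = clean_catalog(raw)

# Counts of Movies vs TV Shows.
type_counts = catalog["type"].value_counts()
# Titles per release year, in chronological order.
titles_per_year = catalog["release_year"].value_counts().sort_index()
# Genre popularity: one row per (title, genre) pair, then count.
genre_counts = catalog.explode("genres")["genres"].value_counts()

print(type_counts, titles_per_year, genre_counts, sep="\n\n")
```

Exploding the genre list before counting means a title tagged with several genres contributes to each of them, which is usually what a genre popularity chart is meant to show.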

Key Findings

(Fill this section with your actual observations. Sample placeholder insights might be:)

  • The number of new Netflix titles added each year has increased steadily since ___
  • Certain genres (e.g. Drama, Comedy) dominate the catalog, while niche genres are less frequent
  • Some years have a spike in content from particular countries
  • There are many missing values in fields like director or rating — which may affect some analyses

Tools & Libraries

Purpose              Tool / Library
Data processing      pandas, numpy
Visualization        matplotlib, seaborn, possibly plotly
Environment          Python, Jupyter Notebook
Data input/output    CSV handling
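
As a rough illustration of how these libraries fit together, the sketch below turns a pandas count series into a matplotlib bar chart. The helper plot_counts and the numbers are hypothetical; real counts would come from the dataset.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; this line is not needed inside Jupyter

import matplotlib.pyplot as plt
import pandas as pd


def plot_counts(counts: pd.Series, title: str, xlabel: str):
    """Draw a bar chart with one bar per index label of `counts`."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar([str(label) for label in counts.index], counts.to_numpy())
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Number of titles")
    fig.tight_layout()
    return fig, ax


# Made-up counts for illustration only.
counts = pd.Series({"Movie": 2, "TV Show": 1})
fig, ax = plot_counts(counts, "Titles by type", "Type")
# fig.savefig("titles_by_type.png")  # uncomment to write the chart to disk
```

The same helper works for any count series, for example titles per release year or titles per genre.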

How to Use

  1. Clone this repository:
    git clone https://github.com/Mids5/Netflix-Data-Analysis.git
    cd Netflix-Data-Analysis
