A comprehensive data analysis project that explores Netflix's content library using both Python and R visualization techniques.
This project analyzes Netflix's content library dataset to extract insights about content types, genres, ratings, and temporal trends. The analysis is implemented in both Python and R to demonstrate different visualization approaches.
netflix_visualization.py - Python script for data preparation, cleaning, and visualization
Netflix_visualizations.R - R script for complementary data analysis and visualization
Netflix_shows_movies.csv - Source dataset
netflix_dataset.csv - Cleaned dataset used by the R script
Generated visualizations:
top_genres.png - Bar chart of top 15 Netflix genres
content_type_distribution.png - Distribution of content types
ratings_distribution.png - Distribution of content ratings
yearly_content_additions.png - Content added to Netflix by year
movie_duration_distribution.png - Distribution of movie durations
top_genres_r.png - R-generated bar chart of top genres
rating_by_type_r.png - R-generated visualization of ratings by content type
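As a rough illustration of the counting behind a "top genres" chart, the sketch below tallies individual genres from comma-separated genre strings. It is not taken from netflix_visualization.py: the function name `top_genres`, the sample data, and the assumption that each title stores its genres as one comma-separated string are all hypothetical. It uses only the standard library so it runs without the project's dependencies.

```python
from collections import Counter


def top_genres(genre_fields, n=15):
    """Return the n most frequent genres as (genre, count) pairs.

    Assumes each element of genre_fields is a comma-separated string
    such as "Dramas, International Movies" (a hypothetical layout).
    """
    counts = Counter()
    for field in genre_fields:
        if not field:
            continue  # skip missing or empty genre entries
        for genre in field.split(","):
            genre = genre.strip()
            if genre:
                counts[genre] += 1
    return counts.most_common(n)


# Made-up sample rows, for illustration only
sample = [
    "Dramas, International Movies",
    "Comedies, Dramas",
    "",
    "Comedies",
    "Dramas",
]
print(top_genres(sample, n=2))
```

The resulting pairs can be passed to any plotting library as bar labels and heights.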
Data Cleaning: Handles missing values and prepares data for analysis
Exploratory Data Analysis: Provides statistical summaries and distributions
Visualizations: Creates insightful charts about Netflix content
Cross-language Implementation: Demonstrates both Python and R approaches
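One possible shape for the missing-value handling mentioned above is sketched here. It is a minimal illustration, not the project's actual cleaning logic: the function `clean_rows`, the column names `title`, `director`, and `rating`, and the "Unknown" placeholder are assumptions chosen for the example.

```python
def clean_rows(rows, fill_value="Unknown", required=("title",)):
    """Drop rows missing a required field and fill other blanks.

    rows is a list of dicts, e.g. as produced by csv.DictReader.
    The required columns and fill value are illustrative assumptions.
    """
    cleaned = []
    for row in rows:
        # Drop the row if any required column is missing or blank
        if any(not (row.get(col) or "").strip() for col in required):
            continue
        # Replace remaining blank or None values with the placeholder
        cleaned.append({
            key: (value.strip() if value and value.strip() else fill_value)
            for key, value in row.items()
        })
    return cleaned


# Made-up rows, for illustration only
rows = [
    {"title": "A", "director": "", "rating": "PG"},
    {"title": " ", "director": "X", "rating": None},
]
print(clean_rows(rows))
```

Dropping rows without a title while filling other gaps is only one policy; which columns count as required depends on the analysis.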
Python: pandas, matplotlib, seaborn, numpy
R: tidyverse, ggplot2, dplyr, stringr
Python Script
    python netflix_visualization.py
R Script
    Rscript Netflix_visualizations.R
Sample Insights
Distribution of Movies vs TV Shows in the Netflix library
Most popular content genres
Content ratings analysis
Trends in content additions over time
Movie duration analysis
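The trend in content additions over time reduces to counting titles per year. The sketch below shows one way to do that with the standard library. The function `additions_per_year` is hypothetical, and the date format "%B %d, %Y" (for example "September 25, 2021") is an assumption about how the dataset stores the date a title was added, not something confirmed by the project.

```python
from collections import Counter
from datetime import datetime


def additions_per_year(date_strings, fmt="%B %d, %Y"):
    """Count entries per year, ignoring blank or unparseable dates.

    fmt is an assumed date layout such as "September 25, 2021".
    """
    counts = Counter()
    for raw in date_strings:
        if not raw or not raw.strip():
            continue  # missing date
        try:
            year = datetime.strptime(raw.strip(), fmt).year
        except ValueError:
            continue  # value does not match the assumed format
        counts[year] += 1
    # Return years in ascending order for plotting
    return dict(sorted(counts.items()))


# Made-up dates, for illustration only
dates = ["September 25, 2021", " January 1, 2020", "", "not a date", "March 3, 2021"]
print(additions_per_year(dates))
```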
Python 3.x with pandas, matplotlib, seaborn, and numpy
R with tidyverse, ggplot2, dplyr packages
Netflix dataset CSV file in the working directory
Notes
The R script expects the cleaned CSV output from the Python script. Run the Python script first to generate netflix_cleaned.csv.
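The run order described in this note could be enforced with a small helper like the one below. This is a hypothetical sketch, not part of the repository: the function `plan_commands` is invented for illustration, and it only builds the list of commands to run, based on whether the cleaned CSV already exists in the working directory.

```python
from pathlib import Path


def plan_commands(workdir=".", cleaned_name="netflix_cleaned.csv"):
    """Return the commands to run, in order, as argument lists.

    The Python script is scheduled first only when the cleaned CSV
    is not yet present; the R script always runs last.
    """
    commands = []
    if not (Path(workdir) / cleaned_name).exists():
        commands.append(["python", "netflix_visualization.py"])
    commands.append(["Rscript", "Netflix_visualizations.R"])
    return commands


print(plan_commands())
```

Each returned command can be passed to `subprocess.run(cmd, check=True)` so that a failure in the Python step stops the R step from running.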
THANKS