- Project Overview
- Features
- Concepts Covered
- Installation
- How to Use
- Dataset Source
- Insights Summary
- Future Enhancements
- Author
This project demonstrates basic data analysis operations using Pandas on the Amazon Prime Titles Dataset. It focuses on performing data loading, inspection, cleaning, and filtering to understand and manipulate tabular data efficiently using Python.
The goal is to gain hands-on experience with fundamental data-handling techniques essential for any data analytics or data science workflow.
- Dataset Loading: Reads and displays data using Pandas.
- Data Inspection: Views dataset shape, column names, and data types.
- Data Cleaning: Handles missing values, trims spaces, and converts date formats.
- Filtering and Indexing: Extracts subsets of data based on specific conditions (e.g., release year, country).
- Data Transformation: Converts duration strings into numeric values for easier processing.
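
The loading, inspection, and cleaning features listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the rows are made up, the data is read from an in-memory string instead of `amazon_prime_titles.csv`, and the choice to drop rows with a missing `country` is an assumption.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for amazon_prime_titles.csv (made-up rows, for illustration only).
csv_text = """show_id,type,title,country,date_added,release_year,duration
s1,Movie,  Example Film ,India,"March 30, 2021",2018,90 min
s2,TV Show,Example Series,United States,,2020,2 Seasons
s3,Movie,Another Film,,"July 11, 2021",2010,113 min
"""

# Loading: with the real file this would be pd.read_csv("amazon_prime_titles.csv").
df = pd.read_csv(io.StringIO(csv_text))

# Inspection: shape, column names, and data types.
print(df.shape)
print(df.columns.tolist())
print(df.dtypes)

# Cleaning: trim stray spaces around titles.
df["title"] = df["title"].str.strip()

# Cleaning: convert the date strings to datetimes; unparseable or missing values become NaT.
df["date_added"] = pd.to_datetime(df["date_added"], errors="coerce")

# Cleaning: drop rows that have no country (an assumed choice of column).
cleaned = df.dropna(subset=["country"])
print(cleaned)
```

Passing `subset=` to `dropna()` keeps rows that are only missing values in columns you do not care about; calling `dropna()` with no arguments would drop any row with any missing value.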
- Pandas Operations: `read_csv()`, `head()`, `tail()`, `info()`, `shape`, `columns`
- Data Cleaning: Handling missing values with `dropna()`, type conversion with `to_datetime()`
- Filtering: Conditional selection using Boolean indexing (`df[df['column'] == value]`)
- Feature Extraction: Creating new columns (e.g., numeric duration from string)
- Basic Analysis: Viewing subsets and summaries of data
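
As a hedged illustration of the Boolean indexing and feature extraction concepts above, the sketch below combines several conditions and derives numeric columns from the duration strings. The sample rows and the new column names `duration_value` and `duration_minutes` are hypothetical, chosen only for this example.

```python
import pandas as pd

# Made-up rows that mimic a few columns of the dataset.
df = pd.DataFrame({
    "title": ["Example Film", "Example Series", "Another Film"],
    "type": ["Movie", "TV Show", "Movie"],
    "country": ["India", "United States", "India"],
    "release_year": [2018, 2020, 2010],
    "duration": ["90 min", "2 Seasons", "113 min"],
})

# Boolean indexing: each condition yields a True/False Series; combine them with &.
# The parentheses are required because & binds more tightly than == and >.
recent_indian_movies = df[
    (df["type"] == "Movie")
    & (df["country"] == "India")
    & (df["release_year"] > 2015)
]
print(recent_indian_movies)

# Feature extraction: pull the leading number out of strings like "90 min" or "2 Seasons".
df["duration_value"] = df["duration"].str.extract(r"(\d+)", expand=False).astype(int)

# Keep the number as minutes only where the unit is "min"; other rows become NaN.
df["duration_minutes"] = df["duration_value"].where(df["duration"].str.endswith("min"))
print(df[["title", "duration", "duration_value", "duration_minutes"]])
```

Keeping minutes separate from the raw number avoids mixing "2 Seasons" with "90 min" in later numeric summaries.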
Make sure you have Python and the required libraries installed:

`pip install pandas numpy`

You can run the notebook using:
- Jupyter Notebook, or
- VS Code with the Jupyter extension enabled.
- Download or clone the project folder.
- Open the notebook file: `b3856ea8-c4b6-41de-a449-168e3732e8c6.ipynb`
- Place the dataset file `amazon_prime_titles.csv` in the same folder.
- Run each cell in order to:
  - Load and display the dataset
  - Clean data (remove null values, convert columns)
  - Filter data by year, country, or type
  - Transform columns (like duration → minutes)
- Dataset: Amazon Prime Titles
- Source: Kaggle, Amazon Prime Movies and TV Shows
- Description: Contains detailed information about Amazon Prime Video titles, including show ID, title, director, cast, country, release year, rating, and duration.
- Some columns such as cast and date_added had missing values, which were cleaned.
- The majority of entries are Movies, with fewer TV Shows.
- The dataset includes movies and shows from various countries, including India and the USA.
- Filtering by release year helped isolate recent titles (e.g., post-2015).
- Converted duration strings (e.g., "90 min") into numeric form for further use.
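
Observations like the missing-value and Movie/TV Show findings above can be checked with a couple of one-liners. The following is a rough sketch on made-up rows, so the counts it prints say nothing about the real dataset.

```python
import pandas as pd

# Made-up rows; None marks a missing value, as NaN would in the loaded CSV.
df = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "cast": ["Actor A, Actor B", None, "Actor C", None],
    "date_added": ["March 30, 2021", None, None, "July 11, 2021"],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Number of titles per type, most frequent first.
type_counts = df["type"].value_counts()
print(type_counts)

# The same breakdown as fractions of all rows.
type_share = df["type"].value_counts(normalize=True)
print(type_share)
```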
- Add visualization using Matplotlib/Seaborn for better understanding.
- Include EDA (Exploratory Data Analysis) to discover content trends.
- Create dashboards using Power BI or Streamlit.
- Add summary statistics such as most frequent countries, top release years, etc.
- Name: Prasad Goud
- Role: Engineering Student
- Skills Used: Python, Pandas, NumPy, Data Cleaning, Filtering
- Year: 2025