# Netflix and Titanic Data Analysis

This project analyzes two datasets, Netflix and Titanic, using Python with Pandas, Seaborn, and Matplotlib. It answers five exploratory questions about the Netflix catalog and two about Titanic passenger survival. This README outlines the objective, methodology, and output for each task.

---

## Table of Contents
1. [Project Overview](#project-overview)
2. [Dataset Descriptions](#dataset-descriptions)
3. [Netflix Tasks](#netflix-tasks)
    - Q1: Missing Ratings
    - Q2: Films in 2021 by Country
    - Q3: Movies in 2020 with Full Information
    - Q4: Year with the Most Titles
    - Q5: Average Releases Since 2010
4. [Titanic Tasks](#titanic-tasks)
    - Q1: Gender-Based Survival Percentage
    - Q2: Survival Percentage by Gender and Class
5. [How to Use](#how-to-use)
6. [Results](#results)
7. [Future Improvements](#future-improvements)

---

## Project Overview

This project applies Python's data manipulation and visualization tools to the Netflix and Titanic datasets. The exercises aim to build familiarity with Pandas operations, data visualization, and the interpretation of real-world data.

---

## Dataset Descriptions

### Netflix Dataset
The Netflix dataset contains information about movies and TV shows available on the platform, including:
- Title
- Type (Movie/TV Show)
- Release Year
- Country
- Rating

### Titanic Dataset
The Titanic dataset provides information about passengers aboard the Titanic, such as:
- Gender
- Passenger Class
- Survival Status

---

## Netflix Tasks

### **Q1: Count Missing Ratings**
- **Objective**: Determine the number of missing ratings in the dataset.
- **Methodology**:
  - Use `isnull()` and `sum()` to count missing values in the `rating` column.
- **Output**:
  - Total missing ratings: **4**
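
The counting step above can be sketched on a small stand-in table. The sample rows and the `title` column are assumptions made for illustration; only the `rating` column name comes from the methodology, and the real analysis would load the full Netflix dataset instead.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real Netflix dataset.
netflix = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "rating": ["PG-13", None, "TV-MA", None],
})

# isnull() flags each missing entry as True; sum() counts the True values.
missing_ratings = int(netflix["rating"].isnull().sum())
print(f"Total missing ratings: {missing_ratings}")
```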

---

### **Q2: Count Films in 2021 by Country**
- **Objective**: Find the number of movies released in 2021 corresponding to a specific country (e.g., Spain).
- **Methodology**:
  - Filter data for `type='Movie'` and `release_year=2021`, then count the rows whose `country` value matches the chosen country.
- **Output**:
  - Total movies from Spain in 2021: **6**
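
A minimal sketch of this filter follows. The helper `count_movies` and the sample rows are hypothetical. The sketch also assumes a substring match on `country`, which counts co-productions such as "Spain, France"; this README does not say whether the original analysis used a substring match or exact equality, and the two can give different counts.

```python
import pandas as pd

# Hypothetical sample rows; the last row has a missing country.
netflix = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "Movie"],
    "release_year": [2021, 2021, 2021, 2020, 2021],
    "country": ["Spain", "Spain, France", "Spain", "Spain", None],
})


def count_movies(df, year, country):
    """Count movies released in `year` whose country field mentions `country`."""
    is_movie = df["type"] == "Movie"
    in_year = df["release_year"] == year
    # na=False treats rows with a missing country as non-matches;
    # regex=False makes this a plain substring test.
    in_country = df["country"].str.contains(country, na=False, regex=False)
    return int((is_movie & in_year & in_country).sum())


print(count_movies(netflix, 2021, "Spain"))
```

Passing the year and country as arguments keeps the same function usable for other queries.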

---

### **Q3: Movies in 2020 with Full Information**
- **Objective**: Count movies released in 2020 with no missing values.
- **Methodology**:
  - Filter for `type='Movie'` and `release_year=2020`.
  - Use `dropna()` to drop rows with any missing value, then count the rows that remain.
- **Output**:
  - Number of movies with full information: **409**
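
The two steps above might look like the sketch below. The sample rows and the `director` column are assumptions for illustration. Note that `dropna()` with default arguments removes a row if any column is missing, so the count depends on which columns the dataset contains.

```python
import pandas as pd

# Hypothetical sample rows; the second movie lacks a director.
netflix = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "TV Show"],
    "release_year": [2020, 2020, 2019, 2020],
    "director": ["X", None, "Y", "Z"],
    "rating": ["PG", "R", "PG", "TV-14"],
})

# Keep only movies released in 2020.
movies_2020 = netflix[(netflix["type"] == "Movie") & (netflix["release_year"] == 2020)]

# Drop every row that has at least one missing value, then count what is left.
full_info_count = len(movies_2020.dropna())
print(f"Number of movies with full information: {full_info_count}")
```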

---

### **Q4: Year with the Most Titles**
- **Objective**: Identify the year with the highest number of titles released.
- **Methodology**:
  - Group by `release_year` and count titles using `.size()`.
  - Use `.idxmax()` to find the year with the maximum count.
- **Output**:
  - Year with the most titles: **2018**
  - Total titles: **1147**
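
As a rough illustration of the `groupby` / `.size()` / `.idxmax()` chain, here is a sketch on hypothetical sample data (the years below are made up and do not reproduce the reported figures):

```python
import pandas as pd

# Hypothetical sample rows: one row per title.
netflix = pd.DataFrame({"release_year": [2017, 2018, 2018, 2018, 2019, 2019]})

# size() returns the number of rows (titles) in each release_year group.
titles_per_year = netflix.groupby("release_year").size()

# idxmax() returns the index label (the year) with the largest count;
# max() returns the count itself.
top_year = int(titles_per_year.idxmax())
top_count = int(titles_per_year.max())
print(f"Year with the most titles: {top_year} ({top_count} titles)")
```

If two years tie for the maximum, `.idxmax()` returns the first one it encounters.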

---

### **Q5: Average Releases Since 2010**
- **Objective**: Calculate the average number of releases per year since 2010.
- **Methodology**:
  - Filter data for `release_year >= 2010`.
  - Group by `release_year`, count the titles in each year, then take the mean of those yearly counts.
- **Output**:
  - Average releases per year: **622.67**
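
The averaging step can be sketched as follows, again on hypothetical sample rows. One property worth knowing: a year with no titles at all does not appear in the grouped result, so it does not pull the average down.

```python
import pandas as pd

# Hypothetical sample rows: one row per title.
netflix = pd.DataFrame({"release_year": [2008, 2010, 2010, 2011, 2012, 2012, 2012]})

# Keep titles released in 2010 or later.
since_2010 = netflix[netflix["release_year"] >= 2010]

# Count titles per year, then average the yearly counts.
yearly_counts = since_2010.groupby("release_year").size()
average_releases = float(yearly_counts.mean())
print(f"Average releases per year: {average_releases:.2f}")
```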

---

## Titanic Tasks

### **Q1: Gender-Based Survival Percentage**
- **Objective**: Calculate the survival percentage for each gender.
- **Methodology**:
  - Count the passengers of each gender and the survivors of each gender.
  - Divide the survivor count by the passenger count for each gender and multiply by 100.
- **Output**:
  | Gender | Survival Percentage |
  |--------|---------------------|
  | Male   | 12.93%             |
  | Female | 50.00%             |
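
A minimal sketch of this calculation is shown below. The sample rows are hypothetical, and the `Survived` column name (1 for survived, 0 otherwise) is an assumption based on the common layout of the Titanic dataset; `Sex` is the column name used elsewhere in this README.

```python
import pandas as pd

# Hypothetical sample rows; Survived is assumed to be 1 (yes) or 0 (no).
titanic = pd.DataFrame({
    "Sex": ["male", "male", "male", "male", "female", "female"],
    "Survived": [0, 0, 0, 1, 1, 0],
})

# Passengers per gender.
totals = titanic.groupby("Sex").size()

# Survivors per gender; reindex so a gender with no survivors gets 0, not NaN.
survivors = (
    titanic[titanic["Survived"] == 1]
    .groupby("Sex")
    .size()
    .reindex(totals.index, fill_value=0)
)

survival_pct = (survivors / totals * 100).round(2)
print(survival_pct)
```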

---

### **Q2: Survival Percentage Grouped by Gender and Class**
- **Objective**: Calculate survival percentages grouped by gender and passenger class.
- **Methodology**:
  - Group data by `Sex` and `Pclass`, then calculate the mean survival rate for each group and express it as a percentage.
- **Output**:
  | Gender | Passenger Class | Survival Percentage |
  |--------|-----------------|---------------------|
  | Male   | 1               | 25.14%             |
  | Male   | 2               | 9.94%              |
  | Male   | 3               | 9.53%              |
  | Female | 1               | 63.19%             |
  | Female | 2               | 66.04%             |
  | Female | 3               | 33.33%             |
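
The grouped calculation can be sketched like this. The sample rows are hypothetical and the `Survived` column name is an assumption. Because survival is coded 0 or 1, the mean of that column within a group equals the group's survival rate.

```python
import pandas as pd

# Hypothetical sample rows; Survived is assumed to be 1 (yes) or 0 (no).
titanic = pd.DataFrame({
    "Sex": ["male", "male", "male", "female", "female", "female"],
    "Pclass": [1, 1, 3, 1, 3, 3],
    "Survived": [1, 0, 0, 1, 1, 0],
})

# Mean of a 0/1 column is the fraction of 1s; scale to a percentage.
by_group = (
    titanic.groupby(["Sex", "Pclass"])["Survived"]
    .mean()
    .mul(100)
    .round(2)
)
print(by_group)
```

Only combinations that occur in the data appear in the result, so a gender and class pair with no passengers is simply absent.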

---

## How to Use

1. Clone this repository to your local machine.
2. Ensure Python 3.x and the following libraries are installed:
   - Pandas
   - Matplotlib
   - Seaborn
3. Run each Jupyter Notebook or Python script corresponding to the Netflix and Titanic tasks.
4. Review the results printed in the console and visualized in graphs.
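
Before running the notebooks, a quick check like the one below can confirm that the three libraries from step 2 are importable. It is only a convenience sketch and not part of the project's own scripts.

```python
import importlib.util

# Import names of the libraries listed in step 2.
required = ["pandas", "matplotlib", "seaborn"]

# find_spec returns None when a top-level package cannot be found.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```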

---

## Results

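- The Netflix analysis found 4 missing ratings, 6 movies from Spain released in 2021, and 409 movies from 2020 with complete records. Releases peaked in 2018 with 1147 titles, and the average since 2010 is 622.67 titles per year.
- The Titanic analysis showed stark survival differences between genders and passenger classes: 50.00% of women survived versus 12.93% of men, and third-class passengers had the lowest survival rate within each gender.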

---

## Future Improvements

1. Add visualizations for each task to improve interpretability.
2. Automate parameter inputs (e.g., year, country) for greater flexibility.
3. Explore machine learning models to predict survival on the Titanic or user preferences on Netflix.
