
# Assignment001: The Titanic Dataset
---

## Lesson Plan: The Titanic Data Story 🚢

### Part 1: The Basics (The Titanic Story)

* **Introduction to pandas:** What is a **DataFrame**? It is a 2D labeled data structure with rows and columns, much like a table or an Excel spreadsheet. A single column is called a **Series**.
* **Loading Data:** We'll use `pd.read_csv()` to load the `titanic_dataset.csv` file into a DataFrame. This is the first step in any data analysis project.
* **Initial Data Inspection:** We'll use key methods to get a feel for our data:
    * `.head()` to get a quick preview of the top rows.
    * `.info()` to check data types and non-null counts, which reveal whether any values are missing.
    * `.describe()` to get a statistical summary of the numerical columns.
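
The loading and inspection steps above can be sketched on a tiny inline CSV. The rows below are made up for illustration and only stand in for the real file; with the actual dataset you would pass its path to `pd.read_csv()` instead of the `io.StringIO` wrapper.

```python
import io

import pandas as pd

# Made-up rows standing in for the real file (assumption: illustrative only).
csv_text = """Name,Sex,Age,Fare
Passenger A,male,22,7.25
Passenger B,female,38,71.28
Passenger C,female,,7.92
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))        # preview the first two rows
df.info()                # dtypes and non-null counts per column
print(df.describe())     # summary statistics for the numeric columns
print(type(df["Age"]))   # a single column is a pandas Series
```

Note that the empty `Age` field in the third row is read as a missing value, which is why `.info()` reports one fewer non-null entry for that column.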

### Part 2: Data Selection & Filtering (Finding Passengers)

* **Selecting Columns:** We can grab one or more columns from our DataFrame to focus on specific data.
    * Single column: `df['ColumnName']`
    * Multiple columns: `df[['Col1', 'Col2']]`
* **Filtering Rows:** This is how we find specific groups of data that meet a certain condition. For example, finding all male passengers: `df[df['Sex'] == 'male']`.
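* **Combining Conditions:** We can use `&` (and) and `|` (or) to create more complex filters. Each condition must be wrapped in its own parentheses, for example `df[(df['Sex'] == 'male') & (df['Age'] > 30)]`.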
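
A minimal sketch of selection and filtering, using a small hand-built DataFrame. The rows are invented for illustration; only the column names follow the Titanic dataset.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the real dataset.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22, 38, 26, 35],
    "Pclass": [3, 1, 3, 1],
})

ages = df["Age"]                 # one column name -> Series
name_age = df[["Name", "Age"]]   # list of column names -> DataFrame

# A boolean condition inside the brackets keeps only the matching rows.
males = df[df["Sex"] == "male"]

# & and | bind more tightly than == and >, so each condition
# needs its own parentheses.
female_third = df[(df["Sex"] == "female") & (df["Pclass"] == 3)]
first_or_over_30 = df[(df["Pclass"] == 1) | (df["Age"] > 30)]

print(female_third)
print(first_or_over_30)
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why pandas filters use `&` and `|` instead.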

### Part 3: Data Cleaning & Manipulation (Getting the Data Right)

* **Handling Missing Values:** Real-world data is messy! We'll learn how to identify missing values with `.isna().sum()` and how to deal with them by filling them in with `.fillna()` or dropping them with `.dropna()`.
* **Creating New Columns:** We can add new columns to our DataFrame based on calculations from existing ones. This is a powerful way to add new insights.
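
The cleaning steps above might look like the following on a toy table. The values are invented, and `FamilySize` is a hypothetical column name chosen for this sketch; filling with the median is one option among several, not a recommendation for the assignment.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; np.nan marks a missing value.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 2],
})

# Count the missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: fill the gaps (here with the column median, which ignores NaN).
filled = df.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].median())

# Option 2: drop the rows where Age is missing.
dropped = df.dropna(subset=["Age"])

# New column computed from existing ones (hypothetical name).
filled["FamilySize"] = filled["SibSp"] + filled["Parch"] + 1
print(filled)
```

Filling keeps every row but introduces estimated values, while dropping keeps only observed values at the cost of fewer rows.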

### Part 4: Grouping and Aggregation (Summarizing Insights)

* **Introduction to `groupby()`:** This is one of the most powerful tools in pandas. It allows us to "split" the data into groups based on a certain column, "apply" a function (like a sum or average), and "combine" the results.
* **Practical Examples:** We can use `groupby()` to answer questions like, "What was the survival rate for men versus women?"
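
The split, apply, and combine steps can be sketched on a toy table. This example averages `Age` within each `Pclass` using invented rows; a call of the same shape, with different columns, would answer the survival-rate question.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the real dataset.
df = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 3],
    "Age": [40, 30, 20, 22, 24],
})

# Split by Pclass, apply mean to Age, combine into one Series.
mean_age = df.groupby("Pclass")["Age"].mean()
print(mean_age)

# Several aggregations at once produce a DataFrame with one column each.
summary = df.groupby("Pclass")["Age"].agg(["mean", "count"])
print(summary)
```

The group labels become the index of the result, so a single group's value can be looked up with `mean_age.loc[1]`.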

---

## Assignment: Titanic Data Analysis 🚢

This assignment will test your skills in data loading, cleaning, selection, and analysis. All the questions can be answered using the `train.csv` file.

**Goal:** Use the provided dataset to uncover interesting facts about the Titanic passengers.

### 1. Setup

First, import pandas and load the dataset into a DataFrame.


In [None]:
import pandas as pd
df = pd.read_csv("train.csv")  # assumes train.csv is in the working directory


### 2. Data Inspection

Answer the following questions by inspecting the DataFrame.

* How many rows and columns are in the dataset?
* What are the data types of each column?
* Are there any missing values? If so, which columns are they in and how many are missing in each?

In [None]:
# Your code here

### 3. Basic Data Selection & Filtering

* Find and display the first 10 rows of the dataset.
* Select only the `Name` and `Age` columns and display the first 5 rows.
* Filter the DataFrame to show only the passengers who survived (`Survived` column equals 1). How many rows are in this filtered DataFrame?

In [None]:
# Your code here

### 4. Handling Missing Values

* Calculate the average `Age` of all passengers.
* Fill the missing values in the `Age` column with the average `Age` you just calculated.
* Check to confirm that there are no longer any missing values in the `Age` column.

In [None]:
# Your code here

### 5. Grouping and Aggregation

* Calculate the survival rate (mean of `Survived`) for male vs. female passengers.
* Calculate the average `Fare` paid by passengers in each `Pclass` (passenger class).
* Create a new column called `IsAlone` that is `1` if a passenger has no siblings/spouses (`SibSp`) and no parents/children (`Parch`), and `0` otherwise.

In [None]:
# Your code here

---

### Final Challenge Question

This is the final, multi-step challenge. Use a combination of everything you've learned.

**Question:** Among the passengers who paid a **fare** greater than the median fare, what is the survival rate for each passenger class, broken down by gender?

**Hint:** This will require you to:
1.  Calculate the median fare.
2.  Filter the DataFrame to include only passengers who paid more than this median fare.
3.  Group the filtered data by both `Pclass` and `Sex`.
4.  Calculate the mean of the `Survived` column for each of these groups.

In [None]:
# Your code here