# Pandas Training Notebook with Titanic Data

## Introduction to Pandas


Welcome to the Pandas training notebook. It introduces the basics of the Pandas library for data manipulation and analysis, using the Titanic dataset. Follow each instruction and answer the question that comes with it to practice your Pandas skills.


## 1. Importing Pandas


### Instruction:
Import the pandas library as pd. This is the first step before you can start using pandas.

### Question:
How do you import the pandas library with the alias `pd`?


In [1]:
import pandas as pd 

## 2. Reading Data


### Instruction:
Pandas can read data from many sources, including CSV files, Excel workbooks, and SQL databases. You will start by reading a CSV file.

### Question:
Write the command to read a CSV file named `titanic.csv` into a DataFrame called `titanic_data`.


In [7]:
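# The path is relative to the notebook's working directory; adjust it to wherever titanic.csv is stored.
titanic_data = pd.read_csv('data-analytics-course-main/data/titanic.csv')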
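
The sketch below shows the same `pd.read_csv` call without needing the file on disk. It reads a small in-memory CSV (three rows and a subset of columns copied from the Titanic data; the variable names `csv_text`, `sample`, and `subset` are made up for this illustration). It also shows that empty fields are loaded as missing values and that `usecols` can limit which columns are read.

```python
import io

import pandas as pd

# A tiny CSV held in memory; the rows are a subset of the Titanic data.
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare,Cabin,Embarked
1,0,3,male,22.0,7.25,,S
2,1,1,female,38.0,71.2833,C85,C
3,1,3,female,26.0,7.925,,S
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)

# Empty fields (the missing Cabin values) are read as NaN.
print(sample['Cabin'].isna().sum())

# usecols restricts loading to the listed columns.
subset = pd.read_csv(io.StringIO(csv_text), usecols=['PassengerId', 'Fare'])
print(list(subset.columns))
```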

## 3. Viewing Data


### Instruction:
Once you have loaded data into a DataFrame, it's important to get a sense of what the data looks like.

### Question:
How do you display the first 5 rows of the DataFrame `titanic_data`?


In [9]:
titanic_data.head(5)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


## 4. Data Information


### Instruction:
Before analyzing a DataFrame, get an overview of its structure: the column names, the data type of each column, and how many non-null values each column holds.

### Question:
What command would you use to get a concise summary of the DataFrame `titanic_data`, including the number of non-null entries in each column?


In [10]:
titanic_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
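
`info()` reports structure only. For summary statistics of the numeric columns, `describe()` is the usual companion. Here is a minimal sketch on a five-row sample (the `Age` and `Fare` values of the first five passengers; `sample` and `stats` are illustrative names):

```python
import pandas as pd

# Age and Fare of the first five passengers in the Titanic data.
sample = pd.DataFrame({
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0],
    'Fare': [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# describe() returns count, mean, std, min, quartiles and max per numeric column.
stats = sample.describe()
print(stats.loc[['count', 'mean', 'min', 'max']])
```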


## 5. Selecting Data


### Instruction:
You often need to select specific rows and columns from your DataFrame.

### Question:
How do you select the `Age` column from the DataFrame `titanic_data`?


In [12]:
titanic_data['Age']

0      22.0
1      38.0
2      26.0
3      35.0
4      35.0
       ... 
886    27.0
887    19.0
888     NaN
889    26.0
890    32.0
Name: Age, Length: 891, dtype: float64
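
Selecting one column with a string returns a Series, as in the output above. The sketch below, on a five-row sample of the data (`sample` and the other variable names are illustrative), shows two related forms: a list of column names returns a DataFrame, and `.loc` selects rows and columns together.

```python
import pandas as pd

# First five passengers, restricted to three columns.
sample = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris',
             'Cumings, Mrs. John Bradley (Florence Briggs Th...',
             'Heikkinen, Miss. Laina',
             'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
             'Allen, Mr. William Henry'],
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0],
    'Fare': [7.25, 71.2833, 7.925, 53.1, 8.05],
})

ages = sample['Age']               # single label -> Series
age_fare = sample[['Age', 'Fare']]  # list of labels -> DataFrame

# .loc takes row labels and column labels; label slices include both ends.
first_two = sample.loc[0:1, ['Name', 'Age']]

print(type(ages).__name__, type(age_fare).__name__)
print(first_two)
```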

## 6. Filtering Data


### Instruction:
Filtering allows you to subset the data based on conditions.

### Question:
How do you filter the DataFrame `titanic_data` to include only rows where the `Age` column is greater than 30?


In [13]:
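# The comparison alone yields a boolean mask (shown below), not the filtered rows.
# To keep only the matching rows, index with the mask: titanic_data[titanic_data['Age'] > 30]
titanic_data['Age'] > 30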

0      False
1       True
2      False
3       True
4       True
       ...  
886    False
887    False
888    False
889    False
890     True
Name: Age, Length: 891, dtype: bool
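
The output above is a boolean Series with one entry per row. To get the filtered DataFrame, pass that Series back into the indexing brackets. A minimal sketch on the first five passengers (`sample`, `mask`, `over_30`, and `women_over_30` are illustrative names):

```python
import pandas as pd

# Sex and Age of the first five passengers.
sample = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'female', 'male'],
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0],
})

mask = sample['Age'] > 30   # boolean Series, one value per row
over_30 = sample[mask]      # keeps only the rows where the mask is True
print(over_30)

# Combine conditions with & (and) or | (or); each condition needs parentheses.
women_over_30 = sample[(sample['Age'] > 30) & (sample['Sex'] == 'female')]
print(women_over_30)
```

Rows with a missing `Age` compare as `False` (row 888 in the output above), so the filter drops them.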

## 7. Data Aggregation


### Instruction:
Aggregation methods are used to compute summary statistics.

### Question:
How do you calculate the mean of the `Fare` column in the DataFrame `titanic_data`?


In [16]:
titanic_data['Fare'].mean()

np.float64(32.204207968574636)
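
To compute several statistics at once, `Series.agg` accepts a list of function names. The sketch below uses the fares of the first five passengers only, so its numbers differ from the full-dataset mean above (`fares` and `summary` are illustrative names). It also shows that wrapping the result in `float()` gives a plain Python number instead of the `np.float64(...)` representation.

```python
import pandas as pd

# Fares of the first five passengers.
fares = pd.Series([7.25, 71.2833, 7.925, 53.1, 8.05], name='Fare')

# agg with a list returns a Series indexed by the statistic names.
summary = fares.agg(['mean', 'median', 'max'])
print(summary)

# float() converts the NumPy scalar to a built-in float.
mean_fare = float(fares.mean())
print(type(mean_fare).__name__)
```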

## 8. Handling Missing Data


### Instruction:
Handling missing data is a common task in data analysis.

### Question:
How do you drop all rows from the DataFrame `titanic_data` that contain any missing values?


In [18]:
# dropna() returns a new DataFrame; titanic_data itself is left unchanged.
titanic_data.dropna()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7000,G6,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.5500,C103,S
...,...,...,...,...,...,...,...,...,...,...,...,...
871,872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47.0,1,1,11751,52.5542,D35,S
872,873,0,1,"Carlsson, Mr. Frans Olof",male,33.0,0,0,695,5.0000,B51 B53 B55,S
879,880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56.0,0,1,11767,83.1583,C50,C
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
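
Dropping every row with any missing value discards most of this dataset, because `Cabin` is rarely filled in. Two common alternatives are to restrict the check to certain columns with `subset`, or to fill the gaps with `fillna`. The sketch below uses five passengers from the data (IDs 1 to 4 and 889, whose age is missing). Filling with the median is one possible choice for illustration, not a recommendation from this notebook.

```python
import pandas as pd

# Four early passengers plus passenger 889, who has no recorded age.
sample = pd.DataFrame({
    'PassengerId': [1, 2, 3, 4, 889],
    'Age': [22.0, 38.0, 26.0, 35.0, float('nan')],
    'Cabin': [None, 'C85', None, 'C123', None],
})

complete_rows = sample.dropna()             # drop rows with any missing value
has_age = sample.dropna(subset=['Age'])     # only require Age to be present

# Fill missing ages with the median of the known ages.
filled_age = sample['Age'].fillna(sample['Age'].median())

print(len(sample), len(complete_rows), len(has_age))
print(filled_age)
```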


## 9. Grouping Data


### Instruction:
Grouping data is useful for performing operations on subsets of your data.

### Question:
How do you group the DataFrame `titanic_data` by the `Pclass` column and calculate the average fare for each class?


In [28]:
titanic_data.groupby('Pclass')['Fare'].mean()

Pclass
1    84.154687
2    20.662183
3    13.675550
Name: Fare, dtype: float64
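
A group can be summarized with more than one statistic by using named aggregation in `agg`. The sketch below groups the first five passengers by class, so its averages differ from the full-dataset output above (`sample`, `by_class`, `mean_fare`, and `passengers` are illustrative names):

```python
import pandas as pd

# Class and fare of the first five passengers.
sample = pd.DataFrame({
    'Pclass': [3, 1, 3, 1, 3],
    'Fare': [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Each keyword becomes an output column: name=(source column, function).
by_class = sample.groupby('Pclass').agg(
    mean_fare=('Fare', 'mean'),
    passengers=('Fare', 'size'),
)
print(by_class)
```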

## 10. Merging DataFrames


### Instruction:
Merging is used to combine multiple DataFrames into a single one.

### Question:
Write the command to merge two DataFrames `df1` and `df2` on a common column named `PassengerId`.


In [29]:
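# General form: pd.merge(df1, df2, on='PassengerId').
# Here titanic_data is merged with itself as a stand-in for df1 and df2,
# which is why the shared columns get the _x and _y suffixes in the output.
pd.merge(titanic_data, titanic_data, on='PassengerId')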

Unnamed: 0,PassengerId,Survived_x,Pclass_x,Name_x,Sex_x,Age_x,SibSp_x,Parch_x,Ticket_x,Fare_x,...,Pclass_y,Name_y,Sex_y,Age_y,SibSp_y,Parch_y,Ticket_y,Fare_y,Cabin_y,Embarked_y
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,...,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,...,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,...,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,...,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,...,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,...,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,...,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,...,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,...,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C
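
A merge is easier to follow with two different DataFrames. In the sketch below, `df1` and `df2` are small hypothetical tables built from values in the Titanic data: one holds names, the other fares, and they only partly overlap on `PassengerId`. The default merge keeps only the matching keys, while `how='left'` keeps every row of `df1`.

```python
import pandas as pd

# Two small tables that share the PassengerId column.
df1 = pd.DataFrame({
    'PassengerId': [1, 2, 3],
    'Name': ['Braund, Mr. Owen Harris',
             'Cumings, Mrs. John Bradley (Florence Briggs Th...',
             'Heikkinen, Miss. Laina'],
})
df2 = pd.DataFrame({
    'PassengerId': [2, 3, 4],
    'Fare': [71.2833, 7.925, 53.1],
})

# Default is an inner join: only IDs present in both tables survive.
inner = pd.merge(df1, df2, on='PassengerId')

# A left join keeps all rows of df1; unmatched fares become NaN.
left = pd.merge(df1, df2, on='PassengerId', how='left')

print(inner)
print(left)
```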


In [30]:
titanic_data.shape

(891, 12)