
# 🧩 Titanic Dataset Exploration with pandas

🎯 **Goal:**  
In this notebook, you'll explore the Titanic dataset using **pandas**.  
You'll practice the most common pandas functions for data inspection, selection, filtering, cleaning, and analysis.

For each function in the list below:
1. Explain what it does (in your own words, in a Markdown cell).
2. Give at least **two examples** using the Titanic dataset.
3. Add a short comment about the output or why it's useful.


In [2]:
import pandas as pd

# Load Titanic dataset
# (Make sure titanic_dataset.csv is in the same folder as this notebook)
df = pd.read_csv("titanic_dataset.csv")

# Show first few rows
df.head()


Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S



## 🧠 Step 1: Inspecting the Data

Functions to explore:
- df.head()
- df.tail()
- df.info()
- df.describe()
- df.shape
- df.columns


## df.head()

### 1. Description:  
By default, this function returns the first 5 rows of the DataFrame.  
If an int `n` is passed as the argument, it returns the first `n` rows.  
If a negative int is passed, it returns all rows except the last `|n|`.  

<br>

### 2. Examples:

In [38]:
df.head(3)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Fare_per_Age
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,0.329545
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,1.875876
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,0.304808


In [41]:
df.head(-91)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Fare_per_Age
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,0.329545
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,1.875876
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,0.304808
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,1.517143
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,0.230000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,796,0,2,"Otter, Mr. Richard",male,39.0,0,0,28213,13.0000,,S,0.333333
796,797,1,1,"Leader, Dr. Alice (Farnham)",female,49.0,0,0,17465,25.9292,D17,S,0.529167
797,798,1,3,"Osman, Mrs. Mara",female,31.0,0,0,349244,8.6833,,S,0.280106
798,799,0,3,"Ibrahim Shawah, Mr. Yousseff",male,30.0,0,0,2685,7.2292,,C,0.240973


### 3. Usefulness

Gives a quick look at the structure of a DataFrame.  

If the data are sorted, it lets us create new DataFrames containing only the first n records.  

For example, if the data were sorted by age:

<br>

**The 50 youngest passengers**

```python
df_50_pasajeros_mas_jovenes = df.head(50)
```
<br>

**All passengers except the 50 oldest**

```python
df_sin_50_mas_ancianos = df.head(-50)
```
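
The snippets above assume the rows are already sorted by age. As a minimal sketch, the sorting step can be made explicit with `sort_values`; the small `toy` DataFrame below is made up for illustration and is not the Titanic data.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic DataFrame
toy = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cleo", "Dov", "Eli"],
    "Age": [35.0, 2.0, 61.0, 19.0, 48.0],
})

# Sort by age first, then take rows from the top
by_age = toy.sort_values("Age")
two_youngest = by_age.head(2)         # the 2 youngest
without_two_oldest = by_age.head(-2)  # everyone except the 2 oldest

print(two_youngest)
print(without_two_oldest)
```

Without the `sort_values` call, `head` simply follows the order in which the rows were loaded.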

---


## df.tail()

### 1. Description:  
By default, this function returns the last 5 rows of the DataFrame.  
If an int `n` is passed as the argument, it returns the last `n` rows.  
If a negative int is passed, it returns all rows except the first `|n|`.  

<br>

### 2. Examples:

In [35]:
df.tail(3)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Fare_per_Age
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,28.0,1,2,W./C. 6607,23.45,,S,0.8375
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C,1.153846
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q,0.242188


In [44]:
df.tail(-100)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Fare_per_Age
100,101,0,3,"Petranec, Miss. Matilda",female,28.0,0,0,349245,7.8958,,S,0.281993
101,102,0,3,"Petroff, Mr. Pastcho (""Pentcho"")",male,28.0,0,0,349215,7.8958,,S,0.281993
102,103,0,1,"White, Mr. Richard Frasar",male,21.0,0,1,35281,77.2875,D26,S,3.680357
103,104,0,3,"Johansson, Mr. Gustaf Joel",male,33.0,0,0,7540,8.6542,,S,0.262248
104,105,0,3,"Gustafsson, Mr. Anders Vilhelm",male,37.0,2,0,3101276,7.9250,,S,0.214189
...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S,0.481481
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S,1.578947
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,28.0,1,2,W./C. 6607,23.4500,,S,0.837500
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C,1.153846


### 3. Usefulness

Gives a quick look at the structure of a DataFrame.  

If the data are sorted, it lets us create new DataFrames containing only the last n records.  

For example, if the data were sorted by age:

<br>

**The 50 oldest passengers**

```python
df_50_pasajeros_mas_ancianos = df.tail(50)
```
<br>

**All passengers except the 50 youngest**

```python
df_sin_50_mas_jovenes = df.tail(-50)
```
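
As a rough sketch of how the argument's sign behaves, `tail(n)` selects the same rows as the positional slice `iloc[-n:]`, and `tail(-n)` the same rows as `iloc[n:]`. The `toy` DataFrame is invented for the example.

```python
import pandas as pd

# Hypothetical toy data, already sorted by age
toy = pd.DataFrame({"Age": [2.0, 19.0, 35.0, 48.0, 61.0]})

last_two = toy.tail(2)         # rows at positions 3 and 4
skip_first_two = toy.tail(-2)  # everything except the first 2 rows

print(last_two.equals(toy.iloc[-2:]))
print(skip_first_two.equals(toy.iloc[2:]))
```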

---

## df.info()

### 1. Description:  
Prints a concise summary of a DataFrame, including the index, columns, dtypes, non-null counts, and memory usage.

<br>

### 2. Examples:

In [45]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   PassengerId   891 non-null    int64  
 1   Survived      891 non-null    int64  
 2   Pclass        891 non-null    int64  
 3   Name          891 non-null    object 
 4   Sex           891 non-null    object 
 5   Age           891 non-null    float64
 6   SibSp         891 non-null    int64  
 7   Parch         891 non-null    int64  
 8   Ticket        891 non-null    object 
 9   Fare          891 non-null    float64
 10  Cabin         204 non-null    object 
 11  Embarked      889 non-null    object 
 12  Fare_per_Age  891 non-null    float64
dtypes: float64(3), int64(5), object(5)
memory usage: 90.6+ KB


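### 3. Usefulness

Gives a quick view of the structure of a DataFrame. In the output above, for instance, `Cabin` has only 204 non-null values out of 891 entries, so most cabin values are missing.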
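
The `+` in `memory usage: 90.6+ KB` marks a lower bound, because object columns are not measured in depth by default. A hedged sketch of two optional parameters of `info`, `memory_usage="deep"` and `buf`, on a made-up DataFrame:

```python
import io
import pandas as pd

# Hypothetical toy data with some missing cabins
toy = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cleo"],
    "Cabin": ["C85", None, None],
    "Age": [35.0, 2.0, 61.0],
})

# Write the summary into a text buffer instead of printing it directly
buffer = io.StringIO()
toy.info(buf=buffer, memory_usage="deep")
summary = buffer.getvalue()
print(summary)
```

Capturing the text this way is handy when the summary needs to go into a log or a report rather than the cell output.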

---

## df.describe()

### 1. Description:  
Generates descriptive statistics for the DataFrame.

<br>

### 2. Examples:

In [55]:
df.describe()

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare,Fare_per_Age
count,891.0,891.0,891.0,891.0,891.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.361582,0.523008,0.381594,32.204208,2.073904
std,257.353842,0.486592,0.836071,13.019697,1.102743,0.806057,49.693429,7.309062
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0,0.0
25%,223.5,0.0,2.0,22.0,0.0,0.0,7.9104,0.299519
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542,0.535341
75%,668.5,1.0,3.0,35.0,1.0,0.0,31.0,1.417335
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292,164.728261


In [56]:
df.describe(include='object')

Unnamed: 0,Name,Sex,Ticket,Cabin,Embarked
count,891,891,891,204,889
unique,891,2,681,147,3
top,"Braund, Mr. Owen Harris",male,347082,B96 B98,S
freq,1,577,7,4,644


### 3. Usefulness

By default, `.describe()` gives a quick statistical summary of the numeric columns of a DataFrame.  
It helps identify extreme values, means, medians, and percentiles.  
It can also be applied to non-numeric columns with:
```python
df.describe(include='object')
```

This summary includes:

- `count` (number of non-null values)
- `unique` (number of distinct values)
- `top` (the most frequent value)
- `freq` (how many times the most frequent value occurs)
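
A small sketch of both kinds of summary on made-up data. The `percentiles` parameter replaces the default quartiles with the cut points you ask for; the values in `toy` are illustrative only.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
})

# Ask for the 10th, 50th and 90th percentiles instead of the default quartiles
fare_stats = toy["Fare"].describe(percentiles=[0.1, 0.5, 0.9])

# Summarise the text column: count, unique, top, freq
sex_stats = toy["Sex"].describe()

print(fare_stats)
print(sex_stats)
```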

---

## df.shape

### 1. Description:  
Returns a tuple with the dimensions of the DataFrame, as `(rows, columns)`.

<br>

### 2. Examples:

In [59]:
df.shape

(891, 13)

### 3. Usefulness

For the Titanic passenger dataset, it shows that the table has 891 rows and 13 columns.
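
Since `shape` is a plain tuple, it can be unpacked directly. A minimal sketch on a made-up DataFrame:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"Age": [22.0, 38.0, 26.0], "Fare": [7.25, 71.28, 7.92]})

n_rows, n_cols = toy.shape  # shape is (rows, columns)
print(f"{n_rows} rows x {n_cols} columns")
```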

---

## df.columns

### 1. Description:  
Returns the labels of all the columns of the DataFrame as an `Index`. It is an attribute, not a method, so it is accessed without parentheses.

<br>

### 2. Examples:

In [61]:
df.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked', 'Fare_per_Age'],
      dtype='object')

### 3. Usefulness

It can be useful to get the list of column labels and iterate over it in a loop.
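
One possible loop of that kind, sketched on a made-up DataFrame: it walks over the column labels and counts the missing values in each column.

```python
import pandas as pd

# Hypothetical toy data with a few gaps
toy = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cleo"],
    "Age": [35.0, None, 61.0],
    "Cabin": [None, None, "C85"],
})

# Count the missing values in each column by looping over the labels
missing_per_column = {}
for column in toy.columns:
    missing_per_column[column] = int(toy[column].isna().sum())

print(missing_per_column)
```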

---


## 🔍 Step 2: Selecting Data

Functions to explore:
- df["column"]
- df[["col1", "col2"]]
- df.loc[]
- df.iloc[]


## df["column"]

### 1. Description:  
Returns the values of the specified column as a **Series**.  


### 2. Examples:


In [65]:
df["Age"].head()



0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64

### 3. Usefulness

Lets us view only the column or columns we want to work with.

---

## df[["col1", "col2"]]

### 1. Description:  
Returns the values of the specified columns as a **DataFrame**. The column labels are passed as a list, hence the double brackets.  


### 2. Examples:


In [68]:
df[["Sex", "Age", "Survived"]].head()

Unnamed: 0,Sex,Age,Survived
0,male,22.0,0
1,female,38.0,1
2,female,26.0,1
3,female,35.0,1
4,male,35.0,0


### 3. Usefulness

Creates a sub-DataFrame with several columns for analysis or visualization.
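
A short sketch of the difference between single and double brackets, using a made-up DataFrame. Single brackets with one label give a Series; a list of labels gives a DataFrame, even when the list has only one element.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})

ages_series = toy["Age"]   # single brackets -> Series
ages_frame = toy[["Age"]]  # list of labels -> DataFrame with one column

print(type(ages_series).__name__, ages_series.shape)
print(type(ages_frame).__name__, ages_frame.shape)
```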

---

## df.loc[]

### 1. Description:    

Accesses a group of rows and columns by their labels or by a boolean array.


### 2. Examples:

In [None]:
df.loc[(df['Age'] < 18) & (df['Survived'] == 0), ['Name', 'Age']]

Unnamed: 0,Name,Age
7,"Palsson, Master. Gosta Leonard",2.0
14,"Vestrom, Miss. Hulda Amanda Adolfina",14.0
16,"Rice, Master. Eugene",2.0
24,"Palsson, Miss. Torborg Danira",8.0
50,"Panula, Master. Juha Niilo",7.0
59,"Goodwin, Master. William Frederick",11.0
63,"Skoog, Master. Harald",4.0
71,"Goodwin, Miss. Lillian Amy",16.0
86,"Ford, Mr. William Neal",16.0
111,"Zabour, Miss. Hileni",14.5


In [12]:
df.loc[df['Sex']=='female','Name']

1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
8      Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
9                    Nasser, Mrs. Nicholas (Adele Achem)
                             ...                        
880         Shelley, Mrs. William (Imanita Parrish Hall)
882                         Dahlberg, Miss. Gerda Ulrika
885                 Rice, Mrs. William (Margaret Norton)
887                         Graham, Miss. Margaret Edith
888             Johnston, Miss. Catherine Helen "Carrie"
Name: Name, Length: 314, dtype: object

### 3. Usefulness

In the first example, I build a DataFrame of the underage passengers who died: the row selector holds the conditions and the column selector asks for name and age. Because several columns are selected, the result is a DataFrame.

In the second example, I select the names of all the women on board. Because only one column is selected, the result is a Series.

---

## df.iloc[]

### 1. Description:  

Accesses a group of rows and columns by integer position (starting at 0) rather than by label.

### 2. Examples:
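
A minimal sketch of positional selection on a made-up DataFrame (not the Titanic data). The index is deliberately set to 10..13 so that positions and labels differ. Note that `iloc` slices exclude the end position, whereas `loc` label slices include the end label.

```python
import pandas as pd

# Hypothetical toy data with labels that differ from positions
toy = pd.DataFrame(
    {"Name": ["Ana", "Ben", "Cleo", "Dov"], "Age": [35.0, 2.0, 61.0, 19.0]},
    index=[10, 11, 12, 13],
)

first_row = toy.iloc[0]            # first row as a Series
top_left = toy.iloc[0:2, 0:1]      # first 2 rows, first column (positions)
by_label = toy.loc[10:11, "Name"]  # labels 10 through 11, end included

print(first_row)
print(top_left)
print(by_label)
```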


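### 3. Usefulness

Useful when rows or columns are needed by position rather than by label, for example the first rows and first columns of a table regardless of how the index is labelled.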


---


## 🔎 Step 3: Filtering Rows

Functions to explore:
- df[df["Age"] > 30]
- df.query("Sex == 'female' and Survived == 1")


In [24]:

# Example: Filtering data
# Note: a notebook cell displays only its last expression, so only the query result appears below
df[df["Age"] > 50].head()
df.query("Sex == 'female' and Survived == 1").head()


Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C
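
The two filtering styles can express the same condition. A sketch on a made-up DataFrame, showing that a boolean mask combined with `&` and a `query` string select the same rows:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 54.0],
    "Survived": [0, 1, 1, 0],
})

# Boolean mask: each condition needs its own parentheses, combined with &
mask_result = toy[(toy["Sex"] == "female") & (toy["Survived"] == 1)]

# The same filter written as a query string
query_result = toy.query("Sex == 'female' and Survived == 1")

print(mask_result.equals(query_result))
```

The parentheses in the mask version matter because `&` binds more tightly than `==` in Python.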



## 🧹 Step 4: Handling Missing Data

Functions to explore:
- df.isna()
- df.isna().sum()
- df.dropna()
- df.fillna()


In [25]:

# Example: Check missing values
# (not displayed: a notebook cell shows only its last expression)
df.isna().sum()

# Fill missing ages with median
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Age"].head()


0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64
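
A compact sketch of the functions listed for this step, on a made-up DataFrame with gaps. The `subset` argument of `dropna` restricts the check to the named columns.

```python
import pandas as pd

# Hypothetical toy data with missing values
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0, None],
    "Cabin": ["C85", None, None, "C123"],
})

# Missing values per column
missing_counts = toy.isna().sum()

# Drop only the rows that lack a cabin
only_with_cabin = toy.dropna(subset=["Cabin"])

# Fill missing ages with the median of the known ages (22 and 26 -> 24)
age_filled = toy["Age"].fillna(toy["Age"].median())

print(missing_counts)
print(only_with_cabin)
print(age_filled)
```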


## 📊 Step 5: Grouping and Aggregating

Functions to explore:
- df.groupby("Sex")["Survived"].mean()
- df["Fare"].mean()
- df["Age"].median()


In [26]:

# Example: Aggregation
# Only the last expression is displayed, so the output below is the mean fare per class
df.groupby("Sex")["Survived"].mean()
df.groupby("Pclass")["Fare"].mean()


Pclass
1    84.154687
2    20.662183
3    13.675550
Name: Fare, dtype: float64
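
A sketch of grouped aggregation on made-up data. Because `Survived` holds 0 and 1, its mean per group is the share of survivors; `agg` with a list of names computes several statistics in one call.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Survived": [0, 1, 1, 1],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# Mean of a 0/1 column is the share of ones, i.e. the survival rate
survival_rate = toy.groupby("Sex")["Survived"].mean()

# Several statistics at once for one column
fare_summary = toy.groupby("Sex")["Fare"].agg(["mean", "max"])

print(survival_rate)
print(fare_summary)
```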


## 📈 Step 6: Sorting and Counting

Functions to explore:
- df.sort_values("Age")
- df["Sex"].unique()
- df["Pclass"].value_counts()


In [27]:

# Example: Sorting and counting
# Only the last expression is displayed, so the output below is the count per class
df.sort_values("Age").head()
df["Pclass"].value_counts()


Pclass
3    491
1    216
2    184
Name: count, dtype: int64
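
A sketch of the three functions in this step with a couple of optional arguments, on made-up data: `ascending=False` reverses the sort order and `normalize=True` turns counts into fractions.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Pclass": [3, 1, 3, 2, 3],
    "Age": [22.0, 38.0, 26.0, 35.0, 4.0],
})

oldest_first = toy.sort_values("Age", ascending=False)

# Distinct values, in order of first appearance
classes = toy["Pclass"].unique()

# Fraction of rows per class instead of raw counts
class_share = toy["Pclass"].value_counts(normalize=True)

print(oldest_first.head(2))
print(classes)
print(class_share)
```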


## ⚙️ Step 7: Creating or Modifying Columns

Functions to explore:
- df.assign()
- df.apply()
- df["new_col"] = ...
- pd.concat()
- pd.merge()


In [28]:

# Example: Create new column
df["Fare_per_Age"] = df["Fare"] / df["Age"]
df[["Age", "Fare", "Fare_per_Age"]].head()


Unnamed: 0,Age,Fare,Fare_per_Age
0,22.0,7.25,0.329545
1,38.0,71.2833,1.875876
2,26.0,7.925,0.304808
3,35.0,53.1,1.517143
4,35.0,8.05,0.23
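
The cell above covers direct assignment only. As a hedged sketch of the other functions listed for this step, on made-up data (the `extra` and `ports` tables are hypothetical and not part of the Titanic file):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Age": [22.0, 38.0, 4.0],
    "Fare": [7.25, 71.28, 16.70],
})

# assign returns a new DataFrame and leaves `toy` unchanged
with_ratio = toy.assign(Fare_per_Age=toy["Fare"] / toy["Age"])

# apply runs a function on every value of a Series
is_child = toy["Age"].apply(lambda age: age < 18)

# concat stacks DataFrames that share the same columns
extra = pd.DataFrame({"PassengerId": [4], "Age": [30.0], "Fare": [8.05]})
stacked = pd.concat([toy, extra], ignore_index=True)

# merge joins two DataFrames on a key column
ports = pd.DataFrame({"PassengerId": [1, 2, 3], "Embarked": ["S", "C", "S"]})
merged = pd.merge(toy, ports, on="PassengerId", how="left")

print(with_ratio, is_child, stacked, merged, sep="\n\n")
```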



## 💾 Step 8: Exporting Data

Function to explore:
- df.to_csv("output.csv", index=False)


In [29]:

# Example: Save cleaned data
df.to_csv("titanic_cleaned.csv", index=False)
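
A small sketch of what `index=False` changes, using an in-memory string instead of a file so nothing is written to disk. The `toy` DataFrame is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"Age": [22.0, 38.0], "Fare": [7.25, 71.28]})

# With index=False only the data columns are written
without_index = toy.to_csv(index=False)

# With the default index=True the row labels become an unnamed first column
with_index = toy.to_csv()

# Reading that text back turns the unnamed column into "Unnamed: 0"
reloaded = pd.read_csv(io.StringIO(with_index))

print(without_index)
print(list(reloaded.columns))
```

When no path is given, `to_csv` returns the CSV text as a string, which is what makes this comparison possible without temporary files.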



## 🧩 Step 9: Summary

Reflect on what you learned:
- Which functions were most useful?
- What insights did you gain from the Titanic dataset?
