
# 🧩 Titanic Dataset Exploration with pandas

🎯 **Goal:**  
In this notebook, you'll explore the Titanic dataset using **pandas**.  
You'll practice the most common pandas functions for data inspection, selection, filtering, cleaning, and analysis.

For each function in the list below:
1. Explain what it does (in your own words, in a Markdown cell).
2. Give at least **two examples** using the Titanic dataset.
3. Add a short comment about the output or why it's useful.


In [None]:

import pandas as pd

# Load Titanic dataset
# (Make sure titanic_dataset.csv is in the same folder as this notebook)
df = pd.read_csv("titanic_dataset.csv")

# Show first few rows
df.head()



## 🧠 Step 1: Inspecting the Data

Functions to explore:
- df.head()
- df.tail()
- df.info()
- df.describe()
- df.shape
- df.columns


`df.head()` shows the first rows of the DataFrame, 5 by default.

In [None]:

# Example 1: first 5 rows (printed, because a cell only displays its last expression)
print(df.head())

# Example 2: first 10 rows
df.head(10)


`df.tail()` shows the last rows of the DataFrame, 5 by default.

In [None]:
# Example 1: df.tail()
print(df.tail())  # Last 5 rows

# Example 2
print(df.tail(2))  # Last 2 rows

`df.info()` prints a summary of the DataFrame, including:
- Number of rows and columns
- Column names
- Data types
- Non-null counts per column (which reveal missing values)

In [None]:
# Example 1: summary of the Titanic DataFrame
df.info()

# Example 2: a small DataFrame with one missing value in "Edad"
df2 = pd.DataFrame({
    "Nombre": ["Ana", "Luis", "María"],
    "Edad": [23, None, 19],
    "Ciudad": ["Madrid", "Sevilla", "Barcelona"]
})
df2.info()

`df.describe()` returns summary statistics for the numeric columns, including:
- Mean
- Standard deviation
- Minimum
- Maximum
- Percentiles (25%, 50%, 75%)

In [None]:
# Example 1: statistics for the numeric Titanic columns

print(df.describe())

# Example 2: statistics for a small hand-built DataFrame

df2 = pd.DataFrame({
    "Altura": [1.75, 1.60, 1.82, 1.90],
    "Peso": [70, 55, 80, 90]
})
df2.describe()


`df.shape` is an attribute (no parentheses) that returns a tuple with the DataFrame's dimensions: (rows, columns).

In [13]:
# Example 1: df.shape is an attribute, so it takes no parentheses
print(df.shape)

# Example 2: unpack the tuple into rows and columns
filas, columnas = df.shape
print(f"titanic_dataset has {filas} rows and {columnas} columns")

(891, 12)
titanic_dataset has 891 rows and 12 columns


`df.columns` is an attribute (no parentheses) that returns an `Index` object holding the column names.

In [None]:
# Example 1: df.columns is an attribute, so it takes no parentheses
print(df.columns)

# Example 2: iterate over the column names
for columna in df.columns:
    print("Column:", columna)
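
Since `df.columns` is an `Index` rather than a plain list, it can help to see the conversion explicitly. The sketch below uses a small hand-built frame (`toy` is a hypothetical stand-in with illustrative values, not the Titanic file) so it runs without any CSV:

```python
import pandas as pd

# Toy frame with a few Titanic-style columns (illustrative values only)
toy = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, 26.0],
})

# The column labels live in a pandas Index object
print(type(toy.columns).__name__)

# Convert to a plain Python list when a list is really needed
column_list = toy.columns.tolist()
print(column_list)

# shape is an attribute; its first entry matches len(toy)
n_rows, n_cols = toy.shape
print(n_rows == len(toy), n_cols == len(column_list))
```

The same calls should work unchanged on the Titanic `df`.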


## 🔍 Step 2: Selecting Data
Functions to explore:
- df["column"]
- df[["col1", "col2"]]
- df.loc[]
- df.iloc[]


`df["column"]` selects a single column and returns it as a Series that keeps the DataFrame's index.

In [None]:

# Example 1: the "Age" column as a Series
print(df["Age"].head())

# Example 2: the "Sex" column
df["Sex"].head()


0      male
1    female
2    female
3    female
4      male
Name: Sex, dtype: object

`df[["col1", "col2"]]` selects several columns at once and returns a DataFrame.

In [None]:
# Example 1: three columns at once
print(df[["Sex", "Age", "Survived"]].head())

# Example 2: two columns
df[["Pclass", "Survived"]].head()

| | Pclass | Survived |
|---|---|---|
| 0 | 3 | 0 |
| 1 | 1 | 1 |
| 2 | 3 | 1 |
| 3 | 1 | 1 |
| 4 | 3 | 0 |


`df.loc[]` selects rows and columns by label.

In [None]:
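# Example 1: a single value, row label 0 and column "Name"
print(df.loc[0, "Name"])

# Example 2: rows labeled 0 through 4 (end label included) and three columns
df.loc[0:4, ["Name", "Age", "Sex"]]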

| | Name | Age | Sex |
|---|---|---|---|
| 0 | Braund, Mr. Owen Harris | 22.0 | male |
| 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 38.0 | female |
| 2 | Heikkinen, Miss. Laina | 26.0 | female |
| 3 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35.0 | female |
| 4 | Allen, Mr. William Henry | 35.0 | male |


`df.iloc[]` selects rows and columns by integer position.

In [None]:
# Example 1: the first row, returned as a Series
print(df.iloc[0])

# Example 2: rows 0-2 and columns 1-3 (end positions are excluded)
df.iloc[0:3, 1:4]

| | Survived | Pclass | Name |
|---|---|---|---|
| 0 | 0 | 3 | Braund, Mr. Owen Harris |
| 1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... |
| 2 | 1 | 3 | Heikkinen, Miss. Laina |
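
With the default 0, 1, 2, ... index, `loc` and `iloc` can look interchangeable. The sketch below gives a small frame non-default labels (the index values 10, 20, 30 are a hypothetical choice for illustration) to show where the two differ: `loc` slices include the end label, while `iloc` slices exclude the end position.

```python
import pandas as pd

# Three passengers with non-default row labels (labels are made up)
toy = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Heikkinen, Miss. Laina",
            "Allen, Mr. William Henry",
        ],
        "Age": [22.0, 26.0, 35.0],
        "Sex": ["male", "female", "male"],
    },
    index=[10, 20, 30],
)

# Label-based: rows labeled 10 through 20, end label included
by_label = toy.loc[10:20, ["Name", "Age"]]

# Position-based: rows at positions 0 and 1, end position excluded
by_position = toy.iloc[0:2, 0:2]

# Both selections pick the same two rows and two columns
print(by_label.equals(by_position))

# Position 0 exists, but there is no row labeled 0 in this frame
print(toy.iloc[0]["Name"])
print(0 in toy.index)
```

On this frame `toy.loc[0]` would raise a `KeyError`, which is a quick way to tell whether code relies on labels or on positions.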



## 🔎 Step 3: Filtering Rows

Functions to explore:
- df[df["Age"] > 30]
- df.query("Sex == 'female' and Survived == 1")


In [None]:

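# Example 1: boolean mask, passengers older than 50
# (printed, because a cell only displays its last expression)
print(df[df["Age"] > 50].head())

# Example 2: query string, women who survived
df.query("Sex == 'female' and Survived == 1").head()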
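
The two filtering styles above can express the same condition. A minimal sketch, assuming a five-row toy frame (`toy` is hypothetical; its values mirror the first five passengers shown in the outputs earlier), compares a combined boolean mask with the equivalent query string:

```python
import pandas as pd

# Toy data with Titanic-style columns
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Survived": [0, 1, 1, 1, 0],
})

# Boolean mask: combine conditions with & or |, each wrapped in parentheses
mask = (toy["Sex"] == "female") & (toy["Age"] > 30)
with_mask = toy[mask]

# The same filter written as a query string, using the word "and"
with_query = toy.query("Sex == 'female' and Age > 30")

print(with_mask)
print(with_mask.equals(with_query))
```

Masks are handy when a condition is built step by step in code; `query` tends to read better for short, fixed conditions.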



## 🧹 Step 4: Handling Missing Data

Functions to explore:
- df.isna()
- df.isna().sum()
- df.dropna()
- df.fillna()


In [None]:

# Example: count missing values per column
# (printed, because a cell only displays its last expression)
print(df.isna().sum())

# Fill missing ages with median
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Age"].head()
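
`df.dropna()` and `df.fillna()` are listed above, and the trade-off between them is easier to see side by side. The sketch below is one possible comparison on a toy frame (`toy` and its values are illustrative assumptions, not rows read from the CSV):

```python
import pandas as pd

# Toy data with gaps in both columns
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0],
    "Cabin": [None, "C85", None, "C123"],
})

# Count missing values per column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# dropna with subset removes only rows where Age is missing
dropped = toy.dropna(subset=["Age"])

# fillna with a dict fills Age with its median and leaves Cabin alone
filled = toy.fillna({"Age": toy["Age"].median()})

print(len(dropped))
print(filled["Age"].tolist())
```

Both calls return new DataFrames, so `toy` itself still contains its missing values afterwards.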



## 📊 Step 5: Grouping and Aggregating

Functions to explore:
- df.groupby("Sex")["Survived"].mean()
- df["Fare"].mean()
- df["Age"].median()


In [None]:

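# Example 1: survival rate by sex
# (printed, because a cell only displays its last expression)
print(df.groupby("Sex")["Survived"].mean())

# Example 2: average fare by passenger class
df.groupby("Pclass")["Fare"].mean()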
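
A group-by can return several statistics at once through `.agg()`. This is a minimal sketch on a hypothetical five-row frame (values chosen for illustration) rather than the full dataset:

```python
import pandas as pd

# Toy data with Titanic-style columns
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Pclass": [3, 1, 3, 1, 3],
    "Survived": [0, 1, 1, 1, 0],
})

# Several aggregations per group in a single call
summary = toy.groupby("Sex")["Survived"].agg(["mean", "sum", "count"])
print(summary)

# Because Survived is 0 or 1, its mean is the survival rate of each group
by_class = toy.groupby("Pclass")["Survived"].mean()
print(by_class)
```

Here `mean` gives the survival rate, `sum` the number of survivors, and `count` the group size.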



## 📈 Step 6: Sorting and Counting

Functions to explore:
- df.sort_values("Age")
- df["Sex"].unique()
- df["Pclass"].value_counts()


In [None]:

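# Example 1: youngest passengers first
# (printed, because a cell only displays its last expression)
print(df.sort_values("Age").head())

# Example 2: number of passengers in each class
df["Pclass"].value_counts()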
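
The three functions listed for this step accept a few options worth trying. The sketch below assumes a small toy frame (hypothetical, with illustrative values) and shows descending sort order, `unique()`, and `value_counts(normalize=True)`:

```python
import pandas as pd

# Toy data with Titanic-style columns
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Sex": ["male", "female", "female", "female", "male"],
    "Pclass": [3, 1, 3, 1, 3],
})

# ascending=False puts the oldest passengers first
oldest_first = toy.sort_values("Age", ascending=False)
print(oldest_first.head(2))

# unique returns the distinct values in order of first appearance
print(toy["Sex"].unique())

# normalize=True returns proportions instead of raw counts
class_share = toy["Pclass"].value_counts(normalize=True)
print(class_share)
```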



## ⚙️ Step 7: Creating or Modifying Columns

Functions to explore:
- df.assign()
- df.apply()
- df["new_col"] = ...
- pd.concat()
- pd.merge()


In [None]:

# Example: Create new column
df["Fare_per_Age"] = df["Fare"] / df["Age"]
df[["Age", "Fare", "Fare_per_Age"]].head()
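
The cell above covers only direct column assignment. As a rough sketch of the other functions in the list, the example below uses a toy frame plus a made-up lookup table (`class_names` and the columns `is_adult` and `age_group` are hypothetical names introduced here for illustration):

```python
import pandas as pd

# Toy data with Titanic-style columns (illustrative values only)
toy = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.28, 7.93],
})

# assign returns a new DataFrame and leaves toy untouched
with_flag = toy.assign(is_adult=toy["Age"] >= 18)

# apply runs a function on every value of a Series
with_flag["age_group"] = with_flag["Age"].apply(
    lambda age: "under 30" if age < 30 else "30+"
)

# merge joins two DataFrames on a shared key column, like a SQL join
class_names = pd.DataFrame({
    "Pclass": [1, 2, 3],
    "class_name": ["First", "Second", "Third"],
})
merged = with_flag.merge(class_names, on="Pclass", how="left")

# concat stacks DataFrames; ignore_index=True renumbers the rows
stacked = pd.concat([toy, toy], ignore_index=True)

print(merged)
print(stacked.shape)
```

A left merge keeps every row of the left frame, so no passenger is lost even if a class were missing from the lookup table.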



## 💾 Step 8: Exporting Data

Function to explore:
- df.to_csv("output.csv", index=False)


In [None]:

# Example: Save cleaned data
df.to_csv("titanic_cleaned.csv", index=False)
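
To see what `index=False` changes, the sketch below writes a tiny toy frame (hypothetical values) to an in-memory buffer instead of a file, then reads it back. Using `io.StringIO` is only a convenience so the example leaves no files behind:

```python
import io
import pandas as pd

# Tiny toy frame (illustrative values only)
toy = pd.DataFrame({"Pclass": [3, 1], "Age": [22.0, 38.0]})

# index=False: only the data columns are written
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
print(buffer.getvalue())

# Without a path, to_csv returns the CSV text; the default index=True
# adds an unnamed first column holding the row labels
with_index = toy.to_csv()
print(with_index)

# Round trip: reading the buffer back reproduces the original frame
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(reloaded.equals(toy))
```

That unnamed index column is what shows up as `Unnamed: 0` when such a file is loaded again with `pd.read_csv`.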



## 🧩 Step 9: Summary

Reflect on what you learned:
- Which functions were most useful?
- What insights did you gain from the Titanic dataset?
