<a href="https://colab.research.google.com/github/CarlosOrte/Descriptive-and-Predictive-Analytics-Code/blob/main/Practica6ipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


**Practice 6. Working with the Pandas Library**

Master's in Artificial Intelligence and Data Analytics

Programming for Descriptive and Predictive Analytics

Student ID: 266231




## **Loading the libraries and dataset**

In [3]:
import pandas as pd
import numpy as np

# Load Titanic-Dataset.csv from Google Drive
# (assumes Drive is already mounted at /content/drive)
df = pd.read_csv('/content/drive/MyDrive/Titanic-Dataset.csv')

# Show the first rows
df.head()

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


## **Survival analysis by sex and class**

Survival rate for each Sex–Pclass combination

In [4]:
survival_by_group = df.groupby(['Sex', 'Pclass'])['Survived'].mean()
survival_by_group


Sex,Pclass,Survived
female,1,0.968085
female,2,0.921053
female,3,0.5
male,1,0.368852
male,2,0.157407
male,3,0.135447
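
As a minimal sketch of why the group mean is a survival rate, the example below uses a small hypothetical DataFrame (not the Titanic file): averaging a 0/1 column gives the fraction of ones in each group. The `unstack` step is an optional addition that lays the result out as a Sex by Pclass table.

```python
import pandas as pd

# Hypothetical toy data, not the Titanic file
toy = pd.DataFrame({
    'Sex':      ['female', 'female', 'male', 'male', 'male', 'female'],
    'Pclass':   [1, 1, 1, 3, 3, 3],
    'Survived': [1, 1, 0, 0, 1, 0],
})

# The mean of a 0/1 column is the fraction of ones in each group
rates = toy.groupby(['Sex', 'Pclass'])['Survived'].mean()

# unstack() pivots the inner index level (Pclass) into columns
rate_table = rates.unstack('Pclass')
print(rate_table)
```

The wide table is often easier to read than the two-level index when there are only a few classes.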


Combination with the highest survival rate

In [5]:
max_group = survival_by_group.idxmax()
max_value = survival_by_group.max()

max_group, max_value


(('female', np.int64(1)), 0.9680851063829787)

Combination with the lowest survival rate

In [6]:
min_group = survival_by_group.idxmin()
min_value = survival_by_group.min()

min_group, min_value


(('male', np.int64(3)), 0.13544668587896252)
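
On a Series with a two-level index, `idxmax` and `idxmin` return the index label as a tuple, which can be unpacked directly. The sketch below uses hypothetical rates (invented for illustration) and also shows `nlargest` as one way to get a ranked view.

```python
import pandas as pd

# Hypothetical survival rates indexed by (Sex, Pclass)
rates = pd.Series(
    [0.9, 0.5, 0.4, 0.1],
    index=pd.MultiIndex.from_tuples(
        [('female', 1), ('female', 3), ('male', 1), ('male', 3)],
        names=['Sex', 'Pclass'],
    ),
)

# idxmax / idxmin return the index label, here a (Sex, Pclass) tuple
best_sex, best_class = rates.idxmax()
worst_sex, worst_class = rates.idxmin()

# nlargest keeps the n highest values together with their labels
top_two = rates.nlargest(2)
print(top_two)
```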

## **Identifying large families**

Create the FamilySize column (siblings/spouses plus parents/children aboard, excluding the passenger)

In [7]:
df['FamilySize'] = df['SibSp'] + df['Parch']
df[['SibSp', 'Parch', 'FamilySize']].head()


,SibSp,Parch,FamilySize
0,1,0,1
1,1,0,1
2,0,0,0
3,1,0,1
4,0,0,0


Identify large families (FamilySize > 3)

In [8]:
familias_grandes = df[df['FamilySize'] > 3]

Number of passengers in large families

In [9]:
num_familias_grandes = familias_grandes.shape[0]
num_familias_grandes


62

Survival rate in large families

In [10]:
proporcion_supervivencia_familias_grandes = familias_grandes['Survived'].mean()
proporcion_supervivencia_familias_grandes


np.float64(0.16129032258064516)
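
The same filter-then-average pattern can be sketched on a small hypothetical DataFrame. Keeping the boolean mask in a variable (here `is_large`, a name chosen for illustration) makes it easy to compute the rate for the remaining passengers as a baseline.

```python
import pandas as pd

# Hypothetical toy data, not the Titanic file
toy = pd.DataFrame({
    'SibSp':    [1, 0, 3, 4, 0],
    'Parch':    [0, 0, 2, 1, 5],
    'Survived': [1, 1, 0, 0, 1],
})
toy['FamilySize'] = toy['SibSp'] + toy['Parch']

# Boolean mask: True for rows with more than 3 relatives aboard
is_large = toy['FamilySize'] > 3

n_large = int(is_large.sum())                       # True counts as 1
rate_large = toy.loc[is_large, 'Survived'].mean()   # rate inside the group
rate_other = toy.loc[~is_large, 'Survived'].mean()  # rate for everyone else
print(n_large, rate_large, rate_other)
```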

## **Segmentation by age group**

Classify passengers as minors (under 18) or adults

In [11]:
def clasificar_edad(edad):
    if edad < 18:
        return 'Menor de edad'
    else:
        return 'Mayor de edad'


In [12]:
df['GrupoEdad'] = df['Age'].apply(lambda x: clasificar_edad(x) if not pd.isna(x) else np.nan)
df[['Age', 'GrupoEdad']].head()


,Age,GrupoEdad
0,22.0,Mayor de edad
1,38.0,Mayor de edad
2,26.0,Mayor de edad
3,35.0,Mayor de edad
4,35.0,Mayor de edad
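
A vectorized alternative to `apply` is sketched below on a hypothetical Series of ages. It labels every row with `np.where` and then blanks out the rows whose age is missing, since `NaN < 18` evaluates to False and would otherwise be labeled as an adult.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including a missing value and both sides of the cutoff
age = pd.Series([22.0, 5.0, np.nan, 17.9, 18.0])

# np.where labels every row; NaN < 18 is False, so NaN lands in the else branch
labels = np.where(age < 18, 'Menor de edad', 'Mayor de edad')

# where() keeps a label only where the age is known; other rows become NaN
grupo = pd.Series(labels, index=age.index).where(age.notna())
print(grupo)
```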


## **Comparing means (NumPy vs Pandas)**

Means using NumPy (ignoring nulls)

In [13]:
edad_prom_np = np.nanmean(df['Age'])
fare_prom_np = np.nanmean(df['Fare'])

edad_prom_np, fare_prom_np


(np.float64(29.69911764705882), np.float64(32.204207968574636))

Means using Pandas (`mean()` skips NaN by default, so the results match NumPy's `nanmean`)

In [14]:
edad_prom_pd = df['Age'].mean()
fare_prom_pd = df['Fare'].mean()

edad_prom_pd, fare_prom_pd


(np.float64(29.69911764705882), np.float64(32.204207968574636))
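
The two results agree because both calls skip missing values. A minimal sketch on hypothetical data shows the four cases side by side; the array is taken with `to_numpy()` so that `np.mean` operates on a plain NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical values with one missing entry
s = pd.Series([20.0, 30.0, np.nan, 40.0])
arr = s.to_numpy()

plain_np = np.mean(arr)           # NaN propagates through the plain mean
nan_np = np.nanmean(arr)          # NaN is skipped
pd_default = s.mean()             # skipna=True is the pandas default
pd_strict = s.mean(skipna=False)  # opting out reproduces the NaN result
print(plain_np, nan_np, pd_default, pd_strict)
```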

## **Class intervals with NumPy and analysis with Pandas**

Create equally spaced intervals with numpy.linspace

In [15]:
intervalos = np.linspace(df['Fare'].min(), df['Fare'].max(), 6)
intervalos


array([  0.     , 102.46584, 204.93168, 307.39752, 409.86336, 512.3292 ])
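
`np.linspace` returns edges, not intervals, so six points delimit five bins of equal width. The sketch below uses a hypothetical range of 0 to 100 to make the arithmetic easy to check.

```python
import numpy as np

# Hypothetical range; n_bins intervals need n_bins + 1 edges
lo, hi, n_bins = 0.0, 100.0, 5
edges = np.linspace(lo, hi, n_bins + 1)

# Consecutive differences give the width of each interval
widths = np.diff(edges)
print(edges, widths)
```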

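Assign an interval to each passenger. `pd.cut` builds left-open, right-closed bins by default, so a fare equal to the lowest edge (0.0) gets no interval (NaN) unless `include_lowest=True` is passed.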

In [16]:
df['FareInterval'] = pd.cut(df['Fare'], bins=intervalos)
df[['Fare', 'FareInterval']].head()


,Fare,FareInterval
0,7.25,"(0.0, 102.466]"
1,71.2833,"(0.0, 102.466]"
2,7.925,"(0.0, 102.466]"
3,53.1,"(0.0, 102.466]"
4,8.05,"(0.0, 102.466]"
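
Because the bin edges start at the minimum fare, the handling of the lowest edge matters. The sketch below, on hypothetical fares, compares the default behaviour of `pd.cut` with `include_lowest=True`; it is an illustration of the option, not a change to the analysis above.

```python
import numpy as np
import pandas as pd

# Hypothetical fares; the first one equals the lowest bin edge
fares = pd.Series([0.0, 7.25, 50.0, 100.0])
edges = np.linspace(fares.min(), fares.max(), 6)

# Default bins are (left, right], so the value 0.0 falls outside every bin
default_bins = pd.cut(fares, bins=edges)

# include_lowest=True makes the first interval include its left edge
closed_bins = pd.cut(fares, bins=edges, include_lowest=True)

n_missing_default = int(default_bins.isna().sum())
n_missing_closed = int(closed_bins.isna().sum())
print(n_missing_default, n_missing_closed)
```

With `include_lowest=True`, pandas shifts the label of the first interval slightly below the lowest edge to show that the edge is included.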


Number of passengers per interval

In [17]:
conteo_intervalos = df['FareInterval'].value_counts().sort_index()
conteo_intervalos


FareInterval,count
"(0.0, 102.466]",823
"(102.466, 204.932]",33
"(204.932, 307.398]",17
"(307.398, 409.863]",0
"(409.863, 512.329]",3

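
Since `pd.cut` produces a categorical column, `value_counts` reports every interval, including empty ones. A minimal sketch with hypothetical values and bin edges shows this, along with `normalize=True` for shares instead of counts.

```python
import pandas as pd

# Hypothetical values; no value falls in the middle bin (20, 40]
values = pd.Series([5, 15, 18, 55])
binned = pd.cut(values, bins=[0, 20, 40, 60])

# Categorical data keeps unobserved categories, reported with a count of 0
counts = binned.value_counts().sort_index()

# normalize=True returns the share of rows in each interval
shares = binned.value_counts(normalize=True).sort_index()
print(counts, shares)
```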

Proportion of survivors per interval

In [18]:
# observed=False keeps empty intervals in the result (shown as NaN)
supervivencia_intervalos = df.groupby('FareInterval', observed=False)['Survived'].mean()
supervivencia_intervalos



FareInterval,Survived
"(0.0, 102.466]",0.36695
"(102.466, 204.932]",0.757576
"(204.932, 307.398]",0.647059
"(307.398, 409.863]",NaN
"(409.863, 512.329]",1.0

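
When grouping by a categorical column, the `observed` parameter decides whether empty categories appear in the result. The sketch below uses hypothetical data and bin edges; passing the parameter explicitly also avoids depending on the default, which differs between pandas versions.

```python
import pandas as pd

# Hypothetical toy data; no fare falls in the middle bin (20, 40]
toy = pd.DataFrame({'Fare': [5, 15, 55], 'Survived': [0, 1, 1]})
toy['FareInterval'] = pd.cut(toy['Fare'], bins=[0, 20, 40, 60])

# observed=False: every interval is listed, empty ones get NaN
all_bins = toy.groupby('FareInterval', observed=False)['Survived'].mean()

# observed=True: only intervals that contain at least one row are listed
seen_bins = toy.groupby('FareInterval', observed=True)['Survived'].mean()
print(all_bins, seen_bins)
```

An interval with very few passengers yields an extreme rate (0.0 or 1.0) easily, so reading the rates next to the counts per interval gives a fairer picture.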