## Data preparation

#### Project objectives
* Identify homogeneous groups of customers through segmentation in order to personalize marketing campaigns. This should help increase the conversion rate and optimize the return on investment, while also reducing costs by focusing efforts on the segments most likely to respond.

* Improve the identification of customer segments based on demographic and financial characteristics. Automating the assignment of customers to specific groups makes it possible to design personalized marketing strategies and to analyze large volumes of data faster and more efficiently.
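The objectives describe segmentation but do not fix an algorithm. As one possible illustration, the sketch below assigns customers to segments with k-means on standardized numeric features. The choice of k-means, the `age`/`balance` features, the number of clusters and the helper name `segment_customers` are all assumptions, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def segment_customers(df: pd.DataFrame, features: list[str],
                      n_clusters: int = 4, seed: int = 0) -> pd.Series:
    """Hypothetical helper: label each customer with a k-means segment."""
    # Standardize so that 'balance' (large units) does not dominate 'age'
    X = StandardScaler().fit_transform(df[features])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(X)
    return pd.Series(labels, index=df.index, name="segment")


# Synthetic stand-in for clean_datos: two clearly separated customer groups
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "age": np.concatenate([rng.normal(30, 3, 50), rng.normal(60, 3, 50)]),
    "balance": np.concatenate([rng.normal(500, 100, 50), rng.normal(5000, 300, 50)]),
})
segments = segment_customers(toy, ["age", "balance"], n_clusters=2)
print(segments.value_counts())
```

Standardizing first matters because `balance` is measured in much larger units than `age`; without it, k-means distances would be driven almost entirely by `balance`. On real data the number of clusters would still have to be chosen, for example by comparing silhouette scores.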

In [26]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

In [27]:
clean_datos = pd.read_csv('bank_dataset_clean.csv')

In [28]:
#clean_datos["deposit"]=clean_datos["deposit"].astype("category")

clean_datos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11162 entries, 0 to 11161
Data columns (total 17 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   age        11162 non-null  float64
 1   job        11162 non-null  object 
 2   marital    11162 non-null  object 
 3   education  11162 non-null  object 
 4   default    11162 non-null  object 
 5   balance    11162 non-null  float64
 6   housing    11162 non-null  object 
 7   loan       11162 non-null  object 
 8   contact    11162 non-null  object 
 9   day        11162 non-null  int64  
 10  month      11162 non-null  object 
 11  duration   11162 non-null  float64
 12  campaign   11162 non-null  int64  
 13  pdays      11162 non-null  int64  
 14  previous   11162 non-null  int64  
 15  poutcome   11162 non-null  object 
 16  deposit    11162 non-null  object 
dtypes: float64(3), int64(4), object(10)
memory usage: 1.4+ MB
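The `info()` output shows 10 `object` columns, all of which need some form of encoding before modelling. A small helper such as the hypothetical `categorical_summary` below lists the non-numeric columns and their number of distinct values, which helps decide where one-hot encoding is reasonable (few categories). The sample rows are made up.

```python
import pandas as pd


def categorical_summary(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: number of distinct values per non-numeric column."""
    # exclude="number" keeps text columns whether pandas stores them as object or string
    cat_cols = df.select_dtypes(exclude="number").columns
    return df[cat_cols].nunique().sort_values(ascending=False)


# Made-up sample with a few of the dataset's columns
sample = pd.DataFrame({
    "age": [59.0, 56.0, 41.0],
    "job": ["admin.", "admin.", "technician"],
    "marital": ["married", "married", "married"],
    "deposit": ["yes", "yes", "no"],
})
summary = categorical_summary(sample)
print(summary)
```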


In [29]:
# Encode the target variable (yes = 1, no = 0)
clean_datos['deposit_encoded'] = clean_datos['deposit'].map({'yes': 1, 'no': 0})

# Check the changes
print(clean_datos[['deposit', 'deposit_encoded']].head())

clean_datos.head()

  deposit  deposit_encoded
0     yes                1
1     yes                1
2     yes                1
3     yes                1
4     yes                1


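|   | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | deposit | deposit_encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59.0 | admin. | married | secondary | no | 2343.0 | yes | no | unknown | 5 | may | 1042.0 | 1 | -1 | 0 | unknown | yes | 1 |
| 1 | 56.0 | admin. | married | secondary | no | 45.0 | no | no | unknown | 5 | may | 1079.9 | 1 | -1 | 0 | unknown | yes | 1 |
| 2 | 41.0 | technician | married | secondary | no | 1270.0 | yes | no | unknown | 5 | may | 1079.9 | 1 | -1 | 0 | unknown | yes | 1 |
| 3 | 55.0 | services | married | secondary | no | 2476.0 | yes | no | unknown | 5 | may | 579.0 | 1 | -1 | 0 | unknown | yes | 1 |
| 4 | 54.0 | admin. | married | tertiary | no | 184.0 | no | no | unknown | 5 | may | 673.0 | 2 | -1 | 0 | unknown | yes | 1 |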
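The `map({'yes': 1, 'no': 0})` call silently turns any other value (for example `'Yes'` or `'yes '`) into `NaN`. A defensive variant, sketched here as a hypothetical helper `encode_binary`, normalizes case and surrounding spaces and raises if any label is still unmapped. The notebook does not show whether the real `deposit` column contains such values; this is only a safeguard.

```python
import pandas as pd


def encode_binary(series: pd.Series, positive: str = "yes", negative: str = "no") -> pd.Series:
    """Hypothetical helper: map a yes/no column to 1/0, failing on unexpected labels."""
    # Normalize case and surrounding spaces before mapping (assumed possible dirty values)
    cleaned = series.str.strip().str.lower()
    encoded = cleaned.map({positive: 1, negative: 0})
    # Any value that did not match either label ends up as NaN after map()
    unexpected = series[encoded.isna()].unique()
    if len(unexpected) > 0:
        raise ValueError(f"Unexpected labels: {list(unexpected)}")
    return encoded.astype("int64")


deposit = pd.Series(["yes", "no", " Yes", "no"])
print(encode_binary(deposit).tolist())  # [1, 0, 1, 0]
```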


In [30]:
# Create the OneHotEncoder: dense output, keeping every category (no column dropped)
encoder = OneHotEncoder(sparse_output=False, drop=None)

# Fit the encoder to the 'job' column and transform it
job_encoded = encoder.fit_transform(clean_datos[['job']])

# Build a DataFrame with the encoded columns
job_encoded_df = pd.DataFrame(job_encoded, columns=encoder.get_feature_names_out(['job']))

# Get the position of the 'job' column to place the new columns
job_index = clean_datos.columns.get_loc('job')

# Insert the new columns right after the 'job' column
datos_encoder = pd.concat([clean_datos.iloc[:, :job_index + 1], job_encoded_df, clean_datos.iloc[:, job_index + 1:]], axis=1)

# Check the changes (note: this prints clean_datos; the encoded result is datos_encoder)
print("Columns after reordering:")
print(clean_datos.head())

Columns after reordering:
    age         job  marital  education default  balance housing loan  \
0  59.0      admin.  married  secondary      no   2343.0     yes   no   
1  56.0      admin.  married  secondary      no     45.0      no   no   
2  41.0  technician  married  secondary      no   1270.0     yes   no   
3  55.0    services  married  secondary      no   2476.0     yes   no   
4  54.0      admin.  married   tertiary      no    184.0      no   no   

   contact  day month  duration  campaign  pdays  previous poutcome deposit  \
0  unknown    5   may    1042.0         1     -1         0  unknown     yes   
1  unknown    5   may    1079.9         1     -1         0  unknown     yes   
2  unknown    5   may    1079.9         1     -1         0  unknown     yes   
3  unknown    5   may     579.0         1     -1         0  unknown     yes   
4  unknown    5   may     673.0         2     -1         0  unknown     yes   

   deposit_encoded  
0                1  
1            

In [31]:
datos_encoder.head()

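|   | age | job | job_admin. | job_blue-collar | job_entrepreneur | job_housemaid | job_management | job_retired | job_self-employed | job_services | ... | contact | day | month | duration | campaign | pdays | previous | poutcome | deposit | deposit_encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59.0 | admin. | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | unknown | 5 | may | 1042.0 | 1 | -1 | 0 | unknown | yes | 1 |
| 1 | 56.0 | admin. | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | unknown | 5 | may | 1079.9 | 1 | -1 | 0 | unknown | yes | 1 |
| 2 | 41.0 | technician | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | unknown | 5 | may | 1079.9 | 1 | -1 | 0 | unknown | yes | 1 |
| 3 | 55.0 | services | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | unknown | 5 | may | 579.0 | 1 | -1 | 0 | unknown | yes | 1 |
| 4 | 54.0 | admin. | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | unknown | 5 | may | 673.0 | 2 | -1 | 0 | unknown | yes | 1 |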


Note: I know the activity is not complete, but I have some questions and did not want to let more time pass without submitting something. I hope to finish the remaining parts of this exercise after our tutoring session.
Thanks!