# Alcaldía de Bucaramanga

## Description

#### 0. Libraries

We load the libraries to use.

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder

#### 1. Load Datasets

For this exercise we have the following datasets:
* Anonymous SISBEN for 2021
* Anonymous SISBEN for 2022
* Census of street dwellers
* Dictionary used for the census

In [2]:
sb2021 = pd.read_csv('SISBEN_2021.csv', delimiter='~', encoding='latin1')
sb2022 = pd.read_csv('SISBEN_2022.csv', delimiter='~', encoding='latin1')
chc = pd.read_csv('CHC.csv', delimiter='~', encoding='latin1')
d_chc = pd.read_csv('diccionario_CHC.csv', delimiter='~', encoding='latin1')

  exec(code_obj, self.user_global_ns, self.user_ns)
  exec(code_obj, self.user_global_ns, self.user_ns)


We received an error related to file upload. Python cannot define the data type for all columns.

#### 1.1 Sisben 2021 Exploration

We explored the first five records.

In [3]:
sb2021.head()

Unnamed: 0,ID_data,origen,cod_mpio,nom_mpio,cod_dpto,nom_dpto,id_registro,secuencia_reg,ide_ficha_origen,ide_origen,...,ind_i_15,ind_c,ind_h5,icbf_ninos_beneficiarios,icbf_madres_gestantes,icbf_ninos_venezolanos,hogar_final,estado_s3,hogar_corregido,familia
0,1,sisben iv 14188777,,,,,,,68001093549300001017,1,...,0.0,375,0.0,,,,680010935493000010171,0.0,680010935493000010171,1056714.0
1,1,sisben iv 14028290,,,,,,,68001065548600002007,1,...,0.0,6,1.0,,,,680010655486000020071,0.0,680010655486000020071,1043841.0
2,1,sisben iv 14205420,,,,,,,68001047548200001063,2,...,0.0,45,0.0,,,,680010475482000010632,0.0,680010475482000010632,1031856.0
3,1,sisben iv 14047259,,,,,,,68001001547500000397,1,...,0.0,3,0.0,,,,680010015475000003971,0.0,680010015475000003971,1000217.0
4,1,sisben iv 14131664,,,,,,,68001026548300001183,2,...,0.0,3,0.0,,,,680010265483000011832,0.0,680010265483000011832,1016672.0


We explored the final five records.

In [4]:
sb2021.tail()

Unnamed: 0,ID_data,origen,cod_mpio,nom_mpio,cod_dpto,nom_dpto,id_registro,secuencia_reg,ide_ficha_origen,ide_origen,...,ind_i_15,ind_c,ind_h5,icbf_ninos_beneficiarios,icbf_madres_gestantes,icbf_ninos_venezolanos,hogar_final,estado_s3,hogar_corregido,familia
357443,3,sisben iii 012020,68001.0,bucaramanga,68.0,santander,51262577.0,508|0,68|001|508|1,680015081.0,...,,,,,,,680015081.0,,,
357444,3,sisben iii 012020,68001.0,bucaramanga,68.0,santander,51513743.0,16769|0,68|001|16769|1,68001167691.0,...,,,,,,,68001167691.0,,,
357445,3,sisben iii 012020,68001.0,bucaramanga,68.0,santander,51520049.0,54069|0,68|001|54069|1,68001540691.0,...,,,,,,,68001540691.0,,,
357446,3,victimas,68001.0,bucaramanga,68.0,santander,,0|0,0|0,,...,,,,,,,,,,
357447,3,victimas,68001.0,bucaramanga,68.0,santander,,0|0,0|0,,...,,,,,,,,,,


Now we will explore all the columns present in the dataset.

In [5]:
sb2021.columns

Index(['ID_data', 'origen', 'cod_mpio', 'nom_mpio', 'cod_dpto', 'nom_dpto',
       'id_registro', 'secuencia_reg', 'ide_ficha_origen', 'ide_origen',
       'id_hogar_s3', 'tip_parentesco', 'fecha_encuesta', 'pri_nombre',
       'seg_nombre', 'pri_apellido', 'seg_apellido', 'tip_documento',
       'num_documento_a', 'sexo_persona', 'fec_nacimiento', 'maestra_grupo',
       'nivel_sisben', 'fea_pers', 'fea_hog', 'cm_benef_pers', 'cm_benef_hog',
       'cm_priori_pers', 'cm_priori_hog', 'jea_pers', 'jea_hog',
       'iva_fea_hog_v3', 'iva_cm_v3', 'iva_cm_hog_v3', 'iva_fea_v3',
       'no_cubiertos_trans_hog', 'gen_e_per', 'gen_e_hog', 'victima_pers',
       'victima_hog', 'bdua_regimen', 'tipo_afiliado',
       'puntaje_sisben_3_trunc', 'edad_c', 'ind_fondo_pensiones_siv',
       'tip_actividad_mes_siv', 'tip_empleado_siv', 'discapacidad',
       'coord_x_auto_enc', 'coord_y_auto_enc', 'direcc', 'cod_barrio',
       'nom_barrio', 'cod_vereda', 'nom_vereda', 'tel_contacto',
       'tel_con

With the dataset loaded, we will explore the null values.

In [6]:
sb2021.isnull().sum()

ID_data                        0
origen                         0
cod_mpio                  339030
nom_mpio                  339030
cod_dpto                  339030
                           ...  
icbf_ninos_venezolanos    357325
hogar_final                13521
estado_s3                  18418
hogar_corregido            18418
familia                    18418
Length: 82, dtype: int64

We notice that there is a high number of null values in the column. Manually reviewing the dataset, we observe that these null values correspond to gaps in the SISBEN.

We decided to replace these null values within the dataset with a default value of zero that will indicate that the record did not have a value for the respondent variable in the SISBEN.

In [7]:
f_sb2021 = sb2021.fillna(0)

We will create a copy of our dataset with the replaced values.

We explored the first five records.

In [8]:
f_sb2021.head()

Unnamed: 0,ID_data,origen,cod_mpio,nom_mpio,cod_dpto,nom_dpto,id_registro,secuencia_reg,ide_ficha_origen,ide_origen,...,ind_i_15,ind_c,ind_h5,icbf_ninos_beneficiarios,icbf_madres_gestantes,icbf_ninos_venezolanos,hogar_final,estado_s3,hogar_corregido,familia
0,1,sisben iv 14188777,0.0,0,0.0,0,0.0,0,68001093549300001017,1,...,0.0,375,0.0,0,0.0,0.0,680010935493000010171,0.0,680010935493000010171,1056714.0
1,1,sisben iv 14028290,0.0,0,0.0,0,0.0,0,68001065548600002007,1,...,0.0,6,1.0,0,0.0,0.0,680010655486000020071,0.0,680010655486000020071,1043841.0
2,1,sisben iv 14205420,0.0,0,0.0,0,0.0,0,68001047548200001063,2,...,0.0,45,0.0,0,0.0,0.0,680010475482000010632,0.0,680010475482000010632,1031856.0
3,1,sisben iv 14047259,0.0,0,0.0,0,0.0,0,68001001547500000397,1,...,0.0,3,0.0,0,0.0,0.0,680010015475000003971,0.0,680010015475000003971,1000217.0
4,1,sisben iv 14131664,0.0,0,0.0,0,0.0,0,68001026548300001183,2,...,0.0,3,0.0,0,0.0,0.0,680010265483000011832,0.0,680010265483000011832,1016672.0


We explored the final five records.

In [9]:
f_sb2021.tail()

Unnamed: 0,ID_data,origen,cod_mpio,nom_mpio,cod_dpto,nom_dpto,id_registro,secuencia_reg,ide_ficha_origen,ide_origen,...,ind_i_15,ind_c,ind_h5,icbf_ninos_beneficiarios,icbf_madres_gestantes,icbf_ninos_venezolanos,hogar_final,estado_s3,hogar_corregido,familia
357443,3,sisben iii 012020,68001.0,bucaramanga,68.0,santander,51262577.0,508|0,68|001|508|1,680015081,...,0.0,0,0.0,0,0.0,0.0,680015081,0.0,0,0.0
357444,3,sisben iii 012020,68001.0,bucaramanga,68.0,santander,51513743.0,16769|0,68|001|16769|1,68001167691,...,0.0,0,0.0,0,0.0,0.0,68001167691,0.0,0,0.0
357445,3,sisben iii 012020,68001.0,bucaramanga,68.0,santander,51520049.0,54069|0,68|001|54069|1,68001540691,...,0.0,0,0.0,0,0.0,0.0,68001540691,0.0,0,0.0
357446,3,victimas,68001.0,bucaramanga,68.0,santander,0.0,0|0,0|0,0,...,0.0,0,0.0,0,0.0,0.0,0,0.0,0,0.0
357447,3,victimas,68001.0,bucaramanga,68.0,santander,0.0,0|0,0|0,0,...,0.0,0,0.0,0,0.0,0.0,0,0.0,0,0.0


We will explore our null values within the dataset.

In [10]:
f_sb2021.isnull().sum()

ID_data                   0
origen                    0
cod_mpio                  0
nom_mpio                  0
cod_dpto                  0
                         ..
icbf_ninos_venezolanos    0
hogar_final               0
estado_s3                 0
hogar_corregido           0
familia                   0
Length: 82, dtype: int64

We will now explore the data types of our columns.

In [11]:
f_sb2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 357448 entries, 0 to 357447
Data columns (total 82 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   ID_data                   357448 non-null  int64  
 1   origen                    357448 non-null  object 
 2   cod_mpio                  357448 non-null  float64
 3   nom_mpio                  357448 non-null  object 
 4   cod_dpto                  357448 non-null  float64
 5   nom_dpto                  357448 non-null  object 
 6   id_registro               357448 non-null  float64
 7   secuencia_reg             357448 non-null  object 
 8   ide_ficha_origen          357448 non-null  object 
 9   ide_origen                357448 non-null  object 
 10  id_hogar_s3               357448 non-null  object 
 11  tip_parentesco            357448 non-null  float64
 12  fecha_encuesta            357448 non-null  object 
 13  pri_nombre                357448 non-null  o

Some values related to respondent information are not available and were replaced by a constant value. Since this information is not relevant to the study, it was decided to eliminate these columns.

In [12]:
f_sb2021.drop(['ID_data', 'pri_nombre', 'seg_nombre', 'pri_apellido', 'seg_apellido'], axis=1, inplace=True)

Our final columns are as follows:

In [13]:
f_sb2021.convert_dtypes().dtypes

origen                    string
cod_mpio                   Int64
nom_mpio                  object
cod_dpto                   Int64
nom_dpto                  object
                           ...  
icbf_ninos_venezolanos     Int64
hogar_final               object
estado_s3                  Int64
hogar_corregido           object
familia                    Int64
Length: 77, dtype: object

And the detail of the data type of each column is as follows:

In [14]:
f_sb2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 357448 entries, 0 to 357447
Data columns (total 77 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   origen                    357448 non-null  object 
 1   cod_mpio                  357448 non-null  float64
 2   nom_mpio                  357448 non-null  object 
 3   cod_dpto                  357448 non-null  float64
 4   nom_dpto                  357448 non-null  object 
 5   id_registro               357448 non-null  float64
 6   secuencia_reg             357448 non-null  object 
 7   ide_ficha_origen          357448 non-null  object 
 8   ide_origen                357448 non-null  object 
 9   id_hogar_s3               357448 non-null  object 
 10  tip_parentesco            357448 non-null  float64
 11  fecha_encuesta            357448 non-null  object 
 12  tip_documento             357448 non-null  int64  
 13  num_documento_a           357448 non-null  i

We will try a dtypes to force python to infer on the actual data types of the dataframe.

In [15]:
f_sb2021 = f_sb2021.convert_dtypes(infer_objects=False)

And the detail of the data type of each column is as follows:

In [16]:
f_sb2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 357448 entries, 0 to 357447
Data columns (total 77 columns):
 #   Column                    Non-Null Count   Dtype 
---  ------                    --------------   ----- 
 0   origen                    357448 non-null  string
 1   cod_mpio                  357448 non-null  Int64 
 2   nom_mpio                  357448 non-null  object
 3   cod_dpto                  357448 non-null  Int64 
 4   nom_dpto                  357448 non-null  object
 5   id_registro               357448 non-null  Int64 
 6   secuencia_reg             357448 non-null  object
 7   ide_ficha_origen          357448 non-null  string
 8   ide_origen                357448 non-null  object
 9   id_hogar_s3               357448 non-null  object
 10  tip_parentesco            357448 non-null  Int64 
 11  fecha_encuesta            357448 non-null  object
 12  tip_documento             357448 non-null  Int64 
 13  num_documento_a           357448 non-null  Int64 
 14  sexo

The values for the coordinates are still in Object format. We will change the data type.

In [17]:
f_sb2021['coord_x_auto_enc'] = f_sb2021['coord_x_auto_enc'].str.replace(",", ".")
f_sb2021['coord_y_auto_enc'] = f_sb2021['coord_x_auto_enc'].str.replace(",", ".")

In [18]:
f_sb2021 = f_sb2021.astype({'coord_x_auto_enc':'float', 'coord_y_auto_enc':'float'})

Now, we explore our dataset.

In [19]:
f_sb2021.head()

Unnamed: 0,origen,cod_mpio,nom_mpio,cod_dpto,nom_dpto,id_registro,secuencia_reg,ide_ficha_origen,ide_origen,id_hogar_s3,...,ind_i_15,ind_c,ind_h5,icbf_ninos_beneficiarios,icbf_madres_gestantes,icbf_ninos_venezolanos,hogar_final,estado_s3,hogar_corregido,familia
0,sisben iv 14188777,0,0,0,0,0,0,68001093549300001017,1,0,...,0,375,0,0,0,0,680010935493000010171,0,680010935493000010171,1056714
1,sisben iv 14028290,0,0,0,0,0,0,68001065548600002007,1,0,...,0,6,1,0,0,0,680010655486000020071,0,680010655486000020071,1043841
2,sisben iv 14205420,0,0,0,0,0,0,68001047548200001063,2,0,...,0,45,0,0,0,0,680010475482000010632,0,680010475482000010632,1031856
3,sisben iv 14047259,0,0,0,0,0,0,68001001547500000397,1,0,...,0,3,0,0,0,0,680010015475000003971,0,680010015475000003971,1000217
4,sisben iv 14131664,0,0,0,0,0,0,68001026548300001183,2,0,...,0,3,0,0,0,0,680010265483000011832,0,680010265483000011832,1016672


We notice that our dataset has some date columns. 

In [20]:
f_sb2021['fecha_encuesta']

0         2019-09-10 11:18:02
1         2019-10-03 10:03:35
2         2019-09-09 11:04:10
3         2019-08-26 08:21:44
4         2019-09-12 09:13:02
                 ...         
357443        10/22/2019 0:00
357444    2011-08-03 00:00:00
357445         3/28/2011 0:00
357446                      0
357447                      0
Name: fecha_encuesta, Length: 357448, dtype: object

We will convert to datetime.

In [21]:
f_sb2021['fecha_encuesta'] = pd.to_datetime(f_sb2021['fecha_encuesta'])

Now, we explore our columns:

In [22]:
f_sb2021['fecha_encuesta']

0        2019-09-10 11:18:02
1        2019-10-03 10:03:35
2        2019-09-09 11:04:10
3        2019-08-26 08:21:44
4        2019-09-12 09:13:02
                 ...        
357443   2019-10-22 00:00:00
357444   2011-08-03 00:00:00
357445   2011-03-28 00:00:00
357446   1970-01-01 00:00:00
357447   1970-01-01 00:00:00
Name: fecha_encuesta, Length: 357448, dtype: datetime64[ns]

We analyze the other column to date:

In [23]:
f_sb2021['fec_nacimiento']

0         2013-12-27 00:00:00
1         2014-08-20 00:00:00
2         1929-10-15 00:00:00
3         1930-02-24 00:00:00
4         1931-09-09 00:00:00
                 ...         
357443    2003-01-05 00:00:00
357444    2003-08-04 00:00:00
357445    2003-12-08 00:00:00
357446    1997-07-02 00:00:00
357447              7/27/2001
Name: fec_nacimiento, Length: 357448, dtype: string

Now, we convert the column date type:

In [24]:
f_sb2021['fec_nacimiento'] = pd.to_datetime(f_sb2021['fec_nacimiento'])

Now, we check out final column:

In [25]:
f_sb2021['fec_nacimiento']

0        2013-12-27
1        2014-08-20
2        1929-10-15
3        1930-02-24
4        1931-09-09
            ...    
357443   2003-01-05
357444   2003-08-04
357445   2003-12-08
357446   1997-07-02
357447   2001-07-27
Name: fec_nacimiento, Length: 357448, dtype: datetime64[ns]

We notice that some values for the column marking the type of health scheme have string and int values.

In [26]:
f_sb2021['gen_e_hog'].unique()

array([0, 1.0, '1', 'subsidiado', 'contributivo'], dtype=object)

We map the new values.

In [27]:
f_sb2021['gen_e_hog'] = f_sb2021['gen_e_hog'].map({'0'   : 0,
                                                   '1.0' : 1,
                                                   '1'   : 1,
                                                   'subsidiado': 0,
                                                   'contributivo' : 1},
                                                   na_action = None)

And fill the null values:

In [28]:
f_sb2021['gen_e_hog'] = f_sb2021['gen_e_hog'].fillna(0)

And explore the unique column values:

In [29]:
f_sb2021['gen_e_hog'].unique()

array([0., 1.])

Well, our final dataset is:

In [30]:
f_sb2021.head()

Unnamed: 0,origen,cod_mpio,nom_mpio,cod_dpto,nom_dpto,id_registro,secuencia_reg,ide_ficha_origen,ide_origen,id_hogar_s3,...,ind_i_15,ind_c,ind_h5,icbf_ninos_beneficiarios,icbf_madres_gestantes,icbf_ninos_venezolanos,hogar_final,estado_s3,hogar_corregido,familia
0,sisben iv 14188777,0,0,0,0,0,0,68001093549300001017,1,0,...,0,375,0,0,0,0,680010935493000010171,0,680010935493000010171,1056714
1,sisben iv 14028290,0,0,0,0,0,0,68001065548600002007,1,0,...,0,6,1,0,0,0,680010655486000020071,0,680010655486000020071,1043841
2,sisben iv 14205420,0,0,0,0,0,0,68001047548200001063,2,0,...,0,45,0,0,0,0,680010475482000010632,0,680010475482000010632,1031856
3,sisben iv 14047259,0,0,0,0,0,0,68001001547500000397,1,0,...,0,3,0,0,0,0,680010015475000003971,0,680010015475000003971,1000217
4,sisben iv 14131664,0,0,0,0,0,0,68001026548300001183,2,0,...,0,3,0,0,0,0,680010265483000011832,0,680010265483000011832,1016672


We found some categorical values. For the purposes of the activity we will choose to use a LabelEncoding of Sklearn to guarantee the encoding of the data.

In the next installment we will evaluate if this method was adequate for the encoding process.

For bdua_regimen column we have:

In [31]:
f_sb2021['bdua_regimen'].unique()

array(['0', 'contributivo', 'subsidiado', 0, 'excepcion'], dtype=object)

Created a new column; we have to replace some '0' and 0 values.

In [32]:
f_sb2021['bdua_regimen_n'] = f_sb2021['bdua_regimen'].replace(['0', 0], ['subsidiado','subsidiado'])

Check the column values:

In [33]:
f_sb2021['bdua_regimen_n'].unique()

array(['subsidiado', 'contributivo', 'excepcion'], dtype=object)

And use a label encoder:

In [34]:
f_sb2021['bdua_regimen_lenco'] = LabelEncoder().fit_transform(f_sb2021['bdua_regimen_n'])

Drop the old column with the replaced values:

In [35]:
del f_sb2021['bdua_regimen_n']

And check the results:

In [36]:
f_sb2021['bdua_regimen_lenco'].unique()

array([2, 0, 1])

We realized the same steps for the tipo_afiliado column:

In [37]:
f_sb2021['tipo_afiliado'].unique()

array([0, 'cotizante', 'cabeza de familia', 'beneficiario', 'adicional',
       'otro miembro del nucleo familiar'], dtype=object)

In [38]:
f_sb2021['tipo_afiliado_n'] = f_sb2021['bdua_regimen'].replace([0], ['beneficiario'])

In [39]:
f_sb2021['tipo_afiliado_lenco'] = LabelEncoder().fit_transform(f_sb2021['tipo_afiliado_n'])
del f_sb2021['tipo_afiliado_n']
f_sb2021['tipo_afiliado_lenco'].unique()

array([0, 2, 4, 1, 3])

Display all the columns:

In [40]:
pd.set_option('display.max_columns', None)
f_sb2021.head()

Unnamed: 0,origen,cod_mpio,nom_mpio,cod_dpto,nom_dpto,id_registro,secuencia_reg,ide_ficha_origen,ide_origen,id_hogar_s3,tip_parentesco,fecha_encuesta,tip_documento,num_documento_a,sexo_persona,fec_nacimiento,maestra_grupo,nivel_sisben,fea_pers,fea_hog,cm_benef_pers,cm_benef_hog,cm_priori_pers,cm_priori_hog,jea_pers,jea_hog,iva_fea_hog_v3,iva_cm_v3,iva_cm_hog_v3,iva_fea_v3,no_cubiertos_trans_hog,gen_e_per,gen_e_hog,victima_pers,victima_hog,bdua_regimen,tipo_afiliado,puntaje_sisben_3_trunc,edad_c,ind_fondo_pensiones_siv,tip_actividad_mes_siv,tip_empleado_siv,discapacidad,coord_x_auto_enc,coord_y_auto_enc,direcc,cod_barrio,nom_barrio,cod_vereda,nom_vereda,tel_contacto,tel_contacto_2,tel_fijo,ind_i_1,ind_i_2,ind_i_3,ind_i_4,ind_i_5,ind_i_6,ind_i_7,ind_i_8,ind_i_9,ind_i_10,ind_i_11,ind_i_12,ind_i_13,ind_i_14,ind_i_15,ind_c,ind_h5,icbf_ninos_beneficiarios,icbf_madres_gestantes,icbf_ninos_venezolanos,hogar_final,estado_s3,hogar_corregido,familia,bdua_regimen_lenco,tipo_afiliado_lenco
0,sisben iv 14188777,0,0,0,0,0,0,68001093549300001017,1,0,4,2019-09-10 11:18:02,5,1,2,2013-12-27,3,3,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0.0,0,0,0,0,6324,6,9,9,99,ninguna,7.106287,7.106287,kr 4 oeste # 43 - 18 in 1,145,campo hermoso,0,sin vereda,3022550240,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,375,0,0,0,0,680010935493000010171,0,680010935493000010171,1056714,2,0
1,sisben iv 14028290,0,0,0,0,0,0,68001065548600002007,1,0,14,2019-10-03 10:03:35,5,2,2,2014-08-20,2,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0,0,0,0,3818,5,9,9,99,ninguna,7.131871,7.131871,kr 61 # 16 - 13 pi 3,114,buenos aires,0,sin vereda,3166780089,0,0,1,0,1,0,1,0,0,1,1,0,0,0,0,0,0,6,1,0,0,0,680010655486000020071,0,680010655486000020071,1043841,2,0
2,sisben iv 14205420,0,0,0,0,0,0,68001047548200001063,2,0,1,2019-09-09 11:04:10,1,3,1,1929-10-15,3,2,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0.0,0,0,contributivo,cotizante,4508,90,3,0,99,ninguna,7.10397,7.10397,kr 9 b oeste # 44 - 19,145,campo hermoso,0,sin vereda,6961948,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,45,0,0,0,0,680010475482000010632,0,680010475482000010632,1031856,0,2
3,sisben iv 14047259,0,0,0,0,0,0,68001001547500000397,1,0,1,2019-08-26 08:21:44,1,4,1,1930-02-24,4,3,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0.0,0,0,contributivo,cotizante,6204,90,3,6,99,mas de una,7.098105,7.098105,kr 21 # 70 - 15,193,nueva granada,0,sin vereda,6474573,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,3,0,0,0,0,680010015475000003971,0,680010015475000003971,1000217,0,2
4,sisben iv 14131664,0,0,0,0,0,0,68001026548300001183,2,0,1,2019-09-12 09:13:02,1,5,1,1931-09-09,3,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0.0,0,0,subsidiado,cabeza de familia,219,88,2,1,5,mas de una,7.112876,7.112876,kr 7 # 43 - 56,142,alfonso lopez,0,sin vereda,6335390,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,3,0,0,0,0,680010265483000011832,0,680010265483000011832,1016672,2,4


Some emetrics, we have:

In [41]:
f_sb2021.describe()

Unnamed: 0,cod_mpio,cod_dpto,id_registro,tip_parentesco,tip_documento,num_documento_a,sexo_persona,maestra_grupo,nivel_sisben,fea_pers,fea_hog,cm_benef_pers,cm_benef_hog,cm_priori_pers,cm_priori_hog,jea_pers,jea_hog,iva_fea_hog_v3,iva_cm_v3,iva_cm_hog_v3,iva_fea_v3,no_cubiertos_trans_hog,gen_e_per,gen_e_hog,victima_pers,victima_hog,edad_c,ind_fondo_pensiones_siv,tip_actividad_mes_siv,tip_empleado_siv,coord_x_auto_enc,coord_y_auto_enc,cod_barrio,cod_vereda,tel_contacto_2,tel_fijo,ind_i_1,ind_i_2,ind_i_3,ind_i_4,ind_i_5,ind_i_6,ind_i_7,ind_i_8,ind_i_9,ind_i_10,ind_i_11,ind_i_12,ind_i_13,ind_i_14,ind_i_15,ind_h5,icbf_madres_gestantes,icbf_ninos_venezolanos,estado_s3,familia,bdua_regimen_lenco,tipo_afiliado_lenco
count,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,213334.0,213334.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0,357448.0
mean,3503.845085,3.503794,439006.7,2.556397,1.693734,178724.5,1.514978,1.7492,1.176731,0.128326,0.179411,0.032545,0.071725,0.010147,0.024247,0.005721,0.020686,0.02021,0.006233,0.017278,0.004971,0.737044,0.002137,0.002294,0.125652,0.167526,31.452505,2.061301,1.832513,38.524082,7.129752,7.129752,179.823812,10.018736,0.0,0.0,0.347536,0.057569,0.031689,0.166617,0.118848,0.003575,0.06141,0.447679,0.103117,0.022529,0.115018,0.086186,0.015669,0.026026,0.166528,0.122628,0.001723,0.000344,0.007688,1332909.0,1.067422,2.827502
std,15032.919593,15.032699,3837649.0,2.344619,1.371381,103186.493848,0.502818,1.323988,1.182537,0.334453,0.383696,0.177442,0.258033,0.10022,0.153815,0.075421,0.14233,0.140718,0.078703,0.130306,0.070332,0.440239,0.046182,0.047841,0.331457,0.373446,21.357693,2.854455,2.567013,46.812928,0.028998,0.028998,6690.266456,52.660067,0.0,0.0,0.476188,0.232927,0.17517,0.372634,0.32361,0.059687,0.240082,0.497256,0.304112,0.148397,0.319044,0.280639,0.124193,0.159213,0.372554,0.32801,0.041477,0.018547,0.764598,561824.9,0.997723,1.128237
min,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.070567,7.070567,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,1.0,1.0,89362.75,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,14.0,0.0,0.0,1.0,7.104343,7.104343,50.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1024537.0,0.0,2.0
50%,0.0,0.0,0.0,2.0,1.0,178724.5,2.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,29.0,2.0,1.0,5.0,7.133427,7.133427,107.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1055015.0,2.0,2.0
75%,0.0,0.0,0.0,3.0,2.0,268086.25,2.0,3.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,47.0,2.0,3.0,99.0,7.150585,7.150585,197.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2018853.0,2.0,4.0
max,68001.0,68.0,51545430.0,16.0,8.0,357448.0,2.0,4.0,3.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,112.0,9.0,9.0,99.0,7.25397,7.25397,999999.0,999.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,81.0,2059524.0,2.0,4.0


#### 1.2 Sisben 2022 Exploration

We realice some similar steps with the 2022 dataset:

Print the first five values:

In [42]:
sb2022.head()

Unnamed: 0,documento,cod_dpto,Departamento,cod_mpio,Municipio,fec_paquete,num_paquete,num_ficha,ide_ficha_origen,ide_edificacion,ver_estructura,ori_encuesta,Cod_clase,Cod_centro_poblado,Cod_area_coordinacion,Cod_area_operativa,Cod_uni_cobertura,Cod_comuna_x,Cod_corregimiento,NOM_CORREGIMIENTO,Cod_vereda,NOM_VEREDA,Cod_barrio,NOM_BARRIO,Cod_enumerador,tot_viviendas,tot_hogares,ord_vivienda,ind_direccion,uso_vivienda,Ide_foto,fec_ini_encuesta,fec_fin_encuesta,Coord_x_manual_rec,Coord_y_manual_rec,Coord_x_auto_rec,Coord_y_auto_rec,Gps_Alt_auto_rec,Fec_captura_gps_rec,Gps_Distancia_rec,Coord_x_manual_enc,Coord_y_manual_enc,Coord_x_auto_enc,Coord_y_auto_enc,Gps_Alt_auto_enc,Fec_captura_gps_enc,Gps_Distancia_enc,Est_nov_cartografia,Cod_digitador,Fec_digitacion,ind_formato,Num_hogares_recuento,est_ficha,Cod_equipo_encuesta,Num_visita,Cod_Chip,Dir_Chip,Num_solicitud,Cod_UC_total,tip_vivienda,tip_mat_paredes,tip_mat_pisos,ind_tiene_energia,tip_estrato_energia,ind_tiene_alcantarillado,ind_tiene_gas,ind_tiene_recoleccion,ind_tiene_acueducto,tip_estrato_acueducto,num_cuartos_vivienda,num_hogares_vivienda,cod_encuestador,cod_supervisor,cod_critico,fec_ini_vivienda,fec_fin_vivienda,ide_hogar,tip_ocupa_vivienda,num_cuartos_exclusivos,num_cuartos_dormir,num_cuartos_unicos_dormir,tip_sanitario,tip_ubi_sanitario,tip_uso_sanitario,tip_origen_agua,ind_agua_llega_7dias,num_dias_llega,ind_agua_llega_24horas,num_horas_llega,tip_uso_agua_beber,tip_elimina_basura,ind_tiene_cocina,tip_prepara_alimentos,tip_uso_cocina,tip_energia_cocina,ind_tiene_nevera,ind_tiene_lavadora,ind_tiene_pc,ind_tiene_internet,ind_tiene_moto,ind_tiene_tractor,ind_tiene_carro,ind_tiene_bien_raiz,ind_gasto_alimento,vlr_gasto_alimento,ind_gasto_transporte,vlr_gasto_transporte,ind_gasto_educacion,vlr_gasto_educacion,ind_gasto_salud,vlr_gasto_salud,ind_gasto_serv_publicos,vlr_gasto_serv_publicos,ind_gasto_celular,vlr_gasto_celular,ind_gasto_arriendo,vlr_gasto_arriendo,ind_gasto_otros,vlr_gasto_otros,vlr_total_gastos,num_habita_vivienda,ind_evento_inundacion,num_evento_inundacion,ind_evento_avalancha,num_evento_avalancha,ind_evento_terremoto,num_evento_terremoto,ind_evento_incendio,num_evento_incendio,ind_evento_vendaval,num_evento_vendaval,ind_evento_hundimiento,num_evento_hundimiento,num_personas_posibles,num_personas_hogar,fec_ini_visita1,fec_fin_visita1,res_visita1,fec_ini_visita2,fec_fin_visita2,res_visita2,ide_informante,Tip_informante,ide_firma_informante,Cau_sin_firma,Email_contacto,Ind_declaracion,ide_persona,ide_nacional,ind_escaner,sexo_persona,ori_persona,tip_documento,fec_nacimiento,edad_calculada,fec_documento,cod_dpto_documento,cod_mpio_documento,Cod_pais_documento,tip_parentesco,tip_estado_civil,ind_conyuge_vive_hogar,ide_conyuge,ind_padre_vive_hogar,ide_padre,ind_pariente_domestico,ide_serv_domestico,ind_discap_ver,ind_discap_oir,ind_discap_hablar,ind_discap_moverse,ind_discap_bañarse,ind_discap_salir,ind_discap_entender,ind_discap_ninguna,tip_seg_social,ind_enfermo_30,ind_acudio_salud,ind_fue_atendido_salud,ind_esta_embarazada,ind_tuvo_hijos,tip_cuidado_niños,ind_recibe_comida,ind_leer_escribir,ind_estudia,niv_educativo,grado_alcanzado,ind_fondo_pensiones,tip_actividad_mes,num_sem_buscando,tip_empleado,ind_ingr_salario,vlr_ingr_salario,ind_ingr_honorarios,vlr_ingr_honorarios,ind_ingr_cosecha,num_mes_ingr_cosecha,vlr_ingr_cosecha,ind_ingr_pension,vlr_ingr_pension,ind_ingr_remesa_pais,vlr_ingr_remesa_pais,ind_ingr_remesa_exterior,vlr_ingr_remesa_exterior,ind_ingr_arriendos,vlr_ingr_arriendos,ind_otros_ingresos,vlr_otros_ingresos,ind_ingr_estado,vlr_ingr_fam_accion,vlr_ingr_col_mayor,vlr_ingr_otro_subsidio,fec_ini_persona1,fec_fin_persona1,fec_ini_persona2,fec_fin_persona2,ide_Unigasto,Jefe_UG,H_5,I1,I2,I3,I4,I5,I6,I7,I8,I9,I10,I11,I12,I13,I14,I15,Grupo,Nivel,ide_Ug,persug,Capital,marca,estado,Linea_p,replicacion,fec_actualizacion_cns,C,Clasificacion,Latitud,Longitud,BARRIO,COMUNA
0,1,68,SANTANDER,68001,BUCARAMANGA,2019-09-10 15:39:36.563,37,39567,"6,80011E+19",3,40018082017,1,1,0,126,126003,741294,5,0,SIN CORREGIMIENTO,0,SIN VEREDA,145,CAMPO HERMOSO,5493,0,1,1,1,1,Fachada68001093549300001017.jpg,2019-09-10 11:00:31.000,2019-09-10 11:18:02.000,710643900.0,-7313946300,710635973,-7313932346,919,2019-08-10 12:01:20.000,17,710625500.0,-7313942000.0,710628748.0,-7313937000.0,898,2019-09-10 11:00:38,6,0,5493,2019-08-10 12:01:32.000,1,1,1,93,1,,,0,"1,7068E+33",2,1,2,1,2,1,1,1,1,2,4,1,5562,5518,9999,2019-09-10 11:00:38.000,2019-09-10 11:18:02.000,1,1,4,3,3,1,1,1,1,1,9,1,99,1,1,1,1,1,2,1,2,2,2,2,2,2,2,1,300000,2,0,2,0,2,0,1,100000,2,0,1,250000,2,0,650000,1,2,0,2,0,2,0,2,0,2,0,2,0,3,2,2019-09-10 11:03:42.000,2019-09-10 11:18:02.000,1,,,0,1,1,Firma_68001093549300001017_1.png,0,nicolepaola271213@gmail.com,1,2,15770911,0,2,2,5,2013-12-27 00:00:00,5,2013-12-30 00:00:00,0,0,862,3,5,9,99,1,1,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,9,9,2,1,1,0,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-09-10 11:10:16.000,2019-09-10 11:12:44.000,2019-09-10 11:15:42.000,2019-09-10 11:15:42.000,1,,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,C,7.0,"6,80017E+26",2,1,0,0,14559840000.0,1,2021-09-02 20:44:57.463,375,C07,7.10628748,-73.13936591,CAMPO HERMOSO,05. GARCIA ROVIRA
1,2,68,SANTANDER,68001,BUCARAMANGA,2019-10-03 16:57:01.163,56,62156,"6,80011E+19",20,40018082017,1,1,0,110,110005,743596,14,0,SIN CORREGIMIENTO,0,SIN VEREDA,114,BUENOS AIRES,5486,0,1,1,1,1,Fachada68001065548600002007.jpg,2019-10-03 09:42:06.000,2019-10-03 10:03:35.000,713184200.0,-7309992200,713187966,-7310000482,1241,2019-09-12 12:04:38.000,10,713180900.0,-7309990000.0,713187059.0,-7309989000.0,1231,2019-10-03 09:43:18,6,0,5486,2019-09-12 12:04:59.000,1,1,1,65,1,,,0,"1,7068E+33",2,1,3,1,1,1,1,1,1,1,4,1,5539,5528,9999,2019-10-03 09:43:18.000,2019-10-03 10:03:35.000,1,1,4,3,3,1,1,1,1,1,9,1,99,1,1,1,3,1,2,1,2,2,2,2,2,2,2,1,450000,1,100000,2,0,2,0,1,100000,2,0,1,400000,2,0,1050000,2,2,0,2,0,2,0,2,0,2,0,2,0,4,4,2019-10-03 09:45:36.000,2019-10-03 10:03:35.000,1,,,0,1,1,Firma_68001065548600002007_1.png,0,,1,4,15805690,0,2,2,5,2014-08-20 00:00:00,5,2014-11-11 00:00:00,0,0,862,14,5,9,99,2,99,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,9,9,2,2,0,0,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-10-03 09:55:31.000,2019-10-03 09:57:33.000,2019-10-03 10:00:08.000,2019-10-03 10:00:08.000,1,,1,1,0,1,0,1,0,0,1,1,0,0,0,0,0,0,B,2.0,"6,80017E+26",4,1,0,0,14559840000.0,0,,6,B02,7.13187059,-73.09989155,BUENOS AIRES,14. MORRORICO
2,3,68,SANTANDER,68001,BUCARAMANGA,2021-11-20 16:28:56.757,1269,43544,"6,8001E+19",8,40118082017,1,1,0,115,115001,742122,9,0,SIN CORREGIMIENTO,0,SIN VEREDA,193,NUEVA GRANADA,5475,0,1,1,1,1,Fachada68001001547500000397.jpg,2019-08-26 07:56:27.000,2021-11-20 08:25:00.000,709813300.0,-7311354000,709820867,-7311348259,894,2019-08-02 07:49:47.000,10,709814300.0,-7311353000.0,709810543.0,-7311369000.0,895,Aug 26 2019 7:56AM,19,0,5475,2019-08-02 07:49:57.000,1,1,1,1,2,,,65502,"1,7068E+33",1,1,2,1,3,1,1,1,1,3,5,1,5534,5523,9999,2019-08-26 07:56:45.000,2021-11-20 08:25:00.000,1,3,5,4,4,1,1,1,1,1,9,1,99,1,1,1,1,1,2,1,1,1,1,2,2,2,2,1,250000,1,20000,2,0,1,103600,1,235000,1,5000,2,0,2,0,613600,4,2,0,2,0,2,0,2,0,2,0,2,0,2,2,2019-08-26 08:00:38.000,2019-08-26 08:21:44.000,1,,2021-11-20 08:25:00.000,1,2,1,Firma_68001001547500000397_1.png,0,,1,1,15738860,1,1,1,3,1930-02-24 00:00:00,91,1953-07-29 00:00:00,11,11001,170,1,3,9,99,2,99,2,99,2,2,2,2,2,2,2,1,1,2,9,9,9,9,9,9,2,2,0,0,2,7,999,99,9,0,9,0,9,99,0,2,0,2,0,2,0,2,0,2,0,2,0,0,0,2019-08-26 08:10:14.000,2019-08-26 08:11:32.000,2021-11-20 08:21:02.000,2021-11-20 08:21:02.000,1,1.0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,D,3.0,"6,80017E+26",2,1,0,0,14559840000.0,1,2021-11-20 16:39:25.767,6,D03,7.09810543,-73.11369419,NUEVA GRANADA,09. LA PEDREGOSA
3,4,68,SANTANDER,68001,BUCARAMANGA,2019-09-12 14:46:54.523,39,43875,"6,8001E+19",16,40018082017,1,1,0,126,126007,741367,5,0,SIN CORREGIMIENTO,0,SIN VEREDA,142,ALFONSO LÓPEZ,5483,0,2,1,1,1,Fachada68001026548300001183.jpg,2019-09-12 08:59:11.000,2019-09-12 09:13:02.000,711285300.0,-7313170000,711291560,-7313165695,934,2019-08-13 07:36:02.000,8,711301200.0,-7313176000.0,711287649.0,-7313167000.0,917,2019-09-12 08:59:24,18,0,5483,2019-08-13 07:36:22.000,1,2,1,26,1,,,0,"1,7068E+33",1,1,3,1,2,1,1,1,1,2,3,2,5566,5540,9999,2019-09-12 08:59:25.000,2019-09-12 09:13:02.000,2,1,0,0,0,1,1,2,1,1,9,1,99,1,1,1,5,3,7,2,2,2,2,2,2,2,2,1,200000,2,0,2,0,2,0,2,0,2,0,1,120000,2,0,320000,2,2,0,2,0,2,0,2,0,2,0,2,0,1,1,2019-09-12 09:08:45.000,2019-09-12 09:13:02.000,1,,,0,1,1,Firma_68001026548300001183_2.png,0,,1,1,15773328,1,1,1,3,1931-09-09 00:00:00,87,1953-09-25 00:00:00,11,11001,170,1,4,9,99,2,99,2,99,1,2,1,2,2,1,2,2,3,2,9,9,9,9,9,9,1,2,2,3,2,1,999,5,99,0,1,200000,99,99,0,2,0,1,100000,2,0,2,0,2,0,1,0,50000,0,2019-09-12 09:10:11.000,2019-09-12 09:11:03.000,2019-09-12 09:11:06.000,2019-09-12 09:11:06.000,1,1.0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,C,7.0,"6,80017E+26",1,1,0,0,14559840000.0,0,,3,C07,7.11287649,-73.1316689,ALFONSO LOPEZ,05. GARCIA ROVIRA
4,5,68,SANTANDER,68001,BUCARAMANGA,2019-09-24 14:55:21.997,48,53705,"6,80011E+19",3,40018082017,1,1,0,128,128007,741235,4,0,SIN CORREGIMIENTO,0,SIN VEREDA,85,CAMILO TORRES,5485,0,1,2,2,1,Fachada68001067548500001513.jpg,2019-09-24 08:20:07.000,2019-09-24 08:42:56.000,712222100.0,-7313914800,712198417,-7313905531,892,2019-09-06 08:40:40.000,28,712197200.0,-7313910000.0,712197304.0,-7313916000.0,945,2019-09-24 08:20:18,6,0,5485,2019-09-06 08:42:11.000,1,1,1,67,1,,,0,"1,7068E+33",2,1,3,1,1,2,2,1,2,99,4,1,5509,5559,9999,2019-09-24 08:20:18.000,2019-09-24 08:42:56.000,1,1,1,1,1,3,1,1,6,9,9,9,99,1,1,1,1,1,3,1,2,2,2,2,2,2,2,1,300000,2,0,2,0,2,0,1,100000,2,0,1,350000,2,0,750000,2,2,0,2,0,2,0,2,0,2,0,2,0,8,4,2019-09-24 08:26:22.000,2019-09-24 08:42:56.000,1,,,0,2,1,Firma_68001067548500001514_1.png,0,,1,4,15790647,0,1,2,5,2016-01-06 00:00:00,3,2016-01-26 00:00:00,0,0,862,11,5,9,99,2,99,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,2,1,9,9,9,99,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-09-24 08:36:44.000,2019-09-24 08:39:02.000,2019-09-24 08:41:29.000,2019-09-24 08:41:29.000,1,,1,1,0,0,0,1,0,0,1,1,0,1,1,0,0,1,A,5.0,"6,80017E+26",4,1,0,0,14559840000.0,0,,705,A05,7.12197304,-73.13916206,ASENTAMIENTO CAMILO TORRES,04. OCCIDENTAL


and the final five values:

In [43]:
sb2022.tail()

Unnamed: 0,documento,cod_dpto,Departamento,cod_mpio,Municipio,fec_paquete,num_paquete,num_ficha,ide_ficha_origen,ide_edificacion,ver_estructura,ori_encuesta,Cod_clase,Cod_centro_poblado,Cod_area_coordinacion,Cod_area_operativa,Cod_uni_cobertura,Cod_comuna_x,Cod_corregimiento,NOM_CORREGIMIENTO,Cod_vereda,NOM_VEREDA,Cod_barrio,NOM_BARRIO,Cod_enumerador,tot_viviendas,tot_hogares,ord_vivienda,ind_direccion,uso_vivienda,Ide_foto,fec_ini_encuesta,fec_fin_encuesta,Coord_x_manual_rec,Coord_y_manual_rec,Coord_x_auto_rec,Coord_y_auto_rec,Gps_Alt_auto_rec,Fec_captura_gps_rec,Gps_Distancia_rec,Coord_x_manual_enc,Coord_y_manual_enc,Coord_x_auto_enc,Coord_y_auto_enc,Gps_Alt_auto_enc,Fec_captura_gps_enc,Gps_Distancia_enc,Est_nov_cartografia,Cod_digitador,Fec_digitacion,ind_formato,Num_hogares_recuento,est_ficha,Cod_equipo_encuesta,Num_visita,Cod_Chip,Dir_Chip,Num_solicitud,Cod_UC_total,tip_vivienda,tip_mat_paredes,tip_mat_pisos,ind_tiene_energia,tip_estrato_energia,ind_tiene_alcantarillado,ind_tiene_gas,ind_tiene_recoleccion,ind_tiene_acueducto,tip_estrato_acueducto,num_cuartos_vivienda,num_hogares_vivienda,cod_encuestador,cod_supervisor,cod_critico,fec_ini_vivienda,fec_fin_vivienda,ide_hogar,tip_ocupa_vivienda,num_cuartos_exclusivos,num_cuartos_dormir,num_cuartos_unicos_dormir,tip_sanitario,tip_ubi_sanitario,tip_uso_sanitario,tip_origen_agua,ind_agua_llega_7dias,num_dias_llega,ind_agua_llega_24horas,num_horas_llega,tip_uso_agua_beber,tip_elimina_basura,ind_tiene_cocina,tip_prepara_alimentos,tip_uso_cocina,tip_energia_cocina,ind_tiene_nevera,ind_tiene_lavadora,ind_tiene_pc,ind_tiene_internet,ind_tiene_moto,ind_tiene_tractor,ind_tiene_carro,ind_tiene_bien_raiz,ind_gasto_alimento,vlr_gasto_alimento,ind_gasto_transporte,vlr_gasto_transporte,ind_gasto_educacion,vlr_gasto_educacion,ind_gasto_salud,vlr_gasto_salud,ind_gasto_serv_publicos,vlr_gasto_serv_publicos,ind_gasto_celular,vlr_gasto_celular,ind_gasto_arriendo,vlr_gasto_arriendo,ind_gasto_otros,vlr_gasto_otros,vlr_total_gastos,num_habita_vivienda,ind_evento_inundacion,num_evento_inundacion,ind_evento_avalancha,num_evento_avalancha,ind_evento_terremoto,num_evento_terremoto,ind_evento_incendio,num_evento_incendio,ind_evento_vendaval,num_evento_vendaval,ind_evento_hundimiento,num_evento_hundimiento,num_personas_posibles,num_personas_hogar,fec_ini_visita1,fec_fin_visita1,res_visita1,fec_ini_visita2,fec_fin_visita2,res_visita2,ide_informante,Tip_informante,ide_firma_informante,Cau_sin_firma,Email_contacto,Ind_declaracion,ide_persona,ide_nacional,ind_escaner,sexo_persona,ori_persona,tip_documento,fec_nacimiento,edad_calculada,fec_documento,cod_dpto_documento,cod_mpio_documento,Cod_pais_documento,tip_parentesco,tip_estado_civil,ind_conyuge_vive_hogar,ide_conyuge,ind_padre_vive_hogar,ide_padre,ind_pariente_domestico,ide_serv_domestico,ind_discap_ver,ind_discap_oir,ind_discap_hablar,ind_discap_moverse,ind_discap_bañarse,ind_discap_salir,ind_discap_entender,ind_discap_ninguna,tip_seg_social,ind_enfermo_30,ind_acudio_salud,ind_fue_atendido_salud,ind_esta_embarazada,ind_tuvo_hijos,tip_cuidado_niños,ind_recibe_comida,ind_leer_escribir,ind_estudia,niv_educativo,grado_alcanzado,ind_fondo_pensiones,tip_actividad_mes,num_sem_buscando,tip_empleado,ind_ingr_salario,vlr_ingr_salario,ind_ingr_honorarios,vlr_ingr_honorarios,ind_ingr_cosecha,num_mes_ingr_cosecha,vlr_ingr_cosecha,ind_ingr_pension,vlr_ingr_pension,ind_ingr_remesa_pais,vlr_ingr_remesa_pais,ind_ingr_remesa_exterior,vlr_ingr_remesa_exterior,ind_ingr_arriendos,vlr_ingr_arriendos,ind_otros_ingresos,vlr_otros_ingresos,ind_ingr_estado,vlr_ingr_fam_accion,vlr_ingr_col_mayor,vlr_ingr_otro_subsidio,fec_ini_persona1,fec_fin_persona1,fec_ini_persona2,fec_fin_persona2,ide_Unigasto,Jefe_UG,H_5,I1,I2,I3,I4,I5,I6,I7,I8,I9,I10,I11,I12,I13,I14,I15,Grupo,Nivel,ide_Ug,persug,Capital,marca,estado,Linea_p,replicacion,fec_actualizacion_cns,C,Clasificacion,Latitud,Longitud,BARRIO,COMUNA
237736,237737,68,SANTANDER,68001,BUCARAMANGA,2019-08-26 16:00:47.863,25,25307,"6,80011E+19",12,40018082017,1,1,0,115,115001,742082,9,0,SIN CORREGIMIENTO,0,SIN VEREDA,202,SAN PEDRO,5496,0,1,1,1,1,Fachada68001063549600000665.jpg,2019-08-26 09:13:52.000,2019-08-26 09:45:32.000,709923700.0,-7311086000,709910410,-7311086318,937,2019-08-02 10:01:21.000,14,709920200.0,-7311094000.0,709920618.0,-7311103000.0,907,2019-08-26 09:14:48,10,0,5496,2019-08-02 10:01:34.000,1,1,1,63,1,,,0,"1,7068E+33",2,1,2,1,2,1,1,1,1,2,5,1,5543,5555,9999,2019-08-26 09:14:48.000,2019-08-26 09:45:32.000,1,1,1,1,1,1,1,1,1,1,9,1,99,1,1,1,1,1,2,2,2,2,1,2,2,2,2,1,560000,1,5000,2,0,1,20000,1,40000,1,1000,1,200000,2,0,826000,2,2,0,2,0,2,0,2,0,2,0,2,0,3,3,2019-08-26 09:16:32.000,2019-08-26 09:45:32.000,1,,,0,1,1,Firma_68001063549600000665_1.png,0,,1,3,15740066,0,1,2,5,2017-03-17 00:00:00,2,2017-04-27 00:00:00,0,0,862,3,5,9,99,1,1,2,99,2,2,2,2,2,2,2,1,0,1,2,9,9,9,2,1,9,9,9,99,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-08-26 09:31:48.000,2019-08-26 09:40:19.000,2019-08-26 09:43:45.000,2019-08-26 09:43:45.000,1,,1,0,0,0,0,1,0,0,1,1,1,0,0,0,0,1,C,2.0,"6,80017E+26",3,1,0,0,14559840000.0,0,,585,C02,7.099206,-73.11103,SAN PEDRO CLAVER,09. LA PEDREGOSA
237737,237738,68,SANTANDER,68001,BUCARAMANGA,2019-09-25 17:42:41.477,49,53490,"6,80011E+19",24,40018082017,1,1,0,128,128007,741223,4,0,SIN CORREGIMIENTO,0,SIN VEREDA,84,CUYANITA,5482,0,1,1,2,1,Fachada68001125548200000414.jpg,2019-09-25 10:33:14.000,2019-09-25 10:53:49.000,712295500.0,-7313916300,712306028,-7313911675,860,2019-09-06 09:04:16.000,12,712307600.0,-7313915000.0,712315514.0,-7313914000.0,853,2019-09-25 10:33:38,8,0,5482,2019-09-06 09:04:34.000,1,1,1,125,1,,,0,"1,7068E+33",1,1,5,1,1,2,2,1,2,99,2,1,5486,5508,9999,2019-09-25 10:33:38.000,2019-09-25 10:53:49.000,1,1,2,2,1,2,1,1,6,9,9,9,99,1,1,1,3,1,3,2,2,2,2,2,2,2,2,1,500000,1,180000,2,0,2,0,1,40000,2,0,1,250000,2,0,970000,1,2,0,2,0,2,0,2,0,2,0,2,0,9,3,2019-09-25 10:34:16.000,2019-09-25 10:53:49.000,1,,,0,99,4,Firma_68001125548200000414_1.png,0,,1,3,15793607,0,2,2,5,2019-02-25 00:00:00,0,2019-05-07 00:00:00,0,0,218,11,5,9,99,2,99,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,2,1,9,9,9,99,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-09-25 10:46:50.000,2019-09-25 10:50:06.000,2019-09-25 10:51:19.000,2019-09-25 10:51:19.000,1,,1,0,0,0,0,1,0,0,1,1,0,1,1,1,0,0,B,4.0,"6,80017E+26",3,1,0,0,14559840000.0,0,,555,B04,7.123155,-73.139136,CUYANITA,04. OCCIDENTAL
237738,237739,68,SANTANDER,68001,BUCARAMANGA,2019-09-18 15:40:12.360,44,46213,"6,80011E+19",34,40018082017,1,1,0,128,128006,741136,4,0,SIN CORREGIMIENTO,0,SIN VEREDA,82,GIRARDOT,5480,0,1,2,2,1,Fachada68001057548000001223.jpg,2019-09-18 10:51:21.000,2019-09-18 11:14:09.000,712123600.0,-7313444600,712116249,-7313448032,939,2019-08-14 12:52:33.000,8,712123100.0,-7313446000.0,712106645.0,-7313445000.0,958,2019-09-18 10:55:35,18,0,5480,2019-08-14 12:53:03.000,1,1,1,57,1,,,0,"1,7068E+33",2,1,2,1,3,1,1,1,1,3,4,1,5543,5555,9999,2019-09-18 10:55:36.000,2019-09-18 11:14:09.000,1,1,4,3,3,1,1,1,1,1,9,1,99,2,1,1,1,1,2,1,2,2,1,1,2,2,2,1,900000,1,40000,2,0,1,80000,1,200000,1,35000,1,900000,1,30000,2185000,2,2,0,2,0,2,0,2,0,2,0,2,0,8,2,2019-09-18 10:57:37.000,2019-09-18 11:14:09.000,1,,,0,1,1,Firma_68001057548000001224_1.png,0,,1,2,15782700,0,1,2,5,2018-07-12 00:00:00,1,2018-07-13 00:00:00,0,0,862,11,5,9,99,2,99,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,2,1,9,9,9,99,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-09-18 11:05:16.000,2019-09-18 11:10:58.000,2019-09-18 11:12:47.000,2019-09-18 11:12:47.000,1,,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,C,12.0,"6,80017E+26",2,1,0,0,14559840000.0,0,,375,C12,7.121066,-73.134447,GIRARDOT,04. OCCIDENTAL
237739,237740,68,SANTANDER,68001,BUCARAMANGA,2019-11-05 17:08:58.407,73,82331,"6,8001E+19",7,40018082017,1,1,0,104,104010,740472,2,0,SIN CORREGIMIENTO,0,SIN VEREDA,71,BOSQUE NORTE,5473,0,1,2,1,1,Fachada68001019547300001799.jpg,2019-11-05 07:58:20.000,2019-11-05 12:14:29.000,714243400.0,-7313238400,714238689,-7313228754,912,2019-09-30 07:59:50.000,11,714241700.0,-7313237000.0,714236053.0,-7313234000.0,905,2019-11-05 07:59:10,6,0,5473,2019-09-30 08:01:16.000,1,1,1,19,2,,,0,"1,7068E+33",2,1,3,1,1,1,1,1,1,1,3,1,5490,5523,9999,2019-11-05 07:59:10.000,2019-11-05 12:14:29.000,1,1,3,2,2,1,1,1,1,1,9,1,99,1,1,1,1,1,2,2,2,2,2,2,2,2,2,1,200000,1,250000,1,60000,2,0,1,80000,2,0,1,350000,2,0,940000,1,2,0,2,0,2,0,2,0,2,0,2,0,6,2,2019-11-05 08:12:56.000,2019-11-05 08:13:06.000,3,2019-11-05 12:02:55.000,2019-11-05 12:14:29.000,1,1,1,Firma_68001019547300001800_1.png,0,,1,2,15834283,0,1,2,5,2018-11-01 00:00:00,1,2018-11-09 00:00:00,0,0,604,11,5,9,99,2,99,2,99,2,2,2,2,2,2,2,1,0,1,1,1,9,9,5,1,9,9,9,99,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-11-05 12:09:11.000,2019-11-05 12:11:53.000,2019-11-05 12:13:02.000,2019-11-05 12:13:02.000,1,,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,C,4.0,"6,80017E+26",2,1,0,0,14559840000.0,0,,225,C04,7.142361,-73.132345,BOSQUE NORTE,02. NORORIENTAL
237740,237741,68,SANTANDER,68001,BUCARAMANGA,2019-11-05 17:08:58.407,73,82782,"6,8001E+19",18,40018082017,1,1,0,104,104010,740452,2,0,SIN CORREGIMIENTO,0,SIN VEREDA,69,VILLA MERCEDES,5483,0,3,1,1,1,Fachada68001026548300002692.jpg,2019-11-05 09:58:57.000,2019-11-05 10:29:07.000,714341100.0,-7313351600,714345127,-7313345655,880,2019-09-30 10:28:15.000,7,714347200.0,-7313357000.0,714337048.0,-7313362000.0,873,2019-11-05 10:02:41,12,0,5483,2019-09-30 10:28:42.000,1,3,1,26,1,,,0,"1,7068E+33",1,1,3,1,1,1,1,1,1,1,3,3,5573,5571,9999,2019-11-05 10:02:41.000,2019-11-05 10:29:07.000,3,1,1,1,1,1,1,2,1,1,9,1,99,1,1,1,1,2,2,2,2,2,2,2,2,2,2,1,100000,2,0,2,0,2,0,2,0,2,0,1,80000,2,0,180000,2,2,0,2,0,2,0,2,0,2,0,2,0,1,1,2019-11-05 10:22:16.000,2019-11-05 10:29:07.000,1,,,0,1,1,Firma_68001026548300002692_3.png,0,,1,1,15834650,0,1,2,8,1998-11-25 00:00:00,20,2018-09-25 00:00:00,0,0,862,1,5,9,99,2,99,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,9,9,1,2,4,11,2,1,999,5,99,0,1,600000,99,99,0,2,0,2,0,2,0,2,0,2,0,2,0,0,0,2019-11-05 10:24:05.000,2019-11-05 10:26:39.000,2019-11-05 10:26:44.000,2019-11-05 10:26:44.000,1,1.0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,C,14.0,"6,80017E+26",1,1,0,0,14559840000.0,0,,375,C14,7.14337,-73.133621,VILLA MERCEDES,02. NORORIENTAL


In [44]:
sb2022.columns

Index(['documento', 'cod_dpto', 'Departamento', 'cod_mpio', 'Municipio',
       'fec_paquete', 'num_paquete', 'num_ficha', 'ide_ficha_origen',
       'ide_edificacion',
       ...
       'estado', 'Linea_p', 'replicacion', 'fec_actualizacion_cns', 'C',
       'Clasificacion', 'Latitud', 'Longitud', 'BARRIO', 'COMUNA'],
      dtype='object', length=250)

Fill the NA values with cero. Some columns have the "1" mark when the value is True. We fill the missing with "0".

In [45]:
f_sb2022 = sb2022.fillna(0)

And, display all the columns:

In [46]:
pd.set_option('display.max_columns', None)
f_sb2022.head()

Unnamed: 0,documento,cod_dpto,Departamento,cod_mpio,Municipio,fec_paquete,num_paquete,num_ficha,ide_ficha_origen,ide_edificacion,ver_estructura,ori_encuesta,Cod_clase,Cod_centro_poblado,Cod_area_coordinacion,Cod_area_operativa,Cod_uni_cobertura,Cod_comuna_x,Cod_corregimiento,NOM_CORREGIMIENTO,Cod_vereda,NOM_VEREDA,Cod_barrio,NOM_BARRIO,Cod_enumerador,tot_viviendas,tot_hogares,ord_vivienda,ind_direccion,uso_vivienda,Ide_foto,fec_ini_encuesta,fec_fin_encuesta,Coord_x_manual_rec,Coord_y_manual_rec,Coord_x_auto_rec,Coord_y_auto_rec,Gps_Alt_auto_rec,Fec_captura_gps_rec,Gps_Distancia_rec,Coord_x_manual_enc,Coord_y_manual_enc,Coord_x_auto_enc,Coord_y_auto_enc,Gps_Alt_auto_enc,Fec_captura_gps_enc,Gps_Distancia_enc,Est_nov_cartografia,Cod_digitador,Fec_digitacion,ind_formato,Num_hogares_recuento,est_ficha,Cod_equipo_encuesta,Num_visita,Cod_Chip,Dir_Chip,Num_solicitud,Cod_UC_total,tip_vivienda,tip_mat_paredes,tip_mat_pisos,ind_tiene_energia,tip_estrato_energia,ind_tiene_alcantarillado,ind_tiene_gas,ind_tiene_recoleccion,ind_tiene_acueducto,tip_estrato_acueducto,num_cuartos_vivienda,num_hogares_vivienda,cod_encuestador,cod_supervisor,cod_critico,fec_ini_vivienda,fec_fin_vivienda,ide_hogar,tip_ocupa_vivienda,num_cuartos_exclusivos,num_cuartos_dormir,num_cuartos_unicos_dormir,tip_sanitario,tip_ubi_sanitario,tip_uso_sanitario,tip_origen_agua,ind_agua_llega_7dias,num_dias_llega,ind_agua_llega_24horas,num_horas_llega,tip_uso_agua_beber,tip_elimina_basura,ind_tiene_cocina,tip_prepara_alimentos,tip_uso_cocina,tip_energia_cocina,ind_tiene_nevera,ind_tiene_lavadora,ind_tiene_pc,ind_tiene_internet,ind_tiene_moto,ind_tiene_tractor,ind_tiene_carro,ind_tiene_bien_raiz,ind_gasto_alimento,vlr_gasto_alimento,ind_gasto_transporte,vlr_gasto_transporte,ind_gasto_educacion,vlr_gasto_educacion,ind_gasto_salud,vlr_gasto_salud,ind_gasto_serv_publicos,vlr_gasto_serv_publicos,ind_gasto_celular,vlr_gasto_celular,ind_gasto_arriendo,vlr_gasto_arriendo,ind_gasto_otros,vlr_gasto_otros,vlr_total_gastos,num_habita_vivienda,ind_evento_inundacion,num_evento_inundacion,ind_evento_avalancha,num_evento_avalancha,ind_evento_terremoto,num_evento_terremoto,ind_evento_incendio,num_evento_incendio,ind_evento_vendaval,num_evento_vendaval,ind_evento_hundimiento,num_evento_hundimiento,num_personas_posibles,num_personas_hogar,fec_ini_visita1,fec_fin_visita1,res_visita1,fec_ini_visita2,fec_fin_visita2,res_visita2,ide_informante,Tip_informante,ide_firma_informante,Cau_sin_firma,Email_contacto,Ind_declaracion,ide_persona,ide_nacional,ind_escaner,sexo_persona,ori_persona,tip_documento,fec_nacimiento,edad_calculada,fec_documento,cod_dpto_documento,cod_mpio_documento,Cod_pais_documento,tip_parentesco,tip_estado_civil,ind_conyuge_vive_hogar,ide_conyuge,ind_padre_vive_hogar,ide_padre,ind_pariente_domestico,ide_serv_domestico,ind_discap_ver,ind_discap_oir,ind_discap_hablar,ind_discap_moverse,ind_discap_bañarse,ind_discap_salir,ind_discap_entender,ind_discap_ninguna,tip_seg_social,ind_enfermo_30,ind_acudio_salud,ind_fue_atendido_salud,ind_esta_embarazada,ind_tuvo_hijos,tip_cuidado_niños,ind_recibe_comida,ind_leer_escribir,ind_estudia,niv_educativo,grado_alcanzado,ind_fondo_pensiones,tip_actividad_mes,num_sem_buscando,tip_empleado,ind_ingr_salario,vlr_ingr_salario,ind_ingr_honorarios,vlr_ingr_honorarios,ind_ingr_cosecha,num_mes_ingr_cosecha,vlr_ingr_cosecha,ind_ingr_pension,vlr_ingr_pension,ind_ingr_remesa_pais,vlr_ingr_remesa_pais,ind_ingr_remesa_exterior,vlr_ingr_remesa_exterior,ind_ingr_arriendos,vlr_ingr_arriendos,ind_otros_ingresos,vlr_otros_ingresos,ind_ingr_estado,vlr_ingr_fam_accion,vlr_ingr_col_mayor,vlr_ingr_otro_subsidio,fec_ini_persona1,fec_fin_persona1,fec_ini_persona2,fec_fin_persona2,ide_Unigasto,Jefe_UG,H_5,I1,I2,I3,I4,I5,I6,I7,I8,I9,I10,I11,I12,I13,I14,I15,Grupo,Nivel,ide_Ug,persug,Capital,marca,estado,Linea_p,replicacion,fec_actualizacion_cns,C,Clasificacion,Latitud,Longitud,BARRIO,COMUNA
0,1,68,SANTANDER,68001,BUCARAMANGA,2019-09-10 15:39:36.563,37,39567,"6,80011E+19",3,40018082017,1,1,0,126,126003,741294,5,0,SIN CORREGIMIENTO,0,SIN VEREDA,145,CAMPO HERMOSO,5493,0,1,1,1,1,Fachada68001093549300001017.jpg,2019-09-10 11:00:31.000,2019-09-10 11:18:02.000,710643900.0,-7313946300,710635973,-7313932346,919,2019-08-10 12:01:20.000,17,710625500.0,-7313942000.0,710628748.0,-7313937000.0,898,2019-09-10 11:00:38,6,0,5493,2019-08-10 12:01:32.000,1,1,1,93,1,0.0,0.0,0,"1,7068E+33",2,1,2,1,2,1,1,1,1,2,4,1,5562,5518,9999,2019-09-10 11:00:38.000,2019-09-10 11:18:02.000,1,1,4,3,3,1,1,1,1,1,9,1,99,1,1,1,1,1,2,1,2,2,2,2,2,2,2,1,300000,2,0,2,0,2,0,1,100000,2,0,1,250000,2,0,650000,1,2,0,2,0,2,0,2,0,2,0,2,0,3,2,2019-09-10 11:03:42.000,2019-09-10 11:18:02.000,1,0,0,0,1,1,Firma_68001093549300001017_1.png,0,nicolepaola271213@gmail.com,1,2,15770911,0,2,2,5,2013-12-27 00:00:00,5,2013-12-30 00:00:00,0,0,862,3,5,9,99,1,1,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,9,9,2,1,1,0,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-09-10 11:10:16.000,2019-09-10 11:12:44.000,2019-09-10 11:15:42.000,2019-09-10 11:15:42.000,1,0.0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,C,7.0,"6,80017E+26",2,1,0,0,14559840000.0,1,2021-09-02 20:44:57.463,375,C07,7.10628748,-73.13936591,CAMPO HERMOSO,05. GARCIA ROVIRA
1,2,68,SANTANDER,68001,BUCARAMANGA,2019-10-03 16:57:01.163,56,62156,"6,80011E+19",20,40018082017,1,1,0,110,110005,743596,14,0,SIN CORREGIMIENTO,0,SIN VEREDA,114,BUENOS AIRES,5486,0,1,1,1,1,Fachada68001065548600002007.jpg,2019-10-03 09:42:06.000,2019-10-03 10:03:35.000,713184200.0,-7309992200,713187966,-7310000482,1241,2019-09-12 12:04:38.000,10,713180900.0,-7309990000.0,713187059.0,-7309989000.0,1231,2019-10-03 09:43:18,6,0,5486,2019-09-12 12:04:59.000,1,1,1,65,1,0.0,0.0,0,"1,7068E+33",2,1,3,1,1,1,1,1,1,1,4,1,5539,5528,9999,2019-10-03 09:43:18.000,2019-10-03 10:03:35.000,1,1,4,3,3,1,1,1,1,1,9,1,99,1,1,1,3,1,2,1,2,2,2,2,2,2,2,1,450000,1,100000,2,0,2,0,1,100000,2,0,1,400000,2,0,1050000,2,2,0,2,0,2,0,2,0,2,0,2,0,4,4,2019-10-03 09:45:36.000,2019-10-03 10:03:35.000,1,0,0,0,1,1,Firma_68001065548600002007_1.png,0,0,1,4,15805690,0,2,2,5,2014-08-20 00:00:00,5,2014-11-11 00:00:00,0,0,862,14,5,9,99,2,99,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,9,9,2,2,0,0,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-10-03 09:55:31.000,2019-10-03 09:57:33.000,2019-10-03 10:00:08.000,2019-10-03 10:00:08.000,1,0.0,1,1,0,1,0,1,0,0,1,1,0,0,0,0,0,0,B,2.0,"6,80017E+26",4,1,0,0,14559840000.0,0,0,6,B02,7.13187059,-73.09989155,BUENOS AIRES,14. MORRORICO
2,3,68,SANTANDER,68001,BUCARAMANGA,2021-11-20 16:28:56.757,1269,43544,"6,8001E+19",8,40118082017,1,1,0,115,115001,742122,9,0,SIN CORREGIMIENTO,0,SIN VEREDA,193,NUEVA GRANADA,5475,0,1,1,1,1,Fachada68001001547500000397.jpg,2019-08-26 07:56:27.000,2021-11-20 08:25:00.000,709813300.0,-7311354000,709820867,-7311348259,894,2019-08-02 07:49:47.000,10,709814300.0,-7311353000.0,709810543.0,-7311369000.0,895,Aug 26 2019 7:56AM,19,0,5475,2019-08-02 07:49:57.000,1,1,1,1,2,0.0,0.0,65502,"1,7068E+33",1,1,2,1,3,1,1,1,1,3,5,1,5534,5523,9999,2019-08-26 07:56:45.000,2021-11-20 08:25:00.000,1,3,5,4,4,1,1,1,1,1,9,1,99,1,1,1,1,1,2,1,1,1,1,2,2,2,2,1,250000,1,20000,2,0,1,103600,1,235000,1,5000,2,0,2,0,613600,4,2,0,2,0,2,0,2,0,2,0,2,0,2,2,2019-08-26 08:00:38.000,2019-08-26 08:21:44.000,1,0,2021-11-20 08:25:00.000,1,2,1,Firma_68001001547500000397_1.png,0,0,1,1,15738860,1,1,1,3,1930-02-24 00:00:00,91,1953-07-29 00:00:00,11,11001,170,1,3,9,99,2,99,2,99,2,2,2,2,2,2,2,1,1,2,9,9,9,9,9,9,2,2,0,0,2,7,999,99,9,0,9,0,9,99,0,2,0,2,0,2,0,2,0,2,0,2,0,0,0,2019-08-26 08:10:14.000,2019-08-26 08:11:32.000,2021-11-20 08:21:02.000,2021-11-20 08:21:02.000,1,1.0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,D,3.0,"6,80017E+26",2,1,0,0,14559840000.0,1,2021-11-20 16:39:25.767,6,D03,7.09810543,-73.11369419,NUEVA GRANADA,09. LA PEDREGOSA
3,4,68,SANTANDER,68001,BUCARAMANGA,2019-09-12 14:46:54.523,39,43875,"6,8001E+19",16,40018082017,1,1,0,126,126007,741367,5,0,SIN CORREGIMIENTO,0,SIN VEREDA,142,ALFONSO LÓPEZ,5483,0,2,1,1,1,Fachada68001026548300001183.jpg,2019-09-12 08:59:11.000,2019-09-12 09:13:02.000,711285300.0,-7313170000,711291560,-7313165695,934,2019-08-13 07:36:02.000,8,711301200.0,-7313176000.0,711287649.0,-7313167000.0,917,2019-09-12 08:59:24,18,0,5483,2019-08-13 07:36:22.000,1,2,1,26,1,0.0,0.0,0,"1,7068E+33",1,1,3,1,2,1,1,1,1,2,3,2,5566,5540,9999,2019-09-12 08:59:25.000,2019-09-12 09:13:02.000,2,1,0,0,0,1,1,2,1,1,9,1,99,1,1,1,5,3,7,2,2,2,2,2,2,2,2,1,200000,2,0,2,0,2,0,2,0,2,0,1,120000,2,0,320000,2,2,0,2,0,2,0,2,0,2,0,2,0,1,1,2019-09-12 09:08:45.000,2019-09-12 09:13:02.000,1,0,0,0,1,1,Firma_68001026548300001183_2.png,0,0,1,1,15773328,1,1,1,3,1931-09-09 00:00:00,87,1953-09-25 00:00:00,11,11001,170,1,4,9,99,2,99,2,99,1,2,1,2,2,1,2,2,3,2,9,9,9,9,9,9,1,2,2,3,2,1,999,5,99,0,1,200000,99,99,0,2,0,1,100000,2,0,2,0,2,0,1,0,50000,0,2019-09-12 09:10:11.000,2019-09-12 09:11:03.000,2019-09-12 09:11:06.000,2019-09-12 09:11:06.000,1,1.0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,C,7.0,"6,80017E+26",1,1,0,0,14559840000.0,0,0,3,C07,7.11287649,-73.1316689,ALFONSO LOPEZ,05. GARCIA ROVIRA
4,5,68,SANTANDER,68001,BUCARAMANGA,2019-09-24 14:55:21.997,48,53705,"6,80011E+19",3,40018082017,1,1,0,128,128007,741235,4,0,SIN CORREGIMIENTO,0,SIN VEREDA,85,CAMILO TORRES,5485,0,1,2,2,1,Fachada68001067548500001513.jpg,2019-09-24 08:20:07.000,2019-09-24 08:42:56.000,712222100.0,-7313914800,712198417,-7313905531,892,2019-09-06 08:40:40.000,28,712197200.0,-7313910000.0,712197304.0,-7313916000.0,945,2019-09-24 08:20:18,6,0,5485,2019-09-06 08:42:11.000,1,1,1,67,1,0.0,0.0,0,"1,7068E+33",2,1,3,1,1,2,2,1,2,99,4,1,5509,5559,9999,2019-09-24 08:20:18.000,2019-09-24 08:42:56.000,1,1,1,1,1,3,1,1,6,9,9,9,99,1,1,1,1,1,3,1,2,2,2,2,2,2,2,1,300000,2,0,2,0,2,0,1,100000,2,0,1,350000,2,0,750000,2,2,0,2,0,2,0,2,0,2,0,2,0,8,4,2019-09-24 08:26:22.000,2019-09-24 08:42:56.000,1,0,0,0,2,1,Firma_68001067548500001514_1.png,0,0,1,4,15790647,0,1,2,5,2016-01-06 00:00:00,3,2016-01-26 00:00:00,0,0,862,11,5,9,99,2,99,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,2,1,9,9,9,99,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-09-24 08:36:44.000,2019-09-24 08:39:02.000,2019-09-24 08:41:29.000,2019-09-24 08:41:29.000,1,0.0,1,1,0,0,0,1,0,0,1,1,0,1,1,0,0,1,A,5.0,"6,80017E+26",4,1,0,0,14559840000.0,0,0,705,A05,7.12197304,-73.13916206,ASENTAMIENTO CAMILO TORRES,04. OCCIDENTAL


Now, we analyzed all column's data type:

In [47]:
f_sb2022.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 237741 entries, 0 to 237740
Columns: 250 entries, documento to COMUNA
dtypes: float64(10), int64(203), object(37)
memory usage: 453.5+ MB


We observed that all columns have the "object" data type for default. We will need to change this data type for some specific columns that have int or float values.

But, we decided to explore all data type columns with a for statement and print the five first values for each column.

In [48]:
for col in f_sb2022:
    print (f_sb2022[col].apply(type))

0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: documento, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: cod_dpto, Length: 237741, dtype: object
0         <class 'str'>
1         <class 'str'>
2         <class 'str'>
3         <class 'str'>
4         <class 'str'>
              ...      
237736    <class 'str'>
237737    <class 'str'>
237738    <class 'str'>
237739    <class 'str'>
237740    <class 'str'>
Name: Departamento, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 

0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: ord_vivienda, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: ind_direccion, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: uso_vivienda, Length: 237741, dtype: object
0         <class 'str'>
1         <class 'str'>
2        

0         <class 'float'>
1         <class 'float'>
2         <class 'float'>
3         <class 'float'>
4         <class 'float'>
               ...       
237736    <class 'float'>
237737    <class 'float'>
237738    <class 'float'>
237739    <class 'float'>
237740    <class 'float'>
Name: Cod_Chip, Length: 237741, dtype: object
0         <class 'float'>
1         <class 'float'>
2         <class 'float'>
3         <class 'float'>
4         <class 'float'>
               ...       
237736    <class 'float'>
237737    <class 'float'>
237738    <class 'float'>
237739    <class 'float'>
237740    <class 'float'>
Name: Dir_Chip, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: Num_solicitud, Length: 237741, dtype: object
0         <class 'str

0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: ind_agua_llega_7dias, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: num_dias_llega, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: ind_agua_llega_24horas, Length: 237741, dtype: object
0         <class 'int'>
1         <cla

0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: ind_gasto_serv_publicos, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: vlr_gasto_serv_publicos, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: ind_gasto_celular, Length: 237741, dtype: object
0         <class 'int'>
1      

0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'str'>
237740    <class 'int'>
Name: fec_ini_visita2, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'str'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'str'>
237740    <class 'int'>
Name: fec_fin_visita2, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: res_visita2, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2    

0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: ind_pariente_domestico, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: ide_serv_domestico, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: ind_discap_ver, Length: 237741, dtype: object
0         <class 'int'>
1         <class

0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: vlr_ingr_salario, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: ind_ingr_honorarios, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: vlr_ingr_honorarios, Length: 237741, dtype: object
0         <class 'int'>
1         <class

0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: I3, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: I4, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class 'int'>
4         <class 'int'>
              ...      
237736    <class 'int'>
237737    <class 'int'>
237738    <class 'int'>
237739    <class 'int'>
237740    <class 'int'>
Name: I5, Length: 237741, dtype: object
0         <class 'int'>
1         <class 'int'>
2         <class 'int'>
3         <class

0         <class 'str'>
1         <class 'str'>
2         <class 'str'>
3         <class 'str'>
4         <class 'str'>
              ...      
237736    <class 'str'>
237737    <class 'str'>
237738    <class 'str'>
237739    <class 'str'>
237740    <class 'str'>
Name: COMUNA, Length: 237741, dtype: object


Now, we change the data type from some columns with no string values.

In [49]:
f_sb2022 = f_sb2022.astype({'documento':'int', 'cod_dpto':'int', 'cod_mpio':'int',
                            'num_paquete':'int', 'num_ficha':'int', 'ide_edificacion':'int',
                            'ori_encuesta':'int', 'Cod_clase':'int', 'Cod_centro_poblado':'int',
                            'Cod_area_coordinacion':'int', 'Cod_area_operativa':'int', 'Cod_uni_cobertura':'int',
                            'Cod_comuna_x':'int', 'Cod_corregimiento':'int', 'Cod_vereda':'int',
                            'Cod_barrio':'int', 'Cod_enumerador':'int', 'tot_viviendas':'int',
                            'tot_hogares':'int', 'ord_vivienda':'int','ind_direccion':'int',
                            'uso_vivienda':'int', 'Coord_x_manual_rec':'int','Coord_y_manual_rec':'int',
                            'Coord_x_auto_rec':'int','Coord_y_auto_rec':'int','Gps_Alt_auto_rec':'int',
                            'Gps_Distancia_rec':'int','Coord_x_manual_enc':'int','Coord_y_manual_enc':'int',
                            'Coord_x_auto_enc':'int','Coord_y_auto_enc':'int','Gps_Alt_auto_enc':'int',
                            'Gps_Distancia_enc':'int','Est_nov_cartografia':'int','Cod_digitador':'int',
                            'ind_formato':'int','Num_hogares_recuento':'int','est_ficha':'int',
                            'Cod_equipo_encuesta':'int','Num_visita':'int','Cod_Chip':'int',
                            'Dir_Chip':'int','Num_solicitud':'int','tip_vivienda':'int',
                            'tip_mat_paredes':'int','tip_mat_pisos':'int','ind_tiene_energia':'int',
                            'tip_estrato_energia':'int','ind_tiene_alcantarillado':'int','ind_tiene_gas':'int',
                            'ind_tiene_recoleccion':'int','ind_tiene_acueducto':'int',
                            'tip_estrato_acueducto':'int','num_cuartos_vivienda':'int',
                            'num_hogares_vivienda':'int','cod_encuestador':'int','cod_supervisor':'int',
                            'cod_critico':'int','ide_hogar':'int','tip_ocupa_vivienda':'int',
                            'num_cuartos_exclusivos':'int','num_cuartos_dormir':'int',
                            'num_cuartos_unicos_dormir':'int','tip_sanitario':'int','tip_ubi_sanitario':'int',
                            'tip_uso_sanitario':'int','tip_origen_agua':'int','ind_agua_llega_7dias':'int',
                            'num_dias_llega':'int','ind_agua_llega_24horas':'int','num_horas_llega':'int',
                            'tip_uso_agua_beber':'int','tip_elimina_basura':'int','ind_tiene_cocina':'int',
                            'tip_prepara_alimentos':'int','tip_uso_cocina':'int','tip_energia_cocina':'int',
                            'ind_tiene_nevera':'int','ind_tiene_lavadora':'int','ind_tiene_pc':'int',
                            'ind_tiene_internet':'int','ind_tiene_moto':'int','ind_tiene_tractor':'int',
                            'ind_tiene_carro':'int','ind_tiene_bien_raiz':'int','ind_gasto_alimento':'int',
                            'vlr_gasto_alimento':'int','ind_gasto_transporte':'int','vlr_gasto_transporte':'int',
                            'ind_gasto_educacion':'int','vlr_gasto_educacion':'int','ind_gasto_salud':'int',
                            'vlr_gasto_salud':'int','ind_gasto_serv_publicos':'int','vlr_gasto_serv_publicos':'int',
                            'ind_gasto_celular':'int','vlr_gasto_celular':'int','ind_gasto_arriendo':'int',
                            'vlr_gasto_arriendo':'int','ind_gasto_otros':'int','vlr_gasto_otros':'int',
                            'vlr_total_gastos':'int','num_habita_vivienda':'int','ind_evento_inundacion':'int',
                            'num_evento_inundacion':'int','ind_evento_avalancha':'int','num_evento_avalancha':'int',
                            'ind_evento_terremoto':'int','num_evento_terremoto':'int','ind_evento_incendio':'int',
                            'num_evento_incendio':'int','ind_evento_vendaval':'int','num_evento_vendaval':'int',
                            'ind_evento_hundimiento':'int','num_evento_hundimiento':'int',
                            'num_personas_posibles':'int','num_personas_hogar':'int',
                            'res_visita1':'int','res_visita2':'int','ide_informante':'int','Tip_informante':'int',
                            'Cau_sin_firma':'int','Ind_declaracion':'int','ide_persona':'int',
                            'ide_nacional':'int','ind_escaner':'int','sexo_persona':'int','ori_persona':'int',
                            'tip_documento':'int','edad_calculada':'int', 'cod_dpto_documento':'int',
                            'cod_mpio_documento':'int','Cod_pais_documento':'int','tip_parentesco':'int',
                            'tip_estado_civil':'int','ind_conyuge_vive_hogar':'int','ide_conyuge':'int',
                            'ind_padre_vive_hogar':'int','ide_padre':'int','ind_pariente_domestico':'int',
                            'ide_serv_domestico':'int','ind_discap_ver':'int','ind_discap_oir':'int',
                            'ind_discap_hablar':'int','ind_discap_moverse':'int','ind_discap_bañarse':'int',
                            'ind_discap_salir':'int','ind_discap_entender':'int','ind_discap_ninguna':'int',
                            'tip_seg_social':'int','ind_enfermo_30':'int','ind_acudio_salud':'int',
                            'ind_fue_atendido_salud':'int','ind_esta_embarazada':'int','ind_tuvo_hijos':'int',
                            'tip_cuidado_niños':'int','ind_recibe_comida':'int','ind_leer_escribir':'int',
                            'ind_estudia':'int','niv_educativo':'int','grado_alcanzado':'int','ind_fondo_pensiones':'int',
                            'tip_actividad_mes':'int','num_sem_buscando':'int','tip_empleado':'int',
                            'ind_ingr_salario':'int','vlr_ingr_salario':'int','ind_ingr_honorarios':'int',
                            'vlr_ingr_honorarios':'int','ind_ingr_cosecha':'int','num_mes_ingr_cosecha':'int',
                            'vlr_ingr_cosecha':'int','ind_ingr_pension':'int','vlr_ingr_pension':'int',
                            'ind_ingr_remesa_pais':'int','vlr_ingr_remesa_pais':'int','ind_ingr_remesa_exterior':'int',
                            'vlr_ingr_remesa_exterior':'int','ind_ingr_arriendos':'int','vlr_ingr_arriendos':'int',
                            'ind_otros_ingresos':'int','vlr_otros_ingresos':'int','ind_ingr_estado':'int',
                            'vlr_ingr_fam_accion':'int','vlr_ingr_col_mayor':'int','vlr_ingr_otro_subsidio':'int',
                            'ide_Unigasto':'int','Jefe_UG':'int','H_5':'int','I1':'int','I2':'int','I3':'int',
                            'I4':'int','I5':'int','I6':'int','I7':'int','I8':'int','I9':'int','I10':'int','I11':'int',
                            'I12':'int','I13':'int','I14':'int','I15':'int','Nivel':'int',
                            'persug':'int','Capital':'int','marca':'int','estado':'int','Linea_p':'int',
                            'replicacion':'int'
})

Now, we change to datetime data type for all date columns:

In [50]:
date_c_2022 = ['fec_paquete','fec_ini_encuesta','fec_fin_encuesta','Fec_captura_gps_rec','Fec_captura_gps_enc',
               'Fec_digitacion','fec_ini_vivienda','fec_fin_vivienda','fec_ini_visita1','fec_fin_visita1',
               'fec_ini_visita2','fec_fin_visita2','fec_nacimiento','fec_documento','fec_ini_persona1',
               'fec_fin_persona1','fec_ini_persona2','fec_fin_persona2','fec_actualizacion_cns']
for i in range(len(date_c_2022)):
    f_sb2022[date_c_2022[i]] = pd.to_datetime(f_sb2022[date_c_2022[i]])

And show the results:

In [51]:
f_sb2022.head()

Unnamed: 0,documento,cod_dpto,Departamento,cod_mpio,Municipio,fec_paquete,num_paquete,num_ficha,ide_ficha_origen,ide_edificacion,ver_estructura,ori_encuesta,Cod_clase,Cod_centro_poblado,Cod_area_coordinacion,Cod_area_operativa,Cod_uni_cobertura,Cod_comuna_x,Cod_corregimiento,NOM_CORREGIMIENTO,Cod_vereda,NOM_VEREDA,Cod_barrio,NOM_BARRIO,Cod_enumerador,tot_viviendas,tot_hogares,ord_vivienda,ind_direccion,uso_vivienda,Ide_foto,fec_ini_encuesta,fec_fin_encuesta,Coord_x_manual_rec,Coord_y_manual_rec,Coord_x_auto_rec,Coord_y_auto_rec,Gps_Alt_auto_rec,Fec_captura_gps_rec,Gps_Distancia_rec,Coord_x_manual_enc,Coord_y_manual_enc,Coord_x_auto_enc,Coord_y_auto_enc,Gps_Alt_auto_enc,Fec_captura_gps_enc,Gps_Distancia_enc,Est_nov_cartografia,Cod_digitador,Fec_digitacion,ind_formato,Num_hogares_recuento,est_ficha,Cod_equipo_encuesta,Num_visita,Cod_Chip,Dir_Chip,Num_solicitud,Cod_UC_total,tip_vivienda,tip_mat_paredes,tip_mat_pisos,ind_tiene_energia,tip_estrato_energia,ind_tiene_alcantarillado,ind_tiene_gas,ind_tiene_recoleccion,ind_tiene_acueducto,tip_estrato_acueducto,num_cuartos_vivienda,num_hogares_vivienda,cod_encuestador,cod_supervisor,cod_critico,fec_ini_vivienda,fec_fin_vivienda,ide_hogar,tip_ocupa_vivienda,num_cuartos_exclusivos,num_cuartos_dormir,num_cuartos_unicos_dormir,tip_sanitario,tip_ubi_sanitario,tip_uso_sanitario,tip_origen_agua,ind_agua_llega_7dias,num_dias_llega,ind_agua_llega_24horas,num_horas_llega,tip_uso_agua_beber,tip_elimina_basura,ind_tiene_cocina,tip_prepara_alimentos,tip_uso_cocina,tip_energia_cocina,ind_tiene_nevera,ind_tiene_lavadora,ind_tiene_pc,ind_tiene_internet,ind_tiene_moto,ind_tiene_tractor,ind_tiene_carro,ind_tiene_bien_raiz,ind_gasto_alimento,vlr_gasto_alimento,ind_gasto_transporte,vlr_gasto_transporte,ind_gasto_educacion,vlr_gasto_educacion,ind_gasto_salud,vlr_gasto_salud,ind_gasto_serv_publicos,vlr_gasto_serv_publicos,ind_gasto_celular,vlr_gasto_celular,ind_gasto_arriendo,vlr_gasto_arriendo,ind_gasto_otros,vlr_gasto_otros,vlr_total_gastos,num_habita_vivienda,ind_evento_inundacion,num_evento_inundacion,ind_evento_avalancha,num_evento_avalancha,ind_evento_terremoto,num_evento_terremoto,ind_evento_incendio,num_evento_incendio,ind_evento_vendaval,num_evento_vendaval,ind_evento_hundimiento,num_evento_hundimiento,num_personas_posibles,num_personas_hogar,fec_ini_visita1,fec_fin_visita1,res_visita1,fec_ini_visita2,fec_fin_visita2,res_visita2,ide_informante,Tip_informante,ide_firma_informante,Cau_sin_firma,Email_contacto,Ind_declaracion,ide_persona,ide_nacional,ind_escaner,sexo_persona,ori_persona,tip_documento,fec_nacimiento,edad_calculada,fec_documento,cod_dpto_documento,cod_mpio_documento,Cod_pais_documento,tip_parentesco,tip_estado_civil,ind_conyuge_vive_hogar,ide_conyuge,ind_padre_vive_hogar,ide_padre,ind_pariente_domestico,ide_serv_domestico,ind_discap_ver,ind_discap_oir,ind_discap_hablar,ind_discap_moverse,ind_discap_bañarse,ind_discap_salir,ind_discap_entender,ind_discap_ninguna,tip_seg_social,ind_enfermo_30,ind_acudio_salud,ind_fue_atendido_salud,ind_esta_embarazada,ind_tuvo_hijos,tip_cuidado_niños,ind_recibe_comida,ind_leer_escribir,ind_estudia,niv_educativo,grado_alcanzado,ind_fondo_pensiones,tip_actividad_mes,num_sem_buscando,tip_empleado,ind_ingr_salario,vlr_ingr_salario,ind_ingr_honorarios,vlr_ingr_honorarios,ind_ingr_cosecha,num_mes_ingr_cosecha,vlr_ingr_cosecha,ind_ingr_pension,vlr_ingr_pension,ind_ingr_remesa_pais,vlr_ingr_remesa_pais,ind_ingr_remesa_exterior,vlr_ingr_remesa_exterior,ind_ingr_arriendos,vlr_ingr_arriendos,ind_otros_ingresos,vlr_otros_ingresos,ind_ingr_estado,vlr_ingr_fam_accion,vlr_ingr_col_mayor,vlr_ingr_otro_subsidio,fec_ini_persona1,fec_fin_persona1,fec_ini_persona2,fec_fin_persona2,ide_Unigasto,Jefe_UG,H_5,I1,I2,I3,I4,I5,I6,I7,I8,I9,I10,I11,I12,I13,I14,I15,Grupo,Nivel,ide_Ug,persug,Capital,marca,estado,Linea_p,replicacion,fec_actualizacion_cns,C,Clasificacion,Latitud,Longitud,BARRIO,COMUNA
0,1,68,SANTANDER,68001,BUCARAMANGA,2019-09-10 15:39:36.563,37,39567,"6,80011E+19",3,40018082017,1,1,0,126,126003,741294,5,0,SIN CORREGIMIENTO,0,SIN VEREDA,145,CAMPO HERMOSO,5493,0,1,1,1,1,Fachada68001093549300001017.jpg,2019-09-10 11:00:31,2019-09-10 11:18:02,710643900,1275988292,710635973,1276002246,919,2019-08-10 12:01:20,17,710625500,-2147483648,710628748,-2147483648,898,2019-09-10 11:00:38,6,0,5493,2019-08-10 12:01:32,1,1,1,93,1,0,0,0,"1,7068E+33",2,1,2,1,2,1,1,1,1,2,4,1,5562,5518,9999,2019-09-10 11:00:38,2019-09-10 11:18:02,1,1,4,3,3,1,1,1,1,1,9,1,99,1,1,1,1,1,2,1,2,2,2,2,2,2,2,1,300000,2,0,2,0,2,0,1,100000,2,0,1,250000,2,0,650000,1,2,0,2,0,2,0,2,0,2,0,2,0,3,2,2019-09-10 11:03:42,2019-09-10 11:18:02,1,1970-01-01,1970-01-01 00:00:00,0,1,1,Firma_68001093549300001017_1.png,0,nicolepaola271213@gmail.com,1,2,15770911,0,2,2,5,2013-12-27,5,2013-12-30,0,0,862,3,5,9,99,1,1,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,9,9,2,1,1,0,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-09-10 11:10:16,2019-09-10 11:12:44,2019-09-10 11:15:42,2019-09-10 11:15:42,1,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,C,7,"6,80017E+26",2,1,0,0,-2147483648,1,2021-09-02 20:44:57.463,375,C07,7.10628748,-73.13936591,CAMPO HERMOSO,05. GARCIA ROVIRA
1,2,68,SANTANDER,68001,BUCARAMANGA,2019-10-03 16:57:01.163,56,62156,"6,80011E+19",20,40018082017,1,1,0,110,110005,743596,14,0,SIN CORREGIMIENTO,0,SIN VEREDA,114,BUENOS AIRES,5486,0,1,1,1,1,Fachada68001065548600002007.jpg,2019-10-03 09:42:06,2019-10-03 10:03:35,713184200,1279942392,713187966,1279934110,1241,2019-09-12 12:04:38,10,713180900,-2147483648,713187059,-2147483648,1231,2019-10-03 09:43:18,6,0,5486,2019-09-12 12:04:59,1,1,1,65,1,0,0,0,"1,7068E+33",2,1,3,1,1,1,1,1,1,1,4,1,5539,5528,9999,2019-10-03 09:43:18,2019-10-03 10:03:35,1,1,4,3,3,1,1,1,1,1,9,1,99,1,1,1,3,1,2,1,2,2,2,2,2,2,2,1,450000,1,100000,2,0,2,0,1,100000,2,0,1,400000,2,0,1050000,2,2,0,2,0,2,0,2,0,2,0,2,0,4,4,2019-10-03 09:45:36,2019-10-03 10:03:35,1,1970-01-01,1970-01-01 00:00:00,0,1,1,Firma_68001065548600002007_1.png,0,0,1,4,15805690,0,2,2,5,2014-08-20,5,2014-11-11,0,0,862,14,5,9,99,2,99,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,9,9,2,2,0,0,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-10-03 09:55:31,2019-10-03 09:57:33,2019-10-03 10:00:08,2019-10-03 10:00:08,1,0,1,1,0,1,0,1,0,0,1,1,0,0,0,0,0,0,B,2,"6,80017E+26",4,1,0,0,-2147483648,0,1970-01-01 00:00:00.000,6,B02,7.13187059,-73.09989155,BUENOS AIRES,14. MORRORICO
2,3,68,SANTANDER,68001,BUCARAMANGA,2021-11-20 16:28:56.757,1269,43544,"6,8001E+19",8,40118082017,1,1,0,115,115001,742122,9,0,SIN CORREGIMIENTO,0,SIN VEREDA,193,NUEVA GRANADA,5475,0,1,1,1,1,Fachada68001001547500000397.jpg,2019-08-26 07:56:27,2021-11-20 08:25:00,709813300,1278580592,709820867,1278586333,894,2019-08-02 07:49:47,10,709814300,-2147483648,709810543,-2147483648,895,2019-08-26 07:56:00,19,0,5475,2019-08-02 07:49:57,1,1,1,1,2,0,0,65502,"1,7068E+33",1,1,2,1,3,1,1,1,1,3,5,1,5534,5523,9999,2019-08-26 07:56:45,2021-11-20 08:25:00,1,3,5,4,4,1,1,1,1,1,9,1,99,1,1,1,1,1,2,1,1,1,1,2,2,2,2,1,250000,1,20000,2,0,1,103600,1,235000,1,5000,2,0,2,0,613600,4,2,0,2,0,2,0,2,0,2,0,2,0,2,2,2019-08-26 08:00:38,2019-08-26 08:21:44,1,1970-01-01,2021-11-20 08:25:00,1,2,1,Firma_68001001547500000397_1.png,0,0,1,1,15738860,1,1,1,3,1930-02-24,91,1953-07-29,11,11001,170,1,3,9,99,2,99,2,99,2,2,2,2,2,2,2,1,1,2,9,9,9,9,9,9,2,2,0,0,2,7,999,99,9,0,9,0,9,99,0,2,0,2,0,2,0,2,0,2,0,2,0,0,0,2019-08-26 08:10:14,2019-08-26 08:11:32,2021-11-20 08:21:02,2021-11-20 08:21:02,1,1,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,D,3,"6,80017E+26",2,1,0,0,-2147483648,1,2021-11-20 16:39:25.767,6,D03,7.09810543,-73.11369419,NUEVA GRANADA,09. LA PEDREGOSA
3,4,68,SANTANDER,68001,BUCARAMANGA,2019-09-12 14:46:54.523,39,43875,"6,8001E+19",16,40018082017,1,1,0,126,126007,741367,5,0,SIN CORREGIMIENTO,0,SIN VEREDA,142,ALFONSO LÓPEZ,5483,0,2,1,1,1,Fachada68001026548300001183.jpg,2019-09-12 08:59:11,2019-09-12 09:13:02,711285300,1276764592,711291560,1276768897,934,2019-08-13 07:36:02,8,711301200,-2147483648,711287649,-2147483648,917,2019-09-12 08:59:24,18,0,5483,2019-08-13 07:36:22,1,2,1,26,1,0,0,0,"1,7068E+33",1,1,3,1,2,1,1,1,1,2,3,2,5566,5540,9999,2019-09-12 08:59:25,2019-09-12 09:13:02,2,1,0,0,0,1,1,2,1,1,9,1,99,1,1,1,5,3,7,2,2,2,2,2,2,2,2,1,200000,2,0,2,0,2,0,2,0,2,0,1,120000,2,0,320000,2,2,0,2,0,2,0,2,0,2,0,2,0,1,1,2019-09-12 09:08:45,2019-09-12 09:13:02,1,1970-01-01,1970-01-01 00:00:00,0,1,1,Firma_68001026548300001183_2.png,0,0,1,1,15773328,1,1,1,3,1931-09-09,87,1953-09-25,11,11001,170,1,4,9,99,2,99,2,99,1,2,1,2,2,1,2,2,3,2,9,9,9,9,9,9,1,2,2,3,2,1,999,5,99,0,1,200000,99,99,0,2,0,1,100000,2,0,2,0,2,0,1,0,50000,0,2019-09-12 09:10:11,2019-09-12 09:11:03,2019-09-12 09:11:06,2019-09-12 09:11:06,1,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,C,7,"6,80017E+26",1,1,0,0,-2147483648,0,1970-01-01 00:00:00.000,3,C07,7.11287649,-73.1316689,ALFONSO LOPEZ,05. GARCIA ROVIRA
4,5,68,SANTANDER,68001,BUCARAMANGA,2019-09-24 14:55:21.997,48,53705,"6,80011E+19",3,40018082017,1,1,0,128,128007,741235,4,0,SIN CORREGIMIENTO,0,SIN VEREDA,85,CAMILO TORRES,5485,0,1,2,2,1,Fachada68001067548500001513.jpg,2019-09-24 08:20:07,2019-09-24 08:42:56,712222100,1276019792,712198417,1276029061,892,2019-09-06 08:40:40,28,712197200,-2147483648,712197304,-2147483648,945,2019-09-24 08:20:18,6,0,5485,2019-09-06 08:42:11,1,1,1,67,1,0,0,0,"1,7068E+33",2,1,3,1,1,2,2,1,2,99,4,1,5509,5559,9999,2019-09-24 08:20:18,2019-09-24 08:42:56,1,1,1,1,1,3,1,1,6,9,9,9,99,1,1,1,1,1,3,1,2,2,2,2,2,2,2,1,300000,2,0,2,0,2,0,1,100000,2,0,1,350000,2,0,750000,2,2,0,2,0,2,0,2,0,2,0,2,0,8,4,2019-09-24 08:26:22,2019-09-24 08:42:56,1,1970-01-01,1970-01-01 00:00:00,0,2,1,Firma_68001067548500001514_1.png,0,0,1,4,15790647,0,1,2,5,2016-01-06,3,2016-01-26,0,0,862,11,5,9,99,2,99,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,2,1,9,9,9,99,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-09-24 08:36:44,2019-09-24 08:39:02,2019-09-24 08:41:29,2019-09-24 08:41:29,1,0,1,1,0,0,0,1,0,0,1,1,0,1,1,0,0,1,A,5,"6,80017E+26",4,1,0,0,-2147483648,0,1970-01-01 00:00:00.000,705,A05,7.12197304,-73.13916206,ASENTAMIENTO CAMILO TORRES,04. OCCIDENTAL


Now, we observed that some specific columns (such as C, latitud and longitud) have "," for decimal point. And some sintax error such as N.ULL or 0..000.

For "c" columns:

In [52]:
f_sb2022['C'] = f_sb2022['C'].str.replace(",", ".").astype(float)

For geographic ubication columns:

In [53]:
f_sb2022['Latitud'] = f_sb2022['Latitud'].str.replace(",", ".")

In [54]:
f_sb2022['Latitud'] = f_sb2022['Latitud'].replace(['N.ULL', '0..00000000'], ["0", "0"])

In [55]:
f_sb2022['Longitud'] = f_sb2022['Latitud'].replace(['NUL.L'], ["0"])

And change the column's data type

In [56]:
f_sb2022 = f_sb2022.astype({'Latitud':'float', 'Longitud':'float'})

Well, there're our final dataset:

In [57]:
f_sb2022.head()

Unnamed: 0,documento,cod_dpto,Departamento,cod_mpio,Municipio,fec_paquete,num_paquete,num_ficha,ide_ficha_origen,ide_edificacion,ver_estructura,ori_encuesta,Cod_clase,Cod_centro_poblado,Cod_area_coordinacion,Cod_area_operativa,Cod_uni_cobertura,Cod_comuna_x,Cod_corregimiento,NOM_CORREGIMIENTO,Cod_vereda,NOM_VEREDA,Cod_barrio,NOM_BARRIO,Cod_enumerador,tot_viviendas,tot_hogares,ord_vivienda,ind_direccion,uso_vivienda,Ide_foto,fec_ini_encuesta,fec_fin_encuesta,Coord_x_manual_rec,Coord_y_manual_rec,Coord_x_auto_rec,Coord_y_auto_rec,Gps_Alt_auto_rec,Fec_captura_gps_rec,Gps_Distancia_rec,Coord_x_manual_enc,Coord_y_manual_enc,Coord_x_auto_enc,Coord_y_auto_enc,Gps_Alt_auto_enc,Fec_captura_gps_enc,Gps_Distancia_enc,Est_nov_cartografia,Cod_digitador,Fec_digitacion,ind_formato,Num_hogares_recuento,est_ficha,Cod_equipo_encuesta,Num_visita,Cod_Chip,Dir_Chip,Num_solicitud,Cod_UC_total,tip_vivienda,tip_mat_paredes,tip_mat_pisos,ind_tiene_energia,tip_estrato_energia,ind_tiene_alcantarillado,ind_tiene_gas,ind_tiene_recoleccion,ind_tiene_acueducto,tip_estrato_acueducto,num_cuartos_vivienda,num_hogares_vivienda,cod_encuestador,cod_supervisor,cod_critico,fec_ini_vivienda,fec_fin_vivienda,ide_hogar,tip_ocupa_vivienda,num_cuartos_exclusivos,num_cuartos_dormir,num_cuartos_unicos_dormir,tip_sanitario,tip_ubi_sanitario,tip_uso_sanitario,tip_origen_agua,ind_agua_llega_7dias,num_dias_llega,ind_agua_llega_24horas,num_horas_llega,tip_uso_agua_beber,tip_elimina_basura,ind_tiene_cocina,tip_prepara_alimentos,tip_uso_cocina,tip_energia_cocina,ind_tiene_nevera,ind_tiene_lavadora,ind_tiene_pc,ind_tiene_internet,ind_tiene_moto,ind_tiene_tractor,ind_tiene_carro,ind_tiene_bien_raiz,ind_gasto_alimento,vlr_gasto_alimento,ind_gasto_transporte,vlr_gasto_transporte,ind_gasto_educacion,vlr_gasto_educacion,ind_gasto_salud,vlr_gasto_salud,ind_gasto_serv_publicos,vlr_gasto_serv_publicos,ind_gasto_celular,vlr_gasto_celular,ind_gasto_arriendo,vlr_gasto_arriendo,ind_gasto_otros,vlr_gasto_otros,vlr_total_gastos,num_habita_vivienda,ind_evento_inundacion,num_evento_inundacion,ind_evento_avalancha,num_evento_avalancha,ind_evento_terremoto,num_evento_terremoto,ind_evento_incendio,num_evento_incendio,ind_evento_vendaval,num_evento_vendaval,ind_evento_hundimiento,num_evento_hundimiento,num_personas_posibles,num_personas_hogar,fec_ini_visita1,fec_fin_visita1,res_visita1,fec_ini_visita2,fec_fin_visita2,res_visita2,ide_informante,Tip_informante,ide_firma_informante,Cau_sin_firma,Email_contacto,Ind_declaracion,ide_persona,ide_nacional,ind_escaner,sexo_persona,ori_persona,tip_documento,fec_nacimiento,edad_calculada,fec_documento,cod_dpto_documento,cod_mpio_documento,Cod_pais_documento,tip_parentesco,tip_estado_civil,ind_conyuge_vive_hogar,ide_conyuge,ind_padre_vive_hogar,ide_padre,ind_pariente_domestico,ide_serv_domestico,ind_discap_ver,ind_discap_oir,ind_discap_hablar,ind_discap_moverse,ind_discap_bañarse,ind_discap_salir,ind_discap_entender,ind_discap_ninguna,tip_seg_social,ind_enfermo_30,ind_acudio_salud,ind_fue_atendido_salud,ind_esta_embarazada,ind_tuvo_hijos,tip_cuidado_niños,ind_recibe_comida,ind_leer_escribir,ind_estudia,niv_educativo,grado_alcanzado,ind_fondo_pensiones,tip_actividad_mes,num_sem_buscando,tip_empleado,ind_ingr_salario,vlr_ingr_salario,ind_ingr_honorarios,vlr_ingr_honorarios,ind_ingr_cosecha,num_mes_ingr_cosecha,vlr_ingr_cosecha,ind_ingr_pension,vlr_ingr_pension,ind_ingr_remesa_pais,vlr_ingr_remesa_pais,ind_ingr_remesa_exterior,vlr_ingr_remesa_exterior,ind_ingr_arriendos,vlr_ingr_arriendos,ind_otros_ingresos,vlr_otros_ingresos,ind_ingr_estado,vlr_ingr_fam_accion,vlr_ingr_col_mayor,vlr_ingr_otro_subsidio,fec_ini_persona1,fec_fin_persona1,fec_ini_persona2,fec_fin_persona2,ide_Unigasto,Jefe_UG,H_5,I1,I2,I3,I4,I5,I6,I7,I8,I9,I10,I11,I12,I13,I14,I15,Grupo,Nivel,ide_Ug,persug,Capital,marca,estado,Linea_p,replicacion,fec_actualizacion_cns,C,Clasificacion,Latitud,Longitud,BARRIO,COMUNA
0,1,68,SANTANDER,68001,BUCARAMANGA,2019-09-10 15:39:36.563,37,39567,"6,80011E+19",3,40018082017,1,1,0,126,126003,741294,5,0,SIN CORREGIMIENTO,0,SIN VEREDA,145,CAMPO HERMOSO,5493,0,1,1,1,1,Fachada68001093549300001017.jpg,2019-09-10 11:00:31,2019-09-10 11:18:02,710643900,1275988292,710635973,1276002246,919,2019-08-10 12:01:20,17,710625500,-2147483648,710628748,-2147483648,898,2019-09-10 11:00:38,6,0,5493,2019-08-10 12:01:32,1,1,1,93,1,0,0,0,"1,7068E+33",2,1,2,1,2,1,1,1,1,2,4,1,5562,5518,9999,2019-09-10 11:00:38,2019-09-10 11:18:02,1,1,4,3,3,1,1,1,1,1,9,1,99,1,1,1,1,1,2,1,2,2,2,2,2,2,2,1,300000,2,0,2,0,2,0,1,100000,2,0,1,250000,2,0,650000,1,2,0,2,0,2,0,2,0,2,0,2,0,3,2,2019-09-10 11:03:42,2019-09-10 11:18:02,1,1970-01-01,1970-01-01 00:00:00,0,1,1,Firma_68001093549300001017_1.png,0,nicolepaola271213@gmail.com,1,2,15770911,0,2,2,5,2013-12-27,5,2013-12-30,0,0,862,3,5,9,99,1,1,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,9,9,2,1,1,0,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-09-10 11:10:16,2019-09-10 11:12:44,2019-09-10 11:15:42,2019-09-10 11:15:42,1,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,C,7,"6,80017E+26",2,1,0,0,-2147483648,1,2021-09-02 20:44:57.463,3.75,C07,7.106287,7.106287,CAMPO HERMOSO,05. GARCIA ROVIRA
1,2,68,SANTANDER,68001,BUCARAMANGA,2019-10-03 16:57:01.163,56,62156,"6,80011E+19",20,40018082017,1,1,0,110,110005,743596,14,0,SIN CORREGIMIENTO,0,SIN VEREDA,114,BUENOS AIRES,5486,0,1,1,1,1,Fachada68001065548600002007.jpg,2019-10-03 09:42:06,2019-10-03 10:03:35,713184200,1279942392,713187966,1279934110,1241,2019-09-12 12:04:38,10,713180900,-2147483648,713187059,-2147483648,1231,2019-10-03 09:43:18,6,0,5486,2019-09-12 12:04:59,1,1,1,65,1,0,0,0,"1,7068E+33",2,1,3,1,1,1,1,1,1,1,4,1,5539,5528,9999,2019-10-03 09:43:18,2019-10-03 10:03:35,1,1,4,3,3,1,1,1,1,1,9,1,99,1,1,1,3,1,2,1,2,2,2,2,2,2,2,1,450000,1,100000,2,0,2,0,1,100000,2,0,1,400000,2,0,1050000,2,2,0,2,0,2,0,2,0,2,0,2,0,4,4,2019-10-03 09:45:36,2019-10-03 10:03:35,1,1970-01-01,1970-01-01 00:00:00,0,1,1,Firma_68001065548600002007_1.png,0,0,1,4,15805690,0,2,2,5,2014-08-20,5,2014-11-11,0,0,862,14,5,9,99,2,99,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,9,9,2,2,0,0,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-10-03 09:55:31,2019-10-03 09:57:33,2019-10-03 10:00:08,2019-10-03 10:00:08,1,0,1,1,0,1,0,1,0,0,1,1,0,0,0,0,0,0,B,2,"6,80017E+26",4,1,0,0,-2147483648,0,1970-01-01 00:00:00.000,6.0,B02,7.131871,7.131871,BUENOS AIRES,14. MORRORICO
2,3,68,SANTANDER,68001,BUCARAMANGA,2021-11-20 16:28:56.757,1269,43544,"6,8001E+19",8,40118082017,1,1,0,115,115001,742122,9,0,SIN CORREGIMIENTO,0,SIN VEREDA,193,NUEVA GRANADA,5475,0,1,1,1,1,Fachada68001001547500000397.jpg,2019-08-26 07:56:27,2021-11-20 08:25:00,709813300,1278580592,709820867,1278586333,894,2019-08-02 07:49:47,10,709814300,-2147483648,709810543,-2147483648,895,2019-08-26 07:56:00,19,0,5475,2019-08-02 07:49:57,1,1,1,1,2,0,0,65502,"1,7068E+33",1,1,2,1,3,1,1,1,1,3,5,1,5534,5523,9999,2019-08-26 07:56:45,2021-11-20 08:25:00,1,3,5,4,4,1,1,1,1,1,9,1,99,1,1,1,1,1,2,1,1,1,1,2,2,2,2,1,250000,1,20000,2,0,1,103600,1,235000,1,5000,2,0,2,0,613600,4,2,0,2,0,2,0,2,0,2,0,2,0,2,2,2019-08-26 08:00:38,2019-08-26 08:21:44,1,1970-01-01,2021-11-20 08:25:00,1,2,1,Firma_68001001547500000397_1.png,0,0,1,1,15738860,1,1,1,3,1930-02-24,91,1953-07-29,11,11001,170,1,3,9,99,2,99,2,99,2,2,2,2,2,2,2,1,1,2,9,9,9,9,9,9,2,2,0,0,2,7,999,99,9,0,9,0,9,99,0,2,0,2,0,2,0,2,0,2,0,2,0,0,0,2019-08-26 08:10:14,2019-08-26 08:11:32,2021-11-20 08:21:02,2021-11-20 08:21:02,1,1,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,D,3,"6,80017E+26",2,1,0,0,-2147483648,1,2021-11-20 16:39:25.767,6.0,D03,7.098105,7.098105,NUEVA GRANADA,09. LA PEDREGOSA
3,4,68,SANTANDER,68001,BUCARAMANGA,2019-09-12 14:46:54.523,39,43875,"6,8001E+19",16,40018082017,1,1,0,126,126007,741367,5,0,SIN CORREGIMIENTO,0,SIN VEREDA,142,ALFONSO LÓPEZ,5483,0,2,1,1,1,Fachada68001026548300001183.jpg,2019-09-12 08:59:11,2019-09-12 09:13:02,711285300,1276764592,711291560,1276768897,934,2019-08-13 07:36:02,8,711301200,-2147483648,711287649,-2147483648,917,2019-09-12 08:59:24,18,0,5483,2019-08-13 07:36:22,1,2,1,26,1,0,0,0,"1,7068E+33",1,1,3,1,2,1,1,1,1,2,3,2,5566,5540,9999,2019-09-12 08:59:25,2019-09-12 09:13:02,2,1,0,0,0,1,1,2,1,1,9,1,99,1,1,1,5,3,7,2,2,2,2,2,2,2,2,1,200000,2,0,2,0,2,0,2,0,2,0,1,120000,2,0,320000,2,2,0,2,0,2,0,2,0,2,0,2,0,1,1,2019-09-12 09:08:45,2019-09-12 09:13:02,1,1970-01-01,1970-01-01 00:00:00,0,1,1,Firma_68001026548300001183_2.png,0,0,1,1,15773328,1,1,1,3,1931-09-09,87,1953-09-25,11,11001,170,1,4,9,99,2,99,2,99,1,2,1,2,2,1,2,2,3,2,9,9,9,9,9,9,1,2,2,3,2,1,999,5,99,0,1,200000,99,99,0,2,0,1,100000,2,0,2,0,2,0,1,0,50000,0,2019-09-12 09:10:11,2019-09-12 09:11:03,2019-09-12 09:11:06,2019-09-12 09:11:06,1,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,C,7,"6,80017E+26",1,1,0,0,-2147483648,0,1970-01-01 00:00:00.000,3.0,C07,7.112876,7.112876,ALFONSO LOPEZ,05. GARCIA ROVIRA
4,5,68,SANTANDER,68001,BUCARAMANGA,2019-09-24 14:55:21.997,48,53705,"6,80011E+19",3,40018082017,1,1,0,128,128007,741235,4,0,SIN CORREGIMIENTO,0,SIN VEREDA,85,CAMILO TORRES,5485,0,1,2,2,1,Fachada68001067548500001513.jpg,2019-09-24 08:20:07,2019-09-24 08:42:56,712222100,1276019792,712198417,1276029061,892,2019-09-06 08:40:40,28,712197200,-2147483648,712197304,-2147483648,945,2019-09-24 08:20:18,6,0,5485,2019-09-06 08:42:11,1,1,1,67,1,0,0,0,"1,7068E+33",2,1,3,1,1,2,2,1,2,99,4,1,5509,5559,9999,2019-09-24 08:20:18,2019-09-24 08:42:56,1,1,1,1,1,3,1,1,6,9,9,9,99,1,1,1,1,1,3,1,2,2,2,2,2,2,2,1,300000,2,0,2,0,2,0,1,100000,2,0,1,350000,2,0,750000,2,2,0,2,0,2,0,2,0,2,0,2,0,8,4,2019-09-24 08:26:22,2019-09-24 08:42:56,1,1970-01-01,1970-01-01 00:00:00,0,2,1,Firma_68001067548500001514_1.png,0,0,1,4,15790647,0,1,2,5,2016-01-06,3,2016-01-26,0,0,862,11,5,9,99,2,99,2,99,2,2,2,2,2,2,2,1,0,2,9,9,9,9,2,1,9,9,9,99,9,9,999,99,99,0,99,0,99,99,0,99,0,99,0,99,0,99,0,99,0,99,0,0,0,2019-09-24 08:36:44,2019-09-24 08:39:02,2019-09-24 08:41:29,2019-09-24 08:41:29,1,0,1,1,0,0,0,1,0,0,1,1,0,1,1,0,0,1,A,5,"6,80017E+26",4,1,0,0,-2147483648,0,1970-01-01 00:00:00.000,7.05,A05,7.121973,7.121973,ASENTAMIENTO CAMILO TORRES,04. OCCIDENTAL


#### 1.3 CHC Dataset

For the CHC Dataset we have:

In [58]:
chc.head()

Unnamed: 0,DIRECTORIO,TIP_FOR,P1,P1S1,P2,P2S1,P5,CTL_1,P8R,P9,P10R,P11R,P12,P13,P14,P15R,P16S1,P16S2,P16S3,P16S4,P16S5,P16S6,P16S7,P16S8,P16S9,P17,P17S1,P17S2,P17S3,P17S4,P17S5,P17S6,P17S7,P17S8,P17S9,P17S10,P18,P19,P20,P20S1,P20S2,P20S3,P20S4,P20S5,P20S1A1,P20S2A1,P20S3A1,P20S4A1,P20S5A1,P21R,P22,P23,P23S1R,P24,P25,P26,P26S1,P26S2,P26S3,P26S4,P26S5,P26S6,P26_1,P26_2S1,P26_2S2,P26_2S3,P26_2S4,P26_2S5,P26_2S6,P26_2S7,P26_2S8,P27,P28R,P29,P30,P30S1,P30S2,P30S3,P30S4,P30S5,P30S6,P30S7,P30S8,P30S9,P30S1A1,P30S2A1,P30S3A1,P30S4A1,P30S5A1,P30S6A1,P30S7A1,P30S8A1,P30S9A1,P30_1,P30_2,P31,P32,P32S1,P32S2,P32S3,P32S4,P32S5,P32S6,P32S7,P32S8,P33,P33S1,P33S2,P33S3,P33S4,P33S5,P33S6,P33_1,P33_2,P33_2S1,P33_2S2,P33_2S3,P33_2S4,P33_2S5,P34,P35,P36R,P37S1,P37S2,P37S3,P37S4,P37S5,P37S6,P37S7,COMPLETA
0,112159,1,68,68001,1,3,2,1,31.0,1.0,3.0,1.0,2.0,1.0,2.0,2.0,4.0,4.0,4.0,3.0,4.0,4.0,4.0,4.0,4.0,2.0,,,,,,,,,,,,,,2.0,2.0,2.0,2.0,2.0,,,,,,1.0,1.0,,17.0,5.0,9.0,,2.0,2.0,2.0,2.0,2.0,2.0,,,,,,,,,,1.0,3.0,5.0,,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,13.0,30.0,14.0,16.0,16.0,14.0,,14.0,,6.0,1.0,2.0,,,,,,,,,,,1.0,2.0,1.0,2.0,2.0,2.0,2.0,,2.0,2.0,2.0,2.0,2.0,1.0,,,,,,,,,,1
1,112165,1,68,68001,1,3,2,1,38.0,2.0,3.0,2.0,1.0,1.0,2.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,,,,,,,,,,,,,,2.0,2.0,2.0,2.0,2.0,,,,,,1.0,7.0,,5.0,1.0,6.0,,2.0,2.0,2.0,1.0,2.0,2.0,4.0,,1.0,,,,,,1.0,1.0,2.0,4.0,,1.0,1.0,2.0,1.0,2.0,1.0,2.0,2.0,2.0,25.0,25.0,,25.0,,25.0,,,,6.0,3.0,1.0,1.0,,,,,,,,,,2.0,2.0,2.0,2.0,2.0,2.0,1.0,,2.0,2.0,2.0,2.0,2.0,1.0,,,,,,,,,,1
2,112169,1,68,68001,1,3,2,1,25.0,2.0,3.0,1.0,1.0,1.0,2.0,2.0,4.0,3.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,,,,,,,,,,,,,,2.0,2.0,2.0,2.0,2.0,,,,,,1.0,7.0,,1.0,1.0,3.0,,1.0,2.0,2.0,2.0,2.0,2.0,1.0,1.0,1.0,1.0,,,,,1.0,1.0,3.0,4.0,,1.0,2.0,2.0,2.0,2.0,1.0,2.0,1.0,2.0,12.0,,,,,13.0,,11.0,,6.0,1.0,1.0,1.0,,,,,,,,,,1.0,2.0,2.0,1.0,2.0,2.0,1.0,,2.0,2.0,1.0,1.0,1.0,5.0,,,,,,,,,,1
3,112191,1,68,68001,1,3,2,1,26.0,1.0,3.0,1.0,1.0,2.0,2.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,3.0,4.0,4.0,1.0,,1.0,,,,,,,,,1.0,1.0,,2.0,2.0,2.0,2.0,2.0,,,,,,2.0,5.0,,5.0,1.0,3.0,,2.0,2.0,2.0,1.0,2.0,2.0,4.0,,1.0,,,,,,1.0,1.0,4.0,7.0,,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,15.0,11.0,14.0,22.0,12.0,5.0,,5.0,,6.0,1.0,1.0,1.0,,,,,,,,,,2.0,2.0,1.0,1.0,2.0,2.0,2.0,,2.0,2.0,1.0,2.0,2.0,1.0,,,,,,,,,,1
4,112203,1,68,68001,1,3,2,1,44.0,1.0,3.0,1.0,1.0,1.0,2.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,,,,,,,,,,,,,,2.0,2.0,2.0,2.0,2.0,,,,,,1.0,9.0,,30.0,1.0,1.0,,2.0,2.0,2.0,2.0,2.0,2.0,,,,,,,,,,1.0,2.0,5.0,,1.0,2.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,20.0,,20.0,,,,,,,1.0,,2.0,,,,,,,,,,,2.0,2.0,2.0,1.0,1.0,2.0,1.0,,2.0,2.0,2.0,2.0,2.0,1.0,,,,,,,,,,1


and the final fives values:

In [59]:
chc.tail()

Unnamed: 0,DIRECTORIO,TIP_FOR,P1,P1S1,P2,P2S1,P5,CTL_1,P8R,P9,P10R,P11R,P12,P13,P14,P15R,P16S1,P16S2,P16S3,P16S4,P16S5,P16S6,P16S7,P16S8,P16S9,P17,P17S1,P17S2,P17S3,P17S4,P17S5,P17S6,P17S7,P17S8,P17S9,P17S10,P18,P19,P20,P20S1,P20S2,P20S3,P20S4,P20S5,P20S1A1,P20S2A1,P20S3A1,P20S4A1,P20S5A1,P21R,P22,P23,P23S1R,P24,P25,P26,P26S1,P26S2,P26S3,P26S4,P26S5,P26S6,P26_1,P26_2S1,P26_2S2,P26_2S3,P26_2S4,P26_2S5,P26_2S6,P26_2S7,P26_2S8,P27,P28R,P29,P30,P30S1,P30S2,P30S3,P30S4,P30S5,P30S6,P30S7,P30S8,P30S9,P30S1A1,P30S2A1,P30S3A1,P30S4A1,P30S5A1,P30S6A1,P30S7A1,P30S8A1,P30S9A1,P30_1,P30_2,P31,P32,P32S1,P32S2,P32S3,P32S4,P32S5,P32S6,P32S7,P32S8,P33,P33S1,P33S2,P33S3,P33S4,P33S5,P33S6,P33_1,P33_2,P33_2S1,P33_2S2,P33_2S3,P33_2S4,P33_2S5,P34,P35,P36R,P37S1,P37S2,P37S3,P37S4,P37S5,P37S6,P37S7,COMPLETA
1443,132977,2,68,68001,1,4,1,1,36.0,1.0,3.0,1.0,1.0,1.0,1.0,2.0,4.0,2.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,1.0,,1.0,,,,,,,,,11.0,,,2.0,2.0,2.0,2.0,2.0,,,,,,1.0,7.0,,20.0,1.0,1.0,,2.0,2.0,2.0,2.0,2.0,2.0,,,,,,,,,,1.0,4.0,5.0,,1.0,2.0,1.0,2.0,2.0,1.0,2.0,2.0,2.0,15.0,,16.0,,,20.0,,,,6.0,1.0,2.0,,,,,,,,,,,1.0,1.0,1.0,2.0,2.0,2.0,2.0,,2.0,2.0,2.0,2.0,2.0,1.0,,,,,,,,,,1
1444,132978,2,68,68001,1,14,1,1,42.0,1.0,3.0,1.0,1.0,1.0,1.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,,,,,,,,,,,,,,2.0,2.0,2.0,2.0,2.0,,,,,,2.0,7.0,,22.0,7.0,1.0,,2.0,2.0,2.0,2.0,2.0,2.0,,,,,,,,,,1.0,3.0,1.0,,1.0,2.0,1.0,2.0,2.0,1.0,2.0,2.0,2.0,14.0,,14.0,,,14.0,,,,6.0,1.0,2.0,,,,,,,,,,,1.0,2.0,1.0,2.0,2.0,2.0,1.0,,2.0,2.0,2.0,2.0,2.0,1.0,,,,,,,,,,1
1445,133061,1,68,68001,1,15,1,1,28.0,1.0,3.0,1.0,1.0,1.0,1.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,,,,,,,,,,,,,,2.0,2.0,2.0,2.0,2.0,,,,,,1.0,7.0,,12.0,1.0,9.0,,2.0,2.0,2.0,2.0,2.0,2.0,,,,,,,,,,1.0,4.0,5.0,,1.0,2.0,2.0,2.0,2.0,1.0,2.0,2.0,2.0,14.0,,,,,19.0,,,,1.0,,2.0,,,,,,,,,,,2.0,2.0,1.0,2.0,2.0,2.0,1.0,,2.0,2.0,2.0,2.0,2.0,1.0,,,,,,,,,,1
1446,133100,1,68,68001,1,3,1,1,32.0,1.0,3.0,1.0,1.0,1.0,1.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,,,,,,,,,,,,,,2.0,2.0,2.0,2.0,2.0,,,,,,1.0,4.0,,10.0,1.0,9.0,,2.0,2.0,2.0,2.0,2.0,2.0,,,,,,,,,,1.0,3.0,5.0,,1.0,2.0,1.0,2.0,2.0,1.0,2.0,2.0,2.0,14.0,,14.0,,,17.0,,,,6.0,2.0,1.0,2.0,1.0,,,,,,,,,2.0,2.0,1.0,2.0,2.0,2.0,2.0,,2.0,2.0,2.0,2.0,2.0,1.0,,,,,,,,,,1
1447,133112,1,68,68001,1,1,1,1,33.0,2.0,3.0,1.0,1.0,1.0,1.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,,,,,,,,,,,,,,2.0,2.0,2.0,2.0,2.0,,,,,,1.0,2.0,,2.0,2.0,6.0,,2.0,2.0,1.0,2.0,2.0,2.0,3.0,,1.0,,,,,,,1.0,3.0,5.0,,2.0,2.0,2.0,2.0,2.0,1.0,2.0,2.0,2.0,,,,,,19.0,,,,6.0,2.0,1.0,2.0,,,,,,,,1.0,,2.0,2.0,2.0,2.0,2.0,2.0,1.0,,2.0,2.0,1.0,2.0,2.0,1.0,,,,,,,,,,1


Well, the NaN values indicates that the value in the column are FALSE for the cheatseet. So, we decided to fill these values with 0 because, when the column value are true, then 1.

In [60]:
f_chc = chc.fillna(0)

And check the results:

In [61]:
f_chc.head()

Unnamed: 0,DIRECTORIO,TIP_FOR,P1,P1S1,P2,P2S1,P5,CTL_1,P8R,P9,P10R,P11R,P12,P13,P14,P15R,P16S1,P16S2,P16S3,P16S4,P16S5,P16S6,P16S7,P16S8,P16S9,P17,P17S1,P17S2,P17S3,P17S4,P17S5,P17S6,P17S7,P17S8,P17S9,P17S10,P18,P19,P20,P20S1,P20S2,P20S3,P20S4,P20S5,P20S1A1,P20S2A1,P20S3A1,P20S4A1,P20S5A1,P21R,P22,P23,P23S1R,P24,P25,P26,P26S1,P26S2,P26S3,P26S4,P26S5,P26S6,P26_1,P26_2S1,P26_2S2,P26_2S3,P26_2S4,P26_2S5,P26_2S6,P26_2S7,P26_2S8,P27,P28R,P29,P30,P30S1,P30S2,P30S3,P30S4,P30S5,P30S6,P30S7,P30S8,P30S9,P30S1A1,P30S2A1,P30S3A1,P30S4A1,P30S5A1,P30S6A1,P30S7A1,P30S8A1,P30S9A1,P30_1,P30_2,P31,P32,P32S1,P32S2,P32S3,P32S4,P32S5,P32S6,P32S7,P32S8,P33,P33S1,P33S2,P33S3,P33S4,P33S5,P33S6,P33_1,P33_2,P33_2S1,P33_2S2,P33_2S3,P33_2S4,P33_2S5,P34,P35,P36R,P37S1,P37S2,P37S3,P37S4,P37S5,P37S6,P37S7,COMPLETA
0,112159,1,68,68001,1,3,2,1,31.0,1.0,3.0,1.0,2.0,1.0,2.0,2.0,4.0,4.0,4.0,3.0,4.0,4.0,4.0,4.0,4.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,17.0,5.0,9.0,0.0,2.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,5.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,13.0,30.0,14.0,16.0,16.0,14.0,0.0,14.0,0.0,6.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,2.0,2.0,2.0,2.0,0.0,2.0,2.0,2.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
1,112165,1,68,68001,1,3,2,1,38.0,2.0,3.0,2.0,1.0,1.0,2.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,7.0,0.0,5.0,1.0,6.0,0.0,2.0,2.0,2.0,1.0,2.0,2.0,4.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,2.0,4.0,0.0,1.0,1.0,2.0,1.0,2.0,1.0,2.0,2.0,2.0,25.0,25.0,0.0,25.0,0.0,25.0,0.0,0.0,0.0,6.0,3.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,2.0,2.0,1.0,0.0,2.0,2.0,2.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
2,112169,1,68,68001,1,3,2,1,25.0,2.0,3.0,1.0,1.0,1.0,2.0,2.0,4.0,3.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,7.0,0.0,1.0,1.0,3.0,0.0,1.0,2.0,2.0,2.0,2.0,2.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,3.0,4.0,0.0,1.0,2.0,2.0,2.0,2.0,1.0,2.0,1.0,2.0,12.0,0.0,0.0,0.0,0.0,13.0,0.0,11.0,0.0,6.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,2.0,1.0,2.0,2.0,1.0,0.0,2.0,2.0,1.0,1.0,1.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
3,112191,1,68,68001,1,3,2,1,26.0,1.0,3.0,1.0,1.0,2.0,2.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,3.0,4.0,4.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,2.0,5.0,0.0,5.0,1.0,3.0,0.0,2.0,2.0,2.0,1.0,2.0,2.0,4.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,4.0,7.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,15.0,11.0,14.0,22.0,12.0,5.0,0.0,5.0,0.0,6.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,1.0,2.0,2.0,2.0,0.0,2.0,2.0,1.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
4,112203,1,68,68001,1,3,2,1,44.0,1.0,3.0,1.0,1.0,1.0,2.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,9.0,0.0,30.0,1.0,1.0,0.0,2.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,5.0,0.0,1.0,2.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,20.0,0.0,20.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,1.0,1.0,2.0,1.0,0.0,2.0,2.0,2.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1


and check the columns data type:

In [62]:
f_chc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1448 entries, 0 to 1447
Columns: 130 entries, DIRECTORIO to COMPLETA
dtypes: float64(121), int64(9)
memory usage: 1.4 MB


All the columns are INT values.

#### 1.4 CHC Dictionary Dataset

CHC Dictionary have the meaning of all CHC columns.

Check the first five values.

In [63]:
d_chc.head()

Unnamed: 0,ITEM,NOMBRE DE VARIABLE,TIPO DE DATO,VALOR,DESCRIPCION
0,DIRECTORIO,DIRECTORIO,NUMBER (15),,Identificador asignado a la encuesta en el pro...
1,Formulario aplicado en:,TIP_FOR,NUMBER (1),1.0,DMC
2,1. Departamento,P1,VARCHAR2 (2 Byte),,
3,1.1 Municipio o Área no municipalizada,P1S1,VARCHAR2 (5 Byte),,
4,2. Clase,P2,NUMBER (1),1.0,2.1 Cabecera municipal (clase 1)


And the final five values.

In [64]:
d_chc.tail()

Unnamed: 0,ITEM,NOMBRE DE VARIABLE,TIPO DE DATO,VALOR,DESCRIPCION
124,,P37S4,NUMBER (1),1.0,1. aparentemente con problemas de salud mental
125,,P37S5,NUMBER (1),1.0,1. totalmente desinteresada
126,,P37S6,NUMBER (1),1.0,1. hay condiciones de riesgo para los encuesta...
127,,P37S7,NUMBER (1),1.0,1. otra
128,Identificador de finalización de la encuesta,COMPLETA,NUMBER (1),1.0,1. Completa
