# Universidad Internacional de La Rioja  

### Máster Universitario en Visual Analytics and Big Data  

---

### **Prediction and Analysis of Product Demand and Supply between the Andean Community and Spain**  
**Presented by:** Danilo Andrés Beleño Villafañe  

---

### **Notebook 3: Data Understanding Stage - Profiling**  


In [1]:
# pip install google-cloud-bigquery
# pip install db-dtypes  # Required to download query results as DataFrames
# pip install google-cloud-bigquery-storage  # Speeds up the download
# pip install ydata-profiling  # Library used for data profiling
# pip install sweetviz  # Second profiling library, imported in the next cell

In [2]:
import pandas as pd # Data handling
import sweetviz as sv # Profiling
from google.cloud import bigquery # GCP client
from ydata_profiling import ProfileReport # Profiling

In [3]:
client = bigquery.Client()

In [4]:
table_id = "unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina"

query = f"SELECT * FROM `{table_id}` LIMIT 1"
query_job = client.query(query)
df = query_job.to_dataframe()

In [5]:
df.head(1)

| | tipo_movimiento | cod_pais | nombre_pais | cod_provincia | nombre_provincia | nombre_comunidad | estado | euros | dolares | comex_nivel_taric | cod_taric | kilogramos | anio | mes | nivel_taric_detalle | descripcion_taric |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E | 480 | Colombia | 27 | Lugo | Galicia | provisional | 2531.4 | 2735.94 | 1 | 4 | 339.0 | 2023 | 11 | 1 | LECHE, PRODUCTOS LÁCTEOS; HUEVOS |


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   tipo_movimiento      1 non-null      object 
 1   cod_pais             1 non-null      object 
 2   nombre_pais          1 non-null      object 
 3   cod_provincia        1 non-null      object 
 4   nombre_provincia     1 non-null      object 
 5   nombre_comunidad     1 non-null      object 
 6   estado               1 non-null      object 
 7   euros                1 non-null      float64
 8   dolares              1 non-null      float64
 9   comex_nivel_taric    1 non-null      object 
 10  cod_taric            1 non-null      object 
 11  kilogramos           1 non-null      float64
 12  anio                 1 non-null      Int64  
 13  mes                  1 non-null      Int64  
 14  nivel_taric_detalle  1 non-null      Int64  
 15  descripcion_taric    1 non-null      object 

In [7]:
df.columns

Index(['tipo_movimiento', 'cod_pais', 'nombre_pais', 'cod_provincia',
       'nombre_provincia', 'nombre_comunidad', 'estado', 'euros', 'dolares',
       'comex_nivel_taric', 'cod_taric', 'kilogramos', 'anio', 'mes',
       'nivel_taric_detalle', 'descripcion_taric'],
      dtype='object')

In [8]:
columns_to_analyze_lvl = [
    'euros',
    'kilogramos',
    'dolares',
]

In [9]:
def profiling(query, column_name, client, level='0'):
    """Run `query` and write ydata-profiling and Sweetviz HTML reports for its result."""
    query_job = client.query(query)
    df = query_job.to_dataframe()
    # ydata-profiling report
    profile = ProfileReport(df, title=f"Profiling Report: {column_name}", explorative=True)
    profile.to_file(f"perfilado_ydata/{column_name}_lvl_{level}.html")
    # Sweetviz report on the same data
    reporte = sv.analyze(df)
    reporte.show_html(f"perfilado_sweetviz/{column_name}_lvl_{level}.html")
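
The `profiling` function writes its reports to `perfilado_ydata/` and `perfilado_sweetviz/` using the pattern `{column_name}_lvl_{level}.html`. In case those folders are not already present, a small helper could build the path and create the folder first. This is only a sketch: `report_path` is a hypothetical name that does not appear in the notebook, and the example runs in a temporary directory rather than the real output folders.

```python
import tempfile
from pathlib import Path


def report_path(base_dir, column_name, level="0"):
    """Return the HTML report path for a column and level.

    The folder is created if it does not exist yet (hypothetical helper,
    mirroring the file naming used by the notebook's profiling function).
    """
    folder = Path(base_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{column_name}_lvl_{level}.html"


# Example run in a throwaway directory so nothing is left on disk
with tempfile.TemporaryDirectory() as tmp:
    example = report_path(Path(tmp) / "perfilado_ydata", "euros", 3)
    example_name = example.name
    folder_created = example.parent.is_dir()

print(example_name)
print(folder_created)
```

Inside `profiling`, the returned path could be converted with `str(...)` before being passed to the report writers, which avoids relying on either library accepting `Path` objects.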

In [10]:
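for column_name in df.columns:
    if column_name in columns_to_analyze_lvl:
        # Numeric measures are profiled separately for each TARIC level (1 to 5)
        for level in range(1, 6):
            # comex_nivel_taric is stored as a string, hence the quoted value
            query = f"SELECT {column_name} FROM `{table_id}` WHERE comex_nivel_taric = '{level}'"
            print(query)
            profiling(query, column_name, client, level)
    else:
        # All other columns are profiled once over the whole table (level '0')
        query = f"SELECT {column_name} FROM `{table_id}`"
        print(query)
        profiling(query, column_name, client)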

SELECT tipo_movimiento FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/tipo_movimiento_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT cod_pais FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/cod_pais_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT nombre_pais FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/nombre_pais_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT cod_provincia FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/cod_provincia_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT nombre_provincia FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/nombre_provincia_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT nombre_comunidad FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/nombre_comunidad_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT estado FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/estado_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT euros FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '1'
Report perfilado_sweetviz/euros_lvl_1.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT euros FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '2'
Report perfilado_sweetviz/euros_lvl_2.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT euros FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '3'
Report perfilado_sweetviz/euros_lvl_3.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT euros FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '4'
Report perfilado_sweetviz/euros_lvl_4.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT euros FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '5'
Report perfilado_sweetviz/euros_lvl_5.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT dolares FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '1'
Report perfilado_sweetviz/dolares_lvl_1.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT dolares FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '2'
Report perfilado_sweetviz/dolares_lvl_2.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT dolares FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '3'
Report perfilado_sweetviz/dolares_lvl_3.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT dolares FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '4'
Report perfilado_sweetviz/dolares_lvl_4.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT dolares FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '5'
Report perfilado_sweetviz/dolares_lvl_5.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT comex_nivel_taric FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/comex_nivel_taric_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT cod_taric FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/cod_taric_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT kilogramos FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '1'
Report perfilado_sweetviz/kilogramos_lvl_1.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT kilogramos FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '2'
Report perfilado_sweetviz/kilogramos_lvl_2.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT kilogramos FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '3'
Report perfilado_sweetviz/kilogramos_lvl_3.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT kilogramos FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '4'
Report perfilado_sweetviz/kilogramos_lvl_4.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT kilogramos FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina` WHERE comex_nivel_taric = '5'
Report perfilado_sweetviz/kilogramos_lvl_5.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT anio FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/anio_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT mes FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/mes_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT nivel_taric_detalle FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/nivel_taric_detalle_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
SELECT descripcion_taric FROM `unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina`
Report perfilado_sweetviz/descripcion_taric_lvl_0.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
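
The loop in cell 10 builds each query and runs it immediately, so the full list of queries is only visible after everything has been downloaded and profiled. A minimal sketch of separating the two steps is shown below, assuming the same table, column list, and level range as the notebook. The names `build_profiling_queries` and `plan` are hypothetical and not part of the original code. The function only produces strings, so it can be checked without a BigQuery connection.

```python
def build_profiling_queries(columns, level_columns, table_id, levels=range(1, 6)):
    """Return (column, level, query) tuples mirroring the notebook's loop.

    Columns listed in level_columns get one query per TARIC level;
    every other column gets a single query tagged with level "0".
    """
    plan = []
    for column in columns:
        if column in level_columns:
            for level in levels:
                query = (
                    f"SELECT {column} FROM `{table_id}` "
                    f"WHERE comex_nivel_taric = '{level}'"
                )
                plan.append((column, str(level), query))
        else:
            plan.append((column, "0", f"SELECT {column} FROM `{table_id}`"))
    return plan


# Column names copied from the df.columns output earlier in the notebook
columns = [
    "tipo_movimiento", "cod_pais", "nombre_pais", "cod_provincia",
    "nombre_provincia", "nombre_comunidad", "estado", "euros", "dolares",
    "comex_nivel_taric", "cod_taric", "kilogramos", "anio", "mes",
    "nivel_taric_detalle", "descripcion_taric",
]
table_id = "unir-predictiv0-andina-espana.datacomex.comex_comunidad_andina"

plan = build_profiling_queries(columns, ["euros", "kilogramos", "dolares"], table_id)
print(len(plan))  # 28: 13 whole-table queries plus 3 columns x 5 levels
```

Each tuple in `plan` could then be passed to `profiling(query, column, client, level)`, which keeps the reviewable list of queries apart from the slow download and report generation.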
