# Loading countries (dm_pais table)

## Connecting to a __local__ __MySQL__ database

### Using the SQLAlchemy library

We import the relevant objects from the sqlalchemy library.

In [1]:
from sqlalchemy import create_engine

We create the SQLAlchemy engine.

In [2]:
# URL format: dialect+driver://user:password@host/database
conn = create_engine("mysql+mysqlconnector://root:password@localhost/dw_aero")
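A minimal sketch, assuming SQLAlchemy 1.4 or later, of building the same connection URL from its parts with `sqlalchemy.engine.URL.create` instead of a hard-coded string. This keeps the password out of the notebook and takes care of escaping special characters in it. The environment variable name `DW_AERO_PASSWORD` is hypothetical.

```python
import os

from sqlalchemy.engine import URL

# Hypothetical environment variable; falls back to the notebook's placeholder password
password = os.environ.get("DW_AERO_PASSWORD", "password")

# Same components as the string passed to create_engine in the notebook
url = URL.create(
    drivername="mysql+mysqlconnector",
    username="root",
    password=password,
    host="localhost",
    database="dw_aero",
)

# The URL object can be passed to create_engine instead of the string,
# e.g. conn = create_engine(url); here we only print it with the password masked
print(url.render_as_string(hide_password=True))
```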

We test the engine's connection to the database.

In [3]:
try:
    # Engines connect lazily: opening a connection is what actually reaches the server
    with conn.connect() as connection:
        print("Connection successful.")
except Exception as e:
    print(f"An error occurred while connecting to the database: {e}")

Connection successful.
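The check above only opens a connection. A slightly stricter variant, sketched below, also runs a trivial `SELECT 1` and returns a boolean instead of printing, so it can be reused in later cells. The helper name `check_connection` is hypothetical, and an in-memory SQLite engine is used only so the sketch runs without a MySQL server; in the notebook the MySQL engine `conn` would be passed instead.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.exc import SQLAlchemyError


def check_connection(engine) -> bool:
    """Return True if the engine can connect and answer a trivial query."""
    try:
        with engine.connect() as connection:
            # SELECT 1 confirms the database answers queries, not just accepts connections
            return connection.execute(text("SELECT 1")).scalar() == 1
    except SQLAlchemyError as exc:
        print(f"Error connecting to the database: {exc}")
        return False


# Stand-in engine for the sketch; replace with the MySQL engine in practice
engine = create_engine("sqlite://")
print(check_connection(engine))
```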


## Getting to work

### Checking for existing data in the dm_pais table
First we check whether the __dm_pais__ table already contains data.

To do so, we first import pandas.

In [4]:
import pandas as pd

In [5]:
pd.read_sql_table('dm_pais', con=conn, schema='dw_aero')

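|  | cod_pais | pais | cod_pais2 | cod_continente | continente | longitud | latitud | cod_pais;cod_pais2;desc_pais;cod_continente;desc_continente | country | latitude | longitude | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|

The query returns no rows, so the table is empty. Besides the model columns (`cod_pais` to `latitud`), it also contains the columns `cod_pais;cod_pais2;desc_pais;cod_continente;desc_continente`, `country`, `latitude`, `longitude` and `name`.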
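Loading the whole table into pandas just to see whether it is empty works for a small dimension like this one, but a `SELECT COUNT(*)` answers the same question without transferring any rows. A minimal sketch follows; the helper `count_rows` is hypothetical, and an in-memory SQLite database with a reduced two-column `dm_pais` stands in for the MySQL table.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stand-in for dw_aero, with a reduced, hypothetical dm_pais
engine = create_engine("sqlite://")
with engine.begin() as connection:
    connection.execute(text(
        "CREATE TABLE dm_pais (cod_pais VARCHAR(3) PRIMARY KEY, pais VARCHAR(100) NOT NULL)"
    ))


def count_rows(engine, table_name: str) -> int:
    """Count rows on the server side instead of loading the table into pandas."""
    # table_name is interpolated directly, so it must come from trusted code, never user input
    with engine.connect() as connection:
        return connection.execute(text(f"SELECT COUNT(*) FROM {table_name}")).scalar()


print(count_rows(engine, "dm_pais"))
```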


## We will use the __paises.xls__ and __countries.csv__ files

### Reading the paises.xls file

Reading legacy `.xls` files with pandas requires the `xlrd` package, so we install it first.

In [7]:
%pip install xlrd==2.0.1

Collecting xlrd==2.0.1
  Downloading xlrd-2.0.1-py2.py3-none-any.whl.metadata (3.4 kB)
Downloading xlrd-2.0.1-py2.py3-none-any.whl (96 kB)
Installing collected packages: xlrd
Successfully installed xlrd-2.0.1
Note: you may need to restart the kernel to use updated packages.


In [8]:
# Raw file URL: the src parameter of the Office Online viewer link, URL-decoded once.
# The viewer page itself returns HTML, not the .xls file.
df_pais1 = pd.read_excel("https://raw.githubusercontent.com/bintutr/Data-Integration/refs/heads/main/conexi%C3%B3n%20BBDD%20Mysql/Datasets/paises.xls")
df_pais1.head()

|  | cod_pais | cod_pais2 | desc_pais | cod_continente | desc_continente |
|---|---|---|---|---|---|
| 0 | AFG | AF | Afghanistan | AS | Asia |
| 1 | ALB | AL | Albania | EU | Europe |
| 2 | DZA | DZ | Algeria | AF | Africa |
| 3 | ASM | AS | American Samoa | OC | Oceania |
| 4 | AND | AD | Andorra | EU | Europe |


### Reading the countries.csv file

In [11]:
df_con = pd.read_csv("https://raw.githubusercontent.com/bintutr/Data-Integration/refs/heads/main/conexi%C3%B3n%20BBDD%20Mysql/Datasets/countries.csv", sep=',', header='infer')
df_con.head()

|  | country | latitude | longitude | name |
|---|---|---|---|---|
| 0 | AD | 42.546245 | 1.601554 | Andorra |
| 1 | AE | 23.424076 | 53.847818 | United Arab Emirates |
| 2 | AF | 33.93911 | 67.709953 | Afghanistan |
| 3 | AG | 17.060816 | -61.796428 | Antigua and Barbuda |
| 4 | AI | 18.220554 | -63.068615 | Anguilla |


### Looking for duplicates in paises.xls

First, across all columns

In [12]:
print(df_pais1.duplicated().sum())

1


In [13]:
df_pais1[df_pais1.duplicated(keep=False)]

|  | cod_pais | cod_pais2 | desc_pais | cod_continente | desc_continente |
|---|---|---|---|---|---|
| 135 | MLT | MT | Malta | EU | Europe |
| 136 | MLT | MT | Malta | EU | Europe |
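The two calls above differ only in `keep`: with the default `keep='first'`, `duplicated()` flags only the repeats after the first occurrence (hence a sum of 1), while `keep=False` flags every copy (hence both Malta rows are listed). A small sketch on a toy frame that mimics the Malta rows; the data is illustrative, not the real file.

```python
import pandas as pd

# Toy frame mimicking the duplicated Malta row found in paises.xls
df = pd.DataFrame({
    "cod_pais": ["ALB", "MLT", "MLT"],
    "cod_pais2": ["AL", "MT", "MT"],
    "desc_pais": ["Albania", "Malta", "Malta"],
})

# Default keep="first": only the second Malta row is flagged
print(df.duplicated().tolist())           # [False, False, True]
# keep=False: every copy of a duplicated row is flagged
print(df.duplicated(keep=False).tolist())  # [False, True, True]
```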


Duplicates by the cod_pais2 key

In [14]:
df_pais1[df_pais1.duplicated(subset='cod_pais2')]

|  | cod_pais | cod_pais2 | desc_pais | cod_continente | desc_continente |
|---|---|---|---|---|---|
| 136 | MLT | MT | Malta | EU | Europe |


We remove the duplicates, keeping the first occurrence of each cod_pais2

In [15]:
df_pais2 = df_pais1.drop_duplicates(subset='cod_pais2', keep='first')

In [16]:
print(df_pais2.duplicated().sum())

0
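The check above counts full-row duplicates. Since the next step joins on `cod_pais2`, a stricter check is that the key column itself is unique, which `Series.is_unique` answers directly. A minimal sketch on illustrative data; note that `drop_duplicates` keeps the original index labels of the surviving rows.

```python
import pandas as pd

# Illustrative stand-in for df_pais1 with one repeated key
df_pais1 = pd.DataFrame({
    "cod_pais": ["ALB", "MLT", "MLT"],
    "cod_pais2": ["AL", "MT", "MT"],
})

df_pais2 = df_pais1.drop_duplicates(subset="cod_pais2", keep="first")

# Fail loudly if the join key is still repeated
assert df_pais2["cod_pais2"].is_unique, "cod_pais2 still has duplicates"
print(len(df_pais1), "->", len(df_pais2))
```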


We join the two files with a left join of the country list (`cod_pais2`) against the coordinates file (`country`)

In [17]:
df_pais3 = pd.merge(left=df_pais2, right=df_con,
                      how='left', left_on='cod_pais2', right_on='country')

In [18]:
df_pais3.head()

|  | cod_pais | cod_pais2 | desc_pais | cod_continente | desc_continente | country | latitude | longitude | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | AF | Afghanistan | AS | Asia | AF | 33.93911 | 67.709953 | Afghanistan |
| 1 | ALB | AL | Albania | EU | Europe | AL | 41.153332 | 20.168331 | Albania |
| 2 | DZA | DZ | Algeria | AF | Africa | DZ | 28.033886 | 1.659626 | Algeria |
| 3 | ASM | AS | American Samoa | OC | Oceania | AS | -14.270972 | -170.132217 | American Samoa |
| 4 | AND | AD | Andorra | EU | Europe | AD | 42.546245 | 1.601554 | Andorra |
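A left join keeps one output row per left row only if the right-hand key is unique; a repeated `country` in countries.csv would silently duplicate countries. `pd.merge` can enforce this with its `validate` parameter. A minimal sketch on illustrative data, using `validate="one_to_one"` because both keys are expected to be unique after the deduplication above; it raises `pandas.errors.MergeError` otherwise.

```python
import pandas as pd

# Illustrative stand-ins for df_pais2 and df_con
df_pais2 = pd.DataFrame({"cod_pais": ["AFG", "Z99"], "cod_pais2": ["AF", "Z9"]})
df_con = pd.DataFrame({
    "country": ["AF", "AD"],
    "latitude": [33.93911, 42.546245],
    "longitude": [67.709953, 1.601554],
})

# Raises pandas.errors.MergeError if cod_pais2 or country contains repeated values
df_pais3 = pd.merge(left=df_pais2, right=df_con, how="left",
                    left_on="cod_pais2", right_on="country",
                    validate="one_to_one")

# The left join preserves the number of countries
assert len(df_pais3) == len(df_pais2)
```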


We check for countries without a match in countries.csv

In [19]:
df_pais3[df_pais3['country'].isnull()]

|  | cod_pais | cod_pais2 | desc_pais | cod_continente | desc_continente | country | latitude | longitude | name |
|---|---|---|---|---|---|---|---|---|---|
| 243 | Z99 | Z9 | desconocido | Z9 | desconocido |  |  |  |  |
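Filtering on `country.isnull()` works here because `country` is never null in countries.csv. An alternative that does not depend on that, sketched below on illustrative data, is the `indicator=True` option of `pd.merge`, which adds a `_merge` column whose value `left_only` marks rows with no match on the right.

```python
import pandas as pd

# Illustrative stand-ins for df_pais2 and df_con
df_pais2 = pd.DataFrame({"cod_pais": ["AFG", "Z99"], "cod_pais2": ["AF", "Z9"]})
df_con = pd.DataFrame({"country": ["AF", "AD"], "latitude": [33.93911, 42.546245]})

# indicator=True adds _merge with values "both", "left_only" or "right_only"
check = pd.merge(df_pais2, df_con, how="left",
                 left_on="cod_pais2", right_on="country", indicator=True)

unmatched = check[check["_merge"] == "left_only"]
print(unmatched["cod_pais"].tolist())
```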


We inspect the structure of the target table with `DESCRIBE`.

In [20]:
pd.read_sql_query("DESCRIBE dm_pais", conn)

|  | Field | Type | Null | Key | Default | Extra |
|---|---|---|---|---|---|---|
| 0 | cod_pais | varchar(3) | NO | PRI |  |  |
| 1 | pais | varchar(100) | NO |  |  |  |
| 2 | cod_pais2 | varchar(2) | YES |  |  |  |
| 3 | cod_continente | varchar(10) | YES |  |  |  |
| 4 | continente | varchar(100) | YES |  |  |  |
| 5 | longitud | decimal(12,6) | YES |  |  |  |
| 6 | latitud | decimal(12,6) | YES |  |  |  |
| 7 | cod_pais;cod_pais2;desc_pais;cod_continente;de... | varchar(64) | YES |  |  |  |
| 8 | country | varchar(50) | YES |  |  |  |
| 9 | latitude | double | YES |  |  |  |
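`DESCRIBE` shows that the model columns are `pais`, `continente`, `longitud` and `latitud`, while the merged DataFrame uses `desc_pais`, `desc_continente`, `longitude` and `latitude`. A minimal sketch of detecting such mismatches programmatically with `sqlalchemy.inspect`; the helper `missing_in_table` is hypothetical, and the SQLite stand-in table copies only the first seven columns of the `DESCRIBE` output.

```python
from sqlalchemy import create_engine, inspect, text

# SQLite stand-in with the seven model columns shown by DESCRIBE
engine = create_engine("sqlite://")
with engine.begin() as connection:
    connection.execute(text(
        "CREATE TABLE dm_pais (cod_pais VARCHAR(3) PRIMARY KEY, pais VARCHAR(100) NOT NULL, "
        "cod_pais2 VARCHAR(2), cod_continente VARCHAR(10), continente VARCHAR(100), "
        "longitud DECIMAL(12,6), latitud DECIMAL(12,6))"
    ))


def missing_in_table(engine, table_name, df_columns):
    """Return the DataFrame columns that have no column of the same name in the table."""
    table_columns = {col["name"] for col in inspect(engine).get_columns(table_name)}
    return [c for c in df_columns if c not in table_columns]


print(missing_in_table(engine, "dm_pais", ["cod_pais", "desc_pais", "longitude"]))
```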


We select the columns needed for dm_pais

In [21]:
df_pais_def = df_pais3[['cod_pais', 'desc_pais', 'cod_pais2',
                        'cod_continente', 'desc_continente', 'longitude', 'latitude']]

In [22]:
df_pais_def

|  | cod_pais | desc_pais | cod_pais2 | cod_continente | desc_continente | longitude | latitude |
|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | AF | AS | Asia | 67.709953 | 33.939110 |
| 1 | ALB | Albania | AL | EU | Europe | 20.168331 | 41.153332 |
| 2 | DZA | Algeria | DZ | AF | Africa | 1.659626 | 28.033886 |
| 3 | ASM | American Samoa | AS | OC | Oceania | -170.132217 | -14.270972 |
| 4 | AND | Andorra | AD | EU | Europe | 1.601554 | 42.546245 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | ESH | Western Sahara | EH | AF | Africa | -12.885834 | 24.215527 |
| 240 | YEM | Yemen | YE | AS | Asia | 48.516388 | 15.552727 |
| 241 | ZMB | Zambia | ZM | AF | Africa | 27.849332 | -13.133897 |
| 242 | ZWE | Zimbabwe | ZW | AF | Africa | 29.154857 | -19.015438 |


We rename the columns to match the names in dm_pais

In [23]:
df_pais_def = df_pais_def.rename(columns={"desc_pais": "pais", "desc_continente": "continente",
                                          "longitude": "longitud", "latitude": "latitud"})

In [24]:
df_pais_def.head()

|  | cod_pais | pais | cod_pais2 | cod_continente | continente | longitud | latitud |
|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | AF | AS | Asia | 67.709953 | 33.93911 |
| 1 | ALB | Albania | AL | EU | Europe | 20.168331 | 41.153332 |
| 2 | DZA | Algeria | DZ | AF | Africa | 1.659626 | 28.033886 |
| 3 | ASM | American Samoa | AS | OC | Oceania | -170.132217 | -14.270972 |
| 4 | AND | Andorra | AD | EU | Europe | 1.601554 | 42.546245 |


Loading into the database

In [25]:
df_pais_def.to_sql('dm_pais', con=conn, if_exists='append', index=False)

244
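Because `if_exists='append'` only adds rows, running the load cell a second time would insert the same `cod_pais` values again and, given the primary key shown by `DESCRIBE`, fail with a duplicate-key error. A minimal sketch of a rerunnable load, assuming pandas 2.x and SQLAlchemy 2.x: empty the table and append inside one transaction. `if_exists='replace'` is avoided because it drops the table and recreates it from the DataFrame's inferred types, losing the primary key and the column definitions. The helper `reload_table` is hypothetical, and an in-memory SQLite table stands in for MySQL.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# SQLite stand-in for dm_pais with a primary key, like the MySQL table
engine = create_engine("sqlite://")
with engine.begin() as connection:
    connection.execute(text(
        "CREATE TABLE dm_pais (cod_pais VARCHAR(3) PRIMARY KEY, pais VARCHAR(100) NOT NULL)"
    ))

df = pd.DataFrame({"cod_pais": ["AFG", "ALB"], "pais": ["Afghanistan", "Albania"]})


def reload_table(engine, df, table_name):
    """Empty the table and append df in a single transaction."""
    # engine.begin() commits on success and rolls back (keeping the old rows) on error
    with engine.begin() as connection:
        connection.execute(text(f"DELETE FROM {table_name}"))
        df.to_sql(table_name, con=connection, if_exists="append", index=False)


reload_table(engine, df, "dm_pais")
reload_table(engine, df, "dm_pais")  # a second run does not hit duplicate keys
print(pd.read_sql_query("SELECT COUNT(*) AS n FROM dm_pais", engine)["n"].iloc[0])
```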

We read the table back to verify the load.

In [26]:
df_pais_def = pd.read_sql_table('dm_pais', con=conn)

In [27]:
df_pais_def.head()

|  | cod_pais | pais | cod_pais2 | cod_continente | continente | longitud | latitud | cod_pais;cod_pais2;desc_pais;cod_continente;desc_continente | country | latitude | longitude | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABW | Aruba | AW |  | North America | -69.968338 | 12.52111 |  |  |  |  |  |
| 1 | AFG | Afghanistan | AF | AS | Asia | 67.709953 | 33.93911 |  |  |  |  |  |
| 2 | AGO | Angola | AO | AF | Africa | 17.873887 | -11.202692 |  |  |  |  |  |
| 3 | AIA | Anguilla | AI |  | North America | -63.068615 | 18.220554 |  |  |  |  |  |
| 4 | ALB | Albania | AL | EU | Europe | 20.168331 | 41.153332 |  |  |  |  |  |


In [28]:
df_pais_def.count()

cod_pais                                                       244
pais                                                           244
cod_pais2                                                      244
cod_continente                                                 208
continente                                                     244
longitud                                                       242
latitud                                                        242
cod_pais;cod_pais2;desc_pais;cod_continente;desc_continente      0
country                                                          0
latitude                                                         0
longitude                                                        0
name                                                             0
dtype: int64
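The counts show `cod_continente` filled in only 208 of 244 rows, and the rows shown with an empty code (Aruba, Anguilla) have `continente` = North America. One likely cause, inferred from the two-letter continent codes (AS, EU, AF, OC) and not verified against paises.xls, is that North America's code is the string `NA`, which pandas reads as a missing value by default. If so, a sketch of a fix is to disable the default NA markers and treat only empty cells as missing. It is shown with `read_csv` on a hypothetical in-memory sample; `read_excel` accepts the same `keep_default_na` and `na_values` parameters.

```python
import io

import pandas as pd

# Hypothetical sample with the same columns as paises.xls
sample = io.StringIO(
    "cod_pais,cod_pais2,desc_pais,cod_continente,desc_continente\n"
    "ABW,AW,Aruba,NA,North America\n"
    "Z99,Z9,desconocido,Z9,\n"
)

# Default parsing: the string "NA" becomes NaN
default = pd.read_csv(sample)
sample.seek(0)
# Only empty fields count as missing; "NA" stays a string
strict = pd.read_csv(sample, keep_default_na=False, na_values=[""])

print(default["cod_continente"].isna().tolist())
print(strict["cod_continente"].tolist())
```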