## MySQL - pandas

- One of the most useful features of pandas is its ability to interact with SQL databases.

| Function                                 | Description                                                                                |
|------------------------------------------|--------------------------------------------------------------------------------------------|
| **`pd.read_sql_table(table_name, con)`** | Reads a SQL table into a pandas DataFrame.                                                 |
| **`pd.read_sql_query(sql, con)`**        | Runs a SQL query and returns the result as a pandas DataFrame.                             |
| **`pd.read_sql(sql, con)`**              | Reads either a SQL query or a database table into a DataFrame.                             |
| **`pd.DataFrame.to_sql(name, con)`**     | Writes the records stored in a DataFrame to a SQL database.                                |
| **`pd.DataFrame.query()`**               | Filters the rows of a DataFrame with a boolean expression string (pandas syntax, not SQL). |

### Connecting to the database

To connect, we will use the **sqlalchemy** and **pymysql** libraries:

```bash
pip install sqlalchemy
pip install pymysql
```

In [22]:
import pandas as pd

import pymysql # Only needed to print the version
import sqlalchemy # Only needed to print the version
from sqlalchemy import create_engine

In [23]:
# Versions

print(f"pandas=={pd.__version__}")
print(f"pymysql=={pymysql.__version__}")
print(f"sqlalchemy=={sqlalchemy.__version__}")

pandas==2.2.3
pymysql==1.4.6
sqlalchemy==2.0.37


In [24]:
user = "root"
password = "password"
database = "sakila"

# Create the engine
engine = create_engine(f"mysql+pymysql://{user}:{password}@localhost/{database}")

# Open a connection
connection = engine.connect()

# Close the connection
connection.close()
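
Interpolating the password directly into the URL can break parsing when it contains characters such as `@`, `/` or `:`. The following is a minimal sketch, assuming a made-up password and a hypothetical helper named `build_mysql_url` (neither is part of the original notebook), that escapes the credentials with the standard library before building the connection string:

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, database):
    """Build a mysql+pymysql URL with URL-escaped credentials (hypothetical helper)."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}"
    )


# Hypothetical credentials, used only to illustrate the escaping.
url = build_mysql_url("root", "p@ss/word", "localhost", "sakila")
print(url)
```

The resulting string can be passed to `create_engine` in place of the f-string used above.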

#### Reading a SQL table

In [25]:
# Read the "actor" table
df = pd.read_sql_table(table_name = "actor", con = engine)

df.head()

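|     | actor_id | first_name | last_name    | last_update         |
|-----|----------|------------|--------------|---------------------|
| 0   | 1        | PENELOPE   | GUINESS      | 2006-02-15 04:34:33 |
| 1   | 2        | NICK       | WAHLBERG     | 2006-02-15 04:34:33 |
| 2   | 3        | ED         | CHASE        | 2006-02-15 04:34:33 |
| 3   | 4        | JENNIFER   | DAVIS        | 2006-02-15 04:34:33 |
| 4   | 5        | JOHNNY     | LOLLOBRIGIDA | 2006-02-15 04:34:33 |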
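
`pd.read_sql_table` also accepts `columns` and `index_col` to load only part of a table and choose the index. The sketch below is illustrative only: it assumes an in-memory SQLite engine as a stand-in for the MySQL server and a three-row copy of `actor` created on the spot, so it can run without a database server.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite engine standing in for the MySQL engine (assumption).
engine = create_engine("sqlite://")

# Small stand-in for the sakila "actor" table.
actors = pd.DataFrame({
    "actor_id": [1, 2, 3],
    "first_name": ["PENELOPE", "NICK", "ED"],
    "last_name": ["GUINESS", "WAHLBERG", "CHASE"],
})
actors.to_sql(name="actor", con=engine, index=False)

# Load only two columns and use actor_id as the DataFrame index.
df = pd.read_sql_table(
    table_name="actor",
    con=engine,
    index_col="actor_id",
    columns=["first_name", "last_name"],
)
print(df)
```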


#### Reading a SQL query

In [26]:
query = "SELECT * FROM actor WHERE first_name = 'PENELOPE'"

df = pd.read_sql_query(sql = query, con = engine)

# Show the DataFrame
df

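|     | actor_id | first_name | last_name | last_update         |
|-----|----------|------------|-----------|---------------------|
| 0   | 1        | PENELOPE   | GUINESS   | 2006-02-15 04:34:33 |
| 1   | 54       | PENELOPE   | PINKETT   | 2006-02-15 04:34:33 |
| 2   | 104      | PENELOPE   | CRONYN    | 2006-02-15 04:34:33 |
| 3   | 120      | PENELOPE   | MONROE    | 2006-02-15 04:34:33 |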
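
When the filter value comes from user input, it is safer to pass it through the `params` argument of `pd.read_sql_query` than to paste it into the SQL string. A minimal sketch, assuming an in-memory SQLite database with a tiny made-up `actor` table in place of the MySQL server:

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the MySQL server (assumption).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE actor (actor_id INTEGER, first_name TEXT, last_name TEXT)")
con.executemany(
    "INSERT INTO actor VALUES (?, ?, ?)",
    [(1, "PENELOPE", "GUINESS"), (2, "NICK", "WAHLBERG"), (54, "PENELOPE", "PINKETT")],
)

# The value travels separately from the SQL text, so the driver handles quoting.
query = "SELECT * FROM actor WHERE first_name = ?"
df = pd.read_sql_query(sql=query, con=con, params=("PENELOPE",))
print(df)

con.close()
```

The placeholder syntax depends on the driver: SQLite uses `?`, while PyMySQL uses `%s`.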


#### Reading a table or a query

**`pd.read_sql`** is a convenience wrapper around the two previous functions: given a SQL query it delegates to `pd.read_sql_query`, and given a table name it delegates to `pd.read_sql_table`.

In [27]:
query = "SELECT * FROM actor WHERE first_name = 'PENELOPE'"

df = pd.read_sql(sql = query, con = engine)

# Show the DataFrame
df

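|     | actor_id | first_name | last_name | last_update         |
|-----|----------|------------|-----------|---------------------|
| 0   | 1        | PENELOPE   | GUINESS   | 2006-02-15 04:34:33 |
| 1   | 54       | PENELOPE   | PINKETT   | 2006-02-15 04:34:33 |
| 2   | 104      | PENELOPE   | CRONYN    | 2006-02-15 04:34:33 |
| 3   | 120      | PENELOPE   | MONROE    | 2006-02-15 04:34:33 |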
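
To illustrate that `pd.read_sql` and `pd.read_sql_query` return the same result for a query, here is a small sketch. It assumes an in-memory SQLite database with a two-row made-up `actor` table instead of the MySQL server:

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the MySQL server (assumption).
con = sqlite3.connect(":memory:")
pd.DataFrame(
    {"actor_id": [1, 2], "first_name": ["PENELOPE", "NICK"]}
).to_sql(name="actor", con=con, index=False)

query = "SELECT * FROM actor WHERE first_name = 'PENELOPE'"

via_read_sql = pd.read_sql(sql=query, con=con)
via_read_sql_query = pd.read_sql_query(sql=query, con=con)

# Both calls run the same query, so the DataFrames are identical.
same_result = via_read_sql.equals(via_read_sql_query)
print(same_result)

con.close()
```

Note that passing a bare table name to `pd.read_sql` requires an SQLAlchemy connectable; with a plain `sqlite3` connection only queries are supported.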


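#### Creating a SQL table from pandas

- **`if_exists`** accepts `"fail"` (the default: raise a `ValueError` if the table already exists), `"replace"` (drop the table and recreate it), or `"append"` (insert the rows into the existing table).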

In [28]:
# Data
data = {"name" : ["John", "Anna", "Peter", "Linda"],
        "age"  : [28, 24, 35, 32],
        "city" : ["New York", "Paris", "Berlin", "London"]}

df = pd.DataFrame(data)


# Create a new table in sakila
df.to_sql(name = "ejemplo_tabla", con = engine, if_exists = "replace", index = False)

4
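
The three `if_exists` modes can be compared side by side. This is a sketch under the assumption that an in-memory SQLite database is an acceptable stand-in for MySQL; the table name is reused from the cell above and the data is shortened to two rows.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the MySQL server (assumption).
con = sqlite3.connect(":memory:")
df = pd.DataFrame({"name": ["John", "Anna"], "age": [28, 24]})


def count_rows():
    """Return the current number of rows in ejemplo_tabla."""
    result = pd.read_sql_query("SELECT COUNT(*) AS n FROM ejemplo_tabla", con)
    return int(result["n"].iloc[0])


# First write: the table does not exist yet, so it is created.
df.to_sql(name="ejemplo_tabla", con=con, index=False)

# "fail" raises ValueError because the table now exists.
try:
    df.to_sql(name="ejemplo_tabla", con=con, if_exists="fail", index=False)
    failed = False
except ValueError:
    failed = True

# "append" adds the rows to the existing table.
df.to_sql(name="ejemplo_tabla", con=con, if_exists="append", index=False)
rows_after_append = count_rows()

# "replace" drops the table and writes the DataFrame again.
df.to_sql(name="ejemplo_tabla", con=con, if_exists="replace", index=False)
rows_after_replace = count_rows()

print(failed, rows_after_append, rows_after_replace)
```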

#### Filtering data in a DataFrame

In [29]:
from sklearn.datasets import load_iris

data = load_iris()

X, y, column_names, target_names = data["data"], data["target"], data["feature_names"], data["target_names"]

target_names = {num : target for num, target in enumerate(target_names)}

df = pd.DataFrame(data = X, columns = [x.replace(" ", "_")[:-5] for x in column_names])

df["target"] = [target_names[x] for x in y]

df

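|     | sepal_length | sepal_width | petal_length | petal_width | target    |
|-----|--------------|-------------|--------------|-------------|-----------|
| 0   | 5.1          | 3.5         | 1.4          | 0.2         | setosa    |
| 1   | 4.9          | 3.0         | 1.4          | 0.2         | setosa    |
| 2   | 4.7          | 3.2         | 1.3          | 0.2         | setosa    |
| 3   | 4.6          | 3.1         | 1.5          | 0.2         | setosa    |
| 4   | 5.0          | 3.6         | 1.4          | 0.2         | setosa    |
| ... | ...          | ...         | ...          | ...         | ...       |
| 145 | 6.7          | 3.0         | 5.2          | 2.3         | virginica |
| 146 | 6.3          | 2.5         | 5.0          | 1.9         | virginica |
| 147 | 6.5          | 3.0         | 5.2          | 2.0         | virginica |
| 148 | 6.2          | 3.4         | 5.4          | 2.3         | virginica |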


In [30]:
# To learn more about the dataset:

# See which keys it has
print(data.keys())

# See the dataset description
print(data.DESCR)

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
    - sepal length in cm
    - sepal width in cm
    - petal length in cm
    - petal width in cm
    - class:
            - Iris-Setosa
            - Iris-Versicolour
            - Iris-Virginica

:Summary Statistics:

                Min  Max   Mean    SD   Class Correlation
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.go

In [31]:
query = "target == 'setosa'"

df.query(expr = query)

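|     | sepal_length | sepal_width | petal_length | petal_width | target |
|-----|--------------|-------------|--------------|-------------|--------|
| 0   | 5.1          | 3.5         | 1.4          | 0.2         | setosa |
| 1   | 4.9          | 3.0         | 1.4          | 0.2         | setosa |
| 2   | 4.7          | 3.2         | 1.3          | 0.2         | setosa |
| 3   | 4.6          | 3.1         | 1.5          | 0.2         | setosa |
| 4   | 5.0          | 3.6         | 1.4          | 0.2         | setosa |
| 5   | 5.4          | 3.9         | 1.7          | 0.4         | setosa |
| 6   | 4.6          | 3.4         | 1.4          | 0.3         | setosa |
| 7   | 5.0          | 3.4         | 1.5          | 0.2         | setosa |
| 8   | 4.4          | 2.9         | 1.4          | 0.2         | setosa |
| 9   | 4.9          | 3.1         | 1.5          | 0.1         | setosa |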


In [32]:
query = "sepal_length < 5"

df.query(expr = query)

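|     | sepal_length | sepal_width | petal_length | petal_width | target |
|-----|--------------|-------------|--------------|-------------|--------|
| 1   | 4.9          | 3.0         | 1.4          | 0.2         | setosa |
| 2   | 4.7          | 3.2         | 1.3          | 0.2         | setosa |
| 3   | 4.6          | 3.1         | 1.5          | 0.2         | setosa |
| 6   | 4.6          | 3.4         | 1.4          | 0.3         | setosa |
| 8   | 4.4          | 2.9         | 1.4          | 0.2         | setosa |
| 9   | 4.9          | 3.1         | 1.5          | 0.1         | setosa |
| 11  | 4.8          | 3.4         | 1.6          | 0.2         | setosa |
| 12  | 4.8          | 3.0         | 1.4          | 0.1         | setosa |
| 13  | 4.3          | 3.0         | 1.1          | 0.1         | setosa |
| 22  | 4.6          | 3.6         | 1.0          | 0.2         | setosa |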


In [33]:
query = "sepal_length < 5 & target == 'versicolor'"

df.query(expr = query)

|     | sepal_length | sepal_width | petal_length | petal_width | target     |
|-----|--------------|-------------|--------------|-------------|------------|
| 57  | 4.9          | 2.4         | 3.3          | 1.0         | versicolor |
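
Inside `query`, `&` and `|` take the precedence of `and` and `or`, which is why the unparenthesized expression above works; the equivalent boolean mask in plain Python needs parentheses. The sketch below uses a four-row made-up DataFrame rather than the full iris data, and the variable names `max_length` and `species` are illustrative. It shows both forms and how to reference Python variables with `@`:

```python
import pandas as pd

# Tiny stand-in for the iris DataFrame (made-up subset of rows).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.9, 6.3],
    "target": ["setosa", "setosa", "versicolor", "virginica"],
})

max_length = 5
species = "versicolor"

# Local Python variables are referenced with the @ prefix.
by_query = df.query("sepal_length < @max_length and target == @species")

# Equivalent boolean mask; each comparison must be parenthesized.
by_mask = df[(df["sepal_length"] < max_length) & (df["target"] == species)]

print(by_query)
```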

