# SQL in Python
<img src=".\img\image-4.png" alt="EDA Path"
    title="A typical EDA path" width="600" height="300" />
## Connecting to the database

### Documentation for the sqlite3 library we are going to use:
https://docs.python.org/3/library/sqlite3.html


In [1]:
# Import libraries
import pandas as pd
import sqlite3

In [2]:
# Connect to the chinook.db database
path = "chinook.db"
connection = sqlite3.connect(path)

# Get a cursor: the object we use to execute queries on the connection and fetch their results
curs = connection.cursor()


In [3]:
# Create a simple query
query = """
SELECT *
FROM genres
"""

In [5]:
# Execute the query
my_query = curs.execute(query)
my_query

<sqlite3.Cursor at 0x20a4b1ad730>

In [6]:
# Fetch the result of the query
my_query.fetchall()

[(1, 'Rock'),
 (2, 'Jazz'),
 (3, 'Metal'),
 (4, 'Alternative & Punk'),
 (5, 'Rock And Roll'),
 (6, 'Blues'),
 (7, 'Latin'),
 (8, 'Reggae'),
 (9, 'Pop'),
 (10, 'Soundtrack'),
 (11, 'Bossa Nova'),
 (12, 'Easy Listening'),
 (13, 'Heavy Metal'),
 (14, 'R&B/Soul'),
 (15, 'Electronica/Dance'),
 (16, 'World'),
 (17, 'Hip Hop/Rap'),
 (18, 'Science Fiction'),
 (19, 'TV Shows'),
 (20, 'Sci Fi & Fantasy'),
 (21, 'Drama'),
 (22, 'Comedy'),
 (23, 'Alternative'),
 (24, 'Classical'),
 (25, 'Opera')]
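
A cursor hands out each row only once, so calling `fetchall()` a second time on the same result returns an empty list. The sketch below illustrates this with `fetchone()`, `fetchmany()` and `fetchall()`. It uses a small in-memory table that stands in for `genres` (toy data, not the real chinook.db), and the names `conn` and `cur` are only illustrative.

```python
import sqlite3

# Toy in-memory database standing in for chinook.db (illustrative data only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE genres (GenreId INTEGER PRIMARY KEY, Name TEXT)")
cur.executemany("INSERT INTO genres VALUES (?, ?)",
                [(1, "Rock"), (2, "Jazz"), (3, "Metal")])

result = cur.execute("SELECT * FROM genres ORDER BY GenreId")
first = result.fetchone()    # next pending row, as a tuple
some = result.fetchmany(1)   # list with up to 1 further row
rest = result.fetchall()     # every row that is still pending
again = result.fetchall()    # nothing left: the result has been consumed

print(first, some, rest, again)
conn.close()
```

To read the rows again, the query has to be executed again.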

In [8]:
# This function runs a query and returns the result as a pandas DataFrame
def sql_query(query):
    curs.execute(query)
    datos_query = curs.fetchall()
    col_names = [description[0] for description in curs.description]
    return pd.DataFrame(datos_query, columns=col_names)

sql_query(query)

Unnamed: 0,GenreId,Name
0,1,Rock
1,2,Jazz
2,3,Metal
3,4,Alternative & Punk
4,5,Rock And Roll
5,6,Blues
6,7,Latin
7,8,Reggae
8,9,Pop
9,10,Soundtrack
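
When part of a query comes from a variable, it is safer to bind the value with a `?` placeholder than to paste it into the SQL string. The following is a minimal sketch of that idea. `sql_query_params` is a hypothetical variant of the helper above that returns plain Python objects instead of a DataFrame, and the table is toy data in memory rather than chinook.db.

```python
import sqlite3

# Toy in-memory database standing in for chinook.db (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genres (GenreId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO genres VALUES (?, ?)",
                 [(1, "Rock"), (2, "Jazz"), (3, "Metal")])

def sql_query_params(connection, query, params=()):
    """Hypothetical helper: run a parameterized query, return (column names, rows)."""
    cur = connection.execute(query, params)  # values are bound to the ? placeholders
    col_names = [description[0] for description in cur.description]
    return col_names, cur.fetchall()

cols, rows = sql_query_params(
    conn, "SELECT GenreId, Name FROM genres WHERE Name = ?", ("Jazz",)
)
print(cols, rows)
conn.close()
```

The column names and rows returned here are the same two pieces that `pd.DataFrame(rows, columns=cols)` would need.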


In [9]:
# We can also get the same result directly with pandas
pd.read_sql_query(query, connection)  # connection is the sqlite3 connection object opened above, not the file path

Unnamed: 0,GenreId,Name
0,1,Rock
1,2,Jazz
2,3,Metal
3,4,Alternative & Punk
4,5,Rock And Roll
5,6,Blues
6,7,Latin
7,8,Reggae
8,9,Pop
9,10,Soundtrack


## Now we can start the chinook exercises:
Before querying a database we need to know what it contains, and the best way to find out is to look at its **data model**.

![Chinook data model](./img/chinook_data_model.png)

### 1.	Invoices of customers from Brazil: customer name, invoice Id, invoice date and billing country

In [22]:
query1 = """
SELECT *
FROM invoices i, customers c
WHERE c.CustomerId = i.CustomerId   

"""

In [44]:
# Tables that do not share a key cannot be related directly. The data model shows that these two tables are related through CustomerId.
# Listing both tables and matching the keys in WHERE returns only the rows present in both tables (it behaves like an INNER JOIN).
# When a column exists in both tables, qualify it with the table name to say which one you want, e.g. invoices.CustomerId.
query1 = """
SELECT invoices.*, FirstName
FROM invoices, customers
WHERE customers.CustomerId = invoices.CustomerId AND customers.Country = 'Brazil'
"""

In [42]:
# Another way to do it:
query= """
SELECT c.FirstName || ' ' || c.LastName AS "Full Name", i.InvoiceId, i.InvoiceDate, i.BillingCountry
FROM customers c
JOIN invoices i ON c.CustomerId = i.CustomerId
WHERE c.Country = 'Brazil'
"""

pd.read_sql_query(query,connection)


Unnamed: 0,Full Name,InvoiceId,InvoiceDate,BillingCountry
0,Luís Gonçalves,98,2010-03-11 00:00:00,Brazil
1,Luís Gonçalves,121,2010-06-13 00:00:00,Brazil
2,Luís Gonçalves,143,2010-09-15 00:00:00,Brazil
3,Luís Gonçalves,195,2011-05-06 00:00:00,Brazil
4,Luís Gonçalves,316,2012-10-27 00:00:00,Brazil
5,Luís Gonçalves,327,2012-12-07 00:00:00,Brazil
6,Luís Gonçalves,382,2013-08-07 00:00:00,Brazil
7,Eduardo Martins,25,2009-04-09 00:00:00,Brazil
8,Eduardo Martins,154,2010-11-14 00:00:00,Brazil
9,Eduardo Martins,177,2011-02-16 00:00:00,Brazil
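
The comma-separated `FROM ... WHERE` form and the explicit `JOIN ... ON` form select the same rows. The sketch below checks this on a tiny in-memory stand-in for `customers` and `invoices`. The rows are invented for illustration, and only the columns the example needs are created.

```python
import sqlite3

# Toy in-memory tables standing in for chinook.db (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, FirstName TEXT, Country TEXT);
CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);
INSERT INTO customers VALUES (1, 'Ana', 'Brazil'), (2, 'Bob', 'Canada'), (3, 'Caio', 'Brazil');
INSERT INTO invoices VALUES (10, 1, 3.98), (11, 2, 1.98), (12, 1, 0.99);
""")

# Implicit join: list both tables and match the keys in WHERE.
implicit = conn.execute("""
SELECT invoices.InvoiceId, customers.FirstName
FROM invoices, customers
WHERE customers.CustomerId = invoices.CustomerId AND customers.Country = 'Brazil'
ORDER BY invoices.InvoiceId
""").fetchall()

# Explicit join: the key match moves to ON, the filter stays in WHERE.
explicit = conn.execute("""
SELECT i.InvoiceId, c.FirstName
FROM customers c
JOIN invoices i ON c.CustomerId = i.CustomerId
WHERE c.Country = 'Brazil'
ORDER BY i.InvoiceId
""").fetchall()

print(implicit == explicit, explicit)
conn.close()
```

In this toy data the customer Caio has no invoices, so he appears in neither result: both forms behave as an inner join.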


### 2.	Invoices of customers from Brazil

In [53]:
query2 = """
SELECT i.*
FROM customers c
JOIN invoices i ON c.CustomerId = i.CustomerId
WHERE c.Country = 'Brazil'
"""

pd.read_sql_query(query2,connection).head()
# head() is used here only to keep the table short while working through the exercises

Unnamed: 0,InvoiceId,CustomerId,InvoiceDate,BillingAddress,BillingCity,BillingState,BillingCountry,BillingPostalCode,Total
0,98,1,2010-03-11 00:00:00,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,3.98
1,121,1,2010-06-13 00:00:00,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,3.96
2,143,1,2010-09-15 00:00:00,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,5.94
3,195,1,2011-05-06 00:00:00,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,0.99
4,316,1,2012-10-27 00:00:00,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,1.98


### 3.	Show each invoice associated with each sales agent, with the agent's full name.

In [59]:
query3="""
SELECT i.*, e.FirstName || ' ' || e.LastName AS "Full Name"
FROM employees e
JOIN customers c ON e.EmployeeId = c.SupportRepId
JOIN invoices i ON c.CustomerId = i.CustomerId
"""

pd.read_sql_query(query3,connection)

Unnamed: 0,InvoiceId,CustomerId,InvoiceDate,BillingAddress,BillingCity,BillingState,BillingCountry,BillingPostalCode,Total,Full Name
0,98,1,2010-03-11 00:00:00,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,3.98,Jane Peacock
1,121,1,2010-06-13 00:00:00,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,3.96,Jane Peacock
2,143,1,2010-09-15 00:00:00,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,5.94,Jane Peacock
3,195,1,2011-05-06 00:00:00,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,0.99,Jane Peacock
4,316,1,2012-10-27 00:00:00,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,1.98,Jane Peacock
...,...,...,...,...,...,...,...,...,...,...
407,88,57,2010-01-13 00:00:00,"Calle Lira, 198",Santiago,,Chile,,17.91,Steve Johnson
408,217,57,2011-08-20 00:00:00,"Calle Lira, 198",Santiago,,Chile,,1.98,Steve Johnson
409,240,57,2011-11-22 00:00:00,"Calle Lira, 198",Santiago,,Chile,,3.96,Steve Johnson
410,262,57,2012-02-24 00:00:00,"Calle Lira, 198",Santiago,,Chile,,5.94,Steve Johnson


### 4.	For each invoice, show the customer name, the country, the agent name and the total

In [101]:
# It is usually easier to start from a table at one end of the chain of joins
query4= """
SELECT c.FirstName || ' ' || c.LastName AS "Full Name", c.Country, e.FirstName || ' ' || e.LastName AS "Employee", i.Total
FROM employees e
JOIN customers c ON e.EmployeeId = c.SupportRepId
JOIN invoices i ON c.CustomerId = i.CustomerId
"""

pd.read_sql_query(query4,connection).head(2)

Unnamed: 0,Full Name,Country,Employee,Total
0,Luís Gonçalves,Brazil,Jane Peacock,3.98
1,Luís Gonçalves,Brazil,Jane Peacock,3.96


### 5.	Show each invoice line item with the name of the track.

In [88]:
query5 ="""
SELECT i.*, t.Name
FROM tracks t
JOIN invoice_items ii ON t.TrackId = ii.TrackId
JOIN invoices i ON ii.InvoiceId = i.InvoiceId
"""

pd.read_sql_query(query5,connection).head(2)

Unnamed: 0,InvoiceId,CustomerId,InvoiceDate,BillingAddress,BillingCity,BillingState,BillingCountry,BillingPostalCode,Total,Name
0,1,2,2009-01-01 00:00:00,Theodor-Heuss-Straße 34,Stuttgart,,Germany,70174,1.98,Balls to the Wall
1,1,2,2009-01-01 00:00:00,Theodor-Heuss-Straße 34,Stuttgart,,Germany,70174,1.98,Restless and Wild


### 6.	Show all tracks with their name, format, album and genre.

In [108]:
# Giving each selected column an alias makes it clear what we are showing.
# The media_types join has to compare t.MediaTypeId with m.MediaTypeId; comparing a column with itself pairs every track with every format.
query6 = """
SELECT t.Name AS "Song Name", m.Name AS "Format", a.Title AS "Album", g.Name AS "Genre"
FROM tracks t
JOIN genres g ON t.GenreId = g.GenreId
JOIN albums a ON t.AlbumId = a.AlbumId
JOIN media_types m ON t.MediaTypeId = m.MediaTypeId
"""

pd.read_sql_query(query6,connection).head(1)

Unnamed: 0,Song Name,Format,Album,Genre
0,For Those About To Rock (We Salute You),MPEG audio file,For Those About To Rock We Salute You,Rock
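
A join condition has to compare a column from each of the two tables. The sketch below, on invented toy rows, shows what happens when the condition compares a column with itself, as in `t.MediaTypeId = t.MediaTypeId`: the test is always true, so every track is paired with every media type and the same song shows up once per format.

```python
import sqlite3

# Toy in-memory tables standing in for chinook.db (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE media_types (MediaTypeId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT, MediaTypeId INTEGER);
INSERT INTO media_types VALUES (1, 'MPEG audio file'), (2, 'Protected AAC audio file');
INSERT INTO tracks VALUES (1, 'Song A', 1), (2, 'Song B', 2);
""")

# Always-true condition: each track is combined with every media type.
wrong = conn.execute(
    "SELECT COUNT(*) FROM tracks t JOIN media_types m ON t.MediaTypeId = t.MediaTypeId"
).fetchone()[0]

# Condition that links the two tables: one row per track.
right = conn.execute(
    "SELECT COUNT(*) FROM tracks t JOIN media_types m ON t.MediaTypeId = m.MediaTypeId"
).fetchone()[0]

print(wrong, right)
conn.close()
```

A quick sanity check for this kind of mistake is to compare the row count of the joined result with the row count of the main table.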


### 7.	Show how many tracks there are in each playlist and the name of each playlist.

In [93]:
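# Intermediate step: join playlists with playlist_track to get one row per (playlist, track) pair.
# Grouping these rows by playlist and counting them gives the number of tracks in each playlist.
query7 = """
SELECT *
FROM playlists p
JOIN playlist_track pt ON p.PlaylistId = pt.PlaylistId
"""

pd.read_sql_query(query7,connection).head(2)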

Unnamed: 0,PlaylistId,Name,PlaylistId.1,TrackId
0,1,Music,1,1
1,1,Music,1,2
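
One possible way to answer the exercise is to group the (playlist, track) pairs by playlist and count them. This is only a sketch on invented toy rows, not the result for chinook.db. The `LEFT JOIN` is a choice made here so that playlists with no tracks still appear with a count of 0.

```python
import sqlite3

# Toy in-memory tables standing in for chinook.db (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE playlists (PlaylistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE playlist_track (PlaylistId INTEGER, TrackId INTEGER);
INSERT INTO playlists VALUES (1, 'Music'), (2, 'Movies'), (3, 'TV Shows');
INSERT INTO playlist_track VALUES (1, 1), (1, 2), (1, 3), (3, 4);
""")

rows = conn.execute("""
SELECT p.Name AS "Playlist", COUNT(pt.TrackId) AS "Tracks"
FROM playlists p
LEFT JOIN playlist_track pt ON p.PlaylistId = pt.PlaylistId
GROUP BY p.PlaylistId, p.Name
ORDER BY p.PlaylistId
""").fetchall()

print(rows)
conn.close()
```

`COUNT(pt.TrackId)` ignores NULLs, which is why the empty playlist gets 0 instead of 1. The `tracks` table is not needed for a plain count.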


In [107]:
q = """pragma table_info("media_types")"""
pd.read_sql_query(q, connection).T.head(2)

Unnamed: 0,0,1
cid,0,1
name,MediaTypeId,Name
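
The same `PRAGMA table_info` call can be combined with the `sqlite_master` catalog to list every table together with its columns, which is handy when the data model image is not at hand. The sketch below runs on two toy tables created in memory. With the real database, the `connection` object from the top of the notebook would take the place of `conn`.

```python
import sqlite3

# Toy in-memory tables standing in for chinook.db (illustrative schema only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media_types (MediaTypeId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE genres (GenreId INTEGER PRIMARY KEY, Name TEXT)")

# sqlite_master holds one row per schema object; keep only the tables.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) for each column.
columns = {
    t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')]
    for t in tables
}

print(columns)
conn.close()
```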


### 8.	Show how much each employee has sold.

In [None]:
query8 = 

pd.read_sql_query(query8,connection)
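
A possible approach, sketched on invented toy rows, reuses the employees → customers → invoices chain from exercises 3 and 4 and adds `SUM` with `GROUP BY`. The totals below are made up for the example and say nothing about the real chinook.db figures.

```python
import sqlite3

# Toy in-memory tables standing in for chinook.db (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, SupportRepId INTEGER);
CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);
INSERT INTO employees VALUES (3, 'Jane', 'Peacock'), (5, 'Steve', 'Johnson');
INSERT INTO customers VALUES (1, 3), (2, 5), (3, 3);
INSERT INTO invoices VALUES (1, 1, 2.0), (2, 2, 5.0), (3, 3, 1.5), (4, 1, 0.5);
""")

sales = conn.execute("""
SELECT e.FirstName || ' ' || e.LastName AS "Employee", SUM(i.Total) AS "Total Sales"
FROM employees e
JOIN customers c ON e.EmployeeId = c.SupportRepId
JOIN invoices i ON c.CustomerId = i.CustomerId
GROUP BY e.EmployeeId
ORDER BY e.EmployeeId
""").fetchall()

print(sales)
conn.close()
```

Because these are inner joins, employees who are not the support rep of any customer do not appear in the result.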

### 9.	Which sales agent sold the most in 2009?
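
One way to approach this, sketched on invented toy rows, is to filter the invoices by year with `strftime('%Y', ...)`, sum the totals per employee, sort in descending order and keep the first row. It assumes `InvoiceDate` is stored as text in the `YYYY-MM-DD HH:MM:SS` form seen in the outputs above.

```python
import sqlite3

# Toy in-memory tables standing in for chinook.db (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, SupportRepId INTEGER);
CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER,
                       InvoiceDate TEXT, Total REAL);
INSERT INTO employees VALUES (3, 'Jane', 'Peacock'), (5, 'Steve', 'Johnson');
INSERT INTO customers VALUES (1, 3), (2, 5);
INSERT INTO invoices VALUES
    (1, 1, '2009-01-01 00:00:00', 2.0),
    (2, 2, '2009-02-01 00:00:00', 5.0),
    (3, 1, '2010-03-11 00:00:00', 20.0);
""")

top_2009 = conn.execute("""
SELECT e.FirstName || ' ' || e.LastName AS "Employee", SUM(i.Total) AS "Sales 2009"
FROM employees e
JOIN customers c ON e.EmployeeId = c.SupportRepId
JOIN invoices i ON c.CustomerId = i.CustomerId
WHERE strftime('%Y', i.InvoiceDate) = '2009'
GROUP BY e.EmployeeId
ORDER BY SUM(i.Total) DESC
LIMIT 1
""").fetchall()

print(top_2009)
conn.close()
```

The 2010 invoice is excluded by the `WHERE` clause, so it does not affect the ranking even though it is the largest one in the toy data.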

### 10.	Which are the 3 best-selling bands?
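
A sketch of one possible query follows the chain artists → albums → tracks → invoice_items and ranks artists by revenue. The `UnitPrice` and `Quantity` columns of `invoice_items` are an assumption taken from the usual Chinook schema (they do not appear in the cells above), and all rows are invented toy data.

```python
import sqlite3

# Toy in-memory tables standing in for chinook.db (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);
-- UnitPrice and Quantity are assumed column names
CREATE TABLE invoice_items (InvoiceId INTEGER, TrackId INTEGER, UnitPrice REAL, Quantity INTEGER);
INSERT INTO artists VALUES (1, 'Artist A'), (2, 'Artist B'), (3, 'Artist C'), (4, 'Artist D');
INSERT INTO albums VALUES (1, 'A1', 1), (2, 'B1', 2), (3, 'C1', 3), (4, 'D1', 4);
INSERT INTO tracks VALUES (1, 'a', 1), (2, 'b', 2), (3, 'c', 3), (4, 'd', 4);
INSERT INTO invoice_items VALUES (1, 1, 1.0, 1), (1, 2, 1.0, 3), (2, 3, 1.0, 2), (2, 2, 1.0, 1);
""")

top3 = conn.execute("""
SELECT ar.Name AS "Artist", SUM(ii.UnitPrice * ii.Quantity) AS "Sales"
FROM artists ar
JOIN albums al ON ar.ArtistId = al.ArtistId
JOIN tracks t ON al.AlbumId = t.AlbumId
JOIN invoice_items ii ON t.TrackId = ii.TrackId
GROUP BY ar.ArtistId
ORDER BY SUM(ii.UnitPrice * ii.Quantity) DESC
LIMIT 3
""").fetchall()

print(top3)
conn.close()
```

If "best-selling" is meant as number of units rather than revenue, `SUM(ii.Quantity)` would be the expression to rank by.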

### 11. Show how many Rock tracks there are in each playlist
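
A possible sketch joins playlists → playlist_track → tracks → genres, keeps only the rows whose genre is Rock and counts them per playlist. The rows are invented toy data.

```python
import sqlite3

# Toy in-memory tables standing in for chinook.db (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (GenreId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT, GenreId INTEGER);
CREATE TABLE playlists (PlaylistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE playlist_track (PlaylistId INTEGER, TrackId INTEGER);
INSERT INTO genres VALUES (1, 'Rock'), (2, 'Jazz');
INSERT INTO tracks VALUES (1, 'r1', 1), (2, 'r2', 1), (3, 'j1', 2);
INSERT INTO playlists VALUES (1, 'Music'), (2, 'Mix');
INSERT INTO playlist_track VALUES (1, 1), (1, 2), (1, 3), (2, 3), (2, 1);
""")

rock_counts = conn.execute("""
SELECT p.Name AS "Playlist", COUNT(*) AS "Rock Tracks"
FROM playlists p
JOIN playlist_track pt ON p.PlaylistId = pt.PlaylistId
JOIN tracks t ON pt.TrackId = t.TrackId
JOIN genres g ON t.GenreId = g.GenreId
WHERE g.Name = 'Rock'
GROUP BY p.PlaylistId, p.Name
ORDER BY p.PlaylistId
""").fetchall()

print(rock_counts)
conn.close()
```

With inner joins and a `WHERE` filter, playlists that contain no Rock tracks are left out of the result instead of being listed with 0.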

### 12. Show a table with all tracks and their invoice Id(s), whether or not they have ever been sold.
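
Keeping tracks that were never sold calls for a `LEFT JOIN` from `tracks` to `invoice_items`: unsold tracks stay in the result with NULL as their invoice Id. A minimal sketch on invented toy rows:

```python
import sqlite3

# Toy in-memory tables standing in for chinook.db (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE invoice_items (InvoiceId INTEGER, TrackId INTEGER);
INSERT INTO tracks VALUES (1, 'Sold twice'), (2, 'Never sold'), (3, 'Sold once');
INSERT INTO invoice_items VALUES (10, 1), (11, 1), (11, 3);
""")

track_invoices = conn.execute("""
SELECT t.Name, ii.InvoiceId
FROM tracks t
LEFT JOIN invoice_items ii ON t.TrackId = ii.TrackId
ORDER BY t.TrackId, ii.InvoiceId
""").fetchall()

print(track_invoices)
conn.close()
```

SQL NULL arrives in Python as `None`. A track sold on several invoices appears once per invoice.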

### 13. How many artists have no albums?

In [153]:
# This is a left join, like how="left" in a pandas merge. Artists without a matching album get NULL in the album columns, so we filter on a NULL album column.
query13 = """
SELECT COUNT(*) as "Artists without album"
FROM artists a
LEFT JOIN albums al ON al.ArtistId = a.ArtistId
WHERE al.Title IS NULL
"""

pd.read_sql_query(query13,connection)

Unnamed: 0,Artists without album
0,71
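
The `LEFT JOIN ... IS NULL` pattern is one way to find rows with no match. `NOT EXISTS` is an equivalent alternative. The sketch below compares both on invented toy rows. It tests `al.AlbumId IS NULL` rather than `al.Title`, on the assumption that a primary key can never be NULL in a real album row.

```python
import sqlite3

# Toy in-memory tables standing in for chinook.db (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
INSERT INTO artists VALUES (1, 'With albums'), (2, 'No albums'), (3, 'Also none');
INSERT INTO albums VALUES (1, 'First', 1), (2, 'Second', 1);
""")

# Left join, then keep only the artists whose album side is empty.
left_join_count = conn.execute("""
SELECT COUNT(*)
FROM artists a
LEFT JOIN albums al ON al.ArtistId = a.ArtistId
WHERE al.AlbumId IS NULL
""").fetchone()[0]

# Same question asked with a correlated subquery.
not_exists_count = conn.execute("""
SELECT COUNT(*)
FROM artists a
WHERE NOT EXISTS (SELECT 1 FROM albums al WHERE al.ArtistId = a.ArtistId)
""").fetchone()[0]

print(left_join_count, not_exists_count)
conn.close()
```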
