# Northwind

Northwind is a classic example of a product-sales business. Microsoft originally designed it to teach people how to use Access. Here we use it to illustrate how an operational database works and to get familiar with SQL syntax.

https://en.wikiversity.org/wiki/Database_Examples/Northwind/SQLite

In [36]:
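```python
import sqlite3

# Connect to the DB (SQLite creates the file if it does not exist)
conn = sqlite3.connect("northwind.db")
c = conn.cursor()

# Read the whole script that creates and fills the tables
with open("northwind_script.sql", "r") as script:
    sql_script = script.read()

# Foreign key enforcement is off by default in SQLite and is set per connection
c.execute("PRAGMA foreign_keys = ON;")
c.executescript(sql_script);  # trailing ';' hides the cursor repr in Jupyter
```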

In [2]:
```python
c.execute("""SELECT name FROM sqlite_master WHERE type='table';""").fetchall()
```

```
[('Categories',),
 ('sqlite_sequence',),
 ('Customers',),
 ('Employees',),
 ('Shippers',),
 ('Suppliers',),
 ('Products',),
 ('Orders',),
 ('OrderDetails',)]
```

[SQLAlchemy](https://www.sqlalchemy.org/) lets us abstract away the particularities of each database vendor, because it speaks whichever SQL dialect is required. Combined with pandas, it lets us perform all the operations above while always receiving the results as a DataFrame.

We can work with different engines and use the same operations each time:

* [MySQL](https://docs.sqlalchemy.org/en/20/dialects/mysql.html)
* [PostgreSQL](https://docs.sqlalchemy.org/en/20/dialects/postgresql.html)
* [Oracle](https://docs.sqlalchemy.org/en/20/dialects/oracle.html)
* [MS SQL Server](https://docs.sqlalchemy.org/en/20/dialects/mssql.html)
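
Switching between the engines listed above mostly comes down to changing the URL passed to `create_engine`. The minimal sketch below only parses a few URLs with `sqlalchemy.engine.make_url`, so no driver has to be installed and nothing connects. The hosts, ports, credentials, database names and driver choices (`pymysql`, `psycopg2`, `oracledb`, `pyodbc`) are hypothetical placeholders.

```python
from sqlalchemy.engine import make_url

# Hypothetical connection URLs: only this string changes between engines.
URLS = {
    "sqlite": "sqlite:///northwind.db",
    "mysql": "mysql+pymysql://user:secret@localhost:3306/northwind",
    "postgresql": "postgresql+psycopg2://user:secret@localhost:5432/northwind",
    "oracle": "oracle+oracledb://user:secret@localhost:1521/?service_name=FREEPDB1",
    "mssql": "mssql+pyodbc://user:secret@localhost:1433/northwind?driver=ODBC+Driver+18+for+SQL+Server",
}

# make_url only parses the string: it neither imports a driver nor connects.
parsed = {name: make_url(url) for name, url in URLS.items()}

for name, url in parsed.items():
    print(f"{name:<11} backend={url.get_backend_name():<11} host={url.host} db={url.database}")
```

Passing any of these strings to `create_engine` instead of `sqlite:///northwind.db` would leave the `pd.read_sql` calls in this notebook unchanged. This assumes the matching driver is installed and the schema exists on that server.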

In [3]:
```python
# !pip install sqlalchemy
```

We create the connection engine.

In [4]:
```python
from sqlalchemy import create_engine

engine = create_engine('sqlite:///northwind.db', echo=False)
```

If we pass the engine to pandas as a parameter, we can read a whole table directly.

In [5]:
```python
import pandas as pd

employee_df = pd.read_sql_table("Employees", con=engine)
employee_df
```

| | EmployeeID | LastName | FirstName | BirthDate | Photo | Notes |
|---|---|---|---|---|---|---|
| 0 | 1 | Davolio | Nancy | 1968-12-08 | EmpID1.pic | Education includes a BA in psychology from Col... |
| 1 | 2 | Fuller | Andrew | 1952-02-19 | EmpID2.pic | Andrew received his BTS commercial and a Ph.D.... |
| 2 | 3 | Leverling | Janet | 1963-08-30 | EmpID3.pic | Janet has a BS degree in chemistry from Boston... |
| 3 | 4 | Peacock | Margaret | 1958-09-19 | EmpID4.pic | Margaret holds a BA in English literature from... |
| 4 | 5 | Buchanan | Steven | 1955-03-04 | EmpID5.pic | Steven Buchanan graduated from St. Andrews Uni... |
| 5 | 6 | Suyama | Michael | 1963-07-02 | EmpID6.pic | Michael is a graduate of Sussex University (MA... |
| 6 | 7 | King | Robert | 1960-05-29 | EmpID7.pic | Robert King served in the Peace Corps and trav... |
| 7 | 8 | Callahan | Laura | 1958-01-09 | EmpID8.pic | Laura received a BA in psychology from the Uni... |
| 8 | 9 | Dodsworth | Anne | 1969-07-02 | EmpID9.pic | Anne has a BA degree in English from St. Lawre... |
| 9 | 10 | West | Adam | 1928-09-19 | EmpID10.pic | An old chum. |


SQLAlchemy provides this abstraction because it is the component that builds the queries for us, given a connection. If we express the query with SQLAlchemy's own constructs rather than as a raw SQL string, SQLAlchemy can switch from one dialect to another automatically.

In [6]:
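```python
from sqlalchemy import select, table, column

# Lightweight description of the table, so the statement gets a FROM clause
employees = table("Employees", column("EmployeeID"), column("FirstName"))

stmt = select(employees.c.FirstName).where(employees.c.EmployeeID == 1)
print(stmt)
```

```
SELECT "Employees"."FirstName" 
FROM "Employees" 
WHERE "Employees"."EmployeeID" = :EmployeeID_1
```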
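
To see the dialect switching in action, the sketch below writes one statement with SQLAlchemy constructs and compiles it for several dialects without connecting to any database. The `table()`/`column()` description of `Employees` and the `.limit(5)` are illustrative additions. The limit is included because row limiting is spelled differently across engines.

```python
from sqlalchemy import column, select, table
from sqlalchemy.dialects import mssql, mysql, postgresql, sqlite

# Lightweight description of the table: no engine or reflection needed.
employees = table("Employees", column("EmployeeID"), column("FirstName"))

# One statement, written once with SQLAlchemy constructs.
stmt = (
    select(employees.c.FirstName)
    .where(employees.c.EmployeeID == 1)
    .limit(5)
)

dialects = {
    "sqlite": sqlite.dialect(),
    "postgresql": postgresql.dialect(),
    "mysql": mysql.dialect(),
    "mssql": mssql.dialect(),
}

# Compile the same statement to each dialect's SQL text (no connection is made).
compiled = {name: str(stmt.compile(dialect=d)) for name, d in dialects.items()}

for name, sql in compiled.items():
    print(f"-- {name}\n{sql}\n")
```

The printed SQL differs in identifier quoting (double quotes, backticks, square brackets), in placeholder style and in row limiting (`LIMIT` versus SQL Server's `TOP`). Oracle is left out only to keep the sketch short.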


We will keep writing raw SQL queries, since we want to go deeper into the language itself, but do take a look at the [documentation](https://docs.sqlalchemy.org/en/20/tutorial/data_select.html).

In [7]:
```python
import pandas as pd

connection = engine.connect()
employee_df = pd.read_sql('SELECT * FROM Employees', con=connection)
employee_df
```

| | EmployeeID | LastName | FirstName | BirthDate | Photo | Notes |
|---|---|---|---|---|---|---|
| 0 | 1 | Davolio | Nancy | 1968-12-08 | EmpID1.pic | Education includes a BA in psychology from Col... |
| 1 | 2 | Fuller | Andrew | 1952-02-19 | EmpID2.pic | Andrew received his BTS commercial and a Ph.D.... |
| 2 | 3 | Leverling | Janet | 1963-08-30 | EmpID3.pic | Janet has a BS degree in chemistry from Boston... |
| 3 | 4 | Peacock | Margaret | 1958-09-19 | EmpID4.pic | Margaret holds a BA in English literature from... |
| 4 | 5 | Buchanan | Steven | 1955-03-04 | EmpID5.pic | Steven Buchanan graduated from St. Andrews Uni... |
| 5 | 6 | Suyama | Michael | 1963-07-02 | EmpID6.pic | Michael is a graduate of Sussex University (MA... |
| 6 | 7 | King | Robert | 1960-05-29 | EmpID7.pic | Robert King served in the Peace Corps and trav... |
| 7 | 8 | Callahan | Laura | 1958-01-09 | EmpID8.pic | Laura received a BA in psychology from the Uni... |
| 8 | 9 | Dodsworth | Anne | 1969-07-02 | EmpID9.pic | Anne has a BA degree in English from St. Lawre... |
| 9 | 10 | West | Adam | 1928-09-19 | EmpID10.pic | An old chum. |


This lets us do more elaborate work: the database performs the operations, and we receive the results conveniently in a DataFrame.

In [8]:
```python
query = """
SELECT COUNT(*) AS employee_count
FROM Employees  """
pd.read_sql(query, con=connection)
```

| | employee_count |
|---|---|
| 0 | 10 |


In [9]:
```python
query = """
SELECT BirthDate, typeof(BirthDate)
FROM Employees """
pd.read_sql(query, con=connection)
```

| | BirthDate | typeof(BirthDate) |
|---|---|---|
| 0 | 1968-12-08 | text |
| 1 | 1952-02-19 | text |
| 2 | 1963-08-30 | text |
| 3 | 1958-09-19 | text |
| 4 | 1955-03-04 | text |
| 5 | 1963-07-02 | text |
| 6 | 1960-05-29 | text |
| 7 | 1958-01-09 | text |
| 8 | 1969-07-02 | text |
| 9 | 1928-09-19 | text |


As you can see, the date is stored as text (SQLite has no dedicated date type), so some kind of conversion is needed before we can calculate with it.

In [10]:
```python
import datetime

today = datetime.datetime.today().strftime("%Y-%m-%d")
today
```

```
'2024-05-23'
```

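`datetime()` is the SQLite function that parses a date string and returns it normalized as `YYYY-MM-DD HH:MM:SS`. Because SQLite has no date type, that result is still text. When we subtract two such strings, SQLite converts each one to the number formed by its leading digits, which is the year. The `Age` column below is therefore a difference of calendar years rather than an exact age.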

In [11]:
```python
query = f"""
SELECT *, datetime('{today}') - datetime(birthdate) as Age
FROM Employees """
pd.read_sql(query, con=connection)
```

| | EmployeeID | LastName | FirstName | BirthDate | Photo | Notes | Age |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Davolio | Nancy | 1968-12-08 | EmpID1.pic | Education includes a BA in psychology from Col... | 56 |
| 1 | 2 | Fuller | Andrew | 1952-02-19 | EmpID2.pic | Andrew received his BTS commercial and a Ph.D.... | 72 |
| 2 | 3 | Leverling | Janet | 1963-08-30 | EmpID3.pic | Janet has a BS degree in chemistry from Boston... | 61 |
| 3 | 4 | Peacock | Margaret | 1958-09-19 | EmpID4.pic | Margaret holds a BA in English literature from... | 66 |
| 4 | 5 | Buchanan | Steven | 1955-03-04 | EmpID5.pic | Steven Buchanan graduated from St. Andrews Uni... | 69 |
| 5 | 6 | Suyama | Michael | 1963-07-02 | EmpID6.pic | Michael is a graduate of Sussex University (MA... | 61 |
| 6 | 7 | King | Robert | 1960-05-29 | EmpID7.pic | Robert King served in the Peace Corps and trav... | 64 |
| 7 | 8 | Callahan | Laura | 1958-01-09 | EmpID8.pic | Laura received a BA in psychology from the Uni... | 66 |
| 8 | 9 | Dodsworth | Anne | 1969-07-02 | EmpID9.pic | Anne has a BA degree in English from St. Lawre... | 55 |
| 9 | 10 | West | Adam | 1928-09-19 | EmpID10.pic | An old chum. | 96 |
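
The `Age` column above comes from subtracting leading years, so it overstates the age of anyone whose birthday had not yet arrived on the reference date. For example, Nancy Davolio (born 1968-12-08) shows 56 but was 55 on 2024-05-23.

Below is a minimal sketch of an exact age in SQLite, assuming the `YYYY-MM-DD` text format seen above:

1. Subtract the years with `strftime('%Y', ...)`.
2. Take one off when the month-day of the reference date sorts before the birthday's month-day.

The sketch also binds the reference date as a parameter (`:ref`) instead of formatting it into the SQL string. It uses only the standard `sqlite3` module on a throwaway in-memory table, with three birth dates copied from the output above.

```python
import sqlite3

# Throwaway in-memory copy of a few rows from the Employees table above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employees (FirstName TEXT, BirthDate TEXT)")
con.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Nancy", "1968-12-08"), ("Andrew", "1952-02-19"), ("Robert", "1960-05-29")],
)

query = """
SELECT FirstName,
       CAST(strftime('%Y', :ref) AS INTEGER) - CAST(strftime('%Y', BirthDate) AS INTEGER)
       - (strftime('%m-%d', :ref) < strftime('%m-%d', BirthDate)) AS Age,
       datetime(:ref) - datetime(BirthDate) AS YearDiff
FROM Employees
"""
# The reference date is bound as a parameter rather than pasted into the SQL text.
rows = con.execute(query, {"ref": "2024-05-23"}).fetchall()
ages = {name: (age, year_diff) for name, age, year_diff in rows}
for name, (age, year_diff) in ages.items():
    print(f"{name:<7} exact age={age}  year difference={year_diff}")
con.close()
```

The comparison `strftime('%m-%d', :ref) < strftime('%m-%d', BirthDate)` evaluates to 1 or 0, so it can be subtracted directly.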


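We can also use these operations to filter in the `WHERE` clause. Keep in mind that this filter compares year differences, not exact ages. For example, Margaret Peacock (born 1958-09-19) passes `> 65` although she was still 65 on 2024-05-23.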

In [12]:
```python
query = f"""
SELECT *
FROM Employees 
WHERE datetime('{today}') - datetime(birthdate) > 65"""
retired_df = pd.read_sql(query, con=connection)
retired_df
```

| | EmployeeID | LastName | FirstName | BirthDate | Photo | Notes |
|---|---|---|---|---|---|---|
| 0 | 2 | Fuller | Andrew | 1952-02-19 | EmpID2.pic | Andrew received his BTS commercial and a Ph.D.... |
| 1 | 4 | Peacock | Margaret | 1958-09-19 | EmpID4.pic | Margaret holds a BA in English literature from... |
| 2 | 5 | Buchanan | Steven | 1955-03-04 | EmpID5.pic | Steven Buchanan graduated from St. Andrews Uni... |
| 3 | 8 | Callahan | Laura | 1958-01-09 | EmpID8.pic | Laura received a BA in psychology from the Uni... |
| 4 | 10 | West | Adam | 1928-09-19 | EmpID10.pic | An old chum. |


This way we keep only the data we are interested in, and we can copy it into another table.

In [13]:
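```python
# index=False skips writing the DataFrame index as an extra "index" column;
# if_exists="replace" lets the cell be re-run (the default, "fail", raises if the table exists)
retired_df.to_sql(name="Retired", con=engine, index=False, if_exists="replace")
```

```
5
```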
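
`to_sql` pulls the rows into Python and writes them back. When the source and destination live in the same database, a plain `CREATE TABLE ... AS SELECT` lets SQLite copy the rows without that round trip. The new table gets its column names from the query but no primary key or other constraints.

Below is a minimal sketch on an in-memory database with three rows copied from the Employees output. The `Retired` name mirrors the cell above, and the filter reuses this notebook's year-difference condition with the date written as a literal.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT, BirthDate TEXT)"
)
con.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "Davolio", "1968-12-08"), (2, "Fuller", "1952-02-19"), (5, "Buchanan", "1955-03-04")],
)

# Copy the filtered rows inside the database; no row passes through Python.
con.execute("DROP TABLE IF EXISTS Retired")  # makes the cell safe to re-run
con.execute(
    """
    CREATE TABLE Retired AS
    SELECT *
    FROM Employees
    WHERE datetime('2024-05-23') - datetime(BirthDate) > 65
    """
)
retired = con.execute("SELECT EmployeeID, LastName FROM Retired ORDER BY EmployeeID").fetchall()
print(retired)
con.close()
```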

In [14]:
```python
c.execute("""SELECT name FROM sqlite_master WHERE type='table';""").fetchall()
```

```
[('Categories',),
 ('sqlite_sequence',),
 ('Customers',),
 ('Employees',),
 ('Shippers',),
 ('Suppliers',),
 ('Products',),
 ('Orders',),
 ('OrderDetails',),
 ('Retired',)]
```

In [15]:
```python
query = """
SELECT FirstName, LastName
FROM Employees 
WHERE Notes LIKE '%Sussex%' """
pd.read_sql(query, con=connection)
```

| | FirstName | LastName |
|---|---|---|
| 0 | Michael | Suyama |
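
In `LIKE`, `%` matches any run of characters (including none), and `_` matches exactly one character. SQLite's `LIKE` ignores case for ASCII letters by default, whereas PostgreSQL's `LIKE` is case-sensitive (it offers `ILIKE` for the insensitive variant).

Below is a small self-contained check with `sqlite3`. The notes strings are shortened from the Employees output, and the pattern is bound as a parameter.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employees (LastName TEXT, Notes TEXT)")
con.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [
        ("Suyama", "Michael is a graduate of Sussex University"),
        ("Davolio", "Education includes a BA in psychology"),
        ("King", "Robert King served in the Peace Corps"),
    ],
)

def matching(pattern):
    """Last names whose Notes match the LIKE pattern."""
    rows = con.execute(
        "SELECT LastName FROM Employees WHERE Notes LIKE ? ORDER BY LastName", (pattern,)
    )
    return [name for (name,) in rows]

by_substring = matching("%Sussex%")        # substring anywhere in the text
by_lowercase = matching("%sussex%")        # ASCII case is ignored by SQLite's LIKE
by_single_char = matching("Robert K_ng%")  # _ stands for exactly one character
print(by_substring, by_lowercase, by_single_char)
con.close()
```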


In [16]:
```python
query = """
SELECT *
FROM Employees e
INNER JOIN Orders o ON o.EmployeeID = e.EmployeeID"""
pd.read_sql(query, con=connection)
```

| | EmployeeID | LastName | FirstName | BirthDate | Photo | Notes | OrderID | CustomerID | EmployeeID.1 | OrderDate | ShipperID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | Buchanan | Steven | 1955-03-04 | EmpID5.pic | Steven Buchanan graduated from St. Andrews Uni... | 10248 | 90 | 5 | 1996-07-04 | 3 |
| 1 | 6 | Suyama | Michael | 1963-07-02 | EmpID6.pic | Michael is a graduate of Sussex University (MA... | 10249 | 81 | 6 | 1996-07-05 | 1 |
| 2 | 4 | Peacock | Margaret | 1958-09-19 | EmpID4.pic | Margaret holds a BA in English literature from... | 10250 | 34 | 4 | 1996-07-08 | 2 |
| 3 | 3 | Leverling | Janet | 1963-08-30 | EmpID3.pic | Janet has a BS degree in chemistry from Boston... | 10251 | 84 | 3 | 1996-07-08 | 1 |
| 4 | 4 | Peacock | Margaret | 1958-09-19 | EmpID4.pic | Margaret holds a BA in English literature from... | 10252 | 76 | 4 | 1996-07-09 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 191 | 6 | Suyama | Michael | 1963-07-02 | EmpID6.pic | Michael is a graduate of Sussex University (MA... | 10439 | 51 | 6 | 1997-02-07 | 3 |
| 192 | 4 | Peacock | Margaret | 1958-09-19 | EmpID4.pic | Margaret holds a BA in English literature from... | 10440 | 71 | 4 | 1997-02-10 | 2 |
| 193 | 3 | Leverling | Janet | 1963-08-30 | EmpID3.pic | Janet has a BS degree in chemistry from Boston... | 10441 | 55 | 3 | 1997-02-10 | 2 |
| 194 | 3 | Leverling | Janet | 1963-08-30 | EmpID3.pic | Janet has a BS degree in chemistry from Boston... | 10442 | 20 | 3 | 1997-02-11 | 2 |


In [17]:
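```python
query = """
SELECT o.EmployeeID, e.FirstName, COUNT(*) AS Ventas
FROM Employees e
INNER JOIN Orders o ON o.EmployeeID = e.EmployeeID
-- SQLite tolerates bare columns, but PostgreSQL or SQL Server require them in GROUP BY
GROUP BY o.EmployeeID, e.FirstName
ORDER BY Ventas DESC"""
pd.read_sql(query, con=connection)
```

| | EmployeeID | FirstName | Ventas |
|---|---|---|---|
| 0 | 4 | Margaret | 40 |
| 1 | 3 | Janet | 31 |
| 2 | 1 | Nancy | 29 |
| 3 | 8 | Laura | 27 |
| 4 | 2 | Andrew | 20 |
| 5 | 6 | Michael | 18 |
| 6 | 7 | Robert | 14 |
| 7 | 5 | Steven | 11 |
| 8 | 9 | Anne | 6 |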
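
The ranking has only nine rows even though `Employees` has ten. `INNER JOIN` drops employees without any matching order, which is why Adam West (EmployeeID 10) is missing. A `LEFT JOIN` keeps every employee, and `COUNT(o.OrderID)` then gives 0 for those without orders, because `COUNT` of a column skips NULLs. The grouping uses `e.EmployeeID` because `o.EmployeeID` would be NULL for the unmatched employee.

Below is a minimal sketch on a tiny in-memory database. The order rows are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER);
    INSERT INTO Employees VALUES (4, 'Margaret'), (9, 'Anne'), (10, 'Adam');
    -- Hypothetical orders: Adam (10) has none.
    INSERT INTO Orders VALUES (1, 4), (2, 4), (3, 9);
    """
)

def sales(join):
    # Only the fixed keywords "INNER" or "LEFT" are interpolated, never user input.
    query = f"""
    SELECT e.EmployeeID, e.FirstName, COUNT(o.OrderID) AS Ventas
    FROM Employees e
    {join} JOIN Orders o ON o.EmployeeID = e.EmployeeID
    GROUP BY e.EmployeeID, e.FirstName
    ORDER BY Ventas DESC, e.EmployeeID
    """
    return con.execute(query).fetchall()

inner = sales("INNER")
left = sales("LEFT")
print(inner)
print(left)
con.close()
```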


# Indexes

Indexes organize in advance the values of the fields we filter on most often. Instead of reading the whole table in search of the rows that contain the desired value, the database can consult the index to find which rows contain it.

https://www.sqlitetutorial.net/sqlite-index/
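
As an analogy only (SQLite actually stores its indexes as B-trees), an index can be pictured as a sorted list of `(value, rowid)` pairs kept next to the table. Finding a value then becomes a binary search instead of a pass over every row.

The sketch below uses Python's `bisect`. The product names are a few from Northwind, and the row ids are arbitrary.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: rowid -> ProductName
products = {1: "Chai", 2: "Chang", 3: "Aniseed Syrup", 4: "Filo Mix", 5: "Tofu", 6: "Chai"}

def full_scan(value):
    """Without an index: look at every row."""
    return [rowid for rowid, name in products.items() if name == value]

# The "CREATE INDEX" step: sorted (value, rowid) pairs, built once.
index = sorted((name, rowid) for rowid, name in products.items())

def index_lookup(value):
    """With an index: binary-search the range of entries equal to value."""
    lo = bisect_left(index, (value,))                   # (value,) sorts before (value, rowid)
    hi = bisect_right(index, (value, float("inf")))     # and (value, inf) after all of them
    return [rowid for _, rowid in index[lo:hi]]

print(full_scan("Chai"), index_lookup("Chai"))
```

The price of the faster lookup is that the sorted list has to be kept up to date on every insert, update and delete. The same trade-off applies to a real index.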

In [18]:
```python
query = """
EXPLAIN QUERY PLAN 
SELECT *
FROM Products
WHERE ProductName = 'Filo Mix'"""
c.execute(query).fetchall()
```

```
[(2, 0, 0, 'SCAN Products')]
```

In [23]:
```python
c.execute("""CREATE INDEX productname_ind ON Products(ProductName);""").fetchall()
```

```
[]
```

In [20]:
```python
c.execute(query).fetchall()
```

```
[(3, 0, 0, 'SEARCH Products USING INDEX productname_ind (ProductName=?)')]
```

In [29]:
```python
c.execute("""DROP INDEX productname_ind;""").fetchall()
```

```
[]
```

In [30]:
```python
query = """
SELECT p.ProductName, COUNT(o.OrderID) AS Pedidos, SUM(od.Quantity*p.Price) AS Vendido
FROM Employees e
INNER JOIN Orders o ON o.EmployeeID = e.EmployeeID
INNER JOIN OrderDetails od on o.OrderID = od.OrderID
INNER JOIN Products p on p.ProductID = od.ProductID
WHERE ProductName = 'Filo Mix'
GROUP BY p.ProductName
ORDER BY Vendido DESC"""
%timeit pd.read_sql(query, con=connection)
```

```
345 µs ± 2.59 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
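
The `%timeit` above runs after `DROP INDEX`, so it measures the query without the index. A like-for-like comparison times the same query both ways.

Below is a sketch using the standard `timeit` module on a synthetic in-memory table. The table, its columns and the row count are made up for illustration. Absolute timings depend on the machine, and on a table as small as Northwind's `Products` the difference may be negligible.

```python
import sqlite3
import timeit

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Items (ItemID INTEGER PRIMARY KEY, Name TEXT, Price REAL)")
con.executemany(
    "INSERT INTO Items (Name, Price) VALUES (?, ?)",
    ((f"item-{i}", float(i % 100)) for i in range(100_000)),
)

query = "SELECT * FROM Items WHERE Name = ?"
params = ("item-54321",)

def plan():
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + query, params)]

def run():
    return con.execute(query, params).fetchall()

plan_without = plan()
time_without = min(timeit.repeat(run, number=20, repeat=3))

con.execute("CREATE INDEX items_name_ind ON Items(Name)")
plan_with = plan()
time_with = min(timeit.repeat(run, number=20, repeat=3))

print(plan_without, f"{time_without:.4f}s for 20 runs")
print(plan_with, f"{time_with:.4f}s for 20 runs")
```

The plan should change from a `SCAN` to a `SEARCH ... USING INDEX items_name_ind`. The timings then show what that change costs or saves on this particular machine.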


In [33]:
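```python
# Release the raw sqlite3 cursor and connection, and the SQLAlchemy connection
c.close()
conn.close()
connection.close()
```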