# Databases in Python

## Contents
1. [Accessing relational databases](#sql)
2. [Creating relational databases](#crear)
3. [NoSQL databases](#nosql)

<a id="sql"></a>
## Accessing relational databases

We can access MySQL databases from Python with the `pymysql` library. To install it, run the following in Anaconda Prompt:  
`conda install -c anaconda pymysql`

In [1]:
import pymysql

#### Example 1
We will connect to the NBA database, which contains game statistics for one season.  
We will use the following parameters:  
* server: relational.fit.cvut.cz
* user: guest
* password: relational  
* database: NBA  

![image.png](attachment:image.png)

Let's create a connection to the database.

In [2]:
database_host = 'relational.fit.cvut.cz'
username = 'guest'
password = 'relational'
database_name = 'NBA'

db = pymysql.connect(host=database_host,
                     user=username,
                     password=password,
                     database=database_name)
cursor = db.cursor()

The `connect()` function opens a connection to the database. A *cursor* lets us run operations on the data stored in it. 

<img src='https://i.ibb.co/L8HH0G5/cursor.png'>  

Once the cursor exists, we can run commands against the database with the `execute()` method.

After running a query, we retrieve its results with `fetchone()` (the next row of the result set) or `fetchall()` (all remaining rows). Calling `close()` on the cursor releases the cursor, and calling `close()` on the connection object closes the connection itself.

In [3]:
cursor.execute("SELECT * FROM Player")
cursor.fetchall()

((1, 'Nicolas Batum'),
 (2, 'LaMarcus Aldridge'),
 (3, 'Robin Lopez'),
 (4, 'Wesley Matthews'),
 (5, 'Damian Lillard'),
 (6, 'Thomas Robinson'),
 (7, 'Maurice Williams'),
 (8, 'Will Barton'),
 (9, 'Dorell Wright'),
 (10, 'Earl Watson'),
 (11, 'CJ McCollum'),
 (12, 'Meyers Leonard'),
 (13, 'Victor Claver'),
 (14, 'Kent Bazemore'),
 (15, 'Pau Gasol'),
 (16, 'Chris Kaman'),
 (17, 'Jodie Meeks'),
 (18, 'Kendall Marshall'),
 (19, 'Steve Nash'),
 (20, 'Xavier Henry'),
 (21, 'Robert Sacre'),
 (22, 'Ryan Kelly'),
 (23, 'Nick Young'),
 (24, 'Marshon Brooks'),
 (25, 'Jordan Hill'),
 (26, 'Wesley Johnson'),
 (27, 'Andre Iguodala'),
 (28, 'Draymond Green'),
 (29, "Jermaine O'Neal"),
 (30, 'Klay Thompson'),
 (31, 'Stephen Curry'),
 (32, 'Marreese Speights'),
 (33, 'Harrison Barnes'),
 (34, 'Steve Blake'),
 (35, 'Jordan Crawford'),
 (36, 'Hilton Armstrong'),
 (37, 'Andrew Bogut'),
 (38, 'David Lee'),
 (39, 'Shawn Marion'),
 (40, 'Dirk Nowitzki'),
 (41, 'Samuel Dalembert'),
 (42, 'Monta Ellis'),
 (43

We can list the tables in the database with the `SHOW TABLES` query.

In [4]:
cursor.execute('SHOW TABLES')
cursor.fetchall()

(('Actions',),
 ('Game',),
 ('Player',),
 ('Team',),
 ('joined_drafted_all_players_original',))

In [5]:
cursor.close()
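
The cursor workflow above (execute, fetch, close) can be tried without network access. The sketch below is an assumption-laden stand-in: it uses the standard-library `sqlite3` module instead of `pymysql`, and a hypothetical three-row `Player` table built from names in the output above. Both libraries follow the Python DB-API, so the method names are the same, although `sqlite3` returns a list of tuples where `pymysql` returned a tuple of tuples.

```python
import sqlite3

# In-memory SQLite database used as an offline stand-in for the remote server.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical miniature Player table (three names taken from the output above).
cursor.execute("CREATE TABLE Player (PlayerId INTEGER PRIMARY KEY, PlayerName TEXT)")
cursor.executemany(
    "INSERT INTO Player VALUES (?, ?)",
    [(1, "Nicolas Batum"), (2, "LaMarcus Aldridge"), (3, "Robin Lopez")],
)

cursor.execute("SELECT * FROM Player ORDER BY PlayerId")
first_row = cursor.fetchone()       # next unread row of the result set
remaining_rows = cursor.fetchall()  # every row that has not been fetched yet
after_end = cursor.fetchone()       # None once the result set is exhausted

print(first_row)
print(remaining_rows)

cursor.close()  # releases the cursor only
conn.close()    # closes the connection itself
```

Note that `fetchall()` after `fetchone()` returns only the rows that are still unread, which is why the first player is missing from `remaining_rows`.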

The pandas function `read_sql()` builds a DataFrame directly from a query. With it we do not need to create a cursor; we pass the open connection instead.

In [8]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore') # Hide warnings
query = "SELECT * FROM Actions"
df = pd.read_sql(query,db)
df

Unnamed: 0,GameId,TeamId,PlayerId,Minutes,FieldGoalsMade,FieldGoalAttempts,3PointsMade,3PointAttempts,FreeThrowsMade,FreeThrowAttempts,...,DefensiveRebounds,TotalRebounds,Assists,PersonalFouls,Steals,Turnovers,BlockedShots,BlocksAgainst,Points,Starter
0,1,7,78,2605,5,14,3,3,0,0,...,3,3,4,2,1,1,0,0,13,1
1,1,7,79,2359,11,19,3,3,8,8,...,8,8,3,2,3,3,1,0,34,1
2,1,7,80,2104,6,7,3,3,3,8,...,7,9,1,3,1,1,2,0,15,1
3,1,7,81,1392,1,5,3,3,0,0,...,2,2,0,4,1,0,0,0,2,1
4,1,7,82,2124,5,8,3,3,1,2,...,3,3,6,1,1,4,0,0,12,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
762,30,25,316,0,0,0,3,3,0,0,...,0,0,0,0,0,0,0,0,0,0
763,30,25,317,0,0,0,3,3,0,0,...,0,0,0,0,0,0,0,0,0,0
764,30,25,318,0,0,0,3,3,0,0,...,0,0,0,0,0,0,0,0,0,0
765,30,17,344,0,0,0,3,3,0,0,...,0,0,0,0,0,0,0,0,0,0


We retrieve the players with more than 5 assists in a single game.

In [10]:
query = '''
SELECT PlayerName, Assists
FROM Actions
JOIN Player
ON Actions.PlayerId = Player.PlayerId
WHERE Assists > 5
ORDER BY Assists DESC
'''
Asistencias5 = pd.read_sql(query,db)
Asistencias5

Unnamed: 0,PlayerName,Assists
0,Brandon Jennings,13
1,Ty Lawson,12
2,Ramon Sessions,11
3,Blake Griffin,11
4,DJ Augustin,11
...,...,...
64,Raymond Felton,6
65,Rajon Rondo,6
66,Zaza Pachulia,6
67,Phil Pressey,6
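
The threshold in the query above is written directly into the SQL text. A possible variation, sketched here under assumptions, passes it as a query parameter so the same query can be reused with other values. The example runs on an in-memory `sqlite3` database with hypothetical players and assist counts; `sqlite3` uses `?` as its placeholder, whereas `pymysql` uses `%s`, and `pd.read_sql` accepts the values through its `params` argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature versions of the Player and Actions tables.
conn.executescript("""
CREATE TABLE Player (PlayerId INTEGER PRIMARY KEY, PlayerName TEXT);
CREATE TABLE Actions (GameId INTEGER, PlayerId INTEGER, Assists INTEGER);
INSERT INTO Player VALUES (1, 'Player A'), (2, 'Player B'), (3, 'Player C');
INSERT INTO Actions VALUES (1, 1, 13), (1, 2, 4), (2, 2, 6), (2, 3, 5);
""")

query = """
SELECT PlayerName, Assists
FROM Actions
JOIN Player ON Actions.PlayerId = Player.PlayerId
WHERE Assists > ?
ORDER BY Assists DESC
"""

min_assists = 5
# The value travels separately from the SQL text, so it is never spliced into the string.
rows = conn.execute(query, (min_assists,)).fetchall()
print(rows)

conn.close()
```

Player C, with exactly 5 assists, is excluded because the comparison is strict (`>`).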


We retrieve the top 10 players by total assists over the season.

In [12]:
query = '''
SELECT PlayerName, SUM(Assists) AS TotalAssists
FROM Actions ACT
JOIN Player PLA
ON ACT.PlayerId = PLA.PlayerId
GROUP BY PlayerName
ORDER BY TotalAssists DESC
LIMIT 10
'''
TopTenAsistencias = pd.read_sql(query,db)
TopTenAsistencias

Unnamed: 0,PlayerName,TotalAssists
0,Chris Paul,27.0
1,Kemba Walker,25.0
2,Brandon Jennings,22.0
3,Ty Lawson,21.0
4,Demar DeRozan,20.0
5,Ramon Sessions,17.0
6,Andre Miller,17.0
7,Lebron James,16.0
8,DJ Augustin,16.0
9,John Wall,16.0


We retrieve the top 10 teams by average points per game.

In [17]:
query = '''
SELECT TeamName,
       AVG(TotalPoints) AS AvgPoints
FROM(
    SELECT TeamName, GameId, SUM(Points) AS TotalPoints
    FROM Actions
    JOIN Team
    ON Actions.TeamId = Team.TeamId
    GROUP BY TeamName, GameId
    )T1
GROUP BY TeamName
ORDER BY AvgPoints DESC
LIMIT 10
'''
TopTenPuntos = pd.read_sql(query,db)
TopTenPuntos

Unnamed: 0,TeamName,AvgPoints
0,Portland Trail Blazers,124.0
1,Cleveland Cavaliers,119.0
2,Dallas Mavericks,116.5
3,Golden State Warriors,112.5
4,Los Angeles Clippers,111.0
5,Charlotte Bobcats,108.3333
6,Phoenix Suns,108.0
7,Denver Nuggets,107.0
8,Los Angeles Lakers,107.0
9,Oklahoma City Thunder,106.0
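
The query above aggregates in two steps: the inner query adds up the points of every player row for each team and game, and the outer query averages those per-game totals. The sketch below, on an in-memory `sqlite3` database with hypothetical teams and scores, compares that approach with a single `AVG(Points)`, which would average individual player rows instead of games.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: Team X plays two games, Team Y plays one.
conn.executescript("""
CREATE TABLE Team (TeamId INTEGER PRIMARY KEY, TeamName TEXT);
CREATE TABLE Actions (GameId INTEGER, TeamId INTEGER, PlayerId INTEGER, Points INTEGER);
INSERT INTO Team VALUES (1, 'Team X'), (2, 'Team Y');
INSERT INTO Actions VALUES
    (1, 1, 10, 60), (1, 1, 11, 50),
    (2, 1, 10, 40), (2, 1, 11, 50),
    (1, 2, 20, 120);
""")

# Two-step aggregation: total per team and game, then average of those totals.
per_game_query = """
SELECT TeamName, AVG(TotalPoints) AS AvgPoints
FROM (
    SELECT TeamName, GameId, SUM(Points) AS TotalPoints
    FROM Actions
    JOIN Team ON Actions.TeamId = Team.TeamId
    GROUP BY TeamName, GameId
) T1
GROUP BY TeamName
ORDER BY AvgPoints DESC
"""

# Single-step aggregation: averages player rows, not games.
per_row_query = """
SELECT TeamName, AVG(Points) AS AvgPoints
FROM Actions
JOIN Team ON Actions.TeamId = Team.TeamId
GROUP BY TeamName
ORDER BY AvgPoints DESC
"""

team_averages = conn.execute(per_game_query).fetchall()
row_averages = conn.execute(per_row_query).fetchall()
print(team_averages)
print(row_averages)

conn.close()
```

For Team X the two results differ (100 points per game versus 50 points per player row), which is the reason for the subquery.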


#### Example 2
We will connect to a database of a company's employees.  
* server: relational.fit.cvut.cz
* user: guest
* password: relational  
* database: employee
![image.png](attachment:image.png)

In [22]:
database_host = 'relational.fit.cvut.cz'
username = 'guest'
password = 'relational'
database_name = 'employee'

db = pymysql.connect(host=database_host,
                     user=username,
                     password=password,
                     database=database_name)

Goal: obtain the maximum, minimum and average salary by gender and job title. We start by previewing the `employees` table.

In [26]:
query = '''
SELECT * FROM employees
LIMIT 10
'''

pd.read_sql(query,db)

Unnamed: 0,emp_no,birth_date,first_name,last_name,gender,hire_date
0,10001,1953-09-02,Georgi,Facello,M,1986-06-26
1,10002,1964-06-02,Bezalel,Simmel,F,1985-11-21
2,10003,1959-12-03,Parto,Bamford,M,1986-08-28
3,10004,1954-05-01,Chirstian,Koblick,M,1986-12-01
4,10005,1955-01-21,Kyoichi,Maliniak,M,1989-09-12
5,10006,1953-04-20,Anneke,Preusig,F,1989-06-02
6,10007,1957-05-23,Tzvetan,Zielinski,F,1989-02-10
7,10008,1958-02-19,Saniya,Kalloufi,M,1994-09-15
8,10009,1952-04-19,Sumant,Peac,F,1985-02-18
9,10010,1963-06-01,Duangkaew,Piveteau,F,1989-08-24


In [30]:
query_5 = '''
SELECT COUNT(distinct emp_no)
FROM titles
WHERE title = 'Manager'
AND to_date = '9999-01-01'
'''
# Count how many employees currently hold the title 'Manager'
# DISTINCT makes COUNT ignore repeated emp_no values
manager = pd.read_sql(query_5, db)
manager

Unnamed: 0,COUNT(distinct emp_no)
0,9


In [23]:
query = '''
SELECT * FROM titles
LIMIT 10
'''

pd.read_sql(query,db)

Unnamed: 0,emp_no,title,from_date,to_date
0,10001,Senior Engineer,1986-06-26,9999-01-01
1,10002,Staff,1996-08-03,9999-01-01
2,10003,Senior Engineer,1995-12-03,9999-01-01
3,10004,Engineer,1986-12-01,1995-12-01
4,10004,Senior Engineer,1995-12-01,9999-01-01
5,10005,Senior Staff,1996-09-12,9999-01-01
6,10005,Staff,1989-09-12,1996-09-12
7,10006,Senior Engineer,1990-08-05,9999-01-01
8,10007,Senior Staff,1996-02-11,9999-01-01
9,10007,Staff,1989-02-10,1996-02-11


In [29]:
query = '''
SELECT title, gender, MIN(salary), AVG(salary), MAX(salary), COUNT(salary)
FROM employees emp
JOIN salaries sal
ON  emp.emp_no = sal.emp_no
JOIN titles tit
ON emp.emp_no = tit.emp_no
WHERE sal.to_date = '9999-01-01'
AND tit.to_date = '9999-01-01'
GROUP BY title, gender
'''

Salario = pd.read_sql(query,db)
Salario

Unnamed: 0,title,gender,MIN(salary),AVG(salary),MAX(salary),COUNT(salary)
0,Assistant Engineer,M,39827,57197.9674,117636,2148
1,Assistant Engineer,F,39469,57495.9861,106340,1440
2,Engineer,M,38942,59592.9683,130939,18571
3,Engineer,F,39519,59617.3549,115444,12412
4,Manager,M,56654,79350.6,106491,5
5,Manager,F,65400,75690.0,83457,4
6,Senior Engineer,M,39285,70869.9085,140784,51533
7,Senior Engineer,F,39476,70753.8341,138273,34406
8,Senior Staff,M,39012,80735.4795,158220,49232
9,Senior Staff,F,39227,80662.9816,152710,32792
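
Both `to_date = '9999-01-01'` conditions in the query above restrict the joins to records that are still in force, assuming (as the `titles` preview suggests) that this date is the placeholder for "no end date yet". The sketch below reproduces three rows from that preview in an in-memory `sqlite3` table and shows how the count per title changes when the filter is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Three rows copied from the titles preview: employee 10004 was promoted in 1995.
conn.executescript("""
CREATE TABLE titles (emp_no INTEGER, title TEXT, from_date TEXT, to_date TEXT);
INSERT INTO titles VALUES
    (10004, 'Engineer',        '1986-12-01', '1995-12-01'),
    (10004, 'Senior Engineer', '1995-12-01', '9999-01-01'),
    (10001, 'Senior Engineer', '1986-06-26', '9999-01-01');
""")

# Without the filter, past titles are counted alongside current ones.
all_titles = conn.execute(
    "SELECT title, COUNT(*) FROM titles GROUP BY title ORDER BY title"
).fetchall()

# With the filter, only titles that are still held are counted.
current_titles = conn.execute(
    """
    SELECT title, COUNT(*) FROM titles
    WHERE to_date = '9999-01-01'
    GROUP BY title ORDER BY title
    """
).fetchall()

print(all_titles)
print(current_titles)

conn.close()
```

Without the filter, employee 10004 appears under both the old and the new title, so salary statistics would mix historical and current positions.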



<a id="crear"></a>
## Creating relational databases

We can create our own databases with SQLite, through the `sqlite3` module of the Python standard library.

In [32]:
import sqlite3

In [33]:
conn = sqlite3.connect('my_database.sqlite')
cursor = conn.cursor()

We create tables with the `CREATE TABLE` command.

In [34]:
cursor.execute('''
CREATE TABLE SCHOOL
(ID INT PRIMARY KEY NOT NULL,
 NAME TEXT NOT NULL,
 AGE INT NOT NULL,
 CITY CHAR(50),
 MARKS INT
 )
''')

<sqlite3.Cursor at 0x1a97aaa2cc0>

To insert rows we use `INSERT INTO`. Every change to the database must be confirmed with `commit()`.

In [37]:
# String values use single quotes, the standard SQL form for literals
cursor.execute('''
INSERT INTO SCHOOL (ID,NAME,AGE,CITY,MARKS) 
                    VALUES(1,'Luis',24,'Madrid',8)
''')

cursor.execute('''
INSERT INTO SCHOOL (ID,NAME,AGE,CITY,MARKS) 
                    VALUES(2,'Ana',34,'Bilbao',9)
''')

cursor.execute('''
INSERT INTO SCHOOL (ID,NAME,AGE,CITY,MARKS) 
                    VALUES(3,'Pedro',19,'Santander',10)
''')

conn.commit()

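IntegrityError: UNIQUE constraint failed: SCHOOL.ID

This error appears when the cell is run a second time: the rows already exist, so the primary key `ID` rejects the duplicates.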

In [38]:
pd.read_sql("SELECT * FROM SCHOOL", conn)

Unnamed: 0,ID,NAME,AGE,CITY,MARKS
0,1,Luis,24,Madrid,8
1,2,Ana,34,Bilbao,9
2,3,Pedro,19,Santander,10
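
One way to make the insert step safe to re-run is sketched below. It is an alternative to the cell above, not the notebook's own method: `CREATE TABLE IF NOT EXISTS` and `INSERT OR IGNORE` are SQLite-specific forms, `executemany()` inserts all rows with one parameterized statement, and using the connection as a context manager commits automatically on success. The helper `insert_students` is a hypothetical name introduced for this example, and the database is in-memory so nothing is written to disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE IF NOT EXISTS SCHOOL
(ID INT PRIMARY KEY NOT NULL,
 NAME TEXT NOT NULL,
 AGE INT NOT NULL,
 CITY CHAR(50),
 MARKS INT)
""")

students = [
    (1, "Luis", 24, "Madrid", 8),
    (2, "Ana", 34, "Bilbao", 9),
    (3, "Pedro", 19, "Santander", 10),
]

def insert_students(connection, rows):
    """Insert rows, silently skipping any whose ID already exists."""
    # "with connection" commits on success and rolls back if an error is raised.
    with connection:
        connection.executemany(
            "INSERT OR IGNORE INTO SCHOOL (ID, NAME, AGE, CITY, MARKS) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )

insert_students(conn, students)
insert_students(conn, students)  # second run: duplicates are skipped, no IntegrityError

row_count = conn.execute("SELECT COUNT(*) FROM SCHOOL").fetchone()[0]
print(row_count)
```

The trade-off is that `INSERT OR IGNORE` hides genuine duplicate-key mistakes, so it fits exploratory notebooks better than code where a duplicate should be reported.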


We can also create tables from DataFrames.

In [51]:
Salario.to_sql('SALARY', conn, index=False)

14

In [43]:
pd.read_sql("SELECT * FROM SALARY", conn)

Unnamed: 0,title,gender,MIN(salary),AVG(salary),MAX(salary),COUNT(salary)
0,Assistant Engineer,M,39827,57197.9674,117636,2148
1,Assistant Engineer,F,39469,57495.9861,106340,1440
2,Engineer,M,38942,59592.9683,130939,18571
3,Engineer,F,39519,59617.3549,115444,12412
4,Manager,M,56654,79350.6,106491,5
5,Manager,F,65400,75690.0,83457,4
6,Senior Engineer,M,39285,70869.9085,140784,51533
7,Senior Engineer,F,39476,70753.8341,138273,34406
8,Senior Staff,M,39012,80735.4795,158220,49232
9,Senior Staff,F,39227,80662.9816,152710,32792


In [42]:
pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", conn)

Unnamed: 0,name
0,SCHOOL
1,SALARY


Sometimes we need to create views, which are stored queries that can be read like tables.

In [53]:
cursor.execute('''
CREATE VIEW INGENIEROS AS
SELECT * FROM SALARY
WHERE title = 'Engineer'
''')
conn.commit()

OperationalError: view INGENIEROS already exists

This error means the cell was run again after the view had already been created.

In [55]:
pd.read_sql('SELECT * FROM INGENIEROS', conn)

Unnamed: 0,title,gender,MIN(salary),AVG(salary),MAX(salary),COUNT(salary)
0,Engineer,M,38942,59592.9683,130939,18571
1,Engineer,F,39519,59617.3549,115444,12412
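
A view stores the query, not a copy of the rows, so it reflects later changes to the underlying table. The sketch below illustrates this on an in-memory `sqlite3` database with a simplified, hypothetical `SALARY` table (the `avg_salary` column name is an assumption; the figures are taken from the output above). It also uses SQLite's `CREATE VIEW IF NOT EXISTS`, which lets the statement be re-run without an `OperationalError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the SALARY table.
conn.executescript("""
CREATE TABLE SALARY (title TEXT, gender TEXT, avg_salary REAL);
INSERT INTO SALARY VALUES ('Engineer', 'M', 59592.9683), ('Manager', 'F', 75690.0);
""")

create_view = """
CREATE VIEW IF NOT EXISTS INGENIEROS AS
SELECT * FROM SALARY
WHERE title = 'Engineer'
"""
conn.execute(create_view)
conn.execute(create_view)  # running it again is harmless thanks to IF NOT EXISTS

rows_before = conn.execute("SELECT COUNT(*) FROM INGENIEROS").fetchone()[0]

# A new row in the base table shows up in the view without recreating it.
conn.execute("INSERT INTO SALARY VALUES ('Engineer', 'F', 59617.3549)")
rows_after = conn.execute("SELECT COUNT(*) FROM INGENIEROS").fetchone()[0]

print(rows_before, rows_after)
conn.close()
```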


To delete tables we use the `DROP TABLE` command.

In [49]:
# Drop the table (IF EXISTS avoids an error when it is absent)

cursor.execute('DROP TABLE IF EXISTS SALARY')

In [50]:
# List the tables
pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", conn)

To update records we use the `UPDATE` command.

In [None]:
# Update records
conn.execute('UPDATE SCHOOL SET MARKS=7 WHERE ID=3')
conn.commit()

In [None]:
pd.read_sql('SELECT * FROM SCHOOL', conn)

To delete records we use `DELETE`.

In [None]:
conn.execute('DELETE FROM SCHOOL WHERE ID=2')
conn.commit()

In [None]:
pd.read_sql('SELECT * FROM SCHOOL', conn)

Unnamed: 0,ID,NAME,AGE,CITY,MARKS
0,1,Luis,24,Madrid,8
1,3,Pedro,19,Santander,7
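
When updating or deleting, it helps to check how many rows the statement actually touched, because a `WHERE` clause that matches nothing fails silently. The sketch below is a minimal illustration on an in-memory `sqlite3` database with a reduced, hypothetical version of the `SCHOOL` table; it passes the values as parameters and reads the cursor's `rowcount` attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for the SCHOOL table (fewer columns than the original).
conn.executescript("""
CREATE TABLE SCHOOL (ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL, MARKS INT);
INSERT INTO SCHOOL VALUES (1, 'Luis', 8), (2, 'Ana', 9), (3, 'Pedro', 10);
""")

# rowcount reports how many rows each statement modified.
updated = conn.execute("UPDATE SCHOOL SET MARKS = ? WHERE ID = ?", (7, 3)).rowcount
deleted = conn.execute("DELETE FROM SCHOOL WHERE ID = ?", (2,)).rowcount
missing = conn.execute("DELETE FROM SCHOOL WHERE ID = ?", (99,)).rowcount  # no such ID
conn.commit()

remaining = conn.execute("SELECT ID, NAME, MARKS FROM SCHOOL ORDER BY ID").fetchall()
print(updated, deleted, missing)
print(remaining)

conn.close()
```

A `rowcount` of 0, as for the non-existent ID 99, is the signal that the condition matched no rows.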

Free applications such as [DB Browser for SQLite](https://sqlitebrowser.org/) let us work with SQLite databases through a graphical interface.

<a id="nosql"></a>
## Extra material: NoSQL databases (MongoDB)

The main differences between SQL and MongoDB are the following: 
<img src='http://4.bp.blogspot.com/-edz2_QrFvCE/UnzBhKZE3FI/AAAAAAAAAEs/bTEsqnZFTXw/s1600/SQL-MongoDB+Correspondence.PNG'>

We are going to connect to a MongoDB database, which requires installing the following libraries:  
`conda install -c anaconda pymongo`  
`conda install -c anaconda dnspython`

In [57]:
from pymongo import MongoClient
import dns

In [58]:
client = MongoClient("mongodb+srv://rzl:rzl@cluster0.a0ju2.mongodb.net/?retryWrites=true&w=majority")

We connect to the [Sample Airbnb](https://docs.atlas.mongodb.com/sample-data/sample-airbnb/) database. It contains a single collection, listingsAndReviews, whose documents describe vacation rental listings on Airbnb.


In [59]:
db = client.get_database('sample_airbnb')

In [60]:
records = db.listingsAndReviews

In [61]:
# Count the documents
records.count_documents({})

5555

Queries are made with the `find()` method.

In [62]:
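# find_one() returns the first document that find() would yield,
# without loading the whole collection into a list
records.find_one()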

{'_id': '1001265',
 'listing_url': 'https://www.airbnb.com/rooms/1001265',
 'name': 'Ocean View Waikiki Marina w/prkg',
 'summary': "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.",
 'space': 'Great studio located on Ala Moana across the street from Yacht Harbor and near Ala Moana Shopping Center. Studio kitchette, parking, wifi, TV, A/C. Amenities include pool, hot tub and tennis. Sweet ocean views with nice ocean breezes.',
 'description': "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village. Great studio located on Ala Moana across the street from Yacht Harbor and near Ala Moana Shopping 

We filter listings with 2 bathrooms and 3 bedrooms that have at least one review.

In [63]:
list(records.find({'bathrooms':2,
                  'bedrooms':3,
                  'number_of_reviews':{'$ne':0}}).limit(3))

[{'_id': '10423504',
  'listing_url': 'https://www.airbnb.com/rooms/10423504',
  'name': 'Bondi Beach Dreaming 3-Bed House',
  'summary': "This peaceful house in North Bondi is 300m to the beach and a minute's walk to cafes and bars. With 3 bedrooms, (can sleep up to 8) it is perfect for families, friends and pets. The kitchen was recently renovated and a new lounge and chairs installed. The house has a peaceful, airy, laidback vibe  - a perfect beach retreat. Longer-term bookings encouraged. Parking for one car. A parking permit for a second car can also be obtained on request.",
  'space': "Serene space with three bedrooms, including a studio at the back, 300m to the beach and near to best cafes and bars in Bondi. Parking for one car. This wonderful house is designed to cater for families or groups with plenty of space and flexible bedding arrangements. There are three bedrooms including a master bedroom with a king bed, separate studio with a queen bed and a room with two single bed
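
The filter passed to `find()` above is an ordinary Python dictionary: a plain value means "equal to", and a nested dictionary such as `{'$ne': 0}` applies an operator, here "not equal to". The sketch below is only a plain-Python illustration of that meaning, not how MongoDB or `pymongo` evaluate queries. The `matches` helper and the three sample documents are hypothetical, and only equality and `$ne` are handled.

```python
def matches(document, query):
    """Return True if `document` satisfies `query` (equality and $ne only)."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict) and "$ne" in condition:
            # Operator form: the field must differ from the given value.
            if value == condition["$ne"]:
                return False
        elif value != condition:
            # Plain form: the field must equal the given value.
            return False
    return True

# Hypothetical, heavily reduced listing documents.
listings = [
    {"_id": "a", "bathrooms": 2, "bedrooms": 3, "number_of_reviews": 4},
    {"_id": "b", "bathrooms": 2, "bedrooms": 3, "number_of_reviews": 0},
    {"_id": "c", "bathrooms": 1, "bedrooms": 3, "number_of_reviews": 9},
]

query = {"bathrooms": 2, "bedrooms": 3, "number_of_reviews": {"$ne": 0}}
selected = [doc["_id"] for doc in listings if matches(doc, query)]
print(selected)
```

All conditions in the dictionary must hold at once, so listing "b" is rejected for having no reviews and listing "c" for having a single bathroom.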

More documentation on the `pymongo` library is available at https://api.mongodb.com/python/current/