### Connect to PostgreSQL

In [1]:
import psycopg2

pgconn = psycopg2.connect(
    host = "localhost",
    user = "postgres",
    password = "dada331",
    database = "dvdrental"
)

In [2]:
import pandas as pd

### To use the pandas to_sql() method, we must use SQLAlchemy

**The pandas to_sql() method is not used in this notebook, but it is worth remembering.**

**The to_sql() method saves a DataFrame to a table in a database.**

### SQLAlchemy is needed for pandas to interact with the database through SQL

In [3]:
from sqlalchemy import create_engine

# connection string: driver://username:password@server/database
engine = create_engine('postgresql+psycopg2://postgres:dada331@localhost/dvdrental')
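
The connection string above follows the `driver://username:password@server/database` pattern with the password written in plain form. If a password contains reserved URL characters such as `@` or `/`, the string no longer parses as intended. The following is a minimal sketch, using only the standard library, of assembling the same kind of string with the credentials percent-encoded. The helper name `build_pg_url` and the sample password are hypothetical and not part of this notebook.

```python
from urllib.parse import quote_plus


def build_pg_url(user, password, host, database, driver="postgresql+psycopg2"):
    """Assemble driver://username:password@server/database with escaped credentials.

    Hypothetical helper for illustration only.
    """
    return f"{driver}://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


# Hypothetical password containing '@' and '/', which would otherwise break the URL.
url = build_pg_url("postgres", "p@ss/word", "localhost", "dvdrental")
print(url)  # postgresql+psycopg2://postgres:p%40ss%2Fword@localhost/dvdrental
```

The resulting string could then be passed to `create_engine` in place of the literal used above.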

### Aggregate Functions

**Most Common Aggregate Functions**

* AVG() - returns average value
* COUNT() - returns number of values
* MAX() - returns maximum value
* MIN() - returns minimum value
* SUM() - returns the sum of all values

**Aggregate function calls can appear in the SELECT clause, the HAVING clause, or the ORDER BY clause, but not in the WHERE clause**

**Special Notes**
* AVG() returns a value with many decimal places (e.g. 2.342448...)
    * You can use ROUND() to specify the precision after the decimal point.
* COUNT(*) returns the number of rows, while COUNT(column) counts only the non-NULL values in that column; by convention we just use COUNT(*)
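
The notes above can be checked on a tiny table. This is a sketch only: it uses an in-memory SQLite database from the standard library as a stand-in for PostgreSQL (an assumption, so that it runs without a server), and the four-row `film` table is made up for illustration.

```python
import sqlite3

# In-memory SQLite stand-in (assumption); hypothetical rows, one with a NULL cost.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, replacement_cost REAL)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("A", 9.99), ("B", 19.99), ("C", 24.99), ("D", None)],
)

count_all, count_cost, avg_cost, avg_rounded = conn.execute(
    """
    SELECT
        COUNT(*),                         -- counts every row, including the NULL one
        COUNT(replacement_cost),          -- counts only non-NULL values
        AVG(replacement_cost),            -- many decimal places; NULLs are ignored
        ROUND(AVG(replacement_cost), 2)   -- keep two decimal places
    FROM
        film;
    """
).fetchone()

print(count_all, count_cost)   # 4 3
print(avg_cost, avg_rounded)   # the unrounded average, then 18.32
```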

In [4]:
pd.read_sql_query(
    '''
    SELECT 
        * 
    FROM 
        film;
    '''
, engine).head(2) # only show 2 records

Unnamed: 0,film_id,title,description,release_year,language_id,rental_duration,rental_rate,length,replacement_cost,rating,last_update,special_features,fulltext
0,133,Chamber Italian,A Fateful Reflection of a Moose And a Husband ...,2006,1,7,4.99,117,14.99,NC-17,2013-05-26 14:50:58.951,[Trailers],'chamber':1 'fate':4 'husband':11 'italian':2 ...
1,384,Grosse Wonderful,A Epic Drama of a Cat And a Explorer who must ...,2006,1,5,4.99,49,19.99,R,2013-05-26 14:50:58.951,[Behind the Scenes],'australia':18 'cat':8 'drama':5 'epic':4 'exp...


In [5]:
pd.read_sql_query(
    '''
    SELECT 
        MIN(replacement_cost)
    FROM 
        film;
    '''
, engine) 

Unnamed: 0,min
0,9.99


In [6]:
pd.read_sql_query(
    '''
    SELECT 
        MAX(replacement_cost),
        MIN(replacement_cost)
    FROM 
        film;
    '''
, engine)

Unnamed: 0,max,min
0,29.99,9.99


In [7]:
pd.read_sql_query(
    '''
    SELECT 
        AVG(replacement_cost)
    FROM 
        film;
    '''
,engine)

Unnamed: 0,avg
0,19.984


In [12]:
# ROUND(value, n): the second argument, after the comma, is the number of decimal places to keep
pd.read_sql_query(
    '''
    SELECT 
        ROUND(AVG(replacement_cost), 2)
    FROM 
        film;
    '''
,engine)


Unnamed: 0,round
0,19.98


In [13]:
pd.read_sql_query(
    '''
    SELECT 
        SUM(replacement_cost)
    FROM 
        film;
    '''
,engine)


Unnamed: 0,sum
0,19984.0


### GROUP BY - Part One

* SELECT category_col, AGG(data_col)
* FROM table
* GROUP BY category_col

* The GROUP BY clause must appear right after the FROM clause, or after the WHERE clause if there is one.


**In the SELECT list, every column must either be wrapped in an aggregate function or appear in the GROUP BY clause**

**A WHERE clause must not refer to the aggregation result; later on we will learn to use HAVING to filter on those results**
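
A small sketch of the WHERE rule, again using an in-memory SQLite database as a stand-in for PostgreSQL (an assumption) with a made-up `payment` table. The exact error text differs between database engines. Note that SQLite is more permissive than PostgreSQL about ungrouped columns in the SELECT list, so only the WHERE restriction is demonstrated here.

```python
import sqlite3

# In-memory SQLite stand-in (assumption) with hypothetical payment rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer_id INTEGER, staff_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [(1, 1, 5.0), (1, 2, 7.0), (2, 1, 3.0), (2, 1, 4.0), (3, 2, 20.0)],
)

# Valid: customer_id is in GROUP BY, amount is wrapped in an aggregate.
totals = conn.execute(
    """
    SELECT customer_id, SUM(amount)
    FROM payment
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()
print(totals)  # [(1, 12.0), (2, 7.0), (3, 20.0)]

# Invalid: WHERE is evaluated before grouping, so it cannot see SUM(amount).
try:
    conn.execute(
        """
        SELECT customer_id, SUM(amount)
        FROM payment
        WHERE SUM(amount) > 10
        GROUP BY customer_id
        """
    )
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)
print(where_error)  # the engine's complaint about the misplaced aggregate
```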

In [6]:
pd.read_sql_query(
    '''
    SELECT 
        customer_id
    FROM
        payment
    GROUP BY
        customer_id
    ORDER BY
        customer_id
    '''
,engine).head(10)

Unnamed: 0,customer_id
0,1
1,2
2,3
3,4
4,5
5,6
6,7
7,8
8,9
9,10


**Which customer is spending the most money in total?**

In [9]:
pd.read_sql_query(
    '''
    SELECT 
        customer_id,
        SUM(amount)
    FROM
        payment
    GROUP BY
        customer_id
    ORDER BY
        SUM(amount) DESC
    '''
,engine).head(10)

Unnamed: 0,customer_id,sum
0,148,211.55
1,526,208.58
2,178,194.61
3,137,191.62
4,144,189.6
5,459,183.63
6,181,167.67
7,410,167.62
8,236,166.61
9,403,162.67


In [11]:
pd.read_sql_query(
    '''
    SELECT 
        customer_id,
        COUNT(amount)
    FROM
        payment
    GROUP BY
        customer_id
    ORDER BY
        COUNT(amount) DESC
    '''
,engine).head(5)

Unnamed: 0,customer_id,count
0,148,45
1,526,42
2,144,40
3,75,39
4,236,39


In [13]:
pd.read_sql_query(
    '''
    SELECT 
        customer_id,
        staff_id,
        SUM(amount)
    FROM
        payment
    GROUP BY
        staff_id,
        customer_id
    ORDER BY
        staff_id,
        customer_id
    '''
,engine).head(10)

Unnamed: 0,customer_id,staff_id,sum
0,1,1,60.85
1,2,1,55.86
2,3,1,59.88
3,4,1,49.88
4,5,1,63.86
5,6,1,53.85
6,7,1,69.84
7,8,1,57.86
8,9,1,39.88
9,10,1,40.88


In [16]:
pd.read_sql_query(
    '''
    SELECT
        (payment_date)
    FROM
        payment
    '''
,engine).head(3)

Unnamed: 0,payment_date
0,2007-02-15 22:25:46.996577
1,2007-02-16 17:23:14.996577
2,2007-02-16 22:41:45.996577


In [17]:
# use DATE() to keep only the calendar date of each timestamp
pd.read_sql_query(
    '''
    SELECT
        DATE(payment_date)
    FROM
        payment
    '''
,engine).head(3)

Unnamed: 0,date
0,2007-02-15
1,2007-02-16
2,2007-02-16


In [19]:
pd.read_sql_query(
    '''
    SELECT
        DATE(payment_date),
        SUM(amount)
    FROM
        payment
    GROUP BY 
        DATE(payment_date)
    ORDER BY
        SUM(amount) DESC
    '''
,engine).head(5)

Unnamed: 0,date,sum
0,2007-04-30,5723.89
1,2007-03-21,2868.27
2,2007-03-01,2808.24
3,2007-04-29,2717.6
4,2007-03-18,2701.76
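
The daily totals above depend on DATE() collapsing every timestamp to its calendar day before the rows are grouped. The sketch below reproduces that idea on three made-up rows, using an in-memory SQLite database as a stand-in for PostgreSQL (an assumption); SQLite's DATE() accepts ISO-formatted text timestamps, which is close enough to illustrate the grouping.

```python
import sqlite3

# In-memory SQLite stand-in (assumption) with hypothetical payments on two days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (payment_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [
        ("2007-02-15 22:25:46", 2.5),
        ("2007-02-16 17:23:14", 4.0),
        ("2007-02-16 22:41:45", 1.5),
    ],
)

# Rows from the same day share one DATE() value, so they fall into one group.
daily_totals = conn.execute(
    """
    SELECT
        DATE(payment_date),
        SUM(amount)
    FROM
        payment
    GROUP BY
        DATE(payment_date)
    ORDER BY
        SUM(amount) DESC
    """
).fetchall()
print(daily_totals)  # [('2007-02-16', 5.5), ('2007-02-15', 2.5)]
```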


### GROUP BY - Challenge

**Corporate HQ is conducting a study on the relationship between replacement cost and a movie's MPAA rating (e.g. G, PG, R, etc.)**

**What is the average replacement cost per MPAA rating?**

    * Note: You may need to expand the AVG column to view correct results

In [5]:
pd.read_sql_query(
    '''
    SELECT
        rating,
        ROUND(AVG(replacement_cost), 2)
    FROM
        film
    GROUP BY 
        rating
    '''
,engine)

Unnamed: 0,rating,round
0,R,20.23
1,NC-17,20.14
2,G,20.12
3,PG,18.96
4,PG-13,20.4


**We are running a promotion to reward our top 5 customers with coupons**

**What are the customer ids of the top 5 customers by total spend?**

In [6]:
pd.read_sql_query(
    '''
    SELECT
        customer_id,
        SUM(amount)
    FROM
        payment
    GROUP BY
        customer_id
    ORDER BY
        SUM(amount) DESC
    LIMIT 5
    '''
,engine)

Unnamed: 0,customer_id,sum
0,148,211.55
1,526,208.58
2,178,194.61
3,137,191.62
4,144,189.6


### HAVING

**The HAVING clause allows us to filter after an aggregation has already taken place.**

**We cannot use WHERE to filter on aggregate results, because the aggregation happens after WHERE is evaluated**
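
To make the ordering concrete, here is a sketch that runs the same grouped query twice: once with HAVING alone, and once with a WHERE filter applied first. It uses an in-memory SQLite database as a stand-in for PostgreSQL (an assumption) and a made-up `payment` table.

```python
import sqlite3

# In-memory SQLite stand-in (assumption) with hypothetical payment rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer_id INTEGER, staff_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [(1, 1, 60.0), (1, 2, 50.0), (2, 2, 70.0), (2, 2, 40.0), (3, 1, 120.0), (3, 2, 10.0)],
)

# HAVING alone: groups are built from all rows, then filtered by their total.
having_only = conn.execute(
    """
    SELECT customer_id, SUM(amount)
    FROM payment
    GROUP BY customer_id
    HAVING SUM(amount) > 100
    ORDER BY customer_id
    """
).fetchall()
print(having_only)  # [(1, 110.0), (2, 110.0), (3, 130.0)]

# WHERE first: rows for other staff are dropped before the groups are summed.
where_then_having = conn.execute(
    """
    SELECT customer_id, SUM(amount)
    FROM payment
    WHERE staff_id = 2
    GROUP BY customer_id
    HAVING SUM(amount) > 100
    ORDER BY customer_id
    """
).fetchall()
print(where_then_having)  # [(2, 110.0)]
```

WHERE removes individual rows and HAVING removes whole groups, which is why the two queries return different customers.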

In [5]:
pd.read_sql_query(
    '''
    SELECT
        customer_id,
        SUM(amount)
    FROM
        payment
    GROUP BY
        customer_id
    HAVING
        SUM(amount) > 100
    '''
,engine).head()

Unnamed: 0,customer_id,sum
0,87,137.72
1,477,106.79
2,273,130.72
3,550,151.69
4,51,123.7


In [7]:
pd.read_sql_query(
    '''
    SELECT
        store_id,
        COUNT(*)
    FROM
        customer
    GROUP BY
        store_id
    '''
,engine)#.head()

Unnamed: 0,store_id,count
0,1,326
1,2,273


In [6]:
pd.read_sql_query(
    '''
    SELECT
        store_id,
        COUNT(*)
    FROM
        customer
    GROUP BY
        store_id
    HAVING
        COUNT(*) > 300
    '''
,engine)#.head()

Unnamed: 0,store_id,count
0,1,326


### HAVING - Challenge

**We are launching a platinum service for our most loyal customers. We will assign platinum status to customers that have had 40 or more transaction payments.**

**What customer_ids are eligible for platinum status?**

In [8]:
pd.read_sql_query(
    '''
    SELECT
        customer_id,
        COUNT(*)
    FROM
        payment
    GROUP BY
        customer_id
    HAVING
        COUNT(*) >= 40;
    '''
,engine)

Unnamed: 0,customer_id,count
0,144,40
1,526,42
2,148,45


**What are the customer ids of customers who have spent more than $100 in payment transactions with the staff member whose staff_id is 2?**

In [9]:
pd.read_sql_query(
    '''
    SELECT
        customer_id,
        SUM(amount)
    FROM
        payment
    WHERE
        staff_id = 2
    GROUP BY
        customer_id
    HAVING
        SUM(amount) > 100;
    '''
,engine)

Unnamed: 0,customer_id,sum
0,187,110.81
1,522,102.8
2,526,101.78
3,211,108.77
4,148,110.78
