# PYTHON: DATA MANAGEMENT TIPS OVER CONNECTIVITY

# MySQL Platform

By: Hector Alvaro Rojas &nbsp;&nbsp;|&nbsp;&nbsp; Data Science, Visualizations and Applied Statistics &nbsp;&nbsp;|&nbsp;&nbsp; January 20, 2018<br>
    Url: [http://www.arqmain.net]   &nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;   GitHub: [https://github.com/arqmain]
    <hr>

## 1 How about getting the Connection?

I present two ways to connect, "pymysql" and "SQLAlchemy", but I use "SQLAlchemy" in the examples to keep the same standard as the examples done with "MS-SQL", "SQLite" and "PostgreSQL". All of these platforms accept a "SQLAlchemy" engine for "df.to_sql" when using Pandas.

In [24]:
# pymysql connection
import pymysql.cursors

# Connect to the database, then close the connection (this cell only checks that connecting works)
conn = pymysql.connect(host='localhost',
                       user='root',
                       password='hell',
                       database='mybase1',
                       charset='utf8mb4',
                       cursorclass=pymysql.cursors.DictCursor)
conn.close()

In [23]:
# sqlalchemy connection
import pymysql
import pymysql.cursors
from sqlalchemy import create_engine

# URL format: mysql+pymysql://user:password@host:port/database (the password is empty here)
engine = create_engine('mysql+pymysql://root:@localhost:3306/mybase1')
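
When the password contains characters such as `@` or `/`, they have to be percent-encoded before they go into the connection URL. The sketch below shows one way to assemble the URL with the standard library only; the helper name `build_mysql_url` and the sample password are hypothetical and not part of the original examples.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, port, database):
    """Assemble a 'mysql+pymysql://' URL, percent-encoding user and password.

    Hypothetical helper for illustration only.
    """
    return "mysql+pymysql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# Same settings as the cell above (empty password)
print(build_mysql_url("root", "", "localhost", 3306, "mybase1"))

# A made-up password containing '@' and '/', which must be escaped
print(build_mysql_url("root", "p@ss/word", "localhost", 3306, "mybase1"))
```

The returned string can be passed to `create_engine` unchanged.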

## 2 How about Querying the Database?

### 2.1 Selecting all (*)

In [13]:
import pandas as pd
import pymysql
from sqlalchemy import create_engine

# Connect to the database
engine = create_engine('mysql+pymysql://root:@localhost:3306/mybase1')

sql = "SELECT * FROM innova_first"
df = pd.read_sql_query(sql, engine)
print(df.shape)
df.head()

(3545, 13)


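|   | rol | identificador | region | ventas_ao_2005 | expo_ao_2005 | empleo_ao_2005 | ventas_ao_2006 | expo_ao_2006 | empleo_ao_2006 | producto1 | producto2 | producto3 | producto4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 110467 | 2 | 1 | 7135775 | 0 | 472 | 6745463 | 2091093 | 411 | 2 | 2 | 2 | 2 |
| 1 | 110468 | 2 | 1 | 367582 | 0 | 7 | 122981 | 0 | 7 | 1 | 2 | 2 | 2 |
| 2 | 110469 | 2 | 1 | 1650613 | 0 | 50 | 1769443 | 0 | 50 | 2 | 2 | 2 | 2 |
| 3 | 110472 | 2 | 1 | 62272395 | 38644399 | 209 | 45995579 | 24126202 | 277 | 1 | 2 | 2 | 2 |
| 4 | 110473 | 2 | 1 | 1191963 | 0 | 85 | 1090426 | 0 | 80 | 2 | 2 | 2 | 2 |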


### 2.2 Selecting fields

In [14]:
import pandas as pd
import pymysql
from sqlalchemy import create_engine

# Connect to the database
engine = create_engine('mysql+pymysql://root:@localhost:3306/mybase1')

sql = "SELECT rol, region, ventas_ao_2005, empleo_ao_2005 FROM innova_first"
df = pd.read_sql_query(sql, engine)
print(df.shape)
df.head()

(3545, 4)


|   | rol | region | ventas_ao_2005 | empleo_ao_2005 |
|---|---|---|---|---|
| 0 | 110467 | 1 | 7135775 | 472 |
| 1 | 110468 | 1 | 367582 | 7 |
| 2 | 110469 | 1 | 1650613 | 50 |
| 3 | 110472 | 1 | 62272395 | 209 |
| 4 | 110473 | 1 | 1191963 | 85 |


### 2.3 Selecting fields and conditions

In [15]:
import pandas as pd
import pymysql
from sqlalchemy import create_engine

# Connect to the database
engine = create_engine('mysql+pymysql://root:@localhost:3306/mybase1')

sql = """
SELECT rol, region, region_name, ventas_ao_2005, empleo_ao_2005
FROM innova01
WHERE ventas_ao_2005 >= 1500000 AND region >= 8 AND region_name = 'Trece'
"""
df = pd.read_sql_query(sql, engine)
print(df.shape)
df.head()

(619, 5)


|   | rol | region | region_name | ventas_ao_2005 | empleo_ao_2005 |
|---|---|---|---|---|---|
| 0 | 210544 | 13 | Trece | 32952041 | 510 |
| 1 | 441584 | 13 | Trece | 8579325 | 4 |
| 2 | 811623 | 13 | Trece | 3998000 | 110 |
| 3 | 842071 | 13 | Trece | 1690244 | 60 |
| 4 | 1040421 | 13 | Trece | 27000000 | 450 |
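
The conditions in the query above are written directly into the SQL string. When the values come from variables, passing them as parameters is usually safer than building the string by hand. The sketch below assumes Python's built-in `sqlite3` as a stand-in so that it runs without a MySQL server, and a tiny made-up table `demo` in place of `innova01`. `pd.read_sql_query` accepts a `params` argument for the same purpose; the placeholder style depends on the driver (`?` for sqlite3, `%s` for pymysql).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo (rol INTEGER, region INTEGER, region_name TEXT, ventas_ao_2005 INTEGER)"
)
# Made-up rows, only to exercise the filter
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?, ?)",
    [
        (1, 13, "Trece", 2000000),
        (2, 13, "Trece", 1000000),
        (3, 5, "Cinco", 3000000),
        (4, 13, "Trece", 1500000),
    ],
)

sql = (
    "SELECT rol, region, region_name, ventas_ao_2005 FROM demo "
    "WHERE ventas_ao_2005 >= ? AND region >= ? AND region_name = ? "
    "ORDER BY rol"
)
# The values travel separately from the SQL text
rows = conn.execute(sql, (1500000, 8, "Trece")).fetchall()
print(rows)
conn.close()
```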


## 3 How about Creating a Table?

In [25]:
import pandas as pd
import pymysql.cursors
from sqlalchemy import create_engine

# Connect to the database
engine = create_engine('mysql+pymysql://root:@localhost:3306/mybase1')

# Select the columns for the new table
sql = "SELECT rol, region, ventas_ao_2005, empleo_ao_2005 FROM innova_first"

# Load the result into a pandas DataFrame ("paso")
paso = pd.read_sql_query(sql, engine)

# Create the new table "first333" in the "mybase1" database (replacing it if it already exists)
paso.to_sql(name='first333', con=engine, if_exists='replace', index=False)

# Verify that the new table "first333" was created in the "mybase1" database
sql = "SELECT * FROM first333"
df = pd.read_sql_query(sql, engine)
print(df.shape)
df.head()

(3545, 4)


|   | rol | region | ventas_ao_2005 | empleo_ao_2005 |
|---|---|---|---|---|
| 0 | 110467 | 1 | 7135775 | 472 |
| 1 | 110468 | 1 | 367582 | 7 |
| 2 | 110469 | 1 | 1650613 | 50 |
| 3 | 110472 | 1 | 62272395 | 209 |
| 4 | 110473 | 1 | 1191963 | 85 |
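
With `if_exists='replace'`, pandas drops the table if it already exists and creates it again from the DataFrame. Roughly the same result can be obtained on the database side with `DROP TABLE IF EXISTS` followed by `CREATE TABLE ... AS SELECT`, without moving the rows through pandas. A minimal sketch, assuming the built-in `sqlite3` as a stand-in for MySQL and the made-up table names `source_demo` and `copy_demo`:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_demo (rol INTEGER, region INTEGER, ventas INTEGER, extra TEXT)"
)
conn.executemany(
    "INSERT INTO source_demo VALUES (?, ?, ?, ?)",
    [(1, 1, 100, "a"), (2, 1, 200, "b"), (3, 2, 300, "c")],
)


def rebuild_copy(connection):
    """Drop copy_demo if present, then recreate it from a SELECT."""
    connection.execute("DROP TABLE IF EXISTS copy_demo")
    connection.execute(
        "CREATE TABLE copy_demo AS SELECT rol, region, ventas FROM source_demo"
    )


rebuild_copy(conn)
rebuild_copy(conn)  # a second run neither fails nor duplicates rows

copied = conn.execute("SELECT * FROM copy_demo ORDER BY rol").fetchall()
print(copied)
conn.close()
```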


## 4 How about Querying with JOIN and Aggregate Summaries?

### 4.1 Joining tables with JOIN

In [18]:
### Simple JOIN

import pandas as pd
import pymysql
from sqlalchemy import create_engine

# Connect to the database
engine = create_engine('mysql+pymysql://root:@localhost:3306/mybase1')

sql = """
SELECT innova_first.rol, innova_first.region, tamano, ventas_ao_2005
FROM innova_first
INNER JOIN innova_second ON innova_first.rol = innova_second.rol
"""
df = pd.read_sql_query(sql, engine)
print(df.shape)
df.head()

(3539, 4)


|   | rol | region | tamano | ventas_ao_2005 |
|---|---|---|---|---|
| 0 | 110467 | 1 | 2 | 7135775 |
| 1 | 110468 | 1 | 1 | 367582 |
| 2 | 110473 | 1 | 2 | 1191963 |
| 3 | 110474 | 1 | 2 | 125114709 |
| 4 | 110476 | 1 | 2 | 24413000 |


In [19]:
### JOIN with aggregated summary

import pandas as pd
import pymysql
from sqlalchemy import create_engine

# Connect to the database
engine = create_engine('mysql+pymysql://root:@localhost:3306/mybase1')

sql = """
SELECT innova01.region, tamano, tam_nombre,
       AVG(ventas_ao_2005) AS AVERAGE, COUNT(ventas_ao_2005) AS MUESTRA,
       STD(ventas_ao_2005) AS SE, MAX(ventas_ao_2005) AS MAXIMO, MIN(ventas_ao_2005) AS MINIMO
FROM innova01
INNER JOIN innova02 ON innova01.rol = innova02.rol
GROUP BY region_name, tam_nombre
ORDER BY innova01.region, tamano
"""
df = pd.read_sql_query(sql, engine)
print(df.shape)
df

(26, 8)


|   | region | tamano | tam_nombre | AVERAGE | MUESTRA | SE | MAXIMO | MINIMO |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Grande | 15147770.0 | 15 | 39756490.0 | 162013316 | 133011 |
| 1 | 1 | 2 | chica | 19097930.0 | 120 | 107152500.0 | 825162121 | 0 |
| 2 | 2 | 1 | Grande | 15350260.0 | 19 | 49572030.0 | 223789307 | 200000 |
| 3 | 2 | 2 | chica | 59432620.0 | 143 | 319399100.0 | 2975540156 | 0 |
| 4 | 3 | 1 | Grande | 13133490.0 | 9 | 24177210.0 | 61771337 | 45000 |
| 5 | 3 | 2 | chica | 12624860.0 | 101 | 31710960.0 | 206620260 | 0 |
| 6 | 4 | 1 | Grande | 8675473.0 | 13 | 17035600.0 | 48830595 | 52888 |
| 7 | 4 | 2 | chica | 12343470.0 | 131 | 96492800.0 | 1097700384 | 3360 |
| 8 | 5 | 1 | Grande | 11107140.0 | 31 | 15608390.0 | 65594941 | 107323 |
| 9 | 5 | 2 | chica | 20808060.0 | 257 | 105177600.0 | 1114395635 | 0 |


### 4.2 Joining tables with WHERE

In [21]:
### Simple JOIN

import pandas as pd
import pymysql
from sqlalchemy import create_engine

# Connect to the database
engine = create_engine('mysql+pymysql://root:@localhost:3306/mybase1')

sql = """
SELECT innova_first.rol, innova_first.region, ventas_ao_2005
FROM innova_first, innova_second
WHERE innova_first.rol = innova_second.rol
"""
df = pd.read_sql_query(sql, engine)
print(df.shape)
df.head()

(3539, 3)


|   | rol | region | ventas_ao_2005 |
|---|---|---|---|
| 0 | 110467 | 1 | 7135775 |
| 1 | 110468 | 1 | 367582 |
| 2 | 110473 | 1 | 1191963 |
| 3 | 110474 | 1 | 125114709 |
| 4 | 110476 | 1 | 24413000 |


In [22]:
### JOIN with aggregated summary

import pandas as pd
import pymysql
from sqlalchemy import create_engine

# Connect to the database
engine = create_engine('mysql+pymysql://root:@localhost:3306/mybase1')

sql = """
SELECT innova_first.region, innova_second.tamano, innova_second.tam_nombre,
       AVG(ventas_ao_2005) AS AVERAGE, COUNT(ventas_ao_2005) AS MUESTRA,
       STD(ventas_ao_2005) AS SE, MAX(ventas_ao_2005) AS MAXIMO, MIN(ventas_ao_2005) AS MINIMO
FROM innova_first, innova_second
WHERE innova_first.rol = innova_second.rol
GROUP BY innova_first.region, innova_second.tamano
ORDER BY innova_first.region, innova_second.tamano
"""
df = pd.read_sql_query(sql, engine)
print(df.shape)
df

(26, 8)


|   | region | tamano | tam_nombre | AVERAGE | MUESTRA | SE | MAXIMO | MINIMO |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Grande | 15147770.0 | 15 | 39756490.0 | 162013316 | 133011 |
| 1 | 1 | 2 | chica | 19097930.0 | 120 | 107152500.0 | 825162121 | 0 |
| 2 | 10 | 1 | Grande | 54771510.0 | 28 | 181500000.0 | 980000000 | 1800 |
| 3 | 10 | 2 | chica | 10120620.0 | 240 | 56661640.0 | 847223711 | 0 |
| 4 | 11 | 1 | Grande | 10388340.0 | 8 | 19224170.0 | 56916692 | 143332 |
| 5 | 11 | 2 | chica | 4126802.0 | 59 | 9448409.0 | 41352297 | 0 |
| 6 | 12 | 1 | Grande | 955641.4 | 8 | 969261.7 | 2943442 | 142448 |
| 7 | 12 | 2 | chica | 10678960.0 | 78 | 55250280.0 | 366223897 | 7245 |
| 8 | 13 | 1 | Grande | 49436140.0 | 294 | 355252100.0 | 5962235430 | 28000 |
| 9 | 13 | 2 | chica | 24370650.0 | 1123 | 207364000.0 | 5532431538 | 0 |
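
The `INNER JOIN ... ON` form and the comma-separated `FROM` list filtered by `WHERE` express the same inner join, which is why both approaches above return the same number of rows. The sketch below checks this on two tiny made-up tables (`first_demo` and `second_demo`), assuming the built-in `sqlite3` as a stand-in so that it runs without a MySQL server.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_demo (rol INTEGER, region INTEGER, ventas INTEGER)")
conn.execute("CREATE TABLE second_demo (rol INTEGER, tamano INTEGER)")
conn.executemany(
    "INSERT INTO first_demo VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 300), (3, 2, 500), (4, 2, 700)],
)
# rol 4 has no match in second_demo, so an inner join leaves it out
conn.executemany("INSERT INTO second_demo VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])

explicit_join = """
SELECT f.region, s.tamano, AVG(f.ventas) AS average, COUNT(f.ventas) AS muestra
FROM first_demo AS f
INNER JOIN second_demo AS s ON f.rol = s.rol
GROUP BY f.region, s.tamano
ORDER BY f.region, s.tamano
"""
implicit_join = """
SELECT f.region, s.tamano, AVG(f.ventas) AS average, COUNT(f.ventas) AS muestra
FROM first_demo AS f, second_demo AS s
WHERE f.rol = s.rol
GROUP BY f.region, s.tamano
ORDER BY f.region, s.tamano
"""

rows_join = conn.execute(explicit_join).fetchall()
rows_where = conn.execute(implicit_join).fetchall()
print(rows_join)
print(rows_join == rows_where)
conn.close()
```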


## References:
>* The SQLAlchemy web site, [SQLAlchemy 1.1 Documentation](http://docs.sqlalchemy.org/en/latest/core/engines.html), gives the formal structure of the connection syntax in the <b>MySQL</b> section.<br>
>* The parameter definitions come from [pandas.DataFrame.to_sql](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html).<br>
>* Additional material comes from [PYTHON: PANDAS → MYSQL USING SQLALCHEMY. A.K.A SQLALCHEMY FOR PANDAS USERS WHO DON’T KNOW SQL (THE BRAVE AND THE FOOLHARDY)](https://robertdavidwest.com/2014/10/12/python-pandas-%E2%86%92-mysql-using-sqlalchemy-a-k-a-sqlalchemy-for-pandas-users-who-dont-know-sql-the-brave-and-the-foolhardy/).<br>
>* For a summary of pandas aggregate functions, see the <b><u>Descriptive statistics</u></b> section of [Pandas Essential Basic Functionality](http://pandas.pydata.org/pandas-docs/stable/basics.html).
