# Data Transmission between Pandas and MySQL Database

**Dr. Pengfei Zhao**

Finance Mathematics Program, 

BNU-HKBU United International College

* Data can be transmitted in both directions between Pandas and a database. In this class we use the `MySQL` database, since it is one of the most widely used free and open source relational databases.

## 1. Basic MySQL Database Operation

* Suppose the MySQL service is already installed on a remote server with IP address \*\*\*. You can then connect to the MySQL server from a command line tool (e.g. CMD on Windows or Terminal on Mac) with the command below. The `-p` option makes the client prompt for your password; a word placed after `-p` with a space is read as a database name, not as the password:

>```
mysql -h *** -u your_user_name -p
``` 

* If the `mysql>` prompt appears on the screen, you have successfully logged into the MySQL server. You can then use the following commands for some basic operations:

(1) List all the databases your account can access

>```SQL
show databases;
```

(2) Select a database

>```SQL
use database_name;
```

(3) Create a database

>```SQL
create database database_name;
```

(4) Delete a database

>```SQL
drop database database_name;
```

(5) After selecting a database, list all the tables in it

>```SQL
show tables;
```

* If the MySQL service is installed on your own PC, then your PC is *localhost* and no host option is needed. You can use the command below to connect to the database (again, `-p` prompts for the password):

>```
mysql -u your_user_name -p
```

## 2. MySQL server connection by Python

* Besides using command line tools for remote access, we can also use Python code to access a MySQL server database. This is often preferable, since the code can be embedded into other applications, e.g. web applications.

* Before transmitting data, we have to create a connection between the local computer and the remote MySQL server. Here we use `sqlalchemy` to create the connection.

* `SQLAlchemy` is an open source SQL toolkit for the Python programming language, which provides "a full suite of well known enterprise-level persistence patterns, designed for efficient and high-performing database access".


* The MySQL user name, the login password, the server host, and the database to connect to must be provided.
* The engine returned by `create_engine` serves as the "connection object" (it is named `conn_helloDB` in the cells below) and is used in all later steps.
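
The connection step described above can be sketched as follows. This is a minimal illustration, not the original cell: the helper names `build_mysql_url` and `make_engine` are hypothetical, the credentials are placeholders, and it assumes the `sqlalchemy` and `pymysql` packages are installed before `make_engine` is called. The password is percent-encoded because characters such as `@` or `/` would otherwise break the URL.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host="localhost", database=None):
    """Return an SQLAlchemy URL string for the PyMySQL driver (hypothetical helper)."""
    # Percent-encode the credentials so special characters do not break the URL.
    url = "mysql+pymysql://{}:{}@{}".format(quote_plus(user), quote_plus(password), host)
    if database is not None:
        url += "/" + database
    return url


def make_engine(url):
    """Create the engine; requires the sqlalchemy and pymysql packages."""
    from sqlalchemy import create_engine  # imported lazily so the URL helper works alone
    return create_engine(url)


# Placeholder credentials, for illustration only.
example_url = build_mysql_url("your_user_name", "p@ss/word", database="HelloDB")
print(example_url)

# conn = make_engine(example_url)  # uncomment once a MySQL server is reachable
```

Creating the engine does not open a network connection yet; the connection is only made when the first statement is executed.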

### 2.1 Create DataBase

In [42]:
#import mysql.connector
#conn = mysql.connector.connect(user='few', password='***', host='www.uicquant.com')
#cur = conn.cursor()
# sql_createdb = 'create database HelloDB;'
# cur.execute(sql_createdb)

In [1]:
from sqlalchemy import create_engine

In [2]:
conn_helloDB = create_engine("mysql+pymysql://{user}:{pw}@localhost".format(user="few", pw="123456"))


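In [3]:
from sqlalchemy import text

sql_createdb = 'create database HelloDB;'
# Engine.execute() was removed in SQLAlchemy 2.0, so run the statement on an explicit connection
with conn_helloDB.connect() as connection:
    connection.execute(text(sql_createdb))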

* To check whether the database was created successfully, log into the MySQL server from the command line and run `show databases;`.

### 2.2 Use DataBase

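* After creating the database, you need to select it. Note that a `use` statement only affects the connection that executes it, so the cells in the following sections name the database directly in the connection URL instead.

In [11]:
from sqlalchemy import text

sql_usedb = 'use HelloDB;'
with conn_helloDB.connect() as connection:
    connection.execute(text(sql_usedb))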

### 2.3 Save dataframe data to table

* Here we load the data from a CSV file into a dataframe, and then store the dataframe in a table in the database.

In [4]:
import pandas as pd

%time hundred_stocks_df = pd.read_csv('../data/hundred_stocks_twoyears_daily_bar.csv')

CPU times: user 44 ms, sys: 12 ms, total: 56 ms
Wall time: 217 ms


In [5]:
hundred_stocks_df.tail()

| | code | date | open | high | close | low | volume |
|---|---|---|---|---|---|---|---|
| 30098 | 99 | 2017-11-06 | 10.4 | 10.4 | 10.28 | 10.03 | 71688.22 |
| 30099 | 99 | 2017-11-07 | 10.21 | 10.39 | 10.29 | 10.21 | 42507.74 |
| 30100 | 99 | 2017-11-08 | 10.28 | 10.69 | 10.54 | 10.26 | 73131.8 |
| 30101 | 99 | 2017-11-09 | 10.47 | 10.66 | 10.53 | 10.44 | 39311.34 |
| 30102 | 99 | 2017-11-10 | 10.54 | 10.58 | 10.45 | 10.4 | 28081.47 |


In [5]:
conn_helloDB = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}".format(user="few", pw="123456", db="HelloDB"))

In [54]:
hundred_stocks_df.to_sql(con=conn_helloDB, name='hundred_stocks_twoyears_daily_bar', if_exists='replace', index=False)
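
The effect of the `if_exists` and `index` arguments of `to_sql` can be tried without a MySQL server. The sketch below is only an illustration under stated assumptions: it swaps in an in-memory SQLite database from the Python standard library as a stand-in for the MySQL engine, and the table name `daily_bar` and the three rows are made up.

```python
import sqlite3

import pandas as pd

# Tiny made-up frame that reuses some column names from the stock data.
bars = pd.DataFrame({
    "code": [1, 1, 99],
    "date": ["2017-11-09", "2017-11-10", "2017-11-10"],
    "close": [12.33, 12.30, 10.45],
})

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL engine


def count_rows(connection):
    """Return the number of rows currently stored in daily_bar."""
    return connection.execute("select count(*) from daily_bar").fetchone()[0]


# The first write creates the table; index=False keeps the row index out of it.
bars.to_sql(name="daily_bar", con=conn, if_exists="replace", index=False)

# 'append' adds the rows to the existing table.
bars.to_sql(name="daily_bar", con=conn, if_exists="append", index=False)
rows_after_append = count_rows(conn)

# 'replace' drops the table and writes it again from scratch.
bars.to_sql(name="daily_bar", con=conn, if_exists="replace", index=False)
rows_after_replace = count_rows(conn)

# Column names as stored in the database; there is no extra index column.
columns = [row[1] for row in conn.execute("pragma table_info(daily_bar)")]
print(rows_after_append, rows_after_replace, columns)
conn.close()
```

With `if_exists='replace'`, as used in the cell above, rerunning the notebook does not duplicate rows, at the cost of rewriting the whole table each time.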

### 2.4 Read Data from Database to Dataframe

* You can use the pandas `read_sql` function to load data from the remote database server into a local dataframe. The API documentation of the function can be found [here](http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.read_sql.html).

In [8]:
df = pd.read_sql('select * from hundred_stocks_twoyears_daily_bar', conn_helloDB)
df.tail()

| | code | date | open | high | close | low | volume |
|---|---|---|---|---|---|---|---|
| 30098 | 99 | 2017-11-06 | 10.4 | 10.4 | 10.28 | 10.03 | 71688.22 |
| 30099 | 99 | 2017-11-07 | 10.21 | 10.39 | 10.29 | 10.21 | 42507.74 |
| 30100 | 99 | 2017-11-08 | 10.28 | 10.69 | 10.54 | 10.26 | 73131.8 |
| 30101 | 99 | 2017-11-09 | 10.47 | 10.66 | 10.53 | 10.44 | 39311.34 |
| 30102 | 99 | 2017-11-10 | 10.54 | 10.58 | 10.45 | 10.4 | 28081.47 |


In [9]:
code, start, end = '1', '2017/08/01', '2017/11/10'
sql = "select * from hundred_stocks_twoyears_daily_bar where code=%s"% (code)
df000001 = pd.read_sql(sql, conn_helloDB)

In [10]:
df000001.tail()

| | code | date | open | high | close | low | volume |
|---|---|---|---|---|---|---|---|
| 551 | 1 | 2017-11-06 | 11.42 | 11.42 | 11.28 | 11.09 | 1029902.81 |
| 552 | 1 | 2017-11-07 | 11.27 | 12.09 | 11.92 | 11.25 | 2477163.25 |
| 553 | 1 | 2017-11-08 | 12.0 | 12.59 | 12.13 | 11.93 | 4262825.5 |
| 554 | 1 | 2017-11-09 | 12.2 | 12.57 | 12.33 | 12.15 | 2295289.25 |
| 555 | 1 | 2017-11-10 | 12.37 | 12.55 | 12.3 | 12.15 | 1757552.38 |
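
The query above filters by `code` only, and the `start` and `end` values are not used. A possible way to add the date range while passing all values as query parameters, rather than formatting them into the SQL string, is sketched below. It rests on several assumptions: an in-memory SQLite database stands in for the MySQL server, the table `daily_bar` and its rows are made up, and the dates are stored as `YYYY-MM-DD` text so that string comparison follows date order. SQLite uses `?` placeholders; with the SQLAlchemy engine used in this class the placeholder syntax is different (for example named `:code` parameters wrapped in `sqlalchemy.text`).

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL engine

# Made-up rows; dates are stored as 'YYYY-MM-DD' text.
pd.DataFrame({
    "code": [1, 1, 1, 99],
    "date": ["2017-07-31", "2017-08-01", "2017-11-10", "2017-11-10"],
    "close": [10.0, 10.5, 12.3, 10.45],
}).to_sql(name="daily_bar", con=conn, index=False)

code, start, end = 1, "2017/08/01", "2017/11/10"

# The stored dates use dashes, so convert the slash-separated bounds first.
start_iso = start.replace("/", "-")
end_iso = end.replace("/", "-")

# Values are passed separately from the SQL text instead of being pasted into it.
sql = "select * from daily_bar where code = ? and date between ? and ?"
df_range = pd.read_sql(sql, conn, params=(code, start_iso, end_iso))
print(df_range)
conn.close()
```

Keeping the values out of the SQL string avoids quoting mistakes and protects against SQL injection when the values come from user input.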


### 2.5 Build Company_Info Table

In [6]:
import pandas as pd
stocks_info_df = pd.read_csv('../data/stocks_info.csv')

In [7]:
stocks_info_df.tail()

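| | index | code | name | industry | area | pe | outstanding | totals | totalAssets | liquidAssets | ... | bvps | pb | timeToMarket | undp | perundp | rev | profit | gpr | npr | holders |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3440 | 3440 | 300727 | 润禾材料 | 化工原料 | 浙江 | 0.0 | 0.0 | 0.0 | 43196.92 | 25291.61 | ... | 0.0 | 0.0 | 0 | 8362.07 | 0.0 | 0.0 | 0.0 | 29.56 | 10.75 | 0.0 |
| 3441 | 3441 | 300723 | 一品红 | 化学制药 | 广东 | 0.0 | 0.0 | 0.0 | 76309.68 | 41579.12 | ... | 0.0 | 0.0 | 0 | 22526.03 | 0.0 | 0.0 | 0.0 | 56.95 | 10.57 | 0.0 |
| 3442 | 3442 | 300721 | 怡达股份 | 化工原料 | 江苏 | 0.0 | 0.0 | 0.0 | 88405.46 | 52139.9 | ... | 0.0 | 0.0 | 0 | 16709.92 | 0.0 | 0.0 | 0.0 | 16.56 | 5.44 | 0.0 |
| 3443 | 3443 | 2912 | 中新赛克 | 软件服务 | 深圳 | 0.0 | 0.0 | 0.0 | 95634.96 | 77540.45 | ... | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 31.45 | 0.0 |
| 3444 | 3444 | 2911 | 佛燃股份 | 供气供热 | 广东 | 0.0 | 0.0 | 0.0 | 441762.84 | 90943.81 | ... | 0.0 | 0.0 | 0 | 89013.92 | 0.0 | 0.0 | 0.0 | 22.0 | 9.61 | 0.0 |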


In [6]:
stocks_info_df.to_sql(con=conn_helloDB, name='company_info', if_exists='replace', index=False)

In [8]:
sh50_df = pd.read_csv('../data/sh50.csv')


In [9]:
sh50_df.to_sql(con=conn_helloDB, name='sh50', if_exists='replace', index=False)

In [14]:
# * Data Preparation
# frames = []
# for i in range(1, 100):
#     stock_code = str(i).zfill(6)  # six-digit stock code, e.g. '000001'
#     df_stock_code = uicdb.get_price(stock_code, start='2015/08/01', end='2017/11/10', attr= \
#                                     ['code', 'date', 'open', 'high', 'close', 'low', 'volume'])
#     frames.append(df_stock_code)
# df = pd.concat(frames)  # DataFrame.append was removed in pandas 2.0
# df.to_csv(path_or_buf='../data/hundred_stocks_twoyears_daily_bar.csv', index=False)