# Getting data from a database
One common source of data, especially in institutions, is a relational database. Microsoft SQL Server, Teradata, Oracle, Postgres, and MySQL are all examples of relational databases that are in common use for storing and retrieving complex data. Storing and retrieving data from these servers is a regular task in the life of an analyst, because any process that does something interesting is likely to generate or ingest a lot of data. In this notebook, we'll look at the main components of a database connection, establish a connection, put some data into a simple database, and read some data back out of it.

````{note}
SQL (pronounced "S-Q-L" or "sequel") is the primary data manipulation language, meaning it is the language we use to get data from a relational database. You won't need an in-depth understanding to work through this notebook, but if you want to brush up on some of the basics, this is [a good resource](https://www.w3schools.com/sql/default.asp).
````

## Connecting to the database
There are four basic steps to executing a query against a relational database.
1. Establish a connection to the database
2. Create a command
3. Execute the command
4. Process the result (if there is a return)
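
The four steps above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under simple assumptions: the in-memory database and the `demo` table with its two rows are made up for the example and are not part of the laptop sales data.

```python
import sqlite3

# 1. Establish a connection (":memory:" is a throwaway in-memory database).
conn = sqlite3.connect(":memory:")

# Hypothetical table, created only so the query has something to return.
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO demo (label) VALUES (?)", [("first",), ("second",)])
conn.commit()

# 2. Create a command: here, a SQL string.
query = "SELECT id, label FROM demo ORDER BY id"

# 3. Execute the command.
cursor = conn.execute(query)

# 4. Process the result.
rows = cursor.fetchall()
for row_id, label in rows:
    print(row_id, label)

conn.close()
```

Statements that return nothing, such as the `CREATE TABLE` and `INSERT` calls, stop after step 3.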

We are going to look at a few different approaches for making the connection to the database. Which one you use will depend largely on the target RDBMS. Common interfaces include ODBC, OLE DB, and Python's DB-API. We'll take a look at a couple of these here.
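
Of the interfaces mentioned above, DB-API is Python's own (PEP 249): compliant driver modules expose a `connect()` function, cursor objects, and a few descriptive module attributes. The sketch below uses the standard-library `sqlite3` driver. The `stores` table is invented for illustration, and other drivers may expect a different placeholder style than the `?` used here.

```python
import sqlite3

# Module-level attributes defined by the DB-API specification.
print(sqlite3.apilevel)    # DB-API version the module implements
print(sqlite3.paramstyle)  # placeholder style expected in SQL strings

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table used only for this example.
cur.execute("CREATE TABLE stores (postcode TEXT, x REAL, y REAL)")
cur.execute("INSERT INTO stores VALUES (?, ?, ?)", ("SW1V 4QQ", 528924.0, 178440.0))
conn.commit()

# Pass values as parameters instead of formatting them into the SQL string.
cur.execute("SELECT x, y FROM stores WHERE postcode = ?", ("SW1V 4QQ",))
store_xy = cur.fetchone()
print(store_xy)

conn.close()
```

Because the calls follow a shared interface, code written this way usually needs only a different `connect()` call and placeholder style when moving to another DB-API driver.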

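In [51]:
import sqlite3
import pandas as pd

# Open the SQLite database file (the path is relative to the notebook).
conn = sqlite3.connect('../data/laptopsales.db')
# Read the whole sales table into a DataFrame, indexed by sale_id.
pd.read_sql('select * from sales', conn, index_col='sale_id')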

sale_id,Configuration,Customer Postcode,Store Postcode,Retail Price,Screen Size (Inches),Battery Life (Hours),RAM (GB),Processor Speeds (GHz),Integrated Wireless?,HD Size (GB),Bundled Applications?,customer X,customer Y,store X,store Y,sale_date
39068,360,SW4 0LB,SW1V 4QQ,565.0,15,6,2,2.0,Yes,300,No,529182,175552,528924.0,178440.0,2008-06-01 00:00:00
39069,436,EC4A 3BQ,SW1P 3AU,322.0,17,4,1,1.5,Yes,80,No,531469,181384,529902.0,179641.0,2008-06-01 00:06:00
39070,555,SW11 5RD,SW1V 4QQ,665.0,17,4,4,2.0,No,80,Yes,528144,175684,528924.0,178440.0,2008-06-01 00:07:00
39071,392,W2 1PU,SW1P 3AU,483.0,15,6,4,1.5,Yes,300,No,527090,181506,529902.0,179641.0,2008-06-01 00:08:00
39072,60,WC1N 2PB,SE1 2BN,340.0,15,4,2,1.5,No,80,No,530769,182111,534057.0,179682.0,2008-06-01 00:12:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
71813,97,NW6 2LU,N3 1DH,364.0,15,4,4,1.5,Yes,40,Yes,525540,184577,525109.0,190628.0,2008-06-30 23:55:00
71814,634,EC4M 7DE,SW1P 3AU,329.0,17,5,2,1.5,No,40,No,531815,181170,529902.0,179641.0,2008-06-30 23:55:00
71815,276,TW1 3AW,W4 3PH,406.0,15,5,4,2.4,Yes,80,No,516617,173615,519585.0,177640.0,2008-06-30 23:56:00
71816,292,W1G 7EQ,NW5 2QH,435.0,15,6,1,1.5,Yes,80,No,528714,181753,529248.0,185213.0,2008-06-30 23:56:00


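In [27]:
import sqlite3
import pandas as pd

conn = sqlite3.connect('../data/laptopsales.db')

# Load the raw CSV and convert the Date column to a proper datetime.
sales = pd.read_csv('../data/LaptopSales.csv')
sales['sale_date'] = pd.to_datetime(sales['Date'])
sales = sales.drop(columns='Date')

# Keep only the June sales and write them to the sales table,
# replacing the table if it already exists.
june_sales = sales[sales['sale_date'].dt.month == 6]
june_sales.to_sql('sales', conn, if_exists='replace', index_label='sale_id')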


32722
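
In recent pandas versions `to_sql` returns the number of rows written, which is most likely the value shown above. The `if_exists` argument decides what happens when the target table is already present. The sketch below contrasts `'replace'` with `'append'`; it assumes a made-up two-row frame and an in-memory database rather than the laptop sales file.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the sales data.
demo = pd.DataFrame({"price": [565.0, 322.0]}, index=[39068, 39069])

# 'replace' drops any existing table and recreates it from the frame,
# so writing twice still leaves two rows.
demo.to_sql("sales", conn, if_exists="replace", index_label="sale_id")
demo.to_sql("sales", conn, if_exists="replace", index_label="sale_id")
after_replace = pd.read_sql("select count(*) as n from sales", conn)["n"].iloc[0]

# 'append' keeps the existing rows and adds the new ones.
demo.to_sql("sales", conn, if_exists="append", index_label="sale_id")
after_append = pd.read_sql("select count(*) as n from sales", conn)["n"].iloc[0]

print(after_replace, after_append)
conn.close()
```

Using `'replace'`, as the cell above does, makes the notebook safe to re-run without duplicating rows.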

In [37]:
# Reconnect and read the table back to confirm the write.
conn = sqlite3.connect('../data/laptopsales.db')
pd.read_sql('select * from sales', conn, index_col='sale_id')

sale_id,Configuration,Customer Postcode,Store Postcode,Retail Price,Screen Size (Inches),Battery Life (Hours),RAM (GB),Processor Speeds (GHz),Integrated Wireless?,HD Size (GB),Bundled Applications?,customer X,customer Y,store X,store Y,sale_date
39068,360,SW4 0LB,SW1V 4QQ,565.0,15,6,2,2.0,Yes,300,No,529182,175552,528924.0,178440.0,2008-06-01 00:00:00
39069,436,EC4A 3BQ,SW1P 3AU,322.0,17,4,1,1.5,Yes,80,No,531469,181384,529902.0,179641.0,2008-06-01 00:06:00
39070,555,SW11 5RD,SW1V 4QQ,665.0,17,4,4,2.0,No,80,Yes,528144,175684,528924.0,178440.0,2008-06-01 00:07:00
39071,392,W2 1PU,SW1P 3AU,483.0,15,6,4,1.5,Yes,300,No,527090,181506,529902.0,179641.0,2008-06-01 00:08:00
39072,60,WC1N 2PB,SE1 2BN,340.0,15,4,2,1.5,No,80,No,530769,182111,534057.0,179682.0,2008-06-01 00:12:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
71813,97,NW6 2LU,N3 1DH,364.0,15,4,4,1.5,Yes,40,Yes,525540,184577,525109.0,190628.0,2008-06-30 23:55:00
71814,634,EC4M 7DE,SW1P 3AU,329.0,17,5,2,1.5,No,40,No,531815,181170,529902.0,179641.0,2008-06-30 23:55:00
71815,276,TW1 3AW,W4 3PH,406.0,15,5,4,2.4,Yes,80,No,516617,173615,519585.0,177640.0,2008-06-30 23:56:00
71816,292,W1G 7EQ,NW5 2QH,435.0,15,6,1,1.5,Yes,80,No,528714,181753,529248.0,185213.0,2008-06-30 23:56:00
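
SQLite has no dedicated datetime type, so a column like `sale_date` typically comes back as text when read through a plain `sqlite3` connection. `read_sql` can convert it with `parse_dates`, and its `params` argument lets the database do the filtering. This is a minimal sketch, assuming a tiny hand-built `sales` table (three rows taken from the output above, with a simplified `retail_price` column name) instead of the real file.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")

# Hypothetical, cut-down version of the sales table.
conn.execute("CREATE TABLE sales (sale_id INTEGER, retail_price REAL, sale_date TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (39068, 565.0, "2008-06-01 00:00:00"),
        (39069, 322.0, "2008-06-01 00:06:00"),
        (71816, 435.0, "2008-06-30 23:56:00"),
    ],
)
conn.commit()

# Filter in SQL with bound parameters, and parse the date column while reading.
first_day = pd.read_sql(
    "select * from sales where sale_date >= ? and sale_date < ?",
    conn,
    params=("2008-06-01", "2008-06-02"),
    index_col="sale_id",
    parse_dates=["sale_date"],
)
print(first_day.dtypes)
conn.close()
```

The date comparison works on text here because timestamps in this `YYYY-MM-DD HH:MM:SS` form sort in chronological order.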


In [None]:
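# to_sql is a DataFrame method; pandas has no top-level pd.to_sql.
june_sales.to_sql('sales', conn, if_exists='replace', index_label='sale_id')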