# Connect Database in Python

This note summarizes ways to connect to databases from Python or a Jupyter notebook.

### SQL Server

* The `pyodbc` package is used to connect Python to SQL Server
* The Driver is: /opt/microsoft/msodbcsql7/lib64/libmsodbcsql-17.3.so.1.1
* The Server Name is: RON\SQLEXPRESS
* The Database Name is: TestDB
* The username is: jli
* The password is: 0000
* The Table Name is: dbo.table
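
The settings listed above end up in a single ODBC connection string made of `key=value;` pairs. A minimal sketch of assembling that string, assuming a hypothetical helper named `build_conn_str` (it is not part of `pyodbc`). The driver value is wrapped in braces so that special characters in it are taken literally:

```python
def build_conn_str(driver, server, database, user, password):
    """Assemble an ODBC connection string from the individual settings.

    Hypothetical helper for illustration; not part of pyodbc.
    """
    parts = [
        ("Driver", "{" + driver + "}"),  # braces quote the driver name or path
        ("Server", server),
        ("Database", database),
        ("UID", user),
        ("PWD", password),
    ]
    return "".join("{0}={1};".format(key, value) for key, value in parts)


conn_str = build_conn_str(
    driver="/opt/microsoft/msodbcsql7/lib64/libmsodbcsql-17.3.so.1.1",
    server=r"RON\SQLEXPRESS",  # raw string keeps the backslash literal
    database="TestDB",
    user="jli",
    password="0000",
)
print(conn_str)
```

The resulting string can be passed directly to `pyodbc.connect(conn_str)`.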

In [22]:
import pyodbc

driver = '/opt/microsoft/msodbcsql7/lib64/libmsodbcsql-17.3.so.1.1'
server = r'RON\SQLEXPRESS'  # raw string so the backslash is not treated as an escape
db = 'TestDB'
user = 'jli'
pwd = '0000'

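# The braces are tripled so that str.format emits literal { } around the driver path.
# Trusted_Connection=yes is left out: it requests Windows authentication,
# which conflicts with supplying UID and PWD.
conn = pyodbc.connect('Driver={{{0}}};Server={1};Database={2};UID={3};PWD={4};'.format(driver, server, db, user, pwd))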

cursor = conn.cursor()
# [table] is bracketed because TABLE is a reserved word in T-SQL
cursor.execute('select * from TestDB.dbo.[table]')

# Fetch all the records
result = cursor.fetchall()

# Use for loop to print them
for i in result:
    print(i)
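
The cursor pattern above (`execute`, then `fetchall`, then a loop) is the standard Python DB-API interface, so it can be tried without a SQL Server instance. A minimal sketch using the standard-library `sqlite3` module as a stand-in; the `people` table and its rows are made up for illustration. It also shows a `?` placeholder for passing values, which is the same placeholder style `pyodbc` uses:

```python
import sqlite3

# In-memory database as a stand-in for a real server (illustration only)
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical table and rows
cursor.execute("create table people (name text, age integer)")
cursor.executemany(
    "insert into people (name, age) values (?, ?)",
    [("Jay", 30), ("Ann", 25), ("Bob", 41)],
)
conn.commit()

# Pass values through ? placeholders instead of formatting them into the SQL text
cursor.execute("select name, age from people where age > ? order by name", (26,))
rows = cursor.fetchall()

for row in rows:
    print(row)

conn.close()
```

Passing values as parameters keeps quoting correct and avoids building SQL by string concatenation.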

In [None]:
import pandas as pd

# Query returns a pandas DataFrame ([table] is bracketed because TABLE is a reserved word in T-SQL)
pd.read_sql_query(sql = 'select * from TestDB.dbo.[table]', con = conn)

In [None]:
from sqlalchemy import create_engine

# Create the SQLAlchemy engine (the URL needs '//' after the scheme)
engine = create_engine('mssql+pymssql://{0}:{1}@{2}/{3}'.format(user, pwd, server, db), echo=False)

# Write a DataFrame into SQL Server (`data` is assumed to be an existing pandas DataFrame)
data.to_sql('table_name', con = engine, if_exists = 'append', chunksize = 1000)
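
Formatting the user name and password straight into the URL breaks when either contains characters such as `@`, `/` or `:`. A minimal sketch of escaping them with the standard-library `urllib.parse.quote_plus` before building the URL; the helper name `make_mssql_url` and the sample credentials and host are hypothetical:

```python
from urllib.parse import quote_plus


def make_mssql_url(user, password, host, database, driver="pymssql"):
    """Build a SQLAlchemy-style URL with URL-escaped credentials.

    Hypothetical helper for illustration; not part of SQLAlchemy.
    """
    return "mssql+{0}://{1}:{2}@{3}/{4}".format(
        driver, quote_plus(user), quote_plus(password), host, database
    )


# Made-up credentials containing characters that need escaping
url = make_mssql_url("jli", "p@ss/word", "dbhost", "TestDB")
print(url)
```

The resulting string can then be passed to `create_engine(url)`.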

### BigQuery

The `pandas-gbq` library is a community-led project from the pandas community. The `google-cloud-bigquery` library is the official Python library for interacting with BigQuery. `pandas-gbq` uses `google-cloud-bigquery` to make API calls to BigQuery.

* google-cloud-bigquery==1.20.0
* google-cloud-bigquery-storage==0.7.0
* pandas==0.25.1
* pandas-gbq==0.11.0
* pyarrow==0.14.1

In [None]:
# pandas-gbq
import pandas
sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = 'TX'
    LIMIT 100
"""

# Run a Standard SQL query using the environment's default project
df = pandas.read_gbq(sql, dialect='standard')

# Run a Standard SQL query with the project set explicitly
project_id = 'your-project-id'
df = pandas.read_gbq(sql, project_id=project_id, dialect='standard')

In [26]:
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = 'TX'
    LIMIT 100
"""

# Run a Standard SQL query using the environment's default project
df = client.query(sql).to_dataframe()

# Run a Standard SQL query with the project set explicitly
project_id = 'your-project-id'
df = client.query(sql, project=project_id).to_dataframe()

### Jupyter Notebook

Jupyter notebooks can be used for interactive data analysis with SQL on a relational database.

In [23]:
import pandas as pd
import sqlalchemy

Now we use the `sqlalchemy` library to create the engine needed to connect to the database. The engine only needs to be created once per connection string and can then be reused for every query. Example connection strings:

* PostgreSQL: `postgresql://scott:tiger@localhost/mydatabase`
* MySQL: `mysql://scott:tiger@localhost/foo`
* Oracle: `oracle://scott:tiger@127.0.0.1:1521/sidname`
* SQL Server: `mssql+pyodbc://scott:tiger@mydsn`
* SQLite: `sqlite:///foo.db`
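
These URLs all follow the same general shape, `dialect+driver://username:password@host:port/database`, where the driver, port and credentials are optional. A minimal sketch that takes some of the example URLs apart with the standard-library `urllib.parse.urlsplit`, purely to show which part is which (SQLAlchemy does its own parsing internally):

```python
from urllib.parse import urlsplit

urls = [
    "postgresql://scott:tiger@localhost/mydatabase",
    "oracle://scott:tiger@127.0.0.1:1521/sidname",
    "mssql+pyodbc://scott:tiger@mydsn",
    "sqlite:///foo.db",
]

parsed = {}
for url in urls:
    parts = urlsplit(url)
    # The scheme holds the dialect and, optionally, "+driver"
    dialect, _, driver = parts.scheme.partition("+")
    parsed[url] = {
        "dialect": dialect,
        "driver": driver or None,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }

for url, fields in parsed.items():
    print(url, fields)
```

Note that the SQLite URL has no host at all, which is why it contains three slashes in a row.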

In [None]:
# Create database engine
db = sqlalchemy.create_engine('mssql+pyodbc://scott:tiger@mydsn')

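# Query data into a DataFrame (SQL Server uses TOP rather than LIMIT;
# [table] is bracketed because TABLE is a reserved word in T-SQL)
pd.read_sql('select top 1 * from [table]', db)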

Alternatively, the `ipython-sql` notebook extension can do the same job. It lets you run SQL directly in notebook cells. To install it, paste the following into any Jupyter cell:

`pip install ipython-sql`

To connect to the database, pass a connection string to the `%sql` magic.

In [None]:
%load_ext sql

%sql mssql+pyodbc://scott:tiger@mydsn

In [None]:
# Single Line Statements (SQL Server uses TOP rather than LIMIT)
data = %sql select top 1 * from [table] where name = 'Jay'
data.DataFrame()

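Multiple-line statements use the `%%sql` cell magic, which must be written without a space and must be the first line of the cell:

In [None]:
%%sql data <<
select top 1 * from [table]
where name = 'Jay'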

In [None]:
data.DataFrame()