In [None]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

# Lecture 14b (optional) - Interfacing with Oracle DBMS through Python
There is no recording associated with this notebook.

---

### Content

1. Connecting to Oracle DB
2. Creating Tables in a Database
3. Inserting Values into Tables
4. Querying Tables


### Learning Outcomes

At the end of this lecture, you should be able to:

* connect to Oracle DB using Python scripts  
* create tables in a selected database
* construct insert statements with data from a dataframe
* execute inserts into tables
* construct and execute select statements using Python scripts

The following instructions explain how Windows users can set up the interface between Oracle and Python.

Users of other operating systems will need to look up the equivalent instructions. They should differ only in how the required environment variables are set.

1. Go to http://www.oracle.com/technetwork/database/features/instant-client/index-097480.html and create a free account with Oracle.
2. Download the instantclient-odbc, instantclient-sdk and instantclient-basic zipped files.
3. Extract all of the above files into a new folder of your choosing. They must all be extracted into the same folder.
4. Copy the full path of that folder and add it to the system environment variable 'path'.
5. Create a new system variable called 'ORACLE_HOME' and set it to the full path of the same folder.
6. Shut down this notebook as well as the Python process.
7. Restart the notebook and execute !pip install --upgrade setuptools
8. Then execute !pip install cx_Oracle (and cross your fingers that the install works).
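
Steps 4 and 5 can be checked from Python before attempting the install. The following is only a minimal sketch and not part of Oracle's own instructions: the function name `check_instant_client_env` and the example folder path are hypothetical, and the check looks at nothing beyond the two variables mentioned above.

```python
import os


def check_instant_client_env(folder, env=None):
    """Return a list of problems found; an empty list means both variables look fine."""
    env = os.environ if env is None else env
    target = os.path.normcase(os.path.normpath(folder))
    problems = []

    # Step 5: ORACLE_HOME should point at the extraction folder
    oracle_home = env.get("ORACLE_HOME")
    if oracle_home is None:
        problems.append("ORACLE_HOME is not set")
    elif os.path.normcase(os.path.normpath(oracle_home)) != target:
        problems.append("ORACLE_HOME points somewhere else")

    # Step 4: the same folder should be one of the entries on the path
    path_entries = [
        os.path.normcase(os.path.normpath(entry))
        for entry in env.get("PATH", "").split(os.pathsep)
        if entry
    ]
    if target not in path_entries:
        problems.append("folder is not on PATH")

    return problems


# Hypothetical folder; replace it with the folder chosen in step 3
print(check_instant_client_env(r"C:\oracle\instantclient"))
```

An empty list suggests the variables are visible to the Python process. If the list is not empty, remember that a notebook started before the variables were changed will still see the old values (hence step 6).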


In [None]:
!pip install --upgrade setuptools

In [None]:
!pip install cx_Oracle

In [None]:
import cx_Oracle

In [None]:
from sqlalchemy import create_engine, MetaData, Table

In [None]:
connection = cx_Oracle.connect('IT739002/IT739002@vm011513.massey.ac.nz:1521/orcl.massey.ac.nz')
cursor = connection.cursor()

In [None]:
connection.username

In [None]:
# execute this to drop the national_populations table if it already exists
#cursor.execute("DROP TABLE national_populations")
#connection.commit()

We will use the population example from the previous lectures to demonstrate how to create a table and insert data into it.

In [None]:
import datetime as dt
import pandas as pd
import numpy as np
import sys

data = pd.DataFrame({'population':[3778000, 19138000, 20000, 447000, 4433000, 22680000, 10900, 549598],
                     'year':[2000, 2000, 2000, 2000, 2014, 2014, 2014, 2014],
                     'nation':['New Zealand', 'Australia', 'Cook Islands', 'Solomon Islands', 
                                'New Zealand', 'Australia', 'Cook Islands', 'Solomon Islands']})
data

We can now create a DB table to store this data.



In [None]:
national_populations = """
    CREATE TABLE national_populations (
      entry number(10) PRIMARY KEY,
      nation varchar2(20) NOT NULL,
      population number(10) NOT NULL,
      year date NOT NULL
    )
    """

national_populations

In [None]:
cursor.execute(national_populations)
connection.commit()

We can now begin inserting data from a data frame into the table.

In [None]:
data

Of course, we could perform the row insertions manually, one by one, by writing out each SQL statement as a string with all the values embedded.

In [None]:
sql_statement = """
            INSERT INTO national_populations 
            (entry, nation, population, year) 
            VALUES (0, 'New Zealand', 3778000, TO_DATE('2000-01-01', 'YYYY-MM-DD'))
            """

We then execute the SQL statement below by passing it to the *execute()* method as an argument, followed by a call to commit.

In [None]:
cursor.execute(sql_statement)
connection.commit()

In [None]:
pd.read_sql_query("SELECT * FROM national_populations", connection)

**Exercise:** Write code to insert the second row of the above data frame into the database

**Exercise:** Consider the potential issues with the above approach to inserting data into a database if you are faced with millions of records.

Clearly, this approach to inserting data does not scale to larger, real-world problems.

What is needed is a more automated approach.

Below is an example of how we can create a dictionary in which each column name is matched with its corresponding value.

We then call the *execute()* method on the cursor, passing the SQL statement and this dictionary as arguments:

In [None]:
fields = ['entry', 'nation', 'population', 'year']
values = [1, 'Cook Islands', 20000, dt.date(2000, 1, 1)]
field_value_pair = dict(zip(fields, values))
field_value_pair

In [None]:
cursor.execute("INSERT INTO national_populations VALUES(:entry, :nation, :population, :year)", field_value_pair)
connection.commit()
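
The named placeholders used above (:entry, :nation and so on) are bind variables: the values travel separately from the SQL text, so they do not have to be quoted or escaped by hand. The sketch below illustrates the difference with a value that contains a single quote. As an assumption made purely for illustration, it uses Python's built-in sqlite3 module as a stand-in for cx_Oracle, since sqlite3 accepts the same :name placeholder style and needs no server. The table is cut down to three columns and the row is made up.

```python
import sqlite3  # stand-in for cx_Oracle (assumption, so the sketch runs without Oracle)

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute(
    "CREATE TABLE national_populations ("
    "entry INTEGER PRIMARY KEY, nation TEXT NOT NULL, population INTEGER NOT NULL)"
)

# Made-up row whose text contains a single quote
row = {"entry": 0, "nation": "Test'ville", "population": 1}

# Embedding the value in the SQL string breaks the statement
naive_sql = f"INSERT INTO national_populations VALUES (1, '{row['nation']}', 1)"
try:
    cursor.execute(naive_sql)
    naive_failed = False
except sqlite3.OperationalError:
    naive_failed = True

# Passing the same value as a bind variable works unchanged
cursor.execute(
    "INSERT INTO national_populations VALUES (:entry, :nation, :population)", row
)
connection.commit()

cursor.execute(
    "SELECT nation FROM national_populations WHERE entry = :entry", {"entry": 0}
)
stored_nation = cursor.fetchone()[0]
print(naive_failed, stored_nation)

cursor.close()
connection.close()
```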

In [None]:
pd.read_sql_query("select * from national_populations", connection)

Perform a bulk insert of the above data frame into the Oracle database with *executemany()*, inserting the rows from index 3 onwards:

In [None]:
dict_sequence = [{'entry': int(i), 'nation': data.nation.iloc[i],
                  'population': int(data.population.iloc[i]), 'year': dt.date(data.year.iloc[i], 1, 1)}
                 for i in range(3, len(data.index))]
cursor.executemany("INSERT INTO national_populations VALUES(:entry, :nation, :population, :year)", dict_sequence)
connection.commit()
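
The list comprehension above can also be wrapped in a small helper, which makes the conversion reusable for other tables with the same columns. This is only a sketch: the name `build_bind_rows` is hypothetical, and it takes plain (nation, population, year) tuples rather than a data frame so that it runs without pandas. The values are the ones from the data frame above.

```python
import datetime as dt


def build_bind_rows(records, start=0):
    """Turn (nation, population, year) tuples into bind-variable dictionaries.

    `start` is the position of the first record to include. The position is
    also used as the entry number, mirroring the list comprehension above.
    """
    return [
        {
            "entry": i,
            "nation": str(nation),
            "population": int(population),
            "year": dt.date(int(year), 1, 1),  # the year column is a DATE
        }
        for i, (nation, population, year) in enumerate(records)
        if i >= start
    ]


records = [
    ("New Zealand", 3778000, 2000),
    ("Australia", 19138000, 2000),
    ("Cook Islands", 20000, 2000),
    ("Solomon Islands", 447000, 2000),
    ("New Zealand", 4433000, 2014),
    ("Australia", 22680000, 2014),
    ("Cook Islands", 10900, 2014),
    ("Solomon Islands", 549598, 2014),
]

rows = build_bind_rows(records, start=3)
print(len(rows), rows[0])
```

The resulting list has the same shape as dict_sequence, so it could be passed to *executemany()* in the same way.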

Check that all the data has been written to the table:

In [None]:
pd.read_sql_query("SELECT * from national_populations", connection)

Once you have finished using a database, release its resources by closing both the cursor and the connection.

In [None]:
cursor.close()
connection.close()
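
If a statement raises an error before the cell above is reached, the cursor and connection stay open. One way to guard against this is `contextlib.closing` from the standard library, which calls close() on exit from a `with` block even when an exception occurs. The sketch below again uses sqlite3 as a stand-in for cx_Oracle (an assumption for illustration, with a made-up demo table). The same pattern should apply to any object that has a close() method.

```python
import sqlite3  # stand-in for cx_Oracle (assumption, for illustration only)
from contextlib import closing

connection = sqlite3.connect(":memory:")

# closing() calls .close() on exit, in reverse order: cursor first, then connection
with closing(connection), closing(connection.cursor()) as cursor:
    cursor.execute("CREATE TABLE demo (entry INTEGER PRIMARY KEY)")
    cursor.executemany(
        "INSERT INTO demo VALUES (:entry)", [{"entry": 0}, {"entry": 1}]
    )
    connection.commit()
    cursor.execute("SELECT COUNT(*) FROM demo")
    row_count = cursor.fetchone()[0]

print(row_count)
```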

**Exercise**: Create a database and a table schema to store the data from the adult_mortality_rate_by_cause.csv, adult_mortality_rates.csv, child_mortality_rates.csv and total_health_expenditure_peercent_per_capita_of_gdp_by_country_per_year.csv datasets cleaned in the previous tutorials.

Write a script to insert the data from a data frame into the database.