<a href="https://colab.research.google.com/github/john-decker/Dutch_Colonial_Research/blob/main/Initial_DB_Creation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
#Begin process for creating a simple SQLite DB in Python by importing needed modules

import csv

import sqlite3


In [2]:
#create a connection
conn = sqlite3.connect("Dutch_Colonial_DB") #NOTE: if the DB doesn't already exist, this command will create it.


##Creating Tables in the Database
Once a database exists, we need to build the tables that represent the entities in our data model. Each table must match the structure we worked out when creating the .csv files.

NOTE: for this particular database, we are considering all of the data to be text. If we had integer or floating point data, we would use the appropriate data types for them.

Consult the documentation for more on [SQLite in Python](https://docs.python.org/3.9/library/sqlite3.html).
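
As a minimal sketch of the note above about data types, the example below declares a table with integer and floating point columns. The `Payment` table, its columns, and the inserted row are hypothetical and not part of this data model. The sketch uses an in-memory database so that it does not touch `Dutch_Colonial_DB`.

```python
import sqlite3

# In-memory database so this sketch leaves Dutch_Colonial_DB untouched
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()

# Hypothetical table, for illustration only
demo_cursor.execute('''CREATE TABLE Payment (
               paymentID integer,
               payee text,
               amount real
               )''')

# Made-up row; Python int and float values are stored as INTEGER and REAL
demo_cursor.execute('INSERT INTO Payment VALUES (?,?,?)', (1, 'Example', 12.5))

# typeof() reports the storage class SQLite used for each value
demo_cursor.execute('''SELECT paymentID, payee, amount,
                              typeof(paymentID), typeof(amount)
                       FROM Payment''')
typed_row = demo_cursor.fetchone()
print(typed_row)

demo_conn.close()
```

Declaring numeric columns this way would allow sorting and arithmetic on the values without converting them from text first.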

In [3]:
#create a cursor to access the DB
cursor = conn.cursor()

# create all of the tables needed for the data model in the DB

cursor.execute('''CREATE TABLE Person (
               personID text,
               lastName text,
               firstName text,
               gender text,
               statusID text,
               ethnicityID text
               )''')

cursor.execute('''CREATE TABLE Ethnicity (
               ethnicityID text,
               ethnicityDescription text
               )''')

cursor.execute('''CREATE TABLE Status (
               statusID text,
               statusDescription text
               )''')

cursor.execute('''CREATE TABLE Participation_Type (
               participationID text,
               roleDescription text
               )''')

cursor.execute('''CREATE TABLE Person_to_Session (
               sessionDetailsID text,
               personID text,
               participationID text
               )''')

cursor.execute('''CREATE TABLE Session_Record (
               sessionRecordID text,
               bookNumber text,
               pageNumber text,
               dataEntryClerk text,
               dataVerificationClerk text,
               dateEntered text,
               fullMinutesEteredBy text
               )''')

cursor.execute('''CREATE TABLE Session_Details (
               sessionDetailsID text,
               sessionRecordID text,
               sessionTypeID text,
               minutesText text
               )''')

cursor.execute('''CREATE TABLE Session_Action (
               actionID text,
               sessionDetailsID text,
               description text,
               amount text
               )''')

cursor.execute('''CREATE TABLE Session_Type (
               sessionTypeID text,
               description text
               )''')

print("Tables Created")

Tables Created


##Importing Data Into the DB
Once a simple database has been created and the entities for the model are present, it is necessary to populate the tables with data. Here, we are using the .csv files that we created to represent the various parts of our data model.

We must be certain to import the csv files into Colab's data manager so that the files are available for the next steps.

NOTE: the tables do not all have the same number of columns, so each INSERT statement needs a matching number of `?` placeholders. This means that we will have to write specific code for each table to ensure that the population process proceeds correctly. While we could group the tables into lists of entities with the same number of columns (e.g. ethnicity and status), the code block below will treat each individually so that the mechanics are as clear as possible.
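
For comparison, the grouping alternative mentioned in the note could look roughly like the sketch below. The helper `load_csv_into_table` is a hypothetical name, not something defined elsewhere in this notebook, and the sample rows are placeholders written to a temporary folder so that the sketch runs on its own. It builds the `?` placeholders from the width of each file's header row instead of hard-coding them.

```python
import csv
import os
import sqlite3
import tempfile


def load_csv_into_table(cursor, table_name, csv_path):
    """Hypothetical helper: insert every data row of csv_path into table_name."""
    with open(csv_path, 'r', newline='') as csv_obj:
        reader = csv.reader(csv_obj)
        header = next(reader, None)  # skip the header row
        if header is None:
            return 0
        # One "?" per column, e.g. "?,?" for a two-column file
        placeholders = ','.join('?' * len(header))
        rows = list(reader)
        # table_name is interpolated directly, so only pass trusted names
        cursor.executemany(
            f'INSERT INTO {table_name} VALUES ({placeholders})', rows)
        return len(rows)


# In-memory stand-in for the real database
demo_conn = sqlite3.connect(':memory:')
demo_cursor = demo_conn.cursor()
demo_cursor.execute('CREATE TABLE Ethnicity (ethnicityID text, ethnicityDescription text)')
demo_cursor.execute('CREATE TABLE Status (statusID text, statusDescription text)')

# Placeholder rows, not the project's data
sample_files = {
    'Ethnicity': [['ethnicityID', 'ethnicityDescription'],
                  ['1', 'placeholder A'], ['2', 'placeholder B']],
    'Status': [['statusID', 'statusDescription'],
               ['1', 'placeholder C']],
}

loaded_counts = {}
with tempfile.TemporaryDirectory() as tmp_dir:
    for table_name, rows in sample_files.items():
        csv_path = os.path.join(tmp_dir, table_name + '.csv')
        with open(csv_path, 'w', newline='') as csv_obj:
            csv.writer(csv_obj).writerows(rows)
        loaded_counts[table_name] = load_csv_into_table(
            demo_cursor, table_name, csv_path)

demo_conn.commit()
print(loaded_counts)
demo_conn.close()
```

The trade-off is that the loop hides the per-table INSERT statements, which is why the notebook spells each one out.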

In [4]:
#Start work for importing data
#provide paths to each csv
data_path_person = '/content/Person.csv'
data_path_ethnicity = '/content/Ethnicity.csv'
data_path_status = '/content/Status.csv'
data_path_participation = '/content/Participation_Type.csv'
data_path_person_session = '/content/Person_to_Session.csv'
data_path_session_record = '/content/Session_Record.csv'
data_path_session_details = '/content/Session_Details.csv'
data_path_session_action = '/content/Session_Action.csv'
data_path_session_type = '/content/Session_Type.csv'
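
Because the import step depends on the .csv files having been uploaded, a quick check along the following lines may help catch a missing file before the import runs. This is only a sketch: the list repeats the paths defined above, and `missing_paths` is a name introduced here for illustration.

```python
import os

# The same paths as defined in the cell above
expected_paths = [
    '/content/Person.csv',
    '/content/Ethnicity.csv',
    '/content/Status.csv',
    '/content/Participation_Type.csv',
    '/content/Person_to_Session.csv',
    '/content/Session_Record.csv',
    '/content/Session_Details.csv',
    '/content/Session_Action.csv',
    '/content/Session_Type.csv',
]

# Collect every path that does not exist in the current environment
missing_paths = [path for path in expected_paths if not os.path.exists(path)]

if missing_paths:
    print('Missing files:', missing_paths)
else:
    print('All .csv files found')
```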

In [5]:
#Open each .csv file, loop over the contents, and populate each corresponding table

#The .execute() method allows us to pass in an INSERT command, specify the number of variables, and provide the data

with open(data_path_person, 'r') as csv_obj:
    reader = csv.reader(csv_obj)
    next(reader, None) #"turns off" the .csv file's header so that the variable names are not imported
    for row in reader:
        cursor.execute('INSERT INTO Person VALUES (?,?,?,?,?,?)', row)

with open(data_path_ethnicity, 'r') as csv_obj:
    reader = csv.reader(csv_obj)
    next(reader, None)
    for row in reader:
        cursor.execute('INSERT INTO Ethnicity VALUES (?,?)', row)

with open(data_path_status, 'r') as csv_obj:
    reader = csv.reader(csv_obj)
    next(reader, None) #"turns off" the .csv file's header so that the variable names are not imported
    for row in reader:
        cursor.execute('INSERT INTO Status VALUES (?,?)', row)

with open(data_path_participation, 'r') as csv_obj:
    reader = csv.reader(csv_obj)
    next(reader, None) #"turns off" the .csv file's header so that the variable names are not imported
    for row in reader:
        cursor.execute('INSERT INTO Participation_Type VALUES (?,?)', row)

with open(data_path_person_session, 'r') as csv_obj:
    reader = csv.reader(csv_obj)
    next(reader, None) #"turns off" the .csv file's header so that the variable names are not imported
    for row in reader:
        cursor.execute('INSERT INTO Person_to_Session VALUES (?,?,?)', row)

with open(data_path_session_record, 'r') as csv_obj:
    reader = csv.reader(csv_obj)
    next(reader, None) #"turns off" the .csv file's header so that the variable names are not imported
    for row in reader:
        cursor.execute('INSERT INTO Session_Record VALUES (?,?,?,?,?,?,?)', row)

with open(data_path_session_details, 'r') as csv_obj:
    reader = csv.reader(csv_obj)
    next(reader, None) #"turns off" the .csv file's header so that the variable names are not imported
    for row in reader:
        cursor.execute('INSERT INTO Session_Details VALUES (?,?,?,?)', row)

with open(data_path_session_action, 'r') as csv_obj:
    reader = csv.reader(csv_obj)
    next(reader, None) #"turns off" the .csv file's header so that the variable names are not imported
    for row in reader:
        cursor.execute('INSERT INTO Session_Action VALUES (?,?,?,?)', row)

with open(data_path_session_type, 'r') as csv_obj:
    reader = csv.reader(csv_obj)
    next(reader, None) #"turns off" the .csv file's header so that the variable names are not imported
    for row in reader:
        cursor.execute('INSERT INTO Session_Type VALUES (?,?)', row)

print('Import process finished')

Import process finished
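
One way to confirm that every table received rows is to count them. The sketch below is an assumption about how such a check might be written, not part of the original workflow. It uses an in-memory stand-in with two of the tables and placeholder rows, and it reads the table names from SQLite's built-in `sqlite_master` catalogue.

```python
import sqlite3

# In-memory stand-in for Dutch_Colonial_DB with two of its tables
demo_conn = sqlite3.connect(':memory:')
demo_cursor = demo_conn.cursor()
demo_cursor.execute('CREATE TABLE Ethnicity (ethnicityID text, ethnicityDescription text)')
demo_cursor.execute('CREATE TABLE Status (statusID text, statusDescription text)')

# Placeholder rows; Ethnicity is deliberately left empty
demo_cursor.executemany('INSERT INTO Status VALUES (?,?)',
                        [('1', 'placeholder A'), ('2', 'placeholder B')])

# sqlite_master lists every table defined in the database
demo_cursor.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
table_names = [row[0] for row in demo_cursor.fetchall()]

row_counts = {}
for table_name in table_names:
    # Names come from the database itself, so interpolating them is safe here
    demo_cursor.execute(f'SELECT COUNT(*) FROM {table_name}')
    row_counts[table_name] = demo_cursor.fetchone()[0]

print(row_counts)
demo_conn.close()
```

A count of zero for any table would suggest that its .csv file was empty or that its import step did not run.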


##Test Query
To ensure that the DB has been properly created and populated, we will perform a simple query of the Person table.

In [6]:
cursor = conn.cursor()

cursor.execute("SELECT * FROM Person")
results = cursor.fetchall()

for result in results:
    print(result)

#commit the inserted rows so that they are saved to the DB file, then close the connection
conn.commit()
conn.close()

('1', 'Delavall', '', 'M', '2', '1')
('2', 'Ten Houdt', 'Severyn', 'M', '1', '1')
('3', 'DuBooys (Dubois)', 'Lowies (Louys)', 'M', '1', '1')
('4', 'Blansjan', 'Matthias', 'M', '1', '1')
('5', 'Fortune', 'Jan', 'M', '3', '1')
('6', 'Lavall', '', 'M', '1', '1')
('7', 'Ackerman', 'Lodowyck', 'M', '1', '1')
('8', 'DeGraef', 'Moses', 'M', '1', '1')
('9', 'Fisher', 'William', 'M', '1', '1')
('10', 'Haegen', 'Bruyn', 'M', '1', '1')
('11', 'Claesen DeWitt', 'Tierk', 'M', '1', '1')
('12', 'Osterhoudt', 'Jan', 'M', '1', '1')
('13', 'Tynhoudt', 'Cronelis', 'M', '1', '1')
('14', 'Lowersen', 'Jan', 'M', '1', '1')
('15', 'Cool', 'Pieter', 'M', '1', '1')
('16', 'Cool', 'Leendert', 'M', '1', '1')
('17', 'Unnamed', '', 'M', '3', '1')
('18', 'Jansen', 'Dirck', 'M', '1', '1')
('19', 'Grevenraedt', '', 'M', '1', '4')
('20', 'Martensen', 'Aerdt', 'M', '1', '1')
('21', 'Addesen', 'Anthony', 'M', '1', '1')
('22', 'Joshensen', 'Hendrick', 'M', '1', '1')
('23', 'Wittikar', 'Eduward', 'M', '1', '1')
('24', 'Wyn

##Complete DB Exists
Now that we have created the DB, built the tables needed for our data model, imported the data from our .csv files into the tables, and performed a test query, we can save the DB so that we can use it to perform queries. To do this, choose the ```...``` indicator when you hover over ```Dutch_Colonial_DB```.
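
As a rough sketch of the kind of query a saved DB supports, the example below writes a small database file, closes it, reopens it, and joins Person to Status. The temporary file `Demo_DB` stands in for `Dutch_Colonial_DB`. The Person row is taken from the test query output above, while the status description is a placeholder because the contents of the Status table are not shown in this notebook.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, 'Demo_DB')  # stand-in for Dutch_Colonial_DB

    # Create and populate a small file-based database
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute('''CREATE TABLE Person (
                   personID text, lastName text, firstName text,
                   gender text, statusID text, ethnicityID text
                   )''')
    cursor.execute('CREATE TABLE Status (statusID text, statusDescription text)')
    cursor.execute('INSERT INTO Person VALUES (?,?,?,?,?,?)',
                   ('5', 'Fortune', 'Jan', 'M', '3', '1'))
    # Placeholder description; the real Status values are not shown here
    cursor.execute('INSERT INTO Status VALUES (?,?)', ('3', 'placeholder status'))
    conn.commit()  # without a commit, the rows would not be saved to the file
    conn.close()

    # Reopen the saved file and join the two tables on statusID
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute('''SELECT Person.lastName, Person.firstName,
                             Status.statusDescription
                      FROM Person
                      JOIN Status ON Person.statusID = Status.statusID''')
    joined_rows = cursor.fetchall()
    conn.close()

print(joined_rows)
```

The same pattern should apply to the other ID columns in the data model, such as `ethnicityID`.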