### Import CSV files into a PostgreSQL database - manual

### Contents<a class="anchor" id="contents"></a>

- [1. Create pandas DataFrame from csv file](#section1)
- [2. Clean table names](#section2)
- [3. Clean header names](#section3)
- [4. SQL statement to create a database table](#section4)
    - [4-1. Replace pandas DataFrame datatypes with SQL database datatypes](#section4.1)
- [5. Establish a connection to database](#section5)
- [6. Write SQL statement(s) to database](#section6)
    - [6-1. Create a table](#section6.1)
    - [6-2. Insert values into table](#section6.2)
    - [6-3. Upload to database](#section6.3)
    - [6-4. Table permissions](#section6.4)
    - [6-5. Close connection](#section6.5)

In [1]:
# Import libraries
import os
import numpy as np
import pandas as pd
import psycopg2  # PostgreSQL adapter for Python

In [2]:
# Shell command - lists the contents of the current working directory
!ls

Customer Contracts$.csv  Customer Engagements.csv manualUpload.ipynb
Customer Demo.csv        customer_contracts.csv


### 1. Create pandas DataFrame from csv file<a class="anchor" id="section1"></a>

In [3]:
df=pd.read_csv('Customer Contracts$.csv')
display(df.head())

Unnamed: 0,customer_name,start_date,end_date,contract_amount_m,invoice_sent,paid
0,Nike,01-02-2019,12-20-2020,2.98,Yes,Yes
1,Reebox,06-20-2017,,3.9,No,No
2,Adidas,12-07-2015,6-20-2018,4.82,Yes,Yes
3,Google,05-25-2014,03-20-2017,5.74,Yes,No
4,Amazon,11-10-2012,12-20-2015,6.66,No,Yes


### 2. Clean table names<a class="anchor" id="section2"></a>

In [4]:
rawTableName='Customer Contracts$.csv'

In [5]:
# Clean table names
cleanTableName = rawTableName.lower().replace('%', '').replace('$', '') \
                 .replace('£', '').replace('€', '').replace(')', '') \
                 .replace(r'(', '').replace('?', '').replace('!', '') \
                 .replace(' ', '_').replace('-', '_').replace(r'/', '_') \
                 .replace('\\', '_')

print(cleanTableName)

customer_contracts.csv
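
The chained `.replace()` calls above only remove the characters that are listed, and the result still carries the `.csv` extension. A minimal sketch of a stricter variant follows, assuming the goal is a lowercase name made only of letters, digits, and underscores. `clean_identifier` and its `strip_extension` flag are hypothetical names, not part of the original notebook.

```python
import os
import re


def clean_identifier(raw_name, strip_extension=True):
    """Return a lowercase name containing only a-z, 0-9 and underscores.

    Hypothetical helper, not part of the original notebook.
    """
    name = os.path.splitext(raw_name)[0] if strip_extension else raw_name
    name = name.strip().lower()
    # Whitespace, hyphens and slashes become underscores
    name = re.sub(r"[\s\-/\\]+", "_", name)
    # Anything else that is not a letter, digit or underscore is dropped
    name = re.sub(r"[^a-z0-9_]", "", name)
    # Collapse repeated underscores and trim them from both ends
    return re.sub(r"_+", "_", name).strip("_")


print(clean_identifier("Customer Contracts$.csv"))
```

The same helper could be applied to column headers with `strip_extension=False`, so that a dot inside a header is not mistaken for a file extension.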


### 3. Clean header names<a class="anchor" id="section3"></a>

In [6]:
print(df.columns)

Index(['customer_name', 'start_date', 'end_date', 'contract_amount_m',
       'invoice_sent', 'paid'],
      dtype='object')


In [7]:
# Clean column names
df.columns = [x.lower().replace('%', '').replace('$', '') \
              .replace('£', '').replace('€', '').replace(')', '') \
              .replace(r'(', '').replace('?', '').replace('!', '') \
              .replace(' ', '_').replace('-', '_').replace(r'/', '_') \
              .replace('\\', '_') for x in df.columns]

### 4. SQL statement to create a database table<a class="anchor" id="section4"></a>

#### 4-1. Replace pandas DataFrame datatypes with SQL database datatypes<a class="anchor" id="section4.1"></a>

In [8]:
# pandas DataFrame column dtypes
print(df.dtypes)

customer_name         object
start_date            object
end_date              object
contract_amount_m    float64
invoice_sent          object
paid                  object
dtype: object


In [9]:
# Dictionary mapping pandas dtypes to SQL dtypes
# Keys must match the dtype names exactly, e.g. 'datetime64[ns]' rather than 'datetime64'
dataTypeReplacements = {'object': 'varchar', 'float64': 'float',
                        'int64': 'int', 'datetime64[ns]': 'timestamp',
                        'timedelta64[ns]': 'varchar'}

display(dataTypeReplacements)

{'object': 'varchar',
 'float64': 'float',
 'int64': 'int',
 'datetime64[ns]': 'timestamp',
 'timedelta64[ns]': 'varchar'}

In [10]:
# Table schema
colString = ', '.join('{} {}'.format(n, d) for (n, d) \
                      in zip(df.columns, df.dtypes.replace(dataTypeReplacements)))

display(colString)

'customer_name varchar, start_date varchar, end_date varchar, contract_amount_m float, invoice_sent varchar, paid varchar'
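
The schema string above can be wrapped into a full `CREATE TABLE` statement instead of being typed out by hand. The sketch below shows one way to do that from plain `(column_name, dtype_name)` pairs. `build_create_table` and `PG_TYPES` are hypothetical names, and falling back to `varchar` for unknown dtypes is an assumption of this sketch.

```python
# Assumed mapping from pandas dtype names to PostgreSQL types (illustrative)
PG_TYPES = {
    "object": "varchar",
    "float64": "float",
    "int64": "int",
    "datetime64[ns]": "timestamp",
}


def build_create_table(table_name, columns):
    """Build a CREATE TABLE statement from (column_name, dtype_name) pairs.

    Hypothetical helper; unknown dtypes fall back to varchar.
    """
    col_defs = ", ".join(
        "{} {}".format(name, PG_TYPES.get(dtype, "varchar"))
        for name, dtype in columns
    )
    return "CREATE TABLE {} ({});".format(table_name, col_defs)


columns = [
    ("customer_name", "object"),
    ("contract_amount_m", "float64"),
]
print(build_create_table("customer_contracts", columns))
```

With a DataFrame, the pairs could be produced with `[(c, str(t)) for c, t in df.dtypes.items()]`. The names are interpolated directly into the SQL text, so this is only suitable for table and column names that have already been cleaned.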

### 5. Establish a connection to database<a class="anchor" id="section5"></a>

- `AWS RDS PostgreSQL` database

In [11]:
# Database configuration
host = '####.cfa0pnoy####.eu-west-1.rds.amazonaws.com'
databaseName = 'database_github'
user = 'postgres'
password = '####'
    
# Establish connection
# (keyword arguments avoid quoting problems that a hand-built connection string
#  can have, e.g. a password containing spaces)
rdsConnection = psycopg2.connect(host=host, dbname=databaseName,
                                 user=user, password=password)

# Create a cursor for executing SQL statements
cursor = rdsConnection.cursor()
print('Connected to database')

Connected to database
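
Hard-coding credentials in a notebook makes them easy to leak. A minimal sketch of one alternative is shown below: reading the settings from environment variables. `connection_params` is a hypothetical helper. The `PG*` variable names follow the libpq convention, and using them as the credential source is an assumption of this sketch.

```python
import os


def connection_params(env=os.environ):
    """Collect connection settings from environment variables.

    Hypothetical helper. Raises KeyError if a variable is missing.
    """
    return {
        "host": env["PGHOST"],
        "dbname": env["PGDATABASE"],
        "user": env["PGUSER"],
        "password": env["PGPASSWORD"],
    }


# Usage (requires psycopg2 and the four variables to be set):
# import psycopg2
# rdsConnection = psycopg2.connect(**connection_params())
```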


### 6. Write SQL statement(s) to database<a class="anchor" id="section6"></a>

In [12]:
# Drop the table if it already exists
cursor.execute("DROP TABLE IF EXISTS customer_contracts")

#### 6-1. Create a table<a class="anchor" id="section6.1"></a>

In [13]:
# Create the table
cursor.execute("""
    CREATE TABLE customer_contracts
    (
        customer_name         varchar,
        start_date            varchar,
        end_date              varchar,
        contract_amount_m     float,
        invoice_sent          varchar,
        paid                  varchar
    );
""")

#### 6-2. Insert values into table<a class="anchor" id="section6.2"></a>

In [14]:
# step 1. Save pandas DataFrame to a csv file
df.to_csv('customer_contracts.csv', header=df.columns, index=False, encoding='utf-8')


# step 2. Open the csv file for reading (myFile is a file object, not the file contents)
myFile = open('customer_contracts.csv', encoding='utf-8')
print('File opened in memory')

File opened in memory


#### 6-3. Upload to database<a class="anchor" id="section6.3"></a>

In [15]:
# step 3. Upload the csv file to the table created in the database
# - the open file is streamed to the server via STDIN; it has a header row and comma delimiters
# - all rows are copied into the table customer_contracts
sqlStatement = """
               COPY customer_contracts FROM STDIN WITH
                    CSV
                    HEADER
                    DELIMITER AS ','
               """

# Execute the SQL statement
# copy_expert streams the file contents into the table
cursor.copy_expert(sql=sqlStatement, file=myFile)
myFile.close()
print('File copied to database')

File copied to database
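
`copy_expert` reads from any file-like object, so the intermediate file on disk is optional. The sketch below builds an in-memory CSV buffer with the standard library. `rows_to_csv_buffer` is a hypothetical helper and the sample rows are taken from the table shown earlier.

```python
import csv
import io


def rows_to_csv_buffer(header, rows):
    """Write a header and rows into an in-memory CSV buffer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    buffer.seek(0)  # rewind so a reader starts at the first line
    return buffer


buffer = rows_to_csv_buffer(
    ["customer_name", "contract_amount_m"],
    [["Nike", 2.98], ["Reebox", 3.9]],
)
print(buffer.getvalue())

# With an open psycopg2 cursor, the buffer could replace the file object:
# cursor.copy_expert(sql=sqlStatement, file=buffer)
```

For a DataFrame, `df.to_csv(buffer, index=False)` followed by `buffer.seek(0)` would fill the same kind of buffer.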


#### 6-4. Table permissions<a class="anchor" id="section6.4"></a>

In [16]:
# Grant read access on the table to all roles (PUBLIC), then commit the transaction
cursor.execute('GRANT SELECT ON table customer_contracts TO public')
rdsConnection.commit()

#### 6-5. Close connection<a class="anchor" id="section6.5"></a>

In [17]:
# Close the cursor and the connection to the database
cursor.close()
rdsConnection.close()
print('Table customer_contracts successfully imported to database')

Table customer_contracts successfully imported to database
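
If any step fails partway through, the cursor and connection above stay open. A minimal sketch of a pattern that always releases them is shown below, using `contextlib.closing`. `run_statements` is a hypothetical helper, and `connect` is assumed to be any zero-argument callable that returns a DB-API connection, for example `lambda: psycopg2.connect(host=host, dbname=databaseName, user=user, password=password)`.

```python
from contextlib import closing


def run_statements(connect, statements):
    """Run SQL statements, commit, and always close cursor and connection.

    Hypothetical helper. If a statement raises, nothing is committed and
    both the cursor and the connection are still closed.
    """
    with closing(connect()) as connection:
        with closing(connection.cursor()) as cursor:
            for statement in statements:
                cursor.execute(statement)
        connection.commit()
```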
