## Develop Function to Load Data into Table

Let us develop a generic function to load data into any table.
* The primitive solution contains redundant code for `stations` and `station_rental_types`.
* Except for the `INSERT` query, the rest of the logic is the same.
* Here is the logic for `stations`.

```python
cursor = connection.cursor()
query = ("""
         INSERT INTO stations 
         (station_id, station_type, name, short_name, 
          capacity, external_id, has_kiosk, legacy_id, 
          region_id, electric_bike_surcharge_waiver, eightd_station_services
         )
         VALUES 
         (%s, %s, %s, %s, 
          %s, %s, %s, %s, 
          %s, %s, %s
         )
    """)
cursor.executemany(query, stations)
connection.commit()
cursor.close()
```

* Here is the logic for `station_rental_types`.

```python
cursor = connection.cursor()
query = ("""
         INSERT INTO station_rental_types 
         (station_id,rental_type)
         VALUES 
         (%s, %s)
        """)
cursor.executemany(query, station_rental_types)
connection.commit()
cursor.close()
```

In [1]:
%run 00_setup_database_variables.ipynb

In [2]:
%load_ext sql

In [3]:
%env DATABASE_URL=postgresql://deepan:DB_PASSWORD@localhost:5432/sms_db

env: DATABASE_URL=postgresql://itversity_sms_user:***@m01.itversity.com:5433/itversity_sms_db


In [4]:
%%sql

TRUNCATE TABLE station_rental_types

Done.


[]

In [5]:
%%sql

COMMIT

 * postgresql://itversity_sms_user:***@m01.itversity.com:5433/itversity_sms_db
Done.


[]

In [6]:
import psycopg2
def get_pg_connection(host, port, database, user, password):
    connection = None
    try:
        connection = psycopg2.connect(
            host=host,
            port=port,
            database=database,
            user=user,
            password=password
        )
    except Exception as e:
        raise
    
    return connection

In [7]:
def load_df_to_table(conn, df):
    # Use the connection passed as an argument, not a global variable
    cursor = conn.cursor()
    data = [tuple(value) for value in df.values]
    # query is still hardcoded with table name, column names and number of columns
    query = """
        INSERT INTO 
        station_rental_types (station_id, rental_type)
        VALUES (%s, %s)
        """
    cursor.executemany(query, data)
    conn.commit()
    cursor.close()

In [8]:
# Creating dataframe to validate our function

import pandas as pd

df = pd.DataFrame([
    {'station_id': 1, 'rental_type': 'KEY'},
    {'station_id': 1, 'rental_type': 'CREDIT CARD'},
    {'station_id': 2, 'rental_type': 'KEY'},
    {'station_id': 2, 'rental_type': 'CREDIT CARD'}
])

In [9]:
connection = get_pg_connection(
    host=postgres_host,
    port=postgres_port,
    database=f'{username}_sms_db',
    user=f'{username}_sms_user',
    password=password
)

In [10]:
connection.commit()

In [11]:
load_df_to_table(connection, df)

In [12]:
%%sql

SELECT * FROM station_rental_types

 * postgresql://itversity_sms_user:***@m01.itversity.com:5433/itversity_sms_db
4 rows affected.


station_rental_type_id,station_id,rental_type
25,1,KEY
26,1,CREDIT CARD
27,2,KEY
28,2,CREDIT CARD


Let us improve `load_df_to_table`.
* The `query` is hard coded with details related to `station_rental_types`.
* We need to pass `table_name` as an argument.
* Based on the structure of the `df` passed, we need to get the column names dynamically.
* We also need to build the `VALUES` clause dynamically, with one `%s` placeholder for each column in the `df` we are loading into the table.

In [13]:
df = pd.DataFrame([
    {'station_id': 1, 'rental_type': 'KEY'},
    {'station_id': 1, 'rental_type': 'CREDIT CARD'},
    {'station_id': 2, 'rental_type': 'KEY'},
    {'station_id': 2, 'rental_type': 'CREDIT CARD'}
])

In [14]:
df.columns

Index(['station_id', 'rental_type'], dtype='object')

In [15]:
# Get column names
columns = ', '.join(df.columns)

In [16]:
columns

'station_id, rental_type'

In [17]:
['a']

['a']

In [18]:
['a'] * 10

['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a']

In [19]:
len(df.columns)

2

In [20]:
# Create a list with one %s per column
['%s'] * len(df.columns)

['%s', '%s']

In [21]:
# Dynamically generate values clause
values_clause = ', '.join(['%s'] * len(df.columns))

In [22]:
values_clause

'%s, %s'
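
The two fragments built above can be combined into one small helper. The following is a minimal sketch, not part of the original notebook: `build_insert_query` and the identifier check are hypothetical names introduced here for illustration. Because the table and column names are placed into the query text with an f-string, the sketch assumes it is reasonable to reject any name that is not a plain SQL identifier before building the statement.

```python
import re

# Hypothetical rule: allow only letters, digits and underscores in identifiers
IDENTIFIER_PATTERN = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def build_insert_query(table_name, columns):
    """Build a parameterized INSERT statement for the given table and columns."""
    for name in [table_name, *columns]:
        if not IDENTIFIER_PATTERN.fullmatch(name):
            raise ValueError(f'Unsafe identifier: {name!r}')
    column_list = ', '.join(columns)
    # One %s placeholder per column, as in the cells above
    values_clause = ', '.join(['%s'] * len(columns))
    return f'INSERT INTO {table_name} ({column_list}) VALUES ({values_clause})'


query = build_insert_query('station_rental_types', ['station_id', 'rental_type'])
print(query)
```

With a data frame, `list(df.columns)` can be passed as `columns`. The row values themselves still travel separately through the `%s` placeholders, so only the identifiers need this kind of check.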

In [23]:
def load_df_to_table(conn, df, table_name):
    # Use the connection passed as an argument, not a global variable
    cursor = conn.cursor()
    data = [tuple(value) for value in df.values]
    columns = ', '.join(df.columns)
    values_clause = ', '.join(['%s'] * len(df.columns))
    query = f"""
        INSERT INTO 
        {table_name} ({columns})
        VALUES ({values_clause})
        """
    cursor.executemany(query, data)
    conn.commit()
    cursor.close()

In [24]:
%%sql

TRUNCATE TABLE station_rental_types

 * postgresql://itversity_sms_user:***@m01.itversity.com:5433/itversity_sms_db
Done.


[]

In [25]:
%%sql

SELECT * FROM station_rental_types

 * postgresql://itversity_sms_user:***@m01.itversity.com:5433/itversity_sms_db
0 rows affected.


station_rental_type_id,station_id,rental_type


In [26]:
# We do not need the original REST payload to validate the function.
# The function accepts a connection, a data frame and a table name.
# We can validate it using a connection, a simple data frame
# and any valid table whose structure is consistent with the data frame.
df = pd.DataFrame([
    {'station_id': 1, 'rental_type': 'KEY'},
    {'station_id': 1, 'rental_type': 'CREDIT CARD'},
    {'station_id': 2, 'rental_type': 'KEY'},
    {'station_id': 2, 'rental_type': 'CREDIT CARD'}
])

connection = get_pg_connection(
    host=postgres_host,
    port=postgres_port,
    database=f'{username}_sms_db',
    user=f'{username}_sms_user',
    password=password
)

load_df_to_table(connection, df, 'station_rental_types')

In [27]:
%%sql

SELECT * FROM station_rental_types

 * postgresql://itversity_sms_user:***@m01.itversity.com:5433/itversity_sms_db
4 rows affected.


station_rental_type_id,station_id,rental_type
29,1,KEY
30,1,CREDIT CARD
31,2,KEY
32,2,CREDIT CARD


Here is the core logic to load a data frame into a table.

```python
def load_df_to_table(conn, df, table_name):
    # Use the connection passed as an argument, not a global variable
    cursor = conn.cursor()
    data = [tuple(value) for value in df.values]
    columns = ', '.join(df.columns)
    values_clause = ', '.join(['%s'] * len(df.columns))
    query = f"""
        INSERT INTO 
        {table_name} ({columns})
        VALUES ({values_clause})
        """
    cursor.executemany(query, data)
    conn.commit()
    cursor.close()
```

Here are some of the issues with the above logic.
* We are not logging the progress of the process.
* The code does not handle any exceptions.
* Data is inserted with a single `executemany` call, all at once. For a very large data set, this might not be the most effective solution.
* If you are not sure why, please review our content on database programming for batch operations as part of **18_database_programming_batch_operations**.
* As we have only one function for all the tables, it will be easier to close these gaps in this function than in the primitive solution.
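
One way to close these gaps is sketched below. It is an illustration under stated assumptions, not the notebook's final solution: `load_rows_in_batches`, `batch_size` and `placeholder` are hypothetical names, and the demonstration uses the standard library's `sqlite3` with an in-memory database only so that it runs without a PostgreSQL server. With `psycopg2`, the default `placeholder='%s'` applies.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def load_rows_in_batches(conn, rows, table_name, columns,
                         batch_size=1000, placeholder='%s'):
    """Insert rows in batches, committing each batch and logging progress.

    Hypothetical helper: placeholder is '%s' for psycopg2 and '?' for sqlite3.
    """
    column_list = ', '.join(columns)
    values_clause = ', '.join([placeholder] * len(columns))
    query = f'INSERT INTO {table_name} ({column_list}) VALUES ({values_clause})'
    cursor = conn.cursor()
    loaded = 0
    try:
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            cursor.executemany(query, batch)
            conn.commit()
            loaded += len(batch)
            logger.info('Loaded %d of %d rows into %s',
                        loaded, len(rows), table_name)
    except Exception:
        # Undo the failed batch, record the error, then let the caller see it
        conn.rollback()
        logger.exception('Failed to load data into %s', table_name)
        raise
    finally:
        cursor.close()
    return loaded


# Demonstration against an in-memory SQLite database (an assumption of this sketch)
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute(
    'CREATE TABLE station_rental_types (station_id INTEGER, rental_type TEXT)'
)
demo_rows = [(1, 'KEY'), (1, 'CREDIT CARD'), (2, 'KEY'), (2, 'CREDIT CARD')]
loaded_count = load_rows_in_batches(
    demo_conn, demo_rows, 'station_rental_types',
    ['station_id', 'rental_type'], batch_size=3, placeholder='?'
)
```

With a data frame, `rows` can be built as `[tuple(value) for value in df.values]` and `columns` as `list(df.columns)`. Committing per batch means that batches loaded before a failure stay in the table; if an all-or-nothing load is required, the commit can be moved after the loop instead. For PostgreSQL specifically, `psycopg2.extras` also provides `execute_batch` and `execute_values`, which are worth considering for large loads.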