# Data Importer
[![Static Badge](https://img.shields.io/badge/Jupyter_Notebook-F37726?style=for-the-badge)](https://jupyter.org/)

<br>

Performs **ad hoc** database operations that need to be done in bulk, such as importing a dataset, clearing tables, or running custom SQL.

<br>

## Requirements
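- Python 3.9 or later (pandas 2.x no longer supports older versions; the package versions listed under Installation were installed on Python 3.10)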

<br>
<br>

## Installation
Install the required packages with the pip commands below:

```shell
pip install pandas
pip install python-dotenv
pip install SQLAlchemy
pip install psycopg2-binary
```

In a notebook cell, `%pip install <package>` installs the package into the environment of the running kernel. After installing from inside the notebook, you may need to restart the kernel before the new packages can be imported.

The recorded install output lists pandas 2.3.0, python-dotenv 1.1.0, SQLAlchemy 2.0.41 and psycopg2-binary 2.9.10.


<br>
<br>

## Initialization

Run the code block below to initialize the importer and the tools it needs.

<br>

> ***❇️ Important*** <br>
>
> If you change anything in the source code under `src`, restart this notebook's kernel and rerun the cell below.
> 

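```python
import sys

# Make the modules in the local src/ folder importable
sys.path.insert(1, r"src")

import DataImporter as DI


########
# MAIN #
########
# Load the database credentials and target the production database
Secrets = DI.DBSecrets.load()
Database = DI.DBNames.Prod.value
importer = DI.Importer(Secrets, database = Database, useConnPool = False)
```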
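
The restart is needed because Python caches each imported module in `sys.modules`, so rerunning `import DataImporter as DI` after editing the source has no effect. The sketch below demonstrates this with a throwaway module (`demo_importer`, a hypothetical stand-in for the code in `src`) and shows `importlib.reload` picking up the edit. `reload` only re-executes the one module it is given, and objects created earlier, such as `importer`, keep the old code until they are rebuilt, so restarting the kernel remains the simpler choice when several files under `src` change.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip writing .pyc files so the rewritten source is always recompiled.
previous_setting = sys.dont_write_bytecode
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as src:
    # Throwaway module standing in for src/DataImporter (hypothetical name).
    module_path = Path(src, "demo_importer.py")
    module_path.write_text("VERSION = 'old'\n", encoding="utf-8")
    sys.path.insert(1, src)
    importlib.invalidate_caches()

    import demo_importer
    before = demo_importer.VERSION

    # Simulate editing the source after it has been imported.
    module_path.write_text("VERSION = 'edited'\n", encoding="utf-8")
    import demo_importer  # no effect: the module is already cached in sys.modules
    after_import = demo_importer.VERSION

    # reload re-executes the module's source inside the existing module object.
    demo_importer = importlib.reload(demo_importer)
    after_reload = demo_importer.VERSION

    sys.path.remove(src)

sys.dont_write_bytecode = previous_setting
print(before, after_import, after_reload)  # old old edited
```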


<br>
<br>

## Dataset Format
A *dataset* is a folder containing one or more `.csv` files.<br>
Each `.csv` file holds the rows for a single database table.
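
To make the layout concrete, here is a minimal sketch of reading such a folder with Python's standard `csv` module. It assumes each file is named after its table (for example `Room.csv` for a `Room` table); that naming rule and the `load_dataset` helper are hypothetical, since the importer's own loading code lives in `src` and is not shown here.

```python
import csv
import tempfile
from pathlib import Path


def load_dataset(folder):
    """Read every .csv file in `folder` into {table_name: [row, ...]}.

    Assumption (hypothetical helper): the table name is the file name without
    its extension, e.g. "Room.csv" -> "Room".
    """
    tables = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as f:
            # Each row becomes a dict keyed by the column names in the header line.
            tables[csv_path.stem] = list(csv.DictReader(f))
    return tables


# Usage: build a throwaway two-table dataset and load it.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Building.csv").write_text("buildingID,name\n1,Library\n2,Gym\n", encoding="utf-8")
    Path(tmp, "Room.csv").write_text("roomID,buildingID,capacity\n10,2,30\n", encoding="utf-8")
    dataset = load_dataset(tmp)

print(sorted(dataset))  # ['Building', 'Room']
print(len(dataset["Building"]), len(dataset["Room"]))  # 2 1
```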

<br>

> ***📝 NOTE:*** <br>
>
> The ID columns in the `.csv` files exist only to make debugging easier.
> 
> The database regenerates these IDs when the data is imported.
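
Because the IDs are regenerated, any column that refers to another table's ID (for example a room's building) has to be translated to the new value. The sketch below shows one way to do that. `regenerate_ids` and `remap_foreign_key` are hypothetical helpers and the column names are illustrative. New UUIDs are generated in Python for brevity. The note says the database generates them, in which case the new IDs would instead be read back from each insert (for instance with PostgreSQL's `RETURNING` clause).

```python
import uuid


def regenerate_ids(rows, id_column):
    """Give every row a fresh UUID; return the new rows and an old -> new ID map."""
    id_map = {}
    new_rows = []
    for row in rows:
        new_id = str(uuid.uuid4())
        id_map[row[id_column]] = new_id
        new_rows.append({**row, id_column: new_id})
    return new_rows, id_map


def remap_foreign_key(rows, fk_column, id_map):
    """Replace the debugging IDs in a foreign-key column with the regenerated ones."""
    return [{**row, fk_column: id_map[row[fk_column]]} for row in rows]


# Hypothetical rows as they might be read from Building.csv and Room.csv.
buildings = [{"buildingID": "1", "name": "Library"}, {"buildingID": "2", "name": "Gym"}]
rooms = [{"roomID": "10", "buildingID": "2", "capacity": "30"}]

# Parent tables first, so their ID maps exist when the child rows are remapped.
new_buildings, building_ids = regenerate_ids(buildings, "buildingID")
new_rooms, room_ids = regenerate_ids(rooms, "roomID")
new_rooms = remap_foreign_key(new_rooms, "buildingID", building_ids)

# The room still belongs to the Gym, now through the Gym's regenerated ID.
print(new_rooms[0]["buildingID"] == new_buildings[1]["buildingID"])  # True
```

Processing parent tables before child tables matters here: the ID map for `Building` must exist before `Room` rows can be remapped.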


<br>
<br>

## Importing a Dataset

The code block below shows an example of importing a dataset. <br>
Before the data is imported, the existing database contents are cleaned (deleted) according to the chosen clean level.

<br>

For cleaning, we have the following settings:

| Clean Level | Description |
| ----------- | ----------- |
| None        | No data cleaning done |
| Tuples      | Clears all data from every table |
| Tables      | Deletes every table in the database |
| Database    | Deletes the entire database |

<br>

For importing, we have the following settings:

| Build Level | Description |
| ----------- | ----------- |
| Tuples      | Only imports the data into existing tables |
| Tables      | Constructs the required tables, then imports the data |
| Database    | Constructs a database and the required tables, then imports the data |
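
How the two levels combine can be sketched as a small planning function. This is an illustration, not the importer's code. `Level` is a hypothetical stand-in for `DI.ImportLevel`, and the step names paraphrase the tables above. The rule that the build level must be at least the clean level is an assumption: dropping the tables and then importing only tuples would leave nothing to insert into.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical stand-in for DI.ImportLevel, ordered from narrowest to widest scope."""
    NONE = 0
    TUPLES = 1
    TABLES = 2
    DATABASE = 3


# What each clean level removes (paraphrasing the clean-level table).
CLEAN_STEPS = {
    Level.TUPLES: "Clear all data from every table",
    Level.TABLES: "Delete every table",
    Level.DATABASE: "Delete the database",
}


def plan_import(clean_level, build_level):
    """Return the ordered steps for a clean level followed by a build level."""
    if build_level == Level.NONE:
        raise ValueError("build level must be TUPLES, TABLES or DATABASE")
    if build_level < clean_level:
        # Assumption: you cannot rebuild less than you deleted.
        raise ValueError("build level must be at least the clean level")

    steps = []
    if clean_level != Level.NONE:
        steps.append(CLEAN_STEPS[clean_level])
    # Rebuild from the widest scope down: database first, then tables.
    if build_level >= Level.DATABASE:
        steps.append("Construct the database")
    if build_level >= Level.TABLES:
        steps.append("Construct all tables")
    steps.append("Insert the rows from each .csv file")
    return steps


for step in plan_import(Level.DATABASE, Level.DATABASE):
    print(step)
```

With both levels set to `Database`, the plan is delete database, construct database, construct tables, insert rows, which is the same order as the log printed by the import cell.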

```python
print("===== STARTING TO IMPORT DATA ========")

# Delete and rebuild the whole database, then import every .csv file in the dataset folder
importer.importData(r"data/Toy Dataset", cleanLevel = DI.ImportLevel.Database, buildLevel = DI.ImportLevel.Database)

print("========== IMPORT COMPLETE ===========")
```

Output:

```
Deleting database by the name, production ...
Constructing the database by the name, production ...
Constructing all tables...
Inserting User Data...
Inserting Building Data...
Inserting Room Data...
Inserting Booking Data...
Inserting Cancellation Data...
```

<br>
<br>

## Clearing all Data

The code block below deletes every table in the database (`cleanLevel = DI.ImportLevel.Tables`). To keep the tables and remove only their data, use `DI.ImportLevel.Tuples` instead.

<br>

> ***❗ WARNING:*** <br>
>
> ONLY DO THIS IF YOU ARE ABSOLUTELY SURE OF WHAT YOU ARE DOING
> 

<br>

For the available clean levels, see the table in [Importing a Dataset](#importing-a-dataset).


```python
print("===== STARTING TO DELETE DATA ========")

# Drop every table in the database
importer.clean(cleanLevel = DI.ImportLevel.Tables)

print("========= DELETION COMPLETE ==========")
```

Output:

```
Deleting all tables...
```
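
The difference between clearing data (the `Tuples` clean level) and deleting tables (the `Tables` level used above) can be illustrated with plain SQL. This sketch uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the real PostgreSQL database, and treats `DELETE FROM` and `DROP TABLE` as assumed equivalents of the two levels. The importer's actual statements are not shown in this notebook.

```python
import sqlite3


def table_names(conn):
    """List the tables currently defined in a SQLite database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Room" ("roomID" TEXT, "capacity" INTEGER)')
conn.execute('INSERT INTO "Room" VALUES (?, ?)', ("r1", 30))

# "Tuples"-style clean: remove every row but keep the table definition.
conn.execute('DELETE FROM "Room"')
rows_after_clear = conn.execute('SELECT COUNT(*) FROM "Room"').fetchone()[0]
tables_after_clear = table_names(conn)

# "Tables"-style clean: drop the table itself.
conn.execute('DROP TABLE "Room"')
tables_after_drop = table_names(conn)
conn.close()

print(rows_after_clear, tables_after_clear, tables_after_drop)  # 0 ['Room'] []
```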


<br>
<br>

## Execute Custom SQL

The code block below shows how to run a custom SQL query with named parameters and read its result.

```python
# Values are passed separately and bound by the driver using %(name)s placeholders.
# Mixed-case identifiers such as "bookingID" must be double-quoted in PostgreSQL.
selectCancellationSQL = '''
SELECT "bookingID", "userID", NOW()
FROM "Cancellation"
WHERE "bookingID" = %(booking_id)s AND "userID" = %(user_id)s;
'''
params = {
    "booking_id": "14d85a4c-87c4-43a2-a399-8b229cec9a5d",
    "user_id": "6e25f54e-5cf8-40e6-a8b0-a446e9f6529e",
}

# closeConn = False keeps the connection open so the cursor can be read below
connData, cursor, err = importer.executeSQL(selectCancellationSQL, params, closeConn = False)

if err is not None:
    connData.close()
    raise err

try:
    print(cursor.fetchone())
finally:
    # Hand the connection back once the result has been read
    connData.putConn()
```

Output:

```
('14d85a4c-87c4-43a2-a399-8b229cec9a5d', '6e25f54e-5cf8-40e6-a8b0-a446e9f6529e', datetime.datetime(2025, 6, 19, 1, 58, 40, 696739, tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)))
```
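
The query above passes its values as a separate parameter dictionary instead of formatting them into the SQL string, so the driver handles quoting and a value cannot change the structure of the query. The sketch below demonstrates the same idea with Python's built-in `sqlite3` and a tiny in-memory `Cancellation` table, so it runs without a PostgreSQL server. Note that `sqlite3` writes named placeholders as `:name`, whereas psycopg2 uses `%(name)s`.

```python
import sqlite3

booking_id = "14d85a4c-87c4-43a2-a399-8b229cec9a5d"
user_id = "6e25f54e-5cf8-40e6-a8b0-a446e9f6529e"

conn = sqlite3.connect(":memory:")
# Quoted identifiers keep the mixed-case names used in the query above.
conn.execute('CREATE TABLE "Cancellation" ("bookingID" TEXT, "userID" TEXT)')
conn.execute('INSERT INTO "Cancellation" VALUES (?, ?)', (booking_id, user_id))

query = '''
SELECT "bookingID", "userID"
FROM "Cancellation"
WHERE "bookingID" = :booking_id AND "userID" = :user_id
'''

# Values are bound by the driver, never pasted into the SQL text.
row = conn.execute(query, {"booking_id": booking_id, "user_id": user_id}).fetchone()

# A classic injection string is treated as a plain value, so nothing matches.
injected = conn.execute(query, {"booking_id": "' OR '1'='1", "user_id": user_id}).fetchone()
conn.close()

print(row)       # ('14d85a4c-87c4-43a2-a399-8b229cec9a5d', '6e25f54e-5cf8-40e6-a8b0-a446e9f6529e')
print(injected)  # None
```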
