#### **First Step**: Write the queries for the PostgreSQL database based on the data.

Task:

- Identify the data types needed to write the database schema in `schema.sql`.
- Convert the CSV data into `INSERT` statements in `seed_data.sql` so that all records can be loaded.
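
As an illustration of the first task, here is a minimal sketch of how pandas dtype names could be mapped to PostgreSQL column types. This is not the actual `SQLSchemaGenerator` implementation. The mapping table, the helper names `infer_sql_type` and `build_create_table`, and the example columns are assumptions made for illustration.

```python
from typing import Dict

# Hypothetical mapping from pandas dtype names to PostgreSQL types.
# The real SQLSchemaGenerator may choose different types.
DTYPE_TO_SQL: Dict[str, str] = {
    "int64": "BIGINT",
    "float64": "DOUBLE PRECISION",
    "bool": "BOOLEAN",
    "datetime64[ns]": "TIMESTAMP",
    "object": "TEXT",
}


def infer_sql_type(dtype_name: str) -> str:
    """Return a SQL type for a pandas dtype name, falling back to TEXT."""
    return DTYPE_TO_SQL.get(dtype_name, "TEXT")


def build_create_table(table_name: str, column_dtypes: Dict[str, str]) -> str:
    """Build a CREATE TABLE statement from a column -> dtype-name mapping."""
    columns = ",\n".join(
        f'    "{name}" {infer_sql_type(dtype)}'
        for name, dtype in column_dtypes.items()
    )
    return f"CREATE TABLE IF NOT EXISTS {table_name} (\n{columns}\n);"


# Hypothetical columns; with pandas the mapping could come from
# df.dtypes.astype(str).to_dict().
example_dtypes = {"amt": "float64", "merchant": "object", "zip": "int64"}
print(build_create_table("credit_card_transactions", example_dtypes))
```

Falling back to `TEXT` for unknown dtypes keeps the sketch permissive; a stricter version could raise an error instead.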

In [1]:
import pandas as pd
import sys
import os

# Add the 'src' folder to sys.path
sys.path.append(os.path.abspath(os.path.join('..', 'src')))

In [2]:
df = pd.read_csv('../data/raw/credit_card_purchases.csv')

In [3]:
from utils.pysqlschema import SQLSchemaGenerator

generator = SQLSchemaGenerator(table_name='credit_card_transactions')
generator.generate_schema(df, '../sql/schema.sql')
generator.generate_seed_data(df, '../sql/seed_data.sql')

2024-08-24 15:45:54,471 - Generating schema for credit_card_transactions
2024-08-24 15:45:54,472 - Infering SQL type for int64
2024-08-24 15:45:54,477 - Infering SQL type for object
2024-08-24 15:45:54,478 - Infering SQL type for int64
2024-08-24 15:45:54,483 - Infering SQL type for object
2024-08-24 15:45:54,483 - Infering SQL type for object
2024-08-24 15:45:54,484 - Infering SQL type for float64
2024-08-24 15:45:54,485 - Infering SQL type for object
2024-08-24 15:45:54,486 - Infering SQL type for object
2024-08-24 15:45:54,487 - Infering SQL type for object
2024-08-24 15:45:54,488 - Infering SQL type for object
2024-08-24 15:45:54,488 - Infering SQL type for object
2024-08-24 15:45:54,490 - Infering SQL type for object
2024-08-24 15:45:54,491 - Infering SQL type for int64
2024-08-24 15:45:54,495 - Infering SQL type for float64
2024-08-24 15:45:54,497 - Infering SQL type for float64
2024-08-24 15:45:54,498 - Infering SQL type for int64
2024-08-24 15:45:54,501 - Infering SQL type for 

---

#### **Second Step**: Upload data to database

Task:

- Import the `DB` class to use the connector.
- Establish a connection and execute the queries that create the schema and load the data.
- Validate that the table has been created and that all records have been loaded.
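
The seeding step in this section runs in batches of 20,000 records. A minimal sketch of that batching idea follows, assuming the seed statements are available as an iterable of strings. The helper `iter_batches` is hypothetical and is not the actual `DB.execute_in_batches` implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(statements: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size statements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(statements)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical statements standing in for the contents of seed_data.sql.
statements = [f"INSERT INTO example VALUES ({i});" for i in range(5)]
batch_sizes = [len(batch) for batch in iter_batches(statements, batch_size=2)]
print(batch_sizes)  # [2, 2, 1]
```

Sending each batch in its own round trip bounds memory use and gives a natural point for progress logging, which matches the per-batch log lines shown in the seeding output.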

In [4]:
from connections.db import DB
db = DB()

In [5]:
# Remove the table if it already exists
db.execute("../sql/queries/002_drop_tables.sql", fetch_results=False)

2024-08-24 15:47:40,429 - ✔ Connected to database
2024-08-24 15:47:40,562 - ✔ Query executed
2024-08-24 15:47:40,563 - ✔ Cursor closed
2024-08-24 15:47:40,564 - ✔ Connection closed


In [6]:
# Create schema
db.execute("../sql/schema.sql", fetch_results=False)

2024-08-24 15:47:41,386 - ✔ Connected to database
2024-08-24 15:47:41,518 - ✔ Query executed
2024-08-24 15:47:41,519 - ✔ Cursor closed
2024-08-24 15:47:41,520 - ✔ Connection closed


In [7]:
# Seed data by executing the seed data script in batches
db.execute_in_batches("../sql/seed_data.sql", batch_size=20000)

2024-08-24 15:47:42,250 - ✔ Connected to database
2024-08-24 15:47:57,498 - ✔ Executed a batch of 20000 records
2024-08-24 15:48:11,525 - ✔ Executed a batch of 20000 records
2024-08-24 15:48:25,106 - ✔ Executed a batch of 20000 records
2024-08-24 15:48:39,384 - ✔ Executed a batch of 20000 records
2024-08-24 15:49:02,796 - ✔ Executed a batch of 20000 records
2024-08-24 15:49:16,940 - ✔ Executed a batch of 20000 records
2024-08-24 15:49:29,923 - ✔ Executed a batch of 20000 records
2024-08-24 15:49:40,716 - ✔ Executed a batch of 20000 records
2024-08-24 15:50:01,950 - ✔ Executed a batch of 20000 records
2024-08-24 15:50:16,156 - ✔ Executed a batch of 20000 records
2024-08-24 15:50:29,853 - ✔ Executed a batch of 20000 records
2024-08-24 15:50:45,723 - ✔ Executed a batch of 20000 records
2024-08-24 15:51:07,141 - ✔ Executed a batch of 20000 records
2024-08-24 15:51:21,095 - ✔ Executed a batch of 20000 records
2024-08-24 15:51:31,892 - ✔ Executed a batch of 20000 records
2024-08-24 15:51:43,

In [10]:
# View tables
db.execute("../sql/queries/001_view_tables.sql", fetch_results=True)

2024-08-24 16:04:56,987 - ✔ Connected to database
2024-08-24 16:04:57,119 - ✔ Query executed
2024-08-24 16:04:57,120 - ✔ Cursor closed
2024-08-24 16:04:57,121 - ✔ Connection closed


[('credit_card_transactions',)]

In [11]:
# View size of tables
db.execute("../sql/queries/003_view_tables_sizes.sql")

2024-08-24 16:05:00,049 - ✔ Connected to database
2024-08-24 16:05:00,188 - ✔ Query executed
2024-08-24 16:05:00,189 - ✔ Cursor closed
2024-08-24 16:05:00,190 - ✔ Connection closed


[('public.credit_card_transactions', 1267098)]
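
To validate that all records were loaded, the number of rows in the table can be compared with the number of rows in the source data. The sketch below shows that check using an in-memory SQLite database as a stand-in for PostgreSQL and for the `DB` class. The helpers `count_rows` and `validate_load` and the `amt` column are assumptions made for illustration.

```python
import sqlite3


def count_rows(connection: sqlite3.Connection, table_name: str) -> int:
    """Return the number of rows in table_name."""
    # Table names cannot be bound as parameters, so pass only trusted names.
    cursor = connection.execute(f'SELECT COUNT(*) FROM "{table_name}"')
    return cursor.fetchone()[0]


def validate_load(connection: sqlite3.Connection, table_name: str, expected_rows: int) -> bool:
    """Check that the table holds exactly expected_rows rows."""
    return count_rows(connection, table_name) == expected_rows


# Stand-in database with a hypothetical single-column table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE credit_card_transactions (amt REAL)")
connection.executemany(
    "INSERT INTO credit_card_transactions (amt) VALUES (?)",
    [(1.5,), (2.0,), (3.25,)],
)

print(validate_load(connection, "credit_card_transactions", expected_rows=3))  # True
```

In the notebook, `expected_rows` would be `len(df)` for the DataFrame read from the CSV.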

---

#### **Results**:

- Created the query that defines the database schema based on the data.
- Created the query that inserts the seed data into the database.
- Established a connection with the database.
- Created the table with the defined schema.
- Loaded the data into the table in batches of 20,000 records.

---