# OLTP - Database
The database, shown in the OLTP.jpg diagram, models the activity of an e-commerce site.

In this file I will:

1- Create the database (create_db.py)

2- Populate it with synthetic data (populate.py)

3- Inject errors into the synthetic data (error_inyect.py)

The idea is to have dirty data so we can prove that our incremental ETL pipeline is resilient.
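As a rough illustration of what "dirty data" could mean here, the sketch below nulls out or mangles a fraction of user emails. The `inject_errors` helper, its parameters, and the demo table are hypothetical; the actual logic in error_inyect.py is not shown in this notebook and may differ.

```python
import random
import sqlite3


def inject_errors(conn, fraction=0.1, seed=0):
    """Hypothetical helper: corrupt the email of a random fraction of users."""
    rng = random.Random(seed)
    ids = [row[0] for row in conn.execute("SELECT user_id FROM users")]
    victims = rng.sample(ids, max(1, int(len(ids) * fraction)))
    for user_id in victims:
        # Pick between a missing value and a malformed address.
        bad_email = None if rng.random() < 0.5 else "not-an-email"
        conn.execute(
            "UPDATE users SET email = ? WHERE user_id = ?", (bad_email, user_id)
        )
    conn.commit()
    return sorted(victims)


# Demo on a throwaway in-memory database, not on ecommerce-OLTP.db.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT, join_date TEXT)"
)
demo.executemany(
    "INSERT INTO users (name, email, join_date) VALUES (?, ?, ?)",
    [(f"User {i}", f"user{i}@example.com", "2026-02-02") for i in range(1, 21)],
)
corrupted_ids = inject_errors(demo, fraction=0.2)

# Count rows a downstream ETL step would need to reject or repair.
dirty_count = demo.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL OR email NOT LIKE '%@%'"
).fetchone()[0]
print(corrupted_ids, dirty_count)
```

A fixed seed keeps the corrupted rows reproducible, which makes it easier to assert that the pipeline caught exactly those rows.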

In [1]:
import pandas as pd

import sqlite3
from datetime import date

INITIAL_DATE = date(2026, 2, 2)

In [2]:
conn = sqlite3.connect('ecommerce-OLTP.db')

In [3]:
# Create the database tables in SQLite.
import scripts.create_db as create_db

create_db.main()

Tables created successfully.
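For orientation, here is a minimal sketch of a schema that would match the columns returned by the queries in this notebook. The column names come from those outputs; the types, primary keys, and foreign keys are assumptions, and the real create_db.py may define them differently.

```python
import sqlite3

# Assumed schema: column names are taken from the query outputs in this
# notebook; types, keys, and constraints are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    user_id   INTEGER PRIMARY KEY,
    name      TEXT,
    email     TEXT,
    join_date TEXT
);
CREATE TABLE IF NOT EXISTS products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT,
    category   TEXT,
    price      REAL,
    stock      INTEGER
);
CREATE TABLE IF NOT EXISTS transactions (
    transaction_id INTEGER PRIMARY KEY,
    date           TEXT,
    user_id        INTEGER REFERENCES users(user_id),
    product_id     INTEGER REFERENCES products(product_id),
    quantity       INTEGER,
    price          REAL,
    payment_type   TEXT,
    status         TEXT
);
"""

# Build the schema in a throwaway in-memory database.
schema_conn = sqlite3.connect(":memory:")
schema_conn.executescript(SCHEMA)

table_names = sorted(
    row[0]
    for row in schema_conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
)
print(table_names)
```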


In [4]:
# Populate the database with the initial users and products.
import scripts.populate as populate

populate.populate_db_first_time(conn, num_users=100, num_products=20, join_date=INITIAL_DATE)


Inserted 100 users and 20 products on 2026-02-02


In [5]:
# Reconnect to the SQLite database
conn = sqlite3.connect('ecommerce-OLTP.db')

## Testing change functionality

### Users

In [6]:
df_users = pd.read_sql_query('SELECT * FROM users', conn)
df_users

Unnamed: 0,user_id,name,email,join_date
0,1,Lindsey Mcpherson,donaldanderson@example.com,2026-02-02
1,2,David Lynch,smithpatrick@example.org,2026-02-02
2,3,Robert Ashley,burnsjustin@example.org,2026-02-02
3,4,Andrew Herrera,martinezlaura@example.net,2026-02-02
4,5,Robert Reyes,ruthheath@example.net,2026-02-02
...,...,...,...,...
95,96,Bradley King,ascott@example.net,2026-02-02
96,97,Deanna Diaz,fjohnson@example.net,2026-02-02
97,98,Travis Martin,kaitlin53@example.net,2026-02-02
98,99,Valerie Sanchez,charles99@example.org,2026-02-02


In [7]:
populate.change_existent_users(conn, 100)

Updated emails of 100 users: [87, 69, 46, 21, 93, 49, 80, 28, 1, 86, 36, 15, 29, 34, 23, 32, 38, 47, 100, 55, 63, 64, 7, 30, 33, 92, 54, 3, 52, 24, 35, 97, 12, 95, 89, 25, 82, 68, 91, 85, 14, 83, 60, 58, 56, 66, 81, 90, 51, 5, 26, 67, 96, 76, 78, 11, 99, 88, 72, 37, 75, 9, 53, 42, 74, 84, 13, 57, 71, 62, 65, 40, 77, 98, 94, 22, 70, 43, 20, 16, 19, 4, 73, 6, 39, 41, 8, 59, 2, 18, 27, 50, 17, 45, 31, 79, 10, 44, 61, 48]


In [8]:
df_users = pd.read_sql_query('SELECT * FROM users', conn)
df_users

Unnamed: 0,user_id,name,email,join_date
0,1,Lindsey Mcpherson,paul01@example.org,2026-02-02
1,2,David Lynch,meadowsmicheal@example.com,2026-02-02
2,3,Robert Ashley,colemanandre@example.net,2026-02-02
3,4,Andrew Herrera,joycemargaret@example.net,2026-02-02
4,5,Robert Reyes,jessica72@example.org,2026-02-02
...,...,...,...,...
95,96,Bradley King,johnsonsean@example.org,2026-02-02
96,97,Deanna Diaz,jessicaharris@example.org,2026-02-02
97,98,Travis Martin,holly03@example.org,2026-02-02
98,99,Valerie Sanchez,jennifer44@example.net,2026-02-02
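Rather than comparing the two printouts by eye, the before and after snapshots could be diffed directly. The sketch below assumes both snapshots were kept as DataFrames; `changed_emails` is a hypothetical helper, and the third row is made up to show an unchanged user.

```python
import pandas as pd


def changed_emails(before, after):
    """Return the users whose email differs between two snapshots."""
    merged = before.merge(after, on="user_id", suffixes=("_before", "_after"))
    mask = merged["email_before"] != merged["email_after"]
    return merged.loc[mask, ["user_id", "email_before", "email_after"]]


# Rows 1 and 2 use the values printed above; row 3 is a made-up unchanged user.
before = pd.DataFrame(
    {
        "user_id": [1, 2, 3],
        "email": [
            "donaldanderson@example.com",
            "smithpatrick@example.org",
            "unchanged@example.com",
        ],
    }
)
after = pd.DataFrame(
    {
        "user_id": [1, 2, 3],
        "email": [
            "paul01@example.org",
            "meadowsmicheal@example.com",
            "unchanged@example.com",
        ],
    }
)

diff = changed_emails(before, after)
print(diff)
```

In the notebook itself, `before` would be the DataFrame read prior to calling `populate.change_existent_users` and `after` the one read afterwards.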


### Products

In [9]:
df_products = pd.read_sql_query('SELECT * FROM products', conn)
df_products

Unnamed: 0,product_id,name,category,price,stock
0,1,Cause Stay,Beauty,141.53,198
1,2,Through Car,Beauty,58.41,195
2,3,Determine Eight,Clothing,383.85,90
3,4,Carry They,Clothing,68.79,105
4,5,Way Reflect,Accessories,213.01,83
5,6,Record Here,Clothing,206.01,142
6,7,Writer Here,Accessories,347.29,123
7,8,Plan Sense,Electronics,99.11,84
8,9,Away Never,Beauty,244.89,127
9,10,Organization Wife,Beauty,89.02,169


In [11]:
populate.change_existent_products(conn, 20, [50, 100], [30, 100])

Randomly updated 20 products (some attributes may remain unchanged).


In [12]:
df_products = pd.read_sql_query('SELECT * FROM products', conn)
df_products

Unnamed: 0,product_id,name,category,price,stock
0,1,Experience Market,Footwear,141.53,198
1,2,Fall While,Footwear,73.13,195
2,3,Determine Eight,Accessories,383.85,90
3,4,Carry They,Electronics,86.7,51
4,5,Way Reflect,Footwear,98.42,37
5,6,Similar Allow,Beauty,206.01,142
6,7,Should Economy,Accessories,70.31,123
7,8,Decade Pull,Electronics,65.13,84
8,9,Teach Couple,Footwear,244.89,90
9,10,Organization Wife,Beauty,65.96,169


### Transactions

In [13]:
df_transactions = pd.read_sql_query('SELECT * FROM transactions', conn)
df_transactions


Unnamed: 0,transaction_id,date,user_id,product_id,quantity,price,payment_type,status


In [14]:
# Reuse INITIAL_DATE (2026-02-02) defined in the first cell.
populate.create_new_transactions(conn, 300, INITIAL_DATE, 300, [0.7, 0.3])


Inserted 300 transactions (attempted 300).


300
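One check that may be worth running after inserting transactions is that every row points at an existing user and product. The query below is a sketch of such a check, run here against a small in-memory database with invented rows; it assumes `transactions.user_id` and `transactions.product_id` reference the `users` and `products` tables, which matches the column names shown earlier but is not confirmed by the notebook.

```python
import sqlite3

# Anti-join: transactions whose user or product does not exist.
ORPHAN_QUERY = """
SELECT t.transaction_id
FROM transactions AS t
LEFT JOIN users AS u ON u.user_id = t.user_id
LEFT JOIN products AS p ON p.product_id = t.product_id
WHERE u.user_id IS NULL OR p.product_id IS NULL
ORDER BY t.transaction_id
"""

# Throwaway database with invented rows, only to exercise the query.
check = sqlite3.connect(":memory:")
check.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY);
    CREATE TABLE transactions (
        transaction_id INTEGER PRIMARY KEY,
        user_id INTEGER,
        product_id INTEGER
    );
    INSERT INTO users VALUES (1), (2);
    INSERT INTO products VALUES (10);
    INSERT INTO transactions VALUES (1, 1, 10), (2, 99, 10), (3, 2, 77);
    """
)

orphans = [row[0] for row in check.execute(ORPHAN_QUERY)]
print(orphans)  # [2, 3]
```

Running the same query against `conn` should return an empty list for clean data, and a non-empty one once errors have been injected.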

In [None]:
# Close the connection
conn.close()