# <span style='color:magenta; font-weight:bold;'>Data Cleaning and Transformation</span>

This notebook focuses on cleaning and transforming the raw data to match the schema defined in the Entity-Relationship Diagram (ERD). Each table is normalized and prepared for loading into a PostgreSQL database.

### Objectives:
1. Extract and transform raw data into structured tables.
2. Align the data with the star schema based on the ERD.
3. Prepare the data for loading into the PostgreSQL data warehouse.

The following tables will be processed:
- **Products**: Normalize product details, extract ratings, and map categories.
- **Categories**: Map category names to unique IDs.
- **Carts**: Normalize cart details and create individual rows for each product in a cart.
- **Users**: Flatten user details and addresses, assigning unique address IDs.
- **Address**: Extract user address details into a separate table.

## Import Libraries & Load the Data

In [1]:
# Essentials
import pandas as pd
import numpy as np
import ast
import os

In [2]:
# Create the directory for processed data (uncomment if it does not exist yet)
#os.makedirs("data/processed", exist_ok=True)

In [3]:
# Load datasets
products = pd.read_csv("../data/processed/products.csv")
categories = pd.read_csv("../data/processed/categories.csv")
carts = pd.read_csv("../data/processed/carts.csv")
users = pd.read_csv("../data/processed/users.csv")

## <span style='color:magenta; font-weight:bold'>Exploratory Data Analysis (EDA)</span>

In this section, we explore the raw data to:
1. Understand the structure and distribution of the data.
2. Identify missing or inconsistent values.
3. Detect duplicates or outliers.
4. Gain insights to guide the data cleaning and transformation process.

### Steps:
- Display the first few rows of each dataset.
- Check for missing values and duplicates.
- Summarize key statistics for numerical and categorical columns.
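
The missing-value and duplicate checks listed above can be sketched as a small helper. This is a minimal illustration on a made-up frame: the `sample` data and the `quality_summary` name are assumptions for demonstration, not part of the notebook's datasets or code.

```python
import pandas as pd

# Hypothetical miniature frame standing in for one of the raw datasets.
sample = pd.DataFrame(
    {
        "id": [1, 2, 2, 3],
        "userId": [1, 1, 1, None],
        "products": [
            "[{'productId': 1}]",
            "[{'productId': 2}]",
            "[{'productId': 2}]",
            "[]",
        ],
    }
)


def quality_summary(df):
    """Return missing-cell counts per column and the number of fully duplicated rows."""
    return {
        "missing_per_column": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


summary = quality_summary(sample)
print(summary)
```

Because the nested columns are still plain strings at this stage, `DataFrame.duplicated()` can compare them directly; once they are parsed into dictionaries they are no longer hashable and the check would need a string copy.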


### General EDA function

In [4]:
def eda(df):
    """Show head, info, summary statistics, columns, missing values, and shape of a DataFrame."""
    print("-------------------------------TOP 5 RECORDS-----------------------------")
    display(df.head())
    
    print("\n-------------------------------INFO--------------------------------------")
    # df.info() prints its report directly and returns None, which display() then shows as "None"
    display(df.info())
    
    print("\n-------------------------------Describe----------------------------------")
    display(df.describe())
    
    print("\n-------------------------------Columns-----------------------------------")
    display(df.columns)
    
    print("\n----------------------------Missing Values-------------------------------")
    display(df.isnull().sum())
    
    print("\n--------------------------Shape Of Data---------------------------------")
    display(df.shape)

### EDA on Datasets

In [5]:
# Exploring the carts dataset
print("=================================Carts Data=================================")
eda(carts)

# Exploring the products dataset
print("=================================Products Data=================================")
eda(products)

# Exploring the categories dataset
print("=================================Categories Data=================================")
eda(categories)

# Exploring the users dataset
print("=================================Users Data=================================")
eda(users)

-------------------------------TOP 5 RECORDS-----------------------------


Unnamed: 0,id,userId,date,products,__v
0,1,1,2020-03-02T00:00:00.000Z,"[{'productId': 1, 'quantity': 4}, {'productId'...",0
1,2,1,2020-01-02T00:00:00.000Z,"[{'productId': 2, 'quantity': 4}, {'productId'...",0
2,3,2,2020-03-01T00:00:00.000Z,"[{'productId': 1, 'quantity': 2}, {'productId'...",0
3,4,3,2020-01-01T00:00:00.000Z,"[{'productId': 1, 'quantity': 4}]",0
4,5,3,2020-03-01T00:00:00.000Z,"[{'productId': 7, 'quantity': 1}, {'productId'...",0



-------------------------------INFO--------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        7 non-null      int64 
 1   userId    7 non-null      int64 
 2   date      7 non-null      object
 3   products  7 non-null      object
 4   __v       7 non-null      int64 
dtypes: int64(3), object(2)
memory usage: 408.0+ bytes


None


-------------------------------Describe----------------------------------


Unnamed: 0,id,userId,__v
count,7.0,7.0,7.0
mean,4.0,3.142857,0.0
std,2.160247,2.410295,0.0
min,1.0,1.0,0.0
25%,2.5,1.5,0.0
50%,4.0,3.0,0.0
75%,5.5,3.5,0.0
max,7.0,8.0,0.0



-------------------------------Columns-----------------------------------


Index(['id', 'userId', 'date', 'products', '__v'], dtype='object')


----------------------------Missing Values-------------------------------


id          0
userId      0
date        0
products    0
__v         0
dtype: int64


--------------------------Shape Of Data---------------------------------


(7, 5)

-------------------------------TOP 5 RECORDS-----------------------------


Unnamed: 0,id,title,price,description,category,image,rating
0,1,"Fjallraven - Foldsack No. 1 Backpack, Fits 15 ...",109.95,Your perfect pack for everyday use and walks i...,men's clothing,https://fakestoreapi.com/img/81fPKd-2AYL._AC_S...,"{'rate': 3.9, 'count': 120}"
1,2,Mens Casual Premium Slim Fit T-Shirts,22.3,"Slim-fitting style, contrast raglan long sleev...",men's clothing,https://fakestoreapi.com/img/71-3HjGNDUL._AC_S...,"{'rate': 4.1, 'count': 259}"
2,3,Mens Cotton Jacket,55.99,great outerwear jackets for Spring/Autumn/Wint...,men's clothing,https://fakestoreapi.com/img/71li-ujtlUL._AC_U...,"{'rate': 4.7, 'count': 500}"
3,4,Mens Casual Slim Fit,15.99,The color could be slightly different between ...,men's clothing,https://fakestoreapi.com/img/71YXzeOuslL._AC_U...,"{'rate': 2.1, 'count': 430}"
4,5,John Hardy Women's Legends Naga Gold & Silver ...,695.0,"From our Legends Collection, the Naga was insp...",jewelery,https://fakestoreapi.com/img/71pWzhdJNwL._AC_U...,"{'rate': 4.6, 'count': 400}"



-------------------------------INFO--------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   id           20 non-null     int64  
 1   title        20 non-null     object 
 2   price        20 non-null     float64
 3   description  20 non-null     object 
 4   category     20 non-null     object 
 5   image        20 non-null     object 
 6   rating       20 non-null     object 
dtypes: float64(1), int64(1), object(5)
memory usage: 1.2+ KB


None


-------------------------------Describe----------------------------------


Unnamed: 0,id,price
count,20.0,20.0
mean,10.5,162.046
std,5.91608,272.220532
min,1.0,7.95
25%,5.75,15.24
50%,10.5,56.49
75%,15.25,110.9625
max,20.0,999.99



-------------------------------Columns-----------------------------------


Index(['id', 'title', 'price', 'description', 'category', 'image', 'rating'], dtype='object')


----------------------------Missing Values-------------------------------


id             0
title          0
price          0
description    0
category       0
image          0
rating         0
dtype: int64


--------------------------Shape Of Data---------------------------------


(20, 7)

-------------------------------TOP 5 RECORDS-----------------------------


Unnamed: 0,categories
0,electronics
1,jewelery
2,men's clothing
3,women's clothing



-------------------------------INFO--------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 1 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   categories  4 non-null      object
dtypes: object(1)
memory usage: 160.0+ bytes


None


-------------------------------Describe----------------------------------


Unnamed: 0,categories
count,4
unique,4
top,electronics
freq,1



-------------------------------Columns-----------------------------------


Index(['categories'], dtype='object')


----------------------------Missing Values-------------------------------


categories    0
dtype: int64


--------------------------Shape Of Data---------------------------------


(4, 1)

-------------------------------TOP 5 RECORDS-----------------------------


Unnamed: 0,address,id,email,username,password,name,phone,__v
0,"{'geolocation': {'lat': '-37.3159', 'long': '8...",1,john@gmail.com,johnd,m38rmF$,"{'firstname': 'john', 'lastname': 'doe'}",1-570-236-7033,0
1,"{'geolocation': {'lat': '-37.3159', 'long': '8...",2,morrison@gmail.com,mor_2314,83r5^_,"{'firstname': 'david', 'lastname': 'morrison'}",1-570-236-7033,0
2,"{'geolocation': {'lat': '40.3467', 'long': '-3...",3,kevin@gmail.com,kevinryan,kev02937@,"{'firstname': 'kevin', 'lastname': 'ryan'}",1-567-094-1345,0
3,"{'geolocation': {'lat': '50.3467', 'long': '-2...",4,don@gmail.com,donero,ewedon,"{'firstname': 'don', 'lastname': 'romer'}",1-765-789-6734,0
4,"{'geolocation': {'lat': '40.3467', 'long': '-4...",5,derek@gmail.com,derek,jklg*_56,"{'firstname': 'derek', 'lastname': 'powell'}",1-956-001-1945,0



-------------------------------INFO--------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   address   10 non-null     object
 1   id        10 non-null     int64 
 2   email     10 non-null     object
 3   username  10 non-null     object
 4   password  10 non-null     object
 5   name      10 non-null     object
 6   phone     10 non-null     object
 7   __v       10 non-null     int64 
dtypes: int64(2), object(6)
memory usage: 768.0+ bytes


None


-------------------------------Describe----------------------------------


Unnamed: 0,id,__v
count,10.0,10.0
mean,5.5,0.0
std,3.02765,0.0
min,1.0,0.0
25%,3.25,0.0
50%,5.5,0.0
75%,7.75,0.0
max,10.0,0.0



-------------------------------Columns-----------------------------------


Index(['address', 'id', 'email', 'username', 'password', 'name', 'phone',
       '__v'],
      dtype='object')


----------------------------Missing Values-------------------------------


address     0
id          0
email       0
username    0
password    0
name        0
phone       0
__v         0
dtype: int64


--------------------------Shape Of Data---------------------------------


(10, 8)

## <span style='color:magenta; font-weight:bold'>Preparing Tables for Transformation</span>

After performing EDA, the next step is to clean and transform the raw data to align with the schema defined in the Entity-Relationship Diagram (ERD). This involves:

1. **Products Table**:
   - Extract and split the `rating` field into `rate` and `count`.
   - Map `category` names to `CategoryId` using the `Categories` table.

2. **Categories Table**:
   - Ensure category names are mapped to unique IDs.

3. **Carts Table**:
   - Normalize the nested `products` field into separate rows, each representing a product in a cart.

4. **Users Table**:
   - Flatten user details and split the `name` field into `firstname` and `lastname`.
   - Assign unique `AddressId` values to each user.

5. **Address Table**:
   - Extract address details from the `users` table into a separate `Address` table.

### Objectives:
- Normalize each dataset to match the ERD schema.
- Prepare the tables for loading into the PostgreSQL database.


### Carts Fact Table

In [6]:
carts = pd.read_csv("../data/processed/carts.csv")
carts

Unnamed: 0,id,userId,date,products,__v
0,1,1,2020-03-02T00:00:00.000Z,"[{'productId': 1, 'quantity': 4}, {'productId'...",0
1,2,1,2020-01-02T00:00:00.000Z,"[{'productId': 2, 'quantity': 4}, {'productId'...",0
2,3,2,2020-03-01T00:00:00.000Z,"[{'productId': 1, 'quantity': 2}, {'productId'...",0
3,4,3,2020-01-01T00:00:00.000Z,"[{'productId': 1, 'quantity': 4}]",0
4,5,3,2020-03-01T00:00:00.000Z,"[{'productId': 7, 'quantity': 1}, {'productId'...",0
5,6,4,2020-03-01T00:00:00.000Z,"[{'productId': 10, 'quantity': 2}, {'productId...",0
6,7,8,2020-03-01T00:00:00.000Z,"[{'productId': 18, 'quantity': 1}]",0


In [7]:
# Convert 'products' to a list of dictionaries if it is in string format
def safe_eval(value):
    if isinstance(value, str):
        try:
            return ast.literal_eval(value)
        except (ValueError, SyntaxError):  # literal_eval raises either one on malformed input
            print(f"Could not parse: {value}")
            return None
    return value

carts["products"] = carts["products"].apply(safe_eval)

# Drop rows where 'products' could not be parsed
carts = carts.dropna(subset=["products"])

# Explode the 'products' column so each product becomes its own row
carts = carts.explode("products")

# Ensure each element in the 'products' column is a dictionary
carts["products"] = carts["products"].apply(lambda x: x if isinstance(x, dict) else ast.literal_eval(str(x)))

# Extract 'productId' and 'quantity' into separate columns
carts["productId"] = carts["products"].apply(lambda x: x["productId"])
carts["quantity"] = carts["products"].apply(lambda x: x["quantity"])

# Drop the original 'products' column and '__v' column
carts.drop(columns=["products", "__v"], inplace=True)

# Parse the ISO 8601 timestamps in the 'date' column into datetimes
carts["date"] = pd.to_datetime(carts["date"], format="%Y-%m-%dT%H:%M:%S.%fZ")

# Assign a unique integer ID to each exploded row (one per cart line item).
# Note: this overwrites the original cart 'id', so rows no longer carry their source cart number.
carts.reset_index(drop=True, inplace=True)  # Reset the index
carts["id"] = range(1, len(carts) + 1)  # Generate unique integer IDs
# Rename the 'id' column to 'CartsId'
carts.rename(columns={"id": "CartsId"}, inplace=True)



# Output the transformed DataFrame
carts.head()

Unnamed: 0,CartsId,userId,date,productId,quantity
0,1,1,2020-03-02,1,4
1,2,1,2020-03-02,2,1
2,3,1,2020-03-02,3,6
3,4,1,2020-01-02,2,4
4,5,1,2020-01-02,1,10
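
Since the cell above replaces the original cart `id` with a per-row sequence, the link between a line item and its source cart is lost. If that link is needed downstream, one option is to keep both keys. The sketch below assumes a hypothetical two-cart sample and a hypothetical `SourceCartId` column name; neither comes from the ERD.

```python
import ast

import pandas as pd

# Hypothetical two-cart sample in the same shape as the raw carts data.
raw = pd.DataFrame(
    {
        "id": [1, 2],
        "userId": [1, 3],
        "products": [
            "[{'productId': 1, 'quantity': 4}, {'productId': 2, 'quantity': 1}]",
            "[{'productId': 1, 'quantity': 4}]",
        ],
    }
)

# Parse the string into a list of dicts, then create one row per product.
raw["products"] = raw["products"].apply(ast.literal_eval)
lines = raw.explode("products").reset_index(drop=True)

# Expand each product dict into its own columns.
items = pd.DataFrame(lines["products"].tolist())
lines = pd.concat([lines.drop(columns="products"), items], axis=1)

# Keep the source cart id (assumed name) and add a separate surrogate key per line.
lines = lines.rename(columns={"id": "SourceCartId"})
lines.insert(0, "CartsId", range(1, len(lines) + 1))
print(lines)
```

Whether the fact table should carry the source cart number depends on the ERD, so treat this as an alternative to consider rather than a correction.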


### Categories Dimension Table

In [8]:
categories = pd.read_csv("../data/processed/categories.csv")
categories

Unnamed: 0,categories
0,electronics
1,jewelery
2,men's clothing
3,women's clothing


In [9]:
# Add a unique integer 'CategoryId' column
categories["CategoryId"] = range(1, len(categories) + 1)

# Display the updated DataFrame
categories.head()

Unnamed: 0,categories,CategoryId
0,electronics,1
1,jewelery,2
2,men's clothing,3
3,women's clothing,4
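
Sequential IDs are only unique per category if the names themselves are unique. The four categories here already are, but a larger source might contain case or whitespace variants. A minimal sketch of normalizing names before assigning IDs, using a made-up `raw` series for illustration:

```python
import pandas as pd

# Hypothetical category names with a case/whitespace variant of the same value.
raw = pd.Series(["electronics", "Electronics ", "jewelery"], name="categories")

# Normalize, drop repeated names, then assign sequential IDs.
clean = raw.str.strip().str.lower().drop_duplicates().reset_index(drop=True)
dim = clean.to_frame()
dim["CategoryId"] = range(1, len(dim) + 1)
print(dim)
```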


### Products Dimension Table

In [10]:
products = pd.read_csv("../data/processed/products.csv")
products

Unnamed: 0,id,title,price,description,category,image,rating
0,1,"Fjallraven - Foldsack No. 1 Backpack, Fits 15 ...",109.95,Your perfect pack for everyday use and walks i...,men's clothing,https://fakestoreapi.com/img/81fPKd-2AYL._AC_S...,"{'rate': 3.9, 'count': 120}"
1,2,Mens Casual Premium Slim Fit T-Shirts,22.3,"Slim-fitting style, contrast raglan long sleev...",men's clothing,https://fakestoreapi.com/img/71-3HjGNDUL._AC_S...,"{'rate': 4.1, 'count': 259}"
2,3,Mens Cotton Jacket,55.99,great outerwear jackets for Spring/Autumn/Wint...,men's clothing,https://fakestoreapi.com/img/71li-ujtlUL._AC_U...,"{'rate': 4.7, 'count': 500}"
3,4,Mens Casual Slim Fit,15.99,The color could be slightly different between ...,men's clothing,https://fakestoreapi.com/img/71YXzeOuslL._AC_U...,"{'rate': 2.1, 'count': 430}"
4,5,John Hardy Women's Legends Naga Gold & Silver ...,695.0,"From our Legends Collection, the Naga was insp...",jewelery,https://fakestoreapi.com/img/71pWzhdJNwL._AC_U...,"{'rate': 4.6, 'count': 400}"
5,6,Solid Gold Petite Micropave,168.0,Satisfaction Guaranteed. Return or exchange an...,jewelery,https://fakestoreapi.com/img/61sbMiUnoGL._AC_U...,"{'rate': 3.9, 'count': 70}"
6,7,White Gold Plated Princess,9.99,Classic Created Wedding Engagement Solitaire D...,jewelery,https://fakestoreapi.com/img/71YAIFU48IL._AC_U...,"{'rate': 3, 'count': 400}"
7,8,Pierced Owl Rose Gold Plated Stainless Steel D...,10.99,Rose Gold Plated Double Flared Tunnel Plug Ear...,jewelery,https://fakestoreapi.com/img/51UDEzMJVpL._AC_U...,"{'rate': 1.9, 'count': 100}"
8,9,WD 2TB Elements Portable External Hard Drive -...,64.0,USB 3.0 and USB 2.0 Compatibility Fast data tr...,electronics,https://fakestoreapi.com/img/61IBBVJvSDL._AC_S...,"{'rate': 3.3, 'count': 203}"
9,10,SanDisk SSD PLUS 1TB Internal SSD - SATA III 6...,109.0,"Easy upgrade for faster boot up, shutdown, app...",electronics,https://fakestoreapi.com/img/61U7T1koQqL._AC_S...,"{'rate': 2.9, 'count': 470}"


In [11]:
products.columns

Index(['id', 'title', 'price', 'description', 'category', 'image', 'rating'], dtype='object')

In [12]:
# 1. Split the 'rating' column into 'rate' and 'count'
products["rating"] = products["rating"].apply(ast.literal_eval)  # Convert string to dict
products["rate"] = products["rating"].apply(lambda x: x["rate"])
products["count"] = products["rating"].apply(lambda x: x["count"])
products.drop(columns=["rating"], inplace=True)  # Remove the original 'rating' column

# 2. Replace 'category' with its corresponding ID from the categories table
category_map = dict(zip(categories["categories"], categories["CategoryId"]))
products["category"] = products["category"].map(category_map)

# 3. Rename the key columns
products.rename(columns={"category": "CategoryId", "id": "ProductsId"}, inplace=True)


# Display the updated DataFrame
products.head()

Unnamed: 0,ProductsId,title,price,description,CategoryId,image,rate,count
0,1,"Fjallraven - Foldsack No. 1 Backpack, Fits 15 ...",109.95,Your perfect pack for everyday use and walks i...,3,https://fakestoreapi.com/img/81fPKd-2AYL._AC_S...,3.9,120
1,2,Mens Casual Premium Slim Fit T-Shirts,22.3,"Slim-fitting style, contrast raglan long sleev...",3,https://fakestoreapi.com/img/71-3HjGNDUL._AC_S...,4.1,259
2,3,Mens Cotton Jacket,55.99,great outerwear jackets for Spring/Autumn/Wint...,3,https://fakestoreapi.com/img/71li-ujtlUL._AC_U...,4.7,500
3,4,Mens Casual Slim Fit,15.99,The color could be slightly different between ...,3,https://fakestoreapi.com/img/71YXzeOuslL._AC_U...,2.1,430
4,5,John Hardy Women's Legends Naga Gold & Silver ...,695.0,"From our Legends Collection, the Naga was insp...",2,https://fakestoreapi.com/img/71pWzhdJNwL._AC_U...,4.6,400
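
`Series.map` silently produces `NaN` for any category name that is missing from the lookup, so it can be worth checking for unmapped values after the replacement. The sketch below uses small made-up frames (including a deliberately misspelled `jewellery`) to show the idea; it is an illustration, not the notebook's own validation step.

```python
import pandas as pd

# Hypothetical miniature tables; 'jewellery' is intentionally absent from the lookup.
categories = pd.DataFrame({"categories": ["electronics", "jewelery"], "CategoryId": [1, 2]})
products = pd.DataFrame({"id": [1, 2, 3], "category": ["electronics", "jewelery", "jewellery"]})

category_map = dict(zip(categories["categories"], categories["CategoryId"]))
mapped = products["category"].map(category_map)

# Names that did not match any key in the lookup.
unmapped = products.loc[mapped.isna(), "category"].unique().tolist()
print(unmapped)

# The nullable Int64 dtype keeps matched IDs as integers even when some are missing.
products["CategoryId"] = mapped.astype("Int64")
```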


### Users Dimension Table

In [13]:
users = pd.read_csv("../data/processed/users.csv")
users

Unnamed: 0,address,id,email,username,password,name,phone,__v
0,"{'geolocation': {'lat': '-37.3159', 'long': '8...",1,john@gmail.com,johnd,m38rmF$,"{'firstname': 'john', 'lastname': 'doe'}",1-570-236-7033,0
1,"{'geolocation': {'lat': '-37.3159', 'long': '8...",2,morrison@gmail.com,mor_2314,83r5^_,"{'firstname': 'david', 'lastname': 'morrison'}",1-570-236-7033,0
2,"{'geolocation': {'lat': '40.3467', 'long': '-3...",3,kevin@gmail.com,kevinryan,kev02937@,"{'firstname': 'kevin', 'lastname': 'ryan'}",1-567-094-1345,0
3,"{'geolocation': {'lat': '50.3467', 'long': '-2...",4,don@gmail.com,donero,ewedon,"{'firstname': 'don', 'lastname': 'romer'}",1-765-789-6734,0
4,"{'geolocation': {'lat': '40.3467', 'long': '-4...",5,derek@gmail.com,derek,jklg*_56,"{'firstname': 'derek', 'lastname': 'powell'}",1-956-001-1945,0
5,"{'geolocation': {'lat': '20.1677', 'long': '-1...",6,david_r@gmail.com,david_r,3478*#54,"{'firstname': 'david', 'lastname': 'russell'}",1-678-345-9856,0
6,"{'geolocation': {'lat': '10.3456', 'long': '20...",7,miriam@gmail.com,snyder,f238&@*$,"{'firstname': 'miriam', 'lastname': 'snyder'}",1-123-943-0563,0
7,"{'geolocation': {'lat': '50.3456', 'long': '10...",8,william@gmail.com,hopkins,William56$hj,"{'firstname': 'william', 'lastname': 'hopkins'}",1-478-001-0890,0
8,"{'geolocation': {'lat': '40.12456', 'long': '2...",9,kate@gmail.com,kate_h,kfejk@*_,"{'firstname': 'kate', 'lastname': 'hale'}",1-678-456-1934,0
9,"{'geolocation': {'lat': '30.24788', 'long': '-...",10,jimmie@gmail.com,jimmie_k,klein*#%*,"{'firstname': 'jimmie', 'lastname': 'klein'}",1-104-001-4567,0


In [14]:
users.address.shape

(10,)

In [15]:
# Safe eval function to handle parsing issues
def safe_eval(value):
    if isinstance(value, str):
        try:
            return ast.literal_eval(value)
        except (ValueError, SyntaxError):
            print(f"Could not parse: {value}")
            return None
    return value

# Parse the 'address' column
users["address"] = users["address"].apply(safe_eval)

# Drop rows where 'address' could not be parsed; copy so later column assignments
# do not trigger a SettingWithCopyWarning
users = users.dropna(subset=["address"]).copy()
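
The reason `safe_eval` catches both exception types is that `ast.literal_eval` fails differently depending on the problem: text that is not valid Python syntax raises `SyntaxError`, while text that parses but is not a plain literal raises `ValueError`. A small sketch, with `classify` as a hypothetical helper name:

```python
import ast


def classify(text):
    """Report how ast.literal_eval handles the given text."""
    try:
        ast.literal_eval(text)
        return "ok"
    except SyntaxError:
        return "SyntaxError"
    except ValueError:
        return "ValueError"


print(classify("{'firstname': 'john'}"))  # well-formed dict literal
print(classify("{'firstname': 'john'"))   # unclosed brace
print(classify("{'firstname': john}"))    # bare name is not a literal
```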

### Address Dimension Table

In [16]:
# Convert the dictionaries to strings to make them hashable
users["address_str"] = users["address"].apply(lambda x: str(x))

# Extract unique addresses
addresses = pd.DataFrame(users["address_str"].unique(), columns=["address_str"])
addresses["AddressId"] = range(1, len(addresses) + 1)

# Map the string representation back to the original dictionary
addresses["address"] = addresses["address_str"].apply(safe_eval)

# Drop the string representation column
addresses.drop(columns=["address_str"], inplace=True)

# Map 'AddressId' back to users
address_map = {str(row["address"]): row["AddressId"] for _, row in addresses.iterrows()}
users["AddressId"] = users["address"].apply(lambda x: address_map.get(str(x)))

# Drop the 'address' column from users
users.drop(columns=["address"], inplace=True)

# Parse the 'name' column and split it into 'firstname' and 'lastname'
users["name"] = users["name"].apply(safe_eval)
users["firstname"] = users["name"].apply(lambda x: x["firstname"] if isinstance(x, dict) else None)
users["lastname"] = users["name"].apply(lambda x: x["lastname"] if isinstance(x, dict) else None)
users.drop(columns=["name","address_str","__v"], inplace=True)



# Split the 'address' column in the addresses table
addresses["geolocation"] = addresses["address"].apply(lambda x: x["geolocation"] if isinstance(x, dict) else None)
addresses["city"] = addresses["address"].apply(lambda x: x["city"] if isinstance(x, dict) else None)
addresses["street"] = addresses["address"].apply(lambda x: x["street"] if isinstance(x, dict) else None)
addresses["number"] = addresses["address"].apply(lambda x: x["number"] if isinstance(x, dict) else None)
addresses["zipcode"] = addresses["address"].apply(lambda x: x["zipcode"] if isinstance(x, dict) else None)

# Split 'geolocation' into 'lat' and 'long'
addresses["latitude"] = addresses["geolocation"].apply(lambda x: x["lat"] if isinstance(x, dict) else None)
addresses["longitude"] = addresses["geolocation"].apply(lambda x: x["long"] if isinstance(x, dict) else None)

# Convert latitude and longitude to float
addresses["latitude"] = addresses["latitude"].astype(float)
addresses["longitude"] = addresses["longitude"].astype(float)

# Drop the original 'address' and 'geolocation' columns
addresses.drop(columns=["address", "geolocation"], inplace=True)

users.rename(columns={"id": "UsersId"}, inplace=True)
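
As an alternative to extracting each address field with its own `apply`, `pd.json_normalize` can flatten nested dictionaries in one call. The sketch below assumes a single illustrative record shaped like a parsed `address` value; it is a possible simplification, not the approach used in the cell above.

```python
import pandas as pd

# One illustrative record shaped like a parsed entry of the 'address' column.
records = [
    {
        "geolocation": {"lat": "-37.3159", "long": "81.1496"},
        "city": "kilcoole",
        "street": "new road",
        "number": 7682,
        "zipcode": "12926-3874",
    }
]

# Nested keys become dotted column names such as 'geolocation.lat'.
flat = pd.json_normalize(records)
flat = flat.rename(columns={"geolocation.lat": "latitude", "geolocation.long": "longitude"})

# Coordinates arrive as strings, so convert them to floats.
flat[["latitude", "longitude"]] = flat[["latitude", "longitude"]].astype(float)
print(flat.columns.tolist())
```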

In [17]:
# Print for verification
print("Addresses Table:")
addresses

Addresses Table:


Unnamed: 0,AddressId,city,street,number,zipcode,latitude,longitude
0,1,kilcoole,new road,7682,12926-3874,-37.3159,81.1496
1,2,kilcoole,Lovers Ln,7267,12926-3874,-37.3159,81.1496
2,3,Cullman,Frances Ct,86,29567-1452,40.3467,-30.131
3,4,San Antonio,Hunters Creek Dr,6454,98234-1734,50.3467,-20.131
4,5,san Antonio,adams St,245,80796-1234,40.3467,-40.131
5,6,el paso,prospect st,124,12346-0456,20.1677,-10.6789
6,7,fresno,saddle st,1342,96378-0245,10.3456,20.6419
7,8,mesa,vally view ln,1342,96378-0245,50.3456,10.6419
8,9,miami,avondale ave,345,96378-0245,40.12456,20.5419
9,10,fort wayne,oak lawn ave,526,10256-4532,30.24788,-20.545419


In [18]:
print("Users Table:")
users

Users Table:


Unnamed: 0,UsersId,email,username,password,phone,AddressId,firstname,lastname
0,1,john@gmail.com,johnd,m38rmF$,1-570-236-7033,1,john,doe
1,2,morrison@gmail.com,mor_2314,83r5^_,1-570-236-7033,2,david,morrison
2,3,kevin@gmail.com,kevinryan,kev02937@,1-567-094-1345,3,kevin,ryan
3,4,don@gmail.com,donero,ewedon,1-765-789-6734,4,don,romer
4,5,derek@gmail.com,derek,jklg*_56,1-956-001-1945,5,derek,powell
5,6,david_r@gmail.com,david_r,3478*#54,1-678-345-9856,6,david,russell
6,7,miriam@gmail.com,snyder,f238&@*$,1-123-943-0563,7,miriam,snyder
7,8,william@gmail.com,hopkins,William56$hj,1-478-001-0890,8,william,hopkins
8,9,kate@gmail.com,kate_h,kfejk@*_,1-678-456-1934,9,kate,hale
9,10,jimmie@gmail.com,jimmie_k,klein*#%*,1-104-001-4567,10,jimmie,klein
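
Before loading into a database with foreign keys, it may help to confirm that every key in the fact table exists in its dimension table. A minimal sketch, assuming small made-up frames and a hypothetical `orphan_keys` helper:

```python
import pandas as pd

# Hypothetical miniature fact and dimension tables.
carts = pd.DataFrame({"CartsId": [1, 2, 3], "userId": [1, 3, 8], "productId": [1, 7, 18]})
users = pd.DataFrame({"UsersId": [1, 2, 3]})


def orphan_keys(fact, fact_col, dim, dim_col):
    """Return the sorted fact-table keys that have no match in the dimension table."""
    missing = set(fact[fact_col].tolist()) - set(dim[dim_col].tolist())
    return sorted(missing)


print(orphan_keys(carts, "userId", users, "UsersId"))
```

The same helper could be pointed at `productId` and `ProductsId`, or at `AddressId` in the users and addresses tables.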


### Save the Processed Tables

In [19]:
# Save the processed tables
products.to_csv("../data/processed/dimension_products.csv", index=False)
categories.to_csv("../data/processed/dimension_categories.csv", index=False)
carts.to_csv("../data/processed/fact_carts.csv", index=False)
users.to_csv("../data/processed/dimension_users.csv", index=False)
addresses.to_csv("../data/processed/dimension_address.csv", index=False)

print("All processed tables have been saved successfully.")

All processed tables have been saved successfully.
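
One thing to keep in mind when these files are read back: CSV stores no dtype information, so a datetime column returns as text unless it is parsed again. The sketch below uses an in-memory buffer and a made-up two-row frame to illustrate this; the column names mirror the carts table, but the data is hypothetical.

```python
import io

import pandas as pd

# Hypothetical two-row fact table with a parsed datetime column.
fact = pd.DataFrame({"CartsId": [1, 2], "date": pd.to_datetime(["2020-03-02", "2020-01-02"])})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
fact.to_csv(buffer, index=False)

# Read back without and with explicit date parsing.
buffer.seek(0)
plain = pd.read_csv(buffer)
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["date"])

print(plain["date"].dtype, parsed["date"].dtype)
```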
