**Customer Orders and Transactions**

Columns:
* Customer ID
* Customer Name
* Email
* Order ID
* Order Date
* Product Type
* Product ID
* Quantity
* Price
* Discount
* Total
* Order Status

**Features:**
Generate a large dataset of customer orders for an e-commerce site: simulate product purchases, calculate totals and discounts, and track order statuses such as "Shipped," "Delivered," and "Returned."

Manipulation Ideas:
* Calculate order totals with discounts.
* Filter by order status and date.
* Group by product to find best-sellers.
* Analyze customer order patterns over time.
* Generate reports on total sales per month or year.
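As a rough illustration of the ideas above, the sketch below filters, groups, and aggregates a tiny hand-written frame. The four rows are made-up values, not output from this notebook, and the column names simply follow the ones created in the cells further down.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2021-01-05", "2021-01-20", "2021-02-11", "2021-02-15"]
    ),
    "Product Type": ["Electronics", "Toys & Games", "Electronics", "Baby Products"],
    "Product Quantity": [2, 1, 3, 4],
    "Final Price": [200.0, 15.0, 330.0, 80.0],
    "Order Status": ["Shipped", "Delivered", "Delivered", "Shipped"],
})

# Filter by order status and date.
delivered_feb = orders[
    (orders["Order Status"] == "Delivered")
    & (orders["Order Date"] >= pd.Timestamp("2021-02-01"))
]

# Group by product type to find best-sellers (ranked by units sold).
best_sellers = (
    orders.groupby("Product Type")["Product Quantity"]
    .sum()
    .sort_values(ascending=False)
)

# Total sales per month.
monthly_sales = orders.groupby(orders["Order Date"].dt.to_period("M"))["Final Price"].sum()

print(delivered_feb)
print(best_sellers)
print(monthly_sales)
```

Ranking best-sellers by units rather than revenue is one possible choice; swapping `"Product Quantity"` for `"Final Price"` ranks by revenue instead.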

In [2]:
import random
import datetime
import faker
import pandas as pd
import matplotlib.pyplot as plt

In [3]:
fake = faker.Faker()

product_category = [
    "Electronics",
    "Sports & Outdoors",
    "Home Appliances",
    "Toys & Games",
    "Clothing & Apparel",
    "Baby Products",
    "Health & Beauty",
    "Office Supplies",
    "Software & Games",
    "Books & Stationery",
]

order_status = [
    "Shipping Soon", "Shipped", "Out For Delivery", "Delivered"
]

# 150 synthetic orders; each key becomes one DataFrame column
data = {
    # randint returns an int, so IDs below 100000 have fewer than six digits
    "Customer Id": [random.randint(0, 999999) for _ in range(150)],
    "First Name": [fake.first_name() for _ in range(150)],
    "Last Name": [fake.last_name() for _ in range(150)],
    "Order Date": [fake.date_this_century() for _ in range(150)],
    "Product Type": [random.choice(product_category) for _ in range(150)],
    "Product ID": [random.randint(0, 99999) for _ in range(150)],
    # unit price as a Decimal between 1 and 700 with two decimal places
    "Product Price (Per Item)": [
        fake.pydecimal(left_digits=3, right_digits=2, positive=True, min_value=1, max_value=700)
        for _ in range(150)
    ],
    "Product Quantity": [random.randint(1, 10) for _ in range(150)],
    "Order Status": [random.choice(order_status) for _ in range(150)],
}
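The cell above produces different values on every run, and `random.randint` can repeat a customer ID. If reproducible, unique IDs are wanted (an assumption; the notebook does not require either), one possible sketch uses a seeded private generator. The helper name `make_customer_ids` and the seed value are hypothetical. Faker keeps its own random state, which can be fixed separately with `Faker.seed()`.

```python
import random


def make_customer_ids(n, seed):
    """Return n unique six-digit customer IDs, reproducible for a given seed."""
    # A private generator leaves the global `random` state untouched.
    rng = random.Random(seed)
    # sample() draws without replacement, so no ID appears twice.
    return rng.sample(range(100000, 1000000), n)


ids_a = make_customer_ids(150, seed=42)
ids_b = make_customer_ids(150, seed=42)

print(ids_a == ids_b)        # same seed, same IDs
print(len(set(ids_a)))       # number of distinct IDs
```

Starting the range at 100000 keeps every ID at six digits, which differs from the `randint(0, 999999)` call in the cell above.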

In [4]:
# create the DataFrame
df = pd.DataFrame(data)

In [5]:
# build an email address from each customer's first and last name
email_address = [f"{first.lower()}.{last.lower()}@customer.com" for first, last in zip(df['First Name'], df['Last Name'])]
# index 3 places the column right after "Last Name"
df.insert(3, "Email Address", email_address)

# print the first 5 rows
print(df.head(5))

   Customer Id First Name Last Name                  Email Address  \
0       940022      James    Howard      james.howard@customer.com   
1       946912     Austin     Glenn      austin.glenn@customer.com   
2       875845      Colin    Watson      colin.watson@customer.com   
3       952587    Jeffery  Williams  jeffery.williams@customer.com   
4       884622     Joshua    Bryant     joshua.bryant@customer.com   

   Order Date        Product Type  Product ID Product Price (Per Item)  \
0  2002-02-27    Software & Games       11158                   635.32   
1  2014-03-25  Books & Stationery       31544                   157.64   
2  2005-08-30         Electronics       35754                   184.95   
3  2013-05-12   Sports & Outdoors       23561                   595.93   
4  2020-03-22        Toys & Games       94075                   201.80   

   Product Quantity      Order Status  
0                 9  Out For Delivery  
1                 7           Shipped  
2             

In [6]:
# total cost before discount = unit price * quantity
total_cost = df["Product Price (Per Item)"] * df["Product Quantity"]

# insert the total cost column into the DataFrame, just before "Order Status"
df.insert(9, "Total Cost", total_cost)

# print the first 5 rows
print(df.head(5))

   Customer Id First Name Last Name                  Email Address  \
0       940022      James    Howard      james.howard@customer.com   
1       946912     Austin     Glenn      austin.glenn@customer.com   
2       875845      Colin    Watson      colin.watson@customer.com   
3       952587    Jeffery  Williams  jeffery.williams@customer.com   
4       884622     Joshua    Bryant     joshua.bryant@customer.com   

   Order Date        Product Type  Product ID Product Price (Per Item)  \
0  2002-02-27    Software & Games       11158                   635.32   
1  2014-03-25  Books & Stationery       31544                   157.64   
2  2005-08-30         Electronics       35754                   184.95   
3  2013-05-12   Sports & Outdoors       23561                   595.93   
4  2020-03-22        Toys & Games       94075                   201.80   

   Product Quantity Total Cost      Order Status  
0                 9    5717.88  Out For Delivery  
1                 7    1103.48  
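Because `fake.pydecimal` returns `Decimal` objects, the price and total columns are stored with the generic `object` dtype. The sketch below, using the two price and quantity pairs visible in the output above, shows the difference after an explicit cast to `float`. Whether to cast is a judgment call: `Decimal` keeps exact cents, while `float64` is what most numeric pandas operations expect.

```python
from decimal import Decimal

import pandas as pd

# The first two price/quantity pairs from the printed output.
prices = pd.Series([Decimal("635.32"), Decimal("157.64")])
quantities = pd.Series([9, 7])

# Decimal * int stays Decimal, so the result column has dtype object.
total_obj = prices * quantities

# Casting to float first gives a regular float64 column.
total_float = prices.astype(float) * quantities

print(total_obj.dtype, total_float.dtype)
```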

In [7]:
# generate a random whole-number discount between 0% and 35% for each order
discount = [random.randint(0, 35) for _ in range(len(df))]
# insert the column just before "Order Status"
df.insert(10, "Discount (%)", discount)

print(df.head(5))

   Customer Id First Name Last Name                  Email Address  \
0       940022      James    Howard      james.howard@customer.com   
1       946912     Austin     Glenn      austin.glenn@customer.com   
2       875845      Colin    Watson      colin.watson@customer.com   
3       952587    Jeffery  Williams  jeffery.williams@customer.com   
4       884622     Joshua    Bryant     joshua.bryant@customer.com   

   Order Date        Product Type  Product ID Product Price (Per Item)  \
0  2002-02-27    Software & Games       11158                   635.32   
1  2014-03-25  Books & Stationery       31544                   157.64   
2  2005-08-30         Electronics       35754                   184.95   
3  2013-05-12   Sports & Outdoors       23561                   595.93   
4  2020-03-22        Toys & Games       94075                   201.80   

   Product Quantity Total Cost  Discount (%)      Order Status  
0                 9    5717.88            22  Out For Delivery  
1   

In [8]:
# discounted unit price = unit price * (1 - discount / 100)
discounted_price = [
    # convert the Decimal price to float and round to 2 decimal places
    round(float(price) * (1 - pct / 100), 2)
    for price, pct in zip(df['Product Price (Per Item)'], df['Discount (%)'])
]

# insert the discounted price into the DataFrame
# index 11 = 12th position
df.insert(11, "Discounted Price (Per Item)", discounted_price)
print(df.head(5))

   Customer Id First Name Last Name                  Email Address  \
0       940022      James    Howard      james.howard@customer.com   
1       946912     Austin     Glenn      austin.glenn@customer.com   
2       875845      Colin    Watson      colin.watson@customer.com   
3       952587    Jeffery  Williams  jeffery.williams@customer.com   
4       884622     Joshua    Bryant     joshua.bryant@customer.com   

   Order Date        Product Type  Product ID Product Price (Per Item)  \
0  2002-02-27    Software & Games       11158                   635.32   
1  2014-03-25  Books & Stationery       31544                   157.64   
2  2005-08-30         Electronics       35754                   184.95   
3  2013-05-12   Sports & Outdoors       23561                   595.93   
4  2020-03-22        Toys & Games       94075                   201.80   

   Product Quantity Total Cost  Discount (%)  Discounted Price (Per Item)  \
0                 9    5717.88            22             
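The discount cell converts each `Decimal` price to `float` before rounding. An alternative sketch, assuming exact cent arithmetic is preferred, stays in `Decimal` and rounds with `quantize`. The helper name `discounted_unit_price` and the half-up rounding mode are choices made for this example, not something the notebook specifies. The inputs are the price, discount, and quantity shown for row 0.

```python
from decimal import Decimal, ROUND_HALF_UP


def discounted_unit_price(price, discount_pct):
    """Apply a whole-number percentage discount and round to cents."""
    factor = Decimal(100 - discount_pct) / Decimal(100)
    return (price * factor).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Row 0 from the output above: price 635.32, discount 22%, quantity 9.
unit = discounted_unit_price(Decimal("635.32"), 22)
final = unit * 9

print(unit)   # 495.55
print(final)  # 4459.95
```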

In [9]:
# final price = discounted unit price * quantity
final_price = [
    round(unit_price * quantity, 2)
    for unit_price, quantity in zip(df['Discounted Price (Per Item)'], df['Product Quantity'])
]

# insert the final price column just before "Order Status"
df.insert(12, "Final Price", final_price)
print(df.head(5))

   Customer Id First Name Last Name                  Email Address  \
0       940022      James    Howard      james.howard@customer.com   
1       946912     Austin     Glenn      austin.glenn@customer.com   
2       875845      Colin    Watson      colin.watson@customer.com   
3       952587    Jeffery  Williams  jeffery.williams@customer.com   
4       884622     Joshua    Bryant     joshua.bryant@customer.com   

   Order Date        Product Type  Product ID Product Price (Per Item)  \
0  2002-02-27    Software & Games       11158                   635.32   
1  2014-03-25  Books & Stationery       31544                   157.64   
2  2005-08-30         Electronics       35754                   184.95   
3  2013-05-12   Sports & Outdoors       23561                   595.93   
4  2020-03-22        Toys & Games       94075                   201.80   

   Product Quantity Total Cost  Discount (%)  Discounted Price (Per Item)  \
0                 9    5717.88            22             

In [10]:
# save the dataset; index=True also writes the row index as the first column
df.to_csv("customer_orders.csv", index=True)
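When the file is loaded again, the saved index and the date column need a little care. The sketch below round-trips a two-row stand-in frame through an in-memory buffer instead of `customer_orders.csv`, so it runs on its own. The frame `sample` is hypothetical and reuses two dates and statuses from the output above.

```python
import datetime
import io

import pandas as pd

# Hypothetical two-row stand-in for df.
sample = pd.DataFrame({
    "Order Date": [datetime.date(2002, 2, 27), datetime.date(2014, 3, 25)],
    "Order Status": ["Out For Delivery", "Shipped"],
})

# In-memory stand-in for the CSV file on disk.
buffer = io.StringIO()
sample.to_csv(buffer, index=True)
buffer.seek(0)

# index_col=0 restores the saved index instead of creating an "Unnamed: 0" column;
# parse_dates turns the date strings back into datetime64 values.
restored = pd.read_csv(buffer, index_col=0, parse_dates=["Order Date"])

print(restored.dtypes)
```

Passing `index=False` to `to_csv` would avoid the extra column altogether, at the cost of dropping the row index from the file.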