# Technical Challenge: Customers and Orders - The lifeblood of any business!

This notebook depends on two data files: customers.csv and orders.csv. These files are hosted on GitHub, so for simplicity we provide the code to download them and save them in a folder called 'sample_data', which lives inside this notebook session (not on your local computer or in Google Drive).

![sample_data folder in notebook session](https://raw.githubusercontent.com/anyoneai/notebooks/main/customers_and_orders/images/sample_data_folder.png)

Please run the first code cell before moving on to the rest of the exercise: it downloads the data files that every later cell depends on.

Remember, as we said in the previous lecture, to save this notebook to your Google Drive through the menu File > Save a copy in Drive, or by clicking the "Copy to Drive" button. That way, you will not lose any work and the file will persist with your latest changes in your Google Drive. We also recommend renaming your .ipynb file in Google Drive so you can easily find it later.

In [None]:
import os
import requests

def import_data_files():
  # Make sure the target folder exists (Colab sessions already provide it)
  os.makedirs('./sample_data', exist_ok=True)

  r = requests.get('https://raw.githubusercontent.com/anyoneai/notebooks/main/customers_and_orders/data/customers.csv')
  r.raise_for_status()  # Fail loudly if the download did not succeed
  with open('./sample_data/customers.csv', 'wb') as f:
    f.write(r.content)

  r = requests.get('https://raw.githubusercontent.com/anyoneai/notebooks/main/customers_and_orders/data/orders.csv')
  r.raise_for_status()  # Fail loudly if the download did not succeed
  with open('./sample_data/orders.csv', 'wb') as f:
    f.write(r.content)

import_data_files()
print("Customers and orders CSV files have been added to './sample_data'")

Customers and orders CSV files have been added to './sample_data'


# Exercise 1: Processing Customers data (difficulty medium)

The sample customer data in the 'customers.csv' file has just 5 columns: CustomerID, FirstName, LastName, City and State.

![Data sample](https://raw.githubusercontent.com/anyoneai/notebooks/main/customers_and_orders/images/customers.png)

We strongly recommend that you complete the following sections of the [Prep Course: Intro to Python](https://colab.research.google.com/github/anyoneai/notebooks/blob/main/python3_crash_course.ipynb):
- Section 7: File I/O (to understand how to read a CSV file)
- Section 6: For Loop (to navigate the contents of the CSV file)
- Section 5: Tuples, Lists, and Dictionaries (to manipulate the data of the CSV file)

With these, we hope you can complete this exercise successfully. If you prefer to solve it with libraries or in any other way, you are welcome to do so.

*Hint:* We advise you to take a look at the data before you start.
If you want to inspect the data manually, you can view the file [here](https://github.com/anyoneai/notebooks/blob/main/customers_and_orders/data/customers.csv).

*Hint*: There are many ways to do this exercise, and you are free to use your own. One approach is to read and parse the CSV files, structure the data into dictionaries, and use for loops to navigate the contents.

*Hint*: Also, keep in mind that data might not be clean and you might have to figure out how to deal with that data from the code, without having to modify the data source.
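
As a minimal sketch of the approach described in the hints above, the following reads CSV text into dictionaries with `csv.DictReader` and normalizes a field before counting it. The rows are made-up sample data held in memory (they are not taken from customers.csv), and `normalize` is a hypothetical helper name chosen for this illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample rows, for illustration only (not the real customers.csv)
sample_csv = io.StringIO(
    "CustomerID,FirstName,LastName,City,State\n"
    "1,Ana,Lopez,Reno,nv\n"
    "2,Ben,Smith,Fresno, CA \n"
    "3,Cal,Smith,Oakland,ca\n"
)

def normalize(value):
    """Trim surrounding whitespace and upper-case a text field."""
    return value.strip().upper()

# DictReader uses the header row as keys, so columns are accessed by name
state_counts = Counter(
    normalize(row["State"]) for row in csv.DictReader(sample_csv)
)

print(state_counts)
```

Cleaning values as they are read keeps the source file untouched, which is what the last hint asks for.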

Below are the 5 questions you'll have to answer to pass the evaluation:

**Question 1:** How many customers are in the file?
(We have added some comments and starter code to help you structure the solution.)

In [None]:
# Import necessary libraries
import csv
from os.path import exists

# Path to the customers file
customers_file = "./sample_data/customers.csv"

# Check if the file exists
if not exists(customers_file):
    raise SystemExit("🚨 ERROR: You must run the first code cell to download the data files!")

# Function to show column names
def show_column_names(file_path):
    with open(file_path, 'r') as fl:
        csvreader = csv.reader(fl, delimiter=',')
        # Get the header (column names)
        header = next(csvreader)
        print(f"\n📂 Columns in {file_path}:")
        print("┌" + "─" * (len(', '.join(header)) + 2) + "┐")
        print(f"│ {', '.join(header)} │")
        print("└" + "─" * (len(', '.join(header)) + 2) + "┘")

# Show the columns of the customers.csv file
show_column_names("./sample_data/customers.csv")

# Show the columns of the orders.csv file
show_column_names("./sample_data/orders.csv")

# Initialize a counter for customers
customer_count = 0

# Open the CSV file and read the rows
with open(customers_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')
    # Skip the header
    header = next(csvreader)
    # Count the remaining rows
    for row in csvreader:
        customer_count += 1

# Show the total number of customers
print("\n" + "★" * 50)
print(f"🌟 The total number of customers is: {customer_count}")
print("★" * 50)

print("\n✅ All set!")


📂 Columns in ./sample_data/customers.csv:
┌──────────────────────────────────────────────┐
│ CustomerID, FirstName, LastName, City, State │
└──────────────────────────────────────────────┘

📂 Columns in ./sample_data/orders.csv:
┌───────────────────────────────────────────────────────────┐
│ CustomerID, OrderID, Date, OrderTotal, ProductName, Price │
└───────────────────────────────────────────────────────────┘

★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★
🌟 The total number of customers is: 602
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★

✅ All set!


**Question 2:** In how many different states do the customers live?

In [None]:
# Path to the customers file
customers_file = "./sample_data/customers.csv"

# Initialize a set to store unique states
unique_states = set()

# Open the CSV file and read the rows
with open(customers_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')

    # Skip the header
    header = next(csvreader)

    # Identify the state column
    state_index = header.index("State")

    # Collect unique states with normalization
    for row in csvreader:
        # Normalize the state to uppercase
        state = row[state_index].strip().upper()
        unique_states.add(state)

# Calculate the number of unique states
num_states = len(unique_states)

# Display the result with a graphical layout
print("\n" + "=" * 50)
print("🌎  STATE ANALYSIS IN THE CSV FILE  🌎")
print("=" * 50)
print(f"📊 Total unique states found: {num_states}")
print("📍 List of unique states:")
print("┌" + "─" * 48 + "┐")
for state in sorted(unique_states):
    print(f"│ {state.ljust(46)} │")
print("└" + "─" * 48 + "┘")
print("\n✅ Analysis completed successfully!")


🌎  STATE ANALYSIS IN THE CSV FILE  🌎
📊 Total unique states found: 14
📍 List of unique states:
┌────────────────────────────────────────────────┐
│ AZ                                             │
│ CA                                             │
│ CO                                             │
│ FL                                             │
│ ID                                             │
│ IN                                             │
│ MA                                             │
│ NH                                             │
│ NM                                             │
│ NV                                             │
│ OR                                             │
│ TX                                             │
│ UT                                             │
│ WA                                             │
└────────────────────────────────────────────────┘

✅ Analysis completed successfully!


**Question 3:** Which state has the most customers?

In [None]:
# Import necessary libraries
from collections import Counter

# Path to the customers file
customers_file = "./sample_data/customers.csv"

# Initialize a counter for states
state_counter = Counter()

# Open the CSV file and count the states
with open(customers_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')

    # Skip the header
    header = next(csvreader)

    # Identify the index of the "State" column
    state_index = header.index("State")

    # Count the occurrences of each state, normalizing to uppercase
    for row in csvreader:
        state = row[state_index].strip().upper()  # Normalize the state
        state_counter[state] += 1

# Find the state with the most customers
most_common_state = state_counter.most_common(1)[0]  # Returns (state, number of customers)

# Display the result with a graphical layout
print("\n" + "=" * 50)
print("🌎  CUSTOMER ANALYSIS BY STATE")
print("=" * 50)
print(f"🏆 State with the most customers: {most_common_state[0]}")
print(f"👥 Number of customers: {most_common_state[1]}")

# Show the complete distribution of customers by state
print("\n📊 Complete distribution of customers by state:")
print("┌" + "─" * 30 + "┬" + "─" * 15 + "┐")
print(f"│ {'State'.ljust(30)} │ {'Customers'.rjust(13)} │")
print("├" + "─" * 30 + "┼" + "─" * 15 + "┤")
for state, count in state_counter.most_common():
    print(f"│ {state.ljust(30)} │ {str(count).rjust(13)} │")
print("└" + "─" * 30 + "┴" + "─" * 15 + "┘")
print("\n✅ Analysis completed successfully!")


🌎  CUSTOMER ANALYSIS BY STATE
🏆 State with the most customers: CA
👥 Number of customers: 569

📊 Complete distribution of customers by state:
┌──────────────────────────────┬───────────────┐
│ State                          │     Customers │
├──────────────────────────────┼───────────────┤
│ CA                             │           569 │
│ NV                             │             8 │
│ AZ                             │             6 │
│ FL                             │             3 │
│ CO                             │             3 │
│ NM                             │             3 │
│ TX                             │             2 │
│ UT                             │             2 │
│ WA                             │             1 │
│ NH                             │             1 │
│ ID                             │             1 │
│ OR                             │             1 │
│ MA                             │             1 │
│ IN                             │            

**Question 4:** Which state has the fewest customers?

In [None]:
# Import necessary libraries
from collections import Counter

# Path to the customers file
customers_file = "./sample_data/customers.csv"

# Initialize a counter for states
state_counter = Counter()

# Open the CSV file and count the states
with open(customers_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')

    # Skip the header
    header = next(csvreader)

    # Identify the index of the "State" column
    state_index = header.index("State")

    # Count the occurrences of each state, normalizing to uppercase
    for row in csvreader:
        state = row[state_index].strip().upper()  # Normalize the state
        state_counter[state] += 1

# Find the state with the fewest customers
# Note: if several states tie for the lowest count, min() returns only the first one it encounters
least_common_state = min(state_counter.items(), key=lambda x: x[1])  # Returns (state, number of customers)

# Display the result with a graphical layout
print("\n" + "=" * 50)
print("🌟  CUSTOMER ANALYSIS BY STATE")
print("=" * 50)
print(f"🏅 State with the least number of customers: {least_common_state[0]}")
print(f"👥 Number of customers: {least_common_state[1]}")
print("\n📊 Complete distribution of customers by state:")
print("┌" + "─" * 30 + "┬" + "─" * 15 + "┐")
print(f"│ {'State'.ljust(30)} │ {'Customers'.rjust(13)} │")
print("├" + "─" * 30 + "┼" + "─" * 15 + "┤")
for state, count in state_counter.most_common():
    print(f"│ {state.ljust(30)} │ {str(count).rjust(13)} │")
print("└" + "─" * 30 + "┴" + "─" * 15 + "┘")
print("\n✅ Analysis completed successfully!")


🌟  CUSTOMER ANALYSIS BY STATE
🏅 State with the least number of customers: WA
👥 Number of customers: 1

📊 Complete distribution of customers by state:
┌──────────────────────────────┬───────────────┐
│ State                          │     Customers │
├──────────────────────────────┼───────────────┤
│ CA                             │           569 │
│ NV                             │             8 │
│ AZ                             │             6 │
│ FL                             │             3 │
│ CO                             │             3 │
│ NM                             │             3 │
│ TX                             │             2 │
│ UT                             │             2 │
│ WA                             │             1 │
│ NH                             │             1 │
│ ID                             │             1 │
│ OR                             │             1 │
│ MA                             │             1 │
│ IN                             │   
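
Because `min()` returns only the first state that reaches the lowest count, ties stay hidden. The sketch below lists every state that shares the minimum. It uses a subset of the counts copied from the fully visible rows of the distribution table above, and `least_common_states` is an illustrative name rather than part of the original solution.

```python
from collections import Counter

# Subset of counts copied from the distribution table (fully visible rows only)
state_counter = Counter(
    {"CA": 569, "NV": 8, "AZ": 6, "TX": 2, "WA": 1, "NH": 1, "ID": 1}
)

# Lowest count, then every state that shares it
lowest = min(state_counter.values())
least_common_states = sorted(
    state for state, count in state_counter.items() if count == lowest
)

print(lowest, least_common_states)
```

Reporting all tied states makes it clear that the single name printed by the cell above is one of several equally valid answers.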

**Question 5:** What is the most common last name?

In [None]:
# Import necessary libraries
from collections import Counter

# Path to the customers file
customers_file = "./sample_data/customers.csv"

# Initialize a counter for last names
last_name_counter = Counter()

# Open the CSV file and count the last names
with open(customers_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')

    # Skip the header
    header = next(csvreader)

    # Identify the index of the "LastName" column
    last_name_index = header.index("LastName")

    # Count the occurrences of each last name
    for row in csvreader:
        last_name_counter[row[last_name_index]] += 1

# Find the most common last name
most_common_last_name = last_name_counter.most_common(1)[0]  # Returns (last name, number of occurrences)

# Display the result with a graphical layout
print("\n" + "=" * 50)
print("🌟  LAST NAME ANALYSIS IN THE RECORD")
print("=" * 50)
print(f"🏅 Most common last name: {most_common_last_name[0]}")
print(f"👥 Number of occurrences: {most_common_last_name[1]}")
print("\n📊 Complete distribution of last names:")
print("┌" + "─" * 30 + "┬" + "─" * 15 + "┐")
print(f"│ {'Last Name'.ljust(30)} │ {'Occurrences'.rjust(13)} │")
print("├" + "─" * 30 + "┼" + "─" * 15 + "┤")
for last_name, count in last_name_counter.most_common(10):  # Display the 10 most common last names
    print(f"│ {last_name.ljust(30)} │ {str(count).rjust(13)} │")
print("└" + "─" * 30 + "┴" + "─" * 15 + "┘")
print("\n✅ Analysis completed successfully!")


🌟  LAST NAME ANALYSIS IN THE RECORD
🏅 Most common last name: Smith
👥 Number of occurrences: 8

📊 Complete distribution of last names:
┌──────────────────────────────┬───────────────┐
│ Last Name                      │   Occurrences │
├──────────────────────────────┼───────────────┤
│ Smith                          │             8 │
│ Gomez                          │             5 │
│ Zambrana                       │             5 │
│ Doggett                        │             5 │
│ Huynh                          │             4 │
│ Rocha                          │             4 │
│ Reyes                          │             4 │
│ McMahon                        │             4 │
│ Garcia                         │             4 │
│ Gonzalez                       │             4 │
└──────────────────────────────┴───────────────┘

✅ Analysis completed successfully!


# Exercise 2: Processing Orders data (difficulty high)

The second sample file contains orders placed by customers from the first file. Be careful: this file has many rows, so you should avoid printing its entire contents.

The file contains the following columns: CustomerID, OrderID, Date, OrderTotal, ProductName, Price

![Data sample](https://raw.githubusercontent.com/anyoneai/notebooks/main/customers_and_orders/images/orders.png)

*Hint:* We advise you to take a look at the data before you start.
If you want to inspect the data manually, you can view the file [here](https://raw.githubusercontent.com/anyoneai/notebooks/main/customers_and_orders/data/orders.csv).

*Hint*: There are many ways to do this exercise, and you are free to use your own. One approach is to read and parse the CSV files, structure the data into dictionaries, and use for loops to navigate the contents.

*Hint*: Also, the data is not clean, and you will have to handle that in your code without modifying the data source.



**Question #1:** How many unique orders are in the orders.csv file?

**Question #2:** What is the average number of items per order (rounded to two decimal places)?

**Question #3:** What is the highest number of items per order?

**Question #4:** What is the number of orders placed in October 2021?

**Question #5:** Which customer spent the most amount of money in 2021?

**Question #6:** Historically, what is the best month for sales?

Once you have your answers, remember to go back to the course and enter them in the multiple-choice quiz.

In [None]:
# Path to the orders file
orders_file = "./sample_data/orders.csv"

# Read and display the data
with open(orders_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')

    # Read the header
    header = next(csvreader)

    # Graphical header
    print("\n" + "=" * 60)
    print("📦  ORDERS FILE REVIEW")
    print("=" * 60)
    print(f"🔑 Columns found: {', '.join(header)}")
    print("=" * 60)

    # Display the first 8 rows
    print("🔍 First 8 rows of the file:\n")
    print("┌" + "─" * 10 + "┬" + "─" * (len(header) * 15) + "┐")
    print(f"│ {'Row #'.center(10)} │ {' | '.join(header).center(len(header) * 15)} │")
    print("├" + "─" * 10 + "┼" + "─" * (len(header) * 15) + "┤")

    for i, row in enumerate(csvreader):
        if i < 8:
            print(f"│ {str(i+1).center(10)} │ {' | '.join(row).center(len(header) * 15)} │")
        else:
            break

    print("└" + "─" * 10 + "┴" + "─" * (len(header) * 15) + "┘")
    print("\n✅ Review completed successfully!")


📦  ORDERS FILE REVIEW
🔑 Columns found: CustomerID, OrderID, Date, OrderTotal, ProductName, Price
🔍 First 8 rows of the file:

┌──────────┬──────────────────────────────────────────────────────────────────────────────────────────┐
│   Row #    │               CustomerID | OrderID | Date | OrderTotal | ProductName | Price               │
├──────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│     1      │             8091 | 7742581 | 2021-07-26 14:40:10.783 | 95.0000 | Z03 | 90.0000             │
│     2      │         902139 | 7742778 | 2021-08-08 05:01:21.120 | 60.0000 | 0844 A/C | 60.0000          │
│     3      │           2300266 | 7742593 | 2021-07-27 11:00:16.020 | 185.0000 | M07 | 90.0000           │
│     4      │           2300266 | 7742593 | 2021-07-27 11:00:16.020 | 185.0000 | M09 | 90.0000           │
│     5      │          5173013 | 7742609 | 2021-07-28 14:26:13.930 | 165.0000 | 0324 | 160.0000          │
│     6      │   

**Question #1:** How many unique orders are in the orders.csv file?

In [None]:
# Path to the orders file
orders_file = "./sample_data/orders.csv"

# Initialize a set to store unique order IDs
unique_order_ids = set()

# Read the CSV file and extract the order IDs
with open(orders_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')

    # Skip the header
    header = next(csvreader)

    # Identify the index of the "OrderID" column
    order_id_index = header.index("OrderID")

    # Add each OrderID to the set
    for row in csvreader:
        unique_order_ids.add(row[order_id_index])

# Count the number of unique orders
num_unique_orders = len(unique_order_ids)

# Display the result with graphical design
print("\n" + "=" * 60)
print("📋  UNIQUE ORDER ANALYSIS")
print("=" * 60)
print(f"🔑 Total unique orders found: {num_unique_orders}")
print("=" * 60)
print("✅ Analysis completed!")
print("=" * 60)


📋  UNIQUE ORDER ANALYSIS
🔑 Total unique orders found: 16672
✅ Analysis completed!


**Question #2:** What is the average number of items per order (rounded to two decimal places)?

In [None]:
# Path to the orders file
orders_file = "./sample_data/orders.csv"

# Initialize a counter for items and a set for unique order IDs
total_items = 0
unique_order_ids = set()

# Read the CSV file and count items and orders
with open(orders_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')

    # Skip the header
    header = next(csvreader)

    # Identify the index of the "OrderID" column
    order_id_index = header.index("OrderID")

    # Each row is one item; rows that share an OrderID belong to the same order
    for row in csvreader:
        total_items += 1
        unique_order_ids.add(row[order_id_index])

# Calculate the average number of items per order
total_orders = len(unique_order_ids)
average_items_per_order = total_items / total_orders

# Display the result with graphical design
print("\n" + "=" * 60)
print("📦  ORDER AND ITEM ANALYSIS")
print("=" * 60)
print(f"📋 Total orders processed: {total_orders}")
print(f"🛒 Total items in all orders: {total_items}")
print(f"🔢 Average items per order: {average_items_per_order:.2f}")
print("=" * 60)
print("✅ Calculation successfully completed!")
print("=" * 60)



📦  ORDER AND ITEM ANALYSIS
📋 Total orders processed: 16672
🛒 Total items in all orders: 29294
🔢 Average items per order: 1.76
✅ Calculation successfully completed!


**Question #3:** What is the highest number of items per order?

In [None]:
# Import necessary libraries
from collections import defaultdict

# Path to the orders file
orders_file = "./sample_data/orders.csv"

# Initialize a dictionary to count the number of items per OrderID
order_item_count = defaultdict(int)

# Open the CSV file and count items per order
with open(orders_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')

    # Skip the header
    header = next(csvreader)

    # Identify the index of the "OrderID" column
    order_id_index = header.index("OrderID")

    # Count items per order
    for row in csvreader:
        order_item_count[row[order_id_index]] += 1

# Find the highest number of items per order
max_items_per_order = max(order_item_count.values())

# Display the result with graphical design
print("\n" + "=" * 60)
print("🛒  ORDER ANALYSIS")
print("=" * 60)
print(f"📦 Total orders analyzed: {len(order_item_count)}")
print(f"📈 Highest number of items in a single order: {max_items_per_order}")
print("=" * 60)
print("✅ Analysis successfully completed!")
print("=" * 60)



🛒  ORDER ANALYSIS
📦 Total orders analyzed: 16672
📈 Highest number of items in a single order: 35
✅ Analysis successfully completed!
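
A single `Counter` keyed by OrderID can answer both the average and the maximum items-per-order questions. The sketch below assumes, as the cells above do, that each row of the file is one item; the OrderID values are made up for illustration.

```python
from collections import Counter

# Made-up OrderID values, one entry per item row (illustration only)
order_ids = ["100", "100", "101", "102", "102", "102"]

# Number of item rows per order
items_per_order = Counter(order_ids)

# Largest order: most_common(1) returns [(order_id, item_count)]
largest_order_id, largest_item_count = items_per_order.most_common(1)[0]

# Average items per order: total item rows divided by distinct orders
average_items = round(sum(items_per_order.values()) / len(items_per_order), 2)

print(largest_order_id, largest_item_count, average_items)
```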


**Question #4:** What is the number of orders placed in October 2021?

In [None]:
# Import necessary libraries
from datetime import datetime

# Path to the orders file
orders_file = "./sample_data/orders.csv"

# Initialize a counter for orders in October 2021
october_orders_count = 0

# Open the CSV file and process the orders
with open(orders_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')

    # Skip the header
    header = next(csvreader)

    # Identify the index of the "Date" column
    date_index = header.index("Date")

    # Filter and count rows dated October 2021
    # Note: each row is one item, so an order with several items is counted once per item
    for row in csvreader:
        order_date = row[date_index]

        # Convert the date to a datetime object
        try:
            order_date_obj = datetime.strptime(order_date, "%Y-%m-%d %H:%M:%S.%f")

            # Check if the date is from October 2021
            if order_date_obj.strftime("%Y-%m") == "2021-10":
                october_orders_count += 1
        except ValueError:
            # Handle potential date format errors
            continue

# Display the number of orders placed in October 2021 with graphical design
print("\n" + "=" * 60)
print("📅  OCTOBER 2021 ORDER ANALYSIS")
print("=" * 60)
# print(f"🔍 Total orders processed: {len(header) - 1}")
print(f"🍂 Orders made in October 2021: {october_orders_count}")
print("=" * 60)
print("✅ Analysis successfully completed!")
print("=" * 60)



📅  OCTOBER 2021 ORDER ANALYSIS
🍂 Orders made in October 2021: 437
✅ Analysis successfully completed!
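
The cell above counts rows, so an order with several items contributes more than once, and any date that does not match the single expected format is skipped silently. A minimal sketch of one alternative follows: it collects distinct OrderID values and tries more than one date format. The rows are made up, and both `DATE_FORMATS` and `parse_date` are hypothetical names; the formats actually present in orders.csv are an assumption to verify against the data.

```python
import csv
import io
from datetime import datetime

# Made-up rows for illustration; the real orders.csv may differ
sample_csv = io.StringIO(
    "CustomerID,OrderID,Date,OrderTotal,ProductName,Price\n"
    "1,100,2021-10-02 09:15:00.120,50.0,A,25.0\n"
    "1,100,2021-10-02 09:15:00.120,50.0,B,25.0\n"
    "2,101,2021-10-15 18:30:00,20.0,C,20.0\n"
    "3,102,2021-11-01 08:00:00.000,10.0,D,10.0\n"
    "4,103,,10.0,E,10.0\n"
)

# Candidate formats are an assumption; extend the tuple if the data needs it
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")

def parse_date(text):
    """Return a datetime for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None

# A set keeps each OrderID once, however many item rows the order has
october_order_ids = set()
for row in csv.DictReader(sample_csv):
    parsed = parse_date(row["Date"])
    if parsed is not None and (parsed.year, parsed.month) == (2021, 10):
        october_order_ids.add(row["OrderID"])

print(len(october_order_ids))
```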


**Question #5:** Which customer spent the most amount of money in 2021?

In [None]:
# Import necessary libraries
from datetime import datetime
from collections import defaultdict

# Path to the files
orders_file = "./sample_data/orders.csv"
customers_file = "./sample_data/customers.csv"

# Initialize a dictionary to accumulate the spending of each customer in 2021
customer_spend = defaultdict(float)

# Create a dictionary to store the CustomerID -> Name mapping
customer_names = {}

# Initialize a dictionary to count rows with empty dates per customer
clients_with_empty_dates = defaultdict(int)

# Read the data from customers.csv to get the customer names
with open(customers_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')
    header = next(csvreader)

    # Identify the index of the columns "CustomerID" and "FirstName", "LastName"
    customer_id_index = header.index("CustomerID")
    first_name_index = header.index("FirstName")
    last_name_index = header.index("LastName")

    # Save the customer names
    for row in csvreader:
        customer_id = row[customer_id_index]
        first_name = row[first_name_index]
        last_name = row[last_name_index]
        customer_names[customer_id] = f"{first_name} {last_name}"

# Open the orders file and process the orders
with open(orders_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')
    header = next(csvreader)

    # Identify the indices of the columns "CustomerID", "OrderID", "Date", and "OrderTotal"
    customer_id_index = header.index("CustomerID")
    order_id_index = header.index("OrderID")
    date_index = header.index("Date")
    order_total_index = header.index("OrderTotal")

    # OrderTotal is repeated on every item row of an order (see the sample rows
    # for OrderID 7742593), so remember which orders were already added
    counted_order_ids = set()

    # Filter and accumulate the spending of each customer in 2021
    for row in csvreader:
        order_date = row[date_index]
        order_total = float(row[order_total_index])
        customer_id = row[customer_id_index]
        order_id = row[order_id_index]

        # Check if the date is empty
        if not order_date:
            # Count rows with empty dates per customer
            clients_with_empty_dates[customer_id] += 1
            continue  # Skip if the date is empty

        # Convert the date to a datetime object
        try:
            # Adjust this format if needed
            order_date_obj = datetime.strptime(order_date, "%Y-%m-%d %H:%M:%S.%f")

            # Add each 2021 order's total only once
            if order_date_obj.year == 2021 and order_id not in counted_order_ids:
                counted_order_ids.add(order_id)
                customer_spend[customer_id] += order_total
        except ValueError as e:
            print(f"Error converting date: {order_date} - Error: {e}")
            continue

# Display how many customers have empty dates
if clients_with_empty_dates:
    print(f"\n❌ Customers with empty dates: {len(clients_with_empty_dates)}")
    for client_id, count in clients_with_empty_dates.items():
        client_name = customer_names.get(client_id, "Unknown")
        print(f"Customer: {client_name} (ID: {client_id}) - Empty dates: {count} orders skipped")

# Find the customer who spent the most in 2021
if customer_spend:
    max_spend_customer_id = max(customer_spend, key=customer_spend.get)
    max_spend_amount = customer_spend[max_spend_customer_id]

    # Get the name of the customer with the highest spending
    max_spend_customer_name = customer_names.get(max_spend_customer_id, "Unknown")

    # Display the result with graphical format
    print("\n" + "=" * 60)
    print("💰  2021 SPENDING ANALYSIS  💰")
    print("=" * 60)
    print(f"🌟 The customer with the highest spending in 2021 was: {max_spend_customer_name}")
    print(f"🆔 Customer ID: {max_spend_customer_id}")
    print(f"💸 Total spent: ${max_spend_amount:.2f} USD")
    print("=" * 60)

    # Create a ranking of the top 5 customers by spending
    print("\n📝 Top 5 customers by spending in 2021:")
    sorted_customers = sorted(customer_spend.items(), key=lambda x: x[1], reverse=True)
    for i, (customer_id, spend) in enumerate(sorted_customers[:5]):
        customer_name = customer_names.get(customer_id, "Unknown")
        print(f"{i+1}. {customer_name} (ID: {customer_id}) - Total spending: ${spend:.2f} USD")

    print("=" * 60)
    print("✅ Analysis successfully completed!")
    print("=" * 60)
else:
    print("❌ No orders found in 2021.")



❌ Customers with empty dates: 57
Customer: Teresa Ascolese (ID: 5014) - Empty dates: 90 orders skipped
Customer: Maureen Amato Mayes (ID: 5068) - Empty dates: 46 orders skipped
Customer: Doe Harris (ID: 5053) - Empty dates: 42 orders skipped
Customer: Merrily Morris (ID: 5955) - Empty dates: 57 orders skipped
Customer: Todd Johnson (ID: 5572) - Empty dates: 87 orders skipped
Customer: Stephen Cohn (ID: 5889) - Empty dates: 29 orders skipped
Customer: Andre Tabak (ID: 7812) - Empty dates: 20 orders skipped
Customer: Mary Scates Johnson (ID: 5774) - Empty dates: 29 orders skipped
Customer: Craig Thompson (ID: 5971) - Empty dates: 87 orders skipped
Customer: Edna Fabbro (ID: 5756) - Empty dates: 29 orders skipped
Customer: Janet Hyatt (ID: 5721) - Empty dates: 26 orders skipped
Customer: Nicki Huard (ID: 5076) - Empty dates: 43 orders skipped
Customer: Richard Machtolff (ID: 5365) - Empty dates: 14 orders skipped
Customer: Miguel Campos (ID: 8462) - Empty dates: 3 orders skipped
Customer


**Question #6:** Historically, what is the best month for sales?

In [None]:
# Import necessary libraries
from collections import defaultdict
from datetime import datetime

# Path to the orders file
orders_file = "./sample_data/orders.csv"

# Initialize dictionaries to count orders by month and store total sales by month
monthly_orders = defaultdict(int)
monthly_sales = defaultdict(float)

# Open the CSV file and process the orders
with open(orders_file, 'r') as fl:
    csvreader = csv.reader(fl, delimiter=',')
    header = next(csvreader)

    # Identify the indices of the columns "Date" and "OrderTotal"
    date_index = header.index("Date")
    order_total_index = header.index("OrderTotal")

    # Filter and count the orders by month, as well as accumulate the sales
    for row in csvreader:
        order_date = row[date_index]
        try:
            # Convert the date to a datetime object to extract the month
            order_date_obj = datetime.strptime(order_date, "%Y-%m-%d %H:%M:%S.%f")
            month_year = order_date_obj.strftime("%m")  # Only the month, without the year
            monthly_orders[month_year] += 1

            # Add the sales for that month (using "OrderTotal")
            sales_amount = float(row[order_total_index]) if row[order_total_index] else 0
            monthly_sales[month_year] += sales_amount
        except ValueError:
            # Handle possible date format errors
            continue

# Calculate the average sales per month (regardless of the year)
monthly_average_sales = {}

for month in monthly_sales:
    total_sales_for_month = monthly_sales[month]
    total_orders_for_month = monthly_orders[month]

    if total_orders_for_month > 0:
        monthly_average_sales[month] = total_sales_for_month / total_orders_for_month
    else:
        monthly_average_sales[month] = 0

# Pick the month with the highest average OrderTotal per row
best_month = max(monthly_average_sales, key=monthly_average_sales.get)
best_month_value = monthly_average_sales[best_month]

# Display the result with graphical format
print("\n" + "=" * 60)
print("📊  HISTORICAL SALES ANALYSIS")
print("=" * 60)
print(f"🚀 The best month for historical sales was month {best_month}")
print(f"💵 Average sales in this month: ${best_month_value:.2f} USD")
print("=" * 60)
print("✅ Analysis successfully completed!")
print("=" * 60)



📊  HISTORICAL SALES ANALYSIS
🚀 The best month for historical sales was month 02
💵 Average sales in this month: $357.33 USD
✅ Analysis successfully completed!
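
The cell above ranks months by the average OrderTotal per row. Another reasonable reading of "best month for sales" is the month with the highest total revenue, with each order counted once. Which reading the quiz expects is not stated here, so the following is only a sketch of that alternative on made-up rows; `monthly_totals` and `seen_order_ids` are illustrative names.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Made-up rows for illustration (not the real orders.csv)
sample_csv = io.StringIO(
    "CustomerID,OrderID,Date,OrderTotal,ProductName,Price\n"
    "1,100,2020-02-10 10:00:00.000,300.0,A,150.0\n"
    "1,100,2020-02-10 10:00:00.000,300.0,B,150.0\n"
    "2,101,2021-03-05 12:00:00.000,200.0,C,200.0\n"
    "3,102,2021-03-20 12:00:00.000,250.0,D,250.0\n"
)

monthly_totals = defaultdict(float)  # month number -> summed OrderTotal
seen_order_ids = set()

for row in csv.DictReader(sample_csv):
    if row["OrderID"] in seen_order_ids:
        continue  # OrderTotal was already added for this order
    seen_order_ids.add(row["OrderID"])
    month = datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S.%f").month
    monthly_totals[month] += float(row["OrderTotal"])

# Month with the highest total across all years
best_month = max(monthly_totals, key=monthly_totals.get)

print(best_month, monthly_totals[best_month])
```

Grouping by month number alone merges the same calendar month across years, which matches the "historically" wording of the question.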


# Finished!

We hope this was not too difficult and that slicing and dicing the datasets was fun. Now head back to the course and submit your answers to the questions from this exercise.