

---

# Python Foundations Challenge Project (Level 1 – Real-Life Simulation)
## A hands-on project focused on mastering Python data structures

### Project Goal

You're a junior data scientist at a new company.  
Your task is to analyze customer data and prepare a summary report for your manager using core Python skills.

---

## Project Scenario

You've been given a basic dataset containing customer information for a small e-commerce business. Your manager has requested that you:

1. Clean up the raw data using appropriate data structures (lists, tuples, dictionaries, sets).
2. Perform basic data analysis (data type handling, list manipulations, dictionary operations).
3. Summarize and display useful information in a readable format.


---

### Dataset (hardcoded to simulate receiving raw data)

```python
# Customer IDs
customer_ids = [101, 102, 103, 104, 105, 106, 103, 102]

# Customer Names
customer_names = ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Charlie', 'Bob']

# Customer Locations
customer_locations = ('New York', 'California', 'Texas', 'Florida', 'New York', 'Texas', 'Florida', 'California')

# Purchases: dictionary of ID: total amount spent
purchases = {
    101: 250.50,
    102: 100.00,
    103: 340.75,
    104: 80.00,
    105: 150.50,
    106: 300.25
}

# Emails (some duplicates, some invalid)
emails = [
    'alice@gmail.com', 
    'bob@yahoo.com', 
    'charlie@gmail.com', 
    'diana@yahoo.com', 
    'eve@gmail.com', 
    'frank@outlook.com',
    'bob@yahoo.com',
    'invalidemail.com'
]
```

---

###  TASKS:

**Part A: Data Cleaning**

* 1. Find and remove duplicate customer IDs.
* 2. Find and remove duplicate customer names.
* 3. Remove invalid emails (a valid email should contain `@` and a dot `.`).
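
As a hint rather than a full solution, the sketch below shows two techniques that fit Part A, using made-up values (`sample_ids` and `sample_emails` are illustrative, not the project dataset). `dict.fromkeys` is one way to drop duplicates while keeping first-seen order. The email check simply applies the rule stated above and is not a full validator.

```python
# Hypothetical sample values, not the project dataset
sample_ids = [7, 3, 7, 9, 3]
sample_emails = ['ann@example.com', 'no-at-sign.com', 'ann@example.com']

# dict keys are unique and keep insertion order (Python 3.7+),
# so this removes duplicates without reordering the list
deduped_ids = list(dict.fromkeys(sample_ids))

# Drop duplicate emails, then keep only those containing both '@' and '.'
valid_emails = [e for e in dict.fromkeys(sample_emails) if '@' in e and '.' in e]

print(deduped_ids)   # [7, 3, 9]
print(valid_emails)  # ['ann@example.com']
```

Converting to a `set` also removes duplicates, but it does not preserve the original order, which matters if the lists are meant to stay aligned with each other.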

**Part B: Set Operations**

* 4. Use sets to find:

  * Unique customer locations
  * Unique valid email providers (gmail, yahoo, outlook, etc.)

**Part C: Dictionary Operations**

* 5. Calculate:

  * Total revenue (sum of all purchases)
  * Average purchase amount
  * ID of the customer who spent the most

**Part D: Data Summary (Final Output)**

* 6. Print out:

  * Total number of unique customers.
  * Total revenue and average purchase.
  * Unique locations.
  * Unique email providers.
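
One possible way to make the final report readable is to build it from f-strings, sketched below with placeholder values standing in for the results of Parts A to C (all names and numbers here are assumptions for illustration only).

```python
# Hypothetical values standing in for results computed in Parts A-C
unique_count = 3
total_revenue = 1234.5
average_purchase = total_revenue / unique_count
locations = {'Utah', 'Ohio'}

# ':,.2f' adds a thousands separator and rounds to two decimals;
# sorting the set first makes the printed order repeatable
summary_lines = [
    f"Unique customers: {unique_count}",
    f"Total revenue:    ${total_revenue:,.2f}",
    f"Average purchase: ${average_purchase:,.2f}",
    f"Locations:        {', '.join(sorted(locations))}",
]
print("\n".join(summary_lines))
```

Joining a sorted copy of the set avoids printing Python's brace-and-quote representation, and the order stays the same from run to run.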

---

### **RULES** (to simulate real-life conditions):

* DO NOT use pandas, NumPy, or other external libraries yet.
* Use only core Python: lists, tuples, dictionaries, sets, loops, `if`/`else`, and, if you can, functions.
* Comment your code to explain your thinking.
* Aim for clean, readable code.

---

###  Bonus (Optional — Extra Credit)

* Write functions for each of the above steps.
* Handle edge cases where possible.
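
A minimal sketch of what the bonus could look like, assuming one small function per step. The function names are hypothetical, and `is_valid_email` is deliberately a little stricter than the rule in Part A (it also rejects a missing name or a dot that appears only before the `@`), so treat it as one option rather than the expected answer.

```python
def remove_duplicates(items):
    """Return a list of the unique items, in first-seen order."""
    return list(dict.fromkeys(items))


def is_valid_email(email):
    """Require exactly one '@', a non-empty name before it,
    and a '.' inside the domain after it."""
    if not isinstance(email, str) or email.count('@') != 1:
        return False
    name, domain = email.split('@')
    return bool(name) and '.' in domain.strip('.')


def summarize_purchases(purchases):
    """Return (total, average, top_id); an empty dict gives (0.0, 0.0, None)."""
    if not purchases:
        return 0.0, 0.0, None
    total = sum(purchases.values())
    top_id = max(purchases, key=purchases.get)
    return total, total / len(purchases), top_id


print(remove_duplicates([2, 1, 2]))    # [2, 1]
print(is_valid_email('a@b.com'))       # True
print(is_valid_email('a.b@com'))       # False: the dot is not in the domain
print(summarize_purchases({}))         # (0.0, 0.0, None)
```

Returning a fixed result for an empty dictionary avoids a `ZeroDivisionError` in the average and a `ValueError` from `max`.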

---

## Deliverables

When you're done, add your complete Python solution in one of these places:

- Below this markdown (if in a notebook), or
- In a separate `.py` file (e.g., `solution.py`) in the same repository.

You can also share your solution with an AI assistant for feedback.

This project is designed for self-practice or group learning. Use it to test your understanding of Python data structures and logic building in real-world scenarios.

---

> This will test your full understanding of:
>
> * Python data structures
> * Logic building
> * Real-world problem solving

---



The code cell below provides the full dataset you'll use for this exercise: the same data as above, plus one extra invalid email (`'muniaa2gmailcom'`). Review it before starting your analysis.

```python
# Customer IDs
customer_ids = [101, 102, 103, 104, 105, 106, 103, 102]

# Customer Names
customer_names = ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Charlie', 'Bob']

# Customer Locations
customer_locations = ('New York', 'California', 'Texas', 'Florida', 'New York', 'Texas', 'Florida', 'California')

# Purchases: dictionary of ID: total amount spent
purchases = {
    101: 250.50,
    102: 100.00,
    103: 340.75,
    104: 80.00,
    105: 150.50,
    106: 300.25
}

# Emails (some duplicates, some invalid)
emails = [
    'alice@gmail.com',
    'bob@yahoo.com',
    'charlie@gmail.com',
    'diana@yahoo.com',
    'eve@gmail.com',
    'frank@outlook.com',
    'bob@yahoo.com',
    'invalidemail.com',
    'muniaa2gmailcom'
]
```

The code cell below contains a sample solution you can use to check your own work.

```python
## --PART A-- ##
# Remove duplicate customer IDs: a set keeps one copy of each value
unique_customer_ids = set(customer_ids)
customer_ids = sorted(unique_customer_ids)  # back to a list, in ascending order

# Remove duplicate customer names the same way
unique_customer_names = set(customer_names)
customer_names = sorted(unique_customer_names)  # back to a list, sorted alphabetically

# Remove duplicate emails, then keep only those containing both '@' and '.'
unique_emails = set(emails)
valid_emails = [email for email in unique_emails if '@' in email and '.' in email]
emails = sorted(valid_emails)  # sorted so the order is the same on every run

## --PART B-- ##
# Find unique customer locations
unique_customer_locations = set(customer_locations)  # convert tuple to set to remove duplicates
customer_locations = sorted(unique_customer_locations)  # back to a list, sorted for stable output

# Use a set to find unique email providers (e.g. gmail, yahoo, outlook), without the .com part
unique_domains = {email.split('@')[1].split('.')[0] for email in emails if '@' in email}

## --PART C-- ##
# Total revenue
total_rev = sum(purchases.values())

# Average purchase amount (0 if there are no purchases)
average_purchase = total_rev / len(purchases) if purchases else 0

# ID of the customer who spent the most (None if there are no purchases)
max_spender_id = max(purchases, key=purchases.get) if purchases else None

## --PART D-- ##
print(f"Total unique customers: {len(unique_customer_ids)}")
print(f"Total revenue: ${total_rev:.2f}")
print(f"Average purchase amount: ${average_purchase:.2f}")
print(f"Unique locations: {customer_locations}")
print(f"Email providers: {sorted(unique_domains)}")  # sets are unordered, so sort before printing
```

Output:

```
Total unique customers: 6
Total revenue: $1222.00
Average purchase amount: $203.67
Unique locations: ['California', 'Florida', 'New York', 'Texas']
Email providers: ['gmail', 'outlook', 'yahoo']
```
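
One thing the sample solution does not address: deduplicating IDs and names separately loses track of which name belongs to which ID. As an alternative sketch, assuming the three sequences are parallel (position `i` in each describes the same record), the records can be keyed by ID instead. Keeping the first record seen for each ID is an assumption made here. In this dataset the repeated ID 103 appears with two different locations, and the brief does not say which one is correct.

```python
customer_ids = [101, 102, 103, 104, 105, 106, 103, 102]
customer_names = ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Charlie', 'Bob']
customer_locations = ('New York', 'California', 'Texas', 'Florida',
                      'New York', 'Texas', 'Florida', 'California')

customers = {}
for cid, name, location in zip(customer_ids, customer_names, customer_locations):
    # setdefault stores the first record seen for each ID and ignores repeats
    customers.setdefault(cid, (name, location))

for cid, (name, location) in customers.items():
    print(cid, name, location)
```

This yields six records, one per unique ID, with each name and location still attached to its ID.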
