# ⚙️ Data Processing Functions & Targeted Aggregation

**Objective:** Develop reusable Python functions to automate data cleaning and create a querying engine for targeted demographic and spending analysis. This project demonstrates modular programming, a critical skill for scaling data pipelines and ensuring data consistency.

### 🛠️ Tech Stack & Concepts
* **Language:** Python
* **Core Concepts:** User-Defined Functions (UDFs), `while` loops, Nested Iterations, Dynamic Filtering, and Aggregation.

```python
# Raw dataset representing user profiles and purchasing history
# Layout: [user_id, name, age, categories, purchase amounts (one per category)]
users = [
    ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]],
    ['31980', 'kate morgan', 24.0, ['CLOTHES', 'BOOKS'], [439, 390]],
    ['32156', ' john doe ', 37.0, ['ELECTRONICS', 'HOME', 'FOOD'], [459, 120, 99]],
    ['32761', 'SAMANTHA SMITH', 29.0, ['CLOTHES', 'ELECTRONICS', 'BEAUTY'], [299, 679, 85]],
    ['32984', 'David White', 41.0, ['BOOKS', 'HOME', 'SPORT'], [234, 329, 243]],
    ['33001', 'emily brown', 26.0, ['BEAUTY', 'HOME', 'FOOD'], [213, 659, 79]],
    ['33767', ' Maria Garcia', 33.0, ['CLOTHES', 'FOOD', 'BEAUTY'], [499, 189, 63]],
    ['33912', 'JOSE MARTINEZ', 22.0, ['SPORT', 'ELECTRONICS', 'HOME'], [259, 549, 109]],
    ['34009', 'lisa wilson ', 35.0, ['HOME', 'BOOKS', 'CLOTHES'], [329, 189, 329]],
    ['34278', 'James Lee', 28.0, ['BEAUTY', 'CLOTHES', 'ELECTRONICS'], [189, 299, 579]],
]
```
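Every record pairs a list of categories with a list of purchase amounts of the same length, so a quick structural check before cleaning can catch malformed rows early. The sketch below is a hypothetical helper, `check_record`, that assumes the five-field layout used in `users`; the specific rules (numeric ID, non-empty name, positive age, equal-length lists) are illustrative choices, not part of the original pipeline.

```python
def check_record(user):
    """
    Hypothetical sanity check for one raw record laid out as
    [id, name, age, categories, amounts]. Returns a list of problems (empty if OK).
    """
    problems = []
    if len(user) != 5:
        problems.append(f"expected 5 fields, got {len(user)}")
        return problems
    user_id, name, age, categories, amounts = user
    if not user_id.isdigit():
        problems.append("id is not numeric")
    if not name.strip():
        problems.append("name is empty")
    if not isinstance(age, (int, float)) or age <= 0:
        problems.append("age is not a positive number")
    # Each category should have exactly one matching purchase amount
    if len(categories) != len(amounts):
        problems.append("categories and amounts differ in length")
    return problems

# A well-formed record copied from `users` and a deliberately broken one
good = ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]]
bad = ['3241X', '  ', 32.0, ['ELECTRONICS', 'SPORT'], [894, 213, 173]]
print(check_record(good))  # []
print(check_record(bad))
```

Running the check over the full dataset (for example `[check_record(u) for u in users]`) before calling the cleaning function keeps bad rows from silently producing misaligned results later.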

### 🧹 1. Automated Cleaning Pipeline (Function)
A reusable function standardizes each user record in a single pass: it normalizes the name (lowercased, trimmed, underscores replaced with spaces, split into tokens), casts the age from float to int, and lowercases the categories with a list comprehension.

```python
def clean_user(user_info, id_index=0, name_index=1, age_index=2, cat_index=3, amounts_index=4):
    """
    Clean and standardize one raw user record.

    Returns [user_id, name_tokens, age, categories, amounts], where the name is
    lowercased, trimmed, underscore-free, and split into tokens; the age is cast
    to int; and every category is lowercased.
    """
    # Standardize name: lowercase, trim, turn underscores into spaces, split into tokens
    cleaned_name = user_info[name_index].lower().strip().replace("_", " ").split()

    # Standardize age: cast the float age (e.g. 32.0) to int
    cleaned_age = int(user_info[age_index])

    # Standardize categories to lowercase
    categories_low = [category.lower() for category in user_info[cat_index]]

    # Rebuild the user record (ID and amounts pass through unchanged)
    return [user_info[id_index], cleaned_name, cleaned_age, categories_low, user_info[amounts_index]]

# Apply the function to the entire dataset
users_cleaned = [clean_user(user) for user in users]

print(f"Successfully processed {len(users_cleaned)} records. Sample output:")
print(users_cleaned[0])
```

```
Successfully processed 10 records. Sample output:
['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]]
```
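To see what the name-cleaning chain produces on the messier inputs, the sketch below applies it to a few raw names from `users`. The helper `normalize_name` is hypothetical: it only isolates the expression used inside `clean_user` so it can be tested on its own.

```python
def normalize_name(raw_name):
    """Hypothetical helper isolating the name-cleaning chain used in clean_user."""
    # lower() -> strip() -> replace("_", " ") -> split() on any run of whitespace
    return raw_name.lower().strip().replace("_", " ").split()

raw_names = [' mike_reed ', 'SAMANTHA SMITH', ' Maria Garcia', 'lisa wilson ']

for raw in raw_names:
    tokens = normalize_name(raw)
    # Rejoin and title-case for display, as the targeting section does
    print(f"{raw!r:18} -> {tokens} -> {' '.join(tokens).title()}")
```

Because `split()` with no argument already discards leading and trailing whitespace, the `strip()` call is redundant but harmless; storing the name as a token list also makes it easy to rejoin in any display format later.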


### 📈 2. Financial Aggregation & Target Simulation
Calculating total revenue across all customers, then simulating one customer's incremental purchases until their cumulative spending reaches a loyalty-program target.

```python
from random import randint

# 2.1 Calculate total revenue: sum each user's purchase amounts, then sum across users
total_revenue = sum(sum(user[-1]) for user in users_cleaned)
print(f"Total Current Revenue: ${total_revenue}")

# 2.2 Spending simulation (loyalty program target)
# Starting balance: 1280 matches the current total of the first customer (ID 32415)
total_amount_spent = 1280
target_amount = 1500

# Add random purchases between $30 and $80 (inclusive) until the target is reached.
# The generator is unseeded, so the final total differs from run to run.
while total_amount_spent < target_amount:
    new_purchase = randint(30, 80)
    total_amount_spent += new_purchase

print(f"Target reached! Total simulated spending: ${total_amount_spent}")
```

```
Total Current Revenue: $9189
Target reached! Total simulated spending: $1514
```
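Because the loop above draws from an unseeded generator, the $1514 shown is only one possible run. The sketch below wraps the same `while` loop in a hypothetical helper, `simulate_until_target`, which uses its own `random.Random` instance so that passing a seed makes the run reproducible, and which records each purchase. The helper name and its return shape are assumptions for illustration.

```python
import random

def simulate_until_target(start, target, low=30, high=80, seed=None):
    """
    Hypothetical helper: add random purchases in [low, high] until the running
    total reaches target. Returns (final_total, purchases).
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    total = start
    purchases = []
    while total < target:
        amount = rng.randint(low, high)  # inclusive on both ends
        purchases.append(amount)
        total += amount
    return total, purchases

final_total, purchases = simulate_until_target(1280, 1500, seed=42)
print(f"{len(purchases)} purchases, final total ${final_total}")
```

Whatever the seed, the loop stops at the first total at or above 1500, and the total just before that last purchase was below 1500, so the final value always lands between $1500 and $1579. Covering the $220 gap takes at least 3 purchases (two at $80 fall short) and at most 8 (seven at $30 reach only $1490).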


### 🎯 3. Demographic Targeting Engine
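A filtering function returns every customer who bought in a given category, together with their ID, name, age, and total spending across all categories (not only the filtered one). Column positions are passed as arguments, so the same function can be reused on records with a different layout.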

```python
def get_client_by_cat(data, id_idx, name_idx, age_idx, cat_idx, amounts_idx, filter_category):
    """
    Filter users by a purchasing category and aggregate each match's total spending.

    Returns a list of [user_id, name_tokens, age, total_spent] for every user whose
    category list contains filter_category. total_spent covers all of the user's
    purchases, not only those in filter_category.
    """
    filtered_clients = []

    for user in data:
        # Categories were lowercased by clean_user, so lowercase the query too
        if filter_category.lower() in user[cat_idx]:
            total_spent = sum(user[amounts_idx])
            client_info = [user[id_idx], user[name_idx], user[age_idx], total_spent]
            filtered_clients.append(client_info)

    return filtered_clients

# Example usage: find all customers who purchased 'home' products
home_buyers = get_client_by_cat(users_cleaned, 0, 1, 2, 3, 4, 'home')

print("Customers in the 'Home' category:")
for buyer in home_buyers:
    # Rejoin the name tokens and title-case them for display
    print(f"ID: {buyer[0]} | Name: {' '.join(buyer[1]).title()} | Age: {buyer[2]} | Total Spent: ${buyer[3]}")
```

```
Customers in the 'Home' category:
ID: 32156 | Name: John Doe | Age: 37 | Total Spent: $678
ID: 32984 | Name: David White | Age: 41 | Total Spent: $806
ID: 33001 | Name: Emily Brown | Age: 26 | Total Spent: $951
ID: 33912 | Name: Jose Martinez | Age: 22 | Total Spent: $917
ID: 34009 | Name: Lisa Wilson | Age: 35 | Total Spent: $847
```
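The "Total Spent" column above sums every purchase a customer made; John Doe's $678, for instance, includes his electronics and food purchases. If the goal is spending inside the filtered category only, each category can be paired with the amount at the same position using `zip`. The sketch below is a hypothetical variant, `category_spend`, and it assumes (as the dataset's equal-length lists suggest) that amounts are aligned with categories by position.

```python
def category_spend(data, filter_category, id_idx=0, cat_idx=3, amounts_idx=4):
    """
    Hypothetical variant of get_client_by_cat: for each user who bought in
    filter_category, return (user_id, amount spent in that category only).
    Assumes categories and amounts are aligned by position.
    """
    target = filter_category.lower()
    results = []
    for user in data:
        # Pair each category with the amount at the same index
        for category, amount in zip(user[cat_idx], user[amounts_idx]):
            if category == target:
                results.append((user[id_idx], amount))
    return results

# Two cleaned records copied from users_cleaned so the sketch runs on its own
sample = [
    ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'], [459, 120, 99]],
    ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234, 329, 243]],
]
print(category_spend(sample, 'HOME'))  # [('32156', 120), ('32984', 329)]
```

Called on the full `users_cleaned` list, this answers a narrower question than the original function: how much each customer spent in one category, rather than how valuable that customer is overall.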
