# Customer Insights Toolkit

## Overview

This project provides a small set of Python functions in a Google Colab notebook for cleaning and analyzing user data. It helps transform raw customer information into structured insights, enabling basic customer segmentation and spending analysis.

## Problem Solved

In many business scenarios, raw customer data can be inconsistent, incomplete, or difficult to analyze directly. This toolkit addresses the need for initial data preparation and provides methods to extract valuable information such as total revenue, specific customer segments (e.g., young high-spenders), and purchasing habits across different product categories. It serves as a foundational step for deeper customer relationship management (CRM) and marketing analytics.

## Key Features

*   **User Data Cleaning:**
    *   Standardizes user names (lowercases them, strips leading and trailing whitespace, replaces underscores with spaces, and splits them into first and last name).
    *   Converts age data to integer type.
    *   Normalizes product categories to lowercase for consistent analysis.
*   **Revenue Aggregation:** Calculates the total revenue generated by all users based on their recorded spending.
*   **Customer Segmentation:** Filters users based on:
    *   Age criteria (e.g., users under a certain age).
    *   Combined criteria like age and total spending.
*   **Category-Specific Analysis:** Identifies users who have purchased within a specific product category and reports each matching user's total spending across all of their categories.
*   **Illustrative Simulation:** Includes an example of a `while` loop that simulates incremental purchases until the running total exceeds a target amount.

## Getting Started

To run this project, open the `.ipynb` file in Google Colab or any Jupyter-compatible environment. The code is plain Python built on basic list manipulation. The core functionality needs no third-party libraries; the simulation example imports `randint` from the standard-library `random` module.

### Data Structure

The `users` data is represented as a list of lists, where each inner list contains:

1.  User ID (string)
2.  User Name (string - before cleaning, list of strings - after cleaning)
3.  Age (float - before cleaning, int - after cleaning)
4.  List of Categories (list of strings)
5.  List of Spending Amounts (list of integers)
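
As a minimal sketch of this layout, the record below is copied from the notebook's sample data. The index constants (`ID_INDEX` and so on) are hypothetical names introduced here for readability and do not appear in the notebook. The sketch also assumes that the category and spending lists are parallel (position `i` in one corresponds to position `i` in the other), which the sample data suggests but the notebook does not state.

```python
# Hypothetical index constants (not defined in the notebook) naming each position.
ID_INDEX, NAME_INDEX, AGE_INDEX, CATEGORIES_INDEX, AMOUNTS_INDEX = range(5)

# One raw (uncleaned) record, copied from the notebook's sample data.
raw_user = ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]]

# Assumption: categories and amounts are parallel lists, so zip() pairs them up.
spend_by_category = dict(zip(raw_user[CATEGORIES_INDEX], raw_user[AMOUNTS_INDEX]))

print(raw_user[ID_INDEX])    # the user ID string
print(spend_by_category)     # amount spent per category
```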

### How to Use

1.  **Define `users` data:** The notebook starts with an example `users` list.
2.  **Run Cleaning Functions:** Execute the `clean_user` function and its application loop to preprocess the `users` data.
3.  **Perform Analysis:** Run the subsequent cells to:
    *   Calculate total revenue.
    *   Filter users by age or spending.
    *   Use `get_client_by_cat` to find users interested in specific categories.

Feel free to modify the `users` data or the filtering criteria to suit your specific analysis needs.
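
The three steps can be sketched end to end as follows. This is a condensed illustration rather than the notebook's exact code: it uses only two of the sample records, and `clean_user` is rewritten compactly with a list comprehension.

```python
def clean_user(user_info, name_index, age_index, cat_index):
    """Clean one record in place: split lowercase name, integer age, lowercase categories."""
    user_info[name_index] = user_info[name_index].lower().strip().replace('_', ' ').split()
    user_info[age_index] = int(user_info[age_index])
    user_info[cat_index] = [category.lower() for category in user_info[cat_index]]
    return user_info


# Step 1: define the raw data (two records from the notebook's sample list).
users = [
    ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]],
    ['31980', 'kate morgan', 24.0, ['CLOTHES', 'BOOKS'], [439, 390]],
]

# Step 2: clean every record (name at index 1, age at 2, categories at 3).
users_cleaned = [clean_user(user, 1, 2, 3) for user in users]

# Step 3: analyze the cleaned data.
revenue = sum(sum(user[4]) for user in users_cleaned)        # total spending of all users
under_30 = [user for user in users_cleaned if user[2] < 30]  # simple age filter

print(users_cleaned)
print(revenue)
print(under_30)
```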


In [None]:
users = [
    ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]],
    ['31980', 'kate morgan', 24.0, ['CLOTHES', 'BOOKS'], [439, 390]],
    ['32156', ' john doe ', 37.0, ['ELECTRONICS', 'HOME', 'FOOD'], [459, 120, 99]],
    ['32761', 'SAMANTHA SMITH', 29.0, ['CLOTHES', 'ELECTRONICS', 'BEAUTY'], [299, 679, 85]],
    ['32984', 'David White', 41.0, ['BOOKS', 'HOME', 'SPORT'], [234, 329, 243]],
    ['33001', 'emily brown', 26.0, ['BEAUTY', 'HOME', 'FOOD'], [213, 659, 79]],
    ['33767', ' Maria Garcia', 33.0, ['CLOTHES', 'FOOD', 'BEAUTY'], [499, 189, 63]],
    ['33912', 'JOSE MARTINEZ', 22.0, ['SPORT', 'ELECTRONICS', 'HOME'], [259, 549, 109]],
    ['34009', 'lisa wilson ', 35.0, ['HOME', 'BOOKS', 'CLOTHES'], [329, 189, 329]],
    ['34278', 'James Lee', 28.0, ['BEAUTY', 'CLOTHES', 'ELECTRONICS'], [189, 299, 579]],
]

def clean_user(user_info, name_index, age_index):

    # Step 1: strip leading and trailing spaces from the name and replace underscores with spaces
    user_name_1 = user_info[name_index].strip().replace('_', ' ')

    # Step 2: convert the age to an integer
    user_age_1 = int(user_info[age_index])

    # Step 3: split the first and last name into a sublist
    user_name_1 = user_name_1.split()

    # Build the complete user record:
    # replace the original name and age with the cleaned values
    user_info[name_index] = user_name_1
    user_info[age_index] = user_age_1

    return user_info

# Test the function
test_user = ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]]
name_index = 1
age_index = 2

print(clean_user(test_user, name_index, age_index))

['32415', ['mike', 'reed'], 32, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]]


In [None]:
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']
fav_categories_low = []

for categorias_minusculas in fav_categories:
    fav_categories_low.append(categorias_minusculas.lower())

print(fav_categories_low)

['electronics', 'sport', 'books']


In [None]:
users = [
    ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]],
    ['31980', 'kate morgan', 24.0, ['CLOTHES', 'BOOKS'], [439, 390]],
    ['32156', ' john doe ', 37.0, ['ELECTRONICS', 'HOME', 'FOOD'], [459, 120, 99]],
    ['32761', 'SAMANTHA SMITH', 29.0, ['CLOTHES', 'ELECTRONICS', 'BEAUTY'], [299, 679, 85]],
    ['32984', 'David White', 41.0, ['BOOKS', 'HOME', 'SPORT'], [234, 329, 243]],
    ['33001', 'emily brown', 26.0, ['BEAUTY', 'HOME', 'FOOD'], [213, 659, 79]],
    ['33767', ' Maria Garcia', 33.0, ['CLOTHES', 'FOOD', 'BEAUTY'], [499, 189, 63]],
    ['33912', 'JOSE MARTINEZ', 22.0, ['SPORT', 'ELECTRONICS', 'HOME'], [259, 549, 109]],
    ['34009', 'lisa wilson ', 35.0, ['HOME', 'BOOKS', 'CLOTHES'], [329, 189, 329]],
    ['34278', 'James Lee', 28.0, ['BEAUTY', 'CLOTHES', 'ELECTRONICS'], [189, 299, 579]],
]

users_categories_low = []

# Iterate over the list of users
for user in users:
    # Create a new list to hold the lowercase categories
    categories_low = []

    # Iterate over the current user's categories
    for category in user[3]:
        # Convert the category to lowercase and add it to the new list
        categories_low.append(category.lower())

    # Remove the old category list with pop()
    user.pop(3)

    # Insert the new category list with insert()
    user.insert(3, categories_low)

    # Add the modified user to the users_categories_low list
    users_categories_low.append(user)

# Print the list with lowercase categories
print(users_categories_low)

[['32415', ' mike_reed ', 32.0, ['electronics', 'sport', 'books'], [894, 213, 173]], ['31980', 'kate morgan', 24.0, ['clothes', 'books'], [439, 390]], ['32156', ' john doe ', 37.0, ['electronics', 'home', 'food'], [459, 120, 99]], ['32761', 'SAMANTHA SMITH', 29.0, ['clothes', 'electronics', 'beauty'], [299, 679, 85]], ['32984', 'David White', 41.0, ['books', 'home', 'sport'], [234, 329, 243]], ['33001', 'emily brown', 26.0, ['beauty', 'home', 'food'], [213, 659, 79]], ['33767', ' Maria Garcia', 33.0, ['clothes', 'food', 'beauty'], [499, 189, 63]], ['33912', 'JOSE MARTINEZ', 22.0, ['sport', 'electronics', 'home'], [259, 549, 109]], ['34009', 'lisa wilson ', 35.0, ['home', 'books', 'clothes'], [329, 189, 329]], ['34278', 'James Lee', 28.0, ['beauty', 'clothes', 'electronics'], [189, 299, 579]]]
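
Because `pop()` and `insert()` change each inner list in place, `users` and `users_categories_low` end up holding the very same records. If the raw data needs to stay untouched, one option (not used in the notebook) is to work on a deep copy. The sketch below uses a single sample record to show the idea.

```python
from copy import deepcopy

users = [['31980', 'kate morgan', 24.0, ['CLOTHES', 'BOOKS'], [439, 390]]]

# Work on a deep copy so the nested lists of the raw records are not shared.
users_categories_low = deepcopy(users)

for user in users_categories_low:
    # Replace the category list at index 3 with a lowercase version.
    user[3] = [category.lower() for category in user[3]]

print(users[0][3])                 # raw categories are unchanged
print(users_categories_low[0][3])  # the copy holds the lowercase categories
```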


In [None]:
def clean_user(user_info, name_index, age_index, cat_index):

  # Step 1: lowercase the name, strip leading and trailing spaces, and replace underscores with spaces
  user_name_1 = user_info[name_index].lower().strip().replace('_', ' ')

  # Step 2: convert the age to an integer
  user_age_1 = int(user_info[age_index])

  # Step 3: split the first and last name into a sublist
  user_name_1 = user_name_1.split()

  # Step 4: convert the categories to lowercase
  # (read from the user_info parameter, not from a global variable)
  categories_low = []
  for category in user_info[cat_index]:
    categories_low.append(category.lower())

  # Replace the original name, age, and categories with the cleaned values
  user_info[name_index] = user_name_1
  user_info[age_index] = user_age_1
  user_info[cat_index] = categories_low

  return user_info


users = [
    ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]],
    ['31980', 'kate morgan', 24.0, ['CLOTHES', 'BOOKS'], [439, 390]],
    ['32156', ' john doe ', 37.0, ['ELECTRONICS', 'HOME', 'FOOD'], [459, 120, 99]],
    ['32761', 'SAMANTHA SMITH', 29.0, ['CLOTHES', 'ELECTRONICS', 'BEAUTY'], [299, 679, 85]],
    ['32984', 'David White', 41.0, ['BOOKS', 'HOME', 'SPORT'], [234, 329, 243]],
    ['33001', 'emily brown', 26.0, ['BEAUTY', 'HOME', 'FOOD'], [213, 659, 79]],
    ['33767', ' Maria Garcia', 33.0, ['CLOTHES', 'FOOD', 'BEAUTY'], [499, 189, 63]],
    ['33912', 'JOSE MARTINEZ', 22.0, ['SPORT', 'ELECTRONICS', 'HOME'], [259, 549, 109]],
    ['34009', 'lisa wilson ', 35.0, ['HOME', 'BOOKS', 'CLOTHES'], [329, 189, 329]],
    ['34278', 'James Lee', 28.0, ['BEAUTY', 'CLOTHES', 'ELECTRONICS'], [189, 299, 579]],
]

name_index = 1
age_index = 2
cat_index = 3
users_cleaned = []

for user in users:
  user_cleaned = clean_user(user, name_index, age_index, cat_index)
  users_cleaned.append(user_cleaned)


print(users_cleaned)

[['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]], ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439, 390]], ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'], [459, 120, 99]], ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics', 'beauty'], [299, 679, 85]], ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234, 329, 243]], ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213, 659, 79]], ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'], [499, 189, 63]], ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'], [259, 549, 109]], ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'], [329, 189, 329]], ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'], [189, 299, 579]]]


In [None]:
users = [['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]],
         ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439, 390]],
         ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'], [459, 120, 99]],
         ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics', 'beauty'], [299, 679, 85]],
         ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234, 329, 243]],
         ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213, 659, 79]],
         ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'], [499, 189, 63]],
         ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'], [259, 549, 109]],
         ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'], [329, 189, 329]],
         ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'], [189, 299, 579]]]

revenue = 0

for user in users:
    spendings_list = user[4]  # Extract the user's list of spending amounts
    total_spendings = sum(spendings_list)  # Compute the user's total spending
    revenue += total_spendings  # Add it to the overall revenue

print(revenue)

9189


In [None]:
users = [['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]],
         ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439, 390]],
         ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'], [459, 120, 99]],
         ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics', 'beauty'], [299, 679, 85]],
         ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234, 329, 243]],
         ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213, 659, 79]],
         ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'], [499, 189, 63]],
         ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'], [259, 549, 109]],
         ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'], [329, 189, 329]],
         ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'], [189, 299, 579]]]

revenue = 0

for user in users:
    spendings_list = user[4]  # Assign each user's list of spending amounts to a variable
    total_spendings = sum(spendings_list)  # Compute the user's total spending
    revenue += total_spendings  # Accumulate the purchase totals of all users

print(revenue)

9189


In [None]:
from random import randint

total_amount_spent = 1280
target_amount = 1500

while total_amount_spent <= target_amount:  # keep buying until the total exceeds the target
    new_purchase = randint(30, 80)  # generate a random integer from 30 to 80 (inclusive)
    total_amount_spent += new_purchase  # add the new purchase to the running total

print(total_amount_spent)

1522
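
The printed total changes from run to run because `randint` is not seeded. A reproducible variant can be sketched with a seeded `random.Random` instance; the seed value `42` and the `purchases` list are assumptions added here for illustration and are not part of the notebook.

```python
from random import Random

rng = Random(42)  # assumption: a fixed seed, chosen arbitrarily, for repeatable runs

total_amount_spent = 1280
target_amount = 1500
purchases = []  # hypothetical helper list recording each simulated purchase

# Same loop condition as the notebook: stop once the total exceeds the target.
while total_amount_spent <= target_amount:
    new_purchase = rng.randint(30, 80)  # random integer from 30 to 80, inclusive
    purchases.append(new_purchase)
    total_amount_spent += new_purchase

print(len(purchases), total_amount_spent)
```

Since every purchase is at most 80, the final total can overshoot the target by at most 80.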


In [None]:
users = [['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]],
         ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439, 390]],
         ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'], [459, 120, 99]],
         ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics', 'beauty'], [299, 679, 85]],
         ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234, 329, 243]],
         ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213, 659, 79]],
         ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'], [499, 189, 63]],
         ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'], [259, 549, 109]],
         ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'], [329, 189, 329]],
         ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'], [189, 299, 579]]]

menores_a_30 = []

for user in users:
    if user[2] < 30:
        menores_a_30.append(user)

print(menores_a_30)


[['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439, 390]], ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics', 'beauty'], [299, 679, 85]], ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213, 659, 79]], ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'], [259, 549, 109]], ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'], [189, 299, 579]]]


In [None]:
users = [['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]],
         ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439, 390]],
         ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'], [459, 120, 99]],
         ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics', 'beauty'], [299, 679, 85]],
         ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234, 329, 243]],
         ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213, 659, 79]],
         ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'], [499, 189, 63]],
         ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'], [259, 549, 109]],
         ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'], [329, 189, 329]],
         ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'], [189, 299, 579]]]

menores_a_30_mas_de_1000 = []

for user in users:
    if user[2] < 30 and sum(user[4]) > 1000:
        menores_a_30_mas_de_1000.append(user)

print(menores_a_30_mas_de_1000)

[['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics', 'beauty'], [299, 679, 85]], ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'], [189, 299, 579]]]
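
The same combined filter can also be written as a reusable function with a list comprehension. This is a sketch only: `filter_users` and its parameter names are hypothetical and do not appear in the notebook, and three sample records stand in for the full list.

```python
def filter_users(users, max_age, min_total, age_index=2, amounts_index=4):
    """Return the users younger than max_age whose total spending exceeds min_total."""
    return [
        user for user in users
        if user[age_index] < max_age and sum(user[amounts_index]) > min_total
    ]


users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439, 390]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics', 'beauty'], [299, 679, 85]],
]

young_high_spenders = filter_users(users, max_age=30, min_total=1000)
print(young_high_spenders)
```

Passing the thresholds as arguments makes it easy to try other segments, for example `filter_users(users, 40, 500)`.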


In [None]:
users = [['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]],
         ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439, 390]],
         ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'], [459, 120, 99]],
         ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics', 'beauty'], [299, 679, 85]],
         ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234, 329, 243]],
         ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213, 659, 79]],
         ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'], [499, 189, 63]],
         ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'], [259, 549, 109]],
         ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'], [329, 189, 329]],
         ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'], [189, 299, 579]]]


for user in users:
    if 'clothes' in user[3]:
        print(user[1], user[2])

['kate', 'morgan'] 24
['samantha', 'smith'] 29
['maria', 'garcia'] 33
['lisa', 'wilson'] 35
['james', 'lee'] 28


In [None]:
def get_client_by_cat(users, id_index, name_index, age_index, category_index, amounts_index, filter_category):

    filtered_users = []  # List that collects the matching users

    for user in users:  # Iterate over every user in the main list
        if filter_category in user[category_index]:  # Check whether the category is in the user's category list
            total_amount_spent = sum(user[amounts_index])  # Total of all the user's purchases (every category, not only the filtered one)
            filtered_users.append([user[id_index], user[name_index], user[age_index], total_amount_spent])  # Add the user to the results

    return filtered_users  # Return the list built from the matching clients' data




# The list of users
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439, 390]],
    ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'], [459, 120, 99]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics', 'beauty'], [299, 679, 85]],
    ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234, 329, 243]],
    ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213, 659, 79]],
    ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'], [499, 189, 63]],
    ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'], [259, 549, 109]],
    ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'], [329, 189, 329]],
    ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'], [189, 299, 579]]
]

# Call the function with the 'home' category
result = get_client_by_cat(users, 0, 1, 2, 3, 4, 'home')

# Print the resulting list
print(result)

[['32156', ['john', 'doe'], 37, 678], ['32984', ['david', 'white'], 41, 806], ['33001', ['emily', 'brown'], 26, 951], ['33912', ['jose', 'martinez'], 22, 917], ['34009', ['lisa', 'wilson'], 35, 847]]
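
The totals in this output cover all of each user's purchases, not only those in `'home'`. If the amount spent in the filtered category alone is needed, it could be looked up by position, as sketched below. `get_spend_in_cat` is a hypothetical helper that is not part of the notebook, and it assumes the category and amount lists are parallel (position `i` in one corresponds to position `i` in the other).

```python
def get_spend_in_cat(users, id_index, category_index, amounts_index, filter_category):
    """Return [user_id, amount] pairs for users who bought in filter_category."""
    result = []
    for user in users:
        categories = user[category_index]
        if filter_category in categories:
            # Assumption: the amount at the same position belongs to this category.
            position = categories.index(filter_category)
            result.append([user[id_index], user[amounts_index][position]])
    return result


users = [
    ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'], [459, 120, 99]],
    ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234, 329, 243]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439, 390]],
]

home_spend = get_spend_in_cat(users, 0, 3, 4, 'home')
print(home_spend)
# [['32156', 120], ['32984', 329]]
```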
