# Data processing and collection for future analysis

# Description
An e-commerce company, Store 1, recently started collecting data on its customers. Store 1's ultimate goal is to better understand its customers' behavior and make data-driven decisions to improve their online experience.

Store 1 aims to ensure consistency in data collection. As part of this initiative, the quality of the data collected about users must be evaluated. You have been asked to review the collected data and propose changes. Below you will see data about a particular user; review the data and identify any potential problems.

In [1]:
user_id = '32415'
user_name = ' mike_reed '
user_age = 32.0
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']
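
One quick way to spot potential problems is to print each variable's `repr` and type. The sketch below is only an inspection aid and is not part of the original exercise; the names `has_extra_spaces`, `age_is_float`, and `all_upper` are introduced here for illustration.

```python
user_id = '32415'
user_name = ' mike_reed '
user_age = 32.0
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']

# repr() makes leading/trailing whitespace visible; type() reveals the data type
for label, value in [('user_id', user_id), ('user_name', user_name),
                     ('user_age', user_age), ('fav_categories', fav_categories)]:
    print(label, repr(value), type(value).__name__)

# Illustrative checks for the issues addressed in the exercises that follow
has_extra_spaces = user_name != user_name.strip()          # surrounding spaces
age_is_float = isinstance(user_age, float)                 # age stored as a float
all_upper = all(cat.isupper() for cat in fav_categories)   # categories in uppercase
print(has_extra_spaces, age_is_float, all_upper)
```

All three checks evaluate to `True` for this user, which matches the problems fixed below.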

# Exercise 1

We will now implement the changes we identified. First, we need to fix the problems with the `user_name` variable. As we saw, it has unnecessary spaces and an underscore as a separator between the first name and the last name. Your goal is to remove the spaces and then replace the underscore with a space.

In [2]:
user_name = ' mike_reed '
user_name = user_name.strip()  # remove leading and trailing spaces from the original string
user_name = user_name.replace('_', ' ')  # replace the underscore with a space

print(user_name)

mike reed


# Exercise 2

Next, we need to split the updated `user_name` into two substrings to get a list containing two values: the string for the first name and the string for the last name.

In [3]:
user_name = 'mike reed'
name_split = user_name.split()  # split the string user_name on whitespace

print(name_split)

['mike', 'reed']


# Exercise 3

Brilliant! Now let's work with the `user_age` variable. It is stored as a float, but an age should be an integer. Let's fix this problem by converting the data type and displaying the final result.

In [4]:
user_age = 32.0
user_age = int(user_age)  # change the data type of the user's age to an integer

print(user_age)

32


# Exercise 4

As we know, data is not always perfect. We must consider scenarios where the value of `user_age` cannot be converted to an integer. To prevent our system from crashing, we must take measures in advance.

Write code that attempts to convert the `user_age` variable to an integer and assigns the converted value to `user_age_int`. If the attempt fails, display this message asking the user to provide their age as a numerical value: `Please provide your age as a numerical value.`

In [5]:
user_age = 'treinta y dos'  # this variable stores the age as a string

# write code that attempts to transform user_age into an integer and if it fails, prints the specified message
try:
    user_age_int = int(user_age)
except ValueError:
    print("Please provide your age as a numerical value.")

Please provide your age as a numerical value.
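
Note that in the cell above `user_age_int` is never assigned when the conversion fails, so any later code that reads it would raise a `NameError`. One possible way to handle this is sketched below, assuming a hypothetical helper `parse_age` (not part of the original exercise) that returns `None` on failure.

```python
def parse_age(raw_age):
    """Return raw_age as an int, or None if it cannot be converted.

    Hypothetical helper for illustration; not part of the original exercise.
    """
    try:
        return int(raw_age)
    except (ValueError, TypeError):
        # ValueError: non-numeric strings; TypeError: values such as None
        print("Please provide your age as a numerical value.")
        return None


user_age_int = parse_age('treinta y dos')  # prints the message; result is None
valid_age = parse_age('32')                # result is 32
float_age = parse_age(32.0)                # result is 32
```

With this approach the caller can test `user_age_int is None` before using the value.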


# Exercise 5

Finally, consider that all favorite categories are stored in uppercase. To populate a new list called `fav_categories_low` with the same categories, but in lowercase, iterate over the values in the `fav_categories` list, modify them, and add the new values to the `fav_categories_low` list. As always, show the final result.

In [6]:
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']
fav_categories_low = []


for cat in fav_categories:
    fav_categories_low.append(cat.lower())
    
print(fav_categories_low)

['electronics', 'sport', 'books']
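
The same result can also be written as a list comprehension. This is only an alternative sketch; the exercise itself asks for the explicit loop with `append`.

```python
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']

# Build the lowercase list in a single expression
fav_categories_low = [cat.lower() for cat in fav_categories]

print(fav_categories_low)  # ['electronics', 'sport', 'books']
```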


# Exercise 6

We have obtained additional information about our users' spending habits, including the amount spent in each of their favorite categories. Management is interested in the following metrics:

- Total amount spent by the user.
- Minimum amount spent.
- Maximum amount spent.

Let's calculate these values and display them on the screen:

In [7]:
fav_categories_low = ['electronics', 'sport', 'books']
spendings_per_category = [894, 213, 173]

total_amount = sum(spendings_per_category)
max_amount = max(spendings_per_category) 
min_amount = min(spendings_per_category) 

print(total_amount)
print(max_amount)
print(min_amount)

1280
894
173
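
If management also wants to know which category each extreme belongs to, the two lists can be paired with `zip`. This is a minimal sketch that assumes both lists are in the same order; the names `pairs`, `max_category`, and `min_category` are introduced here for illustration.

```python
fav_categories_low = ['electronics', 'sport', 'books']
spendings_per_category = [894, 213, 173]

# Pair each amount with its category; tuples compare by amount first
pairs = list(zip(spendings_per_category, fav_categories_low))

max_amount, max_category = max(pairs)
min_amount, min_category = min(pairs)

print(f"Highest: {max_category} ({max_amount})")  # Highest: electronics (894)
print(f"Lowest: {min_category} ({min_amount})")   # Lowest: books (173)
```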


# Exercise 7

The company wants to offer discounts to its loyal customers. Customers who make purchases totaling more than $1,500 are considered loyal and will receive a discount.

Our goal is to create a `while` loop that checks the total amount spent and stops once the total reaches the target amount. To simulate new purchases, the `new_purchase` variable is assigned a random integer between 30 and 80 at each iteration of the loop. This represents the amount of money spent on a new purchase and must be added to the total.

Once the target amount is reached and the `while` loop ends, the final amount will be displayed.

In [8]:
from random import randint

total_amount_spent = 1280
target_amount = 1500

while total_amount_spent < target_amount:
    new_purchase = randint(30, 80)  # generate a random integer from 30 to 80
    total_amount_spent += new_purchase

print(total_amount_spent)

1508
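
Because the purchases are random, the final total differs between runs; the value above is just one possible result. `randint(30, 80)` includes both endpoints. The sketch below shows one way to make a run repeatable by seeding the generator. The seed value `42` is arbitrary, and the `purchases_made` counter is added here purely for illustration.

```python
import random

random.seed(42)  # arbitrary seed, chosen only to make runs repeatable

total_amount_spent = 1280
target_amount = 1500
purchases_made = 0  # illustrative counter, not part of the original exercise

while total_amount_spent < target_amount:
    new_purchase = random.randint(30, 80)  # both 30 and 80 are possible values
    total_amount_spent += new_purchase
    purchases_made += 1

print(total_amount_spent, purchases_made)
```

Whatever the seed, the loop ends with a total between 1500 and 1579, since the last purchase adds at most 80 to a total that was still below 1500.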


# Exercise 8

Now we have all the information about a customer in the form we want. The company's management has asked us to propose a way to summarize all the information about a user. Your goal is to create a formatted string that uses information from the `user_id`, `user_name`, and `user_age` variables.

This is the final string we want to create: `User 32415 is mike who is 32 years old.`

In [9]:
user_id = '32415'
user_name = ['mike', 'reed']
user_age = 32

user_info = f"User {user_id} is {user_name[0]} who is {user_age} years old."

print(user_info)

User 32415 is mike who is 32 years old.


# Exercise 9

To calculate the company's income, follow these steps.

1. Use `for` to iterate over the `users` list.
2. Extract the list of expenses for each user and add the values.
3. Update the income value with the total of each user.

This way you will obtain the total income of the company that you will show on the screen at the end.

In [10]:
users = [
    # this is the start of the first sublist
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
        [894, 213, 173]
    ], # this is the end of the first sublist

    # this is the start of the second sublist
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'shoes'],
        [439, 390]
    ] # this is the end of the second sublist
]

revenue = 0

for user in users:
    spendings_list = user[4]  # extract the list of expenses for the current user
    total_spendings = sum(spendings_list)  # add up the expenses across all categories for this user
    revenue += total_spendings  # update the revenue

print(revenue)

2109
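
The three steps above can also be collapsed into one expression with a generator inside `sum`. This is only an alternative sketch of the same calculation, using the same two users.

```python
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
     [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'shoes'],
     [439, 390]],
]

# Inner sum: total per user (index 4 holds the expenses); outer sum: all users
revenue = sum(sum(user[4]) for user in users)

print(revenue)  # 2109
```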


# Exercise 10

Go through the list of users that we have provided and print the first names of the customers under 30 years of age.

In [11]:
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
     [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439,
     390]],
    ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'],
     [459, 120, 99]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics',
     'beauty'], [299, 679, 85]],
    ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234,
     329, 243]],
    ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213,
     659, 79]],
    ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'],
     [499, 189, 63]],
    ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'
     ], [259, 549, 109]],
    ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'],
     [329, 189, 329]],
    ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'
     ], [189, 299, 579]],
    ]

# write your code here
for user in users:
    if user[2] < 30:  # index 2 holds the age
        print(user[1][0])  # index 1 holds [first_name, last_name]

kate
samantha
emily
jose
james
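
Numeric indexes such as `user[2]` and `user[1][0]` are easy to misread. A possible alternative, sketched below on a three-user subset of the list above, is to unpack each sublist into named variables in the `for` statement. The variable names and the `under_30` list are introduced here for illustration.

```python
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
     [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439, 390]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics', 'beauty'],
     [299, 679, 85]],
]

under_30 = []

# Each sublist is unpacked into names that describe its five fields
for user_id, (first_name, last_name), age, categories, spendings in users:
    if age < 30:
        under_30.append(first_name)

print(under_30)  # ['kate', 'samantha']
```

This only works while every sublist has exactly five elements and a two-element name list, which holds for the data in this notebook.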


# Exercise 11

Let's put exercises 9 and 10 together and print the first names of users who are under 30 years old and have a total spend greater than $1,000.

In [12]:
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
     [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439,
     390]],
    ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'],
     [459, 120, 99]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics',
     'beauty'], [299, 679, 85]],
    ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234,
     329, 243]],
    ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213,
     659, 79]],
    ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'],
     [499, 189, 63]],
    ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'
     ], [259, 549, 109]],
    ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'],
     [329, 189, 329]],
    ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'
     ], [189, 299, 579]],
    ]


# write your code here
for user in users:
    if user[2] < 30 and sum(user[4]) > 1000:
        print(user[1][0])

samantha
james


# Exercise 12

Now we are going to show the name and age of all the users who have purchased clothes. Print the name and age in the same print statement.

In [13]:
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
     [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439,
     390]],
    ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'],
     [459, 120, 99]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics',
     'beauty'], [299, 679, 85]],
    ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234,
     329, 243]],
    ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213,
     659, 79]],
    ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'],
     [499, 189, 63]],
    ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'
     ], [259, 549, 109]],
    ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'],
     [329, 189, 329]],
    ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'
     ], [189, 299, 579]],
    ]

for user in users:
    if 'clothes' in user[3]:
        print(f"{user[1][0]} {user[2]}")

kate 24
samantha 29
maria 33
lisa 35
james 28
