# Questionnaire 

Store 1 aims to ensure consistency in data collection. As part of this initiative, the quality of the data collected about users must be assessed. You have been asked to review the collected data and propose changes. Below you will see data about a particular user; review the data and identify any potential issues.

In [1]:
user_id = '32415'
user_name = ' mike_reed '
user_age = 32.0
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']

**Options:**

1. The data type for `user_id` should be changed from a string to an integer.

2. The variable `user_name` contains a string with unnecessary spaces and an underscore between the first name and the last name.

3. The data type of `user_age` is incorrect.

4. The list `fav_categories` contains strings in uppercase. Instead, we should convert the values of the list to lowercase.

Write in the Markdown cell below the number(s) of the options you have identified as problems. If you have identified multiple issues, separate them with commas. For example, if you believe options 1 and 3 are correct, write 1, 3.

**Write your answer and explain your reasoning:** 2, 3  
2. It's better to work with cleaner and shorter data to minimize errors.  
3. In my opinion, when talking about ages, they should be integers, not floats.


# Exercise 1

We will implement the changes we identified. First, we need to correct the issues with the `user_name` variable. As we saw, it has unnecessary leading and trailing spaces and an underscore as a separator between the first and last name. Your goal is to remove the surrounding spaces and then replace the underscore with a space.


In [2]:
user_name = ' mike_reed '
user_name1 = user_name.strip()  # Remove leading and trailing spaces from the original string
user_name1 = user_name1.replace('_', ' ')  # Replace the underscore with a space in the stripped string

print(user_name1)

mike reed
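The two cleaning steps can also be chained into a single expression, which avoids accidentally applying the second step to the uncleaned string. The following is a minimal sketch; the helper name `clean_user_name` is hypothetical and not part of the exercise.

```python
def clean_user_name(raw_name):
    """Strip surrounding whitespace, then replace underscores with spaces."""
    # clean_user_name is a hypothetical helper introduced for illustration.
    return raw_name.strip().replace('_', ' ')


cleaned = clean_user_name(' mike_reed ')
print(repr(cleaned))  # repr() makes any leftover spaces visible
```

Printing with `repr()` is a convenient way to confirm that no leading or trailing spaces remain.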


# Exercise 2

Next, we need to split the updated `user_name` into two substrings to obtain a list containing two values: the string for the first name and the string for the last name.

In [3]:
user_name = 'mike reed'
name_split = user_name.split(' ') # Split the string user_name here

print(name_split)

['mike', 'reed']
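Since the split yields exactly two parts for this name, the result can be unpacked directly into two variables. This is a small sketch that assumes the name contains exactly one space; `first_name` and `last_name` are illustrative names, not variables from the exercise.

```python
user_name = 'mike reed'

# Unpacking raises ValueError if the split does not produce exactly two parts.
first_name, last_name = user_name.split(' ')

print(first_name)
print(last_name)
```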


# Exercise 3

Great! Now we need to work with the variable `user_age`. As we mentioned, it has an incorrect data type. Let's fix this issue by transforming the data type and displaying the final result.

In [4]:
user_age = 32.0
user_age = int(user_age) # Change the data type for the age of a user

print(user_age)

32
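Note that `int()` truncates a float rather than rounding it. For `32.0` this makes no difference, but the sketch below uses a hypothetical age with a fractional part to show how `int()` and `round()` differ.

```python
user_age = 32.9  # hypothetical value with a fractional part

truncated = int(user_age)   # drops the fractional part
rounded = round(user_age)   # rounds to the nearest integer

print(truncated)  # 32
print(rounded)    # 33
```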


# Exercise 4

As we know, data is not always perfect. We need to consider scenarios where the value of `user_age` cannot be converted into an integer. To prevent our system from crashing, we should take precautions in advance.

Write code that tries to convert the `user_age` variable into an integer and assigns the converted value to `user_age_int`. If the attempt fails, display the message: `Please provide your age as a numerical value.`

In [5]:
user_age = '32.0' # Here is the variable that stores the age as a string.

try:
    user_age_int = int(user_age)  # Attempt to convert user_age into an integer
    print('Age as a numerical value')
except ValueError:  # int() raises ValueError for strings it cannot parse, such as '32.0'
    print('Please provide your age as a numerical value.')

Please provide your age as a numerical value.
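The message appears here because `int()` only parses integer literals when given a string, so `int('32.0')` raises `ValueError`. If decimal strings should also be accepted, one option is to convert through `float()` first. The sketch below assumes that behavior is wanted; `parse_age` is a hypothetical helper, not part of the exercise.

```python
def parse_age(raw_age):
    """Return the age as an int, or None if raw_age is not numeric."""
    # parse_age is a hypothetical helper introduced for illustration.
    try:
        # float() accepts '32' and '32.0'; int() then drops the fraction.
        return int(float(raw_age))
    except (TypeError, ValueError, OverflowError):
        # ValueError: non-numeric text or 'nan'; OverflowError: 'inf'.
        return None


for raw in ['32.0', '32', 'thirty-two']:
    age = parse_age(raw)
    if age is None:
        print('Please provide your age as a numerical value.')
    else:
        print(age)
```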


# Exercise 5

Finally, consider that all the favorite categories are stored in uppercase. To fill a new list called `fav_categories_low` with the same categories, but in lowercase, iterate through the values in the `fav_categories` list, modify them, and add the new values to the `fav_categories_low` list. As always, display the final result.

In [6]:
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']
fav_categories_low = []

for category in fav_categories:
    fav_categories_low.append(category.lower())


print(fav_categories_low)

['electronics', 'sport', 'books']
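The same result can be produced with a list comprehension, which builds the new list in one expression. This is an equivalent sketch, not a replacement for the loop the exercise asks for.

```python
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']

# Build the lowercase list directly instead of appending inside a loop.
fav_categories_low = [category.lower() for category in fav_categories]

print(fav_categories_low)
```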


# Exercise 6

We have obtained additional information about the spending habits of our users, including the amount spent in each of their favorite categories. Management is interested in the following metrics:

- Total amount spent by the user.
- Minimum amount spent.
- Maximum amount spent.

We will calculate these values and display them on the screen:

In [7]:
fav_categories_low = ['electronics', 'sport', 'books']
spendings_per_category = [894, 213, 173]

total_amount = sum (spendings_per_category) 
max_amount = max (spendings_per_category) 
min_amount = min (spendings_per_category) 

print(total_amount)
print(max_amount)
print(min_amount)

1280
894
173
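The metrics above report amounts only. To see which category an amount belongs to, the two lists can be paired with `zip()`. The sketch below assumes the lists are aligned by position; `spending_by_category` and `top_category` are illustrative names.

```python
fav_categories_low = ['electronics', 'sport', 'books']
spendings_per_category = [894, 213, 173]

# Pair each category with its amount (assumes both lists share the same order).
spending_by_category = dict(zip(fav_categories_low, spendings_per_category))

# max() over the dictionary keys, ranked by their associated amount.
top_category = max(spending_by_category, key=spending_by_category.get)

print(top_category, spending_by_category[top_category])  # electronics 894
```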


# Exercise 7

The company wants to offer discounts to its loyal customers. Customers who spend a total amount greater than $1500 are considered loyal and will receive a discount.

Our goal is to create a `while` loop that keeps adding purchases until the total amount spent exceeds the target amount. To simulate new purchases, the variable `new_purchase` is assigned a random integer between 30 and 80 (inclusive) in each iteration of the loop. This represents the amount of money spent on a new purchase and needs to be added to the total.

Once the target amount is reached and the `while` loop finishes, the final amount will be displayed.

In [8]:
from random import randint

total_amount_spent = 1280
target_amount = 1500


while total_amount_spent <= target_amount:
    new_purchase = randint(30, 80)  # Generate a random integer between 30 and 80 (inclusive)
    total_amount_spent += new_purchase  # Add the new purchase to the running total

print(total_amount_spent)

1544
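Because the purchases are random, the final total differs from run to run. When a reproducible result is useful, for example while debugging, the generator can be seeded. In the sketch below the seed value `42` is arbitrary, and the `purchases` counter is an addition for illustration.

```python
import random

random.seed(42)  # arbitrary seed, chosen only so repeated runs match

total_amount_spent = 1280
target_amount = 1500
purchases = 0  # illustrative counter, not part of the exercise

while total_amount_spent <= target_amount:
    total_amount_spent += random.randint(30, 80)
    purchases += 1

print(total_amount_spent, purchases)
```

Whatever the seed, the loop needs between 3 and 8 purchases here, and the final total cannot exceed 1580, since the last purchase starts from at most 1500 and adds at most 80.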


# Exercise 8

Now we have all the information about a customer in the format we want. The company's management has asked us to propose a way to summarize all the information about a user. Your goal is to create a formatted string that uses information from the variables `user_id`, `user_name`, and `user_age`.

This is the final string we want to create: `User 32415 is Mike who is 32 years old.`

In [9]:
user_id = '32415'
user_name = ['mike', 'reed']
user_age = 32

user_info = f'User {user_id} is {user_name[0].capitalize()} who is {user_age} years old.'  # capitalize() turns 'mike' into 'Mike'

print(user_info)

User 32415 is Mike who is 32 years old.


As you know, companies collect and store data in a particular way. Store 1 wants to store all information about its customers in a table.

| user_id | user_name | user_age | purchase_category | spending_per_category |
| --- | --- | --- | --- | --- |
| '32415' | 'mike', 'reed' | 32 | 'electronics', 'sport', 'books' | 894, 213, 173 |
| '31980' | 'kate', 'morgan' | 24 | 'clothes', 'shoes' | 439, 390 |

In Python, such a table can be represented as a nested list containing one sublist per user.

Store 1 has created such a table for its users. It is stored in the variable `users`. Each sublist contains the user's ID, first and last name, age, favorite categories, and the amount spent in each category.
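To make the nested-list layout concrete, the sketch below builds the two rows from the table and reads values back by position. The unpacked variable names are illustrative; the data structure itself carries no column names.

```python
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'shoes'], [439, 390]],
]

# Each row can be unpacked into its five fields by position.
user_id, user_name, user_age, categories, spendings = users[0]
print(user_id, user_name[0], user_age)  # 32415 mike 32

# Chained indexing: second user -> category list -> first category.
print(users[1][3][0])  # clothes
```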

# Exercise 9

To calculate the company's revenue, follow these steps.

1. Use `for` to iterate over the `users` list.
2. Extract each user's expense list and sum the values.
3. Update the revenue value with each user's total.

This will give you the total revenue of the company, which you will display at the end.

In [10]:
users = [
    # This is the start of the first sublist
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
        [894, 213, 173]
    ], # This is the end of the first sublist

    # This is the start of the second sublist
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'shoes'],
        [439, 390]
    ] # This is the end of the second sublist
]

revenue = 0

for user in users:
    spendings_list = user[-1]  # Extract the list of expenses for this user
    total_spendings = sum(spendings_list)  # Sum the expenses from all categories to get this user's total
    revenue += total_spendings  # Add this user's total to the revenue


print(revenue)

2109
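The three steps can also be collapsed into a single expression by nesting `sum()` calls. This is an equivalent sketch of the same calculation, assuming as above that the expense list is always the last element of each sublist.

```python
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'], [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'shoes'], [439, 390]],
]

# The inner sum() totals one user's expenses; the outer sum() adds up all users.
revenue = sum(sum(user[-1]) for user in users)

print(revenue)  # 2109
```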


# Exercise 10

Go through the list of users we have provided and display the names of customers under 30 years old.

In [11]:
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
     [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439,
     390]],
    ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'],
     [459, 120, 99]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics',
     'beauty'], [299, 679, 85]],
    ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234,
     329, 243]],
    ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213,
     659, 79]],
    ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'],
     [499, 189, 63]],
    ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'
     ], [259, 549, 109]],
    ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'],
     [329, 189, 329]],
    ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'
     ], [189, 299, 579]],
    ]

users_filtered = [] 

for user in users:
    if user[2] < 30:  # Index 2 holds the age; "under 30" excludes 30 itself
        users_filtered.append(user[1][0])  # Keep only the first name

for user in users_filtered: 
    print(user) 

kate
samantha
emily
jose
james


# Exercise 11

Let's combine Exercises 9 and 10 and print the names of users who are under 30 years old and have a total spending greater than $1000.

In [12]:
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
     [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439,
     390]],
    ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'],
     [459, 120, 99]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics',
     'beauty'], [299, 679, 85]],
    ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234,
     329, 243]],
    ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213,
     659, 79]],
    ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'],
     [499, 189, 63]],
    ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'
     ], [259, 549, 109]],
    ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'],
     [329, 189, 329]],
    ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'
     ], [189, 299, 579]],
    ]

users_filtered = []

for user in users:
    spendings_list = user[-1]  # The last element holds the amounts spent per category
    total_spendings = sum(spendings_list)
    if user[2] < 30 and total_spendings > 1000:  # Under 30 and more than $1000 in total
        users_filtered.append(user[1][0])

for user in users_filtered:
    print(user)

samantha
james


# Exercise 12

Now, we will display the name and age of all users who have bought clothing. Print the name and age in the same print statement.

In [13]:
users = [
    ['32415', ['mike', 'reed'], 32, ['electronics', 'sport', 'books'],
     [894, 213, 173]],
    ['31980', ['kate', 'morgan'], 24, ['clothes', 'books'], [439,
     390]],
    ['32156', ['john', 'doe'], 37, ['electronics', 'home', 'food'],
     [459, 120, 99]],
    ['32761', ['samantha', 'smith'], 29, ['clothes', 'electronics',
     'beauty'], [299, 679, 85]],
    ['32984', ['david', 'white'], 41, ['books', 'home', 'sport'], [234,
     329, 243]],
    ['33001', ['emily', 'brown'], 26, ['beauty', 'home', 'food'], [213,
     659, 79]],
    ['33767', ['maria', 'garcia'], 33, ['clothes', 'food', 'beauty'],
     [499, 189, 63]],
    ['33912', ['jose', 'martinez'], 22, ['sport', 'electronics', 'home'
     ], [259, 549, 109]],
    ['34009', ['lisa', 'wilson'], 35, ['home', 'books', 'clothes'],
     [329, 189, 329]],
    ['34278', ['james', 'lee'], 28, ['beauty', 'clothes', 'electronics'
     ], [189, 299, 579]],
    ]

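users_filtered = []
category = 'clothes'

for user in users:
    if category in user[3]:  # Check every favorite category, not only the first one
        users_filtered.append([user[1][0], user[2]])  # Store [first name, age]

for name, age in users_filtered:
    print(name, age)  # Name and age in the same print statement

kate 24
samantha 29
maria 33
lisa 35
james 28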
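When testing whether a user bought clothing, the position of `'clothes'` within the category list should not matter. The sketch below contrasts comparing only the first element with the `in` membership test, using the category list of one user from the table.

```python
categories = ['home', 'books', 'clothes']  # category list of user '34009'

first_only = categories[0] == 'clothes'  # looks at the first category only
anywhere = 'clothes' in categories       # looks at every category

print(first_only)  # False
print(anywhere)    # True
```

The `in` operator scans the whole list, so it also finds users whose clothing purchases are listed second or third.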
