As part of the analytics team, your first task is to assess the quality of a sample of collected data and prepare it for further analysis. In the second part of this project, during the second sprint, you'll build on these skills and conduct your first full analysis in response to the client's needs.

This is the data the client provided us. It is formatted as a Python list, with the following data columns:

- **user_id:** Unique identifier for each user.
- **user_name:** The user's first name.
- **user_age:** The user's age.
- **fav_categories:** Favorite categories of items the user purchased, such as 'ELECTRONICS', 'SPORT', and 'BOOKS'.
- **total_spendings:** A list of integers indicating the total amount spent on each of the favorite categories.

In [None]:
users = [
    ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]],
    ['31980', 'kate morgan', 24.0, ['CLOTHES', 'BOOKS'], [439, 390]],
    ['32156', ' john doe ', 37.0, ['ELECTRONICS', 'HOME', 'FOOD'], [459, 120, 99]],
    ['32761', 'SAMANTHA SMITH', 29.0, ['CLOTHES',
                                       'ELECTRONICS', 'BEAUTY'], [299, 679, 85]],
    ['32984', 'David White', 41.0, ['BOOKS', 'HOME', 'SPORT'], [234, 329, 243]],
    ['33001', 'emily brown', 26.0, ['BEAUTY', 'HOME', 'FOOD'], [213, 659, 79]],
    ['33767', ' Maria Garcia', 33.0, ['CLOTHES', 'FOOD', 'BEAUTY'], [499, 189, 63]],
    ['33912', 'JOSE MARTINEZ', 22.0, ['SPORT', 'ELECTRONICS', 'HOME'], [259, 549, 109]],
    ['34009', 'lisa wilson ', 35.0, ['HOME', 'BOOKS', 'CLOTHES'], [329, 189, 329]],
    ['34278', 'James Lee', 28.0, ['BEAUTY', 'CLOTHES', 'ELECTRONICS'], [189, 299, 579]],
]
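
Because each record is a plain list, its fields are reached by position. The following is a minimal sketch, assuming the column order listed above; the variable name `record` is only illustrative.

```python
# One record copied from the sample data; column order as described above.
record = ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]]

# Unpack the five columns into named variables.
user_id, user_name, user_age, fav_categories, total_spendings = record

print(type(user_id).__name__)   # str
print(type(user_age).__name__)  # float

# Each spending value lines up with the category at the same position.
print(list(zip(fav_categories, total_spendings)))
```

Checking the type of each field this way is a quick first step before deciding what needs cleaning.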

# Step 1

Store 1 aims to ensure consistency in data collection. As part of this initiative, the quality of the data collected about users must be assessed. You have been asked to review the collected data and suggest changes. Below, you will see data about a specific user; review the data and identify any potential issues.

In [2]:
user_id = '32415'
user_name = ' mike_reed '
user_age = 32.0
fav_categories = ['ELECTRONICS', 'SPORT', 'BOOKS']

**Options:**

1. The `user_id` data type must be changed from string to integer.

2. The `user_name` variable contains a string with unnecessary spaces and an underscore between the first and last names.

3. The `user_age` data type is correct and there is no need to convert it.

4. The `fav_categories` list contains uppercase strings. Instead, we must convert the list values to lowercase.

For each option, write in the markdown box below whether you identified it as a real problem in the data or not. Justify your reasoning. For example, if you believe the first option is correct, write it down and explain why you think it is correct.

**Write your answer and explain your reasoning:**

_Answer:_

`user_id`: Values are usually recorded as strings when data is collected, but for convenience identifiers are generally preferred as integers.

`user_name`: Names are sometimes entered by users with typos or extra characters such as spaces or special characters, and a frontend can also add such characters by mistake. These values must therefore be standardized.

`user_age`: The age is stored as a float (`32.0`), but ages are whole numbers, so it should be converted to an integer.

`fav_categories`: Converting all categories to lowercase should be a data management requirement, although some teams prefer to keep them in uppercase.

# Step 2

Let's implement the changes we identified. First, we need to fix the issues with the `user_name` variable. As we saw, it has unnecessary spaces and an underscore separator between the first and last names. Your goal is to remove the spaces and then replace the underscore with a space.

In [3]:
user_name = ' mike_reed '
user_name = user_name.replace('_', ' ').strip()

print(user_name)

mike reed


**Hint**

There is a method, `strip()`, that can remove spaces at the beginning and end of a string. Additionally, the `replace()` method can be used to replace part of a string. In this case, we want to replace the underscores (`_`) with spaces.
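
To see what each method contributes on its own, the two calls can be run one at a time. This is only a sketch; the intermediate names `raw_name`, `trimmed`, and `clean_name` are illustrative and not part of the exercise.

```python
raw_name = ' mike_reed '

# strip() only trims whitespace at the ends; inner characters are untouched.
trimmed = raw_name.strip()
print(repr(trimmed))     # 'mike_reed'

# replace() swaps every underscore for a space.
clean_name = trimmed.replace('_', ' ')
print(repr(clean_name))  # 'mike reed'

# Strings are immutable, so the original value is unchanged.
print(repr(raw_name))    # ' mike_reed '
```

Both methods return a new string, which is why the result has to be assigned back to a variable.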

# Step 3

Next, we need to split the updated `user_name` into two substrings to obtain a list containing two values: the string for the first name and the string for the last name.

In [4]:
user_name = 'mike reed'
name_split = user_name.split()

print(name_split)

['mike', 'reed']


**Hint**

The `split()` method is used to split a string. By default, it splits on any run of whitespace.
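
Calling `split()` with no argument and calling it with an explicit `' '` give different results when a name contains repeated spaces. A small sketch with a made-up input:

```python
name = 'mike  reed'  # two spaces between the words (hypothetical input)

# No argument: consecutive whitespace counts as a single separator.
parts_default = name.split()
print(parts_default)  # ['mike', 'reed']

# Explicit ' ': every single space is a separator, so an empty string appears.
parts_space = name.split(' ')
print(parts_space)    # ['mike', '', 'reed']
```

For messy names, the no-argument form is usually the safer choice.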

# Step 4

Great! Now we need to work with the `user_age` variable. As we mentioned, it has an incorrect data type. Let's fix this problem by converting the data type and displaying the final result.

In [5]:
user_age = 32.0
user_age = int(user_age)

print(user_age)

32


**Hint**

Which data type drops the fractional part of a number?
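
It helps to know how `int()` treats a float that is not a whole number. The values below are illustrative and do not come from the client data.

```python
print(int(32.0))    # 32
print(int(32.9))    # 32  (truncated, not rounded)
print(int(-32.9))   # -32 (truncation moves toward zero)
print(round(32.9))  # 33  (use round() when rounding is intended)
```

For ages such as `32.0` the two approaches agree, so `int()` is enough here.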

# Step 5

As we know, data isn't always perfect. We must consider scenarios in which the `user_age` value cannot be converted to an integer. To prevent our system from crashing, we must take measures in advance.

Write code that attempts to convert the `user_age` variable to an integer and assigns the converted value to `user_age_int`. If the attempt fails, display this message: `Please provide your age as a numerical value.`

In [None]:
user_age = 'thirty-two'

try:
    # int() raises ValueError when the text is not a whole number
    user_age_int = int(user_age)
except ValueError:
    print('Please provide your age as a numerical value.')
else:
    print(user_age_int)

Please provide your age as a numerical value.


**Hint**

Use a `try-except` block to attempt the conversion; if it fails, provide a clear message indicating that the input must be numeric.
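
If the same check is needed for several values, the `try-except` block can be wrapped in a small function. This is a sketch only: the name `parse_age` and the choice to return `None` on failure are assumptions, not part of the exercise.

```python
def parse_age(value):
    """Return value as an int, or None if it cannot be converted."""
    try:
        return int(value)
    except (ValueError, TypeError):
        # ValueError: text such as 'abc'; TypeError: values such as None.
        print('Please provide your age as a numerical value.')
        return None


print(parse_age(32.0))   # 32
print(parse_age('41'))   # 41
print(parse_age('abc'))  # prints the message, then None
```

Returning `None` lets the calling code decide what to do with a bad value instead of stopping the program.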

# Step 6

Store 1's management team has asked you to help them organize their customer data for better analysis and management.

Your task is to sort this list by user ID in ascending order to make it easier to access and analyze.

In [None]:
users = [
    ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]],
    ['31980', 'kate morgan', 24.0, ['CLOTHES', 'BOOKS'], [439, 390]],
    ['32156', ' john doe ', 37.0, ['ELECTRONICS', 'HOME', 'FOOD'], [459, 120, 99]],
    ['32761', 'SAMANTHA SMITH', 29.0, ['CLOTHES',
                                       'ELECTRONICS', 'BEAUTY'], [299, 679, 85]],
    ['32984', 'David White', 41.0, ['BOOKS', 'HOME', 'SPORT'], [234, 329, 243]],
    ['33001', 'emily brown', 26.0, ['BEAUTY', 'HOME', 'FOOD'], [213, 659, 79]],
    ['33767', ' Maria Garcia', 33.0, ['CLOTHES', 'FOOD', 'BEAUTY'], [499, 189, 63]],
    ['33912', 'JOSE MARTINEZ', 22.0, ['SPORT', 'ELECTRONICS', 'HOME'], [259, 549, 109]],
    ['34009', 'lisa wilson ', 35.0, ['HOME', 'BOOKS', 'CLOTHES'], [329, 189, 329]],
    ['34278', 'James Lee', 28.0, ['BEAUTY', 'CLOTHES', 'ELECTRONICS'], [189, 299, 579]],
]

users.sort()

print(users)

[['31980', 'kate morgan', 24.0, ['CLOTHES', 'BOOKS'], [439, 390]], ['32156', ' john doe ', 37.0, ['ELECTRONICS', 'HOME', 'FOOD'], [459, 120, 99]], ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]], ['32761', 'SAMANTHA SMITH', 29.0, ['CLOTHES', 'ELECTRONICS', 'BEAUTY'], [299, 679, 85]], ['32984', 'David White', 41.0, ['BOOKS', 'HOME', 'SPORT'], [234, 329, 243]], ['33001', 'emily brown', 26.0, ['BEAUTY', 'HOME', 'FOOD'], [213, 659, 79]], ['33767', ' Maria Garcia', 33.0, ['CLOTHES', 'FOOD', 'BEAUTY'], [499, 189, 63]], ['33912', 'JOSE MARTINEZ', 22.0, ['SPORT', 'ELECTRONICS', 'HOME'], [259, 549, 109]], ['34009', 'lisa wilson ', 35.0, ['HOME', 'BOOKS', 'CLOTHES'], [329, 189, 329]], ['34278', 'James Lee', 28.0, ['BEAUTY', 'CLOTHES', 'ELECTRONICS'], [189, 299, 579]]]


**Hint**

You can use the `sort()` method on the user list to sort it in ascending order.
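
`users.sort()` compares records element by element, so it orders by user ID only because the ID is the first item in each record. The IDs are strings and are compared as text, which matches numeric order here because they all have the same number of digits. A hedged alternative, using shortened records for illustration, makes the sort key explicit:

```python
users = [
    ['32415', ' mike_reed ', 32.0],
    ['31980', 'kate morgan', 24.0],
    ['32156', ' john doe ', 37.0],
]

# sorted() returns a new list; the key compares the IDs as numbers.
users_sorted = sorted(users, key=lambda user: int(user[0]))

print([user[0] for user in users_sorted])  # ['31980', '32156', '32415']
print(users[0][0])                         # 32415 (original order is kept)
```

Unlike `sort()`, `sorted()` leaves the original list untouched, which can be useful when the raw data should be preserved.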

# Step 7

We have information about our users' spending habits, including the amount spent on each of their favorite categories. Management is interested in knowing the total amount spent by the user.

Let's calculate this value and display it.

In [8]:
fav_categories_low = ['electronics', 'sport', 'books']
spendings_per_category = [894, 213, 173]

total_amount = sum(spendings_per_category)

print(total_amount)

1280


**Hint**

Which three built-in functions can be applied to a list to calculate its minimum, maximum, and total values?
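
As a quick illustration, the three built-in functions can be applied to the same spending list. The last two lines are an extra sketch, using the illustrative name `top_index`, that looks up the category with the highest spending.

```python
fav_categories_low = ['electronics', 'sport', 'books']
spendings_per_category = [894, 213, 173]

print(min(spendings_per_category))  # 173
print(max(spendings_per_category))  # 894
print(sum(spendings_per_category))  # 1280

# Position of the largest amount, then the category at that position.
top_index = spendings_per_category.index(max(spendings_per_category))
print(fav_categories_low[top_index])  # electronics
```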

# Step 8

Company management asked us to think of a way to summarize all of a user's information. Your goal is to create a formatted string using information from the variables `user_id`, `user_name`, and `user_age`.

This is the final string we want to create: `User 32415 is mike who is 32 years old.`

In [9]:
user_id = '32415'
user_name = ['mike', 'reed']
user_age = 32

user_info = f'User {user_id} is {user_name[0]} who is {user_age} years old.'
print(user_info)

User 32415 is mike who is 32 years old.


**Hint**

To create a string, you can use the `format()` method or an f-string. To extract the first name from the `user_name` list, you can use indexing.
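
Both formatting approaches produce the same text. The sketch below builds the target string each way; the names `with_fstring` and `with_format` are illustrative.

```python
user_id = '32415'
user_name = ['mike', 'reed']
user_age = 32

with_fstring = f'User {user_id} is {user_name[0]} who is {user_age} years old.'
with_format = 'User {} is {} who is {} years old.'.format(user_id, user_name[0], user_age)

print(with_fstring)
print(with_fstring == with_format)  # True

# Indexing returns the element itself; slicing returns a list.
print(user_name[0])   # mike
print(user_name[:1])  # ['mike']
```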

# Step 9

Management also wants an easy way to know the number of customers whose data we have. Your goal is to create a formatted string that displays the number of customers whose data has been recorded.

This is the final string we want to create: "We have recorded data for X customers."

In [None]:
users = [
    ['32415', 'mike_reed', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]],
    ['31980', 'kate morgan', 24.0, ['CLOTHES', 'BOOKS'], [439, 390]],
    ['32156', 'john doe', 37.0, ['ELECTRONICS', 'HOME', 'FOOD'], [459, 120, 99]],
    ['32761', 'SAMANTHA SMITH', 29.0, ['CLOTHES',
                                       'ELECTRONICS', 'BEAUTY'], [299, 679, 85]],
    ['32984', 'David White', 41.0, ['BOOKS', 'HOME', 'SPORT'], [234, 329, 243]],
    ['33001', 'emily brown', 26.0, ['BEAUTY', 'HOME', 'FOOD'], [213, 659, 79]],
    ['33767', 'Maria Garcia', 33.0, ['CLOTHES', 'FOOD', 'BEAUTY'], [499, 189, 63]],
    ['33912', 'JOSE MARTINEZ', 22.0, ['SPORT', 'ELECTRONICS', 'HOME'], [259, 549, 109]],
    ['34009', 'lisa wilson ', 35.0, ['HOME', 'BOOKS', 'CLOTHES'], [329, 189, 329]],
    ['34278', 'James Lee', 28.0, ['BEAUTY', 'CLOTHES', 'ELECTRONICS'], [189, 299, 579]],
]


user_info = f"We have recorded data for {len(users)} customers."
print(user_info)

We have recorded data for 10 customers.


**Hint**

To create a string, you can use the `format()` method or f-string. To extract the number of customers in the list, you can use the function that returns the length of the list.
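
For comparison, the same message can be built with `format()`. The list below is shortened to three one-field records purely for illustration.

```python
users = [['32415'], ['31980'], ['32156']]  # shortened records for illustration

# len() counts the top-level items, that is, one per customer record.
message = 'We have recorded data for {} customers.'.format(len(users))
print(message)  # We have recorded data for 3 customers.
```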

# Step 10

Now let's apply all the changes to the customer list. To simplify things, we'll provide a shorter one.
You must:
1. Remove all leading and trailing spaces from names and replace any underscores with spaces.
2. Convert all ages to integers.
3. Separate all first and last names into a sublist.

Save the modified list as a new list called `users_clean` and display it on the screen.

In [None]:
users = [
    ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]],
    ['31980', 'kate morgan', 24.0, ['CLOTHES', 'BOOKS'], [439, 390]],
    ['32156', ' john doe ', 37.0, ['ELECTRONICS', 'HOME', 'FOOD'], [459, 120, 99]],
]

users_clean = []


# Process the first user
user_name_1 = users[0][1].replace('_', ' ').strip().split()
user_age_1 = int(users[0][2])
users_clean.append([user_name_1, user_age_1])

# Process the second user
user_name_2 = users[1][1].replace('_', ' ').strip().split()
user_age_2 = int(users[1][2])
users_clean.append([user_name_2, user_age_2])

# Process the third user
user_name_3 = users[2][1].replace('_', ' ').strip().split()
user_age_3 = int(users[2][2])
users_clean.append([user_name_3, user_age_3])


print(users_clean)

[[['mike', 'reed'], 32], [['kate', 'morgan'], 24], [['john', 'doe'], 37]]


**Hint**

To process each user, start by accessing the required elements of the `users` list. Use the `strip()` method to remove leading and trailing spaces and the `replace('_', ' ')` method to replace underscores with spaces in names. Convert the age to an integer using `int()`. Separate the full name into first and last names using the `split()` method. Finally, use `append()` to add the cleaned data to the `users_clean` list.
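
The same cleaning steps can also be written once and repeated for every record. The sketch below assumes that `for` loops may be used at this stage, and it keeps all five fields of each record, which goes beyond what the task strictly requires.

```python
users = [
    ['32415', ' mike_reed ', 32.0, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]],
    ['31980', 'kate morgan', 24.0, ['CLOTHES', 'BOOKS'], [439, 390]],
    ['32156', ' john doe ', 37.0, ['ELECTRONICS', 'HOME', 'FOOD'], [459, 120, 99]],
]

users_clean = []
for user_id, name, age, categories, spendings in users:
    # Replace underscores, trim the ends, then split into [first, last].
    name_parts = name.replace('_', ' ').strip().split()
    # append() keeps each cleaned user as its own sublist.
    users_clean.append([user_id, name_parts, int(age), categories, spendings])

print(users_clean[0])
# ['32415', ['mike', 'reed'], 32, ['ELECTRONICS', 'SPORT', 'BOOKS'], [894, 213, 173]]
```

With a loop, the code stays the same length no matter how many users the list contains.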