## Exercise 1
#### The df DataFrame is given below:

#### Extract columns of object type from this DataFrame. Then fill in all the missing values for these columns with the value 'empty'.

#### Assign the result to the df_object variable and print it to the console.

In [5]:
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


data = {
    'size': ['XL', 'L', 'M', np.nan, 'M', 'M'],
    'color': ['red', 'green', 'blue', 'green', 'red', 'green'],
    'gender': ['female', 'male', np.nan, 'female', 'female', 'male'],
    'price': [199.0, 89.0, np.nan, 129.0, 79.0, 89.0],
    'weight': [500, 450, 300, np.nan, 410, np.nan],
    'bought': ['yes', 'no', 'yes', 'no', 'yes', 'no']
}

df = pd.DataFrame(data=data)

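# Keep only the columns stored with the object dtype (here: the string columns).
df_object = df.select_dtypes(include='object')

# Replace every missing value with the constant string 'empty'.
imputer = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value='empty')

# fit_transform returns a NumPy array, so wrap it in a DataFrame again.
df_object = pd.DataFrame(imputer.fit_transform(df_object), columns=df_object.columns)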

print(df_object)

    size  color  gender bought
0     XL    red  female    yes
1      L  green    male     no
2      M   blue   empty    yes
3  empty  green  female     no
4      M    red  female    yes
5      M  green    male     no


#### Notes:

- Remember that the imputer's `fit_transform` returns a NumPy array, so the result has to be converted back into a DataFrame.
- Another way would be `df_object = df.select_dtypes(include=['object']).fillna('empty')`.
    - Use Method 1 (SimpleImputer) when:
        - You need more flexibility or plan to integrate this step into a machine learning pipeline.
        - You may want to switch to more complex imputation strategies later.
    - Use Method 2 (fillna) when:
        - You need a quick, straightforward way to fill missing values in object columns.
        - Performance and simplicity are priorities, especially when working outside a machine learning context.
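
The `fillna` alternative from the notes above can be sketched as follows. The small frame `df_demo` is a hypothetical stand-in for the exercise data, and its string columns are created with an explicit `object` dtype (an assumption made here so that `select_dtypes(include='object')` selects them regardless of pandas defaults).

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the exercise data (names chosen for illustration).
df_demo = pd.DataFrame({
    'size': pd.Series(['XL', np.nan, 'M'], dtype=object),
    'gender': pd.Series(['female', 'male', np.nan], dtype=object),
    'price': [199.0, np.nan, 79.0],
})

# Select the object columns and fill their missing values in one chained call.
# The result is already a DataFrame and keeps the original index and column names.
df_demo_object = df_demo.select_dtypes(include='object').fillna('empty')

print(df_demo_object)
print(df_demo_object.isna().sum().sum())  # number of missing values left
```

The numeric `price` column is not selected, so its missing value stays untouched in `df_demo`.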

## Exercise 2

#### Discretize the weight column. Divide the values of this column into three intervals of equal width. Assign the result to a new column 'weight_cut' as shown below.

#### In response, print the df object to the console.

In [6]:
df = pd.DataFrame(data={'weight': [75., 78.5, 85., 91., 84.5, 83., 68.]})

df['weight_cut'] = pd.cut(df['weight'], bins=3)
print(df)

   weight        weight_cut
0    75.0  (67.977, 75.667]
1    78.5  (75.667, 83.333]
2    85.0    (83.333, 91.0]
3    91.0    (83.333, 91.0]
4    84.5    (83.333, 91.0]
5    83.0  (75.667, 83.333]
6    68.0  (67.977, 75.667]
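
To see where the interval edges in the output above come from, `pd.cut` can also return the edges it computed through its `retbins` argument. The sketch below assumes the same weights as the exercise; the names `weights`, `binned`, `edges`, and `width` are chosen for illustration only.

```python
import numpy as np
import pandas as pd

weights = pd.Series([75., 78.5, 85., 91., 84.5, 83., 68.])

# retbins=True returns the binned values together with the edges pandas computed.
binned, edges = pd.cut(weights, bins=3, retbins=True)

# Equal width: (max - min) / number of bins.
width = (weights.max() - weights.min()) / 3

print(edges)  # four edges delimit three intervals
print(width)  # roughly 7.667

# pandas lowers the first edge by 0.1% of the range so that the minimum
# value (68.0) falls inside the first right-closed interval.
print(edges[0] < weights.min())
```

This is why the first interval starts at 67.977 rather than at 68.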


#### Notes:

- The `pd.cut()` function segments and sorts data values into discrete bins (intervals). This is particularly useful when you need to transform continuous numerical data into categorical data.
- Main arguments:
    - x: The input array or Series to be binned.
    - bins: Defines the bins. It can be an integer (the number of equal-width bins) or a sequence of scalars (the exact bin edges).
    - right: Indicates whether each bin includes its right edge (default is True). With right=False the bins include the left edge instead.
    - labels: Used to label the resulting bins. If not provided, the bins are returned as intervals.
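
A minimal sketch of how `right` and `labels` change the result, using a few hypothetical weights that sit exactly on bin edges. Passing `labels=False` makes `pd.cut` return integer bin positions instead of intervals, which makes the difference easy to read.

```python
import pandas as pd

weights = pd.Series([75.0, 80.0, 68.0])  # hypothetical values on or near the edges
edges = [60, 75, 80, 95]

# Default right=True: bins are (60, 75], (75, 80], (80, 95].
codes_right = pd.cut(weights, bins=edges, labels=False)

# right=False: bins are [60, 75), [75, 80), [80, 95).
codes_left = pd.cut(weights, bins=edges, right=False, labels=False)

print(codes_right.tolist())  # [0, 1, 0]
print(codes_left.tolist())   # [1, 2, 0]
```

A value equal to an edge, such as 75.0, lands in the lower bin when `right=True` and in the upper bin when `right=False`.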

## Exercise 3

#### Discretize the column weight into three intervals with the given boundaries:

- (60, 75]

- (75, 80]

- (80, 95]

#### Assign the result to the new column 'weight_cut' as shown below.

#### In response, print the df DataFrame to the console.

In [10]:
df = pd.DataFrame(data={'weight': [75., 78.5, 85., 91., 84.5, 83., 68.]})

df['weight_cut'] = pd.cut(df['weight'], bins=[60, 75, 80, 95])
print(df)

   weight weight_cut
0    75.0   (60, 75]
1    78.5   (75, 80]
2    85.0   (80, 95]
3    91.0   (80, 95]
4    84.5   (80, 95]
5    83.0   (80, 95]
6    68.0   (60, 75]
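
With explicit edges, values that fall outside every interval are returned as missing. The sketch below uses hypothetical weights to show this, together with the `include_lowest` argument of `pd.cut`, which makes the first interval include its left edge.

```python
import pandas as pd

weights = pd.Series([60.0, 75.0, 95.0, 101.0])  # hypothetical values
edges = [60, 75, 80, 95]

# 60.0 sits on the open left edge of (60, 75] and 101.0 is above 95,
# so both become NaN.
cut_default = pd.cut(weights, bins=edges)

# include_lowest=True closes the first interval on the left, so 60.0 is kept.
cut_lowest = pd.cut(weights, bins=edges, include_lowest=True)

print(cut_default.isna().tolist())  # [True, False, False, True]
print(cut_lowest.isna().tolist())   # [False, False, False, True]
```

Checking for missing values after binning is a quick way to spot edges that do not cover the whole range of the data.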


## Exercise 4

#### Discretize the column weight into three intervals with the given boundaries:

- (60, 75]

- (75, 80]

- (80, 95]

#### and bound to them the following labels:

- light

- normal

- heavy

#### Assign the result to the new column 'weight_cut' as shown below.

#### In response, print the df DataFrame to the console.

In [11]:
df = pd.DataFrame(data={'weight': [75., 78.5, 85., 91., 84.5, 83., 68.]})

df['weight_cut'] = pd.cut(df['weight'], bins=[60, 75, 80, 95], labels=['light', 'normal', 'heavy'])
print(df)

   weight weight_cut
0    75.0      light
1    78.5     normal
2    85.0      heavy
3    91.0      heavy
4    84.5      heavy
5    83.0      heavy
6    68.0      light
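
The labelled column is an ordered categorical, so the labels can be counted and compared by rank rather than alphabetically. A short sketch, rebuilding the same frame under the hypothetical name `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame({'weight': [75., 78.5, 85., 91., 84.5, 83., 68.]})
df_demo['weight_cut'] = pd.cut(
    df_demo['weight'],
    bins=[60, 75, 80, 95],
    labels=['light', 'normal', 'heavy'],
)

# pd.cut produces an ordered categorical by default: light < normal < heavy.
is_ordered = df_demo['weight_cut'].cat.ordered

# Number of rows per label.
counts = df_demo['weight_cut'].value_counts().to_dict()

# Ordered categories support comparisons against one of their labels.
at_least_normal = df_demo['weight_cut'] >= 'normal'

print(is_ordered)             # True
print(counts)
print(at_least_normal.sum())  # 5
```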
