# Cleaning and Treating Categorical Variables (Exercise)

In this exercise, you will learn how to handle categorical variables in a dataset. You will:
- Handle missing values
- Apply Label Encoding
- Apply One-Hot Encoding

Use the hints provided for guidance and check your work using the solution at the end.

## Step 1: Import Required Libraries
Import `numpy`, `pandas`, and the encoding classes from `sklearn.preprocessing`.

In [None]:
# TODO: Import numpy as np, pandas as pd
# TODO: Import LabelEncoder and OneHotEncoder from sklearn.preprocessing

<details>
<summary>Hint</summary>
Use `import numpy as np` and `import pandas as pd`. From `sklearn.preprocessing`, import `LabelEncoder` and `OneHotEncoder`.
</details>
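Step 5 relies on the `sparse_output` parameter of `OneHotEncoder`, which is available in scikit-learn 1.2 and later (older releases call it `sparse`). A minimal sketch for checking the installed version before you start; the variable name `supports_sparse_output` is only illustrative:

```python
import sklearn

# Parse the major and minor parts of the version string, e.g. "1.4.2" -> (1, 4)
major, minor = (int(part) for part in sklearn.__version__.split('.')[:2])

# sparse_output was introduced in scikit-learn 1.2
supports_sparse_output = (major, minor) >= (1, 2)

print(sklearn.__version__, supports_sparse_output)
```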

## Step 2: Create the Dataset
Create a DataFrame with columns: `names`, `age`, `gender`, `rank`. Include some missing values (`np.nan`) in the `gender` column.

In [None]:
# TODO: Create the dataset as a dictionary
# TODO: Convert it to a DataFrame called df
# TODO: Display df

<details>
<summary>Hint</summary>
Use `pd.DataFrame(data)` to create the DataFrame. Example for gender column: `['Male', 'Male', np.nan, 'Female', np.nan, 'Male', np.nan]`
</details>
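Once the DataFrame exists, it can help to confirm where the missing values are. The sketch below uses a small hypothetical frame (the `city` and `score` columns are illustrative and not part of the exercise) and counts missing cells per column with `isna()`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the exercise dataset
toy = pd.DataFrame({
    'city': ['Oslo', np.nan, 'Lima', np.nan],
    'score': [3, 5, 4, 2],
})

# isna() marks missing cells; sum() counts them per column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# mean() of the same mask gives the fraction missing per column
missing_fraction = toy.isna().mean()
print(missing_fraction)
```

The same two calls should work unchanged on your own `df`.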

## Step 3: Handle Missing Values
Drop the `gender` column. With the suggested data, three of its seven values are missing, and filling them in would mean guessing values that may be wrong.

In [None]:
# TODO: Drop the 'gender' column and display the DataFrame

<details>
<summary>Hint</summary>
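Use `df = df.drop('gender', axis=1)` to drop the column. `axis=1` selects columns rather than rows. With the default `inplace=False`, `drop` returns a new DataFrame and leaves `df` unchanged, so assign the result back.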
</details>
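A common slip is to call `drop` without keeping its return value. This short sketch, again on hypothetical toy data, shows that the original frame is left untouched unless the result is assigned:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the exercise dataset
toy = pd.DataFrame({
    'city': ['Oslo', np.nan, 'Lima'],
    'score': [3, 5, 4],
})

# drop() returns a new DataFrame; toy itself is not modified
trimmed = toy.drop('city', axis=1)

print(list(toy.columns))      # the original still has both columns
print(list(trimmed.columns))  # the returned frame has only 'score'
```

`toy.drop(columns='city')` is an equivalent spelling that avoids the `axis` argument.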

## Step 4: Label Encoding
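Convert the `names` column to numerical labels. `LabelEncoder` maps each distinct value to an integer from 0 to n_classes - 1, assigned in sorted order, so the numbers do not express any ranking.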

In [None]:
# TODO: Create a LabelEncoder object
# TODO: Fit it on df['names']
# TODO: Transform the names column to numeric labels and display the result

<details>
<summary>Hint</summary>
Create `label_encoder = LabelEncoder()`, then use `label_encoder.fit(df['names'])`. Transform using `label_encoder.transform(df['names'])`.
</details>
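To see how the integer codes relate to the original values, here is a minimal sketch on a hypothetical list of colors (not the exercise data). It uses `fit_transform`, which combines the two calls from the hint, and `inverse_transform` to map the codes back:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical values, not the exercise data
colors = ['red', 'green', 'blue', 'green']

encoder = LabelEncoder()
codes = encoder.fit_transform(colors)  # fit and transform in one call

# classes_ holds the learned categories in sorted order;
# each code is the position of the value in classes_
print(encoder.classes_)
print(codes)

# inverse_transform maps the integer codes back to the original strings
restored = encoder.inverse_transform(codes)
print(restored)
```

Because the codes follow alphabetical order, `'blue'` gets 0 here even though it appears third in the input.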

## Step 5: One-Hot Encoding
Convert the `names` column to one-hot encoded columns: one binary column per distinct name, with a 1 in the column that matches the row's value.

In [None]:
# TODO: Create a OneHotEncoder object with sparse_output=False
# TODO: Fit it on df[['names']]
# TODO: Transform df[['names']] to one-hot encoded array
# TODO: Convert the result to a DataFrame and include the original 'names' column

<details>
<summary>Hint</summary>
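Use `OneHotEncoder(sparse_output=False)` to create the encoder. Fit with `encoder.fit(df[['names']])`. Transform using `encoder.transform(df[['names']])`.
Then convert to a DataFrame with `columns=encoder.categories_[0]`. `categories_` is a list holding one array per input column, so index it with `[0]` to get the category names for `names`.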
</details>

## Step 6: Self-Check (Solution)
Click below to check the complete solution.

<details>
<summary>Solution (click to expand)</summary>

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

data = {
    'names': ['steve', 'john', 'richard', 'sarah', 'randy', 'micheal', 'julie'],
    'age': [20, 22, 20, 21, 24, 23, 22],
    'gender': ['Male', 'Male', np.nan, 'Female', np.nan, 'Male', np.nan],
    'rank': [2, 1, 4, 5, 3, 7, 6]
}

df = pd.DataFrame(data)
df = df.drop('gender', axis=1)

# Label Encoding
label_encoder = LabelEncoder()
label_encoder.fit(df['names'])
label_encoded_names = label_encoder.transform(df['names'])
print(label_encoded_names)

# One-Hot Encoding
onehot_encoder = OneHotEncoder(sparse_output=False)
onehot_encoder.fit(df[['names']])
onehot_encoded_names = onehot_encoder.transform(df[['names']])
# categories_ is a list with one array per input column; take the array for 'names'
onehot_encoded_df = pd.DataFrame(onehot_encoded_names, columns=onehot_encoder.categories_[0])
onehot_encoded_df['names'] = df['names']
print(onehot_encoded_df)
```
</details>