# 🧪 Exercise: Summarizing Categorical Data with pandas

In this interactive exercise, you will practice **summarizing categorical data** using pandas.

You will learn how to:
- Count categorical values
- Group data by categories
- Convert variables to categorical data types
- Explore relationships using cross-tabulation

👉 Try to solve each step **before expanding the hint**.
At the end, a **collapsed full solution** is provided for self-checking.

In [None]:
import numpy as np
import pandas as pd

## Step 0: Load and inspect the dataset

**Task**:
- Load the `mtcars` dataset from the CSV file
- Rename the columns to `car_names`, `mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`
- Set `car_names` as the index
- Display the first 15 rows

<details>
<summary>💡 Hint</summary>

- Use `pd.read_csv()`
- Assign column names using `cars.columns = [...]`
- Set the index using `cars.index = cars.car_names`
- Display rows with `head(15)`
</details>
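
If the loading pattern is new to you, here is a minimal sketch on made-up data rather than `mtcars`. The `pets` table, its column names, and the in-memory CSV text are all invented for illustration; your real task reads from a file instead.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "name,kind,legs\nRex,dog,4\nTweety,bird,2\nNemo,fish,0\n"

pets = pd.read_csv(io.StringIO(csv_text))

# Replace the names from the header row with our own labels.
pets.columns = ["pet_name", "species", "leg_count"]

# Use one column's values as the row labels; the column itself is kept.
pets.index = pets.pet_name

# Preview the first two rows.
print(pets.head(2))
```

Assigning to `pets.index` keeps `pet_name` as a regular column too, which is the same behavior you will see in your own solution.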

In [None]:
# YOUR CODE HERE

## Step 1: Count categorical values

The `carb` column represents the **number of carburetors** per car.

**Task**:
- Extract the `carb` column
- Count how many cars fall into each carburetor category

<details>
<summary>💡 Hint</summary>

- Access a column using `cars.carb`
- Use the `value_counts()` method
</details>
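
To see what `value_counts()` returns before you apply it to `carb`, here is a small sketch on an invented `pets` DataFrame (not part of the course data).

```python
import pandas as pd

# Made-up data for illustration only.
pets = pd.DataFrame({"species": ["dog", "cat", "dog", "bird", "dog", "cat"]})

# value_counts() returns a Series: index = distinct values, values = frequencies,
# sorted from most to least frequent by default.
species_counts = pets.species.value_counts()
print(species_counts)  # dog 3, cat 2, bird 1
```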

In [None]:
# YOUR CODE HERE

## Step 2: Create a categorical subset

**Task**:
- Create a new DataFrame called `cars_cat`
- Include only the following columns:
  - `cyl`, `vs`, `am`, `gear`, `carb`
- Display the first few rows

<details>
<summary>💡 Hint</summary>

- Use double square brackets: `cars[[...]]`
- Call `head()` to preview
</details>
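
The difference between single and double square brackets is easy to miss, so here is a brief sketch on invented data. The `pets` table and its columns are hypothetical.

```python
import pandas as pd

# Made-up data for illustration only.
pets = pd.DataFrame({
    "species": ["dog", "cat", "bird"],
    "legs": [4, 4, 2],
    "weight_kg": [30.0, 4.5, 0.1],
})

# A list of column names inside the brackets selects several columns
# and always returns a DataFrame.
subset = pets[["species", "legs"]]

# A single name returns a Series instead.
one_column = pets["species"]

print(type(subset).__name__, type(one_column).__name__)
```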

In [None]:
# YOUR CODE HERE

## Step 3: Group and describe categorical data

**Task**:
- Group `cars_cat` by the `gear` column
- Generate a statistical description for each group

<details>
<summary>💡 Hint</summary>

- Use the `groupby()` method
- Chain or follow with `describe()`
</details>
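
The output of a grouped `describe()` can look dense at first. The sketch below uses an invented `pets` table to show how it is laid out: one row per group, and a two-level column index of (original column, statistic).

```python
import pandas as pd

# Made-up data for illustration only.
pets = pd.DataFrame({
    "species": ["dog", "dog", "cat", "cat", "cat"],
    "weight_kg": [30.0, 20.0, 4.0, 5.0, 6.0],
})

# Split the rows by species, then summarize every remaining numeric column.
summary = pets.groupby("species").describe()

# Pick one statistic with a (column, statistic) tuple.
print(summary[("weight_kg", "mean")])  # cat 5.0, dog 25.0
```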

In [None]:
# YOUR CODE HERE

## Step 4: Convert a variable to categorical type

**Task**:
- Create a new column in `cars` called `group`
- Fill it with the `gear` values converted to a categorical data type
- Check the data type of the new column

<details>
<summary>💡 Hint</summary>

- Use `pd.Series()` with `dtype='category'`
- Check types using `.dtypes`
</details>
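
Here is a short sketch of the conversion on invented data. It uses the `pd.Series(..., dtype='category')` form from the hint and also shows `astype('category')`, an alternative spelling that this exercise does not require. The `pets` table and the `legs_cat` column name are hypothetical.

```python
import pandas as pd

# Made-up data for illustration only.
pets = pd.DataFrame({"legs": [4, 2, 4, 0]})

# Build a categorical copy of an existing numeric column.
pets["legs_cat"] = pd.Series(pets.legs, dtype="category")

# An alternative way to get the same result.
alt = pets["legs"].astype("category")

print(pets["legs_cat"].dtype)           # category
print(pets["legs_cat"].cat.categories)  # the distinct values: 0, 2, 4
```

The underlying numbers do not change. pandas now treats them as labels from a fixed set rather than as quantities.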

In [None]:
# YOUR CODE HERE

## Step 5: Summarize the categorical distribution

**Task**:
- Count how many cars belong to each `group` category

<details>
<summary>💡 Hint</summary>

- Select the column first
- Call `value_counts()`
</details>
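
One detail to be aware of when counting a categorical column: `value_counts()` reports every declared category, including those with no rows. The sketch below shows this on an invented `sizes` Series where the `medium` category is declared but never used. This situation may not arise in your `group` column.

```python
import pandas as pd

# Made-up data: three allowed categories, only two of which occur.
size_type = pd.CategoricalDtype(categories=["small", "medium", "large"])
sizes = pd.Series(["small", "large", "small"], dtype=size_type)

counts = sizes.value_counts()
print(counts)  # small 2, large 1, medium 0
```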

In [None]:
# YOUR CODE HERE

## Step 6: Cross-tabulation of categorical variables

Cross-tabulation helps analyze **relationships between two categorical variables**.

**Task**:
- Create a crosstab showing transmission type (`am`) vs number of gears (`gear`)

<details>
<summary>💡 Hint</summary>

- Use `pd.crosstab(row_variable, column_variable)`
</details>
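
To make the shape of a crosstab concrete, here is a minimal sketch on invented data. The `pets` table and its `indoor` flag are hypothetical. Each cell holds the number of rows that share that pair of values.

```python
import pandas as pd

# Made-up data for illustration only.
pets = pd.DataFrame({
    "species": ["dog", "dog", "cat", "cat", "cat"],
    "indoor": [0, 1, 1, 1, 0],
})

# First argument becomes the rows, second becomes the columns.
table = pd.crosstab(pets["species"], pets["indoor"])
print(table)

# Look up one cell: how many cats are indoor pets?
print(table.loc["cat", 1])  # 2
```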

In [None]:
# YOUR CODE HERE

---
## ✅ Collapsed Full Solution (Self-Check)

<details>
<summary>📘 Click to expand solution</summary>

```python
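import pandas as pd

# Path used in the course workspace; adjust it to wherever mtcars.csv lives for you.
address = '/workspaces/python-for-data-science-and-machine-learning-essential-training-part-1-3006708/data/mtcars.csv'

# Step 0: load the data, rename the columns, and index the rows by car name
cars = pd.read_csv(address)
cars.columns = ['car_names','mpg','cyl','disp','hp','drat','wt','qsec','vs','am','gear','carb']
cars.index = cars.car_names
cars.head(15)

# Step 1: count cars per carburetor value
carb = cars.carb
carb.value_counts()

# Step 2: keep only the categorical columns
cars_cat = cars[['cyl', 'vs', 'am', 'gear', 'carb']]
cars_cat.head()

# Step 3: describe each gear group
gears_group = cars_cat.groupby('gear')
gears_group.describe()

# Step 4: store gear as a categorical column and check its dtype
cars['group'] = pd.Series(cars.gear, dtype='category')
cars['group'].dtypes

# Step 5: count cars per group category
cars['group'].value_counts()

# Step 6: cross-tabulate transmission type against number of gears
pd.crosstab(cars['am'], cars['gear'])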
```
</details>