<a href="https://colab.research.google.com/github/c-marq/CAP3321C-Data-Wrangling/blob/main/exercises/chapter-07/exercise_7_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise 7-2: Prepare the Cars Data

**CAP3321C - Data Wrangling**

---

## Overview

In this exercise, you'll prepare the Cars data. You'll use a lambda expression and a user-defined function to add columns, drop columns you no longer need, and then filter the data, set a multi-level index, and unstack it.

**Group Members:**
- Name 1:
- Name 2:
- Name 3:
- Name 4:

---

## Read the Data

Run these cells to load the data.

In [None]:
import pandas as pd

In [None]:
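# Download the data file from GitHub
!wget -q https://raw.githubusercontent.com/c-marq/CAP3321C-Data-Wrangling/main/data/cars.csv
import os
# Report success only if the file actually exists (wget -q prints nothing on failure)
print("Data file downloaded successfully!" if os.path.exists('cars.csv') else "Download failed: cars.csv not found.")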

In [None]:
# Load the cars data
cars = pd.read_csv('cars.csv')
print("Data shape:", cars.shape)

### Task 4: Display the First Five Rows (YOUR CODE)

Display the first five rows of the DataFrame.

**Expected output:** Columns include car_ID, symboling, CarName, fueltype, etc.

In [None]:
# YOUR CODE HERE - display the first 5 rows


---

## Part 1: Add and Drop Columns

Practice creating new columns using lambda expressions and user-defined functions.

### Task 5: Display Unique CarName Values (YOUR CODE)

Display the unique values for the `CarName` column.

**Hint:** Use the `unique()` method.

**Expected output:** Array of unique car names like 'alfa-romero giulia', 'audi 100 ls', etc.
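Here is a minimal sketch of `unique()` on a few made-up rows (not taken from cars.csv). It also shows `nunique()`, which returns only the count of distinct values:

```python
import pandas as pd

# Hypothetical sample rows standing in for cars.csv
df = pd.DataFrame({
    'CarName': ['audi 100 ls', 'audi fox', 'audi 100 ls', 'volkswagen dasher'],
})

# unique() returns the distinct values in order of first appearance
names = df['CarName'].unique()
print(names)

# nunique() returns just the number of distinct values
print(df['CarName'].nunique())
```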

In [None]:
# YOUR CODE HERE - display unique CarName values


### Task 6: Add Brand Column Using Lambda (YOUR CODE)

Add a column named `brand` that stores the brand of the car. To do that, use a **lambda expression** that calls the `split()` method to access portions of the strings in the `CarName` column.

**Hint:** The brand is the first word in CarName. Use `split(' ')[0]` to get the first word.

**Example syntax:**
```python
df['brand'] = df.apply(lambda x: x.CarName.split(' ')[0], axis=1)
```

**Expected output:** New 'brand' column containing just the brand name (e.g., 'alfa-romero', 'audi')
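The sketch below runs the hint's lambda on three hypothetical `CarName` strings. It also shows an equivalent form that applies the lambda to the single column rather than row by row. The sample values are illustrative only.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'CarName': ['alfa-romero giulia', 'audi 100 ls', 'volkswagen dasher']})

# Row-wise apply: the lambda receives each row as a Series, so x.CarName is a string
df['brand'] = df.apply(lambda x: x.CarName.split(' ')[0], axis=1)

# Equivalent: apply the lambda to one column; x is then the string itself
df['brand_alt'] = df['CarName'].apply(lambda x: x.split(' ')[0])

print(df)
```

The column-level form skips building a Series for every row. Either form satisfies the task's requirement to use a lambda.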

In [None]:
# YOUR CODE HERE - add brand column using lambda


### Task 7: Add Model Column Using a User-Defined Function (YOUR CODE)

Add a column named `model` that stores the car name without the brand. To do that, apply a **user-defined function** to each row.

**Hint:** 
1. First, write a function that takes a row and returns everything after the first word in CarName
2. Use `split(' ')[1:]` to get all words after the first, then `' '.join()` to combine them

**Example syntax:**
```python
def get_model(row):
    words = row.CarName.split(' ')[1:]  # Get all words except first
    return ' '.join(words)              # Join them with spaces

df['model'] = df.apply(get_model, axis=1)
```

**Expected output:** New 'model' column containing the model name (e.g., 'giulia', '100 ls')
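Here is the same user-defined function applied to a small, made-up DataFrame. The third name, 'vw', is a hypothetical one-word entry. It shows what happens when nothing follows the brand: the slice is empty, so the function returns an empty string instead of raising an error.

```python
import pandas as pd

# Hypothetical sample data, including a one-word name
df = pd.DataFrame({'CarName': ['alfa-romero giulia', 'audi 100 ls', 'vw']})

def get_model(row):
    """Return every word after the first one in row.CarName."""
    words = row.CarName.split(' ')[1:]   # drop the brand (first word)
    return ' '.join(words)               # rejoin the remaining words with single spaces

df['model'] = df.apply(get_model, axis=1)
print(df)
```

If any names contain repeated spaces, `split()` with no argument collapses runs of whitespace. In contrast, `split(' ')` leaves empty strings in the list.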

In [None]:
# YOUR CODE HERE - write the function and add model column


In [None]:
# Verify brand and model columns
cars[['CarName', 'brand', 'model']].head(10)

### Task 8: Drop the CarName and car_ID Columns (YOUR CODE)

Drop the `CarName` and `car_ID` columns since we now have the brand and model columns.

**Hint:** Use `drop(columns=[...])` with a list of column names.

**Example syntax:**
```python
df = df.drop(columns=['col1', 'col2'])
```

**Expected output:** DataFrame without CarName and car_ID columns
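A minimal sketch with made-up values. Because `drop()` returns a new DataFrame, you must reassign the result to keep the change.

```python
import pandas as pd

# Hypothetical sample data with the columns the task mentions
df = pd.DataFrame({
    'car_ID': [1, 2],
    'CarName': ['audi 100 ls', 'audi fox'],
    'brand': ['audi', 'audi'],
    'model': ['100 ls', 'fox'],
})

# drop() does not modify df in place, so reassign the result
df = df.drop(columns=['CarName', 'car_ID'])
print(df.columns.tolist())   # ['brand', 'model']
```

Re-running a cell like this in a notebook raises a `KeyError` because the columns are already gone. Passing `errors='ignore'` to `drop()` skips labels that don't exist.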

In [None]:
# YOUR CODE HERE - drop CarName and car_ID columns


In [None]:
# Verify columns were dropped
cars.head()

---

## Part 2: Set an Index and Unstack the Data

Practice filtering, indexing, and reshaping data.

### Task 9: Filter the Cars Data (YOUR CODE)

Filter the cars DataFrame by:
1. Keeping only the rows for the **Volkswagen** brand
2. Keeping only the columns named `model`, `horsepower`, `carbody`, and `doornumber`

Assign the new DataFrame to a variable named `cars_filtered` and display the first five rows.

**Hint:** 
- Use `query()` to filter rows
- Use `[[column_list]]` to select columns
- Note: Match the brand exactly as it is spelled in the data (for example, 'volkswagen', not 'Volkswagen'). Check the unique values from Task 5 to confirm the spelling.

**Example syntax:**
```python
cars_filtered = cars.query('brand == "volkswagen"')[['col1', 'col2', 'col3', 'col4']]
```

**Expected output:** DataFrame with only Volkswagen cars and 4 columns
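The sketch below uses a tiny made-up DataFrame (the horsepower values are invented, not taken from cars.csv). It shows that the `query()` version and a boolean-mask version with `loc[]` select the same rows and columns.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration
cars = pd.DataFrame({
    'brand': ['audi', 'volkswagen', 'volkswagen'],
    'model': ['100 ls', 'dasher', 'rabbit'],
    'horsepower': [100, 80, 70],
    'carbody': ['sedan', 'sedan', 'wagon'],
    'doornumber': ['four', 'two', 'four'],
})
cols = ['model', 'horsepower', 'carbody', 'doornumber']

# query() filters rows with a string expression; [[...]] then selects columns
via_query = cars.query('brand == "volkswagen"')[cols]

# Equivalent boolean-mask version using loc[rows, columns]
via_mask = cars.loc[cars['brand'] == 'volkswagen', cols]

print(via_query.head())
```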

In [None]:
# YOUR CODE HERE - filter for Volkswagen and select columns


### Task 10: Set an Index on cars_filtered (YOUR CODE)

Set an index on the `model`, `carbody`, and `doornumber` columns of the `cars_filtered` DataFrame. Assign the new DataFrame to a variable named `cars_indexed` and display it.

**Hint:** Use `set_index()` with a list of column names.

**Example syntax:**
```python
cars_indexed = cars_filtered.set_index(['col1', 'col2', 'col3'])
```

**Expected output:** DataFrame with a 3-level hierarchical index
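A sketch of `set_index()` on made-up rows. It also prints two index attributes that are worth checking before you reshape: `nlevels` and `is_unique`. A unique index matters for the next task, because `unstack()` cannot reshape data when two rows share the same index entry.

```python
import pandas as pd

# Hypothetical filtered data; horsepower values are invented
cars_filtered = pd.DataFrame({
    'model': ['dasher', 'dasher', 'rabbit'],
    'horsepower': [88, 88, 90],
    'carbody': ['sedan', 'wagon', 'sedan'],
    'doornumber': ['four', 'four', 'two'],
})

# Move three columns into a hierarchical (MultiIndex) row index
cars_indexed = cars_filtered.set_index(['model', 'carbody', 'doornumber'])
print(cars_indexed)
print(cars_indexed.index.nlevels)    # 3
print(cars_indexed.index.is_unique)  # True when no two rows share the same index entry
```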

In [None]:
# YOUR CODE HERE - set index on model, carbody, doornumber


### Task 11: Unstack the carbody Index Level (YOUR CODE)

Unstack the `carbody` level of the index. Note how the carbody values (sedan and wagon) become columns, and how NaN fills the wagon column for any model and door-number combination that has no wagon row.

**Hint:** Pass the name of the index level to `unstack()`.

**Example syntax:**
```python
df_unstacked = df.unstack('level_name')
```

**Expected output:** Wide DataFrame with sedan and wagon as column headers under horsepower
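Here is a minimal sketch of `unstack()` on a made-up three-level index. The 'rabbit' row has no wagon counterpart, so that cell becomes NaN, and the values turn into floats so the column can hold NaN. The second part shows the `ValueError` that `unstack()` raises when an index entry is duplicated. This is an illustration of what you might see if your filtered data contains repeated model/body/door combinations, not a claim about the actual cars.csv rows.

```python
import pandas as pd

# Hypothetical indexed data; horsepower values are invented
cars_indexed = pd.DataFrame({
    'model': ['dasher', 'dasher', 'rabbit'],
    'carbody': ['sedan', 'wagon', 'sedan'],
    'doornumber': ['four', 'four', 'two'],
    'horsepower': [88, 88, 90],
}).set_index(['model', 'carbody', 'doornumber'])

# Move the 'carbody' level from the row index into the columns
wide = cars_indexed.unstack('carbody')
print(wide)

# Duplicated index entries cannot be reshaped
dup = cars_indexed.iloc[[0, 0]]   # the same index entry twice
error_message = None
try:
    dup.unstack('carbody')
except ValueError as err:
    error_message = str(err)
print('unstack failed:', error_message)
```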

In [None]:
# YOUR CODE HERE - unstack the carbody column


---

## Summary

In this exercise, you practiced data preparation techniques:

**Adding Columns:**
- Lambda expressions with `apply()` and string methods
- User-defined functions with `apply()`

**Dropping Columns:**
- `drop(columns=[...])` - Remove unnecessary columns

**Filtering Data:**
- `query()` - Filter rows based on conditions
- `[[column_list]]` - Select specific columns

**Working with Indexes:**
- `set_index()` - Create hierarchical indexes
- `unstack()` - Pivot index levels to columns (creates NaN where data doesn't exist)

---

**Submission:** Save this notebook and submit to Canvas before the deadline.