# Grouping and Aggregation in Pandas

Grouping and aggregation are fundamental techniques in data analysis. They allow us to explore and summarize datasets by breaking them into meaningful subgroups and then computing statistics for each subgroup.

In this notebook, we will:
- Understand what grouping means conceptually
- Load a real dataset (`mtcars`)
- Group data by a categorical column
- Apply aggregation functions (mean)

This example follows the explanation from the video transcript and demonstrates **grouping data by the values of a column using pandas**.

## Why Grouping Matters

Imagine you are a merchant selling fruit. Your dataset contains information about apples and oranges purchased from different suppliers. If you reduce this dataset to its fundamental subgroups by fruit type, you end up with two groups: apples and oranges.

Grouping allows you to:
- Explore natural subgroups in your data
- Compare subsets of data
- Identify patterns and trends
- Focus analysis on specific categories
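
The fruit scenario can be sketched in a few lines. The supplier names and quantities below are invented for illustration only; they are not part of any dataset used in this notebook.

```python
import pandas as pd

# Hypothetical purchase records; suppliers and quantities are made up.
purchases = pd.DataFrame({
    "fruit": ["apple", "orange", "apple", "orange", "apple"],
    "supplier": ["A", "A", "B", "C", "C"],
    "quantity": [12, 20, 18, 40, 30],
})

# Split the rows by fruit type, then average the quantity within each group.
mean_quantity = purchases.groupby("fruit")["quantity"].mean()
print(mean_quantity)
```

The result has one entry per fruit type, which is the "two groups" idea from the merchant example.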

In this notebook, instead of fruit, we will analyze **cars grouped by the number of cylinders**.

In [None]:
# pandas is the only library this notebook needs
import pandas as pd

## Loading the Dataset

We will use the classic **mtcars** dataset, which contains technical specifications for different car models.

Steps:
1. Define the file path where the CSV file is located
2. Read the CSV file into a pandas DataFrame
3. Assign meaningful column names

⚠️ Note: Update the file path below if your dataset is stored in a different location.

In [None]:
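# Path to the CSV file; update this if your copy is stored elsewhere
address = '/workspaces/python-for-data-science-and-machine-learning-essential-training-part-1-3006708/data/mtcars.csv'

# Read the CSV file into a DataFrame
cars = pd.read_csv(address)

# Assign descriptive column names (the list length must match the number of columns in the file)
cars.columns = ['car_names', 'mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']

# Preview the first five rows
cars.head()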
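
If the CSV file is not at hand, the same loading steps can be tried on a small in-memory stand-in. This is a sketch under assumptions: the three rows are taken from the classic mtcars data, but the header line (`model,...`) is a guess at what the real file contains, and `sample` and `column_names` are names introduced here. It also shows one possible variation, passing `header=0` together with `names=` so that reading and renaming happen in a single call.

```python
import io
import pandas as pd

# Stand-in for mtcars.csv: a header line plus three rows.
# The header text is an assumption; the real file's header may differ.
csv_text = """model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.90,2.620,16.46,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.320,18.61,1,1,4,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.440,17.02,0,0,3,2
"""

column_names = ['car_names', 'mpg', 'cyl', 'disp', 'hp', 'drat',
                'wt', 'qsec', 'vs', 'am', 'gear', 'carb']

# header=0 marks the first line as a header to be replaced,
# and names= supplies the replacement column names.
sample = pd.read_csv(io.StringIO(csv_text), header=0, names=column_names)

# Guard against a mismatch between the name list and the actual columns.
assert sample.shape == (3, 12)
print(sample.dtypes)
```

Assigning to `.columns` after reading, as the notebook does, gives the same column names; the `names=` form simply avoids the intermediate state with the original headers.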

## Understanding the Data

Each row represents a car model, and each column represents a specific attribute.

Some important columns:
- `mpg`: Miles per gallon
- `cyl`: Number of cylinders
- `hp`: Horsepower
- `wt`: Weight of the car

We will focus on the **`cyl` (cylinders)** column for grouping.

## Grouping Data by a Column

To group data in pandas, we use the `groupby()` method.

Here, we group the entire DataFrame by the **number of cylinders**. The `cyl` column contains only three distinct values:
- 4 cylinders
- 6 cylinders
- 8 cylinders
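
Before grouping, it can help to confirm which distinct values a column holds. The sketch below uses only the `cyl` values of the first six mtcars rows as a stand-in; on the full DataFrame the same calls would be applied to `cars['cyl']`. The names `cyl`, `distinct`, and `counts` are introduced here for illustration.

```python
import pandas as pd

# cyl values of the first six mtcars rows, used as a small stand-in.
cyl = pd.Series([6, 6, 4, 6, 8, 6], name="cyl")

# Distinct values, sorted for readability.
distinct = sorted(cyl.unique().tolist())

# Number of rows per distinct value, ordered by cylinder count.
counts = cyl.value_counts().sort_index()

print(distinct)
print(counts)
```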

After grouping, we will compute the **mean** of all numeric columns for each group.

In [None]:
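# Group the rows by the values in the cyl column
cars_groups = cars.groupby('cyl')

# Average every numeric column within each group; numeric_only=True skips text columns such as car_names
cars_groups.mean(numeric_only=True)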
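
The same grouped object can also be narrowed to specific columns or asked for several statistics at once. The following is a minimal sketch on the first six mtcars rows, restricted to a few columns; `sample`, `groups`, `mpg_mean`, and `summary` are names introduced here and are not part of the original notebook.

```python
import pandas as pd

# First six rows of mtcars, restricted to a few columns for brevity.
sample = pd.DataFrame({
    "car_names": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
                  "Hornet 4 Drive", "Hornet Sportabout", "Valiant"],
    "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1],
    "cyl": [6, 6, 4, 6, 8, 6],
    "hp": [110, 110, 93, 110, 175, 105],
})

groups = sample.groupby("cyl")

# Mean of a single column per group.
mpg_mean = groups["mpg"].mean()

# Several statistics at once for selected columns.
summary = groups[["mpg", "hp"]].agg(["mean", "min", "max"])

print(mpg_mean)
print(summary)
```

Selecting columns before aggregating keeps the output focused, and `agg` with a list of function names produces one sub-column per statistic.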

## Interpreting the Results

The output shows one row per cylinder group (4, 6, and 8 cylinders).

For each group, pandas computes the **mean** of all numeric columns.

### Key Insight from the Transcript
- **4-cylinder cars** have the highest average miles per gallon (~26.7 mpg)
- **6-cylinder cars** have a moderate average mpg (~19.7 mpg)
- **8-cylinder cars** have the lowest average mpg (~15.1 mpg)

This demonstrates a clear trend:

> In this dataset, average fuel efficiency decreases as the number of cylinders increases.

This is exactly the kind of insight grouping and aggregation are designed to reveal.
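
A trend like this can also be checked programmatically rather than by eye. The sketch below hard-codes approximate group means for `mpg` (rounded to two decimals) instead of recomputing them; in the notebook they would come from `cars_groups.mean(numeric_only=True)['mpg']`. The names `mean_mpg`, `trend_holds`, and `step_changes` are introduced here for illustration.

```python
import pandas as pd

# Approximate mean mpg per cylinder count, rounded to two decimals.
mean_mpg = pd.Series({4: 26.66, 6: 19.74, 8: 15.10}, name="mpg")

ordered = mean_mpg.sort_index()

# True when every larger cylinder count has a lower (or equal) mean mpg.
trend_holds = ordered.is_monotonic_decreasing

# Change in mean mpg from each group to the next larger one.
step_changes = ordered.diff().dropna()

print(trend_holds)
print(step_changes)
```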

## Summary

In this notebook, you learned how to:
- Conceptually understand data grouping
- Load and prepare a dataset using pandas
- Group data using `groupby()`
- Apply aggregation functions like `mean()`
- Interpret grouped statistics

Grouping and aggregation are powerful tools for exploratory data analysis and form the foundation for more advanced analytics workflows.