# Grouping and Sorting

In this tutorial, we will learn how to group data using the `groupby()` function and how to sort data using `sort_values()` and `sort_index()`. These operations are essential for analyzing and organizing data efficiently.

We’ll also explain the syntax used in the code step by step, including when to use parentheses `()` and brackets `[]`.
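As a quick illustration of that distinction, here is a minimal sketch on a tiny hypothetical DataFrame `df` (not the wine data): square brackets select data, and parentheses call a method and hold its arguments.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews
df = pd.DataFrame({"points": [87, 90, 90], "price": [15.0, 20.0, 30.0]})

# Square brackets [] select data: here, the 'price' column as a Series
prices = df["price"]

# Parentheses () call a method: df["price"] selects, then .min() computes
cheapest = df["price"].min()

# Arguments to a method also go inside its parentheses
by_points = df.groupby("points")

print(type(prices).__name__)  # Series
print(cheapest)               # 15.0
print(by_points.ngroups)      # 2
```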

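```python
import pandas as pd

# Display at most 10 rows when printing a DataFrame or Series
pd.options.display.max_rows = 10
# The first column of the CSV file holds the row index
reviews = pd.read_csv("winemag-data-130k-v2.csv", index_col=0)
print("Setup complete.")
```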

## Grouping Data

The `groupby()` function allows us to group data based on one or more columns. Once the data is grouped, we can apply aggregation functions (like `count()`, `min()`, or `max()`) to summarize the data for each group.
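This pattern is often described as split-apply-combine. The sketch below, on hypothetical toy data rather than the wine reviews, does the three steps by hand and checks that the result matches the built-in aggregation.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews
df = pd.DataFrame({
    "points": [87, 90, 87, 92, 90, 90],
    "price": [15.0, 25.0, 12.0, 40.0, 30.0, 22.0],
})

# Split: iterating over groupby yields (key, sub-DataFrame) pairs
# Apply: compute the maximum price of each sub-DataFrame
# Combine: collect the results into a Series indexed by the group keys
manual = pd.Series({key: group["price"].max() for key, group in df.groupby("points")})

# The built-in aggregation performs all three steps at once
builtin = df.groupby("points")["price"].max()

print(manual.to_dict() == builtin.to_dict())  # True
```

In practice the one-line form is preferred; the loop is only there to show what happens under the hood.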

### Syntax Explanation:
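- **`reviews.groupby('points')`**: Calls the `groupby()` method (methods are called with parentheses) and groups the rows by the `points` column. Each unique value in the `points` column becomes a group.
- **`['points']`**: Square brackets select a column. After grouping, we select the `points` column as the one to aggregate.
- **`.count()`**: Counts the non-missing values of the selected column in each group. Because the rows are grouped by `points`, no `points` value inside a group can be missing, so this equals the number of rows in each group.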

Let’s count the number of reviews for each point value:

```python
reviews.groupby('points')['points'].count()
```

Here’s what happens step by step:
1. **`reviews.groupby('points')`**: Groups the data by the `points` column. For example, all rows where `points` is `90` are grouped together.
2. **`['points']`**: Selects the `points` column from the grouped data.
3. **`.count()`**: Counts how many rows are in each group.

The result is a Series whose index contains the unique values of `points` and whose values are the number of reviews in each group.
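The sketch below uses a small hypothetical DataFrame to show two related points: `value_counts()` produces the same per-value counts (sorted by count, so `sort_index()` restores the order by points), and `count()` skips missing values, whereas `size()` counts every row in a group.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one price is deliberately missing
df = pd.DataFrame({
    "points": [87, 90, 87, 92, 90, 90],
    "price": [15.0, np.nan, 12.0, 40.0, 30.0, 22.0],
})

# Number of reviews per point value
per_points = df.groupby("points")["points"].count()

# value_counts() gives the same numbers; sort_index() orders them by points
same_counts = df["points"].value_counts().sort_index()

# count() skips NaN: the 90-point group has only 2 non-missing prices...
price_count = df.groupby("points")["price"].count()
# ...while size() counts all 3 rows in that group
group_size = df.groupby("points").size()

print(per_points.tolist())      # [2, 3, 1]
print(price_count[90], group_size[90])
```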

### Finding Minimum Values

We can calculate summary statistics for each group. For example, let’s find the minimum price for each point value.

### Syntax Explanation:
- **`reviews.groupby('points')`**: Groups the data by the `points` column.
- **`['price']`**: Selects the `price` column from the grouped data.
- **`.min()`**: Finds the minimum value in the `price` column for each group.

Here’s the code:

```python
reviews.groupby('points')['price'].min()
```

Step-by-step explanation:
1. **Grouping**: The data is grouped by `points`. Each unique value in `points` becomes a group.
2. **Selecting**: The `price` column is selected from the grouped data.
3. **Aggregation**: The `min()` method returns the lowest price in each group, ignoring missing (NaN) prices.

The result is a Series whose index contains the unique values of `points` and whose values are the minimum prices for each group.
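To see how missing prices are handled, here is a minimal sketch on hypothetical data: NaN values are skipped, and a group whose prices are all missing gets NaN as its minimum.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with some missing prices
df = pd.DataFrame({
    "points": [87, 87, 90, 90, 95],
    "price": [15.0, 12.0, np.nan, 30.0, np.nan],
})

min_price = df.groupby("points")["price"].min()
# 87 -> 12.0
# 90 -> 30.0 (the NaN price is skipped)
# 95 -> NaN  (every price in the group is missing)
print(min_price)
```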

### Grouping by Multiple Columns

We can group by multiple columns to perform more complex analyses. For example, let’s count the number of reviews for each combination of `country` and `province`.

### Syntax Explanation:
- **`reviews.groupby(['country', 'province'])`**: Groups the data by both `country` and `province`. Each unique combination of `country` and `province` becomes a group.
- **`['description']`**: Selects the `description` column from the grouped data.
- **`.count()`**: Counts the non-missing `description` values in each group, which is the number of reviews as long as every review has a description.

Here’s the code:

```python
reviews.groupby(['country', 'province'])['description'].count()
```

Explanation:
1. **Grouping**: The data is grouped by both `country` and `province`. For example, all rows where `country` is `Italy` and `province` is `Tuscany` are grouped together.
2. **Selecting**: The `description` column is selected from the grouped data.
3. **Aggregation**: The `count()` method counts the non-missing `description` values in each group.

The result is a Series with a two-level MultiIndex (`country`, `province`), and the values are the counts for each group.
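Working with the resulting MultiIndex can be sketched as follows, again on hypothetical toy data: a single group is selected with a tuple of keys in `.loc`, and `reset_index()` turns the index levels back into ordinary columns.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews
df = pd.DataFrame({
    "country": ["Italy", "Italy", "Italy", "France", "France"],
    "province": ["Tuscany", "Tuscany", "Piedmont", "Bordeaux", "Bordeaux"],
    "description": ["a", "b", "c", "d", "e"],
})

counts = df.groupby(["country", "province"])["description"].count()

# The index levels are named after the grouping columns
print(list(counts.index.names))          # ['country', 'province']

# Select one group with a tuple of keys
print(counts.loc[("Italy", "Tuscany")])  # 2

# reset_index() converts the index levels into regular columns
flat = counts.reset_index()
print(flat.columns.tolist())             # ['country', 'province', 'description']
```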

### Aggregating Multiple Functions

The `agg()` function allows us to apply multiple aggregation functions at the same time. For example, we can calculate the number of reviews (`len`), the minimum price (`min`), and the maximum price (`max`) for each country.

### Syntax Explanation:
- **`reviews.groupby('country')['price']`**: Groups the data by `country` and selects the `price` column.
- **`.agg([len, 'min', 'max'])`**: Applies three aggregations to the `price` column of each group: the built-in Python function `len` (number of rows, including reviews with a missing price) and the pandas methods named by the strings `'min'` and `'max'`. Passing the strings instead of Python's built-in `min` and `max` avoids a `FutureWarning` in recent pandas versions.

Here’s the code:

```python
reviews.groupby('country')['price'].agg([len, 'min', 'max'])
```

Explanation:
1. **Grouping**: The data is grouped by `country`. Each unique value in `country` becomes a group.
2. **Selecting**: The `price` column is selected from the grouped data.
3. **Aggregation**: The `agg()` method applies `len`, `'min'`, and `'max'` to the `price` column of each group.

The result is a DataFrame indexed by `country`, with one column per aggregation: `len`, `min`, and `max`.
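The sketch below, on hypothetical toy data, shows how `len` differs from `'count'` when a price is missing, and how named aggregation (keyword arguments to `agg()`) lets you choose the output column names. The column names `n_reviews`, `min_price`, and `max_price` are illustrative choices, not part of the tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one Italian price is missing
df = pd.DataFrame({
    "country": ["Italy", "Italy", "France", "France", "France"],
    "price": [20.0, np.nan, 15.0, 60.0, 35.0],
})

# len counts all rows; 'count' skips the missing price
summary = df.groupby("country")["price"].agg([len, "count", "min", "max"])

# Named aggregation: output_column_name="aggregation"
named = df.groupby("country")["price"].agg(
    n_reviews="size", min_price="min", max_price="max"
)

print(summary)
print(named)
```

For Italy, `len` is 2 while `count` is 1, because one of its two reviews has no price.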

## Sorting Data

The `sort_values()` method sorts data by its values: a Series by its own values, or a DataFrame by one or more columns passed with `by=`. For example, let’s sort countries by their number of reviews:

### Syntax Explanation:
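- **`countries_reviewed`**: A Series holding the number of reviews per country, built with `groupby()` and `count()` as in the previous sections.
- **`countries_reviewed.sort_values()`**: Sorts the Series by its values, in ascending order by default.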
- **`ascending=False`**: Sorts the data in descending order.

Here’s the code:

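```python
# Number of reviews per country (a Series indexed by country)
countries_reviewed = reviews.groupby('country')['description'].count()
# Sort from most to fewest reviews
countries_reviewed.sort_values(ascending=False)
```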
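The introduction also mentions `sort_index()`. A minimal sketch on hypothetical toy data contrasts the two: `sort_values()` orders by the values, `sort_index()` restores the order of the index labels, and a DataFrame can be sorted by several columns at once, with ties in the first column broken by the second.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews
df = pd.DataFrame({
    "country": ["Italy", "France", "Spain", "France"],
    "points": [90, 88, 90, 95],
})

counts = df.groupby("country")["points"].count()

# Series.sort_values(): largest count first
by_count = counts.sort_values(ascending=False)

# Series.sort_index(): back to alphabetical order of the country labels
by_name = by_count.sort_index()

# DataFrame.sort_values(by=[...]): points descending, then country ascending
by_points_then_country = df.sort_values(
    by=["points", "country"], ascending=[False, True]
)

print(by_count.index[0])                            # France
print(by_name.index.tolist())                       # ['France', 'Italy', 'Spain']
print(by_points_then_country["country"].tolist())   # ['France', 'Italy', 'Spain', 'France']
```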