In this class, you'll learn all about grouping and sorting in pandas.<br>
This lecture covers:
1. Groupwise analysis
2. Multi-Indexes
3. Sorting
_____

As in the previous lecture, we will be using the **winemag-data-130k-v2.csv** dataset. To use it, let's first import it:

In [34]:
import pandas as pd
reviews = pd.read_csv("Datasets/winemag-data-130k-v2.csv", index_col=0)

____

# Groupwise analysis

One method we've been using heavily thus far is `value_counts()`.<br>
We can replicate what `value_counts()` does by using `groupby()`.<br>
<br>

#### Assignment 0

Before learning about the `groupby()` method, let's recall what we did in the previous lecture.<br>

Find the number of occurrences of each points value among the wines.

In [None]:
#Write your code below:

Solution:

In [None]:
reviews.points.value_counts()

<br>

Now, to do the same with `groupby()`:

In [None]:
reviews.groupby('points').points.count()

`groupby()` created one group for each distinct points value, containing all the reviews that gave that score. Then, for each of these groups, we grabbed the `points` column and counted how many rows it contained.
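
To check that the two approaches agree, here is a minimal sketch on a small made-up DataFrame (the `toy` data is hypothetical, not taken from the wine dataset). The counts are the same; only the ordering differs, since `value_counts()` sorts by frequency while `groupby()` sorts by the group key.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews
toy = pd.DataFrame({
    "points": [90, 88, 90, 85, 88, 90],
    "price": [20.0, 15.0, 35.0, 10.0, 12.0, 50.0],
})

# Count the rows in each points group
by_groupby = toy.groupby("points").points.count()

# Count the occurrences of each points value directly
by_value_counts = toy.points.value_counts()

# Compare as dictionaries so that the different ordering does not matter
same = by_groupby.to_dict() == by_value_counts.to_dict()
print(by_groupby)
print(same)
```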

<br>
We can use any of the summary functions we've used before with this data. For example, to get the cheapest wine in each point value category, we can do the following:

In [None]:
reviews.groupby('points').price.min()

<br>

You can think of each group we generate as being a slice of our DataFrame containing only the rows whose values match. Each of these slices is accessible to us directly through the `apply()` method, and we can then manipulate the data in any way we see fit. For example, here's one way of selecting the name of the first wine reviewed from each winery in the dataset:

In [None]:
# Name the lambda argument df (not pd) so it does not shadow the pandas module alias
reviews.groupby('winery').apply(lambda df: df.title.iloc[0])
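
To see the "slice" idea concretely, the sketch below loops over the groups of a small made-up DataFrame (the `toy` data and winery names are hypothetical). Iterating over a groupby object yields one `(key, sub-DataFrame)` pair per group. The last line gets the first title per winery by selecting the `title` column before calling `apply()`, which is one possible variation on the call above.

```python
import pandas as pd

# Hypothetical toy data: two wineries with two wines each, one with a single wine
toy = pd.DataFrame({
    "winery": ["Alpha", "Beta", "Alpha", "Beta", "Gamma"],
    "title": ["Alpha Red", "Beta White", "Alpha Rose", "Beta Red", "Gamma Brut"],
})

group_sizes = {}
for winery, group in toy.groupby("winery"):
    # Each `group` is an ordinary DataFrame holding only that winery's rows
    group_sizes[winery] = len(group)

# First title reviewed per winery, applied to the title column of each group
first_titles = toy.groupby("winery").title.apply(lambda s: s.iloc[0])

print(group_sizes)
print(first_titles)
```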

<br>
For even more fine-grained control, you can also group by more than one column. For example, here's how we would pick out the best-rated wine in each country and province:

In [None]:
reviews.groupby(['country', 'province']).apply(lambda df : df.loc[df.points.idxmax()])
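
As a possible alternative to `apply()` here, the sketch below asks each group for the row label of its highest-scoring wine with `idxmax()` and then looks those labels up in the original frame. The `toy` data is made up for illustration, and this is one way to get the same rows, not necessarily the preferred one.

```python
import pandas as pd

# Hypothetical toy data with two grouping columns
toy = pd.DataFrame({
    "country": ["Italy", "Italy", "Italy", "US", "US"],
    "province": ["Tuscany", "Tuscany", "Sicily", "Oregon", "Oregon"],
    "title": ["Wine A", "Wine B", "Wine C", "Wine D", "Wine E"],
    "points": [88, 93, 90, 91, 86],
})

# For each (country, province) group, the row label of the highest points value
best_labels = toy.groupby(["country", "province"]).points.idxmax()

# Use those labels to pull the full rows out of the original frame
best_wines = toy.loc[best_labels]
print(best_wines)
```

Unlike the `apply()` version, the result keeps the original row labels instead of a (country, province) index.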

<br>

Another `groupby()` method worth mentioning is `agg()`, which lets you run several different functions on each group at once. For example, we can generate a simple statistical summary of prices by country as follows:

In [None]:
reviews.groupby('country').price.agg([len, min, max])
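
`agg()` also accepts function names as strings, and recent pandas versions may emit a warning when built-ins such as `min` or `max` are passed directly. The sketch below uses string names on a made-up DataFrame (the `toy` data is hypothetical). It also shows a detail worth knowing: `'size'` counts every row in a group (like `len`), while `'count'` skips missing values.

```python
import pandas as pd

# Hypothetical toy data; one Italian wine has no recorded price
toy = pd.DataFrame({
    "country": ["Italy", "Italy", "US", "US", "US"],
    "price": [20.0, None, 15.0, 40.0, 25.0],
})

# Several summaries per country in one call, named by string
summary = toy.groupby("country").price.agg(["size", "count", "min", "max"])
print(summary)
```

The resulting columns are named after the strings passed in, so here they are `size`, `count`, `min` and `max`.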

<br>

**Effective use of `groupby()` lets you do a lot of powerful things with your dataset.**
____

# Multi-Indexes

`groupby()` differs slightly from the operations we've seen so far: depending on the operation we run, it sometimes produces what is called a multi-index.<br>
A multi-index differs from a regular index in that it has multiple levels. For example:

In [None]:
countries_reviewed = reviews.groupby(['country', 'province']).description.agg([len])
countries_reviewed

In [None]:
multi_indexes = countries_reviewed.index
type(multi_indexes) #This tells us about the type -> Observe that it says MultiIndex

<br>

The multi-index method you will use most often is the one for converting back to a regular index, the `reset_index()` method:

In [None]:
countries_reviewed.reset_index()
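
The sketch below walks through the same steps on a small made-up DataFrame (the `toy` data is hypothetical): grouping by two columns yields a `MultiIndex` with one level per grouping column, and `reset_index()` turns those levels back into ordinary columns.

```python
import pandas as pd

# Hypothetical toy data with two grouping columns
toy = pd.DataFrame({
    "country": ["Italy", "Italy", "Italy", "US", "US"],
    "province": ["Tuscany", "Tuscany", "Sicily", "Oregon", "Oregon"],
    "description": ["a", "b", "c", "d", "e"],
})

counts = toy.groupby(["country", "province"]).description.agg(["count"])

# The index now has two levels, one per grouping column
is_multi = isinstance(counts.index, pd.MultiIndex)
level_names = list(counts.index.names)

# Rows are addressed with a tuple, one entry per level
tuscany_count = counts.loc[("Italy", "Tuscany"), "count"]

# reset_index() moves both levels back into regular columns
flat = counts.reset_index()
print(flat)
```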

____

# Sorting

To get data in the order we want it in, we can sort it ourselves. The `sort_values()` method is handy for this.

In [None]:
countries_reviewed.sort_values(by='len')

By default, `sort_values()` sorts in ascending order, meaning the smallest values appear first. Often, though, we want a descending order, where the largest values come first. To get that, pass `ascending=False`:

In [None]:
countries_reviewed.sort_values(by='len', ascending=False)

<br>
We can also sort by more than one column at a time:

In [None]:
countries_reviewed.sort_values(by=['country', 'len'])
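
When sorting by several columns, `ascending` can also be a list with one flag per column. The sketch below, on a made-up flat DataFrame (the `toy` data is hypothetical), sorts countries alphabetically and, within each country, puts the largest `len` first.

```python
import pandas as pd

# Hypothetical toy data shaped like the reset_index() output above
toy = pd.DataFrame({
    "country": ["US", "Italy", "US", "Italy"],
    "province": ["Oregon", "Sicily", "California", "Tuscany"],
    "len": [2, 1, 5, 3],
})

# country ascending, then len descending within each country
ordered = toy.sort_values(by=["country", "len"], ascending=[True, False])
print(ordered)
```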

<br>

To sort by index values, use the companion method `sort_index()`. It takes no `by` argument, but it accepts the same `ascending` argument and uses the same default (ascending) order:

In [None]:
countries_reviewed.sort_index()
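
As a quick illustration of `sort_index()` in both directions, here is a minimal sketch on a made-up DataFrame indexed by country (the `toy` data is hypothetical).

```python
import pandas as pd

# Hypothetical toy data with country names as the index
toy = pd.DataFrame(
    {"len": [2, 1, 5]},
    index=pd.Index(["US", "Italy", "France"], name="country"),
)

# Default: ascending order of the index labels
asc = toy.sort_index()

# Reverse the order with ascending=False
desc = toy.sort_index(ascending=False)

print(asc)
print(desc)
```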