# 4. Grouping and Sorting
---
### 4.1. Grouping
---

In [1]:
import pandas as pd
battles_got = pd.read_csv('datasets/battles.csv')
pd.set_option('display.max_rows', 5)
battles_got

Unnamed: 0,name,year,battle_number,attacker_king,defender_king,attacker_1,attacker_2,attacker_3,attacker_4,defender_1,...,major_death,major_capture,attacker_size,defender_size,attacker_commander,defender_commander,summer,location,region,note
0,Battle of the Golden Tooth,298,1,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Tully,...,1.0,0.0,15000.0,4000.0,Jaime Lannister,"Clement Piper, Vance",1.0,Golden Tooth,The Westerlands,
1,Battle at the Mummer's Ford,298,2,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Baratheon,...,1.0,0.0,,120.0,Gregor Clegane,Beric Dondarrion,1.0,Mummer's Ford,The Riverlands,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36,Siege of Raventree,300,37,Joffrey/Tommen Baratheon,Robb Stark,Bracken,Lannister,,,Blackwood,...,0.0,1.0,1500.0,,"Jonos Bracken, Jaime Lannister",Tytos Blackwood,0.0,Raventree,The Riverlands,
37,Siege of Winterfell,300,38,Stannis Baratheon,Joffrey/Tommen Baratheon,Baratheon,Karstark,Mormont,Glover,Bolton,...,,,5000.0,8000.0,Stannis Baratheon,Roose Bolton,0.0,Winterfell,The North,


Grouping splits a dataset into subsets that share a value, so that each subset can be summarized or transformed on its own. In `pandas`, this is done with the `groupby` operation.

For example, we can replicate what `value_counts` does using `groupby` by doing the following:

In [2]:
battles_got.groupby('attacker_king').attacker_king.count()

attacker_king
Balon/Euron Greyjoy          7
Joffrey/Tommen Baratheon    14
Robb Stark                  10
Stannis Baratheon            5
Name: attacker_king, dtype: int64

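In this case, we grouped the rows by `attacker_king` and counted how many rows fall into each group.

`value_counts` is essentially a shortcut for this `groupby` operation; the main visible difference is that `value_counts` sorts the result by count rather than by group label. We can also use any of the summary functions with groups.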

In [3]:
battles_got.groupby('attacker_king').attacker_size.min()

attacker_king
Balon/Euron Greyjoy           20.0
Joffrey/Tommen Baratheon     618.0
Robb Stark                   100.0
Stannis Baratheon           4500.0
Name: attacker_size, dtype: float64
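
The relationship between `value_counts` and `groupby` can be checked on a small frame. The following is a minimal sketch that uses hypothetical toy data (the shortened king names and sizes are made up for illustration and are not taken from `battles.csv`):

```python
import pandas as pd

# Hypothetical toy data standing in for battles.csv
toy = pd.DataFrame({
    'attacker_king': ['Robb', 'Joffrey', 'Robb', 'Stannis', 'Joffrey', 'Robb'],
    'attacker_size': [100.0, 618.0, 18000.0, 4500.0, None, 6000.0],
})

# groupby returns the counts sorted by group label
by_group = toy.groupby('attacker_king').attacker_king.count()

# value_counts returns the same counts sorted by frequency, largest first
by_value = toy.attacker_king.value_counts()

# Summary functions such as min() skip missing values inside each group
smallest = toy.groupby('attacker_king').attacker_size.min()

print(by_group)
print(by_value)
print(smallest)
```

Both results hold the same counts; only the row order differs.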

We can also use the `apply` method to manipulate each group in any way we see fit. For example, here's one way of selecting the first battle in which each king attacked. It relies on the rows already being in battle order.

In [4]:
battles_got.groupby('attacker_king').apply(lambda df: df.battle_number.iloc[0])

attacker_king
Balon/Euron Greyjoy          8
Joffrey/Tommen Baratheon     1
Robb Stark                   4
Stannis Baratheon           16
dtype: int64
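
For this particular task, `apply` is not the only option. The sketch below, on hypothetical toy data, compares the `apply` approach with the built-in `first` method; selecting the column explicitly before calling `apply` is a choice made here to keep the grouping column out of the function's input:

```python
import pandas as pd

# Hypothetical toy data, already sorted by battle_number
toy = pd.DataFrame({
    'attacker_king': ['Joffrey', 'Robb', 'Joffrey', 'Robb'],
    'battle_number': [1, 4, 2, 5],
})

grouped = toy.groupby('attacker_king')

# apply calls the function once per group, passing that group's rows
via_apply = grouped[['battle_number']].apply(lambda df: df.battle_number.iloc[0])

# first() returns the first non-null value of each group directly
via_first = grouped.battle_number.first()

print(via_apply)
print(via_first)
```

Built-in methods such as `first` are usually shorter and faster than an equivalent `apply`, which is best kept for logic that has no dedicated method.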

You can also group by more than one column. For example, here's how to find the most recent battle (the one with the highest `battle_number`) for each pair of attacker and defender kings.

In [5]:
battles_got.groupby(['attacker_king', 'defender_king']).apply(lambda df: df.loc[df.battle_number.idxmax()])

attacker_king,defender_king,name,year,battle_number,attacker_king,defender_king,attacker_1,attacker_2,attacker_3,attacker_4,defender_1,...,major_death,major_capture,attacker_size,defender_size,attacker_commander,defender_commander,summer,location,region,note
Balon/Euron Greyjoy,Balon/Euron Greyjoy,Sack of Torrhen's Square,299,13,Balon/Euron Greyjoy,Balon/Euron Greyjoy,Greyjoy,,,,Stark,...,0.0,1.0,,,Dagmer Cleftjaw,,1.0,Torrhen's Square,The North,
Balon/Euron Greyjoy,Joffrey/Tommen Baratheon,"Invasion of Ryamsport, Vinetown, and Starfish ...",300,33,Balon/Euron Greyjoy,Joffrey/Tommen Baratheon,Greyjoy,,,,Tyrell,...,0.0,0.0,,,"Euron Greyjoy, Victarion Greyjoy",,0.0,"Ryamsport, Vinetown, Starfish Harbor",The Reach,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Stannis Baratheon,Mance Rayder,Battle of Castle Black,300,28,Stannis Baratheon,Mance Rayder,Free folk,Thenns,Giants,,Night's Watch,...,1.0,1.0,100000.0,1240.0,"Mance Rayder, Tormund Giantsbane, Harma Dogshe...","Stannis Baratheon, Jon Snow, Donal Noye, Cotte...",0.0,Castle Black,Beyond the Wall,
Stannis Baratheon,Renly Baratheon,Siege of Storm's End,299,16,Stannis Baratheon,Renly Baratheon,Baratheon,,,,Baratheon,...,1.0,0.0,5000.0,20000.0,"Stannis Baratheon, Davos Seaworth","Renly Baratheon, Cortnay Penrose, Loras Tyrell...",1.0,Storm's End,The Stormlands,
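
Selecting one row per group can also be written without `apply`. The following sketch, on hypothetical toy data with placeholder battle names, uses `idxmin` and `idxmax` on the grouped column and then looks up the matching rows with `loc`:

```python
import pandas as pd

# Hypothetical toy data; the battle names are placeholders
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'battle_number': [1, 2, 3, 4],
    'attacker_king': ['Joffrey', 'Joffrey', 'Robb', 'Joffrey'],
    'defender_king': ['Robb', 'Robb', 'Joffrey', 'Stannis'],
})

pairs = toy.groupby(['attacker_king', 'defender_king']).battle_number

# idxmin/idxmax give the row label of the smallest/largest value per group
first_rows = toy.loc[pairs.idxmin()]   # earliest battle of each pair
last_rows = toy.loc[pairs.idxmax()]    # latest battle of each pair

print(first_rows)
print(last_rows)
```

Unlike the `apply` version, the result keeps the original row labels instead of a multi-index built from the grouping columns.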


Another `groupby` method is `agg`, which lets you run several functions on each group at once. For example, we can generate a summary table with the number of battles and the smallest and largest attacking army for each king:

In [6]:
battles_got.groupby(['attacker_king']).attacker_size.agg([len, min, max])

attacker_king,len,min,max
Balon/Euron Greyjoy,7.0,20.0,1000.0
Joffrey/Tommen Baratheon,14.0,618.0,20000.0
Robb Stark,10.0,100.0,18000.0
Stannis Baratheon,5.0,4500.0,100000.0
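
`agg` also accepts keyword arguments that name the output columns, with the functions given as strings. This is a minimal sketch on hypothetical toy data; the column names `rows`, `known`, `smallest` and `largest` are arbitrary labels chosen for this example:

```python
import pandas as pd

# Hypothetical toy data with one missing army size
toy = pd.DataFrame({
    'attacker_king': ['Robb', 'Robb', 'Stannis', 'Stannis'],
    'attacker_size': [100.0, None, 4500.0, 5000.0],
})

summary = toy.groupby('attacker_king').attacker_size.agg(
    rows='size',      # number of rows, missing values included (like len)
    known='count',    # number of non-missing values only
    smallest='min',
    largest='max',
)

print(summary)
```

The difference between `size` and `count` matters here because `attacker_size` contains missing values: `len` in the cell above counts battles, not known army sizes.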


Used well, `groupby` makes it possible to answer many questions about a dataset in a single line.

### 4.2. Multi-indexes
---
In all the previous examples, we've been working with `DataFrame` or `Series` objects that have a single-level index. Depending on the operation we run, `groupby` can instead return a result with a multi-index.

A multi-index is an index that has multiple levels. For example:

In [7]:
first_battles = battles_got.groupby(['attacker_king', 'defender_king']).name.agg([len])
first_battles

attacker_king,defender_king,len
Balon/Euron Greyjoy,Balon/Euron Greyjoy,1
Balon/Euron Greyjoy,Joffrey/Tommen Baratheon,2
...,...,...
Stannis Baratheon,Mance Rayder,1
Stannis Baratheon,Renly Baratheon,1


In [8]:
mi = first_battles.index
type(mi)

pandas.core.indexes.multi.MultiIndex

Multi-indexes have several methods for dealing with their structure which are absent from single-level indexes. Retrieving a single value requires one label per level, two in this case.
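
To make label-based access on a multi-index concrete, here is a short sketch on hypothetical toy data. It selects one value with a tuple of labels, and a whole sub-series with the outer label alone:

```python
import pandas as pd

# Hypothetical toy data; the battle names are placeholders
toy = pd.DataFrame({
    'attacker_king': ['Joffrey', 'Joffrey', 'Robb', 'Joffrey'],
    'defender_king': ['Robb', 'Robb', 'Joffrey', 'Stannis'],
    'name': ['A', 'B', 'C', 'D'],
})

counts = toy.groupby(['attacker_king', 'defender_king']).name.count()

# The index has one level per grouping column
level_names = list(counts.index.names)
n_levels = counts.index.nlevels

# A tuple with one label per level picks a single value
one_pair = counts.loc[('Joffrey', 'Robb')]

# The outer label alone returns every inner entry below it
joffrey = counts.loc['Joffrey']

print(level_names, n_levels)
print(one_pair)
print(joffrey)
```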

The `MultiIndex` class is described in detail in the [MultiIndex / Advanced Indexing](https://pandas.pydata.org/pandas-docs/stable/advanced.html) section of the `pandas` documentation.

The `reset_index` method is often used to convert back to a regular index; the former index levels become ordinary columns.

In [9]:
first_battles.reset_index()

Unnamed: 0,attacker_king,defender_king,len
0,Balon/Euron Greyjoy,Balon/Euron Greyjoy,1
1,Balon/Euron Greyjoy,Joffrey/Tommen Baratheon,2
...,...,...,...
10,Stannis Baratheon,Mance Rayder,1
11,Stannis Baratheon,Renly Baratheon,1
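
The conversion works in both directions. The sketch below, on hypothetical toy data, flattens a multi-indexed `Series` with `reset_index` and rebuilds the multi-index with `set_index`; the column name `battles` is an arbitrary label chosen for this example:

```python
import pandas as pd

# Hypothetical toy data; the battle names are placeholders
toy = pd.DataFrame({
    'attacker_king': ['Joffrey', 'Joffrey', 'Robb', 'Joffrey'],
    'defender_king': ['Robb', 'Robb', 'Joffrey', 'Stannis'],
    'name': ['A', 'B', 'C', 'D'],
})

counts = toy.groupby(['attacker_king', 'defender_king']).name.count()

# Index levels become columns; name= labels the column of values
flat = counts.reset_index(name='battles')

# set_index is the inverse: the columns become index levels again
restored = flat.set_index(['attacker_king', 'defender_king'])['battles']

print(flat)
print(restored)
```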


### 4.3. Sorting
---
`groupby` returns its results in index order, that is, sorted by group label rather than by value. To order the data by values instead, use the `sort_values` method.

In [10]:
first_battles = first_battles.reset_index()
first_battles.sort_values(by='len')

Unnamed: 0,attacker_king,defender_king,len
0,Balon/Euron Greyjoy,Balon/Euron Greyjoy,1
3,Joffrey/Tommen Baratheon,Balon/Euron Greyjoy,1
...,...,...,...
7,Robb Stark,Joffrey/Tommen Baratheon,9
4,Joffrey/Tommen Baratheon,Robb Stark,10


By default, `sort_values` sorts in ascending order. To sort in descending order, pass `ascending=False`.

In [11]:
first_battles.sort_values(by='len', ascending=False)

Unnamed: 0,attacker_king,defender_king,len
4,Joffrey/Tommen Baratheon,Robb Stark,10
7,Robb Stark,Joffrey/Tommen Baratheon,9
...,...,...,...
10,Stannis Baratheon,Mance Rayder,1
11,Stannis Baratheon,Renly Baratheon,1


To sort by index values, use the `sort_index` method. It also sorts in ascending order by default and accepts the same `ascending` argument.

In [12]:
first_battles.sort_index()

Unnamed: 0,attacker_king,defender_king,len
0,Balon/Euron Greyjoy,Balon/Euron Greyjoy,1
1,Balon/Euron Greyjoy,Joffrey/Tommen Baratheon,2
...,...,...,...
10,Stannis Baratheon,Mance Rayder,1
11,Stannis Baratheon,Renly Baratheon,1
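
A common use of `sort_index` is to undo an earlier `sort_values`. This is a small sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data with the default 0, 1, 2 index
toy = pd.DataFrame({
    'king': ['Robb', 'Joffrey', 'Stannis'],
    'len': [9, 10, 2],
})

# Sorting by value shuffles the index labels...
by_len = toy.sort_values(by='len')

# ...and sort_index puts the rows back in label order
back = by_len.sort_index()

# ascending=False reverses the label order
reverse = by_len.sort_index(ascending=False)

print(by_len.index.tolist())
print(back.index.tolist())
print(reverse.index.tolist())
```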


You can also sort by more than one column at a time; rows are ordered by the first column, and ties are broken by the next.

In [13]:
first_battles.sort_values(by=['defender_king', 'len'])

Unnamed: 0,attacker_king,defender_king,len
0,Balon/Euron Greyjoy,Balon/Euron Greyjoy,1
3,Joffrey/Tommen Baratheon,Balon/Euron Greyjoy,1
...,...,...,...
4,Joffrey/Tommen Baratheon,Robb Stark,10
5,Joffrey/Tommen Baratheon,Stannis Baratheon,2
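
When sorting by several columns, `ascending` can also be a list with one entry per column. The following sketch, on hypothetical toy data, sorts defenders alphabetically and, within each defender, puts the largest count first:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'defender_king': ['Robb', 'Stannis', 'Robb', 'Balon'],
    'len': [10, 2, 1, 1],
})

# One direction per column: defender_king ascending, len descending
ordered = toy.sort_values(by=['defender_king', 'len'], ascending=[True, False])

print(ordered)
```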
