# Series: Grouping Methods, `.groupby()`

Author: Brad E. Sheese
***


## 06.1.3.0 Introduction
When we work with data, it will often contain values that we want to treat as a group. For example, if we had data on daily sales over the course of a year, we might want to know the average sales for each month. To do this, we would:
* take the daily data and split it into groups by month
* calculate the average for each group separately

If we were interested in the highest sales in each month, we would do the same split but then calculate the maximum for each month instead of the average.
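
To make the split-then-summarize idea concrete before introducing any pandas tools, here is a minimal sketch that does both steps by hand in plain Python. The sales figures and the variable names (`daily_sales`, `groups`, `monthly_mean`, `monthly_max`) are invented for illustration and are not part of the data used later in this section.

```python
from collections import defaultdict

# Hypothetical daily sales as (month, amount) pairs, invented for illustration
daily_sales = [("Jan", 100), ("Jan", 140), ("Feb", 90), ("Feb", 110), ("Feb", 130)]

# Step 1: split the daily values into one group per month
groups = defaultdict(list)
for month, amount in daily_sales:
    groups[month].append(amount)

# Step 2: calculate a summary for each group separately
monthly_mean = {month: sum(vals) / len(vals) for month, vals in groups.items()}
monthly_max = {month: max(vals) for month, vals in groups.items()}

print(monthly_mean)
print(monthly_max)
```

Only step 2 changes when we want a different summary; the split stays the same.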




## 06.1.3.1 Using `.groupby()`
Pandas has a few different tools that allow us to work efficiently with data that can be split into groups. In this section, we're specifically going to introduce the series `.groupby()` method so you can start to get a sense of how the process works. We will return to this method once we have introduced dataframes.

Here's a series that contains information about championships won by NBA teams.

In [None]:
import pandas as pd

winner_list = ['Los Angeles Lakers', 'Toronto Raptors', 'Golden State Warriors',
               'Golden State Warriors', 'Cleveland Cavaliers', 'Golden State Warriors',
               'San Antonio Spurs', 'Miami Heat', 'Miami Heat', 'Dallas Mavericks',
               'Los Angeles Lakers', 'Los Angeles Lakers', 'Boston Celtics',
               'San Antonio Spurs', 'Miami Heat', 'San Antonio Spurs', 'Detroit Pistons',
               'San Antonio Spurs', 'Los Angeles Lakers', 'Los Angeles Lakers',
               'Los Angeles Lakers', 'San Antonio Spurs', 'Chicago Bulls', 'Chicago Bulls',
               'Chicago Bulls', 'Houston Rockets', 'Houston Rockets', 'Chicago Bulls',
               'Chicago Bulls', 'Chicago Bulls', 'Detroit Pistons', 'Detroit Pistons',
               'Los Angeles Lakers', 'Los Angeles Lakers', 'Boston Celtics',
               'Los Angeles Lakers', 'Boston Celtics', 'Philadelphia 76ers',
               'Los Angeles Lakers', 'Boston Celtics', 'Los Angeles Lakers',
               'Seattle Supersonics', 'Washington Bullets', 'Portland Trail Blazers',
               'Boston Celtics', 'Golden State Warriors', 'Boston Celtics', 'New York Knicks',
               'Los Angeles Lakers', 'Milwaukee Bucks', 'New York Knicks', 'Boston Celtics',
               'Boston Celtics', 'Philadelphia 76ers', 'Boston Celtics', 'Boston Celtics',
               'Boston Celtics', 'Boston Celtics', 'Boston Celtics', 'Boston Celtics',
               'Boston Celtics', 'Boston Celtics', 'St. Louis Hawks', 'Boston Celtics',
               'Philadelphia Warriors', 'Syracuse Nationals', 'Minneapolis Lakers',
               'Minneapolis Lakers', 'Minneapolis Lakers', 'Rochester Royals',
               'Minneapolis Lakers', 'Minneapolis Lakers', 'Baltimore Bullets',
               'Philadelphia Warriors']

# construct series from list
nba_series = pd.Series(index = winner_list, data = range(2020,1946,-1))

# check result
nba_series.head()

When we call `.groupby()` on a series, we must indicate how we want the grouping to be done. With our NBA championship data, we will group by the index, which contains the team names. Pandas will inspect the index and group together any rows that share the same index value.

In [None]:
nba_grouped = nba_series.groupby(by=nba_series.index)
nba_grouped
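
As a side note, pandas also accepts `level=0` to mean "group by the index". The sketch below uses a small stand-in series (`sample`, invented here so the example is self-contained) and suggests that both spellings produce the same groups.

```python
import pandas as pd

# Small stand-in for nba_series: team names in the index, years as values
sample = pd.Series(index=['Chicago Bulls', 'Houston Rockets', 'Chicago Bulls'],
                   data=[1998, 1995, 1997])

# Group by passing the index values directly, as in the cell above
by_index = sample.groupby(by=sample.index)

# Group by index level 0, which expresses the same split
by_level = sample.groupby(level=0)

# Compare the per-team counts from the two approaches
print(by_index.count())
print(by_level.count())
```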

The code above creates a groupby object that we have assigned to the variable name `nba_grouped`. This object knows how to make the groups, but it doesn't calculate anything until we call an additional method to indicate what we'd like done. If we group by team, what do we want to know next? The mean value for the team? The maximum value? Always think of it as a two-step process:
* step 1: specify how to split the data into groups to make the groupby object
* step 2: specify what you want done with all of the values that are associated with each group 

For example, let's say we just wanted to know how many championships each team has won. We split the data into groups by team name, then we use the method `.count()` on the groupby object. This gives us the number of values in each group, which here is the number of times each team has won.

In [None]:
nba_grouped.count()

If we just want to know who has won the most, we can sort the result.

In [None]:
nba_grouped.count().sort_values(ascending = False).head()

Remember, it's a two-step process. First, you need to create the groupby object, then you need to call a method on the groupby object.
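
Here is a minimal sketch of the two steps on a toy series. The labels and values in `scores` are invented for illustration. Note that the same groupby object can be reused with different methods.

```python
import pandas as pd

# Hypothetical data, invented for illustration: group labels in the index
scores = pd.Series(index=['a', 'b', 'a', 'b', 'a'], data=[10, 20, 30, 40, 50])

# Step 1: build the groupby object (no summary is calculated yet)
grouped = scores.groupby(by=scores.index)

# Step 2: call a method on the groupby object; each call returns a new series
group_counts = grouped.count()
group_means = grouped.mean()
group_maxes = grouped.max()

print(group_counts)
print(group_means)
print(group_maxes)
```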

## 06.1.3.2 Getting Groups within a Groupby Object
The method `.get_group()` can be used to get information on a single group from the groupby object.

In [None]:
nba_grouped.get_group('Chicago Bulls')

In [None]:
nba_grouped.get_group('Minneapolis Lakers')
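
If we ask `.get_group()` for a label that is not one of the groups, pandas raises a `KeyError`. The sketch below, which uses a small stand-in series (`sample`, invented for this example), shows one possible way to check the available labels first with the `.groups` attribute.

```python
import pandas as pd

# Small stand-in for nba_series: team names in the index, years as values
sample = pd.Series(index=['Chicago Bulls', 'Houston Rockets', 'Chicago Bulls'],
                   data=[1998, 1995, 1997])
sample_grouped = sample.groupby(by=sample.index)

# .groups maps each group label to the index labels that belong to it
print(list(sample_grouped.groups))

# Check that a label exists before asking for its group
team = 'Seattle Supersonics'
if team in sample_grouped.groups:
    print(sample_grouped.get_group(team))
else:
    print(f'{team} is not one of the groups')

# Without the check, a missing label raises KeyError
try:
    sample_grouped.get_group(team)
    raised = False
except KeyError:
    raised = True
    print('KeyError raised for', team)
```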

## 06.1.3.3 `.groupby()` Examples

What year was each team's first championship win?

In [None]:
# groupby object + .min()
nba_grouped.min()

What year was each team's most recent championship?

In [None]:
# groupby object + .max()
nba_grouped.max()

Which team had the longest period between their first and last championship?

In [None]:
(nba_grouped.max() - nba_grouped.min()).sort_values(ascending = False).head(10)
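
The same span can also be computed with a single custom aggregation passed to `.agg()`. This is only a sketch of an alternative, not a replacement for the approach above. The stand-in series `sample` and the lambda are assumptions made for this example.

```python
import pandas as pd

# Small stand-in for nba_series: team names in the index, years as values
sample = pd.Series(index=['Chicago Bulls', 'Houston Rockets', 'Chicago Bulls',
                          'Houston Rockets', 'Chicago Bulls'],
                   data=[1998, 1995, 1997, 1994, 1991])
sample_grouped = sample.groupby(by=sample.index)

# Approach used above: two aggregations, then subtract one result from the other
span_two_calls = sample_grouped.max() - sample_grouped.min()

# Alternative: one custom function applied to the values of each group
span_agg = sample_grouped.agg(lambda years: years.max() - years.min())

print(span_agg.sort_values(ascending=False))
```

The two-call version is usually easier to read. The `.agg()` version is handy when the summary does not already exist as a built-in method.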

`.value_counts()` can also be called on a groupby object. However, in this case, there's only one winner per year, so every count is 1 and the result isn't too informative.

In [None]:
nba_grouped.value_counts().head(30)

It looks like some teams may have moved from one city to another. Let's get a sense of wins if we ignore the city.

In [None]:
# split the index into a series of lists, grab the last element from each
team_nocity = nba_series.index.str.split(' ').str[-1]

# create a series from the modified index
team_nocity_series = pd.Series(index = team_nocity, data = range(2020,1946,-1))

# group the teams by name
team_nocity_grouped = team_nocity_series.groupby(by=team_nocity_series.index)

# get the counts for the grouped object
team_nocity_grouped.count().sort_values(ascending=False).head(10)

Our data isn't up to date. It stops at 2020. Up until then, the Lakers and Celtics were tied with 17 championships each, and the Warriors and the Bulls were tied with 6 each.
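
To see what the city-stripping step does on its own, here is a small sketch using a handful of team names taken from the list above. The variable names `teams` and `last_words` are introduced only for this example.

```python
import pandas as pd

teams = pd.Index(['Minneapolis Lakers', 'Los Angeles Lakers',
                  'Portland Trail Blazers', 'Philadelphia 76ers'])

# .str.split(' ') turns each name into a list of words;
# .str[-1] then takes the last word from each list
last_words = teams.str.split(' ').str[-1]

print(list(last_words))
```

Keeping only the last word is a rough shortcut: a two-word nickname such as "Trail Blazers" is cut down to "Blazers", so the grouped labels should be read as approximate team names.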