# Optimize Custom Grouping Function
In this challenge, your goal is to find the fastest solution to the problem using only the pandas library.

### The Challenge
The `college_pop` dataset contains the name, state, and population of all higher-ed institutions in the US and its territories. For each state, find the percentage of the state's total college population that is accounted for by its 5 largest colleges.

In [1]:
import pandas as pd
college = pd.read_csv('../data/college_pop.csv')
college.head()

| | name | state | pop |
|---|---|---|---|
| 0 | Alabama A & M University | AL | 4206.0 |
| 1 | University of Alabama at Birmingham | AL | 11383.0 |
| 2 | Amridge University | AL | 291.0 |
| 3 | University of Alabama in Huntsville | AL | 5451.0 |
| 4 | Alabama State University | AL | 4811.0 |


In [2]:
total = (college
         .groupby('state')
         .agg(pop=('pop', 'sum')))
total.head()

| state | pop |
|---|---|
| AK | 24932.0 |
| AL | 248298.0 |
| AR | 134820.0 |
| AS | 1276.0 |
| AZ | 520439.0 |


In [3]:
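# Take the 5 largest colleges per state, then sum them within each state
largest = (college
           .groupby('state')['pop'].nlargest(5)
           .reset_index()  # move 'state' out of the index into a column
           .groupby('state')
           .agg(pop=('pop', 'sum')))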
largest.head()

| state | pop |
|---|---|
| AK | 23974.0 |
| AL | 92059.0 |
| AR | 56985.0 |
| AS | 1276.0 |
| AZ | 287015.0 |
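
The per-group `nlargest` call returns a Series with a two-level index (the `state` key plus the original row label), which is why the cell above resets the index before grouping again. The sketch below illustrates this on a small, hypothetical DataFrame (the state codes and numbers are invented for illustration, not taken from `college_pop.csv`) and shows one possible way to sum on the `state` index level directly.

```python
import pandas as pd

# Hypothetical toy data; not taken from college_pop.csv
toy = pd.DataFrame({
    'state': ['AA', 'AA', 'AA', 'BB', 'BB'],
    'pop': [100.0, 300.0, 200.0, 50.0, 70.0],
})

# nlargest per group returns a Series whose index has two levels:
# the group key ('state') and the original row label
top2 = toy.groupby('state')['pop'].nlargest(2)
print(top2.index.nlevels)

# Grouping on the 'state' index level collapses it to one row per state
top2_sum = top2.groupby(level='state').sum()
print(top2_sum)
```

Note that this variant yields a Series rather than the one-column DataFrame produced by `.agg(pop=('pop', 'sum'))`.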


In [4]:
assert total.shape[0] == largest.shape[0]
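
The shape check confirms only that both frames have the same number of rows. Because `div` aligns on index labels, a stricter check would compare the indexes themselves. A minimal sketch, using hypothetical stand-ins for `total` and `largest`:

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's `total` and `largest` frames
idx = pd.Index(['AA', 'BB'], name='state')
total_toy = pd.DataFrame({'pop': [600.0, 120.0]}, index=idx)
largest_toy = pd.DataFrame({'pop': [500.0, 120.0]}, index=idx)

# Index.equals checks that the labels match element by element, in order
same_labels = total_toy.index.equals(largest_toy.index)
assert same_labels
```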

In [5]:
result = largest.div(total).mul(100).round(2)
result.head()

| state | pop |
|---|---|
| AK | 96.16 |
| AL | 37.08 |
| AR | 42.27 |
| AS | 100.0 |
| AZ | 55.15 |
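
Since the challenge asks for the fastest approach, one alternative worth timing is to sort once and take the first rows of each group with `head`, rather than calling `nlargest` per group. The sketch below is an assumption about what might be faster, not a measured result; the helper name `top_n_share` and the toy data are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; not taken from college_pop.csv
toy = pd.DataFrame({
    'state': ['AA', 'AA', 'AA', 'BB', 'BB'],
    'pop': [100.0, 300.0, 200.0, 50.0, 70.0],
})

def top_n_share(df, n):
    """Percentage of each state's total 'pop' held by its n largest rows."""
    total = df.groupby('state')['pop'].sum()
    top = (df.sort_values('pop', ascending=False)  # largest rows first
             .groupby('state')
             .head(n)                              # first n rows of each state
             .groupby('state')['pop']
             .sum())
    # Division aligns on the 'state' index of both Series
    return top.div(total).mul(100).round(2)

share = top_n_share(toy, 2)  # the notebook's problem uses n=5
print(share)
```

On the real data, the call would be `top_n_share(college, 5)`. Comparing both versions with `%timeit` in the notebook would show whether the sort-based variant actually helps for this dataset.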

