## **Pandas Library - Continuation**

* **Groupby**: `groupby` splits the data into groups based on the values of one or more columns. After grouping, you can apply an aggregation function such as **mean, sum, count, max, or min** to each group.
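To make the idea of splitting and then aggregating concrete, here is a minimal sketch that does the work by hand and then with a single `groupby` call. The `toy` DataFrame and its `key` and `value` columns are made up for this illustration only and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration.
toy = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 3, 10]})

# Split: iterating over a groupby yields one (key, sub-DataFrame) pair per group.
# Apply: compute the mean of "value" inside each piece.
manual = {key: piece["value"].mean() for key, piece in toy.groupby("key")}

# Combine: the usual one-liner performs split, apply, and combine together.
combined = toy.groupby("key")["value"].mean()

print(manual)
print(combined)
```

Both approaches give the same per-group means; the one-liner simply returns them as a Series indexed by the group key.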

In [56]:
import numpy as np
import pandas as pd

In [57]:
datas = {
    'Groups': ['G1', 'G1', 'G2', 'G2', 'G3', 'G3'],
    'Authors': ['DanBrawn', 'GRRMartin', 'SarahJMass', 'JayKristoff', 'ArthurCDoyle', 'JulioVerne'],
    'Books': [30, 100, 35, 20, 500, 75],
}

In [58]:
df = pd.DataFrame(datas)
df

  Groups       Authors  Books
0     G1      DanBrawn     30
1     G1     GRRMartin    100
2     G2    SarahJMass     35
3     G2   JayKristoff     20
4     G3  ArthurCDoyle    500
5     G3    JulioVerne     75


In [59]:
# Select the numeric column before aggregating. In pandas 2.0 and later,
# mean() raises a TypeError if a string column such as Authors is included.
gr = df.groupby('Groups')['Books']

In [60]:
print(gr.mean())

Groups
G1     65.0
G2     27.5
G3    287.5
Name: Books, dtype: float64
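If you would rather keep a DataFrame than pick a single column, the `numeric_only` argument of `mean()` is an alternative. The sketch below rebuilds the same data so that it runs on its own; the variable names `by_column` and `by_flag` are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Groups": ["G1", "G1", "G2", "G2", "G3", "G3"],
    "Authors": ["DanBrawn", "GRRMartin", "SarahJMass",
                "JayKristoff", "ArthurCDoyle", "JulioVerne"],
    "Books": [30, 100, 35, 20, 500, 75],
})

# Option 1: select the numeric column first; the result is a Series.
by_column = df.groupby("Groups")["Books"].mean()

# Option 2: keep every column but skip the non-numeric ones;
# the result is a DataFrame with a single "Books" column.
by_flag = df.groupby("Groups").mean(numeric_only=True)

print(by_column)
print(by_flag)
```

Option 1 is handy when you only care about one column; option 2 scales better when the DataFrame has several numeric columns.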


In [61]:
gr.sum()

Groups
G1    130
G2     55
G3    575
Name: Books, dtype: int64

In [62]:
# The standard deviation measures how spread out the values are from the mean.
gr.std()

Groups
G1     49.497475
G2     10.606602
G3    300.520382
Name: Books, dtype: float64
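The value for G1 can be checked by hand. By default pandas computes the sample standard deviation (`ddof=1`), which divides by n - 1 rather than n. The sketch below reproduces the G1 result step by step; `g1` is just a stand-in Series holding that group's two values.

```python
import math
import pandas as pd

g1 = pd.Series([30, 100])  # the two Books values in group G1

mean = g1.mean()                                # 65.0
squared_deviations = ((g1 - mean) ** 2).sum()   # 35**2 + 35**2 = 2450

# Sample standard deviation: divide by n - 1 (this is the pandas default).
sample_std = math.sqrt(squared_deviations / (len(g1) - 1))

print(sample_std)        # same value as g1.std()
print(g1.std())          # ddof=1 by default
print(g1.std(ddof=0))    # population standard deviation, divides by n
```

With only two values per group the two conventions differ noticeably, so it is worth knowing which one you are reading.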

In [63]:
gr.sum().loc['G3']

575

In [71]:
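# Without selecting a column, sum() also "adds" strings by concatenating them,
# so the two author names in G1 are joined together.
df.groupby('Groups').sum().loc['G1']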

Authors    DanBrawnGRRMartin
Books                    130
Name: G1, dtype: object

In [72]:
df.groupby('Groups').count()

        Authors  Books
Groups
G1            2      2
G2            2      2
G3            2      2
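Instead of calling each aggregation separately, several can be requested at once with `agg()`. This is a minimal sketch on the same data, rebuilt here so that it runs on its own; the name `summary` is chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Groups": ["G1", "G1", "G2", "G2", "G3", "G3"],
    "Authors": ["DanBrawn", "GRRMartin", "SarahJMass",
                "JayKristoff", "ArthurCDoyle", "JulioVerne"],
    "Books": [30, 100, 35, 20, 500, 75],
})

# One column per aggregation, one row per group.
summary = df.groupby("Groups")["Books"].agg(["count", "sum", "mean", "min", "max"])

print(summary)
```

A single cell can then be read with `summary.loc['G3', 'sum']`, which matches the `gr.sum().loc['G3']` result above.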


In [None]:
df.groupby('Groups').describe()  # Use .transpose() to swap rows and columns.

        Books
        count   mean         std   min     25%    50%     75%    max
Groups
G1        2.0   65.0   49.497475  30.0    47.5   65.0    82.5  100.0
G2        2.0   27.5   10.606602  20.0   23.75   27.5   31.25   35.0
G3        2.0  287.5  300.520382  75.0  181.25  287.5  393.75  500.0


In [78]:
df.groupby('Groups').describe().transpose()['G3']

Books  count      2.000000
       mean     287.500000
       std      300.520382
       min       75.000000
       25%      181.250000
       50%      287.500000
       75%      393.750000
       max      500.000000
Name: G3, dtype: float64
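Transposing is one way to pull out a single group's statistics. Another option, sketched below on the same data, is to select the `Books` column first and then pick the row with `.loc`. The name `g3_stats` is chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Groups": ["G1", "G1", "G2", "G2", "G3", "G3"],
    "Authors": ["DanBrawn", "GRRMartin", "SarahJMass",
                "JayKristoff", "ArthurCDoyle", "JulioVerne"],
    "Books": [30, 100, 35, 20, 500, 75],
})

# Selecting 'Books' first gives flat column names (count, mean, std, ...),
# so one group's statistics are simply one row of the result.
g3_stats = df.groupby("Groups")["Books"].describe().loc["G3"]

print(g3_stats)
```

The resulting Series is indexed by statistic name, so `g3_stats['mean']` reads one value directly.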