# Pandas `GroupBy` object 
Useful when you want to explore data by category, similar to a SQL `GROUP BY`.

Documentation: https://pandas.pydata.org/docs/reference/groupby.html?highlight=groupby
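The snippet below shows the split-apply-combine idea behind `groupby` on a tiny DataFrame, next to the SQL it roughly corresponds to. The data is made up for illustration and is not taken from the S&P 500 dataset.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not from the S&P 500 dataset)
df = pd.DataFrame(
    {
        "Sector": ["Energy", "Utilities", "Energy", "Utilities", "Financials"],
        "Price": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)

# Roughly equivalent SQL:
#   SELECT Sector, SUM(Price) FROM df GROUP BY Sector;
# split rows by Sector, apply sum() to each group's Price, combine into one Series
price_by_sector = df.groupby("Sector")["Price"].sum()
print(price_by_sector)
```

By default the group keys come back sorted, so the result is indexed Energy, Financials, Utilities.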

### Import pandas

```python
import pandas as pd
```

### Read CSV
In this set of videos, we'll use a dataset of S&P 500 companies with financial information. [Source: Kaggle](https://www.kaggle.com/paytonfisher/sp-500-companies-with-financial-information)

```python
sp500 = pd.read_csv("financials.csv")
sp500.head()
```

| Unnamed: 0 | Symbol | Name | Sector | Price | Price/Earnings | Dividend Yield | Earnings/Share | 52 Week Low | 52 Week High | Market Cap | EBITDA | Price/Sales | Price/Book | SEC Filings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MMM | 3M Company | Industrials | 222.89 | 24.31 | 2.332862 | 7.92 | 259.77 | 175.49 | 138721100000.0 | 9048000000.0 | 4.390271 | 11.34 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
| 1 | AOS | A.O. Smith Corp | Industrials | 60.24 | 27.76 | 1.147959 | 1.7 | 68.39 | 48.925 | 10783420000.0 | 601000000.0 | 3.575483 | 6.35 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
| 2 | ABT | Abbott Laboratories | Health Care | 56.27 | 22.51 | 1.908982 | 0.26 | 64.6 | 42.28 | 102121000000.0 | 5744000000.0 | 3.74048 | 3.19 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
| 3 | ABBV | AbbVie Inc. | Health Care | 108.48 | 19.41 | 2.49956 | 3.29 | 125.86 | 60.05 | 181386300000.0 | 10310000000.0 | 6.291571 | 26.14 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
| 4 | ACN | Accenture plc | Information Technology | 150.51 | 25.47 | 1.71447 | 5.44 | 162.6 | 114.82 | 98765860000.0 | 5643228000.0 | 2.604117 | 10.62 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
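The `Unnamed: 0` column is what pandas calls a column whose header is empty. Assuming `financials.csv` was saved with its row index as an unnamed first column (the header suggests this, but it is an assumption), `read_csv`'s `index_col=0` turns that column into the index instead. The sketch uses a small hypothetical CSV string in place of the real file.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for financials.csv: the first column has
# no header, which is what produces an "Unnamed: 0" column
csv_text = """,Symbol,Sector,Price
0,MMM,Industrials,222.89
1,AOS,Industrials,60.24
2,ABT,Health Care,56.27
"""

# Default: the unnamed first column is read as an ordinary data column
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df.columns.tolist())  # ['Unnamed: 0', 'Symbol', 'Sector', 'Price']

# index_col=0: use the first column as the row index instead
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed_df.columns.tolist())  # ['Symbol', 'Sector', 'Price']
```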


```python
# group all rows by sector
sectors = sp500.groupby("Sector")
```
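A GroupBy object can be inspected before aggregating anything. The sketch below uses a hypothetical four-row stand-in for `sp500` (made-up symbols and prices) to show the number of groups, iteration over `(key, sub-DataFrame)` pairs, and `get_group` for pulling out one sector's rows.

```python
import pandas as pd

# Hypothetical stand-in for the sp500 DataFrame (values are made up)
df = pd.DataFrame(
    {
        "Symbol": ["AAA", "BBB", "CCC", "DDD"],
        "Sector": ["Energy", "Utilities", "Energy", "Financials"],
        "Price": [10.0, 20.0, 30.0, 40.0],
    }
)

sectors = df.groupby("Sector")

# Number of distinct groups (one per unique Sector value)
print(sectors.ngroups)  # 3

# Iterating yields (group key, sub-DataFrame) pairs, keys in sorted order
for sector, rows in sectors:
    print(sector, rows["Symbol"].tolist())

# Pull out the rows of a single group
energy = sectors.get_group("Energy")
print(energy)
```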

```python
# a DataFrame
type(sp500)
```

```
pandas.core.frame.DataFrame
```

```python
# a DataFrameGroupBy object, not a DataFrame
type(sectors)
```

```
pandas.core.groupby.generic.DataFrameGroupBy
```
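The grouped object only describes how rows are split; an aggregation such as `size()` or `sum()` is what returns ordinary pandas data again. This minimal sketch uses hypothetical toy data in place of `sp500`.

```python
import pandas as pd

# Hypothetical toy data standing in for sp500
df = pd.DataFrame(
    {
        "Sector": ["Energy", "Utilities", "Energy"],
        "Price": [10.0, 20.0, 30.0],
    }
)

sectors = df.groupby("Sector")
print(type(sectors).__name__)  # DataFrameGroupBy

# Selecting one column gives a SeriesGroupBy
print(type(sectors["Price"]).__name__)  # SeriesGroupBy

# Only an aggregation turns the grouped object back into ordinary pandas data
counts = sectors.size()  # Series: number of rows per sector
totals = sectors.sum()   # DataFrame: column sums per sector
print(type(counts).__name__, type(totals).__name__)  # Series DataFrame
```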

### Aggregate individual columns

```python
# return the sum of the Price column for each sector
sectors["Price"].sum()
```

```
Sector
Consumer Discretionary        10418.90
Consumer Staples               2711.98
Energy                         1852.40
Financials                     6055.81
Health Care                    8083.46
Industrials                    7831.47
Information Technology         8347.00
Materials                      2559.67
Real Estate                    2927.52
Telecommunication Services      100.81
Utilities                      1545.45
Name: Price, dtype: float64
```
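Single brackets select one column and give a Series; passing a list of column names (double brackets) aggregates several columns at once and returns a DataFrame. The sketch below uses hypothetical values with column names borrowed from the dataset.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the sp500 dataset
df = pd.DataFrame(
    {
        "Sector": ["Energy", "Utilities", "Energy"],
        "Price": [10.0, 20.0, 30.0],
        "Dividend Yield": [1.5, 3.0, 2.5],
    }
)

sectors = df.groupby("Sector")

# A list of column names (double brackets) returns a DataFrame, one column per name
summed = sectors[["Price", "Dividend Yield"]].sum()
print(summed)
```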

```python
# return the total Dividend Yield for each sector
sectors["Dividend Yield"].sum()
```

```
Sector
Consumer Discretionary        132.082638
Consumer Staples               82.735293
Energy                         64.462468
Financials                    137.172367
Health Care                    55.951842
Industrials                    99.119260
Information Technology         85.994554
Materials                      43.449399
Real Estate                   128.527017
Telecommunication Services     22.703391
Utilities                     105.258282
Name: Dividend Yield, dtype: float64
```
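Dividend Yield is a per-company percentage, so its sum grows with the number of companies in a sector; the mean is often easier to compare across sectors. `agg` accepts a list of aggregation names and computes them in one pass. The yields below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the yields are made up
df = pd.DataFrame(
    {
        "Sector": ["Energy", "Energy", "Energy", "Utilities"],
        "Dividend Yield": [2.0, 3.0, 4.0, 5.0],
    }
)

yields = df.groupby("Sector")["Dividend Yield"]

# sum, mean and count per sector in one call; result is a DataFrame
summary = yields.agg(["sum", "mean", "count"])
print(summary)
```

Here Energy has the larger sum (9.0 vs 5.0) only because it has three companies; its mean (3.0) is lower than Utilities' (5.0).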

```python
# return the highest Earnings/Share in each sector (the value only, not which company)
sectors["Earnings/Share"].max()
```

```
Sector
Consumer Discretionary        44.09
Consumer Staples               9.27
Energy                         9.93
Financials                    30.30
Health Care                   38.35
Industrials                   18.73
Information Technology        22.27
Materials                     18.61
Real Estate                    6.81
Telecommunication Services     7.36
Utilities                     11.39
Name: Earnings/Share, dtype: float64
```
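`max()` reports only the largest value per sector. To see which company holds it, `idxmax()` returns the row label of each group's maximum, which can then be looked up with `.loc`. The symbols and values in this sketch are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; symbols and values are made up
df = pd.DataFrame(
    {
        "Symbol": ["AAA", "BBB", "CCC", "DDD"],
        "Sector": ["Energy", "Energy", "Utilities", "Utilities"],
        "Earnings/Share": [1.5, 4.0, 2.0, 0.5],
    }
)

sectors = df.groupby("Sector")

# idxmax() returns, per sector, the row label of the highest Earnings/Share
best_rows = sectors["Earnings/Share"].idxmax()

# Look those labels up in the original DataFrame to recover the companies
best = df.loc[best_rows, ["Sector", "Symbol", "Earnings/Share"]]
print(best)
```

If two companies tie for the maximum, `idxmax()` returns the first one it encounters.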