## Lecture Notes - Groups and Pivot Tables ##

**Helpful Resource:**
- [Python Reference](http://data8.org/sp22/python-reference.html): Cheat sheet of helpful array & table methods used in Data 8!

**Recommended Readings:**
- [Classifying by One Variable](https://inferentialthinking.com/chapters/08/2/Classifying_by_One_Variable.html)
- [Cross-Classifying by More than One Variable](https://inferentialthinking.com/chapters/08/3/Cross-Classifying_by_More_than_One_Variable.html)

In [59]:
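```python
# import modules to be used in this notebook

from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
# NumPy 1.25+ moved VisibleDeprecationWarning into np.exceptions (NumPy 2.0
#   removed the top-level name); fall back to np for older NumPy versions
warnings.simplefilter(action='ignore',
                      category=getattr(np, "exceptions", np).VisibleDeprecationWarning)
```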

In [60]:
```python
# Create a table from scratch with arrays and columns
#   using Table().with_columns()

names = make_array("apple", "banana", "cranberry", "apple", "cranberry", "apple")
prices = make_array(1.2, 0.29, 3.2, 0.90, 4.2, 1.25)

fruits = Table().with_columns(
    "Name", names,
    "Price", prices
)

fruits
```

| Name | Price |
|---|---|
| apple | 1.2 |
| banana | 0.29 |
| cranberry | 3.2 |
| apple | 0.9 |
| cranberry | 4.2 |
| apple | 1.25 |


## Groups - `tbl.group()` ##

### Classifying by One Variable ###

Count the number of rows in each category of a table.

In [61]:
```python
# return a table with a count of the rows in each category

fruits.group("Name")
```

| Name | count |
|---|---|
| apple | 3 |
| banana | 1 |
| cranberry | 2 |
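Conceptually, `group` with a single column label tallies how many rows share each value. As a rough illustration (not how the `datascience` library implements `group`), the standard library's `collections.Counter` produces the same counts from a plain Python list:

```python
from collections import Counter

# Same values as the Name column of the fruits table
names = ["apple", "banana", "cranberry", "apple", "cranberry", "apple"]

# Counter maps each distinct name to the number of times it appears
counts = Counter(names)

# Print in sorted order, like the rows of the grouped table
for name in sorted(counts):
    print(name, counts[name])
```

Sorting the keys mirrors the alphabetical row order of the grouped table above.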


In [62]:
```python
# use the optional 2nd argument to average the price of each category

fruits.group("Name", np.average)
```

| Name | Price average |
|---|---|
| apple | 1.11667 |
| banana | 0.29 |
| cranberry | 3.7 |
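When a function is passed as the 2nd argument, grouping happens in two steps: first collect the values belonging to each group, then collapse each group's values with the function. The pure-Python sketch below mimics those steps, with `statistics.mean` standing in for `np.average`; the variable names are hypothetical and this is not the library's own code:

```python
from collections import defaultdict
from statistics import mean

# Same data as the Name and Price columns of the fruits table
names = ["apple", "banana", "cranberry", "apple", "cranberry", "apple"]
prices = [1.2, 0.29, 3.2, 0.90, 4.2, 1.25]

# Step 1: collect the prices that belong to each name
groups = defaultdict(list)
for name, price in zip(names, prices):
    groups[name].append(price)

# Step 2: collapse each group's prices into one number
averages = {name: mean(values) for name, values in sorted(groups.items())}

for name, avg in averages.items():
    print(name, round(avg, 5))
```

The averages agree with the `Price average` column above (the table display rounds the values).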


In [63]:
```python
# We have learned how to define and use functions in Python,
#   so we can define our own function and pass it to group.
# Here, halfPrice reduces a price by 50%.
# halfPrice returns an array (one reduced price per original price) rather
#   than a single number, so each cell of the result holds an array.

def halfPrice(price):
    """ reduce the price by 50% """
    return price * 0.5

fruits.group("Name", halfPrice)
```

| Name | Price halfPrice |
|---|---|
| apple | [ 0.6 0.45 0.625] |
| banana | [ 0.145] |
| cranberry | [ 1.6 2.1] |
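The function given to `group` receives all of a group's values at once, not one value at a time. Because `halfPrice` returns as many values as it receives instead of a single number, each cell in the `Price halfPrice` column holds an array. The sketch below reproduces this with plain lists; since a Python list cannot be multiplied by `0.5` the way a NumPy array can, the hypothetical helper `half_price_all` uses a list comprehension:

```python
from collections import defaultdict

names = ["apple", "banana", "cranberry", "apple", "cranberry", "apple"]
prices = [1.2, 0.29, 3.2, 0.90, 4.2, 1.25]

def half_price_all(values):
    """Hypothetical helper: halve every price in a group (one output per input)."""
    return [v * 0.5 for v in values]

# Collect each name's prices, as group does before calling the function
groups = defaultdict(list)
for name, price in zip(names, prices):
    groups[name].append(price)

# The function returns a list, not a single number,
# so each group's result is a whole list of halved prices
halved = {name: half_price_all(values) for name, values in sorted(groups.items())}

for name, result in halved.items():
    print(name, result)
```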


In [64]:
```python
# tbl.apply() calls the function once on each value in a column
#   and returns an array with one result per row (here, the halved prices)

fruits.apply(halfPrice, "Price")
```

```
array([ 0.6  ,  0.145,  1.6  ,  0.45 ,  2.1  ,  0.625])
```
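By contrast with `group`, `apply` calls the function once per row and returns one result per row, in the original row order. A plain-Python equivalent (a sketch, not the library's implementation; `half_price` is a hypothetical stand-in for `halfPrice`) is a list comprehension:

```python
def half_price(price):
    """Reduce a single price by 50%."""
    return price * 0.5

prices = [1.2, 0.29, 3.2, 0.90, 4.2, 1.25]

# Call the function once per value, keeping the original row order
halved = [half_price(p) for p in prices]
print(halved)
```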

In [65]:
```python
# create an array of locations, one for each row
locations = make_array("Santa Rosa", "Petaluma", "Redwood City", "Santa Rosa", "Redwood City", "Palo Alto")

# with_columns returns a new table with the Location column added
#   (fruits itself is unchanged); assign the new table to fruits_loc
fruits_loc = fruits.with_columns("Location", locations)
fruits_loc
```

| Name | Price | Location |
|---|---|---|
| apple | 1.2 | Santa Rosa |
| banana | 0.29 | Petaluma |
| cranberry | 3.2 | Redwood City |
| apple | 0.9 | Santa Rosa |
| cranberry | 4.2 | Redwood City |
| apple | 1.25 | Palo Alto |


In [66]:
```python
# group the table by "Name" column and average the price of each category
fruits_loc.group("Name", np.average)
```

| Name | Price average | Location average |
|---|---|---|
| apple | 1.11667 | |
| banana | 0.29 | |
| cranberry | 3.7 | |


In [67]:
```python
fruits_loc.group("Name", halfPrice)
```

| Name | Price halfPrice | Location halfPrice |
|---|---|---|
| apple | [ 0.6 0.45 0.625] | |
| banana | [ 0.145] | |
| cranberry | [ 1.6 2.1] | |


**When `tbl.group()` is given a function as its 2nd argument, it applies that function to every column except the one(s) being grouped. If the function cannot operate on a column, such as `np.average` or `halfPrice` on the text column `Location`, the result for that column may be blank or unexpected.**
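To see why the `Location` columns come out blank, the sketch below imitates applying one aggregation function to every non-grouping column. The helper `group_average` is hypothetical, and leaving `None` for text columns is an assumption chosen for illustration; it is not necessarily how `datascience` handles the failure internally:

```python
from collections import defaultdict
from statistics import mean

# Each row is (Name, Price, Location), as in fruits_loc
rows = [
    ("apple", 1.2, "Santa Rosa"),
    ("banana", 0.29, "Petaluma"),
    ("cranberry", 3.2, "Redwood City"),
    ("apple", 0.9, "Santa Rosa"),
    ("cranberry", 4.2, "Redwood City"),
    ("apple", 1.25, "Palo Alto"),
]

def group_average(rows, key_index):
    """Hypothetical helper: average every non-key column within each group.
    Columns holding non-numbers get None instead of an average."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)

    result = {}
    for key in sorted(groups):
        members = groups[key]
        summary = {}
        for col in range(len(members[0])):
            if col == key_index:
                continue  # the grouping column is not aggregated
            values = [member[col] for member in members]
            if all(isinstance(v, (int, float)) for v in values):
                summary[col] = mean(values)
            else:
                summary[col] = None  # text cannot be averaged
        result[key] = summary
    return result

for name, summary in group_average(rows, 0).items():
    print(name, summary)
```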

Use `tbl.select()` to keep only the columns you want to display; below, `1` is the index of the aggregated `Price` column.

In [68]:
```python
fruits_loc.group("Name", np.average).select("Name", 1)
```

| Name | Price average |
|---|---|
| apple | 1.11667 |
| banana | 0.29 |
| cranberry | 3.7 |


In [69]:
```python
fruits_loc.group("Name", np.sum).select("Name", 1)
```

| Name | Price sum |
|---|---|
| apple | 3.35 |
| banana | 0.29 |
| cranberry | 7.4 |


## Groups - `tbl.group()` & Pivots - `tbl.pivot()` ##

### Classifying by Two Variables ###

In [70]:
```python
fruits_loc.group("Name")
```

| Name | count |
|---|---|
| apple | 3 |
| banana | 1 |
| cranberry | 2 |


In [71]:
```python
fruits_loc.group("Location")
```

| Location | count |
|---|---|
| Palo Alto | 1 |
| Petaluma | 1 |
| Redwood City | 2 |
| Santa Rosa | 2 |


In [72]:
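```python
# pass a list of labels to group by each combination of Name and Location
fruits_loc.group(["Name", "Location"])
```

| Name | Location | count |
|---|---|---|
| apple | Palo Alto | 1 |
| apple | Santa Rosa | 2 |
| banana | Petaluma | 1 |
| cranberry | Redwood City | 2 |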
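Grouping by a list of columns counts each distinct combination of values, so only combinations that actually occur in the data get a row. The same counts can be sketched in plain Python by counting `(Name, Location)` tuples (an illustration, not the library's code):

```python
from collections import Counter

names = ["apple", "banana", "cranberry", "apple", "cranberry", "apple"]
locations = ["Santa Rosa", "Petaluma", "Redwood City", "Santa Rosa", "Redwood City", "Palo Alto"]

# zip pairs up the two columns row by row; Counter counts each distinct pair
pair_counts = Counter(zip(names, locations))

for (name, location), count in sorted(pair_counts.items()):
    print(name, location, count)
```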


In [73]:
```python
fruits_loc.group(["Name", "Location"], np.average)
```

| Name | Location | Price average |
|---|---|---|
| apple | Palo Alto | 1.25 |
| apple | Santa Rosa | 1.05 |
| banana | Petaluma | 0.29 |
| cranberry | Redwood City | 3.7 |
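Adding a function works the same way as with one column: collect the `Price` values for each `(Name, Location)` pair, then average them. A standard-library sketch, with hypothetical variable names and `statistics.mean` standing in for `np.average`:

```python
from collections import defaultdict
from statistics import mean

names = ["apple", "banana", "cranberry", "apple", "cranberry", "apple"]
prices = [1.2, 0.29, 3.2, 0.90, 4.2, 1.25]
locations = ["Santa Rosa", "Petaluma", "Redwood City", "Santa Rosa", "Redwood City", "Palo Alto"]

# Collect the prices for each distinct (Name, Location) pair
pair_prices = defaultdict(list)
for name, location, price in zip(names, locations, prices):
    pair_prices[(name, location)].append(price)

# Average each pair's prices; only pairs that actually occur appear
pair_averages = {pair: mean(vals) for pair, vals in sorted(pair_prices.items())}

for (name, location), avg in pair_averages.items():
    print(name, location, round(avg, 5))
```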


In [74]:
```python
fruits_loc
```

| Name | Price | Location |
|---|---|---|
| apple | 1.2 | Santa Rosa |
| banana | 0.29 | Petaluma |
| cranberry | 3.2 | Redwood City |
| apple | 0.9 | Santa Rosa |
| cranberry | 4.2 | Redwood City |
| apple | 1.25 | Palo Alto |


In [75]:
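```python
# one column per Location, one row per Name; each cell counts the matching rows
fruits_loc.pivot("Location", "Name")
```

| Name | Palo Alto | Petaluma | Redwood City | Santa Rosa |
|---|---|---|---|---|
| apple | 1 | 0 | 0 | 2 |
| banana | 0 | 1 | 0 | 0 |
| cranberry | 0 | 0 | 2 | 0 |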
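`pivot` lays out the counts from `group(["Name", "Location"])` as a grid: the first argument supplies the column labels, the second supplies the row labels, and every row/column combination gets a cell, with 0 where no rows match. A rough pure-Python sketch of that reshaping (not the library's code):

```python
from collections import Counter

names = ["apple", "banana", "cranberry", "apple", "cranberry", "apple"]
locations = ["Santa Rosa", "Petaluma", "Redwood City", "Santa Rosa", "Redwood City", "Palo Alto"]

pair_counts = Counter(zip(names, locations))
row_labels = sorted(set(names))      # one row per Name
col_labels = sorted(set(locations))  # one column per Location

# Every (row, column) combination gets a cell;
# Counter returns 0 for pairs that never occur
grid = [[pair_counts[(r, c)] for c in col_labels] for r in row_labels]

print(["Name"] + col_labels)
for label, cells in zip(row_labels, grid):
    print([label] + cells)
```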


In [76]:
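```python
# with values and collect, each cell holds collect applied to the values of
#   the matching rows (here, the average Price); combinations with no
#   matching rows show 0
fruits_loc.pivot("Location", "Name", values="Price", collect=np.average)
```

| Name | Palo Alto | Petaluma | Redwood City | Santa Rosa |
|---|---|---|---|---|
| apple | 1.25 | 0.0 | 0.0 | 1.05 |
| banana | 0.0 | 0.29 | 0.0 | 0.0 |
| cranberry | 0.0 | 0.0 | 3.7 | 0.0 |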
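With `values` and `collect`, each cell holds `collect` applied to the `values` of the matching rows instead of a count. A `0.0` in this table means no fruit of that name was sold at that location, not an average price of zero. The sketch below fills empty cells with 0 to match the output above; `statistics.mean` stands in for `np.average`, and the variable names are hypothetical:

```python
from collections import defaultdict
from statistics import mean

names = ["apple", "banana", "cranberry", "apple", "cranberry", "apple"]
prices = [1.2, 0.29, 3.2, 0.90, 4.2, 1.25]
locations = ["Santa Rosa", "Petaluma", "Redwood City", "Santa Rosa", "Redwood City", "Palo Alto"]

# Collect the prices for each (Name, Location) cell
cells = defaultdict(list)
for name, location, price in zip(names, locations, prices):
    cells[(name, location)].append(price)

row_labels = sorted(set(names))
col_labels = sorted(set(locations))

# Apply the collect function (mean) to each cell; empty cells become 0
grid = [
    [mean(cells[(r, c)]) if (r, c) in cells else 0 for c in col_labels]
    for r in row_labels
]

for label, row in zip(row_labels, grid):
    print(label, [round(v, 5) for v in row])
```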


In [79]:
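```python
# swapping the first two arguments swaps rows and columns (a transpose);
#   values and collect can also be passed positionally
fruits_loc.pivot("Name", "Location", "Price", np.average)
```

| Location | apple | banana | cranberry |
|---|---|---|---|
| Palo Alto | 1.25 | 0.0 | 0.0 |
| Petaluma | 0.0 | 0.29 | 0.0 |
| Redwood City | 0.0 | 0.0 | 3.7 |
| Santa Rosa | 1.05 | 0.0 | 0.0 |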
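Swapping the first two arguments swaps which column supplies the row labels and which supplies the column labels, so the result is the transpose of the previous table. A hypothetical helper `pivot_mean`, written only for this illustration with `statistics.mean` standing in for `np.average`, makes this easy to check:

```python
from collections import defaultdict
from statistics import mean

names = ["apple", "banana", "cranberry", "apple", "cranberry", "apple"]
prices = [1.2, 0.29, 3.2, 0.90, 4.2, 1.25]
locations = ["Santa Rosa", "Petaluma", "Redwood City", "Santa Rosa", "Redwood City", "Palo Alto"]

def pivot_mean(row_values, col_values, values):
    """Hypothetical helper: grid of mean values, one row per distinct
    row value and one column per distinct column value; 0 where no data."""
    cells = defaultdict(list)
    for r, c, v in zip(row_values, col_values, values):
        cells[(r, c)].append(v)
    rows = sorted(set(row_values))
    cols = sorted(set(col_values))
    return [[mean(cells[(r, c)]) if (r, c) in cells else 0 for c in cols]
            for r in rows]

by_name = pivot_mean(names, locations, prices)      # rows are Names
by_location = pivot_mean(locations, names, prices)  # rows are Locations

# Swapping the roles of the two columns transposes the grid
transposed = [list(col) for col in zip(*by_name)]
print(transposed == by_location)
```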
