## Lecture Notes - Groups and Pivot Tables ##

**Helpful Resource:**
- [Python Reference](http://data8.org/sp22/python-reference.html): Cheat sheet of helpful array & table methods used in Data 8!

**Recommended Readings:**
- [Classifying by One Variable](https://inferentialthinking.com/chapters/08/2/Classifying_by_One_Variable.html)
- [Cross-Classifying by More than One Variable](https://inferentialthinking.com/chapters/08/3/Cross-Classifying_by_More_than_One_Variable.html)

In [None]:
# import modules to be used in this notebook

from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
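# np.VisibleDeprecationWarning lives in np.exceptions in newer NumPy versions
warnings.simplefilter(action='ignore',
                      category=getattr(np, 'exceptions', np).VisibleDeprecationWarning)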

In [None]:
# Create a table from scratch with arrays and columns 
#   using Table().with_columns()

names = make_array("apple", "banana", "cranberry", "apple", "cranberry", "apple")
prices = make_array(1.2, 0.29, 3.2, 0.90, 4.2, 1.25)

fruits = Table().with_columns(
        "Name", names,
        "Price", prices
)

fruits

## Groups - `tbl.group()` ##

### Classifying by One Variable ###

Count the number of rows in each category of a table.

In [None]:
# returns a new table with the number of rows in each category

fruits.group("Name")
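
To see what the count column means, here is a small sketch that reproduces it with plain NumPy instead of a `Table`. It is only an illustration, assuming the same fruit names as `fruits`. It is not how `datascience` implements `group`.

```python
import numpy as np

# Same fruit names as the "Name" column of the fruits table
names = np.array(["apple", "banana", "cranberry", "apple", "cranberry", "apple"])

# np.unique returns the sorted distinct values and, with return_counts=True,
# how many times each value occurs (one count per category)
categories, counts = np.unique(names, return_counts=True)

for category, count in zip(categories, counts):
    print(category, count)
```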

In [None]:
# use the optional 2nd argument to average the price of each category

fruits.group("Name", np.average)
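
The optional 2nd argument is a function that receives the prices of one category at a time. A rough NumPy sketch of that idea, assuming the same data as `fruits` (the `averages` dictionary is only for illustration), uses a boolean mask to pick out each category's rows:

```python
import numpy as np

names = np.array(["apple", "banana", "cranberry", "apple", "cranberry", "apple"])
prices = np.array([1.2, 0.29, 3.2, 0.90, 4.2, 1.25])

averages = {}
for category in np.unique(names):
    # Boolean mask: True for the rows that belong to this category
    in_category = names == category
    # Pass only this category's prices to the function (np.average here)
    averages[str(category)] = np.average(prices[in_category])

print(averages)
```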

In [None]:
# we learned how to define and use functions in Python,
#   so we can define our own function and pass it to group
# let's reduce the price by 50%

# group passes each category's array of prices to halfPrice,
#   so each row shows an array of reduced prices

def halfPrice(price):
    """ reduce the price by 50% """
    return price * 0.5

fruits.group("Name", halfPrice)
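
Since `halfPrice` does not reduce an array to a single number, each cell of the grouped column holds an array. This NumPy sketch, assuming the same data as `fruits` (the dictionary `halved_by_category` is just for illustration), shows what `halfPrice` receives and returns for each category:

```python
import numpy as np

names = np.array(["apple", "banana", "cranberry", "apple", "cranberry", "apple"])
prices = np.array([1.2, 0.29, 3.2, 0.90, 4.2, 1.25])

def halfPrice(price):
    """ reduce the price by 50% """
    return price * 0.5

halved_by_category = {}
for category in np.unique(names):
    # halfPrice gets the whole array of this category's prices,
    # so it returns an array rather than a single value
    halved_by_category[str(category)] = halfPrice(prices[names == category])

print(halved_by_category)
```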

In [None]:
# tbl.apply() calls a function on each value in a column
#   and returns an array of the results (here, every price halved)

fruits.apply(halfPrice, "Price")
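
The difference between `apply` and `group` is how often the function is called and with what. The sketch below counts calls for both strategies using plain NumPy; `half_price_counted` is a hypothetical helper that only exists to record the calls, and the data is assumed to match `fruits`.

```python
import numpy as np

names = np.array(["apple", "banana", "cranberry", "apple", "cranberry", "apple"])
prices = np.array([1.2, 0.29, 3.2, 0.90, 4.2, 1.25])

calls = {"apply": 0, "group": 0}

def half_price_counted(price, mode):
    """Hypothetical helper: halve the input and record which strategy called it."""
    calls[mode] += 1
    return price * 0.5

# apply-style: one call per row, each call gets a single price
applied = np.array([half_price_counted(p, "apply") for p in prices])

# group-style: one call per category, each call gets an array of prices
grouped = [half_price_counted(prices[names == c], "group") for c in np.unique(names)]

print(calls)  # {'apply': 6, 'group': 3}
```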

In [None]:
# create an array of locations, one for each row of fruits
locations = make_array("Santa Rosa", "Petaluma", "Redwood City", "Santa Rosa", "Redwood City", "Palo Alto")

# add the locations array as a new "Location" column; with_columns returns a new
#   table (fruits itself is unchanged), which we assign to fruits_loc
fruits_loc = fruits.with_columns("Location", locations)
fruits_loc

In [None]:
# group the table by "Name" column and average the price of each category
fruits_loc.group("Name", np.average)

In [None]:
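# halfPrice cannot halve the text in the Location column,
#   so that column will not show useful values
fruits_loc.group("Name", halfPrice)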

##### When `tbl.group()` is given a function as its 2nd argument, it applies that function to every column except the one(s) being grouped. If the function cannot operate on a column (for example, averaging text), that column may show blank or unexpected values.

Use `tbl.select()` to keep only the columns you want to display. Columns can be chosen by label or by index (`1` is the second column).

In [None]:
fruits_loc.group("Name", np.average).select("Name", 1)

In [None]:
fruits_loc.group("Name", np.sum).select("Name", 1)
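
Why do text columns come out blank instead of crashing? One plausible mechanism is to catch the `TypeError` the function raises and substitute an "empty" value of the column's type. The sketch below assumes that mechanism; `zero_on_type_error` and `safe_average` are hypothetical names, not part of the documented `datascience` API.

```python
import numpy as np

def zero_on_type_error(column_fn):
    """Hypothetical wrapper: call column_fn, but if it raises TypeError on a
    NumPy array, return an empty value of that array's dtype instead."""
    def wrapped(column):
        try:
            return column_fn(column)
        except TypeError:
            if isinstance(column, np.ndarray):
                # e.g. '' for a text column, 0.0 for a float column
                return column.dtype.type()
            raise
    return wrapped

safe_average = zero_on_type_error(np.average)

prices = np.array([1.2, 0.90, 1.25])
locations = np.array(["Santa Rosa", "Santa Rosa", "Palo Alto"])

print(safe_average(prices))            # a normal average of the numbers
print(repr(safe_average(locations)))   # averaging text fails, so an empty string
```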

## Groups - `tbl.group()` & Pivots - `tbl.pivot()` ##

### Classifying by Two Variables ###

In [None]:
fruits_loc.group("Name")

In [None]:
fruits_loc.group("Location")

In [None]:
# group by two columns: count the rows for each (Name, Location) combination
fruits_loc.group(["Name", "Location"])
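
Grouping by two columns counts each distinct combination of values. A standard-library sketch of the same count, assuming the data from `fruits_loc` (`pair_counts` is an illustrative name), treats each row as a `(Name, Location)` pair:

```python
from collections import Counter

names = ["apple", "banana", "cranberry", "apple", "cranberry", "apple"]
locations = ["Santa Rosa", "Petaluma", "Redwood City",
             "Santa Rosa", "Redwood City", "Palo Alto"]

# Each row becomes a (Name, Location) pair; Counter tallies identical pairs
pair_counts = Counter(zip(names, locations))

for (name, location), count in sorted(pair_counts.items()):
    print(name, location, count)
```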

In [None]:
# average the price within each (Name, Location) combination
fruits_loc.group(["Name", "Location"], np.average)

In [None]:
fruits_loc

In [None]:
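# pivot: values of "Location" become columns, values of "Name" become rows,
#   and each cell counts the rows with that combination
fruits_loc.pivot("Location", "Name")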
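
A pivot table holds the same counts as grouping by two columns, laid out as a grid. Here is a NumPy sketch of that grid, assuming the data from `fruits_loc` and assuming that combinations which never occur are shown as 0 (`grid`, `row_labels`, and `col_labels` are illustrative names):

```python
import numpy as np

names = np.array(["apple", "banana", "cranberry", "apple", "cranberry", "apple"])
locations = np.array(["Santa Rosa", "Petaluma", "Redwood City",
                      "Santa Rosa", "Redwood City", "Palo Alto"])

row_labels = np.unique(names)      # values of "Name" become the rows
col_labels = np.unique(locations)  # values of "Location" become the columns

# Start every cell at 0 so combinations that never occur show a count of 0
grid = np.zeros((len(row_labels), len(col_labels)), dtype=int)
for name, location in zip(names, locations):
    # Find each label's position in the sorted label arrays
    i = np.searchsorted(row_labels, name)
    j = np.searchsorted(col_labels, location)
    grid[i, j] += 1

print(col_labels)
print(row_labels)
print(grid)
```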

In [None]:
# with values= and collect=, each cell averages "Price" instead of counting rows
fruits_loc.pivot("Location", "Name", values="Price", collect=np.average)

In [None]:
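# same pivot with the roles swapped: "Name" values become columns and
#   "Location" values become rows; values and collect are passed by position
fruits_loc.pivot("Name", "Location", "Price", np.average)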
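
When `values` and `collect` are given, each cell applies the function to the prices in that cell, and swapping the first two arguments only flips rows and columns. The sketch below checks both ideas with NumPy, assuming the data from `fruits_loc` and zero-filled empty cells; `pivot_grid` is a hypothetical helper, not the library's `pivot`.

```python
import numpy as np

names = np.array(["apple", "banana", "cranberry", "apple", "cranberry", "apple"])
prices = np.array([1.2, 0.29, 3.2, 0.90, 4.2, 1.25])
locations = np.array(["Santa Rosa", "Petaluma", "Redwood City",
                      "Santa Rosa", "Redwood City", "Palo Alto"])

def pivot_grid(column_values, row_values, values, collect):
    """Hypothetical helper: cell (i, j) is collect() applied to the values whose
    row and column categories match; cells with no rows stay 0."""
    row_labels = np.unique(row_values)
    col_labels = np.unique(column_values)
    grid = np.zeros((len(row_labels), len(col_labels)))
    for i, r in enumerate(row_labels):
        for j, c in enumerate(col_labels):
            in_cell = (row_values == r) & (column_values == c)
            if in_cell.any():
                grid[i, j] = collect(values[in_cell])
    return grid

# Like pivot("Location", "Name", values="Price", collect=np.average)
by_name = pivot_grid(locations, names, prices, np.average)
# Like pivot("Name", "Location", "Price", np.average): first two arguments swapped
by_location = pivot_grid(names, locations, prices, np.average)

# Swapping the first two arguments transposes the grid
print(np.allclose(by_name, by_location.T))  # True
```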