# Week 02

## Citing open-source / found code

Sometimes the citation will be part of the code. Whenever you use an `import` statement, I'll know the code is coming from somewhere else, and it's easy to figure out where.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

plt.plot(np.sin(np.arange(0, 4 * np.pi, .1)))
plt.plot(np.cos(np.arange(0, 4 * np.pi, .1)), c="r")
plt.show()

Other times the citation will have to be a little more explicit.

A link to the original code, repo, or Stack Overflow answer is enough.

In [None]:
import cv2
import numpy as np
from scipy import fftpack
from imagehash import ImageHash

# Function for computing the perceptual hash of an image
# Based on code from the vframe project:
#   https://github.com/vframeio/vframe/blob/master/src/vframe/utils/im_utils.py#L37-L48
# which is based on code from the imagehash library:
#   https://github.com/JohannesBuchner/imagehash/blob/master/imagehash.py#L197

def phash(im, hash_size=8, highfreq_factor=4):
  wh = hash_size * highfreq_factor
  im = cv2.resize(im, (wh, wh), interpolation=cv2.INTER_NEAREST)
  if len(im.shape) > 2 and im.shape[2] > 1:
    im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
  mdct = fftpack.dct(fftpack.dct(im, axis=0), axis=1)
  dctlowfreq = mdct[:hash_size, :hash_size]
  med = np.median(dctlowfreq)
  diff = dctlowfreq > med
  return ImageHash(diff)

Ok, back to Week 02

## Setup

Let's import some helper functions and libraries

In [None]:
import random

## Ranges

<img src="./imgs/range.jpg" width="500px" />

The `range()` function creates sequences of integers for us, starting at its first argument and going up to, but not including, its second argument.

The range of integers between 0 and 10 (including $0$, but not $10$) would be: `range(0, 10)`.

The `range()` function is _lazy_. It only does work when we have to do something with its individual elements.

So trying to print our sequence using `print(range(0,10))` won't show its numbers; it just prints `range(0, 10)`.

Instead, we have to use a `for` loop to ask for each element of the range and print it, or _eagerly_ turn the range into a `list` by _casting_ it: `list(range(0,10))`.
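A minimal sketch of the behaviors described above: printing the range object only shows its description, while a `for` loop or a `list()` cast produces the actual numbers. The variable names `r` and `numbers` are just for illustration.

```python
# A range object describes the sequence; it doesn't store every number
r = range(0, 10)
print(r)  # prints the description: range(0, 10)

# A for loop asks the range for each element, one at a time
for n in r:
  print(n)

# Casting with list() produces all of the elements at once
numbers = list(r)
print(numbers)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```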

In [None]:
range(0, 10)


# TODO: take a look at the result of the range() expression with a loop

# TODO: change it to a list by casting it with the list() function

In [None]:
range(0, 10)

# TODO: take a look at the result of the range() expression with a loop
print(range(0, 10))  # only shows the description: range(0, 10)
for n in range(0, 10):
  print(n)

# TODO: change it to a list by casting it with the list() function
print(list(range(0, 10)))

Range of integers between 0 and 100 skipping by 10s:

In [None]:
range(0, 100, 10)

# TODO: take a look at the range values using casting

In [None]:
range(0, 100, 10)

# TODO: take a look at the range values using casting
print(list(range(0, 100, 10)))

## Lists
### Creating lists from sequences of numbers
#### Create a list with all the numbers between 0 and 1000 that end in 91

In [None]:
list_x91 = []

# TODO: using a for loop

# TODO: using casting

# Print the results
print(list_x91)

In [None]:
list_x91 = []

# TODO: using a for loop
for x in range(91, 1000, 100):
  list_x91.append(x)

# TODO: using casting
list_x91 = list(range(91, 1000, 100))

# Print the results
print(list_x91)

In [None]:
list_x91.append(1091)
print(list_x91)

### List indexing

Indexing from the front is normal:

In [None]:
print(list_x91)
print(list_x91[0])
print(list_x91[2])
print(list_x91[8])

But, Python also lets us index from the back with negative numbers:

In [None]:
print(list_x91[-1])
print(list_x91[-2])
print(list_x91[-8])
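To make the rule behind negative indexes concrete, here is a small sketch with a hand-made list (values chosen for illustration): `values[-k]` refers to the same item as `values[len(values) - k]`.

```python
values = [91, 191, 291, 391, 491]

# A negative index counts from the back: -1 is the last item, -2 the one before it
last = values[-1]
second_to_last = values[-2]

# values[-k] is the same item as values[len(values) - k]
same_as_last = values[len(values) - 1]

print(last, second_to_last, same_as_last)  # 491 391 491
```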

### Create a list with ten 0s and three 4s

In Python we can repeat a list using the multiplication operator.

So, `4 * [10]` creates a list with $4$ number $10$s: `[10, 10, 10, 10]`.

And we can add lists with `+` when we want to concatenate them (add the elements of one list to the end of another).
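A quick sketch of list repetition and concatenation, using different numbers than the exercise below so it doesn't give the answer away:

```python
# Multiplying a list by an integer repeats its elements
four_tens = 4 * [10]
print(four_tens)  # [10, 10, 10, 10]

# Adding two lists concatenates them into a new list
combined = [1, 2] + [3, 4, 5]
print(combined)  # [1, 2, 3, 4, 5]

# Both operations can be used in one expression
mixed = 2 * [7] + 3 * [8]
print(mixed)  # [7, 7, 8, 8, 8]
```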

In [None]:
# TODO: math with lists
list_10_0s_3_4s = []

print(list_10_0s_3_4s)

In [None]:
# TODO: math with lists
list_10_0s_3_4s = 10 * [0] + 3 * [4]

print(list_10_0s_3_4s)

## List Comprehension

### Create a list from another list

This is one of Python's most distinctive features.

We have special syntax that lets us include a `for` loop inside brackets to create a list using a process that _generates_ its members one item at a time.

In [None]:
array = [ 1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862 ]
squared = [ i * i for i in array ]

for i in squared:
  print(i)
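To show what the comprehension is doing, this sketch builds the same list twice: once with the comprehension from the cell above, and once with an ordinary `for` loop and `append()`. The name `squared_loop` is just for illustration.

```python
array = [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]

# List comprehension: build the new list in one expression
squared = [i * i for i in array]

# The same list built step by step with a for loop and append()
squared_loop = []
for i in array:
  squared_loop.append(i * i)

print(squared == squared_loop)  # True
```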

### List Comprehensions

<img src="./imgs/list-comp-00.jpg" height="150px"/>

## List Comprehension for Filtering lists

### Create a list from PARTS of another list

Get all odd numbers from a sequence:

In [None]:
array = [ 1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862 ]
odds = [ i for i in array if i % 2 == 1 ]

for i in odds:
  print(i)

### List Comprehensions + predicate = Filtering

<img src="./imgs/list-comp-01.jpg" height="150px"/>
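The predicate in a comprehension plays the same role as an `if` inside a loop. A small sketch comparing the two, using the same numbers as the cell above:

```python
array = [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]

# Comprehension with a predicate: only items where the condition is True are kept
odds = [i for i in array if i % 2 == 1]

# Equivalent loop: the `if` inside the loop acts as the predicate
odds_loop = []
for i in array:
  if i % 2 == 1:
    odds_loop.append(i)

print(odds)  # [1, 1, 5, 429]
```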

Use a list comprehension to create a list of values between 100 and 500 that are divisible by both 3 and 7.

Remember that in Python the keywords for the Boolean operators are: `and`, `or`, `not`.

In [None]:
# TODO: Create a list of values between 100 and 500 that are divisible by 3 and 7
div_3_7_list = []

# then print them
print(div_3_7_list)

In [None]:
# TODO: Create a list of values between 100 and 500 that are divisible by 3 and 7
div_3_7_list = [x for x in range(100, 500) if (x % 3 == 0) and (x % 7 == 0)]

# then print them
print(div_3_7_list)

### Create list of numbers between 0 and 100 that are divisible by 7:

In [None]:
# TODO: probably easier using comprehension
list_100_7 = []

print(list_100_7)

In [None]:
# TODO: probably easier using comprehension
list_100_7 = []

for n in range(0, 100):
  if n % 7 == 0:
    list_100_7.append(n)

print(list_100_7)
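For comparison, here is a sketch of three equivalent ways to get the same multiples of 7: the loop from the solution above, a comprehension with a predicate, and a `range()` with a step of 7, which skips the filtering entirely.

```python
# 1. loop + if (same as the cell above)
loop_version = []
for n in range(0, 100):
  if n % 7 == 0:
    loop_version.append(n)

# 2. comprehension with a predicate
comp_version = [n for n in range(0, 100) if n % 7 == 0]

# 3. range() with a step of 7: every value is already a multiple of 7
step_version = list(range(0, 100, 7))

print(step_version)
```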

### List functions

Methods available on every `list` object:

<img src="./imgs/lists00.jpg" width="500px" />

### Create a list of 1000 random numbers between 0 and 1000

[Documentation](https://docs.python.org/3/library/random.html) for `random`
- [`randint()`](https://docs.python.org/3/library/random.html#random.randint)
- [`randrange()`](https://docs.python.org/3/library/random.html#random.randrange)
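One detail worth checking in the documentation linked above: `randint(a, b)` includes both ends, while `randrange(start, stop)` works like `range()` and never returns `stop`. A small sketch (the seed value `5020` is arbitrary, only there to make the run repeatable):

```python
import random

random.seed(5020)  # arbitrary seed so the example is repeatable

# randint(a, b) includes BOTH ends: values from 0 to 1000
with_randint = [random.randint(0, 1000) for _ in range(1000)]

# randrange(start, stop) excludes stop, like range(): values from 0 to 999
with_randrange = [random.randrange(0, 1000) for _ in range(1000)]

print(min(with_randint), max(with_randint))
print(min(with_randrange), max(with_randrange))
```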

In [None]:
import random

# TODO: with for loop
list_of_randoms = []

print(len(list_of_randoms))

# TODO: with comprehension
list_of_randoms = []

print(len(list_of_randoms))

In [None]:
import random

# TODO: with for loop
list_of_randoms = []

for i in range(0, 1000):
  mri = random.randint(0, 1000)
  list_of_randoms.append(mri)

print(len(list_of_randoms))

# TODO: with comprehension
list_of_randoms = [random.randint(0, 1000) for i in range(0, 1000)]

print(len(list_of_randoms))

### Print the numbers and their index

In [None]:
# TODO: with len

In [None]:
# TODO: with len
for idx in range(0, len(list_of_randoms)):
  print(idx, list_of_randoms[idx])

## `enumerate()`

We can get the list item and its index in one command using `enumerate()`.

For every item in our list we get a pair of values, the index and the value of that item:

```py
for idx,val in enumerate(mylist):
  print("item: ", val, " in index: ", idx)
```

This is the same as doing:

```py
for idx in range(0, len(mylist)):
  print("item: ", mylist[idx], " in index: ", idx)
```

but less typing and more streamlined because we get both things (value and index) at the same time.
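A short sketch of what `enumerate()` actually produces, using a hand-made list. Casting it to a list shows the `(index, value)` pairs, and the optional `start` argument changes where the counting begins (it defaults to `0`).

```python
mylist = [42, 7, 19]

# enumerate() yields (index, value) pairs
pairs = list(enumerate(mylist))
print(pairs)  # [(0, 42), (1, 7), (2, 19)]

# Each pair can be unpacked directly in the for statement
for idx, val in enumerate(mylist):
  print("item:", val, "in index:", idx)

# The optional start argument changes the first index
print(list(enumerate(mylist, start=1)))  # [(1, 42), (2, 7), (3, 19)]
```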

In [None]:
# TODO: print random numbers in list with their index using enumerate

In [None]:
# TODO: print random numbers in list with their index using enumerate
for idx,val in enumerate(list_of_randoms):
  print(idx, val)

### Find the largest element on a list

Go through all of the elements and compare each element to the largest number seen so far.

Update the `largest` variable if we encounter a larger number.

In [None]:
# TODO: find max
largest = list_of_randoms[0]

print(largest)

In [None]:
# TODO: find max
largest = list_of_randoms[0]

for x in list_of_randoms:
  if x > largest:
    largest = x

print(largest)

### Find the sum of all elements on a list

Go through all of the elements and add their values to an accumulator variable.

In [None]:
# TODO: find sum
my_sum = 0

print(my_sum)

In [None]:
# TODO: find sum
my_sum = 0

for x in list_of_randoms:
  my_sum += x

print(my_sum)

### Python has built in functions for doing these things

In [None]:
max(list_of_randoms), sum(list_of_randoms)

And more:

In [None]:
min(list_of_randoms)

### Find the 5 largest and 5 smallest numbers on a list

# 🤔

### Python has a function for sorting a list that could help

In [None]:
my_sorted_list = sorted(list_of_randoms)

print(list_of_randoms)
print(my_sorted_list)

In [None]:
# TODO: 5 largest in descending order

In [None]:
# TODO: 5 largest in descending order

sorted_desc = list(reversed(my_sorted_list))
for idx in range(0, 5):
  print(sorted_desc[idx])

### Functions on lists

These are functions that Python gives us to work on lists.

There are functions for sorting, reversing and getting the length of a `list`:

<img src="./imgs/lists01.jpg" width="600px" />
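A small sketch of the built-ins mentioned above (`sorted()`, `reversed()`, `len()`), plus the list method `.sort()` for comparison. The key difference: `sorted()` returns a new list and leaves the original alone, while `.sort()` changes the list itself and returns `None`.

```python
numbers = [5, 2, 9, 1]

# sorted() returns a NEW sorted list; the original is unchanged
in_order = sorted(numbers)
print(numbers, in_order)  # [5, 2, 9, 1] [1, 2, 5, 9]

# reversed() returns an iterator, so we cast it to a list to see the values
backwards = list(reversed(in_order))
print(backwards)  # [9, 5, 2, 1]

# The list method .sort() sorts the list in place and returns None
result = numbers.sort()
print(numbers, result)  # [1, 2, 5, 9] None

print(len(numbers))  # 4
```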

#### Order from largest to smallest

Sort in reverse:

In [None]:
my_reversed_sorted_list = sorted(list_of_randoms, reverse=True)

print(list_of_randoms)
print(my_sorted_list)
print(my_reversed_sorted_list)

More on Python positional and keyword arguments:
[https://realpython.com/python-asterisk-and-slash-special-parameters/](https://realpython.com/python-asterisk-and-slash-special-parameters/)

### With a sorted list we can more easily print the 5 smallest and 5 largest elements


In [None]:
my_sorted_list[ :5], my_sorted_list[-5: ]

### :W:T:F:?:

### Slicing

Python has a built-in mechanism for getting sub-sections of a list called *slicing*.

Instead of a single index, we specify two values in the square bracket, separated by a `:`, to specify where our slice starts and ends:

<img src="./imgs/slicing.jpg" width="700px" />

One **VERY** important thing to remember is that the second index in the bracket is **NOT** included in the slice.
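A minimal sketch of that rule with a fixed, hand-made list (so the output doesn't change between runs): the slice starts at the first index and stops just before the second, so it has `end - start` elements.

```python
letters = ["a", "b", "c", "d", "e", "f", "g", "h"]

# letters[start : end] includes the element at start but stops BEFORE end
part = letters[2 : 5]
print(part)  # ['c', 'd', 'e']

# So the slice has (end - start) elements, as long as both indexes are inside the list
print(len(part) == 5 - 2)  # True
```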

In [None]:
my_list = [random.randint(0, 12) for i in range(0, 20)]
my_list, my_list[0 : 5]

As another example:  
`my_list[4 : 10]` accesses $6$ elements starting at position $4$, so ...
<br>elements $4$ through $9$ of the list. The second index in the slice, $10$, is not included.

In [None]:
my_list[4 : 10]

And, Python being Python, it tries to be smart and keep us from unnecessary typing:
- if the first index is blank, the slice will start at the first element 
- if the second index is blank, the slice will go until the end of the list

In [None]:
my_list, my_list[0 : 5], my_list[ :5]

In [None]:
my_list[15 : 20], my_list[15: ]

We can use negative indexes to slice from the back:

`a_list[-5 : len(a_list)]` would grab the last 5 elements from the list `a_list`,
<br>but this can be simplified with `a_list[-5: ]`.

In [None]:
my_list[-5 : len(my_list)], my_list[-5: ]

### How would we get the 5 items in the center?

In [None]:
center_index = len(my_list) // 2
center_5 = my_list[center_index - 2 : center_index + 3]

print(my_list)
print(center_5)

### This should make more sense now:

In [None]:
my_sorted_list[ :5], my_sorted_list[-5: ]

## Objects

### Creating objects

In [None]:
my_info = {
  "name": "thiago",
  "id": "hersant",
  "nnumber": 15374981,
  "zip": 11001,
  "grades": [90, 80, 86, 82, 94],
  "attendance": [True, True, False, True, True],
  "final grade": "A"
}
my_info

### Accessing values at specific keys

In [None]:
print(my_info["name"])
print(my_info["grades"])

### Modifying and Appending new key/values

In [None]:
my_info["zip"] = 11202
my_info["course"] = 5020
my_info["section"] = "A"
my_info

### Iterating over keys, values and items

[Documentation](https://docs.python.org/3/tutorial/datastructures.html#looping-techniques)

<img src="./imgs/objects.jpg" width="500px" />
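A sketch of the three ways to loop over a dictionary, using a small made-up object rather than `my_info` so the exercise below is still left to you. Since Python 3.7, dictionaries keep their keys in insertion order.

```python
info = {"name": "thiago", "zip": 11202, "grades": [90, 80, 86]}

# .keys() gives the keys
for k in info.keys():
  print(k)

# .values() gives just the values
for v in info.values():
  print(v)

# .items() gives (key, value) pairs that can be unpacked in the loop
for k, v in info.items():
  print(k, "->", v)

print(list(info.keys()))  # ['name', 'zip', 'grades']
```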

In [None]:
# TODO use my_info.keys(), .values() and .items() to print contents of object

# TODO: using my_info.items(), print the type of each value associated with each key

In [None]:
# TODO use my_info.keys(), .values() and .items() to print contents of object

print(my_info.keys())
print(my_info.values())
print(my_info.items())

print()

# TODO: using my_info.items(), print the type of each value associated with each key
for k,v in my_info.items():
    print(k, type(v))

## List of objects

### Create a list of 10 objects with random heights, Brooklyn zip codes and a random id between 100 and 999.

```python
my_data = [
  {"height": [60, 70], "zip": [11200, 11250], "id": [100, 999]},
  {"height": [60, 70], "zip": [11200, 11250], "id": [100, 999]},
  {"height": [60, 70], "zip": [11200, 11250], "id": [100, 999]},
  ...
]
```

To do this, we can use a call to `range()` to create a counter, and then for each of the $10$ iterations we'll `append()` an object with three items, a `height` value between $60$ and $70$, a `zip` between $11200$ and $11250$ and an `id` between $100$ and $999$.

In [None]:
# TODO: create list of random objects
my_data = []

my_data

In [None]:
# TODO: create list of random objects
my_data = []

for idx in range(10):
  my_obj = {
    "height": random.randrange(60, 71),
    "zip": random.randrange(11200, 11251),
    "id": random.randrange(100, 1000)
  }
  my_data.append(my_obj)

my_data

### Let's create a list of 3 random grades for each member of the list and another item with their computed average

In [None]:
# TODO: first, append grade list to objects

In [None]:
# TODO: first, append grade list to objects
for person in my_data:
  person["grades"] = []
  for idx in range(3):
    person["grades"].append(random.randrange(80, 101))

my_data

### Average

<img src="./imgs/average00.jpg" width="500px" />
<br>
<img src="./imgs/average01.jpg" width="500px" />

In [None]:
# TODO: compute and store average of grades

In [None]:
# TODO: compute and store average of grades
for person in my_data:
  pgrades = person["grades"]
  person["average"] = sum(pgrades) / len(pgrades)

my_data

### Get highest and lowest average grades

First, get all average grades, then use `min()`/`max()`

In [None]:
grades = []
for obj in my_data:
  grades.append(obj["average"])

min(grades), max(grades)

### Sort objects by average grades

We could first get all the average grades and then sort the new list:

In [None]:
grades = []
for obj in my_data:
  grades.append(obj["average"])

by_grade = sorted(grades)

print("original:\n", grades)
print("sorted:\n", by_grade)

### But now we've lost the other information associated with each grade.

We want to sort the list while keeping the objects together.

It would be nice to be able to do something like this, just like with a `list` of numbers:

In [None]:
by_grade = sorted(my_data)
print(by_grade)

but we can't: Python raises a `TypeError`, because it doesn't know how to compare two dictionaries with `<`.

### Sorting Objects

For lists of objects we have to tell Python which values to compare to determine their order.

We do this by defining a key function.

A key function receives one argument (an object, a list, a class instance, anything...) and returns a single value that Python knows how to compare, usually a number.

<img src="./imgs/list-of-objects.jpg" width="620px" />
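A small sketch of key functions on a hypothetical list of records (names and ages made up for illustration). `ageKey` and `nameKey` follow the naming style of `averageKey` below; the second one shows that a key can also return a string, which `sorted()` compares alphabetically.

```python
# Hypothetical records, for illustration only
people = [
  {"name": "ana", "age": 31},
  {"name": "bo", "age": 24},
  {"name": "cy", "age": 28},
]

# A key function receives ONE item from the list and returns the value to compare
def ageKey(person):
  return person["age"]

# Key values don't have to be numbers: strings compare alphabetically
def nameKey(person):
  return person["name"]

youngest_first = sorted(people, key=ageKey)
oldest_first = sorted(people, key=ageKey, reverse=True)
by_name = sorted(people, key=nameKey)

print([p["name"] for p in youngest_first])  # ['bo', 'cy', 'ana']
print([p["name"] for p in oldest_first])    # ['ana', 'cy', 'bo']
```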

In [None]:
# this key function receives a student-info object with {height, grade, zip, etc}
# and should return just the average grade value
def averageKey(person):
  return person["average"]

# then we can just use it when we call sorted()
by_average = sorted(my_data, key=averageKey)

by_average

In [None]:
# TODO: sort by first assignment grade

In [None]:
# TODO: sort by first assignment grade

def grade0Key(A):
  return A["grades"][0]

by_grade0 = sorted(my_data, key=grade0Key)
by_grade0

### `min()`/`max()` functions also work with a `key` argument:

In [None]:
# student with highest average grade
max_by_grade = max(my_data, key=averageKey)

# student with lowest score on first assignment
min_by_hw01 = min(my_data, key=grade0Key)

print(max_by_grade)
print(min_by_hw01)

## Bigger Lists

## Setup

Include some helper functions and libraries

In [None]:
!wget -q https://github.com/PSAM-5020-2026S-A/5020-utils/raw/main/src/data_utils.py

In [None]:
import matplotlib.pyplot as plt

from data_utils import object_from_json_url

### Load ANSUR 2 Dataset

The `JSON` file has a subset of the measurements found [here](https://www.openlab.psu.edu/ansur2/).

In [None]:
ANSUR_JSON_URL = "https://raw.githubusercontent.com/PSAM-5020-2026S-A/5020-utils/main/datasets/json/ansur.json"
ansur = object_from_json_url(ANSUR_JSON_URL)

# TODO: look at the data

# Answer:
#   - how many rows/records/items ?
#   - tallest height ?
#   - longest ear ?
#   - average ear length ?

In [None]:
ANSUR_JSON_URL = "https://raw.githubusercontent.com/PSAM-5020-2026S-A/5020-utils/main/datasets/json/ansur.json"
ansur = object_from_json_url(ANSUR_JSON_URL)

# TODO: look at the data

# Answer:
#   - how many rows/records/items ?
print(len(ansur))

#   - tallest height ?
all_hs = [p["height"] for p in ansur]
print(max(all_hs))

#   - longest ear ?
all_els = [p["ear"]["length"] for p in ansur]
print(max(all_els))

#   - average ear length ?
print(sum(all_els) / len(all_els))

### Let's look at a simpler version:

In [None]:
AHW_LIST_URL = "https://raw.githubusercontent.com/PSAM-5020-2026S-A/5020-utils/main/datasets/json/ansur_age_height_weight.json"
ahws = object_from_json_url(AHW_LIST_URL)

# TODO: look at data
# How is it organized ?

# Answer the following:
#   - how many items ?
#   - how do we access the height of a person ?

In [None]:
AHW_LIST_URL = "https://raw.githubusercontent.com/PSAM-5020-2026S-A/5020-utils/main/datasets/json/ansur_age_height_weight.json"
ahws = object_from_json_url(AHW_LIST_URL)

# TODO: look at data
print(ahws[:5])

# How is it organized ?
# List of lists

# Answer the following:
#   - how many items ?
print(len(ahws))

#   - how do we access the height of a person ?
#     double index. first is person index, then index into person info list and grab second item
print(ahws[0][1])

In [None]:
# age of person at index 10
print("age of person at index 10:", ahws[10][0])

# height of person at index 1234
print("height of person at index 1234:", ahws[1234][1])

# weight of person at index 567
print("weight of person at index 567:", ahws[567][2])

## List of Lists

Just like we can put lists inside objects, and objects inside lists, we can also put lists inside lists.

If we want to get to a particular value we have to use $2$ indices instead of using just one:
`list[i][j]`

The first index tells Python which of the sub-lists we want, and the second specifies the item on that list.

<img src="./imgs/list-of-lists00.jpg" width="700px" />

<img src="./imgs/list-of-lists01.jpg" width="700px" />

Sometimes we'll refer to the first index as the row index and the second index as the column index.

That's because if we imagine our list of lists as a 2-dimensional matrix of numbers, the first index tells Python which row we want to access and the second tells which column:

<img src="./imgs/list-of-lists02.jpg" width="700px" />

<img src="./imgs/list-of-lists03.jpg" width="700px" />
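A sketch of row/column indexing on a tiny list of lists. The numbers are made up, and the column order `[age, height, weight]` is assumed to match the `ahws` dataset above.

```python
# Made-up data: 3 rows (people), 3 columns (age, height, weight)
rows = [
  [25, 68, 160],
  [31, 64, 130],
  [42, 71, 185],
]

# rows[i] is a whole row (one inner list)
print(rows[1])  # [31, 64, 130]

# rows[i][j] is the item in row i, column j
print(rows[1][2])  # 130

# Visit every (row, column) position with two nested loops
for i in range(len(rows)):
  for j in range(len(rows[i])):
    print(i, j, rows[i][j])

# A "column" is the same position taken from every row
heights_column = [r[1] for r in rows]
print(heights_column)  # [68, 64, 71]
```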

### Datasets

We'll see this kind of structure a lot.

It's very common for datasets to be organized by rows/columns, where each column specifies a different *property* (or *feature*) and each row is a different *measurement* (or *record*) of those features.

In our example above, our dataset had $3$ *features* (age, height, weight), and one *record* per person.

<img src="./imgs/datasets00.jpg" width="700px" />

### JSON

It's also common to find datasets specified in the JSON format.

Instead of just being a list of lists with values, each *record* is an object that specifies the names and values of its *features*:

<img src="./imgs/datasets01.jpg" width="700px" />

There are advantages and disadvantages to each. We'll soon look at another way to organize datasets that will make it easier to go from one type to the other if we have to.
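Until then, here is a by-hand sketch of going between the two layouts, assuming rows in the `[age, height, weight]` order used by `ahws` (the values are made up). `zip()` pairs each feature name with the value in the same position, and `dict()` turns those pairs into a JSON-style record.

```python
# Made-up rows in [age, height, weight] order
feature_names = ["age", "height", "weight"]
rows = [
  [25, 68, 160],
  [31, 64, 130],
]

# list of lists -> list of objects (one dict per record)
records = [dict(zip(feature_names, row)) for row in rows]
print(records[0])  # {'age': 25, 'height': 68, 'weight': 160}

# list of objects -> list of lists, reading values in a fixed feature order
back_to_rows = [[rec[name] for name in feature_names] for rec in records]
print(back_to_rows == rows)  # True
```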

## Plots

We can use the [matplotlib](https://matplotlib.org/stable/api/pyplot_summary.html) library to visualize our data.

In [None]:
# TODO: get heights
heights = []

plt.plot(heights, "bo", markersize=2)
plt.show()

In [None]:
# TODO: get heights
heights = []

for p in ahws:
  heights.append(p[1])

plt.plot(heights, "bo", markersize=2)
plt.show()

In [None]:
# TODO: get weights
weights = []

plt.plot(weights, "ro", markersize=2)
plt.show()

In [None]:
# TODO: get weights
weights = []

for p in ahws:
  weights.append(p[2])

plt.plot(weights, "ro", markersize=2)
plt.show()

In [None]:
# TODO: plot ages in green
ages = []
plt.plot(ages, "go", markersize=2)
plt.show()

In [None]:
# TODO: plot ages in green
ages = []

for p in ahws:
  ages.append(p[0])

plt.plot(ages, "go", markersize=2)
plt.show()

### Matplotlib has memory/state

We can plot multiple lists at once by calling `plot()` repeatedly before displaying the accumulated graph with `show()`.


In [None]:
plt.plot(heights, "bo", markersize=2)
plt.plot(weights, "ro", markersize=2)
plt.plot(ages, "go", markersize=2)

plt.show()

### Sorting data can give a different perspective

In [None]:
sorted_heights = sorted(heights)
plt.plot(sorted_heights, "bo", markersize=2)
plt.show()

In [None]:
# TODO: repeat for weight and age

In [None]:
# TODO: repeat for weight and age

sorted_weights = sorted(weights)
plt.plot(sorted_weights, "ro", markersize=2)
plt.show()

sorted_ages = sorted(ages)
plt.plot(sorted_ages, "go", markersize=2)
plt.show()

### Histograms

In [None]:
min_height = min(heights)
max_height = max(heights)
plt.hist(heights, bins=range(min_height, max_height + 1))
plt.grid()
plt.show()

## Correlation

A measure of how $2$ variables (features) are related to each other.

<img src="./imgs/correlation.jpg" width="800px" />

They can have *positive* or *direct* correlation, if an increase in one of the variables comes with an increase in the other.

They can have *negative* or *inverse* correlation if an increase in one of the variables is accompanied by a decrease in the other.

Or, there can be *weak* or *NO* correlation, if a change in one variable doesn't seem to be accompanied by a change in the other.
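The scatter plots below let us judge correlation by eye. One common way to put a number on it is the Pearson correlation coefficient, which goes from $-1$ (perfect inverse) through $0$ (none) to $1$ (perfect direct). A minimal sketch using NumPy's `np.corrcoef()` on made-up data, for illustration only:

```python
import numpy as np

# Made-up measurements, for illustration only
x = [1, 2, 3, 4, 5, 6]
y_up = [2, 4, 5, 4, 5, 7]    # tends to rise as x rises
y_down = [9, 7, 6, 6, 4, 2]  # tends to fall as x rises

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry [0, 1]
# is the Pearson correlation between the two inputs
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

print(round(r_up, 2), round(r_down, 2))
```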

In [None]:
# use "column" lists from above to plot scatter plot
plt.scatter(ages, heights, marker="o", alpha=0.2)
plt.xlabel("age")
plt.ylabel("height")
plt.show()

In [None]:
# TODO plot other combinations of variables
# TODO: any correlation ?

In [None]:
# TODO plot other combinations of variables
# TODO: any correlation ?

plt.scatter(heights, weights, marker="o", alpha=0.2)

plt.xlabel("height")
plt.ylabel("weight")
plt.show()

In [None]:
# TODO plot other combinations of variables

heights = []
foot_lengths = []

for p in ansur:
  heights.append(p["height"])
  foot_lengths.append(p["foot"]["length"])

plt.scatter(heights, foot_lengths, marker="o", alpha=0.2)

plt.xlabel("height")
plt.ylabel("foot length")
plt.show()

## Extra Practice

### Traversing a list of objects/dictionaries

The cells below create a list of $1000$ objects with the following keys/parameters:

```py
{
  "id": "abc1234",
  "zip": 10001,
  "grades": [70, 81, 92, 84, 89],
  "attendance": [True, False, True, ....]
}
```

The `id` field is a string made up of $3$ letters and $4$ digits; `zip` is a NYC area zip code; `grades` is a list of $5$ grades between $0$ and $100$; and `attendance` is a list of $15$ boolean values.

In [None]:
import random
import string

from matplotlib import pyplot as plt

In [None]:
data = []

for cnt in range(1000):
  id_let = random.choices(string.ascii_lowercase, k=3)
  id_num = random.choices(string.digits, k=4)
  id_str = "".join(id_let + id_num)
  att = random.choices([True, True, True, True, True, False], k=15)
  grades = [min(100, random.gauss(93-3*(len(att)-sum(att)), 5)) for g in range(5)]
  data.append({
    "id": id_str,
    "zip": random.randint(10001, 11250),
    "grades": grades,
    "attendance": att
  })

We can check the length of the list, the contents of its first item, and its keys, with:

In [None]:
display(len(data))
display(data[0])
display(data[0].keys())

### Plot: Grade vs Attendance

Let's see if someone's attendance has an effect on their grade.

We'll want to eventually plot average grade vs average attendance, but let's start simple.

Let's plot all of the students' first assignment grades versus their first attendance values.

We have to go through the list of objects/dicts and extract the first assignment grade and first attendance into separate lists of grades and attendances.

In [None]:
grades_0 = []
attendances_0 = []

for d in data:
  # TODO: append first grade to list
  # TODO: append first attendance to list
  pass  # placeholder so the cell runs; replace with your code

plt.plot(attendances_0, grades_0, "o")
plt.show()

In [None]:
grades_0 = []
attendances_0 = []

for d in data:
  # TODO: append first grade to list
  grades_0.append(d["grades"][0])
  # TODO: append first attendance to list
  attendances_0.append(d["attendance"][0])

plt.plot(attendances_0, grades_0, "o")
plt.show()

### Plot: Average Grade vs Total Attendance

Now let's plot average grade versus total attendance.

The logic is the same, but instead of appending just the first grade/attendance, we'll append the average grade and the sum of the attendance values.

In [None]:
grade_avgs = []
attendance_sums = []

for d in data:
  # TODO: append average grade to list
  # TODO: append total attendance to list
  pass  # placeholder so the cell runs; replace with your code

plt.plot(attendance_sums, grade_avgs, "o")
plt.show()

plt.hist(attendance_sums, bins=8)
plt.show()

In [None]:
grade_avgs = []
attendance_sums = []

for d in data:
  # TODO: append average grade to list
  grade_avgs.append(sum(d["grades"]) / len(d["grades"]))
  # TODO: append total attendance to list
  attendance_sums.append(sum(d["attendance"]))

plt.plot(attendance_sums, grade_avgs, "o")
plt.xlabel("total attendance")
plt.ylabel("average grade")
plt.show()

plt.hist(attendance_sums, bins=8)
plt.xlabel("total attendance")
plt.show()