In [1]:
# Initialize Otter
import otter
grader = otter.Notebook("hw02.ipynb")

<img style="display: block; margin-left: auto; margin-right: auto" src="./ccsf-logo.png" width="250rem;" alt="The CCSF black and white logo">

# Homework 02: Data Types

**Recommended Reading**: 
* [Data Types](https://inferentialthinking.com/chapters/04/Data_Types.html) 
* [Sequences](https://inferentialthinking.com/chapters/05/Sequences.html)

## Assignment Reminders

- Make sure to run the code cell at the top of this notebook that starts with `# Initialize Otter` to load the auto-grader.
- For all tasks marked with a 🔎, which require a written explanation, provide your answer in the designated space.
- Throughout this assignment and all future ones, please be sure to not re-assign variables throughout the notebook! _For example, if you use `max_temperature` in your answer to one question, do not reassign it later on. Otherwise, you will fail tests that you thought you were passing previously!_
- We encourage you to discuss this assignment with others, but make sure to write and submit your own code. Refer to the syllabus to learn more about how to learn cooperatively.

*View the related <a href="https://ccsf.instructure.com" target="_blank">Canvas</a> Assignment page for additional details.*

Run the following code cell to import the tools for this assignment.

In [2]:
import numpy as np
from datascience import *

Please complete this notebook by filling in the cells provided.

## Creating Arrays


### Note About Arrays for Those with Python Experience

Python lists behave differently from NumPy arrays. In this course, we use NumPy arrays, so please create your arrays with tools in the `datascience` module, such as `make_array`, rather than using a Python list.

### Task 01 📍

Make an array called `weird_numbers` containing the following numbers (in the given order):

1. 3,000
2. the square root of 5
3. -2
4. $\pi$ to the power of the square root of 2

Remember that the `math` module contains the square root function and the constant value $\pi$.

_Points:_ 2

In [3]:
# You will probably want to import something to start!
from math import sqrt, pi
weird_numbers = make_array(3000, sqrt(5), -2, pi ** sqrt(2))
weird_numbers

array([  3.00000000e+03,   2.23606798e+00,  -2.00000000e+00,
         5.04749727e+00])

In [4]:
grader.check("task_01")

### Task 02 📍

Make an array called `book_title_words` containing the following three strings: "Eats", "Shoots", and "and Leaves".


_Points:_ 2

In [5]:
book_title_words = make_array("Eats", "Shoots", "and Leaves")
book_title_words

array(['Eats', 'Shoots', 'and Leaves'],
      dtype='<U10')

In [6]:
grader.check("task_02")

Strings have a method called `join`.  `join` takes one argument, an array of strings.  It returns a single string.  Specifically, the value of `a_string.join(an_array)` is a single string that's the [concatenation](https://en.wikipedia.org/wiki/Concatenation) ("putting together") of all the strings in `an_array`, **except** `a_string` is inserted in between each string.
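As a quick illustration, the sketch below calls `join` on a small made-up array of letters. It uses NumPy's `np.array` directly so it runs without the `datascience` module; `make_array` also produces a NumPy array, so `join` should behave the same way on arrays made with it.

```python
import numpy as np

# A small, made-up array of strings
letters = np.array(["a", "b", "c"])

# The separator goes before .join; the array of strings is the argument
dashed = "-".join(letters)
print(dashed)  # a-b-c

# An empty separator simply glues the strings together
glued = "".join(letters)
print(glued)  # abc
```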

### Task 03 📍

Use the array `book_title_words` and the method `join` to make two strings:

1. "Eats, Shoots, and Leaves" (call this one `with_commas`)
2. "Eats Shoots and Leaves" (call this one `without_commas`)

*Hint:* If you're not sure what `join` does, first try calling, for example, `"foo".join(book_title_words)`.


_Points:_ 3

In [10]:
with_commas = ", ".join(book_title_words)
without_commas = " ".join(book_title_words)

# These lines are provided just to print out your answers.
print('with_commas:', with_commas)
print('without_commas:', without_commas)

with_commas: Eats, Shoots, and Leaves
without_commas: Eats Shoots and Leaves


In [8]:
grader.check("task_03")

### Task 04 📍🔎

<!-- BEGIN QUESTION -->

Running the following code makes an array and does not show an error:

``` python
make_array('a string', 1234)
```

Does this mean that it is possible to make an array where one item is a string and the other item is an integer? Respond with Yes or No and provide an explanation to support your response.

_Points:_ 2

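No. Although the code runs without an error, a NumPy array (which `make_array` creates) holds values of a single type. Given a string and an integer, `make_array` converts the integer 1234 into the string `'1234'`, so the result is an array of two strings, not an array containing one string and one integer.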

<!-- END QUESTION -->
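You can check what happens to mixed values by inspecting the array's `dtype` and the type of an element. This sketch uses `np.array` directly so it runs without `datascience`; since `make_array` also builds a NumPy array, the behavior should match.

```python
import numpy as np

mixed = np.array(["a string", 1234])

print(mixed)             # ['a string' '1234']
print(mixed.dtype.kind)  # U, meaning every element is a Unicode string
print(repr(mixed[1]))    # the second element is now a string, not an int
```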

## Indexing Arrays


These exercises give you practice accessing individual elements of arrays.  In Python (and in many programming languages), elements are accessed by *index*, so the first element is the element at index 0, the second element is at index 1, etc.

### A Note About Using `an_array.item(...)` vs. `an_array[...]`

When you are working with an array (`an_array` for example) in this class:

**We recommend that you use `an_array.item(...)` to access an element of the array, to be consistent with the course materials.**

Bracket notation, such as `an_array[0]`, is the more standard way of indexing, and we are working to accept it in all situations. For now, however, it can return NumPy data types (for example, `np.int64` rather than a plain Python `int`), which might lead the autograder to mark your work as incorrect.
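The sketch below shows the type difference behind this recommendation: `.item(...)` returns a plain Python number, while bracket indexing returns a NumPy scalar holding the same value. The array is built with `np.array` so the example runs without `datascience`.

```python
import numpy as np

some_values = np.array([-1, -3, -6, -10, -15])

by_item = some_values.item(2)   # plain Python int
by_bracket = some_values[2]     # NumPy integer scalar (for example, np.int64)

print(type(by_item))            # <class 'int'>
print(type(by_bracket))         # a NumPy integer type
print(by_item == by_bracket)    # True: same value, different types
```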

### Task 05 📍

The cell below creates an array of some numbers.  Set `third_element` to the third element of `some_numbers`.


_Points:_ 2

In [11]:
some_numbers = make_array(-1, -3, -6, -10, -15)

third_element = some_numbers.item(2)
third_element

-6

In [12]:
grader.check("task_05")

The test above checks that your answer is in the correct format. **This test does not check that you answered correctly**, only that you assigned a number successfully. The same is true for most of the tests in this homework, and every other homework for this class.

### Task 06 📍

You'll sometimes want to find the *last* element of an array.  Suppose an array has 145 elements.  What is the index of its last element?


_Points:_ 2

In [13]:
index_of_last_element = 144

In [14]:
grader.check("task_06")

More often, you don't know the number of elements in an array, its *length*.  (For example, it might be a large dataset you found on the Internet.)  The function `len` takes a single argument, an array, and returns the `len`gth of that array (an integer).
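Because indexing starts at 0, the last element of an array sits at index `len(an_array) - 1`. NumPy also accepts `-1` as a shorthand for the last element. The array below is made up for illustration.

```python
import numpy as np

values = np.array([5, 8, 13, 21, 34, 55, 89])  # made-up example with 7 elements

last_index = len(values) - 1     # indices run from 0 up to len - 1
print(last_index)                # 6
print(values.item(last_index))   # 89
print(values.item(-1))           # 89, since -1 counts back from the end
```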

### Task 07 📍

The cell below loads an array called `president_birth_years`.  Calling `.column(...)` on a table returns an array of the column specified, in this case the `Birth Year` column of the `president_births` table. The last element in that array is the most recent birth year of any deceased president. Assign that year to `most_recent_birth_year`.


_Points:_ 2

In [21]:
president_birth_years = Table.read_table("president_births.csv").column('Birth Year')

most_recent_birth_year = president_birth_years.item(-1)
most_recent_birth_year

1917

In [22]:
grader.check("task_07")

### Task 08 📍

Finally, assign `sum_of_birth_years` to the sum of the third, tenth, and last birth years in `president_birth_years`. You might consider breaking up your solution into parts.

_Points:_ 2

In [31]:
# Third (index 2), tenth (index 9), and last (index -1) birth years
sum_of_birth_years = (president_birth_years.item(2)
                      + president_birth_years.item(9)
                      + president_birth_years.item(-1))
sum_of_birth_years

5444

In [32]:
grader.check("task_08")

## Basic Array Arithmetic


### Task 09 📍

Multiply the numbers 42, 4224, 42422424, and -250 by 157. Assign each variable below such that `first_product` is assigned to the result of $42 \times 157$, `second_product` is assigned to the result of $4224 \times 157$, and so on.

For this task, **don't** use arrays. You'll use them later.


_Points:_ 2

In [35]:
first_product = 42 * 157
second_product = 4224 * 157
third_product = 42422424 * 157
fourth_product = -250 * 157
print(first_product, second_product, third_product, fourth_product)

6594 663168 6660320568 -39250


In [36]:
grader.check("task_09")

### Task 10 📍

Now, do the same calculation, but using an array called `numbers` and only a single multiplication (`*`) operator.  Store the 4 results in an array named `products`.


_Points:_ 2
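Arithmetic between an array and a single number is applied to every element, so one operator does the work of several separate calculations. The values below are made up so the example does not duplicate this task's answer.

```python
import numpy as np

prices = np.array([2, 5, 10, -3])   # made-up values

doubled = prices * 2     # multiplies every element by 2
shifted = prices + 100   # adds 100 to every element

print(doubled.tolist())  # [4, 10, 20, -6]
print(shifted.tolist())  # [102, 105, 110, 97]
```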

In [39]:
numbers = make_array(42, 4224, 42422424, -250)
products = numbers * 157
products

array([      6594,     663168, 6660320568,     -39250])

In [40]:
grader.check("task_10")

### Task 11 📍

Oops, we made a typo!  Instead of 157, we wanted to multiply each number by 1577.  Compute the correct products in the cell below using array arithmetic.  Notice that your job is really easy if you previously defined an array containing the 4 numbers.


_Points:_ 2

In [41]:
correct_products = numbers * 1577
correct_products

array([      66234,     6661248, 66900162648,     -394250])

In [42]:
grader.check("task_11")

### Task 12 📍

We've loaded an array of temperatures in the next cell.  Each number is the highest temperature observed on a particular day at a climate observation station, mostly from the US.  Since they're from the US government agency [NOAA](https://noaa.gov), all the temperatures are in Fahrenheit.

Convert all temperature values from Fahrenheit to Celsius by first subtracting 32 from them, then multiplying the results by $\frac{5}{9}$. After converting to Celsius, **ROUND** the final result to the nearest integer using the `np.round` function. Remember to check the documentation of a function if you are not sure how it works.

_Points:_ 3
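Here is the same conversion applied to a few made-up Fahrenheit readings. Note that `np.round` returns floats, which is why the cell output shows values such as `31.`, and that it rounds exact halves to the nearest even number, so `np.round(2.5)` is `2.0`.

```python
import numpy as np

fahrenheit = np.array([32, 50, 100, 212])   # made-up readings in °F

celsius = (fahrenheit - 32) * (5 / 9)       # subtract 32, then scale by 5/9
rounded = np.round(celsius)                 # nearest whole number, still floats

print(rounded.tolist())   # [0.0, 10.0, 38.0, 100.0]

# Exact halves round to the nearest even number
print(np.round(np.array([0.5, 1.5, 2.5])).tolist())  # [0.0, 2.0, 2.0]
```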

In [44]:
max_temperatures = Table.read_table("temperatures.csv").column("Daily Max Temperature")

celsius_max_temperatures = np.round((max_temperatures - 32) * (5/9))
celsius_max_temperatures

array([ -4.,  31.,  32., ...,  17.,  23.,  16.])

In [45]:
grader.check("task_12")

### Task 13 📍

The cell below loads all the *lowest* temperatures from each day (in Fahrenheit). Compute the size of the daily temperature range for each day. That is, compute the difference between each daily maximum temperature and the corresponding daily minimum temperature. **Pay attention to the units: give your answer in Celsius!** Make sure **NOT** to round your answer for this question!


_Points:_ 3
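Because $C = (F - 32) \cdot \frac{5}{9}$, the offset of 32 cancels when you subtract two converted temperatures:

$$C_{\max} - C_{\min} = (F_{\max} - 32) \cdot \tfrac{5}{9} - (F_{\min} - 32) \cdot \tfrac{5}{9} = (F_{\max} - F_{\min}) \cdot \tfrac{5}{9}$$

So a temperature *difference* only needs the $\frac{5}{9}$ scale; subtracting 32 from the Fahrenheit difference would give the wrong answer. The sketch below checks this identity on made-up readings.

```python
import numpy as np

max_f = np.array([70, 85, 60])   # made-up daily highs (°F)
min_f = np.array([58, 67, 40])   # made-up daily lows (°F)

# Convert each reading to Celsius first, then subtract
range_c = (max_f - 32) * (5 / 9) - (min_f - 32) * (5 / 9)

# Shortcut: the -32 offsets cancel, leaving only the 5/9 scale
range_c_shortcut = (max_f - min_f) * (5 / 9)

# The two may differ in the last few digits because of floating-point rounding
print(np.allclose(range_c, range_c_shortcut))  # True
```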

In [46]:
min_temperatures = Table.read_table("temperatures.csv").column("Daily Min Temperature")

celsius_temperature_ranges = ((max_temperatures-32)*(5/9))-((min_temperatures-32)*(5/9))
celsius_temperature_ranges

array([  6.66666667,  10.        ,  12.22222222, ...,  17.22222222,
        11.66666667,  11.11111111])

In [47]:
grader.check("task_13")

## World Population


The cell below loads a table of estimates of the world population from 1950 through 2024. The estimates come from the [US Census Bureau website](https://www.census.gov/en.html).

In [48]:
world = Table.read_table("world_population.csv")
world

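| Year | Population |
|------|-----------:|
| 1950 | 2558023014 |
| 1951 | 2595838116 |
| 1952 | 2637936830 |
| 1953 | 2683288876 |
| 1954 | 2731379529 |
| 1955 | 2782973591 |
| 1956 | 2836130358 |
| 1957 | 2892165757 |
| 1958 | 2948892474 |
| 1959 | 3001437149 |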
1959,3001437149


The name `population` is assigned to an array of population estimates.

In [49]:
population = world.column('Population')
population

array([2558023014, 2595838116, 2637936830, 2683288876, 2731379529,
       2782973591, 2836130358, 2892165757, 2948892474, 3001437149,
       3043723411, 3084753192, 3140982059, 3210747452, 3282150912,
       3351447909, 3421950085, 3491918938, 3563999493, 3638882226,
       3714262611, 3791705363, 3867790515, 3943299941, 4018289723,
       4089897507, 4160016430, 4230925634, 4301796324, 4375462721,
       4446018801, 4527470698, 4610284677, 4694209205, 4775889225,
       4860691329, 4947831057, 5037623680, 5128370129, 5218863705,
       5311093985, 5398248391, 5484859952, 5568560590, 5650427498,
       5733476988, 5815625725, 5896157461, 5975537190, 6054374592,
       6133007860, 6211824800, 6290903765, 6369899427, 6449064951,
       6528030077, 6608491622, 6690682690, 6774892222, 6859055794,
       6942148774, 7024885273, 7108237070, 7192338928, 7276120452,
       7358968084, 7441688089, 7523957647, 7605030254, 7685586259,
       7764965463, 7837646014, 7906702795, 7982019198, 8057236

In the next two tasks, you will apply some built-in NumPy functions to this array. NumPy is a module that is often used in data science!

<img src="array_diff.png" style="width: 600px;"/>

The difference function `np.diff` subtracts each element in an array from the element after it within the array. As a result, the length of the array `np.diff` returns will always be one less than the length of the input array.
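A tiny example with made-up values shows both points: each output entry is "next element minus current element", and the output is one element shorter than the input.

```python
import numpy as np

values = np.array([1, 4, 9, 16])

steps = np.diff(values)          # [4 - 1, 9 - 4, 16 - 9]
print(steps.tolist())            # [3, 5, 7]
print(len(values), len(steps))   # 4 3
```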

<img src="array_cumsum.png" style="width: 700px;"/>

The cumulative sum function `np.cumsum` outputs an array of partial sums. For example, the third element in the output array is the sum of the first, second, and third elements of the input array.
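The sketch below applies `np.cumsum` to made-up values, and then to the output of `np.diff`. In the second case the intermediate values cancel (for example, $(13 - 10) + (11 - 13) = 11 - 10$), so each running total equals the change from the first element.

```python
import numpy as np

values = np.array([1, 2, 3, 4])
running_totals = np.cumsum(values)           # [1, 1+2, 1+2+3, 1+2+3+4]
print(running_totals.tolist())               # [1, 3, 6, 10]

heights = np.array([10, 13, 11, 18])         # made-up values
changes = np.diff(heights)                   # [3, -2, 7]
total_change = np.cumsum(changes)            # running total of the changes
print(total_change.tolist())                 # [3, 1, 8]
print((heights[1:] - heights[0]).tolist())   # [3, 1, 8], the change from the first value
```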

### Task 14 📍

Very often in data science, we are interested in understanding how values change over time. Use `np.diff` and `np.max` (or just `max`) to calculate the largest annual change in population between any two consecutive years.


_Points:_ 2

In [50]:
largest_population_change = np.max(np.diff(world.column('Population')))
largest_population_change

92230280

In [51]:
grader.check("task_14")

### Task 15 📍

Run the following code cell. What do the values in the resulting array represent (choose one)?

In [52]:
np.cumsum(np.diff(population))

array([  37815102,   79913816,  125265862,  173356515,  224950577,
        278107344,  334142743,  390869460,  443414135,  485700397,
        526730178,  582959045,  652724438,  724127898,  793424895,
        863927071,  933895924, 1005976479, 1080859212, 1156239597,
       1233682349, 1309767501, 1385276927, 1460266709, 1531874493,
       1601993416, 1672902620, 1743773310, 1817439707, 1887995787,
       1969447684, 2052261663, 2136186191, 2217866211, 2302668315,
       2389808043, 2479600666, 2570347115, 2660840691, 2753070971,
       2840225377, 2926836938, 3010537576, 3092404484, 3175453974,
       3257602711, 3338134447, 3417514176, 3496351578, 3574984846,
       3653801786, 3732880751, 3811876413, 3891041937, 3970007063,
       4050468608, 4132659676, 4216869208, 4301032780, 4384125760,
       4466862259, 4550214056, 4634315914, 4718097438, 4800945070,
       4883665075, 4965934633, 5047007240, 5127563245, 5206942449,
       5279623000, 5348679781, 5423996184, 5499213229])

1) The total population change between consecutive years, starting at 1951.

2) The total population change between 1950 and each later year, starting at 1951.

3) The total population change between 1950 and each later year, starting inclusively at 1950.

Assign `cumulative_sum_answer` to 1, 2, or 3


_Points:_ 2

In [53]:
cumulative_sum_answer = 2

In [54]:
grader.check("task_15")

## Old Faithful


Old Faithful is a geyser in Yellowstone that erupts every 44 to 125 minutes (according to [Wikipedia](https://en.wikipedia.org/wiki/Old_Faithful)). People are [often told that the geyser erupts every hour](http://yellowstone.net/geysers/old-faithful/), but in fact the waiting time between eruptions is more variable. Let's take a look.

### Task 16 📍

The first line below assigns `waiting_times` to an array of 272 consecutive waiting times between eruptions, taken from a classic 1938 dataset. Assign the names `shortest`, `longest`, and `average` so that the `print` statement is correct.


_Points:_ 2

In [56]:
waiting_times = Table.read_table('old_faithful.csv').column('waiting')

shortest = min(waiting_times)
longest = max(waiting_times)
average = np.mean(waiting_times)

print("Old Faithful erupts every", shortest, "to", longest, "minutes and every", average, "minutes on average.")

Old Faithful erupts every 43 to 96 minutes and every 70.8970588235 minutes on average.


In [57]:
grader.check("task_16")

### Task 17 📍

Assign `biggest_decrease` to the biggest decrease in waiting time between two consecutive eruptions. For example, the third eruption occurred after 74 minutes and the fourth after 62 minutes, so the decrease in waiting time was 74 - 62 = 12 minutes. 

*Hint 1*: You'll need an array arithmetic function [mentioned in the textbook](https://inferentialthinking.com/chapters/05/1/Arrays.html#functions-on-arrays). You have also seen this function earlier in the homework!

*Hint 2*: We want to return the absolute value of the biggest decrease.


_Points:_ 2
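A decrease shows up in `np.diff` as a negative change, so the biggest decrease is the most negative change, reported as a positive number. The first four waiting times below follow the examples in this homework (79, 54, 74, 62); the fifth is made up. This approach assumes at least one decrease exists: if every change were positive, it would report the smallest increase instead.

```python
import numpy as np

waits = np.array([79, 54, 74, 62, 85])   # minutes; the last value is made up

changes = np.diff(waits)                 # [-25, 20, -12, 23]
biggest_drop = abs(np.min(changes))      # most negative change, made positive
print(biggest_drop)                      # 25
```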

In [76]:
biggest_decrease = abs(min(np.diff(waiting_times)))
biggest_decrease

45

In [77]:
grader.check("task_17")

### Task 18 📍

Let's imagine your guess for the next wait time was always just the length of the previous waiting time. If you always guessed the previous waiting time, how big would your error in guessing the waiting times be, on average?

For example, since the first three waiting times are 79, 54, and 74, the average difference between your guess and the actual time for just the second and third eruptions would be $\frac{|79-54|+ |54-74|}{2} = 22.5$.

*Tip: You can do this in one command, but we recommend that you break it up into a few smaller steps.*

_Points:_ 2
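The worked example above can be reproduced in a few small steps using its first three waiting times: each guessing error is the absolute difference between consecutive waiting times, and the answer is the average of those errors.

```python
import numpy as np

waits = np.array([79, 54, 74])    # the first three waiting times from the example

errors = np.abs(np.diff(waits))   # |54 - 79| and |74 - 54|
mean_error = np.mean(errors)
print(errors.tolist())            # [25, 20]
print(mean_error)                 # 22.5
```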

In [80]:
differences = np.diff(waiting_times)
average_error = np.average(abs(differences))
average_error

20.520295202952031

In [81]:
grader.check("task_18")

## Submit your Homework to Canvas

Once you have finished working on the homework tasks, prepare to submit your work in Canvas by completing the following steps.

1. In the related Canvas Assignment page, check the rubric to know how you will be scored for this assignment.
2. Double-check that you have run the code cell near the end of the notebook that contains the command `grader.check_all()`. This command will run all of the tests on your responses to the auto-graded tasks marked with 📍.
3. Double-check your responses to the manually graded tasks marked with 📍🔎.
4. Select the menu items "File" and "Save Notebook" in the notebook's toolbar to save your work and create a specific checkpoint in the notebook's work history.
5. Select the menu items "File" and "Download" in the notebook's toolbar to download the notebook (.ipynb) file.
6. In the related Canvas Assignment page, click Start Assignment or New Attempt to upload the downloaded .ipynb file.

**Keep in mind that the autograder does not always check for correctness. Sometimes it just checks for the format of your answer, so passing the autograder for a question does not mean you got the answer correct for that question.**

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [82]:
grader.check_all()

task_01 results: All test cases passed!
task_01 - 1 message: ✅ You provided an array of 4 values.

task_02 results: All test cases passed!
task_02 - 1 message: ✅ You provided an array of 3 values.
task_02 - 2 message: ✅ Great work not including the commas.
task_02 - 3 message: ✅ Great work including 'and' with the last string.

task_03 results: All test cases passed!
task_03 - 1 message: ✅ Nice! with_commas and without_commas are both strings.

task_05 results: All test cases passed!
task_05 - 1 message: ✅ Nice work not using the wrong index.

task_06 results: All test cases passed!
task_06 - 1 message: ✅ You assigned a possible int to index_of_last_element.

task_07 results: All test cases passed!
task_07 - 1 message: ✅ You successfully submitted an integer.

task_08 results: All test cases passed!
task_08 - 1 message: ✅ You successfully submitted an integer.

task_09 results: All test cases passed!
task_09 - 1 message: ✅ It seems like you've multiplied the 4 numbers by 157.

task_10 