In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab03.ipynb")

<img src="data6.png" style="width: 15%; float: right; padding: 1%; margin-right: 2%;"/>

# Lab 3 – Print, Arrays, and Tables

## Data 6

In this lab, we will be talking all about *Tables*. We use tables to store all sorts of data from sports statistics to population information. If there's data you have ever been curious about, it is very likely that the Internet has a table somewhere with that data!

Tables are integral to the foundation of Data Science, and we will go over how to **query** a table. **Querying** a table is, simply put, requesting information about the table. Some examples of common queries (in English, not code):

- How many data points are there?
- Which data points have a specific characteristic?
- What is the attribute of a specific data point?
- And many more!

There are so many ways we can use tables to get information we need, and there are several existing libraries in Python that we can use to do this! In this course, we will be using the `datascience` library. This is the standard library used both in Data 6 and Data 8 at UC Berkeley. If you take Data Science classes beyond those two, you'll learn more!


The goals of this lab section:
* Practice using `print` to debug code
* Practice array operations using the `datascience` module and the NumPy library
* Practice table operations with time series data

In [None]:
# Run this cell to load all required Python libraries 
import numpy as np
from datascience import *

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")

import warnings
warnings.simplefilter('ignore')

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

# Part 1: Print

Try running this cell:

In [None]:
print("Hello, World!")

And this one:

In [None]:
print("\N{WAVING HAND SIGN}, \N{EARTH GLOBE ASIA-AUSTRALIA}!")

Every `print` expression prints a line. Run the next cell and notice the order of the output.

In [None]:
print("First this line is printed,")
print("and then this one.")

<br/>

---

## Question 1.1: Print Practice


Given an array `arr`, write code that will display the first and last value of `arr` without [hardcoding](https://en.wikipedia.org/wiki/Hard_coding) the values of what is being printed out. Use negative indexing to get the last element.

For example, given an `arr` with the values [1, 3, 5, 7], we should see the following:

```
1
7
```

In [None]:
# Do not edit the following line of code
arr = make_array(1, 3, 5, 7)

# YOUR CODE HERE
...
...

---

## [Tutorial] The backslash character `\`


By now, you know that in Python strings can be surrounded by single quotes (`'`) or double quotes (`"`). But what if a string had both single quotes and double quotes? This happens a lot in text and dialogue:
                                                                            

In [None]:
# Uncomment the following line to see the error.
# print("Nike's slogan is, "Just Do It."")

We can use a **backslash character** `\` to **escape** quote characters inside a string so that they stay part of that string. Below, `\"` is treated as a single character (the double quote) instead of the end of the string.

In [None]:
print("Nike's slogan is, \"Just Do It.\"")

In [None]:
# and again, using single quotes
print('Nike\'s slogan is, "Just Do It."')

The backslash can also be used for special characters and typesetting. In the below cell, the `\n` is treated as a single special character: a **newline**.

In [None]:
print("Hello\nworld!")

The backslash character has many uses! Besides strings, it can be used as a **line continuation character** to write a single line of Python on multiple lines, without changing execution:

In [None]:
print("Hello" \
      "world")

---

## Question 1.2: Multiple arguments to print

In the cell below, notice how we **typecast** the numbers `s` and `s ** 2` to strings with `str` so we can concatenate them with the other strings!

In [None]:
polygon = "square"
s = 4
print("The area of a " + polygon + \
      " with side length " + str(s) + \
      " is " + str(s ** 2) + ".")

You can also print the values of variables by passing them as additional arguments. For example, after `name = "John"`, the call `print("My name is", name)` displays "My name is John". By default, `print` separates its arguments with a single space. The `print` function is useful both for debugging and for providing output to the user in interactive programs. It is a fundamental tool in Python for displaying information and verifying program behavior.
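
To compare the two styles side by side, here is a minimal sketch that prints the same sentence with string concatenation and then with multiple arguments. The variables `polygon` and `s` mirror the cell above. The `sep` keyword argument is not covered elsewhere in this lab and is shown only for illustration.

```python
polygon = "square"
s = 4

# Concatenation: every piece must already be a string, so we call str(s).
concatenated = "A " + polygon + " has " + str(s) + " sides."
print(concatenated)

# Multiple arguments: print converts each argument to a string
# and separates the arguments with a single space by default.
print("A", polygon, "has", s, "sides.")

# The optional sep argument changes the separator between arguments.
print("A", polygon, "has", s, "sides.", sep="_")
```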

**Question**: Try the following exercise, where the goal is to print the sum of two numbers. Your output should read:

> "The sum of 5 and 3 is 8"

In [None]:
# Exercise: Print the sum of two numbers
num1 = 5
num2 = 3
sum_result = num1 + num2
...

---

## Question 1.3: Print Debugging

Let's do some print debugging! Below is some code used to calculate the price of some goods, but lucky us! We have a 10% discount. No need for a calculator: we can whip out Jupyter notebooks to do this calculation for us! However, something seems off. Use print statements to identify and fix the error.

In [None]:
item_prices = make_array(15.99, 23.49, 9.99, 5.49)
tax_rate = 8.25
discount = 10

total_price = sum(item_prices)
total_with_tax = total_price * tax_rate
total_with_discount = total_with_tax - (total_with_tax * (discount / 100))

print("Total price of items:", total_price)
print("Total after tax:", total_with_tax)
print("Total after discount:", total_with_discount)

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

# Part 2: `None`

By now, you have seen that in many cases, evaluating Python expressions (function calls, etc.) returns values. But what if Python doesn't return _anything_? Python has a special value and type to represent this case: `None`.

In particular, the `print` function returns `None`. How can that be, when we clearly see output displayed in the notebook?
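
One way to check this for yourself is to store the value that `print` returns and inspect it. This is a minimal sketch; the variable name `returned` is made up for illustration.

```python
# print displays its argument as a side effect...
returned = print("This text is displayed, not returned.")

# ...but the value it hands back is None.
print(returned is None)
print(type(returned))
```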

Consider the following cell:

In [None]:
# just run this cell
roller = "\N{ROLLER COASTER}"
print(roller)
roller + "\N{ROLLER SKATE}"

Above, notice that `print` was not the last line, yet its output still appeared on the screen. The last line was then evaluated and displayed as the output of the cell. You can verify this by the square brackets to the left of the cell output; there are no square brackets next to the `print` output.

Now consider the following trickier code cell. Uncomment the lines to run it:

In [None]:
# proclamation = print("We love " + roller)
# print("Your course staff says:", proclamation)
# proclamation + " and \N{ROLLER SKATE}"

What happened? **Discuss** your answers to the following questions (no text entry):
1. Which expression displays `We love 🎢` to the notebook?
1. What value is assigned to `proclamation`?
1. What happens in the second line?
1. Which line raises the error? _Hint_: string concatenation does not work when one of the values is `None` (which is not a string).

## Question 2

Let's fix the error you saw above. Fill in the below code so that when run, the cell prints:

```
We love 🎢
Your course staff says: We love 🎢
```

and has the below string as cell output:

```
'We love 🎢 and 🛼'
```

In [None]:
proclamation = ...
...
...

# do not edit the below line
proclamation + " and \N{ROLLER SKATE}"

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

# Part 3: Unemployment Rates Introduction

The United States Bureau of Labor Statistics (BLS) publishes data about jobs in the US: [employment](https://www.bls.gov/charts/employment-situation/civilian-unemployment-rate.htm), [job openings, turnovers](https://www.bls.gov/jlt/jltover.htm), and so on.

Generally, the Bureau of Labor Statistics considers someone **unemployed** if that person does not have a job, has actively looked for work in the prior 4 weeks, and is currently available for work. The **unemployment rate** is the number of unemployed people as a percentage of the labor force.
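
As a worked example of this definition, the sketch below computes an unemployment rate from two made-up numbers. The counts are hypothetical, chosen only to make the arithmetic easy to follow; they are not BLS figures.

```python
# Hypothetical counts (not real BLS data).
unemployed = 6_000_000
labor_force = 160_000_000

# Unemployed people as a percentage of the labor force.
unemployment_rate = 100 * unemployed / labor_force
print("Unemployment rate:", unemployment_rate, "percent")
```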
    
Run the next cell to load the `unemployment_rates` table, taken from the [BLS](https://www.bls.gov/charts/employment-situation/civilian-unemployment-rate.htm). No output is expected, because the cell contains only an assignment statement, and an assignment statement does not produce any output (it does not yield a value).

In [None]:
# just run this cell
unemployment_rates = Table.read_table("unemployment.csv")

Let's examine our table to see what data it contains.

---

## Question 3.1: `show`

Use the method `show` to display the first 5 rows of `unemployment_rates`.

**Note**: The terms "method" and "function" are technically not the same thing, but for the purposes of this course, we will use them interchangeably.

**Hint**: `tbl.show(3)` will show the first 3 rows of the table named `tbl`. Additionally, make sure not to call `.show()` without an argument. This may crash your kernel!

In [None]:
...

---

## Table attributes: `num_rows` and `num_columns`

We can ask for all sorts of information about the table itself:

In [None]:
unemployment_rates.num_rows

In [None]:
unemployment_rates.num_columns

---

## Question 3.2: `select`

Most of the columns are demographic-specific. If we're not interested in that information, it just makes the table difficult to read. This comes up more than you might think, because people who collect and publish data may not know ahead of time what people will want to do with it.

In such situations, we can use the table method `select` to choose only the columns that we want in a particular table. It takes any number of arguments. Each should be the name of a column in the table. It returns a new table with only those columns in it. The columns are in the order in which they were listed as arguments.

For example, the value of `unemployment_rates.select("Men, 20 years and over", "Women, 20 years and over")` is a table with only the unemployment rates of men and women 20 years and over. (However, this "drops" the critical time-related Month and Year columns).

**Question**: Use `select` to create a table with only the month, year, and total nationwide unemployment rate for that month. Call that new table `unemployment_totals`.

In [None]:
unemployment_totals = ...
unemployment_totals

In [None]:
grader.check("q3_2")

---

## Question 3.3: `drop`

`drop` serves the same purpose as `select`, but it takes away the columns that you provide rather than the ones that you don't provide. Like `select`, `drop` returns a new table.

**Question**: Suppose you didn't want the total unemployment rate, the men- and women-specific rates, or the age-specific rates, just the rates by race and ethnicity. Use `drop` to create a table that doesn't include these columns. Call that table `unemployment_races_ethnicities`.

In [None]:
unemployment_races_ethnicities = ...

In [None]:
grader.check("q3_3")

---

## Question 3.4: `sort`

Some details about `sort`:

* The first argument to `sort` is the name of a column to sort by.
* If the column has text in it, `sort` will sort alphabetically; if the column has numbers, it will sort numerically. Both are in ascending order by default.
* The value of `unemployment_totals.sort(label)` is a copy of `unemployment_totals`; the `unemployment_totals` table doesn't get modified.
* Rows always "stick together" when a table is sorted. It wouldn't make sense to sort just one column and leave the other columns alone. For example, if we sorted just the "Total" array, the total unemployment rates would all end up with the wrong months and years.

**Question**: Create a version of `unemployment_rates` that's sorted by total unemployment rate ("Total"), with the largest unemployment rate **first**. Call this new table `unemployment_rate_highest`.

_Hint_: To sort in descending order, you can provide the optional argument `descending=True`. See the [Python reference](https://data6.org/notes/reference).

In [None]:
unemployment_rate_highest = ...
unemployment_rate_highest

In [None]:
grader.check("q3_4")

**Discussion (no text entry)**: What month and year had the highest total unemployment rate? What happened globally in that timeframe?

---

## Question 3.5: `where`

We can also filter our table and look at specific rows. `where` takes 2 arguments:

1. The name of a column. `where` finds rows where that column's values meet some criterion.
1. A **predicate** that describes the criterion that the column needs to meet.

For now, we consider an "exact match" predicate. So `where` will return a copy of the original table, but with only rows where the specified column (first argument) is exactly the specified value (second argument).

**Question**: Use `unemployment_rates` to create a table called `unemployment_2025` containing unemployment rates for just `2025`. Note there are fewer than 12 rows because 2025 is not over yet.

In [None]:
unemployment_2025 = ...
unemployment_2025

In [None]:
grader.check("q3_5")

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

# Part 4: Unemployment over Time

This question is more exploratory. A **time series** of data is sequential measurements of the same variable*, taken at regular intervals over time. (*We'll discuss what "variable" means in a social science context very soon.)

The data in `unemployment_rates` is a time series. We would like to visualize a two-dimensional line plot of the total national unemployment rate over time. However, if we use `unemployment_rates` to try to plot "Total" (on the y-axis) by "Month" (on the x-axis), we get a bizarre plot:

In [None]:
# just run this cell
# we will learn much more about visualization syntax
# in the next unit.
unemployment_rates.plot("Month", "Total")

What's happening here? Simply put, the above plot did exactly what we asked: it took the "Month" and "Total" values of each record in the table, plotted them, and connected the dots. But two August unemployment rates can correspond to entirely different years (and therefore different times)! Even so, they are plotted at the same horizontal value (here, August is 8).

This issue happens because months are cyclic, meaning that once we hit December (12) of one year, the next month is January (1) of the next year.

In this question, we seek to correct this visualization by using arrays to create a new strictly _increasing_ measure of time.

## Question 4.1: `column`

Before, we used `select` to create tables with a specific set of columns. Sometimes we may want to get a specific column as an array, so that we can perform additional processing.

The `column` method takes a column label or index and returns a copy of the values of that column as a NumPy array. For example, the below cell copies the "Total" column in `unemployment_rates` and stores it in the array `totals`:

In [None]:
# just run this cell
totals = unemployment_rates.column("Total")
type(totals)

**Question**: Create two arrays `months` and `years` that are the columns of "Month" and "Year", respectively, in `unemployment_rates`.

In [None]:
months = ...
years = ...
Table().with_columns("Month", months, "Year", years)

(Above, the last line ["pretty prints"](https://en.wikipedia.org/wiki/Pretty-printing) your results in table format, making it easier to visually check if you've done things correctly. See if you can understand what it's doing!)

In [None]:
grader.check("q4_1")

---

## Question 4.2

Let's use the `months` and `years` arrays to construct a new array that accurately captures the time that the record was taken.

We'll do this by constructing a measure of time as fractions that combines months and years. We'll say that the first month (January) of 2006 is `2006.0`, the next month (February) is 2006 and 1/12 (or `2006.083333...`), the next month (March) is 2006 and 1/6 (or `2006.166666...`), and so on. This will give us 12 equally-spaced floating point values for each year, one corresponding to each month, chronologically sorted.

**Question**: Use `months` and `years` to create an array of times that follows this pattern and assign it to `times`. Your answer should involve array arithmetic.

_Hints_: This question is challenging! To help you debug:

1. The first month of every year, January, maps to a whole number (e.g., `2006.0`).
1. Like before, the last line "pretty prints" so you can check your work.
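
If array arithmetic is unfamiliar, here is a minimal sketch of the general idea on an unrelated problem: converting clock times to fractional hours. The arrays `hours` and `minutes` are made-up data, and this is an analogy rather than the solution to the question.

```python
import numpy as np

# Hypothetical clock times: 9:00, 9:30, 10:15, 10:45.
hours = np.array([9, 9, 10, 10])
minutes = np.array([0, 30, 15, 45])

# Arithmetic on arrays is applied element by element, so one
# expression converts every (hour, minute) pair at once.
fractional_hours = hours + minutes / 60
print(fractional_hours)
```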

In [None]:
times = ...
Table().with_columns("Month", months, "Year", years, "Time", times)

In [None]:
grader.check("q4_2")

---

## Question 4.3: `with_columns`

We can create new tables by first making an empty `Table`, then calling `with_columns` on that table.

The `with_columns` method takes its arguments in pairs. Each pair consists of:

1. The name of the column as a string
1. An array of values to put in the column

For example, `my_tbl.with_columns("My New Column", my_array)` would add to the table `my_tbl` a new column labeled "My New Column" with the values in `my_array`.

As a second example, `Table().with_columns("My New Column", my_array)` creates a one-column table.

**Question**: Create a new table with two columns labeled "Time" (using `times`, the fractional time you computed in the previous question), and "Total Unemployment" (using `totals`, the total unemployment rate per month). Call this new table `unemployment_over_time`.

In [None]:
unemployment_over_time = ...
unemployment_over_time

In [None]:
grader.check("q4_3")

---

## Question 4.4

Finally, let's use this new table to visualize our time series:

In [None]:
# just run this cell
# we will learn much more about visualization syntax
# in the next unit.
unemployment_over_time.plot("Time", "Total Unemployment")

That looks much better! Notice that the visual spike in unemployment rate matches your conclusion from the tabular sorting you did earlier in lab.

<hr>

## Done! 😇

## Pets of Data 6
Enjoy the grass and shade during this September heat! Congrats on finishing Lab 3.

<img src="paulina.JPG" width="50%" alt="Fluffy dog on grass"/>

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)