In [1]:
# Initialize Otter
import otter
grader = otter.Notebook("lab.ipynb")

# Lab 1 – Introduction

## DSC 80, Fall 2023

### Due Date: Monday, October 9th at 11:59PM

## Instructions

Welcome to the first assignment in DSC 80 this quarter!

Much like in DSC 10, this Jupyter Notebook contains the statements of the problems and provides code and Markdown cells to display your answers to the problems. Unlike DSC 10, the notebook is *only* for displaying a readable version of your final answers. The coding will be done in an accompanying `lab.py` file that is imported into the current notebook, and **you will only submit that `lab.py` file**, not this notebook!

Some additional guidelines:
- **Unlike in DSC 10, labs will have both public tests and hidden tests.** The bulk of your grade will come from your scores on hidden tests, which you will only see on Gradescope after the assignment deadline.
- **Do not change the function names in the `lab.py` file!** The functions in the `lab.py` file are how your assignment is graded, and they are graded by their name. If you changed something you weren't supposed to, you can find the original code in the [course GitHub repository](https://github.com/dsc-courses/dsc80-2023-fa).
- Notebooks are nice for testing and experimenting with different implementations before designing your function in your `lab.py` file. You can write code here, but make sure that all of your real work is in the `lab.py` file, since that's all you're submitting.
- **To ensure that all of your work to be submitted is in `lab.py`, we've provided an additional uneditable notebook, called `lab-validation.ipynb`, that contains only the tests and their setup. Make sure you are able to run it top-to-bottom without error before submitting!**
- You are encouraged to write your own additional helper functions to solve the lab, as long as they also end up in `lab.py`.

**Importing code from `lab.py`**:

* Below, we import the `.py` file that's contained in the same directory as this notebook.
* We use the `autoreload` notebook extension to make changes to our `lab.py` file immediately available in our notebook. Without this extension, we would need to restart the notebook kernel to see any changes to `lab.py` in the notebook.
    - `autoreload` is necessary because Python caches each imported module in `sys.modules` for the lifetime of the kernel. Subsequent `import lab` statements reuse that cached module instead of re-reading `lab.py`, so edits to the file don't show up until the module is reloaded.

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
from lab import *

In [4]:
import os
import io
import pandas as pd
import numpy as np

### Infrastructure Summary

Run the following cell to see a [YouTube video 🎥](https://youtu.be/PPKXJqu2XmY) that summarizes the above information and walks you through how to
- set up your programming environment (see the instructions in [Tech Support](https://dsc80.com/tech_support) for more details),
- access assignments,
- work on and test assignments, and
- submit assignments.

The video is also linked on the [Resources tab of the course website](https://dsc80.com/resources).

<span style="color:red"> **Note:**</span> The instruction video clones the repository `dsc80-2023-wi`, but you need to clone `dsc80-2023-fa`.


In [5]:
from IPython.display import YouTubeVideo
YouTubeVideo('PPKXJqu2XmY')

Let's get started! 🎉

## Part 1: Python Basics 🐍

### Question 0 – Consecutive Integers

Complete the implementation of the function `consecutive_ints`, which takes in a possibly empty list of integers (`ints`) and returns `True` if there exist two adjacent list elements that are consecutive integers and `False` otherwise.

For example, `consecutive_ints([5, 3, 6, 4, 9, 8])` should evaluate to `True`, since `9` and `8` are adjacent and are consecutive integers. On the other hand, `consecutive_ints([1, 3, 5, 7, 9])` should evaluate to `False`.

***Note***: If you look at `lab.py`, you'll notice that the solution to this problem is already there. This question is done for you to show you what a completed homework problem looks like.

In [6]:
consecutive_ints([1, 3, 5, 7, 9])

False

In [7]:
consecutive_ints([1, 3, 5, 7, 8])

True

In [8]:
consecutive_ints([])

False

To run the public tests on your code for a given question, run the cell containing a call to `grader.check` that immediately follows it. 

Remember, your grade will primarily be determined by hidden tests, which are **not** run when you run `grader.check`, so it's important to test your functions extensively on your own by calling them on different inputs. Do they work for edge cases? Real-world data is **very messy**, and you should expect your data processing code to break without thorough testing!

You can write custom tests either by calling your functions on different inputs here in the notebook, or by writing doctests in `lab.py`, as you did in DSC 20.
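For example, a doctest-annotated function in `lab.py` might look like the following. The body is one possible implementation of `consecutive_ints`, not necessarily the one already provided in `lab.py`; the point is the docstring examples, which `doctest` runs and compares against the expected output.

```python
def consecutive_ints(ints):
    """
    Return True if two adjacent elements of ints are consecutive integers.

    >>> consecutive_ints([5, 3, 6, 4, 9, 8])
    True
    >>> consecutive_ints([1, 3, 5, 7, 9])
    False
    >>> consecutive_ints([])
    False
    """
    # zip pairs each element with its right-hand neighbor; an empty or
    # one-element list produces no pairs, so any(...) returns False.
    return any(abs(a - b) == 1 for a, b in zip(ints, ints[1:]))


if __name__ == '__main__':
    import doctest
    doctest.testmod()
```

Running `python -m doctest -v lab.py` from a terminal executes every docstring example in the file and reports any mismatches.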

In [9]:
grader.check("q0")

### Question 1 – Median vs. Mean

Complete the implementation of the function `median_vs_mean`, which takes in a non-empty list of numbers (`nums`) and returns `True` if the median of the list is less than or equal to the mean of the list and `False` otherwise.

Recall, if a list has even length, the median is the mean of the middle two elements.

***Note:*** In this question, you may only use built-in functions and methods in Python. You should not use `numpy` or `pandas` at all, nor should you import any additional packages.
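A minimal sketch that sticks to built-ins (`sorted`, `sum`, `len`), as the note requires. Sorting a copy leaves the caller's list unchanged, and the even-length case averages the two middle elements.

```python
def median_vs_mean(nums):
    # sorted returns a new list, so nums itself is not modified.
    s = sorted(nums)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        median = s[mid]
    else:
        # Even length: the median is the mean of the two middle elements.
        median = (s[mid - 1] + s[mid]) / 2
    return median <= sum(nums) / n
```

For `[100, 100, -100, 54]`, the sorted list is `[-100, 54, 100, 100]`, so the median is 77 while the mean is 38.5, and the function returns `False`.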

In [10]:
median_vs_mean([100, 100, -100, 54])

False

In [11]:
grader.check("q1")

### Question 2 – Same Difference

Complete the implementation of the function `same_diff_ints`, which takes in a list of integers (`ints`) and returns `True` if there exist two list elements that are $i$ positions apart and whose absolute difference is also $i$. If no two elements satisfy this condition, `same_diff_ints` should return `False`.

For example, because `3` (position 1) and `5` (position 3) are 2 positions apart, and $|3-5| = 2$:
```py
>>> same_diff_ints([5, 3, 1, 5, 9, 8])
True
```
Whereas:
```py
>>> same_diff_ints([1, 3, 5, 7, 9])
False
```

**Important:** While implementing `same_diff_ints`, we will assume that `ints` tends to satisfy the condition, and that the pair(s) satisfying the condition tend to be close together. As such, you must implement `same_diff_ints` so that it **runs faster in cases where the pairs are close together than in cases where the pairs are further apart**. While you will still likely need a nested `for`-loop, this will inform how you configure your loop variables. (Optimizing your code for an assumed distribution of incoming data is very common in data science.)

***Hint 1:*** This is similar to Question 0.

***Hint 2:*** Make sure to define some extreme test cases, like when `ints` is an empty list. Also, use the `%%time` magic command to time your function, to make sure it satisfies the optimization requirement above.
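One way to satisfy the optimization requirement is to make the *outer* loop run over the gap between positions, starting from 1, so that every nearby pair is checked before any distant pair. A minimal sketch:

```python
def same_diff_ints(ints):
    n = len(ints)
    # Outer loop over the gap size, smallest first, so close pairs are found early.
    for gap in range(1, n):
        # Inner loop over every pair that is exactly `gap` positions apart.
        for i in range(n - gap):
            if abs(ints[i] - ints[i + gap]) == gap:
                return True
    # Also covers the empty and one-element lists, where no pairs exist.
    return False
```

If the outer loop instead ran over the starting position, a matching pair near the end of the list would only be found after comparing the first element with every other element.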

In [12]:
same_diff_ints([1, 3, 5, 7, 9])

False

Make sure your function runs in under 5 seconds.

In [13]:
%%time
same_diff_ints([5, 3, 1, 5, 9, 8])

CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 5.72 µs


True

In [14]:
grader.check("q2")

## Part 2: Strings and Files 🧵

The following questions will familiarize you with the basics of working with strings and reading data from files. Remember that by default, data from files are stored as strings in Python.

### Question 3 – $n$ Prefixes

Complete the implementation of the function `n_prefixes`, which takes a string `s` and a positive integer `n`. It returns a string containing the first `n` consecutive prefixes of `s` in reverse order.

For example, let's suppose `s` is the string `'Billy!'` and `n` is `4`. The consecutive prefixes of `'Billy!'` are:
- `'B'`
- `'Bi'`
- `'Bil'`
- `'Bill'`
- `'Billy'`
- `'Billy!'`

The first 4 of these are `'B'`, `'Bi'`, `'Bil'`, and `'Bill'`. If we combine these 4 in reverse order, we get `'BillBilBiB'`, which is what `n_prefixes('Billy!', 4)` should return. As another example, `n_prefixes('Marina', 3)` should return `'MarMaM'`. **You may assume that `n` is no larger than the length of `s`.**

***Hint:*** Recall that [strings may be sliced](https://docs.python.org/3/tutorial/introduction.html#strings), like lists.
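Since `s[:k]` is the prefix of length `k`, a minimal sketch joins the prefixes for `k = n, n-1, ..., 1`:

```python
def n_prefixes(s, n):
    # range(n, 0, -1) yields n, n-1, ..., 1, i.e. the prefixes in reverse order.
    return ''.join(s[:k] for k in range(n, 0, -1))
```

For `n_prefixes('Billy!', 4)`, this concatenates `'Bill'`, `'Bil'`, `'Bi'`, and `'B'`.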

In [15]:
n_prefixes('Marina', 3)
# n_prefixes('Billy!', 4)

'MarMaM'

In [16]:
grader.check("q3")

### Question 4 – Exploded Numbers 💣

Complete the implementation of the function `exploded_numbers`, which takes in a list of integers (`ints`) and a non-negative integer (`n`) and **returns a list of strings** containing numbers from the list expanded by `n` numbers in both directions, separated by spaces. Each integer should be [zero padded](https://www.tutorialspoint.com/python/string_zfill.htm) so that all integers outputted have the same length.

For example, consider `exploded_numbers([3, 8, 15], 2)`.
- If we explode 3 by 2 numbers in both directions, we get 1, 2, 3, 4, 5.
- If we explode 8 by 2 numbers in both directions, we get 6, 7, 8, 9, 10.
- If we explode 15 by 2 numbers in both directions, we get 13, 14, 15, 16, 17.

The longest length of any of the exploded numbers above is 2, so all of the outputted integers should have length 2.

- The string corresponding to 3 in the input is `'01 02 03 04 05'`.
- The string corresponding to 8 in the input is `'06 07 08 09 10'`.
- The string corresponding to 15 in the input is `'13 14 15 16 17'`.

So, `exploded_numbers([3, 8, 15], 2)` should return `['01 02 03 04 05', '06 07 08 09 10', '13 14 15 16 17']`. 

As another example, `exploded_numbers([9, 99], 3)` should return `['006 007 008 009 010 011 012', '096 097 098 099 100 101 102']`.

***Note***: You can assume that negative numbers will never be encountered. That is, when testing your code, we will never explode a number so much that it becomes negative.
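A minimal sketch using `str.zfill`. Because every exploded value is non-negative, the largest value, `max(ints) + n`, has the most digits, so it determines the padding width for all strings.

```python
def exploded_numbers(ints, n):
    if not ints:
        return []
    # Relies on the no-negatives guarantee: the largest value has the most digits.
    width = len(str(max(ints) + n))
    return [
        # Explode x into x-n, ..., x+n, zero-pad each value, and join with spaces.
        ' '.join(str(v).zfill(width) for v in range(x - n, x + n + 1))
        for x in ints
    ]
```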

In [17]:
exploded_numbers([3, 8, 15], 2)

['01 02 03 04 05', '06 07 08 09 10', '13 14 15 16 17']

In [18]:
grader.check("q4")

### Question 5 – Reading Files

[Recall](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files) that the built-in function `open` takes in a file path and returns *a file object* (sometimes called a *file handle*). Below are a few properties of file objects:

* `open(path)` opens the file at location `path` for reading.
* `open(path)` is an *iterable*, which contains successive lines of the file.
* A file object should be closed once you're done with it, to avoid leaking system resources such as open file descriptors. To ensure a file is closed once done, you should use a *context manager* as follows:
```py
with open(path) as fh:
    for line in fh:
        process_line(line)
```
* To read the entire file into a string, use the `read` method:
```py
with open(path) as fh:
    s = fh.read()
```

However, when reading an entire file into memory, be careful that the file isn't too big! *You should avoid this whenever possible!*

Complete the implementation of the function `last_chars`, which takes in a file object (`fh`) and returns a string consisting of the last character of each line. Note that you don't have to use `open` at all; the argument given to you is a file object, not a file path.

***Note:*** A newline (`'\n'`) is the "delimiter" of the lines of a file and doesn't count as part of the line (as the tests imply). Every other character is part of the line. For more on this, see [the interpretation](https://en.wikipedia.org/wiki/Newline#Interpretation) of a file as a sequence of newline-delimited lines.
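A minimal sketch that iterates over the file object lazily, one line at a time, instead of reading the whole file. `io.StringIO` behaves like a file object opened for reading, which makes it handy for testing; the file contents below are made up. Skipping empty lines is an assumption, since an empty line has no last character.

```python
import io

def last_chars(fh):
    chars = []
    for line in fh:
        # The newline is a delimiter, not part of the line, so drop it first.
        line = line.rstrip('\n')
        # Assumption: an empty line has no last character and contributes nothing.
        if line:
            chars.append(line[-1])
    return ''.join(chars)

# A made-up in-memory "file" with three lines.
fake_file = io.StringIO('fish\nfour\ndog\n')
last_chars(fake_file)  # 'hrg'
```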

If your implementation is correct, you should see `'hrg'` when running the cell below:

In [19]:
fp = os.path.join('data', 'chars.txt')
with open(fp) as fh:
    result = last_chars(fh)
result

'hrg'

In [20]:
grader.check("q5")

## Part 3: `numpy` exercises 🥧

For a refresher on `numpy` and arrays, refer to the relevant section of the [DSC 10 course notes](https://notes.dsc10.com/02-data_sets/arrays.html).

### Question 6 – Array Methods

Complete the implementations of the functions `add_root` and `where_square`. Specifications are given below. Your solutions should **not** contain any loops or list comprehensions.

#### `add_root`

`add_root` should take in a `numpy` array, `A`, and return a new `numpy` array that contains the element-wise sum of the elements in `A` with the _square roots of the positions of the elements in `A`_. 

For instance, if `A` contains the values 5, 9, and 4, the output array should contain the values 5 (5 + $\sqrt{0}$), 10 (9 + $\sqrt{1}$), and 5.4142... (4 + $\sqrt{2}$).
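`np.arange(len(A))` produces the positions 0, 1, 2, ..., so a loop-free sketch of `add_root` is a single vectorized expression:

```python
import numpy as np

def add_root(A):
    # Square roots of the positions 0..len(A)-1, added element-wise to A.
    return A + np.sqrt(np.arange(len(A)))

add_root(np.array([5, 9, 4]))
```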

<br>

#### `where_square`

`where_square` should take in a `numpy` array, `A`, and return a new `numpy` array of Booleans whose `i`th element is `True` if and only if the `i`th element of `A` is a perfect square. 

For instance, `where_square(np.array([2, 9, 16, 15]))` should return `array([False, True, True, False])`.
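A loop-free sketch of `where_square`: a value is a perfect square exactly when its square root is a whole number. This assumes `A` holds non-negative integers small enough that `np.sqrt` is exact in floating point; for a negative input, `np.sqrt` returns `nan` (with a warning), and `nan == nan` is `False`.

```python
import numpy as np

def where_square(A):
    roots = np.sqrt(A)
    # Compare each root with its rounded value; equal means a whole-number root.
    return roots == np.round(roots)
```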

In [21]:
add_root(np.array([5, 9, 4]))

array([ 5.        , 10.        ,  5.41421356])

In [22]:
where_square(np.array([2, 9, 16, 15]))

array([False,  True,  True, False])

In [23]:
# Don't change this cell -- it is needed for the tests to work
A_1 = np.array([2, 4, 6, 7])
out_1 = add_root(A_1)

A_2 = np.array([1, 2, 16, 17, 32, 49])
out_2 = where_square(A_2)

In [24]:
grader.check("q6")

### Question 7 – Stock Prices 📈

Complete the implementations of the functions `growth_rates` and `with_leftover`. Specifications are given below. Your solutions should **not** contain any loops or list comprehensions.

#### `growth_rates`

`growth_rates` should take in a `numpy` array, `A`, of [stock prices](https://en.wikipedia.org/wiki/Stock) for a single stock on successive days in USD. It should return an array of growth rates. That is, the `i`th element of the returned array should contain the growth rate of the stock price from the $i^{th}$ day to the $(i+1)^{th}$ day. The growth rate between two values is defined as $\frac{\text{final} - \text{initial}}{\text{initial}}$. You should return growth rates as **proportions, rounded to two decimal places**.
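A loop-free sketch: `np.diff(A)` computes final minus initial for each pair of consecutive days, and `A[:-1]` holds the matching initial prices, so the result has one fewer element than `A`.

```python
import numpy as np

def growth_rates(A):
    # (A[i+1] - A[i]) / A[i] for every i, rounded to two decimal places.
    return np.round(np.diff(A) / A[:-1], 2)
```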

<br>

#### `with_leftover`

Again, suppose `A` is a `numpy` array of stock prices. Consider the following scheme: 

- Suppose that you start each day with \$20 to purchase stocks. 
- Each day, you purchase as many shares as possible of the stock. (The price changes each day, according to `A`.)
- Any money left-over after a given day is saved for possibly buying stock on a future day.

The function `with_leftover` should take in `A` and return the day (as an `int`) on which you can buy at least one full share using just "left-over" money. If this never happens, return `-1`. Note that the first stock purchase occurs on Day 0, and that you cannot purchase fractions of a share of a stock.

For example, if the stock price is \$3 every day, then the answer is `1` (corresponding to Day 1):
- Day 0: Buy 6 shares with your \$20, and \$2 is added to the leftover. Your total leftover is currently \$2. This is not enough to buy one extra share, so you continue.
- Day 1: Buy 6 shares with your \$20, and another \$2 is added to the leftover. Your total leftover is now \$4, so you can now buy one extra share. Hence, the answer is Day 1, and `with_leftover` should return `1`.

***Hint:*** `np.cumsum` may be helpful.
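Following the hint, a minimal sketch: `20 % A` is the money left over each day, and `np.cumsum` turns that into a running total. This sketch assumes, as in the worked example above, that the running total at the end of day `i` is compared against that same day's price `A[i]`.

```python
import numpy as np

def with_leftover(A):
    # Money left over each day after buying as many whole shares as $20 allows.
    daily_leftover = 20 % A
    # Running total of left-over money at the end of each day.
    total_leftover = np.cumsum(daily_leftover)
    # Indices of days on which the accumulated leftover covers that day's price.
    days = np.where(total_leftover >= A)[0]
    return int(days[0]) if days.size > 0 else -1
```

For a constant price of \$3, the daily leftovers are `[2, 2, 2, 2]`, the running totals are `[2, 4, 6, 8]`, and the first total that reaches \$3 is on Day 1.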

In [25]:
growth_rates(np.array([1, 2, 3, 4]))

array([1.  , 0.5 , 0.33])

In [26]:
with_leftover(np.array([3, 3, 3, 3]))

1

In [27]:
# Don't change this cell -- it is needed for the tests to work
fp = os.path.join('data', 'stocks.csv')
stocks = np.array([float(x) for x in open(fp)])
out_3_stocks = growth_rates(stocks)

A_4 = np.array([3, 3, 3, 3])
out_4 = with_leftover(A_4)

In [28]:
grader.check("q7")

## Part 4: Introduction to `pandas` 🐼

This part will help build familiarity with DataFrames in `pandas`. Fortunately, you've already used a version of `pandas` in DSC 10, called `babypandas`! Review the [DSC 10 course notes](https://notes.dsc10.com/02-data_sets/dataframes.html) as necessary.

One key difference between `babypandas` and `pandas` is the idiomatic way of accessing a column. In `babypandas`, to access column `'x'` in DataFrame `df`, you used `df.get('x')`. In `pandas`, the more common way is `df['x']`.
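A small comparison on a made-up DataFrame: both styles return the same Series in `pandas`, but they differ when the column is missing. `df.get` returns `None` (its default), while `df['z']` would raise a `KeyError`.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]})

col_bracket = df['x']   # idiomatic pandas column access
col_get = df.get('x')   # the babypandas style also works in pandas
missing = df.get('z')   # None for a missing column; df['z'] would raise KeyError
```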

As always for `pandas` questions:
1. Avoid writing loops through the rows of the DataFrame to do the problem, and
2. Test the output/correctness of your code with the help of the dataset given, but be sure your code will also run on data that is similar to but different from the dataset given. (One way to do this is to sample rows from the provided DataFrame using the `.sample` method).
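For instance, `.sample` can draw a random subset of rows to re-run your function on, and `random_state` makes the draw reproducible. The toy DataFrame here is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Salary': [10, 20, 30, 40, 50, 60],
})

# Draw half of the rows at random, without replacement (the default).
sample = df.sample(frac=0.5, random_state=0)
```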

The file `data/salary.csv` contains salary information for the 2021-22 National Basketball Association (NBA) season 🏀. Specifically, it contains the name, position, team, and salary of every player who played at least 15 games that season. We will load this file and store it as a DataFrame named `salary`.

In [29]:
# Do not edit this cell -- it is needed for the tests
salary_fp = os.path.join('data', 'salary.csv')
salary = pd.read_csv(salary_fp)
salary.head()

| | Player | Position | Team | Salary |
|---|---|---|---|---|
| 0 | John Collins | PF | Atlanta Hawks | 23000000 |
| 1 | Danilo Gallinari | PF | Atlanta Hawks | 20475000 |
| 2 | Bogdan Bogdanović | SG | Atlanta Hawks | 18000000 |
| 3 | Clint Capela | C | Atlanta Hawks | 17103448 |
| 4 | Delon Wright | SG | Atlanta Hawks | 8526316 |


### Question 8 – `pandas` Basics

Your job is to complete the implementation of the function `salary_stats`, which takes in a DataFrame like `salary` and returns a **Series** containing the following statistics:
- `'num_players'`: The number of players.
- `'num_teams'`: The number of teams.
- `'total_salary'`: The total salary amount for all players.
- `'highest_salary'`: The name of the player with the highest salary. **Assume there are no ties.**
- `'avg_los'`: The average salary of the `'Los Angeles Lakers'`, rounded to two decimal places.
- `'fifth_lowest'`: The name and team of the player who has the fifth lowest salary, separated by a comma and a space (e.g. `'Billy Triton, Cleveland Cavaliers'`). **Assume there are no ties.**
- `'duplicates'`: A Boolean that is `True` if there are any duplicate last names, and `False` otherwise. Note that some players may have a suffix on their name, such as "Jr." or "III" -- you should ignore these. That is, "Billy Triton Jr." and "Tyler Triton" should be considered to have the same last name.
- `'total_highest'`: The total salary of the team that has the highest paid player.

The index of each element in the outputted Series is specified above.

***Note 1***: Your function should work on a dataset of the same format that contains information from other years. This means that `salary_stats` should not "hard-code" any numbers or strings, but should compute them all programmatically. In all cases, you may assume that none of the answers involving ranking involves a tie.

***Note 2***: The doctests and public tests don't test to see if your function returns the right numbers. You should manually inspect your result to make sure that all values seem appropriate.
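Several of these statistics map directly onto standard `pandas` methods (`nunique`, `idxmax`, `nsmallest`, `duplicated`). The sketch below computes most of them on a small made-up DataFrame with the same columns as `salary`. The `SUFFIXES` set and the `last_name` helper are hypothetical, and the suffix list is an assumption you may need to extend for real data. Because the toy table has only four rows, it shows the 2nd lowest salary in place of the 5th; the `nsmallest(k).iloc[-1]` pattern is the same and relies on there being no ties.

```python
import pandas as pd

# Made-up rows in the same format as data/salary.csv.
df = pd.DataFrame({
    'Player': ['Billy Triton Jr.', 'Tyler Triton', 'Sam King', 'Ann Lee III'],
    'Position': ['PF', 'SG', 'C', 'PG'],
    'Team': ['Cleveland Cavaliers', 'Los Angeles Lakers',
             'Los Angeles Lakers', 'Atlanta Hawks'],
    'Salary': [300, 100, 500, 200],
})

# Hypothetical, assumed set of name suffixes to ignore.
SUFFIXES = {'Jr.', 'Sr.', 'II', 'III', 'IV'}

def last_name(name):
    parts = name.split()
    # Drop a trailing suffix, if present, before taking the last word.
    if len(parts) > 1 and parts[-1] in SUFFIXES:
        parts = parts[:-1]
    return parts[-1]

# idxmax returns the index label of the largest salary; .loc fetches that row.
top_row = df.loc[df['Salary'].idxmax()]
# k-th lowest salary (no ties assumed): take the k smallest, keep the last one.
second_lowest = df.nsmallest(2, 'Salary').iloc[-1]
lakers = df.loc[df['Team'] == 'Los Angeles Lakers', 'Salary']

stats = pd.Series({
    'num_players': df.shape[0],
    'num_teams': df['Team'].nunique(),
    'total_salary': df['Salary'].sum(),
    'highest_salary': top_row['Player'],
    'avg_los': round(lakers.mean(), 2),
    'second_lowest': f"{second_lowest['Player']}, {second_lowest['Team']}",
    'duplicates': bool(df['Player'].apply(last_name).duplicated().any()),
    'total_highest': df.loc[df['Team'] == top_row['Team'], 'Salary'].sum(),
})
```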

In [30]:
salary_stats(salary)

num_players                                   381
num_teams                                      30
total_salary                           3433118794
highest_salary                      Stephen Curry
avg_los                           13266896.818182
fifth_lowest      Miye Oni, Oklahoma City Thunder
duplicates                                   True
total_highest                           130428103
dtype: object

In [31]:
# Do not edit this cell -- it is needed for the tests
salary_fp = os.path.join('data', 'salary.csv')
salary = pd.read_csv(salary_fp)
stats = salary_stats(salary)

salary_sample = pd.read_csv('data/salary_sample.csv')
sample_stats = salary_stats(salary_sample)

In [32]:
grader.check("q8")

### Question 9 – Reading Malformed `.csv` Files

`data/malformed.csv` is a file of comma-separated values, containing the following fields:


|column name|description|type|
|---|---|---|
|`'first'`|first name of person|`str`|
|`'last'`|last name of person|`str`|
|`'weight'`|weight of person (lbs)|`float`|
|`'height'`|height of person (in)|`float`|
|`'geo'`|location of person; comma-separated latitude/longitude|`str`|

Unfortunately, some entries contain errors that cause `pandas`' `read_csv` function to fail when parsing the file with the default settings. Instead, you must read in the file manually using Python's built-in `open` function.

Complete the implementation of the function `parse_malformed`, which takes in a file path (`fp`) and returns a parsed, properly-typed DataFrame. The DataFrame should contain columns as described in the table above (with the specified types); it should agree with `pd.read_csv` when the lines are not malformed.

***Note:*** Assume that the given `.csv` file is a sample of a larger file; you will be graded against a **different** sample of the larger file that has the same type of parsing errors. That is, you should **not** hard-code your cleaning of the data to specific errors on specific lines in the data.

***Hint:*** Open `data/malformed.csv` in your text editor, and look very carefully at the placement of commas (`,`) and quotes (`"`). The first few rows of `parse_malformed('data/malformed.csv')` should be:

<img src="./imgs/example-df.png" width=45%>
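A minimal sketch of a line-by-line parser, assuming (hypothetically) that the only problems are stray double quotes and stray or repeated commas, and that each data line otherwise holds six values: first, last, weight, height, latitude, longitude. Inspect `data/malformed.csv` to confirm which errors actually occur. The sample lines below are invented, and `parse_lines` is a hypothetical helper that expects the header line to be removed already; it is not the required `parse_malformed` signature.

```python
import pandas as pd

COLS = ['first', 'last', 'weight', 'height', 'geo']

def parse_lines(lines):
    """Hypothetical helper: parse data lines (header already removed)."""
    rows = []
    for line in lines:
        # Assumed fixes: strip whitespace and stray quotes.
        cleaned = line.strip().replace('"', '')
        if not cleaned:
            continue
        # Drop empty fields produced by doubled or trailing commas.
        fields = [f for f in cleaned.split(',') if f != '']
        first, last, weight, height, lat, lon = fields
        # Keep 'geo' as a single "lat,lon" string, as the table above specifies.
        rows.append([first, last, float(weight), float(height), f'{lat},{lon}'])
    return pd.DataFrame(rows, columns=COLS)

# Invented lines exhibiting the kinds of errors assumed above.
sample_lines = [
    'Julia,Wagner,142.0,86.0,"39.8,15.4"',
    'Emily,Leonid,146.0,57.0,37.8,-68.7,',
    '"Tyler",Micajah,,116.0,73.0,"38.0,6.9"',
]
example = parse_lines(sample_lines)
```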

In [42]:
# parse_malformed('data/malformed.csv').dtypes
parse_malformed('data/malformed.csv')['geo'].unique()

Julia,Wagner,142.0,86.0,39.8,15.4

Angelica,Rija,155.0,56.0,38.2,-71.7

Tyler,Micajah,116.0,73.0,38.0,6.9

Kathleen,Nakea,163.0,69.0,36.3,-86.8

Axel,Ronit,95.0,74.0,36.8,128.2

Amiya,Kyona,130.0,72.0,36.3,114.5

Torrey,Joshuacaleb,105.0,79.0,38.3,145.1

Mariah,Alese,149.0,68.0,36.1,45.7

Grayson,Daimen,140.0,80.0,38.1,-72.6

Yvette,Trayce,179.0,67.0,36.9,-8.3

Cody,Hatim,150.0,63.0,38.0,-7.3

Marissa,Daud,135.0,58.0,37.3,11.0

Logan,Cristel,133.0,67.0,35.5,-110.2

Kaiyah,Brinden,187.0,82.0,34.8,83.2

Ivan,Devyne,193.0,54.0,36.6,262.0

Shamaria,Aldrick,139.0,73.0,38.5,-94.6

Travis,Anavictoria,117.0,62.0,36.3,69.5

Kennedy,Dalynn,171.0,77.0,37.3,-27.5

Alina,Danniell,105.0,55.0,37.4,314.7

Cameron,Angelica,139.0,56.0,38.8,-79.3

Madison,Barkley,120.0,69.0,38.2,86.1

Jackson,Taylr,113.0,78.0,36.7,56.7

Agustin,Stephanye,91.0,62.0,36.4,54.5

Janesha,Jhayla,143.0,64.0,35.9,-70.5

Nickolas,Karenna,159.0,75.0,35.9,-73.9

Stacy,Meaghen,149.0,68.0,36.6,-27.7

Matthew,Kalis,166.0,66.0,37.8,0.2

array(['39.8,15.4', '38.2,-71.7', '38.0,6.9', '36.3,-86.8', '36.8,128.2',
       '36.3,114.5', '38.3,145.1', '36.1,45.7', '38.1,-72.6', '36.9,-8.3',
       '38.0,-7.3', '37.3,11.0', '35.5,-110.2', '34.8,83.2', '36.6,262.0',
       '38.5,-94.6', '36.3,69.5', '37.3,-27.5', '37.4,314.7',
       '38.8,-79.3', '38.2,86.1', '36.7,56.7', '36.4,54.5', '35.9,-70.5',
       '35.9,-73.9', '36.6,-27.7', '37.8,0.2', '37.2,140.1', '38.6,-63.6',
       '36.2,-2.6', '37.1,-56.1', '39.1,93.6', '38.2,16.2', '37.6,155.0',
       '36.9,-52.0', '36.8,182.4', '36.7,69.2', '37.2,56.9', '37.3,79.6',
       '37.3,123.5', '38.6,-43.4', '38.1,-106.3', '38.0,-21.9',
       '37.2,71.2', '36.2,-61.4', '37.0,49.6', '37.7,113.6', '37.0,-32.3',
       '37.5,74.6', '36.9,83.6', '35.5,-12.4', '38.6,12.4', '37.6,64.6',
       '36.6,36.9', '36.5,28.2', '38.0,36.4', '35.0,18.0', '36.9,-58.9',
       '36.2,16.0', '38.0,86.0', '38.2,252.0', '38.2,-57.5', '37.7,-93.7',
       '36.4,88.1', '36.7,139.3', '36.2,51.4', '36.4,20.5

In [34]:
"Emily,Leonid,146.0,57.0,37.8,-68.7,".strip(',')

'Emily,Leonid,146.0,57.0,37.8,-68.7'

In [35]:
# Do not edit -- needed for tests
fp = os.path.join('data', 'malformed.csv')
cols = ['first', 'last', 'weight', 'height', 'geo']
df = parse_malformed(fp)
dg = pd.read_csv(fp, nrows=4, skiprows=10, names=cols)

In [36]:
grader.check("q9")

## Congratulations! You're done with Lab 1! 🏁

As a reminder, all of the work you want to submit needs to be in `lab.py`.

To verify that all of your work is indeed in `lab.py`, and that you didn't accidentally implement a function in this notebook and not in `lab.py`, we've included another notebook in the lab folder, called `lab-validation.ipynb`. `lab-validation.ipynb` is a version of this notebook with only the `grader.check` cells and the code needed to set up the tests. 

### **Go to `lab-validation.ipynb`, and go to Kernel > Restart & Run All.** This will check if all `grader.check` test cases pass using just the code in `lab.py`.

Once you're able to pass all test cases in `lab-validation.ipynb`, including the call to `grader.check_all()` at the very bottom, you're ready to submit your `lab.py` (and only your `lab.py`) to Gradescope. After submitting to Gradescope, make sure to stick around until all test cases pass.

There is also a call to `grader.check_all()` below in _this_ notebook, but make sure to also follow the steps above.

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [37]:
grader.check_all()

q0 results: All test cases passed!

q1 results: All test cases passed!

q2 results: All test cases passed!

q3 results: All test cases passed!

q4 results: All test cases passed!

q5 results: All test cases passed!

q6 results: All test cases passed!

q7 results: All test cases passed!

q8 results: All test cases passed!

q9 results:
    q9 - 1 result:
        Test case passed!

    q9 - 2 result:
        Test case passed!

    q9 - 3 result:
        Test case passed!

    q9 - 4 result:
        Trying:
            df['geo'].str.contains(',').all()
        Expecting:
            True
        **********************************************************************
        Line 1, in q9 3
        Failed example:
            df['geo'].str.contains(',').all()
        Expected:
            True
        Got:
            False

    q9 - 5 result:
        Test case passed!

    q9 - 6 result:
        Test case passed!