In [1]:
# Initialize Otter
import otter
grader = otter.Notebook("lab.ipynb")

# Lab 1 – Introduction

## DSC 80, Spring 2022

### Due Date: Monday, April 4th at 11:59 PM

## Instructions

Welcome to the first assignment in DSC 80 this quarter!

Much like in DSC 10, this Jupyter Notebook contains the statements of the problems and provides code and Markdown cells to display your answers to the problems. Unlike DSC 10, the notebook is *only* for displaying a readable version of your final answers. The coding will be done in an accompanying `lab.py` file that is imported into the current notebook.

Labs and programming assignments will be graded in (at most) two ways:
1. The functions and classes in the accompanying `.py` file will be tested (a la DSC 20),
2. The notebook may be graded (if it contains free response questions or asks you to draw plots).

**Note**: Labs will have public tests and private tests. The public "smoke tests" that you will run below and which appear on Gradescope are generally worth no points. After the due date, we will replace these tests with private tests that will determine your grade. This is different from DSC 10, where labs only had public tests!

**Do not change the function names in the `*.py` file!**
- The functions in the `*.py` file are how your assignment is graded, and they are graded by their name.
- If you changed something you weren't supposed to, just use git to revert! Ask us if you need help with this, or google around for `git revert`.

**Tips for working in the notebook**:
- The notebooks serve to present the questions and give you a place to present your results for later review.
- The notebooks in *lab assignments* are not graded (only the `.py` file is submitted and graded).
- Notebooks for *projects* will serve as a final report for the assignment, and contain conclusions and answers to open ended questions that are graded.
- The notebook serves as a nice environment for 'pre-development' and experimentation before designing your function in your `.py` file. You can write code here, but make sure that all of your real work is in the `.py` file.

**Tips for developing in the `.py` file**:
- Do not change the function names in the starter code; grading is done using these function names.
- Do not change the docstrings in the functions. These are there to tell you if your work is on the right track!
- You are encouraged to write your own additional helper functions to solve the lab! 
    - Developing in python usually consists of larger files, with many short functions.
    - You may write your other functions in an additional `.py` file that you import in `lab.py` (much like we do in the notebook).
- Always document your code!

**Importing code from `lab.py`**:

* Below, we import the `.py` file that's contained in the same directory as this notebook.
* We use the `autoreload` notebook extension to make changes to our `lab.py` file immediately available in our notebook. Without this extension, we would need to restart the notebook kernel to see any changes to `lab.py` in the notebook.
    - `autoreload` is necessary because, once `lab` has been imported, Python caches the module object in the running kernel (in `sys.modules`). Subsequent imports of `lab` merely return that cached module instead of re-reading `lab.py`.
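
To see why a plain re-`import` is not enough, here is a minimal sketch. The module name `demo_lab` and the constant `ANSWER` are hypothetical stand-ins for `lab.py` and its contents. The sketch writes a throwaway module to a temporary directory, edits it, and compares a repeated `import` with an explicit `importlib.reload`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for lab.py.
    module_path = Path(tmp) / "demo_lab.py"
    module_path.write_text("ANSWER = 1\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()

    import demo_lab
    first = demo_lab.ANSWER

    # Edit the source file while the module is already imported.
    module_path.write_text("ANSWER = 22\n")

    import demo_lab  # does nothing new: the module is already in sys.modules
    after_reimport = demo_lab.ANSWER

    importlib.reload(demo_lab)  # re-executes the source file
    after_reload = demo_lab.ANSWER

    # Clean up so the demo leaves no trace in the session.
    sys.path.remove(tmp)
    sys.modules.pop("demo_lab", None)

print(first, after_reimport, after_reload)
```

The `autoreload` extension automates the reload step before each cell runs, so you do not have to call `importlib.reload` yourself.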

In [53]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [54]:
from lab import *

In [55]:
import os
import io
import pandas as pd
import numpy as np
import doctest

### Infrastructure Summary

Run the following cell to see a [YouTube video 🎥](https://youtu.be/FpTo4AM9B30) that summarizes the above information and walks you through how to
- set up your programming environment (see the instructions in [Tech Support](https://dsc80.com/tech_support) for more details),
- access assignments,
- work on and test assignments, and
- submit assignments.

The video is also linked on the [Resources tab of the course website](https://dsc80.com/resources).

In [5]:
from IPython.display import YouTubeVideo
YouTubeVideo('FpTo4AM9B30')

Let's get started! 🎉

## Part 1: Python Basics 🐍

### Question 0 – Consecutive Integers

Complete the implementation of the function `consecutive_ints`, which takes in a possibly empty list of integers (`ints`) and returns `True` if there exist two adjacent list elements that are consecutive integers and `False` otherwise.

For example, `consecutive_ints([5, 3, 6, 4, 9, 8])` should evaluate to `True`, since `9` is next to `8` and they are consecutive integers. On the other hand, `consecutive_ints([1, 3, 5, 7, 9])` should evaluate to `False`.

***Note***: If you look at `lab.py`, you'll notice that the solution to this problem is already there. This question is done for you to show you what a completed homework problem looks like.
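
As an illustration of the specification above, here is one possible way to write such a check. It is a sketch only and is not necessarily the version provided in `lab.py`; the name `consecutive_ints_sketch` is made up for this example.

```python
def consecutive_ints_sketch(ints):
    """Return True if any two adjacent elements differ by exactly 1."""
    # zip pairs each element with its right-hand neighbour;
    # for an empty or one-element list there are no pairs, so any() is False.
    return any(abs(a - b) == 1 for a, b in zip(ints, ints[1:]))


print(consecutive_ints_sketch([5, 3, 6, 4, 9, 8]))
print(consecutive_ints_sketch([1, 3, 5, 7, 9]))
```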

In [5]:
# The cells below are here for you to write scratch work in. 
# You should write the code for your answer in `lab.py`, not here.

In [6]:
consecutive_ints([5,3,6,4,9,8])

True

In [7]:
consecutive_ints([1, 3, 5, 7, 9])

False

There are two ways to test your code:

1. Run the cell below containing `grader.check` to test your code.
2. Run doctests on `lab.py` by running the following command on the command line:
```
python -m doctest lab.py
```
If the doctests pass, then there should be *no* output.

The `grader.check` tests in your notebook will **always include** the doctests, so you do not need to run both. However, it is a good idea to run the doctests separately in the command line because that will ensure that all of your code is in `lab.py`, which is where it needs to be.

In addition, you should also try writing some of your own tests by calling your functions on different inputs. Does it work for corner cases? Real-world data is **very messy** and you should expect your data processing code to break without thorough testing!
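
If you have not seen doctests before, the sketch below shows the general idea on a made-up function, `is_even`, which is not part of this lab. The examples in the docstring are the tests; the remaining lines run them programmatically using the standard-library `doctest` module, which is roughly what `python -m doctest lab.py` does for every function in the file.

```python
import doctest


def is_even(n):
    """
    Hypothetical example function with two doctests.

    >>> is_even(4)
    True
    >>> is_even(7)
    False
    """
    return n % 2 == 0


# Extract the examples from the docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(is_even.__doc__, {"is_even": is_even}, "is_even", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, results.attempted)
```

A failing example would be reported with the expected and actual output; when everything passes, nothing is printed by the runner itself.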

In [9]:
grader.check("q0")

### Question 1 – Median vs. Mean

Complete the implementation of the function `median_vs_mean`, which takes in a non-empty list of numbers (`nums`) and returns `True` if the median of the list is less than or equal to the mean of the list and `False` otherwise.

Recall, if a list has even length, the median is the mean of the middle two elements.

***Note:*** In this question, you may only use built-in functions and methods in Python. You should not use `numpy` or `pandas` at all, nor should you import any additional packages.
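
The median rule above can be sketched with built-ins only. The helper names `median_sketch` and `mean_sketch` are hypothetical and chosen for this illustration; your own helpers in `lab.py` may look different.

```python
def median_sketch(nums):
    """Median of a non-empty list, using only built-ins."""
    ordered = sorted(nums)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd length: the single middle element.
        return ordered[mid]
    # Even length: the mean of the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean_sketch(nums):
    """Arithmetic mean of a non-empty list."""
    return sum(nums) / len(nums)


print(median_sketch([1, 5, 8, 3, 6, 3, 9, 10]))
print(median_sketch([1, 2, 3, 4, 5]))
print(median_sketch([1, 8, 9]) <= mean_sketch([1, 8, 9]))
```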

In [9]:
int(7.9)

7

In [11]:
int(53/2)

26

In [8]:
mean([1,2,3])

2.0

In [11]:
sorted([1,5,8,3,6,3,9,10])

[1, 3, 3, 5, 6, 8, 9, 10]

In [12]:
median([1,5,8,3,6,3,9,10])

5.5

In [13]:
median([1,2,3,4,5])

3

In [14]:
median_vs_mean([6, 5, 4, 3, 2])

True

In [15]:
median_vs_mean([50, 20, 15, 40])

True

In [16]:
median_vs_mean([1, 8, 9])

False

In [17]:
grader.check("q1")

### Question 2 – Same Difference

Complete the implementation of the function `same_diff_ints`, which takes in a list of integers (`ints`) and returns `True` if there exist two list elements $i$ positions apart, whose absolute difference as integers is also $i$. If there are no two elements satisfying this condition, `same_diff_ints` should return `False`.

For example, because `3` (position 1) and `5` (position 3) are 2 positions apart, and $|3-5| = 2$:
```py
>>> same_diff_ints([5, 3, 1, 5, 9, 8])
True
```
Whereas:
```py
>>> same_diff_ints([1, 3, 5, 7, 9])
False
```

**Important:** While implementing `same_diff_ints`, we will assume that `ints` tends to satisfy the condition, and that the pair(s) satisfying the condition tend to be close together. As such, you must implement `same_diff_ints` such that it **runs faster in cases where the pairs are close together than in cases where the pairs are further apart**. While you will still likely need a nested `for`-loop, this will inform how you configure your loop variables. (Optimizing your code for an assumed distribution of incoming data is very common in data science.)

***Hint 1:*** This is similar to Question 0.

***Hint 2:*** Make sure to define some extreme test cases, like when `ints` is an empty list. Also, use the `%%time` magic command to time your function, to make sure it satisfies the optimization requirement above.
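
One way to read the optimization requirement is to let the outer loop run over the gap between positions, smallest gap first, so that close pairs are found before distant ones are ever examined. The sketch below follows that reading; the name `same_diff_ints_sketch` is hypothetical, and this is only one of several loop arrangements that could meet the requirement.

```python
def same_diff_ints_sketch(ints):
    """Return True if two elements `gap` positions apart differ by exactly `gap`."""
    n = len(ints)
    # Outer loop: distance between positions, checked from nearest to farthest.
    for gap in range(1, n):
        # Inner loop: every starting position that has a partner `gap` away.
        for start in range(n - gap):
            if abs(ints[start] - ints[start + gap]) == gap:
                return True
    return False


print(same_diff_ints_sketch([5, 3, 1, 5, 9, 8]))
print(same_diff_ints_sketch([1, 3, 5, 7, 9]))
print(same_diff_ints_sketch([]))
```

With this ordering, a list whose matching pair is adjacent returns after at most one pass over the list, while a list with no matching pair still needs the full quadratic scan.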

In [94]:
print(same_diff_ints([0,1,2,3]))


True


In [93]:
same_diff_ints([5,4])

True

In [91]:
same_diff_ints([1, 3, 5, 7, 9,90])

False

In [92]:
same_diff_ints([1, 3, 5, 7, 9])

False

Make sure your function runs in under 5 seconds.

In [104]:
%%time
same_diff_ints([5, 3, 1, 5, 9, 8])

CPU times: user 8 µs, sys: 3 µs, total: 11 µs
Wall time: 13.8 µs


True

In [105]:
grader.check("q2")

## Part 2: Strings and Files 🧵

The following questions will familiarize you with the basics of working with strings and reading data from files. Remember that by default, data from files are stored as strings in Python.

### Question 3 – $n$ Prefixes

Complete the implementation of the function `n_prefixes`, which takes a string `s` and a positive integer `n`. It returns a string containing the first `n` consecutive prefixes of `s` in reverse order.

For example, let's suppose `s` is the string `'Billy!'` and `n` is `4`. The consecutive prefixes of `'Billy!'` are:
- `'B'`
- `'Bi'`
- `'Bil'`
- `'Bill'`
- `'Billy'`
- `'Billy!'`

The first 4 of these are `'B'`, `'Bi'`, `'Bil'`, and `'Bill'`. If we combine these 4 in reverse order, we get `'BillBilBiB'`, which is what `n_prefixes('Billy!', 4)` should return. See the doctests for more examples. **You may assume that `n` is no larger than the length of `s`.**

***Hint:*** Recall that [strings may be sliced](https://docs.python.org/3/tutorial/introduction.html#strings), like lists.
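
As a sketch of how slicing applies here: `s[:i]` is the prefix of length `i`, so counting `i` down from `n` to `1` produces the prefixes in reverse order. The function name `n_prefixes_sketch` is made up for this illustration.

```python
def n_prefixes_sketch(s, n):
    """Concatenate the first n prefixes of s, longest first."""
    # range(n, 0, -1) yields n, n-1, ..., 1.
    return ''.join(s[:i] for i in range(n, 0, -1))


print(n_prefixes_sketch('Billy!', 4))
```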

In [22]:
n_prefixes('Billy!', 4)

'BillBilBiB'

In [23]:
n_prefixes('Marina', 3)

'MarMaM'

In [24]:
n_prefixes('aaron', 2)

'aaa'

In [25]:
n_prefixes('Justin', 5)

'JustiJustJusJuJ'

In [26]:
grader.check("q3")

### Question 4 – Exploded Numbers 💣

Complete the implementation of the function `exploded_numbers`, which takes in a list of integers (`ints`) and a non-negative integer (`n`) and **returns a list of strings** containing numbers from the list expanded by `n` numbers in both directions, separated by spaces. Each integer should be [zero padded](https://www.tutorialspoint.com/python/string_zfill.htm) so that all integers outputted have the same length.

For example, consider `exploded_numbers([3, 8, 15], 2)`.
- If we explode 3 by 2 numbers in both directions, we get 1, 2, 3, 4, 5.
- If we explode 8 by 2 numbers in both directions, we get 6, 7, 8, 9, 10.
- If we explode 15 by 2 numbers in both directions, we get 13, 14, 15, 16, 17.

The longest length of any of the exploded numbers above is 2, so all of the outputted integers should have length 2.

- The string corresponding to 3 in the input is `'01 02 03 04 05'`.
- The string corresponding to 8 in the input is `'06 07 08 09 10'`.
- The string corresponding to 15 in the input is `'13 14 15 16 17'`.

So, `exploded_numbers([3, 8, 15], 2)` should return `['01 02 03 04 05', '06 07 08 09 10', '13 14 15 16 17']`. See the doctest for another example.

***Note***: You can assume that negative numbers will never be encountered. That is, when testing your code, we will never explode a number so much that it becomes negative.
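
The padding step can be sketched with `str.zfill`. This sketch assumes `ints` is non-empty and that no exploded value is negative (as stated in the note above); the name `exploded_numbers_sketch` is hypothetical.

```python
def exploded_numbers_sketch(ints, n):
    """Explode each integer by n in both directions, zero-padded to equal width."""
    # The widest number overall is the largest input plus n.
    width = len(str(max(ints) + n))
    return [
        ' '.join(str(k).zfill(width) for k in range(x - n, x + n + 1))
        for x in ints
    ]


print(exploded_numbers_sketch([3, 8, 15], 2))
```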

In [27]:
exploded_numbers([3, 8, 15], 2) == ['01 02 03 04 05', '06 07 08 09 10', '13 14 15 16 17']

True

In [28]:
exploded_numbers([9, 99], 3)

['006 007 008 009 010 011 012', '096 097 098 099 100 101 102']

In [29]:
exploded_numbers([9, 99], 3) == ['006 007 008 009 010 011 012', '096 097 098 099 100 101 102']

True

In [30]:
grader.check("q4")

### Question 5 – Reading Files

[Recall](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files) that the built-in function `open` takes in a file path and returns *a file object* (sometimes called a *file handle*). Below are a few properties of file objects:

* `open(path)` opens the file at location `path` for reading.
* `open(path)` is an *iterable*, which contains successive lines of the file.
* Once a file object is opened, it should be closed after use to avoid leaking resources (such as open file descriptors). To ensure a file is closed once you are done with it, you should use a *context manager* as follows:
```py
with open(path) as fh:
    for line in fh:
        process_line(line)
```
* To read the entire file into a string, use the `read` method:
```py
with open(path) as fh:
    s = fh.read()
```

However, you should be careful when reading an entire file into memory that the file isn't too big! *You should avoid this whenever possible!*

Complete the implementation of the function `last_chars`, which takes in a file object (`fh`) and returns a string consisting of the last character of each line. Note that you don't have to use `open` at all; the argument given to you is a file object, not a file path.

***Note:*** A newline (`'\n'`) is the "delimiter" of the lines of a file, and doesn't count as part of the line (as the tests imply). Every other character is part of the line. For more info on this, see [the interpretation](https://en.wikipedia.org/wiki/Newline#Interpretation) of files as a 'newline delimited variables' file.
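
To experiment without a file on disk, `io.StringIO` gives you an in-memory object that behaves like an open text file. The sketch below uses it with a made-up two-line input; the name `last_chars_sketch` is hypothetical, and the handling of empty lines (they contribute nothing) is an assumption rather than something the problem statement specifies.

```python
import io


def last_chars_sketch(fh):
    """Join the last character of each line, ignoring the newline delimiter."""
    # rstrip('\n') removes the delimiter; [-1:] is '' for an empty line.
    return ''.join(line.rstrip('\n')[-1:] for line in fh)


fake_file = io.StringIO('hello\nworld\n')
print(last_chars_sketch(fake_file))
```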

In [31]:
fp = os.path.join('data', 'chars.txt')
last_chars(open(fp))

'hrg'

In [32]:
grader.check("q5")

## Part 3: `numpy` exercises 🥧

For a refresher on `numpy` and arrays, refer to the relevant section of the [DSC 10 course notes](https://notes.dsc10.com/02-data_sets/arrays.html).

### Question 6 – Array Methods

Complete the implementations of the functions `add_root` and `where_square`. Specifications are given below. Your solutions should **not** contain any loops or list comprehensions.

#### `add_root`

`add_root` should take in a `numpy` array, `A`, and return a new `numpy` array that contains the element-wise sum of the elements in `A` with the _square roots of the positions of the elements in `A`_. If `A` contains the values 5, 9, and 4, the output array should contain the values 5 (5 + $\sqrt{0}$), 10 (9 + $\sqrt{1}$), and 5.4142... (4 + $\sqrt{2}$).

#### `where_square`

`where_square` should take in a `numpy` array, `A`, and return a new `numpy` array of Booleans whose `i`th element is `True` if and only if the `i`th element of `A` is a perfect square.
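
Both specifications can be met with vectorized operations alone. The following is a sketch under the assumption that `A` holds non-negative integers of moderate size; the `_sketch` names are made up for this example.

```python
import numpy as np


def add_root_sketch(A):
    """Add the square root of each element's position to that element."""
    # np.arange(len(A)) is the array of positions 0, 1, 2, ...
    return A + np.sqrt(np.arange(len(A)))


def where_square_sketch(A):
    """Boolean array marking which elements are perfect squares."""
    # Round the square root to the nearest integer, then square it back.
    roots = np.round(np.sqrt(A)).astype(int)
    return roots ** 2 == A


print(add_root_sketch(np.array([5, 9, 4])))
print(where_square_sketch(np.array([1, 2, 16, 17, 32, 49])))
```

Squaring the rounded root and comparing against the original avoids relying on an exact floating-point comparison of the root itself.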

In [33]:
l = np.array([2, 4, 6, 7])
add_root(l)

array([2.        , 5.        , 7.41421356, 8.73205081])

In [107]:
A = np.array([2, 4, 6, 7])
out = add_root(A)
isinstance(out, np.ndarray)
out.dtype



dtype('float64')

In [35]:
  
np.all(out >= A)

True

In [36]:
    
np.isclose(out[3], 7 + np.sqrt(3))

True

In [37]:
np.sqrt(9)

3.0

In [108]:
out = where_square(np.array([1, 2, 16, 17, 32, 49]))

In [39]:
isinstance(out, np.ndarray)

True

In [40]:
out.dtype == np.dtype('bool')

True

In [109]:
out[2]

True

In [110]:
out

array([ True, False,  True, False, False,  True])

In [43]:
grader.check("q6")

### Question 7 – Stock Prices 📈

Complete the implementations of the functions `growth_rates` and `with_leftover`. Specifications are given below. Your solutions should **not** contain any loops or list comprehensions.

#### `growth_rates`

`growth_rates` should take in a `numpy` array, `A`, of [stock prices](https://en.wikipedia.org/wiki/Stock) for a single stock on successive days in USD. It should return an array of growth rates. That is, the `i`th number of the returned array should contain the rate of growth in stock price between the $i^{th}$ day and the $(i+1)^{th}$ day. The growth rate between two values is defined as $\frac{\text{final} - \text{initial}}{\text{initial}}$. You should return growth rates as **proportions, rounded to two decimal places**.

#### `with_leftover`

Again, suppose `A` is a `numpy` array of stock prices. Consider the following scheme: 

- Suppose that you start each day with \$20 to purchase stocks. 
- Each day, you purchase as many shares as possible of the stock. (The price changes each day, according to `A`.)
- Any money left-over after a given day is saved for possibly buying stock on a future day.

The function `with_leftover` should take in `A` and return the day (as an `int`) on which you can buy at least one full share using just "left-over" money. If this never happens, return `-1`. Note that the first stock purchase occurs on Day 0, and that you cannot purchase fractions of a share of a stock.

For example, if the stock price is \$3 every day, then the answer is `1` (corresponding to Day 1):
- Day 0: Buy 6 shares with \$20, and \$2 is added to the leftover. Your total leftover is currently \$2. This is not enough to buy one extra share, so you continue.
- Day 1: Buy 6 shares with \$20, and another \$2 is added to the leftover. Your total leftover is now \$4, so you can now buy one extra share. Hence, the answer is Day 1, and `with_leftover` should return `1`.

***Hint:*** `np.cumsum` may be helpful.
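
The two specifications can be sketched without loops as follows. The `_sketch` names and the `budget` parameter are made up for this example, and the sketch assumes (as in the worked example above) that the leftover total on a given day already includes that day's leftover.

```python
import numpy as np


def growth_rates_sketch(A):
    """(final - initial) / initial for each pair of successive days."""
    return np.round((A[1:] - A[:-1]) / A[:-1], 2)


def with_leftover_sketch(A, budget=20):
    """First day on which accumulated leftover money buys a full share, else -1."""
    # Money left each day after buying as many whole shares as possible.
    daily_leftover = budget % A
    # Running total of leftover money, including the current day's.
    total_leftover = np.cumsum(daily_leftover)
    # Positions where the running total covers that day's price.
    days = np.flatnonzero(total_leftover >= A)
    return int(days[0]) if days.size > 0 else -1


print(growth_rates_sketch(np.array([100.0, 110.0, 99.0])))
print(with_leftover_sketch(np.array([3, 3, 3, 3])))
```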

In [111]:
fp = os.path.join('data', 'stocks.csv')

stocks = np.array([float(x) for x in open(fp)])
out = growth_rates(stocks)
isinstance(out, np.ndarray)

True

In [112]:
out

array([-0.  ,  0.01, -0.01,  0.  ,  0.  , -0.  ,  0.02,  0.01,  0.02,
        0.01,  0.01,  0.  ,  0.  ,  0.01, -0.  , -0.01, -0.  ,  0.  ,
       -0.01,  0.01,  0.  , -0.  ,  0.02,  0.  ,  0.01, -0.  ,  0.  ,
        0.01, -0.02,  0.01,  0.  , -0.01,  0.01,  0.  , -0.  , -0.01,
        0.01,  0.03, -0.01, -0.  ,  0.01,  0.01,  0.  ,  0.  , -0.  ,
        0.01,  0.01, -0.  ,  0.  ,  0.02, -0.01, -0.01,  0.01, -0.01,
        0.01,  0.02, -0.01, -0.01, -0.  ,  0.01,  0.  , -0.  ,  0.  ,
        0.01, -0.  ,  0.01, -0.  ,  0.01,  0.01,  0.01, -0.  , -0.  ,
        0.01,  0.  ,  0.  ,  0.02, -0.02, -0.  ,  0.01,  0.  ,  0.01,
        0.01, -0.  , -0.02, -0.01, -0.01, -0.01, -0.01,  0.  ,  0.01,
        0.01,  0.01, -0.  ,  0.  , -0.  , -0.01,  0.01,  0.01,  0.  ])

In [45]:
out.dtype == np.dtype('float')

True

In [46]:
out.max() == 0.03

True

In [47]:
len(out)

99

In [68]:
# Don't change this cell -- it is needed for the tests to work
fp = os.path.join('data', 'stocks.csv')
stocks = np.array([float(x) for x in open(fp)])
out_3_stocks = growth_rates(stocks)

A_4 = np.array([3, 5, 4, 3])
out_4 = with_leftover(A_4)
out_4 

3

In [49]:
np.cumsum(np.array([1,2,3,4,5]))

array([ 1,  3,  6, 10, 15])

In [50]:
import numbers

In [118]:
stocks = np.array([3, 7, 4, 3])

In [120]:
out = with_leftover(stocks)

In [53]:
isinstance(out, numbers.Integral)

True

In [121]:
out 

1

In [55]:
grader.check("q7")

## Part 4: Introduction to `pandas` 🐼

This part will help build familiarity with DataFrames in `pandas`. Fortunately, you've already used a version of `pandas` in DSC 10, called `babypandas`! Review the [DSC 10 course notes](https://notes.dsc10.com/02-data_sets/dataframes.html) as necessary.

One key difference between `babypandas` and `pandas` is the idiomatic way of accessing a column. In `babypandas`, to access column `'x'` in DataFrame `df`, you used `df.get('x')`. In `pandas`, the more common way is `df['x']`.

As always for `pandas` questions:
1. Avoid writing loops through the rows of the DataFrame to do the problem, and
2. Test the output/correctness of your code with the help of the dataset given, but be sure your code will also run on data that is similar to but different from the dataset given. (One way to do this is to sample rows from the provided DataFrame using the `.sample` method).
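
As a small illustration of both points, the sketch below builds a made-up DataFrame (the column names `'x'` and `'y'` and the values are invented for this example), accesses a column with bracket notation, and draws a random subset of rows with `.sample`.

```python
import pandas as pd

# Made-up data, for illustration only.
df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': ['a', 'b', 'c', 'd']})

# pandas-style column access (the babypandas equivalent was df.get('x')).
col = df['x']

# A random subset of rows; random_state makes the draw repeatable.
subset = df.sample(n=2, random_state=0)

print(col.sum())
print(subset.shape)
```

Running your function on several such samples is a cheap way to check that it does not depend on the exact rows of the provided file.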

The file `data/salary.csv` contains salary information for the 2021-22 National Basketball Association (NBA) season 🏀. Specifically, it contains the name, team, and salary of all players who have played at least 15 games this season. We will load this file and store it as a DataFrame named `salary`.

In [69]:
# Do not edit this cell -- it is needed for the tests
salary_fp = os.path.join('data', 'salary.csv')
salary = pd.read_csv(salary_fp)
salary.head()

Unnamed: 0,Player,Position,Team,Salary
0,John Collins,PF,Atlanta Hawks,23000000
1,Danilo Gallinari,PF,Atlanta Hawks,20475000
2,Bogdan Bogdanović,SG,Atlanta Hawks,18000000
3,Clint Capela,C,Atlanta Hawks,17103448
4,Delon Wright,SG,Atlanta Hawks,8526316


### Question 8 – `pandas` Basics

Your job is to complete the implementation of the function `salary_stats`, which takes in a DataFrame like `salary` and returns a **Series** containing the following statistics:
- `'num_players'`: The number of players.
- `'num_teams'`: The number of teams.
- `'total_salary'`: The total salary amount for all players.
- `'highest_salary'`: The name of the player with the highest salary. **Assume there are no ties.**
- `'avg_los'`: The average salary of the `'Los Angeles Lakers'`, rounded to two decimal places.
- `'fifth_lowest'`: The name and team of the player who has the fifth lowest salary, separated by a comma and a space (e.g. `'Billy Triton, Cleveland Cavaliers'`). **Assume there are no ties.**
- `'duplicates'`: A Boolean that is `True` if there are any duplicate last names, and `False` otherwise.
- `'total_highest'`: The total salary of the team that has the highest paid player.

The index of each element in the outputted Series is specified above.

***Note 1***: Your function should work on a dataset of the same format that contains information from other years. This means that `salary_stats` should not "hard-code" any numbers or strings, but should compute them all programmatically. In all cases, you may assume that none of the answers involving ranking involves a tie.

***Note 2***: The doctests and public tests don't test to see if your function returns the right numbers. You should manually inspect your result to make sure that all values seem appropriate.
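
To show the general shape of such a result, here is a sketch that computes a few of the statistics on a tiny made-up DataFrame and collects them in a Series indexed by name. The players, teams, and salaries are invented, only some of the required entries are shown, and taking the last whitespace-separated token as the "last name" is an assumption.

```python
import pandas as pd

# Invented data with the same column names as the salary file.
toy = pd.DataFrame({
    'Player': ['Ann Lee', 'Bo Kim', 'Cy Lee', 'Di Wu'],
    'Team': ['Team A', 'Team A', 'Team B', 'Team B'],
    'Salary': [10, 30, 20, 40],
})

# Row of the highest-paid player (assumes no ties).
top_row = toy.loc[toy['Salary'].idxmax()]
# Assumption: the last name is the final whitespace-separated token.
last_names = toy['Player'].str.split().str[-1]

partial_stats = pd.Series({
    'num_players': toy.shape[0],
    'num_teams': toy['Team'].nunique(),
    'total_salary': toy['Salary'].sum(),
    'highest_salary': top_row['Player'],
    'duplicates': bool(last_names.duplicated().any()),
    'total_highest': toy.loc[toy['Team'] == top_row['Team'], 'Salary'].sum(),
})

print(partial_stats)
```

Because nothing here refers to a specific player or team by name, the same code runs unchanged on a different year's data.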

In [122]:
# Do not edit this cell -- it is needed for the tests
salary_fp = os.path.join('data', 'salary.csv')
salary = pd.read_csv(salary_fp)
stats = salary_stats(salary)

salary_sample = pd.read_csv('data/salary_sample.csv')
sample_stats = salary_stats(salary_sample)
sample_stats

num_players                                        50
num_teams                                          26
total_salary                                428424568
highest_salary                           Kevin Durant
avg_los                                     1789256.0
fifth_lowest      Keita Bates-Diop, San Antonio Spurs
duplicates                                       True
total_highest                                46202282
dtype: object

In [100]:
salary.loc[salary['Salary'] == np.max(salary['Salary']), 'Player'].iloc[0]

'Stephen Curry'

In [164]:
highest_paid_team = salary.loc[salary['Salary'] == np.max(salary['Salary']), 'Team'].iloc[0]
np.sum(salary.loc[salary['Team'] == highest_paid_team, 'Salary'])

130428103

In [175]:
salary_sample.groupby('Team').count().shape[0]

26

In [93]:
np.round(np.mean(salary.loc[salary['Team'] == 'Los Angeles Lakers', 'Salary']),2)

13266896.82

In [106]:
# Sort once, then look up the fifth row's fields by column name rather than by position.
fifth_lowest_row = salary.sort_values(by='Salary').iloc[4]
fifth_lowest = fifth_lowest_row['Player'] + ", " + fifth_lowest_row['Team']
fifth_lowest

'Miye Oni, Oklahoma City Thunder'

In [109]:
salary.shape[0]

381

In [171]:
def extract_last(full_name):
    return full_name.split(' ')[1]

In [172]:
temp_df = pd.DataFrame(salary_sample['Player'].apply(extract_last)).groupby('Player').count().shape[0]
temp_df

orig_count = salary_sample.shape[0]
temp_df == orig_count

False

In [143]:
temp_df = pd.DataFrame(salary['Player'].apply(extract_last))#.groupby('Player').count()
temp_df

Unnamed: 0,Player
0,Collins
1,Gallinari
2,Bogdanović
3,Capela
4,Wright
...,...
376,Holiday
377,Kispert
378,Neto
379,Gafford


In [140]:
frame = salary['Player'].apply(extract_last)

orig_player_count = frame.shape[0]
dupe_count = salary.groupby('Player').count()
dupe_count
dupe_count = dupe_count.shape[0]
if(dupe_count == orig_player_count):
    print('There are no dupes')
else:
    print('There are dupes')

frame
dupe_count

There are no dupes


381

In [None]:
'Billy Triton, Cleveland Cavaliers'

In [166]:
grader.check("q8")

### Question 9 – Reading Malformed `.csv` Files

`data/malformed.csv` is a file of comma-separated values, containing the following fields:


|column name|description|type|
|---|---|---|
|first|first name of person|str|
|last|last name of person|str|
|weight|weight of person (lbs)|float|
|height|height of person (in)|float|
|geo|location of person; comma-separated latitude/longitude|str|

Unfortunately, the entries contain errors that cause `pandas`' `read_csv` function to fail to parse the file with the default settings. Instead, you must read in the file manually using Python's built-in `open` function.

Complete the implementation of the function `parse_malformed`, which takes in a file path (`fp`) and returns a parsed, properly-typed DataFrame. The DataFrame should contain columns as described in the table above (with the specified types); it should agree with `pd.read_csv` when the lines are not malformed.

***Note:*** Assume that the given `.csv` file is a sample of a larger file; you will be graded against a **different** sample of the larger file that has the same type of parsing errors. That is, you should **not** hard-code your cleaning of the data to specific errors on specific lines in the data.

***Hint:*** Open `data/malformed.csv` in your text editor, and look very carefully at the placement of commas (`,`) and quotes (`"`).
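
A line-by-line approach might look like the sketch below. The sample lines are invented, and the kinds of errors they contain (a stray quote after a number, an extra comma that creates an empty field) are assumptions about what the file may hold; inspect the real file before relying on them. The name `parse_line_sketch` is hypothetical.

```python
def parse_line_sketch(line):
    """Turn one raw line into a dict of properly typed fields."""
    # Drop the newline and every quote, then split on every comma.
    tokens = line.strip().replace('"', '').split(',')
    # Assumption: stray commas only ever produce empty tokens.
    tokens = [t for t in tokens if t != '']
    first, last, weight, height = tokens[:4]
    # The geo field itself contains a comma, so rejoin what remains.
    geo = ','.join(tokens[4:])
    return {
        'first': first,
        'last': last,
        'weight': float(weight),
        'height': float(height),
        'geo': geo,
    }


# Invented lines illustrating the assumed error types.
print(parse_line_sketch('Tyler,Micajah,116.0,73.0","38.0,6.9"\n'))
print(parse_line_sketch('Julia,Wagner,142.0,,86.0,"39.8,15.4"\n'))
```

A list of such dicts, one per data line, can be passed to `pd.DataFrame` to build the final result; remember to skip the header line first.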

In [125]:
# Do not edit -- needed for tests
fp = os.path.join('data', 'malformed.csv')
cols = ['first', 'last', 'weight', 'height', 'geo']
df = parse_malformed(fp)
dg = pd.read_csv(fp, names=cols)
dg

Unnamed: 0,first,last,weight,height,geo
0,first,last,weight,height,geo
1,Julia,Wagner,142.0,86.0,"39.8,15.4"
2,Angelica,Rija,155.0,56.0,"38.2,-71.7"
3,Tyler,Micajah,116.0,"73.0""","38.0,6.9"
4,Kathleen,Nakea,163.0,69.0,"36.3,-86.8"
5,Axel,Ronit,95.0,74.0,"36.8,128.2"
6,Amiya,Kyona,130.0,72.0,"36.3,114.5"
7,Torrey,Joshuacaleb,105.0,79.0,"38.3,145.1"
8,Mariah,Alese,149.0,68.0,"36.1,45.7"
9,Grayson,Daimen,140.0,"80.0""","38.1,-72.6"


In [126]:
df = parse_malformed(fp)

In [127]:
pd.set_option('display.max_columns', None)  

In [128]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)  # None means no truncation; -1 is deprecated



In [130]:
df = parse_malformed(fp)
df

Unnamed: 0,first,last,weight,height,geo
0,Julia,Wagner,142.0,86.0,"39.8,15.4"
1,Angelica,Rija,155.0,56.0,"38.2,-71.7"
2,Tyler,Micajah,116.0,73.0,"38.0,6.9"
3,Kathleen,Nakea,163.0,69.0,"36.3,-86.8"
4,Axel,Ronit,95.0,74.0,"36.8,128.2"
5,Amiya,Kyona,130.0,72.0,"36.3,114.5"
6,Torrey,Joshuacaleb,105.0,79.0,"38.3,145.1"
7,Mariah,Alese,149.0,68.0,"36.1,45.7"
8,Grayson,Daimen,140.0,80.0,"38.1,-72.6"
9,Yvette,Trayce,179.0,67.0,"36.9,-8.3"


In [129]:
str1 = 'cool\"'
str1
str1 = str1.replace('\"', '')
str1

'cool'

In [222]:
grader.check("q9")

## Congratulations! You're done! 🏁

Submit your `.py` file to Gradescope. Note that you only need to submit the `.py` file; this notebook should not be uploaded.

Before submitting, you should ensure that all of your work is in the `.py` file. You can do this by running the doctests below, which will verify that your work passes the public tests **and** that your work is in the `.py` file. Run the cell below; you should see no output.

In [131]:
!python -m doctest lab.py

In addition, `grader.check_all()` will verify that your work passes the public tests. Ultimately, the Gradescope autograder is also going to run `grader.check_all()`, so you should ensure these pass as well (which they should if the doctests above passed).

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [132]:
grader.check_all()

q0 results: All test cases passed!

q1 results: All test cases passed!

q2 results: All test cases passed!

q3 results: All test cases passed!

q4 results: All test cases passed!

q5 results: All test cases passed!

q6 results: All test cases passed!

q7 results: All test cases passed!

q8 results: All test cases passed!

q9 results: All test cases passed!