## Problem 0

In the last homework, you analyzed a `test` function that was used to write tests for some, but not all, of the homework problems.

1. What differences do you notice between the functions that we _do_ test with this method and the ones that we _don't_ test with this method? Why do you think we chose to use it where we did?

2. If you were in charge of writing a Python package to help other programmers write automated tests in Python, what would you want it to do that the `test` function _doesn't_ do? You can talk about functionality that you wish it had, such as "I wish I could test stuff about a function besides just its return value", or about things that you wish it did _differently_, such as "I wish the way the variables get passed in were more clear. Something like this... (and then explain how you'd like to see it done)."

Please put your answers to this question in `problem_0.md`.

## Problem 1

As winter comes to an end each year in Chicago, residents have to cope with driving on city streets littered with potholes. The Chicago Department of Transportation (DOT) is responsible for fixing city streets and relies on data reported by residents through its 311 system. But how likely is it that your request will be responded to in a timely manner? To answer this question, we can look at data that the City of Chicago publishes through its [open data portal](https://data.cityofchicago.org/) on performance metrics for services and repairs such as fixing potholes, broken lights in alleys, and downed wires.

For this problem, you are provided a comma-separated values (CSV) file called `metrics.csv`. This file contains information on service requests submitted to the DOT, broken down by activity. For each activity, the file includes the target number of days to complete it and the average number of days it actually took.

Your task is to write a program that displays the average number of days to complete each activity over a calendar year for each year included in the data. This will help us to understand how fast the city responds to requests for each activity and whether they've gotten better or worse over time. The output should show all years for a given activity together (as opposed to all activities for a given year) and should be sorted by ascending year. For rows where the period spans the end of one year and the beginning of another, use the start date for determining the year.

So, the procedure is: 
1. For each _Activity_, take all the rows for that activity whose Period begins in a given _year_.
2. Each of these _Period_ rows has an average days to complete activity, and a number of total completed requests. The average days to complete activity is an average _over_ the total completed requests. So you'll need to multiply these two figures together to get the _total_ time those requests took.
3. Sum up the _total_ time that the requests for this activity took in a given year.
4. Sum up the _total number of completed requests_ for this activity in a given year.
5. Divide the total time by the total number of completed requests to arrive at an average time-per-request for the activity for the year. 

Although we have not covered it yet, you are allowed to use the csv standard library module.
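The weighted-average arithmetic in steps 2 through 5 can be sketched on a tiny in-memory table. This is only an illustration: the column names (`Activity`, `Period Start`, `Average Days`, `Total Completed`), the date format, and all of the numbers are made up, so check the header row of `metrics.csv` for the real ones.

```python
import csv
import io
from collections import defaultdict

# Hypothetical data: column names and values are invented for illustration.
SAMPLE = """Activity,Period Start,Average Days,Total Completed
Pothole,01/01/2011,10,2
Pothole,12/26/2011,4,8
Pothole,01/02/2012,6,5
"""

total_days = defaultdict(float)    # (activity, year) -> total days spent
total_requests = defaultdict(int)  # (activity, year) -> completed requests

for row in csv.DictReader(io.StringIO(SAMPLE)):
    year = int(row["Period Start"][-4:])  # assumes dates end in a 4-digit year
    key = (row["Activity"], year)
    completed = int(row["Total Completed"])
    # Average days times request count gives total days for this period.
    total_days[key] += float(row["Average Days"]) * completed
    total_requests[key] += completed

averages = {key: total_days[key] / total_requests[key] for key in total_days}
print(averages)
```

In this made-up data, the 2011 average works out to 5.2 days rather than the plain mean of 10 and 4 (which would be 7), because the second period had four times as many completed requests as the first.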

Sample output (?? replaced by answer):

```
Alley Grading-Unimproved (2011): Target=180, Average=??
Alley Grading-Unimproved (2012): Target=180, Average=??
Alley Grading-Unimproved (2013): Target=180, Average=??
Alley Grading-Unimproved (2014): Target=180, Average=??
Alley Grading-Unimproved (2015): Target=180, Average=??
Alley Grading-Unimproved (2016): Target=180, Average=??
Alley Grading-Unimproved (2017): Target=180, Average=??
Alley Grading-Unimproved (2018): Target=180, Average=??
Alley Light Out (2011): Target=30, Average=??
Alley Light Out (2012): Target=30, Average=??
Alley Light Out (2013): Target=30, Average=??
Alley Light Out (2014): Target=30, Average=??
Alley Light Out (2015): Target=30, Average=??
...
```

Problem 1 test:

In [None]:
# Don't worry about understanding these two lines.
# They are commands we use to get this notebook to autoreload
# so you don't have to rerun your kernel every time you change your homework files.
%load_ext autoreload
%autoreload 2

from problem_1 import calculate_metrics
actual = calculate_metrics("metrics.csv")

sample_solutions = [
    'Alley Grading-Unimproved (2018): Target=180, Average=438.22',
    'Bike Lane Post/Ped Xing Sign Repair (2017): Target=5, Average=1151.69',
    'CDOT Construction Complaints (2017): Target=14, Average=19.09',
    'Gym Shoe/Object On Electrical Wire (2016): Target=7, Average=110.63',
    'Landscape Median Maintenance (2011): Target=30, Average=3.93',
    'Pavement Cave-In Survey (2015): Target=3, Average=2.04'
]

issue_counter = 0
if actual is None or len(actual) != 262:
    issue_counter += 1
    print("Looks like your output is not the correct length.")

else:

    for example in sample_solutions:
        if example not in actual:
            issue_counter += 1
            print(f"Looks like your averages are not correct. Missing:{example}")
    
if issue_counter == 0:
    print("Looks like your solutions pass our tests! Yay!")

## Problem 2

Write a function called `full_paths` that joins together single components of a path to produce a full path with directories separated by slashes. For example, it should operate in the following manner:

```
>>> full_paths(['usr', ['lib', 'bin'], 'config', ['x', 'y', 'z']])
['/usr/lib/config/x',
 '/usr/lib/config/y',
 '/usr/lib/config/z',
 '/usr/bin/config/x',
 '/usr/bin/config/y',
 '/usr/bin/config/z']
>>> full_paths(['codes', ['python', 'c', 'c++'], ['Makefile']], base_path='/home/user/')
['/home/user/codes/python/Makefile',
 '/home/user/codes/c/Makefile',
 '/home/user/codes/c++/Makefile']
```

The function definition should look as follows:

```
def full_paths(path_components, base_path='/'):
    ...
```

The `path_components` argument accepts an iterable in which each item is either a list of strings or a single string. Each item in `path_components` represents a level in the directory hierarchy. The function should return every combination that takes one item from each level. Within `path_components`, a string and a list containing that single string should produce equivalent results, as `['Makefile']` demonstrates in the second example above. The `base_path` argument is a prefix that is added to every string that is returned. The function should return a list of all the path combinations (a list of strings).

If you need to check whether a variable is iterable, the "Pythonic" way to do this is

```
from collections.abc import Iterable

if isinstance(x, Iterable):
    ...
```

However, note that strings are iterable too!

You are allowed to use functionality from the standard library for this problem.
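One way to handle the string-versus-list distinction is to normalize every level to a list before combining. The sketch below is only an illustration: `as_list` is a hypothetical helper name, and `itertools.product` is one standard-library tool that yields every combination of one item per level, not necessarily the approach you are expected to take.

```python
from itertools import product


def as_list(level):
    """Wrap a lone string in a list; turn any other iterable into a list."""
    # Check for str first, because a string is itself iterable.
    if isinstance(level, str):
        return [level]
    return list(level)


levels = [as_list(level) for level in ['a', ['b', 'c'], ['d']]]
print(levels)   # [['a'], ['b', 'c'], ['d']]

# product picks one item from each level, varying the last level fastest.
combos = list(product(*levels))
print(combos)   # [('a', 'b', 'd'), ('a', 'c', 'd')]
```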




In [None]:
from problem_2 import full_paths

path_components_1 = ['usr', ['lib', 'bin'], 'config', ['x', 'y', 'z']]
expected_1 = [
    '/usr/lib/config/x',
    '/usr/lib/config/y',
    '/usr/lib/config/z',
    '/usr/bin/config/x',
    '/usr/bin/config/y',
    '/usr/bin/config/z'
]

path_components_2 = ['codes', ['python', 'c', 'c++'], ['Makefile']]
base_path = '/home/user/'
expected_2 = [
    '/home/user/codes/python/Makefile',
    '/home/user/codes/c/Makefile',
    '/home/user/codes/c++/Makefile'
]

issues = 0

if full_paths(path_components_1) != expected_1:
    issues += 1
    print(f"Looks like a path component of \n {path_components_1} \n doesn't return the correct solution of \n {expected_1}")

if full_paths(path_components_2, base_path) != expected_2:
    issues += 1
    print(f"Looks like a path component of \n {path_components_2} with a base path of \n {base_path} doesn't return the correct solution of \n {expected_2}")

# Only report success when both checks passed.
if issues == 0:
    print("Looks like your solutions pass our tests! Nice!")

## Problem 3

Nowadays we take word completion for granted. Our phones, text editors, and word processing programs all give us suggestions for how to complete words as we type based on the letters typed so far. These hints help speed up user input and eliminate common typographical mistakes (but can also be frustrating when the tool insists on completing a word that you don't want completed).

You will implement two functions that such tools might use to provide command completion. The first function, `fill_completions`, will construct a dictionary designed to permit easy calculation of possible word completions. A problem for any such function is what vocabulary, or set of words, to allow completion on. Because the vocabulary you want may depend on the domain a tool is used in, you will provide `fill_completions` with a representative sample of documents from which it will build the completions dictionary. The second function, `find_completions`, will return the set of possible completions for the start of any word in the vocabulary (or the empty set if there are none). In addition to these two functions, you will implement a simple main program to use for testing your functions.

### Specifications

* `fill_completions(fd)` returns a dictionary. This function takes as input an opened file. It loops through each line in the file, splitting the lines into individual words (separated by whitespace) and builds a dictionary:

    * The keys of the dictionary are tuples of the form `(n, l)` for a non-negative integer n and a lowercase letter l.
    * The value associated with key `(n, l)` is the set of words in the file that contain the letter `l` at position `n`. For simplicity, all vocabulary words are converted to lower case. For example, if the file contains the word "Python" and `c_dict` is the returned dictionary, then the sets `c_dict[0, "p"]`, `c_dict[1, "y"]`, `c_dict[2, "t"]`, `c_dict[3, "h"]`, `c_dict[4, "o"]`, and `c_dict[5, "n"]` all contain the word "python".
    * Words are stripped of leading and trailing punctuation.
    * Words containing non-alphabetic characters are ignored, as are words of length 1 (since there is no reason to complete the latter).
    
* `find_completions(prefix, c_dict)` returns a set of strings. This function takes a prefix of a vocabulary word and a completions dictionary of the form described above. It returns the set of vocabulary words in the completions dictionary, if any, that complete the prefix. If the prefix cannot be completed to any vocabulary word, the function returns an empty set.
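
To make the shape of the dictionary concrete, here is a hand-built completions dictionary for a three-word vocabulary. The words are made up for illustration, and the loop skips the file reading, lowercasing, punctuation stripping, and filtering that `fill_completions` has to do.

```python
from collections import defaultdict

words = ["python", "pylon", "java"]  # hypothetical vocabulary

c_dict = defaultdict(set)
for word in words:
    for position, letter in enumerate(word):
        # Key is (position, letter); value is every word with that letter there.
        c_dict[position, letter].add(word)

print(c_dict[0, "p"])  # both words that start with "p"
print(c_dict[2, "t"])  # only "python" has "t" at position 2

# Words that begin with "py" appear in both the (0, "p") and (1, "y") sets.
matches = c_dict[0, "p"] & c_dict[1, "y"]
print(matches)
```

Set intersection is one possible way to combine the per-letter sets; whether you use it in `find_completions` is up to you.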

In [None]:
from problem_3 import find_completions, fill_completions

issues = 0

with open('articles.txt', 'r', encoding="utf-8") as f:
    c_dict = fill_completions(f)
    
print(len(c_dict))
    
if find_completions("za", c_dict) != {'zara', 'zakharova', 'zapad'}:
    issues += 1
    print("Looks like the za prefix did not return the correct results of {'zakharova', 'zapad', 'zara'}.")

if find_completions("lum", c_dict) != {'lumley', 'lump', 'lumet'}:
    issues += 1
    print("Looks like the lum prefix did not return the correct results of {'lumley', 'lump', 'lumet'}.")

if find_completions("multis", c_dict) != set():
    issues += 1
    print("Looks like the multis prefix did not return the correct response of an empty set.")

print(f"{3 - issues} out of 3 checks succeeded.")

## Problem 4

The City of Townsville has, until now, had an annual **flat rate income tax**: that is, everyone paid 5% of their income to fund city operations.

The city is now considering a proposal to instead adopt a **graduated income tax**, such that people with higher incomes will have a higher tax rate on the portion of their income that falls into different brackets.

They have hired you to write a program to figure out how much a person would pay in taxes under the new system, given their income.

The proposed brackets are:

```
- For income up to $40,000, zero taxes
- For income between $40,000 and $60,000, a 2% tax rate
- For income between $60,000 and $90,000, a 5% tax rate
- For income between $90,000 and $120,000, a 7% tax rate
- For income between $120,000 and $150,000, a 9% tax rate
- For income between $150,000 and $180,000, an 11% tax rate
- For income between $180,000 and $210,000, a 13% tax rate
- For income between $210,000 and $240,000, a 15% tax rate
- Above $240,000, a 20% tax rate
```
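
As a worked example of how the brackets apply, take an income of \$70,000. Each slice of the income is taxed at the rate of the bracket it falls into, and the slices are added up. The variable names below are only for illustration; this is hand arithmetic for one income, not a general solution.

```python
income = 70_000

# Each slice of income is taxed at its own bracket's rate.
slice_0 = 40_000 * 0.00              # first $40,000: no tax
slice_1 = (60_000 - 40_000) * 0.02   # next $20,000 at 2%
slice_2 = (income - 60_000) * 0.05   # remaining $10,000 at 5%

tax = slice_0 + slice_1 + slice_2
print(round(tax, 2))
```

The slices come to \$0, \$400, and \$500, for a total of \$900.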

When you've done it right, running the test in the following cell should produce "8/8 examples worked as expected."

In [None]:
#Here, we have imported a module containing the same code you explained
#in problem 0 from the previous chapter.
from that_test_function_from_last_chapter import test 
from problem_4 import taxes_owed

tax_examples = [
    (40000, 0.0),
    (42000, 40.0),
    (70000, 900.0),
    (100000, 2600.0),
    (125000, 4450.0),
    (187000, 10910.0),
    (238000, 18100.0),
    (1000000, 170400.0),
]

test(function=taxes_owed, examples=tax_examples)

## Problem 5

The City of Townsville is so happy with your work on the tax bracket function that they want you to help them assess the impact of the change on the city.

They care about three things:

1. **Total revenue:** The city needs revenue to fix potholes, replace the water main, and issue loans to local businesses, like the Townsville Cafe and the City Bookstore, that were hit hard by a pandemic lockdown. With any extra, they'd like to provide stimulus checks for folks who were laid off.

2. **Folks under poverty line:** In Townsville, the poverty line is \$40,000. The city would like to reduce the number of residents earning less than that by as much as possible.

3. **Poverty burden:** The city would like to know how much money, total, folks under the poverty line are falling short of $40,000. They use this number to inform budgets for poverty relief efforts.

The file `townsville.csv` contains the city's data about its residents and their income.

Write a program that calculates from this file the total tax revenue, the number of people under the poverty line, and the summed amount by which they fall under the poverty line after taxes:
- With a 5\% flat income tax
- With the proposed graduated income tax
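
The three quantities can be sketched for the flat tax on a made-up list of incomes. The incomes and variable names here are hypothetical, and the sketch assumes that both the head count and the shortfall are measured on income after taxes; it does not read `townsville.csv`.

```python
POVERTY_LINE = 40_000
FLAT_RATE = 0.05

incomes = [30_000, 41_000, 100_000]  # hypothetical residents

taxes = [income * FLAT_RATE for income in incomes]
after_tax = [income - tax for income, tax in zip(incomes, taxes)]

# Total revenue is the sum of everyone's taxes.
total_revenue = sum(taxes)

# Residents whose after-tax income falls below the poverty line.
below_line = [pay for pay in after_tax if pay < POVERTY_LINE]
people_under = len(below_line)

# Poverty burden: how far, in total, those residents fall short of the line.
poverty_burden = sum(POVERTY_LINE - pay for pay in below_line)

print(round(total_revenue, 2), people_under, round(poverty_burden, 2))
```

Note that the resident earning \$41,000 is above the line before taxes but below it afterwards, which is why the choice of tax system can change the head count.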

In [None]:
from test import test 
from problem_5 import total_impact

impact_examples = [
    ("flat_rate", (146450.0, 13, 55450.0)),
    ("graduated_rate", (300300.0, 8, 36000.0)),
]

test(function=total_impact, examples=impact_examples)

## Problem 6

The City of Townsville anticipates that people are going to get very angry if they move to a graduated income tax, because people will be worried that they personally have to pay more in taxes.

Write a function called `inflection_point` that calculates the exact income at which a person will owe more under the graduated system than under the flat 5% rate, so that Townsville can get an idea of how many people would be affected this way and how high their incomes are.

In [None]:
from problem_6 import inflection_point

inflection_income = inflection_point()
inflection_income

if inflection_income == 163334.0:
    print(f"Yep, ${inflection_income} is the income at which someone starts paying more in graduated income tax than with the flat rate.")
else:
    print("Nope, try again.")

**Important note**: The realities of implementing a change to a local tax code are somewhat more complex than what problems 4-6 describe. The city would have to decide how to distribute loans and stimulus, for example, which folks might have opinions about. But the _math_ is accurate: In a graduated income tax system, the lucky lawyer who takes home a \$250,000 salary does not get taxed 20\% on the whole thing. They only get taxed 20\% on the _last_ \$10,000, and each dollar before that gets taxed at whatever _lower_ tax bracket it falls into. 

In other words, **there is no circumstance in a graduated income tax system** where a higher salary results in lower take-home pay because of taxes.
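
The \$250,000 example can be checked with a few lines of arithmetic using the proposed Townsville brackets. The variable names are only for illustration, and `naive_tax` shows the common misreading, not how the system works.

```python
salary = 250_000

# Tax on the first $240,000, summed bracket by bracket from the proposed table.
tax_below_top = (
    20_000 * 0.02 + 30_000 * 0.05 + 30_000 * 0.07 + 30_000 * 0.09
    + 30_000 * 0.11 + 30_000 * 0.13 + 30_000 * 0.15
)

# Only the last $10,000 is taxed at the top rate of 20%.
marginal_tax = tax_below_top + (salary - 240_000) * 0.20

# The misreading: 20% applied to the entire salary.
naive_tax = salary * 0.20

print(round(marginal_tax, 2), round(naive_tax, 2))
```

Under the proposed brackets the lawyer owes \$20,400, which is about 8.2% of the salary overall, rather than the \$50,000 that a 20% rate on the whole salary would imply.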

Also worth noting: the tax rates in this example problem are unrealistically _high_. By contrast, when the state of Illinois proposed a graduated income tax in 2020, [the highest tax rate on the table was about 8%](https://www.ilga.gov/legislation/publicacts/101/101-0008.htm), and that was to be for income above \$750,000. California has a graduated income tax with the highest top tax bracket in the United States at [13.3\%, on income above a million dollars](https://www.nerdwallet.com/article/taxes/california-state-tax). 

The 2020 proposal in Illinois to repeal the law requiring a flat rate income tax [failed](https://ballotpedia.org/Illinois_Allow_for_Graduated_Income_Tax_Amendment_(2020)) in large part because households whose taxes would have been _lower_ as a result of the change voted _against_ pursuing it.

**This is why it's important for people in high-leverage positions, such as yourself, to understand how to market and explain your ideas.**