# Justice Innovation Lab Performance Task

Thank you for your interest in the Justice Innovation Lab. We appreciate the time you are taking to apply, and we hope that this performance task gives you a chance to demonstrate some of your skills in the context of hypothetical examples.

# Instructions

**PLEASE READ CAREFULLY**

This is not a timed task, but we designed it to take no more than 6 hours. You may add more cells for your work, but please do not delete any existing notebook cells.

**Please use Python or R.** Please use this ipynb file to supply your answers. It should be completed using either a Python or R kernel. If you do not have access to a computer capable of running your Python or R code, you may use free online computing services such as Kaggle or Colab, or any other online computing service. We have tested both Kaggle and Colab, and they are sufficiently powerful to complete all the tasks below.
 - https://www.kaggle.com/notebooks/welcome
 - https://colab.research.google.com/
 

**Please show your work.** We are interested in your thought process, so please show and annotate your code. We prefer that you display images inline rather than in separate files. If you need to install any packages, please include the code required to install them, e.g.:

> ```! pip install new_package  # (for python)```

> ```install.packages("new.package", dependencies = TRUE)  # (for R)```


**Please send us a single compressed file.** When you are done, please follow the directions at the end to submit your work.

**Please let us know if you have any questions.** If you have questions about the task, please email rory.pulvino@justiceinnovationlab.org (we promise there's a human on the other end, but please expect slow response times).

# Name, Date & Email
Please enter your full name, the date, and your email address in the markdown space below.

Name: 

Date: 

Email: 



# Section 1: Data Manipulation
Suppose a jurisdiction has data on:

Recent traffic stops by officers of a local police department (traffic stops data):
- https://drive.google.com/uc?export=download&id=1fp7716e-l7RZusH6wo8UeUQNM7J0dyYN

Crime incidents reported to the police department (crime incidents data): 
- https://drive.google.com/uc?export=download&id=12gnXMitHQM1nTqBm20Gcq84r7qgttWY-

Using these data, please perform the following tasks.

## Task 1.A 
Load the data we provided and view a few rows of each. Note: if you read in the courts data for Section 2 with `pd.read_csv` in Python, you may need to use the following syntax to read from the URL:

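```python
import pandas as pd
courts_data_url = "https://drive.google.com/uc?export=download&id=1CLyfh4-_oc8mTRdh4Fl9vH7JYOsUcUom"  # court outcomes data (Section 2)
courts_df = pd.read_csv(courts_data_url, names=['stop_id', 'ticket_amount',
                                                'ticket_status'], skiprows=1)
```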

*Reminder*: for all questions, please write your code in this notebook!  This is what we will review and score. 

## Task 1.B
Crime incidents are reported down to the hour/minute, while stops are only reported down to the day.

Please create a new column based on the `REPORT_DAT` column of the crime incidents data that rounds the exact timestamp for the crime down to the day - for instance, "2017-06-03T12:54:27.000Z" would become "2017-06-03". Call this column `report_daily`.
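A minimal pandas sketch of this step, assuming the crime incidents data is loaded into a DataFrame whose `REPORT_DAT` column holds ISO-8601 strings like the example above. `add_report_daily` is a hypothetical helper name.

```python
import pandas as pd


def add_report_daily(crimes: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `crimes` with a `report_daily` column holding the report day."""
    out = crimes.copy()
    # Parse strings such as "2017-06-03T12:54:27.000Z" as timezone-aware UTC timestamps
    reported = pd.to_datetime(out["REPORT_DAT"], utc=True)
    # Round down to midnight, then drop the timezone so the column is a plain date-time
    out["report_daily"] = reported.dt.floor("D").dt.tz_localize(None)
    return out
```

Keeping a datetime dtype (rather than a string) makes the later date-range checks simpler. The day here is the UTC day, as the `Z` suffix indicates; converting to local time first (for example with `dt.tz_convert("America/New_York")`) would move reports timestamped shortly after midnight UTC to the previous day.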

## Task 1.C 
Create a new data set `crimes_by_day` that gives the total count of crime incidents per day. For the purposes of aggregating, you can consider each unique `OBJECTID` as reflecting a different crime incident.

Similarly, create a new data set `stops_by_day` that gives the total count of traffic stops per day. For the purposes of aggregating, you can consider each unique `stop_id` as reflecting a different stop.
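Both aggregations follow the same pattern, so one hypothetical helper can build either table. This sketch assumes `crimes` already has `report_daily` (from Task 1.B) and that the stops date column is called `date`; check the actual column name in the stops file.

```python
import pandas as pd


def count_per_day(df: pd.DataFrame, date_col: str, id_col: str, name: str) -> pd.DataFrame:
    """Count unique IDs per day; returns a table with columns [date_col, name]."""
    return (
        df.groupby(date_col)[id_col]
        .nunique()  # each unique ID counts as one incident / stop
        .rename(name)
        .reset_index()
    )


# Example usage (column names other than OBJECTID and stop_id are assumptions):
# crimes_by_day = count_per_day(crimes, "report_daily", "OBJECTID", "n_crimes")
# stops_by_day = count_per_day(stops, "date", "stop_id", "n_stops")
```

Giving both tables the same date column name (for example by renaming `report_daily` to `date`) makes the merge in Task 1.E a one-liner.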

## Task 1.D
For either the traffic stops by day or crime incidents by day, check if there are days missing in the period measured. If so, fill them with the mean across all other days for that data.
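A sketch of the gap check and mean fill, assuming a daily table like those from Task 1.C with a datetime date column. It builds the full calendar between the first and last observed day with `pd.date_range`, reindexes onto it, and fills the new days with the mean of the observed days. `fill_missing_days` is a hypothetical name.

```python
import pandas as pd


def fill_missing_days(daily: pd.DataFrame, date_col: str, value_col: str) -> pd.DataFrame:
    """Add any missing calendar days and fill them with the mean of the observed days."""
    counts = daily.set_index(date_col)[value_col]
    # Every calendar day between the first and last observed day
    full_range = pd.date_range(counts.index.min(), counts.index.max(), freq="D")
    # Mean over observed days only, computed before the new (empty) days are added
    mean_value = counts.mean()
    filled = counts.reindex(full_range).fillna(mean_value)
    return filled.rename_axis(date_col).reset_index()
```

`full_range.difference(counts.index)` lists the missing days if you want to report them. Filled days will usually hold non-integer counts; whether to round them is a judgment call worth stating in your answer.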


## Task 1.E
Merge the traffic stops by day data with the crimes by day data. Which specific date has the most traffic stops? Which specific date has the most crime incidents?
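One way to do the merge and look up the peak days, assuming both daily tables name their date column `date` and their count columns `n_stops` and `n_crimes` (all hypothetical names). An outer join keeps days that appear in only one table; those days get `NaN` for the other count.

```python
import pandas as pd


def merge_daily(stops_by_day, crimes_by_day, date_col="date"):
    """Outer-join the two daily tables on the date column, sorted by date."""
    merged = stops_by_day.merge(crimes_by_day, on=date_col, how="outer")
    return merged.sort_values(date_col).reset_index(drop=True)


def peak_day(merged, value_col, date_col="date"):
    """Date with the highest value_col (idxmax skips NaN and returns the first maximum on ties)."""
    return merged.loc[merged[value_col].idxmax(), date_col]
```

For example, `peak_day(merge_daily(stops_by_day, crimes_by_day), "n_stops")`. Since `idxmax` reports only the first of any tied maxima, it is worth checking for ties before naming a single date.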

## Task 1.F 
Write a function that: 
1. tests for missing dates in a date column of a dataframe,
2. appends any missing dates to the dataframe as new rows, and
3. uses an argument to choose whether missing values in another column are imputed with the mean or the median.
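A sketch of such a function under a few assumptions: dates are daily by default (`freq="D"`), the statistic is computed from the observed values only, and a boolean `imputed` column is added to flag filled values. The name `impute_missing_dates` and its parameters are hypothetical.

```python
import pandas as pd


def impute_missing_dates(df, date_col, value_col, method="mean", freq="D"):
    """Append rows for dates missing from `date_col` and impute `value_col`.

    method: "mean" or "median", computed over the observed values of `value_col`.
    Adds a boolean `imputed` column marking every value that was filled in.
    """
    if method not in ("mean", "median"):
        raise ValueError("method must be 'mean' or 'median'")
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    # Statistic over observed values only (pandas skips NaN by default)
    fill_value = out[value_col].mean() if method == "mean" else out[value_col].median()

    # 1. Test for missing dates between the first and last observed date
    full_range = pd.date_range(out[date_col].min(), out[date_col].max(), freq=freq)
    missing = full_range.difference(pd.DatetimeIndex(out[date_col]))

    # 2. Append one row per missing date; other columns are left as NaN
    if len(missing) > 0:
        out = pd.concat([out, pd.DataFrame({date_col: missing})], ignore_index=True)

    # 3. Flag and impute every missing value in value_col
    out["imputed"] = out[value_col].isna()
    out[value_col] = out[value_col].fillna(fill_value)
    return out.sort_values(date_col).reset_index(drop=True)
```

Example: `impute_missing_dates(stops_by_day, "date", "n_stops", method="median")`. Keeping the `imputed` flag lets later analyses exclude or down-weight filled days.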

## Task 1.G
Create a plot where the x axis is the date and the y axis is the ratio of traffic stops to crime incidents. Make sure the axis labels are informative.

*Note*: make sure your code prints out the plot. You do not need to attach the plot here.
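A matplotlib sketch of the plot, assuming the merged table from Task 1.E with hypothetical columns `date`, `n_stops`, and `n_crimes`. Days with zero crimes would give an infinite ratio, so they are set to missing here, and a 7-row rolling mean is overlaid to make seasonal movement easier to see.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_stop_crime_ratio(merged, date_col="date", stops_col="n_stops", crimes_col="n_crimes"):
    """Plot the daily ratio of traffic stops to crime incidents; return (fig, ratio)."""
    data = merged.sort_values(date_col)
    # Division by a zero crime count gives inf; treat those days as missing
    ratio = (data[stops_col] / data[crimes_col]).replace([np.inf, -np.inf], np.nan)
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(data[date_col], ratio, linewidth=0.6, alpha=0.5, label="Daily ratio")
    # 7-row rolling mean; equals a 7-day mean only if every day has a row (see Task 1.D)
    ax.plot(data[date_col], ratio.rolling(7, min_periods=1).mean(), label="7-day rolling mean")
    ax.set_xlabel("Date")
    ax.set_ylabel("Traffic stops per reported crime incident")
    ax.set_title("Daily ratio of traffic stops to crime incidents")
    ax.legend()
    fig.autofmt_xdate()
    return fig, ratio
```

In the notebook, call `fig, ratio = plot_stop_crime_ratio(merged)` followed by `plt.show()` so the plot is printed inline.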

What seasonal patterns do you notice? Explain in 1-2 sentences.


## Task 1.H 
Create a new data set `offenses_by_month` that gives the total count of crime incidents by offense type per month. For the purposes of aggregating, you can consider each unique `OBJECTID` as reflecting a different crime incident. Keep in mind that this table should be interpretable by a non-technical audience, which may require manipulating the underlying data.

[Explain any manipulation of the dataset you performed and demonstrate how you would make such manipulation repeatable for future similar datasets.]
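A sketch of one repeatable approach, assuming the offense type is in a column called `OFFENSE` (check this against the file). Raw offense codes are translated through a lookup dictionary, and unmapped codes fall back to title case, so a future file with new codes still produces a readable table. The dictionary below is a hypothetical starting point, not a complete mapping.

```python
import pandas as pd

# Hypothetical mapping from raw offense codes to plain-language labels;
# inspect crimes["OFFENSE"].unique() and extend this for the real data.
OFFENSE_LABELS = {
    "THEFT/OTHER": "Theft (other)",
    "THEFT F/AUTO": "Theft from a vehicle",
    "MOTOR VEHICLE THEFT": "Motor vehicle theft",
    "ASSAULT W/DANGEROUS WEAPON": "Assault with a dangerous weapon",
}


def offenses_by_month(crimes, offense_col="OFFENSE", date_col="REPORT_DAT",
                      id_col="OBJECTID", labels=OFFENSE_LABELS):
    """Wide table: one row per offense type, one column per month, unique incident counts."""
    df = crimes.copy()
    # Month as a readable string such as "2021-03" (UTC month of the report)
    df["month"] = pd.to_datetime(df[date_col], utc=True).dt.strftime("%Y-%m")
    # Map codes to labels; unmapped codes fall back to title-cased raw text
    df["offense"] = df[offense_col].map(labels).fillna(df[offense_col].str.title())
    table = (
        df.groupby(["offense", "month"])[id_col]
        .nunique()
        .unstack("month", fill_value=0)
    )
    table.index.name = "Offense type"
    table.columns.name = "Month"
    return table
```

Keeping the mapping in one dictionary (or a small CSV read at run time) means the same cleaning can be rerun on next year's file by updating the labels rather than the code.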

## Task 1.I 
The crime data used here was gathered from the MPD's open data site. We downloaded the 2021 data set on March 15, 2021, and it is available here: https://drive.google.com/uc?export=download&id=1HyslCBSZ7hzeFWw3PLogatrrveoHTWUJ. Please write a function that downloads the most recent version of the data from the [open data site](https://opendata.dc.gov/datasets/crime-incidents-in-2021) and appends any new incidents to the existing dataset, with an indicator marking them as newly added.
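A sketch that separates the download from the append logic so the latter can be tested offline. It assumes `OBJECTID` identifies an incident across downloads; if IDs are reassigned when the dataset is republished, a case number column may be a safer key. The `latest_csv_url` parameter is an assumption: pass the CSV download link shown on the dataset page. Both function names are hypothetical.

```python
import pandas as pd


def append_new_incidents(existing, latest, id_col="OBJECTID"):
    """Return `existing` plus the rows of `latest` whose `id_col` value is new.

    Adds a boolean `newly_added` column: True only for rows appended in this call.
    """
    old = existing.copy()
    old["newly_added"] = False  # reset, so the flag reflects only this update
    new = latest.loc[~latest[id_col].isin(old[id_col])].copy()
    new["newly_added"] = True
    return pd.concat([old, new], ignore_index=True)


def update_from_open_data(existing, latest_csv_url, id_col="OBJECTID"):
    """Download the current CSV export and append any new incidents (requires network)."""
    latest = pd.read_csv(latest_csv_url)
    return append_new_incidents(existing, latest, id_col=id_col)
```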

# Section 2: Policy Evaluation
Suppose the city government enacted a policy that allows people with low incomes (less than $30,000 per year) to have their ticket amount reduced by 20 to 40 percent. The goal of the new policy is to reduce the number of overdue tickets among low-income residents.

In this section, you'll use the same traffic stop data as in Section 1. You'll also use data on the court outcomes of tickets: 
- https://drive.google.com/uc?export=download&id=1CLyfh4-_oc8mTRdh4Fl9vH7JYOsUcUom

and data on residents' income:
- https://drive.google.com/uc?export=download&id=1CaERRReTMStFgutGEJo6hC34Od7mrHSG

Please use these three data sets to answer the following questions.


## Task 2.A
Determine whether there is a large difference between the ticket amounts for people with incomes less than \\$30,000 and those with incomes greater than or equal to \\$30,000. (Tickets are given out in \\$1.00 increments).
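A sketch of one way to quantify the difference, assuming the three files have already been joined into one DataFrame with a numeric income column (called `income` here, a hypothetical name; the income file's actual columns and join key need checking) and the court data's `ticket_amount`. It reports summary statistics for each group plus a two-sided permutation test on the difference in mean ticket amounts. The p-value speaks to whether the gap could be chance; whether it is *large* is a practical judgment against typical ticket amounts.

```python
import numpy as np
import pandas as pd


def compare_ticket_amounts(df, income_col="income", amount_col="ticket_amount",
                           threshold=30_000, n_perm=5_000, seed=0):
    """Summarize ticket amounts below/at-or-above an income threshold and test the mean gap.

    Returns (summary table, observed difference in means (low minus high), p-value).
    """
    data = df[[income_col, amount_col]].dropna()
    is_low = (data[income_col] < threshold).to_numpy()
    labels = np.where(is_low, f"< ${threshold:,}", f">= ${threshold:,}")
    summary = data.groupby(labels)[amount_col].describe()

    amounts = data[amount_col].to_numpy(dtype=float)
    observed = amounts[is_low].mean() - amounts[~is_low].mean()

    # Permutation test: shuffle group labels to see how often a gap this large arises by chance
    rng = np.random.default_rng(seed)
    perm_diffs = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(is_low)
        perm_diffs[i] = amounts[shuffled].mean() - amounts[~shuffled].mean()
    # Two-sided p-value; the +1 terms count the observed labelling as one permutation
    p_value = (np.sum(np.abs(perm_diffs) >= abs(observed)) + 1) / (n_perm + 1)
    return summary, observed, p_value
```

A permutation test is used here because it makes no normality assumption about ticket amounts; a Welch t-test would be a reasonable alternative.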


Explain your answer in 1-2 sentences.

 
[insert explanation here]

 

## Task 2.B

The stops data also contains information on the individual's race. Using the `driver_race` and `ticket` columns, determine whether there are differences in who is stopped and in whether they receive a ticket. Present your findings in a table, along with (1) a description of the table and (2) an explanation of the results.
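A sketch of one possible summary table, assuming `ticket` is coded 0/1 or True/False (a string coding such as "yes"/"no" would need converting first, since `astype(bool)` treats any non-empty string and any `NaN` as True). `stops_and_tickets_by_race` is a hypothetical name.

```python
import pandas as pd


def stops_and_tickets_by_race(stops, race_col="driver_race", ticket_col="ticket"):
    """One row per race: number of stops, share of all stops, tickets, and ticket rate."""
    ticketed = stops[ticket_col].astype(bool)  # assumes 0/1 or True/False coding
    table = (
        stops.assign(ticketed=ticketed)
        .groupby(race_col)
        .agg(stops=("ticketed", "size"), tickets=("ticketed", "sum"))
    )
    table["share_of_stops"] = table["stops"] / table["stops"].sum()
    table["ticket_rate"] = table["tickets"] / table["stops"]  # tickets per stop
    return table.sort_values("stops", ascending=False)
```

The ticket rate compares outcomes among people who were stopped. The share of stops on its own does not show a disparity until it is compared with a benchmark, such as each group's share of the local population or of drivers.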

[insert explanation here]

# Section 3: General Questions

## Task 3.A

Explain your interest in JIL's racial equity work.

[insert explanation here]

## Task 3.B

Give a specific example where you were responsible for writing code to clean and analyze data. Be sure to include the tools you used, whether the process was for professional or academic purposes, and whether the process you devised could be run without your input.

[insert explanation here]

## Task 3.C

Explain your experience with GitHub, i.e., how often you use GitHub, which commands you are familiar with, etc.

[insert explanation here]

# Send us your work!!!

Once you have completed the tasks:
 - Create a PDF or knitted HTML file of this notebook.
 - Compress **this IPYNB file** and the PDF or HTML file along with any accompanying files or scripts you have created that are necessary to show your work into a SINGLE COMPRESSED FILE. (e.g., zip, bz2, gz, tar, xz). 
 - Name the compressed file with your full name, e.g. lastname-firstname.zip.

Do BOTH of the following: 
 - Upload the compressed file here: https://app.box.com/f/9b7682f783314340ae57c96091b723e9
 - Email the compressed file to hiring@justiceinnovationlab.org with the subject line 'Data Fellow Performance Task'.

What to expect next:
 - We will review completed tasks as they are submitted, hopefully within a week of submission.
 - After reviewing your performance task we will get back to you with next steps e.g. submission of a resume and cover letter.
