## Analysing VPD Crime Data

We've uploaded a tiny portion of the crime data shared by the [Vancouver Police Department](https://vancouver.ca/police/)'s [Open Data initiative](https://geodash.vpd.ca/opendata/). The complete file has well over half a million rows. The portion we uploaded contains all 2018 crimes labelled "break and enter" (in two variants: commercial and residential) or "theft of" (in two variants: vehicle and bicycle).

You can see our information file in this directory named `crimedata_subset_bne_theft_of_bike_veh_2018.csv`. You can also find the license for this information and a PDF file from VPD describing the information source.

Let's see if we can answer the question: At what time of day does crime of various types peak in Vancouver?

We'll **start from the project final submission template** to get good practice both on using HtDAP and preparing for the project! (We've edited this slightly to note places where we'll deviate from the project.)

### Step 1a: Planning 
#### Identify the information in the file your program will read

This is information from VPD about crimes in Vancouver with the following fields:
+ TYPE: the type of crime activity, one of "Break and Enter Commercial", "Break and Enter Residential/Other", "Theft of Bicycle", and "Theft of Vehicle". These are largely self-explanatory; the "Other" in "Residential/Other" means other non-commercial buildings.
+ YEAR,MONTH,DAY,HOUR,MINUTE: the reported time of the crime: its 4-digit year, its month as a number 1-12, its day as a number 1-31, its hour in 24-hour format (0-23, where 0 is midnight), and its minute (0-59). A time of 0 (midnight) sometimes means "data is missing" rather than an actual time.
+ HUNDRED_BLOCK,NEIGHBOURHOOD,X,Y: reported location of the crime, with the rough address, the Vancouver neighbourhood (like "Kitsilano"), and the "easting" and "northing" of the location in metres from a reference point on the surface of the Earth
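To see how one line of the file turns into these fields, here is a minimal sketch using Python's `csv` module on a single made-up row (hypothetical values, not real VPD data). It assumes the columns appear in exactly the order listed above, so TYPE is at index 0 and HOUR at index 4; `parse_hour` is a hypothetical helper name.

```python
import csv
import io
from typing import Optional

# A made-up row in the column order listed above (hypothetical values, not real VPD data).
SAMPLE = "Theft of Bicycle,2018,6,14,17,30,10XX W 10TH AVE,Fairview,490000.0,5456000.0\n"


def parse_hour(hour_field: str) -> Optional[int]:
    """Converts the HOUR column to an int, returning None for 0 because midnight may mean 'missing'."""
    hour = int(hour_field)
    if hour == 0:
        return None  # treat midnight as an unknown time rather than a real one
    return hour


# csv.reader accepts any iterable of lines; io.StringIO stands in for an open file here.
row = next(csv.reader(io.StringIO(SAMPLE)))
crime_type = row[0]          # TYPE column
hour = parse_hour(row[4])    # HOUR is the fifth column (index 4)
print(crime_type, hour)      # Theft of Bicycle 17
```

Using `csv.reader` rather than `str.split(',')` is the safer choice because it also handles quoted fields that contain commas.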

### Step 1b: Planning 
#### Write a description of what your program will produce

(Because this is Module 7, we're not doing graphs or charts. We've also decided to focus on types of crime and when in the day they occur.)

Thoughts on what questions we could ask, or answers we could explore, with this information:
+ line chart of the frequency of crime (of each type, i.e., 4 lines) against the hour of the day
+ make a map of frequency of crime by neighbourhood for a particular type
+ make a pie chart of types of crime (by frequency)
+ chart of types of crime vs. month

All good starting points, but... Karina and Steve command that we do:
**Find the time of day (hour) at which a given type of crime is most common.** (Remember that midnight (0) is handled a little strangely.)
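Stated as a formula (the symbols $D$, $t$, and $h^{*}$ are notation introduced here for illustration): given the list $D$ of crime records and a chosen crime type $t$, the answer is the hour whose count of matching records is largest, with hour 0 left out because midnight may mean the time is missing.

$$
h^{*} = \underset{h \in \{1, \dots, 23\}}{\arg\max} \; \bigl|\{\, c \in D \;:\; c.\mathit{type} = t \text{ and } c.\mathit{hour} = h \,\}\bigr|
$$

If two hours tie, the formula alone does not pick one, so the program needs a tie-breaking rule (for example, the earliest hour).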

### Step 1c: Planning 
#### Write or draw examples of what your program will produce

```python
expect(main("crime-data.csv", CrimeType.BE_COMM), 8)
```

This is just an example: if 8 AM were the most frequent hour of the day for commercial break-and-enters, `main` would return 8.



To include a sketch, make a markdown cell, open the cell, and from the VERY bottom of the Edit menu, select Insert Image.

We didn't insert one because Steve's computer barfed.

### Step 2a: Building
#### Design data definitions

We need the crime type (to select the one chosen by whoever calls main) and the hour (to find the one with the most crimes of that type).

In [1]:
from cs103 import *
from typing import NamedTuple, List
from enum import Enum
import csv

##################
# Data Definitions

CrimeType = Enum('CrimeType', ['BE_COMM', 'BE_RES', 'THEFT_OF_VEHICLE', 'THEFT_OF_BICYCLE'])
# interp. the type of a crime, one of break-and-enter of a commercial building 
# (BE_COMM), break-and-enter of a residential (or other) building (BE_RES),
# theft of a motor vehicle (THEFT_OF_VEHICLE), or theft of a bicycle
# (THEFT_OF_BICYCLE).
# examples are redundant for enumerations

# template based on enumeration (4 cases)
@typecheck
def fn_for_crime_type(ct: CrimeType) -> ...:
    if ct == CrimeType.BE_COMM:
        return ...
    elif ct == CrimeType.BE_RES:
        return ...
    elif ct == CrimeType.THEFT_OF_VEHICLE:
        return ...
    elif ct == CrimeType.THEFT_OF_BICYCLE:
        return ...

CrimeData = NamedTuple('CrimeData', [('hour', int),         # in range [1, 23]
                                     ('type', CrimeType)])
# interp. a row of crime data with the hour it was reported
# (as a 24-hour time, excluding 0/midnight because a 0 may mean
# the time is missing, sadly!) and the type of crime.
CD1 = CrimeData(11, CrimeType.BE_RES)
CD2 = CrimeData(23, CrimeType.THEFT_OF_BICYCLE)

# template based on compound (2 fields) and reference rule (on CrimeType)
@typecheck
def fn_for_crime_data(cd: CrimeData) -> ...:
    return ...(cd.hour,
               fn_for_crime_type(cd.type))


# List[CrimeData]
# interp. a list of crime data
LOCD0 = []
LOCD1 = [CD1, CD2]

# template based on arbitrary-sized data and reference rule
@typecheck
def fn_for_locd(locd: List[CrimeData]) -> ...:
    # description of accumulator
    acc = ... # type: ...
    
    for cd in locd:
        acc = ...(fn_for_crime_data(cd), acc)
        
    return ...(acc)
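Reading the file will require converting each TYPE string into a `CrimeType`. Below is a minimal sketch of such a conversion, written in the style of the enumeration template above. `parse_crime_type` is a hypothetical name, the strings are the four listed in Step 1a (assumed to match the file exactly), and `@typecheck` is omitted so the block runs without `cs103`. The last lines also show that `CrimeData` fields are passed in declaration order, hour first.

```python
from enum import Enum
from typing import NamedTuple

# Same definitions as above, repeated so this block runs on its own.
CrimeType = Enum('CrimeType', ['BE_COMM', 'BE_RES', 'THEFT_OF_VEHICLE', 'THEFT_OF_BICYCLE'])
CrimeData = NamedTuple('CrimeData', [('hour', int), ('type', CrimeType)])


def parse_crime_type(s: str) -> CrimeType:
    """
    Returns the CrimeType named by a TYPE string from the VPD file.
    Raises ValueError for any string other than the four expected ones.
    """
    if s == "Break and Enter Commercial":
        return CrimeType.BE_COMM
    elif s == "Break and Enter Residential/Other":
        return CrimeType.BE_RES
    elif s == "Theft of Vehicle":
        return CrimeType.THEFT_OF_VEHICLE
    elif s == "Theft of Bicycle":
        return CrimeType.THEFT_OF_BICYCLE
    else:
        raise ValueError("unexpected crime type: " + s)


# Fields are passed in declaration order: hour first, then type.
cd = CrimeData(17, parse_crime_type("Theft of Bicycle"))
print(cd.hour, cd.type.name)  # 17 THEFT_OF_BICYCLE
```

Raising an error on an unexpected string makes a typo in the file (or in the expected strings) fail loudly instead of silently dropping rows.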

In [None]:
# Here are some definitions we'll need later on that aren't particularly interesting to work on in class!

# List[str]
# interp. a list of strings
LOS0 = []
LOS1 = ['hello', 'world']

# template based on arbitrary-sized data
@typecheck
def fn_for_los(los: List[str]) -> ...:
    # description of accumulator
    acc = ... # type: ...
    
    for s in los:
        acc = ...(s, acc)
        
    return ...(acc)


# List[int]
# interp. a list of integers
LOI0 = []
LOI1 = [1, -12]

# template based on arbitrary-sized data
@typecheck
def fn_for_loi(loi: List[int]) -> ...:
    # description of accumulator
    acc = ... # type: ...
    
    for i in loi:
        acc = ...(i, acc)
        
    return ...(acc)
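A helper in the same accumulator style could turn a list of hours into per-hour counts. This is a minimal sketch with a hypothetical name (`count_by_hour`), not one of the finished helpers promised for the second week. It updates one slot of the accumulator per item instead of rebuilding `acc`, and it omits `@typecheck` so it runs without `cs103`.

```python
from typing import List


def count_by_hour(hours: List[int]) -> List[int]:
    """
    Returns a list of 24 counts in which element h is the number of times
    hour h appears in hours. Assumes every hour is in range [0, 23].
    """
    # acc[h] is the number of times hour h has been seen so far
    acc = [0] * 24  # type: List[int]

    for h in hours:
        acc[h] = acc[h] + 1

    return acc


counts = count_by_hour([17, 3, 17, 23])
print(counts[17], counts[3], counts[0])  # 2 1 0
```

Indexing the result by hour means the most common hour is simply the index of the largest count.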

### Step 2b: Building
#### Design a function to read the information and store it as data in your program

We've split this off into a separate cell so we can finish this in our first week of class in Module 7!

In [None]:
@typecheck
def read(filename: str) -> List[Consumed]:
    """    
    reads information from the specified file and returns ...
    """
    #return []  #stub
    # Template from HtDAP
    # loc contains the result so far
    loc = [] # type: List[Consumed]

    with open(filename) as csvfile:
        
        reader = csv.reader(csvfile)
        next(reader) # skip header line

        for row in reader:
            # you may not need to store all the rows, and you may need
            # to convert some of the strings to other types
            c = Consumed(row[0], ... ,row[n])
            loc.append(c)
    
    return loc

start_testing()

# Examples and tests for read
expect(..., ...)

summary()
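One way the `read` template might be filled in for our question, as a minimal sketch. It assumes the columns appear in the order listed in Step 1a (so TYPE is `row[0]` and HOUR is `row[4]`), that every TYPE is one of the four strings listed there, and it drops rows with hour 0. A dictionary maps each TYPE string to its `CrimeType`. `@typecheck` and `cs103`'s `expect` are left out so the block runs on its own, and the tiny test file holds made-up rows, not real VPD data.

```python
import csv
import os
import tempfile
from enum import Enum
from typing import List, NamedTuple

CrimeType = Enum('CrimeType', ['BE_COMM', 'BE_RES', 'THEFT_OF_VEHICLE', 'THEFT_OF_BICYCLE'])
CrimeData = NamedTuple('CrimeData', [('hour', int), ('type', CrimeType)])

# Maps each TYPE string to its CrimeType (assumes exactly these four strings appear).
TYPE_NAMES = {
    "Break and Enter Commercial": CrimeType.BE_COMM,
    "Break and Enter Residential/Other": CrimeType.BE_RES,
    "Theft of Vehicle": CrimeType.THEFT_OF_VEHICLE,
    "Theft of Bicycle": CrimeType.THEFT_OF_BICYCLE,
}


def read(filename: str) -> List[CrimeData]:
    """
    Reads the VPD crime file and returns a CrimeData for every row whose
    hour is not 0 (midnight may mean the time is missing).
    """
    # locd contains the result so far
    locd = []  # type: List[CrimeData]

    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip header line

        for row in reader:
            hour = int(row[4])  # HOUR column
            if hour != 0:       # skip rows whose time may be missing
                locd.append(CrimeData(hour, TYPE_NAMES[row[0]]))

    return locd


# Tiny hypothetical test file (made-up rows, not real VPD data).
TEST_ROWS = (
    "TYPE,YEAR,MONTH,DAY,HOUR,MINUTE,HUNDRED_BLOCK,NEIGHBOURHOOD,X,Y\n"
    "Theft of Bicycle,2018,6,14,17,30,10XX W 10TH AVE,Fairview,0,0\n"
    "Break and Enter Commercial,2018,1,2,0,0,1XX E HASTINGS ST,Strathcona,0,0\n"
)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as f:
    f.write(TEST_ROWS)
    test_file = f.name

result = read(test_file)
print(result)
os.remove(test_file)
```

Only the first test row survives, because the second has hour 0. In the notebook, the same kind of small file can back the `expect` calls for `read`.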



### Step 2c: Building
#### Design functions to analyze the data

Complete these steps in the code cell below. You will likely want to rename the analyze function so that the function name describes what your analysis function does.


**NOTE:** To make this manageable in class, we will provide some finished helper functions with the second week's notes.

In [None]:
###########
# Functions

@typecheck
def main(filename: str) -> ...:
    """
    Reads the file from given filename, analyzes the data, returns the result 
    """
    # Template from HtDAP, based on function composition 
    return analyze(read(filename)) 
    
    


@typecheck
def analyze(loc: List[Consumed]) -> Produced: 
    """ 
    ... 
    """ 

    return ...


start_testing()

# Examples and tests for main
expect(..., ...)

summary()

start_testing()

# Examples and tests for analyze 
expect(..., ...) 

summary()
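As a rough sketch of what the renamed `analyze` function might do for our question: count the crimes of the chosen type at each hour, then pick the hour with the largest count. The name `most_common_hour`, the earliest-hour tie-break, and returning 0 when there is no data are all assumptions made here, not decisions from the course. The records below are made up for illustration.

```python
from enum import Enum
from typing import List, NamedTuple

CrimeType = Enum('CrimeType', ['BE_COMM', 'BE_RES', 'THEFT_OF_VEHICLE', 'THEFT_OF_BICYCLE'])
CrimeData = NamedTuple('CrimeData', [('hour', int), ('type', CrimeType)])


def most_common_hour(locd: List[CrimeData], ct: CrimeType) -> int:
    """
    Returns the hour (1-23) at which crimes of type ct are most common in locd.
    Ties go to the earliest hour; returns 0 if locd has no crimes of type ct.
    """
    # counts[h] is the number of crimes of type ct seen so far at hour h
    counts = [0] * 24  # type: List[int]
    for cd in locd:
        if cd.type == ct and cd.hour != 0:  # skip hour 0 in case one slips through
            counts[cd.hour] = counts[cd.hour] + 1

    # best is the hour with the largest count so far (0 means "none found yet")
    best = 0
    for h in range(1, 24):
        if counts[h] > counts[best]:  # strict > keeps the earliest hour on ties
            best = h
    return best


# Made-up records for illustration (not real VPD data).
SAMPLE = [CrimeData(17, CrimeType.THEFT_OF_BICYCLE),
          CrimeData(3, CrimeType.THEFT_OF_BICYCLE),
          CrimeData(17, CrimeType.THEFT_OF_BICYCLE),
          CrimeData(9, CrimeType.BE_RES)]
print(most_common_hour(SAMPLE, CrimeType.THEFT_OF_BICYCLE))  # 17
print(most_common_hour(SAMPLE, CrimeType.BE_COMM))           # 0
```

With a matching `read`, `main` could then be the composition `most_common_hour(read(filename), ct)`, taking the crime type as a second argument as in the example from Step 1c.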