## Analysing VPD Crime Data

We've uploaded a tiny portion of the crime data shared by the [Vancouver Police Department](https://vancouver.ca/police/)'s [Open Data initiative](https://geodash.vpd.ca/opendata/). The complete file has well over half a million rows. The portion we uploaded covers all crimes in 2018 labelled as "break and enter" (in two variants: commercial and residential) or "theft of" (in two variants: vehicle and bicycle).

Our information file is in this directory and is named `crimedata_subset_bne_theft_of_bike_veh_2018.csv`. The directory also contains the license for this information and a PDF file from the VPD describing the information source.

Let's see if we can answer the question: At what time of day does crime of various types peak in Vancouver?

We'll **start from the project final submission template** to get good practice both on using HtDAP and preparing for the project! (We've edited this slightly to note places where we'll deviate from the project.)

### Step 1a: Planning 
#### Identify the information in the file your program will read

For each crime, the information is:
* type of crime
* date
* time
* hundred block (address)
* neighbourhood
* latitude
* longitude

There are an arbitrary number of crimes in the file.
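
Before choosing what to represent, it can help to check where each piece of information sits in a row. The following is a minimal sketch using Python's `csv` module on an in-memory sample. The header names and both rows are made up for illustration. The layout is an assumption, chosen only to be consistent with the `read` function later in this notebook, which takes the crime type from column 0 and the hour from column 4. Check the real file and the VPD's PDF before relying on it.

```python
import csv
import io

# Made-up sample in an ASSUMED column layout; the real file may differ.
SAMPLE = """TYPE,YEAR,MONTH,DAY,HOUR,MINUTE,HUNDRED_BLOCK,NEIGHBOURHOOD,X,Y
Theft of Bicycle,2018,5,14,0,30,10XX EXAMPLE ST,Example Area,0,0
Theft of Vehicle,2018,7,2,11,15,20XX SAMPLE AVE,Sample Area,0,0
"""

def column_positions(csv_text: str) -> dict:
    """Return a mapping from each header name to its column index."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return {name: index for index, name in enumerate(header)}

def count_rows(csv_text: str) -> int:
    """Return the number of data rows (everything after the header)."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip header line
    return sum(1 for _ in reader)

positions = column_positions(SAMPLE)
print(positions["TYPE"], positions["HOUR"])
print(count_rows(SAMPLE))
```

Because the number of rows is arbitrary, `count_rows` walks the whole file rather than assuming a fixed length.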

### Step 1b: Planning 
#### Write a description of what your program will produce

You must brainstorm at least three ideas for graphs or charts that your program could produce and choose the one that you'd like to work on. You can choose between a line chart, histogram, bar chart, scatterplot, or pie chart. *Note: we might focus on non-graphs for now, since we're really studying HtDAP rather than the project.*

* Most common type of crime
* The neighbourhood that had the most crimes of these types
* The month with the most bike thefts
* What time is most common for a business to be broken into?
* When do crimes occur over the hours of the day for the various types of crime?
* Most common type of crime overnight

#### We will design a program to find the most common type of crime overnight.

### Step 1c: Planning 
#### Write or draw examples of what your program will produce

For the project you must include an image that shows what your chart or plot will look like. You can insert an image using the Insert Image command near the bottom of the Edit menu. 

For today, since we are not producing a graph or chart, we'll write an expect for our example.

`expect(main('crimedata_subset_bne_theft_of_bike_veh_2018.csv'), CrimeType.bec)`


### Step 2a: Building
#### Design data definitions

Before you design data definitions in the code cell below, you must explicitly document here which information in the file you chose to represent and why that information is crucial to the chart or graph that you'll produce when you complete step 2c. *Note: we'll skip the "chart or graph" part!*

In [5]:
from cs103 import *
from typing import NamedTuple, List
from enum import Enum
import csv

##################
# Data Definitions

CrimeType = Enum('CrimeType', ['bec', 'ber', 'tv', 'tb'])
# interp. a type of crime which is either break and enter commercial ('bec'),
#         break and enter residential ('ber'), theft of vehicle ('tv'), 
#         or theft of bicycle ('tb')
# examples are redundant for enumerations

@typecheck
# template based on one of (4 cases)
def fn_for_crime_type(ct: CrimeType) -> ...:
    if ct == CrimeType.bec:
        return ...
    elif ct == CrimeType.ber:
        return ...
    elif ct == CrimeType.tv:
        return ...
    elif ct == CrimeType.tb:
        return ...

CrimeData = NamedTuple('CrimeData', [('type', CrimeType),
                                     ('time', int)]) # in range [0, 23]
# interp. crime data including its type and the time (hour of the day)
#         it occurred
CD1 = CrimeData(CrimeType.tb, 0)
CD2 = CrimeData(CrimeType.tv, 11)

@typecheck
# template based on compound and the reference rule
def fn_for_crime_data(cd: CrimeData) -> ...:
    return ...(fn_for_crime_type(cd.type),
               cd.time)


# List[CrimeData]
# interp. a list of crime data

LOC0 = []
LOC1 = [CD1, CD2]

@typecheck
# template based on arbitrary-sized and the reference rule
def fn_for_loc(locd: List[CrimeData]) -> ...:
    # description of the acc
    acc = ... # type: ...
    for cd in locd:
        acc = ...(acc, fn_for_crime_data(cd))
    return ...(acc)
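
As an illustration of how the `List[CrimeData]` template gets filled in, here is a minimal sketch that counts the crimes of one type. It uses only the standard library, so the course's `@typecheck` decorator and `expect` are left out and plain `assert` statements stand in for tests. `count_of_type` is a hypothetical helper for illustration, not part of this notebook's solution.

```python
from enum import Enum
from typing import List, NamedTuple

CrimeType = Enum('CrimeType', ['bec', 'ber', 'tv', 'tb'])

CrimeData = NamedTuple('CrimeData', [('type', CrimeType),
                                     ('time', int)])  # in range [0, 23]

def count_of_type(locd: List[CrimeData], ct: CrimeType) -> int:
    """
    returns the number of crimes in locd whose type is ct
    (hypothetical helper, for illustration only)
    """
    # template from List[CrimeData] with an additional parameter
    # count is the number of crimes of type ct seen so far
    count = 0  # type: int
    for cd in locd:
        if cd.type == ct:
            count = count + 1
    return count

LOC1 = [CrimeData(CrimeType.tb, 0), CrimeData(CrimeType.tv, 11)]
assert count_of_type([], CrimeType.tb) == 0
assert count_of_type(LOC1, CrimeType.tb) == 1
assert count_of_type(LOC1, CrimeType.bec) == 0
```

The accumulator comment states what `count` means at every point in the loop, which is the part of the template that is easiest to leave unfilled.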



### Step 2b: Building
#### Design a function to read the information and store it as data in your program

We've split this off into a separate cell so we can finish this in our first class in Module 7!
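
The tests for `read` in this step depend on small test files such as `crimedata_subset_bne_theft_of_bike_veh_2018_test1.csv`, whose contents are not shown here. As a rough sketch of how a small test file could be built and read back, the example below writes a made-up three-row CSV to a temporary directory. The header and rows are invented, the helper `read_type_and_hour` is hypothetical, and the only layout assumption is the one `read` itself makes: crime type in column 0 and hour in column 4.

```python
import csv
import os
import tempfile
from typing import List, Tuple

def read_type_and_hour(filename: str) -> List[Tuple[str, int]]:
    """
    returns (crime type text, hour) for each data row of the file
    (hypothetical stand-in for read; assumes type in column 0, hour in column 4)
    """
    # result contains the rows converted so far
    result = []  # type: List[Tuple[str, int]]
    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip header line
        for row in reader:
            result.append((row[0], int(row[4])))
    return result

# Made-up rows in an ASSUMED column layout; not taken from the VPD file.
ROWS = [
    ['TYPE', 'YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE'],
    ['Break and Enter Commercial', '2018', '1', '5', '6', '0'],
    ['Break and Enter Commercial', '2018', '2', '9', '18', '30'],
    ['Break and Enter Commercial', '2018', '3', '1', '0', '45'],
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'tiny_test.csv')
    with open(path, 'w', newline='') as csvfile:
        csv.writer(csvfile).writerows(ROWS)
    parsed = read_type_and_hour(path)

print(parsed)
```

Keeping test files this small makes the expected list easy to write out by hand.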

In [6]:
@typecheck
def read(filename: str) -> List[CrimeData]:
    """    
    reads information from the specified file and returns a list
    of crime data
    """
    #return []  #stub
    # Template from HtDAP
    # loc contains the result so far
    loc = [] # type: List[CrimeData]

    with open(filename) as csvfile:
        
        reader = csv.reader(csvfile)
        next(reader) # skip header line

        for row in reader:
            # you may not need to store all the rows, and you may need
            # to convert some of the strings to other types
            c = CrimeData(parse_crime_type(row[0]), parse_int(row[4]))
            loc.append(c)
    
    return loc

@typecheck
def parse_crime_type(ct: str) -> CrimeType:
    """
    convert ct to a CrimeType
    """
    if ct == "Break and Enter Commercial":
        return CrimeType.bec
    elif ct == "Break and Enter Residential":
        return CrimeType.ber
    elif ct == "Break and Enter Residential/Other":
        return CrimeType.ber
    elif ct == "Theft of Vehicle":
        return CrimeType.tv
    elif ct == "Theft of Bicycle":
        return CrimeType.tb

start_testing()

# Examples and tests for read
expect(read("crimedata_subset_bne_theft_of_bike_veh_2018_test1.csv"), 
            [CrimeData(CrimeType.bec, 6),
             CrimeData(CrimeType.bec, 18),
             CrimeData(CrimeType.bec, 0)])
expect(read("crimedata_subset_bne_theft_of_bike_veh_2018_test2.csv"), 
            [CrimeData(CrimeType.ber, 12),
             CrimeData(CrimeType.tb, 8),
             CrimeData(CrimeType.tv, 1)])

expect(parse_crime_type("Break and Enter Commercial"), CrimeType.bec)
expect(parse_crime_type("Break and Enter Residential"), CrimeType.ber)
expect(parse_crime_type("Break and Enter Residential/Other"), CrimeType.ber)
expect(parse_crime_type("Theft of Vehicle"), CrimeType.tv)
expect(parse_crime_type("Theft of Bicycle"), CrimeType.tb)

summary()

7 of 7 tests passed


### Step 2c: Building
#### Design functions to analyze the data

Complete these steps in the code cell below. You will likely want to rename the analyze function so that the function name describes what your analysis function does.


**NOTE:** To make this manageable in class, we might provide some finished helper functions.

In [26]:
###########
# Functions

@typecheck
def main(filename: str) -> CrimeType:
    """
    Reads the file from given filename, returns the most common
    crime type overnight (0:00 hrs - 6:00hrs)
    
    Assume there's at least one crime overnight
    
    """
    # Template from HtDAP, based on function composition 
    return most_common_crime_type_overnight(read(filename)) 
    
@typecheck
def most_common_crime_type_overnight(locd: List[CrimeData]) -> CrimeType: 
    """ 
    returns the most common crime type overnight (0:00 hrs - 6:00hrs)
    
    Assume there's at least one crime overnight
    """ 

    # return CrimeType.tv
    # template based on function composition
    return most_common_crime_type(filter_for_overnight(locd))

@typecheck
def filter_for_overnight(locd: List[CrimeData]) -> List[CrimeData]:
    """
    returns the crimes in locd that occurred overnight (0:00 hrs - 6:00hrs)
    """
    #return locd
    # template from List[CrimeData] 
    # acc stores all the overnight crimes seen so far
    acc = [] # type: List[CrimeData]
    for cd in locd:
        if is_overnight(cd):
            acc.append(cd)
    return acc

@typecheck
def is_overnight(cd:CrimeData) -> bool:
    """
    return True if cd occurred in hours 0 through 6 (inclusive), or False otherwise
    """
    #return False
    # template from CrimeData
    return cd.time >=0 and cd.time <=6

@typecheck
def most_common_crime_type(locd: List[CrimeData]) -> CrimeType:
    """
    returns the most common crime type in locd
    """
    #return CrimeType.bec
    # template from List[CrimeData]
    # num_bec, num_ber, num_tv, and num_tb count the crimes of each type seen so far
    num_bec = 0 # type: int
    num_ber = 0 # type: int
    num_tv = 0 # type: int
    num_tb = 0 # type: int
    for cd in locd:
        if is_of_type(cd, CrimeType.bec):
            num_bec = num_bec + 1
        elif is_of_type(cd, CrimeType.ber):
            num_ber = num_ber + 1
        elif is_of_type(cd, CrimeType.tv):
            num_tv = num_tv + 1
        elif is_of_type(cd, CrimeType.tb):
            num_tb = num_tb + 1
    
    return highest_crime_type(num_bec, num_ber, num_tv, num_tb)


@typecheck
def is_of_type(cd: CrimeData, t: CrimeType) -> bool:
    """
    return True if cd is of type t, False otherwise
    """
    #return False
    # template from CrimeData with an additional parameter
    return type_matches(cd.type, t)

@typecheck
def type_matches(ct1: CrimeType, ct2: CrimeType) -> bool:
    """
    return True if ct1 and ct2 are equal, False otherwise
    """
    # return False
    # template from CrimeType 
    if ct1 == CrimeType.bec:
        return ct2 == CrimeType.bec
    elif ct1 == CrimeType.ber:
        return ct2 == CrimeType.ber
    elif ct1 == CrimeType.tv:
        return ct2 == CrimeType.tv
    elif ct1 == CrimeType.tb:
        return ct2 == CrimeType.tb    

@typecheck
def highest_crime_type(num_bec: int, num_ber:int, num_tv: int, num_tb: int) -> CrimeType:
    """
    return the type with the highest number of crimes, with ties being broken in the order of 
    CrimeType.bec, CrimeType.ber, CrimeType.tv, then CrimeType.tb
    """
    # return CrimeType.bec
    # template based on atomic distinct with additional parameters
    if num_bec >= num_ber and num_bec >= num_tv and num_bec >= num_tb:
        return CrimeType.bec
    elif num_ber >= num_tv and num_ber >= num_tb:
        return CrimeType.ber
    elif num_tv >= num_tb:
        return CrimeType.tv
    else:
        return CrimeType.tb

start_testing()

# Examples and tests for main
expect(main('crimedata_subset_bne_theft_of_bike_veh_2018_test1.csv'), CrimeType.bec)
expect(main('crimedata_subset_bne_theft_of_bike_veh_2018_test2.csv'), CrimeType.tv)

# Examples and tests for most_common_crime_type_overnight 
expect(most_common_crime_type_overnight([CrimeData(CrimeType.bec, 6),
                                         CrimeData(CrimeType.bec, 18),
                                         CrimeData(CrimeType.bec, 0)]), 
      CrimeType.bec) 
expect(most_common_crime_type_overnight([CrimeData(CrimeType.ber, 12),
                                         CrimeData(CrimeType.tb, 8),
                                         CrimeData(CrimeType.tv, 1)]), 
      CrimeType.tv) 

expect(most_common_crime_type_overnight([CrimeData(CrimeType.ber, 12),
                                         CrimeData(CrimeType.tb, 2),
                                         CrimeData(CrimeType.tv, 1)]), 
      CrimeType.tv) 

expect(filter_for_overnight([CrimeData(CrimeType.bec, 6),
                             CrimeData(CrimeType.bec, 18),
                             CrimeData(CrimeType.bec, 0)]), 
      [CrimeData(CrimeType.bec, 6), 
       CrimeData(CrimeType.bec, 0)])
expect(filter_for_overnight([CrimeData(CrimeType.ber, 12),
                             CrimeData(CrimeType.tb, 8),
                             CrimeData(CrimeType.tv, 1)]), 
      [CrimeData(CrimeType.tv, 1)])
expect(filter_for_overnight([CrimeData(CrimeType.ber, 12),
                             CrimeData(CrimeType.tb, 8),
                             CrimeData(CrimeType.tv, 7)]), 
      [])

expect(is_overnight(CrimeData(CrimeType.bec, 0)), True)
expect(is_overnight(CrimeData(CrimeType.bec, 1)), True)
expect(is_overnight(CrimeData(CrimeType.tv, 6)), True)
expect(is_overnight(CrimeData(CrimeType.tv, 7)), False)
expect(is_overnight(CrimeData(CrimeType.tv, 13)), False)

expect(is_of_type(CrimeData(CrimeType.bec, 0), CrimeType.bec), True)
expect(is_of_type(CrimeData(CrimeType.bec, 0), CrimeType.ber), False)
expect(is_of_type(CrimeData(CrimeType.ber, 3), CrimeType.bec), False)
expect(is_of_type(CrimeData(CrimeType.ber, 3), CrimeType.ber), True)

expect(type_matches(CrimeType.bec, CrimeType.bec), True)
expect(type_matches(CrimeType.bec, CrimeType.ber), False)
expect(type_matches(CrimeType.bec, CrimeType.tv), False)
expect(type_matches(CrimeType.bec, CrimeType.tb), False)

expect(highest_crime_type(1, 0, 0, 0), CrimeType.bec)
expect(highest_crime_type(1, 1, 1, 1), CrimeType.bec)
expect(highest_crime_type(1, 1, 1, 0), CrimeType.bec)
expect(highest_crime_type(1, 0, 1, 1), CrimeType.bec)
expect(highest_crime_type(1, 1, 0, 0), CrimeType.bec)
expect(highest_crime_type(1, 0, 1, 0), CrimeType.bec)
expect(highest_crime_type(1, 0, 0, 1), CrimeType.bec)
expect(highest_crime_type(0, 1, 0, 0), CrimeType.ber)
expect(highest_crime_type(0, 1, 1, 1), CrimeType.ber)
expect(highest_crime_type(0, 1, 1, 0), CrimeType.ber)
expect(highest_crime_type(0, 1, 0, 1), CrimeType.ber)
expect(highest_crime_type(0, 0, 1, 1), CrimeType.tv)
expect(highest_crime_type(0, 0, 1, 0), CrimeType.tv)
expect(highest_crime_type(0, 0, 0, 1), CrimeType.tb)

summary()

35 of 35 tests passed


In [27]:
## now we can call the main function with our full set of information to find which crime type was most
## common overnight
main('crimedata_subset_bne_theft_of_bike_veh_2018.csv')

<CrimeType.bec: 1>
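
One way to double-check a result like this is to recompute it with `collections.Counter` from the standard library. The sketch below is an alternative technique, not the HtDAP-style design used above. It counts each type among overnight crimes (hours 0 through 6, inclusive) and breaks ties in the order `bec`, `ber`, `tv`, `tb`, matching `highest_crime_type`. The function name `most_common_overnight_with_counter` and the sample list are hypothetical, and the course's `@typecheck` decorator is omitted so the block runs without `cs103`.

```python
from collections import Counter
from enum import Enum
from typing import List, NamedTuple

CrimeType = Enum('CrimeType', ['bec', 'ber', 'tv', 'tb'])
CrimeData = NamedTuple('CrimeData', [('type', CrimeType), ('time', int)])

def most_common_overnight_with_counter(locd: List[CrimeData]) -> CrimeType:
    """
    returns the most common crime type among crimes with hour in [0, 6];
    ties are broken in the order bec, ber, tv, tb
    """
    counts = Counter(cd.type for cd in locd if 0 <= cd.time <= 6)
    # max returns the first maximal item, and iterating over CrimeType
    # yields members in definition order, which gives the tie-breaking.
    return max(CrimeType, key=lambda ct: counts[ct])

# Made-up sample: two overnight bicycle thefts, one overnight vehicle theft.
sample = [CrimeData(CrimeType.ber, 12),
          CrimeData(CrimeType.tb, 2),
          CrimeData(CrimeType.tb, 3),
          CrimeData(CrimeType.tv, 1)]
print(most_common_overnight_with_counter(sample))  # prints CrimeType.tb
```

A sample where one type occurs more than once overnight is a useful test case, because it only passes when each count is incremented from its own previous value.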