In this notebook, we will do a few more exercises to become familiar with the `read` function of the HtDAP recipe.

We will be using the same source file (`crimedata_subset_bne_theft_of_bike_veh_2018.csv`), but this time, we are interested in storing the date of the crime (as year, month and day) and the neighborhood. The necessary data definitions are already given to you.

In [1]:
from cs103 import *
from enum import Enum
from typing import NamedTuple, List
import csv

##################
# Data Definitions

Month = Enum('Month', ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC'])
# interpr. a month of the year

# Examples are redundant for enumeration

@typecheck
# Template based on Enumeration (12 cases)
def fn_for_month(m: Month) -> ...:
    if m == Month.JAN:
        return ...
    elif m == Month.FEB:
        return ...
    elif m == Month.MAR:
        return ...
    elif m == Month.APR:
        return ...
    elif m == Month.MAY:
        return ...
    elif m == Month.JUN:
        return ...
    elif m == Month.JUL:
        return ...
    elif m == Month.AUG:
        return ...
    elif m == Month.SEP:
        return ...
    elif m == Month.OCT:
        return ...
    elif m == Month.NOV:
        return ...
    elif m == Month.DEC:
        return ...
    
    
CrimeData = NamedTuple('CrimeData', [('neighborhood', str),
                                     ('year', int),
                                     ('month', Month),
                                     ('day', int)])      # in range [1,31]
                                     

# interpr. a crime data point: the neighborhood where the crime occurred
# and the date it occurred, as year, month and day.

CD1 = CrimeData("West End", 2000, Month.JAN, 1)
CD2 = CrimeData("", 2020, Month.DEC, 31)    # Example to show that the neighborhood information may be missing

@typecheck
# Template based on composition (4 fields) and reference rule
def fn_for_crime_data(cd: CrimeData) -> ...:
    return ...(cd.neighborhood,
               cd.year,
               fn_for_month(cd.month),
               cd.day)


# List[CrimeData]
# interpr. a list of CrimeData

LOC0 = []
LOC1 = [CD1, CD2]

@typecheck
# Template based on arbitrary-sized data and reference rule
def fn_for_locd(locd: List[CrimeData]) -> ...:
    # description of the accumulator
    acc = ...   # type: ...

    for cd in locd:
        acc = ...(fn_for_crime_data(cd), acc)

    return ...(acc)
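To show how the `fn_for_locd` template turns into a real function, here is a minimal sketch. The function `count_in_neighborhood` is hypothetical and not part of the exercise. It redefines `Month` and `CrimeData` so it runs on its own, and it leaves out `cs103` (`@typecheck`, `expect`) so that it needs only the standard library.

```python
from enum import Enum
from typing import List, NamedTuple

Month = Enum('Month', ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
                       'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC'])

CrimeData = NamedTuple('CrimeData', [('neighborhood', str),
                                     ('year', int),
                                     ('month', Month),
                                     ('day', int)])


def count_in_neighborhood(locd: List[CrimeData], name: str) -> int:
    """
    return the number of crimes in locd that occurred in the neighborhood name
    (hypothetical example built from the fn_for_locd template)
    """
    # acc is the number of matching crimes seen so far
    acc = 0  # type: int

    for cd in locd:
        # only one field of cd matters here, so we read it directly
        # instead of calling a separate fn_for_crime_data helper
        if cd.neighborhood == name:
            acc = acc + 1

    return acc


CD1 = CrimeData("West End", 2000, Month.JAN, 1)
CD2 = CrimeData("", 2020, Month.DEC, 31)

print(count_in_neighborhood([], "West End"))          # prints: 0
print(count_in_neighborhood([CD1, CD2], "West End"))  # prints: 1
```

The accumulator starts at the value that is correct for the empty list (`LOC0`), and each loop iteration keeps it correct for the part of the list processed so far.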



Before we move on to writing the `read` function, let's stop and think about which columns need to be parsed.

https://www.menti.com/al2ctz7hn4kz

Parse functions always take one parameter, and it is always of the same type: **which one?**

Answer: a string (`str`), because `csv.reader` returns every field of a row as a string.
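A quick way to see this is to run `csv.reader` on a small in-memory file. The sketch below uses a hypothetical header and data row, and the column layout is an assumption chosen to match the indices the `read` function uses (year in column 1, month in 2, day in 3, neighborhood in 7).

```python
import csv
import io

# Hypothetical two-line CSV file: a header plus one data row.
# The column layout is assumed, not taken from the real data file.
text = ("TYPE,YEAR,MONTH,DAY,HOUR,MINUTE,HUNDRED_BLOCK,NEIGHBOURHOOD,X,Y\n"
        "Theft of Bicycle,2018,3,2,14,0,10XX ROBSON ST,West End,0,0\n")

reader = csv.reader(io.StringIO(text))
header = next(reader)   # the header line, which read skips
row = next(reader)      # the first data row, as a list of str

print(row[1], type(row[1]))   # prints: 2018 <class 'str'>
print(int(row[1]) + 1)        # prints: 2019 (arithmetic works only after parsing)
```

Because the year, month and day all arrive as strings, each of those columns needs a parse function before it can be stored in a `CrimeData`. The neighborhood is already a `str`, so it can be stored as is.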

In [2]:
# Write the necessary parse functions here

@typecheck
def parse_month(m: str) -> Month:
    """
    Given a numerical string between 1 and 12, returns the corresponding Month
    """
    # return Month.JAN # stub
    # return ...(m)  # template
    if m == "1":
        return Month.JAN
    elif m == "2":
        return Month.FEB
    elif m == "3":
        return Month.MAR
    elif m == "4":
        return Month.APR
    elif m == "5":
        return Month.MAY
    elif m == "6":
        return Month.JUN
    elif m == "7":
        return Month.JUL
    elif m == "8":
        return Month.AUG
    elif m == "9":
        return Month.SEP
    elif m == "10":
        return Month.OCT
    elif m == "11":
        return Month.NOV
    elif m == "12":
        return Month.DEC

start_testing()

expect(parse_month("1"), Month.JAN)
expect(parse_month("2"), Month.FEB)
expect(parse_month("3"), Month.MAR)
expect(parse_month("4"), Month.APR)
expect(parse_month("5"), Month.MAY)
expect(parse_month("6"), Month.JUN)
expect(parse_month("7"), Month.JUL)
expect(parse_month("8"), Month.AUG)
expect(parse_month("9"), Month.SEP)
expect(parse_month("10"), Month.OCT)
expect(parse_month("11"), Month.NOV)
expect(parse_month("12"), Month.DEC)

summary()

12 of 12 tests passed
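As an aside, `Enum('Month', [...])` (the functional API) numbers its members 1, 2, ..., 12 in the order they are listed. That means a shorter version of `parse_month` could look members up by value. The name `parse_month_by_value` below is hypothetical, and this is a shortcut alternative to the template-based version above, not a replacement for it.

```python
from enum import Enum

Month = Enum('Month', ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
                       'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC'])


def parse_month_by_value(m: str) -> Month:
    """
    Given a numerical string between 1 and 12, returns the corresponding Month.
    Relies on the functional Enum API numbering members from 1 in list order.
    """
    # int() converts the string; Month(n) looks up the member whose value is n
    return Month(int(m))


print(parse_month_by_value("1"))   # prints: Month.JAN
print(parse_month_by_value("12"))  # prints: Month.DEC
```

The two versions behave differently on bad input. The `if`/`elif` version falls off the end and returns `None` (which `@typecheck` would presumably reject), while `Month(int(m))` raises `ValueError` for strings such as `"13"` or `"abc"`. It also accepts strings such as `"03"`.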


In [6]:
# Now, complete the read function for CrimeData, using the HtDAP template

@typecheck
def read(filename: str) -> List[CrimeData]:
    """
    reads information from the specified file and returns a list of crime data
    (returns all rows EXCEPT those with a missing neighborhood)
    """
    # return []  # stub
    # Template from HtDAP
    # locd contains the result so far
    locd = [] # type: List[CrimeData]

    with open(filename) as csvfile:
        
        reader = csv.reader(csvfile)
        next(reader) # skip header line

        for row in reader:
            if is_reliable(row):
                # columns used: row[7] neighborhood, row[1] year, row[2] month, row[3] day
                cd = CrimeData(row[7], parse_int(row[1]), parse_month(row[2]), parse_int(row[3]))
                locd.append(cd)
    
    return locd

start_testing()

# Examples and tests for read
expect(read("testfile_empty.csv"), [])
expect(read("testfile_small.csv"), [CrimeData("West End", 2018, Month.MAR, 2), 
                                    CrimeData("Kitsilano", 1994, Month.OCT, 14)])
expect(read("testfile_small3.csv"), [CrimeData("West End", 2018, Month.JAN, 2), 
                                     CrimeData("West End", 2018, Month.JAN, 3)])

summary()

3 of 3 tests passed
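The tests for `read` depend on small test files such as `testfile_small.csv`. The sketch below builds a test file on the fly with `tempfile` and reads it back. It rests on several assumptions. `read_plain` is a hypothetical standard-library stand-in for `read`: `int()` replaces `cs103`'s `parse_int`, `Month(int(...))` replaces `parse_month`, and the 10-column layout and file contents are made up for illustration.

```python
import csv
import os
import tempfile
from enum import Enum
from typing import List, NamedTuple

Month = Enum('Month', ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
                       'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC'])

CrimeData = NamedTuple('CrimeData', [('neighborhood', str),
                                     ('year', int),
                                     ('month', Month),
                                     ('day', int)])


def read_plain(filename: str) -> List[CrimeData]:
    """
    hypothetical stand-in for read: returns all rows EXCEPT those with
    a missing neighborhood (assumed to be column 7)
    """
    locd = []  # type: List[CrimeData]
    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip header line
        for row in reader:
            if row[7] != "":
                locd.append(CrimeData(row[7], int(row[1]),
                                      Month(int(row[2])), int(row[3])))
    return locd


# Hypothetical test file: one row with a neighborhood, one without.
contents = ("TYPE,YEAR,MONTH,DAY,HOUR,MINUTE,HUNDRED_BLOCK,NEIGHBOURHOOD,X,Y\n"
            "Theft of Bicycle,2018,3,2,14,0,10XX ROBSON ST,West End,0,0\n"
            "Theft of Bicycle,2018,5,9,8,30,,,0,0\n")

# delete=False so the file can be reopened by name after it is closed
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write(contents)
    path = f.name

try:
    result = read_plain(path)
    print(result)  # only the West End row survives the filter
finally:
    os.remove(path)  # clean up the temporary file
```

Writing each test file so that it contains both a row that should be kept and a row that should be dropped makes the filtering condition visible in the expected result.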


**Extra practice exercise:** write a version of `is_reliable` that, this time, checks that the neighborhood information is not missing, and call it from your `read` function.

In [4]:
# Complete is_reliable below (run this cell before the read cell so that read's tests pass)

@typecheck
def is_reliable(row: List[str]) -> bool:
    """
    return True if the neighborhood information (row[7]) is not empty.
    
    ASSUMES row is a full row of column values from a crime data information file,
    specifically, row[7] must exist and be the neighborhood field (the same
    column that read stores in CrimeData.neighborhood).
    """
    # return True  # stub
    # template treats columns as atomic and uses indexing instead
    # return ...(row)  # template
    return row[7] != ""
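The cell above has no tests of its own. Here is a minimal sketch of a few checks in plain Python, with `assert`/`print` instead of `cs103`'s `expect`. The rows are hypothetical, and the code assumes, as `read` does, that the neighborhood is at index 7. The third row is included on purpose: it has an empty hundred-block column but a neighborhood, so it only passes if the check looks at the right index.

```python
from typing import List


def is_reliable(row: List[str]) -> bool:
    """
    return True if the neighborhood information (row[7]) is not empty.
    """
    return row[7] != ""


# Hypothetical rows in an assumed 10-column layout:
# TYPE, YEAR, MONTH, DAY, HOUR, MINUTE, HUNDRED_BLOCK, NEIGHBOURHOOD, X, Y
with_nbhd = ["Theft of Bicycle", "2018", "3", "2", "14", "0",
             "10XX ROBSON ST", "West End", "0", "0"]
without_nbhd = ["Theft of Bicycle", "2018", "5", "9", "8", "30",
                "", "", "0", "0"]
no_block = ["Theft of Bicycle", "1994", "10", "14", "9", "15",
            "", "Kitsilano", "0", "0"]

print(is_reliable(with_nbhd))     # prints: True
print(is_reliable(without_nbhd))  # prints: False
print(is_reliable(no_block))      # prints: True (hundred block missing, neighborhood present)
```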