# Module 7 - How to Design Analysis Programs

## ...but first, some updates

- The deadline of the project milestone has been moved to Monday, March 25 (10PM)
- Opportunities to get additional help with the project will be available in class (March 26, April 9 and 11) and during tutorials (last 3 weeks), on top of OH already available and Piazza.
- Midterm grades to be released tomorrow - look out for a Piazza announcement.

## Important update regarding the last 3 weeks of course
- Sadly, I have to be away and will not be able to deliver the last 3 weeks of class in person.
- The last lecture on Module 7 (March 28) will be delivered on Zoom and recorded (as was done during campus closures).
- Module 8 (April 2 and 4) will be covered in class by the TAs. I will also post a video to cover the same content, so that you have multiple sources available.
- 2 TAs will be in class during designated in-class OH (March 26, April 9 and 11). I will also be available on Zoom at the same time (be aware of high attendance in both venues). I will also deliver my regular OH (Tu-Th 2-3:15) on Zoom (all links on Canvas).
- As I expect more traffic during these OH, I will be using the queue system explained at the top of this [Canvas page](https://canvas.ubc.ca/courses/130118/pages/schedule-tutorials-and-office-hours?module_item_id=6407799). Students not in the queue will not be admitted!

## Module 7 - Learning goals
- Identify the information that is available to you. 
- Identify many possible outputs that your program could produce, given the information that you have available. 
- Decide which subset of information you need to represent as data in your program to solve particular problems. 
- Design a read function to read information from a file and represent it as data in your program. 
- Describe all steps you need to do for the Project Milestone! 

## How to Design Analysis Programs - HtDAP recipe

1. Planning
    1. Identify the information in the file your program will read. 
    2. Write a description of what your program will produce.
    3. Write or draw examples of what your program will produce. 
2. Building the program
    1. Design data definitions. 
    2. Design a function to read the information and store it as data in your program. 
    3. Design functions to analyze the data. 

## Analysing VPD Crime Data

We've uploaded a tiny portion of the crime data shared by the [Vancouver Police Department](https://vancouver.ca/police/)'s [Open Data initiative](https://geodash.vpd.ca/opendata/). The complete file has well over half a million rows. The portion we uploaded is all crimes labelled as "break and enter" (in two variants: commercial and residential) and "theft of" (in two variants: vehicle and bicycle) in 2018.

Our information file, named `crimedata_subset_bne_theft_of_bike_veh_2018.csv`, is in this directory. You can also find the license for this information and a PDF file from VPD describing the information source.

Let's see if we can answer the question: At what time of day does crime of various types peak in Vancouver?

We'll **start from the project final submission template** to get good practice both on using HtDAP and preparing for the project! (We've edited this slightly to note places where we'll deviate from the project.)

### Step 1a: Planning 
#### Identify the information in the file your program will read

- Open the csv file that you are going to work on and describe its information. 
- Note which fields each row has and what type of data each field holds. 
- Identify all data in the file, not only the data you are going to use in your analysis. 

#### Side note: CSV files

- CSV is a simple file format used to store tabular data, such as a spreadsheet or database.
- It is a text file that uses a comma to separate values.
- The first line holds the field names and every other line holds one row of data. 
- Jupyter can open it as a text file. Excel can open it too, but if Excel converts it to a spreadsheet and saves it that way, it is no longer plain text and Jupyter will not be able to read it anymore.
- Jupyter shows line numbers on the left (a long line may wrap on the screen, but it is still only one line). 
- The file must have exactly one empty line at the end! 
- The csv file must be in the same folder as your program. 
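
To make these points concrete, here is a minimal sketch that reads CSV text with Python's standard `csv` module. The column names and values are made up for illustration and are not taken from the VPD file.

```python
import csv
import io

# Hypothetical CSV text: a header line followed by two data lines.
# These columns are invented for illustration only.
SAMPLE_CSV = "TYPE,HOUR,MINUTE\nTheft of Bicycle,14,30\nTheft of Vehicle,2,5\n"

# io.StringIO lets csv.reader treat the string like an open file
reader = csv.reader(io.StringIO(SAMPLE_CSV))

header = next(reader)    # the first line holds the field names
rows = list(reader)      # every remaining line is one row of data

print(header)            # ['TYPE', 'HOUR', 'MINUTE']
print(rows[0])           # ['Theft of Bicycle', '14', '30']
print(type(rows[0][1]))  # <class 'str'>: every value is read as a string
```

Notice that even the hour `14` arrives as the string `'14'`, which is why a read function usually has to parse some fields.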

*Your solution here*

### Step 1b: Planning 
#### Write a description of what your program will produce

- Explain what you are going to analyze and how you are going to do this. 
- Describe what your program should return: one value and a graph you are going to plot. 
- Give any special information you are going to need to assume or problems you can see in your data file (csv files may have problems and you are not allowed to change them in any way!). 

You must brainstorm at least three ideas for graphs or charts that your program could produce and choose the one that you'd like to work on. You can choose between a line chart, histogram, bar chart, scatterplot, or pie chart. *Note: we might focus on non-graphs for now, since we're really studying HtDAP rather than the project.*

*Your solution here*

### Step 1c: Planning 
#### Write or draw examples of what your program will produce

- Add an example of how you are going to run your program (function name and its inputs or parameters). 
- Describe any values it should return. 
- Attach a drawing of the graph you are going to plot. 
- You should not use an application to produce the drawing, and it does not need to be based on any real information from your csv file. To attach it, use Jupyter’s insert image command (the image file should be in the same folder as the code file): 
```
![title](image_name.jpg) 
```


You must include an image that shows what your chart or plot will look like. You can insert an image using the Insert Image command near the bottom of the Edit menu. *Note: we'll practice using the "insert image" command just for the fun of it, but we are still not focusing on graphs/charts.*

**TODO: sketch a graph of crime over the hours of the day for the various types of crime.**

### Step 2a: Building
#### Design data definitions

- You should design data that your program is going to use. 
- You should focus only on the fields that your program is going to use and choose the correct type for each one. 
- If you choose a non-primitive type (interval, enumeration, optional...), you have to design it. 
- Create a compound type to represent one line of data in the csv file (Consumed). 
- Create a list of this compound type to represent all lines of data in the csv file (List[Consumed]). 
- You may need more data that can be added later (for example, List[int] or List[str]). 
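
As a rough illustration of these steps, the sketch below defines an enumeration, a compound type, and a list for our crime question. The names `CrimeType` and `CrimeRecord` and the choice of fields are assumptions made for this example, not the definitions we will settle on in class. The sketch uses only the standard library, so it leaves out the templates and the `@typecheck` decorator that a full design would include.

```python
from enum import Enum
from typing import List, NamedTuple

# Hypothetical enumeration for the four kinds of crime in our subset
CrimeType = Enum('CrimeType', ['BNE_COMMERCIAL', 'BNE_RESIDENTIAL',
                               'THEFT_OF_VEHICLE', 'THEFT_OF_BICYCLE'])
# interp. the type of a crime: break and enter (commercial or residential)
#         or theft of (vehicle or bicycle)
# examples are redundant for enumerations

# Hypothetical compound data for one row of the file.
# ASSUMPTION: the file records the hour of day of each crime.
CrimeRecord = NamedTuple('CrimeRecord', [('kind', CrimeType),
                                         ('hour', int)])  # in range [0, 23]
# interp. one reported crime with its type and the hour of day it occurred
CR1 = CrimeRecord(CrimeType.THEFT_OF_BICYCLE, 14)
CR2 = CrimeRecord(CrimeType.BNE_COMMERCIAL, 2)

# List[CrimeRecord]
# interp. a list of reported crimes
LOCR0 = []  # type: List[CrimeRecord]
LOCR1 = [CR1, CR2]
```

Only the fields needed for the question (type and hour) are represented; the other columns in the file are deliberately ignored.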

*Note: we'll skip the "chart or graph" part!*

In [None]:
from cs103 import *
from typing import NamedTuple, List
from enum import Enum
import csv

##################
# Data Definitions


In [None]:
# Here are some definitions we'll need later on that aren't particularly interesting to work on in class!

# List[str]
# interp. a list of strings
LOS0 = []
LOS1 = ['hello', 'world']

# template based on arbitrary-sized data
@typecheck
def fn_for_los(los: List[str]) -> ...:
    # description of accumulator
    acc = ... # type: ...
    
    for s in los:
        acc = ...(s, acc)
        
    return ...(acc)


# List[int]
# interp. a list of integers
LOI0 = []
LOI1 = [1, -12]

# template based on arbitrary-sized data
@typecheck
def fn_for_loi(loi: List[int]) -> ...:
    # description of accumulator
    acc = ... # type: ...
    
    for i in loi:
        acc = ...(i, acc)
        
    return ...(acc)

### Step 2b: Building
#### Design a function to read the information and store it as data in your program

- You should complete the read function from its template. 
- Change the Consumed type name.
- Check what columns from the file you need.
- Check if the types need parsing (changing the representation on the computer). All data in the csv file is a string. 
- You can also add any other function you need (for example, to keep only the data that matters from your file, or to remove rows with errors).
- You should create at least two small csv files for testing, so you can be sure your function is working before using it on the large file. 
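
For fields that are not plain numbers, you write your own parse function. The sketch below shows one possible shape for an enumeration. The name `parse_crime_type` and the four label strings are placeholders chosen for this example, so check the actual csv file for the exact text before relying on them.

```python
from enum import Enum

# Hypothetical enumeration; a full design would follow the data definition recipe
CrimeType = Enum('CrimeType', ['BNE_COMMERCIAL', 'BNE_RESIDENTIAL',
                               'THEFT_OF_VEHICLE', 'THEFT_OF_BICYCLE'])

def parse_crime_type(s: str) -> CrimeType:
    """
    returns the CrimeType that the string s stands for

    ASSUMES: s is one of the four labels below. These labels are
    placeholders; the real file may spell them differently.
    """
    if s == 'Break and Enter Commercial':
        return CrimeType.BNE_COMMERCIAL
    elif s == 'Break and Enter Residential':
        return CrimeType.BNE_RESIDENTIAL
    elif s == 'Theft of Vehicle':
        return CrimeType.THEFT_OF_VEHICLE
    elif s == 'Theft of Bicycle':
        return CrimeType.THEFT_OF_BICYCLE
    else:
        # fail loudly instead of silently guessing a type
        raise ValueError('unrecognized crime type: ' + s)
```

Raising an error on an unknown label is one design choice; another is to skip such rows inside the read function.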

In [None]:
@typecheck
def read(filename: str) -> List[Consumed]:
    """    
    reads information from the specified file and returns ...
    """
    #return []  #stub
    # Template from HtDAP
    # loc contains the result so far
    loc = [] # type: List[Consumed]

    with open(filename) as csvfile:
        
        reader = csv.reader(csvfile)
        next(reader) # skip header line

        for row in reader:
            # you may not need to store all the rows, and you may need
            # to convert some of the strings to other types
            c = Consumed(row[0], ... ,row[n])
            loc.append(c)
    
    return loc

# row is a list of strings that has all data from one row. 
# row[0] has the data from the first column of row. 
# row[1] has the data from the second column of row. 
# row[i] has the data from the (i+1)th column of row. 

# All data from row is a string. If you need a different type, you must convert it
# using a parse function.
# parse_int() and parse_float() are already defined in the cs103 package.
# For all other types (enumeration, optional, etc...) you must create your own parse function. 


start_testing()

# Examples and tests for read
expect(..., ...)

summary()
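
Here is one way the completed template might look, together with a small test file. This is a sketch under stated assumptions: the `CrimeRecord` type, the two-column layout, and the file name `test_small.csv` are all invented for the example. The built-in `int()` stands in for the course's `parse_int`, and plain assertions would be replaced by `expect` in a notebook.

```python
import csv
import os
import tempfile
from typing import List, NamedTuple

# Hypothetical compound type: only the two fields this sketch needs
CrimeRecord = NamedTuple('CrimeRecord', [('kind', str), ('hour', int)])

def read(filename: str) -> List[CrimeRecord]:
    """
    reads information from the specified file and returns a list of crime records
    """
    # locr contains the result so far
    locr = []  # type: List[CrimeRecord]

    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip header line

        for row in reader:
            # ASSUMPTION: column 0 holds the type and column 1 the hour;
            # int() stands in for the course's parse_int
            locr.append(CrimeRecord(row[0], int(row[1])))

    return locr

# Write a small test file in a temporary folder and read it back
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'test_small.csv')
    with open(path, 'w', newline='') as f:
        f.write('TYPE,HOUR\nTheft of Bicycle,14\nTheft of Vehicle,2\n')
    result = read(path)

print(result)
```

A test file this small lets you work out the expected list by hand before trusting the function on the full data.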



### Step 2c: Building
#### Design functions to analyze the data

Complete these steps in the code cell below. You will likely want to rename the analyze function so that the function name describes what your analysis function does.


**NOTE:** To make this manageable in class, we will provide some finished helper functions with the second week's notes.

In [None]:
###########
# Functions

@typecheck
def main(filename: str) -> ...:
    """
    Reads the file from given filename, analyzes the data, returns the result 
    """
    # Template from HtDAP, based on function composition 
    return analyze(read(filename)) 
    
    


@typecheck
def analyze(loc: List[Consumed]) -> Produced: 
    """ 
    ... 
    """ 

    return ...


start_testing()

# Examples and tests for main
expect(..., ...)

summary()

start_testing()

# Examples and tests for analyze 
expect(..., ...) 

summary()
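
To give a sense of where the analysis could go, the sketch below answers a simplified version of our question: in which hour of the day do the most crimes occur? The names `CrimeRecord`, `count_by_hour`, and `peak_hour` are assumptions for this example and are not the helper functions that will be provided with the second week's notes.

```python
from typing import List, NamedTuple

# Hypothetical compound type: a crime's type (as text) and hour of day [0, 23]
CrimeRecord = NamedTuple('CrimeRecord', [('kind', str), ('hour', int)])

def count_by_hour(locr: List[CrimeRecord]) -> List[int]:
    """
    returns a list of 24 counts, where position h holds the number
    of crimes in locr that occurred during hour h
    """
    # counts[h] is the number of crimes seen so far in hour h
    counts = [0] * 24  # type: List[int]

    for cr in locr:
        counts[cr.hour] = counts[cr.hour] + 1

    return counts

def peak_hour(locr: List[CrimeRecord]) -> int:
    """
    returns the hour of day with the most crimes in locr
    (the earliest such hour if there is a tie, and 0 if locr is empty)
    """
    counts = count_by_hour(locr)
    return counts.index(max(counts))

LOCR = [CrimeRecord('Theft of Bicycle', 14),
        CrimeRecord('Theft of Vehicle', 2),
        CrimeRecord('Theft of Bicycle', 14)]

print(count_by_hour(LOCR)[14])  # 2
print(peak_hour(LOCR))          # 14
```

Answering the question per crime type would mean filtering the list by type first and then applying the same counting step to each filtered list.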