<div style="float:left">
    <h1 style="width:450px">Live Coding 4: Object-Oriented Programming 2</h1>
    <h2 style="width:450px">Making Use of Packages and Functions</h2>
</div>
<div style="float:right"><img width="100" src="https://github.com/jreades/i2p/raw/master/img/casa_logo.jpg" /></div>

## Task 1: From Notebooks To...

How do we take our work and use it in non-notebook contexts? Let's look at exporting notebooks for a moment via `File` > `Export Notebook As` > `...`. Depending on how you installed everything, you will see a different range of options. The most interesting ones for our purposes are:
- To Executable Script -- this produces a `.py` file that can be run directly in Python. So now if you have done a lot of work in Jupyter and want to turn your work into something more like a traditional application, you can!
- To PDF -- this passes through the intermediary of something called LaTeX, which is a very old but very powerful typesetting 'language'. The quality of output makes Word look like it was designed for 7-year-olds, but getting that quality can be a huge pain.
- To Markdown -- depending on the quality of your Markdown renderer this can be very good and very useful indeed.
- To Reveal.js -- used to create web-compatible presentations. I've never quite got my head around reveal, but it's a common format for writing presentations using entirely Open Source software.

## Task 2: Picking Up Last Week...

We start by recreating the `ds` data structure from last week, which we can do by copying and pasting the code from that session:

#### Task 2.1: Download & Convert to LoL

In [2]:
from urllib.request import urlopen
import csv 

# Given the info you were given above, what do you 
# think the value of 'url' should be? What
# type of variable is it? int or string? 
url = 'https://raw.githubusercontent.com/jreades/fsds/master/data/2019-sample-crime.csv'

# Read the URL stream into variable called 'response'
# using the function that we imported above
response = urlopen(url)

# Now read from the stream, decoding so that we get actual text
datafile = response.read().decode('utf-8')

urlData = [] # Somewhere to store the data

csvfile = csv.reader(datafile.splitlines())

for row in csvfile:
    urlData.append( row )

print(f"urlData has {len(urlData)} rows and {len(urlData[0])} columns.")

urlData has 101 rows and 11 columns.


#### Task 2.2: Convert to DoL

In [3]:
# Now convert this to a data structure!
ds = {}

# Column names as dictionary keys
col_names = urlData[0]
for c in col_names:
    ds[c] = []

In [4]:
print(ds)

{'ID': [], 'Case Number': [], 'Date': [], 'Primary Type': [], 'Description': [], 'Location Description': [], 'Arrest': [], 'Domestic': [], 'Year': [], 'Latitude': [], 'Longitude': []}


In [5]:
# Then values in a list attached to each key
for row in urlData[1:]:
    for c in range(0,len(col_names)):
        #print(col_names[c])
        ds[ col_names[c] ].append( row[c] )
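
The list-of-lists to dictionary-of-lists conversion above can also be written with `zip`. This is only a sketch of an alternative, not the approach used in the session: the `rows` list is a small made-up stand-in for `urlData`, and the names `header`, `records` and `dol` are introduced here purely for illustration.

```python
# Hypothetical stand-in for urlData: a header row followed by data rows
rows = [
    ['ID', 'Primary Type', 'Arrest'],
    ['1', 'THEFT', 'False'],
    ['2', 'NARCOTICS', 'True'],
    ['3', 'THEFT', 'False'],
]

# Split the header from the remaining rows
header, *records = rows

# zip(*records) regroups the values column by column,
# and the outer zip pairs each column with its name
dol = {name: list(column) for name, column in zip(header, zip(*records))}

print(dol)
```

One thing to keep in mind with this version is that `zip` stops at the shortest row, so a ragged file would be silently truncated rather than raising an error.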

#### Task 2.3: Fix Data Types

In [6]:
# Convert the raw data to data of the appropriate
# type: 'column data' (cdata) -> 'column type' (ctype)
def to_type(cdata, ctype):
    fdata = []
    for c in cdata:
        try:
            if ctype==bool:
                fdata.append( c=='True' )
            else:
                fdata.append( ctype(c) )
        except (TypeError, ValueError):
            fdata.append( c )
    return fdata

# And here's my 'map' of column names
# to column data types...
cols = {
    'Latitude':float,
    'Longitude':float,
    'ID':int,
    'Year':int,
    'Arrest':bool,
    'Domestic':bool,
    'Case Number':str,
    'Date':str,
    'Primary Type':str,
    'Description':str,
    'Location Description':str
}

# Now apply this!
for k in ds.keys():
    ds[ k ] = to_type(ds[k], cols[k])

In [7]:
# Check it worked by printing out the first few values 
# of each column type... 
for k, v in ds.items():
    print(f"{k:<15} ({type(v[0]).__name__}): " + ', '.join(str(x) for x in v[:5]))
    # Another way to do it...
    # print(f"{k}:\t" + ", ".join(map(str, v[:5])))

ID              (int): 11667185, 11909178, 11852571, 11804855, 11808164
Case Number     (str): JC237601, JC532226, JC462365, JC405161, JC409088
Date            (str): 04/20/2019 11:00:00 PM, 12/02/2019 10:35:00 AM, 10/06/2019 04:50:00 PM, 08/23/2019 10:00:00 PM, 08/26/2019 12:00:00 AM
Primary Type    (str): BURGLARY, DECEPTIVE PRACTICE, BATTERY, THEFT, BATTERY
Description     (str): FORCIBLE ENTRY, FRAUD OR CONFIDENCE GAME, AGGRAVATED DOMESTIC BATTERY - OTHER DANGEROUS WEAPON, OVER $500, SIMPLE
Location Description (str): COMMERCIAL / BUSINESS OFFICE, GROCERY FOOD STORE, CLEANING STORE, STREET, ALLEY
Arrest          (bool): False, False, True, False, False
Domestic        (bool): False, False, True, False, False
Year            (int): 2019, 2019, 2019, 2019, 2019
Latitude        (float): 41.751307057, 41.903996883000005, 41.880328606, 41.924383963000004, 41.755797128000005
Longitude       (float): -87.60346764, -87.64323023799999, -87.758473298, -87.64144151299999, -87.634426259


## Task 3: Using Set, List, and Dictionary Functions

- What are the distinct Primary Types?
- How many Narcotics charges are there?
- Creating an Index for updating/inserting? 

#### Task 3.1: Distinct Types

Using set to create a categorical data type.

In [9]:
s = set([1,2,2,2,2,3])
print(s)

{1, 2, 3}


In [18]:
s = set(ds['Primary Type'])
print(s)

{'CRIMINAL SEXUAL ASSAULT', 'OTHER OFFENSE', 'BURGLARY', 'STALKING', 'ASSAULT', 'MOTOR VEHICLE THEFT', 'CRIMINAL DAMAGE', 'ROBBERY', 'SEX OFFENSE', 'BATTERY', 'INTERFERENCE WITH PUBLIC OFFICER', 'NARCOTICS', 'DECEPTIVE PRACTICE', 'THEFT', 'WEAPONS VIOLATION'}


And maybe use string methods to make these a little prettier:

In [10]:
ds['Primary Type']

['BURGLARY',
 'DECEPTIVE PRACTICE',
 'BATTERY',
 'THEFT',
 'BATTERY',
 'THEFT',
 'NARCOTICS',
 'THEFT',
 'THEFT',
 'CRIMINAL DAMAGE',
 'ASSAULT',
 'BURGLARY',
 'CRIMINAL DAMAGE',
 'OTHER OFFENSE',
 'NARCOTICS',
 'STALKING',
 'DECEPTIVE PRACTICE',
 'BATTERY',
 'CRIMINAL DAMAGE',
 'BATTERY',
 'THEFT',
 'THEFT',
 'MOTOR VEHICLE THEFT',
 'DECEPTIVE PRACTICE',
 'BURGLARY',
 'THEFT',
 'THEFT',
 'BURGLARY',
 'THEFT',
 'ASSAULT',
 'NARCOTICS',
 'ASSAULT',
 'THEFT',
 'THEFT',
 'THEFT',
 'NARCOTICS',
 'OTHER OFFENSE',
 'BATTERY',
 'MOTOR VEHICLE THEFT',
 'THEFT',
 'THEFT',
 'BATTERY',
 'THEFT',
 'CRIMINAL DAMAGE',
 'DECEPTIVE PRACTICE',
 'CRIMINAL DAMAGE',
 'DECEPTIVE PRACTICE',
 'THEFT',
 'WEAPONS VIOLATION',
 'BURGLARY',
 'THEFT',
 'NARCOTICS',
 'MOTOR VEHICLE THEFT',
 'BATTERY',
 'DECEPTIVE PRACTICE',
 'OTHER OFFENSE',
 'SEX OFFENSE',
 'THEFT',
 'WEAPONS VIOLATION',
 'CRIMINAL SEXUAL ASSAULT',
 'THEFT',
 'THEFT',
 'BATTERY',
 'BATTERY',
 'BATTERY',
 'BURGLARY',
 'THEFT',
 'ROBBERY',
 'THEFT',

In [11]:
[str(x).capitalize() for x in ds['Primary Type']]

['Burglary',
 'Deceptive practice',
 'Battery',
 'Theft',
 'Battery',
 'Theft',
 'Narcotics',
 'Theft',
 'Theft',
 'Criminal damage',
 'Assault',
 'Burglary',
 'Criminal damage',
 'Other offense',
 'Narcotics',
 'Stalking',
 'Deceptive practice',
 'Battery',
 'Criminal damage',
 'Battery',
 'Theft',
 'Theft',
 'Motor vehicle theft',
 'Deceptive practice',
 'Burglary',
 'Theft',
 'Theft',
 'Burglary',
 'Theft',
 'Assault',
 'Narcotics',
 'Assault',
 'Theft',
 'Theft',
 'Theft',
 'Narcotics',
 'Other offense',
 'Battery',
 'Motor vehicle theft',
 'Theft',
 'Theft',
 'Battery',
 'Theft',
 'Criminal damage',
 'Deceptive practice',
 'Criminal damage',
 'Deceptive practice',
 'Theft',
 'Weapons violation',
 'Burglary',
 'Theft',
 'Narcotics',
 'Motor vehicle theft',
 'Battery',
 'Deceptive practice',
 'Other offense',
 'Sex offense',
 'Theft',
 'Weapons violation',
 'Criminal sexual assault',
 'Theft',
 'Theft',
 'Battery',
 'Battery',
 'Battery',
 'Burglary',
 'Theft',
 'Robbery',
 'Theft',

In [19]:
s = set([str(x).capitalize() for x in ds['Primary Type']])
print(s)

{'Sex offense', 'Criminal damage', 'Theft', 'Robbery', 'Assault', 'Motor vehicle theft', 'Battery', 'Other offense', 'Deceptive practice', 'Weapons violation', 'Criminal sexual assault', 'Interference with public officer', 'Burglary', 'Narcotics', 'Stalking'}
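
The second question at the top of this task (how many Narcotics charges are there?) can be answered with list and dictionary-style tools as well. A minimal sketch, assuming a short made-up list `primary_types` in place of `ds['Primary Type']`; `Counter` comes from the standard library's `collections` module.

```python
from collections import Counter

# Hypothetical stand-in for ds['Primary Type']
primary_types = ['BURGLARY', 'THEFT', 'NARCOTICS', 'THEFT', 'NARCOTICS', 'BATTERY']

# A set has no order, so sort it if you want repeatable output
distinct = sorted(set(primary_types))

# list.count() answers 'how many of this one value?'
n_narcotics = primary_types.count('NARCOTICS')

# Counter answers 'how many of every value?' in a single pass
counts = Counter(primary_types)

print(distinct)
print(n_narcotics)
print(counts)
```

Sorting the set is worth the extra call because sets do not preserve any order, which is why the two printed sets above list the same categories in a different sequence.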


#### Task 3.2: Find All Matches

How would we find not just the first match (which is what `index()` returns) but all matches?

- [Google it](https://www.google.com/search?q=python3+find+all+index+of+element+in+list&rlz=1C5CHFA_enGB917GB917&oq=python3+find+all+indexes+of&aqs=chrome.2.69i57j33l3.8922j0j4&sourceid=chrome&ie=UTF-8)
- [Looks promising](https://stackoverflow.com/questions/6294179/how-to-find-all-occurrences-of-an-element-in-a-list) _[Note that one solution uses a list comprehension and the other numpy]_

In [23]:
# Paste in solution as template
# indices = [i for i, x in enumerate(my_list) if x == "whatever"]
# Now we need to modify it for our purposes

target = 'NARCOTICS'
indices = [i for i, x in enumerate(ds['Primary Type']) if x == target]
print(indices)

[6, 14, 30, 35, 51, 75, 99]


In [24]:
# We can't do this as a slice, so a for loop is necessary
result = {}

# Create empty result set
for c in col_names:
    result[c] = []

# Notice how much simpler this is than 
# when we had a list-of-lists
for i in indices:
    for c in col_names:
        result[ c ].append( ds[c][i] )

# But this is *still* slower than just using grep!
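
The two steps above (find the matching positions, then copy those positions out of every column) can be wrapped up in one function. This is a sketch under assumptions: `filter_rows` is a hypothetical helper name, and `sample` is a small made-up dictionary-of-lists standing in for the crime data.

```python
def filter_rows(data, column, target):
    """Return a new dict-of-lists holding only rows where data[column] == target."""
    # Step 1: positions of every match, not just the first
    keep = [i for i, value in enumerate(data[column]) if value == target]
    # Step 2: pull those positions out of each column
    return {name: [values[i] for i in keep] for name, values in data.items()}

# Hypothetical stand-in for the dictionary-of-lists
sample = {
    'ID': [1, 2, 3, 4],
    'Primary Type': ['THEFT', 'NARCOTICS', 'THEFT', 'NARCOTICS'],
    'Arrest': [False, True, False, True],
}

narcotics = filter_rows(sample, 'Primary Type', 'NARCOTICS')
print(narcotics)
```

The function returns a new structure and leaves the input untouched, which makes it safe to call repeatedly with different targets.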

In [25]:
# Check result
for c in result.keys():
    print(f"{c}:\t{result[c]}")

ID:	[11826687, 11669330, 11855998, 11622744, 11592042, 11807399, 11648994]
Case Number:	['JC431308', 'JC241440', 'JC466763', 'JC113884', 'JC147807', 'JC408479', 'JC217114']
Date:	['09/13/2019 07:57:00 AM', '04/28/2019 09:38:00 AM', '10/09/2019 08:48:00 PM', '01/12/2019 09:03:00 AM', '02/11/2019 01:12:00 AM', '08/26/2019 05:41:00 PM', '04/08/2019 11:10:00 PM']
Primary Type:	['NARCOTICS', 'NARCOTICS', 'NARCOTICS', 'NARCOTICS', 'NARCOTICS', 'NARCOTICS', 'NARCOTICS']
Description:	['POSSESSION OF DRUG EQUIPMENT', 'POSS: CANNABIS 30GMS OR LESS', 'POSS: PCP', 'MANU/DELIVER:SYNTHETIC DRUGS', 'POSS: CANNABIS 30GMS OR LESS', 'MANU/DELIVER: HEROIN (WHITE)', 'MANU/DELIVER: HEROIN (WHITE)']
Location Description:	['ALLEY', 'SIDEWALK', 'CTA BUS STOP', 'GROCERY FOOD STORE', 'STREET', 'APARTMENT', 'ABANDONED BUILDING']
Arrest:	[True, True, True, True, True, True, True]
Domestic:	[False, False, False, False, False, False, False]
Year:	[2019, 2019, 2019, 2019, 2019, 2019, 2019]
Latitude:	[41.89036711, 41

## Task 4: Using an Index

Logically, we think of an 'index' as something unique to each row, such that if we see the same value elsewhere in the data we assume either that the existing values should be overwritten or that there is a problem with the data!

- So what would be the index in this case?
- How might we 'remember' this?
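
One way to test whether a column could serve as an index is to check that it contains no repeated values. A rough sketch, assuming a made-up `sample` in place of `ds`; `is_unique` and `row_of` are hypothetical names introduced for illustration.

```python
def is_unique(values):
    """True when no value appears more than once (values must be hashable)."""
    return len(set(values)) == len(values)

# Hypothetical stand-in for ds
sample = {
    'ID': [11667185, 11909178, 11852571],
    'Year': [2019, 2019, 2019],
}

# Which columns are candidates for an index?
unique_cols = [name for name, values in sample.items() if is_unique(values)]
print(unique_cols)

# A lookup from index value to row position, built once
row_of = {value: i for i, value in enumerate(sample['ID'])}
print(row_of[11852571])
```

The dictionary lookup is an optional extra: it trades a little memory for not having to scan the whole column with `index()` on every search.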

In [16]:
idx = 'ID' # Index Column

#### Task 4.1: Updating a Record

In [14]:
new_record = ['11622744','JC113844','11/30/2019 07:00:00 AM','ROBBERY',
              'POSSESSION OF DRUG EQUIPMENT','SIDEWALK','True','False',
              '2020','41.5000000','-87.50000000']

So perhaps we can use our `to_type` function?

In [None]:
# Convert the raw data to data of the appropriate
# type: 'column data' (cdata) -> 'column type' (ctype)
def to_type(cdata, ctype):
    fdata = []
    for c in cdata:
        try:
            if ctype==bool:
                fdata.append( c=='True' )
            else:
                fdata.append( ctype(c) )
        except (TypeError, ValueError):
            fdata.append( c )
    return fdata

In [39]:
# Find the index
print(new_record[ col_names.index(idx) ])
lkp = to_type(new_record[ col_names.index(idx) ], 'int')
print(lkp)
print(ds[idx].index(lkp))

11622744
['1', '1', '6', '2', '2', '7', '4', '4']


ValueError: ['1', '1', '6', '2', '2', '7', '4', '4'] is not in list

Why doesn't that work? 

- Look at the function... 
- Note that it _assumes_ all input is a list!
- So we need to make it more generic
- How do we [check if something is a string](https://www.google.com/search?q=check+if+variable+is+string+python&rlz=1C5CHFA_enGB917GB917&oq=check+if+variable+is+string+python&aqs=chrome..69i57j0l7.4572j0j7&sourceid=chrome&ie=UTF-8)? 
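
The odd-looking output above comes from two things happening at once, which this small sketch tries to isolate. The variable names here are illustrative only.

```python
value = '11622744'

# Looping over a string yields one character at a time,
# which is what the 'for c in cdata' loop did to our ID
chars = [c for c in value]
print(chars)

# The string 'int' is not the type int, so trying to call it
# raises a TypeError (which to_type catches and then ignores)
ctype = 'int'
try:
    ctype('1')
    error = None
except TypeError as e:
    error = e
print(type(error).__name__)

# isinstance lets us tell a single string apart from a list of strings
print(isinstance(value, str), isinstance([value], str))
```

So the function quietly returned a list of single characters instead of one integer, and `index()` then failed because that list is not a value in the `ID` column.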

In [11]:
def to_type(cdata, ctype):
    # If a string, convert the single value
    if isinstance(cdata, str):
        try:
            if ctype==bool:
                # Compare against the text 'True', not the boolean True
                return cdata=='True'
            else:
                return ctype(cdata)
        except (TypeError, ValueError):
            return cdata
    
    # Not a string (assume list)
    else: 
        fdata = []
        for c in cdata:
            try:
                if ctype==bool:
                    fdata.append( c=='True' )
                else:
                    fdata.append( ctype(c) )
            except (TypeError, ValueError):
                fdata.append( c )
        return fdata

In [12]:
# And here's my 'map' of column names
# to column data types...
col_map = {
    'Latitude':float,
    'Longitude':float,
    'ID':int,
    'Year':int,
    'Arrest':bool,
    'Domestic':bool,
    'Case Number':str,
    'Date':str,
    'Primary Type':str,
    'Description':str,
    'Location Description':str
}

In [18]:
# Find the index
print(new_record[ col_names.index(idx) ])

# What's going on in this next line? Can't figure it out? Print out the parts!
lkp  = to_type(new_record[ col_names.index(idx) ], col_map[idx])

# Now find the Index Row so that we can update
idxr = ds[idx].index(lkp)
print(f"Index is: {idxr}")

# Output the record
for c in ds.keys():
    print(f"\t{c:<20} -> {ds[c][idxr]}")

11622744
Index is: 35
	ID              -> 11622744
	Case Number     -> JC113884
	Date            -> 01/12/2019 09:03:00 AM
	Primary Type    -> NARCOTICS
	Description     -> MANU/DELIVER:SYNTHETIC DRUGS
	Location Description -> GROCERY FOOD STORE
	Arrest          -> True
	Domestic        -> False
	Year            -> 2019
	Latitude        -> 41.880715605
	Longitude       -> -87.726481891


What's there now?

And now for the update!

In [19]:
# How would we know that the update was working?
# Print out the Key, Original Value, and New Value!
for cid in range(0,len(col_names)):
    print(col_names[cid])
    print("\t" + str(ds[col_names[cid]][idxr]))
    print("\t" + str(new_record[cid]))

ID
	11622744
	11622744
Case Number
	JC113884
	JC113844
Date
	01/12/2019 09:03:00 AM
	11/30/2019 07:00:00 AM
Primary Type
	NARCOTICS
	ROBBERY
Description
	MANU/DELIVER:SYNTHETIC DRUGS
	POSSESSION OF DRUG EQUIPMENT
Location Description
	GROCERY FOOD STORE
	SIDEWALK
Arrest
	True
	True
Domestic
	False
	False
Year
	2019
	2020
Latitude
	41.880715605
	41.5000000
Longitude
	-87.726481891
	-87.50000000


In [57]:
# And now the update
for cid in range(0,len(col_names)):
    ds[col_names[cid]][idxr] = to_type(new_record[cid], col_map[col_names[cid]])

In [20]:
# Check the updated record
for c in ds.keys():
    print(f"{c:>20}\t->\t{ds[c][idxr]}")

                  ID	->	11622744
         Case Number	->	JC113884
                Date	->	01/12/2019 09:03:00 AM
        Primary Type	->	NARCOTICS
         Description	->	MANU/DELIVER:SYNTHETIC DRUGS
Location Description	->	GROCERY FOOD STORE
              Arrest	->	True
            Domestic	->	False
                Year	->	2019
            Latitude	->	41.880715605
           Longitude	->	-87.726481891


#### Task 4.2 (Optional): Creating Functions from Code

- Turn the 'find value in index' code into a function
- Turn the 'update value based on index' code into a function
- Create an 'add row' function
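
One possible shape for these three functions is sketched below. It is not a reference answer: the names `find_row`, `add_row` and `update_row` are made up for illustration, and the sketch assumes that each record arrives as a dictionary keyed by column name, contains every column, and has already been converted to the right types.

```python
def find_row(data, index_col, key):
    """Return the row position of key in the index column, or None if absent."""
    try:
        return data[index_col].index(key)
    except ValueError:
        return None

def add_row(data, record):
    """Append record (a dict keyed by column name) to every column."""
    for name in data:
        data[name].append(record[name])

def update_row(data, index_col, record):
    """Overwrite the row whose index matches record[index_col]; add it if missing."""
    row = find_row(data, index_col, record[index_col])
    if row is None:
        add_row(data, record)
    else:
        for name in data:
            data[name][row] = record[name]

# Hypothetical stand-in for ds
sample = {'ID': [1, 2], 'Primary Type': ['THEFT', 'BATTERY']}

update_row(sample, 'ID', {'ID': 2, 'Primary Type': 'ROBBERY'})  # existing ID: overwrite
update_row(sample, 'ID', {'ID': 3, 'Primary Type': 'ASSAULT'})  # new ID: append
print(sample)
```

Having `update_row` fall back to `add_row` matches the idea of an index introduced earlier: a value that is already present means overwrite, and a value that is not means insert.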