# Debugging

## Syntax Errors

Given two lists, one of people's names and another of their scores, create a list of tuples such that for each person you have a tuple of their name and their score.

You might come up with a solution that looks like:

In [8]:
names = ['a','b','c','d','e']
scores = [90,76,55,82,88]

In [11]:
people_and_scores = []
for i in range(len(names)):
    people_and_scores.append((names[i],scores[i]))
people_and_scores
    

[('a', 90), ('b', 76), ('c', 55), ('d', 82), ('e', 88)]

There's a better way of doing it: the built-in **zip** function.

Let's take a look at the documentation for zip:
https://docs.python.org/3.5/library/functions.html#zip
Hmm. Not all that useful on its own, so let's try it out:

In [19]:
zip(names,scores)

<zip at 0x7fe95048fb88>

In [6]:
for i in zip(names,scores):
    print(i)

('a', 90)
('b', 76)
('c', 55)
('d', 82)
('e', 88)


In [20]:
people_and_scores2 = []
for i in zip(names,scores):
    people_and_scores2.append(i)
people_and_scores2

[('a', 90), ('b', 76), ('c', 55), ('d', 82), ('e', 88)]

In [24]:
people_and_scores3 = list(zip(names,scores))
people_and_scores3

[('a', 90), ('b', 76), ('c', 55), ('d', 82), ('e', 88)]

Ok, but let's say you had a structure that looks like people_and_scores and you wanted to extract just the names.  How would you do that?

In [26]:
names = []
for i in people_and_scores:
    names.append(i[0])
names
    

['a', 'b', 'c', 'd', 'e']

Back to our documentation:
https://docs.python.org/3.5/library/functions.html#zip

There's a blurb there that says:
> zip() in conjunction with the * operator can be used to unzip a list:

followed by a code example:

```
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)
>>> list(zipped)
[(1, 4), (2, 5), (3, 6)]
>>> x2, y2 = zip(*zip(x, y))
>>> x == list(x2) and y == list(y2)
True
```

Searching the web for "python zip explained" turns up:
https://stackoverflow.com/questions/19339/transpose-unzip-function-inverse-of-zip#19343

In [30]:
list(list(zip(*people_and_scores))[0])

['a', 'b', 'c', 'd', 'e']
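
To see why this works, here is a minimal sketch that reuses the sample names and scores from above. The `*` operator unpacks the list so that each tuple becomes a separate argument to `zip`, which then groups all the first items together and all the second items together. The variable names (`pairs`, `unzipped`, and so on) are only for illustration.

```python
pairs = [('a', 90), ('b', 76), ('c', 55), ('d', 82), ('e', 88)]

# zip(*pairs) is the same call as zip(pairs[0], pairs[1], ..., pairs[4])
unzipped = list(zip(*pairs))
same_thing = list(zip(pairs[0], pairs[1], pairs[2], pairs[3], pairs[4]))

# Each result is a tuple, so convert to a list if you need one
names_only = list(unzipped[0])
scores_only = list(unzipped[1])
print(names_only)
print(scores_only)
```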

Note also, however, that there's a link that talks about using generators:

https://stackoverflow.com/questions/30805000/how-to-unzip-an-iterator


In [70]:
import itertools

In [86]:
names,scores = itertools.tee(people_and_scores)

In [82]:
names

<itertools._tee at 0x7fe95014aa08>

In [83]:
for n in names:
    print(n,type(n))

('a', 90) <class 'tuple'>
('b', 76) <class 'tuple'>
('c', 55) <class 'tuple'>
('d', 82) <class 'tuple'>
('e', 88) <class 'tuple'>


In [87]:
names = (x[0] for x in names)

In [88]:
for n in names:
    print(n,type(n))

a <class 'str'>
b <class 'str'>
c <class 'str'>
d <class 'str'>
e <class 'str'>


In [47]:
for i in names:
    print(i)

a
b
c
d
e
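
Putting the pieces together, here is a small sketch of the `itertools.tee` approach on a shortened, made-up list. `tee` hands back independent iterators over the same data, and each generator expression lazily picks one field. The last line shows something that is easy to trip over: a generator can be consumed only once, so a second pass over it yields nothing.

```python
import itertools

pairs = [('a', 90), ('b', 76), ('c', 55)]

# tee() returns independent iterators over the same underlying data
first, second = itertools.tee(pairs)

# Each generator expression lazily picks one field from its own copy
names_gen = (p[0] for p in first)
scores_gen = (p[1] for p in second)

names_list = list(names_gen)
scores_list = list(scores_gen)

# A generator can only be consumed once: a second pass yields nothing
leftover = list(names_gen)
print(names_list, scores_list, leftover)
```

Keep in mind that `tee` has to buffer items that one iterator has seen and the other has not, so fully consuming one side before the other stores the whole sequence in memory.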


Ok, let's try this again.

### Top-level goal:  to create a list of (lat, lon) tuples where lat is between X and Y

We're going to read a file efficiently using a generator:


In [14]:
filename = 'assets/ride_final2.csv'

In [15]:
def read_lat_and_lon_by_line(filename):
    with open(filename) as f:
        while True:
            line = f.readline()
            if not line:
                break
            data = line.split(',')
            yield (data[1],data[2])

In [16]:
f = read_lat_and_lon_by_line(filename)

In [17]:
f

<generator object read_lat_and_lon_by_line at 0x00000233D7CEDEB0>

In [18]:
count = 0
for i in read_lat_and_lon_by_line(filename):
    count = count+1
    if count > 5:
        break
    print(i)

('"Latitude"', '"Longitude"')
('"504719750"', '"-998493490"')
('"504717676"', '"-998501870"')
('"504716354"', '"-998506792"')
('"504714055"', '"-998515244"')


Let's get rid of the first line (the header line):

In [19]:
def read_lat_and_lon_by_line(filename):
    with open(filename) as f:
        first = True
        while True:
            line = f.readline()
            if first:
                line = f.readline()
                first = False
            if not line:
                break
            data = line.split(',')
            yield (data[1],data[2])
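
A shorter way to skip the header, shown here as a sketch rather than a drop-in replacement, is to call `next()` on the file object once and then loop over the remaining lines. The function name `read_lat_and_lon_skip_header`, the `"Time"` column, and the temporary sample file are assumptions made so the example runs on its own; the only thing taken from the real file is that latitude and longitude sit in columns 1 and 2.

```python
import os
import tempfile

def read_lat_and_lon_skip_header(filename):
    with open(filename) as f:
        next(f, None)  # discard the header line; None avoids StopIteration on an empty file
        for line in f:
            data = line.rstrip('\n').split(',')
            yield (data[1], data[2])

# Hypothetical three-column sample standing in for the real ride file
sample = '"Time","Latitude","Longitude"\n"1","504719750","-998493490"\n"2","504717676","-998501870"\n'
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(sample)

rows = list(read_lat_and_lon_skip_header(tmp.name))
os.remove(tmp.name)
print(rows)
```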

In [21]:
count = 0
for i in read_lat_and_lon_by_line(filename):
    count = count+1
    if count > 5:
        break
    print(i)

('"504719750"', '"-998493490"')
('"504717676"', '"-998501870"')
('"504716354"', '"-998506792"')
('"504714055"', '"-998515244"')
('"504711900"', '"-998523278"')


In [22]:
((lat,lon) for (lat,lon) in read_lat_and_lon_by_line(filename) if lon < -998493490 )

<generator object <genexpr> at 0x00000233D7D40510>
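
Notice that creating the generator expression raised no error: nothing is evaluated until it is iterated. The sketch below assumes the values are still the quoted strings seen in the earlier output, and uses a stand-in generator (`fake_reader`, a made-up name) instead of the real file. Comparing a string with an integer fails only when the first item is requested.

```python
def fake_reader():
    # Stand-in for read_lat_and_lon_by_line: yields strings, not numbers
    yield ('"504719750"', '"-998493490"')
    yield ('"504717676"', '"-998501870"')

# Building the generator expression runs none of the filter logic yet
g = ((lat, lon) for (lat, lon) in fake_reader() if lon < -998493490)

try:
    next(g)
    error_name = None
except TypeError as err:
    # '<' is not supported between 'str' and 'int' in Python 3
    error_name = type(err).__name__
    print(error_name, err)
```

Converting the fields to integers before comparing them avoids the problem.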

In [23]:
import csv
def read_lat_and_lon_with_reader(filename):
    with open(filename, 'r') as csvfile:
        csvreader = csv.DictReader(csvfile)
        for row in csvreader:
            yield (int(row['Latitude']),int(row['Longitude']))
        
    

In [24]:
g = ((lat,lon) for (lat,lon) in read_lat_and_lon_with_reader(filename) if lon < -998493490 )

In [25]:
for r in g:
    print(r)

(504717676, -998501870)
(504716354, -998506792)
(504714055, -998515244)
(504711900, -998523278)
(504709729, -998531192)
(504707299, -998540018)
(504705967, -998544934)
(504703695, -998553170)
(504701326, -998561924)
(504700547, -998564668)
(504698641, -998571568)
(504696909, -998577942)
(504695977, -998581247)
(504695051, -998584562)
(504692926, -998592970)
(504692111, -998597619)
(504691299, -998606407)
(504691173, -998612263)
(504690996, -998620769)
(504690902, -998629455)
(504690919, -998630192)
(504690924, -998630680)
(504691253, -998633926)
(504691172, -998633950)
(504684780, -998633605)
(504677722, -998633704)
(504675542, -998633688)
(504673200, -998633697)
(504666467, -998633752)
(504665599, -998633751)
(504661165, -998633676)
(504656517, -998633739)
(504653635, -998633704)
(504646885, -998633610)
(504645873, -998633596)
(504638364, -998633928)
(504636279, -998634532)
(504635317, -998635011)
(504631201, -998637323)
(504630451, -998638943)
(504630652, -998641677)
(504630408, -998

Next: debugging with print() statements

For example, let's say there's some bad data in the data file.
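
A minimal sketch of the idea, using made-up in-memory data with one malformed latitude instead of the real file: print the raw values before converting them, so the offending row is visible right before anything fails.

```python
import csv
import io

# Hypothetical data with one malformed latitude, standing in for a bad file
sample = io.StringIO('Latitude,Longitude\n504719750,-998493490\nN/A,-998501870\n')

bad_rows = []
# start=2 because line 1 of the file is the header
for line_number, row in enumerate(csv.DictReader(sample), start=2):
    # Print the raw values before converting so the offending row is visible
    print(line_number, repr(row['Latitude']), repr(row['Longitude']))
    try:
        int(row['Latitude']), int(row['Longitude'])
    except ValueError:
        bad_rows.append(line_number)

print('bad rows:', bad_rows)
```

Using `repr()` in the print makes stray quotes and whitespace show up, which plain `print()` can hide.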



Understanding error stacks
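
As a rough illustration (the function names here are made up), a traceback is easiest to read from the bottom up: the last line names the exception and its message, and the frames above it show the chain of calls that led there, most recent call last.

```python
import traceback

def parse_latitude(text):
    return int(text)

def parse_row(row):
    return parse_latitude(row[0])

try:
    parse_row(['N/A', '-998501870'])
    trace_text = ''
except ValueError:
    # format_exc() returns the same text Python would print for the error
    trace_text = traceback.format_exc()

print(trace_text)
```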

Passing reference to pdb, set_trace

PixieDebugger?

## Copy-and-paste errors

From https://datascienceplus.com/how-to-achieve-parallel-processing-in-python-programming/

Copy the following code and run it:
```
import multiprocessing as multip
print(“Total number of processors on your machine is: ”, multip.cpu_count())
```

What's wrong?

In [5]:
import multiprocessing as multip
print("Total number of processors on your machine is: ", multip.cpu_count())

Total number of processors on your machine is:  6
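
The difference between the two versions is the quote characters: the pasted code uses typographic ("curly") quotes, which Python does not accept as string delimiters. The sketch below demonstrates this with `compile()` on a shortened version of the line, so the error can be caught instead of stopping the cell; the variable names are illustrative.

```python
# The pasted line uses typographic quotes (U+201C / U+201D), not ASCII quotes
pasted = 'print(\u201cTotal number of processors\u201d)'

try:
    compile(pasted, '<pasted>', 'exec')
    result = 'compiled'
except SyntaxError as err:
    result = 'SyntaxError'
    print(result, err.msg)

# Replacing the curly quotes with straight ones makes the line valid
fixed = pasted.replace('\u201c', '"').replace('\u201d', '"')
compile(fixed, '<fixed>', 'exec')
```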


## set_trace

## Pixie Debugger

In [26]:
import csv

In [27]:
filename = 'assets/ride_final2_extra.csv'

In [28]:
from IPython.core.debugger import set_trace


In [29]:
def read_lat_and_lon_with_reader(filename):
    with open(filename, 'r') as csvfile:
        csvreader = csv.DictReader(csvfile)
        for row in csvreader:
            print(row['Latitude'],row['Longitude'])
            yield (int(row['Latitude']),int(row['Longitude']))

In [30]:
for i in read_lat_and_lon_with_reader(filename):
    print(i)


FileNotFoundError: [Errno 2] No such file or directory: 'assets/ride_final2_extra.csv'

Search the web for this error message.

There are a couple of ways to proceed:
1. Fix the data file.
2. Write a more general routine.

Question: under what circumstances is each of these approaches preferred?

In [18]:
# option 1: fix the datafile.

In [16]:
import numpy as np
def read_lat_and_lon_with_reader(filename):
    with open(filename, 'r') as csvfile:
        csvreader = csv.DictReader(csvfile)
        for row in csvreader:
            print(row['Latitude'],row['Longitude'])
            try:
                yield (int(row['Latitude']),int(row['Longitude']))
            except (ValueError, TypeError):
                # Non-numeric or missing field: emit NaNs instead of stopping
                yield np.nan,np.nan

In [17]:
for i in read_lat_and_lon_with_reader(filename):
    print(i)

504719750 -998493490
(504719750, -998493490)
504717676 -998501870
(504717676, -998501870)
504716354 -998506792
(504716354, -998506792)
504714055 -998515244
(504714055, -998515244)
504711900 -998523278
(504711900, -998523278)
504709729 -998531192
(504709729, -998531192)
504707299 -998540018
(504707299, -998540018)
504705967 -998544934
(504705967, -998544934)
504703695 -998553170
(504703695, -998553170)
504701326 -998561924
(504701326, -998561924)
504700547 -998564668
(504700547, -998564668)
504698641 -998571568
(504698641, -998571568)
504696909 -998577942
(504696909, -998577942)
504695977 -998581247
(504695977, -998581247)
504695051 -998584562
(504695051, -998584562)
504692926 -998592970
(504692926, -998592970)
504692111 -998597619
(504692111, -998597619)
504691299 -998606407
(504691299, -998606407)
504691173 -998612263
(504691173, -998612263)
504690996 -998620769
(504690996, -998620769)
504690902 -998629455
(504690902, -998629455)
504690919 -998630192
(504690919, -998630192)
504690924 

In [31]:
import random
def find_max (values):
    max = 0
    for val in values:
        if val < max:
            max = val
    return max
print(find_max(random.sample(range(100), 10)))

0


In [34]:
import pixiedust

Pixiedust database opened successfully
Table VERSION_TRACKER created successfully
Table METRICS_TRACKER created successfully

Share anonymous install statistics? (opt-out instructions)

PixieDust will record metadata on its environment the next time the package is installed or updated. The data is anonymized and aggregated to help plan for future releases, and records only the following values:

{
   "data_sent": currentDate,
   "runtime": "python",
   "application_version": currentPixiedustVersion,
   "space_id": nonIdentifyingUniqueId,
   "config": {
       "repository_id": "https://github.com/ibm-watson-data-lab/pixiedust",
       "target_runtimes": ["Data Science Experience"],
       "event_id": "web",
       "event_organizer": "dev-journeys"
   }
}
You can opt out by calling pixiedust.optOut() in a new cell.


Pixiedust runtime updated. Please restart kernel
Table USER_PREFERENCES created successfully
Table service_connections created successfully


In [35]:
%%pixie_debugger
import random
def find_max (values):
    max = 0
    for val in values:
        if val < max:
            max = val
    return max
print(find_max(random.sample(range(100), 10)))

But keep in mind that most common operations are already implemented efficiently as built-ins or in the standard library:

In [36]:
max(random.sample(range(100),10))

97

In [37]:
import inspect

In [38]:
inspect.getmodule(max)

<module 'builtins' (built-in)>

In [40]:
max??

Docstring:
max(iterable, *[, default=obj, key=func]) -> value
max(arg1, arg2, *args, *[, key=func]) -> value

With a single iterable argument, return its biggest item. The
default keyword-only argument specifies an object to return if
the provided iterable is empty.
With two or more arguments, return the largest argument.
Type:      builtin_function_or_method


In [41]:
from decimal import Decimal

In [42]:
Decimal(1).max(Decimal(2))


Decimal('2')

In [43]:
Decimal??

Init signature: Decimal(value='0', context=None)
Docstring:
Construct a new Decimal object. 'value' can be an integer, string, tuple,
or another Decimal object. If no value is given, return Decimal('0'). The
context does not affect the conversion and is only passed to determine if
the InvalidOperation trap is active.
File:           c:\users\huang\appdata\local\programs\python\python39\lib\decimal.py
Type:           type
Subclasses:


In [44]:
import pixiedust

In [45]:
%%pixie_debugger
import random
def find_max (values):
    max = 0
    for val in values:
        if val > max:
            max = val
    return max
print(find_max(random.sample(range(100), 10)))


In [14]:
find_max([1,2,3])

3
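
One way to gain confidence in a fix like this is to compare it against the built-in on a few inputs. The sketch below is an illustration, not part of the original notebook; it uses the name `largest` instead of `max` inside the function to avoid shadowing the built-in. Note that starting from 0 still gives the wrong answer when every value is negative.

```python
import random

def find_max(values):
    largest = 0  # same starting value as the version above
    for val in values:
        if val > largest:
            largest = val
    return largest

# Agrees with the built-in for non-negative samples
sample = random.sample(range(100), 10)
assert find_max(sample) == max(sample)

# Still wrong when all values are negative, because of the initial 0
negatives = [-5, -2, -9]
print(find_max(negatives), max(negatives))
```

Initializing from the first element of the input, or simply calling the built-in `max`, avoids that edge case.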