# E2 - debugging python
## Introduction
In this exercise we will explore a few ways of debugging. Each has advantages and disadvantages, and which methods you use while coding is a matter of personal preference. Download this ipynb file and open it in VS Code.

## The task
We have been given a csv called 'test.csv' which has 4 columns containing numerical values between 0 and 1. Our task is to find the summed square roots for each row: for each row we take the square root of each value and sum the results. To get a quick look at how our dataset is formatted, let's look at the first 10 rows:

In [2]:
import pandas as pd
x = pd.read_csv('test.csv',header=None)

x.head(10)



|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0.548814 | 0.715189 | 0.602763 | 0.544883 |
| 1 | 0.423655 | 0.645894 | 0.437587 | 0.891773 |
| 2 | 0.963663 | 0.383442 | 0.791725 | 0.528895 |
| 3 | 0.568045 | 0.925597 | 0.071036 | 0.087129 |
| 4 | 0.020218 | 0.83262 | 0.778157 | 0.870012 |
| 5 | 0.978618 | 0.799159 | 0.461479 | 0.780529 |
| 6 | 0.118274 | 0.639921 | 0.143353 | 0.944669 |
| 7 | 0.521848 | 0.414662 | 0.264556 | 0.774234 |
| 8 | 0.45615 | 0.568434 | 0.01879 | 0.617635 |
| 9 | 0.612096 | 0.616934 | 0.943748 | 0.68182 |


As expected, we have 4 columns and all the values seem to be numbers between 0 and 1. Now we can write our code to build a list, y, of all the summed square roots.

In [8]:
import math
y = []
for index, row in x.iterrows():
    summed_sqrts = 0
    for value in row:
        summed_sqrts += math.sqrt(value)
    y.append(summed_sqrts)

ValueError: math domain error

Oh no! We got an error:

```
ValueError                                Traceback (most recent call last)
Cell In[8], line 6
      4 summed_sqrts = 0
      5 for value in row:
----> 6     summed_sqrts += math.sqrt(value)
      7 y.append(summed_sqrts)

ValueError: math domain error
```
The challenge now is to understand why we are getting this error. Our first hint is the error message itself. In this case, it tells us that on line 6 we are passing an invalid value to math.sqrt(). A "math domain error" means the argument lies outside the set of inputs the function accepts, which for math.sqrt() typically means a negative number.
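
To see this kind of error in isolation, the minimal sketch below calls math.sqrt() on one valid and one invalid input. The helper name try_sqrt is hypothetical and exists only for this illustration; the exact wording of the error message may differ between Python versions.

```python
import math

def try_sqrt(value):
    """Return the square root of value, or the error text if math.sqrt() rejects it."""
    try:
        return math.sqrt(value)
    except ValueError as err:
        return f"ValueError: {err}"

print(try_sqrt(0.25))   # a valid input between 0 and 1
print(try_sqrt(-0.25))  # a negative input is outside the domain of math.sqrt()
```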

Let's try to find out exactly which value in our csv is causing this error and why. To do this we will go over a few methods.

## Print debugging
One of the most common methods of debugging is the use of print statements. By inserting print statements into your code you can see what values your variables hold at various points. In this case we can print the value variable just before we take its square root. This way, the last value printed before the error is the one causing it.

In [9]:
y = []
for index, row in x.iterrows():
    summed_sqrts = 0
    for value in row:
        print(value)
        summed_sqrts += math.sqrt(value)
    y.append(summed_sqrts)

0.5488135039273248
0.7151893663724195
0.6027633760716439
0.5448831829968969
0.4236547993389047
0.6458941130666561
0.43758721126269245
0.8917730007820797
0.9636627605010292
0.3834415188257777
0.7917250380826646
0.5288949197529045
0.5680445610939323
0.9255966382926609
0.07103605819788694
0.08712929970154071
0.02021839744032572
0.8326198455479381
0.7781567509498506
0.870012148246819
0.978618342232764
0.7991585642167237
0.46147936225293185
0.7805291762864554
0.1182744258689332
0.6399210213275238
0.1433532874090464
0.944668917049584
0.5218483217500717
0.4146619399905236
0.26455561210462697
0.7742336894342167
0.4561503322165485
0.5684339488686485
0.01878980043635514
0.6176354970758771
0.6120957227224214
0.6169339968747569
0.943748078514624
0.6818202991034834
0.359507900573786
0.4370319537993415
0.6976311959272649
0.06022547162926983
0.6667667154456677
0.6706378696181594
0.21038256107384087
0.1289262976548533
0.31542835092418386
0.36371077094262255
0.5701967704178796
0.43860151346232035
0.988

ValueError: math domain error

We can see that we generated a very long list of values, most of which we aren't interested in. This can work, although it can cause issues if we end up printing more values than our computer can deal with.
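
One way to keep print debugging manageable is to print only the values that look suspicious, together with their position. The sketch below is an illustration under assumptions: rows is a small hypothetical stand-in for the dataset (not the contents of test.csv), and find_invalid is a made-up helper name. With the DataFrame x, the outer loop would iterate over x.iterrows() instead.

```python
import math

# Hypothetical stand-in for a few rows of the dataset; the real data lives in test.csv.
rows = [
    [0.55, 0.72, 0.60, 0.54],
    [0.42, -0.39, 0.44, 0.89],  # contains a deliberately invalid value
]

def find_invalid(rows):
    """Print and return the position and value of every negative entry."""
    invalid = []
    for row_num, row in enumerate(rows):
        for col_num, value in enumerate(row):
            # math.sqrt() rejects negative numbers, so only those are reported.
            if value < 0:
                print(f"row {row_num}, column {col_num}: {value}")
                invalid.append((row_num, col_num, value))
    return invalid

invalid = find_invalid(rows)
```

This prints one line per problem value instead of one line per value, so the output stays short even for a large file.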

## VS Code debugger
The VS Code debugger is very helpful for watching how variables change as your code runs. It stops running the code at breakpoints and lets you look at variable values and step through the code line by line. Add a breakpoint at the commented text below and debug the cell. Step through your code line by line. Watch how y, value, summed_sqrts and index all change as you run through your code.

If you want a breakpoint at any errors that are raised, make sure you tick the 'Raised Exceptions' and 'Uncaught Exceptions' breakpoint options. If we do this, we can see that value is a negative number when the exception is raised. 

In [10]:
y = []
# Insert breakpoint here, then run and debug and step through
for index, row in x.iterrows():
    summed_sqrts = 0
    for value in row:
        summed_sqrts += math.sqrt(value)
    y.append(summed_sqrts)

ValueError: math domain error

## PDB - Python debugger
PDB is a Python module made for debugging Python code. It has similar functionality to the VS Code debugger, although it is called from within your Python script. The post_mortem() function lets you inspect your variables after an exception is raised. In practice this opens a text box that you can enter commands into. Run the code below and type either "print(value)" or "p value" to print the value that caused the error. Type 'exit' to leave pdb.

Note that in this version of the code I have added a variable called col_num and renamed index to row_num. This means that we can also print the row and column of the value causing the error.

In [7]:
y = []
for row_num, row in x.iterrows():
    summed_sqrts = 0
    for col_num, value in enumerate(row):
        try:
            summed_sqrts += math.sqrt(value)
        except:
            import pdb; pdb.post_mortem()
    y.append(summed_sqrts)

> /tmp/ipykernel_12806/3234858595.py(6)<module>()
      4     for col_num, value in enumerate(row):
      5         try:
----> 6             summed_sqrts += math.sqrt(value)
      7         except:
      8             import pdb; pdb.post_mortem()

683
-0.3897864227871187
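
The same idea can be wrapped in a function so that the position of every failing value is recorded and the interactive prompt is optional. This is a sketch under assumptions, not the notebook's own code: the function summed_sqrts_by_row, its debug flag, and the small rows list are all hypothetical. It catches only ValueError rather than using a bare except, and passes the exception's traceback to pdb.post_mortem() explicitly.

```python
import math
import pdb

def summed_sqrts_by_row(rows, debug=False):
    """Sum the square roots of each row, recording where math.sqrt() fails."""
    sums, failures = [], []
    for row_num, row in enumerate(rows):
        total = 0.0
        for col_num, value in enumerate(row):
            try:
                total += math.sqrt(value)
            except ValueError as err:
                # Remember where the invalid value was found.
                failures.append((row_num, col_num, value))
                if debug:
                    # Opens the interactive pdb prompt at the failing line.
                    pdb.post_mortem(err.__traceback__)
        sums.append(total)
    return sums, failures

# Hypothetical stand-in data; with the DataFrame x you could pass x.values.tolist().
rows = [[0.25, 0.0], [1.0, -0.39]]
sums, failures = summed_sqrts_by_row(rows)
print(failures)
```

Note that an invalid value is simply skipped in the row's sum here, so any row listed in failures has an incomplete total and should be handled before the results are used.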
