# Intro to Debugging
## Facts on Debugging
* Testing and debugging typically consume about 50% of a software project's budget, and the share may reach 75%.

* In 2002, software bugs cost an estimated 59.5 billion dollars a year, but improvements in testing and debugging could reduce that cost by about a third, or roughly 22 billion dollars a year.

The worst thing about debugging is that it is a search process that can take anywhere from a few minutes to hours or days. If you don't know how long it may take, it's best to use a systematic process.

### Where does the term bug come from?
On September 9th, 1947, a moth got stuck in a relay, was carbonized, and caused a short circuit that broke the machine. Technicians retrieved the moth from the relay, and this is the first bug found on a computer.

## Remove HTML Markup
![](img/remove_html.PNG)

We are tasked with removing HTML tags from an input. Below we have an example of an input and output. We also have a picture of a finite state machine that represents how our code will work. We can be in one of two states: <code>tag</code> mode or <code>not tag</code> mode. In not tag mode, we add each character to the output unless the character is <code><</code>, in which case we move to tag mode. In tag mode, the character <code>></code> moves us back to not tag mode; in all other cases we stay in tag mode. Now let's see how we translate this to code.

In [2]:
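def remove_html_markup(s):
    tag = False  # True while we are inside a tag
    out = ""     # characters kept so far
    for c in s:
        if c == '<':      # start of a tag: enter tag mode
            tag = True
        elif c == '>':    # end of a tag: leave tag mode
            tag = False
        elif not tag:     # character outside a tag: keep it
            out = out + c
    return out

print(remove_html_markup("<b>foo</b>"))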

foo


Our first example seems to work exactly as we expected, but now we should check a few more examples.

In [3]:
print(remove_html_markup("""<a href="foo.html">foo</a>"""))
print(remove_html_markup("""<a href=">">foo</a>"""))
print(remove_html_markup("""<a href="">foo</a>"""))

foo
"foo
foo


Looks like we have a bug in our code! We need to account for <code>></code> inside an HTML tag: angle brackets that appear inside quotation marks should be ignored. Let's add a third state to our state machine.

![](img/first_bug.PNG)

In [9]:
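def remove_html_markup(s):
    tag = False    # True while we are inside a tag
    quote = False  # True while we are inside a quoted attribute value
    out = ""

    for c in s:
        if c == '<' and not quote:    # start of a tag
            tag = True
        elif c == '>' and not quote:  # end of a tag
            tag = False
        elif c == '"' or c == "'" and tag:  # quote character: toggle quote mode
            quote = not quote
        elif not tag:                 # character outside a tag: keep it
            out = out + c
    return out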
print(remove_html_markup("""<a href="foo.html">foo</a>"""))
print(remove_html_markup("""<a href=">">foo</a>"""))
print(remove_html_markup("""<a href="">foo</a>"""))

foo
foo
foo


Looks like we are now passing all of our tests. Let's see if our code has any other bugs.

In [5]:
print(remove_html_markup('"<b>foo</b>"'))

<b>foo</b>


Looks like there is still a bug in our code. The markup still appears in the output of this test. The first thing we may want to do is print out everything so we can see what is happening in our code. However, leftover print statements can cause a security nightmare: in Mac OS X version 10.7.2/10.7.3 there was a security issue because a programmer had left debugging print statements in the code. As a result, the code that decides whether to let a user in also stored a log of the passwords entered. Before we go any further, let's review the devil's guide to debugging, which tells us how not to go about debugging our code.

## The Devil's Guide to Debugging
* Scatter output statements everywhere
* Debug the program into existence
* Never back up earlier versions
* Don't bother understanding what the program should do
* Use the most obvious fix

A failure is an error in the program's behavior that can be observed from outside. A defect is an error in the code that may result in a failure; a defect is also referred to as a bug. An infection is an error in the program state, and every infection can be traced back to a defect that caused it.

## The Scientific Method
Say you are Isaac Newton, sitting under a tree, and an apple falls. If apples keep falling, you may want to see whether bricks fall down too. It seems cups also fall down. You may conclude that everything falls down. Later on, someone points out that a balloon goes up, so the hypothesis has to be refined.

* Initial observation
* Hypothesis
* Prediction
* Experiment
* Observation
* If the observation supports the hypothesis, refine it; if it rejects the hypothesis, create a new one

Repeat these steps until your hypothesis is consistent with your observation.
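
The loop above can be sketched in code. This is only an illustration built on the falling-objects story; the names `observations`, `everything_falls`, `balloons_rise`, and `contradictions` are made up for this sketch.

```python
# Observations: (object, direction it moved when released).
observations = [
    ("apple", "down"),
    ("brick", "down"),
    ("cup", "down"),
    ("balloon", "up"),
]

def everything_falls(obj):
    """First hypothesis: every object falls down."""
    return "down"

def balloons_rise(obj):
    """Refined hypothesis (illustrative): balloons rise, everything else falls."""
    return "up" if obj == "balloon" else "down"

def contradictions(hypothesis, observations):
    """Experiment: return the observations that the hypothesis predicts wrongly."""
    return [(obj, seen) for obj, seen in observations if hypothesis(obj) != seen]

print(contradictions(everything_falls, observations))  # [('balloon', 'up')]
print(contradictions(balloons_rise, observations))     # []
```

A hypothesis is kept only while the list of contradictions stays empty; a single contradicting observation is enough to send us back to the hypothesis step.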

Let's try this with our code. We would like our output to match the expected results in the table below.

|Input|Expected output|Actual output|
|---|---|---|
|`<b>foo</b>`|`foo`|`foo`|
|`"<b>foo</b>"`|`"foo"`|`<b>foo</b>`|
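
The table can also be written down as data and checked automatically. The sketch below copies the three-state version of `remove_html_markup` from above, bug included; the names `cases` and `check` are assumptions made for this example.

```python
def remove_html_markup(s):
    # Three-state version from above, with its bug left in place.
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote
        elif not tag:
            out = out + c
    return out

# Each row of the table: (input, expected output).
cases = [
    ("<b>foo</b>", "foo"),
    ('"<b>foo</b>"', '"foo"'),
]

def check(cases):
    """Return (input, expected, actual, passed) for every row."""
    results = []
    for text, expected in cases:
        actual = remove_html_markup(text)
        results.append((text, expected, actual, actual == expected))
    return results

for text, expected, actual, passed in check(cases):
    print("PASS" if passed else "FAIL", repr(text), "->", repr(actual))
```

The first row passes and the second row fails, which matches the table. Keeping the rows in a list makes it cheap to re-run every observation after each change to the function.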

What do we think is happening here?

Two hypotheses that are consistent with our observations so far are:

1. Double quotes are stripped from tagged input
2. Tags in double quotes are not stripped


Let's try out our first hypothesis, that double quotes are stripped from tagged input, with the 3 test cases below.


In [10]:
print(remove_html_markup('"foo"'))
print(remove_html_markup('"<bar"'))
print(remove_html_markup(""))

foo
<bar



These results agree with our first hypothesis: double quotation marks seem to be stripped from our input, even when the input contains no tags. The only place where quotes are handled is the line <code>elif c == '"' or c == "'" and tag:</code>. We think the error is due to <code>tag</code>, and we can check this by adding an assertion. The code below adds <code>assert not tag</code> to raise an exception if <code>tag</code> is ever set to true.



In [11]:
def remove_html_markup(s):
    tag = False
    quote = False
    out = ""
    
    for c in s:
        assert not tag  # hypothesis check: fails if tag was set on an earlier character
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote
        elif not tag:
            out = out + c
    return out

After inserting <code>assert not tag</code> we have two possible outcomes: either the program raises an exception, or the output stays the same, meaning <code>tag</code> is never set.

In [12]:
print(remove_html_markup('"foo"'))
print(remove_html_markup('"<bar"'))
print(remove_html_markup(""))

foo
<bar



The results are the same, meaning that <code>tag</code> is never set. We may come up with a new hypothesis: the condition <code>c == '"' or c == "'" and tag</code> never evaluates to true. To check this, we add <code>assert False</code> right after it.

In [17]:
def remove_html_markup(s):
    tag = False
    quote = False
    out = ""
    
    for c in s:
        assert not tag
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            assert False  # hypothesis check: fails if this branch is ever reached
            quote = not quote
        elif not tag:
            out = out + c
    return out

Again there are two possible outcomes: either the program raises an exception, or the output stays the same.

In [18]:
print(remove_html_markup('"foo"'))
print(remove_html_markup('"<bar"'))
print(remove_html_markup(""))

AssertionError: 
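
Because the first call already raises, the output above says nothing about the other two inputs. The sketch below runs each input on its own and records whether the assertion fired. It repeats the instrumented function from the cell above; `raises_assertion` is a hypothetical helper introduced only for this example.

```python
def remove_html_markup(s):
    # Instrumented version from the cell above.
    tag = False
    quote = False
    out = ""
    for c in s:
        assert not tag
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            assert False  # fails if the quote branch is ever reached
            quote = not quote
        elif not tag:
            out = out + c
    return out

def raises_assertion(text):
    """Run one experiment: True if an assertion fails for this input."""
    try:
        remove_html_markup(text)
    except AssertionError:
        return True
    return False

for text in ['"foo"', '"<bar"', ""]:
    print(repr(text), raises_assertion(text))
```

The two inputs that contain a double quote trigger the assertion; the empty string does not, because the loop body never runs. Assertions are skipped when Python is started with the `-O` flag, so this kind of experiment assumes a normal interpreter run.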

Now we know that the condition evaluates to true. We know from before that double quotes are stripped, but now let's see if single quotes are also stripped.

In [19]:
print(remove_html_markup("'foo'"))

'foo'


So it seems that single quotes do not get removed. What we know now is that <code>c == '"' or c == "'" and tag</code> becomes True when <code>c == '"'</code> and False when <code>c == "'"</code>. This is a big hint that our condition is not evaluating the way we would like: even when <code>tag</code> is false, the expression can still evaluate to true. This is due to the order of operations not being explicit. In Python, <code>and</code> binds more tightly than <code>or</code>, so the condition is read as <code>c == '"' or (c == "'" and tag)</code>. We should evaluate the <code>or</code> before the <code>and</code>, as in <code>(c == '"' or c == "'") and tag:</code>
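
To make the diagnosis concrete, the sketch below first compares the two conditions for a double quote outside a tag, then applies the parenthesized condition to the function. It is one way to write the fix under the assumptions of these notes, not a complete HTML parser.

```python
# A double quote seen while we are NOT inside a tag.
c, tag = '"', False

# `and` binds more tightly than `or`, so the original condition is true here.
print(c == '"' or c == "'" and tag)    # True
# With explicit parentheses, quotes only count inside a tag.
print((c == '"' or c == "'") and tag)  # False

def remove_html_markup(s):
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:  # fixed: parentheses added
            quote = not quote
        elif not tag:
            out = out + c
    return out

print(remove_html_markup('"<b>foo</b>"'))         # "foo"
print(remove_html_markup('<a href=">">foo</a>'))  # foo
print(remove_html_markup("'foo'"))                # 'foo'
```

With the parentheses in place, both rows of the table produce the expected output and the earlier test cases still pass. The sketch still treats single and double quotes as interchangeable, so an attribute value that mixes them, such as `title="it's"`, is not handled.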