<div align="center">
    <h1>DS-210: Programming for Data Science</h1>
    <h1>Lecture 34</h1>
</div>

# 1. Error handling in Rust
# 2. Algorithm design: dynamic programming 


# <font color="red">1. Error handling in Rust</font>
# 2. Algorithm design: dynamic programming 


## Error handling in Rust

Two basic options:

* terminate when an error occurs: macro `panic!(...)`

* pass information about an error: enum `Result<T,E>`

## Macro `panic!(...)`

* Use for unrecoverable errors
* Terminates the application

In [2]:
fn divide(a:u32, b:u32) -> u32 {
    if b == 0 {
        panic!("I'm sorry, Dave. I'm afraid I can't do that.");
    }
    a/b
}

In [3]:
divide(20,7)

2

In [4]:
divide(20,0)

thread '<unnamed>' panicked at 'I'm sorry, Dave. I'm afraid I can't do that.', src/lib.rs:4:9
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: _run_user_code_3
   3: evcxr::runtime::Runtime::run_loop
   4: evcxr::runtime::runtime_hook
   5: evcxr_jupyter::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


Error: Subprocess terminated with status: exit status: 101

## Enum `Result<T,E>`

```rust
enum Result<T,E> {
    Ok(T),
    Err(E),
}
```

Functions can use it to
* return a result
* or information about an encountered error

In [7]:
fn divide(a:u32, b:u32) -> Result<u32, String> {
    if b != 0 {
        Ok(a / b)
    } else {
        // `msg` avoids reusing the name of the built-in `str` type
        let msg = format!("Division by zero {} {}", a, b);
        Err(msg)
    }
}

In [8]:
divide(20,7)

Ok(2)

In [9]:
divide(20,0)

Err("Division by zero 20 0")

* Useful when the error is best handled somewhere else
* **Example:** input/output subroutines in the standard library
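
As a small illustration of the standard-library case, `std::fs::read_to_string` returns a `Result<String, std::io::Error>` instead of panicking. The sketch below assumes two hypothetical helpers, `count_lines` and `describe`, that are not part of the lecture code: the first passes the I/O error to its caller, the second decides what to do with it.

```rust
use std::fs;
use std::io;

// Hypothetical helper: count the lines of a text file.
// An I/O error from the standard library is handed back to the caller.
fn count_lines(path: &str) -> Result<usize, io::Error> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(text.lines().count()),
        Err(e) => Err(e),
    }
}

// Hypothetical caller: this is the place that decides how to react to the error.
fn describe(path: &str) -> String {
    match count_lines(path) {
        Ok(n) => format!("{} has {} lines", path, n),
        Err(e) => format!("could not read {}: {}", path, e),
    }
}
```

Calling `describe` with a path that does not exist yields a message starting with `could not read`; the exact error text depends on the operating system.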

## Common pattern: propagating errors

* We are interested in the positive outcome: `t` in `Ok(t)`
* But if an error occurs, we want to propagate it
* This can be handled using `match` statements

In [11]:
// compute a/b + c/d
fn calculate(a:u32, b:u32, c:u32, d:u32) -> Result<u32, String> {
    let first = match divide(a,b) {
        Ok(t) => t,
        Err(e) => return Err(e),
    };
    let second = match divide(c,d) {
        Ok(t) => t,
        Err(e) => return Err(e),
    };    
    Ok(first + second)
}


In [12]:
calculate(16,4,18,3)

Ok(10)

In [13]:
calculate(16,0,18,3)

Err("Division by zero 16 0")

## The question mark shortcut

* Place `?` after an expression that returns `Result<T,E>`

* This will:
  * give the content of `Ok(t)`
  * or return `Err(e)` from the encompassing function

In [14]:
// compute a/b + c/d
fn calculate(a:u32, b:u32, c:u32, d:u32) -> Result<u32, String> {
    Ok(divide(a,b)? + divide(c,d)?)
}

In [15]:
calculate(16,4,18,3)

Ok(10)

In [16]:
calculate(16,0,18,3)

Err("Division by zero 16 0")

### Error handling summary

* Some languages (C++, Java, JavaScript, Python) handle errors with a try/catch, throw/catch, or try/except pattern.
* Rust has no direct equivalent

The Rust pattern for error handling is the following:
```rust
    let do_steps = || -> Result<(), MyError> {
        do_step_1()?;
        do_step_2()?;
        do_step_3()?;
        Ok(())
    };

    if let Err(_err) = do_steps() {
        println!("Failed to perform necessary steps");
    }
```

* Create a closure around the code you want to guard. Inside the closure, use the `?` shorthand on anything that can return an `Err`. Then use a `match` or `if let` statement on the closure's result to catch the error.
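
The pattern can be tried out with a self-contained sketch. The error type `MyError`, the helper `do_step`, and the wrapper `run` are all hypothetical names invented for this illustration; `fail_at` simply selects which step should fail.

```rust
// Hypothetical error type, made up for this illustration.
#[derive(Debug, PartialEq)]
enum MyError {
    StepFailed(u32),
}

// Hypothetical step: fails on request so both outcomes can be observed.
fn do_step(n: u32, should_fail: bool) -> Result<(), MyError> {
    if should_fail {
        Err(MyError::StepFailed(n))
    } else {
        Ok(())
    }
}

// Runs three steps; `fail_at` names the step that should fail (0 = none).
fn run(fail_at: u32) -> Result<(), MyError> {
    // The closure is the guarded region: the first `?` that sees an `Err`
    // returns from the closure, not from `run`.
    let do_steps = || -> Result<(), MyError> {
        do_step(1, fail_at == 1)?;
        do_step(2, fail_at == 2)?;
        do_step(3, fail_at == 3)?;
        Ok(())
    };

    // The "catch" part: inspect the closure's result.
    if let Err(err) = do_steps() {
        println!("Failed to perform necessary steps: {:?}", err);
        return Err(err);
    }
    Ok(())
}
```

With these definitions, `run(0)` returns `Ok(())`, while `run(2)` stops after the second step and returns `Err(MyError::StepFailed(2))`; step 3 is never executed.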

# 1. Error handling in Rust
# <font color="red">2. Algorithm design: dynamic programming</font>


## Big picture: rest of this lecture and next

Review a few approaches to algorithm design:

* dynamic programming

* greedy approach

* divide and conquer

## Homework 9: Best decision tree for a classification problem

**Input:** set of $n$ labelled points $(x_i, z_i)$, where $x_i \in \mathbb R$ and $z_i \in \{0,1\}$

**Goal:** find decision tree with $L$ leaves and highest accuracy on the input set

## Homework 9 restriction: $L=2$

<div align="center">
    <b>How to solve it?</b>
</div>

**Two–leaf decision tree:** if $x < T$, output $\alpha$, else output $(1-\alpha)$

**Two parameters:** $T$ and $\alpha$
  * suffices to try $T = x_i$ for all $x_i$'s and $\alpha \in \{0,1\}$
  * at most $2n$ options

**Algorithms:**
* **Simple:** evaluate accuracy for each $T$ and $\alpha$ ${}\Rightarrow{}$ $O(n^2)$ time
* **More sophisticated:** sort points, move the threshold for each $\alpha$ updating accuracies ${}\Rightarrow{}$ $O(n \log n)$ time
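
One possible way to realize the sorted sweep is sketched below. It is only an illustration under stated assumptions: points are `(f64, u8)` pairs with labels restricted to `0` and `1`, and the function name `best_two_leaf` is hypothetical. The key observation is that flipping $\alpha$ flips every prediction, so the mistakes for $\alpha = 1$ are $n$ minus the mistakes for $\alpha = 0$.

```rust
// Returns (threshold T, alpha, number of mistakes) for a best two-leaf tree:
// predict `alpha` if x < T, and `1 - alpha` otherwise.
// Assumes labels are 0 or 1. Returns None for an empty input.
fn best_two_leaf(points: &[(f64, u8)]) -> Option<(f64, u8, usize)> {
    if points.is_empty() {
        return None;
    }
    let mut pts = points.to_vec();
    pts.sort_by(|a, b| a.0.total_cmp(&b.0)); // O(n log n)
    let n = pts.len();
    let total_zeros = pts.iter().filter(|p| p.1 == 0).count();

    let mut best: Option<(f64, u8, usize)> = None;
    let mut ones_left = 0usize; // number of 1-labels among pts[..i]
    for i in 0..n {
        // Only the first occurrence of each x value is a new threshold;
        // exactly the points pts[..i] satisfy x < T for T = pts[i].0.
        if i == 0 || pts[i].0 != pts[i - 1].0 {
            let zeros_left = i - ones_left;
            // alpha = 0: mistakes are 1s on the left plus 0s on the right.
            let mistakes_alpha0 = ones_left + (total_zeros - zeros_left);
            // alpha = 1 flips every prediction.
            let mistakes_alpha1 = n - mistakes_alpha0;
            for (alpha, m) in [(0u8, mistakes_alpha0), (1u8, mistakes_alpha1)] {
                if best.map_or(true, |b| m < b.2) {
                    best = Some((pts[i].0, alpha, m));
                }
            }
        }
        if pts[i].1 == 1 {
            ones_left += 1;
        }
    }
    best
}
```

After sorting, each threshold is evaluated in constant time from running counts, so the sweep itself is $O(n)$ and the sort dominates.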

## General $L$

**How do decision trees with at most $L$ leaves partition the line?**

* at most $L$ line segments: prediction fixed to $0$ or $1$ for each

* $\binom{n}{L-1} = O\left(n^{L-1}\right)$ threshold configurations to consider

* test each: $O\left(n^L\right)$–time algorithm

<div align="center">
    <h2>Our goal: much faster algorithm</h2>
</div>

## Define subproblems

**Simplifying assumption:** $x_1 < x_2 < \ldots < x_n$

$M[l,k] ={}$the minimum number of mistakes, when classifying the first $k$ points, using at most $l$ ranges
 * $l \in \{1,\ldots,L\}$
 * $k \in \{1,\ldots,n\}$

$M[L,n]$ is the minimum number of mistakes overall, so the best accuracy is $1 - M[L,n]/n$

## How to compute $M[l,k]$?
$M[l,k] ={}$the minimum number of mistakes, when classifying the first $k$ points, using at most $l$ ranges
 * $l \in \{1,\ldots,L\}$
 * $k \in \{1,\ldots,n\}$

### One-label predictions on $\{x_k: i \le k \le j\}$

* Define $S[i,j] = {}$the number of mispredictions made by the best one-label classifier on this set, i.e. on all points at positions $i$ through $j$.

* $S[i,j]$ is the minimum of the number of $0$ labels and the number of $1$ labels in this set
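
A sketch of how $S[i,j]$ could be evaluated in constant time after $O(n)$ preprocessing, using prefix counts of the $1$ labels. The names `prefix_ones` and `s` are hypothetical, labels are assumed to be `0` or `1` and already ordered by $x$, and indices are 1-based to match the slides.

```rust
// ones[k] = number of 1-labels among the first k labels (ones[0] = 0).
fn prefix_ones(labels: &[u8]) -> Vec<usize> {
    let mut ones = vec![0usize; labels.len() + 1];
    for (k, &z) in labels.iter().enumerate() {
        ones[k + 1] = ones[k] + z as usize;
    }
    ones
}

// S[i,j] for 1-based, inclusive positions i..=j (requires i >= 1).
// An empty range (i > j) costs 0 mistakes.
fn s(ones: &[usize], i: usize, j: usize) -> usize {
    if i > j {
        return 0;
    }
    let n_ones = ones[j] - ones[i - 1];
    let n_zeros = (j - i + 1) - n_ones;
    // Predict the majority label; the minority count is the number of mistakes.
    n_ones.min(n_zeros)
}
```

For the labels `[0, 1, 1, 0, 1]`, positions 1 through 5 contain three $1$s and two $0$s, so $S[1,5] = 2$, while $S[2,3] = 0$ because both labels agree.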





### Compute $M[1,k]$ for all $k$
  * $M[1,k] \leftarrow S[1,k]$
  * $O(n)$ time overall

## How to compute $M[l,k]$?
$M[l,k] ={}$the minimum number of mistakes, when classifying the first $k$ points, using at most $l$ ranges
 * $l \in \{1,\ldots,L\}$
 * $k \in \{1,\ldots,n\}$

$S[i,j] = {}$the minimum number of mistakes, when classifying points $\{x_k: i \le k \le j\}$ with one range

### Compute $M[l,k]$ for $l\ge 2$ and all $k$

  $$\text{for } k \in \{1,\ldots,n\}:\quad M[2,k] \leftarrow \min_{i \in \{1,\ldots,k\}}\left(M[1,i] + S[i+1,k]\right)$$
  $$\vdots$$
  $$\text{for } k \in \{1,\ldots,n\}:\quad M[l,k] \leftarrow \min_{i \in \{1,\ldots,k\}}\left(M[l-1,i] + S[i+1,k]\right)$$

Here $S[k+1,k] = 0$ (an empty range), so the case $i = k$ corresponds to using fewer than $l$ ranges.
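
The recurrence can be transcribed fairly directly. The following is a minimal sketch, not a reference solution: `min_mistakes` is a hypothetical name, `labels` is assumed to hold the $0/1$ labels already sorted by $x$, and `max_ranges` plays the role of $L$. Row and column $0$ of the table are unused so that the indices match the 1-based notation above.

```rust
// Minimum number of mistakes with at most `max_ranges` ranges (M[L,n]).
// Assumes labels are 0 or 1, sorted by x, and max_ranges >= 1.
fn min_mistakes(labels: &[u8], max_ranges: usize) -> usize {
    assert!(max_ranges >= 1);
    let n = labels.len();

    // Prefix counts of 1-labels, used to evaluate S[i,j] in O(1).
    let mut ones = vec![0usize; n + 1];
    for k in 0..n {
        ones[k + 1] = ones[k] + labels[k] as usize;
    }
    // S[i,j] for 1-based inclusive positions; an empty range costs 0.
    let s = |i: usize, j: usize| -> usize {
        if i > j {
            return 0;
        }
        let n_ones = ones[j] - ones[i - 1];
        n_ones.min(j - i + 1 - n_ones)
    };

    // m[l][k] corresponds to M[l,k].
    let mut m = vec![vec![0usize; n + 1]; max_ranges + 1];
    for k in 1..=n {
        m[1][k] = s(1, k);
    }
    for l in 2..=max_ranges {
        for k in 1..=n {
            // M[l,k] = min over i in 1..=k of M[l-1,i] + S[i+1,k]
            let best = (1..=k)
                .map(|i| m[l - 1][i] + s(i + 1, k))
                .min()
                .unwrap(); // the range 1..=k is never empty here
            m[l][k] = best;
        }
    }
    m[max_ranges][n]
}
```

For the labels `[0, 1, 1, 0, 1]` this gives 2 mistakes with one range, 1 mistake with two or three ranges, and 0 mistakes with four.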


## Time complexity?

* Computing $S[i,j]$ for all $i$ and $j$: $O(n^2)$

* Computing $M[l+1,i]$ for all $i$ from $M[l,i]$: $O(n^2)$

* Total running time: $O(L) \cdot O(n^2) = O(Ln^2)$

* Much better than the more straightforward $O(n^L)$

## Reconstructing the solution

* This gives us $M[L,n]={}$the minimum number of mistakes overall
* How to get the best solution, not just the best cost?

Iteratively:
* Start from $M[L,n]$
* Find the $i$ that minimizes $M[L-1,i] + S[i + 1, n]$
* Label $\{x_{i+1},\ldots,x_n\}$ with the better of $0$ and $1$
* Continue with $M[L-1,i]$
* ...
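
The walk back through the table might look like the sketch below. It is an illustration under assumptions: `best_ranges` is a hypothetical name, labels are $0/1$ and sorted by $x$, and ranges are reported as 1-based inclusive `(first, last, predicted label)` triples. Instead of storing back-pointers, it re-searches for an $i$ that attains the minimum at each step.

```rust
// Returns ranges (first, last, predicted label), 1-based and inclusive,
// that achieve M[L,n]. Assumes labels are 0/1 and max_ranges >= 1.
fn best_ranges(labels: &[u8], max_ranges: usize) -> Vec<(usize, usize, u8)> {
    assert!(max_ranges >= 1);
    let n = labels.len();
    let mut ones = vec![0usize; n + 1];
    for k in 0..n {
        ones[k + 1] = ones[k] + labels[k] as usize;
    }
    // (S[i,j], majority label) for positions i..=j; empty range costs 0.
    let seg = |i: usize, j: usize| -> (usize, u8) {
        if i > j {
            return (0, 0);
        }
        let n_ones = ones[j] - ones[i - 1];
        let n_zeros = j - i + 1 - n_ones;
        if n_ones > n_zeros { (n_zeros, 1) } else { (n_ones, 0) }
    };

    // Fill M exactly as in the recurrence.
    let mut m = vec![vec![0usize; n + 1]; max_ranges + 1];
    for k in 1..=n {
        m[1][k] = seg(1, k).0;
    }
    for l in 2..=max_ranges {
        for k in 1..=n {
            let best = (1..=k).map(|i| m[l - 1][i] + seg(i + 1, k).0).min().unwrap();
            m[l][k] = best;
        }
    }

    // Walk back from M[L,n], peeling off the last range each time.
    let mut ranges = Vec::new();
    let (mut l, mut k) = (max_ranges, n);
    while k > 0 {
        if l == 1 {
            ranges.push((1, k, seg(1, k).1));
            break;
        }
        // Smallest i attaining the minimum for M[l,k].
        let i = (1..=k)
            .find(|&i| m[l - 1][i] + seg(i + 1, k).0 == m[l][k])
            .unwrap();
        if i < k {
            ranges.push((i + 1, k, seg(i + 1, k).1));
        }
        l -= 1;
        k = i;
    }
    ranges.reverse();
    ranges
}
```

Each step of the walk costs $O(n)$, so reconstruction adds only $O(Ln)$ on top of filling the table. Neighboring ranges may end up with the same label when fewer than $L$ ranges are actually needed.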

## Dynamic programming in general

* Define a small number of subproblems that are
  * sufficient to solve the general problem
  * helpful to solve each other

**The most classic example:** edit distance
  * minimum number of edits to turn one string into another
  * edits: deletions, insertions, substitutions
  * correcting spelling mistakes: how far are two words?

**Can you solve it?**

https://www.geeksforgeeks.org/edit-distance-dp-5/

In [18]:
fn edit_distance(str1: &[char], str2: &[char]) -> usize {
    let m = str1.len();
    let n = str2.len();

    // dp[i][j] = edit distance between the first i characters of str1
    // and the first j characters of str2
    let mut dp: Vec<Vec<usize>> = vec![vec![0; n + 1]; m + 1];
    for i in 0..=m {
        for j in 0..=n {
            if i == 0 {
                dp[i][j] = j; // insert all j characters
            } else if j == 0 {
                dp[i][j] = i; // delete all i characters
            } else if str1[i - 1] == str2[j - 1] {
                dp[i][j] = dp[i - 1][j - 1]; // last characters match: no edit
            } else {
                // cheapest of insertion, deletion, and substitution
                let x = std::cmp::min(dp[i][j - 1], dp[i - 1][j]);
                let y = std::cmp::min(x, dp[i - 1][j - 1]);
                dp[i][j] = 1 + y;
            }
        }
    }

    dp[m][n]
}

let str1 = "THis is a cat";
let str2 = "This is a corgie";
let v1: Vec<char> = str1.chars().collect();
let v2: Vec<char> = str2.chars().collect();
let dist = edit_distance(&v1, &v2);
println!("Distance between {} and {} is {}", str1, str2, dist);


Distance between THis is a cat and This is a corgie is 6
