<div align="center">
    <h1>DS-210: Programming for Data Science</h1>
    <h1>Lecture 23</h1>
</div>

# 1. Memory management in vectors
# 2. Hash maps

# <font color="red">1. Memory management in vectors</font>
# 2. Hash maps

## Last time: vectors `Vec<T>`

* Dynamic-length array/list 
* Allowed operations:
  * access item at specific location
  * `push`: add something to the end
  * `pop`: remove an element from the end



* Python: list
* C++: `vector<T>`
* Java: `ArrayList<T>` / `Vector<T>`


<div align="center">
    <h3>How to implement this efficiently?</h3>
</div>

## Select implementation details

### Challenges

* Size changes: allocate on the heap?
* What to do when a new element is added?
  * Allocate a larger array and copy everything? 
  * Linked list?

### Solution

* Allocate more space than needed!
* When out of space:
  * Increase storage size by, say, 100%
  * Copy everything

### Under the hood
Variable of type `Vec<T>` contains:
* pointer to allocated memory
* size: the current number of items
* capacity: how many items could currently fit

**Important:** size $\le$ capacity
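The three fields and the doubling rule can be sketched with a toy type. This is a minimal illustration, not how `Vec<T>` is actually implemented: `ToyVec` and its methods are hypothetical names, it stores only `i32` values, and a boxed slice stands in for the raw allocation.

```rust
// Toy growable array of i32 values (hypothetical, for illustration only).
struct ToyVec {
    buf: Box<[i32]>, // allocated storage; buf.len() plays the role of capacity
    len: usize,      // size: the current number of items
}

impl ToyVec {
    fn with_capacity(capacity: usize) -> Self {
        ToyVec { buf: vec![0; capacity].into_boxed_slice(), len: 0 }
    }

    fn capacity(&self) -> usize {
        self.buf.len()
    }

    fn push(&mut self, value: i32) {
        if self.len == self.capacity() {
            // Out of space: allocate twice as much and copy everything.
            let new_capacity = std::cmp::max(1, 2 * self.capacity());
            let mut new_buf: Box<[i32]> = vec![0; new_capacity].into_boxed_slice();
            new_buf[..self.len].copy_from_slice(&self.buf[..self.len]);
            self.buf = new_buf;
        }
        self.buf[self.len] = value;
        self.len += 1;
    }

    fn pop(&mut self) -> Option<i32> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1; // capacity is left unchanged
        Some(self.buf[self.len])
    }

    fn get(&self, index: usize) -> Option<i32> {
        if index < self.len { Some(self.buf[index]) } else { None }
    }
}

fn main() {
    let mut v = ToyVec::with_capacity(2);
    for i in 1..=5 {
        v.push(i);
        println!("size = {}, capacity = {}", v.len, v.capacity());
    }
    println!("{:?} {:?}", v.get(1), v.pop());
}
```

The invariant size $\le$ capacity holds after every call, because `push` grows the buffer before writing past its end.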

## Example

Method `capacity()` reports the current storage size

In [13]:
// print out the current size and capacity
fn info<T>(vector:&Vec<T>) {
    println!("size = {}, capacity = {}",vector.len(),vector.capacity());
}

In [14]:
let mut v = Vec::with_capacity(7);
let mut capacity = v.capacity();
info(&v);
for i in 1..=1000 {
    v.push(i);
    if v.capacity() != capacity {
        capacity = v.capacity();
        info(&v);
    }
};

size = 0, capacity = 7
size = 8, capacity = 14
size = 15, capacity = 28
size = 29, capacity = 56
size = 57, capacity = 112
size = 113, capacity = 224
size = 225, capacity = 448
size = 449, capacity = 896
size = 897, capacity = 1792


In [15]:
info(&v);
while let Some(_) = v.pop() {}
info(&v);

size = 1000, capacity = 1792
size = 0, capacity = 1792


## Example (continued)

In [17]:
// shrinking the size manually
info(&v);

for i in 1..=13 {
    v.push(i);
}

info(&v);

v.shrink_to_fit();

info(&v);
// note: size and capacity not guaranteed
//       to be the same

size = 0, capacity = 1792
size = 13, capacity = 1792
size = 13, capacity = 13


In [18]:
// creating vector with specific capacity
let mut v2 : Vec<i32> = Vec::with_capacity(1234);
info(&v2);

// avoids reallocation if you know how many items
// to expect

size = 0, capacity = 1234


In [19]:
// Does not remove from the vector
println!("{:?} {:?}", v.get(1), v);
// But this one does
println!("{:?} {:?}", v.pop(), v);

Some(2) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
Some(13) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]


### Some other useful functions
* `append` Move all elements of another vector to the end of this one `vec.append(&mut vec2)`
* `clear` Remove all elements from the vector `vec.clear()`
* `dedup` Remove consecutive identical elements `vec.dedup()`
* `drain` Remove a slice from the vector `vec.drain(2..4)`
* `remove` Remove an element from the vector `vec.remove(2)`
* `sort` Sort the elements of a mutable vector `vec.sort()`
* Complete list at https://doc.rust-lang.org/std/vec/struct.Vec.html
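A short sketch that runs the methods listed above on a small vector. The function name `demo` and the sample values are made up for illustration; the comments track the contents after each call.

```rust
// Applies several Vec methods in sequence and returns the results.
fn demo() -> (Vec<i32>, Vec<i32>, i32) {
    let mut a = vec![3, 1, 1, 2];
    let mut b = vec![2, 5];

    a.append(&mut b); // a = [3, 1, 1, 2, 2, 5], b = []
    a.sort();         // a = [1, 1, 2, 2, 3, 5]
    a.dedup();        // a = [1, 2, 3, 5]

    // drain(1..3) removes the items at indices 1 and 2 and yields them
    let drained: Vec<i32> = a.drain(1..3).collect(); // drained = [2, 3], a = [1, 5]

    let removed = a.remove(0); // removed = 1, a = [5]

    b.push(7);
    b.clear(); // b = []

    (a, drained, removed)
}

fn main() {
    println!("{:?}", demo());
}
```

Note that `dedup` only removes *consecutive* duplicates, which is why the sketch sorts first.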

## Sketch of analysis: Amortization

* Inserting an element is not always constant time, $O(1)$: an insertion that triggers a reallocation has to copy everything

### However
* **Assumption:** allocating memory size $t$ takes $O(t)$ or $O(1)$ time


* **Slow operations:** <font color="red">$O($current_size$)$ time</font>
* **Fast operations:** <font color="green">$O(1)$ time</font>


* A slow operation happens only once every $\Omega($current_size$)$ fast operations

* **On average:** $O(1)$ time
* Fast operations pay for slow operations


* **Terminology:** $O(1)$ *amortized* time
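The argument can be checked numerically with a small simulation. This sketch only does the bookkeeping (no memory is allocated), assumes capacity starts at 1 and doubles when full, and uses the hypothetical name `copies_for_pushes`.

```rust
// Counts how many element copies a doubling strategy performs over n pushes.
fn copies_for_pushes(n: usize) -> usize {
    let mut size = 0;
    let mut capacity = 1;
    let mut copies = 0;
    for _ in 0..n {
        if size == capacity {
            copies += size; // slow operation: copy every current item
            capacity *= 2;
        }
        size += 1; // fast operation: write one item
    }
    copies
}

fn main() {
    for n in [10, 100, 1000, 10_000] {
        let copies = copies_for_pushes(n);
        println!("n = {}, copies = {}, copies per push = {:.2}",
                 n, copies, copies as f64 / n as f64);
    }
}
```

Under these assumptions the total number of copies stays below $2n$, so each push costs $O(1)$ on average even though individual pushes can be slow.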


### Shrinking?

* Can be implemented this way too
* Example: shrink by 50% if less than 25% used
* Most implementations don't shrink automatically
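A sketch of the example policy as a pure function. The name `capacity_after_pop` is hypothetical, and Rust's `Vec` does not do this; it is only meant to make the thresholds concrete.

```rust
// Example policy: halve the capacity once fewer than 25% of the slots
// are in use. Returns the capacity to use after a pop.
fn capacity_after_pop(size_after_pop: usize, capacity: usize) -> usize {
    if capacity > 1 && 4 * size_after_pop < capacity {
        capacity / 2
    } else {
        capacity
    }
}

fn main() {
    let mut size = 16;
    let mut capacity = 16;
    while size > 0 {
        size -= 1;
        let new_capacity = capacity_after_pop(size, capacity);
        if new_capacity != capacity {
            capacity = new_capacity;
            println!("size = {}, capacity = {}", size, capacity);
        }
    }
}
```

Shrinking at 25% rather than 50% leaves slack after a shrink, so alternating `push` and `pop` near the threshold does not trigger a reallocation on every call.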

## Digression (Sorting Vectors in Rust)

In [26]:
// This works great
let mut a = vec![1, 4, 3, 6, 8, 12, 5];
a.sort();
println!("{:?}", a);

[1, 3, 4, 5, 6, 8, 12]


In [27]:
// But the compiler rejects this one: sort requires a total order (the Ord trait), which floats do not implement because of NaN
let mut a = vec![1.0, 4.0, 3.0, 6.0, 8.0, 12.0, 5.0];
a.sort();
println!("{:?}", a);

Error: the trait bound `{float}: Ord` is not satisfied

In [28]:
// This is OK: sort_by uses the comparison function you pass in to determine the order
let mut a = vec![1.0, 4.0, 3.0, 6.0, 8.0, 12.0, 5.0];
a.sort_by(|x, y| x.partial_cmp(y).unwrap());
println!("{:?}", a);

[1.0, 3.0, 4.0, 5.0, 6.0, 8.0, 12.0]


In [29]:
// When partial order is not well defined in the inputs you get a panic
let mut a = vec![1.0, 4.0, 3.0, 6.0, 8.0, 12.0, 5.0];
let mut x: f64 = -1.0;
x = x.sqrt();
a.push(x);
println!("{:?}", a);
a.sort_by(|x, y| x.partial_cmp(y).unwrap());
println!("{:?}", a);

[1.0, 4.0, 3.0, 6.0, 8.0, 12.0, 5.0, NaN]

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/lib.rs:125:35
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic
   3: alloc::slice::insert_head
   4: alloc::slice::merge_sort
   5: _run_user_code_20
   6: evcxr::runtime::Runtime::run_loop
   7: evcxr::runtime::runtime_hook
   8: evcxr_jupyter::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


Error: Subprocess terminated with status: exit status: 101
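One way to avoid the panic is `f64::total_cmp`, which compares floats using the IEEE 754 total order and therefore never returns `None`. A minimal sketch, assuming a Rust version that provides `total_cmp` (1.62 or later); the helper name `sort_floats` is made up here.

```rust
// Sorts floats with a total order that also covers NaN,
// so the comparison cannot fail.
fn sort_floats(values: &mut [f64]) {
    values.sort_by(|x, y| x.total_cmp(y));
}

fn main() {
    let mut a = vec![1.0, 4.0, 3.0, 6.0, 8.0, 12.0, 5.0, f64::NAN];
    sort_floats(&mut a);
    // NaN ends up at one end of the vector, depending on its sign bit
    println!("{:?}", a);
}
```

Whether a NaN sorts first or last depends on its sign bit, so code that cares about the position should filter NaN values out explicitly.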

# 1. Memory management in vectors
# <font color="red">2. Hash maps</font>

## Collection `HashMap<K,V>`

**Goal:** a mapping from elements of `K` to elements of `V`

* elements of `K` called *keys*

* elements of `V` called *values*


* Python: dictionaries
* C++: `unordered_map<K,V>`
* Java: `HashMap<K,V>` / `Hashtable<K,V>`


In [3]:
// creating a hash map and inserting pair

use std::collections::HashMap;

// number of wins in a local Counter-Strike league
let mut wins = HashMap::<String,u16>::new();

wins.insert(String::from("Boston University"),24);
wins.insert(String::from("Harvard"),22);
wins.insert(String::from("Boston College"),20);
wins.insert(String::from("Northeastern"),32);

Extracting a reference: returns `Option<&V>`

In [4]:
wins.get("Boston University")

Some(24)

In [5]:
wins.get("MIT")

None

Insert if not present:

In [6]:
wins.entry(String::from("MIT")).or_insert(10);
wins.get("MIT")

Some(10)

Updating:

In [7]:
{ // block to limit how long the reference lasts
    let entry = wins.entry(String::from("Boston University")).or_insert(10);
    *entry = 50;
}
wins.insert(String::from("Boston University"),24); // insert overwrites the existing value (50)
wins.get("Boston University")

Some(24)
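A common use of `entry(...).or_insert(...)` is counting: the returned mutable reference can be updated in place. A small sketch with a hypothetical helper `count_words`:

```rust
use std::collections::HashMap;

// Counts how often each word occurs. or_insert(0) creates the counter
// the first time a word is seen and returns a mutable reference to it.
fn count_words(text: &str) -> HashMap<String, u32> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count_words("to be or not to be");
    println!("{:?}", counts.get("to"));
    println!("{:?}", counts.get("question"));
}
```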

## Iterating

In [8]:
for (k,v) in &wins {
    println!("{}: {}",k,v);
};

for (k,v) in wins.iter() {
    println!("Iter {}: {}",k,v);
};


Northeastern: 32
Boston College: 20
MIT: 10
Boston University: 24
Harvard: 22
Iter Northeastern: 32
Iter Boston College: 20
Iter MIT: 10
Iter Boston University: 24
Iter Harvard: 22


In [9]:
for (k,v) in &mut wins {
    *v += 1;
};

for (k,v) in &wins {
    println!("{}: {}",k,v);
};

for (k,v) in wins.iter_mut() {
    *v += 1;
};

for (k,v) in wins.iter() {
    println!("Mut iter {}: {}",k,v);
};



Northeastern: 33
Boston College: 21
MIT: 11
Boston University: 25
Harvard: 23
Mut iter Northeastern: 34
Mut iter Boston College: 22
Mut iter MIT: 12
Mut iter Boston University: 26
Mut iter Harvard: 24
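The iteration order of a `HashMap` is arbitrary and should not be relied on. When a stable order matters, one option is to collect the pairs into a vector and sort them. A sketch with a hypothetical helper `sorted_pairs`:

```rust
use std::collections::HashMap;

// Returns the (key, value) pairs ordered by key.
fn sorted_pairs(map: &HashMap<String, u16>) -> Vec<(&String, &u16)> {
    let mut pairs: Vec<(&String, &u16)> = map.iter().collect();
    pairs.sort(); // tuples compare by their first component (the key) first
    pairs
}

fn main() {
    let mut wins = HashMap::new();
    wins.insert(String::from("Northeastern"), 32);
    wins.insert(String::from("Harvard"), 22);
    wins.insert(String::from("Boston College"), 20);
    for (k, v) in sorted_pairs(&wins) {
        println!("{}: {}", k, v);
    }
}
```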


### Using HashMaps with Match statements

In [10]:
use std::collections::HashMap;

let mut crispy_crêpes_café = HashMap::new();
crispy_crêpes_café.insert(String::from("Nutella Crêpe"),5.85);
crispy_crêpes_café.insert(String::from("Strawberries and Nutella Crêpe"),8.75);
crispy_crêpes_café.insert(String::from("Roma Tomato, Pesto and Spinach Crêpe"),8.90);
crispy_crêpes_café.insert(String::from("Three Mushroom Crêpe"),8.90);

fn on_the_menu(cafe: &HashMap<String,f64>, s:String) {
    print!("{}: ",s);
    match cafe.get(&s) {
        None => println!("not on the menu"),
        Some(price) => println!("${:.2}",price),
    }
}
on_the_menu(&crispy_crêpes_café, String::from("Four Mushroom Crêpe"));
on_the_menu(&crispy_crêpes_café, String::from("Three Mushroom Crêpe"));



Four Mushroom Crêpe: not on the menu
Three Mushroom Crêpe: $8.90


## Storage

* Array representing $B$ buckets
* *Hash function* $h: K \rightarrow \{0,1,\ldots,B-1\}$
  * maps keys in the collection to buckets

### General ideas
  * Store keys (and associated values) in buckets
  * Searching: go over the entire bucket

### Collision: two keys mapped to the same bucket  
  * Make hash function $h$ very random $\Rightarrow$ few collisions
  * What to do if two keys land in the same bucket?



## Handling collisions

### Chaining

* Keep collection for items in the same bucket
  * (traditional:) linked list
  * vector
* Search through the collection to find key
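A toy version of chaining with a vector per bucket. This is a sketch, not how `std::collections::HashMap` works internally: `ChainedMap` is a hypothetical type, the key and value types are fixed, the number of buckets is assumed to be positive, and the table never grows.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy hash map with chaining: each bucket is a vector of (key, value) pairs.
struct ChainedMap {
    buckets: Vec<Vec<(String, u16)>>,
}

impl ChainedMap {
    fn new(num_buckets: usize) -> Self {
        ChainedMap { buckets: vec![Vec::new(); num_buckets] }
    }

    // h(key): hash the key, then reduce modulo the number of buckets B.
    fn bucket_index(&self, key: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() % self.buckets.len() as u64) as usize
    }

    fn insert(&mut self, key: &str, value: u16) {
        let i = self.bucket_index(key);
        // Search the bucket; overwrite the value if the key is already there.
        for pair in self.buckets[i].iter_mut() {
            if pair.0 == key {
                pair.1 = value;
                return;
            }
        }
        self.buckets[i].push((key.to_string(), value));
    }

    fn get(&self, key: &str) -> Option<u16> {
        let i = self.bucket_index(key);
        self.buckets[i].iter().find(|pair| pair.0 == key).map(|pair| pair.1)
    }
}

fn main() {
    // Only 2 buckets, so collisions are certain with 3 keys.
    let mut wins = ChainedMap::new(2);
    wins.insert("Harvard", 22);
    wins.insert("MIT", 10);
    wins.insert("Northeastern", 32);
    println!("{:?} {:?}", wins.get("MIT"), wins.get("Tufts"));
}
```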

### Open addressing (simplest version)

* Each array entry: $($key$,$value$)$

**Inserting:**
  * if entry $h(k)$ is busy: try $h(k) + 1$, $h(k) + 2$, etc. (indices taken modulo $B$)
  * insert into first empty


**Searching:**
  * try $h(k)$, $h(k) + 1$, $h(k)+2$, etc.
  * stop when found or empty entry
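The insert and search rules above can be sketched as a toy table with linear probing. `ProbingMap` is a hypothetical type; the sketch assumes a fixed, positive number of slots and supports neither deletion nor resizing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy open-addressing table: each slot is empty or holds one (key, value).
struct ProbingMap {
    slots: Vec<Option<(String, u16)>>,
}

impl ProbingMap {
    fn new(num_slots: usize) -> Self {
        ProbingMap { slots: vec![None; num_slots] }
    }

    // h(key) reduced to a slot index.
    fn start(&self, key: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() % self.slots.len() as u64) as usize
    }

    // Returns false if the table is full and the key is not present.
    fn insert(&mut self, key: &str, value: u16) -> bool {
        let b = self.slots.len();
        let start = self.start(key);
        for step in 0..b {
            let i = (start + step) % b; // h(k), h(k)+1, ... with wrap-around
            let free_or_same_key = match &self.slots[i] {
                None => true,
                Some((k, _)) => k == key,
            };
            if free_or_same_key {
                self.slots[i] = Some((key.to_string(), value));
                return true;
            }
        }
        false
    }

    fn get(&self, key: &str) -> Option<u16> {
        let b = self.slots.len();
        let start = self.start(key);
        for step in 0..b {
            match &self.slots[(start + step) % b] {
                None => return None, // empty entry: the key is absent
                Some((k, v)) if k == key => return Some(*v),
                Some(_) => {} // occupied by another key: keep probing
            }
        }
        None
    }
}

fn main() {
    let mut wins = ProbingMap::new(8);
    wins.insert("Harvard", 22);
    wins.insert("MIT", 10);
    println!("{:?} {:?}", wins.get("MIT"), wins.get("Tufts"));
}
```

Stopping the search at the first empty entry is only correct because nothing is ever deleted here; supporting removal needs extra care (for example, marking removed slots instead of emptying them).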

## Growing collection: amortization

Example: if number of keys${}\ge 0.75 B$
* Double $B$
* Pick new hash function
* Move the information
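The same growth pattern can be observed on the standard `HashMap`, in the spirit of the earlier `Vec` capacity experiment. The helper name `capacity_steps` is made up; the exact capacities printed depend on the standard library version, so none are shown here.

```rust
use std::collections::HashMap;

// Inserts n keys and records (size, capacity) each time the capacity changes.
fn capacity_steps(n: u32) -> Vec<(usize, usize)> {
    let mut map: HashMap<u32, u32> = HashMap::new();
    let mut capacity = map.capacity();
    let mut steps = vec![(map.len(), capacity)];
    for i in 0..n {
        map.insert(i, i);
        if map.capacity() != capacity {
            capacity = map.capacity();
            steps.push((map.len(), capacity));
        }
    }
    steps
}

fn main() {
    for (size, capacity) in capacity_steps(1000) {
        println!("size = {}, capacity = {}", size, capacity);
    }
}
```

As with vectors, `HashMap::with_capacity` avoids the intermediate reallocations when the number of keys is known in advance.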

## Adversarial data

* Could create lots of collisions

* Potential basis for *denial of service attacks*
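Rust's standard `HashMap` defends against this by default: its hasher is seeded with random keys (via `RandomState`), so an attacker cannot easily predict which keys collide. The sketch below hashes the same key under two independently created states; `hash_with` is a hypothetical helper.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Hashes a key with a hasher built from the given RandomState.
fn hash_with(state: &RandomState, key: &str) -> u64 {
    let mut hasher = state.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let s1 = RandomState::new();
    let s2 = RandomState::new();
    // Same key, differently seeded states: the two values are expected to differ.
    println!("{}", hash_with(&s1, "Harvard"));
    println!("{}", hash_with(&s2, "Harvard"));
    // Within one state the hash is stable, as a map requires.
    println!("{}", hash_with(&s1, "Harvard") == hash_with(&s1, "Harvard"));
}
```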

### What makes a good hash function?

* Distributes the keys uniformly across the available buckets
* Consistent hashing adds the property that not too many things move around when the number of buckets changes

http://www.partow.net/programming/hashfunctions/index.html  
https://en.wikipedia.org/wiki/Consistent_hashing
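To make "uniform" concrete, the sketch below counts how many keys land in each bucket under two hash functions: the identity function, which is a deliberately poor choice, and the standard library's `DefaultHasher`. The names `loads`, `identity`, and `std_hash` are hypothetical, and the keys (multiples of 8) are chosen to make the poor hash fail visibly.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Poor hash function: the key itself.
fn identity(key: u64) -> u64 {
    key
}

// Hash function from the standard library.
fn std_hash(key: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

// Counts how many keys fall into each of b buckets under hash function h.
fn loads(keys: &[u64], b: usize, h: fn(u64) -> u64) -> Vec<usize> {
    let mut counts = vec![0; b];
    for &key in keys {
        counts[(h(key) % b as u64) as usize] += 1;
    }
    counts
}

fn main() {
    // 100 keys, all multiples of 8, hashed into 8 buckets
    let keys: Vec<u64> = (0..100).map(|i| i * 8).collect();
    println!("identity: {:?}", loads(&keys, 8, identity));
    println!("std hash: {:?}", loads(&keys, 8, std_hash));
}
```

With the identity function every key is congruent to 0 modulo 8, so all 100 keys collide in bucket 0; a well-mixed hash is expected to spread them far more evenly.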

## Next time

* Typical graph representations

## Read sections 8.1 and 8.3 from the Rust Book.