<div align="center">
    <h1>DS-210: Programming for Data Science</h1>
    <h1>Lecture 31</h1>
</div>

# 1. Strings: `String` and `&str`
# 2. Lifetimes

# <font color="red">1. Strings: `String` and `&str`</font>
# 2. Lifetimes

## Rust and strings

* We have avoided this topic so far


* It's complicated


* Unicode is complicated


* Advantages: internationalization and emojis out of the box

* **Rust:** Unicode strings are a first-class citizen


* **Classical programming languages:**
  * ASCII strings are the default
  * Easier to manage
  * Additional libraries needed to deal with Unicode


## Reminder: Single characters (Unicode scalar values)

* Type: `char`
* Size: 4 bytes
* Note the single quotes!

In [2]:
let a : char = 'a';
let b = '🦕';

Dinosaurs:<br>
&nbsp;&nbsp;&nbsp;🦕 (U+1F995)<br>
&nbsp;&nbsp;&nbsp;🦖 (U+1F996)
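A `char` stores a Unicode scalar value, so its numeric code point can be read with an `as u32` cast. Below is a minimal sketch that checks the two code points listed above. The helper name `code_point` is hypothetical.

```rust
/// Returns the Unicode code point of a `char` (hypothetical helper).
fn code_point(c: char) -> u32 {
    c as u32
}

fn main() {
    for dino in ['🦕', '🦖'] {
        // {:X} prints the number in uppercase hexadecimal
        println!("{} U+{:X}", dino, code_point(dino));
    }
}
```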

In [3]:
// Mayan numeral (not every font or terminal can display all Unicode characters)
let c = '𝋥';

In [4]:
std::mem::size_of_val(&a)

4

In [5]:
std::mem::size_of_val(&b)

4

## String literals

* String literal: a string written directly in the code, `"like this"`
* Note the double quotes
* What type are they?

In [6]:
let sample = "Hello, DS210!";

In [7]:
let sample: String = "Hello, DS210!";

Error: mismatched types

In [8]:
let sample: &str = "Hello, DS210!";

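`&str` is a **string slice**: a reference to a sequence of bytes that is guaranteed to be valid UTF-8. Internally it behaves like `&[u8]` (a pointer plus a length).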
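Because a `&str` is a view of UTF-8 bytes, `as_bytes()` exposes the underlying `&[u8]`, and `std::str::from_utf8` goes the other way, rejecting byte sequences that are not valid UTF-8. The following is a minimal sketch. The helper `is_valid_utf8` is a hypothetical name.

```rust
/// Returns true if the bytes form valid UTF-8, i.e. could back a `&str`
/// (hypothetical helper).
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    let sample: &str = "Hello, DS210!";
    // View the same memory as raw bytes
    let bytes: &[u8] = sample.as_bytes();
    println!("{:?}", &bytes[..5]); // [72, 101, 108, 108, 111]

    // Converting back to &str succeeds only for valid UTF-8
    println!("{}", is_valid_utf8(bytes)); // true
    // The byte 0xFF never occurs in UTF-8
    println!("{}", is_valid_utf8(&[0xFFu8])); // false
}
```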

## Encoding of characters

As `char`s, `'a'` and `'🦕'` both took 4 bytes. Inside a string, characters are stored in the UTF-8 encoding instead:

In [9]:
std::mem::size_of_val("a")

1

In [10]:
std::mem::size_of_val("🦕")

4

In UTF-8, a character takes 1–4 bytes, so the size of a string depends on which characters it contains.

In [11]:
let dinos = "🦕🦖";
std::mem::size_of_val(dinos)

8

In [5]:
let mixed = "a🦖b🦕";
std::mem::size_of_val(mixed)

10

In [6]:
// Iterating through characters
for (i, c) in mixed.chars().enumerate() {
    println!("{} {} {}", i, c, std::mem::size_of_val(&c));
}


0 a 4
1 🦖 4
2 b 4
3 🦕 4


()
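`char::len_utf8` reports how many bytes a single character occupies in UTF-8. Likewise, `str::len` counts bytes while `chars().count()` counts characters. Below is a short sketch that uses one sample character from each size class. The helper `utf8_sizes` and the sample characters are our own choice.

```rust
/// Number of bytes each character needs in UTF-8 (hypothetical helper).
fn utf8_sizes(chars: &[char]) -> Vec<usize> {
    chars.iter().map(|c| c.len_utf8()).collect()
}

fn main() {
    // ASCII letter, accented Latin letter, euro sign, dinosaur
    let sample = ['a', 'é', '€', '🦕'];
    println!("{:?}", utf8_sizes(&sample)); // [1, 2, 3, 4]

    // len() counts bytes, chars().count() counts characters
    let mixed = "a🦖b🦕";
    println!("{} bytes, {} chars", mixed.len(), mixed.chars().count()); // 10 bytes, 4 chars
}
```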

Substrings can be selected by byte range, but both ends of the range must fall on character boundaries (otherwise the program panics at runtime)

In [12]:
dinos[0..1]

thread '<unnamed>' panicked at 'byte index 1 is not a char boundary; it is inside '🦕' (bytes 0..4) of `🦕🦖`', src/lib.rs:130:40

In [13]:
let dinos = "🦕🦖";
dinos[0..4]

"🦕"

In [14]:
let sample = "Hello, world!";
sample[7..]

"world!"
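To avoid the panic, `str::is_char_boundary` can test a byte index, and `str::get` returns `None` for a range that is out of bounds or splits a character. Below is a sketch. `safe_slice` is a hypothetical helper name.

```rust
/// Returns the substring for a byte range, or None if the range is
/// out of bounds or cuts through a multi-byte character (hypothetical helper).
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let dinos = "🦕🦖";
    println!("{}", dinos.is_char_boundary(1)); // false
    println!("{:?}", safe_slice(dinos, 0, 1)); // None
    println!("{:?}", safe_slice(dinos, 0, 4)); // Some("🦕")
}
```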

## Strings

* `String` is a growable, heap-allocated type: internally a `Vec<u8>` holding valid UTF-8
* Characters and strings can be appended to the end

In [15]:
let mut sample = String::new();

// append a string slice
sample.push_str("abc");
sample

"abc"

In [16]:
// append character
sample.push('d');
sample

"abcd"
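Like a `Vec<u8>`, a `String` tracks its length in bytes (`len`) separately from its allocated space (`capacity`). Below is a minimal sketch. The helper `build` is hypothetical. The exact capacity is an implementation detail, so only the lower bound `capacity() >= len()` is relied on.

```rust
/// Builds a String by appending a slice and then a character (hypothetical helper).
fn build(prefix: &str, last: char) -> String {
    // Reserve enough bytes up front so no reallocation is needed
    let mut s = String::with_capacity(prefix.len() + last.len_utf8());
    s.push_str(prefix); // append a string slice
    s.push(last); // append a char, encoded as UTF-8
    s
}

fn main() {
    let s = build("abc", '🦕');
    // len() counts bytes: 3 for "abc" plus 4 for the dinosaur
    println!("{} has {} bytes", s, s.len()); // abc🦕 has 7 bytes
    assert!(s.capacity() >= s.len());
}
```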

## Converting literals to type `String`

Use `.to_string()` or `String::from(...)`

In [17]:
let string_1 = "This is a test".to_string();
let string_2 = String::from("This is a test");
string_1 == string_2

true

Can also use the macro `format!(...)`:
  * same syntax as `println!(...)`
  * produces an object of type `String`

In [18]:
let sample: String = format!("{} == {}",string_1,string_2);
sample

"This is a test == This is a test"

## String concatenation via `+`

* Takes ownership of (moves) the first operand, a `String`
* The second operand must be a `&str` (a `&String` is converted automatically)

In [11]:
let string_1 = "abc".to_string();
let string_2 = "def".to_string();

In [13]:
string_1 + &string_2

"abcdef"

Why `+` takes ownership of `string_1`:
 * reason: efficiency
 * the result reuses `string_1`'s buffer and only appends a copy of `string_2`, so the content of the first string is not copied (unless the buffer has to grow and be reallocated)
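The ownership rule can be seen directly. After `string_1 + &string_2`, `string_1` has been moved and can no longer be used, while `string_2` was only borrowed and remains available. When both inputs must be kept, `format!` borrows its arguments instead. Below is a sketch. `concat_keep_both` is a hypothetical helper.

```rust
/// Concatenates two slices without consuming either (hypothetical helper).
fn concat_keep_both(a: &str, b: &str) -> String {
    // format! only borrows its arguments
    format!("{}{}", a, b)
}

fn main() {
    let string_1 = "abc".to_string();
    let string_2 = "def".to_string();

    // string_1 is moved into `joined`; &string_2 coerces from &String to &str
    let joined = string_1 + &string_2;
    // println!("{}", string_1); // would not compile: value moved
    println!("{} {}", joined, string_2); // abcdef def

    // `+=` appends in place, like push_str
    let mut acc = joined;
    acc += "!";
    println!("{}", acc); // abcdef!

    println!("{}", concat_keep_both("abc", "def")); // abcdef
}
```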

## Writing generic code
* Use string slices (`&str`) as parameter types when possible
* Such functions accept both `&str` and `&String` arguments

In [21]:
fn show(message: &str) {
    println!("{}",message);
}

In [22]:
// automatic conversion to &str from &String
let my_string = String::from("ds210");
show(&my_string);
show("ds210");

ds210
ds210


# 1. Strings: `String` and `&str`
# <font color="red">2. Lifetimes</font>

## Lifetimes

* The lifetime of a reference: the part of the program during which it is valid
* Important when references are shared
  * Example: a reference returned by a function

**Challenge:** return a reference to the greater of two integers

In [23]:
fn ref_to_max(x:&mut i32, y:&mut i32) -> &mut i32 {
    if *x >= *y {
        x
    } else {
        y
    }
}

Error: missing lifetime specifier

## Specifying lifetimes

<code>'t</code> is a lifetime parameter that names how long a reference lives (`t` can be any identifier, e.g. `'a`)

 * immutable example: <code>&'t i32</code>
 * mutable example: <code>&'t mut i32</code>

In [24]:
fn ref_to_max<'a>(x:&'a mut i32, y:&'a mut i32) -> &'a mut i32 {
    if *x >= *y {
        x
    } else {
        y
    }
}

In [25]:
let mut x = 13;
let mut y = 3;
{
    println!("{} {}",x,y);
    *ref_to_max(&mut x, & mut y) = 5;
    println!("{} {}",x,y);
    *ref_to_max(&mut x, & mut y) = 1;
    println!("{} {}",x,y);
    *ref_to_max(&mut x, & mut y) = 0;
    println!("{} {}",x,y);
};

13 3
5 3
1 3
1 0


## Applying this function

* The arguments may have different lifetimes
* Rust then infers `'a` as the shorter of the two, so the returned reference can be used only while both arguments are still valid

In [26]:
let mut x = 1;
let mut y = 10;
{
    let ref1 = &mut y;
    {
        let ref2 = &mut x;
        *ref_to_max(ref1, ref2) = 3;
    }
    *ref1 *= -1;
};
(x,y)

(1, -3)
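The same inference applies to string slices. With a single `'a` on both inputs, the result can be used only where both borrowed values are still alive. Below is a sketch. `longest` is a hypothetical function in the spirit of `ref_to_max`.

```rust
/// Returns the longer of two string slices (the first one on a tie).
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    let outer = String::from("dinosaur");
    {
        let inner = String::from("rex");
        // 'a is inferred as the overlap of both borrows, i.e. this inner block
        let result = longest(&outer, &inner);
        println!("{}", result); // dinosaur
    }
    // Using `result` after this brace would be rejected,
    // because `inner` is dropped at the end of the block.
}
```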

## Multiple lifetimes possible

In [27]:
fn multiple<'a, 'b>(x:&'a str, y:&'b str) -> (&'a str,&'b str) {
    (x,y)
}
multiple("abc","def")

("abc", "def")
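Separate lifetimes pay off when the output depends on only one input. Tying the result to `'a` alone lets the other argument go out of scope early. With one shared `'a` for both parameters, the `main` below would not compile. Below is a sketch. `first` is a hypothetical function.

```rust
/// Returns its first argument; the second is only borrowed for the call.
fn first<'a, 'b>(x: &'a str, _y: &'b str) -> &'a str {
    x
}

fn main() {
    let kept = String::from("abc");
    let result;
    {
        let temporary = String::from("def");
        // The result borrows only from `kept`, not from `temporary`
        result = first(&kept, &temporary);
    } // `temporary` is dropped here
    println!("{}", result); // abc
}
```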

## String literals are forever

* Stored in the program's binary, so their memory exists for the whole run

* References to them never expire

* Their lifetime is written `'static`

In [28]:
let example: &'static str = "abc";
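A function can return a literal as `&'static str` even when it takes no reference as input. In that case there is no input lifetime to borrow from, so leaving the lifetime out is a "missing lifetime specifier" error. Below is a sketch. `day_kind` and its numbering (1 = Monday) are hypothetical.

```rust
/// Classifies a day number (1 = Monday, ..., 7 = Sunday); hypothetical helper.
fn day_kind(day: u8) -> &'static str {
    match day {
        1..=5 => "weekday",
        6 | 7 => "weekend",
        _ => "invalid",
    }
}

fn main() {
    let label;
    {
        let day = 6;
        label = day_kind(day);
    } // `day` is gone, but `label` points to a literal
    println!("{}", label); // weekend
}
```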

# The 3 rules of lifetime elision

When a function signature leaves lifetimes out, the compiler fills them in using these rules:

* The compiler assigns a lifetime parameter to each parameter that's a reference. A function with one parameter gets one lifetime parameter: `fn foo<'a>(x: &'a i32)`; a function with two parameters gets two separate lifetime parameters: `fn foo<'a, 'b>(x: &'a i32, y: &'b i32)`; and so on.

* If there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters: `fn foo<'a>(x: &'a i32) -> &'a i32`.

* If there are multiple input lifetime parameters, but one of them is `&self` or `&mut self` because this is a method, the lifetime of `self` is assigned to all output lifetime parameters.

If the rules leave an output lifetime undetermined, the compiler reports an error (as with the first version of `ref_to_max`).
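Rule 2 explains why a function that takes one `&str` and returns a `&str` needs no annotations. Below is a sketch. `first_word` is an illustrative helper, not part of the lecture code.

```rust
/// Returns the text before the first space (or the whole string).
/// Elided signature; by rule 2 it means: fn first_word<'a>(s: &'a str) -> &'a str
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        // ' ' is one byte, so i is always a char boundary
        Some(i) => &s[..i],
        None => s,
    }
}

fn main() {
    let sentence = String::from("Hello DS210");
    println!("{}", first_word(&sentence)); // Hello
}
```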

**Example 1:** by rules 1 and 2, all the `get_shorter` functions below are equivalent

In [2]:
struct TwoStrings{
    a: String,
    b: String,
}

In [3]:
fn get_shorter_1(ts:&TwoStrings) -> &str {
    if ts.a.len() < ts.b.len() {
        &ts.a
    } else {
        &ts.b
    }
}

In [4]:
fn get_shorter_2<'a>(ts:&'a TwoStrings) -> &'a str {
    if ts.a.len() < ts.b.len() {
        &ts.a
    } else {
        &ts.b
    }
}

In [5]:
fn get_shorter_3<'a>(ts:&'a TwoStrings) -> &str {
    if ts.a.len() < ts.b.len() {
        &ts.a
    } else {
        &ts.b
    }
}

In [6]:
let two_strings: TwoStrings = TwoStrings {a:"abc".to_string(), b:"defg".to_string()};
println!("1 {}", get_shorter_1(&two_strings));
println!("2 {}", get_shorter_2(&two_strings));
println!("3 {}", get_shorter_3(&two_strings));


1 abc
2 abc
3 abc


# What about the 3rd rule?
**Example 2:** if one of the parameters is `&self` or `&mut self`, its lifetime is used as the lifetime of the output

The methods `get_longer_1` and `get_longer_2` below are equivalent

In [7]:
impl TwoStrings {
    // lifetimes elided: by rule 3 the output borrows from `self`
    fn get_longer_1(&self, _unused:&TwoStrings) -> &str {
        if self.a.len() > self.b.len() {
            &self.a
        } else {
            &self.b
        }
    }

    fn get_longer_2<'a, 'b>(&'a self, _unused:&'b TwoStrings) -> &'a str {
        if self.a.len() > self.b.len() {
            &self.a
        } else {
            &self.b
        }
    }
}

In [8]:
impl TwoStrings {
    // does not compile: by rule 3 the output is tied to `self`,
    // but the returned reference borrows from `unused`
    fn get_longer_3(&self, unused:&TwoStrings) -> &str {
        if self.a.len() > self.b.len() {
            &unused.a
        } else {
            &unused.b
        }
    }
}

Error: lifetime may not live long enough

In [9]:
impl TwoStrings {
    // explicit lifetimes: the output borrows from `unused` ('b)
    fn get_longer_3<'a, 'b>(&'a self, unused:&'b TwoStrings) -> &'b str {
        if self.a.len() > self.b.len() {
            &unused.a
        } else {
            &unused.b
        }
    }
}

In [12]:
let other_strings: TwoStrings = TwoStrings {a:"foobar".to_string(), b:"barfoo".to_string()};

println!("1 {} ", two_strings.get_longer_1(&other_strings));
println!("2 {} ", two_strings.get_longer_2(&other_strings));
println!("3 {} ", two_strings.get_longer_3(&other_strings));

1 defg 
2 defg 
3 barfoo 


## Read section 10.3 of *The Rust Programming Language* for more on lifetimes