# Strings

Strings are essentially arrays of characters. In Python, a `str` is an immutable sequence of Unicode code points.

You should know: 
- how they're represented in memory
- basic operations:
    - comparison, copying, joining, splitting, matching
    
Advanced string processing algorithms often use **hash tables** and dynamic programming. 

Also shown here:
- How to use `functools.reduce` to accumulate a running result, including its optional **initializer argument**.

### 0. String basics

Strings are immutable. `s = s[1:]` and `s += '123'` each create a new string, which is then assigned back to `s`.
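A minimal sketch of what immutability means in practice. The names `alias`, `error`, and `join_digits` are made up for illustration. Because every `+=` may copy the whole string, building a long string one piece at a time can degrade toward $O(n^2)$, so the usual workaround is to collect the pieces in a list and join them once.

```python
s = 'abc'
alias = s          # second reference to the same str object
s += '123'         # builds a new str and rebinds s; alias is untouched

try:
    s[0] = 'x'     # str does not support item assignment
except TypeError as e:
    error = str(e)

def join_digits(n: int) -> str:
    """Hypothetical helper: collect pieces in a list, then join once."""
    parts = []
    for i in range(n):
        parts.append(str(i % 10))
    return ''.join(parts)
```

After this runs, `alias` is still `'abc'` while `s` is `'abc123'`, which shows that the original object was never modified.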


A little warm-up: check whether a string is a palindrome in $O(n)$ time and $O(1)$ space.

Like in the **Primitives** notebook, this function takes advantage of the bitwise complement operator `~`.

Recall:
> Returns the complement of x - the number you get by switching each 1 for a 0 and each 0 for a 1. This is the same as -x - 1. 

So, $\sim(x-1) = -x$ and $\sim x = -(x+1)$. The second fact is what the palindrome check uses: `s[~i]` is `s[-(i + 1)]`, the mirror of `s[i]`.
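As a quick sanity check of the identity, here is a small sketch (the names `identity_holds`, `items`, and `pairs` are illustrative only) that verifies it for a few integers and shows how index `i` from the front pairs with index `~i` from the back.

```python
# ~x == -x - 1 holds for every Python int, positive or negative
identity_holds = all(~x == -x - 1 for x in range(-5, 6))

# so index i from the front pairs with index ~i from the back
items = ['a', 'b', 'c', 'd']
pairs = [(items[i], items[~i]) for i in range(len(items))]
```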

In [1]:
# little warm up
def is_pdrome(s: str) -> bool:
    # s[~i] for i in range(len(s)) is s[-(i + 1)], i.e. the i-th character from the end
    return all(s[i] == s[~i] for i in range(len(s) // 2))

is_pdrome('abba')

True

In [12]:
print(len('racecar') // 2)

s = 'racecar'
for i in range(len(s) // 2):
    print(i, s[i], s[~i])

# 'racecar' has odd length, and integer division handles that case for us:
# in an odd-length string the middle character pairs with itself and can be skipped;
# in an even-length string every character is compared with its mirror.

3
0 r r
1 a a
2 c c


## 5.1 Interconvert strings and integers

Implement methods to take a string representing an integer and return the corresponding integer, and vice versa.

The code should handle negative integers and not use any library functions like `int`.

> Hint: Build the result one digit at a time.

Ingredients for int to str:
- `ord()` returns the integer Unicode code point of a one-character string.
- `chr()` is the inverse: it returns the character (a one-character string) for an integer Unicode code point.

Ingredients for str to int:
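- `string.digits` is the constant string `'0123456789'`.
- `s.index('2')` returns the index of the first occurrence of `'2'` in `s` (s is a str), so `string.digits.index(c)` maps a digit character to its integer value.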
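A short sketch of these ingredients in isolation, assuming ASCII digits only. The variable names are illustrative.

```python
import string

# int -> str direction: offset from the code point of '0'
digit_char = chr(ord('0') + 7)

# str -> int direction: position of the digit character inside '0123456789'
digit_value = string.digits.index('7')

# equivalent arithmetic without the string module
digit_value_alt = ord('7') - ord('0')
```

Here `digit_char` is `'7'`, and both `digit_value` and `digit_value_alt` are the integer `7`.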

Str to int relies on positional notation: the base-10 numeral $d_2d_1d_0$ encodes the number $10^2 \cdot d_2 + 10^1 \cdot d_1 + d_0$.

An efficient way to compute $10^{i+1}$ is to take the existing value $10^i$ and multiply it by 10.

The more elegant solution, used below, starts from the leftmost digit and, for each succeeding digit, multiplies the partial result by 10 and adds that digit.

For example:
> For 314, initialize the result as $r = 0$. In the first iteration $r = 3$, in the second $r = 3 \times 10 + 1 = 31$, and in the last $r = 31 \times 10 + 4 = 314$. Again, the negative sign is handled as a separate case.
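The same left-to-right accumulation can be written as an explicit loop. This is only a sketch: `str_to_int_loop` is a hypothetical name, and it assumes well-formed input (an optional leading `+` or `-` followed by at least one ASCII digit).

```python
import string

def str_to_int_loop(s: str) -> int:
    """Loop form of the left-to-right digit accumulation."""
    if not s:
        raise ValueError('empty string')
    sign = -1 if s[0] == '-' else 1
    digits = s[1:] if s[0] in '-+' else s   # drop an optional sign
    result = 0
    for c in digits:
        # shift the partial result one decimal place, then add the new digit
        result = result * 10 + string.digits.index(c)
    return sign * result
```

For `'314'` the partial results are 3, 31, and 314, matching the worked example above. A non-digit character makes `string.digits.index` raise `ValueError`.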

In [17]:
ord('0')

48

In [18]:
# first int to string can use modulo 10 to get the least significant digit.
1 % 10, 123 % 10

def int_2_str(x: int) -> str:
    '''Digits are produced least significant first, so the list is
    reversed before returning and the negative sign is added back.

    TIL: working with negative integers can be tricky, so handle the sign
    as a separate case here. Inside the loop, x is always non-negative.
    '''
    is_negative = False
    if x < 0:
        x, is_negative = -x, True
    
    s = []
    while True:
        s.append(
            chr(ord('0') + x % 10)
        )
        
        x //= 10 # removes least significant digit
        if x == 0:
            break
    
    return ('-' if is_negative else '') + ''.join(reversed(s))

In [19]:
int_2_str(-123)

'-123'

In [29]:
chr(ord('0') + 2 % 10)  # '2': offsetting from the code point of '0' gives the digit's character
123 // 10

12

In [44]:
import functools
import string

def str_2_int(s: str) -> int:
    '''Using functools.reduce feels a bit like cheating, but it is good
    to know that it is available.

    s[0] in '-+' is True (i.e. 1) when there is a sign, so the slice skips it.
    '''
    return (-1 if s[0] == '-' else 1) * functools.reduce(
        lambda running_sum, c: running_sum * 10 + string.digits.index(c),
        s[s[0] in '-+':], 0)

In [45]:
str_2_int('-123')

-123

In [47]:
s = '-123'
print(s[False:]) # True/False can be used as 0,1 to slice array
# reduce computes these partial results in turn (the initializer 0 is the first running_sum):
0 * 10 + 1, 1 * 10 + 2, 12 * 10 + 3

-123


(1, 12, 123)

#### Aside: `functools.reduce`

The reducer above uses the optional `initializer` argument, which I hadn't seen before. From the docs: "If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. *If initializer is not given and iterable contains only one item, the first item is returned.*"

>For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5)
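One way to read this is as a plain loop. The sketch below uses a hypothetical helper, `reduce_loop`, to mirror what `functools.reduce` does when an initializer is supplied. It also uses `itertools.accumulate`, which is not part of the original notes, to expose every intermediate running sum.

```python
import functools
import itertools
import operator

ll = [1, 2, 3, 4, 5]

def reduce_loop(fn, iterable, initializer):
    """Hypothetical helper: reduce with an initializer, written as a loop."""
    acc = initializer
    for item in iterable:
        acc = fn(acc, item)   # the accumulator is always the first argument
    return acc

total = functools.reduce(operator.add, ll, 0)
same_total = reduce_loop(operator.add, ll, 0)
running = list(itertools.accumulate(ll))               # every intermediate sum
empty_default = functools.reduce(operator.add, [], 0)  # initializer as default
```

Note that the function passed to `reduce` always receives the accumulated value first and the next item second, which matters for the order of the lambda's parameters.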

In [58]:
ll = [1, 2, 3, 4, 5]

print(functools.reduce(lambda x, y: x+y,ll))
# vs. using initializer 
functools.reduce(lambda x, y: x+y, ll, 1) # start with x = 1 rather than x = ll[0]; y then runs over all of ll instead of ll[1:]

15


16

In [55]:
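# reduce always passes the accumulator as the first argument, so with the parameter
# names swapped the initializer 0 is bound to `c` instead of `running_sum`,
# and string.digits.index(0) fails because it expects a str.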
functools.reduce(lambda c, running_sum: running_sum * 10 + string.digits.index(c),
                s[s[0] in '-+':], 0)

TypeError: must be str, not int