# Strings
- In Python, a string is like a special kind of array (a sequence of characters), but it is immutable

## Tips
- Similar to arrays, string problems often have simple brute-force solutions that use $O(n)$ space, but subtler solutions that use the string itself to **reduce the space complexity** to $O(1)$
- Understand the **implications** of a string type which is **immutable**, e.g., the need to allocate a new string when concatenating immutable strings. Know **alternatives** to immutable strings, e.g., a **list** in Python
- Updating a mutable string from the front is slow, so see if it's possible to **write values from the back**
- Indexing and slicing work the same as for lists
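The last two tips can be seen together in a small sketch. Here a Python list of characters plays the role of a mutable string. `duplicate_char_in_place` is a hypothetical helper, not from the original, that doubles every occurrence of a target character. It assumes the list already has enough unused slots at the end. The helper first counts how many extra slots are needed, then fills the list from the back so that no character is overwritten before it is read.

```python
from typing import List


def duplicate_char_in_place(buf: List[str], size: int, target: str) -> int:
    """Double every `target` in buf[:size] in place; return the new size.

    Hypothetical helper: assumes len(buf) >= size + buf[:size].count(target).
    """
    new_size = size + buf[:size].count(target)   # one extra slot per target
    write = new_size - 1
    # Walk right to left. write never falls behind read, so each source
    # character is read before its slot can be overwritten.
    for read in reversed(range(size)):
        c = buf[read]
        buf[write] = c
        write -= 1
        if c == target:
            buf[write] = c
            write -= 1
    return new_size


buf = list('abca') + [''] * 2      # two spare slots for the two 'a's
n = duplicate_char_in_place(buf, 4, 'a')
print(''.join(buf[:n]))            # aabcaa
```

Writing from the front would instead shift the rest of the string right on every match, which is $O(n^2)$ in the worst case. Writing from the back is $O(n)$ time with $O(1)$ extra space.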

In [1]:
from typing import List, Iterator, Tuple
import bisect
import collections
import math
import functools
import itertools
import random

from utils import run_tests

## Libraries

In [2]:
s = 'The cow jumped over the moon'
t = 'The moon is made of cheese'
print(s)
print(t)

print('\ns.startswith("The"):       ', s.startswith("The"))

print('\ns.endswith(("moo", "moon"):', s.endswith(("moo", "moon")))     # a tuple of suffixes: True if any of them matches

print('\ns + t:                     ', s + t)

strings = ['the', 'cat', 'and', 'the', 'hat']
print('\n', strings)
print('" ".join(strings):           ' , " ".join(strings))

The cow jumped over the moon
The moon is made of cheese

s.startswith("The"):        True

s.endswith(("moo", "moon"): True

s + t:                      The cow jumped over the moonThe moon is made of cheese

 ['the', 'cat', 'and', 'the', 'hat']
" ".join(strings):            the cat and the hat

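A few other built-in `str` methods come up often in string problems. The assertions below pin down their behavior on the same example sentence. All of these are standard library methods. Note that `replace`, like every `str` method that appears to modify a string, returns a new string because strings are immutable.

```python
s = 'The cow jumped over the moon'

assert s.split() == ['The', 'cow', 'jumped', 'over', 'the', 'moon']   # split on runs of whitespace
assert s.find('moon') == 24          # lowest index of the substring
assert s.find('cheese') == -1        # -1 if absent (s.index would raise ValueError)
assert s.lower().count('the') == 2   # non-overlapping occurrences
assert s.replace('moon', 'sun') == 'The cow jumped over the sun'     # a new string
assert '  padded  '.strip() == 'padded'
assert s[::-1][:4] == 'noom'         # slicing works as for lists
assert 'a1'.isalnum() and '42'.isdigit() and not 'x-y'.isalnum()
```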

### Is Palindrome?
A palindrome is a string that reads the same forwards and backwards.  
The key to the optimal solution is to traverse the string forwards and backwards simultaneously, comparing mirrored characters, rather than building a reversed copy.

In [3]:
def is_palindrome(s: str) -> bool:
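    '''
    compare each character in the first half with its mirror in the second half:
    n // 2 comparisons and no reversed copy, versus up to n for s == s[::-1]
    '''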
    # note: ~i = -(i+1)
    return all(s[i] == s[~i] for i in range(len(s) // 2))

inputs, outputs = ('cat', 'aabbaa', 'aba', 'abca'), (False, True, True, False)
run_tests(is_palindrome, inputs, outputs)

Time complexity is $O(n)$ and space complexity $O(1)$
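The `~i` indexing above is a compact form of the simultaneous forward and backward traversal. The following sketch spells it out with two explicit indices. The name `is_palindrome_two_pointers` is our own.

```python
def is_palindrome_two_pointers(s: str) -> bool:
    i, j = 0, len(s) - 1
    # Move the two pointers toward each other, comparing mirrored characters
    while i < j:
        if s[i] != s[j]:
            return False
        i, j = i + 1, j - 1
    return True


print([is_palindrome_two_pointers(w) for w in ('cat', 'aabbaa', 'aba', 'abca')])
# [False, True, True, False]
```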

### 6.1: Interconvert Strings and Integers

In [4]:
def int_to_string(num: int) -> str:

    if num < 0:
        is_negative, num = True, abs(num)
    else:
        is_negative = False
    
    digits = []
    # process one digit at a time
    # processing digits in reverse order
    while True:
        num, digit = num // 10, num % 10
        digits.append(chr(ord('0') + digit))   # code point of '0' plus the digit, converted back to a character
        if num == 0:
            break

    if is_negative:
        digits.append('-')

    return ''.join(reversed(digits))


# should be able to handle '314', '+314' or '-314'
def string_to_int(s: str) -> int:
    string_digits = {c: d for c, d in zip(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], range(10))}

    sign = -1 if s[0] == '-' else 1

    running_sum = 0
    for i in s[s[0] in '+-':]:             # s[0] in '+-' is True (i.e. 1) when there is a sign, so the slice skips it
        running_sum = running_sum * 10 + string_digits[i]  # multiplying by 10 shifts the place values left

    return sign * running_sum


def string_to_int_v2(s: str) -> int:
    string_digits = {c: d for c, d in zip(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], range(10))}
    return (-1 if s[0] == '-' else 1) * functools.reduce(
        lambda running_sum, c: running_sum * 10 + string_digits[c],
        s[s[0] in '+-':], 0
    )

inputs, outputs = (123, -123, 4, -4, 0), ('123', '-123', '4', '-4', '0')
run_tests(int_to_string, inputs, outputs)

inputs, outputs = ('123', '-123', '+123', '4', '-4', '+4', '0'), (123, -123, 123, 4, -4, 4, 0)
run_tests(string_to_int, inputs, outputs)

run_tests(string_to_int_v2, inputs, outputs)

$O(n)$ time and space complexity, where $n$ is the number of digits
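The following trace shows why `running_sum * 10 + digit` works. It uses a hypothetical helper, restricted to unsigned digit strings, that records the running sum after each character. Each step shifts the digits seen so far one place to the left and adds the new digit in the ones place (Horner's rule): $314 = (3 \times 10 + 1) \times 10 + 4$.

```python
from typing import List


def trace_string_to_int(s: str) -> List[int]:
    """Running sums while parsing the unsigned decimal string s (hypothetical helper)."""
    running_sum, steps = 0, []
    for c in s:
        running_sum = running_sum * 10 + (ord(c) - ord('0'))   # shift left, add digit
        steps.append(running_sum)
    return steps


print(trace_string_to_int('314'))   # [3, 31, 314]
```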

In [5]:
print('ord("0") returns the code point of a one-character string, e.g.:', ord('0'))
print('chr(ord("0")) returns the character for a code point, e.g.:     ', chr(ord('0')))
print('ord("0") + 5:     ',  ord('0') + 5)
print('chr(ord("0") + 5):',  chr(ord('0') + 5))


ord("0") returns the code point of a one-character string, e.g.: 48
chr(ord("0")) returns the character for a code point, e.g.:      0
ord("0") + 5:      53
chr(ord("0") + 5): 5


In [6]:
{c: d for c, d in zip(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], range(10))}

{'0': 0,
 '1': 1,
 '2': 2,
 '3': 3,
 '4': 4,
 '5': 5,
 '6': 6,
 '7': 7,
 '8': 8,
 '9': 9}

### 6.2: Base Conversion
The base-$b$ number system generalizes decimal: $a_{k-1}a_{k-2}\cdots a_1a_0$, where $0 \leq a_i < b$, denotes the integer $a_0 \times b^0 + a_1 \times b^1 + a_2 \times b^2 + \cdots + a_{k-1} \times b^{k-1}$          

Write a function that converts a string representing an integer in base $b_1$ into the string representing the same integer in base $b_2$   
e.g.: '615', $b_1 = 7$ and $b_2 = 13$ --> '1A7'    

Assume $2 \leq b_1, b_2 \leq 16$, with 'A' through 'F' denoting the digits 10 through 15

In [7]:
def base_conversion(num_as_str: str, b1: int, b2: int) -> str:
    string_int_map = {c: d for d, c in enumerate('0123456789ABCDEF')}
    int_string_map = {d: c for c, d in string_int_map.items()}

    # parse the base-b1 digits into an int using multiply-and-add
    is_negative = num_as_str[0] == '-'
    num_as_int = functools.reduce(
        lambda sum_so_far, c: sum_so_far * b1 + string_int_map[c],
        num_as_str[is_negative:], 0
    )

    # emit base-b2 digits least significant first, then reverse at the end
    digits_as_string = []
    while True:
        num_as_int, d = num_as_int // b2, num_as_int % b2
        digits_as_string.append(int_string_map[d])
        if num_as_int == 0:
            break
    
    if is_negative:
        digits_as_string.append('-')

    return ''.join(reversed(digits_as_string))


assert base_conversion('615', b1=7, b2=13) == '1A7'
assert base_conversion('-615', b1=7, b2=13) == '-1A7'



In [8]:
# 615 Base-7 as decimal
5 + (1 * 7) + (6 * 7**2)

306
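To finish the worked example: $306 = 1 \times 13^2 + 10 \times 13 + 7$, and the digit 10 is written 'A', giving '1A7'. Python's built-in `int(string, base)` can cross-check the first half of `base_conversion`. Repeated `divmod` by 13 reproduces the second half, emitting digits least significant first. This is only a hand check. As written, the loop would return an empty string for 0, so it does not replace the function.

```python
x = int('615', 7)                   # built-in parsing, any base from 2 to 36
assert x == 306

digits = []
while x > 0:
    x, d = divmod(x, 13)            # peel off the least significant base-13 digit
    digits.append('0123456789ABCDEF'[d])
result = ''.join(reversed(digits))
print(result)                       # 1A7
assert int(result, 13) == 306       # round trip back to decimal
```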

Time complexity is $O(n(1 + \log_{b_2}{b_1}))$, where $n$ is the length of s.   
First, perform $n$ multiply-and-adds to get $x$ from s.   
Then, perform $\log_{b_2}x$ multiply-and-adds to get the result.   
$x$ is upper-bounded by $b_1^n$, and $\log_{b_2}(b_1^n) = n\log_{b_2}{b_1}$


### 6.3: Compute the Spreadsheet Column Encoding

In [9]:
def spreadsheet_column_decoder(s: str) -> int:
    num_letters = 26
    string_int_map = {chr(ord('A') + i):i+1 for i in range(num_letters)}

    return functools.reduce(
        lambda sum_so_far, c: sum_so_far * num_letters + string_int_map[c],
        s, 0
    )

inputs, outputs = ('A', 'D', 'Z', 'AA', 'AC', 'DB', 'AZ', 'EZ', 'ZZ'), (1, 4, 26, 27, 29, 26*4+2, 52, 26*5+26, 702)
run_tests(spreadsheet_column_decoder, inputs, outputs)


def spreadsheet_column_decoder_v2(s: str) -> int:
    '''
    essentially string to integer conversion with base 26
    '''
    return functools.reduce(
        lambda sum_so_far, c: sum_so_far * 26 + ord(c) - ord('A') + 1,
        s, 0
    )

inputs, outputs = ('A', 'D', 'Z', 'AA', 'AC', 'DB', 'AZ', 'EZ', 'ZZ'), (1, 4, 26, 27, 29, 26*4+2, 52, 26*5+26, 702)
run_tests(spreadsheet_column_decoder_v2, inputs, outputs)

$O(n)$ time complexity

#### Variant: Solve the same problem with "A" corresponding to 0

In [10]:
# the column order is unchanged, so the 0-based index is the 1-based index minus 1
def spreadsheet_column_decoder_A0(s: str) -> int:

    return spreadsheet_column_decoder_v2(s) - 1

inputs, outputs = ('A', 'D', 'Z', 'AA', 'AC', 'DB', 'AZ', 'EZ', 'ZZ'), (0, 3, 25, 26, 28, 25+(26*3)+2, 51, 25+26*4+26, 701)
run_tests(spreadsheet_column_decoder_A0, inputs, outputs)

#### Variant: Convert Integer to Spreadsheet Index

In [11]:
def spreadsheet_column_encoder(num: int) -> str:

    encoding = []
    # process in reverse order
    while True:
        digit = num % 26
        if digit == 0:           # 'Z' (26) leaves remainder 0, so borrow 1 from the next place
            encoding.append('Z')
            num = num // 26 - 1
        else:
            encoding.append(chr(ord('A') + digit - 1))
            num = num // 26
        if num == 0:
            break
    
    return ''.join(reversed(encoding))

inputs, outputs = (1, 4, 26, 27, 29, 26*4+2, 52, 26*5+26, 702), ('A', 'D', 'Z', 'AA', 'AC', 'DB', 'AZ', 'EZ', 'ZZ')
run_tests(spreadsheet_column_encoder, inputs, outputs)
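The following is an alternative sketch, not from the original. Column labels form a bijective base-26 numeral with digits 1 to 26. Subtracting 1 before each `divmod` maps those digits to 0 to 25, which removes the special case for 'Z'.

```python
def spreadsheet_column_encoder_divmod(num: int) -> str:
    if num < 1:
        raise ValueError('column numbers start at 1')
    letters = []
    while num > 0:
        # shift digits 1..26 down to 0..25 before taking the remainder
        num, rem = divmod(num - 1, 26)
        letters.append(chr(ord('A') + rem))
    return ''.join(reversed(letters))


print([spreadsheet_column_encoder_divmod(n) for n in (1, 26, 27, 52, 702)])
# ['A', 'Z', 'AA', 'AZ', 'ZZ']
```

This version also rejects `num < 1`. The loop above never terminates on that input.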

In [12]:
ord('A')
for i in range(26):
    print(chr(ord('A') + i))

A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z


In [13]:
chr(ord('A') + 1)

'B'

### 6.4: Replace and Remove

### 6.5: Palindrome with Punctuation
Check whether a string is a palindrome when non-alphanumeric characters (punctuation and spaces) are skipped and case is ignored

In [14]:
def is_palindrome_punct(s: str) -> bool:

    i, j = 0, len(s) - 1

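    while i < j:      # can't iterate over range(len(s) // 2): skipped punctuation moves the midpoint

        # skip non-alphanumeric characters; the i < j guard keeps both indices in bounds
        while i < j and not s[i].isalnum():
            i += 1
        while i < j and not s[j].isalnum():
            j -= 1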
        
        if s[i].lower() != s[j].lower():
            return False
        
        i += 1
        j -= 1

    return True
    
inputs, outputs = ('aabbaa', 'abc', 'a,b,.a', 'A man, a plan, a canal, Panama', 'Able was I, ere I saw Elba', 'Ray a Ray'), (True, False, True, True, True, False)
run_tests(is_palindrome_punct, inputs, outputs)

def is_palindrome_punct_pythonic(s: str) -> bool:
    return all(
        a == b
        for a, b in zip(map(str.lower, filter(str.isalnum, s)),
                        map(str.lower, filter(str.isalnum, reversed(s)))
                        )
    )

run_tests(is_palindrome_punct_pythonic, inputs, outputs)


$O(n)$ time and $O(1)$ space complexity

### 6.6: Reverse All the Words in a Sentence
e.g. 'Bob likes Alice' --> 'Alice likes Bob'  

In [25]:
def reverse_words(s: List[str]) -> str:
    '''
    - reversing the whole string puts the words in reverse order, but with each word's characters reversed
    - then reverse the characters within each word to restore them
    '''
    def reverse_range(s: List[str], start: int, finish: int) -> None:
        while start < finish:
            s[start], s[finish] = s[finish], s[start]
            start += 1
            finish -= 1

    # reverse string
    s.reverse()

    # find space to figure out range of word
    start = finish = 0
    while finish < len(s):
        if s[finish] == ' ':
            reverse_range(s, start, finish-1)
            start = finish + 1
        finish += 1

    # reverse last word
    reverse_range(s, start, len(s) - 1)

    return ''.join(s)

s = list('Bob likes Alice')
s = reverse_words(s)
s


'Alice likes Bob'

$O(n)$ time and $O(1)$ extra space for the in-place reversal (the final `''.join` builds a new string, which is $O(n)$)

In [16]:
s = 'Bob likes Alice'
s = list(s)
s.reverse()
s

['e', 'c', 'i', 'l', 'A', ' ', 's', 'e', 'k', 'i', 'l', ' ', 'b', 'o', 'B']
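When $O(n)$ extra space is acceptable, a sketch using built-ins reduces to a one-liner. Splitting on a single space, rather than calling `split()` with no argument, keeps runs of spaces as empty strings. As a result, the original spacing is mirrored as well.

```python
def reverse_words_builtin(s: str) -> str:
    # split(' ') keeps empty strings for repeated spaces, so spacing is preserved
    return ' '.join(reversed(s.split(' ')))


print(reverse_words_builtin('Bob likes Alice'))   # Alice likes Bob
```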

### 6.7: The Look-and-Say Problem
Each number in the sequence is derived by reading off the previous number as runs of consecutive equal digits (e.g., 1211 is "one 1, one 2, two 1s", giving 111221).   
$\langle 1, 11, 21, 1211, 111221, 312211, 13112221, 1113213211 \rangle$

In [17]:
def look_and_say(n: int) -> str:

    def next_number(s: str) -> str:
        result = []
        i = 0
        while i < len(s):

            # iterate through duplicates
            count = 1
            while i + 1 < len(s) and s[i] == s[i+1]:
                count += 1
                i += 1
            result.append(str(count) + s[i])
            i += 1
            
        return ''.join(result)
    
    s = '1'
    for _ in range(n-1):
        s = next_number(s)
    return s

# itertools.groupby yields each run of equal consecutive characters as (key, group)
def look_and_say_pythonic(n: int) -> str:
    s = '1'
    for _ in range(n-1):
        s = [str(len(list(group))) + key for key, group in itertools.groupby(s)]
        s = ''.join(s)
    return s

print([look_and_say(i) for i in range(1, 9)])
print([look_and_say_pythonic(i) for i in range(1, 9)])

['1', '11', '21', '1211', '111221', '312211', '13112221', '1113213211']
['1', '11', '21', '1211', '111221', '312211', '13112221', '1113213211']


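Each successive number has at most twice as many digits as the previous one (each run of equal digits becomes two digits: its count and the digit), so the $n$th number has at most $2^n$ digits, and there are $n$ iterations.    
$O(n2^n)$ time complexity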
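The $2^n$ bound is loose. This quick check reuses the `itertools.groupby` idea from above; `next_term` is our name for the step function. It confirms that no term is more than twice as long as its predecessor and prints the actual growth ratio. Asymptotically, that ratio tends to Conway's constant, about 1.30.

```python
import itertools


def next_term(s: str) -> str:
    # describe each run of equal digits as <count><digit>
    return ''.join(str(len(list(group))) + key for key, group in itertools.groupby(s))


terms = ['1']
for _ in range(29):
    terms.append(next_term(terms[-1]))

lengths = [len(t) for t in terms]
# the bound used above: no term is more than twice as long as its predecessor
assert all(b <= 2 * a for a, b in zip(lengths, lengths[1:]))
print(lengths[-1] / lengths[-2])   # growth ratio of the last step, well below 2
```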

In [18]:
for key, group in itertools.groupby('1113213211'):
    print(key, list(group))

1 ['1', '1', '1']
3 ['3']
2 ['2']
1 ['1']
3 ['3']
2 ['2']
1 ['1', '1']


### 6.8: Convert from Roman to Decimal

In [19]:
def roman_to_decimal(roman: str) -> int:
    roman_int_map = {   
                        'I': 1,
                        'V': 5,
                        'X': 10,
                        'L': 50,
                        'C': 100,
                        'D': 500,
                        'M': 1000
                    }

    i = 0
    sum_so_far = 0
    while i < len(roman):
        # a smaller numeral before a larger one is subtracted, e.g. IV = 5 - 1
        # don't forget to skip the extra index, since both numerals are consumed
        if i + 1 < len(roman) and roman_int_map[roman[i]] < roman_int_map[roman[i+1]]:
            sum_so_far += (roman_int_map[roman[i+1]] - roman_int_map[roman[i]])
            i += 1
        else:
            sum_so_far += roman_int_map[roman[i]]
        i += 1
    
    return sum_so_far


# process in reverse order
def roman_to_decimal_pythonic(roman: str) -> int:
    roman_int_map = {   
                        'I': 1,
                        'V': 5,
                        'X': 10,
                        'L': 50,
                        'C': 100,
                        'D': 500,
                        'M': 1000
                    }
    return functools.reduce(
        lambda val, i: val + (-roman_int_map[roman[i]] if roman_int_map[roman[i]] < roman_int_map[roman[i+1]] else roman_int_map[roman[i]]),
        reversed(range(len(roman) - 1)), roman_int_map[roman[-1]]
    )

inputs, outputs = ('I', "II", 'IV', 'V', 'VIII', 'IX', 'X', 'XIV', 'XV', 'XVI', 'XXXV', 'XLI', 'XLV', 'LXIV', 'XXXXXIIIIIIIII', 'LVIIII', 'LIX'), (1, 2, 4, 5, 8, 9, 10, 14, 15, 16, 35, 41, 45, 64, 59, 59, 59)
run_tests(roman_to_decimal, inputs, outputs)
run_tests(roman_to_decimal_pythonic, inputs, outputs)

$O(n)$ time complexity

#### Variant: Check whether the roman numeral is valid

#### Variant: Convert integer to shortest valid roman numeral

In [20]:
def int_to_roman(num: int) -> str:

    # keys must stay in descending order (dicts preserve insertion order in Python 3.7+)
    int_roman_map = {
                        1000: 'M',
                        900: 'CM',
                        500: 'D',
                        400: 'CD',
                        100: 'C',
                        90:  'XC',
                        50:  'L',
                        40:  'XL',
                        10:  'X',
                        9:   'IX',
                        5:   'V',
                        4:   'IV',
                        1:   'I'
                    }
    roman_numerals = []
    for base in int_roman_map.keys():
        digit, num = num // base, num % base
        roman_numerals.append(int_roman_map[base] * digit)  # note: a string * 0 is the empty string

    return ''.join(roman_numerals)

outputs, inputs = ('I', "II", 'IV', 'V', 'VIII', 'IX', 'X', 'XIV', 'XV', 'XVI', 'XXXV', 'XLI', 'XLV', 'LXIV', 'LIX'), (1, 2, 4, 5, 8, 9, 10, 14, 15, 16, 35, 41, 45, 64, 59)
run_tests(int_to_roman, inputs, outputs)


### 6.9: Compute all valid IP addresses
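This sketch assumes the usual statement of the problem: insert three periods into a string of digits so that each of the four parts is a number from 0 to 255 without leading zeros, and return every valid result. Under that assumption, a brute-force approach tries every split. Because each part has 1 to 3 digits, at most $3^3 = 27$ candidate splits are checked. Both function names are our own.

```python
from typing import List


def is_valid_part(part: str) -> bool:
    # 1 to 3 decimal digits, no leading zero (except '0' itself), value at most 255
    if not 1 <= len(part) <= 3 or not part.isdecimal():
        return False
    if len(part) > 1 and part[0] == '0':
        return False
    return int(part) <= 255


def valid_ip_addresses(s: str) -> List[str]:
    result = []
    # i, j, k are the lengths of the first three parts; the fourth part is the rest
    for i in range(1, 4):
        for j in range(1, 4):
            for k in range(1, 4):
                parts = [s[:i], s[i:i + j], s[i + j:i + j + k], s[i + j + k:]]
                if all(is_valid_part(p) for p in parts):
                    result.append('.'.join(parts))
    return result


print(valid_ip_addresses('19216811'))
```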

#### Variant

### 6.10: Write a String Sinusoidally

      

In [21]:
# e.g. s = 'Hello_World!'
#   e       _       l        s[1], s[5], s[9]
# H   l   o   W   r   d      s[0], s[2], s[4], s[6], s[8], s[10]
#       l       o       !    s[3], s[7], s[11]

def snake_string(s: str) -> str:

    result = []

    # top row
    for i in range(1, len(s), 4):
        result.append(s[i])

    # middle row
    for i in range(0, len(s), 2):
        result.append(s[i])

    # bottom row
    for i in range(3, len(s), 4):
        result.append(s[i])

    return ''.join(result)

print(snake_string('Hello_World!'))

def snake_string_pythonic(s: str) -> str:
    return s[1::4] + s[0::2] + s[3::4]      # s[start::step] omits stop, so each slice runs to the end

print(snake_string_pythonic('Hello_World!'))


e_lHloWrdlo!
e_lHloWrdlo!


$O(n)$ time complexity

### 6.11: Implement Run Length Encoding (RLE)
Encode each run of repeated characters as the repetition count followed by the character.     
e.g. encoding aaaabcccaa gives 4a1b3c2a; decoding 3e4f2e gives eeeffffee

In [22]:
def rle_encode(s: str) -> str:
    start = 0
    size = 1
    result = []
    while start < len(s):
        while start + size < len(s) and s[start + size] == s[start]:
            size += 1
        result.append(str(size) + s[start])
        start += size
        size = 1

    return ''.join(result)

def rle_encode_v2(s: str) -> str:
    size = 1
    result = []
    for i in range(1, len(s) + 1):
        # reached the end, or found a different character: emit the finished run
        if i == len(s) or s[i] != s[i - 1]:
            result.append(str(size) + s[i - 1])
            size = 1
        else:
            size += 1

    return ''.join(result)


def rle_encode_pythonic(s: str) -> str:
    return ''.join([str(len(list(group))) + key for key, group in itertools.groupby(s)])

print(rle_encode('aaaabcccaa'))
print(rle_encode_v2('aaaabcccaa'))
print(rle_encode_pythonic('aaaabcccaa'))
print(rle_encode_pythonic('eeeffffee'))
print()


def rle_decode(s: str) -> str:
    result = []
    count = 0
    for c in s:
        if c.isdigit():
            count = count * 10 + int(c)  # multi-digit counts: shift earlier digits one place left
        else:
            # c is the character being repeated
            result.append(c * count)
            count = 0
    return ''.join(result)

print(rle_decode('3e4f2e'))
print(rle_decode('4a1b3c2a'))
print(rle_decode('4a1b3c12a'))
print(rle_decode('4a1b35c12a'))

4a1b3c2a
4a1b3c2a
4a1b3c2a
3e4f2e

eeeffffee
aaaabcccaa
aaaabcccaaaaaaaaaaaa
aaaabcccccccccccccccccccccccccccccccccccaaaaaaaaaaaa


$O(n)$ time complexity

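### 6.12: Find the First Occurrence of a Substring
A brute-force algorithm, which compares the substring against every starting position in the text, takes $O(nm)$ time for a text of length $n$ and a substring of length $m$, i.e. $O(n^2)$ in the worst case   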


In [23]:
def rabin_karp(t: str, s: str) -> int:
    pass
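Here is a minimal Rabin-Karp sketch that fills in the stub above using a polynomial rolling hash. The base 256 and the prime modulus $10^9 + 7$ are our assumptions. Returning -1 when there is no match follows the convention of `str.find`. Each window's hash is updated in $O(1)$ from the previous one, so the expected time is $O(n + m)$. A hash match is confirmed by a direct comparison, which means collisions cost extra time but never produce a wrong answer.

```python
def rabin_karp(t: str, s: str) -> int:
    """Return the index of the first occurrence of s in t, or -1."""
    if not s:
        return 0                          # match str.find: empty pattern at 0
    m = len(s)
    if m > len(t):
        return -1
    base, mod = 256, 1_000_000_007        # assumed hash parameters
    power = pow(base, m - 1, mod)         # weight of the window's first character

    # hash s and the first window of t
    s_hash = t_hash = 0
    for i in range(m):
        s_hash = (s_hash * base + ord(s[i])) % mod
        t_hash = (t_hash * base + ord(t[i])) % mod

    for i in range(m, len(t) + 1):
        # equal hashes are only a hint; compare characters to rule out a collision
        if t_hash == s_hash and t[i - m:i] == s:
            return i - m
        if i < len(t):
            # roll the window one step: drop t[i - m], append t[i]
            t_hash = ((t_hash - ord(t[i - m]) * power) * base + ord(t[i])) % mod
    return -1


print(rabin_karp('GACGCCA', 'CGC'))   # 2
```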