# STRINGS

## Longest Common Prefix

### a) Word Matching
Compute the LCP of the running prefix with each string in turn; after the last string, the running prefix is the longest common prefix of all the strings.  
Time c. O(MN) where N = number of strings & M = length of the longest string  
Space c. O(M)

In [13]:
# common prefix for 2 strings
def common_prefix(str1, str2): 
  
    _res = '' 
    n1, n2 = len(str1), len(str2)
      
    # Compare str1 and str2 
    i = j = 0
    while i <= n1 - 1 and j <= n2 - 1: 
      
        if (str1[i] != str2[j]):
            break
              
        _res += str1[i] 
        i += 1
        j += 1
  
    return _res 
  
# find longest LCP 
def LCP(arr): 
  
    prefix = arr[0]  
    for i in range (1, len(arr)): 
        prefix = common_prefix(prefix, arr[i]) 
  
    return prefix 
  

arr = ['eurasia', 'euroasian', 'euran', 'europe'] 
res = LCP(arr) 

if res: 
    print('LCP = ', res)
else: 
    print('No LCP') 

LCP =  eur


### b) Character matching

In [1]:
def LCP(arr):      
    
    minlen = len(arr[0])                                       # find length of shortest string
    for i in range(1, len(arr)): 
        if len(arr[i]) < minlen: 
            minlen = len(arr[i])
            
    res = '' 
    for i in range(minlen):                                    # current char must be same in all strings      
        
        current = arr[0][i] 
   
        for j in range(1, len(arr)): 
            if arr[j][i] != current: 
                return res
           
        res = res + current

    return res                                                 # all strings agree on the first minlen chars


arr = ['eurasia', 'euroasian', 'euran', 'europe'] 
res = LCP(arr) 

if res: 
    print('LCP = ', res)
else: 
    print('No LCP') 

LCP =  eur
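
As a cross-check for both variants, the character-by-character idea can also be sketched with `zip`, which stops at the shortest string on its own. The name `lcp_zip` is an assumption made for this sketch; `os.path.commonprefix` from the standard library (it compares character by character) is used only for comparison.

```python
import os.path

def lcp_zip(strings):
    """Longest common prefix, scanning one character position at a time."""
    prefix = []
    for column in zip(*strings):          # zip stops at the shortest string
        if len(set(column)) != 1:         # characters differ at this position
            break
        prefix.append(column[0])
    return ''.join(prefix)

arr = ['eurasia', 'euroasian', 'euran', 'europe']
print(lcp_zip(arr))
print(lcp_zip(arr) == os.path.commonprefix(arr))
```

Unlike the versions above, this sketch returns an empty string for an empty list instead of raising an error.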


## Reverse string
Think of the base case - string length <= 1

In [16]:
def reverse(s):    
    
    if len(s) <= 1:                       # base case
        return s
    
    return reverse(s[1:]) + s[0]          # recursion

In [18]:
print(reverse('hello world'))
print(reverse('123456789'))

dlrow olleh
987654321


## Get all permutations of string
If s='abc' => ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']  
(If a char is repeated, each occurrence is treated as distinct: s='xxx' gives a list of 6 "versions" of 'xxx'.)

* For each char, __set it aside__ and get a list of __all permutations of the remaining string__;
* __Add the char set aside__ to each element of that list, and __append the result to final list__. E.g. set aside 'a' in 'abc', get all permutations of 'bc' = ['bc', 'cb'], then add 'a' to each of them = 'abc' and 'acb', add these to final list.

In [19]:
def permute(s):
    
    out = []
    if len(s) == 1:                                         # base case
        out = [s]
        
    else:        
        for i, char in enumerate(s):                         # for each char in string            
            for perm in permute(s[:i] + s[i+1:]):           # permute string w/out this char                
                out += [char + perm]                         # add removed char and append to output

    return out

In [20]:
permute('abc')

['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
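
A quick way to check the recursive version is to compare it with `itertools.permutations`, which also treats repeated characters as distinct positions. This is only a sketch; the helper name `permute_iter` is an assumption.

```python
from itertools import permutations

def permute_iter(s):
    """All orderings of the characters in s, duplicates kept."""
    return [''.join(p) for p in permutations(s)]

print(permute_iter('abc'))
print(len(permute_iter('xxx')))           # repeated chars still give 3! entries
print(sorted(set(permute_iter('aab'))))   # a set keeps only the distinct strings
```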

## Reverse a string without affecting special characters
Given string with special chars and letters (a-zA-Z), reverse it without affecting special chars. Example: "a,b!c" => "c,b!a"  
Simple approach: copy the letters to a separate array, reverse it, then iterate over the input and insert a reversed letter wherever there is a letter: time c. O(n), space c. O(n).  
A better solution (in place, O(1) extra space):  
* l = 0, r = n-1;
* While l < r:  
    a) If not str[l].isalpha(): l++  
    b) Else if not str[r].isalpha(): r--  
    c) Else swap str[l] and str[r], then l++ and r--
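
For comparison, the simpler copy-based approach mentioned first could look roughly like this. It is a sketch only; the function name `reverse_letters_copy` is an assumption.

```python
def reverse_letters_copy(text):
    """Reverse only the letters of text, using O(n) extra space."""
    letters = [ch for ch in text if ch.isalpha()]     # letters in original order
    out = []
    for ch in text:
        # pop() takes letters from the end, i.e. in reversed order
        out.append(letters.pop() if ch.isalpha() else ch)
    return ''.join(out)

print(reverse_letters_copy("a,b!c"))
```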

In [68]:
# time c. O(n), space c. O(1)
def reverse_string(text):
        
    # initiate left and right indices
    left = 0
    right = len(text) - 1
    
    while left < right:
                
        # find actual letters, skip special chars (never step past the other index)
        while left < right and not text[left].isalpha():
            left += 1
        while left < right and not text[right].isalpha():
            right -= 1
            
        # swap once found, change left and right indices    
        text[left], text[right] = text[right], text[left]
        left += 1
        right -= 1
                
    return ''.join(text)
   
      
input_string = "a!!!b.c.d,e'f,ghi"
print (" Input string: ", input_string) 
print ("Output string: ", reverse_string(list(input_string))) 

 Input string:  a!!!b.c.d,e'f,ghi
Output string:  i!!!h.g.f,e'd,cba


## Check if anagrams
Anagrams share exact same characters (rearranged and ignoring spaces / capitalization)

In [77]:
# not optimal
def anagram_check(s1, s2):
    
    # remove spaces and make lowercase
    s1 = s1.replace(' ','').lower()
    s2 = s2.replace(' ','').lower()
        
    return sorted(s1) == sorted(s2)


# O(N)
def anagram_check2(s1, s2):
    
    # remove spaces and lowercase letters
    s1 = s1.replace(' ','').lower()
    s2 = s2.replace(' ','').lower()
    
    # edge case
    if len(s1) != len(s2):
        return False
    
    # counting dict (or defaultdict())
    count = {}    
    
        
    # iterate over first string (ADD counts)
    for letter in s1:
        if letter in count:
            count[letter] += 1
        else:
            count[letter] = 1
            
    # iterate over second string (SUBTRACT counts)
    for letter in s2:
        if letter in count:
            count[letter] -= 1
        else:
            return False                  # letter never appears in s1
    
    # check if all are 0
    for k in count:
        if count[k] != 0:
            return False
    
    return True

In [87]:
anagrams = [('public relations', 'crap built on lies'), ('dog','god'), ('clint eastwood','old west action'), ('dd','aa')]

print('Using sort-based approach:')
for a in anagrams:
    print('\t{} for "{}"'.format(anagram_check(*a), ' AND '.join(a)))
    
print('\nUsing counting approach:')
for a in anagrams:
    print('\t{} for "{}"'.format(anagram_check2(*a), ' AND '.join(a)))

Using sort-based approach:
	True for "public relations AND crap built on lies"
	True for "dog AND god"
	True for "clint eastwood AND old west action"
	False for "dd AND aa"

Using counting approach:
	True for "public relations AND crap built on lies"
	True for "dog AND god"
	True for "clint eastwood AND old west action"
	False for "dd AND aa"
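
The counting approach can also be sketched with `collections.Counter`, which builds the same character-count dictionary in one call. The names `_normalize` and `anagram_check_counter` are assumptions for this sketch.

```python
from collections import Counter

def _normalize(s):
    """Drop spaces and ignore capitalization."""
    return s.replace(' ', '').lower()

def anagram_check_counter(s1, s2):
    # two strings are anagrams when their character counts are identical
    return Counter(_normalize(s1)) == Counter(_normalize(s2))

print(anagram_check_counter('clint eastwood', 'old west action'))
print(anagram_check_counter('dd', 'aa'))
```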


## Sentence Reversal
Print sentence with the word order reversed

__Correct solution__: loop over text, extract words, push them to __"stack"__, pop them in reverse order
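
A minimal sketch of that stack idea, with an explicit push and pop, might look as follows. The function name `reverse_sent_stack` is an assumption, and only plain spaces are treated as separators.

```python
def reverse_sent_stack(s):
    """Reverse word order by pushing words on a stack and popping them."""
    stack, word = [], []
    for ch in s + ' ':                 # the extra space flushes the last word
        if ch != ' ':
            word.append(ch)            # still inside a word
        elif word:
            stack.append(''.join(word))  # word finished: push it
            word = []

    out = []
    while stack:
        out.append(stack.pop())        # pop returns the words last-in, first-out
    return ' '.join(out)

print(reverse_sent_stack('   Hello John    how are you   '))
```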

In [139]:
# easiest
def reverse_sent1(s):
    return " ".join(reversed(s.split()))

def reverse_sent2(s):
    return " ".join(s.split()[::-1])

# Correct (manual split)
def reverse_sent3(s):
        
    words, spaces = [], [' ']    
    
    i = 0                                                     # index        
    while i < len(s):
        
        if s[i] not in spaces:                                # if not a space            
            start = i                                         # word start            
            while i < len(s) and s[i] not in spaces:                
                i += 1                                        # word end            
            words.append(s[start:i])                          # append word to list        
        i += 1                                                # increase index
      
    return ' '.join(words[::-1])

In [140]:
print(reverse_sent1('   Hello John    how are you   '))
print(reverse_sent2('   Hello John    how are you   '))
print(reverse_sent3('   Hello John    how are you   '))

you are how John Hello
you are how John Hello
you are how John Hello


In [205]:
# easiest expanded
def reverse_words(string):
        
    sent = string.strip().split()                                                 # sent = list of words
    
    left = 0
    right = len(sent) - 1    
    while left < right:
        sent[left], sent[right] = sent[right], sent[left]
        left += 1
        right -= 1    

    return ' '.join(sent)


#if __name__ == "__main__":
string = 'I am who I am and I like pizza'
print(string + '\n' + reverse_words(string))

I am who I am and I like pizza
pizza like I and am I who am I


## String compression
Compress 'AAAABBBBCCCCCDDEEEE' into 'A4B4C5D2E4'
*  Work off of a list of characters, convert it back to string
* Time and space complexity of O(n)
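
As a compact alternative for checking results, the same run-length idea can be sketched with `itertools.groupby`, which yields one group per run of equal characters. The name `compress_groupby` is an assumption.

```python
from itertools import groupby

def compress_groupby(s):
    """Run-length encode s: each run becomes <char><run length>."""
    return ''.join(char + str(len(list(run))) for char, run in groupby(s))

print(compress_groupby('AAAABBBBCCCCCDDEEEE'))
```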

In [154]:
def compress(s):
    """
    Compresses without checking - RunLength Compression algo
    """
    
    if len(s) == 0:   return ''                # edge cases        
    elif len(s) == 1: return s + '1'    
    
    res = ''                                   # compressed output so far
    count = 1                                  # length of the current run
        
    i = 1    
    while i < len(s):        
        
        if s[i] == s[i - 1]:                   # check if same letter            
            count += 1
        else:            
            res = res + s[i - 1] + str(count)  # if not, store previous data
            count = 1        
        i += 1                                 # to terminate while loop    
    
    res = res + s[i - 1] + str(count)          # flush the final run: the loop only writes a run when the next one starts
    
    return res

In [155]:
examples = ['AABBCCC', 'AAAAABBBBCCCCC', 'AAAABBBCCCCCCCDDDDDDD', '']

for example in examples:
    print(compress(example), example)

A2B2C3 AABBCCC
A5B4C5 AAAAABBBBCCCCC
A4B3C7D7 AAAABBBCCCCCCCDDDDDDD
 


## Knuth-Morris-Pratt Algorithm
Find a pattern within a piece of text; time c. = O(n + m)

Principle: whenever we detect a mismatch at position i of the pattern, we already know the characters of the text that matched pattern[0..i-1]. We use this information to avoid re-matching characters that are known to match anyway: the pattern index falls back to pi[i-1] instead of restarting from 0, and the text index never moves backwards.

In other words, if a match which had begun at text[m] fails while comparing text[m + i] to pattern[i] (with i > 0), then the next possible match begins at text[m + (i - pi[i-1])]. Since pi[i-1] < i, this is a higher index than m, and we skip the first pi[i-1] characters of the pattern, which will match anyway.

__Precomputing a table of prefixes__:

* KMP algorithm preprocesses pattern and constructs an auxiliary pi[] of size m (len(pattern)) which is used to skip characters while matching.
* __pi__ is an array of the lengths of the __longest proper prefixes which are also suffixes__ (lpps). Proper prefix: any prefix except the whole string. For string “ABC”, prefixes = “”, “A”, “AB” and “ABC”, BUT proper prefixes = “”, “A” and “AB”. Suffixes = “”, “C”, “BC” and “ABC”.
* We search for lpps in sub-patterns, i.e. we look at substrings of the pattern that are both a prefix and a suffix.
* For each sub-pattern pattern[0..i] where i = [0, m-1], __pi[i] stores the length of the longest proper prefix which is also a suffix of the sub-pattern pattern[0..i]__.

Note: pi[i] could also be defined as longest prefix which is also proper suffix. We need to use "proper" at one place to make sure that the whole substring is not considered
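
To make the definition concrete, the table can be computed directly from it by brute force. This sketch is far slower than the real preprocessing (roughly cubic in the pattern length) and is only meant for checking small cases; the name `prefix_table_naive` is an assumption.

```python
def prefix_table_naive(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    pi = []
    for i in range(len(pattern)):
        sub = pattern[:i + 1]
        best = 0
        for k in range(1, len(sub)):          # k < len(sub) keeps the prefix proper
            if sub[:k] == sub[-k:]:           # prefix of length k equals suffix of length k
                best = k
        pi.append(best)
    return pi

print(prefix_table_naive("aaab"))    # [0, 1, 2, 0]
print(prefix_table_naive("ABABX"))   # [0, 0, 1, 2, 0]
```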

In [151]:
# return all positions of a substring in a larger string
def kmp( pattern='', text='' ):
        
    n = len(text)
    m = len(pattern)
    matches = []
    pi = get_prefixes(pattern)                                  # table of precomputed prefixes
    j = 0                                                       # index for pattern
    for i in range(n):                                          # index for text
        while j > 0 and pattern[j] != text[i]:
            j = pi[j - 1]
        if pattern[j] == text[i]:
            j = j + 1
        if j == m:
            matches.append(i - m + 1)
            j = pi[j-1]

    return matches


def get_prefixes(pattern_):
        
    m = len(pattern_)
    pi = [0] * m
        
    LPP = 0                                                      # longest proper prefix which is also a suffix
    for j in range(1, m):
        while LPP > 0 and pattern_[LPP] != pattern_[j]:
            LPP = pi[LPP - 1]
                        
        if pattern_[LPP] == pattern_[j]:
            LPP = LPP + 1
                        
        pi[j] = LPP
                
    return pi


# if __name__ == '__main__':

# Test 1)
pattern = "abc1abc12"
text1 = "alskfjaldsabc1abc1abc12k23adsfabcabc"
text2 = "alskfjaldsk23adsfabcabc"
print(kmp(pattern, text1))
print(kmp(pattern, text2))

# Test 2)
pattern = "ABABX"
text = "ABABZABABYABABX"
print(kmp(pattern, text))

# Test 3)
pattern = "AAAB"
text = "ABAAAAAB"
print(kmp(pattern, text))

# Test 4)
pattern = "abcdabcy"
text = "abcxabcdabxabcdabcdabcy"
print(kmp(pattern, text))

# Test 5)
pattern = "aaab"
get_prefixes(pattern)

[14]
[]
[10]
[4]
[15]


[0, 1, 2, 0]

## Longest common subsequence (DP)
Subsequence - not necessarily contiguous, but the order of chars is kept. Example: 'abghed' & 'bhaxyz' have lcs 'bh'

In [167]:
def lcs(s1, s2):
        
    # matrix[i][j] = LCS of s1[:i] and s2[:j]; row 0 and column 0 stand for the empty prefix
    matrix = [ ['' for _ in range(len(s2) + 1)] for _ in range(len(s1) + 1) ]
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            if s1[i-1] == s2[j-1]:
                matrix[i][j] = matrix[i-1][j-1] + s1[i-1]                         # extend the diagonal LCS
            else:
                matrix[i][j] = max(matrix[i-1][j], matrix[i][j-1], key=len)      # keep the longer neighbour

    cs = matrix[-1][-1]

    return len(cs), cs

print(lcs("abcdaf", "acbcf"))

X = "AGGTAB"
Y = "GXTXAYB"
print(lcs(X, Y))

(4, 'abcf')
(4, 'GTAB')


## Longest common substring (DP)
Time c. O(nm), space c. O(nm) (space can be reduced to O(n) by keeping only the previous row)
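
The reduced-space variant could be sketched as follows: each row of the table depends only on the row before it, so two rows are enough. The function name `longest_common_substring_rows` is an assumption, and the extra space here is proportional to the length of the second string.

```python
def longest_common_substring_rows(s1, s2):
    """Longest common substring keeping only two rows of the DP table."""
    prev = [0] * (len(s2) + 1)        # row for s1[:i-1]
    length, end = 0, 0                # best length and its end index in s1

    for i in range(1, len(s1) + 1):
        curr = [0] * (len(s2) + 1)    # row for s1[:i]
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                curr[j] = prev[j - 1] + 1     # extend the match ending at (i-1, j-1)
                if curr[j] > length:
                    length, end = curr[j], i
        prev = curr

    return s1[end - length:end]

print(longest_common_substring_rows('I went there', 'No matter what I went there and found it'))
```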

In [168]:
def longest_common_substring(s1, s2):
        
    m = [[0] * (len(s2) + 1) for i in range(len(s1) + 1)]
    length, idx_longest = 0, 0
        
    for i in range(1, 1 + len(s1)):
        for j in range(1, 1 + len(s2)):
                        
            if s1[i - 1] == s2[j - 1]:
                m[i][j] = m[i - 1][j - 1] + 1
                if m[i][j] > length:
                    length = m[i][j]
                    idx_longest = i
            else:
                m[i][j] = 0
                                
    return s1[ idx_longest - length: idx_longest ]


s1 = 'I went there'
s2 = 'No matter what I went there and found it'
longest_common_substring(s1, s2)

'I went there'

## Longest common substring in array of strings

In [6]:
# FIND LONGEST COMMON SUBSTRING
def find_stem(arr): 
  
    reference, res = arr[0], '' 
  
    for i in range(len(reference)):                              # generate all possible substrings of the reference string
        for j in range(i + 1, len(reference) + 1):
            
            stem = reference[i:j] 
            
            # keep the stem if it is longer than the best so far and occurs in every other string
            if len(stem) > len(res) and all(stem in word for word in arr[1:]):
                res = stem 
  
    return res

s1 = 'I went there'
s2 = 'No matter what I went there and found it'
s3 = "That's where I went"
s4 = 'I we'
find_stem([s1, s2, s3, s4])

'I we'

## Unique characters in string?

In [157]:
# built-in data structure & built in function
def uni_char(s):
    return len(set(s)) == len(s)

#  built-in data structure & look-up method
def uni_char2(s):
    chars = set()
    for char in s:
        
        if char in chars:                 # check if in set
            return False
        else:            
            chars.add(char)                # add to set
                        
    return True

In [160]:
examples = ['', 'goo', 'abcdefg', 'aabbcdddeeeeee', 'abcdeghiklmnop']

for example in examples:
    print(uni_char(example))
print()
    
for example in examples:
    print(uni_char2(example))

True
False
True
False
True

True
False
True
False
True


## Hacker Rank

## Find a string

In [34]:
def count_substring(string, sub_string):
    count = 0
    for i in range(len(string)-len(sub_string)+1):
        if string[i:i+len(sub_string)] == sub_string:
            count += 1
    return count

count_substring('abababab', 'ab')

4

## Number of steps to make word palindrome
Minimum number of operations needed to make the string a palindrome  
* Each operation reduces the value of one letter by 1: d can become c, but c cannot become d, and d cannot become b in a single operation.
* Letter a may not be reduced

In [38]:
def minimum_reductions(s):
    n = len(s)
    count = 0
    for i in range(n // 2):
        left = ord(s[i])
        right = ord(s[(n - 1) - i])
        if left != right:
            if left > right:
                count += left - right
            else:
                count += right - left
    return count


s = 'gsaldgk'
print(minimum_reductions(s))

# OR

count=0
for i in range(len(s)//2):
    if s[i]!=s[-i-1]:
        count+=abs(ord(s[i])-ord(s[-i-1]))
print(count)

19
19


## Reduce string
Delete all double letters  
Using stack

In [1]:
def reduce(s):
    
    stack = []
            
    for char in s:                                                       # iterate over remaining part of string
        if not stack:                                                    # stack can be empty if prev double chars were removed
            stack.append(char)
        elif char == stack[-1]:                                          # double; stack[-1] = peek()
            stack.pop()
            continue                                                     # skip this char
        else:
            stack.append(char)                                           # not a double
    return ''.join(stack)


sa = "aaabbcccdddd"
reduce(sa)

'ac'

## Funny string
A string is funny if the absolute differences between the ASCII values of adjacent chars are the same for the string and for its reverse

In [85]:
strings = ['acxz', 'bcxz', 'abba', 'abbat', 'tabbat']
for string in strings:
    reversed_string = string[::-1]
    print('Funny' if all((abs(ord(reversed_string[i])-ord(reversed_string[i-1])) == abs(ord(string[i])-ord(string[i-1]))) \
                          for i in range(1,len(string))) else 'Not Funny')

Funny
Not Funny
Funny
Not Funny
Funny
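
The same check can be sketched as a small function. Reversing the string reverses its list of adjacent differences, so the string is funny exactly when that list reads the same in both directions. The name `is_funny` is an assumption.

```python
def is_funny(s):
    """True if adjacent ASCII differences match between s and its reverse."""
    diffs = [abs(ord(a) - ord(b)) for a, b in zip(s, s[1:])]
    return diffs == diffs[::-1]       # differences of the reversed string

for string in ['acxz', 'bcxz', 'abba', 'abbat', 'tabbat']:
    print('Funny' if is_funny(string) else 'Not Funny')
```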


## Parse domain

In [86]:
def domain_name(url):
    return url.split("//")[-1].split("www.")[-1].split(".")[0]


print(domain_name("http://github.com/SaadBenn"))
print(domain_name("http://www.zombie-bites.com"))
print(domain_name("https://www.cnet.com"))

github
zombie-bites
cnet


## Delete reoccurring characters

In [89]:
# Google warmup interview question: delete any reoccurring character
# time c. O(n)
def delete_reoccurring_characters(string):
        
    seen = set()
    output = ''
        
    for char in string:
        if char not in seen:
            seen.add(char)
            output += char
                        
    return output

delete_reoccurring_characters('aaabbccccddddd')

'abcd'

## First unique char in string

In [2]:
def first_unique(string):
    
    count = dict()
    for char in string:
        if char not in count:
            count[char] = 1
        else:
            count[char] += 1
            
    for char in string:
        if count[char] == 1:
            return char
        
    return -1

first_unique('aabbcccddddde')

'e'

## Common elements in multiple lists

In [99]:
def count_common_elements(all_lists):
        
    all_sets = list(map(set, all_lists))
    common_elements = set.intersection(*all_sets)
        
    return common_elements

a = [1,2,3,4,5]
b = [2,3,4,5,6]
c = [3,4,5,6,7]
count_common_elements([a, b, c])

{3, 4, 5}

## Group anagrams

In [109]:
# check if anagrams: hashing (O(n))
def anagram_check2(s1, s2):
    
    # remove spaces and lowercase letters
    s1 = s1.replace(' ','').lower()
    s2 = s2.replace(' ','').lower()
    
    # edge case
    if len(s1) != len(s2):
        return False
    
    # counting dict (or defaultdict())
    count = {}    
    
        
    # iterate over first string (ADD counts)
    for letter in s1:
        if letter in count:
            count[letter] += 1
        else:
            count[letter] = 1
            
    # iterate over second string (SUBTRACT counts)
    for letter in s2:
        if letter in count:
            count[letter] -= 1
        else:
            return False                  # letter never appears in s1
    
    # check if all are 0
    for k in count:
        if count[k] != 0:
            return False
    
    return True


# group anagrams using sorting (time c. = O(N * K log K) for N strings of max length K)
def group_anagrams(strings):
    
    dict_anagram,  res = {},  []
    
    idx = 0
    for one_string in strings:
        sorted_string = ''.join(sorted(one_string))
        if sorted_string not in dict_anagram:
            dict_anagram[sorted_string] = idx                              # dict value - index in a list of anagrams
            idx += 1
            res.append([])                                                       # create empty list in the end
            res[-1].append(one_string)                                           # add anagram to it
        else:
            res[dict_anagram[sorted_string]].append(one_string)            # find correct list in big list and add
                        
    return res


# group anagrams using anagram_check2: each string is compared with every group key, so time c. = O(N * G * K) for G groups (not linear)
def group_anagrams2(strings):
    
    dict_anagram,  res = {},  []    
    
    idx = 0
    for one_string in strings:
                
        key_found = ''
        for key in dict_anagram:                                                 # linear scan over group keys (replaces the dict lookup by sorted string)
            if anagram_check2(key, one_string):
                key_found = key
                break            
            
        if not key_found:
            dict_anagram[one_string] = idx                                       # dict value - index in a list of anagrams
            idx += 1
            res.append([])                                                       # create empty list in the end
            res[-1].append(one_string)                                           # add anagram to it
        else:
            res[dict_anagram[key_found]].append(one_string)                      # find correct list in big list and add
                        
    return res


strings = ["eat", "tea", "tan", "ate", "nat", "bat"]
group_anagrams(strings)

[['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
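
If sorting each string is to be avoided, one option is to use the letter counts themselves as the dictionary key, which keeps the dictionary lookup and costs O(N * K) overall. This is a sketch under the assumption that the input contains only lowercase letters a-z; the name `group_anagrams_counts` is also an assumption.

```python
from collections import defaultdict

def group_anagrams_counts(strings):
    """Group anagrams using a 26-slot letter-count tuple as the key."""
    groups = defaultdict(list)
    for word in strings:
        counts = [0] * 26
        for ch in word:
            counts[ord(ch) - ord('a')] += 1   # assumes lowercase a-z only
        groups[tuple(counts)].append(word)    # anagrams share the same counts
    return list(groups.values())

print(group_anagrams_counts(["eat", "tea", "tan", "ate", "nat", "bat"]))
```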

## Integer to Roman number

In [112]:
# Input in range 1..3999 (inclusive)
def int_to_roman(num):
    """
    :type num: int
    :rtype: str
    """
    m = ["", "M", "MM", "MMM"]                                                  # thousands
    c = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]            # hundreds
    x = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]            # tens
    i = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]            # ones
    return m[num//1000] + c[(num%1000)//100] + x[(num%100)//10] + i[num%10]

int_to_roman(2020)

'MMXX'

## Is Palindrome?

In [123]:
"""
Is a string a palindrome - ignore non letters and cases
Example: 'A man, a plan, a canal: Panama' = True, 'race a car' = False
Note: ask the interviewer about empty strings. Here an empty string is a valid palindrome
"""
from string import ascii_letters


def remove_punctuation(s):
    
    return "".join(i.lower() for i in s if i in ascii_letters)


# O(n) solution
def is_palindrome_two_pointers(s):
    
    i = 0
    j = len(s)-1
    while i < j:
        while i < j and not s[i].isalnum():
            i += 1
        while i < j and not s[j].isalnum():
            j -= 1
        if s[i].lower() != s[j].lower():
            return False
        i, j = i+1, j-1
                
    return True


# using stack
def is_palindrome_stack(s):
    
    stack = []
    s = remove_punctuation(s)

    for i in range(len(s)//2, len(s)):
        stack.append(s[i])
    for i in range(0, len(s)//2):
        if s[i] != stack.pop():
            return False
                
    return True


a = 'A man, a plan, a canal: Panama'
b = 'race a car'

print(is_palindrome_two_pointers(a))
print(is_palindrome_stack(a))

print()

print(is_palindrome_two_pointers(b))
print(is_palindrome_stack(b))

True
True

False
False


## Check if syllables are rotated

In [125]:
def is_rotated(s1, s2):
    if len(s1) == len(s2):
        return s2 in s1 + s1
    else:
        return False
    

s1 = 'random'
s2 = 'domran'

is_rotated(s1, s2)

True

## Is isogram?

In [126]:
# Isogram = word or phrase w/out repeating letters
def is_isogram(word):
   
    letter_list = []                                                         # empty list to append unique letters
    for letter in word.lower():
        
        if letter.isalpha():                                                 # check letters only
            if letter in letter_list:
                return False
            letter_list.append(letter)
                        
    return True

s1 = 'abcdefg'
s2 = 'abcbcdefg'

print(is_isogram(s1))
print(is_isogram(s2))

True
False


## Hash value of string

In [4]:
def hash_value(string, base):
    """Calculate the hash value of a string using base.

    Example: 'abc' = 97 x base^2 + 98 x base^1 + 99 x base^0
    @param string string to compute hash value for
    @param base base to use to compute hash value
    @return hash value
    """
    hash_value = 0
    power = len(string)-1
        
    for i in range(len(string)):
        hash_value += ord(string[i]) * (base ** power)
        power -= 1

    return hash_value


hash_value('come on',1000)

99111109101032111110
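
The same polynomial can be evaluated without computing powers explicitly by using Horner's rule: multiply the running value by the base and add the next character code. A minimal sketch, with the assumed name `hash_value_horner`:

```python
def hash_value_horner(string, base):
    """Polynomial hash: ord(s[0]) * base^(n-1) + ... + ord(s[n-1]) * base^0."""
    value = 0
    for char in string:
        value = value * base + ord(char)   # shift previous terms up by one power
    return value

print(hash_value_horner('come on', 1000))
```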

## Make sentences with dictionary
For a given string and a dictionary, count the ways the string can be split into a sequence of words that are all in the dictionary.  
Example: "applet", {app, let, apple, t, applet} => 3
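
One way to count these splits without recomputing the same suffix many times is to memoize on the start index. This is a sketch, not the notebook's own solution; the names `count_sentences` and `ways` are assumptions.

```python
from functools import lru_cache

def count_sentences(string, dictionary):
    """Number of ways to split string into a sequence of dictionary words."""
    words = frozenset(dictionary)

    @lru_cache(maxsize=None)
    def ways(start):
        if start == len(string):
            return 1                          # reached the end: one valid split
        return sum(ways(end)
                   for end in range(start + 1, len(string) + 1)
                   if string[start:end] in words)

    return ways(0)

print(count_sentences("applet", {'app', 'let', 'apple', 't', 'applet'}))
```

For "applet" the three splits are "app let", "apple t" and "applet".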

In [185]:
def make_sentence(string, dictionary):
    
    # number of ways to split `string` into words from `dictionary`
    if len(string) == 0:
        return 1                                            # empty remainder: one complete sentence
        
    count = 0
    for i in range(1, len(string) + 1):
        prefix, suffix = string[0:i], string[i:]
        if prefix in dictionary:
            count += make_sentence(suffix, dictionary)      # count splits of the remainder
                                
    return count


string = "applet"
dictionary = {'app', 'let', 'apple', 't', 'applet'}
print(make_sentence(string, dictionary))

string = 'thing'
dictionary = {'thing'}
print(make_sentence(string, dictionary))

3
1


## Is pangram
A __pangram__ or holoalphabetic sentence is a sentence using __every letter of a given alphabet at least once__

In [194]:
# Naive
def is_pangram(string):
        
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    for char in alphabet: 
        if char not in string.lower(): 
            return False
  
    return True


my_string = 'the quick brown fox jumps over the lazy dog'
print(is_pangram(my_string))

True


In [195]:
# Using set and string module
import string

def is_pangram(string): 
    return set(string.lower()) >= alphabet 


alphabet = set(string.ascii_lowercase)
my_string = "The quick brown fox jumps over the lazy dog"
print(is_pangram(my_string))

True


## String = multiple copies of substring?
Given non-empty string - check if it is composed of multiple copies of one substring.

Examples:

Input: "abab"  
Output: True ("ab" twice)

Input: "aba"  
Output: False

Input: "abcabcabcabc"  
Output: True ("abc" four times)

In [208]:
# cool trick!
def repeat_substring(s):   
    return s in (s + s)[1:-1]

print(repeat_substring('abab'))
print(repeat_substring('aba'))
print(repeat_substring('abcabcabcabc'))

True
False
True
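
To see why the trick is plausible, it can be compared with a direct check that tries every substring length dividing the string length. This sketch uses the assumed name `repeat_substring_brute`.

```python
def repeat_substring_brute(s):
    """True if s consists of two or more copies of one substring."""
    n = len(s)
    for size in range(1, n // 2 + 1):                 # candidate substring length
        if n % size == 0 and s[:size] * (n // size) == s:
            return True
    return False

for example in ['abab', 'aba', 'abcabcabcabc']:
    print(repeat_substring_brute(example))
```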


In [None]:
def string_matching_naive(text='', pattern=''):
    """Returns positions where pattern is found in text.

    We slide the string to match 'pattern' over the text

    O((n-m)m)
    Example: text = 'ababbababa', pattern = 'aba'
                     string_matching_naive(t, s) returns [0, 5, 7]
    @param text text to search inside
    @param pattern string to search for
    @return list containing offsets (shifts) where pattern is found inside text
    """

    n = len(text)
    m = len(pattern)
    offsets = []
    for i in range(n-m+1):
        if pattern == text[i:i+m]:
            offsets.append(i)

    return offsets

## TRIES

With a trie, search cost reaches its optimal limit, the key length. Searching for a string in a well-balanced BST costs O(M * log N) (M = max string length, N = number of keys in the tree); a trie search costs O(M). The penalty is space.

Every node has multiple branches, and each branch represents a possible character of a key. The last node of every key is marked as end of word (node field isEndOfWord; `is_leaf` in the code below).

* Insert: every char of the key is a trie node; children = array of pointers to the next-level trie nodes, and the key char is the index into that array. If the key is new or extends an existing key, construct the missing nodes and mark the end of the word. If the key is a prefix of an existing key, just mark the last node of the key as end of word (is_leaf). Key length determines trie depth.

* Search: similar to insert, but only compare chars and move down; the key is present if every char is found and the last node is marked as end of word.

In the picture below, every char is one trie node (trie_node_t). E.g. root’s children a, b and t are filled, all other children of root are NULL. Similarly, “a” at the next level has one child (“n”), all its other children are NULL.

__Quick lookup of words/patterns in a set of words, but high space c.__:  
* Insert and search time c. = O(key length)
* space c. = O(ALPHABET_SIZE * key_length * N) where N = num keys in the trie; impractical for large alphabets, unless space is of no concern
* There are more compact representations of trie nodes (e.g. compressed trie, ternary search tree, etc.) that reduce the memory requirements of a trie
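
The array-of-children layout described above could be sketched like this, assuming keys use only lowercase letters a-z. All names here (`ArrayTrieNode`, `trie_insert`, `trie_search`, `ALPHABET_SIZE`) are assumptions for illustration.

```python
ALPHABET_SIZE = 26                      # assumption: keys use lowercase a-z only

class ArrayTrieNode:
    def __init__(self):
        self.children = [None] * ALPHABET_SIZE   # one slot per possible char
        self.is_end_of_word = False

def _index(char):
    return ord(char) - ord('a')         # key char -> index in children

def trie_insert(root, key):
    node = root
    for char in key:
        i = _index(char)
        if node.children[i] is None:    # construct the missing node
            node.children[i] = ArrayTrieNode()
        node = node.children[i]
    node.is_end_of_word = True          # mark last node of the key

def trie_search(root, key):
    node = root
    for char in key:
        node = node.children[_index(char)]
        if node is None:                # path breaks: key is absent
            return False
    return node.is_end_of_word

root = ArrayTrieNode()
for word in ['the', 'their', 'there', 'answer', 'any', 'by', 'bye']:
    trie_insert(root, word)
print(trie_search(root, 'the'), trie_search(root, 'thei'))
```

Every node allocates ALPHABET_SIZE slots even when most stay empty, which is where the space cost comes from.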

In [None]:
# Example
'''
                       root
                    /   \    \
                    t   a     b
                    |   |     |
                    h   n     y
                    |   |  \  |
                    e   s  y  e
                 /  |   |
                 i  r   w
                 |  |   |
                 r  e   e
                        |
                        r
'''
pass

In [None]:
class TrieNode:
    def __init__(self):
        self.nodes = dict()  # Mapping from char to TrieNode
        self.is_leaf = False

    def insert_many(self, words: [str]):  # noqa: E999 This syntax is Python 3 only
        """
        Inserts a list of words into the Trie
        :param words: list of string words
        :return: None
        """
        for word in words:
            self.insert(word)

    def insert(self, word: str):  # noqa: E999 This syntax is Python 3 only
        """
        Inserts a word into the Trie
        :param word: word to be inserted
        :return: None
        """
        curr = self
        for char in word:
            if char not in curr.nodes:                             # nodes = dict()
                curr.nodes[char] = TrieNode()
            curr = curr.nodes[char]
        curr.is_leaf = True

    def find(self, word: str) -> bool:  # noqa: E999 This syntax is Python 3 only
        """
        Tries to find word in a Trie
        :param word: word to look for
        :return: Returns True if word is found, False otherwise
        """
        curr = self
        for char in word:
            if char not in curr.nodes:
                return False
            curr = curr.nodes[char]
        return curr.is_leaf


def print_words(node: TrieNode, word: str):  # noqa: E999 This syntax is Python 3 only
    """
    Prints all the words in a Trie
    :param node: root node of Trie
    :param word: Word variable should be empty at start
    :return: None
    """
    if node.is_leaf:
        print(word, end=' ')

    for key, value in node.nodes.items():
        print_words(value, word + key)


def test():
    words = ['banana', 'bananas', 'bandana', 'band', 'apple', 'all', 'beast']
    root = TrieNode()
    root.insert_many(words)
    # print_words(root, '')
    assert root.find('banana')
    assert not root.find('bandanas')
    assert not root.find('apps')
    assert root.find('apple')

test()

### Auto-complete feature using Trie
Given trie and a prefix typed in the search query, provide all auto-complete recommendations (trie stores past searches)

Example: {“abc”, “abcd”, “aa”, “abbbaba”}, user types “ab”, output = {“abc”, “abcd”, “abbbaba”}.

Prerequisite: Trie search and insert.

* Search for the given query using the standard trie search algorithm.
* If the query prefix itself is not present, return a status code to indicate this (the code below returns 0).
* If the query is present and is the end of a word in the trie, print the query. This can be checked quickly by seeing whether the last matching node has its end-of-word flag set (`last` in the code below). The trie uses this flag to mark end-of-word nodes for searching.
* If the last matching node of the query has no children, return (the code below returns -1 in this case).
* Otherwise, recursively print all words in the subtree under the last matching node.

__Improvements__  
The number of matches may be too large to display in full, so we have to be selective and show only the relevant results. One measure of relevance is past search history: show only the most frequently searched matching strings.  
To support this, store an extra value at each end-of-word node: the number of hits for that query. For example, if “hat” has been searched 10 times, store 10 at the last node of “hat”. To produce recommendations, display the top k matches with the highest hit counts.
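
The hit-count idea can be sketched as follows. This is a minimal illustration, not part of the implementation below: the names `CountingTrieNode`, `CountingTrie`, `record_search` and `top_k` are hypothetical, and ties between equally popular queries are broken alphabetically as an assumption.

```python
import heapq


class CountingTrieNode:
    def __init__(self):
        self.children = {}  # char -> CountingTrieNode
        self.hits = 0       # number of searches ending here (0 = not a word)


class CountingTrie:
    def __init__(self):
        self.root = CountingTrieNode()

    def record_search(self, query):
        # Insert the query if needed and bump its hit counter.
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, CountingTrieNode())
        node.hits += 1

    def _collect(self, node, word, out):
        # Gather (hits, word) pairs for every word in this subtree.
        if node.hits:
            out.append((node.hits, word))
        for ch, child in node.children.items():
            self._collect(child, word + ch, out)

    def top_k(self, prefix, k):
        # Walk down to the node for the prefix; no match means no suggestions.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        matches = []
        self._collect(node, prefix, matches)
        # Highest hit count first; ties broken alphabetically (an assumption).
        best = heapq.nsmallest(k, matches, key=lambda m: (-m[0], m[1]))
        return [word for _, word in best]


trie = CountingTrie()
for q in ["hat"] * 10 + ["hate"] * 5 + ["ham"] * 3 + ["dog"]:
    trie.record_search(q)
print(trie.top_k("ha", 2))  # ['hat', 'hate']
```

Collecting every match and then selecting the best k keeps the sketch short; for very large subtrees one might instead cache the top results at each node.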

In [30]:
# Python3 program to demonstrate auto-complete  
# feature using Trie data structure. 
# Note: This is a basic implementation of Trie 
# and not the most optimized one. 
class TrieNode(): 
    def __init__(self): 
          
        # Initialising one node for trie 
        self.children = {} 
        self.last = False

class Trie(): 
    def __init__(self): 
          
        # Initialising the trie structure. 
        self.root = TrieNode() 
        self.word_list = [] 
  
    def formTrie(self, keys): 
          
        # Forms a trie structure with the given set of strings 
        # if it does not exist already; otherwise merges each key 
        # into it by extending the structure as required 
        for key in keys: 
            self.insert(key) # inserting one key to the trie. 
  
    def insert(self, key): 
          
        # Inserts a key into the trie if it does not exist already. 
        # If the key is a prefix of an existing path, just  
        # marks its final node as the end of a word. 
        node = self.root 
  
        for a in list(key): 
            if not node.children.get(a): 
                node.children[a] = TrieNode() 
  
            node = node.children[a] 
  
        node.last = True
  
    def search(self, key): 
          
        # Searches the given key in trie for a full match 
        # and returns True on success else returns False. 
        node = self.root 
        found = True
  
        for a in list(key): 
            if not node.children.get(a): 
                found = False
                break
  
            node = node.children[a] 
  
        return found and node.last  # always a bool: full match that ends on a word node
  
    def suggestionsRec(self, node, word): 
          
        # Method to recursively traverse the trie and 
        # collect every complete word in self.word_list. 
        if node.last: 
            self.word_list.append(word) 
  
        for a,n in node.children.items(): 
            self.suggestionsRec(n, word + a) 
  
    def printAutoSuggestions(self, key): 
          
        # Prints all the words in the trie that start with 
        # the given key, i.e. the autocomplete suggestions. 
        # Returns 0 if the key is not a prefix of any word, -1 if 
        # the key is a word with no longer matches, and 1 otherwise. 
        node = self.root 
        not_found = False
        temp_word = '' 
  
        for a in list(key): 
            if not node.children.get(a): 
                not_found = True
                break
  
            temp_word += a 
            node = node.children[a] 
  
        if not_found: 
            return 0
        elif node.last and not node.children: 
            return -1
  
        self.word_list = []  # reset so repeated calls do not accumulate old results
        self.suggestionsRec(node, temp_word) 
  
        for s in self.word_list: 
            print(s) 
        return 1
    
    
keys = ["dog", "cat", "a", "over", "help", "helps", "helping"]                     # past searches
key = "help"                                                                       # key for autocomplete suggestions
status = ["Not found", "Found"] 
  
# create trie object 
t = Trie() 
  
# creating the trie structure with the  
# given set of strings. 
t.formTrie(keys) 
  
# autocompleting the given key using  
# our trie structure. 
comp = t.printAutoSuggestions(key) 
  
if comp == -1: 
    print("No other strings found with this prefix\n") 
elif comp == 0: 
    print("No string found with this prefix\n")

help
helps
helping


### Word Break (Famous Google Interview Question) - Trie Solution
Determine whether the input string can be segmented into a space-separated sequence of dictionary words.

Dict: { i, like, sam, sung, samsung, mobile, ice, cream, icecream, man, go, mango}  
Input string:  ilikesamsung    
Output: Yes  
The string can be segmented as "i like samsung"

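The code below stores the dictionary in a trie and, for every prefix of the string found in it, recurses on the remainder. It extends a DP array-based solution (no Python version) with tries.  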
__replace pCrawl with something more coherent (curr? as in the first Trie example above)__

In [1]:
class Solution(object): 
    def wordBreak(self, s, wordDict): 
        """ 
        Author : @amitrajitbose 
        :type s: str 
        :type wordDict: List[str] 
        :rtype: bool 
        """
        """CREATING THE TRIE CLASS"""
  
        class TrieNode(object): 
              
            def __init__(self): 
                self.children = [] #will be of size = 26 
                self.isLeaf = False
              
            def getNode(self): 
                p = TrieNode() #new trie node 
                p.children = [] 
                for i in range(26): 
                    p.children.append(None) 
                p.isLeaf = False
                return p 
              
            def insert(self, root, key): 
                key = str(key) 
                pCrawl = root 
                for i in key: 
                    index = ord(i)-97
                    if (pCrawl.children[index] == None): 
                        # node has to be initialised 
                        pCrawl.children[index] = self.getNode() 
                    pCrawl = pCrawl.children[index] 
                pCrawl.isLeaf = True #marking end of word 
              
            def search(self, root, key): 
                #print("Searching %s" %key) #DEBUG 
                pCrawl = root 
                for i in key: 
                    index = ord(i)-97
                    if (pCrawl.children[index] == None): 
                        return False
                    pCrawl = pCrawl.children[index] 
                # True only if the key ends on a word node; always returns a bool
                return pCrawl.isLeaf
          
        def checkWordBreak(strr, root):
                        
            n = len(strr) 
            if (n == 0): 
                return True
            for i in range(1,n+1): 
                if (root.search(root, strr[:i]) and checkWordBreak(strr[i:], root)): 
                    return True
            return False
          
        """IMPLEMENT SOLUTION"""
        root = TrieNode().getNode() 
        for w in wordDict: 
            root.insert(root, w) 
        out = checkWordBreak(s, root) 
        if(out): 
            return "Yes"
        else: 
            return "No"

print(Solution().wordBreak("thequickbrownfox", ["the", "quick", "fox", "brown"])) 
print(Solution().wordBreak("bedbathandbeyond", ["bed", "bath", "bedbath", "and", "beyond"])) 
print(Solution().wordBreak("bedbathandbeyond", ["teddy", "bath", "bedbath", "and", "beyond"])) 
print(Solution().wordBreak("bedbathandbeyond", ["bed", "bath", "bedbath", "and", "away"])) 

Yes
Yes
Yes
No
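
The text above mentions a DP array-based solution without giving a Python version. A minimal sketch of one, under stated assumptions, might look like this: the function name `word_break_dp` is hypothetical, the dictionary is held in a plain `set` rather than a trie, and it returns a boolean instead of "Yes"/"No". Unlike the recursive version, it never re-examines the same suffix, so it runs in polynomial time.

```python
def word_break_dp(s, word_dict):
    words = set(word_dict)
    # No dictionary word is longer than this, so longer slices can be skipped.
    max_len = max((len(w) for w in words), default=0)
    n = len(s)

    # can_break[i] is True when s[:i] can be segmented into dictionary words.
    can_break = [False] * (n + 1)
    can_break[0] = True  # the empty prefix is trivially segmentable

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if can_break[start] and s[start:end] in words:
                can_break[end] = True
                break
    return can_break[n]


print(word_break_dp("thequickbrownfox", ["the", "quick", "fox", "brown"]))
print(word_break_dp("bedbathandbeyond", ["bed", "bath", "bedbath", "and", "away"]))
```

The first call prints `True` and the second `False`, matching the "Yes" and "No" results of the trie version for the same inputs.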


## Morse code transformations

In [None]:
"""
International Morse Code defines a standard encoding where each letter is mapped to
a series of dots and dashes, as follows: "a" maps to ".-", "b" maps to "-...", "c"
maps to "-.-.", and so on.

For convenience, the full table for the 26 letters of the English alphabet is given below:
        'a':".-",
        'b':"-...",
        'c':"-.-.",
        'd': "-..",
        'e':".",
        'f':"..-.",
        'g':"--.",
        'h':"....",
        'i':"..",
        'j':".---",
        'k':"-.-",
        'l':".-..",
        'm':"--",
        'n':"-.",
        'o':"---",
        'p':".--.",
        'q':"--.-",
        'r':".-.",
        's':"...",
        't':"-",
        'u':"..-",
        'v':"...-",
        'w':".--",
        'x':"-..-",
        'y':"-.--",
        'z':"--.."

Now, given a list of words, each word can be written as a concatenation of the
Morse code of each letter. For example, "cab" can be written as "-.-..--...",
(which is the concatenation "-.-." + ".-" + "-..."). We'll call such a
concatenation, the transformation of a word.

Return the number of different transformations among all words we have.
Example:
Input: words = ["gin", "zen", "gig", "msg"]
Output: 2
Explanation:
The transformation of each word is:
"gin" -> "--...-."
"zen" -> "--...-."
"gig" -> "--...--."
"msg" -> "--...--."

There are 2 different transformations, "--...-." and "--...--.".
"""

morse_code = {
    'a':".-",
    'b':"-...",
    'c':"-.-.",
    'd': "-..",
    'e':".",
    'f':"..-.",
    'g':"--.",
    'h':"....",
    'i':"..",
    'j':".---",
    'k':"-.-",
    'l':".-..",
    'm':"--",
    'n':"-.",
    'o':"---",
    'p':".--.",
    'q':"--.-",
    'r':".-.",
    's':"...",
    't':"-",
    'u':"..-",
    'v':"...-",
    'w':".--",
    'x':"-..-",
    'y':"-.--",
    'z':"--.."
}
def convert_morse_word(word):
    morse_word = ""
    word = word.lower()
    for char in word:
        morse_word = morse_word + morse_code[char]
    return morse_word

def unique_morse(words):
    # A set keeps one copy of each transformation, so no explicit membership check is needed
    unique_morse_words = set()
    for word in words:
        unique_morse_words.add(convert_morse_word(word))
    return len(unique_morse_words)
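
To check the example from the problem statement, here is a compact self-contained sketch. The names `MORSE`, `to_morse` and `count_unique_morse` are hypothetical, and the sketch assumes the input contains only the letters a to z. It indexes the same 26 codes by letter position instead of using a dictionary.

```python
# Codes for 'a'..'z' in order, copied from the table above.
MORSE = [".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..",
         ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.",
         "...", "-", "..-", "...-", ".--", "-..-", "-.--", "--.."]


def to_morse(word):
    # Concatenate the code of each letter (assumes only a-z / A-Z).
    return "".join(MORSE[ord(ch) - ord("a")] for ch in word.lower())


def count_unique_morse(words):
    # A set comprehension drops duplicate transformations.
    return len({to_morse(w) for w in words})


print(to_morse("cab"))                                   # -.-..--...
print(count_unique_morse(["gin", "zen", "gig", "msg"]))  # 2
```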