
## Find Anagram Indices
Given a word w and a string s, find all indices in s that are the starting positions of anagrams of w. For example, given w = 'ab' and s = 'abxaba', return [0, 3, 4].
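Listing every window of length `len(w)` makes the expected answer concrete. It also shows that matching windows may overlap: the windows at 3 ('ab') and 4 ('ba') share the 'b' at index 4. This sketch tests each window with `sorted`, an alternative anagram check chosen here only for illustration. The notebook itself uses `Counter`.

```python
w = 'ab'
s = 'abxaba'

# Every substring of s with the same length as w, paired with its start index
windows = [(i, s[i:i + len(w)]) for i in range(len(s) - len(w) + 1)]
print(windows)  # [(0, 'ab'), (1, 'bx'), (2, 'xa'), (3, 'ab'), (4, 'ba')]

# A window is an anagram of w when its sorted characters are identical
matches = [i for i, window in windows if sorted(window) == sorted(w)]
print(matches)  # [0, 3, 4]
```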


## Using Counter
`Counter` maps each key (here, each character of a string) to the number of times it occurs.
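A `Counter` ignores order but records multiplicities. Two strings are therefore anagrams exactly when their Counters are equal. A small illustration:

```python
from collections import Counter

# Each character is mapped to how many times it occurs
print(Counter('aba'))                   # Counter({'a': 2, 'b': 1})

# Equal Counters: same characters, same multiplicities, order irrelevant
print(Counter('ab') == Counter('ba'))   # True
print(Counter('ab') == Counter('abb'))  # False: 'b' occurs a different number of times
```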


In [None]:
from collections import Counter


In [3]:
w = 'ab'
s = 'abxaba'

In [None]:
# exploring Counter
Counter(s)

Counter({'a': 3, 'b': 2, 'x': 1})

In [None]:
def compare(s1, s2):
  return Counter(s1) == Counter(s2)

def indices(w,s):
  result =[]
  for i in range(len(s)-len(w)+1):
    slider = s[i:i+len(w)] # sliding window along s
    if compare(slider, w):
      result.append(i)
  return result



In [None]:
print(indices(w,s))

[0, 3, 4]


In [None]:
## Solution - brute force, as above
## takes O(|w| x |s|) time: each of the len(s) - len(w) + 1 windows builds a Counter of len(w) characters
from collections import Counter

def is_anagram(s1, s2):
  return Counter(s1) == Counter(s2)

def anagram_indices(word, s):
  result = []
  for i in range(len(s) - len(word) + 1):
    window = s[i:i + len(word)]  # sliding window along s
    if is_anagram(window, word):
      result.append(i)
  return result
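The brute-force bound comes from counting the work per window. With n = len(s) and k = len(word), there are n - k + 1 windows, and building and comparing the Counters for one window touches O(k) characters:

$$
T(n, k) = \sum_{i=0}^{n-k} O(k) = (n - k + 1)\cdot O(k) = O(nk)
$$

For the example, n = 6 and k = 2 give 5 windows of 2 characters each.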

## Using hash tables
First build a frequency dictionary from the target word, then subtract the characters of the initial window. Each entry now holds (count in word) minus (count in window), and zero entries are deleted. Next, slide the window along the string: add back the character that leaves the window and subtract the character that enters it. Whenever the dictionary becomes empty, the window's character counts match the word's exactly, so add the window's starting index to the result.
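The same sliding-window idea can also be sketched with two separate Counters, one for the word and one for the current window. Here the window Counter is updated in place at each step rather than rebuilt. This is a minimal sketch, and the function name `sliding_anagram_indices` is hypothetical. Unlike the `defaultdict` version further down, it compares the two Counters directly. Zero counts are deleted so the comparison also behaves correctly on Python versions before 3.10, where `Counter` equality did not treat missing keys as zero.

```python
from collections import Counter

def sliding_anagram_indices(word, s):
    """Start indices of anagrams of word in s (hypothetical helper name)."""
    k = len(word)
    target = Counter(word)
    window = Counter(s[:k])              # counts for the initial window s[0:k]
    result = [0] if window == target else []
    for i in range(k, len(s)):
        new_char, old_char = s[i], s[i - k]
        window[new_char] += 1            # character entering the window
        window[old_char] -= 1            # character leaving the window
        if window[old_char] == 0:
            del window[old_char]         # keep only non-zero counts
        if window == target:
            result.append(i - k + 1)     # window now covers s[i-k+1 : i+1]
    return result

print(sliding_anagram_indices('ab', 'abxaba'))  # [0, 3, 4]
```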

## Using defaultdict
Unlike a plain `dict`, a `defaultdict` does not raise a `KeyError` when you look up a missing key. Instead, it calls the factory function you pass when creating it, stores the returned value under that key, and returns it. For example:

```
from collections import defaultdict

def def_value():
    return "Not Present"

# now create our dictionary
d = defaultdict(def_value)
```

Another way:

```
d = defaultdict(lambda: "Not Present")
```

Or, most commonly, to default to 0:

```
d = defaultdict(lambda: 0)
# or, equivalently, since int() returns 0
d = defaultdict(int)
```
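One detail matters for the next cell. An entry whose count drops back to 0 is still a key, and merely reading a missing key also inserts it. Either way, the dictionary is no longer empty, so an emptiness test such as `if not freq` only works if zero entries are deleted explicitly. The sketch below uses only the standard library:

```python
from collections import defaultdict

# Counting characters: a missing key starts at int() == 0
freq = defaultdict(int)
for char in 'aba':
    freq[char] += 1
print(dict(freq))          # {'a': 2, 'b': 1}

# A count that returns to 0 is still a key, so the dict is not empty
d = defaultdict(int)
d['x'] += 1
d['x'] -= 1
print(dict(d), bool(d))    # {'x': 0} True

# Reading a missing key inserts it too
print(d['z'], dict(d))     # 0 {'x': 0, 'z': 0}

# Deleting zero entries restores the "empty means balanced" test
for key in [k for k, v in d.items() if v == 0]:
    del d[key]
print(bool(d))             # False
```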

In [4]:
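from collections import defaultdict

def del_if_zero(d, char):
  # Drop entries whose count is 0, so that an empty dict means "no difference"
  if d[char] == 0:
    del d[char]

def anagram_indices(word, s):
  result = []

  # freq[c] = (count of c in word) - (count of c in the current window)
  freq = defaultdict(int)
  for char in word:
    freq[char] += 1

  # Subtract the first window, s[0:len(word)]
  for char in s[:len(word)]:
    freq[char] -= 1
    del_if_zero(freq, char)

  if not freq:  # the first window is an anagram of word
    result.append(0)

  for i in range(len(word), len(s)):
    # start_char leaves the window, end_char enters it
    start_char, end_char = s[i-len(word)], s[i]
    freq[start_char] += 1
    print(i, freq)  # debug output, stepped through below
    del_if_zero(freq, start_char)

    freq[end_char] -= 1
    print(i, freq)
    del_if_zero(freq, end_char)

    if not freq:  # the window s[i-len(word)+1 : i+1] is an anagram
      beginning_index = i - len(word) + 1
      result.append(beginning_index)

  return result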



In [5]:
w = 'ab'
s = 'abxaba'

print(anagram_indices(w,s))

2 defaultdict(<class 'int'>, {'a': 1})
2 defaultdict(<class 'int'>, {'a': 1, 'x': -1})
3 defaultdict(<class 'int'>, {'a': 1, 'x': -1, 'b': 1})
3 defaultdict(<class 'int'>, {'a': 0, 'x': -1, 'b': 1})
4 defaultdict(<class 'int'>, {'x': 0, 'b': 1})
4 defaultdict(<class 'int'>, {'b': 0})
5 defaultdict(<class 'int'>, {'a': 1})
5 defaultdict(<class 'int'>, {'a': 0})
[0, 3, 4]


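## Step through
Each loop iteration prints `i` and `freq` twice: once after adding back the character leaving the window, and once after subtracting the character entering it. The initial window 'ab' cancels the word exactly, so `freq` starts empty and index 0 is recorded before the loop.

i freq

2 defaultdict(<class 'int'>, {'a': 1})\
2 defaultdict(<class 'int'>, {'a': 1, 'x': -1})\
'a' leaves and 'x' enters, so the window 'bx' is not an anagram.

3 defaultdict(<class 'int'>, {'a': 1, 'x': -1, 'b': 1})\
3 defaultdict(<class 'int'>, {'a': 0, 'x': -1, 'b': 1})\
After this print, 'a' (count 0) is deleted.

4 defaultdict(<class 'int'>, {'x': 0, 'b': 1})\
We see 'x' again, now as the character leaving the window. Its count returns to 0, so it is deleted after this print.\
4 defaultdict(<class 'int'>, {'b': 0})\
Now 'b' is deleted and `freq` is empty, so the window 'ab' is an anagram of the word. We append its starting index, 4 - len(word) + 1 = 3, to the result.

5 defaultdict(<class 'int'>, {'a': 1})\
5 defaultdict(<class 'int'>, {'a': 0})\
'a' is deleted, `freq` is empty again, and the window 'ba' gives index 5 - len(word) + 1 = 4.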
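The walkthrough suggests an invariant. After each full step, `freq` holds, for every character with a non-zero difference, the count in `word` minus the count in the current window. The sketch below checks that against a direct computation. The helper names `expected_freq` and `freq_after_each_step` are hypothetical. It uses `Counter.subtract`, which, unlike the `-` operator, keeps zero and negative counts.

```python
from collections import Counter, defaultdict

def expected_freq(word, window):
    """Count in word minus count in window, keeping only non-zero entries."""
    diff = Counter(word)
    diff.subtract(window)              # subtract keeps zero and negative counts
    return {c: n for c, n in diff.items() if n != 0}

def freq_after_each_step(word, s):
    """Replay the sliding-window updates; yield (start_index, freq) per window."""
    k = len(word)
    freq = defaultdict(int)
    for char in word:
        freq[char] += 1
    for char in s[:k]:
        freq[char] -= 1
        if freq[char] == 0:
            del freq[char]
    yield 0, dict(freq)
    for i in range(k, len(s)):
        # first the leaving character (+1), then the entering one (-1)
        for char, delta in ((s[i - k], 1), (s[i], -1)):
            freq[char] += delta
            if freq[char] == 0:
                del freq[char]
        yield i - k + 1, dict(freq)

word, s = 'ab', 'abxaba'
for start, freq in freq_after_each_step(word, s):
    window = s[start:start + len(word)]
    assert freq == expected_freq(word, window)
    print(start, window, freq)
```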

### How does it work when we never compare the window to our word 'ab'?

In [7]:
w = 'ab'
s1 = 'abxcbc'
print(anagram_indices(w,s))

2 defaultdict(<class 'int'>, {'a': 1})
2 defaultdict(<class 'int'>, {'a': 1, 'x': -1})
3 defaultdict(<class 'int'>, {'a': 1, 'x': -1, 'b': 1})
3 defaultdict(<class 'int'>, {'a': 0, 'x': -1, 'b': 1})
4 defaultdict(<class 'int'>, {'x': 0, 'b': 1})
4 defaultdict(<class 'int'>, {'b': 0})
5 defaultdict(<class 'int'>, {'a': 1})
5 defaultdict(<class 'int'>, {'a': 0})
[0, 3, 4]


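At first glance it looks as if we tricked the solution into returning the same result, even though 'abxcbc' contains only one anagram of 'ab' (at index 0). However, the cell defines `s1` and then calls `anagram_indices(w, s)`, so it simply reran the earlier example 'abxaba', and the identical output is expected.
There is no bug in the solution. The window is compared to the word after all, just implicitly: `freq` starts from the counts of `word`, every character in the window is subtracted from it, and zero entries are deleted. So `freq` is empty exactly when the window has the same character counts as the word.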
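A quick way to settle the question is to run the sliding window on 'abxcbc' itself and to cross-check it against the brute-force `Counter` approach on many random strings. The sketch is self-contained, so it redefines both approaches without the debug `print` calls. The names `sliding_indices` and `brute_indices` are hypothetical.

```python
import random
from collections import Counter, defaultdict

def brute_indices(word, s):
    """Compare a fresh Counter of every window with Counter(word)."""
    k = len(word)
    return [i for i in range(len(s) - k + 1) if Counter(s[i:i + k]) == Counter(word)]

def sliding_indices(word, s):
    """Sliding-window version of anagram_indices, without the debug prints."""
    k = len(word)
    freq = defaultdict(int)

    def update(char, delta):
        freq[char] += delta
        if freq[char] == 0:
            del freq[char]       # empty freq <=> window counts equal word counts

    for char in word:
        update(char, 1)
    for char in s[:k]:
        update(char, -1)
    result = [0] if not freq else []
    for i in range(k, len(s)):
        update(s[i - k], 1)      # character leaving the window
        update(s[i], -1)         # character entering the window
        if not freq:
            result.append(i - k + 1)
    return result

print(sliding_indices('ab', 'abxcbc'))  # [0]: only the window 'ab' matches

# Random cross-check on a small alphabet so that anagrams actually occur
random.seed(0)
for _ in range(1000):
    word = ''.join(random.choices('abc', k=random.randint(1, 3)))
    text = ''.join(random.choices('abc', k=random.randint(0, 10)))
    assert sliding_indices(word, text) == brute_indices(word, text), (word, text)
print('all random checks passed')
```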