In [5]:
import pandas as pd
import os
import numpy as np

Highest ROI activity: LeetCode problems. Study the problems and their solutions, then try to solve them on your own.

- break up by algorithm / data structure type and interview problem

Trying HackerRank

# Tips

## My own tips

What I learned:
- Go slow. Have a good understanding of the problem first.
- My logic is more efficient if I actually write it out in English
- I test with a few cases, also in plain English
- Take a break if stuck
- Review a solution as a last resort
- Be mindful of what kind of pre-processing is already done
    - like string to a list
- Determine if a print or return is needed
- Determine if unique values or all values are a factor

## HR tips

from [here](https://www.hackerrank.com/interview/interview-preparation-kit/tips-and-guidelines/videos)

**Steps for solving algorithms** [video](https://www.youtube.com/watch?v=GKgAVjJxh9w&feature=emb_title)

1. Listen/read problem carefully. When solving, you might not need every detail of the problem, but a detail might be helpful to solve it optimally.
2. Use a good example, one that is big enough and has no special cases.
3. Write a brute force algorithm. Better than nothing at all, checks for understanding, a starting point for optimization.
4. OPTIMIZE. A good chunk of time in interview could be spent here.
5. Walk through your algorithm. Know exactly what you're going to do before coding.
6. Code. Whiteboard: use space wisely. Computer: use consistent coding style, descriptive variable names (then ask if I can abbreviate). Modularize code upfront. Write overall function that wraps smaller functions. Any conceptual chunks can be pushed off into other functions.
7. Test. Don't use big example from step 2 right away. Start by analyzing each line of code. Then start with smaller test cases first, followed by edge cases, and big cases if there's time. Don't panic if there's a bug. Just talk about why that happens.

**3 algorithm strategies** [video](https://www.youtube.com/watch?v=84UYVCluClQ&feature=emb_title)

1. BUD
    - Bottlenecks: How could you improve the slowest step?
    - Unnecessary work: Examine how you can use work already performed (e.g. determine if you can get a missing variable with pieces you already have)
    - Duplicated work: How can you reduce repeated steps?
    
2. Space/Time tradeoffs
    - often it means using a different data structure, **often a hash table**
    - you're using more space, to get improved time complexity
    
3. D.I.Y.
    - think about how you're thinking of solving **a big example problem that's reasonably generic** without code
    - then reverse engineer that into code
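
As a small illustration of the space/time tradeoff above, here is a sketch on a made-up problem (it is not taken from the videos): deciding whether any two numbers in a list add up to a target. The function names `has_pair_brute_force` and `has_pair_with_set` are hypothetical.

```python
def has_pair_brute_force(nums, target):
    """Check every pair: O(n^2) time, O(1) extra space."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return True
    return False


def has_pair_with_set(nums, target):
    """Remember the values seen so far in a hash-based set: O(n) time, O(n) space."""
    seen = set()
    for num in nums:
        # The partner we need is (target - num); a set lookup is O(1) on average
        if target - num in seen:
            return True
        seen.add(num)
    return False


print(has_pair_brute_force([1, 4, 7, 10], 11))  # True
print(has_pair_with_set([1, 4, 7, 10], 11))     # True
print(has_pair_with_set([1, 4, 7, 10], 3))      # False
```

The second version spends extra memory on `seen` so that the inner loop disappears.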

# Sock Merchant (8/5/20)

HackerRank practice
[Sock Merchant](https://www.hackerrank.com/challenges/sock-merchant/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=warmup)


## Using a dictionary

In [2]:
def sockMerchant(n, ar):
    n_pairs = 0
    sock_dict = {}

    for i in range(len(ar)):
        if ar[i] not in sock_dict:
            sock_dict[ar[i]] = 1
        else:
            n_pairs += 1
            # remove dictionary for that value
            del sock_dict[ar[i]]
        print(sock_dict)

    return n_pairs

In [3]:
my_n = 9
my_ar = [10, 20, 20, 10, 10, 30, 50, 10, 20]
sockMerchant(my_n, my_ar)

{10: 1}
{10: 1, 20: 1}
{10: 1}
{}
{10: 1}
{10: 1, 30: 1}
{10: 1, 30: 1, 50: 1}
{30: 1, 50: 1}
{30: 1, 50: 1, 20: 1}


3

## Using a set

Use a set as a way to hold unordered and unindexed unique items.

In [4]:
def sockMerchant_v2(n, ar):
    n_pairs = 0
    sock_set = set()
    for i in range(n):
        if ar[i] not in sock_set:
            sock_set.add(ar[i])
        else:
            n_pairs +=1
            # remove element from the set
            sock_set.remove(ar[i])
        #print(sock_set)

    return n_pairs


In [5]:
my_n = 9
my_ar = [10, 20, 20, 10, 10, 30, 50, 10, 20]
sockMerchant_v2(my_n, my_ar)

3
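
Another way to look at the same problem, sketched with `collections.Counter`: count every color first, then each color contributes `count // 2` pairs. The name `sock_merchant_counter` is hypothetical, and the `n` argument is dropped because the list already knows its length.

```python
from collections import Counter


def sock_merchant_counter(ar):
    # Tally how many socks there are of each color
    color_counts = Counter(ar)
    # Each color contributes floor(count / 2) complete pairs
    return sum(count // 2 for count in color_counts.values())


print(sock_merchant_counter([10, 20, 20, 10, 10, 30, 50, 10, 20]))  # 3
```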

# Fruit subsets problem (8/5/20)

Given a list of fruit, generate all subsets.

Input:
fruit_list = ['apple', 'orange', 'banana']

Output:
[[],
 ['apple'],
 ['orange'],
 ['apple', 'orange'],
 ['banana'],
 ['apple', 'banana'],
 ['orange', 'banana'],
 ['apple', 'orange', 'banana']]

In [6]:
def fruit_subsets(fruit_list):
    output = [[]]
    for fruit in fruit_list:        
        output += [curr + [fruit] for curr in output]
    return output

In [7]:
my_fruit_list = ['apple', 'orange', 'banana']
fruit_subsets(my_fruit_list)

[[],
 ['apple'],
 ['orange'],
 ['apple', 'orange'],
 ['banana'],
 ['apple', 'banana'],
 ['orange', 'banana'],
 ['apple', 'orange', 'banana']]

In [8]:
def fruit_subsets_v1(fruit_list):
    output = [[]]
    for fruit in fruit_list:
        # Loop over a snapshot: appending to `output` while iterating over it never ends
        for curr in list(output):
            output += [curr + [fruit]]
    return output

In [None]:
my_fruit_list = ['apple', 'orange', 'banana']
fruit_subsets_v1(my_fruit_list)

In [31]:
# Lufan's solution: need to use copy

def subset(fruit):
    res = [[]]
    for i in fruit:
        for j in range(len(res)):
            cur = res[j].copy()
            cur.append(i)
            res.append(cur)
    return res


In [23]:
subset(my_fruit_list)

[[],
 ['apple'],
 ['orange'],
 ['apple', 'orange'],
 ['banana'],
 ['apple', 'banana'],
 ['orange', 'banana'],
 ['apple', 'orange', 'banana']]
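
A different way to generate the same subsets, sketched here as an alternative to the list-building versions above: treat each number from 0 to 2^n - 1 as a bitmask, where bit `i` says whether item `i` is included. The name `subsets_bitmask` is hypothetical.

```python
def subsets_bitmask(items):
    n = len(items)
    result = []
    # Every integer in [0, 2^n) encodes one subset
    for mask in range(2 ** n):
        # Keep item i when bit i of the mask is set
        result.append([items[i] for i in range(n) if mask & (1 << i)])
    return result


print(subsets_bitmask(['apple', 'orange', 'banana']))
```

For this input the subsets come out in the same order as in the outputs above, because counting up in binary adds each new item to all earlier subsets.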

# HR Counting Valleys (8/6/20)

[HackerRank](https://www.hackerrank.com/challenges/counting-valleys/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=warmup)
    

Strategy:
- Keep track of the net altitude in `net`, with sea level as zero; `res` counts the valleys.
- When `net` goes negative, be prepared to count a valley (set `net_value_tracker` to True). Otherwise ignore it.
- If `net` goes from negative back to zero, add to the counter and reset `net_value_tracker`.
- He only goes up or down, not lateral.

In [5]:
def countingValleys(n, s):
    res = 0
    net = 0
    net_value_tracker = False
    for step in s:
        if step == 'U':
            net += 1
        else:
            net -= 1
            
        if net_value_tracker and net == 0:
            res += 1
            net_value_tracker = False
            
        if net < 0:
            net_value_tracker = True
        else:
            net_value_tracker = False
        
        print(res, net, net_value_tracker)
        
    return res

In [6]:
my_n = 8
my_s = 'UDDDUDUU'

countingValleys(my_n, my_s)

0 1 False
0 0 False
0 -1 True
0 -2 True
0 -1 True
0 -2 True
0 -1 True
1 0 False


1

In [7]:
my_n = 12
my_s = 'DDUUDDUDUUUD'

countingValleys(my_n, my_s)

0 -1 True
0 -2 True
0 -1 True
1 0 False
1 -1 True
1 -2 True
1 -1 True
1 -2 True
1 -1 True
2 0 False
2 1 False
2 0 False


2
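
The same strategy can be sketched without a separate tracker flag: a valley ends exactly when an up step brings the level back to zero. The name `counting_valleys_simple` is hypothetical, and the unused `n` argument is dropped.

```python
def counting_valleys_simple(steps):
    level = 0
    valleys = 0
    for step in steps:
        level += 1 if step == 'U' else -1
        # Reaching sea level on an up step means we just climbed out of a valley
        if step == 'U' and level == 0:
            valleys += 1
    return valleys


print(counting_valleys_simple('UDDDUDUU'))      # 1
print(counting_valleys_simple('DDUUDDUDUUUD'))  # 2
```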

# HR Jumping on the Clouds (8/7/20)

[HackerRank](https://www.hackerrank.com/challenges/jumping-on-the-clouds/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=warmup)

1. Don't actually iterate, but look ahead and jump accordingly.
2. If one can jump 2 ahead, do that, otherwise, move one ahead.
3. The last cloud is always a zero. You therefore know the answer before you actually reach the last cloud. Last position you have to go to depends on whether second-to-last index is a 0 or 1.

In [77]:
def jumpingOnClouds(c):
    
    # Case 1
    #  0,1,2,3,4
    # [0,0,1,0,0]
    # 3 jumps
    
    # Case 2
    #  0,1,2,3,4
    # [0,0,0,1,0]
    # 2 jumps
    
    # Case 3
    #  0,1,2
    # [0,1,0]
    # 1 jump
    
    el = 0
    jumps = 0
    while el+2 < len(c):
        
        if c[el+2] == 0:
            el += 2
        else:
            el += 1
        jumps += 1
    
    # If there's a zero on second-to-last element, return jumps+1
    if el == len(c)-2:
        return jumps + 1
    
    # Otherwise (e.g. if there's a one on second-to-last element) return jumps
    else:
        return jumps
    

In [78]:
my_c = [0, 0, 1, 0, 0, 1, 0]
jumpingOnClouds(my_c)

4

In [79]:
my_c = [0, 0, 0, 0, 1, 0]
jumpingOnClouds(my_c)

3

In [80]:
my_c = [0, 0, 0, 1, 0, 0]
jumpingOnClouds(my_c)

3

In [81]:
my_c = [0, 1, 0]
jumpingOnClouds(my_c)

1

In [82]:
my_c = [0]
jumpingOnClouds(my_c)

0
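
A slightly more compact sketch of the same greedy idea, folding the final-step special case into the loop: keep jumping until the last index is reached, taking two steps whenever the landing cloud exists and is safe. The name `jumping_on_clouds_greedy` is hypothetical, and it assumes (as the problem does) that the game can always be won.

```python
def jumping_on_clouds_greedy(c):
    position = 0
    jumps = 0
    last = len(c) - 1
    while position < last:
        # Prefer the 2-step jump when it stays in range and lands on a 0
        if position + 2 <= last and c[position + 2] == 0:
            position += 2
        else:
            position += 1
        jumps += 1
    return jumps


print(jumping_on_clouds_greedy([0, 0, 1, 0, 0, 1, 0]))  # 4
print(jumping_on_clouds_greedy([0, 0, 0, 0, 1, 0]))     # 3
print(jumping_on_clouds_greedy([0]))                    # 0
```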

# HR Repeated String

[HackerRank](https://www.hackerrank.com/challenges/repeated-string/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=warmup&h_r=next-challenge&h_v=zen)

Strategy 1:
1. Print out the repeats
2. Count the a's


Strategy 2:
1. Use the number of a's in the original string, division, and modulo to determine the result






Example: 'aba', 10
aba aba aba aba aba aba aba aba
            *
The * marks the 10th character (spaces are only for readability); result is 7

In [89]:
def repeatedString(s, n):
    
    # Count number of a's in original string
    num_a = 0
    for i in s:
        if i == 'a':
            num_a += 1

    # Count the a's in the full copies of s that fit within n characters (floor division)
    n_strings = n // len(s)
    total_As = num_a * n_strings

    # Use modulo to count the a's in the leftover partial copy of s
    rem = n % len(s)
    for i in s[0:rem]:
        if i == 'a':
            total_As += 1

    return total_As

In [90]:
my_s = 'aba'
my_n = 10

repeatedString(my_s, my_n)

7

In [91]:
my_s = 'a'
my_n = 1000000000000

repeatedString(my_s, my_n)

1000000000000
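
Strategy 2 can also be sketched with built-ins: `divmod` gives the number of full copies and the leftover length in one call, and `str.count` replaces the manual loops. The name `repeated_string_count` is hypothetical.

```python
def repeated_string_count(s, n):
    # full_repeats copies of s fit in n characters, with `remainder` characters left over
    full_repeats, remainder = divmod(n, len(s))
    return full_repeats * s.count('a') + s[:remainder].count('a')


print(repeated_string_count('aba', 10))          # 7
print(repeated_string_count('a', 1000000000000)) # 1000000000000
```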

## With regular expression

In [86]:
import re
re.findall('a', my_s)

['a', 'a']

# HR Ransom Note

[HR](https://www.hackerrank.com/challenges/ctci-ransom-note/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=dictionaries-hashmaps)

1. An easy check is to see if the number of words in the magazine is less than in the note, which is an automatic no.
2. If it's equal or greater, then one can iterate through the note and check off that the magazine contains each word.
3. Brute force strategy is to iterate through the magazine again for every word of the note.
4. A more efficient strategy is to pass through the magazine list only once.



In [102]:
def checkMagazine(magazine, note):
    
    mag_words = magazine.split()
    note_words = note.split()
    
    # Check if number of words in magazine is less than note. Automatic no.
    if len(mag_words) < len(note_words):
        return 'No'
    
    el_note = 0 
    while el_note <= len(note_words)-1:
        print(note_words[el_note])
        if note_words[el_note] not in mag_words:
            return 'No'
        el_note += 1
    return 'Yes'
    

In [103]:
my_magazine = 'give me one grand today night'
my_note = 'give one grand today'

checkMagazine(my_magazine, my_note)

give
one
grand
today


'Yes'

In [104]:
my_magazine = 'ive got a lovely bunch of coconuts'
my_note = 'ive got some coconuts'

checkMagazine(my_magazine, my_note)

ive
got
some


'No'

**Time complexity**

`x in s` for a list is O(n) on average, so checking every note word against the magazine list is O(n^2) overall

`x in d` for a dictionary or set is O(1) on average, so the overall algorithm can be O(n)

## Faster solution

In [105]:
def checkMagazine_v1(magazine, note):
    
    mag_list = magazine.split()
    mag_words = set(mag_list)
    note_words = note.split()
    
    # Check if number of words in magazine is less than note. Automatic no.
    # Compare against the full word list, since the set drops duplicate words.
    if len(mag_list) < len(note_words):
        return 'No'
    
    el_note = 0 
    while el_note <= len(note_words)-1:
        print(note_words[el_note])
        if note_words[el_note] not in mag_words:
            return 'No'
        el_note += 1
    return 'Yes'
    

In [107]:
set(my_magazine.split())

{'a', 'bunch', 'coconuts', 'got', 'ive', 'lovely', 'of'}

In [106]:
my_magazine = 'ive got a lovely bunch of coconuts'
my_note = 'ive got some coconuts'

checkMagazine_v1(my_magazine, my_note)

ive
got
some


'No'

## Edit for HackerRank - a split of string to list is already performed


In [121]:
def checkMagazine_v2(magazine, note):
    
    mag_words = set(magazine)
    
    # Check if number of words in magazine is less than note. Automatic no.
#     if len(mag_words) < len(note):
#         print('No')
    
    el_note = 0 
    while el_note <= len(note)-1:
        #print(note[el_note])
        if note[el_note] not in mag_words:
            return print('No')
        else:
            mag_words.remove(note[el_note])
        el_note += 1
    return print('Yes')
    

In [122]:
my_magazine = ('ive got a lovely bunch of coconuts').split()
my_note = ('ive got some coconuts').split()

checkMagazine_v2(my_magazine, my_note)

No


In [123]:
my_magazine = ('two times three is not four').split()
my_note = ('two times two is four').split()

checkMagazine_v2(my_magazine, my_note)

No


**Word can be used only once**

## Edit with HR strategies

**Revised strategy**
- Go through each word of the note.
- Cut the word out of the magazine. (Therefore, can't use a set.)
- Keep repeating.

Time complexity is O(n^2) because:
- iterating through the words of the note is O(n)

and for each of those words, both of these run:
- checking item `in` list is O(n)
- the `.remove` method of a list is O(n)


In [134]:
def checkMagazine_v3(magazine, note):
    
    el_note = 0 
    while el_note <= len(note)-1:
        #print(note[el_note])
        if note[el_note] not in magazine:
            return print('No')
        else:
            magazine.remove(note[el_note])
        el_note += 1
    return print('Yes')
    

In [135]:
my_magazine = ('give me one grand today night').split()
my_note = ('give one grand today').split()

checkMagazine_v3(my_magazine, my_note)

Yes


In [130]:
my_magazine = ('ive got a lovely bunch of coconuts').split()
my_note = ('ive got some coconuts').split()

checkMagazine_v3(my_magazine, my_note)

No


In [131]:
my_magazine = ('two times three is not four').split()
my_note = ('two times two is four').split()

checkMagazine_v3(my_magazine, my_note)

No


In [137]:
my_magazine = ('give me one one grand today night').split()
my_note = ('give one one grand today').split()

checkMagazine_v3(my_magazine, my_note)

Yes


**This passes**

## Edit with HR strategies - optimize with a hash

For each magazine word, create a dictionary with the count of the word. Go through the note and decrement the magazine count for each word used. If a word is missing or its count is already zero, then the note cannot be written.

It is O(n) to create the magazine dictionary and O(n) to iterate through the note. Therefore, overall time complexity is O(n).

I am increasing my space complexity to create the dictionary, but improving my time complexity.

In [143]:
def checkMagazine_v4(magazine, note):
    
    # Create the magazine dictionary
    mag_dict = dict()
    for mag_word in magazine:
        if mag_word not in mag_dict:
            mag_dict[mag_word] = 1
        else:
            mag_dict[mag_word] += 1
    
    # Iterate through the note and check against the magazine dictionary
    
    for note_word in note:
        
        # If the word isn't in the magazine dictionary at all
        if note_word not in mag_dict:
            return print('No')
        
        # If the word was there but the number of instances has been exhausted
        # Bug I had to fix: this has to be its own check after the one above,
        # because looking up a missing key in mag_dict raises a KeyError
        if mag_dict[note_word] == 0:
            return print('No')
        
        # Decrement a word in the dictionary
        else:
            mag_dict[note_word] -= 1
        
    # If it goes through the whole note without executing a 'No',
    # then that means the whole note can be written
    return print('Yes')

In [144]:
my_magazine = ('give me one grand today night').split()
my_note = ('give one grand today').split()

checkMagazine_v4(my_magazine, my_note)

Yes


In [145]:
my_magazine = ('ive got a lovely bunch of coconuts').split()
my_note = ('ive got some coconuts').split()

checkMagazine_v4(my_magazine, my_note)

No


In [146]:
my_magazine = ('two times three is not four').split()
my_note = ('two times two is four').split()

checkMagazine_v4(my_magazine, my_note)

No


In [147]:
my_magazine = ('give me one one grand today night').split()
my_note = ('give one one grand today').split()

checkMagazine_v4(my_magazine, my_note)

Yes


In [150]:
# limit on number of words
my_magazine = ('give me one grand today night').split()
my_note = ('give one one grand today').split()

checkMagazine_v4(my_magazine, my_note)

No
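
The counting idea can be sketched more compactly with `collections.Counter`. Subtracting two counters keeps only positive counts, so anything left over is a word the magazine cannot supply. The name `can_write_note` is hypothetical, and it returns the string instead of printing it (HackerRank expects a print).

```python
from collections import Counter


def can_write_note(magazine, note):
    # Words the note needs beyond what the magazine provides
    missing = Counter(note) - Counter(magazine)
    return 'No' if missing else 'Yes'


print(can_write_note('give me one grand today night'.split(),
                     'give one grand today'.split()))      # Yes
print(can_write_note('give me one grand today night'.split(),
                     'give one one grand today'.split()))  # No
```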


# HR Two Strings

[HR](https://www.hackerrank.com/challenges/two-strings/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=dictionaries-hashmaps&h_r=next-challenge&h_v=zen)

1. Understand the problem in detail. There are multiple pairs to evaluate.
2. Look at example. Understand input and output. The only return value is if there is **a** substring, which means just a common letter. That's it. **I initially thought I'd have to come up with multiple substrings, which would have led me down the wrong path.**
3. Brute force solution is to iterate through each letter in the smaller word, and check it is in the larger word. Return 'YES' once that condition is True. If the iteration makes it through the smaller word without the condition being met, then return 'NO'. This is O(n^2) time complexity.
4. Optimize. Since it only matters that a common letter is identified, only unique values between both strings matter. I can turn each string into a set, which is O(n) but in separate loops. Then iterating through the set of the smaller string is O(n) and checking a set is O(1), so overall time complexity is O(n).

Example input:

2
hello
world
hi
world

In [153]:
for i in set('word'):
    print(i)

d
o
w
r


In [154]:
len(set('word'))

4

In [162]:
def twoStrings(s1, s2):
    
    # Turn each string into a set
    s1_set = set(s1)
    s2_set = set(s2)
    
    # Identify the shorter set
    if len(s1_set) <= len(s2_set):
        short, long = s1_set, s2_set
    else:
        long, short = s1_set, s2_set
        
    # Iterate through shorter set, and check if it's in longer
    for i in short:
        if i in long:
            return 'YES'
        
    # If it goes completely through the shorter set, then there's no common substring
    return 'NO'

In [163]:
my_s1 = 'hello'
my_s2 = 'world'
twoStrings(my_s1, my_s2)

'YES'

In [164]:
my_s1 = 'hi'
my_s2 = 'world'
twoStrings(my_s1, my_s2)

'NO'
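
Since only a shared letter matters, the check can also be sketched with the built-in set method `isdisjoint`, which returns True when two collections have no element in common. The name `two_strings_sets` is hypothetical.

```python
def two_strings_sets(s1, s2):
    # isdisjoint accepts any iterable, so s2 does not need to be converted first
    return 'NO' if set(s1).isdisjoint(s2) else 'YES'


print(two_strings_sets('hello', 'world'))  # YES
print(two_strings_sets('hi', 'world'))     # NO
```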

# HR Sherlock and Anagrams

[HR](https://www.hackerrank.com/challenges/sherlock-and-anagrams/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=dictionaries-hashmaps&h_r=next-challenge&h_v=zen&h_r=next-challenge&h_v=zen)

1. Read/listen in detail. (A detail might be used for optimization). Understand input/output.
2. Have a good example. Something big enough without a special case.
3. Brute force. Good to get something, check for understanding, and then build off for optimization.
4. Optimize.
5. Walk through algorithm. Know exactly what you're doing before coding.
6. Code. Actually make sure code reflects what you want it to do. Modularize if possible.
7. Test. Use small examples first. Make sure you know what each line is doing. Finding a bug is a good opportunity to show off more of what you know!

1. The output is the number of pairs of substrings that are anagrams of each other. Two strings are anagrams if the letters of one can be rearranged to form the other (this is not the same as a palindrome). Repeated letters count.
2. See examples.
3. Brute force is to iterate through the string. Keep track of different lengths and keep searching for a match.
4. Optimize. (DIY method).
    - Look at indexes of repeated letters. Repeated letters are required for an anagram to happen.
    - For each letter, keep track of the indexes.
    - If a letter has more than 1 index, then use that as the basis to check for anagrams.
    - Example: 'ifailuhkqq', `letter_dict = {'i':[0, 3], 'f':[1]}` etc.
    - Example: 'cdcd', `letter_dict = {'c':[0, 2], 'd':[1, 3]}` etc.
    - Anagrams can happen in the indexes between two occurrences of one repeating letter. If there are multiple repeating letters, it scales quickly.
    - Use the dictionary values.
5. Walk through algorithm.
6. Code

In [None]:
# Brute force solution was already getting hairy
# def sherlockAndAnagrams(s):
#     length_str = len(s)
#     length = 1
#     while length < length_str:
#         for i in range(len(s)):
#             ss2check = i[0:length]
#     return res

In [167]:
# Start of the optimized approach: record the indexes of each letter
def sherlockAndAnagrams(s):
    letter_d = dict()
    for i in range(len(s)):
        letter = s[i]
        if s[i] not in letter_d:
            letter_d[letter] = [i]
        else:
            letter_d[letter].append(i)

    
    return letter_d
    #return res

In [168]:
my_s = 'abba'
sherlockAndAnagrams(my_s)

{'a': [0, 3], 'b': [1, 2]}
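
The function above stops at the index dictionary. For comparison, here is one possible complete approach; it is a different technique from the index idea, not a continuation of it. Two substrings are anagrams exactly when their sorted letters match, so each substring is reduced to that sorted "signature" and the pairs within each signature group are counted. The name `count_anagram_pairs` is hypothetical.

```python
from collections import Counter


def count_anagram_pairs(s):
    signatures = Counter()
    # Visit every substring s[start:end]
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            # Anagrams share the same sorted-letter signature
            signatures[''.join(sorted(s[start:end]))] += 1
    # A group of c matching substrings contains c * (c - 1) / 2 pairs
    return sum(c * (c - 1) // 2 for c in signatures.values())


print(count_anagram_pairs('abba'))  # 4
print(count_anagram_pairs('cdcd'))  # 5
```

This visits O(n^2) substrings and sorts each one, which should be acceptable for short strings but is not a linear-time solution.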

In [173]:
(15 % 3 == 0) & (15 % 5 == 0)

True

# HR Strings: Making Anagrams

[HR](https://www.hackerrank.com/challenges/ctci-making-anagrams/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=strings)


1. Listen/read problem carefully. When solving, you might not need every detail of the problem, but a detail might be helpful to solve it optimally.
    - deletions can be in either string
    - not the same length
    - return the minimum number of deletions
    - guaranteed solution?
    - edge cases: already an anagram (0); not possible for an anagram?
2. Use a good example, one that is big enough and has no special cases.
3. Write a brute force algorithm. Better than nothing at all, checks for understanding, a starting point for optimization.
    - loop through string a, and check if it's in string b; then do the same the other way around
    - O(n^2) twice
4. OPTIMIZE. A good chunk of time in interview could be spent here.
    - create dictionaries of both, O(n) time
    - loop through the keys of dict_a, and decrement value of either dict_a or dict_b until it matches
    - keep track of each decrement which represents number of deletions

5. Walk through your algorithm. Know exactly what you're going to do before coding.

6. Code. Whiteboard: use space wisely. Computer: use consistent coding style, descriptive variable names (then ask if I can abbreviate). Modularize code upfront. Write overall function that wraps smaller functions. Any conceptual chunks can be pushed off into other functions.

7. Test. Don't use big example from step 2 right away. Start by analyzing each line of code. Then start with smaller test cases first, followed by edge cases, and big cases if there's time. Don't panic if there's a bug. Just talk about why that happens.

**Go through line by line**

In [1]:
abs(1-2)

1

In [11]:
def makeAnagram(a, b):
    
    # Make dictionaries
    a_dict = {}
    b_dict = {}
    
    for i in a:
        if i not in a_dict:
            a_dict[i] = 1
        else:
            a_dict[i] += 1
            
    for j in b:
        if j not in b_dict:
            b_dict[j] = 1
        else:
            b_dict[j] += 1
    
    # res is the number of deletions
    res = 0
    
    # Loop through the letters counted in a_dict
    for key_a in a_dict:
        
        # Delete the value for that letter that's missing in the b_dict
        if key_a not in b_dict:
            res += a_dict[key_a]
        
        # Take the absolute difference since it takes care of deletions that would happen in both
        else:
            res += abs(a_dict[key_a] - b_dict[key_a])
    
    # We only care about what's missing from the a_dict that's in b_dict here
    for key_b in b_dict:
        if key_b not in a_dict:
            res += b_dict[key_b]
            
    return res        

In [12]:
my_a = 'cde'
my_b = 'abc'

makeAnagram(my_a, my_b)

4

In [13]:
my_a = 'fcrxzwscanmligyxyvym'
my_b = 'jxwtrhvujlmrpdoqbisbwhmgpmeoke'

makeAnagram(my_a, my_b)

30

## Using collections to convert list into dictionary of counts

In [5]:
from collections import Counter
a = [10, 10, 10, 20, 20, 20, 30]
c = Counter(a)
dict(c)

{10: 3, 20: 3, 30: 1}
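
`Counter` also works directly on strings, so the Making Anagrams solution above can be sketched in a few lines. Counter subtraction drops zero and negative counts, which leaves exactly the letters that must be deleted from each side. The name `make_anagram_counter` is hypothetical.

```python
from collections import Counter


def make_anagram_counter(a, b):
    count_a, count_b = Counter(a), Counter(b)
    extra_in_a = count_a - count_b  # letters a has too many of
    extra_in_b = count_b - count_a  # letters b has too many of
    return sum(extra_in_a.values()) + sum(extra_in_b.values())


print(make_anagram_counter('cde', 'abc'))  # 4
```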

# Writing a file

In [16]:
os.getcwd()

'/Users/lacar/Documents/Data_science/Jupyter_notebooks/_my_DS_notes/python_ds_and_algorithms'

In [19]:
os.listdir()

['ds_and_algo_practice_4.ipynb',
 'jenny_20_Useful_Python_Scripts.ipynb',
 'pythonds_6_sorting_and_searching.ipynb',
 'pythonds_123_analysis.ipynb',
 'git_notes.ipynb',
 '.DS_Store',
 'ds_and_algo_practice_2.ipynb',
 'udemy_ds_and_algorithms.ipynb',
 'baseFilename.txt',
 'ds_and_algo_practice_1.ipynb',
 'ds_and_algo_practice_3.ipynb',
 'pythonds_7_trees_and_tree_algorithms.ipynb',
 'pythonds_4_data_structures.ipynb',
 'ds_and_algo_practice_og.ipynb',
 'drchuck_oop_and_classes.ipynb',
 'pythonds_5_recursion.ipynb',
 '.ipynb_checkpoints',
 'pythonds_figs',
 'spaced_study_schedule.ipynb',
 'drchuck_regular_exp.ipynb',
 'python_and_jupyter_setup.ipynb']

## Reading a file

In [32]:
ext

'c'

In [44]:
with open('baseFilename.txt') as fh:
    ext_seen = set()
    for line in fh:
        name = line.rstrip()
        # Extension is the text after the first '.', e.g. 'first.c' -> 'c'
        ext = name.split('.')[1]
        # Adding to a set does nothing when the extension was already seen
        ext_seen.add(ext)

        # Append the line to the file for its extension, e.g. 'c_file00.txt'
        if name.endswith('.' + ext):
            filename = ext + "_file00.txt"
            with open(filename, "a") as f:
                f.write(line)
            

first.c
c
set of  {'c'}
c_file00.txt
first.cpp
cpp
set of  {'c', 'cpp'}
cpp_file00.txt
first.cs
cs
set of  {'c', 'cpp', 'cs'}
cs_file00.txt
second.c
c
c_file00.txt


In [37]:
ext_seen

{'c'}

In [38]:
ext_seen.add('cs')

## Writing a file

In [None]:
# "a" opens the file in append mode; the with block closes it automatically
with open("demofile2.txt", "a") as f:
    f.write("Now the file has more content!")

# QotD 8/17/20

You are given an array of length n + 1 whose elements belong to the set {1, 2, ..., n}. By the pigeonhole principle, there must be a duplicate. Find it in linear time and space.

In [10]:
my_list = [1,2,3,3,4,5]
for i in my_list:
    val = my_list.pop()
    if val == my_list[-1]:
        print(val)
    

3


In [7]:
print(val)

4


In [5]:
my_list

[1, 2, 3, 3, 4]
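
The loop above only finds the duplicate when the equal values sit next to each other, and it shortens the list while iterating. A sketch that meets the linear time and space requirement for any ordering keeps a set of values seen so far. The name `find_duplicate` is hypothetical.

```python
def find_duplicate(nums):
    seen = set()
    for num in nums:
        # Set membership is O(1) on average, so the whole pass is O(n)
        if num in seen:
            return num
        seen.add(num)
    return None  # not reachable for valid input, by the pigeonhole principle


print(find_duplicate([1, 2, 3, 3, 4, 5]))  # 3
print(find_duplicate([5, 1, 4, 2, 1, 3]))  # 1
```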

# LC . Robot Return to Origin

There is a robot starting at position (0, 0), the origin, on a 2D plane. Given a sequence of its moves, judge if this robot ends up at (0, 0) after it completes its moves.

The move sequence is represented by a string, and the character moves[i] represents its ith move. Valid moves are R (right), L (left), U (up), and D (down). If the robot returns to the origin after it finishes all of its moves, return true. Otherwise, return false.

Note: The way that the robot is "facing" is irrelevant. "R" will always make the robot move to the right once, "L" will always make it move left, etc. Also, assume that the magnitude of the robot's movement is the same for each move.

Example 1:

Input: "UD"
Output: true 
Explanation: The robot moves up once, and then down once. All moves have the same magnitude, so it ended up at the origin where it started. Therefore, we return true.

The idea is that every 'U' gets cancelled by a 'D' and same with 'L' and 'R'. Iterate through string. Use a dictionary to keep track of the counts for each letter. The number of 'U' and 'D' should be equal as should the number of 'L' and 'R'.

In [1]:
def judgeCircle(moves):
    
    # Simple case to check. An odd number of moves can 
    # never be back to origin.
    if len(moves) % 2 !=0:
        return False
    
    dir_counts = dict()
    # Initialize dictionary so that all start at 0
    dir_counts['U'] = dir_counts['D'] = dir_counts['L'] = dir_counts['R'] = 0
    
    for i in moves:
        dir_counts[i] += 1
        
    # The comparison is already a boolean, so it can be returned directly
    return (dir_counts['U'] == dir_counts['D']) and (dir_counts['L'] == dir_counts['R'])

In [2]:
judgeCircle('UD')

True

In [3]:
judgeCircle('LL')

False
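
An equivalent sketch that tracks the robot's coordinates instead of counting letters: each move maps to a change in (x, y), and the robot is home when both are zero again. The name `judge_circle_xy` is hypothetical.

```python
def judge_circle_xy(moves):
    # Change in (x, y) for each valid move
    step = {'U': (0, 1), 'D': (0, -1), 'L': (-1, 0), 'R': (1, 0)}
    x = y = 0
    for move in moves:
        dx, dy = step[move]
        x += dx
        y += dy
    return x == 0 and y == 0


print(judge_circle_xy('UD'))  # True
print(judge_circle_xy('LL'))  # False
```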

# Insight QotD (9/1/20)

Determine whether there exists a one-to-one character mapping from one string s1 to another s2.
For example, given s1 = abc and s2 = bcd, return true since we can map a to b, b to c, and c to d.
Given s1 = foo and s2 = bar, return false since the o cannot map to two characters.


In [59]:
def check_mapping(s1, s2):
    if len(s1) != len(s2):
        return False
    
    # Use a dictionary where key is each letter in s1, value is letter in s2
    string_dict = dict()
    
    for i in range(len(s1)):
        if s1[i] in string_dict:
            if string_dict[s1[i]] != s2[i]:
                return False
            else:
                continue
        if s2[i] in string_dict.values():
            return False
        else:
            string_dict[s1[i]] = s2[i]
            
    return True

In [60]:
check_mapping('abc', 'bcd')

True

In [61]:
check_mapping('abc', 'bcd')

True

In [62]:
check_mapping('foo', 'bar')

False

In [63]:
check_mapping('bar', 'foo')

False

In [64]:
check_mapping('bar', 'bar')

True

In [65]:
check_mapping('foo', 'foo')

True
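
In the function above, `s2[i] in string_dict.values()` scans all stored values on every step. A sketch of the same check that keeps the already-used target letters in a set, so each lookup is O(1) on average. The names `is_one_to_one` and `used_targets` are hypothetical.

```python
def is_one_to_one(s1, s2):
    if len(s1) != len(s2):
        return False

    forward = {}          # letter in s1 -> letter in s2
    used_targets = set()  # letters of s2 that are already mapped to

    for c1, c2 in zip(s1, s2):
        if c1 in forward:
            # c1 was mapped before; it must map to the same letter again
            if forward[c1] != c2:
                return False
        elif c2 in used_targets:
            # c2 is already the target of a different letter
            return False
        else:
            forward[c1] = c2
            used_targets.add(c2)
    return True


print(is_one_to_one('abc', 'bcd'))  # True
print(is_one_to_one('foo', 'bar'))  # False
```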

# QotD (9/9/20)

Simulation of dog problem

# InsightMentoring (10/21/20)

In [2]:
my_string = 'CGTAGCTGTGTGTACAAGGCCCGGGAACGTATTCACCGT'
my_k = 5


In [3]:
def sortedSubStr(freq, k):
    dic = {}
    for i in range(len(freq) - k + 1):
        substr = freq[i : i + k]
        if substr in dic:
            dic[substr] += 1
        else:
            dic[substr] = 1
    print(dic)
    res = sorted(dic, key=dic.get, reverse=True)
    return res



In [4]:
sortedSubStr(my_string, my_k)

{'CGTAG': 1, 'GTAGC': 1, 'TAGCT': 1, 'AGCTG': 1, 'GCTGT': 1, 'CTGTG': 1, 'TGTGT': 2, 'GTGTG': 1, 'GTGTA': 1, 'TGTAC': 1, 'GTACA': 1, 'TACAA': 1, 'ACAAG': 1, 'CAAGG': 1, 'AAGGC': 1, 'AGGCC': 1, 'GGCCC': 1, 'GCCCG': 1, 'CCCGG': 1, 'CCGGG': 1, 'CGGGA': 1, 'GGGAA': 1, 'GGAAC': 1, 'GAACG': 1, 'AACGT': 1, 'ACGTA': 1, 'CGTAT': 1, 'GTATT': 1, 'TATTC': 1, 'ATTCA': 1, 'TTCAC': 1, 'TCACC': 1, 'CACCG': 1, 'ACCGT': 1}


['TGTGT',
 'CGTAG',
 'GTAGC',
 'TAGCT',
 'AGCTG',
 'GCTGT',
 'CTGTG',
 'GTGTG',
 'GTGTA',
 'TGTAC',
 'GTACA',
 'TACAA',
 'ACAAG',
 'CAAGG',
 'AAGGC',
 'AGGCC',
 'GGCCC',
 'GCCCG',
 'CCCGG',
 'CCGGG',
 'CGGGA',
 'GGGAA',
 'GGAAC',
 'GAACG',
 'AACGT',
 'ACGTA',
 'CGTAT',
 'GTATT',
 'TATTC',
 'ATTCA',
 'TTCAC',
 'TCACC',
 'CACCG',
 'ACCGT']

In [7]:
def substring_frequency_slicing(read2count, k):
    
    # Initialize dictionary that will keep track of substring frequency
    substring_dict = {}
    
    for i in range(len(read2count) - k + 1):

        # Extract the substring of length k starting at index i
        substr = read2count[i:i+k]

        # If it is not already in the dictionary, add it with a count of 1
        if substr not in substring_dict:
            substring_dict[substr] = 1

        # If it is in the dictionary, increment the count by 1
        else:
            substring_dict[substr] += 1
        
    # List of (substring, count) tuples, most frequent first
    sorted_items = sorted(substring_dict.items(), key=lambda item: item[1], reverse=True)
    return sorted_items

In [8]:
substring_frequency_slicing(my_string, my_k)

[('TGTGT', 2),
 ('CGTAG', 1),
 ('GTAGC', 1),
 ('TAGCT', 1),
 ('AGCTG', 1),
 ('GCTGT', 1),
 ('CTGTG', 1),
 ('GTGTG', 1),
 ('GTGTA', 1),
 ('TGTAC', 1),
 ('GTACA', 1),
 ('TACAA', 1),
 ('ACAAG', 1),
 ('CAAGG', 1),
 ('AAGGC', 1),
 ('AGGCC', 1),
 ('GGCCC', 1),
 ('GCCCG', 1),
 ('CCCGG', 1),
 ('CCGGG', 1),
 ('CGGGA', 1),
 ('GGGAA', 1),
 ('GGAAC', 1),
 ('GAACG', 1),
 ('AACGT', 1),
 ('ACGTA', 1),
 ('CGTAT', 1),
 ('GTATT', 1),
 ('TATTC', 1),
 ('ATTCA', 1),
 ('TTCAC', 1),
 ('TCACC', 1),
 ('CACCG', 1),
 ('ACCGT', 1)]
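
The same counting can probably be written more compactly with `collections.Counter` from the standard library. This is only a sketch of an alternative, not what was used in the session, and `kmer_counts` is a name invented here.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every length-k substring and return (substring, count) pairs."""
    # The generator yields each window of length k; Counter tallies them
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # most_common() sorts by count, highest first
    return counts.most_common()


print(kmer_counts('AAAB', 2))  # [('AA', 2), ('AB', 1)]
```

`Counter` removes the explicit "is the key already there" branch, and `most_common()` replaces the manual `sorted(..., reverse=True)` call.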

# Python `assert`

Used when debugging code: `assert` does nothing if the condition is true and raises an `AssertionError` if it is false.
[link](https://www.w3schools.com/python/ref_keyword_assert.asp)

In [8]:
assert 3 == 3  # condition is true, so nothing happens

assert 3 == 5  # condition is false, so this raises AssertionError
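
An `assert` can also carry a message after a comma, which helps when tracking down which check failed. A small sketch, where `mean` is a hypothetical helper written only for this illustration:

```python
def mean(values):
    # Hypothetical helper: the message after the comma is attached to the
    # AssertionError when the condition is false
    assert len(values) > 0, "values must not be empty"
    return sum(values) / len(values)


print(mean([1, 2, 3]))  # 2.0

try:
    mean([])
except AssertionError as err:
    print("AssertionError:", err)  # AssertionError: values must not be empty
```

Assertions are skipped when Python runs with the `-O` flag, so they suit debugging checks rather than validation of real input.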

# Codility, dominator coding task

From [here](https://app.codility.com/programmers/lessons/8-leader/dominator/)

An array A consisting of N integers is given. The dominator of array A is the value that occurs in more than half of the elements of A.

Write a function that, given an array A consisting of N integers, returns index of any element of array A in which the dominator of A occurs. The function should return −1 if array A does not have a dominator.

For example, given array A = [3, 4, 3, 2, 3, -1, 3, 3], the dominator is 3 because it occurs in 5 of the 8 elements (at indices 0, 2, 4, 6 and 7), and 5 is more than half of 8. The function may return 0, 2, 4, 6 or 7.

My notes:

- O(n) time: Loop through once and count the occurrences of each element. At the end, check whether any count is greater than half the size of the array.
- I'm thinking of using a dictionary, where each unique encountered element is a key, and the value is a list of the indices where that element is found.



In [35]:
def solution(A):
    this_dict = dict()
    for i in range(len(A)):
        if A[i] not in this_dict:
            this_dict[A[i]] = [i]
        else:
            this_dict[A[i]].append(i)
            
    dom = -1
    for j in this_dict:
        if len(this_dict[j]) > len(A)/2:
            # Return the first index stored for the dominator,
            # even though any of its indices would be valid
            dom = this_dict[j][0]
    return dom

In [36]:
my_A = [3, 4, 3, 2, 3, -1, 3, 3]

In [37]:
solution(my_A)

0
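
The dictionary approach uses O(n) extra space. One possible alternative is the Boyer-Moore majority vote, which keeps a single candidate and a counter and then verifies the candidate in a second pass. This is a sketch of that swapped-in technique, not the solution submitted above, and `dominator_index` is a name chosen for this note.

```python
def dominator_index(A):
    """Return an index of the dominator of A, or -1 if there is none."""
    candidate = None
    candidate_index = -1
    count = 0

    # First pass: pair off differing values; a true majority value survives
    for i, value in enumerate(A):
        if count == 0:
            candidate, candidate_index, count = value, i, 1
        elif value == candidate:
            count += 1
        else:
            count -= 1

    # Second pass: the survivor is only a candidate, so confirm it really
    # occurs in more than half of the elements
    if A.count(candidate) > len(A) / 2:
        return candidate_index
    return -1


print(dominator_index([3, 4, 3, 2, 3, -1, 3, 3]))  # 6
print(dominator_index([1, 2, 3]))                  # -1
```

The returned index differs from the dictionary version (6 instead of 0), which is fine because any index holding the dominator is accepted.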

In [3]:
x = [0]

In [5]:
type(x)

list

# Fellowship areas of overlap

In [3]:
jnj = 'Immunology, Gastroenterology, Rheumatology, Immuno-dermatology, IL-23 Pathway, Neuroscience, Neurodegenerative Disorders, Multiple Sclerosis, Glutamatergic Pathway Diseases, Schizophrenia, Mood Disorders, Infectious Diseases and Vaccines, Viral Hepatitis & Adjacent Liver Diseases, Prevention & Treatment of Viral & Bacterial Respiratory Infections, Oncology, Pulmonary Hypertension, Pulmonary Arterial Hypertension (and other types of PH), Idiopathic Pulmonary Fibrosis, Adjuvants, Novel Viral Vectors & Vaccine Technologies, Oncology, Solid Tumor Targeted Therapy, Immuno-Oncology, prostate cancer, hematologic malignancies, Cardiovascular and Thrombosis, Retinal Disease, Gene Therapy, Metabolism/NASH, Renal Disease, Cardiovascular, Osteoarthritis, Obesity, Osteoporosis, Glaucoma, Contact Lens, Surgical, Dry eye, lung cancer, Microbiome, Immunosciences, Predictive biomarkers, Behavioural Neurobiology'
jnj_list = jnj.split(',')
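
Splitting on `','` alone leaves a leading space on every item after the first, and the source text mixes capitalization and repeats some entries. A rough sketch of one way to normalize the items and compute an overlap with a second list follows. `parse_areas`, the shortened sample string, and `other_areas` are all hypothetical placeholders, not real fellowship data.

```python
def parse_areas(text):
    """Split a comma-separated string into a set of cleaned, lowercase items."""
    # strip() removes the stray spaces around each item; lower() makes the
    # comparison case-insensitive; the set drops repeated entries
    return {area.strip().lower() for area in text.split(',') if area.strip()}


# Shortened sample in the same shape as the string above
sample = 'Immunology, Oncology, Gene Therapy, Oncology, lung cancer'
other_areas = {'oncology', 'microbiome'}  # hypothetical second list

overlap = sorted(parse_areas(sample) & other_areas)
print(overlap)  # ['oncology']
```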

In [4]:
jnj_list

['Immunology',
 ' Gastroenterology',
 ' Rheumatology',
 ' Immuno-dermatology',
 ' IL-23 Pathway',
 ' Neuroscience',
 ' Neurodegenerative Disorders',
 ' Multiple Sclerosis',
 ' Glutamatergic Pathway Diseases',
 ' Schizophrenia',
 ' Mood Disorders',
 ' Infectious Diseases and Vaccines',
 ' Viral Hepatitis & Adjacent Liver Diseases',
 ' Prevention & Treatment of Viral & Bacterial Respiratory Infections',
 ' Oncology',
 ' Pulmonary Hypertension',
 ' Pulmonary Arterial Hypertension (and other types of PH)',
 ' Idiopathic Pulmonary Fibrosis',
 ' Adjuvants',
 ' Novel Viral Vectors & Vaccine Technologies',
 ' Oncology',
 ' Solid Tumor Targeted Therapy',
 ' Immuno-Oncology',
 ' prostate cancer',
 ' hematologic malignancies',
 ' Cardiovascular and Thrombosis',
 ' Retinal Disease',
 ' Gene Therapy',
 ' Metabolism/NASH',
 ' Renal Disease',
 ' Cardiovascular',
 ' Osteoarthritis',
 ' Obesity',
 ' Osteoporosis',
 ' Glaucoma',
 ' Contact Lens',
 ' Surgical',
 ' Dry eye',
 ' lung cancer',
 ' Microbiome

# --

## ---