<a href="https://colab.research.google.com/github/bundickm/Warmup_Notebooks/blob/master/Warmup_Strings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

> *Respect the building blocks, master the fundamentals, and the potential is unlimited* -PJ Ladd

Pre-lecture warm-ups are all about setting you up for success - sometimes that means code challenges, sometimes it means discussing interview questions. Today it means working on fundamentals. 

Read through the lecture notebook with a focus on understanding. At the end of the sections are code challenges for you to practice and reinforce what you read about. After lecture ends, a team lead will post a solution notebook (with explanations) for those exercises. 

Should you finish early, there is additional content at the bottom to explore. If you don't complete the notebook within the hour, don't sweat it, this notebook is meant for you to build up your knowledge and skills - you can always continue working on it after lecture or when you complete the assignment.

# Strings

## Basics

In [0]:
company = 'Acme'

We can verify we are dealing with a string using the `type` function. This can be helpful when dealing with items we expect to be a certain data type or when an item looks like it should be another data type (for example, `'5'` does not equal `5`).

In [0]:
type(company)

str

In [0]:
type('5') == type(5)

False

So what is a string? A string is a sequence of characters, and the individual characters can be referenced with the bracket operator and the character's index number. Index numbers start at `0`.

In [0]:
company[2]

'm'

You can count the index number from left to right like above, or you can work backwards with negative index numbers.

In [0]:
company[-1]

'e'

Strings are immutable though, meaning you cannot change the characters in the string.

In [0]:
company[2] = 'p'

TypeError: 'str' object does not support item assignment

Since we can't modify the string, what happens when we do change the string or combine two strings? On the surface it looks like we can alter the string. Under the hood, though, Python is making a new string in a new location. We can verify this by using the `id` function.

In [0]:
demo = 'word'
print("Demo's Location in Memory:", id(demo))

demo += ' 2'
print('Modified Demo:', demo)
print('New location means new string:', id(demo))

Demo's Location in Memory: 140665396700024
Modified Demo: word 2
New location means new string: 140664847193288
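
Because item assignment fails, the usual way to "change" one character is to build a new string out of pieces of the old one. The sketch below uses slicing, which is covered in the next section. `replace_char` is a hypothetical helper name chosen for illustration, not a built-in, and it assumes a non-negative index.

```python
def replace_char(text, index, new_char):
    """Return a copy of text with the character at index swapped for new_char.

    Assumes index is a non-negative position inside text.
    """
    # Slices copy the parts we keep; the original string is never modified.
    return text[:index] + new_char + text[index + 1:]


company = 'Acme'
patched = replace_char(company, 2, 'p')
print(patched)  # Acpe
print(company)  # Acme (the original is unchanged)
```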


Often we need to know how long a string is. This can be done easily with the `len` function.

In [0]:
len(company)

4
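
Length and index numbers are related: since indexing starts at `0`, the last valid index is always one less than the length. A small sketch of that relationship, including what happens when the index goes one step too far:

```python
company = 'Acme'

# The last valid index is len(company) - 1, which is the same character as -1.
last_index = len(company) - 1
print(company[last_index] == company[-1])  # True

# Indexing at len(company) is out of range and raises an IndexError.
try:
    company[len(company)]
except IndexError as error:
    print('IndexError:', error)
```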

We can loop over a string character-by-character as well.

In [0]:
company = 'Acme'
for char in company:
  print(char)

A
c
m
e


We can also combine strings with the `+` operator.

In [0]:
company = company + ' Corporation'
company

'Acme Corporation'
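
One thing to watch for: `+` only combines strings with other strings. A minimal sketch, with the variable names `employees` and `message` made up for this example, showing the error and the `str` conversion that avoids it:

```python
employees = 25

# Adding a string and an integer raises a TypeError.
try:
    message = 'Acme employs ' + employees
except TypeError as error:
    print('TypeError:', error)

# Converting the number to a string first makes the concatenation work.
message = 'Acme employs ' + str(employees) + ' people'
print(message)  # Acme employs 25 people
```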

Or duplicate a string or character quickly with the `*` operator.

In [0]:
'Wololo ' * 5

'Wololo Wololo Wololo Wololo Wololo '

## Slicing

In [0]:
company = 'Acme Corporation'

We have already seen how we can select a single character by referencing its index number (position in the string).

In [0]:
# Negative index numbers allow us to count from the end to the start
company[-1]

'n'

We can select multiple characters with slice notation. Simply add brackets with colons after the string.

```
'some string'[a:b:c]
```

- The number at `a` tells where to start the slice, and the substring will include the character at that index location. If you don't put a value for `a`, it defaults to `0` (or to the last character when the step is negative).
- The number at `b` tells where to end the slice, and the substring will not include the character at that index. If you don't put a value for `b`, it defaults to the end of the string (or to the beginning when the step is negative).
- The number at `c` is the step value. A positive number indicates stepping through the string left to right, a negative number steps in reverse order. The size of the number tells us how far we move on each step: `1` means we visit every character, `2` means we visit every other character, `3` is every third character, etc. If you don't put a value for `c`, it defaults to `1`.

In [0]:
company[::]

'Acme Corporation'

In [0]:
company[:7]

'Acme Co'

In [0]:
company[::2]

'Am oprto'

In [0]:
company[2:-2:1]

'me Corporati'

In [0]:
company[-1:3:-2]

'niaorC'

In [0]:
# Slicing is useful for quite a bit, including easy reversal of a string
company[::-1]

'noitaroproC emcA'
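
As one possible use of the reversal slice, here is a small palindrome check. `is_palindrome` is a hypothetical helper written for this example; it also uses `lower` and `replace`, two string methods from the Python documentation, to ignore case and spaces.

```python
def is_palindrome(text):
    """Return True if text reads the same forwards and backwards."""
    # Ignore case and spaces so 'Taco cat' counts as a palindrome.
    cleaned = text.lower().replace(' ', '')
    # A palindrome is equal to its own reversal.
    return cleaned == cleaned[::-1]


print(is_palindrome('Racecar'))   # True
print(is_palindrome('Taco cat'))  # True
print(is_palindrome('Acme'))      # False
```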

We can also split on spaces or any other character to easily make a list of substrings.

In [0]:
company.split()

['Acme', 'Corporation']

In [0]:
company.split('o')

['Acme C', 'rp', 'rati', 'n']
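
The reverse of `split` is the `join` method, which glues a list of strings back together with a separator of your choice. A short sketch of the round trip:

```python
company = 'Acme Corporation'

# split() breaks the string into a list of words...
words = company.split()

# ...and join() combines a list of strings using the string it is called on.
hyphenated = '-'.join(words)
print(hyphenated)  # Acme-Corporation

# Joining with the same character we split on restores the original string.
pieces = company.split('o')
restored = 'o'.join(pieces)
print(restored == company)  # True
```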

## String Comparison

Relational operators work on strings as well, but be careful because there is some behavior that you might not expect.

In [0]:
# apple is an apple
'apple' == 'apple'

True

String comparisons are case sensitive. Using the `lower` method covered in the next section, we can work around the case sensitivity.

In [0]:
# Apple isn't an apple
'Apple' == 'apple'

False
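
A minimal sketch of that workaround: convert both sides to the same case before comparing. `same_word` is a hypothetical helper name used only for this example.

```python
def same_word(first, second):
    """Compare two strings while ignoring differences in case."""
    return first.lower() == second.lower()


print('Apple' == 'apple')           # False
print(same_word('Apple', 'apple'))  # True
```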

You also might not expect "aardvark" to be less than "banana", since in an alphabetical list "a" is usually ranked higher than "b".

In [0]:
'aardvark' > 'banana'

False

Under the hood, Python compares strings character by character using each character's numeric code point. We can see those numbers with the `ord` function.

In [0]:
# Now we see why 'a' is less than 'b' and how that ordering is done.
print('a:', ord('a'))
print('b:', ord('b'))

a: 97
b: 98
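
The same numbers explain a common surprise: every uppercase letter has a smaller number than every lowercase letter, so capitalized words sort first. The sketch below shows this and one way around it, passing `str.lower` as the `key` argument of the built-in `sorted` function.

```python
# ord('Z') is 90 and ord('a') is 97, so 'Zebra' counts as less than 'apple'.
print('Zebra' < 'apple')  # True

fruits = ['banana', 'Cherry', 'apple']

# Default sorting puts the capitalized word first.
default_order = sorted(fruits)
print(default_order)  # ['Cherry', 'apple', 'banana']

# Sorting on the lowercase version of each word ignores case.
ignore_case_order = sorted(fruits, key=str.lower)
print(ignore_case_order)  # ['apple', 'banana', 'Cherry']
```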


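The `is` operator also works on strings, but it checks identity (whether both sides are the same object in memory), not whether the characters match. The results below come from Python reusing one object for identical literals, which is an implementation detail, and recent Python versions emit a `SyntaxWarning` when `is` is used with a literal. Use `==` to compare string contents.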

In [0]:
'apple' is 'apple'

True

In [0]:
'Apple' is 'apple'

False
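
To see the difference between identity and equality, it helps to compare a literal with an equal string that is built while the program runs. In this sketch the `==` result is guaranteed, while the `is` result depends on the Python implementation, so no expected output is given for it.

```python
first = 'apple'
second = ''.join(['app', 'le'])  # same characters, assembled at run time

# == compares the characters, so this is always True.
print(first == second)

# `is` asks whether both names point at the same object in memory.
# The answer is an implementation detail and should not be relied on.
print(first is second)
```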

One last comparison we can make is using the `in` boolean operator. This takes two strings and returns True if the first one is a substring of the second one.

In [0]:
'nesting doll' in 'russian nesting doll'

True

In [0]:
'apple' in 'banana'

False

You can even use the `not` boolean operator for additional readability.

In [0]:
'apple' not in 'banana'

True

## String Formatting

Often when we have strings, we want to print them to the screen in a particular way or combined with other data types. Python makes this easy with a host of string related [functions](https://docs.python.org/2/library/string.html) as well as [f-string formatting.](https://realpython.com/python-f-strings/)

In [0]:
# Convert other datatypes to string with typecasting
str(5)

'5'

In [0]:
# Change the case of a string easily. 
# Often we will cast a string to all one case to make comparisons easier.
print('HELP'.lower())
print('Me'.upper())

help
ME


In [0]:
# Capitalize the first letter of the first word
temp = 'this is a test of the emergency broadcast system. this is a test.'
print(temp.capitalize())

This is a test of the emergency broadcast system. this is a test.


In [0]:
# Capitalize the first letter of each word
print(temp.title())

This Is A Test Of The Emergency Broadcast System. This Is A Test.


f-string formatting allows us to combine variables and strings in an easy and readable way.

In [0]:
name = 'Johnny Appleseed'
amount = 5
item = 'apple seeds'
print(f'{name} has {amount} {item}.')

Johnny Appleseed has 5 apple seeds.


f-strings are notated with an `f` before the string. Any variables you want to pass to the string are wrapped in curly braces at the location you want that value to appear at.

An f-string can be stored in a variable, but it is evaluated immediately into a normal string. If you later change a variable that was passed to the f-string, the saved string does not change.

In [0]:
name = 'Johnny Appleseed'
amount = 5
item = 'apple seeds'
sentence = f'{name} has {amount} {item}.'
print(sentence)

name = 'Jill'
amount = 0
print(sentence)

Johnny Appleseed has 5 apple seeds.
Johnny Appleseed has 5 apple seeds.
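
If you do want the sentence to pick up new values, one option is to wrap the f-string in a function so it is evaluated again on every call. The sketch below uses a hypothetical function named `describe`, and also shows a format specifier (`:.2f`), which controls how a value is displayed inside the braces.

```python
def describe(name, amount, item):
    """Build the sentence from whatever values are passed in."""
    return f'{name} has {amount} {item}.'


print(describe('Johnny Appleseed', 5, 'apple seeds'))
print(describe('Jill', 0, 'apple seeds'))  # Jill has 0 apple seeds.

# A format specifier after a colon controls presentation,
# here rounding to two decimal places.
price = 3.14159
label = f'{price:.2f}'
print(label)  # 3.14
```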


## Find

There are times when you will want to know the location of a particular character or substring. This can be done with the `find` method. It has 3 parameters, but only the first (the substring to search for) is required. `find` is roughly the opposite of bracket notation (`[]`): brackets take an index and return a character, while `find` takes a substring and returns an index.

In [0]:
# find returns the index of the start of the substring, 
# in this case it is where the 'a' is located.
'example'.find('amp')

2

In [0]:
'example'[2]

'a'

In [0]:
# find returns -1 if it can't find the substring
'example'.find('not here')

-1

What happens, though, when the substring we are searching for occurs multiple times in the string we are searching through? Unfortunately, `find` will return the location of only the first match it finds.

In [0]:
'example'.find('e')

0

In [0]:
'example'[0]

'e'

The other 2 parameters hinted at above are `start` and `end`. They tell where `find` should start and stop its search for the substring.

In [0]:
'example'.find('e', 5)

6

In [0]:
'example'.find('p', 0, 4)

-1
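
The `start` parameter also gives us a way to collect every match instead of only the first: search, record the position, then search again starting one character later. `find_all` is a hypothetical helper written for this sketch (it is not a built-in string method), and it assumes the substring is not empty.

```python
def find_all(text, sub):
    """Return a list with the index of every occurrence of sub in text."""
    positions = []
    start = text.find(sub)
    # find returns -1 once there are no more matches.
    while start != -1:
        positions.append(start)
        # Resume the search one character after the previous match.
        start = text.find(sub, start + 1)
    return positions


print(find_all('example', 'e'))   # [0, 6]
print(find_all('banana', 'ana'))  # [1, 3]
print(find_all('example', 'z'))   # []
```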

## String Practice

### Read the Docs
The documentation for strings can be found [here](https://docs.python.org/3/library/stdtypes.html#string-methods). Use it to complete the following tasks on the string `problem` below.

In [0]:
problem = '  problem 1  \n'

In [0]:
# Remove all leading and trailing whitespace. Correct output is 'problem 1'


In [0]:
# Replace the '1' in `problem` with a '2'. Correct output is '  problem 2  \n'


In [0]:
# How many spaces ' ' are in `problem`?


In [0]:
# Choose another string function and demonstrate it below using `problem`


### Caesar Cipher
A Caesar cipher is a weak form of encryption that "rotates" each character in a string by a given number of places. Rotating means shifting the letter through the alphabet and wrapping around from 'z' to 'a'. For example, if you rotate `m` by 3 you get `p`, and if you rotate `z` by 1 you get `a`.

Write a function that takes a string and an integer. Convert the string to all uppercase characters and then rotate it by the given integer amount.

Example:
```python
caeser('dog', 2)
>>> 'FQI'
```

*Hint: If you are stuck google how to convert a character to a number, and a number to a character.*

# Debugging

So far we have looked at how to respond to bugs as they crop up, but this reactive approach is not ideal. It means there is a high likelihood that we end up with code we say is finished but that has errors lying in wait for the right problem input or an unsuspecting user. How do we take a more proactive approach to bugs? Test cases.

We can write more robust code by purposely thinking about all the ways to cause bugs. We can modify our code to avoid those issues or, at the very least, we can make informed decisions when leaving possible bugs in.

Let's take a look at some common test cases:
1. **Sanity Check** - Testing to make sure everything is working as you expect. These are often the toy test cases you run on the regular.

2. **Edge Cases** - Test really large and really small values. For a number it could be the maximum or minimum value allowed for its data type. For an iterable, like a list or string, try an empty one and a really long one. The idea here is that everything in between should work if the extremes work.

3. **Corner Cases** - A corner case is when multiple edge cases are tested at the same time (just like a corner in real life is where multiple edges meet). Is everything still behaving when all the dials are set to the max? Any weird interactions? Often this is where you see performance impacts, as multiple really big or really small things cause slowdowns you didn't expect.

4. **Incompatible Types** - This is when you send a data type that you shouldn't. While it is a good idea to remove all potential for bugs, we often don't enforce type. This is risky but often safe enough when there is no user interaction. As soon as you rely on user input, you should check for and enforce the correct data type.

By testing our code thoroughly we ensure that it holds up in production when we aren't around to fix it the moment it breaks. Robust code also makes for code that is worth reusing. If you write clean, bug-free code with reusability in mind, you will start to build up a toolbox of useful code snippets that will save you time and that you can slot in with confidence since you know their limits.
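
The four kinds of test cases above can be written down as plain `assert` statements. The sketch below is only an illustration: `repeat_word` is a hypothetical two-parameter function invented for this example, and the type checks inside it are one possible way to enforce types, not the only one.

```python
def repeat_word(word, times):
    """Hypothetical example: return word repeated `times` times, space separated."""
    if not isinstance(word, str):
        raise TypeError('word must be a string')
    if not isinstance(times, int):
        raise TypeError('times must be an integer')
    return ' '.join([word] * times)


# 1. Sanity check: an ordinary, expected input.
assert repeat_word('ha', 3) == 'ha ha ha'

# 2. Edge cases: one parameter at an extreme (empty string, zero repeats).
assert repeat_word('', 3) == '  '
assert repeat_word('ha', 0) == ''

# 3. Corner case: both parameters at an extreme at the same time.
assert repeat_word('', 0) == ''

# 4. Incompatible types: the function should refuse them with a clear error.
for bad_args in [(5, 3), ('ha', '3')]:
    try:
        repeat_word(*bad_args)
    except TypeError as error:
        print('Rejected', bad_args, '->', error)
    else:
        raise AssertionError(f'{bad_args} should have been rejected')
```

An `assert` stays silent when the check passes and raises an `AssertionError` when it fails, so a cell like this doubles as a quick regression check whenever the function changes.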

## Debug Practice

Below is a function to count unique words in a string. Use the function to practice each of the test case types above.

In [0]:
import string

def count_unique_words(text):
    # Strip punctuation and remove capitalization
    text = text.translate(str.maketrans('', '', string.punctuation)).lower()

    return len(set(text.split()))


### Sanity Check

Give inputs that you would expect to be common.

In [2]:
# Example

count_unique_words('This is a test of a moderately short string.')

8
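
To see where the `8` comes from, the steps inside the function can be run one at a time. This sketch repeats the same calls the function makes; the intermediate variable names are chosen here for readability.

```python
import string

text = 'This is a test of a moderately short string.'

# Build a translation table that deletes every punctuation character.
remove_punctuation = str.maketrans('', '', string.punctuation)

# Strip the punctuation, then lowercase everything.
cleaned = text.translate(remove_punctuation).lower()
print(cleaned)  # this is a test of a moderately short string

# Split into words; 'a' appears twice in the list...
words = cleaned.split()
print(len(words))  # 9

# ...but a set keeps only one copy of each word.
unique_words = set(words)
print(len(unique_words))  # 8
```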

In [0]:
# Sanity Check 1



In [0]:
# Sanity Check 2



### Edge Cases

Try cases at the extremes of what is expected.

In [8]:
# Example
long_string = 'a couple of test words ' * 99999
count_unique_words(long_string)

5

In [0]:
# Edge Case 1



In [0]:
# Edge Case 2



### Corner Cases

This function has only one parameter, so it has no corner cases to test. Write a function with multiple parameters and then test a corner case.

In [0]:
# Your function here



In [0]:
# Your corner case test here



### Incompatible Types

The given function expects a string. Pass it a data type it doesn't expect, then rewrite the function to prevent it from crashing when passed a non-string data type and test the rewritten function.

In [0]:
# Incompatible Type Test



In [0]:
# Rewrite the function here



In [0]:
# Incompatible Type Test on the corrected function



# Further Exploration

### Regex

Regular expressions are a mini-language within Python and other languages that allows for powerful and compact manipulation of strings. They are a deep topic that takes time to master, but they are extremely useful. [Here](https://www.datacamp.com/community/tutorials/python-regular-expression-tutorial) is a tutorial to walk you through the basics of regex, and below is a tutorial video as well.
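
As a small taste before working through the tutorial, the sketch below uses two functions from Python's built-in `re` module. The sample sentences are made up for this example.

```python
import re

text = 'Order 66 shipped 3 crates on day 12.'

# \d+ matches one or more digits in a row; findall returns every match.
numbers = re.findall(r'\d+', text)
print(numbers)  # ['66', '3', '12']

# \s+ matches a run of whitespace; sub replaces each run with a single space.
tidy = re.sub(r'\s+', ' ', 'too    many   spaces')
print(tidy)  # too many spaces
```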

## Additional Resources

[Tutorial Video on Strings](https://www.youtube.com/watch?v=k9TUPpGqYTo)

[Tutorial Video on String Formatting](https://www.youtube.com/watch?v=nghuHvKLhJA)

[Tutorial Video on Regular Expression](https://www.youtube.com/watch?v=K8L6KVGG-7o)

[Regex Tutorial](https://www.datacamp.com/community/tutorials/python-regular-expression-tutorial)