# Strings
Python has an absurd amount of power when dealing with strings.
### What are strings?
Strings are sequences of characters.
### So what are characters?
This brings up one of the primary divisions between Python 2 and 3. Python 3 handles characters, and thus strings, more robustly by making every string Unicode by default; Unicode is an industrial-strength standard for representing characters. In Python 2, plain strings are sequences of bytes. To declare a string, simply put characters between quotes.

In [1]:
# Define a string
my_string = 'my own personal string'

In [2]:
# Define a string with quotes inside. Escape the quote with \
my_string_w_quotes = 'asdfsa\'asdf'
my_string_w_quotes

"asdfsa'asdf"

In [3]:
# both single and double quotes work with python
# a more elegant way to handle quotes inside quotes is to use both double/single quotes
my_string_w_quotes = "my own personal string with an inner quote. it's grand!"
my_string_w_quotes

"my own personal string with an inner quote. it's grand!"

In [4]:
# If you have a bizarre string with both double and single quotes you can go one level up and use triple quotes
my_string_w_2_quote_types = '''My friend said, "I'm only a mediocre pythonista". I got mad! '''
my_string_w_2_quote_types

'My friend said, "I\'m only a mediocre pythonista". I got mad! '

In [5]:
# The interpreter outputted an escaped '
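Returning to the distinction between characters and bytes described at the start of this section, the following is a minimal sketch for Python 3. The sample word and the choice of UTF-8 are assumptions made for illustration only.

```python
# A Python 3 str holds Unicode characters; '\u00e9' is the letter e with an acute accent
text = 'caf\u00e9'

# Encoding turns the characters into raw bytes (UTF-8 is assumed here)
raw = text.encode('utf-8')

print(type(text).__name__, len(text))  # str 4
print(type(raw).__name__, len(raw))    # bytes 5 (the accented letter needs two bytes in UTF-8)

# Decoding with the same encoding recovers the original text
round_trip = raw.decode('utf-8')
print(round_trip == text)  # True
```

The lengths differ because `len` counts characters for a `str` and bytes for a `bytes` object.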

### Print as a function
One other notable change between Python 2 and 3 is that `print` is now a function and not a statement. In the last example the output has an escaped character. To view the string in a prettier format, with the outer quotes and escape characters omitted, use the print function.

In [6]:
print(my_string_w_2_quote_types)

My friend said, "I'm only a mediocre pythonista". I got mad! 


In [7]:
# Triple quotes can be used as block quotes
block_quote = """
This is
my 
block
quote
y'all
"""

In [8]:
# ugly with newline \n char
block_quote

"\nThis is\nmy \nblock\nquote\ny'all\n"

In [9]:
# pretty
print(block_quote)


This is
my 
block
quote
y'all



### The many string methods
The `str` class has many methods. Methods are called with dot notation: `object.method(arguments)`.

In [10]:
# Methods are called with object.method(arguments) notation
# test many methods on string
test_string = 'this is a TesT string.'

In [11]:
# Capitalize first letter
test_string.capitalize()

'This is a test string.'

In [12]:
#make lowercase
test_string.lower()

'this is a test string.'

In [13]:
# count occurrences of a substring
test_string.count('is')

2

In [14]:
# are all characters alphanumeric?
test_string.isalnum()

False

In [15]:
# split a string by a given separator. The default is any run of whitespace. Returns a list
test_string.split()

['this', 'is', 'a', 'TesT', 'string.']
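As a small sketch of how `split` behaves with an explicit separator, and how `join` (which appears in the method list further down) reverses it, consider the following. The sample strings are made up for illustration.

```python
# Split on an explicit separator instead of whitespace
csv_line = 'alpha,beta,gamma'
fields = csv_line.split(',')
print(fields)  # ['alpha', 'beta', 'gamma']

# join is the inverse operation: it is called on the separator string
rejoined = ' | '.join(fields)
print(rejoined)  # alpha | beta | gamma

# With no argument, split() treats any run of whitespace as one separator
spaced = 'too    many   spaces'
print(spaced.split())  # ['too', 'many', 'spaces']

# With an explicit single space, consecutive spaces produce empty strings
print(spaced.split(' '))
```

Note that `join` is a method of the separator, not of the list, which surprises many newcomers.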

### So how many string methods are there?
There are far too many methods to remember for all the Python objects that you will encounter. To list all the methods of an object, use the `dir` function.

In [16]:
# use print to make the output shorter and wider and not one long list
print(dir(test_string))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


### What are all those methods with underscores?
Roughly the first half of the names above begin and end with two underscores. These are special built-in methods that any Python object may (but does not have to) implement. For instance, the presence of `__add__` tells the programmer that the plus sign (+) operator works with this object.
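To make the link between operators and special methods concrete, here is a minimal sketch. The variable `word` is a made-up example; calling the double-underscore methods directly is done only for demonstration and is not how you would normally write code.

```python
word = 'python'

# Built-in functions and operators delegate to the special methods
print(len(word) == word.__len__())                # True
print(('th' in word) == word.__contains__('th'))  # True
print(word + '!' == word.__add__('!'))            # True

# hasattr reports whether an object has a given special method
print(hasattr(word, '__add__'), hasattr(word, '__sub__'))  # True False
```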

### Advanced: How can you view all the normal string methods?
We will have to print out only those methods without the double underscores (pronounced 'dunder' in the python world)

In [17]:
# print out only the normal methods. This will be explained later if it doesn't make sense now
print([method for method in dir(test_string) if method[0] != '_'])

['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


### How do I find out what these methods do?
Use the `help` function to see the documentation. Let's find out what the `endswith` method does.

In [18]:
# get documentation for endswith. You can also just add a ? to the method to do the same thing
help(test_string.endswith)

Help on built-in function endswith:

endswith(...) method of builtins.str instance
    S.endswith(suffix[, start[, end]]) -> bool
    
    Return True if S ends with the specified suffix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    suffix can also be a tuple of strings to try.



In [19]:
# endswith has one required argument, the suffix, and returns True if the string ends with that suffix, otherwise False. The check is case-sensitive
test_string = 'yet another TEST StrinG'
test_string.endswith('string')

False
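The help text above mentions a few options that the single example does not exercise. The sketch below tries them on the same string; the particular suffixes and positions are chosen only for illustration.

```python
test_string = 'yet another TEST StrinG'

# The comparison is case-sensitive, so the exact case matches
print(test_string.endswith('StrinG'))          # True

# Lower-casing first gives a case-insensitive check
print(test_string.lower().endswith('string'))  # True

# A tuple of suffixes matches if any one of them matches
print(test_string.endswith(('ing', 'inG')))    # True

# Optional start and end restrict the test to test_string[0:16]
print(test_string.endswith('TEST', 0, 16))     # True
```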

### Problem 4
<span style="color:green">Find and use the string method that will replace each letter 'a' with 'A'. </span>

In [20]:
replace_string = 'replace each letter a in this string with A'
# your code here

### Chaining methods
If there are many string methods that you would like to use in succession, you can continue to use the dot notation to continue doing string operations since many string methods return strings

In [21]:
test_string = '?!?!?!A HIDDEN TEST STRING??!!?!'
# Let's strip the leading and trailing punctuation, make lower case, replace all t's with a and count the instances of e
test_string.strip('?!').lower().replace('t', 'a').count('e')
## Wow that was 4 chained methods ??!!

2

In [22]:
# In other languages (like JavaScript), chaining methods is very common and is usually written
# more clearly with one method per line.
# You can do this in Python by wrapping your expression in parentheses.
# Inside parentheses the interpreter ignores line breaks; otherwise each line break must be escaped with \
(test_string
    .strip('?!')
    .lower()
    .replace('t', 'a')
    .count('e'))

2

In [23]:
#  This also works but is ugly
#  Unlike many other languages (which ignore line breaks), a new line in python signals a new statement.
test_string \
    .strip('?!') \
    .lower() \
    .replace('t', 'a') \
    .count('e')

2

In [24]:
# We can print out each step of the expression as it's evaluated
foo = test_string.strip('?!')
print(foo)

foo = foo.lower()
print(foo)

foo = foo.replace('t', 'a')
print(foo)

foo = foo.count('e')
print(foo)

A HIDDEN TEST STRING
a hidden test string
a hidden aesa saring
2


### Problem 5
<span style="color:green">Strip each letter 'a' from the right side, switch the case of each letter (i.e. from lower to upper and from upper to lower) and find the position (aka index) of the first letter 'o'.</span> Use a combination of the methods above. You can check the [string method documentation](https://docs.python.org/3.5/library/stdtypes.html#string-methods) for more.

In [25]:
test_string = 'aaaa TOO many aaaaaaaaa'
# your code here

### remember that `__add__` special method
Since we saw `__add__`, we know that the + operator will do something with strings. Using the plus sign is how you concatenate strings in python

In [26]:
'abcde' + 'fghijk' + 'lmnop'

'abcdefghijklmnop'

### What happens if you subtract strings?
Since the `__sub__` special method was not in the string method list, this will yield an error

In [27]:
'asdfa' - 'a'

TypeError: unsupported operand type(s) for -: 'str' and 'str'

### Why is there a `__mul__` special method?
Looking back up the notebook, you can see the special method `__mul__`. This alerts the programmer that the multiplication operator (`*`) has been implemented. What does it do?

In [28]:
# The string is repeatedly concatenated to itself via multiplication
'some test words | ' * 5

'some test words | some test words | some test words | some test words | some test words | '
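A few more cases of string repetition, as a hedged sketch; the strings used are arbitrary examples.

```python
# A common use: build a divider line of a fixed width
divider = '-' * 20
print(divider)

# The integer can come first as well
print(3 * 'ab')        # ababab

# Multiplying by zero gives the empty string
print('ab' * 0 == '')  # True
```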

### String Interpolation
String interpolation means substituting variables inside of strings. There are a couple of ways to do this, including a newer, more [intuitive way](https://www.python.org/dev/peps/pep-0498/) (f-strings) introduced in Python 3.6. Here we go over the `format` method, the most popular one to date. There is much [more to string interpolation](https://pyformat.info/).

In [29]:
# Put curly braces every place within a string where you want a substitution to occur.
# After the closing quote, call the .format method with the arguments in the order you would like them substituted
name = 'Ted'
occupation = 'data scientist'
salary = 3

worker_info = 'Employee {} is a {} and earns {} dollars per year'.format(name, occupation, salary)
print(worker_info)

Employee Ted is a data scientist and earns 3 dollars per year
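For comparison, here is a sketch of a few other ways to fill in the same values: numbered placeholders, named placeholders, a format specification, and an f-string. The f-string line assumes Python 3.6 or later; the short keyword names `n` and `s` are made up for this example.

```python
name = 'Ted'
occupation = 'data scientist'
salary = 3

# Numbered placeholders let you reuse or reorder arguments
by_index = '{0} is a {1}; {0} earns {2} dollars per year'.format(name, occupation, salary)
print(by_index)  # Ted is a data scientist; Ted earns 3 dollars per year

# Named placeholders are filled from keyword arguments
by_name = 'Employee {n} earns {s} dollars per year'.format(n=name, s=salary)
print(by_name)   # Employee Ted earns 3 dollars per year

# A format specification after the colon controls the presentation
formatted = 'Salary: {:.2f}'.format(salary)
print(formatted)  # Salary: 3.00

# f-strings (Python 3.6+) read the variables directly from the surrounding scope
f_string = f'Employee {name} is a {occupation} and earns {salary} dollars per year'
print(f_string)  # Employee Ted is a data scientist and earns 3 dollars per year
```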


### Get substrings of strings
Here we first introduce the bracket operator [ ], which is fundamental to almost all scientific computing in Python. The [ ] operator can grab one or more items from a sequence in almost any manner you wish. Since strings are sequences of characters, the [ ] operator provides lots of functionality for strings.

In [30]:
test_string = 'is this precourse too easy?'

In [31]:
# Let's get the 4th character
test_string[4]

'h'

In [32]:
# 'h' appears to be the 5th character and not the 4th. Like most programming languages, 
#  sequences are 0-indexed in python
test_string[0]

'i'

In [33]:
# How to get the last letter of the string?
# First we get the length
last_position = len(test_string) - 1
test_string[last_position]

'?'

In [34]:
# That seems a little cumbersome. Python surely must have an easier way.
# There is an easier way. Python allows indexing from the last element using negative indices starting with -1
test_string[-1]

'?'

In [35]:
# You can keep grabbing items any distance from the end
test_string[-5]

'e'
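The relationship between negative and positive indices can be checked directly: index `-k` refers to the same character as index `len(s) - k`. The sketch below verifies this for a few values of `k` chosen for illustration.

```python
test_string = 'is this precourse too easy?'
n = len(test_string)
print(n)  # 27

# -k and n - k point at the same character
for k in (1, 5, n):
    print(k, test_string[-k] == test_string[n - k])  # True each time

# The most negative valid index, -n, is the first character
print(test_string[-n])  # i
```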

### Slicing strings
Slicing a string means retrieving a subset of the string. There are numerous ways in which this can be accomplished. The bracket operator [ ] is used again, along with a colon (:).

It is best learned by example

In [36]:
# substrings are easily retrieved by giving the [] operator a starting and ending position separated by a :
# foo[a:b] - slices from position a to b-1. It does not include position b
test_string[4:9]

'his p'

In [37]:
# you can also slice from the end using negative indices
test_string[-5:-1]

'easy'

In [38]:
# You can slice by only giving either a starting or ending index
test_string[:6]

'is thi'

In [39]:
test_string[-7:]

'o easy?'

In [40]:
# you can chain together slicing
test_string[5:15][-3:]

'our'

### More formal slicing definition [start : stop : step]
Now that you have seen some string slicing in action, here is the formal definition. String slicing works within the bracket operator by passing it the starting index, the stopping index and the step. `foo[4:10:2]` slices from element 4 up to (but not including) element 10 in steps of 2. If the step is not given, it defaults to 1.

In [41]:
# it's also possible to slice every nth letter with the syntax [start:stop:step]
# This slices location 4 to 10 picking up every other character
test_string[4:10:2]

'hsp'

In [42]:
# Very usefully, it is possible to take negative steps. Make sure start is higher than stop this time
test_string[8:3:-2]

'psh'

In [43]:
# And to fully reverse a string simply do not input a start or stop, just a step of -1

In [44]:
test_string[::-1]

'?ysae oot esruocerp siht si'

In [45]:
# error will occur if you try and access an index out of range
test_string[40]

IndexError: string index out of range

In [46]:
# but no error will occur when your index is out of range in a slice
test_string[4:600]

'his precourse too easy?'
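One way to think about `[start:stop:step]` is that it visits the same positions as `range(start, stop, step)`. The sketch below compares the two and also uses the built-in `slice` object to show how an out-of-range stop is clamped to the string length. This is offered as an illustration of the behavior seen above, not as a description of how the interpreter implements slicing.

```python
test_string = 'is this precourse too easy?'
start, stop, step = 4, 10, 2

# Slicing and an explicit loop over range pick the same characters
by_slice = test_string[start:stop:step]
by_loop = ''.join(test_string[i] for i in range(start, stop, step))
print(by_slice, by_loop)  # hsp hsp

# slice.indices shows the effective (start, stop, step) for a given length
clamped = slice(4, 600).indices(len(test_string))
print(clamped)  # (4, 27, 1)
```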

### Problem 6
<span style="color:green">Slice this string from index 5 to the end by every 4th element</span>

In [47]:
test_string = 'or is this precourse too difficult?'
# your code here

### Problem 7
<span style="color:green">Get every third element starting from the end to the beginning </span>

In [48]:
# your code here

### Problem 8
<span style="color:green">Use four chained methods on a string of your choice. Look at the methods in the docs or above.</span>

In [49]:
# Enter in a string inside the quotes
your_string = ''
# your code here

### What happens if you try and assign a new value to a character or slice of a string?
Strings are immutable (they can't be changed once created), so this will cause an error.

In [50]:
test_string[7] = 'z'

TypeError: 'str' object does not support item assignment

In [51]:
test_string[7:20:-1] = 'z'

TypeError: 'str' object does not support item assignment
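Since a string cannot be modified in place, the usual approach is to build a new string. The sketch below shows two ways, assuming `test_string` still holds the value assigned in Problem 6; the name `changed` is made up for this example.

```python
test_string = 'or is this precourse too difficult?'

# Build a new string from the pieces around position 7
changed = test_string[:7] + 'z' + test_string[8:]
print(changed)      # or is tzis precourse too difficult?

# The original string is untouched
print(test_string)  # or is this precourse too difficult?

# replace also returns a new string; the third argument limits the number of replacements
print(test_string.replace('h', 'z', 1))  # or is tzis precourse too difficult?
```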

### Multiline comments with strings

In [52]:
"""
This area can be
used as a multiline comment since
the normal comment character # does not allow for this.

This multiline comment is especially important when writing docstrings for functions
"""
foo = "executing a string assignment"
# no output
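As a brief sketch of the docstring use mentioned in the cell above: a string placed as the first statement of a function is stored as its documentation, and `help` displays it. The function `shout` is a hypothetical example invented for this illustration.

```python
def shout(text):
    """
    Return text in upper case
    with an exclamation mark appended.
    """
    return text.upper() + '!'

# The docstring is stored on the function object
print(shout.__doc__)

print(shout('hello'))  # HELLO!
```

Calling `help(shout)` would show the same text, in the same way `help(test_string.endswith)` did earlier.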

### Simple test whether a string contains a substring

In [53]:
foo = "executing a string assignment"

In [54]:
# you can use the index method, which gives you the position of the substring if found
foo.index('cut')

3

In [55]:
# or if you just need a boolean returned you use the keyword in
'cut' in foo

True

In [56]:
# can also use keyword not to return the opposite
'foo' not in foo

True
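One caveat worth a sketch: `index` raises a `ValueError` when the substring is missing, while the related `find` method returns -1 instead. The search term `'zzz'` is an arbitrary string chosen because it does not occur in `foo`.

```python
foo = "executing a string assignment"

# find behaves like index when the substring is present
print(foo.find('cut'))  # 3

# ...but returns -1 instead of raising when it is absent
print(foo.find('zzz'))  # -1

# index raises ValueError for a missing substring
try:
    foo.index('zzz')
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

If you only need a yes/no answer, the `in` keyword shown above remains the simplest choice.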

## Done!