# Strings

> In computer science a string is traditionally a sequence of characters (letters, digits, punctuation and whitespace). A string is essentially an array of characters.

Strings are especially important for communication between the user and the program.

You can think of a string as a sentence that is **enclosed in quotes** (either 'single quotes' or "double quotes").

In [1]:
from __future__ import print_function

Let's first create two strings.

In [2]:
st1 = 'I am a string'
st2 = "me too!"

## Referencing and slicing

We said before that a string is essentially an **array of characters**. So, how can we retrieve a single character from our string?  
```python
str[i]
```
This returns the *i-th* character in the string `str` (count starts from 0).

**Slicing** is when we want to retrieve a substring from our initial string.
```python
str[i:j:k]
```
This returns the substring of `str` that starts at the character with index *i*, ends at the character with index *j-1* (index *j* itself is not included) and takes one character every *k* ones.
- ***i***: **starting point**
- ***j***: **ending point**
- ***k***: **step**

Negative indices count from the end of the string (`-1` is the last character).

Let's look at some examples:

In [3]:
print('Whole string: ', st1)
print('st1[3]:       ', st1[3])
# In Python the first element of a sequence (string, list, tuple, etc.) has an index of 0!
# so by calling index number 3 we are referring to the 4th character (in this case the letter 'm')
print('st1[-1]:      ', st1[-1]) # Returns the last character: 'g'
print('st1[:4]:      ', st1[:4]) # Returns the first 4 characters: 'I am'
print('st1[-3:]:     ', st1[-3:]) # Returns the last 3 characters: 'ing'
print('st1[2:6]:     ', st1[2:6]) # Returns the characters with indices 2-5 (doesn't return index 6!): 'am a'
print('st1[3:10:2]:  ', st1[3:10:2]) # Returns characters with indices from 3 to 9 with a step of 2 (indices: 3,5,7,9): 'masr'

Whole string:  I am a string
st1[3]:        m
st1[-1]:       g
st1[:4]:       I am
st1[-3:]:      ing
st1[2:6]:      am a
st1[3:10:2]:   masr
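
The bounds and the step of a slice can also be omitted or negative. The following is a small sketch that reuses the value of `st1` from above; the names `every_other`, `reversed_st1` and `inner` are only illustrative.

```python
st1 = 'I am a string'

# Omitting i and j keeps the whole range; only the step is applied
every_other = st1[::2]      # indices 0, 2, 4, ..., 12
# A negative step walks through the string backwards
reversed_st1 = st1[::-1]
# Negative bounds count from the end; the end index is still excluded
inner = st1[-6:-1]          # indices 7-11

print(every_other)
print(reversed_st1)
print(inner)
```

A slice always returns a new string, so `st1` itself is left unchanged.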


## Helpful built-in string functions

These are either built-in Python functions that accept a string as an argument (e.g. *len*), or methods of string objects (e.g. *str.index*).

In [4]:
print('st1:             ', st1)
print('len(st1):        ', len(st1))
# returns the length of the string: 13
print("st1.index('a'):  ", st1.index('a'))
# returns the index of the first matching argument passed (in this case 'a'): 2
print("st1.count('a'):  ", st1.count('a'))
# returns how many times 'a' appears in the string: 2
print("st1.count('i'):  ", st1.count('i'))
# strings are case sensitive: 1
st3 = 'sasasas'
# we'll create a new string to show this better
print("st3:             ", st3)
print("st3.count('sas'):", st3.count('sas'))
# counts only non-overlapping occurrences: 2 (even though 'sas' can be found at 3 positions, it returns 2 because the first 's' of the second 'sas' is the same as the last 's' of the first 'sas')

st1:              I am a string
len(st1):         13
st1.index('a'):   2
st1.count('a'):   2
st1.count('i'):   1
st3:              sasasas
st3.count('sas'): 2
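
If overlapping matches should be counted as well, `str.count` is not enough. One possible approach is to call `str.find` repeatedly, moving the start of the search one character forward each time. `str.find` works like `str.index`, but returns -1 instead of raising an error when nothing is found. This is only a sketch: `count_overlapping` is a hypothetical helper, not a built-in, and it uses a function definition and a `while` loop, which this tutorial has not covered yet.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, including overlapping ones."""
    count = 0
    start = text.find(sub)                 # -1 if sub is not found at all
    while start != -1:
        count += 1
        start = text.find(sub, start + 1)  # continue one character later
    return count

st3 = 'sasasas'
print(st3.count('sas'))               # non-overlapping matches only
print(count_overlapping(st3, 'sas'))  # overlapping matches included
```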


## Logical operations
These help us figure out if a string contains a certain character or substring.

In [5]:
print('st1:            ', st1)
print("'a' in st1:     ", 'a' in st1)
# Returns True (because there is an 'a' in st1)
print("'o' in st1:     ", 'o' in st1)
# Returns False (because there isn't an 'o' in st1)
print("'o' not in st1: ", 'o' not in st1)
# Returns the opposite of the previous

st1:             I am a string
'a' in st1:      True
'o' in st1:      False
'o' not in st1:  True
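
The same operators work for substrings longer than one character. A short sketch, with illustrative variable names:

```python
st1 = 'I am a string'

has_word = 'string' in st1     # the characters appear together, in this order
has_lowercase = 'i am' in st1  # the comparison is case sensitive ('I' is not 'i')
has_swapped = 'ma' in st1      # 'm' and 'a' are both present, but never as 'ma'

print(has_word)
print(has_lowercase)
print(has_swapped)
```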


## String operations
Concatenating and repeating strings is easy in Python.

In [6]:
print('st1:       ', st1)
print('st2:       ', st2)
print('st1 + st2: ', st1 + st2) # concatenation of st1 and st2
print("st2 * 3:   ", st2 * 3) # equivalent to st2 + st2 + st2

st1:        I am a string
st2:        me too!
st1 + st2:  I am a stringme too!
st2 * 3:    me too!me too!me too!
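
Note that concatenation does not insert any separator, which is why the output above reads `stringme`. A minimal sketch of adding one by hand, and of using repetition to draw a line of matching length (the names `sentence` and `underline` are only illustrative):

```python
st1 = 'I am a string'
st2 = "me too!"

sentence = st1 + ', ' + st2       # the separator is just another string
underline = '-' * len(sentence)   # repeat '-' once per character

print(sentence)
print(underline)
```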


## Capitalization
Python strings also have several built-in methods for changing the capitalization of a string. These are mostly used for handling user input.

For example, say you ask users for their name. They could reply: `John`, `john` or `JOHN`. All three of these are the same for a human, but totally different for the computer. Capitalization methods help us with these cases.

In [7]:
st4 = 'rAnDoMLy CAPitaLiZed StrInG'
print('st4:              ', st4)
print("st4.capitalize(): ", st4.capitalize())
# returns string with first letter capitalized and rest lowercase
print("st4.lower():      ", st4.lower())
# all lowercase
print("st4.upper():      ", st4.upper())
# all uppercase
print("st4.swapcase():   ", st4.swapcase())
# swaps upper for lowercase and vice versa
print("st4.title():      ", st4.title())
# capitalizes the first letter of each word

st4:               rAnDoMLy CAPitaLiZed StrInG
st4.capitalize():  Randomly capitalized string
st4.lower():       randomly capitalized string
st4.upper():       RANDOMLY CAPITALIZED STRING
st4.swapcase():    RaNdOmlY capITAlIzED sTRiNg
st4.title():       Randomly Capitalized String
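
Going back to the name example, one common way to handle it (a sketch, not the only option) is to convert both sides to lowercase before comparing them. The variable names are illustrative.

```python
expected = 'john'
reply_1, reply_2, reply_3 = 'John', 'john', 'JOHN'

# A direct comparison is case sensitive
direct_match = reply_3 == expected

# Comparing lowercase versions ignores the capitalization
match_1 = reply_1.lower() == expected
match_2 = reply_2.lower() == expected
match_3 = reply_3.lower() == expected

print(direct_match)
print(match_1 and match_2 and match_3)
```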


## Whitespace manipulation
`str.strip` and its variants `str.lstrip` and `str.rstrip` remove unwanted whitespace (or other characters) from the ends of a string.

In [8]:
st5 = '   lots       of whitespace    '
print('st5:                        ', st5)
print("st5.lstrip():               ", st5.lstrip())
# removes leading whitespace
print("st5.lstrip(' stlow'):       ", st5.lstrip(' stlow'))
# Removes leading characters that appear in the argument (in this case it removed all the whitespace, 'lots' and
# the 'o' from 'of', but stopped at the 'f', so the 'w' from 'whitespace' was kept)
print("st5.rstrip():               ", st5.rstrip())
# same as lstrip but strips from the right
print("st5.strip():                ", st5.strip())
# removes both from the left and from the right

st5:                            lots       of whitespace    
st5.lstrip():                lots       of whitespace    
st5.lstrip(' stlow'):        f whitespace    
st5.rstrip():                   lots       of whitespace
st5.strip():                 lots       of whitespace
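
A typical use is cleaning up text before comparing or storing it. The sketch below uses made-up values (`raw_reply`, `title`) to show that `strip()` without arguments also removes newlines and tabs, and that passing several characters strips any of them from both ends.

```python
raw_reply = '   John \n'
clean_reply = raw_reply.strip()   # whitespace includes spaces, tabs and newlines

title = '*** Strings ***'
clean_title = title.strip('* ')   # strips any mix of '*' and ' ' from both ends

print(clean_reply)
print(clean_title)
```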


## Split and Join
Split is used for splitting a string into a list of substrings according to a delimiter. Join does the opposite: it merges a list of strings into a single string.

In [9]:
print('st1:              ', st1)
spl = st1.split()
print('st1.split():      ', spl)
# Splits string into a list of substrings (default delimiter: any run of whitespace)
print("st1.split('a'):   ", st1.split('a'))
# Split with delimiter 'a' (spacing is preserved)
print("st1.split('a', 1):", st1.split('a', 1))
# Split with delimiter 'a'. Performs only 1 split
print("''.join(spl):    ", ''.join(spl))
# Joins sequence of strings as a string
print("'-'.join(spl):    ", '-'.join(spl))
# Joins sequence of strings with '-' as delimiter

st1:               I am a string
st1.split():       ['I', 'am', 'a', 'string']
st1.split('a'):    ['I ', 'm ', ' string']
st1.split('a', 1): ['I ', 'm a string']
''.join(spl):     Iamastring
'-'.join(spl):     I-am-a-string
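
Splitting and joining with the same delimiter are inverse operations. A small sketch with a made-up comma-separated line (`csv_line` and the other names are illustrative):

```python
csv_line = 'apples,pears,plums'

fruits = csv_line.split(',')   # string -> list of strings
rejoined = ','.join(fruits)    # list of strings -> the original string
listed = ' / '.join(fruits)    # any other string can serve as the delimiter

print(fruits)
print(rejoined == csv_line)
print(listed)
```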


## Replace
Python also has a built-in method for finding a substring in a string and replacing it with another one. This is also useful for removing a part of a string completely:

In [10]:
print("Before replace: ", st2)
print('After replace:  ', st2.replace('too','three'))
# replaces 'too' with 'three' in st2
print("replace 'zz': ", 'razzndzzom stzzrizzng'.replace('zz', ''))
# removes 'zz' from the string
print('remove whitespace from previous example:', st5.replace(' ', ''))
# removes all spaces from the string
print('replace double spaces in previous example:', st5.replace('  ', ' '))
# replaces each pair of spaces with a single one (one pass only, so longer runs are shortened but not collapsed to one space)

Before replace:  me too!
After replace:   me three!
replace 'zz':  random string
remove whitespace from previous example: lotsofwhitespace
replace double spaces in previous example:   lots    of whitespace  
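
Because `replace` makes a single pass over the string, runs of more than two spaces are only shortened. One way to collapse them completely (a sketch using a `while` loop, which this tutorial has not covered yet) is to repeat the replacement until no double space is left. `replace` also accepts an optional third argument that limits the number of replacements.

```python
st5 = '   lots       of whitespace    '

collapsed = st5
while '  ' in collapsed:                      # repeat until no double space remains
    collapsed = collapsed.replace('  ', ' ')

first_two = 'sasasas'.replace('s', 'z', 2)    # replace only the first 2 matches

print(repr(collapsed))
print(first_two)
```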


## Formatting

A convenient way of incorporating values from variables into strings is through formatting. This is especially useful for printing data on the screen. Formatting is typically done with the built-in `str.format()` method.
This method takes a string and replaces occurrences of curly brackets (`{}`) with whatever parameter we pass into it.

In [11]:
ct = 55
print('bla bla {} bla')
print('bla bla {} bla'.format(ct))

bla bla {} bla
bla bla 55 bla


There are a lot of formatting options. We won't go into much detail about formatting in this tutorial, but we will revisit the topic in the future.
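
As a brief preview, the sketch below shows a few commonly used options: referring to arguments by position or by name, and a format specification after a colon. The variable names and values are only illustrative.

```python
name = 'st1'
length = 13
ratio = 2 / 3.0

by_position = '{0} has {1} characters, {0} is a string'.format(name, length)
by_name = '{var} has {n} characters'.format(var=name, n=length)
rounded = '{:.2f}'.format(ratio)     # 2 decimal places
padded = '[{:>5}]'.format(length)    # right-aligned in 5 characters

print(by_position)
print(by_name)
print(rounded)
print(padded)
```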

There is also an older way of formatting strings with the percent (`%`) sign. This does **not** use the `str.format()` method.

In [12]:
ts = 'The first string I used in this tutorial was: %s, and the second one was: %s' %(st1,st2)
print(ts)
n1, n2 = 5, 10
print('bla bla %i bla %.2f' %(n1,n2))
print('%(language)s has %(number)03d quote types.' %{"language": "Python", "number": 2})
print('%i is an integer, %f is a float, %s is a string.' %(15, 1.66, 'asdf'))

The first string I used in this tutorial was: I am a string, and the second one was: me too!
bla bla 5 bla 10.00
Python has 002 quote types.
15 is an integer, 1.660000 is a float, asdf is a string.
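
For comparison, the same strings can be produced with `str.format()`. This is a sketch of one possible translation; the format specifications after the colon mirror the `%` codes used above.

```python
n1, n2 = 5, 10

old_numbers = 'bla bla %i bla %.2f' % (n1, n2)
new_numbers = 'bla bla {:d} bla {:.2f}'.format(n1, n2)

old_named = '%(language)s has %(number)03d quote types.' % {"language": "Python", "number": 2}
new_named = '{language} has {number:03d} quote types.'.format(language="Python", number=2)

print(new_numbers)
print(new_named)
print(old_numbers == new_numbers and old_named == new_named)
```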


## Example

We want to write a program that splits a string according to commas (,) and full stops (.) but preserves full stops. Spacing after commas and full stops should also be removed.

In [13]:
ex_str = 'This is the string that we will use to test our example. ' \
            'The expected output of the program should contain every word ' \
            'this string has, but it should be split according to punctuation. ' \
            'Full stops should be preserved, but commas should not. ' \
            'Spacing after punctuation, should also be removed.'
print(ex_str)

This is the string that we will use to test our example. The expected output of the program should contain every word this string has, but it should be split according to punctuation. Full stops should be preserved, but commas should not. Spacing after punctuation, should also be removed.


Let's do the easy part first. Let's split the string according to commas:

In [14]:
temp = ex_str.split(',')
print(temp)

['This is the string that we will use to test our example. The expected output of the program should contain every word this string has', ' but it should be split according to punctuation. Full stops should be preserved', ' but commas should not. Spacing after punctuation', ' should also be removed.']


We did manage to split the string, but we haven't yet removed the excess spacing at the beginning of our substrings.  
One thought would be to remove the first character from each of these strings, but that would also remove the first character from the first string (This ...). If we wanted to do it this way we would have to keep that in mind. This would also require an elementwise list operation, which we haven't covered yet.

What we can do is remove spacing during the split phase:

In [15]:
temp = ex_str.split(', ')
print(temp)

['This is the string that we will use to test our example. The expected output of the program should contain every word this string has', 'but it should be split according to punctuation. Full stops should be preserved', 'but commas should not. Spacing after punctuation', 'should also be removed.']


Now it's better. This method has one problem though, which we will discuss later.

Let's try to perform the other split:

In [16]:
fin_lst = temp.split('. ')

AttributeError: 'list' object has no attribute 'split'

So... we can't split a list of strings.  
Furthermore the `.split()` method does not accept multiple delimiters and we don't know how to perform elementwise operations on lists.  
What can we do?

We could replace all full stops with commas, and then perform the split:

In [None]:
temp = ex_str.replace('.', ',')
print(temp)

... and then perform the split

In [None]:
temp = temp.split(', ')
print(temp)

OK, now we're close. The only thing to do is to modify it a bit so that we preserve the full stops:

In [None]:
temp = ex_str.replace('. ', '., ')
print(temp)
fin_lst = temp.split(', ')
print('\nFinal list:')
print(fin_lst)

Now we finally got it!
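
To see why this works, it helps to trace the two steps on a much shorter string. The string `short_str` below is made up for illustration.

```python
short_str = 'One, two. Three, four.'

# Step 1: put a comma after every full stop that is followed by a space
marked = short_str.replace('. ', '., ')
# Step 2: every split point is now a comma followed by a space
parts = marked.split(', ')

print(marked)
print(parts)
```

The final full stop is not followed by a space, so it is left alone and no empty element appears at the end of the list.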

The only problem would be if our string didn't have spaces after punctuation marks (e.g: 'strings like this,would not be split').

We can solve this problem by replacing all punctuation + spacing with just the punctuation:

In [None]:
temp = ex_str.replace('. ', '.').replace(', ', ',')
print(temp)

But what if someone by mistake had placed an extra space? (*'like this,    bla bla'*)

We would have to somehow figure out how much whitespace we have and then replace it with a single space.

The easiest way to do this is to do a primary split on our string, calling `split()` with no arguments so that any run of whitespace counts as a single delimiter. This creates a list of the words in our string. Then we can reassemble (join) the string with a single space as the delimiter between the words. This effectively replaces every run of whitespace with a single space.
```python
temp = ' '.join(ex_str.split())
```
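
The difference between `split()` and `split(' ')` matters here. As a sketch on the example string from above (variable names are illustrative): an explicit single-space delimiter produces an empty string for every extra space, while `split()` with no arguments skips runs of whitespace entirely.

```python
messy = 'like this,    bla bla'

by_space = messy.split(' ')         # every single space is a split point
by_whitespace = messy.split()       # runs of whitespace count as one delimiter
normalized = ' '.join(by_whitespace)

print(by_space)
print(by_whitespace)
print(normalized)
```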

Let's write the program as a whole:

In [None]:
temp = ' '.join(ex_str.split())
# first we remove excess whitespace
temp = temp.replace('. ', '.').replace(', ', ',')
# then we remove all whitespace after punctuation marks
temp = temp.replace('.', '.,')
# then we replace every fullstop with fullstop+comma (in order to preserve full stops)
fin_lst = temp.split(',')
# finally we split the string according to the commas
# this has a side effect of creating an empty element in the last spot of the list when the string ends with a full stop, but this is of little importance and we can always remove it:
if fin_lst and fin_lst[-1] == '':
    fin_lst.pop()
print(fin_lst)

Finally, we'll test this on a more difficult string.

In [None]:
test_str = 'Element 1, element 2.   Element 3,          element    4,element 5.  Element   6.Element 7.'
temp = ' '.join(test_str.split())
temp = temp.replace('. ', '.').replace(', ', ',')
temp = temp.replace('.', '.,')
fin_lst = temp.split(',')
fin_lst.pop()
print(fin_lst)
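
Since the same steps were just repeated for a second string, they could also be wrapped in a function. The following is only a sketch: `split_on_punctuation` is a hypothetical name, function definitions have not been covered in this tutorial yet, and the trailing empty element is removed only when it is actually there, so strings that do not end with a full stop are handled too.

```python
def split_on_punctuation(text):
    """Split text at commas and full stops, keeping the full stops."""
    temp = ' '.join(text.split())                      # collapse excess whitespace
    temp = temp.replace('. ', '.').replace(', ', ',')  # drop spaces after punctuation
    temp = temp.replace('.', '.,')                     # mark full stops as split points
    parts = temp.split(',')
    if parts and parts[-1] == '':                      # left over from a final full stop
        parts.pop()
    return parts

with_stop = split_on_punctuation('Element 1, element 2.   Element 3.')
without_stop = split_on_punctuation('Element 1,element 2')

print(with_stop)
print(without_stop)
```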

## Exercises

1. Write a Python program which takes a string and replaces all occurrences of its first character with the dollar sign (\$), except for the first character itself:  
e.g: restart ---> resta\$t

2. Write a Python program to get a single string from two given strings, separated by a space and swap the first two characters of each string:  
e.g: 'abcd', 'wxyz' ---> 'wxcd abyz'