# Strings

Strings can be treated like a list of characters. Each character has a zero-based index, and individual characters can be accessed using bracket notation.

In [2]:
string = 'hello world'
a = string[4]
print(a)

o


Strings are **immutable**: trying to change one raises a `TypeError` exception.

In [12]:
string[4] = 'z'
string

TypeError: 'str' object does not support item assignment

Strings can be **sliced** using the same techniques as lists:

```py
variable_name[start:end]
```
This returns a **new** string; the original is unchanged.

In [5]:
new_string = string[3:6]
new_string

'lo '

In [6]:
string

'hello world'

Omitting the first index, `[:end]`, starts the slice at the beginning and goes up to, but not including, `end`.

Omitting the last index, `[start:]`, starts the slice at the `start` index and goes all the way to the end.
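
A brief sketch of both shorthand forms. The variable `greeting` is illustrative and is not used elsewhere in these notes:

```py
greeting = 'hello world'   # illustrative value, same text as `string` above

first_five = greeting[:5]  # start omitted: begins at index 0, stops before index 5
from_six = greeting[6:]    # end omitted: begins at index 6, runs to the end

print(first_five)          # hello
print(from_six)            # world
```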

To create a copy, use the `[:]` notation.

In [7]:
b = string[:]
b

'hello world'

In [10]:
b = 'goodbye'
b

'goodbye'

In [9]:
string

'hello world'

Just like lists, you can count back from the end of a string by using **negative** indices.

In [11]:
string[-4:]

'orld'

As with **lists**, you can **concatenate** two or more strings with the `+` operator, which returns a **new** string.

Python also provides built-in functions for working with strings:

`len()` - returns the string's length (its number of characters).
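
Concatenation and `len()` can be sketched as follows. The names `first`, `second`, `combined` and `changed` are illustrative. The last step shows one way to work around immutability: build a new string from slices rather than assigning to an index.

```py
first = 'hello'    # illustrative values
second = 'world'

combined = first + ' ' + second   # builds a new string; first and second are unchanged
print(combined)        # hello world
print(len(combined))   # 11

# Strings are immutable, so "changing" a character means building a new string
changed = combined[:4] + 'z' + combined[5:]
print(changed)         # hellz world
```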

Because strings are sequences, like lists, we can iterate through them using `for` or `while` loops. If we wanted to check whether a particular character is found in a word, we could use a `for` loop like so:

In [13]:
def letter_check(word, letter):
  result = False
  for i in range(len(word)):
    if letter == word[i]:
      result = True
      break
  return result

In [14]:
letter_check('christmas', 'i')

True

We can do this type of check more efficiently using `in`. `in` checks if one string is part of another string and returns `True` if there is a match.

In [17]:
result  = 'i' in 'christmas' # boolean expression
result

True

In [18]:
'x' in 'christmas'

False

This also works with longer substrings, not just single characters.

In [20]:
'duck' in 'There will be duck for xmas'

True

The search is case sensitive, so `duck` does not match `Duck`.

In [21]:
'duck' in 'There will be Duck for xmas'

False

To find which characters two strings have in common, we can build a list of those characters with no duplicates:

In [22]:
def common_letters(string_one, string_two):
  result = []
  for i in range(len(string_one)):
    if string_one[i] in string_two:
      if result.count(string_one[i]) == 0:
        result.append(string_one[i])
  return result

In [23]:
common_letters('christmas', 'xmas')

['s', 'm', 'a']
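
The same check can also be written by looping over the characters themselves instead of over index positions. This is only a sketch; `common_letters_direct` is a hypothetical name for the variant:

```py
def common_letters_direct(string_one, string_two):
    # hypothetical variant of common_letters: iterate over characters, not indices
    result = []
    for char in string_one:
        # keep the character if it is shared and not already recorded
        if char in string_two and char not in result:
            result.append(char)
    return result

shared = common_letters_direct('christmas', 'xmas')
print(shared)   # ['s', 'm', 'a']
```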

## String Methods

Python provides a number of string methods which you can use to sanitize and standardize input.

### Upper

`.upper()` - returns a **new** string in all uppercase

In [24]:
string = 'hello world'
string.upper()

'HELLO WORLD'

In [25]:
string

'hello world'

### Lower

`.lower()` - returns a **new** string in all lowercase.

In [28]:
string = 'HELLO WORLD'
string.lower()

'hello world'

In [29]:
string

'HELLO WORLD'
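
One common use of `.lower()` is to make an `in` check ignore case. A minimal sketch, with `sentence` as an illustrative variable:

```py
sentence = 'There will be Duck for xmas'   # illustrative value

exact_match = 'duck' in sentence             # capital D does not match lowercase d
loose_match = 'duck' in sentence.lower()     # compare against an all-lowercase copy

print(exact_match)   # False
print(loose_match)   # True
```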

### Title

`.title()` - returns a **new** string with the first character of each word capitalized.

In [30]:
string = 'hello world'
string.title()

'Hello World'

In [31]:
string

'hello world'

### Split

`.split()` - splits a string into a list of substrings based on the `delimiter`, a `string` argument passed to the method. If no argument is used, `.split()` defaults to splitting on runs of whitespace. The original string is **unchanged**. You can use any non-empty string as an argument to `.split()`.

In [33]:
string = "I'm having turkey for xmas, with stuffing, gravy and mash!"
substrings = string.split(', ')
substrings

["I'm having turkey for xmas", 'with stuffing', 'gravy and mash!']

In [34]:
string.split()

["I'm",
 'having',
 'turkey',
 'for',
 'xmas,',
 'with',
 'stuffing,',
 'gravy',
 'and',
 'mash!']

In [35]:
string

"I'm having turkey for xmas, with stuffing, gravy and mash!"

In [36]:
# build a list of the last names
authors = "Audre Lorde, William Carlos Williams, Gabriela Mistral, Jean Toomer, An Qi, Walt Whitman, Shel Silverstein, Carmen Boullosa, Kamala Suraiyya, Langston Hughes, Adrienne Rich, Nikki Giovanni"

full_names = authors.split(', ')
author_last_names = []
for name in full_names:
  temp = name.split()
  author_last_names.append(temp[-1])
print(author_last_names)

['Lorde', 'Williams', 'Mistral', 'Toomer', 'Qi', 'Whitman', 'Silverstein', 'Boullosa', 'Suraiyya', 'Hughes', 'Rich', 'Giovanni']


We can also split strings using **escape sequences**, such as `\t` (horizontal tab) and `\n` (new line).

`\n` allows us to split a multi-line string by line breaks and `\t` allows us to split a string by tabs. `\t` is particularly useful because it is not uncommon for data points to be separated by tabs.

In [38]:
smooth_chorus = \
"""And if you said, "This life ain't good enough."
I would give my world to lift you up
I could change my life to better suit your mood
Because you're so smooth"""
smooth_chorus.split('\n')

['And if you said, "This life ain\'t good enough."',
 'I would give my world to lift you up',
 'I could change my life to better suit your mood',
 "Because you're so smooth"]

Notice that the interpreter automatically escaped the `'` character when it displayed the first item of the new list, because that string contains both single and double quotes.
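
Splitting on `\t` works the same way. The record below is a made-up tab-separated line, used only for illustration:

```py
# hypothetical tab-separated record: title, year, genre
record = 'Smooth\t1999\tLatin rock'

fields = record.split('\t')
print(fields)   # ['Smooth', '1999', 'Latin rock']
```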

### Join

`.join()` - takes a list of strings and joins them with a given `delimiter`, returning a string. Syntax:

```py
'delimiter'.join(lst)
```
With an empty string as the delimiter (`''`), the individual strings are concatenated directly together.

In [39]:
words = ['one', 'two', 'three', 'four', 'five']
joined = ' '.join(words)
joined

'one two three four five'
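
For comparison, here is a short sketch of joining with an empty delimiter and with a visible one. The list `letters` is illustrative:

```py
letters = ['x', 'm', 'a', 's']   # illustrative list

no_gap = ''.join(letters)      # empty delimiter: plain concatenation
dashed = '-'.join(letters)     # delimiter placed between items only

print(no_gap)   # xmas
print(dashed)   # x-m-a-s
```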

You can use any string as a delimiter, including `,` (common with CSV files), and escape sequences, e.g. `\n`.

In [40]:
santana_songs = ['Oye Como Va', 'Smooth', 'Black Magic Woman', 'Samba Pa Ti', 'Maria Maria']
santana_songs_csv = ','.join(santana_songs)
santana_songs_csv

'Oye Como Va,Smooth,Black Magic Woman,Samba Pa Ti,Maria Maria'

In [42]:
santana_songs_newline = '\n'.join(santana_songs)
print(santana_songs_newline)

Oye Como Va
Smooth
Black Magic Woman
Samba Pa Ti
Maria Maria


### Strip

`.strip()` - returns a **new** string with the whitespace at either end **removed**. You can also provide an optional string argument, in which case every character in that argument is stripped from either end instead of whitespace.

In [43]:
string = '!!!!   Rob Lowe    !!!!!'
new_string = string.strip('!') # only strips the argument, NO whitespace
new_string

'   Rob Lowe    '

In [44]:
newer_string = new_string.strip()
newer_string

'Rob Lowe'

In [45]:
string

'!!!!   Rob Lowe    !!!!!'
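
The two steps above can be combined, because the argument to `.strip()` is treated as a set of characters to remove rather than as a single substring. A small sketch, with `padded` and `cleaned` as illustrative names:

```py
padded = '!!!!   Rob Lowe    !!!!!'   # same text as the example above

# '! ' means: strip any mix of '!' and ' ' from both ends
cleaned = padded.strip('! ')

print(cleaned)   # Rob Lowe
print(padded)    # the original string is unchanged
```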

In [57]:
# rebuild into multiline text
love_maybe_lines = ['Always    ', '     in the middle of our bloodiest battles  ', 'you lay down your arms', '           like flowering mines    ','\n' ,'   to conquer me home.    ']

temp_one = [line.strip() for line in love_maybe_lines]
temp_two = [line for line in temp_one if len(line) > 0]
temp_two

['Always',
 'in the middle of our bloodiest battles',
 'you lay down your arms',
 'like flowering mines',
 'to conquer me home.']

In [58]:
result = '\n'.join(temp_two)
print(result)

Always
in the middle of our bloodiest battles
you lay down your arms
like flowering mines
to conquer me home.


### Replace

`.replace()` - takes two arguments and replaces all instances of the first argument in a string with the second argument. Returns a **new** string; the original is unchanged.

In [59]:
string = 'hello world'
new_string = string.replace('world', 'everyone')
new_string

'hello everyone'

In [60]:
string

'hello world'
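
To illustrate that every instance is replaced, not just the first, here is a short sketch using an illustrative string `phrase`:

```py
phrase = 'ho ho ho'   # illustrative value

all_replaced = phrase.replace('ho', 'ha')   # every 'ho' is replaced
no_match = phrase.replace('x', 'y')         # nothing to replace, same text comes back

print(all_replaced)   # ha ha ha
print(no_match)       # ho ho ho
print(phrase)         # ho ho ho
```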

### Find

`.find()` - takes a string argument and returns the index of the first instance of that string, or `-1` if no match is found.

In [62]:
new_string.find('eve')

6

In [64]:
new_string.find('xmas')

-1
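
When the search string occurs more than once, only the first position is reported, and the search is case sensitive. A minimal sketch, with `text` as an illustrative variable:

```py
text = 'hello everyone'   # same text as new_string above

first_e = text.find('e')   # 'e' appears several times; only the first index is returned
upper_e = text.find('E')   # no uppercase 'E' in the text

print(first_e)   # 1
print(upper_e)   # -1
```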

### Format

`.format()` - takes values as arguments and inserts them into the string it is called on. You include `{}` as placeholders for where those values will be inserted, in order.

`.format()` can take as many arguments as there are `{}` in the string it is called on. Extra arguments are ignored. Extra `{}` cause the interpreter to raise an `IndexError` exception.

It is the equivalent of `string interpolation` in other languages, and is an alternative to string concatenation.

In [67]:
string = 'My favourite song is {}, by {}'.format('Alphabet Street', 'Prince')
string

'My favourite song is Alphabet Street, by Prince'
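
The rules about argument counts can be checked with a small sketch. The names `template`, `with_extra` and `raised` are illustrative:

```py
template = 'My favourite song is {}, by {}'   # same template as the example above

# An extra argument is silently ignored
with_extra = template.format('Alphabet Street', 'Prince', 'unused')
print(with_extra)

# Fewer arguments than placeholders raises IndexError
try:
    template.format('Alphabet Street')
    raised = False
except IndexError:
    raised = True
print(raised)
```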

`.format()` can accept `keywords` in the string and in the arguments, allowing you to pass arguments to `.format()` in any order.

In [68]:
'My favourite song is {title}, by {artist}'.format(artist='Prince', title='Alphabet Street')

'My favourite song is Alphabet Street, by Prince'