# Strings and Lists

## Skills

1. Understand data types and basic Python operations.
2. Use basic functions.
3. Store data in variables.
4. **Manipulate strings using string methods.**
5. **Use lists to store multiple pieces of data.**

## Vocabulary List

**list.** A Python data type which can contain multiple values, not just one.

**method.** A special type of function that belongs to a string, list, or other kind of object. Instead of calling the function on its own, you write it after the object it belongs to, joined with a `.`. For example, if a variable *name* is a string, then `name.upper()` calls *name*'s `.upper()` method.

**slicing.** A way of selecting only some entries in a string, list, or other variable containing multiple values. When slicing, put the start position and the end position (one past the last entry you want) inside square brackets `[]`, separated by a colon `:`.

**tokenization.** The process of splitting text into its constituent words or other pieces, depending on language and context.

**set.** A Python data type similar to a list, but it can contain each item only once, and it does not keep its items in any particular order. This will be very useful when finding unique words.

## Slicing Strings

**Slicing** is a way to select only part of a string. To slice a string, put the starting and ending positions you want inside square brackets `[]`, separated by a colon `:`.

In [2]:
fruit = "apple"

fruit[0:3]

'app'

There are a few things to note here:

* The positions in a string start from 0. So the 0th position in `fruit` is the letter "a", the 1st is "p", the 2nd is another "p", and so on.

* The starting index is *inclusive*, and the ending index is *exclusive*. That means your slice will contain the letter at the starting index, but not the letter at the ending index.

* You can leave the end position blank, as in `fruit[2:]`. A blank end position means "the last position in the string + 1", so the slice runs through the end of the string. Likewise, a blank start position means "from position 0".

* If you omit the `:` and the end position altogether, you get the single letter at the start position.
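To see all of these rules at once, here is a short sketch on the same `fruit` string. The variable names `head`, `tail`, `whole`, and `single` are just illustrative choices. Notice that because the start is inclusive and the end exclusive, two slices that share a position fit back together exactly.

```python
fruit = "apple"

head = fruit[:2]    # blank start means "from position 0": 'ap'
tail = fruit[2:]    # blank end means "through the end": 'ple'
whole = fruit[:]    # both blank: the whole string, 'apple'
single = fruit[4]   # no colon: just the letter at position 4, 'e'

# Start is inclusive and end is exclusive, so slices that share
# position 2 join back together with no overlap and no gap.
print(head + tail)  # apple
```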

Let's try a few more examples. See if you can predict what the result will be.

In [3]:
fruit[1:4]

'ppl'

In [4]:
fruit[0]

'a'

In [5]:
author = "octavia butler"

In [6]:
author[0:5]

'octav'

In [7]:
author[7]

' '

In [8]:
author[8:]

'butler'

In [9]:
start = 0
end = 3
author[start:end]

'oct'
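Since the start and end positions can be variables, they can also be computed. A small sketch, assuming the same `author` string: combining `len()` with slicing lets you reach the end of a string without counting letters by hand. Remember that the last valid position is one less than the length.

```python
author = "octavia butler"

length = len(author)              # 14: seven letters, a space, six letters
last_letter = author[length - 1]  # 'r' (the last valid position is length - 1)
surname = author[length - 6:]     # 'butler'
first_name = author[:7]           # 'octavia' (positions 0 through 6)
```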

## String Methods

There are [many useful string methods](https://www.w3schools.com/python/python_ref_string.asp) that will let us start manipulating text. Here are a few that will come in handy:

* `.lower()` converts a string entirely to lowercase letters. You can probably guess what `.upper()` does, as well.
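* `.replace()` replaces every occurrence of one substring with another, returning a new string.
* `.find()` tells you where a letter, word, or phrase first appears within a larger string, as the position of its first character. It returns `-1` if the substring does not appear at all.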
* `.count()` tells you how many times some substring appears in the string.
* And of course, we've seen the `len()` function already, which isn't a method.

Let's see them in use:

In [10]:
movie = "The Fast And The Furious"

In [11]:
movie.upper()

'THE FAST AND THE FURIOUS'

In [12]:
movie.replace("Fast", "Quick")

'The Quick And The Furious'

In [13]:
movie.find("Fast")

4

In [14]:
movie.count("The")

2

In [15]:
len(movie)

24
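A few behaviors of these methods are easy to miss in the examples above. This sketch, using the same `movie` string, shows that `.replace()` swaps every match rather than just the first, that `.find()` signals "not found" with `-1`, and that none of these methods change the original string: they return new values instead.

```python
movie = "The Fast And The Furious"

lowered = movie.lower()              # 'the fast and the furious'
# .replace() swaps every occurrence, not only the first one.
swapped = movie.replace("The", "A")  # 'A Fast And A Furious'
# .find() returns -1 when the substring does not appear.
missing = movie.find("Slow")         # -1
# The methods return new strings; movie itself is unchanged.
print(movie)                         # The Fast And The Furious
```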

We can combine these to do more complex tasks, like removing the first word from a string:

In [16]:
firstspace = movie.find(" ")
movie[firstspace+1:]

'Fast And The Furious'
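The same position from `.find()` works in the other direction too. Slicing *up to* the first space gives the first word itself, while slicing *after* it gives the rest:

```python
movie = "The Fast And The Furious"

firstspace = movie.find(" ")       # 3
first_word = movie[:firstspace]    # 'The'
rest = movie[firstspace + 1:]      # 'Fast And The Furious'
```

This sketch assumes the string contains a space. If it does not, `.find()` returns `-1`, and `movie[:-1]` would quietly drop the last character instead of raising an error, so real code should check for `-1` first.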

The return value of many of these methods is also a string, so they can be chained together:

In [17]:
movie[4:8].upper().replace("A","IR")

'FIRST'
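One practical use of chaining is making a search ignore capitalization. `.count()` is case-sensitive, so a sketch like this lowercases the string first and then counts:

```python
movie = "The Fast And The Furious"

# .count() is case-sensitive, so lowercase "the" is not found...
exact = movie.count("the")             # 0
# ...but lowercasing first makes the count ignore capitalization.
any_case = movie.lower().count("the")  # 2
```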

## Lists

A **list** contains multiple values. When defining one, each entry in the list is separated by commas and the whole list is surrounded by square brackets:

In [18]:
groceries = ["apples", "curry", "yogurt", "bread", "durrian"]

groceries

['apples', 'curry', 'yogurt', 'bread', 'durrian']

Lists are similar to strings in that they have a length and can be indexed and sliced. Note that the length is the *number of items* the list contains, not the number of characters. Similarly, when indexing or slicing, the positions refer to whole items, not to the characters inside each item.

In [19]:
len(groceries)

5

In [20]:
groceries[3]

'bread'
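Slicing a list follows the same inclusive-start, exclusive-end rules as slicing a string, but the result is a shorter list of whole items. A short sketch using the `groceries` list as defined above: because each item is itself a string, you can index twice to reach a single letter.

```python
groceries = ["apples", "curry", "yogurt", "bread", "durrian"]

first_two = groceries[0:2]      # ['apples', 'curry']
from_third = groceries[2:]      # ['yogurt', 'bread', 'durrian']
# groceries[3] is the string 'bread'; [0] then picks its first letter.
first_letter = groceries[3][0]  # 'b'
```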

### List Methods and Properties

Just like strings, lists have many [list methods](https://www.w3schools.com/python/python_ref_list.asp), and some of them work much like their string counterparts. Some useful ones:

* `.count()` works just like the string method, indicating how many items match the requested value.
* `.sort()` sorts the list alphabetically (for strings) or from smallest to largest (for numbers). It works in place: instead of returning a new list, it rearranges the original list stored in the variable.
* `.reverse()` reverses the list's order. Like `.sort()`, it does the reversing in-place.
* `in` is not a method. It tells you whether a list contains a particular value.
* `set()` converts a list into a **set**, which keeps only one copy of each entry and does not preserve their order. This will be useful later, when determining the size of a text's vocabulary.

In [21]:
letters = ["a", "a", "e", "a", "u"]

letters.count("a")

3

In [30]:
list(set(letters))

['e', 'u', 'a']
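The order shown above is not guaranteed, since sets do not keep their items in order. This sketch shows two common things to do with a set: count the distinct items with `len()`, and, when a predictable order matters, turn the set back into a list and sort that list.

```python
letters = ["a", "a", "e", "a", "u"]

unique = set(letters)
distinct_count = len(unique)   # 3: 'a', 'e', and 'u'
has_e = "e" in unique          # True: `in` works on sets too

# A set has no order, so sort a list copy when you need a predictable one.
unique_sorted = list(unique)
unique_sorted.sort()           # ['a', 'e', 'u']
```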

In [22]:
groceries.sort()

groceries

['apples', 'bread', 'curry', 'durrian', 'yogurt']

In [23]:
numbers = [1, 5, 17, 22, 9, 4, 6]

numbers.sort()
numbers.reverse()

numbers

[22, 17, 9, 6, 5, 4, 1]

In [24]:
"apples" in groceries

True

In [25]:
"pizza" in groceries

False
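Because `.sort()` works in place, a common mistake is to store its return value and expect a sorted list. This sketch shows that the return value is `None`. It also shows the `reverse=True` argument to `.sort()`, which produces largest-first order in one step instead of calling `.sort()` and then `.reverse()`.

```python
numbers = [1, 5, 17, 22, 9, 4, 6]

result = numbers.sort()
print(result)    # None: .sort() changes numbers in place and returns nothing
print(numbers)   # [1, 4, 5, 6, 9, 17, 22]

# sort() also accepts reverse=True, which sorts largest-first in one step.
numbers.sort(reverse=True)
print(numbers)   # [22, 17, 9, 6, 5, 4, 1]
```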

## Converting between strings and lists

Strings have a `.split()` method that breaks a string into a list. If you pass it an argument, it splits wherever that string appears; if you pass no argument, it splits on any whitespace (spaces, tabs, and newlines). This is a very simple way to do **tokenization**: splitting a text into words.

In [86]:
movie

'The Fast And The Furious'

In [90]:
movie.split()

['The', 'Fast', 'And', 'The', 'Furious']

In [91]:
len(movie.split())

5
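The difference between splitting with and without an argument matters for messy text. In this sketch, the strings `csv_line`, `messy`, and `"a  b"` are made-up inputs. Note that with no argument, runs of whitespace count as a single break, while an explicit `" "` splits at every individual space. The last line combines tokenization with `set()` to count distinct words.

```python
csv_line = "apples,curry,yogurt"
by_comma = csv_line.split(",")      # ['apples', 'curry', 'yogurt']

# With no argument, runs of spaces, tabs, and newlines count as one break,
# and leading/trailing whitespace is ignored.
messy = "  The   Fast\tAnd\nThe Furious "
words = messy.split()               # ['The', 'Fast', 'And', 'The', 'Furious']

# Passing " " explicitly splits at every single space, leaving empty strings.
doubled = "a  b".split(" ")         # ['a', '', 'b']

# Tokenizing, then converting to a set, counts distinct words.
vocabulary_size = len(set(words))   # 4, because "The" appears twice
```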

There is also a string method called `.join()`, which converts a list of strings back into a single string. The string you call `.join()` on is placed between the items as they are joined together:

In [100]:
names = ["rose", "dorothy", "sophia", "blanche"]

" ".join(names)

'rose dorothy sophia blanche'
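Any string can serve as the separator, including an empty one. For text separated by single spaces, `" ".join()` and `.split()` undo each other, as this short sketch shows:

```python
names = ["rose", "dorothy", "sophia", "blanche"]

with_commas = ", ".join(names)   # 'rose, dorothy, sophia, blanche'
squashed = "".join(names)        # 'rosedorothysophiablanche'

# For single-spaced text, " ".join() and .split() undo each other.
sentence = " ".join(names)       # 'rose dorothy sophia blanche'
round_trip = sentence.split()    # ['rose', 'dorothy', 'sophia', 'blanche']
```

Every item in the list must be a string: `" ".join([1, 2])` raises a `TypeError`.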