# Thurs, 20 Sept. 2018

Continuing text analysis intro. Today's lab is Lipograms.

Some of the examples reuse `getJunkFood()` and `students` from Tuesday's lab:
```python
def getJunkFood(word, symbol='🌮'):
    '''turn word into food emojis'''
    junkfood = symbol * len(word)

    return junkfood

students = [
    'Austin', 
    'Makoto', 
    'Brydie', 
    'Sam', 
    'Anna', 
    'Connor', 
    'Miranda', 
    'Karston', 
    'Alice', 
    'Daniel', 
    'Maya',
]
```

## Working with strings

This first part of the course is largely devoted to textual analysis. We've already seen that Python uses the `str` type to handle text, and we've also used the `len()` function to check the length of a string.

In fact, Python has a whole toolbox of useful tricks that it knows how to do on strings. Just as with lists, many of these actions are **methods** that strings just "know how to do." Keep an eye on the syntax for each.

<div class="alert alert-info" style="margin:1em 2em;">
<strong>Strings and lists are similar in many ways</strong>
<p>
    Like a list, a string is a <strong>sequence</strong>, in this case a sequence of characters. That means that many of the tasks we do with lists have analogues here: for example, checking length, checking membership, and iterating.
</p>
<p>
    One important way in which strings and lists are different is that strings are <strong>immutable</strong>. You can't alter a string once you've created it. However, you can always replace it with an altered copy, which usually amounts to the same thing for us.</p>
</div>
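
To make the comparison concrete, here is a small sketch that runs the same three operations on a list and on a string. The names `names` and `word` are made up for illustration.

```python
# illustrative values, not from the lab
names = ['Austin', 'Makoto', 'Brydie']
word = 'taco'

# checking length works on both
print(len(names))   # number of items in the list
print(len(word))    # number of characters in the string

# checking membership works on both
print('Makoto' in names)
print('c' in word)

# iterating works on both: a string yields one character at a time
letters = []
for char in word:
    letters.append(char)
print(letters)
```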

### Slicing

You can slice strings much like lists. (⚠️ But you can't *assign to* a string slice, because strings are immutable.)

**Examples:**

```python
text = 'dr. strangelove or: how i learned to stop worrying and love the bomb'

# the first four characters
print(text[:4])

# the last 13
print(text[-13:])

# every third character, starting with the second
print(text[1::3])
```
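
The warning about assignment can be checked directly. The sketch below uses a short made-up string, `title`: trying to assign to a slice raises a `TypeError`, while building a new string out of slices works fine.

```python
title = 'dr. strangelove'

# assigning to a slice fails, because strings are immutable
try:
    title[:2] = 'Mr'
except TypeError as err:
    print('not allowed:', err)

# instead, build an altered copy out of slices and rebind the name
title = 'Mr' + title[2:]
print(title)
```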

### Change case

Use the methods `.upper()`, `.lower()`, or `.title()` to return a copy of the string with different case.

```python
print(' original: ' + text)
print()
print('uppercase: ' + text.upper())
print('lowercase: ' + text.lower())
print('titlecase: ' + text.title())
```
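
One common use for these methods is comparing strings without regard to case. A minimal sketch, using two made-up spellings of the same name:

```python
a = 'Strangelove'
b = 'STRANGELOVE'

# a direct comparison is case-sensitive
print(a == b)

# lowercasing both sides first makes the comparison case-insensitive
print(a.lower() == b.lower())
```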

### Remove padding

The method `.strip()` returns a copy of the original string with leading and trailing whitespace removed. By default, that means not only spaces but also <kbd>tab</kbd>, <kbd>return</kbd>, etc.

For example, imagine we've downloaded a text file from Project Gutenberg and it uses spaces to centre text:

```python
chap = '                 CHAPTER ONE                       '
print('original:', len(chap))
print('stripped:', len(chap.strip()))
```
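
If you only want to trim one end, strings also have `.lstrip()` and `.rstrip()`. The sketch below uses a made-up line padded with a tab, spaces, and a newline; `repr()` is used so the invisible characters show up when printed.

```python
# a made-up line padded with a tab, spaces, and a newline
line = '\t  CHAPTER TWO  \n'

print(repr(line.strip()))    # both ends
print(repr(line.lstrip()))   # left end only
print(repr(line.rstrip()))   # right end only

# the original string is unchanged
print(repr(line))
```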

### Search for substrings

There are a couple of ways to ask Python whether one string (a character, a word, or longer...) can be found inside a longer string.

For the following, assume `text` contains the Dr. Strangelove text from above.

**Is the substring in there, or not: `in`**

The operator `in` will return `True` or `False`, depending on whether the first string is a substring of the second:

```python
# prints True
print('love' in text)

# prints False
print('tacos' in text)
```
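
Note that `in` is case-sensitive. A short sketch, again assuming `text` holds the Dr. Strangelove string, which is all lowercase:

```python
text = 'dr. strangelove or: how i learned to stop worrying and love the bomb'

# 'Love' with a capital L does not occur in the all-lowercase text
print('Love' in text)

# lowercasing the search term first finds it
print('Love'.lower() in text)

# `not in` asks the opposite question
print('tacos' not in text)
```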

**Where is it: `.index()`**

Once you know a substring is in there, use `.index()` to return the position of its first character within the longer string.

For example, the first occurrence of `'love'` in `text` begins with the 12th character. Because Python counts from zero, `text.index('love')` returns `11`.

```python
# the first occurrence of 'love' begins with the 12th character.
print(text.index('love'))
```

What's the use of that? Well, one thing you can do is slice the string using that number, in order to, say, extract just the text before or after it. We might do more with this later.

```python
# get the position
i = text.index('love')

# print everything before i
print(text[:i])

# print everything from i on
print(text[i:])
```
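
`.index()` raises a `ValueError` when the substring is not there, which is why it helps to check with `in` first. Here is a sketch of that guard, using a hypothetical helper named `text_before()` (not part of the lab):

```python
text = 'dr. strangelove or: how i learned to stop worrying and love the bomb'

def text_before(haystack, needle):
    '''return the part of haystack before needle (hypothetical helper)'''
    if needle in haystack:
        i = haystack.index(needle)
        return haystack[:i]
    # needle is absent: return the whole string unchanged
    return haystack

print(text_before(text, 'love'))
print(text_before(text, 'tacos'))
```

The related method `.find()` behaves like `.index()` but returns `-1` instead of raising an error when nothing is found.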

**How many times: `.count()`**

To count how many times a certain string occurs inside another, use `.count()`:

```python
paris = 'paris in the the the the the the the the spring'
print(paris.count('the'))
```
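
A few details of `.count()` matter for text analysis: it matches substrings rather than whole words, it is case-sensitive, and it does not count overlapping matches. The strings below are made up to show each point.

```python
# substrings inside longer words are counted too
phrase = 'the other theory'
print(phrase.count('the'))

# counting is case-sensitive, so lowercase first if case shouldn't matter
line = 'The cat and the hat'
print(line.count('the'))
print(line.lower().count('the'))

# occurrences are counted without overlap
print('aaaa'.count('aa'))
```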

### Replace substrings

You can use the `.replace()` method to return **a copy of the string** with custom edits. `.replace()` takes two arguments: the substring to remove and the substitution to insert in its place. By default, it will replace every occurrence.

⚠️ This **does not** alter the original string.

**Examples**

```python
# replace characters
print(text.replace('o', 'x'))

# remove chars by replacing with ''
print(text.replace(' ', ''))

# replace words (or longer)
print(text.replace('love', '----'))
```
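
Because `.replace()` returns a copy, you need to assign the result to a name if you want to keep it. The sketch below also shows the optional third argument, which limits how many occurrences are replaced; the names `edited` and `first_only` are made up for illustration.

```python
text = 'dr. strangelove or: how i learned to stop worrying and love the bomb'

# calling .replace() on its own leaves the original alone
text.replace('love', 'hate')
print(text)

# to keep the change, assign the result to a name
edited = text.replace('love', 'hate')
print(edited)

# an optional third argument limits how many occurrences are replaced
first_only = text.replace('love', 'hate', 1)
print(first_only)
```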

### Check whether the string matches simple patterns

Use `.startswith()` and `.endswith()` to check whether a string matches some simple pattern. These methods return `True` or `False`, so they're useful in loops and `if` statements.

<div class="alert alert-info" style="margin: 1em 2em;">
One practical use of these methods is checking a list of filenames to see which ones have a particular file extension (e.g. <code>.txt</code>, <code>.docx</code>) or prefix (e.g. <code>2018-09-</code> or <code>vergil.aeneid_</code>).
</div>
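
As a rough sketch of the filename use case, the loop below keeps only the names that have both a given prefix and a given extension. The list `filenames` is invented for illustration.

```python
# made-up filenames for illustration
filenames = [
    '2018-09-18_notes.txt',
    '2018-09-20_notes.txt',
    '2018-10-02_notes.docx',
    'vergil.aeneid_01.txt',
]

# keep only the September 2018 text files
matches = []
for name in filenames:
    if name.startswith('2018-09-') and name.endswith('.txt'):
        matches.append(name)

print(matches)
```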

**Check endings**

```python
# filter out students whose names don't end in 'a'
for student in students:
    if student.endswith('a'):
        print(student)
    else:
        print(getJunkFood(student, '🥦'))
```

**Check beginnings**

```python
# filter out students whose names don't start with 'M'
for student in students:
    if student.startswith('M'):
        print(student)
    else:
        print(getJunkFood(student, '🍕'))
```

**Check middles**

If you're looking for something in the middle, use `in`:

```python
# filter out students whose names don't contain an 'o'
for student in students:
    if 'o' in student:
        print(student)
    else:
        print(getJunkFood(student, '🍦'))
```