# Ranges, Loops, and String Manipulation
In today's lecture, we will discuss the usefulness of `for` loops (aka "iteration") in Python and practice using them with ranges. We will also spend some time learning the basics of string manipulation.

In [None]:
# run this cell; don't worry about what it does yet.
from datascience import *
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style('darkgrid')
%matplotlib inline
import string

## Ranges
We already know how to create arrays using `make_array()`.

In [None]:
make_array(1, 2, 3, 4)

The function `np.arange()` returns an array of evenly spaced values, which is useful for making an array that counts up from a particular number. If you give it one numerical argument `n`, it creates an array that starts at 0 and ends *before* `n`.

In [None]:
np.arange(5) # note that it starts at zero!

If you provide `np.arange()` with two arguments, `m` and `n`, it will create an array that begins at `m` and ends *before* `n`.

In [None]:
np.arange(1, 5)

In [None]:
np.arange(10, 50)

In [None]:
np.arange(10, 5)
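`np.arange(10, 5)` returns an empty array, because counting up from 10 never reaches a stop value below it. `np.arange()` also accepts an optional third argument, the step size, which is what makes skipping values or counting down possible. A short sketch (the variable names are just illustrative):

```python
import numpy as np

# A third argument sets the step between consecutive values.
evens = np.arange(0, 10, 2)       # 0, 2, 4, 6, 8 (10 is excluded)

# Counting up from 10 can never reach a stop value below it, so this is empty.
nothing = np.arange(10, 5)

# A negative step counts down instead; the stop value is still excluded.
countdown = np.arange(10, 5, -1)  # 10, 9, 8, 7, 6

print(evens)
print(nothing)
print(countdown)
```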

## `for` Loops and Iteration
The control statement `for` is used to "loop" over an array of values. (We will learn about other control statements later on in the course.) It will run the lines within the loop as many times as there are values in the array. (To "iterate" means to repeat; the lines within the loop are repeated again and again.)

In [None]:
np.arange(5)

In `for` loop syntax, it is important to remember the colon `:` at the end of the first line and the indentation of every line inside the loop (the Python convention is four spaces).

In [None]:
for i in np.arange(5):
    print("This is loop number",i)

Inside the loop, you can do anything you like with the loop variable `i`.

In [None]:
for i in np.arange(5):
    print(i * 3)

The loop variable can have any valid name, but choose one that describes what it holds.

In [None]:
for eachnumber in np.arange(5):
    print(eachnumber * 3)
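A common use of this freedom is building up a running total: create a variable before the loop, then add the current value to it on every pass. A minimal sketch (the name `total` is an illustrative choice):

```python
import numpy as np

total = 0                           # start the running total before the loop
for eachnumber in np.arange(5):
    total = total + eachnumber      # add the current value on every pass
print(total)                        # 0 + 1 + 2 + 3 + 4
```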

You can also *nest* loops within each other.

In [None]:
for i in np.arange(4):
    for j in np.arange(2):
        print(i, j)

`for` loops can also run through arrays of values that aren't numbers, such as strings.

In [None]:
names = make_array("Aang","Katara","Sokka","Toph")
for n in names:
    print(n,"is a hero.")
print("Zuko is a little snot.")

## Using iteration to populate an array
You can use `for` loops to add items to an array (or a list). The first step is usually to create an empty array.

In [None]:
empty = make_array()
empty

The function `np.append()` returns a new array with an item added to the end. Its first argument is the array; the second argument is the item to be added. It does not change the original array, which is why we assign the result back to `empty`.

In [None]:
empty = np.append(empty, "No longer empty")
empty

Now, we simply put the `np.append()` line inside of the loop.

In [None]:
for name in names:
    empty = np.append(empty, name)
empty
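The same pattern works for values computed inside the loop. The sketch below builds an array of squares one element at a time; here `np.array([])` plays the role of `make_array()` so the cell runs with `numpy` alone, and `squares` is an illustrative name:

```python
import numpy as np

squares = np.array([])                  # an empty array to fill
for i in np.arange(5):
    # np.append returns a new, longer array, so we reassign the result
    squares = np.append(squares, i ** 2)
print(squares)
```

Because the starting array is empty and holds floats, the squares are stored as floats (`0.`, `1.`, `4.`, ...). For simple arithmetic like this, `np.arange(5) ** 2` computes the same values in one step; the loop is the more general tool.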

## Working with Strings
Linguistic data isn't always made up of numbers and measurements. Today, we are going to look at the analysis of words and writing. To do so, we need to learn how to use the built-in methods of Python's string (`str`) objects.

### Indexing strings
Strings are not arrays, but they are *sequences* of characters, which means they can be indexed and sliced the same way that `numpy` arrays can be:

In [None]:
a = make_array('dog', 'cat', 'fish', 'bird')
print(a[0])

In [None]:
print(a[:1]) # same as print(a[0:1])
print(a[0:1])
print(a[2:])

In [None]:
s = 'This is a string'

In [None]:
print(s[3])

In [None]:
print(s[:9])

In [None]:
print(s[1:])

In [None]:
print(s[1::2])
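Indices can also be negative, counting back from the end of the string, and the third slice value is a step, just as with `np.arange()`. A few more examples on the same string `s`:

```python
s = 'This is a string'

last_char = s[-1]      # negative indices count from the end: 'g'
last_word = s[-6:]     # the last six characters: 'string'
reversed_s = s[::-1]   # a step of -1 walks backwards through the string

print(last_char)
print(last_word)
print(reversed_s)
```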

### Removing leading and trailing whitespace
*Whitespace* characters (spaces, tabs, newlines, etc.) at the start and end of a string can be removed with `s.strip()`:

In [None]:
s = "  Who still uses two spaces after sentence-end punctuation?  "
s

In [None]:
s.strip()

`s.lstrip()` and `s.rstrip()` strip only the left-hand or right-hand side, respectively:

In [None]:
s.lstrip()

In [None]:
s.rstrip()

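**Note: all string methods return *new* string objects, because strings are *immutable*, i.e., they can't be changed in place. Arrays, by contrast, are mutable: you can assign a new value to one of their elements.**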

In [None]:
a[1] = 'hat'
a

In [None]:
s

In [None]:
s[5] = 'g' # this throws a TypeError

In [None]:
q = 'Lots of trailing whitespace.     '
q

In [None]:
q.rstrip()

In [None]:
q
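Because `q.rstrip()` returns a new string and leaves `q` unchanged, you keep the result by assigning it to a variable, often the same name. In this sketch `repr()` is used only to make the trailing spaces visible:

```python
q = 'Lots of trailing whitespace.     '

q.rstrip()         # the stripped copy is created and then discarded
print(repr(q))     # q still has its trailing spaces

q = q.rstrip()     # reassign the name to the new, stripped string
print(repr(q))
```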

### Case operations

In [None]:
s1 = 'dog'
s2 = 'CAT'

`s.lower()` returns a new string which is all lowercase, and `s.upper()` returns a new string which is all uppercase:

In [None]:
print(s2.lower())

In [None]:
print(s1.upper())

`s.capitalize()` returns a new string in which the first character of `s` is uppercase and the rest are lowercase:

In [None]:
print(s1.capitalize())
print(s2.capitalize())
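Case methods are handy for comparing words regardless of capitalization: string comparison is case-sensitive, so convert both strings to the same case first. A small sketch with illustrative names `w1` and `w2`:

```python
w1 = 'dog'
w2 = 'DOG'

print(w1 == w2)                  # False: comparison is case-sensitive
print(w1.lower() == w2.lower())  # True: both become 'dog' first
```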

### Replacing a single substring
`str.replace(old, new)` is a method that returns a new string with substring `old` replaced by `new`.

In [None]:
text1 = 'I drink soup'
text1_new = text1.replace('drink', 'eat')
text1_new

By default it replaces all occurrences of `old` with `new`. The optional third argument `count` replaces only the first `count` occurrences of `old`.

In [None]:
text2 = 'I drink corn chowder. I drink miso soup. Is cereal soup?'
text2.replace('drink', 'eat')

In [None]:
text2.replace('drink', 'eat', 1)

### Replacing multiple substrings
There are various ways to replace multiple substrings. The simplest is to iterate over them and call `s.replace()` on each one:

In [None]:
text3 = 'I ship Berkeley and Stanford.'
old_words = make_array('Berkeley', 'Stanford')
for word in old_words:
    text3 = text3.replace(word, 'you')
text3
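When each old word needs a *different* replacement, one option is to store the pairs in a dictionary and loop over them. Dictionaries have not been covered yet in this lecture, so treat this as a preview sketch; the `replacements` name and word pairs are illustrative:

```python
text = 'I drink corn chowder. I drink miso soup.'

# Map each old substring to its replacement (illustrative pairs).
replacements = {'drink': 'eat', 'corn': 'clam'}

for old, new in replacements.items():
    text = text.replace(old, new)   # each pass replaces one substring everywhere
print(text)
```

The replacements happen one after another, so if a new word contains a later old word, it will be replaced again; order the pairs with that in mind.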

### Finding the index of a value in a string
Two methods find the position of a substring in a string: `s.index(value)` and `s.find(value)`. Both return the index at which the first occurrence of `value` begins:

In [None]:
text4 = "I'm looking for the Avatar."

In [None]:
text4.index('Avatar')

In [None]:
print(text4[20:])

In [None]:
text4.find('Avatar')

The only difference between them is what happens when the value does not exist in the string: `s.find()` returns `-1`, while `s.index()` throws a `ValueError`.

In [None]:
text4.find('Aang')

In [None]:
text4.index('Aang')

This is different from the method `.count()`, which counts the (non-overlapping) occurrences of a substring within a string. Note that the count is case-sensitive:

In [None]:
text4.count('Avatar')

In [None]:
text4.count('a')
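Because `.count()` is case-sensitive, lowercasing the string first lets you count `a` and `A` together. Separately, `.find()` accepts an optional start position, `s.find(value, start)`, for searching past an earlier match. A short sketch on `text4`:

```python
text4 = "I'm looking for the Avatar."

print(text4.count('a'))          # 2: only the lowercase a's in 'Avatar'
print(text4.count('A'))          # 1: the capital A
print(text4.lower().count('a'))  # 3: lowercase first to count both

print(text4.find('o'))           # 5: the first 'o', in 'looking'
print(text4.find('o', 7))        # 13: the search starts at index 7, so it finds the 'o' in 'for'
```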

### String constants
The `string` module, which you loaded in the very first cell of this notebook, contains various useful predefined constants. (A module is a file of Python code that you can import; a package such as `numpy` is a larger collection of modules.) For example, `string.ascii_letters` contains all the lowercase and uppercase [ASCII letters](https://en.wikipedia.org/wiki/ASCII):

In [None]:
string.ascii_letters

`string.punctuation` contains the ASCII punctuation marks:

In [None]:
string.punctuation

`string.whitespace` contains the ASCII whitespace characters: the space, tab `\t`, newline `\n`, carriage return `\r`, vertical tab `\x0b`, and form feed `\x0c`. Codes that begin with the backslash `\` are called "escape sequences".

In [None]:
string.whitespace

In [None]:
for space in string.whitespace:
    print('a'+space+'b')
    print('***')

These last two should be self-explanatory.

In [None]:
print(string.ascii_lowercase)
print(string.ascii_uppercase)
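These constants combine well with the loop-and-replace pattern from earlier. A minimal sketch that strips ASCII punctuation from a made-up sentence:

```python
import string

sentence = "Wait... what? It's the Avatar!"

# Replace every ASCII punctuation mark with the empty string, one at a time.
cleaned = sentence
for mark in string.punctuation:
    cleaned = cleaned.replace(mark, '')
print(cleaned)
```

Note that the apostrophe is punctuation too, so "It's" becomes "Its"; whether that is acceptable depends on the analysis. For long texts, `sentence.translate(str.maketrans('', '', string.punctuation))` does the same deletion in a single call.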

### Split and join
Two invaluable string methods are `.split(sep)` and `.join(iterable)`.

In [None]:
poem = 'This is a haiku.\nThis creates a second line.\n\tIndent the third line.\n\nBashō'
print(poem)

The method `.split()` returns a `list` of the substrings in the original string. By default, it splits at any run of whitespace characters (mostly spaces `' '`, tabs `'\t'`, and newlines `'\n'`).

In [None]:
poem.split()

Pass the argument `sep` to change where the method splits the string. In the code below, the poem contains only one tab, so it is split into two strings: the text before `\t` and the text after it.

In [None]:
poem.split('\t')

If you want to be weird about it, go ahead:

In [None]:
poem.split('i')

`.join()` does roughly the opposite of `.split()`. It takes an iterable of strings, such as an array of words, and joins them into one string, using the string it is called on as the separator. For example, `a.join(b)` takes the strings inside `b` (an array) and joins them into one string, separated by the value of `a`. Usually, `a` is a whitespace character, although it can be any string (every item in `b` must itself be a string):

In [None]:
b = make_array('F', 'R', 'I', 'E', 'N', 'D', 'S')
j = '-'.join(b)
print(j)
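Because `.split()` with no argument breaks on any run of whitespace, pairing it with `' '.join()` is a quick way to collapse tabs, newlines, and repeated spaces into single spaces. A sketch using the haiku from above:

```python
poem = 'This is a haiku.\nThis creates a second line.\n\tIndent the third line.\n\nBashō'

# split() with no argument breaks on any run of whitespace...
words = poem.split()
# ...and ' '.join() glues the pieces back together with single spaces.
one_line = ' '.join(words)
print(one_line)
```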

### String formatting
String formatting with the `.format()` method is useful when you are trying to combine non-string Python types with a string. The following cell will throw an error:

In [None]:
age = 31
intro = "My name is Stevonnie, and I am " + age + " years old." # raises a TypeError

This is because you cannot "add" a string and an integer, so Python raises a `TypeError`. You already know that the `print()` function converts everything into a string for you, but sometimes we want to build a formatted string without printing it.

In your string, use the placeholder `{}` to mark where the non-string value should go. Then call the string's `.format()` method with that value as its argument; the placeholder is replaced by the value, converted to a string.

In [None]:
intro = "My name is Stevonnie, and I am {} years old."
intro.format(age)

You can use as many placeholders as you like. `.format()` fills them in order, so you must supply at least as many values as there are placeholders; too few raises an `IndexError` (and any extra values are ignored):

In [None]:
name = 'Stevonnie'
age = 31
pronouns = 'they/them'

intro = "My name is {}, I am {} years old, and my pronouns are {}."
intro.format(name, age, pronouns)

In [None]:
intro.format(name, age)

Also be careful about the order of your values, since they fill the placeholders from left to right:

In [None]:
intro.format(age, pronouns, name)
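To avoid ordering mistakes, placeholders can carry a position number or a name. Python 3.6 and later also support f-strings, which put the variables directly inside the braces. A sketch of all three, reusing the variables above:

```python
name = 'Stevonnie'
age = 31
pronouns = 'they/them'

# Numbered placeholders pick arguments by position, so the argument order no longer matters.
numbered = "My name is {2}, I am {0} years old, and my pronouns are {1}."
print(numbered.format(age, pronouns, name))

# Named placeholders are matched by keyword instead of by position.
named = "My name is {n}, and I am {a} years old."
print(named.format(n=name, a=age))

# An f-string (note the f before the quote) evaluates the names inside the braces.
print(f"My name is {name}, and I am {age} years old.")
```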

### String operations and tables
Lastly, string operations can be applied to the values in a `Table`, which will come in handy for our linguistic analysis. Using only the `Table` operations we have seen so far, running a string operation on every row means extracting the column as an array and using a `for` loop to go through each item. For example:

In [None]:
t = Table().with_columns('mons', make_array('Pikachu,Electric', 'Shroomish,Grass', 'Pangoro,Fighting/Dark', 'Magikarp,Water'))
t

In [None]:
t.column('mons')

In [None]:
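mons_split = make_array()
for item in t.column('mons'):
    m = item.split(',')                    # e.g. ['Pikachu', 'Electric']
    mons_split = np.append(mons_split, m)  # np.append flattens m into the array
print(mons_split)

nom = mons_split[0::2]  # even positions hold the names
typ = mons_split[1::2]  # odd positions hold the types

print(nom)
print(typ)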

In [None]:
t = t.with_columns('nom', nom, 'typ', typ)
t

This method works well on a clean `Table` with no missing or differently formatted data, but you might notice ways in which the code could "break": for example, if a value in the column `mons` has no comma, the names and types would no longer alternate. There is an easier approach, however: the `.apply(fn, column_or_columns)` method, which applies the function `fn` to each cell of the column(s) specified by `column_or_columns`.

Actually, this method is both simpler and more complex. It's more complex because you will have to create a custom function. It's simpler because after you create your custom function, you won't have to create separate arrays and append them back to your table. For the time being, I'll create the custom function for you; in Module 5, we will learn more about creating our own custom functions.

In [None]:
def comma_split(s):
    '''This custom function will split a string by the value ",".'''
    return s.split(',')

In [None]:
x = "It's time to eat, children!"
print(comma_split(x)[0])
print(comma_split(x)[1])

*Note the leading whitespace at the start of the string " children!".*

Now, we will use the method `.apply()` to apply this function to every row of a `Table`. The first argument is the function itself, `comma_split` (without parentheses, since we are passing the function rather than calling it). The second argument is the column we want to apply it to, `'mons'`.

In [None]:
u = t.apply(comma_split,'mons')
print(u)
print(type(u))

In [None]:
type(t.column('mons'))

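As you can see, `.apply()` has returned the data in `mons` as an array. Because each call to `comma_split()` returns two strings, it is an array of arrays (a two-dimensional array), with one inner array per row of the table.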
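Conceptually, `.apply(fn, column)` calls `fn` once per value in the column and collects the results, much like the `for` loop earlier in this section. The sketch below imitates that with plain Python lists so it runs without the `datascience` library; `apply_to_values` is a hypothetical helper written for illustration, not part of any library:

```python
def comma_split(s):
    '''This custom function will split a string by the value ",".'''
    return s.split(',')

def apply_to_values(fn, values):
    '''Hypothetical helper: call fn on each value and collect the results in a list.'''
    results = []
    for v in values:
        results.append(fn(v))
    return results

mons = ['Pikachu,Electric', 'Shroomish,Grass', 'Pangoro,Fighting/Dark', 'Magikarp,Water']
pairs = apply_to_values(comma_split, mons)
print(pairs)
print(pairs[0][0], pairs[0][1])  # the first row's name and type
```

The real `.apply()` additionally wraps the results in a NumPy array, which is why `type(u)` above reports an array rather than a list.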