# Sequences

<style>
section.present > section.present { 
    max-height: 90%; 
    overflow-y: scroll;
}
</style>

<small><a href="https://colab.research.google.com/github/brandeis-jdelfino/cosi-10a/blob/main/lectures/notebooks/7_sequences.ipynb">Link to interactive slides on Google Colab</a></small>

# Sequence types

We've looked at a few **sequence types** already - can you guess which types are sequence types?

* **Sequence types** are data types that represent sequences of things. 
  * `str` - sequence of characters
  * `list` - sequence of values
  * `range` - (as in: `for i in range(5)`!) generated sequence of integers 
  * `tuple` - coming soon, similar to `list`
  
Today, we'll learn about tuples, some common operations that work on some/all sequence types, and more about strings.

# Tuples

Tuples are a sequence type. They are almost the same as lists, with a few small differences:
* Tuples are created and represented with parentheses `(` `)` instead of brackets `[` `]`
* Tuples are **immutable**. This means they can't change after creation.

In [None]:
i_am_a_tuple = (1,2,3)
print(i_am_a_tuple)

In [None]:
im_also_a_tuple = (7,)
print(im_also_a_tuple)

In [None]:
# You don't strictly need the parentheses, although it's often clearer to include them
hey_wait = 7,2
print(hey_wait)

Tuples are **immutable**. This means they can't change after creation.

You can't add, remove, or change the items a tuple contains.

In [None]:
flavors = ("chocolate", "vanilla", "strawberry")
flavors[1] = "coffee"
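
Running the cell above stops with a `TypeError`. Here is a minimal sketch of the same assignment with the error caught and printed. The `try`/`except` wrapper and the `error_message` variable are additions for illustration only, so that the example runs to completion.

```python
flavors = ("chocolate", "vanilla", "strawberry")

try:
    flavors[1] = "coffee"  # tuples don't support item assignment
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)
print(flavors)  # the tuple is unchanged
```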

You can convert back and forth between lists and tuples, similar to converting between `int`, `float`, and `str`:

In [None]:
flavors = ("chocolate", "vanilla", "strawberry")
print(flavors)


In [None]:
new_flavors = list(flavors)
new_flavors[1] = "coffee"
print(new_flavors)

In [None]:
flavors = tuple(new_flavors)
print(flavors)

## Returning multiple values with tuples

A common use for tuples is to easily return multiple values from a function.

Here's a function that takes a list of names (strings) and returns a 2-tuple containing the longest name and its length:

In [None]:
def longest_name(names):
    max_name = ""
    for name in names:
        if len(name) > len(max_name):
            max_name = name
    return max_name, len(max_name)

In [None]:
longest_name(["Bill Murray", "Spongebob", "Batman"])

## Another example of multiple return values

From last lecture:

In [None]:
def find_average(nums):
    total = 0  # named `total` so the built-in sum() isn't shadowed
    for num in nums:
        total = total + num
    avg = total / len(nums)
    return avg

In [None]:
def below_average(nums):
    avg = find_average(nums)
    
    print("The average is: " + str(avg) + ". Here are all the below average numbers: ")
    for num in nums:
        if num < avg:
            print(num, end=' ')

Let's modify `below_average` to return 2 things: the average, and a list of things below the average.

In [None]:
def below_average(nums):
    avg = find_average(nums)
    
    belows = []
    for num in nums:
        if num < avg:
            belows.append(num)
    return (avg, belows)

In [None]:
answers = below_average([1, 10000, 2, 7, 11000])
print("The average is: " + str(answers[0]) + " and the below average numbers are: " + str(answers[1])) 

## Side note: Value unpacking

That last example was kind of awkward - we assigned to `answers`, then had to access each part of it by an index. We can use something called "value unpacking" to improve this. 

**Value unpacking** allows you to assign multiple values to multiple variables at once. 

In [None]:
# without value unpacking
answers = below_average([1, 10000, 2, 7, 11000])
print("The average is: " + str(answers[0]) + " and the below average numbers are: " + str(answers[1]))

In [None]:
# with value unpacking
avg, below_avg_nums = below_average([1, 10000, 2, 7, 11000])
print("The average is: " + str(avg) + " and the below average numbers are: " + str(below_avg_nums))

Value unpacking works with any sequence type:

In [None]:
names = ["Larry", "Moe", "Curly"]
stooge1, stooge2, stooge3 = names
print("Stooge 1: " + stooge1 + ", stooge 2: " + stooge2 + ", stooge 3: " + stooge3)

In [None]:
a, b, c = "xyz"
print("a: " + a + "; b: " + b + "; c: " + c)

The assignment will raise an error if the number of variables on the left doesn't match the length of the sequence being unpacked.

In [None]:
a, b, c = (1, 2)

In [None]:
a, b = (1, 2, 3)
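
Both cells above fail with a `ValueError`. The sketch below catches the error so you can compare the two messages side by side. `try_unpack_three` is a hypothetical helper written just for this illustration.

```python
def try_unpack_three(values):
    """Hypothetical helper: return the error message, or None if unpacking worked."""
    try:
        a, b, c = values
    except ValueError as err:
        return str(err)
    return None

print(try_unpack_three((1, 2, 3)))     # exactly three values: no error
print(try_unpack_three((1, 2)))        # too few values
print(try_unpack_three((1, 2, 3, 4)))  # too many values
```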

# `range` revisited

`range` is actually a sequence type that generates integers:

In [None]:
range(5)

In [None]:
list(range(5))

Note the difference! `range` is **not** a `list`! It is a sequence though.
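
As a quick sketch of what "sequence, but not a list" means in practice, a `range` supports length, indexing, and membership tests just like a list does, yet it never compares equal to a list:

```python
r = range(5)

print(len(r))   # ranges know their length
print(r[2])     # ranges support indexing
print(3 in r)   # and membership tests

print(r == [0, 1, 2, 3, 4])        # a range is not equal to a list...
print(list(r) == [0, 1, 2, 3, 4])  # ...until you convert it to one
```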

In [None]:
for i in range(5):
    print(i)

behaves the same as:

In [None]:
for i in [0,1,2,3,4]:
    print(i)

# `for` revisited

It turns out that `for i in range(...):` is actually just a special case of the general `for` loop form:

```
for <var> in <sequence>:
   statement(s)
```

A `for` loop can iterate over any **sequence**. It executes the code block once for each item in the sequence. 

This means you can use `for` loops on strings, lists, tuples, and ranges.


In [None]:
for c in "Hello, world!":
    print(c)

# Iterables

An **iterable** object is an object capable of returning its members one at a time. 

All sequence types are **iterables**. Other types can be iterable too. 

An **iterator** is an object you use to iterate over an **iterable**.

Some examples of non-sequence iterables:
* Dictionaries (a data type we'll see in a few lectures)
* Reading a file line-by-line
* A "generator" - an iterator that generates a sequence on-demand, as each item is requested.

The details of iterables and iterators are beyond the scope of this class. The important things to know for now are:
* **Iterables** are objects that can return their members one at a time
* All iterables can be iterated over in a `for` loop.
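
If you're curious, here is a rough sketch of what "returning members one at a time" looks like, using the built-in functions `iter()` and `next()`. You won't normally write this by hand, since a `for` loop does it for you. The variable names are just for illustration.

```python
letters = ["a", "b", "c"]        # a list is an iterable

letter_iterator = iter(letters)  # ask the iterable for an iterator
first = next(letter_iterator)    # each next() call returns the next member
second = next(letter_iterator)
third = next(letter_iterator)
print(first, second, third)

# When nothing is left, next() raises StopIteration.
# A for loop watches for this and ends the loop cleanly.
try:
    next(letter_iterator)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)
```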

If you want to learn more about this topic, this is a decent introductory resource: [Towards Data Science](https://towardsdatascience.com/python-basics-iteration-and-looping-6ca63b30835c). 

# `for` revisited, again

Last time, I promise. 

```
for <var> in <iterable>:
   statement(s)
```

`for` loops can loop over anything that is an **iterable**. This covers more than just **sequences**.
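
Here is a small sketch of a `for` loop over two iterables that are not sequences: a dictionary and a generator. The names and values are made up for illustration, and dictionaries are covered properly in a later lecture.

```python
# Looping over a dictionary yields its keys.
ages = {"Larry": 40, "Moe": 42}
seen_names = []
for name in ages:
    seen_names.append(name)
print(seen_names)

# A generator expression produces each item on demand.
squares = (n * n for n in range(4))
seen_squares = []
for sq in squares:
    seen_squares.append(sq)
print(seen_squares)
```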

# Common sequence operations

Most of these work on all sequence types! We've seen some of them before on strings or lists. They also work on tuples, and on ranges too, except for `+` and `*`, which ranges don't support.

| Operation | Description | Example | Result |
| --- | :--- | --- | :--- |
| `x in s` | True if an item of s is equal to x | `"hi" in ["hello", "hi", "yo"]` | True |
| `x not in s` | Opposite of `in` | `"hi" not in ["hello", "hi", "yo"]` | False |
| `s + t` | Concatenate (combine end-to-end) | `[1,2,3] + [4,5]` | `[1,2,3,4,5]` |
| `s * n` | Replicate | `"abc" * 3` | `"abcabcabc"` |
| `s[i]` | *i*th item of `s` | `"hello"[1]` | `"e"` |
| `s[i:j]` | slice of `s` from `i` up to (but not including) `j` | `"hello"[1:3]` | `"el"` |
| `len(s)` | length of `s` | `len("hello")` | `5` |
| `min(s)` | The smallest item of `s` | `min([4, 8, 7, 3])` | `3` |
| `max(s)` | The largest item of `s` | `max([4, 8, 7, 3])` | `8` |
| `s.index(x)` | index of the first occurrence of `x` in `s` | `[4, 8, 7, 3].index(7)` | `2` |
| `s.count(x)` | total number of occurrences of `x` in `s` | `"hello".count("l")` | `2` |
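
The examples in the table use strings and lists. As a sketch, here are the same operations applied to a tuple and a range (the variable names are only for illustration):

```python
point = (4, 8, 7, 3)

print(7 in point)                          # membership
print(point + (1, 2))                      # concatenation makes a new tuple
print(point * 2)                           # replication
print(point[1:3])                          # slicing stops before index 3
print(len(point), min(point), max(point))
print(point.index(7), point.count(8))

numbers = range(5)
print(3 in numbers, len(numbers), numbers[1], numbers.index(4))
print(list(numbers[1:3]))                  # slicing a range gives another range
```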

# Exercise

Write a program that prompts the user for 5 integers, then prints out:
1. The largest integer they entered
2. The smallest integer they entered
3. Whether `42` is in the list of integers they entered
4. How many times they repeated the first integer they entered
  * e.g. if they entered: 4, 3, 4, 4, 1, then the answer would be 3, because they entered the number 4 three times.
  
Hint: Use a list to store the integers, then use the sequence operations for each task.
  
## [Repl.it: Sequences Playground](https://replit.com/team/cosi-10a-fall23/Sequence-playground)

# More on Strings

In addition to the common sequence operations, strings provide many other helpful methods.

The full list can be found in the [official documentation on strings](https://docs.python.org/3/library/stdtypes.html#string-methods). 

We'll look through a few notable ones now.

## split

`s.split(<delim>)` splits a string into a list. By default it splits at any whitespace, or you can tell it which character(s) to split at:

In [None]:
sentence = "This is a sentence with lots of words"
sentence.split()

In [None]:
sentence = "This,is a sentence,with,lots of words"
sentence.split(",")

## join

`s.join(<sequence of strings>)` joins a sequence of strings together into a single string, with `s` between each string.

In [None]:
' '.join(["some", "words", "to", "stitch", "together"])

In [None]:
'%T%'.join(["some", "words", "to", "stitch", "together"])

## isdigit

`s.isdigit()` returns `True` if all characters in the string are digits and there is at least one character, `False` otherwise.

In [None]:
'12345678'.isdigit()

In [None]:
'123abc'.isdigit()
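
One place `isdigit` comes in handy is checking text before converting it with `int()`. This is a minimal sketch, assuming ordinary digits `0`-`9`. `parse_whole_number` is a hypothetical helper, not something built into Python.

```python
def parse_whole_number(text):
    """Hypothetical helper: return text as an int, or None if it isn't all digits."""
    if text.isdigit():
        return int(text)
    return None

print(parse_whole_number("12345678"))
print(parse_whole_number("123abc"))
print(parse_whole_number(""))    # empty string: isdigit() is False
print(parse_whole_number("-5"))  # the minus sign is not a digit
```

Note the last case: because `"-"` is not a digit, this check rejects negative numbers.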

## Changing case

`s.upper()` and `s.lower()` return upper-case and lower-case copies of a string. The original string is not changed.

In [None]:
'SpongeBob SquarePants'.upper()

In [None]:
'SpongeBob SquarePants'.lower()

## startswith / endswith

`s.startswith(<search string>)` and `s.endswith(<search string>)` return `True` if the string starts/ends with the given substring:

In [None]:
'SpongeBob SquarePants'.startswith("Sponge")

In [None]:
# case matters!
'SpongeBob SquarePants'.endswith("pants")
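
If you don't want case to matter, one common approach is to convert the string to a single case before checking. A small sketch combining the methods above:

```python
title = "SpongeBob SquarePants"

print(title.endswith("pants"))             # exact case: no match
print(title.lower().endswith("pants"))     # compare in lower case: match
print(title.upper().startswith("SPONGE"))  # or compare in upper case
```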

# String formatting

Let's look at [**f-strings**](https://docs.python.org/3/tutorial/inputoutput.html#formatted-string-literals), a more flexible, usable way to format strings for output.

Adding an `f` before a string lets you include the value of Python expressions inside the string. Here's an example:

In [None]:
name = "Spongebob"
age = 72
foods = ["Cake", "Pie", "Peanut Butter"]

In [None]:
# Old way:
print("Hi, I am " + name + ", I'm " + str(age) + " years old, and I like " + str(len(foods)) + " foods: " + ', '.join(foods) + ".")

In [None]:
# Using f-strings
print(f"Hi, I am {name}, I'm {age} years old, and I like {len(foods)} foods: {', '.join(foods)}.")

Note that we were able to include an integer (`age`) without an explicit type conversion. 

This makes printing **much** easier: no more awful lines like this: `print("Some text: " + str(some_number) + ".")`

One more formatting trick: to control the number of decimal places, add `:.<num decimals>f` after an expression in an f-string.

For example:

In [None]:
fraction = 1/3
print(fraction)
print(f"{fraction:.2f}")

The [official documentation on string formatting](https://docs.python.org/3/reference/lexical_analysis.html#f-strings) is quite dense. Try [fstring.help](https://fstring.help/) for a deeper tutorial, or [fstring.help/cheat](https://fstring.help/cheat/) for a quick cheat sheet.

# Exercise

Write a word guessing game where the user guesses one letter at a time to try to guess a secret word. Show the user's progress between each guess.

## Functional decomposition

This time, let's write the code from the outside in:

In [None]:
def play_game():
    word = "secret"
    progress = ["_", "_", "_", "_", "_", "_"]
    while True:
        display_progress(progress)
        guess = get_guess()
        process_guess(guess, word, progress)

We've written the outer code, and now we have an idea of which helper functions will be most useful.

In [None]:
def display_progress(progress):
    for p in progress:
        print(p, end=" ")
       

In [None]:
display_progress(["_", "a", "b", "_"])

Another way, using the string method `join` that we just saw:

In [None]:
def display_progress(progress):
    print(" ".join(progress))

In [None]:
display_progress(["_", "a", "b", "_"])

In [None]:
def play_game():
    word = "secret"
    progress = ["_", "_", "_", "_", "_", "_"]
    while True:
        display_progress(progress)
        guess = get_guess()
        process_guess(guess, word, progress)

Next up, `get_guess()`...

In [None]:
def get_guess():
    return input("Guess a letter! ")

In [None]:
get_guess()

In [None]:
def get_guess():
    while True:
        guess = input("Guess a letter! ")
        if len(guess) == 1 and guess.isalpha():
            return guess
        else:
            print("Invalid guess, please guess a single letter")
            

In [None]:
get_guess()

In [None]:
def play_game():
    word = "secret"
    progress = ["_", "_", "_", "_", "_", "_"]
    while True:
        display_progress(progress)
        guess = get_guess()
        process_guess(guess, word, progress)

Next up, `process_guess()`, the trickiest part.

Walk through each letter of the secret word. If our guess matches that letter, then put that letter into `progress` at the correct index.

In [None]:
def process_guess(guess, word, progress):
    for index in range(len(word)):
        if guess == word[index]:
            progress[index] = guess
            

In [None]:
progress = ["s", "_", "_", "r", "_", "t"]
process_guess('e', 'secret', progress)
print(progress)
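
Another way to write this uses the built-in `enumerate`, which we haven't covered: it pairs each item with its index, and value unpacking splits each pair into two variables. This is a sketch of an alternative that should behave the same as the version above.

```python
def process_guess(guess, word, progress):
    # enumerate yields (index, letter) pairs, unpacked into two variables
    for index, letter in enumerate(word):
        if guess == letter:
            progress[index] = guess

progress = ["s", "_", "_", "r", "_", "t"]
process_guess("e", "secret", progress)
print(progress)
```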

Each piece seems to work on its own. Let's test it all together:

In [None]:
def play_game():
    word = "secret"
    progress = ["_", "_", "_", "_", "_", "_"]
    while True:
        display_progress(progress)
        guess = get_guess()
        process_guess(guess, word, progress)

In [None]:
play_game()

Oops, we forgot to end the game!

In [None]:
def play_game():
    word = "secret"
    progress = ["_", "_", "_", "_", "_", "_"]
    while True:
        display_progress(progress)
        guess = get_guess()
        process_guess(guess, word, progress)
        if game_over(progress):
            print("You win!")
            break

In [None]:
def game_over(progress):
    for letter in progress:
        if letter == "_":
            return False
    return True

In [None]:
game_over(["a", "b", "c"])

In [None]:
game_over(["a", "_", "c"])

Or, again, a shorter way using one of the sequence operations: `not in`

In [None]:
def game_over(progress):
    return "_" not in progress

In [None]:
game_over(["a", "b", "c"])

In [None]:
game_over(["a", "_", "c"])

In [None]:
def play_game():
    word = "secret"
    progress = ["_", "_", "_", "_", "_", "_"]
    while True:
        display_progress(progress)
        guess = get_guess()
        process_guess(guess, word, progress)
        if game_over(progress):
            print("You win!")
            break

In [None]:
play_game()

What if we want to use a different word? We can parameterize `play_game`:

In [None]:
def play_game(word):
    progress = ["_"] * len(word)
    while True:
        display_progress(progress)
        guess = get_guess()
        process_guess(guess, word, progress)
        if game_over(progress):
            print("You win!")
            break

In [None]:
play_game("abba")

More things to think about:
* How would you keep track of how many incorrect guesses a user had made, and stop after 5 incorrect?
* How would you keep track of which letters the user had already guessed?
* How could you handle the user entering an upper case letter?
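
To get you started on the first question, here is one possible sketch (try it yourself first!). It assumes `process_guess` is changed to return whether the guess was found. `play_scripted_game` and `max_incorrect` are hypothetical names: it takes a list of guesses instead of calling `input`, which makes it easy to experiment with.

```python
def process_guess(guess, word, progress):
    """Variant that also reports whether the guess was in the word."""
    found = False
    for index in range(len(word)):
        if guess == word[index]:
            progress[index] = guess
            found = True
    return found

def play_scripted_game(word, guesses, max_incorrect=5):
    """Hypothetical input-free game loop: returns "win", "lose", or "unfinished"."""
    progress = ["_"] * len(word)
    incorrect = 0
    for guess in guesses:
        if not process_guess(guess, word, progress):
            incorrect += 1           # count each guess that isn't in the word
        if "_" not in progress:
            return "win"
        if incorrect >= max_incorrect:
            return "lose"
    return "unfinished"

print(play_scripted_game("abba", ["a", "b"]))
print(play_scripted_game("abba", ["v", "w", "x", "y", "z"]))
```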