
# Strings
By now, you've already used strings quite a lot, but for completeness, we'll redefine them here.

> A string is a data type that represents text.

That's pretty much it, but that's not all you should know about them. 

## String Operations
### Concatenation
Even if you haven't been calling it concatenation, you've already used this string technique.

> **Concatenation** means putting multiple strings together into longer strings.

> Strings are concatenated by using the `+` operator.

Simple as that.

In [None]:
name = "Jonathan"
greeting = "Hello " + name + "!"

print(greeting)

Okay, that's not ALL you need to know. The crucial thing about concatenation is that it lets us display information of other data types within more informative strings without losing the previous information.

For example, when you print a number without concatenation, you just get a bare number. That's fine with one number, but you'll rarely have just one. With concatenation, it's easy to label your numbers so you know exactly what they are.

In [None]:
def get_stats(numbers):
    # Don't worry about some of the stuff used in this function for now.
    # Just know that it gives back the mean, median, maximum, and minimum values of a list of numbers.
    
    list_size = len(numbers)
    
    mean = sum(numbers) / list_size
    # Middle value of the sorted list (for an even-sized list, this picks the upper of the two middle values).
    median = sorted(numbers)[list_size // 2]
    # Named maximum/minimum so we don't shadow the built-in max() and min() functions.
    maximum = max(numbers)
    minimum = min(numbers)
    
    # You should, however, remember that functions can return multiple values.
    return mean, median, maximum, minimum

random_numbers = [6, 20, 10, 4, 9]

# With this, we just see the numbers printed.
# That's okay if we know exactly the order in which the function gives us the values,
# but we won't always, or we may forget while looking at the output.
print(get_stats(random_numbers))
print()

# Let's format these numbers into something nice and readable.
random_mean, random_median, random_max, random_min = get_stats(random_numbers)

print("Mean of " + str(random_numbers) + ": " + str(random_mean))
print("Median of " + str(random_numbers) + ": " + str(random_median))
print("Maximum of " + str(random_numbers) + ": " + str(random_max))
print("Minimum of " + str(random_numbers) + ": " + str(random_min))

There's one thing to keep in mind with concatenation. You have probably discovered it already, but it bears repeating.

> Only strings can be concatenated to other strings.

Information of other data types, such as `random_numbers` (list) and `random_mean`, `random_median`, `random_max`, and `random_min` (numbers), must first be cast to a string by wrapping it in `str()`.
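To see why `str()` matters, here is a minimal sketch of what happens when it's left out. The variable `score` and its value are made up for this illustration, and the `try`/`except` wrapper is only there so the cell keeps running after the error; don't worry if you haven't seen it yet.

```python
# Hypothetical value, chosen only for illustration.
score = 42

try:
    # The right-hand side is an int, so Python refuses to concatenate.
    message = "Your score is " + score
except TypeError as error:
    message = "Concatenation failed: " + str(error)

print(message)

# Wrapping the number in str() makes the concatenation work.
fixed_message = "Your score is " + str(score)
print(fixed_message)  # Your score is 42
```

Without the `try`/`except`, the first concatenation would stop the program with a `TypeError`.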

### Multiplication
You're less likely to have used multiplication, but it's also pretty self-explanatory.

> String multiplication duplicates the string a specified number of times and concatenates them together.

> Strings are multiplied with the `*` operator.

In [None]:
greeting = "Hello! "

print(greeting * 5)
# It also works the other way, just like numerical multiplication.
print(5 * greeting)

Note that, of course, you can only multiply a string by an integer. That means no decimals. Negatives are *technically* okay, but they just give you an empty string, so they're not really helpful.

In [None]:
farewell = "I'm gone! "
print(farewell * -2)
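As a rough sketch of where multiplication comes in handy, and of the integer-only rule described above, the cell below builds a divider line and then tries a zero and a decimal multiplier. The names `divider`, `empty`, and `float_error` are just illustrative, and the `try`/`except` is only there so the error doesn't stop the cell.

```python
# Repeating one character is a quick way to draw a divider line.
divider = "-" * 20
print(divider)
print("Results")
print(divider)

# Zero behaves like a negative number: the result is an empty string.
empty = "abc" * 0
print(repr(empty))  # ''

# A float is rejected, even one with no fractional part.
try:
    "abc" * 2.0
except TypeError as error:
    float_error = str(error)
    print("Multiplying by 2.0 failed: " + float_error)
```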

## Substrings, Indexing, and Slicing
### Substrings
> A **substring** is a specific segment of a larger string.

For example, if we have a string `"I like trains"`, each of the words `"I"`, `"like"`, and `"trains"` is a valid substring. `"I like"` and `"like trains"` are also valid substrings. `"ke tr"`, though nonsensical, would also be a valid substring. `"I trains"` is *not* a valid substring because, even though all the characters exist in the original string, they never appear in the string *in that order*.

To quickly test whether one string is a substring of another, you can use the `in` operator. The entire expression evaluates to a boolean.

In [None]:
word = "macuahuitl" # If you're curious, this is a Mesoamerican greatsword.

print("mac" in word)
print("hui" in word)
print("i" in word)
print("" in word) # This is true! The empty string exists in every string.
print("not in the word" in word)
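The same check can be run against the `"I like trains"` examples from earlier in this section. This is just a quick sketch; the variable name `sentence` is made up for it, and the last line adds one detail worth knowing: the check is case-sensitive.

```python
sentence = "I like trains"

print("like" in sentence)      # True: a whole word
print("ke tr" in sentence)     # True: spans two words, but the characters are adjacent
print("I trains" in sentence)  # False: the characters exist, but never in this order
print("Like" in sentence)      # False: uppercase "L" does not match lowercase "l"
```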

In informal language, if we're extracting a substring we might say that we are *substringing* it from the original string. For example, I might say that I can "substring `mac` from `macuahuitl`". **Keep in mind that this is informal language and may not be understood by everyone.**

### Indexing and Slicing
Indexing and slicing are both techniques for getting a substring from a larger string.

> **Indexing** obtains a single character from a string, based on its position.

> **Slicing** obtains a substring of any length from a string, including empty or single-character strings.

Why would we want to do this? Maybe you're extracting someone's first and last names from their full name. Or maybe you have a secret code where you have to piece together certain words from a nonsensical sentence.

For a real-world example, your DNA is encoded by 4 bases represented by A, C, G, and T, so DNA sequences are represented by long strings of just these 4 letters! With indexing and slicing, you can identify which combination occurs at different positions.

We're getting ahead of ourselves, though.

> Both indexing and slicing are accomplished by using square brackets on a string: `str_name[index]`.
> * **Indexing** only requires a single `index`.
> * **Slicing** requires two indices, separated by `:`, to mark the beginning and end of the substring: `str_name[begin:end]`.

Oh, and one more thing about ALL Python indices:

> Python uses **zero-indexing**, meaning that indices **start at 0**. The first item in a sequence is at index 0, the second is at index 1, etc. In general, *the `n`th item in a sequence can be found at index `n-1`*.
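Here is a small sketch of the `n-1` rule on a short word, where it's easy to count by hand. The variable names are made up for this example.

```python
language = "Python"

first_letter = language[0]  # the 1st character lives at index 0
third_letter = language[2]  # the 3rd character lives at index 3 - 1

# The last character is the len(language)-th one, so it lives at index len(language) - 1.
last_index = len(language) - 1
last_letter = language[last_index]

print(first_letter, third_letter, last_letter)  # P t n
```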

Let's see it in action.

In [None]:
amazing = "supercalifragilisticexpialidocious"
# Wow, writing that out IS something quite atrocious...

print("What's the letter at index 20 of " + amazing + "?")
print(amazing[20] + "\n")

print("What is the slice at indices 13-20?")
print(amazing[13:20])

Notice that the character `amazing[20]` is excluded from the substring `amazing[13:20]`. The indices `begin:end` for slicing follow the same rules as `range()`. That is, the substring will *include `begin`* and *exclude `end`*. 

> If you need a refresher, the `range()` function was covered in [this notebook about loops](https://github.com/GamerNerd-i/CMSI-1010_Recitation-Examples/blob/main/Week%203/loops.ipynb).
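Going back to the DNA idea from earlier, here is a minimal sketch of indexing and slicing a sequence, plus a loop showing that `range(3, 6)` visits exactly the positions that `[3:6]` covers. The sequence `dna` is made up purely for illustration.

```python
# A made-up DNA sequence, used only for illustration.
dna = "ATGGCCTAA"

first_base = dna[0]     # a single character
first_three = dna[0:3]  # indices 0, 1, 2
next_three = dna[3:6]   # indices 3, 4, 5

print(first_base)   # A
print(first_three)  # ATG
print(next_three)   # GCC

# range(3, 6) produces 3, 4, 5: the same positions that dna[3:6] covers.
rebuilt = ""
for i in range(3, 6):
    rebuilt = rebuilt + dna[i]

print(rebuilt == next_three)  # True
```

Because the end index is excluded, `dna[0:3]` and `dna[3:6]` sit next to each other with no overlap and no gap.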

#### Negative Indices
> Negative numbers are valid indices for slicing and indexing, and will count from the end of the string.

The index -1 represents the last letter, -2 is the second-to-last, and so on.

In [None]:
print("What is the 5th-to-last letter?")
print(amazing[-5] + "\n")

print("What do you get if you chop off the first and last 4 letters?")
print(amazing[4:-4])

Note that negative indices start at -1 rather than 0 (there's no separate "-0"), so *the `n`th-to-last item is at index `-n`*, while the `n`th item from the start is at index `n-1`. Don't get your positive and negative indexing mixed up!
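One way to keep the two straight is to notice that a negative index `-n` points at the same character as the positive index `len(string) - n`. The sketch below checks that on a short word; `n`, `from_end`, and `from_start` are names made up for this example.

```python
word = "macuahuitl"
n = 3

from_end = word[-n]               # 3rd-to-last character
from_start = word[len(word) - n]  # the same position, counted from the front

print(from_end)                # i
print(from_start)              # i
print(from_end == from_start)  # True
```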

#### Slicing Defaults
Slicing has an additional shortcut to be aware of.

> When slicing a string, leaving either side of `:` empty will extend the slice to the start or end of the string.

Examples follow. Again, note that these still follow the end-exclusion rule.

In [None]:
print("What are the last 5 letters?")
print(amazing[-5:] + "\n")

print("What are the first 8 letters?")
print(amazing[:8] + "\n")

# Don't forget that indices are just ints, so we can use math expressions to dynamically fill them if necessary.
print("What 4 letters are in the middle?")
print(amazing[(len(amazing) // 2) - 2 : (len(amazing) // 2) + 2])
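These defaults are handy for splitting a string in two at a known position, like the first-and-last-name idea mentioned earlier. This is only a sketch: the name and the hard-coded position `split_at` are assumptions for the example, not a general way to find where a name splits.

```python
# Hypothetical full name; the space is at index 3.
full_name = "Ada Lovelace"
split_at = 3

first_name = full_name[:split_at]     # start of the string up to (not including) the space
last_name = full_name[split_at + 1:]  # just after the space through the end

print(first_name)  # Ada
print(last_name)   # Lovelace

# Slicing at the same index on both sides loses nothing and repeats nothing.
front = full_name[:split_at]
back = full_name[split_at:]
print(front + back == full_name)  # True
```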

**Bonus Question!** What does `amazing[:]` give us?

<details>
    <summary>Answer</summary>

The substring extends to both the beginning and end of the target string. In other words, we just get the whole string!

</details>