
# Strings
By now, you've already used strings quite a lot, but for completeness, we'll redefine them here.

> Strings are a data type that represents text.

That's pretty much it, but that's not all you should know about them. 

## String Operations
### Concatenation
Even if you haven't been calling it concatenation, you've already used this string technique.

> **Concatenation** means putting multiple strings together into longer strings.

> Strings are concatenated by using the `+` operator.

Simple as that.

In [None]:
name = "Jonathan"
greeting = "Hello " + name + "!"

print(greeting)

Okay, that's not ALL you need to know. The crucial thing about concatenation is that it lets us display information of other data types within more informative strings without losing the previous information.

For example, when you print a number without concatenation, you just get a number. That's fine with one number, but you'll rarely have only one. With concatenation, it's easy to label your numbers so you know exactly what they are.

In [None]:
def get_stats(numbers):
    # Don't worry about some of the stuff used in this function for now.
    # Just know that it gives back the mean, median, maximum, and minimum values of a list of numbers.
    
    list_size = len(numbers)
    
    mean = sum(numbers) / list_size
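    median = sorted(numbers)[list_size // 2]  # exact middle value for odd-length lists
    # Use names that differ from the built-in max() and min().
    # Writing `max = max(numbers)` here would raise an UnboundLocalError.
    maximum = max(numbers)
    minimum = min(numbers)
    
    # You should, however, remember that functions can return multiple values.
    return mean, median, maximum, minimum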

random_numbers = [6, 20, 10, 4, 9]

# With this, we just see the numbers printed.
# That's okay if we know exactly the order in which the function gives us the values,
# but we won't always, or we may forget while looking at the output.
print(get_stats(random_numbers))
print()

# Let's format these numbers into something nice and readable.
random_mean, random_median, random_max, random_min = get_stats(random_numbers)

print("Mean of " + str(random_numbers) + ": " + str(random_mean))
print("Median of " + str(random_numbers) + ": " + str(random_median))
print("Maximum of " + str(random_numbers) + ": " + str(random_max))
print("Minimum of " + str(random_numbers) + ": " + str(random_min))

One thing to keep in mind with concatenation, which you have probably already discovered but which bears repeating:

> Only strings can be concatenated to other strings.

Information of other data types, such as `random_numbers` (list) and `random_mean`, `random_median`, `random_max`, and `random_min` (numbers), must first be converted (cast) to a string by wrapping it in `str()`.
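
To see why the `str()` call matters, here is a minimal sketch of what happens without it. The names `count`, `label`, and `error_message` are made up for this illustration, and the `try`/`except` is only there so the cell keeps running after the error.

```python
count = 3  # hypothetical example value

try:
    label = "Items: " + count  # str + int is not allowed
except TypeError as error:
    error_message = str(error)
    print("Concatenation failed:", error_message)

# Converting the number to a string first makes the concatenation work.
label = "Items: " + str(count)
print(label)
```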

### Multiplication
You're less likely to have used multiplication, but it's also pretty self-explanatory.

> String multiplication duplicates the string a specified number of times and concatenates them together.

> Strings are multiplied with the `*` operator.

In [None]:
greeting = "Hello! "

print(greeting * 5)
# It also works the other way, just like numerical multiplication.
print(5 * greeting)

Note that, of course, you can only multiply a string by an integer. That means no decimals. Negatives are *technically* okay, but they give you an empty string, so they're not really helpful.

In [None]:
farewell = "I'm gone! "
print(farewell * -2)
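
As a rough sketch of the edge cases, the cell below compares a zero, a negative, and a decimal multiplier. The variable `repeated` is introduced only for this example, and `repr()` is used so that the empty strings are visible in the output.

```python
farewell = "I'm gone! "

# Zero and negative integers both produce an empty string.
print(repr(farewell * 0))
print(repr(farewell * -2))

# A float is rejected, even one that looks like a whole number.
try:
    repeated = farewell * 2.0
except TypeError as error:
    repeated = None
    print("Can't multiply:", error)
```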

## Substrings, Indexing, and Slicing
### Substrings
> A **substring** is a specific segment of a larger string.

For example, if we have a string `"I like trains"`, each of the words `"I"`, `"like"`, and `"trains"` is a valid substring. `"I like"` and `"like trains"` are also valid substrings. `"ke tr"`, though nonsensical, would also be a valid substring. `"I trains"` is *not* a valid substring because, even though all the characters exist in the original string, they never appear *together in that order* with nothing in between.

To quickly test whether one string is a substring of another, you can use the `in` keyword. The entire expression evaluates to a boolean (`True` or `False`).

In [None]:
word = "macuahuitl" # If you're curious, this is a Mesoamerican greatsword.

print("mac" in word)
print("hui" in word)
print("i" in word)
print("" in word) # This is true! The empty string exists in every string.
print("not in the word" in word)

In informal language, if we're extracting a substring we might say that we are *substringing* it from the original string. For example, I might say that I can "substring `mac` from `macuahuitl`". **Keep in mind that this is informal language and may not be understood by everyone.**

### Indexing and Slicing
Indexing and slicing are both techniques for getting a substring from a larger string.

> **Indexing** obtains a single character from a string, based on its position.

> **Slicing** obtains a substring of any length from a string, including empty or single-character strings.

Why would we want to do this? Maybe you're extracting someone's first and last names from their full name. Or maybe you have a secret code where you have to piece together certain words from a nonsensical sentence.

For a real-world example, your DNA is encoded by 4 bases represented by A, C, G, and T, so DNA sequences are represented by long strings of just these 4 letters! With indexing and slicing, you can identify which combination occurs at different positions.

We're getting ahead of ourselves, though.

> Both indexing and slicing are accomplished by using square brackets on a string: `str_name[index]`.
> * **Indexing** only requires a single `index`.
> * **Slicing** requires two indices, separated by `:`, to mark the beginning and end of the substring: `str_name[begin:end]`.

Oh, and one more thing about ALL Python indices:

> Python uses **zero-indexing**, meaning that indices **start at 0**. The first item in a sequence is at index 0, the second is at index 1, etc. In general, *the `n`th item in a sequence can be found at index `n-1`*.

Let's see it in action.

In [None]:
amazing = "supercalifragilisticexpialidocious"
# Wow, writing that out IS something quite atrocious...

print("What's the letter at index 20 of " + amazing + "?")
print(amazing[20] + "\n")

print("What is the slice at indices 13-20?")
print(amazing[13:20])

Notice that the character `amazing[20]` is excluded from the substring `amazing[13:20]`. The indices `begin:end` for slicing follow the same rules as `range()`. That is, the substring will *include `begin`* and *exclude `end`*. 

> If you need a refresher, the `range()` function was covered in [this notebook about loops](https://github.com/GamerNerd-i/CMSI-1010_Recitation-Examples/blob/main/Week%203/loops.ipynb).
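
One way to convince yourself of the shared rule is to rebuild a slice by hand with `range()`. This is only an illustrative sketch; the variable `built` is not part of the original example.

```python
amazing = "supercalifragilisticexpialidocious"

# Build the same substring one index at a time with range(13, 20).
built = ""
for i in range(13, 20):  # visits 13, 14, ..., 19 and stops before 20
    built = built + amazing[i]

print(built)
print(built == amazing[13:20])
```

Both approaches stop just before index 20, which is why the two results match.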

#### Negative Indices
> Negative numbers are valid indices for slicing and indexing, and will count from the end of the string.

The index -1 represents the last letter, -2 is the second-to-last, and so on.

In [None]:
print("What is the 5th-to-last letter?")
print(amazing[-5] + "\n")

print("What do you get if you chop off the first and last 4 letters?")
print(amazing[4:-4])

Note that, because there is no index `-0`, negative indices start at -1 rather than 0, so *the `n`th-to-last item is at index `-n`*. Don't get your positive and negative indexing mixed up!
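
If it helps, a negative index can be thought of as shorthand for counting back from the length of the string. The sketch below checks that idea; `n`, `from_end`, and `from_start` are names chosen just for this example.

```python
amazing = "supercalifragilisticexpialidocious"
n = 5

# Counting from the end with -n lands on the same character
# as counting from the start with len(amazing) - n.
from_end = amazing[-n]
from_start = amazing[len(amazing) - n]

print(from_end, from_start)
print(from_end == from_start)
```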

#### Slicing Defaults
Slicing has an additional shortcut to be aware of.

> When slicing a string, leaving either side of `:` empty will extend the slice to the start or end of the string.

Examples follow. Again, note that these still follow the end-exclusion rule.

In [None]:
print("What are the last 5 letters?")
print(amazing[-5:] + "\n")

print("What are the first 8 letters?")
print(amazing[:8] + "\n")

# Don't forget that indices are just ints, so we can use math expressions to dynamically fill them if necessary.
print("What 6 letters are in the middle?")
print(amazing[(len(amazing) // 2) - 3 : (len(amazing) // 2) + 3])

**Bonus Question!** What does `amazing[:]` give us?

<details>
    <summary>Answer</summary>

The substring extends to both the beginning and end of the target string. In other words, we just get the whole string!

</details>

## String Functions
Many functions operate on string types. They're helpful for a variety of situations, so it's important to know what's in your toolbox, even if you don't use all of them often.

### String Length
> The `len()` function returns the length of a string as an integer.

The length of a string is how many characters it contains. For example, a string full of spaces, tabs (`\t`), or newline characters (`\n`) might not seem to have any text, but it still has a length; you just can't see the characters.
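
A quick sketch of this, using made-up variables `invisible` and `empty`:

```python
invisible = " \t\n"  # one space, one tab, one newline
empty = ""

# Each escape sequence like \t or \n counts as ONE character.
print(len(invisible))
print(len(empty))
print(len("Hello " + "world"))
```

The first string looks blank when printed, but `len()` still counts its three characters. Only the truly empty string has a length of 0.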

`len()` is arguably the second-most used string function in all of Python (second only to `print()`). There are a lot of cases where we're concerned with the length of a string. For example, you may have noticed that I used it in the example above for slicing.

In [None]:
print("What 6 letters are in the middle?")
print(amazing[(len(amazing) // 2) - 3 : (len(amazing) // 2) + 3])

Given the message and what you now know about `len()`, can you explain *why* this expression gives us the 6 middle letters?

Remember that `//` means integer division: after dividing, it drops any decimal that exists.

<details>
    <summary>Answer</summary>

Halving the length of the string (`len(amazing) // 2`) gives us the index at the middle of the string: 17 for this 34-character string. Subtracting 3 starts the slice 3 letters before that point, and adding 3 ends it 3 letters after. Because the `end` index is excluded, the slice covers indices 14 through 19, for a total of 6 letters.

</details>

### Case-Changing
> The `string.upper()` and `string.lower()` methods let you change the case of letters in a string.

Pretty self-explanatory, except for one thing: these are *methods*, not functions. We'll go over what exactly this entails later, but for now, it means that you need to call them *on a string* instead of passing the string as an argument.

In [None]:
word = "WaCKy cApS"

print(word.upper())
print(word.lower())
print()

# A variable is also not necessary to use these, but it looks a bit strange.
print("lowercase".upper())
print("UPPERCASE".lower())

You would mainly use these methods for matching text. Running input text through `string.upper()` or `string.lower()` makes sure that the match isn't thrown off by capitalization, for example when someone mistypes their name.

In [None]:
miscapitalized = "JOhn APpleseed"
real_input = "John Appleseed"

print(miscapitalized == real_input)
print(miscapitalized.lower() == real_input.lower())

It's important to note that these methods return *brand new* strings with the given changes. The original strings are untouched.
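
Here is a small sketch of what "untouched" means in practice. The variable names are only for illustration.

```python
name = "JOhn APpleseed"
lowered = name.lower()

print(name)     # still has the original capitalization
print(lowered)  # the new, lowercased copy

# To "change" a variable, assign the result back to it.
name = name.lower()
print(name)
```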

### Split and Join
Many programs require strings and lists to change between types. In the most common case, a string containing a sentence will be changed to a list of words. Or we might build a list of words and then change the whole list into one string for output.

Either way, `string.split()` and `string.join()` enable these (mostly) seamless transformations.

#### Split
> The `string.split(sep)` method divides a string into a list of substrings, splitting each substring around `sep`.

`sep` (short for "separator") becomes the marker for when one substring ends and another begins. For example, when `sep` is omitted, the string is split on whitespace (spaces, tabs, and newlines), so that we can split sentences into individual words.

In [None]:
# This is an epic-sounding alternative to "The quick brown fox jumps over the lazy dog."
# It's even shorter by 6 letters!
invocation = "Sphinx of black quartz, judge my vow."

# Calling split() with no argument splits on whitespace (spaces, tabs, newlines).
word_list = invocation.split()
print(word_list)

Any non-empty string can be a "splitter."

In [None]:
amazing = "supercalifragilisticexpialidocious"

print(amazing.split("a"))
print(amazing.split("li"))

Notice that the splitter is entirely removed from the output. Also notice that, like `upper()` and `lower()`, this is a method and not a function, so it needs to be called *on a string*.
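
One subtlety worth a quick sketch: calling `split()` with no argument is not quite the same as calling `split(" ")`. With no argument, any run of whitespace counts as one separator. With an explicit `" "`, every single space is a separator, and tabs are left alone. The string `spaced` below is a made-up input with a double space and a tab.

```python
spaced = "Sphinx  of black\tquartz"  # two spaces after "Sphinx", a tab before "quartz"

default_split = spaced.split()    # no argument: any run of whitespace
space_split = spaced.split(" ")   # explicit separator: exactly one space

print(default_split)
print(space_split)
```

The explicit version produces an empty string between the two adjacent spaces and keeps `"black\tquartz"` as one item.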

#### Join
> The `string.join(list)` method unifies a list of substrings into a single string, connecting them with `string`.

`string` here is the equivalent of `sep` above: instead of removing `sep` and creating substrings, we add `string` between each item in `list` and concatenate them to a single string.

Like before, any string can be a "joiner."

In [None]:
motivation = ["Just", "Do", "It"]

print("! ".join(motivation))
print("".join(motivation))

Even though lists can technically contain multiple data types, `join` raises a `TypeError` if it encounters anything that's not a string.

In [None]:
fail = ["what", "is", 1, "+", 1]
print(" ".join(fail))

Once again, notice that this is a method that needs to be called on a string, but receives a list as input.
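
If you do need to join a list that contains non-strings, one possible approach is to convert every item with `str()` first. The sketch below uses a list comprehension, which may not have been covered yet. Read it as "make a new list with `str(item)` for each `item`." The names `mixed`, `as_strings`, and `sentence` are made up for this example.

```python
mixed = ["what", "is", 1, "+", 1]

# Convert every item to a string first, then join the results.
as_strings = [str(item) for item in mixed]
sentence = " ".join(as_strings)

print(sentence)
```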

### Find and Replace
No text editor would be complete without find and replace functionality.

#### Find
> The `string.find(sub)` method returns the index of the first instance of `sub` in `string`. If `sub` doesn't exist in `string`, it returns -1.

Earlier, we used `in` to check if a substring exists in a string at all. `string.find()` points us to where exactly in the string the substring can be found. This could be useful if we want to find where a certain section of text begins or ends, so that we can properly extract the remaining part that we want.

In [None]:
invocation = "Sphinx of black quartz, judge my vow."

command_index = invocation.find("judge")
print(invocation[command_index:])

print(invocation.find("dummy"))
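
Be careful with that -1: it is also a valid index (the last character), so slicing with it does not fail. It quietly gives you the wrong substring. A minimal sketch of checking the result first, with `index` and `result` as illustrative names:

```python
invocation = "Sphinx of black quartz, judge my vow."

index = invocation.find("dummy")
print(invocation[index:])  # index is -1, so this is just the last character

# Safer: check for -1 before slicing.
if index == -1:
    result = "not found"
else:
    result = invocation[index:]
print(result)
```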

#### Replace
> The `string.replace(old, new)` method returns a new string that replaces every instance of `old` with an instance of `new` in `string`.

Note that this replaces **all** occurrences of `old` by default. If you want to replace only the first few, pass the maximum number of replacements as an optional third argument: `string.replace(old, new, count)`. Otherwise, this is good for altering many parts of a long string at the same time.

In [None]:
amazing = "supercalifragilisticexpialidocious"

print(amazing.replace("i", "oooooooo"))

Like `upper()` and `lower()`, the new string is a copy of the original. The original string is unchanged.
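
As a short sketch of limiting the number of replacements, `replace` accepts an optional third argument giving the maximum number of occurrences to replace. The variable `first_only` is just an illustrative name.

```python
amazing = "supercalifragilisticexpialidocious"

# The third argument caps how many occurrences get replaced.
first_only = amazing.replace("i", "I", 1)

print(first_only)
print(amazing)  # the original string is unchanged
```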