# Introduction to Strings
---

This notebook covers the topic of strings and their importance in the world of programming. You will learn various methods that will help you manipulate these strings and make useful inferences with them. This notebook assumes that you have already completed the "Introduction to Data Science" notebook.

*Estimated Time: 30 minutes*

---

**Topics Covered:**
- Objects
- String Concatenation
- Loops
- String Methods

**Dependencies:**

In [None]:
import numpy as np
from datascience import *

## What Are Objects?

Objects are used everywhere when you're coding, even when you don't know it. But what really is an object?

By definition, an object is an **instance** of a **class**. Objects are an **abstraction**: they bundle data together with the operations that work on that data. That sounds complicated, doesn't it? To simplify, think of a class as a general category that defines particular attributes (variables) and actions (functions). Let's assume that Mars has aliens called Xelhas and one of them visits Earth. The Xelha species would be a class, and the alien itself would be an *instance* of that class (an object). By observing its behavior and mannerisms, we would be able to see how the rest of its species goes about doing things.
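
To make the analogy a little more concrete, here is a minimal sketch of a class and one instance of it. The `Xelha` class, its `home_planet` and `name` attributes, its `greet` action, and the name "Zorp" are all made up for illustration; you do not need to write classes yourself in this notebook.

```python
# Hypothetical class, invented only to mirror the alien analogy.
class Xelha:
    # An attribute shared by the whole species (the class).
    home_planet = "Mars"

    def __init__(self, name):
        # An attribute that belongs to one particular alien (the instance).
        self.name = name

    def greet(self):
        # An action that every Xelha knows how to perform.
        return "Greetings from " + self.home_planet + ", I am " + self.name


visitor = Xelha("Zorp")        # visitor is one instance (object) of the class
greeting = visitor.greet()     # actions are invoked with a dot after the object
print(greeting)
print(type(visitor).__name__)  # the name of the class the object came from
```

The dot notation used for `visitor.greet()` is the same notation used to call methods on strings.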

Strings are objects too. In Python they belong to the `str` class, which has pre-defined methods that we can use. But you don't need to worry about that yet. All you should know for now is that a string is its own type of value, distinct from other types such as integers or booleans (which, in Python, are objects as well, just of different classes). That being said, let's delve right in.

Try running the code cell below:


In [None]:
5 + "5"

Why did that happen?

This is a *type* error, which Python reports as a `TypeError`. A string and an integer are different types of values, and the + operator is not defined for that combination, so when you try to **add** a string to an integer, Python raises an error. The important thing to note here is that a string is a string, no matter what its contents may be. If it's between quotes, it is a string.
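
One possible way around the error, sketched here under the assumption that you wanted arithmetic, is to convert the string to an integer with the built-in `int` function before adding. The variable names `number`, `text`, and `total` are chosen only for this example.

```python
number = 5
text = "5"

print(type(number).__name__)  # the type of 5
print(type(text).__name__)    # the type of "5"

# Adding the two directly raises a TypeError, which we catch and print.
try:
    number + text
except TypeError as error:
    print("TypeError:", error)

# Converting the string to an integer first makes the addition arithmetic.
total = number + int(text)
print(total)
```

Note that `int` only works when the string actually looks like a whole number; a string such as "five" would raise a different error.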

But what if we followed the "same type" rule and tried to add two Strings? Let's try it.


In [None]:
"5" + "5"

What?! How does 5 + 5 equal 55?

This is known as concatenation.

## Concatenation

"Concatenating" two items means literally combining or joining them. 

When you use the + operator between two or more strings, Python takes the contents of each string and joins them end to end, in order, to make one new string. This process is called **concatenation**.

The following examples illustrate how String concatenation works:

In [None]:
"Berk" + "eley"

In [None]:
"B" + "e" + "r" + "k" + "e" + "l" + "e" + "y"
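
Concatenation works the same way when the strings are stored in names. The short sketch below, with variable names picked just for this example, also shows that Python adds no spaces on its own.

```python
first_word = "data"
second_word = "science"

# Nothing is inserted between the two strings.
no_space = first_word + second_word

# A space only appears if you concatenate one explicitly.
with_space = first_word + " " + second_word

print(no_space)
print(with_space)
```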

Here's a small exercise for you, with a lot of variables. Try making the output "today is a lovely day".

_Hint: Remember to add a string containing a space, " ", wherever you need a gap between words, because Python joins the text exactly as given._


In [None]:
a = "oda"
b = "is"
c = "a"
d = "l"
e = "t"
f = "y"
g = "lo"
h = "d"
i = "ve"

# your expression here

## String methods

The `str` class is great for regular use because it comes equipped with many built-in functions. These functions, called **methods**, produce transformed copies of a string; the original string is never modified. Here are some common string methods that may prove helpful.

### Replace
For example, the *replace* method replaces all instances of some part of a string with some replacement. A method is invoked on a string by placing a . after the string value, then the name of the method, and finally parentheses containing the arguments.

    <string>.<method name>(<argument>, <argument>, ...)

Try to predict the output of these examples, then execute them.

In [None]:
# Replace one letter
'Hello'.replace('e', 'i')

In [None]:
# Replace a sequence of letters, which appears twice
'hitchhiker'.replace('hi', 'ma')

Once a name is bound to a string value, methods can be invoked on that name as well. The string bound to the name doesn't change, so a new name is needed to capture the result.

Remember, `replace` will replace **every** occurrence of the text being searched for.

In [None]:
sharp = 'edged'
hot = sharp.replace('ed', 'ma')
print('sharp =', sharp)
print('hot =', hot)
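
If you only want some of the occurrences replaced, `replace` also accepts an optional third argument giving the maximum number of replacements. The sketch below reuses the 'hitchhiker' string; the variable names are chosen only for this example.

```python
word = "hitchhiker"

# With no count, every occurrence of "hi" is replaced.
all_replaced = word.replace("hi", "ma")

# With a count of 1, only the first occurrence is replaced.
first_replaced = word.replace("hi", "ma", 1)

print(all_replaced)
print(first_replaced)
print(word)  # the original string is left unchanged
```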

Another very useful method is the **`split`** method. It takes in a "separator string" and splits up the original string into a list (Python's built-in array-like sequence), with each element of the list being a separated portion of the string.

Here are some examples:

In [None]:
"Another very useful method is the split method".split(" ")

In [None]:
string_of_numbers = "1, 2, 3, 4, 5, 6, 7"
arr_of_strings = string_of_numbers.split(", ")
print(arr_of_strings) # Remember, these elements are still strings!
arr_of_numbers = []
for s in arr_of_strings: # Loop through the list, converting each string to an int
    arr_of_numbers.append(int(s))
print(arr_of_numbers)

As you can see, the `split` method can be very handy when cleaning up and organizing data (a process known as _parsing_).
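
One detail worth knowing when parsing messy text: splitting on a single space treats every space as a separator, so repeated spaces produce empty strings. Calling `split` with no argument splits on any run of whitespace instead. The sketch below illustrates the difference and uses the `join` method, the counterpart of `split`, to put the words back together. The sample sentence and variable names are made up for this example.

```python
sentence = "strings  are   fun"  # note the repeated spaces

# Every single space is a separator, so empty strings appear.
split_on_space = sentence.split(" ")

# With no argument, any run of whitespace counts as one separator.
split_on_whitespace = sentence.split()

# join is the reverse of split: it glues list elements with a separator.
rejoined = " ".join(split_on_whitespace)

print(split_on_space)
print(split_on_whitespace)
print(rejoined)
```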

## Loops

What do you do when you have to do the same task repeatedly? Let's say you have to say Hi to someone five times. Would that require five lines of `print('hi')`? No! This is why coding is beautiful. It allows for automation and takes care of all the hard work.

Loops, as the name suggests, repeat a task over and over until the work is done. This repetition is called **iteration**. In the kind of loop used here, a variable takes on a new value each time through the loop, which lets you keep track of where you are in the repetition.

The most useful loop to know for the scope of this course is the **for** loop.

A for statement begins with the word *for*, followed by a name we want to give each item in the sequence, followed by the word *in*, and ending with an expression that evaluates to a sequence. The indented body of the for statement is executed once for each item in that sequence.
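
As a rough sketch of that description, the five greetings mentioned earlier could be written as a single loop. This example uses Python's built-in `range` so that it runs without any imports; the names `step` and `times_greeted` are chosen only for illustration.

```python
times_greeted = 0

# range(5) produces the sequence 0, 1, 2, 3, 4, so the body runs five times.
for step in range(5):
    print("hi")
    times_greeted = times_greeted + 1  # keep count of the repetitions

print(times_greeted)
```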

    for <variable> in np.arange(0, 5):
    
Don't worry about the np.arange() part yet. Just remember that this expression produces a sequence, and Strings are sequences too! So let's try our loop with Strings!
    
    for each_character in "John Doenero":
        <do something>
        
Interesting! Let's put our code to the test.


In [None]:
for each_character in "John Doenero":
    print(each_character)

Cool, right? Now let's do something more useful. 

Write a for loop that iterates through the sentence "Hi, I am a quick brown fox and I jump over the lazy dog" and checks if each letter is an *a*. Print out the number of a's in the sentence.

_Hint: try combining what you've learned about conditional statements with a counter._

In [None]:
# your code here
for ...
    

## Conclusion
---
Congratulations! You have learned the basics of String manipulation in Python.

## Bibliography
---
Some examples adapted from the UC Berkeley Data 8 textbook, <a href="https://www.inferentialthinking.com">*Inferential Thinking*</a>.

Authors:
- Shriya Vohra
- Scott Lee
- Pancham Yadav