# Introduction to Strings
---

This notebook covers the topic of Strings and their importance in the world of programming. You will learn various methods that will help you manipulate these Strings and make useful inferences with them. This notebook assumes that you have already completed the "Introduction to Data Science" notebook.

*Estimated Time: 30 minutes*

---

**Topics Covered:**
- Objects
- String Concatenation
- String Methods
- Loops

**Dependencies:**

In [None]:
import numpy as np
from datascience import *

## What Are Objects?

Objects are used very frequently when you're coding, even when you don't know it. But what really is an object?

By definition, an object is an **instance** of a **class**. Objects are an **abstraction**: they bundle data together with the actions that can be performed on it. That sounds complicated, doesn't it? To simplify, think of a class as a general category that defines particular attributes (variables) and actions (functions). Let's assume that Mars has aliens called Xelhas and one of them visits Earth. The Xelha species would be a class, and the alien itself would be an *instance* of that class (or an object). By observing its behavior and mannerisms, we would be able to see how the rest of its species goes about doing things.
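The class-and-instance idea can be sketched in code. The `Xelha` class below, its `name` attribute, its `greet` method, and the name "Zorb" are all invented for illustration; none of them is part of Python or of the libraries this notebook imports.

```python
# Hypothetical class, for illustration only: the general category "Xelha".
class Xelha:
    def __init__(self, name):
        # An attribute: data that each individual Xelha carries.
        self.name = name

    def greet(self):
        # A method: an action that every Xelha can perform.
        return "Greetings, Earth. I am " + self.name + "."


# An instance (object): one particular Xelha visiting Earth.
visitor = Xelha("Zorb")
print(visitor.greet())             # Greetings, Earth. I am Zorb.
print(isinstance(visitor, Xelha))  # True
```

Every other Xelha created from the same class would have its own `name` but share the same `greet` behavior.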

Strings are objects too. In Python they belong to the built-in `str` class (called the *String* class in this notebook), which has pre-defined methods that we use. But you don't need to worry about that yet. All you should know is that a String is a different **type** of value from an integer or a boolean, and Python treats each type differently. That being said, let's delve right in.

Try running the code cell below:


In [2]:
5 + "5" 

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Why did that happen?

This is a *type* error. A String and an integer are different types, and Python does not define `+` between an `int` and a `str`, so when you try to **add** a String to an integer, Python raises a `TypeError` rather than guessing what you meant. The important thing to note here is that a String is a String, no matter what its contents may be. If it's between two quotes, it is a String.
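If arithmetic was the goal, one way around the error is to convert the String to an integer first. The sketch below uses only the built-in `int` and `type` functions; the variable name `total` is chosen for illustration.

```python
# Convert the String "5" to the integer 5 before adding.
total = 5 + int("5")
print(total)      # 10

# type() reports which class a value belongs to.
print(type(5))    # <class 'int'>
print(type("5"))  # <class 'str'>
```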

But what if we followed the "same type" rule and tried to add two Strings? Let's try it.


In [None]:
"5" + "5"

What?! How does 5 + 5 equal 55?

This is known as concatenation.

## Concatenation

"Concatenating" two items means literally combining or joining them. 

When you put the + operator between two Strings, Python will take the contents from both Strings and mash them together to make one String. This process is called **concatenation**.

The following examples illustrate how String concatenation works:

In [4]:
"Berk" + "eley"

'Berkeley'

Concatenation happens left to right, as you would expect. In the example below, "B" and "e" are first combined to get "Be", which is then combined with "r" to get "Ber", and so on.

In [5]:
"B" + "e" + "r" + "k" + "e" + "l" + "e" + "y"

'Berkeley'
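Concatenation works the same way on Strings stored in variables. The short sketch below uses variable names chosen for illustration; it also shows that Python adds no spaces of its own, and that a number has to be converted with the built-in `str` function before it can be joined to text.

```python
first = "Data"
second = "Science"

# Concatenation adds nothing on its own, so the words run together.
print(first + second)        # DataScience

# Add the space explicitly as its own String.
print(first + " " + second)  # Data Science

# A number must be converted with str() before it can be joined to text.
course = first + " " + str(8)
print(course)                # Data 8
```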

**Exercise 1**

Here's a small exercise for you. In the variable _expression_, create the String "today is a lovely day" using only the variables provided for letters.

_Hint: Remember to add spaces between words (double quotes with a space in between, like " ") because Python joins the text exactly as given and does not insert spaces for you._


In [8]:
a = "oda"
b = "is"
c = "a"
d = "l"
e = "t"
f = "y"
g = "lo"
h = "d"
i = "ve"

In [9]:
expression = ...  # Your Code Here
print(expression)

## String Methods

The String class is great for regular use because it comes equipped with many built-in functions. These functions, or **methods**, each take a String and return a transformed result. Here are some common String methods that may prove helpful.

### Replace
The *replace* method replaces all instances of some part of a String with a replacement. A method is invoked on a String by placing a dot after the String value, then the name of the method, and finally parentheses containing the arguments.

    <String>.<method name>(<argument>, <argument>, ...)

Try to predict the output of these examples, then execute them.

In [11]:
# Replace one letter
'Hello'.replace('e', 'i')

'Hillo'

In [12]:
# Replace a sequence of letters, which appears twice
'hitchhiker'.replace('hi', 'ma')

'matchmaker'

Calling a method on a String stored in a variable does not change the String stored in that variable. Strings cannot be modified in place, so the method returns a new String, and you have to save that result in a variable to keep it.

Remember, the *replace* method replaces **every** instance of the search text it finds.

In [13]:
sharp = 'edged'
hot = sharp.replace('ed', 'ma')
print('sharp =', sharp)
print('hot =', hot)

sharp = edged
hot = magma
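The same behavior can be seen when the result is not saved at all. The sketch below uses an example word chosen for illustration; it also shows the optional third argument of `replace`, which limits how many replacements are made.

```python
word = 'banana'

# Calling replace without saving the result leaves word unchanged.
word.replace('a', 'o')
print(word)                           # banana

# Reassign the name to keep the new String.
word = word.replace('a', 'o')
print(word)                           # bonono

# An optional third argument limits the number of replacements.
print('banana'.replace('a', 'o', 1))  # bonana
```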


### Split
Another very useful method is the **`split`** method. It takes a **separator String** and splits the original String at each occurrence of the separator, returning a list in which each element is one separated portion of the String.

A split call looks like this, where `<String>` is anything of the String class, whether a variable name that holds a String or text in quotes:

`<String>.split(<separator String>)`


Here, we split a sentence on spaces, so every individual word becomes an element of the resulting list.

In [15]:
"Another very useful method is the split method".split(" ")

['Another', 'very', 'useful', 'method', 'is', 'the', 'split', 'method']

In the example below, we have a String of numbers and split it on the comma followed by a space. We thus get a list of numbers, but they are in quotes, so they are still Strings. We can loop through every element in the list to convert each String into an int.

In [17]:
string_of_numbers = "1, 2, 3, 4, 5, 6, 7"
arr_of_strings = string_of_numbers.split(", ")
print(arr_of_strings) # Remember, these elements are still Strings!

arr_of_numbers = []
for s in arr_of_strings: # Loop through the list, converting each String to an int
    arr_of_numbers.append(int(s))
print(arr_of_numbers)

['1', '2', '3', '4', '5', '6', '7']
[1, 2, 3, 4, 5, 6, 7]


As you can see, the `split` method can be very handy when cleaning up and organizing data (a process known as _parsing_).
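As a further sketch of parsing, the example below splits a date-like String on a different separator and converts the pieces with the same loop pattern. The values and variable names are made up for illustration. It also shows that calling `split()` with no argument separates on any run of whitespace, which helps with unevenly spaced text.

```python
# Hypothetical line of data, for illustration only.
line = "2021-03-14"
parts = line.split("-")
print(parts)    # ['2021', '03', '14']

# Convert each piece to an int; leading zeros are dropped.
numbers = []
for part in parts:
    numbers.append(int(part))
print(numbers)  # [2021, 3, 14]

# With no argument, split() separates on any run of whitespace.
messy = "  too   many    spaces  "
print(messy.split())     # ['too', 'many', 'spaces']

# With " " as the separator, every single space counts, so empty Strings appear.
print(messy.split(" "))
```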

## Loops

What do you do when you have to perform the same task repeatedly? Let's say you have to say hi to someone five times. Would that require five lines of `print('hi')`? No! This is why coding is beautiful: it allows for automation and takes care of the repetitive work.

Loops repeat a task over and over until you get your desired output.

The most useful loop to know for the scope of this course is the **for** loop.

A for statement begins with the word *for*, followed by a name we want to give each item in the sequence, followed by the word *in*, and ending with an expression that evaluates to a sequence. The indented body of the for statement is executed once for each item in that sequence.

    for <variable> in <sequence>:
        <body of loop>
        
Let's see what happens with this for loop.

In [19]:
for each_character in "John DeNero":
    print(each_character)

J
o
h
n
 
D
e
N
e
r
o


In this case, we are looping through the characters inside of the String "John DeNero". At the first iteration of our for loop, the name each_character refers to the character "J", so "J" is printed. At the next iteration, it refers to the character "o", so that is printed, and so on.
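The greeting task from the start of this section can be sketched with the built-in `range` function, which produces a sequence of numbers to loop over. The second loop, with names and example text chosen for illustration, walks through the list that `split` returns and applies `replace` to each word.

```python
# range(5) produces 0, 1, 2, 3, 4, so the body runs five times.
for i in range(5):
    print("hi")

# A for loop can also walk through the list that split returns.
words = "coding is beautiful".split(" ")
changed = []
for word in words:
    changed.append(word.replace("i", "I"))
print(changed)  # ['codIng', 'Is', 'beautIful']
```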

**Exercise 2**

Write a for loop that iterates through the sentence "Hi, I am a quick brown fox and I jump over the lazy dog" and checks if each letter is an *a*. Print out the number of a's in the sentence.

_Hint: Try combining what you've learned about conditionals with a variable used as a counter._

In [None]:
# Your Code Here
for ...
    

## Conclusion
---
Congratulations! You have learned the basics of String manipulation in Python.

## Bibliography
---
Some examples adapted from the UC Berkeley Data 8 textbook, <a href="https://www.inferentialthinking.com">*Inferential Thinking*</a>.

Authors:
- Shriya Vohra
- Scott Lee
- Pancham Yadav