# Introduction to strings

Let's move on! Now we are going to learn about a couple of data structures that will continue to build up your power in Python.

## Objectives:

At the end of this notebook, you will be able to:

- do operations using strings
- address individual characters in strings
- describe different types of string formatting

Here we are going to learn about another common data type, **strings**. From a high-level perspective, a string is just a bit of text. This <br> could be text that you have read in from a file, html that you have pulled from the internet, or any other text. From Python's perspective,<br> a string (type ```str```) is simply a collection of encoded characters. Wait, what's an encoding...?

An encoding is just a fancy way of saying that the characters in our string are stored as bytes according to a certain format, or structure. The reason this <br> matters to us in terms of our Python programs is that Python needs to know which encoding to use whenever it converts between raw bytes and text<br> (common ones are ```ASCII``` and ```utf-8```, both of which map bytes to ```unicode``` characters). This isn't something you will run into often, and especially not when defining your own strings<br> (it's probably most prevalent when pulling text from the internet). However, it's worth noting because there is a good chance<br> that sometime in your Python career, you will end up with Python telling you it doesn't recognize a certain character in one of <br> your strings, and an unexpected encoding will most likely be at the heart of that error.
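To make the idea of an encoding a little more concrete, here is a minimal sketch that converts a string to bytes and back with the built-in ```encode()``` and ```decode()``` methods, and then shows the kind of error an unexpected encoding can produce. The sample word and the variable names are assumptions chosen purely for illustration.

```python
text = 'caf\u00e9'                  # the word "café"; \u00e9 is the accented e

encoded = text.encode('utf-8')      # str -> bytes, using the utf-8 encoding
print(encoded)                      # b'caf\xc3\xa9'
print(len(text), len(encoded))      # 4 5 (the accented e takes two bytes in utf-8)

decoded = encoded.decode('utf-8')   # bytes -> str, using the same encoding
print(decoded == text)              # True

# Decoding with the wrong encoding is where errors tend to come from.
try:
    encoded.decode('ascii')
except UnicodeDecodeError as err:
    print('Could not decode:', err)
```

If you ever see a ```UnicodeDecodeError``` like the one caught above, checking which encoding the text was actually written in is usually a good first step.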

In Python, strings are recognized as a collection of characters surrounded by a set of either single quotation marks (```'...'```) or <br> double quotation marks (```"..."```). So as long as you open and close your string with a **matching** set of single or double quotation marks, <br> you are free to use either. The one caveat is that if you are writing an expression with a single quotation mark in it (such as <br> "Don't do that"), the simplest option is to wrap it in a matching set of **double** quotation marks (the alternative is to escape the inner quotation mark with a backslash, as in ```'Don\'t do that'```). Let's experiment with some strings...

In [1]:
'This is a string.'

'This is a string.'

In [1]:
"This is another string, with double quotation marks."

'This is another string, with double quotation marks.'

In [2]:
'They had told me this already, but I didn't listen.' 

SyntaxError: invalid syntax (523989065.py, line 1)

Just like we expected, we can use both single and double quotation marks. What happened in the 3rd case there? Well, we opened the <br> string with a single quotation mark, and Python started looking for the next single quotation mark to close the string. When it found <br> that quotation mark in the word ```didn't```, it assumed the string was closed after ```didn```. As a result, this left ```t listen.'``` just <br> hanging out, and Python didn't know how to interpret that, resulting in our error. The solution to this, as mentioned above, is to use <br> double quotation marks in any case where your text will have single quotation marks in it. For example...

In [4]:
"Now that I've got double quotes, I can use all the contractions!"

"Now that I've got double quotes, I can use all the contractions!"

In [3]:
"Can't, won't, didn't, don't... all of them!"

"Can't, won't, didn't, don't... all of them!"

As a final note before we dive into string operations, we can store strings in variables in the exact same way that we can store an ```int```, <br> ```float```, or ```complex```.

In [4]:
my_str_variable = 'This is a string variable.' 

In [5]:
my_str_variable # my_str_variable holds the string that we put in it in the above cell. 

'This is a string variable.'
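Going back to quotation marks for a moment: double quotes are not the only way to get an apostrophe into a string. As a small sketch (the variable names are made up for illustration), a backslash in front of the inner quotation mark tells Python to treat it as an ordinary character, and both spellings produce the same string.

```python
# Escape the apostrophe with a backslash inside a single-quoted string...
escaped = 'They had told me this already, but I didn\'t listen.'
# ...or wrap the whole string in double quotes instead.
double_quoted = "They had told me this already, but I didn't listen."

print(escaped)                   # They had told me this already, but I didn't listen.
print(escaped == double_quoted)  # True
```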

### String Operations

Surprisingly, a couple of our standard mathematical operations will work on strings, namely ```+``` and ```*```. We can use the ```+``` operator to <br> add two strings together (this is known as string concatenation), and we can use the ```*``` operator to repeat a string a given number of <br> times. Let's take a look...

In [8]:
'My first string' + 'My second string'

'My first stringMy second string'

In [9]:
'Repeating string' * 3

'Repeating stringRepeating stringRepeating string'

Note that Python didn't put spaces between the strings with either the ```+``` operator or the ```*``` operator. Why not? Because it wasn't <br> told to! In this case, and in programming in general, we have to be extremely explicit about what we want the computer to do. To fix <br> this, we can add a space in the middle of the first case, and then add a space to the end of our string in the second case.

In [6]:
'My first string' + ' ' + 'My second string'

'My first string My second string'

In [7]:
'Repeating string ' * 3

'Repeating string Repeating string Repeating string '

That looks much better! But what about that pesky little space at the end of our second string, <br> ```'Repeating string Repeating string Repeating string '```? Is there a way to remove this? It turns out there is! One of the methods (a name for a function that is <br> attached to a particular object) that we can call on strings is the ```strip()``` method. Methods are something that we will cover in much <br> more depth later, but for now just note that we call them on our objects through **dot notation**. We simply place a ```.``` at the end of our <br> object (a ```str```, ```int```, ```float```, any variable, etc.), and then call the method by name. Here's how the use of this **dot notation** looks in <br> practice.

In [8]:
'Repeating string Repeating string Repeating string '.strip()

'Repeating string Repeating string Repeating string'

In [9]:
' Repeating string Repeating string Repeating string '.strip()

'Repeating string Repeating string Repeating string'

So, what did the ```strip()``` method do? In the first example, it removed the trailing space from the string. In the second example, it <br> removed both the leading and trailing spaces. This is exactly what the ```strip()``` method does - by default (without any arguments), it <br> removes leading and trailing whitespace (_note, the method can actually remove any leading or trailing characters if you pass them to <br>```strip()```, but whitespace is the default character that it removes_).
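As a rough illustration of passing characters to ```strip()```, the sketch below strips periods instead of whitespace. It also uses ```lstrip()``` and ```rstrip()```, two related string methods not covered above that strip only the left or only the right side. The sample strings are assumptions made up for this example.

```python
dotted = '...Repeating string...'
print(dotted.strip('.'))             # Repeating string

padded = '  Repeating string  '
# Square brackets are added only to make the remaining spaces visible.
print('[' + padded.lstrip() + ']')   # [Repeating string  ]
print('[' + padded.rstrip() + ']')   # [  Repeating string]
```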

Are there other things that we can do with strings? There are tons! Let's store our string in a variable below, so we can get some <br> exposure working with strings in variables.

In [10]:
my_str_variable = 'below IS my STRING to PLAY around WITH.'

In [11]:
my_str_variable.capitalize()

'Below is my string to play around with.'

In [12]:
my_str_variable.upper()

'BELOW IS MY STRING TO PLAY AROUND WITH.'

In [13]:
my_str_variable.lower()

'below is my string to play around with.'

In [14]:
my_str_variable.replace('STR', 'fl')

'below IS my flING to PLAY around WITH.'

In [15]:
my_str_variable.split()

['below', 'IS', 'my', 'STRING', 'to', 'PLAY', 'around', 'WITH.']

These are some of the most commonly used string methods. You can see above what they do by default: ```capitalize()``` capitalizes <br> the first letter of the string and lowercases the rest; ```upper()``` converts all the letters in the string to uppercase, and ```lower()``` to <br> lowercase; ```replace()``` replaces all instances of a given substring in your string with another given substring; finally, ```split()``` splits <br> the string into a list of pieces around a given separator string (whitespace by default, just as with ```strip()```). There are many more string methods available, and <br> you can check them out in the [docs](https://docs.python.org/3/library/stdtypes.html#string-methods).
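One detail that the cells above show but do not spell out: these methods return a **new** string and leave the original untouched, which is why ```my_str_variable``` kept its mixed capitalization from one cell to the next. Here is a small sketch of that behavior, along with ```split()``` given an explicit separator. The variable names are illustrative assumptions.

```python
original = 'below IS my STRING to PLAY around WITH.'

shouted = original.upper()      # the result is a new string...
print(shouted)                  # BELOW IS MY STRING TO PLAY AROUND WITH.
print(original)                 # below IS my STRING to PLAY around WITH.

# Passing a separator makes split() cut around that substring instead of whitespace.
pieces = original.split('my')
print(pieces)                   # ['below IS ', ' STRING to PLAY around WITH.']
```

To keep the result of a method call, assign it to a variable, as done with ```shouted``` here.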

### Working with individual characters in strings

We know how to work with an entire string via some of the methods that we've discussed, but what if we wanted to work with the <br> individual characters? There are a couple of ways to do this, but the first we'll focus on is through indexing. We know that to Python, a <br> string is just a collection of characters. It turns out that we can access the individual characters simply by asking Python for a given <br> numbered element in our collection (i.e. the string). We do this by placing the element number that we want in square brackets ```[]```, <br> right after our string (or variable, if it's stored in one). This element number is referred to as the **index** of the character (or element, if <br> it's not a string - more on this soon).

In [16]:
my_str_variable = 'This is my Test String'
my_str_variable[1]

'h'

In [17]:
my_str_variable[5]

'i'

In [18]:
my_str_variable[-1]

'g'

In [19]:
my_str_variable[-3]

'i'

Using indices like this, we can access any element of a string. But why is the element at index 1 ```h```, and not ```T```? After all, ```T``` is the first <br> element in the string. Also, what are those negative numbers doing? To answer the first question, it turns out that Python (like many <br> programming languages) starts indexing at 0, which means that the first element in our string (and any collection that supports <br> indexing) is accessed via index 0. We refer to languages that work this way as **zero indexed**. As for the negative numbers, this is <br> a way to access elements starting from the end of the string, rather than the beginning. Indexing from the end starts at -1 <br> and continues downwards from there. So, we would use ```-2``` to access the ```n``` in the string.
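The sketch below puts zero-based and negative indexing side by side on the same test string. It borrows the built-in ```len()``` function, which returns the number of characters in a string, to show that the last character sits at index ```len(...) - 1```. The name ```last_index``` is an assumption for illustration.

```python
my_str_variable = 'This is my Test String'

print(my_str_variable[0])     # T
print(my_str_variable[-2])    # n

last_index = len(my_str_variable) - 1
print(last_index)             # 21
print(my_str_variable[last_index] == my_str_variable[-1])   # True

# Asking for an index past the end of the string raises an IndexError.
try:
    my_str_variable[22]
except IndexError as err:
    print('IndexError:', err)
```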

Note that we can also access any given number of the characters (any **substring**) by combining multiple index numbers separated by <br> a colon ```:```. For example:

In [20]:
my_str_variable[1:3]

'hi'

In [21]:
my_str_variable[5:9]

'is m'

In [22]:
my_str_variable[-6:-1]

'Strin'

In [23]:
my_str_variable[1:]

'his is my Test String'

In [24]:
my_str_variable[:-1]

'This is my Test Strin'

This indexing turns out to be pretty useful. You might notice, though, that when indexing from ```[1:3]```, only the letters at index 1 and 2 <br> are returned; when indexing from ```[5:9]```, we get the letters at indices 5, 6, 7, and 8. This is because the indices that you pass in are <br> inclusive on the left side, and exclusive on the right side. This means that when you index, you will grab letters from the starting index <br> that you give up to but not including letters at the ending index that you give.

What about those last two examples, where there isn't an ending index or a starting one? If you don't give an ending index, then <br> Python slices all the way to the end of the string, including the last character. Similarly, if you don't give a starting index, Python assumes that <br> your starting index is the first index in the string. Remember, this is the zeroth index in Python (don't worry if this feels confusing, you'll <br> get used to it quickly).
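One way to convince yourself of these defaults is to write them out explicitly and compare. This is only a sketch: ```n``` is an illustrative variable holding the string's length, obtained with the built-in ```len()``` function.

```python
my_str_variable = 'This is my Test String'
n = len(my_str_variable)   # 22 characters, so valid indices run from 0 to 21

# Leaving out the end is the same as slicing up to the length of the string.
print(my_str_variable[1:] == my_str_variable[1:n])     # True

# Leaving out the start is the same as starting at index 0.
print(my_str_variable[:-1] == my_str_variable[0:-1])   # True

# Because the right side is exclusive, two slices that share an index fit together exactly.
print(my_str_variable[:5] + my_str_variable[5:] == my_str_variable)   # True
```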

Is there a way to grab elements at regular intervals in a string? For example, what if we wanted to grab every second letter? Python <br> allows us to do this by passing in an optional third number while indexing. This optional third number, also separated by a colon (```:```), <br> tells Python the step size by which to move through the string when indexing. So, if we wanted to grab every second letter from the <br> beginning to end, we could index with ```[::2]```. If we wanted to grab every 3rd letter starting from the letter at index 2 and stopping before index 10, <br> we could use the indexing ```[2:10:3]```.

In [25]:
my_str_variable[::2]

'Ti sm etSrn'

In [26]:
my_str_variable[2:10:3]

'iim'

Got it? Enough indexing already! Is there a way to cycle (or step) through each one of the letters one by one, and do something with the <br> conditional logic we learned, rather than just grabbing a certain letter or group of letters? Of course! (Why would I ask a question for <br> which the answer was no? That would be lame.)

### Iteration and Strings

We can cycle through all of the letters in our string (a process called **iteration**) in one of a couple of different ways. Let's first look at <br> cycling through with a ```while``` loop.

In [29]:
my_str, idx = 'hello', 0
while idx < 5:
    print(my_str[idx])
    idx += 1

h
e
l
l
o


This ```while``` loop will **iterate** over the letters of our string ```hello```, printing each one until ```idx``` reaches the value 5, at which point the condition fails and the loop stops. Since we knew the <br> length of our string (i.e., it's 5 letters long), we knew that we could use the condition ```while idx < 5:``` for our loop checking, and <br> ensure that all the letters would be printed. What if we didn't know the length ahead of time, though? There is actually a function that <br> we can use to figure this out (we'll talk much more about functions and how they work later). It's ```len()```, and we simply call ```len()``` <br> with our string passed as an argument, and it returns the length of our string.

In [30]:
my_str = 'hello'
len(my_str)

5

Now, we can write our while loop to be a little bit more general:

In [31]:
my_str, idx = 'hello', 0
while idx < len(my_str):
    print(my_str[idx])
    idx += 1

h
e
l
l
o


As this is a Codealong, you will learn more by typing the code yourself. The next line is there for you to do exactly that and see the <br> result.

Great! But we did mention that there are other ways to iterate over the letters in our string.

The other way that we can iterate over the letters in our string is to use a ```for``` loop. ```for``` loops are built on the same idea as <br>```while``` loops (doing something over and over again), but instead of continuing until some condition is no longer met, ```for``` loops <br> operate directly on iterables (objects, such as strings, that can hand out their elements one at a time). This leaves the concern about when to stop for Python to figure out. With a ```for``` loop, we don't have to <br> care how many iterations/cycles the loop will go through. Let's look at the syntax of a ```for``` loop.

In [32]:
my_str = 'hello'
for idx in range(len(my_str)):
    print(my_str[idx])

h
e
l
l
o


**Note:** the ```range()``` function (which we will cover in more depth when we get to functions) as used above simply gives us a sequence of <br> integers from 0 up to but not including the inputted number. In the case above, since ```len(my_str)``` is 5, ```range(len(my_str))``` <br> produces the integers from 0 to 4.

This ```for``` loop does the exact same thing as the ```while``` loop we wrote above, but with slightly different syntax. How does it work? At <br> each iteration of the loop, ```idx``` is assigned one of the values in ```range(len(my_str))```, and then the code within the indented block <br> is run with that value of ```idx```. How does Python know what the values of ```idx``` will be? Python simply goes through the values of <br> whatever comes after the ```in``` keyword **in order**, and assigns those values to ```idx```, one at a time through each iteration of the loop. Since <br> ```range(len(my_str))``` produces the integers from 0 to 4, those values get assigned to ```idx``` as we run through the ```for``` <br> loop.
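To see the values that ```idx``` takes on, one option is to wrap the ```range()``` call in the built-in ```list()``` function, which collects its values so they can be printed. This is a sketch for inspection only: the loop itself does not need ```list()```, and the name ```indices``` is an assumption.

```python
my_str = 'hello'

indices = list(range(len(my_str)))
print(indices)               # [0, 1, 2, 3, 4]

# Printing idx next to the letter shows the pairing at each iteration.
for idx in range(len(my_str)):
    print(idx, my_str[idx])  # 0 h, then 1 e, 2 l, 3 l, 4 o (one pair per line)
```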

Note that with our ```for``` loop, the ```idx``` variable is automatically changed, rather than us having to manually update it (like we did in  <br> the ```while``` loop). This is one of the incredibly nice aspects of ```for``` loops! But wait, it gets even better!

It turns out that the above implementation of our ```for``` loop is actually considered to be non-Pythonic. This is because the way that <br> ```for``` loops are constructed allows us to achieve the same output as above by writing the following:

In [33]:
my_str = 'hello'
for char in my_str:
    print(char)

h
e
l
l
o


What's going on here!? Well, instead of iterating over all of the integers in a ```range(len(my_str))``` call like we did in our first ```for``` <br> loop, we've gotten Python to simply iterate over all of the individual characters in our string, ```my_str```. In each iteration of this ```for``` <br> loop, ```char``` stores a different letter of ```my_str```, and then the call ```print(char)``` prints that character. In the end, we get the same <br> result as either of our ```while``` loops above, and the less Pythonic ```for``` loop that we wrote above. This way is considered to be the <br> Pythonic way to iterate over a string, and so it's an important concept to grasp.

Why is it more Pythonic? That's a good question. When we say that something is more "Pythonic", we mean that it uses the <br> language in a way that makes the code more readable while also using Python's power to make the solution <br> more optimal. Let's look at how this applies to the final implementation of our ```for``` loop.

We can see that it is more readable since we don't have to index into our string anymore. This means that there is less to follow along <br> with and keep track of; rather than keeping track of both the current index we are on and what letter that index corresponds to in our <br> string, all we have to keep track of is the current letter we're on. We can also note that our code just looks cleaner and simpler, <br> too. In terms of making our code more optimal, since we no longer have to index into the string to grab characters, we have fewer <br> steps in each iteration of the loop. This means less work for Python to do.
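Iterating letter by letter becomes more useful once it is combined with conditional logic. As a minimal sketch, the loop below counts the vowels in a string. The counter ```vowel_count``` is an illustrative name, and the check relies on the ```in``` operator, which, when used between two strings, tests whether the left one appears inside the right one.

```python
my_str = 'hello'
vowel_count = 0

for char in my_str:
    # Only count the letter if it is one of the five vowels.
    if char in 'aeiou':
        vowel_count += 1

print(vowel_count)   # 2
```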

### A quick aside on String formatting

There's one more thing that we should talk about before moving on from our discussion of strings - string formatting. String formatting <br> is going to allow us to format strings in certain ways. Probably most usefully, it's going to allow us to insert variable contents into <br> strings dynamically. We'll get an idea of how and when this is most useful as we work through this course. For now, let's just look at <br> the syntax of it all.

In [34]:
my_name = 'Harry' 

In [35]:
print('Hello %s' % my_name)

Hello Harry


In [36]:
print('Hello {}'.format(my_name))

Hello Harry


In [37]:
print(f'Hello {my_name}')

Hello Harry


How is this working? Well, in each case, it's filling in a given part of our string with the value of our variable. In the first case, we use a <br> ```%``` sign to denote where the replacement should happen, followed by a letter to denote what type of value will be passed in there (```s``` is<br> used for a string, ```d``` for a decimal integer, etc.). You can find what each letter denotes [here](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting). In the second case, we use curly braces ```{}``` to denote<br> where the replacement should take place, and pass the value to the ```format()``` method. In the third case, the ```f``` prefix creates an f-string, which lets us write the variable name directly inside the braces. With ```format()```, we can also place numbers (positions) or names inside the braces and<br> reference them in the call to ```format()```.

In [38]:
print('Hello {0}'.format(my_name))

Hello Harry


In [40]:
print('Hello {name}'.format(name=my_name))

Hello Harry


There are many more things you can do with it - you can read about them [here](https://docs.python.org/3/library/string.html#format-specification-mini-language).<br> In general, string formatting is much more readable and dynamic than a bunch of concatenation.
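To give a feel for the readability difference, here is a rough comparison of concatenation and formatting, followed by one small use of the format specification mini-language (```.2f``` rounds to two decimal places). The ```age``` and ```price``` values are made up for illustration, and ```str()``` is the built-in function that converts a number to a string so it can be concatenated.

```python
my_name = 'Harry'
age = 11           # hypothetical value, for illustration only
price = 3.14159    # hypothetical value, for illustration only

# Concatenation needs explicit spaces and an explicit str() conversion.
concatenated = 'Hello ' + my_name + ', you are ' + str(age) + ' years old.'
# The f-string states the same thing in one readable piece.
formatted = f'Hello {my_name}, you are {age} years old.'
print(concatenated == formatted)               # True

# The same idea with the % operator, passing several values as a tuple.
print('%s is %d years old' % (my_name, age))   # Harry is 11 years old

# A format specification after the colon controls how the value is displayed.
print('{:.2f}'.format(price))                  # 3.14
print(f'{price:.2f}')                          # 3.14
```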

## Check your understanding

**String Operation Questions:**

1. When does the distinction between using single and double quotes to build a string matter?
2. Fix the following string to be considered valid and not throw an error when run.
    - ```'You already told me not to do this, but I don't want to listen.'```
3. Create a variable that holds a string of your name.
4. Create another variable that holds a string of your best friend's name.
5. Now, use string concatenation (i.e., addition) to add 'Hello, ' before your name.
6. Given the string 'Hello, Sean', replace each letter 'e' with a 't'.
7. Use the ```.split()``` method on the string 'Hello, Sean' to split it by the comma (```','```).
    - What happens if you split by a comma and a space (```', '```)?
8.  Write a Python program that swaps the first two characters of two given strings and joins the results into a single string,<br> separated by a space.
    - Sample String : 'abc', 'xyz'
    - Expected Result : 'xyc abz'

In [2]:
#1 The distinction between single (') and double (") quotes matters when the string itself contains one of those quote characters.
#2 The string opens and closes with single quotes but also contains a single quote inside (don't), which causes a syntax error.
#  Wrapping it in double quotes fixes it.
fixed = "You already told me not to do this, but I don't want to listen."
print(fixed)

#3 and #4
name = "sruthi"
best_frd = "Noel"
print(name)
print("my best friend is", best_frd)

#5
greeting = "Hello, " + name
print(greeting)

#6
txt = 'Hello, Sean'
new_text = txt.replace('e', 't')
print(new_text)

#7
text = 'Hello, Sean'
print(text.split(','))    # splitting by the comma alone keeps the leading space on ' Sean'
print(text.split(', '))   # splitting by comma and space removes it

#8

# Given strings
str1 = 'abc'
str2 = 'xyz'

# Swap the first two characters
new_str1 = str2[:2] + str1[2:]
new_str2 = str1[:2] + str2[2:]

# Combine with a space
result = new_str1 + ' ' + new_str2

print(result)

You already told me not to do this, but I don't want to listen.
sruthi
my best friend is Noel
Hello, sruthi
Htllo, Stan
['Hello', ' Sean']
['Hello', 'Sean']
xyc abz
