## Handy stuff: Strings

Python strings are just pieces of text.


In [1]:
name = 'Now let\'s explore novels' 

In [2]:
name

"Now let's explore novels"

So far we know how to add them together.

In [4]:
"Harry Potter" + " And the Chamber of Secrets"

'Harry Potter And the Chamber of Secrets'

We also know how to repeat them multiple times.

In [5]:
name*3

"Now let's explore novelsNow let's explore novelsNow let's explore novels"

Python strings are immutable. That's just a fancy way to say that they cannot be changed in-place, and we need to create a new string to change them. Even some_string += another_string creates a new string. Python will treat that as some_string = some_string + another_string, so it creates a new string and assigns it back to the same variable.

Adding with + and repeating with * are nice, but what else can we do with strings?

In [6]:
our_string = "Hello World!"

### Slicing

Slicing is really simple. It just means getting a part of the string. It helps to think of the numbers as places between the characters, with place 0 before the first character. For example, to get all characters between place 2 and place 5, we can do this:

In [8]:
our_string[2:5]

'llo'

So the syntax is like some_string[start:end].

![image.png](attachment:image.png)

But what happens if we slice with negative values?

In [9]:
our_string[-5:-2]

'orl'

It turns out that slicing with negative values simply starts counting from the end of the string.

![image.png](attachment:image.png)
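One way to convince ourselves of this is to compare a negative slice with the same slice written using positive numbers. This is only a small sketch; it uses the len function, which is introduced later in this section, and the variable names are made up for the illustration.

```python
our_string = "Hello World!"
length = len(our_string)          # 12 characters

# Counting 5 and 2 places back from the end...
negative_slice = our_string[-5:-2]

# ...is the same as counting forward to places 7 and 10.
positive_slice = our_string[length - 5:length - 2]

print(negative_slice)                    # orl
print(positive_slice)                    # orl
print(negative_slice == positive_slice)  # True
```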

If we don't specify the beginning it defaults to 0, and if we don't specify the end it defaults to the length of the string. For example, we can get everything except the first or last character like this:

In [10]:
our_string[1:]

'ello World!'

In [11]:
our_string[:-1]

'Hello World'

Remember that strings can't be changed in-place.

In [12]:
our_string[:5] = 'Howdy'

TypeError: 'str' object does not support item assignment
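If we really want the result that the failed assignment was trying to produce, one option is to build a new string out of the replacement text and a slice of the old string. This is just a sketch of that idea, and new_string is a name picked for the example.

```python
our_string = "Hello World!"

# Glue the new beginning to everything after the first five characters.
new_string = "Howdy" + our_string[5:]

print(new_string)   # Howdy World!
print(our_string)   # Hello World!  (the original is untouched)
```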

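### Indexing

Indexing with a single number gets one character instead of a part of the string. Counting starts at 0, so index 1 is the second character.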

In [13]:
our_string[1]

'e'

### String Methods

Python's strings have many useful methods. The official documentation covers them all, but I'm going to just show some of the most commonly used ones briefly. Python also comes with built-in documentation about the string methods and we can run help(str) to read it. We can also get help about one string method at a time, like help(str.upper).

Again, nothing can modify strings in-place. Most string methods return a new string, but things like our_string = our_string.upper() still work because the new string is assigned to the old variable.

Also note that all of these methods are used like our_string.stuff(), not like stuff(our_string). The idea is that the string itself knows how to do all these things, so we don't need a separate function like stuff(our_string). We'll learn more about methods later.
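Here is a small sketch of what "returns a new string" means in practice. The variable message is made up for this example. Calling a method without saving the result leaves the variable as it was.

```python
message = "hello"

message.upper()              # makes 'HELLO', but we throw it away
print(message)               # hello

message = message.upper()    # save the new string to the same variable
print(message)               # HELLO
```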

In [14]:
our_string.upper()

'HELLO WORLD!'

In [15]:
our_string.lower()

'hello world!'

In [16]:
our_string.startswith('Hello')

True

In [17]:
our_string.endswith('World!')

True

In [18]:
our_string.endswith('world!')  # Python is case-sensitive

False

In [19]:
our_string.replace('World', 'there')

'Hello there!'

In [20]:
our_string.replace('o', '@', 1)   # only replace the first o

'Hell@ World!'

In [21]:
'  hello 123  '.lstrip()    # left strip

'hello 123  '

In [22]:
'  hello 123  '.rstrip()    # right strip

'  hello 123'

In [23]:
'  hello 123  '.strip()     # strip from both sides

'hello 123'

In [24]:
'  hello abc'.rstrip('cb')  # strip c's and b's from right

'  hello a'

In [25]:
our_string.ljust(30, '-')

'Hello World!------------------'

In [26]:
our_string.rjust(30, '-')

'------------------Hello World!'

In [27]:
our_string.center(30, '-')

'---------Hello World!---------'

In [28]:
our_string.count('o')   # it contains two o's

2

In [29]:
our_string.index('o')   # the first o is our_string[4]

4

In [30]:
our_string.rindex('o')  # the last o is our_string[7]

7

In [31]:
'-'.join(['hello', 'world', 'test'])

'hello-world-test'

In [32]:
'hello-world-test'.split('-')

['hello', 'world', 'test']

In [33]:
our_string.upper()[3:].startswith('LO WOR')  # combining multiple things

True

The things in square brackets that the split method gave us and we gave to the join method were lists. We'll talk more about them later.

### String formatting

To add a string in the middle of another string, we can do something like this:

In [39]:
name = 'Dewang'
'My name is ' + name + '.'

'My name is Dewang.'

But that gets complicated if we have many things to add.

In [40]:
channel = '##learnpython'
network = 'slack'
"My name is " + name + " and I'm on the " + channel + " channel on " + network + "."

"My name is Dewang and I'm on the ##learnpython channel on slack."

Instead it's recommended to use string formatting. It means putting other things in the middle of a string.

Python has multiple ways to format strings. One is not necessarily better than the others; they are just different. Here are a few ways to solve our problem:

+ .format()-formatting, also known as new-style formatting. This formatting style has a lot of features, but it's a little bit more typing than %s-formatting.

+ %s-formatting, also known as old-style formatting. This has fewer features than .format()-formatting, but 'Hello %s.' % name is shorter and faster to type than 'Hello {}.'.format(name). I like to use %s formatting for simple things and .format when I need more powerful features.

In [41]:
"Hello {}.".format(name)

'Hello Dewang.'

In [42]:
"My name is {} and I'm on the {} channel on {}.".format(name, channel, network)

"My name is Dewang and I'm on the ##learnpython channel on slack."

In [43]:
"Hello %s." % name

'Hello Dewang.'

In [44]:
"My name is %s and I'm on the %s channel on %s." % (name, channel, network)

"My name is Dewang and I'm on the ##learnpython channel on slack."

In the second example we had (name, channel, network) on the right side of the % sign. It was a tuple, and we'll talk more about them later.

If the value we're formatting may itself be a tuple, we need to wrap it in another tuple when formatting. Otherwise % treats each item of the tuple as a separate value to put into the string:



In [46]:
thestuff = (1, 2, 3)
"we have %s" % thestuff

TypeError: not all arguments converted during string formatting

In [47]:
"we have %s and %s" % ("hello", thestuff)

'we have hello and (1, 2, 3)'
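When the tuple is the only thing we want to format, the wrapping can be done with a one-item tuple. The sketch below reuses thestuff from above; the names wrapped and with_format are just for illustration. It also shows that .format() doesn't have this problem.

```python
thestuff = (1, 2, 3)

# (thestuff,) is a tuple that contains one item: our tuple.
wrapped = "we have %s" % (thestuff,)
print(wrapped)        # we have (1, 2, 3)

# .format() takes the tuple as one argument, so no wrapping is needed.
with_format = "we have {}".format(thestuff)
print(with_format)    # we have (1, 2, 3)
```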

f-strings need even less typing, but they only work in Python 3.6 and newer.

In [48]:
f"My name is {name} and I'm on the {channel} channel on {network}."

"My name is Dewang and I'm on the ##learnpython channel on slack."

In [49]:
'Three zeros and number one: {:04d}'.format(1)

'Three zeros and number one: 0001'

In [50]:
'Three zeros and number one: %04d' % 1

'Three zeros and number one: 0001'
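The 04d part in the two examples above means "a whole number, at least 4 characters wide, padded with zeros". The same kind of instruction also works inside an f-string after a colon. This is a small sketch assuming Python 3.6 or newer; the variable names are made up.

```python
number = 1

# Same zero-padding as above, written as an f-string.
padded = f'Three zeros and number one: {number:04d}'
print(padded)       # Three zeros and number one: 0001

# .2f means "a float with two digits after the decimal point".
rounded = f'{3.14159:.2f}'
print(rounded)      # 3.14
```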

We can use in and not in to check if a string contains another string.

In [51]:
our_string = "Hello World!"

In [52]:
"Hello" in our_string

True

In [53]:
"Python" in our_string

False

In [54]:
"Python" not in our_string

True

We can get the length of a string with the len function.

In [55]:
len(our_string)

12

In [56]:
len('')     # no characters

0

In [57]:
len('\n')    # python thinks of \n as one character

1

We can convert between strings, integers and floats with str, int and float. They aren't actually functions, but they behave a lot like functions. We'll learn more about what they really are later.

In [58]:
str(3.14)

'3.14'

In [59]:
float('3.14')

3.14

In [60]:
str(123)

'123'

In [61]:
int('123')

123

Giving an invalid string to int or float produces an error message.



In [62]:
int('lol')

ValueError: invalid literal for int() with base 10: 'lol'

In [63]:
float('hello')

ValueError: could not convert string to float: 'hello'
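If we don't want an invalid string to stop the whole program, one possible approach is to catch the ValueError. The sketch below uses def and try/except, which this section hasn't covered yet, and to_int_or_none is a hypothetical helper name, not something built into Python.

```python
def to_int_or_none(text):
    """Return text converted to an int, or None if it isn't a valid number."""
    try:
        return int(text)
    except ValueError:
        # int() could not make sense of the string.
        return None

print(to_int_or_none('123'))   # 123
print(to_int_or_none('lol'))   # None
```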

## Lists and tuples

### Why should we use lists?

Sometimes we may end up doing something like this.

In [66]:
avenger1 = 'IronMan'
avenger2 = 'Hulk'
avenger3 = 'Thor'
avenger4 = 'Captain America'
avenger5 = 'Hawkeye'

name = input("Enter your name: ")
if name == avenger1 or name == avenger2 or name == avenger3 or name == avenger4 or name == avenger5:
    print("Hey! An Avenger")
else:
    print("Nopes, Not an avenger")

Enter your name: Spiderman
Nopes, Not an avenger


This code works just fine, but there's a problem. The name check is repetitive, and adding a new name requires adding even more repetitive, boring checks.

Instead of adding a new variable for each name it might be better to store all names in one variable. This means that our one variable needs to point to multiple values. An easy way to do this is using a list:

In [67]:
avengers = ['IronMan', 'Hulk', 'Thor', 'Captain America', 'Hawkeye']

In [68]:
avengers

['IronMan', 'Hulk', 'Thor', 'Captain America', 'Hawkeye']

In [69]:
len(avengers)

5

In [70]:
avengers + ['Black Widow']   # create a new list with Black Widow in it

['IronMan', 'Hulk', 'Thor', 'Captain America', 'Hawkeye', 'Black Widow']

In [71]:
['SpiderMan', 'Not an Avenger'] * 2    # repeating

['SpiderMan', 'Not an Avenger', 'SpiderMan', 'Not an Avenger']

With strings, indexing and slicing both returned a string. With lists, we get a new list when we're slicing and an element from the list when we're indexing.

In [72]:
avengers[:2]

['IronMan', 'Hulk']

In [73]:
avengers[0]

'IronMan'

If we want to check if the program knows a name all we need to do is to use the in keyword.

In [74]:
'Spiderman' in avengers

False

In [75]:
'Hawkeye' in avengers

True
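With in, the repetitive name check from the beginning of this chapter shrinks to a single condition. This is a sketch of how that program could look; the name is hard-coded here instead of coming from input so that the example runs on its own, and message is a variable added for the illustration.

```python
avengers = ['IronMan', 'Hulk', 'Thor', 'Captain America', 'Hawkeye']

name = 'Spiderman'   # stand-in for input("Enter your name: ")

# One check replaces the long chain of == and or.
if name in avengers:
    message = "Hey! An Avenger"
else:
    message = "Nopes, Not an avenger"

print(message)       # Nopes, Not an avenger
```

Adding a new Avenger now only means adding one more item to the list.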

But in only looks for a single item. We can't use it to check whether every name in another list is in our list, because Python then looks for that whole list as one item:

In [76]:
['IronMan','Thor'] in avengers

False

In [77]:
['IronMan'] in avengers

False
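If we do need to check several names at once, one possible way is to check each name separately and combine the results with the built-in all function. This sketch uses a generator expression, which hasn't been covered here yet, and the names wanted, all_known and some_unknown are made up for the example.

```python
avengers = ['IronMan', 'Hulk', 'Thor', 'Captain America', 'Hawkeye']
wanted = ['IronMan', 'Thor']

# all(...) is True only if every single check is True.
all_known = all(name in avengers for name in wanted)
print(all_known)      # True

some_unknown = all(name in avengers for name in ['IronMan', 'Spiderman'])
print(some_unknown)   # False
```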



Lists have a few useful methods. Some of the most commonly used ones are append, extend and remove. append adds an item to the end of a list, extend adds multiple items from another list and remove removes an item.

In [78]:
avengers

['IronMan', 'Hulk', 'Thor', 'Captain America', 'Hawkeye']

In [79]:
avengers.remove("Hawkeye")

In [80]:
avengers.remove("Thor")

In [81]:
avengers

['IronMan', 'Hulk', 'Captain America']

In [82]:
avengers.append("Black Widow")

In [83]:
avengers

['IronMan', 'Hulk', 'Captain America', 'Black Widow']

In [84]:
avengers.extend(["Hawkeye","Thor"])

In [85]:
avengers

['IronMan', 'Hulk', 'Captain America', 'Black Widow', 'Hawkeye', 'Thor']

Note that remove removes only the first match it finds.

In [86]:
avengers = ['IronMan','IronMan','IronMan']
avengers.remove('IronMan')
avengers

['IronMan', 'IronMan']

We can also use slicing and indexing to change the content:

In [87]:
avengers[1]='Captain America'


In [88]:
avengers

['IronMan', 'Captain America']
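The example above changed one item with indexing. Assigning to a slice works too, and it replaces a whole part of the list with the items of another list. Here is a small sketch that starts from a fresh list so it can be run on its own.

```python
avengers = ['IronMan', 'Hulk', 'Thor']

# Replace the first two items with two other items.
avengers[:2] = ['Black Widow', 'Hawkeye']
print(avengers)    # ['Black Widow', 'Hawkeye', 'Thor']

# The new part doesn't need to be the same length as the old part.
avengers[1:] = ['Vision']
print(avengers)    # ['Black Widow', 'Vision']
```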

As you can see, lists can be changed in-place. In other words, they are mutable. Integers, floats, strings and many other built-in types can't be changed in-place, so they are immutable.

With strings we did something to them and then set the result back to the same variable, like message = message.strip(). This just doesn't work right with most mutable things because they're designed to be changed in-place.

In [89]:
avengers = avengers.remove('Captain America')

In [90]:
print(avengers)

None


This is the same thing that happened way back when we assigned print's return value to a variable.
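So with list methods like remove we just call the method and don't assign anything. The sketch below shows both what remove returns and what it does to the list; result is a variable added only to make the return value visible.

```python
avengers = ['IronMan', 'Captain America']

# remove changes the list in-place and returns None.
result = avengers.remove('Captain America')

print(result)      # None
print(avengers)    # ['IronMan']
```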

### What is What?

After working with lists a while you'll find out that they behave like this:

In [92]:
a = [1,2,3]
b = a
b.append(4)
a

[1, 2, 3, 4]

This can be confusing at first, but it's actually easy to explain. The b = a line doesn't copy the list. It just makes b point to the same list as a, so appending through b also changes what we see through a. If we draw a picture of the variables it looks like this:

![image.png](attachment:image.png)

This is where the is keyword comes in. It can be used to check if two variables point to the same thing.

In [93]:
a is b

True

Typing [] creates a new list every time.

In [94]:
[] is []

False

In [95]:
[1,2,3] is [1,2,3]

False

If we need a new list with similar content we can use the copy method.

In [96]:
a = [1,2,3]
b = a.copy()
b is a

False

In [97]:
b.append(4)

In [98]:
b

[1, 2, 3, 4]

In [99]:
a

[1, 2, 3]

If we draw a picture of our variables in this example it looks like this:

![image.png](attachment:image.png)
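The copy method isn't the only way to get a new list with the same content. As far as I know, slicing the whole list and calling list on it do the same job, so this sketch shows both. The variable c is added just for the example.

```python
a = [1, 2, 3]
b = a[:]        # a slice of everything is a new list
c = list(a)     # list(...) also builds a new list

b.append(4)
c.append(5)

print(a)                 # [1, 2, 3]
print(b)                 # [1, 2, 3, 4]
print(c)                 # [1, 2, 3, 5]
print(b is a, c is a)    # False False
```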

### Tuples

Tuples are a lot like lists, but they're immutable so they can't be changed in-place. We create them like lists, but with () instead of [].

In [100]:
thing = (1,2,3)
thing

(1, 2, 3)

In [101]:
thing = ()
thing

()

If we need to create a tuple that contains only one item we need to use (item,) instead of (item) because (item) is used in places like (1 + 2) * 3.

In [102]:
(3)

3

In [103]:
(3,)

(3,)

In [104]:
(1 + 2)*3

9

In [105]:
(1+2,)*3

(3, 3, 3)



It's also possible to create tuples by just separating things with commas and adding no parentheses. Personally I don't like this feature, but some people like to do it this way.

In [106]:
1,2,3

(1, 2, 3)

In [107]:
'hello',

('hello',)

Tuples don't have methods like append, extend and remove because they can't change themselves in-place.

In [108]:
stuff = (1,2,3)
stuff.append(4)

AttributeError: 'tuple' object has no attribute 'append'
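Just like with strings, we can still get a tuple with more items by creating a new tuple. A minimal sketch, where more_stuff is a name made up for the example:

```python
stuff = (1, 2, 3)

# + creates a new tuple; note the comma in the one-item tuple (4,).
more_stuff = stuff + (4,)

print(stuff)         # (1, 2, 3)
print(more_stuff)    # (1, 2, 3, 4)
```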

The cultural difference between lists and tuples is about how they are actually used: lists are used where you have a homogeneous sequence of unknown length; tuples are used where you know the number of elements in advance because the position of each element is semantically significant.

For example, suppose you have a function that looks in a directory for files matching *.py. It should return a list, because you don't know how many you will find, and all of them are the same semantically: just another file that you found.

<code>

find_files("*.py")
["control.py", "config.py", "cmdline.py", "backward.py"]

</code>
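find_files is not a real built-in function; it's only an example name. One way such a function could be written is with the standard glob module. The sketch below is an assumption about how it might work, and the directory parameter is something added for the illustration.

```python
import glob
import os

def find_files(pattern, directory="."):
    """Return a sorted list of file names in directory that match pattern."""
    matches = glob.glob(os.path.join(directory, pattern))
    # Keep only the file names, without the directory part.
    return sorted(os.path.basename(path) for path in matches)

# The result is always a list, no matter how many files were found.
py_files = find_files("*.py")
print(py_files)
```

The list may be empty or contain hundreds of names, which is exactly why a list fits here better than a tuple.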

On the other hand, let's say you need to store five values to represent the location of weather observation stations: id, city, state, latitude, and longitude. A tuple is right for this, rather than a list:

In [1]:
denver = (44, "Denver", "CO", 40, 105)
denver[1]

'Denver'
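Because every position in this tuple has its own meaning, it's common to give each position a name by unpacking the tuple into variables. This is a small sketch of that; the variable names on the left are chosen for the example.

```python
denver = (44, "Denver", "CO", 40, 105)

# One variable for each position; the number of names must match the tuple.
station_id, city, state, latitude, longitude = denver

print(city)                   # Denver
print(latitude, longitude)    # 40 105
```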