# Manipulating Strings

Strings are what allow you to store text. You can use them in many ways, and Python gives you plenty of tools to work with them and perform all kinds of manipulations.

After going through this notebook, you will be able to use all of these useful features:

1. Concatenating strings
1. Lengths and counts
1. Finding substrings
1. Formatting strings
1. Joining a list into a string
1. Splitting a string into a list
1. Replacing parts of a string
1. Slicing strings
1. More functions on strings

## 1. Concatenating strings

Let's say you want to concatenate multiple strings, meaning that you take two or more strings and glue them together to form one new string.

In Python, you can simply use the **plus sign +** for concatenating strings.

In [22]:
cats = 'Cats'
dogs = 'Dogs'

print(cats)
print(dogs)

Cats
Dogs


In [114]:
cats_and_dogs = cats + ' and ' + dogs

print(cats_and_dogs)

Cats and Dogs


In [115]:
cats_and_dogs += ' are not always friends.'

print(cats_and_dogs)

Cats and Dogs are not always friends.
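Note that `+` only glues strings to strings. The sketch below assumes you want to put a number into a sentence (the variable `lives` is just for illustration): mixing a string and an integer with `+` raises a `TypeError`, so you convert the number with `str()` first.

```python
cats = 'Cats'
lives = 9  # an integer, not a string

try:
    # Mixing str and int with + raises TypeError
    sentence = cats + ' have ' + lives + ' lives.'
except TypeError:
    # Convert the number with str() before concatenating
    sentence = cats + ' have ' + str(lives) + ' lives.'

print(sentence)  # Cats have 9 lives.
```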


## 2. Lengths and counts

### Length

Python has a built-in function, len(), that returns the length (the number of items) of an object. The argument may be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set). ([Doc](https://docs.python.org/3/library/functions.html#len))

In [121]:
number_of_characters = len(cats_and_dogs)

print('Length:', number_of_characters)

Length: 37


In [129]:
print(len('Python 3 is great!'))

18


In [131]:
print(len(''))

0
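As the documentation says, len() is not limited to strings. Here is a quick sketch with a few other types. The variable names are only for illustration. The last two lines show that for strings len() counts characters, not the bytes of an encoded version.

```python
items = [1, 2, 3]
ages = {'Alice': 31, 'Bob': 27}
letters = {'x', 'y', 'x'}  # duplicates are dropped in a set
word = 'naïve'

print(len(items))                  # 3 items in the list
print(len(ages))                   # 2 keys in the dictionary
print(len(letters))                # 2 unique elements
print(len(range(10)))              # 10
print(len(word))                   # 5 characters
print(len(word.encode('utf-8')))   # 6 bytes: 'ï' takes two bytes in UTF-8
```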


### Count

To count the occurrences of a character or a substring, use the count() method available on strings.

In [150]:
game_rule = 'Scissors cuts paper, paper covers rock, rock crushes lizard, lizard poisons Spock, Spock smashes scissors, scissors decapitates lizard, lizard eats paper, paper disproves Spock, Spock vaporizes rock, and as it always has, rock crushes scissors.'
print(game_rule)

Scissors cuts paper, paper covers rock, rock crushes lizard, lizard poisons Spock, Spock smashes scissors, scissors decapitates lizard, lizard eats paper, paper disproves Spock, Spock vaporizes rock, and as it always has, rock crushes scissors.


In [154]:
print('Word "rock" appears %d time(s).' % game_rule.count('rock'))
print('Word "paper" appears %d time(s).' % game_rule.count('paper'))
print('Word "scissors" appears %d time(s).' % game_rule.count('scissors'))

Word "rock" appears 4 time(s).
Word "paper" appears 4 time(s).
Word "scissors" appears 3 time(s).


In [156]:
# Be careful: count() is case-sensitive!
print('Word "Rock" appears %d time(s).' % game_rule.count('Rock'))
print('Word "Paper" appears %d time(s).' % game_rule.count('Paper'))
print('Word "Scissors" appears %d time(s).' % game_rule.count('Scissors'))

Word "Rock" appears 0 time(s).
Word "Paper" appears 0 time(s).
Word "Scissors" appears 1 time(s).


In [157]:
print('There are %d spaces.' % game_rule.count(' '))

There are 34 spaces.


In [159]:
print("There are %d A's." % game_rule.count('A'))
print("There are %d a's." % game_rule.count('a'))

There are 0 A's.
There are 18 a's.
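Because count() is case-sensitive, one common approach is to lowercase the text before counting. The sketch below shows this on a short example string (made up for illustration), along with two details of count(): it only counts non-overlapping matches, and it accepts optional start and end positions.

```python
text = 'Rock, rock, ROCK!'

exact = text.count('rock')              # 1: only the lowercase match counts
any_case = text.lower().count('rock')   # 3: lowercase everything first

# count() only counts non-overlapping occurrences
pairs = 'aaaa'.count('aa')              # 2, not 3

# An optional start (and end) position restricts the search: here 'nana'
tail = 'banana'.count('a', 2)           # 2

print(exact, any_case, pairs, tail)     # 1 3 2 2
```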


## 3. Finding substrings

You will see that a string behaves a lot like a list: essentially, it is a sequence of characters.

This means you can perform simple operations such as getting the i-th character or finding the position of a substring.

In [168]:
phrase = 'Winter is coming!'

Use the find() or index() method to get the position of a substring within a string.

In [172]:
print('Find "Winter" using find() : ', phrase.find('Winter'))
print('Find "Winter" using index(): ', phrase.index('Winter'))

Find "Winter" using find() :  0
Find "Winter" using index():  0


In [163]:
phrase.find('is')

7

In [165]:
phrase.find('coming')

10

In [167]:
phrase.find('m') # Note that the index of the first occurrence is returned

12

The main difference between find() and index() is that find() returns -1 if no occurrence is found, whereas index() raises a ValueError.

In [173]:
phrase.find('Summer')

-1

In [174]:
phrase.index('Summer')

ValueError: substring not found
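Since find() only returns the first match, you can pass its optional start argument to keep searching after each match. The sketch below uses `find_all`, a hypothetical helper name, to collect every non-overlapping position. It then shows how to guard index() with try/except, and notes that the `in` operator is enough when you only need a yes/no answer.

```python
def find_all(text, sub):
    """Hypothetical helper: start index of every non-overlapping occurrence of sub."""
    if not sub:
        raise ValueError('sub must not be empty')  # an empty sub would loop forever
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Continue searching right after the current match
        start = text.find(sub, start + len(sub))
    return positions

phrase = 'Winter is coming!'
print(find_all(phrase, 'i'))  # [1, 7, 13]

# index() raises ValueError, so guard it with try/except
try:
    position = phrase.index('Summer')
except ValueError:
    position = None
print(position)  # None

# If you only need to know whether a substring occurs, use the in operator
print('Winter' in phrase)  # True
```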

## 4. Formatting strings

Concatenation is useful for small things, but it can quickly become ugly and unreadable.

Formatting strings gives you another way to glue strings together, with many more possibilities.

### The OLD way of formatting strings

In [35]:
# Let's reuse our cats and dogs variables defined above.
print(cats)
print(dogs)

Cats
Dogs


In [36]:
formatted_cats_and_dogs = '%s and %s' % (cats, dogs) # %s for string

print(formatted_cats_and_dogs)

Cats and Dogs


In [72]:
pi = 3.14159265359

print('%d' % pi) # %d for integer
print('%f' % pi) # %f for float

3
3.141593


In [44]:
# You can set the number of digits after the decimal point

print('%.2f' % pi)
print('%.4f' % pi)
print('%.8f' % pi)

3.14
3.1416
3.14159265
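Besides precision, the % operator accepts width, alignment and padding flags. Here is a short reference sketch of a few common ones. The sample values ('Alice', 31, 42, 50) are just for illustration.

```python
pi = 3.14159265359

right = '%6.2f|' % pi        # '  3.14|'  width 6, 2 decimals, right-aligned
left = '%-6.2f|' % pi        # '3.14  |'  the minus flag left-aligns
padded = '%05d' % 42         # '00042'    zero-padded to width 5
percent = '%d%% done' % 50   # '50% done' %% produces a literal percent sign
several = '%s is %d years old' % ('Alice', 31)  # several values go in a tuple

for s in (right, left, padded, percent, several):
    print(s)
```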


### The NEW way of formatting strings

In [40]:
formatted_cats_and_dogs = '{} and {}'.format(cats, dogs) # {} for string/int/float/whatever

print(formatted_cats_and_dogs)

Cats and Dogs


In [65]:
print('{}'.format(pi))
print('{:f}'.format(pi)) # Explicitly set to float

3.14159265359
3.141593


In [66]:
print('{:d}'.format(pi)) # In this case this is not allowed, because pi is a float

ValueError: Unknown format code 'd' for object of type 'float'

In [69]:
print('{}'.format(int(pi)))
print('{:d}'.format(int(pi)))

3
3


In [63]:
# Exactly 2 decimals (the value is rounded)
print('{:.2f}'.format(pi))

3.14


In [82]:
# You can also specify which argument goes where; with the old % method, the values must be given in the order of the placeholders.
print( '{} and {}'.format(cats, dogs) )
print( '{0} and {1}'.format(cats, dogs) )
print( '{1} and {0}'.format(cats, dogs) )

Cats and Dogs
Cats and Dogs
Dogs and Cats


There are many more ways to format strings. We will cover them in more detail in an upcoming course.

Also, feel free to refer to the official documentation: https://docs.python.org/3.1/library/string.html
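To give a taste of what is possible, here is a small sketch of two more str.format() features: named fields, and format specifications after the colon that control alignment, width, padding and thousands separators. The field names `animal` and `other` are arbitrary.

```python
pi = 3.14159265359

# Named fields are filled with keyword arguments
named = '{animal} and {other}'.format(animal='Cats', other='Dogs')  # 'Cats and Dogs'

# The format spec after the colon controls alignment, width and precision
right = '{:>6}|'.format('ab')        # '    ab|'  right-aligned in 6 characters
centered = '{:^7}|'.format('mid')    # '  mid  |' centered in 7 characters
zeros = '{:08.3f}'.format(pi)        # '0003.142' zero-padded, 3 decimals
thousands = '{:,}'.format(1234567)   # '1,234,567'

for s in (named, right, centered, zeros, thousands):
    print(s)
```

The same format specifications also work inside f-strings (Python 3.6+), for example `f'{pi:.2f}'`.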

## 5. Joining a list into a string

In [14]:
tv_series = [
    'Doctor Who',
    'The Walking Dead',
    'Breaking Bad',
    'Knight Rider',
    'The Simpsons',
    'The X-Files',
    'Friends',
    'The Big Bang Theory',
    'Silicon Valley'
]

In [83]:
tv_series_joined = ', '.join(tv_series)

print(tv_series_joined)

Doctor Who, The Walking Dead, Breaking Bad, Knight Rider, The Simpsons, The X-Files, Friends, The Big Bang Theory, Silicon Valley


In [87]:
print('The list of TV series is: {}!'.format(tv_series_joined))

The list of TV series is: Doctor Who, The Walking Dead, Breaking Bad, Knight Rider, The Simpsons, The X-Files, Friends, The Big Bang Theory, Silicon Valley!
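join() only accepts strings. The sketch below uses a made-up list of numbers, which must be converted with str() first. It also shows that any string can serve as the separator, including an empty string or a newline.

```python
numbers = [3, 1, 4, 1, 5]

# ', '.join(numbers) would raise TypeError: the items are ints, not strings
joined = ', '.join(str(n) for n in numbers)       # '3, 1, 4, 1, 5'

# Any string can be the separator, including '' or a newline
letters = ''.join(['P', 'y', 't', 'h', 'o', 'n'])  # 'Python'
lines = '\n'.join(['first line', 'second line'])

print(joined)
print(letters)
print(lines)
```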


## 6. Splitting a string into a list

In Python, strings have a built-in split() method that splits a string around a separator string.

In [107]:
# Let's try to split our previous joined list of TV series
print(tv_series_joined)

Doctor Who, The Walking Dead, Breaking Bad, Knight Rider, The Simpsons, The X-Files, Friends, The Big Bang Theory, Silicon Valley


In [108]:
split_result = tv_series_joined.split(',')

split_result

['Doctor Who',
 ' The Walking Dead',
 ' Breaking Bad',
 ' Knight Rider',
 ' The Simpsons',
 ' The X-Files',
 ' Friends',
 ' The Big Bang Theory',
 ' Silicon Valley']

In [109]:
print(split_result[0])
print(split_result[1]) # Note the leading space. We need to get rid of it.

Doctor Who
 The Walking Dead


In [110]:
# This time it is easy: we can split on the same separator we used for joining.

split_result = tv_series_joined.split(', ') # Comma followed by a space

split_result

['Doctor Who',
 'The Walking Dead',
 'Breaking Bad',
 'Knight Rider',
 'The Simpsons',
 'The X-Files',
 'Friends',
 'The Big Bang Theory',
 'Silicon Valley']

How you split a string depends on the result you are looking for. For example, you might want to get all the words.

In [112]:
split_result = tv_series_joined.split(' ')

split_result

['Doctor',
 'Who,',
 'The',
 'Walking',
 'Dead,',
 'Breaking',
 'Bad,',
 'Knight',
 'Rider,',
 'The',
 'Simpsons,',
 'The',
 'X-Files,',
 'Friends,',
 'The',
 'Big',
 'Bang',
 'Theory,',
 'Silicon',
 'Valley']
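Splitting on ' ' keeps the commas and breaks on every single space. Here is a sketch of a few alternatives on small made-up strings. split() with no argument splits on any run of whitespace. You can strip the pieces afterwards, and maxsplit limits how many splits are made.

```python
messy = '  Doctor  Who\tFriends \n'

# With no argument, split() splits on any run of whitespace
# and drops leading/trailing whitespace
words = messy.split()                     # ['Doctor', 'Who', 'Friends']

# With an explicit ' ' separator, consecutive spaces give empty strings
pieces = 'a  b'.split(' ')                # ['a', '', 'b']

# Split on commas, then strip the spaces around each piece
titles = [t.strip() for t in 'Friends , The X-Files,Doctor Who'.split(',')]
# ['Friends', 'The X-Files', 'Doctor Who']

# Remove the trailing commas left over when splitting into words
clean_words = [w.strip(',') for w in 'Doctor Who, The Walking Dead'.split()]
# ['Doctor', 'Who', 'The', 'Walking', 'Dead']

# maxsplit=1 splits only at the first separator
key, value = 'name=The Big Bang Theory'.split('=', maxsplit=1)
# key == 'name', value == 'The Big Bang Theory'
```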

## 7. Replacing parts of a string

You have seen many ways to build a string. Sometimes you don't need to build a string from scratch; instead, you can change parts of an existing one by replacing every occurrence of a substring with a new string. Similarly, deleting parts of a string is the same as replacing those parts with an empty string.

In [190]:
# Let's take a fun example.
hodor = 'Hodor? Hodor! Hodor, Hodor. Hodor!'

In [191]:
hold_the_door = hodor.replace('Hodor', 'Hold the door')

print(hold_the_door)

Hold the door? Hold the door! Hold the door, Hold the door. Hold the door!


In [192]:
hold_door = hold_the_door.replace('the', '')

print(hold_door)

Hold  door? Hold  door! Hold  door, Hold  door. Hold  door!


Now we have a string that is not very beautiful, because of the double spaces everywhere. Let's take care of that!

In [193]:
# Replace 2 spaces with 1 space! 
hold_door.replace('  ',' ')

# Note that replace() does not change the variable, but returns the result instead!
# Most often you will just reassign the result to the same variable: s = s.replace(a,b)

'Hold door? Hold door! Hold door, Hold door. Hold door!'
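A few more things about replace(), sketched on short example strings. An optional third argument limits how many occurrences are replaced. replace() also matches inside longer words. Replacing two spaces with one does not handle longer runs of spaces, but splitting and re-joining collapses any run.

```python
hodor = 'Hodor? Hodor! Hodor, Hodor. Hodor!'

# The optional third argument limits how many occurrences are replaced
first_only = hodor.replace('Hodor', 'Hold the door', 1)
# 'Hold the door? Hodor! Hodor, Hodor. Hodor!'

# replace() also matches inside longer words: 'the' in 'other' is removed too
caveat = 'the other door'.replace('the', '')        # ' or door'

# Replacing '  ' with ' ' only handles pairs: three spaces become two
still_double = 'Hold   door'.replace('  ', ' ')   # 'Hold  door'

# Splitting on whitespace and re-joining with one space collapses any run
collapsed = ' '.join('Hold   door?  Hold door!'.split())
# 'Hold door? Hold door!'
```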

## 8. Slicing strings

Slicing a string means cutting a piece out of it. What you learned in the course about lists applies here as well.

The general syntax is as follows:

- data[i] : The character at **index i**
- data[:i] : The first **i** characters
- data[i:] : All characters except the first **i** ones
- data[begin:end] : All characters from index **begin** up to, but not including, index **end**
- data[begin:end:step] : Every **step**-th character from **begin** up to, but not including, **end**

In [195]:
quote = 'Not all those who wander are lost.'

In [216]:
# Get the first character. Index starts at 0.
quote[0]

'N'

In [220]:
quote[-1] # Get the last character

'.'

*Note:* In many programming languages, a character is a special data type that stores exactly one character ('a', 'b', '1', ...). Python has no character data type: all text is stored as strings, regardless of length.

In [213]:
# Get the first 7 characters (from index 0 to 6)
print(quote[:7])
print(quote[0:7])

Not all
Not all


In [214]:
# Starting from index 8, get the rest.
quote[8:]

'those who wander are lost.'

In [205]:
# Get the last 5 characters.
quote[-5:]

'lost.'

In [243]:
# Get a substring from the middle: from index 8 up to, but not including, index 24.
quote[8:24]

'those who wander'

In [248]:
# You can also define a step: with a step of x, every x-th character is taken.
print(quote[8:24:1])
print(quote[8:24:2])
print(quote[8:24:3])

those who wander
toewowne
tsw nr
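Two more details about slicing and indexing are worth knowing. First, slices silently clip bounds that are out of range, while a single out-of-range index raises an IndexError. Second, strings are immutable, so you build a new string from slices instead of assigning into one. Here is a small sketch using the same quote:

```python
quote = 'Not all those who wander are lost.'

# Slices clip out-of-range bounds instead of raising an error
end_part = quote[30:100]    # 'ost.'
empty = quote[50:]          # ''

# Indexing a single position that does not exist does raise IndexError
try:
    quote[50]
except IndexError:
    print('Index 50 is out of range')

# Strings are immutable: assigning to an index raises TypeError
try:
    quote[0] = 'n'
except TypeError:
    print('Strings cannot be changed in place')

# Build a new string from slices instead
fixed = 'n' + quote[1:]     # 'not all those who wander are lost.'
```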


## 9. More functions on strings

There are many more functions and operations you can use on strings. These are just the basics: they will help you understand how strings work and prepare you to learn the rest when you need it.

Before we end this notebook, let's explore a few more things Python lets you do with very little effort.

### Uppercase and lowercase

Use the built-in string methods upper() and lower() to quickly convert a string to upper- or lowercase.

In [249]:
sparta = 'This is sparta!'.upper()

sparta

'THIS IS SPARTA!'

In [250]:
sparta.lower()

'this is sparta!'
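A few related case methods exist as well. The sketch below shows title(), capitalize() and swapcase(). It also shows casefold(), which is more aggressive than lower() and is meant for case-insensitive comparisons (the German 'ß' is the classic example).

```python
title = 'this is sparta!'.title()           # 'This Is Sparta!'
sentence = 'tHIS is SPARTA!'.capitalize()   # 'This is sparta!'
swapped = 'Sparta'.swapcase()               # 'sPARTA'

# For case-insensitive comparison, casefold() goes further than lower()
print('Straße'.lower() == 'STRASSE'.lower())        # False
print('Straße'.casefold() == 'STRASSE'.casefold())  # True
```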

### Reverse

Python strings have no built-in reverse() method, so reversing a string takes a small workaround. The first one is a bit inelegant.

In [251]:
# reversed() returns a reverse iterator. Not very nice.
''.join(reversed('Reversed'))

'desreveR'

Here we use the extended slice syntax [begin:end:step]. The trick is to leave out begin and end and use a step of -1, which goes over the entire string backwards.

In [252]:
'Reversed'[::-1]

'desreveR'
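A typical use of [::-1] is checking whether a text is a palindrome. The sketch below uses `is_palindrome`, a hypothetical helper. It assumes you want to ignore case and anything that is not a letter or digit, which is why it uses the string method isalnum() before comparing.

```python
def is_palindrome(text):
    """Hypothetical helper: True if text reads the same forwards and backwards,
    ignoring case and any character that is not a letter or a digit."""
    cleaned = ''.join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]  # [::-1] reverses the string

print(is_palindrome('Reversed'))                        # False
print(is_palindrome('A man, a plan, a canal: Panama'))  # True
```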

### Tests

While manipulating strings, you may run into many errors, especially when you mix strings with integers, floats, booleans or other object types. This is why Python gives you a range of methods to perform simple checks, so you can decide how best to handle a string.

In [258]:
# Check if a string is a number
'This is not a number.'.isdigit()

False

In [260]:
'10'.isdigit()

True

In [264]:
'000125'.isdigit()

True

In [267]:
'-0'.isdigit() # Does not understand negative sign

False

In [269]:
'THIS IS SPARTA'.isupper()

True

In [271]:
'THIS IS SPARTA'.islower()

False

In [273]:
'data_results_256.csv'.endswith('.csv')

True

In [275]:
'https://www.python.org'.startswith('https://')

True
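Since isdigit() rejects signs and decimal points, a common alternative is to simply try the conversion. Here is a sketch using `is_integer`, a hypothetical helper built on int() and try/except. The last line shows that startswith() and endswith() also accept a tuple of candidates.

```python
def is_integer(text):
    """Hypothetical helper: True if int() can parse text.

    Unlike isdigit(), this accepts a leading sign and surrounding whitespace.
    """
    try:
        int(text)
    except ValueError:
        return False
    return True

print('-0'.isdigit(), is_integer('-0'))    # False True
print('3.5'.isdigit(), is_integer('3.5'))  # False False
print(is_integer(' 42 '))                  # True

# startswith() and endswith() also accept a tuple of candidates
is_image = 'photo.PNG'.lower().endswith(('.png', '.jpg'))  # True
```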

### Stripping strings

Stripping means removing characters from the left and right edges of a string using strip(), lstrip() and rstrip(). If no argument is given, whitespace is removed.

In [288]:
game_over = '   **GAME * OVER**   '

In [289]:
game_over.strip()

'**GAME * OVER**'

In [290]:
game_over.rstrip()

'   **GAME * OVER**'

In [291]:
game_over.lstrip()

'**GAME * OVER**   '

In [296]:
# First strip whitespace, then strip "*" from the left and right sides only!
game_over.strip().strip('*')

'GAME * OVER'
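Be aware that the argument of strip() is treated as a set of characters, not as a word. This can remove more than you expect. To remove an exact prefix or suffix, removeprefix() and removesuffix() exist since Python 3.9. Here is a sketch with made-up file names:

```python
# The argument of strip() is a set of characters, not a word
domain = 'www.example.com'.strip('cmowz.')       # 'example'

# rstrip('.csv') removes any trailing '.', 'c', 's' or 'v', including the 's' of 'records'
surprise = 'records.csv'.rstrip('.csv')          # 'record'

# To remove an exact prefix or suffix, use removeprefix()/removesuffix() (Python 3.9+)
name = 'records.csv'.removesuffix('.csv')        # 'records'
host = 'https://www.python.org'.removeprefix('https://')  # 'www.python.org'
```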

### Repeating strings

In [305]:
'Loading' + '.' * 25

'Loading.........................'

In [306]:
'Done! ' * 5

'Done! Done! Done! Done! Done! '
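String repetition is handy for simple text output. The sketch below uses `progress_bar`, a hypothetical helper that draws a text progress bar. It also shows that repeating a string zero times (or a negative number of times) gives an empty string.

```python
def progress_bar(done, total, width=10):
    """Hypothetical helper: draw a text progress bar using string repetition."""
    filled = done * width // total
    return '[' + '#' * filled + '-' * (width - filled) + ']'

print(progress_bar(3, 10))   # [###-------]
print(progress_bar(10, 10))  # [##########]
print(repr('ab' * 0))        # '' (a negative count also gives an empty string)
print('=' * 20)              # a simple separator line
```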