# Strings

Strings are a vital component of Python and are often used to record and communicate qualitative data in the form of text. In this notebook, we'll look at how to create strings, how to format them, and some of the most common string methods.

## Creating Strings

To create a string in Python, we enclose a sequence of characters in single or double quotes. For example:

In [1]:
a = "String contents"

We can then use the string in operations, or print its contents. For example:

In [2]:
print(a)

String contents
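
As a brief illustration of the quoting rule above, the sketch below shows that single and double quotes produce identical strings, and that choosing one kind of quote lets the other appear inside the text. The variable names are illustrative only.

```python
# Single and double quotes create the same kind of object
single = 'String contents'
double = "String contents"
print(single == double)  # The two strings compare equal

# Using one kind of quote lets the other appear inside the string
apostrophe = "It's a string"
quotation = 'She said "hello"'
print(apostrophe)
print(quotation)
```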


## Key Features
Strings are a type of data which contain a sequence of characters (including letters, numbers and punctuation). Similar to items in a list, these characters are indexed by integer values such that the first character has an index of 0. For instance:

In [1]:
my_string = "Hello world"
print(my_string[0])
print(my_string[4])

H
o
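
Indexing extends in two ways that the example above does not show: negative indices count back from the end of the string, and a slice such as ```[0:5]``` extracts a range of characters. This is standard Python behaviour shared with lists; the sketch below is a minimal illustration with made-up variable names.

```python
my_string = "Hello world"

last_char = my_string[-1]      # Negative indices count back from the end
first_word = my_string[0:5]    # Characters at indices 0, 1, 2, 3 and 4
second_word = my_string[6:]    # From index 6 through to the end

print(last_char)
print(first_word)
print(second_word)
```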


We can find the length of a string using the ```len``` function:

In [2]:
print(len(my_string))

11


We can also concatenate two strings to create a new string using the ```+``` operator:

In [3]:
my_second_string = my_string + "!"
print(my_second_string)

Hello world!


Strings are a very powerful and versatile data type. They have methods that allow us to interrogate and manipulate them in a variety of ways. If you find yourself wanting to perform an operation with a string, it's normally worth checking to see if the string object has a built-in method to help you. We'll look at some of the most common methods below.

## Interrogating Strings

Sometimes it can be useful to find out information about the characters of a string. Python contains a number of useful tools for finding out commonly required information about a string.

### Count

The ```count``` method of the string class counts the number of times one string appears in another. For example:

In [3]:
a = "Strings are great!"
print(a.count("e"))
print(a.count("ea"))
print(a.count("s")) # Note matches are case-sensitive
print(a.count("!")) # Can count numerals or punctuation characters

2
1
1
1
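
Two further details of ```count``` are sketched below: matches are counted without overlapping, and optional start and end indices can restrict the part of the string that is searched. The example strings are chosen purely for illustration.

```python
text = "aaaa"
non_overlapping = text.count("aa")   # Matches at indices 0 and 2 only
print(non_overlapping)

# count also accepts optional start and end indices to restrict the search
sentence = "Strings are great!"
late_e = sentence.count("e", 11)     # Only search from index 11 onwards
print(late_e)
```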


### Finding Strings in Strings

The ```find``` method attempts to find the first instance of one string within another and return the index of the first character of the found string. If the string that is being searched for is not found, ```-1``` will be returned. For instance:

In [4]:
a = "Rabbits, rabbits, rabbits"
print(a.find("rabbit")) # Note that the search is case-sensitive and only the index of the first occurrence is returned
print(a.find("lapin")) # If the substring is not found, -1 is returned

9
-1
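
Because ```find``` only reports the first match, locating every match needs a loop. One possible approach, sketched below, relies on the optional second argument of ```find```, which is the index to start searching from. The helper name ```find_all``` is hypothetical and not part of Python.

```python
def find_all(text, target):
    """Return the start index of every non-overlapping match of target in text."""
    if not target:
        # An empty target matches everywhere and would never advance the search
        raise ValueError("target must be a non-empty string")
    positions = []
    start = text.find(target)
    while start != -1:
        positions.append(start)
        # Resume the search just after the match that was found
        start = text.find(target, start + len(target))
    return positions

a = "Rabbits, rabbits, rabbits"
print(find_all(a, "rabbit"))
print(find_all(a, "lapin"))
```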


### Testing String Content

There are several useful string methods which can examine the contents of the string and tell us if they fulfil various criteria. These can sometimes be useful when processing or checking strings.

#### Isalnum

The string method ```isalnum``` checks if all characters in a string are letters (upper or lower case) or numbers (as opposed to punctuation, spaces, etc.):

In [None]:
print("City17".isalnum())
print("Big Bang".isalnum()) # Returns False because of the space
print("Pop!".isalnum()) # Returns False because of the exclamation mark

True
False
False


#### Isalpha

The string method ```isalpha``` checks if all characters in a string are letters (upper or lower case):

In [None]:
print("Hello".isalpha())
print("d20".isalpha()) # Returns False because of the numbers
print("Hello world".isalpha()) # Returns False because of the space

True
False
False


#### Isnumeric

On a simple level, the string method ```isnumeric``` checks if all characters in a string are numerals (0-9):

In [None]:
print("01234".isnumeric())
print("-2".isnumeric()) # Returns False due to the hyphen
print("1.22".isnumeric()) # Returns False due to the full stop
print("1E3".isnumeric()) # Returns False due to the E

True
False
False
False
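
The qualifier "on a simple level" matters because ```isnumeric``` also accepts Unicode characters with a numeric value, such as fractions and superscripts. Python has two stricter relatives, ```isdigit``` and ```isdecimal```. The sketch below compares all three on a few sample strings; the labels in the dictionary are illustrative only.

```python
# Unicode escapes are used so the source stays plain ASCII
samples = {
    "digits": "42",
    "superscript two": "\u00b2",
    "one half": "\u00bd",
    "negative": "-2",
}

results = {}
for label, text in samples.items():
    # Record the answer from each of the three related methods
    results[label] = (text.isdecimal(), text.isdigit(), text.isnumeric())
    print(label, results[label])
```

If the goal is to check that a string contains only the characters 0-9 before converting it with ```int```, ```isdecimal``` is usually the closest fit of the three.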


#### Istitle

The ```istitle``` string method checks if the string is written in title case. The string is considered as a series of words (sections of the string separated by space characters). For the result to be ```True```, all of the following must be true:
* No words begin with a lowercase character (a-z)
* At least one word begins with an uppercase character (A-Z)
* No characters, except those at the start of words, may be uppercase.

If any of these conditions are not met, ```False``` will be returned instead.

In [None]:
print("Brave New World".istitle()) # All words start with a capital so returns True
print("Misery".istitle()) # All words start with a capital so returns True
print("20,000 Leagues Under The Sea".istitle()) # No words start with a lowercase character and at least one word starts with an uppercase character so returns True
print("To Kill a Mockingbird".istitle()) # Lower case "a" at the start of a word causes False to be returned
print("Party In The USA".istitle()) # Uppercase characters not at the start of a word cause False to be returned
print("1984".istitle()) # No words starting with uppercase
print("".istitle()) # No words starting with uppercase

True
True
True
False
False
False
False


#### Islower

The ```islower``` string method checks if all characters are lowercase, ignoring numerical characters and punctuation. At least one lowercase character must be present for True to be returned.

In [None]:
print("abcd".islower())
print("abcd123".islower()) # Numbers are ignored
print("hello there!".islower()) # Spaces and punctuation are ignored
print("Hello world".islower()) # One or more uppercase letters causes False to be returned
print("1234".islower()) # False will be returned if no lowercase characters are present

True
True
True
False
False


#### Isupper

The ```isupper``` string method checks if all characters are uppercase, ignoring numerical characters and punctuation. At least one uppercase character must be present for True to be returned.

In [None]:
print("YELLING".isupper())
print("C17".isupper()) # Numbers are ignored
print("#LOUD NOISES#".isupper()) # Spaces and punctuation are ignored
print("CaCO3".isupper()) # One or more lowercase characters causes False to be returned
print("1234".isupper()) # False will be returned if no uppercase characters are present

True
True
True
False
False


#### Startswith

The ```startswith``` string method checks if a string starts with another string provided as an argument:

In [None]:
print("Python is great!".startswith("Python"))
print("Running code is fun".startswith("Run")) # Doesn't have to be a full word
print("+442075895111".startswith("+44")) # Works with numbers and punctuation too
print("Werewolf".startswith("were")) # Case-sensitive
print("Python".startswith("on")) # Returns False even though the specified string occurs later on

True
True
True
False
False


#### Endswith

The ```endswith``` string method checks if a string ends with another string specified as an argument:

In [None]:
print("Programming is fun".endswith("fun"))
print("Uh-oh".endswith("Uh"))

True
False
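
Both ```startswith``` and ```endswith``` also accept a tuple of strings, in which case they return ```True``` if any one of them matches. The sketch below uses this to pick out image files from a list of hypothetical filenames; ```lower``` is applied first so that the check ignores case.

```python
filenames = ["report.csv", "photo.PNG", "notes.txt", "scan.jpg"]
image_suffixes = (".png", ".jpg")   # Must be a tuple, not a list

# Keep only the names whose lowercase form ends with one of the suffixes
images = [name for name in filenames if name.lower().endswith(image_suffixes)]
print(images)
```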


### Exercise: Interrogating Strings

Examine the following code. Write down what you think the output will be, then run the code to check your answer.

In [11]:
print("100.0".isnumeric())
print("Funny fish".find("f"))
print("A New Hope".startswith("A "))
print("Return of the Jedi".istitle())
print(("Cookbook"[0:6].count("o")))

False
6
True
False
3


## Creating New Strings From Old

There are several string methods which return one or more new strings from an initial string. These are commonly used when reformatting a string or extracting information from a string.

### Split

The ```split``` string method splits a string into several strings which are returned in a list. The locations of the splits are determined by a separator supplied as an argument. It's common to use this method to split a string up into different words by specifying a space as a separator.

In [None]:
print("Split up sentences".split(" "))
print("Stop. Look. Listen. Live.".split(".")) # If the separator occurs at the end of the string, the last value returned in the list will be an empty string
print("ZZZZZZZZ".split("Z")) # Separators are not themselves returned
print("Letters".split("")) # Cannot use an empty separator

['Split', 'up', 'sentences']
['Stop', ' Look', ' Listen', ' Live', '']
['', '', '', '', '', '', '', '', '']


ValueError: empty separator

You may optionally provide an extra integer argument to ```split```. This is the maximum number of splits which will be performed:

In [None]:
a = "banana banana banana"
print(a.split(" ", 1))
print(a.split("a", 3))

['banana', 'banana banana']
['b', 'n', 'n', ' banana banana']


### Join

The ```join``` string method joins together all the strings in an iterable (such as a list, or tuple), adding the original string that was used to call ```join``` between each one. The final result is returned as a string.

In [None]:
print(" ".join(["Hello", "world"])) # Can join a list of strings
print("-".join(("555","1234","0000"))) # Can join tuples. String can contain numbers
print(".".join("ICL")) # When joining strings, the separator will be added between each character
print("".join(["un", "re", "turn", "able"])) # The separator can be blank

Hello world
555-1234-0000
I.C.L
unreturnable
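
Every item passed to ```join``` must already be a string; a list of numbers raises a ```TypeError```. A common pattern, sketched below with illustrative values, is to convert each item with ```str``` first. The second half shows that ```split``` and ```join``` undo one another when the same separator is used.

```python
numbers = [555, 1234, 0]
# join raises TypeError if any item is not a string, so convert each item first
joined = "-".join(str(n) for n in numbers)
print(joined)

# split and join are inverses when the same separator is used
words = "to be or not".split(" ")
rebuilt = " ".join(words)
print(rebuilt == "to be or not")
```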


### Repeating Strings

We can repeat a string to create a new string using the ```*``` operator and an ```int```:

In [10]:
print("ho"*3) # Creates a new string with the original string repeated 3 times
print("he"*-1) # Using a value of less than 1 results in an empty string
print("ha"*2.0) # Using a non-int leads to TypeError

hohoho



TypeError: can't multiply sequence by non-int of type 'float'
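
One practical use of repetition is building a line of characters whose length depends on other data. The sketch below underlines a heading, and also shows that converting a float with ```int``` avoids the ```TypeError``` above. The variable names are illustrative.

```python
title = "Results"
underline = "-" * len(title)   # One dash per character in the title
print(title)
print(underline)

# Multiplication is applied before concatenation, as with numbers
laugh = "ha" * 3 + "!"
print(laugh)

# A float must be converted to an int before it can be used as a repeat count
chuckle = "ha" * int(2.0)
print(chuckle)
```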

### Replace

The ```replace``` string method creates a new string, with all (by default) instances of a specified phrase replaced with another phrase. The first argument is the phrase to be replaced, the second is the phrase to replace it with. Optionally, you may give a third argument which specifies how many instances of the phrase to replace.

In [None]:
print("C++ is better than Python. R is better than Python".replace("better", "worse"))
print("trolololo".replace("o", "a", 2)) # Replace only the first 2 "o"s

C++ is worse than Python. R is worse than Python
tralalolo
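
Since ```replace``` returns a new string, the original is left untouched and calls can be chained to apply several substitutions in turn. The following is a minimal sketch using made-up example strings.

```python
original = "trolololo"
changed = original.replace("lo", "la")
print(original)  # The original string is unchanged
print(changed)

# Calls can be chained, each acting on the result of the previous one
cleaned = "2024/01/31 12:00".replace("/", "-").replace(" ", "T")
print(cleaned)
```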


### Cases
The case of alphabetical characters in a string may be modified using the following methods:



* ```upper```: converts all alphabetical characters to uppercase
* ```lower```: converts all alphabetical characters to lower case
* ```title```: converts alphabetical characters to title case
* ```swapcase```: swaps the case of all alphabetical characters

In each case a new string is returned and the original string is unchanged.



In [None]:
a = "2 be or not 2 be, that is The Question"
print(a.upper())
print(a.lower())
print(a.title())
print(a.swapcase())

2 BE OR NOT 2 BE, THAT IS THE QUESTION
2 be or not 2 be, that is the question
2 Be Or Not 2 Be, That Is The Question
2 BE OR NOT 2 BE, THAT IS tHE qUESTION
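
Two practical notes on these methods are sketched below. First, ```title``` treats any letter that follows a non-letter as the start of a word, so the letter after an apostrophe is capitalised. Second, converting to a single case is a simple way to make a comparison case-insensitive. The example strings are illustrative.

```python
heading = "don't stop me now"
titled = heading.title()
print(titled)             # The letter after the apostrophe is capitalised too
print(titled.istitle())   # istitle applies the same rule, so it agrees

# Lowering the string first makes a case-sensitive check ignore case
starts = "Werewolf".lower().startswith("were")
print(starts)
```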


### Stripping Whitespace

Sometimes strings can have unwanted whitespace (such as spaces, tabs and newlines) at the start or end of the string. A family of methods can remove this whitespace.

* ```strip```: Removes whitespace at start and end of string
* ```lstrip```: Removes whitespace from the start of the string
* ```rstrip```: Removes whitespace from the end of the string

In each case a new string is returned with the relevant whitespace removed and the original string is left unchanged.

In [None]:
a = "    hi    "
# The "|" characters in the following examples have been added to show the left and right extent of the stripped string
print("|"+a.strip()+"|")
print("|"+a.lstrip()+"|")
print("|"+a.rstrip()+"|")

|hi|
|hi    |
|    hi|
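
The stripping methods also handle tabs and newlines, and they accept an optional argument giving a set of characters to remove instead of whitespace. Note that the argument is treated as a set of individual characters rather than as a prefix or suffix. The sketch below uses illustrative strings to show both points.

```python
raw = "\t  hi \n"
trimmed = raw.strip()             # Tabs and newlines are stripped as well as spaces
print("|" + trimmed + "|")

# An optional argument gives the set of characters to strip instead of whitespace
padded = "xxhixyx"
unpadded = padded.strip("x")      # Stops at the first character not in the set
print(unpadded)

# Any of the characters w . m o c are removed from both ends, in any order
domain = "www.example.com".strip("w.moc")
print(domain)
```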


## Combining Functions

The methods in this notebook are each useful in their own right, but they become even more useful when combined. This can be done over the course of several statements, or in a single chained expression. Chaining several operations into a single expression produces more compact (and sometimes marginally faster) code, but may be less readable. However, as it's rare to chain more than 2 or 3 methods, it's normally fine to combine operations in this way.

In [12]:
a = ["  ", "HeLlO", "ThErE", "  "]
# A long-winded way to do it
b = " ".join(a)
c = b.lower()
print(c.strip())
# A more compact way to do it
print(" ".join(a).lower().strip()) # The join operation is executed first. This creates a string, which lower operates on to produce another string, which strip operates on

hello there
hello there
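
Chained operations are often worth wrapping in a small function so that they can be reused. The sketch below is one possible way to tidy up a name entered by a user; the function name ```normalise_name``` is hypothetical.

```python
def normalise_name(raw_name):
    """Tidy a name: drop extra whitespace and convert it to title case."""
    # split() with no argument discards leading, trailing and repeated whitespace,
    # join rebuilds the string with single spaces, and title fixes the case
    return " ".join(raw_name.split()).title()

tidy = normalise_name("   aDA   lovelace ")
print(tidy)
```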


## f-Strings

Sometimes it can be useful to create long strings using data from several different variables or expressions. One way to do this is to convert variables to strings and concatenate them, like this:

In [14]:
data = [1,2,3]
mean = sum(data) / len(data)

summary_string = "The data " + str(data) + " has " + str(len(data)) + " entries, and a mean of " + str(mean) + "."
print(summary_string)

The data [1, 2, 3] has 3 entries, and a mean of 2.0.


This works, but we can make this somewhat easier and more compact by using "f-Strings":

In [15]:
summary_string = f"The data {data} has {len(data)} entries, and a mean of {mean}."
print(summary_string)

The data [1, 2, 3] has 3 entries, and a mean of 2.0.


The character ```f``` which precedes the string causes Python to look for sets of curly brackets within the string. The expressions in these curly brackets are evaluated, converted into strings and the results are inserted into the string. This has a few advantages over the previous method:

* The code is shorter: we don't need to keep opening and closing strings, use the concatenate operator or explicitly call the ```str``` function.
* It's easier to read: It's much easier to see the structure of the string and how the data slots into it.
* It's faster: f-Strings are slightly faster to execute than concatenation.
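
The curly brackets can hold any expression, not just a variable name, and a literal curly bracket can be produced by doubling it. A minimal sketch of both points, using illustrative variable names:

```python
name = "world"
# Method calls and arithmetic are evaluated before being inserted
greeting = f"Hello {name.upper()}, 2 + 2 = {2 + 2}"
print(greeting)

# Doubling a curly bracket produces a literal bracket in the output
template = f"{{name}} is replaced by {name}"
print(template)
```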

### Formatting Numerical Arguments With the Format Statement

We can also specify how we want each of the values in the curly braces to be formatted by following them with a colon, then a format specifier. For example:

In [17]:
data = [10000, 7000, 64998]
mean = sum(data) / len(data)

print(f"A float in scientific format {12345.67:e}")
print(f"A float rounded to 3 decimal places {3.14159:.3f}")
print(f"An integer with commas separating thousands {10000^3:,}") # Note we're evaluating an expression in the curly braces. ^ is bitwise XOR (not exponentiation, which is **), so the result is 10003
print(f"A float converted to a percentage to 2 decimal places {0.879873:.2%}")
print(f"An integer converted to binary {10:b}")
print(f"A right-aligned integer taking up 10 characters {60:>10}") # Can be useful for aligning values when printing or writing several values to a file

A float in scientific format 1.234567e+04
A float rounded to 3 decimal places 3.142
An integer with commas separating thousands 10,003
A float converted to a percentage to 2 decimal places 87.99%
An integer converted to binary 1010
A right-aligned integer taking up 10 characters         60


These formatting specifiers allow for convenient conversion of numbers into a number of common formats. A full list of formats can be found in the [Python documentation](https://docs.python.org/3/library/string.html#format-specification-mini-language) for strings. If you're trying to convert a value to a specific format, there's a very good chance there's a simple, compact and convenient way to do it using f-strings.
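
Width and alignment specifiers are particularly handy for printing data in columns. The sketch below combines a left-aligned text column with a right-aligned numeric column rounded to 2 decimal places. The data and variable names are illustrative only.

```python
rows = [("Hydrogen", 1.008), ("Carbon", 12.011), ("Oxygen", 15.999)]

lines = []
for element, mass in rows:
    # <10 left-aligns the name in 10 characters
    # >8.2f right-aligns the mass in 8 characters with 2 decimal places
    lines.append(f"{element:<10}{mass:>8.2f}")

print("\n".join(lines))
```

Because every line has the same total width, the numbers line up on the decimal point when printed in a fixed-width font.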

## Exercise: Reading Chemical Formulae

Your task is to create a piece of code that will count the number of atoms of a specified element in a chemical compound. The formula of the compound will be provided as a string, such as ```CaCO3```. The section "Compound Notation Refresher" at the bottom of this notebook gives a summary of this notation if you need it.

We’ll make a series of simplifications for this exercise:
-	You may assume that the chemical formula and elemental symbol you receive are both strings and are both a valid chemical formula and elemental symbol respectively.
-	You may assume no compounds will contain parentheses, such as ```UO2(NO3)2```
-	You may assume each element will appear at most once in the chemical formula
-	There will be no more than 9 atoms of any single element in any compound
-	The fact that numbers are normally in subscript will be ignored. So the string describing H<sub>2</sub>O will be provided as ```H2O```.

In the code cell below, complete the function ```count_atoms``` which takes two arguments. The first argument is a string such as ```H2O``` which describes a compound. The second will be a string such as ```H``` which describes an element. The function should return the number of atoms of the specified element in the specified compound. So if the function is called with ```count_atoms("H2O", "H")```, the value 2 should be returned as an integer. If there are no atoms of the specified element present, the value 0 should be returned.

There are several calls to the function to check your function. A sample solution may be found in ```Sample Solutions/Sample Solutions 2 - Strings.ipynb```.

In [21]:
def count_atoms(formula, element):
    # Complete this function
    pass # Placeholder so the cell runs; replace it with your solution


print(count_atoms("H2O", "H")) # Should return 2
print(count_atoms("H2O", "Si")) # Should return 0
print(count_atoms("NaCl", "Cl")) # Should return 1
print(count_atoms("Al2O3", "Al")) # Should return 2
print(count_atoms("NaBrO3", "Br")) # Should return 1

None
None
None
None
None

### Compound Notation Refresher

Read this section if you need a reminder of how chemical compounds are described.

Compounds are groupings of two or more atoms. Each atom is an example of a particular element. Each element has a symbol used to represent it (such as “H” or “He”). This symbol consists of one or two letters and will always begin with a capital letter. If a second letter is present, it will always be lower case. Compounds can be described with a chemical formula, which contains one or more of these element symbols, followed by subscript numbers which describe how many atoms of that element are in the compound (so "H<sub>2</sub>O<sub>2</sub>" tells us there are two atoms of hydrogen and two atoms of oxygen in the compound). If the symbol of an element is not followed by a number, it means there is only one atom of that element. Here are some examples:

| Compound | Number of Atoms present |
| --- | --- |
| H<sub>2</sub> | 2 Hydrogen (H) |
| H<sub>2</sub>O | 2 Hydrogen (H), 1 Oxygen (O) |
| HNO<sub>3</sub> | 1 Hydrogen (H), 1 Nitrogen (N), 3 Oxygen (O) |

There are more complexities to this notation system, but this is all you need to know for this exercise.