# String Formatting and Manipulations

Strings are a vital component of Python and are often used to record and communicate qualitative data in research codes. Being able to create and manipulate strings efficiently is therefore a useful skill for a researcher working in Python. Fortunately, Python contains functionality designed to cover many common tasks, which makes working with strings much easier.

## Recap
In this section, we'll briefly recap some of the key features of strings we'll be using in this module. Strings are a type of data which contain a sequence of characters (including letters, numbers and punctuation). These characters are indexed by integer values such that the first character has an index of 0. For instance:

In [1]:
my_string = "Hello world"
print(my_string[0])
print(my_string[4])

H
o


We can find the length of a string using the ```len``` function:

In [2]:
print(len(my_string))

11


We can also concatenate two strings to create a new string using the ```+``` operator:

In [3]:
my_second_string = my_string + "!"
print(my_second_string)

Hello world!


## Interrogating Strings

Sometimes it can be useful to find out information about the characters of a string. Python contains a number of useful tools for finding out commonly required information about a string.

### Count

The ```count``` method of the string class counts the number of times one string appears in another. For example:

In [4]:
a = "String are great!"
print(a.count("e"))
print(a.count("ea"))
print(a.count("s")) # Note matches are case-sensitive
print(a.count("!")) # Can also count numerals or punctuation characters

2
1
0
1


We may also provide the index of the first character of the section of the string to be searched. If we do this, we may also specify the index of the first character not to be counted.

In [5]:
a = "Starting and Stopping"
print(a.count("S", 1)) # Counting occurrences of "S" in all characters except the first
print(a.count("ing", 3, 7)) # Counting occurrences of "ing" in all characters with indices greater than or equal to 3 and less than 7. Note this excludes the first "ing" because the "g" has an index of 7

1
0
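
As a rough check of how the optional arguments behave, counting within a range should give the same answer as taking that section of the string first and then counting. The sketch below reuses the string from the example above; the variable names are illustrative only.

```python
a = "Starting and Stopping"

# Counting within the index range 3 (included) to 7 (excluded) ...
ranged_count = a.count("ing", 3, 7)
# ... matches counting within the equivalent slice of the string
sliced_count = a[3:7].count("ing")
print(ranged_count, sliced_count)

# Raising the end index to 8 brings the "g" at index 7 into the search
wider_count = a.count("ing", 3, 8)
print(wider_count)
```

The end index is exclusive in both cases, which is why the first "ing" is only counted once the end index reaches 8.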


### Find and Index

The ```find``` and ```index``` methods both attempt to find the first instance of one string within another and return the index of the first character of the found string. For instance:

In [6]:
a = "Rabbits, rabbits, rabbits"
print(a.find("rabbit")) # Note that the search is case-sensitive and only the index of the first occurrence is returned
print(a.index("rabbit"))

9
9


We may also search within a specified range, just like ```count```:

In [7]:
a = "Hide and seek"
print(a.find("e", 5)) # Only search the section of the string with an index of 5 or greater
print(a.index("e", 5, 11)) # Also exclude sections of the string with indices greater than or equal to 11

10
10


The difference between the two methods is that ```find``` will return ```-1``` if the string is not found, while ```index``` will raise a ```ValueError```. Choose which to use depending on the behaviour you want if the value isn't found.

In [8]:
a = "Sneaky"
print(a.find("Snakey"))
print(a.index("y", 1, 5)) # Note the argument 5 excludes the "y" at the end of "Sneaky"

-1


ValueError: substring not found
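
The two behaviours lead to two slightly different coding patterns. The following is a minimal sketch of each, using illustrative variable names; which one reads better will depend on the surrounding code.

```python
a = "Sneaky"

# Pattern 1: test the value returned by find
position = a.find("z")
if position == -1:
    find_message = "not found"
else:
    find_message = "found at index " + str(position)
print(find_message)

# Pattern 2: catch the ValueError raised by index
try:
    index_message = "found at index " + str(a.index("k"))
except ValueError:
    index_message = "not found"
print(index_message)
```

One thing to watch with the first pattern is that ```-1``` is itself a valid index (it refers to the last character), so forgetting the check can silently give a wrong result rather than an error.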

### Testing String Content

There are several useful string methods which can examine the contents of a string and tell us if they fulfil various criteria. These can be useful when processing or checking strings.

#### Isalnum

The string method ```isalnum``` checks if all characters in a string are letters (upper or lower case) or numbers (as opposed to punctuation, spaces, etc.):

In [None]:
print("City17".isalnum())
print("Big Bang".isalnum()) # Returns False because of the space
print("Pop!".isalnum()) # Returns False because of the exclamation mark

True
False
False


#### Isalpha

The string method ```isalpha``` checks if all characters in a string are letters (upper or lower case):

In [None]:
print("Hello".isalpha())
print("d20".isalpha()) # Returns False because of the numbers
print("Hello world".isalpha()) # Returns False because of the space

True
False
False


#### Isnumeric

On a simple level, the string method ```isnumeric``` checks if all characters in a string are numerals (0-9):

In [None]:
print("01234".isnumeric())
print("-2".isnumeric()) # Returns False due to the hyphen
print("1.22".isnumeric()) # Returns False due to the full stop
print("1E3".isnumeric()) # Returns False due to the E

True
False
False
False


This means ```isnumeric``` doesn't test if a string can be converted to a number. If you want to do that, consider writing a function something like the following:

In [None]:
def is_number(s):
    try:
        float(s) # Attempts to make a float from s. Will raise a ValueError if it can't
        return True # Will be returned if s can be converted to a float
    except ValueError:
        return False # Will be returned if s cannot be converted to a float

print(is_number("01234"))
print(is_number("-2"))
print(is_number("1.22"))
print(is_number("1E3"))

True
True
True
True
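
A caveat with this approach is that ```float``` also accepts a few strings that may not be wanted as data, such as "nan" and "inf". If that matters, one possible refinement is sketched below. The function name ```is_finite_number``` is hypothetical, and ```math.isfinite``` from the standard library is used here as one way to reject those values.

```python
import math

def is_finite_number(s):
    """Hypothetical stricter check: True only if s converts to a finite float."""
    try:
        return math.isfinite(float(s))  # False for nan, inf and -inf
    except ValueError:
        return False  # s could not be converted to a float at all

checks = {}
for s in ["1E3", "nan", "inf", " 5 ", "abc"]:
    checks[s] = is_finite_number(s)
    print(repr(s), checks[s])
```

Note that ```float``` ignores leading and trailing whitespace, so " 5 " is still accepted.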


A slight complication of ```isnumeric``` is that it also recognises some [Unicode](https://en.wikipedia.org/wiki/Unicode) characters as numeric, such as those that represent fractions. This is a less common situation, but worth being aware of.

In [None]:
half_string = "\u00BD" # This is the string coding for the Unicode character representing a half
print(half_string) # String displays as a fraction 1/2
print(half_string.isnumeric()) # Is recognised as numeric

½
True


#### Isdigit

The ```isdigit``` string method checks if all characters are digits. As such, it's very similar to ```isnumeric```, except that it returns ```False``` for Unicode numeric characters which are not digits, such as fractions:

In [None]:
print("01234".isdigit())
print("-2".isdigit()) # Returns False due to the hyphen
print("1.22".isdigit()) # Returns False due to the full stop
print("1E3".isdigit()) # Returns False due to the E
print("\u00BD".isdigit()) # Unicode character for 1/2 is not composed of digits

True
False
False
False
False
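
The difference between ```isdigit``` and ```isnumeric``` can be seen by testing a few Unicode characters side by side. In this sketch (the labels and dictionaries are illustrative only), a superscript two counts as a digit, while the fraction one half is numeric but not a digit.

```python
samples = {
    "ASCII digit 7": "7",
    "superscript two": "\u00B2",
    "fraction one half": "\u00BD",
}

results = {}
for label, s in samples.items():
    # Store the pair (isdigit result, isnumeric result) for each character
    results[label] = (s.isdigit(), s.isnumeric())
    print(label, results[label])
```

If the aim is to check that a string contains only the characters 0-9, it may be safer to test for those characters explicitly than to rely on either method.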


#### Istitle

The ```istitle``` string method checks if the string is written in title case. The string is considered as a series of words (roughly, sections of the string separated by spaces or other non-letter characters). For the result returned to be ```True```, all of the following must be true:
* No words begin with a lowercase character (a-z)
* At least one word begins with an uppercase character (A-Z)
* No characters, except those at the start of words, may be uppercase.

If any of these conditions are not met, ```False``` will be returned instead.

In [None]:
print("Brave New World".istitle()) # All words start with a capital so return True
print("Misery".istitle()) # All words start with a capital so return True
print("20,000 Leagues Under The Sea".istitle()) # No words start with a lowercase character and at least one word starting with an uppercase character so return True
print("To Kill a Mockingbird".istitle()) # Lower case "a" at the start of a word causes False to be returned
print("Party In The USA".istitle()) # Uppercase characters not at the start of a word cause False to be returned
print("1984".istitle()) # No words starting with uppercase
print("".istitle()) # No words starting with uppercase

True
True
True
False
False
False
False


#### Islower

The ```islower``` string method checks if all characters are lowercase, ignoring numerical characters and punctuation. At least one lowercase character must be present for True to be returned.

In [None]:
print("abcd".islower())
print("abcd123".islower()) # Numbers are ignored
print("hello there!".islower()) # Spaces and punctuation are ignored
print("Hello world".islower()) # One or more uppercase letters causes False to be returned
print("1234".islower()) # False will be returned if no lowercase characters are present

True
True
True
False
False


#### Isupper

The ```isupper``` string method checks if all characters are uppercase, ignoring numerical characters and punctuation. At least one uppercase character must be present for True to be returned.

In [None]:
print("YELLING".isupper())
print("C17".isupper()) # Numbers are ignored
print("#LOUD NOISES#".isupper()) # Spaces and punctuation are ignored
print("CaCO3".isupper()) # One or more lowercase characters causes False to be returned
print("1234".isupper()) # False will be returned if no uppercase characters are present

True
True
True
False
False


#### Startswith

The ```startswith``` string method checks if a string starts with another string provided as an argument:

In [None]:
print("Python is great!".startswith("Python"))
print("Running code is fun".startswith("Run")) # Doesn't have to be a full word
print("+442075895111".startswith("+44")) # Works with numbers and punctuation too
print("Werewolf".startswith("were")) # Case-sensitive
print("Python".startswith("on")) # Returns False even though the specified string occurs later on

True
True
True
False
False


#### Endswith

The ```endswith``` string method checks if a string ends with another string specified as an argument:

In [None]:
print("Programming is fun".endswith("fun"))
print("Uh-oh".endswith("Uh"))

True
False
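
Both ```startswith``` and ```endswith``` also accept a tuple of strings, returning ```True``` if any one of them matches. A minimal sketch of how this might be used to pick out data files by extension is shown below; the file names are made up for illustration.

```python
filenames = ["results.csv", "notes.txt", "run2.tsv", "plot.png"]
table_suffixes = (".csv", ".tsv")  # Must be a tuple; a list raises a TypeError

table_files = []
for name in filenames:
    if name.endswith(table_suffixes):  # True if name ends with any of the suffixes
        table_files.append(name)

print(table_files)
```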


## Creating New Strings From Old

There are several string methods which return one or more new strings from an initial string. These are commonly used when reformatting a string or extracting information from a string.

### Split

The ```split``` string method splits a string into several strings which are returned in a list. The locations of the split are determined by a separator supplied as an argument. It's common to use this method to split a string up into different words by specifying a space as a separator.

In [None]:
print("Split up sentences".split(" "))
print("Stop. Look. Listen. Live.".split(".")) # If the separator occurs at the end of the string, the last value returned in the list will be an empty string
print("ZZZZZZZZ".split("Z")) # Separators are not themselves returned
print("Letters".split("")) # Cannot use an empty separator

['Split', 'up', 'sentences']
['Stop', ' Look', ' Listen', ' Live', '']
['', '', '', '', '', '', '', '', '']


ValueError: empty separator

You may optionally provide an extra integer argument to ```split```. This is the maximum number of splits which will be performed:

In [None]:
a = "banana banana banana"
print(a.split(" ", 1))
print(a.split("a", 3))

['banana', 'banana banana']
['b', 'n', 'n', ' banana banana']
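
If ```split``` is called with no separator at all, it splits on any run of whitespace and discards empty strings, which is often what is wanted when breaking text into words. The sketch below compares the two behaviours on a deliberately untidy string (the variable names are illustrative).

```python
messy = "  alpha   beta\tgamma \n"

by_space = messy.split(" ")  # Splits at every single space, leaving empty strings
words = messy.split()        # Splits on runs of spaces, tabs and newlines

print(by_space)
print(words)
```

With an explicit space separator, the tab is not treated as a separator and repeated spaces produce empty strings in the list.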


### Join

The ```join``` string method joins together all the strings in an iterable (such as a list or tuple), adding the original string that was used to call ```join``` between each one. The result is returned as a single string.

In [None]:
print(" ".join(["Hello", "world"])) # Can join a list of strings
print("-".join(("555","1234","0000"))) # Can join tuples. String can contain numbers
print(" ".join({".--.":"p", "-.--":"y", "-":"t", "....":"h", "---":"o", "-.":"n"})) # Can join the keys of a dictionary (we'll cover dictionaries in more detail later). Can join punctuation.
print(".".join("ICL")) # When joining a string, the separator is added between each of its characters
print("".join(["un", "re", "turn", "able"])) # The separator can be blank

Hello world
555-1234-0000
.--. -.-- - .... --- -.
I.C.L
unreturnable
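
Every item passed to ```join``` must already be a string, so numerical data needs converting first. The following sketch shows one way to do this with the ```str``` function, and that ```split``` with the same separator reverses the operation (the values are arbitrary examples).

```python
values = [1.5, 2, 3.25]
# ", ".join(values) would raise a TypeError because the items are not strings

as_text = []
for v in values:
    as_text.append(str(v))  # Convert each number to a string

line = ", ".join(as_text)
print(line)

# Splitting on the same separator recovers the list of strings
recovered = line.split(", ")
print(recovered)
```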


### Repeating Strings

We can repeat a string to create a new string using the ```*``` operator and an ```int```:

In [None]:
print("ho"*3)
print("he"*-1) # Using a value of less than 1 results in an empty string
print("ha"*2.0) # Using a non-int leads to TypeError

hohoho



TypeError: can't multiply sequence by non-int of type 'float'

### Replace

The ```replace``` string method creates a new string, with all (by default) instances of a specified phrase replaced with another phrase. The first argument is the phrase to be replaced; the second is the phrase to replace it with. Optionally, you may give a third argument which specifies how many instances of the phrase to replace.

In [None]:
print("C++ is better than Python. R is better than Python".replace("better", "worse"))
print("trolololo".replace("o", "a", 2)) # Replace only the first 2 "o"s

C++ is worse than Python. R is worse than Python
tralalolo


### Cases
The case of alphabetical characters in a string may be modified using the following methods:



* ```upper```: converts all alphabetical characters to uppercase
* ```lower```: converts all alphabetical characters to lower case
* ```title```: converts alphabetical characters to title case
* ```swapcase```: swaps the case of all alphabetical characters

In each case a new string is returned and the original string is unchanged.



In [None]:
a = "2 be or not 2 be, that is The Question"
print(a.upper())
print(a.lower())
print(a.title())
print(a.swapcase())

2 BE OR NOT 2 BE, THAT IS THE QUESTION
2 be or not 2 be, that is the question
2 Be Or Not 2 Be, That Is The Question
2 BE OR NOT 2 BE, THAT IS tHE qUESTION
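
One behaviour of ```title``` worth knowing is that it treats any non-letter character as the start of a new word, so a letter following an apostrophe is capitalised. The sketch below shows this, along with ```string.capwords``` from the standard library as one possible alternative; it splits only on whitespace. The example phrase is made up.

```python
import string

phrase = "they're here, aren't they"

titled = phrase.title()           # Capitalises the letter after each apostrophe
capped = string.capwords(phrase)  # Capitalises only the first letter of each whitespace-separated word

print(titled)
print(capped)
```

Note that ```string.capwords``` also collapses runs of whitespace into single spaces, which may or may not be desirable.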


### Stripping Whitespace

Sometimes strings can have unwanted whitespace (such as spaces, tabs and newlines) at the start or end of the string. A family of methods can remove this whitespace.

* ```strip```: Removes whitespace at start and end of string
* ```lstrip```: Removes whitespace from the start of the string
* ```rstrip```: Removes whitespace from the end of the string

In each case a new string is returned with the relevant whitespace removed and the original string is left unchanged.

In [None]:
a = "    hi    "
# The "|" characters in the following examples have been added to show the left and right extent of the stripped string
print("|"+a.strip()+"|")
print("|"+a.lstrip()+"|")
print("|"+a.rstrip()+"|")

|hi|
|hi    |
|    hi|
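
These methods remove tabs and newlines as well as spaces, and they can also be given a string of characters to remove instead of whitespace. A short sketch, using made-up example strings:

```python
raw = "\t  42.0 \n"
cleaned = raw.strip()  # Removes the tab, spaces and newline from both ends
print("|" + cleaned + "|")

padded = "***note***"
unpadded = padded.strip("*")  # Removes any "*" characters from both ends
print(unpadded)

zero_padded = "0012300"
trimmed = zero_padded.lstrip("0")  # Only the leading zeros are removed
print(trimmed)
```

The argument is treated as a set of individual characters to remove, not as a single prefix or suffix.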


## Combining Functions

The functions in this notebook are each useful in their own right, but they become even more useful when combined. This can be done over the course of several statements, or in a single long statement. Combining several operations into a single expression produces more compact and slightly faster code but may be less readable. However, as it's rare to chain more than 2 or 3 functions, it's normally fine to combine operations in this way.

In [None]:
a = ["  ", "HeLlO", "ThErE", "  "]
# A long-winded way to do it
b = " ".join(a)
c = b.lower()
print(c.strip())
# A more compact way to do it
print(" ".join(a).lower().strip()) # The join operation is executed first. This creates a string, which lower operates on to produce another string which strip operates on

hello there
hello there
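
A common practical use of chaining is tidying up a line of text read from a file before extracting values from it. The following is a minimal sketch, assuming a made-up line of the form "name = value"; real data may need more careful checks.

```python
line = "  Temperature = 21.5  \n"

# Strip the surrounding whitespace, then split at the equals sign
parts = line.strip().split("=")

key = parts[0].strip().lower()    # Tidy the name and make it lowercase
value = float(parts[1].strip())   # Tidy the value and convert it to a float

print(key)
print(value)
```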


## The Format Method

Sometimes it can be useful to create long strings using data from several different variables or expressions. One way to do this is to convert variables to strings and concatenate them, like this:


In [None]:
data = [1,2,3]
mean = sum(data) / len(data)

summary_string = "The data " + str(data) + " has " + str(len(data)) + " entries, and a mean of " + str(mean) + "."
print(summary_string)

The data [1, 2, 3] has 3 entries, and a mean of 2.0.


This works, but the ```format``` method of the string class makes this somewhat easier and more compact:

In [None]:
summary_string = "The data {} has {} entries, and a mean of {}.".format(data, len(data), mean)
print(summary_string)

The data [1, 2, 3] has 3 entries, and a mean of 2.0.


The ```format``` method goes through the string and replaces each instance of ```{}``` with one of the arguments passed to the method, converted to a string as the ```str``` function would. The first argument gets placed in the first ```{}```, the second in the second and so on. This method of creating a string has a few advantages:

* The code is shorter: we don't need to keep opening and closing strings, use the concatenation operator or explicitly call the ```str``` function
* It's easier to read: It's much easier to see the structure of the string and how the data slots into it
* We can reuse the original string with new bits of data:

In [None]:
formatting_string = "The data {} has {} entries, and a mean of {}."

data1 = [10, 4, 100, 52]
mean1 = sum(data1) / len(data1)
data2 = [1e-3, 0.1, 0.25]
mean2 = sum(data2) / len(data2)

print(formatting_string.format(data1, len(data1), mean1))
print(formatting_string.format(data2, len(data2), mean2))

The data [10, 4, 100, 52] has 4 entries, and a mean of 41.5.
The data [0.001, 0.1, 0.25] has 3 entries, and a mean of 0.11699999999999999.
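
The placeholders can also contain a number or a name, which lets an argument be reused or passed by keyword. A short sketch of both forms, using arbitrary example values:

```python
# Numbered fields refer to arguments by position, so {0} can appear twice
template = "{0} squared is {1}, and {0} cubed is {2}."
numbered = template.format(3, 3**2, 3**3)
print(numbered)

# Named fields are filled from keyword arguments
named = "Sample {name} has mass {mass} g.".format(name="A1", mass=12.5)
print(named)
```

Named fields can make long formatting strings easier to read, since each placeholder says what belongs in it.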


### Formatting Numerical Arguments With the Format Statement

We can also specify how we want each of the arguments passed to the ```format``` method to be formatted by giving a format specification within the corresponding ```{}```. This will typically be a colon followed by one or more characters. For example:

In [None]:
data = [10000, 7000, 64998]
mean = sum(data) / len(data)

print("A float in scientific format {:e}".format(12345.67))
print("A float rounded to 3 decimal places {:.3f}".format(3.14159))
print("An integer with commas separating thousands {:,}".format(10000000000000))
print("A float converted to a percentage to 2 decimal places {:.2%}".format(0.879873))
print("An integer converted to binary {:b}".format(10))
print("A right-aligned integer taking up 10 characters {:>10}".format(60)) # Can be useful for aligning values when printing several values or writing them to a file

A float in scientific format 1.234567e+04
A float rounded to 3 decimal places 3.142
An integer with commas separating thousands 10,000,000,000,000
A float converted to a percentage to 2 decimal places 87.99%
An integer converted to binary 1010
A right-aligned integer taking up 10 characters         60


These formatting strings allow for convenient conversion of numbers into a number of common formats. A full list of formats can be found in the [Python documentation](https://docs.python.org/3/library/string.html) for strings. If you're trying to convert a value to a specific format, there's a very good chance there's a simple, compact and convenient way to do it using the ```format``` method.
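
Alignment, width and precision can be combined in a single specification, which is handy for lining up columns of output. The sketch below is one possible way to print a small table; the data and column widths are arbitrary choices for illustration.

```python
rows = [("alpha", 1.23456), ("beta", 22.5), ("gamma", 333.0)]

# Left-align the name in 8 characters, then right-align the value
# in 10 characters with 2 decimal places
row_format = "{:<8}{:>10.2f}"

lines = []
for name, value in rows:
    lines.append(row_format.format(name, value))

print("\n".join(lines))
```

Because every line is built from the same formatting string, each one has the same total width and the decimal points line up.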