<center> 
# R406: Using Python for data analysis and modelling

<br> <br> 

## Lecture 5: Strings and regular expressions

<br>

<center> **Andrey Vassilev**

<br> 

<center> **2016/2017**
 

# Outline

1. Strings and string manipulation
2. Regular expressions

# A one-minute review of strings

Strings in Python can be defined using single or double quotes:

In [None]:
s1 = 'String One'
s2 = "string two"

Multiline strings are defined as follows:

In [None]:
s3 = '''Multi
line
string'''
s4 = """Multi
line
string"""
s3 == s4 # The quotes chosen don't matter

In [None]:
print(s3) # Note the way this string prints
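
The line breaks inside a triple-quoted string are stored as ordinary newline characters (`\n`), which is why `print()` shows the text on three lines. A minimal sketch that makes this visible; `repr()` and `splitlines()` are not used elsewhere in this lecture and are brought in here only for illustration:

```python
s3 = '''Multi
line
string'''

# repr() shows the newline characters that print() renders as line breaks
print(repr(s3))

# The same string written on one line with explicit escape sequences
same = "Multi\nline\nstring"
print(s3 == same)

# splitlines() breaks the text at the newline characters
lines = s3.splitlines()
print(lines)
```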

# A one-minute review of strings

Strings can be indexed and sliced:

In [None]:
print(s1[2])
s1[0:6]

We can iterate over a string:

In [None]:
for ch in s1:
    print(ch, end="  ")

Strings are immutable: an item assignment such as `s2[0] = 3` raises a `TypeError`.
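
A short sketch of what immutability means in practice, assuming we want to change the first letter of `s2`. The variable name `s2_fixed` is made up for this example:

```python
s2 = "string two"

try:
    s2[0] = "S"  # strings do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# Build a new string from pieces instead of modifying the old one
s2_fixed = "S" + s2[1:]
print(s2_fixed)
print(s2)  # the original is unchanged
```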

# Case manipulation

The following are self-explanatory. Note that they return new strings and leave the original unchanged (unsurprisingly for an immutable object). Try them out!

In [None]:
s1.upper() # Convert to uppercase

In [None]:
s1.lower() # Convert to lowercase

In [None]:
print(s2)
print(s2.title()) # Convert to titlecase

In [None]:
s2.capitalize() # Capitalize (first letter uppercase, the rest lowercase)

In [None]:
print(s1)
print(s1.swapcase()) # Change the case
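
The differences between these methods are easiest to see on a string with mixed case. A minimal sketch; the sample text is made up for illustration:

```python
text = "hELLO wORLD"

cap = text.capitalize()  # first character upper, everything else lower
tit = text.title()       # first character of every word upper
swp = text.swapcase()    # every character's case is flipped

print(cap)
print(tit)
print(swp)
print(text)  # the original string is unchanged
```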

# Adding and removing spaces

&nbsp;

## Removing whitespace

Leading and trailing whitespace can be removed by using the `strip()` method.

In [None]:
s = "    some text            "
s.strip()

Removing only leading or only trailing whitespace can be accomplished with `lstrip()` and `rstrip()`.

In [None]:
s.lstrip()

In [None]:
s.rstrip()

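The `strip()` method and its variations also accept an argument listing the characters to be stripped. The argument is treated as a set of characters, not as a prefix or suffix: any combination of the listed characters is removed from the ends of the string.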

In [None]:
s = "++++++pure text+++++"
s.strip("+")

In [None]:
s.rstrip("+")
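
When the argument contains several characters, each of them is stripped wherever it occurs at the ends, in any order. A small sketch with made-up sample strings:

```python
raw = "--++  pure text  ++--"

# Strip any mix of '+', '-' and spaces from both ends;
# the space inside the text survives because stripping stops at 'p' and 't'
cleaned = raw.strip("+- ")
print(cleaned)

# A common surprise: the argument is not a prefix.
# Both 'a' and 'b' are removed until another character is met.
surprise = "abba-text".lstrip("ab")
print(surprise)
```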

## Adding space

These operations are used mainly for formatting. For example, the `center()` method inserts additional whitespace on both sides of a string, producing an appropriately padded (longer) string.

In [None]:
s = "small piece of text"
s.center(30) # The argument is the overall length 
             # of the resulting string

We can also specify the padding character:

In [None]:
s.center(30,"^")

The `ljust()` and `rjust()` methods perform the respective justifications by means of appropriate one-sided padding. They work similarly to `center()`.

In [None]:
s.ljust(40)

In [None]:
# You can also specify the string directly
# instead of defining a variable
"Python is sooo cool".rjust(25,"☯")

There is also a special zero-fill method `zfill()` which pads with zeros from the left (i.e. it right-justifies text):

In [None]:
x = 777
str(x).zfill(10)
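
As an illustration of how these padding methods combine, the sketch below lines up a small table and then compares `zfill()` with `rjust()` on a negative number. The data in `rows` is made up for the example:

```python
rows = [("apples", 3), ("kiwis", 12), ("watermelons", 7)]  # made-up data

# Left-justify the name (padding with dots), right-justify the quantity
table = [name.ljust(12, ".") + str(qty).rjust(4) for name, qty in rows]
for line in table:
    print(line)

# zfill() keeps a leading sign in front of the zeros; rjust() does not
signed = "-42".zfill(6)
padded = "-42".rjust(6, "0")
print(signed, padded)
```

Every line of `table` has the same length, so the columns align when printed in a monospaced font.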

# Finding and replacing text

&nbsp;

## Finding text

We can find the first occurrence of a substring by using the methods `find()` and `index()`. The search is performed from the left. Both methods return the position where the substring is found.

In [None]:
s = "A yellow python is prettier than a black python."
s.find("python")

In [None]:
s.index("python")

The difference between the two methods is how they handle the case when the substring is not found: `find()` returns -1, while `index()` throws ~~a tantrum~~ a `ValueError`.

In [None]:
s.find("anaconda")

In [None]:
s.index("anaconda")
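
In practice, the two behaviours call for different handling: a comparison with -1 for `find()` and a `try`/`except` block for `index()`. A minimal sketch; the flag name `index_failed` is made up for this example:

```python
s = "A yellow python is prettier than a black python."

# find(): check the return value
pos = s.find("anaconda")
if pos == -1:
    print("find(): substring not found")

# index(): catch the exception
index_failed = False
try:
    s.index("anaconda")
except ValueError as err:
    index_failed = True
    print("index():", err)

# If only a yes/no answer is needed, the `in` operator is enough
print("python" in s, "anaconda" in s)
```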

Finding a string by performing the search **from the end** (i.e. finding the last occurrence) can be done with the methods `rfind()` and `rindex()`.

In [None]:
s.rfind("python")

The `find()` and `index()` methods can optionally take a second and third argument specifying a starting position and an end position in the string which will confine the search to the respective range.

In [None]:
s.find("python", 15)

In [None]:
s.find("python", 15, 30)
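
The starting position makes it possible to collect every occurrence of a substring by restarting the search just past the previous match. A sketch under the assumption that overlapping matches should not be counted; the names `target`, `positions` and `start` are made up for this example:

```python
s = "A yellow python is prettier than a black python."
target = "python"

positions = []
start = 0
while True:
    pos = s.find(target, start)
    if pos == -1:          # no more occurrences
        break
    positions.append(pos)
    start = pos + len(target)  # continue after the current match

print(positions)
```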

There are also methods that check whether a string begins or ends with a specific substring. These are called `startswith()` and `endswith()`, respectively. They return Boolean values.

In [None]:
s.startswith("A blue")

In [None]:
s.endswith("python.")
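
Both methods also accept a tuple of candidates and return `True` if any one of them matches. A brief sketch; the candidate strings are made up for illustration:

```python
s = "A yellow python is prettier than a black python."

# True if the string starts with at least one of the prefixes
starts = s.startswith(("A blue", "A yellow"))

# True if the string ends with at least one of the suffixes
ends = s.endswith(("cobra.", "python."))

# False: none of the prefixes match
neither = s.startswith(("A blue", "A green"))

print(starts, ends, neither)
```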

## Replacing text

**Once again, these operations do not modify the original string but return a new string containing the changes!**

We can replace a substring with another one using the `replace()` method. The syntax is `s.replace("old", "new")`.

In [None]:
s.replace("python", "anaconda")
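
By default `replace()` changes every occurrence. An optional third argument limits the number of replacements, as this short sketch shows; the variable names `all_replaced` and `first_only` are made up for the example:

```python
s = "A yellow python is prettier than a black python."

all_replaced = s.replace("python", "anaconda")     # every occurrence
first_only = s.replace("python", "anaconda", 1)    # at most one replacement

print(all_replaced)
print(first_only)
print(s)  # the original string is unchanged
```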

# Splitting strings

One can break a string into substrings using different methods.

The `partition()` method finds the first occurrence of a target substring and returns a tuple of three strings: the part before the target, the target itself, and the part after it.

In [62]:
s.partition("python")

('A yellow ', 'python', ' is prettier than a black python.')
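
Two related cases are worth trying: splitting at the last occurrence with `rpartition()`, and what happens when the target is absent. A minimal sketch; the `"colour=yellow"` string is a made-up example:

```python
s = "A yellow python is prettier than a black python."

# rpartition() splits at the last occurrence instead of the first
last = s.rpartition("python")
print(last)

# If the target is missing, partition() returns the whole string
# followed by two empty strings
missing = s.partition("anaconda")
print(missing)

# A typical use: unpacking a "key=value" pair
key, sep, value = "colour=yellow".partition("=")
print(key, value)
```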