<img align="left" src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/CC_BY.png"><br />

Created by [Nathan Kelber](http://nkelber.com) for [JSTOR Labs](https://labs.jstor.org/) under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/)<br />
For questions/comments/improvements, email nathan.kelber@ithaka.org.<br />
____

# Python Basics 5

**Description:**
This notebook focuses on strings, preparing learners to use:
* Escape characters
* String methods

The lesson ends with the construction of a basic tokenizer.

**Use Case:** For Learners (Detailed explanation, not ideal for researchers)

**Difficulty:** Beginner

**Completion time:** 75-90 minutes

**Knowledge Required:** 
* [Getting Started with Jupyter Notebooks](./getting-started-with-jupyter.ipynb)
* [Python Basics 1](./python-basics-1.ipynb)
* [Python Basics 2](./python-basics-2.ipynb)
* [Python Basics 3](./python-basics-3.ipynb)
* [Python Basics 4](./python-basics-4.ipynb)

**Knowledge Recommended:** None

**Data Format:** None

**Libraries Used:**
* `Counter()` from the `collections` module

**Research Pipeline:** None
___

## String basics

Python strings can use single or double quotes. If the string contains a single quote character, it may be beneficial to use double quotes. Try printing out the string in the next code cell:

In [None]:
# Print out a string using single or double quotes
string = 'Hello World: Here's a string.'
print(string)

The cell above raises a `SyntaxError` because the apostrophe in `Here's` ends the string early. An easy solution is to use double quotes, such as:
> string = "Hello World: Here's a string."

The use of double quotes keeps Python from ending the string prematurely. But what if your string contains both single and double quotes?

### Escape characters

In order to insert certain characters into a Python string, we need to use an escape character. An escape character begins with a `\`. For example, we could insert a single quote into a string surrounded by single quotes by using an escape character.

In [None]:
# Print out a single quote in a Python string
string = 'There\'s an escape character in this string.'
print(string)

The backslash character `\` in front of the single quote tells Python not to end the string prematurely. Of course, this opens a new question: How do we create a string with a backslash? The answer is another escape character using two backslashes.

In [None]:
# Print a backslash using an escape character
string = 'Adding a backslash \\ requires an escape character.'
print(string)

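Another option is to use a raw string, which treats each backslash as a literal character instead of the start of an escape character. A raw string is written with an `r` before the opening quote, much as an f-string is written with an `f`.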

In [None]:
# Print a backslash using a raw string
string = r'No escape characters \ here'
print(string)
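To see what the `r` prefix changes, the short sketch below writes the same characters once as a regular string and once as a raw string. The variable names and the Windows-style path are made up for illustration.

```python
# The same characters typed as a regular string and as a raw string
regular = 'C:\new_folder\table.txt'   # \n and \t are read as escape characters
raw = r'C:\new_folder\table.txt'      # every backslash is kept literally

print(regular)
print(raw)

# The raw string is longer because each backslash is its own character
print(len(regular), len(raw))
```

In the regular string, `\n` and `\t` each collapse into a single character (a new line and a tab), which is why the two lengths differ.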

Escape characters do more than add quotes and backslashes. They also handle string formatting such as tabs and new lines.

|Code|Result|
|---|---|
|`\'`| ' |
|`\\`| \ |
|`\t`| tab |
|`\n`| new line|

In [None]:
# Print out a string with two lines
string = 'The first line of a string\nThe second line of a string'
print(string)
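The tab escape character `\t` works the same way as `\n`. The sketch below combines the two to print a small table; the item names and counts are invented for the example.

```python
# Use \t to separate columns and \n to separate rows
table = 'Item\tCount\napples\t3\npears\t12'
print(table)

# Each escape character counts as a single character in the string
short = 'a\tb'
print(len(short))
```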

Writing `\n` by hand becomes hard to read when a string spans many lines. A more readable option is to open and close the string with triple quotes (`"""` or `'''`). Line breaks, tabs, and other whitespace typed inside the triple quotes are kept as part of the string.

In [None]:
# Print out Shakespeare's Sonnet 18
string = """Shall I compare thee to a summer’s day?
Thou art more lovely and more temperate:
Rough winds do shake the darling buds of May,
And summer’s lease hath all too short a date;
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimm'd;
And every fair from fair sometime declines,
By chance or nature’s changing course untrimm'd;
But thy eternal summer shall not fade,
Nor lose possession of that fair thou ow’st;
Nor shall death brag thou wander’st in his shade,
When in eternal lines to time thou grow’st:
    So long as men can breathe or eyes can see,
    So long lives this, and this gives life to thee."""

print(string)
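A triple-quoted string is still an ordinary string: Python stores each line break as a `\n` character. The following sketch checks this with two short, made-up lines. It also uses `.splitlines()`, a string method not otherwise covered in this lesson, to break the text back into a list of lines.

```python
# A triple-quoted string stores its line breaks as \n characters
triple = """Line one
Line two"""
escaped = 'Line one\nLine two'

# Both ways of writing the string produce the same value
print(triple == escaped)

# .splitlines() returns the lines as a list of strings
print(triple.splitlines())
```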


### String slices
The characters of a string can also be indexed and sliced like the items of a list. 

In [None]:
# Using a string index
string = 'Python Basics'
string[0]

In [None]:
# Slicing a string
string = 'Python Basics'
string[0:6]
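As with lists, a negative index counts from the end, and either side of a slice can be left out. A few more variations on the same string, offered as a quick sketch:

```python
string = 'Python Basics'

last_character = string[-1]   # negative indexes count from the end
second_word = string[7:]      # from index 7 through the end
first_word = string[:6]       # from the start up to, but not including, index 6

print(last_character)
print(second_word)
print(first_word)
print(len(string))            # number of characters, counting the space
```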

## String methods

There are a variety of methods for manipulating strings. 


|Method | Purpose | Form |
|---|---|---|
|.lower()| change the string to lowercase | string.lower()|
|.upper()| change the string to uppercase | string.upper()|
|.join()| joins together a list of strings | ' '.join(string_list)|
|.split()| splits strings apart | string.split()|
|.replace()| replaces characters in a string | string.replace(oldvalue, newvalue)|
|.rjust(), .ljust(), .center()| pad out a string | string.rjust(5)|
|.rstrip(), .lstrip(), .strip()| strip out whitespace | string.rstrip()|


All of the characters in a string can be lowercased with `.lower()` or uppercased with `.upper()`.

In [None]:
# Lowercase a string
string = 'Hello World'
string.lower()

These methods do not change the original string, but they return a string that can be saved to a new variable.

In [None]:
# The original string is unchanged
print(string)

# The returned string can be assigned to a new variable
new_string = string.upper()
print(new_string)
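One common use of `.lower()` is comparing text without regard to capitalization. The sketch below assumes a hypothetical `answer` variable holding some user-supplied text.

```python
answer = 'YES'

# Lowercasing before comparing makes the check ignore capitalization
matches = answer.lower() == 'yes'
print(matches)

# The original string is still uppercase
print(answer)
```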

A string can be split on any character, or sequence of characters, passed into `.split()`. With no argument, strings are split on any run of whitespace, including spaces, new lines, and tabs.

In [None]:
# Splitting a string on whitespace
string = 'This string will be split on whitespace.'
string.split()

In [None]:
# Splitting a phone string
phone_string = '313-555-3434'
phone_string.split('-')
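Splitting with no argument and splitting on an explicit `' '` behave differently when a string contains repeated spaces. The sketch below, using a made-up `spaced` string, shows the difference along with the optional second argument that limits the number of splits.

```python
spaced = 'one  two   three'

# With no argument, a run of whitespace counts as one separator
default_split = spaced.split()
print(default_split)

# With an explicit separator, every single space splits the string,
# which leaves empty strings in the result
space_split = spaced.split(' ')
print(space_split)

# An optional second argument limits how many splits are made
area_code_split = '313-555-3434'.split('-', 1)
print(area_code_split)
```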

Similarly, lists of strings can be joined together by passing them into `.join()`. A joining string must be specified before the `.join()`, even if it is the empty string `''`.

In [None]:
# List of strings joined together
name_list = ['Sam', 'Delilah', 'Jordan']
', '.join(name_list)
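Because `.join()` reverses `.split()`, the two are often used together. Note that `.join()` only accepts strings, so a list of numbers has to be converted first. A minimal sketch with illustrative values:

```python
# Split on one separator, then join with another
words = 'Sam-Delilah-Jordan'.split('-')
rejoined = ' & '.join(words)
print(rejoined)

# .join() raises a TypeError on non-strings, so convert numbers with str()
numbers = [313, 555, 3434]
number_strings = []
for number in numbers:
    number_strings.append(str(number))

phone = '-'.join(number_strings)
print(phone)
```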

The `.strip()` method will strip leading and trailing whitespace (including spaces, tabs, and new lines) from a string. Remember, these changes will not affect the original string, but they can be assigned to a new variable.

In [None]:
# Stripping leading and trailing whitespace from a string
string = '    Python Basics '
string.strip()

It is also possible to only strip whitespace from the right or left of a string.

In [None]:
# Stripping leading whitespace from the left side of a string
string = '    Python Basics '
string.lstrip()
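The counterpart for trailing whitespace is `.rstrip()`. The strip methods also accept an optional argument listing characters to remove instead of whitespace. In the sketch below, `repr()` is used only to show the quotes so that any remaining spaces are visible.

```python
string = '    Python Basics '

# Stripping trailing whitespace from the right side of a string
stripped_right = string.rstrip()
print(repr(stripped_right))

# Passing characters to .strip() removes those characters from both ends
starred = '***Python Basics***'
unstarred = starred.strip('*')
print(unstarred)
```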

Characters in a string can be replaced with other characters using the `.replace()` method.

In [None]:
# Replacing characters in a string with .replace()
string = 'Hello world'
string.replace('l', 'x')

In [None]:
# Removing characters from a string
# using .replace with an empty string
string = 'Hello! World!'
string.replace('!', '')
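Two further variations may be useful: `.replace()` takes an optional third argument limiting how many replacements are made, and because it returns a string, calls can be chained. A short sketch:

```python
string = 'Hello world'

# Replace only the first 'l'
first_only = string.replace('l', 'x', 1)
print(first_only)

# Chain two replacements: remove '!' and then swap a whole word
chained = 'Hello! World!'.replace('!', '').replace('World', 'Python')
print(chained)
```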

Finally, strings can be justified (padded out to a given width) with leading characters, trailing characters, or both. The first argument is the total width of the result. By default, strings are padded with spaces, but another character can be specified by passing a second argument.

In [None]:
# Printing a dictionary of contacts in neat columns
contacts = {
 'Amanda Bennett': 'Engineer, electrical',
 'Bryan Miller': 'Radiation protection practitioner',
 'Christopher Garrison': 'Planning and development surveyor',
 'Debra Allen': 'Intelligence analyst'}

print('Name'.ljust(22), 'Occupation')
print('|'.center(44, '-'))
for name, occupation in contacts.items():
    print(name.ljust(20), '|', occupation)
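As a smaller illustration of the second argument, the sketch below right-justifies a few made-up numbers with zeros so their digits line up. It also shows that a string longer than the requested width is returned unchanged rather than cut off.

```python
# Pad each number string to five characters using zeros
padded = []
for number in ['7', '42', '365']:
    padded.append(number.rjust(5, '0'))
    print(number.rjust(5, '0'))

# A width shorter than the string does not truncate it
too_narrow = 'Christopher Garrison'.ljust(5)
print(too_narrow)
```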

### Checking string contents

There are a variety of ways to verify the contents of a string. These return a Boolean `True` or `False` and are useful for flow control. For example, we can check whether a particular sequence of characters appears inside a string with the `in` and `not in` operators.

In [None]:
# Check whether a set of characters can be found in a string
string = 'Python Basics'
'Basics' in string
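The `in` operator is case-sensitive, and `not in` gives the opposite answer. The following sketch shows both, with the result used in an `if` statement for flow control.

```python
string = 'Python Basics'

# The check is case-sensitive, so 'basics' is not found
found_exact = 'basics' in string
print(found_exact)

# Lowercasing the string first makes the check ignore capitalization
found_lowered = 'basics' in string.lower()
print(found_lowered)

# Using `not in` for flow control
if 'Java' not in string:
    print('No mention of Java.')
```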

The following string methods also return Boolean `True` or `False` values.

|Method | Purpose | Form |
|---|---|---|
|.startswith(), .endswith()| returns `True` if the string starts/ends with another string | string.startswith('abc')|
|.isupper(), .islower()| returns `True` if all letters in the string are uppercase/lowercase | string.isupper()|
|.isalpha()| returns `True` if string is only letters and not blank | string.isalpha()|
|.isalnum()| returns `True` if string is only letters or numbers and not blank | string.isalnum()|
|.isdecimal()| returns `True` if string is only digits and not blank | string.isdecimal()|

In [None]:
# Checking if a string starts 
# with a particular set of characters

string = 'Python Basics'
string.startswith('Python')

In [None]:
# Checking if a string is lowercased
string = 'python basics'
string.islower()

In [None]:
# Checking if a string is only alphabetic characters
string = 'PythonBasics'
string.isalpha()

In [None]:
# Checking if a string is only
# alphabetic characters and numbers
string = 'PythonBasics5'
string.isalnum()

In [None]:
# Checking if a string is only numbers
string = '50'
string.isdecimal()

The `.isdecimal()` method does not check whether a string represents a float. It only checks whether every character is a digit, so a decimal point makes it return `False`.

In [None]:
# A decimal point is not a digit, so this returns False
string = '5.0'
string.isdecimal()
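If the goal is to find out whether a string can be read as a float, one possible approach is to try the conversion and catch the error. The sketch below is not part of the original lesson: `is_float_string` is a hypothetical helper name, and it relies on functions and `try`/`except`, which may not have been covered in the earlier notebooks.

```python
def is_float_string(text):
    """Return True if float() can convert text, otherwise False."""
    try:
        float(text)
        return True
    except ValueError:
        # float() raises ValueError for text it cannot convert
        return False

print(is_float_string('5.0'))
print(is_float_string('50'))
print(is_float_string('five'))
```

Keep in mind that `float()` is more permissive than `.isdecimal()`: it also accepts strings such as `'-5'`, `'1e3'`, and `'nan'`.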