# Welcome to the Dark Art of Coding:
## Introduction to Python
Strings

<img src='../universal_images/dark_art_logo.600px.png' width='300' style="float:right">

# Objectives
---

In this session, students should expect to:

* Learn to create strings
* Learn to perform basic manipulations on strings...
* Explore using string methods


<h1>What is a string?</h1>
<img src='02_images/pearl_necklace.jpg'>

Photo by <a href="https://www.flickr.com/photos/tiararama/3530304167">tiarama</a><br>
Attribution http://creativecommons.org/licenses/by-nd/4.0/


# Strings
---

In [None]:
# Using single quotes

phrase = 'python rocks'
print(phrase)           # print() displays the value on the "screen"
                        #     and will not display the quotes
                        #
phrase                  # in Jupyter/IPython, leaving off the print() function
                        #     simply evaluates the variable and displays it as
                        #     a string, with quotes 
                        #
                        # NOTE: this evaluation processes all the lines in the cell,
                        #     but only shows the evaluation results for the last line.

In [None]:
# Using double quotes is also fine (but physically harder to type)

phrase = "python rocks"
           
phrase                  

In [None]:
# A pair of quotes with nothing between them is still a string.
#     It has all the qualities of any other string.
#     It is generally referred to as an EMPTY string.

empty = ''
empty

In [None]:
# Watch out if mixing apostrophes and single quotes

apostrophe = 'You've written code'

# NOTE: Please don't ignore error messages.
# Reading them is a skill like any other.
# Practice reading them, even if they feel like ελληνικά (Greek)

In [None]:
# Using double quotes to encapsulate the single quote
#     solves this problem

apostrophe = "You've written code"
apostrophe

In [None]:
# Using a backslash to "escape" the apostrophe also works

escape = 'You\'ve written code'
escape

In [None]:
# Numbers encapsulated by quotes are still strings:

num = '42'
num

In [None]:
# To convert a numeric string to an integer
integer_form = int('42')
integer_form

In [None]:
# Or convert a numeric string to a float

float_form = float('42')
float_form
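
To see the difference between a numeric string and its converted forms, it can help to check the type of each value. This is a minimal sketch; the names `text`, `as_int`, and `as_float` are illustrative and not part of the lesson.

```python
# Illustrative names only: text, as_int, as_float
text = '42'
as_int = int(text)
as_float = float(text)

print(type(text))      # <class 'str'>
print(type(as_int))    # <class 'int'>
print(type(as_float))  # <class 'float'>

# Arithmetic works on the converted values
print(as_int + 1)      # 43
print(as_float / 2)    # 21.0
```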

# Escape characters
---

Character  | Displays
----------|----------
\'        | Single quote
\"        | Double quote
\t        | Tab 
\n        | Newline (line break, return character)
\\\\        | Backslash

In [None]:
# Printing using a newline escape character
#     enables you to put content on separate lines

print("Python\nPython 3\nPython 3.6")

# NOTE: there are better solutions that we will explore soon.

In [None]:
# Raw strings let you preserve a string "as is"
#    escape slashes and all

print(r'You\'re gonna be a great programmer!')
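
Raw strings tend to be most useful when the text is full of backslashes. The sketch below compares a regular string with a raw string; the Windows-style path is made up for illustration.

```python
# Hypothetical path, used only to show the effect of the r prefix
normal = 'C:\new_folder\table.txt'   # \n and \t are treated as escape characters
raw = r'C:\new_folder\table.txt'     # backslashes are kept exactly as typed

print(normal)
print(raw)

# Each escape sequence counts as ONE character in the regular string
print(len(normal))   # 21
print(len(raw))      # 23
```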

# Experience Points!
---

In **Jupyter** do each of the following:

Task | Sample Object(s)
:---|:---
Create a string with your first name, a tab, your last name and a newline in it; `print()` the string | `\t`, `\n`
Create a string with a word, three tabs, and a word in it; `print()` the string | `\t\t\t`
Create a string with a single quote in it; `print()` the string | `'`
Recreate the same string using an alternate method|

When you complete this exercise, please put your green post-it on your monitor. 

If you want to continue on at your own-pace, please feel free to do so.


<img src='../universal_images/green_sticky.300px.png' width='200' style='float:left'>

# Multi-line strings
---

In [None]:
# Multi-line strings: using triple quotes (either ' OR ")
#     preserve natural newlines and leading spaces, etc.

print("""Multi-line Strings!

multi-line strings will preserve
    the nuances
        including the newlines and leading spaces and 

yes, you're still gonna be a great programmer!

""")

In [None]:
# One great place to use multi-line quotes is as the 
#     first string in a function. This string is
#     automatically used by Python as the documentation
#     or Docstring for your function.

def getRandomNumber():
    '''This function returns a random number
    
    chosen by fair dice roll.
    guaranteed to be random.
    
    hat tip to http://xkcd.com/221/'''
    return 4

In [None]:
# Shift-Tab allows you to access help documentation for objects, functions,
#     and for modules, etc.
# Place your cursor in the name of the object and press Shift+Tab:

getRandomNumber

In [None]:
help(getRandomNumber)

## xkcd
<img src='02_images/random_number.png'>

Attribution http://xkcd.com/221/

# Indexing and slicing
---

## Indexing

Each character in a string has an index. Indexes start at zero (0).

```Python
language = 'P  Y  T  H  O  N'
            0  1  2  3  4  5
```

Indexes also exist counting backwards from the end:

NOTE: reverse indexing starts counting at -1.


```Python
language = 'P  Y  T  H  O  N'
            0  1  2  3  4  5
           -6 -5 -4 -3 -2 -1 
```

In [None]:
# To reference a specific character in a string, you
#     simply use bracket notation and the index
#     of the character you are interested in.

phrase = 'Pyladies!'
phrase[0]

In [None]:
phrase[8]

In [None]:
# To capture a COPY of that character for use, you can assign a label 

letter_e = phrase[-3]
letter_e
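
Positive and negative indexes are two names for the same positions. As a rough sketch (using `range()` and a `for` loop, which are covered in more depth elsewhere), each negative index is the positive index minus the length of the string. The name `word` is illustrative and holds the same text as `phrase`.

```python
word = 'Pyladies!'   # same text as phrase in the lesson

# For each position, show the positive index, the matching negative
# index, and the character that each one retrieves
for position in range(len(word)):
    negative = position - len(word)
    print(position, negative, word[position], word[negative])
```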

## Slicing

In [None]:
# To reference more than one character, you 
#     use the same bracketed notation and a 
#     * starting index
#     * ending index
#     separated by a colon (:)
# WARNING: Python slices up to, but NOT including, the
#          ending index.

print(phrase[2:8])

## The backstory to indexing and slicing

Why **zero indexing**?

Why the principle of **'up to but not including'**?

[See the rationale by Dijkstra](https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF)

History buff?

[Learn more about one of the greats in Computer History](https://en.wikipedia.org/wiki/Edsger_W._Dijkstra)
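
Two practical payoffs of the "up to but not including" rule can be sketched in a few lines: the length of a slice is simply the ending index minus the starting index, and cutting a string at any index yields two pieces that glue back together with nothing lost or repeated. The names `word`, `start`, `stop`, and `cut` are illustrative; `+` joins two strings.

```python
word = 'Pyladies!'   # same text as phrase in the lesson

start, stop = 2, 8
piece = word[start:stop]
print(piece)                        # ladies
print(len(piece) == stop - start)   # True

# Cut the string at one index: the left piece stops just before
# the cut and the right piece starts exactly at it
cut = 2
left = word[0:cut]                  # 'Py'
right = word[cut:len(word)]         # 'ladies!'
print(left + right == word)         # True
```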

In [None]:
# You can use negative numbers as the start OR end index

phrase[-7:-1]

You can mix and match negative and positive indexes

```Python
  phrase = 'P  y  l  a  d  i  e  s  !'
            0  1  2  3  4  5  6  7  8
           -9 -8 -7 -6 -5 -4 -3 -2 -1
                  ^                 ^
```

In [None]:
phrase[2:-1]

In [None]:
# Leaving the index on either side of the colon blank
#     defaults to 
#     * 0 (zero) before the colon
#     * the end of the string after the colon
#     NOTE: this is potentially 'unexpected behavior' for the ending index:
#           a blank ending index DOES include the last character

print(phrase[:])


In [None]:
# This shortcut is a common way to create copies of sequences such as lists.
# NOTE: for a string, the result simply has the same content as the original.

new_phrase = phrase[:]
new_phrase


# Exceeding bounds
---

In [None]:
# If you attempt to exceed the index bounds when accessing a 
#     specific character, you will get an error condition:
#     'index out of range'

print(phrase[9000])
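
Slicing is more forgiving than indexing when the bounds are exceeded. The sketch below uses `try`/`except`, which has not been introduced yet, only to keep the example running after the error; `word` is an illustrative name.

```python
word = 'Pyladies!'

# Indexing past the end raises an IndexError
try:
    word[9000]
except IndexError as error:
    print('IndexError:', error)

# Slicing past the end does not raise an error; it stops at the end
print(word[2:9000])        # ladies!

# A slice that starts past the end is just an empty string
print(repr(word[9000:]))   # ''
```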

# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_strings_01.py
```

Execute your script in **Jupyter** using the command:

```bash
run my_strings_01.py
```

I suggest that as you add each feature to your script that you run it right away to test it incrementally.

1. Assign the label `myname` to a string with your first and last names
1. Assign the label `shortname` to a **slice** of `myname` that has all letters **except** the first and last letters
1. `print()` `shortname` to the screen
1. Assign the label `multi` to a string that is multi-line. Include in the multi-line string the following items (one per line): your name, your favorite food, your favorite song

When you complete this exercise, please put your green post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../universal_images/green_sticky.300px.png' width='200' style='float:left'>

# Finding the length of a string
---

In [None]:
# to find the length of a string, Python has a builtin function used
# to determine the length of objects
# NOTE: for strings, it simply returns the number of characters

len(phrase)

# Parsing each character, one at a time
---

A common operation in computing is to access the individual characters of a string. Python simplifies this process dramatically via the use of the `for element in object:` statement.

`object` is generally any Python object that contains a sequence of things.

`element` is a target variable and can be called any name that makes sense to you in the context.

Here, we want to parse letters in a phrase so we call the target variable `letter`. 

We will go into depth on for loops in a future lesson.

In [None]:
for letter in phrase:
    print(letter) 
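
As a small sketch of what the loop is doing, the example below counts one for every character visited and compares the total with `len()`. The names `word` and `count` are illustrative.

```python
word = 'Pyladies!'

count = 0
for letter in word:
    count = count + 1      # the loop body runs once per character

print(count)               # 9
print(count == len(word))  # True
```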

# `in` as a logical operator
---

Python includes the keyword `in` as a logical operator that allows you to test a string to see if a substring is present anywhere in the string.


In [None]:
# For example:

'Py' in phrase

In [None]:
# The `in` test is case-sensitive

'py' in phrase

In [None]:
result = 'ladies' in phrase
result

### DANGER WILL ROBINSON: beware of empty strings

WARNING: the empty string `''` is ALWAYS considered to be present in a string

In [None]:
'' in phrase

Python also includes a `not` keyword. The `not` keyword is used as a logical operator to negate a `True` or `False` value (we will talk more about `True` and `False` later).

It is also used to negate the `in` logical operator.

Here we test whether or not the string `Fortran` is contained within the `str` `phrase`

In [None]:
'Fortran' not in phrase

# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_strings_02.py
```

Execute your script in **Jupyter** using the command:

```bash
run my_strings_02.py
```

I suggest that as you add each feature to your script that you run it right away to test it incrementally.

1. Assign the label `myhobby` to a string with the name of your favorite hobby
1. Assign the label `length` to the result of using the `len()` function to calculate the length of the string referenced by `myhobby`
1. Parse the string referenced by `myhobby` using a `for loop` and print each character to the screen, one at a time.
1. Assign a label `result` to a **conditional test** to see if the string `python` is present in `myhobby`
1. `print()` the `result` to the screen 

When you complete this exercise, please put your green post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../universal_images/green_sticky.300px.png' width='200' style='float:left'>

# Comparisons
---

In [None]:
term1 = 'Python'
term2 = 'python'

In [None]:
# Python interprets string equality based on 
#     case sensitivity:
#     Thus 'Python' and 'python' are different


term1 == term2

In [None]:
# Python accounts for lexicographical order (loosely alphabetical order)
#     when determining whether a string comes before or
#     after another string


term3 = 'abcde'
term4 = 'abcdz'

In [None]:
term3 < term4

In [None]:
# lexicographical order puts capital letters first

word1 = 'pyladies'
word2 = 'Pyladies'

word1 < word2

Lexicographical order, what?
---

In mathematics and computer science, lexicographical order is an ordering of sequences of items (e.g., letters, digits, symbols, or commands) based on the order assigned to the individual items. Strings are compared character by character, and the first difference decides which string comes first.

Collections of characters have been created to be used by computers. The two most frequently used are the American Standard Code for Information Interchange (ASCII) and Unicode (in fact, Unicode includes all the ASCII characters as its first 128 characters).

This is a summary of the ASCII characters.
When this collection was created, each character was assigned a number, which determines its order.

NOTICE: **numbers** (characters 48 to 57) come before **uppercase** letters (characters 65 to 90) which come before **lowercase** letters (characters 97 to 122).

<img src='02_images/ascii-0-127.gif' width='600'>

Attribution: [ascii chart: http://www.jimprice.com/ascii-0-127.gif](http://www.jimprice.com/ascii-0-127.gif)
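
The numbers in the chart can be checked from Python itself. This short sketch uses the builtin functions `ord()` (character to number) and `chr()` (number to character), neither of which appears elsewhere in this lesson, along with `sorted()` to show the resulting order. The list `words` is illustrative.

```python
print(ord('0'), ord('9'))   # 48 57
print(ord('A'), ord('Z'))   # 65 90
print(ord('a'), ord('z'))   # 97 122
print(chr(80))              # P

# Sorting follows the same numeric order:
# digits, then uppercase, then lowercase
words = ['pyladies', 'Pyladies', '42']
print(sorted(words))        # ['42', 'Pyladies', 'pyladies']
```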

# Object Methods
## >>> particularly string methods
---

Objects in Python are associated with a programming paradigm called "Object Oriented Programming".

Each object has attributes and behaviors associated with it, even if it is not inherently obvious at first. We will discuss this many times in the future. 

* **Attributes**: Learning what attributes are available will help you discover details about the objects you create
* **Behaviors**: Understanding what behaviors your objects can exhibit will enable you to write efficient and effective code. Behaviors are often called methods OR functions

Strings are no exception.

Attributes and behaviors are accessed via dot notation:

```python
phrase.some_attribute
phrase.some_behavior()
```

Attributes are accessed using `name_of_the_object.name_of_the_attribute`.

Behaviors are accessed similarly, but because they are methods/functions, they need to be **called** by using **parentheses**.

**All** functions/methods can be referenced without the parentheses, but to actually make them execute, you must call them using the parentheses.
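
The difference between referencing a method and calling it can be sketched as follows. The names `word`, `reference`, and `result` are illustrative; `callable()` is a builtin that reports whether something can be called.

```python
word = 'Pyladies!'

reference = word.upper    # no parentheses: the method itself, not yet executed
result = word.upper()     # parentheses: the method is called and returns a string

print(callable(reference))  # True
print(result)               # PYLADIES!

# The stored reference can still be called later
print(reference())          # PYLADIES!
```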


In [None]:
phrase = 'Pyladies!'

In [None]:
# Every string in Python is also an object

phrase

In [None]:
# Strings in Python come with behaviors such as the 
#     .upper() method which makes a copy of the string
#     and converts it to uppercase.

phrase.upper()

In [None]:
# Strings in Python come with behaviors such as the 
#     .lower() method which makes a copy of the string
#     and converts it to lowercase.

phrase.lower()

Generically referencing strings:

The generic version of a string is called `str`, so in help documentation or tutorials, you may see string methods or attributes prefixed by `str`. Similarly, a single `S` may be used as an exemplar:

```python
str.upper()
str.lower()
S.upper()
S.lower()
```

## Real-life use case

One very common place where `str.upper()` or `str.lower()` is used is in normalizing inputs for easy comparison.

In the following example, you don't know if your user will input with all **CAPS**, all **lowercase**, or **MiXeD CaSe**... so comparing their input to your test string becomes very difficult. A common technique is to normalize both the input value and the test value to either uppercase OR lowercase.

Here, we chose lowercase.

In [None]:
choice = input('Who is your favorite superhero? ')

# by normalizing the input value to lowercase, we can easily compare it to our lowercase test.

if choice.lower() == 'selina kyle':
    print('I like Catwoman, too')
else:
    print("Catwoman is better!")
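
Because `input()` waits for someone to type, the same normalization idea is easier to experiment with when it is wrapped in a small function. This is only a sketch: the function name `likes_catwoman` and the sample answers are made up, and `.strip()` (covered later in this lesson) is added to ignore stray spaces at the ends.

```python
def likes_catwoman(choice):
    """Return True if the answer is 'selina kyle', ignoring case and outer spaces."""
    return choice.strip().lower() == 'selina kyle'

# Hypothetical answers a user might type
for answer in ['Selina Kyle', 'SELINA KYLE', '  selina kyle  ', 'Bruce Wayne']:
    print(repr(answer), likes_catwoman(answer))
```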

# Finding an index
---

Sometimes we want to know: 

* whether a character is present in a string
* where that character may be found

We will look at `phrase.find()` and leave it as an exercise for the student to research `phrase.index()`

In [None]:
sentence = 'Python Rox'
sentence

If we want to find where the string `t` is located, we can use the `.find()` method.

In [None]:
sentence.find('t')

# P y t h o n   R o x
# 0 1 2 3 4 5 6 7 8 9

If we want to find where the string `o` is located, we can use the `.find()` method.

NOTE: this only shows us the location of the FIRST 'o'

In [None]:
sentence.find('o')

# P y t h o n   R o x
# 0 1 2 3 4 5 6 7 8 9

As you begin to use methods and functions, I highly recommend that you read the help documentation. Let's do that with the `.find()` method:

In [None]:
sentence.find?

In [None]:
# To continue processing the string to find other occurrences
#     of the character 'o', we can start the search
#     ONE character past where the previous match was found.

sentence.find('o', 5)

# P y t h o n   R o x
# 0 1 2 3 4 5 6 7 8 9

In [None]:
# To automate such processing users often use a placeholder 
#     variable to help identify where to start the next search

start = sentence.find('o')
print(start)
subsequent = sentence.find('o', start + 1)
print(subsequent)

# P y t h o n   R o x
# 0 1 2 3 4 5 6 7 8 9
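
Extending the placeholder idea, a loop can keep searching until `.find()` reports that nothing more was found, which it signals by returning `-1`. This is a sketch rather than the only way to do it; it uses a `while` loop and a list, both covered in more depth elsewhere, and the name `positions` is illustrative.

```python
sentence = 'Python Rox'

positions = []
start = sentence.find('o')
while start != -1:                         # -1 means 'not found'
    positions.append(start)                # remember where this match was
    start = sentence.find('o', start + 1)  # search again, one character later

print(positions)   # [4, 8]
```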

# Testing string characteristics
---

There are methods to test for characteristics.

These types of methods often start with `.is`, e.g.:

* `.isupper()`
* `.islower()`

In [None]:
heroine = 'CATWOMAN'
heroine.isupper()

In [None]:
heroine_mixed = 'CaTwOmAn'

heroine_mixed.islower()

In [None]:
# How do you find out which methods exist for strings?
# Use tab-completion: type the dot, then press Tab

sentence.

Method | Purpose
-------|--------
.isalpha()     | Verifies whether ALL the characters are alphabetic
.isalnum()     | Verifies whether ALL the characters are alphabetic or numeric
.isdecimal()     | Verifies whether ALL the characters are decimal digits (such as 0-9)
.isspace()     | Verifies whether ALL the characters are whitespace (\t, \n, ' ', etc)
.istitle()     | Verifies whether the string is in 'Title Case'
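
One way to get a feel for the methods in the table is to run each of them against a handful of sample strings. The sample values below are made up for illustration; each printed row shows the string followed by the results of `.isalpha()`, `.isalnum()`, `.isdecimal()`, `.isspace()`, and `.istitle()`, in that order.

```python
# Illustrative samples: letters, mixed, digits, a decimal point,
# whitespace only, title case, and the empty string
samples = ['Catwoman', 'abc123', '42', '4.2', ' \t\n', 'Dark Lord', '']

for text in samples:
    print(repr(text),
          text.isalpha(),
          text.isalnum(),
          text.isdecimal(),
          text.isspace(),
          text.istitle())
```

Note that the empty string fails every one of these tests, and that `'4.2'` is not considered decimal because of the `.` character.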

In [None]:
# NOTE: even string literals that have not been assigned a label
#       have the same methods and attributes associated with
#       them.

'bullwhip'.islower()

In [None]:
# NOTE: numerical characters in a string DO NOT have a sense
#       of upper or lowercase.

'42'.isupper()

In [None]:
# As noted above, str is the base class (i.e. blueprint) for 
#     all string objects, so we can use str.<methodname> to search for help on a particular method.

help(str.isspace)

In [None]:
# any str object will have exactly the same methods

help(heroine.isspace)

## .join(), .split(), .startswith() and .endswith()

Method | Purpose
------------------|--------
.join()     | Combine all the elements of a sequence (like a `list`) using the string as the separator
.split()     | Separate the string into substrings by splitting on a given string
.startswith()     | Verifies whether the string STARTS with a given substring
.endswith()     | Verifies whether the string ENDS with a given substring

### .split()

In [None]:
address = 'bishop street|honolulu|hawaii'

# To split this into short strings, we give the .split() method a substring to split on...

results = address.split('|')
print(results)

# NOTE: 
#     * the substring is removed completely
#     * the output is always a list
#     * visually, you can tell you are looking at a list by the square brackets at each end: [ ]

Remember:

It never hurts to read more about new functions as you learn them!

And it never hurts to test the things that you learn.

In [None]:
str.split?

### .join()

If you have a sequence of strings, such as the results list we just created, you can join them together using the .join() method.

In [None]:
new_string = ','.join(results)
print(new_string)
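
`.split()` and `.join()` are roughly opposites: splitting on a separator and then joining with the same separator gives back the original string. The sketch below also shows `.split()` called with no argument, which separates on runs of whitespace. The names `parts`, `rebuilt`, and `words` are illustrative.

```python
address = 'bishop street|honolulu|hawaii'

parts = address.split('|')
rebuilt = '|'.join(parts)
print(rebuilt == address)   # True

# With no argument, .split() separates on any run of whitespace
words = 'bishop   street\thonolulu\n'.split()
print(words)                # ['bishop', 'street', 'honolulu']
```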

### .endswith(), .startswith()

Python also provides for the ability to test whether a string starts with or ends with a particular character OR series of characters.

In [None]:
vehicle = 'batmobile'

In [None]:
vehicle.startswith('bat')

In [None]:
vehicle.endswith('MOBILE')
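
Both tests are case-sensitive, so they combine naturally with the normalizing trick shown earlier. As a small sketch, the example also passes a tuple of options (tuples are not covered in this lesson), which both methods accept.

```python
vehicle = 'batmobile'

print(vehicle.endswith('MOBILE'))           # False: the case does not match
print(vehicle.upper().endswith('MOBILE'))   # True: normalize first, then test

# A tuple of options matches if ANY one of them fits
print(vehicle.startswith(('bat', 'cat')))   # True
```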

### .center(), .rjust(), .ljust()

These methods create a new string, based on the original string, that is padded to a given number of characters. Depending on the method called, the new string may contain the original string on the left edge, the center, OR the right edge.

Method | Purpose
-------|--------
.center()     | Center the string in a field of given width
.rjust()     | Right justify the string in a field of given width
.ljust()     | Left justify the string in a field of given width


In [None]:
# In this case, I include the asterisks simply to bookend the field so you can
#     see that it really made a field of the desired width

print('*', 'aloha'.center(40), '*')

## Real-life use case

In [None]:
# Presume you want to create a header for a set of data composed of 
#     three columns: Date, Name, and Address
# If you know the width of your columns, you can put the column name and the 
#     data into fields of the correct width and display it in an organized fashion.


print('Date'.ljust(13), 'Name'.center(20), 'Address'.rjust(20))
print('20190223'.ljust(13), 'Dark Lord of Python'.center(20), 'Oahu, Hawaii'.rjust(20))
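
When there are several rows, the same widths can be applied by a small helper so that the header and every row line up. This is a sketch under a few assumptions: the helper name `format_row` and the second data row are made up, and the pieces are joined with `+` (rather than passed to `print()` separately) so that each line is exactly 13 + 20 + 20 = 53 characters wide.

```python
def format_row(date, name, address):
    """Place three values into fields that are 13, 20 and 20 characters wide."""
    return date.ljust(13) + name.center(20) + address.rjust(20)

# Hypothetical rows; only the first one comes from the lesson
rows = [
    ('20190223', 'Dark Lord of Python', 'Oahu, Hawaii'),
    ('20190224', 'Selina Kyle', 'Gotham'),
]

print(format_row('Date', 'Name', 'Address'))
for date, name, address in rows:
    print(format_row(date, name, address))
```

Values longer than their field are not truncated by these methods, so an overly long name would push the following columns out of line.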

### .strip(), .rstrip(), .lstrip()

Method | Purpose
-------|--------
.strip()     | Remove the given characters (whitespace by default) from both ends of the string
.rstrip()     | Remove the given characters (whitespace by default) from the right end of the string
.lstrip()     | Remove the given characters (whitespace by default) from the left end of the string

In [None]:
str.strip?


In [None]:
# The following string has many newlines at each end, which need to be removed OR stripped.

newline_str = '\n\n\n\n\n\n*this string\nhas\nnewlines and stars*\n\n\n\n\n\n'
print(newline_str)

# NOTICE: the swaths of whitespace before and after.

In [None]:
# Called with no arguments, the .strip() method removes all whitespace characters
#     from both ends of the string; newlines inside the string are kept

clean_version = newline_str.strip()
print(clean_version)

## What is whitespace in Python?

|Name|Representation|
|----|----|
|Tab|`\t`|
|Newline|`\n`|
|Space|` `|
|Carriage Return|`\r`|
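
The strip methods can also be given the characters to remove. The sketch below first strips the outer whitespace and then the outer stars, and shows that characters inside the string are left alone. The names `decorated`, `no_whitespace`, `no_stars`, and `padded` are illustrative.

```python
decorated = '\n\n*this string\nhas\nnewlines and stars*\n\n'

no_whitespace = decorated.strip()      # outer newlines removed
no_stars = no_whitespace.strip('*')    # outer stars removed
print(no_stars)

# Newlines INSIDE the string are untouched
print('\n' in no_stars)                # True

# .lstrip() and .rstrip() work on one end only
padded = '   aloha   '
print(repr(padded.lstrip()))           # 'aloha   '
print(repr(padded.rstrip()))           # '   aloha'
```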


# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_strings_03.py
```

Execute your script in **Jupyter** using the command:

```bash
run my_strings_03.py
```

I suggest that as you add each feature to your script that you run it right away to test it incrementally.

1. Assign the label `myname` to a string with your first and last names separated by a comma
1. Assign a label `names` to the result of splitting (using `.split()`) the variable `myname` based on a comma
1. Print `names`
1. Assign a label `pipe` to the result of joining (using `.join()` and the `|` character) the terms in `names` into a single string.
1. Print `pipe`
1. Test whether `myname` is ALL uppercase: IF you don't remember which method to use, research string functions.
1. Create a field 60 characters wide and right justify your full name in the field: IF you don't remember which method to use, research string functions.


When you complete this exercise, please put your green post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../universal_images/green_sticky.300px.png' width='200' style='float:left'>

# String Formatting

Method | Purpose
-------|--------
.format()     | return a formatted version of a string

In [None]:
help(str.format)

In [None]:
boilerplate = '''Catwoman is a complex character with many plot lines. Several women have
used the name Catwoman, including: {}, {}, and {}.
'''.format('Selina Kyle', 'Holly Robinson', 'Eiko Hasigawa')

print(boilerplate)

In [None]:
positionals = '{1} {0}'.format('last', 'first')

print(positionals)

In [None]:
headline = '''In a battle of the superheroes between {0} and {1},
in this round, {0} clearly came out on top, getting away with 
the jewels. {0} snuck away in the dark of night.'''.format('Catwoman', 'Batman')

print(headline)

In [None]:
# Alignment

left_align = '{:20}'.format('Selina')
print(left_align)

left_align = '{:<20}'.format('Selina')
print(left_align)

right_align = '{:>20}'.format('Holly')
print(right_align)

center_align = '{:^20}'.format('Eiko')
print(center_align)

In [None]:
# Padding

left_pad = '{:*<20}'.format('Selina')
print(left_pad)

right_pad = '{:_>20}'.format('Kyle')
print(right_pad)

In [None]:
# Truncation

trunc = '{:.6}'.format('Selina Kyle')
print(trunc)

# Combining them all together

trunc_pad = '{:->20.6}'.format('Selina Kyle')
print(trunc_pad)

In [None]:
# Numbers

print('{:d}'.format(42))

print('{:f}'.format(2.71))

print('{:20d}'.format(42))

print('{:6.2f}'.format(2.718281828459045))

print('{:+d}'.format(42))

print('{:d}'.format(-42))

In [None]:
# Named Placeholders

name = '{first} {last}'.format(first='Selina', last='Kyle')

print(name)
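
Named placeholders can be combined with the alignment and number options shown above by putting the format details after a colon. This is a minimal sketch; the names `row`, `greeting`, `score`, and the values are made up for illustration.

```python
# Left-align the name in 12 characters; right-align the score in 8
# characters with 2 decimal places
row = '{name:<12}|{score:>8.2f}|'.format(name='Selina', score=9.5)
print(row)        # Selina      |    9.50|

# A named placeholder can be reused as often as needed
greeting = '{first} {last}, or just {first}'.format(first='Selina', last='Kyle')
print(greeting)   # Selina Kyle, or just Selina
```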

# Want to learn more?

In years past, this website has been a go-to for me...

https://pyformat.info

# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_strings_04.py
```

Execute your script in **Jupyter** using the command:

```bash
run my_strings_04.py
```

I suggest that as you add each feature to your script that you run it right away to test it incrementally.

1. Assign a label `boilerplate` to a boilerplate string that provides placeholders to put your favorite food and song: `"My favorite food is ___ and my favorite song is ___"`
1. Use the `.format()` method of this string to provide the name of:
    1. Your favorite food
    1. Your favorite song

When you complete this exercise, please put your green post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../universal_images/green_sticky.300px.png' width='200' style='float:left'>

# The new "hotness": f-strings
---

`f-strings` are rapidly becoming my new favorite way to do strings. 

NOTE:  This only works on Python 3.6+

In [None]:
title = 'dark lord'
language = 'python'

f'chalmer is the {title} of {language}'

f-strings not only allow the easy incorporation of variables into boilerplate, just as the `.format()` method does, they also allow for Python expressions and Python function calls to be incorporated into the final string output "on-the-fly".

In [None]:
name = 'Chalmer'
number = 21

# NOTE:
#     * we include the name.upper() function call
#     * we include the math expression number * 2


f'''He said his name is {name.upper()} and his favorite number is {number * 2}.'''
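
The alignment, padding, and number options from the `.format()` section appear to carry over unchanged: inside an f-string they go after a colon within the curly braces. A small sketch, with made-up values and the illustrative names `line` and `same`:

```python
name = 'Selina'
score = 9.5

line = f'{name:<12}|{score:>8.2f}|'
same = '{:<12}|{:>8.2f}|'.format(name, score)

print(line)           # Selina      |    9.50|
print(line == same)   # True
```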

For more information, see this page in the Python documentation:

[formatted string literals](https://docs.python.org/3/reference/lexical_analysis.html#f-strings)  
    