<font color='blue'> First of all, please “Copy to Drive” to get your own copy for editing. </font>

<font color='red'> Run all the cells. In the exercise cells (marked in red), replace each "XXX" placeholder with your own code.</font>

# How to work with Strings


* Text is represented in programs by the *string* data type. A **string** is a sequence of **characters**, often treated as a data structure.
 - A string literal is enclosed in double or single quotes: "some characters" or 'some characters'. The two forms are equivalent; just use a matching pair.
* Both lists and strings are a type of collection known as a *sequence*.

      Note: Lists are mutable.
      Note: Strings are immutable; they cannot be changed "in place".

## Unicode, indexes, slicing, duplicating, and multiline strings

**How does a computer represent strings?**

In version 3.0 and later, Python uses *Unicode* to store the characters in strings.

**Unicode maps each character to an integer (ordinal) value** and provides for most characters in most of the world's languages.

* **Encoding** process
   * A character >> a number >> a sequence of numbers (binary) in computer memory
   * ASCII (American Standard Code for Information Interchange) is a subset of **Unicode**
   * UTF-8: a variable-length encoding scheme that stores each character in 1 to 4 bytes
   * Python strings support the **Unicode** Standard
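To make the encoding step concrete, here is a small sketch (not part of the original notebook) that uses the built-in `str.encode()` and `bytes.decode()` methods. ASCII characters take one byte in UTF-8, while other characters take more.

```python
# Encode a few characters as UTF-8 and report how many bytes each one needs
for ch in ["a", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")               # str -> bytes
    print(ch, ord(ch), encoded, len(encoded), "byte(s)")

# Decoding the bytes gives back the original string
data = "café".encode("utf-8")
print(data)                    # b'caf\xc3\xa9'
print(data.decode("utf-8"))    # café
```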

In [None]:
ord("a")

In [None]:
chr(97)

**Two built-in functions**:

![String](images/string_function1.png)
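As a small illustration (not from the original notebook), `ord()` and `chr()` undo each other, so you can do arithmetic on ordinal values to move through the alphabet:

```python
# ord() and chr() are inverses of each other
print(chr(ord("A")))             # A

# Lowercase letters sit 32 positions after their uppercase versions
print(ord("a") - ord("A"))       # 32

# Adding 1 to an ordinal value gives the next character
print(chr(ord("a") + 1))         # b

# Build the lowercase alphabet from the ordinal values of "a" through "z"
alphabet = ""
for n in range(ord("a"), ord("z") + 1):
    alphabet += chr(n)
print(alphabet)                  # abcdefghijklmnopqrstuvwxyz
```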

In [None]:
# The ordinal value of a Unicode character
print("5 =", ord("5"))     # 5 = 53

In [None]:
print("A =", ord("A"))     # A = 65

In [None]:
print("a =", ord("a"))     # a = 97

**Summary Table**: Python Sequence (String and List) Operations
* Concatenation, repetition, indexing, slicing, and len() apply to both lists and strings.

![Python sequence operations](images/sequence_operations.png)
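A quick side-by-side sketch of the operations in the table, using a hypothetical three-item list and string, shows that each one behaves the same way on both sequence types:

```python
letters_list = ["a", "b", "c"]
letters_str = "abc"

# Concatenation (+) and repetition (*) build new sequences
print(letters_list + ["d"])                  # ['a', 'b', 'c', 'd']
print(letters_str + "d")                     # abcd
print(letters_list * 2)                      # ['a', 'b', 'c', 'a', 'b', 'c']
print(letters_str * 2)                       # abcabc

# Indexing, slicing, and len() work the same way on both
print(letters_list[1], letters_str[1])       # b b
print(letters_list[1:], letters_str[1:])     # ['b', 'c'] bc
print(len(letters_list), len(letters_str))   # 3 3
```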

In [None]:
mylist = ["a", "b", "c", "d"]

In [None]:
len(mylist)

In [None]:
mystring = "abcd"

In [None]:
len(mystring)

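How to access a character in a string
* Use an index in square brackets: the first character is at index 0, and negative indexes count back from the end (-1 is the last character).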
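Here is a short sketch with a sample word (chosen for illustration) showing positive and negative indexes, and what happens when an index is out of range:

```python
word = "Python"
print(word[0])               # P (first character)
print(word[len(word) - 1])   # n (last character)
print(word[-2])              # o (second character from the end)

# An index past the end raises an IndexError
try:
    print(word[6])
except IndexError as err:
    print("IndexError:", err)   # IndexError: string index out of range
```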

In [None]:
mylist[0]

In [None]:
mystring[0]

In [None]:
mystring[-1]

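How to slice a string
* string[start:end:step]: the slice includes start but stops before end; if omitted, start defaults to the beginning, end to the end of the string, and step to 1.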
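The `start:end:step` pattern can be sketched with a sample string (chosen for illustration), including default values and negative steps:

```python
alpha = "abcdefgh"
print(alpha[2:5])    # cde (indexes 2, 3, 4; stops before 5)
print(alpha[:3])     # abc (start defaults to 0)
print(alpha[5:])     # fgh (end defaults to len(alpha))
print(alpha[-3:])    # fgh (the last three characters)
print(alpha[::2])    # aceg (every second character)
print(alpha[::-1])   # hgfedcba (a step of -1 reverses the string)
```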

In [None]:
mylist[0:3]

In [None]:
mystring[0:3]

In [None]:
mylist[1] = "e"   # list is mutable
mylist

In [None]:
mystring[1] = "e"  # string is immutable: this line raises a TypeError
mystring

How to use the repetition operator (*)

In [None]:
print("=" * 20)             # ====================
print("A horse! " * 2)      # A horse! A horse!

How to use triple quotes to create a multiline string

In [None]:
query = '''SELECT categoryID, name AS categoryName
           FROM Category WHERE categoryID = ?'''
query

In [None]:
query2 = """SELECT categoryID, name AS categoryName
           FROM Category WHERE categoryID = ?"""
query2

* Python uses *Unicode* to store the characters in strings. Unicode maps each character to an integer (*ordinal*) value and provides for most characters in most of the world's languages.
* A string is *immutable*, which means that you can't change its characters. If you attempt to do that, Python raises a TypeError.
* If you use an index that doesn't exist in the string, Python raises an IndexError that indicates that the index is out of range.
* Slicing and duplicating a string work the same as slicing and duplicating a list.
* Triple quotes let a string literal span multiple lines. They are typically used for long strings within a program, such as the SQL queries above, rather than for display.
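Because a string can't be changed in place, the usual workaround is to build a new string and reassign the variable. This sketch (not from the original notebook) also shows that a triple-quoted string keeps its line breaks as `\n` characters:

```python
# mystring[1] = "e" would raise a TypeError, so build a new string
# from slices and assign it back to the same variable instead
mystring = "abcd"
mystring = mystring[:1] + "e" + mystring[2:]
print(mystring)               # aecd

# A triple-quoted string keeps its line breaks
text = """line 1
line 2"""
print(text.splitlines())      # ['line 1', 'line 2']
```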

## How to search a string

How to use the **in** keyword to search a string
* *term* in *string* evaluates to True if *term* occurs anywhere in *string*, and to False otherwise
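A minimal sketch of the `in` check, using the same kind of sample message as the cells below: the search is case-sensitive, so lowercasing both sides is a common way to ignore case, and `not in` tests for absence.

```python
spam = "Congratulations. You've won a million dollars."
print("million" in spam)                   # True
print("Million" in spam)                   # False: in is case-sensitive
print("Million".lower() in spam.lower())   # True: compare in one case
print("lottery" not in spam)               # True
```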

In [None]:
# Examples
spam = "Congratulations. You've won a million dollars."

"million" in spam

In [None]:
myemail = "ali.gator@ufl.edu"

"@" in myemail

In [None]:
# Code that uses an if statement to check a search

search_term = input("Enter search term: ")
if search_term in spam:
    print("Term found!")
else:
    print("Not found!")

## How to loop through the characters in a string


In [None]:
# What happens? The loop variable char is reassigned on each iteration:
## 1st iteration: char = message[0] = "H"
## 2nd iteration: char = message[1] = "i"
## ......
## last iteration: char = message[2] = "!"

# Code that prints each character in a string
message = "Hi!"
for char in message:
    print(char)

In [None]:
# check again, can you understand it?
print(char)

In [None]:
# Code that prints the ordinal value for each character
message = "0123 ABCD abcd"
for cha in message:
    print(ord(cha), end=" ")

# You are updating the value for a variable --cha-- each iteration

In [None]:
# check again, can you understand it?
print(cha)

Looping through the characters in a string is similar to looping through the items in a list:
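As a small sketch of what a character loop is useful for (the sentence is just sample data), here is a loop that counts the vowels in a string:

```python
# Count the vowels in a sentence by checking each character in a loop
sentence = "Hello, I came here for an argument"
vowel_count = 0
for ch in sentence.lower():   # lowercase so "I" is counted as a vowel too
    if ch in "aeiou":
        vowel_count += 1
print(vowel_count)            # 12
```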

In [None]:
mylist = ["a", "b", "c", "d"]

for item in mylist:
  print(item)

In [None]:
for i in range(len(mylist)):
  print(mylist[i])

In [None]:
mystring ="abcd"

for ch in mystring:
  print(ch)

<font color='red'>Complete the code in the cell below.</font>

In [None]:
# use another for loop with range() function
# looping characters in mystring (to get the same output)
XXX
XXX

## How to use basic string methods

![String](images/basic_string_methods.png)
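The cells below demonstrate several methods; as a further sketch, here are a few other standard `str` methods (`lstrip()`, `rstrip()`, `capitalize()`, `isalpha()`, `isupper()`, `center()`) applied to sample values. Square brackets are printed around the results only to make the whitespace visible.

```python
word = "  python  "
print("[" + word.lstrip() + "]")        # [python  ]
print("[" + word.rstrip() + "]")        # [  python]
print("python".capitalize())            # Python
print("python".isalpha())               # True
print("python3".isalpha())              # False ('3' is not a letter)
print("PYTHON".isupper())               # True
print("[" + "Total".center(11) + "]")   # [   Total   ]
```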

In [None]:
s = "hello, I came here for an argument"
s.title() # copy of s with title case

In [None]:
s.lower()  # copy of s in all lowercase characters

In [None]:
s.upper()  # copy of s with all characters converted to uppercase

In [None]:
# How to check if a string contains all digits
entry = "12345"
entry.isdigit()

In [None]:
# How to check if a string starts with a substring
title = "The Meaning of Life"
title.startswith("The")

In [None]:
myemail = "ali.gator@ufl.edu"
myemail.endswith(".com")

In [None]:
# How to strip whitespace from a string
ssn = "   392 55 7722  "
ssn.strip()

In [None]:
# How to align strings by using justification
print("Hammer".ljust(14), "$9.99".rjust(10))
print("Nails".ljust(14), "$14.50".rjust(10))

## How to find, remove, and replace parts of a string

**Four more methods of a string:**

![String](images/four_string_methods.png)
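A few details that are easy to miss, sketched with the same sample sentence used below: `find()` returns -1 when the substring is missing and accepts an optional start index, `replace()` accepts an optional count, and `removesuffix()` (Python 3.9+, like `removeprefix()`) returns the string unchanged when there is nothing to remove.

```python
s = "hello, I came here for an argument"
print(s.find("here"))          # 14
print(s.find("e", 2))          # 12 (search starts at index 2)
print(s.find("xyz"))           # -1 (not found)
print(s.replace("e", "E", 2))  # hEllo, I camE here for an argument

filename = "report.txt"
print(filename.removesuffix(".txt"))    # report
print(filename.removeprefix("data_"))   # report.txt (no such prefix, so unchanged)
```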

In [None]:
s = "hello, I came here for an argument"
s.find(',')  # index of the first occurrence of ',' in s (-1 if not found)

In [None]:
s.replace("I", "you")  # replace all occurrences of oldsub in s with newsub

In [None]:
email0 = "joel@murach.com"
# How to remove a substring from the start of a string (requires Python 3.9 or later)
email0.removeprefix("joel")

In [None]:
def get_full_name():
    while True:
        name = input("Enter full name:       ").strip()
        if " " in name:
            return name
        else:
            print("You must enter your full name.")

In [None]:
# call get_full_name() function:
get_full_name()

<font color='red'>Complete the code in the cell below.</font>

In [None]:
## define get_email_address() function
## to be valid, the address must contain an @ sign and end with ".com"
XXX



In [None]:
# call get_email_address() function


Get a valid phone number:

In [None]:
phone = "(555) 333-333"
phone = phone.replace(" ", "")
phone

In [None]:
phone = phone.replace("-", "")
phone

In [None]:
phone_number = "(555) 333-333"
for char in " -().":
  phone_number = phone_number.replace(char, "")

print(phone_number)

In [None]:
phone_number.isdigit()

In [None]:
len(phone_number)

<font color='red'>Complete the code in the cell below.</font>

In [None]:
## define get_phone_number() function
# to be valid: remove all spaces, dashes, parentheses, and periods from the number
# then check to make sure the number consists of 10 characters that are digits
# return a valid phone number (e.g., 5556667777)
def get_phone_number():
  XXX


  # check to make sure the number consists of 10 characters that are digits

  if XXX:


## How to split a string into a list of strings


The **split()** method of a string

![String](images/split_method.png)

* **split()** method: this method splits a string into a list of substrings. By default, it **splits the string on runs of whitespace** (spaces, tabs, or newlines).
* split() can also split a string at other characters if you **supply the character to split on** as a **delimiter**. For example, a date such as "11/9/1972" can be split on its slashes:
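Here is a sketch of a few more delimiter cases with sample strings: splitting on commas, the difference between `split()` and `split(" ")` when there are repeated spaces, and the optional `maxsplit` argument.

```python
numbers = "3,14,15,92"
print(numbers.split(","))          # ['3', '14', '15', '92']

spaced = "one  two three"          # note the two spaces after "one"
print(spaced.split())              # ['one', 'two', 'three']
print(spaced.split(" "))           # ['one', '', 'two', 'three']

# maxsplit limits how many splits are made
print("a-b-c-d".split("-", 1))     # ['a', 'b-c-d']
```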

In [None]:
# How to split a date on a delimiter
date = "11/9/1972"
date_parts = date.split("/")   # ['11', '9', '1972']
month = int(date_parts[0])     # 11

In [None]:
ystring = "You only live once"
len(ystring)

In [None]:
ystring.split()

In [None]:
ylist = ystring.split()
ylist[0][0]

In [None]:
len(ylist)

In [None]:
len(ylist[0])

In [None]:
# ylist = "Fear of missing out".split()
avg_len = (len(ylist[0]) + len(ylist[1]) + len(ylist[2]) + len(ylist[3]))/len(ylist)
avg_len

In [None]:
acronym = ylist[0][0] + ylist[1][0] + ylist[2][0] + ylist[3][0]
acronym.upper()

<font color='red'>Complete the code in the cell below.</font>

In [None]:
# use a for loop to get an acronym for the user-entered phrase
# example: you only live once--YOLO
# example: Fear of missing out --FOMO
phrase = input("Enter a phrase:")
XXX

for XXX in XXX:
  XXX


<font color='red'>Complete the code in the cell below.</font>

In [None]:
# use a for loop to get the average word length of the user-entered phrase
# example: you only live once--3.75
# example: Fear of missing out --4.0
phrase = input("Enter a phrase:")
XXX

for XXX in XXX:
  XXX



## How to join strings

How to join strings
* with the + and += operators
* with an f-string
* with join() method

The **join()** method of a string:

![String](images/join_method.png)
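A short sketch of `join()` with sample values: the separator can be any string, including the empty string, and every item in the list must already be a string, so numbers need `str()` first.

```python
parts = ["555", "888", "6666"]
print(".".join(parts))           # 555.888.6666
print("".join(parts))            # 5558886666

# join() accepts only strings; ", ".join([90, 85, 77]) would raise a TypeError,
# so convert each number with str() first
scores = [90, 85, 77]
score_strings = []
for score in scores:
    score_strings.append(str(score))
print(", ".join(score_strings))  # 90, 85, 77
```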

In [None]:
number = "5558886666"

# With the + operator
new_number = number[0:3] + "." + number[3:6] + "." + number[6:]
print(new_number)

In [None]:
# With the += operator
number2 = number[0:3]
number2 += ". "
number2 += number[3:]

print(number2)

In [None]:
# With an f-string
number3 = f"{number[0:3]}, {number[3:]}"
number3

In [None]:
# the join() method of a string
" ".join(["Number", "one,", "the", "Larch"])  #concatenate list into a string

In [None]:
"|".join(["Number", "one,", "the", "Larch"])  # concatenate the list into a string, using "|" as the separator

<font color='red'>Complete the code in the cell below.</font>

**Exercise: Word Counter program**

In [None]:
lyrics = '''I'm in the room, it's a typical Tuesday night.
I'm listening to the kind of music she doesn't like.
And she'll never know your story like I do.
But she wears short skirts.
I wear T-shirts.'''

In [None]:
def get_words(text):
    text = text.replace("\n", " ")   # treat line breaks as spaces so words never merge
    text = text.replace(",", "")
    text = text.replace(".", " ")
    text = text.lower()
    text = text.strip()

    words = text.split()   # convert str to list, splitting on any run of whitespace
    words.sort()
    return words

In [None]:
len(get_words(lyrics))

In [None]:
def get_unique_words(words):
    unique_words = []
    unique_words.append(words[0])

    for i in range(1, len(words)):
        if words[i] == words[i - 1]:
            continue
        else:
            unique_words.append(words[i])
    return unique_words

In [None]:
words = get_words(lyrics)
unique_words = get_unique_words(words)

# display number of words and unique words
print(f"Number of words = {len(words)}")
print(f"Number of unique words = {len(unique_words)}")

In [None]:
# display unique words and their word counts
print("Unique word occurrences:")
for word in unique_words:
  print(f"    {word} = {words.count(word)}")

<font color='red'>Complete the code in the cell below.</font>

In [None]:
# Please modify to show only first 5 unique word occurrences
print("First 5 Unique word occurrences:")
for i in range(5):
  XXX

First 5 Unique word occurrences:
    a = 1
    and = 1
    but = 1
    do = 1
    doesn't = 1


In [None]:
# Please modify to show only first 5 unique word occurrences
print("First 5 Unique word occurrences:")
sub_words = unique_words[0:5]   # using slicing
for word in sub_words:
    print (word)

In [None]:
## define get_sentences(text) function to split the text into a list of sentences
## You can assume all the sentences end with a period (.)
def get_sentences(text):
    sentences = text.split(".")

    return sentences


In [None]:
sentences = get_sentences(lyrics)

for sen in sentences:
  print(sen)
  print("========")

In [None]:
print(f"Number of sentences = {len(sentences) - 1}")  # the text ends with ".", so split() leaves an empty string at the end

In [None]:
len(get_sentences(lyrics))

**Another Example (Lab)**:

In [None]:
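def to_pig_latin(word):
    word = word.lower()    # lowercase first so capitals such as 'Break' or 'Apple' are handled
    vowels = "aeiou"
    if word == "":         # nothing to translate
        return word
    ch = word[0]
    if ch in vowels:
        word += "way"      # a word that starts with a vowel just gets "way"
    else:
        if ch == "y":      # a leading y is treated as a consonant
            word = word[1:] + ch
        # move leading consonants to the end until a vowel or y comes first;
        # stop after len(word) moves so a word with no vowels can't loop forever
        for _ in range(len(word)):
            if word[0] in vowels + "y":
                break
            word = word[1:] + word[0]
        word += "ay"
    return word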


In [None]:
latin1 = to_pig_latin('Break')
print(latin1)

## How to use format specifications with f-strings

The syntax of an f-string with a format specification

```
f"{value:format_specification}"
```
The syntax for the format specification:
```
[alignment][field_width][comma][.decimal_places][type_code]
```

**Common type codes**

![](http://drive.google.com/uc?export=view&id=1E-4z0F5O3vNM3XpjoAhBxOJGfT5o9vGw)

F-strings with format specifications:

In [None]:
fp_number = 12345.6789
print(f"{fp_number:.2f}")        # 12345.68
print(f"{fp_number:,.2f}")       # 12,345.68
print(f"{fp_number:15,.2f}")     #       12,345.68
print(f"{fp_number:.2e}")        # 1.23e+04

In [None]:
fp_number = .12345
print(f"{fp_number:.0%}")        # 12%
print(f"{fp_number:.1%}")        # 12.3%

In [None]:
int_number = 12345
print(f"{int_number:d}")         # 12345
print(f"{int_number:,d}")        # 12,345

How to format a string literal:

In [None]:
# inside a double-quoted f-string, enclose the literal in single quotes
print(f"{'Description':15}{'Price'}")

In [None]:
descrip = 'Description'
print(f"{descrip}{'Price':30}")

How to use a variable in a format specification:

In [None]:
spec = 15
# enclose the variable in braces
print(f"{'Description':{spec}}{'Price'}")

How to use field widths to align results

In [None]:
print(f"{'Description':15} {'Price':>10} {'Qty':>5}")
print(f"{'Hammer':15} {9.99:10.2f} {3:5d}")            # numbers are right-aligned by default
print(f"{'Hammer':15} {9.99:>10.2f} {3:>5d}")
print()
print(f"{'Description':15} {'Price':10} {'Qty':5}")    # strings are left-aligned by default
print(f"{'Description':15} {'Price':<10} {'Qty':<5}")
print(f"{'Hammer':15} {9.99:<10.2f} {3:<5d}")

* You can use format specifications to format the values in the braces of an f-string.
* When you use a field-width specification, numbers are right-aligned and strings are left-aligned by default. However, you can use the `<` (left) and `>` (right) alignment symbols to override that.
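As a further sketch of Python's format specification (the values are sample data), `^` centers a value, an optional fill character can go before the alignment symbol, and a variable in nested braces can supply the number of decimal places just as it can supply the width. The `|` is printed only to show where each field ends.

```python
price = 9.99
places = 1

print(f"{'Hammer':^12}|")     # centered:      "   Hammer   |"
print(f"{'Hammer':*<12}|")    # fill with '*': "Hammer******|"
print(f"{price:>10.2f}|")     # right-aligned: "      9.99|"
print(f"{price:.{places}f}")  # decimal places from a variable: 10.0
```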