# Strings

Concepts: Python provides several ways to access the individual characters in a
string. Strings also have methods that allow you to perform operations
on them.

In [10]:
from IPython.core.interactiveshell import InteractiveShell
# show the value of every expression in a cell, not just the last one
InteractiveShell.ast_node_interactivity = "all"

## Outline:

- A string is a sequence 
- Strings are Immutable
- Indexing Strings
- Traversing Strings
- Slicing Strings
- String Methods
- Parsing Strings

## Accessing individual characters in a String

There are two ways to do that:

- using the `for` loop
    ```
    for variable in string:
        statement
        statement
        etc.
    ```
    1. `variable` is the name of a variable 
    2. `string` is either a string literal or a variable that references a string
    3. Each time the loop iterates, `variable` will reference a copy of a character in `string`, beginning with the first character
    4. See Example 11.1 for a demonstration.
    <br>
    <br>
- indexing
    1. Each character in a string has an index that specifies its position in the string.
    2. Indexing starts at 0; the index of the last character in a string is 1 less than the number of characters.
    3. Negative indexes count from the end of the string: `-1` is the last character, `-2` the second to last, and so on.
    4. See Example 11.2 for a demonstration.

<img src="https://developers.google.com/edu/python/images/hello.png">
<font size=-2>
<p  style="text-align:right">&copy; Google</p>
</font>

In [60]:
# Example 11.1
# This program counts the number of times 
# the letter T (uppercase or lowercase)
# appears in a string

def main():
    # create a variable to use to hold the count
    # the variable must start with 0
    count = 0
    
    # get a string from the user 
    my_string = input('Enter a sentence: ')
    
    # count the Ts
    for ch in my_string:
        if ch == 'T' or ch == 't':
            count += 1
            
    # print the result 
    print(f'The letter T appears {count} times.')
    
# call the main function
if __name__ == '__main__':
    main()

Enter a sentence: Today is Tuesday.
The letter T appears 2 times.


In [65]:
# example 11.2
s = 'he8lo'

# assign second character to variable si
si=s[1]
print(si)

# the last character in string
print(s[-1])

# the second last character
print(s[-2])

# the 3rd character in the string
print(s[2])
print(s[-3])

e
o
l
8
8


### `IndexError` exceptions
An `IndexError` exception will occur if you try to use an index that is out of range for a particular string. For example, the string 'Boston' has 6 characters, so the valid indexes are 0 through 5 (or -1 through -6 with negative indexes).

In [69]:
string = 'hello'
string[5]  # IndexError: valid indexes for 'hello' are 0 through 4

IndexError: string index out of range

In [70]:
# length of the string
len(string) 

5

In [8]:
city = 'Boston'
index = 0
while index < 7:  # deliberate bug: city has only 6 characters, so city[6] raises IndexError
    print(city[index])
    index += 1

B
o
s
t
o
n


IndexError: string index out of range

In [9]:
len(city)

6
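The cell above deliberately loops one index too far. A minimal sketch of two ways to avoid the crash, assuming the same `city` string: bound the loop with `len()`, or catch the `IndexError` with `try`/`except`. The names `chars` and `message` are only for illustration.

```python
city = 'Boston'

# Bound the loop with len(city) so the largest index used is len(city) - 1 (5)
chars = []
index = 0
while index < len(city):
    chars.append(city[index])
    index += 1
print(chars)  # ['B', 'o', 's', 't', 'o', 'n']

# If an index might be out of range, catch the IndexError instead of crashing
try:
    print(city[6])
except IndexError as err:
    message = str(err)
    print('Error:', message)  # Error: string index out of range
```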

## String Concatenation

A common operation performed on strings is concatenation, or appending one string to the end of another string.

- The `+` operator produces a string that is the combination of the two strings used as its operands.
- You can also use the `+=` operator to perform concatenation.
    - Keep in mind that the operand on the left side of the `+=` operator must be an existing variable. If you specify a nonexistent variable, a `NameError` exception is raised.

In [73]:
message = 'Hello ' + 'World'
print(message)

Hello World


In [72]:
message = 'Hello' + ' ' + 'World'
print(message)

Hello World


In [78]:
print("Hello", "World")

Hello World
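The `+=` form described above is not shown in the cells. A minimal sketch (the variable names `greeting` and `farewell` are illustrative): `+=` builds a new string from an existing variable, and using it on a name that was never assigned raises `NameError`.

```python
# += appends to an existing string variable
greeting = 'Hello'
greeting += ' '
greeting += 'World'
print(greeting)  # Hello World

# The left operand of += must already exist; otherwise Python raises NameError
try:
    farewell += 'Goodbye'  # farewell was never assigned
except NameError as err:
    print('NameError:', err)
```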


## Strings are immutable 

In Python, strings are immutable, which means once they are created, they cannot be
changed. Some operations, such as concatenation, give the impression that they modify
strings, but in reality they do not.

In [79]:
# this program concatenates strings
def main():
    name = 'Carmen'
    print(f'The name is: {name}')

    name = name + ' Brown'
    # Carmen is not modified, 
    # Instead, a new string containing 'Carmen Brown' is created
    # and assigned to the name variable
            
    print(f'Now the name is: {name}')
    
# call the main function
if __name__ == '__main__':
    main()

The name is: Carmen
Now the name is: Carmen Brown
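To check that concatenation produced a new object rather than changing `'Carmen'`, you can keep a second reference to the original string and compare with the `is` operator, which tests whether two names refer to the same object. Because individual characters cannot be assigned (the next cells show the resulting `TypeError`), a changed string is built from pieces of the old one; the sketch uses slicing, which is covered later in this notebook. The names `original` and `renamed` are illustrative.

```python
name = 'Carmen'
original = name             # a second reference to the original string
name = name + ' Brown'      # concatenation builds a NEW string and rebinds name

print(original)             # Carmen  (unchanged)
print(name)                 # Carmen Brown
print(name is original)     # False: two different string objects

# name[0] = 'K' would raise TypeError, so build a new string instead:
# 'K' followed by every character of name from index 1 onward
renamed = 'K' + name[1:]
print(renamed)              # Karmen Brown
```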


In [85]:
name = 'Tim'
print(name)
name[0] = 'J'  # raises TypeError: strings are immutable

Tim


TypeError: 'str' object does not support item assignment

In [89]:
# the correct way to "update" a string variable:
# assign a new string to it
name = 'Tim'
print(name)

# not allowed: name[0] = 'J'

name = 'Jim'
print(name)

Tim
Jim


In [94]:
print('Welcome', name, '!')

Welcome Jim !


In [91]:
print('Welcome ' +  name + '!')

Welcome Jim!


## Traversing Strings
Traversing a string means processing it one character at a time. Typically, we start at the beginning, select each character, process it, and repeat until we reach the desired end point.

In [98]:
# assign the string to variable coffee
coffee = input('Enter a word in your mind: ')

# type() tells you the data type of a value
print(f'The data type of the variable is {type(coffee)}.')

num = 0 # starting index number

# use a while loop to access each character in the string
while num < len(coffee): # to stop when we have traversed the entire string
    letter = coffee[num] 
    print(num, letter)
    num += 1

Enter a word in your mind: computer
The data type of the variable is <class 'str'>.
0 c
1 o
2 m
3 p
4 u
5 t
6 e
7 r


**Exercise:** <br>
1. Traverse the string and print the individual characters in it.<br>
2. Ask the user to enter a number and then print the total of the individual digits.  If the user enters 1234, the program would print 10 (i.e. the sum of 1+2+3+4)<br>

In [100]:
coffee = input('Enter a string: ')
for letter in coffee:
    print(letter)

Enter a string: starbucks
s
t
a
r
b
u
c
k
s


In [101]:
# user input is stored as string by default
number = input('Enter a number: ')

# variable for calculating the sum
total = 0

for i in number:
    # show each digit as the loop runs
    print(int(i))
    # sum of the digits
    total += int(i)

print(f'The sum of each digit in {number} is {total}.')

Enter a number: 20220412
2
0
2
2
0
4
1
2
The sum of each digit in 20220412 is 13.


In [None]:
# Write a program 

## String Slicing

Concept: You can use slicing expressions to select a range of characters from a string.

We've learned that a slice is a span of items taken from a sequence. When you take a slice from a string, you get a span of characters from within the string. String slices are also called substrings.

- To get a slice of a string: `string[start : end : step_value]`.
- The character at index `end` is not included: the slice stops just before it.
- If you leave out the `start` index in a slicing expression, Python uses 0 as the starting index. If you leave out `end`, Python uses the length of the string, and if you leave out `step_value`, it defaults to 1.
- You can also use negative numbers as indexes in slicing expressions to reference positions relative to the end of the string.

<img src="http://www.nltk.org/images/string-slicing.png" width="500" height="400">
<font size=-2>
<p  style="text-align:right">&copy; nltk</p>
</font>

In [102]:
full_name = 'Patty Lynn Smith'
middle_name = full_name[6:10]
middle_name

'Lynn'

In [105]:
first_name = full_name[0:5]
first_name

first_name = full_name[:5]
first_name

'Patty'

'Patty'

In [106]:
# what do you think this code will generate?
my_string = full_name[:]
my_string

'Patty Lynn Smith'

In [107]:
my_string = full_name[0 : len(full_name)]
my_string

'Patty Lynn Smith'

In [109]:
# String Slicing - sequence of characters taken from the string
# Format: string[start:end:step]; note that end is up to but not including
# out-of-range slice indexes are handled without errors; see the book chapter for details

# s[0:5]    s[0:5:2]  s[0:]  s[:5]  s[-3:-1]
# s[:2]   +  s[2:] = ?

mp='monty python'
sliced=mp[-12:-7]
print(sliced)

monty


In [110]:
phrase = 'hello, welcome'
phrase[:5] # same as phrase[0:5]

'hello'

In [112]:
phrase[7:14] # to get 'welcome'
phrase[7:]

'welcome'

'welcome'

In [113]:
name = "programming"
print(name[4:])
print(name[:4])

ramming
prog


In [115]:
s = 'hello'
for n in range(5): # 0,1,2,3,4
    sliced = s[:n] + s[n:]  # s[:n] + s[n:] always rebuilds s
    print(sliced)

hello
hello
hello
hello
hello


Slicing expressions can also have a step value, which can cause characters to be skipped in the string.

In [56]:
numbers = '123456789'

# print the odd digits
print(numbers[0:9:2])
# print the even digits
print(numbers[1:9:2])

13579
2468
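The step value can also be negative, in which case the slice moves from right to left; a step of -1 with `start` and `end` omitted returns the string reversed. A short sketch using the same `numbers` string:

```python
numbers = '123456789'

print(numbers[::2])     # 13579      (start and end omitted: whole string, every 2nd character)
print(numbers[::-1])    # 987654321  (negative step: right to left)
print(numbers[7:2:-2])  # 864        (from index 7 down to, but not including, index 2)
```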


Here we use negative numbers as indexes in slicing expressions to reference positions relative to the end of the string.

In [59]:
full_name = 'Patty Lynn Smith'
last_name = full_name[-5:]
last_name

'Smith'

#### Note
Invalid indexes do not cause slicing expressions to raise an exception.

- If the `end` index specifies a position beyond the end of the string, Python will use the
length of the string instead.

- If the `start` index specifies a position before the beginning of the string, Python will
use 0 instead.

- If the `start` index is greater than the end index, the slicing expression will return
an empty string.
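A short sketch of the three rules above, using the `full_name` string from the earlier cells (16 characters, valid indexes 0 through 15). Plain indexing behaves differently: an out-of-range index still raises `IndexError`.

```python
full_name = 'Patty Lynn Smith'

print(full_name[11:100])      # Smith  (end past the end of the string: its length is used)
print(full_name[-100:5])      # Patty  (start before the beginning: 0 is used)
print(repr(full_name[10:5]))  # ''     (start greater than end: empty string)

# Indexing (not slicing) with an out-of-range value still raises IndexError
try:
    full_name[100]
except IndexError:
    print('full_name[100] raises IndexError')
```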

## Testing, Searching, and Manipulating Strings

Concept: Python provides operators and methods for testing strings, searching the
contents of strings, and getting modified copies of strings.

### Testing strings with `in` and `not in`

- In Python, you can use the `in` operator to determine whether one string is contained in another string, e.g. `string1 in string2`.

- `string1` and `string2` can be either string literals or variables referencing strings. The expression returns `True` if `string1` is found in `string2`, and `False` otherwise. The `not in` operator returns the opposite result.

In [6]:
# testing string with 'in' or 'not in' operators

text = 'hello, are you there?'
sliced = 'lo, '

if sliced in text:
    print('sub-string found in text')
else:
    print('sub-string not found in text')

sub-string found in text


In [7]:
text = 'Yuxiao teaching CIS2300 in the Spring'
if 'Spring' in text:
    print('The string "Spring" was found.')
else:
    print('The string "Spring" was not found')

The string "Spring" was found.


In [8]:
if 'Yuxiao' not in text:
    print('The string "Yuxiao" was not found.')
else:
    print('The string "Yuxiao" was found')

The string "Yuxiao" was found


### String methods
A method is a function that belongs to an object and performs some operation on that object. String objects in Python have numerous methods for performing specific operations on the string. The general format is
    <font color = 'red'>stringvar.method(argument) </font><br>

String methods can be used for:
- Testing the values of strings
- Performing various modifications
- Searching for substrings and replacing sequences of characters

See below for some useful examples.<br>

### Testing Methods

`isalnum()`, `isalpha()`, `isdigit()`, `islower()`, `isupper()`, and `isspace()` each return `True` if the string satisfies the method's condition and is at least one character long; otherwise they return `False`.


In [12]:
# isdigit returns True if the string contains only numeric digits
ts = "1234"
ts.isdigit()

ts = "123,"
ts.isdigit()

True

False

In [13]:
# write a function to test whether a string contains only digits
def digit_test(stringvar):
    if stringvar.isdigit():
        print(f'{stringvar} contains only digits.')
    else:
        print(f'{stringvar} contains characters other than digits.')

digit_test(ts)

123, contains characters other than digits.


In [14]:
# isalpha returns true if the string contains only alphabetic letters 
ts = 'YuxiaoLuo'
ts.isalpha()

# isalnum returns True if the string contains only letters and/or digits
ts = 'Yuxiao2022'
ts.isalpha()
ts.isalnum()

True

False

True

In [20]:
# islower returns true if all the alphabetic letters in the string 
# are lowercase, and the string contains at least one alphabetic letter
mp = 'monty python'
test1 = mp.islower()
print(test1, '\n')

# isupper returns true if all the alphabetic letters in the string
# are uppercase, and the string contains at least one alphabetic letter
test2 = mp.isupper()
print(test2, '\n')

mp = 'MONTY PYTHON'
print(mp.isupper(), '\n')

# every letter must be lowercase (or uppercase) for the test to return True
mp = 'MONTY python'
print(mp.islower())
print(mp.isupper())

True 

False 

True 

False
False


In [24]:
# isspace() returns true if the string contains only whitespace characters 
# and is at least one character in length

# whitespace characters include spaces, newlines (\n),
# and tabs (\t)

tv =' '
print(tv.isspace())

tv = '\t\n'
print(tv.isspace())

True
True
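A compact way to compare the six testing methods side by side; the sample strings are arbitrary. Two details worth noticing: `'123'` is neither lowercase nor uppercase because it contains no letters, and every testing method returns `False` for the empty string, since each one requires at least one character.

```python
samples = ['abc123', 'abc', '123', 'ABC', ' \t', '']

# One row per sample: isalnum, isalpha, isdigit, islower, isupper, isspace
for s in samples:
    # repr() shows the quotes, so the empty string and whitespace are visible
    print(repr(s), s.isalnum(), s.isalpha(), s.isdigit(),
          s.islower(), s.isupper(), s.isspace())
```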


In [27]:
# using string methods to perform an operation on a string
# Summing the digits of number

# input value from user
num = input('Enter a number: ')

# initialize the total used to calculate the sum
total = 0 

if num.isdigit():
    for digit in num:
        print(digit, type(digit))
        total += int(digit)
    print('Total is:', total)
else:
    print('Please enter a valid number')

Enter a number: 21392183921839213
2 <class 'str'>
1 <class 'str'>
3 <class 'str'>
9 <class 'str'>
2 <class 'str'>
1 <class 'str'>
8 <class 'str'>
3 <class 'str'>
9 <class 'str'>
2 <class 'str'>
1 <class 'str'>
8 <class 'str'>
3 <class 'str'>
9 <class 'str'>
2 <class 'str'>
1 <class 'str'>
3 <class 'str'>
Total is: 67


### Modification methods

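`lower()` `lstrip()` `rstrip()` `strip()` `upper()`

Each of these methods returns a modified copy of the string; because strings are immutable, the original string is left unchanged.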

In [28]:
# lower() returns a copy of the string with all 
# alphabetic letters lowercase

mp = 'monty PythON'
print(mp.lower())

# upper() returns a copy of the string with all
# alphabetic letters in uppercase
mp = 'monty PYTHON'
print(mp.upper())

monty python
MONTY PYTHON


In [29]:
# lstrip() returns a copy of the string with all leading whitespace 
# characters removed 
mp = ' monty python'
print(mp)
print(mp.lstrip())

mp = '\tmonty python'
print(mp)
print(mp.lstrip())

 monty python
monty python
	monty python
monty python


In [31]:
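# lstrip(chars): chars is a string listing the characters to remove.
# Returns a copy of the string with every leading character that
# appears in chars removed (the order of the characters in chars does not matter)
bp = ' beautifulsoup'
print(bp.lstrip('beau'))   # unchanged: the first character, ' ', is not in 'beau'
print(bp.lstrip(' beau'))  # removes ' ', 'b', 'e', 'a', 'u' and stops at 't'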

 beautifulsoup
tifulsoup


In [32]:
txt = ",,,,,ssaaww.....banana"
txt.lstrip(",.asw")

'banana'

In [34]:
# rstrip() returns a copy of the string with all trailing 
# whitespace characters removed 
bp = ' beautiful soup '
bp.rstrip()

bp = ' beautiful soup \t\n'
bp.rstrip()

' beautiful soup'

' beautiful soup'

In [39]:
# rstrip(chars) returns a copy of the string with every trailing
# character that appears in chars removed
bp = 'beautiful soup'
bp.rstrip('sup')

'beautiful so'

In [42]:
txt = "banana,,,,,ssqqqww....."

x = txt.rstrip(",.sqw")

print(x)

banana


In [44]:
# strip() returns a copy of the string with all 
# leading and trailing whitespaces removed 
bp = '  beautiful,,,...soup., '
bp.strip()

# strip(chars) removes leading and trailing characters that appear in chars
bp = '!!beautiful soup.,'
bp.strip('.,!')

'beautiful,,,...soup.,'

'beautiful soup'
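As the `rstrip('sup')` and `lstrip(' beau')` cells show, the argument to the strip methods is treated as a set of characters, not as a prefix or suffix to match. A short sketch making that explicit; the sample strings are arbitrary. The last line assumes Python 3.9 or later, where `removeprefix()` and `removesuffix()` remove one exact substring instead.

```python
# Any combination of the listed characters is removed from the ends
print('www.example.com'.strip('cmowz.'))  # example
print('mississippi'.rstrip('ip'))          # mississ  (trailing 'i', 'p', 'p', 'i' all removed)

# To remove one exact suffix only (Python 3.9+)
print('mississippi'.removesuffix('pi'))    # mississip
```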

### Searching and replacing
In Python, we can search for substrings, or strings that appear within other
strings.

`replace(old,new)` `endswith(substring)` `find(substring)` `startswith()`

In [47]:
# replace(old, new) returns a copy 
# of the string with all instances of old 
# replaced by new

mp = 'monty python'
newStr = mp.replace('py', 'cy')
print(mp)
print(newStr) 

monty python
monty cython


In [52]:
bp = 'beautiful soup' 
mp = 'monty python'

# endswith(substring) returns true if string ends 
# with substring
print(bp)
print(mp)
bp.endswith('up')  # returns boolean value
mp.endswith('up')

beautiful soup
monty python


True

False

In [53]:
# startswith(substring) returns true if 
# string starts with substring
print(bp)
print(mp)
bp.startswith('beau')  # returns boolean value
mp.startswith('beau')

beautiful soup
monty python


True

False

In [56]:
# find(substring) returns the lowest index at which
# substring is found in the string
print(mp)
mp.find('y')
                  
# method returns -1 if substring is not found
mp.find('soup')

monty python


4

-1

In [57]:
# find() method returns -1 if substring is not found
word = 'apple'
word.find('s')

-1
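`find()` also accepts an optional second argument: the index at which to start searching. A minimal sketch using the `mp` string from above: calling `find()` again from just past the previous match locates later occurrences, and looping until it returns -1 collects all of them. The names `first`, `second`, and `positions` are illustrative.

```python
mp = 'monty python'

first = mp.find('y')              # 4: the first 'y'
second = mp.find('y', first + 1)  # 7: search again starting just after the first match
print(first, second)

# Collect every position of 'o' by repeatedly calling find() with a start index
positions = []
pos = mp.find('o')
while pos != -1:
    positions.append(pos)
    pos = mp.find('o', pos + 1)
print(positions)  # [1, 10]
```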

### Splitting a string
Strings in Python have a method named split that returns a list containing the words in
the string.

- By default (with no argument), the split method uses whitespace (spaces, tabs, and newlines) as the separator; that is, it returns a list of the words in the string.
- You can specify a different separator by passing it as an argument to the split method.

In [58]:
# split() returns a list containing the words 
# in the string
mp = 'monty python'
mp2 = mp.split()
print(mp2)    # returns a list... more on lists in the next module

['monty', 'python']


In [288]:
mp = 'monty python tv show'
mp2 = mp.split  # no parentheses: mp2 is the method itself, not a list of words
print(mp2)
type(mp2)

<built-in method split of str object at 0x00000269C56EEEE0>


builtin_function_or_method

In [285]:
# specify a different separator
mp = 'monty python'
print(mp.split('p'))

['monty ', 'ython']
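To be precise about the default: `split()` with no argument splits on runs of any whitespace and never returns empty strings, whereas `split(' ')` splits at every single space. A short sketch of the difference, plus unpacking the result of splitting on `'/'`; the sample strings are arbitrary.

```python
text = 'monty   python\tflying  circus'

print(text.split())     # ['monty', 'python', 'flying', 'circus']
print(text.split(' '))  # ['monty', '', '', 'python\tflying', '', 'circus']

date = '03/12/2014'
month, day, year = date.split('/')  # unpack the three parts into variables
print(month, day, year)             # 03 12 2014
```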


In [59]:
# example of applying string methods

email = 'Yuxiao.Luo@outlook.com'
# find the position of @
ipos = email.find('@')
print(ipos)

# extract the user name using the identifier @
userid = email[:ipos]
print('User ID:', userid)

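# find() returns the FIRST '.', which here is the dot in the user ID
ipos2 = email.find('.')
print(ipos2)

# extract the domain name using the identifier @
domain = email[(ipos+1):]
print('Domain:', domain)

# potential problems: a dot in the user ID (find('.') above returns 6, not the
# dot in the domain), and a missing @ (find('@') returns -1)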

10
User ID: Yuxiao.Luo
6
Domain: outlook.com
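The comment in the cell above lists two weak spots. A minimal sketch of one way to handle both: reject addresses without exactly one `@`, and search for the dot only in the domain part. The function name `split_email` and its rule (exactly one `@` with text on both sides) are illustrative assumptions, not a complete email validator.

```python
def split_email(email):
    """Return (userid, domain), or None if the address looks invalid.

    Hypothetical helper: it only requires exactly one '@' with at least
    one character on each side.
    """
    if email.count('@') != 1:          # missing or repeated @
        return None
    ipos = email.find('@')
    userid = email[:ipos]
    domain = email[ipos + 1:]
    if userid == '' or domain == '':   # nothing before or after the @
        return None
    return userid, domain


result = split_email('Yuxiao.Luo@outlook.com')
if result is not None:
    userid, domain = result
    # search the domain only, so the dot in the user ID is ignored
    dot = domain.find('.')
    domain_name = domain[:dot] if dot != -1 else domain
    print('User ID:', userid)           # User ID: Yuxiao.Luo
    print('Domain name:', domain_name)  # Domain name: outlook

print(split_email('nk999.outlook.com'))  # None: no @
```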


Exercise: 

Please write a program to ask the user to input an email (for example: nk999@domainname.com) and then separate the userid from the email and print it.  In the example above, the print output would be nk999.

In [61]:
# ask user input for the email address
email = input("Please enter an email address: ")
# specify an identifier
ipos = email.find('@')
# slice the string using the identifier
userid = email[:ipos]
# print out the user id
print(f'The user id is: {userid}')

Please enter an email address: nk999@outlook.com
The user id is: nk999


In [49]:
# Program to print domain name from email. In the above example (nk999@domainname.com), 
#the output should be domainname 

email = 'YuxiaoLuo@outlook.com'
ipos = email.find('@')
domain = email[(ipos+1):]
print(domain)
ipos2 = domain.find('.')
print(ipos2)
domain2 = domain[:ipos2]
print(domain2)

outlook.com
7
outlook


In [93]:
# Input Validation

email = 'nk1000@outlook.com'
ipos = email.find('@')
print(ipos)  # find() returns -1 if '@' is missing
if ipos == -1:
    print('invalid email')
else:
    print('valid')

6
valid


In [290]:
first_name = input('first name: ')
middle_name = input('middle name: ')
last_name = input('last name: ')
print(first_name[0], middle_name[0], last_name[0])

first name: Yuxiao
middle name: None
last name: Luo
Y N L


In [63]:
# quote from IBM Chairman, Thomas Watson, in 1943
quote = "I think there is a world market for maybe five computers."

print("Original quote:")
print(quote)

print("\nIn uppercase:")
print(quote.upper()) # ALL CAPS

print("\nIn lowercase:")
print(quote.lower()) # all lowercase

print("\nAs a title:")
print(quote.title()) # title case

print("\nWith a minor replacement:")
print(quote.replace("five", "millions of")) 

print("\nOriginal quote is still:")
print(quote)


Original quote:
I think there is a world market for maybe five computers.

In uppercase:
I THINK THERE IS A WORLD MARKET FOR MAYBE FIVE COMPUTERS.

In lowercase:
i think there is a world market for maybe five computers.

As a title:
I Think There Is A World Market For Maybe Five Computers.

With a minor replacement:
I think there is a world market for maybe millions of computers.

Original quote is still:
I think there is a world market for maybe five computers.


### Exercises - Strings
1. Initials

Write a program that gets a string containing a person’s first, middle, and last names, and then display their first, middle, and last initials. For example, if the user enters John William Smith the program should display J. W. S.

2. Date printer

Write a program that reads a string from the user containing a date in the form mm/dd/yyyy. It should print the date in the form March 12, 2014.

3. Sentence Capitalizer

Write a program with a function that accepts a string as an argument and returns a copy of the string with the first character of each sentence capitalized. For instance, if the argument is “hello. my name is Joe. what is your name?” the function should return the string “Hello. My name is Joe. What is your name?” The program should let the user enter a string and then pass it to the function. The modified string should be displayed.

4. Word Separator

Write a program that accepts as input a sentence in which all of the words are run together but the first character of each word is uppercase. Convert the sentence to a string in which the words are separated by spaces and only the first word starts with an uppercase letter. For example, the string “StopAndSmellTheRoses.” would be converted to “Stop and smell the roses.”

5. Pig Latin

Write a program that accepts a sentence as input and converts each word to “Pig Latin.” In one version, to convert a word to Pig Latin you remove the first letter and place that letter at the end of the word. Then you append the string “ay” to the word. Here is an example:

English: I SLEPT MOST OF THE NIGHT

Pig Latin: IAY LEPTSAY OSTMAY FOAY HETAY IGHTNAY

6. Caesar Cipher

A "Caesar Cipher is a simple way of encrypting a message by replacing each letter with a letter a certain number of spaces up the alphabet. For example, if shifting the message by 13 an A would become an N, while an S would wrap around to the start of the alphabet to become an F

Write a program that asks the user for a message (a string) and a shift amount (an integer). The values should be passed to a function that accepts the string and the integer as arguments, and returns a string that represents the original message but encrypted, by shifting the letters by the integer.

For example, a string of "BEWARE THE IDES OF MARCH" and an integer of 13 should result in a string of "ORJNER GUR VQRF BS ZNEPU".

## Weekly Assignments & Quizzes

### Weekly Assignment

- Assignment 11: Writing program with strings (Due date: Apr 29th)

### Codelab Quizzes

- Strings(1)
    - Length
    
- Strings(2)
    - Indexing
    - Slicing
    - String methods
    - Finding
    - Conversions
    