# Chp-4: Strings
- Learning Objectives
    - ..
    - ..

## Create Strings
Strings are ordered sequences of characters.
- There are several ways to create strings:
    - Single quotes:  `'text'`
    - Double quotes:  `"text"`
    - Triple single quotes: `'''text'''`
    - Triple double quotes: `"""text"""`
- A space is also a character that can be in a string.
- Triple single and double quotes are used to create strings spanning multiple lines.

- If a single quote is a character in the string, enclosing the string in single quotes causes an error. The inner single quote is read as the closing quote instead of as a character in the string.
    - You can enclose the string in double quotes to avoid this.
    - Alternatively, you can use escape characters.

In [17]:
# Single quotes 
name = 'Mary'

``` python
# ERROR: Double single quotes cannot be used to create strings.
name = ''Mary''
print(name)
```

In [20]:
# Double quotes 
name = "Mary"

In [21]:
# Triple single quotes
name = '''Mary'''

In [22]:
# Triple double quotes
name = """Mary"""

``` python
# ERROR
'Mary's book.'
```

- In the code above, the second single quote ends the creation of the string. 
- The `s book.'` part causes the error because it has only one single quote.
- If you use double quotes, then there will not be any problem.

In [12]:
# no ERROR
sentence = "Mary's book."

``` python
# ERROR
"Mary said "I am here"."
```

- In the code above, the second double quote ends the creation of the string. 
- The `I am here` part causes the error because it is not enclosed by single or double quotes.
- If you use single quotes, then there will not be any problem.

In [13]:
# no ERROR
sentence = 'Mary said "I am here".'

``` python
# ERROR: Single quotes cannot be used with strings spanning more than one line.
name = 'John
     Steinbeck'
print(name)
```

``` python
# ERROR: Double quotes cannot be used with strings spanning more than one line.
name = "John
     Steinbeck"
```

In [19]:
# no ERROR
name = '''John
     Steinbeck'''

In [14]:
# no ERROR
name = """John
     Steinbeck"""
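
- A line break typed inside triple quotes becomes part of the string, together with any leading spaces on the next line.
- The sketch below uses the built-in `repr()` function to make those characters visible; the names `shown` and `same_name` are only for illustration.

```python
# Triple-quoted string: the line break and the indentation are stored in the string.
name = '''John
     Steinbeck'''

# repr() shows the line break as \n and keeps the five leading spaces.
shown = repr(name)
print(shown)

# The same string written on one line, with \n standing for the line break.
same_name = 'John\n     Steinbeck'
print(name == same_name)
```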

## Escape Characters

An escape sequence is a backslash `\` followed by a character. It is used to write special characters in a string.

  - `\'` : Single quote (apostrophe)
  - `\"` : Double quote
  - `\n` : New line
  - `\t` : Tab
  - `\b` : Backspace
  - `\\` : Backslash
  - `\r` : Carriage return

In [16]:
# \' is the character '
print('Mary\'s book.')

Mary's book.


In [15]:
# \" is the character "
print("Mary said \"I am here\".")

Mary said "I am here".


In [23]:
# \n is a new line
print('John\nSteinbeck')

John
Steinbeck


In [25]:
# \t is a tab
# It moves to the next tab stop. Tab stops are typically every 8 columns, but this depends on where the output is displayed.
print('John\tSteinbeck')
print('John'+' '*4+'Steinbeck')

John	Steinbeck
John    Steinbeck


In [27]:
print('Mark\tTwain')
print('Mark'+' '*4+'Twain')

Mark	Twain
Mark    Twain


- `\b` (backspace) moves one character back, so the preceding character is overwritten in the displayed output. The result can depend on where the output is displayed.


In [41]:
print('John\b Steinbeck')

Joh Steinbeck


- `\b\b` (two backspaces) moves two characters back, so the preceding two characters are overwritten in the displayed output.


In [29]:
print('John\b\b Steinbeck')

Jo Steinbeck


In [30]:
# \\ is the \ (backslash) character
print('John\\ Steinbeck')

John\ Steinbeck
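
- The backspace examples above change only what is displayed. As a small check, the sketch below looks at the string itself with `len()` and `repr()`; the variable name `text` is only for illustration.

```python
# The backspace character is stored in the string like any other character.
text = 'John\b Steinbeck'

# 4 letters + backspace + space + 9 letters
print(len(text))

# repr() shows the backspace as \x08, and the letter n is still there.
print(repr(text))
```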


## Raw Strings
- In a raw string, the backslash has no special meaning: escape sequences are not interpreted, and every character is just a character.
- It is created using `r` in front of the string: `r'text'`
- In the example below, `\t`, `\n`, `\'`, `\"` have no special meanings; they are just the characters `\`, `t`, `n`, `'`, `"`.

In [226]:
# \t means two characters, \ and t. 
# \t does not mean a tab in a raw string

text = r'Good \t bye.'
print(text)

Good \t bye.


In [37]:
text = r'Hello.\t my name\n is Tom. I am\' from\" England.'
print(text)

Hello.\t my name\n is Tom. I am\' from\" England.
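
- Raw strings are often used for text that contains many backslashes, such as Windows-style file paths.
- A minimal sketch follows; the path is a made-up example and the variable names are only for illustration.

```python
# Hypothetical path used only for illustration.
normal_path = 'C:\temp\new'    # \t and \n are read as a tab and a new line
raw_path = r'C:\temp\new'      # every backslash is kept as a character

print(len(normal_path))
print(len(raw_path))

# Doubling each backslash in a normal string gives the same text as the raw string.
escaped_path = 'C:\\temp\\new'
print(raw_path == escaped_path)
```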


## f-strings
- An f-string combines fixed text with the values of variables.
- It is in the form of: `f'text {variable} text'`
- Variables are enclosed in curly brackets `{variable}`, called placeholders.
- An f-string generates a new string.
- You can also put expressions inside the curly brackets, such as arithmetic operations or a call to `round()`.
- f-strings are usually easier to read and write than comma-separated values in a `print()` call.

In [45]:
name = 'Tom'
country = 'Spain'
age = 25
weight = 173.6294

- Compare the next two cells to see the advantage of using f-strings.

In [39]:
# using an f-string
print(f'My name is {name}.')

My name is Tom.


In [40]:
# longer way
print('My name is ', name, '.', sep='')

My name is Tom.


In [43]:
# using an f-string

text = f'My name is {name}.'   # text is a string
print(text)

My name is Tom.


In [44]:
# Multiple placeholders
print(f'My name is {name} and I am from {country}. I am {age} years old.')

My name is Tom and I am from Spain. I am 25 years old.


In [42]:
# algebraic operation inside curly brackets
print(f'I will be {age+1} years old next year.')

I will be 26 years old next year.


In [54]:
# rounding by using the round() function

print(f'My weight is {weight}.')
print(f'My rounded weight is {round(weight,2)}.')

My weight is 173.6294.
My rounded weight is 173.63.


In [53]:
# rounding in a different way, with a format specifier

print(f'My weight is {weight}.')
print(f'My rounded weight is {weight:.2f}.') # .2 means two digits after the decimal point, f means fixed-point notation

My weight is 173.6294.
My rounded weight is 173.63.
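
- The part after the colon in a placeholder is a format specifier, and `.2f` is only one of the available forms.
- The sketch below shows three other common specifiers; the values and variable names are made up for illustration.

```python
price = 1234567.891     # example values, not taken from the cells above
ratio = 0.256
name = 'Tom'

with_commas = f'{price:,.2f}'   # thousands separators and two digits after the decimal point
as_percent = f'{ratio:.1%}'     # multiplied by 100, one digit after the decimal point, % sign added
right_aligned = f'{name:>6}'    # padded with spaces on the left to a width of 6

print(with_commas)
print(as_percent)
print(right_aligned)
```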


## Unicode Characters
- Unicode characters include symbols, accented letters, non-Latin characters, and emojis, in addition to the usual letters and digits.
- You can find the list of Unicode characters on the official website of the [Unicode Consortium](https://home.unicode.org/).
- Each Unicode character has a code that is unique to it. The code is written in hexadecimal.
    - If the code has four hexadecimal digits, use
        - `\uXXXX` where XXXX is the code.
    - If the code has more than four hexadecimal digits, pad it with 0 from the left to make the length of the code eight and use:
        - `\UXXXXXXXX` where XXXXXXXX is the 0-padded code. This form also works for shorter codes.

In [58]:
# unicode code is 1F639, you need to add 3 zeros to the left
print('\U0001F639')

😹


In [59]:
# unicode code is 1F602, you need to add 3 zeros to the left
print('\U0001F602')

😂


In [63]:
# unicode code is 2764
print('\u2764')

❤


In [65]:
# unicode code is 2764, you need to add 4 zeros to the left
print('\U00002764')

❤
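
- The built-in functions `ord()` and `chr()` convert between a character and its code, which can help when looking up a character.
- A minimal sketch, with variable names chosen only for illustration:

```python
heart = '\u2764'

code_point = ord(heart)          # integer code of the character
print(hex(code_point))           # the same code written in hexadecimal

laughing = chr(0x1F602)          # character for a given code
print(laughing)

# Both escape forms produce the same one-character string.
print('\u2764' == '\U00002764')
```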


## Operations on Strings
### Concatenation
- The `+` operator is used to concatenate two strings.
- `String + String` combines two strings into a new string.

In [33]:
x = 'John'
y = 'Steinbeck'

In [34]:
# concatenation of x and y

name = x+y
print(name)

JohnSteinbeck


In [35]:
# add a space between x and y
# concatenation of x, space, and y

name = x+' '+y
print(name)

John Steinbeck


### Repetition
- The `*` operator is used to repeat a string a certain number of times.
- `String * Integer` or `Integer * String` produces a new string made of Integer copies of the string.
- Floats cannot be used for repetitions.

In [36]:
# four copies of x

four_johns = x*4
print(four_johns)

JohnJohnJohnJohn


``` python
# ERROR: float * str
# floats cannot be used for repetition
4.3*'Hi'
```
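
- If the number of copies is stored as a float, one possible workaround is to convert it with `int()` first. The sketch below assumes the float holds a whole number; the variable names are only for illustration.

```python
count = 4.0   # a float, for example the result of a division

# Convert the float to an integer before repeating.
repeated = 'Hi' * int(count)
print(repeated)

# Repeating zero or a negative number of times gives an empty string.
print('Hi' * 0 == '')
print('Hi' * -2 == '')
```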

In [82]:
# Triangle with the '$' character using repetitions.
print('$')
print('$'*2)
print('$'*3)
print('$'*4)
print('$'*5)
print('$'*6)
print('$'*7)

$
$$
$$$
$$$$
$$$$$
$$$$$$
$$$$$$$


In [227]:
# Triangle with the '$' and ' ' (space) characters using repetitions.
print(' '*6+'$')
print(' '*5+'$'*2)
print(' '*4+'$'*3)
print(' '*3+'$'*4)
print(' '*2+'$'*5)
print(' '*1+'$'*6)
print(' '*0+'$'*7)    # you do not have to include the space part because it adds no space.

      $
     $$
    $$$
   $$$$
  $$$$$
 $$$$$$
$$$$$$$
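
- Each row of the triangle follows the same pattern, so the rows can also be generated instead of typed one by one.
- The sketch below assumes familiarity with functions and `for` loops, which this section does not cover; `triangle` is a hypothetical helper name.

```python
def triangle(height, symbol='$'):
    """Return the rows of a right-aligned triangle as a list of strings."""
    rows = []
    for row in range(1, height + 1):
        # height - row spaces, followed by row copies of the symbol
        rows.append(' ' * (height - row) + symbol * row)
    return rows

for line in triangle(7):
    print(line)
```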


## Length Function
- The built-in `len()` function returns the number of characters in a string.

In [72]:
# there are five characters in hello
print(len('hello'))

5


In [73]:
# there are six characters in 'hel lo'
# space is a character
print(len('hel lo'))

6


In [74]:
# there are six characters in 'hel\nlo'
# \n is a new line character (single character)
print(len('hel\nlo'))

6


## String Indexing

- Indexing is used to access individual characters or sets of characters.
- Indexing starts with zero. 
- The index of the first character is 0.
- The index of the second character is 1, and so on.
- The index is written in square brackets: `string[index]`.
- Negative numbers can also be used for indexing.
- The index of the last character is -1.
- The index of the second character from the end is -2, and so on.

![](pict/cal_index.png)
- index of `C` is 0 or -10
- index of `first A` is 1 or -9
- index of `second A` is 9 or -1
- **Warning:** There is a character with index -10 (the 'C'), but there is no character with index 10. Indexing starts with 0, so the largest index is 9, which is one less than the length of the string.
    - For any string, there is no character with an index equal to the length of the string.

In [91]:
state = 'CALIFORNIA'

In [101]:
# Access the character at index 0 by using square brackets.
print(state[0])

C


In [100]:
# Access the character at index 6 by using square brackets.
print(state[6])

R


In [99]:
# Access the character at index -1 by using square brackets.
print(state[-1])

A


In [228]:
# Access the character at index -3 by using square brackets.
print(state[-3])

N


- There is an error in the following code because there is no character with index 10.

``` python
# ERROR: out of range
state[10]
```
- The length of `state` is 10, so `state[len(state)]` is the same as `state[10]` and causes the same error. This applies to all strings.

``` python
# ERROR: out of range
state[len(state)]
```

In [229]:
# There is no error in the following code because len(state) - 1 = 9 is the index of the last character
print(state[len(state)-1])

A
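
- The relation between positive and negative indexes, and the out-of-range error, can be checked directly.
- The sketch below uses `try`/`except`, which this section does not cover, only to show the error without stopping the program; the variable names are for illustration.

```python
state = 'CALIFORNIA'
length = len(state)

# A negative index k refers to the same character as the index length + k.
print(state[-3] == state[length - 3])

# Valid indexes run from -length to length - 1.
first = state[-length]
last = state[length - 1]
print(first, last)

# An index outside that range raises IndexError.
try:
    state[length]
except IndexError as error:
    message = str(error)
    print('IndexError:', message)
```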


## String Slices

- You can access more than one character of a string by slicing with index numbers.
- A slice is in the form of `string[start:end]` with inclusive start and exclusive end.
- Use `:` (colon) inside square brackets between the start and end indexes.
- It consists of characters starting with index start up to the character with index end-1.
- The character with index end is not included.
- For example, `string[2:5]` returns characters with indexes 2, 3, 4 (5 is not included).
- For example, `string[-4:-1]` returns characters with indexes -4, -3, -2 (-1 is not included).
- A slice returns a new string (a substring).


![](pict/cal_index2.png)

In [102]:
state = 'CALIFORNIA'

In [104]:
print(state[2:5])  # index=2,3,4

LIF


In [105]:
print(state[-4:-1])  # index=-4,-3,-2

RNI


- `string[:end]`: the default value of start is 0, which means it starts from the very beginning.
    - For example, `string[:5]` returns characters with indexes 0, 1, 2, 3, 4 (5 is not included).
- `string[start:]`: the default value of end is the length, which means it goes all the way to the end.
    - For example, `string[2:]` returns characters with indexes 2, 3, 4, 5, 6, 7, 8, 9 (all characters starting from index 2).
- `string[:]`:  starting from the very beginning and going all the way to the end, representing the whole string.

In [110]:
print(state[2:])  # index = 2,3,4,5,6,7,8,9

LIFORNIA


In [111]:
print(state[:5])  # index = 0,1,2,3,4

CALIF


In [112]:
print(state[:])  # index = all of them = 0,1,2,...,9

CALIFORNIA
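
- Two consequences of the default values are worth checking: the two halves of a split always rebuild the original string, and a slice never raises an out-of-range error.
- A minimal sketch, with variable names chosen only for illustration:

```python
state = 'CALIFORNIA'

# Splitting at any index k and joining the two parts gives back the original.
k = 4
print(state[:k] + state[k:] == state)

# Unlike indexing, slicing does not raise an error for out-of-range values.
tail = state[5:100]
nothing = state[20:30]
print(tail)
print(nothing == '')
```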


- Slicing can also be done by taking steps in the form of: `string[start:end:step]`.
- `string[start:end:step]` starts with the character at index start and stops before the character at index end, as before, but it does not necessarily include all characters between them.
- The first index is start, the second index is start + step, and the third index is start + 2 * step.
- It continues in this way as long as the index has not reached end. For a positive step, the largest index can be at most end - 1.
- `step` can be considered an increment. It can also be a negative number, in which case the indexes decrease and start must be larger than end.
- The default value of step is 1.

In [113]:
print(state)

CALIFORNIA


In [116]:
print(state[2:7:2])  # index = 2,2+2=4, 4+2=6

LFR


In [117]:
print(state[1:8:3])  # index = 1,1+3=4, 4+3=7

AFN


In [118]:
print(state[7:2:-2])  # index = 7,7+(-2)=5,5+(-2)=3 

NOI


In [120]:
print(state[-8:-2:3])  # index = -8, -8+3=-5

LO


In [125]:
# for a negative step, the default value of start is 9 (-1), the last character
# for a negative step, the default end is the beginning of the string, and index 0 (-10) is included
print(state[::-1])  # index = 9,8,...,0

AINROFILAC


In [124]:
print(state[-3::-1])  # index = -3,-4,...,-10

NROFILAC


In [127]:
print(state[:-4:-1])  # index = 9,8,7 or -1,-2,-3

AIN
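
- The indexes selected by `string[start:end:step]` are the same numbers that `range(start, end, step)` generates, which gives a way to check a slice by hand.
- The sketch below assumes familiarity with `range()` and uses the `join()` method covered later in this chapter; the variable names are only for illustration.

```python
state = 'CALIFORNIA'

# state[2:7:2] picks the indexes that range(2, 7, 2) generates: 2, 4, 6
by_slice = state[2:7:2]
by_index = ''.join(state[i] for i in range(2, 7, 2))
print(by_slice, by_index)

# The same holds for a negative step: indexes 7, 5, 3
backward = ''.join(state[i] for i in range(7, 2, -2))
print(backward == state[7:2:-2])

# A common use of [::-1]: check whether a word reads the same in both directions.
word = 'level'
print(word == word[::-1])
```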


## String module
- The `string` module contains constants and functions for processing strings.
- After importing it, use `help(string)` for more explanations.

In [37]:
# constants and functions

import string
print(dir(string))

['Formatter', 'Template', '_ChainMap', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_re', '_sentinel_dict', '_string', 'ascii_letters', 'ascii_lowercase', 'ascii_uppercase', 'capwords', 'digits', 'hexdigits', 'octdigits', 'printable', 'punctuation', 'whitespace']


In [133]:
# lowercase letters
print(string.ascii_lowercase)

abcdefghijklmnopqrstuvwxyz


In [134]:
# lowercase and uppercase letters
print(string.ascii_letters)

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ


In [135]:
# digits
print(string.digits)

0123456789


In [139]:
# punctuation characters
print(string.punctuation)

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
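
- These constants are ordinary strings, so they can be used to classify the characters of another string.
- The sketch below counts digits and punctuation characters in a made-up sample text. It relies on the `in` operator covered later in this chapter and on `sum()` with a generator expression, which this section does not cover.

```python
import string

text = 'Call 555-0199, room 12!'   # made-up sample text

# Each comparison is True (counted as 1) or False (counted as 0).
digit_count = sum(ch in string.digits for ch in text)
punct_count = sum(ch in string.punctuation for ch in text)
print(digit_count, punct_count)
```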


## Immutable
- Strings are immutable, which means they cannot be modified.
- For example, if you try to change the first character of 'CALIFORNIA', you will get an error message.

``` python
# ERROR: tries to change the first character, which has an index of 0.
state = 'CALIFORNIA'
state[0] = 'R'  
```
- You can use the state variable to produce a new string without changing the original one.
- In the following code:
    - The value of the variable *new_state* is the concatenation of the string 'R' and a slice of state starting from the character at index 1 and continuing to the end.

In [38]:
state = 'CALIFORNIA'
new_state = 'R' + state[1:]   

print(new_state)

RALIFORNIA


- In the following code, a new value is assigned to the variable *state*.

In [230]:
# 'CALIFORNIA' is not modified.

state = 'CALIFORNIA'
state = 'R' + state[1:]   

print(state)

RALIFORNIA
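
- One way to see that reassignment creates a new string rather than modifying the old one is to keep a second name for the original.
- A minimal sketch; `original` is a name introduced only for illustration, and `try`/`except` is used just to show the error without stopping the program.

```python
state = 'CALIFORNIA'
original = state            # a second name for the same string

try:
    state[0] = 'R'          # item assignment is not allowed for strings
except TypeError:
    print('Strings cannot be changed in place.')

state = 'R' + state[1:]     # builds a new string and rebinds the name state
print(state)
print(original)             # the first string still exists unchanged
```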


## in and not in
- These operators are used to check if a character or substring is present in a string.
- They return a boolean value: True or False.
- The check is case-sensitive: 'a' and 'A' are different characters.

In [145]:
print( 'a' in 'FLORIDA' )  # 'a' is not in FLORIDA

False


In [146]:
print( 'a' not in 'FLORIDA' )  # 'a' is not in FLORIDA

True


In [145]:
print( 'A' in 'FLORIDA' )  # 'A' is in FLORIDA

True


In [147]:
print( 'A' not in 'FLORIDA' )  # 'A' is in FLORIDA

False
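
- The same operators work with substrings longer than one character, and a case-insensitive check can be sketched by converting the string first.
- The example below uses the `lower()` method covered later in this chapter.

```python
state = 'FLORIDA'

# in also works with substrings longer than one character.
print('LOR' in state)
print('LOD' in state)       # the characters must be adjacent and in order

# Convert to one case first for a case-insensitive check.
print('a' in state.lower())
```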


## String Methods
- String methods do not modify the original string because strings are immutable.
- String methods return a new value.
- If you run `dir(str)`, you will see that there are many string methods.
- We will cover some of them here, but you can check `help(str)` for more details.

### capitalize()
- Returns a copy of the string in which only the first character is uppercase and all other characters are lowercase.

In [153]:
text = 'tOm aNd jerRy.'
print(text.capitalize())   # 't' is capitalized, a new string is produced
print(text)                # no change on text (immutable)

Tom and jerry.
tOm aNd jerRy.


### upper()
- Returns a copy of the string in which all characters are converted to uppercase.


In [154]:
text = 'tOm aNd jerRy.'
print(text.upper())        # all characters are in uppercase
print(text)                # no change on text (immutable)

TOM AND JERRY.
tOm aNd jerRy.


### lower()
- Returns a copy of the string in which all characters are converted to lowercase.


In [155]:
text = 'tOm aNd jerRy.'
print(text.lower())        # all characters are in lowercase
print(text)                # no change on text (immutable)

tom and jerry.
tOm aNd jerRy.


### title()
- Returns a copy of the string in which the first letter of each word is uppercase and the remaining letters are lowercase.

In [181]:
text = 'tOm aNd jerRy.'
print(text.title())        # all words are capitalized
print(text)                # no change on text (immutable)

Tom And Jerry.
tOm aNd jerRy.


### find()
- It returns the index of the first occurrence of a given substring within a string.
- If the substring occurs more than once, the lowest index is returned.
- If the substring is not present, it returns -1.
- Optionally, you can begin the search from a specific index.
    - `find('a', N)`: find the index of the first 'a' at or after index N
    - The default value of N is 0.

In [234]:
state = 'CALIFORNIA'
print(state.find('L')) # index of 'L' is 2

2


In [160]:
# -1 means 'W' does not exist, -1 does not represent an index
print(state.find('W')) 

-1


In [162]:
print(state.find('A')) # index of first 'A'

1


In [163]:
# The index of the first occurrence of 'A' starting from the character at index 3.
print(state.find('A', 3)) 

9


In [174]:
# The string 'FOR' begins from the character at index 4.
print(state.find('FOR')) 

4


In [176]:
print(state.find('WE')) # 'WE' does not exist in CALIFORNIA

-1


### rfind()
- It returns the highest index in the string where the substring is found, or -1 if the substring is not present.

In [165]:
print(state.find('A'))   # index of first 'A'
print(state.rfind('A'))  # index of last 'A'

1
9
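
- `find()` and `rfind()` return only one index. Using the optional start argument of `find()`, every occurrence can be collected.
- The sketch below assumes familiarity with functions, lists, and `while` loops; `find_all` is a hypothetical helper, not a built-in string method.

```python
def find_all(text, sub):
    """Return the indexes of every non-overlapping occurrence of sub in text."""
    indexes = []
    step = max(len(sub), 1)          # always move forward by at least one character
    position = text.find(sub)
    while position != -1:
        indexes.append(position)
        # continue searching just past the occurrence that was found
        position = text.find(sub, position + step)
    return indexes

state = 'CALIFORNIA'
print(find_all(state, 'A'))
print(find_all(state, 'I'))
print(find_all(state, 'W'))
```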


### strip(), rstrip(), lstrip()
- `strip()`: Removes whitespace (spaces, tabs, new lines) from the beginning and end of a string.
- `rstrip()`: Removes whitespace from the end of a string.
- `lstrip()`: Removes whitespace from the beginning of a string.

In [173]:
country = '  FLORIDA   '
print(country)          
print('---'+country.strip()+'---')      # white spaces on the left and right are removed
print('---'+country.rstrip()+'---')     # white spaces on the right are removed
print('---'+country.lstrip()+'---')     # white spaces on the left  are removed

  FLORIDA   
---FLORIDA---
---  FLORIDA---
---FLORIDA   ---
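
- These methods also accept an optional argument listing the characters to remove instead of whitespace.
- A minimal sketch with made-up sample strings:

```python
line = '\t  FLORIDA \n'
cleaned = line.strip()            # tabs and new lines are whitespace too
print(repr(cleaned))

# With an argument, the given characters are removed instead of whitespace.
decorated = '***FLORIDA***'
print(decorated.strip('*'))
print(decorated.lstrip('*'))

# Characters in the middle of the string are never removed.
middle = '  FLO RIDA  '.strip()
print('--' + middle + '--')
```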


### startswith()
- It returns True if the string starts with the specified prefix; otherwise, it returns False.

In [179]:
print(state.startswith('H')) # 'CALIFORNIA' does not start with 'H'

False


In [233]:
print(state.startswith('C')) # 'CALIFORNIA' starts with 'C'

True
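
- The prefix can be longer than one character, several prefixes can be given as a tuple, and the companion method `endswith()` checks the end of the string.
- A minimal sketch; the file name is a made-up example.

```python
state = 'CALIFORNIA'
filename = 'report.csv'      # made-up file name

print(filename.endswith('.csv'))          # endswith() checks the end of the string

# A tuple of prefixes: True if the string starts with any of them.
print(state.startswith(('CA', 'NY')))

# The check is case-sensitive.
print(state.startswith('ca'))
```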


### count()
- Returns the number of non-overlapping occurrences of a substring within a string.

In [182]:
print(state.count('A'))   # number of 'A's in 'CALIFORNIA'

2


In [183]:
print(state.count('I'))   # number of 'I's in 'CALIFORNIA'

2


In [184]:
print(state.count('C'))   # number of 'C's in 'CALIFORNIA'

1


In [185]:
print(state.count('W'))   # number of 'W's in 'CALIFORNIA'

0
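
- "Non-overlapping" matters when the substring can match at positions that share characters.
- A minimal sketch with a made-up string:

```python
text = 'aaaa'
# 'aa' matches at indexes 0-1 and 2-3; the match at 1-2 overlaps the first one and is not counted.
print(text.count('aa'))

state = 'CALIFORNIA'
print(state.count('IA'))     # count() also works with longer substrings
print(state.count('a'))      # case-sensitive: there is no lowercase 'a'
```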


### isdigit()
- Returns True if all characters in the string are digits and the string is not empty, False otherwise.

In [198]:
print('hello'.isdigit())         # not all characters are digits

False


In [199]:
print('123456'.isdigit())        # all characters are digits

True


In [200]:
print('h1234'.isdigit())         # not all characters are digits

False


### isalpha()
- Returns True if all characters in the string are alphabetic and the string is not empty, False otherwise.

In [201]:
print('hello'.isalpha())         # all characters are alphabetic

True


In [202]:
print('123456'.isalpha())        # not all characters are alphabetic

False


In [203]:
print('h1234'.isalpha())         # not all characters are alphabetic

False
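
- A few edge cases are easy to get wrong: the empty string, signs, decimal points, and spaces all make these methods return False.
- The sketch below also shows one possible use, converting text to a number only when it is made of digits. The value of `entry` is made up, and the conditional expression is not covered in this section.

```python
print(''.isdigit())             # the empty string has no digits
print('12.5'.isdigit())         # '.' is not a digit
print('-5'.isdigit())           # '-' is not a digit
print('hello world'.isalpha())  # the space is not a letter

# A possible use: convert to int only when the text is made of digits.
entry = '42'                    # made-up input
number = int(entry) if entry.isdigit() else None
print(number)
```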


### replace()
- Returns a copy of the string with all occurrences of the old substring replaced by the new one.
- It is in the form of `replace(old, new)`.


In [205]:
print(state)
print(state.replace('A', 'W')) # replace 'A' by 'W'

CALIFORNIA
CWLIFORNIW


In [206]:
print(state)
print(state.replace('T', 'W')) # no 'T' to replace by 'W'

CALIFORNIA
CALIFORNIA


In [236]:
print(state)
print(state.replace('LI', '***')) # replace 'LI' by '***'

CALIFORNIA
CA***FORNIA
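
- `replace()` also accepts an optional third argument that limits how many occurrences are replaced, and calls can be chained because each one returns a new string.
- A minimal sketch, with variable names chosen only for illustration:

```python
state = 'CALIFORNIA'

first_only = state.replace('A', 'W', 1)   # the third argument limits the number of replacements
print(first_only)

removed = state.replace('A', '')          # replacing with '' removes the substring
print(removed)

chained = state.replace('A', 'W').replace('I', '!')   # each call returns a new string
print(chained)
```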


### swapcase()
- Returns a copy of the string in which uppercase characters are converted to lowercase and lowercase characters to uppercase.

In [237]:
name = 'aRThUr'
print(name)
print(name.swapcase()) # 'a' becomes 'A', 'R' becomes 'r', and so on

aRThUr
ArtHuR


### join()
- Concatenates a list (or another sequence) of strings.
- Inserts the string whose method is called between the given strings.
- Returns the result as a new string.
- Example: `'--'.join(['ab', 'pq', 'rs'])` returns `'ab--pq--rs'`
  

In [211]:
print('--'.join(['ab', 'pq', 'rs']))

ab--pq--rs
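
- `join()` only accepts strings, so other values need to be converted with `str()` first.
- A minimal sketch with made-up lists; the generator expression inside `join()` is not covered in this section.

```python
words = ['John', 'Steinbeck']
full_name = ' '.join(words)
print(full_name)

# join() needs strings, so the numbers are converted with str() first.
numbers = [1, 2, 3]
joined = ', '.join(str(n) for n in numbers)
print(joined)

# An empty separator joins the pieces with nothing between them.
letters = ''.join(['C', 'A', 'L'])
print(letters)
```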


## Parsing Strings

- By using string methods, you can analyze a string and extract meaningful information about the string.
- You can also perform specific operations based on the structure and content of the string.
- Example:
    - From the given message below, extract the company name and capitalize it.
    - The company name is between the characters `@` and `.`
    - We can use the `find()` method to find the indexes of these two characters.
    - There are multiple `.` characters, so we need to find the first one that comes after `@`.


In [215]:
message = 'Hello. My name is Tom. I live in California. My email address is tom@tesla.com. I will be in NY next week.'

In [214]:
index_at = message.find('@')                    # index of @
index_period = message.find('.', index_at)      # index of first . after @ 


- To grab the company name, we need to use slicing.
- Slicing must start from `index_at + 1` because if you start from index_at, the slice will include @.
- Slicing must end at `index_period` because the end index is not included.

In [216]:
print(message[index_at+1:index_period].capitalize())

Tesla
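
- The steps above can be collected into a reusable function that also handles messages without an `@`.
- This is only a sketch: `extract_company` is a hypothetical helper name, and returning `None` for a missing `@` or `.` is one possible design choice.

```python
def extract_company(message):
    """Return the capitalized text between '@' and the next '.', or None if not found."""
    index_at = message.find('@')
    if index_at == -1:
        return None                      # no '@' in the message
    index_period = message.find('.', index_at)
    if index_period == -1:
        return None                      # no '.' after the '@'
    return message[index_at + 1:index_period].capitalize()

print(extract_company('My email address is tom@tesla.com.'))
print(extract_company('No email address here.'))
```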
