# Text Sequence Type — str

- **(Main Source)** Python Docs: [Text Sequence Type — str](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)
    - [String methods](https://docs.python.org/3/library/stdtypes.html#string-methods)
    - [Text Processing Services](https://docs.python.org/3/library/text.html#textservices)
    - [Common string operations](https://docs.python.org/3/library/string.html)

## Character Encoding

**Character encoding** is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers.

- **ASCII** codes represent text in digital devices
- ASCII has just `128` code points, of which only `95` are printable characters (English-only)
- The set of available punctuation had significant impact on the syntax of computer languages and text markup
- **ANSI** code pages extend ASCII with further characters from `128` to `255`, which differ by language
- **Unicode** has over a **million code points**, but the first `128` of these are the same as ASCII
    - Languages like Arabic are included in the Unicode code points

Figure below: if we try to save Arabic text (Unicode) using a legacy encoding such as ASCII or ANSI, we get a warning, and the Arabic characters are lost if we continue.

<img src="../assets/save_ascii_arabic.png" title="Save Popup in Notepad" width="400" />

The warning reads: "This file contains characters in Unicode format which will be lost if you save this file as an ANSI encoded text file. To keep the Unicode information, click Cancel below and then select one of the Unicode options from the Encoding drop down list. Continue?"
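The same loss can be reproduced in Python. The sketch below uses an illustrative Arabic word (an assumption, not the text in the screenshot) and shows that UTF-8 keeps the text intact while ASCII cannot encode it.

```python
# Illustrative sample text (an assumption, not the text in the screenshot)
text = "سلام"

# UTF-8 can represent every Unicode code point, so the round trip is lossless
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text

# ASCII cannot represent Arabic letters, so a strict encode raises an error
try:
    text.encode("ascii")
except UnicodeEncodeError as error:
    print("Cannot encode as ASCII:", error.reason)

# errors="replace" substitutes "?" for each character that cannot be encoded
lossy = text.encode("ascii", errors="replace")
print(lossy)
```

Once the characters have been replaced with `?`, the original text cannot be recovered from the bytes.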

In [172]:
# ASCII (English-only) characters are represented by numbers between 0 and 127
print(ord("A"), ord("Z"))
print(ord("a"), ord("z"))

65 90
97 122


In [3]:
# Arabic Unicode points are between 1536 and 1791
print(ord("أ"), ord("ب"), ord("ي"), ord('َ'), ord('ُ'))

1571 1576 1610 1614 1615


See [Wikipedia: Arabic script in Unicode](https://en.wikipedia.org/wiki/Arabic_script_in_Unicode) for details.
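`chr()` is the inverse of `ord()`: it turns a code point back into a one-character string. A small sketch, using a few sample characters chosen for illustration:

```python
# ord() maps a character to its code point; chr() maps it back
for ch in ["A", "z", "ب"]:
    code_point = ord(ch)
    assert chr(code_point) == ch
    print(ch, code_point, hex(code_point))

# The main Arabic block spans U+0600 to U+06FF (1536 to 1791 in decimal)
in_arabic_block = 0x0600 <= ord("ب") <= 0x06FF
print(in_arabic_block)
```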

In [164]:
import string

string.ascii_letters

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [None]:
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

### `string.printable`

String of ASCII characters which are considered printable. This is a combination of [`digits`](https://docs.python.org/3/library/string.html#string.digits), [`ascii_letters`](https://docs.python.org/3/library/string.html#string.ascii_letters), [`punctuation`](https://docs.python.org/3/library/string.html#string.punctuation), and [`whitespace`](https://docs.python.org/3/library/string.html#string.whitespace).

In [None]:
string.printable

'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
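The description above can be checked directly. This short sketch rebuilds `string.printable` from its four parts (the name `combined` is only illustrative):

```python
import string

# Concatenate the four constants in the order used by string.printable
combined = (
    string.digits
    + string.ascii_letters
    + string.punctuation
    + string.whitespace
)

print(string.printable == combined)
print(len(string.printable))
```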

In [11]:
# Tab character: "\t"
print('A\tB')

A	B


In [15]:
# Newline character: '\n'
print('A\nB')

A
B


### Carriage Return `'\r'` Character

The "carriage return" character has that name because it refers to a command to return the (print) carriage to the beginning of the line.

<img src="../assets/type-writer.png" width="300">

> You'd start a new paragraph by feeding in the paper and then - with your left hand - shoving the carriage (the part on top that has the paper) all the way to the right so the keys will be hitting the spot on the far left first. Then as you typed, the carriage would advance one space at a time. When it got all the way to the right (usually it went "ding!"), you'd have to push that carriage back again, and if you didn't also hit the line-feed lever, you'd start typing over the same line. So the line-feed lever is right there, mounted in the same spot you'd use to push the carriage back anyway, and you could combine both motions. -- [Answer by **nico** on english.stackexchange.com](https://english.stackexchange.com/a/36082)

In [227]:
print('A', end='\r')
print('B')

B


In [226]:
import time
# progress bar using the Carriage Return "\r" character
for x in range(10 + 1):
    time.sleep(0.20)
    print(f'[{x}/10] ' + '===' * x + '>', end='\r')



**Common use cases for strings:**

- Strings can be used to represent text, such as:
    - names
    - addresses
    - messages

- Textual data in Python is handled with [`str`](https://docs.python.org/3/library/stdtypes.html#str) objects, or *strings*.
- Strings are immutable [sequences](https://docs.python.org/3/library/stdtypes.html#typesseq) of Unicode code points.
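Immutable means a string cannot be changed in place; any "modification" creates a new string. A minimal sketch with illustrative variable names:

```python
name = "adam"

# Assigning to a position in a string is not allowed
try:
    name[0] = "A"
except TypeError as error:
    print("Strings cannot be changed in place:", error)

# Build a new string instead; the original is left untouched
fixed = "A" + name[1:]
print(fixed, name)
```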

Python has great support for strings:

In [237]:
name = 'Adam' # String literals can use single quotes
address = "Riyadh, Saudi Arabia" # or double quotes; it does not matter

In [238]:
# Triple quoted strings may span multiple lines.
# All associated whitespace will be included in the string literal.
message = """Hello everyone,
I hope you are enjoying the course,

Thank you.
"""

Note: there is no separate “character” type

In [239]:
type('a')

str

Length of a string:

In [242]:
phone = "00966555555555"
print(len(phone))
print(len([15, 20, 10]))

14
3


**Note**: the length of a string is the number of characters in the string, including spaces and punctuation.

In [39]:
len(address + '\n\t ') == len(address) + 3

True
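More precisely, `len` counts Unicode code points, so an Arabic letter followed by a diacritic counts as two. The sketch below spells the characters with `\u` escapes (the code points match the `ord` values shown earlier) so the content of each string is unambiguous:

```python
plain = "\u0628"             # the letter ب (code point 1576)
with_fatha = "\u0628\u064e"  # the same letter followed by a fatha (code point 1614)

print(len(plain))
print(len(with_fatha))
```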

Repeating strings

In [4]:
s = "Salam " * 3
print(s)

Salam Salam Salam 


In [41]:
zeros = "0" * 8
x = "1" + zeros
type(x), x

(str, '100000000')

In [43]:
x + x

'100000000100000000'

#### Exercise

- find the length of the variable `phone`
- find the length of the variable `message`

### Membership Operator: `in`

The `in` operator is used to check if a value is present in a sequence (`str`, `list`, `range`, etc.).

In [6]:
vowels = "aeiou"
print("a" in vowels)

True


In [59]:
# gives the same result for a single character,
# since both str and list are sequences in Python
vowels = ["a", "e", "i", "o", "u"]
print("a" in vowels)

True
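One difference is worth noting: on a string, `in` checks for a substring, while on a list it checks for an equal element. A short sketch (the variable names are illustrative):

```python
vowels_str = "aeiou"
vowels_list = ["a", "e", "i", "o", "u"]

# "ae" is a substring of the string, but not an element of the list
in_str = "ae" in vowels_str
in_list = "ae" in vowels_list
print(in_str, in_list)

# `not in` is the negated form
print("b" not in vowels_str)
```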


In Python, strings are **objects**.

- Objects have **attributes** that can be **accessed** with the `.` operator.
- Objects have **methods** that can be **called** using the `.` operator and the `()` parentheses.
- ... more on objects later.

In [244]:
print("hello".upper())
print("HeLLO".lower())

HELLO
hello


In [245]:
name = "john doe"

print(name.capitalize())
print(name.title())

John doe
John Doe


In [246]:
help(str.title)
# str.title? # in Jupyter

Help on method_descriptor:

title(self, /)
    Return a version of the string where each word is titlecased.
    
    More specifically, words start with uppercased characters and all remaining
    cased characters have lower case.



In [247]:
# Check case
print(name.islower())
print(name.isupper())

True
False


In [252]:
# Count occurrences
name = "john doe"

print(name.count('o'))
print(name.find('o')) # index of the first occurrence
print(name.replace('o', 'w'))

2
1
jwhn dwe


In [98]:
# 3rd argument is `count`
# Maximum number of occurrences to replace.
# -1 (the default value) means replace all occurrences
print(name.replace('o', 'w', 1))

jwhn doe
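When the substring is missing, `find` returns `-1`, whereas the related `index` method raises an error. A brief sketch of both behaviours:

```python
name = "john doe"

# find() returns -1 when the substring is not present
missing = name.find("z")
print(missing)

# index() behaves like find() but raises ValueError instead
try:
    name.index("z")
except ValueError as error:
    print("index() failed:", error)

# rfind() searches from the right and returns the last occurrence
last_o = name.rfind("o")
print(last_o)
```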


### Whitespace Characters

In [None]:
string.whitespace

' \t\n\r\x0b\x0c'

In [235]:
# a string with leading and trailing whitespace (tab, spaces, newlines)
text = '\t hello    world \n\n\n'
print(text)

	 hello    world 





In [233]:
# strip() removes leading and trailing whitespace,
# but not whitespace in the middle of the string
print(text.strip())

hello    world
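Related methods strip only one side, or strip characters other than whitespace. A small sketch (the `"***title***"` string is just an illustrative value):

```python
text = '\t hello    world \n\n\n'

left_stripped = text.lstrip()    # remove leading whitespace only
right_stripped = text.rstrip()   # remove trailing whitespace only
print(repr(left_stripped))
print(repr(right_stripped))

# strip() also accepts the set of characters to remove
trimmed = "***title***".strip("*")
print(trimmed)

# To collapse whitespace in the middle, split on whitespace and re-join
collapsed = " ".join(text.split())
print(collapsed)
```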


In [114]:
# Split
print("Hello, world".split()) # default separator is any run of whitespace

['Hello,', 'world']


In [51]:
print("Hello, world".split("l"))

['He', '', 'o, wor', 'd']


In [118]:
# `maxsplit` argument: Maximum number of splits to do.
# -1 (the default value) means no limit.
print("Hello, world".split("l", 1)) # 1 means split only once

['He', 'lo, world']
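Calling `split()` with no argument is not the same as `split(" ")`: the first treats any run of whitespace as one separator, the second splits on every single space. A sketch with an illustrative string:

```python
spaced = "a  b\tc"

# No argument: runs of whitespace (spaces, tabs, newlines) count as one separator
by_whitespace = spaced.split()
print(by_whitespace)

# Explicit " ": every single space splits, so empty strings can appear
by_space = spaced.split(" ")
print(by_space)

# rsplit() with maxsplit splits from the right
from_right = "a,b,c".rsplit(",", 1)
print(from_right)
```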


See: [Splitlines](https://docs.python.org/3/library/stdtypes.html#str.splitlines)

In [52]:
# multi-line string
text = '''
Hello
World

How are you?
'''

In [53]:
# displays the string as is (showing whitespace characters)
text

'\nHello\nWorld\n\nHow are you?\n'

In [54]:
# follows control characters and prints visible characters
print(text)


Hello
World

How are you?



In [132]:
text.splitlines()

['', 'Hello', 'World', '', 'How are you?']
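For comparison, splitting the same text on `'\n'` leaves an extra empty string at the end, and `splitlines(keepends=True)` keeps the line endings. A short sketch:

```python
text = '\nHello\nWorld\n\nHow are you?\n'

# split('\n') produces a trailing empty string after the final newline
with_split = text.split('\n')
print(with_split)

# splitlines() does not, and keepends=True preserves the newline characters
with_ends = text.splitlines(keepends=True)
print(with_ends)
```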

In [57]:
# Join
names = ["Adam", "Belal", "Camal"]
separator = ','
print(separator.join(names))

Adam,Belal,Camal


In [67]:
''.join(names)

'AdamBelalCamal'

In [66]:
' + '.join(names)

'Adam + Belal + Camal'
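`join` only accepts strings, so other values need converting first. The sketch below (with an illustrative list of numbers) also shows that `split` reverses `join`:

```python
numbers = [1, 2, 3]

# Joining non-string items raises TypeError
try:
    ",".join(numbers)
except TypeError as error:
    print("join() needs strings:", error)

# Convert each item to str first
joined = ",".join(str(n) for n in numbers)
print(joined)

# split() with the same separator gives back the pieces (as strings)
pieces = joined.split(",")
print(pieces)
```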

In [74]:
# String formatting

name = "John"

print("Name: {name}")
print(f"Name: {name}")

Name: {name}
Name: John


In [20]:
# Alignment
name = "john doe"
print(name.ljust(15))
print(name.center(15))

john doe       
    john doe   


In [112]:
# Padding
print(f'{100:10}')
print(f'{1000:10}')
print(f'{10000:10}')

num = 100
print(f'{num:5}')

       100
      1000
     10000
  100
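Numbers are right-aligned by default, while strings are left-aligned. The format specification also accepts an explicit alignment (`<`, `>`, `^`) and a fill character. A minimal sketch with illustrative values:

```python
name = "John"

left = f"{name:<10}"      # left-aligned (the default for strings)
right = f"{name:>10}"     # right-aligned (the default for numbers)
centered = f"{name:^10}"  # centered
starred = f"{name:*^10}"  # centered, filled with '*'
zero_padded = f"{42:05}"  # number padded with leading zeros

for value in (left, right, centered, starred, zero_padded):
    print(repr(value))
```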


In [114]:
# Check start/end
name = 'mr. john.jpg'
print(name.startswith('mr. '))
print(name.endswith('jpg'))

True
True


A fun way to decorate a string using the `center` method:

In [118]:
name = 'John'
width = 20
decorator = '*'

print(decorator * width)
print(name.center(width, decorator))
print(decorator * width)

********************
********John********
********************


#### Exercise

- Change the above code to print your `name` in all uppercase
- Change the `width`
- Change the `decorator` to some other character like `#`

## Indexing and Slicing

- A string is a sequence of characters
- Sequences can be indexed using `[]`
    - 1st element is at index `0`
    - 2nd element is at index `1`
    - last element is at index `-1`

In [120]:
title = "Pythonista"

<img src="../assets/pythonista.png">

In [121]:
print(title[2:]) # thonista

thonista


In [124]:
print(title[0] == 'P')
print(title[-1] == title[9] == 'a')
# title[10] # IndexError: string index out of range

True
True


In [161]:
# With a negative step, an omitted (None) start means the last character,
# so this walks backwards from 'a' and stops before index -4: 'ats'
# print(title[None:-4:-1])

### Slicing

Sequences can also be sliced using `[start:end]`

<img src="../assets/pythonista.png">

In [125]:
title[0] == title[0:1] == 'P'

True

<img src="../assets/pythonista.png">

In [None]:
print(title[2:5]) # indices 2, 3, and 4 (the end index is excluded)
print(title[:5])
print(title[-4:])
print(title[-4:None])

tho
Pytho
ista
ista
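A few properties of slices are worth checking by hand: the end index is excluded, two adjacent slices rebuild the original string, and slices that run past the end do not raise an error. A short sketch:

```python
title = "Pythonista"

# The end index is excluded, so the length of a slice is end - start
assert len(title[2:5]) == 5 - 2

# Slicing at the same position splits the string without losing anything
assert title[:4] + title[4:] == title

# Unlike indexing, slicing past the end does not raise IndexError
past_end = title[5:100]
empty = title[100:]
print(repr(past_end), repr(empty))
```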


#### Exercise

Ex: Given that `name = "Johnson"`, what is the value of `name[0]`? `name[1]`? `name[-1]`? `name[-2]`?

In [None]:
# try it

#### Exercise

Ex: try `name[1:3]` and `name[3:5]`

In [31]:
# try it

### We can also add a step to slicing `[start:end:step]`

In [127]:
s = "ABCDEF"

In [134]:
s[0:len(s):1] == s[0:None:1] == s[::] # default values

True

In [139]:
s[::-1]

'FEDCBA'

We can also omit the `start` or `end` of the slice, which would implicitly mean the beginning or end of the string:

In [34]:
s[0::2]

'ACE'

In [145]:
s = "ABCDEF"
# With a negative step, an omitted (None) start means the last character;
# the end index 0 is excluded, so 'A' is left out
s[None:0:-1]

'FEDCB'

In [36]:
s[-1:0:-2]

'FDB'
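To see how Python fills in omitted values, the built-in `slice` object can be used. Its `indices(length)` method returns the concrete `(start, end, step)` for a sequence of that length. A sketch using the same string:

```python
s = "ABCDEF"

# s[a:b:c] is shorthand for s[slice(a, b, c)]
backwards = slice(None, 0, -1)
assert s[backwards] == s[None:0:-1] == "FEDCB"

# With a negative step, the default start is the last index (5)
print(backwards.indices(len(s)))

# Defaults for a full reversed slice and for a forward slice with step 2
reversed_defaults = slice(None, None, -1).indices(len(s))
forward_defaults = slice(None, None, 2).indices(len(s))
print(reversed_defaults, forward_defaults)
```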


Ex: run and try to understand the following code

```python
name = "Johnson"
print(name[::2])
print(name[::-1])
print(name[1:5:2])
```

In [37]:
# try it


Ex: For each of the following, specify the `start`, `end`, and `step`:

- `name[::2]`
- `name[::-1]`
- `name[1:5:2]`

Answer: ...

#### Exercise


Ex: Write a program that takes a string and prints the string in reverse.


In [38]:
# try it


Ex: Write a program that takes a string and prints every other character in the string. Example: `abcdef` -> `bdf`


In [39]:
# try it

Ex: Write a program that takes a string and prints it in reverse order, but only every other character, converted to uppercase. Example: `abcdef` -> `ECA`

In [40]:
# try it

Ex: Count the number of `o` in the string `hello world`. Hint: use the `.count()` method.

In [41]:
# try it

## String formatting

**(Main)** docs reference: [`printf`-style String Formatting](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)

There are 3 common ways to build a string from several values in Python:

1. Joining individual strings with the `+` operator
2. The `format` string method
3. f-strings

In [148]:
# The same string built with +, str.format(), and an f-string
name = "John"
age = 30

x1 = "My name is " + name + " and my age is " + str(age)
x2 = "My name is {} and my age is {}".format(name, age)
x3 = f"My name is {name} and my age is {age}"
print(x1 == x2 == x3)
print(x1)

True
My name is John and my age is 30
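The `str(age)` call matters: `+` does not convert numbers automatically, while `format` and f-strings do. The sketch below also shows positional and keyword fields in `format`, and an expression inside an f-string:

```python
name = "John"
age = 30

# + only joins strings; mixing str and int raises TypeError
try:
    "My age is " + age
except TypeError as error:
    print("Cannot concatenate:", error)

# format() fields can be numbered (and reused) or named
positional = "{0} is {1}; {0} likes Python".format(name, age)
keyword = "{name} is {age}".format(name=name, age=age)

# f-strings can contain expressions, not just variable names
expression = f"Next year {name} will be {age + 1}"

print(positional)
print(keyword)
print(expression)
```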


#### Exercise

- Concatenate the strings `first_name` and `last_name` using the `+` operator
- Concatenate the strings `first_name` and `last_name` using the `format` method
- Concatenate the strings `first_name` and `last_name` using f-strings

In [43]:
# try it

Ex: Use an f-string to print `Hello, my name is John Doe and I am 30 years old.`, using the variables `first_name`, `last_name`, and `age`.

In [44]:
# try it

Check out [PyFormat](https://pyformat.info/) for more.

## Numbers formatting

In [219]:
# Thousands separator for integers
assert 1_000_000 == 1000000

big_num = 1_000_000 # underscores in numeric literals are ignored (syntactic sugar)
print(f'{big_num}')
print(f'{big_num:,}')

1000000
1,000,000


In [221]:
# Scientific Notation
assert 1e-4 == 0.0001

small_num = 1e-4
print(f"{small_num:.2e}")

1.00e-04


In [159]:
# Number of decimal places
num = 10.5689
print(f'{num}')
print(f'{num:.4f}')
print(f'{num:.2f}')
print(f'{num:.0f}')

10.5689
10.5689
10.57
11
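Formatting only changes how the number is displayed: the result is a string, and the original value is untouched. The sketch below contrasts this with `round`, and shows how exact halves are rounded to the nearest even digit:

```python
num = 10.5689

as_text = f"{num:.2f}"     # a str, for display
as_number = round(num, 2)  # a float, for further calculations
print(type(as_text).__name__, as_text)
print(type(as_number).__name__, as_number)

# Exact ties go to the nearest even digit (0.5 -> 0, 1.5 -> 2, 2.5 -> 2)
half_cases = [f"{value:.0f}" for value in (0.5, 1.5, 2.5)]
print(half_cases)
```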


In [182]:
# Format currency (USD)
price = 2978.95
print(f"${price:,.2f}")

$2,978.95
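A few more format specifiers follow the same pattern: percentages, zero padding, and other number bases. A short sketch with illustrative values:

```python
ratio = 0.256
count = 7
byte = 255

percent = f"{ratio:.1%}"  # multiply by 100 and append '%'
padded = f"{count:03d}"   # pad with zeros to width 3
hexa = f"{byte:x}"        # hexadecimal
binary = f"{byte:b}"      # binary
prefixed = f"{byte:#x}"   # hexadecimal with the 0x prefix

print(percent, padded, hexa, binary, prefixed)
```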


Check out [PyFormat](https://pyformat.info/) for more.