# Recap Text/Strings in Python 

##  String Methods


In [1]:
'hello'.upper()

'HELLO'

In [4]:
'HELLo'.isupper()

False

In [5]:
'HELLo'.lower()

'hello'

In [8]:
'hello'.endswith('lo')

True

In [9]:
'hello world'.capitalize()

'Hello world'

In [10]:
'  BLA   '.strip()

'BLA'

In [13]:
'hello world'.count('l')

3

In [14]:
'hello world'.count('l', 0, 3)

1

In [16]:
my_splits = 'hello world'.split()
type(my_splits)

list

In [17]:
'hello world'.split('l')

['he', '', 'o wor', 'd']

In [19]:
'hello4'.isalpha()

False

In [20]:
'hello4'.isalnum()

True

In [24]:
'Hello World'.replace('l', 'ß')


'Heßßo Worßd'

In [25]:
'Hello World'.replace('l', 'ß', 1)


'Heßlo World'

In [27]:
'hello'.find('x')

-1

Use the `find()` method only if you need to know the **position** of the substring. To check whether a substring occurs at all, use the `in` operator, for example `'Py' in 'Python'`.

In [29]:
'l' in 'Hello'

True

In [28]:
'hello'.index('x')

ValueError: substring not found
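The three lookups above differ mainly in what they return when nothing is found. A minimal side-by-side sketch (the variable names `text`, `has_l`, `pos_found`, `pos_missing` and `index_failed` are only illustrative):

```python
text = 'hello'

# Membership test: returns a bool, no position information.
has_l = 'l' in text

# find() returns the lowest index of the match, or -1 when there is none.
pos_found = text.find('l')
pos_missing = text.find('x')

# index() behaves like find() but raises ValueError instead of returning -1.
try:
    text.index('x')
    index_failed = False
except ValueError:
    index_failed = True

print(has_l, pos_found, pos_missing, index_failed)  # True 2 -1 True
```

Because `-1` is also a valid index (the last character), the result of `find()` should be compared against `-1` before it is used for indexing.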

- joining strings of a list:

In [33]:
';'.join(['Hello', 'world', 'again'])

'Hello;world;again'

- the separator can be longer than one character:

In [37]:
';&'.join(['Bob', 'Doe', 'walks'])

'Bob;&Doe;&walks'
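`join()` only accepts strings, so a list that mixes in numbers has to be converted first. A small sketch, assuming a made-up list called `values`:

```python
values = ['Bob', 42, 3.5]

# join() raises TypeError as soon as it meets a non-string item.
try:
    ';'.join(values)
    join_failed = False
except TypeError:
    join_failed = True

# Converting every item with str() first makes the join work.
joined = ';'.join(str(v) for v in values)

print(join_failed, joined)  # True Bob;42;3.5
```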

In [38]:
'Hello worldß'.casefold()

'hello worldss'
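`casefold()` is mostly useful for case-insensitive comparisons, where `lower()` can miss characters such as `ß`. A short sketch with two example words chosen for illustration:

```python
a = 'Straße'
b = 'STRASSE'

# lower() keeps the ß, so the two strings still differ.
same_with_lower = a.lower() == b.lower()

# casefold() maps ß to 'ss', so the comparison succeeds.
same_with_casefold = a.casefold() == b.casefold()

print(same_with_lower, same_with_casefold)  # False True
```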

### Concatenating

- `+` operator

In [41]:
'Hello' + ' ' + 'World'


'Hello World'

- format

In [40]:
"The sum of 1 + 2 is {0}".format(1+2)

'The sum of 1 + 2 is 3'

- f-strings

In [46]:
word1 = 'hello'
word2 = 'world'
f"{word1.capitalize()} {word2.capitalize()}"  # formatted string literal
f"{word1} {word2}"
word1 + ' ' + word2  # only the value of the last expression is displayed

'hello world'

In [47]:
print(f"{word1.capitalize()} {word2.capitalize()}")

Hello World


- The `%` operator inserts values into a string according to a format specification.
- Its left operand is a string containing one or more placeholders (e.g. `%s`, `%d`, `%f`); its right operand is the value, or a tuple of values, to insert.
- `%s` formats strings, `%d` integers, and `%f` floats.




In [50]:
name = 'John'
age = 30
print("My name is %s and I am %d years old." % (name, age))
'%s' % 'BLA'

My name is John and I am 30 years old.


'BLA'
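The `%f` placeholder also accepts a precision, and the same precision syntax works with `format()` and f-strings. A brief comparison, using an arbitrary example number stored in `value`:

```python
value = 3.14159

# Two decimal places in each of the three formatting styles.
old_style = "%.2f" % value
with_format = "{:.2f}".format(value)
with_fstring = f"{value:.2f}"

print(old_style, with_format, with_fstring)  # 3.14 3.14 3.14
```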

### Indexing and Slicing

In [51]:
my_string = 'Hello, World'

In [53]:
my_string[1]

'e'

In [54]:
my_string[5]

','

In [55]:
my_string[6]

' '

**Slicing:**
- Slice syntax extracts a portion of a string by specifying a range of indices. The character at `end` is not included, and `step` defaults to 1.

> string[start:end:step]

In [57]:
my_string

'Hello, World'

In [56]:
my_string[0:5]

'Hello'

In [59]:
my_string[1:6:1]

'ello,'

In [63]:
my_string[::3]

'Hl r'

In [64]:
my_string[7:]

'World'

In [66]:
my_string[7:12]

'World'

In [70]:
my_string[7::2]

'Wrd'

In [73]:
my_string[6:0:-1]

' ,olle'

In [75]:
my_string[-7::-1]

',olleH'

In [76]:
my_string[6::-1]

' ,olleH'

In [77]:
my_string[-12]

'H'

In [78]:
my_string[::-1]

'dlroW ,olleH'
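A few details of the `start:end:step` rule are easier to see in code. The sketch below uses a local name `s` for the same example text and the built-in `slice` object, which is what the bracket syntax creates behind the scenes:

```python
s = 'Hello, World'

# A slice expression is shorthand for indexing with a slice object.
same = s[7::2] == s[slice(7, None, 2)]

# Out-of-range slice bounds are clipped instead of raising IndexError.
tail = s[7:100]
empty = s[100:]

# slice.indices() reports the concrete (start, stop, step) for a given length.
reverse_bounds = slice(None, None, -1).indices(len(s))

print(same, tail, repr(empty), reverse_bounds)  # True World '' (11, -1, -1)
```

Note that plain indexing is stricter than slicing: `s[100]` raises `IndexError`, while `s[100:]` returns an empty string.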


### String Immutability

- Strings are immutable objects, which means that the contents of the string cannot be modified in-place.


In [79]:
my_string[0] = 'B'

TypeError: 'str' object does not support item assignment

- Instead, any operation that appears to modify a string creates a new string object in memory.
- Because strings are immutable, Python can safely let two names with the same string value refer to the same object in memory.
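Since item assignment fails, a changed string has to be assembled as a new object. Two common ways are sketched here (the names `changed` and `replaced` are only illustrative):

```python
my_string = 'Hello, World'

# Item assignment is not allowed, so build a new string from slices.
changed = 'B' + my_string[1:]

# replace() also returns a new string and leaves the original untouched.
replaced = my_string.replace('H', 'B', 1)

print(changed)    # Bello, World
print(replaced)   # Bello, World
print(my_string)  # Hello, World
```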

In [110]:
str1 = 'Hello'
id(str1)

140178140067120

In [109]:
str2 = 'Hello'
id(str2)

140178140067120

In [111]:
str3 = 'Hello2'
id(str3)

140178140144752

In [112]:
str4 = 'Hello2' 
id(str4)

140178140144752

In [97]:
str3 = str1 + '0'
id(str3)
#str3

140177775874224

- When the string literals above are created, CPython checks whether an identical string constant already exists and, if so, reuses it (this is called interning).
- In that case no new block of memory is allocated for the second string; the second name is simply bound to the existing object.
- That is why `str1` and `str2` report the same `id()`.
- Sharing is safe only because strings are immutable: neither name can change the object that the other one refers to.
- Interning is an implementation detail that applies mainly to literals and identifier-like strings. Strings built at runtime, such as `str1 + '0'`, usually get their own object, so compare strings with `==` rather than `is`.
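The sharing of identical strings (often called interning) is a CPython optimization rather than a language guarantee, so it should not be relied on for comparisons. A hedged sketch using the standard-library function `sys.intern()`; the name `built` is made up for the example:

```python
import sys

str1 = 'Hello'

# Built at runtime, so it is normally a separate object with an equal value.
built = ''.join(['Hel', 'lo'])

print(built == str1)  # True: the values are equal
print(built is str1)  # identity; the result depends on the interpreter

# sys.intern() returns the canonical shared object for a given value.
shared = sys.intern(built)
print(shared is sys.intern(str1))  # True
```

In everyday code, `==` is the right way to compare strings; `is` only answers whether two names refer to the same object.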


### Encoding

> String encoding refers to the process of converting a sequence of Unicode characters into a series of bytes that can be stored or transmitted over a network.

- This is necessary because computers and other electronic devices can only understand binary data (i.e., 0s and 1s)
- There are different encoding schemes; each maps Unicode characters to a specific sequence of bytes.
- Some popular encoding schemes include ASCII, UTF-8, and UTF-16.

ASCII (American Standard Code for Information Interchange)

- It is a 7-bit code that defines 128 characters. Each character is normally stored in a single byte (8 bits), and a byte can take 256 different values.

In [113]:
2 ** 8

256

- ASCII covers only unaccented English letters, digits, punctuation, and control characters; it cannot represent characters such as `è` or `ß`.
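The ASCII limitation shows up as soon as a non-English character is encoded. A minimal sketch, using the `errors` parameter of `str.encode()` to show two fallback behaviours (the variable names are illustrative):

```python
word = 'caf\u00e8'  # 'cafè'

# Strict encoding (the default) fails on the accented character.
try:
    word.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# errors='replace' substitutes '?', errors='ignore' drops the character.
replaced = word.encode('ascii', errors='replace')
ignored = word.encode('ascii', errors='ignore')

print(ascii_ok, replaced, ignored)  # False b'caf?' b'caf'
```

Both fallbacks lose information, which is one reason UTF-8 is usually the safer default.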

UTF-8 (Unicode Transformation Format 8-bit) is a variable-length encoding scheme that can represent any Unicode character using one to four bytes.
- It is the most widely used encoding scheme on the web and supports characters from all major languages, including Chinese, Arabic, and Hebrew.

UTF-16 (Unicode Transformation Format 16-bit) is a variable-length encoding scheme that represents each character using either 2 or 4 bytes (one or two 16-bit code units).
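To see how many bytes each scheme actually uses, the encoded length of a few sample characters can be measured. This sketch uses the `'utf-16-le'` codec so that no byte order mark is added to the count; the sample characters are chosen only for illustration:

```python
# 'A', 'è', '€' and an emoji, written as escapes to be unambiguous.
samples = ['A', '\u00e8', '\u20ac', '\U0001F600']

# Map each character to (bytes in UTF-8, bytes in UTF-16).
sizes = {
    ch: (len(ch.encode('utf-8')), len(ch.encode('utf-16-le')))
    for ch in samples
}

for ch, (utf8_len, utf16_len) in sizes.items():
    print(repr(ch), utf8_len, utf16_len)
# 'A' 1 2
# 'è' 2 2
# '€' 3 2
# '😀' 4 4
```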


In [114]:
string = "Hello, World!"

encode_string = string.encode('utf-8')
encode_string

b'Hello, World!'

In [121]:
my_bytes = 'cafè'.encode('utf-8')
len(my_bytes)
my_string = my_bytes.decode('utf-8')
len(my_string)

4
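The cell above only displays its last expression. Printing each step shows that the encoded form is one byte longer than the text, and that decoding with a different codec than the one used for encoding silently produces wrong characters. A short sketch (`round_trip` and `wrong` are illustrative names):

```python
my_bytes = 'caf\u00e8'.encode('utf-8')  # 'cafè'

print(my_bytes)       # b'caf\xc3\xa8'
print(len(my_bytes))  # 5: the è takes two bytes in UTF-8

# Decoding with the same codec restores the original 4-character string.
round_trip = my_bytes.decode('utf-8')
print(round_trip, len(round_trip))  # cafè 4

# Decoding with another codec does not fail here, but garbles the text.
wrong = my_bytes.decode('latin-1')
print(wrong)  # cafÃ¨
```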