![image.png](attachment:image.png)

# Sequence Types. Strings

## 0. Defining Strings

First, note that strings are immutable sequences: once a string is defined, it cannot be modified. The easiest way to define one is with single, double, or triple quotation marks.

In [16]:
# String with single quotes
first = 'a simple string using single quotes'

# String with double quotes
second = "a simple string using double quotes"

# Triple quotation admits multiline strings
third = '''This is a multiline string
that spans several lines
of text.'''

## 1. Sequence operations
Because strings are immutable sequences, they support the common sequence operations. The following table lists these operations in ascending order of priority:

![sequence_types.png](attachment:sequence_types.png)

In [17]:
# Check if "h" is in "hello"
print("h" in "hello")  # True

# Check if "h" is NOT in "hello"
print("h" not in "hello")  # False

# Concatenate strings
print("Whats" + "App")  # "WhatsApp"

# Repeat a string 4 times
print("Ciao! " * 4)  # "Ciao! Ciao! Ciao! Ciao! "

# Access the first character of first
print(f"First character of first str: {first[0]}")  # "a"

# Get the first 7 characters of 'third'
print(f"First 7 characters of third str: {third[0:7]}")  # "This is"

# Get the characters with a stride of 2
print(f"Characters with a stride of 2 in third str: {third[0:7:2]}")  # "Ti s"

# Get the length of first
print(f"Length of first str: {len(first)}")  # 35

# Get the character with the smallest Unicode value in first
print(f"Smallest Unicode character in first str: {min(first)}")  # " "

# Get the character with the largest Unicode value in first
print(f"Largest Unicode character in first str: {max(first)}")  # "u"

# Find the index of the first occurrence of "s" in first
print(f"Index of first occurrence of 's' in 'first': {first.index('s')}")  # 2

# Count how many times "s" appears in second
print(f"Count of 's' in second var: {second.count('s')}")  # 4

True
False
WhatsApp
Ciao! Ciao! Ciao! Ciao! 
First character of first str: a
First 7 characters of third str: This is
Characters with a stride of 2 in third str: Ti s
Length of first str: 35
Smallest Unicode character in first str:  
Largest Unicode character in first str: u
Index of first occurrence of 's' in 'first': 2
Count of 's' in second var: 4


### 1.1 Indexing

String indexing allows you to access individual characters in a string. Python uses zero-based indexing, which means the first character is at position 0.

- Positive indices start from the beginning (0, 1, 2, ...)
- Negative indices start from the end (-1, -2, -3, ...)

In [18]:
# Let's create a sample string
sample = "Python"

# Accessing characters by index
print(f"First character: {sample[0]}")    # P
print(f"Second character: {sample[1]}")   # y
print(f"Third character: {sample[2]}")    # t

# Negative indices count from the end of the string
print(f"Last character: {sample[-1]}")    # n
print(f"Second to last: {sample[-2]}")    # o

First character: P
Second character: y
Third character: t
Last character: n
Second to last: o


### 1.2 Slicing

In Python, the slicing syntax is `string[start:stop:step]`. You can use:
- `start`: the index where the slice starts.
- `stop`: the index where the slice ends (not inclusive).
- `step`: the interval between characters. A negative step reverses the order.

Below, you'll see examples using all three arguments.

In [19]:
# Using the sample string defined earlier
sample = "Python"

# Reverse the entire string using a negative step
reversed_str = sample[::-1]
print("Reversed string:", reversed_str)

# Get every second character from the original string
every_second = sample[::2]
print("Every second character:", every_second)

# Reverse the string and take every second character
reversed_every_second = sample[::-2]
print("Reversed every second character:", reversed_every_second)

# Advanced slicing: slice from index 4 down to index 1, stepping backwards.
# Note that the stop index (0) is not included, so the character at index 0 ('P') won't be printed.
custom_slice = sample[4:0:-1]
print("Custom slice (indices 4 to 1, reversed):", custom_slice)

Reversed string: nohtyP
Every second character: Pto
Reversed every second character: nhy
Custom slice (indices 4 to 1, reversed): ohty
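
The slicing rules above have two consequences that are easy to miss: omitted bounds default to the ends of the string, and out-of-range slice bounds are clamped rather than rejected, whereas a single out-of-range index raises `IndexError`. A minimal sketch (the variable names are illustrative only):

```python
sample = "Python"

# Omitted start/stop default to the ends of the string
head = sample[:2]      # first two characters
tail = sample[2:]      # everything from index 2 onwards
last_three = sample[-3:]
print(head, tail, last_three)  # Py thon hon

# Out-of-range slice bounds are clamped to the string length
clamped = sample[2:100]
empty = sample[10:]
print(clamped, repr(empty))  # thon ''

# Indexing a single out-of-range position raises IndexError
try:
    sample[10]
    index_error_raised = False
except IndexError as e:
    index_error_raised = True
    print("IndexError:", e)
```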


## 2. String Methods

This section covers various string methods in Python:

- **2.1 Case Manipulation:** Methods like `upper()`, `lower()`, `title()`, etc.
- **2.2 Finding and Replacing:** Using methods like `find()`, `replace()`, etc.
- **2.3 Splitting and Joining:** Using `split()` and `join()` to convert between strings and lists.
- **2.4 Stripping Whitespace:** Methods like `strip()`, `lstrip()`, and `rstrip()` to remove extra spaces.
- **2.5 Testing String Properties:** Methods like `isalpha()`, `isdigit()`, etc., for checking the content type of strings.

### 2.1 Case Manipulation Methods

Below are some methods for manipulating the case of strings:

- **capitalize()**: Capitalizes the first letter and makes all other characters lowercase.
- **casefold()**: Converts the string to lowercase in a more aggressive way for case-insensitive comparisons.
- **lower()**: Converts all characters in the string to lowercase.
- **swapcase()**: Swaps uppercase letters to lowercase and vice versa.
- **title()**: Converts the string to title case, where each word starts with an uppercase letter followed by lowercase letters.
- **upper()**: Converts all characters in the string to uppercase.

In [20]:
# Sample text for demonstration
sample_text = "heLLo WOrld"

print("Original:", sample_text)
print("capitalize():", sample_text.capitalize())   # Hello world
print("casefold():", sample_text.casefold())         # hello world (more aggressive than lower())
print("lower():", sample_text.lower())               # hello world
print("swapcase():", sample_text.swapcase())         # HEllO woRLD
print("title():", sample_text.title())               # Hello World
print("upper():", sample_text.upper())               # HELLO WORLD

Original: heLLo WOrld
capitalize(): Hello world
casefold(): hello world
lower(): hello world
swapcase(): HEllO woRLD
title(): Hello World
upper(): HELLO WORLD
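
For plain ASCII text, `casefold()` and `lower()` give the same result, so the "more aggressive" behaviour is not visible in the example above. The following sketch uses the German letter `ß` (an illustrative choice, not taken from the original example) to show where the two differ:

```python
german = "Straße"

lowered = german.lower()      # 'ß' is already lowercase, so it is kept
folded = german.casefold()    # 'ß' is folded to 'ss'
print(lowered)  # straße
print(folded)   # strasse

# Case-insensitive comparison against an all-caps spelling
same_with_lower = german.lower() == "STRASSE".lower()
same_with_casefold = german.casefold() == "STRASSE".casefold()
print(same_with_lower)     # False
print(same_with_casefold)  # True
```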


### 2.2 Finding and Replacing

In this section, we will explore methods to search for substrings and replace parts of strings:

- **find()**: Searches for a substring from the beginning and returns the lowest index or -1 if not found.
- **index()**: Like find() but raises `ValueError` if the substring is not found.
- **rfind()**: Searches for a substring from the end and returns the highest index or -1 if not found.
- **rindex()**: Like rfind() but raises `ValueError` if the substring is not found.
- **replace()**: Returns a new string with occurrences of a substring replaced by another substring; an optional count limits the number of replacements.

In [21]:
# Sample sentence for demonstration
sentence = "Python is great, and Python is fun!"

# Using find() to locate the first occurrence of "Python"
first_index_find = sentence.find("Python")
print("Using find(), first occurrence of 'Python':", first_index_find)

# Using index() to locate the first occurrence of "Python"
first_index_index = sentence.index("Python")
print("Using index(), first occurrence of 'Python':", first_index_index)

# Using rfind() to locate the last occurrence of "Python"
last_index_rfind = sentence.rfind("Python")
print("Using rfind(), last occurrence of 'Python':", last_index_rfind)

# Using rindex() to locate the last occurrence of "Python"
last_index_rindex = sentence.rindex("Python")
print("Using rindex(), last occurrence of 'Python':", last_index_rindex)

# Demonstrate what happens when a substring is not found:
# Using find() returns -1 if substring not found
not_found_find = sentence.find("Java")
print("Using find() for 'Java' (not found):", not_found_find)

# Using index() for a non-existent substring will raise an error.
try:
    sentence.index("Java")
except ValueError as e:
    print("Using index() for 'Java' raised an error:", e)

# Using replace() to replace "Python" with "JavaScript"
replaced_sentence = sentence.replace("Python", "JavaScript")
print("After replacement:", replaced_sentence)

# Demonstrate replacing only the first occurrence with a count parameter
replaced_once = sentence.replace("Python", "JavaScript", 1)
print("After replacing first occurrence:", replaced_once)

Using find(), first occurrence of 'Python': 0
Using index(), first occurrence of 'Python': 0
Using rfind(), last occurrence of 'Python': 21
Using rindex(), last occurrence of 'Python': 21
Using find() for 'Java' (not found): -1
Using index() for 'Java' raised an error: substring not found
After replacement: JavaScript is great, and JavaScript is fun!
After replacing first occurrence: JavaScript is great, and Python is fun!
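
Because `find()` signals "not found" with -1, using its result directly as a condition is a common pitfall: a match at index 0 is falsy and -1 is truthy. The sketch below illustrates this and shows the `in` operator as the simpler membership test:

```python
sentence = "Python is great, and Python is fun!"

# Pitfall: a match at the very start returns 0, which is falsy
start_match_truthy = bool(sentence.find("Python"))
print(start_match_truthy)  # False, even though "Python" is present

# ...while "not found" returns -1, which is truthy
missing_truthy = bool(sentence.find("Java"))
print(missing_truthy)  # True, even though "Java" is absent

# For a yes/no answer, use the `in` operator
has_python = "Python" in sentence
has_java = "Java" in sentence
print(has_python, has_java)  # True False

# When the position is needed, compare explicitly with -1
pos = sentence.find("great")
if pos != -1:
    print("'great' starts at index", pos)  # 10
```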


### 2.3 Splitting and Joining

In this section, we will explore various methods for splitting strings into parts and joining them back together:

- **split()**: Splits a string into a list using a specified separator (by default, whitespace).
- **rsplit()**: Splits a string from the right-hand side.
- **splitlines()**: Splits a string at line breaks and returns a list of lines.
- **partition()**: Splits the string at the first occurrence of the separator and returns a tuple (before, separator, after).
- **rpartition()**: Splits the string at the last occurrence of the separator.
- **join()**: Joins the elements of an iterable into a single string using a specified separator.

In [22]:
# Example string for splitting using split() and rsplit()
text = "one,two,three,four"
print("Using split() with comma separator:", text.split(','))  # Splits at every comma

# Using rsplit() with a maxsplit parameter to limit splits from the right
print("Using rsplit() with comma separator and maxsplit=2:", text.rsplit(',', 2))

# Example for splitlines(): a multi-line string
multiline_text = "Line 1\nLine 2\nLine 3"
print("Using splitlines():", multiline_text.splitlines())

# Using partition(): splits at the first occurrence of 'is'
sentence = "Python is great, and Python is fun!"
print("Using partition() on 'is':", sentence.partition("is"))
# The output is a tuple: (part before, separator, part after)

# Using rpartition(): splits at the last occurrence of 'is'
print("Using rpartition() on 'is':", sentence.rpartition("is"))

# Using join(): join a list of strings into one string with a specified separator
fruits = ['apple', 'banana', 'cherry']
print("Using join() with ', ' as separator:", ", ".join(fruits))

Using split() with comma separator: ['one', 'two', 'three', 'four']
Using rsplit() with comma separator and maxsplit=2: ['one,two', 'three', 'four']
Using splitlines(): ['Line 1', 'Line 2', 'Line 3']
Using partition() on 'is': ('Python ', 'is', ' great, and Python is fun!')
Using rpartition() on 'is': ('Python is great, and Python ', 'is', ' fun!')
Using join() with ', ' as separator: apple, banana, cherry
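
Two details of `split()` and `join()` are worth a quick sketch: calling `split()` with no argument collapses runs of whitespace, while an explicit `" "` separator does not, and `join()` only accepts strings, so other values must be converted first. The sample data here is made up for illustration:

```python
spaced = "  one   two  three "

# No argument: split on runs of whitespace and drop empty strings
default_split = spaced.split()
print(default_split)  # ['one', 'two', 'three']

# Explicit separator: every single space is a split point
explicit_split = spaced.split(" ")
print(explicit_split)  # ['', '', 'one', '', '', 'two', '', 'three', '']

# join() raises TypeError when the iterable contains non-strings
numbers = [1, 2, 3]
try:
    "-".join(numbers)
    join_failed = False
except TypeError as e:
    join_failed = True
    print("TypeError:", e)

# Convert each element to str first
joined_numbers = "-".join(str(n) for n in numbers)
print(joined_numbers)  # 1-2-3
```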


### 2.4 Stripping Whitespace

This section demonstrates methods to remove whitespace from strings:

- **strip()**: Removes whitespace from both ends of the string.
- **lstrip()**: Removes whitespace from the beginning (left) of the string.
- **rstrip()**: Removes whitespace from the end (right) of the string.

In [23]:
# Sample string with extra whitespace
messy_text = "   Hello, World!   "

# Remove whitespace from both ends
stripped_text = messy_text.strip()
print("strip():", repr(stripped_text))

# Remove whitespace only from the left side
lstripped_text = messy_text.lstrip()
print("lstrip():", repr(lstripped_text))

# Remove whitespace only from the right side
rstripped_text = messy_text.rstrip()
print("rstrip():", repr(rstripped_text))

strip(): 'Hello, World!'
lstrip(): 'Hello, World!   '
rstrip(): '   Hello, World!'
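
The three methods also accept an optional argument naming the characters to remove. That argument is treated as a set of characters rather than an exact prefix or suffix, which the following sketch illustrates (the file name is a made-up example):

```python
# With no argument, all leading/trailing whitespace is removed, including tabs and newlines
padded = "\t  Hello, World! \n"
stripped = padded.strip()
print(repr(stripped))  # 'Hello, World!'

# With an argument, any of the given characters are removed from the ends
decorated = "--==title==--"
title = decorated.strip("-=")
print(title)  # title

# Pitfall: ".txt" means the characters '.', 't', 'x', not the suffix ".txt"
filename = "report.txt"
over_stripped = filename.rstrip(".txt")
print(over_stripped)  # repor

# To drop an exact suffix, check for it and slice it off
suffix = ".txt"
stem = filename[:-len(suffix)] if filename.endswith(suffix) else filename
print(stem)  # report
```

On Python 3.9 and later, `str.removeprefix()` and `str.removesuffix()` handle the exact-match case directly.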


### 2.5 Testing String Properties

This section demonstrates various methods to check certain properties of strings:

- **isalnum()**: Returns True if all characters in the string are alphanumeric.
- **isalpha()**: Returns True if all characters in the string are alphabetic.
- **isascii()**: Returns True if all characters in the string are ASCII.
- **isdecimal()**: Returns True if all characters in the string are decimal characters.
- **isdigit()**: Returns True if all characters in the string are digits.
- **isidentifier()**: Returns True if the string is a valid Python identifier.
- **islower()**: Returns True if all cased characters in the string are lowercase.
- **isnumeric()**: Returns True if all characters in the string are numeric.
- **isprintable()**: Returns True if all characters in the string are printable.
- **isspace()**: Returns True if the string contains only whitespace.
- **istitle()**: Returns True if the string is in title case.
- **isupper()**: Returns True if all cased characters in the string are uppercase.

In [24]:
# isalnum(): True if all characters are alphanumeric (letters & numbers), False otherwise.
print("'Hello123'.isalnum():", "Hello123".isalnum())
print("'Hello 123'.isalnum():", "Hello 123".isalnum())  # Contains space

# isalpha(): True if all characters are alphabetic.
print("'Hello'.isalpha():", "Hello".isalpha())
print("'Hello123'.isalpha():", "Hello123".isalpha())

# isascii(): True if all characters are ASCII.
print("'Hello'.isascii():", "Hello".isascii())
print("'Héllo'.isascii():", "Héllo".isascii())  # 'é' is not ASCII

# isdecimal(): True if all characters are decimals.
print("'12345'.isdecimal():", "12345".isdecimal())
print("'123.45'.isdecimal():", "123.45".isdecimal())  # The decimal point is not a decimal character

# isdigit(): True if all characters are digits.
print("'12345'.isdigit():", "12345".isdigit())
print("'²'.isdigit():", "²".isdigit())  # Superscript 2

# isidentifier(): True if the string is a valid Python identifier.
print("'variable_name'.isidentifier():", "variable_name".isidentifier())
print("'123variable'.isidentifier():", "123variable".isidentifier())

# islower(): True if all cased characters are lowercase.
print("'hello'.islower():", "hello".islower())
print("'Hello'.islower():", "Hello".islower())

# isnumeric(): True if all characters are numeric.
print("'12345'.isnumeric():", "12345".isnumeric())
print("'123.45'.isnumeric():", "123.45".isnumeric())

# isprintable(): True if all characters are printable.
print("'Hello, World!'.isprintable():", "Hello, World!".isprintable())
print("'Hello\nWorld'.isprintable():", "Hello\nWorld".isprintable())

# isspace(): True if the string contains only whitespace.
print("'    '.isspace():", "    ".isspace())
print("' a '.isspace():", " a ".isspace())

# istitle(): True if the string is title cased.
print("'Hello World'.istitle():", "Hello World".istitle())
print("'hello World'.istitle():", "hello World".istitle())

# isupper(): True if all cased characters are uppercase.
print("'HELLO'.isupper():", "HELLO".isupper())
print("'Hello'.isupper():", "Hello".isupper())

'Hello123'.isalnum(): True
'Hello 123'.isalnum(): False
'Hello'.isalpha(): True
'Hello123'.isalpha(): False
'Hello'.isascii(): True
'Héllo'.isascii(): False
'12345'.isdecimal(): True
'123.45'.isdecimal(): False
'12345'.isdigit(): True
'²'.isdigit(): True
'variable_name'.isidentifier(): True
'123variable'.isidentifier(): False
'hello'.islower(): True
'Hello'.islower(): False
'12345'.isnumeric(): True
'123.45'.isnumeric(): False
'Hello, World!'.isprintable(): True
'Hello
World'.isprintable(): False
'    '.isspace(): True
' a '.isspace(): False
'Hello World'.istitle(): True
'hello World'.istitle(): False
'HELLO'.isupper(): True
'Hello'.isupper(): False
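
The descriptions of `isdecimal()`, `isdigit()`, and `isnumeric()` sound almost identical, so a side-by-side comparison may help. Roughly, each method accepts a superset of the previous one. The sample characters below are chosen for illustration:

```python
# A plain digit, a superscript two, a vulgar fraction, and an Arabic-Indic digit three
samples = ["5", "²", "½", "٣"]

# Map each character to (isdecimal, isdigit, isnumeric)
results = {ch: (ch.isdecimal(), ch.isdigit(), ch.isnumeric()) for ch in samples}

for ch, (dec, dig, num) in results.items():
    print(f"{ch!r}: isdecimal={dec}, isdigit={dig}, isnumeric={num}")

# None of the three accept a sign or a decimal point
negative_is_digit = "-5".isdigit()
print(negative_is_digit)  # False
```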


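**String constants.** The `string` module defines constants for the common ASCII character classes, shown below. Note that the `str.is*()` methods above rely on Unicode character properties rather than on these constants, so they also accept many non-ASCII characters (for example, `"é".isalpha()` is `True`).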

In [25]:
import string

# String constants
print("ASCII letters:", string.ascii_letters)  # 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
print("ASCII lowercase:", string.ascii_lowercase)  # 'abcdefghijklmnopqrstuvwxyz'
print("ASCII uppercase:", string.ascii_uppercase)  # 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
print("Digits:", string.digits)  # '0123456789'
print("Hexadecimal digits:", string.hexdigits)  # '0123456789abcdefABCDEF'
print("Octal digits:", string.octdigits)  # '01234567'
print("Punctuation:", string.punctuation)  # '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
print("Printable characters:", string.printable)  # Digits, letters, punctuation, and whitespace
print("Whitespace characters:", string.whitespace)  # ' \t\n\r\x0b\x0c'

ASCII letters: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
ASCII lowercase: abcdefghijklmnopqrstuvwxyz
ASCII uppercase: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Digits: 0123456789
Hexadecimal digits: 0123456789abcdefABCDEF
Octal digits: 01234567
Punctuation: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
Printable characters: 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ 	

Whitespace characters:  	
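
The constants in the `string` module cover ASCII only, whereas methods such as `isalpha()` follow Unicode. The sketch below contrasts the two and shows one common use of `string.punctuation`; the sample strings are illustrative:

```python
import string

word = "Héllo"

# Unicode-aware check: 'é' counts as a letter
unicode_alpha = word.isalpha()
print(unicode_alpha)  # True

# ASCII-only check built from the string constant: 'é' is rejected
ascii_alpha = all(ch in string.ascii_letters for ch in word)
print(ascii_alpha)  # False

# Remove ASCII punctuation with a translation table
text = "Hello, World!"
cleaned = text.translate(str.maketrans("", "", string.punctuation))
print(cleaned)  # Hello World
```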



## 3. String Formatting

There are several ways to format strings in Python:

- **f-strings:** Embed expressions directly within string literals.
- **format() method:** Uses curly braces as placeholders to be replaced by arguments.
- **%-formatting:** Uses `%` as a formatting operator (older style).
- **Template strings:** Use the `string.Template` class for simple `$name`-style substitutions.

In [26]:
import string

# Sample data for formatting
name = "Alice"
age = 30
score = 95.5

# f-strings: Embedding expressions directly
formatted_f = f"{name} is {age} years old and scored {score:.1f} points."
print("f-string:", formatted_f)

# format() method: Using curly braces as placeholders
formatted_format = "{} is {} years old and scored {:.1f} points.".format(name, age, score)
print("format() method:", formatted_format)

# %-formatting: Older style formatting with %
formatted_percent = "%s is %d years old and scored %.1f points." % (name, age, score)
print("%-formatting:", formatted_percent)

# Template strings: Using string.Template for customizable substitutions
template = string.Template("$name is $age years old and scored $score points.")
formatted_template = template.substitute(name=name, age=age, score=f"{score:.1f}")
print("Template strings:", formatted_template)

f-string: Alice is 30 years old and scored 95.5 points.
format() method: Alice is 30 years old and scored 95.5 points.
%-formatting: Alice is 30 years old and scored 95.5 points.
Template strings: Alice is 30 years old and scored 95.5 points.
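
The `:.1f` used above is one instance of the format specification mini-language, which f-strings and `format()` share. A few other commonly used specifiers are sketched here with made-up values:

```python
value = 1234567.891

money = f"{value:,.2f}"       # thousands separator, two decimals
padded = f"{42:05d}"          # zero-padded to width 5
percent = f"{0.256:.1%}"      # multiplied by 100, with a % sign
left = f"{'left':<10}|"       # left-aligned in a 10-character field
right = f"{'right':>10}|"     # right-aligned in a 10-character field
centered = f"{'mid':^9}|"     # centered in a 9-character field
bases = f"{255:x} {255:b}"    # hexadecimal and binary

print(money)     # 1,234,567.89
print(padded)    # 00042
print(percent)   # 25.6%
print(left)
print(right)
print(centered)
print(bases)     # ff 11111111
```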


## 4. String Conversion

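Let's see how to convert to and from other types. The syntax is simple: call the target type as a function on the value, as in `str(value)`, `int(value)`, or `float(value)`. An invalid conversion raises an exception: `ValueError` when a string does not represent a valid number (for example `int("abc")`), and `TypeError` when the argument is of an unsupported type (for example `int(None)`).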

In [27]:
# Converting a number to a string
number = 42
number_str = str(number)
print("Converting int to str:", number_str, type(number_str))

# Converting a string to a number
numeric_string = "2025"
converted_int = int(numeric_string)
print("Converting str to int:", converted_int, type(converted_int))

converted_float = float(numeric_string)
print("Converting str to float:", converted_float, type(converted_float))

Converting int to str: 42 <class 'str'>
Converting str to int: 2025 <class 'int'>
Converting str to float: 2025.0 <class 'float'>
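
Conversions from strings can fail at run time, so input that is not guaranteed to be numeric is usually wrapped in `try`/`except`. The helper `to_int` below is a hypothetical name introduced only for this sketch:

```python
def to_int(text, default=None):
    """Return int(text), or `default` if text is not a valid integer literal."""
    try:
        return int(text)
    except ValueError:
        return default

print(to_int("2025"))    # 2025
print(to_int(" 42 "))    # 42 (surrounding whitespace is ignored)
print(to_int("20.25"))   # None (int() does not parse a decimal point)
print(to_int("abc"))     # None

# A string with a decimal point has to go through float() first
truncated = int(float("20.25"))
print(truncated)  # 20

# int() also accepts a base for non-decimal strings
from_hex = int("ff", 16)
print(from_hex)  # 255
```

Note that `to_int` only catches `ValueError`; passing a non-string such as `None` would still raise `TypeError`.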


## 5. String Immutability and Best Practices

In Python, strings are **immutable**, meaning that once created, their contents cannot be changed. When you perform operations that appear to modify a string, a new string is actually created.

**Best Practices:**
- **Avoid repeated concatenation in loops.** Each concatenation creates a new string; collect the pieces in a `list` and combine them once with `join()`.
- **Use f-strings or `format()` for clarity** when building strings dynamically.
- **Choose the right data structure.** If frequent in-place modifications are needed, consider a mutable type such as a `list` of characters, `io.StringIO`, or `bytearray` (for binary data).

In [28]:
immutable_example = "Hello"
try:
    immutable_example[0] = "J"
except TypeError as e:
    print("Strings are immutable:", e)

# Creating a modified version returns a new string
new_string = "J" + immutable_example[1:]
print("Original:", immutable_example)
print("New:", new_string)

Strings are immutable: 'str' object does not support item assignment
Original: Hello
New: Jello
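
The advice about concatenation in loops can be sketched as follows. Both approaches produce the same text; the second collects the pieces in a list and builds the final string once. The word list is illustrative:

```python
words = ["strings", "are", "immutable"]

# Repeated concatenation: each += may build a brand-new string
concatenated = ""
for word in words:
    concatenated += word.upper() + " "
concatenated = concatenated.rstrip()

# Collect the pieces in a list, then join once at the end
parts = []
for word in words:
    parts.append(word.upper())
joined = " ".join(parts)

print(concatenated)  # STRINGS ARE IMMUTABLE
print(joined)        # STRINGS ARE IMMUTABLE

# Editing "in place" via a mutable list of characters
chars = list("Hello")
chars[0] = "J"
edited = "".join(chars)
print(edited)  # Jello
```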
