# Programming Course in Python


# Python Strings

### Basic Properties

Strings in Python are sequences of characters enclosed in single quotes (' ') or double quotes (" "). The two forms are equivalent.

They are a versatile data type used to represent text.

* Immutability
* Length
* Concatenation
* Repetition
* Indexing
* Membership

In [1]:
single_quote_string = 'Hello, World!'
double_quote_string = "Hello, World!"

**Immutability**

Once a string is created, it cannot be modified in place.

To change it, create a new string based on the original; the original stays unchanged.

In [None]:
text = "Hello"
new_text = text.replace("H", "F")
print(text)
print(new_text)

Hello
Fello
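
To make immutability concrete, here is a minimal sketch showing that assigning to a single character raises a TypeError, while building a new string from a slice works. The variable names are illustrative.

```python
text = "Hello"

# Item assignment is not supported for str, so this raises TypeError.
try:
    text[0] = "F"
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# Build a new string from pieces of the old one instead.
new_text = "F" + text[1:]
print(text)      # the original is unchanged
print(new_text)
```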


**Length**

The length of a string (its number of characters) is returned by the built-in 'len()' function.

In [None]:
text = 'DSBA'
len(text)

4

**Concatenation**

Strings can be concatenated (joined end to end) using the '+' operator.

In [None]:
first_name = "Alan"
last_name = 'Smith'
first_name + " " + last_name

'Alan Smith'
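
One thing worth noting is that '+' only joins strings with other strings. The sketch below, with a hypothetical `age` value, shows the resulting TypeError and two common ways around it.

```python
name = "Alan"
age = 30  # hypothetical value, used only for illustration

# Mixing str and int with "+" raises TypeError, so convert explicitly.
try:
    greeting = name + " is " + age
except TypeError:
    greeting = name + " is " + str(age)

# An f-string formats the value without an explicit conversion.
formatted = f"{name} is {age}"
print(greeting)
print(formatted)
```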

**Repetition**

Strings can be repeated using the "*" operator

In [None]:
text = "Hello! "
repeated_text = text * 5
repeated_text

'Hello! Hello! Hello! Hello! Hello! '

### Task!

Write down your full name 3 times with 2 methods:
1. Using variables
2. Without variables

In [None]:
# First Method
first_name = #...
last_name = #...
full_name = #...
#...

In [None]:
# Second Method
# ...your code

**Indexing**

Each character in a string can be accessed using its index.

Indexing starts at 0, so the first character has index 0.

In [None]:
text = "Higher School of Economics"
print(text[0], text[11])

H o


Negative indices count from the end of the string: -1 is the last character.

In [None]:
print(text[-1], text[-3])

s i
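
As a small illustration of how positive and negative indices relate, and of what happens past the end of the string, consider this sketch (the helper variable names are made up for the example):

```python
text = "Higher School of Economics"

# A negative index -k refers to the same character as index len(text) - k.
last_index = len(text) - 1
same_character = text[-1] == text[last_index]
print(same_character)

# Indexing past the end raises IndexError.
out_of_range = False
try:
    text[len(text)]
except IndexError as error:
    out_of_range = True
    print("IndexError:", error)
```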


**Membership**

The "in" keyword is used to check if a substring exists within a string. The check is case-sensitive.

In [None]:
text = "Faculty of Computer Science"
"Computer" in text

True

In [None]:
"computer" in text

False
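
If the comparison should ignore case, one possible approach is to normalise both sides first. The sketch below uses `str.casefold()` for that; `str.lower()` would also work for plain English text.

```python
text = "Faculty of Computer Science"

# "in" is case-sensitive, so normalise both sides before comparing.
found_exact = "computer" in text
found_ignoring_case = "computer".casefold() in text.casefold()
print(found_exact, found_ignoring_case)

# "not in" is the negated form of the same check.
missing = "Physics" not in text
print(missing)
```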

### Inner Implementation

Strings are immutable sequences of Unicode code points. In CPython (since version 3.3), a string is stored in a compact array that uses 1, 2, or 4 bytes per character, depending on the widest code point it contains. This internal layout is not UTF-8; an encoding such as UTF-8 applies only when a string is converted to bytes.

Understanding their inner implementation can help in optimizing code and utilizing strings effectively.

* Memory Efficiency
* Unicode Representation

**Memory Efficiency**

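CPython uses a technique called string interning to reduce memory usage.

Identical string literals, especially short identifier-like ones, often share the same object in memory. This is an implementation detail rather than a language guarantee, so compare strings with == and not with is.

In [None]:
a = "hello"
b = "hello"
print(a is b)  # True in CPython here: both names refer to the same interned object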

True
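
The sketch below contrasts equality with identity. It assumes nothing about whether a runtime-built string is interned automatically (that depends on the interpreter), and uses `sys.intern()` to request the shared copy explicitly.

```python
import sys

a = "hello world"
b = "".join(["hello", " ", "world"])  # built at runtime

# Equality compares contents and is the reliable check.
print(a == b)

# sys.intern() returns the canonical interned copy of a string,
# so two equal strings map to the same object after interning.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)
```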


**Unicode Representation**

A string holds Unicode code points, not bytes.

Converting a string to bytes requires an encoding; str.encode() uses UTF-8 by default.

In [None]:
text = "hello"
print(text.encode('utf-8'))  # Output: b'hello'

b'hello'
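
To show the difference between code points and bytes, here is a short sketch with one non-ASCII character. `ord()` and `chr()` convert between a character and its code point; the sample word is arbitrary.

```python
text = "h\u00e9llo"  # "héllo", written with an escape to avoid ambiguity

# ord() gives the code point of a character; chr() is the inverse.
code_point = ord("\u00e9")
print(code_point, chr(code_point))

# len() counts code points, not bytes: "é" takes two bytes in UTF-8.
encoded = text.encode("utf-8")
print(len(text), len(encoded))
print(encoded)
```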


### Slices

Slicing is a powerful feature in Python that allows you to access parts of sequences like strings.

Slices can be used to extract subparts of these sequences using a specific syntax.

**Syntax of Slicing**

The slicing syntax in Python is: sequence[ *start* : *stop* : *step* ]

* start: The beginning index of the slice (inclusive).
* stop: The end index of the slice (exclusive).
* step: The interval between indices in the slice. The default value is 1.
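
As a rough illustration of the syntax above, the bracket notation corresponds to the built-in `slice` object, and out-of-range bounds are clipped rather than raising an error. The variable names are illustrative.

```python
text = "Python Slicing"

# text[start:stop] is equivalent to indexing with a slice object.
first_word = slice(0, 6)
print(text[first_word] == text[0:6])

# Unlike indexing, out-of-range slice bounds are clipped instead of raising.
clipped = text[7:100]
empty = text[50:60]
print(repr(clipped), repr(empty))
```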

**Key Features of Slicing**

* Extracting Substrings
* Default Indices
* Negative Indices
* Step Values

**Practical Examples**

* Reversing a String
* Extracting Even-Indexed Characters
* Extracting Odd-Indexed Characters
* Using Negative Indices
* Advanced Slicing

**Extracting Substrings**

We can extract parts of a string by specifying the start and stop indices.

In [None]:
text = "Python Slicing"
text[0:6]

'Python'

**Default Indices**

If you omit the start index, slicing starts from the beginning of the string.

If you omit the stop index, slicing goes up to the end of the string.

In [None]:
text = "Python Slicing"
print(text[:6])
print(text[7:])

Python
Slicing


**Negative Indices**

Python supports negative indexing, which allows us to slice from the end of the string.

In [None]:
text = "Python Slicing"
print(text[-7:])
print(text[-7:-1])

Slicing
Slicin


**Step Values**

The step parameter allows you to specify the interval between indices.

A negative step value reverses the direction of slicing.

In [None]:
text = "Python Slicing"
print(text[::2])
print(text[::-1])

Pto lcn
gnicilS nohtyP


**Reversing a String**

Using slicing with a step of -1 to reverse a string.

In [None]:
text = "Reverse me"
reversed_text = text[::-1]
print(reversed_text)

em esreveR


**Extracting Even-Indexed Characters**

In [None]:
text = "EvenIndex"
even_index_chars = text[::2]
print(even_index_chars)

EeIdx


**Extracting Odd-Indexed Characters**

In [None]:
text = "OddIndex"
odd_index_chars = text[1::2]
print(odd_index_chars)

dIdx


**Using Negative Indices**

Extracting a substring from the end using negative indices

In [None]:
text = "Negative Slicing"
sub_text = text[-7:]
print(sub_text)

Slicing


**Advanced Slicing**

Combining start, stop, and step for more complex slicing

In [None]:
text = "Advanced Slicing"
complex_slice = text[2:15:3]
print(complex_slice)

vc in


### Tasks!

**1. Extract the first word**

In [None]:
text = "Learn Python"
first_word = #...
print(first_word)

**2. Extract the last word using negative indices**

In [None]:
text = "Learn Python"
last_word = #...
print(last_word)

### Built-in Methods

* Changing Case
* Trimming Spaces
* Finding and Replacing
* Splitting and Joining
* Checking String Properties

**Changing Case**

Methods to convert a string's case:
* *.upper()* - Converts all characters in the string to uppercase.
* *.lower()* - Converts all characters in the string to lowercase.
* *.capitalize()* - Capitalizes the first character of the string and lowercases the rest.
* *.title()* - Capitalizes the first character of each word in the string.
* *.swapcase()* - Swaps the case of all characters in the string.

In [None]:
text = 'great Work'
print(text.upper())
print(text.lower())
print(text.capitalize())
print(text.title())
print(text.swapcase())

GREAT WORK
great work
Great work
Great Work
GREAT wORK


**Trimming**

Methods for removing whitespace:
* *.strip()* - removes whitespace at the beginning and end of the string
* *.lstrip()* - removes whitespace at the beginning of the string
* *.rstrip()* - removes whitespace at the end of the string

In [None]:
text = "  hello  "
text.strip()

'hello'

In [None]:
text = "  hello"
text.lstrip()

'hello'

In [None]:
text = "hello  "
text.rstrip()

'hello'

In [None]:
text = "  Hello, World!  "
print(text.strip())
print(text.lstrip())
print(text.rstrip())

Hello, World!
Hello, World!  
  Hello, World!
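
These methods also accept an optional argument listing the characters to remove. A minimal sketch with made-up sample strings:

```python
text = "--hello, world--"
dashes_removed = text.strip("-")
print(dashes_removed)

# The argument is a set of characters, not a prefix or suffix:
# any mix of "x" and "y" is removed from both ends.
padded = "xyxhelloyx"
stripped = padded.strip("xy")
print(stripped)
```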


**Finding and Replacing**

*str.find(sub)* : Returns the lowest index where the substring sub is found, or -1 if not found.

In [None]:
text = "hello world"
print(text.find("world"))
print(text.find("Python"))

6
-1
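
For comparison, `str.index()` behaves like `find()` but raises ValueError when the substring is missing. The sketch below also shows why the result of `find()` should be compared with -1 explicitly.

```python
text = "hello world"

# find() signals "not found" with -1; index() raises ValueError instead.
position = text.find("Python")
try:
    text.index("Python")
except ValueError:
    raised = True
else:
    raised = False
print(position, raised)

# A match at index 0 is falsy, so "if text.find(...):" would be a bug.
starts_at_zero = text.find("hello")
print(starts_at_zero == 0)
```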


*str.rfind(sub)* : Returns the highest index where the substring sub is found, or -1 if not found.

In [None]:
text = "hello world world"
print(text.rfind("world"))

12


*str.replace(old, new)* : Replaces all occurrences of the substring old with new.

In [None]:
text = "hello world"
print(text.replace("world", "Python"))  # Output: hello Python

hello Python
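
`replace()` also takes an optional third argument that limits how many occurrences are replaced. A short sketch with an arbitrary sample sentence:

```python
text = "one fish, two fish, red fish"

# With no count, every occurrence is replaced.
all_replaced = text.replace("fish", "cat")

# The optional third argument limits the number of replacements,
# counted from left to right.
first_only = text.replace("fish", "cat", 1)
print(all_replaced)
print(first_only)
```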


**Splitting and Joining**

*str.split(sep)* : Splits the string into a list of substrings at each occurrence of the separator sep.

In [None]:
text = "apple,banana,cherry"
text.split(",")

['apple', 'banana', 'cherry']

*str.rsplit(sep, maxsplit)* : Splits the string like split(), but starting from the right, so maxsplit limits the number of splits counted from the end.

In [None]:
text = "apple,banana,cherry"
text.rsplit(",", 1)

['apple,banana', 'cherry']

*str.join(iterable)* : Joins the elements of an iterable of strings, using the string as the separator.

In [None]:
fruits = ["apple", "banana", "cherry"]
", ".join(fruits)

'apple, banana, cherry'
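
Two details are easy to miss here: `split()` with no argument treats runs of whitespace as one separator, and `join()` accepts only strings. The following sketch, using made-up sample data, illustrates both.

```python
line = "  apple   banana  cherry "

# With no separator, split() splits on runs of whitespace
# and drops leading and trailing empty strings.
words = line.split()

# With an explicit separator, empty strings are kept.
pieces = "a,,b".split(",")
print(words, pieces)

# join() accepts only strings, so convert other values first.
numbers = [1, 2, 3]
joined = "-".join(str(n) for n in numbers)
print(joined)
```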

**Checking String Properties**

*str.startswith(prefix)* : Returns True if the string starts with the specified prefix.

In [None]:
text = "hello world"
text.startswith("hello")

True

*str.endswith(suffix)* : Returns True if the string ends with the specified suffix.

In [None]:
text = "hello world"
text.endswith("world")

True

*str.isalpha()* : Returns True if all characters in the string are alphabetic.

In [None]:
text_1 = "hello"
text_2 = '12345'
print(text_1.isalpha())
print(text_2.isalpha())

True
False


*str.isdigit()* : Returns True if all characters in the string are digits.

In [None]:
text_1 = "12345"
text_2 = 'hello'
print(text_1.isdigit())
print(text_2.isdigit())

True
False


*str.isalnum()* : Returns True if all characters in the string are alphanumeric.

In [None]:
text_1 = "hello123"
text_2 = 'hello 123'
print(text_1.isalnum())
print(text_2.isalnum())

True
False


*str.isspace()* : Returns True if all characters in the string are whitespace.

In [None]:
text_1 = "   "
text_2 = "   7 "
print(text_1.isspace())
print(text_2.isspace())

True
False
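
These checks are strict: every character must satisfy the property, and an empty string never does. As a sketch of what that means for validating numeric input, the sample strings below are hypothetical user inputs.

```python
samples = ["42", "-5", "3.14", ""]

# isdigit() is True only when the string is non-empty and every
# character is a digit, so signs and decimal points make it False.
results = {s: s.isdigit() for s in samples}
for sample, is_digit in results.items():
    print(repr(sample), is_digit)
```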


### Regular Expressions

Regular expressions (regex) are a powerful tool for matching patterns in text.

Python's re module provides support for regex, allowing us to search, match, and manipulate strings efficiently.

**1. Basic Syntax**

*Literal Characters* : Matches the exact characters you specify.

In [None]:
import re
text = "hello world"
match = re.search(r"world", text)
if match:
    print("Found:", match.group())

Found: world


*Metacharacters* : Special characters with specific meanings.

* . : Matches any character except a newline.
* ^ : Matches the start of the string.
* $ : Matches the end of the string.
* \* : Matches 0 or more repetitions of the preceding element.
* \+ : Matches 1 or more repetitions of the preceding element.
* ? : Matches 0 or 1 repetition of the preceding element.
* [ ] : Matches any single character listed inside the brackets.
* | : Matches either the pattern before or the pattern after the pipe.
* ( ) : Groups patterns.

In [None]:
import re
text = "The rain in Spain"
match = re.search(r"^The.*Spain$", text)
if match:
    print("Match found")

Match found
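
To see a few more metacharacters in isolation, here is a small sketch. The sample strings are arbitrary, and each pattern exercises one metacharacter.

```python
import re

# "colou?r": the "u" is optional, so both spellings match.
optional = re.findall(r"colou?r", "color colour")

# "[aeiou]+": one or more consecutive vowels.
vowels = re.findall(r"[aeiou]+", "beautiful")

# "cat|dog": either alternative.
either = re.findall(r"cat|dog", "cat, dog, bird")

# "a.c": "." stands for exactly one character, so "ac" does not match.
any_char = re.findall(r"a.c", "abc a-c ac")

print(optional, vowels, either, any_char)
```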


**2. Common Methods**

*re.search(pattern, string)* : Searches for the first occurrence of the pattern in the string.

In [None]:
import re
text = "hello world"
match = re.search(r"world", text)
if match:
    print("Found:", match.group())

Found: world


*re.match(pattern, string)* : Checks for a match only at the beginning of the string.

In [None]:
import re
text = "hello world"
match = re.match(r"hello", text)
if match:
    print("Match at the beginning:", match.group())

Match at the beginning: hello
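
The difference between `re.match`, `re.search`, and the related `re.fullmatch` is easiest to see side by side. A minimal sketch:

```python
import re

text = "hello world"

# match() only looks at the start of the string.
at_start = re.match(r"world", text)

# search() scans the whole string for the first occurrence.
anywhere = re.search(r"world", text)

# fullmatch() succeeds only if the pattern covers the entire string.
whole_short = re.fullmatch(r"hello", text)
whole_ok = re.fullmatch(r"hello \w+", text)

print(at_start, anywhere.group(), whole_short, whole_ok.group())
```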


*re.findall(pattern, string)* : Returns a list of all non-overlapping matches in the string.

In [None]:
import re
text = "hello world, hello universe"
matches = re.findall(r"hello", text)
print("All matches:", matches)

All matches: ['hello', 'hello']


*re.sub(pattern, repl, string)* : Replaces occurrences of the pattern with repl in the string.

In [None]:
import re
text = "hello world"
result = re.sub(r"world", "universe", text)
print(result)

hello universe


**3. Special Sequences**

* \d : Matches any decimal digit. For string patterns this includes Unicode digits; with the re.ASCII flag it is equivalent to [0-9].
* \D : Matches any non-digit.
* \w : Matches any word character (letters, digits, and underscore). With the re.ASCII flag it is equivalent to [a-zA-Z0-9_].
* \W : Matches any non-word character.
* \s : Matches any whitespace character.
* \S : Matches any non-whitespace character.

In [None]:
import re
text = "hello 123 world"
match = re.findall(r"\d+", text)
print("Digits:", match)

Digits: ['123']
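
By default, \d and \w are Unicode-aware for string patterns; the `re.ASCII` flag restricts them to ASCII. The sketch below uses a sample string written with escapes ("café" followed by an Arabic-Indic digit three) to show the difference.

```python
import re

text = "caf\u00e9 \u0663"  # "café" plus ARABIC-INDIC DIGIT THREE

# Default behaviour: Unicode letters and digits count as word characters.
unicode_words = re.findall(r"\w+", text)
unicode_digits = re.findall(r"\d", text)

# With re.ASCII, only [a-zA-Z0-9_] and [0-9] match.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
ascii_digits = re.findall(r"\d", text, flags=re.ASCII)

print(unicode_words, ascii_words)
print(unicode_digits, ascii_digits)
```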


**4. Anchors**

* ^ : Matches the start of the string.
* $ : Matches the end of the string.

In [None]:
import re
text = "hello world"
match = re.search(r"^hello", text)
if match:
    print("Starts with 'hello'")

Starts with 'hello'


**5. Grouping and Capturing**

* ( ) : Groups patterns and captures the matched text.
* \number : Backreference; matches the same text that the group with that number captured (e.g., \1).

In [None]:
import re
text = "hello world"
match = re.search(r"(hello) (world)", text)
if match:
    print("Group 1:", match.group(1))
    print("Group 2:", match.group(2))

Group 1: hello
Group 2: world
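
Backreferences can be used both inside a pattern and in the replacement string of `re.sub`. The following sketch shows both, plus named groups as an optional alternative to numbers; the sample strings are made up.

```python
import re

# \1 inside the pattern must repeat whatever group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(), "|", doubled.group(1))

# In the replacement string of re.sub(), \1 and \2 insert the captured text.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)

# Named groups use (?P<name>...) and are read with group("name").
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-05")
print(date.group("year"), date.group("month"))
```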


### Encoding and Decoding

Encoding and decoding are fundamental concepts in text processing that allow you to convert strings into bytes and vice versa.

This is essential for handling text data, especially when working with different character encodings in various applications.

**1. Basic Concepts**

*Encoding* : The process of converting a string into a bytes object using a specific character encoding (e.g., UTF-8).

*Decoding* : The process of converting a bytes object back into a string using a specific character encoding.

In [None]:
text = "Hello, World!"
# encoding
encoded_text = text.encode('utf-8')
# decoding
decoded_text = encoded_text.decode('utf-8')

print(encoded_text)
print(decoded_text)

b'Hello, World!'
Hello, World!


**2. Common Encodings**

* UTF-8: A variable-width encoding (1 to 4 bytes per character) capable of encoding all valid Unicode code points.
* ASCII: A 7-bit encoding of 128 characters that covers basic English letters, digits, punctuation, and control codes.
* ISO-8859-1: A single-byte (8-bit) encoding of 256 characters, also known as Latin-1, part of the ISO/IEC 8859 series.
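
To compare these encodings on a single character, here is a minimal sketch. It also shows what happens when bytes are decoded with the wrong encoding; the variable names are illustrative.

```python
char = "\u00e9"  # "é"

# The same character becomes different bytes in different encodings.
utf8 = char.encode("utf-8")
latin1 = char.encode("iso-8859-1")
print(utf8, latin1)

# ASCII has no representation for it at all.
try:
    char.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)

# Decoding UTF-8 bytes as ISO-8859-1 does not fail, but yields wrong text.
mojibake = utf8.decode("iso-8859-1")
print(mojibake)
```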

**3. Encoding a String**

*str.encode(encoding)* : Encodes the string using the specified encoding.

In [None]:
text = "Hello, World!"
utf8_encoded = text.encode('utf-8')
ascii_encoded = text.encode('ascii')
print(utf8_encoded)
print(ascii_encoded)

b'Hello, World!'
b'Hello, World!'


**4. Decoding a Byte Object**

*bytes.decode(encoding)* : Decodes the byte object using the specified encoding.

In [None]:
utf8_encoded = b'Hello, World!'
decoded_text = utf8_encoded.decode('utf-8')
print(decoded_text)

Hello, World!


**5. Handling Errors**

When encoding or decoding, we may encounter characters that cannot be handled by the specified encoding.

Python provides several strategies to handle such errors:

* strict: Raises a UnicodeEncodeError or UnicodeDecodeError (default behavior).
* ignore: Silently drops characters that cannot be encoded or bytes that cannot be decoded.
* replace: Substitutes a replacement marker: ? when encoding, U+FFFD (the Unicode replacement character) when decoding.

In [None]:
text = "Hello, World! 你好，世界！"
try:
    ascii_encoded = text.encode('ascii')
except UnicodeEncodeError as e:
    print("Encoding Error:", e)

ascii_encoded_ignore = text.encode('ascii', errors='ignore')
ascii_encoded_replace = text.encode('ascii', errors='replace')
print(ascii_encoded_ignore)
print(ascii_encoded_replace)

Encoding Error: 'ascii' codec can't encode characters in position 14-19: ordinal not in range(128)
b'Hello, World! '
b'Hello, World! ??????'
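
The same strategies apply when decoding. The sketch below decodes bytes that are not valid UTF-8 (they are "café" encoded as ISO-8859-1, an assumption made for this example) and also shows the `backslashreplace` handler.

```python
data = b"caf\xe9"  # "café" in ISO-8859-1; not valid UTF-8

# The default "strict" handler raises UnicodeDecodeError.
try:
    data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as error:
    strict_failed = True
    print("Decoding Error:", error)

ignored = data.decode("utf-8", errors="ignore")
replaced = data.decode("utf-8", errors="replace")
escaped = data.decode("utf-8", errors="backslashreplace")
print(repr(ignored), repr(replaced), repr(escaped))
```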


## Yandex Contest

Great work!

Here is the link to the tasks in Yandex Contest: https://official.contest.yandex.ru/contest/62953/standings

You can test your knowledge and train your skills!