## Regular Expressions (RE)

- Patterns to match strings.
- Conventionally written as raw strings: r"...".
- Use square brackets for character choices, e.g., [Aa].
- | for alternatives, like [Aa]pple|[Bb]anana.
- Legal variable names pattern: r"[A-Za-z_][A-Za-z_0-9]*\Z".
- Use raw strings (r"") for RE patterns to avoid conflicts with special notations.
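
As a small illustration of the variable-name pattern above, the sketch below checks a few candidate names. The helper `is_legal_name` is a hypothetical name introduced here; note that the pattern only checks the shape of the name and does not reject Python keywords.

```python
import re

# Pattern from the notes: a letter or underscore, then letters, digits, or underscores.
NAME_PATTERN = r"[A-Za-z_][A-Za-z_0-9]*\Z"

def is_legal_name(candidate):
    """Return True if candidate has the shape of a legal variable name (hypothetical helper)."""
    # re.match anchors at the start of the string; \Z anchors at its very end.
    return re.match(NAME_PATTERN, candidate) is not None

for candidate in ["total_sum", "_private", "2fast", "my-var"]:
    print(candidate, is_legal_name(candidate))
```
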
## Common RE Notations

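- . matches any character except a newline (unless re.DOTALL is set).
- [...] matches any one of the characters listed in the brackets.
- [^...] matches any one character not listed in the brackets.
- ^ matches the start of the string.
- $ matches the end of the string.
- '*' matches zero or more repetitions of the preceding item.
- '+' matches one or more repetitions of the preceding item.
- {m,n} matches between m and n repetitions of the preceding item.
- ? matches zero or one occurrence of the preceding item.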
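
The notations above can be tried out on a short string. This is only a sketch; the sample text and variable names are made up for illustration.

```python
import re

text = "color colour colouur cat cut c9t"

# ? makes the preceding item optional.
optional_u = re.findall(r"\bcolou?r\b", text)       # ['color', 'colour']

# {m,n} bounds the repetition count: one or two "u" characters.
bounded_u = re.findall(r"\bcolou{1,2}r\b", text)    # ['colour', 'colouur']

# . matches any single character; [^...] excludes the listed characters.
any_middle = re.findall(r"\bc.t\b", text)           # ['cat', 'cut', 'c9t']
no_digit_middle = re.findall(r"\bc[^0-9]t\b", text) # ['cat', 'cut']

# ^ and $ anchor the pattern to the start and end of the string.
starts_with_color = re.search(r"^color", text) is not None
ends_with_c9t = re.search(r"c9t$", text) is not None
```
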
## Grouping and Shorthand

- Grouping with parentheses.
- Backreferences with \number.
- Shorthand notations: \d, \D, \s, \S, \w, \W.
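
A possible illustration of groups, backreferences, and shorthand classes: the first pattern below finds words that are accidentally repeated, using \1 to refer back to whatever the first group matched. The sample sentences are invented for this sketch.

```python
import re

text = "This is is a test of the the backreference"

# (\w+) captures a word; \1 requires the same word to appear again after whitespace.
doubled = re.findall(r"\b(\w+)\s+\1\b", text)          # ['is', 'the']

# \d matches a digit, \S any non-whitespace character.
numbers = re.findall(r"\d+", "room 42, floor 7")       # ['42', '7']
tokens = re.findall(r"\S+", "  spaced   out  words ")  # ['spaced', 'out', 'words']
```
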
## Matching Functions

- `re.match` for the start of a string.
- `re.search` for matching anywhere.
- `re.findall` finds all occurrences.
- `re.finditer` returns an iterator of match objects.
- `re.sub` replaces pattern matches.
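
The difference between these functions is easiest to see side by side. A minimal sketch on an invented sample string:

```python
import re

text = "order 66, then order 99"

# re.match only tries the start of the string, so it finds nothing here.
at_start = re.match(r"\d+", text)                        # None

# re.search scans forward and stops at the first match.
first = re.search(r"\d+", text)                          # matches '66'

# re.findall returns the matched strings, re.finditer yields match objects.
all_numbers = re.findall(r"\d+", text)                   # ['66', '99']
spans = [m.span() for m in re.finditer(r"\d+", text)]    # [(6, 8), (21, 23)]

# re.sub replaces every match with the replacement string.
masked = re.sub(r"\d+", "N", text)                       # 'order N, then order N'
```
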
## Pattern Compilation

- Precompile patterns for efficiency with re.compile.
- Flags like `re.IGNORECASE`, `re.MULTILINE`, and `re.DOTALL` modify matching behavior.
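
A short sketch of how the three flags change matching, using a compiled pattern. The sample text and the name `greeting` are assumptions made for this example.

```python
import re

text = "Hello there\nhello again\nsay hello"

# IGNORECASE ignores letter case; MULTILINE lets ^ match at the start of every line.
greeting = re.compile(r"^hello", re.IGNORECASE | re.MULTILINE)
line_starts = greeting.findall(text)                 # ['Hello', 'hello']

# Without DOTALL the dot does not match a newline; with it, it does.
without_dotall = re.search(r"a.b", "a\nb")           # None
with_dotall = re.search(r"a.b", "a\nb", re.DOTALL)   # a match object

# The same flags can be written inline at the start of the pattern.
inline = re.findall(r"(?i)hello", text)              # ['Hello', 'hello', 'hello']
```
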
## Match Objects

- Returned by re.match and re.search; both return None when nothing matches.
- Access matched substrings and groups with methods.
- Use boolean checks like if mo: or convert with bool(mo).
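
The most commonly used match object methods can be sketched as follows; the sample string is made up for illustration.

```python
import re

mo = re.search(r"(\d+)-(\d+)", "pages 12-34 of the report")

if mo:  # a match object is truthy, None is falsy
    whole = mo.group(0)      # the entire matched substring: '12-34'
    first = mo.group(1)      # the first parenthesised group: '12'
    both = mo.groups()       # all groups as a tuple: ('12', '34')
    where = mo.span()        # start and end index of the match: (6, 11)
    print(whole, first, both, where)
```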

In [1]:
# Match and search functions
import re
s = "Doing things, going home, staying awake, sleeping later"
re.findall(r'\w+ing\b', s)

['Doing', 'going', 'staying', 'sleeping']

In [2]:
re.findall(r'[+-]?\d+', "23 + -24 = -1")

['23', '-24', '-1']

In [4]:
s = ("if I'm not in a hurry, then I should stay. " + " On the other hand, if I leave, then I can sleep.")
# Greedy matching (.*) tries to match as many characters as possible
re.findall(r'[Ii]f (.*), then', s)

["I'm not in a hurry, then I should stay.  On the other hand, if I leave"]

In [5]:
"""
The repetition specifiers +, *, ?, and {m,n} have corresponding non-greedy versions: +?, *?, ??, and {m,n}?. 
These expressions use as few characters as possible to make the whole pattern match some substring. 
"""
s = ("if I'm not in a hurry, then I should stay. " + " On the other hand, if I leave, then I can sleep.")
# Non-greedy matching (.*?) tries to match as few characters as possible
re.findall(r'[Ii]f (.*?), then', s)

["I'm not in a hurry", 'I leave']

In [6]:
# Functions in the re module
import re
text = "She goes where she wants to, she's a sheriff."  # avoid shadowing the built-in str
newstr = re.sub(r'\b[Ss]he\b', 'he', text)
print(newstr)

he goes where he wants to, he's a sheriff.


In [11]:
import re
text = """He is a timelord.
He has a Tardis."""
# count=1 replaces only the first match; \1 refers to the text captured by the group
newstr = re.sub(r'(\b[Hh]e\b)', r'\1 (The Doctor)', text, count=1)
print(newstr)

He (The Doctor) is a timelord.
He has a Tardis.


In [13]:
# Match Object

mo = re.search(r'\d+ (\d+) \d+ (\d+)', 'first 123 45 67 890 last')
if mo:
    print(mo)

<re.Match object; span=(6, 19), match='123 45 67 890'>


In [15]:
# Precompile a pattern so it can be reused without being parsed again
# Flags can also be set inline at the start of a pattern:
# (?i) = re.IGNORECASE
# (?m) = re.MULTILINE
# (?s) = re.DOTALL
import re
pattern = r'hello world'
re.compile(pattern, re.MULTILINE | re.DOTALL)

re.compile(r'hello world', re.MULTILINE|re.DOTALL|re.UNICODE)

In [19]:
"""
Write function integers_in_brackets that finds from a given string all integers that are enclosed in brackets.
Example run: 
integers_in_brackets(" afd [asd] [12 ] [a34] [ -43 ]tt [+12]xxx") 
returns [12, -43, 12]. 
So there can be whitespace between the number and the brackets, 
but no other character besides those that make up the integer.

Test your function from the main function.
"""
import re
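def integers_in_brackets(s):
    # Optional whitespace, an optionally signed integer, optional whitespace, all inside [ ]
    pattern = r'\[\s*([-+]?\d+)\s*\]'
    # findall returns the captured strings; convert them to int as the exercise asks
    return [int(number) for number in re.findall(pattern, s)]

def main():
    result = integers_in_brackets(" afd [asd] [12 ] [a34] [ -43 ]tt [+12]xxx")
    print(result)
main()


[12, -43, 12]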


## Basic File Processing:

- Open a file with open(filename, mode="r").
- Use the file object to read or write.
- Close the file with close() when done.
### File Opening Modes:

- r: Read-only, file must exist.
- w: Write-only, creates or overwrites.
- a: Write-only, appends to the end.
- r+: Read/write, file must exist.
- w+: Read/write, creates or overwrites.
- a+: Read/write, appends to the end.
- t (text mode, default) or b (binary mode).
### Text Mode vs. Binary Mode:

- In text mode, each newline `\n` is written as the platform's line ending (two bytes, `\r\n`, on Windows) and converted back to `\n` when read.
- Text mode also encodes each character into one or more bytes on write (e.g., one to four bytes in UTF-8) and decodes the bytes back into characters on read.
- Binary mode handles bytes directly.
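
To make the difference concrete, the sketch below writes one non-ASCII character in text mode and reads the file back in both modes. It uses a temporary directory so that nothing is left behind; the file name `demo.txt` is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # newline="" switches off newline translation, so the bytes are the same on every platform.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("ä\n")

    with open(path, "rb") as f:                    # binary mode: raw bytes
        raw = f.read()

    with open(path, "r", encoding="utf-8") as f:   # text mode: decoded characters
        decoded = f.read()

print(raw)       # b'\xc3\xa4\n' (three bytes)
print(decoded)   # two characters: 'ä' and the newline
```
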
### Common File Object Methods:

- `read(size)`: Read a specific number of characters/bytes.
- `write(string)`: Write a string/bytes to a file.
- `readline()`: Read a line until the next newline character.
- `readlines()`: Return a list of all lines in a file.
- `writelines()`: Write a list of lines to a file.
- `flush()`: Ensure changes are written to disk immediately.
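
A minimal sketch of these methods together with the "w", "a", and "r" modes. The file is created in a temporary directory and its name is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")

    with open(path, "w") as f:                 # "w" creates or overwrites
        f.write("first\n")
        f.writelines(["second\n", "third\n"])  # writelines adds no newlines itself

    with open(path, "a") as f:                 # "a" appends to the end
        f.write("fourth\n")

    with open(path, "r") as f:
        head = f.read(3)                       # the first three characters
        rest_of_line = f.readline()            # the remainder of the first line
        remaining = f.readlines()              # all lines that are left

print(head, repr(rest_of_line), remaining)
```
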
### Context Manager:
Use `with open(...) as f:` to automatically close the file.

### Iterating Through File Lines:
- The file object is iterable; you can use a for loop to iterate through lines.

### Standard File Objects:

- sys.stdin: Standard input.
- sys.stdout: Standard output.
- sys.stderr: Standard error.
### Reading/Writing from Standard File Objects:

- Read from user (keyboard) using sys.stdin.readline().
- Write to user (screen) using sys.stdout.write(line).
- Use sys.stderr for error messages.
### Changing File Object Destinations:
- You can redirect standard file objects to point elsewhere, like log files.
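
One way to redirect standard output is the standard library's `contextlib.redirect_stdout`. In this sketch an in-memory `io.StringIO` buffer stands in for a log file; that substitution is an assumption made to keep the example self-contained.

```python
import contextlib
import io
import sys

buffer = io.StringIO()  # stands in for an open log file in this sketch

with contextlib.redirect_stdout(buffer):
    # Inside the block sys.stdout refers to the buffer.
    print("this goes to the buffer")
    sys.stdout.write("so does this\n")

# Outside the block the original sys.stdout is restored.
print("back on the real stdout")
captured = buffer.getvalue()
```
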

### sys Module:

- sys.path: List of folders to find imported modules.
- sys.argv: Command line parameters.
    - sys.argv[0] is the program name.
    - Additional parameters are in the list.
- sys.exit(): Exit a program with a return value (0 for success, non-zero for errors).
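
The usual pattern for command line parameters and exit codes might look like the sketch below. To keep it testable, the hypothetical `main` takes the argument list as a parameter; in a real script it would typically be invoked as `sys.exit(main(sys.argv))`.

```python
import sys

def main(argv):
    """Return an exit status: 0 on success, 1 when no parameters were given."""
    program, *args = argv            # argv[0] is the program name
    if not args:
        sys.stderr.write(f"usage: {program} FILE...\n")
        return 1
    for arg in args:
        sys.stdout.write(f"argument: {arg}\n")
    return 0

# Simulated command line; a script would pass sys.argv instead.
status = main(["prog.py", "data.txt"])
```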

In [20]:
# Basic file processing
# encoding with utf-8
"ä".encode("utf-8")

b'\xc3\xa4'

In [22]:
# hex to decimal between 0- 255
list("ä".encode("utf-8"))

[195, 164]

In [23]:
# Some common file object methods
# unfortunately we don't have basics.ipynb file
f = open("basics.ipynb", 'r') # Let's open this notebook file,
# which is essentially a text file.
# So you can open it in a texteditor as well.
for i in range(5):  # And read the first five lines
    line = f.readline()
    print(f"Line {i}: {line}", end="")
f.close()

FileNotFoundError: [Errno 2] No such file or directory: 'basics.ipynb'

In [None]:
# second example for opening file with better opening file method
# get the max length of the line from text content
max_len = 0
with open("basics.ipynb", "r") as f:    # the file will be automatically closed.
    # when the with block exits
    for i in range(5):
        line = f.readline()
        if len(line) > max_len:
            max_len = len(line)
        print(f"Line {i}: {line}", end="")
print(f"The longest line in this file has length {max_len}")
# the output should look something like this
# The longest line in this file has length 1046

In [24]:
# Standard file objects
# These standard file objects are meant to be a basic input/output mechanism in textual form. 
import sys
import random
# Draw a random integer between -10 and 10 (inclusive)
i = random.randint(-10, 10)
if i >= 0:
    sys.stdout.write("Got a non-negative integer.\n")
else:
    sys.stderr.write("Got a negative integer.\n")
# Got a negative integer.



Got a negative integer.


In [45]:
"""
Exercise 2.2 (file listing)

The file src/listing.txt contains a list of files with one line per file. 
Each line contains seven fields: access rights, number of references, owner's name, name of owning group, file size, date, filename. 
These fields are separated with one or more spaces. 
Note that there may be spaces also within these seven fields.

Write function file_listing that loads the file src/listing.txt. 
It should return a list of tuples (size, month, day, hour, minute, filename). 
Use regular expressions to do this (either match, search, findall, or finditer method).

An example: for line

-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf
the function should create the tuple (25399, "Nov", 2, 21, 25, "exception_hierarchy.pdf").
"""

import re


def file_listing(filename="src/listing.txt"):
    # Each matching line yields a tuple such as
    # (25399, "Nov", 2, 21, 25, "exception_hierarchy.pdf").
    # re.search finds the pattern anywhere in the line, so the leading fields
    # (access rights, owner, group) need not be spelled out as they would for re.match.
    pattern = r'(\d+)\s+(\w{3})\s+(\d+)\s+(\d{2}):(\d{2})\s+(.+)'
    """
    input_string = "-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf"
    match_group = re.search(pattern, input_string)
    size, month, day, hour, minute, filename = match_group.groups()
    print(size, month, day, hour, minute, filename)
    """
    result = []
    with open(filename, "r") as file:
        for line in file:
            match_group = re.search(pattern, line)
            if match_group:
                # Use a separate name so the filename parameter is not shadowed
                size, month, day, hour, minute, name = match_group.groups()
                result.append((int(size), month, int(day), int(hour), int(minute), name))
    return result

"""
# Different version
def file_listing(filename="src/listing.txt"):
    with open(filename) as f:
        lines = f.readlines()
    result=[]
    for line in lines:
        pattern = r".{10}\s+\d+\s+.+\s+.+\s+(\d+)\s+(...)\s+(\d+)\s+(\d\d):(\d\d)\s+(.+)"
        if True:      # Two alternative ways of doing the same thing
            m = re.match(pattern, line)
        else:
            compiled_pattern = re.compile(pattern)
            m = compiled_pattern.match(line)
        if m:
            t = m.groups()
            result.append((int(t[0]), t[1], int(t[2]), int(t[3]), int(t[4]), t[5]))
        else:
            print(line)
    return result
 

"""
  
def main():
    result = file_listing()
    print(result)
main()


25399 Nov 2 21 25 exception_hierarchy.pdf
None


In [54]:
"""
Exercise 2.3 (red green blue)
The file src/rgb.txt contains names of colors and their numerical representations in RGB format. 
The RGB format allows a color to be represented as a mixture of red, green, and blue components. 
Each component can have an integer value in the range [0,255]. 
Each line in the file contains four fields: red, green, blue, and colorname. 
Each field is separated by some amount of whitespace (tab or space in this case). 
The text file is formatted to make it print nicely, but that makes it harder to process by a computer. 
Note that some color names can also contain a space character.

Write function red_green_blue that reads the file rgb.txt from the folder src. 
Remove the irrelevant first line of the file. The function should return a list of strings. 
Clean-up the file so that the strings in the returned list have four fields separated by a single tab character (\t). 
Use regular expressions to do this.

The first string in the returned list should be:

'255\t250\t250\tsnow'

"""
import re

def red_green_blue(filename="src/rgb.txt"):
    result = []
    # The input looks like this; the first line is a header to skip:
    #   ! $Xorg: rgb.txt,v 1.3 2000/08/17 19:54:00 cpqbld Exp $
    #   255 250 250		snow
    with open(filename, "r") as file:
        file.readline()  # skip the irrelevant header line
        for line in file:
            # Three integer fields, then the color name (which may contain spaces)
            match = re.search(r'(\d+)\s+(\d+)\s+(\d+)\s+(\w+.*)', line)
            if match is None:
                continue  # ignore lines that lack the expected four fields
            # Join the fields with tabs; strip() removes any trailing whitespace
            result.append("\t".join(match.groups()).strip())
    return result
"""
# second version

def red_green_blue(filename="src/rgb.txt"):
    with open(filename) as in_file:
        l = re.findall(r"(\d+)\s+(\d+)\s+(\d+)\s+(.*)\n", in_file.read())
        return [
            "{}\t{}\t{}\t{}".format(r, g, b, name)
            for r, g, b, name
            in l
        ]
"""

def main():
    red_green_blue()
main()


255\t250\t250\tsnow


In [74]:
"""
Exercise 2.4 (word frequencies)
Create function word_frequencies that gets a filename as a parameter and 
returns a dict with the word frequencies. 
In the dictionary the keys are the words and 
the corresponding values are the number of times that word occurred in the file specified by the function parameter. 
Read all the lines from the file and split the lines into words using the split() method. 
Further, remove punctuation from the ends of words using the strip(''"!"#$%&'()*,-./:;?@[]_''') method call.

Test this function in the main function using the file alice.txt. In the output, there should be a word and 
its count per line separated by a tab:

The     64
Project 83
Gutenberg   26
EBook   3
of      303
"""
def word_frequencies(filename):
    test_string = """
    The Project Gutenberg EBook of Alice in Wonderland, by Lewis Carroll\n

    This eBook is for the use of anyone anywhere at no cost and with\n
    almost no restrictions whatsoever.  You may copy it, give it away or\n
    re-use it under the terms of the Project Gutenberg License included\n
    with this eBook or online at www.gutenberg.org\n
    """
    result = {}
    with open(filename, "r") as file:
        for input_string in file:
            lines = input_string.split()
            for word in lines:
                strip_word = word.strip("""!"#$%&'()*,-./:;?@[]_""")
                # Start the count at zero the first time a word is seen
                if strip_word not in result:
                    result[strip_word] = 0
                result[strip_word] += 1
    return result
'''
def word_frequencies(filename):
    result = {}
    with open(filename) as in_file:
        for w in in_file.read().split():
            ws = w.strip("""!"#$%&'()*,-./:;?@[]_""")
            if ws not in result:
                result[ws] = 0
            result[ws] += 1
    return result
'''
def main():
    words = word_frequencies("src/alice.txt")
    for word, frequency in words.items():
        print(f"{word}\t{frequency}")
    
main()

{'The': 1, 'Project': 2, 'Gutenberg': 2, 'EBook': 1, 'of': 3, 'Alice': 1, 'in': 1, 'Wonderland,': 1, 'by': 1, 'Lewis': 1, 'Carroll': 1, 'This': 1, 'eBook': 2, 'is': 1, 'for': 1, 'the': 3, 'use': 1, 'anyone': 1, 'anywhere': 1, 'at': 2, 'no': 2, 'cost': 1, 'and': 1, 'with': 2, 'almost': 1, 'restrictions': 1, 'whatsoever.': 1, 'You': 1, 'may': 1, 'copy': 1, 'it,': 1, 'give': 1, 'it': 2, 'away': 1, 'or': 2, 're-use': 1, 'under': 1, 'terms': 1, 'License': 1, 'included': 1, 'this': 1, 'online': 1, 'www.gutenberg.org': 1}


In [89]:
"""
Summary of Exercise 2.5:

Part 1:

Create a function called "summary" that takes a filename as a parameter.
The file should contain floating point numbers, one per line.
The function should read the numbers and return a triple with the sum, average, and standard deviation.
Standard deviation formula: √[Σ(xi - x̄)² / (n-1)]
The main function should call "summary" for each filename in "sys.argv[1:]" (excluding the program name).
Example usage: python3 src/summary.py src/example.txt src/example2.txt
Print results with six decimals precision.
Part 2:

If a line doesn't represent a number, ignore it using a try-except block.
Example:
try:
    x = float(line)
except ValueError:
    # Handle the exceptional situation
Exceptions will be covered in more detail later in the course.

"""
import sys
import math
def summary(filename):
    # Return a triple with the sum, average, and standard deviation.
    converted_numbers = []
    with open(filename, 'r') as file:
        # Convert each whitespace-separated token, skipping non-numbers.
        for number in file.read().split():
            try:
                converted_numbers.append(float(number))
            except ValueError:
                continue

    total_sum = sum(converted_numbers)
    length = len(converted_numbers)
    mean = total_sum / length
    # Sample variance: squared deviations from the mean divided by n - 1.
    sum_of_diff = sum((number - mean) ** 2 for number in converted_numbers)
    standard_deviation = math.sqrt(sum_of_diff / (length - 1))
    return (total_sum, mean, standard_deviation)
"""
# model solution
from statistics import stdev
import sys
 
def summary(filename):
    L=[]
    with open(filename) as f:
        for line in f:
            try:
                L.append(float(line))
            except ValueError:
                continue
    s = sum(L)
    a = s/len(L)
    stddev = stdev(L)
    return s, a, stddev
"""
def main():
    for filename in sys.argv[1:]:
        (total_sum,average_number,standard_deviation) = summary(filename)
        print(f"File: {filename} Sum: {total_sum:0.6f} Average: {average_number:0.6f} Stddev: {standard_deviation:0.6f}")
main()

File: src/example.txt Sum: 51.400000 Average: 10.280000 Stddev: 8.904606


In [None]:
"""
Exercise 2.6 (file count)
This exercise can give two points at maximum!

Part 1.

Create a function file_count that gets a filename as parameter and returns a triple of numbers. 
The function should read the file, count the number of lines, words, and characters in the file, and 
return a triple with these count in this order. 
You get division into words by splitting at whitespace. 
You don't have to remove punctuation.

Part 2.

Create a main function that in a loop calls file_count using 
each filename in the list of command line parameters sys.argv[1:] as a parameter, in turn. 
For call python3 src/file_count file1 file2 ... the output should be

?      ?       ?       file1
?      ?       ?       file2
...
The fields are separated by tabs (\t). The fields are in order: linecount, wordcount, charactercount, filename.
"""
import sys

def file_count(filename):
    lines_total, words_total, characters_total = 0, 0, 0
    # count the number of lines, words, and characters in the file
    with open(filename, "r") as file:
        for line in file:
            
            characters_total += len(line)
            words_total += len(line.split())
            lines_total += 1    

    return (lines_total, words_total, characters_total)   

def main():
    for filename in sys.argv[1:]:
        # file_count returns a triple; the filename comes from the loop variable
        linecount, wordcount, charactercount = file_count(filename)
        print(f"{linecount}\t{wordcount}\t{charactercount}\t{filename}")


In [105]:
"""
Exercise 2.7 (file extensions)
This exercise can give two points at maximum!

Part 1.

Write function file_extensions that gets as a parameter a filename. 
It should read through the lines from this file. 
Each line contains a filename. 
Find the extension for each filename. 
The function should return a pair, where the first element is a list containing all filenames 
with no extension (with the preceding period (.) removed). 
The second element of the pair is a dictionary with extensions as keys and 
corresponding values are lists with filenames having that extension.

Sounds a bit complicated, but hopefully the next example will clarify this. 
If the file contains the following lines

file1.txt
mydocument.pdf
file2.txt
archive.tar.gz
test

then the return value should be the pair: 
(["test"], { "txt" : ["file1.txt", "file2.txt"], "pdf" : ["mydocument.pdf"], "gz" : ["archive.tar.gz"] } )

Part 2.

Write a main method that calls the file_extensions function with "src/filenames.txt" as the argument. 
Then print the results so that for each extension there is a line consisting of the extension and 
the number of files with that extension. 
The first line of the output should give the number of files without extensions.

With the example in part 1, the output should be

1 files with no extension
gz 1
pdf 1
txt 2
Had there been no filenames without extension then the first line would have been 0 files with no extension. 
In the printout list the extensions in alphabetical order.
"""
def file_extensions(lines):
    # This version takes an iterable of filename lines (a list or an open file).
    # Each name is split once at its last period; a name without a period
    # has no extension. The model solution below opens the file itself.
    file_with_extension = {}
    file_no_extension = []
    for file in lines:
        file = file.strip()
        try:
            file_name, file_extension = file.rsplit('.', 1)
        except ValueError:
            # rsplit returned a single piece: there is no period in the name
            file_no_extension.append(file)
            continue
        if file_extension not in file_with_extension:
            file_with_extension[file_extension] = []
        file_with_extension[file_extension].append(file)
    return (file_no_extension, file_with_extension)

input_files = ["file1.txt","mydocument.pdf","file2.txt","archive.tar.gz","test"]
result = file_extensions(input_files)
print(result)

def main():
    # Part 2: read the names from the file and print the counts per extension
    with open("src/filenames.txt") as f:
        no_extension, extensions = file_extensions(f)
    print(f"{len(no_extension)} files with no extension")
    for extension, files in sorted(extensions.items()):
        print(f"{extension} {len(files)}")

"""
# model solution
def file_extensions(filename):
    no_extension=[]
    d = {}
    with open(filename) as f:
        for line in f:
            line=line.strip()
            v = line.split('.')
            if len(v) == 1:
                no_extension.append(line)
            else:
                extension = v[-1]
                if extension not in d:
                    d[extension] = []
                d[extension].append(line)
    return (no_extension, d)
 
def main():
    no_extension, d = file_extensions("src/filenames.txt")
    print(f"{len(no_extension)} files with no extension")
    for extension, files in sorted(d.items()):
        print(f"{extension} {len(files)}")
"""

(['test'], {'txt': ['file1.txt', 'file2.txt'], 'pdf': ['mydocument.pdf'], 'gz': ['archive.tar.gz']})


## Objects and Classes:

- Python is an object-oriented programming language, similar to Java and C++.
- Unlike Java, Python doesn't enforce the use of classes, inheritance, and methods; it also supports a procedural programming paradigm with functions and modules.
### Objects in Python:

- Every value in Python is an object.
- Objects combine data and functions.
- Both the data items and the functions of an object are called attributes; function attributes are called methods.
### First-Class Objects:

- Functions, modules, methods, classes, etc., are all first-class objects.
- They can be stored in a container, passed as parameters to functions, returned by functions, or bound to a variable.
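
A brief sketch of what "first-class" means in practice. The helper names `apply_twice` and `make_adder` are hypothetical, chosen only for this illustration.

```python
import math

def apply_twice(func, value):
    """Take a function as a parameter and call it twice."""
    return func(func(value))

def make_adder(n):
    """Return a newly created function."""
    def add(x):
        return x + n
    return add

# Functions and methods stored in a container, then looked up and called.
operations = {"sqrt": math.sqrt, "abs": abs, "upper": str.upper}

root = operations["sqrt"](16.0)            # 4.0
shouted = operations["upper"]("hello")     # 'HELLO'
fourth_root = apply_twice(math.sqrt, 16.0) # 2.0
add_five = make_adder(5)                   # a function bound to a variable
fifteen = add_five(10)                     # 15
```
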
### Attribute Access:

- You can access an attribute of an object using the dot operator, like object.attribute.
- For example, if L is a list, you can refer to the method append with L.append.
### Data Types and Instances:

- Numbers (e.g., 2, 100) are instances of types (e.g., int), and strings (e.g., "hello") are instances of the str type.
- Instances are created using the class constructor, e.g., s = set() creates an instance of type set.
### User-Defined Data Types (Classes):

- Users can define their own data types called classes.
- Classes can be thought of as recipes for creating objects.
- An example class definition, MyClass, appears in the code cell below.
### Class Definition:

- A class definition starts with the class statement.
- It provides a name for the new type and lists the base classes.
- The class body defines attributes and methods for the class.
- No instances are created at this point.
### Methods and Attributes:

- Classes have methods (e.g., __init__ and f) with a special first parameter, self.
- Class attributes (e.g., a) are shared among all instances.
- Instance attributes (e.g., b) are specific to each instance.
- Methods whose names begin and end with two underscores are special methods (e.g., __init__).
### Instances:

- Instances are created by calling a class like a function.
- Parameters are passed to the __init__ method for initialization.
- You can (re)bind attributes with assignment to create instance-specific attributes.
### Attribute Lookup:

- Attribute lookup checks three places in order:
    - instance attributes,
    - class attributes,
    - attributes of the base classes.
- Attribute binding is done with assignment, either to the instance or class.
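
The lookup order can be observed directly. In this sketch the class names `Base` and `Derived` and their attributes are invented for illustration.

```python
class Base:
    colour = "grey"     # class attribute on the base class
    size = "small"

class Derived(Base):
    size = "large"      # overrides Base.size

obj = Derived()
other = Derived()

found_in_base = obj.colour     # not on the instance or Derived, so Base.colour: 'grey'
found_in_class = obj.size      # Derived.size shadows Base.size: 'large'

obj.colour = "red"             # assignment binds an attribute on this instance only

print(obj.colour, other.colour, Base.colour)   # red grey grey
print("colour" in obj.__dict__, "colour" in other.__dict__)   # True False
```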


In [106]:
class MyClass(object):
    """Documentation string of the class"""
    def __init__(self, param1, param2):
        "This initialises an instance of type MyClass"
        self.b = param1 # creates an instance attribute
        c = param2 # creates a local variable of the function
        # statements ...
    def f(self, param1):
        """This is a method of the class"""
    
    a = 1 # this creates a class attribute

In [107]:
"""
Exercise 2.8 (prepend)
Create a class called Prepend. We create an instance of the class by giving a string as a parameter to the initializer. 
The initializer stores the parameter in an instance attribute start. 
The class also has a method write(s) which prints the string s prepended with the start string. 
An example of usage:

p = Prepend("+++ ")
p.write("Hello");
Will print

+++ Hello
Try out using the class from the main function.
"""

class Prepend(object):
    # Add the methods of the class here
    def __init__(self, start):
        self.start = start

    def write(self, input_string):
        print(f"{self.start + input_string}")
def main():
    p = Prepend("+++ ")
    p.write("Hello")
main()

+++ Hello


## Inheritance:

- Inheritance allows code reuse from a base class (B) to create a new class (C).
- Attribute lookup starts with the instance dictionary and continues with class attributes.
- Attributes are searched recursively in base classes.
- Class C can inherit attributes from its base class B.
- If an attribute with the same name exists in both the class and its base class, it's overridden.
- Class C is referred to as a derived class or subclass, and B is the base class or super class.
### Special Methods:

- `__init__`: Initializes instance attributes. It takes self as the first parameter and does not return a value.

- `__hash__`: Returns an integer used when the object is stored in dictionaries and sets. Requires that `x == y` implies `x.__hash__() == y.__hash__()`. Hashable instances should therefore be immutable, at least in the fields that equality compares.

- `__call__`: Makes instances of a class callable, allowing them to be used like functions.

- `__del__`: Gets called when an instance is deleted.

- `__new__`: Controls the creation of new instances. Can be used to create classes with only one instance.

- `__str__`: Called when print or str() needs the string form of an instance. It returns a string and is used for the %s conversion.

- `__repr__`: Called when the interactive interpreter prints the value of an evaluated expression. Used for conversion %r. Returns a canonical representation string.

- Comparison Methods: Special methods like `__eq__`, `__ge__`, `__gt__`, `__le__`, `__lt__`, and `__ne__` get called for the corresponding operators (x==y, x>=y, etc.).

- Numerical Operations: To support numeric operations like addition, subtraction, multiplication, and division, define special methods like `__add__, __sub__, __mul__, __truediv__, and __floordiv__`.

- Augmented Assignments: Augmented assignment operators like +=, -=, *=, and /= have the special methods `__iadd__`, `__isub__`, `__imul__`, and `__itruediv__`.

- Conversion Functions: Special methods like `__complex__, __float__, and __int__` are used to convert objects to complex numbers, floats, and integers, respectively.
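
Several of these special methods can be seen working together in one small class. `Temperature` is a hypothetical class written for this sketch, not part of the course material.

```python
class Temperature:
    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):                 # canonical form, used by repr() and %r
        return f"Temperature({self.degrees!r})"

    def __str__(self):                  # readable form, used by print() and str()
        return f"{self.degrees} degrees"

    def __eq__(self, other):
        if not isinstance(other, Temperature):
            return NotImplemented
        return self.degrees == other.degrees

    def __hash__(self):                 # equal objects must have equal hashes
        return hash(self.degrees)

    def __add__(self, other):           # supports the + operator
        return Temperature(self.degrees + other.degrees)

    def __float__(self):                # supports float(t)
        return float(self.degrees)

    def __call__(self, offset):         # makes instances callable like functions
        return Temperature(self.degrees + offset)

t = Temperature(21)
print(t, repr(t))                       # 21 degrees Temperature(21)
```

Because `__eq__` and `__hash__` agree, two equal `Temperature` objects count as one element in a set.
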


### Container Operations:

- Container operations use special methods.
- Membership test with x in c calls c.__contains__(x).
- Deleting an element with del c[key] calls c.__delitem__(key).
- Reading an item with c[key] calls c.__getitem__(key).
- Setting an item with c[key]=value calls c.__setitem__(key, value).
- Getting the number of elements with len(c) calls c.__len__.
- The call iter(c) calls c.__iter__.
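
The container methods listed above could be implemented as in the following sketch. `Playlist` is a hypothetical class that simply delegates to an internal dict.

```python
class Playlist:
    def __init__(self):
        self._songs = {}

    def __setitem__(self, key, value):   # c[key] = value
        self._songs[key] = value

    def __getitem__(self, key):          # c[key]
        return self._songs[key]

    def __delitem__(self, key):          # del c[key]
        del self._songs[key]

    def __contains__(self, key):         # key in c
        return key in self._songs

    def __len__(self):                   # len(c)
        return len(self._songs)

    def __iter__(self):                  # iter(c), and therefore for loops
        return iter(self._songs)

playlist = Playlist()
playlist["intro"] = "Overture"
playlist["finale"] = "Coda"
del playlist["intro"]
print(len(playlist), "finale" in playlist, list(playlist))   # 1 True ['finale']
```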

In [108]:
class B(object):
    def f(self):
        print("Executing B.f")
    
    def g(self):
        print("Executing B.g")
    
class C(B):
    def g(self):
        print("Executing C.g")
x=C()
x.f() # inherited from B
x.g() # overridden in C

Executing B.f
Executing C.g


In [114]:
"""
Exercise 2.9 (rational)
Create a class Rational whose instances are rational numbers. 
A new rational number can be created with the call to the class. 
For example, the call r=Rational(1,4) creates a rational number “one quarter”. 
Make the instances support the following operations: + - * / < > == with their natural behaviour. 
Make the rationals also printable so that from the printout we can clearly see that they are rational numbers.
"""


class Rational(object):
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator
    
    def __str__(self):
        return f"{self.numerator}/{self.denominator}"
    
    def __add__(self, r2):
        numerator = self.numerator*r2.denominator + self.denominator*r2.numerator
        denominator = self.denominator*r2.denominator
        return Rational(numerator, denominator)

    def __sub__(self, r2):
        numerator = self.numerator*r2.denominator - self.denominator*r2.numerator
        denominator = self.denominator*r2.denominator
        return Rational(numerator, denominator)
    def __mul__(self, r2):
        mul_rational = Rational(self.numerator*r2.numerator, self.denominator*r2.denominator)
        return mul_rational
    def __truediv__(self, r2):
        div_rational = Rational(self.numerator*r2.denominator, self.denominator*r2.numerator)
        return div_rational
    
    def __lt__(self, r2):
        return (self.numerator/self.denominator) < (r2.numerator/r2.denominator)
    
    def __gt__(self, r2):
        return (self.numerator/self.denominator) > (r2.numerator/r2.denominator)
    
    def __eq__(self, r2):
        return (self.numerator/self.denominator) == (r2.numerator/r2.denominator)
    
def main():
    r1=Rational(1,4)
    r2=Rational(2,3)
    print(r1)
    print(r2)
  
    print(r1*r2)

    print(r1/r2)
    print(r1+r2)
    print(r1-r2)
    print(Rational(1,2) == Rational(2,4))
    print(Rational(1,2) > Rational(2,4))
    print(Rational(1,2) < Rational(2,4))

main()

1/4
2/3
2/12
3/8
11/12
-5/12
True
False
False


## Exception Handling:

- When an error occurs, the program must decide how to respond.
- Options include printing an error message, stopping program execution, indicating errors with special values, or ignoring errors.
### Separation of Error Recognition and Handling:

- Most modern programming languages have an exception handling system.
- Separates the detection of errors from the handling of errors.
- Errors can be signaled by raising exceptions.
### Raising Exceptions in Python:

- Exceptions are raised in Python using the raise statement.
- It can raise specific exception classes or create custom exceptions.
- Exceptions are used to indicate various situations, including non-error conditions like StopIteration.
### Exception Catching:

- Try-except blocks are used for catching and handling exceptions.
- The try block contains code that may raise exceptions.
- The except block handles exceptions.
- The else block executes if no exceptions occurred.
- The finally block is for cleanup code and always executes.
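
The order in which the four blocks run can be traced with a small sketch. The function `safe_divide` and its `steps` list are hypothetical, added only to record which blocks were executed.

```python
def safe_divide(a, b):
    steps = []
    try:
        result = a / b
    except ZeroDivisionError:
        steps.append("except")    # runs only if the try block raised
        result = None
    else:
        steps.append("else")      # runs only if the try block did not raise
    finally:
        steps.append("finally")   # runs in both cases
    return result, steps

print(safe_divide(6, 3))   # (2.0, ['else', 'finally'])
print(safe_divide(1, 0))   # (None, ['except', 'finally'])
```
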
### Specific Exception Handling:

- Handling specific exceptions allows for better error diagnosis.
- Over-general exception specifications, like except Exception:, can hide the true cause of an error.
### Exception Hierarchy:

- Python exceptions are objects instantiated from exception classes.
- Exception classes form a hierarchy rooted at BaseException; almost all exceptions that programs catch or define derive from its subclass Exception.
- New exception classes can be created by inheriting from existing ones.
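
A small sketch of how the hierarchy affects catching. The classes `AppError` and `ConfigError` and the function `load` are hypothetical names made up for this example; the `issubclass` checks use real built-in exception classes.

```python
class AppError(Exception):
    """Hypothetical base class for this sketch."""

class ConfigError(AppError):
    """Hypothetical subclass; an 'except AppError' clause also catches it."""

def load(name):
    # Always fails, to demonstrate the catch below
    raise ConfigError(f"cannot load {name}")

try:
    load("settings")
except AppError as e:
    # Catching the parent class also catches instances of its subclasses
    caught = type(e).__name__
    print("Caught", caught)

print(issubclass(ZeroDivisionError, ArithmeticError))  # True
print(issubclass(Exception, BaseException))            # True
```
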
### Error Handling Policy in Python:

- Python employs dynamic error checking: an operation is simply attempted, and an exception is raised at run time if it fails.
- Duck typing is used, allowing functions to work for inputs that make sense for the operations within the function.
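
As a sketch of this policy, the hypothetical function `total_length` below performs no type checks. It works for any input whose elements support `len()`, and an unsuitable input surfaces as an exception only when the operation is attempted.

```python
def total_length(items):
    # No type checks: any iterable whose elements support len() works
    return sum(len(x) for x in items)

print(total_length(["ab", [1, 2, 3], (4,)]))  # 6

try:
    total_length([1, 2])  # ints have no len()
except TypeError as e:
    print("Unsupported input:", e)
```
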

```
try:
    pass  # the statements that can cause exceptions go here
except (Exceptionname1, Exceptionname2, ...):
    pass  # the listed exceptions are handled here
finally:
    pass  # clean-up code, this is always executed
```

In [115]:
L=[1,2,3]
try:
    print(L[3])
except IndexError:
    print("Index does not exist")
    

Index does not exist


In [118]:
def compute_average(L):
    n = len(L)
    s = sum(L)

    return float(s)/n # error is noticed here !!!

mylist = []

while True:
    try:
        x = float(input("Give a number (non-number quits): "))
        mylist.append(x)
    except ValueError:
        break
try:
    average = compute_average(mylist)
    print("Average is", average)
except ZeroDivisionError:
    # and the error is handled here
    if len(mylist) == 0:
        print("Tried to compute the average of an empty list of numbers")
    else:
        print("Something strange happened")

Average is 1.0


In [129]:
# Too general exception specifications
import sys
s = input("Give a number: ")

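s = s.strip() # input() already drops the trailing \n; remove stray whitespace only
try:
    print(s)
    x = int(s)
    # Deliberate typo (wr1te): raises AttributeError when the input is a valid number
    sys.stdout.wr1te(f"You entered {x}\n")
# The over-general clause below would also catch that AttributeError
# and print a misleading message:
#except Exception: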
except ValueError as e:
    print(f"You didn't enter a number {e}")

werwe
You didn't enter a number invalid literal for int() with base 10: 'werwe'


In [1]:
"""
Exercise 2.10 (extract numbers)
Write a function extract_numbers that gets a string as a parameter. 
It should return a list of numbers that can be both ints and floats. 
Split the string to words at whitespace using the split() method. 
Then iterate through each word, and initially try to convert to an int. 
If unsuccessful, then try to convert to a float. If not a number then skip the word.

Example run: print(extract_numbers("abd 123 1.2 test 13.2 -1")) will return [123, 1.2, 13.2, -1]
"""

def extract_numbers(s):
    result = []
    for number in s.split():
        try:
            # convert to integer 
            int_number = int(number)
            result.append(int_number)
        except ValueError:
            try:
                float_number = float(number)
                result.append(float_number)
            except ValueError:
                pass
    return result

def main():
    print(extract_numbers("abd 123 1.2 test 13.2 -1"))
main()

[123, 1.2, 13.2, -1]


## 1. Sequences:

- Sequences are a type of iterable.
- They are containers that can be traversed using a for loop.
- Typically, sequences allow indexing using brackets [].
- Examples of sequences include strings, lists, and tuples.
### 2. Iterables:

- An iterable is a more general concept than sequences.
- Any container that can be traversed using a for loop is an iterable.
- To create a custom iterable, a class should define a special method `__iter__` that returns an iterator for the container.
- An iterator is an object with a `__next__` method that provides the next element from the container.
- Whether an object is iterable can be checked with isinstance(obj, abc.Iterable), where abc is the abstract base class module imported from the collections package.
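
A minimal sketch of a custom iterable. The class `Countdown` is a hypothetical example; here `__iter__` delegates to the iterator of a built-in `range` object rather than defining `__next__` by hand.

```python
from collections import abc

class Countdown:
    """Hypothetical container, used only for illustration."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Return a fresh iterator each time, so the object can be traversed repeatedly
        return iter(range(self.start, 0, -1))

c = Countdown(3)
print(list(c))                        # [3, 2, 1]
print(list(c))                        # [3, 2, 1] again
print(isinstance(c, abc.Iterable))    # True
print(isinstance(c, abc.Sequence))    # False, no indexing support
print(isinstance(42, abc.Iterable))   # False
```
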
### 3. Generators:

- Generators provide an easier way to create iterators.
- A generator is a function containing the yield statement.
- Generator functions can yield values one at a time when iterated over.
- Generator expressions, such as (x*x for x in range(3)), also produce generators, but they are written inline as a single expression instead of as a function with yield.
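
The two forms can be compared side by side. The function name `squares` is hypothetical, chosen only for this sketch.

```python
def squares(n):
    # Generator function: the body runs only as values are requested
    for i in range(n):
        yield i * i

gen_func = squares(4)
gen_expr = (i * i for i in range(4))  # generator expression, written inline

print(next(gen_func))   # 0
print(list(gen_func))   # [1, 4, 9], the remaining values
print(list(gen_expr))   # [0, 1, 4, 9]
print(list(gen_expr))   # [] since a generator can be consumed only once
```
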
### 4. Example: WeekdayIterator:

- Demonstrates a custom iterator for weekdays.
- The class defines the `__iter__` and `__next__` methods.
- It is an iterable but not a sequence, as it doesn't support indexing.
### 5. Example: mydate Generator:

- Shows an example of a generator function.
- Generates dates starting from a given date.
- Yields date values as they are requested.
- More convenient than writing custom iterators explicitly.

In [2]:
class WeekdayIterator(object):
    """Iterator over the weekdays."""
    def __init__(self):
        self.i= 0 # Start from Monday
        self.weekdays =  ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
    
    def __iter__(self): 
        # If this object were a container, then this method would return the iterator over the
        # elements of the container.
        # However, this object is already an iterator, hence we return self.
        return self
    
    def __next__(self):
        # Returns the next weekday
        if self.i == 7:
            raise StopIteration
            # Signal that all weekdays were already iterated over
        else:
            weekday = self.weekdays[self.i]
            self.i += 1
            return weekday

for w in WeekdayIterator():
    print(w)

Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday


In [3]:
from collections import abc 
# Get the abstract base classes
containers = ["efg", [1,2,3], (4,5), WeekdayIterator()]

for c in containers:
    if isinstance(c, abc.Sequence):
        print(c, "is a sequence")
    else:
        print(c, "is not a sequence")

efg is a sequence
[1, 2, 3] is a sequence
(4, 5) is a sequence
<__main__.WeekdayIterator object at 0x7f8257e814b0> is not a sequence


In [4]:
# Generator example
def mydate(day=1, month=1): 
    # Generates dates starting from the given date
    lengths=(31,28,31,30, 31, 30, 31, 31, 30, 31, 30, 31)
    # How many days in a month

    first_day = day
    for m in range(month, 13):
        for d in range(first_day, lengths[m-1] + 1):
            yield (d, m)

        first_day = 1

# Create the generator by calling the function:
gen = mydate(26, 2) # Start from 26 of February
for i, (day, month) in enumerate(gen):
    if i == 5: 
        break # Print only the first five dates from the generator
    print(f"Index {i}, day {day}, month {month}")

Index 0, day 26, month 2
Index 1, day 27, month 2
Index 2, day 28, month 2
Index 3, day 1, month 3
Index 4, day 2, month 3
