# SLU11 | Part 2 | Exceptions and File Handling
---

### Table of Contents
[1. Exceptions](#1.-Exceptions)\
&emsp;[1.1. Semantic vs. Syntactic](#1.1.-Semantic-vs\.-Syntactic)\
&emsp;&emsp;[1.1.1. Syntax Errors](#1.1.1.-Syntax-Errors)\
&emsp;&emsp;[1.1.2. Semantic Errors](#1.1.2.-Semantic-Errors)\
&emsp;[1.2. Common Exceptions](#1.2.-Common-Exceptions)\
&emsp;&emsp;[1.2.1. SyntaxError](#1.2.1.-SyntaxError)\
&emsp;&emsp;[1.2.2. IndexError / KeyError](#1.2.2.-IndexError-/-KeyError)\
&emsp;&emsp;[1.2.3. ValueError](#1.2.3.-ValueError)\
&emsp;&emsp;[1.2.4. TypeError](#1.2.4.-TypeError)\
&emsp;&emsp;[1.2.5. NameError](#1.2.5.-NameError)\
&emsp;&emsp;[1.2.6. AttributeError](#1.2.6.-AttributeError)\
&emsp;&emsp;[1.2.7. ZeroDivisionError](#1.2.7.-ZeroDivisionError)\
&emsp;[1.3. Handling Exceptions](#1.3.-Handling-Exceptions)\
&emsp;&emsp;[1.3.1. Raising Exceptions](#1.3.1.-Raising-Exceptions)\
&emsp;&emsp;[1.3.2. Try/Except](#1.3.2.-Try/Except)\
&emsp;&emsp;&emsp;[1.3.2.1. Catching Separate Exceptions](#1.3.2.1.-Catching-Separate-Exceptions)\
&emsp;&emsp;&emsp;[1.3.2.2. Catching Known Groups of Exceptions](#1.3.2.2.-Catching-Known-Groups-of-Exceptions)\
&emsp;&emsp;[1.3.3. Beyond Try/Except](#1.3.3.-Beyond-Try/Except)\
&emsp;[1.4. Bonus Section: Custom Exceptions](#1.4.-Bonus-Section:-Custom-Exceptions)

[2. File Handling](#2.-File-Handling)\
&emsp;[2.1. Opening/Closing](#2.1.-Opening/Closing)\
&emsp;&emsp;[2.1.1. The Bad Way](#2.1.1.-The-Bad-Way)\
&emsp;&emsp;[2.1.2. The Good Way](#2.1.2.-The-Good-Way)\
&emsp;[2.2. Reading/Writing Files](#2.2.-Reading/Writing-Files)\
&emsp;&emsp;[2.2.1. Text Files](#2.2.1.-Text-Files)\
&emsp;&emsp;[2.2.2. Binary Files (pickle example)](#2.2.2.-Binary-Files-(pickle-example))\
&emsp;&emsp;[2.2.3. JSON Files](#2.2.3.-JSON-Files)\
&emsp;[2.3. Bonus Section: Working with Pathlib](#2.3.-Bonus-Section:-Working-with-Pathlib)\
&emsp;&emsp;[2.3.1. The Path Structure](#2.3.1.-The-Path-Structure)\
&emsp;&emsp;[2.3.2. Improving Readability](#2.3.2.-Improving-Readability)

# 1. Exceptions

<center><img src="./media/errors.png" width="500"/></center>

## 1.1. Semantic vs. Syntactic

### 1.1.1. Syntax Errors
> A syntax error is an error in the syntax of a sequence of characters that is intended to be written in a particular programming language.

> In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language.

In other words: *your code is wrong.*

In [1]:
# Example of a Syntax Error: Here we use the print statement without the parentheses (which was how you did it in Python v2)

print "hello world"

SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)? (4028581847.py, line 3)

In [2]:
# How the statement from the previous cell should be written

print("hello world")

hello world


### 1.1.2. Semantic Errors
A semantic error occurs when the code's syntax is correct, but a mistake in its logic prevents it from achieving its goal.

These can be particularly annoying: semantic errors are often unintuitive, hard to spot, or only happen under certain conditions.

They can be even more troublesome because __in some cases these errors will never actually *raise an error*!__

In [3]:
# Example of Semantic Error: Wrong operator used in function.
# This would never raise an exception/error to call your attention.

def add(x: int, y: int) -> int:
    return x - y  # ATTENTION: although we want to ADD, we used - instead

for i in range(1, 10):
    print(f"{i} + {i} = {i+i} BUT function add({i}, {i}) = {add(i,i)}")

1 + 1 = 2 BUT function add(1, 1) = 0
2 + 2 = 4 BUT function add(2, 2) = 0
3 + 3 = 6 BUT function add(3, 3) = 0
4 + 4 = 8 BUT function add(4, 4) = 0
5 + 5 = 10 BUT function add(5, 5) = 0
6 + 6 = 12 BUT function add(6, 6) = 0
7 + 7 = 14 BUT function add(7, 7) = 0
8 + 8 = 16 BUT function add(8, 8) = 0
9 + 9 = 18 BUT function add(9, 9) = 0


In [4]:
# Example of Semantic Error: Exception lurking in certain conditions.
# This particular error will ONLY happen when dividing by zero.

for i in range(5, -1, -1):
    num = i * 10
    print(f"\nDividing {num} by {i}")
    print(f"Result: {num} / {i} = {num/i}")


Dividing 50 by 5
Result: 50 / 5 = 10.0

Dividing 40 by 4
Result: 40 / 4 = 10.0

Dividing 30 by 3
Result: 30 / 3 = 10.0

Dividing 20 by 2
Result: 20 / 2 = 10.0

Dividing 10 by 1
Result: 10 / 1 = 10.0

Dividing 0 by 0


ZeroDivisionError: division by zero

## 1.2. Common Exceptions

An exception is an error that occurs while a program is running. There are many types of exceptions, and you can raise and handle them in different ways.

Most exceptions inherit from `Exception`, which stands for errors that happened during code execution.\
Things like `KeyboardInterrupt` and `SystemExit` are not an `Exception`, although they share some commonalities through `BaseException`.

Below is the hierarchy of Python's built-in `BaseException` and `Exception` classes. You do not need to know them all, as some are far more common than others.

```
BaseException
 ├── BaseExceptionGroup
 ├── GeneratorExit
 ├── KeyboardInterrupt
 ├── SystemExit
 └── Exception
      ├── ArithmeticError
      │    ├── FloatingPointError
      │    ├── OverflowError
      │    └── ZeroDivisionError
      ├── AssertionError
      ├── AttributeError
      ├── BufferError
      ├── EOFError
      ├── ExceptionGroup [BaseExceptionGroup]
      ├── ImportError
      │    └── ModuleNotFoundError
      ├── LookupError
      │    ├── IndexError
      │    └── KeyError
      ├── MemoryError
      ├── NameError
      │    └── UnboundLocalError
      ├── OSError
      │    ├── BlockingIOError
      │    ├── ChildProcessError
      │    ├── ConnectionError
      │    │    ├── BrokenPipeError
      │    │    ├── ConnectionAbortedError
      │    │    ├── ConnectionRefusedError
      │    │    └── ConnectionResetError
      │    ├── FileExistsError
      │    ├── FileNotFoundError
      │    ├── InterruptedError
      │    ├── IsADirectoryError
      │    ├── NotADirectoryError
      │    ├── PermissionError
      │    ├── ProcessLookupError
      │    └── TimeoutError
      ├── ReferenceError
      ├── RuntimeError
      │    ├── NotImplementedError
      │    └── RecursionError
      ├── StopAsyncIteration
      ├── StopIteration
      ├── SyntaxError
      │    └── IndentationError
      │         └── TabError
      ├── SystemError
      ├── TypeError
      ├── ValueError
      │    └── UnicodeError
      │         ├── UnicodeDecodeError
      │         ├── UnicodeEncodeError
      │         └── UnicodeTranslateError
      └── Warning
           ├── BytesWarning
           ├── DeprecationWarning
           ├── EncodingWarning
           ├── FutureWarning
           ├── ImportWarning
           ├── PendingDeprecationWarning
           ├── ResourceWarning
           ├── RuntimeWarning
           ├── SyntaxWarning
           ├── UnicodeWarning
           └── UserWarning
```
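The hierarchy above can be checked directly from Python. A small sketch, using only built-in classes and the built-in `issubclass()` function:

```python
# Every built-in exception is a class, so the tree can be inspected with issubclass().
print(issubclass(ZeroDivisionError, ArithmeticError))  # True
print(issubclass(KeyError, LookupError))               # True
print(issubclass(KeyboardInterrupt, Exception))        # False: it only inherits from BaseException
print(issubclass(KeyboardInterrupt, BaseException))    # True

# __mro__ lists a class followed by all of its ancestors, most specific first.
ancestors = [cls.__name__ for cls in FileNotFoundError.__mro__]
print(ancestors)  # ['FileNotFoundError', 'OSError', 'Exception', 'BaseException', 'object']
```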

### 1.2.1. SyntaxError
We have already seen these in the previous section. They commonly occur when you write code that is not valid in Python.

In [5]:
print "hello world"

SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)? (3923495743.py, line 1)

### 1.2.2. IndexError / KeyError
These are pretty similar in that both relate to trying to access things that do not exist in a collection: `IndexError` for sequences (lists, tuples, etc.) and `KeyError` for dictionaries.

In [6]:
lst = [1, 2, 3] # Python is 0-indexed, meaning the index of any sequence starts at 0

print(lst[3]) # indexes are [0, 1, 2], therefore 3 falls outside of bounds

IndexError: list index out of range

In [7]:
dkt = {"a": 0, "b": 1}

print(dkt["c"]) # similarly, you cannot fetch items that do not exist

KeyError: 'c'

### 1.2.3. ValueError
These are typically used when functions receive the *correct* type of argument, but its value is inappropriate.

In [8]:
from math import sqrt

print(sqrt(-1))  # math.sqrt does not work for negative numbers

ValueError: math domain error

### 1.2.4. TypeError
Unlike ValueError, these are typically used to say that the wrong type of argument was passed.

In [9]:
print(1 + "2")  # you cannot add an integer (1) with a string ("2")

TypeError: unsupported operand type(s) for +: 'int' and 'str'

### 1.2.5. NameError
These will be raised when you use a local or global name that has not been defined.

In [10]:
foo()  # we never defined a function named foo

NameError: name 'foo' is not defined

### 1.2.6. AttributeError
Similar to NameError, these will happen when you request an attribute that does not exist on an object.

In [11]:
class A:
    pass  # just a blank class, with no user-defined attributes

a = A()
print(a.my_attribute)

AttributeError: 'A' object has no attribute 'my_attribute'

### 1.2.7. ZeroDivisionError
This error will occur when you try to divide anything by 0, which is mathematically undefined.

In [12]:
print(1 / 0)

ZeroDivisionError: division by zero

## 1.3. Handling Exceptions

### 1.3.1. Raising Exceptions
Raising an exception is done with the `raise` keyword, followed by the exception class or an instance of it.

In [13]:
raise Exception  # no message attached

Exception: 

In [14]:
raise Exception("Raised an Exception")  # with message attached

Exception: Raised an Exception

In [15]:
# This is similar to the example we saw with math.sqrt() which does not allow negative numbers.
def raise_when_negative(num: int):
    if num < 0:
        raise ValueError(f"The number {num} is negative...")
        
    print(f"The number {num} is positive! And its square root is {sqrt(num)}.")

In [16]:
raise_when_negative(9)

The number 9 is positive! And its square root is 3.0.


In [17]:
raise_when_negative(-9)

ValueError: The number -9 is negative...

---
There is also a special way to *raise* a specific type of error, `AssertionError`, by using the `assert` keyword. This is commonly used in unit testing to compare outputs with expected results.\
Using `assert` in production code is generally discouraged because Python can be run in an optimized mode (the `-O` flag) that skips `assert` statements. In practice this depends on the code base and the practices adopted in the project (e.g. `assert` being used exclusively for debugging purposes).

In [18]:
assert False == True

AssertionError: 

In [19]:
assert 1 == "1", "Integer is not equal to String"

AssertionError: Integer is not equal to String
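
For checks that must always run, an explicit `raise` is the safer choice, since it does not depend on how Python is started. A minimal sketch, where `set_discount` is a hypothetical function made up for this illustration:

```python
def set_discount(percent: float) -> float:
    """Return the discount as a fraction (hypothetical helper, for illustration only)."""
    # An explicit check keeps working even when assert statements are skipped.
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be between 0 and 100, got {percent}")
    return percent / 100


print(set_discount(25))  # 0.25
# set_discount(150) would raise a ValueError
```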

### 1.3.2. Try/Except
To handle exceptions, we use the `try` and `except` statements. If you have experience with other programming languages, this is the same concept as the popular try/catch statements.

The logic is to *try* to do something; if an exception occurs, the `except` block does something else instead.

__Careful__ with `except`: once an exception is caught, the program carries on unless you re-raise the exception within the block. While this lets you keep the program going and *skip* an error, you must think about whether that is OK for your use-case!

In [20]:
num = 6
for i in (3, 2, 1, 0, -1, -2, -3):
    try:
        print(f"{num} / {i} = {num/i}")
    except Exception:
        print(f"Cannot divide {num} with {i}")

6 / 3 = 2.0
6 / 2 = 3.0
6 / 1 = 6.0
Cannot divide 6 with 0
6 / -1 = -6.0
6 / -2 = -3.0
6 / -3 = -2.0


While you can leave the `except` bare of any exception type, __it is not recommended!__ Here is an example of why this is such a bad idea:

```python
from time import sleep  # import sleep function from time module 

if __name__ == "__main__":
    while True:  # always running until a certain condition
        try:
            print("Hi")
            sleep(3)  # wait 3 seconds
        except:
            print("Bye")
```

This code __will not exit__ even when you issue a `Ctrl-C`, and you will need to kill the terminal window to exit. When you press `Ctrl-C`, Python raises a `KeyboardInterrupt` inside the program, which is caught by the bare `except` block. Once there, it prints the word *Bye* and immediately goes back to the loop.

One way to get around the previous example is to use `except Exception:`, since `KeyboardInterrupt` does not inherit from `Exception`.

Leaving a bare `Exception` in the `except` block is also advised against for two main reasons:
- It is less readable because *any* `Exception` will trigger it
- Because any exception can trigger it, every error is handled in the same way, whatever its cause
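
The difference between a bare `except:` and `except Exception:` can be seen without pressing `Ctrl-C`, by raising `KeyboardInterrupt` manually. A small sketch (both function names are made up for this illustration):

```python
def interrupted_with_bare_except() -> str:
    try:
        raise KeyboardInterrupt  # simulate the user pressing Ctrl-C
    except:  # bare except, shown only to demonstrate the problem
        return "swallowed"


def interrupted_with_except_exception() -> str:
    try:
        raise KeyboardInterrupt
    except Exception:
        return "swallowed"  # never reached: KeyboardInterrupt is not an Exception


print(interrupted_with_bare_except())  # swallowed

try:
    interrupted_with_except_exception()
except KeyboardInterrupt:
    print("KeyboardInterrupt escaped the function, so the program could exit")
```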

In [21]:
def divide_by_square_root(num: int):
    try:
        sr = sqrt(num)
        print(f"Num: {num}; SquareRoot: {sr}")
        print(f"Result: {num / sr}")
    except Exception:
        print("An error occurred!")

In [22]:
divide_by_square_root(0)  # Notice how we can still calculate the square root of 0. It only errors out when dividing!

Num: 0; SquareRoot: 0.0
An error occurred!


In [23]:
divide_by_square_root(-10) # Immediate problem when computing square root

An error occurred!


#### 1.3.2.1. Catching Separate Exceptions
We can specify several `except` blocks that deal with different errors. We can leave a generic `except Exception` at the end to catch any errors we did not expect (which might warrant extra investigation).

In [24]:
def informative_divide_by_square_root(num: int):
    try:
        sr = sqrt(num)
        print(f"Num: {num}; SquareRoot: {sr}")
        print(f"Result: {num / sr}")
    except ZeroDivisionError:
        print("Tried to divide by zero!")
    except ValueError:
        print("math.sqrt received a negative number")
    except Exception:
        print("An error occurred!")

In [25]:
informative_divide_by_square_root(0)

Num: 0; SquareRoot: 0.0
Tried to divide by zero!


In [26]:
informative_divide_by_square_root(-10)

math.sqrt received a negative number
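
The order of the `except` blocks matters: Python checks them from top to bottom and runs only the first one that matches. A short sketch (the function names are hypothetical) showing why the generic `Exception` belongs at the end:

```python
def generic_first(a: float, b: float) -> str:
    try:
        a / b
    except Exception:
        return "generic handler"
    except ZeroDivisionError:  # unreachable: the block above already matches
        return "specific handler"
    return "no error"


def specific_first(a: float, b: float) -> str:
    try:
        a / b
    except ZeroDivisionError:
        return "specific handler"
    except Exception:
        return "generic handler"
    return "no error"


print(generic_first(1, 0))   # generic handler
print(specific_first(1, 0))  # specific handler
```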


#### 1.3.2.2. Catching Known Groups of Exceptions
We can also catch several known exceptions together and deal with them in the same way.

In this example we show:
- How to group two or more exceptions in the same `except` block to have the same handling
- How to get the exception variable with `as <name>`, where `<name>` is the variable name given by you

We are not really going to delve deeper into the exception variable, but rest assured that there is a plethora of stuff regarding how to handle exceptions with it.

In [27]:
def grouped_divide_by_square_root(num: int):
    try:
        sr = sqrt(num)
        print(f"Num: {num}; SquareRoot: {sr}")
        print(f"Result: {num / sr}")
    except (ZeroDivisionError, ValueError):  # both ZeroDivisionError AND ValueError are caught in this except statement
        print(f"This method does not make mathematical sense with num={num}!")
    except Exception as exc: # we can also use "as <name>" to catch the Exception as a variable
        print(f"An error occurred (type={type(exc)}; msg='{exc}')")

In [28]:
grouped_divide_by_square_root(0)

Num: 0; SquareRoot: 0.0
This method does not make mathematical sense with num=0!


In [29]:
grouped_divide_by_square_root(-10)

This method does not make mathematical sense with num=-10!


In [30]:
grouped_divide_by_square_root("30")

An error occurred (type=<class 'TypeError'>; msg='must be real number, not str')
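
As a small taste of what the exception variable offers, the sketch below reads its class name and `args`, and uses `raise ... from` to wrap it in a more descriptive error while keeping the original attached as `__cause__`. The function `parse_age` is a hypothetical example.

```python
def parse_age(text: str) -> int:
    try:
        return int(text)
    except ValueError as exc:
        print(type(exc).__name__)  # ValueError
        print(exc.args)            # tuple holding the original message
        # Chain a more descriptive error while keeping the original one attached.
        raise ValueError(f"{text!r} is not a valid age") from exc


try:
    parse_age("abc")
except ValueError as exc:
    print(exc)                  # 'abc' is not a valid age
    print(repr(exc.__cause__))  # the original ValueError raised by int()
```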


### 1.3.3. Beyond Try/Except
Let's be perfectly clear: 99% of your potential use-cases will warrant using `try/except` blocks and nothing else. The reasoning is that we want to do X, but if X has a problem we want to gracefully exit or deal with whatever happened.\
More often than not, when you get an `Exception` you want to log it somewhere and then either move on or exit the program.

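```python
import logging

logger = logging.getLogger(__name__)

try:
    do_something()  # placeholder for any operation that might fail
except Exception:
    logger.exception("Something went wrong")  # logs the message plus the traceback
    raise  # re-raise the same exception with its original traceback
```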

We have seen that we can have multiple `except` statements, but now let us look at something that is a little less used, the `else` and `finally` statements within a `try/except` block.

In [31]:
def extended_try_except(num: int):
    try:
        result = num / sqrt(num)
    except ZeroDivisionError:
        print("Could not compute result because we tried to divide by zero.")
    except ValueError:
        print("Could not compute result because the number given is not a valid input for math.sqrt()")
    else:
        print(f"Success! Result is {result}")
    finally:
        print("Exiting function.")

In [32]:
extended_try_except(0)

Could not compute result because we tried to divide by zero.
Exiting function.


In [33]:
extended_try_except(-1)

Could not compute result because the number given is not a valid input for math.sqrt()
Exiting function.


In [34]:
extended_try_except(4)

Success! Result is 2.0
Exiting function.


---
What we did in this example was:
- Tried to divide the given number by its square root
- If the square root was zero, we handled the ZeroDivisionError
- If the given number was not valid for `math.sqrt()`, we handled the ValueError
- If there was __no error__ then we handled the rest of the function using the `else` block
- In any case, we would print a message at the end of the function, regardless of what happened, using the `finally` block

So we introduced two statements:
- `else` (in the try/except context) allows us to specify what to do when the `try` block succeeds
- `finally` allows us to always have a final statement that runs at the end of the try/except
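
One detail worth seeing in code: `finally` runs even when an `except` or `else` block exits the function through `return`. A small sketch with a hypothetical function that records the order of events:

```python
events = []


def safe_divide(a: float, b: float):
    try:
        result = a / b
    except ZeroDivisionError:
        events.append("except")
        return None
    else:
        events.append("else")
        return result
    finally:
        events.append("finally")  # runs before the function actually returns


print(safe_divide(6, 3), events)  # 2.0 ['else', 'finally']
events.clear()
print(safe_divide(6, 0), events)  # None ['except', 'finally']
```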

We will later see a real example of this when explaining how to properly deal with files using the manual approach.

## 1.4. Bonus Section: Custom Exceptions
Since exceptions are just classes that inherit from `Exception`, we can also create our own!

But why would we want to do this? 
- To deal with different application errors in their own way (e.g. several situations that would all raise a `ValueError`, but that you want to handle differently)
- To increase readability of your code, i.e. someone that reads it knows *why* something will break
- To get better control of the exception itself

In [35]:
class MyCustomException(Exception):  # all we need is inherit from Exception
    pass

In [36]:
raise MyCustomException("The code is broken!")

MyCustomException: The code is broken!

In [37]:
class MyCustomExceptionWithMessage(Exception):
    def __init__(self):
        super().__init__("The code has broken again!")

In [38]:
raise MyCustomExceptionWithMessage()

MyCustomExceptionWithMessage: The code has broken again!

In [39]:
class MyCustomExceptionWithVariableMessage(Exception):
    def __init__(self, msg_string: str):
        super().__init__(f"The code has broken again because of: {msg_string}")

In [40]:
raise MyCustomExceptionWithVariableMessage("insert better reason")

MyCustomExceptionWithVariableMessage: The code has broken again because of: insert better reason

##### Real Example

In [41]:
# OK let's give you a real example

class InvalidAge(Exception):
    def __init__(self, age: int):
        super().__init__(f"Cannot have negative age! Received: {age}")

def get_birth_year(age: int):
    if age < 0:
        raise InvalidAge(age)

    print(f"Person was probably born in {2024-age}")

In [42]:
get_birth_year(30)

Person was probably born in 1994


In [43]:
get_birth_year(-12)

InvalidAge: Cannot have negative age! Received: -12
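
Custom exceptions can also carry data and share a common base class, which lets callers handle a whole family of application errors at once. A sketch under assumed names (`AppError`, `InvalidAgeError` and `check_age` are hypothetical and not part of the example above):

```python
class AppError(Exception):
    """Base class for all errors of a hypothetical application."""


class InvalidAgeError(AppError):
    def __init__(self, age: int):
        super().__init__(f"Cannot have negative age! Received: {age}")
        self.age = age  # keep the offending value available to the handler


def check_age(age: int) -> int:
    if age < 0:
        raise InvalidAgeError(age)
    return age


try:
    check_age(-12)
except AppError as exc:  # catches InvalidAgeError and any other AppError subclass
    print(f"{type(exc).__name__}: {exc} (age={exc.age})")
```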

# 2. File Handling

When dealing with files and similar use-cases, it is common to have to work around *paths*. You can imagine the path as being the address that your computer or program will use to find something within its file system.

There are two kinds of paths: *absolute* and *relative*.

**Absolute Paths** are paths that start from the root of your file system. On Windows they tend to start with the drive letter (e.g. `C:`), while on Unix-based systems they start with `/`, and they go through all the different directories until they reach the specified file or directory.\
Examples: 
- Windows -> `C:\Users\<name>\Documents\Projects\prep-course-2024\Week 06\SLU11 - String & File Handling\media\text.jpeg`
- Unix (WSL) -> `/mnt/c/Users/<name>/Documents/Projects/prep-course-2024/Week 06/SLU11 - String & File Handling/media/text.jpeg`

**Relative Paths** are paths *relative to* another directory. That means that instead of going through all the different directories, they start at a certain point and go from there. For example, `media/text.jpeg` will assume the directory of this notebook and look for a folder named `media` that contains an image called `text.jpeg`.

It is also possible to move backwards in relative paths by using `..`. To visit the previous SLU, you would use `cd ../"SLU10 - OOP Inheritance"` to go back to the `Week 06` folder and into `SLU10 - OOP Inheritance`.

In fact, when we use relative paths from the current directory, we are technically using `.` (e.g. `media/text.jpeg` = `./media/text.jpeg`). For the most part, both Python and the terminal are able to infer the `.` by default, which is why we tend not to use it as often.

**TL;DR:** *absolute* paths contain the entire path from the file system root to the specified destination, while *relative* paths contain the path starting from a certain (often, the current) directory. Most problems when working with paths stem from having an incorrect view of the overall file structure.
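
These ideas can be checked from Python without touching any real files. The sketch below uses the "pure" path classes from `pathlib` and `posixpath.normpath()`, which only manipulate text and therefore behave the same on any operating system. The paths are illustrative and `name` is a placeholder user.

```python
import posixpath
from pathlib import PurePosixPath, PureWindowsPath

# Absolute paths start at the root (Unix) or at a drive letter (Windows).
print(PurePosixPath("/mnt/c/Users/name/media/text.jpeg").is_absolute())  # True
print(PureWindowsPath(r"C:\Users\name\media\text.jpeg").is_absolute())   # True
print(PurePosixPath("media/text.jpeg").is_absolute())                    # False

# "." is the current directory and ".." is its parent.
same_dir = posixpath.normpath("./media/text.jpeg")
one_up = posixpath.normpath("Week 06/SLU11/../SLU10/notebook.ipynb")
print(same_dir)  # media/text.jpeg
print(one_up)    # Week 06/SLU10/notebook.ipynb
```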

## 2.1. Opening/Closing

<center><img src="./media/files.png" width="500"/></center>

### 2.1.1. The Bad Way
First, let's define what we mean by the *bad way*.

In many cases you can do something in several different ways with similar or identical results. That does not make every way equally good: some are *better* than others.

Opening files is one of those things that can be done correctly, or incorrectly, from a *best practice* point-of-view.\
We will briefly go through the incorrect way:

In [44]:
file = open("document.txt")  # naively open file

contents = file.read()  # read contents as a single string

file.close()  # close file manually

print(contents)

maria
ate an apple
pie for breakfast


__Explanation:__ So why was this so bad?

Two main reasons:
- It relies on the implicit default mode of `open()`, which is READ-ONLY, instead of stating the mode explicitly
- You have to manually close the file

> But why is manually closing the file so bad?

The problem here is that if there is a problem between opening and closing the file, you might not handle the closure properly. You potentially risk that the file ends up corrupted, particularly when writing/creating a file. 

To properly close the file, you would have to handle potential errors happening between opening and closing:

In [45]:
file = open("document.txt")

try:
    contents = file.read()
except Exception:
    # handle exception
    raise
else:
    print(contents)
finally:
    file.close()

maria
ate an apple
pie for breakfast


### 2.1.2. The Good Way
Writing the previous example for handling errors between opening and closing files is tiresome, to say the least. You essentially have to write an entire block of code to merely read/write the contents of a file.

Python, however, provides a clear and concise way of reducing this by using the `with` statement.

The `with` statement works with any *context manager*, and the file object returned by `open()` is one. We do not need to know much about context managers beyond the fact that they have an entry method and an exit method; the `with` statement calls both for us, so the file is closed even if an error occurs inside the block.\
Using the `with` statement, we can streamline working with files:

In [46]:
with open("document.txt", "r") as file:
    contents = file.read()

print(contents)

maria
ate an apple
pie for breakfast
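
To make the entry and exit methods less mysterious, here is a toy context manager. The class `Announcer` is hypothetical and written only for this illustration; the methods `__enter__` and `__exit__` are the ones that the `with` statement calls.

```python
class Announcer:
    """A toy context manager that records when it is entered and exited."""

    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append("enter")
        return self  # this value is bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("exit")  # runs even if the block raised an exception
        return False  # do not suppress the exception


announcer = Announcer()
try:
    with announcer as a:
        a.log.append("inside")
        raise RuntimeError("something went wrong inside the block")
except RuntimeError:
    pass

print(announcer.log)  # ['enter', 'inside', 'exit']
```

File objects behave in the same way: their exit method closes the file.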


## 2.2. Reading/Writing Files

Before we proceed, let's quickly look at the file opening `mode`, which is an argument of `open()`.

There are three main *modes*:
- `r` (READ)
- `w` (WRITE)
- `a` (APPEND)

The difference between write (`w`) and append (`a`) is that `w` will __overwrite__ the file contents if they exist!

Adding `+` to a mode allows both reading and writing:
- `r+` (READ + WRITE)
- `w+` (WRITE + READ)
- `a+` (APPEND + READ)

---

| Mode | Description                          | File Pointer Position | Creates File if Not Exists | Truncates Existing File |
|------|--------------------------------------|-----------------------|----------------------------|-------------------------|
| r    | Read-only                            | Beginning of the file | No                         | No                      |
| r+   | Read and write (updating)            | Beginning of the file | No                         | No                      |
| w    | Write-only (overwrite or create)     | Beginning of the file | Yes                        | Yes                     |
| w+   | Write and read (overwrite or create) | Beginning of the file | Yes                        | Yes                     |
| a    | Append-only (append or create)       | End of the file       | Yes                        | No                      |
| a+   | Append and read (append or create)   | End of the file       | Yes                        | No                      |

*File Pointer Position: where operation begins*
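
The difference between `w` and `a` is easy to verify. A minimal sketch that works inside a temporary directory so that no real files are touched (the file name `modes_demo.txt` is made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")

    with open(path, mode="w") as f:  # creates the file
        f.write("first\n")

    with open(path, mode="a") as f:  # append: existing content is kept
        f.write("second\n")

    with open(path, mode="r") as f:
        after_append = f.read()

    with open(path, mode="w") as f:  # write: existing content is discarded
        f.write("third\n")

    with open(path, mode="r") as f:
        after_write = f.read()

print(repr(after_append))  # 'first\nsecond\n'
print(repr(after_write))   # 'third\n'
```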

### 2.2.1. Text Files
The simplest kind of file: its content is read as a string. This works for most files that contain text, such as *.txt*, *.py*, etc.

In [47]:
# Reading
with open("document.txt", mode="r") as f:
    print(f.read())

maria
ate an apple
pie for breakfast


In [48]:
# Writing
with open('new_document.txt', mode='w') as f:
    f.write("I just learned how to deal with files in python")

---
__The Cursor:__ When operating with files, we must be aware of the concept of the cursor. It is the position in the file where the next read or write will start. When you read a file using `read()` for the first time, the cursor starts at the beginning of the file and finishes at the end. That means that *if you call `read()` again*, the cursor will already be at the __end__ of the file and there is nothing left to read.

You can imagine then, that when you are appending something to a file, the cursor is placed at the *end*, from which it will start writing the new content.

To move the cursor ourselves, we can use the `seek()` method, although this is relatively uncommon.

In [49]:
with open("document.txt", mode="r") as f:
    content = f.read() # start cursor at beginning and place it at the end
    
    new_contents = f.read() # starts cursor at the end -> nothing left to read

print(new_contents) # won't print anything because second read() will have the cursor at the end of the file




In [50]:
with open("document.txt", mode="r") as f:
    content = f.read()
    f.seek(0) # tell the cursor to get to the beginning of the file
    new_contents = f.read()

print(new_contents) # now we will have read the file twice

maria
ate an apple
pie for breakfast


___
__Other Reading Methods:__ There are other reading methods that we can use. All of them start wherever the *cursor* currently is.

- `readlines()` reads __all__ remaining lines from a file and places them in a list
- `readline()` reads one line at a time, using the cursor as its placeholder

You will notice that the lines end with the newline character (`\n`), except possibly the last one. This is normal and expected. The same happens with `read()`, but since it returns a single string, `print()` simply renders the newlines as line breaks.

In [51]:
# readlines() reads all lines into a list
with open("document.txt", mode="r") as f:
    content = f.readlines()

print(content)

['maria\n', 'ate an apple\n', 'pie for breakfast']


In [52]:
# Readline will use the cursor to get one line at a time (i.e, each call will read the subsequent line)
with open("document.txt", mode="r") as f:
    first_line = f.readline()
    second_line = f.readline()
    third_line = f.readline()

print(first_line)

print(second_line)

print(third_line)

maria

ate an apple

pie for breakfast


In [53]:
# As you can probably imagine, we can send the cursor back to the beginning to have readline() read the first line again
with open("document.txt", mode="r") as f:
    first_line = f.readline()
    f.seek(0)
    second_line = f.readline()

print(first_line)

print(second_line)

maria

maria
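
A file object can also be looped over directly, which reads one line per iteration, and `str.rstrip("\n")` removes the trailing newline. The sketch below first creates its own temporary file (the name `lines_demo.txt` is made up) with the same three lines used in the examples above:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lines_demo.txt")

    with open(path, mode="w") as f:
        f.write("maria\nate an apple\npie for breakfast")

    lines = []
    with open(path, mode="r") as f:
        for line in f:  # each iteration reads the next line from the cursor position
            lines.append(line.rstrip("\n"))  # drop the trailing newline, if any

print(lines)  # ['maria', 'ate an apple', 'pie for breakfast']
```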



### 2.2.2. Binary Files (pickle example)
Some files are not meant to be read as text. Python represents their raw content as `bytes` objects, written as literals prefixed by the character `b`, in which non-printable bytes appear as hexadecimal escapes. They are generally not readable by humans (e.g. `b"\x00\x01\x00\xc0\x01\x00\x00\x00\x04"`).

Similar to the usage of `+` to extend permissions, we can add `b` to the mode to tell `open()` that the file should be treated as containing bytes. Bytes show up in Python in certain messaging protocols (e.g. Kafka), when storing older machine learning models (we have since moved to more framework-specific methods), and when storing objects.

The example below will show you how to write and read a list of `Student` objects that we define. We will store them using the `pickle` module, which implements binary protocols for serializing and deserializing a Python object structure.

__Warning:__ NEVER use pickle to deserialize data that you do not trust! Loading a pickle can execute arbitrary code.

In [54]:
class Student:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age
        
    def __str__(self):
        return f"Student(name={self.name}, age={self.age})"

student_list = [
    Student(name="John", age=20), 
    Student(name="Mary", age=25),
]

In [55]:
import pickle

In [56]:
with open("StudentList.pkl", "wb") as file:
    pickle.dump(student_list, file)

In [57]:
with open("StudentList.pkl", "rb") as file:
    contents = pickle.load(file)

for student in contents:
    print(student)  # we are going to make use of the __str__ method to pretty print the class

Student(name=John, age=20)
Student(name=Mary, age=25)


---
What happens if you try to read a BYTES file as a TEXT file?

In [58]:
# You will see here that just reading the file as if it were a text file will not work
with open("StudentList.pkl", "r") as file:
    contents = file.read()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
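
Going the other way does work: any file can be opened in binary mode, in which case `read()` returns a `bytes` object that has to be decoded to become a string. A small sketch using a temporary file (the name `bytes_demo.txt` is made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bytes_demo.txt")

    with open(path, mode="w", encoding="utf-8") as f:
        f.write("olá")  # the "á" takes two bytes in UTF-8

    with open(path, mode="rb") as f:
        raw = f.read()

print(type(raw))            # <class 'bytes'>
print(raw)                  # b'ol\xc3\xa1'
print(raw.decode("utf-8"))  # olá
```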

### 2.2.3. JSON Files
Believe it or not, most valid JSON files can be read straight into Python dictionaries/lists using the `json` module. Keep in mind that JSON strings must use double quotes, so text written with single quotes (as Python prints dictionaries) is not valid JSON.

In [59]:
test_dict = {
    "year": 2024,
    "students": [
        {"name": "John", "age": 20},
        {"name": "Mary", "age": 25},
    ]
}

In [60]:
import json

In [61]:
with open("students.json", "w") as file:
    json.dump(test_dict, file)

In [62]:
with open("students.json", "r") as file:
    contents = json.load(file)

print(type(contents))

print(contents)

print(contents["students"])

<class 'dict'>
{'year': 2024, 'students': [{'name': 'John', 'age': 20}, {'name': 'Mary', 'age': 25}]}
[{'name': 'John', 'age': 20}, {'name': 'Mary', 'age': 25}]


---
What happens if you try to read a JSON file as a TEXT file?

In [63]:
# While you can read a JSON file like a text file, you will get a STRING back, instead of a DICTIONARY
with open("students.json", "r") as file:
    contents = file.read()

print(type(contents))

print(contents)

<class 'str'>
{"year": 2024, "students": [{"name": "John", "age": 20}, {"name": "Mary", "age": 25}]}
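
The `json` module also works on strings through `json.dumps()` and `json.loads()`, which makes the quoting rules easy to see: the text produced by `str()` on a dictionary uses single quotes and is rejected as JSON. A short sketch with made-up data:

```python
import json

data = {"name": "John", "age": 20}

as_json = json.dumps(data)          # serialize to a JSON string
print(as_json)                      # {"name": "John", "age": 20}
print(json.loads(as_json) == data)  # True: the round trip gives back an equal dict

as_python_text = str(data)          # {'name': 'John', 'age': 20} (single quotes)
try:
    json.loads(as_python_text)
except json.JSONDecodeError as exc:
    print(f"Not valid JSON: {exc}")
```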


## 2.3. Bonus Section: Working with Pathlib

As you have seen in the past few examples, working with files involves a lot of strings representing the paths to the files. During your developer journey, you will work with more complex file structures that require some help from modules (you will learn about these later on).

Working with paths is a quick way to start having problems. Sometimes it is because a function expects a path in a certain format; other times it is because we provided an incorrect path.

The `os` module has been the traditional go-to tool for dealing with paths and files. However, as of Python 3.6, it has largely been superseded for path handling by the `pathlib` module, namely by its `Path` structure.

<center><img src="./media/best_practices.png" width="500"/></center>

In [64]:
# some imports to get our tools ready
import os
import os.path

from pathlib import Path

### 2.3.1. The Path Structure
When working with paths, the `os` module will work exclusively with *strings*. But we are working with paths and files, not strings!\
Sure, we can use strings to represent paths but then we have to pass these strings to other functions that will actually do the work of decoding the strings. 

The `Path` structure encapsulates (i.e., bundles together) a lot of path-specific functionality, allowing for cleaner, more readable, and more maintainable code. One of the __key__ aspects of Python is that the code is supposed to be *as readable as possible*!

In [65]:
Path.cwd()

PosixPath('/mnt/c/Users/Nuno/Documents/Projects/ds-prep-course-instructors-2024/Week 06/SLU11 - String & File Handling')
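
To give a feel for the path-specific functionality mentioned above, here is a small sketch of a few attributes that path objects expose. It uses `PurePosixPath`, a sibling of `Path` that only manipulates the text of a path and never touches the disk, so the output is the same on every operating system; `Path` offers the same attributes. The path itself is hypothetical and does not need to exist.

```python
from pathlib import PurePosixPath

# Hypothetical path, used only to illustrate the attributes
example = PurePosixPath("/home/student/notes/lesson.txt")

print(example.name)    # lesson.txt  (final component)
print(example.stem)    # lesson      (name without the extension)
print(example.suffix)  # .txt        (the extension)
print(example.parent)  # /home/student/notes
print(example.parts)   # ('/', 'home', 'student', 'notes', 'lesson.txt')
```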

One of the easiest ways to make a mistake when working with paths is when you try to build them as strings. The naive way of doing things is to simply join things as a string, such as with the `str.join()` method we learned earlier. The other, more traditional way is to use the `os.path` module to have a function that, in theory, is specifically geared to work with paths.

Let's take a look at an example using both of the aforementioned techniques and compare them with `Path`.

__Example Statement:__ 
We want to join three path elements:
- `hello`
- `world/`
- `file.txt`

These elements represent a *parent* directory, a *child* directory, and a filename.\
We want, therefore, to receive the full path `hello/world/file.txt`.

---
__Using str.join()__

We will start by calling `str.join()` on the `/` separator string, passing it all the elements in a list.

In [66]:
path = "/".join(["hello", "world/", "file.txt"])
print(path)

hello/world//file.txt


Oh dear, it seems we have an extra `/` between the last two elements! We might have just introduced a bug in our code by providing the wrong path...\
*Note* that simply removing every `/` from the elements (such as the one in `world/`) will not solve all cases, because one of the elements might need it!

In [67]:
edge_case_elements = ["src/hello", "world/", "file.txt"]

path = "/".join([element.replace("/", "") for element in edge_case_elements]) # beyond just making code more complex, it is also wrong

print(path) # we will have wrongly removed the / from src/hello

srchello/world/file.txt


---
__Using os.path.join()__

In [68]:
path = os.path.join("hello", "world/", "file.txt")
print(path)

hello/world/file.txt


But now if we wanted to get the path to the parent directory of the file (let's say you received multiple lists of paths and have to automate things), then we would have to play around with string/list manipulation:

In [69]:
split_path = path.rsplit("/", maxsplit=1)[0]
print(split_path)

hello/world
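
For completeness, `os.path` also ships helpers for this task, so the manual split is not the only string-based option. The sketch below assumes a POSIX-style system (forward slashes), as in the outputs shown in this notebook.

```python
import os.path

path = os.path.join("hello", "world/", "file.txt")

# dirname returns everything before the last separator
parent_dir = os.path.dirname(path)
print(parent_dir)  # hello/world

# basename returns the final component
file_name = os.path.basename(path)
print(file_name)   # file.txt

# split returns both pieces at once, still as plain strings
head, tail = os.path.split(path)
print(head, tail)  # hello/world file.txt
```

Note that every result is still a plain string, so each further step needs another `os.path` function call.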


---
__Using Path__

Since `Path` is __specifically__ tailored to paths, it has a whole host of functionality encapsulated in itself.

Here we will even use the `/` operator, which allows you to build paths piece by piece, similar to how you can concatenate lists or strings with `+`. The `/` operator has been overloaded in the `Path` structure to join a `Path` (hence `Path("hello")`) to a string, returning another `Path`. This saves a lot of trouble compared to working exclusively with strings.

In [70]:
path = Path("hello") / "world/" / "file.txt"
print(path)

hello/world/file.txt


And now let's try the example of getting the parent directory of `file.txt`:

In [71]:
print(path.parent)

hello/world
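
Beyond `.parent`, a few other `Path` attributes and methods cover tasks that would otherwise need string slicing. The sketch below is illustrative only; `.as_posix()` is used when printing so that the output uses forward slashes on any operating system.

```python
from pathlib import Path

path = Path("hello") / "world" / "file.txt"

# .parents lists every ancestor, nearest first
ancestors = [p.as_posix() for p in path.parents]
print(ancestors)           # ['hello/world', 'hello', '.']

# with_suffix / with_name build sibling paths without string slicing
backup = path.with_suffix(".bak")
print(backup.as_posix())   # hello/world/file.bak

renamed = path.with_name("other.txt")
print(renamed.as_posix())  # hello/world/other.txt

# joinpath is the method form of the / operator
same = Path("hello").joinpath("world", "file.txt")
print(same == path)        # True
```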


### 2.3.2. Improving Readability

By using the `Path` structure we might actually improve our readability. Here's why:

- Using the `os` module to work with paths might force you to actually use two modules from it: `os` and `os.path`.
- The `os` module functions work primarily with strings and return strings
  - This means that you must *feed* the output of one function to another
  - Which means that you need to read the code from the inside out in order to understand what is happening
- Using `os.path` can make your code very *verbose* (i.e., lots of text)

In [72]:
base_dir = os.path.dirname(os.path.dirname(os.path.abspath(".")))

base_dir_contents = os.listdir(base_dir) # notice we use os, not os.path, which might add to the confusion

print(base_dir)

/mnt/c/Users/Nuno/Documents/Projects/ds-prep-course-instructors-2024


To read the `base_dir` code we have to start in the inner-most portion and work our way out:
- `"."` is the relative path to the current directory
- `os.path.abspath(".")` gives us the absolute path to the current directory
- `os.path.dirname(os.path.abspath("."))` gives us the parent directory of the absolute path we got
- `os.path.dirname(os.path.dirname(os.path.abspath(".")))` gives us the parent directory of the previous directory

To add to the complication, let's list the names of the contents of the final directory. This, however, is achieved with `os.listdir()`, which lives outside of `os.path` and makes things more confusing.

In [73]:
for content in base_dir_contents:
    print(content)

.git
.github
.gitignore
.python-version
assets
LICENSE
README.md
requirements.txt
venv
Week 00
Week 01
Week 02
Week 03
Week 04
Week 05
Week 06
Week 07
Week 08
Week 09
Week 10
Week 11


---
Now let us use `Path` to solve the exact same problem.

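Unlike the functions in `os` and `os.path`, which return strings, `Path` methods and attributes such as `.absolute()` and `.parent` return other `Path` objects. This means we can chain them and write our code sequentially, readable from left to right.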

In [74]:
base_dir = Path().absolute().parent.parent
base_dir_contents = base_dir.iterdir()

print(base_dir)

/mnt/c/Users/Nuno/Documents/Projects/ds-prep-course-instructors-2024


From left to right, we have:

- `Path()` which defaults to `Path(".")`
- `Path().absolute()` which gets the absolute path of the result of `Path()`
- `Path().absolute().parent` gets the parent path of the absolute path
- `Path().absolute().parent.parent` gets the parent of the parent of the absolute path
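
As a quick sanity check, the sketch below compares the nested `os.path` version with the chained `Path` version. Both start from the current working directory, so they should point at the same location. The variable names `nested` and `chained` are chosen here just for illustration.

```python
import os
import os.path
from pathlib import Path

# Inside-out: read from the innermost call outwards
nested = os.path.dirname(os.path.dirname(os.path.abspath(".")))

# Left-to-right: each step returns a new Path
chained = Path().absolute().parent.parent

# A Path built from the string compares equal to the chained result
print(Path(nested) == chained)

# os.fspath() converts a Path back to a plain string when one is needed
print(os.fspath(chained) == nested)
```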

Instead of calling another module, we will be able to list the contents using functions encapsulated in the `Path` structure (here we use `Path.name` to emulate the response of `os.listdir()`, otherwise we would get the absolute paths of each).

In [75]:
for content in base_dir_contents:
    print(content.name)

.git
.github
.gitignore
.python-version
assets
LICENSE
README.md
requirements.txt
venv
Week 00
Week 01
Week 02
Week 03
Week 04
Week 05
Week 06
Week 07
Week 08
Week 09
Week 10
Week 11
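
To try these tools without depending on your own folder layout, here is a self-contained sketch that builds a small temporary directory and explores it with `Path`. The file and folder names are made up for the example. It also shows that `open()` accepts a `Path` directly, so everything from the file-handling sections still applies.

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Create a folder and two files (hypothetical names)
    (base / "Week 01").mkdir()
    (base / "README.md").write_text("# demo\n")
    (base / "students.json").write_text(json.dumps({"year": 2024}))

    # iterdir() yields entries in arbitrary order, so sort for stable output
    names = sorted(entry.name for entry in base.iterdir())
    print(names)       # ['README.md', 'Week 01', 'students.json']

    # is_dir() lets us keep only the directories
    dirs = [entry.name for entry in base.iterdir() if entry.is_dir()]
    print(dirs)        # ['Week 01']

    # glob() filters entries by pattern
    json_files = [entry.name for entry in base.glob("*.json")]
    print(json_files)  # ['students.json']

    # open() accepts a Path, no conversion to string needed
    with open(base / "students.json", "r") as file:
        contents = json.load(file)
    print(contents)    # {'year': 2024}
```

The temporary directory and everything in it are deleted automatically when the `with` block ends.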


## Wrapping Up

Phew! Looks like we are done learning for this SLU.

This notebook focused primarily on teaching you two things:
- What `Exceptions` are, and how to deal with them
- How to read/write files in different formats

As a bonus, we also briefly introduced the `pathlib` module, which offers a wide range of tools specifically for working with paths.

Now that you have gone through both notebooks, go ahead and jump into the Exercise notebooks where you will put what you learned into practice!