<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

# What to expect in this chapter

important to ensure code works regardless of OS  
also important if the code is intended to write to a file

# 1 Important concepts

## 1.1 Path

can be thought of as a directory, or a guide to a specific location  
there is an **absolute path** to every file in the computer  
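- to find the absolute path of the current folder, `!dir` (Windows) lists the folder together with its absolute path, while `!pwd` (macOS/Linux) or `os.getcwd()` print the path directly (no administrative access is needed)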

a more convenient way to define a path is using a **relative path**, which is given relative to the current folder (e.g. just the filename, if the two files are located in the same directory)
- however, moving either file out of the directory requires redefining the relative path, otherwise it will not work
- relative paths are more portable: as long as the folder structure is kept, they work on different machines even though the absolute paths differ
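
a minimal sketch of the difference between the two kinds of path, assuming a hypothetical file called `data.txt` (it does not need to exist for this to run):

```python
import os

# the current working directory: relative paths are resolved against this folder
cwd = os.getcwd()

# 'data.txt' is a hypothetical filename used only for illustration
relative_path = 'data.txt'

# os.path.abspath() turns the relative path into an absolute one
absolute_path = os.path.abspath(relative_path)

print(cwd)
print(absolute_path)
```

the second line printed should be the first one with `data.txt` added at the end.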

## 1.2 More about relative paths

`.` refers to this folder, `..` refers to one folder above (up)  
two folders up: `../../filename`  
three folders up: `../../../filename`  
this works only if the file structure is consistent (i.e. it will not work in a different computer with a different file structure, unless the entire folder/directory is shared)  

to go 1 folder up from the current folder in a notebook: `%cd ..` (`!cd ..` runs in a temporary subshell, so the change does not persist)
- 2 folders `%cd ../..`
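
a small sketch of how `..` moves up a path, without changing folders at all. the folder names (`project`, `notebooks`, `week1`) are made up for illustration:

```python
import os

# hypothetical folder layout, built with os.path.join so it works on any OS
start = os.path.join('project', 'notebooks', 'week1')

# os.path.normpath() simplifies a path by resolving the '..' parts
one_up = os.path.normpath(os.path.join(start, '..'))
two_up = os.path.normpath(os.path.join(start, '..', '..'))

print(one_up)
print(two_up)
```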

### macOS or Linux

`~` refers to the home folder of the current user (not the desktop) in macOS/Linux shells; in Python, `os.path.expanduser('~')` resolves it on any OS
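
a quick sketch of resolving `~` from Python. the result depends on the machine and user, so no output is shown here:

```python
import os

# expanduser() replaces '~' with the current user's home folder
home = os.path.expanduser('~')

# a hypothetical file inside the home folder, for illustration only
notes_path = os.path.join(home, 'notes.txt')

print(home)
print(notes_path)
```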

## 1.3 Path separator

there are different path separators for Windows (`\`) and macOS/Linux (`/`)  
try to write code in a way that can work in both systems (not hardcoding)

remember for Windows, the backslash needs to be escaped (or part of the filepath could be mistaken for an escape sequence, e.g. `\n` or `\t`)
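
a short sketch of the escaping problem, using a made-up Windows path (`C:\new_folder\table.txt`) chosen because it contains both `\n` and `\t`:

```python
# escaped backslashes: each '\\' is one real backslash
escaped = 'C:\\new_folder\\table.txt'

# raw string: the r prefix stops Python from interpreting escapes
raw = r'C:\new_folder\table.txt'

# unescaped: '\n' becomes a newline and '\t' becomes a tab, so the path is wrong
unescaped = 'C:\new_folder\table.txt'

print(escaped == raw)      # True
print(unescaped == raw)    # False
```

building the path with `os.path.join` avoids typing the separator at all.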

## 1.4 Text files vs. Binary files

text file: the contents are encoded characters, so they are readable when opened in almost any software (e.g. .txt, .csv)
- note that Jupyter Notebooks (.ipynb) are text files, but a software (Jupyter Notebook) renders them to make them more readable

binary file: every file is ultimately stored as a series of 1s and 0s, but in a binary file these do not represent plain text. only when opened in an appropriate program that understands the format of the binary file will it show its intended content
- e.g. can be used for proprietary formats
- text files can be clunky/big (each character is stored as a number), while binary formats can store more data in less space (e.g. by compressing patterns). binary files generally parse faster as well
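
a small sketch of "each character is stored as a number": write three letters as text, then read the same file back as raw bytes. the file `demo.txt` is a throwaway created in a temporary folder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')

    # write as a text file
    with open(path, 'w', encoding='utf-8') as file:
        file.write('abc')

    # read the same file as raw bytes ('rb' = read binary)
    with open(path, 'rb') as file:
        raw_bytes = file.read()

print(raw_bytes)        # b'abc'
print(list(raw_bytes))  # [97, 98, 99]
```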

## 1.5 Extensions

file extensions (usually preceded by `.`) tell the OS how to handle the file (e.g. what associated program to open it with)
- this underpins default programs for certain file extensions (e.g. Excel for .xlsx files)
- they hint at the file format, and therefore at the correct reading method (e.g. decompression algorithm) to open the file with

# 2 Opening and closing files

## 2.1 Reading data

```
# open the file for reading; 'with' closes it automatically afterwards
with open('filename.ext', 'r') as file:
    print(file.read())
```
opens a file and prints its contents using a context manager (`with`)
- `'r'` opens the file for reading (as a text file by default; binary files need `'rb'`), and `.read()` then reads the entire file
    - there are other options, deal with them as you encounter them
- `'filename.ext'` should be replaced with the actual file and its extension
- by default, a bare filename is looked up in the current working directory (usually the same directory as the code file)

normally, opening a file opens a stream to the file to read/write data  
the stream has to be closed using `file.close()`, otherwise written data may be lost or the file may be corrupted  
however, this is not required when using `with` to open files, as it closes the file for you (even if an error occurs)
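
a sketch comparing the two approaches, assuming a throwaway file `notes.txt` in a temporary folder. the `try`/`finally` around the manual version is my addition, so that the file is closed even if reading fails:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'notes.txt')
    with open(path, 'w') as file:
        file.write('hello')

    # manual version: the stream must be closed explicitly
    file = open(path, 'r')
    try:
        manual_contents = file.read()
    finally:
        file.close()
    closed_manually = file.closed

    # 'with' version: the stream is closed when the block ends
    with open(path, 'r') as file:
        with_contents = file.read()
    closed_by_with = file.closed

print(manual_contents, closed_manually)  # hello True
print(with_contents, closed_by_with)     # hello True
```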

## 2.2 Writing data

### Writing to a file in one go

```
# 'w' creates filename.txt (or overwrites it) and writes the text into it
with open('filename.txt', 'w') as file:
    file.write('text to write')
```
this creates a new text file `filename.txt` and writes the argument in `.write()` into the file
- the `'w'` represents writing
- if the file doesn't exist, this creates a new file named `filename.txt`
- if the file exists, this **overwrites** the existing content in the file
    - use case: continual updating of a file
    - file writing tends to be the slowest part in a code

on the other hand, `'a'` instead **appends** the argument in `.write()` to the end of the contents in the file
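
a sketch of the difference between `'w'` and `'a'`, using a throwaway file `log.txt` in a temporary folder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'log.txt')

    # 'w' twice: the second write replaces the first
    with open(path, 'w') as file:
        file.write('first\n')
    with open(path, 'w') as file:
        file.write('second\n')
    with open(path, 'r') as file:
        after_overwrite = file.read()

    # 'a': the new text is added after the existing content
    with open(path, 'a') as file:
        file.write('third\n')
    with open(path, 'r') as file:
        after_append = file.read()

print(repr(after_overwrite))  # 'second\n'
print(repr(after_append))     # 'second\nthird\n'
```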

### Writing to a file, line by line

using `.splitlines()` splits a string (e.g. the contents of a file) at its line breaks into separate entries in a list  
`.writelines()` can then write the separate entries into a file (it does not add line breaks itself, so each entry needs its own `\n`)
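
a minimal sketch of this, with made-up text and a throwaway file `lines.txt` in a temporary folder:

```python
import os
import tempfile

text = 'alpha\nbeta\ngamma'

# split the string at its line breaks
lines = text.splitlines()
print(lines)  # ['alpha', 'beta', 'gamma']

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'lines.txt')

    # writelines() writes each entry as-is, so the '\n' is added here
    with open(path, 'w') as file:
        file.writelines(line + '\n' for line in lines)

    with open(path, 'r') as file:
        round_trip = file.read()

print(repr(round_trip))  # 'alpha\nbeta\ngamma\n'
```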

# 3 Some useful packages

importing only needs to be done once per kernel (conventionally at the top of the file)
- in the backend, if Python has already imported the package in this kernel, importing it again does nothing, so repeated imports only add code clutter

```
import os
import glob
import shutil
```

# 4 OS safe paths

`os.path.join` joins the parts of a file path (given as separate arguments) using the separator of whichever OS the code is run in (Windows/macOS/Linux etc.), so the path is OS-agnostic
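
a short sketch, with a hypothetical file `notes.txt` inside a `people/Ringo` folder:

```python
import os

# each folder level is a separate argument; no separator is typed by hand
path = os.path.join('.', 'people', 'Ringo', 'notes.txt')

# the printed separator depends on the OS the code runs on
print(path)
```

on macOS/Linux this should print the path with `/`, and on Windows with `\`.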

# 5 Folders

## 5.1 Creating folders

`os.mkdir()` makes a new folder (directory) with the name specified in the argument (as a string); by default it is created in the current working directory (usually the same directory as the code file)
- the argument specified is case-sensitive
- however, on Windows and (by default) macOS, two folders whose names differ only in capitalisation cannot exist side by side, as the file system treats them as the same name; Linux allows it
- the intended filepath can also be specified manually, or using `os.path.join` to produce the relative filepath

## 5.2 Checking for existence

good practice to use one of these blocks to check whether the folder already exists before creating it

### Using try-except

Avoid using `os.` commands unless you are sure the code works correctly  
Specific errors can be assigned to different `except` branches, in this case `FileExistsError` (raised by `os.mkdir()` when the folder already exists)
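
A minimal sketch of this pattern, run inside a temporary folder so nothing real is touched. The loop tries to create the same folder twice to trigger the error on purpose:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, 'people')
    messages = []

    for attempt in range(2):
        try:
            os.mkdir(folder)
            messages.append('created')
        except FileExistsError:
            # second attempt: the folder is already there
            messages.append('already exists')

print(messages)  # ['created', 'already exists']
```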

### Using os.path.exists()

the function `os.path.exists()` can also be used together with `if-else` to check for pre-existing files or folders at the specified path
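
a minimal sketch of the `if-else` version, again in a temporary folder and again attempting the creation twice:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, 'people')
    messages = []

    for attempt in range(2):
        if os.path.exists(folder):
            # nothing to do: the folder is already there
            messages.append('skipped')
        else:
            os.mkdir(folder)
            messages.append('created')

print(messages)  # ['created', 'skipped']
```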

## 5.3 Copying files

```
for person in ['John', 'Paul', 'Ringo']:
    path_to_destination = os.path.join('people', person)
    shutil.copy('sp2273_logo.png', path_to_destination)
    print(f'Copied file to {path_to_destination}.')
```

By adding `file_to_copy` as a variable, different files can be assigned to this variable (code can be reused to copy other files)  
replace the actual filename in the above code with the variable
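
A sketch of the variable version. To keep it runnable anywhere, it works inside a temporary folder, creates a placeholder file standing in for `sp2273_logo.png`, and uses `os.makedirs(..., exist_ok=True)` to create the destination folders (these three choices are my assumptions, not part of the original code):

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # placeholder standing in for the real image file
    file_to_copy = os.path.join(base, 'sp2273_logo.png')
    with open(file_to_copy, 'wb') as file:
        file.write(b'placeholder')

    copied = []
    for person in ['John', 'Paul', 'Ringo']:
        path_to_destination = os.path.join(base, 'people', person)
        # make sure the destination folder exists before copying into it
        os.makedirs(path_to_destination, exist_ok=True)
        shutil.copy(file_to_copy, path_to_destination)
        copied.append(os.path.exists(
            os.path.join(path_to_destination, 'sp2273_logo.png')))

print(copied)  # [True, True, True]
```

Changing `file_to_copy` is then the only edit needed to copy a different file.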

This can accidentally overwrite data if done repeatedly as well, therefore using a **check for existence** is important to prevent this  
Using `print()` is also a good way to check if the code is working as intended before running a directory-changing `os.` command

Python is always looking in a folder (by default, the folder that the code is running in)  
It can be asked to find a certain file/folder by specifying a relative filepath

# 6 Listing and looking for files

`glob` is a package for searching for files and folders by name (it matches paths against patterns with shell-style wildcards, which are simpler than regular expressions; it does not look inside files)  
`'*'` in a `glob` pattern is used to represent 'anything' (a wildcard)  
- if the argument is `'*.txt'`, it will show all .txt files
- if the argument is `'peo*'`, it will look for files/folders starting with peo
- if the argument is `'*e*'`, it will look for files containing e in their names
- if the argument is `'peo*/*'`, it looks for everything within folders starting with peo (build the pattern with `os.path.join('peo*', '*')` rather than hardcoding the separator)
    - additional directory levels can be further specified with additional `*` components
- these can be combined
 
adding the second argument `recursive=True` makes the pattern `'**'` match any number of nested folders, so folders and their subfolders/contents are returned as separate list entries
- helps to search the entire file structure
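
a sketch of these patterns on a small made-up folder tree (`readme.md`, `people/notes.txt`, `people/John/a.txt`) built inside a temporary folder. only the last part of each match is kept, and the results are sorted because `glob` does not promise an order:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # build the hypothetical folder tree
    os.makedirs(os.path.join(base, 'people', 'John'))
    for parts in [('readme.md',),
                  ('people', 'notes.txt'),
                  ('people', 'John', 'a.txt')]:
        with open(os.path.join(base, *parts), 'w') as file:
            file.write('x')

    # everything directly inside the base folder
    top_level = glob.glob(os.path.join(base, '*'))
    # everything inside folders starting with 'peo'
    in_peo = glob.glob(os.path.join(base, 'peo*', '*'))
    # all .txt files at any depth ('**' needs recursive=True)
    all_txt = glob.glob(os.path.join(base, '**', '*.txt'), recursive=True)

    top_names = sorted(os.path.basename(p) for p in top_level)
    peo_names = sorted(os.path.basename(p) for p in in_peo)
    txt_names = sorted(os.path.basename(p) for p in all_txt)

print(top_names)  # ['people', 'readme.md']
print(peo_names)  # ['John', 'notes.txt']
print(txt_names)  # ['a.txt', 'notes.txt']
```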

# 7 Extracting file info

`os.path.sep` holds the path separator of whichever OS the code is run on, which helps make the code OS-agnostic  
- using `.split()` with the above splits a filepath into separate list entries, one for each directory level
    - the path separator acts as the delimiter for the `.split()` function, regardless of OS
- indexing can be performed on this list to pick out the specific file/sub-directory intended (e.g. `[-1]` gives the deepest entry, usually the file itself)
- filenames should have extensions to tell the OS which program to run it with
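
a short sketch of splitting a path into its levels. the path is built with `os.path.join`, so the same code works on any OS:

```python
import os

path = os.path.join('people', 'Ringo', 'sp2273_logo.png')

# split at the OS-specific separator: one entry per level
parts = path.split(os.path.sep)
print(parts)      # ['people', 'Ringo', 'sp2273_logo.png']

# the last entry is the deepest one, here the file itself
deepest = parts[-1]
print(deepest)    # sp2273_logo.png
```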

fun facts about data formats:
- JPEG files use a lossy compression algorithm: some image detail is discarded to make the file smaller
- BMP/TIF files typically store images pixel by pixel (more pixels = higher resolution, but larger files)
- PDFs can store vector graphics, where shapes are stored as mathematical descriptions (they look the same even with zoom/other viewing)

`os.path.split()` breaks the specific file name from the rest of the filepath in the argument  
`os.path.splitext()` breaks the extension from the filename specified in the filepath in the argument  
`os.path.dirname()` returns only the directory leading to the file specified in the filepath in the argument
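
a sketch of the three functions on the same example path. the folder part is compared against `os.path.join('people', 'Ringo')` rather than printed, since its separator depends on the OS:

```python
import os

path = os.path.join('people', 'Ringo', 'sp2273_logo.png')

# split(): (everything before the last separator, the last part)
folder, filename = os.path.split(path)

# splitext(): (name without extension, extension including the dot)
stem, extension = os.path.splitext(filename)

# dirname(): only the folder part, same as the first item from split()
directory = os.path.dirname(path)

print(filename)               # sp2273_logo.png
print(stem, extension)        # sp2273_logo .png
print(directory == folder)    # True
```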

# 8 Deleting stuff

`os.remove()` is the default delete function that deletes the file specified in its argument  
`os.rmdir()` can be used to delete entire folders specified in the argument, but it has a failsafe error that will be thrown if the folder isn't empty  
`shutil.rmtree()` deletes the folder together with all of its contents, without asking for confirmation

deletion with these functions is permanent: the affected files/folders are not moved into the Recycle Bin
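
a sketch of the three delete functions, run entirely inside a temporary folder so that nothing real is deleted. the folder and file names are made up:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, 'people')
    os.mkdir(folder)
    file_path = os.path.join(folder, 'notes.txt')
    with open(file_path, 'w') as file:
        file.write('x')

    # rmdir() refuses to delete a folder that still has contents
    try:
        os.rmdir(folder)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # delete the file first, then the (now empty) folder
    os.remove(file_path)
    os.rmdir(folder)
    removed_empty = not os.path.exists(folder)

    # rmtree() deletes a folder and everything inside it in one go
    os.makedirs(os.path.join(folder, 'John'))
    shutil.rmtree(folder)
    removed_tree = not os.path.exists(folder)

print(rmdir_failed, removed_empty, removed_tree)  # True True True
```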