<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

# What to expect in this chapter

# 1 Important concepts

## 1.1 Path

A **path** is a way to specify a location on a computer. Like an address, it tells you (and the computer) how to find a particular file or folder.

A path can be defined absolutely or relatively. An **absolute path** is the full *address* of a file or folder, starting from the top of the file system (e.g. `C:\` on Windows or `/` on macOS and Linux). A **relative path** gives the position of a file or folder with respect to another location, usually the current working directory (the folder your code is running in).
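A minimal sketch of the difference, using only the `os` module: `os.getcwd()` returns the current working directory that relative paths are measured from, and `os.path.abspath()` turns a relative path into an absolute one. The path `data-files/data-01.txt` is a hypothetical example and does not need to exist.

```python
import os

relative = os.path.join('data-files', 'data-01.txt')   # measured from the current working directory
absolute = os.path.abspath(relative)                   # current working directory + relative path

print(os.getcwd())               # the folder Python is currently working in
print(relative)                  # data-files/data-01.txt on macOS/Linux, data-files\data-01.txt on Windows
print(absolute)                  # the full address, starting from the top of the file system
print(os.path.isabs(relative))   # False
print(os.path.isabs(absolute))   # True
```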

## 1.2 More about relative paths

With relative paths, we need to know the `.` and `..` (dot) notation.

The dot notation sets the **reference point** from which the rest of a relative path is measured.

|**Notation**|**Meaning**|**Example**|
|:--:|:--:|:--:|
|`.`|'this folder'|`.\data-files\data-01.txt` means `data-01.txt` in the `data-files` folder inside the current folder.|
|`..`|'the folder above'|`..\data-files\data-01.txt` means `data-01.txt` in the `data-files` folder inside the folder one level above.|
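Python can resolve the dot notation for us: `os.path.normpath()` collapses `.` and `..` by pure string manipulation (it does not check whether the folders exist). A minimal sketch, where `project` and `notebooks` are hypothetical folder names:

```python
import os

# hypothetical path: go into project/notebooks, step back up with '..', then into data-files
path = os.path.join('project', 'notebooks', '..', 'data-files', '.', 'data-01.txt')

print(path)                    # project/notebooks/../data-files/./data-01.txt (with \ on Windows)
print(os.path.normpath(path))  # project/data-files/data-01.txt (with \ on Windows)
```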

### macOS or Linux

macOS and Linux allow the use of the `~` notation to refer to our home directory (i.e. the main folder associated with a particular user on a computer).

For example, `~/Downloads/file.txt` will access `file.txt` in the `Downloads` folder in the home directory.
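Note that Python's `open()` does not expand `~` by itself; it would look for a folder literally named `~`. A minimal sketch using `os.path.expanduser()`, which replaces `~` with the home directory (it also works on Windows). The file `file.txt` is hypothetical and does not need to exist.

```python
import os

# expand '~' into the home directory before using the path with open()
path = os.path.expanduser(os.path.join('~', 'Downloads', 'file.txt'))
print(path)   # the full path to Downloads/file.txt inside your home directory
```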

## 1.3 Path separator

A **path separator** is the character used to separate the components of a path. Different operating systems use different path separators, and the overall structure of a path also differs between them.

|**OS**|**Path separator**|**Path to desktop**|
|:--:|:--:|:--:|
|Windows|`\`|`C:\Users\Warren\Desktop`|
|macOS or Linux|`/`|`/Users/Warren/Desktop`|

If we want code involving paths to work on all operating systems, we must not **hardcode** the path separators.

## 1.4 Text files vs. Binary files

All files on a computer can be grouped into two broad categories: `text files` and `binary files`.

`Text files` are files whose contents can be easily read by almost any software (e.g. Jupyter, Notepad, etc.). Examples of `text file` formats are `.txt`, `.md` and `.csv`.

`Binary files` are files that need to be processed before they can be read properly. For instance, opening a `.png` file in its raw form shows a string of bytes that does not make sense to us. Also, some `binary files` (e.g. programs) only run on a specific OS.
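Python's `open()` makes the same distinction: text mode (`'r'`) decodes the file into a string, while binary mode (`'rb'`) returns the raw bytes. A minimal sketch, assuming a temporary folder created with `tempfile.TemporaryDirectory()` and a hypothetical file `hello.txt`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:       # throwaway folder, deleted afterwards
    path = os.path.join(folder, 'hello.txt')
    with open(path, 'w') as file:
        file.write('Hello')

    with open(path, 'r') as file:                     # text mode: gives a str
        as_text = file.read()
    with open(path, 'rb') as file:                    # binary mode: gives bytes
        as_bytes = file.read()

print(as_text)    # Hello
print(as_bytes)   # b'Hello'
```

For a real image such as `download.png`, reading the first 8 bytes in `'rb'` mode would give the PNG signature `b'\x89PNG\r\n\x1a\n'` rather than readable text.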

## 1.5 Extensions

Files are usually named in the form `name.extension`. The **extension** (here `.extension`) helps the computer decide which software should open the file. For example, `.xlsx` will prompt Microsoft Excel to open the file.

Be careful when changing a file's extension: renaming does not change the file's contents, so the software chosen by the new extension may fail to read the file or raise an error.
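A small sketch of why: renaming a file with `os.rename()` changes only its name, not its contents. The file names `data.csv` and `data.xlsx` are hypothetical, and a temporary folder stands in for a real one.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, 'data.csv')
    xlsx_path = os.path.join(folder, 'data.xlsx')

    with open(csv_path, 'w') as file:
        file.write('x,y\n1,2\n')        # plain comma-separated text

    os.rename(csv_path, xlsx_path)      # only the name (and extension) changes

    with open(xlsx_path, 'r') as file:
        content = file.read()

print(content == 'x,y\n1,2\n')   # True: still plain text, not a real Excel workbook
```

Excel would likely complain when opening `data.xlsx`, because the contents do not match the format its extension promises.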

# 2 Opening and closing files

## 2.1 Reading data

We can read data from a file using a `with` statement combined with the `open` function. The `with` statement closes the file automatically once its indented block finishes.

The syntax for `open` is `open('file name', 'mode')`. The `mode` decides what we can do with the opened file, e.g. `'r'` (read only, the default), `'w'` (write, which overwrites any existing content) and `'a'` (append, which adds to the end of the file).
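A minimal sketch contrasting `'w'` and `'a'`, assuming a temporary folder and a hypothetical file `log.txt`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'log.txt')

    with open(path, 'w') as file:   # 'w' creates the file (or empties an existing one)
        file.write('first\n')
    with open(path, 'w') as file:   # 'w' again: the old content is erased
        file.write('second\n')
    with open(path, 'a') as file:   # 'a' adds to the end, keeping existing content
        file.write('third\n')

    with open(path, 'r') as file:
        content = file.read()

print(content)   # prints 'second' and 'third' on separate lines; 'first' was overwritten
```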

For example, we open a file called `spectrum-01.txt`.

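```python
with open('spectrum-01.txt', 'r') as file:   # the file is closed automatically when the with block ends
    file_content = file.read()               # read the whole file as one string

print(file_content)
```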

```
Light Intensity, Ch A vs Actual Angular Position, Run #4
Actual Angular Position (  )	Light Intensity, Ch A ( % max )
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.3
0.000	-0.2
0.000	-0.2
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.004	-0.1
0.010	-0.2
0.018	-0.2
0.024	-0.3
0.029	-0.3
0.033	-0.3
0.036	-0.2
0.039	-0.1
0.043	-0.1
0.047	-0.1
0.053	-0.1
0.060	-0.1
0.066	-0.1
0.069	-0.1
0.073	-0.1
0.076	-0.1
0.079	-0.1
0.081	-0.1
0.082	-0.1
0.083	-0.2
0.083	-0.2
0.086	-0.2
0.090	-0.2
0.095	-0.2
0.100	-0.3
0.103	-0.3
0.104	-0.2
0.105	-0.3
0.107	-0.2
0.110	-0.2
0.115	-0.1
0.122	-0.2
0.128	-0.1
0.134	-0.2
0.139	-0.1
0.144	-0.2
0.150	-0.2
0.157	-0.2
0.164	-0.2
0.170	-0.3
0.175	-0.3
0.180	-0.2
0.185	-0.2
0.191	-0.1
0.195	-0.1
0.198	-0.2
0.201	-0.1
0.204	-0.2
0.206	-0.2
0.208	-0.3
0.210	-0.3
0.213	-0.1
0.217	0.3
0.222	0.6
0.226	0.2
0.230	0.0
0.233	-0.1
0.235	-0.1
0.237	
```
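`.read()` returns the whole file as one string. For data files like this one, `.readlines()` is often handier because it returns a list of lines. A minimal sketch, assuming a temporary file `sample-spectrum.txt` that holds a shortened header and three data rows copied from the output above:

```python
import os
import tempfile

# a shortened header plus three rows copied from the output above (tab-separated)
sample = 'Actual Angular Position\tLight Intensity\n0.000\t-0.2\n0.001\t-0.1\n0.004\t-0.1\n'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample-spectrum.txt')
    with open(path, 'w') as file:
        file.write(sample)

    with open(path, 'r') as file:
        lines = file.readlines()              # list of lines, each still ending in '\n'

rows = [line.strip().split('\t') for line in lines[1:]]   # skip the header, split each row at the tab

print(len(lines))   # 4
print(rows)         # [['0.000', '-0.2'], ['0.001', '-0.1'], ['0.004', '-0.1']]
```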

Note that opening a file that does not exist in `'r'` mode raises a `FileNotFoundError`. Only the writing modes (such as `'w'` and `'a'`) create the file if it does not already exist.
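In `'r'` mode a missing file raises `FileNotFoundError`, while `'w'` creates it. A minimal sketch, assuming a temporary folder and a hypothetical file name `does-not-exist.txt`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'does-not-exist.txt')

    try:
        with open(path, 'r') as file:   # 'r' never creates a file
            file.read()
    except FileNotFoundError:
        print('Reading failed: the file does not exist.')

    with open(path, 'w') as file:       # 'w' creates the missing (empty) file
        pass
    created = os.path.exists(path)

print(created)   # True
```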

## 2.2 Writing data

We can write something into a file with Python. Let's try writing the following text.

```python
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'
```

There are two methods to do so.

### Writing to a file in one go

```python
with open('my-text-once.txt', 'w') as file:   # 'w' creates the file, or overwrites it if it already exists
    file.write(text)                          # write the whole string in one call
```

With this method, the file `my-text-once.txt` is created in the current working directory and `text` is written to it. Note that the `mode` for the `open` function here is `'w'`, for writing; if the file already exists, its previous contents are erased.

To write, we use the `.write()` method of the file object `file`.

### Writing to a file, line by line

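```python
with open('my-text-lines.txt', 'w') as file:
    for line in text.splitlines():   # .splitlines() splits the text into a list of lines at line endings (e.g. \n), removing them
        file.write(line + '\n')      # write one line at a time, adding back the line ending removed by .splitlines()
```

This approach calls `.write()` once per line, so it is usually slower than writing everything in one go, especially for large texts. Its advantage is that each line can be processed (e.g. modified or skipped) before it is written.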
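Python also has a `.writelines()` method, which writes every string from a list (or other iterable) in one call. It does not add line endings, so we add `'\n'` ourselves. A minimal sketch, assuming a temporary folder and hypothetical lines of text:

```python
import os
import tempfile

lines = ['first line', 'second line', 'third line']   # hypothetical content

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'my-lines.txt')
    with open(path, 'w') as file:
        # .writelines() does not add line endings, so add '\n' to each line ourselves
        file.writelines(line + '\n' for line in lines)

    with open(path, 'r') as file:
        content = file.read()

print(content.splitlines() == lines)   # True
```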

# 3 Some useful packages

The table below lists some useful Python packages.

|**Package**|**Used for**|
|:--:|:--:|
|`os`|creating, modifying and deleting folders, and writing OS-agnostic* code|
|`glob`|searching for files and folders|
|`shutil`|copying and moving files and folders, and deleting folders with their contents|

*OS-agnostic code is code that is designed to run on different operating systems without modification.

# 4 OS safe paths

Let's say we want to open a file `im-a-file.txt` inside a folder `im-a-subfolder`, which is itself inside a folder `im-a-folder`. We can use the `os.path.join` function from the `os` package to build the path, and then use `with` and `open` to access the contents of the file.

```python
import os

path = os.path.join('.', 'im-a-folder', 'im-a-subfolder', 'im-a-file.txt')
print(path)
```

```
.\im-a-folder\im-a-subfolder\im-a-file.txt
```


In the above example, the output is `.\im-a-folder\im-a-subfolder\im-a-file.txt` because the code was run on a Windows computer. On macOS or Linux, `os.path.join` automatically uses the `/` separator instead, giving `./im-a-folder/im-a-subfolder/im-a-file.txt`. This is why the same code works on all operating systems.

# 5 Folders

## 5.1 Creating folders

We can create folders using the `os.mkdir` function from the `os` package. This is useful because a few lines of code can set up a whole folder structure to organise our data.

Let's say we need to store data about the SPS courses SP1111, SP2222 and SP3333. We can make a folder for each course and keep its data in that folder.

```python
os.mkdir('SPS_courses')                           # create the parent folder for all the course folders

sp_code = ["SP1111", "SP2222", "SP3333"]
for course in sp_code:
    path = os.path.join('SPS_courses', course)    # build the path for each course folder
    os.mkdir(path)                                # create the course folder at that path
```

Because the folder `SPS_courses` already existed when this cell was run, Python raised the following error:

```
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'SPS_courses'
```

## 5.2 Checking for existence

Python raises an error if we try to create a folder that already exists. We can avoid this in two ways, shown below: handle the error, or check whether the folder exists first.

### Using try-except

```python
sp_code = ["SP1111", "SP2222", "SP3333"]
for course in sp_code:
    path = os.path.join('SPS_courses', course)
    try:
        os.mkdir(path)
    except FileExistsError:                  # catch only the "folder already exists" error
        print(f"{path} already exists.")
```

```
SPS_courses\SP1111 already exists.
SPS_courses\SP2222 already exists.
SPS_courses\SP3333 already exists.
```


### Using os.path.exists()

The function `os.path.exists` returns a Boolean value (`True` or `False`) indicating whether a file or folder already exists at the given path.

```python
sp_code = ["SP1111", "SP2222", "SP3333"]
for course in sp_code:
    path = os.path.join('SPS_courses', course)
    if os.path.exists(path):
        print(f"{path} already exists")
    else:
        os.mkdir(path)
```

```
SPS_courses\SP1111 already exists
SPS_courses\SP2222 already exists
SPS_courses\SP3333 already exists
```
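A third option is `os.makedirs()` with `exist_ok=True`: it creates any missing parent folders and silently skips folders that already exist, so no check is needed. A minimal sketch, assuming a temporary folder in place of the current directory:

```python
import os
import tempfile

sp_code = ["SP1111", "SP2222", "SP3333"]

with tempfile.TemporaryDirectory() as root:
    for _ in range(2):                                  # run twice to show that repeats are harmless
        for course in sp_code:
            path = os.path.join(root, 'SPS_courses', course, 'images')
            os.makedirs(path, exist_ok=True)            # creates SPS_courses, the course folder and images if missing

    created = sorted(os.listdir(os.path.join(root, 'SPS_courses')))

print(created)   # ['SP1111', 'SP2222', 'SP3333']
```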


## 5.3 Copying files

We can copy files using the `shutil.copy` function from the `shutil` package. The syntax is `shutil.copy(source, destination)`, where `source` is the path of the file to copy and `destination` is the folder (or full file path) to copy it to.

Let's say we have an image `download.png` that we want to copy to each course folder.

```python
import shutil

sp_code = ["SP1111", "SP2222", "SP3333"]
for course in sp_code:
    path = os.path.join("SPS_courses", course)
    shutil.copy('download.png', path)             # copy the image into the course folder
    print(f"Successfully copied to {path}.")
```

```
Successfully copied to SPS_courses\SP1111.
Successfully copied to SPS_courses\SP2222.
Successfully copied to SPS_courses\SP3333.
```


For this to work, the file `download.png` must be in the current working directory. For a Jupyter notebook, this is usually the folder containing the notebook; `os.getcwd()` shows which folder it is.
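`shutil.copy()` accepts either a folder or a full file path as the destination and returns the path of the new copy; the original file stays where it was. A self-contained sketch, assuming a temporary folder and a hypothetical file `notes.txt`:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, 'notes.txt')
    with open(source, 'w') as file:
        file.write('hello')

    target_folder = os.path.join(root, 'SP1111')
    os.mkdir(target_folder)

    copied_to_folder = shutil.copy(source, target_folder)        # keeps the name: .../SP1111/notes.txt
    copied_renamed = shutil.copy(source, os.path.join(target_folder, 'notes-copy.txt'))  # copy under a new name

    names = sorted(os.listdir(target_folder))
    original_still_there = os.path.exists(source)

print(names)                  # ['notes-copy.txt', 'notes.txt']
print(original_still_there)   # True
```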


We can also move files with the `shutil.move` function. The syntax is `shutil.move(current path of file, new path of file)`. Let's say we want to move each `download.png` into a folder called `images` inside its course folder. First, we need to create the `images` folders if they do not exist yet.

```python
for course in sp_code:
    path_images = os.path.join("SPS_courses", course, "images")
    if not os.path.exists(path_images):                  # create the images folder only if it is missing
        os.mkdir(path_images)
    current_image_loc = os.path.join("SPS_courses", course, "download.png")
    new_image_loc = os.path.join("SPS_courses", course, "images", "download.png")
    shutil.move(current_image_loc, new_image_loc)        # move the image into the images folder
```

# 6 Listing and looking for files

If we want to know what files are in a folder, we can use the `glob.glob` function from the `glob` package.

In `glob.glob`, the asterisk `*` is a wildcard that matches any sequence of characters ('anything'). For example, `glob.glob("SP*")` lists the names of folders and files that start with `SP` followed by anything, e.g. `SP1111`, `SP2222`, etc.

```python
import glob

glob.glob("sp*")
```

```
['spectrum-01.txt', 'SPS_courses']
```

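*Note that `glob.glob("sp*")` also matched `SPS_courses` above. This is because the code was run on Windows, where file-name matching is not case-sensitive. On macOS and Linux, `glob` matching is case-sensitive, so the same pattern would not match `SPS_courses` there.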


We can also include a path separator in the pattern to list the names of folders or files inside a folder. Here `/` is used; Python on Windows accepts it too, although the returned paths use `\`.

```python
import glob

glob.glob("SPS_courses/SP*")
```

```
['SPS_courses\\SP1111', 'SPS_courses\\SP2222', 'SPS_courses\\SP3333']
```
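Since a `glob` pattern is just a path string, it can be built with `os.path.join()` so that no separator is hardcoded. A minimal sketch, assuming a temporary folder that recreates the course folders:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    for course in ["SP1111", "SP2222", "SP3333"]:
        os.makedirs(os.path.join(root, 'SPS_courses', course))

    pattern = os.path.join(root, 'SPS_courses', 'SP*')     # no hardcoded / or \
    matches = sorted(glob.glob(pattern))                   # glob does not promise any order, so sort
    names = [os.path.basename(path) for path in matches]

print(names)   # ['SP1111', 'SP2222', 'SP3333']
```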


To list all the contents of a folder (including files, subfolders and the files inside those subfolders), we use the `**` pattern together with the `recursive=True` argument. With `recursive=True`, `**` matches any files and zero or more levels of subfolders.

```python
import glob

glob.glob("SPS*/**", recursive=True)
```

```
['SPS_courses\\',
 'SPS_courses\\SP1111',
 'SPS_courses\\SP1111\\images',
 'SPS_courses\\SP1111\\images\\download.png',
 'SPS_courses\\SP2222',
 'SPS_courses\\SP2222\\images',
 'SPS_courses\\SP2222\\images\\download.png',
 'SPS_courses\\SP3333',
 'SPS_courses\\SP3333\\images',
 'SPS_courses\\SP3333\\images\\download.png']
```


We can add a file extension to the pattern to look for specific file types (e.g. `.png`). Because the pattern uses `**`, it still needs `recursive=True`; without it, `**` behaves like a single `*` and only matches one folder level.

```python
import glob

glob.glob("SPS*/**/*.png", recursive=True)
```

```
['SPS_courses\\SP1111\\images\\download.png',
 'SPS_courses\\SP2222\\images\\download.png',
 'SPS_courses\\SP3333\\images\\download.png']
```
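A minimal sketch of what `recursive=True` changes, assuming a temporary folder with a hypothetical layout `z.png`, `a/y.png` and `a/b/x.png`:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # hypothetical layout: root/z.png, root/a/y.png, root/a/b/x.png
    os.makedirs(os.path.join(root, 'a', 'b'))
    for parts in [('z.png',), ('a', 'y.png'), ('a', 'b', 'x.png')]:
        with open(os.path.join(root, *parts), 'wb'):
            pass                                           # an empty file is enough for matching

    pattern = os.path.join(root, '**', '*.png')
    with_recursive = sorted(os.path.basename(p) for p in glob.glob(pattern, recursive=True))
    without_recursive = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(with_recursive)      # ['x.png', 'y.png', 'z.png']  (** matches zero or more folder levels)
print(without_recursive)   # ['y.png']                    (** acts like *, exactly one folder level)
```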

# 7 Extracting file info

We can extract file and folder names and extensions from a path using simple string methods such as `.split()`.

```python
path = os.path.join('im-a-folder', 'im-a-subfolder', 'im-a-file.txt')
filename = path.split(os.path.sep)[-1]        # split the path at the OS's separator (/ or \) and take the last part
extensionname = filename.split(".")[-1]       # split the file name at '.' and take the last part
print(filename)
print(extensionname)
```

```
im-a-file.txt
txt
```



The `os` package also has specialised functions to achieve the same. See below.

```python
path = os.path.join('im-a-folder', 'im-a-subfolder', 'im-a-file.txt')

print(os.path.split(path))        # splits path into (folder part, file name)
print(os.path.splitext(path))     # splits path into (everything before the extension, extension)
print(os.path.dirname(path))      # gives the folder part of path
```

```
('im-a-folder\\im-a-subfolder', 'im-a-file.txt')
('im-a-folder\\im-a-subfolder\\im-a-file', '.txt')
im-a-folder\im-a-subfolder
```
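The two approaches differ for names without an extension: `.split('.')[-1]` returns the whole name when there is no dot, while `os.path.splitext()` returns an empty extension. `os.path.basename()` is also a shortcut for the last part of a path. A minimal sketch with hypothetical file names:

```python
import os

for name in ['im-a-file.txt', 'archive.tar.gz', 'README']:   # hypothetical file names
    by_split = name.split('.')[-1]          # last piece after the final '.', or the whole name if no '.'
    root, ext = os.path.splitext(name)      # ext keeps the dot, and is '' if there is no extension
    print(f'{name!r}: split -> {by_split!r}, splitext -> {ext!r}')

print(os.path.basename(os.path.join('im-a-folder', 'im-a-subfolder', 'im-a-file.txt')))   # im-a-file.txt
```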


# 8 Deleting stuff

**Removing a file**

We use the `os.remove` function from the `os` package.

```python
os.remove(os.path.join("SPS_courses", "SP1111", "images", "download.png"))   # deletes the file permanently
```


**Removing an empty folder**

We use the `os.rmdir` function from the `os` package. It only works on empty folders; if the folder still contains anything, Python raises an `OSError`.

```python
os.rmdir(os.path.join("SPS_courses", "SP1111", "images"))   # only works because the folder is now empty
```


**Removing a filled folder**

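We use the `shutil.rmtree` function from the `shutil` package. It deletes the folder together with everything inside it, and the deleted items do not go to the Recycle Bin or Trash, so use it with care.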

```python
shutil.rmtree(os.path.join("SPS_courses", "SP3333"))   # deletes the folder and everything in it
```
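A self-contained sketch of the difference between `os.rmdir` and `shutil.rmtree`, assuming a temporary folder and a hypothetical `images` folder containing an empty `download.png`:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, 'images')
    os.mkdir(folder)
    with open(os.path.join(folder, 'download.png'), 'wb'):
        pass                                        # an empty placeholder file

    try:
        os.rmdir(folder)                            # fails: the folder is not empty
    except OSError as error:
        print(f'os.rmdir refused: {error}')

    shutil.rmtree(folder)                           # deletes the folder and everything inside it
    still_there = os.path.exists(folder)

print(still_there)   # False
```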