<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

![](https://imgs.xkcd.com/comics/operating_systems.png)

From [xkcd](https://xkcd.com/)

# What to expect in this chapter

There is no choice; you must always interact with your **operating system (OS)** to get anything done. This particularly applies to programming, as you must communicate with the OS to *create, modify, move, copy, and delete files and directories (folders)*.

This section is devoted to getting you up-to-speed with some Python modules (e.g., `os`, `glob`, `shutil`) that will allow you to execute these necessary actions.

This section will also show you how to write code that will seamlessly run on **both *macOS* and *Windows***.

# 1 Important concepts

**Note:**

- We will touch on some concepts that we need to navigate the OS efficiently.
- We will use the terms **folder** and **directory** interchangeably; they **refer to the same thing**.

## 1.1 Path

When dealing with computers, you will often encounter the term **‘path’**. The path is simply a way to *specify a location on your computer*. It is **like an address**, and if you follow the path, **it will take you to your file or folder**.

Like specifying a location, you can specify your path **absolute**ly or **relative**ly. So, for example, we can specify that ***SPS is located on level 3 of block S16***. However, if we are *already* on Level 5 of S16, we can say, ***go two floors down***. The *former* is an **absolute path**, and the *latter* is **relative**. We have always found it easier to use relative paths, especially if we later want to move our folders about.

**Remember**:

Remember that the path tells us how to find a file or folder and that you can specify it **absolutely** *or* **relatively**.

For example, here is an absolute path to a file on the `Desktop` on a Windows machine.

`C:\Users\Chammika\Desktop\data-01.txt`
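To make the distinction concrete, here is a minimal sketch using `os.path.isabs()` and `os.path.abspath()` from the standard library. The file `data-01.txt` and the folder `data-files` are only illustrative names; they do not need to exist for this to run.

```python
import os

# A relative path: it only makes sense from wherever we currently are.
relative_path = os.path.join('data-files', 'data-01.txt')   # hypothetical file

# Ask Python to turn it into an absolute path, starting from the current folder.
absolute_path = os.path.abspath(relative_path)

print(os.path.isabs(relative_path))   # False
print(os.path.isabs(absolute_path))   # True
print(absolute_path)                  # depends on your current folder
```

The absolute version changes if you run the code from a different folder, which is exactly why relative paths survive moving a project around.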

## 1.2 More about relative paths

When dealing with relative paths, you will find it helpful to know `.` and `..` notation.

| **Notation** | **Meaning**        |
|--------------|--------------------|
| `.`          | 'this folder'      |
| `..`         | 'one folder above' |

So,

- `.\data-files\data-01.txt` means the file `data-01.txt` in the folder `data-files` in the **current** folder.
- `..\data-files\data-01.txt` means the file `data-01.txt` in the folder `data-files` located in the folder **above**.

**Remember**:

Remember `.` means **current folder**, and `..` means **one folder up**.
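If you want to see how Python interprets `.` and `..`, the following sketch may help. It uses `os.path.normpath()` and `os.path.abspath()`; the folder and file names are again just placeholders.

```python
import os

here = os.path.join('.', 'data-files', 'data-01.txt')     # starts in this folder
above = os.path.join('..', 'data-files', 'data-01.txt')   # starts one folder up

# normpath() tidies a path; the leading '.' adds nothing, so it is dropped.
print(os.path.normpath(here))

# abspath() resolves '..' against the current folder.
print(os.path.abspath(above))
print(os.getcwd())   # compare with the line above
```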

### macOS or Linux

macOS and Linux allow you to use `~` to refer to your home directory. So, for example, you can access the `Desktop` in these systems ‘relatively’ with `~/Desktop`. So, somebody can look for a file on their Desktop using:

`~/Desktop/data-01.txt`
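One thing to keep in mind is that Python's `open()` does not expand `~` by itself. A small sketch of one way around this is `os.path.expanduser()`, which also works on Windows. The `Desktop` folder and `data-01.txt` are assumed names used only for illustration.

```python
import os

home = os.path.expanduser('~')   # your home directory on this machine
desktop_file = os.path.join(home, 'Desktop', 'data-01.txt')   # hypothetical file

print(home)
print(desktop_file)
```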

## 1.3 Path separator

Today’s major OSs (Windows, macOS, Linux) offer similar graphical environments. However, one of the **most striking differences** between Windows and macOS (or Linux) is the *path separator*.

Windows uses `\` as the path separator while macOS (or Linux) uses `/`. So, the absolute path to a file on the Desktop on each of these systems will look like this:

| **OS**               | **Absolute path**                       |
|----------------------|-----------------------------------------|
| **Windows**          | `C:\Users\chammika\Desktop\data-01.txt` |
| **macOS (or Linux)** | `/Users/chammika/Desktop/data-01.txt`   |

If you want to share your code and want it to work on both systems, you **must not hardcode** either path separator. Later, we will show how to use the Python `os` package to fix this problem.

## 1.4 Text files vs. Binary files

You can think of all files on your computer as being either ***text files*** or ***binary files***. Text files are **simple and can be opened**, and their contents examined, by almost any **software (e.g., Notepad, TextEdit, Jupyter, …)**. Examples of text file formats are `.txt`, `.md` or `.csv`.

Binary files, in contrast, **require some processing** to make sense of what they contain. For example, if you look at the raw data in a `.png` file, you will see **gibberish**. In addition, some binary files will only run on specific OSs. For example, the `Excel.app` on a Mac will not run on Windows, nor will the `Excel.exe` file run on macOS (or Linux). Some reasons for having ***binary files*** are *speed and size*; ***text files***, though *simple, can get bulky*.
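To get a feel for the difference, here is a small sketch that reads the same file once as text (`'r'`) and once as raw bytes (`'rb'`). It writes a throwaway file called `demo.txt` (an assumed name) into a temporary folder, so nothing in your portfolio is touched.

```python
import os
import tempfile

# A throwaway folder that is deleted automatically at the end.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')     # hypothetical file name

    with open(path, 'w', encoding='utf-8') as file:
        file.write('Hello')

    with open(path, 'r', encoding='utf-8') as file:
        as_text = file.read()                   # a str

    with open(path, 'rb') as file:
        as_bytes = file.read()                  # a bytes object

print(as_text)          # Hello
print(as_bytes)         # b'Hello'
print(list(as_bytes))   # [72, 101, 108, 108, 111]
```

The numbers in the last line are what is actually stored on disk; a text editor simply knows how to turn them back into letters.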

## 1.5 Extensions

Files are usually named to end with an *extension* separated from the name by a `.` like `name.extension`. This `extension` lets the OS know what software or app to use to extract the details in a file. For example, `.xlsx` means use Excel, and `.pptx` means use PowerPoint. Be careful about changing the extension of a file, as it will make your OS cough and throw a fit. If you don’t believe me, try changing a `.xlsx` to `.txt` and double-click the file.

# 2 Opening and closing files

Now, let’s look at how we can open a file for *reading* and *writing*. We will show a slightly advanced but better way of doing this by using the `with` statement (called a **context manager**). First, please download the file [spectrum-01.txt](https://sps.nus.edu.sg/sp2273/docs/python_basics/06_this-n-that/spectrum-01.txt) into the current folder in your Learning Portfolio.

## 2.1 Reading data

Here is what you would typically do to read a text file.

In [3]:
with open('spectrum-01.txt', 'r') as file:
    file_content = file.read()

print(file_content)

Light Intensity, Ch A vs Actual Angular Position, Run #4
Actual Angular Position (  )	Light Intensity, Ch A ( % max )
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.3
0.000	-0.2
0.000	-0.2
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.004	-0.1
0.010	-0.2
0.018	-0.2
0.024	-0.3
0.029	-0.3
0.033	-0.3
0.036	-0.2
0.039	-0.1
0.043	-0.1
0.047	-0.1
0.053	-0.1
0.060	-0.1
0.066	-0.1
0.069	-0.1
0.073	-0.1
0.076	-0.1
0.079	-0.1
0.081	-0.1
0.082	-0.1
0.083	-0.2
0.083	-0.2
0.086	-0.2
0.090	-0.2
0.095	-0.2
0.100	-0.3
0.103	-0.3
0.104	-0.2
0.105	-0.3
0.107	-0.2
0.110	-0.2
0.115	-0.1
0.122	-0.2
0.128	-0.1
0.134	-0.2
0.139	-0.1
0.144	-0.2
0.150	-0.2
0.157	-0.2
0.164	-0.2
0.170	-0.3
0.175	-0.3
0.180	-0.2
0.185	-0.2
0.191	-0.1
0.195	-0.1
0.198	-0.2
0.201	-0.1
0.204	-0.2
0.206	-0.2
0.208	-0.3
0.210	-0.3
0.213	-0.1
0.217	0.3
0.222	0.6
0.226	0.2
0.230	0.0
0.233	-0.1
0.235	-0.1
0.237	

The `open()` function ‘opens’ your file. The `'r'` specifies that I only want to `r`ead from the file. Using `with` frees you from worrying about closing the file after you are done.
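If you want to convince yourself that `with` really closes the file, the sketch below checks `file.closed` inside and outside the block. It also shows a common alternative to `read()`: looping over the file one line at a time. The file `demo.txt` is a small stand-in created in a temporary folder, not the spectrum file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')     # hypothetical stand-in file
    with open(path, 'w') as file:
        file.write('0.000\t-0.2\n0.001\t-0.1\n')

    with open(path, 'r') as file:
        # Looping over a file gives one line at a time.
        lines = [line.rstrip('\n') for line in file]
        print(file.closed)      # False: we are still inside the with block

    print(file.closed)          # True: with closed the file for us

print(lines)
```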

## 2.2 Writing data

Now, let’s write the following into a file.

In [4]:
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'

### Writing to a file in one go

First, let’s write everything in one go.

In [5]:
with open('my-text-once.txt', 'w') as file:
    file.write(text)

You should now have a file `my-text-once.txt` in your directory. You should open it to take a look. By the way, the `'w'` indicates that we are opening the file for `w`riting.

### Writing to a file, line by line

Let's see how to write a line at a time. This is useful when dealing with data generated on the fly. Since we don’t have such data now, we will split the previous text into lines and write them one by one. (The contents of the two files will be slightly different. However, this is not the time to worry about that.)

In [6]:
with open('my-text-lines.txt', 'w') as file:
    for line in text.splitlines():
        # splitlines() drops the line breaks, so add one back per line
        file.write(line + '\n')

Writing to a file is a very slow operation. So, it will slow things down if you do it in a loop.
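One way to act on this advice is to collect the lines in a list first and write them out in a single call. The sketch below does this with made-up `readings` standing in for data generated on the fly; the file name `readings.txt` is also just an assumption.

```python
import os
import tempfile

readings = [0.1 * i for i in range(5)]   # stand-in for data generated on the fly

# Build all the lines in memory first...
lines = [f'{i}\t{value:.1f}' for i, value in enumerate(readings)]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'readings.txt')   # hypothetical file name

    # ...then write them to the file in one go.
    with open(path, 'w') as file:
        file.write('\n'.join(lines) + '\n')

    with open(path, 'r') as file:
        saved = file.read()

print(saved)
```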

# 3 Some useful packages

Let's see how to programmatically *create, copy, and delete files and folders* and *navigate the OS*. We will use the following three packages for these tasks.

| **Package**                                             | **Primarily used for**                                                            |
|---------------------------------------------------------|-----------------------------------------------------------------------------------|
| [os](https://docs.python.org/3/library/os.html)         | To 'talk' to the OS to create, modify, delete folders and write OS-agnostic code. |
| [glob](https://docs.python.org/3/library/glob.html)     | To search for files.                                                              |
| [shutil](https://docs.python.org/3/library/shutil.html) | To copy and move files, and to delete folders along with their contents.          |

These packages are already part of the standard Python library. So you do not have to install them. Let’s import the packages first.

In [7]:
import os
import glob
import shutil

# 4 OS safe paths

Consider a file `data-01.txt` in the sub-directory `sg-data` of the directory `all-data`:

`all-data` -> `sg-data` -> `data-01.txt`

If we want to access `data-01.txt`, all we have to do is:

In [8]:
path = os.path.join('.', 'all-data', 'sg-data', 'data-01.txt')
print(path)

.\all-data\sg-data\data-01.txt


If you are on Windows, you will see:

`.\all-data\sg-data\data-01.txt`

Otherwise, it will be:

`./all-data/sg-data/data-01.txt`

So, `os.path.join()` builds your path with either `/` or `\` as necessary. This means your code will run seamlessly on all of these OSs.

# 5 Folders

## 5.1 Creating folders

You can create a folder programmatically using `os.mkdir()`. This is very useful because you can write a tiny bit of code to quickly organise your data. For example, let’s say we need to store information about the people ‘John’, ‘Paul’ and ‘Ringo’. We can quickly create some folders for this by:

In [14]:
os.mkdir('people')

for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    print(f'Creating {path}')
    os.mkdir(path)

FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'people'

You don’t need the `print()` statement; it is only there to show what is being created.

## 5.2 Checking for existence

Python will complain if you try to run this code twice, saying that the file (yes, Python refers to folders as files) already exists. So, when you create resources, it is a good idea to check if they already exist. There are two ways to do this: use `try-except` with the `FileExistsError` or use `os.path.exists()`.

### Using `try-except`

In [15]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    try:
        os.mkdir(path)
        print(f'Creating {path}')
    except FileExistsError:
        print(f'{path} already exists; skipping creation.')

people\John already exists; skipping creation.
people\Paul already exists; skipping creation.
people\Ringo already exists; skipping creation.


### Using `os.path.exists()`

In [16]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    if os.path.exists(path):
        print(f'{path} already exists; skipping creation.')
    else:
        os.mkdir(path)
        print(f'Creating {path}')

people\John already exists; skipping creation.
people\Paul already exists; skipping creation.
people\Ringo already exists; skipping creation.
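As an aside, the standard library offers a third option that folds the check into the call: `os.makedirs()` with `exist_ok=True`. The sketch below runs inside a temporary folder so that it cannot clash with your own `people` folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    for person in ['John', 'Paul', 'Ringo']:
        path = os.path.join(folder, 'people', person)
        os.makedirs(path, exist_ok=True)   # also creates 'people' if it is missing
        os.makedirs(path, exist_ok=True)   # running it again raises no error

    created = sorted(os.listdir(os.path.join(folder, 'people')))

print(created)   # ['John', 'Paul', 'Ringo']
```

Unlike `os.mkdir()`, `os.makedirs()` creates any missing parent folders along the way, which is convenient but also means a typo in a path can silently create folders you did not intend.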


## 5.3 Copying files

Let's see how to copy files programmatically.

First, make sure there is a copy of the SP2273 logo (`sp2273_logo.png`) in the current folder. Then, we will copy this into the folders we created for ‘John’, ‘Paul’, and ‘Ringo’.

In [18]:
for person in ['John', 'Paul', 'Ringo']:
    path_to_destination = os.path.join('people', person)
    shutil.copy('sp2273_logo.png', path_to_destination)
    print(f'Copied file to {path_to_destination}')

Copied file to people\John
Copied file to people\Paul
Copied file to people\Ringo


Let’s say I want all the images in a sub-folder called `imgs` in each person’s directory. I can do this by first creating the folders `imgs` and then moving the logo file into that folder.

In [19]:
for person in ['John', 'Paul', 'Ringo']:
    # Create folder 'imgs'
    path_to_imgs = os.path.join('people', person, 'imgs')
    if not os.path.exists(path_to_imgs):
        os.mkdir(path_to_imgs)

    # Move logo file
    current_path_of_logo = os.path.join('people', person, 'sp2273_logo.png')
    new_path_of_logo = os.path.join('people', person, 'imgs', 'sp2273_logo.png')

    shutil.move(current_path_of_logo, new_path_of_logo)
    print(f'Moved logo to {new_path_of_logo}')

Moved logo to people\John\imgs\sp2273_logo.png
Moved logo to people\Paul\imgs\sp2273_logo.png
Moved logo to people\Ringo\imgs\sp2273_logo.png
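If you would like to rehearse copying before touching real files, here is a small sketch that works entirely inside a temporary folder. It relies on the fact that `shutil.copy()` returns the path of the new file and leaves the original in place. The file `logo.png` is an assumed stand-in, not the real logo.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # A small stand-in for the logo file.
    source = os.path.join(folder, 'logo.png')
    with open(source, 'wb') as file:
        file.write(b'not a real image')

    # Create the destination folder, including its parent folders.
    path_to_imgs = os.path.join(folder, 'people', 'John', 'imgs')
    os.makedirs(path_to_imgs, exist_ok=True)

    copied_to = shutil.copy(source, path_to_imgs)   # returns the new file's path
    copy_exists = os.path.isfile(copied_to)
    source_still_exists = os.path.isfile(source)    # copying keeps the original

print(os.path.basename(copied_to), copy_exists, source_still_exists)
```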


**For Your Information**

You can do all of this extremely fast using only the terminal and its loop structures. *Just letting you know if you want to explore on your own.*

# 6 Listing and looking for files

If we want to know what files are in a folder, then `glob` makes easy work of this. Let's see how to use it.

Example 1

We use this if we want **all** the files in the current directory.

The `*` is called a **wildcard** and is read as **‘anything’**. So, we are asking `glob` to give us anything in the folder.

In [20]:
glob.glob('*')

['files,_folders_&_os_(need).ipynb',
 'my-text-lines.txt',
 'my-text-once.txt',
 'people',
 'sp2273_logo.png',
 'spectrum-01.txt']

Example 2

In [21]:
glob.glob('peo*')

['people']

We want to refine our search and ask `glob` to give only those files that match the pattern ‘peo’ followed by ‘anything’.

Example 3

We now want to know what is inside the folders that start with `peo`.

In [22]:
glob.glob('peo*/*')

['people\\John', 'people\\Paul', 'people\\Ringo']

Example 4

Now, we want to see the whole, detailed structure of the folder `people`. For this, we need to tell `glob` to search recursively (i.e. dig through **all** sub-directories) by putting `recursive=True`. We must also use two wildcards `**` to say **all ‘sub-directories’**.

In [23]:
glob.glob('people/**', recursive=True)

['people\\',
 'people\\John',
 'people\\John\\imgs',
 'people\\John\\imgs\\sp2273_logo.png',
 'people\\Paul',
 'people\\Paul\\imgs',
 'people\\Paul\\imgs\\sp2273_logo.png',
 'people\\Ringo',
 'people\\Ringo\\imgs',
 'people\\Ringo\\imgs\\sp2273_logo.png']

Example 5

We want only the `.png` files. So, we just need to modify our pattern. We are asking `glob` to go through the whole structure of `people` and show us those files with the pattern ‘anything’`.png`!

In [24]:
glob.glob('people/**/*.png', recursive=True)

['people\\John\\imgs\\sp2273_logo.png',
 'people\\Paul\\imgs\\sp2273_logo.png',
 'people\\Ringo\\imgs\\sp2273_logo.png']
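Two small additions are often useful here. First, the pattern itself can be built with `os.path.join()` so that it is OS safe. Second, `glob` makes no promise about the order of its results, so wrapping the call in `sorted()` gives a predictable list. The sketch below recreates a similar folder structure in a temporary folder with empty stand-in files called `logo.png` (an assumed name).

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    for person in ['Ringo', 'John', 'Paul']:
        path_to_imgs = os.path.join(folder, 'people', person, 'imgs')
        os.makedirs(path_to_imgs)
        with open(os.path.join(path_to_imgs, 'logo.png'), 'wb') as file:
            file.write(b'')   # empty stand-in file

    # Build the pattern with os.path.join() and sort the results.
    pattern = os.path.join(folder, 'people', '**', '*.png')
    found = sorted(glob.glob(pattern, recursive=True))

    # Pick out the person's folder name from each path.
    owners = [os.path.relpath(path, folder).split(os.sep)[1] for path in found]

print(owners)   # ['John', 'Paul', 'Ringo']
```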

# 7 Extracting file info

When dealing with files and folders, you often have to extract the filename, folder or extension. You can do this by simple string manipulation; for example, if we want the filename and extension:

In [25]:
# Build the path with os.path.join() so it uses this OS's separator
path = os.path.join('people', 'Ringo', 'imgs', 'sp2273_logo.png')
filename = path.split(os.path.sep)[-1]
extension = filename.split('.')[-1]
print(filename, extension)

sp2273_logo.png png


`os.path.sep` is the path separator (i.e. `\` or `/`) for the OS. We split the path where the separator occurred and picked the last element in the list. We use a similar strategy for the file extension.

However, if you like, `os` provides some simple functions for these tasks.

In [26]:
path = 'people/Ringo/imgs/sp2273_logo.png'

In [27]:
os.path.split(path)      # Split filename from the rest

('people/Ringo/imgs', 'sp2273_logo.png')

In [28]:
os.path.splitext(path)   # Split extension

('people/Ringo/imgs/sp2273_logo', '.png')

In [29]:
os.path.dirname(path)    # Show the directory

'people/Ringo/imgs'
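These functions can also be combined. As a small sketch, `os.path.basename()` gives just the filename, and `os.path.splitext()` then separates the name from the extension. Note that `splitext()` only splits at the last `.`, as the `archive.tar.gz` example (a made-up name) shows.

```python
import os

path = os.path.join('people', 'Ringo', 'imgs', 'sp2273_logo.png')

filename = os.path.basename(path)              # drop the folders
stem, extension = os.path.splitext(filename)   # split name and extension

print(filename)     # sp2273_logo.png
print(stem)         # sp2273_logo
print(extension)    # .png

# Only the last '.' counts as the start of the extension.
print(os.path.splitext('archive.tar.gz'))      # ('archive.tar', '.gz')
```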

# 8 Deleting stuff

Lastly, let's see how to delete stuff.

If you want to remove a file:

In [30]:
os.remove('people/Ringo/imgs/sp2273_logo.png')

This ***WON'T WORK* with directories**. For an empty directory, use `os.rmdir()`. It raises an error if the directory still has something in it, as it does here:

In [31]:
os.rmdir('people/Ringo')

OSError: [WinError 145] The directory is not empty: 'people/Ringo'

For a directory with files, use `shutil`:

In [32]:
shutil.rmtree('people/Ringo')

It goes without saying that you should **be careful** when using these functions.

Some people have had some **miserable** experiences by accidentally deleting files because they were more enthusiastic than sensible. With great power comes great responsibility, so **use with extreme caution**!
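One cautious habit is to check what a path actually is before deleting it, and to practise in a temporary folder first. The helper `remove_path()` below is a hypothetical function written for this sketch, not part of `os` or `shutil`; it picks `os.remove()` or `shutil.rmtree()` depending on what it finds.

```python
import os
import shutil
import tempfile

def remove_path(path):
    """Delete a file or a folder (with its contents); hypothetical helper."""
    if os.path.isfile(path):
        os.remove(path)
        return 'file removed'
    if os.path.isdir(path):
        shutil.rmtree(path)
        return 'folder removed'
    return 'nothing to remove'

# Practise inside a throwaway folder so no real data is at risk.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, 'people', 'Ringo')
    os.makedirs(target)
    with open(os.path.join(target, 'notes.txt'), 'w') as file:
        file.write('scratch')

    results = [remove_path(os.path.join(target, 'notes.txt')),
               remove_path(target),
               remove_path(target)]

print(results)   # ['file removed', 'folder removed', 'nothing to remove']
```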