<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

# What to expect in this chapter

- communicating with the OS is necessary to create, modify, move, copy, and delete files and directories (folders)
- Python modules that carry out these actions include `os`, `glob`, and `shutil`
- write code that will seamlessly run on both macOS and Windows

# 1 Important concepts

navigate the OS **efficiently**

## 1.1 Path

- `path` is used to specify a location on your computer
- like an address: follow it -- leads to your **file** or folder
- you can specify your path **absolutely** or **relatively**
- For example, I can specify that SPS is located on level 3 of block S16 (**absolute**). However, if I am already on Level 5 of S16, I can say, *go two floors down* (**relative**). 
- may be easier to use relative paths, particularly if you want to move your folders around later on

<div class="alert alert-block alert-success">
<b>Remember:</b> <p>Path tells us how to find a file or folder and that you can specify it absolutely or relatively.</p>


</div>

example of an absolute path to a file on the Desktop on a Windows machine:</br> `C:\Users\Chammika\Desktop\data-01.txt`
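A minimal sketch of how Python tells the two kinds of path apart, using `os.path.isabs()` and `os.path.abspath()` from the standard library. The names `data-files` and `data-01.txt` are placeholders; the file does not need to exist for this to run.

```python
import os

relative_path = os.path.join('data-files', 'data-01.txt')  # placeholder names
absolute_path = os.path.abspath(relative_path)             # anchored at the current folder

print(os.path.isabs(relative_path))   # False
print(os.path.isabs(absolute_path))   # True
print(absolute_path)                  # depends on where you run this
```

The absolute version is simply the current folder with the relative path attached, which is why relative paths keep working when the whole project folder is moved.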

## 1.2 More about relative paths

- involves `.` and `..`

|**Notation**|**Meaning**|
|:---|---:|
|`.`|‘this folder’|
|`..`|‘one folder above’|

- thus:
  - `.\data-files\data-01.txt` means the file `data-01.txt` in the folder `data-files` in the **current folder**
  - `..\data-files\data-01.txt` means the file `data-01.txt` in the folder `data-files` located in the **folder above**

<div class="alert alert-block alert-success">
<b>Remember:</b> <p><code>.</code> means the current folder, and <code>..</code> means one folder up.</p>


</div>
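To see `.` and `..` in action, here is a small sketch using `os.path.normpath()`, which tidies up a path without touching the disk. The folder names are the placeholder names used in these notes; nothing needs to exist.

```python
import os

here = os.path.join('.', 'data-files', 'data-01.txt')
above = os.path.join('..', 'data-files', 'data-01.txt')

# normpath() drops a leading '.' but has to keep a leading '..'
print(os.path.normpath(here))
print(os.path.normpath(above))

# In the middle of a path, '..' cancels the folder that comes before it
tidy = os.path.normpath(os.path.join('all-data', 'sg-data', '..', 'data-01.txt'))
print(tidy)
```

The last line ends up pointing at `data-01.txt` directly inside `all-data`, because `sg-data` followed by `..` means "go in, then come back out".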

### macOS or Linux

macOS and Linux allow you to use `~` to refer to your home directory. So, for example, you can access the Desktop in these systems ‘relatively’ with `~/Desktop`. So, I can look for a file on my Desktop using: `~/Desktop/data-01.txt`
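One thing to watch out for: Python does not expand `~` by itself when you hand it a path. A minimal sketch, assuming you want the shortcut inside Python code, is to use `os.path.expanduser()` from the standard library. The file name is a placeholder.

```python
import os

# expanduser() replaces a leading '~' with your home directory
desktop_file = os.path.expanduser(os.path.join('~', 'Desktop', 'data-01.txt'))
print(desktop_file)   # result depends on your user name and OS
```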


## 1.3 Path separator

Today’s major OSs (Windows, macOS, Linux) offer similar graphical environments. However, one of the most striking differences between Windows and macOS (or Linux) is the **path separator**.

Windows uses `\` as the path separator while macOS (or Linux) uses `/`. So, the absolute path to a file on the Desktop on each of these systems will look like this:

|OS|path name|
|:---|---:|
|Windows|`C:\Users\chammika\Desktop\data-01.txt`|
|macOS (or Linux)|`/Users/chammika/Desktop/data-01.txt`|

If you want to share your code and want it to work on both systems, you must not **hardcode** either path separator. Later, I will show you how to use the Python `os` package to fix this problem.

## 1.4 Text files vs. Binary files

Files on a computer can be either **text files** or **binary files**.

<u>Text files</u> 
- simple; can be opened and read as they are
- contents can be examined by almost any software (e.g., Notepad, TextEdit, Jupyter)
- examples of text file formats: `.txt`, `.md`, `.csv`

<u>Binary files</u> 
- require some processing to make sense of what they contain
- for example, **raw** data in a `.png` file will **appear** like a mess
- some binary files will only **run on a specific OS** (e.g., `Excel.app` on a Mac will not run on Windows, nor will `Excel.exe` run on macOS or Linux)
- some reasons for having binary files are **speed and size**; text files, though simple, can get **too bulky**
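The difference can be sketched in Python by reading the same file twice: once as text and once as raw bytes. The modes `'rb'` and `'wb'` (read/write binary) are standard `open()` modes that these notes have not introduced; the file name and contents are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:      # scratch folder, removed afterwards
    path = os.path.join(folder, 'example.txt')     # hypothetical file name
    with open(path, 'wb') as file:
        file.write(b'caf\xc3\xa9')                 # the UTF-8 bytes for 'café'

    with open(path, 'r', encoding='utf-8') as file:
        as_text = file.read()                      # bytes decoded into characters

    with open(path, 'rb') as file:
        as_bytes = file.read()                     # the raw bytes, untouched

print(len(as_text))    # 4 characters
print(as_bytes)        # b'caf\xc3\xa9'
print(len(as_bytes))   # 5 bytes
```

Even a plain text file is bytes underneath; it only counts as 'text' because those bytes decode cleanly into characters.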

## 1.5 Extensions

Files are usually named to end with an extension separated from the name by a `.` like `name.extension`. This extension lets the OS know what software or app to use to extract the details in a file. For example, a `.xlsx` means use Excel or `.pptx` means use PowerPoint. Be careful about changing the extension of a file, as it will make your OS cough and throw a fit. If you don’t believe me, try changing a `.xlsx` to `.txt` and double-click.

_Note_: I believe him so I didn't try it. 

# 2 Opening and closing files

Now, let’s look at how we can open a file for reading and writing. I will show you a slightly advanced but better way of doing this by using the `with` statement (which relies on what is called a context manager). First, please download the file `spectrum-01.txt` into the current folder in your Learning Portfolio.

## 2.1 Reading data

In [None]:
#reading a text file
with open('spectrum-01.txt', 'r') as file:
    file_content = file.read()

print(file_content)

The `open()` function ‘opens’ your file. The `'r'` specifies that I only want to read from the file. Using `with` frees you from worrying about closing the file after you are done.
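A small sketch of two things mentioned above: that `with` really does close the file, and that a file can also be read one line at a time. I am using a made-up stand-in file here, since I do not want to assume anything about what `spectrum-01.txt` contains.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Stand-in for spectrum-01.txt (hypothetical name and contents)
    path = os.path.join(folder, 'sample.txt')
    with open(path, 'w') as file:
        file.write('first line\nsecond line\n')

    lines = []
    with open(path, 'r') as file:
        for line in file:                    # reads one line at a time
            lines.append(line.rstrip('\n'))  # drop the trailing newline

print(lines)          # ['first line', 'second line']
print(file.closed)    # True: 'with' closed the file for us
```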

## 2.2 Writing data

Let’s write the following text into a file:

In [7]:
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'

### Writing to a file in one go

Let’s write everything in one go.

In [8]:
with open('my-text-once.txt', 'w') as file:
    file.write(text)

You should now have a file `my-text-once.txt` in your directory. You should open it to take a look. By the way, the `'w'` indicates that I am opening the file for writing.
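Something worth knowing about `'w'`: it starts from an empty file every time, so anything already in the file is lost. The sketch below shows this next to `'a'` (append), a standard `open()` mode that is not covered in these notes. The file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'notes.txt')   # hypothetical file name

    with open(path, 'w') as file:
        file.write('first\n')
    with open(path, 'w') as file:               # 'w' wipes what was there
        file.write('second\n')
    with open(path, 'a') as file:               # 'a' adds to the end
        file.write('third\n')

    with open(path, 'r') as file:
        content = file.read()

print(content)   # 'first' is gone; 'second' and 'third' remain
```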

### Writing to a file, line by line

This is useful when dealing with data generated on the fly. Since I don’t have such data now, I will split the lines of the previous text.

_Note_: The contents of the two files will differ slightly, because `splitlines()` removes the newline characters. However, this is not the time to worry about that.

In [9]:
with open('my-text-lines.txt', 'w') as file:
    for line in text.splitlines():
        file.write(line)  # splitlines() removed the '\n', and it is not added back here

I must add that writing to a file is a very slow operation. So, it will slow things down if you do it in a loop.
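One possible way around this, sketched under the assumption that all the data fits comfortably in memory, is to collect the lines in a list first and write them out with a single call. The data and file name below are made up. How much time this actually saves will depend on your system, so treat it as an idea to try rather than a rule.

```python
import os
import tempfile

lines = [f'reading {i}' for i in range(5)]       # made-up data for illustration

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'readings.txt')  # hypothetical file name

    # One write call for all the lines, joined with newlines
    with open(path, 'w') as file:
        file.write('\n'.join(lines) + '\n')

    with open(path, 'r') as file:
        saved = file.read().splitlines()

print(saved == lines)   # True
```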

# 3 Some useful packages

Use the following packages to programmatically create, copy, and delete files and folders and navigate the OS.

|**Package**|**Primarily used for**|
|:---|---:|
|<u><code>os</code></u>|To ‘talk’ to the OS to create, modify, delete folders and write OS-agnostic code.|
|<u><code>glob</code></u>|To search for files.|
|<u><code>shutil</code></u>|To copy files.|

I am using both `os` **and** `shutil` because `shutil` offers some functions (e.g., `shutil.copy()`) that `os` does not. There are also subtle differences in what these functions do.

These packages are already part of the standard Python library. So you do not have to install them. Let’s import the packages first.

In [10]:
import os
import glob
import shutil

# 4 OS safe paths

Consider a **file** `data-01.txt` in the **sub-directory** `sg-data` of the **directory** `all-data`.

`all-data` --> `sg-data` --> `data-01.txt`

If I want to access `data-01.txt` all I have to do is:

In [11]:
path = os.path.join('.', 'all-data', 'sg-data', 'data-01.txt')
print(path)

./all-data/sg-data/data-01.txt


If you are on Windows, you will see:</br>
`.\all-data\sg-data\data-01.txt`

Otherwise, it will be:</br>
`./all-data/sg-data/data-01.txt`

So, `os.path.join()` builds your path with either `/` or `\` as necessary. This means your code will run seamlessly on any of these OSs.

# 5 Folders

## 5.1 Creating folders

You can create a folder programmatically using `os.mkdir()`. This is very useful because you can write a **tiny bit of code** to quickly **organise your data**. For example, if we want to store information about the people ‘John’, ‘Paul’ and ‘Ringo’, I can quickly create some folders for this by:

In [15]:
os.mkdir('people')

for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    print(f'Creating {path}')
    os.mkdir(path)

Creating people/John
Creating people/Paul
Creating people/Ringo


You don’t need the `print()` statement. I have included it so I have some feedback on what is (or is not) happening.

## 5.2 Checking for existence

Python will complain if you try to run this code **twice**, saying that the file (yes, Python refers to folders as files) **already exists**. So, when you create resources, it is a good idea to **check** if they already exist. There are two ways to do this: 
- `try-except` with `FileExistsError`
- `os.path.exists()`

### Using `try-except`

In [16]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    try:
        os.mkdir(path)
        print(f'Creating {path}')
    except FileExistsError:
        print(f'{path} already exists; skipping creation.')

people/John already exists; skipping creation.
people/Paul already exists; skipping creation.
people/Ringo already exists; skipping creation.


### Using `os.path.exists()`

In [17]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    if os.path.exists(path):
        print(f'{path} already exists; skipping creation.')
    else:
        os.mkdir(path)
        print(f'Creating {path}')

people/John already exists; skipping creation.
people/Paul already exists; skipping creation.
people/Ringo already exists; skipping creation.
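A possible third option, not part of the two approaches above, is `os.makedirs()` with `exist_ok=True`. It is a standard `os` function that creates any missing parent folders and stays quiet if the folder is already there. The sketch below runs in a temporary scratch folder so it does not touch the real `people` folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:      # scratch folder, removed afterwards
    for person in ['John', 'Paul', 'Ringo']:
        path = os.path.join(base, 'people', person)
        os.makedirs(path, exist_ok=True)         # also creates 'people' if missing
        os.makedirs(path, exist_ok=True)         # running it again raises no error

    created = sorted(os.listdir(os.path.join(base, 'people')))

print(created)   # ['John', 'Paul', 'Ringo']
```

The trade-off is that you get no feedback about whether the folder was new or already existed, which the two approaches above do give you.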


## 5.3 Copying files

First, there should be a copy of the 73 logo (`sp2273_logo.png`) in the current folder. Then, I will copy this into the folders I created for ‘John’, ‘Paul,’ and ‘Ringo’.

In [18]:
for person in ['John', 'Paul', 'Ringo']:
    path_to_destination = os.path.join('people', person)
    shutil.copy('sp2273_logo.png', path_to_destination)
    print(f'Copied file to {path_to_destination}')

FileNotFoundError: [Errno 2] No such file or directory: 'sp2273_logo.png'

Let’s say I want all the images in a sub-folder called `imgs` in each person’s directory. I can do this by first creating the folders `imgs` and then moving the logo file into that folder.

In [19]:
for person in ['John', 'Paul', 'Ringo']:
    # Create folder 'imgs'
    path_to_imgs = os.path.join('people', person, 'imgs')
    if not os.path.exists(path_to_imgs):
        os.mkdir(path_to_imgs)

    # Move logo file
    current_path_of_logo = os.path.join('people', person, 'sp2273_logo.png')
    new_path_of_logo = os.path.join('people', person, 'imgs', 'sp2273_logo.png')

    shutil.move(current_path_of_logo, new_path_of_logo)
    print(f'Moved logo to {new_path_of_logo}')

FileNotFoundError: [Errno 2] No such file or directory: 'people/John/sp2273_logo.png'
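Both cells above need `sp2273_logo.png` to be present. As a self-contained sketch of the same copy-then-move idea, the code below creates its own placeholder file inside a temporary scratch folder. The placeholder is not a real image, and `os.makedirs()` is used as a shortcut for creating the nested folders in one call.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Placeholder standing in for sp2273_logo.png (not a real image)
    logo = os.path.join(base, 'logo.png')
    with open(logo, 'wb') as file:
        file.write(b'placeholder')

    person_dir = os.path.join(base, 'people', 'John')
    imgs_dir = os.path.join(person_dir, 'imgs')
    os.makedirs(imgs_dir, exist_ok=True)

    copied = shutil.copy(logo, person_dir)       # returns the path of the new copy
    moved = shutil.move(copied, os.path.join(imgs_dir, 'logo.png'))

    original_still_there = os.path.exists(logo)  # copying leaves the source alone
    copy_left_behind = os.path.exists(copied)    # moving does not
    moved_exists = os.path.exists(moved)

print(original_still_there, copy_left_behind, moved_exists)   # True False True
```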

<div class="alert alert-block alert-info">
<b>For Your Information:</b> </br>You can do all of this extremely fast using only the terminal and its loop structures. Just letting you know in case you want to explore on your own.


</div>

# 6 Listing and looking for files

If I want to know **what** files are in a folder, then `glob` makes easy work of this. Let me show you how to use it.

Example 1: </br>
I use this if I want **all the files** in the **current directory**.
The `*` is called a <mark>wildcard</mark> and is read as ‘**anything**’. So, I am asking `glob` to give me **anything in the folder**.

In [20]:
glob.glob('*')

['my-text-once.txt',
 'files,_folders_&_os_(need).ipynb',
 'spectrum-01.txt',
 'people',
 'my-text-lines.txt']

Example 2: </br>
I want to **refine** my search and ask glob to give **only** those files that match the pattern ‘peo’ followed by ‘anything’.

In [21]:
glob.glob('peo*')

['people']

Example 3: </br> I now want to know what is inside the folders that start with `peo`.

In [22]:
glob.glob('peo*/*')

['people/Paul', 'people/John', 'people/Ringo']

Example 4: </br> Now, I want to see the **whole**, **detailed structure** of the folder `people`. For this, I need to tell `glob` to search **recursively** (i.e. dig through all sub-directories) by putting `recursive=True`.

I must also use <mark>two wildcards</mark> `**` to say all ‘sub-directories’.

In [23]:
glob.glob('people/**', recursive=True)

['people/', 'people/Paul', 'people/John', 'people/John/imgs', 'people/Ringo']

Example 5: </br>
I want only the `.png` files. So, I just need to modify my pattern. I am asking `glob` to go through the whole structure of people and show me those files with the pattern ‘anything’`.png`!

In [25]:
glob.glob('people/**/*.png', recursive=True)

[]
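To see the recursive `.png` search actually find something, here is a self-contained sketch that builds a small, made-up folder tree in a temporary scratch folder and searches it. I sort the results because `glob` returns matches in no particular order, as the outputs above also show.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Build a small, made-up folder tree to search
    for person in ['John', 'Paul']:
        imgs = os.path.join(base, 'people', person, 'imgs')
        os.makedirs(imgs)
        open(os.path.join(imgs, 'logo.png'), 'w').close()    # empty placeholder
    open(os.path.join(base, 'people', 'notes.txt'), 'w').close()

    pattern = os.path.join(base, 'people', '**', '*.png')
    found = sorted(glob.glob(pattern, recursive=True))
    relative = [os.path.relpath(p, base) for p in found]     # strip the scratch folder

print(relative)   # the two logo.png files; notes.txt is not matched
```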

# 7 Extracting file info

When dealing with files and folders, you often have to **extract** the filename, folder or extension. You can do this by simple **string manipulation**; for example if I want the filename and extension:

In [26]:
path = 'people/Ringo/imgs/sp2273_logo.png'
filename = path.split(os.path.sep)[-1]
extension = filename.split('.')[-1]
print(filename, extension)

sp2273_logo.png png


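`os.path.sep` is the path separator (i.e. `\` or `/`) for the OS. I split the path where the separator occurred and picked the last element in the list. I use a similar strategy for the file extension. Note that this only works if the path string actually uses the current OS's separator; the hardcoded `/` in this example would not be split on Windows.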

However, if you want, `os` provides some simple functions for these tasks.

In [27]:
path = 'people/Ringo/imgs/sp2273_logo.png'

In [28]:
os.path.split(path)      # Split filename from the rest

('people/Ringo/imgs', 'sp2273_logo.png')

In [29]:
os.path.splitext(path)   # Split extension

('people/Ringo/imgs/sp2273_logo', '.png')

In [30]:
os.path.dirname(path)    # Show the directory

'people/Ringo/imgs'
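These functions can be chained to pull a path apart completely. A minimal sketch, building the path with `os.path.join()` so that it uses the right separator on any OS:

```python
import os

path = os.path.join('people', 'Ringo', 'imgs', 'sp2273_logo.png')

folder, filename = os.path.split(path)         # folder part and file part
stem, extension = os.path.splitext(filename)   # name without extension, extension

print(folder)      # people/Ringo/imgs on macOS or Linux
print(filename)    # sp2273_logo.png
print(stem)        # sp2273_logo
print(extension)   # .png
```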

# 8 Deleting stuff

If you want to remove a file:

In [31]:
os.remove('people/Ringo/imgs/sp2273_logo.png')

FileNotFoundError: [Errno 2] No such file or directory: 'people/Ringo/imgs/sp2273_logo.png'

This won’t work with directories. For an empty directory, use:

In [32]:
os.rmdir('people/Ringo/imgs')

For a directory with files, use `shutil`:

In [33]:
shutil.rmtree('people/Ringo')

It goes without saying that you should be careful when using these functions. Unfortunately, I have had some **miserable** experiences by accidentally deleting files because I was more enthusiastic than sensible. With great power comes great responsibility, so use with **extreme caution**!
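One way to practise deleting safely is to do it inside a temporary scratch folder and to check what you are about to delete first. The sketch below is only an illustration of that habit; the folder tree and the placeholder file are made up, and `os.path.isfile()`, `os.path.isdir()` and `os.listdir()` are standard `os` functions used here for the checks.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    imgs = os.path.join(base, 'people', 'Ringo', 'imgs')
    os.makedirs(imgs)
    logo = os.path.join(imgs, 'logo.png')          # empty placeholder file
    open(logo, 'w').close()

    if os.path.isfile(logo):
        os.remove(logo)                            # files only
    if os.path.isdir(imgs) and not os.listdir(imgs):
        os.rmdir(imgs)                             # empty folders only
    shutil.rmtree(os.path.join(base, 'people'))    # folder and everything in it

    remaining = os.listdir(base)

print(remaining)   # []
```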