<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

# What to expect in this chapter

You must communicate with the OS to create, modify, move, copy, and delete files and directories (folders). This section is devoted to getting you up-to-speed with some Python modules (e.g., os, glob, shutil) that will allow you to execute these necessary actions.

# 1 Important concepts

In this section, I will touch on some concepts you need to navigate the OS efficiently.

In what follows, please note that I use the terms folder and directory interchangeably; they refer to the same thing.

## 1.1 Path

When dealing with computers, you will often encounter the term ‘path’. The path is simply a way to specify a location on your computer. It is like an address, and if you follow the path, it will take you to your file or folder.

As with any location, you can specify a path absolutely or relatively. For example, I can say that SPS is located on Level 3 of block S16. However, if I am already on Level 5 of S16, I can say 'go two floors down'. The former is an absolute path, and the latter is relative. I have always found it easier to use relative paths, especially if I later want to move my folders about.

An example of a path:

`C:\Users\Lai Woh Jon\Documents\GitHub\learning-portfolio-laiwohjon\files, folders & os\files,_folders_&_os_(need).ipynb`

## 1.2 More about relative paths

When dealing with relative paths, you will find it helpful to know the `.` and `..` notation.

`.` means 'this folder'

`..` means 'one folder above'

Let's take a sample path as an example:

`C:\Users\Chammika\Desktop\data-01.txt`

- `.\data-files\data-01.txt` means the file `data-01.txt` in the folder `data-files` in the current folder.
- `..\data-files\data-01.txt` means the file `data-01.txt` in the folder `data-files` located in the folder above.
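To see how `.` and `..` resolve on your own machine, here is a minimal sketch using `os.path.abspath()`. The names `data-files` and `data-01.txt` are only illustrative; nothing needs to exist on disk, because the paths are resolved as plain text against the current working directory.

```python
import os

# Illustrative folder and file names; they do not need to exist.
here = os.path.join('.', 'data-files', 'data-01.txt')     # inside the current folder
above = os.path.join('..', 'data-files', 'data-01.txt')   # inside the folder above

# os.path.abspath() resolves '.' and '..' against the current working directory.
cwd = os.getcwd()
resolved_here = os.path.abspath(here)
resolved_above = os.path.abspath(above)

print(cwd)
print(resolved_here)
print(resolved_above)
```

The second printed path starts with the current folder, while the third starts with its parent.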

### macOS or Linux

macOS and Linux allow you to use `~` to refer to your home directory. So, for example, you can access the Desktop on these systems ‘relatively’ with `~/Desktop`. So, I can look for a file on my Desktop using:

`~/Desktop/data-01.txt`

I'll KIV this info in case I ever decide to own a MacBook.
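Python can expand `~` for you on any OS (Windows included) with `os.path.expanduser()`. A minimal sketch, assuming a file called `data-01.txt` on the Desktop (an illustrative name; the file is never opened here):

```python
import os

home = os.path.expanduser('~')                               # your home directory
desktop_file = os.path.join(home, 'Desktop', 'data-01.txt')  # illustrative file name

print(home)
print(desktop_file)
```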

## 1.3 Path separator

Today’s major OSs (Windows, macOS, Linux) offer similar graphical environments. However, one of the most striking differences between Windows and macOS (or Linux) is the path separator.

Windows uses `\` as the path separator, while macOS (or Linux) uses `/`. So, the absolute path to a file on the Desktop on each of these systems will look like this:

- **Windows**: `C:\Users\chammika\Desktop\data-01.txt`
- **macOS (or Linux)**: `/Users/chammika/Desktop/data-01.txt`

If you want to share your code and want it to work on both systems, you must not hardcode either path separator. Later, I will show you how to use the Python `os` module to fix this problem.

## 1.4 Text files vs. Binary files

You can think of all files on your computer as being either text files or binary files. Text files are simple, and their contents can be opened and examined by almost any software (e.g., Notepad, TextEdit, Jupyter,…). Examples of text file formats are `.txt`, `.md` or `.csv`.

Binary files, in contrast, require some processing to make sense of what they contain. For example, if you look at the raw data in a `.png` file, you will see gibberish. In addition, some binary files will only run on specific OSs. For example, the `Excel.app` on a Mac will not run on Windows, nor will the `Excel.exe` file run on macOS (or Linux). Some reasons for having binary files are speed and size; text files, though simple, can get bulky.

Binary files store data in a format not directly interpretable by humans. They contain a series of bytes that may represent text, images, executable code, or any other type of data.
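To make the difference concrete, here is a small sketch that reads the same file in text mode (`'r'`) and in binary mode (`'rb'`). The file name `demo.txt` and the temporary folder are my own choices for illustration, so nothing is left behind in your portfolio.

```python
import os
import tempfile

# Work in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')   # hypothetical file name

    with open(path, 'w', encoding='utf-8') as file:
        file.write('caf\u00e9')               # the word 'café'

    # Text mode: the bytes are decoded into characters.
    with open(path, 'r', encoding='utf-8') as file:
        as_text = file.read()

    # Binary mode: the raw bytes, with no decoding.
    with open(path, 'rb') as file:
        as_bytes = file.read()

print(as_text)
print(as_bytes)
print(len(as_text), len(as_bytes))   # 4 characters, but 5 bytes
```

The accented letter takes two bytes in UTF-8, which is why the byte count is larger than the character count.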

## 1.5 Extensions

Files are usually named to end with an extension separated from the name by a `.`, like `name.extension`.

This extension lets the OS know what software or app to use to extract the details in a file. For example, a .xlsx means use Excel or .pptx means use PowerPoint. Be careful about changing the extension of a file, as it will make your OS cough and throw a fit. If you don’t believe me, try changing a .xlsx to .txt and double-click.

# 2 Opening and closing files

Now, let’s look at how we can open a file for reading and writing. I will show you a slightly advanced but better way of doing this by using the with statement (called a context manager). First, please download the file spectrum-01.txt into the current folder in your Learning Portfolio.

## 2.1 Reading data

In [2]:
with open('spectrum-01.txt', 'r') as file:
    file_content = file.read()

print(file_content)

Light Intensity, Ch A vs Actual Angular Position, Run #4
Actual Angular Position (  )	Light Intensity, Ch A ( % max )
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.3
0.000	-0.2
0.000	-0.2
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.004	-0.1
0.010	-0.2
0.018	-0.2
0.024	-0.3
0.029	-0.3
0.033	-0.3
0.036	-0.2
0.039	-0.1
0.043	-0.1
0.047	-0.1
0.053	-0.1
0.060	-0.1
0.066	-0.1
0.069	-0.1
0.073	-0.1
0.076	-0.1
0.079	-0.1
0.081	-0.1
0.082	-0.1
0.083	-0.2
0.083	-0.2
0.086	-0.2
0.090	-0.2
0.095	-0.2
0.100	-0.3
0.103	-0.3
0.104	-0.2
0.105	-0.3
0.107	-0.2
0.110	-0.2
0.115	-0.1
0.122	-0.2
0.128	-0.1
0.134	-0.2
0.139	-0.1
0.144	-0.2
0.150	-0.2
0.157	-0.2
0.164	-0.2
0.170	-0.3
0.175	-0.3
0.180	-0.2
0.185	-0.2
0.191	-0.1
0.195	-0.1
0.198	-0.2
0.201	-0.1
0.204	-0.2
0.206	-0.2
0.208	-0.3
0.210	-0.3
0.213	-0.1
0.217	0.3
0.222	0.6
0.226	0.2
0.230	0.0
0.233	-0.1
0.235	-0.1
0.237	

The `open()` function ‘opens’ your file. The `'r'` specifies that I only want to read from the file. Using `with` frees you from worrying about closing the file after you are done.
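If you want to convince yourself that `with` really closes the file, you can check the file object's `closed` attribute. A minimal sketch, using a throwaway file in a temporary folder (the name `demo.txt` is only for illustration):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')   # hypothetical file name

    with open(path, 'w') as file:
        file.write('hello')

    with open(path, 'r') as file:
        content = file.read()
        closed_inside = file.closed    # False: still open inside the block

    closed_after = file.closed         # True: closed automatically on exit

print(content, closed_inside, closed_after)
```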

## 2.2 Writing data

Now, let’s write the following into a file.

In [4]:
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'

### Writing to a file in one go

In [5]:
with open('my-text-once.txt', 'w') as file:
    file.write(text)

### Writing to a file, line by line

This is useful when dealing with data generated on the fly. Since I don’t have such data now, I will split the previous text into lines and write them one at a time.

In [6]:
with open('my-text-lines.txt', 'w') as file:
    for line in text.splitlines():
        file.write(line + '\n')    # splitlines() strips the newline, so add it back

Writing to a file is slow compared with working in memory. So, writing many small pieces in a loop can slow things down.
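One way around this, sketched below, is to collect the lines in a list first and write them with a single call. The data and the file name `squares.txt` are made up for illustration, and the file lives in a temporary folder.

```python
import os
import tempfile

# Made-up data: a number and its square, separated by a tab.
lines = [f'{i}\t{i**2}' for i in range(5)]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'squares.txt')   # hypothetical file name

    # One write call for all the lines, joined by newlines.
    with open(path, 'w') as file:
        file.write('\n'.join(lines) + '\n')

    # Read the file back to check that every line survived.
    with open(path, 'r') as file:
        read_back = file.read().splitlines()

print(read_back == lines)
```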

# 3 Some useful packages

Three packages are particularly useful here:
- `os`: To ‘talk’ to the OS to create, modify, delete folders and write OS-agnostic code.
- `glob`: To search for files.
- `shutil`: To copy files.

`shutil` offers some functions (e.g., `shutil.copy()`) that `os` does not. There are also subtle differences in what these functions do, but let’s not worry about that now.

How to import packages:

In [2]:
import os
import glob
import shutil

# 4 OS safe paths

Consider a file data-01.txt in the sub-directory sg-data of the directory all-data.

`all-data` --> `sg-data` --> `data-01.txt`

To build a path to `data-01.txt` that works on any OS, I can use `os.path.join()`:

In [8]:
path = os.path.join('.', 'all-data', 'sg-data', 'data-01.txt')
print(path)

.\all-data\sg-data\data-01.txt


Whether folders are separated by `/` or `\` depends on your OS; `os.path.join()` uses the correct separator for the OS the code runs on, so the same code works either way.
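You can preview what both styles look like from a single machine. The sketch below uses the standard-library modules `ntpath` and `posixpath`, which are the Windows and macOS/Linux flavours of `os.path`. Bringing them in is my own addition for illustration; in everyday code you would just use `os.path`.

```python
import ntpath      # Windows flavour of os.path
import os
import posixpath   # macOS/Linux flavour of os.path

parts = ['all-data', 'sg-data', 'data-01.txt']

windows_style = ntpath.join(*parts)
posix_style = posixpath.join(*parts)
this_machine = os.path.join(*parts)

print(windows_style)   # all-data\sg-data\data-01.txt
print(posix_style)     # all-data/sg-data/data-01.txt
print(this_machine)    # matches one of the two lines above
print(repr(os.sep))    # the separator used on this machine
```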

# 5 Folders

## 5.1 Creating folders

You can create a folder programmatically using os.mkdir(). This is very useful because you can write a tiny bit of code to quickly organise your data. For example, let’s say we need to store information about the people ‘John’, ‘Paul’ and ‘Ringo’. I can quickly create some folders for this by:

In [9]:
os.mkdir('people')

for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    print(f'Creating {path}')
    os.mkdir(path)

Creating people\John
Creating people\Paul
Creating people\Ringo


The line `path = os.path.join('people', person)` constructs the path by joining the folder name `people` and the current person's name.

You don’t need the `print()` statement. It is included for extra transparency.

I have now created the folders John, Paul and Ringo within the `people` folder.

## 5.2 Checking for existence

Python will complain if you try to run this code twice, saying that the file (yes, Python refers to folders as files) already exists. So, when you create resources, it is a good idea to check if they already exist. There are two ways to do this: use `try-except` with the `FileExistsError` or use `os.path.exists()`.

### Using try-except

Recall this from earlier chapters.

In [10]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    try:
        os.mkdir(path)
        print(f'Creating {path}')
    except FileExistsError:
        print(f'{path} already exists; skipping creation.')


people\John already exists; skipping creation.
people\Paul already exists; skipping creation.
people\Ringo already exists; skipping creation.


### Using os.path.exists()

Alternatively, `os` already provides a function for this check.

`os.path.exists(path)` takes a `path` argument, which is a string representing the file or directory path you want to check.

**Return value:**

- If the path exists, it returns `True`.
- If the path does not exist, it returns `False`.

In [11]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    if os.path.exists(path):
        print(f'{path} already exists; skipping creation.')
    else:
        os.mkdir(path)
        print(f'Creating {path}')

people\John already exists; skipping creation.
people\Paul already exists; skipping creation.
people\Ringo already exists; skipping creation.
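A third option, which the two methods above do not cover, is `os.makedirs()` with `exist_ok=True`. It creates any missing parent folders as well and stays quiet if the folder is already there. Here is a minimal sketch in a temporary folder (so it does not touch the real `people` folder), with the loop run twice on purpose:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    for attempt in range(2):                      # run twice to show there is no error
        for person in ['John', 'Paul', 'Ringo']:
            path = os.path.join(root, 'people', person)
            os.makedirs(path, exist_ok=True)      # also creates 'people' if needed

    created = sorted(os.listdir(os.path.join(root, 'people')))

print(created)   # ['John', 'Paul', 'Ringo']
```

The trade-off is that you no longer get told when a folder already existed, so use it only when you do not care about that.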


## 5.3 Copying files

Let me show you how to copy files programmatically.

First, there should be a copy of the 73 logo (sp2273_logo.png) in the current folder. Then, we will copy this into the folders we created for ‘John’, ‘Paul,’ and ‘Ringo’.

In [3]:
for person in ['John', 'Paul', 'Ringo']:
    path_to_destination = os.path.join('people', person)
    shutil.copy('sp2273_logo.png', path_to_destination)
    print(f'Copied file to {path_to_destination}')

Copied file to people\John
Copied file to people\Paul
Copied file to people\Ringo


`shutil.copy('sp2273_logo.png', path_to_destination)` copies the file `sp2273_logo.png` into the folder given by `path_to_destination`.

Let’s say I want all the images in a sub-folder called `imgs` in each person’s directory. I can do this by first creating the folders `imgs` and then moving the logo file into that folder.

In [4]:
for person in ['John', 'Paul', 'Ringo']:
    # Create folder 'imgs'
    path_to_imgs = os.path.join('people', person, 'imgs')
    if not os.path.exists(path_to_imgs):
        os.mkdir(path_to_imgs)

    # Move logo file
    current_path_of_logo = os.path.join('people', person, 'sp2273_logo.png')
    new_path_of_logo = os.path.join('people', person, 'imgs', 'sp2273_logo.png')

    shutil.move(current_path_of_logo, new_path_of_logo)
    print(f'Moved logo to {new_path_of_logo}')

Moved logo to people\John\imgs\sp2273_logo.png
Moved logo to people\Paul\imgs\sp2273_logo.png
Moved logo to people\Ringo\imgs\sp2273_logo.png


In the first part, we build the path to the `imgs` folder that we want and create the folder only if it does not already exist.

We then store the current path of the logo in one variable and its new path in another, and use `shutil.move()` to move the logo from the old path to the new one.
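The difference between copying and moving is easy to check for yourself. The sketch below works in a temporary folder with a stand-in file called `logo.png` (not a real image; the names are only for illustration) and records what exists after each step.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    # A stand-in for the logo file.
    source = os.path.join(root, 'logo.png')
    with open(source, 'wb') as file:
        file.write(b'not a real image')

    person_folder = os.path.join(root, 'John')
    imgs_folder = os.path.join(person_folder, 'imgs')
    os.makedirs(imgs_folder)

    # Copy: the original stays where it is.
    copied = shutil.copy(source, person_folder)
    source_still_there = os.path.exists(source)

    # Move: the file disappears from its old location.
    moved = shutil.move(copied, imgs_folder)
    copy_still_there = os.path.exists(copied)
    moved_exists = os.path.exists(moved)

print(source_still_there, copy_still_there, moved_exists)
```

Both `shutil.copy()` and `shutil.move()` return the path of the file at its destination, which is handy if you need it for the next step.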

You can also do all of this very quickly using only the terminal and its loop structures.

# 6 Listing and looking for files

If I want to know what files are in a folder, `glob` makes easy work of this. Let me show you how to use it.

To list everything in the *current* directory, I use the pattern `*`.

The `*` is called a wildcard and is read as ‘anything’. So, I am asking `glob` to give me anything in the folder.

In [5]:
glob.glob('*')

['files,_folders_&_os_(need).ipynb',
 'my-text-lines.txt',
 'my-text-once.txt',
 'people',
 'sp2273_logo.png',
 'spectrum-01.txt',
 'spectrum-01.txt.url']

I can also ask `glob` to give me only those files that match the pattern ‘peo’ followed by ‘anything’ (i.e., names that start with `peo`).

In [6]:
glob.glob('peo*')

['people']

To see what is inside the folders that start with `peo`:

In [7]:
glob.glob('peo*/*')
# '/*' matches everything inside the folders whose names start with 'peo'.

['people\\John', 'people\\Paul', 'people\\Ringo']

Now, I want to see the whole, detailed structure of the folder `people`. For this, I need to tell `glob` to search recursively (i.e., dig through all sub-directories) by passing `recursive=True`.

I must also use two wildcards `**` to say all ‘sub-directories’.

In [8]:
glob.glob('people/**', recursive=True)

['people\\',
 'people\\John',
 'people\\John\\imgs',
 'people\\John\\imgs\\sp2273_logo.png',
 'people\\Paul',
 'people\\Paul\\imgs',
 'people\\Paul\\imgs\\sp2273_logo.png',
 'people\\Ringo',
 'people\\Ringo\\imgs',
 'people\\Ringo\\imgs\\sp2273_logo.png']

What if I want to search through all my sub-directories, but I only want the `.png` files? 

In [9]:
glob.glob('people/**/*.png', recursive=True)

['people\\John\\imgs\\sp2273_logo.png',
 'people\\Paul\\imgs\\sp2273_logo.png',
 'people\\Ringo\\imgs\\sp2273_logo.png']

In the previous example, adding `/*.png` to the pattern tells `glob` that I want only the files that end with `.png`.
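Here is a self-contained sketch of the same recursive search that you can run anywhere. It builds a small folder tree in a temporary folder (the names `logo.png` and `notes.txt` are made up), searches it, and sorts the result, since `glob` does not promise any particular order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a small tree: people/<person>/imgs/ with two empty files each.
    for person in ['John', 'Paul']:
        folder = os.path.join(root, 'people', person, 'imgs')
        os.makedirs(folder)
        for name in ['logo.png', 'notes.txt']:
            with open(os.path.join(folder, name), 'w') as file:
                file.write('')

    # '**' with recursive=True digs through all sub-directories.
    pattern = os.path.join(root, 'people', '**', '*.png')
    matches = sorted(glob.glob(pattern, recursive=True))

    # Strip the temporary folder from the front for a tidier display.
    relative = [os.path.relpath(match, root) for match in matches]

print(relative)
```

Only the two `.png` files are listed; the `.txt` files are filtered out by the pattern.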

# 7 Extracting file info

When dealing with files and folders, you often have to extract the filename, folder or extension. You can do this by simple string manipulation; for example if I want the filename and extension:

In [20]:
path = r'people\Ringo\imgs\sp2273_logo.png'   # raw string, so the backslashes are not treated as escapes
filename = path.split(os.path.sep)[-1]
extension = filename.split('.')[-1]
print(filename, extension)

sp2273_logo.png png


`os.path.sep` is the path separator (i.e., `\` or `/`) for the OS. We split the path where the separator occurred and picked the last element in the list (using index `-1`). We use a similar strategy for the file extension.

However, if you like, `os` provides some simple functions for these tasks.

In [16]:
path = 'people/Ringo/imgs/sp2273_logo.png'

In [17]:
os.path.split(path)      # Split filename from the rest

('people/Ringo/imgs', 'sp2273_logo.png')

In [21]:
os.path.splitext(path)   # Split extension

('people/Ringo/imgs/sp2273_logo', '.png')

In [22]:
os.path.dirname(path)    # Show the directory

'people/Ringo/imgs'
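One reason to prefer the `os.path` functions over string splitting is how they treat files without an extension. The sketch below combines `os.path.basename()` and `os.path.splitext()`, and compares the two approaches on a file called `README` (a made-up example).

```python
import os

path = os.path.join('people', 'Ringo', 'imgs', 'sp2273_logo.png')

filename = os.path.basename(path)               # last part of the path
stem, extension = os.path.splitext(filename)    # name and extension (with the dot)
print(filename, stem, extension)

# A file with no extension (made-up example).
no_extension = 'README'
by_split = no_extension.split('.')[-1]              # gives back the whole name
by_splitext = os.path.splitext(no_extension)[1]     # gives an empty string
print(repr(by_split), repr(by_splitext))
```

Note that `os.path.splitext()` keeps the dot in the extension, whereas the string-splitting approach drops it.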

# 8 Deleting stuff

If you want to remove a file:

In [23]:
os.remove('people/Ringo/imgs/sp2273_logo.png')

This won’t work with directories. For an empty directory, use:

In [25]:
os.rmdir('people/Ringo/imgs')

For a directory with files, use `shutil`:

In [26]:
shutil.rmtree('people/John')

It goes without saying that you should be careful when using these functions: they delete things permanently and do not send them to the Recycle Bin (or Trash).
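A safe way to get a feel for these functions is to practise in a temporary folder. The sketch below (with made-up names) shows that `os.rmdir()` refuses to delete a folder that still has files in it, and that it works once the folder is empty.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, 'imgs')
    os.mkdir(folder)

    file_path = os.path.join(folder, 'logo.png')   # stand-in file, not a real image
    with open(file_path, 'w') as file:
        file.write('')

    # os.rmdir() only works on empty folders.
    try:
        os.rmdir(folder)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    os.remove(file_path)    # delete the file first...
    os.rmdir(folder)        # ...and now the empty folder can go
    folder_gone = not os.path.exists(folder)

print(rmdir_failed, folder_gone)
```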