<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

# What to expect in this chapter

There is no choice; you must always interact with your operating system (OS) to get anything done. This particularly applies to programming, as you must communicate with the OS to create, modify, move, copy, and delete files and directories (folders). This section is devoted to getting you up-to-speed with some Python modules (e.g., `os`, `glob`, `shutil`) that will allow you to execute these necessary actions. I will also show you how to write code that will seamlessly run on both macOS and Windows.

# 1 Important concepts

In this section, I will touch on some concepts you need to navigate the OS efficiently.

In what follows, please note that I use the terms __folder__ and __directory__ interchangeably; they refer to the __same thing__.

## 1.1 Path

When dealing with computers, you will often encounter the term __‘path’__. The path is simply a way to specify a location on your computer. It is like an __address__, and if you follow the path, it will take you to your file or folder.

As with any location, you can specify a path absolutely or relatively. For example, I can say that SPS is located on Level 3 of block S16 (an __absolute path__). However, if I am already on Level 5 of S16, I can say ‘go two floors down’ (a __relative path__). I have always found it _easier to use relative paths_, especially if I later want to move my folders about.

For example, here is an absolute path to a file on the `Desktop` on a Windows machine.

```
C:\Users\Chammika\Desktop\data-01.txt
```

## 1.2 More about relative paths

It may seem pedantic, but when dealing with relative paths, you will find it helpful to know the `.` and `..` notation.

| Notation |       Meaning      |
|:--------:|:------------------:|
|     .    |    ‘this folder’   |
|    ..    | ‘one folder above’ |

So,
- ```.\data-files\data-01.txt``` means the file data-01.txt in the folder data-files in the __current folder__.
- ```..\data-files\data-01.txt``` means the file data-01.txt in the folder data-files located in the __folder above__.


__Remember:__ `.` means current folder, and `..` means one folder up.
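
If you want to see what `.` and `..` resolve to on your own machine, here is a minimal sketch. It uses `os.path.abspath()` and `os.path.dirname()` from the `os` package (introduced properly later in this chapter); the variable names are my own.

```python
import os

# '.' is the current folder; '..' is the folder one level above it.
this_folder = os.path.abspath('.')
folder_above = os.path.abspath('..')

print(f'This folder : {this_folder}')
print(f'Folder above: {folder_above}')

# The folder above is simply the parent of the current folder.
print(folder_above == os.path.dirname(this_folder))
```

The exact paths printed depend on where you run the code, so your output will differ from mine.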

### macOS or Linux

__macOS__ and Linux allow you to use `~` to refer to your __home directory__. So, for example, you can access the `Desktop` in these systems ‘relatively’ with `~/Desktop`. So, I can look for a file named `data-01.txt` on my Desktop using:

```~/Desktop/data-01.txt```
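
One thing to keep in mind is that `~` is expanded by the terminal; Python's `open()` does not understand it on its own. A minimal sketch of one way around this, assuming you are happy to use `os.path.expanduser()` from the `os` package (covered later in this chapter):

```python
import os

# Replace '~' with the full path of the home directory.
home = os.path.expanduser('~')

# Build the path to a file on the Desktop (the file itself need not exist).
desktop_file = os.path.join(home, 'Desktop', 'data-01.txt')

print(home)
print(desktop_file)
```

The output depends on your username and OS, so I have not shown it here.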

## 1.3 Path separator

Today’s major OSs (Windows, macOS, Linux) offer similar graphical environments. However, one of the most ___striking differences__ between Windows and macOS_ (or Linux) is the __path separator__.

- Windows uses `\` as the path separator 
- macOS (or Linux) uses `/`

So, the absolute path to a file on the Desktop on each of these systems will look like this:

- Windows: ```C:\Users\chammika\Desktop\data-01.txt```
- macOS (or Linux): ```/Users/chammika/Desktop/data-01.txt```

_If_ you want to share your code and want it _to work on both systems_, you _must not hardcode_ either path separator. Later, I will show you how to use the Python `os` package to fix this problem.

## 1.4 Text files vs. Binary files

You can think of all files on your computer as being __either text files or binary files__. 
- Text files are simple and can be opened, and their contents examined, by almost any software (e.g., Notepad, TextEdit, Jupyter,…). Examples of text file formats are `.txt`, `.md` or `.csv`.

- Binary files, in contrast, _require some processing to make sense of_ what they contain. For example, if you look at the raw data in a `.png` file, you will see gibberish. In addition, _some binary files will only run on specific OSs_. For example, the _`Excel.app` on a Mac will not run on Windows, nor will the `Excel.exe` file run on macOS (or Linux)_
  - Some reasons for having binary files are speed and size (because text files, though simple, can get bulky).
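
You can see the difference from Python by opening the same file in text mode (`'r'`) and in binary mode (`'rb'`). The sketch below is only an illustration: the file name `demo.txt` and its contents are made up, and the file is created in a temporary folder so nothing in your portfolio is touched. The `open()` function is explained in the next section.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')      # hypothetical file name

    with open(path, 'w', encoding='utf-8') as file:
        file.write('Hello, SP2273')

    # Text mode gives you a string (str).
    with open(path, 'r', encoding='utf-8') as file:
        as_text = file.read()

    # Binary mode gives you the raw bytes stored on disk.
    with open(path, 'rb') as file:
        as_bytes = file.read()

print(type(as_text), as_text)
print(type(as_bytes), as_bytes)
```

For a plain text file, the bytes are still readable. For a `.png`, the same `'rb'` read would give bytes that make no sense until an image library decodes them.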

## 1.5 Extensions

Files are usually named to end with an __extension__ separated from the name by a `.` like `name.extension`. This `extension` lets the OS know what software or app to use to extract the details in a file. For example, a `.xlsx` means use Excel or `.pptx` means use PowerPoint. 

- Be careful about changing the extension of a file, as it will make your OS throw a fit. If you don’t believe me, try changing a `.xlsx` to `.txt` and double-click.

# 2 Opening and closing files

Now, let’s look at how we can __open a file for reading and writing__. I will show you a slightly advanced but better way of doing this by using the `with` statement (called a __context manager__). First, please download the file `spectrum-01.txt` into the current folder in your Learning Portfolio. (In this notebook, the copy is saved as `spectrum-01.rtf`, which is why the output below starts with RTF formatting codes.)

## 2.1 Reading data

Here is what you would typically do to read a text file.


In [2]:
# The open() function ‘opens’ your file. 
# The 'r' specifies that I only want to read from the file. 
# Using 'with' frees you from worrying about closing the file after you are done.

with open('spectrum-01.rtf', 'r') as file:
    file_content = file.read()

print(file_content)

{\rtf1\ansi\ansicpg1252\cocoartf2758
\cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fmodern\fcharset0 Courier;}
{\colortbl;\red255\green255\blue255;\red0\green0\blue0;}
{\*\expandedcolortbl;;\cssrgb\c0\c0\c0;}
\paperw11900\paperh16840\margl1440\margr1440\vieww11520\viewh8400\viewkind0
\deftab720
\pard\pardeftab720\partightenfactor0

\f0\fs26 \cf0 \expnd0\expndtw0\kerning0
\outl0\strokewidth0 \strokec2 Light Intensity, Ch A vs Actual Angular Position, Run #4\
Actual Angular Position (  )	Light Intensity, Ch A ( % max )\
0.000	-0.2\
0.000	-0.1\
0.000	-0.1\
0.000	-0.1\
0.000	-0.1\
0.000	-0.2\
0.000	-0.1\
0.000	-0.1\
0.000	-0.1\
0.000	-0.2\
0.000	-0.1\
0.000	-0.1\
0.000	-0.2\
0.000	-0.3\
0.000	-0.2\
0.000	-0.2\
0.001	-0.1\
0.001	-0.1\
0.001	-0.1\
0.001	-0.1\
0.001	-0.1\
0.001	-0.1\
0.004	-0.1\
0.010	-0.2\
0.018	-0.2\
0.024	-0.3\
0.029	-0.3\
0.033	-0.3\
0.036	-0.2\
0.039	-0.1\
0.043	-0.1\
0.047	-0.1\
0.053	-0.1\
0.060	-0.1\
0.066	-0.1\
0.069	-0.1\
0.073	-0.1\
0.076	-0.1\
0.079	-0.1\
0.081	-0
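
Reading everything with `file.read()` gives you one big string. If you would rather work with the numbers, one option is to loop over the file a line at a time. Here is a minimal sketch, assuming a clean, tab-separated text file with no header; the three rows are copied from the output above, and the file name `sample.txt` is hypothetical.

```python
import os
import tempfile

# Three rows taken from the data above, separated by tabs.
sample = '0.000\t-0.2\n0.004\t-0.1\n0.010\t-0.2\n'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.txt')    # hypothetical file name
    with open(path, 'w') as file:
        file.write(sample)

    rows = []
    with open(path, 'r') as file:
        for line in file:                        # one line at a time
            angle, intensity = line.split()      # split at the whitespace (tab)
            rows.append((float(angle), float(intensity)))

print(rows)
```

A real data file usually has header lines (and, for an `.rtf` file, formatting codes) that you would need to skip before converting to numbers.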

## 2.2 Writing data

Now, let’s write the following into a file.

In [3]:
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'

I will use two writing methods to demonstrate how they work.

### Writing to a file in one go

First, let’s write everything in one go.

In [5]:
with open('my-text-once.txt', 'w') as file:
    file.write(text)

You should now have a file ```my-text-once.txt``` in your directory. You should open it to take a look. By the way, the ```w``` indicates that I am opening the file for ```w```riting.

Checked. Indeed we do have this new file with the little story written in it.
![image.png](attachment:eae223c1-6d02-4c9b-a503-bb3702803cac.png)

### Writing to a file, line by line


Let me show you how to write a line at a time. 

This is useful when dealing with data generated on the fly. Since I don’t have such data now, I will split the previous text into lines. (The contents of the two files will be slightly different, but this is not the time to worry about that.)


In [6]:
with open('my-text-lines.txt', 'w') as file:
    for line in text.splitlines():
        file.writelines(line)

Boom. Now we have a new file called my-text-lines.txt ![Screenshot 2024-02-19 at 10.41.26 PM.png](attachment:090d7beb-73cd-4733-b22a-67cad8200aea.png)

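Note: In `my-text-lines.txt`, the two sentences end up on a single line. This is expected because `splitlines()` removes the line breaks and `writelines()` does not add them back.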

I must add that _writing to a file is a very __slow__ operation_. So, it will slow things down if you do it in a loop.
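
If you want each item on its own line, one option is to add the line break (`'\n'`) yourself when writing. Here is a minimal sketch; the list `lines` and the file name `my-lines.txt` are made up for illustration, and the file lives in a temporary folder.

```python
import os
import tempfile

lines = ['first line', 'second line', 'third line']    # hypothetical data

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'my-lines.txt')        # hypothetical file name

    with open(path, 'w') as file:
        for line in lines:
            file.write(line + '\n')    # add the line break ourselves

    # Read the file back to check that the lines survived.
    with open(path, 'r') as file:
        read_back = file.read().splitlines()

print(read_back)
```

Note that the file is opened once, outside the loop. Opening and closing it on every pass would make the slow operation even slower.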

# 3 Some useful packages

Let me show you how to programmatically create, copy, and delete files and folders and navigate the OS. I will use the following three packages for these tasks.

| Package | Primarily used for                                                                |
|---------|-----------------------------------------------------------------------------------|
| `os`      | To ‘talk’ to the OS to create, modify, delete folders and write OS-agnostic code. |
| `glob`    | To search for files.                                                              |
| `shutil`  | To copy files.                                                                    |

I am using both `os` and `shutil` because `shutil` offers some functions (e.g., `shutil.copy()`) that `os` does not.
There are also subtle differences in what these functions do, but let’s not worry about that now.

These packages are already part of the standard Python library. So you do not have to install them. Let’s __import the packages__ first.

In [7]:
import os
import glob
import shutil

# 4 OS safe paths

Consider a file `data-01.txt` in the sub-directory `sg-data` of the directory `all-data`.

`all-data` to `sg-data` to `data-01.txt`

To build a path to `data-01.txt`, all I have to do is:

In [9]:
path = os.path.join('.', 'all-data', 'sg-data', 'data-01.txt')
print(path)
# Boom, the joined path

./all-data/sg-data/data-01.txt


Note: The output above uses `/` because I am currently on macOS.
On a Windows machine, the same code prints:
```.\all-data\sg-data\data-01.txt```

So, using `os.path.join()` will __adjust your path__ with either `/` or `\` as necessary. This means your code will run seamlessly on all three OSs.
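
If you are curious which separator your OS uses, it is stored in `os.path.sep`. The short sketch below joins the same pieces as before and then splits the result at the separator to show that nothing else was changed.

```python
import os

path = os.path.join('.', 'all-data', 'sg-data', 'data-01.txt')

# The separator for the OS running this code ('/' or '\\').
print(repr(os.path.sep))

# Splitting at the separator gives back the pieces that were joined.
pieces = path.split(os.path.sep)
print(pieces)
```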

# 5 Folders

## 5.1 Creating folders
You can __create a folder__ programmatically using `os.mkdir()`. This is very useful because you can write a tiny bit of code to quickly _organise_ your data.

For example, let’s say we need to store information about the people ‘John’, ‘Paul’ and ‘Ringo’. I can quickly create some folders for this by:



In [10]:
os.mkdir('people')

for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    print(f'Creating {path}')
    os.mkdir(path)

Creating people/John
Creating people/Paul
Creating people/Ringo


You don’t need the `print()` statement. I have included it so I have some feedback on what is (or is not) happening.



## 5.2 Checking for existence

Python will complain if you try to run this code twice, saying that the file (yes, __Python refers to folders as files__) already exists. So, when you create resources, it is a good idea to _check if they already exist_. 

There are two ways to do this: 
- use `try-except` with the `FileExistsError` 
- use `os.path.exists()`.

### Using try-except

In [11]:
#try-except
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    try:
        os.mkdir(path)
        print(f'Creating {path}')
    except FileExistsError:
        print(f'{path} already exists; skipping creation.')


people/John already exists; skipping creation.
people/Paul already exists; skipping creation.
people/Ringo already exists; skipping creation.


### Using `os.path.exists()`

In [12]:
# if-else with os.path.exists()
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    if os.path.exists(path):
        print(f'{path} already exists; skipping creation.')
    else:
        os.mkdir(path)
        print(f'Creating {path}')

people/John already exists; skipping creation.
people/Paul already exists; skipping creation.
people/Ringo already exists; skipping creation.
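
A third option, not used above, is `os.makedirs()` with `exist_ok=True`. It creates any missing parent folders and stays quiet if the folder is already there. Here is a minimal sketch that works inside a temporary scratch folder (my assumption, so that it does not interfere with the `people` folder created earlier):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:      # scratch folder, removed afterwards
    for person in ['John', 'Paul', 'Ringo']:
        path = os.path.join(base, 'people', person)
        os.makedirs(path, exist_ok=True)   # also creates 'people' if it is missing
        os.makedirs(path, exist_ok=True)   # running it again raises no error

    created = sorted(os.listdir(os.path.join(base, 'people')))

print(created)
```

The trade-off is that you get no feedback on whether a folder was newly created or already existed, which the two approaches above do give you.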


## 5.3 Copying files

Let me show you how to copy files programmatically.

First, there should be a copy of the 73 logo (`sp2273_logo.png`) in the current folder. (Note: we didn’t have this file in our folder, so we used `Guitar.png` instead.) Then, I will copy this into the folders I created for ‘John’, ‘Paul’ and ‘Ringo’.

In [13]:
for person in ['John', 'Paul', 'Ringo']:
    path_to_destination = os.path.join('people', person)
    shutil.copy('Guitar.png', path_to_destination)
    print(f'Copied file to {path_to_destination}')

Copied file to people/John
Copied file to people/Paul
Copied file to people/Ringo


Let’s say I want all the images in a sub-folder called "imgs" in each person’s directory. I can do this by first _creating the folders imgs and then moving the logo file into that folder_.

In [15]:
for person in ['John', 'Paul', 'Ringo']:
    # Create folder 'imgs'
    path_to_imgs = os.path.join('people', person, 'imgs')
    if not os.path.exists(path_to_imgs):
        os.mkdir(path_to_imgs)

    # Move guitar image file
    current_path_of_guitar_image = os.path.join('people', person, 'Guitar.png')
    new_path_of_guitar_image = os.path.join('people', person, 'imgs', 'Guitar.png')

    shutil.move(current_path_of_guitar_image, new_path_of_guitar_image)
    print(f'Moved logo to {new_path_of_guitar_image}')

Moved logo to people/John/imgs/Guitar.png
Moved logo to people/Paul/imgs/Guitar.png
Moved logo to people/Ringo/imgs/Guitar.png
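
To make the difference between copying and moving explicit, here is a small self-contained sketch. It repeats the steps above for one person inside a temporary scratch folder, with an empty stand-in file instead of the real image (both are my assumptions), and then checks which files exist afterwards.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # An empty stand-in for the image file.
    source = os.path.join(base, 'Guitar.png')
    with open(source, 'wb'):
        pass

    person = os.path.join(base, 'John')
    imgs = os.path.join(person, 'imgs')
    os.mkdir(person)
    os.mkdir(imgs)

    # copy() leaves the original in place and returns the path of the copy.
    copied_to = shutil.copy(source, person)
    original_kept = os.path.exists(source)

    # move() returns the new path; the file no longer exists at the old one.
    moved_to = shutil.move(copied_to, os.path.join(imgs, 'Guitar.png'))
    old_copy_gone = not os.path.exists(copied_to)
    new_copy_there = os.path.exists(moved_to)

print(original_kept, old_copy_gone, new_copy_there)
```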


__Note:__ You can do all of this extremely fast using only the terminal and its loop structures. Just letting you know in case you want to explore on your own.



# 6 Listing and looking for files

If I __want to know what files are in a folder__, then `glob` makes easy work of this. Let me show you how to use it.



__Example 1:__ I use this if I want __all__ the files in the current directory.

The `*` is called a wildcard and is read as ‘__anything__’. So, I am asking `glob` to _give me anything in the folder_.

In [16]:
glob.glob('*')

['my-text-once.txt',
 'files,_folders_&_os_(need).ipynb',
 'spectrum-01.rtf',
 'people',
 'Guitar.png']

__Example 2:__ I want to refine my search and __ask glob to give only those files that match the pattern ‘peo’__ followed by ‘anything’.

In [17]:
glob.glob('peo*')

['people']

__Example 3:__ I now want to know what is _inside the folders that start with `peo`_.

In [18]:
glob.glob('peo*/*')

['people/Paul', 'people/John', 'people/Ringo']

__Example 4:__ Now, I want __to see the whole, detailed structure of the folder `people`__. For this, I need to tell `glob` to search recursively (i.e. dig through __all__ sub-directories) by putting `recursive=True`.

I must also use __two wildcards__ (`**`) to say ‘all sub-directories’.

In [19]:
glob.glob('people/**', recursive=True)

['people/',
 'people/Paul',
 'people/Paul/imgs',
 'people/Paul/imgs/Guitar.png',
 'people/John',
 'people/John/imgs',
 'people/John/imgs/Guitar.png',
 'people/Ringo',
 'people/Ringo/imgs',
 'people/Ringo/imgs/Guitar.png']

__Example 5:__ I want __only the `.png` files__. So, I just need to modify my pattern. I am asking glob to go through the whole structure of `people` and show me those files with the pattern ‘anything’`.png`!

In [20]:
glob.glob('people/**/*.png', recursive=True)

['people/Paul/imgs/Guitar.png',
 'people/John/imgs/Guitar.png',
 'people/Ringo/imgs/Guitar.png']
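
You may have noticed that the results above are not in alphabetical order (Paul comes before John). `glob` returns matches in whatever order the file system hands them over, so if the order matters, wrap the call in `sorted()`. A minimal sketch, which rebuilds a similar folder structure with empty stand-in files inside a temporary scratch folder (my assumption):

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    for person in ['Paul', 'John', 'Ringo']:
        imgs = os.path.join(base, 'people', person, 'imgs')
        os.makedirs(imgs)
        with open(os.path.join(imgs, 'Guitar.png'), 'w'):
            pass                                   # empty stand-in file

    pattern = os.path.join(base, 'people', '**', '*.png')
    found = sorted(glob.glob(pattern, recursive=True))

    # Show the paths relative to the scratch folder.
    found = [os.path.relpath(path, base) for path in found]

print(found)
```

Building the pattern with `os.path.join()` keeps the search OS-agnostic as well.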

# 7 Extracting file info

When dealing with files and folders, you _often have to extract the filename_, folder or extension. You can do this by simple _string manipulation_; for example if I want the filename and extension:

In [22]:
path = 'people/Ringo/imgs/Guitar.png'
filename = path.split(os.path.sep)[-1]
extension = filename.split('.')[-1]
print(filename, extension)

Guitar.png png


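Above, __`os.path.sep`__ is the __path separator__ (i.e. `\` or `/`) for the OS. I split the path where the separator occurred and picked the last element in the list. I use a similar strategy for the file extension. Note that this only works if the path uses the separator of the OS you are on; the hardcoded `/` in the path above would not be split on Windows.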

However, if you like, `os` provides __some simple functions for these tasks__.

In [23]:
path = 'people/Ringo/imgs/Guitar.png'

In [24]:
os.path.split(path)      # Split filename from the rest

('people/Ringo/imgs', 'Guitar.png')

In [25]:
os.path.splitext(path)   # Split extension

('people/Ringo/imgs/Guitar', '.png')

In [26]:
os.path.dirname(path)    # Show the directory

'people/Ringo/imgs'
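
These functions can be combined. Here is a short sketch that pulls out the folder, the filename, the name without the extension, and the extension in one go. I build the path with `os.path.join()` so that the example does not depend on a hardcoded separator; the variable names are my own.

```python
import os

path = os.path.join('people', 'Ringo', 'imgs', 'Guitar.png')

folder, filename = os.path.split(path)          # folder and filename
name, extension = os.path.splitext(filename)    # name and extension

print(filename)
print(name)
print(extension)
```

Unlike the string-splitting approach, `os.path.splitext()` keeps the `.` as part of the extension.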

# 8 Deleting stuff

Lastly, let me show you how to delete stuff.

If you want to remove a file:

In [34]:
os.remove('people/Ringo/imgs/Guitar.png')

This won’t work with directories. For an empty directory, use instead:

In [35]:
# You can think of 'rm' as 'remove'
os.rmdir('people/Ringo/imgs')

For a directory with files, use `shutil`:

In [36]:
# Again, you can think of 'rm' as 'remove'
shutil.rmtree('people/Ringo')

__Be careful!__

It goes without saying that you should be careful when using these functions. Unfortunately, I have had some miserable experiences by accidentally deleting files because I was more enthusiastic than sensible. With great power comes great responsibility, so __use with extreme caution!__
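
One way to be a little more sensible is to check what a path actually is before deleting it. The helper below, `remove_path()`, is a hypothetical function of my own (it is not part of `os` or `shutil`), and the sketch tries it out inside a temporary scratch folder rather than on real files.

```python
import os
import shutil
import tempfile


def remove_path(path):
    """Delete a file or a folder (with its contents) and report what was done."""
    if os.path.isfile(path):
        os.remove(path)
        return 'file removed'
    if os.path.isdir(path):
        shutil.rmtree(path)
        return 'folder removed'
    return 'nothing to remove'


with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, 'people', 'Ringo', 'imgs')
    os.makedirs(folder)
    file_path = os.path.join(folder, 'Guitar.png')
    with open(file_path, 'w'):
        pass                                        # empty stand-in file

    results = [
        remove_path(file_path),                     # a file
        remove_path(os.path.join(base, 'people')),  # a folder with sub-folders
        remove_path(file_path),                     # already gone
    ]

print(results)
```

Even with a helper like this, deleted files do not go to the recycle bin, so it is worth printing the path and double-checking it before running the real thing.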