<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

# What to expect in this chapter

We constantly interact with our operating system (OS) to get things done: to create, modify, move, copy and delete files and directories (folders). <br>Here we learn how to carry out these actions programmatically with some Python modules (e.g. `os`, `glob`, `shutil`). 

# 1 Important concepts

- 'Folder' and 'directory' refer to the same thing; the two terms are used interchangeably in this section.

## 1.1 Path

<span style='color:orange'>path</span>. A path is simply a way to specify a location on our computer, like an address. Following the path takes us to a file or folder. 
<br> We can specify a path absolutely (starting from the root of the file system) or relatively (starting from the current folder). Here is an absolute path to the `Desktop` on a macOS machine.

In [1]:
#A path is just text, so in Python we must write it as a string
path = '/Users/bradentan/Desktop'
print(path)

/Users/bradentan/Desktop

## 1.2 More about relative paths

When dealing with relative paths, it will be helpful to know `.` and `..` notation. 

|Notation|Meaning|
|:------:|:------|
|`.`|'this folder'|
|`..`|'one folder above'|

Hence:
- `.\data-files\data-01.txt` means the file `data-01.txt` in the folder `data-files` in the **current** folder <br>
- `..\data-files\data-01.txt` means the file `data-01.txt` in the folder `data-files` located in the folder **above**
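
To see how `.` and `..` resolve, here is a minimal sketch using `os.path.normpath()` from the standard library. The folder names `project` and `notebooks` are hypothetical and only serve as an illustration.

```python
import os

# Hypothetical location: a 'notebooks' folder inside a 'project' folder
current = os.path.join('project', 'notebooks')

# '.' stays in the current folder
same_level = os.path.normpath(
    os.path.join(current, '.', 'data-files', 'data-01.txt'))

# '..' climbs one folder up before going into data-files
one_up = os.path.normpath(
    os.path.join(current, '..', 'data-files', 'data-01.txt'))

print(same_level)
print(one_up)
```

The first path still contains `notebooks`, while the second one does not, because `..` moved us up into `project`.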

### macOS or Linux

macOS and Linux allow us to use `~` to refer to our home directory. For example, we can simply refer to a file on our Desktop using: <br>
`~/Desktop/data-01.txt`
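
Note that Python's `open()` does not expand `~` by itself. A small sketch of one way to handle this is to use `os.path.expanduser()`; the file name is just the example from above and need not exist.

```python
import os

# expanduser() replaces a leading '~' with the home directory
home = os.path.expanduser('~')

# Build the full path to a (possibly non-existent) file on the Desktop
desktop_file = os.path.join(home, 'Desktop', 'data-01.txt')

print(home)
print(desktop_file)
```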

## 1.3 Path separator

Today's major OSs (Windows, macOS, Linux) offer similar graphical environments. However, one of the most striking differences between Windows and macOS (or Linux) is the <span style='color:orange'>path separator</span>. 

Windows uses `\` as the path separator, while macOS (or Linux) uses `/`. So, the absolute path to a file on the Desktop on each of the systems will look like this:

|OS|Path|
|:-|:---|
|Windows|`C:\Users\bradentan\Desktop\data-01.txt`|
|macOS|`/Users/bradentan/Desktop/data-01.txt`|
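
As a rough illustration, both styles can be reproduced on any machine. This sketch assumes the `pathlib` module from the standard library, which is not otherwise used in this chapter; its 'pure' path classes only manipulate text and never touch the disk.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

parts = ['Users', 'bradentan', 'Desktop', 'data-01.txt']

# Pure paths only manipulate text, so both flavours work on any OS
windows_path = PureWindowsPath('C:\\', *parts)
posix_path = PurePosixPath('/', *parts)

print(windows_path)   # built with \ as the separator
print(posix_path)     # built with / as the separator
print(os.sep)         # the separator of the OS running this code
```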

## 1.4 Text files vs. Binary files

We can think of the files on our computer as either <span style='color:orange'>text files</span> or <span style='color: orange'>binary files</span>.<br>
Text files are simple: they can be opened, and their contents examined, by almost any software (e.g. Notepad, TextEdit, Jupyter). <br>Examples of text file formats are `.txt`, `.md`, or `.csv`.
___
Binary files, in contrast, require some processing to make sense of what they contain. For example, if we look at the raw data in a `.png` file, we will see gibberish. Additionally, some binary files will only run on specific OSs. 
<br> For example, the `Excel.app` on a Mac will not run on Windows, nor will the `Excel.exe` file run on macOS (or Linux). 
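
Underneath, every file is stored as bytes; a text file is one whose bytes decode into readable characters. The sketch below writes a tiny text file into a temporary folder and reads it back both ways. The file name `note.txt` is hypothetical.

```python
import os
import tempfile

# Work in a temporary folder so no existing file is touched
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'note.txt')   # hypothetical file name

    with open(path, 'w', encoding='utf-8') as file:
        file.write('Hello')

    with open(path, 'r', encoding='utf-8') as file:
        as_text = file.read()      # decoded into a str

    with open(path, 'rb') as file:
        as_bytes = file.read()     # raw bytes, no decoding

print(as_text)
print(as_bytes)
print(list(as_bytes))              # the numbers actually stored
```

Opening with `'rb'` (read, binary) is also how we would read a file such as a `.png`, whose bytes do not decode into sensible text.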

## 1.5 Extensions

Files are usually named to end with an <span style='color:orange'>extension</span> separated from the name by a `.` like `name.extension`. <br> This `extension` lets the OS know what software or app to use to extract the details in a file. <br>
For example, `.xlsx` means use Excel and `.pptx` means use PowerPoint. 

# 2 Opening and closing files

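The recommended way to open a file for reading or writing is the `with` statement (called a <span style='color:orange'>context manager</span>). It looks a little more advanced than opening and closing the file ourselves, but it closes the file for us automatically, even if an error occurs. 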

## 2.1 Reading data

In [None]:
#This is what we would typically do to read a text file:
with open('spectrum-01.txt', 'r') as file:
    file_content=file.read()
print(file_content)
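
To check that the `with` statement really closes the file for us, here is a small sketch that inspects the `closed` attribute of the file object. It uses a temporary folder and a hypothetical file name `demo.txt` so that it does not depend on `spectrum-01.txt`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')   # hypothetical file name

    with open(path, 'w') as file:
        file.write('some data')
        open_inside = not file.closed   # still open inside the block

    closed_after = file.closed          # closed once the block ends

print(open_inside, closed_after)
```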

## 2.2 Writing data

In [6]:
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'

Here we use two writing methods so that we know how each one works.

### Writing to a file in one go

In [8]:
with open('my-text-once.txt', 'w') as file:
    file.write(text)

We now have a `my-text-once.txt` file in our directory. This is in the same folder that this notebook is in. `'w'` indicates that we have opened the file for writing; if the file already exists, its contents are overwritten. 

### Writing to a file, line by line

Writing a line at a time is useful when dealing with data generated on the fly. <br>
Since we don't have the data now, we will split the lines of the previous text.

In [9]:
with open('my-text-lines.txt', 'w') as file:
    for line in text.splitlines():
        #splitlines() drops the newline characters, so we add one back
        file.write(line + '\n')

Writing to a file is slow compared with working on data in memory, so many small writes inside a loop can slow things down. 
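
When all the data is already in memory, one possible alternative is to join the lines first and call `write()` only once. This is only a sketch with made-up data in a temporary folder; whether it is noticeably faster depends on the amount of data and on the system.

```python
import os
import tempfile

lines = [f'reading {i}' for i in range(5)]   # hypothetical data

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'readings.txt')

    # Join the lines first, then call write() once
    with open(path, 'w') as file:
        file.write('\n'.join(lines) + '\n')

    # Read the file back to confirm every line was written
    with open(path, 'r') as file:
        lines_read_back = file.read().splitlines()

print(lines_read_back)
```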

# 3 Some useful packages

Here are some packages to programmatically create, copy and delete files and folders.

|Package|Primarily used for|
|:------|:-----------------|
|`os`|To 'talk' to the OS to create, modify, delete folders and write OS-agnostic code|
|`glob`|To search for files|
|`shutil`|To copy files|

`shutil` offers some functions that `os` does not, for example `shutil.copy()`. 
<br>These packages are already part of the Python standard library, so we do not have to install them. 

In [10]:
#Let's import the packages first
import os 
import glob 
import shutil

# 4 OS safe paths

Consider a file `data-01.txt` in the sub-directory `sg-data` of the directory `all-data`. <br>
`all-data` --> `sg-data` --> `data-01.txt` <br>
To build a path to `data-01.txt` that works on any OS, we can use `os.path.join()`:

In [11]:
path=os.path.join('.', 'all-data', 'sg-data', 'data-01.txt')
print(path)

./all-data/sg-data/data-01.txt


So `os.path.join()` builds our path with either `/` or `\`, whichever the OS running the code needs! 

# 5 Folders

## 5.1 Creating folders

We can create a folder programmatically using `os.mkdir()`. This is very useful because we can write a tiny bit of code to organise our data!

In [16]:
#Example: We can store information about people
os.mkdir('people')
for person in ['John', 'Paul', 'Ringo']:
    path=os.path.join('people', person)
    print(f'creating {path}')
    os.mkdir(path)

FileExistsError: [Errno 17] File exists: 'people'

## 5.2 Checking for existence

As seen above, Python will complain if we try to run the code twice, saying that the file already exists. Hence, when we create new resources, it is a good idea to check whether they already exist! 
<br>
There are two ways to do this: use `try-except` with the `FileExistsError`, or use `os.path.exists()`. 

### Using try-except

In [17]:
for person in ['John', 'Paul', 'Ringo']:
    path=os.path.join('people', person)
    try: 
        os.mkdir(path)
        print(f'creating {path}')
    except FileExistsError:
        print(f'{path} already exists; skipping creation.')

people/John already exists; skipping creation.
people/Paul already exists; skipping creation.
people/Ringo already exists; skipping creation.


### Using os.path.exists()

In [19]:
for person in ['John', 'Paul', 'Ringo']:
    path=os.path.join('people', person)
    if os.path.exists(path):
        print(f'{path} already exists, skipping creation.')
    else:
        os.mkdir(path)
        print(f'creating {path}!')

people/John already exists, skipping creation.
people/Paul already exists, skipping creation.
people/Ringo already exists, skipping creation.
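
A third option, sketched here as an alternative to the two approaches above, is `os.makedirs()` with `exist_ok=True`. It creates any missing parent folders and stays silent if the folder is already there. The sketch works inside a temporary folder so it does not touch the `people` folder created earlier.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    for person in ['John', 'Paul', 'Ringo']:
        path = os.path.join(base, 'people', person)
        os.makedirs(path, exist_ok=True)   # also creates 'people' if needed
        os.makedirs(path, exist_ok=True)   # running it again raises no error

    created = sorted(os.listdir(os.path.join(base, 'people')))

print(created)
```

The trade-off is that we no longer get a message telling us which folders were skipped.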


## 5.3 Copying files

Here is how to copy files programmatically. 
1. There should be a copy of the 73 logo (`sp2273_logo.png`) in the current folder.
2. We will then copy this into the folders created for 'John', 'Paul', and 'Ringo'.

In [22]:
for person in ['John', 'Paul', 'Ringo']:
    path_to_destination = os.path.join('people', person)
    shutil.copy('sp2273_logo.png', path_to_destination)
    print(f'Copied file to {path_to_destination}')

Copied file to people/John
Copied file to people/Paul
Copied file to people/Ringo


Let's say we want the images in a sub-folder called `imgs` in each person's directory. We can do this by first creating the `imgs` folders and then moving the logo file into each one. 

In [23]:
for person in ['John', 'Paul', 'Ringo']:
    # Create folder 'imgs'
    path_to_imgs = os.path.join('people', person, 'imgs')
    if not os.path.exists(path_to_imgs):
        os.mkdir(path_to_imgs)

    # Move logo file
    current_path_of_logo = os.path.join('people', person, 'sp2273_logo.png')
    new_path_of_logo = os.path.join('people', person, 'imgs', 'sp2273_logo.png')

    shutil.move(current_path_of_logo, new_path_of_logo)
    print(f'Moved logo to {new_path_of_logo}')

Moved logo to people/John/imgs/sp2273_logo.png
Moved logo to people/Paul/imgs/sp2273_logo.png
Moved logo to people/Ringo/imgs/sp2273_logo.png
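
The copy and move steps above could also be combined by copying the file straight into `imgs`. The following is only a sketch: it runs in a temporary folder, and the 'logo' it creates is a stand-in file with placeholder bytes, not the real image.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Stand-in for the logo; placeholder bytes, not a real image
    logo = os.path.join(base, 'sp2273_logo.png')
    with open(logo, 'wb') as file:
        file.write(b'placeholder')

    copied = []
    for person in ['John', 'Paul', 'Ringo']:
        path_to_imgs = os.path.join(base, 'people', person, 'imgs')
        os.makedirs(path_to_imgs, exist_ok=True)

        # shutil.copy() returns the path of the newly created file
        new_path = shutil.copy(logo, path_to_imgs)
        copied.append(os.path.exists(new_path))

print(copied)
```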


<span style='color:red'>**FYI**</span>: We can also do this really fast using the terminal and its loop structure. 

# 6 Listing and looking for files

If we want to know what files are in a folder, then `glob` is really useful for this.

In [24]:
#Example 1: If we want all the files in the current directory 

glob.glob('*')

['my-text-once.txt',
 'files,_folders_&_os_(need).ipynb',
 'sp2273_logo.png',
 'people',
 'my-text-lines.txt']

`*` is called a <span style='color:orange'>wildcard</span> and is read as 'anything'. So in this case, we are asking `glob` to give us anything in the folder.

In [25]:
#Example 2: Refining our search to ask glob for files that match a pattern
glob.glob('peo*')

['people']

In this case, we refined our search and asked `glob` to give us only those files that match the pattern `peo` followed by `*`, i.e. anything. 

In [29]:
#Example 3: Using glob to list the contents of matching folders
glob.glob('peo*/*')

['people/Paul', 'people/John', 'people/Ringo']

The pattern ends with `/*`, which means `glob` looks for any files or directories inside the matching directories. It does not return items in the starting folder itself, only those within directories whose names start with `peo`.

In [30]:
#Example 4: Using glob to see the detailed structure
glob.glob('people/**', recursive=True)

['people/',
 'people/Paul',
 'people/Paul/imgs',
 'people/Paul/imgs/sp2273_logo.png',
 'people/John',
 'people/John/imgs',
 'people/John/imgs/sp2273_logo.png',
 'people/Ringo',
 'people/Ringo/imgs',
 'people/Ringo/imgs/sp2273_logo.png']

In this case, we need to tell `glob` to search recursively (dig through all sub-directories) by putting `recursive=True`. 
<br>
The two wildcards `**` mean 'all sub-directories'. 

In [31]:
#Example 5: Choosing file formats
glob.glob('people/**/*.png', recursive=True)

['people/Paul/imgs/sp2273_logo.png',
 'people/John/imgs/sp2273_logo.png',
 'people/Ringo/imgs/sp2273_logo.png']

In this case, we're asking `glob` to filter through the whole structure of `people` and show us those files that match the pattern `anything.png`.
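
Notice in the outputs above that `glob` does not return results in alphabetical order. If the order matters, we can wrap the call in `sorted()`. The sketch below builds a small throwaway folder structure (the names are made up) and builds the pattern with `os.path.join()` so that it works on any OS.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Build a small, hypothetical folder structure to search
    for person in ['John', 'Paul']:
        imgs = os.path.join(base, 'people', person, 'imgs')
        os.makedirs(imgs)
        for name in ['logo.png', 'notes.txt']:
            with open(os.path.join(imgs, name), 'w') as file:
                file.write('')

    # OS-safe version of the pattern 'people/**/*.png'
    pattern = os.path.join(base, 'people', '**', '*.png')
    matches = sorted(glob.glob(pattern, recursive=True))

    # Show the paths relative to the temporary folder
    relative = [os.path.relpath(match, base) for match in matches]

print(relative)
```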

# 7 Extracting file info

When dealing with files and folders, we might need to extract the file name, folder or extension. We can do this with simple string manipulation.

In [32]:
path = 'people/Ringo/imgs/sp2273_logo.png'
filename = path.split(os.path.sep)[-1]
extension = filename.split('.')[-1]
print(filename, extension)

sp2273_logo.png png


`os.path.sep` is the path separator (`/` or `\`).

However, `os` provides simple functions for these tasks.

In [33]:
path = 'people/Ringo/imgs/sp2273_logo.png'

In [34]:
os.path.split(path)      # Split filename from the rest

('people/Ringo/imgs', 'sp2273_logo.png')

In [35]:
os.path.splitext(path)   # Split extension

('people/Ringo/imgs/sp2273_logo', '.png')

In [36]:
os.path.dirname(path)    # Show the directory

'people/Ringo/imgs'
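
These functions can be chained. As a small sketch, here is one way to get the folder, the file name, the name without its extension, and the extension from the same path.

```python
import os

path = 'people/Ringo/imgs/sp2273_logo.png'

# Split off the file name first, then split the extension from the name
folder, filename = os.path.split(path)
stem, extension = os.path.splitext(filename)

print(folder)
print(filename)
print(stem)
print(extension)
```

Unlike the string manipulation shown earlier, `os.path.splitext()` keeps the `.` as part of the extension.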

# 8 Deleting stuff

Lastly, we learn how to **delete** stuff. 

In [37]:
os.remove('people/Ringo/imgs/sp2273_logo.png')

In [38]:
#os.remove() does not work with directories; os.rmdir() removes a directory, but only if it is empty:
os.rmdir('people/Ringo')

OSError: [Errno 66] Directory not empty: 'people/Ringo'

In [40]:
#For directories with files:
shutil.rmtree('people/Ringo')

<span style='color:red'> **BE CAREFUL WHEN USING THESE FUNCTIONS!!!** </span>
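
One way to experiment safely is to practise in a temporary folder. This sketch recreates the situation above (a folder that still contains `imgs`) and shows that `os.rmdir()` refuses to remove it while `shutil.rmtree()` removes everything.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, 'people', 'Ringo')
    os.makedirs(os.path.join(folder, 'imgs'))

    try:
        os.rmdir(folder)          # fails: 'imgs' is still inside
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    shutil.rmtree(folder)         # removes the folder and everything in it
    still_there = os.path.exists(folder)

print(rmdir_failed, still_there)
```

Files deleted with these functions do not go to the Trash or Recycle Bin, so it is worth double-checking the path before running them.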