<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

# What to expect in this chapter

This chapter covers how to communicate with the OS to **create, modify, move, copy, and delete** files and directories (folders).

# Important concepts

**folder** and **directory** refer to the same thing.

## Path

- File address

- Tells us how to find a file or folder 

- You can specify it absolutely or relatively.

In [1]:
path = r'C:\Users\Chammika\Desktop\data-01.txt' # example of a file path, stored as a raw string so the backslashes are kept as-is

## More about relative paths

`.` means '**this**  folder'

`..` means 'one folder **above**' 

For example: 

`.\data-files\data-01.txt` means the file `data-01.txt` in the folder data-files in the **current** folder.


`..\data-files\data-01.txt` means the file `data-01.txt` in the folder data-files located in the **folder above**.
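
To see how `.` and `..` resolve, here is a minimal sketch using `os.path.normpath()`. The starting folder `home/user/project` is a hypothetical name chosen only for illustration.

```python
import os

# Hypothetical starting folder, built with os.path.join so the separator suits the OS.
current = os.path.join('home', 'user', 'project')

# '.' stays in the current folder; '..' climbs one level before descending.
in_current = os.path.normpath(os.path.join(current, '.', 'data-files', 'data-01.txt'))
in_parent = os.path.normpath(os.path.join(current, '..', 'data-files', 'data-01.txt'))

print(in_current)  # data-files inside 'project'
print(in_parent)   # data-files next to 'project', inside 'user'
```

`normpath()` only tidies the text of the path; it does not check whether the file exists.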

### macOS or Linux

macOS and Linux allow you to use `~` to refer to your home directory.

`~/Desktop/data-01.txt`
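
Python's `open()` does not expand `~` by itself, so a path that starts with `~` has to be expanded first. A small sketch, assuming you want a file called `data-01.txt` on your Desktop:

```python
import os

# expanduser() replaces a leading '~' with the path of your home directory.
home = os.path.expanduser('~')

# Build the rest of the path with os.path.join so the separator suits the OS.
desktop_file = os.path.join(home, 'Desktop', 'data-01.txt')
print(desktop_file)
```

The printed path depends on your username and OS, so no output is shown here.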

## Path separator

#### !! Windows uses `\` as the path separator while macOS (or Linux) uses `/`.

Windows: `C:\Users\chammika\Desktop\data-01.txt`

macOS or Linux: `/Users/chammika/Desktop/data-01.txt`
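
The two conventions can be compared on any computer, because Python ships both sets of path rules as the modules `ntpath` (Windows) and `posixpath` (macOS or Linux). This is only a sketch for comparison; in everyday code you would use `os.path`, which picks the right one for you.

```python
import ntpath      # Windows path rules, importable on any OS
import posixpath   # macOS/Linux path rules, importable on any OS
import os

parts = ['Users', 'chammika', 'Desktop', 'data-01.txt']

windows_style = ntpath.join('C:\\', *parts)   # joined with backslashes
posix_style = posixpath.join('/', *parts)     # joined with forward slashes

print(windows_style)
print(posix_style)
print(repr(os.sep))   # the separator used by the OS running this code
```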

## Text files vs. Binary files

**Text files**: 

- Simple, and can be opened and read directly, but can get bulky.

- Contents can be examined by almost any software (e.g., Notepad, TextEdit, Jupyter, …).

- Typical extensions are `.txt`, `.md` or `.csv`.

**Binary files**: 

- Require some processing to make sense of what they contain (e.g., if you look at the raw data in a `.png` file, you will see gibberish).

- Binary programs only run on specific OSs (e.g., `Excel.app` on a Mac will not run on Windows, nor will the `Excel.exe` file run on macOS or Linux).

- Some reasons for having binary files are speed and size.
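
The difference can be seen by reading raw bytes with the `'rb'` mode. In this sketch the file names are hypothetical, and the 'image' is only the eight-byte signature that every PNG file starts with, not a real picture.

```python
import os
import tempfile

png_signature = b'\x89PNG\r\n\x1a\n'   # the first eight bytes of every PNG file

# Work inside a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as folder:
    text_path = os.path.join(folder, 'note.txt')
    binary_path = os.path.join(folder, 'fake.png')

    with open(text_path, 'w', encoding='utf-8') as file:
        file.write('plain text')
    with open(binary_path, 'wb') as file:      # 'wb' = write bytes
        file.write(png_signature)

    with open(text_path, 'rb') as file:        # 'rb' = read bytes
        text_bytes = file.read()
    with open(binary_path, 'rb') as file:
        binary_bytes = file.read()

# The text file decodes cleanly into readable characters.
print(text_bytes.decode('utf-8'))

# The binary data is not valid text, so decoding it fails.
try:
    binary_bytes.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)
```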

## Extensions

- Separated from the name of a file by a `.`
  
- `name.extension`
  
- Lets the OS know what software or app to use to extract the details in a file.

- `.xlsx`: use Excel
  
- `.pptx`: use PowerPoint.

- **Be careful about changing the extension of a file, as it will make your OS cough and throw a fit.** Try changing a `.xlsx` to `.txt` and double-click.

# Opening and closing files

Use the `with` statement to open files; it also closes them for you.

## Reading data

The `open()` function ‘opens’ your file. The `'r'` specifies that I only want to `r`ead from the file. Using `with` frees you from worrying about closing the file after you are done.

In [2]:
with open('spectrum-01.txt', 'r') as file:
    file_content = file.read()

print(file_content)

FileNotFoundError: [Errno 2] No such file or directory: 'spectrum-01.txt'

In [3]:
# the extension is part of the file name, so asking for a different extension results in a FileNotFoundError

with open('spectrum-01.xlsx', 'r') as file:
    file_content = file.read()

print(file_content)

FileNotFoundError: [Errno 2] No such file or directory: 'spectrum-01.xlsx'
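
`open()` raises `FileNotFoundError` whenever the named file is not where Python is looking. The sketch below is self-contained: it first creates a small file (the name `spectrum-demo.txt` and its contents are placeholders, not real data), reads it back, and then shows one way to handle a missing file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'spectrum-demo.txt')   # hypothetical file name

    # Create a small file so there is something to read.
    with open(path, 'w') as file:
        file.write('first line\nsecond line\n')

    # Read the whole file into one string.
    with open(path, 'r') as file:
        file_content = file.read()

    # The file was closed automatically when the with block ended.
    closed_after_with = file.closed

    # Opening a file that does not exist raises FileNotFoundError.
    missing = os.path.join(folder, 'no-such-file.txt')
    try:
        with open(missing, 'r') as file:
            file.read()
        found = True
    except FileNotFoundError:
        found = False

print(file_content)
print(closed_after_with, found)
```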

## Writing data

##### Writing to disk is slow compared with working in memory, so doing it inside a loop will slow your code down.

In [4]:
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'

### Writing to a file in one go

In [5]:
with open('my-text-once.txt', 'w') as file:
    file.write(text)

You should now have a file `my-text-once.txt` in your directory with the above text. 

`'w'` indicates that I am opening the file for `w`riting.

### Writing to a file, line by line

In [6]:
with open('my-text-lines.txt', 'w') as file:
    for line in text.splitlines():
        file.write(line + '\n') # splitlines() strips the line breaks, so add them back
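
One detail worth checking when writing line by line: `splitlines()` removes the line breaks, and neither `write()` nor `writelines()` adds them back. A minimal sketch, using a short placeholder text and a hypothetical file name, that writes the lines and reads them back to confirm they stayed separate:

```python
import os
import tempfile

text = 'first line\nsecond line\nthird line'   # short placeholder text

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'my-text-lines-demo.txt')   # hypothetical name

    with open(path, 'w') as file:
        # writelines() takes a collection of strings and writes them as they are,
        # so each line gets its own '\n' here.
        file.writelines(line + '\n' for line in text.splitlines())

    with open(path, 'r') as file:
        lines_read = file.read().splitlines()

print(lines_read)
```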

# Some useful packages

| Package  | Primarily used for                                                                |
|----------|-----------------------------------------------------------------------------------|
| `os`     | To ‘talk’ to the OS to create, modify, delete folders and write OS-agnostic code. |
| `glob`   | To search for files.                                                              |
| `shutil` | To copy, move, and delete files and folders.                                      |

In [7]:
import os
import glob
import shutil

# OS safe paths

Consider a file `data-01.txt` in the sub-directory `sg-data` of the directory `all-data`.

`all-data` --> `sg-data` --> `data-01.txt`

If I want to access `data-01.txt` all I have to do is:

In [8]:
path = os.path.join('.', 'all-data', 'sg-data', 'data-01.txt')
print(path)

./all-data/sg-data/data-01.txt


Using `os.path.join()` joins the parts of your path with either `/` or `\`, whichever the OS needs. This means the same code will run unchanged on Windows, macOS, and Linux.

# Folders

## Creating folders

#### Using `os.mkdir()`

In [9]:
os.mkdir('people')

for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    print(f'Creating {path}') # print statement is optional
    os.mkdir(path)

# to view results, go under computer files > sp2273 > learning portfolio 

Creating people/John
Creating people/Paul
Creating people/Ringo


In [10]:
os.mkdir('colours')

for colour in ['Blue', 'White', 'Purple']:
    path = os.path.join('colours', colour)
    os.mkdir(path)

# to view results, go under computer files > sp2273 > learning portfolio 

## Checking for existence

Python will complain if you try to run this code twice, saying that the file (yes, Python refers to folders as files) already exists. So, when you create resources, it is a good idea to check if they already exist. There are two ways to do this: use `try-except` with `FileExistsError`, or use `os.path.exists()`.

### Using try-except

In [11]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    try:
        os.mkdir(path)
        print(f'Creating {path}')
    except FileExistsError:
        print(f'{path} already exists; skipping creation.')

people/John already exists; skipping creation.
people/Paul already exists; skipping creation.
people/Ringo already exists; skipping creation.


### Using os.path.exists()

In [12]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    if os.path.exists(path):
        print(f'{path} already exists; skipping creation.')
    else:
        os.mkdir(path)
        print(f'Creating {path}')

people/John already exists; skipping creation.
people/Paul already exists; skipping creation.
people/Ringo already exists; skipping creation.
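
A third option, not used in the cells above, is `os.makedirs()` with `exist_ok=True`. It creates any missing parent folders too and stays silent if the folder is already there. The sketch below runs the creation twice inside a temporary folder to show that the second pass causes no error.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    for attempt in range(2):                      # run the same code twice
        for person in ['John', 'Paul', 'Ringo']:
            path = os.path.join(root, 'people', person)
            # Creates 'people' as well if needed; no error if path already exists.
            os.makedirs(path, exist_ok=True)

    created = sorted(os.listdir(os.path.join(root, 'people')))

print(created)
```

The trade-off is that you no longer find out whether the folder was newly created or already present.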


## Copying files

In [13]:
for person in ['John', 'Paul', 'Ringo']:
    path_to_destination = os.path.join('people', person) # the folder to copy the image into
    shutil.copy('sp2273_logo.png', path_to_destination) # copy the image into that folder
    print(f'Copied file to {path_to_destination}')

FileNotFoundError: [Errno 2] No such file or directory: 'sp2273_logo.png'

Let’s say I want all the images in a sub-folder called `imgs` in each person’s directory. I can do this by first creating the folder `imgs` and then moving the logo file into it. Let's take a look at how this can be done:

In [14]:
for person in ['John', 'Paul', 'Ringo']:
    # Create folder 'imgs'
    path_to_imgs = os.path.join('people', person, 'imgs')
    if not os.path.exists(path_to_imgs):
        os.mkdir(path_to_imgs)

    # Move logo file
    current_path_of_logo = os.path.join('people', person, 'sp2273_logo.png')
    new_path_of_logo = os.path.join('people', person, 'imgs', 'sp2273_logo.png')

    shutil.move(current_path_of_logo, new_path_of_logo)
    print(f'Moved logo to {new_path_of_logo}')

FileNotFoundError: [Errno 2] No such file or directory: 'people/John/sp2273_logo.png'
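
Both cells above fail when the logo file is not in the working directory. The sketch below avoids that by making its own stand-in file (`logo-demo.png`, a hypothetical name holding placeholder bytes rather than a real image) inside a temporary folder, then copying and moving it as described.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Stand-in for the logo so the example runs anywhere.
    logo = os.path.join(root, 'logo-demo.png')
    with open(logo, 'wb') as file:
        file.write(b'placeholder')

    final_paths = []
    for person in ['John', 'Paul', 'Ringo']:
        person_dir = os.path.join(root, 'people', person)
        imgs_dir = os.path.join(person_dir, 'imgs')
        os.makedirs(imgs_dir, exist_ok=True)

        # copy() leaves the original in place and returns the path of the new copy.
        copied = shutil.copy(logo, person_dir)

        # move() relocates the copy into 'imgs' and returns its new path.
        moved = shutil.move(copied, os.path.join(imgs_dir, 'logo-demo.png'))
        final_paths.append(os.path.relpath(moved, root))

    all_moved = all(os.path.exists(os.path.join(root, p)) for p in final_paths)
    original_kept = os.path.exists(logo)

print(final_paths)
print(all_moved, original_kept)
```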

# Listing and looking for files

If I want to know what files are in a folder, then `glob` makes short work of this.

1) I want **all** the files in the current directory.



In [15]:
glob.glob('*') 

# The * is called a wildcard and is read as ‘anything’. So, I am asking glob to give me anything in the folder.

['my-text-once.txt',
 'files,_folders_&_os_(need).ipynb',
 'colours',
 'people',
 'my-text-lines.txt']

2) Give only those files that **match the pattern ‘peo’ followed by ‘anything’**.

In [16]:
glob.glob('peo*')

['people']

3) I now want to know **what is inside** the folders that start with `peo`.

In [17]:
glob.glob('peo*/*')

['people/Paul', 'people/John', 'people/Ringo']

4)  I want to see the **whole, detailed structure** of the folder `people`.

In [18]:
glob.glob('people/**', recursive=True)

# tell glob to search recursively (i.e. dig through all sub-directories) by putting recursive=True
# use two wildcards ** to say ‘this folder and all its sub-directories’

['people/', 'people/Paul', 'people/John', 'people/John/imgs', 'people/Ringo']

5) Now, I want **only the `.png` files**. So, I just need to modify my pattern. I am asking `glob` to go through the whole structure of `people` and show me those files with the pattern ‘anything’`.png`!

In [19]:
glob.glob('people/**/*.png', recursive=True)

[]
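
The empty list above simply means there were no `.png` files under `people` when the cell ran. To see the pattern find something, this sketch builds a small folder tree in a temporary location (file names are hypothetical, contents are placeholder bytes) and searches it.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build people/<person>/imgs/logo-demo.png for each person.
    for person in ['John', 'Paul', 'Ringo']:
        imgs_dir = os.path.join(root, 'people', person, 'imgs')
        os.makedirs(imgs_dir)
        with open(os.path.join(imgs_dir, 'logo-demo.png'), 'wb') as file:
            file.write(b'placeholder')

    # A file that should NOT match the pattern.
    with open(os.path.join(root, 'people', 'notes.txt'), 'w') as file:
        file.write('not an image')

    pattern = os.path.join(root, 'people', '**', '*.png')
    found = glob.glob(pattern, recursive=True)

    # glob makes no promise about order, so sort for a stable result.
    matches = sorted(os.path.relpath(p, root) for p in found)

print(matches)
```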

# Extracting file info

In [20]:
# TO EXTRACT FILE NAME, FOLDER, OR EXTENSION

path = 'people/Ringo/imgs/sp2273_logo.png'

# os.path.sep is the path separator (i.e. \ or /) for the OS.
# split the path wherever the separator occurs and pick the last element in the list.
# (the path above is written with /, so this split relies on the OS separator being / too)
filename = path.split(os.path.sep)[-1] 

extension = filename.split('.')[-1]
print(filename, extension)

sp2273_logo.png png


In [21]:
path = 'people/Ringo/imgs/sp2273_logo.png'

In [22]:
os.path.split(path)      # Split filename from the rest

('people/Ringo/imgs', 'sp2273_logo.png')

In [23]:
os.path.splitext(path)   # Split extension

('people/Ringo/imgs/sp2273_logo', '.png')

In [24]:
os.path.dirname(path)    # Show the directory

'people/Ringo/imgs'
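
These functions can be combined to pull a path apart in one go. The sketch below uses `os.path.basename()`, which is not used in the cells above but belongs to the same module; unlike splitting on `os.path.sep` by hand, it also copes with a `/`-separated path on Windows.

```python
import os

path = 'people/Ringo/imgs/sp2273_logo.png'

folder = os.path.dirname(path)                 # everything before the last separator
filename = os.path.basename(path)              # everything after the last separator
stem, extension = os.path.splitext(filename)   # the extension keeps its dot

print(folder)
print(filename)
print(stem, extension)
```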

# Deleting stuff

In [25]:
os.remove('people/Ringo/imgs/sp2273_logo.png') # remove a single file

FileNotFoundError: [Errno 2] No such file or directory: 'people/Ringo/imgs/sp2273_logo.png'

In [26]:
os.rmdir('people/Ringo') # remove an EMPTY directory

In [27]:
shutil.rmtree('people/Ringo') # remove a NON-EMPTY directory and everything inside it

FileNotFoundError: [Errno 2] No such file or directory: 'people/Ringo'
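
The errors above appear because the file and folder were already gone by the time those cells ran. Here is a self-contained sketch of the three deletion tools, run inside a temporary folder with hypothetical file names, including what happens when `os.rmdir()` meets a folder that is not empty.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    ringo = os.path.join(root, 'people', 'Ringo')
    imgs = os.path.join(ringo, 'imgs')
    os.makedirs(imgs)

    logo = os.path.join(imgs, 'logo-demo.png')
    with open(logo, 'wb') as file:
        file.write(b'placeholder')

    # 1. Remove a single file.
    os.remove(logo)
    file_gone = not os.path.exists(logo)

    # 2. rmdir() only works on empty folders; 'Ringo' still contains 'imgs'.
    try:
        os.rmdir(ringo)
        rmdir_worked = True
    except OSError:
        rmdir_worked = False

    # 3. rmtree() removes the folder and everything inside it.
    #    Checking first avoids a FileNotFoundError if it is already gone.
    if os.path.exists(ringo):
        shutil.rmtree(ringo)
    folder_gone = not os.path.exists(ringo)

print(file_gone, rmdir_worked, folder_gone)
```

Deletions made this way do not go to the recycle bin, so it is worth double-checking the path before running them on real data.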