<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

# What to expect in this chapter

Exposure to Python modules: `os`,`glob`,`shutil`.\
Writing code that runs on both macOS and Windows.

# 1 Important concepts

Concepts to navigate OS efficiently.\
Folder and directory refer to the same thing.

## 1.1 Path

Way to specify location on computer to find file or folder.\
2 ways to specify path:
- Absolute.
- Relative.

E.g.: Absolute path for `files,_folders_&_os_(need)` in Learning Portfolio within C Drive (C:).

In [57]:
"C:\learning-portfolio-violetwj\files, folders & os\files,_folders_&_os_(need).ipynb"

'C:\\learning-portfolio-violetwj\x0ciles, folders & os\x0ciles,_folders_&_os_(need).ipynb'

In [58]:
"C:\\learning-portfolio-violetwj\\files, folders & os\\files,_folders_&_os_(need).ipynb"

'C:\\learning-portfolio-violetwj\\files, folders & os\\files,_folders_&_os_(need).ipynb'

Note `\` in a path has to be escaped in Python as `\\`.\
If `\` is not escaped, `\f` is interpreted as `\x0c`, the form feed character.\
Implies that a path copied from a file or folder shows only the final form of the path, not the string literal Python needs to produce it.
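Raw strings are another way around the escaping problem above. A minimal sketch, reusing the folder path from this section; the variable names are for illustration only:

```python
# A raw string (prefix r) keeps backslashes exactly as typed.
escaped = "C:\\learning-portfolio-violetwj\\files, folders & os"
raw = r"C:\learning-portfolio-violetwj\files, folders & os"

print(raw == escaped)       # Both literals describe the same path.
print("\f" == "\x0c")       # Unescaped \f is one form feed character.
```

Note that a raw string cannot end with a single `\`.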

## 1.2 More about relative paths

|Notation|Meaning|
|:-:|:-:|
|`.`|'this folder'|
|`..`|'one folder above'|

Given current directory is `C:\learning-portfolio-violetwj\files, folders & os`:
- `.\files,_folders_&_os_(need).ipynb` means that the `.ipynb` file is in the `files, folders & os` folder (i.e. **current** folder).
- `..\files,_folders_&_os_(need).ipynb` means that the `.ipynb` file is in the `learning-portfolio-violetwj` folder (i.e. folder **above**).
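The two bullets above can be checked in code. A sketch using `ntpath` (the Windows flavour of `os.path`, importable on any OS) so the result does not depend on the machine; variable names are for illustration only:

```python
import ntpath   # Windows flavour of os.path; importable on any OS.

current = r"C:\learning-portfolio-violetwj\files, folders & os"
name = "files,_folders_&_os_(need).ipynb"

# '.' stays in the current folder, '..' climbs one folder up.
here = ntpath.normpath(ntpath.join(current, ".", name))
above = ntpath.normpath(ntpath.join(current, "..", name))

print(here)
print(above)
```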

### macOS or Linux

Uses `~` to refer to home directory.\
Paths starting with `~` are relative to home directory, not current directory.\
E.g.: Accessing `Desktop` with `~/Desktop`.
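Python does not expand `~` by itself when a path is passed to `open()`. A minimal sketch using `os.path.expanduser()`; the printed path depends on the user account, and the `Desktop` folder is only assumed to exist:

```python
import os

# os.path.expanduser() replaces a leading '~' with the home directory.
desktop = os.path.expanduser(os.path.join("~", "Desktop"))
print(desktop)
```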

## 1.3 Path separator

Windows and macOS/ Linux use different path separators:
- Windows: `\` => `C:\learning-portfolio-violetwj\files, folders & os\files,_folders_&_os_(need).ipynb`.
- macOS/ Linux: `/` => `/learning-portfolio-violetwj/files, folders & os/files,_folders_&_os_(need).ipynb`.

To share code and make it work on both systems, cannot hardcode either path separator.\
Implies that method to show path name must not specify either `\` or `/`.\
Solved by using Python `os` package.
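A minimal sketch of the idea, assuming the folder names from this chapter: the separator is taken from `os.sep` instead of being typed by hand.

```python
import os

parts = ["learning-portfolio-violetwj", "files, folders & os"]

print(repr(os.sep))             # Separator of the OS running this code.
folder = os.path.join(*parts)   # Joined without typing '\' or '/'.
print(folder)
```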

## 1.4 Text files vs. Binary files

All files are either **text** files or **binary** files.

Text files:
- Simple, can be opened and contents interpreted by most software.
- E.g. of text file formats: `.txt`, `.md`, `.csv`.
- Can get bulky in size despite being simple.

Binary files:
- Need to be processed by suitable software to make sense of them.
- Can be OS-specific: `Excel.app` on macOS, `Excel.exe` on Windows; not interchangeable.
- E.g. of binary file formats: `.png`.
- Have advantage of faster speeds and smaller sizes.
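The difference also shows in how Python opens files. A sketch, assuming a throwaway file with the hypothetical name `demo.txt` in a temporary folder: text mode returns `str`, binary mode (`'rb'`) returns raw `bytes`.

```python
import os
import tempfile

# Hypothetical file name, created in a temporary folder for this demo only.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")

    with open(path, "w", encoding="utf-8") as file:   # Text mode.
        file.write("0.217\t0.3\n")

    with open(path, "r", encoding="utf-8") as file:   # Text mode gives str.
        as_text = file.read()

    with open(path, "rb") as file:                    # Binary mode gives bytes.
        as_bytes = file.read()

print(type(as_text), type(as_bytes))
```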

## 1.5 Extensions

File names end with extension separated by `.`.\
Extension allows OS to know what software or app to use to extract file details.\
Avoid changing file extension by renaming the file; convert it with appropriate software or app instead.\
Format (excluding `[]`): `[file name].[extension]`.\
E.g.:
- `.xlsx` using Excel.
- `.pptx` using PowerPoint.



# 2 Opening and closing files

Using context manager `with` to open file for reading and writing.

## 2.1 Reading data

In [59]:
with open('spectrum-01.txt','r') as file:
    file_content=file.read()

print(file_content)

Light Intensity, Ch A vs Actual Angular Position, Run #4
Actual Angular Position (  )	Light Intensity, Ch A ( % max )
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.3
0.000	-0.2
0.000	-0.2
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.004	-0.1
0.010	-0.2
0.018	-0.2
0.024	-0.3
0.029	-0.3
0.033	-0.3
0.036	-0.2
0.039	-0.1
0.043	-0.1
0.047	-0.1
0.053	-0.1
0.060	-0.1
0.066	-0.1
0.069	-0.1
0.073	-0.1
0.076	-0.1
0.079	-0.1
0.081	-0.1
0.082	-0.1
0.083	-0.2
0.083	-0.2
0.086	-0.2
0.090	-0.2
0.095	-0.2
0.100	-0.3
0.103	-0.3
0.104	-0.2
0.105	-0.3
0.107	-0.2
0.110	-0.2
0.115	-0.1
0.122	-0.2
0.128	-0.1
0.134	-0.2
0.139	-0.1
0.144	-0.2
0.150	-0.2
0.157	-0.2
0.164	-0.2
0.170	-0.3
0.175	-0.3
0.180	-0.2
0.185	-0.2
0.191	-0.1
0.195	-0.1
0.198	-0.2
0.201	-0.1
0.204	-0.2
0.206	-0.2
0.208	-0.3
0.210	-0.3
0.213	-0.1
0.217	0.3
0.222	0.6
0.226	0.2
0.230	0.0
0.233	-0.1
0.235	-0.1
0.237	

`open()` only opens file; `read()` gets its contents.\
`'r'` specifies that file is opened for reading only.\
`with` closes file automatically when block ends, so no need to close it manually.
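A file object can also be read line by line. A sketch under assumptions: a few rows copied from the output above are written to a temporary file (hypothetical name `sample.txt`) so it runs without `spectrum-01.txt`, and the data columns are taken to be tab-separated.

```python
import os
import tempfile

# A few rows in the same layout as spectrum-01.txt, written to a temporary
# file so that the sketch runs without the original data file.
sample = (
    "Light Intensity, Ch A vs Actual Angular Position, Run #4\n"
    "Actual Angular Position (  )\tLight Intensity, Ch A ( % max )\n"
    "0.217\t0.3\n"
    "0.222\t0.6\n"
    "0.226\t0.2\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    with open(path, "w") as file:
        file.write(sample)

    rows = []
    with open(path, "r") as file:
        for number, line in enumerate(file):
            if number < 2:              # Skip the two header lines.
                continue
            angle, intensity = line.split("\t")
            rows.append((float(angle), float(intensity)))

print(rows)
```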

## 2.2 Writing data

In [60]:
text='Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'

2 writing methods:

### Writing to a file in one go

In [61]:
with open('my-text-once.txt','w') as file:
    file.write(text)

Text file created and named `my-text-once.txt`.

### Writing to a file, line by line

In [62]:
with open('my-text-lines.txt','w') as file:
    for line in text.splitlines():
        file.write(line+'\n')     # splitlines() drops '\n', so add it back.

Useful for data generated on the spot.\
Writing to a file is a very slow operation.\
Made worse by writing to file in a loop.
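One way to reduce the number of write calls is to collect lines in memory and write once. A sketch with illustrative data and a hypothetical file name `lines.txt` in a temporary folder:

```python
import os
import tempfile

lines = ["first line", "second line", "third line"]   # Illustrative data.

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "lines.txt")

    # Build the whole text in memory, then write to the file once.
    with open(path, "w") as file:
        file.write("\n".join(lines) + "\n")

    with open(path, "r") as file:
        read_back = file.read().splitlines()

print(read_back)
```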

# 3 Some useful packages

Create, copy and delete files/ folders, as well as navigating OS.

|Package|Primary Use|
|:-|:-|
|`os`|'Talk' to OS to create, modify and delete folders,<br> as well as write OS-compatible code.|
|`glob`|Search for files.|
|`shutil`|Copy files.|

Both `os` and `shutil` are used, as `shutil` offers some functions that `os` does not (e.g. copying files).\
Where their functions overlap, there are subtle differences.

In [63]:
import os
import glob
import shutil

In [64]:
help(os)

Help on module os:

NAME
    os - OS routines for NT or Posix depending on what system we're on.

MODULE REFERENCE
    https://docs.python.org/3.11/library/os.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This exports:
      - all functions from posix or nt, e.g. unlink, stat, etc.
      - os.path is either posixpath or ntpath
      - os.name is either 'posix' or 'nt'
      - os.curdir is a string representing the current directory (always '.')
      - os.pardir is a string representing the parent directory (always '..')
      - os.sep is the (or a most common) pathname separator ('/' or '\\')
      - os.extsep is the extension separator (always '.')
      - os.altsep is the alternate pathn

In [65]:
help(glob)

Help on module glob:

NAME
    glob - Filename globbing utility.

MODULE REFERENCE
    https://docs.python.org/3.11/library/glob.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

FUNCTIONS
    escape(pathname)
        Escape all special characters.
    
    glob(pathname, *, root_dir=None, dir_fd=None, recursive=False, include_hidden=False)
        Return a list of paths matching a pathname pattern.
        
        The pattern may contain simple shell-style wildcards a la
        fnmatch. Unlike fnmatch, filenames starting with a
        dot are special cases that are not matched by '*' and '?'
        patterns by default.
        
        If `include_hidden` is true, the patterns '*', '?', '**'  will match h

In [66]:
help(shutil)

Help on module shutil:

NAME
    shutil - Utility functions for copying and archiving files and directory trees.

MODULE REFERENCE
    https://docs.python.org/3.11/library/shutil.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    XXX The functions here don't copy the resource fork or other metadata on Mac.

CLASSES
    builtins.OSError(builtins.Exception)
        Error
            SameFileError
        ExecError
        SpecialFileError
    
    class Error(builtins.OSError)
     |  Method resolution order:
     |      Error
     |      builtins.OSError
     |      builtins.Exception
     |      builtins.BaseException
     |      builtins.object
     |  
     |  Data descriptors defined here:
  

# 4 OS safe paths

File `data-01.txt` in sub-directory `sg-data` of directory `all-data`.

Access `data-01.txt`:

In [67]:
path=os.path.join('.','all-data','sg-data','data-01.txt')
print(path)

.\all-data\sg-data\data-01.txt


Path shown above is for Windows.\
On macOS/ Linux, path is shown as: `./all-data/sg-data/data-01.txt`.\
`os.path.join()` joins the parts with separator of the OS running the code (`\` or `/`), so same code runs on all OS.
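Both results can be seen on one machine. Per the `help(os)` output above, `os.path` is either `ntpath` or `posixpath`; this sketch calls each directly (variable names are for illustration only):

```python
import ntpath       # What os.path is on Windows.
import posixpath    # What os.path is on macOS/ Linux.

parts = (".", "all-data", "sg-data", "data-01.txt")

windows_path = ntpath.join(*parts)
posix_path = posixpath.join(*parts)

print(windows_path)
print(posix_path)
```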

# 5 Folders

## 5.1 Creating folders

Create folder with `os.mkdir()`.\
Useful for organising data with code.

In [68]:
os.mkdir('people')

for person in ['John','Paul','Ringo']:
    path=os.path.join('people',person)
    print(f'Creating {path}')           # Not necessary, but useful for tracking which folders are created.
    os.mkdir(path)

Creating people\John
Creating people\Paul
Creating people\Ringo


## 5.2 Checking for existence

If code block for creating folders is run twice, `os.mkdir()` raises `FileExistsError`.\
Good to check if folder already exists.\
2 ways to handle existing folders:
- `try-except`.
- `os.path.exists()`.

### Using try-except

In [69]:
for person in ['John','Paul','Ringo']:
    path=os.path.join('people',person)
    try:
        os.mkdir(path)
        print(f'Creating {path}')
    except FileExistsError:
        print(f'{path} already exists; skipping creation.')

people\John already exists; skipping creation.
people\Paul already exists; skipping creation.
people\Ringo already exists; skipping creation.


### Using os.path.exists()

In [70]:
for person in ['John','Paul','Ringo']:
    path=os.path.join('people',person)
    if os.path.exists(path):
        print(f'{path} already exists; skipping creation.')
    else:
        os.mkdir(path)
        print(f'Creating {path}')

people\John already exists; skipping creation.
people\Paul already exists; skipping creation.
people\Ringo already exists; skipping creation.
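A third option, not used in the cells above, is `os.makedirs()` with `exist_ok=True`. A sketch run inside a temporary folder so that the real `people` folder is untouched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:    # Temporary folder for the demo.
    for person in ["John", "Paul", "Ringo"]:
        path = os.path.join(base, "people", person)
        os.makedirs(path, exist_ok=True)   # Creates 'people' too if missing.
        os.makedirs(path, exist_ok=True)   # Second call raises no error.

    created = sorted(os.listdir(os.path.join(base, "people")))

print(created)
```

Unlike `os.mkdir()`, `os.makedirs()` also creates missing parent folders.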


## 5.3 Copying files

Copying SP2273 logo into each folder within 'people' folder:

In [71]:
for person in ['John','Paul','Ringo']:
    path_to_destination=os.path.join('people',person)
    shutil.copy('sp2273_logo.png',path_to_destination)
    print(f'Copied file to {path_to_destination}')

Copied file to people\John
Copied file to people\Paul
Copied file to people\Ringo


Moving all images into sub-folder in each person's directory:
- Create 'imgs' folder.
- Move logo file.

In [72]:
for person in ['John','Paul','Ringo']:
    path_to_imgs=os.path.join('people',person,'imgs')   #  Create folder 'imgs'.
    if not os.path.exists(path_to_imgs):
        os.mkdir(path_to_imgs)

    current_path_of_logo=os.path.join('people',person,'sp2273_logo.png')     # Move logo file.
    new_path_of_logo=os.path.join('people',person,'imgs','sp2273_logo.png')
    shutil.move(current_path_of_logo,new_path_of_logo)
    print(f'Moved logo to {new_path_of_logo}')

Moved logo to people\John\imgs\sp2273_logo.png
Moved logo to people\Paul\imgs\sp2273_logo.png
Moved logo to people\Ringo\imgs\sp2273_logo.png


Repetitive file operations like these can be done very quickly with loops or terminal commands.
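Difference between copying and moving, as a sketch in a temporary folder. The file `logo.png` is a stand-in with dummy contents, not the actual SP2273 logo:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    logo = os.path.join(base, "logo.png")          # Stand-in for the logo file.
    with open(logo, "wb") as file:
        file.write(b"not a real image")

    imgs = os.path.join(base, "John", "imgs")
    os.makedirs(imgs)

    copied = shutil.copy(logo, os.path.join(base, "John"))   # Returns new path.
    moved = shutil.move(copied, os.path.join(imgs, "logo.png"))

    original_kept = os.path.exists(logo)        # copy() leaves the source.
    copy_gone = not os.path.exists(copied)      # move() removes the source.
    in_imgs = os.path.exists(moved)

print(original_kept, copy_gone, in_imgs)
```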

# 6 Listing and looking for files

Using `glob` to search for files present in folder.

E.g. 1: List **all** files and folders in current directory.

In [73]:
glob.glob('*')

['files,_folders_&_os_(need).ipynb',
 'my-text-lines.txt',
 'my-text-once.txt',
 'people',
 'sp2273_logo.png',
 'spectrum-01.txt']

Wildcard: `*`, read as 'anything'.

E.g. 2: Refine search to give only files that match 'peo' followed by 'anything'.

In [74]:
glob.glob('peo*')

['people']

E.g. 3: Checking what is inside folders starting with `peo`.

In [75]:
glob.glob('peo*/*')

['people\\John', 'people\\Paul', 'people\\Ringo']

E.g. 4: Show whole, detailed structure of `people` folder.

In [76]:
glob.glob('people/**',recursive=True)

['people\\',
 'people\\John',
 'people\\John\\imgs',
 'people\\John\\imgs\\sp2273_logo.png',
 'people\\Paul',
 'people\\Paul\\imgs',
 'people\\Paul\\imgs\\sp2273_logo.png',
 'people\\Ringo',
 'people\\Ringo\\imgs',
 'people\\Ringo\\imgs\\sp2273_logo.png']

`recursive=True` makes `glob` search recursively (i.e. through **all** sub-directories).\
`**` denotes any number of nested sub-directories, but only when `recursive=True` is set.

E.g. 5: Show only `.png` files.

In [77]:
glob.glob('people/**/*.png',recursive=True)

['people\\John\\imgs\\sp2273_logo.png',
 'people\\Paul\\imgs\\sp2273_logo.png',
 'people\\Ringo\\imgs\\sp2273_logo.png']

`glob` searches through whole structure of `people`.\
Shows files with pattern: `[anything].png`.
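Effect of `**` compared with a single `*`, as a sketch on a small folder tree built in a temporary folder. All file and folder names here are hypothetical:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Build a small folder tree with empty stand-in files.
    for relative in [("a.png",), ("sub", "b.png"), ("sub", "deep", "c.txt")]:
        path = os.path.join(base, *relative)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    pattern = os.path.join(base, "**", "*.png")
    flat = glob.glob(os.path.join(base, "*.png"))
    nested = glob.glob(pattern, recursive=True)

    flat_names = sorted(os.path.basename(p) for p in flat)
    nested_names = sorted(os.path.basename(p) for p in nested)

print(flat_names)
print(nested_names)
```

The pattern is built with `os.path.join()` so that it works on any OS.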

# 7 Extracting file info

Extracting filename, folder or extension using simple string manipulation.

E.g.: Extract filename and extension.

In [83]:
path=os.path.join('people','Ringo','imgs','sp2273_logo.png')   # Built with os.path.join() so separator matches OS.
filename=path.split(os.path.sep)[-1]
extension=filename.split('.')[-1]
print(filename,extension)

sp2273_logo.png png


Note: `os.path.sep` is path separator of OS used, so split works only if path uses that separator.\
`[-1]` picks up last element in list.\
Similar strategy for extension, split filename using `.` and take its last element.

Simple functions of `os`:

In [84]:
path=os.path.join('people','Ringo','imgs','sp2273_logo.png')   # Built with os.path.join() so separator matches OS.

Split filename from rest:

In [85]:
os.path.split(path)

('people\\Ringo\\imgs', 'sp2273_logo.png')

Split extension:

In [86]:
os.path.splitext(path)

('people\\Ringo\\imgs\\sp2273_logo', '.png')

Show directory:

In [87]:
os.path.dirname(path)

'people\\Ringo\\imgs'
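The functions above can be combined to get filename, name without extension, and extension. A sketch using `ntpath` (the Windows flavour of `os.path`) so the result is the same on any OS; `stem` is an illustrative variable name:

```python
import ntpath   # Windows flavour of os.path, so the result is the same on any OS.

path = ntpath.join("people", "Ringo", "imgs", "sp2273_logo.png")

filename = ntpath.basename(path)               # Last part of the path.
stem, extension = ntpath.splitext(filename)    # Extension keeps its dot.

print(filename, stem, extension)
```

Unlike `filename.split('.')[-1]`, `splitext()` keeps the `.` in the extension.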

# 8 Deleting stuff

Remove file:

In [88]:
os.remove(os.path.join('people','Ringo','imgs','sp2273_logo.png'))

Remove empty directory (fails here because `people\Ringo` still contains `imgs` folder):

In [89]:
os.rmdir(os.path.join('people','Ringo'))

OSError: [WinError 145] The directory is not empty: 'people\\Ringo'

Remove directory together with its contents using `shutil`:

In [90]:
shutil.rmtree(os.path.join('people','Ringo'))

Be very careful when running functions for deleting stuff.\
Files and folders deleted this way do not go to Recycle Bin/ Trash.\
Use with extreme caution; back up separately when possible.
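A cautious pattern, as a sketch only: practise on a throwaway folder and check the target before deleting. The folder names mirror this chapter but are created inside a temporary directory:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                      # Throwaway folder for the demo.
target = os.path.join(base, "Ringo")
os.makedirs(os.path.join(target, "imgs"))

try:
    os.rmdir(target)                           # Fails: 'imgs' is still inside.
    removed_by_rmdir = True
except OSError:
    removed_by_rmdir = False

if os.path.isdir(target):                      # Check before deleting.
    shutil.rmtree(target)                      # Removes folder and contents.

still_there = os.path.exists(target)
shutil.rmtree(base)                            # Clean up the throwaway folder.

print(removed_by_rmdir, still_there)
```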