<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Files, Folders & OS (Need)</span></div>

# What to expect in this chapter

We must communicate with the OS to create, modify, move, copy, and delete files and directories (folders).\
This chapter covers three packages for doing so: **os**, **glob**, and **shutil**.

# 1 Important concepts

The terms **folder** and **directory** can be used interchangeably.

## 1.1 Path

The **path** is simply a way to specify a location on your computer, like an address.\
If we follow the path, it will take us directly to the specified file or folder.

- The path can be specified **absolutely** or **relatively**
    - A path specified **absolutely** gives the exact location of the file or folder and will allow you to arrive at the destination no matter the starting point
    - A path specified **relatively** gives the directions to the specified file or folder **relative** to a particular starting point
    - If the absolute path isn't specified (Starting from the drive), Python will assume a relative path, starting from the current directory
      - In Jupyter Notebook, the current directory is the directory that the used notebook is in
    - While relative paths are often shorter and easier to write, they only work from a particular starting point and hence cannot be easily reused from a different folder
    - Relative paths are often used to move folders about.
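
A minimal sketch of the difference between the two kinds of path, assuming a hypothetical file **data-files/data-01.txt** (the file does not need to exist for these calls to work). **os.getcwd()** gives the current directory that relative paths start from, and **os.path.abspath()** shows the absolute path Python would actually use.

```python
import os

# Relative path: interpreted starting from the current directory.
# 'data-files' and 'data-01.txt' are hypothetical names used for illustration.
relative_path = os.path.join('data-files', 'data-01.txt')

# The current directory, i.e. the starting point for relative paths.
current_dir = os.getcwd()

# The absolute path that the relative path resolves to from here.
absolute_path = os.path.abspath(relative_path)

print(os.path.isabs(relative_path))   # False
print(os.path.isabs(absolute_path))   # True
print(absolute_path)                  # depends on where the code is run
```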

## 1.2 More about relative paths

**When dealing with relative paths:**\
**.** represents 'this folder'\
**..** represents 'one folder above'

On Windows:\
**.\data-files\data-01.txt** means the file **data-01.txt** in the folder **data-files** in the **current** folder\
**..\data-files\data-01.txt** means the file **data-01.txt** in the folder **data-files** located in the folder **above** the current folder

On macOS or Linux, the same two paths are written with forward slashes: **./data-files/data-01.txt** and **../data-files/data-01.txt**

## 1.3 Path separator

Windows uses **\ (backslash)** as the path separator\
macOS (or Linux) uses **/ (forward slash)** as the path separator

In order for the same code to work on both systems, we must **not hardcode** either path separator.\
The Python **os** package can be used to fix this problem.

## 1.4 Text files vs. Binary files

- All files on your computer can be classified as either **text files** or **binary files**.
    - **Text files** are simple: their contents can be opened and examined by almost any software (e.g. Notepad, TextEdit, Jupyter).
    - **Binary files** require some **processing** to make sense of what they contain.
      - The raw data of a **.png** file looks like gibberish rather than the image itself
      - Some binary files only work on a specific OS: **Excel.app** runs only on macOS, while **Excel.exe** runs only on Windows
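
A small sketch of what "processing" means here, assuming a hypothetical file **example.txt** created in a temporary folder. The same file is read once in text mode (**'r'**, decoded into a string) and once in binary mode (**'rb'**, the raw bytes stored on disk).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:         # scratch folder, removed afterwards
    path = os.path.join(folder, 'example.txt')        # hypothetical file name

    with open(path, 'w', encoding='utf-8') as file:
        file.write('25 \u00b0C')                      # \u00b0 is the degree sign

    with open(path, 'r', encoding='utf-8') as file:   # text mode: bytes are decoded to str
        as_text = file.read()

    with open(path, 'rb') as file:                    # binary mode: raw bytes, no decoding
        as_bytes = file.read()

print(len(as_text))    # 5 characters
print(len(as_bytes))   # 6 bytes, because the degree sign takes two bytes in UTF-8
print(as_bytes)        # b'25 \xc2\xb0C'
```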
    

## 1.5 Extensions

Files are usually named to end with an **extension** separated from the name by a **.**\
The **extension** lets the OS know what software or app to use to extract the details in a file (e.g. **.pptx** tells the OS to use PowerPoint, and **.xlsx** to use Excel).

# 2 Opening and closing files

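We can open a file for reading or writing using the **with** statement together with **open()**. The file object that **open()** returns acts as a context manager, so the file is closed automatically when the **with** block ends.\
**with** is a **statement, not a function**.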

- We use the **open()** function to **open** our file.
- There are 2 important arguments to consider: open(**file**, **mode**)
    - **file** is the path of the file we want to open (if only a file name is given, Python looks for it in the current folder)
    - **mode** is an optional string that specifies the mode in which the file is opened (the purpose of opening). By default, it is set to 'r' for read
        - **'r'** to open for reading
        - **'w'** to open for writing
        - **'x'** to create a new file and open it for writing
        - More modes exist, e.g. **'a'** to open for appending to the end of the file
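
A minimal sketch of how the modes behave, using a hypothetical file **notes.txt** in a temporary folder so that no real files are touched.

```python
import os
import tempfile

x_failed = False

with tempfile.TemporaryDirectory() as folder:      # scratch folder, removed afterwards
    path = os.path.join(folder, 'notes.txt')       # hypothetical file name

    with open(path, 'x') as file:                  # 'x': create a new file
        file.write('first\n')

    with open(path, 'a') as file:                  # 'a': add to the end of the file
        file.write('second\n')

    with open(path, 'r') as file:                  # 'r': read (the default mode)
        after_append = file.read()

    with open(path, 'w') as file:                  # 'w': start again from an empty file
        file.write('third\n')

    with open(path) as file:                       # no mode given, so 'r' is used
        after_write = file.read()

    try:
        with open(path, 'x') as file:              # 'x' refuses to reuse an existing file
            pass
    except FileExistsError:
        x_failed = True

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_write))    # 'third\n'
print(x_failed)             # True
```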

In [19]:
help(open)

Help on function open in module _io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream.  Raise OSError upon failure.

    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)

    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position).
    In text m

## 2.1 Reading data

In [11]:
with open('spectrum-01.txt', 'r') as file: # Relative path: the file must be in the current folder (the folder containing this notebook)
    file_content = file.read()

print(file_content)

Light Intensity, Ch A vs Actual Angular Position, Run #4
Actual Angular Position (  )	Light Intensity, Ch A ( % max )
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.1
0.000	-0.1
0.000	-0.2
0.000	-0.3
0.000	-0.2
0.000	-0.2
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.001	-0.1
0.004	-0.1
0.010	-0.2
0.018	-0.2
0.024	-0.3
0.029	-0.3
0.033	-0.3
0.036	-0.2
0.039	-0.1
0.043	-0.1
0.047	-0.1
0.053	-0.1
0.060	-0.1
0.066	-0.1
0.069	-0.1
0.073	-0.1
0.076	-0.1
0.079	-0.1
0.081	-0.1
0.082	-0.1
0.083	-0.2
0.083	-0.2
0.086	-0.2
0.090	-0.2
0.095	-0.2
0.100	-0.3
0.103	-0.3
0.104	-0.2
0.105	-0.3
0.107	-0.2
0.110	-0.2
0.115	-0.1
0.122	-0.2
0.128	-0.1
0.134	-0.2
0.139	-0.1
0.144	-0.2
0.150	-0.2
0.157	-0.2
0.164	-0.2
0.170	-0.3
0.175	-0.3
0.180	-0.2
0.185	-0.2
0.191	-0.1
0.195	-0.1
0.198	-0.2
0.201	-0.1
0.204	-0.2
0.206	-0.2
0.208	-0.3
0.210	-0.3
0.213	-0.1
0.217	0.3
0.222	0.6
0.226	0.2
0.230	0.0
0.233	-0.1
0.235	-0.1
0.237	
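
Instead of reading everything into one string, a file can also be read line by line. The sketch below assumes a hypothetical three-row sample laid out like **spectrum-01.txt** (two header lines, then two tab-separated columns) and turns each row into a pair of numbers.

```python
import os
import tempfile

# Hypothetical sample in the same layout as spectrum-01.txt.
sample = (
    'Light Intensity, Ch A vs Actual Angular Position, Run #4\n'
    'Actual Angular Position (  )\tLight Intensity, Ch A ( % max )\n'
    '0.000\t-0.2\n'
    '0.217\t0.3\n'
    '0.222\t0.6\n'
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.txt')       # hypothetical file name
    with open(path, 'w') as file:
        file.write(sample)

    rows = []
    with open(path, 'r') as file:
        for line_number, line in enumerate(file):   # a file can be looped over line by line
            if line_number < 2:                     # skip the two header lines
                continue
            position, intensity = line.split('\t')  # columns are separated by a tab
            rows.append((float(position), float(intensity)))

print(rows)   # [(0.0, -0.2), (0.217, 0.3), (0.222, 0.6)]
```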

## 2.2 Writing data

In [15]:
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'
# Assigning the string to the variable 'text'

### Writing to a file in one go

In [20]:
with open('my-text-once.txt', 'w') as file: # 'w' creates the file if it does not exist, and overwrites it if it does
    file.write(text)

### Writing to a file, line by line

In [27]:
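with open('my-text-lines.txt', 'w') as file:
    for line in text.splitlines(keepends=True):   # keepends=True keeps the '\n' at the end of each line
        file.write(line)                          # write() does not add a line break by itself

We use the method **text.splitlines()** to break the saved text at line boundaries and return the individual lines as a list. Passing **keepends=True** keeps the line breaks, which would otherwise be dropped and leave the lines run together in the file.\
We use a **for** loop to write each line one by one to the file.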
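
A short sketch of why the line breaks matter when writing line by line, assuming a hypothetical two-line string and a temporary file. The same text is written twice, once without and once with the line endings kept.

```python
import os
import tempfile

sample_text = 'line one\nline two'                  # hypothetical two-line string

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'lines.txt')        # hypothetical file name

    with open(path, 'w') as file:
        for line in sample_text.splitlines():       # line breaks are dropped here
            file.write(line)
    with open(path) as file:
        joined = file.read()

    with open(path, 'w') as file:
        for line in sample_text.splitlines(keepends=True):   # line breaks are kept
            file.write(line)
    with open(path) as file:
        preserved = file.read()

print(repr(joined))      # 'line oneline two'
print(repr(preserved))   # 'line one\nline two'
```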

In [26]:
print(text.splitlines(keepends=True))

['Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\n', 'Orbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.']


In [24]:
help(text.splitlines)

Help on built-in function splitlines:

splitlines(keepends=False) method of builtins.str instance
    Return a list of the lines in the string, breaking at line boundaries.

    Line breaks are not included in the resulting list unless keepends is given and
    true.



# 3 Some useful packages

The **os** package is used to 'talk' to the OS to create, modify, and delete folders, as well as to write OS-agnostic code.\
The **glob** package is used to search for files.\
The **shutil** package is used to copy and move files, and to delete whole folders.

In [28]:
import os
import glob
import shutil

# 4 OS safe paths

In [41]:
path_no_dot = os.path.join('all-data', 'sg-data', 'data-01.txt')
path_single_dot = os.path.join('.', 'all-data', 'sg-data', 'data-01.txt') # The single dot represents current folder and is not actually needed, since relative paths default to the current folder
path_double_dot = os.path.join('..', 'all-data', 'sg-data', 'data-01.txt')
print(path_no_dot)
print(path_single_dot)
print(path_double_dot)

all-data\sg-data\data-01.txt
.\all-data\sg-data\data-01.txt
..\all-data\sg-data\data-01.txt


- In the function **os.path.join()**:
    - **os** is the package
    - **path** is a sub-module of the **os** package. It is available after "import os"; we can also use "import os.path" or "from os import path"
    - **join** is a function of the sub-module "os.path"
      - **join** takes the arguments **path** and ***paths**
        - **path** is the required first path segment. It can be defined once beforehand and reused to reach many files in the same directory
        - ***paths** is any number of further path segments that lead to the desired file or folder
        - Both can be combined with a **for** loop over a list to build the paths of many files in the same folder
      - It literally **joins** path segments, giving an OS-agnostic final path that uses **/** or **\\** according to the OS
      - The function **returns** the final path as a string; it does not print it. We can assign the result to a variable or pass it to **print()**.
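
A minimal sketch of reusing one folder path for several files. The file name **data-02.txt** is hypothetical and only there to make the list longer.

```python
import os

base = os.path.join('all-data', 'sg-data')        # folder path, built once and reused
names = ['data-01.txt', 'data-02.txt']            # 'data-02.txt' is a hypothetical file

# join() returns a string each time, so the results can be collected in a list.
paths = [os.path.join(base, name) for name in names]

print(type(paths[0]))   # <class 'str'>
for path in paths:
    print(path)         # uses \ on Windows and / on macOS or Linux
```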

In [36]:
help(os.path.join)

Help on function join in module ntpath:

join(path, *paths)



# 5 Folders

## 5.1 Creating folders

- We can create **folders** or **directories** using **os.mkdir()**
    - **mkdir** is a function of the package **os** and is used to **make directories**


In [44]:
os.mkdir('people')

for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    print(f'Creating {path}') # Optional progress message; the folder is created either way
    os.mkdir(path)

Creating people\John
Creating people\Paul
Creating people\Ringo


## 5.2 Checking for existence

Python will give an error if we try to run the above code twice, because **os.mkdir()** cannot create a folder when a file or folder of the same name already exists. We can use the methods below to check whether a name is already taken.\
Python raises the same **FileExistsError** whether the existing item is a file or a folder.

### Using try-except

In [49]:
for person in ['John', 'Paul', 'Ringo']:                    # Folder names only, without the parent folder
    path = os.path.join('people', person)                   # 'people' is the parent folder; person (a variable, so no quotation marks) is each name in the list
    try:
        os.mkdir(path)                                      # Tries to create the folder
        print(f'Creating {path}')
    except FileExistsError:
        print(f'{path} already exists; skipping creation.') # Reports that the folder already exists instead of stopping with an error

people\John already exists; skipping creation.
people\Paul already exists; skipping creation.
people\Ringo already exists; skipping creation.


### Using os.path.exists()

In [45]:
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    if os.path.exists(path):                                 # Checks whether a file or folder already exists at the specified path
        print(f'{path} already exists; skipping creation.')
    else:
        os.mkdir(path)                                       # Creates the folder if it doesn't already exist
        print(f'Creating {path}')

people\John already exists; skipping creation.
people\Paul already exists; skipping creation.
people\Ringo already exists; skipping creation.
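
A third option, sketched here as an alternative rather than as part of the original exercise, is **os.makedirs()** with **exist_ok=True**. It creates any missing parent folders too and silently does nothing if the folder is already there. The example works inside a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:               # scratch folder, removed afterwards
    for person in ['John', 'Paul', 'Ringo']:
        path = os.path.join(root, 'people', person)
        os.makedirs(path, exist_ok=True)                  # also creates 'people' if needed
        os.makedirs(path, exist_ok=True)                  # running it again raises no error

    created = sorted(os.listdir(os.path.join(root, 'people')))

print(created)   # ['John', 'Paul', 'Ringo']
```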


## 5.3 Copying files

- We can use the **shutil.copy(src, dst)** function, which is part of the **shutil** package, to copy files.
    - There are 2 important arguments, **src** and **dst**, both of which should be **path-like objects**
      - **src** represents the file to be copied, and should be the file's path
      - **dst** represents the destination the file should be copied to
        - If **dst** is a folder, it will copy the file to the folder
        - If **dst** is a file, it will replace the old file
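
A small sketch of the two cases for **dst**, assuming a hypothetical text file standing in for the logo and a temporary folder. **shutil.copy()** returns the path of the new copy, which makes the difference easy to see.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:               # scratch folder, removed afterwards
    src = os.path.join(root, 'logo.txt')                  # hypothetical stand-in for the logo
    with open(src, 'w') as file:
        file.write('logo')

    folder = os.path.join(root, 'John')
    os.mkdir(folder)

    # dst is a folder: the copy keeps the original file name.
    copied_into_folder = shutil.copy(src, folder)

    # dst is a file path: the copy gets that name (replacing any existing file).
    copied_as_file = shutil.copy(src, os.path.join(folder, 'renamed.txt'))

    names = sorted(os.listdir(folder))

print(os.path.basename(copied_into_folder))   # logo.txt
print(os.path.basename(copied_as_file))       # renamed.txt
print(names)                                  # ['logo.txt', 'renamed.txt']
```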

In [51]:
help(shutil.copy)

Help on function copy in module shutil:

copy(src, dst, *, follow_symlinks=True)
    Copy data and mode bits ("cp src dst"). Return the file's destination.

    The destination may be a directory.

    If follow_symlinks is false, symlinks won't be followed. This
    resembles GNU's "cp -P src dst".

    If source and destination are the same file, a SameFileError will be
    raised.



In [53]:
for person in ['John', 'Paul', 'Ringo']:                    # Creating list of folders to be copied to, without exact path
    path_to_destination = os.path.join('people', person)
    shutil.copy('sp2273_logo.png', path_to_destination)     # Copying file to each destination (folder) in the list
    print(f'Copied file to {path_to_destination}')

Copied file to people\John
Copied file to people\Paul
Copied file to people\Ringo


In [54]:
for person in ['John', 'Paul', 'Ringo']:
    # Create folder 'imgs'
    path_to_imgs = os.path.join('people', person, 'imgs')                    # Defines path to new target subfolders (destinations)
    if not os.path.exists(path_to_imgs):                                     # Checking if 'imgs' subfolder exists in each of the folders in the list
        os.mkdir(path_to_imgs)                                               # Creates the subfolder if it doesn't exist

    # Move logo file
    current_path_of_logo = os.path.join('people', person, 'sp2273_logo.png')             # Defining current path of logo
    new_path_of_logo = os.path.join('people', person, 'imgs', 'sp2273_logo.png')         # Defining new target path of logo

    shutil.move(current_path_of_logo, new_path_of_logo)                                  # Moving image from original folders into subfolders
    print(f'Moved logo to {new_path_of_logo}')

Moved logo to people\John\imgs\sp2273_logo.png
Moved logo to people\Paul\imgs\sp2273_logo.png
Moved logo to people\Ringo\imgs\sp2273_logo.png


# 6 Listing and looking for files

In [56]:
help(glob.glob)

Help on function glob in module glob:

glob(pathname, *, root_dir=None, dir_fd=None, recursive=False, include_hidden=False)
    Return a list of paths matching a pathname pattern.

    The pattern may contain simple shell-style wildcards a la
    fnmatch. Unlike fnmatch, filenames starting with a
    dot are special cases that are not matched by '*' and '?'
    patterns by default.

    If `include_hidden` is true, the patterns '*', '?', '**'  will match hidden
    directories.

    If `recursive` is true, the pattern '**' will match any files and
    zero or more directories and subdirectories.



- **glob.glob()** searches for **files** or **folders** whose paths match a specified pattern
- There are 2 important arguments for **glob.glob()**, **pathname** and **recursive**
    - **pathname** specifies the pattern to search for
      - We can use **\*** to represent a **wildcard**
        - Using **\*** by itself matches **anything** (except names that start with a dot)
        - Using **\*** before or after a string matches files or folders that **end or start** with the string, respectively
        - We can also end the pattern with **\*.png** to only match files of a specific extension
      - In the pathname, we can use **/** even on Windows
    - **recursive=True** tells glob to search **recursively**, which digs through **all subdirectories**
      - By default, recursive is set to False. We must specify **recursive=True**
      - We must also include **\*\*** in the pattern to search recursively
        - When searching recursively for files of a specific extension, the **\*\*** goes **before** the file pattern
          - E.g. **people/\*\*/\*.png**

In [59]:
glob.glob('*') # Searches for all files/folders in the current directory

['files,_folders_&_os_(need).ipynb',
 'my-text-lines.txt',
 'my-text-once.txt',
 'people',
 'sp2273_logo.png',
 'spectrum-01.txt']

In [63]:
glob.glob('peo*') # Searches for all files/folders that start with 'peo' in the current directory

['people']

In [119]:
glob.glob('peo*/*') # Searches for all files/folders INSIDE all files/folders starting with 'peo' in the current directory

['people\\John', 'people\\Paul', 'people\\Ringo']

In [120]:
glob.glob('people/**', recursive=True) # Searches the folder named 'people' recursively

['people\\',
 'people\\John',
 'people\\John\\imgs',
 'people\\John\\imgs\\sp2273_logo.png',
 'people\\Paul',
 'people\\Paul\\imgs',
 'people\\Paul\\imgs\\sp2273_logo.png',
 'people\\Ringo',
 'people\\Ringo\\imgs',
 'people\\Ringo\\imgs\\sp2273_logo.png']

In [68]:
glob.glob('people/**/*.png', recursive=True) # Searches the folder named 'people' recursively, and returns the files that end in .png

['people\\John\\imgs\\sp2273_logo.png',
 'people\\Paul\\imgs\\sp2273_logo.png',
 'people\\Ringo\\imgs\\sp2273_logo.png']
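
The same recursive search can be sketched in a self-contained way, assuming a hypothetical folder tree built in a temporary folder. Here the pattern is assembled with **os.path.join()** rather than typed with slashes, and **os.path.relpath()** is used only to shorten the printed paths.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:                   # scratch folder, removed afterwards
    # Build a small hypothetical tree: people/<person>/imgs/{logo.png, notes.txt}
    for person in ['John', 'Paul']:
        imgs = os.path.join(root, 'people', person, 'imgs')
        os.makedirs(imgs)
        for name in ['logo.png', 'notes.txt']:
            with open(os.path.join(imgs, name), 'w') as file:
                file.write('')

    # '**' stands for any number of nested folders, '*.png' for the file pattern.
    pattern = os.path.join(root, 'people', '**', '*.png')
    found = sorted(glob.glob(pattern, recursive=True))        # glob gives no guaranteed order

    # Paths relative to the scratch folder, for shorter printing.
    relative = [os.path.relpath(path, root) for path in found]

for path in relative:
    print(path)
```

Only the two **.png** files are listed; the **.txt** files in the same folders do not match the pattern.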

# 7 Extracting file info

- **Split** is a **method** in Python, that is used with a defined **string** to **split** it, via **string.split()**
  - It **splits** the string into separate elements and gives a **list** containing the elements
  - The **split** method has 2 arguments, **separator** and **maxsplit**
    - **separator** is an optional argument that specifies the separator to use when splitting the string. By default, any whitespace is a separator
    - **maxsplit** is an optional argument that specifies how many splits to do. Its default value is -1, which means 'all occurrences'
- We can use the **split** method to split paths, in order to extract file info from a path
  - We use the separator **os.path.sep** from the os package
    - This gives the separator as **/** or **\\** depending on the OS
    - On Windows, a backslash inside an ordinary string literal must be typed twice (**\\\\**), or the string must be written as a raw string (**r'...'**), because a single backslash starts an escape sequence
  - Since the split method gives a **list**, we can follow it with **[index]** to pick a particular element from the list
    - In many cases we only want the last element (when we want the filename or extension), so we can use **[-1]**

In [8]:
import os
path = r'people\Ringo\imgs\sp2273_logo.png' # Raw string: backslashes are kept literally instead of starting escape sequences (this path style is Windows-only)
filename = path.split(os.path.sep)[-1]
extension = filename.split('.')[-1]
print(filename, extension)

sp2273_logo.png png


In [4]:
path = "people" + os.path.sep + "Ringo"
print(path)

people\Ringo


**Q**: Why is it that copying the path in Windows gives single backslashes, but we need double backslashes in order to properly split using os.path.sep?
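
A possible answer, sketched below: the doubled backslash exists only in the Python source code. In an ordinary string literal a backslash starts an escape sequence, so one real backslash has to be typed as two. The string that results still contains a single backslash, which is what Windows shows and what **os.path.sep** holds on Windows.

```python
escaped = 'people\\Ringo'    # typed with two backslashes, stored as one
raw = r'people\Ringo'        # raw string: typed once, stored as one

print(escaped == raw)        # True
print(len('\\'))             # 1, a single character
print(escaped)               # people\Ringo

# Splitting on the one-character backslash works for either spelling.
parts = raw.split('\\')
print(parts)                 # ['people', 'Ringo']
```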

- Instead of using the **split method** (which is native to Python), we can also use **functions** from the **os package** to extract specific file information
  - The function **os.path.split(path)** splits the path into a **tuple (head, tail)** where the head and tail are everything before and after the final slash
  - The function **os.path.splitext(path)** splits the path into a **tuple (root, ext)** where the root is everything before the final dot and ext is the final dot plus everything after it
    - A leading period will be considered to be part of the root
    - If there is no period, **ext** will be empty
  - The function **os.path.dirname(path)** returns the directory component of the path
    - In most cases, this will be the same as **head** in **os.path.split**

In [139]:
path = 'people/Ringo/imgs/sp2273_logo.png'

In [134]:
os.path.split(path)      # Split filename from the rest

('people/Ringo/imgs', 'sp2273_logo.png')

In [137]:
os.path.splitext(path)   # Split extension

('people/Ringo/imgs/sp2273_logo', '.png')

In [132]:
os.path.dirname(path)    # Show the directory

'people/Ringo/imgs'
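
These functions can also be combined to build a new path from an old one. A minimal sketch, where the **_copy** suffix is a hypothetical choice made only for illustration:

```python
import os

path = os.path.join('people', 'Ringo', 'imgs', 'sp2273_logo.png')

folder, filename = os.path.split(path)       # folder part and file name
root, ext = os.path.splitext(filename)       # name without extension, and extension

# Same folder and extension, with a hypothetical '_copy' suffix on the name.
new_path = os.path.join(folder, root + '_copy' + ext)

print(filename)                      # sp2273_logo.png
print(ext)                           # .png
print(os.path.basename(new_path))    # sp2273_logo_copy.png
```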

# 8 Deleting stuff

- We can use **functions** from the **os** and **shutil** packages to **remove/delete** files/folders
  - **os.remove(path)** deletes the file specified in the path
  - **os.rmdir(path)** deletes the directory specified in the path
    - This only works with **empty** directories
  - **shutil.rmtree(path)** deletes **directories with files**, or a directory tree

In [143]:
os.remove('people/Ringo/imgs/sp2273_logo.png')

In [144]:
os.rmdir('people/Ringo')

OSError: [WinError 145] The directory is not empty: 'people/Ringo'

In [146]:
shutil.rmtree('people/Ringo')
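
The three functions can be tried safely in a temporary folder. The sketch below assumes a hypothetical folder **Ringo** holding one file, and shows that **os.rmdir()** only succeeds once the folder is empty.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()                     # scratch folder for the experiment
folder = os.path.join(root, 'Ringo')          # hypothetical folder
os.mkdir(folder)

file_path = os.path.join(folder, 'logo.txt')  # hypothetical file
with open(file_path, 'w') as file:
    file.write('logo')

try:
    os.rmdir(folder)                          # fails: the folder still contains a file
    rmdir_failed = False
except OSError:
    rmdir_failed = True

os.remove(file_path)                          # delete the file first
os.rmdir(folder)                              # now the empty folder can be removed
folder_gone = not os.path.exists(folder)

shutil.rmtree(root)                           # removes a folder and everything inside it
root_gone = not os.path.exists(root)

print(rmdir_failed, folder_gone, root_gone)   # True True True
```

None of these functions send anything to a recycle bin, so it is worth checking the path before running them on real data.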