# io and os modules

The Input Output module ```io``` is used for reading data from and writing data to files and other streams.

The Operating System module ```os``` is used to navigate the file system and interact with the operating system. Many of its functions have equivalents in the native shells, PowerShell on Windows and bash on Linux.

## io module

### Text Files

A text file called ```text.txt``` can be created in the same folder as the Interactive Python Notebook File:

```
Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full.

One for my master,
One for my dame,
And one for the little boy
Who lives down the lane.
```

Text files can be viewed in Notepad++ with View → Show Symbol → Show All Characters:

<img src='./images/img_001.png' alt='img_001' width='500'/>

Notice that every line except the last ends with ```CRLF```, which marks the move onto the next row. This stands for carriage return (```\r```) and line feed (```\n```).

### open function

The ```open``` function in the ```io``` module is used for opening text and binary files. The module can be imported and the docstring viewed:

In [1]:
import io

In [2]:
io.open?

Signature:
io.open(
    file,
    mode='r',
    buffering=-1,
    encoding=None,
    errors=None,
    newline=None,
    closefd=True,
    opener=None,
)
Docstring:
Open file and return a stream.  Raise OSError upon failure.

file is either a text or byte string giving the name (and the path
if the file isn't in the current working directory) of the file to
be opened or an integer file descriptor of the file to be
wrapped. (If a file descriptor is given, it is closed when the
returne

Because this function is so commonly used, it is also available in ```builtins``` (```io.open``` and the builtin ```open``` are the same function):

In [3]:
open?

Signature:
open(
    file,
    mode='r',
    buffering=-1,
    encoding=None,
    errors=None,
    newline=None,
    closefd=True,
    opener=None,
)
Docstring:
Open file and return a stream.  Raise OSError upon failure.

file is either a text or byte string giving the name (and the path
if the file isn't in the current working directory) of the file to
be opened or an integer file descriptor of the file to be
wrapped. (If a file descriptor is given, it is closed when the
returned I/O object is closed

The ```open``` function requires a file path. A bare file name is enough when the file is in the current working directory, which by default is the folder containing the interactive Python notebook file (or Python script file).

The ```mode``` keyword input argument can be specified using a single letter:

|mode|definition|
|---|---|
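|'r'|open an existing file for reading (the default); fails if the file does not exist|
|'w'|open a file for writing, creating it if needed and discarding any existing content|
|'a'|open a file for writing, creating it if needed and appending to any existing content|
|'x'|create a new file for writing; fails if the file already exists|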
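
The behaviour of the three writing modes can be checked with a short sketch. It works in a temporary folder so that no real files are touched, and the file name ```modes.txt``` is only a placeholder:

```python
import os
import tempfile

# Work in a throwaway folder so no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'modes.txt')  # hypothetical file name

    # 'x' creates the file and fails if it already exists.
    with open(path, mode='x', encoding='utf-8') as file:
        file.write('first')

    try:
        open(path, mode='x', encoding='utf-8')
        second_x_failed = False
    except FileExistsError:
        second_x_failed = True

    # 'a' keeps the existing content and writes at the end.
    with open(path, mode='a', encoding='utf-8') as file:
        file.write(' second')
    with open(path, mode='r', encoding='utf-8') as file:
        after_append = file.read()

    # 'w' empties the file before writing.
    with open(path, mode='w', encoding='utf-8') as file:
        file.write('third')
    with open(path, mode='r', encoding='utf-8') as file:
        after_write = file.read()

print(second_x_failed)
print(after_append)
print(after_write)
```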

The ```encoding``` keyword argument specifies the text encoding, which was discussed in detail when the ```bytes``` class was examined in a previous notebook. When ```encoding=None``` the default is platform dependent (```locale.getencoding()```, as the ```TextIOWrapper``` docstring below notes), so it is safest to state ```'utf-8'``` explicitly. Data saved by a Microsoft product may require ```'utf-8-sig'``` in order to remove an unwanted BOM. To recap:

|encoding|bytes per character|bits per character|byte order|byte order marker BOM|
|---|---|---|---|---|
|'utf-8'|1, 2, 3, 4|8, 16, 24, 32|not applicable| |
|'utf-8-sig'|1, 2, 3, 4|8, 16, 24, 32|not applicable|efbbbf|
|'utf-32'|4|32|little endian|fffe0000|
|'utf-32-le'|4|32|little endian| |
|'utf-32-be'|4|32|big endian| |
|'utf-16'|2|16|little endian|fffe|
|'utf-16-le'|2|16|little endian| |
|'utf-16-be'|2|16|big endian| |
|'latin1'|1|8| ||
|'ascii'|1|8| ||
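
The effect of the BOM can be seen in a small sketch. The file is written to a temporary folder with ```'utf-8-sig'``` and then read back three ways; the file name ```bom.txt``` is a placeholder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'bom.txt')  # hypothetical file name

    # 'utf-8-sig' writes the three BOM bytes before the text.
    with open(path, mode='w', encoding='utf-8-sig') as file:
        file.write('Baa')

    # Binary mode shows the raw bytes, including the BOM.
    with open(path, mode='rb') as file:
        raw = file.read()

    # Plain 'utf-8' keeps the BOM as the character U+FEFF.
    with open(path, mode='r', encoding='utf-8') as file:
        with_bom = file.read()

    # 'utf-8-sig' strips the BOM on reading.
    with open(path, mode='r', encoding='utf-8-sig') as file:
        without_bom = file.read()

print(raw.hex())
print(repr(with_bom))
print(repr(without_bom))
```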

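The ```newline``` keyword input argument controls how line endings are handled. With the default ```newline=None```, any of ```'\n'```, ```'\r'``` or ```'\r\n'``` is translated to ```'\n'``` on reading, and ```'\n'``` is translated to ```os.linesep``` on writing.

On Linux a line normally ends with the single new line escape character ```'\n'```.

On Windows a line ends with two escape characters, carriage return and new line, ```'\r\n'```.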
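
The different ```newline``` settings can be compared when reading the same bytes. This is a sketch using a temporary file written in binary mode, so that its line endings are known exactly; the file name is a placeholder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'endings.txt')  # hypothetical file name

    # Write the bytes directly so the file is known to contain CRLF endings.
    with open(path, mode='wb') as file:
        file.write(b'one\r\ntwo\r\n')

    # newline=None translates every line ending to '\n'.
    with open(path, mode='r', encoding='utf-8', newline=None) as file:
        translated = file.read()

    # newline='' recognises all line endings but leaves them untranslated.
    with open(path, mode='r', encoding='utf-8', newline='') as file:
        untranslated = file.read()

    # newline='\r\n' splits lines only on '\r\n' and leaves it untranslated.
    with open(path, mode='r', encoding='utf-8', newline='\r\n') as file:
        crlf_lines = file.readlines()

print(repr(translated))
print(repr(untranslated))
print(crlf_lines)
```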

The ```errors``` keyword input argument controls how encoding and decoding errors are handled. The default ```None``` behaves the same as ```'strict'```, which raises an error such as ```UnicodeDecodeError```.
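
A sketch of how the ```errors``` setting changes the outcome, assuming a file that contains a byte that is not valid UTF-8. The file name and its content are made up for the illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'latin1.txt')  # hypothetical file name

    # b'\xe9' is 'é' in latin1 but is not valid UTF-8 on its own.
    with open(path, mode='wb') as file:
        file.write(b'caf\xe9')

    # 'strict' raises an error on the invalid byte.
    try:
        with open(path, mode='r', encoding='utf-8', errors='strict') as file:
            file.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # 'replace' substitutes the replacement character U+FFFD.
    with open(path, mode='r', encoding='utf-8', errors='replace') as file:
        replaced = file.read()

    # 'ignore' silently drops the invalid byte.
    with open(path, mode='r', encoding='utf-8', errors='ignore') as file:
        ignored = file.read()

    # Supplying the correct encoding avoids the error altogether.
    with open(path, mode='r', encoding='latin1') as file:
        decoded = file.read()

print(strict_failed, repr(replaced), repr(ignored), repr(decoded))
```

Choosing the correct ```encoding``` is generally preferable to relaxing ```errors```, since ```'replace'``` and ```'ignore'``` both lose information.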

A file can be opened:

In [4]:
file = open('text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n')

In text mode, ```open``` returns a new instance of the ```TextIOWrapper``` class, here given the instance name ```file```. This can be seen when the datatype of ```file``` is checked:

In [5]:
type(file)

_io.TextIOWrapper

The ```_io``` prefix indicates that this class comes from ```_io```, the internal implementation behind the ```io``` module; the leading underscore marks it as internal. Several of the ```open``` keyword arguments (```encoding```, ```errors``` and ```newline```) are passed on to the initialisation method of this class:

In [6]:
io.TextIOWrapper?

Init signature:
io.TextIOWrapper(
    buffer,
    encoding=None,
    errors=None,
    newline=None,
    line_buffering=False,
    write_through=False,
)
Docstring:
Character and line based layer over a BufferedIOBase object, buffer.

encoding gives the name of the encoding that the stream will be
decoded or encoded with. It defaults to locale.getencoding().

errors determines the strictness of encoding and decoding (see
help(codecs.Codec) or the documentation for codecs.register) and
defaults to "strict".

newline controls how line endings are handled. It can be None, '',
'\n', '\r', and '\
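
The ```buffer``` argument in this signature hints at the layers that ```open``` builds in text mode. The sketch below inspects them on a temporary file in CPython; the file name is a placeholder:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'layers.txt')  # hypothetical file name
    with open(path, mode='w', encoding='utf-8') as file:
        file.write('Baa')

    with open(path, mode='r', encoding='utf-8') as file:
        # Text layer, buffered binary layer, then the raw file.
        layers = [type(file).__name__,
                  type(file.buffer).__name__,
                  type(file.buffer.raw).__name__]
        same_class = type(file) is io.TextIOWrapper

print(layers)
print(same_class)
```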

Data can be read from the ```TextIOWrapper``` instance, for example using the file's ```readlines``` method, which returns a list of strings:

In [7]:
file.readlines()

['Baa, baa, black sheep,\r\n',
 'Have you any wool?\r\n',
 'Yes, sir, yes, sir,\r\n',
 'Three bags full.\r\n',
 '\r\n',
 'One for my master,\r\n',
 'One for my dame,\r\n',
 'And one for the little boy\r\n',
 'Who lives down the lane.']

After working with a file, it should be closed. The file can be closed using the ```TextIOWrapper``` close method:

In [8]:
file.close()

The ```print_identifier_group``` function from the custom ```helper_module``` can be imported to view the identifiers in more detail:

In [9]:
from helper_module import print_identifier_group

The ```TextIOWrapper``` class has the standard ```object``` based datamodel attributes and identifiers seen before. There are some additions such as ```__enter__``` (*dunder enter*) and ```__exit__``` (*dunder exit*):

In [10]:
print_identifier_group(file, kind='datamodel_attribute')

['__dict__', '__doc__']


In [11]:
print_identifier_group(file, kind='datamodel_method')

['__class__', '__del__', '__delattr__', '__dir__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']


The ```TextIOWrapper``` class has a number of attributes, most of these correspond to the input arguments provided when initialising the instance:

In [12]:
print_identifier_group(file, kind='attribute')

['_CHUNK_SIZE', '_finalizing', 'buffer', 'closed', 'encoding', 'errors', 'line_buffering', 'mode', 'name', 'newlines', 'write_through']


The ```TextIOWrapper``` method ```readable``` will check whether a file is readable returning a boolean. The method ```read``` will read the entire file as a Unicode string, the method ```readline``` will read an individual line as a string and then advance, while the method ```readlines``` will read every line returning a list of Unicode strings corresponding to each line.

The methods ```writable```, ```write``` and ```writelines``` are the write counterparts.

The methods ```seekable``` and ```seek``` relate to the cursor position.

In [13]:
print_identifier_group(file, kind='method')

['_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', 'close', 'detach', 'fileno', 'flush', 'isatty', 'read', 'readable', 'readline', 'readlines', 'reconfigure', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']


The datamodel methods ```__enter__``` (*dunder enter*) and ```__exit__``` (*dunder exit*) are used by a ```with``` code block. The ```open``` call opens the file, ```__enter__``` returns the file instance that is bound by ```as```, and ```__exit__``` closes the file when the code block is exited, even if an error occurs:

In [14]:
with open('text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    print(file.name)
    print(file.mode)
    print(file.encoding)
    print(file.errors)
    print('readable: ', file.readable())
    print('writeable: ', file.writable())
    print('seekable: ', file.seekable())

text.txt
r
utf-8
strict
readable:  True
writeable:  False
seekable:  True
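
That the file is closed on leaving the ```with``` block, even when an error interrupts it, can be checked with the ```closed``` attribute. A sketch on a temporary file, where the ```ValueError``` is raised deliberately to simulate a problem:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'closing.txt')  # hypothetical file name
    with open(path, mode='w', encoding='utf-8') as file:
        file.write('Baa')

    try:
        with open(path, mode='r', encoding='utf-8') as file:
            closed_inside = file.closed
            raise ValueError('simulated problem')  # interrupts the block
    except ValueError:
        pass

    # __exit__ has already closed the file at this point.
    closed_after = file.closed

print(closed_inside)
print(closed_after)
```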


The ```TextIOWrapper``` method ```read``` can be used to read the entire contents of the text file as a single string, notice that this includes carriage returns and new line escape characters:

In [15]:
with open('text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data1 = file.read()

In [16]:
data1

'Baa, baa, black sheep,\r\nHave you any wool?\r\nYes, sir, yes, sir,\r\nThree bags full.\r\n\r\nOne for my master,\r\nOne for my dame,\r\nAnd one for the little boy\r\nWho lives down the lane.'

When printed this gives a similar display to the file opened in Notepad++:

In [17]:
print(data1)

Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full.

One for my master,
One for my dame,
And one for the little boy
Who lives down the lane.


The ```TextIOWrapper``` method ```readline``` will only read a single line:

In [18]:
with open('text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data2 = file.readline()

Note that the end of this string includes the carriage return and new line escape characters:

In [19]:
data2

'Baa, baa, black sheep,\r\n'

These whitespace characters can be stripped using the string method ```strip```:

In [20]:
data2.strip()

'Baa, baa, black sheep,'

The ```TextIOWrapper``` method ```readlines``` will instead output a list of strings:

In [21]:
with open('text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data3 = file.readlines()

Notice the square brackets ```[]``` enclosing the list and comma delimiter between each line. Each line also includes the carriage return and newline character:

In [22]:
data3

['Baa, baa, black sheep,\r\n',
 'Have you any wool?\r\n',
 'Yes, sir, yes, sir,\r\n',
 'Three bags full.\r\n',
 '\r\n',
 'One for my master,\r\n',
 'One for my dame,\r\n',
 'And one for the little boy\r\n',
 'Who lives down the lane.']

These can be removed using a list comprehension:

In [23]:
data4 = [line.strip() for line in data3]

In [24]:
data4

['Baa, baa, black sheep,',
 'Have you any wool?',
 'Yes, sir, yes, sir,',
 'Three bags full.',
 '',
 'One for my master,',
 'One for my dame,',
 'And one for the little boy',
 'Who lives down the lane.']
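
Note that ```strip``` with no arguments removes all leading and trailing whitespace, not only the line ending. Where indentation matters, ```rstrip('\r\n')``` is a narrower alternative. The file instance can also be iterated over directly, one line at a time. A sketch on a temporary file with made-up content:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'indented.txt')  # hypothetical file name
    with open(path, mode='wb') as file:
        file.write(b'  indented\r\nplain\r\n')

    # Iterating over the file yields one line at a time.
    with open(path, mode='r', encoding='utf-8', newline='\r\n') as file:
        kept_indent = [line.rstrip('\r\n') for line in file]

    with open(path, mode='r', encoding='utf-8', newline='\r\n') as file:
        lost_indent = [line.strip() for line in file]

print(kept_indent)
print(lost_indent)
```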

The length of the single string obtained using the ```TextIOWrapper``` method ```read``` is:

In [25]:
len(data1)

175

In [26]:
data1

'Baa, baa, black sheep,\r\nHave you any wool?\r\nYes, sir, yes, sir,\r\nThree bags full.\r\n\r\nOne for my master,\r\nOne for my dame,\r\nAnd one for the little boy\r\nWho lives down the lane.'

The characters from index 5 onwards (the 6th character onwards, as indexing starts at 0) can be seen by slicing the string:

In [27]:
data1[5:]

'baa, black sheep,\r\nHave you any wool?\r\nYes, sir, yes, sir,\r\nThree bags full.\r\n\r\nOne for my master,\r\nOne for my dame,\r\nAnd one for the little boy\r\nWho lives down the lane.'

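The cursor can be placed at a position in the file using the ```TextIOWrapper``` method ```seek```. In text mode the documentation only guarantees offsets of 0 or values returned by the method ```tell```. Other values happen to work here as byte offsets, and they coincide with string indices because every character in this file occupies one byte in UTF-8: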

In [28]:
with open('text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.seek(5)
    data5 = file.read()

This gives a similar result to slicing of the string as seen above:

In [29]:
data5

'baa, black sheep,\r\nHave you any wool?\r\nYes, sir, yes, sir,\r\nThree bags full.\r\n\r\nOne for my master,\r\nOne for my dame,\r\nAnd one for the little boy\r\nWho lives down the lane.'

And the length of the string is ```175 - 5```:

In [30]:
len(data5)

170
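
A more robust pattern in text mode is to record a position with ```tell``` and return to it later with ```seek```. A sketch on a temporary file with made-up content:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'cursor.txt')  # hypothetical file name
    with open(path, mode='wb') as file:
        file.write(b'Baa\r\nWool\r\n')

    with open(path, mode='r', encoding='utf-8', newline='\r\n') as file:
        first = file.readline()
        position = file.tell()   # remember where the second line starts
        rest = file.read()

        file.seek(position)      # return to the remembered position
        rest_again = file.read()

        file.seek(0)             # return to the start of the file
        everything = file.read()

print(repr(first))
print(repr(rest_again))
print(repr(everything))
```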

The ```TextIOWrapper``` method ```write``` can be used to write text to a file, note that this file is opened with ```mode='w'```. 

On Linux the keyword input argument should be ```newline='\n'``` and the ```\n``` should be incorporated in a string for a new line.

On Windows the keyword input argument should be ```newline='\r\n'```, but ```\n``` should still be used within the string for a new line, because each ```\n``` passed to the ```write``` method is converted into ```\r\n```.

In [31]:
with open('text2.txt', mode='w', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.write('Hello World!\nBye World!')

This file can then be read using the ```TextIOWrapper``` method ```read```, note that this file is opened with ```mode='r'``` and ```newline='\r\n'```:

In [32]:
with open('text2.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data6 = file.read()

In [33]:
data6

'Hello World!\r\nBye World!'

Be careful not to use ```\r\n``` within the ```TextIOWrapper``` method ```write``` as the ```\n``` will be converted into ```\r\n``` and the result will be ```\r\r\n``` which is wrong:

In [34]:
with open('text3.txt', mode='w', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.write('Hello World!\r\nBye World!')

with open('text3.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data7 = file.read()
    
data7

'Hello World!\r\r\nBye World!'
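
The translation can be confirmed by reading the written bytes back in binary mode. The sketch below also shows ```newline=''```, which writes the string untranslated and so is an option when the string already contains ```\r\n```. The file name is a placeholder:

```python
import os
import tempfile

def written_bytes(text, newline):
    """Write text with the given newline setting and return the raw bytes."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'written.txt')  # hypothetical file name
        with open(path, mode='w', encoding='utf-8', newline=newline) as file:
            file.write(text)
        with open(path, mode='rb') as file:
            return file.read()

translated = written_bytes('a\nb', newline='\r\n')    # '\n' becomes '\r\n'
doubled = written_bytes('a\r\nb', newline='\r\n')     # '\r\n' becomes '\r\r\n'
untranslated = written_bytes('a\r\nb', newline='')    # written as given

print(translated, doubled, untranslated)
```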

The method ```writelines``` can be used to write a list of strings to a file. Note once again that this file is opened with ```mode='w'```. 

On Linux the keyword input argument should be ```newline='\n'``` and the ```\n``` should be at the end of each string.

On Windows the keyword input argument should be ```newline='\r\n'``` and each string should still end with ```\n```. Each ```\n``` used in the ```writelines``` method will be converted into ```\r\n```.

In [35]:
with open('text4.txt', mode='w', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.writelines(['Hello World!\n', 'Bye World!'])

with open('text4.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data8 = file.read()
    
data8

'Hello World!\r\nBye World!'

When ```mode='w'``` any content in an existing file is removed:

In [36]:
with open('text4.txt', mode='w', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.write('J')

with open('text4.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data9 = file.read()
    
data9

'J'

When ```mode='a'``` the new content is instead appended to the existing content of the file:

In [37]:
with open('text2.txt', mode='a', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.write('Appended')

with open('text2.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data10 = file.read()
    
data10

'Hello World!\r\nBye World!Appended'

Three variations of ```mode``` were seen, ```'r'```, ```'w'``` and ```'a'```. These have the aliases ```'rt'```, ```'wt'``` and ```'at'```, where the ```t``` indicates text mode: the file content is decoded to and encoded from a Unicode string using the specified ```encoding```.

```'rb'```, ```'wb'``` and ```'ab'``` instead indicate binary mode, in which the file content is handled as a byte string. There are no ```encoding```, ```errors``` or ```newline``` arguments as raw ```bytes``` are used. In binary mode ```open``` returns a buffered binary stream (```BufferedReader``` or ```BufferedWriter```) instead of a ```TextIOWrapper```, and its methods expect and return byte strings.

In [38]:
with open('text5.bin', mode='wb') as file:
    file.write(b'\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x0d\x0a\x48\x65\x6c\x6c\x6f')

with open('text5.bin', mode='rb') as file:
    data11 = file.read()
    
data11

b'Hello World!\r\nHello'

The binary file can be read in text mode if the correct encoding is supplied:

In [39]:
with open('text5.bin', mode='rt', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data12 = file.readlines()
    
data12

['Hello World!\r\n', 'Hello']
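
The classes returned in binary mode, and the conversion between ```bytes``` and ```str```, can be checked with a sketch on a temporary file. The file name is a placeholder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'binary.bin')  # hypothetical file name

    with open(path, mode='wb') as file:
        writer_type = type(file).__name__
        file.write('Hello'.encode('utf-8'))  # str must be encoded to bytes

    # Writing a str in binary mode is rejected.
    try:
        with open(path, mode='ab') as file:
            file.write('text')
        str_rejected = False
    except TypeError:
        str_rejected = True

    with open(path, mode='rb') as file:
        reader_type = type(file).__name__
        raw = file.read()

text = raw.decode('utf-8')  # bytes must be decoded to obtain a str

print(writer_type, reader_type, str_rejected)
print(raw, text)
```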

## os module

So far, the only text files examined have been in the same folder as the Interactive Python Notebook File. The Operating System Module ```os``` can be used to navigate around the Operating System. To import the module use:

In [40]:
import os

A summary about the module can be found using ```?```:

In [41]:
os?

Type:        module
String form: <module 'os' (frozen)>
File:        c:\users\pyip\miniconda3\envs\jupyterlab\lib\os.py
Docstring:
OS routines for NT or Posix depending on what system we're on.

This exports:
  - all functions from posix or nt, e.g. unlink, stat, etc.
  - os.path is either posixpath or ntpath
  - os.name is either 'posix' or 'nt'
  - os.curdir is a string representing the current directory (always '.')
  - os.pardir is a string representing the parent directory (always '..')
  - os.sep is the (or a most common) pathname separator ('/' or '\\')
  - os.extsep is the extension separator (always '.')
  - os.altsep is the alternate pathname separator (None or '/')
  - os.pathsep is the component separator used in $PATH etc
  - os.linesep is the line separator in text files ('\r' or '\n' or '\r\n')
  - os.defpath is the default search path for executables
  - os.devnull is the file path of the null device ('/dev/null', etc.)

Pro

More details can be seen using ```help```:

In [42]:
help(os)

Help on module os:

NAME
    os - OS routines for NT or Posix depending on what system we're on.

MODULE REFERENCE
    https://docs.python.org/3.11/library/os.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This exports:
      - all functions from posix or nt, e.g. unlink, stat, etc.
      - os.path is either posixpath or ntpath
      - os.name is either 'posix' or 'nt'
      - os.curdir is a string representing the current directory (always '.')
      - os.pardir is a string representing the parent directory (always '..')
      - os.sep is the (or a most common) pathname separator ('/' or '\\')
      - os.extsep is the extension separator (always '.')
      - os.altsep is the alternate pathn

In [43]:
print_identifier_group(os, kind='attribute')

['abc', 'altsep', 'curdir', 'defpath', 'devnull', 'environ', 'extsep', 'linesep', 'name', 'pardir', 'path', 'pathsep', 'sep', 'st', 'supports_bytes_environ', 'supports_dir_fd', 'supports_effective_ids', 'supports_fd', 'supports_follow_symlinks', 'sys']


In [44]:
print_identifier_group(os, kind='method')

['_Environ', '_check_methods', '_execvpe', '_exists', '_exit', '_fspath', '_get_exports_list', '_walk', 'abort', 'access', 'add_dll_directory', 'chdir', 'chmod', 'close', 'closerange', 'cpu_count', 'device_encoding', 'dup', 'dup2', 'execl', 'execle', 'execlp', 'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'fdopen', 'fsdecode', 'fsencode', 'fspath', 'fstat', 'fsync', 'ftruncate', 'get_exec_path', 'get_handle_inheritable', 'get_inheritable', 'get_terminal_size', 'getcwd', 'getcwdb', 'getenv', 'getlogin', 'getpid', 'getppid', 'isatty', 'kill', 'link', 'listdir', 'lseek', 'lstat', 'makedirs', 'mkdir', 'open', 'pipe', 'popen', 'putenv', 'read', 'readlink', 'remove', 'removedirs', 'rename', 'renames', 'replace', 'rmdir', 'scandir', 'set_handle_inheritable', 'set_inheritable', 'spawnl', 'spawnle', 'spawnv', 'spawnve', 'startfile', 'stat', 'strerror', 'symlink', 'system', 'times', 'truncate', 'umask', 'unlink', 'unsetenv', 'urandom', 'utime', 'waitpid', 'waitstatus_to_exitcode', 'walk', '

The main purpose of the ```os``` module is to navigate around the Operating System and as a consequence many of its identifiers are grouped under the ```path``` module:

In [45]:
os.path?

Type:        module
String form: <module 'ntpath' (frozen)>
File:        c:\users\pyip\miniconda3\envs\jupyterlab\lib\ntpath.py
Docstring:
Common pathname manipulations, WindowsNT/95 version.

Instead of importing this module directly, import os and refer to this
module as os.path.

In [46]:
print_identifier_group(os.path, kind='attribute')

['_LCMAP_LOWERCASE', '_LOCALE_NAME_INVARIANT', 'altsep', 'curdir', 'defpath', 'devnull', 'extsep', 'genericpath', 'os', 'pardir', 'pathsep', 'sep', 'stat', 'supports_unicode_filenames', 'sys']


In [47]:
print_identifier_group(os.path, kind='method')

['_LCMapStringEx', '_abspath_fallback', '_get_bothseps', '_getfinalpathname', '_getfinalpathname_nonstrict', '_getfullpathname', '_getvolumepathname', '_nt_readlink', '_path_normpath', '_readlink_deep', 'abspath', 'basename', 'commonpath', 'commonprefix', 'dirname', 'exists', 'expanduser', 'expandvars', 'getatime', 'getctime', 'getmtime', 'getsize', 'isabs', 'isdir', 'isfile', 'islink', 'ismount', 'join', 'lexists', 'normcase', 'normpath', 'realpath', 'relpath', 'samefile', 'sameopenfile', 'samestat', 'split', 'splitdrive', 'splitext']


The ```os``` attribute ```name``` gives the name of the operating system family: ```'nt'``` for Windows and ```'posix'``` for Linux and Mac. On this Windows machine:

In [48]:
os.name

'nt'

Path related identifiers are grouped in the ```os.path``` module, accessed via the ```path``` attribute. ```curdir``` is the current directory:

In [49]:
os.path.curdir

'.'

```pardir``` is the parent directory:

In [50]:
os.path.pardir

'..'

These have the same values on Windows and Linux so ```.``` and ```..``` are commonly used directly.

```extsep``` is the extension separator which splits the file name from the file extension:

In [51]:
os.path.extsep

'.'

This has the same value as ```curdir``` on Windows and Linux so ```.``` is commonly used for this also.

The main difference is in the separator as Windows prefers the backslash ```\``` while Linux uses the forward slash ```/```. 

In Python ```\``` is used to insert an escape character in a string and has to be presented as ```\\```:

In [52]:
os.path.sep

'\\'

Windows also recognises the forward slash ```/``` as an alternative separator:

In [53]:
os.path.altsep

'/'

These are commonly used so are also accessible from ```os``` directly:

In [54]:
os.curdir

'.'

In [55]:
os.pardir

'..'

In [56]:
os.extsep

'.'

In [57]:
os.sep

'\\'

In [58]:
os.altsep

'/'

```linesep``` is an attribute for the line separator in a file. Recall this is ```'\r\n'``` in Windows and ```'\n'``` in Linux/Mac:

In [59]:
os.linesep

'\r\n'

My ```%UserProfile%``` on this Windows computer is found in:

In [60]:
'C:\\Users\\Philip'

'C:\\Users\\Philip'

This can be simplified using a raw string:

In [61]:
r'C:\Users\Philip'

'C:\\Users\\Philip'

Alternatively it could be constructed using:

In [62]:
'C:' + os.sep + 'Users' + os.sep + 'Philip' 

'C:\\Users\\Philip'

Using ```os.sep``` is slightly more reliable than manually placing backslashes as it is easy to miss one or add an extra one. The ```join``` function from the ```os.path``` module will automatically insert separators:

In [63]:
os.path.join('C:\\', 'Users', 'Philip')

'C:\\Users\\Philip'

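A component that begins with a separator is treated as an absolute path. On Windows it restarts the path from the root of the current drive, which gives the same result here. On Linux it discards every component before it, so leading separators are best avoided: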

In [64]:
os.path.join('C:\\', '\\Users', 'Philip')

'C:\\Users\\Philip'
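
How ```join``` treats a component with a leading separator can be compared on both platforms from any machine, since the Windows and Linux versions of ```os.path``` are importable directly as ```ntpath``` and ```posixpath```. The paths in this sketch are only illustrative:

```python
import ntpath      # the Windows version of os.path
import posixpath   # the Linux/Mac version of os.path

# On Windows the drive is kept and the path restarts from its root.
windows_result = ntpath.join('C:\\', '\\Users', 'Philip')

# On Linux everything before the absolute component is discarded.
posix_result = posixpath.join('/home', '/etc', 'hosts')

# normpath is the function that collapses repeated separators.
normalised = ntpath.normpath('C:\\Users\\\\Philip')

print(windows_result)
print(posix_result)
print(normalised)
```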

Hardcoding an absolute path like the above is bad practice. Code that searches for a file in this absolute path will work on my computer but not on yours, because your ```~``` (```%USERPROFILE%``` on Windows or ```$HOME``` on Linux/Mac) will be different. The ```os``` module has an ```environ``` attribute, a dictionary-like mapping of environment variables. Several of these variables hold user-specific locations such as the user profile:

In [65]:
os.environ

environ{'ALLUSERSPROFILE': 'C:\\ProgramData',
        'APPDATA': 'C:\\Users\\pyip\\AppData\\Roaming',
        'CHROME_CRASHPAD_PIPE_NAME': '\\\\.\\pipe\\crashpad_22616_TCCFFOYCAPAXVOJY',
        'COMMONPROGRAMFILES': 'C:\\Program Files\\Common Files',
        'COMMONPROGRAMFILES(X86)': 'C:\\Program Files (x86)\\Common Files',
        'COMMONPROGRAMW6432': 'C:\\Program Files\\Common Files',
        'COMPUTERNAME': 'IBH-APP-PYIP',
        'COMSPEC': 'C:\\WINDOWS\\system32\\cmd.exe',
        'CONDA_DEFAULT_ENV': 'jupyterlab',
        'CONDA_EXE': 'C:\\Users\\pyip\\miniconda3\\Scripts\\conda.exe',
        'CONDA_EXES': '"C:\\Users\\pyip\\miniconda3\\condabin\\..\\Scripts\\conda.exe"  ',
        'CONDA_PREFIX': 'C:\\Users\\pyip\\miniconda3\\envs\\jupyterlab',
        'CONDA_PROMPT_MODIFIER': '(jupyterlab) ',
        'CONDA_PYTHON_EXE': 'C:\\Users\\pyip\\miniconda3\\python.exe',
        'CONDA_ROOT': 'C:\\Users\\pyip\\miniconda3',
        'CONDA_SHLVL': '1',
        'DRIVERDATA': 'C:\\Window

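The function ```getenv``` reads an environment variable from this mapping and returns ```None``` (or a supplied default) when the variable is not set. It is more common to index into the mapping using the key, which raises a ```KeyError``` for a missing variable.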
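
The difference between the two approaches only shows when a variable is missing. The sketch below uses a made-up variable name that is assumed not to be set:

```python
import os

# A hypothetical variable name, removed first in case it happens to exist.
missing_key = 'HYPOTHETICAL_VARIABLE_THAT_IS_NOT_SET'
os.environ.pop(missing_key, None)

via_getenv = os.getenv(missing_key)                 # None when missing
via_default = os.getenv(missing_key, 'fallback')    # supplied default
via_get = os.environ.get(missing_key, 'fallback')   # mapping equivalent

# Indexing raises a KeyError instead.
try:
    os.environ[missing_key]
    indexing_failed = False
except KeyError:
    indexing_failed = True

print(via_getenv, via_default, via_get, indexing_failed)
```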

Windows and Linux use different names for their environment variables, so the keys of the ```os.environ``` dictionary differ. On Windows the user name can be obtained using the key ```'USERNAME'```, on Linux using the key ```'USER'```.

A check can be made on ```os.name``` and the appropriate environment variable selected:

In [66]:
if os.name == 'nt':
    # if Windows
    name = os.environ['USERNAME']
else:
    # else Linux/Mac
    name = os.environ['USER']
    
name

'PYip'

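The user profile folder can then be constructed, assuming it is in the conventional location (```C:\Users``` on Windows, ```/home``` on Linux), which is not guaranteed: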

In [67]:
if os.name == 'nt':
    # if Windows
    home = os.path.join('C:\\', 'Users', os.environ['USERNAME'])
else:
    # else Linux/Mac
    home = os.path.join(os.sep + 'home', os.environ['USER'])
    
home

'C:\\Users\\PYip'

The ```USERPROFILE``` can also be selected using the key ```'USERPROFILE'``` on Windows or ```HOME``` on Linux/Mac:

In [68]:
if os.name == 'nt':
    # if Windows
    home = os.environ['USERPROFILE']
else:
    # else Linux/Mac
    home = os.environ['HOME']
    
home

'C:\\Users\\pyip'

This can be used in the ```join``` function from ```os.path``` to get to Documents:

In [69]:
if os.name == 'nt':
    # if Windows
    home = os.environ['USERPROFILE']
else:
    # else Linux/Mac
    home = os.environ['HOME']
    
documents = os.path.join(home, 'Documents')
documents

'C:\\Users\\pyip\\Documents'

Alternatively the ```os.path``` function ```expanduser``` can be used to expand ```'~'``` to USERPROFILE on Windows and HOME on Linux/Mac:

In [70]:
os.path.expanduser('~') 

'C:\\Users\\pyip'

Care needs to be taken with separators with this function:

In [71]:
os.path.expanduser('~' + os.sep + 'Documents') 

'C:\\Users\\pyip\\Documents'

On Windows, environment variables are normally written in upper case and enclosed in ```%```. The ```os.path``` function ```expandvars``` can be used to expand these locations:

In [72]:
os.path.expandvars('%USERPROFILE%' + os.sep + 'Documents')

'C:\\Users\\pyip\\Documents'

The current working directory can be found using the ```os``` function ```getcwd```:

In [73]:
os.getcwd()

'c:\\Users\\pyip\\Documents\\GitHub\\python-notebooks\\os_module'

The files and subdirectories in this directory can be listed using the ```os``` function ```listdir```:

In [74]:
os.listdir()

['helper_module.py',
 'images',
 'notebook.ipynb',
 'old.md',
 'test.ipynb',
 'text.txt',
 'text2.txt',
 'text3.txt',
 'text4.txt',
 'text5.bin',
 'text5.txt',
 '__pycache__']

This will by default be the folder containing the Interactive Python Notebook file. The directory can be changed using the ```os``` function ```chdir``` for example to the parent directory using ```..``` or ```os.pardir```:

In [75]:
os.chdir(os.pardir)

In [76]:
os.getcwd()

'c:\\Users\\pyip\\Documents\\GitHub\\python-notebooks'

This parent directory can be assigned to a variable using the ```os``` function ```getcwd```:

In [77]:
parent = os.getcwd()
parent

'c:\\Users\\pyip\\Documents\\GitHub\\python-notebooks'

And the ```os.path``` function ```join``` can be used to join this with the folder and the name of the notebook itself:

In [78]:
notebook_path = os.path.join(parent, 'os_module', 'notebook' + os.extsep + 'ipynb')
notebook_path

'c:\\Users\\pyip\\Documents\\GitHub\\python-notebooks\\os_module\\notebook.ipynb'

The ```os.path``` function ```exists``` can be used to check whether a file exists returning a boolean:

In [79]:
os.path.exists(notebook_path)

True

The ```os.path``` function ```split``` returns a ```tuple``` where the first element is the directory of the file and the second element is the file including the file extension:

In [80]:
os.path.split(notebook_path)

('c:\\Users\\pyip\\Documents\\GitHub\\python-notebooks\\os_module',
 'notebook.ipynb')

In [81]:
file_path, file = os.path.split(notebook_path)

In [82]:
file_path

'c:\\Users\\pyip\\Documents\\GitHub\\python-notebooks\\os_module'

In [83]:
file

'notebook.ipynb'

The ```os.path``` function ```splitext``` splits a file path from its file extension, once again returning a 2 element ```tuple``` of the file path including the file name and the extension respectively:

In [84]:
os.path.splitext(file)

('notebook', '.ipynb')

In [85]:
os.path.splitext(notebook_path)

('c:\\Users\\pyip\\Documents\\GitHub\\python-notebooks\\os_module\\notebook',
 '.ipynb')
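
Two details of ```splitext``` are worth knowing: only the last extension is split off, and a leading dot is not treated as an extension. The sketch uses ```ntpath``` (the Windows version of ```os.path```) so the results are the same on any machine, and the path is a shortened, hypothetical one:

```python
import ntpath  # the Windows version of os.path

path = 'c:\\Users\\pyip\\notebook.ipynb'  # hypothetical path

folder, filename = ntpath.split(path)
stem, extension = ntpath.splitext(filename)

# dirname and basename return the two halves of split individually.
same_folder = ntpath.dirname(path) == folder
same_filename = ntpath.basename(path) == filename

double_extension = ntpath.splitext('archive.tar.gz')
hidden_file = ntpath.splitext('.gitignore')

print(folder, filename, stem, extension)
print(double_extension)
print(hidden_file)
```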

The current working directory can be changed to the folder of this notebook file:

In [86]:
os.chdir(file_path)

The contents can be listed using:

In [87]:
os.listdir()

['helper_module.py',
 'images',
 'notebook.ipynb',
 'old.md',
 'test.ipynb',
 'text.txt',
 'text2.txt',
 'text3.txt',
 'text4.txt',
 'text5.bin',
 'text5.txt',
 '__pycache__']

A directory can be made using the ```os``` function make directory ```mkdir```. Here a check will be made to see if the directory exists and if it doesn't to create it:

In [88]:
if not os.path.exists('directory1'):
    os.mkdir('directory1')

In [89]:
if not os.path.exists('directory2'):
    os.mkdir('directory2')

In [90]:
os.listdir()

['directory1',
 'directory2',
 'helper_module.py',
 'images',
 'notebook.ipynb',
 'old.md',
 'test.ipynb',
 'text.txt',
 'text2.txt',
 'text3.txt',
 'text4.txt',
 'text5.bin',
 'text5.txt',
 '__pycache__']
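
An alternative to checking with ```os.path.exists``` first is the ```exist_ok``` keyword argument of ```os.makedirs```, which suppresses the error raised when the directory is already present. A sketch in a temporary folder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, 'directory1')

    # Safe to call repeatedly: no error if the directory already exists.
    os.makedirs(target, exist_ok=True)
    os.makedirs(target, exist_ok=True)

    # mkdir has no such option and raises FileExistsError instead.
    try:
        os.mkdir(target)
        mkdir_failed = False
    except FileExistsError:
        mkdir_failed = True

    created = os.path.isdir(target)

print(created)
print(mkdir_failed)
```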

```'directory2'``` can be removed using the ```os``` function remove directory ```rmdir```:

In [91]:
os.rmdir('directory2')

In [92]:
os.listdir()

['directory1',
 'helper_module.py',
 'images',
 'notebook.ipynb',
 'old.md',
 'test.ipynb',
 'text.txt',
 'text2.txt',
 'text3.txt',
 'text4.txt',
 'text5.bin',
 'text5.txt',
 '__pycache__']

The current working directory can be changed to ```directory1```:

In [93]:
os.chdir('directory1')

In [94]:
os.listdir()

[]

A file can be created here, this time passing ```newline=os.linesep``` so that each ```'\n'``` written is translated to the line separator of the operating system:

In [95]:
with open('text.txt', mode='w', encoding='utf-8', errors='strict', newline=os.linesep) as file:
    file.write('Hello World!\nBye World!')

A Python file can be created in the same manner:

In [96]:
with open('script.py', mode='w', encoding='utf-8', errors='strict', newline=os.linesep) as file:
    file.write("print('Hello World!')\n")

These files can be seen:

In [97]:
os.listdir()

['script.py', 'text.txt']
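The effect of ```newline=os.linesep``` can be checked by reading a file back in binary mode, where no newline translation takes place. This is a sketch only; it writes to a temporary folder (an assumption for illustration) rather than to ```directory1```:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    example_file = os.path.join(tmp, 'text.txt')

    # Write '\n' in text mode, letting open translate it to os.linesep.
    with open(example_file, mode='w', encoding='utf-8', newline=os.linesep) as f:
        f.write('Hello World!\nBye World!')

    # Read the raw bytes back so no newline translation takes place.
    with open(example_file, mode='rb') as f:
        raw = f.read()

print(raw)
print(raw == ('Hello World!' + os.linesep + 'Bye World!').encode('utf-8'))
```

On Windows the bytes should contain ```\r\n``` and on Linux ```\n```. Leaving ```newline``` at its default of ```None``` gives the same result when writing.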

The current working directory can be changed back to the parent folder using ```os.pardir```:

In [98]:
os.chdir(os.pardir)

If an attempt is now made to delete ```directory1``` using ```rmdir```, an ```OSError: The directory is not empty``` is raised.
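The failure can be reproduced safely with a ```try``` and ```except``` block. The sketch below rebuilds a non-empty ```directory1``` inside a temporary folder (an assumption, so the notebook folder is left alone); the exact wording of the error message depends on the operating system:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, 'directory1')
    os.mkdir(folder)
    with open(os.path.join(folder, 'text.txt'), mode='w', encoding='utf-8') as f:
        f.write('Hello World!')

    try:
        # rmdir only removes empty directories, so this call is expected to fail.
        os.rmdir(folder)
        removed = True
    except OSError as error:
        removed = False
        print(type(error).__name__, error)

    # The folder and its contents are left untouched after the failed call.
    still_there = os.path.exists(folder)
```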

The error acts as a safeguard so that files are not accidentally deleted along with the directory. Individual files can be deleted using the ```os``` function ```remove```:

In [99]:
os.remove('directory1' + os.sep + 'text.txt')
os.remove('directory1' + os.sep + 'script.py')

And now because it is empty it can be deleted:

In [100]:
os.rmdir('directory1')

The ```os``` module also has the more powerful ```makedirs```, which creates any missing intermediate directories along the path, so nested subfolders can be made in one call:

In [101]:
os.makedirs('directory1' + os.sep + 'subdirectory1')

In [102]:
os.listdir()

['directory1',
 'helper_module.py',
 'images',
 'notebook.ipynb',
 'old.md',
 'test.ipynb',
 'text.txt',
 'text2.txt',
 'text3.txt',
 'text4.txt',
 'text5.bin',
 'text5.txt',
 '__pycache__']

In [103]:
os.listdir('directory1')

['subdirectory1']

In [104]:
os.listdir('directory1' + os.sep + 'subdirectory1')

[]

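The ```os``` function ```removedirs``` removes the directory at the end of the path and then each parent directory in turn, stopping at the first one that is not empty. Here both ```subdirectory1``` and ```directory1``` are removed: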

In [105]:
os.removedirs('directory1' + os.sep + 'subdirectory1')
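Because ```removedirs``` keeps working upwards through empty parents, it helps to see where it stops. In the sketch below a hypothetical file ```keep.txt``` is placed in a temporary folder so that the folder is not empty and the removal ends there:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A sentinel file keeps tmp itself non-empty, so removedirs stops there.
    with open(os.path.join(tmp, 'keep.txt'), mode='w', encoding='utf-8') as f:
        f.write('sentinel')

    leaf = os.path.join(tmp, 'directory1', 'subdirectory1')
    os.makedirs(leaf)

    # Removes subdirectory1, then directory1, then stops at tmp.
    os.removedirs(leaf)

    leaf_exists = os.path.exists(leaf)
    parent_exists = os.path.exists(os.path.join(tmp, 'directory1'))
    remaining = os.listdir(tmp)

print(leaf_exists, parent_exists, remaining)
```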

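The ```os``` function ```replace``` renames a source to a destination; if the destination is an existing file, it is overwritten. It can therefore be used to rename a file or directory, or to move it to another location. The directories removed above are first recreated and a script file is added: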

In [106]:
if not os.path.exists('directory1'):
    os.makedirs('directory1' + os.sep + 'subdirectory1')

In [107]:
file_path = os.path.join(os.getcwd(), 'directory1', 'subdirectory1', 'script.py')

with open(file_path, mode='w', encoding='utf-8', errors='strict', newline=os.linesep) as file:
    file.write("print('Hello World!')\n")

In [108]:
os.listdir('directory1')

['subdirectory1']

In [109]:
os.listdir('directory1' + os.sep + 'subdirectory1')

['script.py']

This script file can be renamed using:

In [110]:
source = os.path.join(os.getcwd(), 'directory1', 'subdirectory1', 'script.py')
destination = os.path.join(os.getcwd(), 'directory1', 'subdirectory1', 'pscript.py')
os.replace(source, destination)

In [111]:
os.listdir('directory1')

['subdirectory1']

In [112]:
os.listdir('directory1' + os.sep + 'subdirectory1')

['pscript.py']

In [113]:
source = os.path.join(os.getcwd(), 'directory1', 'subdirectory1', 'pscript.py')
destination = os.path.join(os.getcwd(), 'directory1', 'script.py')
os.replace(source, destination)

In [114]:
os.listdir('directory1')

['script.py', 'subdirectory1']

In [115]:
os.listdir('directory1' + os.sep + 'subdirectory1')

[]
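In the examples above the destination did not exist beforehand. The sketch below, using hypothetical file names in a temporary folder, shows what happens when the destination is an existing file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'new.txt')
    destination = os.path.join(tmp, 'old.txt')

    with open(source, mode='w', encoding='utf-8') as f:
        f.write('new contents')
    with open(destination, mode='w', encoding='utf-8') as f:
        f.write('old contents')

    # The destination already exists and is overwritten.
    os.replace(source, destination)

    with open(destination, mode='r', encoding='utf-8') as f:
        contents = f.read()
    remaining = os.listdir(tmp)

print(contents)
print(remaining)
```

Only ```old.txt``` remains and it now holds the contents of the source, so ```replace``` should be used with care when the destination might already exist.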

Another Python script file can be created:

In [116]:
file_path = os.path.join(os.getcwd(), 'directory1', 'subdirectory1', 'script1.py')

with open(file_path, mode='w', encoding='utf-8', errors='strict', newline=os.linesep) as file:
    file.write("print('Hello World!')\n")

Instead of calling the ```os``` function ```listdir``` on each directory in turn:

In [117]:
os.listdir('directory1')

['script.py', 'subdirectory1']

In [118]:
os.listdir('directory1' + os.sep + 'subdirectory1')

['script1.py']

The ```os``` function ```walk``` can be used to create a generator:

In [119]:
forward = os.walk('directory1')
forward

<generator object _walk at 0x000001FA04A41FE0>

When ```next``` is used, a three element ```tuple``` is returned containing the path of the current folder, a list of its subfolders and a list of its files:

In [120]:
next(forward)

('directory1', ['subdirectory1'], ['script.py'])

In [121]:
next(forward)

('directory1\\subdirectory1', [], ['script1.py'])

This is typically used in a loop:

In [122]:
top = os.walk('directory1')
for root, dirs, files in top:
    print(root, end='\n')
    print('\t', dirs)
    print('\t', files)

directory1
	 ['subdirectory1']
	 ['script.py']
directory1\subdirectory1
	 []
	 ['script1.py']
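A common use of this loop is to collect every file of a given type in a folder tree. The following is a sketch only: the function name ```find_files``` and the file ```notes.txt``` are hypothetical, and the folder layout is rebuilt in a temporary folder so the example is self-contained:

```python
import os
import tempfile

def find_files(top, extension):
    """Return sorted paths under top whose names end with extension."""
    matches = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if os.path.splitext(name)[1] == extension:
                # root is the folder being visited, so join it to get a full path.
                matches.append(os.path.join(root, name))
    return sorted(matches)

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'directory1', 'subdirectory1'))
    for parts in [('directory1', 'script.py'),
                  ('directory1', 'notes.txt'),
                  ('directory1', 'subdirectory1', 'script1.py')]:
        with open(os.path.join(tmp, *parts), mode='w', encoding='utf-8') as f:
            f.write("print('Hello World!')\n")

    # Express the results relative to tmp so they are easier to read.
    found = [os.path.relpath(p, tmp) for p in find_files(tmp, '.py')]

print(found)
```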


The ```topdown``` input argument can be assigned to ```False``` so that the tree is walked from the bottom up, with each subfolder listed before the folder that contains it:

In [123]:
top = os.walk('directory1', topdown=False)
for root, dirs, files in top:
    print(root, end='\n')
    print('\t', dirs)
    print('\t', files)

directory1\subdirectory1
	 []
	 ['script1.py']
directory1
	 ['subdirectory1']
	 ['script.py']


This bottom-up order can be used in nested ```for``` loops, for example to recursively delete all files and subdirectories in a directory:

In [124]:
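for root, dirs, files in os.walk('directory1', topdown=False):
    # Bottom-up order: each subfolder has already been emptied by the time it is removed.
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))

# directory1 is now empty and can be removed.
os.rmdir('directory1')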
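The same loop can be wrapped in a function so it can be reused and tried out safely. This is a sketch under assumptions: the name ```remove_tree``` is hypothetical and the folder tree is rebuilt in a temporary folder before being deleted:

```python
import os
import tempfile

def remove_tree(top):
    """Delete top and everything inside it, walking bottom-up."""
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            os.remove(os.path.join(root, name))
        for name in dirs:
            os.rmdir(os.path.join(root, name))
    # After the loop top itself is empty and can be removed.
    os.rmdir(top)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'directory1')
    os.makedirs(os.path.join(target, 'subdirectory1'))
    script = os.path.join(target, 'subdirectory1', 'script1.py')
    with open(script, mode='w', encoding='utf-8') as f:
        f.write("print('Hello World!')\n")

    remove_tree(target)

    target_exists = os.path.exists(target)
    remaining = os.listdir(tmp)

print(target_exists, remaining)
```

The standard library function ```shutil.rmtree``` performs the same kind of recursive deletion and is usually the more convenient choice outside of a teaching example.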