# os module

## Text Files

A text file called ```text.txt``` can be created in the same folder as the Interactive Python Notebook File:

```
Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full.

One for my master,
One for my dame,
And one for the little boy
Who lives down the lane.
```

Text files can be viewed in Notepad++ with View → Show Symbol → Show All Characters:

<img src='./images/img_001.png' alt='img_001' width='500'/>

Notice the ```LF``` symbol at the end of each line. This stands for line feed and instructs the viewer to move on to the next line.

Text files are opened in Python by using the function ```open```. The docstring of this function can be viewed:

In [1]:
open?

Signature:
open(
    file,
    mode='r',
    buffering=-1,
    encoding=None,
    errors=None,
    newline=None,
    closefd=True,
    opener=None,
)
Docstring:
Open file and return a stream.  Raise OSError upon failure.

file is either a text or byte string giving the name (and the path
if the file isn't in the current working directory) of the file to
be opened or an integer file descriptor of the file to be
wrapped. (If a file descriptor is given, it is closed when the
returned I/O object is close

The ```open``` function requires a file which can be specified directly when it is in the same folder as the interactive Python notebook file (or Python script file).

The ```mode``` keyword input argument can be specified using a single letter:

|mode|definition|
|---|---|
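|'r'|open an existing file and read existing content (raises ```FileNotFoundError``` if the file does not exist)|
|'w'|open a file for writing, creating it if it does not exist and erasing any existing content|
|'a'|open a file for writing, creating it if it does not exist and appending new content to the end|
|'x'|create a new file and write new content (raises ```FileExistsError``` if the file already exists)|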
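
The behaviour of the four modes can be checked with a small sketch. It works in a temporary folder created with the standard ```tempfile``` module, and the file name ```modes_demo.txt``` is hypothetical and only used for illustration:

```python
import os
import tempfile

# Work in a throwaway folder so that no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'modes_demo.txt')  # hypothetical file name

    # 'x' creates the file and fails if it already exists.
    with open(path, mode='x', encoding='utf-8') as file:
        file.write('first\n')

    try:
        open(path, mode='x', encoding='utf-8')
        x_failed = False
    except FileExistsError:
        x_failed = True

    # 'a' keeps the existing content and writes at the end.
    with open(path, mode='a', encoding='utf-8') as file:
        file.write('second\n')

    with open(path, mode='r', encoding='utf-8') as file:
        after_append = file.read()

    # 'w' erases the existing content before writing.
    with open(path, mode='w', encoding='utf-8') as file:
        file.write('third\n')

    with open(path, mode='r', encoding='utf-8') as file:
        after_write = file.read()

print(x_failed)            # True
print(repr(after_append))  # 'first\nsecond\n'
print(repr(after_write))   # 'third\n'
```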

The ```encoding``` keyword argument is used to specify the encoding, which was discussed in detail when the ```bytes``` class was examined in a previous notebook. The default value is ```None```, in which case a platform-dependent encoding is used, so it is safer to specify ```'utf-8'``` explicitly. Data that was processed with a Microsoft product may require ```'utf-8-sig'``` in order to remove an unwanted BOM. To recap:

|encoding|bytes per character|bits per character|byte order|byte order marker BOM|
|---|---|---|---|---|
|'utf-8'|1, 2, 3, 4|8, 16, 24, 32|not applicable| |
|'utf-8-sig'|1, 2, 3, 4|8, 16, 24, 32|not applicable|efbbbf|
|'utf-32'|4|32|little endian|fffe0000|
|'utf-32-le'|4|32|little endian| |
|'utf-32-be'|4|32|big endian| |
|'utf-16'|2|16|little endian|fffe|
|'utf-16-le'|2|16|little endian| |
|'utf-16-be'|2|16|big endian| |
|'latin1'|1|8| ||
|'ascii'|1|8| ||
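
Some rows of the table can be confirmed by encoding a single character and inspecting the bytes in hexadecimal. This is only a sketch, and the character ```'é'``` is an arbitrary choice of a character outside the ASCII range:

```python
text = 'é'  # arbitrary non-ASCII character chosen for illustration

utf8 = text.encode('utf-8')
utf8_sig = text.encode('utf-8-sig')
utf16_le = text.encode('utf-16-le')
utf16_be = text.encode('utf-16-be')

print(utf8.hex())      # c3a9
print(utf8_sig.hex())  # efbbbfc3a9 (the BOM efbbbf comes first)
print(utf16_le.hex())  # e900
print(utf16_be.hex())  # 00e9

# Decoding with 'utf-8-sig' removes the BOM, plain 'utf-8' keeps it as '\ufeff'.
bom_removed = utf8_sig.decode('utf-8-sig')
bom_kept = utf8_sig.decode('utf-8')
print(repr(bom_removed))  # 'é'
print(repr(bom_kept))     # '\ufeffé'
```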

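The ```newline``` keyword input argument controls how line endings are handled and can be ```None```, ```''```, ```'\n'```, ```'\r'``` or ```'\r\n'```. With the default ```None```, any of ```'\n'```, ```'\r'``` or ```'\r\n'``` is recognised as a line ending when reading and is translated to ```'\n'```, while ```'\n'``` is translated to the platform line separator when writing. With ```newline='\r\n'```, as used in this notebook, lines only end at ```'\r\n'``` and are returned untranslated when reading, and every ```'\n'``` is written as ```'\r\n'```.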
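
The effect of ```newline``` when reading can be seen by writing known bytes to a file and reading them back with different settings. The following is a sketch using a temporary folder, and the file name ```newline_demo.txt``` is hypothetical:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'newline_demo.txt')  # hypothetical file name

    # Write raw bytes so that the line endings on disk are known exactly.
    with open(path, mode='wb') as file:
        file.write(b'one\r\ntwo\nthree')

    # newline=None (the default): every line ending is translated to '\n'.
    with open(path, mode='r', encoding='utf-8', newline=None) as file:
        translated = file.readlines()

    # newline='': every line ending is recognised but left untranslated.
    with open(path, mode='r', encoding='utf-8', newline='') as file:
        untranslated = file.readlines()

    # newline='\r\n': only '\r\n' ends a line, a lone '\n' does not.
    with open(path, mode='r', encoding='utf-8', newline='\r\n') as file:
        crlf_only = file.readlines()

print(translated)    # ['one\n', 'two\n', 'three']
print(untranslated)  # ['one\r\n', 'two\n', 'three']
print(crlf_only)     # ['one\r\n', 'two\nthree']
```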

The ```errors``` keyword input argument specifies how encoding and decoding errors are handled. The default ```None``` has the same effect as ```'strict'```, which raises an exception when an encoding error is encountered.
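
The difference between error handlers can be illustrated with a file containing a byte that is not valid UTF-8. This is a sketch only: the file name ```errors_demo.txt``` is hypothetical, and ```'replace'``` and ```'ignore'``` are two of the standard handlers that can be used instead of ```'strict'```:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'errors_demo.txt')  # hypothetical file name

    # b'\xe9' is 'é' in latin1 but is not valid UTF-8 on its own.
    with open(path, mode='wb') as file:
        file.write(b'caf\xe9')

    # 'strict' raises an exception when the invalid byte is decoded.
    try:
        with open(path, mode='r', encoding='utf-8', errors='strict') as file:
            file.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # 'replace' substitutes the replacement character U+FFFD.
    with open(path, mode='r', encoding='utf-8', errors='replace') as file:
        replaced = file.read()

    # 'ignore' silently drops the invalid byte.
    with open(path, mode='r', encoding='utf-8', errors='ignore') as file:
        ignored = file.read()

print(strict_failed)   # True
print(repr(replaced))  # 'caf\ufffd'
print(repr(ignored))   # 'caf'
```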

A file can be opened:

In [37]:
file = open('text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n')

In text mode, the ```open``` function returns an instance of the ```TextIOWrapper``` class, here assigned to the instance name ```file```. This can be seen when the datatype of ```file``` is checked:

In [30]:
type(file)

_io.TextIOWrapper

Data can be read from the ```TextIOWrapper``` instance, for example using the ```readlines``` method, which returns a list of strings:

In [20]:
file.readlines()

['Baa, baa, black sheep,\r\n',
 'Have you any wool?\r\n',
 'Yes, sir, yes, sir,\r\n',
 'Three bags full.\r\n',
 '\r\n',
 'One for my master,\r\n',
 'One for my dame,\r\n',
 'And one for the little boy\r\n',
 'Who lives down the lane.']

After working with a file, it should be closed. The file can be closed using the ```TextIOWrapper``` close method:

In [21]:
file.close()

The ```print_identifier_group``` function from the custom ```helper_module``` can be imported to view the identifiers in more detail:

In [23]:
from helper_module import print_identifier_group

The ```TextIOWrapper``` class has the standard ```object``` based datamodel attributes and identifiers seen before. There are some additions such as ```__enter__``` (*dunder enter*) and ```__exit__``` (*dunder exit*):

In [27]:
print_identifier_group(file, kind='datamodel_attribute')

['__dict__', '__doc__']


In [28]:
print_identifier_group(file, kind='datamodel_method')

['__class__', '__del__', '__delattr__', '__dir__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']


The ```TextIOWrapper``` class has a number of attributes, most of these correspond to the input arguments provided when initialising the instance:

In [25]:
print_identifier_group(file, kind='attribute')

['_CHUNK_SIZE', '_finalizing', 'buffer', 'closed', 'encoding', 'errors', 'line_buffering', 'mode', 'name', 'newlines', 'write_through']


The ```TextIOWrapper``` method ```readable``` will check whether a file is readable returning a boolean. The method ```read``` will read the entire file as a Unicode string, the method ```readline``` will read an individual line as a string and then advance, while the method ```readlines``` will read every line returning a list of Unicode strings corresponding to each line.

The methods ```writable```, ```write``` and ```writelines``` are the write counterparts.

The methods ```seekable``` and ```seek``` relate to the cursor position.

In [40]:
print_identifier_group(file, kind='method')

['_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', 'close', 'detach', 'fileno', 'flush', 'isatty', 'read', 'readable', 'readline', 'readlines', 'reconfigure', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']


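The datamodel methods ```__enter__``` (*dunder enter*) and ```__exit__``` (*dunder exit*) allow the file to be used in a ```with``` code block. The file is opened by ```open```, ```__enter__``` supplies the instance assigned after ```as``` when the code block begins, and ```__exit__``` closes the file when the code block is exited, even if an error occurs: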

In [39]:
with open('text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    print(file.name)
    print(file.mode)
    print(file.encoding)
    print(file.errors)
    print('readable: ', file.readable())
    print('writeable: ', file.writable())
    print('seekable: ', file.seekable())

text.txt
r
utf-8
strict
readable:  True
writeable:  False
seekable:  True
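
That the ```with``` code block closes the file can be checked using the ```closed``` attribute. The sketch below also shows a roughly equivalent ```try``` and ```finally``` form for comparison. It uses a temporary folder, and the file name ```with_demo.txt``` is hypothetical:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'with_demo.txt')  # hypothetical file name
    with open(path, mode='w', encoding='utf-8') as file:
        file.write('hello\n')

    # Manual form: the file has to be closed explicitly.
    manual = open(path, mode='r', encoding='utf-8')
    try:
        first = manual.read()
    finally:
        manual.close()

    # with form: the file is closed when the code block is exited.
    with open(path, mode='r', encoding='utf-8') as managed:
        closed_inside = managed.closed
        second = managed.read()
    closed_outside = managed.closed

print(closed_inside)   # False
print(closed_outside)  # True
print(manual.closed)   # True
```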


The method ```read``` can be used to read the entire contents of the text file as a single string, notice that this includes carriage returns and new line escape characters:

In [15]:
with open('text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data1 = file.read()

In [16]:
data1

'Baa, baa, black sheep,\r\nHave you any wool?\r\nYes, sir, yes, sir,\r\nThree bags full.\r\n\r\nOne for my master,\r\nOne for my dame,\r\nAnd one for the little boy\r\nWho lives down the lane.'

When printed, this gives a similar display to the file opened in Notepad++:

In [17]:
print(data1)

Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full.

One for my master,
One for my dame,
And one for the little boy
Who lives down the lane.


The method ```readline``` will only read a single line:

In [18]:
with open('text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data2 = file.readline()

Note that this includes the carriage return and new line escape characters:

In [19]:
data2

'Baa, baa, black sheep,\r\n'

These can be stripped using the string method ```strip```:

In [20]:
data2.strip()

'Baa, baa, black sheep,'
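
Note that ```strip``` with no arguments removes all leading and trailing whitespace, not only the line ending. When other whitespace should be preserved, ```rstrip('\r\n')``` is one alternative. A small sketch with a made-up line:

```python
# A made-up line with a leading tab and a trailing space before the line ending.
line = '\tBaa, baa, black sheep, \r\n'

stripped = line.strip()           # removes whitespace from both ends
right_only = line.rstrip('\r\n')  # removes only trailing '\r' and '\n'

print(repr(stripped))    # 'Baa, baa, black sheep,'
print(repr(right_only))  # '\tBaa, baa, black sheep, '
```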

The method ```readlines``` will instead output a list of strings:

In [35]:
with open('text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data3 = file.readlines()

Notice the square brackets ```[]``` enclosing the list and comma delimiter between each line. Each line also includes the carriage return and newline character:

In [36]:
data3

['Baa, baa, black sheep,\r\n',
 'Have you any wool?\r\n',
 'Yes, sir, yes, sir,\r\n',
 'Three bags full.\r\n',
 '\r\n',
 'One for my master,\r\n',
 'One for my dame,\r\n',
 'And one for the little boy\r\n',
 'Who lives down the lane.']

These can be removed using a list comprehension:

In [23]:
data4 = [line.strip() for line in data3]

In [24]:
data4

['Baa, baa, black sheep,',
 'Have you any wool?',
 'Yes, sir, yes, sir,',
 'Three bags full.',
 '',
 'One for my master,',
 'One for my dame,',
 'And one for the little boy',
 'Who lives down the lane.']
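
Because the ```TextIOWrapper``` class defines ```__iter__``` and ```__next__```, the file can also be iterated over directly, one line at a time, without first building the list returned by ```readlines```. The following sketch assumes a temporary folder and the hypothetical file name ```iterate_demo.txt```:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'iterate_demo.txt')  # hypothetical file name

    # With newline='\r\n' each '\n' written is stored as '\r\n'.
    with open(path, mode='w', encoding='utf-8', newline='\r\n') as file:
        file.write('Baa, baa, black sheep,\nHave you any wool?\n')

    # Iterating over the file yields one line per iteration.
    with open(path, mode='r', encoding='utf-8', newline='\r\n') as file:
        lines = [line.rstrip('\r\n') for line in file]

print(lines)  # ['Baa, baa, black sheep,', 'Have you any wool?']
```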

The length of the single string obtained using the method ```read``` is:

In [25]:
len(data1)

175

In [26]:
data1

'Baa, baa, black sheep,\r\nHave you any wool?\r\nYes, sir, yes, sir,\r\nThree bags full.\r\n\r\nOne for my master,\r\nOne for my dame,\r\nAnd one for the little boy\r\nWho lives down the lane.'

The characters from index ```5``` onwards (that is, from the 6th character, as indexing is zero-based) can be seen by slicing the string:

In [31]:
data1[5:]

'baa, black sheep,\r\nHave you any wool?\r\nYes, sir, yes, sir,\r\nThree bags full.\r\n\r\nOne for my master,\r\nOne for my dame,\r\nAnd one for the little boy\r\nWho lives down the lane.'

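The cursor can be moved to a position in the file using the method ```seek```. For a text file, only an offset of ```0``` or a value returned by the method ```tell``` is guaranteed to be valid. Here ```seek(5)``` behaves like a zero-based character index because every character in this file is encoded as a single byte: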

In [27]:
with open('text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.seek(5)
    data5 = file.read()

This gives a similar result to slicing of the string as seen above:

In [28]:
data5

'baa, black sheep,\r\nHave you any wool?\r\nYes, sir, yes, sir,\r\nThree bags full.\r\n\r\nOne for my master,\r\nOne for my dame,\r\nAnd one for the little boy\r\nWho lives down the lane.'

And the length of the string is ```175 - 5```:

In [29]:
len(data5)

170
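
When a file contains characters that occupy more than one byte, a character index and a file position no longer agree, so it is safer to record a position with ```tell``` and return to it with ```seek```. A sketch under these assumptions, using a temporary folder and the hypothetical file name ```seek_demo.txt```:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'seek_demo.txt')  # hypothetical file name

    # 'é' occupies two bytes in UTF-8, so 5 characters are stored as 6 bytes.
    with open(path, mode='w', encoding='utf-8') as file:
        file.write('héllo')

    with open(path, mode='r', encoding='utf-8') as file:
        first_two = file.read(2)   # read two characters
        position = file.tell()     # remember where the cursor is
        rest = file.read()         # read to the end of the file
        file.seek(position)        # return to the remembered position
        rest_again = file.read()
        file.seek(0)               # return to the start of the file
        whole = file.read()

print(repr(first_two))   # 'hé'
print(repr(rest))        # 'llo'
print(repr(rest_again))  # 'llo'
print(repr(whole))       # 'héllo'
```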

The ```write``` method can be used to write text to a file. Note that this file is opened with ```mode='w'```, which creates the file or replaces its existing content:

In [45]:
with open('text2.txt', mode='w', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.write('Hello World!\r\nBye World!\r\n')

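This file can then be read using the ```read``` method. Note that this file is opened with ```mode='r'```. Because the file was written with ```newline='\r\n'```, each ```'\n'``` in the written string was itself translated to ```'\r\n'```, so every ```'\r\n'``` supplied appears in the file as ```'\r\r\n'```: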

In [47]:
with open('text2.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data6 = file.read()

In [50]:
data6

'Hello World!\r\r\nBye World!\r\r\n'
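
The doubled carriage return can be reproduced and avoided with a short sketch. Two possible fixes are shown: writing only ```'\n'``` and letting ```newline='\r\n'``` perform the translation, or using ```newline=''``` so that the string is written without translation. The file name ```crlf_demo.txt``` is hypothetical:

```python
import os
import tempfile

def read_bytes(path):
    """Return the raw bytes stored in the file."""
    with open(path, mode='rb') as file:
        return file.read()

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'crlf_demo.txt')  # hypothetical file name

    # '\r\n' in the string plus newline='\r\n' doubles the carriage return.
    with open(path, mode='w', encoding='utf-8', newline='\r\n') as file:
        file.write('Hello World!\r\n')
    doubled = read_bytes(path)

    # Fix 1: write '\n' only and let newline='\r\n' translate it.
    with open(path, mode='w', encoding='utf-8', newline='\r\n') as file:
        file.write('Hello World!\n')
    translated = read_bytes(path)

    # Fix 2: newline='' writes the string without any translation.
    with open(path, mode='w', encoding='utf-8', newline='') as file:
        file.write('Hello World!\r\n')
    verbatim = read_bytes(path)

print(doubled)     # b'Hello World!\r\r\n'
print(translated)  # b'Hello World!\r\n'
print(verbatim)    # b'Hello World!\r\n'
```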

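The ```write``` method can be called more than once, each call continuing from where the previous one finished:

In [41]: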
with open('text3.txt', mode='w', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.write('Hello World!\r\n')
    file.write('Bye World!\r\n')

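The ```writelines``` method writes each string in a list in turn and does not add line endings of its own:

In [42]: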
with open('text4.txt', mode='w', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.writelines(['Hello World!\r\n', 'Bye World!\r\n'])
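
Two details of these methods can be checked with a sketch: ```write``` returns the number of characters written, and ```writelines``` writes the strings exactly as supplied. A temporary folder and the hypothetical file name ```write_demo.txt``` are assumed, and ```newline=''``` is used so that no translation takes place:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'write_demo.txt')  # hypothetical file name

    with open(path, mode='w', encoding='utf-8', newline='') as file:
        # write returns the number of characters written.
        count = file.write('Hello World!\n')
        # writelines does not insert any separator between the strings.
        file.writelines(['Bye', ' ', 'World!'])

    with open(path, mode='r', encoding='utf-8', newline='') as file:
        content = file.read()

print(count)          # 13
print(repr(content))  # 'Hello World!\nBye World!'
```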


In [24]:
? print_identifier_group

Signature:
print_identifier_group(
    obj,
    kind='all',
    second=<class 'object'>,
    show_unique_identifiers=False,
    show_only_intersection_identifiers=False,
)
Docstring:
Group identifiers from an obj into categories defined by the parameter kind and print. kind can have the possible values: 
'all', 'datamodel_method', 'datamodel_attribute', 'error_class', 'class', 'method', 'constant', or 'attribute'.

second class is an optional second class for comparison, normally a parent class. 

show_unique_identifiers can be used to show only identifiers that are unique to the first class.

show_only_intersection_identifiers can 

file.close()

A number of identifiers from the file object can be viewed:

In [6]:
print(dir(file), end=' ')

['_CHUNK_SIZE', '__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_finalizing', 'buffer', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'line_buffering', 'mode', 'name', 'newlines', 'read', 'readable', 'readline', 'readlines', 'reconfigure', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'write_through', 'writelines'] 

The identifiers methods and attributes...

In [3]:
print(dir(open), end=' ')

['__annotations__', '__builtins__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__getstate__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__wrapped__'] 

A file should always be closed after working with it. The method ```close``` can be used to close the file: 

A file is normally accessed using a ```with``` code block which opens the file and automatically closes it as the code block is exited: