# IO, CSV, JSON, Pickle and Shelve Modules

The input output module ```io``` is used for reading and writing data to a file. The comma separated values module ```csv``` is used in conjunction with ```io``` for reading and writing data to a ```CSV``` file and other basic formats used for spreadsheets. The ```json``` module is used for converting between a JSON ```str``` and Python objects such as a ```dict```. The ```pickle``` module is used for serialising data and the ```shelve``` module is used for storing serialised data in a database.

## Importing Modules

The modules need to be imported before identifiers from them can be used:

In [1]:
import io
import csv
import json
import pickle
import shelve

## Categorize_Identifiers Module

This notebook will use the functions ```dir2```, ```variables``` and ```view``` from the custom module ```categorize_identifiers```, which is found in the same directory as this notebook file. ```dir2``` is a variant of ```dir``` that groups identifiers into a ```dict``` under categories, and ```variables``` is an IPython-based variable inspector. ```view``` is used to view a ```Collection``` in more detail:

In [2]:
from categorize_identifiers import dir2, variables, view

## Text Files (.txt)

A text file called ```text.txt``` can be created in the same folder as the Interactive Python Notebook File:

```
Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full.

One for my master,
One for my dame,
And one for the little boy
Who lives down the lane.
```

Text files can be viewed in Notepad++ with View → Show Symbol → Show All Characters:

<img src='./images/img_001.png' alt='img_001' width='500'/>

Notice that there is a ```CRLF``` at the end of each line, which instructs the editor to move on to the next row. ```CRLF``` stands for carriage return and line feed.

### Open Function

The ```open``` function in the ```io``` module is used for opening text and binary files. Its docstring can be viewed:

In [3]:
io.open?

Signature:
io.open(
    file,
    mode='r',
    buffering=-1,
    encoding=None,
    errors=None,
    newline=None,
    closefd=True,
    opener=None,
)
Docstring:
Open file and return a stream.  Raise OSError upon failure.

file is either a text or byte string giving the name (and the path
if the file isn't in the current working directory) of the file to
be opened or an integer file descriptor of the file to be
wrapped. (If a file descriptor is given, it is closed when the
returne

Python also includes this function in the builtins namespace, so it can be used without the ```io``` prefix:

In [4]:
open?

Signature:
open(
    file,
    mode='r',
    buffering=-1,
    encoding=None,
    errors=None,
    newline=None,
    closefd=True,
    opener=None,
)
Docstring:
Open file and return a stream.  Raise OSError upon failure.

file is either a text or byte string giving the name (and the path
if the file isn't in the current working directory) of the file to
be opened or an integer file descriptor of the file to be
wrapped. (If a file descriptor is given, it is closed when the
returned I/O object is closed

The ```open``` function requires a file which can be specified directly when it is in the same folder as the interactive Python notebook file (or Python script file).

If the file for example ```text.txt``` is in the same folder as the notebook it can be specified using:

```python
'text.txt'
```

Or the folder can be specified on Windows using ```.\```. On Windows the ```\``` is the default file separator. However, recall that in Python ```\``` is used to begin an escape sequence, so a raw ```str``` is required, for example:

```python
r'.\text.txt'
```

Linux uses ```/``` as a file separator. This is also recognised as an alternative file separator in Windows, so the file can be selected using:

```python
'./text.txt'
```

If the file is in a parent folder this can be specified using ```../``` for example:

```python
'../text.txt'
```

If present in a subfolder on the same level as the notebook this can be specified using:

```python
'./files/text.txt'
```

A ```files_input``` and ```files_output``` subfolder will be used in this tutorial.
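The relative paths above can also be assembled from their parts. The following is a minimal sketch using ```pathlib``` from the standard library, which is not otherwise used in this notebook; the name ```input_file``` is only illustrative and the file does not need to exist for the path to be built:

```python
from pathlib import Path

# Build the relative path from its parts. Path inserts the separator
# that is appropriate for the operating system.
input_file = Path('.') / 'files_input' / 'text.txt'

print(input_file.name)         # final component of the path
print(input_file.parent.name)  # name of the folder containing the file
print(input_file.as_posix())   # the path written with forward slashes
```

A ```Path``` instance can be supplied to ```open``` in place of a ```str```.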

The ```mode``` keyword input argument can be specified using a single letter:

|mode|definition|
|---|---|
|'r'|open an existing file and read existing content|
|'w'|create a new file, or open an existing file and write over existing content|
|'a'|create a new file, or open an existing file and append new content|
|'x'|create a new file and write new content, raising an error if the file already exists|
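The difference between the modes can be sketched with a throwaway file. This example assumes a temporary folder created with ```tempfile``` and a hypothetical file name ```modes.txt```, so that none of the tutorial files are touched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'modes.txt')

    # 'x' creates the file and fails if it already exists
    with open(path, mode='x', encoding='utf-8') as file:
        file.write('first')

    try:
        with open(path, mode='x', encoding='utf-8') as file:
            file.write('never written')
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # 'a' keeps the existing content and adds to the end
    with open(path, mode='a', encoding='utf-8') as file:
        file.write(' second')
    with open(path, mode='r', encoding='utf-8') as file:
        after_append = file.read()

    # 'w' discards the existing content before writing
    with open(path, mode='w', encoding='utf-8') as file:
        file.write('third')
    with open(path, mode='r', encoding='utf-8') as file:
        after_write = file.read()

print(exclusive_failed)
print(after_append)
print(after_write)
```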

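The ```encoding``` keyword argument is used to specify the encoding, which was discussed in detail when the ```bytes``` class was examined in a previous notebook. When ```encoding``` is left as ```None```, the preferred encoding of the locale is used. This is ```'utf-8'``` on most Linux systems but is often a legacy code page on Windows, so it is safest to specify ```encoding='utf-8'``` explicitly. If the data was saved by a Microsoft product, ```'utf-8-sig'``` may be required in order to remove an unwanted byte order marker (BOM). To recap: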

|encoding|bytes per character|bits per character|byte order|byte order marker BOM|
|---|---|---|---|---|
|'utf-8'|1, 2, 3, 4|8, 16, 24, 32|big endian| |
|'utf-8-sig'|1, 2, 3, 4|8, 16, 24, 32|big endian|efbbbf|
|'utf-32'|4|32|little endian|fffe0000|
|'utf-32-le'|4|32|little endian| |
|'utf-32-be'|4|32|big endian| |
|'utf-16'|2|16|little endian|fffe|
|'utf-16-le'|2|16|little endian| |
|'utf-16-be'|2|16|big endian| |
|'latin1'|1|8| ||
|'ascii'|1|8| ||
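The effect of the BOM can be seen without a file by encoding and decoding a short ```str``` directly. This is a small sketch using the ```str``` method ```encode``` and the ```bytes``` method ```decode```; the variable names are only illustrative:

```python
text = 'x,y'

plain = text.encode('utf-8')         # no BOM
with_bom = text.encode('utf-8-sig')  # BOM followed by the same bytes

print(plain.hex())
print(with_bom.hex())

# Decoding with 'utf-8' keeps the BOM as the character '\ufeff'
decoded_wrong = with_bom.decode('utf-8')

# Decoding with 'utf-8-sig' removes the BOM
decoded_right = with_bom.decode('utf-8-sig')

print(repr(decoded_wrong))
print(repr(decoded_right))
```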

The ```newline``` keyword input argument can be used to specify the character sequence that is used to represent a new line.

On Linux this is normally just the line feed escape character ```'\n'```.

On Windows two escape characters, carriage return and line feed, are used: ```'\r\n'```.
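When reading, the value of ```newline``` decides whether line endings are translated. The sketch below assumes a temporary folder and a hypothetical file ```newlines.txt``` that is written as raw bytes, so it is known to contain Windows line endings on any operating system:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'newlines.txt')

    # Write raw bytes so the line endings are exactly '\r\n'
    with open(path, mode='wb') as file:
        file.write(b'one\r\ntwo\r\n')

    # newline=None (the default): every line ending is translated to '\n'
    with open(path, mode='r', encoding='utf-8', newline=None) as file:
        translated = file.read()

    # newline='\r\n': lines end only at '\r\n' and nothing is translated
    with open(path, mode='r', encoding='utf-8', newline='\r\n') as file:
        untranslated = file.readlines()

print(repr(translated))
print(untranslated)
```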

The ```errors``` keyword input argument controls how errors, normally due to encoding issues, are handled and is set to ```'strict'``` by default.
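The same error handlers are accepted by the ```bytes``` method ```decode```, which makes them easy to compare without a file. The following sketch decodes a byte string that is valid ```'latin1'``` but not valid ```'utf-8'```; the variable names are only illustrative:

```python
# 'café' encoded with latin1 ends in the single byte e9,
# which is not a complete UTF-8 sequence
raw = 'café'.encode('latin1')

# 'strict' raises an error on the invalid byte
try:
    raw.decode('utf-8', errors='strict')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes the replacement character '\ufffd'
replaced = raw.decode('utf-8', errors='replace')

# 'ignore' silently drops the invalid byte
ignored = raw.decode('utf-8', errors='ignore')

print(strict_failed)
print(repr(replaced))
print(repr(ignored))
```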

A file can be opened:

In [5]:
file = open('./files_input/text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n')

In text mode, the ```open``` function returns an instance of the ```TextIOWrapper``` class, which is assigned here to the instance name ```file```. This can be seen when the datatype of ```file``` is checked:

In [6]:
type(file)

_io.TextIOWrapper

The ```_io``` indicates that this class comes from the internal module that ```io``` is built upon; the underscore prefix marks it as internal. The keyword arguments ```encoding```, ```errors``` and ```newline``` supplied to ```open``` correspond to the keyword arguments of the initialisation method of this class:

In [7]:
io.TextIOWrapper?

Init signature:
io.TextIOWrapper(
    buffer,
    encoding=None,
    errors=None,
    newline=None,
    line_buffering=False,
    write_through=False,
)
Docstring:
Character and line based layer over a BufferedIOBase object, buffer.

encoding gives the name of the encoding that the stream will be
decoded or encoded with. It defaults to locale.getencoding().

errors determines the strictness of encoding and decoding (see
help(codecs.Codec) or the documentation for codecs.register) and
defaults to "strict".

newline controls how line endings are handled. It can be None, '',
'\n', '\r', and '\

Data can be read from the ```TextIOWrapper``` instance, for example using the file's ```readlines``` method, which returns a list of strings:

In [8]:
file.readlines()

['Baa, baa, black sheep,\r\n',
 'Have you any wool?\r\n',
 'Yes, sir, yes, sir,\r\n',
 'Three bags full.\r\n',
 '\r\n',
 'One for my master,\r\n',
 'One for my dame,\r\n',
 'And one for the little boy\r\n',
 'Who lives down the lane.']

After working with a file, it should be closed. The file can be closed using the ```TextIOWrapper``` close method:

In [9]:
file.close()

The ```TextIOWrapper``` class has a number of attributes, most of these correspond to the input arguments provided when initialising the instance.

The ```TextIOWrapper``` method ```readable``` will check whether a file is readable returning a boolean. The method ```read``` will read the entire file as a Unicode string, the method ```readline``` will read an individual line as a string and then advance, while the method ```readlines``` will read every line returning a list of Unicode strings corresponding to each line.

The methods ```writable```, ```write``` and ```writelines``` are the write counterparts.

The methods ```seekable``` and ```seek``` relate to the cursor position.

The ```TextIOWrapper``` class has the additional datamodel methods ```__enter__``` (*dunder enter*) and ```__exit__``` (*dunder exit*) which are used when accessing the file with a ```with``` code block: 

In [10]:
dir2(file, object, unique_only=True)

{'attribute': ['buffer',
               'closed',
               'encoding',
               'errors',
               'line_buffering',
               'mode',
               'name',
               'newlines',
               'write_through'],
 'method': ['close',
            'detach',
            'fileno',
            'flush',
            'isatty',
            'read',
            'readable',
            'readline',
            'readlines',
            'reconfigure',
            'seek',
            'seekable',
            'tell',
            'truncate',
            'writable',
            'write',
            'writelines'],
 'datamodel_attribute': ['__dict__', '__module__'],
 'datamodel_method': ['__del__',
                      '__enter__',
                      '__exit__',
                      '__iter__',
                      '__next__'],
 'internal_attribute': ['_CHUNK_SIZE', '_finalizing'],
 'internal_method': ['_checkClosed',
                     '_checkReadable',
                 

### With Code Block

A ```with``` code block calls the datamodel method ```__enter__``` (*dunder enter*) when the code block begins, which returns the open file, and the datamodel method ```__exit__``` (*dunder exit*) when the code block is exited, which closes the file even if an error occurred inside the code block:

In [11]:
with open('./files_input/text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    print(file.name)
    print(file.mode)
    print(file.encoding)
    print(file.errors)
    print('readable: ', file.readable())
    print('writeable: ', file.writable())
    print('seekable: ', file.seekable())

./files_input/text.txt
r
utf-8
strict
readable:  True
writeable:  False
seekable:  True
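That the file is closed automatically can be checked with the attribute ```closed```. The sketch below assumes a temporary folder and a hypothetical file ```demo.txt```, and also shows a roughly equivalent ```try``` and ```finally``` form:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')

    with open(path, mode='w', encoding='utf-8') as file:
        file.write('Hello')
        closed_inside = file.closed  # still open inside the code block

    closed_after = file.closed       # closed once the code block has exited

    # Roughly what the with code block does behind the scenes
    file2 = open(path, mode='r', encoding='utf-8')
    try:
        content = file2.read()
    finally:
        file2.close()

print(closed_inside, closed_after, content)
```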


### Read

The ```TextIOWrapper``` method ```read``` can be used to read the entire contents of the text file as a single string, notice that this includes carriage returns and new line escape characters:

In [12]:
with open('./files_input/text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data1 = file.read()

In [13]:
data1

'Baa, baa, black sheep,\r\nHave you any wool?\r\nYes, sir, yes, sir,\r\nThree bags full.\r\n\r\nOne for my master,\r\nOne for my dame,\r\nAnd one for the little boy\r\nWho lives down the lane.'

When printed this gives a similar display to the file opened in Notepad:

In [14]:
print(data1)

Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full.

One for my master,
One for my dame,
And one for the little boy
Who lives down the lane.


The ```TextIOWrapper``` method ```readline``` will only read a single line:

In [15]:
with open('./files_input/text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data2 = file.readline()

Note that the end of this string includes the carriage return and new line escape characters:

In [16]:
data2

'Baa, baa, black sheep,\r\n'

These whitespace characters can be stripped using the string method ```strip```:

In [17]:
data2.strip()

'Baa, baa, black sheep,'

The ```TextIOWrapper``` method ```readlines``` will instead output a list of strings:

In [18]:
with open('./files_input/text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data3 = file.readlines()

Notice the square brackets ```[]``` enclosing the list and comma delimiter between each line. Each line also includes the carriage return and newline character:

In [19]:
data3

['Baa, baa, black sheep,\r\n',
 'Have you any wool?\r\n',
 'Yes, sir, yes, sir,\r\n',
 'Three bags full.\r\n',
 '\r\n',
 'One for my master,\r\n',
 'One for my dame,\r\n',
 'And one for the little boy\r\n',
 'Who lives down the lane.']

These can be removed using a list comprehension:

In [20]:
data4 = [line.strip() for line in data3]

In [21]:
data4

['Baa, baa, black sheep,',
 'Have you any wool?',
 'Yes, sir, yes, sir,',
 'Three bags full.',
 '',
 'One for my master,',
 'One for my dame,',
 'And one for the little boy',
 'Who lives down the lane.']
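Because the ```TextIOWrapper``` class defines ```__iter__``` and ```__next__```, the file can also be looped over directly, one line at a time, without first creating the full list. This is a minimal sketch that assumes a temporary folder and a hypothetical two-line file ```rhyme.txt```:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'rhyme.txt')

    with open(path, mode='w', encoding='utf-8', newline='\r\n') as file:
        file.write('Baa, baa, black sheep,\nHave you any wool?\n')

    # Iterating over the file yields one line per loop
    with open(path, mode='r', encoding='utf-8', newline='\r\n') as file:
        lines = [line.rstrip('\r\n') for line in file]

print(lines)
```

Using ```rstrip('\r\n')``` removes only the line ending, whereas ```strip``` also removes any leading and trailing spaces.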

### Seek

The length of the single string obtained using the ```TextIOWrapper``` method ```read``` is:

In [22]:
len(data1)

175

In [23]:
data1

'Baa, baa, black sheep,\r\nHave you any wool?\r\nYes, sir, yes, sir,\r\nThree bags full.\r\n\r\nOne for my master,\r\nOne for my dame,\r\nAnd one for the little boy\r\nWho lives down the lane.'

The characters from index ```5``` onwards (the 6th character and onwards) can be seen by slicing the string:

In [24]:
data1[5:]

'baa, black sheep,\r\nHave you any wool?\r\nYes, sir, yes, sir,\r\nThree bags full.\r\n\r\nOne for my master,\r\nOne for my dame,\r\nAnd one for the little boy\r\nWho lives down the lane.'

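Each character in this file also has a zero-based numeric index at which the cursor can be placed using the ```TextIOWrapper``` method ```seek```. Strictly, in text mode ```seek``` is only guaranteed to work with ```0``` or a position returned by the method ```tell```; the character index works here because every character in this file is encoded as a single byte and the line endings are not translated: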

In [25]:
with open('./files_input/text.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.seek(5)
    data5 = file.read()

This gives a similar result to slicing of the string as seen above:

In [26]:
data5

'baa, black sheep,\r\nHave you any wool?\r\nYes, sir, yes, sir,\r\nThree bags full.\r\n\r\nOne for my master,\r\nOne for my dame,\r\nAnd one for the little boy\r\nWho lives down the lane.'

And the length of the string is ```175 - 5```:

In [27]:
len(data5)

170
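The method ```tell``` returns the current cursor position, which can later be handed back to ```seek```. The following sketch assumes a temporary folder and a hypothetical file ```rhyme.txt```; the position is treated as an opaque value and only used to return to the same place:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'rhyme.txt')

    with open(path, mode='w', encoding='utf-8', newline='\r\n') as file:
        file.write('Baa, baa, black sheep,\nHave you any wool?')

    with open(path, mode='r', encoding='utf-8', newline='\r\n') as file:
        first = file.readline()
        position = file.tell()         # remember where the second line starts
        second = file.readline()

        file.seek(position)            # return to the remembered position
        second_again = file.readline()

        file.seek(0)                   # return to the start of the file
        first_again = file.readline()

print(repr(first_again))
print(repr(second_again))
```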

### Write

The ```TextIOWrapper``` method ```write``` can be used to write text to a file, note that this file is opened with ```mode='w'```. 

On Linux the keyword input argument should be ```newline='\n'``` and the ```\n``` should be incorporated in a string for a new line.

On Windows the keyword input argument should be ```newline='\r\n'```; however, only ```\n``` should be incorporated in a string for a new line. Each ```\n``` used in the ```write``` method will be converted into ```\r\n```.

In [28]:
with open('./files_output/text2.txt', mode='w', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.write('Hello World!\nBye World!')

This file can then be read using the ```TextIOWrapper``` method ```read```, note that this file is opened with ```mode='r'``` and ```newline='\r\n'```:

In [29]:
with open('./files_output/text2.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data6 = file.read()

In [30]:
data6

'Hello World!\r\nBye World!'

Be careful not to use ```\r\n``` within the ```TextIOWrapper``` method ```write``` as the ```\n``` will be converted into ```\r\n``` and the result will be ```\r\r\n``` which is wrong:

In [31]:
with open('./files_output/text3.txt', mode='w', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.write('Hello World!\r\nBye World!')

with open('./files_output/text3.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data7 = file.read()
    
data7

'Hello World!\r\r\nBye World!'

The method ```writelines``` can be used to write a list of strings to a file. Note once again that this file is opened with ```mode='w'```. 

On Linux the keyword input argument should be ```newline='\n'``` and the ```\n``` should be at the end of each string.

On Windows the keyword input argument should be ```newline='\r\n'``` and each string should still end using ```\n```. Each ```\n``` used in the ```writelines``` method will be converted into ```\r\n```.

In [32]:
lines = ['Hello World!', 'Bye World!']

with open('./files_output/text4.txt', mode='w', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.writelines([line + '\n' for line in lines])

with open('./files_output/text4.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data8 = file.read()
    
data8

'Hello World!\r\nBye World!\r\n'

### Append

When ```mode='w'``` any content in an existing file is removed:

In [33]:
with open('./files_output/text4.txt', mode='w', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.write('J')

with open('./files_output/text4.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data9 = file.read()
    
data9

'J'

When ```mode='a'``` any existing content in the file is kept and the new content is appended to the end:

In [34]:
with open('./files_output/text2.txt', mode='a', encoding='utf-8', errors='strict', newline='\r\n') as file:
    file.write('Appended')

with open('./files_output/text2.txt', mode='r', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data10 = file.read()
    
data10

'Hello World!\r\nBye World!Appended'

Three variations of ```mode``` were seen: ```'r'```, ```'w'``` and ```'a'```. These have the aliases ```'rt'```, ```'wt'``` and ```'at'```, where the ```t``` indicates that the function is being used on a text file. A text ```.txt``` file is read into a Unicode string using the specified encoding, here ```'utf-8'```.

## Bin Files (.bin)

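```'rb'```, ```'wb'``` and ```'ab'``` instead indicate that this function is being used on a binary file, and a binary ```.bin``` file uses a byte string. The ```encoding```, ```errors``` and ```newline``` keyword arguments are not used, as raw ```bytes``` are read and written. In binary mode, ```open``` returns a buffered binary file object (for example a ```BufferedReader``` or ```BufferedWriter```) instead of a ```TextIOWrapper```, and its ```read``` and ```write``` methods work with byte strings: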

In [35]:
with open('./files_output/text5.bin', mode='wb') as file:
    file.write(b'\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x0d\x0a\x48\x65\x6c\x6c\x6f')

with open('./files_output/text5.bin', mode='rb') as file:
    data11 = file.read()
    
data11

b'Hello World!\r\nHello'

The binary file can be read into text mode if the correct encoding is supplied:

In [36]:
with open('./files_output/text5.bin', mode='rt', encoding='utf-8', errors='strict', newline='\r\n') as file:
    data12 = file.readlines()
    
data12

['Hello World!\r\n', 'Hello']
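The class returned by ```open``` in binary mode can be checked, and the ```bytes``` instance that is read can be decoded manually. This is a minimal sketch that assumes a temporary folder and a hypothetical file ```demo.bin```:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.bin')

    with open(path, mode='wb') as file:
        writer_type = type(file).__name__
        file.write(b'Hello World!\r\nHello')

    with open(path, mode='rb') as file:
        reader_type = type(file).__name__
        raw = file.read()

# Decoding the bytes gives the same str that text mode would return
text = raw.decode('utf-8')

print(writer_type, reader_type)
print(repr(text))
```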

## Comma Separated Values Files (.csv)

Recall that a text file uses a ```CRLF``` at the end of each line instructing to move onto the next row. Notice all the data looks like a column:

<img src='./images/img_002.png' alt='img_002' width='500'/>

A comma separated values file uses a comma to move onto the next column:

<img src='./images/img_003.png' alt='img_003' width='500'/>

When a csv file is opened in a spreadsheet editor such as Excel or OnlyOffice Desktop Editors SpreadSheet, the commas and line endings are used to construct a grid:

<img src='./images/img_004.png' alt='img_004' width='500'/>

### Open Function

The ```readlines``` method of the ```TextIOWrapper``` instance can be used to read the data as a list of strings where each string corresponds to a row of text:

In [37]:
with open('./files_input/sheet1.csv', mode='r', encoding='utf-8', newline='\r\n') as file:
    data = file.readlines()

data

['x,y\r\n', '0,0\r\n', '1,2\r\n', '2,4\r\n', '3,6\r\n', '4,8\r\n', '5,10\r\n']

### Byte Order Marker (BOM)

If the 1st entry has the prefix ```\ufeff``` then the file is saved using UTF-8 with BOM instead of UTF-8:

In [38]:
with open('./files_input/sheet2.csv', mode='r', encoding='utf-8', newline='\r\n') as file:
    data = file.readlines()

data

['\ufeffx,y\r\n',
 '0,0\r\n',
 '1,2\r\n',
 '2,4\r\n',
 '3,6\r\n',
 '4,8\r\n',
 '5,10\r\n']

To handle the BOM, the encoding can be changed from ```'utf-8'``` to ```'utf-8-sig'```:

In [39]:
with open('./files_input/sheet2.csv', mode='r', encoding='utf-8-sig', newline='\r\n') as file:
    data = file.readlines()

data

['x,y\r\n', '0,0\r\n', '1,2\r\n', '2,4\r\n', '3,6\r\n', '4,8\r\n', '5,10\r\n']

### CSV Reader

The ```reader``` function of the ```csv``` module can be used within the ```with``` code block created using the ```open``` function:

In [40]:
with open('./files_input/sheet1.csv', mode='r', encoding='utf-8-sig', newline='\r\n') as file:
    csv_file = csv.reader(file, delimiter=',', quotechar='"')

This creates an instance of the ```reader``` class from the ```csv``` module:

In [41]:
type(csv_file)

_csv.reader

Its identifiers can be viewed. The datamodel identifier ```__next__``` (*dunder next*) is defined, meaning the ```reader``` class is essentially an iterator. When ```next``` is used on this iterator it returns the current row as a list, and each element in the list corresponds to a column. The attribute ```line_num``` is the number of lines read from the file so far, so it is ```1``` after the first row has been read:

In [42]:
dir2(csv_file, object, unique_only=True)

{'attribute': ['dialect', 'line_num'],
 'datamodel_attribute': ['__module__'],
 'datamodel_method': ['__iter__', '__next__']}


A ```for``` loop can be used to create a ```dict``` instance where the keys correspond to the line numbers and the value is the data for each line in the form of a ```list```:

In [43]:
csv_data = {}

In [44]:
with open('./files_input/sheet1.csv', mode='r', encoding='utf-8-sig', newline='\r\n') as file:
    csv_file = csv.reader(file, delimiter=',', quotechar='"')
    for row in csv_file:
        csv_data[f'row_{csv_file.line_num}'] = row

csv_data

{'row_1': ['x', 'y'],
 'row_2': ['0', '0'],
 'row_3': ['1', '2'],
 'row_4': ['2', '4'],
 'row_5': ['3', '6'],
 'row_6': ['4', '8'],
 'row_7': ['5', '10']}

Now the relevant data can be indexed using the ```key```, which returns the row as a ```list```:

In [45]:
csv_data['row_2']

['0', '0']

And this ```list``` can be indexed to select a column:

In [46]:
csv_data['row_2'][0]

'0'

Note that this is read in as a ```str``` instance and can be cast into an ```int``` using:

In [47]:
int(csv_data['row_2'][0])

0
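When the first row holds the column names, the ```csv``` module also provides the ```DictReader``` class, which is not used elsewhere in this notebook. The following sketch reads hypothetical sample data from an ```io.StringIO``` instance instead of a file, and casts each value to an ```int``` while grouping the data by column:

```python
import csv
import io

# In-memory stand-in for a CSV file with a header row
sample = io.StringIO('x,y\r\n0,0\r\n1,2\r\n2,4\r\n', newline='')

reader = csv.DictReader(sample)

# One list per column name taken from the header row
columns = {name: [] for name in reader.fieldnames}

for row in reader:
    # Each row is a dict mapping the column name to a str value
    for name, value in row.items():
        columns[name].append(int(value))

print(columns)
```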

### CSV Writer

The ```writer``` function of the ```csv``` module can be used within the ```with``` code block created using the ```open``` function when the file is opened in write mode:

In [48]:
# newline='' is used because the csv writer adds the '\r\n' line ending itself
with open('./files_output/sheet3.csv', mode='w', encoding='utf-8', newline='') as file:
    csv_file = csv.writer(file, delimiter=',', quotechar='"') 

This creates an instance of the ```writer``` class from the ```csv``` module:

In [49]:
type(csv_file)

_csv.writer

Its identifiers can be viewed:

In [50]:
dir2(csv_file, object, unique_only=True)

{'attribute': ['dialect'],
 'method': ['writerow', 'writerows'],
 'datamodel_attribute': ['__module__']}


The ```writerow``` method can be used to write each row individually:

In [51]:
with open('./files_output/sheet3.csv', mode='w', encoding='utf-8', newline='') as file:
    csv_file = csv.writer(file, delimiter=',', quotechar='"')
    csv_file.writerow(['x', 'y', 'z'])
    csv_file.writerow([1, 2, 3])
    csv_file.writerow([2, 4, 6])    

This can be read into a ```dict``` instance using:

In [52]:
csv_data = {}

with open('./files_output/sheet3.csv', mode='r', encoding='utf-8', newline='\r\n') as file:
    csv_file = csv.reader(file, delimiter=',', quotechar='"')
    for row in csv_file:
        csv_data[f'row_{csv_file.line_num}'] = row

csv_data

{'row_1': ['x', 'y', 'z'], 'row_2': ['1', '2', '3'], 'row_3': ['2', '4', '6']}

The ```writerows``` method can be used to write the data using a ```list``` of ```list``` instances. Each nested ```list``` instance is a row: 

In [53]:
with open('./files_output/sheet4.csv', mode='w', encoding='utf-8', newline='') as file:
    csv_file = csv.writer(file, delimiter=',', quotechar='"')
    csv_file.writerows([['w', 'x', 'y', 'z'],
                        [0, 1, 2, 3],
                        [0, 2, 4, 6]])    

In [54]:
csv_data = {}

with open('./files_output/sheet4.csv', mode='r', encoding='utf-8', newline='\r\n') as file:
    csv_file = csv.reader(file, delimiter=',', quotechar='"')
    for row in csv_file:
        csv_data[f'row_{csv_file.line_num}'] = row

csv_data

{'row_1': ['w', 'x', 'y', 'z'],
 'row_2': ['0', '1', '2', '3'],
 'row_3': ['0', '2', '4', '6']}
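The ```csv``` writer ends each row with ```'\r\n'``` by default, so any newline translation carried out by the file is applied on top of it. For this reason the ```csv``` module documentation recommends opening the file with ```newline=''```. The sketch below assumes a temporary folder and two hypothetical file names, and compares the bytes that are actually written:

```python
import csv
import os
import tempfile

rows = [['x', 'y'], [1, 2]]

with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, 'good.csv')
    bad = os.path.join(folder, 'bad.csv')

    # newline='' leaves the '\r\n' written by the csv writer untouched
    with open(good, mode='w', encoding='utf-8', newline='') as file:
        csv.writer(file).writerows(rows)

    # newline='\r\n' converts the '\n' in each '\r\n' into another '\r\n'
    with open(bad, mode='w', encoding='utf-8', newline='\r\n') as file:
        csv.writer(file).writerows(rows)

    with open(good, mode='rb') as file:
        good_bytes = file.read()
    with open(bad, mode='rb') as file:
        bad_bytes = file.read()

print(good_bytes)
print(bad_bytes)
```

The ```csv``` reader tolerates the extra carriage return, which is why such a file can still be read back without an error.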

### Special Characters

The ```,``` is a special character in a CSV file. When it is incorporated into a cell within a CSV file, the cell contents have to be enclosed in double quotations:

<img src='./images/img_005.png' alt='img_005' width='500'/>

This file can be read using:

In [55]:
csv_data = {}

with open('./files_input/sheet5.csv', mode='r', encoding='utf-8-sig', newline='\r\n') as file:
    csv_file = csv.reader(file, delimiter=',', quotechar='"')
    for row in csv_file:
        csv_data[f'row_{csv_file.line_num}'] = row

csv_data

{'row_1': ['libraries', 'version'],
 'row_2': ['python', '3,11,5'],
 'row_3': ['numpy', '1,26,0'],
 'row_4': ['pandas', '2,1,1'],
 'row_5': ['matplotlib', '3,8,0'],
 'row_6': ['seaborn', '0,13,0']}
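The ```csv``` writer adds these double quotations automatically when a cell contains the delimiter, and doubles any double quotation that is inside a cell. This is a minimal sketch that writes to an ```io.StringIO``` instance rather than a file; the second row is hypothetical data chosen to show the doubling:

```python
import csv
import io

buffer = io.StringIO(newline='')
writer = csv.writer(buffer, delimiter=',', quotechar='"')

writer.writerow(['python', '3,11,5'])   # cell containing the delimiter
writer.writerow(['say "hi"', 'plain'])  # cell containing the quote character

written = buffer.getvalue()
print(written)

# Reading the text back recovers the original cells
rows = list(csv.reader(io.StringIO(written, newline=''),
                       delimiter=',', quotechar='"'))
print(rows)
```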

### European CSV Files

Some European languages will use the semicolon as a delimiter instead of a comma:

<img src='./images/img_006.png' alt='img_006' width='500'/>

If the CSV is read in using the wrong delimiter, the semicolon is not recognised as a column separator and each row is instead split at the commas inside the version numbers:

In [56]:
csv_data = {}

with open('./files_input/sheet6.csv', mode='r', encoding='utf-8-sig', newline='\r\n') as file:
    csv_file = csv.reader(file, delimiter=',', quotechar='"')
    for row in csv_file:
        csv_data[f'row_{csv_file.line_num}'] = row

csv_data

{'row_1': ['libraries;version'],
 'row_2': ['python;3', '11', '5'],
 'row_3': ['numpy;1', '26', '0'],
 'row_4': ['pandas;2', '1', '1'],
 'row_5': ['matplotlib;3', '8', '0'],
 'row_6': ['seaborn;0', '13', '0']}

This can be fixed by specifying the correct delimiter:

In [57]:
csv_data = {}

with open('./files_input/sheet6.csv', mode='r', encoding='utf-8-sig', newline='\r\n') as file:
    csv_file = csv.reader(file, delimiter=';', quotechar='"')
    for row in csv_file:
        csv_data[f'row_{csv_file.line_num}'] = row

csv_data

{'row_1': ['libraries', 'version'],
 'row_2': ['python', '3,11,5'],
 'row_3': ['numpy', '1,26,0'],
 'row_4': ['pandas', '2,1,1'],
 'row_5': ['matplotlib', '3,8,0'],
 'row_6': ['seaborn', '0,13,0']}

## Printer File (.prn)

Another file format is the printer file format, which uses a variable number of spaces to visually separate out the data:

<img src='./images/img_007.png' alt='img_007' width='500'/>

When this is read in, each row is read in as a ```list``` containing a single ```str```:

In [58]:
csv_data = {}

with open('./files_input/sheet7.prn', mode='r', encoding='utf-8-sig', newline='\r\n') as file:
    csv_file = csv.reader(file, delimiter='\t', quotechar='"')
    for row in csv_file:
        csv_data[f'row_{csv_file.line_num}'] = row

csv_data

{'row_1': ['name    x       y'],
 'row_2': ['zero zer       0       0'],
 'row_3': ['one two        1       2'],
 'row_4': ['two four       2       4'],
 'row_5': ['three si       3       6'],
 'row_6': ['four eig       4       8'],
 'row_7': ['five ten       5      10']}

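The string instance can be accessed from the ```row``` using ```row[0]```, and the ```split``` function from the regular expressions module ```re``` can be used to split the data into columns wherever two or more whitespace characters occur: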

In [59]:
import re
csv_data = {}

with open('./files_input/sheet7.prn', mode='r', encoding='utf-8-sig', newline='\r\n') as file:
    csv_file = csv.reader(file, delimiter='\t', quotechar='"')
    for row in csv_file:
        csv_data[f'row_{csv_file.line_num}'] = re.split(r'\s{2,}', row[0])
        
csv_data

{'row_1': ['name', 'x', 'y'],
 'row_2': ['zero zer', '0', '0'],
 'row_3': ['one two', '1', '2'],
 'row_4': ['two four', '2', '4'],
 'row_5': ['three si', '3', '6'],
 'row_6': ['four eig', '4', '8'],
 'row_7': ['five ten', '5', '10']}

## Text Files Tab Delimited (.txt)

A tab-delimited text file uses a tab as a delimiter:

<img src='./images/img_008.png' alt='img_008' width='500'/>

In [60]:
csv_data = {}

with open('./files_input/sheet8.txt', mode='r', encoding='utf-8-sig', newline='\r\n') as file:
    csv_file = csv.reader(file, delimiter='\t', quotechar='"')
    for row in csv_file:
        csv_data[f'row_{csv_file.line_num}'] = row

csv_data

{'row_1': ['x', 'y'],
 'row_2': ['0', '0'],
 'row_3': ['1', '2'],
 'row_4': ['2', '4'],
 'row_5': ['3', '6'],
 'row_6': ['4', '8'],
 'row_7': ['5', '10']}
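The ```csv``` module also has a predefined dialect for tab-delimited files called ```'excel-tab'```, which can be supplied instead of the delimiter. The sketch below uses hypothetical sample data in an ```io.StringIO``` instance rather than the file above:

```python
import csv
import io

# In-memory stand-in for a tab-delimited text file
sample = io.StringIO('x\ty\r\n0\t0\r\n1\t2\r\n', newline='')

rows = list(csv.reader(sample, dialect='excel-tab'))

print(rows)
print(csv.list_dialects())  # names of the dialects that are available
```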

## JSON Files (.json)

JSON is very similar to a ```str``` of a Python ```dict``` instance:

In [61]:
mapping = {True: {'red': (1, 0, 0),
                  'green': (0, 0.5, 0),
                  'blue': (0, 0, 1)},
           False: {'black': (0, 0, 0)}}

In [62]:
str(mapping)

"{True: {'red': (1, 0, 0), 'green': (0, 0.5, 0), 'blue': (0, 0, 1)}, False: {'black': (0, 0, 0)}}"

JavaScript Object Notation (JSON), as the name suggests, originally comes from JavaScript and has become a very popular format for storing data, particularly on websites. Because it comes from a different programming language, there are some subtle differences from a ```str``` of a Python ```dict```.

The ```dumps``` function can be used to dump a Python ```object``` for example the ```dict``` instance ```mapping``` to a JSON ```str```:

In [63]:
json_string = json.dumps(mapping)

In [64]:
json_string

'{"true": {"red": [1, 0, 0], "green": [0, 0.5, 0], "blue": [0, 0, 1]}, "false": {"black": [0, 0, 0]}}'

JSON uses double quotations. Therefore the outer quotations that indicate a Python ```str``` instance are single, and all the ```str``` instances that were in the Python ```dict``` instance are now enclosed in double quotations. 

JSON does not have a counterpart to a ```tuple``` and therefore the ```tuple``` instances that were present in the Python ```dict``` have in essence been cast to ```list``` instances. 

The representation of an ```int``` or ```float``` is unchanged.

JSON object keys must be strings, so the ```bool``` keys ```True``` and ```False``` have been converted to the lowercase strings ```"true"``` and ```"false"```.
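These conversions can be checked one value at a time with ```dumps```. The following is a small sketch; the variable names and the example ```dict``` are only illustrative:

```python
import json

as_bool = json.dumps(True)        # a bool value becomes lowercase true
as_none = json.dumps(None)        # None becomes null
as_tuple = json.dumps((1, 0, 0))  # a tuple becomes a JSON array
as_str = json.dumps('text')       # a str is enclosed in double quotations

# Keys that are not str instances are converted to strings
as_keys = json.dumps({True: 1, 2: 'two', None: 0})

for item in (as_bool, as_none, as_tuple, as_str, as_keys):
    print(item)
```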

The ```loads``` function can be used to load a JSON ```str``` to a Python ```dict```:

In [65]:
mapping2 = json.loads(json_string)

In [66]:
mapping2

{'true': {'red': [1, 0, 0], 'green': [0, 0.5, 0], 'blue': [0, 0, 1]},
 'false': {'black': [0, 0, 0]}}

This is a similar form to before but the ```tuple``` instances are ```list``` instances because, as previously mentioned, ```JSON``` has no counterpart to a ```tuple```. The ```bool``` keys are now the lowercase ```str``` instances ```'true'``` and ```'false'```.

Because this JSON ```str``` is now a Python ```dict``` containing ```str```, ```list```, ```int```, ```float``` and ```dict``` instances, it can be accessed by indexing:

In [67]:
mapping2['true']

{'red': [1, 0, 0], 'green': [0, 0.5, 0], 'blue': [0, 0, 1]}

In [68]:
mapping2['true']['red']

[1, 0, 0]

The ```dump``` function can be used to dump a Python ```object``` to a ```.json``` file. The ```open``` function from ```io``` is used to open the file within a ```with``` code block in write mode. The ```dump``` function from ```json``` is used within this ```with``` code block and the file is closed after the ```with``` code block exits:

In [69]:
with open('./files_output/mapping.json', "w") as file:
    json.dump(mapping, file)

The ```load``` function can be used to read the Python ```object``` from the ```.json``` file:

In [70]:
with open('./files_output/mapping.json', "r") as file:
    data = json.load(file)
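
The ```dump``` and ```load``` steps can be combined into a single round trip. The sketch below assumes a temporary folder and a hypothetical ```dict``` called ```settings```, so nothing in ```files_output``` is overwritten. The ```indent``` parameter of ```json.dump``` is optional and only makes the file easier to read:

```python
import json
import tempfile
from pathlib import Path

settings = {'name': 'example', 'values': [1, 2.5, 3]}  # hypothetical data

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'settings.json'
    # Write the dict to the file, pretty-printed with 4 spaces per level.
    with open(path, 'w', encoding='utf-8') as file:
        json.dump(settings, file, indent=4)
    # Read the file back into a new Python dict.
    with open(path, 'r', encoding='utf-8') as file:
        restored = json.load(file)

print(restored == settings)
```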

The ```io```, ```csv``` and ```json``` modules are used for basic reading and writing of data and it is important to understand the file formats above. In most cases, spreadsheets are read into a ```pd.DataFrame``` instance using the ```pd.read_csv``` function. The ```pd.DataFrame``` class is a data structure that is very similar to a spreadsheet and the ```pandas``` library is based around this data structure. More details about ```pandas``` will be given in later tutorials.

## Pickle Files (.pkl)

All the file formats seen before are high-level formats that are human readable. There are some additional common file formats that are machine readable. These file formats normally use a ```bytes``` instance. Recall that a byte, the fundamental unit of a ```bytes``` instance, looks like the following:

<img src='./images/img_009.png' alt='img_009' width='500'/>

Each bit in a ```byte``` is either ```0``` which can be represented as LOW in a digital voltage or ```1``` which can be represented as HIGH in a digital voltage:

<img src='./images/img_010.png' alt='img_010' width='500'/>

An example time trace such as that shown above is normally transmitted over pin 3 of a Serial Port:

<img src='./images/img_011.png' alt='img_011' width='350'/>

Serial ports typically use one of the standard baud rates of 110, 300, 1200, 2400, 4800, 9600, 19200, 38400, 57600, or 115200 bits per second.

If a baud rate of 9600 is used as an example, the digital trace for a single bit will last:

In [71]:
f'{round(1/9600*1e6, 2)} µs'

'104.17 µs'

And for a byte, which is 8 bits:

In [72]:
f'{round(8/9600*1e6, 2)} µs'

'833.33 µs'
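
The same arithmetic can be wrapped in a small helper and applied to other baud rates. This is only a sketch: ```bit_duration_us``` is a hypothetical name, and like the calculation above it counts only the 8 data bits of a byte:

```python
def bit_duration_us(baud_rate):
    """Return the duration of a single bit in microseconds."""
    return 1 / baud_rate * 1e6


for baud_rate in (9600, 115200):
    bit = bit_duration_us(baud_rate)
    # A byte is taken as 8 bits, as in the calculation above.
    print(f'{baud_rate} baud: {bit:.2f} µs per bit, {8 * bit:.2f} µs per byte')
```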

The single ```byte``` above can be expressed in binary as: 

In [73]:
0b01101000

104

Evaluating the binary literal displays it in decimal. The binary system matches the machine level but is not very human readable. Decimal is human readable but makes it harder to visualise what is going on at the hardware level. Therefore the hexadecimal system is commonly employed, which splits a byte into two halves and represents each half with a hexadecimal character:

<img src='./images/img_012.png' alt='img_012' width='500'/>

In [74]:
hex(104)

'0x68'
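
The split into two halves can be reproduced with a bit shift and a bit mask. The names ```high_nibble``` and ```low_nibble``` are chosen here for illustration only:

```python
value = 0b01101000

high_nibble = value >> 4     # upper four bits
low_nibble = value & 0x0F    # lower four bits

# Each half shown in binary, then each half as one hexadecimal character.
print(f'{high_nibble:04b} {low_nibble:04b}')
print(f'{high_nibble:x}{low_nibble:x}')
```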

The above concepts were covered when the ```bytes``` class was discussed. The ```pickle``` module is used to serialise a Python ```object``` into ```bytes``` and because the ```bytes``` instance isn't very human readable it is "pickled". The identifiers of the ```pickle``` module can be examined:

In [75]:
dir2(pickle)

{'attribute': ['bytes_types',
               'codecs',
               'compatible_formats',
               'dispatch_table',
               'format_version',
               'io',
               'maxsize',
               're',
               'sys'],
 'constant': ['ADDITEMS',
              'APPEND',
              'APPENDS',
              'BINBYTES',
              'BINBYTES8',
              'BINFLOAT',
              'BINGET',
              'BININT',
              'BININT1',
              'BININT2',
              'BINPERSID',
              'BINPUT',
              'BINSTRING',
              'BINUNICODE',
              'BINUNICODE8',
              'BUILD',
              'BYTEARRAY8',
              'DEFAULT_PROTOCOL',
              'DICT',
              'DUP',
              'EMPTY_DICT',
              'EMPTY_LIST',
              'EMPTY_SET',
              'EMPTY_TUPLE',
              'EXT1',
              'EXT2',
              'EXT4',
              'FALSE',
              'FLOAT',
            

Notice there are a large number of constants. Most are ```bytes``` instances called opcodes: one identifies the protocol, others identify the datatype (and hence the number of bytes to expect for that instance) and one marks the end of a ```bytes``` stream:

In [76]:
pickle.dumps(1)

b'\x80\x04K\x01.'

Recall that a byte that corresponds to a printable ASCII character will be displayed as the ASCII character. A non-printable byte will normally be represented as a hexadecimal escape sequence ```\x``` followed by two hexadecimal characters e.g. ```\x80```. Often ```bytes``` instances are cast into hexadecimal ```str``` instances for better readability:

In [77]:
pickle.dumps(1).hex()

'80044b012e'

The constants can be examined as hexadecimal ```str``` instances:

In [78]:
optcodes = {}
for constant in dir2(pickle, print_output=False)['constant']:
    if isinstance(getattr(pickle, constant), bytes):
        optcodes[constant] = getattr(pickle, constant).hex()
    else:
        optcodes[constant] = getattr(pickle, constant)

optcodes

{'ADDITEMS': '90',
 'APPEND': '61',
 'APPENDS': '65',
 'BINBYTES': '42',
 'BINBYTES8': '8e',
 'BINFLOAT': '47',
 'BINGET': '68',
 'BININT': '4a',
 'BININT1': '4b',
 'BININT2': '4d',
 'BINPERSID': '51',
 'BINPUT': '71',
 'BINSTRING': '54',
 'BINUNICODE': '58',
 'BINUNICODE8': '8d',
 'BUILD': '62',
 'BYTEARRAY8': '96',
 'DEFAULT_PROTOCOL': 4,
 'DICT': '64',
 'DUP': '32',
 'EMPTY_DICT': '7d',
 'EMPTY_LIST': '5d',
 'EMPTY_SET': '8f',
 'EMPTY_TUPLE': '29',
 'EXT1': '82',
 'EXT2': '83',
 'EXT4': '84',
 'FALSE': '4930300a',
 'FLOAT': '46',
 'FRAME': '95',
 'FROZENSET': '91',
 'GET': '67',
 'GLOBAL': '63',
 'HIGHEST_PROTOCOL': 5,
 'INST': '69',
 'INT': '49',
 'LIST': '6c',
 'LONG': '4c',
 'LONG1': '8a',
 'LONG4': '8b',
 'LONG_BINGET': '6a',
 'LONG_BINPUT': '72',
 'MARK': '28',
 'MEMOIZE': '94',
 'NEWFALSE': '89',
 'NEWOBJ': '81',
 'NEWOBJ_EX': '92',
 'NEWTRUE': '88',
 'NEXT_BUFFER': '97',
 'NONE': '4e',
 'OBJ': '6f',
 'PERSID': '50',
 'POP': '30',
 'POP_MARK': '31',
 'PROTO': '80',
 'PUT': '70

The pickled hexadecimal ```str``` of the ```int``` ```1``` can be examined:

In [79]:
pickle.dumps(1).hex()

'80044b012e'

This means:

|Hex Value|Meaning|
|---|---|
|80|Start OptCode: PROTO|
|04|Protocol Version: DEFAULT_PROTOCOL|
|4b|Type Int OptCode: BININT1|
|01|Numeric value: 1|
|2e|Stop OptCode: STOP|
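
A breakdown like the table above does not have to be worked out by hand. The standard library module ```pickletools``` can walk a pickled ```bytes``` instance one opcode at a time. The sketch below passes ```protocol=4``` explicitly, on the assumption that it should match the output shown in this notebook regardless of the default protocol of the Python version in use:

```python
import pickle
import pickletools

data = pickle.dumps(1, protocol=4)

# genops yields (opcode, argument, position) for each opcode in the stream.
for opcode, argument, position in pickletools.genops(data):
    print(position, opcode.name, argument)

names = [opcode.name for opcode, _, _ in pickletools.genops(data)]
```

The function ```pickletools.dis``` prints a similar listing in a single call.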

If a ```bool``` instance is instead examined:

In [80]:
pickle.dumps(True).hex()

'8004882e'

This means:

|Hex Value|Meaning|
|---|---|
|80|Start OptCode: PROTO|
|04|Protocol Version: DEFAULT_PROTOCOL|
|88|Bool True OptCode: NEWTRUE|
|2e|Stop OptCode: STOP|

A ```str``` instance of ASCII characters can be examined. Notice the data for the ```str``` is included in the pickled ```bytes``` instance:

In [81]:
pickle.dumps('hello')

b'\x80\x04\x95\t\x00\x00\x00\x00\x00\x00\x00\x8c\x05hello\x94.'

An equivalent ```bytes``` instance can be cast to a hexadecimal ```str``` instance:

In [82]:
b'hello'.hex()

'68656c6c6f'

The pickled ```bytes``` sequence can also be cast to a hexadecimal ```str``` instance and the data above can be seen:

In [83]:
pickle.dumps('hello').hex()

'80049509000000000000008c0568656c6c6f942e'

This means:

|Hex Value|Meaning|
|---|---|
|80|Start OptCode: PROTO|
|04|Protocol Version: DEFAULT_PROTOCOL|
|95|Frame OptCode: FRAME|
|09 00 00 00 00 00 00 00|Frame Size: 9 bytes (0x9)|
|8c|String OptCode: SHORT_BINUNICODE|
|05|Length of String: len('hello')|
|68 65 6c 6c 6f|String: 'hello'|
|94|Memoize OptCode: MEMOIZE|
|2e|Stop OptCode: STOP|


The frame size is the number of bytes that follow the frame size field, in this case the serialised ```str```, the ```MEMOIZE``` opcode and the ```STOP``` opcode:

In [84]:
len('8c0568656c6c6f942e') / 2

9.0
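
The frame size can also be read directly from the pickled ```bytes``` instance. The sketch below assumes protocol 4 and the layout shown in the table above: 2 bytes for ```PROTO``` and the version, 1 byte for ```FRAME```, then the frame size as an 8 byte little-endian unsigned integer:

```python
import pickle
import struct

data = pickle.dumps('hello', protocol=4)

# Bytes 0-1: PROTO opcode and version; byte 2: FRAME opcode;
# bytes 3-10: frame size as an 8-byte little-endian unsigned integer.
(frame_size,) = struct.unpack('<Q', data[3:11])

# The frame size should equal the number of bytes after the 11-byte header.
print(frame_size, len(data) - 11)
```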

More complicated datatypes can be serialised, for example a ```tuple```:

In [85]:
pickle.dumps(('hello', 'world', '!')).hex()

'80049517000000000000008c0568656c6c6f948c05776f726c64948c01219487942e'

This means 

|Hex Value|Meaning|
|---|---|
|80|Start OptCode: PROTO|
|04|Protocol Version: DEFAULT_PROTOCOL|
|95|Frame OptCode: FRAME|
|17 00 00 00 00 00 00 00|Frame Size: 23 bytes (0x17)|
|8c|String OptCode: SHORT_BINUNICODE|
|05|Length of String: len('hello')|
|68 65 6c 6c 6f|String: 'hello'|
|94|Memoize: MEMOIZE|
|8c|String OptCode: SHORT_BINUNICODE|
|05|Length of String: len('world')|
|77 6f 72 6c 64|String: 'world'|
|94|Memoize Optcode: MEMOIZE|
|8c|String OptCode: SHORT_BINUNICODE|
|01|Length of String len('!')|
|21|string: '!'|
|94|Memoize Optcode: MEMOIZE|
|87|Tuple OptCode: TUPLE3|
|94|Memoize: MEMOIZE|
|2e|Stop OptCode: STOP|
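
The ```MEMOIZE``` opcode appears after every ```str``` and after the ```tuple``` itself. As far as the ```pickle``` documentation describes it, it stores the object just built in a memo so that a later reference to the same object can be fetched rather than serialised again. A minimal sketch, assuming protocol 4 and a ```tuple``` that holds the same ```str``` instance twice:

```python
import pickle
import pickletools

word = 'hello'
data = pickle.dumps((word, word), protocol=4)

# List the opcode names in the order they appear in the stream.
names = [opcode.name for opcode, _, _ in pickletools.genops(data)]
print(names)
```

The ```str``` data is written once and the second reference is a ```BINGET``` opcode that fetches it from the memo.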



A ```list```:

In [86]:
pickle.dumps(['hello', 'world', '!']).hex()

'80049519000000000000005d94288c0568656c6c6f948c05776f726c64948c012194652e'

Which means

|Hex Value|Meaning|
|---|---|
|80|Start OptCode: PROTO|
|04|Protocol Version: DEFAULT_PROTOCOL|
|95|Frame OptCode: FRAME|
|19 00 00 00 00 00 00 00|Frame Size: 25 bytes (0x19)|
|5d|Empty List OptCode: EMPTY_LIST|
|94|Memoize OptCode: MEMOIZE|
|28|Mark OptCode: MARK|
|8c|String OptCode: SHORT_BINUNICODE|
|05|Length of String: len('hello')|
|68 65 6c 6c 6f|String: 'hello'|
|94|Memoize OptCode: MEMOIZE|
|8c|String OptCode: SHORT_BINUNICODE|
|05|Length of String: len('world')|
|77 6f 72 6c 64|String: 'world'|
|94|Memoize OptCode: MEMOIZE|
|8c|String OptCode: SHORT_BINUNICODE|
|01|Length of String len('!')|
|21|string: '!'|
|94|Memoize OptCode: MEMOIZE|
|65|List Append OptCode: APPENDS|
|2e|Stop OptCode: STOP|

A ```dict```:

In [87]:
pickle.dumps({'r': 'red', 'g': 'green', 'b': 'blue'}).hex()

'80049526000000000000007d94288c0172948c03726564948c0167948c05677265656e948c0162948c04626c756594752e'

Which means:

|Hex Value|Meaning|
|---|---|
|80|Start OptCode: PROTO|
|04|Protocol Version: DEFAULT_PROTOCOL|
|95|Frame OptCode: FRAME|
|26 00 00 00 00 00 00 00|Frame Size: 38 bytes (0x26)|
|7d|Empty Dict OptCode: EMPTY_DICT|
|94|Memoize OptCode: MEMOIZE|
|28|Mark OptCode: MARK|
|8c|String OptCode: SHORT_BINUNICODE|
|01|Length of String len('r')|
|72|string: 'r'|
|94|Memoize OptCode: MEMOIZE|
|8c|String OptCode: SHORT_BINUNICODE|
|03|Length of String len('red')|
|72 65 64|string: 'red'|
|94|Memoize OptCode: MEMOIZE|
|8c|String OptCode: SHORT_BINUNICODE|
|01|Length of String len('g')|
|67|string: 'g'|
|94|Memoize OptCode: MEMOIZE|
|8c|String OptCode: SHORT_BINUNICODE|
|05|Length of String len('green')|
|67 72 65 65 6e|string: 'green'|
|94|Memoize OptCode: MEMOIZE|
|8c|String OptCode: SHORT_BINUNICODE|
|01|Length of String len('b')|
|62|string: 'b'|
|94|Memoize OptCode: MEMOIZE|
|8c|String OptCode: SHORT_BINUNICODE|
|04|Length of String len('blue')|
|62 6c 75 65|string: 'blue'|
|94|Memoize OptCode: MEMOIZE|
|75|Dict SetItem OptCode: SETITEMS|
|2e|Stop OptCode: STOP|

A Python ```float``` is a double precision floating point number and its IEEE 754 representation can be obtained using ```struct.pack```:

In [88]:
import struct
struct.pack('>d', 0.1).hex()

'3fb999999999999a'

The pickled version of the ```float``` can then be examined:

In [89]:
pickle.dumps(0.1).hex()

'8004950a00000000000000473fb999999999999a2e'

Which means:

|Hex Value|Meaning|
|---|---|
|80|Start OptCode: PROTO|
|04|Protocol Version: DEFAULT_PROTOCOL|
|95|Frame OptCode: FRAME|
|0a 00 00 00 00 00 00 00|Frame Size: 10 bytes (0xa)|
|47|Float OptCode: BINFLOAT|
|3f b9 99 99 99 99 99 9a|0.1 in 64 Bit IEEE|
|2e|Stop OptCode: STOP|
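
The 8 bytes of data can be sliced out of the pickled ```bytes``` instance and unpacked back into a ```float```. This is a sketch that assumes the layout in the table above, where the data sits between the ```BINFLOAT``` opcode and the final ```STOP``` opcode:

```python
import pickle
import struct

data = pickle.dumps(0.1, protocol=4)

# The last byte is STOP and the 8 bytes before it hold the float,
# stored big-endian, which matches the '>d' format used above.
payload = data[-9:-1]
(value,) = struct.unpack('>d', payload)

print(payload.hex(), value)
```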

Normally, the fine details of the opcodes are not required. The ```dumps``` function is used to serialise a Python ```object``` to a ```bytes``` instance:

In [90]:
bytestream = pickle.dumps({'r': 'red', 'g': 'green', 'b': 'blue'})

In [91]:
bytestream

b'\x80\x04\x95&\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x01r\x94\x8c\x03red\x94\x8c\x01g\x94\x8c\x05green\x94\x8c\x01b\x94\x8c\x04blue\x94u.'

And the ```loads``` function is used to read the data back into a Python ```object```:

In [92]:
pickle.loads(bytestream)

{'r': 'red', 'g': 'green', 'b': 'blue'}
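
Unlike the JSON round trip seen earlier, a ```pickle``` round trip keeps the Python types. The sketch below uses a hypothetical ```dict``` with ```bool``` keys, a ```tuple``` value and a ```list``` value:

```python
import pickle

# Hypothetical data with bool keys, a tuple value and a list value.
original = {True: ('hello', 'world', '!'), False: [1, 0.5]}

restored = pickle.loads(pickle.dumps(original))

print(restored == original)
print(type(restored[True]))
```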

Like the ```json``` module, the ```pickle``` module has the counterparts ```dump``` and ```load```, which are used to dump data to a file and load data from a file:

In [93]:
with open('./files_output/serialised.pkl', mode='wb') as file:
    pickle.dump(obj=1, file=file)
    pickle.dump(obj='hello', file=file)
    pickle.dump(obj=('hello', 'world', '!'), file=file)
    pickle.dump(obj=['hello', 'world', '!'], file=file)
    pickle.dump(obj={'r': 'red', 'g': 'green', 'b': 'blue'}, file=file)
    pickle.dump(obj=0.1, file=file)

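In the previous files examined, the new line was used as a delimiter. In the ```.pkl``` file, there is only a new line character if it is present in the ```bytes``` instance by chance. Below, the only ```\n``` is the first byte of the frame size (```0x0a```, 10 bytes) for the pickled ```float```: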

In [94]:
with open('./files_output/serialised.pkl', mode='rb') as file:
    data = file.readlines()

data

[b'\x80\x04K\x01.\x80\x04\x95\t\x00\x00\x00\x00\x00\x00\x00\x8c\x05hello\x94.\x80\x04\x95\x17\x00\x00\x00\x00\x00\x00\x00\x8c\x05hello\x94\x8c\x05world\x94\x8c\x01!\x94\x87\x94.\x80\x04\x95\x19\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x05hello\x94\x8c\x05world\x94\x8c\x01!\x94e.\x80\x04\x95&\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x01r\x94\x8c\x03red\x94\x8c\x01g\x94\x8c\x05green\x94\x8c\x01b\x94\x8c\x04blue\x94u.\x80\x04\x95\n',
 b'\x00\x00\x00\x00\x00\x00\x00G?\xb9\x99\x99\x99\x99\x99\x9a.']

The ```load``` function reads only one pickled object at a time, from its start opcode ```PROTO``` to its stop opcode ```STOP```:

In [95]:
with open('./files_output/serialised.pkl', mode='rb') as file:
    print(pickle.load(file))
    print(pickle.load(file))

1
hello


The file behaves like an iterator and there is no easy way to determine the number of items left in the file, so a ```for``` loop over a known number of items cannot be constructed directly. Instead a ```while``` loop can be constructed that is set to ```break``` when an ```EOFError``` is encountered:

In [96]:
unpickled_data = {}

with open('./files_output/serialised.pkl', mode='rb') as file:
    num = 0
    while True:
        try:
            unpickled_data[num] = pickle.load(file=file)
            num += 1
        except EOFError:
            break

unpickled_data

{0: 1,
 1: 'hello',
 2: ('hello', 'world', '!'),
 3: ['hello', 'world', '!'],
 4: {'r': 'red', 'g': 'green', 'b': 'blue'},
 5: 0.1}
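
The ```while``` loop can be wrapped in a generator function so that the calling code iterates over the pickled objects. This is a sketch only: ```iter_pickled``` is a hypothetical helper, and an in-memory ```io.BytesIO``` instance stands in for a ```.pkl``` file on disk:

```python
import io
import pickle


def iter_pickled(file):
    """Yield each pickled object from a binary file until it is exhausted."""
    while True:
        try:
            yield pickle.load(file)
        except EOFError:
            return


# An in-memory binary file stands in for a .pkl file on disk.
buffer = io.BytesIO()
for obj in (1, 'hello', ('hello', 'world', '!')):
    pickle.dump(obj, buffer)
buffer.seek(0)  # move back to the start before reading

unpickled = list(iter_pickled(buffer))
print(unpickled)
```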

It is more common to store pickled objects in a database and the ```shelve``` module is used to do this. If the identifiers for the ```shelve``` module are examined, notice there is an ```open``` function:

In [97]:
dir2(shelve)

{'attribute': ['collections'],
 'constant': ['DEFAULT_PROTOCOL'],
 'method': ['open'],
 'upper_class': ['BsdDbShelf',
                 'BytesIO',
                 'DbfilenameShelf',
                 'Pickler',
                 'Shelf',
                 'Unpickler'],
 'datamodel_attribute': ['__all__',
                         '__builtins__',
                         '__cached__',
                         '__doc__',
                         '__file__',
                         '__loader__',
                         '__name__',
                         '__package__',
                         '__spec__'],
 'internal_method': ['_ClosedDict']}


This behaves analogously to the ```open``` function found in ```io``` and is typically implemented via a ```with``` code block. The identifiers for the ```DbfilenameShelf``` instance created can be examined. Notice the consistency with ```dict``` identifiers. The database can essentially be thought of as a set of shelves, where each shelf has a ```key``` and a ```value```. Conceptually each pickled object is shelved in the database, which is where the name of the module ```shelve``` comes from:

In [98]:
with shelve.open('./files_output/database.pkl') as database:
    print(type(database))
    dir2(database, object, unique_only=True)

<class 'shelve.DbfilenameShelf'>
{'attribute': ['cache', 'dict', 'keyencoding', 'writeback'],
 'method': ['clear',
            'close',
            'get',
            'items',
            'keys',
            'pop',
            'popitem',
            'setdefault',
            'sync',
            'update',
            'values'],
 'datamodel_attribute': ['__abstractmethods__',
                         '__dict__',
                         '__module__',
                         '__reversed__',
                         '__slots__',
                         '__weakref__'],
 'datamodel_method': ['__class_getitem__',
                      '__contains__',
                      '__del__',
                      '__delitem__',
                      '__enter__',
                      '__exit__',
                      '__getitem__',
                      '__iter__',
                      '__len__',
                      '__setitem__'],
 'internal_attribute': ['_MutableMapping__marker', '_abc_impl', '

Each ```value```, a Python ```object``` to be pickled, can be saved on a new shelf using a new shelf ```key```:

In [99]:
with shelve.open('./files_output/database.pkl') as database:
    database['unicode_string'] = 'hello'
    database['archive'] = ('hello', 'world', '!')
    database['active'] = ['hello', 'world', '!']
    database['mapping'] = {'r': 'red', 'g': 'green', 'b': 'blue'}
    database['num'] = 0.1

Normally the database is opened using a ```with``` code block but for convenience it can be opened directly:

In [100]:
database = shelve.open('./files_output/database.pkl')

The ```keys``` method only returns a ```KeysView```:

In [101]:
database.keys()

KeysView(<shelve.DbfilenameShelf object at 0x0000020AC03A14F0>)

This can be cast into a ```list``` to view the keys:

In [102]:
list(database.keys())

['unicode_string', 'archive', 'active', 'mapping', 'num']

An item can be selected by indexing using its ```key```:

In [103]:
database['unicode_string']

'hello'

When the item is accessed it is unpickled.

Since the database wasn't accessed in a ```with``` code block it should be closed using:

In [104]:
database.close()
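
The write and read steps can be tried together without touching ```files_output```. The sketch below assumes a temporary folder and a hypothetical file name ```example_shelf```. The ```get``` method, which is listed among the identifiers above, returns a default value when a ```key``` is absent:

```python
import shelve
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = str(Path(folder) / 'example_shelf')  # hypothetical file name
    # Each assignment pickles the value and stores it under the key.
    with shelve.open(path) as database:
        database['archive'] = ('hello', 'world', '!')
        database['num'] = 0.1
    # Reopen the shelf to confirm the values were stored on disk.
    with shelve.open(path) as database:
        keys = sorted(database.keys())
        archive = database['archive']
        missing = database.get('absent', 'no such key')

print(keys, archive, missing)
```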

[Return to Anaconda Tutorial](../readme.md)