# <span style="color:#1B3EA9"><b>Saving data to disk</b></span>

<br>

After parsing CSV, HTML, XML or other types of data, it is often convenient to save the parsed data to disk. The original data then does not need to be re-parsed, and future processing can run more efficiently. This is especially true when the original data files contain much more information (e.g. 1 GB) than the amount you intend to use (e.g. 1 MB).

As a reminder:

**parse** = "to analyze or separate (input, for example) into more easily processed components" (source: [wordnik.com](https://www.wordnik.com/words/parse))

<br>

⚠️ **WARNING!**  &nbsp; &nbsp; The directory in which this notebook is saved contains a number of CSV files ("data*.csv"). Running this notebook will overwrite those files, so you may want to copy them elsewhere as a backup before running it.
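A minimal sketch of such a backup, using only the standard library. The helper name `backup_csv_files` and the `backup` folder name are hypothetical choices, and the usage line assumes the notebook folder is the current working directory (the same assumption the `os.path.abspath('')` calls in this notebook make).

```python
import glob
import os
import shutil

def backup_csv_files(src_dir, backup_dir):
    """Copy every 'data*.csv' file in src_dir into backup_dir (hypothetical helper)."""
    os.makedirs(backup_dir, exist_ok=True)   # create the backup folder if it does not exist
    copied = []
    for path in sorted(glob.glob(os.path.join(src_dir, 'data*.csv'))):
        dest = os.path.join(backup_dir, os.path.basename(path))
        shutil.copy2(path, dest)             # copy file contents and metadata
        copied.append(dest)
    return copied

# Usage (assumes the notebook folder is the current working directory):
# backup_csv_files(os.path.abspath(''), os.path.join(os.path.abspath(''), 'backup'))
```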

<br>

___

First let's import the modules we'll need for this lecture.

In [1]:
import os
import csv
import numpy as np

<a name="toc"></a>
# Table of Contents

* [np.savetxt](#np.savetxt)
* [csv.writer](#csv.writer)
* [csv.DictWriter](#csv.DictWriter)
* [open](#open)

___

<a name="np.savetxt"></a>
## Using `np.savetxt` to save data
[Back to Table of Contents](#toc)
<br>

The [np.savetxt](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html) function is one of the easiest ways to write numerical array data to a text file.

Let's first create an array of random numbers.

In [2]:
np.random.seed(0)
a = np.random.randn(10, 3)
a = np.asarray(1000 * a, dtype=int)   # convert to large integers (for easier reading)

print(a)

[[ 1764   400   978]
 [ 2240  1867  -977]
 [  950  -151  -103]
 [  410   144  1454]
 [  761   121   443]
 [  333  1494  -205]
 [  313  -854 -2552]
 [  653   864  -742]
 [ 2269 -1454    45]
 [ -187  1532  1469]]


Next, let's save this array in a CSV file, in the *same folder* as this notebook:

In [3]:
dir0     = os.path.abspath('')   # directory in which this notebook is saved
fnameCSV = os.path.join( dir0, 'data-np.csv' )

np.savetxt(fnameCSV, a, delimiter=',', fmt='%d')   # fmt='%d' writes integers (the default '%.18e' writes floats in scientific notation)

If the cell above executed without errors, you should now see a file called "data-np.csv" in the same folder as this notebook. Verify that its contents are the same as the array above.
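To check the round trip programmatically rather than by eye, the file can be read back with `np.loadtxt` and compared to the original array. The sketch below writes to a temporary folder (the file name `data-np-check.csv` is hypothetical) and passes `fmt='%d'` so that integers are written as integers; with the default `fmt='%.18e'`, the value 1764 would be written as `1.764000000000000000e+03`.

```python
import os
import tempfile
import numpy as np

np.random.seed(0)
a = np.asarray(1000 * np.random.randn(10, 3), dtype=int)   # same array as above

with tempfile.TemporaryDirectory() as tmp:              # scratch folder, deleted afterwards
    fname = os.path.join(tmp, 'data-np-check.csv')      # hypothetical file name
    np.savetxt(fname, a, delimiter=',', fmt='%d')        # write integers, comma-separated
    b = np.loadtxt(fname, delimiter=',', dtype=int)      # read the same file back

print(np.array_equal(a, b))   # True
```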

`np.savetxt` is very useful for writing purely numerical data. If you would also like to save other kinds of data (like strings), it is often more convenient to use another approach, like `csv.writer` or `open` (see below).
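Although `np.savetxt` is aimed at numerical data, it can still write a single line of text above the numbers through its `header` argument. In this sketch the column names `x,y,z` and the file name are hypothetical; passing `comments=''` stops `np.savetxt` from prefixing the header with its default `'# '`, so the file reads as a plain CSV file with a header row.

```python
import os
import tempfile
import numpy as np

a = np.array([[1764,  400,  978],
              [2240, 1867, -977]])   # first two rows of the array above

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'data-np-header.csv')   # hypothetical file name
    np.savetxt(fname, a, delimiter=',', fmt='%d',
               header='x,y,z',   # hypothetical column names
               comments='')      # default comments='# ' would write '# x,y,z'
    with open(fname) as fid:
        lines = fid.read().splitlines()

print(lines)   # ['x,y,z', '1764,400,978', '2240,1867,-977']
```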

___

<a name="csv.writer"></a>
## Using `csv.writer` to save data
[Back to Table of Contents](#toc)
<br>

The [csv.writer](https://docs.python.org/3/library/csv.html#csv.writer) function can also be used to write data files.

Let's use `csv.writer` to save the same array as above.

In [4]:
dir0     = os.path.abspath('')      # directory in which this notebook is saved
fnameCSV = os.path.join( dir0, 'data-csv.csv' )

with open(fnameCSV, 'w', newline='') as fid:    # open in write mode (newline='' is recommended for the csv module)
    writer = csv.writer(fid)        # create a writer object
    for aa in a:                    # cycle through rows
        writer.writerow(aa)         # write the current row to file

This can be made even simpler with `writerows`:

In [5]:
with open(fnameCSV, 'w', newline='') as fid:    # open in write mode (newline='' is recommended for the csv module)
    writer = csv.writer(fid)        # create a writer object
    writer.writerows( a )           # write all rows

If the cells above executed without errors, you should now see a file called "data-csv.csv" in the same folder as this notebook. Verify that its contents are the same as the array above.
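The same kind of check works for the `csv.writer` file, using `csv.reader`. Note that `csv.reader` returns every field as a string, so the values need converting back to `int`. This sketch also passes `newline=''` to `open`, as the Python `csv` documentation recommends for files used with the `csv` module (otherwise extra blank lines can appear on Windows). The file name is hypothetical, and the file lives in a temporary folder.

```python
import csv
import os
import tempfile
import numpy as np

np.random.seed(0)
a = np.asarray(1000 * np.random.randn(10, 3), dtype=int)   # same array as above

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'data-csv-check.csv')    # hypothetical file name
    with open(fname, 'w', newline='') as fid:
        csv.writer(fid).writerows(a)                    # each row of `a` becomes one line
    with open(fname, newline='') as fid:
        # csv.reader yields one list of strings per line; convert each field to int
        rows = [[int(x) for x in row] for row in csv.reader(fid)]

b = np.array(rows)
print(np.array_equal(a, b))   # True
```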

For a simple numerical array like above, `csv.writer` has no advantages over `np.savetxt`. Nevertheless, `csv.writer` can be more useful in some cases. One example is when using dictionaries to store data.
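Another case where `csv.writer` is convenient is a row that mixes text and numbers, which does not fit neatly into a single numeric NumPy array. `csv.writer` converts each non-string value with `str` before writing it. A small sketch with hypothetical sample names, writing to an in-memory `io.StringIO` buffer instead of a file:

```python
import csv
import io

rows = [
    ['name', 'x', 'y'],        # header row (hypothetical column names)
    ['sample-A', 1764, 400],   # text and integers in the same row
    ['sample-B', 2240, -977],
]

buf = io.StringIO()               # in-memory text "file"
csv.writer(buf).writerows(rows)   # non-string values are converted with str()
lines = buf.getvalue().splitlines()
print(lines)   # ['name,x,y', 'sample-A,1764,400', 'sample-B,2240,-977']
```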

Consider the following dictionaries, which contain song information:

In [6]:
song0 = dict(title='Peaceful Easy Feeling', artist='Eagles', album='Eagles')
song1 = dict(artist='Michael Jackson', album='Thriller', title='Billie Jean')
song2 = dict(album='Homework', artist='Daft Punk', title='Da Funk')
song3 = dict(artist='Yagya', title='Snowflake 4', album='Rhythm Of Snow')
song4 = dict(artist='Gigi Masin', title='Venice In Winter', album='Kite')

Note that all songs contain the same information (artist, title and album), but that the keys are entered in a different order in each dictionary definition. This type of inconsistent ordering occurs often in real-world data files.

The data could be organized into an array, with three columns representing artist, title and album, and then written to file using `np.savetxt` or `csv.writer`. However, there is an easier way to write dictionary content: `csv.DictWriter`.
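For comparison, the array-style approach can be sketched as follows: each dictionary is turned into a list by looking up its keys in one fixed order, and the resulting rows are written with `csv.writer`. Only two of the songs are used, to keep the example short, and the output goes to an in-memory buffer.

```python
import csv
import io

song0 = dict(title='Peaceful Easy Feeling', artist='Eagles', album='Eagles')
song1 = dict(artist='Michael Jackson', album='Thriller', title='Billie Jean')
songs = [song0, song1]

colnames = ['artist', 'title', 'album']
# look up each field by name, so the column order no longer depends on key order
rows = [[song[c] for c in colnames] for song in songs]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(colnames)   # header row
writer.writerows(rows)      # one row per song
lines = buf.getvalue().splitlines()
print(lines)
```

This works, but the key lookup has to be written by hand, which is exactly what `csv.DictWriter` does for you.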

___

<a name="csv.DictWriter"></a>
## Using `csv.DictWriter` to save data
[Back to Table of Contents](#toc)
<br>

The [csv.DictWriter](https://docs.python.org/3/library/csv.html#csv.DictWriter) class can also be used to write data files.

Let's use `csv.DictWriter` to write the five songs above to a CSV file.

In [7]:
dir0     = os.path.abspath('')      # directory in which this notebook is saved
fnameCSV = os.path.join( dir0, 'data-csv-dict.csv' )

songs    = [song0, song1, song2, song3, song4]

with open(fnameCSV, 'w', newline='') as fid:   # newline='' is recommended for the csv module
    colnames = ['artist', 'title', 'album']  # column names
    writer   = csv.DictWriter(fid, fieldnames=colnames)
    writer.writeheader()
    for song in songs:
        writer.writerow( song )

In the saved file (`data-csv-dict.csv`), you'll notice that `csv.DictWriter` has automatically written each song's fields in the order given by `colnames`, regardless of the order in which they were entered in each dictionary.
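`csv.DictWriter` also has options for dictionaries that do not match `fieldnames` exactly: `restval` (default `''`) is written for a missing key, and `extrasaction` decides what happens to keys not listed in `fieldnames` (the default `'raise'` raises `ValueError`, while `'ignore'` drops them). The sketch below uses both and then reads the result back with `csv.DictReader`. The missing `album` and the extra `rating` key are hypothetical modifications of the songs above, and `'N/A'` is an arbitrary placeholder.

```python
import csv
import io

colnames = ['artist', 'title', 'album']
songs = [
    dict(title='Da Funk', artist='Daft Punk', album='Homework'),
    dict(artist='Gigi Masin', title='Venice In Winter'),                        # hypothetical: 'album' missing
    dict(artist='Yagya', title='Snowflake 4', album='Rhythm Of Snow', rating=5),  # hypothetical extra key
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=colnames,
                        restval='N/A',           # value written for a missing key
                        extrasaction='ignore')   # silently drop keys not in fieldnames
writer.writeheader()
writer.writerows(songs)

buf.seek(0)                              # rewind the buffer before reading
read_back = list(csv.DictReader(buf))    # one dict per data row, keyed by the header
for row in read_back:
    print(row['artist'], '|', row['album'])
```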

___

<a name="open"></a>
## Using `open` to save data
[Back to Table of Contents](#toc)
<br>

Like reading data, the `open` function is the most flexible way to write data files.

Let's use `open` to write the five songs above to a CSV file.

In [8]:
dir0     = os.path.abspath('')   # directory in which this notebook is saved
fnameCSV = os.path.join( dir0, 'data-open.csv' )

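with open(fnameCSV, 'w') as fid:
    fid.write( 'artist,title,album\n' )  # write column names with a newline character
    for song in songs:
        artist = song['artist']   # avoid reusing `a`, which holds the array from above
        title  = song['title']
        album  = song['album']
        fid.write( f'{artist},{title},{album}\n' )   # no quoting: fields must not contain commas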
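One caveat of writing with `open` and f-strings is that nothing is quoted, so a field containing a comma is split into two columns when the file is read back. `csv.writer` avoids this by quoting such fields (its default is `quoting=csv.QUOTE_MINIMAL`). A sketch with a hypothetical artist name that contains a comma:

```python
import csv
import io

song = dict(artist='Artist, With Comma', title='Some Title', album='Some Album')   # hypothetical song
colnames = ['artist', 'title', 'album']

manual = f"{song['artist']},{song['title']},{song['album']}"   # same pattern as the open() cell

buf = io.StringIO()
csv.writer(buf).writerow([song[c] for c in colnames])   # quotes the field containing a comma
quoted = buf.getvalue().strip()

print(manual)   # Artist, With Comma,Some Title,Some Album
print(quoted)   # "Artist, With Comma",Some Title,Some Album

n_manual = len(next(csv.reader([manual])))   # 4 fields: the artist name was split
n_quoted = len(next(csv.reader([quoted])))   # 3 fields: the quoted name stayed intact
print(n_manual, n_quoted)
```

So `open` gives full control over the output format, but the quoting rules of the CSV format then become your responsibility.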