# Separating Data and Code
Good coding practice is to keep your data and code separate.
This means not embedding your data within the code itself.
Keeping them apart lets you apply the same code to different data sets.

## The bad way

In [None]:
# import the various modules that we'll need for this lesson
import numpy as np
import pandas as pd
from matplotlib import pyplot
import scipy
import scipy.ndimage

In [None]:
# make some random data
a = np.random.normal(loc=0, scale=1, size=1000)
smoothed = scipy.ndimage.gaussian_filter1d(a, sigma=2)

# plot the data
fig = pyplot.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(a, label='Raw')
ax.plot(smoothed, label='Smoothed')
ax.set_xlabel('Index')
ax.set_ylabel('Value')
ax.legend()
# save to a file
pyplot.savefig('NicePlot.png')
# show the plot
pyplot.show()

## A better way
1. Generate or collect your data and save it to a file.
2. Load this data, plot it, and save the plot.

Generating or collecting the data could be done in a separate script, but for now we do it in one cell.

In [None]:
# make data
a = np.random.normal(loc=0, scale=1, size=1000)
# convert to a pandas data frame (see later for pandas intro)
df = pd.DataFrame(data={'Numbers':a})
# write it to a CSV file, a plain-text format that is easy to read
df.to_csv('random_data.csv', index=False)

Let's take a peek at the file to see what is in it.

In [None]:
! head random_data.csv
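`! head` is an IPython shell escape that runs the Unix `head` command, so it may not work outside Jupyter/IPython or on systems without `head` (such as a plain Windows install). Here is a minimal pure-Python sketch that does the same job; the helper name `peek` and its `n` parameter are hypothetical, written only for this example.

```python
from itertools import islice


def peek(path, n=10):
    """Return the first n lines of a text file, like `head -n`.

    `peek` is a hypothetical helper written for this example.
    """
    with open(path) as f:
        # islice stops after n lines, so a large file is never read in full
        return [line.rstrip('\n') for line in islice(f, n)]
```

In this notebook, `peek('random_data.csv')` would return the header line `Numbers` followed by the first nine values.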

Now let's make a function to plot that data. Again, this could be in a separate script, but we'll just use a separate cell.

In [None]:
def plot():
    """
    Read a data file and plot it to another file.
    """
    tab = pd.read_csv('random_data.csv')
    data = tab['Numbers']
    smoothed = scipy.ndimage.gaussian_filter1d(data, sigma=2)
    
    fig = pyplot.figure()
    ax = fig.add_subplot(1,1,1)
    ax.plot(data, label='Raw')
    ax.plot(smoothed, label='Smoothed')
    ax.set_xlabel('Index')
    ax.set_ylabel('Value')
    ax.legend()
    pyplot.savefig('NicePlot.png')
    pyplot.show()
    return

plot()

## Questions

If I want to use a different data set (maybe something that's not random junk), what do I need to change?

If I want to save the plot to a different location or under a different name, what do I need to change?

If I have 10 input files, how do I automate reading all the data and saving all the figures?

---

The solution above is an improvement, but we want to be able to reuse the function with different filenames without editing the code.

How about we make the input/output filenames arguments to the function?

In [None]:
def plot(infile, outfile):
    """
    Read a data file and plot it to another file.
    
    Parameters
    ----------
    infile : str
        The input filename, in CSV format.
        Assumed to have a column called 'Numbers', which is the data we plot.

    outfile : str
        The output filename. The image format is inferred from the extension,
        so use one that matplotlib can write (e.g. .png or .pdf).
    """
    tab = pd.read_csv(infile)
    data = tab['Numbers']
    smoothed = scipy.ndimage.gaussian_filter1d(data, sigma=2)
    
    fig = pyplot.figure()
    ax = fig.add_subplot(1,1,1)
    ax.plot(data, label='Raw')
    ax.plot(smoothed, label='Smoothed')
    ax.set_xlabel('Index')
    ax.set_ylabel('Value')
    ax.legend()
    fig.savefig(outfile)
    return

In [None]:
plot(infile='random_data.csv',
     outfile='NicePlot.png')
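With the filenames as arguments, the earlier question about 10 input files becomes a loop. Below is a minimal sketch that plots every CSV file in a directory. It repeats the `plot` function from the cell above so the block runs on its own, and adds `pyplot.close(fig)` so figures do not accumulate in memory. The helper `plot_all`, its parameters, and the output folder name `figures` are hypothetical choices for illustration.

```python
from pathlib import Path

import pandas as pd
import scipy.ndimage
from matplotlib import pyplot


def plot(infile, outfile):
    """Read a CSV file with a 'Numbers' column and save a plot to outfile."""
    data = pd.read_csv(infile)['Numbers'].to_numpy()
    smoothed = scipy.ndimage.gaussian_filter1d(data, sigma=2)

    fig, ax = pyplot.subplots()
    ax.plot(data, label='Raw')
    ax.plot(smoothed, label='Smoothed')
    ax.set_xlabel('Index')
    ax.set_ylabel('Value')
    ax.legend()
    fig.savefig(outfile)
    pyplot.close(fig)  # release the figure; many open figures waste memory


def plot_all(indir='.', pattern='*.csv', outdir='figures'):
    """Plot every file in indir matching pattern; save <name>.png in outdir.

    plot_all and its defaults are hypothetical, written for this example.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    outputs = []
    for infile in sorted(Path(indir).glob(pattern)):
        outfile = outdir / (infile.stem + '.png')  # data1.csv -> data1.png
        plot(infile, outfile)
        outputs.append(outfile)
    return outputs
```

For example, `plot_all()` would turn `random_data.csv` in the current directory into `figures/random_data.png`.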

## Solved?

What other modifications would we need to make to this code so that we could have greater control over the input/output files?
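One possible direction is to let the user supply the filenames (and other options) on the command line, so the code never needs editing. The sketch below uses Python's standard `argparse` module; the function name `parse_args` and the options `--column` and `--sigma` are hypothetical additions, not something this notebook defines.

```python
import argparse


def parse_args(argv=None):
    """Parse input/output filenames and plotting options.

    With argv=None, argparse reads the script's command-line arguments.
    parse_args, --column and --sigma are hypothetical, for illustration.
    """
    parser = argparse.ArgumentParser(
        description='Plot a column of a CSV file and save the figure.')
    parser.add_argument('infile', help='input CSV file')
    parser.add_argument('outfile', help='output image, e.g. plot.png')
    parser.add_argument('--column', default='Numbers',
                        help='name of the column to plot (default: Numbers)')
    parser.add_argument('--sigma', type=float, default=2.0,
                        help='width of the Gaussian smoothing (default: 2)')
    return parser.parse_args(argv)
```

In a script, `args = parse_args()` would read the command line, after which you could call `plot(args.infile, args.outfile)`. Using `args.column` and `args.sigma` would also require adding matching parameters to `plot`.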