# Developing Functions

This exercise will lead you through taking some common data processing steps and wrapping them up into a reusable function.

In [None]:
import pandas as pd
import numpy as np
from pathlib import Path  # handy for working with file paths consistently across operating systems (Windows, macOS, Unix)

In [None]:
pd.__version__ # check we're working with the same version of pandas

In [None]:
data_filepath = Path("../data/my_csv_file.csv")

In [None]:
# import a csv file
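
One possible way to fill in this cell is with `pd.read_csv`. So that the sketch runs without `../data/my_csv_file.csv`, it reads a small, made-up CSV from memory. The column names (`Date`, `Site`, `Temperature (C)`, `Humidity (%)`) and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical file contents standing in for ../data/my_csv_file.csv,
# so this sketch runs without the real file.
csv_text = """Date,Site,Temperature (C),Humidity (%)
2021-01-01,A,5.2,81
2021-01-02,A,,
2021-01-03,A,6.1,0
"""

# pd.read_csv accepts a file path (such as data_filepath) or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```

With the real file, the equivalent call is `df = pd.read_csv(data_filepath)`.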

In [None]:
# rename some of the columns
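
A sketch of renaming with `DataFrame.rename`, assuming hypothetical original headers; swap in the headers from your own file. Short, lowercase names without spaces or bracketed units are easier to type in later steps.

```python
import pandas as pd

# A one-row frame with hypothetical original headers.
df = pd.DataFrame(
    [["2021-01-01", "A", 5.2, 81.0]],
    columns=["Date", "Site", "Temperature (C)", "Humidity (%)"],
)

# Map the awkward original headers to short, code-friendly names.
column_map = {
    "Date": "date",
    "Site": "site",
    "Temperature (C)": "temperature_c",
    "Humidity (%)": "humidity_pct",
}
df = df.rename(columns=column_map)
print(list(df.columns))
```

Headers that are not listed in `column_map` are left unchanged, so a partial mapping also works.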

In [None]:
# check the data types are correct
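
`read_csv` infers each column's type, and a single stray entry (for example a logger error code in a numeric column) can leave a whole column stored as text. A minimal sketch of checking and fixing this, assuming hypothetical `date` and `temperature_c` columns and a made-up `"ERR"` code:

```python
import pandas as pd

# Hypothetical raw values, all stored as text, with a logger error code ("ERR")
# sitting in the numeric temperature column.
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "temperature_c": ["5.2", "ERR", "6.1"],
})
print(df.dtypes)  # inspect the types pandas has assigned

# Parse the dates, and coerce measurements to numbers; unparseable entries become NaN.
df["date"] = pd.to_datetime(df["date"])
df["temperature_c"] = pd.to_numeric(df["temperature_c"], errors="coerce")
print(df.dtypes)
```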

In [None]:
# drop rows with no data
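
Rows whose measurements are all empty can be removed with `DataFrame.dropna`. Which columns count as "data" is an assumption here (hypothetical `temperature_c` and `humidity_pct`); passing them as `subset` matters because the `date` column is never empty.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "temperature_c": [5.2, np.nan, 6.1],
    "humidity_pct": [81.0, np.nan, np.nan],
})

# dropna(how="all") on the whole frame would drop nothing, because every row
# has a date; restrict the check to the measurement columns instead.
value_columns = ["temperature_c", "humidity_pct"]
df = df.dropna(subset=value_columns, how="all")
print(df)
```

Using `how="any"` instead would also drop the third row, which still has a temperature reading.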

In [None]:
# fill zeros with np.nan
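
Whether a zero should become `np.nan` depends on what zero means in each column. A sketch assuming a hypothetical `humidity_pct` column where the logger writes 0 when it records nothing, while `temperature_c` keeps its zeros because 0 °C is a real reading:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature_c": [5.2, 0.0, 6.1],
    "humidity_pct": [81.0, 77.0, 0.0],
})

# Assumption: a humidity of exactly 0 means "no data", so mark it as missing.
# Temperature is left alone, since 0 °C is a plausible measurement.
df["humidity_pct"] = df["humidity_pct"].replace(0, np.nan)
print(df)
```

`np.nan` is also the default of the `fill_with` argument in the function further down.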

In [None]:
# add a new parameter
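
Reading "parameter" as a new column derived from existing ones, here is a sketch that uses a Celsius-to-Fahrenheit conversion as a stand-in for whatever quantity you actually need (the column names are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"temperature_c": [5.2, np.nan, 6.1]})

# Derive a new column from an existing one; NaN inputs simply stay NaN.
df["temperature_f"] = df["temperature_c"] * 9 / 5 + 32
print(df)
```

Column arithmetic is vectorised, so no loop is needed, and missing inputs propagate as `NaN` rather than raising an error.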

In [None]:
# add a data quality flag
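
A data quality flag can be built from boolean masks. The rules below (any missing value gives `"missing"`, a temperature outside -50 to 50 °C gives `"suspect"`) and the column names are hypothetical; substitute the checks that make sense for your data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature_c": [5.2, np.nan, 6.1, 60.0],
    "humidity_pct": [81.0, 77.0, np.nan, 50.0],
})

# Boolean masks for each (hypothetical) rule.
missing = df[["temperature_c", "humidity_pct"]].isna().any(axis=1)
out_of_range = ~df["temperature_c"].between(-50, 50)

# The outer condition wins, so a row with a missing value is flagged "missing"
# even though NaN also fails the range check.
df["quality_flag"] = np.where(missing, "missing", np.where(out_of_range, "suspect", "ok"))
print(df)
```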

In [None]:
# output this to a 'processed files' folder
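
A sketch of writing the result out with `DataFrame.to_csv`. It writes to a temporary folder rather than `../data/processed/`, and the `_processed` file-name suffix is a hypothetical convention.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"temperature_c": [5.2, 6.1], "quality_flag": ["ok", "ok"]})
input_filepath = Path("../data/my_csv_file.csv")  # only its name is used here

# A temporary folder keeps this sketch from writing into your project;
# in the notebook you would use Path("../data/processed/") instead.
output_folder = Path(tempfile.mkdtemp()) / "processed"
output_folder.mkdir(parents=True, exist_ok=True)  # create the folder if it does not exist yet

# Hypothetical naming scheme: reuse the input file's stem with a suffix.
output_filepath = output_folder / f"{input_filepath.stem}_processed.csv"
df.to_csv(output_filepath, index=False)  # index=False avoids an extra unnamed column
print(output_filepath)
```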

In [None]:
# Combine the data processing steps above into a reusable function.

def process_csv_file(filepath, output_folder=Path("../data/processed/"), fill_with=np.nan):
    """
    Process a csv file so it's ready for exploratory data analysis.

    Parameters
    ----------
    filepath : pathlib.Path
        Path to the csv file to import.
    output_folder : pathlib.Path, optional
        Path to the folder where the processed version should be saved.
    fill_with : optional
        Value to substitute for zeros (default ``np.nan``).

    Returns
    -------
    output_filepath : pathlib.Path
        Path to the processed csv file that is written out.

    Notes
    -----
    This function combines the steps above: it renames columns, converts data
    types, drops rows with no data, fills zeros with ``fill_with``, adds a new
    parameter and a data quality flag, then writes the result to ``output_folder``.
    """
    pass
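
For reference, here is one possible body for the function, stitched together from the individual steps. It is a sketch under stated assumptions: the input headers in `COLUMN_MAP`, the zero-means-missing rule for humidity, the Fahrenheit column and the flag thresholds are all hypothetical, and the usage lines at the bottom run it on a made-up file in a temporary folder.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical column names; adapt these to your own file.
COLUMN_MAP = {
    "Date": "date",
    "Site": "site",
    "Temperature (C)": "temperature_c",
    "Humidity (%)": "humidity_pct",
}
VALUE_COLUMNS = ["temperature_c", "humidity_pct"]


def process_csv_file(filepath, output_folder=Path("../data/processed/"), fill_with=np.nan):
    """Read, clean, and save a csv file; return the path of the processed copy."""
    filepath = Path(filepath)
    output_folder = Path(output_folder)

    # Import the file and rename the columns
    df = pd.read_csv(filepath).rename(columns=COLUMN_MAP)

    # Check data types: parse dates, coerce measurements to numbers
    df["date"] = pd.to_datetime(df["date"])
    for col in VALUE_COLUMNS:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # Drop rows where every measurement is missing
    df = df.dropna(subset=VALUE_COLUMNS, how="all")

    # Fill zeros (assumption: a zero humidity reading means "no data")
    df["humidity_pct"] = df["humidity_pct"].replace(0, fill_with)

    # Add a new parameter (hypothetical example)
    df["temperature_f"] = df["temperature_c"] * 9 / 5 + 32

    # Add a data quality flag (hypothetical rules; "missing" takes precedence)
    missing = df[VALUE_COLUMNS].isna().any(axis=1)
    out_of_range = ~df["temperature_c"].between(-50, 50)
    df["quality_flag"] = np.where(missing, "missing", np.where(out_of_range, "suspect", "ok"))

    # Output to the processed folder, creating it if needed
    output_folder.mkdir(parents=True, exist_ok=True)
    output_filepath = output_folder / f"{filepath.stem}_processed.csv"
    df.to_csv(output_filepath, index=False)
    return output_filepath


# Usage: write a made-up sample file to a temporary folder and process it.
work_dir = Path(tempfile.mkdtemp())
sample_file = work_dir / "my_csv_file.csv"
sample_file.write_text(
    "Date,Site,Temperature (C),Humidity (%)\n"
    "2021-01-01,A,5.2,81\n"
    "2021-01-02,A,,\n"
    "2021-01-03,A,6.1,0\n"
)
result_path = process_csv_file(sample_file, output_folder=work_dir / "processed")
result = pd.read_csv(result_path)
print(result)
```

Keeping the column map and value columns as module-level constants makes the assumptions about the file easy to find and change in one place.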

Copy this function, together with the libraries imported above, into a separate file, `processor.py`, inside a `process_pipeline` folder.

Now when we want to use this function, we can import it:

In [None]:
from process_pipeline.processor import process_csv_file
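
For this import to work, Python has to find a `process_pipeline` folder containing `processor.py` on `sys.path`. The sketch below builds that layout in a temporary folder (with a placeholder function body) to show what the import expects. The empty `__init__.py` is optional on Python 3.3+, where a folder without one is treated as a namespace package.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway copy of the expected layout:
#   <root>/process_pipeline/__init__.py
#   <root>/process_pipeline/processor.py
root = Path(tempfile.mkdtemp())
package_dir = root / "process_pipeline"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")  # marks the folder as a regular package
(package_dir / "processor.py").write_text(textwrap.dedent('''
    from pathlib import Path

    import numpy as np


    def process_csv_file(filepath, output_folder=Path("../data/processed/"), fill_with=np.nan):
        """Placeholder body for this demo."""
        pass
'''))

# The folder that *contains* process_pipeline must be on sys.path.
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the newly written files are noticed

from process_pipeline.processor import process_csv_file

print(process_csv_file.__module__)
```

If `process_pipeline` sits in the notebook's working directory, the `sys.path` step is usually unnecessary, since the working directory is normally searched already.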

To check which arguments the function takes, you can use the built-in help:

In [None]:
help(process_csv_file)
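
`help()` prints the signature and docstring. The standard-library `inspect` module returns the same information as Python objects, which can be handy in scripts; inside Jupyter, `process_csv_file?` also displays it. A sketch using a copy of the stub:

```python
import inspect
from pathlib import Path

import numpy as np


def process_csv_file(filepath, output_folder=Path("../data/processed/"), fill_with=np.nan):
    """
    Process a csv file so it's ready for exploratory data analysis.
    """
    pass


# The signature lists each argument and its default value.
signature = inspect.signature(process_csv_file)
print(signature)
for name, param in signature.parameters.items():
    # inspect.Parameter.empty marks an argument that has no default
    print(name, "default:", param.default)

# getdoc returns the docstring with its indentation cleaned up.
print(inspect.getdoc(process_csv_file))
```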

In [None]:
process_csv_file(data_filepath,
                 output_folder=Path("../data/another_processed_data_folder/"),
                 fill_with=" ")
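
Passing `fill_with=" "` replaces zeros with a text value, which changes the column's type. A small sketch with a hypothetical series shows the difference from the `np.nan` default:

```python
import numpy as np
import pandas as pd

readings = pd.Series([81.0, 0.0, 77.0])

with_nan = readings.replace(0, np.nan)  # stays numeric (float64)
with_space = readings.replace(0, " ")   # mixes numbers and text, so the dtype becomes object

print(with_nan.dtype, with_space.dtype)
print(with_nan.mean())  # NaN is skipped, so this averages 81.0 and 77.0
```

Numeric operations such as `mean()` work on the `np.nan` version but fail on the object column, so a text placeholder is best kept for output that is only meant to be read, not analysed further.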