# Functions

Now that we have a baseline workflow, how do we organize it into those separate boxes in the workflow?  That is, how do we create those modules so that the code is modular?

We use functions!

## Introduction
How many of you have created a function in Python before?

Even if you haven't created one, you've almost certainly used them.  In fact, we've already used them today.  Remember get_record() from the dataretrieval package?  That's a function.

## Parts of a function
```df = nwis.get_record(sites='04294000', service='iv', start='2022-06-01', end='2022-11-01', parameterCD='00060')```  
  
***Name:*** get_record  
  This is how you call the function

***Parameters:*** sites, service, start, end, parameterCD
  These are variables you pass to the function to customize it... to get it to do what you want.

***Return Value:*** df
  This is what the function returns to you; here we store it in the variable df
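
To see the same three parts in a function we write ourselves, here is a minimal sketch.  The function mean_flow() and its parameters are made up for illustration (they are not part of dataretrieval), and the readings are just example numbers.

```python
# Name: mean_flow (hypothetical, for illustration only)
# Parameters: readings, decimals
def mean_flow(readings, decimals):
    # Work happens inside the function body
    average = sum(readings) / len(readings)
    # Return value: the rounded average is handed back to the caller
    return round(average, decimals)

# Call the function by name, pass values for its parameters,
# and store the return value in a variable
result = mean_flow(readings=[2240.0, 2210.0, 2210.0, 2190.0], decimals=1)
print(result)
```

Just like with get_record(), we call the function by its name, pass it what it needs, and catch what it returns in a variable.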

## Creating a Function

OK... let's create our first function for our first workflow block, acquire_data().  Note that I have to define it before I use it...


In [None]:
import dataretrieval.nwis as nwis

def acquire_data():
    df = nwis.get_record(sites='04294000', service='iv', start='2022-06-01', end='2022-11-01', parameterCD='00060')
    return df

returned_df = acquire_data()
returned_df

## Parameterizing the Function

So, now we have a function, but it's not super useful as a function... it always returns the same thing: streamflow at site 04294000 between June 1, 2022 and November 1, 2022.

Let's define some parameters to make this more useful...  But first, we need to talk about a critical concept when working with functions.

### Variable Scope

One of the most common things to trip up folks who are just starting with functions is the concept of variable scope.  When you create a function, all the variables you define within the function stay in the function.

That means they are ***NOT*** accessible outside the function. So, with this in mind, is df above, which is defined in acquire_data(), available outside acquire_data()?


In [None]:
df

So why do programming languages do this?  ***To make sure the functions stay modular.***  Functions should take as parameters everything they need to do their job and then return their work without modifying any other variables you define.  If functions could freely define and modify variables outside the function, this could quickly wreak havoc!  Because of variable scope, you can reuse variable names inside the function knowing that those variables are separate from any variables with the same name outside of the function:

In [None]:
import dataretrieval.nwis as nwis

def acquire_data():
    # df inside the function
    df = nwis.get_record(sites='04294000', service='iv', start='2022-06-01', end='2022-11-01', parameterCD='00060')
    return df

# a totally separate df outside the function
df = acquire_data()
df
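
If you want to try the scope rule without hitting the network, here is a small sketch.  It assumes a stand-in acquire_data() that returns a plain list of made-up values instead of calling nwis.get_record(); the variable name local_df is also just for illustration.

```python
def acquire_data():
    # local_df only exists while acquire_data() is running
    local_df = [2240.0, 2210.0, 2190.0]
    return local_df

# The returned value is available because we stored it in our own variable
returned = acquire_data()

# The function's internal variable name is NOT available out here
try:
    local_df
    visible_outside = True
except NameError:
    visible_outside = False

print(returned)
print(visible_outside)
```

The data comes back through the return value, but the name used inside the function never leaks out, so asking for it outside the function raises a NameError.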

OK, with variable scope understood, let's parameterize some of the variables in our function and change the name to better describe exactly what it does...

In [3]:
import dataretrieval.nwis as nwis

def acquire_streamflow_nwis_iv(site, start, end):
    df = nwis.get_record(sites=site, service='iv', start=start, end=end, parameterCD='00060')
    return df

df = acquire_streamflow_nwis_iv(site='04294000', start="2022-06-01", end="2022-11-01")
df

| datetime | site_no | 00060 | 00060_cd |
| --- | --- | --- | --- |
| 2022-06-01 04:00:00+00:00 | 04294000 | 2240.0 | A |
| 2022-06-01 04:15:00+00:00 | 04294000 | 2210.0 | A |
| 2022-06-01 04:30:00+00:00 | 04294000 | 2210.0 | A |
| 2022-06-01 04:45:00+00:00 | 04294000 | 2190.0 | A |
| 2022-06-01 05:00:00+00:00 | 04294000 | 2190.0 | A |
| ... | ... | ... | ... |
| 2022-11-02 02:45:00+00:00 | 04294000 | 914.0 | A |
| 2022-11-02 03:00:00+00:00 | 04294000 | 860.0 | A |
| 2022-11-02 03:15:00+00:00 | 04294000 | 780.0 | A |
| 2022-11-02 03:30:00+00:00 | 04294000 | 718.0 | A |
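
The payoff of parameters is that one function can now serve many requests.  The sketch below shows the idea without touching the network: fake_get_record() is a hypothetical stand-in for nwis.get_record() that only echoes what it was asked for, and acquire_streamflow_demo() is an illustrative name, not the function defined above.

```python
def fake_get_record(sites, start, end):
    # Hypothetical stand-in for nwis.get_record(): no download,
    # it just reports which request it received
    return {"site": sites, "start": start, "end": end}

def acquire_streamflow_demo(site, start, end):
    # The parameters flow straight through to the underlying call
    return fake_get_record(sites=site, start=start, end=end)

# Same function, different arguments, different results
june = acquire_streamflow_demo(site="04294000", start="2022-06-01", end="2022-06-30")
july = acquire_streamflow_demo(site="04294000", start="2022-07-01", end="2022-07-31")

print(june)
print(july)
```

Nothing inside the function changed between the two calls; only the values we passed in did.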


## Integrate back into Baseline Workflow

Now that we have this function, let's use it!

In [4]:
import dataretrieval.nwis as nwis

def acquire_streamflow_nwis_iv(site, start, end):
    df = nwis.get_record(sites=site, service='iv', start=start, end=end, parameterCD='00060')
    return df

# Acquire / Filter
# Replace old code with our new function
# df = nwis.get_record(sites='04294000', service='iv', start='2022-06-01', end='2022-11-01', parameterCD='00060')
df = acquire_streamflow_nwis_iv(site='04294000', start='2022-06-01', end='2022-11-01')

# Manipulate
daily = df['00060'].resample('1D').mean()

# Visualize
daily.describe()

count     155.000000
mean      989.816458
std      1278.847398
min       110.748958
25%       306.796875
50%       549.020833
75%      1064.348958
max      9091.354167
Name: 00060, dtype: float64

## Functionalize/Modularize Rest of the Workflow

In [7]:
import dataretrieval.nwis as nwis

def acquire_streamflow_nwis_iv(site, start, end):
    df = nwis.get_record(sites=site, service='iv', start=start, end=end, parameterCD='00060')
    return df

def resample_to_daily(df):
    return df['00060'].resample('1D').mean()

def visualize_summary_statistics(df):
    print(df.describe())

# Acquire / Filter
df = acquire_streamflow_nwis_iv(site='04294000', start='2022-06-01', end='2022-11-01')

# Manipulate
daily = resample_to_daily(df)

# Visualize
visualize_summary_statistics(daily)

count     155.000000
mean      989.816458
std      1278.847398
min       110.748958
25%       306.796875
50%       549.020833
75%      1064.348958
max      9091.354167
Name: 00060, dtype: float64
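
One benefit of splitting the workflow up like this is that each block can be tried on its own.  As a rough sketch, resample_to_daily() can be checked with a small synthetic DataFrame instead of a real NWIS download.  The values below are made up (two days of 15-minute readings), and the only assumption is that the data has a datetime index and a '00060' column like the real thing.

```python
import pandas as pd

def resample_to_daily(df):
    return df['00060'].resample('1D').mean()

# Synthetic data: 96 readings of 100.0 on day one, 96 readings of 200.0 on day two
index = pd.date_range('2022-06-01', periods=192, freq='15min', tz='UTC')
values = [100.0] * 96 + [200.0] * 96
fake_df = pd.DataFrame({'00060': values}, index=index)

# The function neither knows nor cares where the DataFrame came from
daily = resample_to_daily(fake_df)
print(daily)
```

Because the function takes everything it needs as a parameter, we can hand it any DataFrame with the right shape and see whether the daily means come out as expected.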
