<a href="https://colab.research.google.com/github/BYU-Hydroinformatics/baseflow-notebooks/blob/main/baseflow_utils.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

***This module collects helper functions for streamflow data analysis and related tasks: cleaning time series, flagging ice periods, smoothing data, converting geographic coordinates to grid indices, and scoring simulations against observations. The sections below describe each function and show how to use it.***

# Install the baseflow package from GitHub

In [None]:
!pip install git+https://github.com/BYU-Hydroinformatics/baseflow.git@merge-my-changes

Collecting git+https://github.com/BYU-Hydroinformatics/baseflow.git@merge-my-changes
  Cloning https://github.com/BYU-Hydroinformatics/baseflow.git (to revision merge-my-changes) to /tmp/pip-req-build-gkgcyy0b
  Running command git clone --filter=blob:none --quiet https://github.com/BYU-Hydroinformatics/baseflow.git /tmp/pip-req-build-gkgcyy0b
  Running command git checkout -b merge-my-changes --track origin/merge-my-changes
  Switched to a new branch 'merge-my-changes'
  Branch 'merge-my-changes' set up to track remote branch 'merge-my-changes' from 'origin'.
  Resolved https://github.com/BYU-Hydroinformatics/baseflow.git to commit 466857305d76f94ee4b28f38a9b2937a1050bf3b
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: baseflow
  Building wheel for baseflow (setup.py) ... done
  Created wheel for baseflow: filename=baseflow-0.0.9-py3-none-any.whl size=101286 sha256=add8e6e232d9e94c749a91bfff1177517242d3eca422e03113a7b81c94091773

# Load necessary packages

In [None]:
import baseflow  # needed below for baseflow.example
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objs as go

### Load example streamflow (Q) data

In [None]:
df = pd.read_csv(baseflow.example, index_col=0, parse_dates=True)
Q = df[df.columns[0]]
Q

time
2001-01-01     4.089
2001-01-02     6.633
2001-01-03     6.530
2001-01-04     4.725
2001-01-05     4.242
               ...  
2010-12-27     1.781
2010-12-28     1.338
2010-12-29     6.804
2010-12-30     4.191
2010-12-31    42.535
Name: GRDC_1160815, Length: 3652, dtype: float64

# Utils functions

This section introduces each function and shows how to use it.

Import the package and its utils module before calling the functions.

In [None]:
import baseflow
import baseflow.utils

## clean_streamflow

The clean_streamflow function cleans a streamflow time series, that is, a record of the flow of water in a river or stream over time. It removes invalid values and keeps only the years that have enough data points (at least 120, according to the help text below).
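The cleaning idea can be sketched in plain pandas. This is a minimal illustration and not the package's implementation: the name clean_streamflow_sketch is hypothetical, and treating non-finite or negative flows as "invalid" is an assumption, since the notebook does not say which values the real function rejects.

```python
import numpy as np
import pandas as pd


def clean_streamflow_sketch(series, min_points=120):
    """Drop invalid values, then keep only years with enough data points."""
    values = series.to_numpy(dtype=float)
    # Assumption: "invalid" means NaN, infinite, or negative flow
    valid = np.isfinite(values) & (values >= 0)
    kept = series[valid]
    # Count the remaining points in each calendar year
    per_year = kept.groupby(kept.index.year).transform("count")
    kept = kept[per_year >= min_points]
    return kept.to_numpy(), kept.index


# Synthetic data: 200 days in 2001, but only 50 days in 2002
dates = pd.date_range("2001-01-01", periods=200).append(
    pd.date_range("2002-01-01", periods=50)
)
flow = pd.Series(np.ones(len(dates)), index=dates)
flow.iloc[3] = np.nan   # missing value
flow.iloc[10] = -1.0    # negative flow

values, kept_dates = clean_streamflow_sketch(flow)
```

In this synthetic example the two bad values are dropped and the whole of 2002 is discarded, because it has fewer than 120 points.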

In [None]:
help(baseflow.utils.clean_streamflow)

Help on function clean_streamflow in module baseflow.utils:

clean_streamflow(series)
    Cleans a streamflow time series by removing invalid values and keeping only years with at least 120 data points.
    
    Args:
        series (pandas.Series): The streamflow time series to be cleaned.
    
    Returns:
        tuple: A tuple containing the cleaned streamflow values and the corresponding dates.



Call the function as described in the help text. It returns a tuple holding the cleaned values and their dates.

In [None]:
clean = baseflow.utils.clean_streamflow(Q)
clean

(array([ 4.089,  6.633,  6.53 , ...,  6.804,  4.191, 42.535]),
 DatetimeIndex(['2001-01-01', '2001-01-02', '2001-01-03', '2001-01-04',
                '2001-01-05', '2001-01-06', '2001-01-07', '2001-01-08',
                '2001-01-09', '2001-01-10',
                ...
                '2010-12-22', '2010-12-23', '2010-12-24', '2010-12-25',
                '2010-12-26', '2010-12-27', '2010-12-28', '2010-12-29',
                '2010-12-30', '2010-12-31'],
               dtype='datetime64[ns]', name='time', length=3652, freq=None))

In [None]:
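# Unpack the tuple returned by clean_streamflow into values and dates
clean_values, clean_dates = clean
fig = go.Figure()
fig.add_trace(go.Scatter(x=Q.index, y=Q.values, mode='lines', name='Q'))
fig.add_trace(go.Scatter(x=clean_dates, y=clean_values, mode='lines', name='clean'))
fig.show()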

## exist_ice

The exist_ice function checks whether a given date falls within a specified ice period. This is useful in hydrology and other environmental studies, where it matters whether ice is present at certain times of the year.
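A date-window check of this kind can be sketched as follows. The function in_period_sketch is a hypothetical helper written for illustration, not the package's code. It assumes the period is given as a start (month, day) and an end (month, day), both inclusive, and that a period may wrap around the new year.

```python
import numpy as np
import pandas as pd


def in_period_sketch(dates, start, end):
    """Return a boolean array: True where a date lies within [start, end]."""
    # Encode (month, day) as month * 100 + day so that pairs compare in order
    month_day = dates.month * 100 + dates.day
    first = start[0] * 100 + start[1]
    last = end[0] * 100 + end[1]
    if first <= last:
        mask = (month_day >= first) & (month_day <= last)
    else:
        # The period wraps around the new year, e.g. November to February
        mask = (month_day >= first) | (month_day <= last)
    return np.asarray(mask)


spring_dates = pd.to_datetime(
    ["2001-02-28", "2001-03-01", "2001-03-15", "2001-04-01", "2001-04-02"]
)
spring = in_period_sketch(spring_dates, (3, 1), (4, 1))

winter_dates = pd.to_datetime(["2001-12-25", "2001-01-10", "2001-06-01"])
winter = in_period_sketch(winter_dates, (11, 1), (2, 15))
```

Encoding each date as a single integer keeps the comparison vectorised, so the check runs over a whole DatetimeIndex at once.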

In [None]:
help(baseflow.utils.exist_ice)

Help on function exist_ice in module baseflow.utils:

exist_ice(date, ice_period)
    Checks if a given date falls within an ice period.
    
    Args:
        date (datetime.datetime): The date to check.
        ice_period (tuple or numpy.ndarray): The ice period, either as a tuple of (start_month, start_day, end_month, end_day) or as a numpy array of months.
    
    Returns:
        bool or numpy.ndarray: True if the date falls within the ice period, False otherwise. If `ice_period` is a numpy array, the return value will be a numpy array of the same shape.



In [None]:
# clean is a (values, dates) tuple; the dates are already a DatetimeIndex
clean_values, clean_dates = clean
ice = baseflow.utils.exist_ice(clean_dates, ice_period=((3, 1), (4, 1)))
ice

array([False, False, False, ..., False, False, False])

In [None]:
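# Build a Series of the cleaned values, then blank out everything outside the ice period
clean_ice = pd.Series(clean[0], index=clean[1])
clean_ice[~ice] = None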

In [None]:
# clean[0] holds the cleaned values and clean[1] the matching dates
fig = go.Figure()
fig.add_trace(go.Scatter(x=clean[1], y=clean[0], mode='lines', name='clean'))

fig.add_trace(go.Scatter(x=clean[1], y=clean_ice, mode='markers', name='ice'))

fig.update_layout(showlegend=True)

fig.show()

## moving_average

The moving_average function computes the moving average of an array of numbers. A moving average smooths data by averaging successive, overlapping windows of the full data set, which helps reveal trends over time.
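As a minimal sketch of the technique, a moving average can be computed with a NumPy convolution. The name moving_average_sketch is hypothetical. The help text does not say how baseflow.utils.moving_average treats the edges of the array, so returning only full windows here is an assumption.

```python
import numpy as np


def moving_average_sketch(x, w):
    """Average every full window of w consecutive values in x."""
    x = np.asarray(x, dtype=float)
    # Convolving with a kernel of w equal weights (1/w each) averages each window;
    # mode="valid" keeps only windows that fit entirely inside x
    return np.convolve(x, np.ones(w) / w, mode="valid")


smoothed = moving_average_sketch([1, 2, 3, 4, 5], 3)
```

With a window of 3 over five values, only three full windows exist, so the result has three entries: the means of (1, 2, 3), (2, 3, 4) and (3, 4, 5).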

In [None]:
help(baseflow.utils.moving_average)

Help on function moving_average in module baseflow.utils:

moving_average(x, w)
    Computes the moving average of the input array `x` using a window size of `w`.
    
    Args:
        x (numpy.ndarray): The input array.
        w (int): The window size for the moving average.
    
    Returns:
        numpy.ndarray: The moving average of the input array `x`.



In [None]:
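clean_list = clean[0]  # cleaned streamflow values from clean_streamflow
clean_series = pd.Series(clean_list)  # wrap the values in a pandas Series
mav = clean_series.rolling(window=5).mean()  # 5-point rolling mean via pandas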

In [None]:
# Plot the cleaned data together with its 5-point moving average (mav)
fig = go.Figure()
fig.add_trace(go.Scatter(y=clean[0], mode='lines', name='Original Data'))
fig.add_trace(go.Scatter(y=mav, mode='lines', name='Moving Average'))

fig.update_layout(showlegend=True)
fig.show()

## geo2imagexy

The purpose of the geo2imagexy function is to convert geographic coordinates, which are typically given as longitude (x) and latitude (y), into image coordinates, which are represented as column (col) and row (row) indices. This is useful when you want to map geographic data onto an image or a grid.
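The conversion can be sketched for a regular latitude/longitude grid. Everything about the grid here is an assumption made for illustration: the notebook does not state which grid the package uses, so the origin at (-180, 90), the 0.5-degree cell size, and the name geo_to_pixel_sketch are all hypothetical.

```python
import math


def geo_to_pixel_sketch(lon, lat, x_origin=-180.0, y_origin=90.0, cell=0.5):
    """Return the (col, row) of the grid cell that contains (lon, lat)."""
    # Columns count eastward from the western edge of the grid
    col = math.floor((lon - x_origin) / cell)
    # Rows count southward from the northern edge, so latitude is flipped
    row = math.floor((y_origin - lat) / cell)
    return col, row


corner = geo_to_pixel_sketch(-180.0, 90.0)
point = geo_to_pixel_sketch(10.3, 50.2)
```

The argument order matters: x is the longitude and y is the latitude, and row indices grow as latitude decreases.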

In [None]:
help(baseflow.utils.geo2imagexy)

Help on function geo2imagexy in module baseflow.utils:

geo2imagexy(x, y)
    Converts geographic coordinates (x, y) to image coordinates (col, row).
    
    Args:
        x (float): The x-coordinate in geographic space.
        y (float): The y-coordinate in geographic space.
    
    Returns:
        Tuple[int, int]: The corresponding column and row indices in image space.



In [None]:
# Latitude and longitude of the point of interest
lat = 34.0522
lon = -118.2437

# geo2imagexy expects x (longitude) first, then y (latitude)
col, row = baseflow.utils.geo2imagexy(lon, lat)

print("Column:", col)
print("Row:", row)

Column: 123
Row: 111


## Original Kling-Gupta Efficiency (KGE)

Original Kling-Gupta Efficiency (KGE) and its three components (r, α, β) as per [Gupta et al., 2009](https://doi.org/10.1016/j.jhydrol.2009.08.003).  

The kge function is designed to calculate a statistical measure called the Kling-Gupta Efficiency (KGE). This measure is used to evaluate how well a set of simulated data matches a set of observed or real-world data. The KGE is particularly useful in hydrology and environmental sciences to assess the performance of models that predict streamflow or other environmental variables.

Note that all four values (KGE, r, α, β) are returned, in this order.

**Calculation Details:**

$$
E_{\text{KGE}} = 1 - \sqrt{[r - 1]^2 + [\alpha - 1]^2 + [\beta - 1]^2}
$$

$$
r = \frac{\text{cov}(e, s)}{\sigma(e) \cdot \sigma(s)}
$$

$$
\alpha = \frac{\sigma(s)}{\sigma(e)}
$$

$$
\beta = \frac{\mu(s)}{\mu(e)}
$$

where *e* is the *evaluation* series, *s* is (one of) the *simulations* series, *cov* is the covariance, *σ* is the standard deviation, and *μ* is the arithmetic mean.
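The formulas above translate directly into NumPy. The following is a sketch of the calculation under the definitions given here, not the package's own code, and the name kge_sketch is hypothetical. It uses the population standard deviation and covariance throughout, so r is the ordinary Pearson correlation.

```python
import numpy as np


def kge_sketch(simulations, evaluation):
    """Return (KGE, r, alpha, beta) for one simulation series."""
    s = np.asarray(simulations, dtype=float)
    e = np.asarray(evaluation, dtype=float)
    # r: covariance of e and s divided by the product of their standard deviations
    cov = np.mean((e - e.mean()) * (s - s.mean()))
    r = cov / (e.std() * s.std())
    # alpha: ratio of standard deviations (variability error)
    alpha = s.std() / e.std()
    # beta: ratio of means (bias)
    beta = s.mean() / e.mean()
    kge_value = 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return kge_value, r, alpha, beta


evaluation = np.array([1.0, 2.0, 3.0, 4.0])
perfect = kge_sketch(evaluation, evaluation)
doubled = kge_sketch(2 * evaluation, evaluation)
```

A simulation identical to the evaluation series gives r = α = β = 1 and therefore KGE = 1. A simulation that exactly doubles the evaluation series keeps r at 1 but gives α = β = 2, so KGE = 1 - √2.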


In [None]:
help(baseflow.utils.kge)

In [None]:
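# Q is expected here to be a table with 'simulations' and 'evaluation' columns;
# the example series loaded above has neither, so supply your own data first
simulations = Q['simulations'].values
evaluation = Q['evaluation'].values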

In [None]:
kge_value = baseflow.utils.kge(simulations, evaluation)

print(f"Kling-Gupta Efficiency (KGE): {kge_value}")