## Getting started 2: the batteries

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cedargrid/grid-battery-data/blob/main/notebooks/02_getting_started_battery.ipynb)

This notebook looks at the battery-level data from M5BAT. It complements the system-level data that we explored in a separate notebook. There are 10 batteries in this dataset; we only examine a couple of them here.

The data is available in a [zip file here](https://publications.rwth-aachen.de/record/985923/files/M5BAT_04-2023_RAW.zip). There's an [overview of the data here](https://publications.rwth-aachen.de/record/985923/files/Report_04-2023.pdf).

After you download the data, unzip it in your working directory. You might need to adjust the paths below.

Python 3.9 or newer should be fine for this notebook. Check the requirements.txt file for required packages.

In [None]:
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

pd.options.display.float_format = '{:.2f}'.format
pio.templates.default = 'seaborn'

## Battery data
This is data captured at each individual battery unit.

Refer to https://publications.rwth-aachen.de/record/985923/files/Report_04-2023.pdf for details.

Description of the data:

| Variable | Description | Unit |
| --- | --- | ---- |
| DateAndTime | Date and Time | UTC Timezone ('yyyy-MM-dd HH:mm:ss')|
| P_AC_Set | Setpoint for active power of battery unit after the inverter | kW (- = charging; + = discharging) |
| Q_AC_Set | Setpoint for reactive power of battery unit after the inverter | kVAr |
| P_AC | Active power of battery unit after the inverter | kW (- = charging; + = discharging) |
| Q_AC | Reactive power of battery unit after the inverter | kVAr |
| SOC | State of Charge (BMS value) | 0.1% |
| I_DC_Batt | Current measured at battery unit | 0.1A |
| U_DC_Batt | Voltage measured at battery unit | 0.1V |
| Mode_PQ | Inverter Mode (Power output) | True / False |
| Mode_Stop | Inverter Mode (Stop) | True / False |
| Mode_Silent | Inverter Mode (Silent) | True / False |
| Mode_Wait | Inverter Mode (Switching) | True / False |
| Interpolated | Interpolation signal for data evaluation | True = value was linearly interpolated |
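
Several columns in the table are stored in tenths of a unit (SOC in 0.1%, current in 0.1 A, voltage in 0.1 V). As a minimal sketch of how the raw values map to physical units, the helper below divides those columns by 10. The function name `to_physical_units` and the tiny example frame are made up for illustration; the cells later in this notebook do the same conversion inline and do not depend on this helper.

```python
import pandas as pd

# Columns stored in tenths of a unit, according to the table above.
TENTHS_COLUMNS = ["SOC", "I_DC_Batt", "U_DC_Batt"]

def to_physical_units(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with SOC in %, current in A, and voltage in V."""
    out = df.copy()
    for col in TENTHS_COLUMNS:
        if col in out.columns:
            out[col] = out[col] / 10
    return out

# Made-up values, only to show the effect of the scaling.
raw = pd.DataFrame({"SOC": [500, 505], "I_DC_Batt": [-1200, 3000], "U_DC_Batt": [7000, 6950]})
physical = to_physical_units(raw)
print(physical)
```

Apply the helper only once per frame: running it twice, or combining it with the inline `/ 10` conversions used later, would scale the values twice.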

## Loading the data
The following cell shows how to read the raw data. Uncomment the code to run it.

**If you don't want to bother with the raw data, skip to the next cell.**

In [None]:
# We'll start with battery number 10, which has the smallest capacity (230 kWh) and is the only one with an LTO chemistry.

# batt_full = pd.read_csv('Batt10.csv', sep=';', parse_dates=['DateAndTime'], index_col='DateAndTime')

# We'll downsample this for an initial exploration.

#batt = batt_full.sample(frac=.001, random_state=13).sort_index()  # data gets scrambled when sampling, so we re-sort
#batt

In [None]:
# Pre-sampled data (< 1 MB) stored in github.
github_file_path = 'https://github.com/cedargrid/grid-battery-data/raw/e4beb41c933ee8825349315bee74a26209c99588/notebooks/batt10_sample.parquet'
batt = pd.read_parquet(github_file_path)

In [None]:
fig = px.line(batt, y=batt.SOC / 10, labels=dict(y='state of charge %'))
fig.update_yaxes(range=[0, 100])  # show the full 0-100% SOC range

The battery is kept mostly in the 30%-70% SOC range and is charged / discharged quite actively.

## One day energy flow

In [None]:
# Generate the oneday sample from the raw data:
# apr13 = datetime(2023, 4, 13)  # nothing particularly special about this date.
# apr14 = datetime(2023, 4, 14)
# oneday = batt_full.loc[apr13 : apr14].copy()

github_file_path = 'https://github.com/cedargrid/grid-battery-data/raw/13d27e96a8067a3d817517fa439e95330d33b4a7/notebooks/batt10_one_day.parquet'
oneday = pd.read_parquet(github_file_path)
oneday['current'] = oneday.I_DC_Batt / 10  # convert to units of amps
oneday['voltage'] = oneday.U_DC_Batt / 10  # convert to units of volts

First we'll look at how current flows in and out of the battery.

In [None]:
px.line(oneday, y='current')

We see a couple of things:
* The battery is discharging at up to ~900A at times, and charging at up to ~600A. 
* The current changes sign frequently, meaning the battery goes from absorbing energy to releasing energy and vice versa many times a day. (Bonus question: what properties of a battery are important for this type of use?)

Let's quantify how often it is changing direction.

In [None]:
inclusive_changes = (np.sign(oneday.current) != np.sign(oneday.current.shift())).sum()  # includes changes from 0 to - or 0 to + and changes from - to + or + to -.
exclusive_changes = (np.sign(oneday.current) * np.sign(oneday.current.shift()) < 0).sum()  # only includes changes from - to + and + to -

print(inclusive_changes, exclusive_changes, len(oneday))

In one day, the current changes direction about 800 times (or about 1500, depending on how you count). That's roughly once every minute or two! So we see that this battery is quite dynamic. On the other hand, the amount of energy pushed or pulled in each of these segments is fairly small.
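
To make the difference between the two counts concrete, here is a small sketch on a made-up current trace (not taken from the dataset). It applies the same two definitions as the cell above, except that the first row is skipped in the inclusive count, since it has no predecessor and `NaN != x` would otherwise register as a change.

```python
import numpy as np
import pandas as pd

# Made-up current trace in amps, chosen to include zeros and direct flips.
current = pd.Series([5.0, 3.0, 0.0, -2.0, -4.0, 1.0, 0.0, 0.0, 2.0])
sign = np.sign(current)
prev = sign.shift()

# Inclusive: any change in sign value, including moves to or from zero.
inclusive = int((sign.iloc[1:] != prev.iloc[1:]).sum())

# Exclusive: only direct flips between + and - (the product of the signs is negative).
exclusive = int((sign * prev < 0).sum())

print(inclusive, exclusive)  # 5 1
```

In this toy trace only the step from -4 A to 1 A is a direct flip; the other four changes pass through zero, which is why the inclusive count is so much larger.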

Next let's examine how power and energy are distributed: how much energy is flowing into and out of the battery in a day? And how is power distributed?


In [None]:
# Instantaneous DC power at each one-second step, in watts
oneday['power'] = oneday['voltage'] * oneday['current']

# Sign convention: negative power = charging, positive power = discharging
charge_hours = (oneday.power < 0).sum() / 60 / 60
discharge_hours = (oneday.power > 0).sum() / 60 / 60
nothing_hours = (oneday.power == 0).sum() / 60 / 60

print(f"The battery is discharging for {discharge_hours:.0f} hours; "
      f"charging for {charge_hours:.0f} hours; and "
      f"doing nothing for {nothing_hours:.0f} hours.")

# Plot a histogram of the entire day. We take a log of the count because the graph is hard to read in the normal scale (try it)
px.histogram(oneday, x=oneday.power / 1000, log_y=True, labels=dict(x='power (kW)'))

The histogram shows that the most common state of the battery is at or near 0 kW. It also shows that roughly +/- 200 kW is another common state.

In [None]:
# Power is in watts and each time step is one second, so each sample is an energy in joules; dividing by 3.6e6 converts to kWh
oneday['energy'] = oneday['power'] / 3.6e6
total_charged_energy = oneday.loc[oneday.power < 0].energy.sum()     # negative power = charging
total_discharged_energy = oneday.loc[oneday.power > 0].energy.sum()  # positive power = discharging
net_energy = oneday.energy.sum()
print(f"Over the day, {-total_charged_energy:.0f} kWh flowed into the battery; "
      f"{total_discharged_energy:.0f} kWh flowed out of the battery; so "
      f"net energy was: {net_energy:.0f} kWh")
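
As a sanity check on the watt-to-kWh conversion, the sketch below runs the same arithmetic on a made-up profile where the answer is easy to work out by hand: one hour of charging at 72 kW should give 72 kWh, and half an hour of discharging at 72 kW should give 36 kWh. The voltage and current values are invented, and the sign convention (negative = charging) is the one from the table at the top of the notebook.

```python
import numpy as np

# Made-up one-second samples: 1 h charging at -100 A, then 30 min discharging at +100 A.
current = np.concatenate([np.full(3600, -100.0), np.full(1800, 100.0)])  # amps
voltage = np.full(current.size, 720.0)  # volts, held constant for simplicity

power_w = voltage * current      # watts; negative = charging
energy_kwh = power_w / 3.6e6     # W * 1 s = J, and 3.6e6 J = 1 kWh

charged_kwh = -energy_kwh[power_w < 0].sum()
discharged_kwh = energy_kwh[power_w > 0].sum()

print(f"charged: {charged_kwh:.1f} kWh, discharged: {discharged_kwh:.1f} kWh")
```

The division by 3.6e6 only works because the samples are one second apart; for any other sampling interval, multiply each power value by the step length in seconds first.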

## Exercises
1. Summarize the battery stats day over day. Start with stats similar to those shown above (number of sign changes, time spent discharging/charging). Hint: use `resample('1d')` followed by an aggregate. Do these change much each day? Are there patterns? Extra research: what might be driving the frequency shifts?
2. Load the data from one of the other batteries in the dataset (your choice) and compare. Do these have a similar number of sign changes? Is the power / energy similar across the batteries?
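
One possible starting point for exercise 1 is sketched below. It uses a synthetic two-day current trace (random numbers, not real data) so that it runs without the raw files; with the real data you would replace `current` with the converted `I_DC_Batt` column of the full battery frame. The column names in `daily` are our own choice.

```python
import numpy as np
import pandas as pd

# Synthetic one-second current trace covering two days; values are random, not real data.
rng = np.random.default_rng(0)
index = pd.date_range("2023-04-01", periods=2 * 86400, freq="1s")
current = pd.Series(rng.normal(0, 100, size=index.size), index=index)

sign = np.sign(current)
flips = sign * sign.shift() < 0  # True where the current flips directly between + and -

# One row per calendar day; each sample represents one second.
daily = pd.DataFrame({
    "sign_flips": flips.resample("1D").sum(),
    "charge_hours": (current < 0).resample("1D").sum() / 3600,
    "discharge_hours": (current > 0).resample("1D").sum() / 3600,
})
print(daily)
```

Because the flips are computed before resampling, a flip that happens exactly at midnight is still counted (it lands in the later day).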

## Correlating the grid frequency with the battery current

The battery current flows in response to the frequency of the grid. Can we see that in the data? 

Note that this section looks at batteries 1 and 10.

First we load the raw data, from 3 files this time, and then combine them. Uncomment to run.

**If you don't want to bother with the raw data, skip to the next cell**

In [None]:
# Load the raw data.  Again, you can skip to the next step if you don't want to bother with it.
'''  <--- remove to run
batt1_full = pd.read_csv('Batt1.csv', sep=';', parse_dates=['DateAndTime'], index_col='DateAndTime')
batt10_full = pd.read_csv('Batt10.csv', sep=';', parse_dates=['DateAndTime'], index_col='DateAndTime')
bess = pd.read_csv('BESS.csv', sep=';', parse_dates=['DateAndTime'], index_col='DateAndTime')

# Process it
eg_day = datetime(2023, 4, 7)  # just picked at random; nothing special about this day
eg_day_range = pd.date_range(eg_day, eg_day + timedelta(days=1), freq='1s', inclusive='left')
oneday_batt1 = batt1_full.loc[eg_day_range]
oneday_batt10 = batt10_full.loc[eg_day_range]
oneday_bess = bess.loc[eg_day_range]

together = oneday_batt1.join(oneday_batt10, lsuffix='_batt1', rsuffix='_batt10')
together = together.join(oneday_bess, lsuffix='_batt', rsuffix='_bess')
'''

In [None]:
github_file_path = 'https://github.com/cedargrid/grid-battery-data/raw/13d27e96a8067a3d817517fa439e95330d33b4a7/notebooks/bess_battery_together_sample.parquet'
together = pd.read_parquet(github_file_path)

When the grid frequency is above the nominal 50 Hz (recorded as 50000 in the `Grid_frequency` column, which suggests units of mHz), the battery should respond by absorbing energy, so the current should go negative. Conversely, when the grid frequency is below 50 Hz, we'd expect the current to be positive. To verify, we correlate the sign of the frequency deviation from nominal with the sign of the battery current.

In [None]:
together['grid_freq_sign'] = np.sign(together['Grid_frequency'] - 50000)
together['batt1_current_sign'] = np.sign(together['I_DC_Batt_batt1'])
together['batt10_current_sign'] = np.sign(together['I_DC_Batt_batt10'])

#print(together['grid_freq_sign'].value_counts())
#print(together['batt10_current_sign'].value_counts())

print(together[['grid_freq_sign', 'batt1_current_sign']].corr())

So they are inversely correlated, as expected, but not perfectly so. Next we investigate whether the two signals are offset in time; e.g. perhaps the battery takes some time to respond.

In [None]:
shift_seconds = list(range(-120, 120, 5))
correlations_1 = [together['grid_freq_sign'].corr(together['batt1_current_sign'].shift(i), method='spearman') for i in shift_seconds]
fig = go.Figure()
fig.add_scatter(x=shift_seconds, y=correlations_1, name='battery 1')
fig.update_layout(xaxis_title="Time shift (s)", yaxis_title="corr coefficient", yaxis_range=[-.7, 0], title='Correlation between battery and grid freq, shifted in time')


The plot shows that the anti-correlation is strongest at a shift of 5-10 seconds; i.e. the battery is about 5-10 seconds behind.
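
The sign of the shift is easy to misread, so here is a small sketch on synthetic signals with a known delay. The response is built to follow the input 7 samples later (an invented number) with the opposite sign, and the scan uses the same convention as the cell above: `shift(i)` is applied to the responding series. With that convention, pandas moves values later in time for positive `i`, so a lagging response shows its strongest anti-correlation at a negative shift. This is worth keeping in mind when reading the sign of the peak in the plot.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n, true_lag = 5000, 7  # true_lag is a made-up response delay, in samples

# Stand-in for the grid frequency deviation, and a delayed, inverted, slightly noisy response.
freq_dev = pd.Series(rng.normal(size=n))
response = -freq_dev.shift(true_lag) + 0.1 * pd.Series(rng.normal(size=n))

shifts = list(range(-20, 21))
corrs = [freq_dev.corr(response.shift(i)) for i in shifts]

best_shift = shifts[int(np.argmin(corrs))]  # shift with the strongest anti-correlation
print(best_shift)  # -7
```

Shifting the frequency series instead of the response would flip the sign of the result, so it helps to state which series is shifted whenever a lag is quoted.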

## Exercises
The battery current and the grid frequency deviation are anti-correlated, but not perfectly so. Why?
1. There is some margin around the nominal 50 Hz frequency (50000 in the data) within which the system does not react. Explore different margin sizes to see the effect.
2. Look at the data from the remaining batteries. Are they moving in unison with the battery described here?
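
For exercise 1, one way to explore the margin is to drop the samples whose frequency lies within the margin before correlating the signs. The sketch below shows the idea on synthetic data. The 10 mHz deadband, the frequency spread, and the gain are all invented for illustration, and the helper `sign_corr` is hypothetical; with the real data you would use the `Grid_frequency` and `I_DC_Batt_batt1` columns of `together` instead.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 10_000
deadband_mhz = 10  # hypothetical deadband; the real value is not stated in this notebook

# Synthetic stand-ins: frequency around 50000, and a current that only reacts outside the deadband.
freq = pd.Series(50_000 + rng.normal(0, 20, size=n))
dev = freq - 50_000
current = pd.Series(np.where(dev.abs() > deadband_mhz, -5.0 * dev, 0.0))

def sign_corr(margin_mhz: float) -> float:
    """Correlate sign(frequency deviation) with sign(current), ignoring samples within the margin."""
    mask = dev.abs() > margin_mhz
    return np.sign(dev[mask]).corr(np.sign(current[mask]))

results = {m: sign_corr(m) for m in (0, 5, 10, 20)}
for margin, corr in results.items():
    print(f"margin {margin:>2} mHz: corr = {corr:.3f}")
```

In this toy setup the correlation reaches -1 once the margin is at least as wide as the deadband. On the real data it will likely stay short of -1, since a deadband is only one of several possible reasons for the mismatch.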

## Aside / appendix: sanity checking the change in energy by the change in state-of-charge (SOC)

This section shows that the SOC readings are a little suspicious. It is not essential for working with the data, but we kept it in because working through data issues is so common.

In [None]:
oneday['soc_change'] = oneday.SOC.diff()
net_soc_change = oneday.soc_change.sum() / 10   # SOC is in units of .1%
est_net_energy = net_soc_change / 100 * 230
print(f"Net SOC change: {net_soc_change:.1f}%; estimated net energy: {est_net_energy:.0f} kWh")

The sanity check failed! Let's look at the change in SOC over time.

In [None]:
px.scatter(oneday, y=oneday.soc_change / 10, labels=dict(y='soc change %'))

This is suspicious: there are several places where the SOC jumps by 2-3 percentage points instantaneously. This is theoretically possible, but it does not match the power values at those points. Thus it appears there are data issues with the SOC.
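
One way to locate such jumps is to compare each step's SOC change with the change that the measured energy would explain. The sketch below does this on made-up data with one injected jump. The 0.5 percentage point tolerance is an arbitrary choice, and the comparison ignores conversion losses and the 0.1% resolution of the SOC signal, so treat it as a rough filter, not a calibrated check.

```python
import numpy as np
import pandas as pd

capacity_kwh = 230  # nominal capacity of battery 10, as stated earlier in this notebook

# Made-up one-second samples: steady 100 kW charging, with one artificial SOC jump.
n = 600
power_w = np.full(n, -100_000.0)                 # negative = charging
energy_in_kwh = -power_w / 3.6e6                 # energy absorbed in each second
soc_pct = 50 + np.cumsum(energy_in_kwh) / capacity_kwh * 100
soc_pct[300:] += 2.5                             # injected jump of 2.5 percentage points

df = pd.DataFrame({"soc_pct": soc_pct, "energy_in_kwh": energy_in_kwh})
df["soc_change"] = df.soc_pct.diff()
df["expected_change"] = df.energy_in_kwh / capacity_kwh * 100

tolerance_pct = 0.5  # hypothetical threshold, in percentage points
suspicious = df[(df.soc_change - df.expected_change).abs() > tolerance_pct]
print(suspicious)
```

Here only the row with the injected jump is flagged. On the real data, the flagged rows could then be inspected next to the power values around the same timestamps.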