### Chapter 3 - DS for Public Policy
- Exploration of solar data from the National Institute of Standards and Technology (NIST) Net Zero Energy Residential Test Facility, a house that produces as much energy as it uses in a year.
- We look at a slice of the Net Zero house's photovoltaic (PV) time series data from 2015. The dataset contains hourly estimates of solar energy production and exposure for the house's solar panels.
- We want to know how much energy is produced in a year and how variable that production is over the course of the year.

In [1]:
# Import statements of necessary libraries
import polars as plr
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os

In [None]:
# Change the working directory to our desired path
os.chdir("Public Policy/")
# Load the solar data from a CSV file into a Polars DataFrame
solar = plr.read_csv("data/PV-hour.csv")
# Print the shape of the DataFrame
print(solar.shape)
# View first 10 rows of the DataFrame
solar.head(10)

(8760, 32)


Timestamp,PV_PVSystem1ACEnergyOSEACPV1OS,PV_PVSystem2ACEnergyOSEACPV2OS,PV_PVBacksideTemp2,PV_PVBacksideTemp3,PV_PVBacksideTemp4,PV_PVBacksideTemp7,PV_StringVoltageUStr2,PV_StringVoltageUStr4,PV_VoltsANUAN1,PV_VoltsBNUBN1,PV_AmpsAIA1,PV_AmpsBIB1,PV_Watts3PhTotalW3PhT1,PV_PowerFactor3PhTotalPF3PhT1,PV_FrequencyF1,PV_WhoursDeliveredWhD1,PV_VoltsANUAN2,PV_VoltsBNUBN2,PV_AmpsAIA2,PV_AmpsBIB2,PV_Watts3PhTotalW3PhT2,PV_PowerFactor3PhTotalPF3PhT2,PV_FrequencyF2,PV_WhoursDeliveredWhD2,PV_StringCurrentIStr1,PV_StringCurrentIStr2,PV_StringCurrentIStr3,PV_StringCurrentIStr4,PV_PVSystem1ACPowerOSPACPV1OS,PV_PVSystem2ACPowerOSPACPV2OS,PV_PVInsolationHArray
str,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
"""2015-02-01 00:00:00-05:00""",0.0,0.0,-10.143994,-10.451637,-10.145245,-10.205501,1.335409,1.334891,118.322563,118.361702,0.338393,0.331943,9.543753,0.120366,59.997329,8826412.0,118.39797,118.394806,0.336017,0.335954,8.952575,0.112473,59.98686,8796006.0,-2e-06,-1.6e-05,9e-05,3.2e-05,1.247293,1.524335,0.057408
"""2015-02-01 01:00:00-05:00""",0.0,0.0,-10.104545,-10.039162,-9.990295,-10.08447,1.257941,1.243191,117.749032,117.788625,0.336826,0.330343,9.469817,0.120451,60.014858,8826421.0,117.82274,117.823719,0.334386,0.334338,8.861749,0.112738,60.004418,8796015.0,-6e-06,-1.1e-05,7.8e-05,3.3e-05,1.246217,1.595343,0.060286
"""2015-02-01 02:00:00-05:00""",0.0,0.0,-9.213759,-9.110991,-9.033023,-8.999277,1.153671,1.130783,117.76434,117.802009,0.336814,0.33028,9.497751,0.120825,60.008762,8826431.0,117.838662,117.837107,0.334385,0.334326,8.886896,0.112955,59.998251,8796024.0,-9e-06,-1.7e-05,9e-05,2.7e-05,1.240928,1.616503,0.058811
"""2015-02-01 03:00:00-05:00""",0.0,0.0,-8.030592,-7.893998,-7.93456,-7.73677,1.282356,1.266588,117.984424,118.023583,0.337472,0.330887,9.55212,0.121009,60.006634,8826440.0,118.059252,118.058993,0.33505,0.334921,8.949422,0.113124,59.996355,8796033.0,-9e-06,-1.6e-05,9e-05,2.5e-05,1.254152,1.629727,0.058865
"""2015-02-01 04:00:00-05:00""",0.0,0.0,-5.850926,-5.66464,-5.549987,-5.31752,1.268114,1.272691,118.005577,118.045006,0.33761,0.33096,9.558916,0.121021,60.012212,8826450.0,118.083527,118.082426,0.335125,0.335017,8.965035,0.113217,60.002402,8796042.0,-2e-06,-1.6e-05,8.5e-05,3e-05,1.256797,1.611213,0.059288
"""2015-02-01 05:00:00-05:00""",0.0,0.0,-5.063146,-4.919412,-4.872442,-4.675449,1.16893,1.16893,117.858716,117.899114,0.337094,0.330437,9.543731,0.121268,60.006757,8826459.0,117.935362,117.93502,0.334629,0.334518,8.958629,0.113534,59.996785,8796051.0,-7e-06,-2.3e-05,9e-05,2.9e-05,1.251507,1.650886,0.058733
"""2015-02-01 06:00:00-05:00""",0.0,0.0,-4.698231,-4.548714,-4.527002,-4.374637,5.845814,5.891082,117.746719,117.785858,0.336815,0.330153,9.538776,0.121424,60.011114,8826469.0,117.824394,117.82369,0.334313,0.334205,8.939389,0.113545,60.000763,8796060.0,0.000102,9.8e-05,0.000204,0.000148,1.264732,1.637662,0.059928
"""2015-02-01 07:00:00-05:00""",12.0,11.0,-3.238496,-3.039984,-3.0507,-2.89071,282.053042,283.421266,117.744462,117.783794,0.502199,0.498733,22.215017,0.151573,59.998826,8826491.0,117.822197,117.821282,0.475477,0.476004,22.056976,0.146224,59.988774,8796080.0,0.03489,0.035122,0.033183,0.034785,12.381218,14.256448,0.599965
"""2015-02-01 08:00:00-05:00""",511.0,504.0,0.685583,0.481678,0.673011,0.631826,437.095672,437.921692,117.519982,117.622736,2.405711,2.417697,515.35496,0.815021,60.000788,8827001.0,117.596193,117.654995,2.413096,2.419166,511.769933,0.808565,59.989848,8796587.0,0.6331,0.634723,0.62114,0.629558,501.583334,500.289981,5.740961
"""2015-02-01 09:00:00-05:00""",1192.0,1179.0,5.329179,4.658096,5.11116,4.646399,442.41852,443.295912,117.79341,117.89553,2.991736,3.008612,691.488409,0.973707,60.001833,8827692.0,117.870666,117.929743,2.992955,3.001119,689.472603,0.973896,59.991289,8797276.0,0.842655,0.845785,0.832538,0.840221,683.250478,677.556022,7.538642


We are interested in the total amount of sunlight shining on the solar arrays in any given hour (kWh), so we focus on the `PV_PVInsolationHArray` column of the DataFrame `solar`.

In [3]:
# View summary statistics (count, mean, std, min, quartiles, max) for PV_PVInsolationHArray
solar[["PV_PVInsolationHArray"]].describe()

statistic,PV_PVInsolationHArray
str,f64
"""count""",8760.0
"""null_count""",0.0
"""mean""",9.452628
"""std""",14.533754
"""min""",0.0
"""25%""",0.055849
"""50%""",0.339645
"""75%""",14.842793
"""max""",54.843525


This basic summary already shows the hourly variability. The median (0.34) is far below the mean (9.45), so for most hours the PV arrays receive very little solar energy. The maximum (54.84), however, shows that there are periods when the arrays are exposed to intense sunlight.
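To see why a median far below the mean signals a right-skewed series, here is a minimal sketch on made-up numbers. The `hourly` list is hypothetical and not taken from `PV-hour.csv`; it mimics a single day with zero exposure at night and a midday peak.

```python
from statistics import mean, median

# Hypothetical 24 hourly values: 14 dark hours at zero and a 10-hour daytime ramp.
hourly = [0.0] * 7 + [1.0, 6.0, 15.0, 28.0, 40.0, 40.0, 28.0, 15.0, 6.0, 1.0] + [0.0] * 7

print(f"mean:   {mean(hourly):.2f}")    # mean:   7.50
print(f"median: {median(hourly):.2f}")  # median: 0.00
```

More than half of the hours are zero, so the median is 0, while the few daytime hours pull the mean up to 7.5. The real series shows the same pattern on a larger scale.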

We can quantify the variability with the coefficient of variation (CV), which is the standard deviation, $\sigma$, divided by the mean, $\mu$:

$CV = \frac{\sigma}{\mu}$

In [4]:
# Calculate the coefficient of variation (CV) for the PV_PVInsolationHArray
cv = solar["PV_PVInsolationHArray"].std() / solar["PV_PVInsolationHArray"].mean()
print(f"Coefficient of Variation (CV): {cv}")

Coefficient of Variation (CV): 1.5375357839988257


A coefficient of variation above 1 means the standard deviation is larger than the mean. Here the standard deviation is about 1.54 times the mean, which points to highly variable energy generation.
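As a cross-check, the CV can be recomputed from the rounded mean and standard deviation in the summary table. The sketch below also shows, on a small hypothetical sample, that the result depends slightly on whether the sample (n - 1) or population (n) standard deviation is used. Polars' `std()` defaults to the sample version (`ddof=1`).

```python
from statistics import mean, pstdev, stdev

# Rounded mean and standard deviation copied from the describe() output.
mu = 9.452628
sigma = 14.533754
cv = sigma / mu
print(f"CV from summary table: {cv:.4f}")

# Hypothetical values, used only to compare the two standard deviation conventions.
sample = [0.0, 0.0, 0.1, 0.3, 5.0, 15.0, 30.0, 50.0]
cv_sample = stdev(sample) / mean(sample)       # divides by n - 1
cv_population = pstdev(sample) / mean(sample)  # divides by n
print(f"sample CV: {cv_sample:.3f}, population CV: {cv_population:.3f}")
```

With 8,760 hourly observations the two conventions give almost the same result, so the choice matters little for this dataset.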

Can we view this with a line graph?

In [5]:
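# Convert the 'Timestamp' column to datetime format and keep the result
# (with_columns returns a new DataFrame; it does not modify solar in place).
# The strings carry a UTC offset such as -05:00, so the format includes %z.
solar = solar.with_columns(
    plr.col("Timestamp").str.to_datetime("%Y-%m-%d %H:%M:%S%z", strict=False)
)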
# Create a line plot using Plotly
fig = px.line(solar.to_pandas(), x="Timestamp", y="PV_PVInsolationHArray",
              title="Hourly PV Insolation on Solar Arrays",
              labels={"PV_PVInsolationHArray": "PV Insolation (kWh)", "Timestamp": "Time"})
fig.show()
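The `Timestamp` strings in this file end with a UTC offset (for example `-05:00`), which any format string has to account for. As a minimal standard-library sketch, independent of Polars, Python's `datetime.strptime` handles such an offset through the `%z` directive (the colon form needs Python 3.7 or later):

```python
from datetime import datetime

# First timestamp shown in the DataFrame preview.
raw = "2015-02-01 00:00:00-05:00"

# %z consumes the trailing UTC offset; without it, strptime raises ValueError.
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S%z")
print(parsed.isoformat())
print(parsed.utcoffset())
```

The parsed value is timezone-aware, so hours recorded with different offsets across the year can be compared consistently.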

The plot shows high variability. We will come back to this later, but for now we set this analysis aside.