# Clock Plot example usage
This notebook provides examples of how to use clock_plot.

In [127]:
import json
import os
import urllib.request as urllib2  # aliased as urllib2; used by the data-loading cell below
import warnings
from datetime import date, datetime

import pandas as pd
import plotly.express as px

import clock_plot
import clock_plot.clock as cp

In [128]:
# Suppress FutureWarnings: plotly raises one because it uses the deprecated
# DataFrame.append rather than pd.concat
warnings.simplefilter(action='ignore', category=FutureWarning)
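The filter above silences every FutureWarning for the rest of the session. A narrower option, sketched below with only the standard library, is to suppress them just around the call that triggers them. Here `noisy_call` is a hypothetical stand-in for a `cp.clock_plot(...)` call.

```python
import warnings


def noisy_call():
    """Hypothetical stand-in for a call that emits a FutureWarning."""
    warnings.warn("DataFrame.append is deprecated", FutureWarning)
    return "figure"


# FutureWarnings are ignored only inside this block; the previous warning
# filters are restored automatically when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = noisy_call()

print(result)
```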

## Example data 1 - energy usage for a single house
This data is a cleaned version of energy usage data for a single house from the Energy Systems Catapult's Living Lab; the full dataset is available [here](https://usmart.io/org/esc/discovery?tags=Living%20Lab).

In [129]:
readings_raw = pd.read_csv(os.path.join(clock_plot.CLOCK_PLOT_DIR, "..", "data", "eden_2_houseid_324_combined_data.csv"))
readings_raw["datetime"] = pd.to_datetime(readings_raw["datetime"])

As is generally the case with Plotly (the package clock_plot is built on), data needs to be in long format rather than wide format for grouping to work: one row per timestamp and fuel, rather than one column per fuel.

In [130]:
# Reshape from wide (one column per fuel) to long (one row per timestamp and fuel)
readings = readings_raw.melt(id_vars=["datetime"], value_vars=["reading_elec", "reading_gas"],
                             var_name="fuel", value_name="reading")
readings["fuel"] = readings["fuel"].map({"reading_elec": "elec", "reading_gas": "gas"})
readings.head()

|   | datetime | fuel | reading |
|---|---|---|---|
| 0 | 2017-12-08 10:00:00 | elec | 423.688148 |
| 1 | 2017-12-08 11:00:00 | elec | 371.261564 |
| 2 | 2017-12-08 12:00:00 | elec | 448.061277 |
| 3 | 2017-12-08 13:00:00 | elec | 547.379868 |
| 4 | 2017-12-08 14:00:00 | elec | 279.790683 |
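To make the wide-versus-long distinction concrete, here is the same `melt` applied to a two-row toy frame (the numbers are made up). Each wide row, with one column per fuel, becomes one long row per timestamp and fuel, which is the shape Plotly groups on.

```python
import pandas as pd

# Toy wide-format data: one row per timestamp, one column per fuel (made-up values)
wide = pd.DataFrame({
    "datetime": pd.to_datetime(["2018-01-01 00:00", "2018-01-01 01:00"]),
    "reading_elec": [1.0, 2.0],
    "reading_gas": [10.0, 20.0],
})

# Long format: one row per (timestamp, fuel) pair
long = wide.melt(id_vars=["datetime"], value_vars=["reading_elec", "reading_gas"],
                 var_name="fuel", value_name="reading")
long["fuel"] = long["fuel"].map({"reading_elec": "elec", "reading_gas": "gas"})
print(long)
```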


We can start with a basic plot showing gas usage by season. To do this, we specify the datetime and value columns, filter to fuel='gas', and set color='season'. There is no season column in readings; clock_plot derives it from the datetime column. Note that the season colors are chosen automatically to be fairly intuitive, though they can be overridden.

In [131]:
fig = cp.clock_plot(readings, datetime_col='datetime', value_col='reading', 
                    filters={'fuel': 'gas'}, 
                    color='season',
                    title_start='Seasonal usage')

Next, we can focus on the summer and winter months. To do this, we filter on season, using a list to specify the two seasons we are interested in, and then color by month. We also apply a custom color sequence to make the plot more visually appealing.

We can see that in this example the definition of season works well - winter and summer months have very different usage patterns.

In [132]:
fig = cp.clock_plot(readings, datetime_col='datetime', value_col='reading', 
                    filters={'fuel': 'gas', 
                            'season':['Winter', 'Summer']}, 
                    color='month',
                    color_discrete_sequence=px.colors.qualitative.Prism + [px.colors.qualitative.Prism[2]],
                    title_start='Monthly usage',
                    )
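The filters argument in the cell above mixes a scalar (fuel) with a list (season). The hypothetical `apply_filters` helper below sketches how such a dictionary can be read in plain pandas: a scalar keeps rows equal to it, and a list keeps rows matching any of its values. It illustrates the semantics only and is not clock_plot's implementation.

```python
import pandas as pd


def apply_filters(df: pd.DataFrame, filters: dict) -> pd.DataFrame:
    """Hypothetical helper: keep only rows that match every filter.

    A scalar value keeps rows equal to it; a list, tuple or set keeps rows
    equal to any of its values.
    """
    mask = pd.Series(True, index=df.index)
    for col, value in filters.items():
        if isinstance(value, (list, tuple, set)):
            mask = mask & df[col].isin(value)
        else:
            mask = mask & (df[col] == value)
    return df[mask]


toy = pd.DataFrame({
    "fuel": ["gas", "gas", "elec", "gas"],
    "season": ["Winter", "Spring", "Winter", "Summer"],
})
subset = apply_filters(toy, {"fuel": "gas", "season": ["Winter", "Summer"]})
print(subset.index.tolist())  # [0, 3]
```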

The next plot shows how we can look at both granular data and averages on the same plot. For example, we can look at individual weeks within 2018 using line_group='week' and then include an average for the whole year by specifying aggregate={'year':'mean'}. Note that a subtitle is automatically generated explaining that each line represents a single week.

You can see from the plot that there's a consistent peak at either 7 or 8am, with much more varied usage in the late afternoon/evening that gets smoothed out by the average.

In [133]:
fig = cp.clock_plot(readings, datetime_col='datetime', value_col='reading', 
                    filters={'fuel': 'elec', 'year': 2018}, 
                    color='year',
                    line_group='week',
                    aggregate={'year':'mean'},
                    title_start='Usage')
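Conceptually, line_group='week' combined with aggregate={'year': 'mean'} draws two groupings of the same readings: one mean profile per week, and one mean profile over the whole period. The sketch below reproduces that idea with pandas on synthetic data (made-up values); clock_plot's exact aggregation, including how it defines a week, may differ.

```python
import numpy as np
import pandas as pd

# Two weeks of synthetic hourly readings (made-up values: 0, 1, 2, ...)
idx = pd.date_range("2018-01-01", periods=24 * 14, freq="60min")
df = pd.DataFrame({"datetime": idx, "reading": np.arange(len(idx), dtype=float)})
df["hour"] = df["datetime"].dt.hour
df["week"] = df["datetime"].dt.isocalendar().week.astype(int)

# Granular lines: mean reading per (week, hour), one row per week
per_week = df.groupby(["week", "hour"])["reading"].mean().unstack("hour")

# Aggregate line: mean reading per hour across the whole period
overall = df.groupby("hour")["reading"].mean()

print(per_week.shape)  # (2, 24): two ISO weeks x 24 hourly bins
```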

We might hypothesise that much of this variation occurs at weekends, so we split by weekend versus weekday. Toggling traces on and off via the legend shows that weekends are more variable, but weekdays still have a lot of variation.

In [134]:
fig = cp.clock_plot(readings, datetime_col='datetime', value_col='reading', 
                    filters={'fuel': 'elec', 'year': 2018}, 
                    color='weekend',
                    line_group='week',
                    aggregate={'weekend':'mean'},
                    color_discrete_sequence=['red', 'blue'],
                    title_start='Weekend/Weekday Usage')
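A weekend flag like the one used above can be derived from the day of the week. The sketch below assumes Saturday and Sunday count as the weekend and uses the labels "weekend" and "weekday"; the values in clock_plot's own weekend column may be spelled differently.

```python
import numpy as np
import pandas as pd

# Four consecutive days: Friday 5 Jan 2018 to Monday 8 Jan 2018
dates = pd.Series(pd.date_range("2018-01-05", periods=4, freq="D"))

# dayofweek runs Monday=0 to Sunday=6, so 5 and 6 are Saturday and Sunday
is_weekend = dates.dt.dayofweek >= 5
labels = pd.Series(np.where(is_weekend, "weekend", "weekday"), index=dates.index)
print(labels.tolist())  # ['weekday', 'weekend', 'weekend', 'weekday']
```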

We can break things down even further by plotting a line for each date. Note that we can resize the plot by passing in the standard Plotly height and width parameters. We can also plot an unsmoothed version by specifying line_shape='linear', as in the second plot below.

In [135]:
fig = cp.clock_plot(readings, datetime_col='datetime', value_col='reading', 
                    filters={'fuel': 'gas', 'year': 2018, 'month':'January'}, 
                    line_group='date',
                    height=600,
                    width=600,
                    title_start='Usage')
fig = cp.clock_plot(readings, datetime_col='datetime', value_col='reading', 
                    filters={'fuel': 'gas', 'year': 2018, 'month':'January'}, 
                    line_group='date',
                    line_shape='linear',
                    height=600,
                    width=600,
                    title_start='Usage')                    

As well as color, we can use line_dash to compare across two dimensions at once. For example, the plot below shows summer gas and electricity usage for 2018 and 2019. It also illustrates the kind of pattern these plots can reveal: between 2018 and 2019 the shapes have rotated one hour earlier, suggesting a change in routine for whoever lives in this house.

In [136]:
fig = cp.clock_plot(readings, datetime_col="datetime", value_col="reading",
                    filters={"season":"Summer"},
                    color="fuel",
                    line_dash="year",
                    title_start='Gas and electric usage by year',
)
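The apparent one-hour rotation between years can be quantified rather than judged by eye. One simple approach, sketched below with NumPy, is to try every circular shift of one hourly profile and keep the shift that best matches the other by squared error; circular shifts fit because 23:00 sits next to 00:00 on the clock. The profiles here are synthetic; in practice they might be the mean reading per hour for each year.

```python
import numpy as np


def best_circular_shift(profile_a, profile_b):
    """Return the shift k (in bins) for which np.roll(profile_a, k) best
    matches profile_b by squared error. Negative k means profile_b is earlier."""
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    n = len(a)
    errors = [np.sum((np.roll(a, k) - b) ** 2) for k in range(n)]
    k = int(np.argmin(errors))
    # Report shifts past the halfway point as negative (earlier) shifts
    return k - n if k > n // 2 else k


# Synthetic 24-hour profile with peaks at 08:00 and 18:00,
# and the same profile moved one hour earlier
profile_2018 = np.zeros(24)
profile_2018[8], profile_2018[18] = 5.0, 3.0
profile_2019 = np.roll(profile_2018, -1)

print(best_circular_shift(profile_2018, profile_2019))  # -1
```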

## Example data 2 - UK energy mix
This data is from the National Grid Electricity System Operator (ESO) and shows which fuels were used to generate the UK's electricity (available [here](https://data.nationalgrideso.com/carbon-intensity1/historic-generation-mix)).

In [137]:
# Load the energy mix data from the ESO API (records are half-hourly, i.e. 48 per day)
n_years_offset = 10
n_offset = 48*365*n_years_offset  # rows to skip; 365-day years ignore leap days, so this is approximate
n_years = 2
n_rows = 48*365*n_years  # rows to fetch (roughly two years)
url = f"https://data.nationalgrideso.com/api/3/action/datastore_search?resource_id=f93d1835-75bc-43e5-84ad-12472b180a98&limit={n_rows}&offset={n_offset}"
with urllib2.urlopen(url) as fileobj:
    data = fileobj.read()
datadict = json.loads(data.decode('utf-8'))
gen_mix_raw = pd.DataFrame(datadict["result"]["records"])
gen_mix_raw.head()

|   | _id | DATETIME | GAS | COAL | NUCLEAR | WIND | HYDRO | IMPORTS | BIOMASS | OTHER | ... | IMPORTS_perc | BIOMASS_perc | OTHER_perc | SOLAR_perc | STORAGE_perc | GENERATION_perc | LOW_CARBON_perc | ZERO_CARBON_perc | RENEWABLE_perc | FOSSIL_perc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 175162 | 2018-12-29T04:30:00 | 3357 | 0 | 6891 | 9801 | 410 | 1978 | 1343 | 63 | ... | 8.3 | 5.6 | 0.3 | 0.0 | 0.0 | 100 | 77.4 | 71.7 | 42.8 | 14.1 |
| 1 | 175163 | 2018-12-29T05:00:00 | 3584 | 0 | 6882 | 10319 | 422 | 1926 | 1519 | 64 | ... | 7.8 | 6.1 | 0.3 | 0.0 | 0.0 | 100 | 77.4 | 71.3 | 43.5 | 14.5 |
| 2 | 175164 | 2018-12-29T05:30:00 | 3506 | 0 | 6891 | 10721 | 434 | 1903 | 1532 | 63 | ... | 7.6 | 6.1 | 0.3 | 0.0 | 0.0 | 100 | 78.2 | 72.0 | 44.5 | 14.0 |
| 3 | 175165 | 2018-12-29T06:00:00 | 4231 | 0 | 6881 | 11090 | 440 | 1841 | 1542 | 64 | ... | 7.1 | 5.9 | 0.2 | 0.0 | 0.0 | 100 | 76.5 | 70.6 | 44.2 | 16.2 |
| 4 | 175166 | 2018-12-29T06:30:00 | 4486 | 0 | 6889 | 11596 | 443 | 1747 | 1505 | 65 | ... | 6.5 | 5.6 | 0.2 | 0.0 | 0.0 | 100 | 76.4 | 70.8 | 45.0 | 16.8 |
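The query string in the loading cell is assembled by hand. The standard library's `urllib.parse.urlencode` can build it instead, which handles escaping and makes the paging parameters (limit and offset) explicit. The sketch below only constructs and inspects the URL, with no network call; the endpoint and resource id are copied from the loading cell and may no longer be live, and `build_datastore_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint and resource id copied from the loading cell; they may have changed since
BASE_URL = "https://data.nationalgrideso.com/api/3/action/datastore_search"
RESOURCE_ID = "f93d1835-75bc-43e5-84ad-12472b180a98"
ROWS_PER_DAY = 48  # half-hourly records


def build_datastore_url(limit: int, offset: int) -> str:
    """Hypothetical helper: URL for one page of datastore_search records."""
    params = {"resource_id": RESOURCE_ID, "limit": limit, "offset": offset}
    return f"{BASE_URL}?{urlencode(params)}"


# Same window as the loading cell: skip ~10 years of rows, then request ~2 years
url = build_datastore_url(limit=ROWS_PER_DAY * 365 * 2, offset=ROWS_PER_DAY * 365 * 10)
print(parse_qs(urlparse(url).query))
```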


In [138]:
# 'Melt' the DataFrame into long format, with columns 'DATETIME', 'SOURCE' and 'VALUE'
gen_mix = gen_mix_raw.melt(id_vars=["DATETIME"],
                           value_vars=["GAS", "COAL", "NUCLEAR", "WIND", "HYDRO", "IMPORTS", "BIOMASS", "OTHER", "SOLAR"],
                           var_name="SOURCE", value_name="VALUE")
gen_mix['DATETIME'] = pd.to_datetime(gen_mix['DATETIME'])
gen_mix.head()

|   | DATETIME | SOURCE | VALUE |
|---|---|---|---|
| 0 | 2018-12-29 04:30:00 | GAS | 3357 |
| 1 | 2018-12-29 05:00:00 | GAS | 3584 |
| 2 | 2018-12-29 05:30:00 | GAS | 3506 |
| 3 | 2018-12-29 06:00:00 | GAS | 4231 |
| 4 | 2018-12-29 06:30:00 | GAS | 4486 |


Let's look at how different energy sources vary by time of day, focusing on gas, wind and solar. Note the difference between the two plots below: the first has one line per year_month (e.g. 201901), whereas the second has one line per month (e.g. "January"), i.e. the same month across multiple years is aggregated into a single line.

We can see that wind doesn't vary much by time of day, gas is used a lot less overnight (when demand is low), and, unsurprisingly, solar only generates during daylight hours!

In [139]:
fig = cp.clock_plot(gen_mix, datetime_col='DATETIME', value_col='VALUE',
                    filters={'SOURCE':['GAS','WIND','SOLAR']},  
                    color='SOURCE', 
                    line_group='year_month',
                    height=600, width=600,
                    title_start='UK Energy generation',
                    color_discrete_sequence=['red', 'blue', 'green'],
                    category_orders={ 'SOURCE': ['GAS','WIND','SOLAR'] } )
fig = cp.clock_plot(gen_mix, datetime_col='DATETIME', value_col='VALUE',
                    filters={'SOURCE':['GAS','WIND','SOLAR']},  
                    color='SOURCE', 
                    line_group='month',
                    height=600, width=600,
                    title_start='UK Energy generation',
                    color_discrete_sequence=['red', 'blue', 'green'],
                    category_orders={ 'SOURCE': ['GAS','WIND','SOLAR'] } )
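The difference between the two groupings in the plots above shows up clearly on a toy example: with two Januaries from different years, a year_month key keeps them apart, while a month key merges them. The sketch below builds both keys with pandas; encoding year_month as a YYYYMM integer is an assumption about clock_plot's format, based on the example in the text.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2019-01-15", "2020-01-15", "2019-02-15"]))

# year_month-style key (YYYYMM): the two Januaries stay separate
year_month = ts.dt.year * 100 + ts.dt.month
# month-style key: the two Januaries are merged into one group
month = ts.dt.month_name()

print(year_month.tolist())                    # [201901, 202001, 201902]
print(year_month.nunique(), month.nunique())  # 3 2
```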

Let's dig into solar in more detail by looking at it across seasons. Since we have half-hourly data, we can switch from the default of 24 bins per day (i.e. hourly) to 48 (i.e. half-hourly). Unsurprisingly, solar generates for fewer hours in winter than in summer. However, even in summer it doesn't generate much between 6pm and 6am. Note that spring and summer look quite similar: this is because the definition of Spring is Apr-May and Summer is Jun-Aug, so the summer solstice (the longest day, around 21 June) falls right at the beginning of summer rather than in its middle.

In [140]:
fig = cp.clock_plot(gen_mix, datetime_col='DATETIME', value_col='VALUE',
                    filters={'SOURCE':'SOLAR'},  
                    color='season', 
                    title_start='Energy generation',
                    line_group='week',
                    line_shape='linear',
                    aggregate={'season':'mean'},
                    bins_per_day=48)
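To see what bins_per_day changes, it helps to spell out the binning arithmetic: with b bins per day, each bin covers 1440 / b minutes and advances 360 / b degrees round the clock face. The plain-Python sketch below assumes bin 0 starts at midnight; it illustrates the arithmetic and is not taken from clock_plot's source.

```python
MINUTES_PER_DAY = 24 * 60


def time_to_bin(hour: int, minute: int, bins_per_day: int = 24) -> int:
    """Index of the bin (0 to bins_per_day - 1) containing a time of day."""
    minutes = hour * 60 + minute
    return minutes * bins_per_day // MINUTES_PER_DAY


def bin_to_angle(bin_index: int, bins_per_day: int = 24) -> float:
    """Angle in degrees from midnight for the start of a bin."""
    return bin_index * 360 / bins_per_day


# 06:30 falls in hourly bin 6 but half-hourly bin 13
print(time_to_bin(6, 30), time_to_bin(6, 30, bins_per_day=48))  # 6 13
print(bin_to_angle(6), bin_to_angle(13, bins_per_day=48))       # 90.0 97.5
```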

We can create a custom season field whose boundaries are centred on the solstices and equinoxes (for example, summer runs from 6 May to 6 August, around the June solstice) and feed that into the plot. The resulting plot is much closer to what we would expect!

In [141]:
Y = 2000  # dummy leap year, so that 29 February maps onto a valid date
# Inclusive season boundaries, roughly centred on the solstices and equinoxes
seasons = [('winter', (date(Y,  1,  1),  date(Y,  2, 5))),
           ('spring', (date(Y,  2, 6),   date(Y,  5, 5))),
           ('summer', (date(Y,  5, 6),  date(Y,  8, 6))),
           ('autumn', (date(Y,  8, 7),  date(Y, 11, 5))),
           ('winter', (date(Y, 11, 6),  date(Y, 12, 31)))]

def get_season(now):
    """Return the custom season name for a date or datetime (including pd.Timestamp)."""
    if isinstance(now, datetime):
        now = now.date()
    now = now.replace(year=Y)  # compare on month and day only
    return next(season for season, (start, end) in seasons
                if start <= now <= end)

In [142]:
gen_mix['season_accurate'] = gen_mix['DATETIME'].apply(get_season)
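`Series.apply` calls get_season once per row, which is fine at this size but can be slow on larger frames. Because the boundaries depend only on month and day, they can also be written as vectorised comparisons on an MMDD integer. The sketch below uses the same inclusive boundaries as the seasons list; `season_accurate_vectorised` is a hypothetical name, not part of clock_plot.

```python
import numpy as np
import pandas as pd


def season_accurate_vectorised(dt: pd.Series) -> pd.Series:
    """Hypothetical vectorised equivalent of get_season (same boundaries)."""
    # Encode each date as an MMDD integer (6 May -> 506) so the year is ignored
    mmdd = dt.dt.month * 100 + dt.dt.day
    # np.select picks, for each row, the choice of the first True condition
    conditions = [
        mmdd <= 205,   # 1 Jan to 5 Feb
        mmdd <= 505,   # 6 Feb to 5 May
        mmdd <= 806,   # 6 May to 6 Aug
        mmdd <= 1105,  # 7 Aug to 5 Nov
    ]
    choices = ["winter", "spring", "summer", "autumn"]
    # Everything from 6 Nov onwards falls through to winter
    return pd.Series(np.select(conditions, choices, default="winter"), index=dt.index)


example = pd.Series(pd.to_datetime(["2019-02-05", "2019-02-06", "2020-02-29", "2019-11-06"]))
print(season_accurate_vectorised(example).tolist())  # ['winter', 'spring', 'spring', 'winter']
```

In the notebook this would be used as `gen_mix['season_accurate'] = season_accurate_vectorised(gen_mix['DATETIME'])`.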

In [143]:
fig = cp.clock_plot(gen_mix, datetime_col='DATETIME', value_col='VALUE',
                    filters={'SOURCE':'SOLAR'},  
                    color='season_accurate', 
                    title_start='Energy generation',
                    line_group='week',
                    line_shape='linear',
                    aggregate={'season_accurate':'mean'},
                    category_orders={ 'season_accurate': ['spring','summer','autumn', 'winter'] },                     
                    bins_per_day=48)