project-eddy

Extracting daily insights from water consumption

The idea of this work is to extract daily insights about per-household water consumption before smart meters are available. We look at each day individually as an aggregate of all meter readings whose read window includes that day, and assume that this aggregate is suggestive of how consumption on that day compares with other days. We also look at whether that trend differs between types of household.
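A minimal sketch of how the inputs for one daily model could be selected is given here. The record layout, the sample values and the `readings_covering` helper are all hypothetical, and treating the window end date as exclusive is an assumption rather than something the notebook is known to do.

```python
from datetime import date

# Hypothetical meter read windows:
# (customer_id, window_start, window_end, total_litres).
# Assumption: a window covers window_start (inclusive) up to window_end (exclusive).
readings = [
    ("A", date(2020, 1, 1), date(2020, 4, 1), 27300.0),
    ("B", date(2020, 2, 15), date(2020, 8, 15), 54600.0),
    ("C", date(2020, 5, 1), date(2020, 11, 1), 36800.0),
]


def readings_covering(day, readings):
    """Return one row per reading whose window includes `day`.

    Each row holds the customer, the window length in days and the average
    daily consumption over that window (the flat line the daily models start from).
    """
    rows = []
    for customer_id, start, end, total in readings:
        if start <= day < end:
            window_days = (end - start).days
            rows.append((customer_id, window_days, total / window_days))
    return rows


# Inputs for the model of 1 March 2020: only windows A and B cross that day.
rows_for_day = readings_covering(date(2020, 3, 1), readings)
```

Every reading contributes to the model of each day its window crosses, so a single long window feeds many daily models with the same flat average.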

Here is an illustration of the idea - the inputs to the model for that day are all the meter readings that cross over that day:

Here is an example of the resulting distribution for one customer after applying the series of daily models. Rather than a series of flat lines we now have daily variance, which is arguably a step in the right direction:

A good way to progress this work is to install a sufficient number of smart meters to validate the daily variance, then use the insights to map onto the rest of the network and build a really useful PCC / marketing solution. Meter read data is often partial, with only one or two reads over many years for a large portion of customers, and there are often data quality issues, for example a date or reading entered incorrectly, which then plays tricks on our model.

Performance

You can expect to get the following performance on the final model:

Training set score: 82%
Testing set score: 77%

Modelling technique

We use xgboost regression throughout.

Process

Featurisation
-Aggregate average weather figures across the reading window
-Use the dummy variable approach to categoricals
-Apply exclusions
Build daily variance models
-Select modelling window
-Create a model for each day of the modelling window
Redistribute daily consumption figures according to the daily models' insight
-Run daily model prediction on the values for the day rather than aggregates over the meter read window
-Preserve total consumption figures per meter read window
Generate final daily dataset
-Take samples from the data in order to fit into memory
-Ensure preservation of more of the COVID data as it is scarcer
-Add seasonality and COVID19 variables
-Add persisting weather trends variables
Build general daily model
-Ensure we split the data by customer rather than by day to get fair train vs. test insights
-Hide previous consumption data from the model at this point in order to learn to separate data under COVID influence better
Redistribute daily consumption yet again during the COVID19 window
-Take away COVID's impact as we can't afford to exclude most of 2020 data from our model
-Estimate percentage impact of COVID19
Build final model
-Add all previous consumption variables to increase our accuracy
-Take away all COVID-related variables
-Produce model validation
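Two of the steps above can be sketched in plain Python: redistributing a window's consumption while preserving its total, and splitting the data by customer rather than by day. This is only an illustration under stated assumptions. Proportional rescaling is one simple way to preserve the window total and may differ from what the notebook does, and the function names, the `customer_id` key and the sample values are hypothetical.

```python
import random


def redistribute_window(window_total, daily_predictions):
    """Rescale per-day predictions so they sum to the metered window total.

    The day-to-day shape comes from the predictions; the total comes from the meter.
    Assumption: proportional rescaling is used to preserve the total.
    """
    if not daily_predictions:
        return []
    predicted_total = sum(daily_predictions)
    if predicted_total <= 0:
        # No usable shape in the predictions: fall back to a flat profile.
        flat = window_total / len(daily_predictions)
        return [flat] * len(daily_predictions)
    scale = window_total / predicted_total
    return [p * scale for p in daily_predictions]


def split_by_customer(rows, test_fraction=0.25, seed=0):
    """Assign whole customers to train or test.

    No customer's days appear on both sides, so the test score is not
    inflated by the model having already seen the same household.
    """
    customers = sorted({row["customer_id"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(customers)
    n_test = max(1, round(len(customers) * test_fraction))
    test_ids = set(customers[:n_test])
    train = [r for r in rows if r["customer_id"] not in test_ids]
    test = [r for r in rows if r["customer_id"] in test_ids]
    return train, test


# A four-day window that metered 900 litres, with hypothetical daily predictions.
new_daily = redistribute_window(900.0, [250.0, 300.0, 350.0, 300.0])

# Four hypothetical customers with three daily rows each.
rows = [{"customer_id": c, "day": d} for c in "ABCD" for d in range(3)]
train, test = split_by_customer(rows)
```

Splitting by day instead would place days from the same household in both sets, which tends to make the test score look better than it would be on unseen customers.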

Data

Note: Data is anonymised. Use some or all of the data to build a model or create your own dataset by using your own networks' data.

Meter readings
-Includes domestic vs nondomestic
-Includes measured vs unmeasured
Household descriptive data
-Class description
-Misc descriptions
-Building class
-Building age
-Building type
-Storeys
Acorn demographics data
-Category, group and type
-Occupancy rate
Weather data - median daily AW-wide values per weather variable
-Air Pressure
-Compass
-Humidity
-Lightning Count
-Radiation
-Rainfall
-Rainfall Intensity
-Temperature
-Wind Chill
-Wind Direction
-Wind Speed
Seasonality
-Month of the year
-Day of the week
-School holidays
-Public holidays
-COVID period
-COVID stage
-New COVID deaths
-New Hospital admissions
-New COVID cases
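As an illustration of the calendar-based seasonality variables listed above, the sketch below derives them from a date using only the standard library. The holiday calendars, the feature names and the `seasonality_features` helper are hypothetical placeholders; a real run would load the calendars from an external source.

```python
from datetime import date

# Hypothetical holiday calendars, for illustration only.
PUBLIC_HOLIDAYS = {date(2020, 12, 25), date(2020, 12, 28)}
SCHOOL_HOLIDAYS = [(date(2020, 12, 21), date(2021, 1, 1))]  # inclusive date ranges


def seasonality_features(day):
    """Derive calendar features for a single day."""
    return {
        "month": day.month,
        "day_of_week": day.weekday(),  # Monday = 0 ... Sunday = 6
        "is_public_holiday": day in PUBLIC_HOLIDAYS,
        "is_school_holiday": any(start <= day <= end for start, end in SCHOOL_HOLIDAYS),
    }


features = seasonality_features(date(2020, 12, 25))
```

The COVID-related variables would be joined on the same date key from their own daily series.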

Why is this useful?

Know to-date per capita consumption figures across the board
Run various what-if scenarios. Any scenario-related data that a company expects to use must be included in the model. Alternatively, the model can be quickly re-trained to include new types of data
Improve marketing campaigns and their targeting
This model can still be used at meter read window level to predict what the next / current meter reading would look like. We can then fully validate it reading by reading, with 80-90% accuracy, and with overall PHC / PCC figures that vary by <1% compared to actuals
The daily models can be skipped and the general and final model can use the standard 'daily_consumption' as dependent variable rather than the redistributed 'new_daily_consumption' if it feels like we are doing too much data manipulation here.
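The window-level validation mentioned in this list could be computed along the following lines. The README does not define how accuracy is measured, so the definitions used here (one minus the relative error for each reading, and the relative difference between summed predictions and summed actuals overall) are assumptions, as are the function name and the sample figures.

```python
def window_validation(actual, predicted):
    """Compare predicted and actual consumption per meter read window.

    Returns the per-reading accuracy (1 - relative error) and the overall
    relative deviation between total predicted and total actual consumption.
    Assumes every actual value is positive.
    """
    per_reading_accuracy = [1 - abs(p - a) / a for a, p in zip(actual, predicted)]
    overall_deviation = abs(sum(predicted) - sum(actual)) / sum(actual)
    return per_reading_accuracy, overall_deviation


# Hypothetical window totals, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [90.0, 220.0, 392.0]
accuracy, deviation = window_validation(actual, predicted)
```

Individual readings can each be off by several percent while the aggregate deviation stays small, because over- and under-predictions partly cancel when summed.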

Next steps

Source smart meter data and map insights to existing network by using clustering (acorn types, property variables and so on) or simple mapping onto acorn types for example.
Install smart meters in one representative DMA for insight as well as validation
Share ideas among water companies and propose changes / improvements to this model

Requirements

256GB Memory
12-24+ hours running time, depending on the number of cores used (we used 32)
Jupyter Notebook - see requirements.txt for libraries used. Use 'pip install -r requirements.txt' to install the libraries.

Any questions?

Please post them here and we'll do our best to answer them.
