# Understanding Water Consumption

# Introduction 

### Background information on water companies in the UK:

Water companies in England and Wales were privatised in 1989 and, as most water companies are regional monopolies, household customers cannot choose their water company. Water companies are regulated by Ofwat, the economic regulator, and must submit returns as part of a five-year price review cycle. The current regulatory period is known as PR19; the next will be PR24.

https://commonslibrary.parliament.uk/research-briefings/cbp-8931/#:~:text=The%20water%20industry%20in%20England%20and%20Wales%20was,or%20switch%20their%20supplier%20and%20competition%20is%20limited.


Increasing water scarcity due to, amongst other factors, population growth, climate change, and changing weather patterns makes an understanding of the drivers of household water consumption critical for water resource management. By understanding these drivers, policymakers and water companies can develop effective strategies to manage water resources, reduce wastage, and ensure sustainable water supplies for the future.

Increased abstraction, for both household and non-household use, places pressure on freshwater ecosystems, reduces river flows, and depletes groundwater resources. The production, treatment and distribution of potable water also increase energy consumption and carbon emissions, all of which carry financial and non-financial costs. Infrastructure planning, policy amendments, and behavioural change campaigns all require a detailed understanding of the drivers of ever-increasing water consumption if water scarcity, escalating costs and environmental impacts are to be managed, net zero positions reached, and behavioural change embedded to ensure sustainable water management for the future.

In March 2023 the scientific journals Water, Environments, Forests and Remote Sensing called for the submission of papers on the topic “Remote Sensing in Water Resources Management Models” (https://www.mdpi.com/topics/Remote_Sensing_Water), indicating a desire for further research on water resource management in general.

A standard method for tracking water consumption is to calculate per capita consumption (PCC), measured in litres of water used per person per day. One of the many performance commitments for water companies is to reduce their PCC by an agreed number of litres per person per day, the gold standard being 100 litres per person per day. Calculating PCC relies heavily on accurate population statistics at district metered area (DMA) level and on metering of all households.
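The arithmetic behind PCC can be sketched in a few lines. This is a minimal illustration only: it assumes that total household consumption is reported in megalitres per day (Ml/d) and that population is a simple head count. The function name `per_capita_consumption` and the figures are hypothetical and are not taken from any water company return.

```python
def per_capita_consumption(household_consumption_ml_per_day: float, population: int) -> float:
    """Return PCC in litres per person per day.

    household_consumption_ml_per_day: total household consumption in megalitres per day.
    population: number of people served by household supplies.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    litres_per_day = household_consumption_ml_per_day * 1_000_000  # 1 Ml = 1,000,000 litres
    return litres_per_day / population


# Illustrative figures only: 70 Ml/d supplied to 500,000 people.
pcc = per_capita_consumption(70.0, 500_000)
print(f"PCC: {pcc:.1f} litres per person per day")
```

Because population is the denominator, any error in the population estimate feeds directly into the PCC figure.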

Not all households within the UK are metered, with meter penetration below 60% in some areas. Where meters exist, meter reading can prove challenging due to limited access to properties, the high cost of manual and drive-by meter readings, and the reluctance of customers to actively share details of their personal water consumption with water companies. To date, evidence shared with households on the impact of water consumption on their environment has been restricted mostly to theoretical statements or less than robust statistics.

Water companies' efforts to reduce household consumption rely heavily on behavioural change interventions to encourage lower water use. The pushback from consumers is that water companies are not doing their bit to reduce leakage, which leaves people less motivated to reduce their own water consumption.

A more scientific method of depicting water consumption at a DMA level, without the need to meter every household, would provide evidence to communities of their overall impact on water consumption and provide valuable input into behavioural change campaigns, whilst also providing water companies insight into which areas to target high cost infrastructure spend such as smart metering programmes, as a longer term solution to achieving individual accountability for decreasing household consumption.   
 
This tutorial provides some very basic tools that can be shared with non-technical users, whilst also encouraging researchers and developers to explore further how measurements such as built-up indices, land use classification, and the correlation of building age with high consumption (the hypothesis being that older households may have unknown customer-side leakage) can be used to understand the drivers of PCC.

Sharing empirical evidence with consumers may encourage the adoption of sustainable, water-efficient behaviours.

The data used is publicly available. The intention is to supplement it in the future with more advanced Python scripting, Google Earth Engine analysis and water company specific data. In addition to exploring drivers of consumption, the aim is to challenge the hypothesis that increasing household consumption is the root cause of increasing water demand in areas with recorded high PCC, by uncovering the relationship between leakage, non-household and household consumption and how this correlates with land use changes over time.
<br>
<br>
**Literature review:**<br> 
<br>
Prediction of water consumption based on population (https://link.springer.com/article/10.1007/s11356-021-12368-0)

https://www.researchgate.net/publication/363059307_Assessing_the_causality_relationship_and_time_series_model_for_electricity_consumption_per_capita_and_human_development_in_Colombia

Relationship between urban ecological water demand and land use structure in rapid urbanization area
https://www.researchgate.net/publication/289350922_Relationship_between_urban_ecological_water_demand_and_land_use_structure_in_rapid_urbanization_area

https://www.researchgate.net/publication/344333563_Land-Use_Change_and_Future_Water_Demand_in_California%27s_Central_Coast

**Note:**
<br>
The tools described in this tutorial do not provide an end-to-end solution for understanding water consumption. They support the process and are intended to stimulate further exploration of water consumption drivers, with a view to discovering alternative methods for monitoring and managing household water consumption.

## Set-up and Installation 

Clone the repository: 
How
URL to repository
Desired directory to clone it to

Create and activate the conda environment: 
Separate environment
Instructions on how to create and activate it 

Install dependencies: 
List dependencies for your code
requirements.txt file in the repository with the necessary packages and versions; install using conda install

Data requirements: 
Details of data required 
Instructions on where to obtain them, links, sample data

Execution instructions: 
Main entry point / specific script to execute. 

Examples and tutorials: 
Example use cases to demonstrate the code. 
Step-by-step instructions and sample input/output to help

Troubleshooting and support: 
Common issues and guidance on how to resolve them



## Repository Link

Tools covered in this tutorial are:

1. Creating a choropleth map of water company boundaries given a shapefile
2. Amending the choropleth map to create a map showing per capita consumption for a selected period given consumption data
3. Creating an interactive folium map showing per capita consumption for a given period for each water company
4. Selecting and downloading a satellite image using the SentinelAPI for a given time period for a given water company area
5. Pearson's correlation analysis to determine whether there is a correlation between household water consumption and land use, and between population and household consumption
6. Stacking image bands to create a composite image (3 bands)
7. Principal Component Analysis using two jpeg images

Data sources: 


## Creating a choropleth map of water company boundaries given a shapefile

***Methodology:***

Water company boundaries can be visualised using the function wrz_boundaries within the water_company_boundaries module (data source is xxxx). The function returns the number of companies and the choropleth map.

Packages required to run this code include os, geopandas, pyplot from matplotlib and crs from cartopy.

The function wrz_boundaries is defined within the water_company_boundaries module. This function generates a plot displaying the boundaries of water companies in England and Wales based on the provided company data. The function reads the company data from a given file (the path to the file is hardcoded), creates a GeoDataFrame using GeoPandas, and plots the boundaries of the water companies on a map. Each company is represented by a different colour. The resulting plot provides a visual representation of the geographic distribution of water companies.

The coordinate reference system for the plot is set to the British National Grid (EPSG:27700), a Transverse Mercator projection, using crs from cartopy. Pyplot is used to plot the map.

The colour scheme can be changed; examples include 'jet' (blue to red through green and yellow) and 'cividis' (designed for people with colour vision deficiencies, as it transitions from dark to light with noticeable variations in brightness and hue). This function is hardcoded to 'viridis'. More detail on the available choices can be found in the matplotlib documentation (https://matplotlib.org/stable/tutorials/colors/colormaps.html).
     
The column plotted (COMPANY) is hardcoded. The data source, however, is passed in through a variable called company_data. The example data is available in the 'waterdemand' GitHub repository or can be downloaded from the Commons Library (https://commonslibrary.parliament.uk/constituency-information-water-companies/#watercompanies). The output of the function is a choropleth plot of the water resource boundaries for all water companies in England and Wales.

The title of the graph is hardcoded within the function.

***A refinement to this code would be to create variables for the column to plot and for the title, to add gridlines and to add a scale bar.***


**Calling the function module:**

In [None]:
from water_company_boundaries import wrz_boundaries
company_data = 'data_files/WaterSupplyAreas_incNAVs v1_4.shp'  
wrz_boundaries(company_data)  # create a choropleth map of the English and Welsh water supply areas

The above code also creates a png image called wrz.png which is written to the data_files folder within the waterdemand directory.

<div style="display:flex">
    <img src="data_files/wrz.png" style="width:60%">
    <div style="margin-left:5px">
    
     
        
<br>
<br>
<br>
<br>
Images such as this can be used as input into behavioural change campaigns and strategic water resource management, or to identify areas that require further investigation.
        <br>
        <br>
Although this map is at a water resource zone level, this script can be easily modified to show district metered areas if required. 

    </div>
</div>


## Amending the choropleth map to create a map showing per capita consumption for a selected period given consumption data

***Methodology:***

Using the above choropleth map as a starting point, other data can be merged with the water company shapefile to map, for instance, per capita consumption for a selected period.

Packages required to run this code include os, pandas, geopandas, pyplot from matplotlib and crs from cartopy.

This function is hard-coded to use the same water resource boundaries as given above. Consumption data is from Ofwat PR24 submission open data available from  https://www.ofwat.gov.uk/publication/historical-performance-trends-for-pr24-v2-0/.

***Note: The code can be amended to visualise total household consumption, leakage etc as this information is available within the provided dataset.*** 

The function loads geographical data (using geopandas) and statistical data (using pandas), merges them on the water company acronyms, and creates a choropleth map from the merged GeoDataFrame. The map shows the variation in PCC across different areas, which helps to visualise differences in water consumption patterns and to identify areas of high consumption that can then be examined in further detail.
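The merge step is where mismatched acronyms silently drop or blank out a company, so it can help to check it separately. The sketch below uses plain pandas DataFrames as stand-ins for the shapefile attribute table and the Ofwat data. The column name `Acronym`, the company names and all figures are assumptions made for illustration; a GeoDataFrame offers the same `merge` method.

```python
import pandas as pd

# Hypothetical stand-in for the attribute table of the boundary shapefile.
boundaries = pd.DataFrame({
    "Acronym": ["AAA", "BBB", "CCC"],
    "COMPANY": ["Company A", "Company B", "Company C"],
})

# Hypothetical PCC figures (litres per person per day) keyed by the same acronym.
pcc = pd.DataFrame({
    "Acronym": ["AAA", "BBB", "DDD"],
    "2019-20": [135.0, 150.5, 142.0],
})

# A left merge keeps every boundary; indicator=True records which rows found a match.
merged = boundaries.merge(pcc, on="Acronym", how="left", indicator=True)

# Companies without consumption data would be drawn without a colour, so list them.
unmatched = merged.loc[merged["_merge"] == "left_only", "Acronym"].tolist()
print(merged[["Acronym", "2019-20"]])
print("No PCC data for:", unmatched)
```

A left merge is used here so that boundaries with no matching consumption record stay visible and can be reported, rather than disappearing from the map.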

The input for this function is a string called pcc_period which gives the time period for which the PCC data is being visualized. The format for the data is YYYY-YY, for example '2019-20' or '2011-12'.
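Since the period is passed as free text, a small check before calling the function can catch typing errors early. The helper below is a hypothetical addition, not part of the chloropleth module; it assumes that a valid period always spans two consecutive years.

```python
import re


def is_valid_pcc_period(pcc_period: str) -> bool:
    """Check that a period looks like 'YYYY-YY' and spans consecutive years."""
    match = re.fullmatch(r"(\d{4})-(\d{2})", pcc_period)
    if match is None:
        return False
    start_year = int(match.group(1))
    end_suffix = int(match.group(2))
    # The two-digit suffix must be the year after the start year, e.g. 2019 -> 20.
    return (start_year + 1) % 100 == end_suffix


for period in ["2019-20", "2011-12", "2019-21", "2019/20"]:
    print(period, is_valid_pcc_period(period))
```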

The function returns the choropleth map, which displays the average per capita consumption per water company area for the selected period.

The title is hardcoded into the function. 

***Further enhancements to this code would be to create a template which could be populated with the required data to allow visualisation of a district metered area (DMA) within a water company boundary and to create a variable for the title.***

**Caution:** 
<br>
If an image has already been generated and needs amending, it must be deleted from the directory before rerunning the code if the output image is to keep the same name.
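The deletion can be scripted instead of done by hand. The following is a minimal sketch using only the standard library; the helper name and the file path are hypothetical and should be replaced with the file the function actually writes.

```python
from pathlib import Path


def clear_previous_output(image_path: str) -> bool:
    """Delete an existing output image; return True if a file was removed."""
    path = Path(image_path)
    if path.is_file():
        path.unlink()
        return True
    return False


# Hypothetical output name; replace with the file the function writes.
removed = clear_previous_output("data_files/example_pcc.jpg")
print("Removed old image:", removed)
```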

The function module is called as follows: 


In [None]:
from chloropleth import chloropleth_pcc 

pcc_period = '2019-20'
chloropleth_pcc(pcc_period)

Below are examples of two periods showing the change in PCC.

<div style="display:flex">
    <img src="data_files/2011_12pcc.jpg" style="width:50%">
    <img src="data_files/2019_20pcc .jpg" style="width:50%">
</div>

## Creating an interactive folium map showing per capita consumption for a given period for each water company

***Methodology:***

An interactive view of the choropleth map can be created using a Folium map (Figure 5). This map view is valuable to behavioural change experts, as they can zoom into areas of high consumption to identify environmental areas of interest to use as strategic nudges (reference nudging and choice architecture) that influence individuals to adopt desired behaviours without feeling coerced.

As for the choropleth maps detailed above, this function requires os, pandas and geopandas. In addition, it requires folium, which creates an interactive map that can be viewed in a web browser. Folium also allows for customisable markers and icons and layer control, and can be used alongside the numpy and pandas libraries. Additional information on functionality can be found at https://python-visualization.github.io/folium/

The function folium_pcc_map is imported from the folium_pcc module and creates a basic folium map showing PCC data for water companies. Although this function is hardcoded, there is a standard function called folium_map within the module std_folium_map, covered later in this tutorial, that takes input variables. The function folium_pcc_map returns an interactive map representing PCC data. This can be amended to include other relevant data, for example leakage or total household consumption.

***This code has limited value but is simple and easy to use if needed by a person not familiar with Python.***

The function module is called as follows: 


In [None]:
from folium_pcc import folium_pcc_map
from IPython import display

pcc_map = folium_pcc_map()
display.display(pcc_map)

<div style="display:flex">
    <img src="folium_.png" style="width:80%">
</div>

Figure 5: Interactive Folium map of per capita water consumption per water company area.

## Selecting and downloading a satellite image using the SentinelAPI for a given time period for a given water company area

<br>
Once an area has been selected for further review, for instance Bournemouth, which appears to have a high PCC, the Sentinel satellite image can be downloaded to continue with further analysis.

***Methodology:***

The image for the selected area is downloaded using the download_best_overlap_image function from the download_sat_image_company module.
    
This function uses the SentinelAPI (an application programming interface for accessing data from the European Space Agency's Sentinel satellites) to download the best overlapping image for the selected water company area within a specified date range. The function uses geopandas (to select the area to search), sentinelsat (to retrieve the image), IPython (to display the image retrieved) and os (to read the geodataframe into memory).

Inputs to the function are company_detail (a string giving the water company area to cover; in the provided data set, options are taken from the AreaServed column, which can be viewed in the interactive popup in the folium map above), and date_start and date_end (the date range for the search, in the format 'YYYYMMDD').
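Because the dates are passed as plain strings, a quick check of the search window before starting a long download may be worthwhile. The helper below is a hypothetical sketch using only the standard library and is not part of the download_sat_image_company module.

```python
from datetime import datetime


def parse_search_window(date_start: str, date_end: str):
    """Parse two 'YYYYMMDD' strings and check that they form a valid range."""
    start = datetime.strptime(date_start, "%Y%m%d").date()
    end = datetime.strptime(date_end, "%Y%m%d").date()
    if start >= end:
        raise ValueError("date_start must be earlier than date_end")
    return start, end


start, end = parse_search_window("20200601", "20230101")
print(f"Searching {(end - start).days} days from {start} to {end}")
```

Printing the length of the window is a simple reminder that long search periods return more images and take longer to download.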

As provided in GitHub, the .py file produces no output. To download or view the satellite image retrieved, the relevant line in the code should be uncommented.

The function is set to create a minimum rotated rectangle polygon to use as a search area and to return images with less than 10% cloud cover; this can be adjusted as required. The best overlapping image is selected by calculating the percentage overlap between the search polygon and each image retrieved, then sorting these in descending order so that the image with the highest overlap can be returned. All matches or the best available match can be downloaded. The download can also be limited to the image bands only.
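The ranking logic can be illustrated without any geospatial libraries. The sketch below is a simplification and not the code in download_sat_image_company: it uses axis-aligned rectangles as a stand-in for the rotated search rectangle and the real image footprints, and it assumes that overlap is measured as the share of the search area covered by each image. All names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle: (min_x, min_y, max_x, max_y)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @property
    def area(self) -> float:
        return max(0.0, self.max_x - self.min_x) * max(0.0, self.max_y - self.min_y)


def overlap_percent(search: Box, footprint: Box) -> float:
    """Percentage of the search area covered by an image footprint."""
    inter = Box(
        max(search.min_x, footprint.min_x),
        max(search.min_y, footprint.min_y),
        min(search.max_x, footprint.max_x),
        min(search.max_y, footprint.max_y),
    )
    if search.area == 0:
        return 0.0
    return 100.0 * inter.area / search.area


def rank_by_overlap(search: Box, footprints: dict) -> list:
    """Return (image_id, overlap %) pairs, highest overlap first."""
    scored = [(image_id, overlap_percent(search, box)) for image_id, box in footprints.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


search_area = Box(0.0, 0.0, 10.0, 10.0)
candidates = {
    "image_a": Box(5.0, 0.0, 15.0, 10.0),    # covers the right half of the search area
    "image_b": Box(-1.0, -1.0, 11.0, 11.0),  # covers the whole search area
    "image_c": Box(20.0, 20.0, 30.0, 30.0),  # does not overlap at all
}
ranking = rank_by_overlap(search_area, candidates)
best_id, best_overlap = ranking[0]
print(ranking)
```

With real footprints the intersection would be computed on polygons, but the scoring and descending sort follow the same pattern.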

***Note:
The downloads take considerable time and should only be executed if required. The longer the search period selected, the more images will be downloaded, and high volumes can cause the download to fail. If an error occurs during execution, adjust the selection date range to see if this resolves the issue. Using the '%matplotlib' magic command allows this code to run in a Jupyter Notebook.***

The function module is called as follows:

In [None]:
#%matplotlib inline

import download_sat_image_company
from IPython.display import Image
from chloropleth import chloropleth_pcc
import folium_pcc
import folium
from IPython.display import HTML

Once the required packages are imported, the water company area and date range for the search are given. 

In [None]:
# select water company to review in further detail:
company_detail ='Bournemouth' # taken from the AreaServed column in the wrz geodataframe
date_start='20200601' # start search from this date
date_end='20230101' # end search at this date

download_sat_image_company.download_best_overlap_image(company_detail,date_start,date_end) # call the function from this module

Satellite images retrieved should be preprocessed prior to conducting further analysis. For the purposes of this tutorial, it is assumed that preprocessing has been completed. The results of the Principal Component Analysis that follows later in this tutorial will be severely affected if the images are not preprocessed.

In the next steps, for illustrative purposes, the CORINE land cover data from data.gov.uk will be used to understand the correlation between water consumption and land use (https://www.data.gov.uk/dataset/cd2c59e7-afd9-471d-a056-c5845619dcd7/corine-land-cover-2018-for-the-uk-isle-of-man-jersey-and-guernsey).

### Pearson's correlation analysis to test for correlation between household water consumption and land use, and population and household consumption

***Methodology:***


Using the standard Pearson correlation coefficient, the correlation between household water consumption and population can be calculated, where +1 indicates a perfect positive correlation and -1 a perfect negative correlation. The formula for calculating this coefficient (r) is:

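$$r = \frac{\sum_{i=1}^{n} (x_i - m_x)(y_i - m_y)}{\sqrt{\sum_{i=1}^{n} (x_i - m_x)^2 \sum_{i=1}^{n} (y_i - m_y)^2}}$$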

where x and y are two vectors of length n, and $m_x$ and $m_y$ are the means of x and y respectively. The robustness of the calculation relies on an adequate sample size (between 20 and 30 observations), and outliers in the data can distort the result.
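As a minimal sketch, the coefficient can be computed directly from its definition: the sum of the products of the deviations from the two means, divided by the square root of the product of the summed squared deviations. The function name and the data values below are made up for illustration and are not water company figures; NumPy's built-in `corrcoef` is used only as a cross-check.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient computed directly from the formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be one-dimensional and of equal length")
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


# Made-up illustrative values, not water company data.
population = [1.2, 2.5, 3.1, 4.8, 6.0]
consumption = [160.0, 340.0, 410.0, 650.0, 790.0]

r = pearson_r(population, consumption)
print(round(r, 4))
print(np.isclose(r, np.corrcoef(population, consumption)[0, 1]))
```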

The code to calculate r is in two parts: the first creates the merged dataset on which the correlation analysis is run, and the second computes the correlation coefficient. Inputs required to run the code are the shapefile containing the water company data, the household consumption and population figures for each water company (these were extracted from the xxxxxx), the shapefile for land use data and the labels for the land use data. The script then completes the following steps:

Part 1:
1. Loads water company data from the shapefile and removes unnecessary columns.
2. Filters the data to select features with specific area types.
3. Fixes an issue with a specific record in the 'COMPANY' column (where Northumbrian Water appears under two different names).
4. Performs a union operation on the geometries based on the 'COMPANY' column.
5. Creates a new GeoDataFrame with the unioned geometries for each company.
6. Loads correlation data from a CSV file and merges it with the water company data.
7. Converts specific columns to numeric values and filters out NaN values.
8. Calculates the Pearson correlation coefficient between household population and household consumption.
9. Loads land use data from a shapefile and merges it with a CSV file containing labels.
10. Drops unnecessary columns from the merged land use data.
11. Performs a spatial join between the water company data and merged land use data.
12. Groups the data by company and land use label, summing the 'Area_Ha' column.
13. Creates a new GeoDataFrame with the grouped data, setting the geometry as the centroid of each land use label.
14. Filters the rows where the land use label includes 'urban'.
15. Groups the rows by company and calculates the sum of the 'Area_Ha' column for each group.
16. Rounds the values in the 'Area_Ha' column to 2 significant digits.
17. Merges the water company data with the area by company data.

Part 2:
18. Calculates the Pearson correlation coefficient between household consumption and area of urban land use.
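The later steps (filtering urban land use, totalling area per company, merging and correlating) can be sketched with pandas alone. This is an illustration under assumptions, not the code in correlation_landuse.py: the spatial join is replaced by a small hand-made table, the column names `label` and `hh_consumption` are hypothetical, all values are invented, and step 16 is read here as rounding to two decimal places.

```python
import pandas as pd

# Hypothetical result of the spatial join: one row per company and land use label.
land_use = pd.DataFrame({
    "COMPANY": ["A", "A", "B", "B", "C", "C", "D"],
    "label": [
        "Continuous urban fabric", "Pastures",
        "Discontinuous urban fabric", "Green urban areas",
        "Continuous urban fabric", "Mixed forest",
        "Discontinuous urban fabric",
    ],
    "Area_Ha": [1200.456, 5000.0, 800.111, 150.222, 2500.5, 900.0, 400.75],
})

# Hypothetical household consumption per company (units as in the source data).
consumption = pd.DataFrame({
    "COMPANY": ["A", "B", "C", "D"],
    "hh_consumption": [210.0, 170.0, 420.0, 90.0],
})

# Steps 14-15: keep labels containing 'urban', then total the area per company.
urban = land_use[land_use["label"].str.contains("urban", case=False)]
urban_area = urban.groupby("COMPANY", as_index=False)["Area_Ha"].sum()

# Step 16, read here as rounding to two decimal places.
urban_area["Area_Ha"] = urban_area["Area_Ha"].round(2)

# Step 17: attach the urban area to the consumption figures.
merged = consumption.merge(urban_area, on="COMPANY", how="inner")

# Step 18: Pearson correlation between consumption and urban area.
r = merged["hh_consumption"].corr(merged["Area_Ha"], method="pearson")
print(merged)
print(f"Pearson r = {r:.3f}")
```

Note that a plain substring match on 'urban' also captures labels such as 'Green urban areas', which may or may not be wanted when the aim is to measure built-up land.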


***Note:
A further refinement to this script would be to create a template for the water company data input and to create functions for calculating the Pearson correlation coefficient. This script could also be rerun on non-household data. In addition, code to calculate consumption (non-household and household) per hectare of land defined as urban land use could be added and used to identify outliers for further analysis.***

The script is run as follows:

In [None]:
%run correlation_landuse.py 

The results indicate that there is a very strong correlation between population and household water consumption. There is also a good correlation, though not as strong, between the area of urban land use and household water consumption. The next step would be to check the correlation between household populations as reported by water companies and those reported by Ordnance Survey. If granular water consumption data can be obtained (for instance logger data for inflows and outflows of water at a district metered area), this can be analysed in conjunction with built-up and vegetation health indices to see if these can be used as predictors of higher than average water consumption volumes.

### Principal Component Analysis of multi-temporal images

Using the satellite images retrieved from the SentinelAPI, Principal Component Analysis can be used to understand how changes in land use have affected water consumption over time. Such insights may provide valuable scientific evidence to validate the need for consumers to reduce their water consumption.

***Methodology:***

Principal Component Analysis (PCA) detects changes in land use over time, which allows the correlation between land use change and water consumption in a given area to be analysed further. As this is an unsupervised method, it is quick to run (if automated) and produces a difference image. The method is described in detail in the tutorial created by Kumar (2017); changes have been made to the Python module, PCSKmeans, to replace redundant libraries. The updated module is named PCSKmeans_updated.

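For the module to be called, the following packages need to be available: cv2 (installed as opencv-python), numpy, sklearn (installed as scikit-learn), collections, PIL (installed as Pillow), imageio, os, and math. Of these, collections, os, and math are part of the Python standard library and need no installation.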

The following functions are defined in the module:

***find_PCAKmeans(imagepath1, imagepath2):*** This is the main function that takes two image paths as inputs and performs change detection. It reads the images, resizes them, calculates the difference image, performs PCA on the difference image, and prepares the feature vector space (FVS).

***find_vector_set(diff_image, new_size):*** This function divides the difference image into smaller blocks, extracts feature vectors from each block, and calculates the mean vector from all the feature vectors.

***find_FVS(EVS, diff_image, mean_vec, new):*** This function extracts blocks from the difference image, flattens them into feature vectors, combines them into a feature vector space, and transforms the feature vector space using PCA by multiplying it with the eigenvectors and subtracting the mean vector.

***clustering(FVS, components, new):*** This function performs K-means clustering on the feature vector space. It assigns each feature vector to a cluster, identifies the least common cluster as a reference for change detection, reshapes the cluster assignments into a change map, and returns the least common cluster index and the change map.
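The pipeline described by these functions can be sketched end to end with NumPy on synthetic data. This is a simplified re-implementation under stated assumptions, not the code in PCSKmeans_updated: the block size of 5, the three retained components, the two clusters and the synthetic images are all choices made for this sketch, and a small hand-written k-means stands in for the scikit-learn call so that the example has no other dependencies.

```python
from collections import Counter

import numpy as np

BLOCK = 5  # block side length; the value is an assumption for this sketch


def block_vectors(diff, step):
    """Flatten BLOCK x BLOCK windows of diff, moving `step` pixels at a time."""
    rows, cols = diff.shape
    vectors = [
        diff[r:r + BLOCK, c:c + BLOCK].ravel()
        for r in range(0, rows - BLOCK + 1, step)
        for c in range(0, cols - BLOCK + 1, step)
    ]
    return np.array(vectors, dtype=float)


def kmeans(data, k, iterations=20):
    """Very small Lloyd's k-means, seeded with evenly spaced rows of the sorted data."""
    order = np.argsort(data[:, 0])
    centres = data[order[np.linspace(0, len(data) - 1, k).astype(int)]]
    for _ in range(iterations):
        distances = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = data[labels == j].mean(axis=0)
    return labels


# Two synthetic 40 x 40 greyscale "images"; a bright square appears in the second.
rng = np.random.default_rng(0)
image1 = rng.normal(100.0, 2.0, size=(40, 40))
image2 = image1 + rng.normal(0.0, 2.0, size=(40, 40))
image2[10:20, 10:20] += 80.0

diff = np.abs(image1 - image2)

# Non-overlapping blocks give the vector set used to fit the PCA.
vector_set = block_vectors(diff, step=BLOCK)
mean_vec = vector_set.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(vector_set - mean_vec, rowvar=False))
components = eigenvectors[:, np.argsort(eigenvalues)[::-1][:3]]  # top 3 components

# One overlapping block per window position, centred and projected onto the components.
fvs = (block_vectors(diff, step=1) - mean_vec) @ components

# Cluster the feature vectors; the least common cluster is treated as "change".
labels = kmeans(fvs, k=2)
least_common = Counter(labels.tolist()).most_common()[-1][0]
side = diff.shape[0] - BLOCK + 1
change_map = (labels == least_common).reshape(side, side)
print("Changed positions:", int(change_map.sum()), "of", change_map.size)
```

In this sketch each entry of `change_map` refers to the window whose top-left corner sits at that position. Treating the least common cluster as change rests on the assumption that most of the scene is unchanged between the two dates.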

The module also includes additional code to save the change map and clean change map as image files. Finally, it provides the image paths and calls the find_PCAKmeans function to execute the change detection process.


The function module is called as follows:

In [None]:
%run PCSKmeans_updated

# References