# Understanding Water Consumption: Multi-factor Analysis

# Introduction 

### Background information on water companies in the UK:

Water companies in England and Wales were privatised in 1989 and, as most water companies are regional monopolies, household customers cannot select their water company. Water companies are regulated by Ofwat, the economic regulator, and must submit returns as part of a 5-year price review cycle. The current regulatory period is known as PR19; the next regulatory period will be PR24.

https://commonslibrary.parliament.uk/research-briefings/cbp-8931/#:~:text=The%20water%20industry%20in%20England%20and%20Wales%20was,or%20switch%20their%20supplier%20and%20competition%20is%20limited.


Increasing water scarcity, driven by population growth, climate change and changing weather patterns (amongst other factors), makes an understanding of the drivers of household water consumption critical for water resource management. By understanding these drivers, policymakers and water companies can develop effective strategies to manage water resources, reduce wastage, and ensure sustainable water supplies for the future.

Increased abstraction, for both household and non-household use, places pressure on freshwater ecosystems, reduces river flows, and depletes groundwater resources. The production, treatment and distribution of potable water also increase energy consumption and carbon emissions, all of which carry financial and non-financial costs. Infrastructure planning, policy amendments, and behavioural change campaigns and initiatives all require a detailed understanding of the drivers of ever-increasing water consumption if water scarcity, escalating costs and environmental impacts are to be managed, net zero positions reached, and behavioural change embedded to ensure sustainable water management for the future.

In March 2023 the scientific journals Water, Environments, Forests and Remote Sensing called for submissions of papers on the topic “Remote Sensing in Water Resources Management Models” (Ref https://www.mdpi.com/topics/Remote_Sensing_Water), indicating that there is a desire for further research focused on water resource management in general.

A standard method for tracking water consumption is to calculate per capita consumption (PCC): litres of water used per person per day. One of the many performance commitments for water companies is to reduce their PCC by an agreed number of litres per person per day, the gold standard being 100 litres per person per day. Calculating PCC relies heavily on accurate population statistics at district metered area (DMA) level and on metering of all households.
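As a minimal sketch of the arithmetic behind PCC, the snippet below divides a total metered volume by the population and the number of days. The function name and the DMA figures are hypothetical and chosen only to illustrate the calculation; they are not taken from any water company data.

```python
def per_capita_consumption(total_litres, population, days):
    """Return the average litres used per person per day."""
    if population <= 0 or days <= 0:
        raise ValueError("population and days must be positive")
    return total_litres / (population * days)


# Hypothetical DMA: 1,200 residents using 54,750,000 litres over a 365-day year
pcc = per_capita_consumption(54_750_000, 1_200, 365)
target = 100  # gold-standard PCC in litres per person per day

print(f"PCC: {pcc:.1f} l/person/day")                      # PCC: 125.0 l/person/day
print(f"Reduction needed: {pcc - target:.1f} l/person/day")  # Reduction needed: 25.0 l/person/day
```

The result is only as reliable as the population estimate in the denominator, which is why accurate DMA-level population statistics matter.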

Not all households within the UK are metered; meter penetration is below 60% in some areas. Where meter data exists, meter reading can prove challenging due to limited access to properties, the high cost of manual and drive-by meter readings, and the reluctance of customers to actively share details of their personal water consumption with water companies. To date, evidence shared with households on the impact of water consumption on their environment has been restricted mostly to theoretical statements or less than robust statistics.

Water companies' efforts to reduce household consumption rely heavily on behavioural change interventions that encourage lower water use. The pushback from consumers is that water companies are not doing their bit to reduce leakage, which leaves people less motivated to reduce their own water consumption.

A more scientific method of depicting water consumption at a DMA level, without the need to meter every household, would give communities evidence of their overall impact on water consumption and provide valuable input into behavioural change campaigns. It would also give water companies insight into where to target high-cost infrastructure spend, such as smart metering programmes, as a longer-term route to individual accountability for reducing household consumption.
 
This tutorial provides some very basic tools that can be shared with non-technical users, whilst also encouraging researchers and developers to explore further how measurements such as built-up indices, land use classification and the correlation of building age with high consumption (the hypothesis being that older households may have unknown customer-side leakage) can be used to understand the drivers of PCC.

Sharing empirical evidence with consumers may encourage the adoption of sustainable, water-efficient behaviours.

The data used is publicly available. The intention is to supplement it in future with more advanced Python scripting, Google Earth Engine analysis and water-company-specific data. In addition to exploring drivers of consumption, the tutorial aims to challenge the hypothesis that increasing household consumption is the root cause of increasing water demand in areas with recorded high PCC, by uncovering the relationship between leakage, non-household and household consumption and how this correlates with land use changes over time.
<br>
<br>
**Literature review:**<br> 
<br>
Prediction of water consumption based on population (https://link.springer.com/article/10.1007/s11356-021-12368-0)

https://www.researchgate.net/publication/363059307_Assessing_the_causality_relationship_and_time_series_model_for_electricity_consumption_per_capita_and_human_development_in_Colombia

Relationship between urban ecological water demand and land use structure in rapid urbanization area
https://www.researchgate.net/publication/289350922_Relationship_between_urban_ecological_water_demand_and_land_use_structure_in_rapid_urbanization_area

https://www.researchgate.net/publication/344333563_Land-Use_Change_and_Future_Water_Demand_in_California%27s_Central_Coast

**Note:**
<br>
The tools described in this tutorial do not provide an end-to-end solution for understanding water consumption. They support the process and are intended to stimulate further exploration of water consumption drivers, with a view to discovering alternative methods for monitoring and managing household water consumption.

<div style="display:flex">
    <img src="method_image.jpg" style="width:100%">
</div>
<br>
***Figure 1: Possible components required to support an end to end process*** <br>(reference CNewmarch Assessment)

## Set-up and Installation 

Clone the repository: 
How
URL to repository
Desired directory to clone it to 

Create and activate the conda environment: 
Separate environment
Instructions on how to create and activate it 

Install dependencies: 
List dependencies for your code
requirements.txt file in the repository with the necessary packages and versions; install using conda install

Data requirements: 
Details of data required 
Instructions on where to obtain them, links, sample data

Execution instructions: 
Main entry point / specific script to execute. 

Examples and tutorials: 
Example use cases to demonstrate the code. 
Step-by-step instructions and sample input/output to help

Troubleshooting and support: 
Common issues and guidance on how to resolve them


## Repository Link

Tools covered in this tutorial are:

- Creating a choropleth map of water company boundaries given a shapefile
- Amending the choropleth map to show per capita consumption for a selected period given consumption data
- Creating an interactive Folium map showing per capita consumption for a given period for each water company
- Pearson's correlation analysis to determine whether household water consumption correlates with land use and with population
- Selecting and downloading a satellite image using the SentinelAPI for a given time period and water company area
- Stacking image bands to create a composite image (3 bands)
- Principal Component Analysis using two JPEG images

Data sources: 


## Creating a choropleth map of water company boundaries given a shapefile

***Methodology:***

Water company boundaries can be visualised using the function wrz_boundaries within the water_company_boundaries module (data source is xxxx). The number of companies and the choropleth map are returned.

Packages required to run this code include os, geopandas, pyplot from matplotlib and crs from cartopy.

The function wrz_boundaries generates a plot of the boundaries of water companies in England and Wales. It reads the company data from the shapefile whose path is passed in as company_data, creates a GeoDataFrame using GeoPandas, and plots the boundaries on a map, with each company shown in a different colour. The resulting plot gives a visual representation of the geographic distribution of water companies.

The coordinate reference system for the plot is set to British National Grid (EPSG:27700), a Transverse Mercator projection, using crs from cartopy. Pyplot is used to plot the map.

The colour scheme can be changed. Examples include 'jet' (blue to red through green and yellow) and 'cividis' (a good choice for people with colour vision deficiencies, as it transitions from dark to light with noticeable variations in brightness and hue). This function is hardcoded to 'viridis'. More detail on the available choices can be found in the matplotlib documentation (https://matplotlib.org/stable/tutorials/colors/colormaps.html).
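As a minimal sketch of how the colour scheme could become a parameter rather than a hardcoded value, the hypothetical helper below samples evenly spaced colours from any named matplotlib colormap. `company_colours` is an illustrative name and is not part of the repository.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex


def company_colours(n_companies, cmap_name="viridis"):
    """Return n_companies evenly spaced hex colours from a named colormap."""
    cmap = plt.get_cmap(cmap_name)  # raises ValueError for an unknown name
    if n_companies == 1:
        return [to_hex(cmap(0.0))]
    # Sample the colormap at equal steps between 0.0 and 1.0
    return [to_hex(cmap(i / (n_companies - 1))) for i in range(n_companies)]


colours = company_colours(5, cmap_name="cividis")
print(colours)
```

Passing the colormap name as an argument keeps 'viridis' as the default while letting a caller switch to 'cividis' without editing the function.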
     
The function plots the company column (COMPANY), which is hardcoded. The data source, however, is a variable called company_data. The example data is available in the 'waterdemand' GitHub repository or can be downloaded from the Commons Library (https://commonslibrary.parliament.uk/constituency-information-water-companies/#watercompanies). The output of the function is a choropleth plot of the water resource boundaries for all water companies in England and Wales.

The title of the graph is hardcoded within the function. 

***A refinement to this code would be to create variables for the column to plot and for the title, and to add gridlines and a scale bar.***


**Calling the function module:**

from water_company_boundaries import wrz_boundaries
company_data = 'data_files/WaterSupplyAreas_incNAVs v1_4.shp'  
wrz_boundaries(company_data)  # calls the function to create a choropleth map of the English and Welsh Water Supply Areas

The above code also creates a png image called wrz.png which is written to the data_files folder within the waterdemand directory.

<div style="display:flex">
    <img src="data_files/wrz.png" style="width:60%">
    <div style="margin-left:5px">
    
     
        
<br>
<br>
<br>
<br>
Images such as this can be used, for example, as input into behavioural change campaigns and strategic water resource management, or to identify areas that require further investigation.
        <br>
        <br>
Although this map is at a water resource zone level, this script can be easily modified to show district metered areas if required. 

    </div>
</div>


## Amending the choropleth map to create a map showing per capita consumption for a selected period given consumption data

***Methodology:***

Using the above choropleth map as a starting point, other data can be merged with the water company shapefile to map, for instance, per capita consumption for a selected period.

Packages required to run this code include os, pandas, geopandas, pyplot from matplotlib and crs from cartopy.

This function is hard-coded to use the same water resource boundaries as given above. Consumption data is from Ofwat PR24 submission open data available from  https://www.ofwat.gov.uk/publication/historical-performance-trends-for-pr24-v2-0/.

***Note:*** The code can be amended to visualise total household consumption, leakage, etc., as this information is available within the provided dataset.

The function loads geographical (using geopandas) and statistical (using pandas) data, merges them based on water company acronyms, and creates a choropleth map using the GeoDataFrame created from the merge. The map shows the variation in PCC across different areas, helping to visualise the differences in water consumption patterns and to identify areas of high consumption, which can then be looked at in further detail.
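The merge step could look roughly like the sketch below. The column names (`acronym`, `area_served`, `period`, `pcc`), the company acronyms and the PCC values are all made up for illustration. Plain pandas DataFrames stand in for the GeoDataFrame, which supports the same `merge` call.

```python
import pandas as pd

# Stand-in for the boundary table (a GeoDataFrame in the real workflow)
boundaries = pd.DataFrame({
    "acronym": ["AAA", "BBB", "CCC"],
    "area_served": ["Area A", "Area B", "Area C"],
})

# Stand-in for the consumption table; values are illustrative, not Ofwat data
consumption = pd.DataFrame({
    "acronym": ["AAA", "BBB", "AAA"],
    "period": ["2019-20", "2019-20", "2011-12"],
    "pcc": [140.0, 152.5, 148.0],
})

# Keep only the requested period, then attach it to every boundary row
selected = consumption[consumption["period"] == "2019-20"]
merged = boundaries.merge(selected, on="acronym", how="left", indicator=True)

# Boundary areas with no consumption record for the period
unmatched = merged.loc[merged["_merge"] == "left_only", "acronym"].tolist()
print(unmatched)  # ['CCC']
```

A left merge with `indicator=True` keeps every boundary on the map and makes it easy to list areas whose acronym did not match, which would otherwise plot as blank.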

The input for this function is a string called pcc_period which gives the time period for which the PCC data is being visualized. The format for the data is YYYY-YY, for example '2019-20' or '2011-12'.
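Because the period is passed as free text, it may help to validate it before filtering the data. The helper below is a hypothetical sketch, not part of the repository, and it assumes that every period spans two consecutive years, as in the examples above.

```python
import re

# Four-digit start year, a hyphen, then a two-digit end year
_PERIOD_PATTERN = re.compile(r"(\d{4})-(\d{2})")


def is_valid_pcc_period(pcc_period):
    """Return True if pcc_period looks like 'YYYY-YY' with consecutive years."""
    match = _PERIOD_PATTERN.fullmatch(pcc_period)
    if match is None:
        return False
    start_year = int(match.group(1))
    end_suffix = int(match.group(2))
    # The two-digit suffix must be the last two digits of the following year
    return (start_year + 1) % 100 == end_suffix


print(is_valid_pcc_period("2019-20"))  # True
print(is_valid_pcc_period("2019-21"))  # False
print(is_valid_pcc_period("201920"))   # False
```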

The function returns the choropleth map, which displays the average per capita consumption per water company area for the selected period.

The title is hardcoded into the function. 

***Further enhancements to this code would be to create a template which could be populated with the required data to allow visualisation of a district metered area (DMA) within a water company boundary, and to create a variable for the title.***

**Caution:** 
<br>
If an image has already been generated and needs to be amended, it must be deleted from the directory before rerunning the code if the name of the output image is to remain the same.
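One way to avoid deleting the file by hand is to remove any existing output just before the plot is saved. The helper below is a hypothetical sketch, not a function from the repository; the demonstration uses a temporary directory and a dummy file rather than a real map image.

```python
import tempfile
from pathlib import Path


def clear_existing_output(image_path):
    """Delete a previously generated image; return True if a file was removed."""
    path = Path(image_path)
    if path.is_file():
        path.unlink()
        return True
    return False


# Demonstration with a dummy file standing in for an earlier output image
with tempfile.TemporaryDirectory() as tmp:
    stale = Path(tmp) / "2019_20pcc.jpg"
    stale.write_bytes(b"old image")
    removed_first = clear_existing_output(stale)   # file existed, so it is removed
    removed_second = clear_existing_output(stale)  # nothing left to remove

print(removed_first, removed_second)  # True False
```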

The function module is called as follows: 


Below are examples of two periods showing the change in PCC.

<div style="display:flex">
    <img src="data_files/2011_12pcc.jpg" style="width:50%">
    <img src="data_files/2019_20pcc .jpg" style="width:50%">
</div>

An interactive view of the above can be created using a Folium map as displayed in Figure z. This map view is valuable to behavioural change experts as they are able to zoom into areas of high consumption to identify environmental areas of interest to use as strategic nudges (reference nudging and choice architecture) to influence individuals to adopt desired behaviours without feeling coerced.   

## Creating an interactive folium map showing per capita consumption for a given period for each water company

The code snippet below creates an interactive map showing PCC for each water company area. The code can be amended to show different periods or data as required.

In [None]:
from folium_pcc import folium_pcc_map
from IPython import display

pcc_map = folium_pcc_map()
display.display(pcc_map)

### Figure v:  Interactive Folium map of per capita water consumption per water company area. 

Once an area has been selected for further review, the Sentinel satellite image can be downloaded. This image can be used to review land use, calculate built-up indices, etc. 

First, we import the packages and functions required to retrieve the image:

In [None]:
#%matplotlib inline

import download_sat_image_company
from IPython.display import Image
from chloropleth import chloropleth_pcc
import folium_pcc
import folium
from IPython.display import HTML

Then we select the water company area to review. The water company name can be selected from the interactive map by looking at the AreaServed field. For this example, Bournemouth is selected as it has a high PCC. 

In [None]:
# Select the water company to review in further detail:
company_detail = 'Bournemouth'
date_start = '20200601'
date_end = '20230101'
download_sat_image_company.download_best_overlap_image(company_detail, date_start, date_end)

This image can be used to ....

For now, the CORINE data from xxx will be used to understand the correlation between water consumption and land use.

In [None]:
#correlation analysis 

In [None]:
#pca analysis 

In [None]:
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from collections import Counter
from PIL import Image
import imageio.v2 as imageio
import os
import math

In [None]:
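def find_PCAKmeans(imagepath1, imagepath2):
    print('Operating')

    # Read both images as single-channel greyscale so that each 5x5 block
    # yields exactly 25 features, as find_vector_set and find_FVS expect.
    image1 = cv2.imread(imagepath1, cv2.IMREAD_GRAYSCALE)
    image2 = cv2.imread(imagepath2, cv2.IMREAD_GRAYSCALE)
    if image1 is None or image2 is None:
        raise FileNotFoundError('Could not read one or both input images')
    print(image1.shape, image2.shape)

    # Round the height and width up to the nearest multiple of 5
    new_size = (np.ceil(np.asarray(image1.shape[:2]) / 5) * 5).astype(int)
    # cv2.resize expects (width, height); int16 avoids uint8 wrap-around when subtracting
    image1 = cv2.resize(image1, (int(new_size[1]), int(new_size[0]))).astype(np.int16)
    image2 = cv2.resize(image2, (int(new_size[1]), int(new_size[0]))).astype(np.int16)

    # Absolute difference image (values stay within 0-255)
    diff_image = np.abs(image1 - image2)
    cv2.imwrite('diff.jpg', diff_image.astype(np.uint8))
    print('\nBoth images resized to', new_size)

    # Fit PCA on non-overlapping 5x5 blocks of the difference image
    vector_set, mean_vec = find_vector_set(diff_image, new_size)

    pca = PCA()
    pca.fit(vector_set)
    EVS = pca.components_

    # Project the 5x5 neighbourhood of every pixel onto the principal components
    FVS = find_FVS(EVS, diff_image, mean_vec, new_size)

    print('\ncomputing k means')

    components = 3
    least_index, change_map = clustering(FVS, components, new_size)

    # Mark the smallest cluster as "changed" (255) and everything else as 0
    change_map[change_map == least_index] = 255
    change_map[change_map != 255] = 0

    change_map = change_map.astype(np.uint8)
    # Diamond-shaped kernel used to erode isolated changed pixels
    kernel = np.asarray(((0, 0, 1, 0, 0),
                         (0, 1, 1, 1, 0),
                         (1, 1, 1, 1, 1),
                         (0, 1, 1, 1, 0),
                         (0, 0, 1, 0, 0)), dtype=np.uint8)
    cleanChangeMap = cv2.erode(change_map, kernel)
    cv2.imwrite("data_files/test_data/changemap.jpg", change_map)
    cv2.imwrite("data_files/test_data/cleanchangemap.jpg", cleanChangeMap)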

In [None]:
def find_vector_set(diff_image, new_size):
    block_size = 5
    num_blocks = (new_size[0] // block_size) * (new_size[1] // block_size)

    vector_set = np.empty((num_blocks, block_size * block_size), dtype=np.int16)
    idx = 0

    for i in range(new_size[0] // block_size):
        for j in range(new_size[1] // block_size):
            block = diff_image[i * block_size:(i + 1) * block_size, j * block_size:(j + 1) * block_size]
            feature = block.flatten()
            vector_set[idx, :] = feature[:vector_set.shape[1]]
            idx += 1

    mean_vec = np.mean(vector_set, axis=0)
    return vector_set, mean_vec

In [None]:
def find_FVS(EVS, diff_image, mean_vec, new):
    # Build one feature vector from the 5x5 neighbourhood of every pixel
    # that lies at least 2 pixels away from the image border.
    feature_vector_set = []
    for i in range(2, new[0] - 2):
        for j in range(2, new[1] - 2):
            block = diff_image[i-2:i+3, j-2:j+3]
            feature_vector_set.append(block.flatten())

    feature_vector_set = np.array(feature_vector_set).reshape((-1, 25))
    # Centre on the training mean, then project onto the principal components
    # (the rows of EVS). This mirrors what PCA.transform does.
    FVS = np.dot(feature_vector_set - mean_vec, EVS.T)
    print("\nfeature vector space size", FVS.shape)
    return FVS

In [None]:

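def clustering(FVS, components, new):
    # n_init=10 matches the historical default and avoids the FutureWarning
    # raised by some recent scikit-learn releases
    kmeans = KMeans(n_clusters=components, n_init=10, verbose=0)
    output = kmeans.fit_predict(FVS)
    count = Counter(output)

    # The smallest cluster is treated as the "changed" class
    least_index = min(count, key=count.get)
    print(new[0], new[1])
    # find_FVS skips a 2-pixel border, hence the -4. The trailing -1 absorbs any
    # remaining dimension. This reshape has been changed and should be checked.
    change_map = np.reshape(output, (new[0] - 4, new[1] - 4, -1))

    return least_index, change_map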

In [None]:
# Get the absolute path to the directory containing the image files
directory = os.path.abspath('data_files/test_data')

if __name__ == "__main__":
    # Construct the absolute paths to the two images to compare
    imagepath1 = os.path.join(directory, 'img1a.jpg')
    imagepath2 = os.path.join(directory, 'img2a.jpg')
    find_PCAKmeans(imagepath1, imagepath2)

In [None]:
from stack_bands import band_stacking_three_bands

band_files = ('data_files/R10m/T30UWB_20221207T111339_B03_10m.jp2', 'data_files/R10m/T30UWB_20221207T111339_B04_10m.jp2', 'data_files/R10m/T30UWB_20221207T111339_B08_10m.jp2')
output_tiff = 'data_files/R10m/T30UWB_20221207T111339.tif'
output_jpg = 'data_files/test_data/Img2a.jpg'

band_stacking_three_bands(band_files, output_tiff, output_jpg)

In [None]:
# Stack three Sentinel-2 bands into a GeoTIFF, then convert the GeoTIFF to a JPEG
import rasterio
import numpy as np

band_files = ('data_files/R10m/T30UWB_20221207T111339_B03_10m.jp2', 'data_files/R10m/T30UWB_20221207T111339_B04_10m.jp2', 'data_files/R10m/T30UWB_20221207T111339_B08_10m.jp2')
output_tiff = 'data_files/test_data/Img2a.tif'
output_jpg = 'data_files/test_data/Img2a.jpg'

# Read the first band of each file into a list
stacked_data = []
for band_file in band_files:
    with rasterio.open(band_file) as band_src:
        stacked_data.append(band_src.read(1))

# Copy the metadata from one of the band files
with rasterio.open(band_files[0]) as src:
    meta = src.meta.copy()

# Update the metadata for the output file: one band per input file, written as
# a GeoTIFF (otherwise the JPEG 2000 driver of the source file would be reused)
meta.update(count=len(stacked_data), driver='GTiff')

# Write the stacked data to the output TIFF file
with rasterio.open(output_tiff, 'w', **meta) as dst:
    dst.write(np.array(stacked_data))

# Convert the GeoTIFF to JPEG
with rasterio.open(output_tiff) as src:
    profile = src.profile
    # Read the data from the GeoTIFF
    data = src.read()

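# Convert the data to the 0-255 range. Cast to float first: Sentinel-2 bands are
# typically stored as 16-bit integers, so multiplying by 255 directly can overflow.
data = (data.astype(np.float64) * 255 / data.max()).astype(np.uint8)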

# Write the data to the JPEG file
with rasterio.open(output_jpg, 'w', driver='JPEG', width=profile['width'], height=profile['height'], count=profile['count'], dtype='uint8') as dst:
    dst.write(data)


### Looking at the correlation between water consumption and land use: 

In [None]:
import correlation_landuse

!python correlation_landuse.py

The results indicate a very strong correlation between population and household water consumption. There is also a good correlation, though not as strong, between the area of urban land use and household water consumption. The next step would be to check the correlation between household populations as reported by water companies and those reported by Ordnance Survey. If granular water consumption data can be obtained (for instance, logger data for inflows and outflows of water at a district metered area), it can be analysed in conjunction with built-up and vegetative health indices to see whether these can be used as predictors of higher-than-average water consumption volumes.
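For readers unfamiliar with the statistic, the sketch below computes Pearson's correlation coefficient from first principles. It is not the code in correlation_landuse.py, and the population and consumption figures are invented purely to show the calculation.

```python
import math


def pearson_r(x, y):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented figures: population (thousands) and household consumption (Ml/day)
population = [100, 200, 300, 400]
consumption = [12, 25, 33, 48]

r = pearson_r(population, consumption)
print(f"Pearson's r: {r:.3f}")
```

Values of r close to 1 indicate a strong positive linear relationship, values close to -1 a strong negative one, and values near 0 little or no linear relationship.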

**Next Steps**: These statistics can also be calculated for non-household water consumption to see whether the correlation coefficients differ between non-household and household consumption. 

Plot the population statistics from census data to see if they show the same pattern as PCC.