andrewmgonzalez/China_Water_Usage


Identification of Water Supply Patterns Across China Using Supervised and Unsupervised Learning

Urbanization, one of the prevailing trends of the 21st century, places great stress on the water resources of cities across the globe. This project investigates the most important variables underlying urban water supply patterns in China, a country that has seen rapid urban growth over the past few decades. We applied statistical learning methods to 12 years of urban water supply data for 627 cities across China and developed a PCA-informed urban water sustainability index to benchmark cities. The results are intended for decision makers in water-stressed urban areas who seek insights into urban water supply patterns using statistical learning techniques.

Our work makes four contributions. First, we identified the variables most responsible for variance in patterns of urban water distribution and management. Second, we identified statistical learning algorithms that achieve high accuracy on prediction and classification problems related to China’s urban water data. Third, we provide a general, systems-level perspective on major urban drinking water use trends in China for the benefit of public-sector stakeholders. Finally, we developed an urban water use sustainability index to benchmark cities against each other and identify areas where water sustainability is lacking.

Our findings can be summarized as follows: (1) PCA showed that approximately 46.8% of variability in the data could be explained by two principal components. (2) Random forest (88.3% test accuracy) and XGBoost (87.7% test accuracy) algorithms were effective in classifying provinces using numerical features of the dataset. (3) Chinese cities have consistently suffered water loss/leakage rates above 20% since 2001, and water prices are closely associated with leakage. (4) Based on the sustainability index, problem cities/regions were identified.
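To make finding (1) and the PCA-informed index more concrete, here is a minimal sketch of how the share of variance explained by two principal components can be computed and turned into a simple composite score. It is not the project's actual code: the synthetic data, the function names `pca_summary` and `composite_index`, and the variance-weighted scoring rule are all assumptions chosen for illustration.

```python
import numpy as np


def pca_summary(X, n_components=2):
    """Return (explained_variance_ratio, scores) for the leading components."""
    X = np.asarray(X, dtype=float)
    # Standardize each column so variables in different units are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    variance = S**2 / (Z.shape[0] - 1)
    ratio = variance / variance.sum()
    # Project each observation onto the leading principal axes.
    scores = Z @ Vt[:n_components].T
    return ratio[:n_components], scores


def composite_index(ratio, scores):
    """Hypothetical index: variance-weighted sum of scores, rescaled to [0, 1].

    The sign of each principal component is arbitrary, so a real index
    would need an explicit rule for which direction counts as "better".
    """
    raw = scores @ (ratio / ratio.sum())
    return (raw - raw.min()) / (raw.max() - raw.min())


# Synthetic stand-in for the real data: 50 hypothetical cities, 6 indicators.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
X[:, 1] += 0.8 * X[:, 0]  # induce some correlation between two indicators

ratio, scores = pca_summary(X)
index = composite_index(ratio, scores)
print(f"Variance explained by 2 components: {ratio.sum():.1%}")
```

On the real dataset, `X` would hold one row per city-year and one column per numerical variable. The figure printed here comes from random data and is unrelated to the 46.8% reported above.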

Data

The dataset comprises 87 variables grouped into six categories: (1) water supply and sale (e.g. daily supply capacity); (2) supply pipelines (e.g. pipe length and pipeline coverage area); (3) supply service (e.g. pressure standard compliance); (4) supply operations management (e.g. electricity consumption per unit of water supplied); (5) water supply finance (e.g. company revenues and costs); and (6) water supply prices. The source yearbooks do not cover every city in China. However, the cities they include account for a total urban population of 332.4 million, which is around 43% of China’s urban population.

Authors

Andrew Gonzalez, Brandon Chou, Charles Li, Djavan De Clercq, Rinitha Reddy

Acknowledgments

The entire Data-X teaching team at UC Berkeley was incredibly helpful, especially Ikhlaq Sidhu, Alexander Fred-Ojala, and Sana Iqbal.
