Urbanization, one of the prevailing trends of the 21st century, places great stress on the water resources of cities across the globe. Our project investigates the most important variables underlying urban water supply patterns in China, a country that has seen rapid urban growth in the past few decades. We applied statistical learning methods to 12 years of urban water supply data for 627 cities across China. In addition, we developed a PCA-informed urban water sustainability index to benchmark cities. The results are intended for decision makers in water-stressed urban areas who seek novel insights into urban water supply patterns using statistical learning techniques.
The first major contribution of our work was the identification of the variables most responsible for variance in patterns of urban water distribution and management. The second was the identification of statistical learning algorithms that achieved high accuracy on prediction and classification problems related to China’s urban water data. The third was a general, systems-level perspective on major urban drinking water use trends in China, provided for the benefit of public-sector stakeholders. Lastly, we developed an urban water use sustainability index to benchmark cities against each other and identify areas where water sustainability was lacking.
Our findings can be summarized as follows: (1) PCA showed that approximately 46.8% of variability in the data could be explained by two principal components. (2) Random forest (88.3% test accuracy) and XGBoost (87.7% test accuracy) algorithms were effective in classifying provinces using numerical features of the dataset. (3) Chinese cities have consistently suffered water loss/leakage rates above 20% since 2001, and water prices are closely associated with leakage. (4) Based on the sustainability index, problem cities/regions were identified.
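To make finding (1) concrete, the following is a minimal sketch of how the share of variance explained by the leading principal components can be computed. It is not the project's actual pipeline: the data are synthetic, the names `explained_variance_ratio` and `X_demo` are hypothetical, and standardizing each variable before PCA is an assumption, since the preprocessing is not described here.

```python
import numpy as np


def explained_variance_ratio(X):
    """Return the fraction of total variance captured by each principal component.

    Assumption: columns are standardized (zero mean, unit variance) before PCA,
    which is a common choice when variables are measured in different units.
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    Z = (X - X.mean(axis=0)) / std
    # Squared singular values of the centered data are proportional to the
    # variance along each principal component, sorted in descending order.
    singular_values = np.linalg.svd(Z, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Synthetic stand-in for the city-by-variable matrix (NOT the real data):
# 200 hypothetical cities, 8 variables driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 8))
X_demo = latent @ loadings + rng.normal(scale=1.0, size=(200, 8))

ratios = explained_variance_ratio(X_demo)
top_two_share = ratios[:2].sum()
print(f"Share of variance explained by the first two components: {top_two_share:.1%}")
```

On the real dataset, the analogous quantity is the roughly 46.8% reported in finding (1); the number printed here reflects only the synthetic data.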
The dataset comprises 87 variables grouped into six categories: (1) water supply and sale (e.g. daily supply capacity); (2) supply pipelines (e.g. pipe length and pipeline coverage area); (3) supply service (e.g. pressure standard compliance); (4) supply operations management (e.g. electricity consumption per unit of water supplied); (5) water supply finance (e.g. company revenues and costs); and (6) water supply prices. The source yearbooks do not cover every city in China. However, the cities they include account for a total urban population of 332.4 million, which is around 43% of China’s urban population.
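As an illustration of how supply-and-sale variables like those in category (1) relate to the leakage rates mentioned in the findings, here is a minimal sketch. It assumes the loss rate is approximated as the unsold share of supplied water; the exact definition used in the dataset is not stated here. The function `leakage_rate`, the field names, and the city records are all hypothetical.

```python
def leakage_rate(supplied, sold):
    """Approximate water loss as the fraction of supplied water that was not sold.

    Assumption: loss rate = (supplied - sold) / supplied. The yearbooks may
    define leakage differently.
    """
    if supplied <= 0:
        raise ValueError("supplied volume must be positive")
    return (supplied - sold) / supplied


# Hypothetical records (volumes in arbitrary but consistent units), NOT real data.
records = [
    {"city": "City A", "supplied": 1000.0, "sold": 760.0},
    {"city": "City B", "supplied": 500.0, "sold": 425.0},
    {"city": "City C", "supplied": 800.0, "sold": 600.0},
]

THRESHOLD = 0.20  # the 20% level referenced in the findings

# Collect the cities whose approximate loss rate exceeds the threshold.
flagged = [
    r["city"]
    for r in records
    if leakage_rate(r["supplied"], r["sold"]) > THRESHOLD
]
print(flagged)
```

A per-city rate of this kind could then be compared against the water price variables in category (6) to explore the association between price and leakage.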
Andrew Gonzalez, Brandon Chou, Charles Li, Djavan De Clercq, Rinitha Reddy
The entire Data-X teaching team at UC Berkeley was incredibly helpful, especially Ikhlaq Sidhu, Alexander Fred-Ojala, and Sana Iqbal.