# **What role does urbanization, using China as an example, play in shaping average temperature trends in a warming world?**

# Project One

## Introduction

With the development of modern cities and urbanization, the global average temperature has been increasing over the past century, leading to concerns about the impacts of climate change. While urbanization significantly improves life standards and society efficiency, it is commonly believed to be a significant contributor to the global warming process. 

China, one of the fastest-growing developing countries in the world, is undergoing rapid urbanization in the recent few decades. This research aims to examine the role of urbanization in shaping average temperature trends in China. The analysis will be focused on finding the relationship between average temperature and other independent variables such as city, years, and percentage change in annual temperature based on a time series analysis starting from 1950 to 2012. Specifically, 1979 to 2012 is the period China decided to begin the journey of reforming and opening up. We will use year as independent variable to show the trend of annual temperature change between 1950 to 2012. We will also study the difference in annual average temperature using city as independent variable. We will use percentage in annual average temperature as the independent variable to see the frequency of cities that has a temperature over certain value. 

Several studies have investigated the role of urbanization in shaping average temperature trends in China. Li et al. (2013) highlighted the importance of land cover change and human activity in contributing to observed warming. Similarly, the contribution of urbanization to warming in China was found to be significant, accounting for approximately one-third of the total warming signal (Sun et al., 2016). Moreover, the magnitude of urbanization-induced warming effects was found to depend not only on a city's economic level, but also on its population scale and geographic environment (Fang et al., 2013).

By investigating the impact of urbanization on temperature trends in China, this study will further enhance our understanding of the interaction between human activity and the environment. We find out that the average temperature is increasing overall between 1978 to 2012, and remained stable between 1950 to 1978. This shows that the relationship between urbanization and economic development is positively related to the warming of China. The overall average temperature is increasing in China, with more extreme high-temperature years and less low-temperature years. 

## Data Cleaning

### Basic data cleaning

In [1]:
# Imports
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from shapely.geometry import Point

%matplotlib inline
# activate plot theme
import qeds
#qeds.themes.mpl_style();

# Read dataset
df = pd.read_csv('/Users/booker/Desktop/ECO225Project/Data/GlobalLandTemperaturesByCity.csv')

In [2]:
# Check missing values
print(df.isnull().sum())

# Replacing missing values (not dropping since we will calculate the temperature change in the later section)
df['AverageTemperature'].interpolate(method='linear', inplace=True)

dt                                    0
AverageTemperature               364130
AverageTemperatureUncertainty    364130
City                                  0
Country                               0
Latitude                              0
Longitude                             0
dtype: int64


We will first convert date to year for a easier calculation and groupby in the following steps.

In [3]:
# Convert all dates
df['Date'] = pd.to_datetime(df.dt)
df.drop(columns = ['dt'], axis = 1, inplace = True)
df['Year'] = df['Date'].dt.year

### Map

In [4]:
# Convert numerical latitude and longitude to float type
to_float = lambda x: float(x[:-1])
df['Latitude'] = df['Latitude'].apply(to_float)
df['Longitude'] = df['Longitude'].apply(to_float)

In [5]:
# 
df["Coordinates"] = list(zip(df.Longitude, df.Latitude))
df["Coordinates"] = df["Coordinates"].apply(Point)
gdf = gpd.GeoDataFrame(df, geometry="Coordinates")
gdf

Unnamed: 0,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude,Date,Year,Coordinates
0,6.068,1.737,Århus,Denmark,57.05,10.33,1743-11-01,1743,POINT (10.33000 57.05000)
1,6.012,,Århus,Denmark,57.05,10.33,1743-12-01,1743,POINT (10.33000 57.05000)
2,5.956,,Århus,Denmark,57.05,10.33,1744-01-01,1744,POINT (10.33000 57.05000)
3,5.900,,Århus,Denmark,57.05,10.33,1744-02-01,1744,POINT (10.33000 57.05000)
4,5.844,,Århus,Denmark,57.05,10.33,1744-03-01,1744,POINT (10.33000 57.05000)
...,...,...,...,...,...,...,...,...,...
8599207,11.464,0.236,Zwolle,Netherlands,52.24,5.26,2013-05-01,2013,POINT (5.26000 52.24000)
8599208,15.043,0.261,Zwolle,Netherlands,52.24,5.26,2013-06-01,2013,POINT (5.26000 52.24000)
8599209,18.775,0.193,Zwolle,Netherlands,52.24,5.26,2013-07-01,2013,POINT (5.26000 52.24000)
8599210,18.025,0.298,Zwolle,Netherlands,52.24,5.26,2013-08-01,2013,POINT (5.26000 52.24000)


In [6]:
grouped_2012 = df.loc[df['Year'] == 2012].groupby('City').mean().reset_index()
grouped_2012 = pd.merge(grouped_2012, df[['City', 'Country']], on='City')
# grouped_2012["Coordinates"] = list(zip(df.Longitude, df.Latitude))
# grouped_2012["Coordinates"] = grouped_2012["Coordinates"].apply(Point)
gdf = gpd.GeoDataFrame(grouped_2012, geometry="Coordinates")
gdf.set_index("Country").loc["China"]
gpb2012 = gdf.groupby("City").mean()
gpb2012.reset_index(inplace=True)

ValueError: Unknown column Coordinates

In [None]:
china = gpd.read_file('/Users/booker/Desktop/gadm36_CHN_shp/')

# Filter the shapefile to only include administrative boundaries for China
china = china[china['NAME_0'] == 'China']

# Merge the shapefile with your data on the appropriate column
merged_data = china.merge(gpb2012, left_on='NAME_2', right_on='City')

# Plot the GeoDataFrame
ax = merged_data.drop_duplicates().plot(column='AverageTemperature', cmap='coolwarm', legend=True, figsize=(10, 10))
ax.set_axis_off()
plt.show()

### Line Plots data cleaning
Group by China and year to get national average temperature, and the filter the year into three periods to create three datasets used to plot the line plots.

In [None]:
# Group by China and year to get national average temperature
df_china = df[df['Country'] == 'China']
grouped_na = df_china.groupby('Year').mean().reset_index()
grouped_na

In [None]:
# Filter the data to form 3 datasets ranging from different years
grouped_na_1 = grouped_na[(grouped_na['Year'] >= 1950) & (grouped_na['Year'] <= 1978)]
grouped_na_2 = grouped_na[(grouped_na['Year'] >= 1979) & (grouped_na['Year'] <= 2012)]
grouped_na_3 = grouped_na[(grouped_na['Year'] >= 1950) & (grouped_na['Year'] <= 2012)]

### Bar Charts data cleaning
We first group data by city and year, and then find the top 10 cities with the largest and lowest temperature increased from 1950 to 2012. Then we use bar charts to plot the data.

In [None]:
# Select data from 1950 to 2012
grouped_time = df_china[(df_china['Year'] >= 1950) & (df_china['Year'] <= 2012)]

# Group by City and Year to find the annual average temperature for each city
grouped_citi = grouped_time.groupby(['City', 'Year'])['AverageTemperature'].mean().reset_index()
grouped_citi

In [None]:
# Find the top 10 cities with the largest temperature increased from 1950 to 2012
grouped_citi.set_index(['City', 'Year'], inplace=True)

# Write the function that compute the percentage temperature change
def difference(df, first, last, column_name):
    for row in df.iterrows():
        index_value, columns_value = row
        start = df.loc[index_value[0], first]['AverageTemperature']
        end = df.loc[index_value[0], last]['AverageTemperature']
        cleaneddata = float((start - end)/end * 100)
        df.at[index_value, column_name] = cleaneddata
        
# Apply the function to compute the outcome
difference(grouped_citi, 2012, 1950, 'TemperaturePctChange')

In [None]:
# Find the top 10 cities with the largest temperature increased from 1950 to 2012
largest_city = grouped_citi.sort_values(by='TemperaturePctChange', ascending=False)
top_cities = largest_city.reset_index().drop_duplicates(subset='City')
top_cities.set_index('Year', inplace=True)
top_cities_1950 = top_cities.loc[1950].head(10)
top_cities_1950

In [None]:
# Find the top 10 cities with the lowest temperature increased from 1950 to 2012
lowest_city = grouped_citi.sort_values(by='TemperaturePctChange', ascending=True)
low_cities = lowest_city.reset_index().drop_duplicates(subset='City')
low_cities.set_index('Year', inplace=True)
low_cities_1950 = low_cities.loc[1950].head(10)
low_cities_1950

### Pivot table 
Create a pivot table to plot scatter plots for the top 4 cities with the largest and lowest temperature increased from 1950 to 2012. In addition, we can use bar charts to show the trend of the number of cities that exceeds 23 degrees and below 0 degrees under a 5 years interval.

In [None]:
# Create a pivot table
city_pivot = grouped_citi.pivot_table(values='AverageTemperature', index='Year', columns='City')
city_pivot

In [None]:
# Create a column that calculates the annual percentage change from 1950 to 2012
mean_pivot = city_pivot.reset_index()
mean_pivot["mean_temp_change"] = mean_pivot.mean(axis=1)

# Filter the data from 1950 to 1978
mean_pivot_before = mean_pivot[mean_pivot['Year'] >= 1950]
mean_pivot_before = mean_pivot_before[mean_pivot_before['Year'] <= 1978]

# Filter the data from 1979 to 2012
mean_pivot_after = mean_pivot[mean_pivot['Year'] >= 1979]
mean_pivot_after = mean_pivot_after[mean_pivot_after['Year'] <= 2012]

## Summary Statistics Tables

### Table 1: Average temperature from 1950 to 1978
In this table, we can see that the average temperature from 1950 to 1978 is around 12.94 degrees, with a standard deviation of 0.32 degrees. The maximum temperature is 13.54 degrees, and the minimum temperature is 12.27 degrees. We can then compare the average temperature, maximum and minimum temperature of the temperature from 1979 to 2012 to see if there's a warming China.

In [None]:
# Describing the average temperature and average temperature uncertainty from 1950 to 1978
grouped_na_1.describe()

### Table 2: Average temperature from 1950 to 1978
In this table, we can see that the average temperature from 1979 to 2012 is around 13.42 degrees, with a standard deviation of 0.42 degrees. The maximum temperature is 14.26 degrees, and the minimum temperature is 12.50 degrees. All of the indicators suggest that there is a increasing temperature trend compared to the period of 1950 to 1978. It provides us some insights of the big picture before we plot our graphs.

In [None]:
# Describing the average temperature and average temperature uncertainty from 1979 to 2012
grouped_na_2.describe()

### Table 3: Average temperature from 1950 to 2012
In this table, we can see that the average temperature from 1950 to 2012 is around 13.20 degrees, with a standard deviation of 0.44 degrees. The maximum temperature is 14.26 degrees, and the minimum temperature is 12.27 degrees.

In [None]:
# Describing the average temperature and average temperature uncertainty from 1950 to 2012
grouped_na_3.describe()

### Table 4: Top 10 cities with the largest percent change in average temperature
We can see the top 10 cities with the largest percent change in average temperature from 1950 to 2012. The maximum annual temperature is 14.70 degrees among those 10 cities, while the lowest annual temperature is -1.62 degrees, with a very high standard deviation of 5.98 degrees. The city with the maximum percent change in annual temperature is 46.29%, showing that there is a huge variation in temperature for that city. It can be the result from global warming and urbanization. 

In [None]:
top_cities_1950.describe()

### Table 5: Top 10 cities with the lowest percent change in average temperature
We can see the top 10 cities with the lowest percent change in average temperature from 1950 to 2012. The city with the minimum percent change in annual temperature is -4.61%, showing that there is a decrease in temperature for that city. We can also compare the previous graph and conclude that cities are becoming warmer.

In [None]:
low_cities_1950.describe()

## Plots, Histograms, Figures

### Line Plots

In this section, we will first use line plots to demonstrate how the average temperature changed from 1950 to 2013 nationally. 

Specifically, we will divide the time interval into two periods: 1950 to 1978 and 1979 to 2012. The first period is the founding of the People's Republic of China, whereas the second period is China decided to begin the journey of reforming and opening up. The second period represents the urbanization and fast economic development of China. We will also see the overall picture from 1950 to 2012.

From the first graph, titled 'Average Temperature Trend from 1950 to 1978', we can see that the average temperature trend in China is relatively steady and is moving around at 13.0 degrees. This is the period of Chairman Mao Zedong's era, named after the founder of the Communist Party of China. Economic development was unstable and slow due to some policies that were originally aimed to achieve the goal of industrialization. The process of urbanization and modernization was slow. 

From the second graph, 'Average Temperature Trend from 1979 to 2012', we can see that the average temperature trend in China is increasing. This is the period of the Deng Xiaoping era, named after the second leader that achieved economic recovery and development. He proposed a series of policies that aimed to accelerate the process of industrialization, including one of the most important policies in Chinese history: the Chinese reform. 

From the third graph, 'Average Temperature Trend from 1950 to 2012', we can see that the average temperature trend in China is increasing. This graph provides us with a clear image of how the average temperature change in China from 1950 to 2012. The temperature is positively correlated with the development of China, including economic development and urbanization. 

In [None]:
# Write function that can generate line plots
def plot_lineplot(dataset, title):
    plt.figure(figsize=(20, 5))
    plt.plot(dataset['Year'], dataset['AverageTemperature'], label='National Average')
    plt.xlabel('Year')
    plt.ylabel('Average Temperature')
    plt.title(title)
    plt.grid(linestyle='--', alpha=0.7)
    plt.legend()
    plt.show()

In [None]:
# Plot the graphs
plot_lineplot(grouped_na_1, 'Average Temperature Trend from 1950 to 1978')

In [None]:
plot_lineplot(grouped_na_2, 'Average Temperature Trend from 1979 to 2012')

In [None]:
plot_lineplot(grouped_na_3, 'Average Temperature Trend from 1950 to 2012')

### Bar charts
In this section, we will use bar charts to show the top 10 cities with the largest and lowest average temperature percent changes from 1950 to 2012. These charts will provide insights into how different regions in China have been affected by temperature changes over time.

Looking at the bar chart for the top 10 cities with the largest temperature percent changes, we can see that cities such as Yakeshi and Shuangyashan experienced an increase of over 35 percent in annual temperature from 1950 to 2012. In contrast, most other cities experienced an average temperature increase of around 10 percent. These findings indicate that many cities in China are becoming warmer, and some regions are experiencing more significant temperature changes than others.

The bar chart for the top 10 cities with the lowest temperature percent changes reveals that cities such as Dunhua and Yanji experienced the lowest percentage change in temperature, but only up to around -4.5%. This suggests that some regions in China may not have experienced significant temperature changes or may have even experienced a decrease in temperature.

Overall, these bar charts highlight the different temperature trends observed in various regions of China from 1950 to 2012. They provide further evidence that urbanization and economic development have led to an increase in temperature in many cities in China.

In [None]:
# Plot the graph of largest temperature change between 1950 to 2012
plt.figure(figsize=(20, 7))
plt.bar(top_cities_1950['City'], top_cities_1950['TemperaturePctChange'])
plt.xlabel('City')
plt.ylabel('Temperature % Change')
plt.title('Top 10 Cities with the Largest Temperature Percent Change from 1950 to 2012')
plt.tick_params(axis='x', labelsize=13)
plt.show()

In [None]:
# Plot the graph of lowest temperature change between 1950 to 2012
plt.figure(figsize=(20, 7))
plt.bar(low_cities_1950['City'], low_cities_1950['TemperaturePctChange'])
plt.xlabel('City')
plt.ylabel('Temperature % Change')
plt.title('Top 10 Cities with the Lowest Temperature Percent Change from 1950 to 2012')
plt.tick_params(axis='x', labelsize=13)
plt.show()

The graphs below illustrate the trend of the number of cities with annual temperatures exceeding 23 degrees and below 0 degrees, over a 5-year interval from 1950 to 2010.

The first graph indicates an upward trend, which suggests that the number of cities with an annual temperature of 23 degrees or higher has increased steadily over time. This trend is particularly prominent after 1980, coinciding with China's urbanization and economic growth. The graph provides further evidence that rapid urbanization has contributed to the accelerated increase in temperature.

The second graph shows a decreasing trend, which implies that fewer cities have an annual temperature of 0 degrees or lower. This suggests that all cities in China are experiencing warming temperatures, rather than cooling. Once again, the graph highlights the impact of urbanization and economic growth on temperature changes in China, as cities become increasingly hotter.

In [None]:
# Count the number of cities that is greater than 23 degrees
filt_temp = city_pivot[city_pivot > 23]
filt_count = filt_temp.count(axis=1)
grouped_count = filt_count.groupby(lambda x: x // 5 * 5).sum()
plt.figure(figsize=(20,7))
grouped_count.plot(kind='bar')
plt.title('Number of Cities greater than 23 degrees from 1950 to 2012')

plt.show()

In [None]:
# Count the number of cities that is lower than 0 degrees
filt_temp = city_pivot[city_pivot < 0]
filt_count = filt_temp.count(axis=1)
grouped_count = filt_count.groupby(lambda x: x // 5 * 5).sum()
plt.figure(figsize=(20,7))
grouped_count.plot(kind='bar')
plt.title('Number of Cities less than 0 degrees from 1950 to 2012')

plt.show()

### Scatter Plots
We now focus on examining the cities with the largest percentage change in average temperature. The following four graphs display the top four cities with the highest percentage change in average temperature. We observe a strong positive relationship between time and annual temperature change for each of these cities. This is consistent with the overall trend in China, where the average annual temperature has been increasing from 1950 to 2012. However, these graphs provide a more detailed picture of how the annual temperature has changed over time for each of these representative cities.

In [None]:
# Write a function to plot scatter plots
dff = city_pivot.reset_index()
def plot_scatter(city, title, ax):
    dff.plot(x='Year', y=city, kind='scatter', s=150, figsize=(15, 5), ax=ax)
    ax.set_title(title)
    ax.set_xlabel('Year')
    ax.set_ylabel('Average Temperature')

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
plot_scatter('Yakeshi', 'Scatter Plot of Yakeshi', ax[0])
plot_scatter('Shuangyashan', 'Scatter Plot of Shuangyashan', ax[1])

plt.tight_layout()

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
plot_scatter('Xining', 'Scatter Plot of Xining', ax[0])
plot_scatter('Qitaihe', 'Scatter Plot of Qitaihe', ax[1])
plt.tight_layout()

### Histogram
We presented a histogram displaying the percentage change in annual temperature from 1950 to 2012 across multiple cities. Our analysis indicates that the majority of cities experienced a positive temperature change, with most cities having a percentage change of around 18.5. The overall shape of the histogram closely approximates a normal distribution.

Interestingly, we found that from 1950 to 1978, when economic development in China was slow and most areas were rural, most cities had an annual percentage change in temperature of 18.1. However, from 1979 to 2012, during a period of rapid economic development and urbanization, most cities had an annual percentage change in temperature ranging from 18.25 to 19.00. This indicates that temperatures have increased in China over the past few decades, with evidence pointing to urbanization as a significant factor in this trend.

While urbanization has brought benefits to humanity in terms of improved living standards and quality of life, it has also led to global warming. The observed increase in temperatures highlights the cost of urbanization and emphasizes the importance of implementing sustainable urban development strategies that mitigate the impact of global warming.

In [None]:
fig, ax = plt.subplots(figsize=(20, 7))
mean_pivot.plot(
    kind="hist", y="mean_temp_change", color="#1a69b1",
    bins=17, legend=False, density=True, ax=ax
)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_xlabel("Percentage Change in Temperature")
ax.set_ylabel("Frequency")
ax.set_title("National Histogram of Percentage Change in Temperature from 1950 to 2012")
ax.grid(True, alpha=0.3)

In [None]:
fig, ax = plt.subplots(figsize=(20, 7))
mean_pivot_before.plot(
    kind="hist", y="mean_temp_change", color="#1a69b1",
    bins=12, legend=False, density=True, ax=ax
)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_xlabel("Percentage Change in Temperature")
ax.set_ylabel("Frequency")
ax.set_title("National Histogram of Percentage Change in Temperature from 1950 to 1978")
ax.grid(True, alpha=0.3)

In [None]:
fig, ax = plt.subplots(figsize=(20, 7))
mean_pivot_after.plot(
    kind="hist", y="mean_temp_change", color="#1a69b1",
    bins=12, legend=False, density=True, ax=ax
)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_xlabel("Percentage Change in Temperature")
ax.set_ylabel("Frequency")
ax.set_title("National Histogram of Percentage Change in Temperature from 1979 to 2012")
ax.grid(True, alpha=0.3)

# Project Two

## The Message
We already plotted the corresponding line plots, bar charts, scatter plots, and histograms in China from 1950 to 2012 in project one and find out that the temperature is increasing when urbanization happened. In this section, we are going to expand more and merge another dataset that shows the China urban population from 1960 to 2012 t

In [None]:
urban = 

## Maps and Interpretations

## Conclusion

In conclusion, the research found that urbanization plays an important role in shaping average temperature trends in China from 1950 to 2012. As the graphs and evidence suggest, we see that the temperature is increasing at a higher speed from 1979 to 2012 (when urbanization happened and the economy starts to bloom in China) compared with 1950 to 1978 (when the economy is growing very slowly). In addition, the average annual temperature is increasing in China from 1950 to 2012. More and more cities experienced extremely high-temperature years and fewer cities experienced low-temperature years.

As urban areas start to grow, cities will experience higher temperatures due to potential factors such as industrialization and pollution, which can be discussed in future research. We can combine it with the population dataset for future implementation and study the relationship between population and temperature change. These findings are important to our understanding of global warming and may give us some suggestions for how to maintain the balance between society's development and the stability of the enviroment. 