In [None]:

import kagglehub
tarunrm09_climate_change_indicators_path = kagglehub.dataset_download('tarunrm09/climate-change-indicators')

print('Data source import complete.')


<div style="border: 2px dashed dodgerblue; border-radius: 5px; margin-bottom: 20px;">
<h1 style="text-align: center; padding: 10px 0 15px 0;"><span style="color: orangered;">Climate Change</span> <span style="color:dodgerblue;">Data Analysis and Model-Building</span></h1>
</div>

![climate-change1.jpg](attachment:39d242d4-cf4c-415b-a28e-b37e84dd70c3.jpg)

<a id="content"></a>

## Table of Contents

<a href=#imports>1. Importing Packages</a>

<a href=#data_load>2. Loading the Data</a>

<a href=#data_cleaning>3. Cleaning the Data</a>

<a href=#eda>4. Exploratory Data Analysis (EDA)</a>

<a href=#data_eng>5. Data Engineering</a>

<a href=#modelling>6. Modeling</a>

<a href=#performance>7. Model Performance</a>

<a href=#explanation>8. Model Explanations</a>

<a id="imports"></a>

## 1. Importing Packages

<a href="#content">
    Back to Table of Contents
</a>

---

⚡ All the packages used throughout this notebook.

---



In [None]:
import os
# Set before importing numpy/scikit-learn so the thread limit is picked up when they load
os.environ["OMP_NUM_THREADS"] = "1"

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import folium
import geopandas as gpd
from statsmodels.tsa.arima.model import ARIMA
from sklearn.cluster import KMeans
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM

<a id="data_load"></a>

## 2. Loading the Data

<a href="#content">
    Back to Table of Contents
</a>

---

⚡ Loading the data from climate_change_indicators.csv.

---



In [None]:
# Load the data
climate_data = pd.read_csv('/kaggle/input/climate-change-indicators/climate_change_indicators.csv')

In [None]:
# Display the first few rows of the dataset to understand its structure
climate_data.head(5)

In [None]:
# Display the last few rows of the dataset to understand its structure
climate_data.tail(5)

<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">

<div style="padding: 15px; border-radius: 5px; background-color: white">
    
From these few rows of the beginning and end of the climate data, it can be inferred that the <strong>Indicator, Unit, Source, CTS_Code, CTS_Name</strong>, and <strong>CTS_Full_Descriptor</strong> columns <strong>have the same respective values for each row</strong>. Therefore, they will be excluded from the analysis and model-building.

</div>
</div>
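
Looking at the first and last five rows only suggests that these columns are constant. A minimal sketch of how the inference could be checked over every row, using a made-up `toy` frame in place of `climate_data` (the values below are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy frame standing in for climate_data (values are made up for illustration).
toy = pd.DataFrame({
    "Country": ["Ghana", "Kenya", "Peru"],
    "Unit": ["Degree Celsius"] * 3,
    "Source": ["FAO"] * 3,
    "F1961": [0.1, -0.2, 0.3],
})

# A column is constant when it holds exactly one distinct value (NaN counted as a value).
constant_cols = [col for col in toy.columns if toy[col].nunique(dropna=False) == 1]
print(constant_cols)
```

Running the same list comprehension on `climate_data` would list the columns that can be dropped without losing information.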

In [None]:
# Generate descriptive statistics for the climate_data DataFrame
climate_data.describe()

<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">
<p>The <strong>climate_data.describe()</strong> method generates descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset’s distribution, excluding NaN values.
</p>
    
**Breakdown**:
1. **Count**: The number of non-null entries in each column.
2. **Mean**: The average value of each column.
3. **Standard Deviation (std)**: A measure of the amount of variation or dispersion in each column.
4. **Minimum (min)**: The smallest value in each column.
5. **25th Percentile (25%)**: The value below which 25% of the data falls.
6. **Median (50%)**: The 50th percentile or the middle value, which is the median of the data.
7. **75th Percentile (75%)**: The value below which 75% of the data falls.
8. **Maximum (max)**: The largest value in each column.

<hr />

<div style="padding: 15px; border-radius: 5px; background-color: white">
    
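The **largest value** of the ObjectId column is **225**, which suggests that the dataset has 225 rows, assuming ObjectId numbers the rows sequentially from 1. The **count** row of the output (or `climate_data.shape`) can be used to confirm this directly.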

</div>
</div>

<a id="data_cleaning"></a>

## 3. Cleaning the Data

<a href="#content">
    Back to Table of Contents
</a>

---

⚡ Cleaning climate_data. The dataset is already largely clean, so the focus here is on filling in null values.

---



In [None]:
climate_data.info()

<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">
<p>The <strong>climate_data.info()</strong> method provides a concise summary of a DataFrame. It is particularly useful for getting a quick overview of the structure and contents of the dataset.
</p>

**Breakdown**:
1. **Index Range**: The output shows the range of the index, which tells you how many rows are in the DataFrame.

2. **Column Names and Types**: It lists all the columns in the DataFrame along with the data type (dtype) of each column. Common data types include:
   - `int64`: Integer values.
   - `float64`: Floating-point numbers.
   - `object`: Typically used for strings or mixed types.
   - `datetime64[ns]`: Date and time data.

3. **Non-Null Count**: For each column, it displays the number of non-null entries. This is useful for quickly identifying missing data.

4. **Memory Usage**: The total memory used by the DataFrame is displayed, which can help you understand the size of your dataset in memory. Knowing this can help us optimize data storage and processing in case of very large datasets.


<hr />
<div style="padding: 15px; border-radius: 5px; background-color: white">
Some columns report a non-null count that is lower than the total number of rows, which means those columns contain missing values.
</div>
</div>

In [None]:
# Check for missing values in the dataset
missing_values = climate_data.isnull().sum()

# Check for duplicates
duplicate_rows = climate_data.duplicated().sum()

print("Missing Values: ", missing_values, "\nDuplicate Rows: ", duplicate_rows)

<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">

<div style="padding: 15px; border-radius: 5px; background-color: white">

Out of 72 columns (in total), there are missing values in ISO2 (2), F2018 (12), F2019 (12), F2020 (13), F2021 (12), and F2022 (12). Clearly, there are no duplicate rows.

</div>
</div>

In [None]:
# Select only the numeric year columns for filling missing values
year_columns = [col for col in climate_data.columns if col.startswith('F') and col[1:].isdigit()]

# Convert the selected year columns to numeric type (if not already)
climate_data[year_columns] = climate_data[year_columns].apply(pd.to_numeric, errors='coerce')

# Fill missing values in each row's year columns with that row's mean across all available years
climate_data[year_columns] = climate_data[year_columns].apply(lambda row: row.fillna(row.mean()), axis=1)

# Re-checking for any remaining missing values
remaining_missing_values = climate_data.isnull().sum()

remaining_missing_values[remaining_missing_values > 0]
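
The cell above fills each gap with the row's mean over all years, which can pull a recent missing year toward the long-run average. As a hedged alternative (not the approach used in this notebook), a gap could instead be filled from its neighbouring years with linear interpolation. The sketch below compares the two on a made-up `toy_years_df` frame:

```python
import numpy as np
import pandas as pd

# Toy year columns for two hypothetical rows "A" and "B" (values are made up).
toy_years_df = pd.DataFrame(
    {
        "F2016": [0.0, 0.2],
        "F2017": [1.0, 0.5],
        "F2018": [np.nan, 0.7],
        "F2019": [2.0, np.nan],
    },
    index=["A", "B"],
)

# Row-mean fill: every gap in a row receives that row's mean over all years.
mean_filled = toy_years_df.apply(lambda row: row.fillna(row.mean()), axis=1)

# Linear interpolation along each row: a gap is filled from its neighbouring years;
# gaps at either end are filled with the nearest available value.
interp_filled = toy_years_df.interpolate(axis=1, limit_direction="both")

print(mean_filled.loc["A", "F2018"], interp_filled.loc["A", "F2018"])
```

For row `A`, the row mean ignores the fact that the missing year sits between 1.0 and 2.0, while interpolation lands halfway between them. Which choice is preferable depends on how the filled values are used later.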


In [None]:
# Identify the rows with missing ISO2 values and display relevant columns
iso2_missing_info = climate_data[climate_data['ISO2'].isnull()][['Country', 'ISO3']]

iso2_missing_info

In [None]:
# Fill in the missing ISO2 values manually
climate_data.loc[climate_data['ISO3'] == 'NAM', 'ISO2'] = 'NA'
climate_data.loc[climate_data['ISO3'] == 'WLD', 'ISO2'] = 'WL'

# Re-checking for any remaining missing values in ISO2
remaining_missing_values = climate_data['ISO2'].isnull().sum()

print("Missing values remaining: ", remaining_missing_values)

<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">

<div style="padding: 15px; border-radius: 5px; background-color: white">
    
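Based on the corresponding values in the ISO3 column, the two missing ISO2 values are filled in for Namibia (`NA`, its ISO 3166-1 alpha-2 code) and World (`WL`, a placeholder chosen here, since the world aggregate has no official two-letter country code).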

</div>
</div>
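
A plausible reason Namibia's ISO2 value shows up as missing is that `pd.read_csv` treats the literal string `NA` as a missing-value marker by default. The sketch below reproduces this on a small in-memory CSV (the rows are made up) and shows one way to keep `NA` as text while still treating empty fields as missing:

```python
import io
import pandas as pd

# Tiny made-up CSV: Namibia's ISO2 code is the literal text "NA", and one year value is empty.
csv_text = "Country,ISO2,F2018\nNamibia,NA,\nGhana,GH,0.5\n"

# Default parsing: "NA" is interpreted as a missing value.
default_read = pd.read_csv(io.StringIO(csv_text))

# Disable the default markers and treat only empty fields as missing.
literal_read = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

print(default_read["ISO2"].isnull().sum())
print(literal_read.loc[0, "ISO2"])
```

Whether this applies to the actual file is an assumption; the manual fill above gives the same result for Namibia either way.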

<a id="eda"></a>

## 4. Exploratory Data Analysis (EDA)

<a href="#content">
    Back to Table of Contents
</a>

---

⚡ Exploring the data to see possible trends in climate change over the years for countries and continents.

---



### Visualising world temperature trends over time

In [None]:
# Filter data for the "World"
world_data = climate_data[climate_data['Country'] == 'World']

# Extract years and corresponding temperature values
years = [int(col[1:]) for col in year_columns]
world_temperatures = world_data[year_columns].values.flatten()

# Plotting the global temperature trend
plt.figure(figsize=(10, 6))
plt.plot(years, world_temperatures, marker='o', linestyle='-', color='b')
plt.title('Global Temperature Trend (World Average)')
plt.xlabel('Year')
plt.ylabel('Temperature Change (°C)')
plt.grid(True)
plt.show()


<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">


<div style="padding: 15px; border-radius: 5px; background-color: white">

#### Insights:

1. **Overall Warming Trend:**
The most prominent insight is the clear upward trend in global temperatures. This indicates that, on average, the world has been getting warmer over the past several decades. The temperature increase is particularly noticeable from around the 1980s onwards.

2. **Fluctuations in Temperature:**
While the overall trend is upward, there are fluctuations year by year. This variability is typical in climate data, reflecting short-term influences such as volcanic activity, ocean currents (like El Niño and La Niña), and other natural climate variability factors.

3. **Significant Increase in Recent Decades:**
The rate of temperature increase appears to have accelerated in the last few decades. The graph shows steeper increases in temperature from the 1990s to 2020s, which aligns with the period of intensified global industrial activity and greenhouse gas emissions.

4. **Positive Temperature Anomalies:**
In the later years (2000s and 2010s), the temperature change remains consistently above 0°C, indicating positive temperature anomalies. This means that, compared to a baseline (likely mid-20th century), the global average temperature is consistently higher.

5. **Potential Implications:**
This trend is consistent with concerns about global warming and climate change. The rising temperatures could lead to various environmental impacts, including melting polar ice, rising sea levels, more frequent and severe weather events, and shifts in ecosystems and biodiversity.
  
<hr>

#### Conclusion:
The chart provides strong visual evidence of global warming, with a clear upward trend in temperatures over the past several decades. This trend is a critical indicator of ongoing climate change and underscores the importance of global efforts to mitigate its effects through reducing greenhouse gas emissions and adopting sustainable practices.

</div>
</div>
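
The claim that warming has accelerated can be checked numerically by fitting a straight line to two periods and comparing the slopes. The following is a minimal sketch on a synthetic, noise-free series (`toy_years` and `toy_anomaly` are made up, and the 1990 split point is an arbitrary assumption); the same helper could be applied to `years` and `world_temperatures`:

```python
import numpy as np

# Synthetic anomaly series: flat until 1990, then rising by 0.03 °C per year.
toy_years = np.arange(1961, 2023)
toy_anomaly = np.where(toy_years < 1990, 0.0, 0.03 * (toy_years - 1990))

def slope_per_decade(yrs, values):
    """Least-squares linear trend, expressed in °C per decade."""
    slope, _intercept = np.polyfit(yrs, values, 1)
    return slope * 10

early = toy_years < 1990
early_slope = slope_per_decade(toy_years[early], toy_anomaly[early])
late_slope = slope_per_decade(toy_years[~early], toy_anomaly[~early])
print(early_slope, late_slope)
```

A clearly larger slope in the later period would support the reading of the chart; similar slopes would suggest a steady rather than accelerating rise.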

---

### Comparing Temperature Trends across continents

---

In [None]:
# Mapping countries to continents (simplified for illustration)
continent_map = {
    'Africa': ['Algeria', 'Angola', 'Benin', 'Botswana', 'Burkina Faso', 'Burundi', 'Cabo Verde', 'Cameroon',
               'Central African Rep.', 'Chad', 'Comoros, Union of the', 'Congo, Dem. Rep. of the', 'Congo, Rep. of',
               'Djibouti', 'Egypt, Arab Rep. of', 'Equatorial Guinea, Rep. of', 'Eritrea, The State of', 'Eswatini, Kingdom of',
               'Ethiopia, The Federal Dem. Rep. of', 'Gabon', 'Gambia, The', 'Ghana', 'Guinea', 'Guinea-Bissau', 'Kenya',
               'Lesotho, Kingdom of', 'Liberia', 'Libya', 'Madagascar', 'Malawi', 'Mali', 'Mauritania, Islamic Rep. of',
               'Mauritius', 'Morocco', 'Mozambique, Rep. of', 'Namibia', 'Niger', 'Nigeria', 'Rwanda', 'São Tomé and Príncipe, Dem. Rep. of',
               'Senegal', 'Seychelles', 'Sierra Leone', 'Somalia', 'South Africa', 'South Sudan, Rep. of', 'Sudan',
               'Tanzania, United Rep. of', 'Togo', 'Tunisia', 'Uganda', 'Zambia', 'Zimbabwe'],
    'Asia': ['Afghanistan, Islamic Rep. of', 'Armenia, Rep. of', 'Azerbaijan, Rep. of', 'Bahrain, Kingdom of', 'Bangladesh',
             'Bhutan', 'Brunei Darussalam', 'Cambodia', 'China, P.R.: Mainland', 'Cyprus', 'Georgia', 'India', 'Indonesia',
             'Iran, Islamic Rep. of', 'Iraq', 'Israel', 'Japan', 'Jordan', 'Kazakhstan, Rep. of', 'Kuwait', 'Kyrgyz Rep.',
             "Lao People's Dem. Rep.", 'Lebanon', 'Malaysia', 'Maldives', 'Mongolia', 'Myanmar', 'Nepal', 'Oman', 'Pakistan',
             'Palestine', 'Philippines', 'Qatar', 'Saudi Arabia', 'Singapore', 'South Korea', 'Sri Lanka', 'Syria', 'Tajikistan, Rep. of',
             'Thailand', 'Turkmenistan', 'United Arab Emirates', 'Uzbekistan, Rep. of', 'Vietnam', 'Yemen, Rep. of'],
    'Europe': ['Albania', 'Andorra, Principality of', 'Austria', 'Belarus, Rep. of', 'Belgium', 'Bosnia and Herzegovina', 'Bulgaria',
               'Croatia, Rep. of', 'Czech Rep.', 'Denmark', 'Estonia, Rep. of', 'Finland', 'France', 'Germany', 'Greece',
               'Hungary', 'Iceland', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Moldova, Rep. of',
               'Monaco', 'Montenegro', 'Netherlands, The', 'North Macedonia, Republic of ', 'Norway', 'Poland, Rep. of',
               'Portugal', 'Romania', 'Russian Federation', 'San Marino, Rep. of', 'Serbia, Rep. of', 'Slovak Rep.', 'Slovenia, Rep. of',
               'Spain', 'Sweden', 'Switzerland', 'Ukraine', 'United Kingdom'],
    'North America': ['Canada', 'United States', 'Mexico'],
    'South America': ['Argentina', 'Bolivia', 'Brazil', 'Chile', 'Colombia', 'Ecuador', 'Guyana', 'Paraguay', 'Peru', 'Suriname',
                      'Uruguay', 'Venezuela, Rep. Bolivariana de'],
    'Oceania': ['Australia', 'Fiji, Rep. of', 'New Zealand', 'Papua New Guinea', 'Samoa', 'Solomon Islands', 'Vanuatu']
}
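
Because the mapping is matched on exact country names, any spelling that differs from the dataset silently ends up without a continent. The sketch below shows one way to audit such a mapping. `toy_map` and `toy_countries` are made-up stand-ins for `continent_map` and `climate_data['Country']`, and the mismatched spelling is hypothetical:

```python
import pandas as pd

# Made-up stand-ins for continent_map and the dataset's Country column.
toy_map = {"Africa": ["Ghana", "Kenya"], "Asia": ["South Korea", "India"]}
toy_countries = pd.Series(["Ghana", "Kenya", "India", "Korea, Rep. of"], name="Country")

# Invert the mapping once so each country points at its continent.
country_to_continent = {
    country: continent
    for continent, countries in toy_map.items()
    for country in countries
}

mapped = toy_countries.map(country_to_continent)

# Dataset names with no continent, and map names that never occur in the dataset.
unmatched_in_data = sorted(toy_countries[mapped.isnull()])
unused_in_map = sorted(set(country_to_continent) - set(toy_countries))
print(unmatched_in_data, unused_in_map)
```

Names appearing in either list are candidates for a spelling fix in the mapping. The inverted dictionary also allows `Series.map` to replace a row-by-row lookup function.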

In [None]:
color_map_cont = {
    'Africa': 'dodgerblue',
    'Asia' : 'goldenrod',
    'Europe': 'tomato',
    'North America': 'brown',
    'Oceania': 'purple',
    'South America': 'limegreen'
}

# Add a 'Continent' column to the dataset based on the country-continent mapping
def get_continent(country):
    for continent, countries in continent_map.items():
        if country in countries:
            return continent
    return 'Unknown'

climate_data['Continent'] = climate_data['Country'].apply(get_continent)

# Filter out 'Unknown' continents if any
continent_data = climate_data[climate_data['Continent'] != 'Unknown']

# Group by 'Continent' and calculate the mean temperature change per year
continent_avg_temp = continent_data.groupby('Continent')[year_columns].mean()

# Plotting the temperature trends for each continent
plt.figure(figsize=(12, 8))

for continent in continent_avg_temp.index:
    plt.plot(years, continent_avg_temp.loc[continent], marker='o', linestyle='-', color=color_map_cont[continent], label=continent)

plt.title('Temperature Trends Across Continents (1961-2022)')
plt.xlabel('Year')
plt.ylabel('Average Temperature Change (°C)')
plt.legend()
plt.grid(True)
plt.show()


<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">

<p>
This chart compares the temperature trends across six continents: Africa, Asia, Europe, North America, Oceania, and South America from 1961 to 2022. The temperature change is measured in degrees Celsius (°C), and the trends are displayed over time.

</p>
    
<div style="padding: 15px; border-radius: 5px; background-color: white">    

    
    
#### Insights:

1. **Overall Warming Trend Across All Continents:**
All continents exhibit an overall increase in temperature over the observed period, indicating a global pattern of warming. This further confirms the widespread impact of climate change on a continental scale.

2. **Continental Variability in Temperature Trends:**
   - **Europe** (in tomato \[red]) and **North America** (in brown) show the most significant temperature increases, particularly after the 1980s. This could be related to their industrial activities, urbanization, and other factors influencing climate sensitivity in these regions.
   - **Asia** (in goldenrod \[gold]) and **Africa** (in dodgerblue \[blue]) also exhibit noticeable warming trends, though with more fluctuations compared to Europe and North America. These fluctuations might be due to diverse climates within these continents, ranging from tropical to arid regions.
   - **Oceania** (in purple) and **South America** (in limegreen \[green]) show more moderate temperature increases, but the overall trend remains upward, reflecting the global nature of the warming phenomenon.

3. **Post-2000 Temperature Increase:**
   - The period after 2000 shows a pronounced rise in temperatures across all continents. The steep increases in temperature are particularly noticeable in Europe and North America, where temperature anomalies exceed 2°C in some years.
   - This recent acceleration in warming aligns with the period of intensified greenhouse gas emissions and more frequent extreme weather events.

4. **Higher Variability in Certain Continents:**
   - **Europe** exhibits some of the highest year-to-year variability in temperature change, especially in the more recent decades. This could be due to the region's diverse climate zones, from Mediterranean climates in the south to polar climates in the far north.
   - **Asia** and **North America** also show high variability, possibly reflecting the influence of large landmasses with varied climates, including monsoons, deserts, and temperate zones.

5. **Regional Implications of Warming:**
The differences in warming rates among continents could have significant regional implications, such as shifts in agricultural zones, increased frequency of heatwaves, and changes in precipitation patterns, which could affect water resources, biodiversity, and human livelihoods.


<hr>

#### Conclusion:
The chart clearly demonstrates that climate change is a global issue, with every continent experiencing significant warming over the past six decades (1961-2022). The higher temperature anomalies and variability observed in some continents, particularly Europe and North America, highlight the need for targeted climate adaptation and mitigation strategies. The increasing temperature trends underscore the urgency for global action to address the drivers of climate change and to prepare for its widespread impacts.

</div>

</div>
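
Statements about year-to-year variability are hard to judge by eye. One possible measure, used here as an assumption rather than as the notebook's method, is the standard deviation of the year-over-year differences for each continent. The sketch uses a made-up `toy_cont` frame shaped like `continent_avg_temp`:

```python
import pandas as pd

# Made-up continent averages: "Europe" swings each year, "Oceania" rises smoothly.
toy_cont = pd.DataFrame(
    {
        "F2000": [0.5, 0.5],
        "F2001": [1.5, 0.6],
        "F2002": [0.5, 0.7],
        "F2003": [1.5, 0.8],
    },
    index=["Europe", "Oceania"],
)

# Difference between consecutive years (the first column has no predecessor, so drop it).
yoy_change = toy_cont.diff(axis=1).iloc[:, 1:]

# Spread of those differences: larger values mean a choppier series.
variability = yoy_change.std(axis=1)
print(variability.sort_values(ascending=False))
```

Differencing first removes the steady trend, so a continent that warms quickly but smoothly is not mistaken for a highly variable one.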

---

### Comparing the temperature trends of United States (USA), China (CHN), Brazil (BRA), Germany (DEU) and India (IND) over time

---

In [None]:
# Select the countries to compare
countries_to_compare = ['United States', 'China, P.R.: Mainland', 'Brazil', 'Germany', 'India']

# Filter the data for the selected countries
comparison_data = climate_data[climate_data['Country'].isin(countries_to_compare)]


In [None]:
# Define colors for the countries
color_map = {
    'United States': 'dodgerblue',
    'China, P.R.: Mainland': 'goldenrod',
    'Brazil': 'green',
    'Germany': 'tomato',
    'India': 'purple'
}

# Plotting the temperature trends with specific colors
plt.figure(figsize=(12, 8))

for country in countries_to_compare:
    country_data = comparison_data[comparison_data['Country'] == country]
    country_temperatures = country_data[year_columns].values.flatten()
    plt.plot(years, country_temperatures, marker='o', linestyle='-', color=color_map[country], label=country)

plt.title('Temperature Trends Comparison (1961-2022)')
plt.xlabel('Year')
plt.ylabel('Temperature Change (°C)')
plt.legend()
plt.grid(True)
plt.show()


<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">

<p>
This chart compares the temperature trends of five countries: the United States, China, Brazil, Germany, and India over the period from 1961 to 2022. The temperature change is measured in degrees Celsius (°C), and the trends are shown over time.
</p>

<div style="padding: 15px; border-radius: 5px; background-color: white">


#### Insights:

1. **Overall Warming Trend Across All Countries:**
   Similar to the global trend, all five countries exhibit an overall increase in temperature over the observed period, reflecting the broader pattern of global warming. This reinforces the idea that climate change is a global phenomenon affecting multiple regions.

2. **Temperature Variability:**
   There is significant year-to-year variability in the temperature changes for all countries, which is expected due to natural climate variability. However, despite these fluctuations, the long-term trend for all countries is upward.

3. **Country-Specific Differences:**
   - **China** (in goldenrod \[gold]) shows the most significant fluctuations, with periods of both sharp increases and decreases, particularly from the 1980s onward. This might reflect China's rapid industrialization and its associated environmental impacts.
   - **Germany** (in tomato \[red]) shows relatively stable temperature changes until the 1990s, after which there is a noticeable upward trend, possibly due to changes in regional climate patterns in Europe.
   - **India** (in purple) and **Brazil** (in green) show more consistent temperature increases, though India's trend seems to have fewer fluctuations compared to Brazil.
   - **The United States** (in dodgerblue \[blue]) exhibits a steady increase in temperature, with noticeable spikes in recent decades.
     
4. **Recent Decades (Post-2000):**
   The temperature increase becomes more pronounced in all countries after the year 2000. This is consistent with the global acceleration in warming due to increasing greenhouse gas emissions and other human activities.

5. **Inter-Country Comparison:**
   - **China** appears to have experienced the most significant warming, particularly after the 1990s, with some of the highest temperature anomalies.
   - **Germany** and **India** show relatively smoother trends, with fewer extreme changes, though they still reflect a clear warming trend.

6. **Climate Sensitivity:**
The differences in the temperature trends among these countries could be attributed to various factors such as geographical location, industrialization, urbanization, and local climate policies. For instance, regions that are closer to the poles might experience more rapid warming than those closer to the equator.

<hr />

#### Conclusion:
This chart highlights the ongoing and accelerating impact of climate change across different countries, with each country exhibiting unique patterns of temperature change. The data underscores the importance of both global and regional strategies in combating climate change, as the effects are widespread but vary in intensity and pattern across different regions. The substantial increase in temperature anomalies in recent decades also suggests an urgent need for climate action to mitigate these trends.
</div>
</div>
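
With strong year-to-year noise, the long-term trend of each country is easier to compare after smoothing. A minimal sketch using a centered rolling mean on a made-up `toy_series` (the 3-year window is an arbitrary choice for this small example; a longer window would suit the full 1961-2022 series):

```python
import pandas as pd

# Made-up zig-zag series that rises overall.
toy_series = pd.Series([0.0, 1.0, 0.0, 1.0, 2.0, 1.0, 2.0], index=range(2000, 2007))

# Centered rolling mean: each point becomes the average of itself and its neighbours.
# The first and last points have incomplete windows and come out as NaN.
smoothed = toy_series.rolling(window=3, center=True).mean()
print(smoothed)
```

Plotting the smoothed values on top of the raw ones would show the underlying rise without the annual swings.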

---

### Visualising temperature trends across continents using a heatmap

---

In [None]:
# Let's first check the available year columns to understand the range we can work with
available_year_columns = [col for col in climate_data.columns if col.startswith('F') and col[1:].isdigit()]

# Display available year columns
available_year_columns[:10], available_year_columns[-10:]  # Show the first and last few to get an idea of the range


In [None]:
# Calculate the average temperature data per continent using the provided continent map
continent_data = pd.DataFrame(index=continent_map.keys(), columns=available_year_columns)

for continent, countries in continent_map.items():
    continent_data.loc[continent] = climate_data[climate_data['Country'].isin(countries)][available_year_columns].mean()

# Ensure all data in the DataFrame is numeric
continent_data = continent_data.apply(pd.to_numeric, errors='coerce')

# Plot the heatmap of average temperature change per continent and year
plt.figure(figsize=(15, 8))
sns.heatmap(continent_data, cmap='coolwarm', annot=False, cbar_kws={'label': 'Temperature Change (°C)'}, xticklabels=5)
plt.title('Temperature Changes Over Time by Continent')
plt.xlabel('Year')
plt.ylabel('Continent')
plt.show()


<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">

<p>
This heatmap visualizes temperature changes over time by continent, from 1961 to 2022. The temperature change is represented in degrees Celsius (°C), with the color scale ranging from blue (smaller or negative temperature change) to red (larger temperature change).
</p>



<div style="padding: 15px; border-radius: 5px; background-color: white">
    
#### Insights:

1. **Gradual Shift from Cool to Warm:**
   - The heatmap shows a clear progression from cooler temperatures (blue) in the early years (1960s-1970s) to warmer temperatures (red) in more recent years (2000s-2020s). This progression is consistent across all continents, indicating a global warming trend.

2. **Continental Differences in Warming Patterns:**
   - **Europe** and **North America** show the most pronounced warming trends, particularly noticeable from the 1990s onwards. The deep red colors in these regions after 2000 highlight significant temperature increases, which aligns with their more rapid warming rates observed in previous charts.
   - **Asia** and **Africa** also exhibit warming, but with slightly less intensity compared to Europe and North America. However, the trend is still clear, with temperatures shifting from blue to red over time.
   - **Oceania** and **South America** display a similar pattern of warming, though with more variability and slightly less intense changes in temperature. The less consistent patterns may reflect the influence of oceanic and other regional climatic factors.

3. **Acceleration of Warming After 2000:**
   - The transition to warmer temperatures becomes particularly pronounced after 2000 across most continents. This aligns with the period of accelerated global industrial activity and greenhouse gas emissions, which have driven significant climate changes in recent decades.
   - The deepening red colors in Europe and North America, in particular, highlight a marked increase in temperature anomalies, pointing to a period of rapid warming.

4. **Regional Climate Sensitivity:**
   - The heatmap suggests that some continents, like Europe and North America, may be more sensitive to global warming, showing more intense and consistent temperature increases over time.
   - Africa and Asia also show clear warming trends, but with some regions experiencing more moderate changes, possibly due to regional climate variability and different levels of industrialization.

5. **Temporal Patterns:**
   - The shift from blue to red over the decades indicates a clear temporal pattern of increasing temperatures, with the most intense warming occurring in the last two decades. This trend is consistent with the broader understanding of climate change, where the last few decades have seen the most significant increases in global temperatures.

<hr />

#### Conclusion:
This heatmap effectively illustrates the global nature of climate change, with all continents experiencing a shift from cooler to warmer temperatures over the past 60 years. The increasing intensity of red in recent years underscores the urgency of addressing climate change, as the impacts are becoming more pronounced and widespread. The differences in warming patterns among continents also suggest that regional strategies may be needed to address the specific climate challenges faced by different parts of the world.

</div>
</div>
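
When reading a diverging colormap such as `coolwarm`, it helps if the neutral midpoint corresponds to a temperature change of 0 °C. By default the midpoint sits at the middle of the data range instead. The sketch below computes symmetric color limits on a made-up `toy_heat` frame and collects them as keyword arguments that could be passed to `sns.heatmap` (`vmin`, `vmax`, and `center` are existing parameters of that function); this is a suggestion, not what the cell above does:

```python
import numpy as np
import pandas as pd

# Made-up continent-by-year values standing in for continent_data.
toy_heat = pd.DataFrame(
    {"F1961": [-0.3, 0.1], "F2022": [1.8, 1.2]},
    index=["Europe", "Oceania"],
)

# Largest absolute value, ignoring NaN, so the scale is symmetric around zero.
limit = float(np.nanmax(np.abs(toy_heat.to_numpy())))

# Keyword arguments for a zero-centered diverging color scale.
heatmap_kwargs = {"cmap": "coolwarm", "vmin": -limit, "vmax": limit, "center": 0}
print(heatmap_kwargs)
```

With these settings, blue cells would indicate negative changes and red cells positive ones, which matches the way the heatmap is interpreted in the text.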

---

### Visualising temperature changes across the six continents per decade

---

In [None]:
# Update the list of decades based on available data
decades = ['1960s', '1970s', '1980s', '1990s', '2000s', '2010s', '2020s']

# Ensure the data is correctly grouped by continent and decade
decade_data = {}

# Aggregate temperature data by continent and decade
for continent in continent_data.index:
    decade_data[continent] = []
    for decade in decades:
        start_year = int(decade[:4])
        end_year = start_year + 10
        decade_years = [f"F{year}" for year in range(start_year, min(end_year, 2023)) if f"F{year}" in available_year_columns]

        # Calculate the average for the decade
        if len(decade_years) > 0:
            avg_temp = continent_data.loc[continent, decade_years].mean()
            decade_data[continent].append(avg_temp)
        else:
            decade_data[continent].append(None)

# Convert the dictionary to a DataFrame
decade_df = pd.DataFrame(decade_data, index=decades)



In [None]:
color_map_cont = {
    'Africa': 'goldenrod',
    'Asia' : 'green',
    'Europe': 'red',
    'North America': 'dodgerblue',
    'Oceania': 'purple',
    'South America': 'brown'
}
# Plotting the temperature trends per decade for each continent
plt.figure(figsize=(12, 8))

for continent in decade_df.columns:
    plt.plot(decade_df.index, decade_df[continent], marker='o', linestyle='-', color=color_map_cont[continent], label=continent)

plt.title('Temperature Trends Across Continents Per Decade (1960s-2020s)')
plt.xlabel('Decade')
plt.ylabel('Average Temperature Change (°C)')
plt.legend()
plt.grid(True)
plt.show()

<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">

<p>
This chart illustrates the average temperature change across different continents (Africa, Asia, Europe, North America, South America, and Oceania) per decade from the 1960s to the 2020s. The temperature change is measured in degrees Celsius (°C), and the trends are displayed for each decade.
</p>



<div style="padding: 15px; border-radius: 5px; background-color: white">

#### Insights:

1. **Overall Increase in Temperature Across All Continents:**
The chart clearly shows that every continent has experienced a steady increase in average temperature change over the decades. This consistent upward trend across all regions highlights the global impact of climate change.

2. **Europe and North America Lead in Temperature Increase:**
   - **Europe** (red) shows the most significant increase, particularly after the 1980s. By the 2020s, Europe's average temperature change has reached nearly 2°C, indicating a more rapid rate of warming compared to other continents.
   - **North America** (dodgerblue \[blue]) follows closely behind Europe, also showing a significant increase in temperature, especially after the 1990s.

3. **Other Continents Exhibit Similar Warming Patterns:**
   - **Asia** (green), **Africa** (goldenrod \[gold/yellow]), **South America** (brown), and **Oceania** (purple) all show similar trends, with a steady rise in temperature over the decades. While the increase is less pronounced than in Europe and North America, the upward trend is still significant.

4. **Acceleration of Warming in Recent Decades:**
   - The most substantial increases in temperature occur from the 1990s onwards, reflecting the accelerating impact of global warming. This period coincides with significant industrial activity, increased greenhouse gas emissions, and other factors that have contributed to the rapid rise in global temperatures.
   - The 2000s and 2010s show the most pronounced differences between the continents, with Europe and North America experiencing the highest temperature changes, indicating that these regions may be more sensitive to climate change or have higher levels of emissions and industrial activity.

5. **Regional Differences in Climate Sensitivity:**
   - The differences in the rate of temperature increase among the continents suggest that some regions may be more vulnerable or responsive to global warming. For instance, Europe and North America have seen faster and more intense warming, possibly due to their geographic location, climate systems, or levels of industrialization.
   - Other continents, like Oceania and South America, exhibit a more gradual increase, which could be due to different regional climate patterns or slower rates of industrial growth.

<hr />

#### Conclusion:
The chart provides a clear visual representation of how global warming has affected different continents over time, with Europe and North America showing the most significant increases in temperature. The accelerating trend across all continents, especially in recent decades, underscores the urgency of global climate action to mitigate the impact of rising temperatures. This data is a stark reminder that while the effects of climate change may vary by region, they are universally felt and require coordinated efforts to address.

</div>
</div>

---

### Zooming into temperature changes for a particular country (e.g., Ghana) over the years

---

In [None]:
# Specify a country
country_study = 'Ghana'

# Filter the data for the selected country
country_specific = climate_data[climate_data['Country'] == country_study]

In [None]:
# Plotting the temperature trends with specific colors
plt.figure(figsize=(12, 8))

# country_specific was already filtered to the selected country in the previous cell
country_temperatures = country_specific[year_columns].values.flatten()
plt.plot(years, country_temperatures, marker='o', linestyle='-', label=country_study)

plt.title(f'Temperature Trends in {country_study} (1961-2022)')
plt.xlabel('Year')
plt.ylabel('Temperature Change (°C)')
plt.legend()
plt.grid(True)
plt.show()

<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">

<p>
This chart displays the temperature trends in Ghana from 1961 to 2022, with the temperature change measured in degrees Celsius (°C). The trend over time is visualized, highlighting how the average temperature in Ghana has varied and generally increased over the past six decades.
</p>


<div style="padding: 15px; border-radius: 5px; background-color: white">
    
#### Summary of Analysis:

1. **Long-Term Warming Trend:**
   - The chart shows a clear long-term increase in average temperature in Ghana from 1961 to 2022. This is consistent with global warming patterns observed in other parts of the world.

2. **Early Years (1960s to 1970s):**
   - In the early decades, particularly in the 1960s and 1970s, temperature fluctuations are evident, with several years experiencing slight cooling (negative temperature change). However, overall, the temperatures start to trend upwards.

3. **Increasing Temperature Variability (1980s to 1990s):**
   - The 1980s and 1990s show more pronounced variability in temperature, with some years experiencing sharp increases followed by decreases. This indicates more significant fluctuations in temperature, which might be related to both natural variability and the initial impacts of climate change.

4. **Significant Warming in Recent Decades (2000s to 2020s):**
   - The most noticeable trend is the sharp increase in temperatures from the 2000s onwards. The temperature change remains consistently positive and climbs steadily, indicating a strong warming trend. By 2022, the temperature change reaches around 1.5°C, a substantial increase compared to earlier decades.

5. **Implications for Ghana:**
   - The increasing temperatures could have significant implications for Ghana, including potential impacts on agriculture, water resources, and human health. The trend suggests that Ghana is experiencing the effects of global climate change, which may exacerbate issues related to droughts, heatwaves, and other climate-related challenges.


<hr />    
    
#### Conclusion:
   - The chart provides strong evidence that Ghana is warming, with the most significant increases in temperature occurring in the last two decades. This trend aligns with global patterns of climate change and highlights the importance of climate adaptation and mitigation strategies in Ghana to address the potential impacts of rising temperatures.
  
</div>
</div>
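
The long-term warming trend described above can also be expressed as a single number. The sketch below fits a least-squares line with `np.polyfit` and reports the slope in °C per decade. The series is simulated and hypothetical (it is not the Ghana data), so the printed value only illustrates the method.

```python
import numpy as np

# Hypothetical anomaly series (°C) for 1961-2022; NOT the notebook's Ghana data.
years = np.arange(1961, 2023)
rng = np.random.default_rng(0)
anomalies = 0.02 * (years - 1961) - 0.2 + rng.normal(0.0, 0.15, size=years.size)

# Least-squares line: anomaly ≈ slope * year + intercept
slope, intercept = np.polyfit(years, anomalies, deg=1)
trend_per_decade = slope * 10
print(f"Estimated trend: {trend_per_decade:.2f} °C per decade")
```

Applied to the notebook's data, `years` and `country_temperatures` from the plotting cell could presumably be passed in place of the simulated arrays, after dropping any missing values.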

<a id="data_eng"></a>

## 5. Data Engineering

<a href="#content">
    Back to Table of Contents
</a>

---

⚡ Preparing and restructuring the climate data for modelling

---



---

### Continent data

---

In [None]:
continent_data

---

### Country data

---

In [None]:
available_countries = climate_data['Country'].tolist()

In [None]:
country_data = pd.DataFrame(index=available_countries, columns=available_year_columns)
for country in available_countries:
    # Filter the climate_data for the current country and select the desired columns
    country_row = climate_data[climate_data['Country'] == country][available_year_columns]

    # Assign the selected row to the country_data DataFrame
    country_data.loc[country] = country_row.values.flatten()



In [None]:
country_data = country_data.apply(pd.to_numeric, errors='coerce')
country_data

In [None]:
# Select the temperature data for a specific continent (here, Africa)
continent = 'Africa'
continent_series = continent_data.loc[continent].dropna()

In [None]:
country = 'Ghana'
country_series = country_data.loc[country].dropna()

In [None]:
country_series

In [None]:
continent_series
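
Both series are indexed by string labels such as `F1961`. Time-series libraries generally work more smoothly with a date-like index, so one optional preparation step is to convert the labels to dates. The sketch below shows the idea on a hypothetical three-year excerpt with placeholder values. It is an alternative to the string index used in the rest of this notebook, not a required step.

```python
import pandas as pd

# Hypothetical three-year excerpt; the values are placeholders, not real data.
wide_row = pd.Series({"F1961": -0.10, "F1962": 0.05, "F1963": 0.12}, name="Ghana")

# Drop the leading "F" from each label and parse the remainder as a year.
dates = pd.to_datetime(wide_row.index.str[1:], format="%Y")

# Same values, now indexed by 1 January of each year.
dated_series = pd.Series(wide_row.to_numpy(), index=dates, name=wide_row.name)
print(dated_series)
```

If this conversion were adopted, later cells that refer to labels such as `'F2022'` would need to use dates instead.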

<a id="modelling"></a>

## 6. Model-building

<a href="#content">
    Back to Table of Contents
</a>

---

⚡ Building a climate change model based on ARIMA and data specific to Africa (for continent) and Ghana (for country) that predicts temperatures for subsequent years

---



In [None]:
# help(ARIMA)

---

### Continent model - Africa

---

In [None]:

# Fit the ARIMA model
continent_model = ARIMA(continent_series, order=(7, 1, 1))  # ARIMA parameters (p, d, q)
continent_model_fit = continent_model.fit()


---

### Country model - Ghana

---

In [None]:
# Fit the ARIMA model
country_model = ARIMA(country_series, order=(7, 1, 1))  # ARIMA parameters (p, d, q)
country_model_fit = country_model.fit()

---

### Continent model prediction for the next 10 years

---

In [None]:
# Generate predictions for the next 10 years (2023-2032)
forecast_years_continent = [f'F{year}' for year in range(2023, 2033)]
forecast_continent = continent_model_fit.forecast(steps=10)
forecast_continent.index = forecast_years_continent

---

### Country model prediction for the next 10 years

---

In [None]:
# Generate predictions for the next 10 years (2023-2032)
forecast_years_country = [f'F{year}' for year in range(2023, 2033)]
forecast_country = country_model_fit.forecast(steps=10)
forecast_country.index = forecast_years_country

---

### Visualising continent historical with predicted data

---

In [None]:

# Combine historical data with predictions for visualization
combined_continent_series = pd.concat([continent_series, forecast_continent])

# Plot the results
plt.figure(figsize=(10, 6))
plt.plot(combined_continent_series.index, combined_continent_series, marker='o', linestyle='-', color='blue', label='Observed & Forecasted')
plt.axvline(x='F2022', color='red', linestyle='--', label='Forecast Start')
plt.title(f'ARIMA Forecast for {continent} Temperature Trends (1961-2032)')
plt.xlabel('Year')
plt.ylabel('Temperature Change (°C)')
plt.legend()
plt.grid(True)

# Rotate the x-axis labels vertically
plt.xticks(rotation=90)

plt.show()


<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">

<p>
This chart presents the ARIMA forecast for Africa's temperature trends from 1961 to 2032, focusing on temperature changes (in °C) over time. The chart contains both observed and forecasted data, with the vertical red dashed line indicating the start of the forecast period.
</p>



<div style="padding: 15px; border-radius: 5px; background-color: white">
    
#### Summary of Analysis:

1. **Observed Data (1961–2022):**
   - The data up to the red dashed line reflects the historical observations from 1961 to approximately 2022. The chart shows an overall upward trend in temperature changes, with noticeable year-to-year variability.
   - Similar to the previous graph for Ghana, the temperature trend exhibits a gradual increase over time, with certain periods of sharp rises and occasional minor decreases.
   - The temperature increase is particularly evident from the 1990s onwards, indicating that warming has accelerated in recent decades.

2. **Forecasted Data (2023–2032):**
   - Starting from the red dashed line (around 2022), the ARIMA model projects future temperature changes for Africa until 2032.
   - The forecasted trend suggests a continued increase in temperatures, maintaining the upward trajectory that was observed in the historical data.
   - The projected temperature rise stabilizes around 1.2–1.4°C by 2032, which indicates persistent warming. Although there is still some variation, the forecast suggests less extreme fluctuations compared to the historical period.

3. **Implications for Africa:**
   - The chart clearly points to a future where temperatures in Africa are expected to continue rising. This may have significant consequences for the region, such as increased frequency and intensity of heatwaves, droughts, and other climate-related events.
   - The forecast underscores the importance of climate mitigation and adaptation strategies to address the anticipated challenges that could arise from these warming trends.

4. **Model Performance:**
   - ARIMA (Auto-Regressive Integrated Moving Average) models are commonly used for time-series forecasting, and the results here appear both to capture historical trends and to provide reasonable future projections. However, real-world events and unforeseen climate factors could cause deviations from this forecast.

<hr />
    
#### Conclusion:
   - The ARIMA forecast in this chart highlights a concerning trend of continued warming in Africa, with temperature increases projected to persist through 2032. This forecast emphasizes the need for ongoing efforts to mitigate the impacts of climate change on the continent.
  
</div>
</div>
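
The smoother, stabilizing shape of the forecast is largely a property of the model class. With one order of differencing and no drift term (which, as far as I know, is the statsmodels default when `d=1`), the expected year-to-year change decays toward zero, so the forecast level converges to a constant. The sketch below shows this for a much simpler ARIMA(1,1,0) with hypothetical inputs. It is not the fitted (7,1,1) model, but the mechanism is the same.

```python
import numpy as np

def forecast_arima_110(last_level, last_change, phi, steps):
    """Point forecasts for an ARIMA(1,1,0) model with no drift.

    The differenced series follows an AR(1): change[t] = phi * change[t-1] + noise,
    so the expected change h steps ahead is phi**h * last_change.
    """
    horizons = np.arange(1, steps + 1)
    expected_changes = last_change * phi ** horizons
    # Forecast levels are the last observed level plus the accumulated changes.
    return last_level + np.cumsum(expected_changes)

# Hypothetical inputs, chosen only for illustration.
forecasts = forecast_arima_110(last_level=1.0, last_change=0.2, phi=0.5, steps=10)
limit = 1.0 + 0.2 * 0.5 / (1 - 0.5)  # level the forecasts converge to
print(np.round(forecasts, 3))
print(f"Limiting level: {limit:.3f}")
```

A flat forecast from such a model therefore does not mean warming is expected to stop. It reflects the absence of a trend term in the model specification.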

---

### Visualising country historical data with predicted data

---

In [None]:

# Combine historical data with predictions for visualization
combined_country_series = pd.concat([country_series, forecast_country])

# Plot the results
plt.figure(figsize=(10, 6))
plt.plot(combined_country_series.index, combined_country_series, marker='o', linestyle='-', color='blue', label='Observed & Forecasted')
plt.axvline(x='F2022', color='red', linestyle='--', label='Forecast Start')
plt.title(f'ARIMA Forecast for {country} Temperature Trends (1961-2032)')
plt.xlabel('Year')
plt.ylabel('Temperature Change (°C)')
plt.legend()
plt.grid(True)

# Rotate the x-axis labels vertically
plt.xticks(rotation=90)

plt.show()


<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">

<p>
This chart presents an ARIMA forecast for Ghana's temperature trends from 1961 to 2032, showing the observed and forecasted temperature changes in °C.
</p>



<div style="padding: 15px; border-radius: 5px; background-color: white">
    
#### Key Observations:

1. **Observed Data (1961–2022):**
   - Similar to the previous chart for Africa, this shows historical temperature changes for Ghana until around 2022, before the forecast period starts (marked by the red dashed line).
   - The historical data indicates a clear upward trend in temperatures over time, with fluctuations. The trendline shows gradual warming, punctuated by periods of sharp increases and occasional dips.
   - From approximately the 1990s onward, there is a more noticeable and consistent rise in temperatures, reflecting a warming trend.
     
2. **Forecasted Data (2023–2032):**
   - The forecasted period shows a continued rise in temperature through 2032. The forecast suggests that Ghana will experience a temperature change exceeding 1.2°C by the end of the period.
   - While the observed data had notable year-to-year variability, the forecast suggests a slightly smoother increase in temperature, with fewer extreme fluctuations.
   - The projected stabilization around 1.2°C suggests persistent warming, but at a less volatile rate than in the historical data.

3. **Implications for Ghana:**
   - This temperature increase, as forecasted, could have significant environmental, agricultural, and socio-economic impacts on Ghana. Warming temperatures may lead to more frequent heatwaves, shifts in rainfall patterns, and potential threats to agriculture and water resources.
   - As in other regions, the forecasted trends for Ghana highlight the importance of preparing for climate adaptation and addressing the challenges posed by rising temperatures.

4. **Model Insights:**
   - The ARIMA model reproduces much of the historical variability and yields plausible projections, though actual future temperatures could be influenced by unforeseen events or shifts in global climate patterns.
   - Ghana’s temperature changes seem to follow a trajectory similar to the African continent overall, with local variations in the degree and timing of temperature increases.


<hr />
    
#### Conclusion:
   - The ARIMA model projects that Ghana's temperatures will continue to rise, reaching about 1.2°C above the baseline by 2032. This reflects broader regional trends of increasing temperatures, and the forecast underscores the need for climate action and resilience planning in the country.
  
</div>
</div>

In [None]:

combined_continent_series.tail(10)  # Show the predicted values for the next 10 years in Africa


In [None]:

combined_country_series.tail(10)  # Show the predicted values for the next 10 years in Ghana


<a id="performance"></a>

## 7. Model performance

<a href="#content">
    Back to Table of Contents
</a>

---

⚡ Using performance metrics to measure the model accuracy

---



In [None]:

residuals = continent_model_fit.resid
plt.figure(figsize=(10, 6))
plt.plot(residuals)
plt.title('Residuals from ARIMA Model')
# Rotate the x-axis labels vertically
plt.xticks(rotation=90)
plt.show()


<div style="padding: 15px; border: 1px solid dodgerblue; border-radius: 5px; background-color: aliceblue">

<p>
The residuals from the Continent (Africa) ARIMA model, as shown in the plot, provide insights into the model's performance and whether it captures the underlying data effectively.
</p>



<div style="padding: 15px; border-radius: 5px; background-color: white">
    
#### Analysis:

1. **Mean-Centered Residuals:**
   - The residuals fluctuate around zero, which is ideal in time series models. This suggests that, on average, the ARIMA model has not systematically under- or overestimated the temperature trends.
     
2. **No Apparent Trend:**
   - The residuals do not display any obvious upward or downward trends, which indicates that the model has successfully captured the main trend in the data. If a trend were present, it would suggest that the ARIMA model had missed an important aspect of the data.

3. **Heteroscedasticity:**
   - While the residuals vary over time, there doesn't appear to be a clear pattern of increasing or decreasing variance (heteroscedasticity). If there were increasing variance, it could mean that the model becomes less reliable over time.

4. **High Variability:**
   - The residuals show substantial fluctuations, with some large spikes both positively and negatively. This could indicate that while the ARIMA model has captured the general trend, it may not fully account for all the variability in the data, possibly due to external factors or noise not captured in the model.

5. **Autocorrelation and Patterns:**
   - A further step would be to check for autocorrelation within the residuals (e.g., using an autocorrelation plot). If residuals are correlated, it suggests that the ARIMA model has not fully captured all patterns in the data, and additional lags or differencing terms may be necessary.

<hr />

#### Conclusion:
   - The ARIMA model appears to perform reasonably well, as the residuals are mean-centered, and there is no obvious trend or systematic error. However, the high variability in the residuals might indicate the need for potential refinement, such as exploring more complex models or incorporating external regressors to reduce noise.
  
</div>
</div>
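
The autocorrelation check mentioned in point 5 can be sketched with NumPy alone. The code below computes sample autocorrelations and the Ljung-Box Q statistic for a hypothetical white-noise series standing in for `continent_model_fit.resid`. The function names are my own, and the ±1.96/√n band is the usual large-sample approximation for white noise.

```python
import numpy as np

def sample_autocorr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.sum(centered[lag:] * centered[:-lag]) / np.sum(centered ** 2))

def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(x)
    lags = np.arange(1, max_lag + 1)
    acs = np.array([sample_autocorr(x, k) for k in lags])
    return float(n * (n + 2) * np.sum(acs ** 2 / (n - lags)))

# Hypothetical residuals: simulated white noise, not the model's actual residuals.
rng = np.random.default_rng(42)
fake_residuals = rng.normal(0.0, 0.2, size=61)

band = 1.96 / np.sqrt(len(fake_residuals))  # approximate 95% band for white noise
for lag in range(1, 6):
    r = sample_autocorr(fake_residuals, lag)
    print(f"lag {lag}: r = {r:+.3f} (|r| above {band:.3f} would be suspicious)")
print(f"Ljung-Box Q(5) = {ljung_box_q(fake_residuals, 5):.2f}")
```

A large Q relative to the appropriate chi-square distribution would suggest leftover structure in the residuals. For residuals of a fitted ARMA-type model, the degrees of freedom are usually reduced by the number of estimated AR and MA coefficients.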

In [None]:

residuals_country = country_model_fit.resid
plt.figure(figsize=(10, 6))
plt.plot(residuals_country)
plt.title('Residuals from ARIMA Country Model')
# Rotate the x-axis labels vertically
plt.xticks(rotation=90)
plt.show()


In [None]:
# Density of the continent model residuals; a roughly bell-shaped curve centred on zero is the desired shape
residuals.plot(kind='kde')
plt.title('Residual Density Plot')
plt.show()

In [None]:
# Density of the country model residuals; a roughly bell-shaped curve centred on zero is the desired shape
residuals_country.plot(kind='kde')
plt.title('Country Residual Density Plot')
plt.show()

### Continent Accuracy

In [None]:

mae = mean_absolute_error(continent_series[-len(residuals):], continent_model_fit.fittedvalues)
print(f'Continent Mean Absolute Error: {mae}')


In [None]:

mse = mean_squared_error(continent_series[-len(residuals):], continent_model_fit.fittedvalues)
print(f'Continent Mean Squared Error: {mse}')


In [None]:
rmse = np.sqrt(mse)
print(f'Continent Root Mean Squared Error: {rmse}')


In [None]:
# Predict in-sample (on the training data). Positions 1..n-1 skip the first
# observation, which has no previous value to difference against (d=1).
# The model returns predictions in levels by default.
continent_predictions = continent_model_fit.predict(start=1, end=len(continent_series) - 1)


In [None]:
# Calculate R² score on observed/predicted pairs that refer to the same year
r2 = r2_score(continent_series[1:], continent_predictions)
print(f'Continent R² Score: {r2}')
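
For reference, the four metrics used in this section can be written out explicitly. The sketch below computes them with NumPy on small hypothetical arrays. Each metric compares observed and predicted values element by element, so both arrays must refer to the same years in the same order.

```python
import numpy as np

# Hypothetical observed and predicted values, for illustration only.
y_true = np.array([0.2, 0.4, 0.3, 0.6, 0.8])
y_pred = np.array([0.25, 0.35, 0.4, 0.5, 0.7])

errors = y_true - y_pred
mae_demo = np.mean(np.abs(errors))          # mean absolute error
mse_demo = np.mean(errors ** 2)             # mean squared error
rmse_demo = np.sqrt(mse_demo)               # root mean squared error
# R²: one minus residual sum of squares over total sum of squares
r2_demo = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE={mae_demo:.3f}  MSE={mse_demo:.4f}  RMSE={rmse_demo:.4f}  R²={r2_demo:.4f}")
```

These definitions should agree with `mean_absolute_error`, `mean_squared_error`, and `r2_score` from scikit-learn when given the same inputs.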

In [None]:
# First
# model = ARIMA(continent_series, order=(5, 1, 0))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.1524796445780324
# Mean Squared Error: 0.03677240295916035
# Root Mean Squared Error: 0.1917613176820611
# R² Score: 0.8738133431312158

# Second
# model = ARIMA(continent_series, order=(7, 2, 1))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.1498938714110211
# Mean Squared Error: 0.03505845305749636
# Root Mean Squared Error: 0.18723902653425742
# R² Score: 0.840192078246505

# Third
# model = ARIMA(continent_series, order=(6, 2, 1))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.1494478653452113
# Mean Squared Error: 0.03539841699741596
# Root Mean Squared Error: 0.1881446703933331
# R² Score: 0.8579029759435071

# Fourth
# model = ARIMA(continent_series, order=(5, 1, 1))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.15195174989957566
# Mean Squared Error: 0.03669754729360665
# Root Mean Squared Error: 0.1915660389881428
# R² Score: 0.8734101135217522

# Fifth
# model = ARIMA(continent_series, order=(6, 1, 1))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.15044112373100743
# Mean Squared Error: 0.03599876729929944
# Root Mean Squared Error: 0.18973341113072162
# R² Score: 0.869595063951439

# Sixth
# model = ARIMA(continent_series, order=(7, 1, 1))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.1474285315925977
# Mean Squared Error: 0.03347342476465636
# Root Mean Squared Error: 0.1829574397630672
# R² Score: 0.8500830587858229

# Seventh
# model = ARIMA(continent_series, order=(8, 1, 1))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.1470587129681906
# Mean Squared Error: 0.03310567903216963
# Root Mean Squared Error: 0.18194966070913576
# R² Score: 0.848389726541708

# Eighth
# model = ARIMA(continent_series, order=(9, 1, 1))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.15029703391322832
# Mean Squared Error: 0.03480204363570089
# Root Mean Squared Error: 0.18655305849999054
# R² Score: 0.8619523243130178

# Ninth
# model = ARIMA(continent_series, order=(10, 1, 1))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.14606950730161192
# Mean Squared Error: 0.031605384138834285
# Root Mean Squared Error: 0.17777903177493765
# R² Score: 0.8420115310227937

# Tenth
# model = ARIMA(continent_series, order=(11, 1, 1))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.14595712859178278
# Mean Squared Error: 0.031601680368444435
# Root Mean Squared Error: 0.17776861468899519
# R² Score: 0.843083058024414

# Eleventh
# model = ARIMA(continent_series, order=(20, 1, 1))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.13347787814426726
# Mean Squared Error: 0.028349725761645166
# Root Mean Squared Error: 0.16837376803304357
# R² Score: 0.8111046030163367

# Twelfth
# model = ARIMA(continent_series, order=(25, 1, 1))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.12933442988896562
# Mean Squared Error: 0.02791222325729164
# Root Mean Squared Error: 0.16706951624186753
# R² Score: 0.805278969606791

# Thirteenth
# model = ARIMA(continent_series, order=(30, 1, 1))  # ARIMA parameters (p, d, q)
# Mean Absolute Error: 0.12544034794498096
# Mean Squared Error: 0.027084857214794478
# Root Mean Squared Error: 0.16457477697021033
# R² Score: 0.7922307119660194
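
In the log above, in-sample MAE and MSE mostly shrink as `p` grows. That is expected, because a model with more lags can fit the training data more closely, so in-sample error alone tends to favor overly large orders. An information criterion such as AIC penalizes the extra parameters. The sketch below illustrates this with plain autoregressions fitted by least squares on a simulated AR(1) series. The data, the helper `fit_ar_ols`, and the Gaussian AIC approximation are assumptions for illustration, not the notebook's ARIMA fits.

```python
import numpy as np

def fit_ar_ols(x, p, max_p):
    """Fit AR(p) with intercept by least squares on a sample common to all orders.

    Targets are x[max_p:], so every order is scored on the same observations.
    Returns (rss, aic) using the Gaussian approximation AIC = n*ln(rss/n) + 2k.
    """
    x = np.asarray(x, dtype=float)
    y = x[max_p:]
    n = len(y)
    # Column j holds the series lagged by j+1 steps, aligned with y.
    lagged = [x[max_p - j - 1 : len(x) - j - 1] for j in range(p)]
    design = np.column_stack([np.ones(n)] + lagged)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    k = p + 1  # AR coefficients plus the intercept
    return rss, n * np.log(rss / n) + 2 * k

# Hypothetical data: a simulated AR(1) process with coefficient 0.6.
rng = np.random.default_rng(1)
noise = rng.normal(size=300)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = 0.6 * x[t - 1] + noise[t]

max_p = 8
results = {p: fit_ar_ols(x, p, max_p) for p in range(max_p + 1)}
for p, (rss, aic) in results.items():
    print(f"p={p}: RSS={rss:.2f}  AIC={aic:.2f}")
best_p = min(results, key=lambda p: results[p][1])
print(f"Order with the lowest AIC: {best_p}")
```

For the notebook's models, the fitted statsmodels results object exposes an `aic` attribute (for example `continent_model_fit.aic`), which could be recorded alongside the error metrics when comparing orders.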

### Country Accuracy

In [None]:

mae_country = mean_absolute_error(country_series[-len(residuals_country):], country_model_fit.fittedvalues)
print(f'Country Mean Absolute Error: {mae_country}')


In [None]:

mse_country = mean_squared_error(country_series[-len(residuals_country):], country_model_fit.fittedvalues)
print(f'Country Mean Squared Error: {mse_country}')


## Visualising the data with a map

In [None]:
climate_subset = climate_data[[
    'Country', 'ISO3', 'F1962', 'F1967', 'F1972', 'F1977', 'F1982', 'F1987',
    'F1992', 'F1997', 'F2002', 'F2007', 'F2012', 'F2017', 'F2022'
]]
climate_subset.head()

In [None]:
climate_subset = climate_subset.dropna().reset_index(drop = True).copy()

In [None]:
# choosing columns for grouping
I = climate_subset.iloc[:, 2:]
list_inertia = []
X_range = range(1, 11)

#calculating inertia
for X in X_range:
    Xmeans = KMeans(n_clusters = X, n_init = 10, random_state = 42)
    Xmeans.fit(I)
    list_inertia.append(Xmeans.inertia_)


In [None]:
#plot
plt.figure(figsize = (5, 3), facecolor = "orange")
plt.plot(X_range, list_inertia, marker = "o", linestyle = '-')
plt.xlabel('Number of Groupings (X)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Optimal X')
plt.grid(True)
plt.show()
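
The quantity on the vertical axis of the elbow plot, inertia, is the sum of squared distances from each point to its nearest cluster centre. The sketch below computes it by hand for four hypothetical points. The `inertia` helper is my own and takes fixed centroids rather than running k-means.

```python
import numpy as np

def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    # sq_dists[i, j] = squared distance between point i and centroid j
    diffs = points[:, None, :] - centroids[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    return float(np.sum(sq_dists.min(axis=1)))

# Four hypothetical points forming two well-separated pairs.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
one_centroid = np.array([[5.0, 1.0]])
two_centroids = np.array([[0.0, 1.0], [10.0, 1.0]])

print(inertia(points, one_centroid))   # 104.0
print(inertia(points, two_centroids))  # 4.0
```

Going from one centre to two removes most of the inertia here. In an elbow plot, the number of clusters after which such drops become small is the usual candidate.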

In [None]:
n_clusters = 4
kmeans = KMeans(n_clusters = n_clusters, n_init = 10, random_state = 42)
climate_subset['Cluster'] = kmeans.fit_predict(I)

# Show the cluster assigned to each country
display(climate_subset)

In [None]:
# Load the world shapefile bundled with geopandas (gpd.datasets was removed in geopandas 1.0, so this call needs an older release)
df_subset = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))


In [None]:
# Merge world GeoDataFrame with climate data on the ISO3 country code
df_subset['ISO3'] = df_subset['iso_a3']

df_subset = pd.merge(df_subset, climate_subset, on = ['ISO3'], how = 'left')
df_subset.head()
# world_climate = world.merge(climate_data, how='left', left_on='name', right_on='country')
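
A left merge keeps every map row, so countries whose code has no match in the climate table end up with a missing `Cluster` and appear uncolored on the map. One way to list them is the `indicator=True` option of `merge`. The sketch below uses small hypothetical tables. The `-99` code is included because some map datasets are known to carry placeholder codes, which I am assuming could also apply here.

```python
import pandas as pd

# Hypothetical stand-ins for the map table and the clustered climate table.
world = pd.DataFrame({"name": ["Ghana", "France", "Kenya"],
                      "ISO3": ["GHA", "-99", "KEN"]})
clusters = pd.DataFrame({"ISO3": ["GHA", "FRA", "KEN"], "Cluster": [1, 0, 1]})

# indicator=True adds a "_merge" column recording where each row was found.
merged = world.merge(clusters, on="ISO3", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(unmatched)  # ['France']
```

Rows listed this way could then be corrected by hand before plotting, for example by fixing the code in the map table.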


In [None]:
# Plotting the map
fig, ax = plt.subplots(1, 1, figsize=(15, 10))
df_subset.boundary.plot(ax=ax)  # Plot country boundaries
df_subset.plot(
    column = df_subset['Cluster'], ax = ax,
    legend = True,
    legend_kwds = {
        'label': "Temperature Change (°C) Cluster"
    },
    cmap='coolwarm'
)
ax.set_axis_off()
# ax.set_title("Map of temperature clusters based on the Climate Change Indicator")
plt.title('Map of temperature clusters based on the Climate Change Indicator')
plt.show()

In [None]:

# Plotting the map
fig, ax = plt.subplots(1, 1, figsize=(20, 15))
df_subset.boundary.plot(ax=ax)  # Plot country boundaries
df_subset.plot(
    column=df_subset['Cluster'], ax=ax,
    legend=True,
    legend_kwds={'label': "Temperature Change (°C) Cluster"},
    cmap='coolwarm'
)

# Annotate country/continent names
for idx, row in df_subset.iterrows():
    # Get the centroid of the geometry to place the text
    centroid = row['geometry'].centroid
    # Annotate the name (e.g., country name)
    ax.annotate(text=row['name'], xy=(centroid.x, centroid.y),
                horizontalalignment='center', fontsize=8, color='black', weight='bold')

ax.set_axis_off()
plt.title('Map of temperature clusters based on the Climate Change Indicator')
plt.show()


In [None]:
# pip install folium


In [None]:
# pip install plotly


In [None]:

# Assuming df_subset is a GeoDataFrame with geometry and 'Cluster' columns
df_subset['country_name'] = df_subset['name']  # Ensure the country names are in a separate column

# Plotly expects the GeoDataFrame to be converted to a GeoJSON format
geojson = gpd.GeoSeries(df_subset['geometry']).__geo_interface__

# Get the coolwarm colormap from Matplotlib
cmap = plt.get_cmap('coolwarm')

# Convert the colormap to a Plotly colorscale
coolwarm_colorscale = [[i / 255, f'rgba({r*255:.0f},{g*255:.0f},{b*255:.0f},{a})']
                       for i, (r, g, b, a) in enumerate(cmap(range(256)))]

# Create the choropleth map
fig = px.choropleth(df_subset,
                    geojson=geojson,
                    locations=df_subset.index,
                    color="Cluster",
                    hover_name="country_name",  # Column to display on hover
                    projection="natural earth",
                    color_continuous_scale=coolwarm_colorscale)

# Update layout to increase the figure size
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(
    title="Temperature Clusters Based on the Climate Change Indicator",
    width=1000,  # Adjust the width
    height=800   # Adjust the height
)

fig.show()


In [None]:


# Assuming df_subset is a GeoDataFrame with 'geometry', 'Cluster', and 'name' columns
temperature_column = 'Cluster'  # Use the 'Cluster' column for coloring

# Initialize a Folium map
m = folium.Map(location=[0, 0], zoom_start=2)

# Manually create a colormap for the clusters (you can customize these colors)
cluster_colors = {
    0: '#2b83ba',  # Blue
    1: '#abdda4',  # Light Green
    2: '#fdae61',  # Orange
    3: '#d7191c',  # Red
    4: '#a7191c',  # Dark Red (unused while n_clusters = 4, which yields labels 0-3)
    # Add more colors if you have more clusters
}

# Add GeoJSON layer to the map with a custom color scale for discrete categories
geojson_layer = folium.GeoJson(
    df_subset.to_json(),
    name="Temperature Clusters",
    style_function=lambda feature: {
        'fillColor': cluster_colors.get(feature['properties'][temperature_column], 'gray'),
        'color': 'black',
        'weight': 0.5,
        'fillOpacity': 0.7
    },
    tooltip=folium.GeoJsonTooltip(
        fields=['name'],  # Column that contains the country names
        aliases=['Country:'],  # Label for the tooltip
    )
).add_to(m)

# Add a legend manually (optional)
legend_html = '''
 <div style="position: fixed;
     bottom: 50px; left: 50px; width: 150px; height: 120px;
     border:2px solid grey; z-index:9999; font-size:14px;">
     &nbsp; <b>Cluster Colors</b> <br>
     &nbsp; Cluster 0: <i style="background:#2b83ba;color:white">Blue</i><br>
     &nbsp; Cluster 1: <i style="background:#abdda4;color:black">Light Green</i><br>
     &nbsp; Cluster 2: <i style="background:#fdae61;color:black">Orange</i><br>
     &nbsp; Cluster 3: <i style="background:#d7191c;color:white">Red</i><br>
     &nbsp; Cluster 4: <i style="background:#a7191c;color:white">Dark Red</i><br>
 </div>
'''
m.get_root().html.add_child(folium.Element(legend_html))

# Add layer control to the map
folium.LayerControl().add_to(m)

# Save the map as an HTML file or display in a Jupyter notebook
m.save("temperature_clusters_map.html")

# If running in a Jupyter notebook, display the map
m


### Footnote to Fellow Kagglers 🌍

Thank you for exploring my analysis on climate data and ARIMA modeling! This notebook delves into historical and forecasted temperature trends for Africa and Ghana, highlighting both the challenges and potential insights derived from time series modeling in climate studies. By experimenting with model selection, residual analysis, and visualizations, I've aimed to capture temperature dynamics and support meaningful discussions on climate change impacts.

Feel free to:
- **Share your thoughts:** Climate modeling is complex, and every insight helps!
- **Fork and experiment:** Try tweaking the model parameters, exploring different data preprocessing techniques, or comparing ARIMA with other models like SARIMA, Prophet, or LSTM for more robust results.
- **Contribute ideas:** This work is part of a larger quest to understand climate shifts. I'm open to suggestions for improvement and eager to learn from your approaches too!

Together, let’s push the boundaries of climate data analysis and work towards actionable insights for our planet. 🌱

Happy Kaggle-ing! ✌️