<a href="https://colab.research.google.com/github/Aishaamalik/Climate-Change-Prediction/blob/main/Project_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Climate Change Prediction

## Loading the Dataset

In [1]:
import pandas as pd
import numpy as np


In [2]:

file_path = 'climate_change_dataset.csv'
df = pd.read_csv(file_path)

df.head()


index,Year,Country,Avg Temperature (°C),CO2 Emissions (Tons/Capita),Sea Level Rise (mm),Rainfall (mm),Population,Renewable Energy (%),Extreme Weather Events,Forest Area (%)
0,2006,UK,8.9,9.3,3.1,1441,530911230,20.4,14,59.8
1,2019,USA,31.0,4.8,4.2,2407,107364344,49.2,8,31.0
2,2014,France,33.9,2.8,2.2,1241,441101758,33.3,9,35.5
3,2010,Argentina,5.9,1.8,3.2,1892,1069669579,23.7,7,17.7
4,2007,Germany,26.9,5.6,2.4,1743,124079175,12.5,4,17.4


----

# Data Cleaning & Preprocessing

## 1: Loading and Basic Inspection

In [3]:

print("Shape:", df.shape)

df.info()

print("\nMissing values:\n", df.isnull().sum())

df.describe()


Shape: (1000, 10)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Year                         1000 non-null   int64  
 1   Country                      1000 non-null   object 
 2   Avg Temperature (°C)         1000 non-null   float64
 3   CO2 Emissions (Tons/Capita)  1000 non-null   float64
 4   Sea Level Rise (mm)          1000 non-null   float64
 5   Rainfall (mm)                1000 non-null   int64  
 6   Population                   1000 non-null   int64  
 7   Renewable Energy (%)         1000 non-null   float64
 8   Extreme Weather Events       1000 non-null   int64  
 9   Forest Area (%)              1000 non-null   float64
dtypes: float64(5), int64(4), object(1)
memory usage: 78.3+ KB

Missing values:
 Year                           0
Country                        0
Avg Temperature (°C)           0


statistic,Year,Avg Temperature (°C),CO2 Emissions (Tons/Capita),Sea Level Rise (mm),Rainfall (mm),Population,Renewable Energy (%),Extreme Weather Events,Forest Area (%)
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,2011.432,19.8831,10.4258,3.0096,1738.761,705383000.0,27.3005,7.291,40.572
std,7.147199,8.542897,5.614665,1.146081,708.976616,409391000.0,12.970808,4.422655,17.398998
min,2000.0,5.0,0.5,1.0,501.0,3660891.0,5.1,0.0,10.1
25%,2005.0,12.175,5.575,2.0,1098.75,343624200.0,16.1,3.0,25.6
50%,2012.0,20.1,10.7,3.0,1726.0,713116600.0,27.15,8.0,41.15
75%,2018.0,27.225,15.4,4.0,2362.5,1073868000.0,38.925,11.0,55.8
max,2023.0,34.9,20.0,5.0,2999.0,1397016000.0,50.0,14.0,70.0


## 2: Data Cleaning

### 2.1: Check for Missing Values

In [15]:

missing = df.isnull().sum()
missing_percent = (missing / len(df)) * 100
missing_df = pd.DataFrame({'Missing Values': missing, 'Percent (%)': missing_percent})
missing_df[missing_df['Missing Values'] > 0]


Column,Missing Values,Percent (%)


The dataset has no missing values: the filtered table above is empty.

In [6]:
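# Impute any gaps: median for numeric columns, most frequent value otherwise.
# This dataset has no missing values, so the loop leaves df unchanged.
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].median())
    else:
        df[col] = df[col].fillna(df[col].mode()[0])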
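
Because the real dataset has nothing to impute, the effect of this loop is easier to see on a small frame that does contain gaps. The sketch below uses a hypothetical `toy` DataFrame (not part of the project data) and the same median/mode rule:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one numeric gap and one categorical gap.
toy = pd.DataFrame({
    "Rainfall (mm)": [1000.0, np.nan, 3000.0, 1400.0],
    "Country": ["UK", "USA", None, "UK"],
})

for col in toy.columns:
    if pd.api.types.is_numeric_dtype(toy[col]):
        # Median of the observed values 1000, 3000, 1400 is 1400.
        toy[col] = toy[col].fillna(toy[col].median())
    else:
        # Most frequent observed value is "UK".
        toy[col] = toy[col].fillna(toy[col].mode()[0])

print(toy)
```

The median is less sensitive to outliers than the mean, which is one common reason to prefer it for numeric imputation.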


### 2.2: Handle Duplicates

In [7]:

print("Duplicates:", df.duplicated().sum())

df = df.drop_duplicates()


Duplicates: 0


There are also no duplicate rows.
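
Note that `df.duplicated()` only flags rows that match in every column. If repeated `(Year, Country)` pairs should also count as duplicates (an assumption; the notebook does not say), the `subset` argument can check just those keys. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical rows: the first two share Year and Country but differ in rainfall.
toy = pd.DataFrame({
    "Year": [2006, 2006, 2019],
    "Country": ["UK", "UK", "USA"],
    "Rainfall (mm)": [1441, 1500, 2407],
})

full_row_dupes = int(toy.duplicated().sum())
key_dupes = int(toy.duplicated(subset=["Year", "Country"]).sum())

print("Full-row duplicates:", full_row_dupes)
print("Duplicates on (Year, Country):", key_dupes)
```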

### 2.3: Check Data Types

In [8]:
df.dtypes


Column,Dtype
Year,int64
Country,object
Avg Temperature (°C),float64
CO2 Emissions (Tons/Capita),float64
Sea Level Rise (mm),float64
Rainfall (mm),int64
Population,int64
Renewable Energy (%),float64
Extreme Weather Events,int64
Forest Area (%),float64


All listed data types are appropriate for the respective columns.

## 3: Feature Engineering & Encoding

### 3.1: Encode Categorical Variables

Check which columns are categorical:


In [9]:
cat_cols = df.select_dtypes(include=['object']).columns
df[cat_cols].nunique()


Column,Unique values
Country,15


Apply encoding:

In [10]:
# Using one-hot encoding for nominal variables
df = pd.get_dummies(df, columns=cat_cols, drop_first=True)
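
With `drop_first=True`, the 15 countries become 14 indicator columns: the first category in sorted order is dropped and acts as the baseline, represented by all indicators being false. A small sketch on hypothetical toy data shows the effect:

```python
import pandas as pd

# Hypothetical toy frame with three distinct countries.
toy = pd.DataFrame({
    "Year": [2001, 2002, 2003, 2004],
    "Country": ["Argentina", "Brazil", "UK", "Brazil"],
})

encoded = pd.get_dummies(toy, columns=["Country"], drop_first=True)

# "Argentina" sorts first, so it has no column of its own.
print(list(encoded.columns))
print(encoded)
```

Dropping one level avoids perfectly collinear indicator columns, which matters mainly for linear models.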


### 3.2: Normalize/Scale Numeric Data

In [11]:
from sklearn.preprocessing import StandardScaler

num_cols = df.select_dtypes(include=['float64', 'int64']).columns

scaler = StandardScaler()
df[num_cols] = scaler.fit_transform(df[num_cols])
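
As a worked check of what `StandardScaler` computes, the scaled `Year` of the first row can be reproduced by hand from the `df.describe()` figures. The sketch assumes those printed statistics (which are rounded) and relies on the fact that `describe()` reports the sample standard deviation while `StandardScaler` divides by the population standard deviation:

```python
import math

# Figures copied from the df.describe() output for the Year column.
mean_year = 2011.432
sample_std = 7.147199  # describe() uses ddof=1
n_rows = 1000

# Convert to the population standard deviation (ddof=0) used by StandardScaler.
population_std = sample_std * math.sqrt((n_rows - 1) / n_rows)

# The first raw row has Year == 2006.
z_year = (2006 - mean_year) / population_std
print(round(z_year, 4))
```

The result should agree, up to rounding, with the scaled `Year` value shown for row 0 in the final check.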


## 4: Final Check

In [12]:
df.info()
df.head()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 23 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Year                         1000 non-null   float64
 1   Avg Temperature (°C)         1000 non-null   float64
 2   CO2 Emissions (Tons/Capita)  1000 non-null   float64
 3   Sea Level Rise (mm)          1000 non-null   float64
 4   Rainfall (mm)                1000 non-null   float64
 5   Population                   1000 non-null   float64
 6   Renewable Energy (%)         1000 non-null   float64
 7   Extreme Weather Events       1000 non-null   float64
 8   Forest Area (%)              1000 non-null   float64
 9   Country_Australia            1000 non-null   bool   
 10  Country_Brazil               1000 non-null   bool   
 11  Country_Canada               1000 non-null   bool   
 12  Country_China                1000 non-null   bool   
 13  Country_France     

index,Year,Avg Temperature (°C),CO2 Emissions (Tons/Capita),Sea Level Rise (mm),Rainfall (mm),Population,Renewable Energy (%),Extreme Weather Events,Forest Area (%),Country_Australia,...,Country_France,Country_Germany,Country_India,Country_Indonesia,Country_Japan,Country_Mexico,Country_Russia,Country_South Africa,Country_UK,Country_USA
0,-0.760398,-1.286284,-0.200611,0.078917,-0.420197,-0.426387,-0.532269,1.517721,1.105674,False,...,False,False,False,False,False,False,False,False,True,False
1,1.059406,1.301954,-1.002485,1.03919,0.943012,-1.461483,1.689213,0.160391,-0.550422,False,...,False,False,False,False,False,False,False,False,False,True
2,0.359481,1.641588,-1.358873,-0.706761,-0.702435,-0.64587,0.46277,0.386613,-0.291657,False,...,True,False,False,False,False,False,False,False,False,False
3,-0.200458,-1.637629,-1.537067,0.166215,0.216249,0.890271,-0.277724,-0.065831,-1.315216,False,...,False,False,False,False,False,False,False,False,False,False
4,-0.620413,0.821783,-0.859929,-0.532166,0.005982,-1.420634,-1.141633,-0.744495,-1.332467,False,...,False,True,False,False,False,False,False,False,False,False


## 5: Downloading the Cleaned Dataset

In [16]:
cleaned_file_path = '/content/cleaned_climate_change_dataset.csv'
df.to_csv(cleaned_file_path, index=False)


In [17]:
from google.colab import files
files.download(cleaned_file_path)


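
The `/content/` path and `google.colab.files` only exist inside Colab. For running the notebook elsewhere, one possible approach is to save to a chosen directory and trigger the browser download only when the Colab module is importable. The helper name `save_cleaned` and the toy frame are hypothetical, used only for illustration:

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_cleaned(frame, out_dir):
    """Write the frame to CSV; also start a browser download when in Colab."""
    out_path = Path(out_dir) / "cleaned_climate_change_dataset.csv"
    frame.to_csv(out_path, index=False)
    try:
        from google.colab import files  # only available inside Colab
    except ImportError:
        return out_path  # outside Colab: the saved file is the result
    files.download(str(out_path))
    return out_path


# Usage with a hypothetical two-row frame and a temporary directory.
toy = pd.DataFrame({"Year": [2006, 2019], "Country_UK": [True, False]})
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_cleaned(toy, tmp_dir)
    reloaded = pd.read_csv(saved_path)

print(list(reloaded.columns))
```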