# Regional Climate & Energy Analysis
## Data Loading and Initial Inspection

This notebook loads the climate-energy dataset and establishes the regional structure used throughout the analysis.

Focus:
- Dataset shape and schema
- Time range validation
- Country coverage
- Region mapping (continent-level)

In [1]:
# Import libraries
import pandas as pd
import numpy as np

In [8]:
# Load the dataset
df = pd.read_csv("D:/data-analysis-and-visualization-1/data/raw/global_climate_energy_2020_2024.csv")

In [9]:
# Basic inspection: preview the first five rows
df.head()

|   | date | country | avg_temperature | humidity | co2_emission | energy_consumption | renewable_share | urban_population | industrial_activity_index | energy_price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1/1/2020 | Germany | 28.29 | 31.08 | 212.63 | 11348.75 | 14.42 | 76.39 | 51.22 | 83.93 |
| 1 | 1/2/2020 | Germany | 28.38 | 37.94 | 606.05 | 4166.64 | 5.63 | 86.26 | 78.27 | 110.4 |
| 2 | 1/3/2020 | Germany | 28.74 | 57.67 | 268.72 | 4503.8 | 14.2 | 75.92 | 48.96 | 173.58 |
| 3 | 1/4/2020 | Germany | 26.66 | 51.34 | 167.32 | 3259.13 | 13.84 | 63.15 | 97.42 | 89.13 |
| 4 | 1/5/2020 | Germany | 26.81 | 65.38 | 393.89 | 7023.72 | 6.93 | 76.02 | 81.89 | 40.6 |


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36540 entries, 0 to 36539
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   date                       36540 non-null  object 
 1   country                    36540 non-null  object 
 2   avg_temperature            36540 non-null  float64
 3   humidity                   36540 non-null  float64
 4   co2_emission               36540 non-null  float64
 5   energy_consumption         36540 non-null  float64
 6   renewable_share            36540 non-null  float64
 7   urban_population           36540 non-null  float64
 8   industrial_activity_index  36540 non-null  float64
 9   energy_price               36540 non-null  float64
dtypes: float64(8), object(2)
memory usage: 2.8+ MB


In [12]:
df.shape

(36540, 10)

There are no missing values: the dataset has 10 columns and 36,540 rows, and every column reports 36,540 non-null entries.
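The same conclusion can be read off programmatically rather than from the `df.info()` printout. The sketch below is one possible way to do it; `missing_value_report` is a hypothetical helper (not part of pandas), and the small `sample` frame is made-up stand-in data so the example runs on its own.

```python
import pandas as pd


def missing_value_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages (hypothetical helper)."""
    missing = frame.isna().sum()
    return pd.DataFrame({
        "missing": missing,
        "missing_pct": (missing / len(frame) * 100).round(2),
    })


# Made-up stand-in data; in the notebook, pass `df` instead.
sample = pd.DataFrame({
    "country": ["Germany", "France", "Spain", "Italy"],
    "avg_temperature": [28.29, None, 26.66, 26.81],
})

report = missing_value_report(sample)
print(report)
```

On the real dataset, every row of the report should show zero missing values if the `df.info()` output above holds.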

In [13]:
# Column validation
df.columns

Index(['date', 'country', 'avg_temperature', 'humidity', 'co2_emission',
       'energy_consumption', 'renewable_share', 'urban_population',
       'industrial_activity_index', 'energy_price'],
      dtype='object')

In [14]:
# Convert the date column from strings to datetime
df["date"] = pd.to_datetime(df["date"])

In [15]:
# Earliest and latest dates in the dataset
df["date"].min(), df["date"].max()

(Timestamp('2020-01-01 00:00:00'), Timestamp('2024-12-31 00:00:00'))

The data spans 2020-01-01 to 2024-12-31, i.e. five full calendar years.
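The raw dates are strings such as `1/2/2020`, which are ambiguous between month-first and day-first. The first rows in the preview are consecutive days, which suggests month/day/year, so a more defensive variant of the conversion would pass an explicit format. The sketch below assumes that `%m/%d/%Y` layout and uses a few made-up strings instead of the real column.

```python
import pandas as pd

# Made-up strings in the same layout as the raw `date` column.
raw_dates = pd.Series(["1/1/2020", "1/2/2020", "12/31/2024"])

# Assumption: the file stores dates as month/day/year.
# With an explicit format, strings that do not match raise an error
# instead of being silently interpreted another way.
parsed = pd.to_datetime(raw_dates, format="%m/%d/%Y")

print(parsed.min(), parsed.max())
```

In the notebook this would correspond to `pd.to_datetime(df["date"], format="%m/%d/%Y")`, provided the month-first assumption is right for the whole file.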

In [16]:
# Number of distinct countries
df["country"].nunique()

20

In [23]:
df["country"].value_counts()

country
Germany           1827
France            1827
Netherlands       1827
Italy             1827
Spain             1827
Sweden            1827
Norway            1827
Poland            1827
Turkey            1827
United Kingdom    1827
United States     1827
Canada            1827
Brazil            1827
India             1827
China             1827
Japan             1827
Australia         1827
South Africa      1827
Mexico            1827
Indonesia         1827
Name: count, dtype: int64
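Each country has 1,827 rows, which equals the number of calendar days from 2020-01-01 to 2024-12-31 (both 2020 and 2024 are leap years). That is consistent with one row per country per day, but row counts alone do not prove it. A minimal sketch of a stricter check follows; `find_coverage_gaps` is a hypothetical helper and `sample` is made-up data.

```python
import pandas as pd


def find_coverage_gaps(frame: pd.DataFrame, start: str, end: str) -> pd.Series:
    """Count, per country, the calendar days in [start, end] that have no row."""
    expected = pd.date_range(start, end, freq="D")
    gaps = {}
    for country, group in frame.groupby("country"):
        missing_days = expected.difference(pd.DatetimeIndex(group["date"]))
        gaps[country] = len(missing_days)
    return pd.Series(gaps, name="missing_days")


# Made-up stand-in data: France is missing 2020-01-02.
sample = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01", "2020-01-03"]
    ),
    "country": ["Germany"] * 3 + ["France"] * 2,
})

gaps = find_coverage_gaps(sample, "2020-01-01", "2020-01-03")
duplicate_rows = int(sample.duplicated(subset=["country", "date"]).sum())
expected_days = len(pd.date_range("2020-01-01", "2024-12-31", freq="D"))

print(gaps)
print(duplicate_rows, expected_days)
```

On the real `df`, zero gaps and zero duplicated `(country, date)` pairs would confirm complete daily coverage for all 20 countries.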

In [24]:
# Define the country-to-region dictionary
region_mapping = {
    #Africa
    "South Africa": "Africa",

    #Europe
    "Germany": "Europe",
    "France": "Europe",
    "Netherlands": "Europe",
    "Italy": "Europe",
    "Spain": "Europe",
    "Sweden": "Europe",
    "Norway": "Europe",
    "Poland": "Europe",
    "United Kingdom": "Europe",

    #Asia
    "Turkey": "Asia",
    "India": "Asia",
    "China": "Asia",
    "Japan": "Asia",
    "Indonesia": "Asia",

    #North America
    "Canada": "North America",
    "United States": "North America",
    "Mexico": "North America",

    #South America
    "Brazil": "South America",

    #Oceania
    "Australia": "Oceania"
}

In [25]:
# Apply the region mapping
df["region"] = df["country"].map(region_mapping)

In [26]:
# Check that every country received a region (expect 0)
df["region"].isna().sum()

np.int64(0)
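A count of zero confirms that every country name matched a key in `region_mapping`. If the count were non-zero, it would help to see which names failed, since `Series.map` silently returns `NaN` for keys missing from the dictionary. The sketch below shows one way to list them; `region_mapping_demo` and `sample` are made-up stand-ins for the real mapping and data.

```python
import pandas as pd

# Deliberately incomplete stand-in mapping: "Chile" has no entry.
region_mapping_demo = {"Germany": "Europe", "Brazil": "South America"}

sample = pd.DataFrame({"country": ["Germany", "Brazil", "Chile", "Chile"]})
sample["region"] = sample["country"].map(region_mapping_demo)

# Country names that did not match any key in the mapping.
unmapped = sorted(sample.loc[sample["region"].isna(), "country"].unique())
print(unmapped)
```

A typo or a trailing space in a country name would surface here as an unmapped entry.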

In [27]:
df["region"].value_counts()

region
Europe           16443
Asia              9135
North America     5481
South America     1827
Oceania           1827
Africa            1827
Name: count, dtype: int64

There are six regions. Europe has the most rows (16,443, from nine countries), while South America, Oceania, and Africa have the fewest (1,827 each) because each of them is represented by a single country.
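Because the row counts follow directly from how many countries each region contains, it can be useful to show both numbers side by side. The sketch below is one way to do that with a named aggregation; `sample` is made-up data and the column names `countries`, `rows`, and `rows_per_country` are illustrative choices.

```python
import pandas as pd

# Made-up stand-in data with two rows per country.
sample = pd.DataFrame({
    "country": ["Germany", "Germany", "France", "France", "Brazil", "Brazil"],
    "region": ["Europe"] * 4 + ["South America"] * 2,
})

coverage = sample.groupby("region").agg(
    countries=("country", "nunique"),  # distinct countries in the region
    rows=("country", "size"),          # total rows in the region
)
coverage["rows_per_country"] = coverage["rows"] / coverage["countries"]

print(coverage)
```

On the real `df`, `rows_per_country` should come out identical across regions, making it clear that the imbalance is in country coverage rather than in the length of each country's series.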

In [28]:
df.head()

|   | date | country | avg_temperature | humidity | co2_emission | energy_consumption | renewable_share | urban_population | industrial_activity_index | energy_price | region |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-01 | Germany | 28.29 | 31.08 | 212.63 | 11348.75 | 14.42 | 76.39 | 51.22 | 83.93 | Europe |
| 1 | 2020-01-02 | Germany | 28.38 | 37.94 | 606.05 | 4166.64 | 5.63 | 86.26 | 78.27 | 110.4 | Europe |
| 2 | 2020-01-03 | Germany | 28.74 | 57.67 | 268.72 | 4503.8 | 14.2 | 75.92 | 48.96 | 173.58 | Europe |
| 3 | 2020-01-04 | Germany | 26.66 | 51.34 | 167.32 | 3259.13 | 13.84 | 63.15 | 97.42 | 89.13 | Europe |
| 4 | 2020-01-05 | Germany | 26.81 | 65.38 | 393.89 | 7023.72 | 6.93 | 76.02 | 81.89 | 40.6 | Europe |


The `region` column has been added as the last column.

### Summary

In this notebook, I:
- Loaded and inspected the dataset
- Validated time span and country coverage
- Imposed a continent-level regional structure

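The `region` column lets the rest of the analysis group and compare climate, energy, and industrial indicators by continent, keeping in mind that Africa, Oceania, and South America are each represented by a single country.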