# Change working directory
* The notebooks are assumed to be stored in a subfolder, so when you run a notebook in the editor you need to change the working directory.

We need to change the working directory from its current folder to its parent folder
* We access the current directory with os.getcwd()

In [2]:
import os
current_dir = os.getcwd()
current_dir

'c:\\Users\\mukti\\Desktop\\codeInstitute\\GlobalEcoInsights\\GlobalEcoInsights2000-2024\\jupyter_notebooks'

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() defines the new current directory

In [3]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [4]:
current_dir = os.getcwd()
current_dir

'c:\\Users\\mukti\\Desktop\\codeInstitute\\GlobalEcoInsights\\GlobalEcoInsights2000-2024'
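
Running the cells above again in sequence moves the working directory up one more level each time. A minimal sketch of a guard against that, assuming the notebooks live in a folder named `jupyter_notebooks` as in the path shown above; the helper name `move_to_parent_if_in` is hypothetical, and the demo uses a temporary directory rather than the real project:

```python
import os
import tempfile

def move_to_parent_if_in(folder_name):
    """Change to the parent directory only when the cwd is named `folder_name`."""
    cwd = os.getcwd()
    if os.path.basename(cwd) == folder_name:
        os.chdir(os.path.dirname(cwd))
    return os.getcwd()

# Demo in a throwaway directory tree: <temp>/jupyter_notebooks
original_dir = os.getcwd()
with tempfile.TemporaryDirectory() as project_root:
    notebooks_dir = os.path.join(project_root, "jupyter_notebooks")
    os.mkdir(notebooks_dir)
    os.chdir(notebooks_dir)  # pretend the notebook started here

    first = move_to_parent_if_in("jupyter_notebooks")   # moves up one level
    second = move_to_parent_if_in("jupyter_notebooks")  # no-op: already in the parent

    first_is_root = os.path.samefile(first, project_root)
    second_is_root = os.path.samefile(second, project_root)

    # Restore the original directory before the temporary tree is deleted
    os.chdir(original_dir)

print(first_is_root, second_is_root)
```

In the notebook itself, a single call to `move_to_parent_if_in("jupyter_notebooks")` could stand in for the `os.chdir(...)` cell.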

# Section 3

### Build a Predictive Model using Regression Analysis

* Use linear regression models to predict:
   * CO2 emissions per capita based on population.
   * Sea level rise based on CO2 emissions and population.
* Evaluate model performance:
   * Train-test split, MSE/RMSE, and R² score for accuracy.

In [5]:
# Import the required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset
file_path = "temperature_cleaned.csv"
df = pd.read_csv(file_path)

# Predict CO2 Emissions based on Population
def predict_co2_emissions(data):
    X = data[['Population']]
    y = data['CO2_Emissions_tons_per_capita']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("CO2 Emissions Prediction:")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R2 Score:", r2_score(y_test, y_pred))

# Predict Sea Level Rise based on Population and CO2 Emissions
def predict_sea_level_rise(data):
    X = data[['Population', 'CO2_Emissions_tons_per_capita']]
    y = data['Sea_Level_Rise_mm']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("\nSea Level Rise Prediction:")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R2 Score:", r2_score(y_test, y_pred))

# Run predictions
predict_co2_emissions(df)
predict_sea_level_rise(df)

CO2 Emissions Prediction:
Mean Squared Error: 32.75128285255626
R2 Score: -0.004314668947038802

Sea Level Rise Prediction:
Mean Squared Error: 1.2671316370634944
R2 Score: -0.025404975338411795


#### Key Insights
* CO₂ Emissions Prediction:
   * MSE (Mean Squared Error): 32.75
   * R² Score: -0.0043 (slightly negative, so the model does no better than always predicting the mean; poor predictive power)
* Sea Level Rise Prediction:
   * MSE: 1.27
   * R² Score: -0.0254 (also poor predictive power)
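
To help interpret these numbers, the sketch below converts the reported MSE values to RMSE (which is in the same units as the target, taken here from the column names to be tons per capita and mm) and illustrates why R² can be negative. The `r2` helper is a hand-written version of the standard formula R² = 1 - SS_res / SS_tot, and the `y_true` values are made up for illustration only.

```python
import math

# MSE values copied from the notebook output above
mse_co2 = 32.75128285255626
mse_sea_level = 1.2671316370634944

# RMSE is the square root of MSE, expressed in the target's own units
rmse_co2 = math.sqrt(mse_co2)
rmse_sea_level = math.sqrt(mse_sea_level)
print(f"RMSE (CO2 emissions): {rmse_co2:.2f}")
print(f"RMSE (sea level rise): {rmse_sea_level:.2f}")

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up target values (not from the dataset)
y_true = [2.0, 4.0, 6.0, 8.0]

# Always predicting the mean of y_true gives R² of exactly zero
r2_mean = r2(y_true, [5.0] * 4)

# A constant prediction slightly off the mean gives a negative R²
r2_off = r2(y_true, [5.5] * 4)

print("R2, mean baseline:", r2_mean)
print("R2, off-mean constant:", r2_off)
```

An R² just below zero, as in the results above, therefore means the fitted line is roughly equivalent to a constant prediction on the test set.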

### Improved Linear Regression Analysis incorporating additional variables
The slightly negative R² values suggest that population is not a strong linear predictor of CO₂ emissions per capita, and that population and CO₂ emissions together are not strong linear predictors of sea level rise.
Other factors like industrialization, energy sources, and geographical factors likely have a stronger impact.
We could try to improve the model by incorporating additional variables such as renewable energy use, forest area, and extreme weather events.

In [6]:
# import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset
file_path = "temperature_cleaned.csv"
df = pd.read_csv(file_path)

# Predict CO2 Emissions based on multiple features
def predict_co2_emissions(data):
    X = data[['Population', 'Avg_Temperature_degC', 'Renewable_Energy_pct', 'Extreme_Weather_Events', 'Forest_Area_pct']]
    y = data['CO2_Emissions_tons_per_capita']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("CO2 Emissions Prediction:")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R2 Score:", r2_score(y_test, y_pred))

# Predict Sea Level Rise based on multiple features
def predict_sea_level_rise(data):
    X = data[['Population', 'CO2_Emissions_tons_per_capita', 'Avg_Temperature_degC', 'Renewable_Energy_pct', 'Extreme_Weather_Events', 'Forest_Area_pct']]
    y = data['Sea_Level_Rise_mm']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("\nSea Level Rise Prediction:")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R2 Score:", r2_score(y_test, y_pred))

# Run predictions
predict_co2_emissions(df)
predict_sea_level_rise(df)

CO2 Emissions Prediction:
Mean Squared Error: 32.70480410002796
R2 Score: -0.00288940285383954

Sea Level Rise Prediction:
Mean Squared Error: 1.265130061284075
R2 Score: -0.02378523378773134


#### Key Insights for Improved Regression Analysis Model

* Predictive Power -
Adding average temperature, renewable energy percentage, extreme weather events, and forest area percentage barely changed the results. Both R² scores remain slightly negative, so the linear models still do no better than always predicting the mean.

* CO₂ Emissions Prediction -
MSE moved from 32.75 to 32.70 and R² from -0.0043 to -0.0029. The model now uses several factors beyond population, but these features carry little linear signal for CO₂ emissions per capita in this dataset.

* Sea Level Rise Prediction -
MSE moved from 1.2671 to 1.2651 and R² from -0.0254 to -0.0238. Emissions, temperature rise, and sea level changes are interconnected, but including them as predictors did not yield a usable linear model here.

* Model Evaluation -
Mean Squared Error (MSE) and R² Score measure how well the model performs. A lower MSE and a higher R² indicate a more accurate model; an R² at or below zero means the model explains none of the variance in the test data.
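
Comparing feature sets side by side makes it easier to see whether extra variables actually help. The following is a minimal sketch under stated assumptions: the helper `fit_and_evaluate` and the synthetic columns `x1`, `x2`, and `y` are hypothetical and stand in for the real dataset, while the split settings (`test_size=0.2`, `random_state=42`) match the cells above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def fit_and_evaluate(data, features, target, test_size=0.2, random_state=42):
    """Fit a linear regression on `features` and return test-set metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        data[features], data[target],
        test_size=test_size, random_state=random_state,
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    return {
        "n_features": len(features),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_test, y_pred),
    }

# Synthetic stand-in data: y depends on x1 only, x2 is pure noise
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
demo["y"] = 3.0 * demo["x1"] + rng.normal(scale=0.1, size=200)

# Same split for both runs, so the metrics are directly comparable
results = pd.DataFrame(
    [
        fit_and_evaluate(demo, ["x2"], "y"),
        fit_and_evaluate(demo, ["x1", "x2"], "y"),
    ],
    index=["x2 only", "x1 + x2"],
)
print(results)
```

With the real data, the same helper could be called once with `['Population']` and once with the longer feature list, using `'CO2_Emissions_tons_per_capita'` or `'Sea_Level_Rise_mm'` as the target.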