<p style="text-align:center">
    <a href="https://skills.network/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkML0101ENSkillsNetwork20718538-2022-01-01">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo"  />
    </a>
</p>

<h1 align="center"><font size="5">Supervised Machine Learning: Regression - Final Assignment</font></h1>


## Instructions:

In this assignment, you will demonstrate the regression skills you have learned in this course. You are expected to leverage a wide variety of tools, but the report should focus on presenting findings, insights, and next steps. You may include some visuals from your code output, but the report is intended as a summary of your findings, not a code review.

The grading will center around 5 main points:

1. Does the report include a section describing the data?
2. Does the report include a paragraph detailing the main objective(s) of this analysis? 
3. Does the report include a section with variations of linear regression models and specify which one best suits the main objective(s) of this analysis?
4. Does the report include a clear and well-presented section with key findings related to the main objective(s) of the analysis?
5. Does the report highlight possible flaws in the model and a plan of action to revisit this analysis with additional data or different predictive modeling techniques? 




## Import the required libraries


This analysis fits linear regression models to the World Happiness Report dataset. To get started, import the libraries used for data manipulation, plotting, and modeling: NumPy, pandas, Matplotlib, seaborn, and scikit-learn.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score


pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

## Importing the Dataset


Load the World Happiness Report dataset. 

This dataset contains information about the happiness levels of various countries, along with factors that influence happiness, such as GDP per capita, social support, and life expectancy. We'll use this data to perform a linear regression analysis.

In [None]:
data = pd.read_csv('datasets/world-happiness-report.csv')
data.head()

# 1. About the Data


The World Happiness Report dataset contains the following attributes:
- **Country name:** Name of the country
- **Year:** Year of the happiness measurement
- **Life Ladder:** A score indicating the happiness level of the country
- **Log GDP per capita:** Logarithm of GDP per capita
- **Social support:** The perceived level of social support in the country
- **Healthy life expectancy at birth:** The average number of years a newborn is expected to live in good health
- **Freedom to make life choices:** The degree of freedom to make life choices
- **Generosity:** The generosity of the population
- **Perceptions of corruption:** The perceived level of corruption in the country
- **Positive affect:** A measure of positive emotions
- **Negative affect:** A measure of negative emotions

In [None]:
data.shape

In [None]:
data.dtypes

In [None]:
# Count missing values per column
data.isnull().sum()

In [None]:
# Handle missing data: drop rows that are missing any of the predictor columns
data.dropna(subset=['Log GDP per capita', 'Social support', 'Healthy life expectancy at birth',
                   'Freedom to make life choices', 'Generosity', 'Perceptions of corruption', 'Positive affect', 'Negative affect'], inplace=True)

# Confirm that no missing values remain in those columns
data.isnull().sum()
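
Before dropping rows, it can help to check how much data the filter removes. The following is a minimal sketch on a made-up toy frame (the `toy` DataFrame and its values are assumptions for illustration, not rows from the World Happiness Report):

```python
import numpy as np
import pandas as pd

# Toy frame (assumption) with a few missing values, for illustration only
toy = pd.DataFrame({
    "Life Ladder": [5.1, 6.2, 4.8, 7.0],
    "Log GDP per capita": [9.1, np.nan, 8.2, 10.5],
    "Social support": [0.8, 0.9, np.nan, 0.95],
})

rows_before = len(toy)
# Drop rows that are missing a value in any of the listed columns
toy_clean = toy.dropna(subset=["Log GDP per capita", "Social support"])
rows_after = len(toy_clean)

print(f"Dropped {rows_before - rows_after} of {rows_before} rows")
```

If a large share of rows disappears, imputing values might be worth considering instead of dropping.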

# 2. Objectives


The main objective of this analysis is to build a linear regression model that predicts the happiness score (the Life Ladder column) from other attributes in the dataset, starting with Log GDP per capita.

We focus on both predictive accuracy and the interpretation of the fitted model.

# 3. Linear Regression Models


We will explore three linear regression models using the specified columns:

1. Simple Linear Regression (baseline): Predicting Life Ladder using a single feature (e.g., Log GDP per capita).
2. Polynomial Regression: Extending the model with polynomial features to capture nonlinear relationships.
3. Regularized Regression (Ridge): Adding an L2 penalty to mitigate overfitting.
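
As a rough sketch, the three models can be written as follows, where $x$ denotes Log GDP per capita and $\hat{y}$ the predicted Life Ladder. The symbols $\beta_j$ and $\alpha$ are notation introduced here for illustration. The degree of 2 and the Ridge form match the `PolynomialFeatures(degree=2)` and `Ridge(alpha=1.0)` calls used in the cells that follow.

Simple linear regression:

$$\hat{y} = \beta_0 + \beta_1 x$$

Polynomial regression of degree 2:

$$\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2$$

Ridge regression fits the same line as the simple model (on standardized $x$), but chooses the coefficients by minimizing a penalized sum of squares in which the intercept is not penalized:

$$\min_{\beta_0,\,\beta_1} \; \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2 + \alpha\,\beta_1^2$$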


#### Simple Linear Regression

In [None]:
X = data[['Log GDP per capita']]
y = data['Life Ladder'] #happiness score
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

simple_model = LinearRegression()
simple_model.fit(X_train, y_train)

In [None]:
# Evaluate the Simple Linear Regression model
y_pred_simple = simple_model.predict(X_test)
mse_simple = mean_squared_error(y_test, y_pred_simple)
mae_simple = mean_absolute_error(y_test, y_pred_simple)
r2_simple = r2_score(y_test, y_pred_simple)

#### Polynomial Regression

In [None]:
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
X_train_poly, X_test_poly, y_train, y_test = train_test_split(X_poly, y, test_size=0.2, random_state=42)

poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)

In [None]:
# Evaluate the Polynomial Regression model
y_pred_poly = poly_model.predict(X_test_poly)
mse_poly = mean_squared_error(y_test, y_pred_poly)
mae_poly = mean_absolute_error(y_test, y_pred_poly)
r2_poly = r2_score(y_test, y_pred_poly)

#### Regularized Regression (Ridge)

In [None]:
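# Split first, then fit the scaler on the training data only,
# so that test-set statistics do not leak into preprocessing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train_scaled, y_train)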

In [None]:
# Evaluate the Ridge Regression model on the scaled test features
y_pred_ridge = ridge_model.predict(X_test_scaled)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
mae_ridge = mean_absolute_error(y_test, y_pred_ridge)
r2_ridge = r2_score(y_test, y_pred_ridge)
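
One way to keep scaling and the estimator in step is scikit-learn's `make_pipeline`, which fits the scaler on the training split only and reuses it at prediction time. The sketch below runs on synthetic data (the arrays `X_demo` and `y_demo` are made up for illustration and are not the World Happiness Report), so the printed score says nothing about the real dataset:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): one feature with a noisy linear trend
rng = np.random.default_rng(42)
X_demo = rng.uniform(6.0, 12.0, size=(200, 1))
y_demo = 0.7 * X_demo[:, 0] - 1.0 + rng.normal(0.0, 0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# fit() scales using training statistics only, then fits Ridge
ridge_pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
ridge_pipe.fit(X_tr, y_tr)

# predict() applies the already-fitted scaler to the test features
r2_demo = r2_score(y_te, ridge_pipe.predict(X_te))
print(f"R² on the synthetic test split: {r2_demo:.4f}")
```

Because the pipeline accepts unscaled features directly, there is no separate scaled test array to mix up with the raw one.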

# 4. Insights and key findings


In [None]:
print("Simple Linear Regression:")
print(f"MSE: {mse_simple:.4f}")
print(f"MAE: {mae_simple:.4f}")
print(f"R²: {r2_simple:.4f}")

print("\nPolynomial Regression:")
print(f"MSE: {mse_poly:.4f}")
print(f"MAE: {mae_poly:.4f}")
print(f"R²: {r2_poly:.4f}")

print("\nRidge Regression:")
print(f"MSE: {mse_ridge:.4f}")
print(f"MAE: {mae_ridge:.4f}")
print(f"R²: {r2_ridge:.4f}")
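
For a side-by-side comparison, the same three metrics can be collected into a single table. The sketch below uses a hypothetical helper, `summarize_metrics`, and toy prediction arrays (both are assumptions for illustration, not results from this analysis):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def summarize_metrics(y_true, predictions):
    """Return one row of MSE, MAE, and R² per model name (hypothetical helper)."""
    rows = {
        name: {
            "MSE": mean_squared_error(y_true, y_pred),
            "MAE": mean_absolute_error(y_true, y_pred),
            "R2": r2_score(y_true, y_pred),
        }
        for name, y_pred in predictions.items()
    }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy values (assumption) that only show the shape of the output
y_true_demo = np.array([3.0, 4.0, 5.0, 6.0])
preds_demo = {
    "Model A": np.array([3.0, 4.0, 5.0, 6.0]),  # perfect predictions
    "Model B": np.array([4.0, 5.0, 6.0, 7.0]),  # off by one everywhere
}

metrics_demo = summarize_metrics(y_true_demo, preds_demo)
print(metrics_demo.round(4))
```

In this notebook, the helper could be called with `y_test` and a dictionary such as `{"Simple": y_pred_simple, "Polynomial": y_pred_poly, "Ridge": y_pred_ridge}`.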

#### Model Recommendation
Based on the metrics above, we recommend the Polynomial Regression model as the final model: in addition to accurate predictions, it can capture a potentially nonlinear relationship between Log GDP per capita and Life Ladder. Keep in mind, however, that all three models use a single predictor, and other factors in the dataset also influence happiness.

In [None]:
# Calculate the correlation
correlation = data['Log GDP per capita'].corr(data['Life Ladder'])

# Create a scatter plot with a regression line
sns.regplot(x='Log GDP per capita', y='Life Ladder', data=data, ci=None)
plt.title(f'Log GDP per capita vs. Life Ladder (Correlation: {correlation:.2f})')
plt.xlabel('Log GDP per capita')
plt.ylabel('Life Ladder')
plt.show()

#### Key Findings
* Log GDP per capita is positively correlated with Life Ladder (the correlation coefficient appears in the plot title above).
* Nonlinear relationships between Log GDP per capita and Life Ladder can be captured using polynomial regression.
* Regularization adds little in this case: with a single feature, there is limited risk of overfitting for Ridge to mitigate.

# 5. Next Steps


To further improve the analysis, we can consider the following:

- Explore additional features: Include more attributes from the dataset to build a more comprehensive model.
- Feature engineering: Create new features or interactions between features to better explain happiness levels.
- Model evaluation: Use cross-validation to ensure the model's robustness.
- External data: Consider incorporating external data like political stability or climate conditions that might affect happiness.
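
The first and third suggestions can be combined in one rough sketch: a multi-feature linear model scored with k-fold cross-validation. The data here is synthetic (the arrays `X_demo` and `y_demo` and the coefficients that generate them are assumptions for illustration), so the scores say nothing about the real dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): three features, linear target plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = X_demo @ np.array([1.0, 0.5, -0.3]) + rng.normal(0.0, 0.5, size=300)

# Scaling lives inside the pipeline, so it is refit on each training fold
model = make_pipeline(StandardScaler(), LinearRegression())
cv = KFold(n_splits=5, shuffle=True, random_state=42)

cv_r2 = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="r2")
print(f"R² per fold: {np.round(cv_r2, 3)}")
print(f"Mean R²: {cv_r2.mean():.3f} (std {cv_r2.std():.3f})")
```

Since the dataset contains several years per country, rows from the same country are not independent. A grouped splitter such as `GroupKFold` keyed on `Country name` may give a more honest estimate than plain `KFold`.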

## <h3 align="center"> © IBM Corporation 2020. All rights reserved. </h3>
