# Regularization
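Regularization deliberately accepts a little more **bias** (the model fits the training data slightly worse) in exchange for lower **variance** (its predictions change less from one dataset to another). In other words: **less overfitting** and **better generalization**. It works by adding a penalty on the size of the coefficients to the loss function.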
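
To make the bias/variance trade concrete, here is a minimal sketch on synthetic data. The data, the seed and `alpha=1.0` are illustrative assumptions, not taken from the happiness dataset. With two nearly identical features, plain least squares can split the weight into large, unstable coefficients. Ridge (described below) pulls them toward smaller values at the price of a slightly worse fit on the training data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)      # almost a copy of x1 (strong collinearity)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.5 * rng.normal(size=n)   # true coefficients are [1, 1]

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# More bias: Ridge can never fit the training data better than least squares...
ols_train_mse = mean_squared_error(y, ols.predict(X))
ridge_train_mse = mean_squared_error(y, ridge.predict(X))
# ...less variance: its coefficient vector is shrunk toward 0
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)

print("OLS coefficients:  ", ols.coef_, "train MSE:", ols_train_mse)
print("Ridge coefficients:", ridge.coef_, "train MSE:", ridge_train_mse)
```

Rerunning with a different seed typically changes the OLS coefficients a lot, while the Ridge coefficients stay close to each other.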

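- **Lasso** (L1 penalty): penalizes the sum of the absolute values of the coefficients. This can shrink the coefficients of less important features exactly to **0**, effectively removing those features.
- **Ridge** (L2 penalty): penalizes the sum of the squared coefficients. This shrinks them toward 0 but **rarely makes them exactly 0**, so every feature keeps some, possibly **low**, impact on the prediction.
- **ElasticNet**: combines the Lasso (L1) and Ridge (L2) penalties, with a mixing parameter that sets how much of each is used.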
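
For reference, these are the objectives that scikit-learn's `Lasso`, `Ridge` and `ElasticNet` minimize according to its documentation. Here $w$ are the coefficients, $n$ is the number of samples, $\alpha$ is the `alpha` parameter (penalty strength) and $\rho$ is ElasticNet's `l1_ratio`. The intercept is not penalized.

$$
\begin{aligned}
\text{Lasso:}\quad & \min_{w}\ \frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1 \\
\text{Ridge:}\quad & \min_{w}\ \lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2 \\
\text{ElasticNet:}\quad & \min_{w}\ \frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\rho\lVert w\rVert_1 + \frac{\alpha(1-\rho)}{2}\lVert w\rVert_2^2
\end{aligned}
$$

Lasso divides the squared error by $2n$ and Ridge does not, so the same `alpha` is not directly comparable between the two. At equal `alpha`, the Lasso penalty is much stronger relative to the fit, which may partly explain why Lasso with `alpha=0.1` changes the results in this notebook far more than Ridge with `alpha=0.1`.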

**Advice**:  
- If you have **many important features**, use **Ridge**.  
- If only a few features matter, use **Lasso**, which can drop the rest.
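
The advice above follows from how the two penalties treat irrelevant features. Here is a minimal sketch on synthetic data, where the data, the seed and `alpha=0.2` are illustrative assumptions. Only the first two of six features drive the target. Lasso sets the four irrelevant coefficients exactly to 0, while Ridge only shrinks them and leaves small nonzero values.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
# Only features 0 and 1 matter; features 2-5 are pure noise
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.5 * rng.normal(size=200)

lasso = Lasso(alpha=0.2).fit(X, y)
ridge = Ridge(alpha=0.2).fit(X, y)

print("Lasso:", np.round(lasso.coef_, 3))  # irrelevant features end up exactly at 0
print("Ridge:", np.round(ridge.coef_, 3))  # irrelevant features are small but nonzero
```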


### Explanation of the dataset features

- **gdp**: Gross Domestic Product per capita. Represents the average economic wealth of a country's inhabitants.
- **family**: Social support or the ability to rely on family or friends in times of need.
- **lifexp**: Life expectancy at birth. Indicates the average health and longevity of the population.
- **freedom**: Freedom to make life choices.
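- **generosity**: Generosity of the population, measured from survey answers about recent donations to charity.
- **corruption**: Perception of corruption in government and institutions.
- **dystopia**: Dystopia residual. A reference value for a hypothetical country ("Dystopia") with the worst value of every indicator, plus the part of a country's score the other features leave unexplained.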

In [12]:
import pandas as pd
import sklearn
import matplotlib.pyplot as plt

from sklearn.decomposition import PCA, IncrementalPCA
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

path="/home/dan/PLATZI/data/SKlearn_Orchestration/skLearn_Orchestration/data/third_part/"
file="felicidad.csv"
df=pd.read_csv(path+file)
features_df=df.drop(["country","rank", "high","low","score"], axis=1)
target=df["score"]
features_df.shape
features_df.sample(4)

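|     | gdp      | family   | lifexp   | freedom  | generosity | corruption | dystopia |
|----:|---------:|---------:|---------:|---------:|-----------:|-----------:|---------:|
| 27  | 1.21756  | 1.412228 | 0.719217 | 0.579392 | 0.175097   | 0.178062   | 2.17241  |
| 141 | 1.122094 | 1.221555 | 0.341756 | 0.505196 | 0.099348   | 0.098583   | 0.377914 |
| 80  | 0.995539 | 1.274445 | 0.492346 | 0.443323 | 0.611705   | 0.015317   | 1.429477 |
| 123 | 0.808964 | 0.832044 | 0.289957 | 0.435026 | 0.120852   | 0.079618   | 1.724136 |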


In [13]:
# Split: 75% train / 25% test (the features are used unscaled)
X_train, X_test, y_train, y_test = train_test_split(features_df, target, test_size=0.25, random_state=37)

# Baseline without regularization
model = LinearRegression().fit(X_train, y_train)
linear_pred = model.predict(X_test)

# L1 penalty (Lasso) and L2 penalty (Ridge); alpha sets the penalty strength
lasso = Lasso(alpha=0.1).fit(X_train, y_train)
lasso_pred = lasso.predict(X_test)
ridge = Ridge(alpha=0.1).fit(X_train, y_train)
ridge_pred = ridge.predict(X_test)


In [14]:
# Mean squared error on the test split (lower is better)
linear_loss = mean_squared_error(y_test, linear_pred)
lasso_loss = mean_squared_error(y_test, lasso_pred)
ridge_loss = mean_squared_error(y_test, ridge_pred)
print("Linear Regression Loss:", linear_loss)
print("Lasso Regression Loss:", lasso_loss)
print("Ridge Regression Loss:", ridge_loss)

Linear Regression Loss: 9.581675773302601e-08
Lasso Regression Loss: 0.44512158491881276
Ridge Regression Loss: 0.00010388580545959077


In [16]:
print("Linear Regression Coefficients:", model.coef_)
print("Lasso Coefficients:", lasso.coef_)
print("Ridge Coefficients:", ridge.coef_)

Linear Regression Coefficients: [1.00023339 0.99979398 0.99978691 1.00002994 1.0001342  0.9999504
 0.99992823]
Lasso Coefficients: [1.4639291  0.         0.         0.         0.         0.
 0.54695934]
Ridge Coefficients: [1.01402108 0.99224837 0.97737508 0.99999083 0.9688058  0.93750682
 0.99338599]
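
In this run the plain linear model has the lowest test loss, and all its coefficients are close to 1. That suggests `score` is essentially the sum of the seven feature columns, consistent with `dystopia` acting as a residual term, so any penalty here mostly adds bias. Note also that `alpha` was picked by hand and no feature scaling was applied before fitting.

A common alternative is to standardize the features inside a pipeline and let cross-validation choose `alpha`, using scikit-learn's `LassoCV` and `RidgeCV`. The minimal sketch below does this on a synthetic stand-in from `make_regression`, since the CSV is not reproduced here. The feature count, noise level and alpha grid are illustrative assumptions. On the real data you would pass the training split from the notebook instead.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the happiness features: 7 features, 3 of them informative
X, y = make_regression(n_samples=150, n_features=7, n_informative=3, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=37)

# The scaler lives inside the pipeline, so it is fit on the training split only
lasso_cv = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0)).fit(X_train, y_train)
ridge_cv = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13))).fit(X_train, y_train)

for name, pipe in [("Lasso", lasso_cv), ("Ridge", ridge_cv)]:
    reg = pipe[-1]  # last pipeline step: the fitted LassoCV / RidgeCV model
    mse = mean_squared_error(y_test, pipe.predict(X_test))
    print(f"{name}: alpha={reg.alpha_:.4g}, test MSE={mse:.3f}, coef={np.round(reg.coef_, 3)}")
```

The coefficients printed here live in the standardized feature space. They are comparable across features, but not directly comparable with the unscaled coefficients above.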
