# **Regression Model Metrics Overview**

## 1. **Mean Absolute Error** (MAE)
**Definition:**
The average absolute difference between the predicted and actual values.
It is expressed in the same units as the target variable, so it can be read directly as a typical prediction error.

$$ MAE = \frac{1}{n} \sum_{i=1}^{n} | \hat{y}_i - y_i | $$

**When to Use:**
Use MAE when every unit of error should count equally, so that large errors receive no extra penalty beyond their size.
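
As a minimal sketch of the formula above, MAE can be computed by hand in plain Python. The helper name `mean_absolute_error_manual` and the toy values are assumptions made for illustration; they do not come from the housing data used later.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Return the mean of |y_hat_i - y_i| (hypothetical helper for illustration)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values chosen for illustration only.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Absolute errors are 0.5, 0.0, 1.5 and 1.0, so the mean is 0.75.
mae_toy = mean_absolute_error_manual(y_true, y_pred)
print("MAE:", mae_toy)
```

On real data, `sklearn.metrics.mean_absolute_error(y_true, y_pred)` should return the same value as this hand-written version.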


## 2. **Mean Squared Error** (MSE)
**Definition:**
The average of the squared differences between predicted and actual values.
Penalizes large errors more than small ones.

$$ MSE = \frac{1}{n} \sum_{i=1}^{n} ( \hat{y}_i - y_i )^2 $$

**When to Use:**
Use MSE when large errors should be penalized more heavily.
Because the errors are squared, a few outliers can dominate the score.
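
The difference from MAE is easiest to see on two sets of predictions with the same total absolute error. The sketch below uses hypothetical helpers (`mae_manual`, `mse_manual`) and made-up values: one set is slightly off everywhere, the other is exact except for a single large miss.

```python
def mae_manual(y_true, y_pred):
    """Mean of absolute errors (hypothetical helper)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse_manual(y_true, y_pred):
    """Mean of squared errors (hypothetical helper)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values chosen for illustration only.
y_true = [10.0, 10.0, 10.0, 10.0]
pred_even = [12.0, 12.0, 12.0, 12.0]   # every prediction is off by 2
pred_spiky = [10.0, 10.0, 10.0, 18.0]  # one prediction is off by 8

# Both sets have an MAE of 2.0 ...
print("MAE:", mae_manual(y_true, pred_even), mae_manual(y_true, pred_spiky))
# ... but the single large miss raises the MSE from 4.0 to 16.0.
print("MSE:", mse_manual(y_true, pred_even), mse_manual(y_true, pred_spiky))
```

MAE cannot tell the two cases apart, while MSE scores the single large miss four times worse.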


## 3. **R-Squared** ($R^2$) or Coefficient of Determination
**Definition:**
The proportion of the variability of the target variable that the model explains.
The best possible value is 1 (closer to 1 is better). Scores usually fall between 0 and 1, but $R^2$ can be negative.

$$ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} $$

**Where:**
 - $$ SS_{res} = \sum_{i=1}^{n} ( \hat{y}_i - y_i )^2 $$ (Residual Sum of Squares)
 - $$ SS_{tot} = \sum_{i=1}^{n} ( y_i - \bar{y} )^2 $$ (Total Sum of Squares)

**When to Use:**
Use $R^2$ to measure how much of the variance in the data is captured by the model.
A negative $R^2$ indicates the model performs worse than simply predicting the mean.
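
To make the three regimes concrete (close to 1, exactly 0, and negative), here is a minimal sketch that evaluates $1 - SS_{res}/SS_{tot}$ directly. The helper `r2_manual` and the toy values are assumptions for illustration; the helper is undefined when all true values are identical, because $SS_{tot}$ is then 0.

```python
def r2_manual(y_true, y_pred):
    """Return 1 - SS_res / SS_tot (hypothetical helper for illustration)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values chosen for illustration only: the mean is 3.0 and SS_tot is 10.0.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

r2_close = r2_manual(y_true, [1.1, 1.9, 3.2, 3.9, 5.1])     # small residuals, about 0.992
r2_mean = r2_manual(y_true, [3.0] * 5)                       # always predict the mean, 0.0
r2_reversed = r2_manual(y_true, [5.0, 4.0, 3.0, 2.0, 1.0])  # worse than the mean, -3.0

print(r2_close, r2_mean, r2_reversed)
```

Predicting the mean gives $SS_{res} = SS_{tot}$ and therefore a score of 0, which is why any negative score means the model is worse than that baseline.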


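```python
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Load the California housing data into a DataFrame.
california_data = fetch_california_housing()
df = pd.DataFrame(data=california_data.data, columns=california_data.feature_names)
df["Price"] = california_data.target  # median house value, in units of 100,000 dollars

# Separate the features from the target.
X = df.drop("Price", axis=1)
y = df["Price"]

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a random forest with default settings (unseeded, so results vary slightly between runs).
model = RandomForestRegressor()
model.fit(X_train, y_train)
```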

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Evaluate the fitted model on the held-out test set.
y_pred = model.predict(X_test)

r_score = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print("Random Forest Regression Metrics")
print("R2 Score: ", r_score)
print("MAE: ", mae)
print("MSE: ", mse)
```

Output of one run (the MAE was not recorded; exact values vary between runs because the forest is unseeded):

```
Random Forest Regression Metrics
R2 Score:  0.8078519933582717
MSE:  0.2517924730544127
```


```python
# Variance and standard deviation of the target, for comparison with MSE and RMSE.
df["Price"].var(), df["Price"].std()
```

```
(1.3316148163035277, 1.1539561587441387)
```

## 4. What Is a **"Good"** MSE/RMSE Score?
RMSE is the square root of MSE, so it is expressed in the same units as the target variable:

$$ RMSE = \sqrt{MSE} $$

**General Rules of Thumb:**
- **Less than the variance:**
  - $$ MSE < \text{Variance of } y $$
  - $$ RMSE < \text{Standard Deviation of } y $$
  - This suggests that the model performs better than the baseline of always predicting the mean of $y$.
- **Near zero:**
  - Lower values of MSE or RMSE indicate better model performance.
  - A perfect score is 0; how close to 0 is achievable depends on the problem.
- **Relative to the scale of the target variable:**
  - Evaluate the magnitude of RMSE in relation to the average value of the target variable.
  - For example, if the target variable has values around 1000, an RMSE of 50 might be good.
  - However, if the target variable values are between 0 and 1, the same RMSE would indicate very poor performance.
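
As a rough check of these rules, the sketch below applies them to the numbers recorded in the notebook run above. Note the assumptions: the variance and standard deviation were computed with pandas on the full `Price` column while the MSE comes from the test split only, so the comparison is approximate. The helper `relative_rmse` and the typical values 1000 and 0.5 are hypothetical, taken from the example scales in the text.

```python
import math

# Values recorded in the notebook run above.
mse_recorded = 0.2517924730544127  # test-set MSE of the random forest
var_price = 1.3316148163035277     # df["Price"].var()
std_price = 1.1539561587441387     # df["Price"].std()

# RMSE is the square root of MSE, roughly 0.50 in the target's units.
rmse_recorded = math.sqrt(mse_recorded)

# Rule 1: compare against the "always predict the mean" baseline.
beats_mean_by_mse = mse_recorded < var_price
beats_mean_by_rmse = rmse_recorded < std_price
print("RMSE:", rmse_recorded)
print("MSE < variance:", beats_mean_by_mse)
print("RMSE < std:", beats_mean_by_rmse)


def relative_rmse(rmse, typical_value):
    """RMSE as a fraction of a typical target value (hypothetical helper)."""
    return rmse / typical_value


# Rule 3: the same RMSE of 50 on two hypothetical target scales.
print(relative_rmse(50, 1000))  # 0.05, about 5% of a typical value
print(relative_rmse(50, 0.5))   # 100.0, far larger than the target itself
```

Both baseline comparisons hold for the recorded run, which is consistent with its positive $R^2$ score.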
