### Linear Regression in Machine Learning
Linear regression is a supervised learning algorithm used to predict a continuous target variable based on one or more independent variables. It assumes a linear relationship between the input features (independent variables) and the output (dependent variable).
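
The assumed linear relationship can be written out explicitly. The notation below (β for the parameters, p for the number of features) is chosen here for illustration and does not come from the notebook itself:

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p
```

In scikit-learn's `LinearRegression`, the fitted value of β₀ is exposed as `intercept_` and β₁ … βₚ as `coef_`. The parameters are chosen by ordinary least squares, that is, by minimizing the sum of squared differences between the predictions and the observed targets on the training data.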

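### Steps in Linear Regression

1. Import the libraries and load the data.
2. Select the features and the target, then split the data into training and test sets.
3. Scale the features where appropriate.
4. Train the model.
5. Evaluate the model on the test set.
6. Inspect the fitted coefficients.
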
## Example 1: Simple Linear Regression

In [7]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

In [51]:
data = {'Square_Feet': [1500, 1800, 2400, 3000, 3500],
        'Price': [300000, 360000, 500000, 600000, 650000]}
df=pd.DataFrame(data)
df

|   | Square_Feet | Price |
|---|-------------|-------|
| 0 | 1500 | 300000 |
| 1 | 1800 | 360000 |
| 2 | 2400 | 500000 |
| 3 | 3000 | 600000 |
| 4 | 3500 | 650000 |


In [55]:

# Features and Target
X = df[['Square_Feet']]  # Independent variable
y = df['Price']          # Dependent variable

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
print("Mean Squared Error:", mean_squared_error(y_test, predictions))
print("Model Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)


Mean Squared Error: 103178009.09017195
Model Coefficients: [177.92792793]
Model Intercept: 49887.387387387455
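
To make the printed numbers easier to interpret, the sketch below applies the fitted line by hand. The slope, intercept, and MSE are copied from the output above. That the single held-out row is the 1,800 sq ft house is an inference from those numbers, not something the notebook states, and `predict_price` is a helper name introduced here.

```python
import math

# Values copied from the printed output above.
slope = 177.92792793
intercept = 49887.387387387455
mse = 103178009.09017195


def predict_price(square_feet):
    """Apply the fitted line: price = intercept + slope * square_feet."""
    return intercept + slope * square_feet


# RMSE is in the same units as the target (price), unlike MSE.
rmse = math.sqrt(mse)

# Assumption: the test set holds only the 1,800 sq ft house (actual price 360,000).
predicted = predict_price(1800)
print(f"Predicted price for 1800 sq ft: {predicted:,.0f}")
print(f"Absolute error on that row:     {abs(predicted - 360000):,.0f}")
print(f"RMSE:                           {rmse:,.0f}")
```

With one test row, the RMSE equals the absolute error on that row, which is why an MSE around 10⁸ corresponds to a miss of roughly 10,000 on a 360,000 house. A single test sample is far too little to judge the model by.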


## Example 2: Multiple Linear Regression

In [65]:
# Sample data
data = {'Engine_Size': [1.8, 2.0, 2.4, 3.0, 3.6],
        'Horsepower': [140, 160, 200, 240, 300],
        'Weight': [2800, 3000, 3200, 3600, 4000],
        'Price': [22000, 25000, 28000, 32000, 36000]}
df=pd.DataFrame(data)
df

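|   | Engine_Size | Horsepower | Weight | Price |
|---|-------------|------------|--------|-------|
| 0 | 1.8 | 140 | 2800 | 22000 |
| 1 | 2.0 | 160 | 3000 | 25000 |
| 2 | 2.4 | 200 | 3200 | 28000 |
| 3 | 3.0 | 240 | 3600 | 32000 |
| 4 | 3.6 | 300 | 4000 | 36000 |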


In [75]:
# features and targets
X=df.drop('Price',axis=1)
y=df['Price']


In [77]:
X,y

(   Engine_Size  Horsepower  Weight
 0          1.8         140    2800
 1          2.0         160    3000
 2          2.4         200    3200
 3          3.0         240    3600
 4          3.6         300    4000,
 0    22000
 1    25000
 2    28000
 3    32000
 4    36000
 Name: Price, dtype: int64)

In [94]:

# Features and targets
X = df.drop('Price', axis=1)
y = df['Price']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training
model = LinearRegression()
model.fit(X_train, y_train)

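# Predict the price of a new car; a DataFrame keeps the feature names
# the model was fitted with and avoids scikit-learn's feature-name warning
new_data = pd.DataFrame([[2.5, 180, 3100]], columns=X.columns)
predicted = model.predict(new_data)
print(predicted)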


[26000.001875]




In [92]:
# Inspect the fitted parameters
print("Model Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)


Model Coefficients: [7.49998313e-03 5.00000000e+01 4.99998875e+00]
Model Intercept: 1500.017999959604
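
A prediction from a multiple linear regression is the intercept plus a weighted sum of the feature values. The sketch below reproduces the prediction for the new car from the printed parameters. The values are copied from the output above, and the variable names are chosen here for illustration.

```python
# Values copied from the printed output above.
coefficients = [7.49998313e-03, 5.00000000e+01, 4.99998875e+00]
intercept = 1500.017999959604

# Same order as the training columns: Engine_Size, Horsepower, Weight.
new_car = [2.5, 180, 3100]

# intercept + sum of (coefficient * feature value)
manual_prediction = intercept + sum(c * x for c, x in zip(coefficients, new_car))
print(round(manual_prediction, 3))
```

The result agrees with the value returned by `model.predict` up to the rounding of the printed coefficients. With only five rows in total, the individual coefficients should not be read as reliable effects of each feature.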


## Example 3: California Housing Dataset

### 1. Import Libraries and Load Data

In [105]:
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler

# Load California Housing dataset
california = fetch_california_housing(as_frame=True)
df = california.frame

# Display the first five rows
df.head()


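|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|-------------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |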


In [108]:
df.columns

Index(['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup',
       'Latitude', 'Longitude', 'MedHouseVal'],
      dtype='object')

### 2. Feature Selection and Splitting Data

In [111]:
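# Select features and target
# MedHouseVal is the target, so it must be dropped from X as well;
# leaving it in leaks the answer and yields a trivially perfect fit.
X = df.drop(['AveOccup', 'Latitude', 'Longitude', 'MedHouseVal'], axis=1)  # Features
y = df['MedHouseVal']  # Target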

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
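
One way to guard against the target column ending up in the feature matrix is to name the features explicitly and assert that the target is not among them. The sketch below uses a hypothetical three-row frame that only mimics some column names of the real dataset. The values and the names `toy`, `X_toy`, and `y_toy` are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature frame; values are made up, only the column names
# mirror the California Housing data.
toy = pd.DataFrame({
    "MedInc": [8.3, 7.2, 5.6],
    "HouseAge": [41.0, 52.0, 52.0],
    "MedHouseVal": [4.5, 3.5, 3.4],
})

target = "MedHouseVal"
feature_names = ["MedInc", "HouseAge"]  # explicit list instead of drop(...)

X_toy = toy[feature_names]
y_toy = toy[target]

# Fail fast if the target ever ends up among the features.
assert target not in X_toy.columns, "target leaked into the feature matrix"
print(list(X_toy.columns))
```

Selecting by an explicit list is more verbose than `drop`, but it stays correct when new columns are added to the frame later.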


### 3. Feature Scaling

In [114]:
# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
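
The scaler is fitted on the training set only and then reused on the test set, so no information from the test rows influences the transformation. The sketch below shows what `StandardScaler` computes, using synthetic random data (an assumption made so that the example runs without downloading anything).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, not the housing dataset.
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 2))
X_test_demo = rng.normal(loc=50.0, scale=10.0, size=(20, 2))

scaler_demo = StandardScaler().fit(X_train_demo)

# StandardScaler subtracts the training mean and divides by the training
# standard deviation (population form, ddof=0), column by column.
train_mean = X_train_demo.mean(axis=0)
train_std = X_train_demo.std(axis=0)
manual_test_scaled = (X_test_demo - train_mean) / train_std

matches = np.allclose(manual_test_scaled, scaler_demo.transform(X_test_demo))
print("Manual scaling matches StandardScaler:", matches)
```

Because the test set is scaled with training statistics, its columns will generally not have exactly zero mean and unit variance, which is expected.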


### 4. Train the Model

In [117]:
# Train the model
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Display coefficients and intercept
print("Model Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)

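Model Coefficients: [-6.01673052e-16 -1.32611373e-16  4.14125187e-16 -6.43734737e-16
 -1.94568825e-17  1.15619125e+00]
Model Intercept: 2.071946937378876

Six coefficients are printed, so this output comes from a fit in which `MedHouseVal` itself was one of the features. Five coefficients are zero up to floating-point noise, and the sixth (about 1.156) sits on the standardized copy of the target, which is what a model that simply reproduces its target would learn.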


### 5. Model Evaluation

In [120]:
# Predict on the test set
y_pred = model.predict(X_test_scaled)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R² Score: {r2:.2f}")


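Mean Squared Error: 0.00
R² Score: 1.00

An MSE of 0.00 and an R² of 1.00 on real housing data signal target leakage rather than a good model: these scores were recorded with `MedHouseVal` present in `X`. With the target excluded from the features, expect an MSE above zero and an R² below 1.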
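
For reference, both metrics can be computed from their definitions. The sketch below is a plain-Python illustration on three made-up values. `mse_manual` and `r2_manual` are names introduced here and are not part of scikit-learn.

```python
def mse_manual(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_manual(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the target's variance that is explained."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
y_true_demo = [3.0, 5.0, 7.0]
y_pred_demo = [2.0, 5.0, 8.0]

print("MSE:", mse_manual(y_true_demo, y_pred_demo))
print("R²: ", r2_manual(y_true_demo, y_pred_demo))
```

R² equals 1 only when every prediction matches its target exactly, and it drops to 0 for a model that always predicts the mean of the targets.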


### 6. Feature Importance

In [125]:
# Create a DataFrame to display feature importance
feature_importance = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': model.coef_
}).sort_values(by='Coefficient', ascending=False)

feature_importance


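|   | Feature | Coefficient |
|---|---------|-------------|
| 5 | MedHouseVal | 1.156191 |
| 2 | AveRooms | 4.141252e-16 |
| 4 | Population | -1.945688e-17 |
| 1 | HouseAge | -1.326114e-16 |
| 0 | MedInc | -6.016731e-16 |
| 3 | AveBedrms | -6.437347e-16 |

`MedHouseVal` appearing as a feature confirms that the target was part of `X` when this table was produced. The remaining coefficients are on the order of 1e-16, which is numerical noise, so this ranking says nothing about which inputs actually drive house values.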
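
When features are standardized, the magnitude of a coefficient is a rough indicator of influence, while the sign only gives the direction. Sorting by the raw value therefore pushes strong negative effects to the bottom. The sketch below ranks by absolute value instead. The feature names and coefficients are hypothetical and invented purely to illustrate the sorting.

```python
import pandas as pd

# Hypothetical coefficients, not results from the housing model.
coefs = pd.DataFrame({
    "Feature": ["feat_a", "feat_b", "feat_c"],
    "Coefficient": [0.30, -0.85, 0.05],
})

# Rank by magnitude so that large negative coefficients are not buried.
ranked = coefs.sort_values(
    by="Coefficient", key=lambda s: s.abs(), ascending=False
)
print(ranked)
```

Here `feat_b` comes first even though its coefficient is negative. Coefficient magnitudes remain only a rough guide when features are correlated with each other.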
