# Multiple Linear Regression

Multiple linear regression is an extension of simple linear regression. Instead of using a single independent variable to predict a dependent variable, we use **multiple** independent variables. The goal is to find the best-fitting plane (or hyperplane in higher dimensions) that describes the relationship.

The formula for multiple linear regression is:

$$y = m_1x_1 + m_2x_2 + ... + m_nx_n + b$$

Where:
* $y$ is the dependent variable (what we want to predict, e.g., **price**).
* $x_1, x_2, ..., x_n$ are the independent variables (our features, e.g., **area** and **bedrooms**).
* $m_1, m_2, ..., m_n$ are the **coefficients** for each independent variable. Each coefficient represents the change in $y$ for a one-unit change in its corresponding $x$, holding all other variables constant.
* $b$ is the **intercept**, the value of $y$ when all independent variables are zero.
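
The formula is a weighted sum of the features plus the intercept, so it can be computed as a dot product. The sketch below uses **hypothetical** coefficients (`m`, `b`), made up for illustration and not learned from `home_prices.csv`, to show the arithmetic for one house and then for several houses at once.

```python
import numpy as np

# Hypothetical coefficients and intercept, for illustration only
# (NOT the values learned from home_prices.csv).
m = np.array([0.05, 10.0])  # m_1: lakhs per sq ft, m_2: lakhs per bedroom
b = 5.0                     # intercept

# One house: x_1 = area in sq ft, x_2 = number of bedrooms.
x = np.array([1500, 2])

# y = m_1*x_1 + m_2*x_2 + b is a dot product plus the intercept.
y = np.dot(m, x) + b
print(y)  # 0.05*1500 + 10*2 + 5 = 100.0

# Several houses at once: each row of X is one house, so X @ m + b gives every prediction.
X = np.array([[1500, 2],
              [2000, 3]])
ys = X @ m + b
print(ys)  # [100. 135.]
```

With these made-up numbers, adding one bedroom while keeping the area fixed raises $y$ by exactly $m_2 = 10$, which is what "holding all other variables constant" means.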

---

## 1. Load and Inspect the Data

First, we import the `pandas` library and load our dataset. This time, our dataset includes not just the area but also the number of bedrooms. We use `df.sample(5)` to view a random selection of 5 rows, which gives us a quick look at the structure and values in our data.

```python
import pandas as pd

df = pd.read_csv('home_prices.csv')
df.sample(5)
```

|    | area_sqr_ft | price_lakhs | bedrooms |
|---:|---:|---:|---:|
| 5 | 1325 | 80.1 | 2 |
| 3 | 1259 | 59.0 | 2 |
| 2 | 1057 | 86.6 | 3 |
| 6 | 1085 | 116.0 | 3 |
| 11 | 700 | 49.0 | 2 |
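
Because `df.sample` draws different rows on every call, passing `random_state` makes the draw repeatable. The sketch below assumes `home_prices.csv` is not at hand and rebuilds only the five rows shown above as a stand-in (a subset of the real data), then runs a few quick structural checks that are worth doing before fitting a model.

```python
import pandas as pd

# Stand-in for home_prices.csv: ONLY the five rows shown in the sample output above.
df = pd.DataFrame(
    {
        "area_sqr_ft": [1325, 1259, 1057, 1085, 700],
        "price_lakhs": [80.1, 59.0, 86.6, 116.0, 49.0],
        "bedrooms": [2, 2, 3, 3, 2],
    },
    index=[5, 3, 2, 6, 11],  # keep the original row labels
)

# Fixing random_state makes df.sample return the same rows each time.
first = df.sample(3, random_state=42)
second = df.sample(3, random_state=42)
print(first.equals(second))  # True

# Quick structural checks before modelling.
print(df.shape)         # (rows, columns)
print(df.isna().sum())  # missing values per column
print(df.describe())    # summary statistics for each numeric column
```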


## 2. Train the Multiple Linear Regression Model

Next, we create and train our `LinearRegression` model. The process is very similar to simple linear regression, but with one key difference:
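* For the independent variables (X), we now pass a DataFrame containing **all** the feature columns we want to use: `['area_sqr_ft', 'bedrooms']`. The double brackets in `df[['area_sqr_ft', 'bedrooms']]` select a DataFrame rather than a single Series.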
* The dependent variable (y) remains the single column we are trying to predict: `'price_lakhs'`.

The `fit` method will now calculate the best coefficients ($m_1$, $m_2$) and the intercept ($b$) for our multi-dimensional data.

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(df[['area_sqr_ft', 'bedrooms']], df['price_lakhs'])
```

| Parameter | Value |
|---|---|
| fit_intercept | True |
| copy_X | True |
| tol | 1e-06 |
| n_jobs | None |
| positive | False |
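
After fitting, scikit-learn stores the learned $m_1, m_2$ in `model.coef_` and $b$ in `model.intercept_`. The sketch below inspects them and cross-checks them against an ordinary least-squares solve with NumPy. It assumes the five sampled rows above as stand-in training data, so the fitted numbers will not match the notebook's model, which was trained on the full CSV.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in data: only the five rows from the df.sample(5) output above.
# This is a subset of home_prices.csv, so the numbers will differ from the notebook's model.
df = pd.DataFrame({
    "area_sqr_ft": [1325, 1259, 1057, 1085, 700],
    "price_lakhs": [80.1, 59.0, 86.6, 116.0, 49.0],
    "bedrooms": [2, 2, 3, 3, 2],
})

X = df[["area_sqr_ft", "bedrooms"]]
y = df["price_lakhs"]
model = LinearRegression().fit(X, y)

# coef_ holds m_1, m_2 in the same order as the columns of X; intercept_ holds b.
print(dict(zip(X.columns, model.coef_)))
print(model.intercept_)

# Cross-check: least squares on [x_1, x_2, 1]; the column of ones lets the solver estimate b.
A = np.column_stack([X.to_numpy(dtype=float), np.ones(len(X))])
solution, *_ = np.linalg.lstsq(A, y.to_numpy(), rcond=None)
print(np.allclose(solution[:2], model.coef_), np.isclose(solution[2], model.intercept_))
```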


## 3. Prepare Data for Prediction

To make a prediction, we must provide the new data in the same format as our training data: a DataFrame with the same feature columns, `'area_sqr_ft'` and `'bedrooms'`, in the same order. Here, we create a test DataFrame for two houses:
1.  A 1500 sq ft house with 2 bedrooms.
2.  A 2000 sq ft house with 3 bedrooms.

```python
test = pd.DataFrame({
    'area_sqr_ft': [1500, 2000],
    'bedrooms': [2, 3]
})
```

## 4. Make Predictions

Finally, we pass our `test` DataFrame to the `model.predict()` method. The model uses the learned coefficients and intercept to calculate the estimated price for each house.
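
What `predict` computes is the formula from the introduction applied to each row. The sketch below reproduces it by hand with `coef_` and `intercept_` and checks that it matches `predict`. It assumes the five sampled rows as stand-in training data, so its predictions will differ from the notebook's output.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in training data: the five sampled rows shown earlier, not the full CSV.
df = pd.DataFrame({
    "area_sqr_ft": [1325, 1259, 1057, 1085, 700],
    "price_lakhs": [80.1, 59.0, 86.6, 116.0, 49.0],
    "bedrooms": [2, 2, 3, 3, 2],
})
features = ["area_sqr_ft", "bedrooms"]
model = LinearRegression().fit(df[features], df["price_lakhs"])

test = pd.DataFrame({"area_sqr_ft": [1500, 2000], "bedrooms": [2, 3]})

# Apply y = m_1*x_1 + m_2*x_2 + b to every row at once (matrix-vector product plus b).
manual = test[features].to_numpy() @ model.coef_ + model.intercept_

print(manual)
print(model.predict(test))
print(np.allclose(manual, model.predict(test)))  # True
```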

```python
model.predict(test)
```

```text
array([ 75.78971293, 120.17765942])
```

The model predicts:
* A price of approximately **75.79 lakhs** for the 1500 sq ft, 2-bedroom house.
* A price of approximately **120.18 lakhs** for the 2000 sq ft, 3-bedroom house.
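
Because the two test houses differ by exactly 500 sq ft and one bedroom, the formula explains the gap between the two predictions. Writing $y_A$ and $y_B$ for the predicted prices of the first and second house, the intercept cancels:

$$y_B - y_A = m_1(2000 - 1500) + m_2(3 - 2) + (b - b) = 500\,m_1 + m_2 \approx 120.18 - 75.79 = 44.39$$

So the roughly 44.39-lakh difference is the combined effect of the extra 500 sq ft and the extra bedroom. The predictions alone do not separate the two terms; that requires the individual coefficients in `model.coef_`.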