# Predicting Housing Prices with Linear Regression
**Authors:** Magudeshwaran and Senthilkumaran

**Goal:** Predict house prices from the number of rooms, using a simple linear regression with one feature.

### Step 1: Import Libraries
We will import the tools we need.
- `pandas`: To load and work with data.
- `numpy`: To work with numerical arrays.
- `matplotlib`: To make plots.
- `sklearn` (scikit-learn): To split the data and build our machine learning model.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

### Step 2: Load the Data
Here, we load the Boston Housing dataset from a URL and look at the first 5 rows.

In [None]:
url = 'https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv'
housing_df = pd.read_csv(url)
housing_df.head()
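
Before moving on, it can help to confirm that the columns used later are present and have no missing values. The sketch below is only an illustration: it reads a made-up three-row CSV from memory instead of the real URL, and the names `sample_csv`, `sample_df`, and `required` are hypothetical.

```python
import io
import pandas as pd

# Made-up three-row CSV standing in for the downloaded file (values are invented).
sample_csv = io.StringIO("rm,medv\n6.5,24.0\n5.9,19.5\n7.1,33.2\n")
sample_df = pd.read_csv(sample_csv)

# Columns the rest of the notebook relies on.
required = ["rm", "medv"]
missing_cols = [c for c in required if c not in sample_df.columns]
n_missing_values = int(sample_df[required].isna().sum().sum())

print(sample_df.shape)      # (3, 2)
print(missing_cols)         # []
print(n_missing_values)     # 0
```

The same checks could be run on `housing_df` once it has been loaded.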

### Step 3: Select Data for the Model
We need to choose our feature and target.
- **Feature (X):** `rm` (average number of rooms per dwelling)
- **Target (y):** `medv` (median home value, in thousands of dollars)

In [None]:
X = housing_df[['rm']].values
y = housing_df['medv'].values
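
The double brackets in `housing_df[['rm']]` matter: scikit-learn expects the features as a 2-D array (rows by columns), while the target can stay 1-D. A minimal sketch of the difference, using a made-up three-row frame (`demo_df` and its values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical frame standing in for housing_df (values are invented).
demo_df = pd.DataFrame({"rm": [6.5, 5.9, 7.1], "medv": [24.0, 19.5, 33.2]})

X_demo = demo_df[["rm"]].values   # double brackets -> DataFrame -> 2-D array
y_demo = demo_df["medv"].values   # single brackets -> Series -> 1-D array

print(X_demo.shape)  # (3, 1)
print(y_demo.shape)  # (3,)
```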

### Step 4: Split the Data
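We split our data into two parts, keeping 30% for testing (`test_size=0.3`). Setting `random_state=42` makes the split the same every time the notebook runs.
1.  **Training data:** To teach the model.
2.  **Testing data:** To check how well the model works on new data.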

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
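
To see what the split produces, the sketch below runs the same call on placeholder arrays. It assumes 506 rows, the usual size of the Boston Housing data; the arrays themselves are invented and the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with 506 rows (assumed row count; values are invented).
X_demo = np.arange(506, dtype=float).reshape(-1, 1)
y_demo = np.arange(506, dtype=float)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# The test set size is rounded up: ceil(0.3 * 506) = 152.
print(X_tr.shape, X_te.shape)  # (354, 1) (152, 1)

# The same random_state gives the same split again.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)
print(np.array_equal(X_te, X_te2))  # True
```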

### Step 5: Train the Model
Now we create our `LinearRegression` model and train it with our training data.

In [None]:
regressor = LinearRegression()
regressor.fit(X_train, y_train)
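
Training finds the straight line `price = slope * rooms + intercept` that best fits the training data. After fitting, these two numbers are available as `coef_` and `intercept_`. The sketch below shows this on synthetic data generated from a known line, so the values are invented and are not results from the housing dataset; `demo_model`, `rooms`, and `price` are hypothetical names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: price = 9 * rooms - 34, plus a little random noise (invented).
rng = np.random.default_rng(0)
rooms = rng.uniform(4, 8, size=200).reshape(-1, 1)
price = 9.0 * rooms.ravel() - 34.0 + rng.normal(0, 1.0, size=200)

demo_model = LinearRegression().fit(rooms, price)

slope = demo_model.coef_[0]         # one coefficient per feature
intercept = demo_model.intercept_   # value of the line at rooms = 0
print(f"price = {slope:.2f} * rooms + {intercept:.2f}")

# predict() simply evaluates that line.
manual = slope * 6.0 + intercept
print(np.isclose(demo_model.predict([[6.0]])[0], manual))  # True
```

For the notebook's model, the same attributes can be read from `regressor.coef_` and `regressor.intercept_`.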

### Step 6: Plot the Training Data
Let's see how our model did on the training data.
- **Blue dots** are the actual data.
- **Red line** is the model's prediction.

In [None]:
plt.scatter(X_train, y_train, color='blue', label='Actual Data')
plt.plot(X_train, regressor.predict(X_train), color='red', linewidth=2, label='Regression Line')
plt.title('Price vs. Rooms (Training Data)')
plt.xlabel('Number of Rooms')
plt.ylabel('Price')
plt.legend()
plt.show()

### Step 7: Plot the Test Data
Now we check the model on the test data, which it did not see during training.
- **Green dots** are the new, unseen data.
- **Red line** is the same line the model learned from the training data.

In [None]:
plt.scatter(X_test, y_test, color='green', label='Actual Data')
plt.plot(X_test, regressor.predict(X_test), color='red', linewidth=2, label='Regression Line')
plt.title('Price vs. Rooms (Test Data)')
plt.xlabel('Number of Rooms')
plt.ylabel('Price')
plt.legend()
plt.show()
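
A plot gives a visual impression; a number makes the check easier to compare. Two common choices are the root mean squared error (RMSE, in the same units as the price) and the R² score. The sketch below computes both on synthetic data, so the printed values are not results for the housing dataset; all `demo`-style names and the generated data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: price = 9 * rooms - 34, plus random noise (invented).
rng = np.random.default_rng(0)
rooms = rng.uniform(4, 8, size=300).reshape(-1, 1)
price = 9.0 * rooms.ravel() - 34.0 + rng.normal(0, 3.0, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    rooms, price, test_size=0.3, random_state=42
)
demo_model = LinearRegression().fit(X_tr, y_tr)
y_pred = demo_model.predict(X_te)

# RMSE: typical size of the prediction error on unseen data.
rmse = np.sqrt(mean_squared_error(y_te, y_pred))
# R^2: share of the variation in price explained by the model (1.0 is perfect).
r2 = r2_score(y_te, y_pred)

print(f"RMSE: {rmse:.2f}")
print(f"R^2:  {r2:.3f}")
```

In the notebook itself, the equivalent calls would use `y_test` and `regressor.predict(X_test)`.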