
## Simple Linear Regression


* Simple linear regression models how one variable, the dependent variable, changes in relation to another, the independent variable.
* For example, consider a company that wants to predict sales from advertising expenditure.
* Using simple linear regression, the company can estimate whether higher advertising spend is associated with higher sales, and by how much.
* The relationship between the dependent and independent variables is represented by the linear equation `y = mx + b`.

* Here:
  * y is the predicted value (dependent variable).
  * m is the slope of the line.
  * x is the independent variable.
  * b is the y-intercept (the value of y when x is 0).

* The slope m tells how much y changes for a one-unit increase in x. A positive m indicates a direct relationship (y rises as x rises), while a negative m indicates an inverse relationship (y falls as x rises).
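To make m and b concrete: with a single feature, the ordinary least squares fit (the criterion scikit-learn's `LinearRegression` minimizes) has a closed form, m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄. The sketch below applies it to a small hypothetical advertising/sales dataset (the numbers are invented for illustration) and cross-checks the result against `np.polyfit`.

```python
import numpy as np

# Hypothetical toy data: advertising spend (x) and sales (y), arbitrary units
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])

# Closed-form ordinary least squares for one feature:
#   m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
#   b = y_mean - m * x_mean
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")

# Cross-check: a degree-1 polynomial fit also minimizes squared error
m_check, b_check = np.polyfit(x, y, 1)
print(np.isclose(m, m_check), np.isclose(b, b_check))
```

Here each extra unit of advertising is associated with roughly 1.9 more units of sales in this made-up data.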

## Implementation

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

In [None]:
column_names = [
    'CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS',
    'RAD','TAX','PTRATIO','B','LSTAT','MEDV'
]

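# Load the whitespace-delimited file and assign the column names
# (sep=r"\s+" replaces delim_whitespace=True, which is deprecated in recent pandas)
df = pd.read_csv("/content/housing.csv", sep=r"\s+", names=column_names)

# MEDV is the median home value in $1000s; copy it to a PRICE column for readability
df['PRICE'] = df["MEDV"]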

print(df.head())

In [None]:
# Count missing values in each column
print(df.isnull().sum())

In [None]:
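# Single feature (RM, average number of rooms per dwelling) and target (PRICE).
# X stays a 2-D DataFrame because scikit-learn expects shape (n_samples, n_features)
X = df[['RM']]
y = df['PRICE']

# Train/test split (80% train, 20% test); random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)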

print(f"Training set size: {X_train.shape[0]}")
print(f"Testing set size: {X_test.shape[0]}")
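The split sizes follow from simple arithmetic: for a float `test_size`, scikit-learn rounds the test count up and gives the remainder to training. The sketch below, assuming the file holds the standard 506-row Boston Housing data, reproduces that arithmetic with a hypothetical helper `split_indices` that shuffles row positions and slices them. It illustrates the idea only; it uses NumPy's own generator, so it will not pick the same rows as `train_test_split(..., random_state=42)`.

```python
import math
import numpy as np

def split_indices(n_samples, test_size=0.2, seed=42):
    """Hypothetical helper: shuffled train/test split of row positions."""
    # Test count is rounded up; training gets the remaining rows
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)  # shuffled row positions 0..n_samples-1
    return perm[n_test:n_test + n_train], perm[:n_test]

train_idx, test_idx = split_indices(506)
print(len(train_idx), len(test_idx))
```

With 506 rows, 0.2 × 506 = 101.2 rounds up to 102 test rows, leaving 404 for training.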

In [None]:
# Create a Linear Regression model
model = LinearRegression()

# Train the model on the training data
model.fit(X_train, y_train)

# Print the intercept (b) and the slope for RM (m); coef_ is an array with one entry per feature
print(f"Intercept: {model.intercept_}")
print(f"Coefficient (slope for RM): {model.coef_[0]}")
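The fitted attributes are exactly the m and b of `y = mx + b`, so a prediction can be reproduced by hand as `intercept_ + coef_[0] * RM`. The sketch below shows this on synthetic stand-in data (the generating slope of 9 and intercept of -35 are invented, not taken from the housing file), so it runs without `housing.csv`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins for RM (rooms) and PRICE ($1000s); values are made up
rng = np.random.default_rng(0)
rooms = rng.uniform(4, 9, size=50).reshape(-1, 1)  # 2-D: (n_samples, 1 feature)
price = -35 + 9 * rooms[:, 0] + rng.normal(0, 3, size=50)

model = LinearRegression().fit(rooms, price)
m, b = model.coef_[0], model.intercept_

# Applying y = m*x + b by hand gives the same numbers as model.predict
x_new = np.array([[6.0], [7.5]])
manual = m * x_new[:, 0] + b
print(manual, model.predict(x_new))
print(f"Each extra room changes the predicted price by about {m:.2f} (in $1000s)")
```

On the real data, the printed coefficient reads the same way: the estimated change in price, in $1000s, per additional room.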

In [None]:
# Predict house prices for the test set
y_pred = model.predict(X_test)

# Display the first few predictions alongside the actual values
predictions = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(predictions.head())

In [None]:
# Plot the actual data points
plt.scatter(X_test, y_test, color='blue', label='Actual')

# Plot the regression line
plt.plot(X_test, y_pred, color='red', label='Regression Line')

# Add labels and title
plt.xlabel('Number of Rooms (RM)')
plt.ylabel('House Price ($1000s)')
plt.title('Simple Linear Regression: Number of Rooms vs. House Price')
plt.legend()
plt.show()

In [None]:
# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Calculate R-squared score
r2 = r2_score(y_test, y_pred)
print(f"R-squared score: {r2}")
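Both metrics are short formulas: MSE is the mean of the squared residuals, and R² = 1 − SS_res / SS_tot compares the residual error with the spread of the actual values around their mean. The sketch below computes them by hand on a few hypothetical values, which should agree with `mean_squared_error` and `r2_score` on the same inputs. Because PRICE is in $1000s, MSE is in squared $1000s; its square root (RMSE) is back in $1000s.

```python
import numpy as np

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

# MSE: average of squared residuals
residuals = y_true - y_pred
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)  # same units as the target

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE = {mse}, RMSE = {rmse:.3f}, R^2 = {r2}")
```

An R² of 1 means a perfect fit, 0 means no better than always predicting the mean, and it can go negative for a model worse than that baseline.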