# **Simple Linear Regression: Salary Prediction Based on Experience**

## **1. Introduction**
In this notebook, we will implement a **Simple Linear Regression** model to predict an employee's salary based on their years of experience. Simple Linear Regression is a fundamental machine learning technique used to model relationships between two continuous variables.

We will work through the following stages:
1. **Data Preprocessing**
2. **Splitting Data into Training and Test Sets**
3. **Training the Linear Regression Model**
4. **Making Predictions**
5. **Visualizing the Results**
6. **Evaluating Model Performance**

### **Step 1: Understanding Simple Linear Regression**
Simple Linear Regression is used to establish a relationship between an independent variable (Years of Experience) and a dependent variable (Salary) using a straight-line equation.
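
As a sketch of the straight-line equation mentioned above, the model can be written as follows. The symbols are our own notation, not taken from the notebook's code: $x$ is years of experience, $\hat{y}$ is the predicted salary, $b_0$ is the intercept, and $b_1$ is the slope.

$$\hat{y} = b_0 + b_1 x$$

Ordinary least squares chooses $b_0$ and $b_1$ to minimize the sum of squared differences between the actual and predicted values. With $\bar{x}$ and $\bar{y}$ denoting the sample means, the estimates are:

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$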

### **Step 2: Importing the Necessary Libraries**
We need libraries for numerical computations, data visualization, and dataset handling.

In [None]:
import numpy as np  # For numerical computations
import matplotlib.pyplot as plt  # For data visualization
import pandas as pd  # For handling datasets

### **Step 3: Load the Dataset**
We use pandas to load the dataset into a DataFrame.

In [None]:
dataset = pd.read_csv('Salary_Data.csv')

# Separate the independent variable (X) from the dependent variable (y)
X = dataset.iloc[:, :-1].values  # Independent variable (Years of Experience), kept 2-D as sklearn expects
y = dataset.iloc[:, -1].values   # Dependent variable (Salary), 1-D array
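
To make the slicing above concrete, here is a minimal sketch on a small made-up DataFrame. The column names and values are assumptions for illustration only and are not read from `Salary_Data.csv`. It shows that `iloc[:, :-1]` keeps a 2-D array of features, while `iloc[:, -1]` returns a 1-D array of targets.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; names and values are made up
demo = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0],
    "Salary": [40000.0, 50000.0, 60000.0],
})

X_demo = demo.iloc[:, :-1].values  # every column except the last -> 2-D
y_demo = demo.iloc[:, -1].values   # only the last column -> 1-D

print(X_demo.shape)  # (3, 1)
print(y_demo.shape)  # (3,)
```

The 2-D shape matters because scikit-learn estimators expect the feature matrix as `(n_samples, n_features)`, even when there is a single feature.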

### **Step 4: Splitting the Dataset into Training and Test Sets**
Splitting lets us train the model on one part of the data and test it on another, so the evaluation reflects how the model performs on data it has not seen.

In [None]:
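from sklearn.model_selection import train_test_split
# Hold out one third of the rows for testing; random_state makes the split reproducible.
# stratify is omitted: it is meant for class labels and raises an error on a continuous target such as salary.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)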
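
The following sketch illustrates what the split does, using synthetic data in place of the real dataset (the 30 rows and the salary formula are assumptions made for the example). It checks that every row ends up in exactly one of the two sets and that a fixed `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 30 rows, one feature (values are made up)
X_demo = np.arange(30, dtype=float).reshape(-1, 1)
y_demo = 25000 + 9000 * X_demo[:, 0]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=1/3, random_state=0
)
print(len(X_tr), len(X_te))  # sizes of the training and test sets

# No row appears in both sets
overlap = set(X_tr[:, 0]) & set(X_te[:, 0])
print(len(overlap))

# The same random_state gives the same split again
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=1/3, random_state=0
)
print(np.array_equal(X_te, X_te2))
```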

### **Step 5: Feature Scaling (Optional)**
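Feature scaling with `StandardScaler` rescales each feature to zero mean and unit variance.
- It is not necessary for simple linear regression, because ordinary least squares gives the same predictions with or without scaling. We include it for teaching purposes.
- The scaler is fitted on the training set only and then applied to the test set, so no information from the test set leaks into training.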

In [None]:
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

In [None]:
# Scaling is usually not applied to the dependent variable in regression problems.
# However, for demonstration, we include it but keep it commented.
# sc_y = StandardScaler()
# y_train = sc_y.fit_transform(y_train.reshape(-1,1))
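
As a rough check of the claim that scaling is optional here, the sketch below fits `LinearRegression` on raw and on standardized versions of the same synthetic feature and compares the predictions. The data is randomly generated for illustration and is not the salary dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (made up): roughly linear salaries with some noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(1, 10, size=(20, 1))
y_demo = 25000 + 9000 * X_demo[:, 0] + rng.normal(0, 3000, size=20)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_demo)

# Fit one model on the raw feature and one on the standardized feature
pred_raw = LinearRegression().fit(X_demo, y_demo).predict(X_demo)
pred_scaled = LinearRegression().fit(X_scaled, y_demo).predict(X_scaled)

print(np.allclose(pred_raw, pred_scaled))  # predictions agree
print(X_scaled.mean(), X_scaled.std())     # close to 0 and 1
```

Scaling changes the fitted slope and intercept, since they are expressed in the scaled units, but not the predictions themselves.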

### **Step 6: Training the Simple Linear Regression Model**
We use the LinearRegression class from sklearn to train the model.

In [None]:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
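
After fitting, the learned line can be inspected through the `coef_` and `intercept_` attributes of `LinearRegression`. The sketch below uses noise-free synthetic data (an assumption made so the expected slope and intercept are known in advance) and compares the fitted values with the closed-form least-squares estimates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data on the line salary = 25000 + 9000 * years
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 25000 + 9000 * x

model = LinearRegression().fit(x.reshape(-1, 1), y)
slope = model.coef_[0]        # change in salary per extra year
intercept = model.intercept_  # predicted salary at zero years

# Closed-form least-squares estimates for comparison
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(slope, intercept)
print(b1, b0)
```

In the notebook itself the feature was standardized in Step 5, so `regressor.coef_` there is the salary change per one standard deviation of experience rather than per year.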

### **Step 7: Making Predictions**
The trained model is used to predict salaries for the test set.

In [None]:
y_pred = regressor.predict(X_test)
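
The same `predict` call works for a single new value, provided the input is 2-D and passes through the same scaler the model was trained with. A minimal sketch on noise-free synthetic data follows; the names `scaler`, `model`, and `new_years` are hypothetical and separate from the notebook's variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic, noise-free data on the line salary = 25000 + 9000 * years
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [6.0]])
y_demo = 25000 + 9000 * X_demo[:, 0]

scaler = StandardScaler()
model = LinearRegression().fit(scaler.fit_transform(X_demo), y_demo)

# A single new observation must be 2-D: one row, one feature
new_years = np.array([[5.0]])
predicted = model.predict(scaler.transform(new_years))
print(predicted)  # approximately [70000.]
```

Passing the raw value `[[5.0]]` without `scaler.transform` would be interpreted as five standard deviations above the mean and give a misleading prediction.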

### **Step 8: Visualizing the Training Set Results**
A scatter plot helps us understand the actual data points, and the regression line shows the model’s prediction.


In [None]:
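x_train_years = sc_X.inverse_transform(X_train)  # Undo the scaling so the x-axis is in years
plt.scatter(x_train_years, y_train, color='red')  # Actual salaries
plt.plot(x_train_years, regressor.predict(X_train), color='blue')  # Regression line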
plt.title('Salary vs Experience (Training Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

### **Step 9: Visualizing the Test Set Results**
The same approach is used to compare actual vs predicted salaries in the test set.


In [None]:
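plt.scatter(sc_X.inverse_transform(X_test), y_test, color='red')  # Actual salaries (x-axis back in years)
# The fitted line is the one learned from the training set, so drawing it from the training points is fine
plt.plot(sc_X.inverse_transform(X_train), regressor.predict(X_train), color='blue')  # Regression line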
plt.title('Salary vs Experience (Test Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

### **Step 10: Evaluating Model Performance**
We assess the model on the test set with three metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), and the R-squared score (R²).


In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score  # Regression metrics
mae = mean_absolute_error(y_test, y_pred)  # Measures absolute difference between actual and predicted values
mse = mean_squared_error(y_test, y_pred)  # Measures the squared difference (penalizes large errors more)
r2 = r2_score(y_test, y_pred)  # Measures how well the model explains variance in the data


In [None]:
# Printing performance metrics
print(f"Mean Absolute Error (MAE): {mae}")
print(f"Mean Squared Error (MSE): {mse}")
print(f"R-squared Score (R²): {r2}")
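
To show what the three metrics compute, the sketch below recalculates them by hand with NumPy and compares the results with scikit-learn. The four actual and predicted salaries are made-up values for illustration and are not results from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted salaries
y_true = np.array([40000.0, 55000.0, 62000.0, 80000.0])
y_hat = np.array([42000.0, 53000.0, 65000.0, 78000.0])

errors = y_true - y_hat
mae_manual = np.mean(np.abs(errors))  # average absolute error
mse_manual = np.mean(errors ** 2)     # average squared error
# R² = 1 - (residual sum of squares / total sum of squares)
r2_manual = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae_manual, mean_absolute_error(y_true, y_hat))  # 2250.0 in both cases
print(mse_manual, mean_squared_error(y_true, y_hat))   # 5250000.0 in both cases
print(r2_manual, r2_score(y_true, y_hat))
```

MAE is in the same units as salary, whereas MSE is in squared units; taking `np.sqrt(mse_manual)` gives a root mean squared error that is again in salary units.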

### **Summary**
- We built a Simple Linear Regression model to predict salary based on experience.
- We trained the model, made predictions, visualized results, and evaluated its performance.