# **Simple Linear Regression Project**


# **In this project we build a simple linear regression model on a salary dataset to predict an employee's salary from their years of experience.**

In [32]:
#import libraries
import pandas as pd
import numpy as np


In [2]:
#load the data set
salary=pd.read_csv("https://github.com/ybifoundation/Dataset/raw/main/Salary%20Data.csv")

In [4]:
#show the data
salary

   Experience Years  Salary
0               1.1   39343
1               1.2   42774
2               1.3   46205
3               1.5   37731
4               2.0   43525
5               2.2   39891
6               2.5   48266
7               2.9   56642
8               3.0   60150
9               3.2   54445


**Data preprocessing**

In [5]:
#check for missing value
salary.isna().sum()

Experience Years    0
Salary              0


**There are no missing values, so we can proceed.**

In [18]:
salary.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Experience Years  40 non-null     float64
 1   Salary            40 non-null     int64  
dtypes: float64(1), int64(1)
memory usage: 772.0 bytes


In [6]:
#list the columns so we can choose the target (y) and the feature (X)
salary.columns

Index(['Experience Years', 'Salary'], dtype='object')

In [22]:
#we will predict Salary (y) from Experience Years (x)
y=salary["Salary"]
x=salary[["Experience Years"]] # double brackets keep x as a 2D DataFrame, which scikit-learn expects

In [23]:
#split the data set into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, train_size=0.7, random_state=2529)


In [24]:
# check shape of train and test sample
x_train.shape, x_test.shape, y_train.shape, y_test.shape


((28, 1), (12, 1), (28,), (12,))
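
The 28/12 split follows from applying `train_size=0.7` to the 40 rows. As a rough illustration of the idea (not scikit-learn's actual implementation), the sketch below shuffles row indices with a fixed seed and cuts them at 70%. The helper name `split_indices` and the use of Python's `random` module are assumptions made for this example.

```python
import math
import random


def split_indices(n_samples, train_size, seed):
    """Shuffle row indices reproducibly and cut them into train/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed -> same shuffle on every run
    n_train = math.floor(train_size * n_samples)  # number of training rows
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(n_samples=40, train_size=0.7, seed=2529)
print(len(train_idx), len(test_idx))
```

The two counts match the row counts in the shapes above. The selected rows themselves will differ from those chosen by `train_test_split`, because the shuffling procedure is not the same.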

In [10]:
#select the model: we use LinearRegression
from sklearn.linear_model import LinearRegression
model = LinearRegression()

**Train (fit) the model on the training set**

In [25]:
model.fit(x_train,y_train)
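
With a single feature, ordinary least squares has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the two means. `LinearRegression` minimizes the same squared-error objective, though it uses a general-purpose solver. The sketch below is a minimal illustration on made-up data; the function name `fit_simple_linear_regression` and the toy values are assumptions, not part of the notebook.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data lying exactly on y = 1 + 2x (not the salary data)
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0, 5.0, 7.0, 9.0]
toy_intercept, toy_slope = fit_simple_linear_regression(toy_x, toy_y)
print(toy_intercept, toy_slope)
```

Because the toy points lie exactly on a line, the recovered intercept and slope are 1 and 2.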

In [27]:
#print the value of intercept
print(model.intercept_)

26596.961311068262


In [29]:
#print the value of the coefficient (slope)
model.coef_

array([9405.61663234])
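
Together, the intercept and coefficient define the fitted line: salary = intercept + coefficient × years. As a small sketch, the line can be evaluated by hand with the two printed values. The names `INTERCEPT`, `SLOPE`, and `predict_salary` are introduced here for illustration only, and since the printed coefficient is rounded, results may differ from `model.predict` in the last digits.

```python
# Values copied from the intercept and coefficient printed above
INTERCEPT = 26596.961311068262
SLOPE = 9405.61663234


def predict_salary(years):
    """Evaluate the fitted line: salary = intercept + slope * years."""
    return INTERCEPT + SLOPE * years


print(f"{predict_salary(5.0):.2f}")
```

Under this model, each additional year of experience adds about 9,406 to the predicted salary.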

In [30]:
#predict salaries for the test set
y_pred = model.predict(x_test)

In [31]:
#print the predictions
y_pred

array([ 90555.15441095,  59516.61952424, 106544.70268592,  64219.42784041,
        68922.23615658, 123474.81262412,  84911.78443155,  63278.86617718,
        65159.98950364,  61397.74285071,  37883.70126987,  50111.00289191])

In [40]:
# evaluate the model on the test set
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error, r2_score

print('MAE : ', mean_absolute_error(y_test, y_pred))
print('MAPE : ', mean_absolute_percentage_error(y_test, y_pred))
print('MSE : ', mean_squared_error(y_test, y_pred))
print('RMSE : ', np.sqrt(mean_squared_error(y_test, y_pred)))
print('R2 : ', r2_score(y_test, y_pred))


MAE :  4005.9263101681768
MAPE :  0.06384602996141632
MSE :  24141421.671440993
RMSE :  4913.3920738570205
R2 :  0.960233432146844


# **Now we enter the years of experience, and the model predicts the corresponding salary.**

In [35]:
years = float(input("Enter years of experience: "))


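# Wrap the value in a one-row DataFrame: the model expects 2D input,
# and reusing the training column name avoids scikit-learn's feature-name warning
years_df = pd.DataFrame({"Experience Years": [years]})

# Predict salary
predicted_salary = model.predict(years_df)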

print(f"Predicted Salary for {years} years of experience: {predicted_salary[0]:.2f}")

Enter years of experience: 5
Predicted Salary for 5.0 years of experience: 73625.04




**We wrap the prediction in a function that takes the years of experience and prints the predicted salary, so that we can call it whenever we need it.**

In [37]:
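def predSalary(years):
  """Print the predicted salary for the given years of experience."""
  # one-row DataFrame with the training column name (the model expects 2D input)
  years_df = pd.DataFrame({"Experience Years": [years]})
  predicted_salary = model.predict(years_df)
  print(f"Predicted Salary for {years} years of experience: {predicted_salary[0]:.2f}")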

In [39]:
#now predict the salary (the function prints the result itself, so no extra print() is needed)
predSalary(2.2)

Predicted Salary for 2.2 years of experience: 47289.32




# **Project Summary: Simple Linear Regression on Salary Data**

In this project, we developed a simple linear regression model to predict an employee's salary based on their years of experience.
We used the publicly available Salary Data dataset from the YBI Foundation GitHub repository, which contains two columns:


*   Experience Years (independent variable / feature)
*   Salary (dependent variable / target)



**Model Performance**


1.   On the test set, the model explains 96.02% of the variance in salary (R2 = 0.9602).
2.   On average, test-set predictions are within ₹4,006 of the actual salaries (MAE), which corresponds to a mean absolute percentage error of 6.38%.
3.   This indicates a strong linear relationship between experience and salary in the dataset.





**Conclusion**

This project demonstrates how to implement, train, and evaluate a simple linear regression model in Python using scikit‑learn.
The model is easy to use and gives a clear picture of the relationship between experience and salary: each additional year of experience raises the predicted salary by about ₹9,406.

