
Atanga Theresa Atampoka (WTF/2025/2963)

**Problem Statement**

This assignment uses the hiring.csv dataset, which contains hiring statistics: candidate experience, written test scores, and personal interview scores. The objective is to build a machine learning model that predicts a candidate's salary from these three factors, to help the HR department set appropriate compensation for future hires. Specifically, the model should predict salaries for candidates with the following profiles:
*   2 years of experience, a test score of 9, and an interview score of 6
*   12 years of experience, a test score of 10, and an interview score of 10



In [None]:
# importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
import warnings
warnings.filterwarnings("ignore")

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
Hiring = pd.read_csv("/content/drive/MyDrive/hiring.csv")
# Rename columns for easier access and to remove special characters/spaces
Hiring.rename(columns={
    'test_score(out of 10)': 'test_score',
    'interview_score(out of 10)': 'interview_score',
    'salary($)': 'salary'
}, inplace=True)
display(Hiring)

Unnamed: 0,experience,test_score,interview_score,salary
0,,8.0,9,50000
1,,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000
5,three,7.0,10,62000
6,ten,,7,72000
7,eleven,7.0,8,80000


Data Preprocessing

In [None]:
# Install and import word2number, used below to convert spelled-out 'experience' values (e.g. 'five') into numbers
%pip install word2number
from word2number import w2n

Collecting word2number
  Downloading word2number-1.1.zip (9.7 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: word2number
  Building wheel for word2number (setup.py) ... done
  Created wheel for word2number: filename=word2number-1.1-py3-none-any.whl size=5568 sha256=1f3209da080d9e1fe01a35271de199e30d9b4a883ba202f14256de89358e0cbc
  Stored in directory: /root/.cache/pip/wheels/5b/79/fb/d25928e599c7e11fe4e00d32048cd74933f34a74c633d2aea6
Successfully built word2number
Installing collected packages: word2number
Successfully installed word2number-1.1


In [None]:
# 1. Fill missing 'experience' values with the string 'zero' and convert to string type.
# This allows 'word2number' to process both numerical words and the placeholder 'zero'.
Hiring['experience'] = Hiring['experience'].fillna('zero').astype(str)

# 2. Apply the word-to-number conversion to the 'experience' column.
Hiring['experience'] = Hiring['experience'].apply(w2n.word_to_num)
display(Hiring)

Unnamed: 0,experience,test_score,interview_score,salary
0,0,8.0,9,50000
1,0,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
5,3,7.0,10,62000
6,10,,7,72000
7,11,7.0,8,80000
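
For a column this small, the word-to-number step can also be done without an extra package. The following is a minimal sketch, not the notebook's method: `WORD_TO_YEARS` and `words_to_years` are hypothetical names, the mapping covers only zero through twelve, and missing entries are kept as `None` instead of being filled with 'zero'.

```python
from typing import Optional

# Hypothetical lookup table; covers only the small range needed for this dataset.
WORD_TO_YEARS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

def words_to_years(value: Optional[str]) -> Optional[int]:
    """Convert a spelled-out number to an int; keep missing values as None."""
    if value is None:
        return None
    key = value.strip().lower()
    if key not in WORD_TO_YEARS:
        raise ValueError(f"unsupported number word: {value!r}")
    return WORD_TO_YEARS[key]

# The 'experience' column as it appears in hiring.csv (None marks a missing value).
raw_experience = [None, None, "five", "two", "seven", "three", "ten", "eleven"]
experience_years = [words_to_years(v) for v in raw_experience]
print(experience_years)
```

Keeping missing values as `None` makes it possible to tell them apart from a genuine zero years of experience in later steps.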


In [None]:
# Calculate the median of the 'experience' column and floor it to get a whole number of years.
# Note: the placeholder zeros are still in the column at this point, so they pull the median down (it comes out as 4).
median_experience = np.floor(Hiring['experience'].median())
# Replace the 0 values (the converted 'zero' placeholders) with the calculated median.
Hiring['experience'] = Hiring['experience'].replace(0, median_experience)
display(Hiring)

Unnamed: 0,experience,test_score,interview_score,salary
0,4,8.0,9,50000
1,4,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
5,3,7.0,10,62000
6,10,,7,72000
7,11,7.0,8,80000
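
Because the missing values were first turned into 0, the median used here is taken over a column that still contains those placeholders. The sketch below uses only the standard library and the values shown in the tables to compare that median with the median of the observed values alone. It is offered as a point of comparison, not as the notebook's method, and the variable names are illustrative.

```python
import math
import statistics

# 'experience' after word-to-number conversion; the two zeros are placeholders for missing values.
with_placeholders = [0, 0, 5, 2, 7, 3, 10, 11]
# Only the values that were actually present in the file.
observed_only = [v for v in with_placeholders if v != 0]

median_with_placeholders = math.floor(statistics.median(with_placeholders))
median_observed_only = math.floor(statistics.median(observed_only))

print(median_with_placeholders)  # 4, the fill value used in the notebook
print(median_observed_only)      # 6, the median when placeholders are excluded
```

In pandas, leaving the missing entries as NaN would give the second behavior, since `Series.median()` skips NaN by default. Filling with 6 instead of 4 would change the fitted coefficients and the predictions reported later.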


In [None]:
# Calculate the median of the 'test_score' column and floor it (pandas skips the NaN when computing it).
median_test_score = np.floor(Hiring['test_score'].median())
# Fill NaN values in 'test_score' with that median, then convert the column to integer type.
Hiring['test_score'] = Hiring['test_score'].fillna(median_test_score).astype(int)
display(Hiring)

Unnamed: 0,experience,test_score,interview_score,salary
0,4,8,9,50000
1,4,8,6,45000
2,5,6,7,60000
3,2,10,10,65000
4,7,9,6,70000
5,3,7,10,62000
6,10,8,7,72000
7,11,7,8,80000


In [None]:
# Splitting data frame into dependent (salary) and independent variables (experience, test_score, interview_score)
X = Hiring.drop("salary", axis="columns")
y = Hiring["salary"]

Building and Testing model

In [None]:
# Build the model: fit an ordinary least squares linear regression on all rows
reg = linear_model.LinearRegression()
reg.fit(X, y)

In [None]:
reg.coef_

array([3390.87422644, 1803.45158198, 3184.69450013])

In [None]:
reg.intercept_

np.float64(4220.822801359711)
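
Read together, `reg.coef_` and `reg.intercept_` give the fitted equation. The coefficients follow the column order of `X` (experience, test_score, interview_score); the values below are rounded to two decimals:

$$
\widehat{\text{salary}} \approx 4220.82 + 3390.87\,\text{experience} + 1803.45\,\text{test\_score} + 3184.69\,\text{interview\_score}
$$

Under this fit, each additional year of experience adds about 3,391 dollars to the predicted salary when the two scores are held fixed.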

Predicting salaries for candidates

In [None]:
# Predicting the salary for a candidate with 2 years of experience, a test score of 9, and an interview score of 6.
reg.predict([[2,9,6]])

array([46341.80249281])

In [None]:
# Predicting the salary for a candidate with 12 years of experience, a test score of 10, and an interview score of 10.
reg.predict([[12,10,10]])

array([94792.77433975])
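
The two predictions above can be reproduced by hand from the printed coefficients and intercept, which is a useful check that the feature order passed to `reg.predict` matches the column order of `X`. This is a small sketch; `predict_salary` is a hypothetical helper, and the numbers are copied from the outputs shown earlier.

```python
# Values copied from the reg.coef_ and reg.intercept_ outputs shown earlier.
coef = [3390.87422644, 1803.45158198, 3184.69450013]  # experience, test_score, interview_score
intercept = 4220.822801359711

def predict_salary(experience, test_score, interview_score):
    """Evaluate intercept + sum(coef_i * feature_i), as a fitted linear model does."""
    features = [experience, test_score, interview_score]
    return intercept + sum(c * x for c, x in zip(coef, features))

salary_a = predict_salary(2, 9, 6)      # about 46341.80, matching reg.predict([[2, 9, 6]])
salary_b = predict_salary(12, 10, 10)   # about 94792.77, matching reg.predict([[12, 10, 10]])
print(round(salary_a, 2))
print(round(salary_b, 2))
```

Note that the second profile (12 years of experience) lies just outside the range seen in the training data (2 to 11 years), so that prediction is an extrapolation and should be read with more caution than the first.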