In [None]:
import pandas as pd
from sklearn.linear_model import LinearRegression
import pickle

**1.	import pandas as pd:** Import the pandas library and give it an alias 'pd'. This library is commonly used for data manipulation and analysis.

**2.	from sklearn.linear_model import LinearRegression:** Import the LinearRegression class from the scikit-learn (sklearn) library. Sklearn is a popular machine learning library, and LinearRegression is a class used to create a linear regression model, which is a basic type of supervised learning algorithm for regression tasks.

**3.	import pickle:** Import the pickle module, which is a standard Python library used for serializing and deserializing Python objects. In this code, it will be used to save the trained model to disk.


In [None]:
# dataset = pd.read_csv('https://raw.githubusercontent.com/9394113857/Data-Sets/raghu/hiring.csv')

In [None]:
dataset = pd.read_csv('https://raw.githubusercontent.com/9394113857/Predict-Salary-Analysis/raghu/hiring.csv')

**1.	pd:** It is the alias for the pandas library, which was imported earlier using the statement import pandas as pd.

**2.	read_csv:** This is a function provided by pandas to read data from a CSV (Comma Separated Values) file.

**3.	'https://raw.githubusercontent.com/9394113857/Predict-Salary-Analysis/raghu/hiring.csv':** This is the raw URL of the CSV file on GitHub. The file holds hiring data: each row records a candidate's experience, test score, interview score, and salary.

**4.	dataset = pd.read_csv(...):** This line of code calls the read_csv function with the provided URL as an argument and reads the data from the CSV file. The data is then stored in a pandas DataFrame named dataset.
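As a minimal sketch of what read_csv returns, the example below parses a small in-memory CSV instead of the GitHub URL, so it runs without network access. The rows are made-up stand-ins, assumed to resemble hiring.csv; only the column names follow the notebook's output.

```python
import io
import pandas as pd

# Hypothetical sample rows; the real file is fetched from the URL above.
csv_text = """experience,test_score,interview_score,salary
,8,9,50000
five,6,7,60000
two,,10,65000
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample).__name__)   # DataFrame
print(sample.shape)            # (3, 4)
print(sample.isna().sum())     # number of missing values per column
```

Empty fields in the CSV are read as NaN, which is why the later cells need to fill missing values before modeling.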



In [None]:
dataset.head() # retrieve the first few rows (by default, the first 5 rows) of the DataFrame.

# dataset: Refers to the pandas DataFrame containing the data loaded from the CSV file.
# .head(): This is a method in pandas that is used to retrieve the first few rows (by default, the first 5 rows) of the DataFrame.

Unnamed: 0,experience,test_score,interview_score,salary
0,,8.0,9,50000
1,,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000


In [None]:
# Fill missing values in 'experience' column with 0
dataset['experience'].fillna(0, inplace=True)


**1.	dataset['experience']:** This selects the 'experience' column from the DataFrame dataset.

**2.	fillna(0, inplace=True):** This method is called on the 'experience' column to fill the missing values with 0. The fillna method is used to replace any NaN (Not a Number) or missing values with the specified value, which in this case is 0.

**3.	inplace=True:** This argument ensures that the changes are made directly to the original DataFrame dataset, without the need to create a new DataFrame.

**Overview:**

After executing this code, every missing value in the 'experience' column is replaced with 0 in the original dataset, which treats a blank entry as no experience. Handling missing data at this stage matters because scikit-learn's LinearRegression raises an error when its input contains NaN values.
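Newer pandas versions warn when an inplace=True method is called on a column selected as dataset['experience'], because the selection may be a temporary copy. A minimal sketch of the assignment form, which does not rely on inplace, using a small made-up frame in place of dataset:

```python
import pandas as pd

# Made-up stand-in for the notebook's `dataset`.
demo = pd.DataFrame(
    {"experience": pd.Series([None, "five", "two"], dtype=object)}
)

# Assign the filled column back instead of using inplace=True.
demo["experience"] = demo["experience"].fillna(0)

print(demo["experience"].tolist())  # [0, 'five', 'two']
```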

In [None]:
# Fill missing values in 'test_score' column with the mean of the column
dataset['test_score'].fillna(dataset['test_score'].mean(), inplace=True)


**1.	dataset['test_score']:** This selects the 'test_score' column from the DataFrame dataset.

**2.	fillna(dataset['test_score'].mean(), inplace=True):** This method is called on the 'test_score' column to fill the missing values with the mean value of the column. The fillna method is used to replace any NaN (Not a Number) or missing values with the specified value, which in this case is the mean value of the 'test_score' column.

**3.	inplace=True:** This argument ensures that the changes are made directly to the original DataFrame dataset, without the need to create a new DataFrame.

**Overview:**

After executing this code, every missing value in the 'test_score' column is replaced with the mean of that column in the original dataset. Mean imputation keeps every row available for training and leaves the column's mean unchanged, at the cost of slightly understating its variance. Dropping rows with missing values instead would shrink an already small dataset.
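To make the imputation concrete, here is a small worked example with made-up scores. Series.mean() skips missing values by default, so the fill value is the average of the observed entries only.

```python
import pandas as pd

# Made-up scores with one missing entry.
scores = pd.Series([8.0, None, 6.0, 10.0], name="test_score")

fill_value = scores.mean()          # (8 + 6 + 10) / 3 = 8.0
filled = scores.fillna(fill_value)

print(fill_value)        # 8.0
print(filled.tolist())   # [8.0, 8.0, 6.0, 10.0]
```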



In [None]:
# Define a function to convert word values to integers
def convert_to_int(word):
    word_dict = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9,
                 'ten': 10, 'eleven': 11, 'twelve': 12, 'zero': 0, 0: 0}
    return word_dict[word]


**1.	def convert_to_int(word):** This line defines a new function named convert_to_int, which takes a single argument word.

**2.	word_dict:** This is a dictionary that maps word values to their corresponding integer representations. For example, the word 'one' maps to the integer 1, 'two' maps to 2, and so on. It also includes an entry for 'zero' and one for the integer 0, so that values already replaced by fillna(0) pass through unchanged.

**3.	return word_dict[word]:** Inside the function, it looks up the input word in the word_dict dictionary and returns the corresponding integer value. If the input word is not found in the dictionary, it will raise a KeyError.

**Overview:**

The purpose of this function is to provide a mapping between word representations of numbers and their corresponding integer values. For example, if you pass the word 'three' as an argument to this function, it will return the integer 3. This function can be useful when you have data with numbers represented as words and need to convert them to numerical values for calculations or further processing.
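Because an unknown word raises a bare KeyError, a slightly more defensive variant can make failures easier to diagnose. The sketch below is an assumption rather than part of the original notebook: the names WORD_TO_INT and convert_to_int_safe are hypothetical, and the function additionally accepts values that are already integers and ignores case and surrounding whitespace.

```python
WORD_TO_INT = {
    'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
    'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10,
    'eleven': 11, 'twelve': 12,
}

def convert_to_int_safe(value):
    """Map a number word (or an int) to an int, with a clear error."""
    if isinstance(value, int):
        return value                      # e.g. the 0 used for missing data
    key = str(value).strip().lower()
    if key not in WORD_TO_INT:
        raise ValueError(f"Unrecognized experience value: {value!r}")
    return WORD_TO_INT[key]

print(convert_to_int_safe('three'))   # 3
print(convert_to_int_safe(' Five '))  # 5
print(convert_to_int_safe(0))         # 0
```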



In [None]:
word_dict = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9,
             'ten': 10, 'eleven': 11, 'twelve': 12, 'zero': 0, 0: 0}

The provided code defines a dictionary named word_dict that maps word values to their corresponding integer representations.

**Here's a breakdown of the dictionary:**

1.	'one': 1: The word 'one' maps to the integer 1.
2.	'two': 2: The word 'two' maps to the integer 2.
3.	'three': 3: The word 'three' maps to the integer 3.
4.	'four': 4: The word 'four' maps to the integer 4.
5.	'five': 5: The word 'five' maps to the integer 5.
6.	'six': 6: The word 'six' maps to the integer 6.
7.	'seven': 7: The word 'seven' maps to the integer 7.
8.	'eight': 8: The word 'eight' maps to the integer 8.
9.	'nine': 9: The word 'nine' maps to the integer 9.
10.	'ten': 10: The word 'ten' maps to the integer 10.
11.	'eleven': 11: The word 'eleven' maps to the integer 11.
12.	'twelve': 12: The word 'twelve' maps to the integer 12.
13.	'zero': 0: The word 'zero' maps to the integer 0.
14.	0: 0: The integer 0 maps to itself. This entry handles rows whose missing 'experience' value was replaced with the integer 0 by fillna(0); without it, looking up 0 would raise a KeyError.

**Overview:**

This is the same dictionary that the convert_to_int function defines internally; it is repeated here only so that each entry can be explained. It maps word representations of numbers to their integer values, and the function looks up each input in it and returns the corresponding integer.



In [None]:
# Select the relevant columns and convert 'experience' values to integers
X = dataset.iloc[:, :3].copy()
X['experience'] = X['experience'].apply(lambda x: convert_to_int(x))


**1.	X = dataset.iloc[:, :3].copy():** It selects the first three columns of the DataFrame dataset and assigns an independent copy of them to a new DataFrame X, so that modifying X neither alters dataset nor triggers a SettingWithCopyWarning. These columns are the features (independent variables) used for prediction.

**2.	X['experience'] = X['experience'].apply(lambda x: convert_to_int(x)):** This line takes the 'experience' column from the DataFrame X and applies the convert_to_int function (defined earlier) to each value in the column. The apply method is used here to apply a function to each element of the column. The lambda x: convert_to_int(x) is a lambda function that takes an individual 'experience' value and converts it to its corresponding integer using the convert_to_int function. The result is then stored back in the 'experience' column of the DataFrame X, replacing the original word values with their integer representations.

**Overview:**

After executing this code, the DataFrame X contains the three feature columns (experience, test_score, and interview_score), with the 'experience' column converted to integers. This data can now be passed to a machine learning model, which expects numerical input.
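An equivalent way to express the conversion, sketched here on a made-up frame, is Series.map with the dictionary itself. Calling .copy() on the slice makes X_demo independent of the source frame, so the column assignment does not trigger a chained-assignment warning. The frame and the shortened dictionary are illustrative assumptions.

```python
import pandas as pd

word_dict = {'one': 1, 'two': 2, 'five': 5, 'seven': 7, 'zero': 0, 0: 0}

# Made-up stand-in for `dataset` after the missing values were filled.
demo = pd.DataFrame({
    'experience': pd.Series([0, 'five', 'two'], dtype=object),
    'test_score': [8.0, 6.0, 10.0],
    'interview_score': [9, 7, 10],
    'salary': [50000, 60000, 65000],
})

X_demo = demo.iloc[:, :3].copy()                      # features only
X_demo['experience'] = X_demo['experience'].map(word_dict)

print(X_demo['experience'].tolist())  # [0, 5, 2]
```

Unlike convert_to_int, map with a plain dictionary yields NaN for a word that is not in the dictionary instead of raising an error, so it is worth checking for NaN afterwards.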



In [None]:
# Select the target column
y = dataset.iloc[:, -1]


The provided code selects the target column from the DataFrame dataset and assigns it to a new variable y. Here's what the code does:

**1.	y = dataset.iloc[:, -1]:** This line uses the iloc indexer of pandas to select all rows and the last column of the DataFrame dataset. The iloc indexer allows selection based on integer-based positions. The -1 index refers to the last column of the DataFrame, which is commonly the target or dependent variable in a supervised learning task.

**Overview:**

After executing this code, the variable y will contain the target column from the original dataset. The target column typically contains the values that the machine learning model will try to predict or learn from the input features represented by DataFrame X. This separation of features (X) and target (y) is common when preparing data for machine learning tasks. The X and y can then be used to train and evaluate machine learning models.
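Positional selection with iloc depends on the column order. A sketch of the same split by column name, on a made-up frame and assuming the column names shown in the head() output:

```python
import pandas as pd

# Made-up stand-in for the cleaned `dataset`.
demo = pd.DataFrame({
    'experience': [0, 5, 2],
    'test_score': [8.0, 6.0, 10.0],
    'interview_score': [9, 7, 10],
    'salary': [50000, 60000, 65000],
})

feature_cols = ['experience', 'test_score', 'interview_score']
X_demo = demo[feature_cols]      # same columns as demo.iloc[:, :3]
y_demo = demo['salary']          # same column as demo.iloc[:, -1]

print(X_demo.shape)   # (3, 3)
print(y_demo.name)    # salary
```

Selecting by name keeps working if a column is later added or reordered in the file, whereas the positional form would silently pick different columns.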



In [None]:
# Create a Linear Regression model
regressor = LinearRegression()

# The provided code creates an instance of the LinearRegression model and assigns it to a variable named regressor.

In [None]:
# Fit the model with the training data
regressor.fit(X, y)


**1.	regressor:** This variable holds the instance of the LinearRegression model created earlier using LinearRegression().

**2.	fit(X, y):** The fit method is a common method in scikit-learn models used for training or fitting the model with the training data. In this case, X represents the training features (input data), and y represents the target variable (output data) that we want the model to learn and predict.

**3.  By calling regressor.fit(X, y),** the Linear Regression model is trained on the provided training data X and y. Fitting computes one coefficient per feature plus an intercept by ordinary least squares, that is, the values that minimize the sum of squared differences between the predicted and actual targets. With three features, the fitted relationship is a plane (hyperplane) rather than a single straight line. Once the training is completed, the model will be ready to make predictions on new, unseen data.

**Overview:**

After this code is executed, the regressor variable will hold the trained Linear Regression model, and it can be used to make predictions on new data using the .predict() method provided by scikit-learn.
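For intuition about what fitting does, the sketch below solves an ordinary least squares problem with NumPy on made-up data. It is an illustration under stated assumptions, not scikit-learn's internal code: the data are invented so that salary is an exact linear function of the three features, and the intercept is handled by appending a column of ones.

```python
import numpy as np

# Made-up features: experience, test_score, interview_score.
X_demo = np.array([
    [0.0, 8.0, 9.0],
    [5.0, 6.0, 7.0],
    [2.0, 10.0, 10.0],
    [7.0, 9.0, 6.0],
    [3.0, 7.0, 10.0],
])
# Invented relationship:
# salary = 20000 + 3000*experience + 2000*test + 1000*interview
true_coef = np.array([3000.0, 2000.0, 1000.0])
y_demo = 20000.0 + X_demo @ true_coef

# Append a column of ones so the last weight acts as the intercept.
A = np.hstack([X_demo, np.ones((X_demo.shape[0], 1))])
weights, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

coef, intercept = weights[:3], weights[3]
new_candidate = np.array([2.0, 9.0, 6.0])
predicted = float(new_candidate @ coef + intercept)

print(np.round(coef))       # recovered coefficients
print(round(predicted))     # 50000
```

Because the invented data are exactly linear, least squares recovers the coefficients used to generate them; on real data the fit is only the best approximation in the squared-error sense.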



In [None]:
# Save the trained model to a file
pickle.dump(regressor, open('MultiLinear_Salary.pkl', 'wb'))


In [None]:
# Load the saved model from the file
loaded_model = pickle.load(open('MultiLinear_Salary.pkl', 'rb'))


**1.	pickle:** This is the Python standard library module used for serializing and deserializing Python objects, including machine learning models.

**2.	pickle.dump(regressor, open('MultiLinear_Salary.pkl', 'wb')):** This line of code uses the pickle.dump() function to serialize and save the trained model (regressor) to a file named 'MultiLinear_Salary.pkl'.

**•	pickle.dump:** The pickle.dump() function is used to serialize the regressor object and save it to a file.

**•	regressor:** This is the trained Linear Regression model that we want to save.

**•	open('MultiLinear_Salary.pkl', 'wb'):** The open() function is used to open a file in binary write mode ('wb'). The file 'MultiLinear_Salary.pkl' will be created or overwritten if it already exists. The 'wb' mode is used because the pickle.dump() function expects a binary file to save the serialized data.

**Overview:**

After executing this code, the trained Linear Regression model (regressor) will be saved in a file named 'MultiLinear_Salary.pkl' in the current directory. This file can be later loaded using pickle.load() to reuse the trained model for making predictions on new data without having to retrain the model again.
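One detail worth noting: passing open(...) directly to pickle.dump or pickle.load leaves it to the interpreter to close the file at some later point. A sketch of the same round trip using with blocks, which close the file as soon as the block exits. A plain dictionary stands in for the trained model here, and the file name demo_model.pkl is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; any picklable object behaves the same way.
stand_in_model = {'coef': [3000.0, 2000.0, 1000.0], 'intercept': 20000.0}

path = os.path.join(tempfile.mkdtemp(), 'demo_model.pkl')

with open(path, 'wb') as f:          # file is closed when the block exits
    pickle.dump(stand_in_model, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == stand_in_model)    # True
```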

In [None]:
!pip install colorama

Collecting colorama
  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Installing collected packages: colorama
Successfully installed colorama-0.4.6


In [None]:
import joblib
import colorama
import warnings

from IPython.display import HTML

# Suppress warnings
warnings.filterwarnings("ignore")

# Initialize colorama to handle colored text
colorama.init()

# Load the pre-trained model
loaded_model = joblib.load('MultiLinear_Salary.pkl')

# Ask user for inputs
experience = float(input("Enter Experience | Eg: 2 "))
test_score = float(input("Enter Test Score | Eg: 9 "))
interview_score = float(input("Enter Interview Score | Eg: 6 "))

# Make prediction using the loaded model
prediction = loaded_model.predict([[experience, test_score, interview_score]])

# Print the results in bold
#print("\033[1mPredicted Salary (In Rupees):\033[0m", prediction[0])


print("===============================================")
bold_text = "<b>The predicted value from model:</b> " + str(prediction[0])
display(HTML(bold_text))
print("===============================================")


Enter Experience | Eg: 2 2
Enter Test Score | Eg: 9 9
Enter Interview Score | Eg: 6 6




**1.	Import Statements:**

•	import joblib: Used to load the pre-trained machine learning model from a file (Pickle format in this case).
•	import colorama: A library for colored console text. It is imported and initialized here, but the result below is formatted with HTML rather than with colorama.
•	import warnings: Used to suppress warning messages.
•	from IPython.display import HTML: Enables rendering HTML content, which we use to display text in bold.

**2.	Initialize Colorama:**

•	colorama.init(): Initializes the Colorama library, allowing us to use colored text in the console.

**3.	Load the Pre-trained Model:**

•	loaded_model = joblib.load('MultiLinear_Salary.pkl'): Loads the pre-trained machine learning model stored in the 'MultiLinear_Salary.pkl' file.

**4.	Ask User for Inputs:**

•	The code prompts the user to enter three inputs: experience, test_score, and interview_score. The user is expected to provide numerical values representing the years of experience, test score, and interview score, respectively.

**5.	Make Prediction Using the Loaded Model:**

•	prediction = loaded_model.predict([[experience, test_score, interview_score]]): Uses the loaded model to predict the salary based on the input provided by the user. The prediction result is stored in the variable prediction.

**6.	Display the Result in Bold:**

•	The code prepares a string bold_text that contains the prediction result. It wraps the result with HTML <b> tags to make it bold.
•	display(HTML(bold_text)): Renders the bold_text with HTML tags, displaying the prediction result in bold in the notebook's output cell (for example in Google Colab).

**7.	Additional Output Formatting:**

•	The code also prints a series of equal signs before and after displaying the result to create a visual separator for better readability.

**Overview:**

In summary, this code loads a pre-trained machine learning model, takes user input for experience, test score, and interview score, predicts the salary using the model, and then displays the result in bold using HTML tags. colorama is initialized but not needed for this HTML output, and warnings are suppressed to keep the output uncluttered.
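Since float(input(...)) raises a ValueError on non-numeric text, a small parsing helper can validate the values before they reach the model. The function name parse_number and the 0 to 10 score range are assumptions made for illustration; the notebook does not state the valid ranges.

```python
def parse_number(text, minimum=None, maximum=None):
    """Convert user-entered text to float and check optional bounds."""
    try:
        value = float(text.strip())
    except ValueError:
        raise ValueError(f"Expected a number, got {text!r}") from None
    if minimum is not None and value < minimum:
        raise ValueError(f"{value} is below the minimum {minimum}")
    if maximum is not None and value > maximum:
        raise ValueError(f"{value} is above the maximum {maximum}")
    return value

# Strings stand in for what input() would return.
experience = parse_number("2", minimum=0)
test_score = parse_number(" 9 ", minimum=0, maximum=10)
print([experience, test_score])  # [2.0, 9.0]
```

In the notebook, the helper would wrap each input(...) call, for example parse_number(input("Enter Test Score | Eg: 9 "), 0, 10).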



**Points to load .pkl file from github repository:**

1. You cannot directly use the URL in the open() function to access a file from GitHub. The open() function expects a local file path, not a URL.

2. To load the .pkl file from GitHub into Google Colab, you can use the requests library to download the file and then load it using pickle.

In [None]:
import requests
import pickle

# Replace the URL with the raw URL of your .pkl file in GitHub
url = 'https://raw.githubusercontent.com/9394113857/Dumped_Models/main/MultiLinear_Salary.pkl'

# Download the .pkl file from GitHub
response = requests.get(url)
response.raise_for_status()  # stop here if the download failed (e.g. 404)
with open('MultiLinear_Salary.pkl', 'wb') as f:
    f.write(response.content)

# Load the model from the downloaded .pkl file
with open('MultiLinear_Salary.pkl', 'rb') as f:
    model = pickle.load(f)
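If the file does not need to be kept on disk, downloaded bytes can also be deserialized directly with pickle.loads. The sketch below uses locally generated bytes as a stand-in for response.content, so it runs without network access. Only unpickle data from sources you trust, since unpickling can execute arbitrary code.

```python
import pickle

# Stand-in for `response.content`: the bytes of a pickled object.
payload = pickle.dumps({'intercept': 20000.0})

# Deserialize straight from memory, without an intermediate file.
obj = pickle.loads(payload)

print(obj)  # {'intercept': 20000.0}
```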


In [None]:
import joblib
import colorama
import warnings

from IPython.display import HTML

# Suppress warnings
warnings.filterwarnings("ignore")

# Initialize colorama to handle colored text
colorama.init()

# Load the pre-trained model
loaded_model = joblib.load('MultiLinear_Salary.pkl')

# Ask user for inputs
experience = float(input("Enter Experience | Eg: 2 "))
test_score = float(input("Enter Test Score | Eg: 9 "))
interview_score = float(input("Enter Interview Score | Eg: 6 "))

# Make prediction using the loaded model
prediction = loaded_model.predict([[experience, test_score, interview_score]])

# Print the results in bold
#print("\033[1mPredicted Salary (In Rupees):\033[0m", prediction[0])


print("===============================================")
bold_text = "<b>The predicted value from model:</b> " + str(prediction[0])
display(HTML(bold_text))
print("===============================================")


Enter Experience | Eg: 2 2
Enter Test Score | Eg: 9 9
Enter Interview Score | Eg: 6 6


