<div style="max-width: 600px; margin: 50px auto; padding: 20px; background-color: #f8f8f8; border-radius: 10px; box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);">
    <h1 style="color: #009688; text-align: center;">Welcome INVENTORS!</h1>
    <p style="color: #333; text-align: justify;">Welcome to our workshop on Supervised Machine Learning! In this session, we'll delve into the captivating realm of predictive analytics using various techniques within supervised learning. Supervised learning involves training a model on a labeled dataset, where each input is associated with an output, to learn a mapping between the input and output variables. Throughout this workshop, we'll explore algorithms such as linear regression, decision trees, support vector machines, and neural networks, among others. We'll learn how to build, evaluate, and deploy supervised learning models to tackle real-world problems, enabling you to harness the power of data-driven insights for impactful decision-making. Get ready to embark on an exhilarating journey into the exciting field of supervised machine learning! &#128640;</p>
</div>


# 📈 **Linear Regression**


<a target="_blank"><img src="https://media.geeksforgeeks.org/wp-content/uploads/20231123123151/ezgif-1-ba0c9540c5.gif" border="0" /></a>

<p>Linear regression predicts the value of one variable from the values of one or more other variables by fitting a linear relationship between them. The variable you want to predict is called the dependent variable (or target). The variables you use to make the prediction are called independent variables (or features).</p>
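
To make the idea concrete, here is a minimal sketch that fits a straight line to a handful of points with ordinary least squares. The data are hypothetical and generated from `y = 2x + 1`, so the fit should recover a slope close to 2 and an intercept close to 1. It uses NumPy's `np.linalg.lstsq` rather than the scikit-learn model used later in this notebook.

```python
import numpy as np

# Hypothetical toy data generated from y = 2x + 1 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so the fit includes an intercept.
A = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: minimise the sum of squared residuals ||A @ w - y||^2.
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

With real data the points do not lie exactly on a line, and the fitted slope and intercept are the values that make the squared prediction errors as small as possible.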

**Pandas is used heavily in machine learning workflows and throughout this notebook.**

<a target="_blank"><img src="https://imgs.search.brave.com/Nc6gXRnXrvfoFmjyGoBVq-UocGRxFQgCG68HN3aeL2g/rs:fit:500:0:0/g:ce/aHR0cHM6Ly9kYXRh/c2NpZW50ZXN0LmNv/bS9lbi93cC1jb250/ZW50L3VwbG9hZHMv/c2l0ZXMvOS8yMDIy/LzAxL2lsbHVfcGFu/ZGFzLTgyLTEwMjR4/NTYyLnBuZw" border="0"  style="width: 350px;"/></a>

### **What is pandas used for?**

Pandas is a powerful and versatile library for data manipulation and analysis in Python. It offers a wide range of data structures and functions for working with structured data, making it an essential tool for data scientists and analysts. With Pandas, you can easily load, manipulate, and analyze data from various sources such as CSV files, Excel spreadsheets, SQL databases, and more. Its intuitive and expressive API allows for tasks like data cleaning, transformation, aggregation, and visualization. Pandas is particularly well-suited for tasks like data exploration, data preprocessing, and feature engineering in preparation for machine learning tasks. Overall, Pandas provides a flexible and efficient toolkit for working with tabular and time-series data in Python.
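
As a small illustration of the operations mentioned above, the sketch below builds a toy table and applies a transformation, a filter, and an aggregation. The table and its column names are hypothetical and are not taken from the workshop dataset.

```python
import pandas as pd

# Hypothetical toy table; the column names are made up for illustration.
houses = pd.DataFrame({
    "neighborhood": ["A", "B", "A", "B"],
    "area": [1200, 1500, 900, 2000],
    "price": [200_000, 260_000, 150_000, 340_000],
})

# Transformation: derive a new column from existing ones.
houses["price_per_area"] = houses["price"] / houses["area"]

# Filtering: keep only the rows that satisfy a condition.
large = houses[houses["area"] > 1000]

# Aggregation: mean price per neighborhood.
mean_price = houses.groupby("neighborhood")["price"].mean()

print(large)
print(mean_price)
```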

<a target="_blank"><img src="https://imgs.search.brave.com/Ei5DQ8GPJjp2JmG9MMd2kGN7DmHCtiLOW3Wz3VndHOk/rs:fit:500:0:0/g:ce/aHR0cHM6Ly91cGxv/YWQud2lraW1lZGlh/Lm9yZy93aWtpcGVk/aWEvY29tbW9ucy8z/LzMxL051bVB5X2xv/Z29fMjAyMC5zdmc.svg" border="0" style="width: 350px;"></a>


### **What is numpy used for?**

NumPy is a fundamental library for scientific computing in Python, providing support for multidimensional arrays and matrices along with a collection of mathematical functions to operate on these arrays efficiently. It serves as the foundation for many other libraries in the Python ecosystem, enabling high-performance numerical computations. With NumPy, you can perform array manipulation, mathematical operations, linear algebra, random number generation, and more. Its array-oriented computing capabilities make it essential for tasks such as data analysis, machine learning, signal processing, and image processing. NumPy's concise syntax and optimized performance make it the go-to choice for handling large datasets and complex mathematical operations in Python.
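
The following sketch shows three of the capabilities described above on a tiny array: element-wise arithmetic, a reduction along an axis, and a matrix-vector product. The values are arbitrary and chosen only for illustration.

```python
import numpy as np

# A 2x3 matrix and a length-3 vector (values chosen only for illustration).
m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([1.0, 0.0, -1.0])

# Element-wise arithmetic applies to the whole array without a Python loop.
doubled = m * 2

# Reduction along an axis: axis=0 collapses the rows, giving one mean per column.
col_means = m.mean(axis=0)

# Linear algebra: matrix-vector product.
product = m @ v

print(doubled)
print(col_means)
print(product)
```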


In [1]:
import numpy as np
import pandas as pd

<a target="_blank"><img src="https://s4.uupload.ir/files/download_(1)_slz6.png" border="0" style="width: 350px;" /></a>

### **What is scikit-learn used for?**

Scikit-learn (imported as `sklearn`) is a widely used machine learning library for Python. It provides efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction, through a consistent interface.

<p>The next cell imports the scikit-learn components used in this notebook: <code>LinearRegression</code> for creating the linear regression model, <code>train_test_split</code> for splitting the dataset into training and testing sets, <code>mean_squared_error</code> and <code>r2_score</code> for evaluating model accuracy, and <code>SimpleImputer</code> for handling missing values in the dataset.</p>



In [2]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

from sklearn.metrics import mean_squared_error, r2_score
from sklearn.impute import SimpleImputer

<p>The next cell loads the CSV file <code>train.csv</code> from the specified path using pandas' <code>read_csv()</code> function.</p>

<p>
<code>pd.get_dummies(df_1)</code> uses the pandas <code>get_dummies()</code> function to convert categorical variables into dummy (one-hot) indicator columns, so that they can be used by models that expect numeric input. The resulting DataFrame keeps the numeric columns of <code>df_1</code> unchanged and replaces each categorical column with one indicator column per category, appended after the numeric columns.
</p>
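
A minimal sketch of what `get_dummies()` does, using a hypothetical three-row frame rather than the real dataset. The column names are borrowed from the housing data only for readability.

```python
import pandas as pd

# Hypothetical toy frame with one categorical and two numeric columns.
toy = pd.DataFrame({
    "Street": ["Pave", "Grvl", "Pave"],
    "LotArea": [8450, 9600, 11250],
    "SalePrice": [208500, 181500, 223500],
})

encoded = pd.get_dummies(toy)

# Numeric columns come first; the indicator columns are appended at the end.
print(list(encoded.columns))
# ['LotArea', 'SalePrice', 'Street_Grvl', 'Street_Pave']
```

Note that the column order changes: once categorical columns are encoded, the last column of the result is an indicator column, not necessarily the column that was last in the input.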
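<p>
<code>df.iloc[:, :-1].values</code> selects all rows and every column except the last one from the DataFrame <code>df</code>. The <code>.iloc</code> accessor selects data by integer position: in <code>[:, :-1]</code>, the first <code>:</code> means all rows and <code>:-1</code> means all columns up to, but not including, the last. <code>.values</code> returns the selection as a NumPy array. Note that after <code>get_dummies()</code> the last column of <code>df</code> is the dummy column <code>SaleCondition_Partial</code>, not <code>SalePrice</code>, so this slice still contains the target column.
</p>
<p>
<code>df['SalePrice']</code> selects the column named 'SalePrice' from <code>df</code>. These are the target values, which are assigned to the variable <code>y</code>.
</p>
<p>
<code>train_test_split(x, y, test_size=0.2, random_state=42)</code> then holds out 20% of the rows as a test set, using a fixed seed so that the split is reproducible.
</p>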


In [3]:
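# Load the Kaggle house-prices training data and one-hot encode the categorical columns
df_1 = pd.read_csv('/kaggle/input/house-prices-advanced-regression-techniques/train.csv')
df = pd.get_dummies(df_1)

# All columns except the last one. Caution: the last column of df is a dummy
# column, so x still includes 'SalePrice' (the target leaks into the features)
x = df.iloc[:, :-1].values
y = df['SalePrice']
# Hold out 20% of the rows for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)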
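
Because `get_dummies()` appends the indicator columns at the end, slicing by position with `iloc[:, :-1]` does not remove `SalePrice` from the features. The sketch below shows this on a hypothetical five-row frame and then selects the target by name instead, which is one way to keep it out of the feature matrix. The toy data and the variable names (`toy`, `leaky`, `features`, `target`) are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for the encoded housing data.
toy = pd.get_dummies(pd.DataFrame({
    "Street": ["Pave", "Grvl", "Pave", "Pave", "Grvl"],
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
}), dtype=int)

# Positional slicing drops the last dummy column, not the target.
leaky = toy.iloc[:, :-1]
print("SalePrice" in leaky.columns)   # the target is still among the features

# Selecting by name removes exactly the target column.
features = toy.drop(columns=["SalePrice"])
target = toy["SalePrice"]

X_tr, X_te, y_tr, y_te = train_test_split(
    features.values, target, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)
```

When the target is left among the features, a linear model can reproduce it almost exactly, so the evaluation scores stop reflecting real predictive ability.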

<p>The next cell counts the null values in each column of the DataFrame <code>df</code> and prints the result.</p>

In [4]:
null_counts = df.isnull().sum()
print(null_counts)

Id                         0
MSSubClass                 0
LotFrontage              259
LotArea                    0
OverallQual                0
                        ... 
SaleCondition_AdjLand      0
SaleCondition_Alloca       0
SaleCondition_Family       0
SaleCondition_Normal       0
SaleCondition_Partial      0
Length: 289, dtype: int64
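
With several hundred columns, the printed summary is truncated and most entries are zero. A possible refinement, sketched here on a hypothetical toy frame, is to keep only the columns that actually have missing values and sort them by count.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; NaN marks a missing value.
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan],
    "LotArea": [8450, 9600, 11250, 9550],
    "GarageYrBlt": [2003.0, 1976.0, np.nan, 1998.0],
})

toy_null_counts = toy.isnull().sum()

# Keep only columns with at least one missing value, largest count first.
missing = toy_null_counts[toy_null_counts > 0].sort_values(ascending=False)
print(missing)
```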


<p>The next cell uses scikit-learn's <code>SimpleImputer</code> to handle missing values in the dataset.</p>
<p>It creates an imputer with the <code>mean</code> strategy, which replaces each missing value with the mean of its column. The imputer is fitted on the training data <code>X_train</code> only, and the column means learned there are then used to fill the missing values in both <code>X_train</code> and <code>X_test</code>.</p>
<p>Fitting on the training set alone keeps information from the test set out of the preprocessing step, while still applying the same imputation consistently to both sets.</p>

In [5]:
#imputer = SimpleImputer(strategy='most_frequent')
imputer = SimpleImputer(strategy='mean')
X_train_imputed = imputer.fit_transform(X_train)
X_test_imputed = imputer.transform(X_test)
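
To see the fit/transform distinction on a small scale, the sketch below imputes two hypothetical arrays. The means are learned from `train` and then reused to fill `test`, even though `test` has no observed values of its own.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy arrays; np.nan marks a missing value.
train = np.array([[1.0, 10.0],
                  [3.0, np.nan],
                  [np.nan, 30.0]])
test = np.array([[np.nan, np.nan]])

toy_imputer = SimpleImputer(strategy="mean")
train_filled = toy_imputer.fit_transform(train)   # learns column means from train
test_filled = toy_imputer.transform(test)         # reuses those means on test

print(toy_imputer.statistics_)
print(test_filled)
```

The learned means are stored in `statistics_`, here 2.0 for the first column and 20.0 for the second, and those are the values written into the test row.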

<p>The next cell trains a linear regression model on the imputed training data <code>X_train_imputed</code> and the corresponding target values <code>y_train</code>, then uses the trained model to predict the target values for the imputed testing data <code>X_test_imputed</code>.</p>

In [6]:
model = LinearRegression()
model.fit(X_train_imputed, y_train)

y_pred = model.predict(X_test_imputed)
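
The same fit/predict pattern can be checked on data where the answer is known. This sketch uses hypothetical noise-free data generated from `y = 3*x1 - 2*x2 + 5`, so the fitted coefficients and intercept should come out close to those values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data generated from y = 3*x1 - 2*x2 + 5.
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0]])
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

toy_model = LinearRegression()
toy_model.fit(X, y)

# The fitted parameters are exposed as attributes after fit().
print(toy_model.coef_, toy_model.intercept_)

# predict() computes intercept_ + X @ coef_ for new rows.
new_rows = np.array([[5.0, 1.0]])
print(toy_model.predict(new_rows))
```

Inspecting `coef_` on a real model is also a quick sanity check: a coefficient of almost exactly 1 on a single column, with the rest near zero, suggests that column is the target itself.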

<p>The next cell calculates the mean squared error (MSE) and the coefficient of determination (R-squared) between the predicted target values <code>y_pred</code> and the actual target values <code>y_test</code>, then prints both to evaluate the performance of the linear regression model. A lower MSE and an R-squared closer to 1 indicate a better fit. An R-squared of exactly 1.0 together with an MSE near zero usually means the target has leaked into the features, not that the model is perfect.</p>

In [7]:
mse = mean_squared_error(y_test, y_pred)
Rsquared = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("Rsquared (R^2 Score):", Rsquared)

Mean Squared Error: 2.19273116022413e-20
Rsquared (R^2 Score): 1.0
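
For reference, both metrics are simple to compute by hand. MSE is the mean of the squared residuals, and R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below applies those definitions with NumPy to hypothetical values. It is meant to illustrate the formulas, not to replace `mean_squared_error` and `r2_score`.

```python
import numpy as np

# Hypothetical true and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 7.5, 9.0])

# MSE: mean of the squared residuals.
mse_manual = np.mean((y_true - y_hat) ** 2)

# R^2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true.
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mse_manual, r2_manual)
```

Here the residuals are 0.5, 0, -0.5 and 0, which gives an MSE of 0.125 and an R-squared of 0.975.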
