**Applying Multiple Linear Regression to Estimate an Individual's Medical Insurance Charges from the Factors in the Dataset**

---



*Importing Libraries*

---



In [0]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

*Importing the Dataset*

---



In [0]:
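dataset = pd.read_csv("insurance.csv")
X = dataset.iloc[:, :-1].values  # features: every column except the last (age ... region)
Y = dataset.iloc[:, -1:].values  # target: the "charges" column, kept 2-D with shape (n, 1)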
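`Y` is selected with the slice `-1:` rather than the integer `-1`. The slice keeps the target as a two-dimensional column of shape (n, 1), which is why `Y`, and later `Y_pred`, print as nested lists. The minimal sketch below uses a two-row stand-in DataFrame (values copied from `dataset.head()`) instead of the real file, purely to show the shape each selection produces.

```python
import pandas as pd

# Two-row stand-in with the same column layout as insurance.csv.
df = pd.DataFrame({
    "age": [19, 18],
    "sex": ["female", "male"],
    "bmi": [27.9, 33.77],
    "children": [0, 1],
    "smoker": ["yes", "no"],
    "region": ["southwest", "southeast"],
    "charges": [16884.924, 1725.5523],
})

X = df.iloc[:, :-1].values      # all columns except the last -> features
Y_2d = df.iloc[:, -1:].values   # slice -1: keeps a column -> shape (n, 1)
Y_1d = df.iloc[:, -1].values    # integer -1 selects a Series -> shape (n,)

print(X.shape, Y_2d.shape, Y_1d.shape)  # (2, 6) (2, 1) (2,)
```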

*Printing the Dataset Values*

---



In [3]:
print(dataset.head())

   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520


In [4]:
print(X)

[[19 'female' 27.9 0 'yes' 'southwest']
 [18 'male' 33.77 1 'no' 'southeast']
 [28 'male' 33.0 3 'no' 'southeast']
 ...
 [18 'female' 36.85 0 'no' 'southeast']
 [21 'female' 25.8 0 'no' 'southwest']
 [61 'female' 29.07 0 'yes' 'northwest']]


In [19]:
print(Y)

[[16884.92]
 [ 1725.55]
 [ 4449.46]
 ...
 [ 1629.83]
 [ 2007.94]
 [29141.36]]


*Encoding Categorical Data*

---



*Encoding the Categorical Independent Variables: Sex, Smoker, Region*

In [0]:
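from sklearn.preprocessing import LabelEncoder

# LabelEncoder maps each category to an integer in alphabetical order.
# The same encoder is refit on each column, so only the last mapping (region) is retained.
le = LabelEncoder()
X[:, 1] = le.fit_transform(X[:, 1])  # sex:    female -> 0, male -> 1
X[:, 4] = le.fit_transform(X[:, 4])  # smoker: no -> 0, yes -> 1
X[:, 5] = le.fit_transform(X[:, 5])  # region: northeast -> 0, ..., southwest -> 3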
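`LabelEncoder` sorts the distinct labels and uses each label's position in that sorted list as its code. This is why, in the encoded output, female/male become 0/1, no/yes become 0/1, and southwest becomes 3. A small sketch on a hand-written list of region names shows the mapping through the encoder's `classes_` attribute.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
regions = ["southwest", "southeast", "southeast", "northwest", "northeast"]
codes = le.fit_transform(regions)

# classes_ holds the labels in sorted order; a label's code is its index here.
print(le.classes_.tolist())  # ['northeast', 'northwest', 'southeast', 'southwest']
print(codes.tolist())        # [3, 2, 2, 1, 0]
```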


*Printing the Encoded Values from Dataset*

In [21]:
print(X)

[[19 0 27.9 0 1 3]
 [18 1 33.77 1 0 2]
 [28 1 33.0 3 0 2]
 ...
 [18 0 36.85 0 0 2]
 [21 0 25.8 0 0 3]
 [61 0 29.07 0 1 1]]
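Label-encoding `region` gives it an arbitrary numeric order (0 < 1 < 2 < 3), so the linear model fits a single slope across regions as if they were evenly spaced. A common alternative, which is not the approach used in this notebook and would change the fitted coefficients, is one-hot encoding with `ColumnTransformer` and `OneHotEncoder`. The sketch below applies it to three hand-copied rows; `drop="first"` removes one dummy per feature to avoid perfect collinearity with the intercept.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Rows in the original column order: age, sex, bmi, children, smoker, region
X_raw = np.array([
    [19, "female", 27.9, 0, "yes", "southwest"],
    [18, "male", 33.77, 1, "no", "southeast"],
    [33, "male", 22.705, 0, "no", "northwest"],
], dtype=object)

# One-hot encode sex, smoker and region (columns 1, 4, 5); pass age, bmi, children through.
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(drop="first"), [1, 4, 5])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X_encoded = ct.fit_transform(X_raw).astype(float)

# Columns: sex_male, smoker_yes, region_southeast, region_southwest, age, bmi, children
print(X_encoded.shape)  # (3, 7)
```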


*Splitting the Dataset into the Training Set and Test Set*

---






In [0]:
from sklearn.model_selection import train_test_split
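# 70% of the rows go to training and 30% to testing. Without random_state the split,
# and therefore every output below, changes from run to run.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)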
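Because `train_test_split` shuffles the rows randomly, the printed arrays and the predictions below differ every time the notebook is re-run. Passing `random_state` fixes the shuffle. The sketch below uses a toy 10-row array and an arbitrary seed of 42 (both assumptions, not values from the notebook) to show the resulting sizes and that the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features
Y = np.arange(10).reshape(10, 1)   # 2-D target, like Y in the notebook

# test_size=0.3 puts ceil(0.3 * 10) = 3 samples in the test set.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

# Re-running with the same random_state reproduces exactly the same split.
X_train2, X_test2, _, _ = train_test_split(X, Y, test_size=0.3, random_state=42)

print(X_train.shape, X_test.shape)      # (7, 2) (3, 2)
print(np.array_equal(X_test, X_test2))  # True
```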

*Printing the Training and Test Set Features*

---



In [23]:
print(X_train)

[[29 1 22.515 3 0 0]
 [31 1 34.39 3 1 1]
 [49 0 22.61 1 0 1]
 ...
 [35 1 27.1 1 0 3]
 [64 1 24.7 1 0 1]
 [22 0 28.05 0 0 2]]


In [24]:
print(X_test)

[[43 1 23.2 0 0 3]
 [22 0 30.4 0 1 1]
 [60 1 24.32 0 0 1]
 ...
 [30 0 28.405 1 0 1]
 [60 0 28.7 1 0 3]
 [34 1 30.8 0 1 3]]


*Printing the Training and Test Set Charges*

---



In [25]:
print(Y_train)

[[ 5209.58]
 [38746.36]
 [ 9566.99]
 [38344.57]
 [19040.88]
 [19107.78]
 [13747.87]
 [ 1635.73]
 [13429.04]
 [ 9617.66]
 [ 8240.59]
 [ 8605.36]
 [ 1837.24]
 [ 2198.19]
 [12146.97]
 [ 3070.81]
 [21797.  ]
 [ 4466.62]
 [16776.3 ]
 [ 9058.73]
 [ 6875.96]
 [37079.37]
 [ 7731.43]
 [39722.75]
 [11090.72]
 [10796.35]
 [47896.79]
 [ 4949.76]
 [17904.53]
 [14394.56]
 [11381.33]
 [ 5428.73]
 [55135.4 ]
 [ 3561.89]
 [27117.99]
 [24535.7 ]
 [ 2899.49]
 [25656.58]
 [23241.47]
 [ 3994.18]
 [10594.5 ]
 [ 4795.66]
 [ 2211.13]
 [ 8944.12]
 [ 3578.  ]
 [ 4915.06]
 [ 3167.46]
 [ 3704.35]
 [13112.6 ]
 [12129.61]
 [ 1725.55]
 [24393.62]
 [11737.85]
 [12622.18]
 [20177.67]
 [21344.85]
 [ 2459.72]
 [11842.62]
 [10197.77]
 [11396.9 ]
 [15230.32]
 [14133.04]
 [10791.96]
 [ 2498.41]
 [ 9855.13]
 [18157.88]
 [ 6414.18]
 [10560.49]
 [ 4751.07]
 [ 1261.44]
 [21472.48]
 [47055.53]
 [ 9704.67]
 [ 7222.79]
 [12105.32]
 [ 4877.98]
 [10436.1 ]
 [11576.13]
 [39725.52]
 [ 2104.11]
 [ 2136.88]
 [ 1625.43]
 [39983.43]
 [ 3

In [26]:
print(Y_test)

[[ 6250.44]
 [33907.55]
 [12523.6 ]
 [ 8823.99]
 [ 9222.4 ]
 [ 2166.73]
 [36950.26]
 [19496.72]
 [36149.48]
 [ 3579.83]
 [34254.05]
 [ 3077.1 ]
 [ 9630.4 ]
 [ 4837.58]
 [13217.09]
 [25309.49]
 [ 1615.77]
 [10797.34]
 [ 9282.48]
 [52590.83]
 [39611.76]
 [ 7419.48]
 [17748.51]
 [12142.58]
 [23288.93]
 [16884.92]
 [ 1702.46]
 [12981.35]
 [19199.94]
 [48173.36]
 [ 9724.53]
 [13405.39]
 [26392.26]
 [17043.34]
 [ 4718.2 ]
 [ 2302.3 ]
 [10407.09]
 [ 4350.51]
 [ 7726.85]
 [39774.28]
 [ 8124.41]
 [19444.27]
 [ 2709.11]
 [ 4449.46]
 [44641.2 ]
 [ 8534.67]
 [13616.36]
 [ 3972.92]
 [14410.93]
 [ 1621.88]
 [ 9880.07]
 [31620.  ]
 [ 3385.4 ]
 [11264.54]
 [18903.49]
 [12268.63]
 [ 8798.59]
 [ 2304.  ]
 [ 3847.67]
 [17179.52]
 [ 1708.  ]
 [10942.13]
 [42211.14]
 [ 2480.98]
 [24106.91]
 [11520.1 ]
 [24915.22]
 [11842.44]
 [ 1727.54]
 [35147.53]
 [ 5472.45]
 [ 1242.26]
 [ 4347.02]
 [12609.89]
 [ 3176.82]
 [ 4260.74]
 [ 4646.76]
 [11073.18]
 [22395.74]
 [37607.53]
 [16069.08]
 [14283.46]
 [17626.24]
 [19

*Training the Multiple Linear Regression Model on the Training Set*

---



In [27]:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train,Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
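`LinearRegression` performs ordinary least squares: it chooses the intercept and coefficients that minimise the sum of squared differences between predicted and actual values. The sketch below, on synthetic data (not the insurance dataset), checks that the scikit-learn fit matches a direct least-squares solve with `np.linalg.lstsq` once a column of ones is added for the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                          # synthetic features
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 4.0 + rng.normal(scale=0.1, size=50)

model = LinearRegression().fit(X, y)

# Same fit by hand: prepend a column of ones for the intercept, then solve least squares.
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(model.intercept_, model.coef_)
print(beta)  # beta[0] is the intercept, beta[1:] are the slopes
```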

*Predicting the Values of the Test Set*

---



In [0]:
Y_pred=regressor.predict(X_test)
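`predict` simply evaluates the fitted linear equation for every test row. With a 2-D target like `Y_train`, `coef_` has shape (1, n_features) and `intercept_` has shape (1,), so the predictions equal `X_test @ coef_.T + intercept_`. The sketch below confirms this on random synthetic arrays standing in for the notebook's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X_train = rng.normal(size=(30, 4))
Y_train = rng.normal(size=(30, 1))   # 2-D target, as in the notebook
X_test = rng.normal(size=(5, 4))

regressor = LinearRegression().fit(X_train, Y_train)
Y_pred = regressor.predict(X_test)

# Evaluate the linear equation by hand and compare.
manual = X_test @ regressor.coef_.T + regressor.intercept_
print(Y_pred.shape)                 # (5, 1)
print(np.allclose(Y_pred, manual))  # True
```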

*Printing the Predicted Values*

---



In [15]:
np.set_printoptions(precision=2)
print(Y_pred)

[[ 1.45e+04]
 [ 1.56e+04]
 [ 1.00e+04]
 [ 4.95e+03]
 [ 8.45e+03]
 [ 1.51e+04]
 [ 9.65e+03]
 [ 1.14e+04]
 [ 3.16e+04]
 [ 3.70e+03]
 [ 9.77e+02]
 [ 2.69e+04]
 [ 1.74e+03]
 [ 4.37e+03]
 [ 2.97e+04]
 [ 1.52e+03]
 [ 7.63e+03]
 [ 2.66e+04]
 [ 9.12e+03]
 [ 1.17e+04]
 [ 1.28e+04]
 [ 1.54e+04]
 [ 3.58e+04]
 [ 1.29e+04]
 [ 3.37e+02]
 [ 4.63e+03]
 [ 9.90e+03]
 [ 8.11e+03]
 [ 6.72e+03]
 [ 9.72e+03]
 [ 7.78e+03]
 [ 3.45e+04]
 [ 3.07e+03]
 [ 8.41e+02]
 [ 3.29e+04]
 [ 3.33e+04]
 [ 1.18e+04]
 [ 6.49e+03]
 [ 1.29e+04]
 [ 5.32e+03]
 [ 4.68e+03]
 [ 9.32e+03]
 [ 3.24e+04]
 [ 4.05e+04]
 [ 3.32e+04]
 [ 1.51e+04]
 [-1.52e+03]
 [ 1.15e+04]
 [ 1.29e+04]
 [ 9.70e+03]
 [ 3.48e+04]
 [ 8.26e+03]
 [ 9.47e+03]
 [ 1.14e+04]
 [ 4.06e+04]
 [ 1.13e+04]
 [ 2.91e+03]
 [ 2.96e+04]
 [ 7.44e+03]
 [ 8.31e+03]
 [ 1.85e+04]
 [ 2.68e+04]
 [ 5.86e+03]
 [ 7.32e+03]
 [ 8.08e+03]
 [ 1.29e+04]
 [ 2.99e+03]
 [ 1.01e+04]
 [ 1.18e+04]
 [ 4.85e+03]
 [ 9.76e+03]
 [ 1.01e+04]
 [ 6.93e+03]
 [ 1.14e+04]
 [ 9.35e+03]
 [ 1.50e+04]
 [ 6.07e+03]

*Placing the Predicted Values (left column) Beside the Actual Test Values (right column) for Comparison*

---



In [28]:
print(np.concatenate((Y_pred,Y_test),1))

[[ 1.45e+04  6.25e+03]
 [ 1.56e+04  3.39e+04]
 [ 1.00e+04  1.25e+04]
 [ 4.95e+03  8.82e+03]
 [ 8.45e+03  9.22e+03]
 [ 1.51e+04  2.17e+03]
 [ 9.65e+03  3.70e+04]
 [ 1.14e+04  1.95e+04]
 [ 3.16e+04  3.61e+04]
 [ 3.70e+03  3.58e+03]
 [ 9.77e+02  3.43e+04]
 [ 2.69e+04  3.08e+03]
 [ 1.74e+03  9.63e+03]
 [ 4.37e+03  4.84e+03]
 [ 2.97e+04  1.32e+04]
 [ 1.52e+03  2.53e+04]
 [ 7.63e+03  1.62e+03]
 [ 2.66e+04  1.08e+04]
 [ 9.12e+03  9.28e+03]
 [ 1.17e+04  5.26e+04]
 [ 1.28e+04  3.96e+04]
 [ 1.54e+04  7.42e+03]
 [ 3.58e+04  1.77e+04]
 [ 1.29e+04  1.21e+04]
 [ 3.37e+02  2.33e+04]
 [ 4.63e+03  1.69e+04]
 [ 9.90e+03  1.70e+03]
 [ 8.11e+03  1.30e+04]
 [ 6.72e+03  1.92e+04]
 [ 9.72e+03  4.82e+04]
 [ 7.78e+03  9.72e+03]
 [ 3.45e+04  1.34e+04]
 [ 3.07e+03  2.64e+04]
 [ 8.41e+02  1.70e+04]
 [ 3.29e+04  4.72e+03]
 [ 3.33e+04  2.30e+03]
 [ 1.18e+04  1.04e+04]
 [ 6.49e+03  4.35e+03]
 [ 1.29e+04  7.73e+03]
 [ 5.32e+03  3.98e+04]
 [ 4.68e+03  8.12e+03]
 [ 9.32e+03  1.94e+04]
 [ 3.24e+04  2.71e+03]
 [ 4.05e+04
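Reading the two columns side by side gives only a rough impression of accuracy. Summary metrics from `sklearn.metrics` condense the comparison into a few numbers: mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination R². The sketch below uses four hypothetical charge values rather than the notebook's `Y_test` and `Y_pred`, so the numbers only illustrate the calculation.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted charges; in the notebook these would be Y_test and Y_pred.
Y_test = np.array([[10000.0], [20000.0], [30000.0], [40000.0]])
Y_pred = np.array([[12000.0], [18000.0], [31000.0], [39000.0]])

mae = mean_absolute_error(Y_test, Y_pred)            # average absolute error
rmse = np.sqrt(mean_squared_error(Y_test, Y_pred))   # penalises large errors more
r2 = r2_score(Y_test, Y_pred)                        # 1.0 would be a perfect fit

print(mae, rmse, r2)
```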

*Predicting the Charges for a Single Individual*

---



In [29]:
# Encoded input row: age 20, male (1), BMI 26.2, 2 children, smoker (1), northeast (0)
regressor.predict([[20,1,26.2,2,1,0]])

array([[26384.02]])
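The row passed to `predict` must use the same column order and integer codes as the training data: age, sex, bmi, children, smoker, region, with the alphabetical `LabelEncoder` codes. The sketch below writes those codes out by hand in a hypothetical helper, `encode_individual`, so that a new person can be described with readable labels instead of raw numbers.

```python
# Codes follow LabelEncoder's alphabetical ordering, matching the encoded X above.
SEX = {"female": 0, "male": 1}
SMOKER = {"no": 0, "yes": 1}
REGION = {"northeast": 0, "northwest": 1, "southeast": 2, "southwest": 3}


def encode_individual(age, sex, bmi, children, smoker, region):
    """Hypothetical helper: build one feature row in the notebook's column order."""
    return [age, SEX[sex], bmi, children, SMOKER[smoker], REGION[region]]


row = encode_individual(20, "male", 26.2, 2, "yes", "northeast")
print(row)  # [20, 1, 26.2, 2, 1, 0]
# regressor.predict([row]) would then reproduce the single prediction above.
```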

*Obtaining the Final Multiple Regression Equation*

---



In [30]:
print(regressor.coef_)
print(regressor.intercept_)

[[  248.6   -454.01   307.31   413.27 23573.51  -276.15]]
[-10585.56]


***The Final Equation Can be Represented as:***

---


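$$\text{Charges} = 248.6\,\text{age} - 454.01\,\text{sex} + 307.31\,\text{bmi} + 413.27\,\text{children} + 23573.51\,\text{smoker} - 276.15\,\text{region} - 10585.56$$

where sex, smoker and region take the label-encoded values used above (sex: female = 0, male = 1; smoker: no = 0, yes = 1; region: northeast = 0, northwest = 1, southeast = 2, southwest = 3).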
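As a worked check, plugging the earlier single-prediction input (age 20, male, BMI 26.2, two children, smoker, northeast) into the equation with the printed coefficients reproduces the model's output of 26384.02 to within a few cents. The small gap comes from the coefficients being printed rounded to two decimals.

```python
import numpy as np

# Coefficients and intercept as printed (rounded to 2 decimals) by the notebook.
coef = np.array([248.6, -454.01, 307.31, 413.27, 23573.51, -276.15])
intercept = -10585.56

# Encoded input used earlier: age, sex, bmi, children, smoker, region
x = np.array([20, 1, 26.2, 2, 1, 0])

charges = coef @ x + intercept
print(round(charges, 2))  # 26384.0
```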