Explain in simple terms the name "Quantile Regression".
 
 
Imagine you have a set of data points, such as the exam scores of students in a class. The average score tells you what a "typical" score looks like. But what if you want to know more than that, such as the scores that divide the students into groups?

That's where quantile regression comes in. Regular (least-squares) regression predicts the average of the outcome for a given set of inputs. Quantile regression instead predicts a chosen cut-off point of the outcome, for example the value that half, or 90%, of the outcomes fall below. These cut-off points, which split your data into portions, are called "quantiles."

For example, the median is a type of quantile (the 0.5 quantile). It's the score that separates the top half of students from the bottom half. With quantile regression, you can target other quantiles too, like the score that separates the top 10% of students from the rest (the 0.9 quantile), or the score that separates the top 25% (the 0.75 quantile).
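To make the idea of a quantile concrete, here is a minimal sketch that computes the three cut-off scores mentioned above. The `scores` array is made up for illustration, and `np.quantile` is used with its default linear interpolation between neighbouring values.

```python
import numpy as np

# Hypothetical exam scores for ten students (made up for illustration).
scores = np.array([35, 42, 50, 55, 61, 68, 74, 80, 88, 95])

median = np.quantile(scores, 0.5)   # half the class scores below this value
q75 = np.quantile(scores, 0.75)     # separates the top 25% from the rest
q90 = np.quantile(scores, 0.9)      # separates the top 10% from the rest

print(median, q75, q90)
```

These are plain quantiles of one column of numbers. Quantile regression goes one step further and lets such a cut-off depend on other variables.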

So, in simple terms, quantile regression is regression aimed at a chosen quantile: it estimates how a specific dividing point of the outcome, rather than the average, changes with the inputs.
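The link between "quantile" and "regression" is the loss function being minimised. Ordinary regression minimises squared error, which leads to the mean, while quantile regression minimises the so-called pinball loss, which leads to a quantile. The sketch below is a pure-Python illustration under simple assumptions: the helper names `pinball_loss` and `best_constant` and the `scores` list are hypothetical, and the "model" is just a single constant with no input features.

```python
def pinball_loss(y_true, prediction, q):
    """Average pinball (quantile) loss of one constant prediction."""
    total = 0.0
    for y in y_true:
        error = y - prediction
        # Under-predictions are weighted by q, over-predictions by (1 - q).
        total += q * error if error >= 0 else (q - 1) * error
    return total / len(y_true)


def best_constant(y_true, q):
    """Return the observed value that minimises the pinball loss at level q."""
    return min(y_true, key=lambda c: pinball_loss(y_true, c, q))


# Hypothetical scores (eleven values, so the median is a single observation).
scores = [35, 42, 50, 55, 61, 68, 74, 80, 88, 95, 99]

print(best_constant(scores, 0.5))  # the median of the list
print(best_constant(scores, 0.9))  # a high cut-off near the top of the list
```

For this list the two calls print 68 and 95. Changing `q` moves the best constant up or down through the data, which is exactly the knob that the `quantile` argument of a quantile regressor exposes.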

In [1]:
import pandas as pd

In [2]:
!pip install --upgrade scikit-learn

Requirement already up-to-date: scikit-learn in c:\users\gopikrishna\anaconda3\lib\site-packages (1.0.2)


In [3]:
dataset = pd.read_csv("insurance_pre.csv")

In [4]:
dataset

,age,sex,bmi,children,smoker,charges
0,19,female,27.900,0,yes,16884.92400
1,18,male,33.770,1,no,1725.55230
2,28,male,33.000,3,no,4449.46200
3,33,male,22.705,0,no,21984.47061
4,32,male,28.880,0,no,3866.85520
...,...,...,...,...,...,...
1333,50,male,30.970,3,no,10600.54830
1334,18,female,31.920,0,no,2205.98080
1335,18,female,36.850,0,no,1629.83350
1336,21,female,25.800,0,no,2007.94500


In [5]:
dataset = pd.get_dummies(dataset)
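A side note on the encoding step: `pd.get_dummies` with default arguments creates one indicator column per category, so `sex_female`/`sex_male` and `smoker_no`/`smoker_yes` carry the same information twice. The sketch below shows an optional variant on a tiny made-up frame (not the insurance file) that keeps one indicator per two-level category. It is an alternative, not what this notebook does, and using it would change the column names selected in the later cells.

```python
import pandas as pd

# Tiny hypothetical frame reusing a few column names from the insurance data.
df = pd.DataFrame({
    "age": [19, 18, 28],
    "sex": ["female", "male", "male"],
    "smoker": ["yes", "no", "no"],
})

# drop_first=True drops the first level of each categorical column;
# dtype=int gives 0/1 columns regardless of the pandas version.
encoded = pd.get_dummies(df, drop_first=True, dtype=int)

print(encoded.columns.tolist())
```

With two-level categories, the dropped level is simply the case where the remaining indicator is 0.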

In [6]:
dataset

,age,bmi,children,charges,sex_female,sex_male,smoker_no,smoker_yes
0,19,27.900,0,16884.92400,1,0,0,1
1,18,33.770,1,1725.55230,0,1,1,0
2,28,33.000,3,4449.46200,0,1,1,0
3,33,22.705,0,21984.47061,0,1,1,0
4,32,28.880,0,3866.85520,0,1,1,0
...,...,...,...,...,...,...,...,...
1333,50,30.970,3,10600.54830,0,1,1,0
1334,18,31.920,0,2205.98080,1,0,1,0
1335,18,36.850,0,1629.83350,1,0,1,0
1336,21,25.800,0,2007.94500,1,0,1,0


In [7]:
dataset.columns

Index(['age', 'bmi', 'children', 'charges', 'sex_female', 'sex_male',
       'smoker_no', 'smoker_yes'],
      dtype='object')

In [8]:
independent = dataset[['age', 'bmi', 'children', 'sex_female', 'sex_male',
       'smoker_no', 'smoker_yes']]

In [9]:
independent

,age,bmi,children,sex_female,sex_male,smoker_no,smoker_yes
0,19,27.900,0,1,0,0,1
1,18,33.770,1,0,1,1,0
2,28,33.000,3,0,1,1,0
3,33,22.705,0,0,1,1,0
4,32,28.880,0,0,1,1,0
...,...,...,...,...,...,...,...
1333,50,30.970,3,0,1,1,0
1334,18,31.920,0,1,0,1,0
1335,18,36.850,0,1,0,1,0
1336,21,25.800,0,1,0,1,0


In [10]:
dependent = dataset[['charges']]

In [11]:
dependent

,charges
0,16884.92400
1,1725.55230
2,4449.46200
3,21984.47061
4,3866.85520
...,...
1333,10600.54830
1334,2205.98080
1335,1629.83350
1336,2007.94500


In [12]:
from sklearn.model_selection import train_test_split

# Hold out 30% of the rows for testing; random_state fixes the shuffle for reproducibility.
X_train, X_test, Y_train, Y_test = train_test_split(independent, dependent, test_size=0.30, random_state=0)

In [48]:
from sklearn.linear_model import QuantileRegressor

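# quantile=0.5 fits the conditional median; alpha=0 turns off the L1 penalty.
quantile_regression = QuantileRegressor(quantile=0.5, alpha=0, fit_intercept=True)
# Pass the target as a 1-D array; a single-column DataFrame triggers a DataConversionWarning.
quantile_regression.fit(X_train, Y_train.values.ravel())
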
Status is 4: Numerical difficulties encountered.
Result message of linprog:
The solution does not satisfy the constraints within the required tolerance of 3.16E-04, yet no errors were raised and there is no certificate of infeasibility or unboundedness. This is known to occur if the `presolve` option is False and the problem is infeasible. This can also occur due to the limited accuracy of the `interior-point` method. Check whether the slack and constraint residuals are acceptable; if not, consider enabling presolve, reducing option `tol`, and/or using method `revised simplex`. If you encounter this message under different circumstances, please submit a bug report.


QuantileRegressor(alpha=0)

In [49]:
import sklearn 
print(sklearn.__version__)

1.0.2


In [50]:
y_pred = quantile_regression.predict(X_test)

In [51]:
from sklearn.metrics import r2_score
r_score = r2_score(Y_test, y_pred)

In [52]:
r_score

0.7412992683545558

In [55]:
print(round(r_score, 2))

0.74
