# Problem Statement:
- An insurance company has collected each customer's age along with whether that customer made a purchase. The task is to predict a customer's purchase behaviour when the customer's age is given as input to the model.
- In the data, Age is the independent variable (feature) and Purchased is the target.
- Purchased takes the values 0 and 1, where 0 = Not Purchased and 1 = Purchased.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [2]:
data = pd.read_csv("Health_Insurance.csv")
data.head()

,Age,Purchased
0,19,0
1,35,0
2,26,0
3,27,0
4,19,0


In [3]:
data.shape

(400, 2)

# Observations:

- The data contains 400 rows and 2 columns
- Each row represents data of one customer

In [4]:
data.isnull().sum()

Age          0
Purchased    0
dtype: int64

# Separate X and y

In [6]:
X = data.drop("Purchased", axis = 1)
y = data['Purchased']

# Split X and y into train set and test set

In [7]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

In [8]:
X_train.shape, X_test.shape

((300, 1), (100, 1))
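As a rough illustration of what `test_size` and `random_state` do, the sketch below splits a made-up 400-row array in the same way. The `ages` and `labels` arrays are placeholders invented for this example, not the insurance data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 400 made-up ages and labels, NOT the insurance dataset.
rng = np.random.default_rng(0)
ages = rng.integers(18, 61, size=(400, 1))
labels = (ages.ravel() > 40).astype(int)

# test_size=0.25 holds out a quarter of the rows for testing.
a_train, a_test, l_train, l_test = train_test_split(
    ages, labels, test_size=0.25, random_state=0
)
print(a_train.shape, a_test.shape)

# Re-using the same random_state reproduces exactly the same split.
a_train2, a_test2, _, _ = train_test_split(
    ages, labels, test_size=0.25, random_state=0
)
print(np.array_equal(a_test, a_test2))
```

Fixing `random_state` is what makes results such as the accuracy score repeatable from one run of the notebook to the next.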

# Apply Logistic Regression on train set

In [9]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr

In [10]:
lr.fit(X_train ,y_train)
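With a single feature, a fitted logistic regression turns an age into a purchase probability through the sigmoid function, p = 1 / (1 + exp(-(w * age + b))). The sketch below shows that calculation. The weight `w = 0.2` and intercept `b = -8.0` are made-up values chosen for illustration, not the coefficients learned in this notebook (after fitting, the learned values are stored in `lr.coef_` and `lr.intercept_`).

```python
import math

def purchase_probability(age, w, b):
    """Sigmoid of the linear score w * age + b."""
    return 1.0 / (1.0 + math.exp(-(w * age + b)))

# Hypothetical coefficients, for illustration only.
w, b = 0.2, -8.0

for age in (25, 40, 55):
    p = purchase_probability(age, w, b)
    label = "Purchased" if p > 0.5 else "Not Purchased"
    print(age, round(p, 3), label)
```

With these illustrative numbers the probability crosses 0.5 at age 40, which is where the predicted class would switch from 0 to 1.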

# Perform prediction on the X_test data

In [12]:
y_pred = lr.predict(X_test)
y_pred

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0], dtype=int64)
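The 0/1 labels above come from thresholding predicted probabilities. As a sketch on made-up data (the `ages` and `bought` arrays below are invented, not the insurance data), `predict_proba` returns the probability of each class, and for a binary model `predict` corresponds to choosing class 1 when its probability exceeds 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up example data: older customers purchase more often.
ages = np.array([[20], [25], [30], [35], [40], [45], [50], [55]])
bought = np.array([0, 0, 0, 0, 1, 0, 1, 1])

model = LogisticRegression().fit(ages, bought)

proba = model.predict_proba(ages)[:, 1]   # P(Purchased = 1) for each row
labels = model.predict(ages)              # hard 0/1 predictions

# predict() agrees with thresholding the class-1 probability at 0.5
print(np.array_equal(labels, (proba > 0.5).astype(int)))
```

Inspecting the probabilities is useful when a cut-off other than 0.5 suits the business problem better.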

In [13]:
X_test

,Age
132,30
309,38
341,35
196,30
246,35
...,...
146,27
135,23
390,48
264,48


# Check accuracy of y_pred against y_test

In [14]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)  # argument order is (y_true, y_pred)

0.9

In [16]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred)

array([[66,  2],
       [ 8, 24]], dtype=int64)

In [17]:
# (TN + TP) / (TN + TP + FN + FP), using the counts from the confusion matrix
accuracy = (66 + 24) / (66 + 24 + 8 + 2)

In [18]:
accuracy

0.9
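The same confusion matrix yields more than accuracy. As a small sketch using the four counts printed above (sklearn's layout puts actual classes on rows and predicted classes on columns), precision, recall and F1 for the Purchased class can be computed directly:

```python
import numpy as np

# Confusion matrix from the output above: rows = actual, columns = predicted.
cm = np.array([[66, 2],
               [8, 24]])
tn, fp, fn, tp = cm.ravel()

accuracy = np.trace(cm) / cm.sum()   # correct predictions / all predictions
precision = tp / (tp + fp)           # of predicted purchasers, how many bought
recall = tp / (tp + fn)              # of actual purchasers, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.900 precision=0.923 recall=0.750 f1=0.828
```

A recall of 0.75 means a quarter of the actual purchasers in the test set were missed, which the 0.9 accuracy figure alone does not show.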

# Note:
- For classification algorithms, the confusion matrix is the basis for judging performance: accuracy, precision, recall and F1 score can all be derived from it.
- For regression algorithms, r2_score and mean_squared_error are the usual metrics for judging how well the model fits.
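For the regression metrics named in the note, a minimal sketch with made-up numbers (the `y_true` and `y_hat` lists are invented for illustration) looks like this:

```python
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true and predicted values for a regression task.
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.0, 7.5, 9.0]

mse = mean_squared_error(y_true, y_hat)  # mean of the squared errors
r2 = r2_score(y_true, y_hat)             # 1 - SS_res / SS_tot

print(mse, r2)
```

A lower mean squared error and an R² closer to 1 both indicate predictions that sit nearer the true values.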