# Insurance Premium Predictions

I will be using `pandas`, `plotly.express`, and `scikit-learn` to predict the **price of insurance based on age**. The dataset comes from `kaggle.com`.

In [2]:
# Importing the needed libraries

import pandas as pd
import plotly.express as px
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

In [3]:
# Loading in the csv file

df = pd.read_csv('insurance_data.csv')

In [4]:
# The next few code blocks will be used for EDA

df.head(10)

   Age  Premium
0   18    10000
1   22    15000
2   23    18000
3   26    21000
4   28    24000
5   31    26500
6   33    27000


In [5]:
df.isnull().sum()

Age        0
Premium    0
dtype: int64

In [6]:
# Visualizing the data

fig = px.scatter(df,
                 x='Age',
                 y='Premium',
                 marginal_x='histogram',
                 marginal_y='histogram',
                 trendline='ols',
                 color='Premium',
                 color_continuous_scale=px.colors.sequential.ice)
fig.update_layout(title="Insurance",
                  xaxis_title='Age',
                  yaxis_title='Premium')
fig.show()

### Making the Model

From here on, I fit a linear regression model and check whether it suits the data. The roughly straight-line trend in the scatter plot suggests it should.
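For a single feature, the least-squares line has a closed form, which makes it easy to sanity-check what a library fit should return. The following is a minimal sketch, not the notebook's own method. It assumes the seven rows shown in the `df.head(10)` output are the data of interest, and the variable names are illustrative.

```python
import numpy as np

# Rows copied from the df.head(10) output above; treating them as the
# data to fit is an assumption of this sketch.
age = np.array([18, 22, 23, 26, 28, 31, 33], dtype=float)
premium = np.array([10000, 15000, 18000, 21000, 24000, 26500, 27000], dtype=float)

# Closed-form simple linear regression:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
age_centered = age - age.mean()
slope = (age_centered * (premium - premium.mean())).sum() / (age_centered ** 2).sum()
intercept = premium.mean() - slope * age.mean()

# R^2 = 1 - SS_res / SS_tot, computed on the same rows used for the fit
fitted = intercept + slope * age
ss_res = ((premium - fitted) ** 2).sum()
ss_tot = ((premium - premium.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.2f}")
```

An R^2 close to 1 on these rows would support the impression from the scatter plot that a straight line describes the data well.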

In [32]:
# Features / labels

X = df['Age']
y = df['Premium']

In [33]:
# Ensures that X is in DataFrame form (scikit-learn expects a 2-D feature array).
if isinstance(X, pd.Series):
    X = X.to_frame()

# Ensures that y is in Series form.
if not isinstance(y, pd.Series):
    y = pd.Series(y)

In [34]:
# Splitting the data into training and testing sets
# (no random_state is set, so the split changes on every run)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
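With so few rows, it is worth checking how many end up in each split. For a fractional `test_size`, scikit-learn rounds the test count up, so 7 rows with `test_size=0.2` give 2 test rows and 5 training rows. The sketch below assumes the seven rows from the `df.head(10)` output. The `random_state=0` argument is an addition for reproducibility and is not part of the original call.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Rows copied from the df.head(10) output (assumed here to be the data used).
df = pd.DataFrame({
    "Age": [18, 22, 23, 26, 28, 31, 33],
    "Premium": [10000, 15000, 18000, 21000, 24000, 26500, 27000],
})

X = df[["Age"]]   # double brackets keep X as a 2-D DataFrame
y = df["Premium"]

# random_state is an assumption added here so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(f"train rows: {len(X_train)}, test rows: {len(X_test)}")
```

Two test rows is a very small hold-out set, so any metric computed on it will swing noticeably from one split to the next.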

In [37]:
# Creating the model using the linear regression algorithm

model = LinearRegression()
model.fit(X_train, y_train)
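Once fitted, the model is just a line: `premium = intercept + coef * age`. The sketch below shows how `model.coef_` and `model.intercept_` reproduce what `model.predict` returns. It fits on all seven rows from the `df.head(10)` output rather than on a train split, which is a simplification, and `new_age` is a hypothetical input.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Rows copied from the df.head(10) output; fitting on all of them (instead
# of a train split) is an assumption made to keep the sketch short.
df = pd.DataFrame({
    "Age": [18, 22, 23, 26, 28, 31, 33],
    "Premium": [10000, 15000, 18000, 21000, 24000, 26500, 27000],
})

model = LinearRegression().fit(df[["Age"]], df["Premium"])

slope = model.coef_[0]        # premium change per additional year of age
intercept = model.intercept_  # fitted premium at age 0 (an extrapolation)

new_age = 30  # hypothetical customer age
manual = intercept + slope * new_age
predicted = model.predict(pd.DataFrame({"Age": [new_age]}))[0]

print(f"manual: {manual:.2f}, model.predict: {predicted:.2f}")
```

Read this way, the coefficient is the estimated increase in premium for each additional year of age.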

In [52]:
# Making predictions with the model and analyzing the R^2 value

y_pred = model.predict(X_test)

for pred in y_pred:
    print(f"Prediction: {pred:.2f}")

# Note: score(X, y) computes R^2 on the full dataset (training and test rows),
# not on the held-out test rows alone.
print(f"R^2 Value: {model.score(X, y):.2f}")

for coeff in model.coef_:
    print(f"Coefficient: {coeff:.2f}")

Prediction: 16736.65
Prediction: 15554.90
R^2 Value: 0.97
Coefficient: 1181.75
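The notebook imports `mean_squared_error` and `r2_score` without calling them, and the R^2 above is computed on all rows, including those used for training. The sketch below shows one way to score only the held-out rows. It assumes the seven rows from the `df.head(10)` output and adds `random_state=0` for repeatability, so its numbers will not match the output above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Rows copied from the df.head(10) output (assumed here to be the data used).
df = pd.DataFrame({
    "Age": [18, 22, 23, 26, 28, 31, 33],
    "Premium": [10000, 15000, 18000, 21000, 24000, 26500, 27000],
})
X = df[["Age"]]
y = df["Premium"]

# random_state is an assumption; the original split is not seeded.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Metrics on the held-out rows only
mse = mean_squared_error(y_test, y_pred)
rmse = mse ** 0.5                    # same units as Premium
test_r2 = r2_score(y_test, y_pred)   # unstable with only two test rows

print(f"Test MSE: {mse:.2f}")
print(f"Test RMSE: {rmse:.2f}")
print(f"Test R^2: {test_r2:.2f}")
```

With only two test rows, the test R^2 can vary widely between splits, so the RMSE is likely the more readable number here.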
