# Car Purchasing Model

https://www.kaggle.com/dev0914sharma/car-purchasing-model

## Task

You are working as a data scientist at an automobile company.
You would like to develop a model that predicts the total amount customers are willing to pay for a new car. The company will use this information for targeted marketing based on customer profiles.

## File

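This file contains information about individual customers, with the fields listed below. These fields are the candidate features for training a model that predicts the car purchase amount. Because the target is a continuous value, this is a regression problem, so the model is evaluated with regression metrics (MAE, MSE, RMSE, R²) rather than an F1 score, which applies to classification.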

- Customer Name
- Customer e-mail
- Country
- Gender
- Age
- Annual Salary 
- Credit Card Debt 
- Net Worth (assets minus liabilities)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv("../input/car-purchasing-model/Car_Purchasing_Data.csv")
df

In [None]:
df.describe()

In [None]:
df.info()

## Check data

In [None]:
sns.set_style('darkgrid')
g = sns.pairplot(df, hue='Gender')

In [None]:
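# Correlate numeric columns only; name, e-mail and country are text and
# make df.corr() raise in pandas 2.x (numeric_only needs pandas >= 1.5)
sns.heatmap(df.corr(numeric_only=True))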

In [None]:
sns.set_style('whitegrid')
g = sns.countplot(x='Gender',data=df)

## Building the Regression Model

### Train-Test Split

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
df.columns

In [None]:
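# Features: leave out the identifying text columns (name, e-mail, country)
X = df[['Gender', 'Age',
       'Annual Salary', 'Credit Card Debt', 'Net Worth']]
# Target: the amount paid for the car
y = df['Car Purchase Amount']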

In [None]:
# Hold out 30% of the rows for testing; the fixed random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)
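
For intuition about what the split above does, here is a minimal NumPy-only sketch of a shuffled 70/30 split. The helper `shuffled_split` and the toy arrays are hypothetical, and this is not scikit-learn's actual implementation; it only mirrors the idea of shuffling row indices with a fixed seed and holding out a fraction of them.

```python
import numpy as np

def shuffled_split(X, y, test_size=0.30, seed=101):
    """Hypothetical helper: shuffle row indices, then hold out a fraction of them."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))             # random ordering of the row indices
    n_test = int(round(test_size * len(X)))   # number of held-out rows
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 10 rows, 2 features (made-up values, not the Kaggle dataset)
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = shuffled_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```

Every row lands in exactly one of the two sets, and each feature row stays paired with its target value. Reusing the same seed reproduces the same split.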

## Creating and Training the Model - Linear Regression

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
lm = LinearRegression()
lm.fit(X_train,y_train)
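
`LinearRegression` fits an ordinary least-squares model. As a rough illustration of what that means, the sketch below solves the same kind of problem with `np.linalg.lstsq` on synthetic data. The data, the variable names and the "true" weights are made up for the example and have nothing to do with the car dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free data: y = 5 + 2*x1 - 3*x2 (assumed for illustration)
X_demo = rng.normal(size=(50, 2))
y_demo = 5.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]

# Prepend a column of ones so the first fitted weight acts as the intercept
A = np.column_stack([np.ones(len(X_demo)), X_demo])

# Least-squares solution: the weights w that minimise ||A w - y||^2
w, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

intercept, coefs = w[0], w[1:]
print(intercept, coefs)
```

The recovered `intercept` and `coefs` play the same roles as `lm.intercept_` and `lm.coef_` in the cells that follow.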

In [None]:
# print the intercept
print(lm.intercept_)

In [None]:
coeff_df = pd.DataFrame(lm.coef_,X.columns,columns=['Coefficient'])
coeff_df

In [None]:
predictions = lm.predict(X_test)

In [None]:
# Predicted vs. actual values; points close to the diagonal indicate good predictions
plt.scatter(y_test, predictions)

In [None]:
# Distribution of the residuals (actual minus predicted)
sns.displot(y_test - predictions, bins=50);

### Regression Evaluation Metrics

In [None]:
from sklearn import metrics

In [None]:
print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
print('r2:', metrics.r2_score(y_test, predictions))
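
To make the four metrics concrete, here is a small sketch that computes them by hand with NumPy. The `y_true` and `y_pred` values are invented for illustration and are not results from the model above.

```python
import numpy as np

# Made-up values for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

errors = y_true - y_pred                  # residuals: [1, 0, -1, -2]

mae = np.mean(np.abs(errors))             # mean absolute error
mse = np.mean(errors ** 2)                # mean squared error
rmse = np.sqrt(mse)                       # root mean squared error, same units as y

ss_res = np.sum(errors ** 2)                      # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1 - ss_res / ss_tot                          # share of variance explained

print('MAE:', mae)
print('MSE:', mse)
print('RMSE:', rmse)
print('r2:', r2)
```

MAE and RMSE are expressed in the units of the target, so for this dataset they read as an error in the purchase amount. R² is unitless, and 1.0 means a perfect fit.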