#  <span style="color:red">How Precious</span>

# Project Overview
**Date Updated: May 15, 2021**

For this project we will use the diamonds dataset, based on a case study called **"How Precious is a Diamond?"**. The data contains 53940 records for training. A short description of the business meaning of each column is given below.

##  Dataset for this project

The dataset contains the following columns:

**carat:** The carat value of the Diamond

**cut:** The cut quality of the Diamond, which determines its shine ('Ideal' 'Premium' 'Good' 'Very Good' 'Fair')

**color:** The color value of the Diamond ('E' 'I' 'J' 'H' 'F' 'G' 'D')

**clarity:** The clarity grade of the Diamond ('SI2' 'SI1' 'VS1' 'VS2' 'VVS2' 'VVS1' 'I1')

**depth:** The total depth percentage of the Diamond, computed as z / mean(x, y)

**table:** The width of the Diamond's table (the large, flat facet that you can see when you look at the diamond from above) relative to its widest point

**x:** Length of the diamond in mm

**y:** Width of the diamond in mm

**z:** Depth of the diamond in mm

**price:** The price of the Diamond in USD.


# Getting the data and analysing

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas_profiling import ProfileReport
%matplotlib inline

In [None]:
dataset = pd.read_csv('../input/diamonds/diamonds.csv')
dataset.head()

# Quick check of the data

In [None]:
dataset.shape

In [None]:
dataset.info()

# Exploratory data analysis

In [None]:
profile = ProfileReport(dataset, title="Diamonds Profiling Report")

In [None]:
profile.to_notebook_iframe()

# Preprocessing

In [None]:
dummyCut = pd.get_dummies(dataset['cut'],drop_first=True)
dummyColor = pd.get_dummies(dataset['color'],drop_first=True)
dummyClarity = pd.get_dummies(dataset['clarity'],drop_first=True)
df = pd.concat([dataset,dummyCut,dummyColor,dummyClarity],axis=1)
df.head()

In [None]:
df.drop(['cut','color','clarity'],axis=1,inplace=True)
df.head()
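
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a toy frame. The `toy` data and its values are made up for illustration and are not taken from the diamonds file. The first category in sorted order is dropped and becomes the implicit baseline, which avoids a redundant (perfectly collinear) column.

```python
import pandas as pd

# Toy frame with a made-up 'cut' column (illustrative values only)
toy = pd.DataFrame({"cut": ["Fair", "Good", "Ideal", "Good"]})

# drop_first=True removes the first category in sorted order ('Fair'),
# so a row of all zeros encodes that baseline category
encoded = pd.get_dummies(toy["cut"], drop_first=True)

print(list(encoded.columns))
print(encoded.astype(int))
```

The same logic applies to `cut`, `color` and `clarity` above: each contributes one column fewer than it has categories.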

## Splitting the dataset into the Training set and Test set

In [None]:
X = df.drop('price',axis=1)
y = df['price']

In [None]:
from sklearn.model_selection import train_test_split
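# Hold out 20% of the rows for testing; a fixed random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=44)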
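
`train_test_split` shuffles the rows before splitting, so the split changes from run to run unless `random_state` is fixed. The sketch below uses toy arrays (`X_toy` and `y_toy` are hypothetical names, not part of this notebook) to show the resulting shapes and that two calls with the same seed return identical splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each (illustrative only)
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# Two splits with the same seed
first = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)
second = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)

# Each call returns X_train, X_test, y_train, y_test in that order
print(first[0].shape, first[1].shape)

# True when every returned array matches between the two calls
same = all(np.array_equal(p, q) for p, q in zip(first, second))
print(same)
```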

# Defining the model
## Ensemble models

### AdaBoostRegressor

In [None]:
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import cross_val_score

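# base_estimator=None was the default (a depth-3 decision tree) and the
# argument has since been renamed, so it is simply omitted here
model = AdaBoostRegressor(n_estimators=100,
                          random_state=44)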
scores = cross_val_score(model, X_train, y_train, cv=5)
scores.mean()
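
When no `scoring` argument is given, `cross_val_score` falls back to the estimator's own `score` method, which for scikit-learn regressors is the R² coefficient. The sketch below checks this on synthetic data. `LinearRegression` and the generated `X_demo`/`y_demo` arrays are assumptions chosen only to keep the example fast, and are not part of this project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

est = LinearRegression()

# Default scoring versus explicit R^2 scoring on the same 5 folds
default_scores = cross_val_score(est, X_demo, y_demo, cv=5)
r2_scores = cross_val_score(est, X_demo, y_demo, cv=5, scoring="r2")

print(np.allclose(default_scores, r2_scores))
```

So the mean values reported in this section are average R² scores over five folds, where values closer to 1 are better.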

### RandomForestRegressor

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

model = RandomForestRegressor(max_depth=10,random_state=44)
model.fit(X_train,y_train)

scores = cross_val_score(model, X_train, y_train, cv=5)
scores.mean()

### GradientBoostingRegressor

In [None]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

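# The default loss is squared error (named 'ls' in older scikit-learn
# releases and 'squared_error' in newer ones), so it is left implicit
model = GradientBoostingRegressor(n_estimators=100,
                                  learning_rate=0.1,
                                  max_depth=10,
                                  random_state=44)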
model.fit(X_train, y_train)
scores = cross_val_score(model, X_train, y_train, cv=5)
print(scores)

In [None]:
from sklearn import metrics

y_pred = model.predict(X_test)
print(f"SCORE:{model.score(X_test,y_test)}")

print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
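
As a rough guide to reading these numbers, the sketch below recomputes the same metrics by hand with NumPy on four made-up prices (`y_true` and `y_hat` are illustrative values, not model output). It also computes R², which is what `model.score` reports for a regressor.

```python
import numpy as np

# Made-up actual and predicted prices (illustrative only)
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 330.0, 400.0])

errors = y_true - y_hat

mae = np.mean(np.abs(errors))   # average absolute error, in price units
mse = np.mean(errors ** 2)      # average squared error, penalises large misses
rmse = np.sqrt(mse)             # back in price units

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R2:", r2)
```

MAE and RMSE are in the same unit as the target (USD here), so they can be compared directly against typical diamond prices.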

In [None]:
plt.figure(figsize=(10,10))
plt.plot(y_pred,y_test,'o')
plt.xlabel("predicted value")
plt.ylabel("actual value")
plt.show()

# Single prediction

In [None]:
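# Select the second test row by position
X_test.iloc[1:2]

In [None]:
print(model.predict(X_test.iloc[1:2]))

In [None]:
# Actual price for the same row, also selected by position
print(y_test.iloc[1:2])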

# Save and reuse model

In [None]:
import pickle
with open('GBR_estimator.pkl', 'wb') as file:
    pickle.dump(model, file)
    
# To deserialize estimator later
with open('GBR_estimator.pkl', 'rb') as file:
    new_model = pickle.load(file)

In [None]:
# The reloaded model should give the same prediction as the original one
print(f'Predicted Price: {np.round(new_model.predict(X_test.iloc[1:2]))}')
print(f'Actual Price   : {y_test.iloc[1:2].values}')

Contact for more:
[R.CALISKAN](www.resulcaliskan.com)