# <center>**Diamond Price Prediction Model**</center>

<img src='diamond_pic.png'>

## **Table of Contents**

1. [Problem Statement](#problem)
2. [Data Loading and Exploration](#data-loading)
3. [Data Preprocessing](#data-preprocess)
[<ul>3.1 Numerical Pipeline</ul>](#numeric)
[<ul>3.2 Ordinal Pipeline</ul>](#ordinal)
[<ul>3.3 Custom Transformer</ul>](#custom)
4. [Data Visualization in Tableau](#visual)
5. [Model Selection and Training](#selection)
[<ul>5.1 Initial Model Creation</ul>](#initial)
[<ul>5.2 Fine Tuning</ul>](#fine)
6. [Model Evaluation](#evaluation)
7. [Conclusion](#conclude)
8. [Appendix](#append)

---



## **1. Problem Statement** <a class="anchor" id="problem"></a>

The goal of this analysis is to explore the Diamonds dataset from Kaggle (https://www.kaggle.com/datasets/shivam2503/diamonds) and to see whether there are any trends between a diamond's features and its price. More specifically, the results of this analysis could help jewelers, retailers, and buyers estimate a fair price for a diamond from its carat, cut, color, clarity, and dimensions.

The dataset author's description of the columns is in the Appendix section.  

---

## **2. Data Loading and Exploration** <a class="anchor" id="data-loading"></a>

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
diamond_data = pd.read_csv('diamond_prices.csv')
```

```python
diamond_data.head()
```

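|   | Unnamed: 0 | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
| 1 | 2 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
| 2 | 3 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
| 3 | 4 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.20 | 4.23 | 2.63 |
| 4 | 5 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |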


```python
diamond_data.shape
```

```
(53940, 11)
```

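The dataset contains 53,940 rows (one per diamond) and 11 columns. The `Unnamed: 0` column appears to be a leftover 1-based row counter rather than a real feature, so only 10 columns describe the diamonds themselves, including the target, `price`.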

```python
diamond_data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53940 entries, 0 to 53939
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  53940 non-null  int64  
 1   carat       53940 non-null  float64
 2   cut         53940 non-null  object 
 3   color       53940 non-null  object 
 4   clarity     53940 non-null  object 
 5   depth       53940 non-null  float64
 6   table       53940 non-null  float64
 7   price       53940 non-null  int64  
 8   x           53940 non-null  float64
 9   y           53940 non-null  float64
 10  z           53940 non-null  float64
dtypes: float64(6), int64(2), object(3)
memory usage: 4.5+ MB
```
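The `Unnamed: 0` values in `head()` run 1, 2, 3, ... alongside the 0-based index, which suggests the column is just a row counter saved with the CSV. The sketch below checks that on the five rows shown above (rebuilt by hand so it runs without the file) and drops the column if the pattern holds. Treating the column as a counter is an assumption based only on these rows; on the full data the same check would be run against `diamond_data`.

```python
import pandas as pd

# The five rows shown by diamond_data.head(), rebuilt by hand so the sketch runs without the CSV
sample = pd.DataFrame({
    "Unnamed: 0": [1, 2, 3, 4, 5],
    "carat": [0.23, 0.21, 0.23, 0.29, 0.31],
    "cut": ["Ideal", "Premium", "Good", "Premium", "Good"],
    "color": ["E", "E", "E", "I", "J"],
    "clarity": ["SI2", "SI1", "VS1", "VS2", "SI2"],
    "depth": [61.5, 59.8, 56.9, 62.4, 63.3],
    "table": [55.0, 61.0, 65.0, 58.0, 58.0],
    "price": [326, 326, 327, 334, 335],
    "x": [3.95, 3.89, 4.05, 4.20, 4.34],
    "y": [3.98, 3.84, 4.07, 4.23, 4.35],
    "z": [2.43, 2.31, 2.31, 2.63, 2.75],
})

# True if "Unnamed: 0" is exactly the 0-based row index shifted by one
is_row_counter = (sample["Unnamed: 0"].to_numpy() == sample.index.to_numpy() + 1).all()

# Drop the counter so a model never treats it as a feature
cleaned = sample.drop(columns=["Unnamed: 0"]) if is_row_counter else sample
print(cleaned.shape)  # (5, 10)
```

When loading the full file, `pd.read_csv('diamond_prices.csv', index_col=0)` is an alternative that uses that first column as the index instead of dropping it afterwards.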


## **3. Data Preprocessing** <a class="anchor" id="data-preprocess"></a>

### 3.1 Numerical Pipeline <a class="anchor" id="numeric"></a>
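A minimal sketch of what a numerical pipeline for this data could look like, assuming scikit-learn's `Pipeline` with a median `SimpleImputer` followed by `StandardScaler`. The column list comes from the Appendix (with `price` excluded because it is the target); the imputer strategy and the choice of scaler are assumptions rather than the author's settings. The five rows from `head()` stand in for the full data so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Numeric feature columns from the Appendix; price is the target, so it is left out
num_features = ["carat", "depth", "table", "x", "y", "z"]

num_pipeline = Pipeline([
    # info() reports no missing values, so the imputer is only a safeguard (assumption)
    ("imputer", SimpleImputer(strategy="median")),
    # Rescale each column to zero mean and unit variance
    ("scaler", StandardScaler()),
])

# Numeric columns of the five rows shown by head()
sample = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23, 0.29, 0.31],
    "depth": [61.5, 59.8, 56.9, 62.4, 63.3],
    "table": [55.0, 61.0, 65.0, 58.0, 58.0],
    "x": [3.95, 3.89, 4.05, 4.20, 4.34],
    "y": [3.98, 3.84, 4.07, 4.23, 4.35],
    "z": [2.43, 2.31, 2.31, 2.63, 2.75],
})

num_scaled = num_pipeline.fit_transform(sample[num_features])
# Column means should be close to zero after scaling
print(num_scaled.mean(axis=0).round(6))
```

Scaling matters mostly for distance- or gradient-based models; tree ensembles would work without it, so this step may turn out to be optional depending on the model chosen in section 5.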

### 3.2 Ordinal Pipeline <a class="anchor" id="ordinal"></a>
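`cut`, `color`, and `clarity` have a natural worst-to-best order (listed in the Appendix), so an ordinal encoding can preserve that ranking instead of one-hot encoding it away. A minimal sketch, assuming scikit-learn's `OrdinalEncoder` with explicit `categories` taken from the Appendix orderings; the variable names are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Worst-to-best orderings taken from the Appendix column descriptions
cut_order = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
color_order = ["J", "I", "H", "G", "F", "E", "D"]
clarity_order = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

ord_features = ["cut", "color", "clarity"]
# One category list per column, in the same order as ord_features
ord_encoder = OrdinalEncoder(categories=[cut_order, color_order, clarity_order])

# Categorical columns of the five rows shown by head()
sample = pd.DataFrame({
    "cut": ["Ideal", "Premium", "Good", "Premium", "Good"],
    "color": ["E", "E", "E", "I", "J"],
    "clarity": ["SI2", "SI1", "VS1", "VS2", "SI2"],
})

ord_encoded = ord_encoder.fit_transform(sample[ord_features])
print(ord_encoded[0])  # [4. 5. 1.] -> Ideal, E, SI2
```

Passing the orderings explicitly matters: without `categories`, `OrdinalEncoder` sorts the labels alphabetically, which would rank "Fair" above "Ideal" and "D" below "J".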

### 3.3 Custom Transformer <a class="anchor" id="custom"></a>
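The Appendix reports ranges for `x`, `y`, and `z` that start at 0 mm, which cannot be a real diamond measurement. One possible custom transformer, sketched here as an assumption rather than the author's design, marks those zeros as missing so a downstream `SimpleImputer` can fill them. The class name `ZeroToNaN` is hypothetical; it follows scikit-learn's `BaseEstimator`/`TransformerMixin` pattern so it can be dropped into a `Pipeline`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ZeroToNaN(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: treat a 0 mm dimension as a missing value."""

    def fit(self, X, y=None):
        # Nothing to learn; the replacement rule is fixed
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Replace exact zeros with NaN and leave every other value unchanged
        return np.where(X == 0, np.nan, X)


# First row: real x, y, z values from head(); second row: an all-zero record
dims = np.array([[3.95, 3.98, 2.43], [0.0, 0.0, 0.0]])
out = ZeroToNaN().fit_transform(dims)
print(out)
```

Placed before the imputer in the numerical pipeline, this would turn impossible zero dimensions into imputed values instead of letting them distort the scaling.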

## **4. Data Visualization in Tableau** <a class="anchor" id="visual"></a>

Placeholder

![Spotify Streams Dashboard.png](attachment:55f23ec6-ea6d-4fd0-a9a2-87aaf45095aa.png)

## **5. Model Selection and Training** <a class="anchor" id="selection"></a>

### 5.1 Initial Model Creation <a class="anchor" id="initial"></a>



### 5.2 Fine Tuning <a class="anchor" id="fine"></a>

---
## **6. Model Evaluation** <a class="anchor" id="evaluation"></a>

---
## **7. Conclusion** <a class="anchor" id="conclude"></a>

---
## **8. Appendix** <a class="anchor" id="append"></a>

The column descriptions from the dataset author's Kaggle post are listed below:

|variable                 |class     |description |
|:---|:---|:-------|
|carat                    |float64   | weight of the diamond |
|cut                      |object    | quality of the cut (Fair, Good, Very Good, Premium, Ideal)  |
|color                    |object    | diamond color, from J (worst) to D (best)  |
|clarity                  |object    | a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)) |
|x                        |float64   | length in mm (0--10.74) |
|y                        |float64   | width in mm (0--58.9) |
|z                        |float64   | depth in mm (0--31.8) |
|depth                    |float64   | total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79) |
|table                    |float64   | width of the top of the diamond relative to its widest point (43--95) |
|price                    |int64     | price in US dollars |
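The `depth` formula can be checked against the five rows shown by `head()`. The stored values (for example 61.5) are percentages, so the sketch below multiplies `2 * z / (x + y)` by 100 before comparing; that scaling is inferred from the data, not stated in the author's description.

```python
import numpy as np

# x, y, z (mm) and the reported depth (%) for the five rows shown by head()
x = np.array([3.95, 3.89, 4.05, 4.20, 4.34])
y = np.array([3.98, 3.84, 4.07, 4.23, 4.35])
z = np.array([2.43, 2.31, 2.31, 2.63, 2.75])
depth_reported = np.array([61.5, 59.8, 56.9, 62.4, 63.3])

# depth = 2 * z / (x + y), scaled by 100 because the column is stored as a percentage
depth_computed = 100 * 2 * z / (x + y)
print(depth_computed.round(2))

# Largest absolute gap between the formula and the stored column
max_gap = np.abs(depth_computed - depth_reported).max()
print(round(float(max_gap), 2))
```

On these rows the computed and stored values agree to within about a quarter of a percentage point, which supports reading `depth` as 100 × 2z / (x + y).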