## 1. Load Data

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
#Data Loading
df = pd.read_csv('../data/raw/diamonds.csv')

## 2. Exploring Data

In [3]:
df.head()

Unnamed: 0.1,Unnamed: 0,cut,color,clarity,carat_weight,cut_quality,lab,symmetry,polish,eye_clean,...,meas_depth,girdle_min,girdle_max,fluor_color,fluor_intensity,fancy_color_dominant_color,fancy_color_secondary_color,fancy_color_overtone,fancy_color_intensity,total_sales_price
0,0,Round,E,VVS2,0.09,Excellent,IGI,Very Good,Very Good,unknown,...,1.79,M,M,unknown,,unknown,unknown,unknown,unknown,200
1,1,Round,E,VVS2,0.09,Very Good,IGI,Very Good,Very Good,unknown,...,1.78,STK,STK,unknown,,unknown,unknown,unknown,unknown,200
2,2,Round,E,VVS2,0.09,Excellent,IGI,Very Good,Very Good,unknown,...,1.77,TN,M,unknown,,unknown,unknown,unknown,unknown,200
3,3,Round,E,VVS2,0.09,Excellent,IGI,Very Good,Very Good,unknown,...,1.78,M,STK,unknown,,unknown,unknown,unknown,unknown,200
4,4,Round,E,VVS2,0.09,Very Good,IGI,Very Good,Excellent,unknown,...,1.82,STK,STK,unknown,,unknown,unknown,unknown,unknown,200


In [4]:
df.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 219703 entries, 0 to 219702
Data columns (total 26 columns):
 #   Column                       Non-Null Count   Dtype  
---  ------                       --------------   -----  
 0   Unnamed: 0                   219703 non-null  int64  
 1   cut                          219703 non-null  object 
 2   color                        219703 non-null  object 
 3   clarity                      219703 non-null  object 
 4   carat_weight                 219703 non-null  float64
 5   cut_quality                  219703 non-null  object 
 6   lab                          219703 non-null  object 
 7   symmetry                     219703 non-null  object 
 8   polish                       219703 non-null  object 
 9   eye_clean                    219703 non-null  object 
 10  culet_size                   219703 non-null  object 
 11  culet_condition              219703 non-null  object 
 12  depth_percent                219703 non-null  float64
 13 

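The first column, 'Unnamed: 0', is a leftover index column that carries no information about the diamonds, so it is redundant and should be dropped.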
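A minimal sketch of dropping such leftover index columns, shown on a tiny stand-in frame rather than the real file. Matching on the 'Unnamed' prefix is an assumption about how these columns are named, so it is worth checking df.columns before applying it.

```python
import pandas as pd

# Tiny stand-in for the real data; the values are illustrative only.
sample = pd.DataFrame({
    'Unnamed: 0': [0, 1, 2],
    'cut': ['Round', 'Round', 'Pear'],
    'carat_weight': [0.09, 0.09, 0.77],
})

# Keep every column whose name does not start with 'Unnamed'.
leftover = sample.columns.str.startswith('Unnamed')
cleaned = sample.loc[:, ~leftover]

print(list(cleaned.columns))
```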

In [5]:
# calculating the number of outliers

# print(orig_df[orig_df['carat_weight'] > 2.03].shape[0])
# print(orig_df[orig_df['carat_weight'] > 2.03].shape[0]/df.shape[0])

The box plot shows carat weights ranging from 0.08 to 19.35. Most fall between roughly 0.08 and 2.03, with a median of about 0.5. There are 9447 (4.30%) outliers with carat weights above 2.03.
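The notebook does not say how the 2.03 cut-off was obtained. A plausible reading is that it is the upper whisker of a standard box plot, Q3 + 1.5 * IQR. The sketch below computes that fence and the outlier share under that assumption; upper_fence is a hypothetical helper and the weights are illustrative, not taken from the dataset.

```python
import pandas as pd

def upper_fence(values: pd.Series, k: float = 1.5) -> float:
    """Return Q3 + k * IQR, the upper whisker limit of a standard box plot."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    return q3 + k * (q3 - q1)

# Illustrative weights only, not taken from the dataset.
weights = pd.Series([0.3, 0.4, 0.5, 0.5, 0.7, 0.9, 1.0, 1.2, 5.0])

fence = upper_fence(weights)
n_outliers = int((weights > fence).sum())
share = n_outliers / len(weights)
print(f"fence={fence:.2f}, outliers={n_outliers} ({share:.2%})")
```

On the real data the same call would be upper_fence(df['carat_weight']).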


In [6]:
# The 23 color grades on the GIA Color Scale 
# (or diamond color chart) are subdivided into
#  five subcategories, which are: colorless (D-F); 
# near colorless (G-J); faint (K-M); very light (N-R); 
# and light (S-Z).


In [7]:
# diamonds with fancy color
print(df.loc[df['fancy_color_dominant_color'] != 'unknown'].shape[0])
df['fancy_color_dominant_color'].value_counts()

9164


fancy_color_dominant_color
unknown      210539
Yellow         6487
Pink           1369
Brown           531
Green           302
Orange          271
Purple           76
Gray             66
Blue             38
Chameleon        12
Black             6
Red               4
Other             2
Name: count, dtype: int64

Most diamonds have no fancy color, i.e. they fall in the Colorless, Near Colorless and Faint ranges (D-M).

In [8]:
#  changing the 2 'Other' categories to 'unknown':
df.replace({'fancy_color_dominant_color': {'Other': 'unknown'}}, inplace=True)

There are 9162 diamonds of fancy colors with their respective intensities.

In [9]:
print(df.loc[df['fancy_color_intensity'] != 'unknown'].shape[0])
df['fancy_color_intensity'].value_counts()

9162


fancy_color_intensity
unknown          210541
Fancy              3447
Fancy Intense      1943
Fancy Light        1288
Fancy Deep          777
Fancy Vivid         714
Light               318
Faint               238
Fancy Dark          238
Very Light          199
Name: count, dtype: int64

On GIA Colored Diamond Grading Reports, colored diamonds are graded in order of increasing color strength: Faint, Very Light, Light, Fancy Light, Fancy, Fancy Intense, Fancy Vivid, Fancy Dark and Fancy Deep.
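One way to make this ordering usable in analysis is an ordered categorical. The sketch below is an illustration on made-up values, not the notebook's own encoding; INTENSITY_ORDER is a name introduced here.

```python
import pandas as pd

# Order as listed in the text above, weakest to strongest.
INTENSITY_ORDER = [
    'Faint', 'Very Light', 'Light', 'Fancy Light', 'Fancy',
    'Fancy Intense', 'Fancy Vivid', 'Fancy Dark', 'Fancy Deep',
]

# Illustrative values; 'unknown' is not on the scale and becomes NaN.
intensity = pd.Series(['Fancy', 'Faint', 'Fancy Vivid', 'unknown'])
graded = pd.Series(
    pd.Categorical(intensity, categories=INTENSITY_ORDER, ordered=True)
)

# Ordered categoricals support comparisons and min/max.
print(graded.max())
print((graded > 'Fancy Light').tolist())
```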

In [10]:
# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which raises a FutureWarning and will stop working in pandas 3.0.
df['fancy_color_overtone'] = df['fancy_color_overtone'].fillna('unknown')
df['fancy_color_overtone'].value_counts(dropna=False)


fancy_color_overtone
unknown      219315
Brownish        123
Yellowish        78
Orangey          54
Pinkish          51
Greenish         47
Purplish         34
Grayish           1
Name: count, dtype: int64

There are 9162 diamonds with a fancy color. Chameleon, Black and Red are the rarest dominant colors in this dataset.

Let's double-check that no diamond has a secondary fancy color without a dominant color.

In [11]:
df.loc[(df['fancy_color_dominant_color']=='unknown') & (df['fancy_color_secondary_color']!='unknown')]

Unnamed: 0.1,Unnamed: 0,cut,color,clarity,carat_weight,cut_quality,lab,symmetry,polish,eye_clean,...,meas_depth,girdle_min,girdle_max,fluor_color,fluor_intensity,fancy_color_dominant_color,fancy_color_secondary_color,fancy_color_overtone,fancy_color_intensity,total_sales_price


Checking the same for fancy color overtone.

In [12]:
condition = (df['fancy_color_dominant_color']=='unknown') & (df['fancy_color_overtone']!='unknown')
df.loc[condition]

Unnamed: 0.1,Unnamed: 0,cut,color,clarity,carat_weight,cut_quality,lab,symmetry,polish,eye_clean,...,meas_depth,girdle_min,girdle_max,fluor_color,fluor_intensity,fancy_color_dominant_color,fancy_color_secondary_color,fancy_color_overtone,fancy_color_intensity,total_sales_price
9410,9410,Round,K,VVS2,0.3,Excellent,GIA,Excellent,Excellent,unknown,...,2.68,unknown,unknown,unknown,Medium,unknown,unknown,Yellowish,unknown,624
152987,152988,Pear,D,SI1,0.77,unknown,GIA,Very Good,Very Good,unknown,...,3.37,STK,XTK,unknown,,unknown,unknown,Greenish,unknown,3882
154700,154701,Pear,D,VS2,0.73,unknown,GIA,Very Good,Very Good,unknown,...,3.42,TK,XTK,unknown,,unknown,unknown,Greenish,unknown,4119
160454,160455,Round,K,VS2,1.1,Excellent,IGI,Excellent,Excellent,unknown,...,4.11,STK,STK,unknown,,unknown,unknown,Brownish,unknown,4754
219214,219215,Round,D,VVS2,5.05,Excellent,GIA,Excellent,Excellent,unknown,...,6.66,unknown,unknown,Blue,Strong,unknown,unknown,Yellowish,unknown,233311
219250,219251,Round,E,VVS1,5.49,Excellent,GIA,Excellent,Excellent,unknown,...,6.95,unknown,unknown,Blue,Strong,unknown,unknown,Yellowish,unknown,245952


All six rows are graded D, E or K, so they fall in the colorless or faint color categories. D and E diamonds are supposed to be colorless, so we reset their overtone to 'unknown'. K diamonds are slightly tinted, so their overtone values are kept. Source: https://essiluxgroup.com/knowledge-base/diamond-color.html

In [13]:
df.loc[(df.color.isin(['D', 'E'])) & (condition), 'fancy_color_overtone'] = 'unknown'
df.loc[condition]

Unnamed: 0.1,Unnamed: 0,cut,color,clarity,carat_weight,cut_quality,lab,symmetry,polish,eye_clean,...,meas_depth,girdle_min,girdle_max,fluor_color,fluor_intensity,fancy_color_dominant_color,fancy_color_secondary_color,fancy_color_overtone,fancy_color_intensity,total_sales_price
9410,9410,Round,K,VVS2,0.3,Excellent,GIA,Excellent,Excellent,unknown,...,2.68,unknown,unknown,unknown,Medium,unknown,unknown,Yellowish,unknown,624
152987,152988,Pear,D,SI1,0.77,unknown,GIA,Very Good,Very Good,unknown,...,3.37,STK,XTK,unknown,,unknown,unknown,unknown,unknown,3882
154700,154701,Pear,D,VS2,0.73,unknown,GIA,Very Good,Very Good,unknown,...,3.42,TK,XTK,unknown,,unknown,unknown,unknown,unknown,4119
160454,160455,Round,K,VS2,1.1,Excellent,IGI,Excellent,Excellent,unknown,...,4.11,STK,STK,unknown,,unknown,unknown,Brownish,unknown,4754
219214,219215,Round,D,VVS2,5.05,Excellent,GIA,Excellent,Excellent,unknown,...,6.66,unknown,unknown,Blue,Strong,unknown,unknown,unknown,unknown,233311
219250,219251,Round,E,VVS1,5.49,Excellent,GIA,Excellent,Excellent,unknown,...,6.95,unknown,unknown,Blue,Strong,unknown,unknown,unknown,unknown,245952


In [14]:
def color_condition(color):
    if color in ['D', 'E', 'F']:
        return 'colorless'
    elif color in ['G', 'H', 'I', 'J']:
        return 'near colorless'
    elif color in ['K', 'L', 'M']:
        return 'faint'
    elif color in ['N', 'O', 'P', 'Q', 'R']:
        return 'very light' 
    elif color in ['S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']:
        return 'light'
    else:
        return 'unknown'

df['color_scale'] = df['color'].apply(color_condition)
df.loc[(df['fancy_color_dominant_color']!='unknown')&(df['color']=='unknown'), 'color_scale'] = 'fancy'
df['color_scale'].value_counts() 

color_scale
colorless         95542
near colorless    94519
faint             20480
fancy              9162
Name: count, dtype: int64
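The same grouping can also be written as a lookup table, which avoids a row-by-row apply. This is an equivalent sketch on made-up values, not a replacement the notebook requires; SCALE_BY_GRADE is a name introduced here.

```python
import pandas as pd

# Grade ranges follow the GIA subcategories described earlier.
SCALE_BY_GRADE = {}
for label, grades in [
    ('colorless', 'DEF'),
    ('near colorless', 'GHIJ'),
    ('faint', 'KLM'),
    ('very light', 'NOPQR'),
    ('light', 'STUVWXYZ'),
]:
    for grade in grades:
        SCALE_BY_GRADE[grade] = label

# Grades without an entry (such as 'unknown') map to NaN, then to 'unknown'.
colors = pd.Series(['D', 'H', 'K', 'unknown'])
color_scale = colors.map(SCALE_BY_GRADE).fillna('unknown')
print(color_scale.tolist())
```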

### ========================================================================================

In [15]:
df['clarity'].value_counts()

clarity
SI1     38627
VS2     38173
VS1     36956
SI2     31105
VVS2    28985
VVS1    27877
IF       9974
I1       6961
I2        944
I3         91
SI3        10
Name: count, dtype: int64

Understanding clarity:
<!---
The GIA Diamond Clarity Scale has 6 categories, some of which are divided, for a total of 11 specific grades.

- Flawless (FL): No inclusions and no blemishes visible under 10x magnification
- Internally Flawless (IF): No inclusions visible under 10x magnification
- Very, Very Slightly Included (VVS1 and VVS2): Inclusions so slight they are difficult for a skilled grader to see under 10x magnification
- Very Slightly Included (VS1 and VS2): Inclusions are observed with effort under 10x magnification, but can be characterized as minor
- Slightly Included (SI1 and SI2): Inclusions are noticeable under 10x magnification
- Included (I1, I2, and I3): Inclusions are obvious under 10x magnification, which may affect transparency and brilliance

WHAT CAUSES INCLUSIONS?
Small crystals can become trapped in a diamond when it’s forming. Sometimes as a crystal grows, it can develop irregularities in its atomic structure. The size, position and visibility of inclusions can have a significant impact on diamond clarity.

SI3 is a diamond clarity rating used by some labs to indicate a clarity that falls between the Slightly Included and Included ranges. It is important to note that the GIA does not have an SI3 rating and it is our expert opinion that you should avoid these diamonds. These diamonds are almost always diamonds that have been given an I1 or worse rating by the GIA and then submitted to another lab with lower standards and “upgraded” to an SI3. An SI3-rated diamond will, in all likelihood, not be eye-clean and will contain obvious blemishes visible to the naked eye. We don’t consider these diamonds to be a good value for your money.
-->

In [16]:
df[df['clarity'] == 'SI3']

Unnamed: 0.1,Unnamed: 0,cut,color,clarity,carat_weight,cut_quality,lab,symmetry,polish,eye_clean,...,girdle_min,girdle_max,fluor_color,fluor_intensity,fancy_color_dominant_color,fancy_color_secondary_color,fancy_color_overtone,fancy_color_intensity,total_sales_price,color_scale
130,130,Round,H,SI3,0.23,Very Good,IGI,Very Good,Very Good,unknown,...,unknown,unknown,unknown,,unknown,unknown,unknown,unknown,284,near colorless
149,149,Round,H,SI3,0.24,Very Good,IGI,Very Good,Very Good,unknown,...,unknown,unknown,unknown,,unknown,unknown,unknown,unknown,296,near colorless
171,171,Round,H,SI3,0.25,Very Good,IGI,Very Good,Very Good,unknown,...,unknown,unknown,unknown,,unknown,unknown,unknown,unknown,308,near colorless
212,212,Round,H,SI3,0.26,Very Good,IGI,Very Good,Very Good,unknown,...,unknown,unknown,unknown,,unknown,unknown,unknown,unknown,320,near colorless
228,228,Round,F,SI3,0.24,Very Good,IGI,Very Good,Very Good,unknown,...,unknown,unknown,unknown,,unknown,unknown,unknown,unknown,324,colorless
304,304,Round,H,SI3,0.27,Very Good,IGI,Very Good,Very Good,unknown,...,unknown,unknown,unknown,,unknown,unknown,unknown,unknown,334,near colorless
405,405,Round,H,SI3,0.29,Very Good,IGI,Very Good,Very Good,unknown,...,unknown,unknown,unknown,,unknown,unknown,unknown,unknown,358,near colorless
406,406,Round,H,SI3,0.29,Very Good,IGI,Very Good,Very Good,unknown,...,unknown,unknown,unknown,,unknown,unknown,unknown,unknown,358,near colorless
9791,9791,Round,G,SI3,0.3,Very Good,IGI,Very Good,Very Good,unknown,...,unknown,unknown,unknown,,unknown,unknown,unknown,unknown,630,near colorless
19407,19407,Round,F,SI3,0.32,Very Good,IGI,Very Good,Very Good,unknown,...,unknown,unknown,unknown,,unknown,unknown,unknown,unknown,720,colorless


"You see, if you were to take the same exact (SI3) diamond graded by EGL to GIA, it would most likely receive an I1 or worse grade at GIA. It’s no wonder that GIA or any other major labs around the world won’t recognize SI3 as a clarity grade." - https://beyond4cs.com/clarity/si3-grading/


"IGI inflated the qualities in eight of the ten possible grades"

"IGI has been more lenient in its grading standards than GIA"

We'll do IGI vs GIA comparisons later in the notebook.

In [17]:
df['clarity'] = df['clarity'].replace('SI3', 'I1')

In [18]:
df['clarity'].value_counts()

clarity
SI1     38627
VS2     38173
VS1     36956
SI2     31105
VVS2    28985
VVS1    27877
IF       9974
I1       6971
I2        944
I3         91
Name: count, dtype: int64

In [19]:
# df['clarity_category'] = df['clarity'].astype('category')
# df['clarity_category']= df.clarity_category.cat.set_categories(
#     ['I3', 'I2', 'I1', 'SI2', 'SI1', 'VS2', 'VS1', 'VVS2', 'VVS1', 'IF'], ordered=True
# )
# df.clarity_category

In [20]:
def clarity_condition(clarity):
    if clarity in ['VVS1', 'VVS2']:
        return 'VVS'
    elif clarity in ['VS1', 'VS2']:
        return 'VS'
    elif clarity in ['SI1', 'SI2']:
        return 'SI'
    elif clarity in ['I1', 'I2', 'I3']:
        return 'I'
    else:
        return 'IF'


# df['clarity_scale'] = df['clarity'].apply(clarity_condition)
# df['clarity_scale'].value_counts()

In [21]:
df['cut_quality'].value_counts()

cut_quality
Excellent    124861
unknown       60607
Very Good     34201
Good             28
Fair              5
Ideal             1
Name: count, dtype: int64

In [22]:
# df['cut_quality_category'] = df['cut_quality'].astype('category')
# df['cut_quality_category'] = df.cut_quality_category.cat.set_categories(
#     ['Fair', 'Good', 'Very Good', 'Excellent', 'Ideal'], ordered=True
# )

In [23]:
df['lab'].value_counts()

lab
GIA    200434
IGI     15865
HRD      3404
Name: count, dtype: int64

In [24]:
df['culet_size'].value_counts()

culet_size
N          131899
unknown     85740
VS           1345
S             476
M             163
L              58
SL             14
EL              4
VL              4
Name: count, dtype: int64

In [25]:
# df['culet_size_category'] = df['culet_size'].astype('category')
# df['culet_size_category'] = df.culet_size_category.cat.set_categories(
#     ['EL', 'VL', 'L', 'SL', 'M', 'S', 'VS', 'N'], ordered=True
# )

In [26]:
df['culet_condition'].value_counts()

culet_condition
unknown    204384
Pointed     15293
Chipped        18
Abraded         8
Name: count, dtype: int64

Ideally, a culet should be so small it appears as a pinpoint when viewed from the top, categorized as “None” or “Small” in grading reports. A larger culet can create a visual “hole” or dark spot at the bottom of the diamond, detracting from its overall brilliance and appearance.

In GIA's International Diamond Grading System™, culet size is described as None, Very Small, Small, Medium, Slightly Large, Large, Very Large, or Extremely Large.
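A small sketch of turning the abbreviations from the value counts into this scale as an ordered categorical. Pairing each abbreviation with a GIA description is an assumption based on the initials, and CULET_NAMES is a name introduced here.

```python
import pandas as pd

# Abbreviation -> GIA description, smallest to largest.
# The pairing is assumed from the initials, not confirmed by the data source.
CULET_NAMES = {
    'N': 'None', 'VS': 'Very Small', 'S': 'Small', 'M': 'Medium',
    'SL': 'Slightly Large', 'L': 'Large', 'VL': 'Very Large',
    'EL': 'Extremely Large',
}

culet = pd.Series(['N', 'VS', 'L', 'unknown'])
culet_size = pd.Categorical(
    culet.map(CULET_NAMES),                 # 'unknown' has no entry, becomes NaN
    categories=list(CULET_NAMES.values()),  # dict order is smallest to largest
    ordered=True,
)

# Codes give the rank on the scale; -1 marks a missing value.
print(culet_size.codes.tolist())
```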

### ========================================================================================

In [27]:
print(df['girdle_min'].value_counts())
print(df['girdle_max'].value_counts())

girdle_min
unknown    83432
M          74421
STK        26335
TN         16744
TK         10353
VTK         4471
XTK         1981
VTN         1650
XTN          292
STN           24
Name: count, dtype: int64
girdle_max
unknown    84295
STK        70440
TK         25186
M          17977
VTK        12638
XTK         7647
TN          1363
VTN          111
XTN           34
STN           12
Name: count, dtype: int64


Ideally, girdle thickness should range from Very Thin to Thick.

Abbreviations For Girdle thickness:

- EXTN, ET, XT, EXN = Extremely Thin
- VTN, VT, VETN = Very Thin
- T, TN, TH = Thin
- M, ME, MD = Medium
- STK, ST, SLTK, SLTH = Slightly Thick
- T, TK, TH = Thick
- VTK, VTH, VETK, VET = Very Thick
- ET, EXTK, XT, XTK = Extremely Thick
- F, FA, FAC = Faceted
- S, SM = Smooth
- P, PO = Polished

In [28]:
girdle_thickness_scale = ['XTN', 'VTN', 'TN', 'STN', 'M', 'STK', 'TK', 'VTK', 'XTK']

# create_categories('girdle_min', girdle_thickness_scale)
# create_categories('girdle_max', girdle_thickness_scale)
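The create_categories helper called in the commented lines is not defined in this section. The sketch below shows one way it might look; it is hypothetical, and unlike the calls above it takes the frame as an explicit first argument instead of relying on a global df.

```python
import pandas as pd

def create_categories(frame: pd.DataFrame, column: str, scale: list) -> pd.DataFrame:
    """Add '<column>_category' as an ordered categorical following `scale`."""
    dtype = pd.CategoricalDtype(categories=scale, ordered=True)
    # Values outside the scale (such as 'unknown') become NaN.
    frame[f'{column}_category'] = frame[column].astype(dtype)
    return frame

girdle_thickness_scale = ['XTN', 'VTN', 'TN', 'STN', 'M', 'STK', 'TK', 'VTK', 'XTK']

# Illustrative rows only.
sample = pd.DataFrame({'girdle_min': ['M', 'STK', 'unknown', 'TN']})
sample = create_categories(sample, 'girdle_min', girdle_thickness_scale)
print(sample['girdle_min_category'].cat.codes.tolist())
```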

In [29]:
df['depth_percent'].describe()

count    219703.000000
mean         61.683768
std           9.915266
min           0.000000
25%          61.200000
50%          62.400000
75%          63.500000
max          98.700000
Name: depth_percent, dtype: float64

Depth and Table percent:
<!---
Graders calculate depth percentage by dividing the table-to-culet length (the height of the diamond) by the average girdle diameter (the width of the diamond) and multiplying by one hundred. Ideally, the total depth percentage should range from 57.5 to 63%.

For a round diamond, an ideal depth percentage is between 59 and 62.6 percent and for a princess cut look for a diamond with a depth of 68 to 74 percent.

The ideal table percentage varies by shape. For a round cut diamond, an excellent table range is 54 to 57 percent, and for a princess cut a table of 69 to 75 percent of the width of the diamond is recommended.

TABLE:
- The table of a diamond is graded from poor to excellent, depending on its quality.
- For a round cut diamond, an excellent table range is 54 to 57 percent. A very good cut can have a table of 52 to 53 percent or 58 to 60 percent.
- For a princess cut diamond, an ideal table takes up 69 to 75 percent of the width of the diamond. A very good cut can be 56 to 67 percent or 75 to 76 percent.

- For an asscher cut or emerald cut diamond, an ideal table takes up 60 to 68 percent of the width of the diamond. 
- For an oval cut diamond, an ideal table is between 53 and 63 percent. 
- For a pear shape diamond, an ideal table size is 53 to 65 percent.
- For a radiant cut diamond, an ideal table size is between 61 to 69 percent. 
- For a heart shape diamond, check for an ideal table that’s between 56 and 62 percent of the diamond’s total width.
- For a marquise diamond, an ideal table takes up 53 to 63 percent of the width of the diamond. A very good cut can be 52 percent or 64 to 65 percent.

DEPTH:
- For a round diamond, an ideal depth percentage is between 59 and 62.6 percent
- For a princess cut diamond, choose a diamond with an ideal depth of 68 to 74 percent.
- For a cushion cut diamond, look for an ideal depth that’s between 61 and 68 percent.
- For an Asscher or Emerald cut diamond, an ideal depth is between 61 to 68 percent.
- For an oval cut diamond, an ideal depth is less than 68 percent.
- For a pear shape diamond, an ideal diamond’s depth is less than 68 percent.
- For a radiant cut diamond, an ideal depth is less than 67 percent.
- For a heart shape diamond, make sure to choose a diamond with an ideal depth of 56 to 62 percent.
- For a marquise diamond, an ideal depth range is between 58 and 62 percent of the total width of the diamond, while a very good cut will have a depth range of 56 to 57.9 or 62.1 to 66 percent.
-->
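The depth calculation and the range check from the notes can be sketched as follows. The function names and the IDEAL_DEPTH table are introduced here for illustration; only the round and princess ranges are included, and the measurements are made up.

```python
def depth_percent(depth_mm: float, diameter_mm: float) -> float:
    """Total depth as a percentage of the average girdle diameter."""
    return 100.0 * depth_mm / diameter_mm

# Ideal depth ranges quoted in the notes, in percent.
IDEAL_DEPTH = {'Round': (59.0, 62.6), 'Princess': (68.0, 74.0)}

def has_ideal_depth(shape: str, pct: float) -> bool:
    """Check a depth percentage against the quoted range for a shape."""
    low, high = IDEAL_DEPTH[shape]
    return low <= pct <= high

# Illustrative stone: 3.1 mm deep, 5.0 mm average girdle diameter.
pct = depth_percent(depth_mm=3.1, diameter_mm=5.0)
print(round(pct, 1), has_ideal_depth('Round', pct))
```

The describe() output for depth_percent has a minimum of 0, which is not physically possible, so such rows probably need to be treated as missing before any range check.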

### ========================================================================================

In [30]:
df['eye_clean'].value_counts()

eye_clean
unknown       156916
Yes            61931
Borderline       515
E1               300
No                41
Name: count, dtype: int64

In [31]:
# df['eye_clean_category'] = df['eye_clean'].astype('category')
# df['eye_clean_category'] = df.eye_clean_category.cat.set_categories(
#     ['No', 'Borderline', 'E1', 'Yes'], ordered=True)

The term “eye-clean” is not an official grading; rather, it’s a subjective assessment that can vary from person to person based on their eyesight and the lighting conditions.

If it’s a VVS, FL or IF diamond, you’re paying too much for clarity (unless you’re going for a diamond over 3 carats, then a VVS might be your best value). These are the higher grades, and you can almost always find an eye-clean diamond for less. On the other hand, I1-I3 diamonds are simply too included to be eye-clean in any carat weight above 0.4ct.

While a better clarity graded diamond might seem like the best choice, it isn’t worth the cost. A higher graded diamond will look identical to a lower graded diamond as long as they’re both eye-clean.

- Round Cut and Princess Cut: For 2 carat diamonds and smaller, VS2 and SI1 diamonds are almost always eye-clean (sometimes even SI2s for a round cut). For diamonds over 2 carats, VS1s and VS2s are eye-clean. When your carat weight gets over 3 carats, you may have to look at VVS2 diamonds to get an eye-clean stone. The bigger the diamond (carat weight can play a role), the easier it is to see imperfections.

- Cushion Cut, Oval Cut, Radiant Cut, Marquise and Pear-Shaped: These diamond shapes hide inclusions better than others. Opt for an SI1 or SI2 for the best value.

- Heart-Shaped: VS2 and SI1 heart shape diamonds will be eye-clean and offer you the most for your budget. They hide inclusions better than Round Cuts and Princess Cuts, but not as well as shapes like the cushion cut.

- Emerald cut, asscher cut and baguette: It’s easier to see imperfections in step cut diamonds. Aim for a VS2 in these shapes for the best value.
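The shape and carat guidelines above can be condensed into a small lookup. This is a rough sketch of those rules of thumb, not a grading standard; lowest_eye_clean_grade is a hypothetical helper, and the lower-case shape names are assumptions that may not match the labels in the cut column.

```python
def lowest_eye_clean_grade(shape: str, carat: float) -> str:
    """Lowest clarity grade the guidelines above suggest is still eye-clean."""
    shape = shape.lower()
    if shape in {'round', 'princess'}:
        if carat <= 2:
            return 'SI1'   # VS2 and SI1 are almost always eye-clean
        if carat <= 3:
            return 'VS2'   # over 2 carats: VS1 and VS2
        return 'VVS2'      # over 3 carats
    if shape in {'cushion', 'oval', 'radiant', 'marquise', 'pear'}:
        return 'SI2'       # these shapes hide inclusions best
    if shape == 'heart':
        return 'SI1'
    if shape in {'emerald', 'asscher', 'baguette'}:
        return 'VS2'       # step cuts show imperfections more easily
    raise ValueError(f'no guideline for shape {shape!r}')

print(lowest_eye_clean_grade('Round', 1.2))
print(lowest_eye_clean_grade('Emerald', 0.8))
```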

### ==================================================================================

In [32]:
print(df['polish'].value_counts(), '\n')
print(df['symmetry'].value_counts(), '\n')
print(df['fluor_intensity'].value_counts(), '\n')
print(df['fluor_color'].value_counts())

polish
Excellent    175806
Very Good     42323
Good           1565
Fair              7
Poor              2
Name: count, dtype: int64 

symmetry
Excellent    131619
Very Good     83143
Good           4609
Fair            325
Poor              7
Name: count, dtype: int64 

fluor_intensity
Faint          38302
Medium         20705
Strong         13243
Very Slight     2729
Very Strong     1093
unknown          128
Slight            12
Name: count, dtype: int64 

fluor_color
unknown    203977
Blue        15219
Yellow        400
Green          55
White          42
Orange         10
Name: count, dtype: int64


In [33]:
symmetry_labels = ['Poor', 'Fair', 'Good', 'Very Good', 'Excellent']

# create_categories('polish', symmetry_labels)
# create_categories('symmetry', symmetry_labels)

When diamonds have Slight or Faint Fluorescence from GIA, for example, they don’t appear cloudy. In fact, the slight fluorescence can make the diamond appear more white. But when fluorescence makes the diamond hazy, the stone is less transparent. Light won’t reflect as well and the diamond won’t be as clear or beautiful. 

Here are explanations of when fluorescence can lower a diamond’s quality:

- Strong or very strong blue fluorescence: These diamonds usually appear hazy or cloudy.
- Medium blue fluorescence with a high color grade (G or better): These diamonds also usually appear milky or hazy.
- D color, E color and F color diamonds with any fluorescence:
Diamonds in the colorless range (D-F) don’t benefit from fluorescence. They’re actually less desirable and therefore less valuable, which lowers the diamond price per carat.
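The three rules above can be combined into a single boolean flag. The sketch below is one possible reading of them, run on made-up rows; fluorescence_risk is a hypothetical helper, and it assumes that a missing or 'unknown' intensity means no recorded fluorescence.

```python
import numpy as np
import pandas as pd

def fluorescence_risk(frame: pd.DataFrame) -> pd.Series:
    """True where the rules above suggest fluorescence may lower quality."""
    intensity = frame['fluor_intensity']
    is_blue = frame['fluor_color'].eq('Blue')
    # Assumption: missing or 'unknown' intensity means no recorded fluorescence.
    has_fluor = intensity.notna() & intensity.ne('unknown')

    # Rule 1: strong or very strong blue fluorescence.
    strong_blue = is_blue & intensity.isin(['Strong', 'Very Strong'])
    # Rule 2: medium blue fluorescence with color grade G or better.
    medium_blue_high_color = (
        is_blue & intensity.eq('Medium') & frame['color'].isin(list('DEFG'))
    )
    # Rule 3: colorless grades (D-F) with any fluorescence.
    colorless_with_fluor = frame['color'].isin(list('DEF')) & has_fluor
    return strong_blue | medium_blue_high_color | colorless_with_fluor

# Illustrative rows only.
sample = pd.DataFrame({
    'color': ['D', 'H', 'G', 'K', 'E'],
    'fluor_intensity': ['Faint', 'Strong', 'Medium', 'Faint', np.nan],
    'fluor_color': ['unknown', 'Blue', 'Blue', 'Blue', 'unknown'],
})
print(fluorescence_risk(sample).tolist())
```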

### ====================================================================================

In [34]:

df['total_sales_price'].describe()

count    2.197030e+05
mean     6.908062e+03
std      2.595949e+04
min      2.000000e+02
25%      9.580000e+02
50%      1.970000e+03
75%      5.207000e+03
max      1.449881e+06
Name: total_sales_price, dtype: float64