Skip to content

Bassem30/Diamonds-Analysis

Repository files navigation

Diamonds-Analysis

Dataset The data consists of information regarding 54,000 round-cut diamonds, including price, carat, and other diamond qualities. The dataset can be found in the repository for R's ggplot2 library here, with feature documentation available here.

Summary of Findings In the exploration, I found that there was a strong relationship between the price of a diamond and its carat weight, with modifying effects from the cut, color, and clarity grades given to the diamond. The relationship is approximately linear between price and carat when price is transformed to be on a logarithmic scale and carat transformed to be on a cube-root scale. I found a somewhat surprising result initially when the marginal trend for the cut, color, and clarity variables indicated that higher diamond quality was associated with lower price. However, higher diamond quality was also associated with smaller diamonds. When I isolated diamonds of a single carat weight, there was a clear positive relationship between higher diamond quality and higher diamond price.

Outside of the main variables of interest, I verified the relationship between diamond carat weight and its x, y, and z dimensions. For the dataset given, there was an interesting interaction in the categorical diamond quality features. The lower clarity grades looked like they had slightly better distribution of cut and color grades than diamonds with the higher clarity grades.

Key Insights for Presentation For the presentation, I focus on just the influence of the four Cs of diamonds and leave out most of the intermediate derivations. I start by introducing the price variable, followed by the pattern in carat distribution, then plot the transformed scatterplot.

Afterwards, I introduce each of the categorical variables one by one. To start, I use the violin plots of price and carat across clarity. I'm only looking at the clarity grade plot here since it's the clearest example of how the categorical quality grades affect diamond pricing. The other two categorical variables, cut and color, are covered afterwards, using point plots. I've made sure to use different color palettes for each quality variable to make sure it is clear that they're different between plots.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published