# Week 06 Worksheet, part B

This week's worksheet will be an introduction to the [Grammar of Graphics](https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448/ref=as_li_ss_tl) as implemented in the plotting software [plotnine](https://plotnine.org) in Python.  The Grammar of Graphics was first made popular by the R package [ggplot2](https://ggplot2.tidyverse.org).  So by learning the grammar, even if in Python, you'll really be learning a style of plotting that is incredibly popular in data science.

## 1. imports

Import the package `plotnine` as `pn`, `pandas` as `pd`, and `numpy` as `np`.  Also use the code

`from plotnine.data import diamonds as df`

to import one of the [many](https://plotnine.org/reference/#datasets) datasets from the plotnine package itself.

## 2. point

Make a scatter plot using [`geom_point`](https://plotnine.org/reference/geom_point.html#plotnine.geom_point).  Put the y-axis on a [log10](https://plotnine.org/reference/scale_y_log10.html#plotnine.scale_y_log10) scale.  Add a [title](https://plotnine.org/reference/labs.html#plotnine.labs), and put units in the y-axis label.  Since the dataset contains so many observations, make the points slightly transparent by passing the keyword argument `alpha` to `geom_point()`.

## 3. histogram

Make a [histogram](https://plotnine.org/reference/geom_histogram.html#plotnine.geom_histogram) of the variable `price` with the y-axis scaled to `density`.  Specify no `fill`, a color of your choice, and make the histogram bins transparent.  Overlay a [density plot](https://plotnine.org/reference/geom_density.html#plotnine.geom_density).  [Facet wrap](https://plotnine.org/reference/facet_wrap.html#plotnine.facet_wrap) by `color`.  Explore the [plot themes](https://plotnine.org/reference/#themes) and use the theme you like best.

## 4. density

Make a [density plot](https://plotnine.org/reference/geom_density.html#plotnine.geom_density) and add a [rug plot](https://plotnine.org/reference/geom_rug.html#plotnine.geom_rug) to the x-axis, making the ticks transparent so that their density is easier to see.  Pick another theme for this plot.

## 5. boxplot

Make a [box plot](https://plotnine.org/reference/geom_boxplot.html#plotnine.geom_boxplot) of `price` by `cut`, and also color the box plots by `cut`.

## 6. violin

Make a [violin plot](https://plotnine.org/reference/geom_violin.html#plotnine.geom_violin) of `price` by `color`, and also color the violin plots by `color`.

## 7. violin + boxplot

Make a violin plot of `price` by `color`, and also color the violin plots by `color`.  Overlay box plots.  Adjust the `width` and `fill` of the box plots so as not to take away from the information contained in the violin plot.  Use the keyword argument `outlier_alpha` to hide the "outliers" of the box plot.

## 8. jitter

Import the [temperature](https://raw.githubusercontent.com/roualdes/data/refs/heads/master/temperature.csv) dataset we used in Week 06 Worksheet, part A.  Create a column containing just the months and store it in the original data.  I found it easiest to also make a `month_name` column of type `category` with [ordered levels/categories](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.CategoricalIndex.set_categories.html#pandas.CategoricalIndex.set_categories).
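A minimal sketch of the month columns, on a made-up frame rather than the temperature data. The column name `date` is hypothetical (check the real column names after loading the CSV), and `pd.Categorical` is used here as one of several ways to get ordered categories.

```python
import pandas as pd

# Hypothetical toy frame; the real dataset's date column may be named differently.
toy = pd.DataFrame(
    {"date": pd.to_datetime(["2020-03-15", "2020-01-02", "2020-12-25", "2020-01-20"])}
)

# Numeric month, 1 through 12.
toy["month"] = toy["date"].dt.month

# Month names as an ordered categorical, so sorting and plot axes follow
# calendar order rather than alphabetical order.
month_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
toy["month_name"] = pd.Categorical(
    toy["date"].dt.month_name(), categories=month_order, ordered=True
)
```

Without the ordering, a plot with `month_name` on the x-axis would typically list the months alphabetically.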

Use the Pandas DataFrame method [melt](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html) to transform the data from wide to long: the identifier variable is `month` (or `month_name`), the value variables are `hilo` and `death_valley`.  Choose variable and value names that make sense to you.
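The wide-to-long reshaping can be sketched on a tiny made-up frame. The numbers are invented, and the names `location` and `temperature` are only one possible choice for `var_name` and `value_name`.

```python
import pandas as pd

# Hypothetical wide data: one row per observation, one column per location.
wide = pd.DataFrame(
    {
        "month": [1, 1, 2],
        "hilo": [80.0, 81.5, 79.0],
        "death_valley": [66.0, 70.0, 75.5],
    }
)

# melt stacks the value columns: each (row, value column) pair becomes its own row.
long = wide.melt(
    id_vars="month",
    value_vars=["hilo", "death_valley"],
    var_name="location",       # holds the old column names
    value_name="temperature",  # holds the old cell values
)
```

The long frame has one row per month-and-location pair, which is the shape plotnine expects when a variable such as `location` is mapped to an aesthetic.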

With your new dataframe, use [geom_jitter](https://plotnine.org/reference/geom_jitter.html#plotnine.geom_jitter) to re-create, as best you can, the plot below.

![](https://roualdes.us/math608/temps.png)

## 9. jitter + linerange of quantiles

Make a jitter plot using only the data from Death Valley.  Overlay [line ranges](https://plotnine.org/reference/geom_linerange.html#plotnine.geom_linerange) in two different thicknesses.  The thinner lines should span the 5% and 95% quantiles, hence the central 90% interval of the data.  The thicker lines should span the 25% and 75% quantiles, hence the central 50% interval.  In the end, your plot should look something like the plot below.

![](https://roualdes.us/math608/death_valley_jitter.png)

## 10. line

Import the dataset `economics` from plotnine and use [geom_line](https://plotnine.org/reference/geom_line.html#plotnine.geom_line) to make a time-series plot of the number of unemployed people (in thousands) in the U.S.  Adjust the y-axis label to include units, and add a title.  Change the plot's theme.