# Solutions: Unit 7
-------------------

Complete the problems below in your copy of the Jupyter Notebook.

## Problem 7.1.

The `seaborn` library offers some conveniences, but you can also achieve the same results using other methods. Recreate the plot from Section 7.2.1 using just the `matplotlib` and `pandas` packages. You may need to consult the [`matplotlib` documentation](https://matplotlib.org/stable/api/axes_api.html) to add some of the features to this plot. Compare your result to the `seaborn` solution.

In [None]:
# problem 7.1. solution

import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot')

# read in the physical properties dataset that we have used previously
films_df = pd.read_excel('../../data/film_testing.xlsx', sheet_name='physical_properties')

# filter the data to select just the Elongation at Break property
elongation_df = films_df[films_df['Property']=='Elongation at Break']

# pivot the data to produce columns by direction, calculating the mean and standard deviation
elongation_pt = elongation_df.pivot_table(index='FilmID', columns='Direction', 
                                          values='Measurement', aggfunc=['mean', 'std'])

# resets the index to numerical values, instead of the FilmID labels
# FilmID becomes a data column
elongation_pt.reset_index(inplace=True)
elongation_pt
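
Passing a list to `aggfunc` produces two-level column labels, which is why the plotting cell selects columns with pairs such as `['mean', 'MD']` and `['FilmID', '']`. The sketch below illustrates this on a small made-up table; the `demo_` names and all values are invented for illustration and are not taken from `film_testing.xlsx`.

```python
import pandas as pd

# hypothetical measurements, invented for illustration only
demo_df = pd.DataFrame({
    'FilmID': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Direction': ['MD', 'MD', 'TD', 'TD', 'MD', 'MD', 'TD', 'TD'],
    'Measurement': [100.0, 110.0, 200.0, 220.0, 300.0, 320.0, 400.0, 440.0],
})

# same pivot as in the solution: one row per film, one column per
# (aggregation, direction) pair
demo_pt = demo_df.pivot_table(index='FilmID', columns='Direction',
                              values='Measurement', aggfunc=['mean', 'std'])
demo_pt.reset_index(inplace=True)

# the column labels are now pairs; FilmID gets an empty second level
print(demo_pt.columns.tolist())

# select a single column by giving both levels
print(demo_pt['mean', 'MD'].tolist())
```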

In [None]:
# start the plot
fig, ax = plt.subplots()

# need to set the bar width, so that we can shift the two data series
bar_width = 0.4

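# plot the MD series: with align='edge', a negative bar width places the bar to the left of the tick
# the TD series uses a positive width, so its bar sits to the right of the same tick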
ax.bar(elongation_pt.index, elongation_pt['mean', 'MD'], 
       align='edge', width=-bar_width, yerr=elongation_pt['std', 'MD'], 
       tick_label=elongation_pt['FilmID', ''], label='MD')

ax.bar(elongation_pt.index, elongation_pt['mean', 'TD'], 
       align='edge', width=bar_width, yerr=elongation_pt['std', 'TD'], 
       tick_label=elongation_pt['FilmID', ''], label='TD')

ax.legend(title='Film Orientation')
ax.set_xlabel("Film Grade")
ax.set_ylabel("Elongation at Break (%)")
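
As a shorter alternative that still uses only `pandas` and `matplotlib`, `DataFrame.plot.bar()` accepts a `yerr` DataFrame whose columns match the plotted columns, and it handles the bar offsets and tick labels itself. The following is a minimal sketch on invented data; the `demo_` names and the values are hypothetical and do not come from the course dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical measurements: 3 films x 2 directions x 2 repeats
demo_df = pd.DataFrame({
    'FilmID': ['A'] * 4 + ['B'] * 4 + ['C'] * 4,
    'Direction': ['MD', 'MD', 'TD', 'TD'] * 3,
    'Measurement': [500.0, 520.0, 610.0, 650.0,
                    450.0, 470.0, 580.0, 600.0,
                    300.0, 340.0, 420.0, 440.0],
})

demo_pt = demo_df.pivot_table(index='FilmID', columns='Direction',
                              values='Measurement', aggfunc=['mean', 'std'])

fig, ax = plt.subplots()

# demo_pt['mean'] and demo_pt['std'] share the columns MD and TD,
# so pandas pairs each bar series with its error bars
demo_pt['mean'].plot.bar(yerr=demo_pt['std'], ax=ax, rot=0)

ax.legend(title='Film Orientation')
ax.set_xlabel('Film Grade')
ax.set_ylabel('Elongation at Break (%)')
```

The explicit `ax.bar()` calls in the solution give finer control over bar placement, while this version trades that control for brevity.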

## Problem 7.2.

The `seaborn` library is most useful with long-form datasets, but this isn't always what you have to work with. Research the [`pandas.DataFrame.melt()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.melt.html) function and use it to convert the file `film_classification_extended.csv` from wide-form to long-form, retaining the film type as an identifier variable.

Then, create a faceted plot showing the distributions of the tensile strength measurements (`strength-MD`, `strength-TD`). Display film types by row and the two measured properties by column.

In [None]:
# problem 7.2. solution

import pandas as pd
import seaborn as sns

# load the film classification dataset from Unit 6 problems
film_class_df = pd.read_csv('../../data/film_classification_extended.csv')
film_class_df.head()

In [None]:
# "melt" the DataFrame to long-form, preserving film type as an identifier
# by selecting only the value_vars columns of interest, we can avoid filtering the DataFrame later
film_long_df = film_class_df.melt(id_vars='filmtype', value_vars=['strength-MD', 'strength-TD'])

# ALTERNATIVE: you could melt all columns, and then filter the DataFrame to the values of interest
# film_long_df = film_long_df[(film_long_df['variable']=='strength-MD') | (film_long_df['variable']=='strength-TD')]

film_long_df.head()
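
By default, `melt()` names the two new columns `variable` and `value`, which is why those names are used in the plotting cell. The sketch below shows the reshaping on a tiny invented table (the `demo_` names, the film type labels and all values are hypothetical stand-ins), and uses the optional `var_name` and `value_name` arguments to choose more descriptive column names.

```python
import pandas as pd

# hypothetical wide-form data: one row per film, one column per property
demo_wide = pd.DataFrame({
    'filmtype': ['blown', 'cast'],
    'dart': [1.2, 0.8],
    'strength-MD': [210.0, 180.0],
    'strength-TD': [190.0, 150.0],
})

# default column names: 'variable' holds the old column label,
# 'value' holds the measurement
demo_long = demo_wide.melt(id_vars='filmtype',
                           value_vars=['strength-MD', 'strength-TD'])
print(demo_long)

# optional: give the new columns more descriptive names
demo_named = demo_wide.melt(id_vars='filmtype',
                            value_vars=['strength-MD', 'strength-TD'],
                            var_name='property', value_name='strength')
print(demo_named.columns.tolist())
```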

In [None]:
# use displot() to plot the faceted histograms
g = sns.displot(data=film_long_df,
                x='value', 
                row='filmtype', col='variable',
                facet_kws=dict(margin_titles=True))

# relabel the axes
g.set_axis_labels('Tensile Strength (MPa)', 'Count')

## Problem 7.3.

To visualize the relationships between variables in a dataset, it is common practice to create a faceted grid, with each variable represented as both a row and a column. Where the variable in the row matches the variable in the column, the histogram for that variable is plotted. Where the variable in the row differs from the variable in the column, a scatterplot of this pair-wise relationship is plotted.

$$\begin{array}{|r|ccc|}
\hline
var1 & hist & scatter & scatter \\
var2 & scatter & hist & scatter \\
var3 & scatter & scatter & hist \\
\hline
 &  var1 & var2 & var3 \\
 \hline
\end{array}$$

Use the `dart`, `strength-MD` and `strength-TD` variables from the `film_classification_extended.csv` dataset. You can leave the data in wide-form for this exercise. 

1. Create this faceted plot using the `plt.subplots()` function and basic `matplotlib` functionality.
2. Review the documentation for the [`seaborn.pairplot()`](https://seaborn.pydata.org/generated/seaborn.pairplot.html) function and create a second version of the plot using `seaborn`. Also use the `filmtype` attribute to set the color of the points.

In [None]:
# problem 7.3. solution

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('ggplot')

# load the film classification dataset from Unit 6 problems
film_class_df = pd.read_csv('../../data/film_classification_extended.csv')
film_class_df.head()

In [None]:
# create the faceted grid
fig, ax = plt.subplots(nrows=3, ncols=3, sharex='col', dpi=120)

# plot the histograms, removing the y tick labels for the histograms
ax[0, 0].hist(film_class_df['dart'], density=True)
ax[0, 0].set_yticklabels([])

ax[1, 1].hist(film_class_df['strength-MD'], density=True)
ax[1, 1].set_yticklabels([])

ax[2, 2].hist(film_class_df['strength-TD'], density=True)
ax[2, 2].set_yticklabels([])

# scatterplots with s=2 to make the marker smaller
# plot the scatterplots in the first column
ax[1, 0].scatter(film_class_df['dart'], film_class_df['strength-MD'], s=2)
ax[2, 0].scatter(film_class_df['dart'], film_class_df['strength-TD'], s=2)

ax[2, 0].set_xlabel('Dart Impact', fontsize='x-small')
ax[0, 0].set_ylabel('Dart Impact', fontsize='x-small')

# plot the scatterplots in the second column
ax[0, 1].scatter(film_class_df['strength-MD'], film_class_df['dart'], s=2)
ax[2, 1].scatter(film_class_df['strength-MD'], film_class_df['strength-TD'], s=2)

ax[2, 1].set_xlabel('Tensile Strength\n(MD)', fontsize='x-small')
ax[1, 0].set_ylabel('Tensile Strength\n(MD)', fontsize='x-small')

# plot the scatterplots in the third column
ax[0, 2].scatter(film_class_df['strength-TD'], film_class_df['dart'], s=2)
ax[1, 2].scatter(film_class_df['strength-TD'], film_class_df['strength-MD'], s=2)

ax[2, 2].set_xlabel('Tensile Strength\n(TD)', fontsize='x-small')
ax[2, 0].set_ylabel('Tensile Strength\n(TD)', fontsize='x-small')

# set the y-axis ticks for the first row
ax[0, 1].set_yticks(np.arange(0, 2, 0.5))
ax[0, 1].tick_params(labelsize='x-small')

ax[0, 2].set_yticks(np.arange(0, 2, 0.5))
ax[0, 2].tick_params(labelsize='x-small')

# set the y-axis ticks for the second row
ax[1, 0].set_yticks(np.arange(0, 400, 50))
ax[1, 0].tick_params(labelsize='x-small')

ax[1, 2].set_yticks(np.arange(0, 400, 50))
ax[1, 2].tick_params(labelsize='x-small')

# set the y-axis ticks for the third row
ax[2, 0].set_yticks(np.arange(0, 400, 50))
ax[2, 0].tick_params(labelsize='x-small')

ax[2, 1].set_yticks(np.arange(0, 400, 50))
ax[2, 1].tick_params(labelsize='x-small')

# set the font size for the bottom right plot tick labels
ax[2, 2].tick_params(labelsize='x-small')
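
The cell above writes out each panel by hand. The same layout can also be generated with a nested loop over the variable names, which extends more easily to additional variables. This is only a sketch on randomly generated stand-in data (the `demo_df` values are not from `film_classification_extended.csv`), and it leaves the tick positions at the `matplotlib` defaults rather than setting them explicitly.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical stand-in data with the same column names as the exercise
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    'dart': rng.normal(1.0, 0.2, 50),
    'strength-MD': rng.normal(200, 30, 50),
    'strength-TD': rng.normal(180, 30, 50),
})
variables = ['dart', 'strength-MD', 'strength-TD']

n = len(variables)
fig, ax = plt.subplots(nrows=n, ncols=n, sharex='col')

for i, row_var in enumerate(variables):
    for j, col_var in enumerate(variables):
        if i == j:
            # diagonal: histogram of the variable, y tick labels hidden
            ax[i, j].hist(demo_df[col_var], density=True)
            ax[i, j].tick_params(labelleft=False)
        else:
            # off-diagonal: column variable on x, row variable on y
            ax[i, j].scatter(demo_df[col_var], demo_df[row_var], s=2)
        ax[i, j].tick_params(labelsize='x-small')

    # label the left-hand column and the bottom row only
    ax[i, 0].set_ylabel(row_var, fontsize='x-small')
    ax[n - 1, i].set_xlabel(row_var, fontsize='x-small')
```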

In [None]:
# we can recreate this using pairplot() in a single line
g = sns.pairplot(film_class_df, hue='filmtype', vars=['dart', 'strength-MD', 'strength-TD'])

--------------
## Next Steps:

1. Advance to [Unit 8](../08-image-analysis/unit08-lesson.ipynb) when you're ready for the next step.