# Information Visualization II
## School of Information, University of Michigan

## Week 1: 
- Multivariate/Multidimensional + Temporal

## Assignment Overview
### This assignment's objectives include:

- Review, reflect on, and apply different strategies for multidimensional/multivariate/temporal datasets

- Recreate visualizations and propose new and alternative visualizations using [Altair](https://altair-viz.github.io/) 

### The total score of this assignment will be 100 points:
- You will be producing four visualizations. The first three require you to follow the example closely, but the last is fairly open-ended. For the last one, we'll also ask you to justify why you designed your visualization the way you did.

### Resources:
- Article by [FiveThirtyEight](https://fivethirtyeight.com) available  [online](https://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/) (Hickey, 2014)
- The associated dataset on [Github](https://github.com/fivethirtyeight/data/tree/master/bob-ross)
- A dataset of all the [paintings from the show](https://github.com/jwilber/Bob_Ross_Paintings)
    
    
### Important notes:
1) Grading for this assignment is done entirely by manual inspection. For Problems 1-3, we'll expect you to get pretty close to our example. Problem 4 is more free-form.  

2) There are a few instances where our numbers do not align exactly with those from 538, because we've pre-processed our data a little differently.

3) When turning in your PDF, please use the File -> Print -> Save as PDF option ***from your browser***. Do ***not*** use the File -> Download as -> PDF option. Complete instructions for this are under Resources on the Coursera page for this class.

If you're having trouble with printing, take a look at [this video](https://youtu.be/PiO-K7AoWjk).

In [1]:
# load up the resources we need
import urllib.request
import os.path
from os import path
import pandas as pd
import altair as alt
import numpy as np
from sklearn import manifold
from sklearn.metrics import euclidean_distances
from sklearn.decomposition import PCA
import ipywidgets as widgets
from IPython.display import display
from PIL import Image

## Bob Ross

Today's assignment will have you working with artwork created by [Bob Ross](https://en.wikipedia.org/wiki/Bob_Ross). Bob was a very famous painter who had a televised painting show from 1983 to 1994. Over 31 seasons and approximately 400 paintings, Bob would walk the audience through a painting project, most often a landscape. Bob was famous for telling his audience to paint "happy trees" and for sayings like, "We don't make mistakes, just happy little accidents." His soothing voice and bushy hair are well known to many generations of viewers.

If you've never seen an episode, I might suggest starting with [this one](https://www.youtube.com/watch?v=Fw6odlNp7_8). 

![bob ross](assets/bobrosspaints.png)

Bob Ross left a long legacy of art which makes for an interesting dataset to analyze. It's both temporally rich and has a lot of variables we can code. We'll be starting with the dataset created by 538 for their article on a [Statistical Analysis of Bob Ross](https://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/). The authors of the article coded each painting to indicate what features the image contained (e.g., one tree, more than one tree, what kinds of clouds, etc.). 

In addition, we've downloaded a second dataset that contains the actual images. We know what kind of paint colors Bob used in each episode, and we have used that to create a dataset for you containing the color distributions. For example, we approximate how much '<font color='#614f4b'>burnt umber</font>' he used by measuring the distance (in color space) from each pixel in the image to the color. This is imperfect, of course (paints don't mix this way), but it'll be close enough for our analysis.
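
The color-distance idea can be sketched as follows. This is only an illustration, not the preprocessing actually used to build the dataset: the helper names `hex_to_rgb` and `paint_closeness`, the use of plain Euclidean distance in RGB space, and the rescaling to the 0 to 1 range are all assumptions.

```python
import math

def hex_to_rgb(hex_color):
    """Convert a '#rrggbb' string to a tuple of three 0-255 channel values."""
    hex_color = hex_color.lstrip('#')
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))

# largest possible distance in RGB space (black to white); assumed normalizer
MAX_DIST = math.dist((0, 0, 0), (255, 255, 255))

def paint_closeness(pixels, paint_hex):
    """Mean closeness of pixels to a paint: 1 = identical, 0 = maximally far apart.

    pixels is a list of (r, g, b) tuples with values in 0-255.
    """
    paint = hex_to_rgb(paint_hex)
    scores = [1.0 - math.dist(p, paint) / MAX_DIST for p in pixels]
    return sum(scores) / len(scores)

# a toy "image" of three pixels: burnt umber, white, and black
toy_pixels = [(0x61, 0x4f, 0x4b), (255, 255, 255), (0, 0, 0)]
burnt_umber_score = paint_closeness(toy_pixels, '#614f4b')
white_score = paint_closeness(toy_pixels, '#ffffff')
```

A real pipeline might prefer a perceptual color space over raw RGB; the sketch only shows why a pixel can count partially toward several paints at once.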

In [2]:
# the paints Bob used
rosspaints = ['alizarin crimson','bright red','burnt umber','cadmium yellow','dark sienna', 
              'indian yellow','indian red','liquid black','liquid clear','black gesso',
              'midnight black','phthalo blue','phthalo green','prussian blue','sap green',
              'titanium white','van dyke brown','yellow ochre']

# hex values for the paints above
rosspainthex = ['#94261f','#c06341','#614f4b','#f8ed57','#5c2f08','#e6ba25','#cd5c5c',
                '#000000','#ffffff','#000000','#36373c','#2a64ad','#215c2c','#325fa3',
                '#364e00','#f9f7eb','#2d1a0c','#b28426']

# boolean features about what an image includes
imgfeatures = ['Apple frame', 'Aurora borealis', 'Barn', 'Beach', 'Boat', 
               'Bridge', 'Building', 'Bushes', 'Cabin', 'Cactus', 
               'Circle frame', 'Cirrus clouds', 'Cliff', 'Clouds', 
               'Coniferous tree', 'Cumulis clouds', 'Decidious tree', 
               'Diane andre', 'Dock', 'Double oval frame', 'Farm', 
               'Fence', 'Fire', 'Florida frame', 'Flowers', 'Fog', 
               'Framed', 'Grass', 'Guest', 'Half circle frame', 
               'Half oval frame', 'Hills', 'Lake', 'Lakes', 'Lighthouse', 
               'Mill', 'Moon', 'At least one mountain', 'At least two mountains', 
               'Nighttime', 'Ocean', 'Oval frame', 'Palm trees', 'Path', 
               'Person', 'Portrait', 'Rectangle 3d frame', 'Rectangular frame', 
               'River or stream', 'Rocks', 'Seashell frame', 'Snow', 
               'Snow-covered mountain', 'Split frame', 'Steve ross', 
               'Man-made structure', 'Sun', 'Tomb frame', 'At least one tree', 
               'At least two trees', 'Triple frame', 'Waterfall', 'Waves', 
               'Windmill', 'Window frame', 'Winter setting', 'Wood framed']

# load the data frame
bobross = pd.read_csv("assets/bobross.csv")

# enable correct rendering (unnecessary in later versions of Altair)
alt.renderers.enable('default')

# uses intermediate json files to speed things up
alt.data_transformers.enable('json')

DataTransformerRegistry.enable('json')

We have a few variables defined for you that you might find useful for the rest of this exercise. First is the ```bobross``` dataframe, which has a row for every painting created by Bob (we've removed those created by guest artists).

In [3]:
# run to see what's inside
bobross.sample(5)

Unnamed: 0,EPISODE,TITLE,RELEASE_DATE,Apple frame,Aurora borealis,Barn,Beach,Boat,Bridge,Building,...,phthalo blue,phthalo green,prussian blue,sap green,titanium white,van dyke brown,yellow ochre,img_url,week_number,year
178,S15E09,"""CHRISTMAS EVE SNOW""",6/22/88,0,0,0,0,0,0,0,...,0.457815,0.0,0.477909,0.0,0.395812,0.300251,0.0,https://raw.githubusercontent.com/jwilber/Bob_...,25,1988
78,S07E10,"""MOUNTAIN GLORY""",12/6/85,0,0,0,0,0,0,0,...,0.370474,0.0,0.392799,0.273398,0.389869,0.294471,0.347862,https://raw.githubusercontent.com/jwilber/Bob_...,49,1985
70,S06E13,"""BLAZE OF COLOR""",7/23/85,0,0,0,0,0,0,0,...,0.378326,0.399839,0.0,0.31704,0.301774,0.342608,0.382935,https://raw.githubusercontent.com/jwilber/Bob_...,30,1985
212,S18E08,"""WINTER LACE""",8/23/89,0,0,0,0,0,0,0,...,0.297001,0.0,0.310172,0.0,0.731454,0.103736,0.0,https://raw.githubusercontent.com/jwilber/Bob_...,34,1989
254,S21E11,"""DESERT GLOW""",11/14/90,0,0,0,0,0,0,0,...,0.274712,0.0,0.0,0.199524,0.496273,0.0,0.330561,https://raw.githubusercontent.com/jwilber/Bob_...,46,1990


In the dataframe you will see an episode identifier (EPISODE, which contains the season and episode number), the image title (TITLE), and the release date (RELEASE_DATE, along with a separate column for the year). There are also a number of boolean columns for the features coded by 538. A '1' means the feature is present; a '0' means it is not. A list of those columns is available in the ```imgfeatures``` variable.
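
If you need the season or episode number on its own, one option is to parse the EPISODE string. A minimal sketch, assuming every identifier follows the `SxxEyy` pattern seen in the sample rows above (the helper name `parse_episode` is hypothetical):

```python
import re

def parse_episode(episode_id):
    """Split an identifier such as 'S15E09' into (season, episode) integers."""
    match = re.fullmatch(r'S(\d+)E(\d+)', episode_id)
    if match is None:
        # fail loudly rather than guess at an unexpected format
        raise ValueError(f'unexpected episode identifier: {episode_id!r}')
    return int(match.group(1)), int(match.group(2))

season, episode = parse_episode('S15E09')
```

On the full dataframe, the same regular expression can be applied to the whole column at once with pandas' `Series.str.extract`.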

In [4]:
# run to see what's inside
print(imgfeatures)

['Apple frame', 'Aurora borealis', 'Barn', 'Beach', 'Boat', 'Bridge', 'Building', 'Bushes', 'Cabin', 'Cactus', 'Circle frame', 'Cirrus clouds', 'Cliff', 'Clouds', 'Coniferous tree', 'Cumulis clouds', 'Decidious tree', 'Diane andre', 'Dock', 'Double oval frame', 'Farm', 'Fence', 'Fire', 'Florida frame', 'Flowers', 'Fog', 'Framed', 'Grass', 'Guest', 'Half circle frame', 'Half oval frame', 'Hills', 'Lake', 'Lakes', 'Lighthouse', 'Mill', 'Moon', 'At least one mountain', 'At least two mountains', 'Nighttime', 'Ocean', 'Oval frame', 'Palm trees', 'Path', 'Person', 'Portrait', 'Rectangle 3d frame', 'Rectangular frame', 'River or stream', 'Rocks', 'Seashell frame', 'Snow', 'Snow-covered mountain', 'Split frame', 'Steve ross', 'Man-made structure', 'Sun', 'Tomb frame', 'At least one tree', 'At least two trees', 'Triple frame', 'Waterfall', 'Waves', 'Windmill', 'Window frame', 'Winter setting', 'Wood framed']


The columns that contain the amount of each color in the paintings are listed in ```rosspaints```. There is also an analogous list variable called ```rosspainthex``` that has the hex values for the paints. These hex values are approximate.

In [5]:
# run to see what's inside
print("paint names",rosspaints)
print("")
print("hex values", rosspainthex)

paint names ['alizarin crimson', 'bright red', 'burnt umber', 'cadmium yellow', 'dark sienna', 'indian yellow', 'indian red', 'liquid black', 'liquid clear', 'black gesso', 'midnight black', 'phthalo blue', 'phthalo green', 'prussian blue', 'sap green', 'titanium white', 'van dyke brown', 'yellow ochre']

hex values ['#94261f', '#c06341', '#614f4b', '#f8ed57', '#5c2f08', '#e6ba25', '#cd5c5c', '#000000', '#ffffff', '#000000', '#36373c', '#2a64ad', '#215c2c', '#325fa3', '#364e00', '#f9f7eb', '#2d1a0c', '#b28426']


### Problem 1  (20 points)

As a warmup, we're going to have you recreate the [first chart from the Bob Ross article](assets/bob_ross_538.png) (source: [Statistical Analysis of Bob Ross](https://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/)). This one simply shows a bar chart for the percent of images that have certain features. The Altair version is:

!["Bob Ross feature distribution"](assets/bob_ross_altair.png)

We'll be using the 538 theme for styling, so you don't have to do much beyond creating the chart (but do note that we want to see the percents, titles, and modifications to the axes). 

You will replace the code for ```makeBobRossBar()``` and have it return an Altair chart.  We suggest you first create a table that contains the names of the features and the percents.  Something like this:

!["Sample Table"](assets/feature_table.png)

Recall that this is the 'long form' representation of the data, which makes it easier to build a visualization from. Also, **note the order of the bars. It's not arbitrary, so please re-create it.**
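
To make the 'long form' idea concrete, here is a small sketch on made-up data. The `toy` dataframe and the column names `feature` and `percent` are assumptions for illustration only; the real feature columns live in `bobross` and are named in `imgfeatures`.

```python
import pandas as pd

# toy stand-in for a few boolean feature columns (one row per painting)
toy = pd.DataFrame({
    'Clouds': [1, 0, 1, 1],
    'Lake':   [0, 0, 1, 0],
    'Barn':   [0, 0, 0, 1],
})

# the mean of a 0/1 column is the fraction of rows where the feature is present
feature_table = (
    toy.mean()
       .rename_axis('feature')            # name the index so it becomes a column
       .reset_index(name='percent')       # one row per feature: (feature, percent)
       .sort_values('percent', ascending=False, ignore_index=True)
)
```

Each row of `feature_table` now describes one bar, which is the shape Altair encodings expect.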

In [6]:
def makeBobRossBar():
    # implement this function to return an altair chart
    
    alt.themes.enable("fivethirtyeight")
    
    # keep only the boolean feature columns
    bobross_df = bobross[bobross.columns.intersection(imgfeatures)]

    # long form: one row per (painting, feature), then count paintings per feature
    bobross_df = pd.melt(bobross_df, var_name='index', value_name='value')
    bobross_df = bobross_df.groupby(['index']).sum().reset_index()

    # convert counts to the fraction of all paintings
    bobross_df['value'] = bobross_df['value'] / len(bobross)

    # keep features present in more than 1.5% of paintings, ordered by frequency
    bobross_df = bobross_df[bobross_df['value'] > .015]
    bobross_df = bobross_df.sort_values(by=['value'], ascending=False)
    
    data = bobross_df
    
    # horizontal bars, sorted by descending value, with no x axis
    bars = alt.Chart(data).mark_bar().encode(
        x=alt.X('value:Q', axis=None),
        y=alt.Y('index:O', sort='-x', axis=alt.Axis(title=None, domainOpacity=0))
    )

    # percentage labels at the end of each bar
    text = bars.mark_text(
        align='left',
        baseline='middle',
        dx=3  # nudges text to the right so it doesn't overlap the bar
    ).encode(
        text=alt.Text('value:Q', format='.0%')
    )
    
    chart = (bars + text).properties(
        title={
            'text': ['The Paintings of Bob Ross'],
            'subtitle': ['Percentage containing each element']
        }
    ).configure_view(
        strokeWidth=0
    )
    
    return chart

In [7]:
# run this code to validate
alt.themes.enable('fivethirtyeight')
makeBobRossBar()

## Problem 2 (25 points)

The 538 article ([Statistical Analysis of Bob Ross](https://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/)) has a long analysis of conditional probabilities. Essentially, we want to know the probability of one feature given another (e.g., what is the probability of Snow given Trees?). The article calculates this over the entire history of the show, but we would like to visualize these probabilities over time. Have they been constant, or have they evolved? We will only be doing this for a few variables (otherwise, we'd have a matrix of over 3000 small charts). Specifically, we care about images that contain: 'At least one tree', 'At least two trees', 'Clouds', 'Grass', 'At least one mountain', and 'Lake'. Each small multiple plot will be a line chart showing one conditional probability over time. The matrix "cell" indicates which pair of variables is being considered (e.g., the probability of at least two trees given at least one tree is in the 2nd row, first column of our example).

Your task will be to generate the small multiples plot below:

!["Small multiples"](assets/matrix_small.png)

The full image is [available here](assets/matrix_full.png). While your small multiples visualization should contain all this data, you can ***feel free to style it as you think is appropriate***. We will be grading (minimally) on aesthetics. Implement the code for the function: ```makeBobRossCondProb()``` to return this chart.

Some notes on doing this exercise:

* If you don't remember how to calculate conditional probabilities, take a look at the article. Remember, we want the conditional probabilities computed over the images from a specific year. This is a direct application of the definition of conditional probability (the basis of Bayes' Theorem): P(A | B) = P(A and B) / P(B). We implemented a function called ```condprobability(...)``` as you can see below. You can do the same or pick your own strategy for this.

* We suggest creating a long-form representation of the table for this data. For example, here's a sample of ours (you can use this to double check your calculations):

!["Long form conditional probabilities"](assets/cond_prob_table.png)

* There are a number of strategies to build the small-multiple plots. Some are easier than others. You will find in this case that some combinations of repeated charts and faceting will not work. However, you should be able to use the standard concatenation approaches in combination with repeated charts or faceting.
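
As a worked example of the per-year calculation described in the notes above, here is a minimal sketch on made-up data. The `toy` dataframe and the helper name `cond_prob` are hypothetical, and returning NaN when the given feature never occurs is one possible convention, not a requirement of the assignment.

```python
import pandas as pd

# toy data: four paintings in 1983 and two in 1984
toy = pd.DataFrame({
    'year':   [1983, 1983, 1983, 1983, 1984, 1984],
    'Clouds': [1, 1, 0, 1, 0, 0],
    'Lake':   [1, 0, 0, 1, 1, 0],
})

def cond_prob(df, a, b, year):
    """P(a | b) among the paintings of one year, or NaN if b never occurs."""
    in_year = df[df['year'] == year]
    has_b = in_year[in_year[b] == 1]
    if len(has_b) == 0:
        return float('nan')
    # among paintings that have b, the fraction that also have a
    return (has_b[a] == 1).mean()

# in 1983, three paintings have clouds and two of those also have a lake
p_lake_given_clouds = cond_prob(toy, 'Lake', 'Clouds', 1983)
```

Here `p_lake_given_clouds` is 2/3, while `cond_prob(toy, 'Clouds', 'Lake', 1983)` is 1.0, a reminder that P(A | B) and P(B | A) are generally different.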

In [8]:
def condprobability(totest=['At least one tree','At least two trees','Clouds','Grass','At least one mountain','Lake']):
    # returns a long-form dataframe with one row per (key1, key2, year), where
    # prob = P(key1 | key2) among the paintings released in that year
    # you can make variants of this function as you see fit
    
    # copy the feature names so the default list is never mutated between calls
    imgs = list(totest)

    # keep only the feature columns of interest plus the year
    bobross_df2 = bobross[bobross.columns.intersection(imgs + ['year'])]

    final_df = []
    for i in bobross_df2['year'].unique():
        # restrict to the paintings from a single year
        bobross_df3 = bobross_df2[bobross_df2['year'] == i]
        for x in imgs:
            for y in imgs:
                # number of paintings that contain the given feature y
                count_b = len(bobross_df3[bobross_df3[y] == 1])
                # number of paintings that contain both x and y
                count_ab = len(bobross_df3[(bobross_df3[x] == 1) & (bobross_df3[y] == 1)])
                # P(x | y); undefined (NaN) when y never appears in this year
                condit = count_ab / count_b if count_b > 0 else np.nan
                final_df.append([x, y, i, condit])

    c_prob = pd.DataFrame(final_df, columns=['key1', 'key2', 'year', 'prob'])
    return c_prob

In [9]:
# test = condprobability()
# test[test['year']==1991]

In [10]:
def makeBobRossCondProb(totest=['At least one tree','At least two trees','Clouds','Grass','At least one mountain','Lake']):
    # compute P(key1 | key2) per year for the requested features
    # (a copy is passed so that totest itself is never modified)
    data = condprobability(list(totest))
    
    # build one faceted row of line charts per "probability of" feature
    charts = []
    for i in totest:
        source = data[data['key1'] == i]
        charts.append(alt.Chart(source).mark_line().encode(
            x=alt.X('year:O', axis=alt.Axis(title=None, tickCount=2)),
            y=alt.Y('prob:Q', axis=alt.Axis(title=['Probability of', i]))
        ).properties(
            width=50,
            height=50
        ).facet(
            column=alt.Column('key2:N', sort=totest, title='Given...')
        ))
    
    # stack the rows vertically to form the matrix
    chart = alt.vconcat(*charts)
    
    return chart


In [11]:
makeBobRossCondProb()

### Additional comments

If you deviated from our example, please use this cell to give us additional information about your design choices and why you think they are an improvement.


## Problem 3 (25 points)

Recall that in some cases of multidimensional data a good strategy is to use dimensionality reduction to visualize the information. Here, we would like to understand how images are similar to each other in 'feature' space. Specifically, how similar are they based on the image features? Are images that have beaches close to those with waves? 

We are going to create a 2D MDS plot using the scikit-learn package. We're going to do most of this for you in the next cell. Essentially, we will use the Euclidean distance between two images, computed from their image feature arrays, to create the layout. Your plot may look slightly different from ours depending on the random seed (e.g., rotated or reflected), but in the end, it should be close. If you're interested in how this is calculated, we suggest taking a look at [this documentation](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html).

Note that the next cell may take a minute or so to run, depending on the server.  
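
To build intuition for the distances that MDS tries to preserve, here is a small sketch on made-up feature vectors (the `features` array is purely illustrative). For 0/1 features, the squared Euclidean distance between two paintings equals the number of features on which they differ.

```python
import numpy as np

# three toy paintings described by four 0/1 features
features = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
])

# pairwise differences via broadcasting, then the Euclidean norm of each difference
diff = features[:, None, :] - features[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
```

Paintings 0 and 1 differ in one feature, so `dist[0, 1]` is 1; paintings 0 and 2 differ in all four, so `dist[0, 2]` is 2. MDS then looks for 2D positions whose pairwise distances approximate this matrix.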

In [17]:
# create the seed
seed = np.random.RandomState(seed=3)

# generate the MDS configuration, we want 2 components, etc. You can tweak this if you want to see how
# the settings change the layout
mds = manifold.MDS(n_components=2, max_iter=3000, eps=1e-9, random_state=seed, n_jobs=1)

# fit the data. At the end, 'pos' will hold the x,y coordinates
pos = mds.fit(bobross[imgfeatures]).embedding_

# we'll now load those values into the bobross data frame, giving us a new x column and y column
bobross['x'] = [x[0] for x in pos]
bobross['y'] = [x[1] for x in pos]

In [18]:
bobross.head()

Unnamed: 0,EPISODE,TITLE,RELEASE_DATE,Apple frame,Aurora borealis,Barn,Beach,Boat,Bridge,Building,...,prussian blue,sap green,titanium white,van dyke brown,yellow ochre,img_url,week_number,year,x,y
0,S01E01,"""A WALK IN THE WOODS""",1/11/83,0,0,0,0,0,0,0,...,0.335426,0.291454,0.341764,0.301478,0.0,https://raw.githubusercontent.com/jwilber/Bob_...,2,1983,0.468509,-0.655269
1,S01E02,"""MT. MCKINLEY""",1/11/83,0,0,0,0,0,0,0,...,0.333043,0.205642,0.571301,0.223015,0.0,https://raw.githubusercontent.com/jwilber/Bob_...,2,1983,-0.787913,2.087431
2,S01E03,"""EBONY SUNSET""",1/18/83,0,0,0,0,0,0,0,...,0.226244,0.744489,0.047932,0.718768,0.0,https://raw.githubusercontent.com/jwilber/Bob_...,3,1983,0.088857,2.670687
3,S01E04,"""WINTER MIST""",1/25/83,0,0,0,0,0,0,0,...,0.489283,0.0,0.48124,0.269215,0.0,https://raw.githubusercontent.com/jwilber/Bob_...,4,1983,-1.435869,0.895231
4,S01E05,"""QUIET STREAM""",2/1/83,0,0,0,0,0,0,0,...,0.364585,0.267708,0.341428,0.286462,0.0,https://raw.githubusercontent.com/jwilber/Bob_...,5,1983,0.668885,-0.865571


Your task is to implement the visualization for the MDS layout. We will be using a new mark, ```mark_image```, for this. You can read all about this mark on the Altair site [here](https://altair-viz.github.io/user_guide/marks.html#user-guide-image-mark). Note that we have already saved the images for you. They are accessible in the img_url column in the bobross table. You will use the ```url``` encoding channel together with ```mark_image``` to make this work.

In this case, we would also like to emphasize all the images that *have* a specific feature. So when you define your ```genMDSPlot()``` function below, it should take a key string as an argument (e.g., 'Beach') and visually highlight those images. A simple way to do this is to use a second mark underneath the image (e.g., a rectangle) that is a different color based on the absence or presence of the feature.  Here's an example output for ```genMDSPlot("Palm trees")```:

!["mds"](assets/mds_small.png)

Click [here](assets/mds_large.png) for a large version of this image. Notice the orange boxes indicating where the Palm tree images are. Note that we have styled the MDS plot to not have axes. Recall that these are meaningless in MDS 'space' (this is not a scatterplot, it's a projection).

In [19]:
def genMDSPlot(key):
    # return an altair chart (e.g., return alt.Chart(...))
    # key is a string indicating which images should be visually highlighted (i.e., images containing the feature
    # should be made salient)
    
    # keep only the columns the chart needs
    data = bobross.copy()
    data = data[['img_url', 'x', 'y', key]]
    
    # painting thumbnails positioned at their MDS coordinates
    images = alt.Chart(data).mark_image(width=15, height=15).encode(
        x=alt.X('x:Q', axis=None),
        y=alt.Y('y:Q', axis=None),
        url='img_url'
    )
    
    # slightly larger rectangles, colored by presence (1) or absence (0) of the feature
    colors = alt.Chart(data).mark_rect(width=17, height=17).encode(
        x=alt.X('x:Q', axis=None),
        y=alt.Y('y:Q', axis=None),
        color=alt.Color(key, type="nominal")
    )
    
    # rectangles go first so the images are drawn on top of them
    return colors + images
    

In [20]:
data = bobross.copy()
key = 'Beach'
data = data[['x','y',key]]
data.head()

Unnamed: 0,x,y,Beach
0,0.468509,-0.655269,0
1,-0.787913,2.087431,0
2,0.088857,2.670687,0
3,-1.435869,0.895231,0
4,0.668885,-0.865571,0


In [21]:
genMDSPlot('Beach')

We are going to create an interactive widget that allows you to select the feature you want to be highlighted. If you implemented your ```genMDSPlot``` code correctly, the plot should change when you select new items from the list. We would ordinarily do this directly in Altair, but because we don't have control over the way you created your visualization, it's easiest for us to use the widgets built into Jupyter.

It should look something like this:

!["mds interactive"](assets/interactive_mds.png)

It may take a few seconds the first time you run this to download all the images.

In [22]:
# note that it might take a few seconds for the images to download
# depending on your internet connection

output = widgets.Output()

def clicked(b):
    output.clear_output()
    with output:
        highlight = filterdrop.value
        if (highlight == ""):
            print("please enter a query")
        else:
            genMDSPlot(highlight).display()


featurecount = bobross[imgfeatures].sum()

filterdrop = widgets.Dropdown(
    options=list(featurecount[featurecount > 2].keys()),
    description='Highlight:',
    disabled=False,
)

# only react to changes of the selected value (not every internal trait change)
filterdrop.observe(clicked, names='value')

display(filterdrop,output)

with output:
    genMDSPlot('Barn').display()


Dropdown(description='Highlight:', options=('Barn', 'Beach', 'Bridge', 'Bushes', 'Cabin', 'Cactus', 'Cirrus cl…

Output()

## Problem 4 (30 points: 25 for solution, 5 for explanation)

Your last problem is fairly open-ended in terms of visualization. We would like to analyze the colors used in different images for a given season as a small multiples plot. You can pick how you represent your small multiples, but we will ask you to defend your choices below.  You must implement the function ```colorSmallMultiples(season)``` that takes a season number as input (e.g., 2) and returns an Altair chart. The "multiples" should be at the painting level: one multiple per painting, with all paintings from the requested season shown at once.

You can go with something as simple as this:

!["simple small multiples"](assets/bob_ross_color_glyph.png)

This visualization has a row for every painting and one colored circle per paint (in the color of that paint). Each circle is sized based on the amount of the corresponding paint used in the image. 

You can also go with something as crazy as this:

!["face small multiples"](assets/bob_ross_face.png)

Here, we've overlaid circles as curls in Bob's massive hair. We're not claiming this is an effective solution, but you're welcome to do this (or anything else) as long as you describe the pros and cons of your choices. And, yes, we generated both examples using Altair.

Again, the relevant columns are listed in ```rosspaints``` (there are 18 of them). The values range from 0 to 1 based on the fraction of pixel color allocated to that specific paint. The ```rosspainthex``` list has the corresponding hex values for the paint colors.

Make sure your visualization is actually a small multiple approach. There should be "mini" visualizations for each painting.
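
One way to prepare data for such a chart is to pair each paint with its hex value and build one record per (painting, paint) pair. The sketch below uses only three paints, with amounts copied from the "DESERT GLOW" sample row shown earlier; the variable names and the rescaling to shares that sum to 1 are illustrative assumptions, not a required design.

```python
# a short excerpt of the two parallel lists defined earlier in the notebook
paints = ['phthalo blue', 'sap green', 'titanium white']
hexes = ['#2a64ad', '#364e00', '#f9f7eb']

# pair each paint name with its hex value
paint_to_hex = dict(zip(paints, hexes))

# paint amounts for one painting (three columns of the "DESERT GLOW" row)
amounts = {'phthalo blue': 0.274712, 'sap green': 0.199524, 'titanium white': 0.496273}

# long-form records, skipping unused paints; shares are rescaled to sum to 1
total = sum(amounts.values())
records = [
    {'paint': name, 'hex': paint_to_hex[name], 'share': value / total}
    for name, value in amounts.items()
    if value > 0
]
```

With the hex value stored alongside each record, an Altair color encoding with `scale=None` can draw each mark in the actual paint color.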

In [23]:
def colorSmallMultiples(season):
    
    # return an Altair chart
    # season is the integer representing the season of the show we are interested in. Limit your images
    # to that season in the small multiples display.
    
    # keep only the paint columns plus the painting identifiers
    needed_col = rosspaints + ['EPISODE', 'TITLE']
    bobross_df = bobross[bobross.columns.intersection(needed_col)]

    # long form: one row per (painting, paint) pair
    data = pd.melt(bobross_df, id_vars=['EPISODE', 'TITLE'], value_vars=rosspaints)

    # attach the hex value of each paint
    hexes = pd.DataFrame({'variable': rosspaints, 'hex': rosspainthex})
    data = data.merge(hexes, on='variable', how='left')

    # EPISODE looks like 'S04E01'; characters 1-2 hold the season number
    data["season"] = data['EPISODE'].str[1:3].astype(int)

    # limit to the requested season
    data = data[data['season'] == season]
    data = data.rename(columns={'TITLE': 'Episode Title'})

    # one normalized stacked bar per painting, drawn in the actual paint colors
    chart = alt.Chart(data).mark_bar().encode(
        x=alt.X('value:Q', stack="normalize", axis=alt.Axis(format='%', title=None)),
        color=alt.Color('hex:N', scale=None)
    ).facet(
        facet='Episode Title:N', columns=2
    ).properties(
        title='The Color Breakdown in Season ' + str(season)
    )

    return chart

In [24]:
# run this to test your code for season 1
colorSmallMultiples(1)

In [25]:
# run this to test your code for season 2
colorSmallMultiples(2)

In [26]:
import copy
bobross.head()

season = 4

needed_col = copy.deepcopy(rosspaints)
needed_col.extend(['EPISODE','TITLE'])
needed_col

bobross_df = bobross.copy()
bobross_df = bobross_df[bobross_df.columns.intersection(needed_col)]

bobross_df.head()

data = pd.melt(bobross_df, id_vars=['EPISODE','TITLE'], value_vars=rosspaints)

d = {'variable':rosspaints,'hex':rosspainthex}
df = pd.DataFrame(d)

data = data.merge(df, on='variable', how='left')

data["season"] = data['EPISODE'].str[1:3].astype(int)

data = data[data['season']==season]
data = data.rename(columns = {'TITLE':'Episode Title'})

chart = alt.Chart(data).mark_bar().encode(
    x=alt.X('value:Q', stack="normalize",axis=alt.Axis(format='%',title = None)),
    color=alt.Color('hex:N',scale=None)
).facet(facet='Episode Title:N',columns=2
).properties(title = 'The Color Breakdown in Season ' + str(season)
)

data


Unnamed: 0,EPISODE,Episode Title,variable,value,hex,season
37,S04E01,"""PURPLE SPLENDOR""",alizarin crimson,0.248065,#94261f,4
38,S04E02,"""TRANQUIL VALLEY""",alizarin crimson,0.537490,#94261f,4
39,S04E03,"""MAJESTIC MOUNTAINS""",alizarin crimson,0.500875,#94261f,4
40,S04E05,"""EVENING SEASCAPE""",alizarin crimson,0.415874,#94261f,4
41,S04E06,"""WARM SUMMER DAY""",alizarin crimson,0.376069,#94261f,4
...,...,...,...,...,...,...
6520,S04E08,"""WETLANDS""",yellow ochre,0.466780,#b28426,4
6521,S04E09,"""COOL WATERS""",yellow ochre,0.324393,#b28426,4
6522,S04E10,"""QUIET WOODS""",yellow ochre,0.291199,#b28426,4
6523,S04E12,"""AUTUMN DAYS""",yellow ochre,0.437968,#b28426,4


### Explain your choices

Explain your design here. Describe the pros and cons in terms of visualization principles.


I went with a simpler design for question 4, in order to think through how I might present something like this in real life.  Naturally, my decision has some pros and cons.

#### Pros:
- It is expressive: every data point we wanted encoded is there. The colors used are shown in the stacked bar chart, the share of each color is represented by the width of its bar segment, and each season is broken down by artwork.
- The stacking of small multiples makes it easier to compare artworks (at least within the same column; comparing across columns is less effective).  I experimented with different column counts and settled on two.  Anything greater made it difficult to compare pieces of art, and a single column made for an overwhelmingly long list to compare.
- Every small multiple is intuitive and does not require the reader to learn what they are looking at.

#### Cons:
- While it is relatively easy to see that Ross used more of the shade of white in season 2 episode 1 than in another season 2 episode, for instance, the single axis at the bottom of each column makes it difficult to see just how much more of the white shade Ross used between those episodes.  It is not as effective as it could be for that reason.
- The colors do not have a legend or labels, and some shades are difficult to distinguish from one another.  If this were a chart that I knew would have to be spoken about (a group talking about Ross' use of Prussian Blue vs. Phthalo Blue, for example), I may have approached my design differently.