# **SI649 W25 Altair Theme Homework #3**

# Overview


We are focusing on **custom themes & small multiples** in this lab! For this assignment, we will be looking at a *Star Wars* character dataset from [America’s Favorite ‘Star Wars’ Movies (And Least Favorite Characters)](https://fivethirtyeight.com/features/americas-favorite-star-wars-movies-and-least-favorite-characters/) by Walt Hickey.

### Lab Instructions

*   Save, rename, and submit the ipynb file (use your username in the name).
*   Complete all the checkpoints to create the required visualization in each cell.
*   Run every cell (do Runtime -> Restart and run all to make sure you have a clean working version), and upload your .ipynb file to Canvas.
*   For each visualization, there is a space to write down a "Grammar of Graphics" plan, but this is optional for this assignment.
*   If you end up stuck, show us your work by including links (URLs) that you have searched for. You'll get partial credit for showing your work in progress.


In [1]:
# suppress warnings about future deprecations
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# imports we will use
import altair as alt
import pandas as pd

# for large data sets
alt.data_transformers.disable_max_rows()

# read in data
df = pd.read_csv('https://scmcqueen.github.io/si649_hw/skyelerStarWars.csv',header=[0,1],skip_blank_lines=True)



## Part 1: Character Rating Facet Charts

Using an Altair Facet chart, recreate the 'Star Wars' Character Favorability Ratings chart from the [Star Wars article](https://fivethirtyeight.com/features/americas-favorite-star-wars-movies-and-least-favorite-characters/). It should look like this:

<img src="https://scmcqueen.github.io/si649_hw/favorability_SW.png" height="300">

### Step 1: Get data in the correct format
We will get this started for you.

In [2]:
# Read in the data with Character name & review
download = pd.read_csv('https://scmcqueen.github.io/si649_hw/reviews_by_char.csv',index_col=0)
download.head()

| | character | rating |
|---|---|---|
| 0 | Han Solo | Very favorably |
| 2 | Han Solo | Somewhat favorably |
| 3 | Han Solo | Very favorably |
| 4 | Han Solo | Very favorably |
| 5 | Han Solo | Very favorably |


In [3]:
download.rating.value_counts()

rating
Very favorably                                 5166
Somewhat favorably                             2569
Neither favorably nor unfavorably (neutral)    1647
Unfamiliar (N/A)                                852
Somewhat unfavorably                            654
Very unfavorably                                641
Name: count, dtype: int64

In [4]:
# TODO: Map the 'rating' column to a new column that matches the chart's Favorable, Unfavorable, etc.
rating_map = {
    "Very favorably": "Favorable",
    "Somewhat favorably": "Favorable",
    "Neither favorably nor unfavorably (neutral)": "Neutral",
    "Somewhat unfavorably": "Unfavorable",
    "Very unfavorably": "Unfavorable",
    "Unfamiliar (N/A)": "Unfamiliar"
}

download['rating_mapped'] = download['rating'].map(rating_map)
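
`Series.map` with a dictionary returns `NaN` for any value that has no matching key, and `groupby` drops `NaN` keys by default, so a typo in the map would silently remove rows from the later counts. A minimal sketch of a coverage check, assuming a small made-up frame (`toy`) in place of the real `download` data:

```python
import pandas as pd

# Hypothetical sample; the notebook itself works on the `download` DataFrame.
toy = pd.DataFrame({"rating": [
    "Very favorably",
    "Somewhat unfavorably",
    "Unfamiliar (N/A)",
    "Very favourably",  # deliberate typo: not a key in rating_map
]})

rating_map = {
    "Very favorably": "Favorable",
    "Somewhat favorably": "Favorable",
    "Neither favorably nor unfavorably (neutral)": "Neutral",
    "Somewhat unfavorably": "Unfavorable",
    "Very unfavorably": "Unfavorable",
    "Unfamiliar (N/A)": "Unfamiliar",
}

toy["rating_mapped"] = toy["rating"].map(rating_map)

# Source values that were present but had no dictionary entry.
missing = toy["rating_mapped"].isna() & toy["rating"].notna()
unmapped = toy.loc[missing, "rating"].unique().tolist()
print(unmapped)
```

An empty `unmapped` list means every rating was assigned to a group.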

In [5]:
# TODO: in Pandas, get the percent of each rating per character
rating_count = download.groupby(['character','rating_mapped']).size().reset_index(name='count')

# Calculate the percentage of each rating per character, rounded to a whole number
rating_count['percent'] = rating_count.groupby('character')['count'].transform(lambda x: round(x / x.sum()*100))

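# Sort by rating group in the order "Favorable", "Neutral", "Unfavorable", "Unfamiliar", then by character
rating_order = {'Favorable': 1, 'Neutral': 2, 'Unfavorable': 3, 'Unfamiliar': 4}
rating_count = rating_count.sort_values(
    by=['rating_mapped', 'character'],
    # apply the custom order only to rating_mapped; character names sort alphabetically
    key=lambda col: col.map(rating_order) if col.name == 'rating_mapped' else col)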

rating_count

| | character | rating_mapped | count | percent |
|---|---|---|---|---|
| 0 | Anakin Skywalker | Favorable | 514 | 62.0 |
| 4 | Boba Fett | Favorable | 291 | 36.0 |
| 8 | C-3P0 | Favorable | 703 | 85.0 |
| 12 | Darth Vader | Favorable | 481 | 58.0 |
| 16 | Emperor Palpatine | Favorable | 253 | 31.0 |
| 20 | Han Solo | Favorable | 761 | 92.0 |
| 24 | Jar Jar Binks | Favorable | 242 | 29.0 |
| 28 | Lando Calrissian | Favorable | 365 | 45.0 |
| 32 | Luke Skywalker | Favorable | 771 | 93.0 |
| 36 | Obi Wan Kenobi | Favorable | 750 | 91.0 |
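
The per-character percentages can also be computed in one step with `pd.crosstab(..., normalize='index')`, which divides each row by its row total. This is a sketch on made-up ratings (the `toy` frame is an assumption, not the survey data), followed by a `melt` back to the long format used for charting:

```python
import pandas as pd

# Hypothetical ratings: four responses per character.
toy = pd.DataFrame({
    "character": ["Han Solo"] * 4 + ["Jar Jar Binks"] * 4,
    "rating_mapped": ["Favorable", "Favorable", "Favorable", "Neutral",
                      "Favorable", "Unfavorable", "Unfavorable", "Unfamiliar"],
})

# Rows are characters, columns are rating groups; each row sums to 100.
wide = pd.crosstab(toy["character"], toy["rating_mapped"], normalize="index") * 100

# Back to long format: one row per (character, rating group) pair.
tidy = wide.reset_index().melt(
    id_vars="character", var_name="rating_mapped", value_name="percent"
)
print(tidy)
```

Unlike `groupby(...).size()`, the crosstab keeps combinations that never occur (as 0), so every character gets a row for every rating group.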


### Step 2: Create your charts


Hints:
* Layer the charts before faceting!

In [6]:
# Extract the sort order according to the Favorable column
fav_sort = rating_count[rating_count["rating_mapped"] == "Favorable"].sort_values(by="percent", ascending=False)["character"].to_list()

bar_chart = alt.Chart(rating_count).mark_bar().encode(
    x=alt.X('percent:Q', title=None, axis = None),
    y=alt.Y('character:N', title=None, sort = fav_sort, axis = alt.Axis(grid = False)),
    color=alt.Color('rating_mapped:N', title= None, legend = None),
).properties(
    width=80,
    height=250
)

text = alt.Chart(rating_count).mark_text(align = 'left', dx = 2).encode(
    x = alt.X('percent:Q'),
    y = alt.Y('character:N', sort = fav_sort),
    text = alt.Text('percent:Q'),
    color = alt.Color('rating_mapped:N')
)


rating_chart = alt.layer(bar_chart, text).facet(
    column=alt.Column('rating_mapped:N', title=None, sort=['Favorable', 'Neutral', 'Unfavorable', 'Unfamiliar'])
).resolve_scale().properties(
    title = {
        "text": "'Star Wars' Character Favorability Rating",
        "subtitle": "By 837 respondents"
    }
).configure_view(stroke = None)

rating_chart


## Part 2: Star Wars Theme

### Step 1: Create Star Wars Theme

We want you to try implementing a custom theme in Altair based on this style guide:

<img src="https://scmcqueen.github.io/si649_hw/StarWars_StyleGuide.png" height="1200">

We will give you some starter code so you aren't creating a theme from scratch. The Altair documentation on [Chart Customization](https://altair-viz.github.io/user_guide/customization.html) and the Towards Data Science article [Consistently Beautiful Visualizations with Altair Themes](https://medium.com/towards-data-science/consistently-beautiful-visualizations-with-altair-themes-c7f9f889602) should serve as helpful guides.

You can set the spacing, color palettes, font schemes, etc.
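
For the color palettes, one option is to pick two or three anchor colors and interpolate the steps in between. The sketch below does plain linear interpolation in RGB. `hex_ramp` is a hypothetical helper, not part of Altair, and the anchor colors are taken from the `ramp` and `diverging` ranges in the starter code.

```python
def hex_ramp(start: str, end: str, steps: int) -> list[str]:
    """Return `steps` hex colors evenly spaced in RGB from `start` to `end`, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")

    def to_rgb(color: str) -> tuple[int, ...]:
        color = color.lstrip("#")
        return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

    a, b = to_rgb(start), to_rgb(end)
    ramp = []
    for k in range(steps):
        t = k / (steps - 1)  # 0.0 at the start color, 1.0 at the end color
        channels = [round(x + (y - x) * t) for x, y in zip(a, b)]
        ramp.append("#{:02X}{:02X}{:02X}".format(*channels))
    return ramp


# Sequential ramp: six steps between two anchors.
sequential = hex_ramp("#B62321", "#D7C078", 6)

# Diverging palette: two ramps that meet at white (drop the duplicate midpoint).
diverging = hex_ramp("#15509F", "#FFFFFF", 4) + hex_ramp("#FFFFFF", "#DC5026", 4)[1:]

print(sequential)
print(diverging)
```

For these anchors the output matches the `ramp` and `diverging` lists in the starter code, which suggests those palettes were built the same way. Interpolating in RGB is the simplest choice; a perceptual color space may give smoother-looking steps.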

Run the cell below to get the font you need.

In [7]:
%%html

<style>
@import url('https://fonts.googleapis.com/css?family=Lato');
</style>

In [8]:
# TO DO: Modify this code to fit the above style guide
@alt.theme.register("star_wars", enable=True) # Comment this line out for Altair 5.2
# the theme is defined as a function
def star_wars_solution():
    # you can set variables here and reuse later in the function
    font = "Lato"

    backgroundColor = "#FFFFFF"

    font_color = "#000000"

    font_size = {
        "XL": 20,
        "L": 18,
        "M": 16,
        "S": 14,
        "XS": 12
    }

    return {
        "config": {
            "title": {
                "anchor": "start",
                "font": font,
                "fontSize": font_size["XL"],
                "subtitleFontSize": font_size["L"],
                "subtitleFont": font
            },
            "axisX": {
                "titleFontSize": font_size["M"],
                "labelFontSize": font_size["S"],
                "grid": False,
                "labelFont": font,
           },
           "axisY": {
                "domain": False,
                "titleFontSize": font_size["M"],
                "titleAngle": 0,
                "titleBaseline": "bottom",
                "titleAnchor": "start",
                "labelFontSize": font_size["S"],
                "grid": True,
                "gridColor": "#aab0c0",
                "gridWidth": 1,
                "labelFont": font,
           },
           "axis": {
               "labelFont": font,
               "titleFont": font,
               "titleFontSize": font_size["M"],
               "labelFontSize": font_size["S"],
           },
           "background": backgroundColor,
           "view": {
               "stroke": None
           },
           "range": {
                "category": ["#B62321", "#A1A332", "#15509F", "#3B444B", "#D7C078", "#DC5026"],
                "ramp": ["#B62321", "#BD4232", "#C36244", "#CA8155", "#D0A167", "#D7C078"],
                "diverging": ["#15509F", "#638ABF", "#B1C5DF", "#FFFFFF", "#F3C5B7", "#E88A6E", "#DC5026"]
           },
           "area": {
           },
           "line": {
           },
           "point": {
           },
           "text": {
               "fontSize": font_size["S"],
           },
           "bar": {
               "binSpacing": 1,
               "size": 25
            },
            "facet": {
                "titleFontSize": font_size["M"],
            },
            "notes": {
                "fontSize": font_size["XS"]
            },
            "sources":{
                "fontSize": font_size["XS"],
                "anchor": "right"
            },
       },
    }

# For Altair version 5.2, uncomment the lines below
# alt.themes.register('star_wars', star_wars_solution)
# enable the newly registered theme
# alt.themes.enable('star_wars')

### Step 2: Recreate Favorability Chart with new theme

To check that your theme looks correct, recreate your chart of Star Wars characters favorability. If your theme is enabled correctly, the visualization should look like this:

<img src="https://scmcqueen.github.io/si649_hw/favorability_SW_theme.png" height="300">

In [9]:
# TODO: Copy code from above for facet chart


bar_chart = alt.Chart(rating_count).mark_bar().encode(
    x=alt.X('percent:Q', title=None, axis = None),
    y=alt.Y('character:N', title=None, sort = fav_sort, axis = alt.Axis(grid = False, labelFontSize=12)),
    color=alt.Color('rating_mapped:N', title= None, legend = None),
).properties(
    height = 300,
    width = 100
)

text = alt.Chart(rating_count).mark_text(align = 'left', dx = 2, fontSize = 12).encode(
    x = alt.X('percent:Q'),
    y = alt.Y('character:N', sort = fav_sort),
    text = alt.Text('percent:Q'),
    color = alt.Color('rating_mapped:N')
)


rating_chart = alt.layer(bar_chart, text).facet(
    column=alt.Column('rating_mapped:N', title=None, sort=['Favorable', 'Neutral', 'Unfavorable', 'Unfamiliar'])
).resolve_scale().properties(
    title = {
        "text": "'Star Wars' Character Favorability Rating",
        "subtitle": "By 837 respondents",
        "subtitleFontSize": 14
    }
).configure_view(stroke = None)

rating_chart

### Step 3: Recreate the following charts to test your theme

#### Chart 1: Favorable Ratings by Number of Ratings

<img src="https://scmcqueen.github.io/si649_hw/favorable.png" height="300">

In [10]:
# calculate the total number of ratings for each character and add to the rating_count dataframe
rating_count_total = rating_count.groupby('character')['count'].sum().reset_index(name='total')
rating_count_w_total = pd.merge(rating_count, rating_count_total, on='character')
rating_count_w_total['percent_1'] = rating_count_w_total['count'] / rating_count_w_total['total']
fav_rating_count = rating_count_w_total[rating_count_w_total['rating_mapped'] == 'Favorable']
fav_rating_count

| | character | rating_mapped | count | percent | total | percent_1 |
|---|---|---|---|---|---|---|
| 0 | Anakin Skywalker | Favorable | 514 | 62.0 | 823 | 0.624544 |
| 1 | Boba Fett | Favorable | 291 | 36.0 | 812 | 0.358374 |
| 2 | C-3P0 | Favorable | 703 | 85.0 | 827 | 0.85006 |
| 3 | Darth Vader | Favorable | 481 | 58.0 | 826 | 0.582324 |
| 4 | Emperor Palpatine | Favorable | 253 | 31.0 | 814 | 0.310811 |
| 5 | Han Solo | Favorable | 761 | 92.0 | 829 | 0.917973 |
| 6 | Jar Jar Binks | Favorable | 242 | 29.0 | 821 | 0.294762 |
| 7 | Lando Calrissian | Favorable | 365 | 45.0 | 820 | 0.445122 |
| 8 | Luke Skywalker | Favorable | 771 | 93.0 | 831 | 0.927798 |
| 9 | Obi Wan Kenobi | Favorable | 750 | 91.0 | 825 | 0.909091 |
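
The separate totals frame and the merge in the cell above can be avoided: `groupby(...).transform('sum')` broadcasts each character's total back onto every row. A sketch with made-up counts (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical counts in the same shape as `rating_count`.
toy = pd.DataFrame({
    "character": ["Han Solo", "Han Solo", "Boba Fett", "Boba Fett"],
    "rating_mapped": ["Favorable", "Neutral", "Favorable", "Neutral"],
    "count": [90, 10, 30, 70],
})

# transform('sum') returns a Series aligned with the original rows,
# so no intermediate totals frame or merge is needed.
toy["total"] = toy.groupby("character")["count"].transform("sum")
toy["percent_1"] = toy["count"] / toy["total"]

favorable = toy[toy["rating_mapped"] == "Favorable"]
print(favorable)
```

Both approaches should give the same numbers; the `transform` version just keeps everything in one frame.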


In [11]:
fav_count_chart = alt.Chart(fav_rating_count).mark_point(size=10).encode(
    x = alt.X('total:Q', title = 'Number of Ratings', scale=alt.Scale(domain=[800, 850]), axis = alt.Axis(labelFontSize=12, titleFontSize=12)),
    y = alt.Y('percent_1:Q', title = None, axis = alt.Axis(labelFontSize=12)),
    color = alt.Color('percent_1:Q', title = 'Percent',
                      scale=alt.Scale(range="ramp"))
).properties(
    title = {
        "text": "Favorable Ratings by Number of Ratings",
        "subtitle": "Percent of Favorable Ratings",
        "fontSize": 16,
        "subtitleFontSize": 12
    }
)

fav_count_chart

#### Chart 2: Who shot first?

<img src="https://scmcqueen.github.io/si649_hw/shot_first.png" height="150">

In [12]:
# read in shooting data
shot_first = pd.read_csv('https://scmcqueen.github.io/si649_hw/shot_first.csv',index_col=0)
shot_first.head()

| | shooter |
|---|---|
| 0 | I don't understand this question |
| 2 | I don't understand this question |
| 3 | I don't understand this question |
| 4 | Greedo |
| 5 | Han |


In [13]:

# Prepare the data
shot_first_percent = shot_first.groupby('shooter').size().reset_index(name='count')
shot_first_percent['percent'] = shot_first_percent['count'] / shot_first_percent['count'].sum()

shot_first_percent



| | shooter | count | percent |
|---|---|---|---|
| 0 | Greedo | 197 | 0.237923 |
| 1 | Han | 325 | 0.392512 |
| 2 | I don't understand this question | 306 | 0.369565 |


In [14]:
text_chart = alt.Chart(shot_first_percent).mark_text(fontSize=15, color='#FFFFFF', align='right', dx=-10).encode(
    x=alt.X('percent:Q'),
    y=alt.Y('shooter:N'),
    text=alt.Text('percent:Q', format='.2%'),
    
)

# Create the bar chart
bar_chart = alt.Chart(shot_first_percent).mark_bar(size=23).encode(
    x=alt.X('percent:Q', title=None, axis=None),
    y=alt.Y('shooter:N', title=None, axis=alt.Axis(grid=False, ticks=False, labelPadding=10, labelLimit=350, labelFontSize=15, labelFontWeight=600)),
    color=alt.Color('percent:Q', scale=alt.Scale(range='diverging'), legend=alt.Legend(title='Percent of Responders', titleFontSize=14, labelFontSize=12))
)

chart_2 = alt.layer(bar_chart, text_chart).resolve_scale(color='independent').properties(
    width=270,
    height=80,
    title={"text": "Who shot first?", "fontSize": 24}
)

chart_2
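
The `format='.2%'` string in the text layer is a d3-format specifier. d3-format is modeled on Python's format mini-language, so for this particular specifier Python's built-in `format` can be used to preview the labels. A quick check, using the fractions printed for `shot_first_percent`:

```python
# Fractions copied from the `percent` column of shot_first_percent.
fractions = {
    "Greedo": 0.237923,
    "Han": 0.392512,
    "I don't understand this question": 0.369565,
}

# '.2%' multiplies by 100, keeps two decimals, and appends a percent sign.
labels = {name: format(value, ".2%") for name, value in fractions.items()}

for name, label in labels.items():
    print(f"{name}: {label}")
```

The chart itself is rendered by Vega in the browser, so treat this as a preview of what the labels should look like rather than a guarantee for every specifier.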

## Part 3: Repeat Chart of Movies Seen by Key Demographics

Recreate this chart using .repeat()

<img src="https://scmcqueen.github.io/si649_hw/movies_demo.png" height="300">

In [15]:
# read in the formatted data
x = pd.read_csv('https://scmcqueen.github.io/si649_hw/movies_demo.csv',index_col=0)
x.head()

| | Age | Gender | Do you consider yourself to be a fan of the Star Wars film franchise? | movies_seen |
|---|---|---|---|---|
| 2 | 18-29 | Male | No | 3 |
| 3 | 18-29 | Male | Yes | 6 |
| 4 | 18-29 | Male | Yes | 6 |
| 5 | 18-29 | Male | Yes | 6 |
| 8 | 18-29 | Male | Yes | 6 |


In [16]:
repeat_graph = alt.Chart(x).mark_boxplot(outliers=False).encode(
    alt.X(alt.repeat('column'), type='ordinal', axis=alt.Axis(titleFontSize=12, labelFontSize=13, labelAngle=0, titlePadding=10, titleFontWeight=500, labelFontWeight=500)),
    alt.Y('movies_seen:Q', title=None, axis=alt.Axis(titleFontSize=14, labelPadding=5)),
    color = alt.ColorValue("#B62321")
).properties(
    width=170,
    height=220,
).repeat(
    column=["Age", "Gender", "Do you consider yourself to be a fan of the Star Wars film franchise?"],
).properties(
    title = {
        "text": "Number of Movies Seen by Demographics",
    }
)
repeat_graph    

## Part 4: Make your own theme!

Make your own theme. This is your opportunity to be creative and define a consistent, reusable style. The theme must be significantly different than the Star Wars one above. The more unique, the better!

To get full points, you must:
* Create at least 3 color schemes for your theme
* Choose a new font for your theme
* Change the default font sizes
* Set a new default mark color
* Choose to enable or disable grid lines
* Change the default padding of the titles

You can choose one of the themes from [this list of Data Visualization Style Guidelines](https://docs.google.com/spreadsheets/d/1F1gm5QLXh3USC8ZFx_M9TXYxmD-X5JLDD0oJATRTuIE/edit?gid=1679646668#gid=1679646668) as a starting point. To generate interesting color palettes, you can use sites like [Coolors](https://coolors.co/), [Canva](https://www.canva.com/colors/color-palette-generator/), or [ColorMagic](https://colormagic.app/).
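
As a rough map from the requirements above to theme settings, here is a minimal skeleton with one config entry per requirement. The function name, font, sizes, and colors are all placeholders (assumptions for illustration), and the function is left unregistered; it could be registered the same way as the Star Wars starter theme.

```python
def minimal_theme() -> dict:
    """Hypothetical skeleton: one config entry per requirement in the list above."""
    font = "Georgia"  # placeholder font; swap in your own choice

    return {
        "config": {
            "title": {
                "font": font,
                "fontSize": 22,  # non-default title size
                "offset": 16,    # padding between the title and the chart
            },
            "axis": {
                "labelFont": font,
                "titleFont": font,
                "labelFontSize": 13,
                "titleFontSize": 15,
                "titlePadding": 12,  # padding between an axis and its title
                "grid": False,       # grid lines off by default
            },
            # default color applied to all mark types
            "mark": {"color": "#2A9D8F"},
            # three named color schemes (placeholder colors)
            "range": {
                "category": ["#264653", "#2A9D8F", "#E9C46A", "#F4A261", "#E76F51"],
                "ramp": ["#E9F5F3", "#2A9D8F", "#264653"],
                "diverging": ["#264653", "#FFFFFF", "#E76F51"],
            },
        }
    }


config = minimal_theme()["config"]
print(sorted(config))
```

Keeping the font in a single variable means a later font change only touches one line.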

In [17]:
%%html
<style>
@import url('https://fonts.googleapis.com/css2?family=Noto+Sans+Display:ital,wght@0,100..900;1,100..900&family=Sofia+Sans:wght@1..1000&display=swap');
</style>

In [18]:
# Custom theme for Part 4
@alt.theme.register("my_theme", enable=True) # Comment this line out for Altair 5.2
# the theme is defined as a function
def my_theme_solution():
    # you can set variables here and reuse later in the function
    font = "Noto Sans Display"

    backgroundColor = "#EBF2f6"

    font_color = "#000000"

    font_size = {
        "XL": 18,
        "L": 16,
        "M": 14,
        "S": 12,
        "XS": 10
    }
    
    diverging_palette = [
        "#006ea0",  
        "#4d9abd", 
        "#99c5d9",  
        "#ffffff", 
        "#e3c2b3",  
        "#c68466",  
        "#a03200"  
    ]

    return {
        "config": {
            "title": {
                "anchor": "start",
                "font": font,
                "fontWeight": 700,
                "fontSize": font_size["XL"],
                "subtitleFontSize": font_size["L"],
                "subtitleFont": font,
                "offset": 20
            },
            "axisX": {
                "titleFontSize": font_size["M"],
                "labelFontSize": font_size["S"],
                "grid": False,
                "labelFont": font,
           },
           "axisY": {
                "domain": False,
                "titleFontSize": font_size["M"],
                "titleAngle": 0,
                "titleBaseline": "bottom",
                "titleAnchor": "start",
                "labelFontSize": font_size["S"],
                "grid": True,
                "gridColor": "#aab0c0",
                "gridWidth": 1,
                "labelFont": font,
           },
           "axis": {
               "labelFont": font,
               "titleFont": font,
               "titleFontSize": font_size["M"],
               "labelFontSize": font_size["S"],
           },
           "background": backgroundColor,
           "view": {
               "stroke": None,
               "padding": {"left": 5, "top": 10, "right": 5, "bottom": 10}
           },
           "range": {
                "category": ["#006EA0", "#32C0D2", "#E0B265", "#00969F", "#973C4C", "#AC8B96"],
                "ramp": ["#fef0d9", "#fdcc8a", "#fc8d59", "#e34a33", "#b30000"],
                "diverging": diverging_palette
           },
           "area": {
           },
           "line": {
           },
           "point": {
           },
           "text": {
               "fontSize": font_size["S"],
           },
           "bar": {
               "binSpacing": 1,
               "size": 25
            },
            "facet": {
                "titleFontSize": font_size["M"],
            },
            "notes": {
                "fontSize": font_size["XS"]
            },
            "sources":{
                "fontSize": font_size["XS"]
            },
       },
       "padding": {"left": 20, "top": 20, "right": 5, "bottom": 10}
    }

### Recreate all of the charts from above in your theme!

In [24]:
# TODO: Copy code from above for facet chart


bar_chart = alt.Chart(rating_count).mark_bar().encode(
    x=alt.X('percent:Q', title=None, axis = None),
    y=alt.Y('character:N', title=None, sort = fav_sort, axis = alt.Axis(grid = False)),
    color=alt.Color('rating_mapped:N', title= None, legend = None),
).properties(
    height = 300,
    width = 100
)

text = alt.Chart(rating_count).mark_text(align = 'left', dx = 2).encode(
    x = alt.X('percent:Q'),
    y = alt.Y('character:N', sort = fav_sort),
    text = alt.Text('percent:Q'),
    color = alt.Color('rating_mapped:N')
)


rating_chart = alt.layer(bar_chart, text).facet(
    column=alt.Column('rating_mapped:N', title=None, sort=['Favorable', 'Neutral', 'Unfavorable', 'Unfamiliar'])
).resolve_scale().properties(
    title = {
        "text": "'Star Wars' Character Favorability Rating",
        "subtitle": "By 837 respondents",
    }
)

rating_chart.save('rating_chart.html')
rating_chart

In [20]:
# insert scatter plot

fav_count_chart = alt.Chart(fav_rating_count).mark_point(size=10).encode(
    x = alt.X('total:Q', title = 'Number of Ratings', scale=alt.Scale(domain=[800, 850])),
    y = alt.Y('percent_1:Q', title = None, axis = alt.Axis(labelFontSize=12)),
    color = alt.Color('percent_1:Q', title = 'Percent',
                      scale=alt.Scale(range="ramp"))
).properties(
    title = {
        "text": "Favorable Ratings by Number of Ratings",
        "subtitle": "Percent of Favorable Ratings",
        "fontSize": 16,
        "subtitleFontSize": 12
    }
)

fav_count_chart

In [21]:
# insert 'who shot first' chart
text_chart = alt.Chart(shot_first_percent).mark_text(fontSize=15, color='#FFFFFF', align='right', dx=-10).encode(
    x=alt.X('percent:Q'),
    y=alt.Y('shooter:N'),
    text=alt.Text('percent:Q', format='.2%'),
    
)

# Create the bar chart
bar_chart = alt.Chart(shot_first_percent).mark_bar(size=23).encode(
    x=alt.X('percent:Q', title=None, axis=None),
    y=alt.Y('shooter:N', title=None, axis=alt.Axis(grid=False, ticks=False, labelPadding=10, labelLimit=350, labelFontSize=14)),
    color=alt.Color('percent:Q', scale=alt.Scale(range='diverging'), legend=alt.Legend(title='Percent of Responders', titleFontSize=14, labelFontSize=12))
)

chart_2 = alt.layer(bar_chart, text_chart).resolve_scale(color='independent').properties(
    width=270,
    height=80,
    title={"text": "Who shot first?", "fontSize": 24}
)

chart_2

In [22]:
# insert repeat chart
repeat_graph = alt.Chart(x).mark_boxplot(outliers=False).encode(
    alt.X(alt.repeat('column'), type='ordinal', axis=alt.Axis(titleFontSize=12, labelFontSize=13, labelAngle=0, titlePadding=10, titleFontWeight=600, labelFontWeight=500)),
    alt.Y('movies_seen:Q', title=None, axis=alt.Axis(titleFontSize=14, labelPadding=5)),
).properties(
    width=170,
    height=220,
).repeat(
    column=["Age", "Gender", "Do you consider yourself to be a fan of the Star Wars film franchise?"],
).properties(
    title = {
        "text": "Number of Movies Seen by Demographics",
    }
)
repeat_graph    

## Part 5: Upload one of your charts to a github page

Choose one of the four charts above (in your own style) and export it as an HTML file, using `chart.save('alt_theme.html')`.

Open the html file using any text editor, and add a link to the data you used for this chart, by adding a line of HTML to the body (i.e., between the `<body>` and `</body>` tags); something like `<a href="link">Data</a>`, where "link" is the URL for the data. This link to the data should appear on your html page.
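
If you would rather script this edit than do it by hand, the link can be inserted with a small string replacement. This is a sketch, assuming the exported file contains a plain `<body>` tag with no attributes; `add_data_link` is a hypothetical helper, and the demo runs on a tiny stand-in page rather than a real export.

```python
def add_data_link(html: str, url: str, label: str = "Data") -> str:
    """Insert an <a> tag right after the first <body> tag of an HTML string."""
    marker = "<body>"
    if marker not in html:
        raise ValueError("no plain <body> tag found")
    link = f'\n  <a href="{url}">{label}</a>'
    # Replace only the first occurrence so the rest of the page is untouched.
    return html.replace(marker, marker + link, 1)


# Demo on a tiny stand-in page.
page = '<html>\n<body>\n  <div id="vis"></div>\n</body>\n</html>'
data_url = "https://scmcqueen.github.io/si649_hw/reviews_by_char.csv"
patched = add_data_link(page, data_url)
print(patched)
```

For a real export, read the saved file into a string, pass it through `add_data_link`, and write the result back before uploading.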

Using the same process as we used for the matplotlib assignment, upload your HTML page to GitHub so that it is accessible on GitHub Pages, and insert the link to your page below:


TODO: [My link](https://nakafrozn.github.io/posts/datavis-altair/datavis-altair/)

Finally, open the link, and verify that you are able to download the data using the link that you added. 

Next, click on the three dots at the upper right of the plot, and click on "View Source". Note that this plot thus supports some basic interaction, even though this is a static html page. 

After clicking "View Source", copy the plot's source description and paste it below:


TODO: 

```html

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <link href="https://fonts.googleapis.com/css2?family=Noto+Sans+Display:ital,wght@0,100..900;1,100..900&family=Sofia+Sans:wght@1..1000&display=swap" rel="stylesheet">
  <style>
    #vis.vega-embed {
      width: 100%;
      display: flex;
    }

    #vis.vega-embed details,
    #vis.vega-embed details summary {
      position: relative;
    }
  </style>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm/vega@5"></script>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm/vega-lite@5.20.1"></script>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
  <a href="https://scmcqueen.github.io/si649_hw/reviews_by_char.csv">Data</a>
  <div id="vis"></div>
  <script>
    // The plot part
  </script>
</body>
</html>

```