# **SI649 W25 Altair Theme Homework #3**

# Overview


We are focusing on **custom themes & small multiples** in this lab! For this assignment, we will be looking at the *Star Wars* character dataset from [America’s Favorite ‘Star Wars’ Movies (And Least Favorite Characters)](https://fivethirtyeight.com/features/americas-favorite-star-wars-movies-and-least-favorite-characters/) by Walt Hickey.

### Lab Instructions

*   Save, rename, and submit the .ipynb file (use your username in the name).
*   Complete all the checkpoints to create the required visualization in each cell.
*   Run every cell (use Runtime -> Restart and run all to make sure you have a clean working version), then upload your .ipynb file to Canvas.
*   For each visualization, there is a space to write down a "Grammar of Graphics" plan, but this is optional for this assignment.
*   If you end up stuck, show us your work by including links (URLs) that you have searched for. You'll get partial credit for showing your work in progress.


In [263]:
# suppress warnings about future deprecations
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# imports we will use
import altair as alt
import pandas as pd

# for large data sets
alt.data_transformers.disable_max_rows()

# read in data
df = pd.read_csv('https://scmcqueen.github.io/si649_hw/skyelerStarWars.csv',header=[0,1],skip_blank_lines=True)



## Part 1: Character Rating Facet Charts

Using an Altair Facet chart, recreate the 'Star Wars' Character Favorability Ratings chart from the [Star Wars article](https://fivethirtyeight.com/features/americas-favorite-star-wars-movies-and-least-favorite-characters/). It should look like this:

<img src="https://scmcqueen.github.io/si649_hw/favorability_SW.png" height="300">

### Step 1: Get data in the correct format
We will get this started for you

In [264]:
# Read in the data with Character name & review
download = pd.read_csv('https://scmcqueen.github.io/si649_hw/reviews_by_char.csv',index_col=0)
download.head()
# show all unique values in the 'rating' column
download['rating'].unique()

array(['Very favorably', 'Somewhat favorably',
       'Neither favorably nor unfavorably (neutral)',
       'Somewhat unfavorably', 'Unfamiliar (N/A)', 'Very unfavorably'],
      dtype=object)

In [265]:
# TODO: Map the 'rating' column to a new column that matches the chart's Favorable, Unfavorable, etc.
download = pd.read_csv('https://scmcqueen.github.io/si649_hw/reviews_by_char.csv',index_col=0)

download['rating'] = download['rating'].map({'Very favorably':'Favorable',
                                             'Somewhat favorably':'Favorable',
                                             'Neither favorably nor unfavorably (neutral)': 'Neutral',
                                             'Somewhat unfavorably':'Unfavorable',
                                             'Unfamiliar (N/A)': 'Unfamiliar',
                                             'Very unfavorably': 'Unfavorable'})


download.head()




|   | character | rating |
|---|-----------|--------|
| 0 | Han Solo | Favorable |
| 2 | Han Solo | Favorable |
| 3 | Han Solo | Favorable |
| 4 | Han Solo | Favorable |
| 5 | Han Solo | Favorable |


In [266]:
# TODO: in Pandas, get the percent of each rating per character

grouped = download.groupby(['character','rating']).size().reset_index(name='counts')
grouped['percent'] = grouped.groupby('character')['counts'].transform(lambda x: x/x.sum())
#make percent *100 and round to whole number
grouped['percent'] = grouped['percent']*100
grouped['percent'] = grouped['percent'].round(0)
grouped.head()


|   | character | rating | counts | percent |
|---|-----------|--------|--------|---------|
| 0 | Anakin Skywalker | Favorable | 514 | 62.0 |
| 1 | Anakin Skywalker | Neutral | 135 | 16.0 |
| 2 | Anakin Skywalker | Unfamiliar | 52 | 6.0 |
| 3 | Anakin Skywalker | Unfavorable | 122 | 15.0 |
| 4 | Boba Fett | Favorable | 291 | 36.0 |
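
As a cross-check on the percentage step, here is a minimal sketch of the same calculation written with `value_counts(normalize=True)`. The `toy` frame and its values are made up for illustration and are not the survey data.

```python
import pandas as pd

# Toy responses (hypothetical values, not the survey data)
toy = pd.DataFrame({
    "character": ["A", "A", "A", "A", "B", "B"],
    "rating": ["Favorable", "Favorable", "Neutral", "Unfavorable",
               "Favorable", "Unfavorable"],
})

# Share of each rating within a character, as whole-number percentages
toy_percent = (
    toy.groupby("character")["rating"]
    .value_counts(normalize=True)   # proportions within each character
    .mul(100)
    .round(0)
    .rename("percent")              # name the values before flattening the index
    .reset_index()
)

print(toy_percent)
```

In the toy data the rounded percentages add up to 100 for each character. With real data they need not: the Anakin Skywalker rows above sum to 62 + 16 + 6 + 15 = 99 because each value is rounded separately.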


### Step 2: Create your charts


Hints:
* Layer the charts before faceting!

In [270]:
# TODO: Using a facet chart, recreate the article

#favorable counts
favorable_counts = download[download['rating'] == 'Favorable'].groupby('character').size()

# Sort the characters in order of most to least favorable
sorted_characters = favorable_counts.sort_values(ascending=False).index.tolist()

# Group by character and rating, then calculate percentages
grouped = download.groupby(['character','rating']).size().reset_index(name='counts')
grouped['percent'] = grouped.groupby('character')['counts'].transform(lambda x: x/x.sum())
#make percent *100 and round to whole number
grouped['percent'] = grouped['percent']*100
grouped['percent'] = grouped['percent'].round(0)

# Base chart setup 
base = alt.Chart(grouped).encode(
    x=alt.X('percent:Q', axis=alt.Axis(title=None, grid=False, labels=False, ticks=False)),
    y=alt.Y('character:N', sort=sorted_characters, axis=alt.Axis(title=None, grid=False)),
    color=alt.Color('rating:N', sort=['Favorable', 'Neutral', 'Unfamiliar', 'Unfavorable'],legend=None)
).properties(   
    width=100
)

# Bar chart
bars = base.mark_bar()

# Text labels for percentages
text = base.mark_text(
    align='left',
    baseline='middle',
    dx=3
).encode(
    text=alt.Text('percent:Q')
)

# Combine bar and text into a layered chart, then facet
final_chart = alt.layer(bars, text).facet(
    column=alt.Column('rating:N', sort=['Favorable', 'Neutral', 'Unfavorable', 'Unfamiliar'],title=None),
    spacing=0
).properties(
    title={
        "text": "'Star Wars' Characters Favorability Ratings",
        "subtitle": "By 834 respondents"
    })
    

final_chart

## Part 2: Star Wars Theme

### Step 1: Create Star Wars Theme

We want you to try implementing a custom theme in Altair based on this style guide:

<img src="https://scmcqueen.github.io/si649_hw/StarWars_StyleGuide.png" height="600">

We will give you some starter code, so you aren't creating a theme fully from scratch. The Altair documentation on [Chart Customization](https://altair-viz.github.io/user_guide/customization.html) and this Towards Data Science Article [Consistently Beautiful Visualizations with Altair Themes](https://medium.com/towards-data-science/consistently-beautiful-visualizations-with-altair-themes-c7f9f889602) should serve as helpful guides.

You can set the spacing, color palettes, font schemes, etc.

Run the cell below to get the font you need.

In [272]:
%%html

<style>
@import url('https://fonts.googleapis.com/css?family=Lato');
</style>

In [362]:
# TO DO: Modify this code to fit the above style guide
@alt.theme.register("star_wars", enable=True) # Comment this line out for Altair 5.2
# the theme is defined as a function
def star_wars_solution():
    # you can set variables here and reuse later in the function
    font = "Lato"
    backgroundColor = "White"

    return {
        "config": {
            "title": {
                "anchor": "start",
            },
            "axisX": {
               "domain": True,
           },
           "axisY": {
               "domain": False,
           },
           "background": backgroundColor,
           "view": {
               "stroke": "transparent",
           },
           "range": {
                "category": ["#b62321", "#a1a332", "#15509F", "#3b444b", "#d7c078", "#DC5026"],
                "sequential": ["#b62321", "#bd4232", "#c36244", "#ca8155", "#d0a167", "#d7c078"],
                "diverging": ["#15509f", "#638abf", "#B1c5df", "#ffffff", "#f3c5b7", "#e88a6e", "#dc5026"]
            },
           "area": {
           },
           "line": {
           },
           "point": {
           },
           "text": {
           },
           "bar": {
            },
       },
    }

# For Altair version 5.2, uncomment the lines below
# alt.themes.register('star_wars', star_wars_solution)
# # enable the newly registered theme
# alt.themes.enable('star_wars')

### Step 2: Recreate Favorability Chart with new theme

To check that your theme looks correct, recreate your chart of Star Wars characters favorability. If your theme is enabled correctly, the visualization should look like this:

<img src="https://scmcqueen.github.io/si649_hw/favorability_SW_theme.png" height="300">

In [320]:
# TODO: Copy code from above for facet chart


#favorable counts
favorable_counts = download[download['rating'] == 'Favorable'].groupby('character').size()

# Sort the characters in order of most to least favorable
sorted_characters = favorable_counts.sort_values(ascending=False).index.tolist()

# Group by character and rating, then calculate percentages
grouped = download.groupby(['character','rating']).size().reset_index(name='counts')
grouped['percent'] = grouped.groupby('character')['counts'].transform(lambda x: x/x.sum())
#make percent *100 and round to whole number
grouped['percent'] = grouped['percent']*100
grouped['percent'] = grouped['percent'].round(0)

# Base chart setup 
base = alt.Chart(grouped).encode(
    x=alt.X('percent:Q', axis=alt.Axis(title=None, grid=False, labels=False, ticks=False)),
    y=alt.Y('character:N', sort=sorted_characters, axis=alt.Axis(title=None, grid=False)),
    color=alt.Color('rating:N', sort=['Favorable', 'Neutral', 'Unfamiliar', 'Unfavorable'],legend=None)
).properties(   
    width=100
)

# Bar chart
bars = base.mark_bar()

# Text labels for percentages
text = base.mark_text(
    align='left',
    baseline='middle',
    dx=3
).encode(
    text=alt.Text('percent:Q')
)

# Combine bar and text into a layered chart, then facet
final_chart = alt.layer(bars, text).facet(
    column=alt.Column('rating:N', sort=['Favorable', 'Neutral', 'Unfavorable', 'Unfamiliar'],title=None),
    spacing=0
).properties(
    title={
        "text": "'Star Wars' Characters Favorability Ratings",
        "subtitle": "By 834 respondents"
    })
    

final_chart
    

### Step 3: Recreate the following charts to test your theme

#### Chart 1: Favorable Ratings by Number of Ratings

<img src="https://scmcqueen.github.io/si649_hw/favorable.png" height="300">

In [407]:
# Filter to only favorable ratings
favorable_data = download[download['rating'] == 'Favorable']

# Calculate count and percentage of favorable ratings per character
favorable_counts = favorable_data.groupby('character').size().reset_index(name='count')
total_counts = download.groupby('character').size().reset_index(name='total')
merged = pd.merge(favorable_counts, total_counts, on='character')
merged['percent'] = merged['count'] / merged['total']

# Sequential palette, copied from the "sequential" range in the star_wars theme above
sequential_colors = ["#b62321", "#bd4232", "#c36244", "#ca8155", "#d0a167", "#d7c078"]

# Create the chart using the theme's sequential color scheme
favorable_chart = alt.Chart(merged).mark_circle().encode(
    x=alt.X('total:Q', axis=alt.Axis(title='Number of Ratings', grid=False), scale=alt.Scale(domain=[800, 850])),
    y=alt.Y('percent:Q', 
        axis=alt.Axis(title='Percentage of Favorable Ratings', titleAngle=0, titleY=-10, titleX=40, labelAngle=0), 
        scale=alt.Scale(domain=[0, 1])
    ),
    color=alt.Color('percent:Q',
        scale=alt.Scale(range=sequential_colors)
    ),
).properties(
    title="Favorable Ratings by Number of Ratings", 
    width=250,
    height=250
)

favorable_chart

#### Chart 2: Who shot first?

<img src="https://scmcqueen.github.io/si649_hw/shot_first.png" height="150">

In [412]:
# read in shooting data
shot_first = pd.read_csv('https://scmcqueen.github.io/si649_hw/shot_first.csv',index_col=0)
shot_first.head()
#show unique values
shot_first['shooter'].unique()

array(["I don't understand this question", 'Greedo', 'Han'], dtype=object)

In [536]:
# Calculate total responses per shooter
total_responses = shot_first.groupby('shooter').size().reset_index(name='total')
total_responses['percent'] = total_responses['total'] / total_responses['total'].sum()
total_responses['percent'] = total_responses['percent'].round(2)

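# Diverging palette, copied from the "diverging" range in the star_wars theme above
diverging_colors = ["#15509f", "#638abf", "#B1c5df", "#ffffff", "#f3c5b7", "#e88a6e", "#dc5026"]

# Base chart setup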
base = alt.Chart(total_responses).encode(
    x=alt.X('percent:Q', axis=alt.Axis(title=None, grid=False, labels=False, ticks=False, domain=False)),
    y=alt.Y('shooter:N', axis=alt.Axis(title=None, grid=False, domain=False)),
    color=alt.Color('percent:Q', 
                    scale=alt.Scale(range=diverging_colors),
                    legend=alt.Legend(title='Percent of Responders'))
).properties(
    title=alt.TitleParams(
        text='Who Shot First?',
        fontSize=20
    ),
    width=400,
    height=100
)

# Bar chart
bars = base.mark_bar()

# Text labels for percentages over the bar
text = alt.Chart(total_responses).mark_text(
    align='center',
    baseline='middle',
    color='white',  # Ensure text is white
    dx=-20
).encode(
    x=alt.X('percent:Q'),
    y=alt.Y('shooter:N'),
    text=alt.Text('percent:Q', format='.0%')
).properties(
    width=400,
    height=100
)


# Combine bar and text into a layered chart, placing text above
final_chart = alt.layer(bars, text).resolve_scale(color='shared').configure_axis(
    labelFontSize=12,  
    titleFontSize=14
)

final_chart

## Part 3: Repeat Chart of Movies Seen by Key Demographics

Recreate this chart using .repeat()

<img src="https://scmcqueen.github.io/si649_hw/movies_demo.png" height="300">

In [None]:
# Three bar charts under the title "Number of Movies Seen by Key Demographics"
# The y axis of all three charts is the average number of movies seen per category, on a scale from 0 at the bottom to 6 at the top
# The x axis of the first chart is titled "Age" with 4 categories: 18-29, 30-44, 45-60, > 60
# The x axis of the second chart is titled "Gender" with 2 categories: Female, Male
# The x axis of the third chart is titled "Do you consider yourself to be a fan of the Star Wars film franchise?" with 2 categories: No, Yes

# Each chart's bars start at 6 and go down to the average number of movies seen for that category
# The bars are thin and solid dark red
# Each chart has a grid in the background



In [615]:
# read in the formatted data
x = pd.read_csv('https://scmcqueen.github.io/si649_hw/movies_demo.csv',index_col=0)
x.head()

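|   | Age | Gender | Do you consider yourself to be a fan of the Star Wars film franchise? | movies_seen |
|---|-----|--------|------------------------------------------------------------------------|-------------|
| 2 | 18-29 | Male | No | 3 |
| 3 | 18-29 | Male | Yes | 6 |
| 4 | 18-29 | Male | Yes | 6 |
| 5 | 18-29 | Male | Yes | 6 |
| 8 | 18-29 | Male | Yes | 6 |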


In [627]:
# Plot the raw responses and let Altair compute the mean of movies_seen within
# each repeated category. Pre-averaging over all three columns at once leaves
# several rows per category, whose bars would be drawn on top of each other
# instead of being averaged.
base = alt.Chart(x).mark_bar(color='darkred', size=20).encode(
    y=alt.Y('mean(movies_seen):Q', title='Average Movies Seen', scale=alt.Scale(domain=[0, 6], reverse=True), axis=alt.Axis(grid=True, tickCount=7))
).properties(
    width=200,
    height=150
)

# Repeat the chart across different columns
repeat_chart = base.encode(
    x=alt.X(alt.repeat('column'), type='nominal', axis=alt.Axis(title=None, grid=True))
).repeat(
    column=['Age', 'Gender', 'Do you consider yourself to be a fan of the Star Wars film franchise?']
).resolve_scale(
    x='independent',
    y='shared'
).properties(
    title='Average Number of Movies Seen by Key Demographics',
)

repeat_chart

## Part 4: Make your own theme!

Make your own theme. This is your opportunity to be creative and define a consistent, reusable style. The theme must be significantly different from the Star Wars one above. The more unique, the better!

To get full points, you must:
* Create at least 3 color schemes for your theme
* Choose a new font for your theme
* Change the default font sizes
* Set a new default mark color
* Choose to enable or disable grid lines
* Change the default padding of the titles

You can choose one of the themes from [this list of Data Visualization Style Guidelines](https://docs.google.com/spreadsheets/d/1F1gm5QLXh3USC8ZFx_M9TXYxmD-X5JLDD0oJATRTuIE/edit?gid=1679646668#gid=1679646668) as a starting point. To generate interesting color palettes, you can use sites like [Coolors](https://coolors.co/), [Canva](https://www.canva.com/colors/color-palette-generator/), or [ColorMagic](https://colormagic.app/).
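
The requirements above could map onto theme settings roughly as follows. This is only a sketch: the function name, font, colors, and sizes are arbitrary placeholders, and the three `range` keys simply mirror the ones used in the Star Wars starter theme.

```python
def my_theme_sketch():
    """Hypothetical theme skeleton; every value is a placeholder."""
    font = "Georgia"  # a new font

    # three color schemes, keyed the same way as in the starter theme
    palettes = {
        "category": ["#264653", "#2a9d8f", "#e9c46a", "#f4a261", "#e76f51"],
        "sequential": ["#e9f5f3", "#a8dad3", "#5fb8ac", "#2a9d8f", "#1b6b62"],
        "diverging": ["#264653", "#8fa3aa", "#ffffff", "#f2b5a5", "#e76f51"],
    }

    return {
        "config": {
            "mark": {"color": "#2a9d8f"},  # default mark color
            "title": {
                "font": font,
                "fontSize": 20,  # default title size
                "offset": 16,    # spacing between the title and the chart
            },
            "axis": {
                "labelFont": font,
                "titleFont": font,
                "labelFontSize": 12,  # default axis label size
                "titleFontSize": 14,  # default axis title size
                "grid": True,         # grid lines on (or False to turn them off)
            },
            "range": palettes,
        }
    }
```

The function still has to be registered and enabled in the same way as the Star Wars theme before it affects any chart.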

In [216]:
# insert theme

### Recreate all of the charts from above in your theme!

In [217]:
# insert facet chart

In [218]:
# insert scatter plot

In [219]:
# insert 'who shot first' chart

In [220]:
# insert repeat chart

## Part 5: Upload one of your charts to a github page

Choose one of the four charts above (in your own style) and export it as an HTML file, using `chart.save('alt_theme.html')`.

Open the HTML file in any text editor and add a link to the data you used for this chart by adding a line of HTML to the body (i.e., between the `<body>` and `</body>` tags), something like `<a href="link">Data</a>`, where "link" is the URL of the data. This link should appear on your HTML page.

Using the same process as we used for the matplotlib assignment, upload your HTML page to GitHub so that it is accessible on GitHub Pages, and insert the link to your page below:


TODO: ```Insert your link here```

Finally, open the link, and verify that you are able to download the data using the link that you added. 

Next, click on the three dots at the upper right of the plot, and click on "View Source". Note that the plot supports some basic interaction, even though this is a static HTML page.

After clicking "View Source", copy the plot's source description and paste it below:


TODO: ```Paste the plot source here```