# Homework 6B - Visual Encoding
In this assignment, you will explore how to encode different data features to visual properties using the movies dataset located in `Datasets/movies.json`.

We will start with simpler visualizations and then move to more complex (and potentially cluttered) visualizations. Finally, you will create a focused visualization that uses only a few selected encodings.

## Instructions

The assignment consists of the following steps:

1. **Project Setup**:  

   - Set up your Python and Jupyter (or VSCode) environment.  

   - Clone or download the repository provided in class (refer to the class notes).

2. **Simple Visualizations:** Create basic visualizations using one or two encodings.

3. **Cluttered Visualization:** Create a visualization that encodes many features simultaneously.

4. **Focused Visualization:** Create one or more visualizations using a few chosen encodings, and discuss your design choices.

5. **Documentation**:  

   - Comment your code and add markdown explanations for each part of your analysis.

6. **Submission**:  

   - Save your notebook and export it as either PDF or HTML. If the visualizations do not render in the exported HTML, also submit each Altair chart as a separate HTML file, which you can create with `chart.save('chart_file.html')`. See: https://altair-viz.github.io/getting_started/starting.html#publishing-your-visualization
   - Submit to Canvas.



You are free to use any libraries you prefer for visualization, but we recommend using `matplotlib`, `seaborn`, and/or `altair` for this assignment.

   

Happy coding and visualizing!

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt

# By default, Altair raises an error for datasets with more than 5,000 rows; lift that limit:
alt.data_transformers.disable_max_rows()

# Read the dataset (adjust the path to wherever movies.json lives, e.g. "Datasets/movies.json")
df = pd.read_json("/content/movies.json")

# Create a new column for the ratio of US Gross to Production Budget
df['US Gross to Production Budget'] = df['US Gross'] / df['Production Budget']

# Preview the first rows (the last expression in a cell is the one that gets displayed)
df.head()


## Part 0: Simple Visualizations



In this section, you will build simple plots that use minimal visual encodings. This will help you get comfortable with basic mappings before adding complexity.



### Task 0.1: Simple Scatterplot

Create a basic scatterplot that maps:

- **x-axis:** `Production Budget`

- **y-axis:** `US Gross`



*Hint:* You can use:

- `plt.scatter()` from `matplotlib`

- `sns.scatterplot()` from `seaborn`

- `alt.Chart()` from `altair` with `mark_point()`
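
The `matplotlib` option can be sketched as follows. This is only an illustration on a tiny, invented DataFrame that borrows the dataset's column names; the values are not real movie data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data, invented for this example (not taken from movies.json)
toy = pd.DataFrame({
    "Production Budget": [1e6, 5e6, 2e7, 1e8],
    "US Gross": [3e6, 4e6, 6e7, 9e7],
})

fig, ax = plt.subplots()
# Position is the only encoding here: budget on x, gross on y
ax.scatter(toy["Production Budget"], toy["US Gross"])
ax.set_xlabel("Production Budget")
ax.set_ylabel("US Gross")
```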



In [None]:
# Fill Code Here


### Task 0.2: Simple Bar Plot

Now, create a bar plot that shows the average `IMDB Rating` for each `Major Genre`.



*Hint:* You can use:

- `plt.bar()` from `matplotlib`

- `sns.barplot()` from `seaborn`

- `alt.Chart()` from `altair` with `mark_bar()`
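
With `plt.bar()` the averages have to be computed first, whereas `sns.barplot()` aggregates with the mean by default. A minimal sketch of the `matplotlib` route, using an invented toy DataFrame rather than the real dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data, invented for this example (not taken from movies.json)
toy = pd.DataFrame({
    "Major Genre": ["Drama", "Drama", "Comedy", "Comedy", "Action"],
    "IMDB Rating": [7.0, 8.0, 6.0, 6.5, 5.5],
})

# Average rating per genre, sorted so the bars are easier to compare
avg = (
    toy.groupby("Major Genre")["IMDB Rating"]
    .mean()
    .sort_values(ascending=False)
)

fig, ax = plt.subplots()
ax.bar(avg.index.tolist(), avg.values)
ax.set_xlabel("Major Genre")
ax.set_ylabel("Average IMDB Rating")
```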



In [None]:
# Fill Code Here


### Task 0.3: Another Simple Scatterplot

Create a basic scatterplot that maps:

- **x-axis:** `IMDB Rating`

- **y-axis:** `US Gross to Production Budget`



*Important*: `US Gross to Production Budget` is a ratio, so use a logarithmic scale for the y-axis. On a log scale, a movie that earned twice its budget and one that earned half of it sit equally far from the break-even value of 1.



*Hint:* You can use:

- `plt.yscale('log')` from `matplotlib` (also works with `seaborn`)

- `y=alt.Y('...').scale(type="log")` from `altair`
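
A minimal sketch of the log-scale idea with `matplotlib`, on invented toy values. Dropping non-positive ratios before plotting is an assumption made here because a log axis cannot place zero; how you treat such rows in the real data is your call.

```python
import matplotlib.pyplot as plt
import pandas as pd

ratio_col = "US Gross to Production Budget"

# Toy data, invented for this example (not taken from movies.json)
toy = pd.DataFrame({
    "IMDB Rating": [3.5, 5.0, 6.5, 7.5, 8.5],
    ratio_col: [0.0, 0.2, 1.0, 4.0, 25.0],
})

# Assumption: keep only strictly positive ratios, since log(0) is undefined
positive = toy[toy[ratio_col] > 0]

fig, ax = plt.subplots()
ax.scatter(positive["IMDB Rating"], positive[ratio_col])
ax.set_yscale("log")  # equal vertical steps now mean equal multiplicative factors
ax.set_xlabel("IMDB Rating")
ax.set_ylabel(ratio_col)
```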



In [None]:
# Fill Code Here


### Task 0.4: Another Simple Bar Plot

Create a bar plot that shows the average `US Gross to Production Budget` for each `Major Genre`.



Remember to use the logarithmic scale for the y-axis to better visualize the data.



What insights can you draw from this plot?

In [None]:
# Fill Code Here



Your comments and insights here:

Any interesting observations from any of the plots you created so far?

## Part 1: Intermediate Visual Encodings



Now that you have built simple plots, let's gradually add more visual encodings.



### Task 1.1: Adding Color

Extend one of the scatterplots from Task 0.1 or Task 0.3 by mapping:

- **Color:** `MPAA Rating`



*Hint:* You can use the `hue` parameter in `seaborn`, or the `c` parameter of `plt.scatter()` in `matplotlib`. Note that `c` expects numbers or colors, so map each category to one first.

*Note:* You can also use `alt.Chart()` with `encode(color='MPAA Rating')` for color encoding.



Complete the code cell below.

In [None]:
# Fill Code Here


### Task 1.2: Adding Size

Further extend the scatterplot by encoding another feature:

- **Size:** a numeric feature such as `Running Time min` or `IMDB Votes`



This adds another dimension to the plot. Fill in the code below.
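
As a sketch of the size encoding in `matplotlib`: the `s` argument of `scatter()` is the marker area in points squared, so raw values such as vote counts usually need rescaling first. The toy data and the 20 to 300 size range are arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data, invented for this example (not taken from movies.json)
toy = pd.DataFrame({
    "Production Budget": [1e6, 5e6, 2e7, 1e8],
    "US Gross": [3e6, 4e6, 6e7, 9e7],
    "IMDB Votes": [500, 12000, 80000, 250000],
})

# Rescale votes linearly into a readable marker-area range (arbitrary: 20..300)
votes = toy["IMDB Votes"]
sizes = 20 + 280 * (votes - votes.min()) / (votes.max() - votes.min())

fig, ax = plt.subplots()
# alpha softens overlap between large markers
ax.scatter(toy["Production Budget"], toy["US Gross"], s=sizes, alpha=0.6)
ax.set_xlabel("Production Budget")
ax.set_ylabel("US Gross")
```

In `seaborn` and `altair`, the `size` parameter and the `size` encoding channel handle this rescaling for you.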

In [None]:
# Fill Code Here


## Part 2: Cluttered Visualization

The next task is to create a visualization that encodes as many features as possible. Try mapping:



- **Color**

- **Size**

- **Shape/Marker**

- **Position**

- **Transparency (alpha)**

- **Any other visual property you can think of?**



*Note:* The goal is to demonstrate how quickly a visualization can become cluttered when too many encodings are used.

You are free to choose any combination of the above encodings. Do your worst!

Fill in the code below with your own choices for mappings.

In [None]:
# Fill Code Here


## Part 3: Focused Visualization



For the final part, create several visualizations that each use only a few well-chosen encodings. For example, you might use just three encodings (two positions + color, two positions + size, or color + size + shape). Choose among the following:

- Map **color**

- Map **size**

- Use **position (X,Y)**



Create a few different visualizations that keep the same x and y positions but vary the color, size, or shape encodings to showcase different aspects of the data.



You can also use redundant encodings, e.g., using both color and shape to represent the same feature.

Limiting the number of encodings helps you focus on the most important aspects of the data without overwhelming the viewer with too much information.



After creating the visualization, write a brief discussion (just a few sentences) in the markdown cell that follows. Explain:

- Why you selected these specific encodings.

- How these choices improve interpretability.

- Any trade-offs involved in reducing the number of encoded features.

In [None]:
# Your code here


### Discussion



Please write your reflections here:



- Why did you choose the specific encodings for the focused visualization?

- How do the selected visual properties (color, size, position) help in understanding the data better?

- What improvements in clarity do you observe compared to the cluttered visualization?

- What trade-offs do you notice when reducing the number of encoded features?



Use this cell to provide a brief explanation of your design decisions.