
Select a data set from Kaggle: https://www.kaggle.com/iamsouravbanerjee/top-100-greatest-hollywood-actors-of-all-time
Note: I amended this data set to add a column giving each actor's country of origin.

This particular data set was chosen because:

* There is a sufficiently large number of rows (100) to warrant a scatter plot
* There are enough columns of interesting data, but not too many to be overwhelming
* A visual representation of the number of award nominations vs award wins is intuitively understood and of interest to a user
* The data set is relatable: most people know the names of the actors and are interested in their country of origin

VS Code: 
* Import the Altair and pandas libraries (Altair must be installed first, e.g. with pip install altair).
* Display the shape and a random sample of the data set as a sanity check that the file has been read correctly.


In [None]:
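import altair as alt
import pandas as pd

# Read the amended data set (includes the extra Country column)
Hollywood_Data = pd.read_csv("Hollywood_Actors.csv")
# Only the last expression in a cell is displayed, so print the shape explicitly
print(Hollywood_Data.shape)
Hollywood_Data.sample(5)  # show 5 random rows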
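Looking at the shape and a sample is a good start. A few programmatic checks can also catch problems in the amended file early. The sketch below is a minimal illustration and is not part of the original notebook. check_data and REQUIRED_COLUMNS are hypothetical names. The small DataFrame is a placeholder for Hollywood_Actors.csv, and its values are invented. The sketch assumes the column names the charts in this notebook use (Name, Country, BAFTA Nominations, BAFTAs).

```python
import pandas as pd

# Hypothetical stand-in for Hollywood_Actors.csv: placeholder names and values,
# used only so the checks below can run without the real file.
Hollywood_Data = pd.DataFrame({
    "Name": ["Actor A", "Actor B", "Actor C"],
    "Country": ["America", "England", None],
    "BAFTA Nominations": [3, 5, 0],
    "BAFTAs": [1, 2, 0],
})

# Columns the charts in this notebook encode (hypothetical constant).
REQUIRED_COLUMNS = ["Name", "Country", "BAFTA Nominations", "BAFTAs"]


def check_data(df: pd.DataFrame) -> dict:
    """Return simple sanity-check results for the columns the charts use."""
    missing_cols = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    present = [c for c in REQUIRED_COLUMNS if c in df.columns]
    return {
        "rows": len(df),
        "missing_columns": missing_cols,
        # Number of empty cells in each required column that exists
        "null_counts": df[present].isna().sum().to_dict(),
        # Assumes every win is also counted as a nomination in this data set;
        # skipped (None) when either column is missing
        "wins_exceed_nominations": (
            int((df["BAFTAs"] > df["BAFTA Nominations"]).sum())
            if not missing_cols else None
        ),
    }


print(check_data(Hollywood_Data))
```

A non-empty missing_columns list or an unexpected null count, such as actors without a Country, is worth fixing before charting. Altair would otherwise plot those rows with a null country.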

Create a scatter chart showing each actor's number of BAFTA nominations and BAFTA wins.
Each data point is represented by a configurable point shape.
Each data point should be colour-coded by the actor's country of origin.
The relevant actor's name should appear when the cursor hovers over the data point.
Make the chart interactive (pan and zoom).

In [None]:
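# Scatter plot: one point per actor
alt.Chart(Hollywood_Data).mark_point().encode(
    x='BAFTA Nominations',
    y='BAFTAs',
    color='Country',  # colour-code points by country of origin
    tooltip='Name'    # show the actor's name on hover
).properties(
    title='BAFTA Nominations and Wins'
).interactive()       # enable pan and zoom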

Interactive Data Dashboard:

Allow the user to select an area on the graph: data points inside the selected area are coloured according to country,
while those outside it are greyed out. Use selection_interval() and add_selection() (in Altair 5 and later, add_params() replaces the deprecated add_selection()).

In [None]:
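# Interval selection ("brush"): click and drag to select a rectangle
Brush = alt.selection_interval()
BAFTA_Points = alt.Chart(Hollywood_Data).mark_point().encode(
    x='BAFTA Nominations',
    y='BAFTAs',
    tooltip='Name',
    # Points inside the brush keep their country colour; the rest turn light grey
    color=alt.condition(Brush, 'Country', alt.value('lightgray'))
).add_selection(  # in Altair 5+, use .add_params(Brush) instead
    Brush
)
BAFTA_Points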
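Conceptually, an interval brush selects the rows whose x value lies within the brush's x extent and whose y value lies within its y extent. Vega-Lite evaluates this in the browser. The pandas sketch below reproduces the same predicate so you can check which actors a given rectangle should highlight. in_brush is a hypothetical helper. It treats the bounds as inclusive, which is an assumption. The rows are placeholders.

```python
import pandas as pd

# Hypothetical placeholder rows standing in for Hollywood_Data.
Hollywood_Data = pd.DataFrame({
    "Name": ["Actor A", "Actor B", "Actor C"],
    "Country": ["America", "England", "America"],
    "BAFTA Nominations": [2, 6, 9],
    "BAFTAs": [1, 4, 0],
})


def in_brush(df, x_range, y_range, x="BAFTA Nominations", y="BAFTAs"):
    """Boolean mask of rows inside a rectangular brush (inclusive bounds)."""
    return df[x].between(*x_range) & df[y].between(*y_range)


mask = in_brush(Hollywood_Data, x_range=(0, 6), y_range=(1, 5))
# Mirrors alt.condition(Brush, 'Country', alt.value('lightgray')):
# selected rows keep their country, the rest become "lightgray".
colours = Hollywood_Data["Country"].where(mask, "lightgray")
print(pd.DataFrame({"Name": Hollywood_Data["Name"], "colour": colours}))
```

Here Actor C lies outside the x range (9 nominations > 6), so only that row is greyed out.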

Create a second graph (a bar chart) to show how many BAFTA wins there are per country of origin.

In [None]:
BAFTA_Bar = alt.Chart(Hollywood_Data).mark_bar().encode(
    y='Country',
    # Sum the wins of all actors from each country so each bar shows one total
    x='sum(BAFTAs)',
    color='Country'
).properties(
    title='BAFTA Wins per Country of Origin'
)
BAFTA_Bar
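The bars are meant to show the total number of wins per country. In an Altair encoding, that total is written with an aggregate shorthand such as x='sum(BAFTAs)'. Computing the same totals in pandas is a quick way to check the bar lengths. The rows below are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical placeholder rows standing in for Hollywood_Data.
Hollywood_Data = pd.DataFrame({
    "Name": ["Actor A", "Actor B", "Actor C", "Actor D"],
    "Country": ["America", "England", "America", "Wales"],
    "BAFTAs": [2, 4, 1, 0],
})

# Total BAFTA wins per country, largest first: the lengths the bars should have.
wins_per_country = (
    Hollywood_Data.groupby("Country")["BAFTAs"]
    .sum()
    .sort_values(ascending=False)
)
print(wins_per_country)
```

If a bar in the chart does not match the corresponding total here, the encoding is probably not aggregating.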

Now link the bar chart to the scatter plot with transform_filter(), so that the bars only count the actors inside the selected area.

In [None]:
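BAFTA_Bar = alt.Chart(Hollywood_Data).mark_bar().encode(
    y='Country',
    x='sum(BAFTAs)',  # total wins per country
    color='Country'
).properties(
    title='BAFTA Wins per Country of Origin'
).transform_filter(
    Brush  # keep only the actors inside the scatter plot's brush
)
# "&" stacks the charts vertically; dragging on the scatter plot updates the bars
BAFTA_Points & BAFTA_Bar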



Conclusions:

* Most BAFTA-winning actors are from America, followed by England.
* Only actors from America and England have won 4 or more BAFTAs.
* BAFTA nominations translate to wins very roughly 50% of the time, with some notable exceptions:
    * Albert Finney and Laurence Olivier both had lower win rates than most.
    * It's mostly Americans who get nominated but then fail to win. Ever. Poor Robert De Niro!
    * Daniel Day-Lewis and Peter Finch (both English) are BAFTA favourites, converting nominations to wins at a higher rate.
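The "roughly 50%" observation can be made precise by computing each actor's win rate, BAFTAs divided by BAFTA Nominations. Below is a minimal pandas sketch with hypothetical placeholder rows. The 0.25 and 0.75 thresholds for flagging unusually low or high conversion are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical placeholder rows standing in for Hollywood_Data.
Hollywood_Data = pd.DataFrame({
    "Name": ["Actor A", "Actor B", "Actor C", "Actor D"],
    "BAFTA Nominations": [4, 6, 3, 0],
    "BAFTAs": [2, 1, 3, 0],
})

df = Hollywood_Data.copy()
# Treat zero nominations as missing so the division yields NaN rather than inf
nominations = df["BAFTA Nominations"].where(df["BAFTA Nominations"] > 0)
df["Win Rate"] = df["BAFTAs"] / nominations

# Label each actor relative to the rough 50% rate (arbitrary thresholds)
df["Note"] = "about average"
df.loc[df["Win Rate"] < 0.25, "Note"] = "rarely converts"
df.loc[df["Win Rate"] > 0.75, "Note"] = "BAFTA favourite"
df.loc[df["Win Rate"].isna(), "Note"] = "no nominations"

print(df.sort_values("Win Rate", ascending=False))
```

Sorting by Win Rate puts the "favourites" at the top and actors who are nominated but never win near the bottom. Actors with no nominations come last because NaN sorts to the end.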

On a personal note: I found this exercise quite tricky when trying to use a data set that differed significantly from the example.

Some queries:

* How do I get the tooltip to show every name at a data point when several points share the same position?
* How do you select specific rows of data? For example, at first I wanted to use a data set of life expectancy by country over a 20-year span. How could I select a specific country (i.e. a row) and plot year on the x-axis against life expectancy on the y-axis?
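Here is one possible approach to both queries, sketched in pandas with hypothetical placeholder data.

For the first query, group the actors by their (nominations, wins) position and join the names into a single column. Plotting that summary table with tooltip='Names' then lets one point carry every name. Note that the summary table no longer has a single country per point.

For the second query, filter the wide table down to one country and reshape it with melt() into one row per year. Altair can then draw it with mark_line() and x='Year:O', y='Life Expectancy:Q'. The table layout, with one column per year, is an assumption about how such a data set is arranged.

```python
import pandas as pd

# Query 1: several actors at the same (nominations, wins) point.
# Hypothetical placeholder rows standing in for Hollywood_Data.
actors = pd.DataFrame({
    "Name": ["Actor A", "Actor B", "Actor C"],
    "BAFTA Nominations": [3, 3, 5],
    "BAFTAs": [1, 1, 2],
})
# One row per distinct position, with all names at that position in one string
points = (
    actors.groupby(["BAFTA Nominations", "BAFTAs"])["Name"]
    .agg(", ".join)
    .reset_index(name="Names")
)

# Query 2: one country's row from a wide table (one column per year).
# Hypothetical placeholder values, not real life-expectancy figures.
life = pd.DataFrame({
    "Country": ["Country X", "Country Y"],
    "2000": [70.1, 75.3],
    "2001": [70.4, 75.6],
})
one_country = (
    life[life["Country"] == "Country X"]  # select the row for one country
    .melt(id_vars="Country", var_name="Year", value_name="Life Expectancy")
)
one_country["Year"] = one_country["Year"].astype(int)  # year labels to numbers

print(points)
print(one_country)
```

For the second query, Altair can also do the reshaping itself with transform_filter() and transform_fold(). Doing it in pandas first, however, makes the intermediate table easy to inspect.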
