# Exercise 6: Interactions 


In [2]:
import numpy as np 
import pandas as pd 
import altair as alt
from vega_datasets import data



# Consider the following example from [Stack Overflow](https://stackoverflow.com/questions/63751130/altair-choropleth-map-color-highlight-based-on-line-chart-selection)

This is an example of cross-filtering (or conditional coloring) between a choropleth and a bar chart, based on a multi-point selection shared by both charts.


In [6]:
state_pop = data.population_engineers_hurricanes()[['state', 'id', 'population']]
state_map = alt.topo_feature(data.us_10m.url, 'states')

display(state_pop.head()) 

click = alt.selection_point(fields=['state'])

choropleth = (alt.Chart(state_map).mark_geoshape().transform_lookup(
    lookup='id',
    from_=alt.LookupData(state_pop, 'id', ['population', 'state']))
.encode(
    color='population:Q',
    opacity=alt.condition(click, alt.value(1), alt.value(0.2)),
    tooltip=['state:N', 'population:Q'])
.add_params(click)
.project(type='albersUsa'))

bars = (
    alt.Chart(
        state_pop.nlargest(15, 'population'),
        title='Top 15 states by population').mark_bar().encode(
    x='population',
    opacity=alt.condition(click, alt.value(1), alt.value(0.2)),
    color='population',
    y=alt.Y('state', sort='x'))
.add_params(click))

choropleth & bars

|   | state | id | population |
|---|---|---|---|
| 0 | Alabama | 1 | 4863300 |
| 1 | Alaska | 2 | 741894 |
| 2 | Arizona | 4 | 6931071 |
| 3 | Arkansas | 5 | 2988248 |
| 4 | California | 6 | 39250017 |


# Chicago ward map

Recall the GeoJSON data for wards in the City of Chicago (also available [here](https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Wards-2015-2023-/sp34-6z76)).

* Load the data into a DataFrame
* Make sure every row has a `type` column with the value `'Feature'`
* Make sure every row has a `geometry` (a dictionary with a `type` and `coordinates`)
* Create a basic map of the wards of Chicago using Altair.
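The steps above can be sketched on a tiny, made-up FeatureCollection. The two wards, their coordinates, and the property key `ward` are hypothetical; the sketch only assumes the standard GeoJSON nesting (a `features` list, each feature with `properties` and `geometry`). Building a DataFrame from the parsed top-level dict gives one row per feature, but it also copies the collection's own `type` (`'FeatureCollection'`) into every row, which is why the `type` column has to be overwritten.

```python
import json

import pandas as pd

# Hypothetical two-ward FeatureCollection (invented coordinates) with the
# usual GeoJSON nesting: features -> properties / geometry
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"ward": "1"},
         "geometry": {"type": "MultiPolygon", "coordinates": [[[
             [-87.70, 41.90], [-87.70, 41.92], [-87.68, 41.92],
             [-87.68, 41.90], [-87.70, 41.90]]]]}},
        {"type": "Feature",
         "properties": {"ward": "2"},
         "geometry": {"type": "MultiPolygon", "coordinates": [[[
             [-87.66, 41.88], [-87.66, 41.90], [-87.64, 41.90],
             [-87.64, 41.88], [-87.66, 41.88]]]]}},
    ],
})

# One row per feature; the scalar top-level 'type' is broadcast to every row
wards_df = pd.DataFrame(json.loads(geojson_text))
print(wards_df["type"].unique())  # ['FeatureCollection']

# Each row must itself look like a GeoJSON Feature: type + geometry
wards_df["type"] = "Feature"
wards_df["geometry"] = wards_df["features"].apply(lambda f: f["geometry"])

# Pull a property out as a plain column (the key 'ward' is an assumption)
wards_df["ward"] = wards_df["features"].apply(lambda f: f["properties"]["ward"])
```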

In [None]:
chicago_wards_df = pd.read_json('../data/chicago-ward-boundaries.geojson')

display(chicago_wards_df.head(1)) 
print('\nKeys in features:')
display(chicago_wards_df.loc[0].features.keys())
print("\nKeys in features['properties']:")
display(chicago_wards_df.loc[0].features['properties'].keys())

# [...] 


#### Note:
The default projection renders the wards with a tilted orientation.

The [`albersUsa`](https://vega.github.io/vega/docs/projections/) projection, set with `.project(type='albersUsa')` as in the example above, looks better.

# Add data: the number of homes of each type sold and the most popular type 

In [None]:
home_sales = pd.read_csv("../data/chicago-home-sales.csv", sep='\t', encoding="UTF-16")
display(type(home_sales.loc[0,'Chicago Ward'])) 
home_sales = home_sales.rename(columns={'Chicago Ward': 'ward'})
home_sales['ward'] = home_sales['ward'].astype(str)  # convert ward IDs to strings (e.g., to match string keys when merging)
display(type(home_sales.loc[0,'ward'])) 
print('home_sales:')
display(home_sales.head())

# [...] 
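
A hedged sketch of this step on made-up data. The three type column names (`single_family`, `condo`, `townhouse`) are placeholders, not the real headers of `chicago-home-sales.csv`, and the ward table is a hypothetical stand-in without geometry. `idxmax(axis=1)` returns, for each row, the column with the largest count (the first one in case of a tie). Both `ward` columns must share a dtype for the merge, presumably why the cell above converts `ward` to `str`: pandas refuses to merge an integer key column with a string one.

```python
import pandas as pd

# Hypothetical per-ward counts; the type column names are placeholders
home_sales = pd.DataFrame({
    "ward": ["1", "2", "3"],
    "single_family": [40, 12, 7],
    "condo": [25, 30, 9],
    "townhouse": [5, 8, 11],
})
type_cols = ["single_family", "condo", "townhouse"]

# idxmax(axis=1) returns, per row, the label of the column holding the maximum
home_sales["most_popular"] = home_sales[type_cols].idxmax(axis=1)

# Hypothetical ward table (geometry omitted) keyed by the same string ward IDs
wards_df = pd.DataFrame({"ward": ["1", "2", "3"], "type": "Feature"})

# how="left" keeps every ward, even one without a sales row
merged = wards_df.merge(home_sales, on="ward", how="left")
print(merged[["ward", "most_popular"]])
```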



# Recreate the choropleth colored by the most popular type sold in each ward

In [None]:

# [...]





# Consider also this example of `transform_fold()`

[`transform_fold()`](https://altair-viz.github.io/user_guide/transform/fold.html) is essentially a `melt` (unpivot) operation within the chart. 

[`melt`](https://pandas.pydata.org/docs/reference/api/pandas.melt.html) (unpivot) transforms a wide-form dataframe to a long-form one.  

This example compares `transform_fold()` with performing the equivalent `melt` before making the chart.  

In [None]:
rand = np.random.RandomState(0)
wide_form = pd.DataFrame({
    'Apple': rand.randint(2,10,10),
    'Banana': rand.randint(2,10,10),
    'Carrot': rand.randint(2,10,10),
})
print('The wide-form DataFrame:') 
display(wide_form.head()) 

long_form = pd.melt(wide_form, id_vars=None, 
              value_vars=['Apple', 'Banana', 'Carrot'], 
              var_name='Fruit', 
              value_name='Price')
print('\nThe long-form DataFrame:') 
display(long_form.head(15)) 

long_form_chart = alt.Chart(long_form).mark_bar().encode(
    x='Fruit:N',
    y='sum(Price):Q', 
    color='Fruit:N' 
)

wide_form_chart = alt.Chart(wide_form).transform_fold(
    ['Apple', 'Banana', 'Carrot'],
    as_=['Fruit', 'Price'] # Specify the new column names: key-value pairs 
).mark_bar().encode(
    x='Fruit:N',
    y='sum(Price):Q',  # 'Price' holds the folded values; without as_ it would be named 'value'
    color='Fruit:N'  # 'Fruit' holds the folded keys; without as_ it would be named 'key'
)

print('The two charts look the same because the operations are equivalent:') 
wide_form_chart | long_form_chart

# Cross-filter the wards with a bar plot showing all three home types 



In Altair, you can use a selection over a choropleth map and cross-filter it with a bar plot. This interaction allows you to select regions on the map (using a multi-point or an interval selection, for instance) and have the selection dynamically filter or highlight data in a bar chart (or any other linked visualization) that visualizes information about the selected areas.

Implement this with `alt.selection_point()` for a multi-point selection. The bar chart should show the total number of properties of each of the three types sold in the selected wards. 


Specifically: 
* Create the multi-point selection. 
* Create the Choropleth Map: wards colored by the most popular property type sold. Highlight the selected wards with conditional opacity or color. 
* Create the Bar Plot: three bars, each aggregating the number of properties of a single type sold in the selected wards.
* Cross-filter: link the charts by passing the selection to the bar chart with `transform_filter()`.
* Optional: note the difference between using and not using `sum()` on the `y` channel of the bar plot. 

<br>

#### Note:
* Bars _cannot_ be selected in the bar plot; wards are selected only from the map.
* Hint: the `transform_fold()` example is here for a reason.   

### Note: Merge the data before cross-filtering
When cross-filtering with a point or interval selection, merge all of the data into a single DataFrame before coding the charts, or use `transform_lookup()` within them. _Avoid using one DataFrame in one chart and a separate DataFrame in a second chart that refers to the same point or interval selection as the first_. Cross-filtering charts built on separate DataFrames can produce unexpected results.  


In [None]:
# Define the multi point selection
selection = alt.selection_point(fields=['ward'])

# [...]




# Create a similar choropleth but this time with an interval selection


### (i) get the coordinates of (approximately) the middle of each ward 

In [None]:
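def mid_ward(geometry): 
    '''
    Take one ward's geometry dict and return the mean [longitude, latitude]
    of the outer ring of its first polygon, an approximate middle of the ward.
    Assumes a MultiPolygon, where coordinates[0][0] is that outer ring.
    '''
    return np.array(geometry['coordinates'][0][0]).mean(axis=0)

chicago_wards_df['middle'] = chicago_wards_df['geometry'].apply(mid_ward)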


Make sure you understand the code in the previous cell before you continue. 
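To see what the indexing in `mid_ward()` does, here it is applied to a single hypothetical MultiPolygon (a square ward with invented coordinates). `coordinates[0]` is the first polygon and `coordinates[0][0]` is its outer ring, an array of `[lon, lat]` vertices; a plain `Polygon` geometry would need one `[0]` fewer. Because GeoJSON rings repeat the first vertex at the end, that vertex is counted twice and the mean is pulled slightly toward it, so the result is only an approximate middle. The last lines split the result into separate longitude and latitude columns.

```python
import numpy as np
import pandas as pd


def mid_ward(geometry):
    """Mean [lon, lat] of the outer ring of the first polygon (MultiPolygon assumed)."""
    return np.array(geometry["coordinates"][0][0]).mean(axis=0)


# Hypothetical MultiPolygon: one square ward, first vertex repeated to close the ring
ring = [[-87.70, 41.90], [-87.70, 41.92], [-87.68, 41.92], [-87.68, 41.90], [-87.70, 41.90]]
geometry = {"type": "MultiPolygon", "coordinates": [[ring]]}

# coordinates[0] is the first polygon; coordinates[0][0] is its outer ring
print(np.array(geometry["coordinates"][0][0]).shape)  # (5, 2): 5 vertices of [lon, lat]

middle = mid_ward(geometry)  # about [-87.692, 41.908], not the true center [-87.69, 41.91]

# Same steps on a DataFrame, then split into columns for longitude/latitude encodings
df = pd.DataFrame({"geometry": [geometry]})
df["middle"] = df["geometry"].apply(mid_ward)
df["lon"] = df["middle"].apply(lambda m: m[0])
df["lat"] = df["middle"].apply(lambda m: m[1])
```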

### (ii) Create a basic chart with circle markers at the middle of each ward

This requires encoding `longitude` and `latitude` (as opposed to `x` and `y`) so that the numerical values are passed through the map projection and line up with the map. 

See, e.g., [this Altair gallery example](https://altair-viz.github.io/gallery/interval_selection_map_quakes.html). 



In [None]:

# [...]



### (iii) Use an interval selection over the dots to create the cross-filtered chart

In [None]:
# Interval selection
brush = alt.selection_interval()

# [...] 

