# Week 11, Prep notebook

Last week we talked about how Jekyll sites work overall, and this week we'll focus on building visualizations to host on these pages with vega-lite and Altair in Python.

## 1. Include vega-lite plots directly from vega-editor

We can start by saving vega-lite code directly as JSON if we do our development in the vega-editor. For a walkthrough of this, see the file `3_direct_from_vega_editor.md` in the prep files (note to self: this is in the dev onlinecv locally at the moment).

**You must include the full URL to the dataset for this to work!**
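
One way to catch a relative data path before publishing is to inspect the saved spec. The sketch below assumes the spec is strict JSON with a top-level `data.url`; the helper name `has_full_data_url` and the relative path `data/mobility.csv` are made up for illustration and are not part of vega-lite or Altair.

```python
import json
from urllib.parse import urlparse


def has_full_data_url(spec):
    """Return True if the spec's data URL has both a scheme and a host."""
    url = spec.get("data", {}).get("url", "")
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# A spec as it might be saved from the vega-editor (strict JSON).
spec_text = """
{
  "data": {"url": "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_spring2022/master/week10/data/mobility.csv"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "State", "type": "nominal"},
    "y": {"aggregate": "count", "field": "State", "type": "quantitative"}
  }
}
"""
full_spec = json.loads(spec_text)

# Hypothetical spec that only works next to a local copy of the data.
relative_spec = {"data": {"url": "data/mobility.csv"}, "mark": "bar"}

print(has_full_data_url(full_spec))      # True
print(has_full_data_url(relative_spec))  # False
```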

## 2. Copy vega-lite code from other sources and save using Altair

For example, you may have developed some code on [Starboard](https://starboard.gg/) that you now want to copy into this notebook with Altair, and then eventually save to your Jekyll webpage.

We can follow the [Converting vega-lite to Altair instructions](https://altair-viz.github.io/user_guide/internals.html#converting-vega-lite-to-altair) to do this.

For example, from [one of our prep Starboard notebooks](https://starboard.gg/jnaiman/prep_notebook_week11_spring2022-nG3SEUx) we have the following vega-lite specification:

```
var myHist1 = 
{
  data: {"url": "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_spring2022/master/week10/data/mobility.csv"
  },
  mark: "bar",
  height: "300",
  width: "500",
  encoding: {
    "x": {"field": "State", "type": "nominal"},
    // NOTE: this won't work because "sum" assumes numerical data
    //"y": {"aggregate": "sum", "field": "State", "type": "nominal"} 
    //"y": {"aggregate": "count", "field": "State", "type": "nominal"} // might give an error/warning
    "y": {"aggregate": "count", "field": "State", "type": "quantitative"}
  }
};

var v = vegaEmbed('#firstHist', myHist1);
```

We can convert this to an Altair plot here with:

In [1]:
import altair as alt

In [5]:
chart1 = alt.Chart.from_dict({
  "data": {"url": "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_spring2022/master/week10/data/mobility.csv"
  },
  "mark": "bar",
  "height": 300,
  "width": 500,
  "encoding": {
    "x": {"field": "State", "type": "nominal"},
    # NOTE: this won't work because "sum" assumes numerical data
    #"y": {"aggregate": "sum", "field": "State", "type": "nominal"} 
    #"y": {"aggregate": "count", "field": "State", "type": "nominal"} // might give an error/warning
    "y": {"aggregate": "count", "field": "State", "type": "quantitative"}
  }
})

chart1

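Note that I had to add and remove some `"` marks: in Python every dictionary key has to be a quoted string, while the numeric `height` and `width` values lose their quotes. Different vega-lite "hosting" services are more or less lenient about these kinds of formatting issues.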
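
If a copied spec has many quoted sizes, the unquoting can be done in code instead of by hand. This is a minimal sketch using plain dictionaries; `normalize_sizes` is a hypothetical helper name, and it assumes that only digit-only strings should become numbers (so a value like `"container"` is left alone).

```python
def normalize_sizes(spec):
    """Return a copy of spec with digit-only height/width strings as ints."""
    fixed = dict(spec)
    for key in ("height", "width"):
        value = fixed.get(key)
        if isinstance(value, str) and value.isdigit():
            fixed[key] = int(value)
    return fixed


# Sizes written as strings, as in the Starboard version of the spec.
js_style = {"mark": "bar", "height": "300", "width": "500"}

print(normalize_sizes(js_style))
# {'mark': 'bar', 'height': 300, 'width': 500}

print(normalize_sizes({"mark": "bar", "width": "container"}))
# {'mark': 'bar', 'width': 'container'}
```

The cleaned dictionary can then be passed to `alt.Chart.from_dict` in the same way as the hand-edited one.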

Now, in principle, I could go to the Vega-Editor using the three `...` button at the upper right of this plot, OR I can just save with:

In [8]:
myJekyllDir = '/Users/jnaiman/online_cv_fall2022/'
chart1.properties(width='container').save(myJekyllDir+"assets/json/chart1.json")
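
The save call above assumes that `assets/json/` already exists inside the Jekyll folder and that `myJekyllDir` ends with a slash; if either is not true the save will likely fail or land in the wrong place. The sketch below shows one way to build the path more defensively with `pathlib`. It writes a plain dictionary with `json` rather than an Altair chart, `save_spec` is a hypothetical helper, and a temporary directory stands in for the real site folder.

```python
import json
import tempfile
from pathlib import Path


def save_spec(spec, jekyll_dir, filename):
    """Write spec as JSON to <jekyll_dir>/assets/json/<filename>."""
    out_dir = Path(jekyll_dir) / "assets" / "json"
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folders if missing
    out_path = out_dir / filename
    out_path.write_text(json.dumps(spec, indent=2))
    return out_path


# A temporary directory stands in for the real Jekyll site folder.
with tempfile.TemporaryDirectory() as site_dir:
    saved = save_spec({"mark": "bar", "width": "container"}, site_dir, "chart1.json")
    reloaded = json.loads(saved.read_text())

print(saved.name)         # chart1.json
print(reloaded["width"])  # container
```

With a real chart, the same `Path` object can be converted with `str(...)` and handed to the chart's `save` method.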

## 3. Complications with copy-paste for more complex operations

Not everything translates easily from how we've used vega-lite in Starboard to how we use it here in Jekyll. In particular, selections that link plots together can get a little tricky. To see this, let's re-make our Mobility dashboard.

Let's first do the plots one at a time.  The rectangle plot spec looked like:

```
var rectPlot1Spec = {
  // Data
  data: {"url":"https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_spring2021/master/week08/data/mobility.csv"},
  // Marks
  mark:"rect",
  height:"400",
  // Encoding (note:error for encoding vs encodings)
  encoding:{
    //"x":{"field":"Student_teacher_ratio", "type":"quantitative"},
    "x":{"bin":{"maxbins":10}, "field":"Student_teacher_ratio", "type":"quantitative"},
    "y":{"field":"State","type":"ordinal"},
    "color":{"aggregate":"count", "type":"quantitative"} 
    // will show the number of records with a specific student/teacher ratio in a particular state
  }
  
};

var v = vegaEmbed('#rectPlot1',rectPlot1Spec);
```

Which is translated to:

In [42]:
chart1 = alt.Chart.from_dict({
  #// Data
  "data": {"url":"https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_spring2021/master/week08/data/mobility.csv"},
  #// Marks
  "mark":"rect",
  "height":400,
  #// Encoding (note:error for encoding vs encodings)
  "encoding":{
    #//"x":{"field":"Student_teacher_ratio", "type":"quantitative"},
    "x":{"bin":{"maxbins":10}, "field":"Student_teacher_ratio", "type":"quantitative"},
    "y":{"field":"State","type":"ordinal"},
    "color":{"aggregate":"count", "type":"quantitative"} 
    #// will show the number of records with a specific student/teacher ratio in a particular state
  }
  
})
chart1

Note again where `"` marks have been added and removed, and that the `//` comments now start with `#`.

The second chart was the histogram of the mobility score, given by the specification:

```
var mobilityHistSpec = {
  // Data
  data: {"url":"https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_spring2021/master/week08/data/mobility.csv"},
  // Mark
  mark: "bar",
  // Encoding
  encoding:{
    "x":{"field":"Mobility", "type":"quantitative", "bin":true, "axis":{"title":"Mobility Score"}},
    //"x":{"field":"Mobility", "type":"quantitative", "axis":{"title":"Mobility Score"}},
    "y":{"aggregate":"count","type":"quantitative", "axis":{"title":"Mobility Score Distribution"}}
  }
};

var v = vegaEmbed('#mobilityHist1',mobilityHistSpec);
```

This now turns into:

In [43]:
chart2 = alt.Chart.from_dict({
  #// Data
  "data": {"url":"https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_spring2021/master/week08/data/mobility.csv"},
  #// Mark
  "mark": "bar",
  #// Encoding
  "encoding":{
    "x":{"field":"Mobility", "type":"quantitative", "bin":True, "axis":{"title":"Mobility Score"}},
    #//"x":{"field":"Mobility", "type":"quantitative", "axis":{"title":"Mobility Score"}},
    "y":{"aggregate":"count","type":"quantitative", "axis":{"title":"Mobility Score Distribution"}}
  }
})
chart2

Note here that we had to change the JS `true` to the Pythonic `True`.
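
When the spec is available as strict JSON text (quoted keys, no comments), Python's `json` module does this conversion for us. A small sketch, using a trimmed-down version of the histogram spec:

```python
import json

# Strict JSON: keys are quoted and `true` is lowercase, as in JavaScript.
spec_text = """
{
  "mark": "bar",
  "encoding": {
    "x": {"field": "Mobility", "type": "quantitative", "bin": true}
  }
}
"""

spec = json.loads(spec_text)
print(spec["encoding"]["x"]["bin"])  # True

# Going back the other way restores the JavaScript spelling.
print(json.dumps(spec["encoding"]["x"]["bin"]))  # true
```

This only works for strict JSON; a spec copied with unquoted keys or `//` comments still needs the hand edits described above before `json.loads` will accept it.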

The first thing we probably want to do is put these charts side-by-side. We can do this with Altair's horizontal concatenation:

In [44]:
chart = alt.HConcatChart(hconcat=[chart1,chart2])
chart
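
In vega-lite terms, a horizontally concatenated chart is a spec whose top-level `hconcat` key holds the list of sub-chart specs. The plain-dictionary sketch below shows that structure without Altair; `hconcat_spec` is a hypothetical helper, and the two sub-specs are trimmed versions of the ones above.

```python
def hconcat_spec(*specs):
    """Wrap unit specs in a vega-lite horizontal concatenation."""
    return {"hconcat": list(specs)}


data = {"url": "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_spring2021/master/week08/data/mobility.csv"}

rect_spec = {"data": data, "mark": "rect", "height": 400}
hist_spec = {"data": data, "mark": "bar"}

dashboard = hconcat_spec(rect_spec, hist_spec)

print(list(dashboard.keys()))     # ['hconcat']
print(len(dashboard["hconcat"]))  # 2
```

Sizes such as `height` and `width` live on each sub-chart rather than on the combined spec, which is a likely reason that setting `width='container'` on the concatenated chart is rejected.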

In [48]:
# note that the width='container' version gives an error -- ignore this for the time being
#chart.properties(width='container').save(myJekyllDir+"assets/json/static_mobility.json")
chart.properties().save(myJekyllDir+"assets/json/static_mobility.json")

Ok, so now we want to add a brush selection, like before, on the left plot that changes the histogram shown in the right plot. How do we do this? Well, first let's look at the brush selector in Altair:

In [15]:
brush = alt.selection_interval()  # selection of type "interval"

We can add this to our first plot and then see we have some interactivity:

In [16]:
chart1.add_selection(
        brush
    )

What are some of the parameters we can use in our brush selection?

In [14]:
alt.selection_interval?

One thing we note here is that there is an option to select by data field, which might make us think we want to use "Mobility" as the input. What we actually need to specify is `encodings`, i.e. on which axes of our original plot are we selecting?

We are selecting boxes in `chart1`, so the selection covers both x and y:

In [26]:
brush = alt.selection_interval(encodings=['x','y'])

Let's now add this brush to our `chart1`:

In [50]:
chart1 = alt.Chart.from_dict({
  "data": {"url":"https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_spring2021/master/week08/data/mobility.csv"},
  "mark":"rect",
  "height":400,
  "encoding":{
    "x":{"bin":{"maxbins":10}, "field":"Student_teacher_ratio", "type":"quantitative"},
    "y":{"field":"State","type":"ordinal"},
    "color":{"aggregate":"count", "type":"quantitative"} 
  }  
}).add_selection(
        brush
    )

And now we can add this also as a [transform filter](https://altair-viz.github.io/user_guide/transform/filter.html) to our `chart2` plot:

In [51]:
chart2 = alt.Chart.from_dict({
  "data": {"url":"https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_spring2021/master/week08/data/mobility.csv"},
  "mark": "bar",
  "encoding":{
    "x":{"field":"Mobility", "type":"quantitative", "bin":True, "axis":{"title":"Mobility Score"}},
    "y":{"aggregate":"count","type":"quantitative", "axis":{"title":"Mobility Score Distribution"}}
  }
}).transform_filter(
    brush
)
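
To build some intuition for what the brush plus the filter do together, here is a conceptual sketch in plain Python. It is not how vega-lite implements selections internally: the `brush` dictionary, the `in_brush` helper, and all of the row values are made up for illustration. The idea is that the brush records a range on x and a set of categories on y, and the second chart only counts rows that fall inside both.

```python
from collections import Counter

# Toy rows; every value here is made up for illustration.
rows = [
    {"State": "TN", "Student_teacher_ratio": 15.1, "Mobility": 0.062},
    {"State": "TN", "Student_teacher_ratio": 18.4, "Mobility": 0.054},
    {"State": "NC", "Student_teacher_ratio": 16.0, "Mobility": 0.045},
    {"State": "NC", "Student_teacher_ratio": 21.3, "Mobility": 0.071},
]

# Hypothetical brush state: an interval on x and the y categories it covers.
brush = {"Student_teacher_ratio": (14.0, 17.0), "State": {"TN", "NC"}}


def in_brush(row, brush):
    """Return True if the row falls inside the brushed x range and y categories."""
    low, high = brush["Student_teacher_ratio"]
    in_x = low <= row["Student_teacher_ratio"] <= high
    in_y = row["State"] in brush["State"]
    return in_x and in_y


# The "transform filter" step: keep only the brushed rows.
selected = [row for row in rows if in_brush(row, brush)]

# The histogram step: count selected rows per Mobility bin of width 0.01,
# keyed by the integer bin index.
counts = Counter(int(row["Mobility"] * 100) for row in selected)

print([row["State"] for row in selected])  # ['TN', 'NC']
print(sorted(counts.items()))              # [(4, 1), (6, 1)]
```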

Now let's horizontally concatenate them, this time with Altair's `|` operator shorthand:

In [52]:
chart = chart1 | chart2

In [53]:
chart

Neat!  Now we can once again save this in the "usual way" to add to our Jekyll file:

In [54]:
#error again, ignoring
#chart.properties(width='container').save(myJekyllDir+"assets/json/dashboard_mobility.json")
chart.properties().save(myJekyllDir+"assets/json/dashboard_mobility.json")

## 4. Use Altair to make the chart starting from the data

* read in the data as a dataframe
* do all of the analysis and plotting with Altair's own methods

Let's once again rewrite `chart1`, this time moving from the dictionary we have been passing to `from_dict` to more "native Altair" syntax, and passing the data in through Python:

```
chart1 = alt.Chart.from_dict({
  "data": {"url":"https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_spring2021/master/week08/data/mobility.csv"},
  "mark":"rect",
  "height":400,
  "encoding":{
    "x":{"bin":{"maxbins":10}, "field":"Student_teacher_ratio", "type":"quantitative"},
    "y":{"field":"State","type":"ordinal"},
    "color":{"aggregate":"count", "type":"quantitative"} 
  }  
})
```

Let's first start by grabbing the data as a dataframe in Python:

In [57]:
import pandas as pd

In [58]:
mobility = pd.read_csv('https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_spring2021/master/week08/data/mobility.csv')

In [59]:
mobility.head()

| Unnamed: 0 | ID | Name | Mobility | State | Population | Urban | Black | Seg_racial | Seg_income | Seg_poverty | ... | Migration_out | Foreign_born | Social_capital | Religious | Violent_crime | Single_mothers | Divorced | Married | Longitude | Latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | Johnson City | 0.062199 | TN | 576081 | 1 | 0.021 | 0.09 | 0.035 | 0.03 | ... | 0.005 | 0.012 | -0.298 | 0.514 | 0.001 | 0.19 | 0.11 | 0.601 | -82.436386 | 36.470371 |
| 1 | 200 | Morristown | 0.053652 | TN | 227816 | 1 | 0.02 | 0.093 | 0.026 | 0.028 | ... | 0.014 | 0.023 | -0.767 | 0.544 | 0.002 | 0.185 | 0.116 | 0.613 | -83.407249 | 36.096539 |
| 2 | 301 | Middlesborough | 0.072635 | TN | 66708 | 0 | 0.015 | 0.064 | 0.024 | 0.015 | ... | 0.012 | 0.007 | -1.27 | 0.668 | 0.001 | 0.211 | 0.113 | 0.59 | -83.535332 | 36.55154 |
| 3 | 302 | Knoxville | 0.056281 | TN | 727600 | 1 | 0.056 | 0.21 | 0.092 | 0.084 | ... | 0.014 | 0.02 | -0.222 | 0.602 | 0.001 | 0.206 | 0.114 | 0.575 | -84.24279 | 35.952259 |
| 4 | 401 | Winston-Salem | 0.044801 | NC | 493180 | 1 | 0.174 | 0.262 | 0.072 | 0.061 | ... | 0.019 | 0.053 | -0.018 | 0.488 | 0.003 | 0.22 | 0.092 | 0.586 | -80.505333 | 36.081276 |


## 5. Python Analysis + Altair Plotting

We can also do all of the data transformations in Python (which might be easier since we've been using Python this whole time!) and then do the actual plotting in Altair: