# Vega Lite Examples in Haskell - Composite Mark Plots

The overview notebook - `VegaLiteGallery` - describes how 
[`hvega`](http://hackage.haskell.org/package/hvega)
is used to create Vega-Lite visualizations.

-----

## Table of Contents

This notebook represents the [Composite Mark Plots](https://vega.github.io/vega-lite/examples/#composite-mark)
section of the [Vega-Lite example gallery](https://vega.github.io/vega-lite/examples/).

### [Error Bars and Error Bands](#Error-Bars-and-Error-Bands)

 - [Error Bars showing Confidence Interval](#Error-Bars-showing-Confidence-Interval)
 - [Error Bars showing Standard Deviation](#Error-Bars-showing-Standard-Deviation)
 - [Line Chart with Confidence Interval Band](#Line-Chart-with-Confidence-Interval-Band)
 - [Scatterplot with Mean and Standard Deviation Overlay](#Scatterplot-with-Mean-and-Standard-Deviation-Overlay)

### [Box Plots](#Box-Plots)

 - [Box Plot with Min/Max Whiskers](#Box-Plot-with-Min%2FMax-Whiskers)
 - [Tukey Box Plot (1.5 IQR)](#Tukey-Box-Plot-%281.5-IQR%29)

---

## Versions

The notebook was last run with the following versions of [`hvega`](https://hackage.haskell.org/package/hvega) and
related modules:

In [1]:
:!ghc-pkg latest ghc
:!ghc-pkg latest ihaskell
:!ghc-pkg latest hvega
:!ghc-pkg latest ihaskell-hvega

ghc-8.4.4

ihaskell-0.9.1.0

hvega-0.4.0.0

ihaskell-hvega-0.2.0.3

As to when it was last run, how about:

In [2]:
import Data.Time (getCurrentTime)
getCurrentTime

2019-09-04 12:58:46.872851988 UTC

## Set up

See the overview notebook for an explanation of this section (it provides code I use to compare the `hvega` output
to the specification given in the Vega-Lite gallery).

In [3]:
{-# LANGUAGE OverloadedStrings #-}

-- VegaLite uses these names
import Prelude hiding (filter, lookup, repeat)

import Graphics.Vega.VegaLite

-- IHaskell automatically imports this if the `ihaskell-hvega` module is installed
-- import IHaskell.Display.Hvega

-- If you are viewing this in an IHaskell notebook rather than Jupyter Lab,
-- use the following to see the visualizations
--
-- vlShow = id

In [4]:
{-# LANGUAGE QuasiQuotes #-}

import qualified Data.ByteString.Lazy.Char8 as BL8
import qualified Data.HashMap.Strict as HM
import qualified Data.Set as S

import Data.Aeson (Value(Object), encode)
import Data.Aeson.QQ.Simple (aesonQQ)
import Control.Monad (forM_, unless, when)
import Data.Maybe (fromJust)
import System.Directory (removeFile)
import System.Process (readProcess, readProcessWithExitCode)

validate ::
  VLSpec       -- ^ The expected specification
  -> VegaLite  -- ^ The actual visualization
  -> IO ()
validate exp vl = 
  let got = fromVL vl
      put = putStrLn
  in if got == exp
      then put "Okay"
      else do
        let red = "\x1b[31m"
            def = "\x1b[0m"
            
            report m = put (red ++ m ++ def)
            
        report "The visualization and expected specification do not match."
        
        -- assume both values are JSON objects (the pattern match fails otherwise)
        let Object oexp = exp
            Object ogot = got
            kexp = S.fromList (HM.keys oexp)
            kgot = S.fromList (HM.keys ogot)
            kmiss = S.toList (S.difference kexp kgot)
            kextra = S.toList (S.difference kgot kexp)
            keys = S.toList (S.intersection kexp kgot)
            
        unless (null kmiss && null kextra) $ do
          put ""
          report "Keys are different:"
          unless (null kmiss)  $ put ("  Missing: " ++ show kmiss)
          unless (null kextra) $ put ("  Extra  : " ++ show kextra)

        -- this often creates an impressive amount of text for what is
        -- only a small change, which is why it is followed by a call
        -- to debug_
        --
        forM_ keys $ \key ->
          let vexp = fromJust (HM.lookup key oexp)
              vgot = fromJust (HM.lookup key ogot)
          in when (vexp /= vgot) $ do
            put ""
            report ("Values are different for " ++ show key)
            put ("  Expected: " ++ show vexp)
            put ("  Found   : " ++ show vgot)
          
        putStrLn ""
        report "The field-level differences are:"
        debug_ exp vl


-- Rather than come up with a way to diff JSON here, rely on `jq` and the trusty
-- `diff` command. This is not written to be robust!
--
debug_ :: VLSpec -> VegaLite -> IO ()
debug_ spec vl = do
  let tostr = BL8.unpack . encode
  
  expected <- readProcess "jq" [] (tostr spec)
  got <- readProcess "jq" [] (tostr (fromVL vl))

  let f1 = "expected.json"
      f2 = "got.json"
      
  writeFile f1 expected
  writeFile f2 got
  
  let diffOpts = ["--minimal", f1, f2]
  (_, diff, _) <- readProcessWithExitCode "diff" diffOpts ""
  putStrLn diff
  
  forM_ [f1, f2] removeFile

-----

## Error Bars and Error Bands

 - [Error Bars showing Confidence Interval](#Error-Bars-showing-Confidence-Interval)
 - [Error Bars showing Standard Deviation](#Error-Bars-showing-Standard-Deviation)
 - [Line Chart with Confidence Interval Band](#Line-Chart-with-Confidence-Interval-Band)
 - [Scatterplot with Mean and Standard Deviation Overlay](#Scatterplot-with-Mean-and-Standard-Deviation-Overlay)

---

### Error Bars showing Confidence Interval

From https://vega.github.io/vega-lite/examples/layer_point_errorbar_ci.html

In [5]:
layerPointErrorbarCISpec = [aesonQQ|
{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "data": {"url": "data/barley.json"},
  "encoding": {"y": {"field": "variety", "type": "ordinal"}},
  "layer": [
    {
      "mark": {"type": "point", "filled": true},
      "encoding": {
        "x": {
          "aggregate": "mean",
          "field": "yield",
          "type": "quantitative",
          "scale": {"zero": false},
          "title": "Barley Yield"
        },
        "color": {"value": "black"}
      }
    },
    {
      "mark": {"type": "errorbar", "extent": "ci"},
      "encoding": {
        "x": {"field": "yield", "type": "quantitative", "title": "Barley Yield"}
      }
    }
  ]
}
|]

In [6]:
layerPointErrorbarCI =
    let dvals = dataFromUrl "data/barley.json" []
    
        enc = encoding (position Y [PName "variety", PmType Ordinal] [])
    
        enc1 = encoding
                 . position X [ PName "yield", PmType Quantitative, PAggregate Mean
                              , PTitle "Barley Yield", PScale [SZero False] ]
                 . color [MString "black"]
                 
        lyr1 = [mark Point [MFilled True], enc1 []]
        
        enc2 = encoding (position X [PName "yield", PmType Quantitative, PTitle "Barley Yield"] [])
        
        lyr2 = [mark ErrorBar [MExtent ConfidenceInterval], enc2]
        
        layers = map asSpec [lyr1, lyr2]
        
    in toVegaLite [dvals, enc, layer layers]

vlShow layerPointErrorbarCI

In [7]:
validate layerPointErrorbarCISpec layerPointErrorbarCI

Okay

Return to the [Table of Contents](#Table-of-Contents).

### Error Bars showing Standard Deviation

From https://vega.github.io/vega-lite/examples/layer_point_errorbar_stdev.html

In [8]:
layerPointErrorbarStdevSpec = [aesonQQ|
{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "data": {"url": "data/barley.json"},
  "encoding": {"y": {"field": "variety", "type": "ordinal"}},
  "layer": [
    {
      "mark": {"type": "point", "filled": true},
      "encoding": {
        "x": {
          "aggregate": "mean",
          "field": "yield",
          "type": "quantitative",
          "scale": {"zero": false},
          "title": "Barley Yield"
        },
        "color": {"value": "black"}
      }
    },
    {
      "mark": {"type": "errorbar", "extent": "stdev"},
      "encoding": {
        "x": {"field": "yield", "type": "quantitative", "title": "Barley Yield"}
      }
    }
  ]
}
|]

In [9]:
layerPointErrorbarStdev =
    let dvals = dataFromUrl "data/barley.json" []
    
        enc = encoding (position Y [PName "variety", PmType Ordinal] [])
        enc1 = encoding
                 . position X [ PName "yield", PmType Quantitative
                              , PAggregate Mean, PScale [SZero False], PTitle "Barley Yield" ]
                 . color [MString "black"]
        enc2 = encoding (position X [PName "yield", PmType Quantitative, PTitle "Barley Yield"] [])
        
        lyr1 = [mark Point [MFilled True], enc1 []]
        lyr2 = [mark ErrorBar [MExtent StdDev], enc2]
        
        layers = map asSpec [lyr1, lyr2]
        
    in toVegaLite [dvals, enc, layer layers]

vlShow layerPointErrorbarStdev

In [10]:
validate layerPointErrorbarStdevSpec layerPointErrorbarStdev

Okay
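
To make the `stdev` extent a little more concrete, here is a minimal sketch in plain Haskell, independent of `hvega`. It assumes the error bar spans one sample standard deviation either side of the mean, which is my reading of the Vega-Lite documentation rather than something this notebook checks. The helper names (`mean`, `sampleStdDev`, `stdevExtent`) are hypothetical and not part of any library.

```haskell
-- Mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Sample standard deviation (divides by n - 1), so it needs at
-- least two values. Using the sample rather than the population
-- form is an assumption about what Vega-Lite computes.
sampleStdDev :: [Double] -> Double
sampleStdDev xs =
  let m  = mean xs
      n  = fromIntegral (length xs)
      sq = [d * d | x <- xs, let d = x - m]
  in sqrt (sum sq / (n - 1))

-- The interval an error bar with a "stdev" extent is assumed to
-- cover: one standard deviation either side of the mean.
stdevExtent :: [Double] -> (Double, Double)
stdevExtent xs =
  let m = mean xs
      s = sampleStdDev xs
  in (m - s, m + s)
```

Applying `stdevExtent` to the `yield` values of a single barley `variety` should give roughly the end points of the bar drawn for that variety, if the assumptions above hold.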

Return to the [Table of Contents](#Table-of-Contents).

### Line Chart with Confidence Interval Band

From https://vega.github.io/vega-lite/examples/layer_line_errorband_ci.html

In [11]:
layerLineErrorbandCISpec = [aesonQQ|
{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "data": {"url": "data/cars.json"},
  "encoding": {
    "x": {
      "field": "Year",
      "type": "temporal",
      "timeUnit": "year"
    }
  },
  "layer": [
    {
      "mark": {"type": "errorband", "extent": "ci"},
      "encoding": {
        "y": {
          "field": "Miles_per_Gallon",
          "type": "quantitative",
          "title": "Mean of Miles per Gallon (95% CIs)"
        }
      }
    },
    {
      "mark": "line",
      "encoding": {
        "y": {
          "aggregate": "mean",
          "field": "Miles_per_Gallon",
          "type": "quantitative"
        }
      }
    }
  ]
}
|]

In [12]:
layerLineErrorbandCI =
    let dvals = dataFromUrl "data/cars.json" []
    
        toEnc channel opts = encoding (position channel opts [])
        
        ytitle = "Mean of Miles per Gallon (95% CIs)"
        enc = toEnc X [PName "Year", PmType Temporal, PTimeUnit Year]
        enc1 = toEnc Y [PName "Miles_per_Gallon", PmType Quantitative, PTitle ytitle]
        enc2 = toEnc Y [PName "Miles_per_Gallon", PmType Quantitative, PAggregate Mean]
        
        lyr1 = [mark ErrorBand [MExtent ConfidenceInterval], enc1]
        lyr2 = [mark Line [], enc2]
        
        layers = map asSpec [lyr1, lyr2]
        
    in toVegaLite [dvals, enc, layer layers]

vlShow layerLineErrorbandCI

In [13]:
validate layerLineErrorbandCISpec layerLineErrorbandCI

Okay

Return to the [Table of Contents](#Table-of-Contents).

### Scatterplot with Mean and Standard Deviation Overlay

From https://vega.github.io/vega-lite/examples/layer_scatter_errorband_1D_stdev_global_mean.html

In [14]:
layerScatterErrorband1DStdevGlobalMeanSpec = [aesonQQ|
{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "description": "A scatterplot showing horsepower and miles per gallons for various cars.",
  "data": {"url": "data/cars.json"},
  "layer": [
    {
      "mark": "point",
      "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Miles_per_Gallon", "type": "quantitative"}
      }
    },
    {
      "mark": {"type": "errorband", "extent": "stdev", "opacity": 0.2},
      "encoding": {
        "y": {
          "field": "Miles_per_Gallon",
          "type": "quantitative",
          "title": "Miles per Gallon"
        }
      }
    },
    {
      "mark": "rule",
      "encoding": {
        "y": {
          "field": "Miles_per_Gallon",
          "type": "quantitative",
          "aggregate": "mean"
        }
      }
    }
  ]
}
|]

In [15]:
layerScatterErrorband1DStdevGlobalMean =
    let label = description "A scatterplot showing horsepower and miles per gallons for various cars."
        dvals = dataFromUrl "data/cars.json" []
    
        posX = position X [PName "Horsepower", PmType Quantitative]
        posY opts = position Y ([PName "Miles_per_Gallon", PmType Quantitative] ++ opts)
    
        lyr1 = [mark Point [], encoding (posX (posY [] []))]
        lyr2 = [mark ErrorBand [MExtent StdDev, MOpacity 0.2], encoding (posY [PTitle "Miles per Gallon"] [])]
        lyr3 = [mark Rule [], encoding (posY [PAggregate Mean] [])]
    
        layers = map asSpec [lyr1, lyr2, lyr3]

    in toVegaLite [label, dvals, layer layers]

vlShow layerScatterErrorband1DStdevGlobalMean

In [16]:
validate layerScatterErrorband1DStdevGlobalMeanSpec layerScatterErrorband1DStdevGlobalMean

Okay

Return to the [Table of Contents](#Table-of-Contents).

-----

## Box Plots

 - [Box Plot with Min/Max Whiskers](#Box-Plot-with-Min%2FMax-Whiskers)
 - [Tukey Box Plot (1.5 IQR)](#Tukey-Box-Plot-%281.5-IQR%29)

---

### Box Plot with Min/Max Whiskers

From https://vega.github.io/vega-lite/examples/boxplot_minmax_2D_vertical.html

In [17]:
boxplotMinmax2DVerticalSpec = [aesonQQ|
{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "description": "A vertical 2D box plot showing median, min, and max in the US population distribution of age groups in 2000.",
  "data": {"url": "data/population.json"},
  "mark": {
    "type": "boxplot",
    "extent": "min-max"
  },
  "encoding": {
    "x": {"field": "age", "type": "ordinal"},
    "y": {
      "field": "people",
      "type": "quantitative",
      "axis": {"title": "population"}
    }
  }
}
|]

In [18]:
boxplotMinmax2DVertical =
    let label = description "A vertical 2D box plot showing median, min, and max in the US population distribution of age groups in 2000."
        dvals = dataFromUrl "data/population.json" []
        
        markOpts = mark Boxplot [MExtent ExRange]
        enc = encoding
                . position X [PName "age", PmType Ordinal]
                . position Y [PName "people", PmType Quantitative, PAxis [AxTitle "population"]]
        
    in toVegaLite [label, dvals, markOpts, enc []]

vlShow boxplotMinmax2DVertical

In [19]:
validate boxplotMinmax2DVerticalSpec boxplotMinmax2DVertical

Okay

Return to the [Table of Contents](#Table-of-Contents).

### Tukey Box Plot (1.5 IQR)

From https://vega.github.io/vega-lite/examples/boxplot_2D_vertical.html

In [20]:
boxplot2DVerticalSpec = [aesonQQ|
{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "description": "A vertical 2D box plot showing median, min, and max in the US population distribution of age groups in 2000.",
  "data": {"url": "data/population.json"},
  "mark": {
    "type": "boxplot",
    "extent": 1.5
  },
  "encoding": {
    "x": {"field": "age", "type": "ordinal"},
    "y": {
      "field": "people",
      "type": "quantitative",
      "axis": {"title": "population"}
    },
    "size": {"value": 5}
  }
}
|]

In [21]:
boxplot2DVertical =
    let label = description "A vertical 2D box plot showing median, min, and max in the US population distribution of age groups in 2000."
        dvals = dataFromUrl "data/population.json" []
        
        markOpts = mark Boxplot [MExtent (IqrScale 1.5)]
        enc = encoding
                . position X [PName "age", PmType Ordinal]
                . position Y [PName "people", PmType Quantitative, PAxis [AxTitle "population"]]
                . size [MNumber 5]
        
    in toVegaLite [label, dvals, markOpts, enc []]

vlShow boxplot2DVertical

In [22]:
validate boxplot2DVerticalSpec boxplot2DVertical

Okay
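
As a rough illustration of what `IqrScale 1.5` asks for, the sketch below computes whisker end points in plain Haskell, without `hvega`. It assumes the whiskers reach the smallest and largest values inside `[Q1 - k * IQR, Q3 + k * IQR]`, and it picks linear interpolation for the quartiles; Vega-Lite may well use a different quantile rule, so the numbers need not match the plot exactly. The names `quantileSorted` and `whiskers` are hypothetical.

```haskell
import Data.List (sort)

-- Quantile by linear interpolation between neighbouring values of
-- an already sorted, non-empty list (p between 0 and 1). The choice
-- of interpolation rule is an assumption.
quantileSorted :: Double -> [Double] -> Double
quantileSorted p sorted =
  let n    = length sorted
      pos  = p * fromIntegral (n - 1)
      lo   = floor pos :: Int
      hi   = min (n - 1) (lo + 1)
      frac = pos - fromIntegral lo
  in (sorted !! lo) * (1 - frac) + (sorted !! hi) * frac

-- Whisker ends for a non-empty data set and IQR multiple k
-- (k = 1.5 for a Tukey box plot). Values outside the fences
-- would be drawn as outliers.
whiskers :: Double -> [Double] -> (Double, Double)
whiskers k xs =
  let sorted = sort xs
      q1     = quantileSorted 0.25 sorted
      q3     = quantileSorted 0.75 sorted
      iqr    = q3 - q1
      inside = [x | x <- sorted, x >= q1 - k * iqr, x <= q3 + k * iqr]
  in case inside of
       [] -> (q1, q3)  -- nothing inside the fences: fall back to the box edges
       _  -> (head inside, last inside)
```

For example, `whiskers 1.5 [1, 2, 3, 4, 5, 6, 7, 8, 100]` evaluates to `(1.0, 8.0)`: the quartiles are 3 and 7, the fences are -3 and 13, and 100 falls outside them.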

Return to the [Table of Contents](#Table-of-Contents).