# An Alternate Universe

In an alternate universe there exists a language named Clojure that makes programming complex things enjoyable and fun.  It has a very strong functional background and lots of great libraries and some intelligent users.  Here it enjoys the popularity it deserves and has all the great things written for you.


In this universe, we can do basic statistical analysis, and we can easily write extensions to the system we are working with in the language we like.  Those extensions run instantly and create joy.  Let's take a short walk through that universe.

## A Short Digression: LISP

Clojure is a dialect of the LISP family of languages, and this dialect runs on the JVM.  Being a LISP means that it is homoiconic: programs in Clojure are expressed in the datastructures of Clojure.  Many great minds have worked to design this modern LISP and it has an extremely well thought out foundation.

LISP as a language was designed by John McCarthy in 1958.  You can find some interesting information about it [here](http://www.paulgraham.com/rootsoflisp.html).  It was the first language with garbage collection and, assuming you do not believe that Fortran is a high-level language, also the first high-level language.

Let's take a second to familiarize ourselves with things here.

In [None]:
;; The language is described using its datastructures...

;; This is a list:

(list 1 2 3)

In [None]:
;; This is also a list

'(1 2 3)

In [None]:
;; Lists can be executed.  If Clojure finds a list that isn't quoted
;; ('escaped'), it will ask the compiler to output code to execute the expression.

(+ 1 2 3)
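Because code is written in the same datastructures, a quoted list can be inspected like any other data and only later handed to `eval`.  A small sketch using nothing but `clojure.core` (the names `expr` and `product-expr` are just for illustration):

```clojure
;; A quoted list is plain data: nothing is executed yet.
(def expr '(+ 2 3 4))

(first expr)   ;; the symbol +
(rest expr)    ;; the list (2 3 4)

;; Build a new expression by swapping the operator, then evaluate both.
(def product-expr (cons '* (rest expr)))

(eval expr)          ;; => 9
(eval product-expr)  ;; => 24
```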

In [None]:
;; Vectors in clojure are functional datastructures
;; that have very good random access semantics.

[1 2 3 4]

In [None]:
;; They can work like functions

([4 1 3 2] 0)

In [None]:
;; Keywords are global constants, like names, and they start with a ':'

:one

In [None]:
;; Maps (functional equivalent of a python dictionary) are curly braced:

{:a 1 :b 2}

In [None]:
;; You can find things in maps in lots of ways, but maps with keyword keys are
;; more convenient than other maps: a keyword can be called as a function
;; that looks itself up in the map.

;; Retrieving the value stored under key "one"
(println (get {"one" 1 "two" 2} "one"))

;; Retrieving the value stored under the keyword key :one, we can use the
;; function syntax.
(:one {:one "hi" :two "bye"})
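To make the lookup options concrete, here is a short sketch with a made-up map called `ride`.  It only uses `clojure.core`, and the keys are invented for the example:

```clojure
(def ride {:name "morning ride" :distance-km 42})

;; Three equivalent lookups for a keyword key.
(get ride :name)   ;; => "morning ride"
(:name ride)       ;; keyword used as a function
(ride :name)       ;; map used as a function

;; A missing key yields nil unless a default is supplied.
(:power ride)      ;; => nil
(:power ride 0)    ;; => 0

;; String keys work with get (or with the map itself),
;; but a string cannot be called as a function.
(get {"one" 1} "one")  ;; => 1
```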

In [None]:
;; If you add something to a vector or a map, you get a new vector or map
;; that shares structure with the previous item.  This means the previous
;; item is unchanged.
;; (Despite its name, first-list below is a vector.)

(def first-list [1 2 3 4])

(println (conj first-list 5))
(println (assoc first-list 0 6))
first-list
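The same holds for maps.  In this sketch (the names are illustrative) every operation returns a new map and the original keeps its two entries:

```clojure
(def first-map {:a 1 :b 2})

(def bigger  (assoc first-map :c 3))     ;; add a key
(def smaller (dissoc first-map :a))      ;; remove a key
(def bumped  (update first-map :a inc))  ;; change a value

;; The original is untouched by all three operations.
[first-map bigger smaller bumped]
;; => [{:a 1, :b 2} {:a 1, :b 2, :c 3} {:b 2} {:a 2, :b 2}]
```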

#### And Off We Go!

In the interests of time, we have to move on.  I hope you liked our small intro and that it makes the next section a little less mysterious.

## We Get The Things

In Clojure, dependencies are expressed by the program rather than by the environment.  This means that you don't install things into your environment to run them.  Instead, you include the dependency in your project.
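Outside a notebook, that usually means listing the libraries in a project file.  As a rough sketch, a Leiningen `project.clj` could look like the following; the project name and the Clojure version are assumptions made for the example, while the two library coordinates are the ones used in this notebook:

```clojure
;; project.clj (hypothetical project name and Clojure version)
(defproject ride-analysis "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.0"]
                 [techascent/tech.ml "0.27"]
                 [metasoarous/oz "1.6.0-alpha2"]])
```

In the notebook we do the equivalent at runtime by adding the dependencies from inside the session.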

In [None]:
;;Grab the big things we will be using.
(require '[clojupyter.misc.helper :as helper])
;;these are order dependent because oz indirectly includes jna 3.2.2 and 
;; tech.jna requires jna 5.0.0
(helper/add-dependencies '[techascent/tech.ml "0.27"])
(helper/add-dependencies '[metasoarous/oz "1.6.0-alpha2"])
(helper/add-dependencies '[cnuernber/garmin-fit-clj "0.1"])
(require '[cemerick.pomegranate :as pg])

;;We need the fit jar to make this all work.
(pg/add-classpath "resources/fit.jar")

;;You didn't want to see that output anyway, trust me...
:ok

In [None]:
;; I wrote a library for you guys!!
(require '[cnuernber.garmin-fit :as fit])


(require '[tech.ml.dataset :as ds])
(require '[tech.ml.dataset.column :as ds-col])
(require '[tech.ml.dataset.etl :as etl])
(require '[tech.compute.tensor.functional :as tens-fun])
(require '[clojure.core.matrix :as m])
(require '[clojure.set :as c-set])
(require '[clojure.pprint :as pp])
(require '[oz.notebook.clojupyter :as oz])
(import '[java.time Duration])


## Sequences

We load the data.  This produces a datastructure called a sequence.  This is similar to a singly-linked list but it isn't constructed of nodes.  It is a functional datastructure, however, and we can manipulate it in various ways.
The file is a sequence of maps and the `:event-type` member of each map tells you what you are looking at.  One thing to note is that sequences are lazy so this doesn't load the entire file.  It loads only what you are looking at.

In [None]:
(def test-fname "data/activities/81623728.fit.gz")

(pp/pprint (take 6 (fit/decode test-fname)))
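Laziness is easy to see without the FIT file.  This sketch uses only `clojure.core`; the atom `calls` is just a counter added for the demonstration:

```clojure
;; (range) with no arguments is infinite; mapping over it is lazy too.
(def squares (map #(* % %) (range)))

;; Only the elements we ask for are computed.
(take 6 squares)  ;; => (0 1 4 9 16 25)

;; A side effect makes the laziness visible.
(def calls (atom 0))
(def counted (map (fn [x] (swap! calls inc) x) (range 1000)))

@calls            ;; => 0, nothing has been computed yet
(first counted)   ;; => 0
@calls            ;; far fewer than 1000: only a small batch was realized
```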

## Functions

We now define our first function.  It takes a filename and produces a dataset consisting of records of a given type.  The dataset system knows how to take a sequence of maps (what we produced above) and translate those maps into a column-major dataset object that stores the column data efficiently.  We do have to make sure the values in each map are scalars, which they logically should be for this dataset, so any sequential value is replaced by its first element.

In [None]:
(defn fit-file->dataset
  [fname]
  (ds/->dataset
   (->> (fit/decode fname)
        (filter #(= :record-message (:event-type %)))
        (map (fn [record-data]
               (->> (dissoc record-data :event-type)
                    (map (fn [[k v]]
                           [k (if (sequential? v)
                                (first v)
                                v)]))
                    (into {})))))
   {:table-name fname}))


;; Use the filename we defined and decoded above.
(defonce test-ds (fit-file->dataset test-fname))


(ds/select test-ds :all (range 5))
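The cleaning step inside `fit-file->dataset` can be tried on its own, without the dataset library.  In this sketch the `messages` are hand-written stand-ins for decoded records (the field values are invented) and `record->row` is a hypothetical name for the inner function:

```clojure
;; Hand-written stand-ins for decoded messages; real files have more fields.
(def messages
  [{:event-type :file-id-message :manufacturer "garmin"}
   {:event-type :record-message :timestamp 100 :speed [4.2] :power 180}
   {:event-type :record-message :timestamp 101 :speed [4.4] :power 195}])

(defn record->row
  "Drop :event-type and replace any sequential value with its first element."
  [record]
  (into {}
        (map (fn [[k v]] [k (if (sequential? v) (first v) v)]))
        (dissoc record :event-type)))

;; Keep only the record messages, then flatten each one into a row.
(def rows
  (->> messages
       (filter #(= :record-message (:event-type %)))
       (map record->row)))

rows
;; => ({:timestamp 100, :speed 4.2, :power 180}
;;     {:timestamp 101, :speed 4.4, :power 195})
```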

## Moar Functions!!

1. We have to do a bit of dataset processing.  For this dataset we choose to just drop any rows with missing data.
2. Next we have a function to find the overall duration of the dataset, which is expressed as `(- (max :timestamp) (min :timestamp))`.
3. Finally, some work to convert from semicircular coordinates to lat-long and to build a pipeline of sequential processing commands.
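The arithmetic behind steps 2 and 3 is small enough to try by hand.  This sketch uses the same conversion factor as the code that follows; `semicircles->degrees`, `timestamps` and `ride-duration` are illustrative names, and the timestamps are made up:

```clojure
(import '[java.time Duration])

;; 2^31 semicircles correspond to 180 degrees.
(def semi->deg (/ 180.0 (Math/pow 2 31)))

(defn semicircles->degrees [semicircles]
  (* semicircles semi->deg))

;; 2^29 semicircles is a quarter of 2^31, so a quarter of 180 degrees.
(semicircles->degrees 536870912)  ;; => 45.0

;; Duration as max timestamp minus min timestamp, in seconds.
(def timestamps [1000 1001 1002 4600])

(def ride-duration
  (Duration/ofSeconds (- (apply max timestamps) (apply min timestamps))))

(str ride-duration)  ;; => "PT1H"
```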

In [None]:
(defn drop-missing
  [dataset]
  (let [missing-indexes (->> (ds/columns-with-missing-seq dataset)
                             (mapcat (fn [{:keys [column-name]}]
                                       (-> (ds/column dataset column-name)
                                           ds-col/missing)))
                             set)]
    (ds/select dataset :all  (->> (range (second (m/shape dataset)))
                                  (remove missing-indexes)))))


(defn ds-duration
  "Duration in seconds of the entire ride."
  [dataset]
  (let [{act-min :min
         act-max :max}
        (-> (ds/column dataset :timestamp)
            (ds-col/stats [:min :max]))]
    (Duration/ofSeconds
     (- (long act-max) (long act-min)))))

(def semi->deg
  (/ 180.0
     (Math/pow 2 31)))


(def lat-lon [:position-lat :position-long] )


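(def load-pipeline
  ;; 8.381903171539307E-8 is the value of semi->deg above (180 / 2^31);
  ;; it is written out because the expression is quoted.
  [['m= lat-lon '(* (col) 8.381903171539307E-8)]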
   '[m= :altitude-norm (/ (- (col :altitude) (min (col :altitude)))
                          (- (max (col :altitude)) (min (col :altitude))))]
   '[m= :speed-mph (* (col :speed) 2.23694)]])


(defn run-pipeline
  [dataset & {:keys [target] :as options}]
  (-> (drop-missing dataset)
      (etl/apply-pipeline load-pipeline options)
      :dataset
      ;;Make handling the data in the browser sane.
      (#(ds/ds-take-nth 10 %))
      (etl/apply-pipeline '[[m= :speed-avg (rolling 20 :mean (col :speed-mph))]
                            [m= :power-avg (rolling 20 :mean (col :power))]
                            [m= :cadence-avg (rolling 20 :mean (col :cadence))]
                            [m= :minutes-from-start (/ (- (col :timestamp)
                                                          (min (col :timestamp)))
                                                       60)]]
                          {})))
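For intuition about what the pipeline expressions compute, here are plain-Clojure sketches of min-max normalization and a rolling mean.  They are not the library's implementation: `normalize` and `rolling-mean` are hypothetical helpers, and the library may treat the window edges differently from the full-window approach assumed here.

```clojure
(defn normalize
  "Min-max scaling: the smallest value maps to 0.0 and the largest to 1.0."
  [xs]
  (let [lo (apply min xs)
        hi (apply max xs)]
    (mapv #(/ (- % lo) (double (- hi lo))) xs)))

(defn rolling-mean
  "Mean over each full window of n consecutive values (an assumption;
  the library may handle the edges differently)."
  [n xs]
  (->> (partition n 1 xs)
       (mapv #(/ (reduce + %) (double n)))))

(normalize [100 150 200])       ;; => [0.0 0.5 1.0]
(rolling-mean 2 [10 20 40 40])  ;; => [15.0 30.0 40.0]
```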



## Run The Functions!!

We now process the dataset and produce a new dataset.  Dataset processing is functional, so processing a column produces a new column.  Processing a dataset produces... well, a few things.  One of them is a new dataset.

Another is a new pipeline with the calculated intermediate values embedded in the datastructure.  So you can use the produced pipeline in production without having to reengineer it.

In [None]:
(def processed-pipeline (run-pipeline test-ds))

(def processed-ds (:dataset processed-pipeline))

(ds/select processed-ds :all (range 5))

## That Was Great!  Now What?

We take our dataset object and convert it back into a sequence of maps.  Sequences of maps are something that Clojure's core algorithm facilities handle robustly, so whenever we can we would like to be speaking in that language.

In [None]:
(def all-the-data (-> (ds/select processed-ds
                                 (concat lat-lon
                                         [:timestamp :altitude :power :speed-mph
                                          :minutes-from-start
                                          :cadence :power-avg :speed-avg :cadence-avg])
                                 :all)
                      (ds/->flyweight)))

(pp/pprint (first all-the-data))
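Conceptually, this conversion turns columns back into rows.  A plain-Clojure sketch of the idea on a toy table follows; `columns->rows` is a hypothetical helper and says nothing about how the library does it:

```clojure
;; A toy column-major table: one vector per column.
(def columns {:timestamp [100 101 102]
              :power     [180 195 210]})

(defn columns->rows
  "Lazily turn a map of equal-length columns into a sequence of row maps."
  [cols]
  (let [ks (keys cols)]
    (apply map (fn [& vs] (zipmap ks vs)) (vals cols))))

(columns->rows columns)
;; => ({:timestamp 100, :power 180}
;;     {:timestamp 101, :power 195}
;;     {:timestamp 102, :power 210})

;; Once the data is a sequence of maps, core functions apply directly.
(->> (columns->rows columns)
     (filter #(> (:power %) 190))
     (map :timestamp))
;; => (101 102)
```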

At this point we calculate some intermediate values which we will use when visualizing the data.

In [None]:
(def timestamp-data (ds-col/stats (ds/column processed-ds :timestamp)
                                  [:min :max]))

(def minutes-range (ds-col/stats (ds/column processed-ds :minutes-from-start)
                                  [:min :max]))

(def altitude-data (ds-col/stats (ds/column processed-ds
                                            :altitude)
                                 [:min :max]))

(def latitude-range (mapv (ds-col/stats (ds/column processed-ds :position-lat)
                                        [:min :max])
                          [:min :max]))

(def longitude-range (mapv (ds-col/stats (ds/column processed-ds :position-long)
                                         [:min :max])
                           [:min :max]))

(pp/pprint [timestamp-data minutes-range altitude-data])
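The `mapv` calls above rely on a map being a function of its keys.  A small sketch with a hypothetical `min-max` helper and invented altitude values shows the trick in isolation:

```clojure
(defn min-max
  "Hypothetical helper: min and max of a collection, returned as a map."
  [xs]
  {:min (apply min xs) :max (apply max xs)})

(def altitude-stats (min-max [312.0 298.5 330.2]))

altitude-stats                     ;; => {:min 298.5, :max 330.2}

;; Mapping the map over [:min :max] pulls the values out in that order.
(mapv altitude-stats [:min :max])  ;; => [298.5 330.2]
```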

## Off The Charts!

We take our processed dataset and our sequence of maps and output something called hiccup.  Hiccup really is HTML but encoded in Clojure datastructures and keywords.  So you have a vector intermixed with more vectors and keywords.  Some strings.  And a few maps.

That is it, however.  That really is the absolute core of clojure.  A datastructure.  And powerful tools to transform from one datastructure to another datastructure.  On top of this we can actually build anything.
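To show how little machinery that takes, here is a toy renderer that turns a hiccup vector into an HTML string.  It is a sketch, not the hiccup library: `hiccup->html` is a hypothetical function, it does no escaping, and it always emits a closing tag.

```clojure
(require '[clojure.string :as str])

(defn hiccup->html
  "Toy renderer: [:tag {attrs}? & children] becomes an HTML string."
  [node]
  (if (vector? node)
    (let [[tag & more] node
          ;; An optional attribute map may follow the tag.
          [attrs children] (if (map? (first more))
                             [(first more) (rest more)]
                             [{} more])
          attr-str (str/join (for [[k v] attrs]
                               (str " " (name k) "=\"" v "\"")))]
      (str "<" (name tag) attr-str ">"
           (str/join (map hiccup->html children))
           "</" (name tag) ">"))
    ;; Anything that is not a vector is rendered as text.
    (str node)))

(hiccup->html [:div [:h2 "Behold"] [:p {:class "note"} "A ride"]])
;; => "<div><h2>Behold</h2><p class=\"note\">A ride</p></div>"
```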

But I Digress...


### Vega and Vega-lite


Now we get into Vega-lite.  This is a well-documented library that makes building graphs and charts using the Vega visual language easier.  Vega is built on top of d3 and takes inspiration from libraries such as R's ggplot2. For more information on Vega-lite, go [here](https://vega.github.io/vega-lite/).

We show all the features here because we are running out of time. The most important takeaway is that, regardless of language, you speak to Vega (and Vega-lite) in terms of datastructures.

In [None]:
(defn duration->str
  [^Duration dur]
  (.toString dur))

(def chart-width 600)
(def chart-height 150)


(def view-ds
  [:div
   [:h2 (format "Behold - %s - %s"
                (ds/dataset-name processed-ds)
                (duration->str (ds-duration processed-ds)))]
   [:h3 "Dashboard"]
   [:vega-lite {:data {:values all-the-data}
                :vconcat [{:projection {:type :albersUsa}
                           :width chart-width
                           :height chart-height
                           :mark :circle
                           :encoding {:latitude {:field (first lat-lon)
                                                 :type :quantitative}
                                      :longitude {:field (second lat-lon)
                                                  :type :quantitative}
                                      :color {:condition {:selection :times
                                                          :field :altitude
                                                          :type :quantitative
                                                          :scale {:range [:darkblue :lightblue]}}
                                               :value :lightgreen}}}
                          {:hconcat
                           [{:width (/ chart-width 2)
                             :height chart-height
                             :mark :point
                             :selection {:times {:type :interval}}
                             :encoding
                             {:x {:field :minutes-from-start
                                  :type :quantitative
                                  :scale {:domain
                                          [(:min minutes-range)
                                           (:max minutes-range)]}}
                              :y {:field :altitude
                                  :type :quantitative
                                  :scale {:domain [(:min altitude-data)
                                                   (:max altitude-data)]}}
                              :color {:field :altitude
                                      :type :quantitative
                                      :scale {:range [:darkblue :lightblue]}}}}
                            {:layer [{:width (/ chart-width 2)
                                      :height chart-height
                                      :mark :point
                                      :selection {:times {:type :interval}}
                                      :encoding {:x {:field :minutes-from-start
                                                     :type :quantitative
                                                     :scale {:domain
                                                             [(:min minutes-range)
                                                              (:max minutes-range)]}}
                                                 :y {:field :speed-mph
                                                     :type :quantitative}}}
                                     {:mark {:type :line
                                             :color :yellow}
                                      :encoding {:x {:field :minutes-from-start
                                                     :type :quantitative
                                                     :scale {:domain
                                                             [(:min minutes-range)
                                                              (:max minutes-range)]}}
                                                 :y {:field :speed-avg
                                                     :type :quantitative}}}]}]}
                          {:hconcat
                           [{:layer [{:width (/ chart-width 2)
                                      :height chart-height
                                      :mark :point
                                      :selection {:times {:type :interval}}
                                      :encoding {:x {:field :minutes-from-start
                                                     :type :quantitative
                                                     :scale {:domain
                                                             [(:min minutes-range)
                                                              (:max minutes-range)]}}
                                                 :y {:field :power
                                                     :type :quantitative}}}
                                     {:width (/ chart-width 2)
                                      :height chart-height
                                      :mark {:type :line
                                             :color :yellow}
                                      :encoding {:x {:field :minutes-from-start
                                                     :type :quantitative
                                                     :scale {:domain
                                                             [(:min minutes-range)
                                                              (:max minutes-range)]}}
                                                 :y {:field :power-avg
                                                     :type :quantitative}}}]}
                            {:layer [{:width (/ chart-width 2)
                                      :height chart-height
                                      :mark :point
                                      :selection {:times {:type :interval}}
                                      :encoding {:x {:field :minutes-from-start
                                                     :type :quantitative
                                                     :scale {:domain
                                                             [(:min minutes-range)
                                                              (:max minutes-range)]}}
                                                 :y {:field :cadence
                                                     :type :quantitative}}}
                                     {:width (/ chart-width 2)
                                      :height chart-height
                                      :mark {:type :line
                                             :color :yellow}
                                      :encoding {:x {:field :minutes-from-start
                                                     :type :quantitative
                                                     :scale {:domain
                                                             [(:min minutes-range)
                                                              (:max minutes-range)]}}
                                                 :y {:field :cadence-avg
                                                     :type :quantitative}}}]}]}

                          ]}]])
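The time-series panels above all have the same shape, and since the spec is just data, ordinary functions can build it.  The following is one possible refactoring, offered as a sketch: `time-axis` and `raw-and-average` are hypothetical helpers that are not used elsewhere in this notebook, and the range passed in the example call is made up.

```clojure
(defn time-axis
  "x encoding shared by every time-series panel."
  [{lo :min hi :max}]
  {:field :minutes-from-start
   :type :quantitative
   :scale {:domain [lo hi]}})

(defn raw-and-average
  "Layered panel: raw points for raw-field plus a yellow line for avg-field."
  [minutes-range width height raw-field avg-field]
  (let [x (time-axis minutes-range)]
    {:layer [{:width width
              :height height
              :mark :point
              :selection {:times {:type :interval}}
              :encoding {:x x
                         :y {:field raw-field :type :quantitative}}}
             {:mark {:type :line :color :yellow}
              :encoding {:x x
                         :y {:field avg-field :type :quantitative}}}]}))

;; Example call with a made-up range of 0 to 90 minutes.
(def power-panel
  (raw-and-average {:min 0 :max 90} 300 150 :power :power-avg))

(get-in power-panel [:layer 1 :encoding :y :field])  ;; => :power-avg
```

With helpers like these, each panel of the dashboard would shrink to a single function call, at the cost of hiding the raw Vega-lite structure from the reader.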

In [None]:
(first view-ds)

In [None]:
(second view-ds)

## And It Is Done!

Viewing the charts is simple.  And check it out--interactivity works!!  Selecting data on any of the scatter/line charts below will change the display of the geographic chart.

In [None]:
(oz/view! view-ds)