In [1]:
(require '[clojupyter.javascript.alpha :as cjp-js])
(require '[clojupyter.display :as display])
(require '[clojupyter.misc.helper :as helper])
(require '[clojure.data.json :as json])
(helper/add-dependencies '[org.clojure/data.csv "1.0.0"])
(require '[clojure.data.csv :as csv])
(helper/add-dependencies '[metasoarous/oz "1.5.6"])
(require '[oz.notebook.clojupyter :as oz])
(require '[clojure.java.io :as io])
(require '[clojure.pprint :as pp])
(helper/add-dependencies '[clojure.java-time "0.3.2"])
(require '[java-time :as t])
(require '[clojure.edn :as edn])
(helper/add-dependencies '[panthera "0.1-alpha.13"])
(require '[libpython-clj.python :as py])
(require '[panthera.panthera :as pt])

nil

In [2]:
;; use panthera html display
(defn show
  [obj]
  (display/html
    (py/call-attr obj "to_html")))

(defn show-table
  [m]
  (-> m
      pt/data-frame
      show))

(show-table [{:a 1 :b 2} {:a 3 :b 4}])

Unnamed: 0,a,b
0,1,2
1,3,4


Okay! We're going back to our bike path dataset here. I live in Montreal, and I was curious about whether we're more of a commuter city or a biking-for-fun city -- do people bike more on weekends, or on weekdays?

# 4.1 Adding a 'weekday' column to our dataframe

First, we need to load up the data. We've done this before.

In [3]:
;; bikes = pd.read_csv('../data/bikes.csv', sep=';', encoding='latin1', parse_dates=['Date'], dayfirst=True, index_col='Date')

(def fixed-data
    (with-open [reader (io/reader "../data/bikes.csv" :encoding "ISO-8859-1")]
      (doall
        (csv/read-csv reader :separator \;))))

(defn blank->nil [s]
   (when-not (#{""} s) s))

(defn csv-data->maps [csv-data]
  (map zipmap
       (->> (first csv-data) ;; First row is the header
            (map keyword) ;; Drop if you want string keys instead
            repeat)
       (->> (rest csv-data)
            (map #(map blank->nil %))))) ;; Drop if you want blank strings to stay

(defn col-parser [col-key]
    (if (= :Date col-key) 
         (comp t/format (partial t/local-date "dd/MM/yyyy"))
         edn/read-string))

(def bikes
    (->> fixed-data
         csv-data->maps
         (map #(into {} (map (fn [[k v]] [k ((col-parser k) v)]) %))))) ;; Apply each parser to columns

(->> bikes
     (take 5)
     show-table)

Unnamed: 0,du Parc,Rachel1,Pierre-Dupuy,Berri 1,Maisonneuve 1,Brébeuf (données non disponibles),Date,Côte-Sainte-Catherine,St-Urbain (données non disponibles),Maisonneuve 2
0,26,16,10,35,38,,2012-01-01,0,,51
1,53,43,6,83,68,,2012-01-02,1,,153
2,89,58,3,135,104,,2012-01-03,2,,248
3,111,61,8,144,116,,2012-01-04,1,,318
4,97,95,13,197,124,,2012-01-05,2,,330
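
If you want to poke at the row-to-map conversion without the CSV file, here is a minimal sketch on a hypothetical two-row sample in the same layout. The names `sample-rows`, `parse-cell`, and `rows->maps` are illustrative and not part of the notebook, and the sketch calls `java.time` directly instead of going through the `java-time` wrapper, so it needs no extra dependencies.

```clojure
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter))

;; Hypothetical sample in the same shape csv/read-csv returns:
;; a vector of header strings followed by vectors of cell strings.
(def sample-rows
  [["Date" "Berri 1" "Rachel1"]
   ["01/01/2012" "35" "16"]
   ["02/01/2012" "83" ""]])

;; Day-first dates, matching dayfirst=True in the pandas call.
(def day-first (DateTimeFormatter/ofPattern "dd/MM/yyyy"))

(defn parse-cell
  "Parse one cell: blanks become nil, dates become ISO strings,
  everything else is read as an integer."
  [k v]
  (cond
    (= "" v)    nil
    (= :Date k) (str (LocalDate/parse v day-first))
    :else       (Long/parseLong v)))

(defn rows->maps
  "Pair each data row with the keywordized header."
  [[header & rows]]
  (let [ks (map keyword header)]
    (map (fn [row]
           (into {} (map (fn [k v] [k (parse-cell k v)]) ks row)))
         rows)))

(def sample-maps (rows->maps sample-rows))

(first sample-maps)
;; => {:Date "2012-01-01", :Berri 1 35, :Rachel1 16}
```

Note that a header like "Berri 1" becomes a keyword containing a space, which is why the notebook looks it up with `(keyword "Berri 1")` rather than a keyword literal.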


In [99]:
;; Python 
;; bikes['Berri 1'].plot()

(oz/view!
  {:data {:values bikes}
   :mark "line"
   :encoding {:x {:field :Date
                  :type "temporal"}
              :y {:field (keyword "Berri 1")
                  :type "quantitative"}}
   :width 800})

Next up, we're just going to look at the Berri bike path. Berri is a street in Montreal, with a pretty important bike path. I use it mostly on my way to the library now, but I used to take it to work sometimes when I worked in Old Montreal. 

So we're going to create a dataframe with just the Berri bike path in it:

In [34]:
;; Python
;; berri_bikes = bikes[['Berri 1']].copy()
;; berri_bikes[:5]

(def berri-bikes 
    (->> bikes
         (map #(select-keys % [:Date (keyword "Berri 1")]))))

(->> berri-bikes
     (take 5)
     show-table)

Unnamed: 0,Date,Berri 1
0,2012-01-01,35
1,2012-01-02,83
2,2012-01-03,135
3,2012-01-04,144
4,2012-01-05,197


If we wanted to get the day of the month for each row, we could do it like this:

In [35]:
;; Python
;; berri_bikes.index.day

(->> berri-bikes
     (map #(t/as (t/local-date (:Date %)) :day-of-month)))

(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1 2 3 4 5)

We actually want the weekday, though:

In [36]:
;; Python
;; berri_bikes.index.weekday

(->> berri-bikes
     (map #(t/as (t/local-date (:Date %)) :day-of-week)))

(7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1)

These are the days of the week in ISO-8601 numbering, where 1 is Monday and 7 is Sunday.
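
The numbering can be checked with plain `java.time` interop. This is a minimal sketch, and `iso-weekday` is a hypothetical helper name, not something the notebook defines. The last line shows how to line the result up with the pandas `weekday` attribute from the Python comment, which counts from 0 for Monday.

```clojure
(import '(java.time LocalDate))

(defn iso-weekday
  "Return the ISO-8601 weekday number (1 = Monday ... 7 = Sunday)
  for an ISO date string such as \"2012-01-01\"."
  [date-str]
  (.getValue (.getDayOfWeek (LocalDate/parse date-str))))

(iso-weekday "2012-01-01") ;; => 7, a Sunday
(iso-weekday "2012-01-02") ;; => 1, a Monday

;; pandas counts from 0 = Monday, so subtract 1 to compare with Python output.
(dec (iso-weekday "2012-01-01")) ;; => 6
```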

Now that we know how to *get* the weekday, we can add it as a column in our dataframe like this:

In [51]:
;; Python
;; berri_bikes.loc[:,'weekday'] = berri_bikes.index.weekday
;; berri_bikes[:5]

(def berri-bikes 
    (->> berri-bikes
         (map #(assoc % :week-day (t/as (t/local-date (:Date %)) :day-of-week)))))

(->> berri-bikes 
    (take 5)
    show-table)

Unnamed: 0,Date,Berri 1,week-day
0,2012-01-01,35,7
1,2012-01-02,83,1
2,2012-01-03,135,2
3,2012-01-04,144,3
4,2012-01-05,197,4
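
Since Clojure maps are immutable, "adding a column" really means building a new sequence of maps. A small sketch of that idea as a reusable function, where `add-column` and the sample `rows` are hypothetical names used only for illustration:

```clojure
(defn add-column
  "Return rows with key k set to (f row) on every row."
  [k f rows]
  (mapv #(assoc % k (f %)) rows))

;; Hypothetical rows, not taken from the dataset.
(def rows [{:count 35} {:count 83}])

(add-column :doubled #(* 2 (:count %)) rows)
;; => [{:count 35, :doubled 70} {:count 83, :doubled 166}]

;; The original rows are untouched.
rows
;; => [{:count 35} {:count 83}]
```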


# 4.2 Adding up the cyclists by weekday

This turns out to be really easy!

Clojure has a `group-by` function that is similar to SQL's `GROUP BY`, if you're familiar with that.

In pandas, `berri_bikes.groupby('weekday').aggregate(sum)` means "Group the rows by weekday and then add up all the values with the same weekday". Here we get the same result with `group-by` followed by `reduce +` over each group.
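
To see the two steps separately, here is a minimal sketch on three hypothetical rows (`rows`, `grouped`, and `totals` are illustrative names): Clojure's `group-by` collects the rows that share a key, and a reduction then sums each group.

```clojure
;; Hypothetical rows: two Mondays (1) and one Sunday (7).
(def rows
  [{:week-day 1 :count 10}
   {:week-day 7 :count 5}
   {:week-day 1 :count 20}])

;; group-by returns a map from each key to the vector of rows that share it.
(def grouped (group-by :week-day rows))
;; => {1 [{:week-day 1, :count 10} {:week-day 1, :count 20}]
;;     7 [{:week-day 7, :count 5}]}

;; Summing each group gives the per-weekday totals, sorted by weekday.
(def totals
  (into (sorted-map)
        (map (fn [[day group]] [day (reduce + (map :count group))]))
        grouped))
;; => {1 30, 7 5}
```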

In [124]:
;; Python
;; weekday_counts = berri_bikes.groupby('weekday').aggregate(sum)
;; weekday_counts

(def weekday-counts
    (->> berri-bikes
         (group-by :week-day)
         (into (sorted-map))
         (map (fn [[k v]] {:week-day k :sum (reduce + (map (keyword "Berri 1") v))}))))

(show-table weekday-counts)

Unnamed: 0,week-day,sum
0,1,134298
1,2,135305
2,3,152972
3,4,160131
4,5,141771
5,6,101578
6,7,99310


It's hard to remember what 1, 2, 3, 4, 5, 6, 7 mean, so we can fix it up and graph it:

In [125]:
;; Python
;; weekday_counts.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
;; weekday_counts

(def weekdays 
    (zipmap (iterate inc 1) ["Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday" "Sunday"]))

(def weekday-counts
    (->> weekday-counts
         (map #(update % :week-day (partial get weekdays)))))

(show-table weekday-counts)

Unnamed: 0,week-day,sum
0,Monday,134298
1,Tuesday,135305
2,Wednesday,152972
3,Thursday,160131
4,Friday,141771
5,Saturday,101578
6,Sunday,99310
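
As an alternative to a hand-written lookup table, the day names can also come from `java.time.DayOfWeek`. This is only a sketch of that option, with `weekday-name` as a hypothetical helper; it pins the locale to English so the output does not depend on the machine's settings.

```clojure
(import '(java.time DayOfWeek)
        '(java.time.format TextStyle)
        '(java.util Locale))

(defn weekday-name
  "English name for an ISO weekday number (1 = Monday ... 7 = Sunday)."
  [n]
  (.getDisplayName (DayOfWeek/of n) TextStyle/FULL Locale/ENGLISH))

(map weekday-name (range 1 8))
;; => ("Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday" "Sunday")
```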


In [129]:
;; Python 
;; weekday_counts.plot(kind='bar')

(oz/view!
  {:data {:values weekday-counts}
   :mark "bar"
   :encoding {:x {:field :week-day
                  :type "nominal"
                  :sort false}
              :y {:field :sum
                  :type "quantitative"}}
   :width 800})

So it looks like Montrealers are commuter cyclists -- they bike much more during the week. Neat!
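
To put a rough number on "much more", we can compare the average total for a weekday with the average total for a weekend day, using the totals from the table above. This is a back-of-the-envelope sketch: `totals`, `weekend?`, and `mean-total` are hypothetical names, and it assumes each day of the week appears about equally often in the dataset.

```clojure
;; Totals copied from the weekday-counts table.
(def totals
  {"Monday" 134298 "Tuesday" 135305 "Wednesday" 152972 "Thursday" 160131
   "Friday" 141771 "Saturday" 101578 "Sunday" 99310})

(def weekend? #{"Saturday" "Sunday"})

(defn mean-total
  "Average total over the day names selected by pred."
  [pred]
  (let [vs (for [[day n] totals :when (pred day)] n)]
    (/ (reduce + vs) (double (count vs)))))

(def weekday-mean (mean-total (complement weekend?))) ;; => 144895.4
(def weekend-mean (mean-total weekend?))              ;; => 100444.0

;; Ratio of an average weekday to an average weekend day, roughly 1.44.
(/ weekday-mean weekend-mean)
```

So a typical weekday sees roughly 44% more cyclists on Berri than a typical weekend day.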

# 4.3 Putting it together

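Let's put all that together, to prove how easy it is. The pandas version is 6 lines of magic; the Clojure version below is longer, mostly because it includes its own CSV parsing helpers.

If you want to play around, try changing `sum` to `max`, `numpy.median`, or any other function you like. In the Clojure version, that means replacing `(reduce + ...)` with, say, `(apply max ...)`.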
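
One way to make that kind of experiment easier is to pass the aggregation in as an argument. The following is a sketch under that assumption; `aggregate-by`, `median`, and the sample `rows` are hypothetical and not part of the notebook.

```clojure
(defn aggregate-by
  "Group rows by key-fn, then apply agg to each group's val-fn values."
  [key-fn val-fn agg rows]
  (into (sorted-map)
        (map (fn [[k group]] [k (agg (map val-fn group))]))
        (group-by key-fn rows)))

(defn median
  "Median of a non-empty collection of numbers."
  [xs]
  (let [sorted (vec (sort xs))
        n      (count sorted)
        mid    (quot n 2)]
    (if (odd? n)
      (sorted mid)
      (/ (+ (sorted (dec mid)) (sorted mid)) 2))))

;; Hypothetical rows, not taken from the dataset.
(def rows
  [{:week-day 1 :count 10} {:week-day 1 :count 30} {:week-day 1 :count 20}
   {:week-day 7 :count 5}  {:week-day 7 :count 7}])

(aggregate-by :week-day :count #(reduce + %) rows)  ;; => {1 60, 7 12}
(aggregate-by :week-day :count #(apply max %) rows) ;; => {1 30, 7 7}
(aggregate-by :week-day :count median rows)         ;; => {1 20, 7 6}
```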

In [132]:
;; Python
;; bikes = pd.read_csv('../data/bikes.csv', 
;;                     sep=';', encoding='latin1', 
;;                     parse_dates=['Date'], dayfirst=True, 
;;                     index_col='Date')
;; # Add the weekday column
;; berri_bikes = bikes[['Berri 1']].copy()
;; berri_bikes.loc[:,'weekday'] = berri_bikes.index.weekday

;; # Add up the number of cyclists by weekday, and plot!
;; weekday_counts = berri_bikes.groupby('weekday').aggregate(sum)
;; weekday_counts.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
;; weekday_counts.plot(kind='bar')

(def fixed-data
    (with-open [reader (io/reader "../data/bikes.csv" :encoding "ISO-8859-1")]
      (doall
        (csv/read-csv reader :separator \;))))

(defn blank->nil [s]
   (when-not (#{""} s) s))

(defn csv-data->maps [csv-data]
  (map zipmap
       (->> (first csv-data) ;; First row is the header
            (map keyword) ;; Drop if you want string keys instead
            repeat)
       (->> (rest csv-data)
            (map #(map blank->nil %))))) ;; Drop if you want blank strings to stay

(defn col-parser [col-key]
    (if (= :Date col-key) 
         (comp t/format (partial t/local-date "dd/MM/yyyy"))
         edn/read-string))

(def bikes
    (->> fixed-data
         csv-data->maps
         (map #(into {} (map (fn [[k v]] [k ((col-parser k) v)]) %))))) ;; Apply each parser to columns

(def weekdays 
    (zipmap (iterate inc 1) ["Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday" "Sunday"]))

(def weekday-counts
    (->> bikes
         (map #(select-keys % [:Date (keyword "Berri 1")])) ;; berri-bikes
         (map #(assoc % :week-day (t/as (t/local-date (:Date %)) :day-of-week))) ;; Add day-of-week
         (group-by :week-day) ;; Group by weekday
         (into (sorted-map)) ;; Sort by weekday
         (map (fn [[k v]] {:week-day k :sum (reduce + (map (keyword "Berri 1") v))})) ;; Sum the Berri 1 counts per weekday
         (map #(update % :week-day (partial get weekdays))))) ;; Replace the weekday number with its name

(oz/view!
  {:data {:values weekday-counts}
   :mark "bar"
   :encoding {:x {:field :week-day
                  :type "nominal"
                  :sort false}
              :y {:field :sum
                  :type "quantitative"}}
   :width 800})