# Investigate Eurostat's data about waste


In [3]:
; Add code libraries.

%classpath add mvn org.clojure data.csv 1.0.0
(require '[clojure.data.csv :as csv])

(require '[clojure.java.io :as io])
(require '[clojure.pprint :as pp])
(require '[clojure.edn :as edn])
(require '[clojure.string :as str])

(import 'java.time.LocalDate)
(import 'com.twosigma.beakerx.chart.xychart.TimePlot
        'com.twosigma.beakerx.chart.xychart.plotitem.Line)

class com.twosigma.beakerx.chart.xychart.plotitem.Line

In [4]:
; Define convenience functions.

; Convert the CSV structure to a list-of-maps structure.
(defn to-maps [csv-data]
    (map zipmap (->> (first csv-data)
                    ;(map keyword)
                    repeat)
                (rest csv-data)))

#'beaker_clojure_shell_c01718b8-c9b5-4992-b730-63b6491ac5f1/to-maps
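
To make the reshaping concrete, here is a minimal sketch that applies the same header-zipping idea to a few hand-written rows. The sample values are invented for illustration, and `to-maps` is redefined (without the commented-out line) only so that the snippet stands alone.

```clojure
;; Same idea as to-maps above: pair the header row with every data row.
(defn to-maps [csv-data]
  (map zipmap
       (repeat (first csv-data)) ; the header row, repeated once per data row
       (rest csv-data)))         ; the data rows

;; Hypothetical sample in the shape that csv/read-csv returns (vectors of strings).
(def sample-csv [["geo" "2017" "2018"]
                 ["AT"  "10"   "11"]
                 ["BE"  "20"   "21"]])

(def sample-maps (to-maps sample-csv))

;; Each data row becomes a map keyed by the header strings,
;; e.g. the first row maps "geo" to "AT", "2017" to "10" and "2018" to "11".
(first sample-maps)
```

Because the keys stay as strings rather than keywords, they are looked up with `(get row "2017")` rather than `(:2017 row)`.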

Useful Eurostat URLs:
* [splash page](https://ec.europa.eu/eurostat/)
* [a nice tree-view navigation of Eurostat datasets](https://ec.europa.eu/eurostat/data/database)

Let's investigate a particular dataset: 
> Municipal waste by waste management operations (env_wasmun) (Last update: 24-02-2020)

* [view this municipal waste dataset](http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=env_wasmun)
* [view this municipal waste dataset's metadata](https://ec.europa.eu/eurostat/cache/metadata/en/env_wasmun_esms.htm)
* [download this municipal waste dataset](https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/env_wasmun.tsv.gz) 


In [5]:
; I downloaded https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/env_wasmun.tsv.gz
; into this directory and gunzipped it.
; The resulting file is: ./env_wasmun.tsv

(def csv-file "env_wasmun.tsv")

; Read the file (it is tab-separated, despite the csv- names used here)
(def csv-data
    (with-open [reader (io/reader csv-file)]
        (doall
            (csv/read-csv reader :separator \tab))))

(def data0 (to-maps csv-data))

; Sample
(repeatedly 10 #(rand-nth data0))
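
As an alternative to gunzipping by hand, the compressed download could probably be read directly. The following is only a sketch: `read-gz-tsv` is a hypothetical helper (not part of this notebook), and it splits lines on tabs itself instead of using `clojure.data.csv`, which assumes that the file contains no quoted fields.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])
(import 'java.util.zip.GZIPInputStream)

;; Hypothetical helper: reads a gzip-compressed, tab-separated file
;; into a vector of rows, each row being a vector of strings.
(defn read-gz-tsv [path]
  (with-open [reader (io/reader (GZIPInputStream. (io/input-stream path)))]
    ;; mapv is eager, so every line is read before the reader is closed.
    ;; The -1 limit keeps trailing empty cells.
    (mapv #(str/split % #"\t" -1) (line-seq reader))))
```

Assuming the download was saved under its original name, `(read-gz-tsv "env_wasmun.tsv.gz")` should return rows in the same shape as `csv-data` above.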

##### Aside

I'm wrangling this data fairly manually.  
A future iteration might instead start from the dataset's SDMX file and convert that into linked data.  
For example, using [this software tool](https://github.com/csarven/linked-sdmx) to "transform Generic and Compact SDMX 2.0 data and metadata to RDF/XML using the RDF Data Cube and related vocabularies for statistical Linked Data".


In [6]:
; The dimensions of the dataset are unit, wst-oper, geo & time.

(def unit-labels {
"KG_HAB" "Kilograms per capita"
"THS_T" "Thousand tonnes"})

(def wst-oper-labels {
"GEN" "Waste generated"
"TRT" "Waste treatment"
"DSP_I_RCV_E" "Disposal - incineration (D10) and recovery - energy recovery (R1)"
"DSP_L_OTH" "Disposal - landfill and other (D1-D7, D12)"
"DSP_I" "Disposal - incineration (D10)"
"RCV_E" "Recovery - energy recovery (R1)"
"RCY_M" "Recycling - material"
"RCY_C_D" "Recycling - composting and digestion"})

(def geo-labels {
"EU27_2020" "European Union - 27 countries (from 2020)"
"EU28" "European Union - 28 countries (2013-2020)"
"EU27_2007" "European Union - 27 countries (2007-2013)"
"BE" "Belgium"
"BG" "Bulgaria"
"CZ" "Czechia"
"DK" "Denmark"
"DE" "Germany (until 1990 former territory of the FRG)"
"EE" "Estonia"
"IE" "Ireland"
"EL" "Greece"
"ES" "Spain"
"FR" "France"
"HR" "Croatia"
"IT" "Italy"
"CY" "Cyprus"
"LV" "Latvia"
"LT" "Lithuania"
"LU" "Luxembourg"
"HU" "Hungary"
"MT" "Malta"
"NL" "Netherlands"
"AT" "Austria"
"PL" "Poland"
"PT" "Portugal"
"RO" "Romania"
"SI" "Slovenia"
"SK" "Slovakia"
"FI" "Finland"
"SE" "Sweden"
"UK" "United Kingdom"
"IS" "Iceland"
"NO" "Norway"
"CH" "Switzerland"
"ME" "Montenegro"
"MK" "North Macedonia"
"AL" "Albania"
"RS" "Serbia"
"TR" "Turkey"
"BA" "Bosnia and Herzegovina"
"XK" "Kosovo (under United Nations Security Council Resolution 1244/99)"})

#'beaker_clojure_shell_c01718b8-c9b5-4992-b730-63b6491ac5f1/geo-labels

In [7]:
; Parse each row's wst-oper, unit & geo dimension values.

(def data1 (->> data0
            (map (fn [row]
                    (let [dim-vals-str-key "wst_oper,unit,geo\\time" ; the column containing concatenated values from 3 dimensions
                          dim-vals-str (get row dim-vals-str-key) ; get the string containing the 3 concatenated values
                          [wst-oper unit geo] (str/split dim-vals-str #",")] ; get the individual 3 values
                          (-> row
                              (dissoc dim-vals-str-key) ; remove the concatenated values
                              (assoc :wst-oper wst-oper ; add the individual 3 values
                                     :geo geo
                                     :unit unit)))))))
    
; Sample
(repeatedly 10 #(rand-nth data1))
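
The effect of this step on a single row can be sketched as follows. The row is invented for illustration, and `split-dims` is a hypothetical name for the anonymous function used above.

```clojure
(require '[clojure.string :as str])

;; The header of the first column, exactly as it appears in the TSV file
;; (the doubled backslash is one literal backslash in a Clojure string).
(def dim-key "wst_oper,unit,geo\\time")

;; Hypothetical row in the shape of data0 (values invented for illustration).
(def sample-row {dim-key "RCY_M,THS_T,AT"
                 "2018 " "100 "})

;; Replaces the concatenated dimension column with :wst-oper, :unit and :geo.
(defn split-dims [row]
  (let [[wst-oper unit geo] (str/split (get row dim-key) #",")]
    (-> row
        (dissoc dim-key)
        (assoc :wst-oper wst-oper :unit unit :geo geo))))

;; The year column is kept as-is; the three dimensions become keyword keys.
(split-dims sample-row)
```

Using keyword keys for the dimensions and leaving the year columns as strings makes it easy to tell the two kinds of key apart later.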

In [15]:
; Further filtering, reshaping and flagging.

(def years (->> data1 
                first
                keys
                (remove keyword?))) ; the year columns are the only non-keyword keys

(def data2 (->> data1
            (filter #(= "THS_T" (:unit %))) ; Just interested in the thousands-of-tonnes measure
            (remove #(str/starts-with? (:geo %) "EU2")) ; Ignore the EU aggregations
            (mapcat (fn [row] ; Split each per-country row into per-country-per-year rows
                    (for [year years]
                         (-> row
                            (select-keys [:wst-oper :unit :geo])
                            (assoc :year (str/trim year))
                            (assoc :tonnage (get row year))))))))
                        
(def flags { ; measure values may have been flagged - add flag labels for completeness...
    "b" "break in time series"
    "c" "confidential"
    "d" "definition differs, see metadata"
    "e" "estimated"
    "f" "forecast"
    "n" "not significant"
    "p" "provisional"
    "r" "revised"
    "s" "Eurostat estimate"
    "u" "low reliability"
    "z" "not applicable"})

(def data3 
    (map 
        (fn [row]
            (let [v (str/trim (:tonnage row))
                  [t f] (str/split v #"\s+" 2) ; "1234 e" -> ["1234" "e"]; "1234" -> ["1234"], so f is nil
                  t (when-not (= ":" t) t)] ; ":" means that no value is available, with or without a flag
                (assoc row 
                    :tonnage t
                    :flag (get flags f))))
        data2))
            
; Sample
(repeatedly 10 #(rand-nth data3))
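
The reshaping and flag handling in this cell can be tried on one small row. This is a sketch under stated assumptions: the row is invented, and `wide->long` and `parse-cell` are hypothetical helper names that mirror the steps above rather than functions defined in this notebook.

```clojure
(require '[clojure.string :as str])

;; Hypothetical wide row in the shape of data1 (values invented for illustration).
(def wide-row {:wst-oper "GEN" :unit "THS_T" :geo "AT"
               "2017 " "5000 e"
               "2018 " ": "})

;; Turns one row with a column per year into one row per year.
(defn wide->long [row]
  (for [year (remove keyword? (keys row))]
    (-> (select-keys row [:wst-oper :unit :geo])
        (assoc :year (str/trim year)
               :tonnage (get row year)))))

;; Splits a raw cell into [value flag-code]; a ":" value means "not available".
(defn parse-cell [cell]
  (let [[v f] (str/split (str/trim cell) #"\s+" 2)]
    [(when-not (= ":" v) v) f]))

(def long-rows (wide->long wide-row))

;; Pairs each year with its parsed cell:
;; 2017 has the value "5000" with flag code "e"; 2018 has no value and no flag.
(map (juxt :year (comp parse-cell :tonnage)) long-rows)
```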

In [16]:
; Calculate the tonnage of waste recycled for each country:
;   recycled-tonnage = RCY_M + RCY_C_D

(def data4 ; calculate the recycled tonnages
    (->> data3
        (filter #(contains? #{"RCY_M" "RCY_C_D"} (:wst-oper %))) ; just the recycled values
        (group-by (juxt :geo :year)) ; group by country-year
        (map (fn [[[geo year] group]] 
            {:geo geo
             :year year
             :recycled-tonnage (->> group
                                    (map :tonnage)
                                    (map edn/read-string) ; string -> number (nil stays nil)
                                    (map #(or % 0)) ; nil -> 0
                                    (reduce +))})))) ; for each country-year, sum the collected RCY_M and RCY_C_D tonnages

; Sample
(repeatedly 10 #(rand-nth data4))
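
The grouping and summing can be checked by hand on a few rows. In this sketch the rows are invented and their tonnages are already numbers, so the string parsing step is left out.

```clojure
;; Hypothetical long-format rows (values invented for illustration).
(def sample-rows
  [{:geo "AT" :year "2018" :wst-oper "RCY_M"   :tonnage 100}
   {:geo "AT" :year "2018" :wst-oper "RCY_C_D" :tonnage 50}
   {:geo "BE" :year "2018" :wst-oper "RCY_M"   :tonnage 80}
   {:geo "BE" :year "2018" :wst-oper "RCY_C_D" :tonnage nil} ; missing value
   {:geo "BE" :year "2018" :wst-oper "GEN"     :tonnage 400}])

(def recycled
  (->> sample-rows
       (filter #(contains? #{"RCY_M" "RCY_C_D"} (:wst-oper %))) ; drops the GEN row
       (group-by (juxt :geo :year))                             ; key is [geo year]
       (map (fn [[[geo year] group]]
              {:geo geo
               :year year
               :recycled-tonnage (->> group
                                      (map :tonnage)
                                      (map #(or % 0)) ; treat a missing value as 0
                                      (reduce +))}))))

;; AT/2018 sums to 150 (100 + 50); BE/2018 sums to 80 (80 + 0).
recycled
```

Note that treating a missing value as 0 understates the total for any country-year where only one of the two recycling figures was reported, as for BE here.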

In [18]:
; Calculate the percentage of waste recycled for each country:
;   recycled-percentage = (recycled-tonnage / generated-tonnage) * 100 = ((RCY_M + RCY_C_D) / GEN) * 100
             
(defn find-generated-tonnage [geo year]
    (let [v (->> data3
                (filter #(= (:wst-oper %) "GEN")) 
                (filter #(= (:geo %) geo))
                (filter #(= (:year %) year))
                first
                :tonnage
                edn/read-string)] ; string -> number
        (if (nil? v)
            0 ; nil -> 0
            v)))
             
(def data5 ; calculate the recycled percentages
    (map #(let [generated-tonnage (find-generated-tonnage (:geo %) (:year %))
                recycled-percentage (if (zero? generated-tonnage) ; don't divide by zero! (zero? also catches 0.0, which (= 0 0.0) does not)
                                        0
                                        (* (double (/ (:recycled-tonnage %) generated-tonnage)) 100))] ; i.e. (recycled-tonnage / generated-tonnage) * 100
                (assoc % 
                       :generated-tonnage generated-tonnage
                       :recycled-percentage recycled-percentage))
         data4))
         
(def data6 ; decorate with the actual names of the countries
    (map 
        #(assoc % :geo-label (get geo-labels (:geo %)))
        data5))
         
; Sample
(repeatedly 10 #(rand-nth data6))
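
The percentage formula itself is easy to check in isolation. Below is a minimal sketch with invented tonnages; `recycled-percentage` and the `generated` lookup map are hypothetical names, not definitions from this notebook. Keying a map by `[geo year]` is one possible alternative to scanning the whole dataset for every lookup. The sketch tests for zero with `zero?` because `(= 0 0.0)` is false in Clojure, so an equality test against `0` would miss a tonnage parsed as `0.0`.

```clojure
;; Hypothetical generated tonnages, keyed by [geo year] (values invented).
(def generated {["AT" "2018"] 1000
                ["BE" "2018"] 0.0})

;; Returns (recycled / generated) * 100, or 0 when nothing was generated.
(defn recycled-percentage [recycled-tonnage generated-tonnage]
  (if (zero? generated-tonnage)
    0
    (* 100.0 (/ recycled-tonnage generated-tonnage))))

(recycled-percentage 250 (get generated ["AT" "2018"] 0)) ; → 25.0
(recycled-percentage 80  (get generated ["BE" "2018"] 0)) ; → 0
(recycled-percentage 80  (get generated ["XX" "2018"] 0)) ; → 0 (no entry, defaults to 0)
```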

In [19]:
; Plot a time series chart of the percentage of waste recycled per country per year

(def lines
    (->> data6
        (group-by :geo-label)
        (map (fn [[geo-label coll1]]
                (let [coll2 (sort-by :year coll1)]
                    {:label geo-label
                     :x-series (->> coll2 
                         (map :year) 
                         (map (fn [yyyy] (LocalDate/parse (str yyyy "-12-31")))))
                     :y-series (->> coll2
                         (map :recycled-percentage))})))))

(def time-plot
    (doto (TimePlot.)
        (.setTitle "Percentage of waste recycled per country per year")
        (.setXLabel "Year")
        (.setYLabel "Percentage recycled")))
(doseq [line lines]
    (.add time-plot (doto (Line.)
                        (.setDisplayName (:label line))
                        (.setX (:x-series line))
                        (.setY (:y-series line)))))
time-plot

The following European Environment Agency (EEA) graph uses the same underlying dataset.

<a href="https://www.eea.europa.eu/data-and-maps/daviz/municipal-waste-recycled-and-composted-3#tab-chart_3"><img alt="chart_3" src="https://www.eea.europa.eu/data-and-maps/daviz/municipal-waste-recycled-and-composted-3/chart_3.png" /><div style="clear:both"></div>Go to original visualization</a>

The EEA graph's 2004 & 2016 figures broadly agree with the figures that we've calculated. Exact agreement isn't expected, because the EEA's calculations include various adjustments for changes in collection methodologies etc.