# Differential Privacy demo in Clojure

This notebook shows how to use the differential privacy algorithms available in [differential-privacy-clj](https://github.com/OpenMined/differential-privacy-clj) (a wrapper for [libdifferentialprivacy](https://github.com/google/differential-privacy)). If you have trouble running it, see doc/README.md and doc/clojure/README.md.


The content of this notebook is based on [Animals and Carrots](https://github.com/google/differential-privacy/blob/e819e03a20f9d7b0a30f2547c00ba74065b3f549/examples/cc/report_the_carrots.cc) from Google's differential-privacy library.

The following cell sets up the dependencies, the classpath, and library loading:

In [1]:
%classpath add mvn com.google.protobuf protobuf-java 3.11.4
%classpath add mvn org.apache.commons commons-math3 3.6.1
%classpath add mvn com.google.guava guava 28.2-jre
%classpath add mvn org.clojure data.csv 1.0.0


;; Download two jars from a server (temporary solution before release)
%classpath add mvn commons-io commons-io 2.6
(import (org.apache.commons.io FileUtils)
        (java.io File)
        (java.net URL))

(def dpUrl "http://replomancer.net/OpenMined/libdifferentialprivacy-1.0.jar")
(def dpFile (str (System/getProperty "java.io.tmpdir") "/differentialprivacy-1.0.jar"))
(FileUtils/copyURLToFile (URL. dpUrl) (File. dpFile))
%classpath add dynamic dpFile

(def dp2Url "http://replomancer.net/OpenMined/differential-privacy-clj-0.2.0-SNAPSHOT.jar")
(def dp2File (str (System/getProperty "java.io.tmpdir") "/differential-privacy-clj-0.2.0-SNAPSHOT.jar"))
(FileUtils/copyURLToFile (URL. dp2Url) (File. dp2File))
%classpath add dynamic dp2File



(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io]
         '[differential-privacy-clj.core :as dp])

null

First we load the carrots data from a CSV file:

In [2]:
(def carrots-consumption-data
  (with-open [reader (io/reader "../example_data/carrots_demo/animals_and_carrots.csv")]
    (let [data (csv/read-csv reader)]
      (mapv (fn [[_animal consumption]] (Double/parseDouble consumption)) data))))

carrots-consumption-data

[1.0, 88.0, 35.0, 99.0, 69.0, 14.0, 77.0, 53.0, 94.0, 67.0, 92.0, 87.0, 70.0, 31.0, 14.0, 14.0, 61.0, 57.0, 68.0, 13.0, 21.0, 38.0, 92.0, 39.0, 46.0, 36.0, 23.0, 76.0, 8.0, 69.0, 35.0, 83.0, 40.0, 74.0, 17.0, 77.0, 52.0, 31.0, 14.0, 40.0, 46.0, 99.0, 44.0, 15.0, 89.0, 36.0, 98.0, 20.0, 56.0, 90.0, 5.0, 75.0, 56.0, 23.0, 49.0, 83.0, 55.0, 22.0, 7.0, 16.0, 91.0, 80.0, 21.0, 56.0, 10.0, 28.0, 29.0, 19.0, 73.0, 45.0, 5.0, 20.0, 28.0, 45.0, 39.0, 64.0, 22.0, 7.0, 30.0, 10.0, 48.0, 60.0, 73.0, 82.0, 96.0, 82.0, 38.0, 84.0, 39.0, 12.0, 75.0, 75.0, 45.0, 87.0, 91.0, 33.0, 40.0, 0.0, 67.0, 63.0, 16.0, 93.0, 19.0, 72.0, 46.0, 73.0, 98.0, 86.0, 3.0, 64.0, 94.0, 75.0, 2.0, 87.0, 74.0, 79.0, 56.0, 51.0, 77.0, 81.0, 42.0, 90.0, 96.0, 4.0, 58.0, 73.0, 27.0, 56.0, 80.0, 10.0, 35.0, 86.0, 100.0, 16.0, 7.0, 30.0, 84.0, 50.0, 86.0, 21.0, 15.0, 66.0, 75.0, 71.0, 56.0, 52.0, 99.0, 45.0, 84.0, 99.0, 51.0, 37.0, 96.0, 90.0, 92.0, 80.0, 96.0, 31.0, 39.0, 2.0, 68.0, 53.0, 47.0, 82.0, 51.0, 57.0, 10.0, 28.0, 91

# Farmer Fred

It is a new day. Farmer Fred is ready to ask the animals about their carrot consumption.

Here's the initial value of the privacy budget (epsilon), along with some other settings:

In [3]:
;; We set a very high value here to improve accuracy on the small example dataset.
(def privacy-budget (atom 4.0))

(def query-epsilon 1.0)  ;; default amount of privacy budget we use per query

;; Unlike the C++ library, the Java version currently does not support
;; privately inferred bounds, so these have to be set manually:
(def lower-bound 0.0)
(def upper-bound 100.0)
(def max-partitions 1)
(def max-contributions 1)

#'beaker_clojure_shell_fe4e9c81-6c23-4735-982c-a1f06d4f167f/max-partitions

Farmer Fred asks the animals how many total carrots they have eaten. The animals know the true sum but report the differentially private sum to Farmer Fred. But first, they ensure that Farmer Fred still has privacy budget left.

In [4]:
(println "\nRemaining privacy budget:" @privacy-budget)

(if (> query-epsilon @privacy-budget)
    (println "Not enough privacy budget left!")
    (let [true-sum (reduce + carrots-consumption-data)
          dp-sum (dp/bounded-sum carrots-consumption-data
                                 :lower lower-bound
                                 :upper upper-bound
                                 :max-partitions-contributed max-partitions
                                 :epsilon query-epsilon)]
        (swap! privacy-budget - query-epsilon)
        (println "True sum:" true-sum)
        (printf "DP sum: %.2f" dp-sum)
        (flush)))


Remaining privacy budget: 4.0
True sum: 9649.0
DP sum: 9566.95

null
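To build some intuition for the gap between the true sum and the DP sum, here is a minimal sketch of a textbook Laplace mechanism for a bounded sum. It is an illustration under stated assumptions, not the library's implementation: `exponential-sample`, `laplace-noise` and `naive-dp-sum` are hypothetical names, each animal is assumed to contribute one value, and the sensitivity is taken to be the largest absolute bound. It also uses `java.util.Random` and plain floating-point arithmetic, neither of which is suitable for real privacy guarantees.

```clojure
;; Illustrative only: NOT part of differential-privacy-clj.

(defn exponential-sample
  "Draws one sample from an exponential distribution with mean `scale`."
  [^java.util.Random rng scale]
  ;; (.nextDouble rng) is in [0, 1), so the argument of log is in (0, 1].
  (* -1.0 scale (Math/log (- 1.0 (.nextDouble rng)))))

(defn laplace-noise
  "Draws one sample from Laplace(0, scale) as the difference of two
  independent exponential samples with the same mean."
  [rng scale]
  (- (exponential-sample rng scale)
     (exponential-sample rng scale)))

(defn naive-dp-sum
  "Clamps every value to [lower, upper], sums the clamped values and adds
  Laplace noise with scale sensitivity / epsilon."
  [rng values lower upper epsilon]
  (let [clamped     (map #(-> % (max lower) (min upper)) values)
        ;; Assumption: one contribution per animal, so a single animal can
        ;; change the sum by at most the largest absolute bound.
        sensitivity (max (Math/abs (double lower)) (Math/abs (double upper)))
        scale       (/ sensitivity epsilon)]
    (+ (reduce + clamped) (laplace-noise rng scale))))

;; With bounds [0, 100] and epsilon 1.0 the noise scale is 100.
(naive-dp-sum (java.util.Random.) [1.0 88.0 35.0 99.0] 0.0 100.0 1.0)
```

Under these assumptions the noise scale is 100 / 1.0 = 100, so a difference of roughly 80 carrots between the true sum and the reported sum is unsurprising. A smaller epsilon would increase the scale and make the reported sum noisier.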

Farmer Fred catches on that the animals are giving him DP results.

He asks for the mean number of carrots eaten. (TODO: in the original example he also wants some additional accuracy information to build his intuition, but that requires features not yet available in the Java library.)

In [5]:
(println "\nRemaining privacy budget:" @privacy-budget)

(if (> query-epsilon @privacy-budget)
    (println "Not enough privacy budget left!")
    (let [true-mean (/ (reduce + carrots-consumption-data) (count carrots-consumption-data))
          dp-mean (dp/bounded-mean carrots-consumption-data
                                   :lower lower-bound
                                   :upper upper-bound
                                   :max-contributions-per-partition max-contributions
                                   :max-partitions-contributed max-partitions
                                   :epsilon query-epsilon)]
        (swap! privacy-budget - query-epsilon)
        (printf "True mean: %.2f\n" true-mean)
        (printf "DP mean: %.2f\n" dp-mean)
        (flush)))



Remaining privacy budget: 3.0
True mean: 53.02
DP mean: 51.87


null

Fred wonders how many gluttons are in his zoo. How many animals ate over 90 carrots?

In [6]:
(println "\nRemaining privacy budget:" @privacy-budget)

(if (> query-epsilon @privacy-budget)
    (println "Not enough privacy budget left!")
    (let [true-count (count (filter #(> % 90) carrots-consumption-data))
          dp-count (dp/count (filter #(> % 90) carrots-consumption-data)
                             :max-partitions-contributed max-partitions
                             :epsilon query-epsilon)]
        (swap! privacy-budget - query-epsilon)
        (println "True count:" true-count)
        (println "DP count: " dp-count)))


Remaining privacy budget: 2.0
True count: 21
DP count:  22


null

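Each query above spends 1.0 of the initial 4.0 budget, so 1.0 remains after the three query cells. If you rerun any of them, the first rerun uses up the rest of the budget and the next one prints "Not enough privacy budget left!".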
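The check-then-spend pattern repeated in the query cells can be factored into a small helper. The following is a minimal sketch, assuming hypothetical functions `spend-budget!` and `run-query!` that are not part of differential-privacy-clj. Unlike the cells above, it deducts the budget before running the query, so the check and the deduction happen in one atomic step.

```clojure
;; Illustrative only: these helpers are NOT part of differential-privacy-clj.
;; swap-vals! requires Clojure 1.9 or newer.

(defn spend-budget!
  "Atomically deducts `cost` from the `budget` atom if enough remains.
  Returns true when the deduction happened, false otherwise."
  [budget cost]
  (let [[old-value _] (swap-vals! budget
                                  (fn [remaining]
                                    (if (>= remaining cost)
                                      (- remaining cost)
                                      remaining)))]
    ;; old-value is the budget the successful swap started from,
    ;; so this tells us whether the deduction branch was taken.
    (>= old-value cost)))

(defn run-query!
  "Calls the zero-argument `query-fn` only if `cost` can be deducted from
  `budget`; otherwise prints a message and returns nil."
  [budget cost query-fn]
  (if (spend-budget! budget cost)
    (query-fn)
    (println "Not enough privacy budget left!")))

;; A separate atom, so the notebook's own privacy-budget is left untouched.
(def demo-budget (atom 2.0))

(spend-budget! demo-budget 1.0) ;; => true  (1.0 left)
(spend-budget! demo-budget 1.0) ;; => true  (0.0 left)
(spend-budget! demo-budget 1.0) ;; => false (budget unchanged)
```

With such a helper, a query cell could shrink to something like `(run-query! privacy-budget query-epsilon #(dp/count ...))`, at the cost of charging the budget even if the query itself throws.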