Updating the similarity module functions in the Apps layer (#390)
* update similarity.py

* updating test file for new functions

* update similarity tutorial

* First round of comments

* small changes

* gaussian backend error

* small change to n_mean

* change event numbers in tutorial

* adding non-trivial tests

* Tom review and comments

* updating tests from JM comments

* fixing some tests for travis CI

* Fix merge

* Update feature_vector_orbits example

* Update feature_vector_events example

* Fix line widths

* Update changelog

* Revert "Fix line widths"

This reverts commit 56b07d5.

* remove hafnian expression

Co-authored-by: trbromley <brotho02@gmail.com>
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
3 people committed Jun 18, 2020
1 parent 6e2c338 commit 9713d05
Showing 4 changed files with 855 additions and 315 deletions.
4 changes: 4 additions & 0 deletions .github/CHANGELOG.md
@@ -2,6 +2,10 @@

<h3>New features since last release</h3>

* Feature vectors of graphs can now be calculated exactly in the `similarity` module of the
applications layer.
[(#390)](https://github.com/XanaduAI/strawberryfields/pull/390)

* Adds the `apps.qchem.dynamics` module for simulating vibrational quantum dynamics in molecules.
The `dynamics.evolution()` function provides a custom operation that encodes the input chemical
information for use in a Strawberry Fields `Program`. The `sample_fock()` function allows for
102 changes: 63 additions & 39 deletions examples_apps/run_tutorial_similarity.py
@@ -144,7 +144,7 @@
##############################################################################
# Now that we have mastered orbits and events, how can we make a feature vector? It was shown in
# :cite:`schuld2019quantum` that one way of making a feature vector of a graph is through the
# frequencies of orbits or events. For example, for a :math:`k` photon event :math:`E_{k, n_{\max}}`
# with maximum count per mode :math:`n_{\max}` and corresponding probability :math:`p_{k,
# n_{\max}}:=p_{E_{k, n_{\max}}}(G)` with respect to a graph :math:`G`, a feature vector can be
# written as
#
# .. math::
#     f_{\mathbf{k}, n_{\max}} = \left(p_{k_1, n_{\max}}, p_{k_2, n_{\max}}, \ldots, p_{k_K, n_{\max}}\right),
#
# where :math:`\mathbf{k} := (k_{1}, k_{2}, \ldots , k_{K})` is a list of different total photon
# numbers.
#
# For example, if :math:`\mathbf{k} := (2, 4, 6, 8)` and :math:`n_{\max} = 2`, we have
#
# .. math::
#     f_{(2, 4, 6, 8), 2} = (p_{2, 2}, p_{4, 2}, p_{6, 2}, p_{8, 2}).
#
# In this case, we are interested in the probabilities of events :math:`E_{2, 2}`, :math:`E_{4,
# 2}`, :math:`E_{6, 2}`, and :math:`E_{8, 2}`. Suppose we are sampling from a four-mode device
# and have the samples
# ``[0, 3, 0, 1]`` and ``[1, 2, 0, 1]``. These samples are part of the orbits ``[3, 1]`` and
# ``[2, 1, 1]``, respectively. However, ``[3, 1]`` is not part of the :math:`E_{4, 2}` event while
# ``[2, 1, 1]`` is.
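#
# As a quick check, these classifications can be reproduced directly. This is a minimal
# sketch assuming the :func:`~.sample_to_orbit` and :func:`~.sample_to_event` helpers of
# the :mod:`~.apps.similarity` module:

print(similarity.sample_to_orbit([0, 3, 0, 1]))  # expected orbit: [3, 1]
print(similarity.sample_to_event([0, 3, 0, 1], max_count_per_mode=2))  # None: a mode has 3 photons
print(similarity.sample_to_event([1, 2, 0, 1], max_count_per_mode=2))  # 4: the sample is in E_{4, 2}

##############################################################################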
#
# Calculating a feature vector
# ----------------------------
#
# We provide three methods for calculating a feature vector in the :mod:`~.apps.similarity` module
# of Strawberry Fields:
#
# 1. Through sampling.
# 2. Using exact probability calculations.
# 3. Using a Monte Carlo estimate of the probability.
#
# In the first method, all one needs to do is generate some GBS samples from the graph of
# interest and fix the composition of the feature vector. For example, to obtain feature vector
# :math:`f_{\mathbf{k} = (2, 4), n_{\max}=2}` for the first MUTAG graph, we use:

print(similarity.feature_vector_events_sampling(m0, [2, 4], 2))

##############################################################################
# We can also use any orbits of our choice instead of events:
print(similarity.feature_vector_orbits_sampling(m0, [[1, 1], [2], [1, 1, 1, 1], [2, 1, 1]]))

##############################################################################
# For the second method, we calculate the orbit probabilities exactly rather than through
# sampling. For a feature vector of orbit probabilities, the probability of a single orbit
# :math:`p(O)` is given by:
#
# .. math::
#     p(O) = \sum_{S \in O} p(S)
#
# where :math:`S` represents a GBS output click pattern. Calculating each :math:`p(S)` requires
# computing a `hafnian <https://the-walrus.readthedocs.io/en/latest/hafnian.html>`__, which
# becomes exponentially more difficult as the photon number increases. The exact probability of
# an event :math:`p_{k,n_{\max}}` can be calculated in a similar manner.
#
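# As a concrete sketch, the probability of a single small orbit can be computed directly,
# here assuming the :func:`~.prob_orbit_exact` function and an illustrative ``n_mean``:

print(similarity.prob_orbit_exact(nx.Graph(m0_a), [1, 1], n_mean=6))

##############################################################################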
# The built-in functions :func:`~.feature_vector_orbits` and :func:`~.feature_vector_events`
# can be used to obtain exact feature vectors. Both functions accept a keyword argument
# ``samples`` that switches between exact and Monte Carlo estimated probabilities, as shown
# later. By default, ``samples`` is set to ``None``, which gives an exact feature vector. To
# use Monte Carlo estimation, set ``samples`` to the desired number of samples. For example,
# to get the exact event probabilities in the feature vector
# :math:`f_{\mathbf{k} = (2, 4), n_{\max}=2}` seen previously, we use:

print(similarity.feature_vector_events(nx.Graph(m0_a), [2, 4], 2))

##############################################################################
# Although precise, exact calculations for large matrices can be difficult to evaluate.
# Moreover, what makes calculating :math:`p_{k, n_{\max}}` particularly challenging is the
# number of samples the corresponding event contains. For example, the 6-photon event over 17
# modes, :math:`E_{k=6, n_{\max}=2}`, contains the following number of samples:

print(similarity.event_cardinality(6, 2, 17))

##############################################################################
# To avoid calculating a large number of sample probabilities, an alternative is to perform
# Monte Carlo estimation. Here, samples within an orbit or event are selected uniformly
# at random and their resultant probabilities are calculated. For example, for an event
# :math:`E_{k, n_{\max}}`, if :math:`N` samples :math:`\{S_{1}, S_{2}, \ldots , S_{N}\}`
# are generated, then the event probability can be approximated as
#
# .. math::
#     p(E_{k, n_{\max}}) \approx \frac{1}{N}\sum_{i=1}^N p(S_i) |E_{k, n_{\max}}|,
#
# with :math:`|E_{k, n_{\max}}|` denoting the cardinality of the event.
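#
# For a single event, this estimate is available through the :func:`~.prob_event_mc`
# function; a minimal sketch with illustrative ``n_mean`` and ``samples`` values:

print(similarity.prob_event_mc(nx.Graph(m0_a), 4, max_count_per_mode=2, n_mean=6, samples=1000))

##############################################################################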
#
# This method can be accessed using the :func:`~.feature_vector_events` function with
# ``samples`` set to the desired number of samples. For example, to get Monte Carlo estimated
# probabilities for our example feature vector :math:`f_{\mathbf{k} = (2, 4), n_{\max}=2}`,
# we use:

print(similarity.feature_vector_events(nx.Graph(m0_a), [2, 4], 2, samples=1000))

##############################################################################
# .. note::
#     The results of using Monte Carlo estimation with :func:`~.feature_vector_orbits` and
#     :func:`~.feature_vector_events` are probabilistic and may vary between runs. Increasing
#     the ``samples`` parameter will increase the precision but slow down the calculation.
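#
# Running the same estimate twice makes this concrete; the two printed vectors will
# generally differ slightly from run to run:

print(similarity.feature_vector_events(nx.Graph(m0_a), [2, 4], 2, samples=1000))
print(similarity.feature_vector_events(nx.Graph(m0_a), [2, 4], 2, samples=1000))

##############################################################################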
#
# Machine learning with GBS graph kernels
# ---------------------------------------
@@ -250,10 +273,10 @@
events = [8, 10]
max_count = 2

f1 = similarity.feature_vector_events_sampling(m0, events, max_count)
f2 = similarity.feature_vector_events_sampling(m1, events, max_count)
f3 = similarity.feature_vector_events_sampling(m2, events, max_count)
f4 = similarity.feature_vector_events_sampling(m3, events, max_count)

import numpy as np

# Stack the four feature vectors as rows of a matrix
R = np.array([f1, f2, f3, f4])

print(R)

##############################################################################
# The choice of ``events`` composing the feature vectors can be significant, and we encourage
# the reader to explore different combinations. Orbits of our choice can also be used instead
# of events. Note, however, that GBS samples with an odd total number of photons have zero
# probability in ideal GBS, which generates and outputs photons only in pairs.
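#
# As a quick sketch, the exact probability of the odd event :math:`E_{3, 2}` should
# therefore vanish for ideal GBS:

print(similarity.feature_vector_events(nx.Graph(m0_a), [3], 2))  # expected: [0.0]

##############################################################################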
#
# Given our points in the feature space and their target labels, we can use
# scikit-learn's Support Vector Machine `LinearSVC <https://scikit-learn.org/stable/modules/generated/sklearn.svm
