talks/2017_Seminar.html

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">

    <title>Final Thesis Seminar (2017)</title>

    <meta name="description" content="Final Thesis Seminar (2017)">
    <meta name="author" content="Philippe Desjardins-Proulx">

    <meta name="apple-mobile-web-app-capable" content="yes" />
    <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />

    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">

    <link rel="stylesheet" href="css/reveal.min.css">
    <link rel="stylesheet" href="css/theme/polaris.css" id="theme">
    <link rel="stylesheet" href="lib/css/zenburn.css">

    <script>
      if (window.location.search.match(/print-pdf/gi)) {
        var link = document.createElement('link');
        link.rel = 'stylesheet';
        link.type = 'text/css';
        link.href = 'css/print/pdf.css';
        document.getElementsByTagName('head')[0].appendChild(link);
      }
    </script>

    <!--[if lt IE 9]>
    <script src="lib/js/html5shiv.js"></script>
    <![endif]-->
  </head>

  <body>

    <div class="reveal">
      <!-- Any section element inside of this container is displayed as a slide -->
      <div class="slides">
        <section>
          <h1>Automatic Revision of Ecological Theories</h1>
          <p>by <a href="http://phdp.github.io/">Philippe Desjardins-Proulx</a></small></p>
        </section>

        <section>
          <section>
            <h1>Motivation</h1>
          </section>
          <section>
            <img src='img/amnat12.png' alt='amnat12'/>
          </section>
          <section>
            <img src='img/eco-evo-paper.png' alt='ecoevo'/>
          </section>
          <section>
            <img src='img/complex-1.svg' width='70%' alt='complex fig 1'/>
          </section>
          <section>
            <img src='img/complex-0.svg' width='100%' alt='complex fig 0'/>
          </section>
          <section>
            <img src='img/wagner.png' alt='wagner'/>
          </section>
          <section>
            <img src='img/hahn08.png' alt='hahn08'/>
          </section>
          <section>
            <h2>[Supervised] learning in a nutshell</h2>
            <img src='img/ml-900.png' alt='machine learning'/>
          </section>
          <section>
            <h2>Random forests</h2>
            <img src='img/random_forests.png' alt='machine learning'/>
          </section>
          <section>
            <img src='img/biorxiv-scientifictheories.png' alt='biorxiv'/>
          </section>
          <section>
            <h2>The thesis' thesis</h2>

            <p>Biodiversity is shaped by too many factors (speciation,
            stochastic drift, selection from abiotic and biotic factors) to be
            understood with a single unifying theory.</p>

            <br/>

            <p>Thus, we must combine traditional theories with techniques
            capable of learning from data, but we must be able to keep the
            structure and clarity of these traditional theories.</p>

          </section>
          <section>
            <img src='img/reasoning_systems.png' alt='reasoning systems'/>
          </section>
          <section>
            <h2>Lotka-Volterra framework</h2>
            \[\frac{dx}{dt} = \alpha x - \beta xy \mbox{ and } \frac{dy}{dt} = \delta xy - \gamma y.\]
          </section>
          <section>
            <h2>Probabilistic niche model</h2>
            \[ PPreyOn(x, y) = \alpha \times \exp\left[-\left(\frac{N(y) - D(x)}{R(x)/2}\right)^2\right].\]
          </section>
          <section>
            <h2>Exponential growth</h2>
            <p>From</p>
            \[N(x, t + 1) = G(x) \times N(x, t).\]
            <br/>
            <p>To</p>
            \[SmallN(x, t) \mbox{ and } Resources(x, t) \Rightarrow N(x, t + 1) = G(x) \times N(x, t).\]
          </section>
          <section>
            <h2>Theory revision wishlist</h2>
            <p>Handling deterministic and probabilistic theories.</p>
            <p>Fast algorithms for theory revision.</p>
            <p>Handling arbitrarily large knowledge bases.</p>
          </section>
        </section>

        <section>
          <section>
            <h1>Ecological Interactions</h1>
          </section>
          <section>
            <h2>Why interactions?</h2>

            <p>First reason: a key reason why selection is difficult to handle
            with traditional theories is because of the complexity of species
            interactions.</p>
            <br/>
            <p>Second reason: my lab is studying interactions. I had access to
            both good data-sets and people who understood them well.</p>
          </section>
          <section>
            <img src='img/ecological-netflix-paper.png' alt='paper'/>
          </section>
          <section>
            <h2>Data</h2>

            <ul width='100%'>
              <li>Assembled by Isabelle Daigle, mostly from Digel et al. (2014).</li>
              <li>48 soil food webs.</li>
              <li>881 species.</li>
              <li>34 193 unique interactions.</li>
              <li>215 418 absence of interactions.</li>
              <li>3 real valued-traits: \(body mass\), \(Ph0\), \(Ph1\).</li>
              <li>25 binary traits (e.g. AboveGround, Detritivore, Bacteria, Jumps, LongLegs, UsePoison).</li>
            </ul>

          </section>
          <section>
            <h2>K nearest neighbour (KNN) with K = 3</h2>
            <img src='img/pearson.svg' width='100%' alt='pearson'/>
          </section>
          <section>
            \[tanimoto(\mathbf{x}, \mathbf{y}) = \frac{\left\vert\mathbf{x} \cap \mathbf{y}\right\vert}{\left\vert\mathbf{x} \cup \mathbf{y}\right\vert}.\]

<br/><br/>

\begin{align}
  tanimoto(\mathbf{x}, \mathbf{y}) &= \frac{\left\vert\{A, B, C\} \cap \{A, C, D, E, F\}\right\vert}{\left\vert\{A, B, C\} \cup \{A, C, D, E, F\}\right\vert}\\
                                   &= \frac{\left\vert\{A, C\}\right\vert}{\left\vert\{A, B, C, D, E, F\}\right\vert} = \frac{2}{6}.
\end{align}


          </section>
          <section>
            \[TSS = \frac{(tp \times tn) - (fp \times fn)}{(tp + fn)(fp + tn)}.\]

<br/><br/>

            <p><i>tp, tn, fp, fn</i> are the true positives, true negatives, false positives,
            and false negaties, respectively.</p>

            <p>The <i>TSS</i> ranges from -1 to 1.</p>

          </section>
          <section>

            <p>Supervised learning Random forests predict correctly 99.55% of
            the non-interactions and 96.81% of the interactions, for a
            <i>TSS</i> of 0.96.</p>

            <br/>

            <p>Removing the binary traits has little effect on the model. With
            only \(body mass\), \(Ph0\), \(Ph1\), the <i>TSS</i> of the random
            forest is 0.94.</p>

          </section>
        </section>
        <section>
          <section>
            <h1>KNN with Threshold</h1>
          </section>
          <section>
            <img src='img/knn-paper.png' alt='knn'/>
          </section>
          <section>
            <h2>I Bartomeus' pollinator-plant data-set</h2>

            <ul width='100%'>
              <li>Plant-pollinator interactions were collected across 16 sites in
              the southwest of the iberian peninsula (Huelva and Seville).</li>

              <li>65 pollinators.</li>

              <li>277 plants.</li>

              <li>739 pollinator-plant interactions.</li>

              <li>We randomly pick 739 pollinator-plant pairs that are not
              interacting to build a data-set of 1478 entries, half of
              interactions, half of non-interactions.</li>
            </ul>
          </section>
          <section>
            <h2>Accuracy</h2>
            <img src='img/ht_acc.svg' width='80%' alt='acc'/>
          </section>
          <section>
            <h2>TSS</h2>
            <img src='img/ht_tss.png' width='80%' alt='acc'/>
          </section>
        </section>

        <section>
          <section>
            <h1>Markov Logic</h1>
          </section>
          <section>
            \[\mbox{Forall } x, y, z: Friends(x, y) \mbox{ and } Friends(y, z) \Rightarrow Friends(x, z), 0.7.\]
            \[\mbox{Forall } x: Smoking(x) \Rightarrow Cancer(x), 1.5.\]
            \[\mbox{Forall } x, y: Friends(x, y) \mbox{ and } Smoking(x) \Rightarrow Smoking(y), 1.1.\]

            <br/><br/>

            <p>Can answer queries of the form:</p>

            <br/>

            \[P(Cancer(Bob) | Smoking(Anna) \mbox{ and } Friends(Anna, Bob)).\]
            \[P(Friends(x, y) \Leftrightarrow Friends(y, x) | ...).\]

          </section>
          <section>
            <h2>Salix data-set</h2>
            <img src='img/salix.png' alt='salix'/>
            \[Parasites \rightarrow Galler \rightarrow Salix\]
          </section>
          <section>
            <h2>Learning weights (but not really rules)</h2>
\[HighNumSalix(x), 6.44.\]
\[\neg ModerateCooccurrence(x,y), 5.60.\]
\[\neg PreyOnAt(x,y,a3), 4.38.\]
\[\neg HighCooccurrence(x,y), 4.33.\]
\[\neg PresenceAt(x,y), 3.08.\]
\[\neg PreyOn(x,y), 1.68.\]
\[\neg IsSalix(x), 1.38.\]
\[\neg IsGaller(x), 0.59.\]
\[\neg CloselyRelated(x,y), 0.59.\]
\[\neg HigherPhyloValue(x,y), 0.57.\]
          </section>

          <section>
\[ HighNumGaller(x), 0.47\]
\[ HighNumSpecies(x), 0.24.\]
\[ HighFoodWebLinkDensity(x), 0.19.\]
\[ \neg IsParasitoid(x), 0.18.\]
\[ \neg HighFoodWebConnectance(x), 0.14.\]
\[ \neg HighTotalPrecipiration(x), 0.01.\]
\[ \neg HighMeanTemperature(x), 0.01.\]
\[ \neg PreyOn(x,x), 0.01.\]
\[ HighNumParasites(x), 0.00.\]
          </section>

          <section>
\[ PreyOnAt(x,y,z) \Rightarrow PreyOn(x,y), 5.20.\]
\[ HighCooccurrence(x,y) \Rightarrow PreyOn(x,y), 4.22.\]
\[ IsGaller(x) \mbox{ and } PreyOn(x,y) \Rightarrow IsSalix(y), 4.15.\]
\[ IsParasitoid(y) \mbox{ and } PreyOn(y,x) \Rightarrow IsGaller(x), 3.49.\]
\[ PreyOn(x,y) \Rightarrow HighCooccurrence(x,y), 1.57.\]
\[ PreyOn(x,y) \Rightarrow PreyOn(x,x), 1.52.\]
\[ HigherPhyloValue(x,y) \Rightarrow PreyOn(x,y), 1.13.\]
\[ HigherPhyloValue(x,y) \Rightarrow HighCooccurrence(x,y), -0.59.\]
\[ PreyOn(x,x) \Rightarrow PreyOn(x,y), 0.45.\]
\[ \neg IsSalix(x) \mbox{ or } \neg PreyOn(x,y), 0.02.\]
\[ CloselyRelated(x,y) \mbox{ and } PreyOn(x,z) \Rightarrow PreyOn(y,z), 0.00.\]
          </section>

        </section>

        <section>
          <section>
            <h1>Fuzzy Logic</h1>
          </section>
          <section>
            <h2>I Bartomeus' pollinator-plant data-set (part II)</h2>

            <p>Seven real-valued traits:</p>

            <ul>
              <li>Plant's nectar tube dimension.</li>
              <li>Plant's nectar tube depth.</li>
              <li>Plant's flower width.</li>
              <li>Pollinator's body size.</li>
              <li>Pollinator's tongue length.</li>
              <li>Pollinator generalism.</li>
              <li>\(K\) nearest neighbour ratio.</li>
            </ul>
          </section>
          <section>
            <h2>Structure</h2>
            \[\mathbf{If} \mbox{ } antecedants \mbox{ } \mathbf{then} \mbox { } consequents.\]
            \[\mathbf{If} \mbox{ } antecedants \mbox{ } \mathbf{then} \mbox { } consequents.\]
            \[\mathbf{If} \mbox{ } antecedants \mbox{ } \mathbf{then} \mbox { } consequents.\]
            \[...\]

            <br/>
            <br/>

            \[\mathbf{If} \mbox{ } A_0 \mbox{ and } A_1 \mbox{ and } A_2 \mbox{ and } ... \mbox{ } \mathbf{then} \mbox { } C_0.\]
          </section>
          <section>
            <h2>Initial model</h2>
            <ul style='width: 100%'>
              <li style='color:#246325'><b>If</b> KNN is high <b>then</b> interaction.</li>
              <li style='color:#632427'><b>If</b> KNN is low <b>then</b> non-interaction.</li>
            </ul>
          </section>
          <section>
            <h2>Current approach for learning (Python script)</h2>
            <h3>Evolution</h3>
            <ol start='0'>
              <li>Start with the initial knowledge base with the two KNN rules.</li>
              <li>Randomly generate a rule, add it to the knowledge base.</li>
              <li>If the new knowledge base beats the old by 0.004 TSS, keep the rule.</li>
              <li>Repeat \(\phi\) times</li>
            </ol>
            <br/><br/>
            <h3>Pseudo-ensemble approach</h3>
            <ol start='0'>
              <li>Make \(n\) evolutions.</li>
              <li>Count the frequency of the rules in the evolutions</li>
              <li>Try to add the rule one by one from most frequent to least</li>
              <li>Keep if TSS improves by at least 0.004.</li>
            </ol>
          </section>
          <section>
            <ul style='width: 100%; font-size: 80%'>
              <li style='color:#246325'><b>If</b> Plant nectar tube depth is average <b>then</b> interaction.</li>
              <li style='color:#246325'><b>If</b> Pollinator body size is small <b>then</b> interaction.</li>
              <li style='color:#246325'><b>If</b> KNN is high <b>then</b> interaction.</li>
              <li style='color:#246325'><b>If</b> Pollinator is specialist <b>and</b> Plant nectar tube dim is small <b>then</b> interaction.</li>
              <li style='color:#246325'><b>If</b> Plant flower width is average <b>then</b> interaction.</li>
              <li style='color:#246325'><b>If</b> Plant flower width is small <b>and</b> Pollinator tongue length is small <b>then</b> interaction.</li>
              <li style='color:#632427'><b>If</b> KNN is low <b>then</b> non-interaction.</li>
            </ul>
          </section>
          <section>
            <h2>Results on pollinator-plant data-set</h2>
            <table style="width:100%; text-align:center;">
              <tr>
                <th>Model</th>
                <th>TSS (10% testing data)</th>
              </tr>
              <tr>
                <td>Support Vector Machine</td>
                <td>0.564</td>
              </tr>
              <tr>
                <td>Fuzzy (initial model with two KNN rules)</td>
                <td>0.689</td>
              </tr>
              <tr>
                <td>Decision tree</td>
                <td>0.723</td>
              </tr>
              <tr>
                <td>Random forests</td>
                <td>0.795</td>
              </tr>
              <tr>
                <td>Fuzzy (after theory revision)</td>
                <td>0.841</td>
              </tr>
            </table>
          </section>
        </section>

        <section>
          <section>
            <h1>Conclusions</h1>
          </section>
          <section>
            <ul width='100%'>
              <li>KNN is an effective, widely applicable, and simple approach.</li>
              <li>KNN with threshold works well with TSS scores.</li>
              <li>Supervised learning algorithms (esp. random forests) can predict interactions with few real-valued traits.</li>
              <li>Fuzzy knowledge bases are both clear and accurate.</li>
              <li>Even a silly revision algorithm yields very good results for fuzzy logic.</li>
              <li>Knowledge representation in ecology is an interesting topic in itself.</li>
              <li>Markov logic has issues (computational, binary) but modern variants could be promising.</li>
            </ul>
          </section>
        </section>

        <section>
          <section>
            <h1>What's Next</h1>
          </section>
          <section>
            <h2>Salix part II?</h2>
            <img src='./img/psl_paper.png' alt='psl paper'/>
          </section>
          <section>
            <h2>...also fuzzy logic meets salix</h2>
          </section>
          <section>
            <h2>Genetic (or evolutionary) algorithms</h2>
            <img src='./img/generic-algorithms.png' alt='evolutionary algorithms'/>
          </section>
        </section>

        <section>
          <section>
            <h1>Questions?</h1>
          </section>
        </section>

      </div>
    </div>

    <script src="lib/js/head.min.js"></script>
    <script src="js/reveal.min.js"></script>
    <script>
      // Full list of configuration options available here:
      // https://github.com/hakimel/reveal.js#configuration
      Reveal.initialize({
        width: "70%",
        height: "100%",
        margin: 0,
        minScale: 1,
        maxScale: 1,
        controls: true,
        progress: true,
        history: true,
        center: true,

        theme: Reveal.getQueryHash().theme, // available themes are in /css/theme
        transition: Reveal.getQueryHash().transition || 'linear', // default/cube/page/concave/zoom/linear/fade/none

        // Parallax scrolling
        // parallaxBackgroundImage: 'https://s3.amazonaws.com/hakim-static/reveal-js/reveal-parallax-1.jpg',
        // parallaxBackgroundSize: '2100px 900px',

        // Optional libraries used to extend on reveal.js
        dependencies: [
          { src: 'lib/js/classList.js', condition: function() { return !document.body.classList; } },
          { src: 'plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
          { src: 'plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
          { src: 'plugin/highlight/highlight.js', async: true, callback: function() { hljs.initHighlightingOnLoad(); } },
          { src: 'plugin/zoom-js/zoom.js', async: true, condition: function() { return !!document.body.classList; } },
          { src: 'plugin/notes/notes.js', async: true, condition: function() { return !!document.body.classList; } }
        ]
      });
    </script>
    <script type="text/javascript" src="../../js/bower/MathJax/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
  </body>
</html>