This repository has been archived by the owner on Dec 12, 2021. It is now read-only.

Added create model
yamsgithub committed Jun 19, 2017
1 parent 595375d commit 8a6ab30
Showing 6 changed files with 38 additions and 11 deletions.
36 changes: 36 additions & 0 deletions docs/create_model.rst
@@ -0,0 +1,36 @@
Create Model
------------

DDT incrementally builds a model as the user `annotates <http://domain-discovery-tool.readthedocs.io/en/latest/use.html#annotation>`_ the retrieved pages. The accuracy of the domain model is displayed in the top right corner. It indicates how well the model covers the domain and how that coverage is influenced by annotations.

The domain model can be exported by clicking the **Model** button at the top. This opens a drop-down menu, as shown in the figure below:

.. image:: model_dropdown.png
   :width: 800px
   :align: center
   :height: 400px
   :alt: alternate text

Click on **Create Model** to export the model. This should bring up a file explorer pop-up (make sure pop-ups are enabled in your browser) as shown below. Save the compressed model file. It contains the ACHE classifier model, the training data for the model and the initial seed list required for crawling.

.. image:: model_download.png
   :width: 800px
   :align: center
   :height: 400px
   :alt: alternate text
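
After saving the compressed model file, it can be useful to confirm that it contains the three kinds of content described above. The following is a minimal sketch, assuming the download is a zip archive (the documentation only says "compressed"). The member names used in the demo archive are hypothetical placeholders, not the names DDT actually writes.

```python
import io
import zipfile


def list_model_archive(path_or_file):
    """Return the sorted member names of a zip archive.

    Accepts a filesystem path or a file-like object.
    Assumption: the exported model is a zip file.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        return sorted(archive.namelist())


def build_demo_archive():
    """Build an in-memory stand-in for an exported model.

    The member names below are hypothetical; they only mirror the three
    kinds of content the export is described as containing.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("classifier_model.bin", b"placeholder model")
        archive.writestr("training_data/page_1.html", b"<html></html>")
        archive.writestr("seeds.txt", b"http://example.com/\n")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_model_archive(build_demo_archive()):
        print(name)
```

To inspect a real export, pass the path of the saved file to ``list_model_archive`` instead of the demo buffer.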

Annotation
~~~~~~~~~~

Currently, pages can be annotated as Relevant, Irrelevant or Neutral. The |tag_all| buttons apply the corresponding tag to all pages in the current view, and the |tag_one| buttons tag individual pages. Annotations are used to build the domain model.

.. |tag_all| image:: tag_all.png

.. |tag_one| image:: tag_one.png

Note:

* At least 10 relevant and 10 irrelevant pages should be annotated to build the model. More annotations give better coverage of the domain and therefore a better domain model.
* Ensure that the relevant and irrelevant page annotations are balanced for a better model.
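
The two notes above can be sketched as a simple check over a list of annotation tags. This is an illustration only, not DDT code: the function name, the tag strings as plain Python strings, and the ``max_ratio`` threshold used to decide what counts as "balanced" are all assumptions. Only the minimum of 10 pages per class comes from the note.

```python
from collections import Counter

MIN_PER_CLASS = 10  # minimum relevant and irrelevant pages, from the note


def annotation_readiness(tags, min_per_class=MIN_PER_CLASS, max_ratio=2.0):
    """Summarize whether a set of annotations is sufficient and balanced.

    tags: iterable of "Relevant", "Irrelevant" or "Neutral" strings.
    max_ratio: hypothetical threshold; the larger class may be at most
    this many times the size of the smaller one to count as balanced.
    """
    counts = Counter(tags)
    relevant = counts.get("Relevant", 0)
    irrelevant = counts.get("Irrelevant", 0)

    enough = relevant >= min_per_class and irrelevant >= min_per_class

    smaller = min(relevant, irrelevant)
    larger = max(relevant, irrelevant)
    # With an empty class there is nothing to balance against.
    balanced = smaller > 0 and larger / smaller <= max_ratio

    return {
        "relevant": relevant,
        "irrelevant": irrelevant,
        "enough": enough,
        "balanced": balanced,
    }


if __name__ == "__main__":
    sample = ["Relevant"] * 12 + ["Irrelevant"] * 10 + ["Neutral"] * 5
    print(annotation_readiness(sample))
```

Neutral pages are counted by ``Counter`` but ignored by both checks, since the notes only refer to relevant and irrelevant annotations.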


Binary file added docs/model_download.png
Binary file added docs/model_dropdown.png
Binary file added docs/tag_all.png
Binary file added docs/tag_one.png
13 changes: 2 additions & 11 deletions docs/use.rst
@@ -5,17 +5,8 @@ Now you should be able to head to http://<hostname>:8084/ to interact with the t

.. include:: add_domain.rst
.. include:: load_data.rst
.. include:: filter.rst

Annotation
----------

Currently, pages can be annotated as Relevant, Irrelevant or Neutral. Annotations are used to build a domain model.

Domain Model
------------

DDT incrementally builds a model as the user annotates the retrieved pages. The accuracy of the domain model is displayed on the top right corner. It provides an indication of the model coverage of the domain and how it is influenced by annotations.
.. include:: filter.rst
.. include:: create_model.rst

Run Crawler
-----------
