Merge branch 'feature/hypothesis-fit-and-finish' into develop
felliott committed Jul 19, 2018
2 parents 59f4b17 + 6ca6f6a commit c8be470
Showing 5 changed files with 49 additions and 6 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -24,6 +24,7 @@ Guide
install
quickstart
overview
integrations
code

Project info
40 changes: 40 additions & 0 deletions docs/integrations.rst
@@ -0,0 +1,40 @@
.. _integrations:

Integrations
============


Hypothes.is annotator
---------------------

MFR supports loading the `Hypothes.is <https://hypothes.is/>`_ annotation sidebar on pdfs and files converted to pdf. Hypothes.is allows users to publicly comment and converse on internet-accessible files. The annotator is not automatically loaded; the parent frame must signal the MFR iframe to turn it on. MFR also overrides some of the properties used by the sidebar to identify the annotation.


Enabling
^^^^^^^^

The annotator is not loaded automatically for every MFR pdf render. The parent frame will need to send the ``startHypothesis`` event to the MFR iframe to start loading the annotator. If the iframe is created via ``mfr.js``, then this signal can be sent by calling ``.startHypothesis()`` on the Render object. If ``mfr.js`` is not used, then the signal can be sent by calling ``.postMessage()`` on the iframe:

.. code-block:: javascript

    $('iframe')[0].contentWindow.postMessage('startHypothesis', mfrUrl);

When the iframe receives this event, it will override the pdf.js metadata that the annotator extracts, then inject the hypothes.is loader script into the iframe.

Hypothes.is support can be completely disabled by setting the ``ENABLE_HYPOTHESIS`` flag to ``False`` in the pdf extension settings (``mfr.extensions.pdf.settings``). If running via the OSF's docker-compose, add ``PDF_EXTENSION_CONFIG_ENABLE_HYPOTHESIS=0`` to ``.docker-compose.mfr.env`` in the osf.io repo and recreate the container. If this flag is turned off, sending the ``startHypothesis`` event to the iframe will do nothing.
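
As a rough illustration of how a value such as ``0`` in an env file can end up as a boolean setting, the sketch below parses an environment-style string into ``True`` or ``False``. It is not MFR's actual ``config.get_bool`` implementation: the ``env_flag`` helper and the ``TRUTHY`` set are hypothetical, and the default of ``True`` is an assumption about what happens when the variable is unset.

```python
import os

# Assumption: these spellings count as "on"; MFR's own parser may accept others.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default, environ=None):
    """Return a boolean for an environment-style setting, or `default` if unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


# Hypothetical environment mirroring the docker-compose example above.
env = {"PDF_EXTENSION_CONFIG_ENABLE_HYPOTHESIS": "0"}
enable_hypothesis = env_flag("PDF_EXTENSION_CONFIG_ENABLE_HYPOTHESIS", True, env)
```

With the variable set to ``0`` the flag resolves to ``False``; with the variable absent, the sketch falls back to the supplied default.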


Annotator metadata
^^^^^^^^^^^^^^^^^^

The annotator client links annotations to both the url of the document and an identifier embedded in the pdf. It also attaches the page title as metadata to the annotation. [#f1]_ In MFR, all three of these may be unsuitable for one reason or another, so MFR will override the properties that the client retrieves to provide more appropriate values. These properties are:

**URL**: The MFR url can be complex, especially since it takes another url as a query parameter. Hypothes.is can handle reordering of the top-level parameters, but any change to the internal url will be treated as a new url, causing annotations to be lost. In addition, Hypothes.is uses the url to provide share links and "view-in-context" links. Visiting an MFR render url will load the iframe, but without an external frame to send the ``startHypothesis`` signal, the annotations will never be loaded. Visiting an MFR export url will start a download of the document, with no chance of showing annotations. Instead, MFR sets the annotation url to the url of the parent frame, which is expected to be simpler and provide more context.

**Document ID**: The document ID is an identifier embedded in the pdf. pdf.js will extract this value or, if it is not present, return the md5 hash of the first 1024 bytes of data in the pdf. User-provided pdfs will *usually* contain IDs, but may not. If the pdf is updated, there is no guarantee that the ID will be preserved across revisions. If the ID changes, the document could lose its annotations. pdfs exported by LibreOffice do not contain any identifiers and may change unpredictably. For these reasons, MFR exports a stable identifier that should persist across revisions. The stable ID is defined by the auth provider. The OSF auth provider uses a hash of file metadata that is particular to that file and unlikely to change. MFR does not modify the file; instead, it overwrites the identifier detected by pdf.js, which is then read by the annotator client.
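
The identifier fallback described above can be sketched in Python to show why a stable ID helps. This is only an illustration of the described behaviour, not the pdf.js or MFR code: the function names and the sample byte strings are hypothetical, and the only detail taken from the text is the md5 hash over the first 1024 bytes.

```python
import hashlib


def fallback_fingerprint(pdf_bytes):
    """md5 hex digest of the first 1024 bytes, used when no embedded ID exists."""
    return hashlib.md5(pdf_bytes[:1024]).hexdigest()


def detected_document_id(embedded_id, pdf_bytes):
    """What the annotator would see without any override (illustrative)."""
    return embedded_id if embedded_id else fallback_fingerprint(pdf_bytes)


def annotation_document_id(stable_id, embedded_id, pdf_bytes):
    """Prefer a provider-defined stable ID over whatever was detected (illustrative)."""
    return stable_id if stable_id else detected_document_id(embedded_id, pdf_bytes)


# Two hypothetical revisions of a file with no embedded ID.
revision_1 = b"%PDF-1.4 first revision" + b"\x00" * 2000
revision_2 = b"%PDF-1.4 second revision" + b"\x00" * 2000

ids_without_override = {
    detected_document_id(None, revision_1),
    detected_document_id(None, revision_2),
}
ids_with_override = {
    annotation_document_id("stable-id-from-provider", None, revision_1),
    annotation_document_id("stable-id-from-provider", None, revision_2),
}
```

Without the override the two revisions produce two different identifiers, so annotations made on the first would not attach to the second; with the stable ID both revisions map to the same identifier.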

**Title**: The annotator will derive the annotation page title from the pdf title. Similar to Document IDs, user-provided pdfs may or may not have a title. LibreOffice-exported pdfs do not have an embedded title. If an embedded title isn't found, the annotator will fall back to the iframe document's title, which, if not set, will default to the path part of the iframe url. This results in annotation titles of "render" or "export", with no distinguishing attributes from other MFR annotations. MFR works around this by updating the pdf.js-detected title and page title with the source file's name.
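
The title fallback chain can be sketched the same way. The helper below is hypothetical and only mirrors the order described above (embedded pdf title, then iframe document title, then the path part of the iframe url); the example url is made up, and taking the last path segment is an assumption about how the path becomes "render".

```python
from urllib.parse import urlparse


def fallback_title(pdf_title, iframe_title, iframe_url):
    """Pick a title in the order described: pdf title, iframe title, url path."""
    if pdf_title:
        return pdf_title
    if iframe_title:
        return iframe_title
    # Assumption: the path-based default is the last path segment of the url.
    path = urlparse(iframe_url).path
    return path.rstrip("/").rsplit("/", 1)[-1]


# Hypothetical MFR render url for a pdf with no embedded title.
render_url = "https://mfr.example.org/render?url=https%3A%2F%2Fexample.org%2Freport.pdf"

title_without_override = fallback_title(None, None, render_url)
# With the override, the source file's name is supplied as the detected title.
title_with_override = fallback_title("report.pdf", None, render_url)
```

In this sketch the untitled pdf ends up with the title "render", while supplying the file name up front yields a title that distinguishes it from other MFR annotations.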

.. rubric:: Footnotes

.. [#f1] If the page title changes between annotations, the client will send the new page title with new annotations, but the hypothesis aggregator will discard that and `use the first title received <https://github.com/hypothesis/h/blob/8410ff35150ea600c02458e4558a67db7c926816/h/activity/bucketing.py#L27>`_ for that identifier.
10 changes: 6 additions & 4 deletions mfr/extensions/pdf/render.py
@@ -21,6 +21,9 @@ class PdfRenderer(extension.BaseRenderer):
def render(self):

download_url = munge_url_for_localdev(self.metadata.download_url)
escaped_name = escape_url_for_template(
'{}{}'.format(self.metadata.name, self.metadata.ext)
)
logger.debug('extension::{} supported-list::{}'.format(self.metadata.ext,
settings.EXPORT_SUPPORTED))
if self.metadata.ext.lower() not in settings.EXPORT_SUPPORTED:
@@ -29,9 +32,7 @@ def render(self):
base=self.assets_url,
url=escape_url_for_template(download_url.geturl()),
stable_id=self.metadata.stable_id,
-file_name=escape_url_for_template(
-'{}{}'.format(self.metadata.name, self.metadata.ext)
-),
+file_name=escaped_name,
enable_hypothesis=settings.ENABLE_HYPOTHESIS,
)

@@ -48,7 +49,8 @@ def render(self):
base=self.assets_url,
url=escape_url_for_template(exported_url.url),
stable_id=self.metadata.stable_id,
-enable_hypothesis=settings.ENABLE_HYPOTHESIS
+file_name=escaped_name,
+enable_hypothesis=settings.ENABLE_HYPOTHESIS,
)

@property
2 changes: 1 addition & 1 deletion mfr/extensions/pdf/settings.py
@@ -7,7 +7,7 @@
assert EXPORT_TYPE # mandatory config
EXPORT_MAXIMUM_SIZE = config.get('EXPORT_MAXIMUM_SIZE', '1200x1200')

-ENABLE_HYPOTHESIS = config.get_bool('ENABLE_HYPOTHESIS', False)
+ENABLE_HYPOTHESIS = config.get_bool('ENABLE_HYPOTHESIS', True)

# supports multiple files in the form of a space separated string
EXPORT_SUPPORTED = config.get('EXPORT_SUPPORTED', '.tiff .tif').split(' ')
2 changes: 1 addition & 1 deletion tests/extensions/pdf/test_renderer.py
@@ -86,7 +86,7 @@ def test_render_pdf_with_single_quote_in_name(self, assets_url):
def test_render_tif(self, tif_renderer, assets_url):
exported_url = furl.furl(tif_renderer.export_url)
exported_url.args['format'] = '{}.{}'.format(settings.EXPORT_MAXIMUM_SIZE,
-settings.EXPORT_TYPE)
+settings.EXPORT_TYPE)

body = tif_renderer.render()
assert '<base href="{}/{}/web/" target="_blank">'.format(assets_url, 'pdf') in body