GitHub - marklogic-community/DocumentDiscovery: A DataDiscovery application for binary documents

Overview

This is a relatively simple application that was built as a customized Application Builder app. It uses the ISYS filters to ingest arbitrary binary documents, makes those documents searchable, exposes facets based on their metadata, and provides access to download the original documents. It is meant as a simple tool for data discovery and as a jumping-off point for more complex document discovery applications.

Installation

First, import the DocumentDiscovery package (DocumentDiscovery.xml, located in the config directory) via the MarkLogic Configuration Manager. This creates a Document Discovery database and a Document Discovery HTTP server. Next, install content processing on this database (see section 3.5 of the Content Processing Framework Guide, http://docs.marklogic.com/guide/cpf). Then install the pipeline from the included ingestPipeline.zip file. The easiest way is to unzip the packaged files into a directory named "ingest" (you will have to create this directory) directly under the modules directory of your MarkLogic installation and then follow the normal pipeline loading process. Finally, ensure that only the following pipelines are enabled for the default domain in your DocumentDiscovery database (DO NOT ENABLE DOCUMENT CONVERSION):

Document Filtering (XHTML)
Meta Pipeline
Status change handling
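The unzip step described above can be scripted. The following is a minimal sketch in Python, not part of this repository: the helper name `unpack_pipeline` is hypothetical, and the location of the modules directory depends on your installation. It only creates the "ingest" directory and extracts the archive into it; loading the pipeline still follows the normal process.

```python
import zipfile
from pathlib import Path


def unpack_pipeline(zip_path, modules_dir):
    """Extract the pipeline archive into <modules_dir>/ingest and return that path.

    Hypothetical helper: both arguments are supplied by the caller, since the
    modules directory differs between installations.
    """
    ingest_dir = Path(modules_dir) / "ingest"
    # The "ingest" directory does not exist by default, so create it first.
    ingest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Extracts the archive contents as-is, preserving any internal folders.
        archive.extractall(ingest_dir)
    return ingest_dir
```

A call might look like `unpack_pipeline("ingestPipeline.zip", "/opt/MarkLogic/Modules")`, where the modules path is an assumed Linux default; substitute the path for your own installation. The script needs write permission on that directory.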

Next, check out the source code located in the src directory to a directory on your local file system, and then modify the DocumentDiscovery app server to point to this directory. Ensure that MarkLogic has the necessary file permissions on the directory.

That's it! You're done! Now you can load data using the admin console, Information Studio or any other means and instantly be able to view and search the binary documents!

Detailed Notes

The Author facet is a field-based facet that combines the Last_Author and Typist values from the extracted metadata.

If you see raw MIME types starting to show up under document type, you can add a display name for the MIME type by adding the necessary entry to the app:get-name function in custom/appfunctions.xqy at line 293. Credit to Paxton Hare and Chhean Saur for work on this function.
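The behavior described above (known MIME types get a friendly name, unknown ones show up unchanged) amounts to a lookup with a fallback. The sketch below illustrates that logic in Python only; the real function is written in XQuery, its actual structure may differ, and the entries and the name `get_name` here are hypothetical.

```python
# Hypothetical entries for illustration; the real mappings live in the
# app:get-name function in custom/appfunctions.xqy.
PRETTY_NAMES = {
    "application/pdf": "PDF",
    "application/msword": "Word Document",
}


def get_name(mime_type):
    """Return a display name for a MIME type, or the raw type if none is registered."""
    # Falling back to the raw value is why unmapped MIME types appear in the facet.
    return PRETTY_NAMES.get(mime_type, mime_type)
```

Adding an entry for a newly seen MIME type is then a one-line change to the mapping, which mirrors the kind of edit the note above asks for in the XQuery function.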
