Release new version of system to master branch (by May 4)

Closed

May 7, 2014

100% complete

TODOs for the next code push

Must fix before pushing

Documentation

ID convention: require developers to create an "id bigint" column in variable tables, but not to use that column
inference rules convention: check and update
extractors (Done in http://deepdive.stanford.edu/doc/extractors.html; needs review)
New configurations supported
- skip_learning
- weight_table
- relearn_from
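
A minimal sketch of the ID convention above, for the documentation. It assumes the reserved "id" column is left NULL by developer code; SQLite stands in for the real database only so the example is self-contained, and the table name, column names, and both helper functions are hypothetical, not part of the system.

```python
import sqlite3


def create_variable_table(conn, name, columns):
    """Create a variable table whose first column is the reserved "id bigint".

    `columns` is a list of (column_name, sql_type) pairs supplied by the developer.
    """
    cols = ", ".join(["id bigint"] + [f"{c} {t}" for c, t in columns])
    conn.execute(f"CREATE TABLE {name} ({cols})")


def insert_row(conn, name, row):
    """Insert developer-supplied values only; the reserved id is never assigned."""
    if "id" in row:
        raise ValueError('"id" is reserved; do not assign it')
    names = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT INTO {name} ({names}) VALUES ({marks})", list(row.values())
    )


# Hypothetical variable table, loosely modelled on a spouse-style example.
conn = sqlite3.connect(":memory:")
create_variable_table(
    conn,
    "has_spouse",
    [("person1", "text"), ("person2", "text"), ("is_true", "boolean")],
)
insert_row(conn, "has_spouse", {"person1": "a", "person2": "b", "is_true": True})
```

The column exists in the schema, but rows written by developer code leave it empty, which is the behaviour the convention asks for.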

Known issues in code

The default extractor (udf_extractor) still assigns "id" in its output JSON.
Greenplum parallel load / unload is not yet implemented in tsv_extractor.

Test, document, and code-review the following components:

New extractor path 1: plpy_extractor
New extractor path 2: tsv_extractor
Extractor path 3: sql_extractor
Extractor path 4: cmd_extractor
Grounding

Test to make sure all examples work

Pay attention to the OCR example, which has 2 variable tables.
spouse_example contains 3 implementations with different extractor frameworks.

Optional

Go through whole website
More tests for plpy_extractor
- Test extreme cases for input queries
- Test extreme cases for UDFs
Write a debugger for plpy_extractor
Add new unit tests
- for all pipeline configurations
- for all extractors
- checking before and after scripts for all extractors
- checking extreme input_batch_size for tsv_extractor
- disable output_batch_size for tsv_extractor

This milestone is closed.
