Release new version of system to master branch (by May 4)
Closed May 7, 2014
100% complete
TODOs for the next code push
Must fix before pushing
Documentation
- ID convention: require developers to create an "id bigint" column in variable tables, but not to use the column
- inference rules convention: check and update
- extractors (Done in http://deepdive.stanford.edu/doc/extractors.html; needs review)
- New configurations supported:
  - skip_learning
  - weight_table
  - relearn_from
Known issues in code
- The default extractor (udf_extractor) still assigns "id" to the output JSON.
- Greenplum parallel load / unload is not implemented yet in tsv_extractor
Testing, documentation, and code review for the following components:
- New extractor path 1: plpy_extractor
- New extractor path 2: tsv_extractor
- Extractor path 3: sql_extractor
- Extractor path 4: cmd_extractor
- Grounding
Test that all examples work
- pay attention to the OCR example, which has 2 variable tables
- spouse_example contains 3 implementations with different extractor frameworks.
Optional
Go through the whole website
More tests for plpy_extractor
- Test extreme cases for input queries
- Test extreme cases for UDFs
Write a debugger for plpy_extractor
Add new unit tests
- for all pipeline configurations
- for all extractors
- checking before and after scripts for all extractors
- checking extreme input_batch_size for tsv_extractor
- disable output_batch_size for tsv_extractor