DeepDive Spouse Example with Chinese Support
This app aims to take the official DeepDive tutorial Extracting mentions of spouses from the news and change input into 100 Chinese news articles.
Chinese NIP Configuration
echo "postgresql://$USER@$HOSTNAME:5432/deepdive_spouse_$USER" >db.url
stanford-chinese-corenlp-2016-01-19-models.jar from stanford NLP website and put them under
udf/bazzar/parser/, please run
sbt/sbt stage to rebuild the project. To test out, you can run
./run.sh -p 8080, and POST anything to port 8080 to see the result.
You can use a tool with GUI like http-tool in Mozilla Firefox.
Run the Project
This project tries to use original project configuration as much as possible. But some changes in project command are necessary:
The project compilation command
are only required to executed at startup. In Section 1.1, to load raw input data: Change
deepdive do articles
deepdive create table articles deepdive load articles input/news-100.tsv.bz2
In Section 2.1, we replace people name list from DBpedia to manual label from 100 new articles. So no SQL queries are need, you can run
deepdive create table spouses_dbpedia deepdive load spouses_dbpedia input/spouse_dbpedia.csv.bz2
and continue with next query:
deepdive query '| 20 ?- spouses_dbpedia(name1, name2).'
Java Memory Error
If you encountered
java.lang.OutOfMemoryError or similar error in NLP processing,
/udf/bazaar/parser/run.sh and change
export JAVA_OPTS="-Xmx3g -Dfile.encoding=UTF-8"
into a lower number.
If you want to execute a command twice due to previous abortion or modification,
sometimes the data are not updated completely even with
To resolve that issue, you can keep track of
and check whether done time of some command is not changed.
If so, you may have to run that command with
deepdive redo or
even they are not included in tutorial.
When I want to execute
deepdive do has_spouse again
due to changes in
udf/supervise_spouse.py, nothing happened.
deepdive do spouse_label__0 can refresh the project status.
User-defined Function Debugging
Because of DeepDive's own rules, user-defined function under
will take one row from database as input and return several rows
which will be passed to other function immediately.
If you want to view debug information, do not use
logging module instead.