
No database dependency during inference/serving #316

Closed
HiromuHota opened this issue Sep 26, 2019 · 3 comments · Fixed by #368

Comments

@HiromuHota (Contributor)

Is your feature request related to a problem? Please describe.

I think a Fonduer-based app's lifecycle has three phases: development, training, and serving.
During development, you may go through many iterations of redefining mention/candidate subclasses and labeling functions, which involve re-parsing, re-extraction, re-labeling, etc.
I understand that it is important to persist intermediate results to a database to save time.
Depending on the size of the training dataset, you may need to persist intermediate results during training too.

On the other hand, there is no need to persist intermediate results to a database during inference/serving.
The dependency on a database (PostgreSQL) makes a Fonduer-based app less portable and less scalable.

Describe the solution you'd like

I'd like no database dependency during inference/serving.

Describe alternatives you've considered

SQLite is easier to install than PostgreSQL, but even then there is no need to persist intermediate results to a database during inference/serving.

Additional context

Related to #137.

@HiromuHota (Contributor, Author)

As one step further towards this goal, I'd like to make all the child classes of UDF, like ParserUDF, unaware of the database. Meanwhile, the child classes of UDFRunner, like Parser, will remain intact to maintain the public APIs.

Currently, the apply method of each child class of UDF either returns objects that are to be saved to the database:

    ParserUDF.apply(self, doc: Document, ...) -> Iterator[Sentence]
    MentionExtractorUDF.apply(self, doc: Document, ...) -> Iterator[Mention]
    CandidateExtractorUDF.apply(self, doc: Document, ...) -> Iterator[Candidate]

or returns nothing, because the objects are saved to the database within the method:

    LabelerUDF.apply(self, doc: Document, ...) -> None
    FeaturizerUDF.apply(self, doc: Document, ...) -> None

To make these methods unaware of the database:

  1. They should not use the session object.
  2. They should return objects that will be used by the following process.

For example:

    ParserUDF.apply(self, doc: Document, ...) -> Document
    MentionExtractorUDF.apply(self, doc: Document, ...) -> Document
    CandidateExtractorUDF.apply(self, doc: Document, ...) -> Document

and (with less confidence):

    LabelerUDF.apply(self, doc: Document, ...) -> np.ndarray
    FeaturizerUDF.apply(self, doc: Document, ...) -> csr_matrix

@senwu, @lukehsiao, thoughts?

@lukehsiao (Contributor)

At a high level, this sounds like an excellent idea to me. I wonder if/how this might affect performance as well. Perhaps we can get less lock contention if the different UDFs are not using the session object directly.

@HiromuHota (Contributor, Author)

Good point.
With this approach, the database will be accessed only by a single process, the one that runs UDFRunner.apply. So yes, much less, or even no, lock contention at the database.
The downside is the overhead of transferring objects from each UDF to the UDFRunner, which I hope is cheaper than database lock contention.

Also, we have to be careful not to run out of memory when a large corpus is processed.

As you may have noticed, the architecture looks like map-reduce (mapper: each UDF; reducer: UDFRunner).
In fact, snorkel-extraction does exactly this; see snorkel-extraction/snorkel/udf.py.
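A minimal sketch of that map-reduce shape, using a multiprocessing pool: worker processes run the database-unaware computation (map), and only the parent process collects and persists results (reduce), so no two processes ever contend for a database lock. The parse function here is a hypothetical stand-in for any UDF.apply.

```python
from multiprocessing import Pool


def parse(text: str) -> int:
    """Map step: pure computation, no session, no shared state."""
    return len(text.split())


if __name__ == "__main__":
    corpus = ["one two three", "four five", "six"]
    results = {}
    with Pool(2) as pool:
        # imap yields results lazily as workers finish, which helps bound
        # memory on a large corpus compared to collecting everything at once.
        for doc, n in zip(corpus, pool.imap(parse, corpus)):
            results[doc] = n  # reduce step: a single process persists results
    print(results)
```

The lazy iteration over pool.imap also speaks to the memory concern above: the parent can persist and discard each result as it arrives instead of buffering the whole corpus.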
