0.3.2

@tanaysoni tanaysoni released this 28 Nov 11:08
· 538 commits to master since this release

🖌️ Fundamental Re-design of Question Answering

We believe QA is one of the most exciting tasks for transfer learning. However, the complexity of the task means that pipelines easily become messy, complicated, and slow. This is unacceptable in production settings and creates a high barrier for developers who want to modify or improve them.

We put substantial effort into re-designing QA in FARM with two goals in mind: making it the simplest and the fastest pipeline out there.
Results:

  • 💡 Simplicity: The pipeline is cleaner, more modular and easier to extend.
  • 🚀 Speed: Preprocessing of SQuAD 2.0 is down to 42s on an AWS p3.8xlarge (vs. ~20 min in transformers and earlier versions of FARM). This will not only speed up training cycles and reduce GPU costs, but also has a big impact at inference time, where most time is actually spent on preprocessing.

See this blog post for more details and to learn about the key steps in a QA pipeline.
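One of the key preprocessing steps in any extractive QA pipeline is splitting long documents into overlapping passages that fit the model's maximum sequence length. As a rough illustration of the idea (the function name and parameters below are hypothetical, not FARM's actual API):

```python
def split_into_passages(tokens, max_len=384, stride=128):
    """Split a token sequence into overlapping windows, as done in
    SQuAD-style preprocessing. Window starts advance by `stride` tokens,
    so consecutive passages overlap by max_len - stride tokens."""
    passages = []
    start = 0
    while True:
        passages.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += stride
    return passages


# A 1000-token document becomes several overlapping 384-token passages,
# so an answer near a window boundary is still fully contained in one passage.
passages = split_into_passages(list(range(1000)))
```

Doing this splitting on tensors in batches, rather than passage by passage in Python, is one of the levers for the kind of speedup reported above.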

💼 Support of proxy servers

Good news for our corporate users: many of you told us that the automated downloads of datasets and models caused problems in environments with proxy servers. You can now pass proxy details to Processor and LanguageModel in the format used by the requests library.

Example:

```python
proxies = {"https": "http://user:pass@10.10.10.10:8000"}

language_model = LanguageModel.load(pretrained_model_name_or_path="bert-base-cased",
                                    language="english",
                                    proxies=proxies)
...
processor = BertStyleLMProcessor(data_dir="data/lm_finetune_nips",
                                 tokenizer=tokenizer,
                                 max_seq_len=128,
                                 max_docs=25,
                                 next_sent_pred=True,
                                 proxies=proxies)
```

Modelling

  • [enhancement] QA redesign #151
  • [enhancement] Add backwards compatibility for loading prediction head #159
  • [enhancement] Raise an Exception when an invalid path is supplied for loading a saved model #137
  • [bug] fix context in QA formatted preds #163
  • [bug] Fix loading custom vocab in transformers style for LM finetuning #155

Data Handling

  • [enhancement] Allow to load dataset from dicts in DataSilo #127
  • [enhancement] Option to supply proxy server #136
  • [bug] Fix tokenizer for multiple whitespaces #156

Inference

  • [enhancement] Change context in QA formatted preds to not split words #138

Other

  • [enhancement] Add test for output format of QA Inferencer #149
  • [bug] Fix classification report for multilabel #150
  • [bug] Fix inference in doc_classification_cola example #147

Thanks to all contributors for making FARMer's life better!
@johann-petrak, @brandenchan, @tanaysoni, @Timoeller, @tholor, @cregouby