Create example notebook with English Web Treebank data set #193

frreiss · 2021-05-26T18:15:08Z

Create an example notebook that demonstrates how to analyze the English Web Treebank (EWT) data set.

Major steps:

1. Download the data set from https://github.com/UniversalDependencies/UD_English-EWT
2. Read the data set into DataFrames
3. Write the entire data set to a Feather file and read it back in
4. Display a parse tree
5. Retokenize with a BERT subword tokenizer
6. Show how to reconstruct a sentence's span using group-by and aggregation
7. Run the document text through the Stanza EWT dependency parser (https://stanfordnlp.github.io/stanza/available_models.html) and compare its output against the gold standard. Alternatively, use spaCy's parser, with the caveat that it is trained on OntoNotes, which uses a slightly different schema.
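For the step that reads the data set into DataFrames, a minimal stdlib-only sketch of parsing CoNLL-U text (the format the UD_English-EWT repository distributes) into one record per word might look like the following. The function name `read_conllu` and the `sentence_id` field are hypothetical, and the sketch simply skips multiword-token ranges and empty nodes rather than modeling them.

```python
from typing import Dict, List

# The 10 tab-separated columns defined by the CoNLL-U format.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> List[Dict[str, object]]:
    """Parse CoNLL-U text into one dict per word, tagged with a sentence index."""
    rows: List[Dict[str, object]] = []
    sentence_id = 0
    saw_word = False
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if saw_word:
                sentence_id += 1
                saw_word = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # hypothetical simplification: skip multiword ranges and empty nodes
        row: Dict[str, object] = dict(zip(CONLLU_FIELDS, cols))
        row["id"] = int(cols[0])
        row["head"] = int(cols[6])
        row["sentence_id"] = sentence_id
        rows.append(row)
        saw_word = True
    return rows


sample = (
    "# text = I like tea.\n"
    "1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t_\t_\n"
    "2\tlike\tlike\tVERB\tVBP\t_\t0\troot\t_\t_\n"
    "3\ttea\ttea\tNOUN\tNN\t_\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
)
rows = read_conllu(sample)
print(len(rows), rows[1]["form"], rows[1]["deprel"])  # 4 like root
```

Passing the resulting list to `pandas.DataFrame(rows)` would then give one DataFrame row per word, with `sentence_id` available as a grouping key.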
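For the step that displays a parse tree, one option that needs no plotting library is to walk the head pointers down from CoNLL-U's artificial root (head 0) and print an indented outline. This is only an assumption about one possible text rendering, not the notebook's actual visualization; the `render_tree` helper is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Word = Tuple[int, str, int, str]  # (id, form, head, deprel)


def render_tree(words: List[Word]) -> str:
    """Render (id, form, head, deprel) rows as an indented dependency tree."""
    children: Dict[int, List[Word]] = defaultdict(list)
    for word in words:
        children[word[2]].append(word)  # index each word under its head
    lines: List[str] = []

    def visit(head_id: int, depth: int) -> None:
        # Visit dependents in surface order (sorted by word id).
        for word_id, form, _, deprel in sorted(children[head_id]):
            lines.append("    " * depth + f"{form} ({deprel})")
            visit(word_id, depth + 1)

    visit(0, 0)  # head 0 is the artificial root in CoNLL-U
    return "\n".join(lines)


words = [(1, "I", 2, "nsubj"), (2, "like", 0, "root"),
         (3, "tea", 2, "obj"), (4, ".", 2, "punct")]
print(render_tree(words))
```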
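After retokenizing with a BERT subword tokenizer, each subword needs to be mapped back to the EWT token it came from. Assuming the tokenizer reports a character offset pair for each subword (Hugging Face fast tokenizers can return these via `return_offsets_mapping=True`), a simple overlap join is enough. The `align_subwords` helper and the subword offsets in the example are made up for illustration; in particular, the split of "tea" into two pieces is hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open character span [begin, end)


def align_subwords(tokens: List[Span], subwords: List[Span]) -> List[Optional[int]]:
    """Map each subword span to the index of the first token it overlaps, or None."""
    result: List[Optional[int]] = []
    for sub_begin, sub_end in subwords:
        match: Optional[int] = None
        for i, (tok_begin, tok_end) in enumerate(tokens):
            # Half-open spans overlap when each starts before the other ends.
            if sub_begin < tok_end and tok_begin < sub_end:
                match = i
                break
        result.append(match)  # zero-width spans (e.g. special tokens) stay None
    return result


tokens = [(0, 1), (2, 6), (7, 10), (10, 11)]  # "I", "like", "tea", "."
# Hypothetical offsets: [CLS], "i", "like", "te", "##a", ".", [SEP]
subwords = [(0, 0), (0, 1), (2, 6), (7, 9), (9, 10), (10, 11), (0, 0)]
print(align_subwords(tokens, subwords))  # [None, 0, 1, 2, 2, 3, None]
```

The nested loop is quadratic, which is fine for a sentence-sized sketch; a real notebook working over the whole corpus would likely want a sorted merge instead.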
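Reconstructing a sentence's span by group-by and aggregation amounts to grouping tokens by sentence and taking the minimum begin offset and maximum end offset in each group. The sketch below assumes each token already carries hypothetical `begin`/`end` character offsets into the document text; CoNLL-U does not store these directly, so they would have to be computed from the word forms and `SpaceAfter=No` annotations. In pandas, the same aggregation could be written as `tokens_df.groupby("sentence_id").agg(begin=("begin", "min"), end=("end", "max"))`.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Tuple


def sentence_spans(tokens: List[Dict[str, int]]) -> Dict[int, Tuple[int, int]]:
    """Group tokens by sentence and aggregate their offsets into one span each."""
    key = itemgetter("sentence_id")
    spans: Dict[int, Tuple[int, int]] = {}
    # itertools.groupby only merges adjacent items, so sort by the key first.
    for sent_id, group in groupby(sorted(tokens, key=key), key=key):
        members = list(group)
        spans[sent_id] = (min(t["begin"] for t in members),
                          max(t["end"] for t in members))
    return spans


doc_text = "I like tea. You like coffee."
tokens = [
    {"sentence_id": 0, "begin": 0, "end": 1},
    {"sentence_id": 0, "begin": 2, "end": 6},
    {"sentence_id": 0, "begin": 7, "end": 10},
    {"sentence_id": 0, "begin": 10, "end": 11},
    {"sentence_id": 1, "begin": 12, "end": 15},
    {"sentence_id": 1, "begin": 16, "end": 20},
    {"sentence_id": 1, "begin": 21, "end": 27},
    {"sentence_id": 1, "begin": 27, "end": 28},
]
spans = sentence_spans(tokens)
for sent_id, (begin, end) in spans.items():
    print(sent_id, repr(doc_text[begin:end]))
# 0 'I like tea.'
# 1 'You like coffee.'
```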
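Comparing a parser's output against the gold standard is commonly summarized with unlabeled and labeled attachment scores (UAS/LAS): the fraction of words whose head is correct, and the fraction whose head and relation label are both correct. This sketch assumes the predicted parse has already been converted to `(head, deprel)` pairs aligned one-to-one with the gold words, for example by running Stanza on the gold tokenization via its `tokenize_pretokenized=True` option. `attachment_scores` is a hypothetical helper, and a full evaluation would also have to handle tokenization mismatches.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency relation)


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for token-aligned gold and predicted arcs."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("expected non-empty, token-aligned gold and predicted parses")
    n = len(gold)
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    label_hits = sum(g == p for g, p in zip(gold, pred))       # head and label
    return head_hits / n, label_hits / n


gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (3, "punct")]
print(attachment_scores(gold, pred))  # (0.75, 0.5)
```

The same comparison applies to spaCy output, but because its labels follow a different scheme, LAS against EWT gold labels would understate its accuracy unless the labels are mapped first.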

frreiss · 2021-06-15T18:44:17Z

Closed by #198

frreiss closed this as completed Jun 15, 2021
