task19

Extracting non-textuals from ParlaMint-{HR,BA,RS}

2022-11-17T09:06:33

Idea: open a component file. For every utterance, join its segments back into the full utterance text, extract the non-textual elements, then rebuild the utterance and renumber the component.
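A minimal sketch of the per-utterance step, not the actual fixer. It assumes un-namespaced `<u>` and `<seg>` elements, treats anything in parentheses as non-textual (a hypothetical stand-in for the real patterns), and appends the extracted notes after the text instead of keeping their original position. The function name `rebuild_utterance` is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: any parenthesised span counts as non-textual.
NOTE = re.compile(r"\(([^()]*)\)")

def rebuild_utterance(u):
    """Join the <seg> children of <u>, pull out notes, rebuild <u> in place."""
    # 1. Reconstruct the full utterance text from its segments.
    full_text = " ".join((seg.text or "").strip() for seg in u.findall("seg"))
    # 2. Extract the non-textual elements and clean the remaining text.
    notes = NOTE.findall(full_text)
    clean = re.sub(r"\s+", " ", NOTE.sub(" ", full_text)).strip()
    # 3. Replace the old segments with one unsplit segment plus the notes.
    for seg in u.findall("seg"):
        u.remove(seg)
    new_seg = ET.SubElement(u, "seg")
    new_seg.text = clean
    for text in notes:
        note = ET.SubElement(u, "note")
        note.text = text
    return u

u = ET.fromstring("<u><seg>Hvala. (Pljesak)</seg><seg>Idemo dalje.</seg></u>")
rebuild_utterance(u)
print(ET.tostring(u, encoding="unicode"))
```

Real ParlaMint components use the TEI namespace, so the `findall` calls would need a namespace-qualified tag there.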

All the targets: moved to .

The RegEx patterns have been written and sequenced in proper order.
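To illustrate why the order matters, here is a small sketch with two hypothetical patterns (the labels, the patterns, and the sample text are assumptions, not the ones used in this repository). A specific pattern has to run before a catch-all one, otherwise the catch-all would consume its matches first.

```python
import re

# Ordered from most specific to most general; both entries are hypothetical.
PATTERNS = [
    ("applause", re.compile(r"\(\s*[Pp]ljesak\s*\)")),
    ("generic_note", re.compile(r"\([^()]*\)")),
]

def extract(text):
    """Apply each pattern in sequence, removing its matches before the next one runs."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((label, match.group(0)))
        text = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the removed spans.
    return re.sub(r"\s+", " ", text).strip(), found

clean, found = extract("Hvala. (Pljesak) Idemo dalje. (Buka u dvorani)")
print(clean)
print(found)
```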

To discuss:

I have the option of not splitting on sentences this time. Should I go for unsplit utterances? -> Ask Tomaž! Answer: yes, do not split.

To add:

(NASTAVAK NAKON STANKE U 9,45 SATI) (in English: "continuation after the break at 9:45")
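One possible pattern for this kind of note, as a sketch only. The regular expression and the helper name `parse_resume` are assumptions; the real transcripts may vary in wording, casing, or time separator.

```python
import re

# Assumed shape: "(NASTAVAK NAKON STANKE U <hour>,<minute> SATI)".
RESUME = re.compile(
    r"\(\s*NASTAVAK\s+NAKON\s+STANKE\s+U\s+(\d{1,2})[,.:](\d{2})\s+SATI\s*\)"
)

def parse_resume(text):
    """Return (hour, minute) if the text contains a resumption note, else None."""
    match = RESUME.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(parse_resume("(NASTAVAK NAKON STANKE U 9,45 SATI)"))
```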

2022-11-22T10:21:03

As of now the component-level fixer works marvelously. Next step: running it on all the datasets.

2022-11-22T13:54:58

I found a few more bugs, but it finally ran successfully on all three branches. Next, I'll check whether the "add common content" step was done correctly.

2022-11-23T10:34:07

Change segment notation to seg (e.g. <seg xml:id="ParlaMint-HR_T6.S12.u37297.seg0">)
Split files on agenda (for now 500 - 1k utterances)
Rerun everything.
Run annotation
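The first two items could be sketched as follows. This assumes un-namespaced `<u>` and `<seg>` elements, and it splits purely by utterance count, which is only a stand-in: how agenda boundaries are marked is not described here, so the sketch does not detect them. The helper names `renumber_segments` and `chunk` are made up.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def renumber_segments(u):
    """Give every <seg> in <u> an id of the form <utterance id>.seg<N>."""
    u_id = u.get(XML_ID)
    for index, seg in enumerate(u.findall("seg")):
        seg.set(XML_ID, f"{u_id}.seg{index}")

def chunk(items, size=500):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

u = ET.fromstring(
    '<u xml:id="ParlaMint-HR_T6.S12.u37297"><seg>a</seg><seg>b</seg></u>'
)
renumber_segments(u)
print([seg.get(XML_ID) for seg in u.findall("seg")])
print([len(part) for part in chunk(list(range(1200)), size=500)])
```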
