Can OpenConvert convert plain text file to TEI? #1

Open
vanabel opened this issue Oct 22, 2015 · 5 comments

@vanabel

vanabel commented Oct 22, 2015

I just cloned the repository into a directory and ran the command
java -jar OpenConvert.jar -from text -to TEI test/test.txt test
where test.txt contains a single sentence:
Just a test
But it printed an error:
Could not find conversion from text to TEI
Did I do something wrong?

@vanabel
Author

vanabel commented Oct 23, 2015

Or could you provide some example text files that convert to TEI properly?

@jan-niestadt
Member

You can do the conversion to TEI online here: http://openconvert.clarin.inl.nl/openconvert/tagger/ui#file

(you need a CLARIN account, which you should be able to get here: https://user.clarin.eu/user/register)

I didn't develop this code, so I'm not sure about the command-line tool, sorry.

@vanabel
Author

vanabel commented Oct 23, 2015

@jan-niestadt Thanks. If I want to build my own corpus, how can I combine multiple TEI files into one? In practice, I would like to add one plain-text sentence containing a keyword at a time (converted to TEI with the tool you mentioned above), then upload the TEI to my BlackLab server so that users can query it. This would be useful for scientific writing, since I could then search by keyword.

@JessedeDoes
Member

Hello all, sorry for only catching up today.

  • The correct command line for converting txt to TEI is (note: txt, not text)
    java -jar OpenConvert.jar -from txt -to TEI test/test.txt test/test.tei
  • For use with BlackLab, it is best to enable the tokenizer in OpenConvertWeb (only available in the online version).
  • There is no special tool for combining TEI files. The teiCorpus element (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiCorpus.html) may contain an arbitrary number of TEI elements, each containing a document. It also requires a corpus header, but for BlackLab indexing it should be sufficient to start with <teiCorpus>, then cat all the individual files, and then close the teiCorpus element.
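The concatenation idea from the last point could be sketched in Python roughly as follows. This is only a minimal sketch, not part of OpenConvert: the function names wrap_in_tei_corpus and combine_files, the *.tei file pattern, and the UTF-8 encoding are all assumptions made for illustration. It also assumes that each converted file may begin with its own XML declaration, which has to be dropped because a declaration is only allowed at the very start of a file. DOCTYPE lines and the corpus header are not handled.

```python
import re
from pathlib import Path

# Matches a leading XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def wrap_in_tei_corpus(documents):
    """Concatenate TEI document strings inside a single <teiCorpus> element.

    No corpus header is written; per the comment above, that should be
    sufficient for BlackLab indexing, but it is not a complete TEI corpus.
    """
    parts = ['<?xml version="1.0" encoding="UTF-8"?>', "<teiCorpus>"]
    for doc in documents:
        # Drop the per-file declaration; it is only legal as the first thing in a file.
        parts.append(XML_DECL.sub("", doc, count=1).strip())
    parts.append("</teiCorpus>")
    return "\n".join(parts) + "\n"


def combine_files(input_dir, output_file):
    """Hypothetical helper: merge every *.tei file in input_dir into output_file."""
    paths = sorted(Path(input_dir).glob("*.tei"))
    # utf-8-sig also strips a byte-order mark if a file happens to have one.
    docs = [p.read_text(encoding="utf-8-sig") for p in paths]
    Path(output_file).write_text(wrap_in_tei_corpus(docs), encoding="utf-8")
```

Calling combine_files("test", "corpus.xml") would then write one teiCorpus file containing every test/*.tei document in file-name order; the directory and output names here are placeholders.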

@vanabel
Author

vanabel commented Oct 25, 2015

Currently, I grab the data from the OpenConvert site (submit text, get TEI back). Since the site may change, I want a local version, which means I need a similar function for converting plain text to TEI. I noticed that you provide openconvert.client.jar; is it designed for this? (In fact, I can't execute it on my server. Does it need this openconvert Git project?)
