preston match throws java.lang.IllegalArgumentException on archive entry zip:hash://sha256/a6d67e8404218dcb06aea0271a61207e51ec3207a09bc419c672c3a777c9700b!/iczn-lists-master/opinions/google-sheets/Official List Publications - Sheet1.tsv #88

Closed
jhpoelen opened this issue Oct 21, 2020 · 3 comments

Comments

@jhpoelen
Member

Preston match stopped early when run on a local copy of https://deeplinker.bio, with the following exception:

$ preston ls | pv -l | preston match -l tsv 'http://arctos.database.museum/guid/[a-zA-Z]+:[a-zA-Z]+:[^ \t\n,?;]+' | cut -f1,3 | grep -E '\shttp://arctos.database.museum/guid/[a-zA-Z]+:[a-zA-Z]+:[^ \t\n,?;]+' 
Oct 20, 2020 6:31:39 PM bio.guoda.preston.cmd.CmdLine run
SEVERE: unexpected exception
java.lang.IllegalArgumentException: Illegal character in opaque part at index 133: zip:hash://sha256/a6d67e8404218dcb06aea0271a61207e51ec3207a09bc419c672c3a777c9700b!/iczn-lists-master/opinions/google-sheets/Official List Publications - Sheet1.tsv
	at java.net.URI.create(URI.java:852)
	at org.apache.commons.rdf.simple.IRIImpl.<init>(IRIImpl.java:33)
	at org.apache.commons.rdf.simple.SimpleRDF.createIRI(SimpleRDF.java:82)
	at bio.guoda.preston.model.RefNodeFactory.toIRI(RefNodeFactory.java:26)
	at bio.guoda.preston.process.TextMatcher.getEntryIri(TextMatcher.java:255)
	at bio.guoda.preston.process.TextMatcher.parseAsArchive(TextMatcher.java:148)
	at bio.guoda.preston.process.TextMatcher.attemptToParseAsArchive(TextMatcher.java:100)
	at bio.guoda.preston.process.TextMatcher.attemptToParse(TextMatcher.java:92)
	at bio.guoda.preston.process.TextMatcher.on(TextMatcher.java:80)
	at bio.guoda.preston.cmd.CmdMatch$1.emit(CmdMatch.java:51)
	at bio.guoda.preston.process.EmittingStreamRDF.copyOnEmit(EmittingStreamRDF.java:51)
	at bio.guoda.preston.process.EmittingStreamRDF.parseAndEmit(EmittingStreamRDF.java:40)
	at bio.guoda.preston.cmd.CmdMatch.run(CmdMatch.java:56)
	at bio.guoda.preston.cmd.CmdMatch.run(CmdMatch.java:36)
	at bio.guoda.preston.cmd.CmdLine.run(CmdLine.java:18)
	at bio.guoda.preston.cmd.CmdLine.run(CmdLine.java:26)
	at bio.guoda.preston.Preston.main(Preston.java:19)
Caused by: java.net.URISyntaxException: Illegal character in opaque part at index 133: zip:hash://sha256/a6d67e8404218dcb06aea0271a61207e51ec3207a09bc419c672c3a777c9700b!/iczn-lists-master/opinions/google-sheets/Official List Publications - Sheet1.tsv
	at java.net.URI$Parser.fail(URI.java:2848)
	at java.net.URI$Parser.checkChars(URI.java:3021)
	at java.net.URI$Parser.parse(URI.java:3058)
	at java.net.URI.<init>(URI.java:588)
	at java.net.URI.create(URI.java:850)
	... 16 more

I suspect that the whitespace in the URI needs to be percent-encoded (e.g. %20 for a space). This can be done with the built-in new URI(...) constructor, using one of its multi-argument forms.
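To illustrate the suspicion, here is a minimal sketch, not the actual fix that went into Preston. It assumes the entry IRI can be split at the first colon into a scheme and a scheme-specific part; the class `EntryIriEncoding` and the helper `toEncodedUri` are hypothetical names made up for this example. The three-argument `java.net.URI` constructor quotes characters that are illegal in the scheme-specific part, whereas `URI.create` parses the string as-is and rejects the spaces.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EntryIriEncoding {

    // Hypothetical helper (not part of Preston): splits "scheme:rest" at the
    // first colon and lets the three-argument URI constructor percent-encode
    // characters that are illegal in the scheme-specific part, such as spaces.
    static URI toEncodedUri(String raw) {
        int colon = raw.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing scheme in: " + raw);
        }
        String scheme = raw.substring(0, colon);
        String schemeSpecificPart = raw.substring(colon + 1);
        try {
            return new URI(scheme, schemeSpecificPart, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "zip:hash://sha256/a6d67e8404218dcb06aea0271a61207e51ec3207a09bc419c672c3a777c9700b"
                + "!/iczn-lists-master/opinions/google-sheets/Official List Publications - Sheet1.tsv";

        // URI.create does no escaping, so the spaces trigger the exception
        // seen in the stack trace above.
        try {
            URI.create(raw);
        } catch (IllegalArgumentException e) {
            System.out.println("URI.create rejected the entry name: " + e.getMessage());
        }

        // The multi-argument constructor encodes each space as %20.
        System.out.println(toEncodedUri(raw));
    }
}
```

Whether an encoded IRI like this is the right thing to emit downstream (e.g. for consumers that later resolve the zip entry by name) is a separate question that this sketch does not address.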

@mielliott
Collaborator

Ah, makes sense. I'll try to fix it real quick. Please holler if you're already working on it.

@mielliott
Collaborator

It should be fixed now.

@jhpoelen
Member Author

Confirmed fixed! @mielliott thanks!
