type:text shouldn't show up in the KoralQuery serializer #196

Open
margaretha opened this issue Feb 13, 2023 · 11 comments
@margaretha
Contributor

margaretha commented Feb 13, 2023

While investigating KorAP/Krill#86, I found that corpusTitle eq gingko is serialized as

{
    "@type": "koral:doc",
    "key": "corpusTitle",
    "match": "match:eq",
    "value": "gingko",
    "type": "type:text"
}

although type:text is not a supported type according to the KoralQuery doc, and in practice Krill does not support it either.

The type is not added by any query rewrite as it is not added when sending a direct API request:

https://korap.ids-mannheim.de/instance/test/api/v1.0/search?q=ich&cq=availability+%3D+%2FCC-BY.*%2F+%26+docTitle+%3D+%22gingko%22&ql=poliqarp&cutoff=1&state=&pipe=

Could it be that Kalamar adds the type?

@Akron
Member

Akron commented Feb 13, 2023

It's interesting that this shows up in the KQ-Viewer. The type:text is an index type and was introduced to help the VC Builder show the allowed operators. Regarding this issue: do you mean this shouldn't show up in the serialization, or is there a bigger issue?

@margaretha
Contributor Author

Yes, it shouldn't show up in the serialization and it shouldn't be used in general. There should be no problem with that in the backend since Kalamar only sends the corpus query, not KoralQuery.

@margaretha
Contributor Author

Could you please check what request Kalamar actually sends to Kustvakt? I don't get any results when sending the example direct API request with an OAuth2 token and VPN, while Kalamar shows some results, as reported in KorAP/Krill#86.

@Akron
Member

Akron commented Feb 13, 2023

Well - it is used by the corpus builder and it is used for indexing - so what do you mean by "it shouldn't be used in general"? Yes it is not helpful in a corpus request, but that is not happening.

@Akron
Member

Akron commented Feb 13, 2023

I am not sure which query you are referring to.

@margaretha
Contributor Author

> Well - it is used by the corpus builder and it is used for indexing - so what do you mean by "it shouldn't be used in general"? Yes it is not helpful in a corpus request, but that is not happening.

I suppose it shouldn't be used since it is not part of the KoralQuery doc and not supported in the backend. Why is it used by the corpus builder and for indexing?

> I am not sure which query you are referring to.

Sorry for not being clear. I mean the query in KorAP/Krill#86, or the one I wrote above:
https://korap.ids-mannheim.de/instance/test/api/v1.0/search?q=ich&cq=availability+%3D+%2FCC-BY.*%2F+%26+docTitle+%3D+%22gingko%22&ql=poliqarp&cutoff=1&state=&pipe=
but sent through Kalamar instead of as a direct API request.

@Akron
Member

Akron commented Feb 14, 2023

The KoralQuery doc currently only covers the request and error reporting, not the indexing or the response data format. Krill supports it for indexing (see index/FieldDocument) and for responses (see response/MetaFieldsObj). type:text means that the field is indexed tokenized, so single words can be searched for (as in a title) and a whole-string match works as well. This obviously means that the operators offered in the visual corpus builder should differ.
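
As a toy illustration of that distinction (this is not Krill's actual indexing code; the function names, the whitespace tokenization and the sample title are all assumptions made up for this sketch), a tokenized field can answer both a single-word lookup and a whole-string comparison:

```python
def tokenize(value):
    """Split a field value on whitespace (a simplification of real tokenization)."""
    return value.split()


def matches_whole_string(field_value, query):
    """Whole-string match: the query must equal the complete field value."""
    return field_value == query


def contains_word(field_value, word):
    """Single-word match: the word must be one of the field's tokens."""
    return word in tokenize(field_value)


# Hypothetical title value, chosen only for illustration.
title = "gingko sample corpus"

print(contains_word(title, "gingko"))                       # True
print(matches_whole_string(title, "gingko"))                # False
print(matches_whole_string(title, "gingko sample corpus"))  # True
```

A field that is indexed only as a plain string would support just the second kind of comparison, which is presumably why the corpus builder needs to know the index type to decide which operators to offer.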

That query doesn't show results for me. The request is:
https://korap.ids-mannheim.de/instance/test/api/v1.0/search?context=40-t%2C40-t&count=25&cq=availability+%3D+%2FCC-BY.*%2F+%26+docTitle+%3D+%22gingko%22&cutoff=true&offset=0&q=ich&ql=poliqarp

@margaretha
Contributor Author

Thanks for your explanation.

The query should show results with an OAuth2 token and VPN, since the Gingko corpus is restricted.

@Akron
Member

Akron commented Feb 14, 2023

But the VC is limited to CC-BY.*

@margaretha
Contributor Author

margaretha commented Feb 14, 2023

Sorry, you are right. The request shouldn't be restricted to CC-BY.*
Besides, I made a mistake with the URL encoding of diacritics etc.

For the following query

https://korap.ids-mannheim.de/instance/test?q=Z%C3%BCndkerze&cq=corpusTitle+%3D+%22gingko%22&ql=poliqarp&cutoff=1&state=&pipe=

Kalamar would send the query below to Kustvakt, right?

curl -v -H "Authorization: Bearer token" 'https://korap.ids-mannheim.de/instance/test/api/v1.0/search?q=Z%C3%BCndkerze&cq=corpusTitle+%3D+%22gingko%22&ql=poliqarp&cutoff=1&state=&pipe='
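
For reference, the query string of the request above can be reproduced with `urlencode` from Python's standard `urllib.parse` module, which percent-encodes the diacritic in Zündkerze as UTF-8 (`%C3%BC`) and the spaces in the corpus query as `+`. This is only a sketch: the base URL and parameter values are copied from the request above, and nothing here is taken from Kalamar's own code.

```python
from urllib.parse import urlencode

# Base URL copied from the curl request above.
BASE_URL = "https://korap.ids-mannheim.de/instance/test/api/v1.0/search"

# Unencoded parameter values, in the same order as in the request.
params = {
    "q": "Zündkerze",
    "cq": 'corpusTitle = "gingko"',
    "ql": "poliqarp",
    "cutoff": "1",
    "state": "",
    "pipe": "",
}

# urlencode applies quote_plus to each value, using UTF-8 by default.
query_string = urlencode(params)
url = f"{BASE_URL}?{query_string}"
print(url)
```

Building the URL from unencoded values this way avoids the hand-encoding mistakes with diacritics mentioned above.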

This doesn't seem to be a problem caused by Kalamar and isn't related to type:text, so I suppose we should discuss it in KorAP/Krill#86 instead.

@Akron
Member

Akron commented Feb 14, 2023

Yes, this is unrelated. Regarding this topic: I think the corpus assistant shouldn't alter the query serialized by the KoralQuery helper - but I think that's the only problem there is and it's a minor one, not affecting any functionality of the platform.
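
One way to picture the intended behaviour is a small filter that drops the index-only `type:text` marker before a corpus query is shown as KoralQuery. The sketch below is written in Python purely for illustration and is not Kalamar's code: `strip_index_type` is a hypothetical helper, and the assumption that nested constraints sit under an `operands` list of a `koral:docGroup` is mine rather than something stated in this thread.

```python
import json


def strip_index_type(node):
    """Return a copy of a corpus query node without the index-only 'type:text'.

    Hypothetical helper: other 'type' values are left untouched.
    """
    if isinstance(node, list):
        return [strip_index_type(item) for item in node]
    if not isinstance(node, dict):
        return node
    cleaned = {}
    for key, value in node.items():
        is_doc = node.get("@type") == "koral:doc"
        if is_doc and key == "type" and value == "type:text":
            continue  # drop the index type, keep everything else
        cleaned[key] = strip_index_type(value)
    return cleaned


# The serialization reported at the top of this issue.
doc = {
    "@type": "koral:doc",
    "key": "corpusTitle",
    "match": "match:eq",
    "value": "gingko",
    "type": "type:text",
}

print(json.dumps(strip_index_type(doc), indent=4))
```

The function returns a new structure and leaves its input unchanged, so the corpus builder could keep the index type internally for choosing operators while the displayed serialization omits it.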

Akron changed the title from "type:text is not supported" to "type:text shouldn't show up in the KoralQuery serializer" on Feb 14, 2023