Issues with some special characters in annotator api #31

Open
dilshans2k opened this issue May 17, 2024 · 3 comments

Comments

@dilshans2k

Request:

from typing import Any, Dict
from urllib.parse import quote_plus

import requests

# `text` is the input string to annotate (see the sample inputs below).
encoded_text = quote_plus(text)
apikey = ""
ontologies_to_search = [
    "MONDO"
    ]
format = "json"
params: Dict[str, Any] = {
    "apikey": apikey,
    "format": format,
    "ontologies": ontologies_to_search,
    "mappings": True,
    "longest_only": True,
    "exclude_synonyms": False,
    "expand_class_hierarchy": False,
    "class_hierarchy_max_level": 0,
    "text": encoded_text
}
url = "http://services.data.bioontology.org/annotatorplus"
# Note: the query string is built by hand from the raw `text`; `params` and `encoded_text` are not used in the request.
url = url + f"?apikey={apikey}&format={format}&ontologies={ontologies_to_search[0]}&mappings={True}&longest_only={True}&exclude_synonyms={False}&class_hierarchy_max_level={0}&text={text}"
r = requests.get(url=url)
r.raise_for_status()
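
The request above computes `encoded_text` and a `params` dict but then builds the URL from the raw `text`. As a minimal sketch, the query string could instead be produced with the standard library's `urlencode`, which percent-encodes characters such as `%` and `;`. The helper name `build_annotator_url` is hypothetical, and whether client-side encoding avoids the server errors reported below is an untested assumption.

```python
from urllib.parse import urlencode

BASE_URL = "http://services.data.bioontology.org/annotatorplus"


def build_annotator_url(text: str, apikey: str = "", ontology: str = "MONDO") -> str:
    """Return the request URL with every query value percent-encoded.

    Hypothetical helper: it mirrors the parameters of the hand-built URL above.
    """
    params = {
        "apikey": apikey,
        "format": "json",
        "ontologies": ontology,
        "mappings": True,
        "longest_only": True,
        "exclude_synonyms": False,
        "class_hierarchy_max_level": 0,
        "text": text,  # pass the raw text; urlencode() applies quote_plus to it
    }
    return f"{BASE_URL}?{urlencode(params)}"


# The semicolon and apostrophe are escaped instead of being sent literally.
url = build_annotator_url("PARKINSON DISEASE; PARKINSON's DISEASE")
```

Passing the same dict through the `params=` argument of `requests.get` should have a similar effect, since `requests` encodes query parameters itself.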

Issue with %

If the input text contains " % " (a percent sign surrounded by whitespace), the API returns a 500 Internal Server Error.

Sample input:
text=Parkinson Disease % Pneumonia

Server response:

<body>
    <h1>HTTP Status 500 – Internal Server Error</h1>
    <hr class="line" />
    <p><b>Type</b> Exception Report</p>
    <p><b>Message</b> Unexpected end of input at 1:1</p>
    <p><b>Description</b> The server encountered an unexpected condition that prevented it from fulfilling the request.
    </p>
    <p><b>Exception</b></p>
    <pre>com.eclipsesource.json.ParseException: Unexpected end of input at 1:1
	com.eclipsesource.json.JsonParser.error(JsonParser.java:490)
	com.eclipsesource.json.JsonParser.expected(JsonParser.java:484)
	com.eclipsesource.json.JsonParser.readValue(JsonParser.java:193)
	com.eclipsesource.json.JsonParser.parse(JsonParser.java:152)
	com.eclipsesource.json.JsonParser.parse(JsonParser.java:91)
	com.eclipsesource.json.Json.parse(Json.java:295)
	org.sifrproject.annotations.input.BioPortalJSONAnnotationParser.parseAnnotations(BioPortalJSONAnnotationParser.java:65)
	org.sifrproject.servlet.AnnotatorServlet.doPost(AnnotatorServlet.java:177)
	org.sifrproject.servlet.AnnotatorServlet.doGet(AnnotatorServlet.java:118)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:655)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:764)
	org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
	org.sifrproject.util.CharacterSetFilter.doFilter(CharacterSetFilter.java:24)
</pre>
    <p><b>Note</b> The full stack trace of the root cause is available in the server logs.</p>
    <hr class="line" />
    <h3>Apache Tomcat/9.0.62</h3>
</body>

Issue with ;

  1. If the text starts with ;, the API returns 200 OK, but the response body contains an error.

    Sample input:
    text: ;Disease

    Sample output:

    [
        {
            "error": "{"errors":["A text to be annotated must be supplied using the argument text=<text to be annotated>"],"status":400}
    "}]
    
  2. If the text contains ; elsewhere, only the entities before the ; are annotated.
    Sample input1:
    text: PARKINSON DISEASE PARKINSON's DISEASE

Sample output1:

[
    {
        "annotatedClass": {
            "definition": [
                "A progressive degenerative disorder of the central nervous system characterized by loss of dopamine producing neurons in the substantia nigra and the presence of Lewy bodies in the substantia nigra and locus coeruleus. Signs and symptoms include tremor which is most pronounced during rest, muscle rigidity, slowing of the voluntary movements, a tendency to fall back, and a mask-like facial expression."
            ],
            "prefLabel": "Parkinson disease",
            "synonym": [
                "paralysis agitans",
                "Parkinson disease",
                "Parkinson's disease"
            ],
..........................
        "hierarchy": [],
        "annotations": [
            {
                "from": 1,
                "to": 17,
                "matchType": "PREF",
                "text": "PARKINSON DISEASE"
            },
            {
                "from": 19,
                "to": 37,
                "matchType": "SYN",
                "text": "PARKINSON'S DISEASE"
            }
        ],
        "mappings": []
    }
]

Sample input2:
text = PARKINSON DISEASE; PARKINSON's DISEASE

Sample output2:

    [
        {
            "annotatedClass": {
                "definition": [
                    "A progressive degenerative disorder of the central nervous system characterized by loss of dopamine producing neurons in the substantia nigra and the presence of Lewy bodies in the substantia nigra and locus coeruleus. Signs and symptoms include tremor which is most pronounced during rest, muscle rigidity, slowing of the voluntary movements, a tendency to fall back, and a mask-like facial expression."
                ],
                "prefLabel": "Parkinson disease",
                "synonym": [
                    "paralysis agitans",
                    "Parkinson disease",
                    "Parkinson's disease"
                ],
    ............................
            "annotations": [
                {
                    "from": 1,
                    "to": 17,
                    "matchType": "PREF",
                    "text": "PARKINSON DISEASE"
                }
            ],
            "mappings": []
        }
    ]

As shown, only the first mention (PARKINSON DISEASE) was annotated; the mention after the ; was not.
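
To check which part of an input was left unannotated, the `from`/`to` offsets can be mapped back onto the submitted text. The sketch below assumes the offsets are 1-based and inclusive, which is consistent with the sample outputs above (1 to 17 covers the 17 characters of "PARKINSON DISEASE") but is not confirmed by any documentation quoted here. The function name `annotated_spans` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def annotated_spans(text: str, annotations: List[Dict[str, Any]]) -> List[Tuple[int, int, str]]:
    """Return (from, to, substring) per annotation.

    Assumes offsets are 1-based and inclusive, as the samples suggest.
    """
    return [(a["from"], a["to"], text[a["from"] - 1 : a["to"]]) for a in annotations]


# Sample input2 and its single annotation, copied from the report above.
text = "PARKINSON DISEASE; PARKINSON's DISEASE"
annotations = [
    {"from": 1, "to": 17, "matchType": "PREF", "text": "PARKINSON DISEASE"},
]

spans = annotated_spans(text, annotations)
last_covered = max(to for _, to, _ in spans)
# Everything after the last annotated offset received no annotation.
tail = text[last_covered:]
```

Note that the API returns the matched `text` in upper case, so any comparison against the original substring would need to be case-insensitive.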

@syphax-bouazzouni

Hello @dilshans2k,

Thank you for the detailed report. We are not currently doing any development on the annotatorplus project, but we will make sure to fix this in a future iteration.

A temporary workaround is to remove special characters from the submitted text, using a regex such as [^\w\s].
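
A minimal Python sketch of this workaround, assuming the text is cleaned client-side before it is sent. The function name `strip_special_chars` and its `replacement` parameter are hypothetical and not part of annotatorplus.

```python
import re

# Matches any character that is neither a word character nor whitespace.
SPECIAL_CHARS = re.compile(r"[^\w\s]")


def strip_special_chars(text: str, replacement: str = "") -> str:
    """Remove special characters, or swap each one for `replacement`."""
    return SPECIAL_CHARS.sub(replacement, text)


cleaned = strip_special_chars("PARKINSON DISEASE; PARKINSON's DISEASE")
# Replacing with a space keeps the string length, so character offsets
# in the response still line up with the original text.
padded = strip_special_chars("PARKINSON DISEASE; PARKINSON's DISEASE", " ")
```

Because the pattern also matches apostrophes, "PARKINSON's" becomes "PARKINSONs" (or "PARKINSON s" with a space replacement), which may affect which labels or synonyms match.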

For reference, here are the related issues, ontoportal-lirmm/annotators#49 and ontoportal-lirmm/bioportal_web_ui#558, and the temporary fix we applied at AgroPortal to remove special characters from the submitted text: ontoportal-lirmm/bioportal_web_ui#561

FYI @jonquet, @Bilelkihal

@dilshans2k
Author

Thanks for the prompt reply.
Yes, the solution provided is one way to get it working.

I was wondering, is the annotatorplus repo open source? Also, is the database or the knowledge graph (KG) publicly available?

@syphax-bouazzouni

Hello,

Yes, the annotatorplus repo is open source. Feel free to propose contributions at https://github.com/ontoportal-lirmm/annotators.

The KG is not publicly available, as BioPortal doesn't offer a SPARQL endpoint, but you can use the API at https://data.bioontology.org/ to access all the public ontologies and terms.
