datatables_view not search if value contains latvian characters #8027

Open
gatiszeiris opened this issue Jan 25, 2024 · 4 comments

@gatiszeiris

CKAN version 2.10.1

Describe the bug

Searching with the value "datu sis" returns results:
[image]

Searching with the value "datu sistēm" returns no results:
[image]

Steps to reproduce

Steps to reproduce the behavior:

  1. Create a dataset resource from a CSV file:
    name, number
    "Datu Sistēmas ","4003232323"
  2. In datatables_view, search the "name" field for "Datu Sis": the matching row is returned.
  3. In datatables_view, search the "name" field for "Datu Sistēm": no rows are returned.

Expected behavior

When the search text includes Latvian characters such as ē, ā, ž, ī, ņ, the search works and returns results.

@wardi wardi self-assigned this Jan 25, 2024
@wardi
Contributor

wardi commented Jan 25, 2024

Thank you @gatiszeiris. Could you check whether the same thing happens when using the API? For example:

ckanapi action datastore_search resource_id=… q="Datu Sistēm"

@gatiszeiris
Author

Here is the result of ckanapi --config="/etc/ckan/default/ckan.ini" action datastore_search resource_id=5f2e918d-5497-47b1-80f8-8201800ef542 q="Datu Sistēm":

{
  "_links": {},
  "fields": [
    {"id": "_id", "type": "int"},
    {"id": "regcode", "type": "numeric"},
    {"id": "sepa", "type": "text"},
    {"id": "name", "type": "text"},
    {"id": "name_before_quotes", "type": "text"},
    {"id": "name_in_quotes", "type": "text"},
    {"id": "name_after_quotes", "type": "text"},
    {"id": "without_quotes", "type": "numeric"},
    {"id": "regtype", "type": "text"},
    {"id": "regtype_text", "type": "text"},
    {"id": "type", "type": "text"},
    {"id": "type_text", "type": "text"},
    {"id": "registered", "type": "timestamp"},
    {"id": "terminated", "type": "timestamp"},
    {"id": "closed", "type": "text"},
    {"id": "address", "type": "text"},
    {"id": "index", "type": "numeric"},
    {"id": "addressid", "type": "numeric"},
    {"id": "region", "type": "numeric"},
    {"id": "city", "type": "numeric"},
    {"id": "atvk", "type": "numeric"},
    {"id": "reregistration_term", "type": "text"},
    {"id": "rank", "type": "float"}
  ],
  "include_total": true,
  "limit": 100,
  "q": "Datu Sistēm",
  "records": [],
  "records_format": "objects",
  "resource_id": "5f2e918d-5497-47b1-80f8-8201800ef542",
  "total": 0,
  "total_estimation_threshold": null,
  "total_was_estimated": false
}

@amercader
Member

I don't know much about this part of the code, but there is this undocumented config option, which might be relevant, and I see that datastore_search supports a language parameter that might also be related.

def _fts_lang(lang: Optional[str] = None) -> str:
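
To try the language parameter mentioned above without changing any server configuration, the request can be built by hand. The following is a minimal sketch, not CKAN's own client code: the site URL is a placeholder, the helper name build_datastore_search is hypothetical, and "simple" is only an assumed value for language (whether it helps with Latvian text is not established in this thread). The request is built but not sent.

```python
import json
import urllib.request

# Placeholder URL (assumption): replace with your own CKAN site.
CKAN_URL = "https://ckan.example.org"


def build_datastore_search(resource_id, q, language=None):
    """Build (but do not send) a POST request for the datastore_search action.

    Hypothetical helper for illustration; only resource_id, q and language
    are passed through to the API.
    """
    payload = {"resource_id": resource_id, "q": q}
    if language is not None:
        # Optional full text search language for this query.
        payload["language"] = language
    return urllib.request.Request(
        CKAN_URL + "/api/3/action/datastore_search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_datastore_search(
    "5f2e918d-5497-47b1-80f8-8201800ef542",
    "Datu Sistēm",
    language="simple",  # assumed value, see the note above
)
# To actually run the query: urllib.request.urlopen(request)
```

Comparing the "total" field of the response with and without language would show whether the query-side setting alone changes the outcome. A private resource would also need an Authorization header.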

@wardi
Contributor

wardi commented Jan 25, 2024

There's a language parameter and a ckan.datastore.default_fts_lang config option for datastore_search, but the full text search index is created by a trigger

CREATE OR REPLACE FUNCTION populate_full_text_trigger() RETURNS trigger
AS $body$
    BEGIN
        IF NEW._full_text IS NOT NULL THEN
            RETURN NEW;
        END IF;
        NEW._full_text := (
            SELECT to_tsvector(string_agg(value, ' '))
            FROM json_each_text(row_to_json(NEW.*))
            WHERE key NOT LIKE '\_%');
        RETURN NEW;
    END;
$body$ LANGUAGE plpgsql;
ALTER FUNCTION populate_full_text_trigger() OWNER TO {writeuser};
that doesn't include a language: to_tsvector is called there without a text search configuration argument.
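
With a single argument, to_tsvector falls back to PostgreSQL's default_text_search_config, so the index may be built with a different configuration than the one the query uses. As a rough sketch of what a language-aware variant of the trigger could look like, the Python below renders the same function body with an explicit configuration name. This is not a patch from the CKAN project: the helper trigger_sql is hypothetical, "simple" is an assumed configuration, and the ALTER FUNCTION ... OWNER statement is left out.

```python
import re

# Same body as the trigger quoted above, except that to_tsvector receives
# an explicit text search configuration as its first argument.
TRIGGER_TEMPLATE = """\
CREATE OR REPLACE FUNCTION populate_full_text_trigger() RETURNS trigger
AS $body$
    BEGIN
        IF NEW._full_text IS NOT NULL THEN
            RETURN NEW;
        END IF;
        NEW._full_text := (
            SELECT to_tsvector('{lang}', string_agg(value, ' '))
            FROM json_each_text(row_to_json(NEW.*))
            WHERE key NOT LIKE '\\_%');
        RETURN NEW;
    END;
$body$ LANGUAGE plpgsql;
"""


def trigger_sql(lang):
    """Render the trigger definition for one text search configuration.

    Hypothetical helper; the name is validated because it is interpolated
    directly into the SQL text.
    """
    if not re.fullmatch(r"[a-z_]+", lang):
        raise ValueError("unexpected text search configuration: %r" % lang)
    return TRIGGER_TEMPLATE.format(lang=lang)


sql = trigger_sql("simple")  # "simple" is an assumed choice
print(sql)
```

Because the trigger returns early when _full_text is already set, existing rows would presumably need their _full_text values recomputed before a change like this has any effect on them.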
