Dataset search returns different results than search on organisation page #3291

Closed
metaodi opened this issue Oct 31, 2016 · 7 comments

@metaodi
Member

metaodi commented Oct 31, 2016

CKAN Version if known (or site URL)

2.4 and newer

Please describe the expected behaviour

The search on the organisation page should return the same result as if I'm on the dataset search page with an activated organisation facet (i.e. limited to one organisation).

Please describe the actual behaviour

Currently the search behaves differently because, on the organisation page, an extra parameter is added to the query, which changes the behaviour of the Solr search. I noticed that this extra parameter leads Solr to not use the DisMax Query Parser for a simple search term. I'm not a Solr expert, but I think it's because, with the added parameter, the query becomes too complex for the simple parser to kick in.
In my CKAN instance this leads to very different search results depending on whether I'm on the dataset search page or on an organisation search page.
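
The suspected mechanism can be sketched as follows. This is a hypothetical illustration, not CKAN's actual code: the function name `choose_parser` is invented, and the rule it applies (a fielded term such as `field:"value"` in `q` prevents DisMax from being selected) is an assumption about how the parser might be chosen. The `tie` and `mm` values are placeholders for whatever DisMax defaults the instance uses.

```python
def choose_parser(params):
    """Hypothetical parser selection: enable DisMax only for un-fielded queries."""
    out = dict(params)  # copy so the caller's dict is not modified
    # Assumption: a colon in q marks a fielded search, which DisMax is not used for.
    if ':' not in out.get('q', ''):
        out['defType'] = 'dismax'
        out['tie'] = '0.1'         # placeholder DisMax tie-breaker
        out['mm'] = '2<-1 5<80%'   # placeholder minimum-should-match rule
    return out


# A plain search term gets the DisMax parameters ...
plain = choose_parser({'q': 'baustellen'})
# ... while the same term with an appended organisation clause does not.
fielded = choose_parser({'q': 'baustellen owner_org:"some-org-id"'})
```

Under this assumption, appending the organisation clause to `q` is enough to switch the whole query to a different parser, which would explain why the two pages rank and match differently.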

What steps can be taken to reproduce the issue?

Search for a term on the dataset search page with a filter for one organisation, then run the same search on that organisation's page. The results are different in my case.

I propose implementing the organisation filter on the organisation page using the facet, i.e. as a filter query, by setting the fq parameter of Solr. If you agree, I could make a PR to make the search behave the same.

It's possible that my Solr setup is not correct, but nonetheless, I think it makes sense that these two searches should behave the same (equally good or bad, that is).

@TkTech
Member

TkTech commented Oct 31, 2016

Heyo. In your Solr log you should have the exact final statement that is being run for both of these queries. Can you post one within the group/org and one without? Many different things modify the end search query.

Also, this should definitely not be a facet ("faceting" on a single known value doesn't make much sense). Instead, it should be an fq (filter query) value.

@metaodi
Member Author

metaodi commented Oct 31, 2016

@TkTech you're right, I meant filter query. As I said, I'm not a Solr expert and not too familiar with the terms.

Here are the two log statements

  1. From the dataset page, limited to the organization "bundesarchiv" with the search term "baustellen":
[ckan.lib.search.query] Package query: {'sort': u'score desc, metadata_modified desc', 'fq': [u'organization:"bundesarchiv" +dataset_type:dataset -dataset_type:harvest capacity:"public" +state:active +site_id:"default"'], 'facet.mincount': 1, 'rows': 21, 'facet.field': ['groups', 'keywords_de', 'organization', 'political_level', 'res_rights', 'res_format'], 'facet.limit': '50', 'mm': '2<-1 5<80%', 'facet': 'true', 'q': u'baustellen', 'start': 0, 'wt': 'json', 'qf': 'title_de^8 text_de^4 title_en^2 text_en title_fr^2 text_fr title_it^2 text_it', 'tie': '0.1', 'fl': 'id validated_data_dict', 'defType': 'dismax'}
  2. From the org page of the organisation "bundesarchiv" with the search term "baustellen":
[ckan.lib.search.query] Package query: {'sort': u'score desc, metadata_modified desc', 'fq': [u'+dataset_type:dataset -dataset_type:harvest capacity:"public" +state:active +site_id:"default"'], 'facet.mincount': 1, 'rows': 21, 'facet.field': ['groups', 'keywords_de', 'res_rights', 'res_format'], 'facet.limit': '50', 'facet': 'true', 'q': u'baustellen owner_org:"7dbaad15-597f-499c-9a72-95de38b95cad"', 'start': 0, 'wt': 'json', 'qf': 'title_de^8 text_de^4 title_en^2 text_en title_fr^2 text_fr title_it^2 text_it', 'fl': 'id validated_data_dict'}

As you can see, the first one uses 'fq': 'organization:"bundesarchiv"' and 'defType': 'dismax', whereas the second one changes the query to 'q': 'baustellen owner_org:"7dbaad15-597f-499c-9a72-95de38b95cad"'.

This currently leads to a situation where the first one returns a result but the second one does not. I don't know why this is the case, but I propose to implement the search on the org page in the same way as it is on the dataset page.
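
To make the difference between the two logged queries explicit, the parameter dicts can be compared key by key. The sketch below uses abbreviated copies of the two log entries: keys that are identical in both logs, and `facet.field`, are left out for brevity. The helper name `diff_params` is made up for this illustration.

```python
# Abbreviated copies of the two logged Solr parameter dicts.
dataset_page = {
    'q': 'baustellen',
    'fq': ['organization:"bundesarchiv" +dataset_type:dataset '
           '-dataset_type:harvest capacity:"public" +state:active '
           '+site_id:"default"'],
    'mm': '2<-1 5<80%',
    'tie': '0.1',
    'defType': 'dismax',
}
org_page = {
    'q': 'baustellen owner_org:"7dbaad15-597f-499c-9a72-95de38b95cad"',
    'fq': ['+dataset_type:dataset -dataset_type:harvest capacity:"public" '
           '+state:active +site_id:"default"'],
}


def diff_params(a, b):
    """Return (keys only in a, keys only in b, shared keys whose values differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed


only_dataset, only_org, changed = diff_params(dataset_page, org_page)
print(only_dataset)  # ['defType', 'mm', 'tie']
print(only_org)      # []
print(changed)       # ['fq', 'q']
```

So besides the organisation constraint moving between `fq` and `q`, the org page query also lacks all three DisMax-related parameters.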

@amercader
Member

The first query includes dismax parameters like 'mm': '2<-1 5<80%', which I guess are affecting the results. I don't think there's any difference between 'fq': 'organization:"bundesarchiv"' and 'q': '... owner_org:"xxx"' in terms of results returned (performance is another question).

@TkTech you seem to be the one with more Solr experience, so I'll assign you to keep an eye on it.

@metaodi
Member Author

metaodi commented Nov 1, 2016

I just replaced 'q': '... owner_org:"xxx"' with 'fq': 'organization:"xxx"' in my controller, which solved the problem for me. Maybe there is a problem with the owner_org field in my Solr schema. I'm not sure.
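
The change described here might look roughly like the sketch below. It is not the actual controller code: `org_filter_as_fq` and its arguments are hypothetical names, and the output shape (`fq` as a one-element list) simply mirrors the logged queries. It also assumes the organisation name is a simple slug that needs no escaping inside the quoted value.

```python
def org_filter_as_fq(search_term, org_name, base_fq=''):
    """Build Solr params with the organisation constraint in fq instead of q."""
    # The organisation restriction goes into the filter query ...
    fq = 'organization:"{0}"'.format(org_name)
    if base_fq:
        fq = '{0} {1}'.format(fq, base_fq)
    # ... so q carries nothing but the user's search term.
    return {'q': search_term, 'fq': [fq]}


params = org_filter_as_fq('baustellen', 'bundesarchiv', '+dataset_type:dataset')
print(params['q'])   # baustellen
print(params['fq'])  # ['organization:"bundesarchiv" +dataset_type:dataset']
```

Because `q` stays a plain term, it has the same form as the query logged from the dataset page.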

@fabiankirstein
Contributor

In my opinion fq should be used in the group controller instead of q for filtering. I ran into unexpected behaviour, because I'm modifying the search parameters in before_search and I was expecting to have the search term in q and the filtering in fq.
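
As long as the organisation clause arrives inside q, an extension that expects that split has to separate the two itself. A minimal sketch of such a workaround follows, assuming the clause always has the form `owner_org:"<id>"` and that `fq` is a plain string at that point. The function name is hypothetical; in a real extension it would presumably be called from `before_search`.

```python
import re

# Matches one fielded clause of the form owner_org:"<value>", plus leading whitespace.
OWNER_ORG_CLAUSE = re.compile(r'\s*owner_org:"([^"]*)"')


def split_owner_org(search_params):
    """Move an owner_org clause from q to fq, leaving only the search term in q."""
    params = dict(search_params)  # copy so the caller's dict is not modified
    q = params.get('q', '')
    match = OWNER_ORG_CLAUSE.search(q)
    if match is None:
        return params  # nothing to move
    # Drop the clause from q and append it to the existing filter query.
    params['q'] = OWNER_ORG_CLAUSE.sub('', q, count=1).strip()
    clause = 'owner_org:"{0}"'.format(match.group(1))
    params['fq'] = '{0} {1}'.format(params.get('fq', ''), clause).strip()
    return params


result = split_owner_org({
    'q': 'baustellen owner_org:"7dbaad15-597f-499c-9a72-95de38b95cad"',
    'fq': '+state:active',
})
print(result['q'])   # baustellen
print(result['fq'])  # +state:active owner_org:"7dbaad15-597f-499c-9a72-95de38b95cad"
```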

@kmbn
Contributor

kmbn commented Jan 22, 2019

We decided to close old issues that are not actively worked on so that we can focus our effort and attention on issues affecting the current versions of CKAN.

If this issue is still affecting the version of CKAN you're working with now, please feel free to comment or reopen the issue.

If you do reopen this issue, please update it with new details. One reason it might not have been resolved in the past is that it wasn't clear how a contributor could address the issue.

@adborden

I think this is still an issue. We often see different results between the main page and the organization page which is very confusing to the users. In some cases, no results are returned on the organization page.
