-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking fixes #730
Tracking fixes #730
Conversation
The ckan.tracking_enabled ini file setting was being ignored, so page view tracking wasn't working. Adding ckan.tracking_enabled to app_globals makes the setting work again. Fixes #728
Page view tracking was failing when a language was selected: 1. Put `ckan.tracking_enabled = true` in your ini file 2. Run CKAN, visit a page with a language e.g. `/en/dataset/annakarenina` 3. Run `paster tracking update`. If you look in your db, in the `tracking_summary` table there'll be a row with `package_id` `~~not~found~~`. 4. Run `paster tracking export tracking.csv`, the exported CSV file will say 0 views. If you visit the page without the language in the URL e.g. `/dataset/annakarenina` then run the export command again, the view does get counted. This commit fixes the SQL used by the `paster tracking update/export` command to handle URLs with or without languages at the start. Fixes #729
Add documentation for the page view tracking feature. There are still some details to document, including how to show number of page views on the pages themselves (eg. dataset views next to datasets) and how to show "popular" labels next to popular datasets. It's also possible to get a list of the N most popular datasets, for example for the front page, perhaps this should be documented too.
Should have been added in commit 3eeeef2 but I forgot.
Still to do:
|
Sorting datasets by popularity only works if page view tracking is enabled, so don't show the 'Popular' option in the 'Order by:' dropdown if tracking isn't enabled.
Change the default sort order of package_search() to 'relevance asc, metadata_modified desc'. We want to sort by relevance by default, but when there's no search query relevance is meaningless, in that case fall back on showing the most recently modified datasets first. Also changes the sort ordering of the "Relevance" option in the "Order by:" dropdown to 'relevance asc, metadata_modified desc' instead of just 'relevance asc'. The previous default ordering was 'score desc, name asc'. I'm not even sure if that works, it seems to disagree with the sort strings that the dropdown gives you, when you chose relevance from the dropdown you got 'relevance asc' not 'score desc' (and the datasets appeared in a different order then the default), and when you chose name you get 'title_string' not name. Previously we've fallen back on showing datasets alphabetically but that's boring as it simply means that all the datasets beginning with 'a' are always shown. Last modified seems more interesting and changes over time. Popularity is not an option because that only works if the page view tracking feature is enabled. Move the logic that selects the default sort order for package_search() out of lib and into package_search(). The package_search() action function now returns the sort order it used in the 'sort' key of the returned dict, and the package controller sends this to the templates to decide which sort ordering to show selected in the "Order by:" dropdown. Previously the package controller and action function each had their own logic and the dropdown was out of sync with the actual sort order. Fixes #714.
Also mention that tracking summary data is also available for resources via the API, not just packages.
@tobes The one thing I'm still stuck on is how to make resource previews (not just downloads) count towards resource view counts. The logic is all done in SQL. Also, if you have any thoughts on how to write tests for this. |
@seanh what counts as a resource preview? Isn't this just embeded in the resource page and so should be tracked or is there a more specific preview situation that you have in mind. We can probably just make some fixes to the resource previews to allow this if we need stand alone preview tracking. The logic is in SQL as this allowed it to be done most easily at the time maybe we need to unpick this if it is causing problems. Tests, we could probably dump some known data into the tracking table and then check that the summary info is correct when it is generated. Just making sure our totals are correct etc. I don't think we'd actually have to create too much data. This would be good to do before any reworking of the SQL. |
@tobes I mean viewing a resource page does not count towards the resource's view count. It does get logged in tracking_raw of course, but when raw is turned to summary it doesn't count towards the resource. Only clicks on the resource's download URL are counted. I'll have a go at some tests. The main thing is you would normally populate tracking_raw by visiting pages but that would involve the templates, maybe I'll do the tests with the old templates, or maybe I'll post directly to |
via as you say maybe at some point actually make x requests to ckan and then see how tracking appears but that feels like a step too far for now. let me know if you need anything |
The tracking_raw table was being created by a migration script, but was not actually defined anywhere in ckan's model. This means that in tests, ckan.model.rebuild_db() would not clear the tracking_raw table, and the old data would then leak it into tracking_summary as well. Adding tracking_raw to the model makes rebuild_db() behave as expected again.
If data_dict contained no 'sort', the default value was not being added if the data_dict did contain 'abort_search': True. The code nonetheless tries to access data_dict['sort'] later on and crashes. (Isn't this what schemas and validation are supposed to be for?) This was causing a test to fail. Adding the default 'sort' value to data_dict even if 'abort_search': True is there fixes it.
'relevance' is not actually in the solr schema so it doesn't do anything, score is the one we want.
Change create_arbitrary() to always create datasets in the same order, instead of a different order each time. Update pagination tests to expect datasets in the new default sort order.
This test depends on the order of the two datasets. The default sort order has changed now to most-recently-modified-first when there's no search query, and since the test creates the two datasets in a random order the search returns the datasets in a different order each time the test is run, giving a 50/50 chance that the test will fail. Change the test to not depend on the fixed order of the datasets.
Just clarify the docs a bit, and also mention that "recent" means last 14 days and that paster export is for datasets only.
Ok, I've fixed the worst bugs so that the feature is usable now, and I've added docs. I'll make another pull request with the tests for this feature, as the tests can wait until 2.1 and I don't want them holding up the bug fixes and docs which should go into 2.0. I've also added some github issues to resolve some of the outstanding issues in 2.1: @tobes @kindly Does one of you want to review and merge this pull request now? (@kindly the default sort order change we talked about is in this pull request) |
PS. I've added a topic label for collecting the issues related to this feature: https://github.com/okfn/ckan/issues?direction=asc&labels=tracking&milestone=&page=1&sort=created&state=open |
* Show a list of the most popular datasets | ||
* Export page-view data to a CSV file | ||
|
||
.. seealso:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see also
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, no, it's a sphinx command thingy
@seanh I've merged this - can you make a new branch for the 2.0 merge for @amercader as I think that might be easiest given the number of commits |
Sure, making a new branch now |
Sorting datasets by popularity only works if page view tracking is enabled, so don't show the 'Popular' option in the 'Order by:' dropdown if tracking isn't enabled.
The tracking_raw table was being created by a migration script, but was not actually defined anywhere in ckan's model. This means that in tests, ckan.model.rebuild_db() would not clear the tracking_raw table, and the old data would then leak it into tracking_summary as well. Adding tracking_raw to the model makes rebuild_db() behave as expected again.
If data_dict contained no 'sort', the default value was not being added if the data_dict did contain 'abort_search': True. The code nonetheless tries to access data_dict['sort'] later on and crashes. (Isn't this what schemas and validation are supposed to be for?) This was causing a test to fail. Adding the default 'sort' value to data_dict even if 'abort_search': True is there fixes it.
New page view tracking docs built here: http://docs.ckan.org/en/tracking-fixes/tracking.html