Tracking fixes #730

seanh · 2013-04-02T18:31:56Z

New page view tracking docs built here: http://docs.ckan.org/en/tracking-fixes/tracking.html

The ckan.tracking_enabled ini file setting was being ignored, so page view tracking wasn't working. Adding ckan.tracking_enabled to app_globals makes the setting work again. Fixes #728

Page view tracking was failing when a language was selected: 1. Put `ckan.tracking_enabled = true` in your ini file 2. Run CKAN, visit a page with a language e.g. `/en/dataset/annakarenina` 3. Run `paster tracking update`. If you look in your db, in the `tracking_summary` table there'll be a row with `package_id` `~~not~found~~`. 4. Run `paster tracking export tracking.csv`, the exported CSV file will say 0 views. If you visit the page without the language in the URL e.g. `/dataset/annakarenina` then run the export command again, the view does get counted. This commit fixes the SQL used by the `paster tracking update/export` command to handle URLs with or without languages at the start. Fixes #729

Add documentation for the page view tracking feature. There are still some details to document, including how to show number of page views on the pages themselves (eg. dataset views next to datasets) and how to show "popular" labels next to popular datasets. It's also possible to get a list of the N most popular datasets, for example for the front page, perhaps this should be documented too.

Should have been added in commit 3eeeef2 but I forgot.

seanh · 2013-04-03T11:39:34Z

Sorting datasets by popularity only works if page view tracking is enabled, so don't show the 'Popular' option in the 'Order by:' dropdown if tracking isn't enabled.

Change the default sort order of package_search() to 'relevance asc, metadata_modified desc'. We want to sort by relevance by default, but when there's no search query relevance is meaningless, in that case fall back on showing the most recently modified datasets first. Also changes the sort ordering of the "Relevance" option in the "Order by:" dropdown to 'relevance asc, metadata_modified desc' instead of just 'relevance asc'. The previous default ordering was 'score desc, name asc'. I'm not even sure if that works, it seems to disagree with the sort strings that the dropdown gives you, when you chose relevance from the dropdown you got 'relevance asc' not 'score desc' (and the datasets appeared in a different order then the default), and when you chose name you get 'title_string' not name. Previously we've fallen back on showing datasets alphabetically but that's boring as it simply means that all the datasets beginning with 'a' are always shown. Last modified seems more interesting and changes over time. Popularity is not an option because that only works if the page view tracking feature is enabled. Move the logic that selects the default sort order for package_search() out of lib and into package_search(). The package_search() action function now returns the sort order it used in the 'sort' key of the returned dict, and the package controller sends this to the templates to decide which sort ordering to show selected in the "Order by:" dropdown. Previously the package controller and action function each had their own logic and the dropdown was out of sync with the actual sort order. Fixes #714.

Also mention that tracking summary data is also available for resources via the API, not just packages.

…ates

seanh · 2013-04-03T15:09:48Z

@tobes The one thing I'm still stuck on is how to make resource previews (not just downloads) count towards resource view counts. The logic is all done in SQL. Also, if you have any thoughts on how to write tests for this.

tobes · 2013-04-04T10:09:00Z

@seanh what counts as a resource preview? Isn't this just embeded in the resource page and so should be tracked or is there a more specific preview situation that you have in mind. We can probably just make some fixes to the resource previews to allow this if we need stand alone preview tracking.

The logic is in SQL as this allowed it to be done most easily at the time maybe we need to unpick this if it is causing problems.

Tests, we could probably dump some known data into the tracking table and then check that the summary info is correct when it is generated. Just making sure our totals are correct etc. I don't think we'd actually have to create too much data. This would be good to do before any reworking of the SQL.

seanh · 2013-04-04T10:21:03Z

@seanh what counts as a resource preview? Isn't this just embeded in the resource page and so should be
tracked or is there a more specific preview situation that you have in mind. We can probably just make some f
fixes to the resource previews to allow this if we need stand alone preview tracking.

@tobes I mean viewing a resource page does not count towards the resource's view count. It does get logged in tracking_raw of course, but when raw is turned to summary it doesn't count towards the resource. Only clicks on the resource's download URL are counted.

I'll have a go at some tests. The main thing is you would normally populate tracking_raw by visiting pages but that would involve the templates, maybe I'll do the tests with the old templates, or maybe I'll post directly to /_tracking or even populate the database directly. After that I'll probably do the tests functional style, just check that the tracking summary data that comes out is as expected.

tobes · 2013-04-04T10:26:40Z

via /_tracking would probably be best also /fr/_tracking ;) so we test the whole path.

as you say maybe at some point actually make x requests to ckan and then see how tracking appears but that feels like a step too far for now.

let me know if you need anything

The tracking_raw table was being created by a migration script, but was not actually defined anywhere in ckan's model. This means that in tests, ckan.model.rebuild_db() would not clear the tracking_raw table, and the old data would then leak it into tracking_summary as well. Adding tracking_raw to the model makes rebuild_db() behave as expected again.

If data_dict contained no 'sort', the default value was not being added if the data_dict did contain 'abort_search': True. The code nonetheless tries to access data_dict['sort'] later on and crashes. (Isn't this what schemas and validation are supposed to be for?) This was causing a test to fail. Adding the default 'sort' value to data_dict even if 'abort_search': True is there fixes it.

CKAN used to convert 'sort': 'rank' into 'sort': 'score desc, namae asc' but since commit ca04063 ([#714] Fix default sort ordering) it no longer does this, which breaks these tests. Update the tests to pass 'score desc, name asc' directly instead.

'relevance' is not actually in the solr schema so it doesn't do anything, score is the one we want.

Change create_arbitrary() to always create datasets in the same order, instead of a different order each time. Update pagination tests to expect datasets in the new default sort order.

This test depends on the order of the two datasets. The default sort order has changed now to most-recently-modified-first when there's no search query, and since the test creates the two datasets in a random order the search returns the datasets in a different order each time the test is run, giving a 50/50 chance that the test will fail. Change the test to not depend on the fixed order of the datasets.

Just clarify the docs a bit, and also mention that "recent" means last 14 days and that paster export is for datasets only.

seanh · 2013-04-09T12:38:18Z

Ok, I've fixed the worst bugs so that the feature is usable now, and I've added docs.

I'll make another pull request with the tests for this feature, as the tests can wait until 2.1 and I don't want them holding up the bug fixes and docs which should go into 2.0.

I've also added some github issues to resolve some of the outstanding issues in 2.1:

#741
#742

@tobes @kindly Does one of you want to review and merge this pull request now? (@kindly the default sort order change we talked about is in this pull request)

seanh · 2013-04-09T12:40:56Z

PS. I've added a topic label for collecting the issues related to this feature: https://github.com/okfn/ckan/issues?direction=asc&labels=tracking&milestone=&page=1&sort=created&state=open

tobes · 2013-04-16T14:26:19Z

doc/tracking.rst

+* Show a list of the most popular datasets
+* Export page-view data to a CSV file
+
+.. seealso::


Hmm, no, it's a sphinx command thingy

tobes · 2013-04-16T14:54:12Z

@seanh I've merged this - can you make a new branch for the 2.0 merge for @amercader as I think that might be easiest given the number of commits

seanh · 2013-04-16T16:15:02Z

Sure, making a new branch now

Sorting datasets by popularity only works if page view tracking is enabled, so don't show the 'Popular' option in the 'Order by:' dropdown if tracking isn't enabled.

The tracking_raw table was being created by a migration script, but was not actually defined anywhere in ckan's model. This means that in tests, ckan.model.rebuild_db() would not clear the tracking_raw table, and the old data would then leak it into tracking_summary as well. Adding tracking_raw to the model makes rebuild_db() behave as expected again.

If data_dict contained no 'sort', the default value was not being added if the data_dict did contain 'abort_search': True. The code nonetheless tries to access data_dict['sort'] later on and crashes. (Isn't this what schemas and validation are supposed to be for?) This was causing a test to fail. Adding the default 'sort' value to data_dict even if 'abort_search': True is there fixes it.

Sean Hammond added 4 commits April 2, 2013 20:11

[#728] Add ckan.tracking_enabled to app_globals

d242bb9

The ckan.tracking_enabled ini file setting was being ignored, so page view tracking wasn't working. Adding ckan.tracking_enabled to app_globals makes the setting work again. Fixes #728

[#541] Add popularity-sorting screenshot

8db7664

Should have been added in commit 3eeeef2 but I forgot.

Sean Hammond added 6 commits April 3, 2013 14:10

[#730] Only show 'Popular' sort option if tracking enabled

52005cb

Sorting datasets by popularity only works if page view tracking is enabled, so don't show the 'Popular' option in the 'Order by:' dropdown if tracking isn't enabled.

[#541] Add docs for highlighting popular datasets and resources

8d4c366

Also mention that tracking summary data is also available for resources via the API, not just packages.

[#541] Document how to show dataset and resource view counts in templ…

4822caa

…ates

[#541] Document showing most popular datasets on front page

a38ce6b

[#541] Reference Google Analytics extension in Tracking docs

b0ce83c

Sean Hammond added 7 commits April 6, 2013 21:41

[#714] Use score not relevance in default sort order

bc496ff

'relevance' is not actually in the solr schema so it doesn't do anything, score is the one we want.

[#714] Fix pagination tests

85b30ca

Change create_arbitrary() to always create datasets in the same order, instead of a different order each time. Update pagination tests to expect datasets in the new default sort order.

[#541] Tweak page view tracking docs

5bd3efa

Just clarify the docs a bit, and also mention that "recent" means last 14 days and that paster export is for datasets only.

[#541] Small tweak to page view tracking docs

7c7f332

seanh mentioned this pull request Apr 9, 2013

Add tests for builtin page view tracking feature #744

Merged

ghost assigned tobes Apr 16, 2013

tobes reviewed Apr 16, 2013
View reviewed changes

tobes merged commit 7c7f332 into master Apr 16, 2013

seanh mentioned this pull request Apr 16, 2013

Tracking fixes branched off 2.0 #774

Merged

nigelbabu deleted the tracking-fixes branch April 11, 2014 05:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracking fixes #730

Tracking fixes #730

seanh commented Apr 2, 2013

seanh commented Apr 3, 2013

seanh commented Apr 3, 2013

tobes commented Apr 4, 2013

seanh commented Apr 4, 2013

tobes commented Apr 4, 2013

seanh commented Apr 9, 2013

seanh commented Apr 9, 2013

tobes Apr 16, 2013

seanh Apr 16, 2013

tobes commented Apr 16, 2013

seanh commented Apr 16, 2013

Tracking fixes #730

Tracking fixes #730

Conversation

seanh commented Apr 2, 2013

seanh commented Apr 3, 2013

seanh commented Apr 3, 2013

tobes commented Apr 4, 2013

seanh commented Apr 4, 2013

tobes commented Apr 4, 2013

seanh commented Apr 9, 2013

seanh commented Apr 9, 2013

tobes Apr 16, 2013

Choose a reason for hiding this comment

seanh Apr 16, 2013

Choose a reason for hiding this comment

tobes commented Apr 16, 2013

seanh commented Apr 16, 2013