Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking fixes #730

Merged
merged 18 commits into from
Apr 16, 2013
Merged

Tracking fixes #730

merged 18 commits into from
Apr 16, 2013

Conversation

seanh
Copy link
Contributor

@seanh seanh commented Apr 2, 2013

New page view tracking docs built here: http://docs.ckan.org/en/tracking-fixes/tracking.html

Sean Hammond added 4 commits April 2, 2013 20:11
The ckan.tracking_enabled ini file setting was being ignored, so page
view tracking wasn't working. Adding ckan.tracking_enabled to
app_globals makes the setting work again.

Fixes #728
Page view tracking was failing when a language was selected:

1. Put `ckan.tracking_enabled = true` in your ini file
2. Run CKAN, visit a page with a language e.g. `/en/dataset/annakarenina`
3. Run `paster tracking update`. If you look in your db, in the
   `tracking_summary` table there'll be a row with `package_id`
   `~~not~found~~`.
4. Run `paster tracking export tracking.csv`, the exported CSV file will
   say 0 views.

If you visit the page without the language in the URL e.g.
`/dataset/annakarenina` then run the export command again, the view does
get counted.

This commit fixes the SQL used by the `paster tracking update/export`
command to handle URLs with or without languages at the start.

Fixes #729
Add documentation for the page view tracking feature. There are still
some details to document, including how to show number of page views on
the pages themselves (eg. dataset views next to datasets) and how to
show "popular" labels next to popular datasets. It's also possible to
get a list of the N most popular datasets, for example for the front
page, perhaps this should be documented too.
Should have been added in commit 3eeeef2 but I forgot.
@seanh
Copy link
Contributor Author

seanh commented Apr 3, 2013

Still to do:

  • Disable "Popular" sorting when tracking is disabled
  • Change default sort ordering on dataset search page (currently looks like it's "Popular" even if tracking is disabled)
  • Add reference to google analytics extension in tracking docs
  • Add tests for the tracking feature (assuming there aren't any yet)
  • Document showing view counts for datasets and resources on pages
  • Document showing "popular" labels next to datasets (and/or resources?) on pages
  • Document showing most popular N datasets (or resrouces?) on front page?
  • Resource previews are not counted (only downloads)
  • Document that recent views means last two weeks
  • Do resource views count towards dataset views?
  • paster export seems to be for dataset views only, document this
  • page view tracking (for non-resource or dataset pages) is a bit useless, the data is not exported from the database anywhere

Sean Hammond added 6 commits April 3, 2013 14:10
Sorting datasets by popularity only works if page view tracking is
enabled, so don't show the 'Popular' option in the 'Order by:' dropdown
if tracking isn't enabled.
Change the default sort order of package_search() to 'relevance asc,
metadata_modified desc'. We want to sort by relevance by default, but
when there's no search query relevance is meaningless, in that case fall
back on showing the most recently modified datasets first. Also changes
the sort ordering of the "Relevance" option in the "Order by:" dropdown
to 'relevance asc, metadata_modified desc' instead of just 'relevance
asc'.

The previous default ordering was 'score desc, name asc'. I'm not even
sure if that works, it seems to disagree with the sort strings that the
dropdown gives you, when you chose relevance from the dropdown you got
'relevance asc' not 'score desc' (and the datasets appeared in a
different order then the default), and when you chose name you get
'title_string' not name.

Previously we've fallen back on showing datasets alphabetically but
that's boring as it simply means that all the datasets beginning with 'a'
are always shown. Last modified seems more interesting and changes over
time. Popularity is not an option because that only works if the page
view tracking feature is enabled.

Move the logic that selects the default sort order for package_search()
out of lib and into package_search().

The package_search() action function now returns the sort order it used
in the 'sort' key of the returned dict, and the package controller sends
this to the templates to decide which sort ordering to show selected in
the "Order by:" dropdown. Previously the package controller and action
function each had their own logic and the dropdown was out of sync with
the actual sort order.

Fixes #714.
Also mention that tracking summary data is also available for resources
via the API, not just packages.
@seanh
Copy link
Contributor Author

seanh commented Apr 3, 2013

@tobes The one thing I'm still stuck on is how to make resource previews (not just downloads) count towards resource view counts. The logic is all done in SQL. Also, if you have any thoughts on how to write tests for this.

@tobes
Copy link
Contributor

tobes commented Apr 4, 2013

@seanh what counts as a resource preview? Isn't this just embeded in the resource page and so should be tracked or is there a more specific preview situation that you have in mind. We can probably just make some fixes to the resource previews to allow this if we need stand alone preview tracking.

The logic is in SQL as this allowed it to be done most easily at the time maybe we need to unpick this if it is causing problems.

Tests, we could probably dump some known data into the tracking table and then check that the summary info is correct when it is generated. Just making sure our totals are correct etc. I don't think we'd actually have to create too much data. This would be good to do before any reworking of the SQL.

@seanh
Copy link
Contributor Author

seanh commented Apr 4, 2013

@seanh what counts as a resource preview? Isn't this just embeded in the resource page and so should be
tracked or is there a more specific preview situation that you have in mind. We can probably just make some f
fixes to the resource previews to allow this if we need stand alone preview tracking.

@tobes I mean viewing a resource page does not count towards the resource's view count. It does get logged in tracking_raw of course, but when raw is turned to summary it doesn't count towards the resource. Only clicks on the resource's download URL are counted.

I'll have a go at some tests. The main thing is you would normally populate tracking_raw by visiting pages but that would involve the templates, maybe I'll do the tests with the old templates, or maybe I'll post directly to /_tracking or even populate the database directly. After that I'll probably do the tests functional style, just check that the tracking summary data that comes out is as expected.

@tobes
Copy link
Contributor

tobes commented Apr 4, 2013

via /_tracking would probably be best also /fr/_tracking ;) so we test the whole path.

as you say maybe at some point actually make x requests to ckan and then see how tracking appears but that feels like a step too far for now.

let me know if you need anything

Sean Hammond added 7 commits April 6, 2013 21:41
The tracking_raw table was being created by a migration script, but was
not actually defined anywhere in ckan's model. This means that in tests,
ckan.model.rebuild_db() would not clear the tracking_raw table, and the
old data would then leak it into tracking_summary as well. Adding
tracking_raw to the model makes rebuild_db() behave as expected again.
If data_dict contained no 'sort', the default value was not being added
if the data_dict did contain 'abort_search': True. The code nonetheless
tries to access data_dict['sort'] later on and crashes. (Isn't this what
schemas and validation are supposed to be for?) This was causing a test
to fail. Adding the default 'sort' value to data_dict even if
'abort_search': True is there fixes it.
CKAN used to convert 'sort': 'rank' into 'sort': 'score desc, namae asc'
but since commit ca04063 ([#714] Fix
default sort ordering) it no longer does this, which breaks these tests.
Update the tests to pass 'score desc, name asc' directly instead.
'relevance' is not actually in the solr schema so it doesn't do
anything, score is the one we want.
Change create_arbitrary() to always create datasets in the same order,
instead of a different order each time. Update pagination tests to
expect datasets in the new default sort order.
This test depends on the order of the two datasets. The default sort
order has changed now to most-recently-modified-first when there's no
search query, and since the test creates the two datasets in a random
order the search returns the datasets in a different order each time the
test is run, giving a 50/50 chance that the test will fail.

Change the test to not depend on the fixed order of the datasets.
Just clarify the docs a bit, and also mention that "recent" means last
14 days and that paster export is for datasets only.
@seanh
Copy link
Contributor Author

seanh commented Apr 9, 2013

Ok, I've fixed the worst bugs so that the feature is usable now, and I've added docs.

I'll make another pull request with the tests for this feature, as the tests can wait until 2.1 and I don't want them holding up the bug fixes and docs which should go into 2.0.

I've also added some github issues to resolve some of the outstanding issues in 2.1:

#741
#742

@tobes @kindly Does one of you want to review and merge this pull request now? (@kindly the default sort order change we talked about is in this pull request)

@seanh
Copy link
Contributor Author

seanh commented Apr 9, 2013

PS. I've added a topic label for collecting the issues related to this feature: https://github.com/okfn/ckan/issues?direction=asc&labels=tracking&milestone=&page=1&sort=created&state=open

* Show a list of the most popular datasets
* Export page-view data to a CSV file

.. seealso::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see also

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, no, it's a sphinx command thingy

@tobes tobes merged commit 7c7f332 into master Apr 16, 2013
@tobes
Copy link
Contributor

tobes commented Apr 16, 2013

@seanh I've merged this - can you make a new branch for the 2.0 merge for @amercader as I think that might be easiest given the number of commits

@seanh
Copy link
Contributor Author

seanh commented Apr 16, 2013

Sure, making a new branch now

seanh pushed a commit that referenced this pull request Apr 16, 2013
Sorting datasets by popularity only works if page view tracking is
enabled, so don't show the 'Popular' option in the 'Order by:' dropdown
if tracking isn't enabled.
seanh pushed a commit that referenced this pull request Apr 16, 2013
The tracking_raw table was being created by a migration script, but was
not actually defined anywhere in ckan's model. This means that in tests,
ckan.model.rebuild_db() would not clear the tracking_raw table, and the
old data would then leak it into tracking_summary as well. Adding
tracking_raw to the model makes rebuild_db() behave as expected again.
seanh pushed a commit that referenced this pull request Apr 16, 2013
If data_dict contained no 'sort', the default value was not being added
if the data_dict did contain 'abort_search': True. The code nonetheless
tries to access data_dict['sort'] later on and crashes. (Isn't this what
schemas and validation are supposed to be for?) This was causing a test
to fail. Adding the default 'sort' value to data_dict even if
'abort_search': True is there fixes it.
@nigelbabu nigelbabu deleted the tracking-fixes branch April 11, 2014 05:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants