DSpace 7 community and collection statistics are inaccurate #9275

Closed
alanorth opened this issue Jan 22, 2024 · 1 comment
Labels
bug component: statistics Related to Statistics (Solr or Google Analytics) needs triage New issue needs triage and/or scheduling

Comments

@alanorth
Contributor

alanorth commented Jan 22, 2024

Describe the bug
Statistics displayed on the "Statistics" page of a community or collection are inaccurate (in my opinion). They seem to reflect only views of the community or collection page itself, rather than also counting its child communities, collections, items, and bitstreams.

To Reproduce
Steps to reproduce the behavior:

  1. View the "Statistics" page for a community or collection and note its UUID.
  2. Use Solr to search for statistics records that belong directly to that UUID, for example: id:d09e6100-df60-4280-9ff3-f3fea5ea4e6b. These are views of the community/collection page itself.
  3. Use Solr to search for statistics records belonging to children of that UUID, for example: owningComm:d09e6100-df60-4280-9ff3-f3fea5ea4e6b (or owningColl for a collection).
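The reproduction steps can be scripted. The following is a minimal Python sketch that builds the two count queries as Solr select URLs and reads numFound from the JSON response. It assumes the same local Solr core (http://localhost:8983/solr/statistics) used in the curl examples in this issue; the helper names (count_url, direct_query, children_query, num_found, fetch_count) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the local "statistics" Solr core used in the curl examples.
SOLR_SELECT = "http://localhost:8983/solr/statistics/select"


def count_url(query: str) -> str:
    """Build a select URL that returns only the hit count (rows=0)."""
    return SOLR_SELECT + "?" + urlencode({"q": query, "rows": 0})


def direct_query(uuid: str) -> str:
    """Step 2: records for the community/collection page itself."""
    return f"id:{uuid}"


def children_query(uuid: str, is_collection: bool = False) -> str:
    """Step 3: records owned by the community (owningComm) or collection (owningColl)."""
    field = "owningColl" if is_collection else "owningComm"
    return f"{field}:{uuid}"


def num_found(body: str) -> int:
    """Extract response.numFound from a Solr JSON response body."""
    return json.loads(body)["response"]["numFound"]


def fetch_count(query: str) -> int:
    """Run a count query against Solr (needs a reachable Solr instance)."""
    with urlopen(count_url(query)) as resp:
        return num_found(resp.read().decode("utf-8"))


if __name__ == "__main__":
    uuid = "d09e6100-df60-4280-9ff3-f3fea5ea4e6b"
    # Only print the URLs here; call fetch_count() when Solr is available.
    print(count_url(direct_query(uuid)))
    print(count_url(children_query(uuid)))
```

Comparing fetch_count(direct_query(uuid)) with fetch_count(children_query(uuid)) shows the gap described in this issue.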

Expected behavior
The number of "views" of a community or collection should take into account all children of the community or collection, in addition to the community or collection page itself. We already have the owningComm and owningColl fields in Solr managed by DSpace. We can use those to show more accurate statistics than we currently do, without much work.
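One way to express "the page itself plus all of its children" is a single Solr query that ORs the id clause with the owningComm (or owningColl) clause. This is an illustrative sketch, not DSpace's implementation; scope_query is a hypothetical helper. Because Solr matches each document at most once, a record that happened to satisfy both clauses would still be counted only once in numFound.

```python
def scope_query(uuid: str, is_collection: bool = False) -> str:
    """Match statistics records for the container page itself OR anything it owns.

    Hypothetical helper: combines the "id:" query with the
    owningComm/owningColl query in one boolean OR. Solr matches each
    document at most once, so numFound does not double-count a record.
    """
    owner_field = "owningColl" if is_collection else "owningComm"
    return f"(id:{uuid} OR {owner_field}:{uuid})"


if __name__ == "__main__":
    print(scope_query("d09e6100-df60-4280-9ff3-f3fea5ea4e6b"))
```

Passed as q= with rows=0, as in the curl examples, the resulting numFound would be the combined count for the community and everything it owns.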

For example, our DSpace repository has years of Solr statistics and currently shows this for the simple id query of a large community:

```console
$ curl 'http://localhost:8983/solr/statistics/select?q=id:d09e6100-df60-4280-9ff3-f3fea5ea4e6b&rows=0'
{
...
  "response":{"numFound":2543,"start":0,"numFoundExact":true,"docs":[]
}}
```

But there are many, many more records if we include the child communities, collections, item pages, and bitstreams via the owningComm field:

```console
$ curl 'http://localhost:8983/solr/statistics/select?q=owningComm:d09e6100-df60-4280-9ff3-f3fea5ea4e6b&rows=0'
{
...
  "response":{"numFound":6461926,"start":0,"numFoundExact":true,"docs":[]
}}
```

In addition to this, at our institution we disaggregate statistics into "views" and "downloads" using the type field (see DSpace Constants¹):

  • A "view" is a hit on the HTML landing page of a community, collection, or item (types 4, 3, and 2 respectively)
  • A "download" is a "view" of a bitstream (type 0) whose bundleName is ORIGINAL
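As a rough illustration, those two definitions map onto Solr filter queries (fq) on the type and bundleName fields. The sketch assumes the DSpace Constants values BITSTREAM = 0, ITEM = 2, COLLECTION = 3, COMMUNITY = 4 and the local Solr core from the curl examples, and it scopes the query to a community by ORing the id and owningComm clauses from the reproduction steps. The function names are hypothetical, and it does not reproduce every filter a production indexer (such as dspace-statistics-api) might apply.

```python
from urllib.parse import urlencode

# DSpace Constants: BITSTREAM = 0, ITEM = 2, COLLECTION = 3, COMMUNITY = 4
BITSTREAM = 0
LANDING_PAGE_TYPES = (2, 3, 4)  # item, collection, community pages

SOLR_SELECT = "http://localhost:8983/solr/statistics/select"  # assumed local core


def views_filter() -> str:
    """A "view": a hit on an item, collection, or community landing page."""
    return "type:(" + " OR ".join(str(t) for t in LANDING_PAGE_TYPES) + ")"


def downloads_filter() -> str:
    """A "download": a bitstream hit whose bundleName is ORIGINAL."""
    return f"type:{BITSTREAM} AND bundleName:ORIGINAL"


def scoped_count_url(uuid: str, fq: str, is_collection: bool = False) -> str:
    """Count records for the container itself or its children, filtered by fq."""
    owner_field = "owningColl" if is_collection else "owningComm"
    params = {"q": f"(id:{uuid} OR {owner_field}:{uuid})", "fq": fq, "rows": 0}
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    uuid = "d09e6100-df60-4280-9ff3-f3fea5ea4e6b"
    print(scoped_count_url(uuid, views_filter()))
    print(scoped_count_url(uuid, downloads_filter()))
```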

So in Solr I can see this community had:

  • 1872043 "views"
  • 2582247 "downloads"
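Both numbers could also be fetched in a single request using Solr facet queries. This sketch is an illustration under stated assumptions, not how dspace-statistics-api actually computes them: it sends rows=0 with two facet.query parameters, names each one with the {!key=...} local parameter, and reads the counts from facet_counts.facet_queries in the JSON response. The Solr URL is the local core from the curl examples, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/statistics/select"  # assumed local core

VIEWS_FQ = "type:(2 OR 3 OR 4)"  # item, collection, community landing pages
DOWNLOADS_FQ = "type:0 AND bundleName:ORIGINAL"  # bitstreams in ORIGINAL


def facet_count_url(uuid: str, is_collection: bool = False) -> str:
    """One request returning view and download counts as facet queries."""
    owner_field = "owningColl" if is_collection else "owningComm"
    params = [
        ("q", f"(id:{uuid} OR {owner_field}:{uuid})"),
        ("rows", "0"),
        ("facet", "true"),
        # {!key=...} names each facet query in the response
        ("facet.query", "{!key=views}" + VIEWS_FQ),
        ("facet.query", "{!key=downloads}" + DOWNLOADS_FQ),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


def parse_counts(body: str) -> dict:
    """Read the keyed counts from facet_counts.facet_queries."""
    facet_queries = json.loads(body)["facet_counts"]["facet_queries"]
    return {"views": facet_queries["views"], "downloads": facet_queries["downloads"]}
```

Each facet count is computed over the documents matched by q, so the views and downloads are both restricted to the community and its children.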

You can see how we count views and downloads in our standalone dspace-statistics-api indexer.

Related work


¹ See dspace-api/src/main/java/org/dspace/core/Constants.java

@alanorth alanorth added bug component: statistics Related to Statistics (Solr or Google Analytics) needs triage New issue needs triage and/or scheduling labels Jan 22, 2024
@alanorth
Contributor Author

alanorth commented Mar 1, 2024

This is a duplicate of #8572.

@alanorth alanorth closed this as completed Mar 1, 2024
Projects
Status: Done / Closed