Service: Establish Dataverse Project and Harvard Dataverse Repository Metrics Service #271

Open
3 tasks
cmbz opened this issue Jun 14, 2024 · 2 comments
Labels

  • Dataverse Project (Issues related to Dataverse Project software)
  • GREI 4 Analytics and Reporting
  • Harvard Dataverse (Issues related to Harvard Dataverse Repository)
  • Project: Metrics (Tasks related to developing a Dataverse and HDV metrics service)

Comments

@cmbz
Contributor

cmbz commented Jun 14, 2024

Purpose

  • Design, implement & operationalize a metrics service for the Dataverse Project and Harvard Dataverse Repository.

Background

  • A number of metrics about the Dataverse Project and Harvard Dataverse are collected at different time intervals and published to different locations. For instance, some metrics are available from the Dataverse.org website (https://dataverse.org/metrics) and the IQSS metrics webpage (https://www.iq.harvard.edu/metrics). Metrics also appear on the Harvard Dataverse Repository homepage (https://dataverse.harvard.edu/). Usage metrics for individual datasets are also shown on Harvard Dataverse dataset pages. Other metrics related to Harvard Dataverse Repository (e.g., the number of user searches or the most popular user searches) are currently produced and tracked by the software but not always publicized.
  • Within IQSS there is a need for ready access to these metrics, and to additional new metrics about the Dataverse Project and Harvard Dataverse Repository, for monitoring and marketing purposes and to quantify and demonstrate the growth of the software and repository over time.
  • The work defined in this proposal will:
    • Centralize all Dataverse and Harvard Dataverse-related metrics
    • Identify the existing and new elements needed to collect and present Dataverse Project and Harvard Dataverse Repository metrics
    • Improve existing or implement new processes (technical and administrative) to collect desired data and produce metrics in a repeatable, highly automated fashion.
    • Establish an implementation timeline, including specific short, medium, and long term tasks needed to produce all required metrics
    • Define a repeatable, highly automated process for aggregating and delivering existing metrics and for supporting new metrics as needed.

Proposed Approach

  • Create a new service called Dataverse Hub, located at hub.dataverse.org, comprising the following:

    • A PostgreSQL database that stores the metrics (with dates/timestamps)
    • A simple server that serves APIs to get access to stats
    • A set of batch scripts that accumulate stats from HDV and the DV network, including search queries.
    • Create either a small new package or collaborate with Jan on pyDataverse, so that everyone can get the same numbers via Python. We can decide later whether we also want to use pyDataverse to write the stats.
    • Create a set of tests to check for unwanted behaviors (such as sudden drops or other anomalies, TBD)
    • Normalize the definitions of all the stats, and add the metrics that were discussed with Gary recently
    • Define better graphical representations for the data. We should probably think about more dynamic frameworks where we can select stats by installation, by subject, etc., compare them, or do other simple UX interactions. Possibly embed the framework in 1), 2) and 3).
  • By implementing the above steps, hub.dataverse.org will become the single authoritative source of information for all three of the original metrics pages.
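
The "sudden drop" check mentioned in the list above could look something like the following. This is only a minimal sketch under stated assumptions: the `MetricPoint` record, the `find_sudden_drops` function, the metric name, the sample values, and the 20% threshold are all hypothetical and are not part of any existing Dataverse or pyDataverse API. In practice the points would come from the timestamped metrics stored in the database.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricPoint:
    """One timestamped observation of a metric (hypothetical record shape)."""
    name: str
    collected_on: date
    value: int


def find_sudden_drops(points, max_drop_fraction=0.2):
    """Return (previous, current) pairs where a metric fell by more than
    max_drop_fraction between two consecutive observations.

    The 0.2 default is an illustrative assumption, not an agreed threshold.
    """
    by_name = defaultdict(list)
    for point in points:
        by_name[point.name].append(point)

    flagged = []
    for series in by_name.values():
        ordered = sorted(series, key=lambda p: p.collected_on)
        for previous, current in zip(ordered, ordered[1:]):
            if previous.value <= 0:
                continue  # a relative drop is undefined from zero
            drop = (previous.value - current.value) / previous.value
            if drop > max_drop_fraction:
                flagged.append((previous, current))
    return flagged


# Made-up sample data for illustration only.
sample = [
    MetricPoint("datasets", date(2024, 1, 1), 100),
    MetricPoint("datasets", date(2024, 2, 1), 110),
    MetricPoint("datasets", date(2024, 3, 1), 50),
]
for previous, current in find_sudden_drops(sample):
    print(f"{current.name}: {previous.value} -> {current.value} on {current.collected_on}")
```

A check like this could run after each batch collection, so that a cumulative count that unexpectedly shrinks is flagged before it is published.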

Harvard Dataverse Repository Metrics

The service will provide the following:

  • Regular automated monitoring and sharing of metrics about HDV and related services, including storage usage, downloads, and curation
  • Static and interactive visualizations of data about HDV for visual exploration and use in Web pages and reports
  • Input for IQSS and Harvard Library strategic planning efforts and HDV research projects
  • Support for user requests for dataset creation and usage reports for NIH and other grants

Tasks

Related

Resources

@cmbz cmbz added the Harvard Dataverse (Issues related to Harvard Dataverse Repository) and Dataverse Project (Issues related to Dataverse Project software) labels Jun 14, 2024
@cmbz cmbz self-assigned this Jun 14, 2024
@cmbz
Contributor Author

cmbz commented Jun 14, 2024

Status: June 2024

@cmbz
Contributor Author

cmbz commented Jun 14, 2024

Status: July 2024

@cmbz cmbz changed the title from "Project: Establish Dataverse Project Analytics Program" to "Project: Establish Dataverse Project and Harvard Dataverse Repository Analytics Program" Jun 14, 2024
@cmbz cmbz added the Project: Metrics (Tasks related to developing a Dataverse and HDV metrics service) label Jun 14, 2024
@cmbz cmbz changed the title from "Project: Establish Dataverse Project and Harvard Dataverse Repository Analytics Program" to "Service: Establish Dataverse Project and Harvard Dataverse Repository Metrics Service" Jun 24, 2024
@cmbz cmbz added the GREI 4 Analytics and Reporting label Jul 1, 2024