VTUL/figshare_API_data_use_metrics

Project Info

These scripts retrieve Figshare statistics for specified categories. They combine item-id retrieval through a search query with statistics-gathering code provided by Jonathan Petters.

About the scripts

  • The first function itemids_for_categories() in figshare_statistics_for_categories.py generates item ids associated with the given categories.
  • The second function figshare_categorystatistics(itemList) takes an item list and creates a JSON file for all the metadata associated with the item ids.
  • The third function fetch_figshare_statistics(file_path) gathers figshare statistics for the provided CSV file.
  • The new script FigStats-iType-dRange.py:
    • Processes Figshare statistics for datasets within specific date ranges.
    • Generates 5-day intervals for a given year and retrieves metadata and statistics for items posted within those intervals.
    • Saves results as JSON and CSV files, including detailed view and download statistics.

How to run the category scripts

  • Go to the Figshare dataset "category numbers matching with category names" and download the Figshare categories as a CSV file named Figshare_categories.csv.
  • Open figshare_statistics.py and set 'file_path' to the location of the Figshare categories CSV file.
  • On line 25, set category_search='main_category' or category_search='sub_category', depending on which column of Figshare_categories.csv you want to search. 'main_category' corresponds to the category ids in column 2, and 'sub_category' corresponds to the category ids in column 3.
  • Set category_to_find to the category id to look up (for example, category_to_find = '3704') on line 31 for 'main_category' or on line 36 for 'sub_category'.
  • Run figshare_statistics.py.
  • This generates a CSV file with views and downloads: views_and_downloads_figshare_[current_date]
  • Two other csv files are also generated:
    1. LimitedMetadataInFigshare_report_datasets[current_date].csv: limited metadata for the items related to the category picked (by the code)
    2. allMetadataInFigshare_report_datasets[current_date].csv: all the metadata for the items related to the category picked (by the code)
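The category lookup described above can be pictured with a short sketch. It is not the repository's code: the function name find_category_rows, the column headers, and the sample rows are all hypothetical, and it only assumes that column 2 holds main category ids and column 3 holds sub category ids.

```python
import csv
import io

# Hypothetical sample standing in for Figshare_categories.csv; the real
# column headers and values may differ.
SAMPLE_CSV = """category_name,main_category,sub_category
Example A,3704,
Example B,3704,370401
Example C,3400,
"""

# Assumed mapping from category_search to a zero-based column index
# (column 2 -> index 1, column 3 -> index 2).
COLUMN_INDEX = {"main_category": 1, "sub_category": 2}


def find_category_rows(csv_file, category_search, category_to_find):
    """Return the rows whose selected column equals category_to_find."""
    column = COLUMN_INDEX[category_search]
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    return [
        row for row in reader
        if len(row) > column and row[column].strip() == category_to_find
    ]


matches = find_category_rows(io.StringIO(SAMPLE_CSV), "main_category", "3704")
for row in matches:
    print(row[0])
```

Passing an open file handle instead of io.StringIO would apply the same lookup to a downloaded Figshare_categories.csv, provided its columns follow the assumed layout.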

How to run the FigStats-iType-dRange code

Running FigStats-iType-dRange.py

  1. Open FigStats-iType-dRange.py.
  2. Change save_directory to the desired directory and run the script. It currently gathers statistics for 2022 in 5-day increments. To use a different year, change the year in "date_ranges.extend(generate_date_ranges(2022, month))"; statistics are then gathered in 5-day increments for that year. To restrict the run to specific days instead, uncomment the test range shown below and run the script:
    date_ranges = [
        ("2022-01-01", "2022-01-02")  # Replace with any desired 2-day range
    ]
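For orientation, here is one way a helper like generate_date_ranges could produce 5-day intervals. This is a sketch, not the script's actual implementation: the step_days parameter and the choice of inclusive end dates are assumptions, and the real function may split months differently.

```python
import calendar
from datetime import date, timedelta


def generate_date_ranges(year, month, step_days=5):
    """Return (start, end) ISO date pairs covering one month in step_days chunks.

    End dates are treated as inclusive here, which is an assumption.
    """
    last_day = calendar.monthrange(year, month)[1]
    month_end = date(year, month, last_day)
    ranges = []
    start = date(year, month, 1)
    while start <= month_end:
        # Clamp the final chunk so it never runs past the end of the month.
        end = min(start + timedelta(days=step_days - 1), month_end)
        ranges.append((start.isoformat(), end.isoformat()))
        start = end + timedelta(days=1)
    return ranges


# Build the list for a whole year, one month at a time.
date_ranges = []
for month in range(1, 13):
    date_ranges.extend(generate_date_ranges(2022, month))

print(date_ranges[0])
print(len(date_ranges))
```

Because each chunk is clamped to the month, the last interval of a month can be shorter than five days.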

New Workflow for Institutional Harvesting

There are now three main scripts for harvesting and analyzing institutional data:

  1. oaimphFig.py
    First, run this script.

    • Harvests institution IDs and institution URLs using the OAI-PMH protocol.
    • Outputs a CSV file with institution information.
    • The institution abbreviation can be extracted from the institution URL.
  2. harvest_institution_items.py
    Second, run this script after oaimphFig.py.

    • Reads the institution list generated by oaimphFig.py.
    • Harvests all items (e.g., datasets) for each institution using the regular Figshare API.
    • Allows you to specify date ranges and item types to keep the output manageable.
    • Outputs a CSV file with all harvested items and their metadata.
  3. add_stats_to_existing_csv.py
    Third, run this script after harvesting items.

    • Takes the CSV of harvested items from harvest_institution_items.py.
    • Gathers statistics (views, downloads, etc.) for each item using the appropriate stats API.
    • Adds these statistics as new columns to the CSV.

Institution-Specific and Fallback Stats URL Logic

The script now supports institution-specific statistics endpoints for:

  • University of Melbourne
  • University of Sheffield
  • University of Leicester
  • Virginia Tech
  • J-STAGE
  • Ryerson

If a direct match is not found, the script will:

  1. Try to construct the stats URL using the institution name (with common prefixes like "University of" removed).
  2. Try to construct the stats URL using the first word after the slash in the DOI (e.g., for DOI 10.25400/lincolnuninz.21358338.v1, the stats URL will use lincolnuninz).

All attempted URLs and their HTTP responses are printed and logged for debugging.
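The two fallback steps can be sketched as follows. The helper names, the prefix list, and the slug normalization are hypothetical; only the URL pattern https://stats.figshare.com/<institution>/total/views/article/<id> and the "first word after the slash in the DOI" rule come from this README. Using the same pattern with "downloads" as the metric is assumed by analogy.

```python
import re

STATS_BASE = "https://stats.figshare.com"

# Hypothetical prefix list; the README only names "University of" as an example.
COMMON_PREFIXES = ("University of ", "The University of ")


def slug_from_name(name):
    """Drop a known prefix, then keep only lowercase letters and digits."""
    for prefix in COMMON_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    return re.sub(r"[^a-z0-9]", "", name.lower())


def slug_from_doi(doi):
    """Return the first word after the slash in a DOI, or None if absent."""
    match = re.match(r"[^/]+/([A-Za-z0-9]+)", doi or "")
    return match.group(1).lower() if match else None


def candidate_stats_urls(item_id, name, doi, metric="views"):
    """List fallback stats URLs in the order they would be tried."""
    slugs = []
    for slug in (slug_from_name(name), slug_from_doi(doi)):
        if slug and slug not in slugs:
            slugs.append(slug)
    return [f"{STATS_BASE}/{slug}/total/{metric}/article/{item_id}" for slug in slugs]


urls = candidate_stats_urls(
    12345, "Lincoln University", "10.25400/lincolnuninz.21358338.v1"
)
for url in urls:
    print(url)
```

A caller would request each candidate in turn and stop at the first successful response, recording the URL and status code of every attempt.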

Required Columns

Your CSV should include at least:

  • id (item ID)
  • harvested_institution_id
  • url_public_html
  • name (institution name)
  • doi (if available)

Output

  • A new CSV file will be created with additional columns:
    • views_url_used
    • views_status_code
    • views
    • downloads_url_used
    • downloads_status_code
    • downloads
  • A log file will be created with detailed debug information.
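As an illustration of how the extra columns could be appended, the sketch below copies each input row and adds the six output columns. The function add_stats_columns, the fake_fetch_stats stub, and the sample row are hypothetical; the stub replaces the real HTTP lookup so that the example runs offline.

```python
import csv
import io

NEW_COLUMNS = [
    "views_url_used", "views_status_code", "views",
    "downloads_url_used", "downloads_status_code", "downloads",
]


def add_stats_columns(in_file, out_file, fetch_stats):
    """Copy every input row to out_file and append the stats columns."""
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames) + NEW_COLUMNS
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        for metric in ("views", "downloads"):
            url, status, total = fetch_stats(row, metric)
            row[f"{metric}_url_used"] = url
            row[f"{metric}_status_code"] = status
            row[metric] = total
        writer.writerow(row)


def fake_fetch_stats(row, metric):
    """Stand-in for the real HTTP lookup; returns fixed placeholder values."""
    url = f"https://stats.figshare.com/example/total/{metric}/article/{row['id']}"
    return url, 200, 0


source = io.StringIO("id,name\n12345,Example University\n")
result = io.StringIO()
add_stats_columns(source, result, fake_fetch_stats)
print(result.getvalue())
```

Keeping the lookup behind a fetch_stats callable makes it easy to swap the stub for a function that performs the real requests and logging.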

Example

For an item with:

  • harvested_institution_id = 8 and id = 12345
    → Stats URL: https://stats.figshare.com/melbourne/total/views/article/12345

For an item with:

  • doi = 10.25400/lincolnuninz.21358338.v1 and id = 12345
    → Fallback stats URL: https://stats.figshare.com/lincolnuninz/total/views/article/12345
