[[TOC]]
These scripts retrieve figshare statistics for specified categories. They combine retrieval of item ids through a search query with code provided by Jonathan Petters for gathering statistics.
- The first function, `itemids_for_categories()` in `figshare_statistics_for_categories.py`, generates item ids associated with the given categories.
- The second function, `figshare_categorystatistics(itemList)`, takes an item list and creates a JSON file for all the metadata associated with the item ids.
- The third function, `fetch_figshare_statistics(file_path)`, gathers figshare statistics for the provided CSV file.
- The new script `FigStats-iType-dRange.py`:
  - Processes Figshare statistics for datasets within specific date ranges.
  - Generates 5-day intervals for a given year and retrieves metadata and statistics for items posted within those intervals.
  - Saves results as JSON and CSV files, including detailed views and downloads statistics.
- Go to the "category numbers matching with category names" figshare dataset, download the Figshare categories as a CSV file, and name it `Figshare_categories.csv`.
- Open `figshare_statistics.py` and set `file_path` to the location of the Figshare categories CSV file.
- On line 25, set `category_search='main_category'` or `category_search='sub_category'`, based on the `Figshare_categories.csv` file. `'main_category'` corresponds to the category ids in column 2, and `'sub_category'` to the category ids in column 3.
- Set `category_to_find = '3704'` on line 31 for `'main_category'` or on line 36 for `'sub_category'`.
- Run figshare_statistics.py
- This will generate a CSV file with views and downloads: `views_and_downloads_figshare_[current_date]`
- Two other CSV files are also generated:
  - `LimitedMetadataInFigshare_report_datasets[current_date].csv`: limited metadata for the items related to the category picked (by the code)
  - `allMetadataInFigshare_report_datasets[current_date].csv`: all the metadata for the items related to the category picked (by the code)
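The column convention described in the steps above can be illustrated with a minimal sketch. It assumes `Figshare_categories.csv` has a header row, with main-category ids in column 2 and sub-category ids in column 3. The function name `find_category_rows`, the header names, and the sample rows (including the id `370401`) are hypothetical and are not taken from `figshare_statistics.py`.

```python
import csv
import io

# Assumed layout: column 2 holds main-category ids, column 3 holds sub-category ids.
COLUMN_INDEX = {"main_category": 1, "sub_category": 2}


def find_category_rows(csv_file, category_search, category_to_find):
    """Return the rows whose chosen category column equals category_to_find."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row (assumed to be present)
    index = COLUMN_INDEX[category_search]
    return [
        row
        for row in reader
        if len(row) > index and row[index].strip() == category_to_find
    ]


# Made-up sample data standing in for Figshare_categories.csv.
sample = io.StringIO(
    "name,main_category,sub_category\n"
    "Example main,3704,\n"
    "Example sub,3704,370401\n"
    "Other,3101,310101\n"
)
matches = find_category_rows(sample, "main_category", "3704")
for row in matches:
    print(row)
```

With a real file, the `io.StringIO` sample would be replaced by `open(file_path, newline="")`.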
- Open `FigStats-iType-dRange.py`.
- Change `save_directory` to the desired directory and run the code. The code currently gathers statistics for 2022 in 5-day increments. To use a different year, change the year in `date_ranges.extend(generate_date_ranges(2022, month))`; statistics are then gathered in 5-day increments for that year. To restrict the run to specific days, uncomment the test range shown here and run the code:
`date_ranges = [("2022-01-01", "2022-01-02")]  # Replace with any desired 2-day range`
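For readers who want to see how 5-day intervals for a year might be produced, here is a minimal sketch. The name `generate_date_ranges(year, month)` and the `date_ranges` list come from the text above, but the body is an assumption and may differ from the real implementation in `FigStats-iType-dRange.py`. In particular, it assumes both ends of each range are inclusive and that the last interval of a month is shortened to fit.

```python
import calendar
from datetime import date, timedelta


def generate_date_ranges(year, month, step_days=5):
    """Return (start, end) ISO date strings covering one month in step_days chunks.

    Assumption: both ends are inclusive and the final chunk is clipped
    at the end of the month.
    """
    last_day = calendar.monthrange(year, month)[1]
    month_end = date(year, month, last_day)
    ranges = []
    start = date(year, month, 1)
    while start <= month_end:
        end = min(start + timedelta(days=step_days - 1), month_end)
        ranges.append((start.isoformat(), end.isoformat()))
        start = end + timedelta(days=1)
    return ranges


# Build the ranges for a whole year, one month at a time.
date_ranges = []
for month in range(1, 13):
    date_ranges.extend(generate_date_ranges(2022, month))

print(date_ranges[:3])
```

Changing `2022` in the loop switches the year; replacing the loop with a single hand-written tuple gives a custom range.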
There are now three main scripts for harvesting and analyzing institutional data:
- `oaimphFig.py`: run this script first.
  - Harvests institution IDs and institution URLs using the OAI-PMH protocol.
  - Outputs a CSV file with institution information.
  - Institution abbreviation can be extracted from institution URLs.
- `harvest_institution_items.py`: run this script second, after `oaimphFig.py`.
  - Reads the institution list generated by `oaimphFig.py`.
  - Harvests all items (e.g., datasets) for each institution using the regular Figshare API.
  - Allows you to specify date ranges and item types to keep the output manageable.
  - Outputs a CSV file with all harvested items and their metadata.
- `add_stats_to_existing_csv.py`: run this script third, after harvesting items.
  - Takes the CSV of harvested items from `harvest_institution_items.py`.
  - Gathers statistics (views, downloads, etc.) for each item using the appropriate stats API.
  - Adds these statistics as new columns to the CSV.
The script now supports institution-specific statistics endpoints for:
- University of Melbourne
- University of Sheffield
- University of Leicester
- Virginia Tech
- J-STAGE
- Ryerson
If a direct match is not found, the script will:
- Try to construct the stats URL using the institution name (with common prefixes like "University of" removed).
- Try to construct the stats URL using the first word after the slash in the DOI (e.g., for DOI `10.25400/lincolnuninz.21358338.v1`, the stats URL will use `lincolnuninz`).
All attempted URLs and their HTTP responses are printed and logged for debugging.
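The lookup order described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `add_stats_to_existing_csv.py`. The function names, the `KNOWN_INSTITUTIONS` table, and the prefix list are hypothetical. Only the pairing of institution id 8 with `melbourne` and the URL pattern are taken from the examples in this README. The sketch returns candidate URLs in order, so a caller would request each one in turn and stop at the first successful response.

```python
import re

STATS_BASE = "https://stats.figshare.com"

# Only the 8 -> "melbourne" pairing comes from this README's example;
# the other supported institutions would need their real ids and slugs.
KNOWN_INSTITUTIONS = {8: "melbourne"}

# Hypothetical list of common prefixes to strip from institution names.
NAME_PREFIXES = ("the university of ", "university of ")


def name_to_slug(name):
    """Lower-case the name, drop a common prefix, keep letters and digits only."""
    lowered = name.strip().lower()
    for prefix in NAME_PREFIXES:
        if lowered.startswith(prefix):
            lowered = lowered[len(prefix):]
            break
    return re.sub(r"[^a-z0-9]", "", lowered)


def doi_to_slug(doi):
    """Return the first word after the slash, e.g. 'lincolnuninz'."""
    if not doi or "/" not in doi:
        return None
    suffix = doi.split("/", 1)[1]
    return suffix.split(".", 1)[0] or None


def candidate_stats_urls(item_id, institution_id=None, name=None, doi=None,
                         counter="views"):
    """List stats URLs to try: direct match, then name, then DOI."""
    slugs = []
    if institution_id in KNOWN_INSTITUTIONS:
        slugs.append(KNOWN_INSTITUTIONS[institution_id])
    if name:
        slugs.append(name_to_slug(name))
    slugs.append(doi_to_slug(doi))
    unique = []
    for slug in slugs:
        if slug and slug not in unique:
            unique.append(slug)
    return [f"{STATS_BASE}/{slug}/total/{counter}/article/{item_id}"
            for slug in unique]


print(candidate_stats_urls(12345, institution_id=8))
print(candidate_stats_urls(12345, doi="10.25400/lincolnuninz.21358338.v1"))
```

Returning a list rather than a single URL keeps the network calls and the logging of each attempt in the caller.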
Your CSV should include at least:
- `id` (item ID)
- `harvested_institution_id`
- `url_public_html`
- `name` (institution name)
- `doi` (if available)
- A new CSV file will be created with additional columns:
  - `views_url_used`
  - `views_status_code`
  - `views`
  - `downloads_url_used`
  - `downloads_status_code`
  - `downloads`
- A log file will be created with detailed debug information.
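The input and output layout described above can be sketched with the standard `csv` module. This is a simplified illustration, not the actual script. `add_stats_columns` and `fake_fetch_stat` are hypothetical names, and the fetcher here returns placeholder values instead of calling the stats API, so the example runs without network access.

```python
import csv
import io

STAT_COLUMNS = [
    "views_url_used", "views_status_code", "views",
    "downloads_url_used", "downloads_status_code", "downloads",
]


def add_stats_columns(in_file, out_file, fetch_stat):
    """Copy rows from in_file to out_file, appending the statistics columns.

    fetch_stat(row, counter) must return (url_used, status_code, total).
    """
    reader = csv.DictReader(in_file)
    existing = list(reader.fieldnames or [])
    fieldnames = existing + [c for c in STAT_COLUMNS if c not in existing]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        for counter in ("views", "downloads"):
            url, status, total = fetch_stat(row, counter)
            row[f"{counter}_url_used"] = url
            row[f"{counter}_status_code"] = status
            row[counter] = total
        writer.writerow(row)


def fake_fetch_stat(row, counter):
    """Placeholder fetcher: builds a URL but returns made-up status and total."""
    url = (f"https://stats.figshare.com/melbourne/total/"
           f"{counter}/article/{row['id']}")
    return url, 200, 0


source = io.StringIO(
    "id,harvested_institution_id,url_public_html,name,doi\n"
    "12345,8,,University of Melbourne,\n"
)
result = io.StringIO()
add_stats_columns(source, result, fake_fetch_stat)
print(result.getvalue())
```

Passing the fetcher as an argument makes it straightforward to swap in a real HTTP call and to log each attempted URL and status code.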
For an item with:
`harvested_institution_id` = 8 and `id` = 12345
→ Stats URL: `https://stats.figshare.com/melbourne/total/views/article/12345`

For an item with:
`doi` = `10.25400/lincolnuninz.21358338.v1`
→ Fallback stats URL: `https://stats.figshare.com/lincolnuninz/total/views/article/12345`
- https://help.figshare.com/article/how-to-use-advanced-search-in-figshare
- https://colab.research.google.com/drive/1bCVsSjg5Y5WsHHsxTq_W1j1U3B4TyO2x#scrollTo=415f0c3e-d599-415d-944e-6fe1aaad18dc
- https://help.figshare.com/article/search-examples
- https://help.figshare.com/article/how-to-use-the-figshare-api#metadata-search
- https://help.figshare.com/article/how-to-use-the-figshare-api#search-ids