Project Info

[[TOC]]

These scripts gather Figshare statistics for specified categories. They combine item-id retrieval through a search query with statistics-gathering code provided by Jonathan Petters.

About the scripts

The first function, itemids_for_categories() in figshare_statistics_for_categories.py, generates the item ids associated with the given categories.
The second function, figshare_categorystatistics(itemList), takes an item list and creates a JSON file containing all the metadata associated with those item ids.
The third function, fetch_figshare_statistics(file_path), gathers Figshare statistics for the items in the provided CSV file.
The new script FigStats-iType-dRange.py:
- Processes Figshare statistics for datasets within specific date ranges.
- Generates 5-day intervals for a given year and retrieves metadata and statistics for items posted within those intervals.
- Saves results as JSON and CSV files, including detailed views and downloads statistics.
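
The interval generation described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the name generate_date_ranges and its (year, month) arguments come from the call shown later in this README, but the body, the step_days parameter, and the choice of inclusive end dates are assumptions.

```python
import calendar
from datetime import date, timedelta


def generate_date_ranges(year, month, step_days=5):
    """Return (start, end) ISO date strings covering one month.

    Assumption: both ends are inclusive and the final range of a month
    is shortened so that it never spills into the next month.
    """
    last_day = calendar.monthrange(year, month)[1]
    month_end = date(year, month, last_day)
    ranges = []
    start = date(year, month, 1)
    while start <= month_end:
        end = min(start + timedelta(days=step_days - 1), month_end)
        ranges.append((start.isoformat(), end.isoformat()))
        start = end + timedelta(days=1)
    return ranges


# Build the ranges for a whole year, one month at a time.
date_ranges = []
for month in range(1, 13):
    date_ranges.extend(generate_date_ranges(2022, month))
```

Keeping each range inside a single month means the last interval of a month may be shorter than 5 days; the real script may handle month boundaries differently.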

How to run the categories codes

1. Go to the "category numbers matching with category names" Figshare dataset, download the Figshare categories as a CSV file, and name it Figshare_categories.csv.
2. Open figshare_statistics.py and set 'file_path' to the location of the Figshare categories CSV file.
3. On line 25, set category_search='main_category' or category_search='sub_category', based on the Figshare_categories.csv file. 'main_category' corresponds to the category ids in column 2, and 'sub_category' to the category ids in column 3.
4. Set category_to_find to the category id (for example, category_to_find = '3704') on line 31 for 'main_category' or on line 36 for 'sub_category'.
5. Run figshare_statistics.py.

This generates a CSV file with views and downloads: views_and_downloads_figshare_[current_date]
Two other CSV files are also generated:
1. LimitedMetadataInFigshare_report_datasets[current_date].csv: limited metadata for the items related to the chosen category (the fields are selected by the code)
2. allMetadataInFigshare_report_datasets[current_date].csv: all the metadata for the items related to the chosen category
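
To illustrate how category_search and category_to_find relate to the columns of Figshare_categories.csv, here is a small sketch. It is not taken from figshare_statistics.py: the helper rows_for_category, the header names, and the sample rows are all hypothetical, and only the column positions (2 for 'main_category', 3 for 'sub_category') and the id '3704' come from the steps above.

```python
import csv
import io

# Hypothetical stand-in for Figshare_categories.csv; the real file's
# headers and rows will differ.
SAMPLE_CSV = """category_name,main_category,sub_category
Placeholder category A,37,3704
Placeholder category B,37,3705
Placeholder category C,40,4001
"""

# Zero-based indexes for column 2 and column 3 of the CSV.
COLUMN_INDEX = {"main_category": 1, "sub_category": 2}


def rows_for_category(csv_file, category_search, category_to_find):
    """Return the CSV rows whose chosen category column equals the given id."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    col = COLUMN_INDEX[category_search]
    return [row for row in reader if row[col] == category_to_find]


matches = rows_for_category(io.StringIO(SAMPLE_CSV), "sub_category", "3704")
for row in matches:
    print(row)
```

Ids are compared as strings here because the README sets category_to_find as a quoted value.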

How to run the FigStats-iType-dRange code

Running `FigStats-iType-dRange.py`

Open FigStats-iType-dRange.py.
Change save_directory to the desired output directory, then run the script. As written, it gathers statistics for 2022 in 5-day increments. To use a different year, change the year in "date_ranges.extend(generate_date_ranges(2022, month))"; statistics are then gathered in 5-day increments for that year. To restrict the run to specific days instead, uncomment the test range shown below and run the script:
```
date_ranges = [
    ("2022-01-01", "2022-01-02")  # Replace with any desired 2-day range
]
```

New Workflow for Institutional Harvesting

There are now three main scripts for harvesting and analyzing institutional data:

oaimphFig.py
First, run this script.
- Harvests institution IDs and institution URLs using the OAI-PMH protocol.
- Outputs a CSV file with institution information.
- The institution abbreviation can be extracted from the institution URL.

harvest_institution_items.py
Second, run this script after oaimphFig.py.
- Reads the institution list generated by oaimphFig.py.
- Harvests all items (e.g., datasets) for each institution using the regular Figshare API.
- Allows you to specify date ranges and item types to keep the output manageable.
- Outputs a CSV file with all harvested items and their metadata.

add_stats_to_existing_csv.py
Third, run this script after harvesting items.
- Takes the CSV of harvested items from harvest_institution_items.py.
- Gathers statistics (views, downloads, etc.) for each item using the appropriate stats API.
- Adds these statistics as new columns to the CSV.

Institution-Specific and Fallback Stats URL Logic

The script now supports institution-specific statistics endpoints for:

- University of Melbourne
- University of Sheffield
- University of Leicester
- Virginia Tech
- J-STAGE
- Ryerson

If a direct match is not found, the script will:

1. Try to construct the stats URL using the institution name (with common prefixes like "University of" removed).
2. Try to construct the stats URL using the first word after the slash in the DOI (e.g., for DOI 10.25400/lincolnuninz.21358338.v1, the stats URL will use lincolnuninz).

All attempted URLs and their HTTP responses are printed and logged for debugging.
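
The lookup order described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the script's code. The URL pattern and the pairing of institution id 8 with "melbourne" come from the Example section of this README; the function names, the prefix list, the name normalization (lowercasing and dropping non-alphanumeric characters), and the reuse of the same pattern for downloads are all assumptions.

```python
import re

STATS_BASE = "https://stats.figshare.com"

# Only the 8 -> "melbourne" pairing appears in this README; any other
# entries would have to come from the actual script.
INSTITUTION_SLUGS = {8: "melbourne"}

# Assumed list of "common prefixes" to strip from institution names.
NAME_PREFIXES = ("The University of ", "University of ")


def slug_from_name(name):
    """Strip a known prefix, then lowercase and drop non-alphanumerics."""
    for prefix in NAME_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    return re.sub(r"[^a-z0-9]", "", name.lower())


def slug_from_doi(doi):
    """Return the first dot-separated word after the slash in a DOI."""
    if not doi or "/" not in doi:
        return None
    suffix = doi.split("/", 1)[1]
    return suffix.split(".", 1)[0] or None


def candidate_stats_urls(item_id, institution_id=None, name=None, doi=None,
                         metric="views"):
    """List the stats URLs to try, in order: direct match, else fallbacks."""
    if institution_id in INSTITUTION_SLUGS:
        slugs = [INSTITUTION_SLUGS[institution_id]]
    else:
        slugs = []
        if name:
            slugs.append(slug_from_name(name))
        doi_slug = slug_from_doi(doi)
        if doi_slug and doi_slug not in slugs:
            slugs.append(doi_slug)
    return [f"{STATS_BASE}/{slug}/total/{metric}/article/{item_id}"
            for slug in slugs if slug]


print(candidate_stats_urls(12345, institution_id=8))
print(candidate_stats_urls(12345, doi="10.25400/lincolnuninz.21358338.v1"))
```

Returning an ordered list of candidates, rather than a single URL, makes it straightforward to log each attempt and its HTTP response as the script is described as doing.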

Required Columns

Your CSV should include at least:

- id (item ID)
- harvested_institution_id
- url_public_html
- name (institution name)
- doi (if available)
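
Before running the stats script, it can help to confirm that the input CSV has these columns. The check below is a hypothetical helper, not part of the repository; it treats doi as optional, following the "(if available)" note above.

```python
import csv
import io

REQUIRED_COLUMNS = ["id", "harvested_institution_id", "url_public_html", "name"]
OPTIONAL_COLUMNS = ["doi"]  # used for the DOI fallback when present


def missing_columns(csv_file):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [col for col in REQUIRED_COLUMNS if col not in header]


# Example with an in-memory CSV that lacks two required columns.
sample = io.StringIO("id,name,doi\n12345,Example Institution,\n")
print(missing_columns(sample))
```

To check a real file, open it with open(path, newline="", encoding="utf-8") and pass the file object to missing_columns.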

Output

A new CSV file will be created with additional columns:
- views_url_used
- views_status_code
- views
- downloads_url_used
- downloads_status_code
- downloads
A log file will be created with detailed debug information.
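
As a rough picture of how the six columns end up in the output file, the sketch below appends them to each row. It is not the repository's implementation: add_stats_columns and the fetch_stat callable are hypothetical, and the stub used in the example returns fixed values instead of calling the stats API.

```python
import csv
import io

NEW_COLUMNS = [
    "views_url_used", "views_status_code", "views",
    "downloads_url_used", "downloads_status_code", "downloads",
]


def add_stats_columns(in_file, out_file, fetch_stat):
    """Copy a CSV, appending URL, status code, and total for each metric.

    fetch_stat(row, metric) is a hypothetical callable that returns a
    (url_used, status_code, total) tuple for one item.
    """
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames or []) + NEW_COLUMNS
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        for metric in ("views", "downloads"):
            url, status, total = fetch_stat(row, metric)
            row[f"{metric}_url_used"] = url
            row[f"{metric}_status_code"] = status
            row[metric] = total
        writer.writerow(row)


def stub_fetch(row, metric):
    """Stand-in fetcher with made-up values; a real one would query the API."""
    return (f"stub://{metric}/{row['id']}", 200, 0)


source = io.StringIO("id,name\n12345,Example Institution\n")
output = io.StringIO()
add_stats_columns(source, output, stub_fetch)
print(output.getvalue())
```

Passing the fetcher in as an argument keeps the CSV handling testable without network access.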

Example

For an item with:

harvested_institution_id = 8 and id = 12345
→ Stats URL: https://stats.figshare.com/melbourne/total/views/article/12345

For an item with no direct institution match and:

doi = 10.25400/lincolnuninz.21358338.v1 and id = 12345
→ Fallback stats URL: https://stats.figshare.com/lincolnuninz/total/views/article/12345
