A minor modification of jpbarrett13's OpenAlex institution ID prediction service so that it also returns ROR IDs.
The team at OpenAlex trained a text classification model on their affiliation strings to predict institution IDs. By adding a mapping between OpenAlex institution IDs and ROR IDs, the same model can be used to predict ROR IDs. See OpenAlex's paper and jpbarrett13's notebooks for full details.
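The mapping idea can be illustrated with a minimal sketch. It assumes the mapping is a plain dictionary keyed by institution ID; the function name `attach_ror_id`, the result keys, and the sample IDs are hypothetical and are not taken from the repository.

```python
from typing import Dict

# Hypothetical sample data. The IDs and ROR values below are placeholders,
# not real OpenAlex or ROR identifiers.
SAMPLE_MAPPING: Dict[int, str] = {
    111: "https://ror.org/example-a",
    222: "https://ror.org/example-b",
}


def attach_ror_id(institution_id: int, mapping: Dict[int, str]) -> Dict[str, object]:
    """Return the predicted institution ID, plus a ROR ID when one is mapped."""
    result: Dict[str, object] = {"institution_id": institution_id}
    ror_id = mapping.get(institution_id)
    if ror_id is not None:
        # Only add the key when a mapping exists for this institution.
        result["ror_id"] = ror_id
    return result
```

Because the lookup happens after classification, the model itself does not need retraining; an institution without a mapped ROR ID simply yields a result with the institution ID alone.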
- pip install -r requirements.txt
- Download the model artifacts as described in the notes section of the OpenAlex repository.
- Add institution_ror_id_mapping.pkl to the model artifacts directory.
- Add the model artifacts file path to the predictor class instantiation in app.py.
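Before starting the app, it may help to confirm that the mapping file is where the service expects it. The sketch below assumes the file is a standard pickle that sits directly in the artifacts directory; `load_ror_mapping` is a hypothetical helper, not a function from this repository.

```python
import pickle
from pathlib import Path

MAPPING_FILENAME = "institution_ror_id_mapping.pkl"


def load_ror_mapping(artifacts_dir):
    """Load the mapping file from the artifacts directory, failing early if absent."""
    path = Path(artifacts_dir) / MAPPING_FILENAME
    if not path.is_file():
        raise FileNotFoundError(f"Expected {MAPPING_FILENAME} in {artifacts_dir}")
    # Only unpickle files obtained from a source you trust.
    with path.open("rb") as handle:
        return pickle.load(handle)
```

Calling `load_ror_mapping` with the same path you pass to the predictor class is a quick way to catch a misplaced file before the first request.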
Start the Flask app and POST the affiliation string to the invocations route. See test.py for an example. Where both an institution ID and a ROR ID exist, both values are returned. Where no ROR ID exists for an institution ID, only the institution ID is returned.
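As a rough illustration of such a request, the sketch below builds a POST to the invocations route using only the standard library. The base URL (Flask's default port 5000), the `/invocations` path spelling, and the payload shape with an `affiliation_string` key are all assumptions; test.py is the authoritative example.

```python
import json
import urllib.request


def build_invocation_request(affiliation, base_url="http://localhost:5000"):
    """Build (but do not send) a JSON POST request for one affiliation string."""
    # Assumed payload shape; check test.py for the format the app expects.
    payload = [{"affiliation_string": affiliation}]
    return urllib.request.Request(
        url=f"{base_url}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# With the app running, the request could be sent like this:
# with urllib.request.urlopen(build_invocation_request("Example University")) as resp:
#     print(json.load(resp))
```

Keeping request construction separate from sending makes it easy to inspect the payload without a running server.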
Please review all of jpbarrett13's notebooks to understand how the model was trained. Prediction success is determined (in part) by the amount of training affiliation data that was available for any given institution or ROR ID. In addition, because this is a classification model, affiliation strings for institutions that do not exist in OpenAlex will not return correct results.