ror-predictor-fasttext

ROR prediction service, trained with fastText

setup

Install fastText:

git clone https://github.com/facebookresearch/fastText.git
cd fastText
sudo pip install .

Install the dependencies listed in requirements.txt:

pip install -r requirements.txt

Download the model files from Hugging Face and place them in a directory. Pass this directory to the Predictor class when creating an instance, e.g.:

PREDICTOR = Predictor('path_to/model_files_dir/')

usage

See test.py for an example and test_data for sample datasets. Create an instance of the Predictor class and pass it an affiliation string and a prediction confidence threshold. In testing, a threshold of 0.85 was found to return a sufficient share of accurate predictions (75-80% of affiliation strings received a prediction, at 85-90% accuracy).
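
The thresholding described above can be sketched independently of the repository's Predictor class. This minimal sketch assumes the model is a standard fastText supervised classifier, whose `predict` returns best-first labels prefixed with `__label__` alongside their probabilities. The label-to-ROR-ID mapping and the helper names `apply_threshold` and `coverage_and_accuracy` are hypothetical; they are not the API of predictor.py. The second helper shows how the "share predicted" and "accuracy" figures relate to a threshold on a labeled test set.

```python
from typing import Optional, Sequence, Tuple

# fastText's default prefix for supervised-classification labels.
LABEL_PREFIX = "__label__"


def apply_threshold(
    labels: Sequence[str], probs: Sequence[float], threshold: float = 0.85
) -> Optional[Tuple[str, float]]:
    """Return (id, probability) for the top prediction if it meets the
    threshold, otherwise None.

    Assumes labels and probs are ordered best-first, as returned by
    fastText's model.predict(text, k=...), and (hypothetically) that each
    label is '__label__' followed by a ROR ID.
    """
    if len(labels) == 0:
        return None
    label, prob = labels[0], float(probs[0])
    if prob < threshold:
        return None  # not confident enough: return no prediction
    if label.startswith(LABEL_PREFIX):
        label = label[len(LABEL_PREFIX):]
    return label, prob


def coverage_and_accuracy(
    results: Sequence[Tuple[float, bool]], threshold: float = 0.85
) -> Tuple[float, float]:
    """Score a labeled test set of (top probability, was_correct) pairs.

    coverage: share of inputs that receive a prediction at this threshold
    accuracy: share of those returned predictions that are correct
    """
    kept = [correct for prob, correct in results if prob >= threshold]
    coverage = len(kept) / len(results) if results else 0.0
    accuracy = sum(kept) / len(kept) if kept else 0.0
    return coverage, accuracy


if __name__ == "__main__":
    # Hypothetical label and toy scores, not real model output.
    print(apply_threshold(["__label__0abcd1234"], [0.93]))
    toy = [(0.95, True), (0.90, True), (0.88, False), (0.60, True)]
    print(coverage_and_accuracy(toy))
```

With the toy scores, three of four inputs clear 0.85 (coverage 0.75) and two of those three are correct (accuracy about 0.67); raising the threshold trades coverage for accuracy. fastText's `predict` also accepts a `threshold` argument (e.g. `model.predict(text, k=1, threshold=0.85)`) that drops low-probability labels natively; whether predictor.py uses it is not stated here. Note that `predict` processes one line at a time, so newlines should be removed from affiliation strings first.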

training

The prediction service was trained on a subset of OpenAlex affiliation strings containing ROR IDs whose assignments could be validated. See the OpenAlex documentation for instructions on downloading the works dataset, parse-openalex-works for extracting the training data, validate-ror-id-assignments for the validation logic, and the training directory for training on the validated assignments.
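
fastText's supervised mode reads one example per line, with the label prefixed by `__label__` and followed by the text. The sketch below shows one way validated (affiliation string, ROR ID) pairs could be written in that format. It is not the preprocessing in the training directory: the whitespace normalization, the choice of bare IDs rather than full ror.org URLs as labels, and the function name `to_fasttext_lines` are all assumptions.

```python
import re
from typing import Iterable, Iterator, Tuple

ROR_URL_PREFIX = "https://ror.org/"


def to_fasttext_lines(pairs: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield fastText supervised-training lines ('__label__<id> <text>')
    from (affiliation_string, ror_id) pairs."""
    for affiliation, ror_id in pairs:
        # fastText reads one example per line, so collapse runs of
        # whitespace, including newlines, into single spaces.
        text = re.sub(r"\s+", " ", affiliation).strip()
        if not text:
            continue  # skip empty affiliation strings
        # Assumption: labels are bare IDs rather than full ror.org URLs.
        label = ror_id.strip()
        if label.startswith(ROR_URL_PREFIX):
            label = label[len(ROR_URL_PREFIX):]
        yield f"__label__{label} {text}"


if __name__ == "__main__":
    # Hypothetical validated pair, for illustration only.
    pairs = [("Dept. of Physics,\n Example University", "https://ror.org/0abcd1234")]
    for line in to_fasttext_lines(pairs):
        print(line)
```

A file of such lines can be passed to `fasttext.train_supervised(input="train.txt")`; see the training directory for the settings actually used for the released model.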

limitations

Validated training data was available for only 64,656 ROR IDs (~63% of all ROR IDs) in the OpenAlex works dataset. See model_ids.txt for the complete list of IDs that can be predicted; the service cannot make predictions for ROR IDs it was not trained on. For more general matching, use the affiliation service in the ROR API (if you are matching a large volume of affiliation data, please run it locally using the Docker image).
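
Because predictions are limited to the IDs in model_ids.txt, a caller may want to check whether an expected ROR ID is covered before relying on the model, and route uncovered cases to the ROR API affiliation service instead. This sketch assumes model_ids.txt lists one ID per line, either bare or as a full `https://ror.org/` URL; `load_model_ids` and `is_predictable` are hypothetical helpers, not part of the repository.

```python
from typing import Iterable, Set

ROR_URL_PREFIX = "https://ror.org/"


def normalize_ror_id(ror_id: str) -> str:
    """Reduce a ROR ID or ror.org URL to its bare identifier."""
    ror_id = ror_id.strip()
    if ror_id.startswith(ROR_URL_PREFIX):
        ror_id = ror_id[len(ROR_URL_PREFIX):]
    return ror_id


def load_model_ids(lines: Iterable[str]) -> Set[str]:
    """Build the set of IDs the model can predict, ignoring blank lines."""
    return {normalize_ror_id(line) for line in lines if line.strip()}


def is_predictable(ror_id: str, model_ids: Set[str]) -> bool:
    """True if the model was trained on this ROR ID."""
    return normalize_ror_id(ror_id) in model_ids


if __name__ == "__main__":
    # Usage with the repository file (format assumed: one ID per line).
    with open("model_ids.txt", encoding="utf-8") as f:
        model_ids = load_model_ids(f)
    print(len(model_ids))
```

Normalizing both the file entries and the queried ID means the check works whether IDs are written as bare identifiers or as ror.org URLs.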

adambuttrick/ror-predictor-fasttext

Folders and files

Latest commit

History

Repository files navigation

ror-predictor-fasttext

setup

usage

training

limitations

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages