- Clone this repository.
- cd into the repository directory and run:
create_database.sh
- Set the configuration options in config.ini:
  - Institutional ROR ID (required).
  - Email address to send with OpenAlex requests so they are routed to the polite pool (optional, but recommended).
  - ORCID iDs of researchers to search for explicitly, which helps when authors are missed by the institutional affiliation search (optional).
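The exact section and key names that main.py reads from config.ini are not documented here. As a minimal sketch, assuming a hypothetical [openalex] section with ror_id, email and orcids keys, the file could be parsed and checked against the required/optional rules above with Python's standard configparser:

```python
import configparser

# Hypothetical config.ini contents; the real section and key names may differ.
EXAMPLE_CONFIG = """
[openalex]
ror_id = https://ror.org/00example0
email = you@example.org
orcids = 0000-0002-1825-0097
"""


def load_settings(text: str) -> dict:
    """Parse config text and enforce the required/optional rules listed above."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("openalex"):
        raise ValueError("config is missing the [openalex] section")
    section = parser["openalex"]

    # Required: the institution's ROR ID.
    ror_id = section.get("ror_id", "").strip()
    if not ror_id:
        raise ValueError("ror_id is required")

    # Optional: email for the OpenAlex polite pool (None if blank or absent).
    email = section.get("email", "").strip() or None

    # Optional: comma-separated ORCID iDs to search for explicitly.
    orcids = [o.strip() for o in section.get("orcids", "").split(",") if o.strip()]

    return {"ror_id": ror_id, "email": email, "orcids": orcids}
```

The ROR value is a placeholder and 0000-0002-1825-0097 is ORCID's fictitious test researcher; substitute real values and check the repository's own config.ini for the actual key names.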
- Run
python main.py
- On subsequent runs, main.py automatically ingests new records.
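How main.py builds its requests is not shown here. The sketch below only illustrates the general shape of an OpenAlex works query, using the documented filter, per-page, cursor and mailto query parameters and the institutions.ror and author.orcid filters; the helper names are hypothetical:

```python
from typing import Optional
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def ror_filter(ror_id: str) -> str:
    # Works with at least one author affiliated with this ROR institution.
    return f"institutions.ror:{ror_id}"


def orcid_filter(orcid: str) -> str:
    # Works by one researcher, for authors the affiliation search misses.
    return f"author.orcid:{orcid}"


def works_url(filter_expr: str, email: Optional[str] = None,
              cursor: str = "*", per_page: int = 200) -> str:
    # cursor="*" starts cursor paging; 200 is the largest page size OpenAlex allows.
    params = {"filter": filter_expr, "per-page": per_page, "cursor": cursor}
    if email:
        # Supplying mailto routes requests to OpenAlex's polite pool.
        params["mailto"] = email
    return f"{OPENALEX_WORKS}?{urlencode(params)}"
```

Each response's JSON holds the works under results, and meta.next_cursor is passed back as cursor for the next page until it comes back null.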
- To update previously ingested works, run
python main.py --update_works
NOTE: the authors spreadsheet must be tab-separated. The CAS sheet is here.
- To load all the author data for the first time into the authors database table:
python populate_authors.py --local_sheet_path [your path] --container_name [container name] --load_data
- Where [your path] is the file path of the locally downloaded spreadsheet of author records, and [container name] is the name of the Docker container.
- To update the authors table with modified records from the authors spreadsheet:
python populate_authors.py --local_sheet_path [your path] --container_name [container name] --update_data
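Because populate_authors.py needs a tab-separated sheet, a quick pre-flight check can catch a sheet accidentally exported as comma-separated: the header should split into more than one column on tabs. This is a hypothetical helper, not part of the repository, and the column names in the test data are made up:

```python
from __future__ import annotations

import csv
from typing import Iterable


def read_tab_separated(lines: Iterable[str]) -> list[dict]:
    """Read a TSV sheet into dicts, failing early if it is not tab-separated."""
    reader = csv.DictReader(lines, delimiter="\t")
    header = reader.fieldnames or []
    # A comma-separated export parses as a single tab-free column.
    if len(header) < 2:
        raise ValueError("expected more than one tab-separated column; was the sheet saved as CSV?")
    return list(reader)


# Usage against a real file:
# with open("authors.tsv", newline="", encoding="utf-8") as f:
#     rows = read_tab_separated(f)
```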
- Saves a CSV of query results.
- Authors are concatenated into a single field, so each paper appears once.
- Most useful for counting total CAS papers and getting an institutional overview.
Run python queries.py --from_year [year] --to_year [year]
e.g. python queries.py --from_year 2022 --to_year 2022
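For illustration only, assuming each work is a dict with hypothetical doi, title and authors fields (the real columns written by queries.py may differ), the one-row-per-paper layout and its CSV output can be sketched as:

```python
import csv
import io


def concatenated_rows(works):
    # One row per paper; the author list is joined into a single field.
    return [
        {"doi": w["doi"], "title": w["title"], "authors": "; ".join(w["authors"])}
        for w in works
    ]


def to_csv(rows, fieldnames):
    # Write rows to an in-memory CSV; swap io.StringIO for a real file to save to disk.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because each paper is exactly one row in this layout, the number of rows is the paper count.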
- Saves a CSV of query results.
- Authors are exploded: each publication is repeated once for every author on it, so papers appear more than once.
- This is useful for further filtering and grouping by individual author names, roles, etc.
Run python queries.py --single_authors --from_year [year] --to_year [year]
e.g. python queries.py --single_authors --from_year 2022 --to_year 2022
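A sketch of the exploded layout, assuming hypothetical doi, title and authors fields on each work: every (paper, author) pair becomes its own row. That makes per-author grouping simple, but row counts are no longer paper counts:

```python
from collections import Counter


def exploded_rows(works):
    # One row per (paper, author) pair; a paper with n authors yields n rows.
    return [
        {"doi": w["doi"], "title": w["title"], "author": a}
        for w in works
        for a in w["authors"]
    ]


def papers_per_author(rows):
    # Group by individual author name.
    return Counter(r["author"] for r in rows)


def distinct_papers(rows):
    # Count unique papers, not rows; a stable ID such as the DOI is a safer key than the title.
    return len({r["doi"] for r in rows})
```

With pandas, DataFrame.explode on a list-valued authors column produces the same shape.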
- To limit the single-author results to curators, add --curators:
Run python queries.py --single_authors --curators --from_year [year] --to_year [year]
e.g. python queries.py --single_authors --curators --from_year 2022 --to_year 2022
- To query a single department, add --department to the single-author query and set it to one of the following allowed values (quote names that contain spaces, e.g. --department "Scientific Computing"):
Anthropology
Aquarium
Botany
Center for Biodiversity and Community Science
Center for Comparative Genomics
Center for Exploration and Travel Health
Coral Regeneration Lab
Education
Entomology
Herpetology
Ichthyology
iNaturalist
Invertebrate Zoology and Geology
Microbiology
Ornithology and Mammalogy
Planetarium
Scientific Computing
Run python queries.py --single_authors --department [department] --from_year [year] --to_year [year]
e.g. python queries.py --single_authors --department Botany --from_year 2022 --to_year 2022
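Department names must match the allowed list exactly. As a hypothetical sketch (not the actual argument parser in queries.py), argparse's choices option is one way to reject any other value before a query runs; the flag names mirror those documented here, while the types and defaults are assumptions:

```python
import argparse

# Allowed --department values, copied from the list above.
DEPARTMENTS = [
    "Anthropology", "Aquarium", "Botany",
    "Center for Biodiversity and Community Science",
    "Center for Comparative Genomics",
    "Center for Exploration and Travel Health",
    "Coral Regeneration Lab", "Education", "Entomology", "Herpetology",
    "Ichthyology", "iNaturalist", "Invertebrate Zoology and Geology",
    "Microbiology", "Ornithology and Mammalogy", "Planetarium",
    "Scientific Computing",
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Query CAS publications (sketch).")
    parser.add_argument("--from_year", type=int)
    parser.add_argument("--to_year", type=int)
    parser.add_argument("--single_authors", action="store_true")
    parser.add_argument("--curators", action="store_true")
    # choices makes argparse exit with an error for any unlisted department.
    parser.add_argument("--department", choices=DEPARTMENTS)
    parser.add_argument("--journal_info", action="store_true")
    parser.add_argument("--sustainable_goals", action="store_true")
    parser.add_argument("--open_access_info", action="store_true")
    return parser
```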
- Adding --journal_info also saves a CSV of paper counts grouped by publisher and journal, sorted by count.
Run python queries.py --single_authors --department [department] --from_year [year] --to_year [year] --journal_info
e.g. python queries.py --single_authors --department Botany --from_year 2022 --to_year 2022 --journal_info
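Since the department query is exploded (one row per author), counting journals straight from its rows would count a multi-author paper several times. A sketch that first de-duplicates on a hypothetical doi field and then counts (publisher, journal) pairs, most frequent first; whether queries.py de-duplicates this way is an assumption:

```python
from collections import Counter


def journal_counts(rows, key="doi"):
    seen = set()
    counts = Counter()
    for row in rows:
        if row[key] in seen:
            continue  # same paper, another author's row
        seen.add(row[key])
        counts[(row["publisher"], row["journal"])] += 1
    # most_common() returns (item, count) pairs sorted by count, highest first.
    return counts.most_common()
```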
- Adding --sustainable_goals also saves a CSV of paper counts per sustainable development goal, sorted by count.
Run python queries.py --single_authors --department [department] --from_year [year] --to_year [year] --sustainable_goals
e.g. python queries.py --single_authors --department Botany --from_year 2022 --to_year 2022 --sustainable_goals
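OpenAlex work records carry a sustainable_development_goals list whose entries include a display_name. Assuming queries.py derives its goal counts from that field (not confirmed here) and given one record per paper, the tally could be sketched as:

```python
from collections import Counter


def goal_counts(works):
    counts = Counter()
    for work in works:
        # A missing or null field means no goals were assigned to this work.
        for goal in work.get("sustainable_development_goals") or []:
            counts[goal["display_name"]] += 1
    # Sorted by count, highest first.
    return counts.most_common()
```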
- Add --open_access_info to include open access information:
Run python queries.py --from_year [year] --to_year [year] --open_access_info
e.g. python queries.py --from_year 2022 --to_year 2022 --open_access_info
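OpenAlex works also expose an open_access object with an is_oa flag and an oa_status value (for example gold, green, hybrid, bronze or closed). A hedged sketch of summarising one record per paper by that status, assuming --open_access_info reports something similar:

```python
from collections import Counter


def open_access_summary(works):
    # Tally papers by OpenAlex oa_status and compute the open access share.
    statuses = Counter(w["open_access"]["oa_status"] for w in works)
    total = len(works)
    oa = sum(1 for w in works if w["open_access"]["is_oa"])
    share = oa / total if total else 0.0
    return {"total": total, "open_access": oa, "share": share, "by_status": dict(statuses)}
```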
Run python send_emails.py