System and workflows for the mass georeferencing of museum records using known localities from collections and spatial databases.
- Linux Server
- PostgreSQL with PostGIS
- Python3
  - rapidfuzz
  - tqdm
  - pycountry
  - swifter
  - pandas
  - pyfiglet
  - psycopg2
  - nltk
- R
  - shiny
  - shinyjs
  - leaflet
  - jsonlite
  - shinyWidgets
  - shinycssloaders
  - dplyr
  - sp
  - DT
  - rgbif
  - rmarkdown
- shp2pgsql
We are working on a system to test, develop, and showcase a new approach to georeferencing museum records on a massive scale. The institution's IT department, with help and input from GIS experts, sets up the tools so that collection staff can concentrate on the records instead of the overhead. For example, for the Smithsonian, the tasks would be divided between the IT team (OCIO) and the collection staff as:
- Web-based tools
- Collection staff won't need a powerful workstation with ArcGIS/QGIS/GRASS, disk space for datasets, and large amounts of RAM for performance
- Processing happens in the Data Center
- Customizable for each Dept, Collection, or sub-Collection
- Repeatable workflow with automated logging for error detection and correction
- Exports a CIS-ready data package
- Relevant data and spatial records included
- Open source
We have started to develop a system based on PostGIS. The UI is written in R/Shiny, but will be ported to Python to keep a consistent language across the components of the project.
The current version of the UI allows the collection staff to browse each species, select a group of records, and choose the best-matching known locality for that group.
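As a rough illustration of how candidate localities could be ranked before staff pick the best match, here is a minimal sketch. It is not the project's actual matching code: the function names, the score threshold, and the sample locality strings are all hypothetical. The requirements list rapidfuzz, but this sketch uses Python's standard-library `difflib` instead so that it runs without extra dependencies.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not affect the score."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a 0-100 similarity score between two locality strings."""
    return 100.0 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def rank_candidates(record_locality, candidates, min_score=60.0, limit=5):
    """Rank known localities by similarity to a record's locality text.

    min_score and limit are illustrative defaults, not values from the project.
    Returns a list of (candidate, score) tuples, best match first.
    """
    scored = [(name, similarity(record_locality, name)) for name in candidates]
    kept = [(name, score) for name, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    # Hypothetical known localities, e.g. rows pulled from a spatial database.
    known = [
        "Rock Creek Park, Washington, D.C.",
        "Yellowstone National Park",
        "Rock Creek, Montana",
    ]
    for name, score in rank_candidates("Rock Creek Park, Washington DC", known):
        print(f"{score:5.1f}  {name}")
```

In a workflow like the one described, the ranked list would only be a set of suggestions; the final choice of locality stays with the collection staff.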
This is a project by the Digitization Program Office, OCIO, at the Smithsonian Institution.
Available under an Apache 2.0 License.