General toolkit for working with VertNet data. We call these tools data "migrators." Once customized to an original data source, a migrator converts the original data into Darwin Core, ready for upload to an Integrated Publishing Toolkit (IPT) resource.
ForIPT
bkp
reports
source
templates
workspace
1 - FullDataPreparation.bat
1a - RunMigrators.bat
1b - CleanMigratedTables.bat
1c - RunAggregators.bat
BlankLineIssues.awk
ChangeLog.txt
Cleanup-AudioDwC2.bat
Cleanup-AvesDwC2.bat
Cleanup-EggsDwC2.bat
Cleanup-EntDwC2.bat
Cleanup-FishDwC2.bat
Cleanup-FossilsDwC2.bat
Cleanup-FungiDwC2.bat
Cleanup-HerpsDwC2.bat
Cleanup-InvertsDwC2.bat
Cleanup-MammalsDwC2.bat
Cleanup-PlantsDwC2.bat
Cleanup-VertsDwC2.bat
DwC2migration-Audio.bat
DwC2migration-Aves.bat
DwC2migration-Eggs.bat
DwC2migration-Ent.bat
DwC2migration-Fish.bat
DwC2migration-Fossils.bat
DwC2migration-Fungi.bat
DwC2migration-Herps.bat
DwC2migration-Inverts.bat
DwC2migration-Mammals.bat
DwC2migration-Plants.bat
DwC2migration-Verts.bat
EncodingIssues.awk
NewLineIssues.awk
PurgeNonprintingCharacters.sh
PurgeNuls.sh
PurgeVerticalTabs.sh
README - MigratorPrepSteps.txt
README.md
RemoveLastLine.sh
globalnamesresolver.json
utf8er.awk


VertNet Darwin Core Data Migrator Toolkit

Scripts and databases to migrate source data to Darwin Core ready for publication via IPT.

The steps that must be modified to create a migrator customized for a given data set are described in the file README - MigratorPrepSteps.txt.

The migrator uses Microsoft Access and requires that Unix shell commands be available in the environment in which the migrator's DOS .BAT scripts are invoked.
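One way to confirm that requirement before running the .BAT scripts is a small pre-flight check from a shell prompt. The sketch below is illustrative only: the function name `missing_tools` is hypothetical, and the tool list in the usage note (`awk`, `sed`, `tr`) is an assumption based on the script types in this repository, not a documented requirement.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of the toolkit):
# print the name of every requested tool that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the tool
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Calling `missing_tools awk sed tr` prints nothing when all three commands are available, and otherwise prints one missing tool name per line.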

  • BlankLineIssues.awk - Script to report unexpected blank lines in a CSV file.
  • NewLineIssues.awk - Script to report records having a new line in the field content in a CSV file.
  • EncodingIssues.awk - Script to report records having UTF-8 encoding issues.
  • PurgeNonprintingCharacters.sh - Script to substitute '.' for non-printing characters in data content.
  • PurgeNuls.sh - Script to remove the NUL characters from a file encoded as UTF-16, rendering it as UTF-8.
  • PurgeVerticalTabs.sh - Script to remove all vertical tab characters from a file.
  • RemoveLastLine.sh - Script to remove the final line in a file.
  • utf8er.awk - Script to prepend the byte order mark (0xEF 0xBB 0xBF) to a CSV file known to be UTF-8-encoded.
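
The behavior described for several of these utilities can be approximated with standard Unix tools. The functions below are minimal sketches written from the descriptions above, not copies of the repository's scripts, which may work differently; all function names are hypothetical. Note that deleting NUL bytes yields valid UTF-8 only when the UTF-16 content is limited to ASCII-range characters.

```shell
#!/bin/sh
# Hypothetical helpers; names and details are illustrative assumptions.

# Report the line number of every empty or whitespace-only line.
report_blank_lines() {
    awk '/^[ \t\r]*$/ { printf "blank line at %d\n", NR }' "$1"
}

# Delete every vertical tab (octal 013) and write the result to stdout.
purge_vertical_tabs() {
    tr -d '\013' < "$1"
}

# Delete every NUL byte (octal 000) and write the result to stdout.
purge_nuls() {
    tr -d '\000' < "$1"
}

# Print the file without its final line.
remove_last_line() {
    sed '$d' "$1"
}

# Write the UTF-8 byte order mark (0xEF 0xBB 0xBF), then the file contents.
prepend_bom() {
    printf '\357\273\277'
    cat "$1"
}
```

Each function reads the file named by its first argument and writes to standard output, so a call such as `prepend_bom data.csv > data-bom.csv` leaves the input file unchanged (both file names here are placeholders).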