This folder contains raw trait data and related files. The generated database with standardised trait values is NOT located here; it is compiled into `../output/observations.csv`, then copied to the website folder `../docs`.

CSV files are encoded in UTF-8 so that accented characters can be used reliably and portably. Google Sheets handles UTF-8 by default, but MS Excel requires some effort; see:

- Is it possible to force Excel recognize UTF-8 CSV files automatically?
- How to open UTF-8 CSV file in Excel without mis-conversion of characters in Japanese and Chinese language for both Mac and Windows?

or search the Internet for `UTF8 CSV`. After editing a CSV file in Excel, be careful to either `Save As` and specify `Save as type: CSV (Comma delimited) (*.csv)`, or else `Export` to file type `CSV (Comma delimited) (*.csv)`. Simply clicking `Save` may save the file with the wrong format (e.g. tab-separated values).

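After saving from Excel, a quick check can confirm that a file is still valid UTF-8 and still comma-delimited. The following is a minimal sketch, not part of the database compilation process; the function name `check_csv` is hypothetical, and the tab test is only a heuristic applied to the header line.

```python
from pathlib import Path


def check_csv(path):
    """Return a list of problems found in a CSV file (empty if none found).

    Hypothetical helper: it only checks that the bytes decode as UTF-8 and
    that the header line does not look tab-separated.
    """
    raw = Path(path).read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # Typical result of saving with a legacy Windows or Mac encoding.
        return [f"not valid UTF-8: {err}"]

    problems = []
    lines = text.splitlines()
    header = lines[0] if lines else ""
    # Heuristic: tabs but no commas in the header suggests a TSV export.
    if "\t" in header and "," not in header:
        problems.append("header looks tab-separated, not comma-separated")
    return problems
```

An empty list means neither problem was detected; it does not guarantee that the file matches the raw data template.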
In R, UTF-8 CSV files can be opened by using `read.csv(..., encoding = "UTF-8")`. For reading UTF-8 in Python, see this post: Reading a UTF8 CSV file with Python. Raw files can be written from R using `write.csv(..., row.names = FALSE, fileEncoding = "UTF-8", na = "")`.

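For Python users, the standard library `csv` module can read and write these files directly. The sketch below is an illustration under stated assumptions rather than the project's own tooling: the function names `read_raw_csv` and `write_raw_csv` are hypothetical, and the writer only roughly mirrors the R call above (no row names, UTF-8 output, missing values written as empty strings); quoting rules may differ from `write.csv`.

```python
import csv


def read_raw_csv(path):
    """Read a UTF-8 CSV file into a list of dicts keyed by column name."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def write_raw_csv(path, rows, fieldnames):
    """Write rows (dicts) to a UTF-8 CSV file.

    The csv module writes None as an empty string, which is similar in
    effect to na = "" in R's write.csv.
    """
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
```

Every value comes back from `read_raw_csv` as a string, so numeric columns need converting by the caller.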
The Emacs editor should work, but if you open a file and it has used the wrong encoding, run the command `revert-buffer-with-coding-system` and specify `utf-8`. Alternatively, set the coding for the next file with `C-x RET c utf-8 RET`, then open the file using `C-x C-f <path> RET`.


Folder | Description |
---|---|
raw | Contains raw data in UTF-8 encoded CSV files. Sub-folders are used to organise files, but do not affect processing in any way. |
Endnote-dbs | Contains an EndNote database with references for all of the raw data sources. Also contains a list of journal terms used by EndNote to abbreviate journal names. |

File | Description |
---|---|
README.md | This file |
Template.xlsx | Documents the structure of the raw files. Contains 2 tabs: Column descriptions and Raw data template. |
Template.csv | This is the "Raw data template" tab from Template.xlsx in CSV format. This file can be copied and renamed to create a new raw data file. |
Too Hard Reference List.docx | Word document that lists candidate data sources that were not included in the database. Each data source has a comment that describes why it was excluded from the database. |
checked-taxa.csv | List of taxa that have been checked by the database compilation process. Used to speed up the compilation process by caching taxon queries to reduce the number of queries required. |
database-column-definitions.xlsx | Spreadsheet that describes the columns in the compiled database. This is converted to CSV by the database compilation process. |
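
Starting a new raw data file from `Template.csv` amounts to copying it under a new name. A minimal sketch follows, assuming it is run from this folder; the function name `new_raw_file` and the destination path in the usage line are hypothetical.

```python
import shutil
from pathlib import Path


def new_raw_file(template, destination):
    """Copy the CSV template to a new raw data file without overwriting."""
    destination = Path(destination)
    if destination.exists():
        # Refuse to clobber an existing raw data file.
        raise FileExistsError(f"{destination} already exists")
    # Create any missing sub-folders, then copy the template bytes as-is.
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, destination)
    return destination


# Example (hypothetical path): new_raw_file("Template.csv", "raw/my-source/my-data.csv")
```

Copying the bytes unchanged keeps the template's UTF-8 encoding and header row intact.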