`data` folder

This folder contains raw trait data and related files. The generated database with standardised trait values is NOT located here; it is compiled into `../output/observations.csv`, then copied to the website folder `../docs`.

Encoding

CSV files are encoded in UTF-8 so that accented characters can be used reliably and portably. Google Sheets handles UTF-8 by default, but MS Excel requires some effort; see:

Is it possible to force Excel recognize UTF-8 CSV files automatically?
How to open UTF-8 CSV file in Excel without mis-conversion of characters in Japanese and Chinese language for both Mac and Windows?

or search the Internet for `UTF8 CSV`.

After editing a CSV file in Excel, be careful to either use Save As and specify Save as type: `CSV (Comma delimited) (*.csv)`, or else Export to file type `CSV (Comma delimited) (*.csv)`. Simply clicking Save may save the file in the wrong format (e.g. tab-separated values).
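A quick check after saving from Excel can catch both problems mentioned above. The following is a minimal sketch, not part of the compilation process; the function name `check_csv_export` and its heuristic (tabs but no commas in the header line) are assumptions made for illustration.

```python
def check_csv_export(path):
    """Return a list of problems found in a CSV file (empty if none detected).

    Hypothetical helper: it only checks that the bytes decode as UTF-8 and
    that the header line does not look tab-separated.
    """
    with open(path, "rb") as handle:
        raw = handle.read()

    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # The file was probably saved in a legacy encoding such as Latin-1.
        return ["file is not valid UTF-8"]

    problems = []
    lines = text.splitlines()
    header = lines[0] if lines else ""
    # Heuristic only: a header containing tabs and no commas suggests that
    # the file was saved as tab-separated values.
    if "\t" in header and "," not in header:
        problems.append("header looks tab-separated, not comma-separated")
    return problems
```

An empty list means neither problem was detected; it does not prove that the file is otherwise well formed.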

In R, UTF-8 CSV files can be opened by using `read.csv(..., encoding = "UTF-8")`. See this post for reading UTF-8 in Python: Reading a UTF8 CSV file with Python. Raw files can be written from R using `write.csv(..., row.names = FALSE, fileEncoding = "UTF-8", na = "")`.
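For Python users, reading and writing can be sketched with the standard library `csv` module. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the column names in the usage note are invented and not taken from `Template.csv`.

```python
import csv


def write_utf8_csv(path, fieldnames, rows):
    """Write a list of dicts to a comma-separated file encoded as UTF-8."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        # The csv module writes None as an empty cell, which is comparable
        # to na = "" in the R call above.
        writer.writerows(rows)


def read_utf8_csv(path):
    """Read a UTF-8 CSV file into a list of dicts keyed by column name."""
    with open(path, "r", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, `write_utf8_csv("example.csv", ["taxon", "collector"], [{"taxon": "Acacia", "collector": "Müller"}])` followed by `read_utf8_csv("example.csv")` should return the same row with the accented character intact. All values are read back as strings.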

The Emacs editor should detect the encoding correctly, but if it opens a file with the wrong encoding, run the command `revert-buffer-with-coding-system` and specify `utf-8`. Alternatively, set the coding system for the next file with `C-x RET c utf-8 RET`, then open the file using `C-x C-f <path> RET`.

Sub-folders

| Folder | Description |
| --- | --- |
| `raw` | Raw data in UTF-8 encoded CSV files. Sub-folders are used to organise files, but do not affect processing in any way. |
| `Endnote-dbs` | An EndNote database holding references for all of the raw data sources, plus a list of journal terms used by EndNote to abbreviate journal names. |

Files

| File | Description |
| --- | --- |
| `README.md` | This file. |
| `Template.xlsx` | Documents the structure of the raw files. Contains two tabs: "Column descriptions" and "Raw data template". |
| `Template.csv` | The "Raw data template" tab from `Template.xlsx` in CSV format. Copy and rename this file to create a new raw data file. |
| `Too Hard Reference List.docx` | Word document listing candidate data sources that were not included in the database. Each entry has a comment describing why it was excluded. |
| `checked-taxa.csv` | List of taxa that have already been checked by the database compilation process. It acts as a cache of taxon queries, which speeds up compilation by reducing the number of queries required. |
| `database-column-definitions.xlsx` | Spreadsheet describing the columns in the compiled database. The database compilation process converts it to CSV. |
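Creating a new raw data file from `Template.csv` amounts to a copy and rename. A minimal sketch follows, assuming it is run from this folder; the function name `new_raw_file` and the destination file name in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def new_raw_file(template, destination):
    """Copy the template to a new raw data file without overwriting anything."""
    template = Path(template)
    destination = Path(destination)
    if destination.exists():
        # Refuse to clobber an existing raw data file.
        raise FileExistsError(f"{destination} already exists")
    # Sub-folders under raw/ are only organisational, so create them freely.
    destination.parent.mkdir(parents=True, exist_ok=True)
    # A byte-for-byte copy keeps the template's UTF-8 encoding intact.
    shutil.copyfile(template, destination)
    return destination
```

For example, `new_raw_file("Template.csv", "raw/my-new-source.csv")` would create the new file ready for data entry.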
