Tools and journal metadata for generating dependent CSL styles
Ruby
Latest commit 62d9714 Jan 15, 2017 @zuphilip zuphilip committed with rmzelle Add some badges in README (#24)
Permalink
Failed to load latest commit information.
aacr Get rid of "data" subdirectory for input data Dec 29, 2015
acm Get rid of "data" subdirectory for input data Dec 29, 2015
acs ACS data update Dec 9, 2016
afs Get rid of "data" subdirectory for input data Dec 29, 2015
agu Fix ISSN in AGU metadata May 24, 2016
aiaa
aip Add AIP Journal of Vacuum Science & Technology styles (#17) Jan 8, 2017
ams Get rid of "data" subdirectory for input data Dec 29, 2015
annual-reviews Get rid of "data" subdirectory for input data Dec 29, 2015
apa Get rid of "data" subdirectory for input data Dec 29, 2015
aps Get rid of "data" subdirectory for input data Dec 29, 2015
asa Get rid of "data" subdirectory for input data Dec 29, 2015
asce Get rid of "data" subdirectory for input data Dec 29, 2015
asm Get rid of "data" subdirectory for input data Dec 29, 2015
asme
aspet Get rid of "data" subdirectory for input data Dec 29, 2015
biologists Get rid of "data" subdirectory for input data Dec 29, 2015
bmj Get rid of "data" subdirectory for input data Dec 29, 2015
cell-press-research
cell-press-trends
copernicus Get rid of "data" subdirectory for input data Dec 29, 2015
csiro Get rid of "data" subdirectory for input data Dec 29, 2015
current-opinion Get rid of "data" subdirectory for input data Dec 29, 2015
elsevier
endocrine-press Get rid of "data" subdirectory for input data Dec 29, 2015
fems Get rid of "data" subdirectory for input data Dec 29, 2015
frontiers Get rid of "data" subdirectory for input data Dec 29, 2015
future-science-group Get rid of "data" subdirectory for input data Dec 29, 2015
healio Get rid of "data" subdirectory for input data Dec 29, 2015
ieee Get rid of "data" subdirectory for input data Dec 29, 2015
iet Get rid of "data" subdirectory for input data Dec 29, 2015
informs Add INFORMS metadata (#14) Dec 18, 2016
int-res Get rid of "data" subdirectory for input data Dec 29, 2015
iop Get rid of "data" subdirectory for input data Dec 29, 2015
iucr Get rid of "data" subdirectory for input data Dec 29, 2015
karger Get rid of "data" subdirectory for input data Dec 29, 2015
landes-bioscience
mdpi
mhra Get rid of "data" subdirectory for input data Dec 29, 2015
nihr Add NIHR metadata Feb 18, 2016
npg Get rid of "data" subdirectory for input data Dec 29, 2015
nrc Update _journals.tab for NRC journals (#19) Jan 13, 2017
oikos [oikos] Delete Hereditas from _journals.tab Feb 14, 2016
plos Get rid of "data" subdirectory for input data Dec 29, 2015
royal-society Add journals (#16) Jan 6, 2017
rsc Get rid of "data" subdirectory for input data Dec 29, 2015
sgm Get rid of "data" subdirectory for input data Dec 29, 2015
slas Get rid of "data" subdirectory for input data Dec 29, 2015
spandidos Get rid of "data" subdirectory for input data Dec 29, 2015
spec Add RSpec tests to check publishers.json Mar 31, 2016
spie Get rid of "data" subdirectory for input data Dec 29, 2015
springer-fachzeitschriften-medizin Get rid of "data" subdirectory for input data Dec 29, 2015
springer Springer2016: Update Feb 14, 2016
taylor-and-francis add Current Medical Research and Opinion (#15) Jan 8, 2017
the-geological-society
tr-law-australia Get rid of "data" subdirectory for input data Dec 29, 2015
.gitignore Put generated styles in "_dependent" subdirectory Mar 20, 2016
.travis.yml Add RSpec tests to check publishers.json Mar 31, 2016
Gemfile Add RSpec tests to check publishers.json Mar 31, 2016
Gemfile.lock Add RSpec tests to check publishers.json Mar 31, 2016
LICENSE.md Add MIT license Mar 26, 2016
README.mdown
Rakefile Add RSpec tests to check publishers.json Mar 31, 2016
generate_styles.rb Better handle missing publisher.jsons entries Dec 17, 2016
publishers.json Add INFORMS metadata (#14) Dec 18, 2016

README.mdown

Journals

Build Status license

Metadata repository for dependent CSL styles

While there seems to be an endless variety of citation formats, many publishers fortunately standardize on one, or at most a few, citation formats for their collection of journals. This reduces the number of "independent" CSL citation styles we need to maintain and allows us to create lightweight "dependent" CSL styles for the individual journals.

Whereas independent styles contain a full description of a citation format, dependent styles are light-weight and act as an alias to an independent style (while dependent styles inherit their complete style format from their parent style, they can re(define) one setting: the style locale).

This journals repository holds a collection of journal metadata for publishers that have standardized their use of citation formats. At a minimum, this metadata should include the journal titles and ISSNs (print and online). Through a script, we can quickly generate dependent CSL styles from this metadata.

Repository history

This repository was created on 2016-03-10 as a dedicated home for the journal metadata previously stored in the https://github.com/citation-style-language/utilities/ repository (in the now-removed "generate_dependent_styles" subdirectory).

Generating dependent styles

Preparing journal metadata

The first step in creating dependent styles for the journals of a publisher is collecting the metadata. Some publishers conveniently publish spreadsheets with all the necessary details of their journals. Sometimes we contact publishers to ask for their journal metadata, or for small publishers, compile the metadata by hand.

Each publisher gets its own metadata directory in the root of this repository (e.g., asm for the American Society for Microbiology). When creating a new publisher directory, update the file publishers.json to match the directory name to a full publisher name.

Required files

Each publisher directory must contain the following files:

_journals.tab

_journals.tab is the main file for storing a publisher's journal metadata. It must be a tab-delimited file, with one column per metadata field. Metadata fields are identified by the field names in the column header. By convention these field names are in all-caps, although case doesn't matter for the script. Each row below the header should cover a single journal.

For example, the first few rows could look like:

TITLE   ISSN    EISSN   DOCUMENTATION
Antimicrobial Agents and Chemotherapy   0066-4804   1098-6596   http://aac.asm.org/
Applied and Environmental Microbiology  0099-2240   1098-5336   http://aem.asm.org/
Genome Announcements    x   2169-8287   http://genomea.asm.org/

When preparing _journals.tab, keep the following in mind:

  • _journals.tab should ideally only contain metadata that differs between journals. If, for example, all journals have the same documentation URL, it's better to add this URL to the shared style template.
  • the only required metadata field is "TITLE", providing the style title. The field names "IDENTIFIER" and "XML-COMMENT" are reserved and may not be used within _journals.tab. There are no other restrictions on the naming of column headers.
  • for consistency, we recommend the following names for common metadata fields:
    • "TITLESHORT": abbreviated journal title
    • "PARENT": independent parent style
    • "DOCUMENTATION": documentation URL
    • "FORMAT": citation format (e.g., "numeric")
    • "FIELD": field (e.g., "medicine")
    • "ISSN": print ISSN
    • "EISSN": electronic ISSN
    • "LOCALE": locale code for style localization (e.g., "en-US" for US English; for a list of common locale codes, see the CSL locales repository)
  • ampersands are automatically escaped for the "TITLE", "TITLESHORT", and "DOCUMENTATION" fields.
  • not every journal will have a value for each metadata field. For example, online-only journals won't have a print ISSN. In this case, just leave the field empty, or fill it with a single character, such as "x".
_template.csl

_template.csl is the template for the dependent CSL styles. An example:

<?xml version="1.0" encoding="utf-8"?>
<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0" default-locale="#LOCALE#">
  <!-- #XML-COMMENT# -->
  <info>
    <title>#TITLE#</title>
    <id>http://www.zotero.org/styles/#IDENTIFIER#</id>
    <link href="http://www.zotero.org/styles/#IDENTIFIER#" rel="self"/>
    <link href="http://www.zotero.org/styles/american-society-for-microbiology" rel="independent-parent"/>
    <link href="#DOCUMENTATION#" rel="documentation"/>
    <category citation-format="numeric"/>
    <category field="biology"/>
    <issn>#ISSN#</issn>
    <eissn>#EISSN#</eissn>
    <updated>2015-05-08T17:36:22+00:00</updated>
    <rights license="http://creativecommons.org/licenses/by-sa/3.0/">This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License</rights>
  </info>
</style>

This template file should include all metadata that is the same for all journals. In contrast, template placeholders should be used for metadata that differs between journals. These placeholders, in all-caps and enclosed in hashes (e.g., #TITLE#), are replaced by journal metadata during style generation. Except for the automatically generated "IDENTIFIER" and "XML-COMMENT" metadata fields, all placeholders must therefore match metadata field names from _journals.tab.

If the metadata field for a journal is empty or contains only a single character, the entire line containing the matching placeholder is deleted in the generated style.

The "#IDENTIFIER#" placeholder must be used to generate the style ID and "self" link, and generated styles are automatically named after the "IDENTIFIER" field. The value of the "IDENTIFIER" field is derived from "TITLE", by, among others, removing diacritics and punctuation, and replacing spaces by hyphens. For example, for the journal "Applied and Environmental Microbiology", the "IDENTIFIER" will become "applied-and-environmental-microbiology", the file name "applied-and-environmental-microbiology.csl", and the style ID should be "http://www.zotero.org/styles/applied-and-environmental-microbiology".

The "#XML-COMMENT#" placeholder should be included above the <info/> element, and is replaced by a string that tells us that the style was automatically generated, and which subdirectory produced the style. For example, the string could read "Generated with https://github.com/citation-style-language/utilities/tree/master/generate_dependent_styles/data/**subdirectory**", with "subdirectory" matching the name of the subdirectory being processed (this particular URL is outdated following our move to the journals repository).

Optional files

Each publisher directory may also contain any of the following files:

Raw metadata files

Some publishers helpfully publish a spreadsheet of their journal catalog with all required metadata. In this case, it's often handy to save a copy of this spreadsheet in the repository.

_README.mdown

It's always a good idea to document your work in a _README.mdown file, so that other contributors (or your future self!) can figure out what you did. For instance, where does the metadata come from? How up-to-date is the metadata you collected? And if you got help from the publisher, who were your contacts?

_skip.txt

_skip.txt may be used to list journals that should be skipped during style generation. Journals can be specified by their "TITLE" or "IDENTIFIER" value, one per line.

_rename.tab

_rename.tab may be used to change the value of "IDENTIFIER". Each line in this tab-delimited file should list the original generated "IDENTIFIER" value, followed by the new value.

An example of _rename.tab is shown below:

acm-computing-surveys   computing-surveys
acm-transactions-on-algorithms  transactions-on-algorithms
_extra.tab

_extra.tab may be used to add additional journals, or to overwrite the metadata for journals already stored in _journals.tab. For example, for large publishers it's handier to store the publisher-provided metadata untouched in _journals.tab, while making any manual corrections in _extra.tab.

_extra.tab must be a tab-delimited file, with the exact same header and column order as _journals.tab. When entries in _journals.tab and _extra.tab generate the same "IDENTIFIER" value, the entry in _extra.tab completely replaces the entry in _journals.tab. Journals defined in _extra.tab are still subject to skipping (through _skip.txt) and renaming (through _rename.tab).

Running the script

Once the journal metadata has been properly formatted, dependent styles can be generated by running the generate_styles.rb Ruby script from the command line:

ruby generate_styles.rb

Generated styles are placed in the _dependent subdirectory of the journals repository, which will be created if it does not yet exist. Existing directory contents will be deleted for each run of the script. Since the _dependent directory is listed in .gitignore, you won't accidentally commit any generated styles to the journals repository.

Generating styles for a single publisher

By default, generate_styles.rb generates styles for all publisher directories. However, you can limit style generation to a single directory with the --dir (or -d) option. Just specify the name of the desired directory directly after the option. For example:

ruby generate_styles.rb -d asm

Updating the styles repository

If you use the --sync (or -s) option, the script will automatically update your local styles repository with the newly generated styles. This requires that you cloned the CSL styles and journals repositories into the same parent directory.

When using the --sync option, the script will by default delete all existing generated dependent styles from the local styles repository, and copy over the newly generated styles. Only those dependent styles are deleted that have an XML comment that matches one of the publisher directories processed by the script (e.g., "Generated with https://github.com/citation-style-language/utilities/tree/master/generate_dependent_styles/data/**subdirectory**").

The --sync option may be followed by a qualifier to limit which styles are updated. The qualifiers are "additions", "deletions", and "modifications", which together cover all possible updates:

  • "additions": only styles that don't yet exist in the styles repository are copied over
  • "deletions": only previously generated styles that aren't present in the newly generated set are deleted from the styles directory.
  • "modifications": only styles that already exist in the styles directory are copied over

The --dir and --sync options can be used in combination. Also note that it's enough to type the first letter of the desired qualifier. For example, to generate the styles for the "asm" directory and copy over only the new styles, use:

ruby generate_styles.rb -d asm -s a

When you are preparing a pull request for a big metadata update for a large publisher, it's often a good idea to create separate commits for style additions, deletions, and modifications. This makes it easier to review the pull requests, since Git gets easily confused if a commit contains both style deletions and additions and will often mistake these for style renamings.

Updating timestamps

When using the --sync option, the script will by default not touch those styles in the styles repository that are identical to a newly generated style, or if the only differences are in the styles' timestamps (stored in <updated/>) and/or the styles' XML comments. This allows you to update the timestamp in _template.tab with every metadata update, while avoiding changes to all styles for which the journal metadata stayed the same.

However, you can force the replacement of all styles by using the --sync option in combination with --force (or -f) option.

Troubleshooting

Encoding errors

You may encounter the following encoding error when running the Ruby script under Windows:

generate_styles.rb:21:in `delete!': incompatible character encodings: CP850 and UTF-8 (Encoding::CompatibilityError)

If this happens, try running Ruby with the -E UTF-8 option:

ruby -E UTF-8 generate_styles.rb