Metadata repository for dependent CSL styles
While there seems to be an endless variety of citation formats, many publishers fortunately standardize on one, or at most a few, citation formats for their collection of journals. This reduces the number of "independent" CSL citation styles we need to maintain and allows us to create lightweight "dependent" CSL styles for the individual journals.
Whereas independent styles contain a full description of a citation format, dependent styles are lightweight and act as aliases for an independent style. Dependent styles inherit their complete style format from their parent style, but they can (re)define one setting: the style locale.
The journals repository holds a collection of journal metadata for publishers that have standardized their use of citation formats.
At a minimum, this metadata should include the journal titles and ISSNs (print and online).
Through a script, we can quickly generate dependent CSL styles from this metadata.
This repository was created on 2016-03-10 as a dedicated home for the journal metadata previously stored in the https://github.com/citation-style-language/utilities/ repository (in the now-removed "generate_dependent_styles" subdirectory).
Generating dependent styles
Preparing journal metadata
The first step in creating dependent styles for the journals of a publisher is collecting the metadata. Some publishers conveniently publish spreadsheets with all the necessary details of their journals. Sometimes we contact publishers to ask for their journal metadata, or for small publishers, compile the metadata by hand.
Each publisher gets its own metadata directory in the root of this repository (e.g., asm for the American Society for Microbiology).
When creating a new publisher directory, update the file publishers.json to match the directory name to a full publisher name.
Each publisher directory must contain the following files:
_journals.tab is the main file for storing a publisher's journal metadata.
It must be a tab-delimited file, with one column per metadata field.
Metadata fields are identified by the field names in the column header.
By convention these field names are in all-caps, although case doesn't matter for the script.
Each row below the header should cover a single journal.
For example, the first few rows could look like:
TITLE	ISSN	EISSN	DOCUMENTATION
Antimicrobial Agents and Chemotherapy	0066-4804	1098-6596	http://aac.asm.org/
Applied and Environmental Microbiology	0099-2240	1098-5336	http://aem.asm.org/
Genome Announcements	x	2169-8287	http://genomea.asm.org/
For _journals.tab, keep the following in mind:
- _journals.tab should ideally only contain metadata that differs between journals. If, for example, all journals have the same documentation URL, it's better to add this URL to the shared style template.
- the only required metadata field is "TITLE", providing the style title.
- the field names "IDENTIFIER" and "XML-COMMENT" are reserved and may not be used within _journals.tab. There are no other restrictions on the naming of column headers.
- for consistency, we recommend the following names for common metadata fields:
- "TITLESHORT": abbreviated journal title
- "PARENT": independent parent style
- "DOCUMENTATION": documentation URL
- "FORMAT": citation format (e.g., "numeric")
- "FIELD": field (e.g., "medicine")
- "ISSN": print ISSN
- "EISSN": electronic ISSN
- "LOCALE": locale code for style localization (e.g., "en-US" for US English; for a list of common locale codes, see the CSL locales repository)
- ampersands are automatically escaped for the "TITLE", "TITLESHORT", and "DOCUMENTATION" fields.
- not every journal will have a value for each metadata field. For example, online-only journals won't have a print ISSN. In this case, just leave the field empty, or fill it with one of the accepted placeholder characters, "x", "X", or a space.
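The rules above can be sketched in Ruby. This is a minimal illustration, not the code used by generate_styles.rb: the helper name parse_journals and the PLACEHOLDERS constant are hypothetical, and the sketch assumes the file content has already been read into a string.

```ruby
# Hypothetical helper for illustration; not taken from generate_styles.rb.
# After stripping, an empty value or a lone "x"/"X" counts as "no value".
PLACEHOLDERS = ["", "x", "X"].freeze

def parse_journals(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  # Field names are matched case-insensitively, so normalize to all-caps.
  header = lines.shift.split("\t").map { |name| name.strip.upcase }
  lines.map do |line|
    # The -1 limit keeps trailing empty fields instead of dropping them.
    values = line.split("\t", -1)
    header.zip(values).map do |name, value|
      cleaned = value.to_s.strip
      [name, PLACEHOLDERS.include?(cleaned) ? nil : cleaned]
    end.to_h
  end
end

sample = "Title\tISSN\tEISSN\n" \
         "Genome Announcements\tx\t2169-8287\n"
journal = parse_journals(sample).first
puts journal["TITLE"]
puts journal["ISSN"].nil? ? "no print ISSN" : journal["ISSN"]
```

Mapping placeholder values to nil early keeps the later steps simple, since they only need to check for a missing value.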
_template.csl is the template for the dependent CSL styles.
<?xml version="1.0" encoding="utf-8"?>
<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0" default-locale="#LOCALE#">
  <!-- #XML-COMMENT# -->
  <info>
    <title>#TITLE#</title>
    <id>http://www.zotero.org/styles/#IDENTIFIER#</id>
    <link href="http://www.zotero.org/styles/#IDENTIFIER#" rel="self"/>
    <link href="http://www.zotero.org/styles/american-society-for-microbiology" rel="independent-parent"/>
    <link href="#DOCUMENTATION#" rel="documentation"/>
    <category citation-format="numeric"/>
    <category field="biology"/>
    <issn>#ISSN#</issn>
    <eissn>#EISSN#</eissn>
    <updated>2015-05-08T17:36:22+00:00</updated>
    <rights license="http://creativecommons.org/licenses/by-sa/3.0/">This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License</rights>
  </info>
</style>
This template file should include all metadata that is the same for all journals.
In contrast, template placeholders should be used for metadata that differs between journals.
These placeholders, in all-caps and enclosed in hashes (e.g., #TITLE#), are replaced by journal metadata during style generation.
Except for the automatically generated "IDENTIFIER" and "XML-COMMENT" metadata fields, all placeholders must therefore match metadata field names from _journals.tab.
If the metadata field for a journal is empty or contains only a placeholder character ("x", "X", or space are accepted), the entire line containing the matching placeholder is deleted in the generated style.
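The substitution and line-deletion behavior described above could look roughly like the following Ruby sketch. The function names fill_template and blank_value? are hypothetical, the regular expression is an assumption about what a placeholder looks like, and ampersand escaping is left out for brevity.

```ruby
# Hypothetical sketch; generate_styles.rb may implement this differently.
PLACEHOLDER_PATTERN = /#([A-Z-]+)#/

# True for nil, an empty or whitespace-only value, or a lone "x"/"X".
def blank_value?(value)
  ["", "x", "X"].include?(value.to_s.strip)
end

def fill_template(template, metadata)
  filled = template.lines.map do |line|
    names = line.scan(PLACEHOLDER_PATTERN).flatten
    # Delete the entire line if any placeholder on it has no value.
    next nil if names.any? { |name| blank_value?(metadata[name]) }
    line.gsub(PLACEHOLDER_PATTERN) { metadata[Regexp.last_match(1)] }
  end
  filled.compact.join
end

template = "<title>#TITLE#</title>\n<issn>#ISSN#</issn>\n<eissn>#EISSN#</eissn>\n"
metadata = { "TITLE" => "Genome Announcements", "ISSN" => "x", "EISSN" => "2169-8287" }
puts fill_template(template, metadata)
```

Because whole lines are deleted, each optional placeholder should sit on its own line in the template, as in the example template.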
The "#IDENTIFIER#" placeholder must be used to generate the style ID and "self" link, and generated styles are automatically named after the "IDENTIFIER" field. The value of the "IDENTIFIER" field is derived from "TITLE" through steps that include removing diacritics and punctuation and replacing spaces with hyphens. For example, for the journal "Applied and Environmental Microbiology", the "IDENTIFIER" becomes "applied-and-environmental-microbiology", the file name becomes "applied-and-environmental-microbiology.csl", and the style ID becomes "http://www.zotero.org/styles/applied-and-environmental-microbiology".
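As a rough illustration of the derivation, the Ruby sketch below covers only the steps named above (diacritics, punctuation, spaces) plus lowercasing, which the example implies. The function name identifier_for is hypothetical, and the actual script may apply further rules that are not reproduced here.

```ruby
# Hypothetical sketch of the TITLE -> IDENTIFIER derivation.
def identifier_for(title)
  title
    .unicode_normalize(:nfd)   # split letters from their combining diacritics
    .gsub(/\p{Mn}/, "")        # drop the combining marks
    .downcase
    .gsub(/[^a-z0-9\s-]/, "")  # drop punctuation and other symbols
    .strip
    .gsub(/\s+/, "-")          # runs of whitespace become single hyphens
end

id = identifier_for("Applied and Environmental Microbiology")
puts id                                        # applied-and-environmental-microbiology
puts "#{id}.csl"                               # file name
puts "http://www.zotero.org/styles/#{id}"      # style ID
```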
The "#XML-COMMENT#" placeholder should be included above the <info/> element, and is replaced by a string that tells us that the style was automatically generated, and which subdirectory produced the style.
For example, the string could read "Generated with https://github.com/citation-style-language/utilities/tree/master/generate_dependent_styles/data/**subdirectory**", with "subdirectory" matching the name of the subdirectory being processed (this particular URL is outdated following our move to the journals repository).
Each publisher directory may also contain any of the following files:
Raw metadata files
Some publishers helpfully publish a spreadsheet of their journal catalog with all required metadata. In this case, it's often handy to save a copy of this spreadsheet in the repository.
It's always a good idea to document your work in a _README.mdown file, so that other contributors (or your future self!) can figure out what you did.
For instance, where does the metadata come from?
How up-to-date is the metadata you collected?
And if you got help from the publisher, who were your contacts?
_skip.txt may be used to list journals that should be skipped during style generation.
Journals can be specified by their "TITLE" or "IDENTIFIER" value, one per line.
_rename.tab may be used to change the value of "IDENTIFIER".
Each line in this tab-delimited file should list the original generated "IDENTIFIER" value, followed by the new value.
An example of _rename.tab is shown below:
acm-computing-surveys	computing-surveys
acm-transactions-on-algorithms	transactions-on-algorithms
_extra.tab may be used to add additional journals, or to overwrite the metadata for journals already stored in _journals.tab.
For example, for large publishers it's handier to store the publisher-provided metadata untouched in _journals.tab, while making any manual corrections in _extra.tab.
_extra.tab must be a tab-delimited file, with the exact same header and column order as _journals.tab.
When entries in _journals.tab and _extra.tab generate the same "IDENTIFIER" value, the entry in _extra.tab completely replaces the entry in _journals.tab.
Journals defined in _extra.tab are still subject to skipping (through _skip.txt) and renaming (through _rename.tab).
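How the three optional files interact can be sketched as follows. This is an illustration under assumptions, not the script's actual logic: the function name apply_overrides is hypothetical, each journal is assumed to be a hash that already carries its generated "IDENTIFIER", and skipping is assumed to be checked against the original identifier before renaming.

```ruby
# Hypothetical sketch of combining _journals.tab, _extra.tab, _skip.txt and _rename.tab.
def apply_overrides(journals, extra: [], skip: [], rename: {})
  by_id = {}
  # Later entries win, so an _extra.tab row completely replaces a
  # _journals.tab row with the same generated IDENTIFIER.
  (journals + extra).each { |journal| by_id[journal["IDENTIFIER"]] = journal }

  by_id.values
       .reject { |j| skip.include?(j["TITLE"]) || skip.include?(j["IDENTIFIER"]) }
       .map { |j| j.merge("IDENTIFIER" => rename.fetch(j["IDENTIFIER"], j["IDENTIFIER"])) }
end

journals = [
  { "TITLE" => "ACM Computing Surveys", "IDENTIFIER" => "acm-computing-surveys", "ISSN" => "0000-0001" },
  { "TITLE" => "Discontinued Journal", "IDENTIFIER" => "discontinued-journal" }
]
extra  = [{ "TITLE" => "ACM Computing Surveys", "IDENTIFIER" => "acm-computing-surveys", "ISSN" => "0000-0002" }]
result = apply_overrides(journals,
                         extra: extra,
                         skip: ["Discontinued Journal"],
                         rename: { "acm-computing-surveys" => "computing-surveys" })
result.each { |j| puts "#{j['IDENTIFIER']}: #{j['ISSN']}" }
```

The ISSN values in the usage example are made up for illustration.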
Running the script
Once the journal metadata has been properly formatted, dependent styles can be generated by running the generate_styles.rb Ruby script from the command line:
ruby generate_styles.rb
Generated styles are placed in the _dependent subdirectory of the journals repository, which will be created if it does not yet exist.
Existing directory contents are deleted on each run of the script.
Because the _dependent directory is listed in .gitignore, you won't accidentally commit any generated styles to the journals repository.
Generating styles for a single publisher
By default, generate_styles.rb generates styles for all publisher directories.
However, you can limit style generation to a single directory with the -d option.
Just specify the name of the desired directory directly after the option. For example:
ruby generate_styles.rb -d asm
If you use the --sync (or -s) option, the script will automatically update your local styles repository with the newly generated styles.
This requires that you cloned the CSL styles and journals repositories into the same parent directory.
When using the --sync option, the script will by default delete all existing generated dependent styles from the local styles repository, and copy over the newly generated styles.
Only those dependent styles are deleted that have an XML comment that matches one of the publisher directories processed by the script (e.g., "Generated with https://github.com/citation-style-language/utilities/tree/master/generate_dependent_styles/data/**subdirectory**").
The --sync option may be followed by a qualifier to limit which styles are updated. The qualifiers are "additions", "deletions", and "modifications", which together cover all possible updates:
- "additions": only styles that don't yet exist in the styles repository are copied over
- "deletions": only previously generated styles that aren't present in the newly generated set are deleted from the styles repository
- "modifications": only styles that already exist in the styles repository are copied over
The -d and --sync options can be used in combination. Also note that it's enough to type the first letter of the desired qualifier.
For example, to generate the styles for the "asm" directory and copy over only the new styles, use:
ruby generate_styles.rb -d asm -s a
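The three qualifiers amount to set operations on file names. The Ruby sketch below is a hypothetical illustration (the function plan_sync and its return shape are not part of the script), where "generated" is the list of newly generated style files and "existing" is the list of previously generated styles found in the styles repository for the same publisher.

```ruby
# Hypothetical sketch of how the sync qualifiers partition the work.
ALL_QUALIFIERS = %w[additions deletions modifications].freeze

def plan_sync(generated, existing, qualifiers = ALL_QUALIFIERS)
  plan = { copy: [], delete: [] }
  # additions: generated styles that are not yet in the styles repository
  plan[:copy] += generated - existing if qualifiers.include?("additions")
  # modifications: generated styles that already exist there
  plan[:copy] += generated & existing if qualifiers.include?("modifications")
  # deletions: previously generated styles missing from the new set
  plan[:delete] += existing - generated if qualifiers.include?("deletions")
  plan
end

generated = ["journal-a.csl", "journal-b.csl"]
existing  = ["journal-b.csl", "journal-c.csl"]
additions_only = plan_sync(generated, existing, ["additions"])
puts "copy: #{additions_only[:copy].join(', ')}"
puts "delete: #{additions_only[:delete].join(', ')}"
```

Running the plan once per qualifier is one way to produce the separate commits for additions, deletions, and modifications.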
When you are preparing a pull request for a big metadata update for a large publisher, it's often a good idea to create separate commits for style additions, deletions, and modifications. This makes the pull request easier to review, since Git is easily confused when a commit contains both style deletions and additions, and will often mistake these for style renamings.
When using the --sync option, the script will by default not touch those styles in the styles repository that are identical to a newly generated style, or that differ only in the styles' timestamps (stored in <updated/>) and/or the styles' XML comments.
This allows you to update the timestamp in _template.csl with every metadata update, while avoiding changes to all styles for which the journal metadata stayed the same.
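One way to picture this comparison is to strip the parts that are allowed to differ before comparing. The Ruby sketch below is an assumption about the approach, not the script's code: the function names are hypothetical, and for simplicity it removes every XML comment rather than only the generated one.

```ruby
# Hypothetical sketch: compare two styles while ignoring timestamp and XML comments.
def comparable(style_xml)
  style_xml
    .gsub(%r{<updated>.*?</updated>}m, "")  # ignore the timestamp
    .gsub(/<!--.*?-->/m, "")                # ignore XML comments
end

def unchanged?(old_xml, new_xml)
  comparable(old_xml) == comparable(new_xml)
end

old_style = "<!-- old comment -->\n<title>Journal A</title>\n<updated>2015-05-08T17:36:22+00:00</updated>\n"
new_style = "<!-- new comment -->\n<title>Journal A</title>\n<updated>2016-03-10T00:00:00+00:00</updated>\n"
puts unchanged?(old_style, new_style) ? "skip" : "replace"
```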
However, you can force the replacement of all styles by using the
--sync option in combination with
You may encounter the following encoding error when running the Ruby script under Windows:
generate_styles.rb:21:in `delete!': incompatible character encodings: CP850 and UTF-8 (Encoding::CompatibilityError)
If this happens, try running Ruby with the -E UTF-8 option:
ruby -E UTF-8 generate_styles.rb