Split title -> Title-main/title-sub #203

Merged
merged 26 commits into from
Jun 14, 2020

Conversation

denismaier
Member

@denismaier denismaier commented May 27, 2020

Closes citation-style-language/csl-evolution#31

Updated proposal:

Summary

Adds variables and attributes for splitting and re-assembling titles

This adds distinct variables for different parts of a title, and attributes that instruct processors how these variables can be derived from regular title variables, how titles should be re-assembled from these derived variables, and which delimiters to use for that.

It also adds a new option "capitalize-subtitle" to text-case.

Suggestion for spec addition

To be added ...

Requirements

CSL processors currently accept titles as a single string, and CSL styles can only define formatting for the title as a whole.

This leads to the following (documented) problems regarding style agnostic data entry, which this proposal addresses:

  1. delimiters between title and subtitle can't be changed on a per style basis.
  2. delimiters between multiple subtitles can't be changed on a per style basis.
  3. capitalizing the first word of a subtitle can't be applied on a per style basis.
  4. title and subtitle can only be formatted as a whole.

Proposed solution

Build upon existing functionality in citeproc-js to split title variables into title-main and title-sub variables, and provide ways to configure the splitting and re-assembling mechanism, and to define delimiters.

This is probably the solution with the least impact on existing styles and reference management tools: it is consistent with existing behaviour and does not break existing workflows.

So, we have to decide:

  1. How to render title if the data provides title-main and title-sub (what delimiters to use)
  2. How to split title into title-main and title-sub
  3. Whether and how to normalize delimiters

There are several variations across styles that need to be accommodated:

  1. Styles vary in which delimiters are used to split (e.g., APA splits on —, Chicago does not; Chicago splits on "; or," and ", or,", but most other styles do not)
  2. Some styles (e.g., Chicago) normalize delimiters, others (e.g., APA) do not; styles also vary in which delimiters should be normalized.
  3. Styles vary in whether second+ subtitles use the same or a different delimiter (e.g., ; instead of :).

With these in mind, here is the proposal:

  1. Variables: Add title-main and title-sub variables. Create a new variable type variables.titles in csl-variables.rnc

  2. Data structure: In CSL JSON, in addition to title allow title-main and title-sub. Both are flat strings. Citeprocs should parse and normalize delimiters between multiple subtitles from title-sub. Multiple subtitles can be explicitly provided by separating them with two vertical bars: First subtitle || Second subtitle.

  3. Parsing by citeproc: If title-main and title-sub are not supplied in the data, the citeproc will derive them from title following these rules (based on existing citeproc-js behavior):

  4. Split the title based on the inheritable attribute title-split. Options for this attribute are:
    * "simple": Matches . , : , :: , ! , ?
    * "extended": Matches the values for "simple", plus ;
    * "full": Matches the values for "simple", plus — and ;
    * "chicago": Matches the values for "simple", plus ; and [;,] or[,:]

  5. title-main is the text before the first delimiter. title-sub is an array of the text following each delimiter. (How such an array should be entered is still undecided.)

  6. If both title and title-main are provided and title-main matches the beginning of title, only split the delimiter following the title-main text if any. (This provides an override to prevent the standard splitting behavior if needed.) The same applies for title-sub: If both title and title-sub are provided, only split the delimiter following the first subtitle if any.

  7. Rendering title: <text variable="title"> renders title-main followed by title-sub.

    • The delimiter is set by the inheritable attribute title-delimiter, a text string. title-sub renders all of its elements, with the delimiter set by the inheritable attribute title-sub-delimiter (the same as title-delimiter if empty).
    • Don't add another delimiter if one is already at the end of the previous title chunk (e.g., don't add ":" to the end of title-main "Dude, where's my car?").
  8. Normalizing delimiters: Delimiters are changed to the values specified by title-delimiter and title-sub-delimiter. Which delimiters are normalized is controlled via an inheritable attribute normalize-title-delimiters. Options for this attribute are:

    • "simple": Normalize . , : , ::
    • "extended": Normalize the values for "simple", plus ;
    • "full": Normalize the values for "simple", plus — and ;
  9. Splitting and normalization behavior regarding dashes: As some locales prefer en dashes to em dashes, citeprocs should check against both if the "full" option is selected on normalize-title-delimiters and/or title-split.

  10. Text casing: When calling <text variable="title">:

    • text-case="title" title casing is applied to title-main and each element of title-sub
    • text-case="capitalize-subtitle" the first letter of title-main and each element of title-sub is capitalized
    • Other text cases have the current behavior.
  11. Other titles: All of the above apply to all title variables (container-, collection-, volume-, etc.)
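
To make the splitting and re-assembly rules above more concrete, here is a minimal Python sketch. It is an illustration only, not citeproc-js code: the regular expressions, the names `split_title` and `render_title`, and the assumption that a delimiter only counts when followed by whitespace are hypothetical, and only the "simple" and "extended" options are covered.

```python
import re

# Hypothetical patterns for two of the proposed title-split options.
# Assumption: a delimiter only counts when it is followed by whitespace.
SPLIT_PATTERNS = {
    "simple": r"(\.|::|:|!|\?)\s+",
    "extended": r"(\.|::|:|!|\?|;)\s+",
}

# Delimiters replaced on output, keyed by normalize-title-delimiters value.
NORMALIZE_SETS = {
    "simple": {".", ":", "::"},
    "extended": {".", ":", "::", ";"},
}


def split_title(title, mode="simple"):
    """Split a flat title into (title_main, subtitles, delimiters_found)."""
    # With one capture group, re.split returns text and delimiters alternately.
    parts = re.split(SPLIT_PATTERNS[mode], title)
    return parts[0], parts[2::2], parts[1::2]


def render_title(main, subs, delims, title_delimiter=": ",
                 title_sub_delimiter=None, normalize="simple"):
    """Re-assemble a title, normalizing only the configured delimiters."""
    if title_sub_delimiter is None:
        # title-sub-delimiter falls back to title-delimiter when unset.
        title_sub_delimiter = title_delimiter
    out = main
    for index, (sub, found) in enumerate(zip(subs, delims)):
        wanted = title_delimiter if index == 0 else title_sub_delimiter
        if found in NORMALIZE_SETS[normalize]:
            out += wanted + sub
        else:
            # Keep delimiters such as "?" or "!" rather than adding another one.
            out += found + " " + sub
    return out


main, subs, delims = split_title("Some Title. A subtitle: Another one")
print(render_title(main, subs, delims, title_delimiter=": ", title_sub_delimiter="; "))
# Some Title: A subtitle; Another one
```

Note that a naive split on ". " would also split after abbreviations such as "Dr. ", so a real processor would need more care than this sketch takes.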


Original proposal

I think title-main and title-sub should be adopted. I think that the citeproc should split title if that is all that is provided.

With that in mind, several behaviors have to be defined:

  1. How to render title if the data provides title-main and title-sub (what delimiters to use)
  2. How to split title into title-main and title-sub
  3. Whether and how to normalize delimiters

There are several variations across styles that need to be accommodated:

  1. Styles vary in which delimiters are used to split (e.g., APA splits on —, Chicago does not; Chicago splits on "; or," and ", or,", but most other styles do not)
  2. Some styles (e.g., Chicago) normalize delimiters, others (e.g., APA) do not
  3. Styles vary in whether second+ subtitles use the same or a different delimiter (e.g., ; instead of :).
With these in mind, here is my proposal:
  1. Add title-main and title-sub as variables.
  2. Data structure: In CSL JSON, title can be provided as an array with full, main, and sub elements. sub can be an array with multiple subtitles. full should always be specified to
  3. Parsing by citeproc: If title-main and title-sub are not supplied in the data, the citeproc will derive them from title following these rules (based on existing citeproc-js behavior):
    1. Split the title based on the inheritable attribute title-split. Options for this attribute are:
      • "simple": Matches . , : , :: , ! , ?
      • "full": Matches the values for "simple", plus — and ;
      • "chicago": Matches the values for "simple", plus ; and [;,] or[,:]
      • (Not exactly sure about ; . APA, e.g., doesn't clearly mention it, but every use of a semicolon in a title that I have seen is to delimit a subtitle.)
      • (Possibly could just have "regular" ["full" above] and "chicago" if that would suffice.)
    2. title-main is the text before the first delimiter. title-sub is an array of the text following each delimiter.
    3. If both title and title-main are provided and title-main matches the beginning of title, only split the delimiter following the title-main text if any. (This provides an override to prevent the standard splitting behavior if needed.)
  4. Rendering title: <text variable="title"> renders title-main followed by title-sub.
    • The delimiter is set by the inheritable attribute title-delimiter, a text string. title-sub renders all of its elements, with the delimiter set by the inheritable attribute title-sub-delimiter (the same as title-delimiter if empty).
    • Don't add another delimiter if one is already at the end of the previous title chunk (e.g., don't add ":" to the end of title-main "Dude, where's my car?").
    • Child elements <main> and <sub> control text styling for each part separately (e.g., ABNT rules about title-main in bold but title-sub in regular face).
    • This might warrant a new cs:title element sibling to cs:text.
  5. Normalizing delimiters: If the inheritable attribute normalize-sub-delimiter is true, . and : delimiters are changed to the values specified by title-delimiter and title-sub-delimiter. Other delimiters are left unchanged.
  6. Text casing: When calling <text variable="title">:
    1. text-case="title" title casing is applied to title-main and each element of title-sub
    2. text-case="capitalize-subtitle" the first letter of title-main and each element of title-sub is capitalized
    3. Other text cases have the current behavior.
  7. Other titles: All of the above apply to all title variables (container-, collection-, volume-, etc.)
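
The `capitalize-subtitle` text case described above could look roughly like this in Python. This is a sketch under stated assumptions: the helper names are hypothetical, the title is assumed to be already split into its parts, and only the first character of each part is touched.

```python
def capitalize_first(text):
    """Upper-case the first character and leave the rest unchanged."""
    return text[:1].upper() + text[1:]


def capitalize_subtitle(main, subs):
    """Apply the proposed text-case="capitalize-subtitle" to split title parts."""
    return capitalize_first(main), [capitalize_first(sub) for sub in subs]


print(capitalize_subtitle("the structure of titles", ["a case study", "with examples"]))
# ('The structure of titles', ['A case study', 'With examples'])
```

A part that starts with a quotation mark or other punctuation is left unchanged by this sketch; a processor would probably want to skip past such characters first.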
Links to relevant citeproc-js behavior:

Originally posted by @bwiernik in citation-style-language/csl-evolution#31 (comment)

@bdarcus
Member

bdarcus commented May 29, 2020

Note: this is an example of one of the problems of having the documentation repo separate from this one. Really, we need to be able to evaluate PRs that impact the spec alongside the spec language.

@denismaier
Member Author

> Note: this is an example of one of the problems of having the documentation repo separate from this one. Really, we need to be able to evaluate PRs that impact the spec alongside the spec language.

Perhaps we should do things the other way round? Write documentation first, schema second?

@denismaier
Member Author

denismaier commented May 29, 2020

> Question: since this makes a recommendation on csl-data.rnc, which I think may be controversial, should this PR also include the proposed changes to that file?

Sure. I'll look into it, but need to understand csl-data.rnc first. How do you transform title into an array? @bwiernik hints?

@bdarcus
Member

bdarcus commented May 29, 2020

> Perhaps we should do things the other way round? Write documentation first, schema second?

I suspect that would end up introducing the same problem.

I'm seriously wondering if we should move the spec files here, and simply include them in the documentation repo as a submodule. The documentation repo would then effectively be more of a publishing repo.

Just mentioning it while I think about it.

@bdarcus
Member

bdarcus commented May 29, 2020

> > Question: since this makes a recommendation on csl-data.rnc, which I think may be controversial, should this PR also include the proposed changes to that file?
>
> Sure. I'll look into it, but need to understand csl-data.rnc first. How do you transform title into an array? @bwiernik hints?

So you are proposing to turn ...

"title": "Some Time: With Subtitle"

... into:

"title": {"title-main": "Some Time", "title-sub": "With Subtitle"}

And the first example would thus be deprecated?

If yes, why not simply?

"title-main": "Some Time", 
"title-sub": "With Subtitle"

.. and retain "title"?

I.e., the split title would be optional.
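
A processor consuming the flat, optional form suggested here might resolve the parts as in the following Python sketch. The function name `title_parts` and the fallback of splitting on the first ": " are assumptions for illustration; the PR leaves the actual splitting rules to the title-split attribute.

```python
def title_parts(item):
    """Return (main, sub) for a CSL-JSON-like item dict.

    Explicit "title-main"/"title-sub" keys win; otherwise the plain
    "title" string is split on the first ": " (an assumed fallback).
    """
    if "title-main" in item:
        return item["title-main"], item.get("title-sub")
    main, delimiter, sub = item.get("title", "").partition(": ")
    return main, (sub if delimiter else None)


# Both inputs from the discussion resolve to the same parts.
print(title_parts({"title": "Some Time: With Subtitle"}))
print(title_parts({"title-main": "Some Time", "title-sub": "With Subtitle"}))
```

Because "title" stays valid on its own, existing data keeps working and the split keys act purely as an override.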

@denismaier
Member Author

@bwiernik What do you think so far? And what do you think regarding Bruce's questions? What I was also thinking of: If title-sub is again an array, then this means that we need another optional sub element under the sub element under cs:title. This should be possible:

<title>
  <main/>
  <sub>
    <sub/>
  </sub>
</title>

Note also that in the current version normalize-title-delimiters and title-split are completely in sync. title-split="never" is still there even if that is probably not necessary.

I'm not sure about the dashes yet, but semicolons should be normalized, therefore normalize-title-delimiters="chicago". And I've also included a full variant where dashes are normalized as well.

@bwiernik
Member

In the .rnc, make a new section that is variables.titles and move the title variables there.

In the .json schema for data, specify a definition for titles like for names or dates. The slots for that would be main, sub, full (or literal or raw) (not title-main, etc.). Set all the title variables to be either a string or an array with that definition.
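
Read this way, a title value in the data is either a plain string or an object with main, sub, and full slots. A small Python sketch of how a consumer could accept both shapes follows; the function name `coerce_title` and the choice to reject unknown slots are assumptions for illustration, not part of the schema change.

```python
ALLOWED_SLOTS = {"main", "sub", "full"}


def coerce_title(value):
    """Return a dict of title slots for either a string or an object."""
    if isinstance(value, str):
        # A bare string is treated as the full, unsplit title.
        return {"full": value}
    if isinstance(value, dict):
        unknown = set(value) - ALLOWED_SLOTS
        if unknown:
            raise ValueError(f"unexpected title slots: {sorted(unknown)}")
        return dict(value)
    raise TypeError("title must be a string or an object")
```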

Member

@bwiernik bwiernik left a comment

And the CSL JSON change?

csl-variables.rnc (outdated, resolved)
csl.rnc (outdated, resolved)
csl.rnc (outdated, resolved)
@denismaier
Member Author

> And the CSL JSON change?

Comes next. I am wondering whether I should add sub as an array or whether a three-component solution would be better: main, sub, addon. That would allow for different formatting of the last part, which is sometimes required. (Or you could even omit certain parts of the title if you want. Could be useful for Festschriften etc.)
And should we add something for the delimiters? I was thinking about this structure:

"title-variable": {
	"anyOf": [
		{
			"properties": {
				"main": {
					"type": "string"
				},
				"sub": {
					"type": "string"
				},
				"addon": {
					"type": "string"
				},
				"sub-delimiter": {
					"type": "string"
				},
				"addon-delimiter": {
					"type": "string"
				},
				"literal": {
					"type": "string"
				}
			},
			"additionalProperties": false
		}
	]
}

Am I overthinking this?

Then, what about the short forms?
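
To illustrate how a three-part title object with its own delimiters could be assembled, here is a Python sketch. It uses the property names from the schema fragment above; the default delimiters (": " and ". "), the rule that literal overrides everything else, and the sample data are assumptions for illustration, since the fragment does not define them.

```python
def assemble_title(title):
    """Join main, sub, and addon using the delimiters stored with the data."""
    if "literal" in title:
        # Assumption: a literal title is used verbatim.
        return title["literal"]
    out = title.get("main", "")
    if "sub" in title:
        out += title.get("sub-delimiter", ": ") + title["sub"]
    if "addon" in title:
        out += title.get("addon-delimiter", ". ") + title["addon"]
    return out


# Made-up sample data, not taken from the PR.
example = {
    "main": "Collected Essays",
    "sub": "Studies in Citation",
    "addon": "A Festschrift",
    "addon-delimiter": " ; ",
}
print(assemble_title(example))
# Collected Essays: Studies in Citation ; A Festschrift
```

Storing delimiters in the data like this competes with the style-level title-delimiter attribute, so one of the two would have to take precedence.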

@bwiernik
Member

What is addon?

@denismaier
Member Author

addon: biblatex has a three-part field structure for titles:
title, subtitle, titleaddon.

So I was wondering whether we should use title-main, title-sub, title-addon rather than having title-sub as an array with unlimited strings. Users could then specify each part of the title. If sub is an array, how would you access, format or suppress the second part of the subtitle?

Member

@bdarcus bdarcus left a comment

Since we're giving this a bit of time for feedback, I just took another close look at this section.

schemas/styles/csl.rnc (resolved)
schemas/styles/csl.rnc (resolved)
schemas/styles/csl.rnc (resolved)
schemas/styles/csl.rnc (outdated, resolved)
@bdarcus bdarcus merged commit 6d38934 into citation-style-language:v1.1 Jun 14, 2020
@bdarcus
Member

bdarcus commented Jun 14, 2020

Just a heads-up that the merges I did today caused some conflicts between master and v1.1 that I'm having trouble resolving. I guess that's the problem with trying to do development on multiple branches.

What's weird is that if I rebase from master, I only get problems on csl.rnc, while if I merge, I get this:

Auto-merging schemas/styles/csl.rnc
Auto-merging schemas/styles/csl-variables.rnc
CONFLICT (content): Merge conflict in schemas/styles/csl-variables.rnc
Auto-merging schemas/input/csl-data.json
CONFLICT (content): Merge conflict in schemas/input/csl-data.json
Automatic merge failed; fix conflicts and then commit the result.

@bwiernik
Member

@bdarcus I can reconcile in a PR

@bdarcus
Member

bdarcus commented Jun 14, 2020

> @bdarcus I can reconcile in a PR

Thank you!

bwiernik pushed a commit to bwiernik/schema that referenced this pull request Jun 14, 2020
This adds distinct variables for different parts of a title, and 
attributes that instruct processors how these variables can be derived 
from regular title variables, how titles should be re-assembled from 
these derived variables, and which delimiters to use for that.

It also adds a new option "capitalize-subtitle" to text-case.

Closes citation-style-language/csl-evolution#31
@bwiernik
Member

#245

@denismaier
Member Author

Flagging this for @jgm

bwiernik pushed a commit to bwiernik/schema that referenced this pull request Jul 8, 2020
bwiernik pushed a commit to bwiernik/schema that referenced this pull request Jul 8, 2020
bdarcus pushed a commit that referenced this pull request Jul 26, 2020
bdarcus pushed a commit that referenced this pull request Jul 26, 2020
Development

Successfully merging this pull request may close these issues.

Split main-title and sub-title
4 participants