Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace text with unicode symbols #413

Merged
merged 24 commits into from
Jun 21, 2023
Merged

Conversation

rowlesmr
Copy link
Collaborator

@rowlesmr rowlesmr commented Jun 4, 2023

As aluded to in #410 (comment)

This PR replace all unambiguous markup with unicode.

The eventual goal is to have unicode symbols everywhere, rather than textual descriptions.

I haven't touched the update dates yet.

I haven't touched the update dates yet.
Strictly, this should be tau, but I think it means theta.
ERROR: LoadError: StringIndexError: invalid index [21], valid nearby indices [20]=>'°', [22]=>'''
Stacktrace:
  [1] string_index_err(s::Token, i::Int64)
    @ Base ./strings/string.jl:12
  [2] SubString{Token}(s::Token, i::Int64, j::Int64)
    @ Base ./strings/substring.jl:32
  [3] SubString
    @ ./strings/substring.jl:38 [inlined]
  [4] SubString
    @ ./strings/substring.jl:40 [inlined]
  [5] getindex
    @ ./strings/substring.jl:255 [inlined]
  [6] check_delimiter(value::Tree)
    @ Main ~/work/cif_core/cif_core/julia_cif_tools/layout.jl:514
  [7] transformer_func(l::Linter, #unused#::Val{:scalar_item}, meta::Lerche.Meta, tree::Tree)
    @ Main ~/work/cif_core/cif_core/julia_cif_tools/layout.jl:491
  [8] _call_userfunc(v::Linter, tree::Tree)
    @ Lerche ~/.julia/packages/Lerche/QxD5R/src/visitors.jl:269
  [9] visit(v::Linter, tree::Tree)
    @ Lerche ~/.julia/packages/Lerche/QxD5R/src/visitors.jl:299
 [10] visit(v::Linter, tree::Tree) (repeats 5 times)
    @ Lerche ~/.julia/packages/Lerche/QxD5R/src/visitors.jl:296
 [11] lint_report(filename::String; ref_dic::String, import_dir::String, warn::Bool, no_text::Bool)
    @ Main ~/work/cif_core/cif_core/julia_cif_tools/linter.jl:50
 [12] top-level scope
    @ ~/work/cif_core/cif_core/julia_cif_tools/linter.jl:107
in expression starting at /home/runner/work/cif_core/cif_core/julia_cif_tools/linter.jl:104
Error: Process completed with exit code 1.
@rowlesmr
Copy link
Collaborator Author

rowlesmr commented Jun 4, 2023

See #414 re checker failure

@jamesrhester
Copy link
Contributor

Unicode in the definition text is a little tricky, because this definition text is used to automatically prepare text for Volume G and the online web pages. Before doing this, we should coordinate with both Vol G people (me and @nautolycus ) and the Chester people (@publcif ) to confirm that they can handle the likely unicode characters that will pop up. So I'd hold off on doing anything until those people respond here. Last I checked the Vol G workflow would suffer.

@rowlesmr
Copy link
Collaborator Author

rowlesmr commented Jun 5, 2023

Roger dodger.

@nautolycus
Copy link
Collaborator

Last I checked the Vol G workflow would suffer.

This shouldn't be an issue for Vol. G, at least so long as Unicode is used conservatively (the Little Dictionary has sentences with Russian, Japanese and Eastern European text, which is a little tricky to handle!). I'm polling internal views to see if people here can identify any other possible gotchas, so don't commit just yet.

@nautolycus nautolycus closed this Jun 5, 2023
@nautolycus nautolycus reopened this Jun 5, 2023
Copy link
Collaborator

@vaitkus vaitkus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not sure if values of _description_example.case should be modified since they provide examples on how values should be written in actual CIF instance files. While Unicode can be used in CIF2 files (and its use is probably encouraged), the given example could not be used verbatim in CIF1.1 files.

templ_attr.cif Outdated Show resolved Hide resolved
@@ -2158,7 +2158,7 @@ save_wyckoff_letter
x
y
z
\a
α
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually changes the enumeration value from \a to α thus breaking backwards compatibility with CIF1. We could probably include α as an alias of \a, but \a must be retained.

templ_attr.cif Outdated Show resolved Hide resolved
templ_attr.cif Outdated Show resolved Hide resolved
templ_attr.cif Outdated Show resolved Hide resolved
cif_core.dic Outdated Show resolved Hide resolved
cif_core.dic Outdated
@@ -24764,8 +24764,8 @@ save_atom_type_scat.gaussian_coefs

f(s; Z) = c + Sum( a~i~ * exp(-b~i~ * s^2^), i=1:N)

where s = sin(theta)/lambda and theta is the diffraction angle and lambda is
the wavelength of the incident radiation in angstroms.
where s = sin(θ)/λ and θ is the diffraction angle and λ is the wavelength
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
where s = sin(θ)/λ and θ is the diffraction angle and λ is the wavelength
where s = sin(θ)/λ and θ is the diffraction angle and λ is the wavelength

cif_core.dic Outdated Show resolved Hide resolved
cif_core.dic Outdated
@@ -12129,7 +12129,7 @@ save_space_group_wyckoff.letter
;
The Wyckoff letter associated with this position, as given in
International Tables for Crystallography Volume A. The
enumeration value '\a' corresponds to the Greek letter 'alpha'
enumeration value 'α' corresponds to the Greek letter 'alpha'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence does not make a lot of sense with \a replaced. However, I think that \a enumeration value should in general not be replaced (see my comment in on the temp_enum.cif file).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with putting it back.

Do the quotes around the \a make it such that it isn't converted to α when it is typeset? Do enumeration values end up being typeset? or used as-is? Or do we need other trickery, like '\a' to do such a thing? @nautolycus ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IUCr journals practice with regard to enumeration values: usually, cases where they might need to be typeset are in experimental tables or other well-defined contexts. Where appropriate, they may be reformatted on printing to satisfy house style rules (e.g. monoclinic -> Monoclinic, 'Mo K\a' -> 'Mo _K_α' ). There are style rules coded into publcif etc. for doing this.

rowlesmr and others added 7 commits June 14, 2023 21:05
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
@nautolycus
Copy link
Collaborator

I've consulted with the IT and editorial people in Chester and there is no objection to using Unicode in the dictionary definitions.

@jamesrhester jamesrhester merged commit e004cb7 into COMCIFS:master Jun 21, 2023
3 checks passed
@jamesrhester
Copy link
Contributor

I don't think there is a need to update the dictionary date, as all changes are cosmetic, i.e. do not change the behaviour of software that relies on these definitions.

@vaitkus
Copy link
Collaborator

vaitkus commented Jun 21, 2023

@jamesrhester well, a new enumeration value of 'α' was added thus be behaviour did change so the date should probably be modified.

Also, as I commented before, are we completely ok with values of _description_example.case being modified to contain Unicode since this attribute provides examples on how values should be written in actual CIF instance files. While Unicode can be used in CIF2 files (and its use is probably encouraged), the given example could not be used verbatim in CIF1.1 files.

I do not think that it is a very big deal, but it might be something we want to take into account.

@jamesrhester
Copy link
Contributor

I guess you are right about 'α', as it was only in the template dictionary I didn't pay attention.

The extra Unicode examples are OK, as DDLm is meant to apply to any format, not just CIF1/2.

@vaitkus
Copy link
Collaborator

vaitkus commented Jun 22, 2023

OK, just making sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants