-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes generated bibtex key and display of institute authors #6479
Fixes generated bibtex key and display of institute authors #6479
Conversation
The original condition is evaluated to false. The substring is shorter than "uni".
Perhaps the assumption should be that the letters are ASCII. If all letters are ASCII checking 'A' <= k.charAt(0) <= 'Z' might make more sense. I am not convinced about doing this with a regex.
Both test cases involves an author name containing department or school without university or institute of technology.
Corporate authors without university/institute of technology
I think it could be that the latex2unicode convert is called for the author field. It kills the extra braces. I had a similar problem in the MSOffice Exporter. I think I implemented a workaround there. |
Ah, nice, thank you! I will take a look! |
I believe this is fixable in
I can also look at it a bit more and see if I have missed something, anyway, any suggestion/hint is greatly appreciated @Siedlerchr |
Hi, Regarding the second issue:
What about using a formatter that uses the Capital letters? e.g. JabRef would then become JR |
Regarding the first issue, first of all, as you know, I am not a regular contributor to JabRef. If you'd like me to drop this topic for any reason, please say so. I am just hoping that sorting this out will save someone else time in the future, but I am not familiar enough with JabRef to know if I have missed something important. If I understand things correctly, there are currently three different cases relevant to #4337 and #4152. Assuming we are using the BibTeX format described at BibTeX.org (special symbols)
The issue with case 2 is that LaTeX and BibTeX are two incompatible "languages" (since some reserved words are the same). Therefore, if we are using the format described at BibTeX.org (format), the "correct" way of dealing with case 2 using
which is then dealt with correctly in What I was trying to argue is that case 2 and case 3 must be dealt with differently. I can deal with this after issue #6459 if that is of interest? However, I just realized that issue #6459 is case 3, and can most likely be solved by changing the order of method calls. Please point out if something is unclear or incomprehensible. I am trying to improve my writing skills, but I am well aware that I have some practice ahead of me. My only excuse is that it is still morning here X) |
Regarding the second issue. That does sound better. I can't think of any case where keeping all capitalized letters isn't good and it should be compatible with the current implementation. |
# Conflicts: # src/main/java/org/jabref/logic/bibtexkeypattern/BracketedPattern.java
No problem :) We are happy for every contributor. It's just a complex issue. It's just complicated issue because it involves Corporate authors and Unicode. Unicode: Corporate Authors:
In JabRef the latex2unicode formatter is called for every field and the formatter now receives the I hope my long explanation helps you a bit to understand the problem. Maybe you come up with an idea. |
Hello @k3KAW8Pnf7mkmdSMPHz27 , thank you for your work in fixing this issue! And thanks for that informative context @Siedlerchr ; that's a very interesting edge case... I think there could be a test where braces are counted to corroborate that the brace at the very beginning is not closed earlier on (that would be the case where formatting as a Corporate Author would apply, I think). I have some code in Python that does something related... (though the opposite: it matches the most ancient parenthesis first...); I'll adapt it when I return home. def par_count2(text: Letters, opener: str = "(", closer: str = ")") -> Numbers:
"""Base algorithm to count matching parentheses.
Parameters
----------
`text` : `str`
The string to be parsed against matching opener and closer
`opener` : `str`
The character(s) to be considered as the 'opener' of a sequence
`closer` : `str`
The character(s) to be considered as the 'end' of a sequence
Yields
------
`count` : `int`
A sequence of `int` that are the number of matches for each character
Examples
--------
Here are some base examples of expected output, and actual output.
``(((((((((``
``123456789``
>>> [c for c in par_count2("(((((((((")]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
``)))))))))``
``123456789``
>>> [c for c in par_count2(")))))))))")]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
``()()()()()()()()()``
``112233445566778899``
>>> [c for c in par_count2("()()()()()()()()()")]
[1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
``((((((((()))))))))``
``123456789123456789``
>>> [c for c in par_count2("((((((((()))))))))")]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9]
``))()())(())``
``12334456767``
>>> [c for c in par_count2("))()())(())")]
[1, 2, 3, 3, 4, 4, 5, 6, 7, 6, 7]
``(A)()(A)()(B)()(A)()(A)``
``11122323445156673788949``
>>> [c for c in par_count2("(A)()(A)()(B)()(A)()(A)")]
[1, 1, 1, 2, 2, 3, 2, 3, 4, 4, 5, 1, 5, 6, 6, 7, 3, 7, 8, 8, 9, 4, 9]
"""
# type declarations for local variables
count: int
character: str
pending_from: int
pending_to: int
others: Counter[str]
count = 0
pending_from = 0
pending_to = 0
others = Counter()
# logging recommends using %s substitutions, instead of f-strings or
# string interpolation with brackets
logger = logging.getLogger(__name__)
logger.debug("About to parse %s with %s and %s", text, opener, closer)
for character in text:
if character == closer:
if pending_from != pending_to:
pending_from += 1
yield pending_from
else:
count += 1
pending_from = count
pending_to = count
yield count
elif character == opener:
count += 1
pending_to = count
yield count
else:
others.update(character)
yield others.get(character, 0)
logger.info("Parsed %s with %s and %s", text, opener, closer) |
@rolandog Your hint with the braces gave me the idea and I think we maybe have already a solution for this problem. I can't believe it's been hidden in plain sight 🤦 This method is used in the BracketedPattern class to define if it's an insituation (corporate author or not) jabref/src/main/java/org/jabref/model/strings/StringUtil.java Lines 418 to 446 in 4e220f6
And the test: jabref/src/test/java/org/jabref/model/strings/StringUtilTest.java Lines 181 to 193 in 862078a
|
Hi @rolandog ! Thank you for the very detailed issue!! It makes things a lot easier :P I have done a "bad" code update to demonstrate what I think is the issue with displaying names in the maintable (i.e., the order of method calls). @Siedlerchr and @rolandog you are both (of course) most welcome to add any comment/change any code and if there is any way I can make life easier for you do tell. I am currently quite new to Github and my todo-list, notes and links are currently kept in a offline jupyter document and I have no idea how to do it differently X) @Siedlerchr thank you for the overview and the biblatex manual link! I have been looking for that one o.O |
@k3KAW8Pnf7mkmdSMPHz27, you're welcome, I'm glad to have helped pinpoint this, and thank you for taking on this issue! And, that's great @Siedlerchr, I'm happy that it seems that this may not need extreme refactoring; that's a very clever function! However, I think I found an edge case where I'm not familiar with JabRef's whole codebase, so mismatched braces may be caught earlier on, as per: jabref/src/main/java/org/jabref/logic/bibtex/FieldWriter.java Lines 41 to 74 in 4e220f6
Here is an example, displayed as a test, where one of the entries would throw an error (the last one): @Test
void testIsInCurlyBrackets() {
/** correct
* c : {\L{}}ukasz Micha\l{}
* brackets : 1 210 10
* count : 1 112 22
*/
assertFalse(StringUtil.isInCurlyBrackets("{\L{}}ukasz Micha\l{}"));
/** correct
* c : {{\L{}}ukasz Micha\l{} Society}
* brackets : 12 321 21 0
* count : 11 111 11 1
*/
assertTrue(StringUtil.isInCurlyBrackets("{{\L{}}ukasz Micha\l{} Society}"));
/** mismatched braces, should return false?
* c : {{\L{}}ukasz Micha\l{} {Society}
* brackets : 12 321 21 2 1
* count : 11 111 11 1 1
*/
assertFalse(StringUtil.isInCurlyBrackets("{{\L{}}ukasz Micha\l{} {Society}"));
} In case this isn't filtered by /**
* Checks if the given String has exactly one pair of surrounding curly braces <br>
* Strings with escaped characters in curly braces at the beginning and end are respected, too
* @param toCheck The string to check
* @return True, if the check was succesful. False otherwise.
*/
public static boolean isInCurlyBrackets(String toCheck) {
int count = 0;
int brackets = 0;
if ((toCheck == null) || toCheck.isEmpty()) {
return false;
} else {
if ((toCheck.charAt(0) == '{') && (toCheck.charAt(toCheck.length() - 1) == '}')) {
for (char c : toCheck.toCharArray()) {
if (c == '{') {
if (brackets == 0) {
count++;
}
brackets++;
} else if (c == '}') {
brackets--;
}
}
return count == 1 && brackets == 0;
}
return false;
}
} |
@rolandog Thanks for testing and checking that edge case. Feel free to submit a new PR and in that way you can also fix the workaround I used in the MSbibAuthor for the MS Office exporter
|
"Curly brackets are not respected (in the maintable's author column?)." is the same as issue #6465. |
Since they are related you can update your PR to close both. Just add it also to the changelog then. |
The author list parsing is moved outside of the if/else statements
Move them close to other parse tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks very good now, so I'll merge. We hope you enjoyed contributing to JabRef and re looking forward to your next PR 😸
@k3KAW8Pnf7mkmdSMPHz27 Big thanks for your work on this. I just noticed some latex commands are not cleared in the now-master-build. Test library is JabRefAuthors in src/test/resources/testbib |
Which |
Hum... I am not sure what is going on, I'll have a look |
Most of those should be "cleared" and are cleared when I build locally, but that might be something wrong with my setup. |
Firstname lastname option and do not abbreviate |
I think I could reproduce @calixtus screenshot in JabRef 5.1--2020-05-28--ffa07cd but right now I seem unable to do so again (and I have tried). |
@k3KAW8Pnf7mkmdSMPHz27 If you have "Show names unchanged" in the prefs activated then no conversion is happening |
I see, so the solution would be to change |
@calixtus I thought the intent of that option was to leave the names completely unchanged? EDIT |
Uh im not going to change anything today 😄 |
Hum... actually that is my bad. JabRef 5.0 does indeed "clear" the latex, even when |
Yes, it would be nice if you could provide a fix (no need to create an issue before). |
Ok, I apparently don't know how to do this. I created #6552 |
Well, thank you all for reviewing this PR, it is very appreciated! ❤️ 🎉 🎈 I'll take the reviews and results to heart, and make sure the next PR will create fewer issues 🤦 |
41531558a8 Fix unsigned newspaper articles throughout Chicago 17 (#6486) 7678212826 Create trames.csl (#6479) 0cae26ac85 Update hochschule-fur-soziale-arbeit-fhnw.csl (#6480) 85c4b693a2 Update to UP Harvard Theology & Religion (#6485) c273aa7e43 Update ieee.csl (#6481) fe67b80e47 Update open-window.csl (#6367) f2229705ef Create iainutuban-tarbiyah.csl (#6361) 1867a56a26 Create business-and-human-rights-journal (#6359) 1371dbdf26 Update iso690-author-date-es.csl (#6477) 6953a43efd Update ieee.csl (#6478) f56d5ef1cc Create czech-journal-of-international-relations.csl (#6453) 678b53f99c Update harvard-stellenbosch-university.csl (#6464) 3074938038 Update ucl-university-college-apa.csl (#6475) 27dab9ea0f Update iso690-author-date-es.csl (#6476) a8aea63d00 Create elsevier-american-chemical-society.csl (#6342) f8f290fa63 Update iso690-author-date-es.csl (#6472) 7fdc621eee Update journal-of-neolithic-archaeology (#6466) 7025568e70 Update offa.csl (#6465) 2d69299b19 Create uni-fribourg-theologie.csl (#6473) 8db531a73e Create travail-et-emploi.csl (#6351) c8b54fc531 Make monash-university-harvard dependent style (#6470) b95f59ff5c Update journal-of-the-marine-biological-association-of-the-united-kingdom.csl (#6456) a12b513119 Update universite-du-quebec-a-montreal.csl (#6463) 048e6641e4 Update zeitschrift-fur-geschichtsdidaktik.csl (#6454) f0d3d7ef15 Update journal-fur-kulturpflanzen-journal-of-cultivated-plants.csl (#6447) 3b814fe048 Update the-accounting-review.csl (#6459) f24befd580 Update survey-of-ophthalmology.csl from ama.csl to its own independent style (#6460) c868ab54f6 Create vancouver-alphabetical.csl (#6461) 782e39cfe1 Update american-institute-of-physics.csl (#6457) a56cf03e3c Fix Chicago Cases & Newspaper sorting (#6458) git-subtree-dir: buildres/csl/csl-styles git-subtree-split: 41531558a873b2533f2d17d8d6484c2408174fce
Fixes #6459. Fixes #6465 .
There are two parts of this issue,
1. Fix to the prepended null
What is going on?
BracketedPattern.generateInstitutionKey
gets called for any author enclosed in curly brackets (e.g.,"{The School of Life}"
). The method expects an institute of technology or university and appends its name to the key (e.g., null if there is no name).Why is it going on?
Academic institutions can have long generated BibTeX keys unless abbreviated, e.g.,
"Royal Institute of Technology: The School of Electrical Engineering and Computer Science"
, whichgenerateInstitutionKey
shortens toRITEECS
.Fix
Replace a null valued university with an empty string. The drawback is potentially very short BibTeX keys (e.g.,
The School of Life
->L
).What are alternatives?
The School of Life
would be abbreviated toSL
instead ofL
.2. Fix to the author column
When the list of authors gets converted to a latex-free version, all curly brackets are removed since the whole string is parsed as latex. When the latex-free string is used to create/fetch an
AuthorList
it will no longer contain any brackets, and the information needed to format the string is gone.3. What I think is left to do
{The School of Life}
isn't respected in the author field of the GUIAttempt to match universities etc. with regexAssume that names that have comma separated parts are universitiesUpdatethe deprecated methods are essentially convenience methods so they have been moved inside the test fileBibtexKeyGeneratorTest
as it makes heavy use of deprecated methodsgenerateInstitutionKey
can be improvedChange the key generator for institution/corporate names to a Formatter?generateInstitutionKey
should not be a separate Formatter, it is only called bynormalize
.and