Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export author in highlights and sticky notes #27

Closed
wants to merge 2 commits into from

Conversation

matteosecli
Copy link
Contributor

I've noticed that, by opening an exported PDF file with a PDF reader, the author field was empty. So I looked into my Mendeley database and I've found that the author field was empty, too; however, there was a profileUuid that links to profile information (& that doesn't seem to change even if you change your email address).

I've added this information into the query to the database; the author is by default the author field in the database; however, if this field is empty, the author's name is constructed by merging the firstName and lastName fields of the database.

I've also added an author field to the highlights dictionary because it was completely missing, although the relevant function pdfannotation.createHighlight() already had all the things in place.

I've tested the changes with my database and Mendeley 1.18, although I don't know whether these fields were present or not in much older versions of Mendeley's databases (imho I don't think so, because an account was required even for much older versions; I've checked on Mendeley 1.17.11 and they are there).

I'm not a Python or even more a SQL expert (never used SQL in my life), so I apologize in advance for any mistakes!


PS: The ordering of the new fields looks a bit random, but it's just because I've tried to change the code as little as possible.

@Xunius
Copy link
Owner

Xunius commented Jun 26, 2018

Hi matteosecli,
Many thanks for the work.
Just to clarify, by 'author' I think you meant the author that created the highlights, not the article? In that case, I've already retrieved author in the getUserName() function in menotexport.py, and the author field is then passed onto the meta attribute associated with extracted annotations and can be accessed from there easily. So no need to do that again in the queries you modified.

I haven't tested, but I think the only change needed is something like this in exportPdf():

anno=pdfannotation.createHighlight(hjj['rect'], cdate=hjj['cdate'], color=hjj['color'], author=annotations.meta['user_name'])

and similar to the createNote() part.

Would you like to give it a try and see that does what you want?

Skip NULL strings when joining firstName and lastName of highlights authors. NULL fields appear sometimes, apparently if the local database was not synchronized; trying to join them blidly caused crashes.
@matteosecli
Copy link
Contributor Author

Hi @Xunius, sorry for the (very) late reply; I was waiting for a friend of mine to help me carrying out some tests.

I've set up a collaborative folder in Mendeley in order to test what happens when multiple people annotate the same document; I'll go step by step.

Just to clarify, by 'author' I think you meant the author that created the highlights, not the article?

Yes, by 'author' I mean the Mendeley user that created the highlight, not the author of the article.

In that case, I've already retrieved author in the getUserName() function in menotexport.py, and the author field is then passed onto the meta attribute associated with extracted annotations and can be accessed from there easily.

I have to say that I actually missed that! However, there are a couple of objections:

  • I think that the query in the function getUserName() needs a fix, because it has to look for the Profile which also has the attribute isSelf set to true. In fact, when you collaborate with other Mendeley users, your local database also contains their profile information in the Profiles table; you, the owner of the account in use, are identified only by that attribute set to true. The current function, instead, assumes that the owner of the account is the first person that appears in the Profiles table, which is not true. Indeed, upon different annotate->save->exit operations in Mendeley, my friend was listed first in the table and therefore was wrongly assumed to be the user by Menotexport itself. But this would be the subject of another pull request.
  • Assuming that the username retrieved by getUserName() is the correct one, this doesn't mean at all that he's also the author of all the notes in a document! Indeed, I've tried with one of these test documents annotated by multiple people; with your suggestion, all the notes appear as created by the same person. With the edits I've suggested, each note is correctly coupled to the Mendeley user that created it. (btw: if you don't have any collaborative documents and you want to join the collaborative test folder, just let me know)

As a final note, I've realized that if firstName or lastName were NULL, the script was crashing on the lines that join them – the ones that I've added. So, I've slightly modified the joining procedure in order to skip these NULL fields – and so far in my tests, it seems to work.

@Xunius
Copy link
Owner

Xunius commented Jul 22, 2018

Hi matteosecli,

Many thanks for coming back and all the work. I've merged your PR #28.
I didn't do collaboration in Mendeley so that's a use case that has been largely neglected, and I think you made a valid point that the highlight authors (and note author as well, right?) would be different in that use case. But the thing is, I thought you won't be coming back so I've made quite some changes in the code (to address the multiple attachment issue) and now the getHighlights(), getNotes() and some other functions look bit different. To be honest I'm not quite experienced in handling conflicting merges, I believe your changes will be based on an old base should I commit my changes. So, do you think it would be better to let me finish my multi-attachment issue fix and incorporate your changes myself (to save your time), or you wait for my commit and do your changes again (so we retain your credits)?

Thanks again for the contribution.

@matteosecli
Copy link
Contributor Author

By multiple attachment issue you are referring to #26, right?

Anyway, for me it's ok either way! I don't know the extent of the new changes; I think this PR can wait until you commit all the new changes related to that issue and then see if has conflicts or not at that point. If it has conflicts which are better resolved by rewriting these changes from scratch and you think it would be more efficient for you to directly incorporate them, it's totally fine for me! Or if you don't have time I can make a new PR as well. 😉

So, I'd say to check back here once you commit those changes.

In the meantime, I'm looking into something else – which I hope I can report in a new thread for a discussion as soon as I have some minimal working examples/data.

Xunius added a commit that referenced this pull request Aug 8, 2018
This is re-implementing the changes suggested in this PR:
#27

The author of the highlights and sticky notes are now retrieved and
saved to exported pdfs

In menotexport.py:

   getUserName() does a filtering before joining.

   getHighlights() queries "author" field, if empty, use user_name.

   Similar in getNotes().

In exportpdf.py:

    exportPdf() passes the new "author" argument.
@Xunius
Copy link
Owner

Xunius commented Aug 8, 2018

Hi matteosecli

I've implemented your suggested changes (with minor differences) and pushed. Here is what I did:

  • In menotexport.py, getHighlights() function queries the "author" field associated with each highlight box, and if "author" is empty, use the name from the "profile". You added a filtering before joining the first and last names, but as I already have a getUserName() function, I added the filtering to getUserName(), and assigned the return value of getUserName() to "author" if "author" is emtpy. Similar for the getNotes() func which fetches sticky notes.

  • In lib/exportpdf.py, added the new "author" argument as you suggested.

Would you like to give it a test and see it works as intended?

@matteosecli
Copy link
Contributor Author

Hi @Xunius,

I've tested the latest master version. Now all the highlights have a non-empty author, but the problem is that the author is always myself even if the highlight was added by another person!

I'll try to explain better, maybe I was a bit messy last time. What I was doing was the following:

  • For each highlight, fetch the additional fields FileHighlights.author, Profiles.firstName and Profiles.lastName, where Profiles.firstName and Profiles.lastName are the fields corresponding to the same profile uuid (Profiles.uuid) that authored the highlight (which is FileHighlights.profileUuid).
  • I'd expect that the author of the highlight is in FileHighlights.author (and maybe it was in older versions), but in my case this field is actually always empty. Each highlight, instead, has a FileHighlights.profileUuid. So, I first try to set FileHighlights.author as the author of the highlight; if this is empty (i.e. in my case, always) I then set "Profiles.firstName Profiles.lastName" as the author of the highlight, which can be different from username (which is also the merge of Profiles.firstName and Profiles.lastName, but only of the Profile with isSelf=true; the other profiles, i.e. your co-workers, have other Profile entries with isSelf=false).
  • (same for the Notes)

So, your lines

Menotexport/menotexport.py

Lines 453 to 455 in 6a8ba45

author=r[7]
if not author.strip():
author=username

always set myself as the author of every highlight. Instead the change I was proposing, i.e. the lines

Menotexport/menotexport.py

Lines 411 to 414 in fa6f644

author=r[8]
if not author.strip():
# Join r[9] and r[10], if not "None", separated by space
author=' '.join(filter(None, r[9:11]))

set the name linked to the highlight's profileUuid as the correct author of the highlight.

So

  • I think there's no escape in getting either the field FileHighlights.profileUuid or the fields Profiles.firstName and Profiles.lastName in the highlights query.
  • I realize that this slows down the process; in fact, my solution is pretty stupid. Maybe a faster option would be:
    • Get, for each highlight, only the field FileHighlights.profileUuid instead of the fields Profiles.firstName and Profiles.lastName. This way, you can also drop the lines LEFT JOIN Profiles ON Profiles.uuid=FileHighlights.profileUuid in my suggested changes and speed up the query quite a bit.
    • Make, at the start of the program, a dictionary that has the Profiles.uuid as the keys and ' '.join(filter(None, [Profiles.firstName Profiles.lastName])) as the values.
    • In the getHighlights() function, once the query returns the FileHighlights.profileUuid for each highlight, assign the corresponding Profile name contained in the dictionary as the highlights author, if FileHighlights.author is empty.

What do you think it would be better to do? If you don't want to rewrite these bits and you are not planning to do any incompatible changes, I can do these modifications, test them, and send them as a PR in the next few days. If instead you prefer to write directly the code, I'm always here to test anyways. 😉

@Xunius
Copy link
Owner

Xunius commented Aug 8, 2018

I got your points now, because I never did co-authoring before so I thought that's why my FileHighlights.author is always empty. It appears that it's indeed necessary to query the Profiles table, but I wasn't expecting doing this would make it much slower. My profiling suggests the slowest parts are those relating to PDF processes via pypdf2 and pdfminer, I tried to multi-thread some function calls but only get negligible speed gain (maybe I'm doing something wrong). But your dictionary idea sounds good.

I have some free time tomorrow, so how about me re-doing these changes as we understand each other quite well already, and I'll let you continue on the "Wrong coordinate ordering in "Rectangle Highlight"" fix (#29) as you have a much better understanding in that regard?

@matteosecli
Copy link
Contributor Author

I have some free time tomorrow, so how about me re-doing these changes as we understand each other quite well already, and I'll let you continue on the "Wrong coordinate ordering in "Rectangle Highlight"" fix (#29) as you have a much better understanding in that regard?

That sounds totally fine for me! 😃

Xunius added a commit that referenced this pull request Aug 9, 2018
In menotexport.py

    getHighlights() and getNotes() funcs query 2 more columns:
    Profiles.firstName and Profiles.lastName for the author
    of highlights/notes. See #27

    I didn't use the dictionary approach as no notable diff in speed.
@matteosecli
Copy link
Contributor Author

Closing since these changes, with the discussion that followed in this thread, were reimplemented by @Xunius in 6a8ba45 and e6eebeb.

@matteosecli matteosecli closed this Aug 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants