
Probably a bug? Imports don't seem to pick up updates from Open Library. #527

Closed
nerdteacher opened this issue Jan 16, 2021 · 5 comments


@nerdteacher

Actually, I'm not sure if this is a bug! (Or maybe it's just something that hasn't been implemented yet, or something that only works sometimes.) Also, apologies if I'm not being clear; feel free to yell at me!

Anyway, I've noticed this non-updating behaviour in a few situations:

  1. When I import a book, it doesn't always grab the right edition from OpenLibrary (and then that edition isn't in the 50 that are pulled). This makes me curious: What information takes precedence on imports? (I've started deleting all numerical information other than the ISBN-13. This sometimes helps. Not always, especially not on classics that have 5 billion editions that've been merged into one entry. That last part isn't a BookWyrm issue.)

  2. Sometimes editions of books don't exist at all, so I add them to OpenLibrary; the editions list on BookWyrm doesn't seem to consistently update. Sometimes they're there, sometimes they're not.

  3. I'll unshelve the editions that have wonky or broken information to re-import, but the imports don't seem to update the edition information from OpenLibrary. (I usually wait about a day or so before I re-import; it's not immediate.)


With regards to the first point, would it be possible to only grab the editions that people import, based on identifiers (ISBN-10, ISBN-13, etc.), rather than grabbing all the editions available (or the first 50)? This could help keep editions lists cleaner, since OpenLibrary has a mountain of editions for some books. (I'm rubbish at programming, so it's just a thought.)

(Thanks for the work you're doing! <3)

@mouse-reeve
Member

Thank you, yeah, this is a finicky process! Can you clarify what you mean by "import": do you mean importing a CSV from Goodreads, or clicking the "import book" button on the search page? Both use the same code to actually fetch the data, but the way a CSV import formats its search queries could be a factor as well.

  1. When I import a book, it doesn't always grab the right edition from OpenLibrary (and then that edition isn't in the 50 that are pulled). This makes me curious: What information takes precedence on imports? (I've started deleting all numerical information other than the ISBN-13. This sometimes helps. Not always, especially not on classics that have 5 billion editions that've been merged into one entry. That last part isn't a BookWyrm issue.)

In a Goodreads import, this is caused by some hand-waviness in how Open Library's search API works. I think it would be fixed by #467. Classics are definitely my most frustrating type of book data.

  2. Sometimes editions of books don't exist at all, so I add them to OpenLibrary; the editions list on BookWyrm doesn't seem to consistently update. Sometimes they're there, sometimes they're not.

It could be that the edition isn't in the first 50 results, but if this is happening for books with a more reasonable number of editions, it could also be wonkiness in how re-importing is handled. I'll take a look at how editions get added for books that already exist in BookWyrm.

  3. I'll unshelve the editions that have wonky or broken information to re-import, but the imports don't seem to update the edition information from OpenLibrary. (I usually wait about a day or so before I re-import; it's not immediate.)

This is an interesting one that I haven't totally figured out how to handle. For the time being, BookWyrm doesn't take updated data from Open Library after the initial import. You can edit a book or author directly in BookWyrm (if you didn't have permission before, you do now): there's a pencil icon across from the title of the book on the book page. I haven't been re-pulling data from OL because I haven't figured out a good way to reconcile changes there with local changes to book data.

With regards to the first point, would it be possible to only grab the editions that people import, based on identifiers (ISBN-10, ISBN-13, etc.), rather than grabbing all the editions available (or the first 50)? This could help keep editions lists cleaner, since OpenLibrary has a mountain of editions for some books. (I'm rubbish at programming, so it's just a thought.)

I think this is a good idea! Some OL editions have so little metadata that they're not much use. One recent change is that I've started giving each edition a score based on how much metadata it has and sorting the editions page by that score, so for the time being it should at least be easier to sift through and find useful data.
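To illustrate the scoring idea described above, here is a minimal sketch that scores an edition by counting how many metadata fields are filled in, then sorts the most complete editions first. This is not BookWyrm's actual implementation: the field names are assumptions modelled on Open Library edition records, and the equal weighting of fields is a hypothetical choice.

```python
# Fields whose presence counts toward an edition's score. These names are
# assumed Open Library edition field names; the real scoring may differ.
SCORED_FIELDS = (
    "covers",
    "isbn_13",
    "isbn_10",
    "oclc_numbers",
    "languages",
    "publishers",
    "publish_date",
    "number_of_pages",
)


def edition_score(edition: dict) -> int:
    """Return how many of SCORED_FIELDS are present and non-empty."""
    return sum(1 for name in SCORED_FIELDS if edition.get(name))


def sort_by_score(editions: list[dict]) -> list[dict]:
    """Most complete editions first; ties keep their original order
    (Python's sort is stable, including with reverse=True)."""
    return sorted(editions, key=edition_score, reverse=True)
```

Counting filled fields is the simplest possible score; weighting identifiers (ISBN, OCLC) more heavily than, say, a page count would be an easy variation.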

@nerdteacher
Author

Sorry! When I say 'import', I mean from CSV. (I don't generally search for books... yet, because I'm still playing around with and organising my Goodreads imports.)

Also, thanks for the explanations! That makes things a lot clearer for me.

And yeah, I figured the 'Classics' would be a huge issue because... well, they have billions of editions in tons of languages and are the messiest thing I've ever seen, ahhh. (I feel like there's probably a suggestion to make to OpenLibrary on how to better organise that, which is totally not a BookWyrm issue. My former-school-librarian brain gets a little antsy when I end up on their classics pages, ahaha.)

@nerdteacher
Author

nerdteacher commented Jan 17, 2021

Oh! I just thought of something about updating from Open Library that might help with the third point, if it's possible or feasible (I don't know how much work it would take, and you're already doing quite a bit). Maybe it could be useful as a jumping-off point, since you said you haven't figured out how to update books without overwriting local data?

It would really suck to accidentally overwrite information that someone has entered locally for an edition of the book, but maybe this could tie into my idea of "pulling the relevant edition" based on the ISBN-13, ISBN-10, etc. instead of just grabbing the first 50. So instead of updating the other editions and overwriting changes to them, it could just add the one that someone's requesting? After that point, it wouldn't need to update that specific edition; it would only need to fetch new editions as users request them.

... I hope that made sense; it made sense in my head, but I'm not sure I wrote it clearly.
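The suggestion above (fetch only the edition matching the ISBN someone is importing) can be sketched roughly as follows. This is a hypothetical illustration, not BookWyrm code: it assumes each candidate edition is a dict with `isbn_13` / `isbn_10` lists, as in Open Library edition records, and the helper names `to_isbn13` and `find_requested_edition` are made up for this example. ISBN-10s are converted to ISBN-13 (prefix `978`, recompute the check digit) so either form of the identifier matches.

```python
import re
from typing import Optional


def to_isbn13(raw: str) -> Optional[str]:
    """Normalise an ISBN-10 or ISBN-13 string to 13 digits, or None.
    Note: this sketch does not validate the ISBN-10 check digit."""
    digits = re.sub(r"[^0-9Xx]", "", raw).upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) == 10 and digits[:9].isdigit():
        core = "978" + digits[:9]
        # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
        total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


def find_requested_edition(editions: list[dict], requested: str) -> Optional[dict]:
    """Return the first edition whose ISBN matches `requested`, else None.
    Field names `isbn_13` / `isbn_10` are assumed, not confirmed."""
    wanted = to_isbn13(requested)
    if wanted is None:
        return None
    for edition in editions:
        identifiers = edition.get("isbn_13", []) + edition.get("isbn_10", [])
        if any(to_isbn13(i) == wanted for i in identifiers):
            return edition
    return None
```

For example, `to_isbn13("0-306-40615-2")` gives `"9780306406157"`, so an import listing either form of that ISBN would pick out the same edition and leave every other edition untouched.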

@mouse-reeve
Member

Okay, I've made a few changes that I think will improve things. It's still only looking at the first 50 editions, but on import it now automatically ignores any edition that doesn't have a cover, an ISBN, an OCLC number, or a language other than English. There's also a script to remove existing editions that no one has interacted with and that are also missing that metadata (about 15,000 of ~75,000 editions in total). And the import has been improved so that editions are more likely to come in with an ISBN.
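As a rough illustration of the import filter described above, here is a minimal sketch, reading the rule as "keep an edition if it has at least one of: a cover, an ISBN, an OCLC number, or a non-English language". It is not the actual BookWyrm code; the Open Library-style field names (`covers`, `isbn_13`, `isbn_10`, `oclc_numbers`, `languages`) and the `/languages/eng` key are assumptions, and `is_worth_importing` / `editions_to_import` are hypothetical names.

```python
ENGLISH = "/languages/eng"  # assumed Open Library language key for English


def is_worth_importing(edition: dict) -> bool:
    """Keep an edition if it has a cover, an ISBN, an OCLC number,
    or a language other than English (field names are assumptions)."""
    has_cover = bool(edition.get("covers"))
    has_isbn = bool(edition.get("isbn_13") or edition.get("isbn_10"))
    has_oclc = bool(edition.get("oclc_numbers"))
    language_keys = {
        lang.get("key") for lang in edition.get("languages", []) if lang.get("key")
    }
    has_non_english = bool(language_keys - {ENGLISH})
    return has_cover or has_isbn or has_oclc or has_non_english


def editions_to_import(editions: list[dict], limit: int = 50) -> list[dict]:
    """Look only at the first `limit` editions, then drop the sparse ones."""
    return [e for e in editions[:limit] if is_worth_importing(e)]
```

Because the limit is applied before filtering, a useful edition sitting beyond the first 50 results is still missed, which matches the remaining limitation mentioned above.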

@mouse-reeve
Member

I'm closing this as a duplicate of #591, since I think that encompasses the remaining work.
