goodtables validation revealed a fair number of duplicate rows in the events data export, and we'd like to check them in a format that's easier to use than going back and forth between the goodtables output by line number and a spreadsheet.
If we could have the duplicated subscription events first (subs, renewals, supplements) and the book events after (borrows, purchases, generics, etc.), that would be ideal. Ian could do the sub events and I could do the books.
If we include some specific site urls for different event types, I think we can make this even easier for them to review quickly. (I'll add notes in a moment).
No need to use goodtables for this; use pandas or whatever else is easiest. (If you use something else, please sanity check the total number of duplicates against the goodtables report.)
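For the sanity check, here is a minimal sketch of counting exact duplicate rows using only the Python standard library (swapped in for pandas so it runs without extra installs). It assumes a "duplicate" means a row that exactly repeats an earlier row, which may or may not match how the goodtables report tallies them, so compare the convention as well as the number. The sample data is made up for illustration.

```python
import csv
import io
from collections import Counter


def count_duplicate_rows(csv_file):
    """Count rows that exactly repeat an earlier row (header excluded)."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    seen = Counter(tuple(row) for row in reader)
    # every copy beyond the first occurrence counts as one duplicate
    return sum(count - 1 for count in seen.values())


# Hypothetical sample data, not taken from the real export
sample = io.StringIO(
    "event_type,start_date\n"
    "Subscription,1920-01-01\n"
    "Borrow,1921-05-02\n"
    "Subscription,1920-01-01\n"
)
print(count_duplicate_rows(sample))
```

For the real file, something like `with open("events.csv", newline="") as f: count_duplicate_rows(f)` should work the same way.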
identify duplicate rows in the events.csv
sort by / segment out membership events (Subscription, Supplement, Other, Reimbursement) from all other events; possibly also useful to separate remaining events that are associated with books by the presence of item_uri
for membership events, please link to the individual member activities page on the site; generate this by adding membership/ to the first url in membership_urls (in the case of joint accounts, member info is in one field delimited by semicolons (;); using the first one should be sufficient)
for non-membership events, please link to borrowing activities page on the site; generate this by adding borrowing/ to the first url in membership_urls
for any event with a source_image, provide the link to the image so reviewers can quickly look at the image as they review the events (mostly book activity but there are some exceptions, I don't know if they will show up in the duplicates or not)
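The steps above could be sketched roughly as follows, again with plain Python rather than pandas. This is only an illustration under assumptions: the column name `event_type`, the group labels, and the example.com URLs are hypothetical, while `membership_urls`, `item_uri`, `source_image`, and the four membership event types come from the notes above.

```python
# Event types treated as membership events, per the list above
MEMBERSHIP_TYPES = {"Subscription", "Supplement", "Other", "Reimbursement"}
# Hypothetical group labels and their order in the review sheet
GROUP_ORDER = {"membership": 0, "book": 1, "other": 2}


def review_row(event):
    """Build one review-sheet row from an event dict (one CSV row)."""
    is_membership = event.get("event_type") in MEMBERSHIP_TYPES
    if is_membership:
        group = "membership"
    elif event.get("item_uri"):
        group = "book"  # non-membership event associated with a book
    else:
        group = "other"
    # joint accounts hold several member URLs separated by ";"; use the first
    first_url = event.get("membership_urls", "").split(";")[0].strip()
    suffix = "membership/" if is_membership else "borrowing/"
    review_url = first_url.rstrip("/") + "/" + suffix if first_url else ""
    return {
        "group": group,
        "event_type": event.get("event_type", ""),
        "review_url": review_url,
        "source_image": event.get("source_image", ""),
    }


def build_review_rows(events):
    """Membership events first, then book events, then everything else."""
    rows = [review_row(event) for event in events]
    # sorted() is stable, so the export order is kept within each group
    return sorted(rows, key=lambda row: GROUP_ORDER[row["group"]])


# Hypothetical duplicate events, for illustration only
duplicates = [
    {"event_type": "Borrow", "item_uri": "https://example.com/books/x/",
     "membership_urls": "https://example.com/members/a/;https://example.com/members/b/",
     "source_image": "https://example.com/images/1.jpg"},
    {"event_type": "Subscription", "item_uri": "",
     "membership_urls": "https://example.com/members/c/", "source_image": ""},
]
for row in build_review_rows(duplicates):
    print(row["group"], row["review_url"], row["source_image"])
```

The resulting rows could be written out with `csv.DictWriter` and imported into a spreadsheet for review.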
@kmcelwee Don't worry about making this reusable, it's a one-off to help with data cleaning so we can publish the data! At this point we prefer something quick; I don't think it requires code review.
@kmcelwee when I was writing up the instructions for this and looking at the data in the export, it occurred to me that it's not easy to generate a link to events on the card detail page based on the information currently included in the export, because I'm using the IIIF image id but the data export only includes the IIIF manifest and the IIIF image. (You could probably figure out the image id from the combination of those two, but I bet it would be a pain.)
Do you have an opinion on how useful/important it would be to include either the source image id or (probably better) some kind of event in context url? (e.g. card url with event highlighted if the event is linked to an image; membership or book activity page otherwise)
The Google Doc "2020-07-09 Event export duplicates" was created. There were 113 duplicate rows. The doc can be found at S&Co -> Data Work -> Reports in the Drive.
Please work from this export in google drive (latest revisions to export logic, generated from a fresh copy of production data): https://drive.google.com/drive/u/0/folders/1aPDGBhT9CE0aIozbaelcPrHb6_hDDTkC
Josh would like them sorted by event type.