Finalize Datasets for Deposit #673
Comments
Comments on members:
|
Additional comments on members:
|
Comments on events:
|
@jkotin thanks for the feedback and review. Responses: books
members
events
|
@jkotin @i-davis here are the 10 subscription events that are showing up as event type "other" :
|
@jkotin I ran the Here's the report output (named as .txt because GitHub doesn't allow CSV attachments; rename after you download). |
@rlskoeser Re: books:
Re: members:
Re: events:
|
@jkotin thanks for responses; seems like we should be choosing the simplest option at this point in favor of getting the datasets published and not introducing new problems. That means: books
members
events
We should remember these decisions and think about how to incorporate into the dataset essay (especially the things we're not including and why). |
@rlskoeser I agree with all this, except I'd like to think a little more about books 2/ the uncertain column. I agree about not generating new data. Let me think about a label today. Sound OK? |
@jkotin yes, that sounds fine. Thanks. |
We have ten "other" event types -- all listed above, earlier in this issue. We would like to get rid of the event type "other." Solution 1: change them all to either "generic" or "supplement." Are there any other options? Which would you both prefer? My worry about solution 2 is that there are other events that should be ID'd as "deposit" that we don't know about. Perhaps they are currently ID'd as generic. There are also all the subscription events that include deposits. |
Solution 1: I don't think generic events work here because they don't allow you to document the deposit. Does supplement make sense to you two? It's a small enough number of records that maybe it doesn't matter that much, as long as we document what "other" means. And Ian already corrected some of them, right? So now fewer than 10? Documenting Ian's comment from Slack so we know the status of these "other" events based on his analysis:
|
Solution 1: Agreed, re generic. I don't think supplement works: I think we should keep that restricted to the particular situation where a member pays for an extra vol or a duration extension after the initial subscription. Solution 2: I could see "deposit" working. There are now only 8 "other" subscription events. I corrected Culver and Michaelides. So technically every "other" except for Moynahan is a deposit, either for periodical subscription or library subscription. I think that Moynahan event is a sui generis instance of Sylvia sass about Moynahan owing her for something unspecified. @jkotin's questions on slack: "1/ are there events that fit one of the three categories above that are currently labeled as something else? and 2/ there were ten “other” events in the list — do the above capture them all? I’m a little confused because earlier you mentioned seven."
|
I just found another record for the Moynahan, and it looks like it should just be a regular subscription. So we're back down to 7 "other" categories, and all of them would fit under "deposit." |
@i-davis good thought. I did a quick query and there are 197 subscription events with no price paid but a deposit. Here's a handful of them if you want to look and see if this matches what you expect and would make sense to convert to a "subscription deposit" event:
|
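For reference, the query mentioned above (subscription events with a deposit but no price paid) could be sketched against an events export roughly like this. This is only an illustration: the column names `event_type`, `member`, `price_paid`, and `deposit` and the sample rows are hypothetical, not the actual export schema or the query that was run.

```python
import csv
import io

# Hypothetical sample of an events export; the column names and rows
# are assumptions for illustration, not the real export.
SAMPLE_CSV = """event_type,member,price_paid,deposit
Subscription,A,25,50
Subscription,B,,50
Subscription,C,,
Renewal,D,,20
"""


def deposit_only_subscriptions(rows):
    """Return subscription rows that record a deposit but no price paid."""
    return [
        row
        for row in rows
        if row["event_type"] == "Subscription"
        and not row["price_paid"].strip()  # no fee recorded
        and row["deposit"].strip()  # but a deposit is recorded
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
matches = deposit_only_subscriptions(rows)
print([row["member"] for row in matches])
```

Only rows with an empty fee and a non-empty deposit are returned, so subscriptions with both values or with neither are left out.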
@jkotin did you come to any decisions about the uncertainty column for the books export? I thought of a couple more possible labels: problematic, ambiguous |
@rlskoeser I think "uncertain" is good. Not perfect, but better than not including the column. Sorry it took me so long to come around to your original suggestion. |
@jkotin thanks. I think it was worth discussing anyway! |
I'm still a little confused re: "other" -- sorry @i-davis and @rlskoeser. What's the current proposal? Re: "subscription deposit" event -- it seems odd to me to separate out IFF there is no subscription fee. Wouldn't a subscription deposit event be a subscription deposit event even if a subscription fee was paid as well? Another question: do we know the deposit amount for the 7 remaining events? |
@jkotin did Beach treat them as separate events when people joined and paid their deposit & subscription fee at the same time? |
@rlskoeser : @i-davis can confirm -- he knows the material better than I do -- but Beach usually recorded new memberships like this:
That means a deposit of 50f for new subscriber John Smith for 1 month, 1 volume at a time, for a 25f membership fee. Beach would sum up the membership fees as part of her revenue, but not the deposits. I realize now that our sample logbook page doesn't have any deposits!! |
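The bookkeeping described above (fees counted as revenue, deposits tracked separately) can be sketched as follows. The `LogbookEntry` class and its field names are hypothetical, and the John Smith entry is the illustrative example from the comment, not a real record.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class LogbookEntry:
    """One new-membership line; field names are assumptions for illustration."""

    member: str
    duration_months: int
    volumes: int
    deposit: Decimal  # held against the member's account, not revenue
    fee: Decimal  # membership fee, counted as revenue


def revenue(entries):
    """Sum membership fees only; deposits are excluded."""
    return sum((entry.fee for entry in entries), Decimal("0"))


def deposits_held(entries):
    """Sum deposits separately from revenue."""
    return sum((entry.deposit for entry in entries), Decimal("0"))


# The example entry from the comment: 50f deposit, 1 month, 1 volume, 25f fee.
entries = [LogbookEntry("John Smith", 1, 1, Decimal("50"), Decimal("25"))]
print(revenue(entries), deposits_held(entries))
```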
@jkotin @rlskoeser : Yep, confirmed, that's what most sub events look like! We do know the deposit amount for each of the remaining 7. I'm not sure if this answers your question about whether they should be denoted differently, Josh -- I think that's a good question, I'm not sure they should. But the Others do seem like a different sort of event from a sub. They're mostly off in the left margin, and all they say is: "dep. [name] 50f." See attached screenshot. Heineman's deposit isn't a refund, like Walher, nor a deposit attached to a sub, like Milne. |
@i-davis would you post screen shots of the remaining 7 "other" events? Include links to the logbooks and the dates with years of the events. Also include versos, if relevant. We'll tackle them one by one. I worry that Heineman's is a deposit on a book that Beach has ordered for him, and thus doesn't involve the lending library. When a patron wanted expensive English-language books, Beach would likely have asked for a deposit. |
@rlskoeser Is this "other" challenge the last thing we have to decide for the exports? I manually fixed a lot of the format blanks. If you could generate a CSV with all the remaining format blanks without uncertainty notes, I'll fix them all. |
@jkotin : mm I was just wondering that myself. Will collect the screen shots now! |
Chang: 1932-10-01 Price: 1933-06-09 Bell: 1938-12-19 Creswick: 1939-03-08 Burrow: 1939-04-19 de Girodon: 1941-10-20 |
@i-davis thank you! Have you checked to see how these events fit the membership timelines of the individual members? If not, would you? For example, does de Girodon not give a deposit for their membership on 10/22/41, and hence this deposit? I suspect that these are legit membership/subscription deposit-events. If that suspicion is correct, I'm not sure how to categorize them. Ideally, we would have a separate deposit-event type for ALL deposits, but barring that, I don't know. We could categorize them as subscriptions with no fee and only a deposit, but that will look strange on the activities/membership page. Alternatively, we could label the events "deposit," and create a FAQ that indicates that when given on the same day subscription = fee + deposit, but when given on separate days subscription = fee and deposit = deposit. Ugh. |
@jkotin: Some of them do seem to fit as subscription-like or -adjacent events in membership timelines (Bell, Creswick, Burrow, de Girodon). Some of them don't quite seem to fit (Price, Heinemann). Chang seems clearly to have been a magazine subscription. Here are the timelines: does that analysis seem right to you? Price: the deposit comes in the middle of another subscription, 15 days before a renewal would be needed. Heinemann's subscription activity begins with two unusual events, including this "other": Bell looks like their other could be a renewal: Creswick looks like theirs could be a downpayment on a subscription that is recorded 14 days later: Burrow looks like they could be renewing by re-depositing a couple days after they got a reimbursement: de Girodon looks like they're depositing two days in advance of actually subscribing: |
@rlskoeser @jkotin : re generic events: yes, totally, that description seems right to me, Josh. As always with the database, I'm not entirely sure, there are so many events that have been entered by a variety of people--but I can say with certainty that the vast majority of generic events are definitely about books. |
@i-davis Thank you. Let me think on this. Re: Chang -- it seems like we should treat that as a subscription to borrow periodicals. What about categorizing it as a "subscription" event, 10/1/32, with a 15f deposit, no fee, and adding a note that it's for periodical privileges? Now we have 6! |
@jkotin: sounds good to me! I'll make the change now. |
@i-davis: I'm still working on Price. But here's what I want to do with the others: Keep them all as other. I worry that if we change the event to "deposit" it will imply that these are the only deposits. At a later date, I think we should separate out all the deposits and make them their own events. That way, the site can give a clear portrait of the Shakespeare and Co. finances and whether members were or were not reimbursed. But until then, let's keep these 5 as "other." The one extra issue: do you think the Bell "other" could be for a different Bell? Would it make sense to separate it out? Price in a second. |
@i-davis Re: Price -- would you do a deep dive here and look at the Price cards? I worry that we might be conflating two different Prices? Let me know what you think. |
@jkotin : I could imagine the Bell other being for a different Bell. I mean, they are fairly close to each other, calendrically, and we don't have many Bells. But yeah, it's definitely a possibility! Want me to separate? Re Price: good question. It does seem to me like we could separate:
Arguments for keeping them together:
|
@i-davis -- re: Bell: leave it the way it is. I think it's a deposit refund, but Beach just didn't note it as such. I looked at the logbook: there's very little differentiation between deposits and refunds. I suspect some of these questions will be resolved in the fall when you review the logbooks work. But let's leave it alone for now. This makes me realize that we should make all the logbooks available as PDFs at some point. We have them. People will be interested. Maybe we can plan to do this in the new year -- it could just be links from the logbooks source essay. |
@i-davis -- thank you for the research for Price. I'm not convinced that separating would make things any more accurate. The "other" is likely a supplement deposit for periodicals. My vote is to leave it as "other" until we create separate events for "deposits," if that happens. Does this seem OK to you? |
@jkotin: re Bell: sounds good! Excellent! The logbooks feel absolutely vital to me, and fascinating, esp full of little oddities that the current iteration of the site can't capture. I think it'd be great to make them open access! re Price: Yeah, that seems right to me, keeping all these events together as Phyllis Price, and keeping the other as other! |
@rlskoeser I think we are set for the exports. I need today to finish going through the books one last time -- I'm 80% through. But otherwise, I don't think there are any unresolved issues. Is there anything I need to do for the export pages? I'll revise the export page on the site and save a draft on Wagtail. |
@jkotin just to confirm, you are signing off on the revised data exports without any additional software changes (i.e. we'll leave "other" as is for now and document)? Please close this issue if that's the case. We'll probably want to revise the export page (and it would probably help to see what the dataspace page looks like! I can work on that soon); that can be done independently of this task to revise the data set. |
@rlskoeser I've been going through all the books and fixing mistakes. I'm wondering if it would be possible to do two queries that would further clean the data:
This is an event for an overdue notice that is likely connected to the incorrect account.
Sorry for the delay identifying these queries. It's been helpful to go through all the titles and fix errors that I would have asked Cate to fix. |
@i-davis There are 12 "overdue" subscription events currently labelled "generic." See: https://shakespeareandco.princeton.edu/admin/accounts/event/?q=overdue These should be changed to "other" or deleted, or made into a new kind of event "overdue." As it stands, they lead to weird information on the site. For example, Mrs. P. F. Dunne has 1923 and 1926 as membership years, but only visible events in 1923: https://shakespeareandco.princeton.edu/members/dunne/. What's your opinion on how to handle this @rlskoeser ? |
@jkotin if it's an overdue subscription, I can add another subscription "subtype" analogous to supplements and renewals — would that make sense? That would make it into something the code recognizes as a "membership activity" so it would be listed in that table. |
@jkotin I think the queries you're interested in can be done with OpenRefine. I'll generate and share a fresh event export from production and provide guidance on how to find the events you're interested in. I'm glad you've identified these problems and will be able to fix more of them before we publish the data! |
@rlskoeser excellent, thank you, re: OpenRefine and fresh event export. Re: overdue -- @i-davis what do you think? Have we been recording overdue notes consistently, or are these 12 a remnant of a practice that was abandoned? If the latter, we should probably just delete them. If these are all the overdue notices (or most of them), then we should follow Rebecca's suggestion. Have we been recording fines? |
@rlskoeser one other query that occurred to me -- but it's not vital to do it before publishing the data: identify people that are not attached to any events or any works. I think there are a lot of people (members and creators) we created by mistake. |
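A minimal sketch of that last check (people attached to no events and no works), assuming people, events, and works are available as simple records. The identifiers and the `member_ids` / `creator_ids` keys are hypothetical placeholders, not the database's actual fields.

```python
def unattached_people(people, events, works):
    """Return ids of people referenced by no event and no work, sorted."""
    referenced = set()
    for event in events:
        referenced.update(event["member_ids"])  # members tied to an event
    for work in works:
        referenced.update(work["creator_ids"])  # creators tied to a work
    return sorted(person for person in people if person not in referenced)


# Hypothetical sample data for illustration only.
people = ["p1", "p2", "p3", "p4"]
events = [{"member_ids": ["p1"]}, {"member_ids": ["p1", "p2"]}]
works = [{"creator_ids": ["p3"]}]

orphans = unattached_people(people, events, works)
print(orphans)
```

Anyone returned here would still need a manual look before deletion, since a record could be legitimate but simply not yet linked.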
@jkotin : I'm not sure about this: the overdue pre-date me! They clearly came, that is, from XML transcriptions of the logbooks before the database existed. They have no event history, and the notes attached to them are weirdly standardized, produced by the database as it metabolized the XML transcriptions. They're clearly representing events in the logbook: I don't know: should we save them? I'd be happy to turn them all into the sub events @rlskoeser suggests! |
Deleting them is fine with me, especially if we think we haven't captured them systematically (as seems to be the case)! |
I deleted them! |
@jkotin are there any software changes needed at this point? It seems like at this point it's all data cleanup that won't require any additional software changes. If you've reviewed the changes we agreed on (removing OCLC urls, switching uncertainty, re-ordering member fields in the events export) then I think this issue can be closed. |
@rlskoeser the events export that I'm working with still has an "item_work_uri" field -- should that be deleted? |
@jkotin I was concerned about that when I saw it until I remembered that I gave you fresh production exports to make sure you're looking at the latest data. Did you ever review the revised qa/test data exports for the changes we agreed on? (including removing OCLC URIs, re-ordering member fields in the event export, and the revision to the uncertainty flag). All I saw was your #673 (comment) that it "looks great" |
Oh, good, that makes sense. I was just worried that I overlooked a field in my earlier review. I'll close now. My hesitation (psychological) is that the datasets aren't really finalized yet. But all the software changes are done. I don't think we need to autofill the formats anymore. I manually corrected them, FWIW. |
@jkotin should I remove the auto-fill format logic? Great to have them corrected in the database. |
Yes please. |
Comments on books:
Exclude OCLC links. I worry many are incorrect—when I was correcting books, I tried to check them, but I’m not confident that I deleted all the incorrect ones.
Exclude column “identified.” I worry that we’re using the uncertainty icon for too broad a range of issues. Might be better to refer researchers to the notes and the use of “unidentified.”
I'm going to try to make some headway on the blank format. The format should only be blank for some items with an uncertainty icon.
I'll post on the members and events ASAP.