Create DB entries for all society and venue strings #541

Open
hoyes opened this Issue Dec 7, 2018 · 6 comments

4 participants
@hoyes
Member

hoyes commented Dec 7, 2018

From #536 (review)

At the moment, we have an "other_society" field which is used to store a plain text society name string if the show's society is not already listed on Camdram. Instead, we could create a new society entry for any unique society string. Such entries would be hidden from /societies unless/until manually made visible by an admin.

Ditto for venues.
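A minimal sketch of the proposal, assuming a simplified `acts_societies` table with an `approved` flag. The column name follows the `WHERE approved = 1` suggestion later in this thread; the real Camdram schema may differ. It uses Python's built-in `sqlite3` as a stand-in for MySQL. Any new society string gets a hidden row, and only approved rows are listed. `find_or_create_society` and `listed_societies` are hypothetical helpers.

```python
import sqlite3

# Hypothetical, simplified schema: the real acts_societies table has more
# columns, and "approved" stands in for whatever visibility flag is used.
SCHEMA = """
CREATE TABLE acts_societies (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    approved INTEGER NOT NULL DEFAULT 0
);
"""

def find_or_create_society(conn: sqlite3.Connection, raw_name: str) -> int:
    """Return the id of the society called raw_name (trimmed), creating a
    hidden (unapproved) entry if none exists yet."""
    name = raw_name.strip()
    row = conn.execute(
        "SELECT id FROM acts_societies WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO acts_societies (name, approved) VALUES (?, 0)", (name,)
    )
    return cur.lastrowid

def listed_societies(conn: sqlite3.Connection) -> list[str]:
    """Names that would appear on /societies: approved entries only."""
    return [r[0] for r in conn.execute(
        "SELECT name FROM acts_societies WHERE approved = 1 ORDER BY name"
    )]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO acts_societies (name, approved) VALUES ('Registered Society', 1)"
    )
    a = find_or_create_society(conn, "Some Film Society")
    b = find_or_create_society(conn, " Some Film Society ")
    print(a == b)                  # True: the second call reuses the hidden row
    print(listed_societies(conn))  # ['Registered Society']
```

Matching here is on the trimmed string only, so "Film Soc" and "film soc" would still create two entries. That is the duplication concern raised under the disadvantages.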

Advantages:

  • Simplifies the show schema and removes the need for special-case branches in various places to handle both concrete societies and raw society names.
  • Makes the societies and venues data model similar to that of people
  • If someone comes along and wants to be an admin for a society, the society entry will likely already exist and already be linked to all the shows they've done to date.

Disadvantages:

  • We'd likely end up with a large number of messy society names in the acts_societies table.
  • ???
@CHTJonas

Member

CHTJonas commented Dec 7, 2018

I think film listings, which tend to spring up ad-hoc venues and societies, would be the biggest issue with regard to the first disadvantage. Then again, I'm not convinced that's necessarily a bad thing, since it wouldn't be any messier than the acts_users table in terms of duplication. It would also be easy to cut through that noise with a WHERE approved = 1.

It would also in theory be possible to detect orphaned societies and venues that are no longer linked to any shows (e.g. because of misspellings) and purge them. Similarly, but more complicated: detect similarly spelt venues/societies that differ by only a few characters or by punctuation.
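A sketch of the orphan check, again using `sqlite3` as a stand-in for MySQL. It assumes a link table shaped like `acts_show_soc_link(show_id, society_id)`; the table name appears later in this thread, but the column names are guesses. Only unapproved societies are treated as purgeable, so an admin-approved society with no shows yet is left alone. `find_orphans` and `purge_orphans` are hypothetical helpers.

```python
import sqlite3

# Hypothetical, simplified schema; column names are assumed.
SCHEMA = """
CREATE TABLE acts_societies (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    approved INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE acts_show_soc_link (
    show_id INTEGER NOT NULL,
    society_id INTEGER NOT NULL REFERENCES acts_societies(id)
);
"""

# Unapproved societies that no show links to any more, e.g. a misspelling
# that was later corrected on the show's edit form.
ORPHANS_SQL = """
SELECT s.id, s.name
FROM acts_societies AS s
LEFT JOIN acts_show_soc_link AS l ON l.society_id = s.id
WHERE s.approved = 0 AND l.society_id IS NULL
ORDER BY s.id
"""

def find_orphans(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    return conn.execute(ORPHANS_SQL).fetchall()

def purge_orphans(conn: sqlite3.Connection) -> int:
    """Delete orphaned, unapproved societies; return how many were removed."""
    ids = [row[0] for row in find_orphans(conn)]
    conn.executemany("DELETE FROM acts_societies WHERE id = ?", [(i,) for i in ids])
    return len(ids)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO acts_societies (id, name, approved) VALUES (?, ?, ?)",
        [(1, "Registered Society", 1), (2, "Film Soc", 0), (3, "Flim Soc", 0)],
    )
    conn.execute("INSERT INTO acts_show_soc_link VALUES (10, 2)")
    print(find_orphans(conn))   # [(3, 'Flim Soc')]
    print(purge_orphans(conn))  # 1
```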

@hoyes referenced this issue on Dec 7, 2018 from Multiple societies #536 (Closed)

@CHTJonas

Member

CHTJonas commented Dec 7, 2018

Maybe helpful to the discussion on duplication/mess:

```sql
SELECT DISTINCT `society` FROM `acts_shows`
ORDER BY `acts_shows`.`society` ASC;
```
@GKFX

Contributor

GKFX commented Dec 7, 2018

Had a skim through that earlier and didn't much like the look of it.

```sql
SELECT DISTINCT `society` FROM `acts_shows` WHERE `socid` IS NULL ORDER BY `society` ASC;
```

This gives 479 results, including:

  • a full stop
  • lots of "X and Y" entries (and "X, Y and Z") which should probably be cleaned up once #536 hits production
  • spelling variants
  • variants of "TBC" and "None"
  • etc.

All-in-all thoroughly grim data!
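One way to start a semi-manual clean-up is to triage each raw string automatically and hand the results to a human. The sketch below is hypothetical. The placeholder set is illustrative, and splitting on "and" will wrongly break up genuine names that contain the word (e.g. a "Gilbert and Sullivan" society). That is why the output is a review list rather than something to apply blindly.

```python
import re

# Illustrative placeholder values, compared after lower-casing and removing
# punctuation, so "T.B.C." and "tbc" both match and "." becomes "".
PLACEHOLDERS = {"", "tbc", "tba", "none", "na"}

# Splits "X and Y", "X, Y and Z", "X, Y, and Z" and "X & Y" into parts.
SPLIT_RE = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+|\s*&\s*", re.IGNORECASE)

def triage(raw: str) -> tuple[str, list[str]]:
    """Classify a free-text society string for manual review.

    Returns ("placeholder", []), ("multiple", parts) or ("single", [name]).
    """
    name = raw.strip()
    if re.sub(r"[\W_]+", "", name.lower()) in PLACEHOLDERS:
        return ("placeholder", [])
    parts = [p.strip() for p in SPLIT_RE.split(name) if p.strip()]
    if len(parts) > 1:
        return ("multiple", parts)
    return ("single", [name])

if __name__ == "__main__":
    for raw in [".", "T.B.C.", "Drama Society and Film Society", "Film Society"]:
        print(repr(raw), "->", triage(raw))
```

Entries classed as "multiple" map naturally onto the multiple-societies support in #536 once it is released.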

If we want to pursue this further I would suggest:

  • Release #536
  • Big semi-manual clean-up
  • Then look at auto-registering societies and allow users to search for unregistered societies in the Society field but not in the main search bar.

This would prevent further duplicates from appearing as users can then make their spelling conform to what previous users have typed.
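An in-memory sketch of that split, using hypothetical names (Camdram's actual search backend is not shown). The Society field's suggestions include unregistered entries, but the main search does not. Ranking approved societies first is an assumption, intended to nudge users towards an existing canonical spelling.

```python
from dataclasses import dataclass

@dataclass
class Society:
    name: str
    approved: bool  # hypothetical flag: True once an admin makes it visible

def search_societies(societies: list[Society], query: str,
                     include_unapproved: bool) -> list[Society]:
    """Case-insensitive substring search; approved matches sort first."""
    q = query.casefold()
    hits = [
        s for s in societies
        if q in s.name.casefold() and (s.approved or include_unapproved)
    ]
    # False sorts before True, so approved societies come first.
    return sorted(hits, key=lambda s: (not s.approved, s.name.casefold()))

def society_field_suggestions(societies: list[Society], query: str) -> list[str]:
    """Autocomplete for the show form's Society field: includes unregistered names."""
    return [s.name for s in search_societies(societies, query, include_unapproved=True)]

def main_search(societies: list[Society], query: str) -> list[str]:
    """Site-wide search bar: approved societies only."""
    return [s.name for s in search_societies(societies, query, include_unapproved=False)]

if __name__ == "__main__":
    socs = [Society("Film Soc", False), Society("Film Society", True)]
    print(society_field_suggestions(socs, "film"))  # ['Film Society', 'Film Soc']
    print(main_search(socs, "film"))                # ['Film Society']
```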

@GKFX

Contributor

GKFX commented Dec 7, 2018

Also, if we do do the clean-up, NB: by design, if you put integers into socs_list but not into acts_show_soc_link, they'll have no effect and will be erased the next time the edit form is submitted.
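A check along those lines could be run after a bulk clean-up. This thread does not confirm the storage format, so the sketch assumes that `socs_list` is a JSON array mixing integer society IDs and plain-text names, and that the set of linked IDs has already been read from `acts_show_soc_link`. `ids_that_would_be_erased` is a hypothetical helper.

```python
import json

def ids_that_would_be_erased(socs_list_json: str, linked_ids: set[int]) -> list[int]:
    """Return integer entries in socs_list with no matching row in
    acts_show_soc_link; by design these are dropped on the next form submit.

    Assumes socs_list is a JSON array of ints (society IDs) and strings
    (unregistered society names).
    """
    entries = json.loads(socs_list_json)
    return [
        e for e in entries
        # bool is a subclass of int in Python, so exclude true/false explicitly
        if isinstance(e, int) and not isinstance(e, bool) and e not in linked_ids
    ]

if __name__ == "__main__":
    # Society 7 was written into socs_list during a clean-up but never linked.
    print(ids_that_would_be_erased('[3, "Some Film Society", 7]', {3}))  # [7]
```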

@philosophicles

Contributor

philosophicles commented Dec 8, 2018

@GKFX's plan seems very sensible. I like the idea of making society addition work more like person addition.

If we look at cleaning up spelling variants, typos, etc. (I've played this game many times before), I'd suggest trying phonetic algorithms like Soundex and/or Metaphone; they'd probably do a decent job of finding probable dupes among society names, despite being designed mainly for human names. MySQL has SOUNDEX() and SOUNDS LIKE, and PHP apparently has metaphone(). There are also string distance functions like Levenshtein or Damerau-Levenshtein. Of course, simple "try taking out punctuation and weird characters" approaches are also useful. Sorry if I'm spelling out the bleeding obvious here!
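As a sketch of the "strip punctuation, then compare by edit distance" idea, the following flags pairs of names whose normalised forms are within a small Levenshtein distance. It does not implement Soundex or Metaphone. `probable_duplicates` and its default threshold of 2 are hypothetical choices. Short names, such as two three-letter acronyms, will produce false positives, so the output is a review list rather than an automatic merge. Comparing every pair is quadratic, which is fine for a one-off pass over a few hundred names.

```python
import re
from itertools import combinations

def normalise(name: str) -> str:
    """Case-fold, drop punctuation and collapse whitespace."""
    name = re.sub(r"[^\w\s]", "", name.casefold())
    return " ".join(name.split())

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]

def probable_duplicates(names: list[str], max_distance: int = 2) -> list[tuple[str, str]]:
    """Pairs of names whose normalised forms differ by <= max_distance edits."""
    return [
        (a, b) for a, b in combinations(names, 2)
        if levenshtein(normalise(a), normalise(b)) <= max_distance
    ]

if __name__ == "__main__":
    print(probable_duplicates(["Drama Society", "Dramma Society", "Film Soc"]))
```

In MySQL, SOUNDEX() could serve as a cheap first pass to group candidates before running a distance check like this.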

@philosophicles

Contributor

philosophicles commented Dec 8, 2018

Regarding:

  "allow users to search for unregistered societies in the Society field but not in the main search bar"
  "This would prevent further duplicates from appearing as users can then make their spelling conform to what previous users have typed"

For more work but possibly more reward (in terms of making Camdram a little more self-maintaining) maybe we could expand that workflow?

If a show creator enters a society that isn't already an approved, publicly visible one, but which has enough other shows linked to it to satisfy our conditions for listing the society, could we prompt them to consider whether it should be, and perhaps to provide us with structured information that would let us make it so (with minimal or no interaction from the webteam)?

My thinking is that reasonably often, somebody creating a show for an "unrecognised" society will actually be part of the society committee - most smaller / newer societies are quite small operations and don't have a standalone committee apart from show production teams. If not on the committee, they'll probably know people who are.
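The trigger for such a prompt could be as simple as the predicate below. This is a hypothetical sketch. The thread doesn't define "our conditions for listing the society", so `MIN_OTHER_SHOWS` stands in for whatever rule the webteam settles on.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the real listing criteria, which aren't
# specified in this thread.
MIN_OTHER_SHOWS = 3

@dataclass
class Society:
    name: str
    approved: bool          # publicly visible on /societies
    other_show_count: int   # shows linked to it besides the one being created

def should_prompt_for_details(society: Society, min_shows: int = MIN_OTHER_SHOWS) -> bool:
    """True if the show creator should be asked whether this unapproved
    society ought to be listed, and invited to supply its details."""
    return not society.approved and society.other_show_count >= min_shows

if __name__ == "__main__":
    print(should_prompt_for_details(Society("Some Film Society", False, 5)))  # True
```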

This line of thought is closely related to why I raised #346 and #359 and might overlap very closely with #346, but it could be implemented differently in light of this issue.

Semi-philosophical question: is there any difference between a previously approved, publicly-visible society with no admins any more (like #346), and a non-approved society created via this newly-proposed method?

@CHTJonas added this to To Do in API Changes/Deprecations via automation Dec 9, 2018
