Wrong data in the json source #8

Open
ryson282 opened this issue Sep 27, 2014 · 12 comments

Comments

@ryson282

For example, card 06053 has count = 53, which would mean you can add up to 53 copies of that card to a deck.

For this particular example, a coherency check on the count value would do the trick.
=> If count is above 3, set the value to 3.
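
A minimal sketch of that check in Python, assuming each card is a plain dict parsed from the JSON source and that the cap of 3 is the right limit for every card. The function name `clamp_count` is made up for illustration.

```python
MAX_COPIES = 3  # assumed cap, taken from the rule suggested above


def clamp_count(card: dict, cap: int = MAX_COPIES) -> dict:
    """Return a copy of the card with its `count` field limited to `cap`."""
    fixed = dict(card)  # leave the caller's dict untouched
    if fixed.get("count", 0) > cap:
        fixed["count"] = cap
    return fixed


card = {"code": "06053", "count": 53}
print(clamp_count(card))
```

This only hides the symptom; the underlying record in the source is still wrong.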

However, in the long term, I think a way to correct erroneous data and distribute the corrected data through the API will be needed.

@gereons
Contributor

gereons commented Sep 27, 2014

Good catch, pretty obvious it's a typo in the data entry (number:53 and quantity:53).
However, Quantity is "how many are in this pack" and not "how many can I have in a deck", that's what maxperdeck is for.

@ryson282
Author

Yes, indeed, you are right.
But in some deckbuilders, that value is used as the maximum number of copies you can put in a deck. (That is how I spotted it :o).)
And usually a multiplier is applied to Core Set cards, depending on how many Core Sets you own.
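
Roughly what such a deckbuilder might compute, as a Python sketch. The `set_code` field, its `"core"` value, and the fallback limit of 3 are assumptions for illustration; only `quantity` and `maxperdeck` come from the data discussed here.

```python
def copies_owned(card: dict, core_sets_owned: int = 1) -> int:
    """Copies available from the owner's collection."""
    quantity = card["quantity"]
    if card.get("set_code") == "core":  # hypothetical field and value
        quantity *= core_sets_owned
    return quantity


def deck_limit(card: dict, core_sets_owned: int = 1) -> int:
    """Copies allowed in a deck: bounded by ownership and by maxperdeck."""
    # The default of 3 when maxperdeck is missing is an assumption.
    return min(copies_owned(card, core_sets_owned), card.get("maxperdeck", 3))
```

Taking the minimum with `maxperdeck` also means a typo such as `"quantity": 53` cannot raise the deck limit above the intended cap.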

@DoubleAitch

I understand the plan is to create a Master Datasucker that will feed down to others in a pyramid structure. The Master will either implement corrections for the CGDB errors or use a separate data source that is corrected and maintained in isolation from the CGDB source.

To keep track of any errors we find, I suggest listing them here so we can easily correct them all once the Master Datasucker structure is in place.

  • 06053 - "quantity":53 => "quantity":3
  • 05006 - "maxperdeck":3 => "maxperdeck":1
  • 03004 - "maxperdeck":3 => "maxperdeck":1
  • 06020 - "maxperdeck":3 => "maxperdeck":1

@gereons
Contributor

gereons commented Sep 29, 2014

Almost all of the card names that have special diacritical marks are wrong as well:

  • 02046 - "title": "Chaos Theory: Wünderkind"
  • 02020 - "title": "Dracō"
  • 05011 - "title": "Shi.Kyū"
  • 01002 - "title": "Déjà Vu"

@MarbleMunkey

Two thoughts:

  1. Would it be helpful to leave it that way and include a second 'display name' attribute?

  2. Do we want to consider supporting multiple languages (realizing that we have no current source of i18n data), or do we want to assume that non-English languages would run separate datasuckers?

@gereons
Contributor

gereons commented Sep 29, 2014

I would only ever use the "display name" attribute. Objective-C has "case insensitive, ignore diacritics" string comparison/matching built in (and I hope every other modern language makes this easy too).
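
For languages without such a built-in, the same effect can be approximated by hand. A sketch in Python using only the standard library's `unicodedata` module; the helper names `fold` and `titles_match` are made up, and this is not how Objective-C does it internally.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining marks and case so 'Déjà Vu' compares equal to 'deja vu'."""
    decomposed = unicodedata.normalize("NFKD", text)  # split base letters from accents
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def titles_match(a: str, b: str) -> bool:
    """Case-insensitive, diacritic-insensitive title comparison."""
    return fold(a) == fold(b)


print(titles_match("Dracō", "draco"))
```

Letters that have no Unicode decomposition (for example "ø") pass through unchanged, so this is an approximation rather than a full collation.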

@MarbleMunkey

JavaScript, in particular, lacks any such niceties.


@datasucker
Owner

i18n is a good thought here. Did NRDB's API support different languages?

@gereons
Contributor

gereons commented Sep 29, 2014

Yes it did: http://netrunnerdb.ca/api/cards?_locale=de for German cards, although the data isn't complete enough to be usable, IMO.

Note the distinct fields like "faction" and "faction_code": you need to be able to parse this without knowing that "Shaper" is "Gestalter" in German or any other language, so faction_code always has lower-cased English words like "shaper" in every locale. This applies to side, faction, type and subtype.
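
A small Python sketch of why the `_code` fields matter: grouping on `faction_code` gives the same result whichever locale the record came from. The two records and their titles are invented placeholders; only the "Shaper"/"Gestalter"/"shaper" values come from the description above.

```python
from collections import defaultdict


def group_by_faction(cards: list) -> dict:
    """Group card titles by the locale-independent faction_code."""
    groups = defaultdict(list)
    for card in cards:
        groups[card["faction_code"]].append(card["title"])
    return dict(groups)


# Placeholder records: one as a German response might look, one English.
cards = [
    {"title": "Placeholder card (de)", "faction": "Gestalter", "faction_code": "shaper"},
    {"title": "Placeholder card (en)", "faction": "Shaper", "faction_code": "shaper"},
]
print(group_by_faction(cards))
```

Code that switches on the localized `faction` string would need a translation table per language; switching on `faction_code` does not.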

@ryson282
Author

@MarbleMunkey

1/ I think the Master Datasucker should only share corrected data. I do not see the point of sharing erroneous data. However, on the Master Datasucker, 3 classes could be defined: RawNRCard, Correction and NRCard (corrected), with a daily batch computing the latter.
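
One way the three classes could fit together, as a Python sketch. The class names are the ones proposed above; the fields, the `field_name`/`value` shape of a correction, and the `build_corrected` function are assumptions, and nothing here reflects an actual Datasucker implementation.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RawNRCard:
    """A card exactly as it arrives from the upstream source."""
    code: str
    quantity: int
    maxperdeck: int


@dataclass(frozen=True)
class Correction:
    """One manual fix: set `field_name` to `value` on the card with `code`."""
    code: str
    field_name: str
    value: int


@dataclass(frozen=True)
class NRCard:
    """A card after all known corrections have been applied."""
    code: str
    quantity: int
    maxperdeck: int


def build_corrected(raw_cards, corrections):
    """The batch step: overlay corrections on raw cards, keyed by card code."""
    fixes = {}
    for c in corrections:
        fixes.setdefault(c.code, {})[c.field_name] = c.value
    return [NRCard(**{**asdict(raw), **fixes.get(raw.code, {})}) for raw in raw_cards]


# Corrections are from the list in this thread; uncorrected values are placeholders.
raw = [RawNRCard("06053", 53, 3), RawNRCard("05006", 3, 3)]
corrections = [Correction("06053", "quantity", 3), Correction("05006", "maxperdeck", 1)]
print(build_corrected(raw, corrections))
```

Keeping the raw records untouched means a correction can be dropped again once the upstream source fixes the same error.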

2/ I would advise setting up only an English Datasucker network at first and getting it working, while thinking through the architecture needed to support multiple languages. Once the network is in place and working, it will be easier to find contributors who are fluent in other languages and have access to content in those languages.

@DoubleAitch

I have created a separate issue (#10) for discussing multiple language support.

@datasucker
Owner

There is now a "master" data source with the above corrections to the data. This is only active on one Datasucker in the network at the moment. More Datasuckers will be coming online shortly that will clone that "master" DS and provide the initial top-level Datasuckers to feed the rest of the network.

If you are interested in helping maintain that Datasucker (please help!), visit shapers.cyberdeck.io, register, and request access.
