Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some "_id" fields not being unique. #17

Merged
merged 12 commits into from May 28, 2018
Merged

Fix some "_id" fields not being unique. #17

merged 12 commits into from May 28, 2018

Conversation

Senexis
Copy link
Contributor

@Senexis Senexis commented May 22, 2018

The field not being unique may cause issues with the difference checker as it just randomly decides the order of certain objects, causing false positives in difference checkers. It was also causing some issues in my own overlay project.

  • Make the "_id" fields use more than just the name, as duplicate names are possible in the cases of enemies.
  • Add an "_id" field to transients and refactor the file a bit.
  • Updates the README to reflect the "_id" addition.

These are the only current files that have ambiguity, but there may be more cases of this issue possible later (think items, primarily). Might need to improve the "_id" field of those later, too. For now, however, this suffices.

Difference can be checked here, compare with the live version of the diff checker.

Nadermane added 3 commits May 22, 2018 19:34
The field not being unique may cause issues with the difference checker, as it just randomly decides the order of certain objects, causing false positives in difference checkers.

- Make the "_id" fields use more than just the name, as duplicate names are possible in the cases of enemies.
- Add an "_id" field to transients and refactor the file a bit.

These are the only current files that have ambiguity, but there may be more cases of this issue possible later (think items, primarily). Might need to improve the "_id" field of those later, too. For now, however, this suffices.
@@ -41,6 +38,7 @@ module.exports = function($) {
}

// push the last one too
enemy._id = hash(enemy.enemyName + enemy.ememyModDropChance + enemy.mods.length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a template string instead of concat

Copy link
Contributor Author

@Senexis Senexis May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering: Why?

The point of this string is not to be human readable at all, in fact, the user won't directly interact with this in any way. The only purpose is to make the hash target more strict instead of just using "text" which is the cause of duplicate identifiers.

If this text were in any way supposed to be readable, I would immediately agree but that's not the point in this case, so why? 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Less chance of misunderstanding this as a numerical operation, since aspects of it are numerical
  2. Subjectively: it looks better
  3. Across several projects we enforce the usage of templates over concatenation, so from this point of view: consistency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I don't think that I'm consistent with that in this project either tho :D)

@@ -48,7 +45,9 @@ module.exports = function($) {
}
}

transientRewards.push(curr)
// push the last one too
transient._id = hash(transient.objectiveName + transient.rewards.length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a template string instead of concat

}

name = text
curr = {objectiveName: name, rewards: []}
transient = {_id: null, objectiveName: text, rewards: []}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't define _id if there's no intention to set it until later, which will create it anyway.

mods.push(mod)
}

mod = {_id: hash(text), modName: text, enemies: []}
mod = {_id: null, modName: text, enemies: []}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't define _id if there's no intention to set it until later, which will create it anyway.


mod.enemies.push({
_id: hash(text),
_id: hash(text + enemyModDropChance + chance.rarity),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

template string, don't concat

@Senexis
Copy link
Contributor Author

Senexis commented May 23, 2018

Oh dang, I just figured out where "ememy" came from, it's from the enemy mod locations file, and has been since pretty much always, dating back to 2017. Does it matter that this fixes that too (I didn't think twice about it and just auto-corrected it) or should we add "ememy" back for existing implementations?

Again, "ememy" has been in there since forever.

@atomicptr
Copy link
Member

I don't like that changes in the data will change the ID because you add rarity to the hash stuff. Instead I'd suggest salting the file name into it like hash(text + "modLocations.js") or something that helps to turn it into a more unique thing

@Senexis
Copy link
Contributor Author

Senexis commented May 23, 2018

That wouldn't work. This would result in double items like "Bailiff" still having the same ID (reference, see enemyModTables) in the same file. There has to be something unique added to it other than something which is the same in both cases, and drop chance and mod items count is as close as it will get that's both unique (enough) and consistent across updates.

@atomicptr
Copy link
Member

atomicptr commented May 23, 2018

Hm, how about adding a string that explains what this is then?

hash(text + "this is mod locations")

or something, just to salt it.

Because adding rarity values to it will cause the ID to change every single time DE changes the rarity which makes _id obsolete because it always changes and you can't tell if it's still the same field as before.

Edit: What is salted into it isn't relevant, but it shouldn't be something that might change, the purpose of the ID is to make the entry "unique" and I agree that it should probably be unique for each use case. But lets say the reward "400 Endo" from Blah Rotation B should always have the same ID no matter what DE thinks the drop rate should be. Potential values to salt the hash in this example btw are game mode, rotation, location etc and maybe a random string that describes it.

@Senexis
Copy link
Contributor Author

Senexis commented May 23, 2018

That's the thing, though, anything based off just a text string is going to cause duplicate IDs. Any static salt will which is file-based. Therefor we have to use an object-based salt, which this is...

@atomicptr
Copy link
Member

Hmm... yeah I see that... the question is in taht case if it wouldn't make more sense to merge these two 🤔

@atomicptr
Copy link
Member

I don't even know WHY they did this this way lol...

@Senexis
Copy link
Contributor Author

Senexis commented May 23, 2018

You mean like an "_id" (text only) field in addition to "_uid" (text + other properties) or something?

Yeah, the best way would have been to have THEM set an ID for every item. Additionally, why is it that one enemy has multiple drop tables? 🤔


mod.enemies.push({
_id: hash(text),
_id: hash(`${text} ${enemyModDropChance} ${chance.rarity}`),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now there's spaces, which shouldn't be there....

Copy link
Contributor Author

@Senexis Senexis May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't matter to be fair, as the output will be the same length and will change no matter what due to the earlier changes.

@TobiTenno
Copy link
Member

I think, going from an object-based standpoint, it should be treated as, effectively, a different object, since the drop chance did change.. we could almost think of the id as also representing a specific state.

@Senexis
Copy link
Contributor Author

Senexis commented May 23, 2018

This is why I wrote it like this. It doesn't matter as much whether the ID changes sometimes, compared to being an identifier for that object, and that object alone.

What's the point of an identifier if it's not unique? If a non-unique identifier is something people are looking for, they could just use the text field.

Edit: Only using the chance and and item name would cause less identifier changes. However, this would not fix cases like the Eidolon Gantulyst (2 entries, same chances), not solving the original problem. I don't think there's any way around needing to effectively hash the entire object to be truly unique.

@atomicptr
Copy link
Member

Yes it should be unique, but I also don't like the ID changing (every time) because this makes it fairly obsolete. The entire reason why I added this in the first place is to check if an object changed between builds, if you change the ID there is no link between them anymore. Which is why I'm against that.

One idea to solve this problem would be to merge the drop lists of these multi entry things, because I don't see a good argument to have 2x "Drop by enemy" entries at all. If that doesn't work, we could maybe add a counter for occurrences of names, this would rely on DE having a consistent order in the list tho....

@Senexis
Copy link
Contributor Author

Senexis commented May 23, 2018

That would be an option. However, this wouldn't solve cases where the drop chance differs. How would you denote the drop chance for the entire table? It would need to be something like:

{
  "name": "Bailiff",
  "drops": {
    "1.25": [
       "items..."
    ],
    "25.00": [
       "items..."
    ]
  }
}

Edit: Another option might be to rename the duplicate items themselves into something like "Bailiff" and "Bailiff (Alternative)", though that would cater to only 2 duplicate items, and what would you do with more than 2? Something like "Bailiff (1)" and "Bailiff (2)" seems a bit blunt.

Edit 2: Won't work, there's no guarantee that the order DE prints their drops is consistent, so it could occur that "Bailiff (Alternative)" gets to be first sometime and become "Bailiff", bringing us right back to square 1...

@atomicptr
Copy link
Member

Have you found a case where they're actually different? Bailiff and Eidolon Gantulyst seemto be the same. If that happens you could reference that value in the drop itself tho.

@Senexis
Copy link
Contributor Author

Senexis commented May 24, 2018

Yes -- the Commander has this issue.

Referencing the table drop chance and mod drop chance in the objects will cause more data as in bigger objects, which is more than what it's worth. I would still prefer changing the ID over the structure of objects to be honest.

@atomicptr
Copy link
Member

I would still prefer changing the ID over the structure of objects to be honest.

Might as well remove the _id then, it's completely useless for its intended purpose if it changes whenever the data gets changed. Tho I agree with not trying to change the structure of objects.

Edit 2: Won't work, there's no guarantee that the order DE prints their drops is consistent, so it could occur that "Bailiff (Alternative)" gets to be first sometime and become "Bailiff", bringing us right back to square 1...

I wanted to use the counter only for salt, assuming that most of the time they don't change the order this makes this option a lot more viable then the other ones (though still not perfect obviously), also because this doesn't affect a lot of objects. I don't know if there is a better solution 🤔.

@Senexis
Copy link
Contributor Author

Senexis commented May 24, 2018

I actually think DE not printing their list in a set order is how this issue occurs in the first place, so I really wouldn't go that route.

Maybe it's useless for that purpose but it would still be useful for any other purpose. You would still have the name to identify it by, don't forget that. 😉

Also, the only time the id is going to change is if the drop chance or amount of mods change, which isn't that much of an issue, is it?

@TobiTenno
Copy link
Member

I think I've mentioned that if we sorted after applying the ids and such, it would fix the issue here: #18

So, i'm ok with this if we add sorting as well.

Nadermane added 2 commits May 24, 2018 20:27
Reverts the dynamic salt in 90% of the cases, so comparisons can be more easily made. Now, when the same ID has been found in an array, try to hash it with some properties until giving up eventually and just skipping adding the object when it's the _exact_ same. This applies to enemyModTables.js and modLocations.js, as those are the currently known cases of duplicates occurring.

Also refactors the other files having inconsistent formatting.
@Senexis
Copy link
Contributor Author

Senexis commented May 24, 2018

There we go. I went the middle route with this. Objects whose IDs are unique are not changed at all by this, making comparison easier in theory. Objects whose IDs are not unique get additional values added to the salt until falling back onto the entire JSON object stringified. If even that fails, we have a duplicate object which gets removed, so this acts as both a unique-ifier and a de-duplicator at the same time.

Diff is available at my gh-pages as soon as Github decides to update it.

Edit: You may notice more removals and additions than before. This is because these items just happened to go OK on the current iteration, but they could have been compared with each other on a whim. These potential issues are also fixed now.

Edit 2: @atomicptr There is no way to keep the transients from not being counted as removals and additions in this state (adding an _id field). However, in the next actual update from DE it will be compared as it should, so it will resolve itself then.

Edit 3: @TobiTenno Sorting won't help as the diff checker is already sorting the items, as mentioned in the issue as well. Have tested this, doesn't change anything.

Nadermane added 2 commits May 24, 2018 21:04
This reverts my previous fix for ememy > enemy for now and discards all changes to the object already in the array when all previous checks fail. This causes less removals and additions when just de-duplicating the array as the original item doesn't get edited at all.
@Senexis
Copy link
Contributor Author

Senexis commented May 24, 2018

Those commits make this solution dang-near perfect. It now provides the least amount of id changes I can do while still fixing the issue and de-duplicating the arrays.

Also since my question about the "ememy" fix got completely ignored (no worries about that!), I just reverted my "ememy" fix for now.

Edit: I know checking against == true is not needed, will remove if there's another change request. Leftover from stupidity.

Copy link
Member

@atomicptr atomicptr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to duplicate the checkUnique function, move that to utils or something and also don't use "==" for comparison but use "===" instead. Beyond that it looks fine.

@Senexis
Copy link
Contributor Author

Senexis commented May 25, 2018

It isn't a duplicate function, one uses the length of a collection, the other uses the value of "chance". I'm sure there's a better way to write that, but I can't think of it right now. Will fix the comparisons soon.

@atomicptr
Copy link
Member

atomicptr commented May 25, 2018

It's almost 90% the same function tho, which is a duplication. You could pass a function as parameter that returns the value it's using for comparison or something along those lines:

checkUnique(..., obj => obj.chance);
checkUnique(..., obj => obj.collection.length);

Nadermane added 2 commits May 25, 2018 21:05
Adds checkUniqueHash(items, source, [...properties]) function. Expects "_id" field to exist within the source object, and experts the first property to be the value that's the default hash.
@Senexis
Copy link
Contributor Author

Senexis commented May 25, 2018

Took a while (due to the very nice weather here), but there it is. Fully modular (apart of the "_id" field being required). Can easily be used in other places now.

Copy link
Member

@atomicptr atomicptr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor nitpick, but looks good otherwise :)

lib/utils.js Outdated
let sourceHash = []

// Add one to the properties length to add an additional static check.
for (var i = 0; i < (properties.length + 1); i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nitpick, but I'd change var to let because I don't see a reason why you'd want to hoist i. This can introduce some unexpected/unintended behavior.

If you know how variable hoisting works in JS you can ignore this, if not this saves one Google search :).

var declares the variable in the entire function not just the scope:

function f() {
    for(let i = 0; i < 10; i++) {
        for(var j = 0; j < 10; j++) {
            // do shit
        }
    }
    
    console.log(j); // works fine even though we're not in the scope anymore, thank you var? :P
    console.log(i); // this will throw an ReferenceError because i was used with let which doesn't hoist the variable
}

f();

// j doesn't exist here obviously...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change the code template I use. Simply a side effect of using the "for" template. Will fix.

@atomicptr
Copy link
Member

You should probably also merge the changes from the main branch into this and rebuilt, because #19 introduced a change in the data.

@Senexis
Copy link
Contributor Author

Senexis commented May 28, 2018

Would suggest you merge this and do the rebuild yourself. My fork is too messed up at this point unfortunately due to my own stupidity and doing this all on my gh-page branch.

@atomicptr
Copy link
Member

Done :), ty for your work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants