Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Populate LogData Entries #1006

Merged
merged 1 commit into from
Nov 21, 2017
Merged

Populate LogData Entries #1006

merged 1 commit into from
Nov 21, 2017

Conversation

Maithem
Copy link
Contributor

@Maithem Maithem commented Nov 17, 2017

This patch populates the global address for all entry types on
reads.

Copy link
Contributor

@rogermichoud rogermichoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I am sure there is some code here and there that "worked around it". Should we try to refactor it as well (in a seperate PR) ?

Copy link
Member

@no2chem no2chem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that this PR is probably trying to fix some code that calls getGlobalAddress on an empty/trimmed address. But, there are several issues with this PR. For example,

LogData.getEmpty(0).equals(LogData.getEmpty(1))

is now false.

Where is the behavior you are trying to fix happening?

@Maithem
Copy link
Contributor Author

Maithem commented Nov 17, 2017

@no2chem is there a typo in your example, because I don't see it. You are comparing two different slots in the address space, so I expect them to be different (i.e they are not equal). Also, we need to agree on the definition of equality here. Do you define equality solely based on the type of entry ?

@Maithem
Copy link
Contributor Author

Maithem commented Nov 17, 2017

@rogermichoud are you referring to the fact that some interfaces pass the address long and the LogData entry, hence redundancy ? Can you point of instances of the workarounds. Maybe they should be addressed in this patch.

@no2chem
Copy link
Member

no2chem commented Nov 17, 2017

@Maithem Well, you added an address field to an empty return type, before, an empty address would equal an empty address no matter where the read was from.

As in,

LogData.EMPTY.equals(LogData.EMPTY);

What are you trying to fix? Where is there code that is querying an empty log data for its address? I'd rather fix that than add additional new-object creation just to pass the address (which the reader should already know!)...

@Maithem
Copy link
Contributor Author

Maithem commented Nov 17, 2017

@no2chem perhaps we need to override the equality logic to compare on the address (or all the contents if we are paranoid) because its only based on references right now. For example, some logic might want to expect that LogData.getEmpty(0).equals(LogData.getEmpty(0)) is true.

@Maithem
Copy link
Contributor Author

Maithem commented Nov 17, 2017

@no2chem this change was based on my assumption of the modelling, where a LogData type represents a slot in the address space.

Essentially, my intention is to bind trimmed and hole addresses to a specific address. I did it for those two types and extended it to the empty type for the sake of uniformity.

@no2chem
Copy link
Member

no2chem commented Nov 17, 2017

Why should empty and hole be bound to a specific address? Have you considered checkpoints, which also generate ILogData, as well?

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 51f150a on EntryAddresses into ** on master**.

@Maithem
Copy link
Contributor Author

Maithem commented Nov 18, 2017

When manipulating LogData collections, it is useful to have LogData entries know their global address, otherwise they have to be wrapped with different structures to include their address.

Yes, all checkpoint entry types populate the global address metadata correctly, or as expected.

@no2chem
Copy link
Member

no2chem commented Nov 18, 2017

I don't know, this increases the size of serialization as well... for example if I return a map of addresses to logdatas, and I read a range of 0 to 4

I get
0 -> empty(0)
1 -> empty(1)
2 -> empty(2)
3 -> empty(3)

When empty(0) semantically is the same as empty(1).
When are we manipulating a log data collection in a way where we need to know the address of an empty log data? When is it useful?

@Maithem
Copy link
Contributor Author

Maithem commented Nov 18, 2017

@no2chem How is that any different than?
0 -> empty(-1)
1 -> empty(-1)
2 -> empty(-1)
3 -> empty(-1)

Here: #1004

That's just an artifact of how the address space is modeled, having a single instance of LogData.Empty
on the server side is not an optimization and on the client size we don't deal with many Empty entries anyways, because we fill them up.

@no2chem
Copy link
Member

no2chem commented Nov 18, 2017

@Maithem because empty without an address can be represented as a single bit, but with an address will require 8 bytes.

More importantly, I'll repeat my question again:

When are we manipulating a log data collection in a way where we need to know the address of an empty log data? When is it useful?

@Maithem
Copy link
Contributor Author

Maithem commented Nov 18, 2017

@no2chem We never persist empty entries. So, if you think that 8 extra bytes will incur a significant perf impact in terms of transmission, serialization and memory footprint, then we can settle this by running some perf tests.

In my last response I linked the PR that does make use of the populated entries, here's the link again #1004

@no2chem
Copy link
Member

no2chem commented Nov 18, 2017

@Maithem You just referenced #1004 without specifying why you referenced it.

Either way, this PR seems to be about writes, and you can't write EMPTY, I guess you could write 'hole', but IMO hole should be stored in a bitmap, and I'm not sure if writers should be able to write a hole through the write() interface. But I'm OK with holes having an (optional) address in this case.

@Maithem
Copy link
Contributor Author

Maithem commented Nov 18, 2017

@no2chem I referenced because you asked me where this behavior is needed. The added code depends on having uniformity between all LogData types. See, its not hard to code around this and avoid all of this discussion together, but my point is, thats how we modeled the address space and therefore I should be able to use it as such. Having to code around these special cases makes the code messy. If you dont like the model let's change it, but for now this change seems reasonable with almost no impact.

Anyways, Ill fix the equality issue and resubmit this patch, let me know if you have more concerns.

@no2chem
Copy link
Member

no2chem commented Nov 18, 2017

@Maithem Ok, I think this is fine, but if we go this route, we should eliminate the use of maps in ReadResponse and other places, because that information is redundant.

@Maithem Maithem force-pushed the EntryAddresses branch 2 times, most recently from fb09c0f to d39fe31 Compare November 20, 2017 20:37
@Maithem
Copy link
Contributor Author

Maithem commented Nov 20, 2017

@no2chem the equality example that was mentioned earlier is actually not only specific to EMPTY, it will fail on any two LogData objects that are that aren't the same reference. For example, writing to address x and reading address x twice will yield two different objects that are not equal (i.e. even if they are the same address). The problem is LogData doesn't implement equals, which I just added to this PR. Please review again.

This patch populates the global address for all entry types on
reads.
@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 2979284 on EntryAddresses into ** on master**.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 2979284 on EntryAddresses into ** on master**.

@rogermichoud
Copy link
Contributor

@Maithem What I meant is maybe there is some code out there that is especially cautious about using getGlobalAddress on LogData because it was expecting to get a -1 instead of the real address. I was skimming through the getGlobalAddress usage and I didn't find anything obvious.

I recall that I had to use a workaround in the FastLoader to pass the addresses with the list of LogData because I couldn' t safely get the address from each instance. I think this is too tedious to search for these occurrences and we just going to live with the fact that some code out there was built "working around" this bug. Shouldn't be harmful.

@Maithem
Copy link
Contributor Author

Maithem commented Nov 21, 2017

@no2chem this PR is ready

Copy link
Member

@no2chem no2chem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@no2chem no2chem merged commit 867f3bb into master Nov 21, 2017
@Maithem Maithem deleted the EntryAddresses branch November 21, 2017 22:53
Copy link
Collaborator

@medhavidhawan medhavidhawan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants