Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SConsign sometimes mangles cached node md5sum. #2980

Closed
bdbaddog opened this issue Jan 2, 2018 · 3 comments
Closed

SConsign sometimes mangles cached node md5sum. #2980

bdbaddog opened this issue Jan 2, 2018 · 3 comments

Comments

@bdbaddog
Copy link
Contributor

bdbaddog commented Jan 2, 2018

This issue was originally created at: 2014-10-17 17:01:38.
This issue was reported by: wblevins001.
wblevins001 said at 2014-10-17 17:01:38

Original problem description:

Hi,

While using SCons (version 2.3.0) together with git I encountered some problems with MD5 calculations. After rebasing to different revisions of my repository I noticed strange cache misses (the proper .o files should be already in the cache while they were actually rebuilt). This led me to the conclusion that some MD5 values may be calculated incorrectly. In fact, after using the script sconsign.py it became apparent that some MD5s in .sconsign.dblite were incorrect (the sums calculated directly using hashlib were different).

To sum up, after this simple procedure:

rm .sconsign.dblite
git checkout SOME_VERSION_1
scons
git checkout SOME_VERSION_2
scons

sconsign.py reported incorrect MD5 sums.

As it turned out, the problem is localized in Node/__init__.py. The code

then.extend([None] * diff)

in line 1050 causes desynchronization between two lists: the current dependencies and info nodes for dependencies from the previous build (containing, among others, previously calculated MD5 sums).

For example, consider that in SOME_VERSION_1 a file has five dependencies:

A B C D E

and in SOME_VERSION_2 a new dependency appears:

A B C X D E

After line 1050, the lists "then" and "children" look like below:

then:     A B C D E None
children: A B C X D E

Code further below updates children information. As it appears, such desynchronization does not cause any problems if we are checking MD5s each time: they just will be recalculated for X, D and E. But if we also look at the timestamps, the condition of timestamp remaining the same allows SCons to copy the MD5 sum from the "then" array to the current child object. Consequently, if D and E have the same modification date, E's checksum will be incorrectly copied into D's info. And this is exactly the error that I observe.

My quick fix builds "then" list in a proper way, .i.e. without desynchronization (please see the attachment):

then:     A B C None D E
children: A B C X    D E

Please look at it and tell me what you think. I don't know the SCons' sources very well so my fix may be a little naive but this is what I can provide for a start.

Thanks,
Piotr

wblevins001 said at 2014-10-17 17:05:03

please find attached a simple test that reproduces the bug. I tested it on Linux. Below is my example output with discrepancies highlighted:

$ ./test.sh ~/tools/scons
Note: checking out '6ee27c0db68515e3faa71bf25c0ed7a883ecf697'.
 
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
 
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
 
  git checkout -b new_branch_name
 
HEAD is now at 6ee27c0... initial
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
g++ -o main.o -c main.cpp
g++ -o main main.o
scons: done building targets.
Previous HEAD position was 6ee27c0... initial
Switched to branch 'master'
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
g++ -o main.o -c main.cpp
scons: done building targets.
=== .:
SConstruct: None
alpha.h: 3518d8c3310bba3c2a0da5489f569e14
beta.h: 2ff783593a2224d0574181661ab5f1b7
gamma.h: 2ff783593a2224d0574181661ab5f1b7
main: b53741b91f13474487bbbc1efe5e328b
main.cpp: ebf971b7da9c98f91067a9c601e96259
main.o: faaa740a3704b2c8f034bafd7c1722ff
=== /usr/bin:
g++: bb36ec5f9b8a8bfc02d1c4bf917a58e1
3518d8c3310bba3c2a0da5489f569e14  alpha.h
bca4755df4cd2eef7d51bfb007c36dd1  beta.h
2ff783593a2224d0574181661ab5f1b7  gamma.h
ebf971b7da9c98f91067a9c601e96259  main.cpp

wblevins001 said at 2014-10-17 17:05:57

Created an attachment (id=951)
Potential patch

wblevins001 said at 2014-10-17 17:06:58

Created an attachment (id=952)
Simple Test Case

wblevins001 said at 2014-10-19 12:06:18

Unfortunately, this patch isn't just a drop-in. See the following tests:
test/Repository/JavaH.py
test/Repository/RMIC.py
test/Repository/signature-order.py
test/Repository/variants.py
test/option/tree-lib.py
test/subdivide.py

wblevins001 said at 2014-11-01 20:41:26

I investigated test/option/tree-lib.py failure, and interestingly a rebuild happens since the generated map isn't the same as the sum of all binfo.

Why you ask? It appears that the info list contains the same entry twice which is inherently pruned by creating the initial map. Should the same node be listed twice? I do not know.

wblevins001 said at 2014-11-01 22:02:03

I'm gonna do some investigation here ;)

wblevins001 said at 2014-11-02 00:52:15

Looking at test/subdivide.py

In order to map from binfo to the current list of children, there must be guaranteed access to a disambiguate (or path normalization) method for comparing two Nodes. It seems that Node.changed always gets called twice: once by taskmaster.prepare and once by the builder (if I remember correctly). Not sure why this happens twice, but the first time self.children are str types and the second time self.children are Node types. The taskmaster.prepare call chain will not map to anything after a directory change due to the "node string" changing relative to the call dir.

I don't see a simple way to resolve this issue.

wblevins001 said at 2014-11-02 00:59:50

Correction to the last comment: binfo nodes show as str types first and Node types second pass. Children were always node types.

Not sure why binfo changed? Either way this is a classic strings as keys
issue.

wblevins001 said at 2016-01-14 17:18:32

...

Votes for this issue: 1.

wblevins001 attached csig_fix.diff at 2014-10-17 17:05:57.

Potential patch

wblevins001 attached csig_bug.tar.gz at 2014-10-17 17:06:58.

Simple Test Case

@williamblevins
Copy link
Contributor

williamblevins commented Mar 29, 2018

I was able to reproduce this test on Debian 9.4 with both scons 2.3.0 and 3.0.1.

If I remember correctly, this issue was submitted via the mailing list and I just created the legacy Tigris issue. Piotr created all the attachments, but I thought I could get it ready quickly. What you should be looking for in the test outputs is that SCons thinks beta.h and gamma.h have the same signature, but they shouldn't.

@bdbaddog
Copy link
Contributor Author

bdbaddog commented Mar 29, 2018 via email

@bdbaddog
Copy link
Contributor Author

bdbaddog commented Mar 29, 2018

Interesting. So the cause is two fold.

  1. As listed above, in some situations the array of csigs will contain values which don't map to the corresponding file
  2. When using Datestamp-md5, the function called to check if the file has changed will propagate the wrong csig into the .sconsign which is written by SCons before it exits if the timestamp hasn't changed.

In this case beta.h is getting assigned the correct md5 for gamma.h..

    def changed_timestamp_then_content(self, target, prev_ni):
        if not self.changed_timestamp_match(target, prev_ni):
            try:
                # Some debug code to catch the issue..
                if str(self) == 'beta.h' and prev_ni.csig == '2ff783593a2224d0574181661ab5f1b7':
                    print("in problem code")
                print('Setting csig [%s]:%s'%(str(self),prev_ni.csig))
                self.get_ninfo().csig = prev_ni.csig
            except AttributeError:
                pass
            return False
        return self.changed_content(target, prev_ni)

bdbaddog added a commit to bdbaddog/scons that referenced this issue Mar 29, 2018
… but as discussion indicates breaks other tests. WIP
bdbaddog added a commit to bdbaddog/scons that referenced this issue Mar 29, 2018
… only applied when the number of children in the current build doesn't match the number in the previous build as retrieved from the sconsign file.
bdbaddog added a commit to bdbaddog/scons that referenced this issue Mar 29, 2018
bdbaddog added a commit to bdbaddog/scons that referenced this issue Apr 22, 2018
There are still possible errors due to timestamp-MD5 + cachedir + changed implicit or regular dependencies (but the same # of such as the previous build).
These are not yet handled as the fix being used for changed number of such breaks a number of tests.
This fix doe reduce the number of possible issues.
bdbaddog added a commit to bdbaddog/scons that referenced this issue Jul 11, 2018
… but as discussion indicates breaks other tests. WIP
bdbaddog added a commit to bdbaddog/scons that referenced this issue Jul 11, 2018
… only applied when the number of children in the current build doesn't match the number in the previous build as retrieved from the sconsign file.
bdbaddog added a commit to bdbaddog/scons that referenced this issue Jul 11, 2018
bdbaddog added a commit to bdbaddog/scons that referenced this issue Jul 11, 2018
There are still possible errors due to timestamp-MD5 + cachedir + changed implicit or regular dependencies (but the same # of such as the previous build).
These are not yet handled as the fix being used for changed number of such breaks a number of tests.
This fix doe reduce the number of possible issues.
bdbaddog added a commit to bdbaddog/scons that referenced this issue Nov 12, 2018
… but as discussion indicates breaks other tests. WIP
bdbaddog added a commit to bdbaddog/scons that referenced this issue Nov 12, 2018
… only applied when the number of children in the current build doesn't match the number in the previous build as retrieved from the sconsign file.
bdbaddog added a commit to bdbaddog/scons that referenced this issue Nov 12, 2018
bdbaddog added a commit to bdbaddog/scons that referenced this issue Nov 12, 2018
There are still possible errors due to timestamp-MD5 + cachedir + changed implicit or regular dependencies (but the same # of such as the previous build).
These are not yet handled as the fix being used for changed number of such breaks a number of tests.
This fix doe reduce the number of possible issues.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants