-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename wip #128
Rename wip #128
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code performs renames on files where the hash does not change and on files where the name minus digits is the same, correct? Did I miss a case? How does this do compared to the current rename code in swupd-server?
swupd/rename.go
Outdated
func trimrenamed(a []*File) []*File { | ||
r := make([]*File, 0, len(a)) | ||
for _, f := range a { | ||
if f.Status == StatusGhosted { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, a ghosted file can be a renamedfrom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Perhaps I need to think a little more about what a ghosted file can be.
We certainly need to add a test case for this.
My reasoning is that a ghosted file shouldn't be a candidate for a renamefrom, as the lifetime of the file is not managed by swupd-client. In particular a ghosted file could be deleted a minute after it is installed. I agree that a swupd verify --fix
would find it missing, and reinstall it, which could be annoying. An example I can think of is kernel modules being removed from /usr/lib/modules when they have been added to an initrd, which is stored on another partition.
Since you added the ghosted code, do you think that there is a requirement for the ghosted files to exist until there is an update? Put it another way, do you think 'ghosted' means that the files are already dead, or just that we don't have to kill them?
Of course we shouldn't panic here, just skip the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole point of the ghosted file is to enable renames for kernel files. I think currently we actually deprecate old ghosted files immediately after every build, which is the wrong answer (I will add a unit test to check this and fix if necessary). In that case the client would attempt a retry but fail and proceed as normal - we do know that is the current state of the client.
Ghosted only means we don't have to kill them on the client.
swupd/rename.go
Outdated
} | ||
|
||
// makepairs returns a sorted by name, disregarding digits | ||
// secondary sort key is the original name. This is mainly inteneded for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/inteneded/intended/
swupd/rename.go
Outdated
// secondary sort key is the original name. This is mainly inteneded for | ||
// cases like with python we have the same filename under both python3.6 and python2.7 | ||
// it would look weird to rename the old python2.7/file1 to python3.6/file2 and at | ||
// the same time rename the old python3.6/file1 to python2.7/file2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice example.
swupd/rename.go
Outdated
pair[i].n = strings.Map(stripdigits, f.Name) | ||
} | ||
sort.Slice(pair, func(i, j int) bool { | ||
if pair[1].n == pair[j].n { // Same striped name, sort on original name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/striped/stripped/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why pair[1]
? Should it be pair[i]
?
swupd/rename.go
Outdated
// cases like with python we have the same filename under both python3.6 and python2.7 | ||
// it would look weird to rename the old python2.7/file1 to python3.6/file2 and at | ||
// the same time rename the old python3.6/file1 to python2.7/file2. | ||
func makepairs(a []*File) []pairednames { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really understanding the makepairs
name here since it doesn't pair any filenames here. Do you instead mean makepares
and parednames
or something since you are kind of paring the filenames?
swupd/rename_test.go
Outdated
} | ||
|
||
// filelist takes a list of wanted shortnames, looks them up in sn, and returns | ||
// the a list of pointers to copies of the Files. Because they are copies they can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the
or a
at the beginning of this line.
swupd/rename_test.go
Outdated
|
||
// checkLinked tests forward and backward pairing of DeltaPeers | ||
func checkLinked(t *testing.T, a, b *File, description string) bool { | ||
if a.DeltaPeer != b { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You check if b is nil
below but not if a
is nil here?
partner []int // Indexes of add that should match remove | ||
}{ | ||
{description: "Rename of file, same contents", | ||
remove: []string{"I1", "I2"}, add: []string{"I2"}, partner: []int{0, -1}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I1 and I2 are removed but I2 is also added? When could this be possible?....... Oh, shortnames. Also forgot that sn
is being generated.
I'm leaving this so you can see my thought process, no action needed.
swupd/rename_test.go
Outdated
for _, tc := range tests { | ||
add := filelist(t, sn, tc.add) | ||
remove := markdelete(filelist(t, sn, tc.remove)) | ||
// fmt.Println(add) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug print.
swupd/rename_test.go
Outdated
|
||
func TestRename(t *testing.T) { | ||
sn := genTestArray(t) | ||
// for f := range sn { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
1cef116
to
093f8aa
Compare
swupd/rename.go
Outdated
if f.Status == StatusGhosted { | ||
// TODO. Do we allow ghosted files to be source files for renames? | ||
// I think not, as we don't manage their lifetimes, so they could | ||
// be deleted. The counter argument is that rename is only an optimization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The counter argument is essentially my argument. The whole point of ghosted files is to allow renames with them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will allow ghosted files as sources for renames. It makes the code shorter, even if it makes the code longer.
093f8aa
to
1ffda8c
Compare
I'll add it with this PR as a base. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this implementation. @cmarcelo please take a look.
Questions: does it ever make sense to use a deleted file as a source of rename? Also, isn't checking the Status flag enough? |
Deleted files are the only source of renames so it always makes sense to use a deleted file as the source, unless it is too small.
Not if the file is under 200 bytes. Experimentation with bsdiff in the old swupd-server showed us that bsdiff does not perform well with files under that threshold, sometimes even resulting in segfaults I believe. |
swupd/rename.go
Outdated
rx++ | ||
default: // Equal hash, so link | ||
linkrenamepair(af, rf) | ||
ax++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means 'added index' I assume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I said in one of my code reviews that using i
and j
wasn't very helpful as the two index values, as there was no connect in my mind between i
and one of the added and removed slices, and therefore I needed to look very hard to see if the correct one was being incremented. Go programs traditionally use short variable names, and i
is fine for a single index, but for this loop where we are stepping down two lists looking for matches I feel we need something better than i
and j
. Hence ax
and rx
for the added and removed index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Es ok, I figured it out pretty quickly. I just have PTSD from a different codebase using completely terrible names, so when I see 1-2 letter vars I look into it too hard 🔨 Was mainly clarifying for myself/anyone else.
swupd/rename.go
Outdated
rx++ | ||
} | ||
} | ||
// Link things with the same names skippingdigits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/skippingdigits/skipping digits ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add the missing space in "skippingdigits".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After explanations from Tudor and Matthew things are clearer. I'll take a look at this after lunch.
swupd/rename.go
Outdated
} | ||
|
||
// trimrenamed returns an slice which has had files with DeltaPeers purged | ||
func trimrenamed(a []*File) []*File { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: to be consistent with the rest of the codebase, prefer camelCase names like trimRename, linkRenamePair and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. My C background is showing! Please continue to call these out.
swupd/rename.go
Outdated
a.DeltaPeer = b | ||
b.DeltaPeer = a | ||
a.Rename = true | ||
b.Rename = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to set b.Type = TypeUnset
to match current swupd-server behavior. Probably rename a
and b
to renameTo
and renameFrom
or similar to avoid confusion as well.
) | ||
|
||
// BSDIFFSIZE is smallest size of file to consider using bsdiff | ||
const BSDIFFSIZE = 200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: I've added a similar constant (but with a more detailed name and background story) in the delta integration PR. If that PR lands first, reuse it.
swupd/rename.go
Outdated
return // nothing to rename | ||
} | ||
added = trimrenamed(added) // Make copies of input slices, tidy up whilst we are here | ||
removed = trimrenamed(removed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get the tidying up here. Both lists shouldn't have any DeltaPeer at this point. Maybe you just need a function to copy the slices (or copy them inline).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy inline would work, I was just reusing something I had around....
swupd/rename.go
Outdated
// First remove small files (not worth sending diff) and files which have | ||
// exact match | ||
added = trimsmall(trimrenamed(added), BSDIFFSIZE) // Make it explicit we are doing two steps | ||
removed = trimsmall(trimrenamed(removed), BSDIFFSIZE) // Can trivially make it one pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with your comment. Do that in a single pass, otherwise you allocate space twice for both trims.
swupd/rename.go
Outdated
sort.Slice(pairs, func(a, b int) bool { | ||
if pairs[a].n == pairs[b].n { // Same stripped name, sort on original name | ||
// mainly for python2.x vs 3.x | ||
return pairs[a].f.Name < pairs[b].f.Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if would be sufficient to use sort.SliceStable in this case. (If is just the Python case looks so)
swupd/rename.go
Outdated
added = trimsmall(trimrenamed(added), BSDIFFSIZE) // Make it explicit we are doing two steps | ||
removed = trimsmall(trimrenamed(removed), BSDIFFSIZE) // Can trivially make it one pass. | ||
//generate the pairs of *File and short name | ||
pa := makePairedNames(added) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: the section from here to the end of the function is a strong candidate to have some unit test cases, so I would extract it into a separate function and have some concrete tests. I know there are wider integration tests, but for more "hairy" cases of this logic, a test without the rest of the luggage might come handy.
swupd/rename.go
Outdated
// digits so a new release of a shared library is matched to previous ones | ||
type pairednames struct { | ||
f *File | ||
n string // Filename with digits removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: would call it 'short' or 'shortName'. With all things equal, 'n' sounds like more a number than a reference to the filename without the digits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
swupd/rename.go
Outdated
r := make([]*File, 0, len(a)) | ||
for _, f := range a { | ||
if f.Info == nil { | ||
panic("Missing f.Info for " + f.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return error, and let the caller return the error up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is a logic error. The information needs to be passed in. This is like assert() in C, it is a "shouldn't happen" case.
swupd/rename.go
Outdated
continue | ||
} | ||
if bundleChroot == "" { | ||
bundleChroot = filepath.Join(c.imageBase, fmt.Sprint(m.Header.Previous), m.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you can set bundleChroot (unconditionally) outside the loop before it starts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can, but I am setting it conditionally as there is a reasonable chance bundleChroot will not be needed at all.
partner: []int{-1, 1, 0}}, | ||
{description: "Rename files pairwise longer reversed should link", | ||
remove: []string{"S1", "L3", "S2", "L1"}, add: []string{"L4", "L2"}, | ||
partner: []int{-1, 0, -1, 1}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: it is easy to get confused with partner indices. I would consider making the expected result look more like "L3->L4 L1->L2" or ["L3->L4", "L1->L2"]. It is slightly more work in the test mechanics but I think pays off later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it is easy to get confused with partner indicies, and yes, I did think about expressing the renames differently. I will try implementing it and see how it goes.
_rwx = syscall.S_IFREG + 0644 | ||
) | ||
files := []struct { | ||
n string // short name for the tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ended up being confusing, because there's also the notion of "shortname" above that is the name without digits.
Optional: I think the indirection here doesn't add much. I would explore one of those two venues.
(a) Instead of having generateTestArray, have a "createTestFile" (with a better name), that takes the parameters and creates a *File (or something else) with the properties you want. Then in the top of the test you create those assigning variable names that could act as your short name. The arrays would use those variables instead of strings.
(b) Use the real filenames in the test. The testcase entries will be multiline, but I think its fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention with the generateTestArray is to provide a pool of names that tests can draw on, rather than having each test create their own. I look at some of the manifest tests for example and it seems like 90% of the test is setup. It probably needs more tests to be written to make the advantage clear.
Perhaps I should set the field which has the name without digits to be partialName, which fits in better with another use case for the field as well.
Initially I did use the full names in the tests, but they made the tests long enough that my eyes started glazing over. When you have several similar but different names I found the L1, L2 type short names much easier to reason about.
swupd/rename.go
Outdated
continue | ||
} | ||
if bundleChroot == "" { | ||
bundleChroot = filepath.Join(c.imageBase, fmt.Sprint(m.Header.Previous), m.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The panic happens here because the bundle-chroot does not exist anymore when this happens, and only "full" is available. This should always use "full" instead of m.Name since not everyone will build with --keep-chroots flag on, so there won't be a dir to stat() anymore. 🙌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be resolved. No more panics for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just did what @tmarcu told me to do!
Just pushing a rebase to fix the delta.go conflict
eb81eab
to
0ef89de
Compare
0ef89de
to
0e2f327
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 after you rebase and fix conflicts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good comment on this.
I want to run a bigger test on this like last time before this is merged.
eaab8c6
to
142730f
Compare
The last commit looks wrong. |
You are welcome to tell me why. |
Because it is the commit message from my swupd index commit but the content of the commit from my rename tests pr |
And now that I'm looking at it the "Add CreateManifests tests..." Commit does not actually contain the tests. |
Needed for the rename code. Currently the old Manifest does not have the Info field populated in its files, but the rename code wants to know the size of the files to see if it makes sense to use a deleted file as the source of a rename. This means that we will need to call os.Stat on the files, and we need the config to tell us where they are. Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
Create a function `sizer` (Probably ought to be Sizer or a bettername) that allows one to specify what Size() returns. Used in testing. Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
Address some review comments Fix m.Name -> "full" to avoid panic in mixer tests Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
The test creates a file "foo1" in version 10 with contents "foo1", and a file "foo" with contents "foo". In version 20, the file "foo" is removed. In version 30, the file "foo" is created with contents "foo1", and the file "foo1" is removed. With rename detection, it correctly identifies that the version 20 foo1 file should be renamed to the version 30 foo file. Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
Signed-off-by: Matthew Johnson <matthew.johnson@intel.com>
Clear the File.Rename flags on the old manifest. This is symptomatic of a deeper issue, we are reusing data but the data structures are set up incorrectly. A rename is a property of a pair of releases, not a file. Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
The names were missing a leading / Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
142730f
to
16111db
Compare
Time for a code review.