Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename wip #128

Merged
merged 8 commits into from
Feb 6, 2018
Merged

Rename wip #128

merged 8 commits into from
Feb 6, 2018

Conversation

icarus-sparry
Copy link

Time for a code review.

Copy link
Contributor

@matthewrsj matthewrsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code performs renames on files where the hash does not change and on files where the name minus digits is the same, correct? Did I miss a case? How does this do compared to the current rename code in swupd-server?

swupd/rename.go Outdated
func trimrenamed(a []*File) []*File {
r := make([]*File, 0, len(a))
for _, f := range a {
if f.Status == StatusGhosted {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, a ghosted file can be a renamedfrom

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Perhaps I need to think a little more about what a ghosted file can be.

We certainly need to add a test case for this.

My reasoning is that a ghosted file shouldn't be a candidate for a renamefrom, as the lifetime of the file is not managed by swupd-client. In particular a ghosted file could be deleted a minute after it is installed. I agree that a swupd verify --fix would find it missing, and reinstall it, which could be annoying. An example I can think of is kernel modules being removed from /usr/lib/modules when they have been added to an initrd, which is stored on another partition.

Since you added the ghosted code, do you think that there is a requirement for the ghosted files to exist until there is an update? Put it another way, do you think 'ghosted' means that the files are already dead, or just that we don't have to kill them?

Of course we shouldn't panic here, just skip the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point of the ghosted file is to enable renames for kernel files. I think currently we actually deprecate old ghosted files immediately after every build, which is the wrong answer (I will add a unit test to check this and fix if necessary). In that case the client would attempt a retry but fail and proceed as normal - we do know that is the current state of the client.

Ghosted only means we don't have to kill them on the client.

swupd/rename.go Outdated
}

// makepairs returns a sorted by name, disregarding digits
// secondary sort key is the original name. This is mainly inteneded for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/inteneded/intended/

swupd/rename.go Outdated
// secondary sort key is the original name. This is mainly inteneded for
// cases like with python we have the same filename under both python3.6 and python2.7
// it would look weird to rename the old python2.7/file1 to python3.6/file2 and at
// the same time rename the old python3.6/file1 to python2.7/file2.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice example.

swupd/rename.go Outdated
pair[i].n = strings.Map(stripdigits, f.Name)
}
sort.Slice(pair, func(i, j int) bool {
if pair[1].n == pair[j].n { // Same striped name, sort on original name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/striped/stripped/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why pair[1]? Should it be pair[i]?

swupd/rename.go Outdated
// cases like with python we have the same filename under both python3.6 and python2.7
// it would look weird to rename the old python2.7/file1 to python3.6/file2 and at
// the same time rename the old python3.6/file1 to python2.7/file2.
func makepairs(a []*File) []pairednames {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really understanding the makepairs name here since it doesn't pair any filenames here. Do you instead mean makepares and parednames or something since you are kind of paring the filenames?

}

// filelist takes a list of wanted shortnames, looks them up in sn, and returns
// the a list of pointers to copies of the Files. Because they are copies they can
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the or a at the beginning of this line.


// checkLinked tests forward and backward pairing of DeltaPeers
func checkLinked(t *testing.T, a, b *File, description string) bool {
if a.DeltaPeer != b {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You check if b is nil below but not if a is nil here?

partner []int // Indexes of add that should match remove
}{
{description: "Rename of file, same contents",
remove: []string{"I1", "I2"}, add: []string{"I2"}, partner: []int{0, -1}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I1 and I2 are removed but I2 is also added? When could this be possible?....... Oh, shortnames. Also forgot that sn is being generated.

I'm leaving this so you can see my thought process, no action needed.

for _, tc := range tests {
add := filelist(t, sn, tc.add)
remove := markdelete(filelist(t, sn, tc.remove))
// fmt.Println(add)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug print.


func TestRename(t *testing.T) {
sn := genTestArray(t)
// for f := range sn {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

swupd/rename.go Outdated
if f.Status == StatusGhosted {
// TODO. Do we allow ghosted files to be source files for renames?
// I think not, as we don't manage their lifetimes, so they could
// be deleted. The counter argument is that rename is only an optimization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The counter argument is essentially my argument. The whole point of ghosted files is to allow renames with them

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will allow ghosted files as sources for renames. It makes the code shorter, even if it makes the code longer.

@matthewrsj
Copy link
Contributor

I will add a unit test to check this and fix if necessary

I'll add it with this PR as a base.

matthewrsj
matthewrsj previously approved these changes Feb 1, 2018
Copy link
Contributor

@matthewrsj matthewrsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this implementation. @cmarcelo please take a look.

@cmarcelo
Copy link
Contributor

cmarcelo commented Feb 1, 2018

...but the rename code wants to know the size of the files to see if it makes sense to use a deleted
file as the source of a rename.

Questions: does it ever make sense to use a deleted file as a source of rename? Also, isn't checking the Status flag enough?

@matthewrsj
Copy link
Contributor

does it ever make sense to use a deleted file as a source of rename?

Deleted files are the only source of renames so it always makes sense to use a deleted file as the source, unless it is too small.

Also, isn't checking the Status flag enough?

Not if the file is under 200 bytes. Experimentation with bsdiff in the old swupd-server showed us that bsdiff does not perform well with files under that threshold, sometimes even resulting in segfaults I believe.

swupd/rename.go Outdated
rx++
default: // Equal hash, so link
linkrenamepair(af, rf)
ax++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means 'added index' I assume?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I said in one of my code reviews that using i and j wasn't very helpful as the two index values, as there was no connect in my mind between i and one of the added and removed slices, and therefore I needed to look very hard to see if the correct one was being incremented. Go programs traditionally use short variable names, and i is fine for a single index, but for this loop where we are stepping down two lists looking for matches I feel we need something better than i and j. Hence ax and rx for the added and removed index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Es ok, I figured it out pretty quickly. I just have PTSD from a different codebase using completely terrible names, so when I see 1-2 letter vars I look into it too hard 🔨 Was mainly clarifying for myself/anyone else.

swupd/rename.go Outdated
rx++
}
}
// Link things with the same names skippingdigits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/skippingdigits/skipping digits ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add the missing space in "skippingdigits".

Copy link
Contributor

@cmarcelo cmarcelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After explanations from Tudor and Matthew things are clearer. I'll take a look at this after lunch.

swupd/rename.go Outdated
}

// trimrenamed returns an slice which has had files with DeltaPeers purged
func trimrenamed(a []*File) []*File {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: to be consistent with the rest of the codebase, prefer camelCase names like trimRename, linkRenamePair and so on.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. My C background is showing! Please continue to call these out.

@matthewrsj matthewrsj dismissed their stale review February 1, 2018 23:13

seeing issues now

swupd/rename.go Outdated
a.DeltaPeer = b
b.DeltaPeer = a
a.Rename = true
b.Rename = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to set b.Type = TypeUnset to match current swupd-server behavior. Probably rename a and b to renameTo and renameFrom or similar to avoid confusion as well.

)

// BSDIFFSIZE is smallest size of file to consider using bsdiff
const BSDIFFSIZE = 200
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: I've added a similar constant (but with a more detailed name and background story) in the delta integration PR. If that PR lands first, reuse it.

swupd/rename.go Outdated
return // nothing to rename
}
added = trimrenamed(added) // Make copies of input slices, tidy up whilst we are here
removed = trimrenamed(removed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the tidying up here. Both lists shouldn't have any DeltaPeer at this point. Maybe you just need a function to copy the slices (or copy them inline).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy inline would work, I was just reusing something I had around....

swupd/rename.go Outdated
// First remove small files (not worth sending diff) and files which have
// exact match
added = trimsmall(trimrenamed(added), BSDIFFSIZE) // Make it explicit we are doing two steps
removed = trimsmall(trimrenamed(removed), BSDIFFSIZE) // Can trivially make it one pass.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with your comment. Do that in a single pass, otherwise you allocate space twice for both trims.

swupd/rename.go Outdated
sort.Slice(pairs, func(a, b int) bool {
if pairs[a].n == pairs[b].n { // Same stripped name, sort on original name
// mainly for python2.x vs 3.x
return pairs[a].f.Name < pairs[b].f.Name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if would be sufficient to use sort.SliceStable in this case. (If is just the Python case looks so)

swupd/rename.go Outdated
added = trimsmall(trimrenamed(added), BSDIFFSIZE) // Make it explicit we are doing two steps
removed = trimsmall(trimrenamed(removed), BSDIFFSIZE) // Can trivially make it one pass.
//generate the pairs of *File and short name
pa := makePairedNames(added)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: the section from here to the end of the function is a strong candidate to have some unit test cases, so I would extract it into a separate function and have some concrete tests. I know there are wider integration tests, but for more "hairy" cases of this logic, a test without the rest of the luggage might come handy.

swupd/rename.go Outdated
// digits so a new release of a shared library is matched to previous ones
type pairednames struct {
f *File
n string // Filename with digits removed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: would call it 'short' or 'shortName'. With all things equal, 'n' sounds like more a number than a reference to the filename without the digits.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

swupd/rename.go Outdated
r := make([]*File, 0, len(a))
for _, f := range a {
if f.Info == nil {
panic("Missing f.Info for " + f.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return error, and let the caller return the error up.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is a logic error. The information needs to be passed in. This is like assert() in C, it is a "shouldn't happen" case.

swupd/rename.go Outdated
continue
}
if bundleChroot == "" {
bundleChroot = filepath.Join(c.imageBase, fmt.Sprint(m.Header.Previous), m.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you can set bundleChroot (unconditionally) outside the loop before it starts.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can, but I am setting it conditionally as there is a reasonable chance bundleChroot will not be needed at all.

partner: []int{-1, 1, 0}},
{description: "Rename files pairwise longer reversed should link",
remove: []string{"S1", "L3", "S2", "L1"}, add: []string{"L4", "L2"},
partner: []int{-1, 0, -1, 1}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: it is easy to get confused with partner indices. I would consider making the expected result look more like "L3->L4 L1->L2" or ["L3->L4", "L1->L2"]. It is slightly more work in the test mechanics but I think pays off later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is easy to get confused with partner indicies, and yes, I did think about expressing the renames differently. I will try implementing it and see how it goes.

_rwx = syscall.S_IFREG + 0644
)
files := []struct {
n string // short name for the tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ended up being confusing, because there's also the notion of "shortname" above that is the name without digits.

Optional: I think the indirection here doesn't add much. I would explore one of those two venues.

(a) Instead of having generateTestArray, have a "createTestFile" (with a better name), that takes the parameters and creates a *File (or something else) with the properties you want. Then in the top of the test you create those assigning variable names that could act as your short name. The arrays would use those variables instead of strings.
(b) Use the real filenames in the test. The testcase entries will be multiline, but I think its fine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention with the generateTestArray is to provide a pool of names that tests can draw on, rather than having each test create their own. I look at some of the manifest tests for example and it seems like 90% of the test is setup. It probably needs more tests to be written to make the advantage clear.

Perhaps I should set the field which has the name without digits to be partialName, which fits in better with another use case for the field as well.

Initially I did use the full names in the tests, but they made the tests long enough that my eyes started glazing over. When you have several similar but different names I found the L1, L2 type short names much easier to reason about.

@cmarcelo cmarcelo added the swupd label Feb 1, 2018
@matthewrsj matthewrsj mentioned this pull request Feb 2, 2018
tmarcu
tmarcu previously requested changes Feb 5, 2018
swupd/rename.go Outdated
continue
}
if bundleChroot == "" {
bundleChroot = filepath.Join(c.imageBase, fmt.Sprint(m.Header.Previous), m.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The panic happens here because the bundle-chroot does not exist anymore when this happens, and only "full" is available. This should always use "full" instead of m.Name since not everyone will build with --keep-chroots flag on, so there won't be a dir to stat() anymore. 🙌

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be resolved. No more panics for me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did what @tmarcu told me to do!

Just pushing a rebase to fix the delta.go conflict

Copy link
Contributor

@matthewrsj matthewrsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 after you rebase and fix conflicts

tmarcu
tmarcu previously approved these changes Feb 6, 2018
Copy link
Contributor

@tmarcu tmarcu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good comment on this.

@tmarcu tmarcu dismissed their stale review February 6, 2018 00:45

I want to run a bigger test on this like last time before this is merged.

@icarus-sparry icarus-sparry force-pushed the rename-wip branch 2 times, most recently from eaab8c6 to 142730f Compare February 6, 2018 05:10
@matthewrsj
Copy link
Contributor

The last commit looks wrong.

@icarus-sparry
Copy link
Author

You are welcome to tell me why.

@matthewrsj
Copy link
Contributor

Because it is the commit message from my swupd index commit but the content of the commit from my rename tests pr

@matthewrsj
Copy link
Contributor

And now that I'm looking at it the "Add CreateManifests tests..." Commit does not actually contain the tests.

Icarus Sparry and others added 8 commits February 6, 2018 08:20
Needed for the rename code. Currently the old Manifest does not have
the Info field populated in its files, but the rename code wants to
know the size of the files to see if it makes sense to use a deleted
file as the source of a rename.

This means that we will need to call os.Stat on the files, and we need
the config to tell us where they are.

Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
Create a function `sizer` (Probably ought to be Sizer or a bettername)
that allows one to specify what Size() returns. Used in testing.

Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
Address some review comments
Fix m.Name -> "full" to avoid panic in mixer tests

Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
The test creates a file "foo1" in version 10 with contents "foo1", and
a file "foo" with contents "foo".

In version 20, the file "foo" is removed.

In version 30, the file "foo" is created with contents "foo1", and the
file "foo1" is removed.

With rename detection, it correctly identifies that the version 20
foo1 file should be renamed to the version 30 foo file.

Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
Signed-off-by: Matthew Johnson <matthew.johnson@intel.com>
Clear the File.Rename flags on the old manifest.
This is symptomatic of a deeper issue, we are reusing data but the
data structures are set up incorrectly. A rename is a property of a
pair of releases, not a file.

Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
The names were missing a leading /

Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants