Skip to content
This repository has been archived by the owner on Aug 4, 2022. It is now read-only.

[REEF-2022] Fixed AzureBlockBlobFileSystem.GetChildren api and unit tests #1464

Merged
merged 6 commits into from Jun 20, 2018

Conversation

sharathmalladi
Copy link
Contributor

Fixed the GetChildren api and added unit test.

JIRA:
[REEF-2022] https://issues.apache.org/jira/browse/REEF-2022

@@ -162,13 +162,27 @@ public void DeleteDirectory(Uri directoryUri)
public IEnumerable<Uri> GetChildren(Uri directoryUri)
{
BlobContinuationToken blobContinuationToken = null;
var path = directoryUri.AbsolutePath.Trim('/');
string path = directoryUri.AbsolutePath.Trim('/');
Copy link
Contributor

@tyclintw tyclintw May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path work is unnecessary. Just use directoryUri.Segments #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the suggestion. #Resolved

/// </summary>
BlobResultSegment ListBlobsSegmented(string prefix, bool useFlatListing, BlobListingDetails blobListingDetails, int? maxResults,
BlobResultSegment ListBlobsSegmented(string containerName, string directoryName, bool useFlatListing, BlobListingDetails blobListingDetails, int? maxResults,
Copy link
Contributor

@tyclintw tyclintw May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"directoryName" is confusing since this could actually be a path. I suggest using "relativeAddress" to match the blob API. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. #Resolved


Assert.Equal(allNames.Count(), blobs.Count());

// List files only in the sub-folder in the container
Copy link
Contributor

@tyclintw tyclintw May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second sections of this test is identical to the first section of this test. Please consolidate to a helper method. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidated code into a common method. #Resolved

@@ -138,6 +139,43 @@ public void TestDeleteE2E()
Assert.False(CheckBlobExists(blob));
}

[Fact(Skip = SkipMessage)]
public void TestGetChildrenE2E()
Copy link
Contributor

@tyclintw tyclintw May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also test listing containers. After all the IFileSystem should work like the local file system. Containers are just root directories from an outside observer. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I added code to make it work for both containers and blobs. #Closed

Assert.Equal(expectedChildBlobNames.Count(), blobNames.Count());
}

public void ValidateChildrenWithContainers(Uri rootUri, IEnumerable<string> expectedContainerNames)
Copy link
Contributor

@tyclintw tyclintw May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason for another method. This method is basically identical to ValidateChildenWithBlobs. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm .. they are different .. in how we extract out the blob names or container names as the case may be from the output uri. I could try to consolidate it even more but I am afraid that we lose readability (containers vs blobs). Lets chat offline if you are not convinced with the reasoning. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to remove the duplicate method. #Resolved


public void ValidateChildenWithBlobs(Uri storageBlobUri, IEnumerable<string> expectedChildBlobNames)
{
IEnumerable<Uri> blobs = _fileSystem.GetChildren(storageBlobUri).ToList();
Copy link
Contributor

@tyclintw tyclintw May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToList() is unnecessary. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Got rid of it. #Resolved

IEnumerable<Uri> blobs = _fileSystem.GetChildren(storageBlobUri).ToList();
IEnumerable<string> blobNames = blobs.Select(b => b.LocalPath.Replace("/" + _container.Name + "/", string.Empty));

Assert.True(expectedChildBlobNames.All(blobName => blobNames.Contains(blobName)), "blobNames has elements that are not in expectedChildBlobNames");
Copy link
Contributor

@tyclintw tyclintw May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CollectionAssert.AreEquivalent should be used instead. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont see AreEquivalent. I do see Assert.Equal() here but that requires that the two lists be in the same order since it uses CheckIfEnumerablesAreEqual .. which is not guaranteed in the lists that I have. I could go ahead and sort and then do this, but I felt that what I have is simple enough to follow. #Resolved

IEnumerable<string> blobNames = blobs.Select(b => b.LocalPath.Replace("/" + _container.Name + "/", string.Empty));

Assert.True(expectedChildBlobNames.All(blobName => blobNames.Contains(blobName)), "blobNames has elements that are not in expectedChildBlobNames");
Assert.Equal(expectedChildBlobNames.Count(), blobNames.Count());
Copy link
Contributor

@tyclintw tyclintw May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary if CollectionAssert is used. #Closed

}

// If not at the root folder, return all blobs within the container
string containerName = segments[1];
Copy link
Contributor

@tyclintw tyclintw May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible out of bounds exception. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by changing the condition earlier by ensuring this line is not executed if segments.Count() <= 1. #Resolved

foreach (string uploadedBlobName in parentFileNames.Concat(folderfileNames))
{
CloudBlockBlob blob = _container.GetBlockBlobReference(uploadedBlobName);
UploadFromString(blob, "hello");
Copy link
Contributor

@tyclintw tyclintw May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these files cleaned up anywhere if the test fails and you want to rerun? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the Dispose() method cleans up the container and everything in it. #Resolved

// setup
string[] parentFileNames = new string[] { "sample1", "sample2", "sample3" };
string[] folderfileNames = new string[] { "folder1/sample4", "folder1/sample5" };
string[] parentListNames = new string[] { "sample1", "sample2", "sample3", "folder1/" };
Copy link
Contributor

@tyclintw tyclintw May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three arrays and their naming conventions make if very difficult to follow your logic. Can this be simplified with one list and the use of select statements? Something like the following. At the very least change parentFileNames to a concrete list so the concat at 150 isn't necessary.

string[] fileNames = new string[] { "sample1", "sample2", "sample3", "folder1/sample4", "folder1/sample5" };

var rootFilenames = fileNames.GroupBy(f => f.Split('/')[0]).Select(g => g.Key);
ValidateChildenWithBlobs(_container.Uri, rootFilenames);

var folderBlobs = fileNames.Where(f => f.StartsWith("folder1");
ValidateChildenWithBlobs(folderUri, folderBlobs);

OR

string[] fileNames = new string[] { "sample1", "sample2", "sample3","folder1/sample4", "folder1/sample5" };
string[] expectedFolderChildren = new string[] { "folder1/sample4", "folder1/sample5" };
string[] expectedRootChildren = new string[] { "sample1", "sample2", "sample3", "folder1/" };
``` #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Took the second option that you proposed. #Resolved

// If not at the root folder, return all blobs within the container
string containerName = segments[1];
string relativeAddress = string.Empty;
if (segments.Count() > 1)
Copy link
Contributor

@tyclintw tyclintw May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary check since we've already guaranteed length >= 1. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed check. #Resolved

{
IEnumerable<Uri> blobs = _fileSystem.GetChildren(storageBlobUri);
Assert.Equal(
expectedChildBlobs.Select(uri => uri.AbsoluteUri).OrderBy(uri => uri),
Copy link
Contributor

@tyclintw tyclintw May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Select shouldn't necessasry #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.Uri does not implement IComparer. So I had to get the AbsoluteUri (string) before doing the OrderBy. #Resolved

@sharathmalladi
Copy link
Contributor Author

@markusweimer and @motus : This pull request has been approved. Please review as needed and merge to master. Thanks!

@sharathmalladi sharathmalladi force-pushed the users/sharathm/blobgetchildren branch from 9d4cb16 to d8d82da Compare June 1, 2018 00:45
@sharathmalladi
Copy link
Contributor Author

sharathmalladi commented Jun 5, 2018

Pinging again for @markusweimer or @motus to please merge this pull request.

The test failures that I see with the appveyor failures seem unrelated to this pull request.

@motus
Copy link
Contributor

motus commented Jun 6, 2018

@sharathmalladi please give me a day or two. I really want to review and merge it, but I am in the middle of the huge #1466 now

@sharathmalladi
Copy link
Contributor Author

Pinging again for @markusweimer or @motus to please merge this pull request. There are other changes that are piling up that depend on this one.

Copy link
Contributor

@motus motus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good! I am done with my round of nit picks 😄

P.S. a general comment: try to keep lines shorter than 120 characters (preferably below 100); trim trailing whitespaces; use CodeMaid or similar tool to fix these and other formatting issues for you

@@ -29,7 +29,7 @@ namespace Org.Apache.REEF.Driver.Evaluator
internal class EvaluatorRequest : IEvaluatorRequest
{
internal EvaluatorRequest()
: this(0, 0, 1, string.Empty, Guid.NewGuid().ToString("N"), string.Empty, Enumerable.Empty<string>().ToList(), true, String.Empty)
: this(0, 0, 1, string.Empty, Guid.NewGuid().ToString("N"), string.Empty, Enumerable.Empty<string>().ToList(), true, string.Empty)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, why is it so complicated, with Enumerable and such? Can't we just write, say

private static readonly string NoBatchId = Guid.NewGuid().ToString("N");
private static readonly string[] NoNodeNames = { };

internal EvaluatorRequest() : this(0, 0, 1, "", NoBatchId, "", NoNodeNames, true, "")

or, if you want more readability, maybe

internal EvaluatorRequest() : this(
    number: 0,
    megaBytes: 0,
    core: 1,
    rack: "",
    NoBatchId,
    runtimeName: "",
    NoNodeNames,
    relaxLocality: true,
    nodeLabelExpression: "")

etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refined constructors by calling the relevant overrides in each place.


In reply to: 196612445 [](ancestors = 196612445)

// setup
string[] fileNames = new string[] { "sample1", "sample2", "sample3", "folder1/sample4", "folder1/sample5" };
string[] expectedFolderChildren = new string[] { "folder1/sample4", "folder1/sample5" };
string[] expectedRootChildren = new string[] { "sample1", "sample2", "sample3", "folder1/" };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string arrays can be private readonly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C# local variables cannot be marked readonly. If the suggestion was to make these as fields of the TestAzureBlockBlobFileSystemE2E class instead: I feel like having to expand the scope just to make it readonly is not a great idea. Wont fix.


In reply to: 196644714 [](ancestors = 196644714)

ValidateChildren(rootUri, new List<Uri> { _container.Uri });
}

public void ValidateChildren(Uri storageBlobUri, IEnumerable<Uri> expectedChildBlobs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be private

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


In reply to: 196649314 [](ancestors = 196649314)

@@ -157,22 +157,58 @@ public void DeleteDirectory(Uri directoryUri)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe

Task.WaitAll(directory
    .ListBlobs(true)
    .OfType<ICloudBlob>()
    .Select(blob => blob.DeleteIfExistsAsync())
    .ToArray());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parallelization is better. Thanks! Fixed.


In reply to: 196651685 [](ancestors = 196651685)

var path = directoryUri.AbsolutePath.Trim('/');

// If at the root, return all containers
if (segments.Count() <= 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better use segments.Length instead of Linq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


In reply to: 196652155 [](ancestors = 196652155)

}
}

blobContinuationToken = containerListing.ContinuationToken;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will raise an NPE if containerListing can ever be null. did you mean ?. ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


In reply to: 196652608 [](ancestors = 196652608)

{
throw new ArgumentException("Call to ListBlobsSegmented returned no results. Uri is invalid or does not have children.", nameof(directoryUri));
}
else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for else block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. All files that were touched in this pr now have less than 120 characters per line.


In reply to: 196653152 [](ancestors = 196653152)

if (listing.Results != null)
if (listing == null || listing.Results.Count() == 0)
{
throw new ArgumentException("Call to ListBlobsSegmented returned no results. Uri is invalid or does not have children.", nameof(directoryUri));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is quite long. our coding standard requires that lines should be no longer than 120 characters; we usually keep it below 100 for readability.

maxResults,
continuationToken,
blobRequestOptions,
operationContext).GetAwaiter().GetResult();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.GetAwaiter().GetResult() can be simply .Result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see some discussions on this topic and it appears the two have different responses if an exception is thrown: https://stackoverflow.com/questions/17284517/is-task-result-the-same-as-getawaiter-getresult

From that discussion, I wager that GetAwaiter().GetResult() is the better option to use. If you still disagree, we can take this offline and resolve this.


In reply to: 196655236 [](ancestors = 196655236)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point!


public ContainerResultSegment ListContainersSegmented(BlobContinuationToken continuationToken)
{
return _client.ListContainersSegmentedAsync(continuationToken).GetAwaiter().GetResult();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied on the other issue and I will follow the same resolution here once that is addressed.


In reply to: 196655252 [](ancestors = 196655252)

Copy link
Contributor

@motus motus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!! Will test and merge now

@motus motus merged commit 396fd82 into apache:master Jun 20, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants