Conversation
@@ -36,6 +40,8 @@ public class DnxCatalogCollector : CommitCollector | |||
: base(index, telemetryService, handlerFunc, httpClientTimeout) | |||
{ | |||
_storageFactory = storageFactory ?? throw new ArgumentNullException(nameof(storageFactory)); | |||
_sourceStorage = preferredPackageSourceStorage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null checks, unless it'sm ok for it to be null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's OK to be null
.
await TransferManager.CopyAsync(copiableContent.SourceBlob, blob, isServiceCopy: true); | ||
|
||
// Copying (above) did not apply these properties so apply them again. | ||
blob.Properties.ContentType = content.ContentType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to apply all properties? (future proofing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the only properties we set today are ContentType and CacheControl.
var destinationStorage = _storageFactory.Create(packageId); | ||
var destinationUri = destinationStorage.GetUri(DnxMaker.GetRelativeAddressNupkg(packageId, packageVersion)); | ||
|
||
var isNupkgSynchronized = await destinationStorage.AreSynchronized(sourceUri, destinationUri); | ||
var isPackageInIndex = await _dnxMaker.HasPackageInIndexAsync(destinationStorage, packageId, packageVersion, cancellationToken); | ||
var areRequiredPropertiesPresent = await AreRequiredPropertiesPresentAsync(destinationStorage, destinationUri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AreRequiredPropertiesPresentAsync [](start = 65, length = 33)
Out of curiosity, why is this needed? Is this so allow us to fix packages that are missing properties by reflowing? Do we have cases of blobs in the flat container without these properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we set blob properties and upload blob content in one operation. As part of my changes I don't want to introduce scenarios where a blob could be uploaded but not have the correct blob properties (say, because of an exception).
The properties may be different or may have different values in flat container.
@@ -173,6 +195,70 @@ private async Task UpdatePackageVersionIndexAsync(IEnumerable<CatalogEntry> cata | |||
string version, | |||
Uri sourceUri, | |||
CancellationToken cancellationToken) | |||
{ | |||
var shouldUseSourceStorage = _sourceStorage != null; | |||
var sourceBlob = await GetBlockBlobAsync(id, version) as AzureCloudBlockBlob; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetBlockBlobAsync [](start = 35, length = 17)
What do you think of renaming GetBlockBlobAsync
to something like GetSourceBlobOrNullAsync
to clearly indicate that:
- It gets the source blob
- It returns null if the blob couldn't be fetched
Also, it seems a little weird to call GetBlockBlobAsync
after checking that _sourceStorage
isn't null. What do you think of removing the _sourceStorage
check from GetBlockBlobAsync
and checking _sourceStorage
here instead?
src/Catalog/Dnx/DnxMaker.cs
Outdated
throw new InvalidOperationException("Only Azure storage support has been implemented."); | ||
} | ||
|
||
var normalizedVersion = NuGetVersionUtility.NormalizeVersion(packageVersion); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to normalized at this level? Can't we make this the caller's concern and renamed the parameter to normalizedVersion
?
src/Catalog/Dnx/DnxMaker.cs
Outdated
DnxConstants.ApplicationOctetStreamContentType, | ||
DnxConstants.DefaultCacheControl); | ||
|
||
await storage.Save(nupkgUri, content, cancellationToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this Save
? The implementation must check types to make this work. Seems like a broken abstraction. Can't we have a Copy
method that only knows how to copy? And let Save
only upload?
// single GET request returns the whole blob in a consistent state, but we're reading the blob many | ||
// different times. To detect the blob changing between reads, we check the ETag again later. | ||
// If the ETag's differ, we'll fall back to using a single HTTP GET request. | ||
var etag = sourceBlob.ETag; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only necessary in the off chance that we have to catalog2dnx instances running? Or is there some other way to have someone else updating the blob?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⌚️
Seems like the copy abstraction is wrong here. The as
plus the InvalidOperationException
in GetContentStream
are smelly.
|
||
if (string.IsNullOrEmpty(nuspec)) | ||
{ | ||
_logger.LogWarning("No .nuspec available for {Id}/{Version}. Skipping.", id, version); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skipping [](start = 81, length = 8)
Is Skipping
correct? It looks like this package will be processed by ProcessPackageDetailsViaHttpAsync
. If so, how about Falling back to HTTP processing.
?
@@ -13,35 +13,30 @@ namespace NuGet.Services.Metadata.Catalog.Persistence | |||
{ | |||
public sealed class AzureCloudBlockBlob : ICloudBlockBlob | |||
{ | |||
private readonly CloudBlockBlob _blob; | |||
public CloudBlockBlob Blob { get; } | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nit: usually properties are placed after the constructors. Same for ETag
and Uri
.
10b9fef
to
19625cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Feed2Catalog: replace inner loops with one outer loop (#329) Resolve NuGet/Engineering#1526. * V3: rename asynchronous IStorage methods (#336) Progress on NuGet/NuGetGallery#6267. * Test: fix flaky basic search tests (#337) Progress on NuGet/NuGetGallery#6292. * Add signing to search service (#338) Progress on https://github.com/NuGet/Engineering/issues/1644 * Test: add registration tests (#341) Progress on NuGet/NuGetGallery#6317. * remove unused scripts (#340) * MonitoringProcessor: improve throughput (#342) Progress on NuGet/NuGetGallery#6327. * Catalog2Dnx: improve throughput (#335) Progress on NuGet/NuGetGallery#6267. * Functional tests should not be dependent on a specific hash and file size of the test package (#343) * Add MicroBuild dependency and signing of output DLLs (#345) Progress on https://github.com/NuGet/Engineering/issues/1644 * Rewrite SafeRoleEnvironment to not use Assembly.LoadWithPartialName (#339) * Test: improve and add registration tests (#346) More progress on NuGet/NuGetGallery#6317. * Use ServerCommon commit that is consistent with other repositories (#347) Progress on https://github.com/NuGet/Engineering/issues/1644 * [Monitoring] Ensure packages are signed (#348) This adds a validation to ensure indexed packages are signed by verifying that the package contains a [package signature file](https://github.com/NuGet/Home/wiki/Package-Signatures-Technical-Details#-the-package-signature-file). This validation is disabled for now. To enable this validation, the flag `-expectsSignature true` must be added to ng.exe's command line arguments. This validation will be enabled once all packages are repository signed. Part of NuGet/Engineering#1627 * V3: make stylistic consistency and cleanup changes (#349) Progress on NuGet/NuGetGallery#6411.
Progress on NuGet/NuGetGallery#6267.
This is an optimization for Azure storage-based usage of Catalog2Dnx. This change:
This results in huge savings in time and disk I/O.
Catalog2Dnx will fall back to downloading a package via HTTP if necessary.
On a local test machine with a China storage destination I observed a ~60% improvement in processing throughput with ~4,800 packages following production package size and package entry count distributions to the nearest percentile.
Notes: