Releases: MicrosoftDX/Dash

Complete CORS Support

02 Jul 00:50

Complete support has been added to enable use of DASH endpoints by browser-based JavaScript clients that would normally be constrained by cross-origin restrictions. The Azure Storage API (since API version 2013-08-15) supports Cross-Origin Resource Sharing (CORS): rules are configured by calling the Set Blob Service Properties API (https://msdn.microsoft.com/en-us/library/azure/hh452235.aspx) and are then evaluated by the Preflight Blob Request OPTIONS call (https://msdn.microsoft.com/en-us/library/azure/dn535599.aspx). Previous versions of DASH did not fully support setting the service properties, which prevented preflight requests from authorizing any origin.
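
As a rough illustration of the first step, the sketch below builds a request body for Set Blob Service Properties containing a single CORS rule. The element names follow the Azure Storage service properties XML format; the origin, header lists and max-age value are placeholder assumptions, and sending the body (a PUT to the endpoint with `?restype=service&comp=properties`, plus authentication) is left out.

```python
import xml.etree.ElementTree as ET


def build_cors_properties(origins, methods, allowed_headers, exposed_headers, max_age_seconds):
    """Build a StorageServiceProperties XML body containing a single CORS rule."""
    root = ET.Element("StorageServiceProperties")
    cors = ET.SubElement(root, "Cors")
    rule = ET.SubElement(cors, "CorsRule")
    # Each list is serialized as a comma-separated string.
    ET.SubElement(rule, "AllowedOrigins").text = ",".join(origins)
    ET.SubElement(rule, "AllowedMethods").text = ",".join(methods)
    ET.SubElement(rule, "MaxAgeInSeconds").text = str(max_age_seconds)
    ET.SubElement(rule, "ExposedHeaders").text = ",".join(exposed_headers)
    ET.SubElement(rule, "AllowedHeaders").text = ",".join(allowed_headers)
    return ET.tostring(root, encoding="unicode")


body = build_cors_properties(
    origins=["https://app.example.com"],  # hypothetical browser origin
    methods=["GET", "PUT", "OPTIONS"],
    allowed_headers=["x-ms-*", "content-type"],
    exposed_headers=["x-ms-*"],
    max_age_seconds=3600,  # assumed preflight cache lifetime
)
print(body)
```

Once a rule like this is stored, a preflight OPTIONS request from the listed origin should be answered with matching Access-Control-Allow-* headers.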

Additionally, explicit support has been added to enable XMLHttpRequest callers to indicate that they handle redirection responses issued by DASH by setting the x-ms-dash-client request header (the normal mechanism of updating the User-Agent header is unavailable to XHR callers).

A new integration test suite project has been added (derived from the Azure Storage CORS sample - https://code.msdn.microsoft.com/Windows-Azure-Storage-CORS-45e5ce76) to enable full regression testing of CORS support in future releases.

Fixed resource leak

05 Jan 03:13

Fixed a critical resource leak that was discovered after the v1.0 release shipped. For each successful DELETE operation the response from the Azure Storage service was not being correctly disposed. If many consecutive (> 15) DELETE operations occurred in a short period of time (< 1 sec), the HttpClient object used to forward the requests to the service could block, causing the server to freeze.

The service responses are now correctly disposed, thus fixing this error.
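
The pattern behind this fix can be sketched in a few lines. DASH itself is a .NET service, so the Python below is only an analogue and not the actual change: `FakeResponse` and `fake_send` are hypothetical stand-ins for the upstream response and HTTP client, and the point is simply that every forwarded DELETE releases its response, even when nothing is read from it.

```python
from contextlib import closing


class FakeResponse:
    """Hypothetical stand-in for an upstream response that holds a pooled connection."""

    def __init__(self, status):
        self.status = status
        self.closed = False

    def close(self):
        # Releasing the response is what frees the underlying connection.
        self.closed = True


def forward_delete(send, url):
    """Forward a DELETE and always release the upstream response."""
    with closing(send("DELETE", url)) as response:
        return response.status


responses = []


def fake_send(method, url):
    # Records every response so the demo can check that each one was released.
    response = FakeResponse(202)
    responses.append(response)
    return response


statuses = [forward_delete(fake_send, f"https://example.invalid/c/blob{i}") for i in range(20)]
print(all(r.closed for r in responses))
```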

Production Release of Data At Scale Hub

30 Dec 20:33

We're excited to announce that this latest release is our v1.0 - DASH is now a production release! With this release, we believe DASH is appropriate for use in production scenarios.

The main new feature added in this release is a new Management API and Portal. Additional small improvements and bugfixes are also included.

Management API and Portal

DASH now exposes a new HTTP(S) WebAPI endpoint to remotely manage configuration of the deployment. Additionally, a new single page application can also be served from this same endpoint which can act as a management portal.

Current functionality exposed by the Management API is limited to:

  1. Updating configuration - all of the configuration entries may be viewed and updated
  2. Provisioning of new scale-out storage accounts - if a new storage account is added to the configuration it will be automatically provisioned in Azure with the appropriate location and replication characteristics
  3. Software update - as new releases of the DASH software are published, they may be viewed in the Management Portal. Additionally, different versions of the software may be deployed without any downtime to the service

The API and portal are secured by Azure Active Directory. See this link for documentation on how to set up authentication: https://github.com/MicrosoftDX/Dash/wiki/Configure-Authentication-for-Management-API.

The Management Portal documentation is located here: https://github.com/MicrosoftDX/Dash/wiki/Dash-Management-Portal. The Management API is documented here: https://github.com/MicrosoftDX/Dash/wiki/Dash-Management-API

In the future, the Management API & Portal will be extended to provide monitoring of the Dash service itself as well as metrics for the overall performance of the virtual account.

Other Enhancements

  1. Support has been added to fully enable Azure Batch (https://azure.microsoft.com/en-us/services/batch/) to leverage the read replica feature of Dash thus improving scale for read throughput. Previously, Batch jobs had to perform their own I/O to gain the benefits of Dash. With this change, the native Azure Batch I/O library will be fully supported by Dash.
  2. Added support for Azure Storage versions 2015-02-21 and 2015-04-05. These versions clarified a previously ambiguous requirement for content-length in the string to sign. The new version is now unambiguous and is authenticated correctly by Dash.
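
To make the content-length change concrete, here is a minimal sketch of how the Content-Length component of the string to sign could be chosen per API version. It reflects our reading of the Azure Storage Shared Key rules (an empty string for a zero-length body from 2015-02-21 onwards, the literal length before that) and is not taken from the Dash source; the function name is hypothetical.

```python
def content_length_for_signing(api_version, content_length):
    """Return the Content-Length component to place in the string to sign."""
    # API versions are ISO dates (YYYY-MM-DD), so string comparison orders them correctly.
    if content_length == 0 and api_version >= "2015-02-21":
        return ""
    return str(content_length)


print(repr(content_length_for_signing("2015-04-05", 0)))
print(repr(content_length_for_signing("2014-02-14", 0)))
```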

Bug Fixes

  1. Fixed resource leak when forwarding/proxying requests to data account. Under extreme cases, the resource leak was observed to be sufficient to exhaust TCP/IP ports on the host server.
  2. Fixed multiple bugs associated with incorrectly encoding/decoding URIs containing special characters. In the worst case, the namespace blob exhibited a different name encoding than the associated data blob and could therefore not be resolved.

Let Us Know What You Think

Please feel free to drop the team a note on Github if you have thoughts about Dash or scalable storage generally. We're always looking out for new scenarios that improve the scalability and therefore applicability of Azure Storage.

Enjoy!

v0.4

30 Jul 18:51
Pre-release

This release adds support for a new type of workload to dramatically improve read performance. Importing/expanding data accounts and SAS urls are now also supported.

Read Replicas

For compute-intensive workloads that must concurrently spin up a large number of computers, read a relatively small reference dataset and then compute an outcome, the storage challenge is to keep the reference dataset readers from being throttled at startup. For a single Azure Storage Account, the throughput limit of 30/20 Gbps (US) or 15/10 Gbps (elsewhere; see https://azure.microsoft.com/en-us/documentation/articles/storage-scalability-targets/) can become a constraining factor, even when reading a relatively small dataset (~10GB), when there are > 1000 readers. The Azure Batch Service (http://azure.microsoft.com/en-us/services/batch/) identified this as a recurring pattern in their customers' workloads.

To address this constraint a feature has been added to Dash whereby an identified blob will be asynchronously replicated to all available data accounts. Subsequent read operations will randomly select one of the replicas to redirect the client to, thus distributing read load across all of the available data accounts.
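
The read-side behavior can be sketched as a uniform random choice over the accounts holding a replica. This is a simplified illustration rather than the Dash implementation; the account names are hypothetical and the demo loop only shows that the load spreads roughly evenly.

```python
import random


def pick_replica(data_accounts, rng=random):
    """Choose, uniformly at random, the data account a read is redirected to."""
    if not data_accounts:
        raise ValueError("no data accounts hold a replica")
    return rng.choice(data_accounts)


accounts = ["dashdata0", "dashdata1", "dashdata2", "dashdata3"]  # hypothetical account names
rng = random.Random(42)  # seeded so the demo is repeatable
counts = {name: 0 for name in accounts}
for _ in range(10_000):
    counts[pick_replica(accounts, rng)] += 1
print(counts)
```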

Not all blobs will be replicated. Blobs may be identified for replication either by attaching special metadata to the blob (see ReplicationMetadataName and ReplicationMetadataValue configuration options) or by configuring a regular expression (see ReplicationPathPattern configuration option) that will be matched against the path name of the blob.
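
A minimal sketch of that decision follows, assuming the three configuration options are available as a plain dictionary. The metadata name, value and path pattern shown are made-up examples, and the use of `re.search` (rather than a full-string match) is an assumption about how the pattern is applied.

```python
import re


def should_replicate(blob_path, metadata, config):
    """Return True if a blob is marked for replication by metadata or by path."""
    name = config.get("ReplicationMetadataName")
    value = config.get("ReplicationMetadataValue")
    # Metadata rule: the configured name must be present with the configured value.
    if name and name in metadata and metadata[name] == value:
        return True
    # Path rule: the configured regular expression must match the blob path.
    pattern = config.get("ReplicationPathPattern")
    return bool(pattern and re.search(pattern, blob_path))


config = {
    "ReplicationMetadataName": "dash_replicate",  # hypothetical metadata name
    "ReplicationMetadataValue": "true",  # hypothetical metadata value
    "ReplicationPathPattern": r"^reference/.*\.dat$",  # hypothetical pattern
}
print(should_replicate("reference/genome.dat", {}, config))
```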

SAS Url Support

Support has been added to Dash so that Shared Access Signature (SAS) keys may be specified as query parameters to requests as a form of authentication. The SAS feature in Dash is fully compatible with the same feature in Azure Storage, including client library support. SAS urls are fully described here: https://msdn.microsoft.com/en-us/library/azure/ee395415.aspx.

Importing New Data Accounts

Additional data accounts may now be added to an existing Dash deployment. The data accounts to be added may be empty, in which case additional capacity is added to the virtual account. Data accounts with existing data blobs may also be imported. In this case, the blobs contained in the account are listed and added to the namespace. After the namespace has been updated, the existing blobs will be accessible via the Dash endpoint in exactly the same way as any other blob.

At this stage we have not added support to detach a data account from Dash. Although the implementation for this feature would be trivial, we do not yet have any requests to implement it and therefore will wait.

One point to note about any existing blobs that are imported from a new data account is that they will NOT be automatically replicated, even if their metadata or blob name match the configuration. To force an imported blob to be replicated, simply 'touch' the blob by updating a property or adding benign metadata - once this write operation is processed, the blob will be replicated.

To import a data account to an existing Dash deployment, update the next available ScaleoutStorage configuration entry with the connection string of the account. Add the account name to the ImportAccounts configuration entry (a comma separated list) and restart the server. At server startup, the account will be imported. Protections exist such that the same account will only be imported once.
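
The startup check can be pictured roughly as below: split the comma-separated ImportAccounts value and skip anything already imported. How Dash actually records previously imported accounts is not described here, so the `already_imported` set and the function name are hypothetical.

```python
def accounts_to_import(import_accounts_setting, already_imported):
    """Return the account names from ImportAccounts that still need importing."""
    requested = [name.strip() for name in import_accounts_setting.split(",") if name.strip()]
    pending = []
    for name in requested:
        # Skip accounts imported on an earlier startup and duplicates within the list.
        if name not in already_imported and name not in pending:
            pending.append(name)
    return pending


print(accounts_to_import("dashdata4, dashdata5,dashdata4", {"dashdata5"}))
```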

In the future, the mechanism to import accounts will be included in our Management API feature.

Azure Virtual Network Support

In conjunction with our friends in the HDInsight (http://azure.microsoft.com/en-us/services/hdinsight/) team we identified a networking bottleneck when a large HDInsight cluster (> 128 data nodes) performed I/O through Dash. Given that all data nodes in an HDInsight cluster have only private IP addresses, it is necessary for network traffic to flow through a Source Network Address Translation (SNAT) device so that Dash (or any other destination host) knows where to send its responses. Given the extreme volume of requests flowing from such a large cluster, we found that the available SNAT ports were being exhausted on the cluster and traffic was being throttled.

The solution to overcome this extreme limit is to deploy both the HDInsight cluster and Dash into the same Azure Virtual Network (VNet) and configure the Internal Load Balancer (ILB) for Dash to communicate directly with the HDInsight data nodes.

We have updated our cloud deployment configurations to support this deployment topology.

Other Features

  • The base VM SKU for Dash is now a D3. Research across available SKUs indicated that D3 provides the best combination of network & memory resources for the best price.

Bug Fixes

  • Fixed issue with handling specifically encoded blob names.
  • Corrected error in List Blobs handler for formatting the name of a snapshot.
  • Fixed bug in Copy Blob handler to correctly copy page blobs.
  • Fixed incorrect response to Get Service Properties handler. This is required so that other services can identify the endpoint as supporting the Azure Blob Storage protocol.

Obtaining Dash

Pre-built binaries for this release are published here: https://www.dash-update.net/DashServer/v0.4 - this is a link to a configuration manifest file, which includes references to the artifacts required for the desired configuration (HTTP, HTTPS, ILB).

Download the files listed under the desired configuration - there are 2 files per configuration - the binary package (.cspkg) and configuration (.cscfg). Update the configuration file with appropriate values and then deploy both files to Azure as a Cloud Service.

Future improvements will enable updating of an existing Dash deployment with new versions as well as improved initial deployment using Azure Resource Manager (ARM) templates.

Modified Azure Storage client SDKs are available at the following locations - note that it is not necessary to use our modified versions (the official versions work), but certain operations are more performant using the modified libraries:

Alternatively, clone the code repository, update the configuration file and deploy directly from Visual Studio or using your own cloud deployment mechanism.

See readme.md for details on how to build and deploy DASH.

v0.3

15 May 21:10
Pre-release

This release adds an important security feature to enable clients to confidently follow the redirect locations specified by the Dash server. Full support has also been added so that clients that are incapable of supporting redirection will be proxied by Dash, making the virtual storage account accessible to all clients.

Signing Redirection Responses

When the Dash server responds to a storage request by specifying a redirection (HTTP status code: 302) it is expected that the client will automatically follow that redirection to enable getting or putting blob data in the appropriately sharded location. An important security ramification of this is that the client must explicitly trust the locations returned by Dash as these locations are opaque SAS urls and cannot be verified by the client.

With this release the server will now add a new response header, x-ms-redirect-signature, that will include the shared key signature (using the account key specified by the corresponding request). Clients may optionally use this signature to verify that the redirection location has not been tampered with by a man-in-the-middle attacker or an interloper that does not hold the account key. As always, use of HTTPS is recommended for all traffic to Dash to avoid the chance of an interloper sniffing the data.
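
A client-side check might look roughly like the sketch below. It assumes the signature is a Base64-encoded HMAC-SHA256 of the Location URL, keyed with the Base64-decoded account key (the primitive used by Azure Storage Shared Key authentication). The exact string that Dash signs is not specified here, so treat that part as an assumption; the key and URL are placeholders.

```python
import base64
import hashlib
import hmac


def sign_location(account_key_b64, location):
    """Compute a Base64 HMAC-SHA256 signature over the redirect location (assumed scheme)."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, location.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_redirect(account_key_b64, location, signature):
    """Return True if the signature header matches the Location that was received."""
    expected = sign_location(account_key_b64, location)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)


account_key = base64.b64encode(b"not-a-real-account-key").decode("ascii")  # placeholder key
location = "https://dashdata1.blob.core.windows.net/c/blob?sv=2014-02-14&sig=abc"  # placeholder URL
header_value = sign_location(account_key, location)  # stands in for x-ms-redirect-signature
print(verify_redirect(account_key, location, header_value))
```

A client that fails this check would refuse to follow the redirect rather than send data to the unverified location.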

The Java client SDK has been updated to include verification of this signature. The updated client library is available at: https://www.dash-update.net/client/v0.3/StorageSDK2.0/dash-azure-storage-2.0.0.jar. The modified .NET client is available at: https://www.dash-update.net/client/v0.3/StorageSDK2.0/Microsoft.WindowsAzure.Storage.dll.

Full Proxy Support

Many Azure Storage client libraries are incapable of automatically following the redirection responses issued by the Dash server. Additionally, some data payloads/operations do not warrant high throughput capabilities. To address these and other scenarios a full proxy mode has been added to Dash.

When a storage API request is received by a Dash server, it will now automatically determine if the remote client is capable of following redirects (using a variety of information, including the User-Agent string and the presence of an Expect: 100-Continue header). If it is determined that the remote client cannot support redirection for the request, Dash will now forward the request on to the Azure Storage account selected for the blob associated with the request.
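
The detection step could be sketched as a small heuristic over the request headers. The real rules used by Dash are not spelled out here, so both signals below are assumptions: that an Expect: 100-Continue header means the client can be redirected before it sends a body, and that redirect-capable clients can be recognized by a marker in the User-Agent string (the marker shown is hypothetical).

```python
def client_follows_redirects(headers, redirect_capable_agents=("DashClient",)):
    """Guess whether a client can follow a 302, based on its request headers."""
    normalized = {name.lower(): value for name, value in headers.items()}
    # Assumption: a client waiting for 100-Continue can be redirected before uploading.
    if normalized.get("expect", "").lower() == "100-continue":
        return True
    # Assumption: known redirect-capable clients identify themselves in User-Agent.
    agent = normalized.get("user-agent", "")
    return any(marker in agent for marker in redirect_capable_agents)


def choose_mode(headers):
    """Pick between redirecting the client and proxying the request."""
    return "redirect" if client_follows_redirects(headers) else "proxy"


print(choose_mode({"User-Agent": "SomeSdk/1.0"}))
```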

Given that acting as a proxy significantly reduces the throughput and scalability of these requests, this capability is intended only for situations where high throughput is not required for the request itself, but the blobs are needed for high-throughput scenarios in other aspects of an end-to-end workload (e.g. initial population of a reference dataset that is subsequently used in a high-throughput computation).

Other Features

  • Support has been added for a secondary account key. This allows the 'rolling over' of account keys in a gradual manner identical to support in the Azure Storage service.
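
Key rollover works because a request signed with either key is accepted. The sketch below illustrates the idea with HMAC-SHA256 signatures; the string to sign and the keys are placeholders, and the function names are hypothetical rather than taken from Dash.

```python
import base64
import hashlib
import hmac


def sign(string_to_sign, key_b64):
    """Compute a Base64 HMAC-SHA256 signature with a Base64-encoded key."""
    key = base64.b64decode(key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def signature_matches_any_key(string_to_sign, signature, keys_b64):
    """Accept the request if the signature verifies under any configured key."""
    return any(hmac.compare_digest(sign(string_to_sign, k), signature) for k in keys_b64)


primary = base64.b64encode(b"primary-demo-key").decode("ascii")  # placeholder key
secondary = base64.b64encode(b"secondary-demo-key").decode("ascii")  # placeholder key
request = "GET\n/account/container/blob"  # placeholder string to sign
print(signature_matches_any_key(request, sign(request, secondary), [primary, secondary]))
```

During a rollover, clients can move to the secondary key first, after which the primary key can be regenerated without rejecting any in-flight clients.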

See readme.md for details on how to build and deploy DASH.

v0.2

30 Mar 01:26
Pre-release

This is the initial public release of DASH - a scale-out virtual storage account for Azure Blob Storage. Included in this release:

  • Scale-out virtual account support for all Page and Block blobs using HTTP redirections for capable clients.
  • Scale-out virtual account support for all Page and Block blobs via forwarding proxy for clients that are not capable of supporting redirections (.NET Storage SDK for PUT requests, Java Storage SDK).
  • Account and container operations for virtual account.

The following is NOT included in this release:

  • Table and Queue support.
  • (Upcoming) Rebalancing of data storage accounts.
  • (Upcoming) SAS URLs for blobs in virtual account.
  • (Upcoming) Replication of blobs to provide read throughput scale-out.
  • (Upcoming) Metrics and logs for virtual storage account.
  • (Possible future) Sharding at the block/page level. Current unit of sharding is at the blob level.

See readme.md for details on how to build and deploy DASH.