20 changes: 20 additions & 0 deletions build.gradle
@@ -179,6 +179,26 @@ project(":samza-core_$scalaVersion") {
}
}


project(':samza-azure') {
  apply plugin: 'java'
  apply plugin: 'checkstyle'

  dependencies {
    compile "com.microsoft.azure:azure-storage:5.3.1"
    compile "com.fasterxml.jackson.core:jackson-core:2.8.8"
    compile project(':samza-api')
    compile project(":samza-core_$scalaVersion")
Contributor

Why do we depend on samza-core?

Contributor

I think the coordination APIs are in samza-core.

Contributor Author

Samza core has the JobCoordinator, LeaderElector and a few other interfaces that I'm implementing for Azure.

    compile "org.slf4j:slf4j-api:$slf4jVersion"
    testCompile "junit:junit:$junitVersion"
Contributor

I don't see any tests here.

Contributor Author

Will add tests in the future.

  }
  checkstyle {
    configFile = new File(rootDir, "checkstyle/checkstyle.xml")
    toolVersion = "$checkstyleVersion"
  }
}


project(":samza-autoscaling_$scalaVersion") {
apply plugin: 'scala'
apply plugin: 'checkstyle'
22 changes: 21 additions & 1 deletion docs/learn/documentation/versioned/jobs/configuration-table.html
@@ -424,7 +424,8 @@ <h1>Samza Configuration Reference</h1>
<dd>Fixed partition mapping. No ZooKeeper. </dd>
<dt><code>org.apache.samza.zk.ZkJobCoordinatorFactory</code></dt>
<dd>Zookeeper-based coordination. </dd>
</dl>
<dt><code>org.apache.samza.AzureJobCoordinatorFactory</code></dt>
<dd>Azure-based coordination. </dd>
</dl>
Required only for non-cluster-managed applications. Please see the required value for <a href=#task-name-grouper-factory>task-name-grouper-factory </a>
</td>
</tr>
@@ -468,6 +469,25 @@ <h1>Samza Configuration Reference</h1>
How long the Leader processor will wait before recalculating the JobModel on change of registered processors.
</td>
</tr>

<tr>
<th colspan="3" class="section" id="AzureBasedJobCoordination"><a href="../index.html">Azure-based job configuration</a></th>
</tr>
<tr>
<td class="property" id="azure.storage.connect">azure.storage.connect</td>
<td class="default"></td>
<td class="description">
<strong>Required</strong> for applications with Azure-based coordination. This is the connection string for your Azure storage account. It has the format: "DefaultEndpointsProtocol=https;AccountName=&lt;Insert your account name&gt;;AccountKey=&lt;Insert your account key&gt;"

</td>
</tr>
<tr>
<td class="property" id="job.coordinator.azure.blob.length">job.coordinator.azure.blob.length</td>
<td class="default"> 5120000 </td>
<td class="description">
Length, in bytes, of the page blob on which the leader stores the shared data. Different types of data are stored on different pages with predefined lengths. The offsets of these pages depend on the total page blob length.
</td>
</tr>

<tr>
<th colspan="3" class="section" id="task"><a href="../api/overview.html">Task configuration</a></th>
</tr>
34 changes: 34 additions & 0 deletions samza-azure/README.md
@@ -0,0 +1,34 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

## Samza on Azure

* Provides the ability to run Samza Standalone in the cloud, using Azure.
* Removes the dependency on ZooKeeper.
* All coordination is implemented using services provided by Azure.

Read the [Samza on Azure design doc](https://cwiki.apache.org/confluence/display/SAMZA/SEP-7%3A+Samza+on+Azure) to learn more about the implementation details.

### Running Samza with Azure

* Set the job coordinator factory: `job.coordinator.factory = org.apache.samza.AzureJobCoordinatorFactory`
* Add the Azure Storage connection string:
<br /> `azure.storage.connect = DefaultEndpointsProtocol=https;AccountName=<your account name>;AccountKey=<your account key>`
* Optionally, set the page blob length in bytes: `job.coordinator.azure.blob.length`
<br /> Default value = 5120000
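
The three settings above can be sketched as a small Java helper. This is a minimal illustration and not part of the PR: the class `AzureCoordinationConfigSketch`, its method names, and the placeholder credentials are hypothetical. The 512-byte check reflects the Azure rule that page blob sizes must align to 512-byte pages; whether the Samza code performs this check itself is not shown in this diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of the PR: assembles the three settings listed above. */
final class AzureCoordinationConfigSketch {

  // Azure page blobs consist of 512-byte pages, so the blob length must be a multiple of 512.
  static final long PAGE_SIZE_BYTES = 512;

  /** Builds a connection string in the format documented for azure.storage.connect. */
  static String connectionString(String accountName, String accountKey) {
    return "DefaultEndpointsProtocol=https;AccountName=" + accountName + ";AccountKey=" + accountKey;
  }

  /** Returns the config entries as a plain map; rejects blob lengths that are not page-aligned. */
  static Map<String, String> coordinationConfig(String accountName, String accountKey, long blobLength) {
    if (blobLength <= 0 || blobLength % PAGE_SIZE_BYTES != 0) {
      throw new IllegalArgumentException("Blob length must be a positive multiple of 512: " + blobLength);
    }
    Map<String, String> config = new LinkedHashMap<>();
    config.put("job.coordinator.factory", "org.apache.samza.AzureJobCoordinatorFactory");
    config.put("azure.storage.connect", connectionString(accountName, accountKey));
    config.put("job.coordinator.azure.blob.length", Long.toString(blobLength));
    return config;
  }

  public static void main(String[] args) {
    // Placeholder credentials for illustration only; real values come from your Azure storage account.
    coordinationConfig("myaccount", "bXlrZXk=", 5120000L)
        .forEach((key, value) -> System.out.println(key + " = " + value));
  }
}
```

The default of 5120000 bytes is exactly 10000 pages of 512 bytes, so it passes the alignment check in this sketch.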
64 changes: 64 additions & 0 deletions samza-azure/src/main/java/org/apache/samza/AzureClient.java
@@ -0,0 +1,64 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.samza;

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.table.CloudTableClient;
import java.net.URISyntaxException;
import java.security.InvalidKeyException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;


/**
* Creates the client handles for the Azure Storage account, Azure Blob storage and Azure Table storage
*/
public class AzureClient {

  private static final Logger LOG = LoggerFactory.getLogger(AzureClient.class);
  private final CloudStorageAccount account;
  private final CloudTableClient tableClient;
  private final CloudBlobClient blobClient;

  AzureClient(String storageConnectionString) {
    try {
      account = CloudStorageAccount.parse(storageConnectionString);
      blobClient = account.createCloudBlobClient();
      tableClient = account.createCloudTableClient();
    } catch (IllegalArgumentException | URISyntaxException e) {
      // Do not log the connection string itself: it contains the account key.
      LOG.error("The Azure storage connection string specifies an invalid URI. "
          + "Please confirm it is in the Azure connection string format.", e);
      throw new SamzaException(e);
    } catch (InvalidKeyException e) {
      LOG.error("The Azure storage connection string specifies an invalid key. "
          + "Please confirm the AccountName and AccountKey in the connection string are valid.", e);
      throw new SamzaException(e);
    }
  }

  public CloudBlobClient getBlobClient() {
    return blobClient;
  }

  public CloudTableClient getTableClient() {
    return tableClient;
  }
}
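
The connection string handed to `AzureClient` embeds the `AccountKey`, so any log line that includes the string exposes the secret. A minimal sketch of masking the key before logging, assuming a hypothetical helper class `ConnectionStringRedactor` that is not part of this PR:

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the PR: masks the account key before a connection string is logged. */
final class ConnectionStringRedactor {

  // Matches "AccountKey=" (case-insensitive) plus everything up to the next ';' or the end of the string.
  private static final Pattern ACCOUNT_KEY = Pattern.compile("(?i)(AccountKey=)[^;]*");

  /** Returns the connection string with the key value replaced by "***". */
  static String redact(String connectionString) {
    return ACCOUNT_KEY.matcher(connectionString).replaceAll("$1***");
  }

  public static void main(String[] args) {
    // Placeholder credentials for illustration only.
    System.out.println(redact("DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=bXlrZXk="));
  }
}
```

Base64-encoded keys can contain `=`, `+`, and `/` but not `;`, which is why the pattern consumes up to the next semicolon rather than stopping at `=`.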
72 changes: 72 additions & 0 deletions samza-azure/src/main/java/org/apache/samza/AzureConfig.java
@@ -0,0 +1,72 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.samza;

import org.apache.samza.config.ApplicationConfig;
import org.apache.samza.config.Config;
import org.apache.samza.config.ConfigException;
import org.apache.samza.config.MapConfig;


public class AzureConfig extends MapConfig {
Contributor

Can you update the Samza configuration page with a section for "Samza on Azure" and add the configs relevant to this component?
It helps to add the configs incrementally with your PRs because:

  1. You can make sure that you don't miss any configs in the end.
  2. It will help reviewers understand your code better. For example, it's not obvious to me how AZURE_BLOB_NAME and AZURE_CONTAINER_NAME are used and when they should be overridden by the user.

Contributor Author

Updated the configuration page and added a README.md file.


  // Connection string for the Azure Storage account, in the format:
  // "DefaultEndpointsProtocol=https;AccountName=<account name>;AccountKey=<account key>"
  public static final String AZURE_STORAGE_CONNECT = "azure.storage.connect";
  public static final String AZURE_PAGEBLOB_LENGTH = "job.coordinator.azure.blob.length";
  public static final long DEFAULT_AZURE_PAGEBLOB_LENGTH = 5120000;

  // Instance fields rather than statics: the names derive from the app id of the Config
  // passed to this instance, so they must not be shared across instances.
  private final String containerName;
  private final String blobName;
  private final String tableName;

  public AzureConfig(Config config) {
    super(config);
    ApplicationConfig appConfig = new ApplicationConfig(config);
    // Remove all non-alphanumeric characters from the id, as table names do not allow them.
    String id = appConfig.getGlobalAppId().replaceAll("[^A-Za-z0-9]", "");
    containerName = "samzacontainer" + id;
    blobName = "samzablob" + id;
    tableName = "samzatable" + id;
  }

  public String getAzureConnect() {
    if (!containsKey(AZURE_STORAGE_CONNECT)) {
      throw new ConfigException("Missing " + AZURE_STORAGE_CONNECT + " config!");
    }
    return get(AZURE_STORAGE_CONNECT);
  }

  public String getAzureContainerName() {
    return containerName;
  }

  public String getAzureBlobName() {
    return blobName;
  }

  public long getAzureBlobLength() {
    return getLong(AZURE_PAGEBLOB_LENGTH, DEFAULT_AZURE_PAGEBLOB_LENGTH);
  }

  public String getAzureTableName() {
    return tableName;
  }
}
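
The name derivation in the `AzureConfig` constructor can be sketched in isolation as follows. This is an illustration under stated assumptions, not the PR's code: the class `AzureResourceNames` and the sample app ids are hypothetical. Lower-casing the container name is an addition that the PR does not make. It is included here because Azure container names must be lowercase, whereas table names are case-insensitive alphanumerics.

```java
import java.util.Locale;

/** Hypothetical helper, not part of the PR: derives Azure resource names from an application id. */
final class AzureResourceNames {

  /** Same regex as the AzureConfig constructor: keeps ASCII letters and digits only. */
  static String sanitize(String appId) {
    return appId.replaceAll("[^A-Za-z0-9]", "");
  }

  static String tableName(String appId) {
    return "samzatable" + sanitize(appId);
  }

  static String blobName(String appId) {
    return "samzablob" + sanitize(appId);
  }

  static String containerName(String appId) {
    // Assumption for this sketch only: lower-case the id, since container names must be lowercase.
    return "samzacontainer" + sanitize(appId).toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    String appId = "app-wikipedia-1"; // hypothetical app id
    System.out.println(tableName(appId));
    System.out.println(blobName(appId));
    System.out.println(containerName(appId));
  }
}
```

Azure also limits container and table names to between 3 and 63 characters, so a very long app id may need truncation or hashing; that is left out of this sketch.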

3 changes: 2 additions & 1 deletion settings.gradle
@@ -22,7 +22,8 @@ include \
'samza-elasticsearch',
'samza-log4j',
'samza-rest',
'samza-shell'
'samza-shell',
'samza-azure'

def scalaModules = [
'samza-core',