Skip to content

Commit

Permalink
Merge pull request #3963 from Azure/master
Browse files Browse the repository at this point in the history
master
  • Loading branch information
v-aljenk committed Jul 18, 2014
2 parents f08f128 + 450406c commit 9112469
Show file tree
Hide file tree
Showing 4 changed files with 6 additions and 7 deletions.
3 changes: 1 addition & 2 deletions articles/hdinsight-use-oozie.md
Expand Up @@ -90,8 +90,7 @@ Oozie workflows definitions are written in hPDL (a XML Process Definition Langua
The Hive action in the workflow calls a HiveQL script file. This script file contains three HiveQL statements:

1. **The DROP TABLE statement** deletes the log4j Hive table in case it exists.
2. **The CREATE TABLE statement** creates a log4j Hive external table pointing to the
3. location of the log4j log file. The field delimiter is ",". The default line delimiter is "\n". Hive external table is used to avoid the data file being removed from the original location, in case you want to run the Oozie workflow multiple times.
2. **The CREATE TABLE statement** creates a log4j Hive external table pointing to the location of the log4j log file. The field delimiter is ",". The default line delimiter is "\n". Hive external table is used to avoid the data file being removed from the original location, in case you want to run the Oozie workflow multiple times.
3. **The INSERT OVERWRITE statement** counts the occurrences of each log level type from the log4j Hive table, and saves the output to an Azure Storage - Blob (WASB) location.

There is a known Hive path issue. You will run into this problem when submitting an Oozie job. The instructions for fixing the issue can be found at [TechNet Wiki][technetwiki-hive-error].
Expand Down
4 changes: 2 additions & 2 deletions articles/machine-learning-faq.md
Expand Up @@ -50,7 +50,7 @@ The **Reader** module can read data from Azure Table, Azure Blob, SQL Database (

8.**How large can my data set be?**

Azure ML Studio supports training dataset of up to 10GB. There is no limit on the dataset size for Web Services. Sampling larger datasets via Hive or SQL queries before ingestion is also supported. If you are working with data larger than10GB, you can you can create multiple datasets and use the ‘Partition and Sample’, ‘Split’ or ‘Join’ modules to recombine these datasets in ML studio to create training sets for building predictive models. Visit module help in ML Studio to learn more about these modules.
Azure ML Studio supports training dataset of up to 10GB. There is no limit on the dataset size for Web Services. Sampling larger datasets via Hive or SQL queries before ingestion is also supported. If you are working with data larger than 10GB, you can create multiple datasets and use the ‘Partition and Sample’, ‘Split’ or ‘Join’ modules to recombine these datasets in ML studio to create training sets for building predictive models. Visit module help in ML Studio to learn more about these modules.

For datasets larger than a couple GB, the recommended approach is to upload data to Azure storage or SQL Database (Azure) or use HDInsight, rather than directly uploading from local file.

Expand Down Expand Up @@ -109,4 +109,4 @@ We will be adding new material to Machine Learning Center on an ongoing basis. Y

Azure ML is supported as a part of the Azure support offering. To get technical support on Azure ML, select ‘Machine Learning’ as service and you will be provided a category of topics to file your support ticket. To learn more about Azure Support offering visit <http://azure.microsoft.com/en-us/support/options/>

Azure Machine Learning also has a community forum on MSDN, where you can ask Azure ML related questions. The forum is monitored by the Azure ML team. Visit [Azure Forum](http://social.msdn.microsoft.com/Forums/windowsazure/en-US/home?forum=MachineLearning).
Azure Machine Learning also has a community forum on MSDN, where you can ask Azure ML related questions. The forum is monitored by the Azure ML team. Visit [Azure Forum](http://social.msdn.microsoft.com/Forums/windowsazure/en-US/home?forum=MachineLearning).
Expand Up @@ -17,7 +17,7 @@ We used 2011 data as a training set and 2012 data as a test set. We compared 4 s
1. number of bikes that were rented in each of the previous 12 days at the same hour
1. number of bikes that were rented in each of the previous 12 weeks at the same hour and the same day

Features B capture very recent demand for the bikes. Features C capture demand for bikes at particular hour. Features D capture demand for bikes at particular and particular day of the week.
Features B capture very recent demand for the bikes. Features C capture demand for bikes at particular hour. Features D capture demand for bikes at particular hour and particular day of the week.

Since the label (number of rentals) is real-valued we have regression setting. Also, since the number of features is relatively small (less than 100) and they are not sparse, the decision boundary is probably nonlinear. Based on this, we decided to use boosted decision tree regression algorithm.

Expand Down
4 changes: 2 additions & 2 deletions articles/storage-introduction.md
Expand Up @@ -20,7 +20,7 @@ Azure Storage is elastic, so you can design applications for a large global audi

Azure Storage uses an auto-partitioning system that automatically load-balances your data based on traffic. This means that as the demands on your application grow, Azure Storage automatically allocates the appropriate resources to meet them.

Azure Storage is accessible from anywhere in the world, from any type of application, whether it’s running in the cloud, on the desktop, on an on-premise server, or on a mobile or tablet device. You can use Azure Storage in mobile scenarios where the application stores a subset of data on the device and synchronizes it with a full set of data stored in the cloud.
Azure Storage is accessible from anywhere in the world, from any type of application, whether it’s running in the cloud, on the desktop, on an on-premises server, or on a mobile or tablet device. You can use Azure Storage in mobile scenarios where the application stores a subset of data on the device and synchronizes it with a full set of data stored in the cloud.

Azure Storage supports clients using a diverse set of operating systems (including Windows and Linux) and a variety of programming languages (including .NET, Java, and C++) for convenient development. Azure Storage also exposes data resources via simple REST APIs, which are available to any client capable of sending and receiving data via HTTP/HTTPS.

Expand Down Expand Up @@ -76,7 +76,7 @@ For today's Internet-based applications, NoSQL databases like Table storage offe

## Queue Storage ##

In designing applications for scale, application components are often decoupled, so that they can scale independently. Queue storage provides a reliable messaging solution for asynchronous communication between application components, whether they are running in the cloud, on the desktop, on an on-premise server, or on a mobile device. Queue storage also supports managing asynchronous tasks and building process workflows.
In designing applications for scale, application components are often decoupled, so that they can scale independently. Queue storage provides a reliable messaging solution for asynchronous communication between application components, whether they are running in the cloud, on the desktop, on an on-premises server, or on a mobile device. Queue storage also supports managing asynchronous tasks and building process workflows.

A storage account can contain any number of queues. A queue can contain any number of messages, up to the 200 TB capacity limit of the storage account. Individual messages may be up to 64 KB in size.

Expand Down

0 comments on commit 9112469

Please sign in to comment.