fixing some issues in my HDI docs

commit ece1f6c749c4eb5a4a71ffb774b12a79d534f8c8 1 parent 86b9bc1
@mumian mumian authored
6 ITPro/Services/hdinsight/blob-hive-sql.md
@@ -83,7 +83,7 @@ In this tutorial, you will use the Hive console to run the Hive queries. The ot
#ls asv://flightinfo@StorageAccountName.blob.core.windows.net/delays
- You will get the list of files you uploaded using Azure STorage Explorer.
+ You will get the list of files you uploaded using Azure Storage Explorer.
##<a id="createtable"></a>Create a Hive Table and Populate Data
The next step is to create a Hive table from the data in Azure Storage Vault (ASV)/Blob storage.
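As a hedged sketch (the column names and types are assumptions inferred from the query used later in this tutorial, not values confirmed by this commit), an external Hive table over the uploaded delay files could be declared as follows; an EXTERNAL table leaves the underlying blobs intact if the table is later dropped:

    CREATE EXTERNAL TABLE delays (origin_city_name STRING, weather_delay FLOAT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 'asv://flightinfo@StorageAccountName.blob.core.windows.net/delays';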
@@ -197,13 +197,13 @@ The next step is to create a Hive table from the data in Azure Storage Vault (AS
##<a id="executequery"></a>Execute a HiveQL Query
After the *delays* table has been created, you are now ready to run queries against it.
-1. Replace **username** in the following query with the username you used to log into the cluster, and then copy and paste the followingquery into the query pane
+1. Replace **username** in the following query with the username you used to log into the cluster, and then copy and paste the following query into the query pane
INSERT OVERWRITE DIRECTORY '/user/username/queryoutput' select regexp_replace(origin_city_name, '''', ''), avg(weather_delay) from delays where weather_delay is not null group by origin_city_name;
This query computes the average weather delay and groups the results by city name. It will also output the results to HDFS. Note that the query will remove apostrophes from the data and will exclude rows where the value for *weather_delay* is *null*, which is necessary because Sqoop, used in the next step, doesn't handle those values gracefully by default.
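For example, the doubled quotes in the second argument of *regexp_replace* stand in for a literal apostrophe, so an *origin_city_name* value such as **St. John's** would be written to the output as **St. Johns** (illustrative value).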
-2. Click **Evaluate**.Output from the query above should look similar to the following:
+2. Click **Evaluate**. The output from the query above should look similar to the following:
Hive history file=c:\apps\dist\hive-0.9.0\logs/hive_job_log_RD00155D47138A$_201303220108_1260638792.txt
Logging initialized using configuration in file:/C:/apps/dist/hive-0.9.0/conf/hive-log4j.properties
63 ITPro/Services/hdinsight/upload-data.md
@@ -4,7 +4,7 @@
#How to Upload Data to HDInsight
-Windows Azure HDInsight Service provides two options in how it manages its data, Windows Azure Blob Storage and Hadoop Distributed File System (HDFS). HDFS is designed to store data used by Hadoop applications. Data stored in Windows Azure Blob Storage can be accessed by Hadoop applications using Windows Azure Storage Vault (ASV), which provides a full featured HDFS file system over Windows Azure Blob storage. It has been designed as an HDFS extension to provide a seamless experience to customers by enabling the full set of components in the Hadoop ecosystem to operate directly on the data it manages. Both options are distinct file systems that are optimized for storage of data and computations on that data.
+Windows Azure HDInsight Service provides two options for how it manages its data: Azure Storage Vault (ASV) and Hadoop Distributed File System (HDFS). HDFS is designed to store data used by Hadoop applications. Data stored in Windows Azure Blob Storage can be accessed by Hadoop applications using Windows Azure Storage Vault (ASV), which provides a full-featured HDFS file system over Windows Azure Blob storage. It has been designed as an HDFS extension to provide a seamless experience to customers by enabling the full set of components in the Hadoop ecosystem to operate directly on the data it manages. Both options are distinct file systems that are optimized for storage of data and computations on that data. For the benefits of using ASV, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/howto-blob-store/).
Windows Azure HDInsight clusters are typically deployed to execute MapReduce jobs and are dropped once these jobs have been completed. Keeping the data in the HDFS clusters after computations have been completed would be an expensive way to store this data. Windows Azure Blob storage is a highly available, highly scalable, high capacity, low cost, and shareable storage option for data that is to be processed using HDInsight. Storing data in a Blob enables the HDInsight clusters used for computation to be safely released without losing data.
@@ -19,16 +19,16 @@ Windows Azure Blob storage can either be accessed through the [API](http://www.w
##Table of Contents
* [How to: Upload data to Windows Azure Storage using Azure Storage Explorer](#storageexplorer)
-* [How to: Access data in Windows Azure Storage](#blob)
-* [How to: Upload data to HDFS using Interactive JavaScript Console](#console)
-* [How to: Upload data to HDFS using Hadoop command line](#commandline)
-* [How to: Import data from Windows Azure SQL Database to HDFS using Sqoop](#sqoop)
+* [How to: Access data stored in ASV](#blob)
+* [How to: Upload data to ASV using Interactive JavaScript Console](#console)
+* [How to: Upload data to ASV using Hadoop command line](#commandline)
+* [How to: Import data from Windows Azure SQL Database to ASV using Sqoop](#sqoop)
-##<a id="storageexplorer"></a>How to: Upload data to Windows Azure Storage using Azure Storage Explorer
+##<a id="storageexplorer"></a>How to: Upload Data to Windows Azure Storage Using Azure Storage Explorer
*Azure Storage Explorer* is a useful tool for inspecting and altering the data in your Windows Azure Storage. It is a free tool that can be downloaded from [http://azurestorageexplorer.codeplex.com/](http://azurestorageexplorer.codeplex.com/ "Azure Storage Explorer").
-Before using the tool, you must know your Windows Azure storage account name and account key. For the instructions for get the information, see the *How to: View, copy and regenerate storage access keys* section of [How To Manage Storage Accounts](/en-us/manage/services/storage/how-to-manage-a-storage-account/).
+Before using the tool, you must know your Windows Azure storage account name and account key. For instructions on how to get this information, see the *How to: View, copy and regenerate storage access keys* section of [How to Manage Storage Accounts](/en-us/manage/services/storage/how-to-manage-a-storage-account/).
1. Run Azure Storage Explorer.
@@ -47,14 +47,15 @@ Before using the tool, you must know your Windows Azure storage account name and
6. From **Blob**, click **Upload**.
7. Specify a file to upload, and then click **Open**.
+Blob storage containers store data as key/value pairs, and there is no directory hierarchy. However, the ‘/’ character can be used within the key name to make it appear as if a file is stored within a directory structure. For example, a blob’s key may be ‘input/log1.txt’. No actual ‘input’ directory exists, but due to the presence of the ‘/’ character in the key name, it has the appearance of a file path. You can use **Rename** in the tool to give a file such a virtual folder structure.
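For example (hypothetical names), a blob keyed 'input/log1.txt' in the default container appears as a file inside an input folder when listed from the Interactive JavaScript Console:

    #ls asv:///input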
-##<a id="blob"></a>How to: Access data stored in Windows Azure Blob Storage
+##<a id="blob"></a>How to: Access Data Stored in Azure Storage Vault
Data stored in Windows Azure Blob Storage can be accessed directly from the Interactive JavaScript Console by prefixing the URI for the assets you are accessing with the asv:// protocol scheme. To secure the connection, use asvs://. The scheme for accessing data in Windows Azure Blob Storage is:
- asvs://[<container>@]<accountname>.blob.core.microsoft.com/<path>
+ asv[s]://[<container>@]<accountname>.blob.core.windows.net/<path>
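For example, with a container named mycontainer on the storage account myaccount (hypothetical names), the path /example/data resolves to:

    asv://mycontainer@myaccount.blob.core.windows.net/example/data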
-The following is an example of viewing data stored in Windows Azure Blob Storage using the Interactive Javascript Console:
+The following is an example of viewing data stored in Windows Azure Blob Storage using the Interactive JavaScript Console:
![HDI.ASVSample](../media/HDI.ASVSample.png "ASV sample")
@@ -62,13 +63,15 @@ The following will run a Hadoop streaming job that uses Windows Azure Blob Stora
Hadoop jar hadoop-streaming.jar
-files "hdfs:///example/apps/map.exe, hdfs:///example/apps/reduce.exe"
- -input "asv://iislogsinput/iislogs.txt"
- -output "asv://iislogsoutput/results.txt"
+ -input "asvs://container@storageaccount.blob.core.windows.net/iislogsinput/iislogs.txt"
+ -output "asvs://container@storageaccount.blob.core.windows.net/iislogsoutput/results.txt"
-mapper "map.exe"
-reducer "reduce.exe"
+For more information on accessing the files stored in ASV, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/howto-blob-store/).
-##<a id="console"></a> How to: Upload data to HDFS using interactive JavaScript console
+
+##<a id="console"></a> How to: Upload Data to ASV Using Interactive JavaScript Console
Windows Azure HDInsight Service comes with a web-based interactive JavaScript console that can be used as an administration/deployment tool.
1. Sign in to the [Management Portal](https://manage.windowsazure.com).
@@ -88,12 +91,21 @@ Windows Azure HDInsight Service comes with a web based interactive JavaScript co
![HDI.fs.put](../media/HDI.fsput.png "fs.put()")
-9. Enter **Source** and **Destination**, and then click **Upload**.
+9. Enter **Source** and **Destination**, and then click **Upload**. Here are some sample values for the Destination field:
+
+ <table border="1">
+ <tr><th>Sample</th><th>Note</th></tr>
+ <tr><td>.</td><td>refers to /user/&lt;currentloggedinuser&gt; on the default file system.</td></tr>
+ <tr><td>/</td><td>refers to / on the default file system.</td></tr>
+ <tr><td>asv:/// or asvs://container@accountname.blob.core.windows.net</td><td>refers to / on the default file system.</td></tr>
+ </table>
+
+
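For instance (illustrative values, matching the file used in the Hadoop command line section below), you could upload a local file to the default file system with:

    Source:      C:\temp\davinci.txt
    Destination: asv:///example/data/davinci.txt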
10. Use the following command to list the uploaded files.
- #ls /
+ #ls <path>
-##<a id="commandline"></a> How to: Upload data to HDFS using Hadoop command line
+##<a id="commandline"></a> How to: Upload Data to ASV Using Hadoop Command Line
To use the Hadoop command line, you must first connect to the cluster using remote desktop.
@@ -114,12 +126,22 @@ To use Hadoop command line, you must first connect to the cluster using remote d
hadoop dfs -copyFromLocal C:\temp\davinci.txt /example/data/davinci.txt
+ Because the default file system is on ASV, /example/data/davinci.txt is actually on ASV. You can also refer to the file as:
+
+ asv:///example/data/davinci.txt
+
+ or
+
+ asvs://container@accountname.blob.core.windows.net/example/data/davinci.txt
+
+ The FQDN is required when you use asvs.
+
13. Use the following command to list the uploaded files:
hadoop dfs -lsr /example/data
-##<a id="sqoop"></a> How to: Import data to HDFS from SQL Database/SQL Server using Sqoop
+##<a id="sqoop"></a> How to: Import Data to HDFS from SQL Database/SQL Server Using Sqoop
Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use it to import data from a relational database management system (RDBMS) such as SQL Server, MySQL, or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop with MapReduce or Hive, and then export the data back into an RDBMS. For more information, see [Sqoop User Guide](http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html).
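As a hedged sketch of the export direction (the server, table, and directory names below are placeholders, not values from this tutorial), pushing processed results back to a SQL database takes the same shape as the import command shown below:

    sqoop export
    --connect "jdbc:sqlserver://servername.database.windows.net;username=user1@servername;password=Pass@word1;database=AdventureWorks2012"
    --table Results
    --export-dir /data/lineitemData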
@@ -137,13 +159,13 @@ Before importing data, you must know the Windows Azure SQL Database server name,
12. Run a command similar to the following:
sqoop import
- --connect "jdbc:sqlserver://s6ok0p9kfz.database.windows.net;username=user1@s6ok0p9kfz;password=Pass@word1;database=AdventureWorks2012"
+ --connect "jdbc:sqlserver://s6ok0p9kft.database.windows.net;username=user1@s6ok0p9kft;password=Pass@word1;database=AdventureWorks2012"
--table Sales.SalesOrderDetail
--columns "SalesOrderID,SalesOrderDetailID,CarrierTrackingNumber,OrderQty,ProductID,SpecialOfferID,UnitPrice,UnitPriceDiscount,LineTotal"
--target-dir /data/lineitemData
-m 1
- In the command, the SQL database server is *s6ok0p9kfz*, username is *user1*, password is *Pass@word1*, and the database is *AdventureWorks2012*.
+ In the command, the SQL database server is *s6ok0p9kft*, username is *user1*, password is *Pass@word1*, and the database is *AdventureWorks2012*.
13. You can run the #tail command from the Interactive Console to see the result:
@@ -160,8 +182,9 @@ Note: When specifying an escape character as delimiter with the arguments *--inp
## Next Steps
-Now that you understand how to get data into HDInsight Service, use the following tutorials to learn how to perform analyis:
+Now that you understand how to get data into HDInsight Service, use the following tutorials to learn how to perform analysis:
+* [Getting Started with Windows Azure HDInsight Service](/en-us/manage/services/hdinsight/get-started-hdinsight/)
* [Tutorial: Using MapReduce with HDInsight](/en-us/manage/services/hdinsight/using-mapreduce-with-hdinsight/)
* [Tutorial: Using Hive with HDInsight](/en-us/manage/services/hdinsight/using-hive-with-hdinsight/)
* [Tutorial: Using Pig with HDInsight](/en-us/manage/services/hdinsight/using-pig-with-hdinsight/)
2  ITPro/Services/hdinsight/using-blob-store.md
@@ -35,7 +35,7 @@ The HDInsight Service provides access to the distributed file system that is loc
In addition, HDInsight Service provides the ability to access data stored in Blob Storage containers. The syntax to access ASV is:
- asv[s]://[<container>@]<accountname>.blob.core.microsoft.net/<path>
+ asv[s]://[<container>@]<accountname>.blob.core.windows.net/<path>
Hadoop supports the notion of a default file system. The default file system implies a default scheme and authority; it can also be used to resolve relative paths. During the HDInsight provisioning process, the user must specify a Blob Storage account and a container to be used as the default file system.
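For example (hypothetical names), if the default file system is the container mycontainer on the account myaccount, the following paths all refer to the same file:

    /example/data/sample.log
    asv:///example/data/sample.log
    asvs://mycontainer@myaccount.blob.core.windows.net/example/data/sample.log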
6 ITPro/Services/hdinsight/using-hdinsight-sdk.md
@@ -21,12 +21,12 @@ You can install latest published build of the library from [NuGet](http://nuget.
* **MapReduce library:** This library simplifies writing MapReduce jobs in .NET languages using the Hadoop streaming interface.
* **LINQ to Hive client library:** This library translates C# or F# LINQ queries into HiveQL queries and executes them on the Hadoop cluster. This library can execute arbitrary HiveQL queries from a .NET program as well.
-* **WebClient library:** This libarary contains client libraries for *WebHDFS* and *WebHCat*.
+* **WebClient library:** This library contains client libraries for *WebHDFS* and *WebHCat*.
* **WebHDFS client library:** It works with files in HDFS and Windows Azure Blob Storage.
* **WebHCat client library:** It manages scheduling and execution of jobs in an HDInsight cluster.
-The NuGet syntax to install the librarys:
+The NuGet syntax to install the libraries:
install-package Microsoft.Hadoop.MapReduce
install-package Microsoft.Hadoop.Hive
@@ -54,7 +54,7 @@ In this section you will learn how to upload files to Hadoop cluster programmati
<table>
<tr><th>Property</th><th>Value</th></tr>
- <tr><td>Catagory</td><td>Templates/Visual C#/Windows</td></tr>
+ <tr><td>Category</td><td>Templates/Visual C#/Windows</td></tr>
<tr><td>Template</td><td>Console Application</td></tr>
<tr><td>Name</td><td>SimpleHiveJob</td></tr>
</table>
8 ITPro/Services/hdinsight/using-hive.md
@@ -16,7 +16,7 @@ Hive provides a means of running MapReduce job through an SQL-like scripting lan
##In this Article
-* [The Hive Usage case](#usage)
+* [The Hive usage case](#usage)
* [Upload a sample log4j file to Windows Azure Blob Storage](#uploaddata)
* [Connect to the interactive console](#connect)
* [Create a Hive table and upload data to the table](#createhivetable)
@@ -51,7 +51,7 @@ In this tutorial, you will complete the following tasks:
##<a id="uploaddata"></a>Upload a Sample Log4j File to Windows Azure Blob Storage
-HDInsight provides two options for storing data, Windows Azure Blob Storage and Hadoop Distributed File system (HDFS). For more information on choosing file storage, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/using-blob-store). When you provision an HDInsight cluster, the provision process creates a Windows Azure Blob storage container as the default HDInsight file system. To simplify the tutorial procedures, you will use this container for storing the log4j file.
+HDInsight provides two options for storing data: Windows Azure Blob Storage and Hadoop Distributed File System (HDFS). For more information on choosing file storage, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/howto-blob-store). When you provision an HDInsight cluster, the provision process creates a Windows Azure Blob storage container as the default HDInsight file system. To simplify the tutorial procedures, you will use this container for storing the log4j file.
*Azure Storage Explorer* is a useful tool for inspecting and altering the data in your Windows Azure Storage. It is a free tool that can be downloaded from [http://azurestorageexplorer.codeplex.com/](http://azurestorageexplorer.codeplex.com/ "Azure Storage Explorer").
@@ -123,7 +123,7 @@ Before using the tool, you must know your Windows Azure storage account name and
12. Click **Close**.
13. From the **File** menu, click **Exit** to close Azure Storage Explorer.
-For accessing ASV, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/using-blob-store/)
+For accessing ASV, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/howto-blob-store/)
##<a id="connect"></a> Connect to the Interactive Console
@@ -162,7 +162,7 @@ You must have an HDInsight cluster previsioned before you can work on this tutor
To use asvs, you must provide the FQDN. For example, to access sample.log on the default file system:
- #ls asvs://container@storagename.blob.core.microsoft.net/sample.log 
+ #ls asvs://container@storagename.blob.core.windows.net/sample.log 
##<a id="createhivetable"></a> Create a Hive Table and Upload Data to the Table
22 ITPro/Services/hdinsight/using-mapreduce.md
@@ -4,7 +4,7 @@
# Using MapReduce with HDInsight#
-Hadoop MapReduce is a software framework for writing applications which process vast amounts of data. In this tutorial, you will create a Haddop MapReduce job in Java, and execute the job on a Windows Azure HDInsight cluster to process a semi-structured Apache *log4j* log file stored in Azure Storage Vault (Azure Storage Vault or ASV provides a full featured HDFS file system over Windows Azure Blob storage).
+Hadoop MapReduce is a software framework for writing applications which process vast amounts of data. In this tutorial, you will create a Hadoop MapReduce job in Java, and execute the job on a Windows Azure HDInsight cluster to process a semi-structured Apache *log4j* log file stored in Azure Storage Vault (Azure Storage Vault or ASV provides a full featured HDFS file system over Windows Azure Blob storage).
[Apache Log4j](http://en.wikipedia.org/wiki/Log4j) is a logging utility. Each log entry in a file contains a *log level* field indicating the type and severity. For example:
@@ -27,8 +27,8 @@ This MapReduce job takes a log4j log file as input, and generates an output file
* [Connect to an HDInsight Cluster](#connect)
* [Create a MapReduce job](#createjob)
* [Run the MapReduce job](#runjob)
-* [Tutorial Clean Up](#cleanup)
-* [Next Steps](#nextsteps)
+* [Tutorial clean up](#cleanup)
+* [Next steps](#nextsteps)
##<a id="mapreduce"></a> Big Data and Hadoop MapReduce
Generally, all applications save errors, exceptions, and other coded issues in a log file. These log files can get quite large, containing a wealth of data that must be processed and mined. Log files are a good example of big data. Working with big data is difficult using relational databases with statistics and visualization packages. Due to the volume of data and the computation it requires, parallel software running on tens, hundreds, or even thousands of servers is often needed to process it in a reasonable time. Hadoop provides a MapReduce framework for writing applications that process large amounts of structured and semi-structured data in parallel across large clusters of machines in a very reliable and fault-tolerant manner.
@@ -47,7 +47,7 @@ You will complete the following tasks in this tutorial:
###<a id="uploaddata"></a>Upload a Sample Log4j File to the Blob Storage
-HDInsight provides two options for storing data, Windows Azure Blob Storage and Hadoop Distributed File system (HDFS). For more information on choosing file storage, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/using-blob-store). When you provision an HDInsight cluster, the provision process creates a Windows Azure Blob storage container as the default HDInsight file system. To simplify the tutorial procedures, you will use this container for storing the log4j file.
+HDInsight provides two options for storing data: Windows Azure Blob Storage and Hadoop Distributed File System (HDFS). For more information on choosing file storage, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/howto-blob-store). When you provision an HDInsight cluster, the provision process creates a Windows Azure Blob storage container as the default HDInsight file system. To simplify the tutorial procedures, you will use this container for storing the log4j file.
*Azure Storage Explorer* is a useful tool for inspecting and altering the data in your Windows Azure Storage. It is a free tool that can be downloaded from [http://azurestorageexplorer.codeplex.com/](http://azurestorageexplorer.codeplex.com/ "Azure Storage Explorer").
@@ -72,7 +72,7 @@ Before using the tool, you must know your Windows Azure storage account name and
<div class="dev-callout">
<b>Note</b>
- <p>To simplify the tutorial, you will use the default file system. You can also use other containers on the same storage account or other storage accouns. For more information, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/using-blob-store/).</p>
+ <p>To simplify the tutorial, you will use the default file system. You can also use other containers on the same storage account or other storage accounts. For more information, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/howto-blob-store/).</p>
</div>
7. From **Blob**, click **Upload**.
@@ -125,7 +125,7 @@ Before using the tool, you must know your Windows Azure storage account name and
12. Click **Close**.
13. From the **File** menu, click **Exit** to close Azure Storage Explorer.
-For accessing ASV, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/using-blob-store/).
+For accessing ASV, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/howto-blob-store/).
##<a id="connect"></a>Connect to an HDInsight Cluster
You must have an HDInsight cluster provisioned before you can work on this tutorial. To enable the Windows Azure HDInsight Service preview, click [here](https://account.windowsazure.com/PreviewFeatures). For information on provisioning an HDInsight cluster, see [How to Administer HDInsight Service](/en-us/manage/services/hdinsight/howto-administer-hdinsight/) or [Getting Started with Windows Azure HDInsight Service](/en-us/manage/services/hdinsight/get-started-hdinsight/).
@@ -151,7 +151,7 @@ You must have an HDInsight cluster previsioned before you can work on this tutor
hadoop fs -ls asv://container@storagename.blob.core.windows.net/sample.log
- replace *container* with the container name, and *storagename* with the Blob Storage account name.
+ Replace *container* with the container name, and *storagename* with the Blob Storage account name.
Because the file is located on the default file system, the same result can also be retrieved by using the following command:
@@ -159,11 +159,11 @@ You must have an HDInsight cluster previsioned before you can work on this tutor
To use asvs, you must provide the FQDN. For example, to access sample.log on the default file system:
- #ls asvs://container@storagename.blob.core.microsoft.net/sample.log
+ #ls asvs://container@storagename.blob.core.windows.net/sample.log
-##<a id="createjob"></a> Create the MapReduce job ##
+##<a id="createjob"></a> Create the MapReduce Job ##
The Java programming language is used in this sample. Hadoop Streaming allows developers to use virtually any programming language to create MapReduce jobs.
1. From the Hadoop command prompt, run the following commands to create a directory and change to it:
@@ -171,7 +171,7 @@ The Java programming language is used in this sample. Hadoop Streaming allows de
mkdir c:\Tutorials
cd \Tutorials
-2. run the following command to create a java file in the C:\Tutorials folder:
+2. Run the following command to create a Java file in the C:\Tutorials folder:
notepad log4jMapReduce.java
@@ -303,7 +303,7 @@ The Java programming language is used in this sample. Hadoop Streaming allows de
log4jMapReduce.jar
log4jMapReduce.java
-##<a id="runjob"></a> Run the MapReduce job
+##<a id="runjob"></a> Run the MapReduce Job
Until now, you have uploaded a log4j log file to Blob storage and compiled the MapReduce job. The next step is to run the job.
1. From the Hadoop command prompt, execute the following command to run the Hadoop MapReduce job:
16 ITPro/Services/hdinsight/using-pig.md
@@ -27,7 +27,7 @@ In this tutorial, you will write Pig Latin statements to analyze an Apache log4j
* [Use Pig in the interactive mode](#interactivemode)
* [Use Pig in the batch mode](#batchmode)
* [Tutorial clean up](#cleanup)
-* [Next Steps](#nextsteps)
+* [Next steps](#nextsteps)
##<a id="usage"></a>The Pig Usage Case
Databases are great for small sets of data and low latency queries. However, when it comes to Big Data and large data sets in terabytes, traditional SQL databases are not the ideal solution. As database load increases and performance degrades, historically, database administrators have had to buy bigger hardware.
@@ -59,7 +59,7 @@ You will complete the following tasks in this tutorial:
##<a id="uploaddata"></a>Upload a Sample Log4j File to Windows Azure Blob Storage
-HDInsight provides two options for storing data, Windows Azure Blob Storage and Hadoop Distributed File system (HDFS). For more information on choosing file storage, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/using-blob-store). When you provision an HDInsight cluster, the provision process creates a Windows Azure Blob storage container as the default HDInsight file system. To simplify the tutorial procedures, you will use this container for storing the log4j file.
+HDInsight provides two options for storing data: Windows Azure Blob Storage and Hadoop Distributed File System (HDFS). For more information on choosing file storage, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/howto-blob-store). When you provision an HDInsight cluster, the provision process creates a Windows Azure Blob storage container as the default HDInsight file system. To simplify the tutorial procedures, you will use this container for storing the log4j file.
*Azure Storage Explorer* is a useful tool for inspecting and altering the data in your Windows Azure Storage. It is a free tool that can be downloaded from [http://azurestorageexplorer.codeplex.com/](http://azurestorageexplorer.codeplex.com/ "Azure Storage Explorer").
@@ -88,9 +88,9 @@ Before using the tool, you must know your Windows Azure storage account name and
12. Click **Close**.
13. From the **File** menu, click **Exit** to close Azure Storage Explorer.
-For information on access ASV, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/using-blob-store/).
+For information on accessing ASV, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/howto-blob-store/).
-##<a id="connect"></a>Connect to your HDInsight Cluster ##
+##<a id="connect"></a>Connect to Your HDInsight Cluster ##
You must have an HDInsight cluster provisioned before you can work on this tutorial. To enable the Windows Azure HDInsight Service preview, click [here](https://account.windowsazure.com/PreviewFeatures). For information on provisioning an HDInsight cluster, see [How to Administer HDInsight Service](/en-us/manage/services/hdinsight/howto-administer-hdinsight/) or [Getting Started with Windows Azure HDInsight Service](/en-us/manage/services/hdinsight/get-started-hdinsight/).
@@ -119,7 +119,7 @@ First, you will use Pig Latin in interactive mode (Grunt shell) to analyze a sin
<div class="dev-callout"> 
<b>Note</b> 
<p>To use asvs, you must provide the FQDN. For example: <br/>
-LOG = LOAD 'asvs://container@storagename.blob.core.microsoft.net/sample.log'. For more informaiton, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/using-blob-store/).</p> 
+LOG = LOAD 'asvs://container@storagename.blob.core.windows.net/sample.log'. For more information, see [Using Windows Azure Blob Storage with HDInsight](/en-us/manage/services/hdinsight/howto-blob-store/).</p> 
</div>
@@ -162,7 +162,7 @@ LOG = LOAD 'asvs://container@storagename.blob.core.microsoft.net/sample.log'. Fo
grunt> dump FILTEREDLEVELS;
- The output is similiar to the following:
+ The output is similar to the following:
(DEBUG)
(TRACE)
@@ -192,7 +192,7 @@ LOG = LOAD 'asvs://container@storagename.blob.core.microsoft.net/sample.log'. Fo
grunt> dump GROUPEDLEVELS;
- The output is similiar to the following:
+ The output is similar to the following:
(TRACE),(TRACE),(TRACE),(TRACE),(TRACE),(TRACE),(TRACE),(TRACE),
(TRACE),(TRACE),(TRACE),(TRACE),(TRACE),(TRACE),(TRACE),(TRACE),
@@ -215,7 +215,7 @@ LOG = LOAD 'asvs://container@storagename.blob.core.microsoft.net/sample.log'. Fo
grunt> dump FREQUENCIES;
- The output is similiar to the following:
+ The output is similar to the following:
(INFO,3355)
(WARN,361)
BIN  ITPro/Services/media/HDI.ASVSample.PNG
