Commit

Removed sample code.

rxin committed Sep 6, 2014
1 parent e9c3761 commit 0447c9f
Showing 2 changed files with 10 additions and 123 deletions.
2 changes: 1 addition & 1 deletion core/pom.xml
@@ -44,7 +44,7 @@
         </exclusion>
       </exclusions>
     </dependency>
-        <dependency>
+    <dependency>
       <groupId>net.java.dev.jets3t</groupId>
       <artifactId>jets3t</artifactId>
     </dependency>
131 changes: 9 additions & 122 deletions docs/openstack-integration.md
@@ -1,6 +1,6 @@
 ---
 layout: global
-title: OpenStack Integration
+title: OpenStack Swift Integration
 ---
 
 * This will become a table of contents (this text will be scraped).
@@ -9,16 +9,12 @@ title: OpenStack Integration
 
 # Accessing OpenStack Swift from Spark
 
-Spark's file interface allows it to process data in OpenStack Swift using the same URI
-formats that are supported for Hadoop. You can specify a path in Swift as input through a
-URI of the form <code>swift://<container.PROVIDER/path</code>. You will also need to set your
+Spark's support for Hadoop InputFormat allows it to process data in OpenStack Swift using the
+same URI formats as in Hadoop. You can specify a path in Swift as input through a
+URI of the form <code>swift://container.PROVIDER/path</code>. You will also need to set your
 Swift security credentials, through <code>core-sites.xml</code> or via
-<code>SparkContext.hadoopConfiguration</code>.
-Openstack Swift driver was merged in Hadoop version 2.3.0
-([Swift driver](https://issues.apache.org/jira/browse/HADOOP-8545)).
-Users that wish to use previous Hadoop versions will need to configure Swift driver manually.
-Current Swift driver requires Swift to use Keystone authentication method. There are recent efforts
-to support temp auth [Hadoop-10420](https://issues.apache.org/jira/browse/HADOOP-10420).
+<code>SparkContext.hadoopConfiguration</code>.
+Current Swift driver requires Swift to use Keystone authentication method.
 
 # Configuring Swift
 Proxy server of Swift should include <code>list_endpoints</code> middleware. More information
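A minimal Scala sketch of the <code>SparkContext.hadoopConfiguration</code> route described above, assuming the hypothetical <code>SparkTest</code> provider and the sample credentials used later on this page (the keys follow the <code>fs.swift.service.&lt;PROVIDER&gt;.*</code> convention of Hadoop's <code>hadoop-openstack</code> driver):

{% highlight scala %}
import org.apache.spark.{SparkConf, SparkContext}

// In spark-shell a SparkContext named sc already exists; one is created here
// only to keep the sketch self-contained.
val sc = new SparkContext(new SparkConf().setAppName("SwiftSketch").setMaster("local[2]"))

// Keystone credentials for the assumed provider "SparkTest".
sc.hadoopConfiguration.set("fs.swift.service.SparkTest.auth.url", "http://127.0.0.1:5000/v2.0/tokens")
sc.hadoopConfiguration.set("fs.swift.service.SparkTest.tenant", "test")
sc.hadoopConfiguration.set("fs.swift.service.SparkTest.username", "tester")
sc.hadoopConfiguration.set("fs.swift.service.SparkTest.password", "testing")

// Read an object from the assumed container "logs" via the swift:// scheme.
val data = sc.textFile("swift://logs.SparkTest/data.log")
println(data.count())
{% endhighlight %}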
@@ -27,9 +23,9 @@ available
 
 # Dependencies
 
-Spark should be compiled with <code>hadoop-openstack-2.3.0.jar</code> that is distributted with
-Hadoop 2.3.0. For the Maven builds, the <code>dependencyManagement</code> section of Spark's main
-<code>pom.xml</code> should include:
+The Spark application should include <code>hadoop-openstack</code> dependency.
+For example, for Maven support, add the following to the <code>pom.xml</code> file:
+
 {% highlight xml %}
 <dependencyManagement>
 ...
@@ -42,19 +38,6 @@ Hadoop 2.3.0. For the Maven builds, the <code>dependencyManagement</code> section
 </dependencyManagement>
 {% endhighlight %}
 
-In addition, both <code>core</code> and <code>yarn</code> projects should add
-<code>hadoop-openstack</code> to the <code>dependencies</code> section of their
-<code>pom.xml</code>:
-{% highlight xml %}
-<dependencies>
-  ...
-  <dependency>
-    <groupId>org.apache.hadoop</groupId>
-    <artifactId>hadoop-openstack</artifactId>
-  </dependency>
-  ...
-</dependencies>
-{% endhighlight %}
 
 # Configuration Parameters
 
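For sbt-based applications, a hypothetical equivalent of the Maven coordinate above (the 2.3.0 version is an assumption, taken from the Hadoop release cited elsewhere on this page):

{% highlight scala %}
// build.sbt, an assumed sbt equivalent of the hadoop-openstack Maven dependency above.
libraryDependencies += "org.apache.hadoop" % "hadoop-openstack" % "2.3.0"
{% endhighlight %}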
@@ -171,99 +154,3 @@ Notice that
 We suggest to keep those parameters in <code>core-sites.xml</code> for testing purposes when running Spark
 via <code>spark-shell</code>.
 For job submissions they should be provided via <code>sparkContext.hadoopConfiguration</code>.
-
-# Usage examples
-
-Assume Keystone's authentication URL is <code>http://127.0.0.1:5000/v2.0/tokens</code> and Keystone contains tenant <code>test</code>, user <code>tester</code> with password <code>testing</code>. In our example we define <code>PROVIDER=SparkTest</code>. Assume that Swift contains container <code>logs</code> with an object <code>data.log</code>. To access <code>data.log</code> from Spark the <code>swift://</code> scheme should be used.
-
-
-## Running Spark via spark-shell
-
-Make sure that <code>core-sites.xml</code> contains <code>fs.swift.service.SparkTest.tenant</code>, <code>fs.swift.service.SparkTest.username</code>,
-<code>fs.swift.service.SparkTest.password</code>. Run Spark via <code>spark-shell</code> and access Swift via <code>swift://</code> scheme.
-
-{% highlight scala %}
-val sfdata = sc.textFile("swift://logs.SparkTest/data.log")
-sfdata.count()
-{% endhighlight %}
-
-
-## Sample Application
-
-In this case <code>core-sites.xml</code> need not contain <code>fs.swift.service.SparkTest.tenant</code>, <code>fs.swift.service.SparkTest.username</code>,
-<code>fs.swift.service.SparkTest.password</code>. Example of Java usage:
-
-{% highlight java %}
-/* SimpleApp.java */
-import org.apache.spark.api.java.*;
-import org.apache.spark.SparkConf;
-import org.apache.spark.api.java.function.Function;
-
-public class SimpleApp {
-  public static void main(String[] args) {
-    String logFile = "swift://logs.SparkTest/data.log";
-    SparkConf conf = new SparkConf().setAppName("Simple Application");
-    JavaSparkContext sc = new JavaSparkContext(conf);
-    // The provider name must match the PROVIDER used in the swift:// URI ("SparkTest").
-    sc.hadoopConfiguration().set("fs.swift.service.SparkTest.tenant", "test");
-    sc.hadoopConfiguration().set("fs.swift.service.SparkTest.password", "testing");
-    sc.hadoopConfiguration().set("fs.swift.service.SparkTest.username", "tester");
-
-    JavaRDD<String> logData = sc.textFile(logFile).cache();
-    long num = logData.count();
-
-    System.out.println("Total number of lines: " + num);
-  }
-}
-{% endhighlight %}
-
-The directory structure is
-{% highlight bash %}
-./src
-./src/main
-./src/main/java
-./src/main/java/SimpleApp.java
-{% endhighlight %}
-
-Maven pom.xml should contain:
-{% highlight xml %}
-<project>
-  <groupId>edu.berkeley</groupId>
-  <artifactId>simple-project</artifactId>
-  <modelVersion>4.0.0</modelVersion>
-  <name>Simple Project</name>
-  <packaging>jar</packaging>
-  <version>1.0</version>
-  <repositories>
-    <repository>
-      <id>Akka repository</id>
-      <url>http://repo.akka.io/releases</url>
-    </repository>
-  </repositories>
-  <build>
-    <plugins>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-compiler-plugin</artifactId>
-        <version>2.3</version>
-        <configuration>
-          <source>1.6</source>
-          <target>1.6</target>
-        </configuration>
-      </plugin>
-    </plugins>
-  </build>
-  <dependencies>
-    <dependency> <!-- Spark dependency -->
-      <groupId>org.apache.spark</groupId>
-      <artifactId>spark-core_2.10</artifactId>
-      <version>1.0.0</version>
-    </dependency>
-  </dependencies>
-</project>
-{% endhighlight %}
-
-Compile and execute:
-{% highlight bash %}
-mvn package
-$SPARK_HOME/bin/spark-submit --class SimpleApp --master local[4] target/simple-project-1.0.jar
-{% endhighlight %}
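For Scala users, a hypothetical port of the removed Java sample above, under the same assumptions (the <code>SparkTest</code> provider and the sample credentials):

{% highlight scala %}
/* SimpleApp.scala, an assumed Scala equivalent of the removed Java sample. */
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "swift://logs.SparkTest/data.log"
    val sc = new SparkContext(new SparkConf().setAppName("Simple Application"))
    // Credentials are set programmatically, so core-sites.xml may omit them.
    sc.hadoopConfiguration.set("fs.swift.service.SparkTest.tenant", "test")
    sc.hadoopConfiguration.set("fs.swift.service.SparkTest.username", "tester")
    sc.hadoopConfiguration.set("fs.swift.service.SparkTest.password", "testing")

    val logData = sc.textFile(logFile).cache()
    println("Total number of lines: " + logData.count())
  }
}
{% endhighlight %}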
