From 6e54ffaef027f3b369eea99e70e5c9cfeb66f471 Mon Sep 17 00:00:00 2001 From: Sebastian Kunert Date: Tue, 1 Jul 2014 15:30:41 +0200 Subject: [PATCH 1/6] modified gh_link to support naming --- docs/_plugins/gh_link.rb | 6 +- docs/faq.md | 4 +- docs/iterations.md | 30 ++++----- docs/java_api_examples.md | 16 ++--- docs/java_api_guide.md | 73 ++++++++++------------ docs/local_execution.md | 4 +- docs/scala_api_examples.md | 16 ++--- docs/scala_api_guide.md | 43 ++++++++++++- docs/scala_api_quickstart.md | 114 ++++++++++++++++------------------- docs/spargel_guide.md | 19 +++--- docs/web_client.md | 2 +- docs/yarn_setup.md | 4 +- 12 files changed, 178 insertions(+), 153 deletions(-) diff --git a/docs/_plugins/gh_link.rb b/docs/_plugins/gh_link.rb index b7dda3e69b0fb..fdddaad5f4ce9 100644 --- a/docs/_plugins/gh_link.rb +++ b/docs/_plugins/gh_link.rb @@ -8,6 +8,7 @@ def initialize(tag_name, input, tokens) def render(context) input = @input.split + name = @input.split.drop(2).join(" ") config = context.registers[:site].config path = input[0] @@ -19,9 +20,10 @@ def render(context) # 2. 'gh_link_tag' of page frontmatter # 3. "master" (default) gh_tag = input[1].nil? ? (page_gh_tag.nil? ? "master" : page_gh_tag) : input[1] - + name = name.to_s == '' ? file : name + #refname = input[2].nil? ? file : input[2] - "#{file}" + "#{name}" end end end diff --git a/docs/faq.md b/docs/faq.md index 3ceb527411a1f..143453b303193 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -77,7 +77,7 @@ As a rule-of-thumb, the number of buffers should be at least ## My job fails early with a java.io.EOFException. What could be the cause? -Note: In version 0.4, the delta iterations limit the solution set to +Note: In version _0.4_, the delta iterations limit the solution set to records with fixed-length data types. We will in the next version. 
The most common cause of this exception is when Stratosphere is set up with the @@ -100,7 +100,7 @@ Call to failed on local exception: java.io.EOFException ``` Please refer to the [download page](http://stratosphere.eu/downloads/#maven) and -the [build instructions](https://github.com/stratosphere/stratosphere/blob/master/README.md) +the {% gh_link README.md master build instructions %} for details on how to set up Stratosphere for different Hadoop and HDFS versions. ## My program does not compute the correct result. Why are my custom key types diff --git a/docs/iterations.md b/docs/iterations.md index 063f1219d74e1..c56e0f58280e2 100644 --- a/docs/iterations.md +++ b/docs/iterations.md @@ -6,10 +6,11 @@ Iterative algorithms occur in many domains of data analysis, such as *machine le Stratosphere programs implement iterative algorithms by defining a **step function** and embedding it into a special iteration operator. There are two variants of this operator: **Iterate** and **Delta Iterate**. Both operators repeatedly invoke the step function on the current iteration state until a certain termination condition is reached. -Here, we provide background on both operator variants and outline their usage. The [programming guides]({{ site.baseurl }}/docs/0.4/programming_guides/) explain how to implement the operators in both [Scala]({{ site.baseurl }}/docs/0.4/programming_guides/scala.html) and [Java]({{ site.baseurl }}/docs/0.4/programming_guides/java.html#iterations). We also provide a **vertex-centric graph processing API** called [Spargel]({{ site.baseurl }}/docs/0.4/programming_guides/spargel.html). +Here, we provide background on both operator variants and outline their usage. The [programming guides](java_api_guide.html) explain how to implement the operators in both [Scala](scala_api_guide.html) and [Java](java_api_guide.html#iterations). We also provide a **vertex-centric graph processing API** called [Spargel](spargel_guide.html). 
The following table provides an overview of both operators: + @@ -64,11 +65,11 @@ Iterate Operator The **iterate operator** covers the *simple form of iterations*: in each iteration, the **step function** consumes the **entire input** (the *result of the previous iteration*, or the *initial data set*), and computes the **next version of the partial solution** (e.g. `map`, `reduce`, `join`, etc.).

- Iterate Operator + Iterate Operator

1. **Iteration Input**: Initial input for the *first iteration* from a *data source* or *previous operators*. - 2. **Step Function**: The step function will be executed in each iteration. It is an arbitrary data flow consisting of operators like `map`, `reduce`, `join`, etc. (see [programming model]({{ site.baseurl }}/docs/0.4/programming_guides/pmodel.html) for details) and depends on your specific task at hand. + 2. **Step Function**: The step function will be executed in each iteration. It is an arbitrary data flow consisting of operators like `map`, `reduce`, `join`, etc. and depends on your specific task at hand. 3. **Next Partial Solution**: In each iteration, the output of the step function will be fed back into the *next iteration*. 4. **Iteration Result**: Output of the *last iteration* is written to a *data sink* or used as input to the *following operators*. @@ -79,7 +80,7 @@ There are multiple options to specify **termination conditions** for an iteratio You can also think about the iterate operator in pseudo-code: -{% highlight java %} +```java IterationState state = getInitialState(); while (!terminationCriterion()) { @@ -87,11 +88,11 @@ while (!terminationCriterion()) { } setFinalState(state); -{% endhighlight %} +```
- See the Scala and Java programming guides for details and code examples.
+ See the Scala and Java programming guides for details and code examples.
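The pseudo-code above can also be rendered as runnable plain Java. The sketch below stands outside the Stratosphere API (class and method names are illustrative); it uses the incrementing-numbers task from this page as the step function and a fixed iteration count as the termination criterion:

```java
import java.util.List;
import java.util.stream.Collectors;

public class BulkIterationSketch {

    // Step function: one full pass over the entire partial solution.
    static List<Integer> step(List<Integer> state) {
        return state.stream().map(x -> x + 1).collect(Collectors.toList());
    }

    // Termination here is simply a fixed number of iterations.
    static List<Integer> iterate(List<Integer> initial, int maxIterations) {
        List<Integer> state = initial;
        for (int i = 0; i < maxIterations; i++) {
            state = step(state);
        }
        return state;
    }

    public static void main(String[] args) {
        // Incrementing 1..5 ten times yields 11..15.
        System.out.println(iterate(List.of(1, 2, 3, 4, 5), 10));
    }
}
```

Note how the entire input is consumed and reproduced in every superstep; this is exactly the property that the delta iterate operator below relaxes.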
### Example: Incrementing Numbers @@ -99,7 +100,7 @@ setFinalState(state); In the following example, we **iteratively increment a set of numbers**:

- Iterate Operator Example + Iterate Operator Example

1. **Iteration Input**: The initial input is read from a data source and consists of five single-field records (integers `1` to `5`). @@ -128,18 +129,19 @@ The **delta iterate operator** covers the case of **incremental iterations**. In Where applicable, this leads to **more efficient algorithms**, because not every element in the solution set changes in each iteration. This allows the iteration to **focus on the hot parts** of the solution and leave the **cold parts untouched**. Frequently, the majority of the solution cools down comparatively fast and the later iterations operate only on a small subset of the data.

- Delta Iterate Operator + Delta Iterate Operator

1. **Iteration Input**: The initial workset and solution set are read from *data sources* or *previous operators* as input to the first iteration. - 2. **Step Function**: The step function will be executed in each iteration. It is an arbitrary data flow consisting of operators like `map`, `reduce`, `join`, etc. (see [programming model]({{ site.baseurl }}/docs/0.4/programming_guides/pmodel.html) for details) and depends on your specific task at hand. + 2. **Step Function**: The step function will be executed in each iteration. It is an arbitrary data flow consisting of operators like `map`, `reduce`, `join`, etc. and depends on your specific task at hand. 3. **Next Workset/Update Solution Set**: The *next workset* drives the iterative computation and will be fed back into the *next iteration*. Furthermore, the solution set will be updated and implicitly forwarded (it is not required to be rebuilt). Both data sets can be updated by different operators of the step function. 4. **Iteration Result**: After the *last iteration*, the *solution set* is written to a *data sink* or used as input to the *following operators*. The default **termination condition** for delta iterations is specified by the **empty workset convergence criterion** and a **maximum number of iterations**. The iteration will terminate when a produced *next workset* is empty or when the maximum number of iterations is reached. It is also possible to specify a **custom aggregator** and **convergence criterion**. You can also think about the delta iterate operator in pseudo-code: - + +```java IterationState workset = getInitialState(); IterationState solution = getInitialSolution(); while (!terminationCriterion()) { @@ -150,11 +152,11 @@ } setFinalState(solution); -{% endhighlight %} +```
- See the Scala and Java programming guides for details and code examples.
+ See the Scala and Java programming guides for details and code examples.
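The workset loop can likewise be sketched in plain, framework-free Java. Everything here is illustrative (it is not the Stratosphere API); the step function, which decrements positive values until they reach zero, merely stands in for an arbitrary data flow, and the loop shows the empty-workset termination:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DeltaIterationSketch {

    // Runs step() until the workset is empty or maxIterations is hit,
    // folding the produced deltas back into the solution set.
    static Map<Integer, Integer> iterate(Map<Integer, Integer> solution,
                                         Set<Integer> workset,
                                         int maxIterations) {
        for (int i = 0; i < maxIterations && !workset.isEmpty(); i++) {
            Map<Integer, Integer> deltas = step(workset, solution);
            solution.putAll(deltas);                  // update solution set in place
            workset = new HashSet<>(deltas.keySet()); // next workset = changed elements
        }
        return solution;
    }

    // Illustrative step function: decrement every value that is still positive.
    static Map<Integer, Integer> step(Set<Integer> workset, Map<Integer, Integer> solution) {
        Map<Integer, Integer> deltas = new HashMap<>();
        for (Integer key : workset) {
            int value = solution.get(key);
            if (value > 0) {
                deltas.put(key, value - 1);
            }
        }
        return deltas;
    }
}
```

Elements that produce no delta drop out of the workset and are never touched again, which is the "cold parts untouched" behavior described above.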
### Example: Propagate Minimum in Graph @@ -162,7 +164,7 @@ setFinalState(solution); In the following example, every vertex has an **ID** and a **coloring**. Each vertex will propagate its vertex ID to neighboring vertices. The **goal** is to *assign the minimum ID to every vertex in a subgraph*. If a received ID is smaller than the current one, the vertex changes its color to that of the vertex with the received ID. One application of this can be found in *community analysis* or *connected components* computation.

- Delta Iterate Operator Example + Delta Iterate Operator Example

The **initial input** is set as **both workset and solution set.** In the above figure, the colors visualize the **evolution of the solution set**. With each iteration, the color of the minimum ID spreads in the respective subgraph. At the same time, the amount of work (exchanged and compared vertex IDs) decreases with each iteration. This corresponds to the **decreasing size of the workset**, which goes from all seven vertices to zero after three iterations, at which time the iteration terminates. The **important observation** is that *the lower subgraph converges before the upper half* does and the delta iteration is able to capture this with the workset abstraction. @@ -183,6 +185,6 @@ Superstep Synchronization We referred to each execution of the step function of an iteration operator as *a single iteration*. In parallel setups, **multiple instances of the step function are evaluated in parallel** on different partitions of the iteration state. In many settings, one evaluation of the step function on all parallel instances forms a so-called **superstep**, which is also the granularity of synchronization. Therefore, *all* parallel tasks of an iteration need to complete the superstep before the next superstep is initialized. **Termination criteria** are also evaluated at superstep barriers.
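The shrinking workset can be simulated directly. Below is a plain-Java sketch of the minimum-ID propagation; the graph encoding and all names are illustrative, not the Stratosphere API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MinIdPropagation {

    // edges: undirected adjacency lists; returns vertex -> minimum ID in its subgraph.
    static Map<Integer, Integer> propagate(Map<Integer, Set<Integer>> edges) {
        Map<Integer, Integer> solution = new HashMap<>();
        for (Integer v : edges.keySet()) {
            solution.put(v, v); // initially, every vertex carries its own ID
        }
        Set<Integer> workset = new HashSet<>(edges.keySet());

        while (!workset.isEmpty()) { // empty workset = convergence
            Set<Integer> nextWorkset = new HashSet<>();
            for (Integer v : workset) {
                int candidate = solution.get(v);
                for (Integer neighbor : edges.get(v)) {
                    if (candidate < solution.get(neighbor)) {
                        solution.put(neighbor, candidate); // smaller ID wins
                        nextWorkset.add(neighbor);         // only changed vertices stay hot
                    }
                }
            }
            workset = nextWorkset; // converged (cold) vertices drop out
        }
        return solution;
    }
}
```

A subgraph that converges early contributes nothing to later worksets, which is exactly the effect the figure's color evolution illustrates.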

- Supersteps + Supersteps

\ No newline at end of file diff --git a/docs/java_api_examples.md b/docs/java_api_examples.md index ddb43e6ea95f7..e71c7b63df3c6 100644 --- a/docs/java_api_examples.md +++ b/docs/java_api_examples.md @@ -4,9 +4,9 @@ title: "Java API Examples" The following example programs showcase different applications of Stratosphere from simple word counting to graph algorithms. The code samples illustrate the -use of **[Stratosphere's Java API]({{site.baseurl}}/docs/{{site current_stable}}/programming_guides/java.html)**. +use of [Stratosphere's Java API](java_api_guide.html). -The full source code of the following and more examples can be found in the **[stratosphere-java-examples](https://github.com/stratosphere/stratosphere/tree/release-{{site.current_stable}}/stratosphere-examples/stratosphere-java-examples)** module. +The full source code of the following and more examples can be found in the __stratosphere-java-examples__ module. # Word Count WordCount is the "Hello World" of Big Data processing systems. It computes the frequency of words in a text collection. The algorithm works in two steps: First, the text is split into individual words. Second, the words are grouped and counted. @@ -42,13 +42,13 @@ public static final class Tokenizer extends FlatMapFunction, `. As test data, any text file will do. +The [WordCount example](https://github.com/apache/incubator-flink/blob/cd665b9e8abec2bbfecf384fe7273bd50f22ce67/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/wordcount/WordCount.java) implements the above-described algorithm with input parameters: `, `. As test data, any text file will do. # Page Rank The PageRank algorithm computes the "importance" of pages in a graph defined by links, which point from one page to another. It is an iterative graph algorithm, which means that it repeatedly applies the same computation. 
In each iteration, each page distributes its current rank over all its neighbors, and computes its new rank as a taxed sum of the ranks it received from its neighbors. The PageRank algorithm was popularized by the Google search engine, which uses the importance of webpages to rank the results of search queries. -In this simple example, PageRank is implemented with a [bulk iteration]({{site.baseurl}}/docs/{{site.current_stable}}/programming_guides/java.html#iterations) and a fixed number of iterations. +In this simple example, PageRank is implemented with a [bulk iteration](java_api_guide.html#iterations) and a fixed number of iterations. ```java // get input data @@ -118,7 +118,7 @@ public static final class EpsilonFilter } ``` -The [PageRank program](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/graph/PageRankBasic.java) implements the above example. +The [PageRank program](https://github.com/apache/incubator-flink/blob/ca2b287a7a78328ebf43766b9fdf39b56fb5fd4f/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/graph/PageRankBasic.java) implements the above example. It requires the following parameters to run: `, , , , `. Input files are plain text files and must be formatted as follows: @@ -133,7 +133,7 @@ For this simple implementation it is required that each page has at least one in The Connected Components algorithm identifies parts of a larger graph which are connected by assigning all vertices in the same connected part the same component ID. Similar to PageRank, Connected Components is an iterative algorithm. In each step, each vertex propagates its current component ID to all its neighbors. A vertex accepts the component ID from a neighbor if it is smaller than its own component ID. 
-This implementation uses a [delta iteration]({{site.baseurl}}/docs/{{site.current_stable}}/programming_guides/java.html#iterations): Vertices that have not changed their component ID do not participate in the next step. This yields much better performance, because the later iterations typically deal only with a few outlier vertices. +This implementation uses a [delta iteration](iterations.html): Vertices that have not changed their component ID do not participate in the next step. This yields much better performance, because the later iterations typically deal only with a few outlier vertices. ```java // read vertex and edge data @@ -209,7 +209,7 @@ public static final class ComponentIdFilter } ``` -The [ConnectedComponents program](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/graph/ConnectedComponents.java) implements the above example. It requires the following parameters to run: `, , `. +The [ConnectedComponents program](https://github.com/apache/incubator-flink/blob/ca2b287a7a78328ebf43766b9fdf39b56fb5fd4f/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/graph/ConnectedComponents.java) implements the above example. It requires the following parameters to run: `, , `. Input files are plain text files and must be formatted as follows: - Vertices represented as IDs and separated by new-line characters. @@ -280,7 +280,7 @@ DataSet> priceSums = priceSums.writeAsCsv(outputPath); ``` -The [Relational Query program](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/relational/RelationalQuery.java) implements the above query. It requires the following parameters to run: `, , `. 
+The [Relational Query program](https://github.com/apache/incubator-flink/blob/ca2b287a7a78328ebf43766b9fdf39b56fb5fd4f/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/relational/RelationalQuery.java) implements the above query. It requires the following parameters to run: `, , `. The orders and lineitem files can be generated using the [TPC-H benchmark](http://www.tpc.org/tpch/) suite's data generator tool (DBGEN). Take the following steps to generate arbitrarily large input files for the provided Stratosphere programs: diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index c09c26a13ade9..e30dc979b78c6 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1,7 +1,7 @@ --- title: "Java API Programming Guide" --- -
Java API ======== @@ -12,10 +12,9 @@ Introduction Analysis programs in Stratosphere are regular Java programs that implement transformations on data sets (e.g., filtering, mapping, joining, grouping). The data sets are initially created from certain sources (e.g., by reading files, or from collections). Results are returned via sinks, which may for example write the data to (distributed) files, or to standard output (for example the command line terminal). Stratosphere programs run in a variety of contexts, standalone, or embedded in other programs. The execution can happen in a local JVM, or on clusters of many machines. In order to create your own Stratosphere program, we encourage you to start with the [program skeleton](#skeleton) and gradually add your own [transformations](#transformations). The remaining sections act as references for additional operations and advanced features. - -
+
{% for sublink in page.toc %} @@ -24,8 +23,6 @@ In order to create your own Stratosphere program, we encourage you to start with
-
-
Example Program --------------- @@ -62,8 +59,7 @@ public class WordCountExample { } ``` - -
+[Back to top](#top)
Linking with Stratosphere @@ -73,16 +69,16 @@ To write programs with Stratosphere, you need to include Stratosphere’s Java A The simplest way to do this is to use the [quickstart scripts]({{site.baseurl}}/quickstart/java.html). They create a blank project from a template (a Maven Archetype), which sets up everything for you. To manually create the project, you can use the archetype and create a project by calling: -{% highlight bash %} +```bash mvn archetype:generate \ -DarchetypeGroupId=eu.stratosphere \ -DarchetypeArtifactId=quickstart-java \ -DarchetypeVersion={{site.docs_05_stable}} -{% endhighlight %} +``` If you want to add Stratosphere to an existing Maven project, add the following entry to your *dependencies* section in the *pom.xml* file of your project: -{% highlight xml %} <dependency> <groupId>eu.stratosphere</groupId> <artifactId>stratosphere-java</artifactId> <version>{{site.docs_05_stable}}</version> </dependency> <dependency> <groupId>eu.stratosphere</groupId> <artifactId>stratosphere-clients</artifactId> <version>{{site.docs_05_stable}}</version> </dependency> -{% endhighlight %} +``` In order to link against the latest SNAPSHOT versions of the code, please follow [this guide]({{site.baseurl}}/downloads/#nightly). The *stratosphere-clients* dependency is only necessary to invoke the Stratosphere program locally (for example to run it standalone for testing and debugging). If you intend to only export the program as a JAR file and [run it on a cluster]({{site.baseurl}}/docs/0.5/program_execution/cluster_execution.html), you can skip that dependency. -
+[Back to top](#top)
Program Skeleton @@ -206,8 +201,7 @@ Once you specified the complete program you need to call `execute` on the `ExecutionEnvironment`. This will either execute on your local machine or submit your program for execution on a cluster, depending on how you created the execution environment. - -
+[Back to top](#top)
Lazy Evaluation @@ -216,7 +210,6 @@ Lazy Evaluation All Stratosphere programs are executed lazily: When the program's main method is executed, the data loading and transformations do not happen directly. Rather, each operation is created and added to the program's plan. The operations are actually executed when one of the `execute()` methods is invoked on the ExecutionEnvironment object. Whether the program is executed locally or on a cluster depends on the environment of the program. The lazy evaluation lets you construct sophisticated programs that Stratosphere executes as one holistically planned unit. -
Data Types @@ -346,8 +339,8 @@ The type inference has its limits and needs the "cooperation" of the programmer The [ResultTypeQueryable](https://github.com/stratosphere/stratosphere/blob/{{ site.docs_05_stable_gh_tag }}/stratosphere-java/src/main/java/eu/stratosphere/api/java/typeutils/ResultTypeQueryable.java) interface can be implemented by input formats and functions to tell the API explicitly about their return type. The *input types* that the functions are invoked with can usually be inferred by the result types of the previous operations. - -
+[Back to top](#top) +
Data Transformations @@ -992,8 +985,8 @@ DataSet> unioned = vals1.union(vals2) ``` - -
+[Back to top](#top) +
Data Sources @@ -1061,8 +1054,8 @@ DataSet dbData = // Note: Stratosphere's program compiler needs to infer the data types of the data items which are returned by an InputFormat. If this information cannot be automatically inferred, it is necessary to manually provide the type information as shown in the examples above. ``` - -
+[Back to top](#top) +
Data Sinks @@ -1120,8 +1113,8 @@ myResult.output( ); ``` - -
+[Back to top](#top) +
Debugging @@ -1155,9 +1148,7 @@ env.execute(); Providing input for an analysis program and checking its output is cumbersome when done by creating input files and reading output files. Stratosphere features special data sources and sinks which are backed by Java collections to ease testing. Once a program has been tested, the sources and sinks can be easily replaced by sources and sinks that read from / write to external data stores such as HDFS. -

Collection data sources can be used as follows: -

```java final ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(); @@ -1176,9 +1167,7 @@ DataSet myLongs = env.fromCollection(longIt, Long.class); **Note:** Currently, the collection data source requires that data types and iterators implement `Serializable`. Furthermore, collection data sources can not be executed in parallel (degree of parallelism = 1). -

A collection data sink is specified as follows: -

```java DataSet> myResult = ... @@ -1189,8 +1178,8 @@ myResult.output(new LocalCollectionOutputFormat(outData)); **Note:** Collection data sources will only work correctly, if the whole program is executed in the same JVM! - -
+[Back to top](#top) +
Iteration Operators @@ -1280,8 +1269,8 @@ iteration.closeWith(deltas, nextWorkset) .writeAsCsv(outputPath); ``` - -
+[Back to top](#top) +
Semantic Annotations @@ -1320,8 +1309,8 @@ The following annotations are currently available: **Note**: It is important to be conservative when providing annotations. Only annotate fields, when they are always constant for every call to the function. Otherwise the system has incorrect assumptions about the execution and the execution may produce wrong results. If the behavior of the operator is not clearly predictable, no annotation should be provided. - -
+[Back to top](#top) +
Broadcast Variables @@ -1358,8 +1347,8 @@ Make sure that the names (`broadcastSetName` in the previous example) match when **Note**: As the content of broadcast variables is kept in-memory on each node, it should not become too large. For simpler things like scalar values you can simply make parameters part of the closure of a function, or use the `withParameters(...)` method to pass in a configuration. - -
+[Back to top](#top) +
Program Packaging & Distributed Execution @@ -1387,8 +1376,8 @@ The overall procedure to invoke a packaged program is as follows: 2. If the entry point class implements the `eu.stratosphere.api.common.Program` interface, then the system calls the `getPlan(String...)` method to obtain the program plan to execute. The `getPlan(String...)` method was the only possible way of defining a program in the *Record API* (see [0.4 docs]({{ site.baseurl }}/docs/0.4/)) and is also supported in the new Java API. 3. If the entry point class does not implement the `eu.stratosphere.api.common.Program` interface, the system will invoke the main method of the class. -
+[Back to top](#top) +
Accumulators & Counters @@ -1432,8 +1421,8 @@ To implement your own accumulator you simply have to write your implementation o You have the choice to implement either [Accumulator](https://github.com/stratosphere/stratosphere/blob/{{ site.docs_05_stable_gh_tag }}/stratosphere-core/src/main/java/eu/stratosphere/api/common/accumulators/Accumulator.java) or [SimpleAccumulator](https://github.com/stratosphere/stratosphere/blob/{{ site.docs_05_stable_gh_tag }}/stratosphere-core/src/main/java/eu/stratosphere/api/common/accumulators/SimpleAccumulator.java). ```Accumulator``` is most flexible: It defines a type ```V``` for the value to add, and a result type ```R``` for the final result. E.g. for a histogram, ```V``` is a number and ```R``` is a histogram. ```SimpleAccumulator``` is for the cases where both types are the same, e.g. for counters. - -
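As a rough sketch of that contract (a simplified stand-in, not the actual Stratosphere `Accumulator` interface, which defines more than this), a histogram accumulator might add values of type `V` (here `int`) and expose a result of type `R` (here a sorted map):

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for the Accumulator<V, R> idea described above;
// illustrative only, the real interface has additional obligations.
public class HistogramAccumulator {

    private final TreeMap<Integer, Integer> counts = new TreeMap<>();

    // V = Integer: record one observed value.
    public void add(int value) {
        counts.merge(value, 1, Integer::sum);
    }

    // Merge a partial result from another parallel instance.
    public void merge(HistogramAccumulator other) {
        other.counts.forEach((k, v) -> counts.merge(k, v, Integer::sum));
    }

    // R = Map<Integer, Integer>: the final histogram.
    public Map<Integer, Integer> getLocalValue() {
        return counts;
    }
}
```

The merge step is what lets each parallel task keep a cheap local histogram that the system combines into one result at the end of the job.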
+[Back to top](#top) +
Execution Plans @@ -1472,5 +1461,5 @@ The script to start the webinterface is located under ```bin/start-webclient.sh` You are able to specify program arguments in the textbox at the bottom of the page. Checking the plan visualization checkbox shows the execution plan before executing the actual program. - -
+[Back to top](#top) + diff --git a/docs/local_execution.md b/docs/local_execution.md index cd60f62c58b4e..d7ed37c28a3a7 100644 --- a/docs/local_execution.md +++ b/docs/local_execution.md @@ -6,7 +6,7 @@ title: "Local Execution" Stratosphere can run on a single machine, even in a single Java Virtual Machine. This allows users to test and debug Stratosphere programs locally. This section gives an overview of the local execution mechanisms. -**NOTE:** Please also refer to the [debugging section]({{site.baseurl}}/docs/0.5/programming_guides/java.html#debugging) in the Java API documentation for a guide to testing and local debugging utilities in the Java API. +**NOTE:** Please also refer to the [debugging section](java_api_guide.html#debugging) in the Java API documentation for a guide to testing and local debugging utilities in the Java API. The local environments and executors allow you to run Stratosphere programs in a local Java Virtual Machine, or within any JVM as part of existing programs. Most examples can be launched locally by simply hitting the "Run" button of your IDE. @@ -35,7 +35,7 @@ The `LocalEnvironment` is a handle to local execution for Stratosphere programs. The local environment is instantiated via the method `ExecutionEnvironment.createLocalEnvironment()`. By default, it will use as many local threads for execution as your machine has CPU cores (hardware contexts). You can alternatively specify the desired parallelism. The local environment can be configured to log to the console using `enableLogging()`/`disableLogging()`.
+In most cases, calling `ExecutionEnvironment.getExecutionEnvironment()` is the even better way to go. That method returns a `LocalEnvironment` when the program is started locally (outside the command line interface), and it returns a pre-configured environment for cluster execution, when the program is invoked by the [command line interface](cli.html). ```java public static void main(String[] args) throws Exception { diff --git a/docs/scala_api_examples.md b/docs/scala_api_examples.md index ac930b3ad2eb3..d948576d9d32e 100644 --- a/docs/scala_api_examples.md +++ b/docs/scala_api_examples.md @@ -3,9 +3,9 @@ title: "Scala API Examples" --- The following example programs showcase different applications of Stratosphere from simple word counting to graph algorithms. -The code samples illustrate the use of **[Stratosphere's Scala API]({{site.baseurl}}/docs/{{site.current_stable}}/programming_guides/scala.html)**. +The code samples illustrate the use of [Stratosphere's Scala API](scala_api_guide.html). -The full source code of the following and more examples can be found in the **[stratosphere-scala-examples](https://github.com/stratosphere/stratosphere/tree/release-{{site.current_stable}}/stratosphere-examples/stratosphere-scala-examples)** module. +The full source code of the following and more examples can be found in the [stratosphere-scala-examples](https://github.com/apache/incubator-flink/tree/ca2b287a7a78328ebf43766b9fdf39b56fb5fd4f/stratosphere-examples/stratosphere-scala-examples) module. # Word Count @@ -25,13 +25,13 @@ val counts = words.groupBy { case (word, _) => word } val output = counts.write(wordsOutput, CsvOutputFormat())) ``` -The [WordCount example](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-scala-examples/src/main/scala/eu/stratosphere/examples/scala/wordcount/WordCount.scala) implements the above described algorithm with input parameters: `, , `. 
As test data, any text file will do. +The [WordCount example](https://github.com/apache/incubator-flink/blob/b746f452e7187dad08340b9cfdc2fa18a516a6c7/stratosphere-examples/stratosphere-scala-examples/src/main/scala/eu/stratosphere/examples/scala/wordcount/WordCount.scala) implements the above-described algorithm with input parameters: `, , `. As test data, any text file will do. # Page Rank The PageRank algorithm computes the "importance" of pages in a graph defined by links, which point from one page to another. It is an iterative graph algorithm, which means that it repeatedly applies the same computation. In each iteration, each page distributes its current rank over all its neighbors, and computes its new rank as a taxed sum of the ranks it received from its neighbors. The PageRank algorithm was popularized by the Google search engine, which uses the importance of webpages to rank the results of search queries. -In this simple example, PageRank is implemented with a [bulk iteration]({{site.baseurl}}/docs/{{site.current_stable}}/programming_guides/java.html#iterations) and a fixed number of iterations. +In this simple example, PageRank is implemented with a [bulk iteration](java_api_guide.html#iterations) and a fixed number of iterations. ```scala // case classes so we have named fields @@ -71,7 +71,7 @@ val output = finalRanks.write(outputPath, CsvOutputFormat()) -The [PageRank program](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-scala-examples/src/main/scala/eu/stratosphere/examples/scala/graph/PageRank.scala) implements the above example. +The [PageRank program](https://github.com/apache/incubator-flink/blob/b746f452e7187dad08340b9cfdc2fa18a516a6c7/stratosphere-examples/stratosphere-scala-examples/src/main/scala/eu/stratosphere/examples/scala/graph/PageRank.scala) implements the above example.
Input files are plain text files and must be formatted as follows: @@ -86,7 +86,7 @@ For this simple implementation it is required that each page has at least one in The Connected Components algorithm identifies parts of a larger graph which are connected by assigning all vertices in the same connected part the same component ID. Similar to PageRank, Connected Components is an iterative algorithm. In each step, each vertex propagates its current component ID to all its neighbors. A vertex accepts the component ID from a neighbor, if it is smaller than its own component ID. -This implementation uses a [delta iteration]({{site.baseurl}}/docs/{{site.current_stable}}/programming_guides/java.html#iterations): Vertices that have not changed their component id do not participate in the next step. This yields much better performance, because the later iterations typically deal only with a few outlier vertices. +This implementation uses a [delta iteration](iterations.html): Vertices that have not changed their component id do not participate in the next step. This yields much better performance, because the later iterations typically deal only with a few outlier vertices. ```scala // define case classes @@ -123,7 +123,7 @@ val components = initialComponents.iterateWithDelta(initialComponents, { _.verte val output = components.write(componentsOutput, CsvOutputFormat()) ``` -The [ConnectedComponents program](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-scala-examples/src/main/scala/eu/stratosphere/examples/scala/graph/ConnectedComponents.scala) implements the above example. It requires the following parameters to run: `, , `. +The [ConnectedComponents program](https://github.com/apache/incubator-flink/blob/b746f452e7187dad08340b9cfdc2fa18a516a6c7/stratosphere-examples/stratosphere-scala-examples/src/main/scala/eu/stratosphere/examples/scala/graph/ConnectedComponents.scala) implements the above example. 
It requires the following parameters to run: `, , `. Input files are plain text files and must be formatted as follows: - Vertices represented as IDs and separated by new-line characters. @@ -171,7 +171,7 @@ val prioritizedOrders = prioritizedItems val output = prioritizedOrders.write(ordersOutput, CsvOutputFormat(formatOutput)) ``` -The [Relational Query program](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-scala-examples/src/main/scala/eu/stratosphere/examples/scala/relational/RelationalQuery.scala) implements the above query. It requires the following parameters to run: `, , , `. +The [Relational Query program](https://github.com/apache/incubator-flink/blob/b746f452e7187dad08340b9cfdc2fa18a516a6c7/stratosphere-examples/stratosphere-scala-examples/src/main/scala/eu/stratosphere/examples/scala/relational/RelationalQuery.scala) implements the above query. It requires the following parameters to run: `, , , `. The orders and lineitem files can be generated using the [TPC-H benchmark](http://www.tpc.org/tpch/) suite's data generator tool (DBGEN). Take the following steps to generate arbitrarily large input files for the provided Stratosphere programs: diff --git a/docs/scala_api_guide.md b/docs/scala_api_guide.md index 4b43938e46e5b..d63ba989e4a1f 100644 --- a/docs/scala_api_guide.md +++ b/docs/scala_api_guide.md @@ -2,7 +2,7 @@ title: "Scala API Programming Guide" --- -
Scala Programming Guide ======================= @@ -21,6 +21,8 @@ documentation available [here](http://scala-lang.org/documentation/). Most of the examples can be understood by someone with a good understanding of programming in general, though. +[Back to top](#top) +
Word Count Example ------------------ @@ -77,6 +79,8 @@ which can then be executed on a cluster using `RemoteExecutor`. Here, the `LocalExecutor` is used to run the flow on the local computer. This is useful for debugging your job before running it on an actual cluster. +[Back to top](#top) +
Project Setup ------------- @@ -122,6 +126,8 @@ The first two imports contain things like `DataSet`, `Plan`, data sources, data sinks, and the operations. The last two imports are required if you want to run a data flow on your local machine or on a cluster, respectively. +[Back to top](#top) +
The DataSet Abstraction ----------------------- @@ -155,6 +161,8 @@ val mapped = input map { a => (a._1, a._2 + 1)} The anonymous function would receive tuples of type `(String, Int)` in `a`. +[Back to top](#top) +
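As a plain-Java analogue of the Scala `map` above, each `(String, Int)` pair is transformed into a pair with the counter incremented. The names are hypothetical, and Java 9+ `Map.entry` stands in for Scala's tuples here; this only illustrates the semantics of the anonymous function.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: transform every (String, Int) pair into (String, Int + 1),
// mirroring `input map { a => (a._1, a._2 + 1) }`.
class TupleMap {
    static List<Map.Entry<String, Integer>> increment(List<Map.Entry<String, Integer>> input) {
        return input.stream()
                .map(a -> Map.entry(a.getKey(), a.getValue() + 1))
                .collect(Collectors.toList());
    }
}
```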
Data Types ---------- @@ -170,7 +178,7 @@ For custom data types that should also be used as a grouping key or join key, the [Key](https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-core/src/main/java/eu/stratosphere/types/Key.java) interface must be implemented. - +[Back to top](#top)
Creating Data Sources @@ -201,6 +209,8 @@ formats are: We will now have a look at each of them and show how they are employed and in which situations. +[Back to top](#top) +
#### TextInputFormat @@ -220,6 +230,8 @@ val input = TextFile("") The `input` would then be a `DataSet[String]`. +[Back to top](#top) +
#### CsvInputFormat @@ -259,6 +271,8 @@ val input = DataSource("file:///some/file", CsvInputFormat[(Int, Int, String)](S Here only the specified fields would be read and 3-tuples created for you. The type of input would be `DataSet[(Int, Int, String)]`. +[Back to top](#top) +
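What the field-index variant does conceptually can be sketched in plain Java: split each record on the field delimiter and keep only the requested column positions. This is a hypothetical helper for illustration, not the actual `CsvInputFormat` implementation.

```java
import java.util.regex.Pattern;

// Sketch of field-selecting CSV parsing: split a record on the field
// delimiter and project it onto the requested column positions.
class CsvFieldSelector {
    static String[] select(String line, char delimiter, int[] fields) {
        String[] all = line.split(Pattern.quote(String.valueOf(delimiter)), -1);
        String[] out = new String[fields.length];
        for (int i = 0; i < fields.length; i++) {
            out[i] = all[fields[i]]; // keep only the requested columns
        }
        return out;
    }
}
```

In the real format the projected fields are additionally parsed into the tuple's element types instead of staying strings.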
#### DelimitedInputFormat @@ -290,6 +304,8 @@ In this example, `EdgeInputPattern` is a regular expression used for parsing a line of text, and `Path` is a custom case class that is used to represent the data. The type of input would in this case be `DataSet[Path]`. +[Back to top](#top) +
#### BinaryInputFormat @@ -320,6 +336,8 @@ val input = DataSource("file:///some/file", BinaryInputFormat( { input => Here `input` would be of type `DataSet[(Int, Double)]`. +[Back to top](#top) +
#### BinarySerializedInputFormat @@ -340,6 +358,7 @@ could use: ```scala val input = DataSource("file:///some/file", BinarySerializedInputFormat[(String, Int)]()) ``` +[Back to top](#top)
#### FixedLengthInputFormat @@ -357,6 +376,7 @@ FixedLengthInputFormat[Out](readFunction: (Array[Byte], Int) => Out, recordLengt The specified function gets an array and a position at which it must start reading the array and returns the element read from the binary data. +[Back to top](#top)
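The contract described above, a function that gets the byte array plus a start offset and returns one element, can be sketched in plain Java. `FixedLengthReader` is a hypothetical stand-in for the format's slicing logic, not the Stratosphere class.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Sketch: cut the input into recordLength-sized slices and let a
// user-supplied function turn each (array, offset) pair into one element.
class FixedLengthReader {
    static <T> List<T> readAll(byte[] data, int recordLength,
                               BiFunction<byte[], Integer, T> readFunction) {
        List<T> result = new ArrayList<>();
        for (int pos = 0; pos + recordLength <= data.length; pos += recordLength) {
            result.add(readFunction.apply(data, pos));
        }
        return result;
    }
}
```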
Operations on DataSet @@ -388,6 +408,8 @@ There are operations on `DataSet` that correspond to all the types of operators that the Stratosphere system supports. We will briefly go through all of them with some examples. +[Back to top](#top) +
#### Basic Operator Templates @@ -438,6 +460,7 @@ val input: DataSet[(String, Int)] val mapped = input.filter { x => x._2 >= 3 } ``` +[Back to top](#top)
#### Field/Key Selectors @@ -660,6 +683,8 @@ def union(secondInput: DataSet[A]) Where `A` is the generic type of the `DataSet` on which you execute the `union`. +[Back to top](#top) +
Iterations ---------- @@ -729,6 +754,7 @@ refer to [iterations](iterations.html). A working example job is available here: [Scala Connected Components Example](examples_scala.html#connected_components) +[Back to top](#top)
Creating Data Sinks @@ -757,6 +783,8 @@ built-in formats or a custom output format. The built-in formats are: We will now have a look at each of them and show how they are employed and in which situations. +[Back to top](#top) +
#### DelimitedOutputFormat @@ -785,6 +813,8 @@ Here we use Scala String formatting to write the two fields of the tuple separated by a pipe character. The default newline delimiter will be inserted between the elements in the output files. +[Back to top](#top) +
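The writing scheme described above, one formatted line per element with the record delimiter in between, can be sketched in plain Java. The names here are hypothetical illustrations, not the actual output format classes.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of a delimited output: every element becomes one line via a
// user-supplied formatting function; the record delimiter (newline by
// default) is inserted between consecutive lines.
class DelimitedWriter {
    static <T> String write(List<T> elements, Function<T, String> format) {
        return elements.stream().map(format).collect(Collectors.joining("\n"));
    }
}
```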
#### CsvOutputFormat @@ -810,6 +840,8 @@ val sink = out.write("file:///some/file", CsvOutputFormat()) Notice how we don't need to specify the generic type here; it is inferred. +[Back to top](#top) +
#### RawOutputFormat @@ -850,6 +882,7 @@ A `BinaryOutputFormat` is created like this: BinaryOutputFormat[In](writeFunction: (In, DataOutput) => Unit) BinaryOutputFormat[In](writeFunction: (In, DataOutput) => Unit, blockSize: Long) ``` +[Back to top](#top)
#### BinarySerializedOutputFormat @@ -875,6 +908,8 @@ val sink = out.write("file:///some/file", BinarySerializedOutputFormat()) As you can see, the type of the elements need not be specified; it is inferred by Scala. +[Back to top](#top) +
Executing Jobs -------------- @@ -934,6 +969,8 @@ guide about how to set up a cluster. The default cluster port is 6123, so if you run a job manager on your local computer, you can give this and "localhost" as the first two parameters to the `RemoteExecutor` constructor. +[Back to top](#top) +
Rich Functions -------------- @@ -1006,3 +1043,5 @@ abstract class FlatCrossFunction[LeftIn, RightIn, Out] Note that for all the rich stubs, you need to specify the generic type of the input (or inputs) and the output type. + +[Back to top](#top) \ No newline at end of file diff --git a/docs/scala_api_quickstart.md b/docs/scala_api_quickstart.md index e15eed0fa5a99..c8c6ee942c2ff 100644 --- a/docs/scala_api_quickstart.md +++ b/docs/scala_api_quickstart.md @@ -2,70 +2,62 @@ title: "Quick Start: Scala API" --- -

Start working on your Stratosphere Scala program in a few simple steps.

- -
- -

The only requirements are working Maven 3.0.4 (or higher) and Java 6.x (or higher) installations.

-
- -
- -

Use one of the following commands to create a project:

- - -
-
+Start working on your Stratosphere Scala program in a few simple steps. + +# Requirements +The only requirements are working __Maven 3.0.4__ (or higher) and __Java 6.x__ (or higher) installations. + + +# Create Project +Use one of the following commands to __create a project__: + +
+
{% highlight bash %} $ curl https://raw.githubusercontent.com/stratosphere/stratosphere-quickstart/master/quickstart-scala.sh | bash {% endhighlight %} -
-
+
+
{% highlight bash %} $ mvn archetype:generate \ - -DarchetypeGroupId=eu.stratosphere \ - -DarchetypeArtifactId=quickstart-scala \ - -DarchetypeVersion={{site.current_stable}} + -DarchetypeGroupId=eu.stratosphere \ + -DarchetypeArtifactId=quickstart-scala \ + -DarchetypeVersion={{site.current_stable}} {% endhighlight %} - This allows you to name your newly created project. It will interactively ask you for the groupId, artifactId, and package name. -
-
-
- -
- -

There will be a new directory in your working directory. If you've used the curl approach, the directory is called quickstart. Otherwise, it has the name of your artifactId.

-

The sample project is a Maven project, which contains a sample scala Job that implements Word Count. Please note that the RunJobLocal and RunJobRemote objects allow you to start Stratosphere in a development/testing mode.

-

We recommend to import this project into your IDE. For Eclipse, you need the following plugins, which you can install from the provided Eclipse Update Sites: -

  • Eclipse 4.x:
    • Scala IDE (http://download.scala-ide.org/sdk/e38/scala210/stable/site)
    • m2eclipse-scala (http://alchim31.free.fr/m2e-scala/update-site)
    • Build Helper Maven Plugin (https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.15.0/N/0.15.0.201206251206/)
  • Eclipse 3.7:
    • Scala IDE (http://download.scala-ide.org/sdk/e37/scala210/stable/site)
    • m2eclipse-scala (http://alchim31.free.fr/m2e-scala/update-site)
    • Build Helper Maven Plugin (https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.14.0/N/0.14.0.201109282148/)

-

The IntelliJ IDE also supports Maven and offers a plugin for Scala development.

-
- -
- -

If you want to build your project, go to your project directory and issue the mvn clean package command. You will find a jar that runs on every Stratosphere cluster in target/stratosphere-project-0.1-SNAPSHOT.jar.

-
- -
- -

Write your application! If you have any trouble, ask on our GitHub page (open an issue) or on our Mailing list. We are happy to provide help.

-

+ This allows you to name your newly created project. It will interactively ask you for the groupId, artifactId, and package name. + + + + +#Inspect Project +There will be a __new directory in your working directory__. If you've used the _curl_ approach, the directory is called `quickstart`. Otherwise, it has the name of your artifactId. + +The sample project is a __Maven project__, which contains a sample scala _job_ that implements Word Count. Please note that the _RunJobLocal_ and _RunJobRemote_ objects allow you to start Stratosphere in a development/testing mode.

+ +We recommend __importing this project into your IDE__. For Eclipse, you need the following plugins, which you can install from the provided Eclipse Update Sites: + +* _Eclipse 4.x_ + * [Scala IDE](http://download.scala-ide.org/sdk/e38/scala210/stable/site) + * [m2eclipse-scala](http://alchim31.free.fr/m2e-scala/update-site) + * [Build Helper Maven Plugin](https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.15.0/N/0.15.0.201206251206/) +* _Eclipse 3.7_ + * [Scala IDE](http://download.scala-ide.org/sdk/e37/scala210/stable/site) + * [m2eclipse-scala](http://alchim31.free.fr/m2e-scala/update-site) + * [Build Helper Maven Plugin](https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.14.0/N/0.14.0.201109282148/) + +The IntelliJ IDE also supports Maven and offers a plugin for Scala development. + + +# Build Project + +If you want to __build your project__, go to your project directory and issue the `mvn clean package` command. You will __find a jar__ that runs on every Stratosphere cluster in __target/stratosphere-project-0.1-SNAPSHOT.jar__. + +# Next Steps + +__Write your application!__ +If you have any trouble, ask on our [Jira page](https://issues.apache.org/jira/browse/FLINK) (open an issue) or on our mailing list. We are happy to provide help. + diff --git a/docs/spargel_guide.md b/docs/spargel_guide.md index 5766f8b2f0a6f..d21badbd31b00 100644 --- a/docs/spargel_guide.md +++ b/docs/spargel_guide.md @@ -5,7 +5,7 @@ title: "Spargel Graph Processing API" Spargel ======= -Spargel is our [Giraph](http://giraph.apache.org) like **graph processing** Java API. It supports basic graph computations, which are run as a sequence of [supersteps]({{ site.baseurl }}/docs/0.4/programming_guides/iterations.html#supersteps).
Spargel and Giraph both implement the [Bulk Synchronous Parallel (BSP)](https://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel) programming model, propsed by Google's [Pregel](http://googleresearch.blogspot.de/2009/06/large-scale-graph-computing-at-google.html). +Spargel is our [Giraph](http://giraph.apache.org)-like **graph processing** Java API. It supports basic graph computations, which are run as a sequence of [supersteps](iterations.html#supersteps). Spargel and Giraph both implement the [Bulk Synchronous Parallel (BSP)](https://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel) programming model, proposed by Google's [Pregel](http://googleresearch.blogspot.de/2009/06/large-scale-graph-computing-at-google.html). The API provides a **vertex-centric** view on graph processing with two basic operations per superstep: @@ -40,7 +40,7 @@ Example: Propagate Minimum Vertex ID in Graph The Spargel operator **SpargelIteration** embeds Spargel graph processing in your data flow. As usual, it can be combined with other operators like *map*, *reduce*, *join*, etc. -{% highlight java %} +```java FileDataSource vertices = new FileDataSource(...); FileDataSource edges = new FileDataSource(...); @@ -53,7 +53,8 @@ FileDataSink result = new FileDataSink(...); result.setInput(iteration.getOutput()); new Plan(result); -{% endhighlight %} +``` + Besides the **program logic** of vertex updates in *MinNeighborUpdater* and messages in *MinMessager*, you have to specify the **initial vertex** and **edge input**. Every vertex has a **key** and **value**. In each superstep, it **receives messages** from other vertices and updates its state: - **Vertex** input: **(id**: *VertexKeyType*, **value**: *VertexValueType***)** - For our example, we set the vertex ID as both *id and value* (initial minimum) and *leave out the edge values* as we don't need them:

- Spargel Example Input + Spargel Example Input

In order to **propagate the minimum vertex ID**, we iterate over all received messages (which contain the neighboring IDs) and update our value, if we found a new minimum: -{% highlight java %} +```java public class MinNeighborUpdater extends VertexUpdateFunction { @Override @@ -86,11 +87,11 @@ public class MinNeighborUpdater extends VertexUpdateFunction { @Override @@ -99,7 +100,7 @@ public class MinMessager extends MessagingFunction - Spargel Example + Spargel Example
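The superstep cycle described above (send values along edges, adopt a received value only if it is smaller, terminate when nothing changes) can be sketched in plain Java. This is a simplified, hypothetical single-process model in which every vertex sends in every superstep; it illustrates the semantics, not the Spargel runtime.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Single-process sketch of vertex-centric supersteps for the minimum-ID
// example: send the current value along all edges, keep the smallest
// received value if it improves on the own one, stop when nothing changed.
class MinIdSupersteps {
    static Map<Integer, Integer> run(Map<Integer, List<Integer>> edges, int maxSupersteps) {
        Map<Integer, Integer> value = new HashMap<>();
        for (Integer v : edges.keySet()) value.put(v, v); // id doubles as initial value

        for (int s = 0; s < maxSupersteps; s++) {
            // messaging phase: each vertex sends its value to its edge targets
            Map<Integer, Integer> inbox = new HashMap<>();
            for (Map.Entry<Integer, List<Integer>> e : edges.entrySet()) {
                for (Integer target : e.getValue()) {
                    inbox.merge(target, value.get(e.getKey()), Math::min);
                }
            }
            // update phase: adopt a received value only if it is smaller
            boolean changed = false;
            for (Map.Entry<Integer, Integer> m : inbox.entrySet()) {
                if (m.getValue() < value.get(m.getKey())) {
                    value.put(m.getKey(), m.getValue());
                    changed = true;
                }
            }
            if (!changed) break; // no updates in this superstep: terminate
        }
        return value;
    }
}
```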

\ No newline at end of file diff --git a/docs/web_client.md b/docs/web_client.md index 98cfd6942d99a..faaf7c03b1014 100644 --- a/docs/web_client.md +++ b/docs/web_client.md @@ -14,7 +14,7 @@ and stop it by calling: ./bin/stop-webclient.sh -The web interface runs on port 8080 by default. To specify a custom port set the ```webclient.port``` property in the *./conf/stratosphere.yaml* configuration file. Jobs are submitted to the JobManager specified by ```jobmanager.rpc.address``` and ```jobmanager.rpc.port```. Please consult the [configuration](../setup/config.html#web_frontend "Configuration") page for details and further configuration options. +The web interface runs on port 8080 by default. To specify a custom port set the ```webclient.port``` property in the *./conf/stratosphere.yaml* configuration file. Jobs are submitted to the JobManager specified by ```jobmanager.rpc.address``` and ```jobmanager.rpc.port```. Please consult the [configuration](config.html#web_frontend) page for details and further configuration options. # Use the Web Interface diff --git a/docs/yarn_setup.md b/docs/yarn_setup.md index c317e064dedbf..db2d97791f9dc 100644 --- a/docs/yarn_setup.md +++ b/docs/yarn_setup.md @@ -172,7 +172,7 @@ Please post to the [Stratosphere mailinglist](https://groups.google.com/d/forum/ This section briefly describes how Stratosphere and YARN interact. - + The YARN client needs to access the Hadoop configuration to connect to the YARN resource manager and to HDFS. It determines the Hadoop configuration using the following strategy: @@ -185,4 +185,4 @@ The next step of the client is to request (step 2) a YARN container to start the The *JobManager* and AM are running in the same container. Once they successfully started, the AM knows the address of the JobManager (its own host). It is generating a new Stratosphere configuration file for the TaskManagers (so that they can connect to the JobManager). The file is also uploaded to HDFS. 
Additionally, the *AM* container is also serving Stratosphere's web interface. -After that, the AM starts allocating the containers for Stratosphere's TaskManagers, which will download the jar file and the modified configuration from the HDFS. Once these steps are completed, Stratosphere is set up and ready to accept Jobs. +After that, the AM starts allocating the containers for Stratosphere's TaskManagers, which will download the jar file and the modified configuration from the HDFS. Once these steps are completed, Stratosphere is set up and ready to accept Jobs. \ No newline at end of file From 03ac37d8927afa671b28fd4b1c3d297c518b0580 Mon Sep 17 00:00:00 2001 From: Robert Metzger Date: Wed, 2 Jul 2014 14:43:35 +0200 Subject: [PATCH 2/6] documentation: fix internals linking --- docs/{internals => }/ClientJmTm.svg | 0 docs/{internals => }/JobManagerComponents.svg | 0 docs/_layouts/docs.html | 4 +++- .../add_operator.md => internal_add_operator.md} | 0 ...d_runtime.md => internal_distributed_runtime.md} | 0 .../general_arch.md => internal_general_arch.md} | 0 ...job_scheduling.md => internal_job_scheduling.md} | 0 ...d_memory.md => internal_operators_and_memory.md} | 0 .../optimizer.md => internal_optimizer.md} | 0 .../{internals/overview.md => internal_overview.md} | 4 ++-- ...life_cycle.md => internal_program_life_cycle.md} | 0 docs/{internals => }/jobgraph_executiongraph.svg | 0 docs/{internals => }/projects_dependencies.svg | 0 docs/setup_quickstart.md | 2 +- docs/{internals => }/slot_based_scheduling.jpg | Bin docs/{internals => }/stack.svg | 0 docs/{internals => }/state_machine.jpg | Bin 17 files changed, 6 insertions(+), 4 deletions(-) rename docs/{internals => }/ClientJmTm.svg (100%) rename docs/{internals => }/JobManagerComponents.svg (100%) rename docs/{internals/add_operator.md => internal_add_operator.md} (100%) rename docs/{internals/distributed_runtime.md => internal_distributed_runtime.md} (100%) rename docs/{internals/general_arch.md => 
internal_general_arch.md} (100%) rename docs/{internals/job_scheduling.md => internal_job_scheduling.md} (100%) rename docs/{internals/operators_and_memory.md => internal_operators_and_memory.md} (100%) rename docs/{internals/optimizer.md => internal_optimizer.md} (100%) rename docs/{internals/overview.md => internal_overview.md} (87%) rename docs/{internals/program_life_cycle.md => internal_program_life_cycle.md} (100%) rename docs/{internals => }/jobgraph_executiongraph.svg (100%) rename docs/{internals => }/projects_dependencies.svg (100%) rename docs/{internals => }/slot_based_scheduling.jpg (100%) rename docs/{internals => }/stack.svg (100%) rename docs/{internals => }/state_machine.jpg (100%) diff --git a/docs/internals/ClientJmTm.svg b/docs/ClientJmTm.svg similarity index 100% rename from docs/internals/ClientJmTm.svg rename to docs/ClientJmTm.svg diff --git a/docs/internals/JobManagerComponents.svg b/docs/JobManagerComponents.svg similarity index 100% rename from docs/internals/JobManagerComponents.svg rename to docs/JobManagerComponents.svg diff --git a/docs/_layouts/docs.html b/docs/_layouts/docs.html index 4b99d4a604cd8..203783559a46e 100644 --- a/docs/_layouts/docs.html +++ b/docs/_layouts/docs.html @@ -71,7 +71,9 @@

Apache Flink {{ site.FLINK_VERSION }} Documentation

  • Internals
  • diff --git a/docs/internals/add_operator.md b/docs/internal_add_operator.md similarity index 100% rename from docs/internals/add_operator.md rename to docs/internal_add_operator.md diff --git a/docs/internals/distributed_runtime.md b/docs/internal_distributed_runtime.md similarity index 100% rename from docs/internals/distributed_runtime.md rename to docs/internal_distributed_runtime.md diff --git a/docs/internals/general_arch.md b/docs/internal_general_arch.md similarity index 100% rename from docs/internals/general_arch.md rename to docs/internal_general_arch.md diff --git a/docs/internals/job_scheduling.md b/docs/internal_job_scheduling.md similarity index 100% rename from docs/internals/job_scheduling.md rename to docs/internal_job_scheduling.md diff --git a/docs/internals/operators_and_memory.md b/docs/internal_operators_and_memory.md similarity index 100% rename from docs/internals/operators_and_memory.md rename to docs/internal_operators_and_memory.md diff --git a/docs/internals/optimizer.md b/docs/internal_optimizer.md similarity index 100% rename from docs/internals/optimizer.md rename to docs/internal_optimizer.md diff --git a/docs/internals/overview.md b/docs/internal_overview.md similarity index 87% rename from docs/internals/overview.md rename to docs/internal_overview.md index 3e2b23f798128..f3090d090522a 100644 --- a/docs/internals/overview.md +++ b/docs/internal_overview.md @@ -15,7 +15,7 @@ or pull request that updates these documents as well.* ### Architectures and Components -- [General Architecture and Process Model](general_arch.html) +- [General Architecture and Process Model](internal_general_arch.html) -- [How-to: Adding a new Operator](add_operator.html) +- [How-to: Adding a new Operator](internal_add_operator.html)