[SPARK-16911] Fix the links in the programming guide
## What changes were proposed in this pull request?

Fix the broken links in the programming guide that point to the GraphX migration guide and the understanding-closures section.

## How was this patch tested?

By running the test cases and checking the links.

Author: Shivansh <shiv4nsh@gmail.com>

Closes #14503 from shiv4nsh/SPARK-16911.
shiv4nsh authored and srowen committed Aug 7, 2016
1 parent 1275f64 commit 6c1ecb1
Showing 3 changed files with 1 addition and 106 deletions.
17 changes: 0 additions & 17 deletions docs/graphx-programming-guide.md
@@ -67,23 +67,6 @@ operators (e.g., [subgraph](#structural_operators), [joinVertices](#join_operators), and
[aggregateMessages](#aggregateMessages)) as well as an optimized variant of the [Pregel](#pregel) API. In addition, GraphX includes a growing collection of graph [algorithms](#graph_algorithms) and
[builders](#graph_builders) to simplify graph analytics tasks.


## Migrating from Spark 1.1

GraphX in Spark 1.2 contains a few user facing API changes:

1. To improve performance we have introduced a new version of
[`mapReduceTriplets`][Graph.mapReduceTriplets] called
[`aggregateMessages`][Graph.aggregateMessages] which takes the messages previously returned from
[`mapReduceTriplets`][Graph.mapReduceTriplets] through a callback ([`EdgeContext`][EdgeContext])
rather than by return value.
We are deprecating [`mapReduceTriplets`][Graph.mapReduceTriplets] and encourage users to consult
the [transition guide](#mrTripletsTransition); a brief sketch of the new style follows this list.

2. In Spark 1.0 and 1.1, the type signature of [`EdgeRDD`][EdgeRDD] switched from
`EdgeRDD[ED]` to `EdgeRDD[ED, VD]` to enable some caching optimizations. We have since discovered
a more elegant solution and have restored the type signature to the more natural `EdgeRDD[ED]` type.
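
As referenced above, a minimal sketch of the callback style, assuming a graph whose vertex attribute is an age (`Double`) and whose edge attribute is an `Int`; the helper name `olderFollowerStats` and the message type are illustrative only:

```scala
import org.apache.spark.graphx.{Graph, TripletFields, VertexRDD}

// Count the older followers of each vertex and sum their ages.
// Messages are emitted through the EdgeContext callback (sendToDst)
// instead of being returned from the map function as with mapReduceTriplets.
def olderFollowerStats(graph: Graph[Double, Int]): VertexRDD[(Int, Double)] =
  graph.aggregateMessages[(Int, Double)](
    ctx => if (ctx.srcAttr > ctx.dstAttr) ctx.sendToDst((1, ctx.srcAttr)),  // sendMsg
    (a, b) => (a._1 + b._1, a._2 + b._2),                                   // mergeMsg
    TripletFields.All)                                                      // triplet fields shipped to executors
```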

# Getting Started

To get started you first need to import Spark and GraphX into your project, as follows:
45 changes: 1 addition & 44 deletions docs/programming-guide.md
@@ -1097,7 +1097,7 @@ for details.
<tr>
<td> <b>foreach</b>(<i>func</i>) </td>
<td> Run a function <i>func</i> on each element of the dataset. This is usually done for side effects such as updating an <a href="#accumulators">Accumulator</a> or interacting with external storage systems.
<br /><b>Note</b>: modifying variables other than Accumulators outside of the <code>foreach()</code> may result in undefined behavior. See <a href="#ClosuresLink">Understanding closures </a> for more details.</td>
<br /><b>Note</b>: modifying variables other than Accumulators outside of the <code>foreach()</code> may result in undefined behavior. See <a href="#understanding-closures-a-nameclosureslinka">Understanding closures </a> for more details.</td>
</tr>
</table>
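
A minimal sketch of the note above (assuming Spark 2.x's `longAccumulator` API and a local master; the application name is arbitrary):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("closure-example").setMaster("local[*]"))
val data = sc.parallelize(1 to 100)

// Undefined behavior: `counter` is copied into the closure, so the
// executors update their own copies and the driver-side value is unreliable.
var counter = 0
data.foreach(x => counter += x)

// Well-defined: an Accumulator is merged back to the driver by Spark.
val sum = sc.longAccumulator("sum")
data.foreach(x => sum.add(x))
println(sum.value)  // 5050

sc.stop()
```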

@@ -1544,49 +1544,6 @@ and then call `SparkContext.stop()` to tear it down.
Make sure you stop the context within a `finally` block or the test framework's `tearDown` method,
as Spark does not support two contexts running concurrently in the same program.
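
A minimal sketch of that pattern (the master, application name, and test body are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

var sc: SparkContext = null
try {
  sc = new SparkContext(new SparkConf().setMaster("local").setAppName("unit-test"))
  // ... exercise the code under test with `sc` ...
} finally {
  if (sc != null) sc.stop()  // ensures the next test can create its own context
}
```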

# Migrating from pre-1.0 Versions of Spark

<div class="codetabs">

<div data-lang="scala" markdown="1">

Spark 1.0 freezes the API of Spark Core for the 1.X series, in that any API available today that is
not marked "experimental" or "developer API" will be supported in future versions.
The only change for Scala users is that the grouping operations, e.g. `groupByKey`, `cogroup` and `join`,
have changed from returning `(Key, Seq[Value])` pairs to `(Key, Iterable[Value])`; a sketch of this change follows these tabs.

</div>

<div data-lang="java" markdown="1">

Spark 1.0 freezes the API of Spark Core for the 1.X series, in that any API available today that is
not marked "experimental" or "developer API" will be supported in future versions.
Several changes were made to the Java API:

* The Function classes in `org.apache.spark.api.java.function` became interfaces in 1.0, meaning that old
code that `extends Function` should `implement Function` instead.
* New variants of the `map` transformations, like `mapToPair` and `mapToDouble`, were added to create RDDs
of special data types.
* Grouping operations like `groupByKey`, `cogroup` and `join` have changed from returning
`(Key, List<Value>)` pairs to `(Key, Iterable<Value>)`.

</div>

<div data-lang="python" markdown="1">

Spark 1.0 freezes the API of Spark Core for the 1.X series, in that any API available today that is
not marked "experimental" or "developer API" will be supported in future versions.
The only change for Python users is that the grouping operations, e.g. `groupByKey`, `cogroup` and `join`,
have changed from returning (key, list of values) pairs to (key, iterable of values).

</div>

</div>
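
As referenced in the Scala tab, a minimal sketch of the grouping return-type change (assuming the `sc` provided by `spark-shell`; the RDD contents are arbitrary):

```scala
// Since 1.0, the grouped values are an Iterable rather than a Seq.
val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
val grouped: org.apache.spark.rdd.RDD[(String, Iterable[Int])] = pairs.groupByKey()

// Code that relied on Seq-only operations should convert explicitly:
grouped.mapValues(values => values.toSeq.sorted).collect()
```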

Migration guides are also available for [Spark Streaming](streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x),
[MLlib](ml-guide.html#migration-guide) and [GraphX](graphx-programming-guide.html#migrating-from-spark-091).


# Where to Go from Here

You can see some [example Spark programs](http://spark.apache.org/examples.html) on the Spark website.
45 changes: 0 additions & 45 deletions docs/streaming-programming-guide.md
@@ -2378,51 +2378,6 @@ additional effort may be necessary to achieve exactly-once semantics. There are
***************************************************************************************************
***************************************************************************************************

# Migration Guide from 0.9.1 or below to 1.x
Between Spark 0.9.1 and Spark 1.0, there were a few API changes made to ensure future API stability.
This section elaborates the steps required to migrate your existing code to 1.0.

**Input DStreams**: All operations that create an input stream (e.g., `StreamingContext.socketStream`, `FlumeUtils.createStream`, etc.) now return
[InputDStream](api/scala/index.html#org.apache.spark.streaming.dstream.InputDStream) /
[ReceiverInputDStream](api/scala/index.html#org.apache.spark.streaming.dstream.ReceiverInputDStream)
(instead of DStream) for Scala, and [JavaInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaInputDStream.html) /
[JavaPairInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaPairInputDStream.html) /
[JavaReceiverInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaReceiverInputDStream.html) /
[JavaPairReceiverInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaPairReceiverInputDStream.html)
(instead of JavaDStream) for Java. This ensures that functionality specific to input streams can
be added to these classes in the future without breaking binary compatibility.
Note that your existing Spark Streaming applications should not require any change
(as these new classes are subclasses of DStream/JavaDStream) but may require recompilation with Spark 1.0.
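
A minimal sketch of the new return type (the application name, host, and port are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.ReceiverInputDStream

// At least two local threads: one for the receiver, one for processing.
val conf = new SparkConf().setAppName("dstream-types").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(1))

// socketTextStream is now typed as ReceiverInputDStream rather than a plain DStream;
// since ReceiverInputDStream is a DStream subclass, existing code keeps compiling.
val lines: ReceiverInputDStream[String] = ssc.socketTextStream("localhost", 9999)
```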

**Custom Network Receivers**: Since the first release of Spark Streaming, custom network receivers could be defined
in Scala using the class NetworkReceiver. However, the API was limited in terms of error handling
and reporting, and could not be used from Java. Starting with Spark 1.0, this class has been
replaced by [Receiver](api/scala/index.html#org.apache.spark.streaming.receiver.Receiver), which has
the following advantages.

* Methods like `stop` and `restart` have been added for better control of the lifecycle of a receiver. See
the [custom receiver guide](streaming-custom-receivers.html) for more details.
* Custom receivers can be implemented using both Scala and Java.

To migrate your existing custom receivers from the earlier NetworkReceiver to the new Receiver, do the
following (a sketch of a migrated receiver appears after this list).

* Make your custom receiver class extend
[`org.apache.spark.streaming.receiver.Receiver`](api/scala/index.html#org.apache.spark.streaming.receiver.Receiver)
instead of `org.apache.spark.streaming.dstream.NetworkReceiver`.
* Earlier, a BlockGenerator object had to be created by the custom receiver, and received data was
added to it for storage in Spark. It also had to be explicitly started and stopped from the `onStart()` and `onStop()`
methods. The new Receiver class makes this unnecessary, as it provides a set of `store(<data>)` methods
that can be called to store the data in Spark. So, to migrate your custom network receiver, remove any
BlockGenerator object (it no longer exists in Spark 1.0 anyway), and call the `store(...)` methods on
received data.
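
As noted in the list above, a minimal sketch of a migrated receiver; the class name, storage level, and placeholder data source are illustrative, and real connect/read/close logic would go in `onStart()`/`onStop()`:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class MyReceiver(host: String, port: Int)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    // Start a background thread that reads data; no BlockGenerator is needed.
    new Thread("my-receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  override def onStop(): Unit = {
    // Close connections / stop threads here; the reading loop checks isStopped().
  }

  private def receive(): Unit = {
    while (!isStopped()) {
      store(s"record read from $host:$port")  // placeholder for real data
    }
  }
}
```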

**Actor-based Receivers**: The Actor-based Receiver APIs have been moved to [DStream Akka](https://github.com/spark-packages/dstream-akka).
Please refer to the project for more details.

***************************************************************************************************
***************************************************************************************************

# Where to Go from Here
* Additional guides
- [Kafka Integration Guide](streaming-kafka-integration.html)
