From 5de8f4dbf798ee4b31d7917a47617eab9e51acda Mon Sep 17 00:00:00 2001
From: Ahmet Altay
Date: Tue, 9 May 2017 18:21:39 -0700
Subject: [PATCH] Remove Readme files.

extensions -> moving to website @ https://github.com/apache/beam-site/pull/237
jdbc/test -> obsolete content, removed.
---
 sdks/java/extensions/join-library/README.md | 42 ---------------------
 sdks/java/extensions/sorter/README.md       | 42 ---------------------
 sdks/java/io/jdbc/src/test/README.md        | 32 ----------------
 3 files changed, 116 deletions(-)
 delete mode 100644 sdks/java/extensions/join-library/README.md
 delete mode 100644 sdks/java/extensions/sorter/README.md
 delete mode 100644 sdks/java/io/jdbc/src/test/README.md

diff --git a/sdks/java/extensions/join-library/README.md b/sdks/java/extensions/join-library/README.md
deleted file mode 100644
index feee64fe571d..000000000000
--- a/sdks/java/extensions/join-library/README.md
+++ /dev/null
@@ -1,42 +0,0 @@
-
-
-Join-library
-============
-
-Join-library provides inner join, outer left and right join functions to
-Apache Beam. The aim is to simplify the most common cases of join to a
-simple function call.
-
-The functions are generic so it supports join of any types supported by
-Beam. Input to the join functions are PCollections of Key/Values. Both the
-left and right PCollections need the same type for the key. All the join
-functions return a Key/Value where Key is the join key and value is
-a Key/Value where the key is the left value and right is the value.
-
-In the cases of outer join, since null cannot be serialized the user have
-to provide a value that represent null for that particular use case.
-
-Example how to use join-library:
-
-    PCollection<KV<String, String>> leftPcollection = ...
-    PCollection<KV<String, Long>> rightPcollection = ...
-
-    PCollection<KV<String, KV<String, Long>>> joinedPcollection =
-      Join.innerJoin(leftPcollection, rightPcollection);
diff --git a/sdks/java/extensions/sorter/README.md b/sdks/java/extensions/sorter/README.md
deleted file mode 100644
index 6ff3dbeb2f87..000000000000
--- a/sdks/java/extensions/sorter/README.md
+++ /dev/null
@@ -1,42 +0,0 @@
-
-
-#Sorter
-This module provides the SortValues transform, which takes a `PCollection<KV<K, Iterable<KV<K2, V>>>>` and produces a `PCollection<KV<K, Iterable<KV<K2, V>>>>` where, for each primary key `K` the paired `Iterable<KV<K2, V>>` has been sorted by the byte encoding of secondary key (`K2`). It will efficiently and scalably sort the iterables, even if they are large (do not fit in memory).
-
-##Caveats
-* This transform performs value-only sorting; the iterable accompanying each key is sorted, but *there is no relationship between different keys*, as Beam does not support any defined relationship between different elements in a PCollection.
-* Each `Iterable<KV<K2, V>>` is sorted on a single worker using local memory and disk. This means that `SortValues` may be a performance and/or scalability bottleneck when used in different pipelines. For example, users are discouraged from using `SortValues` on a `PCollection` of a single element to globally sort a large `PCollection`. A (rough) estimate of the number of bytes of disk space utilized if sorting spills to disk is `numRecords * (numSecondaryKeyBytesPerRecord + numValueBytesPerRecord + 16) * 3`.
-
-##Options
-* The user can customize the temporary location used if sorting requires spilling to disk and the maximum amount of memory to use by creating a custom instance of `BufferedExternalSorter.Options` to pass into `SortValues.create`.
-
-##Using `SortValues`
-```java
-PCollection<KV<String, KV<String, Integer>>> input = ...
-
-// Group by primary key, bringing <SecondaryKey, Value> pairs for the same key together.
-PCollection<KV<String, Iterable<KV<String, Integer>>>> grouped =
-    input.apply(GroupByKey.<String, KV<String, Integer>>create());
-
-// For every primary key, sort the iterable of <SecondaryKey, Value> pairs by secondary key.
-PCollection<KV<String, Iterable<KV<String, Integer>>>> groupedAndSorted =
-    grouped.apply(
-        SortValues.<String, String, Integer>create(new BufferedExternalSorter.Options()));
-```
diff --git a/sdks/java/io/jdbc/src/test/README.md b/sdks/java/io/jdbc/src/test/README.md
deleted file mode 100644
index 5a7ac99bedba..000000000000
--- a/sdks/java/io/jdbc/src/test/README.md
+++ /dev/null
@@ -1,32 +0,0 @@
-
-
-These are instructions for maintaining postgres as needed for Integration Tests (JdbcIOIT).
-
-You can always ignore these instructions if you have your own postgres cluster to test against.
-
-Setting up Postgres
--------------------
-1. Setup kubectl so it is configured to work with your kubernetes cluster
-1. Run the postgres setup script
-    src/test/resources/kubernetes/setup.sh
-1. Do the data loading - create the data store instance by following the instructions in JdbcTestDataSet
-
-... and your postgres instances are set up!
-