Commit
Merge pull request #352 from IBM/Readme-Changes
Changes in code2parquet, ingest2parquet, and advance tutorial readmes.
Showing 18 changed files with 36 additions and 30 deletions.
@@ -1,12 +1,12 @@
 # Doc ID Transform
-The Document ID transforms adds a document identification, which later can be used in de-duplication operations.
-Per the set of
+The Document ID transforms adds a document identification (unique integers and content hashes), which later can be used in de-duplication operations,
+per the set of
 [transform project conventions](../../README.md#transform-project-conventions)
 the following runtimes are available:
 
 * [ray](ray/README.md) - enables the running of the base python transformation
 in a Ray runtime
 * [spark](spark/README.md) - enables the running of a spark-based transformation
 in a Spark runtime.
-* [kfp_ray](kfp_ray/README.md) - enables running the ray docker image for
-the transformer in a kubernetes cluster using a generated `yaml` file.
+* [kfp](kfp_ray/README.md) - enables running the ray docker image
+in a kubernetes cluster using a generated `yaml` file.
File renamed without changes.
@@ -1,10 +1,10 @@
-# Exact Deduplication Transform
-The exact deduplication removes text duplications
-Per the set of
+# Exect Deduplification Transform
+The ededup transforms removes duplicate documents within a set of parquet files,
+per the set of
 [transform project conventions](../../README.md#transform-project-conventions)
 the following runtimes are available:
 
 * [ray](ray/README.md) - enables the running of the base python transformation
 in a Ray runtime
-* [kfp_ray](kfp_ray/README.md) - enables running the ray docker image for
-the transformer in a kubernetes cluster using a generated `yaml` file.
+* [kfp](kfp_ray/README.md) - enables running the ray docker image
+in a kubernetes cluster using a generated `yaml` file.
File renamed without changes.
@@ -1,10 +1,10 @@
-# Fuzzy Deduplication Transform
-The fuzzy deduplication removes text duplications
-Per the set of
+# Fuzzy Deduplification Transform
+The fdedup transforms removes documents that are very similar to each other within a set of parquet files,
+per the set of
 [transform project conventions](../../README.md#transform-project-conventions)
 the following runtimes are available:
 
 * [ray](ray/README.md) - enables the running of the base python transformation
 in a Ray runtime
-* [kfp_ray](kfp_ray/README.md) - enables running the ray docker image for
-the transformer in a kubernetes cluster using a generated `yaml` file.
+* [kfp](kfp_ray/README.md) - enables running the ray docker image
+in a kubernetes cluster using a generated `yaml` file.
File renamed without changes.
File renamed without changes.