forked from feathr-ai/feathr
Commit
Add docs for input and output format and expected behaviors (feathr-ai#575)

* Create feathr-input-format.md
* update docs
* Update feathr-input-format.md
* address comments
* Update feathr-job-configuration.md
* Update feathr-input-format.md
* Update build-and-push-feathr-registry-docker-image.md
* move file to the right hierachy
* Update README.md
1 parent cffa764, commit 90f328d
Showing 6 changed files with 49 additions and 9 deletions.
4 changes: 2 additions & 2 deletions
...-to-guides/deploy-feathr-api-as-webapp.md → .../dev_guide/deploy-feathr-api-as-webapp.md
@@ -0,0 +1,20 @@
---
layout: default
title: Input File Format for Feathr
parent: How-to Guides
---

# Input File Format for Feathr

Feathr supports multiple file formats, including Parquet, ORC, Avro, JSON, Delta Lake, and CSV. The format of each input is determined in the following order:

1. If the input path has a file extension, Feathr honors it. For example, `wasb://demodata@demodata/user_profile.csv` is recognized as CSV, while `wasb://demodata@demodata/product_id.parquet` is recognized as Parquet. This check is applied per file.
2. If the input path has no extension, say `wasb://demodata@demodata/user_click_stream`, users can optionally set a parameter that tells Feathr which format to use when reading those files. Refer to the `spark.feathr.inputFormat` setting in [Feathr Job Configuration](./feathr-job-configuration.md) for details on how to set it, as well as for code examples. This is a global setting that applies to every input whose format is not recognized from its path.
3. If neither of the above applies, Feathr uses `avro` as the default format.
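
The resolution order above can be sketched in a few lines of Python. This is only an illustration of the three rules, not Feathr's actual implementation: the helper name `resolve_input_format`, the `KNOWN_EXTENSIONS` mapping, and the plain `job_config` dictionary are assumptions made for this example. Only the `spark.feathr.inputFormat` key and the `avro` default come from the text.

```python
import posixpath

# Assumed extension-to-format mapping, for illustration only.
KNOWN_EXTENSIONS = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".orc": "orc",
    ".avro": "avro",
    ".json": "json",
}


def resolve_input_format(path, configured_format=None):
    """Hypothetical helper that mirrors the three rules described above."""
    # Rule 1: a recognized extension on the path wins (checked per file).
    _, extension = posixpath.splitext(path.rstrip("/"))
    detected = KNOWN_EXTENSIONS.get(extension.lower())
    if detected is not None:
        return detected
    # Rule 2: otherwise fall back to the global setting, if one was provided.
    if configured_format:
        return configured_format
    # Rule 3: nothing matched, so use the default format.
    return "avro"


# A plain dict standing in for the job configuration (assumed shape).
job_config = {"spark.feathr.inputFormat": "delta"}
configured = job_config.get("spark.feathr.inputFormat")

print(resolve_input_format("wasb://demodata@demodata/user_profile.csv", configured))
print(resolve_input_format("wasb://demodata@demodata/user_click_stream", configured))
print(resolve_input_format("wasb://demodata@demodata/user_click_stream"))
```

Note that in this sketch the extension takes precedence even when the global setting is present, which matches the ordering of rules 1 and 2.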

## Special note for Spark outputs

Many Spark users store their results in Delta Lake format. In that case, the result folder will look something like this:
![Spark Output](../images/spark-output.png)

Although the individual result files appear as Parquet files, you should pass the path of the parent folder and use the `delta` format to read it.
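
As a rough illustration of "use the parent folder", the sketch below walks up from a result file to the folder that contains a `_delta_log` directory, which is how a Delta Lake table folder can be told apart from a plain Parquet folder. The helper name `find_delta_table_root` is hypothetical, and the code assumes a locally accessible filesystem path; it is not something Feathr provides, and remote paths such as `wasb://` would need a different lookup.

```python
from pathlib import Path
from typing import Optional


def find_delta_table_root(path: str) -> Optional[Path]:
    """Hypothetical helper: return the nearest folder (the path itself or an
    ancestor) that contains a `_delta_log` directory, or None if there is none."""
    current = Path(path)
    for candidate in [current, *current.parents]:
        if (candidate / "_delta_log").is_dir():
            return candidate
    return None


# With PySpark and the Delta Lake package available, the folder found this way
# could then be read with the `delta` format rather than `parquet`, e.g.:
#   root = find_delta_table_root("/data/output/part-00000.snappy.parquet")
#   df = spark.read.format("delta").load(str(root))
```

The path in the usage comment is a made-up example. The key point is that the reader is given the table folder, not an individual part file.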