Skip to content

Conversation

liuml07
Copy link
Member

@liuml07 liuml07 commented Dec 22, 2022

What is the purpose of the change

The file system SQL connector itself is included in Flink and does not require an additional dependency. However, if a user uses the filesystem connector for local execution, for e.g. running Flink job in the IDE, she will need to add dependency. Otherwise, the user will get validation exception: Cannot discover a connector using option: 'connector'='filesystem'. This is confusing and can be documented.

Brief change log

The scope of the files connector dependency should be provided, because they should not be packaged into the job JAR file. So we do not use the sql_download_table shortcodes like {{< sql_download_table "files" >}}. Also that shortcodes has texts saying the dependencies are required for SQL Client with SQL JAR bundles. That is not applicable to files connector as it's already shipped int he /lib directlory.

Verifying this change

This is a doc change, and I have tested it rendered locally.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@liuml07
Copy link
Member Author

liuml07 commented Dec 22, 2022

Screenshot 2022-12-21 at 4 54 31 PM

@liuml07 liuml07 force-pushed the doc-connector-files branch from 3d0a250 to 1ace9fc Compare December 22, 2022 00:54
@flinkbot
Copy link
Collaborator

flinkbot commented Dec 22, 2022

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

NOTE: If you use the filesystem connector for [local execution]({{< ref "docs/dev/dataset/local_execution" >}}),
for e.g. running Flink job in your IDE, you will need to add dependency.

```xml
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the PR description, we mentioned why this does not use the sql_download_table shortcodes like {{< sql_download_table "files" >}}.

@liuml07
Copy link
Member Author

liuml07 commented Dec 22, 2022

CC: @twalthr @fapaul

Copy link
Contributor

@MartijnVisser MartijnVisser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessarily the best method. I don't think this should be resolved per connector, but there is more value in documenting centrally what's needed to run things locally. If we do it per connector, you need to do it for all connectors, but also for things like running things that require Hadoop etc. WDYT?

@liuml07
Copy link
Member Author

liuml07 commented Dec 22, 2022

Yeah, documenting centrally sounds good.

Maybe I'm limited by how I build the jobs with connectors - I shade all connectors (and dependencies) into the uber job jar. For other connectors (e.g. Kafka), I add the dependency to the Flink job following the Maven snippet in each connector's doc page. That works for both local execution (IDE) and remote deployment. Filesystem connector is a bit special because it's in the Flink deploy (so no need to shade) but not ready for local execution. Adding "provided" scope dependency for this connector solves my problem. I don't find other connectors dependency needs to change for local execution.

I'm thinking where it would be a good central place. There is a short guide for setting Hadoop dependencies for local execution. Do you think it's a good idea to write a new section in the Connectors and Formats page or Advanced Configuration Topics page?

@liuml07 liuml07 closed this Sep 4, 2024
@liuml07 liuml07 deleted the doc-connector-files branch September 4, 2024 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants