Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: Document creating new extension APIs #11425

Merged
merged 8 commits into from
Jul 16, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jul 12, 2024

Which issue does this PR close?

Part of #7013

Rationale for this change

@ozankabak's comments on #11404 #11404 (comment) inspired me to write down some thoughts on how the extension process works. This may help @ameyc as we work on proposing new extensions for streaming

There were also some comments at the DataFusion meetup about the challenge of maintaining forks of DataFusion

Also there were some discussions on discord about when to do extension APIs and I was recently quite pleased with how #11207 worked with @samuelcolvin @jayzhan211 and others.

What changes are included in this PR?

Document what I think the current practice creating extension APIs are as well as maintenance of forks.

Note: I am not trying to change the policy, I am just trying to document what the current state is. It would be great if reviewers could confirm / deny this matched their understanding.

Screenshot 2024-07-12 at 6 24 29 AM

Are these changes tested?

CI tests

Are there any user-facing changes?

Docs only

@github-actions github-actions bot added the core Core datafusion crate label Jul 12, 2024
@alamb alamb added the documentation Improvements or additions to documentation label Jul 12, 2024
@alamb alamb marked this pull request as ready for review July 12, 2024 10:25
welcome you working with us to add new APIs to enable your use case, as
described in the next section.

## Creating new Extension APIs
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I wonder if I should try and mention https://github.com/datafusion-contrib here as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should, this comes up fairly often in this context

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a paragraph 294ad54

@ozankabak
Copy link
Contributor

This looks good to me.

I want to note the typical outlook on extension APIs we have: Extension APIs that provide "safe" default behaviors are more likely to be suitable for upstream DataFusion, while APIs that require major changes to built-in operators is less likely. Continuing on the stream processing example, supporting certain stream processing features for built-in DF operators will likely result in a performance loss for run-to-completion use cases. Therefore, we may still end up adding extension APIs to ExecutionPlan for such features, but most likely should leave implementation of such operators downstream.

I'm not sure if we need to add such detail to the doc, but I thought leaving an example here could be helpful to future new collaborators.

@alamb
Copy link
Contributor Author

alamb commented Jul 12, 2024

I'm not sure if we need to add such detail to the doc, but I thought leaving an example here could be helpful to future new collaborators.

Thanks @ozankabak -- I think it is helpful. I added a paragrap in e571e45 based off your description

@alamb
Copy link
Contributor Author

alamb commented Jul 12, 2024

I hope to leave this PR open for a few more days to gather any more feedback other community members might have on this content

welcome you working with us to add new APIs to enable your use case, as
described in the next section.

## Creating new Extension APIs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should, this comes up fairly often in this context

docs/source/contributor-guide/architecture.md Outdated Show resolved Hide resolved
docs/source/contributor-guide/architecture.md Outdated Show resolved Hide resolved
@alamb
Copy link
Contributor Author

alamb commented Jul 15, 2024

I plan to merge this PR tomorrow unless anyone else would like additional time to review

@alamb alamb merged commit 1331288 into apache:main Jul 16, 2024
24 checks passed
@alamb
Copy link
Contributor Author

alamb commented Jul 16, 2024

Thanks again @ozankabak and @wjones127

If anyone else has thoughts or suggestions we can make another PR

@alamb alamb deleted the alamb/extension_docs branch July 16, 2024 11:19
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* Docs: Document creating new extension APIs

* fix

* Add clarification about extension APIs. Thanks @ozankabak

* Apply suggestions from code review

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>

* Add a paragraph on datafusion-contrib

* prettier

---------

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 17, 2024
* Docs: Document creating new extension APIs

* fix

* Add clarification about extension APIs. Thanks @ozankabak

* Apply suggestions from code review

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>

* Add a paragraph on datafusion-contrib

* prettier

---------

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 18, 2024
* Docs: Document creating new extension APIs

* fix

* Add clarification about extension APIs. Thanks @ozankabak

* Apply suggestions from code review

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>

* Add a paragraph on datafusion-contrib

* prettier

---------

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
wiedld pushed a commit to influxdata/arrow-datafusion that referenced this pull request Jul 31, 2024
* Docs: Document creating new extension APIs

* fix

* Add clarification about extension APIs. Thanks @ozankabak

* Apply suggestions from code review

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>

* Add a paragraph on datafusion-contrib

* prettier

---------

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants