New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-11969: [Rust][DataFusion] Improve Examples in documentation #9710
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format?
See also: |
17f3e9b
to
fa58e14
Compare
|
||
Use the DataFrame API to process data stored in a CSV: | ||
|
||
```rust |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking: this might be a little bit fun with API churn, e.g. I believe the input expr ownership work you've recently opened would change these from slices to vecs and we don't have a way to catch that automatically like we do for the in-crate docs (am I right in thinking that cargo test
runs all doctests?).
Edit: to be clear, I don't think it's a reason to not do it, just curious if anyone has ideas for how to prevent doc drift :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point @returnString
The way I justified the danger of drift to myself was "the main usecase of this documentation (the overview) is likely to help them answer the question of "should I even bother to try and use this crate". Once they decide to try and actually use the crate they will look at the real docs on docs.rs (from which they can copy/paste).
For the purpose of an example of "what does this library do" I felt even a slightly out of date example might be valuable.
Or maybe I am just trying to pad my github stats ;) But in all seriousness I am not committed to this PR. If it isn't a good idea I can just close it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me; agreed that (personally, at least) I'll give less consideration to projects without simple readme examples.
It balloons the scope of this PR quite a lot so I'm not saying this is a good idea, but I just did a bit of digging and it looks like people have gone through this particular problem before: https://blog.guillaume-gomez.fr/articles/2019-04-13+Keeping+Rust+projects%27+README.md+code+examples+up-to-date
And the end result of that is https://crates.io/crates/doc-comment, which looks like it'll wire up any rust
-tagged code blocks in external files as doctests, optionally only for #[cfg(test)]
.
If it's useful, I could log a followup task to integrate that and take a look at it myself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@returnString https://crates.io/crates/doc-comment looks super awesome -- I think that would be most helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logged as https://issues.apache.org/jira/browse/ARROW-12015 :)
.limit(100)? | ||
.collect().await?; | ||
|
||
let results: Vec<RecordBatch> = df.collect().await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed when using this as a test for #9749 that we're collecting twice here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol "testing for the win!"
I plan to wait for #9749 to be merged and then rebase this / ensure it passes the tests ! |
…e readme examples remain valid As discussed [here](#9710 (comment)), we were looking into how we might add code examples to the DataFusion readme whilst keeping them in sync with reality as we go through API revisions etc. This PR pulls in a new dev dependency, `doc-comment`, which allows for detecting all the `rust`-tagged code blocks in a Markdown file and treating them as doctests, and wires this up for `README.md`. My only concerns are: - because the end result is a full-blown doctest, you do need to make sure imports etc are present, which makes the samples more verbose than some people would perhaps like - again on the verbosity front: we have lots of async code which requires a `#[tokio::main] async fn main() { ... }` wrapper Neither of these are inherently bad imo, but worth noting upfront. As an example of a readme sample that passes as a doctest (borrowed from @alamb's latest documentation PR, #9710): ```rust use datafusion::prelude::*; use arrow::util::pretty::print_batches; use arrow::record_batch::RecordBatch; #[tokio::main] async fn main() -> datafusion::error::Result<()> { let mut ctx = ExecutionContext::new(); // create the dataframe let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?; let df = df.filter(col("a").lt_eq(col("b")))? .aggregate(&[col("a")], &[min(col("b"))])? .limit(100)?; let results: Vec<RecordBatch> = df.collect().await?; print_batches(&results)?; Ok(()) } ``` Closes #9749 from returnString/readme_doctest Authored-by: Ruan Pearce-Authers <ruanpa@outlook.com> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
fa58e14
to
cc81bdb
Compare
…e readme examples remain valid As discussed [here](apache#9710 (comment)), we were looking into how we might add code examples to the DataFusion readme whilst keeping them in sync with reality as we go through API revisions etc. This PR pulls in a new dev dependency, `doc-comment`, which allows for detecting all the `rust`-tagged code blocks in a Markdown file and treating them as doctests, and wires this up for `README.md`. My only concerns are: - because the end result is a full-blown doctest, you do need to make sure imports etc are present, which makes the samples more verbose than some people would perhaps like - again on the verbosity front: we have lots of async code which requires a `#[tokio::main] async fn main() { ... }` wrapper Neither of these are inherently bad imo, but worth noting upfront. As an example of a readme sample that passes as a doctest (borrowed from @alamb's latest documentation PR, apache#9710): ```rust use datafusion::prelude::*; use arrow::util::pretty::print_batches; use arrow::record_batch::RecordBatch; #[tokio::main] async fn main() -> datafusion::error::Result<()> { let mut ctx = ExecutionContext::new(); // create the dataframe let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?; let df = df.filter(col("a").lt_eq(col("b")))? .aggregate(&[col("a")], &[min(col("b"))])? .limit(100)?; let results: Vec<RecordBatch> = df.collect().await?; print_batches(&results)?; Ok(()) } ``` Closes apache#9749 from returnString/readme_doctest Authored-by: Ruan Pearce-Authers <ruanpa@outlook.com> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
# Rationale When people find `datafusion` on crates.io they want both a textual description of what it does as well as some example code to see if it would help them. For example, look at how tokio does it: https://crates.io/crates/tokio # Changes 1. Add an example on the main README.md of datafusion (that appears on the crates.io homepage) that shows a prospective user what DataFusion offers, lifted from the existing example from https://docs.rs/datafusion/3.0.0/datafusion 2. Add formatted output to the `rustdoc` examples in the main `lib.rs` page to show better what the output is 3. Fix some rustdoc warnings I noticed while testing Closes apache#9710 from alamb/alamb/better_examples Authored-by: Andrew Lamb <andrew@nerdnetworks.org> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
…e readme examples remain valid As discussed [here](apache#9710 (comment)), we were looking into how we might add code examples to the DataFusion readme whilst keeping them in sync with reality as we go through API revisions etc. This PR pulls in a new dev dependency, `doc-comment`, which allows for detecting all the `rust`-tagged code blocks in a Markdown file and treating them as doctests, and wires this up for `README.md`. My only concerns are: - because the end result is a full-blown doctest, you do need to make sure imports etc are present, which makes the samples more verbose than some people would perhaps like - again on the verbosity front: we have lots of async code which requires a `#[tokio::main] async fn main() { ... }` wrapper Neither of these are inherently bad imo, but worth noting upfront. As an example of a readme sample that passes as a doctest (borrowed from @alamb's latest documentation PR, apache#9710): ```rust use datafusion::prelude::*; use arrow::util::pretty::print_batches; use arrow::record_batch::RecordBatch; #[tokio::main] async fn main() -> datafusion::error::Result<()> { let mut ctx = ExecutionContext::new(); // create the dataframe let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?; let df = df.filter(col("a").lt_eq(col("b")))? .aggregate(&[col("a")], &[min(col("b"))])? .limit(100)?; let results: Vec<RecordBatch> = df.collect().await?; print_batches(&results)?; Ok(()) } ``` Closes apache#9749 from returnString/readme_doctest Authored-by: Ruan Pearce-Authers <ruanpa@outlook.com> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
# Rationale When people find `datafusion` on crates.io they want both a textual description of what it does as well as some example code to see if it would help them. For example, look at how tokio does it: https://crates.io/crates/tokio # Changes 1. Add an example on the main README.md of datafusion (that appears on the crates.io homepage) that shows a prospective user what DataFusion offers, lifted from the existing example from https://docs.rs/datafusion/3.0.0/datafusion 2. Add formatted output to the `rustdoc` examples in the main `lib.rs` page to show better what the output is 3. Fix some rustdoc warnings I noticed while testing Closes apache#9710 from alamb/alamb/better_examples Authored-by: Andrew Lamb <andrew@nerdnetworks.org> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Rationale
When people find
datafusion
on crates.io they want both a textual description of what it does as well as some example code to see if it would help them. For example, look at how tokio does it: https://crates.io/crates/tokioChanges
rustdoc
examples in the mainlib.rs
page to show better what the output is