Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18825][SPARKR][DOCS][WIP] Eliminate duplicate links in SparkR API doc index #18051

Closed
wants to merge 2 commits into from

Conversation

zero323
Copy link
Member

@zero323 zero323 commented May 21, 2017

What changes were proposed in this pull request?

TODO

How was this patch tested?

Manual inspection of the docs.

@SparkQA
Copy link

SparkQA commented May 21, 2017

Test build #77152 has finished for PR 18051 at commit b28ec94.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 21, 2017

Test build #77156 has finished for PR 18051 at commit 4843950.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

I see - I think this doesn't address doc build via R CMD build pkg?

@zero323
Copy link
Member Author

zero323 commented May 22, 2017

It doesn't, but R CMD build pkg doesn't generate html index. This happens somewhere in the R CMD INSTALL so even if we create custom build script (with devtools), it won't helps us here.

@felixcheung
Copy link
Member

hmm,.. we do want people to be able to install it as a package...

@zero323 zero323 force-pushed the SPARK-18825 branch 2 times, most recently from 8392ba5 to 4843950 Compare May 22, 2017 21:48
@SparkQA
Copy link

SparkQA commented May 22, 2017

Test build #77206 has finished for PR 18051 at commit 8392ba5.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 22, 2017

Test build #77207 has finished for PR 18051 at commit 4843950.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323
Copy link
Member Author

zero323 commented May 22, 2017

For a moment I thought I found another solution but I was wrong.

I don't think there is a conflict between this and installable package. It won't help with the packaged version (but other packages depending on S4 suffer from the same issue), but we can have an improved online version.

There is one possible alternative - converting all names to long version:

#' abs
#'
#' Computes the absolute value.
#'
#' @param x Column to compute on.
#'
#' @rdname abs
#' @name abs-method
#' @family non-aggregate functions
#' @export
#' @examples \dontrun{abs(df$c)}
#' @aliases abs,Column-method

This would keep CRAN checks happy and removed duplicates but at the cost of having docs like this:

image

and making help unusable from R session, requiring:

?SparkR::`abs-method`

instead

?SparkR::abs

I am not sure if you agree, but IMHO this just makes things worse.

@felixcheung
Copy link
Member

@actuaryzhang - we were just talking this in the other PR. what do you think?
@zero323 - right, I do agree ?abs-method is kind of a big problem...

@actuaryzhang
Copy link
Contributor

Maybe I'm missing something completely, but I still don't get the point why we are removing the xx-method link since we are defining methods as S4 using setMethod. Lots of packages have these entries in the index. Below is a snapshot from the sp package. You can find a lot more there.

image

Even for S3 methods, they tend to repeat as well. Below is a snapshot of the gamm4 package.

image

@zero323
Copy link
Member Author

zero323 commented May 23, 2017

I think there are two different problems here:

  • Quality of the internal R documentation. I think that fixing this is non-goal. It is not only a normal state of R packages, but also impossible to fix without hacks or serious trade-offs.

  • Quality of the online documentation. This is subjective but I think there is a lot to do there including, but not limited to:

    • Removing this IFrame nonsense. It doesn't serve any real purpose and is completely useless.

    • Cleaning duplicate links. Since almost everything here is S4 we duplicate the size of the index with each addition make it only harder to use. It also affects ee also sections.

    • Trying to clean see also. Something like this (example SQL function):

      image

      is just useless.

    • Adding some kind of search functionality.

    • Running all examples as a part of the internal docs build process. Having readable, highlighted examples, with actual output, would be awesome.

@felixcheung
Copy link
Member

Improvements to R API doc online can be useful.
But I think that is somewhat independent of this change/PR. Let's focus on your points on Cleaning duplicate links and Trying to clean see also - what do you think what @actuaryzhang is asking?

@zero323
Copy link
Member Author

zero323 commented May 24, 2017

If we consider improvement of the online documentation to be a separate problem, then I fully agree with @actuaryzhang.

@felixcheung
Copy link
Member

hmm, so what would be the approach to fix the dup links then, change every @rdname to -method?

@zero323
Copy link
Member Author

zero323 commented May 25, 2017

To be honest I thought mostly about online docs here. Duplicate links in the bundled documentation never bothered me before (in SparkR, or any other package for that matter) and don't I think these have to be fixed. Maybe just close this PR, mark upstream ticket as won't fix, and focus on bigger issues? Just saying...

@felixcheung
Copy link
Member

given the work we are doing in #18025 to reorganize a lot of the pages, this might become a very minor issue when the dust settles. So I'm good also just to track this (wouldn't resolve the JIRA just yet) and revisit later.

also looking forward to your proposal on getting online help doc better, @zero323 !
could you put your comment above into a JIRA please?

@actuaryzhang
Copy link
Contributor

@zero323 I really like your thoughts on the docs. As @felixcheung mentioned above, we are doing some cleaning in #18025, which will improve readability and fix the SeeAlso issue.

Regarding making the examples runnable, I would really like to do that. This is not an issue for Jenkins since the examples are tiny. However, that will increase the cran check time significantly and I'm not sure CRAN will allow that.

@felixcheung
Copy link
Member

felixcheung commented May 25, 2017

re: example - agreed, I think @zero323 is proposing that we only run examples when creating the online doc, but not as CRAN (so we keep the dontrun)

@actuaryzhang
Copy link
Contributor

That makes sense!

@zero323
Copy link
Member Author

zero323 commented May 25, 2017

Exactly my point. Run examples internally (it is not hard to patch knitr or even tools::Rd2ex) to validate examples and improve online docs and keep the actual examples intact.

#18025 looks great - I'll try to review it when I have a spare moment.

I guess I close this PR for now. No reason to keep it hanging.

@zero323 zero323 closed this May 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants