Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code review of epicontacts #114

Open
zkamvar opened this issue Oct 28, 2019 · 7 comments
Open

Code review of epicontacts #114

zkamvar opened this issue Oct 28, 2019 · 7 comments
Assignees

Comments

@zkamvar
Copy link
Member

zkamvar commented Oct 28, 2019

This is based on the rOpenSci code reviews: https://devguide.ropensci.org/

@zkamvar and @finlaycampbell are assigned.

@zkamvar
Copy link
Member Author

zkamvar commented Oct 28, 2019

Sorry, the review process starts here: https://devguide.ropensci.org/reviewerguide.html

@finlaycampbell
Copy link
Collaborator

I was thinking it might be best to conduct the code review on the tree branch? It's up to date with master and largely consists of the new functions vis_ttree, vis_ggplot and a bunch of helper functions. The big changes to the existing code base are in vis_epicontacts (to ensure consistency with the other plotting functions) and other plotting functions - I can look over all of these. All of the remaining functionality (clustering, etc.) aren't changed. Thoughts?

@zkamvar
Copy link
Member Author

zkamvar commented Oct 29, 2019

I don't think that would be a good idea because I think ttree needs its own thorough code review in addition to the current core functionalities of the master branch. It's better to nail down the core functionalities first so that we aren't chasing down bugs while considering new features.

@finlaycampbell
Copy link
Collaborator

OK fair enough let's stick with master for now then!

@zkamvar
Copy link
Member Author

zkamvar commented Oct 29, 2019

Here is the results of goodpractice:

> goodpractice::gp()                                                                                                                                                                            
Registered S3 method overwritten by 'httr':
  method                 from
  as.character.form_file crul
Preparing: covr
Preparing: cyclocomp
✔  checking for file ‘/tmp/RtmpHcSv9P/remotes442d27d9deba/epicontacts/DESCRIPTION’ ...
─  preparing ‘epicontacts’:
✔  checking DESCRIPTION meta-information
✔  checking vignette meta-information
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘epicontacts_1.1.1.tar.gz’
   
* installing *source* package ‘epicontacts’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (epicontacts)
Adding ‘epicontacts_1.1.1_R_x86_64-pc-linux-gnu.tar.gz’ to the cache
Preparing: description
Preparing: lintr
Preparing: namespace
Preparing: rcmdcheck
── GP epicontacts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 86% of code lines are covered by test cases.

    R/colors.R:64:NA
    R/colors.R:65:NA
    R/colors.R:66:NA
    R/colors.R:67:NA
    R/internals.R:36:NA
    ... and 87 more lines

  ✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite often. A build date will be added to the package when you perform `R CMD build` on
    it.
  ✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and developers are used it and it is easier to read your code for them if you use
    '<-'.

    R/graph3D.R:111:23
    R/subset_clusters_by_size.R:68:16
    R/subset_clusters_by_size.R:69:16
    R/subset_clusters_by_size.R:75:16
    R/subset_clusters_by_size.R:81:16

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than
    80 characters

    R/get_clusters.R:3:1
    R/get_clusters.R:14:1
    R/get_clusters.R:15:1
    R/get_clusters.R:18:1
    R/get_clusters.R:24:1
    ... and 31 more lines

  ✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.

    R/graph3D.R:146:17

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on the right
    hand side is zero. Use seq_len() or seq_along() instead.

    tests/testthat/test_get_clusters.R:115:30
    tests/testthat/test_get_clusters.R:136:30

  ✖ fix this R CMD check NOTE: Namespace in Imports field not imported from: ‘colorspace’ All declared Imports should be used.

@zkamvar
Copy link
Member Author

zkamvar commented Oct 29, 2019

Package Review

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Review Comments

README

  • Has no statement of need for the package
  • Only one of the three vignettes listed in the README works
  • Maintainer email is only partial

Documentation

  • get_pairwise() does not describe what it returns. Thus, it's not clear how the serial interval is derived in the first example unless you look at the code.
  • [.epicontacts: x[k = FALSE, j = FALSE] removes everything but the ids in the linelist... I don't think this is expected
  • thin() documentation doesn't explicitly say what it returns. Additionally, there are no examples.
  • summary.epicontacts() has documentation, but no examples and it doesn't describe what is summarized
  • as.igraph() doesn't work if igraph is not loaded.
  • get_clusters() this needs to describe how the clusters are obtained.

Tests

I see the ebola_sim linelist used quite a lot. It would be good to define a
small test data set with some known pathologies (contacts without corresponding
linelist components, etc) that we can use for testing known quantities. We also
should get rid of the helper that auto-loads outbreaks. A lot of the tests
additionally seem to repeat a lot of the data construction steps (I believe
because they are set up as part of skip_on_cran(). There is no need to forbid
all tests on CRAN. Some of them are okay to have.

  • as.igraph.epicontacts(): OK
  • colors: OK
  • clusters: the first test of as.igraph.epicontacts is unnecessary. I am unsure of what the second test is doing (what is a net node?). Neither of the get_clusters() tests actually test that the clusters are cromulent.
  • get_degree(): first test should read "both equals sum of in and out".
  • get_id(): I think these first tests need some explanation as to what is going on with the validation.
  • get_pairwise(): The first test needs to be updated to testthat standards. The "different types of attributes" tests need clarification on what they are testing for
  • graph3D(): This takes a long time to run. Use a smaller data set
  • handling: This needs more stress tests and more descriptive tests. A custom test handler might be useful here.
  • make_epicontacts: The first test needs to have more description to confirm that the process actually works
  • plot.epicontacts(): First test has the expectations commented out. Testing plots are always a bit of a PITA. This could benefit from a smaller dataset. Additionally, you can test against the internal workings of the resulting plot. The error expectation needs a specific error message to test against.
  • print.epicontacts(): This is effectively useless. It's not tested on travis or CRAN.
  • subset(): last bit needs to move away from the stored objects. The last test suite should be split up into smaller chunks because it goes all over the place.
  • subset_clusters_by_x: These need to be more specific and targeted. The clustering bit is the only thing that may be uncertain, so we need to make sure we have a data set that can actually test these.
  • summary(): Needs to get rid of the stored object comparison and hard-code the expected values.
  • thin(): OK
  • vis_epicontacts(): see plot.epicontacts()

@finlaycampbell
Copy link
Collaborator

I've implemented some changes on a code_review branch (see ea48430)

Code

as.igraph.epicontacts

  • dplyr dependency?
  • merge(x$linelist, all_ids, by = 'id', all = TRUE)

colors

  • OK

get_clusters

  • dplyr dependency?

get_degree

  • No return value
  • Doesn’t work correctly for only_linelist = FALSE
  • change ‘contacts’ to ‘all’ on L48
  • vapply counting approach is very slow – perhaps switch to table

get_id

  • OK

get_pairwise

  • No return value
  • Description does not specify default behaviours for different classes
  • Vector is named in some cases (attribute is numeric) and not in others (attribute is character)
  • Fixed in commit
  • Could perhaps add a rev argument that calculates to → from
  • @param attribute should specify that you can add a character or the index of the column

graph3D

  • dplyr dependency
  • Changed to merge in commit

thin

  • Typo in warning message (contacts → contacts)

handling

  • No return value specified
  • Throws error when subsetting by date

internals

  • OK

make_epicontacts

  • IDs of class Date should not be allowed
  • Should NA IDs be allowed?

plot

  • The default qualitative color palette doesn’t really make sense for continuous node attributes, not a massive issue though

print.epicontacts

  • Change dplyr::tbl_df → tibble::as_tibble for printing?

print.summary_epicontacts

  • OK

subset_clusters_by_id

  • Description doesn’t node that linelist is also subsetted

subset_clusters_by_size

  • OK

subset.epicontacts

  • No return value
  • Maybe should be able to set a range for numeric attributes

summary.epicontacts

  • OK

thin

  • No return value
  • The example doesn’t seem to be using thin?

vis_epicontacts

  • Looks good for now but will have to looked over again when merging ttree

Tests

test_as.igraph.epicontacts

  • OK

test_colors

  • OK

test_get_clusters

  • “construction of net nodes” doesn’t really test for much AFAICT
  • I’ve added a test that manually constructs clusters and tests for them

test_get_degree

  • Added a name test for only_linelist = FALSE
  • Added a manual test for degree numbers calculation (this method is actually much faster than the current implementation – maybe should be swapped)

test_get_ID

  • OK

test_get_pairwise

  • No test on values
  • Added in commit

test_handling

  • Test for erroneous subsetting by Date

test_make_epicontacts

  • OK

test_plot.R

  • Testing against references not very useful

test_subset_clusters_by_ids

  • Added manual test with known clusters

test_subset_clusters_by_size

  • OK

test_subset.epicontacts

  • OK

test_summary.epicontacts

  • OK

test_thin

  • OK

test_vis_epicontacts

  • Testing against references not very useful
  • More tests trying to break the underlying machinery could be useful
    edge_color, edge_label, edge_width, node_color against numeric, character, factor, date attributes with NAs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants