Adding dupree #12

Open
edward-burn opened this issue Mar 16, 2023 · 1 comment

Comments

@edward-burn
Collaborator

Another tool that could help with refactoring is dupree: https://russhyde.github.io/dupree/

@mvankessel-EMC
Collaborator

@edward-burn how would you like to see this used? Let's take IncidencePrevalence as an example. I could:

  1. Fetch the most similar pairs of code blocks, filtered by score.
library(dupree)
library(dplyr)

# Score pairs of code blocks across the package for similarity
dupes <- data.frame(dupree_package(
  "../IncidencePrevalence/"))

# Keep only the pairs with a score above 0.5
dupesToInvestigate <- dupes %>%
  filter(score > 0.5)

# Show the first line of each block in a pair side by side
lapply(seq_len(nrow(dupesToInvestigate)), function(i) {
  fileA <- normalizePath(as.character(dupesToInvestigate[i, "file_a"]))
  fileB <- normalizePath(as.character(dupesToInvestigate[i, "file_b"]))
  lineA <- unlist(dupesToInvestigate[i, "line_a"])
  lineB <- unlist(dupesToInvestigate[i, "line_b"])

  data.frame(
    A = readLines(fileA)[lineA],
    B = readLines(fileB)[lineB])
})

[[1]]
                                         A                                         B
1 estimatePointPrevalence <- function(cdm, estimatePeriodPrevalence <- function(cdm,

[[2]]
                                             A                                             B
1 checkInputEstimateIncidence <- function(cdm, checkInputEstimatePrevalence <- function(cdm,

[[3]]
                                   A                                   B
1 estimateIncidence <- function(cdm, estimatePrevalence <- function(cdm,
  2. Compute the mean score of the entire package.
mean(dupes$score)

[1] 0.3124973
  3. Compute the mean score per file (a single file can yield multiple scores).
# Assumes the working directory is the package root
files <- normalizePath(list.files("R", full.names = TRUE))

bind_rows(lapply(seq_along(files), function(i) {
  # mean() returns NaN when there are no scores to average for a file
  score <- data.frame(dupree(files[i])) %>%
    select(score) %>%
    unlist() %>%
    mean(na.rm = TRUE)

  data.frame(
    file = basename(files[i]),
    meanScore = score
  )
}))

                                 file meanScore
1      benchmarkIncidencePrevalence.R       NaN
2                 estimateIncidence.R       NaN
3                estimatePrevalence.R 0.3086352
4  exportIncidencePrevalenceResults.R       NaN
5  gatherIncidencePrevalenceResults.R       NaN
6      generateDenominatorCohortSet.R 0.1703704
7             getDenominatorCohorts.R       NaN
8                      getIncidence.R       NaN
9                     getPrevalence.R       NaN
10                     getStudyDays.R 0.2082324
11      incidencePrevalence-package.R       NaN
12                 input_validation.R 0.4700622
13       mockIncidencePrevalenceRef.R       NaN
14                    obscureCounts.R       NaN
15                  recordAttrition.R       NaN
16                       utils-pipe.R       NaN
17                            utils.R 0.0952381
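
The NaN rows are most likely files for which there were no scores to average, since `mean()` of an empty vector is NaN in R. As a minimal sketch of how such files could be reported as NA and the rest ranked, the snippet below uses a hypothetical helper `safeMeanScore()` and made-up toy scores in place of real dupree output:

```r
# Hypothetical helper (not part of dupree): mean of a score vector,
# returning NA instead of NaN when there are no scores at all
safeMeanScore <- function(scores) {
  if (length(scores) == 0) NA_real_ else mean(scores, na.rm = TRUE)
}

# Toy stand-in for per-file scores; file names and values are illustrative only
scoresPerFile <- list(
  "a.R" = c(0.2, 0.4),
  "b.R" = numeric(0),
  "c.R" = c(0.5)
)

fileScores <- data.frame(
  file = names(scoresPerFile),
  meanScore = unname(vapply(scoresPerFile, safeMeanScore, numeric(1))),
  stringsAsFactors = FALSE
)

# Rank files by mean score, highest first; files without scores go last
ranked <- fileScores[order(fileScores$meanScore, decreasing = TRUE, na.last = TRUE), ]
print(ranked)
```

Sorting this way would put the files with the most internal duplication at the top, which may be a more useful starting point for refactoring than the raw table.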
