Adding dupree #12

edward-burn · 2023-03-16T09:38:58Z

Another tool that could help with refactoring is dupree: https://russhyde.github.io/dupree/

mvankessel-EMC · 2023-03-16T14:13:15Z

@edward-burn how would you like to see this utilized? Let's take IncidencePrevalence as an example. I could:

Fetch the most similar lines of code, keeping only pairs above a score threshold.

library(dupree)
library(dplyr)

dupes <- data.frame(dupree_package(
  "../IncidencePrevalence/"))

dupesToInvestigate <- dupes %>%
  filter(score > 0.5)

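# Look up the first source line of each retained pair.
# Index the filtered data frame, not the unfiltered `dupes`.
lapply(seq_len(nrow(dupesToInvestigate)), function(i) {
  fileA <- normalizePath(as.character(dupesToInvestigate[i, "file_a"]))
  fileB <- normalizePath(as.character(dupesToInvestigate[i, "file_b"]))
  lineA <- unlist(dupesToInvestigate[i, "line_a"])
  lineB <- unlist(dupesToInvestigate[i, "line_b"])

  data.frame(
    A = readLines(fileA)[lineA],
    B = readLines(fileB)[lineB])
})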

[[1]]
                                         A                                         B
1 estimatePointPrevalence <- function(cdm, estimatePeriodPrevalence <- function(cdm,

[[2]]
                                             A                                             B
1 checkInputEstimateIncidence <- function(cdm, checkInputEstimatePrevalence <- function(cdm,

[[3]]
                                   A                                   B
1 estimateIncidence <- function(cdm, estimatePrevalence <- function(cdm,

Compute the mean score of the entire package.

dupes %>%
  select(score) %>%
  unlist() %>%
  mean()

[1] 0.3124973
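A single mean can hide a handful of highly similar pairs, so it may help to report it next to a few other summaries. The sketch below is only an illustration: the `scores` vector is made up and stands in for `dupes$score`, and the 0.5 cut-off is simply the threshold used in the first step.

```r
# Hypothetical scores standing in for dupes$score
scores <- c(0.9, 0.6, 0.3, 0.2, 0.1)

# Mean alongside median, maximum, and the number of pairs above the threshold
summaryStats <- c(
  mean   = mean(scores),
  median = median(scores),
  max    = max(scores),
  nAbove = sum(scores > 0.5)
)

print(summaryStats)
```

With real results, replacing `scores` by `dupes$score` should be enough.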

Compute the mean score per file (multiple scores can be found per file).

files <- normalizePath(list.files("R", full.names = TRUE))

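# Run dupree() on one file at a time and average its scores.
# Files that return no scored pairs have an empty score vector,
# and mean() of an empty vector is NaN.
bind_rows(lapply(seq_along(files), function(i) {
  score <- data.frame(dupree(files[i])) %>%
    select(score) %>%
    unlist() %>%
    mean(na.rm = TRUE)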
  
  data.frame(
    file = basename(files[i]),
    meanScore = score
  )
}))

                                 file meanScore
1      benchmarkIncidencePrevalence.R       NaN
2                 estimateIncidence.R       NaN
3                estimatePrevalence.R 0.3086352
4  exportIncidencePrevalenceResults.R       NaN
5  gatherIncidencePrevalenceResults.R       NaN
6      generateDenominatorCohortSet.R 0.1703704
7             getDenominatorCohorts.R       NaN
8                      getIncidence.R       NaN
9                     getPrevalence.R       NaN
10                     getStudyDays.R 0.2082324
11      incidencePrevalence-package.R       NaN
12                 input_validation.R 0.4700622
13       mockIncidencePrevalenceRef.R       NaN
14                    obscureCounts.R       NaN
15                  recordAttrition.R       NaN
16                       utils-pipe.R       NaN
17                            utils.R 0.0952381
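An alternative that avoids the NaN rows might be to derive a per-file mean from the single package-wide result instead of re-running the analysis per file. The sketch below assumes only the `file_a`, `file_b` and `score` columns used earlier in this thread; the file names and scores are invented for illustration. Note that it answers a slightly different question: it averages every pair a file takes part in, including pairs that span two files.

```r
# Mock of the package-wide result; names and scores are hypothetical
dupes <- data.frame(
  file_a = c("R/a.R", "R/a.R", "R/b.R"),
  file_b = c("R/b.R", "R/c.R", "R/c.R"),
  score  = c(0.8, 0.4, 0.2),
  stringsAsFactors = FALSE
)

# Stack both sides so each pair counts once for every file involved
long <- data.frame(
  file  = basename(c(dupes$file_a, dupes$file_b)),
  score = c(dupes$score, dupes$score)
)

# Mean score per file; files without any pair simply do not appear
meanScores <- aggregate(score ~ file, data = long, FUN = mean)
names(meanScores)[2] <- "meanScore"

print(meanScores)
```

Files that never show up in a pair are left out rather than reported as NaN, which may or may not be what you want in a report.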

mvankessel-EMC mentioned this issue Mar 20, 2023

Adding dupree darwin-eu-dev/PaRe#6

Closed
