Madc2vcf updates by Cristianetaniguti · Pull Request #53 · Breeding-Insight/BIGr

Cristianetaniguti · 2026-03-27T13:42:09Z

Updates to the madc2vcf functions

Details:

If verbose = TRUE, the functions print informative messages as they run.
Both functions (targets, and all, i.e. targets + off-targets markers) now call check_madc_sanity. It tests:
- [Columns] Whether the MADC has the expected columns
- [allNArow | allNAcol] Presence of rows and columns that are entirely NA (this happens often when people open the MADC in Excel before loading it in R)
- [IUPACcodes] Presence of IUPAC codes in AlleleSequence
- [LowerCase] Presence of lower-case bases in AlleleSequence
- [Indels] Presence of indels
- [ChromPos] Whether CloneID follows the format Chr_Pos
- [RefAltSeqs] Whether every Ref allele has a corresponding Alt allele and vice versa
- [OtherAlleles] Whether "Other" exists in the MADC AlleleID
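A few of the checks listed above can be approximated with base R pattern matching. This is only a sketch, not the package's `check_madc_sanity()` implementation: the helper name `sketch_madc_checks()`, the regular expressions, and the toy data frame are all assumptions made for illustration.

```r
# Hypothetical helper: approximates three of the listed checks on a
# MADC-like data frame. This is NOT the BIGr implementation.
sketch_madc_checks <- function(report) {
  seqs <- report$AlleleSequence
  c(
    # TRUE if any sequence contains one of the ambiguity codes
    # R, Y, S, W, K, M, B, D, H or V (case-insensitive)
    IUPACcodes = any(grepl("[RYSWKMBDHV]", toupper(seqs))),
    # TRUE if any sequence contains a lower-case letter
    LowerCase  = any(grepl("[a-z]", seqs)),
    # TRUE only if every CloneID looks like <chromosome>_<digits>
    ChromPos   = all(grepl("^.+_[0-9]+$", report$CloneID))
  )
}

# Toy MADC-like input (made-up values)
toy <- data.frame(
  CloneID        = c("chr1_000123", "chr1_000123", "markerA"),
  AlleleID       = c("chr1_000123|Ref_0001", "chr1_000123|Alt_0002",
                     "markerA|Ref_0001"),
  AlleleSequence = c("ACGTACGT", "ACGTRCGT", "acgtACGT"),
  stringsAsFactors = FALSE
)

res <- sketch_madc_checks(toy)
print(res)
```

In this toy input the second sequence carries an `R`, the third has lower-case bases, and `markerA` does not follow `Chr_Pos`, so the first two flags are raised and `ChromPos` is not satisfied.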
madc2vcf_targets doesn't run if the MADC column names are not correct. It ignores "Other" alleles, but informs the user whether they exist.
See the table for the madc2vcf_targets requirements according to the MADC content:

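| Check | Check status | get_REF_ALT | include_others | Requires |
|---|---|---|---|---|
| IUPAC | TRUE | TRUE | - | markers_info REF/ALT |
| IUPAC | TRUE | FALSE | - | - |
| IUPAC | FALSE | TRUE | - | botloci or markers_info REF/ALT |
| IUPAC | FALSE | FALSE | - | - |
| Indels | TRUE | TRUE | - | markers_info REF/ALT |
| Indels | TRUE | FALSE | - | - |
| Indels | FALSE | TRUE | - | botloci or markers_info REF/ALT |
| Indels | FALSE | FALSE | - | - |
| ChromPos | TRUE | TRUE | - | botloci or markers_info REF/ALT |
| ChromPos | TRUE | FALSE | - | - |
| ChromPos | FALSE | TRUE | - | markers_info CHR/POS/REF/ALT or markers_info CHR/POS + botloci |
| ChromPos | FALSE | FALSE | - | markers_info CHR/POS |
| FixAlleleIDs | TRUE | TRUE | - | botloci or markers_info REF/ALT |
| FixAlleleIDs | TRUE | FALSE | - | - |
| FixAlleleIDs | FALSE | TRUE | - | markers_info REF/ALT |
| FixAlleleIDs | FALSE | FALSE | - | - |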
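One way to read the table is as a lookup keyed on the check result and `get_REF_ALT`. The sketch below transcribes only the ChromPos rows into a data frame and queries it; the helper `required_inputs()` and the data frame `chrompos_req` are hypothetical names and are not part of BIGr.

```r
# ChromPos rows transcribed from the requirements table above.
chrompos_req <- data.frame(
  check_status = c(TRUE, TRUE, FALSE, FALSE),
  get_REF_ALT  = c(TRUE, FALSE, TRUE, FALSE),
  requires     = c(
    "botloci or markers_info REF/ALT",
    "-",
    "markers_info CHR/POS/REF/ALT or markers_info CHR/POS + botloci",
    "markers_info CHR/POS"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: which extra inputs does a given combination need?
required_inputs <- function(check_status, get_REF_ALT, tbl = chrompos_req) {
  tbl$requires[tbl$check_status == check_status &
                 tbl$get_REF_ALT == get_REF_ALT]
}

# CloneID does not follow Chr_Pos and REF/ALT recovery is not requested
print(required_inputs(check_status = FALSE, get_REF_ALT = FALSE))
```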

madc2vcf_targets got two new arguments: collapse_matches_counts
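The review overview in this PR describes `collapse_matches_counts` as aggregating `|AltMatch`/`|RefMatch` counts. A minimal sketch of that idea follows, assuming that match counts are summed into the Ref and Alt rows of the same marker; the exact behavior in BIGr is not shown in this PR text, and `collapse_matches_sketch()` and the toy counts are hypothetical.

```r
# Toy counts: one row per allele, one column per sample (made-up values).
counts <- data.frame(
  AlleleID = c("chr1_000123|Ref_0001", "chr1_000123|Alt_0002",
               "chr1_000123|RefMatch_0001", "chr1_000123|AltMatch_0001"),
  S1 = c(10, 4, 2, 1),
  S2 = c(0, 8, 0, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper (assumption): add RefMatch counts to Ref and
# AltMatch counts to Alt, per marker and per sample.
collapse_matches_sketch <- function(counts) {
  sample_cols <- setdiff(names(counts), "AlleleID")
  marker <- sub("\\|.*$", "", counts$AlleleID)
  allele <- ifelse(grepl("|Ref", counts$AlleleID, fixed = TRUE), "Ref",
            ifelse(grepl("|Alt", counts$AlleleID, fixed = TRUE), "Alt", NA))
  keep <- !is.na(allele)
  aggregate(counts[keep, sample_cols, drop = FALSE],
            by = list(marker = marker[keep], allele = allele[keep]),
            FUN = sum)
}

collapsed <- collapse_matches_sketch(counts)
print(collapsed)
```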

Still in progress:

Users now have the option to generate a multiallelic VCF with the new function madc2vcf_multi.
madc2vcf_all and madc2vcf_multi don't run if:
- MADC column names are not correct
- The MADC is a raw MADC
- The MADC has IUPAC codes
See the table for the madc2vcf_all and madc2vcf_multi requirements according to the MADC content:

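| Check | Check status | Requires |
|---|---|---|
| Indels | TRUE | markers_info REF/ALT/IndelPos/IndelLenght + botloci |
| Indels | FALSE | botloci |
| ChromPos | TRUE | botloci |
| ChromPos | FALSE | markers_info CHR/POS + botloci |
| RefAltSeqs | TRUE | botloci |
| RefAltSeqs | FALSE | botloci + hapdb |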

Help pages were updated with all this new information.

Copilot

Pull request overview

This PR updates the MADC→VCF conversion workflow (especially madc2vcf_targets) by adding structured verbose messaging, expanding MADC sanity checks, and introducing count-collapsing options—along with substantial new tests and documentation updates.

Changes:

- Add vmsg() utility and wire verbose progress messages through MADC/VCF conversion and count-extraction helpers.
- Expand check_madc_sanity() with additional validations and integrate it into madc2vcf_targets() / madc2vcf_all().
- Add collapse_matches_counts support to aggregate |AltMatch/|RefMatch counts, and update docs/tests accordingly.
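The verbose messaging in the first change follows a common R pattern: a small wrapper that only emits a message when a flag is set. The sketch below shows that pattern under assumptions; the signature of the real `vmsg()` is not given in this PR text, so `vmsg_sketch()` and its arguments are hypothetical.

```r
# Hypothetical verbose-message helper: prints via message() only when
# verbose is TRUE, and returns the formatted text invisibly either way.
vmsg_sketch <- function(fmt, ..., verbose = TRUE) {
  txt <- sprintf(fmt, ...)
  if (isTRUE(verbose)) message(txt)
  invisible(txt)
}

# Emits "Read 3 markers" on stderr
shown <- vmsg_sketch("Read %d markers", 3L, verbose = TRUE)

# Stays silent, but the text is still returned for inspection
hidden <- vmsg_sketch("Read %d markers", 3L, verbose = FALSE)
```

Using `message()` rather than `print()` or `cat()` keeps progress output on stderr, so callers can silence it with `suppressMessages()`.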

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 16 comments.


| File | Description |
|---|---|
| R/utils.R | Adds `vmsg()` and `url_exists()` utilities. |
| R/check_madc_sanity.R | Expands sanity checks (IUPAC/lowercase/indels/ChromPos/NA rows+cols/etc.) and moves/extends `check_botloci()`. |
| R/get_countsMADC.R | Adds `madc_object`, `collapse_matches_counts`, and verbose messaging; refactors `get_counts()`. |
| R/madc2vcf_targets.R | Adds input validation, integrates sanity checks, supports collapsing counts, improves REF/ALT recovery logic, and writes metadata into VCF headers. |
| R/madc2vcf_all.R | Integrates sanity checks and adds `markers_info` parameter. |
| tests/testthat/test-madc2vcf_targets.R | Removes REF/ALT cross-check vs `madc2vcf_all`; adds extensive new target conversion tests (incl. external fixtures). |
| tests/testthat/test-check_madc_sanity.R | Replaces local fixture with GitHub-hosted CSVs and adds expected check vectors. |
| man/vmsg.Rd | New documentation for `vmsg()`. |
| man/check_madc_sanity.Rd | Updates documentation for expanded sanity checks. |
| man/get_counts.Rd | New internal documentation for `get_counts()`. |
| man/get_countsMADC.Rd | Updates docs for new args/behavior. |
| man/madc2vcf_targets.Rd | Updates signature/docs for new args and updated requirements. |
| man/madc2vcf_all.Rd | Adds `markers_info` to usage (docs incomplete). |
| man/imputation_concordance.Rd | Updates API docs and adds plot/printing options (example currently non-runnable). |
| man/filterVCF.Rd | Adds `quality.rates` to usage (docs incomplete). |
| NAMESPACE | Exports `vmsg` and adds several imports (incl. `ggplot2`). |
| NEWS.md | Adds release notes for 0.6.4 changes. |

Comments suppressed due to low confidence (1)

R/madc2vcf_targets.R:161

report <- read.csv(madc_file) uses the default check.names = TRUE, which can silently modify sample column names (e.g., prefixing with X), changing sample IDs in the resulting VCF matrices. Use check.names = FALSE when reading the MADC to preserve original sample identifiers.

  # MADC checks
  report <- read.csv(madc_file)
  checks <- check_madc_sanity(report)
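The effect of `check.names` can be seen on a tiny in-memory CSV. The sample names below are made up for illustration; only the `read.csv()` behavior is the point.

```r
# A two-line CSV whose sample columns are not syntactically valid R names
csv_text <- "CloneID,1001,plate-A\nchr1_000123,5,7\n"

# Default (check.names = TRUE): names are passed through make.names()
default_names <- names(read.csv(text = csv_text))

# check.names = FALSE: header is kept exactly as written in the file
preserved_names <- names(read.csv(text = csv_text, check.names = FALSE))

print(default_names)    # "CloneID" "X1001" "plate.A"
print(preserved_names)  # "CloneID" "1001" "plate-A"
```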


Copilot · 2026-03-27T13:49:48Z

R/madc2vcf_all.R

+  checks <- check_madc_sanity(report)
+
+  messages_results <- mapply(function(check, message) {
+    if (check)  message[1] else message[2]


Same NA-handling issue as in madc2vcf_targets(): if (check) inside mapply() will error if any check is NA. check_madc_sanity() uses NA for skipped checks, so this needs isTRUE()/isFALSE() handling before building messages_results.

Suggested change

-    if (check)  message[1] else message[2]
+    if (isTRUE(check)) {
+      message[1]
+    } else if (isFALSE(check)) {
+      message[2]
+    } else {
+      NA_character_
+    }
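The failure mode and the suggested handling can be reproduced on toy inputs. The check names and message strings below are made up; only the `if (NA)` behavior and the `isTRUE()`/`isFALSE()` pattern are taken from the comment above.

```r
# Toy check results, with NA standing for a skipped check
checks <- c(Columns = TRUE, IUPAC = FALSE, Indels = NA)

# One (message if TRUE, message if FALSE) pair per check; made-up text
msgs <- list(
  Columns = c("columns ok", "columns missing"),
  IUPAC   = c("IUPAC codes found", "no IUPAC codes"),
  Indels  = c("indels found", "no indels")
)

# A bare if () on NA is an error in R
bare_if <- tryCatch(if (checks[["Indels"]]) "yes" else "no",
                    error = function(e) "error")

# NA-safe selection, as in the suggested change
pick_message <- function(check, message) {
  if (isTRUE(check)) {
    message[1]
  } else if (isFALSE(check)) {
    message[2]
  } else {
    NA_character_
  }
}

messages_results <- mapply(pick_message, checks, msgs)
print(bare_if)
print(messages_results)
```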

Copilot · 2026-03-27T13:49:49Z

man/madc2vcf_all.Rd

  out_vcf = NULL,
+  markers_info = NULL,
  verbose = TRUE
 )


markers_info was added to the function signature here, but it is not documented in the \arguments section below (only verbose is). Please add an \item{markers_info}{...} entry so the Rd matches the updated API.

Copilot · 2026-03-27T13:49:49Z

R/madc2vcf_targets.R

@@ -106,18 +164,121 @@ madc2vcf_targets <- function(madc_file,
    if (check)  message[1] else message[2]


messages_results <- mapply(function(check, message) { if (check) ... }, checks$checks, ...) will throw missing value where TRUE/FALSE needed when any check is NA (which check_madc_sanity() explicitly uses for skipped checks). Use isTRUE(check) / isFALSE(check) handling (or default message) so NA checks don't error before you can stop() on Columns.

Suggested change

-    if (check)  message[1] else message[2]
+    if (isTRUE(check)) message[1] else message[2]

Copilot · 2026-03-27T13:49:49Z

R/madc2vcf_all.R

                      "verbose= ", verbose,')">')

-  if(!is.null(madc)) report <- read.csv(madc, check.names = FALSE) else stop("Please provide a MADC file")
+  report <- read.csv(madc, check.names = FALSE)
+  checks <- check_madc_sanity(report)


madc is allowed to be NULL by the signature, but the function now unconditionally calls read.csv(madc, ...). When madc is NULL, this will error with an unhelpful message (and your input checks currently don't stop on NULL). Add an explicit is.null(madc) guard (or make madc required).
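A minimal sketch of such a guard, assuming the error text from the removed line is kept. The wrapper `read_madc_guarded()` is a hypothetical name used only to isolate the pattern; it is not a function in the package.

```r
# Hypothetical wrapper: fail early with a clear message when no file is given
read_madc_guarded <- function(madc = NULL) {
  if (is.null(madc)) stop("Please provide a MADC file")
  read.csv(madc, check.names = FALSE)
}

# NULL input now produces the explicit message instead of a read.csv() error
null_msg <- tryCatch(read_madc_guarded(NULL),
                     error = function(e) conditionMessage(e))

# A real (temporary) file is still read as before
tmp <- tempfile(fileext = ".csv")
writeLines(c("CloneID,1001", "chr1_000123,5"), tmp)
report <- read_madc_guarded(tmp)
unlink(tmp)
```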

Copilot · 2026-03-27T13:49:49Z

NAMESPACE

 export(updog2vcf)
+export(vmsg)
 import(dplyr)
+import(ggplot2)


import(ggplot2) was added, but ggplot2 is not declared in DESCRIPTION Imports/Depends (it's not currently listed). This will fail R CMD check (namespace imports must be declared). Either add ggplot2 to DESCRIPTION Imports, or move it to Suggests and use requireNamespace() conditionally in plotting code.

Suggested change

-import(ggplot2)
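If ggplot2 is moved to Suggests, plotting code can check for it at run time. The sketch below shows the `requireNamespace()` pattern mentioned in the comment; `plot_counts_sketch()` and its toy data are hypothetical and unrelated to the package's actual plotting functions.

```r
# Hypothetical plotting helper that degrades gracefully without ggplot2
plot_counts_sketch <- function(df) {
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    message("Package 'ggplot2' is not installed; skipping the plot.")
    return(invisible(NULL))
  }
  # Fully qualified calls, so no import(ggplot2) is needed in NAMESPACE
  ggplot2::ggplot(df, ggplot2::aes(x = x, y = y)) +
    ggplot2::geom_point()
}

p <- plot_counts_sketch(data.frame(x = 1:3, y = c(2, 4, 6)))
```

With this pattern, `p` is a plot object when ggplot2 is available and `NULL` otherwise.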

Copilot · 2026-03-27T13:49:52Z

man/filterVCF.Rd

 filterVCF(
  vcf.file,
+  quality.rates = F,
  filter.OD = NULL,
  filter.BIAS.min = NULL,


quality.rates was added to the function usage but is missing from the \arguments list, and the default is shown as F instead of FALSE. Please document the parameter and use FALSE for consistency/clarity in generated docs.
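The preference for `FALSE` over `F` is more than cosmetic: `F` is an ordinary variable that can be reassigned, while `FALSE` is a reserved word. A short demonstration, kept inside a function so the reassignment stays local:

```r
shorthand_demo <- function() {
  F <- TRUE  # legal: F is just a variable that defaults to FALSE
  c(shorthand = F, literal = FALSE)
}

result <- shorthand_demo()
print(result)
```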

Copilot · 2026-03-27T13:49:52Z

tests/testthat/test-madc2vcf_targets.R

+  test_that("ALFALFA — clean fixed allele ID MADC", {
+    out <- tempfile(fileext = ".vcf")
+    expect_no_error(
+      madc2vcf_targets(madc_file = alfalfa_madc,
+                       output.file = out,


testthat::test_that() blocks are nested here (a test_that() inside another test_that()). testthat does not support nesting and this will error during test execution. Flatten these into separate top-level test_that() calls (or use local_* helpers/setup code instead of nesting).

Copilot · 2026-03-27T13:49:52Z

tests/testthat/test-madc2vcf_targets.R

+  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
+
+  # External alfalfa test files
+  alfalfa_madc           <- paste0(github_path, "test_madcs/alfalfa_madc.csv")
+  alfalfa_madc_wrongID   <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv")
+  alfalfa_madc_raw       <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv")       # raw DArT format (7-row header)
+  alfalfa_iupac          <- paste0(github_path, "test_madcs/alfalfa_IUPAC.csv")
+  alfalfa_lowercase      <- paste0(github_path, "test_madcs/alfalfa_lowercase.csv")
+  alfalfa_botloci        <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci")          # botloci for alfalfa
+  alfalfa_markers_info   <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt
+  alfalfa_markers_info_ChromPos   <- paste0(github_path, "test_madcs/alfalfa_marker_info_ChromPos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos
+
+
+  # External potato test files
+  potato_indel_madc                 <- paste0(github_path, "test_madcs/potato_indel_madc.csv")
+  potato_indel_iupac                <- paste0(github_path, "test_madcs/potato_indel_IUPAC.csv")
+  potato_indel_lowercase            <- paste0(github_path, "test_madcs/potato_indel_lowercase.csv")
+  potato_more_indels_chrompos_false <- paste0(github_path, "test_madcs/potato_more_indels_madc_ChromPosFALSE.csv")
+  potato_botloci                    <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci")


These tests pull fixtures directly from GitHub at runtime. Even with skip_if_offline(), this makes CI/test results dependent on external availability and file immutability. Prefer committing small, versioned fixtures under inst/extdata (or tests/testthat/fixtures) and reading locally so tests are deterministic.

Suggested change

-  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
-
-  # External alfalfa test files
-  alfalfa_madc           <- paste0(github_path, "test_madcs/alfalfa_madc.csv")
-  alfalfa_madc_wrongID   <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv")
-  alfalfa_madc_raw       <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv")       # raw DArT format (7-row header)
-  alfalfa_iupac          <- paste0(github_path, "test_madcs/alfalfa_IUPAC.csv")
-  alfalfa_lowercase      <- paste0(github_path, "test_madcs/alfalfa_lowercase.csv")
-  alfalfa_botloci        <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci")          # botloci for alfalfa
-  alfalfa_markers_info   <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt
-  alfalfa_markers_info_ChromPos   <- paste0(github_path, "test_madcs/alfalfa_marker_info_ChromPos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos
-
-  # External potato test files
-  potato_indel_madc                 <- paste0(github_path, "test_madcs/potato_indel_madc.csv")
-  potato_indel_iupac                <- paste0(github_path, "test_madcs/potato_indel_IUPAC.csv")
-  potato_indel_lowercase            <- paste0(github_path, "test_madcs/potato_indel_lowercase.csv")
-  potato_more_indels_chrompos_false <- paste0(github_path, "test_madcs/potato_more_indels_madc_ChromPosFALSE.csv")
-  potato_botloci                    <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci")
+  fixtures_dir <- testthat::test_path("fixtures")
+
+  # External alfalfa test files (now local fixtures)
+  alfalfa_madc           <- file.path(fixtures_dir, "test_madcs", "alfalfa_madc.csv")
+  alfalfa_madc_wrongID   <- file.path(fixtures_dir, "test_madcs", "alfalfa_madc_wrongID.csv")
+  alfalfa_madc_raw       <- file.path(fixtures_dir, "test_madcs", "alfalfa_madc_raw.csv")       # raw DArT format (7-row header)
+  alfalfa_iupac          <- file.path(fixtures_dir, "test_madcs", "alfalfa_IUPAC.csv")
+  alfalfa_lowercase      <- file.path(fixtures_dir, "test_madcs", "alfalfa_lowercase.csv")
+  alfalfa_botloci        <- file.path(fixtures_dir, "alfalfa", "20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci")          # botloci for alfalfa
+  alfalfa_markers_info   <- file.path(fixtures_dir, "alfalfa", "20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt
+  alfalfa_markers_info_ChromPos   <- file.path(fixtures_dir, "test_madcs", "alfalfa_marker_info_ChromPos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos
+
+  # External potato test files (now local fixtures)
+  potato_indel_madc                 <- file.path(fixtures_dir, "test_madcs", "potato_indel_madc.csv")
+  potato_indel_iupac                <- file.path(fixtures_dir, "test_madcs", "potato_indel_IUPAC.csv")
+  potato_indel_lowercase            <- file.path(fixtures_dir, "test_madcs", "potato_indel_lowercase.csv")
+  potato_more_indels_chrompos_false <- file.path(fixtures_dir, "test_madcs", "potato_more_indels_madc_ChromPosFALSE.csv")
+  potato_botloci                    <- file.path(fixtures_dir, "potato", "potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci")

Copilot · 2026-03-27T13:49:52Z

R/check_madc_sanity.R

+  has_0001 <- any(grepl("_0001", report$AlleleID, fixed = TRUE), na.rm = TRUE)
+  has_0002 <- any(grepl("_0002", report$AlleleID, fixed = TRUE), na.rm = TRUE)
+  checks["FixAlleleIDs"] <- (!all_blank_or_star) & has_0001 & has_0002


check_madc_sanity() reads report$AlleleID to compute has_0001/has_0002 before confirming that the required columns exist. If AlleleID is missing, this will error instead of returning Columns = FALSE as intended. Validate required columns (or guard column access) before using report$AlleleID/report$CloneID/report$AlleleSequence.

Suggested change

-  has_0001 <- any(grepl("_0001", report$AlleleID, fixed = TRUE), na.rm = TRUE)
-  has_0002 <- any(grepl("_0002", report$AlleleID, fixed = TRUE), na.rm = TRUE)
-  checks["FixAlleleIDs"] <- (!all_blank_or_star) & has_0001 & has_0002
+  if ("AlleleID" %in% names(report)) {
+    has_0001 <- any(grepl("_0001", report$AlleleID, fixed = TRUE), na.rm = TRUE)
+    has_0002 <- any(grepl("_0002", report$AlleleID, fixed = TRUE), na.rm = TRUE)
+    checks["FixAlleleIDs"] <- (!all_blank_or_star) & has_0001 & has_0002
+  } else {
+    has_0001 <- NA
+    has_0002 <- NA
+    # leave checks["FixAlleleIDs"] as initialized (NA) when AlleleID is missing
+  }
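The alternative named in the comment, validating required columns before touching any of them, could look like the sketch below. It is an assumption-laden illustration: `check_columns_first()` is a hypothetical name, the required-column set is taken from the three columns mentioned in the comment, and the `all_blank_or_star` term is omitted because its definition is not shown here.

```r
# Hypothetical sketch: decide Columns first, return early if it fails,
# and only then read report$AlleleID.
check_columns_first <- function(report,
                                required = c("CloneID", "AlleleID",
                                             "AlleleSequence")) {
  checks <- c(Columns = NA, FixAlleleIDs = NA)
  checks["Columns"] <- all(required %in% names(report))
  if (!isTRUE(checks[["Columns"]])) return(checks)  # FixAlleleIDs stays NA

  has_0001 <- any(grepl("_0001", report$AlleleID, fixed = TRUE))
  has_0002 <- any(grepl("_0002", report$AlleleID, fixed = TRUE))
  # Simplified: the real check also involves all_blank_or_star
  checks["FixAlleleIDs"] <- has_0001 & has_0002
  checks
}

good <- data.frame(
  CloneID        = c("chr1_000123", "chr1_000123"),
  AlleleID       = c("chr1_000123|Ref_0001", "chr1_000123|Alt_0002"),
  AlleleSequence = c("ACGT", "ACGA"),
  stringsAsFactors = FALSE
)
bad <- good[, c("CloneID", "AlleleSequence")]  # AlleleID column missing

res_good <- check_columns_first(good)
res_bad  <- check_columns_first(bad)
```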

Copilot · 2026-03-27T13:49:53Z

R/check_madc_sanity.R


+    # --- All NA ----
+    checks["allNArow"] <- any(apply(report, 1, function(x) all(is.na(x) | x == "")))
+    checks["allNAcol"] <- any(apply(report, 2, function(x) all(is.na(x)) | x == ""))


The allNAcol check is incorrect: all(is.na(x)) | x == "" vectorizes over x and will often return TRUE if a column contains any empty string, not only when all values are missing. Use all(is.na(x) | x == "") (matching the allNArow logic).

Suggested change

-    checks["allNAcol"] <- any(apply(report, 2, function(x) all(is.na(x)) | x == ""))
+    checks["allNAcol"] <- any(apply(report, 2, function(x) all(is.na(x) | x == "")))
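The difference between the two expressions shows up on a toy data frame in which one column has a single empty string but no column is entirely missing. The data below are made up for illustration.

```r
# No column is entirely empty, but column "a" contains one empty string
report <- data.frame(
  a = c("x", ""),
  b = c("y", "z"),
  stringsAsFactors = FALSE
)

# Original expression: `| x == ""` is applied outside all(), element-wise
buggy <- any(apply(report, 2, function(x) all(is.na(x)) | x == ""))

# Corrected expression: a column counts only if every value is NA or ""
fixed <- any(apply(report, 2, function(x) all(is.na(x) | x == "")))

# A column that really is empty is still detected by the corrected form
report$c <- c(NA, "")
fixed_with_empty_col <- any(apply(report, 2,
                                  function(x) all(is.na(x) | x == "")))

print(c(buggy = buggy, fixed = fixed,
        fixed_with_empty_col = fixed_with_empty_col))
```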

Cristianetaniguti added 4 commits March 13, 2026 18:10

opt messages

5c0b590

messages ok

9afb265

targets okay

c31118d

targets ok

5d54f0d

Cristianetaniguti requested a review from Copilot March 27, 2026 13:42

Copilot started reviewing on behalf of Cristianetaniguti March 27, 2026 13:42

Merge branch 'ped_indels_update' into madc2vcf_all_updates

7c12d49

Cristianetaniguti merged commit df6fe92 into ped_indels_update Mar 27, 2026

Cristianetaniguti merged 5 commits into ped_indels_update from madc2vcf_all_updates
