ISAnalytics is an R package developed to analyze gene therapy vector insertion sites data identified from genomics next generation sequencing reads for clonal tracking studies.
In gene therapy, stem cells are modified using viral vectors to deliver the therapeutic transgene and replace functional properties, since the genetic modification is stable and inherited by all cell progeny. The retrieval and mapping of the sequences flanking the virus-host DNA junctions allows the identification of insertion sites (IS), which are essential for monitoring the evolution of genetically modified cells in vivo. A comprehensive toolkit for the analysis of IS is required to foster clonal tracking studies and support the assessment of safety and long-term efficacy in vivo. This package is aimed at (1) supporting automation of the IS workflow, (2) performing basic and advanced analyses for IS tracking (clonal abundance, clonal expansions, statistics for insertional mutagenesis, etc.), and (3) providing basic biological insights into transduced stem cells in vivo.
The paper is available here: https://academic.oup.com/bib/article/24/1/bbac551/6955274?login=false
You can visit the package website to view documentation, vignettes and more.
- For the release version: ISAnalytics Website release
- For the devel version: ISAnalytics Website dev
ISAnalytics can be installed quickly in different ways:
- You can install it via Bioconductor
- You can install it via GitHub using the package devtools
There are always 2 versions of the package active:
- RELEASE is the latest stable version
- DEVEL is the development version; it is the most up-to-date version, where all new features are introduced
RELEASE version (from Bioconductor):
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ISAnalytics")
DEVEL version (from Bioconductor):
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# The following initializes usage of Bioc devel
BiocManager::install(version='devel')
BiocManager::install("ISAnalytics")
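RELEASE version (from GitHub):
# Install devtools only if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
    ref = "RELEASE_3_17",
    dependencies = TRUE,
    build_vignettes = TRUE)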
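DEVEL version (from GitHub):
# Install devtools only if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}
devtools::install_github("calabrialab/ISAnalytics",
    ref = "devel",
    dependencies = TRUE,
    build_vignettes = TRUE)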
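After installing, it can be useful to confirm which version ended up in the library. The following is a minimal sketch using only base R; installed_version() is a hypothetical helper written for this example and is not part of ISAnalytics.

```r
# Hypothetical helper (not part of ISAnalytics): returns the installed
# version of a package as a string, or NA if the package is not installed.
installed_version <- function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
        as.character(utils::packageVersion(pkg))
    } else {
        NA_character_
    }
}

isa_version <- installed_version("ISAnalytics")
if (is.na(isa_version)) {
    message("ISAnalytics is not installed")
} else {
    message("ISAnalytics version: ", isa_version)
}
```

Comparing the reported version before and after switching between RELEASE and DEVEL is a quick way to check that the intended branch was installed.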
ISAnalytics has a verbose option that allows some functions to print additional information to the console while they’re executing. To disable or enable this feature, set the option as follows:
# DISABLE
options("ISAnalytics.verbose" = FALSE)
# ENABLE
options("ISAnalytics.verbose" = TRUE)
Some functions also produce reports in a user-friendly HTML format. To disable or enable this feature, set the option as follows:
# DISABLE HTML REPORTS
options("ISAnalytics.reports" = FALSE)
# ENABLE HTML REPORTS
options("ISAnalytics.reports" = TRUE)
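If the options only need to change for part of a session, one possible pattern is to save the previous values and restore them afterwards. This is a sketch using base R's options() and getOption() only; the fallback value TRUE passed to getOption() is an assumption made for this example, not a documented package default.

```r
# Set both options and keep the previous values so they can be restored
old_opts <- options(ISAnalytics.verbose = FALSE, ISAnalytics.reports = FALSE)

# Read the current values; the fallback TRUE is an assumption for this
# example and is used only if the option is unset
verbose_on <- getOption("ISAnalytics.verbose", default = TRUE)
reports_on <- getOption("ISAnalytics.reports", default = TRUE)

# ... call ISAnalytics functions here ...

# Restore whatever was set before (unset options become unset again)
options(old_opts)
```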
- Fixed issues in html report for outlier filtering - reported incorrect numbers due to missing conversion in percentage
- Fixed warnings for bslib::nav deprecation
- Fixed minor issue in default_af_transform(): transformation failed if NAs were present in the columns
- Fixed broken tests with new updates in underlying packages
- The package no longer depends on magrittr
- All functionality associated with data.table is now completely optional and will be used internally only if the package is available
- Several packages were moved from Imports to Suggests - functions will notify when additional packages are requested for the specific functionality
- All known deprecated or superseded functions from other packages have been removed or substituted
- Added a new tag “barcode_mux” in available_tags()
- The function HSC_population_size_estimate() now better supports the computation of estimates from different groups of cell types and tissues at the same time. The tabular output now contains an additional column “Timepoints_included” that specifies how many time points the estimate contains
- Function is_sharing() now handles edge cases better and can optionally be parallelised, provided appropriate packages are available (better performance)
- Functions import_parallel_Vispa2Matrices_auto() and import_parallel_Vispa2Matrices_interactive() are officially defunct and will not be exported anymore starting from the next release cycle
- The argument mode of import_parallel_Vispa2Matrices() no longer accepts INTERACTIVE as a valid option and the interactive mode is now considered defunct, since its usage is very limited
- The argument association_file of import_parallel_Vispa2Matrices() no longer accepts a string representing a path. Association file import is delegated solely to its dedicated function from now on.
- The function threshold_filter() is deprecated, since its use is rather complicated compared to standard filtering with dplyr or similar tools
- default_af_transform() now pads time points based on the maximum number of characters + 1 in the column
- Fixed an issue with ifelse function in top_abund_tableGrob() - now the function has a new argument transform_by which is useful for controlling ordering of columns
- Updated CITATION file
- Package DT has been moved (likely temporarily) to Imports - linked to issue calabrialab/ISAnalytics#2
- Fixed other typos and minor issues
- Fixed all tidyselect warnings (internal use of .data$ in selection context)
- Added Bonferroni correction in gene_frequency_fisher
- Added new section in workflow vignette + sample files
- Fixed minor bugs and typos
- Fixed build issues
- Progress bars for long processing functions are now implemented via the package progressr; added a wrapper function for quickly enabling progress bars, enable_progress_bars()
- Introduced logging for issues in HSC_population_size_estimate() - signals possible problems in computing estimates and why
- Fixed minor bugs and typos
- All functions that check for options now have a default value if option is not set
- CIS_grubbs function is now faster (removed dependency on psych::describe)
- New functions CIS_grubbs_overtime() and associated plotting function top_cis_overtime_heatmap() to compute CIS_grubbs test over time
- Fixed minor issues in import_association_file() - function had minor issues when importing *.xlsx files and missing optional columns threw errors
- Fixed bug in as_sparse_matrix() - function failed when trying to process an aggregated matrix
- Added 2 new utility functions export_ISA_settings() and import_ISA_settings() that allow a faster workflow setup
- Fixed minor issue in compute_near_integrations() - function errored when report_path argument was set to NULL
- Fixed dplyr warning in integration_alluvial_plot() internals
- Fixed issue with report of VISPA2 stats - report failed due to minor error in rmd fragment
- Internals of remove_collisions() again use dplyr for joining and grouping operations - needed because of performance issues with data.table
- fisher_scatterplot() has 2 new arguments that allow the disabling of highlighting for some genes even if their p-value is under the threshold
- ISAnalytics now has a new “dynamic vars system” to allow more flexibility on user inputs; view the dedicated vignette with vignette("workflow_start", package="ISAnalytics")
- All package functions were reviewed to work properly with this system
- gene_frequency_fisher() is a new function of the analysis family that allows the computation of Fisher’s exact test p-values on gene frequency - fisher_scatterplot() is the associated plotting function
- top_targeted_genes() is a new function of the analysis family that produces the top n targeted genes based on the number of IS
- NGSdataExplorer() is a newly implemented Shiny interface that allows the exploration and plotting of data
- zipped examples were removed from the package to contain size. To compensate, the new function generate_default_folder_structure() generates the standard folder structure with package-included data on-demand
- transform_columns() is a new utility function, also used internally by other exported functions, that allows arbitrary transformations on data frame columns
- remove_collisions() now has a dedicated parameter to specify how independent samples are identified
- compute_near_integration_sites() now has a parameter called additional_agg_lambda() to allow aggregation of additional columns
- CIS_grubbs() now signals if there are missing genes in the refgenes table and can return them as a df
- outlier_filter() is now able to take multiple tests in input and combine them with a given logic. It now also produces an HTML report.
- Several functions now use data.table under the hood
- Color of the strata containing IS below threshold can now be set in integration_alluvial_plot()
- Fixed a minor bug in import_Vispa2_stats() - function failed when passing report_path = NULL
- Fixed minor issue in circos_genomic_density() when trying to use a pdf device
- unzip_file_system() was made defunct in favor of generate_default_folder_structure()
- cumulative_count_union() was deprecated and its functionality was moved to cumulative_is()
- Added arguments fragmentEstimate_column and fragmentEstimate_threshold in HSC_population_size_estimate(). Slightly revised filtering logic.
- Updated package logo and website
- Added function to check for annotation problems in IS matrices
- Added argument max_workers in function remove_collisions()
- Updated default functions for aggregate_metadata()
- Added annotation issues section in import matrices report
- Fixed minor issue in internals for file system alignment checks
- Fixed minor issue in internal call to import_Vispa2_stats() from import_association_file()
- Added safe computation of sharing in remove_collisions(): if process fails function doesn’t stop
- Attempt to fix issues with parallel computation on Windows for some plotting functions
- Fixed issues with functions that make use of BiocParallel, which sometimes failed on the Windows platform
- Added new feature iss_source()
- Fixed minor issues in data files refGenes_mm9 and function compute_near_integrations()
- Added new feature purity_filter()
- Fixed small issue in printing information in reports
- Reworked is_sharing() function, detailed usage in vignette vignette("sharing_analyses", package = "ISAnalytics")
- New function cumulative_is()
- New function for plotting sharing as venn/euler diagrams sharing_venn()
- Fixed issue in tests that led to broken build
- Slightly modified included data set for better examples
- Completely reworked interactive HTML report system, for details take a look at the new vignette vignette("report_system", package = "ISAnalytics")
- Old ISAnalytics.widgets option has been replaced by ISAnalytics.reports
- In remove_collisions(), removed arguments seq_count_col, max_rows_reports and save_widget_path, added arguments quant_cols and report_path (see documentation for details)
- import_single_Vispa2Matrix() now allows keeping additional non-standard columns
- compute_near_integrations() is now faster on bigger data sets
- Changed default values for arguments columns and key in compute_abundance()
- compute_near_integrations() now produces only re-calibration map in *.tsv format
- CIS_grubbs() now supports calculations for each group specified in argument by
- In sample_statistics() now there is the option to include the calculation of distinct integration sites for each group (if mandatory vars are present)
- Added new plotting function circos_genomic_density()
- Fixed minor issue with NA values in alluvial plots
- import_parallel_Vispa2Matrices_interactive() and import_parallel_Vispa2Matrices_auto() are officially deprecated in favor of import_parallel_Vispa2Matrices()
- The package has now a more complete and functional example data set for executable examples
- Reworked documentation
- Corrected issues in man pages
- is_sharing computes the sharing of IS between groups
- sharing_heatmap allows visualization of sharing data through heatmaps
- integration_alluvial_plot allows visualization of integration sites distribution in groups over time.
- top_abund_tableGrob can be used in combination with the previous function or by itself to obtain a summary of top abundant integrations as an R graphic (tableGrob) object that can be combined with plots.
- Added more default stats functions to default_stats
- Added optional automatic conversion of time points in months and years when importing association file
- Minor fixes in generate_Vispa2_launch_AF
- HSC_population_size_estimate and HSC_population_plot allow estimates on hematopoietic stem cell population size
- Importing of Vispa2 stats per pool now has a dedicated function, import_Vispa2_stats
- outlier_filter and outliers_by_pool_fragments offer a means to filter poorly represented samples based on custom outliers tests
- The argument import_stats of aggregate_metadata is officially deprecated in favor of import_Vispa2_stats
- aggregate_metadata is now a lot more flexible on what operations can be performed on columns via the new argument aggregating_functions
- import_association_file allows directly for the import of Vispa2 stats and converts time points to months and years where not already present
- File system alignment of import_association_file now produces 3 separate columns for paths
- separate_quant_matrices and comparison_matrix now do not require mandatory columns other than the quantifications - this allows for separation or joining also for aggregated matrices
- Fixed a minor issue in CIS_volcano_plot that caused duplication of some labels if highlighted genes were provided in input
- Fixed issue in compute_near_integrations: when the recalibration map export path is provided as a folder, the function now works correctly and produces an automatically generated file name
- Fixed issue in aggregate_metadata: the path to the folder that contains Vispa2 stats is now looked up correctly. Also, VISPA2 stats columns are aggregated if found in the input data frame independently from the parameter import_stats.
- compute_abundance can now take as input aggregated matrices and has additional parameters to offer more flexibility to the user. Major updates and improvements also on documentation and reproducible examples.
- Major improvements in function import_single_Vispa2Matrix: import is now preferentially carried out using data.table::fread, greatly speeding up the process - where not possible, readr::read_delim is used instead
- Major improvements in function import_association_file: greatly improved parsing precision (each column has a dedicated type), import report now signals parsing problems and their location and also signals problems in parsing dates. Report also includes potential problems in column names and signals missing data in important columns. Also added the possibility to give various file formats in input, including *.xls(x) formats.
- Function top_integrations can now take additional parameters to compute top n genes for each specified group
- Removed faceting parameters in CIS_volcano_plot due to poor precision (easier to add faceting manually) and added parameters to return the data frame that generated the plot as an additional result. Also, it is now possible to specify a vector of gene names to highlight even if they’re not above the annotation threshold.
- ISAnalytics website has improved graphic theme and has an additional button on the right that leads to the devel (or release) version of the website
- Updated vignettes
- Complete rework of test suite to be compliant with testthat v.3
- Fixed minor issues in internal functions with absolute file paths & corrected typos
- Fixed minor issues in internal functions to optimize file system alignment
- Fixed minor issues in import_association_file when checking parameters
- It is now possible to save html reports to file from import_parallel_Vispa2Matrices_auto and import_parallel_Vispa2Matrices_interactive, remove_collisions and compute_near_integrations
- Fixed sample_statistics: now functions that have data frame output do not produce nested tables. Flat tables are ready to be saved to file or can be nested.
- Simplified association file check logic in remove_collisions: now function blocks only if the af doesn’t contain the needed columns
- Upgraded import_association_file function: now file alignment is not mandatory anymore and it is possible to save the html report to file
- Updated vignettes and documentation
- Greatly improved reports for collision removal function
- General improvements for all widget reports
- Further fixes for printing reports when widgets not available
- Added progress bar to collision processing in remove_collisions
- Updated vignettes
- Added vignette “Using ISAnalytics without RStudio support”
- Fixed missing restarts for non-blocking widgets
- Functions that make use of widgets do not interrupt execution anymore if errors are thrown while producing or printing the widgets
- Optimized widget printing for importing functions
- If widgets can’t be printed and verbose option is active, reports are now displayed on console instead (needed for usage in environments that do not have access to a browser)
- Other minor fixes (typos)
- Bug fixes: fixed a few bugs in importing and recalibration functions
- Minor fix in import_association_file function: added multiple strings to be translated as NA
- Vignette building might fail due to the fact that package “knitcitations” is temporarily unavailable through CRAN
- ISAnalytics is finally in release on Bioconductor!
- Minor fixes in tests
- Added analysis functions CIS_grubbs and cumulative_count_union
- Added plotting functions CIS_volcano_plot
- Added analysis function sample_statistics
- aggregate_values_by_key has a simplified interface and supports multi-quantification matrices
- Updated vignettes
- import_parallel_Vispa2Matrices_interactive and import_parallel_Vispa2Matrices_auto now have an option to return a multi-quantification matrix directly after import instead of a list
- Added analysis functions threshold_filter, top_integrations
- Added support for multi-quantification matrices in compute_abundance
- Fixed bug in comparison_matrix that ignored custom column names
- Fixed issues in some documentation pages
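The threshold_filter function added in these entries was later deprecated in favor of standard filtering with dplyr or similar tools, as noted in the more recent entries. As a rough illustration of what such standard filtering can look like, here is a sketch in base R on a toy table; the column names and values are invented for the example and are not taken from the package.

```r
# Toy integration matrix; column names and values are made up for illustration
toy_matrix <- data.frame(
    chr = c("1", "1", "2", "X"),
    integration_locus = c(1000L, 2500L, 300L, 4200L),
    strand = c("+", "-", "+", "-"),
    seqCount = c(5, 120, 48, 12),
    stringsAsFactors = FALSE
)

# Keep only rows whose quantification is at or above a chosen threshold
threshold <- 20
filtered <- toy_matrix[toy_matrix$seqCount >= threshold, ]

# With dplyr installed, the equivalent would be:
# filtered <- dplyr::filter(toy_matrix, seqCount >= threshold)
```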
ISAnalytics is officially on Bioconductor!
- Added analysis functions comparison_matrix and separate_quant_matrices
- Added utility function as_sparse_matrix
- Added package logo
- Changed algorithm for compute_near_integrations
- Added support for multi-quantification matrices to remove_collisions
- Added usage of lifecycle badges in documentation: users can now see if a feature is experimental/maturing/stable etc
- Added fix for import_single_Vispa2Matrix to remove non significant 0 values
- Added functionality: aggregate functions
- Added vignette on aggregate functions
- Added recalibration functions
- Added first analysis function (compute_abundance)
- Dropped structure ISADataFrame: now the package only uses standard tibbles
- Modified package documentation
- Submitted to Bioconductor
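
The changelog mentions a Bonferroni correction for the p-values produced by gene_frequency_fisher. For readers unfamiliar with the idea, the sketch below applies Fisher's exact test and a Bonferroni adjustment with base R's stats functions on invented counts. It illustrates the general technique only and makes no claim about how the package builds its contingency tables; the gene names, column names and counts are all hypothetical.

```r
# Invented counts per gene: IS in the gene vs. elsewhere, for two groups
counts <- data.frame(
    gene = c("GENE_A", "GENE_B", "GENE_C"),
    in_gene_g1 = c(12, 3, 7),
    other_g1 = c(488, 497, 493),
    in_gene_g2 = c(2, 4, 6),
    other_g2 = c(498, 496, 494)
)

# One Fisher's exact test per gene on a 2x2 table (rows = groups)
p_values <- vapply(seq_len(nrow(counts)), function(i) {
    tab <- matrix(
        c(counts$in_gene_g1[i], counts$other_g1[i],
          counts$in_gene_g2[i], counts$other_g2[i]),
        nrow = 2, byrow = TRUE
    )
    stats::fisher.test(tab)$p.value
}, numeric(1))

# Bonferroni: multiply each p-value by the number of tests, capped at 1
counts$p_value <- p_values
counts$p_bonferroni <- stats::p.adjust(p_values, method = "bonferroni")
```

Bonferroni is the most conservative of the common adjustments; stats::p.adjust() also supports other methods through the same method argument.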
For help please contact the maintainer of the package or open an issue on GitHub.