AneufinderFileFilter contains R scripts that allow easy filtering of single-cell DNA sequencing output generated by the R package "Aneufinder". Filtering is based on QC data generated by Aneufinder, and filtered model files can be reorganized into new directories. The scripts only read the model files and do not change the original Aneufinder output or model files.
To use the code you need the two included scripts:
- RUN_AneufinderFileFilter script: set input/output folders and choose the correct settings
- FUNC_AneufinderFileFilter script: contains the code that is used to filter the data
This script makes use of the following R packages:
Input:
- Aneufinder Model files
Output:
For each sample:
- .txt summary of used filter parameters
- directory with selected model files
- directory with excluded model files
- directory with perfect diploid model files
- .pdf genomewide karyotype plot based on selected/excluded files
- .pdf single karyotype plots based on selected/excluded files
- .pdf heterogeneity/aneuploidy plot based on selected/excluded files
- .csv with QC measurements for each file
- .csv with karyotype measurements for each chromosome
- .csv with karyotype measurements for whole genome
Many of the above outputs are optional.
To get started, I advise using RStudio: create a new project and name it 'AneufinderFileFilter'. Then download both scripts from GitHub and place them in the project folder of your new AneufinderFileFilter project.
In general there's no need to open or adjust the FUNC script. This is only needed if you want to change the code that performs the actual filtering or the code that generates the different plots. You only need to make sure that the RUN script contains the correct source path to the FUNC script.
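As a minimal sketch, the source path can be checked before the FUNC script is loaded. The file name `FUNC_AneufinderFileFilter.R` and its location in the project folder are assumptions; adjust both to match your own download.

```r
# Hypothetical path: assumes the FUNC script sits in the project folder
# under this file name; adjust to match your own setup.
func_script <- file.path(getwd(), "FUNC_AneufinderFileFilter.R")

if (file.exists(func_script)) {
  source(func_script)  # loads the filtering and plotting functions
} else {
  message("FUNC script not found at: ", func_script)
}
```

Checking with `file.exists()` first gives a readable message instead of the error that `source()` raises for a missing file.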
The RUN_AneufinderFileFilter script is subdivided into multiple sections to give a clear overview of the different settings. Before each run you will probably want to give your project a new name, assign the correct input folder, and check the filtering and plotting settings.
After making all required adjustments, run the code line by line. The actual filtering starts at the end of the RUN script when you run AneufinderFileFilter(sampleIDs). Shortly afterwards you will be prompted to check the filter settings; if they are correct, enter 'Y' to continue the script.
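For readers curious how such a confirmation step can work, here is a minimal sketch. `confirm_settings()` is a hypothetical helper written for illustration, not the function used in the FUNC script, and the prompt text is invented. Unlike the instruction above, this sketch also accepts a lowercase 'y'.

```r
# Hypothetical helper, not the function from the FUNC script: shows how a
# Y/N confirmation could gate the start of filtering.
confirm_settings <- function(answer = NULL) {
  if (is.null(answer)) {
    # In an interactive session, ask the user directly.
    answer <- readline("Filter settings correct? Enter 'Y' to continue: ")
  }
  # Ignore surrounding whitespace and letter case.
  toupper(trimws(answer)) == "Y"
}

# Only prompt when R is running interactively (for example in RStudio).
if (interactive()) {
  if (!confirm_settings()) stop("Aborted: adjust the settings in the RUN script.")
}
```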
Available filtering options
- Filter Aneufinder model files generated via edivisive, dnaCopy or hmm.
- Filter files based on total read count per cell, number of chromosome segments, spikiness and/or Bhattacharyya distance.
- Exclude model files with too high weighted average copy number
- Exclude model files with a perfect diploid genome
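The filtering options above amount to applying thresholds to a table of QC values. The sketch below illustrates the idea on a toy data frame. The column names, the threshold values, and the direction of each threshold are assumptions made for illustration; they are not taken from the FUNC script or from a real Aneufinder run.

```r
# Toy QC table; column names and values are illustrative assumptions,
# not output from a real Aneufinder run.
qc <- data.frame(
  file             = c("cell_01.RData", "cell_02.RData",
                       "cell_03.RData", "cell_04.RData"),
  total.read.count = c(250000, 40000, 310000, 280000),
  num.segments     = c(30, 25, 140, 23),
  spikiness        = c(0.12, 0.15, 0.35, 0.10),
  bhattacharyya    = c(1.8, 1.5, 0.6, 2.0),
  avg.copy.number  = c(2.3, 2.1, 2.6, 2.0),
  perfect.diploid  = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical thresholds; choose values that suit your own data.
min_reads         <- 100000
max_segments      <- 100
max_spikiness     <- 0.25
min_bhattacharyya <- 1.0
max_copy_number   <- 4

# A cell passes only if it meets every threshold.
passes_qc <- qc$total.read.count >= min_reads &
  qc$num.segments <= max_segments &
  qc$spikiness <= max_spikiness &
  qc$bhattacharyya >= min_bhattacharyya &
  qc$avg.copy.number <= max_copy_number

# Perfect diploid cells are kept apart from both other groups.
diploid  <- qc$file[qc$perfect.diploid]
selected <- qc$file[passes_qc & !qc$perfect.diploid]
excluded <- qc$file[!passes_qc & !qc$perfect.diploid]
```

With these toy values, `cell_01.RData` is selected, `cell_02.RData` (too few reads) and `cell_03.RData` (too many segments, too spiky, low Bhattacharyya distance) are excluded, and `cell_04.RData` is set aside as perfect diploid.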
Obtain selected Aneufinder model files
- Copy selected model files to new folder
- Copy model files from perfect diploid cells to new folder
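Copying rather than moving is what keeps the original Aneufinder output untouched. A minimal sketch with base R, using temporary folders and empty placeholder files in place of real model files (all folder and file names here are hypothetical):

```r
# Illustrative setup: temporary folders and empty placeholder files stand in
# for real Aneufinder model files. All names are hypothetical.
model_dir <- file.path(tempdir(), "MODELS")
out_dir   <- file.path(tempdir(), "selected_models")
dir.create(model_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

model_files <- file.path(model_dir, c("cell_01.RData", "cell_02.RData"))
file.create(model_files)

selected <- model_files[1]  # pretend only the first file passed the filters

# file.copy() leaves the source files in place, so the original output
# directory is not modified.
copied <- file.copy(selected, out_dir, overwrite = TRUE)
```

`file.copy()` returns a logical vector, so `all(copied)` is a quick check that every selected file reached the new folder.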
Plots
- PDF with summary statistics for included and excluded files
- PDF with genomewide profile for selected files
- PDF with single cell karyotype profiles for included or excluded files
- PDF with heterogeneity profiles for selected model files
- CSV file with measurement statistics for each model file
This script was one of my first builds, hence the coding could probably have been more efficient. Nevertheless I hope it can be used to your benefit. If you have any questions or need help with running the script, please don't hesitate to send me a message.
Thomas van Ravesteyn