CyclomicsManuscript README

Data analysis scripts used to process and plot the data for the manuscript of CyclomicsSeq.

Date

1 July 2020

Author

Myrthe Jager (contact: m.jager-2@umcutrecht.nl)
Alessio Marcozzi (contact: alessio.marcozzi@gmail.com)

Description

Here, you can find all code (R scripts and Python notebooks) that can be used to reproduce the results of all the paper Figures and most Supplementary Figures of:
"Accurate detection of circulating tumor DNA using nanopore consensus sequencing".

Data

Processed data DOI: 10.5281/zenodo.3925250

Instructions for use

A) Start up (estimated time < 1 hour)

Download and install required packages (see below @ Software dependencies, R packages and Python libraries)
Download the processed data from Zenodo (DOI: 10.5281/zenodo.3925250)
Create output directory /Results/ in the main downloaded folder 'Cyclomics_manuscript' with subfolders:

/10reps/
/10reps/basetype/
/10reps/extra/
/10reps/insert/
/consensus_loop/
/consensus_loop/backbone/
/consensus_loop/insert/
/consensus_loop/all/
/consensus_loop/all/error/
/consensus_loop/all/qscore/
/patients/
/patients/patienta/
/patients/patientb/
/patients/patientc/

Add dir for each script used in steps B and C (except functions.R) @ GET STARTED step 2: directory of downloaded processed data (once per script)

B) snFP and deletion analyses (estimated time ~ 1 hour)

Run functions.R in RStudio
Run consensus_loop.R in RStudio to get & plot (~ 35 minutes runtime):

Figure 3a
Figure 5a
Supplementary figures: 4a, 7a, 7b, 8a, 8b and 9b
Analyses of single-nucleoide FP rate and deletion rate per number of repeat for 1-39 and 40+ repeats
Separate analyses for entire runs and per insert, and for error and qscore
Coverage of all runs per number of repeat for 1-39 and 40+ repeats

Clear RStudio workspace
Run functions.R in RStudio
Run 10reps.R in RStudio to get & plot (~ 25 minutes runtime):

Figure 3 b,c,d
Figure 5 b,c
Supplementary figures: 4b-g, 5b-e, 6, 7c-f and 8c-f
Analyses of single-nucleotide FP rate and deletion rate per nucleotide position and per run (for 10+ repeats)
Nucleotide positions which require snFP forward/reverse correction
Forward/reverse corrected analyses of single-nucleotide FP rate and deletion rate per nucleotide position and per run (for 10+ repeats)
Percentage of bases with <0.1%, 0.1-1.0 and >1.0% snFP and deletion rates per run

Clear RStudio workspace

C) Patients (estimated time ~ 2 minutes)

Run functions in RStudio
Run patientA.R in RStudio to get & plot (~30 seconds runtime)

Figure 6 a,d

Clear RStudio workspace
Run functions
Run patientB.R in RStudio to get & plot (~30 seconds runtime)

Figure 6 b,e

Clear RStudio workspace
Run functions
Run patientC.R in RStudio to get & plot (~30 seconds runtime)

Figure 6 c,f

Clear RStudio workspace

D) CyclomicsSeq run-stats (estimated time < 1 hour)

The notebook stats_from_structure.ipynb was used to generate the plots of Figure 1 and the ones in SuppementaryData.zip.

Start a jupyter notebook server and open stats_from_structure.ipynb
Set the value of the data_folder variable to match the location where you have downloaded the processed data from Zenodo (DOI: 10.5281/zenodo.3925250)
Optionally, modify the samples variable to include/exclude specific runs. The default value include all the runs published in "Accurate detection of circulating tumor DNA using nanopore consensus sequencing"
Specify the value of save_plots_folder, which indicates the folder where the plots will be saved
Run all the cells

E) TP53 exons coverage (estimated time < 10 minutes)

The notebook TP53_panel_coverage.ipynb was used to generate the plots of Figure 2.

Start a jupyter notebook server and open TP53_panel_coverage.ipynb
Set the value of the data_folder variable to match the location where you have downloaded the processed data from Zenodo (DOI: 10.5281/zenodo.3925250)
Set the value of the output_folder variable which indicates the folder where the plots will be saved
Optionally, modify the samples variable to include/exclude specific runs
Run all the cells

F) False positive rates on TP53 COSMIC mutations (estimated time < 10 minutes)

The notebook COSMIC_analysis.ipynb was used to generate the plots of Figure 4.

Start a jupyter notebook server and open COSMIC_analysis.ipynb
Set the value of the data_folder variable to match the location where you have downloaded the processed data from Zenodo (DOI: 10.5281/zenodo.3925250)
Optionally, modify the samples variable to include/exclude specific runs
Optionally, modify the save_as variable to change the name and the extension of the output file
Run all the cells

G) Visualize the structure of a CyclomicsSeq read (estimated time < 1 hour)

The CyclomicsSeq concatemers can be visualized using plot_read_structure.ipynb

Start a jupyter notebook server and open plot_read_structure.ipynb
Set the value of the data_folder variable to match the location where you have downloaded the processed data from Zenodo (DOI: 10.5281/zenodo.3925250)
Set the value of the sample_name variable to specify the run to be analyzed
Optionally, set the other input parameters specified in the second cell of the notebook
Run all the cells

Software dependencies

R version 3.5.1 RSrudio version 1.1.456

Python version 3.6 jupyterlab version 1.2.6

R packages

cowplot 0.9.3
ggplot2 3.0.0
gridExtra 2.3
reshape2 1.4.3
scales 0.5.0

Python libraries

matplotlib 3.1.3
numpy 1.18.5
pandas 1.0
scipy 1.4
biopython 1.7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CyclomicsManuscript README

Date

Author

Description

Data

Instructions for use

A) Start up (estimated time < 1 hour)

B) snFP and deletion analyses (estimated time ~ 1 hour)

C) Patients (estimated time ~ 2 minutes)

D) CyclomicsSeq run-stats (estimated time < 1 hour)

E) TP53 exons coverage (estimated time < 10 minutes)

F) False positive rates on TP53 COSMIC mutations (estimated time < 10 minutes)

G) Visualize the structure of a CyclomicsSeq read (estimated time < 1 hour)

Software dependencies

R packages

Python libraries

About

Releases 1

Packages

Contributors 3

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
db		db
.gitignore		.gitignore
10reps.R		10reps.R
COSMIC_analysis.ipynb		COSMIC_analysis.ipynb
LICENSE		LICENSE
README.md		README.md
TP53_panel_coverage.ipynb		TP53_panel_coverage.ipynb
consensusloop.R		consensusloop.R
cyclomics.py		cyclomics.py
functions.R		functions.R
patientA.R		patientA.R
patientB.R		patientB.R
patientC.R		patientC.R
plot_read_structure.ipynb		plot_read_structure.ipynb
stats_from_structure.ipynb		stats_from_structure.ipynb

License

UMCUGenetics/CyclomicsManuscript

Folders and files

Latest commit

History

Repository files navigation

CyclomicsManuscript README

Date

Author

Description

Data

Instructions for use

A) Start up (estimated time < 1 hour)

B) snFP and deletion analyses (estimated time ~ 1 hour)

C) Patients (estimated time ~ 2 minutes)

D) CyclomicsSeq run-stats (estimated time < 1 hour)

E) TP53 exons coverage (estimated time < 10 minutes)

F) False positive rates on TP53 COSMIC mutations (estimated time < 10 minutes)

G) Visualize the structure of a CyclomicsSeq read (estimated time < 1 hour)

Software dependencies

R packages

Python libraries

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 3

Languages

Packages