

INSPIIRED is a software suite designed to study viral integration sites and the longitudinal outcomes of gene therapy patients. Each component of the INSPIIRED software suite is available through GitHub. For detailed instructions on how
to make use of each component, please visit its repository:


INSPIIRED is designed to run in a Linux environment and is best suited to run on a computational cluster where its
memory requirements are proportional to the volume of data being processed. When using a computational cluster,
update INSPIIRED's configuration file to instruct it to use either the bsub or qsub job submission methods.
A single, multi-core computer may be used provided that qsub is installed in order to distribute INSPIIRED's jobs
across its cores. Alternatively, a single-core computer (or a multi-core machine without qsub installed) may be used
by setting the configuration variable 'parallelize' to 'no'.
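
As a rough illustration of the choice described above, the sketch below suggests a value for 'parallelize' by checking which job submission commands are on the PATH. The function name is hypothetical and not part of INSPIIRED, and preferring bsub over qsub when both are present is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Suggest a value for the 'parallelize' setting based on which job
# submission commands are found on PATH.
# suggest_parallelize is a hypothetical helper, not part of INSPIIRED.
suggest_parallelize() {
  if command -v bsub >/dev/null 2>&1; then
    echo "bsub"   # assumption: prefer bsub when both are installed
  elif command -v qsub >/dev/null 2>&1; then
    echo "qsub"
  else
    echo "no"     # no scheduler found, so jobs would run serially
  fi
}

echo "Suggested setting -> parallelize : $(suggest_parallelize)"
```

The printed value still has to be entered in the configuration file by hand.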

The INSPIIRED pipeline is available both as a Linux virtual machine and as a Conda-supported software suite.
The virtual machine is available here (17GB).

We recommend the free VirtualBox software to install and run the virtual machine.
The virtual machine has a single user account 'inspiired' using the password 'inspiired1'. The INSPIIRED software
can be found in the inspiired user's home directory. In order to run the demo data set, at least 16GB of memory should be
allocated to the virtual machine.

Below is an example of how to set up INSPIIRED on a 64-bit Linux machine.

Set up Conda.
While installing Conda, agree to the license and agree to allow the setup script to update your .bashrc file.
If you are not using a 64-bit Linux machine then please visit
and download the appropriate Python 2.x version of Conda for your machine.

%> wget
%> bash
%> source ~/.bashrc
%> conda config --add channels 'bioconda'  
%> conda config --add channels 'r'  

Running the 'git clone' command below will begin the installation process in the same directory from which it is called.
INSPIIRED depends upon file paths defined in its configuration file (INSPIIRED.yml) shown at the bottom of this page.
If you install INSPIIRED in a location other than your home directory then you must update the paths in this configuration file.
At the bottom of this page are known installation problems and their solutions.
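
If INSPIIRED is installed outside the home directory, the path update described above might be scripted along the following lines. This is a minimal sketch that assumes every path to change starts with the default `~/INSPIIRED` prefix used in the configuration file; the function name and the `/opt/INSPIIRED` directory are hypothetical.

```shell
#!/usr/bin/env bash
# Rewrite the default '~/INSPIIRED' path prefix in a configuration file so
# that it points at a different install directory.
# retarget_config is a hypothetical helper, not part of INSPIIRED.
# Assumes the install directory contains no '|' or '&' characters, since
# those are special in the sed expression below.
# Usage: retarget_config <config file> <install directory>
retarget_config() {
  local config="$1" install_dir="$2"
  # Keep a backup copy, then replace every '~/INSPIIRED' prefix.
  cp "$config" "$config.bak"
  sed "s|~/INSPIIRED|${install_dir}|g" "$config.bak" > "$config"
}

# Demonstration on a throwaway file containing one line of the configuration.
demo_config="$(mktemp)"
printf '%s\n' 'sqliteIntSitesDB : ~/INSPIIRED/databases/SQLite/intSites.db' > "$demo_config"
retarget_config "$demo_config" "/opt/INSPIIRED"
cat "$demo_config"
```

The original file is kept next to the edited one with a `.bak` suffix so the change can be reviewed or undone.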

The setup script installs / updates a number of R libraries and may take up to 10 minutes to execute.

%> git clone
%> export INSPIIRED=$(pwd)
%> conda env create -f bin/INSPIIRED.conda.yml
%> source activate INSPIIRED
%> Rscript bin/setupINSPIIRED.R
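
Because the later commands rely on the `INSPIIRED` variable, it can be worth confirming that it points at the checkout. The sketch below tests for a few files named in this README; the function name is hypothetical and the list of files is only a sample, not a full inventory of the suite.

```shell
#!/usr/bin/env bash
# Check that a directory looks like an INSPIIRED checkout by testing for
# files that this README refers to.
# check_inspiired_home is a hypothetical helper, not part of INSPIIRED.
check_inspiired_home() {
  local home_dir="$1" rel status=0
  for rel in bin/setupINSPIIRED.R \
             bin/INSPIIRED.conda.yml \
             components/intSiteCaller/intSiteCaller.R; do
    if [ ! -e "$home_dir/$rel" ]; then
      echo "missing: $home_dir/$rel" >&2
      status=1
    fi
  done
  return "$status"
}

# Fall back to the current directory when INSPIIRED is not set.
if check_inspiired_home "${INSPIIRED:-$PWD}"; then
  echo "INSPIIRED layout looks complete"
else
  echo "INSPIIRED layout is incomplete; check the INSPIIRED variable"
fi
```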

**Identify integration sites.**
%> cd inputs/demoDataSet
%> Rscript $INSPIIRED/components/intSiteCaller/intSiteCaller.R -j demo
%> Rscript $INSPIIRED/components/intSiteCaller/check_stats.R

**Upload data to local database.**   
The data will be uploaded to a SQLite database defined in the INSPIIRED.yml configuration file.
%> Rscript $INSPIIRED/components/intSiteUploader/intSiteUploader.R

**Create a sample management database.**  
The patient report and genomic heat map creators depend on a second, sample management database.
This database contains details about where samples originate and how they were prepared.
The INSPIIRED software package includes an empty SQLite sample management database which
can be populated with a CSV-formatted data file.
%> Rscript $INSPIIRED/bin/uploadSampleData.R ../../databases/SQLite/specimenManagement.db sampleData.csv

**Create HTML patient report.**  
The report will be written to the analysis directory with the file name format (project).(patient id).(date).html
%> Rscript $INSPIIRED/components/geneTherapyPatientReportMaker/makeGeneTherapyPatientReport.R demo.csv

**Convert report from HTML to PDF.**  
(!) Note that the report name in the example below contains the date on which the report was created and will be different on your system.
%> Rscript $INSPIIRED/components/geneTherapyPatientReportMaker/printReportToPdf.R SCIDn1.pP1.20160622.html
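
Since the report name changes from run to run, a small helper can build or look up the name instead of typing it by hand. This is a sketch only: both function names are hypothetical, and the YYYYMMDD date layout is an assumption inferred from the example file name SCIDn1.pP1.20160622.html.

```shell
#!/usr/bin/env bash
# Build the expected report name for a project, patient id, and date.
# Assumption: the date part is written as YYYYMMDD, as in the example
# file name SCIDn1.pP1.20160622.html. Defaults to today's date.
# report_name and latest_report are hypothetical helpers.
report_name() {
  local project="$1" patient="$2" date_stamp="${3:-$(date +%Y%m%d)}"
  printf '%s.%s.%s.html\n' "$project" "$patient" "$date_stamp"
}

# Print the newest matching report in the current directory, if any.
# Sorting by name works because YYYYMMDD dates sort chronologically.
latest_report() {
  local project="$1" patient="$2"
  ls -1 "$project.$patient".*.html 2>/dev/null | sort | tail -n 1
}

report_name SCIDn1 pP1 20160622
```

The output of `latest_report SCIDn1 pP1` could then be passed to printReportToPdf.R in place of a hard-coded file name.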

**Create an interactive genomic heat map.**  
The heat map files will be written to a directory named genomicHeatmap/.  
The heat map and associated files are all SVG (Scalable Vector Graphics) files which can be viewed with most browsers.
%> Rscript $INSPIIRED/components/genomicHeatmapMaker/genomic_heatmap_from_db.R -o genomicHeatmap demo.csv

**Create an epigenetic heat map.**  
The heat map files will be written to a directory named epiHeatmap/.
%> Rscript $INSPIIRED/components/EpigeneticHeatmapMaker/epi_heatmap_from_db.R -o epiHeatmap  demo.epi.csv

**INSPIIRED configuration file:**
# Log file path.
# This file will be written to the analysis directory.
logFile : intSiteCaller.log

# Database configuration.
# If the database parameter is set to 'sqlite' then the provided sqliteIntSitesDB and sqliteSampleManagement databases will be used.
# If the parameter is set to 'mysql' the [mysqlConnectionGroup] connection credentials defined in your ~/.my.cnf file will be used.
dataBase : sqlite
sqliteIntSitesDB : ~/INSPIIRED/databases/SQLite/intSites.db
sqliteSampleManagement : ~/INSPIIRED/databases/SQLite/specimenManagement.db
mysqlConnectionGroup :

# The parallelize parameter instructs INSPIIRED how to distribute jobs over the available processors.
# Allowed values are: no, qsub, bsub
# If 'no' is selected then all jobs will be run serially.
# If using qsub on a single, multi-core machine, then set the forceQsubPath to 'Yes'.
parallelize : qsub
forceQsubPath : Yes

# Debugging
# If this parameter is set to Yes then a number of intermediate temp files will be retained.
debug : No

# Run id
# This id is used to distinguish the sequencing run from other previous runs.
runId : demoRun

# Maximum size for each sequence file chunk
chunkSize : 30000

# Epigenetic heat map data source
epigeneticDataDirectory : ~/INSPIIRED/components/EpigeneticHeatmapMaker/Epigenetic

# Log in to system hosting vector and sequencing data files
remoteUser :

# Directory that holds the vector information file defined in the vectorSeq column of the provided sampleInfo.csv file
vectorDataPath : .

# Directory that holds the R1, R2, and I1 sequencing run gzipped FASTQ files.
# Defined paths should be absolute or relative to the analysis directory.
  I1 :  Data/Undetermined_S0_L001_I1_001.fastq.gz
  R1 :  Data/Undetermined_S0_L001_R1_001.fastq.gz
  R2 :  Data/Undetermined_S0_L001_R2_001.fastq.gz

# Processing parameters
  qualityThreshold     : '?'
  badQualityBases      : 5
  qualitySlidingWindow : 10
  mingDNA              : 20
  minPctIdent          : 95
  maxAlignStart        : 5
  maxFragLength        : 2500
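
To check what a setting is currently set to without opening the file, the top-level `key : value` lines of a configuration laid out like the one above can be read with awk. This is a plain-text sketch rather than a YAML parser: the function name is hypothetical, and indented entries are deliberately ignored.

```shell
#!/usr/bin/env bash
# Print the value of a top-level "key : value" entry from a configuration
# file. Comment lines and indented entries are skipped.
# config_value is a hypothetical helper, not part of INSPIIRED.
# Usage: config_value <key> <config file>
config_value() {
  local key="$1" file="$2"
  awk -v key="$key" '
    /^[ \t]*#/ { next }   # skip comment lines
    /^[ \t]/   { next }   # skip indented entries
    {
      colon = index($0, ":")
      if (colon == 0) next          # not a key : value line
      name  = substr($0, 1, colon - 1)
      value = substr($0, colon + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (name == key) { print value; exit }
    }
  ' "$file"
}

# Demonstration on a throwaway file holding two settings from the example.
demo_config="$(mktemp)"
printf '%s\n' '# Allowed values are: no, qsub, bsub' 'parallelize : qsub' 'chunkSize : 30000' > "$demo_config"
config_value parallelize "$demo_config"
```

A key that is present but has no value, such as `remoteUser :`, yields an empty line.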

#### Known installation issues
On some systems, the following error may arise with the setupINSPIIRED.R script when installing R libraries:
awk: symbol lookup error: undefined symbol: PC  
This can only be resolved by replacing the conda library with a system library resource, i.e.
mv /home/everett/miniconda2/envs/INSPIIRED/lib/   /home/everett/miniconda2/envs/INSPIIRED/lib/
ln -s /lib/x86_64-linux-gnu/   /home/everett/miniconda2/envs/INSPIIRED/lib/  
Please note that the system-level library may be in a different location on your system.