# Steps for generating features 

## Feature 1: Unfolding energy of regions (splice site, region around the mutation, larger region around the splice site)
* The splice site is defined as the 3 bases upstream (exonic) and 6 bases downstream (intronic) of the exon-intron boundary.
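Assuming `-b 93` / `-c 93` give the 1-based position of the last exonic base and that coordinates are inclusive, this definition reproduces the `-1 91 -2 99` coordinates passed to the splice-site unfolding runs. A minimal sketch; `splice_site_window` is a hypothetical helper, not part of the pipeline:

```python
def splice_site_window(boundary: int, upstream: int = 3, downstream: int = 6) -> tuple[int, int]:
    """Return the 1-based, inclusive (start, end) of a window around an exon-intron boundary.

    Assumes `boundary` is the 1-based position of the last exonic base: the
    `upstream` exonic bases end at `boundary`, and the `downstream` intronic
    bases start at `boundary + 1`.
    """
    start = boundary - upstream + 1
    end = boundary + downstream
    return start, end


# 3 exonic + 6 intronic bases around the boundary at position 93
print(splice_site_window(93))  # (91, 99)
```

Other flank sizes work the same way: `splice_site_window(93, 10, 10)` returns `(84, 103)`, the range used for the 20-base region.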

## Ensemble based calculations

### Generate a structural ensemble for each sequence (WT and mutations), optionally incorporating the DMS data collected for the WT sequence

In [None]:
# Invivo data
python generateStructuralEnsemble_NoMaxPairingDist_RNAStructure.py -f ../data/MAPT_exon10intron10withDMSdata.fa -m  ../data/MAPT_SNPs_ToTest.tsv -b 93 -d ../data/Invivo_Exon10Intron10_PooledRep_RenormalizedPerNucleotide_ScaledGUs_GsAndUsSetToLargeNegativeValue_RewrittenCoordinatesIncludeAllExon10.shape -t /home/jkumar/Projects/Model_MAPTsplicing/tmp/InvivoDMSdata_PooledRep_Exon10Intron10_Structures/StructuralEnsemble_SNPs/

In [None]:
# Exvivo data
python generateStructuralEnsemble_NoMaxPairingDist_RNAStructure.py -f ../data/MAPT_exon10intron10withDMSdata.fa -m  ../data/MAPT_SNPs_ToTest.tsv -b 93 -d ../data/Exvivo_Exon10Intron10_PooledRep_RenormalizedPerNucleotide_ScaledGUs_GsAndUsSetToLargeNegativeValue_RewrittenCoordinatesIncludeAllExon10.shape -t /home/jkumar/Projects/Model_MAPTsplicing/tmp/ExvivoDMSdata_PooledRep_Exon10Intron10_Structures/StructuralEnsemble_SNPs/

In [None]:
# No data
python generateStructuralEnsemble_NoMaxPairingDist_RNAStructure.py -f ../data/MAPT_exon10intron10withDMSdata.fa -m  ../data/MAPT_SNPs_ToTest.tsv -b 93 -t /home/jkumar/Projects/Model_MAPTsplicing/tmp/NoDMSdata_PooledRep_Exon10Intron10_Structures/StructuralEnsemble_SNPs/
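The in vivo and ex vivo runs pass a reactivity file with `-d`; the no-data run omits it. The file name indicates that G and U positions (which DMS does not mainly probe; it reacts chiefly with A and C) were set to a large negative value. Under RNAstructure's usual restraint-file convention, assumed here, each line holds a nucleotide index and a reactivity, and values below -500 are treated as "no data", so those positions would not constrain folding. A minimal sketch for inspecting such a file; `read_reactivity_file` is hypothetical:

```python
import tempfile
from pathlib import Path


def read_reactivity_file(path: str, no_data_cutoff: float = -500.0) -> dict[int, float]:
    """Read a two-column restraint file into {1-based position: reactivity}.

    Values below `no_data_cutoff` are treated as "no data" and skipped
    (assumed to match RNAstructure's convention).
    """
    reactivities: dict[int, float] = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        position, value = int(fields[0]), float(fields[1])
        if value >= no_data_cutoff:
            reactivities[position] = value
    return reactivities


# Toy file: positions 2 and 4 carry the large negative "no data" value
with tempfile.NamedTemporaryFile("w", suffix=".shape", delete=False) as handle:
    handle.write("1 0.85\n2 -999\n3 0.10\n4 -999\n")
data = read_reactivity_file(handle.name)
Path(handle.name).unlink()
print(data)  # {1: 0.85, 3: 0.1}
```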

### Calculate the median free energy (delta G) of the ensemble for WT and each mutation

In [None]:
# Invivo data
python calulateMedianDeltaGForEnsemblePerMutation.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/InvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -m ../data/MAPT_SNPs_ToTest.tsv -p Rsample -a Median

In [None]:
# Exvivo data
python calulateMedianDeltaGForEnsemblePerMutation.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/ExvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -m ../data/MAPT_SNPs_ToTest.tsv -p Rsample -a Median

In [None]:
# No data
python calulateMedianDeltaGForEnsemblePerMutation.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/NoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -m ../data/MAPT_SNPs_ToTest.tsv -p Partition -a Median
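With `-a Median`, each sequence's ensemble is summarized by the median free energy of its structures, which is less sensitive than the mean to a few outlying structures. A minimal sketch with made-up energies; the structure files the script actually reads are not shown here, and `median_ensemble_energy` is hypothetical:

```python
import statistics


def median_ensemble_energy(energies: list[float]) -> float:
    """Median free energy (kcal/mol) across the structures of one ensemble."""
    if not energies:
        raise ValueError("ensemble is empty")
    return statistics.median(energies)


# Made-up structure energies for a WT ensemble and one mutant ensemble
ensembles = {
    "WT": [-30.2, -29.8, -31.0, -30.5],
    "MUT_example": [-28.9, -29.4, -28.7],
}
medians = {name: median_ensemble_energy(values) for name, values in ensembles.items()}
print(medians["MUT_example"])  # -28.9
```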

### Unfold just the splice site for each sequence (WT and mutations) and calculate the difference in delta G 

In [None]:
# Invivo data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/InvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b StructuralEnsemble -r Unfold_SpliceSite -m ../data/MAPT_SNPs_ToTest.tsv -1 91 -2 99 -p Rsample -g MAPT -a Median -c 93

In [None]:
# Exvivo data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/ExvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b StructuralEnsemble -r Unfold_SpliceSite -m ../data/MAPT_SNPs_ToTest.tsv -1 91 -2 99 -p Rsample -g MAPT -a Median -c 93

In [None]:
# No data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/NoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b StructuralEnsemble -r Unfold_SpliceSite -m ../data/MAPT_SNPs_ToTest.tsv -1 91 -2 99 -p Partition -g MAPT -a Median -c 93
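One common way to define the energy needed to unfold a region is the free energy of the structure with that region forced single-stranded minus the free energy of the unconstrained structure; the script's exact definition may differ. The sketch parses a dot-bracket structure, lists the base pairs that would have to break to open a region (a toy region 10-14 rather than 91-99), and computes that energy difference from made-up values. All helper names are hypothetical:

```python
def base_pairs(dot_bracket: str) -> list[tuple[int, int]]:
    """Return sorted 1-based (i, j) base pairs from a dot-bracket string (round brackets only)."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, char in enumerate(dot_bracket, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return sorted(pairs)


def pairs_broken_by_unfolding(dot_bracket: str, start: int, end: int) -> list[tuple[int, int]]:
    """Base pairs with at least one partner inside the 1-based, inclusive region [start, end]."""
    return [(i, j) for i, j in base_pairs(dot_bracket)
            if start <= i <= end or start <= j <= end]


def unfolding_energy(constrained_dg: float, unconstrained_dg: float) -> float:
    """Cost (kcal/mol) of keeping the region single-stranded: constrained minus unconstrained."""
    return constrained_dg - unconstrained_dg


structure = "((((....))))..((...))"
print(pairs_broken_by_unfolding(structure, 10, 14))  # [(1, 12), (2, 11), (3, 10)]
# Made-up energies: -25.0 with the region forced unpaired, -30.0 without constraints
print(unfolding_energy(-25.0, -30.0))  # 5.0
```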

### Unfold the local region around each mutation and calculate the difference in delta G (window size of 5)

In [None]:
# Invivo data
python calculateDeltaGforUnfoldingEnergyAroundMutation.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/InvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -c StructuralEnsemble -r Unfold_LocalRegion -m ../data/MAPT_SNPs_ToTest.tsv -w 5 -b 93 -p Rsample -g MAPT -a Median

In [None]:
# Exvivo data
python calculateDeltaGforUnfoldingEnergyAroundMutation.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/ExvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -c StructuralEnsemble -r Unfold_LocalRegion -m ../data/MAPT_SNPs_ToTest.tsv -w 5 -b 93 -p Rsample -g MAPT -a Median

In [None]:
# No data
python calculateDeltaGforUnfoldingEnergyAroundMutation.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/NoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -c StructuralEnsemble -r Unfold_LocalRegion -m ../data/MAPT_SNPs_ToTest.tsv -w 5 -b 93 -p Partition -g MAPT -a Median
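The section does not say whether `-w 5` means 5 bases on each side of the mutation or a 5-base window in total. The sketch assumes a per-side flank, clipped at the sequence ends; `window_around_mutation` and the 200-nt sequence length are hypothetical:

```python
def window_around_mutation(position: int, flank: int, seq_length: int) -> tuple[int, int]:
    """1-based, inclusive region `flank` bases either side of `position`, clipped to the sequence."""
    if not 1 <= position <= seq_length:
        raise ValueError("mutation position lies outside the sequence")
    return max(1, position - flank), min(seq_length, position + flank)


print(window_around_mutation(100, 5, 200))  # (95, 105)
print(window_around_mutation(3, 5, 200))    # (1, 8)
```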

### Unfold 10 bases on each side of the exon-intron boundary (20 bases total) for each sequence (WT and mutations) and calculate the difference in delta G

In [None]:
# Invivo data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/InvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b StructuralEnsemble -r Unfold_20basesAroundSpliceSite -m ../data/MAPT_SNPs_ToTest.tsv -1 84 -2 103 -p Rsample -g MAPT -a Median

In [None]:
# Exvivo data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/ExvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b StructuralEnsemble -r Unfold_20basesAroundSpliceSite -m ../data/MAPT_SNPs_ToTest.tsv -1 84 -2 103 -p Rsample -g MAPT -a Median

In [None]:
# No data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/NoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b StructuralEnsemble -r Unfold_20basesAroundSpliceSite -m ../data/MAPT_SNPs_ToTest.tsv -1 84 -2 103 -p Partition -g MAPT -a Median

### Unfold 12 exonic bases and 31 intronic bases for each sequence (WT and mutations) and calculate the difference in delta G (region length taken from a spliceosome assembly model)

In [None]:
# Invivo data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/InvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b StructuralEnsemble -r Unfold_MaxLengthRNAinSpliceosome -m ../data/MAPT_SNPs_ToTest.tsv -1 82 -2 124 -p Rsample -g MAPT -a Median -c 93

In [None]:
# Exvivo data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/ExvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b StructuralEnsemble -r Unfold_MaxLengthRNAinSpliceosome -m ../data/MAPT_SNPs_ToTest.tsv -1 82 -2 124 -p Rsample -g MAPT -a Median -c 93

In [None]:
# No data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/NoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b StructuralEnsemble -r Unfold_MaxLengthRNAinSpliceosome -m ../data/MAPT_SNPs_ToTest.tsv -1 82 -2 124 -p Partition -g MAPT -a Median -c 93
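The `-1`/`-2` coordinates can be checked against the region descriptions in the headings. Assuming position 93 is the last exonic base and coordinates are 1-based and inclusive, a small hypothetical helper splits each range into exonic and intronic base counts:

```python
def exon_intron_split(start: int, end: int, boundary: int) -> tuple[int, int]:
    """Count exonic and intronic bases in the 1-based, inclusive range [start, end].

    Assumes `boundary` is the last exonic base.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    exonic = max(0, min(end, boundary) - start + 1)
    intronic = max(0, end - max(start, boundary + 1) + 1)
    return exonic, intronic


regions = {
    "Unfold_SpliceSite": (91, 99),
    "Unfold_20basesAroundSpliceSite": (84, 103),
    "Unfold_MaxLengthRNAinSpliceosome": (82, 124),
}
for name, (start, end) in regions.items():
    print(name, exon_intron_split(start, end, boundary=93))
# Prints:
# Unfold_SpliceSite (3, 6)
# Unfold_20basesAroundSpliceSite (10, 10)
# Unfold_MaxLengthRNAinSpliceosome (12, 31)
```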

## MFE based calculations

### Generate the minimum free energy (MFE) structure for each sequence (WT and mutations) and calculate its energy

In [None]:
# Invivo data
python generateMFEstructure_NoMaxPairingDist_RNAStructure.py -f ../data/MAPT_exon10intron10withDMSdata.fa -m  ../data/MAPT_SNPs_ToTest.tsv -b 93 -d ../data/Invivo_Exon10Intron10_PooledRep_RenormalizedPerNucleotide_ScaledGUs_GsAndUsSetToLargeNegativeValue_RewrittenCoordinatesIncludeAllExon10.shape -t /home/jkumar/Projects/Model_MAPTsplicing/tmp/InvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -q SNPs  

In [None]:
# Exvivo data
python generateMFEstructure_NoMaxPairingDist_RNAStructure.py -f ../data/MAPT_exon10intron10withDMSdata.fa -m  ../data/MAPT_SNPs_ToTest.tsv -b 93 -d ../data/Exvivo_Exon10Intron10_PooledRep_RenormalizedPerNucleotide_ScaledGUs_GsAndUsSetToLargeNegativeValue_RewrittenCoordinatesIncludeAllExon10.shape -t /home/jkumar/Projects/Model_MAPTsplicing/tmp/ExvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -q SNPs  

In [None]:
# No data
python generateMFEstructure_NoMaxPairingDist_RNAStructure.py -f ../data/MAPT_exon10intron10withDMSdata.fa -m  ../data/MAPT_SNPs_ToTest.tsv -b 93 -t /home/jkumar/Projects/Model_MAPTsplicing/tmp/NoDMSdata_PooledRep_Exon10Intron10_Structures/ -q SNPs  
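MFE structures are commonly written as connectivity table (CT) files. Assuming RNAstructure-style CT output, where each structure starts with a header line giving the sequence length (and, for predicted structures, `ENERGY = <value>`), followed by one line per base whose fifth column is the pairing partner (0 if unpaired), the energy and structure can be read back as sketched here. The file layout actually written by `generateMFEstructure_NoMaxPairingDist_RNAStructure.py` is not described in this notebook, so `read_ct`, `to_dot_bracket`, and the toy CT text are illustrative only:

```python
import re
from typing import Optional

ENERGY_RE = re.compile(r"ENERGY\s*=\s*(-?\d+(?:\.\d+)?)")


def read_ct(text: str) -> list[tuple[Optional[float], list[int]]]:
    """Parse CT text (one or more structures) into (energy, partners) tuples.

    partners[i - 1] is the 1-based partner of base i, or 0 if base i is unpaired.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    structures = []
    idx = 0
    while idx < len(lines):
        n_bases = int(lines[idx].split()[0])  # header starts with the sequence length
        match = ENERGY_RE.search(lines[idx])
        energy = float(match.group(1)) if match else None
        body = lines[idx + 1 : idx + 1 + n_bases]
        partners = [int(line.split()[4]) for line in body]  # 5th column = partner
        structures.append((energy, partners))
        idx += n_bases + 1
    return structures


def to_dot_bracket(partners: list[int]) -> str:
    """Convert a partner list to dot-bracket notation (pseudoknotted pairs are not distinguished)."""
    return "".join(
        "." if p == 0 else ("(" if p > i else ")")
        for i, p in enumerate(partners, start=1)
    )


toy_ct = """
    8  ENERGY = -1.2  toy_hairpin
    1 G 0 2 8 1
    2 G 1 3 7 2
    3 A 2 4 0 3
    4 A 3 5 0 4
    5 A 4 6 0 5
    6 A 5 7 0 6
    7 C 6 8 2 7
    8 C 7 0 1 8
"""
structures = read_ct(toy_ct)
energy, partners = structures[0]
print(energy, to_dot_bracket(partners))  # -1.2 ((....))
```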

### Unfold just the splice site for each sequence (WT and mutations) and calculate the difference in delta G

In [None]:
# Invivo data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/InvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b MFEstructures -r Unfold_SpliceSite -m ../data/MAPT_SNPs_ToTest.tsv -1 91 -2 99 -p DMSdata -g MAPT -a Median -c 93

In [None]:
# Exvivo data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/ExvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b MFEstructures -r Unfold_SpliceSite -m ../data/MAPT_SNPs_ToTest.tsv -1 91 -2 99 -p DMSdata -g MAPT -a Median -c 93

In [None]:
# No data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/NoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b MFEstructures -r Unfold_SpliceSite -m ../data/MAPT_SNPs_ToTest.tsv -1 91 -2 99 -p NoDMSdata -g MAPT -a Median -c 93

### Unfold the local region around each mutation and calculate the difference in delta G (window size of 5)

In [None]:
# Invivo data
python calculateDeltaGforUnfoldingEnergyAroundMutation.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/InvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -c MFEstructures -r Unfold_LocalRegion -m ../data/MAPT_SNPs_ToTest.tsv -w 5 -b 93 -p DMSdata -g MAPT -a Median

In [None]:
# Exvivo data
python calculateDeltaGforUnfoldingEnergyAroundMutation.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/ExvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -c MFEstructures -r Unfold_LocalRegion -m ../data/MAPT_SNPs_ToTest.tsv -w 5 -b 93 -p DMSdata -g MAPT -a Median

In [None]:
# No data
python calculateDeltaGforUnfoldingEnergyAroundMutation.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/NoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -c MFEstructures -r Unfold_LocalRegion -m ../data/MAPT_SNPs_ToTest.tsv -w 5 -b 93 -p NoDMSdata -g MAPT -a Median

### Unfold 10 bases on each side of the exon-intron boundary (20 bases total) for each sequence (WT and mutations) and calculate the difference in delta G

In [None]:
# Invivo data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/InvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b MFEstructures -r Unfold_20basesAroundSpliceSite -m ../data/MAPT_SNPs_ToTest.tsv -1 84 -2 103 -p DMSdata -g MAPT -a Median

In [None]:
# Exvivo data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/ExvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b MFEstructures -r Unfold_20basesAroundSpliceSite -m ../data/MAPT_SNPs_ToTest.tsv -1 84 -2 103 -p DMSdata -g MAPT -a Median

In [None]:
# No data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/NoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b MFEstructures -r Unfold_20basesAroundSpliceSite -m ../data/MAPT_SNPs_ToTest.tsv -1 84 -2 103 -p NoDMSdata -g MAPT -a Median

### Unfold 12 exonic bases and 31 intronic bases for each sequence (WT and mutations) and calculate the difference in delta G (region length taken from a spliceosome assembly model)

In [None]:
# Invivo data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/InvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b MFEstructures -r Unfold_MaxLengthRNAinSpliceosome -m ../data/MAPT_SNPs_ToTest.tsv -1 82 -2 124 -p DMSdata -g MAPT -a Median -c 93

In [None]:
# Exvivo data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/ExvivoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b MFEstructures -r Unfold_MaxLengthRNAinSpliceosome -m ../data/MAPT_SNPs_ToTest.tsv -1 82 -2 124 -p DMSdata -g MAPT -a Median -c 93

In [None]:
# No data
python calculateDeltaGforUnfoldingCoordinatePair.py -f /home/jkumar/Projects/Model_MAPTsplicing/tmp/NoDMSdata_PooledRep_Exon10Intron10_Structures/ -t SNPs -b MFEstructures -r Unfold_MaxLengthRNAinSpliceosome -m ../data/MAPT_SNPs_ToTest.tsv -1 82 -2 124 -p NoDMSdata -g MAPT -a Median -c 93

### Calculate the difference in delta G between WT and MUT for all regions of interest

In [None]:
# The in vivo, ex vivo, and no-data calculations are all handled by a single script
python calculateDiffDeltaGforUnfoldingBetweenWTandNotWT.py StructuralEnsemble SNPs_ToTest
python calculateDiffDeltaGforUnfoldingBetweenWTandNotWT.py MFEstructures SNPs_ToTest
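The per-region comparison reduces to a subtraction. A minimal sketch, assuming the difference is reported as MUT minus WT (the script's sign convention may differ) and using made-up unfolding energies; under that convention a negative value means the region is easier to unfold in the mutant. `ddg_vs_wt` is hypothetical:

```python
def ddg_vs_wt(unfolding_dg: dict[str, dict[str, float]], wt_key: str = "WT") -> dict[str, dict[str, float]]:
    """Per region, subtract the WT unfolding energy from every non-WT sequence (MUT - WT)."""
    result = {}
    for region, per_sequence in unfolding_dg.items():
        wt_value = per_sequence[wt_key]
        result[region] = {name: value - wt_value
                          for name, value in per_sequence.items() if name != wt_key}
    return result


# Made-up unfolding energies (kcal/mol)
example = {
    "Unfold_SpliceSite": {"WT": 6.0, "MUT_A": 4.5},
    "Unfold_LocalRegion": {"WT": 3.0, "MUT_A": 3.75},
}
print(ddg_vs_wt(example))
# {'Unfold_SpliceSite': {'MUT_A': -1.5}, 'Unfold_LocalRegion': {'MUT_A': 0.75}}
```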

## Feature 2: 5' Splice Site Strength
* The splice site is defined as the 3 bases upstream (exonic) and 6 bases downstream (intronic) of the exon-intron boundary.

In [None]:
# Score the 5' splice site of the WT and each mutant sequence
python calculate5pSpliceSiteStrength.py -t /home/jkumar/Projects/Model_MAPTsplicing/tmp/ -w ../data/MAPT_exon10intron10withDMSdata.fa -b 93 -m ../data/MAPT_SNPs_ToTest.tsv

In [None]:
# Calculate the difference in strength between WT and each mutant
python calculateDiffInStrengthforMotifsBetweenWTandNotWT.py SNPs_ToTest
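Because the 5' splice site is a fixed 9-base window (3 exonic + 6 intronic bases), only mutations inside that window can change its sequence, and therefore its strength score. A minimal sketch with hypothetical helpers and a toy sequence (not MAPT), taking the boundary as the 1-based last exonic base:

```python
def apply_snp(sequence: str, position: int, ref: str, alt: str) -> str:
    """Return `sequence` with the base at 1-based `position` changed from `ref` to `alt`."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} lies outside the sequence")
    found = sequence[position - 1]
    if found != ref:
        raise ValueError(f"expected {ref} at position {position}, found {found}")
    return sequence[: position - 1] + alt + sequence[position:]


def five_prime_ss(sequence: str, boundary: int) -> str:
    """9-mer of 3 exonic + 6 intronic bases; `boundary` is the 1-based last exonic base."""
    return sequence[boundary - 3 : boundary + 6]


wt = "CAGGTAAGTCC"  # toy sequence; the exon ends at position 3
mut_inside = apply_snp(wt, 5, "T", "C")    # inside the 9-mer (positions 1-9)
mut_outside = apply_snp(wt, 11, "C", "A")  # outside the 9-mer
print(five_prime_ss(wt, 3))           # CAGGTAAGT
print(five_prime_ss(mut_inside, 3))   # CAGGCAAGT
print(five_prime_ss(mut_outside, 3))  # CAGGTAAGT (unchanged)
```

For the MAPT construct, `five_prime_ss(sequence, 93)` would return positions 91-99 under the same assumption.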

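## Feature 3: SRE Strength
* Calculate changes in the strength of splicing regulatory elements (SREs): exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers (ISEs), and intronic splicing silencers (ISSs).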

In [None]:
# Locate SREs and calculate their PWM-based strength for all categories
python getLocationAndStrengthOfSREbasedOnPWM.py -w ../tmp/ -t Muts -f ../data/MAPT_exon10intron10withDMSdata.fa -b 93 -m ../data/MAPT_SNPs_ToTest.tsv
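The script name indicates that SRE locations and strengths come from position weight matrices (PWMs). A common way to do this is to slide the matrix along the sequence and score each window as a log-odds sum against a background. The sketch does that with a made-up 3-column matrix (not a real SRE motif) and a uniform background; the real matrices, thresholds, and scoring used by `getLocationAndStrengthOfSREbasedOnPWM.py` may differ, and `log_odds` / `scan` are hypothetical:

```python
import math

# Toy 3-column PWM (probability per base per column); not a real SRE motif
TOY_PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]


def log_odds(kmer: str, pwm: list[dict[str, float]], background: float = 0.25) -> float:
    """Sum over columns of log2(P(base | column) / background).

    Column probabilities must be > 0 (real matrices usually add pseudocounts).
    """
    return sum(math.log2(column[base] / background) for base, column in zip(kmer, pwm))


def scan(sequence: str, pwm: list[dict[str, float]], threshold: float) -> list[tuple[int, str, float]]:
    """Return (1-based start, k-mer, score) for every window scoring at or above `threshold`."""
    k = len(pwm)
    hits = []
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start : start + k]
        score = log_odds(kmer, pwm)
        if score >= threshold:
            hits.append((start + 1, kmer, score))
    return hits


hits = scan("TTGAAGAACC", TOY_PWM, threshold=4.0)
for start, kmer, score in hits:
    print(start, kmer, round(score, 2))
# 3 GAA 4.46
# 6 GAA 4.46
```

Scoring WT and mutant sequences with the same matrix and comparing overlapping hits is one way to obtain a per-motif strength difference.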

In [None]:
# For every SRE cluster, calculate the difference in strength between WT and MUT
python calculateSREstrengthDifferenceBetweenWTandMUT_perCluster.py -w ../tmp/ -t Muts -m ../data/MAPT_SNPs_ToTest.tsv

In [None]:
# Calculate the average difference in strength across all clusters per SRE category
python calculateDiffInStrengthforMotifsBetweenWTandNotWT.py SNPs_ToTest
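Averaging per category is a group-by followed by a mean. A minimal sketch, assuming per-cluster differences are available as `(category, cluster, difference)` rows; the row layout, cluster IDs, and values are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def mean_diff_per_category(rows: list[tuple[str, str, float]]) -> dict[str, float]:
    """Average the WT-vs-MUT strength difference over clusters, grouped by SRE category."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for category, _cluster, diff in rows:
        grouped[category].append(diff)
    return {category: mean(values) for category, values in grouped.items()}


rows = [("ESE", "c1", -0.5), ("ESE", "c2", -1.5), ("ISS", "c7", 0.25)]
print(mean_diff_per_category(rows))  # {'ESE': -1.0, 'ISS': 0.25}
```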