### Code from "Learning the Sequence Determinants of Alternative Splicing from Millions of Random Sequences"
To make my results reproducible and to aid others in their own data analysis, I've included all of the code used in the paper. I hope it is useful. Feel free to contact me with suggestions, comments, or questions.
If you want to work with the raw FASTQ files, you can download them from SRA. The links are on the GEO page. Put the files into a folder called fastq/ and everything should work.
If you just want to work with the processed isoform counts, unzip all of the files in the data_gz/ folder and change the name of the folder to data/.
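The unzip step can also be scripted. The following is a minimal sketch, assuming the files in data_gz/ are gzip-compressed and end in `.gz` (the archive format is an assumption, not something stated here). The helper name `decompress_folder` is hypothetical. Instead of renaming the folder, it writes the decompressed copies straight into data/ and leaves data_gz/ untouched.

```python
import gzip
import shutil
from pathlib import Path


def decompress_folder(src="data_gz", dst="data"):
    """Decompress every *.gz file in src into dst, dropping the .gz suffix.

    Assumes gzip-compressed files; adjust if the archives use another format.
    """
    src_dir, dst_dir = Path(src), Path(dst)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for gz_path in sorted(src_dir.glob("*.gz")):
        # Path.stem strips only the last suffix: "counts.csv.gz" -> "counts.csv"
        out_path = dst_dir / gz_path.stem
        with gzip.open(gz_path, "rb") as fin, open(out_path, "wb") as fout:
            shutil.copyfileobj(fin, fout)  # stream, so large files fit in memory
        written.append(out_path)
    return written
```

Calling `decompress_folder()` from the repository root should then leave the processed isoform counts in data/, where the notebooks expect them.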
Some of the intermediate results are very large, so I have not included them. This means you will have to go through each notebook sequentially and generate the results yourself. If you want to go through a specific notebook without running all of the previous ones, please contact me at abros "AT" uw "DOT" edu and I can get you the required intermediate files.
The plasmid files from the paper are available in the plasmids/ directory.
- Notebook 0: Download SRA Files and Convert to Fastq
- Notebook 0A: Fastq to Isoform Counts (Alt. 3SS)
- Notebook 0B: Fastq to Isoform Counts (Alt. 5SS)
- Notebook 1: Library Statistics
- Notebook 2: Splice Site Analysis
- Notebook 3: Splicing Frame and Nonsense Mediated Decay Analysis
- Notebook 4: Estimating Motif Effects
- Notebook 5: Combinatorial Motif Effects
- Notebook 6: Learning Curves for Models Predicting Alt. 5SS Usage
- Notebook 7: A Model of Alternative 5′ Splicing
- Notebook 8: Predictions on Alt. 5SS Events in the Genome
- Notebook 9: Training a Joint Model with Both the Alt. 5SS and Alt. 3SS Libraries
- Notebook 10: Predicting the Effects of SNPs on Alternative 5′ Splicing
- Notebook 11: Predicting the Effects of SNPs on Exon Skipping