RHseq-scripts

RH-seq Pipeline

1. Cut the 3' adapter (AGGCGCGCCACAGTACGAGT) off the end of each read:
   - fastx_clipper -a AGGCGCGCCACAGTACGAGT -i input.fq -o output_no_adapter.fastq
2. Trim the first 6 bp of each read so that every read begins with its barcode:
   - fastx_trimmer -f 7 -i input_no_adapter.fastq -o no_adapter_trimmed.fastq
3. Separate reads by barcode (index):
   - cat /Users/anna/no_adapter_trimmed.fastq | fastx_barcode_splitter.pl --bcfile /Users/anna/Documents/buck/sequences/barcodes.txt --bol --mismatches 0 --prefix /Users/anna/ --suffix ".fq"
4. Map reads with map_and_pool_elegans.py:
   - conda activate py2
   - python map_and_pool_elegans.py /directory/BC.fq /output_directory/
5. Get normalized log ratios with log_ratios.py.
6. Combine replicates with 5_combine_replicates.py.
7. Remove quotes with:
   - sed 's/"//g' input.txt > output.txt
8. Remove the n2/ed3077 prefix before WBGene with (for the ed3077 files, substitute ed3077 for n2):
   - sed 's/n2//g' input.txt > output.txt
9. Filter to the genes that ED and N2 have in common with prepare_wilcoxon_dfs.py.
10. Get the longest insertion count for each file; that count + 1 is the value to enter in the R script. The output is sorted by count, so the longest is on the last line:
   - awk -F '[,]' '{print $1, NF-1}' /Users/anna/Documents/buck/rhseq/ed_combined_reps_n2_overlap.txt | sort -k2 -n
11. In wilcox_matrix.R, replace the 71 in col.names=paste0("V",seq_len(71)) with the longest insertion count + 1 from step 10, for ed and n2 respectively, then run wilcox_matrix.R.
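
For reference, the quote removal, strain-prefix removal, and insertion-count steps above can be sketched in Python. This is only an illustration and not part of the repository: the function names and the example rows are hypothetical, and it assumes each line holds a gene ID followed by one comma-separated value per insertion.

```python
def clean_line(line, strain_prefix):
    """Drop double quotes and the strain prefix, like the two sed steps."""
    return line.replace('"', "").replace(strain_prefix, "")


def longest_insertion_count(lines):
    """Largest number of comma-separated fields after the gene ID (awk's NF-1)."""
    counts = [len(line.rstrip("\n").split(",")) - 1 for line in lines if line.strip()]
    return max(counts)


# Hypothetical rows (assumed layout): gene ID, then one value per insertion.
rows = [
    '"n2WBGene00000001",0.5,-1.2,0.3',
    '"n2WBGene00000002",1.1',
]

cleaned = [clean_line(row, "n2") for row in rows]

# Longest insertion count + 1 is the number to put inside seq_len() in wilcox_matrix.R.
n_columns = longest_insertion_count(cleaned) + 1

print(cleaned[0])
print(n_columns)
```

To use it on a real file, read the lines of the combined-replicates file for one strain and pass them in place of `rows`, using "ed3077" as the prefix for the ED files.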

