Useful tricks and tips for Galaxy users

"You must feel the Force around you." Yoda

Text processing

  • Convert comma separated files into tab-separated files
    Convert delimiters to TAB
  • FASTA files with unique sequences
    FASTA-to-Tabular → Unique occurrences of each record (advanced parameters) → Tabular-to-FASTA
  • Remove sequences with N or any other character
    FASTA-to-Tabular → Filter data on any column using simple expressions with
    (condition: c2.find('N') == -1, which keeps only sequences without N) → Tabular-to-FASTA
  • Extracting the 3rd column from a 5 column file
    Cut columns from a table with c3
  • Reorder columns or column swap
    Cut columns from a table with c3,c2,c1
  • Count how often one entry appears in column 1
    Datamash with Group by fields: 1 and Operation to perform: count
  • Remove all lines that contain a character (comma in this case)
    Text transformation with sed with SED Program: /,/d
  • Group all rows where column 1, 4 and 5 are identical
    Datamash with Group by fields: 1,4,5
  • Column-to-rows and rows-to-columns (transpose matrix)
    Transpose rows/columns
  • Make your files smaller, e.g. for testing; subsampling of files
    Select random lines from a file
  • Make your sequence files smaller, e.g. for testing; subsampling sequences
    Sub-sample sequences files
  • Merge two files together according to one column in every file
    Join two files
  • Add unique column
    Add column to an existing dataset with iterate: Yes
  • Get rid of all rows where column 2 has values greater than 0
    Filter data on any column using simple expressions with c2<=0 (the filter keeps the rows that match the condition)
  • Get all rows where column 4 starts with hsa
    Filter data on any column using simple expressions with c4.startswith('hsa')
  • Get rid of all rows where the sum of column 2 and 3 is greater than 10
    Filter data on any column using simple expressions with c2+c3<=10 (the filter keeps the rows that match the condition)
  • Get rid of all rows where the length of my text in column 2 is greater than 10
    Filter data on any column using simple expressions with len(c2)<=10 (the filter keeps the rows that match the condition)
  • Create new rows for every comma separated value in column 3; Unfolding
    Unfold columns from a table with Column 3 and Comma
  • Split the first four characters of a line into their own column
    Replace Text in entire line with Find Pattern: ^(.{4}) and Replace Pattern: &\t
  • Add the base pairs "TA" to the end of each sequence
    FASTA to Tabular → Add column with TA → Merge Columns → Cut columns → Tabular to FASTA
  • Add a quotation mark to every row
    Compute an expression on every row with chr(34) (34 is the ASCII code for ")
  • Count the columns whose value is not 0. Useful if you want to calculate the mean but exclude all columns that are 0.
    Compute an expression on every row with bool(c1) + bool(c2) + bool(c3) ...
  • Calculate log2 (not log10) from a column (e.g. c1), adding a new column
    Compute an expression on every row with log(c1,2)
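
The conditions and expressions used with the Filter and Compute tools above are plain Python, with c1, c2, ... standing for the column values of one row. The following is a minimal sketch of how such expressions behave outside Galaxy, assuming tab-separated input. The helper keep_rows and the sample rows are made up for illustration and are not part of Galaxy.

```python
import math


def keep_rows(lines, condition):
    """Keep the tab-separated lines for which condition(columns) is true.

    Hypothetical helper for illustration; not Galaxy's implementation.
    """
    kept = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if condition(cols):
            kept.append(line)
    return kept


# Made-up example rows: name, sequence, count, identifier
rows = [
    "seq1\tACGT\t3\thsa-miR-21",
    "seq2\tACNT\t0\thsa-miR-1",
    "seq3\tGGCC\t12\tmmu-let-7a",
]

# c2.find('N') == -1 : keep rows whose column 2 contains no N
no_n = keep_rows(rows, lambda c: c[1].find("N") == -1)

# c4.startswith('hsa') : keep rows whose column 4 starts with hsa
hsa = keep_rows(rows, lambda c: c[3].startswith("hsa"))

# c3 > 0 : keep rows whose column 3, read as a number, is positive
positive = keep_rows(rows, lambda c: float(c[2]) > 0)

# Compute-style expressions on one made-up numeric row (c1, c2, c3)
c1, c2, c3 = 4.0, 0.0, 8.0
nonzero_columns = bool(c1) + bool(c2) + bool(c3)  # each True counts as 1
log2_c3 = math.log(c3, 2)  # logarithm to base 2
```

Rows that make the condition true are the ones that remain, so to discard rows the condition has to describe what you want to keep.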

HTS

  • Map RNA-seq data
    HISAT or TopHat
  • Map DNA-seq data
    Bowtie or BWA
  • Map methylC-seq data
    Bismark
  • Downsample BAM/SAM files
    BAM/SAM Mapping Stats gives you the number of reads or read pairs in your BAM file, in case you don't know it already. Divide the number of reads you want to keep by the number of reads you have, and use this fraction as the probability in Picard – Downsample SAM/BAM.
  • Get all genes that are covered by reads
    htseq-count with a gene annotation GTF file on your BAM file → Filter data on any column using simple expressions with c2>0
  • Extract sequences from interval files, such as GFF, BED or GTF, returning a FASTA file
    Extract Genomic DNA using coordinates from assembled/unassembled genomes
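
The probability for the downsampling tip above is a single division. A small sketch, assuming you already know both read counts; the function name and the numbers are hypothetical and only illustrate the arithmetic.

```python
def downsample_probability(target_reads, total_reads):
    """Fraction of reads to keep so that roughly target_reads remain."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Never ask for more reads than the file contains.
    return min(1.0, target_reads / total_reads)


# Made-up example: keep about 5 million of 20 million reads
probability = downsample_probability(5_000_000, 20_000_000)
```

Because each read is kept at random with this probability, the resulting file contains approximately, not exactly, the requested number of reads.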

Workflows

More Resources

Disclaimer

All tools mentioned here are available from the Galaxy Tool Shed. Kindly ask your Galaxy Administrator to get access to them.