motif_analysis

Python script for the motif analysis in Cserhati (2019).

This script is run on a whole genome sequence and calculates a score for every possible motif of length n, listing the motifs in lexicographic order. Since there are four bases, there are 4ⁿ possible motifs. The score of a motif is S = (O - E)/(O + E), where O is the observed number of occurrences of the motif and E is its expected number of occurrences given the background base distribution, which the script also determines from the genome. S lies between -1 and 1: negative values indicate under-represented motifs and positive values over-represented ones, with S = -1 for a motif that never occurs, S = 0 when O = E, and S approaching 1 as O greatly exceeds E.
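The enumeration and scoring steps can be sketched in Python as follows. This is a minimal illustration, not the code in motif_analysis.py. It assumes that E is the number of scanned positions multiplied by the product of the background frequencies of the motif's bases (an independent-base model); that assumption agrees with the sample output further down but is not stated explicitly in this README. The function names all_motifs, expected_count and score, and the parameter n_positions, are hypothetical.

```python
from itertools import product
from math import prod

BASES = "ACGT"


def all_motifs(n):
    """Return all 4**n motifs of length n in lexicographic order (A < C < G < T)."""
    return ["".join(p) for p in product(BASES, repeat=n)]


def expected_count(motif, freqs, n_positions):
    """Expected occurrences under an assumed independent-base background model:
    n_positions times the product of the background frequencies of the motif's bases."""
    return n_positions * prod(freqs[base] for base in motif)


def score(observed, expected):
    """S = (O - E) / (O + E): -1 if the motif never occurs, 0 if O == E, near 1 if O >> E."""
    return (observed - expected) / (observed + expected)
```

With the base fractions and genome length from the sample output (taking n_positions to be the genome length, itself an assumption), expected_count("AAAAAAA", ...) comes out at roughly 23815.6, close to the reported Expected value, and score(257119, 23815.5822242) is about 0.8305, the reported Score.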

Run the script as follows:

python motif_analysis.py -i <inputfile> -o <outputfile> -s <species name> -n <motif length>

where the flags are:

-i : the input whole genome sequence, in FASTA format.
-o : the output statistics file, containing the number of contigs, the genome length and the base composition, followed by each motif with its observed and expected occurrences and its score.
-s : the species name.
-n : the motif length (usually between 5 and 12 bp).
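The inputs described by these flags could be wired up roughly as below. This is a hypothetical sketch, not the script's actual argument handling or counting code: it uses argparse with the same four flags, reads FASTA records from any iterable of lines (such as an open file), counts overlapping motifs within each contig so that no motif spans two contigs, and derives the background base fractions. Skipping windows that contain non-ACGT characters such as N, and counting only A, C, G and T toward the base composition, are assumptions.

```python
import argparse
from collections import Counter

ACGT = set("ACGT")


def parse_args(argv=None):
    """Hypothetical CLI mirroring the -i/-o/-s/-n flags described above."""
    parser = argparse.ArgumentParser(description="Motif over/under-representation scores")
    parser.add_argument("-i", dest="inputfile", required=True, help="whole genome FASTA file")
    parser.add_argument("-o", dest="outputfile", required=True, help="output statistics file")
    parser.add_argument("-s", dest="species", required=True, help="species name")
    parser.add_argument("-n", dest="length", type=int, required=True, help="motif length")
    return parser.parse_args(argv)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def count_motifs(sequences, n):
    """Count overlapping length-n motifs within each contig, plus A/C/G/T base counts."""
    motifs, bases = Counter(), Counter()
    for seq in sequences:
        bases.update(b for b in seq if b in ACGT)
        for i in range(len(seq) - n + 1):
            kmer = seq[i:i + n]
            if set(kmer) <= ACGT:  # assumption: skip windows containing N or other symbols
                motifs[kmer] += 1
    return motifs, bases


def base_frequencies(bases):
    """Background distribution: fraction of each base among all counted A/C/G/T."""
    total = sum(bases[b] for b in "ACGT")
    return {b: bases[b] / total for b in "ACGT"}
```

In a driver, one would open args.inputfile, pass the handle to read_fasta, and hand the sequences to count_motifs together with args.length; the number of records gives the contig count. Pure Python scanning is slow on a genome of hundreds of megabases, so this is meant to show the logic rather than the performance of the real script.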

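A sample of the output file (motif length n = 7) follows. Note that the A%, C%, G% and T% columns are given as fractions rather than percentages: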

#Species No. chr. Genome length A% C% G% T%
#GCF_002217835.1_Dobs_1.0_genomic.fna 1935 181868570 0.278804967467 0.221933285975 0.222036923996 0.277224822562
#Motif Observed Expected Score
AAAAAAA 257119 23815.5822242 0.83045460594
AAAAAAC 57298 18957.5905639 0.502788177924
AAAAAAG 51056 18966.4433466 0.458275306026
AAAAAAT 88520 23680.6058956 0.577888092376
AAAAACA 63707 18957.5905639 0.541337096463
AAAAACC 29016 15090.5502374 0.315722941096
AAAAACG 23060 15097.5971964 0.208671493717
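Because every row carries both O and E, the Score column can be checked directly. The helper below is hypothetical (not part of the repository) and assumes a file laid out like the sample above: whitespace-separated fields, with the summary values on the only '#' line that does not start with '#Species' or '#Motif'. It recomputes S = (O - E)/(O + E) for each row and reports the largest discrepancy.

```python
def parse_output(lines):
    """Parse a motif_analysis-style output file into (summary, rows); layout assumed from the sample."""
    summary, rows = None, []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("#"):
            if fields[0] in ("#Species", "#Motif"):
                continue  # column-title lines carry no values
            name, n_contigs, length, *freqs = fields
            summary = {
                "species": name.lstrip("#"),
                "contigs": int(n_contigs),
                "length": int(length),
                "freqs": dict(zip("ACGT", map(float, freqs))),
            }
            continue
        motif, observed, expected, score = fields
        rows.append((motif, int(observed), float(expected), float(score)))
    return summary, rows


def max_score_error(rows):
    """Largest absolute difference between the reported score and (O - E) / (O + E)."""
    return max(abs(s - (o - e) / (o + e)) for _, o, e, s in rows)
```

For the seven sample rows the recomputed scores agree with the reported ones to well within 1e-6.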
