forked from jspenger/ACME

Implementation of the ACME / CAST motifs extraction algorithm, i.e. finding repeating patterns in sequences such as strings.

AnasGhareib/ACME

ACME

This is a reimplementation of the serial (not the parallel) version of the ACME / CAST motif extraction algorithm, not restricted to left- or right-maximal motifs, as proposed in:

Sahli, Majed, Essam Mansour, and Panos Kalnis. "ACME: A scalable parallel system for extracting frequent patterns from a very long sequence." The VLDB Journal 23.6 (2014): 871-893.

Prerequisites

  • SeqAn - The Library for Sequence Analysis (tested with v. 2.3.2)
  • Boost
  • CMake

Installing

cd build;
cmake -DCMAKE_BUILD_TYPE=Release ..;
make;
cd ..;

or

sh build.sh;

Testing

Run the test script:

sh test.sh;

Examples

Find and print all approximate motifs of an input file (here test/test_sequence.txt) that have a minimum frequency of 2 (at least 2 occurrences in the input) and a maximum distance of 1 (each approximate match is at most Hamming distance 1 from the motif). Each result is printed as motif : frequency : [list of occurrences].

./build/ACME -i test/test_sequence.txt -f 2 -d 1
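To illustrate what the frequency and distance parameters mean, here is a minimal brute-force sketch in C++. It is not the ACME / CAST algorithm and is not taken from this repository; the function names `hamming` and `occurrences` are hypothetical and chosen only for this example. It scans every window of the sequence and reports the start positions whose Hamming distance to a given motif is within the allowed maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hamming distance: number of positions at which two equal-length
// strings differ. (Hypothetical helper, not part of ACME.)
std::size_t hamming(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    std::size_t d = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) ++d;
    }
    return d;
}

// Start positions (0-based) of all windows of `seq` that are within
// Hamming distance `max_dist` of `motif`. Brute force, for illustration
// only. (Hypothetical helper, not part of ACME.)
std::vector<std::size_t> occurrences(const std::string& seq,
                                     const std::string& motif,
                                     std::size_t max_dist) {
    std::vector<std::size_t> pos;
    if (motif.empty() || motif.size() > seq.size()) return pos;
    for (std::size_t i = 0; i + motif.size() <= seq.size(); ++i) {
        if (hamming(seq.substr(i, motif.size()), motif) <= max_dist) {
            pos.push_back(i);
        }
    }
    return pos;
}
```

Under these assumptions, a motif would count as frequent for `-f 2 -d 1` when `occurrences(seq, motif, 1).size() >= 2`. For example, in the sequence ACGTACCT the motif ACGT matches exactly at position 0 and with one mismatch at position 4.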

For more information:

./build/ACME -h

Author

  • Jonas Spenger

License

  • This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • The software was developed as part of a Bachelor's thesis at Humboldt University of Berlin in 2017.
