aviadtamir/pystringmatcher

pystringmatcher

description

A small utility for finding substrings and text patterns in an input file. The tool splits the text of the file into chunks and processes each chunk in a separate process using Python's multiprocessing module.
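The chunk-per-process approach described above can be sketched with the standard library alone. This is an illustrative sketch, not the package's implementation: `split_into_chunks`, `search_chunk` and `find_patterns` are hypothetical names, plain `str.find` stands in for the package's algorithms (such as Rabin-Karp), and patterns are searched line by line, so a match spanning two lines is not reported.

```python
from multiprocessing import Pool


def split_into_chunks(lines, num_lines_per_chunk):
    """Group a list of lines into consecutive chunks of at most num_lines_per_chunk lines."""
    return [
        lines[i:i + num_lines_per_chunk]
        for i in range(0, len(lines), num_lines_per_chunk)
    ]


def search_chunk(args):
    """Return (line_number, char_offset, pattern) for every occurrence of every pattern in one chunk."""
    first_line_number, chunk, patterns = args
    matches = []
    for line_index, line in enumerate(chunk):
        for pattern in patterns:
            start = line.find(pattern)
            while start != -1:
                matches.append((first_line_number + line_index, start, pattern))
                # Restart one character later so overlapping occurrences are found too
                start = line.find(pattern, start + 1)
    return matches


def find_patterns(lines, patterns, num_lines_per_chunk=1000, processes=None):
    """Search every chunk in its own worker process and merge the results."""
    chunks = split_into_chunks(lines, num_lines_per_chunk)
    # Each task carries the 1-based line number of its chunk's first line
    tasks = [
        (chunk_index * num_lines_per_chunk + 1, chunk, patterns)
        for chunk_index, chunk in enumerate(chunks)
    ]
    with Pool(processes=processes) as pool:
        # pool.map returns the per-chunk results in the same order as the tasks
        results = pool.map(search_chunk, tasks)
    return [match for chunk_matches in results for match in chunk_matches]


if __name__ == "__main__":
    sample = ["alpha beta", "gamma", "beta beta", "delta alpha"]
    print(find_patterns(sample, ["alpha", "beta"], num_lines_per_chunk=2))
    # prints [(1, 0, 'alpha'), (1, 6, 'beta'), (3, 0, 'beta'), (3, 5, 'beta'), (4, 6, 'alpha')]
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers with `spawn`, each worker re-imports the main module, and the guard keeps it from creating a new pool.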

installation:

pip install pystringmatcher

usage:

  • using the python module
python -m pystringmatcher -h

Finding text patterns in input text file

optional arguments:
  -h, --help            show this help message and exit
  -f FILE_PATH, --file FILE_PATH
                        the input file to search the patterns in
  -p PATTERNS, --patterns PATTERNS
                        the pattern\s to search in the file separated by ,
  -n NUM_LINES_PER_CHUNK, --num-lines NUM_LINES_PER_CHUNK
                        the number of lines per chunk of text from the input file
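For illustration, options like the ones in this help text could be parsed with `argparse` roughly as follows. This is an assumption, not the package's source: `build_parser` is a hypothetical name, and the `file_path` and `num_lines_per_chunk` destinations are chosen only to match the metavars shown. Patterns are split on commas, as in the program example.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the options shown in the help text above;
    # it is not taken from the pystringmatcher source.
    parser = argparse.ArgumentParser(
        description="Finding text patterns in input text file"
    )
    parser.add_argument("-f", "--file", dest="file_path", metavar="FILE_PATH",
                        help="the input file to search the patterns in")
    parser.add_argument("-p", "--patterns", metavar="PATTERNS",
                        help="the patterns to search in the file, separated by commas")
    parser.add_argument("-n", "--num-lines", dest="num_lines_per_chunk",
                        metavar="NUM_LINES_PER_CHUNK", type=int,
                        help="the number of lines per chunk of text from the input file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-f", "input.txt", "-p", "alpha,beta", "-n", "500"])
    # The comma-separated pattern string becomes a list of patterns
    patterns = args.patterns.split(",")
    print(args.file_path, patterns, args.num_lines_per_chunk)
    # prints input.txt ['alpha', 'beta'] 500
```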
  • or by using the included console script
stringmatcher -h 
  • In your own program
from pystringmatcher.Algorithms import RabinKarp
from pystringmatcher.Types import TextFile

file_path = "/path/to/file"

try:
    text = TextFile(file_path=file_path)
    algorithm = RabinKarp()
    # Split the file into chunks of 1000 lines; each chunk is searched in its own process
    chunks = text.divide_into_chunks(num_of_lines_each_chunk=1000)
    patterns = "alpha,beta,charlie,delta,echo,foxtrot".split(",")
    print(f"[X] - Start finding the patterns : {patterns} in the file: {text}")
    matches = text.find_matches(chunks=chunks, patterns=patterns, algorithm=algorithm)

    if matches:
        print("Found matches")
        print(matches)
    else:
        print("No matches were found")
except FileNotFoundError:
    # Use file_path here: `text` is unbound if the TextFile constructor itself raised
    print(f"The file: {file_path} was not found and may not exist")
  • Implementing your own matching algorithm
from pystringmatcher.Algorithms import Algorithm
from pystringmatcher.Types import Match


class MyAlgorithm(Algorithm):

    def preprocess(self, pattern, text, *args, **kwargs):
        """Optional preprocessing logic goes here, if needed."""

    def run(self, pattern, text, *args, **kwargs):
        """The matching logic goes here.

        For every match found, append Match(char_offset=start_index_of_match)
        to the list that is returned.
        """
        matches = []
        # ... your matching logic, calling matches.append(Match(char_offset=...)) ...
        return matches
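As an example of matching logic that could go into `run`, here is a minimal standalone sketch of the Rabin-Karp technique (the algorithm behind the `RabinKarp` class used earlier). It is not the package's implementation; the function name `rabin_karp` and its `base`/`modulus` parameters are hypothetical choices for this sketch. It hashes the pattern once, rolls a hash over each window of the text, and confirms every hash hit with a direct comparison to rule out collisions.

```python
def rabin_karp(pattern, text, base=256, modulus=1_000_000_007):
    """Return the start offsets of every (possibly overlapping) occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    # This sketch treats an empty or too-long pattern as having no matches
    if m == 0 or m > n:
        return []
    # Weight of the window's leading character, used when rolling the hash
    high = pow(base, m - 1, modulus)
    pattern_hash = window_hash = 0
    for i in range(m):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % modulus
        window_hash = (window_hash * base + ord(text[i])) % modulus
    offsets = []
    for start in range(n - m + 1):
        # Equal hashes may be a collision, so confirm with a direct comparison
        if window_hash == pattern_hash and text[start:start + m] == pattern:
            offsets.append(start)
        if start + m < n:
            # Roll the hash: remove text[start], then append text[start + m]
            window_hash = (window_hash - ord(text[start]) * high) % modulus
            window_hash = (window_hash * base + ord(text[start + m])) % modulus
    return offsets


print(rabin_karp("ana", "banana"))  # prints [1, 3]
```

Assuming `text` is passed to `run` as a plain string, the body of `run` could then be `return [Match(char_offset=offset) for offset in rabin_karp(pattern, text)]`.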

About

A Python package and utility for matching text patterns in a text file using well-known algorithms.