Skip to content
PMS full-text search engine with no external dependencies written in C#
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
Datasets/Simple
Src
.gitattributes
.gitignore
.travis.yml
LICENSE
README.md

README.md

PMS Full-Text Search Engine for .NET Core

License: MIT Travis Status

Full-Text Search Engine with no external dependencies written in C# for .NET Core.

The aim of this project is to showcase algorithms, data structures and techniques that are used to create full-text search engines.

Getting Started

On Windows:

  1. Download and build code. Use the following commands:

    dotnet restore
    dotnet build
  2. Open folder with binaries: bin\Debug\netcoreapp2.0

  3. Start the following command. Replace DATA_PATH with a path to Datasets folder

    run_test.bat DATA_PATH
  4. If everything goes well the following messages are printed:

    Log from index construction:

    dotnet Protsyk.PMS.FullText.ConsoleUtil.dll index --input "F:\Sources\FullTextSearch\Datasets"
    
    PMS Full-Text Search (c) Petro Protsyk 2017-2018
    F:\Sources\FullTextSearch\Datasets\Simple\TestFile001.txt
    F:\Sources\FullTextSearch\Datasets\Simple\TestFile002.txt
    F:\Sources\FullTextSearch\Datasets\Simple\TestFile003.txt
    Indexed documents: 3, time: 00:00:00.1010004
    

    Dump of the index (for each term in the dictionary - the list of all occurrences):

    dotnet Protsyk.PMS.FullText.ConsoleUtil.dll print
    
    PMS Full-Text Search (c) Petro Protsyk 2017-2018
    2017 -> [1,1,9]
    algorithms -> [1,1,19]
    and -> [1,1,20]
    apple -> [3,1,1]
    banana -> [3,1,2]
    build -> [1,1,25]
    c -> [1,1,16]
    data -> [1,1,21]
    demonstrate -> [1,1,18]
    ...
    

    Search with query WORD(pms):

    dotnet Protsyk.PMS.FullText.ConsoleUtil.dll search --query "WORD(pms)"
    
    {filename:"TestFile001.txt", size:"180", created:"2018-04-02T10:09:41.4208444+02:00"}
    {[1,1,1]}
    
    {filename:"TestFile002.txt", size:"29", created:"2018-04-02T10:09:41.4248447+02:00"}
    {[2,1,1]}
    
    Documents found: 2, matches: 2, time: 00:00:00.0564721
    

    Lookup in the dictionary using a pattern i.e. all terms matching pattern:

    dotnet Protsyk.PMS.FullText.ConsoleUtil.dll lookup --pattern "WILD(pet*)"
    petro-mariya-sophie
    Terms found: 1, time: 00:00:00.0704173
    
    dotnet Protsyk.PMS.FullText.ConsoleUtil.dll lookup --pattern "EDIT(projct, 1)"
    project
    Terms found: 1, time: 00:00:00.0847931
    

Query Language

  • WORD(apple) - single word
  • WILD(app*) - wildcard pattern
  • EDIT(apple, 1) - Levenshtein (edit distance, fuzzy search)

Conjunction operators

  • OR - boolean or
  • AND - boolean and
  • SEQ - sequence of words, phrase

Examples of queries:

  • AND(WORD(apple), OR(WILD(a*), EDIT(apple, 1)))
  • SEQ(WORD(hello), WORD(world))

Data Structures

Algorithms

References

Links

Publishing

MacOS

dotnet publish -c Release --self-contained -r osx.10.13-x64

Windows

dotnet publish -c Release --self-contained -r win10-x64
You can’t perform that action at this time.