This repository has been archived by the owner on Oct 8, 2019. It is now read-only.

The assignment for the Advanced Programming class on 2019/04/23: a text search engine in C++ using an inverted index and TF-IDF.

Menci/search-engine-oop

Search Engine

Requirements

  • GCC (>= 8) or Clang (>= 6)
  • CMake (>= 3.0)
  • Boost
  • GNU Readline
  • Python3 (>= 3.4)
  • jieba

Build

Replace clang++ with your compiler.

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=clang++ ..
make
# Compiled program will be in `bin/search-engine`

Usage

Build and run the project to get a REPL.

Crawl a website with web-crawler (or another crawler) and put only the HTML documents, with their paths preserved, in a directory. Then build the index database with:

>>> ADD-INDEX documents-directory
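
To give a rough idea of what an inverted index stores, here is a minimal sketch. It is not the project's code: the class name `InvertedIndex`, its methods, and the whitespace tokenization are all assumptions made for illustration (the project lists jieba as a requirement, presumably for word segmentation).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical sketch: maps each term to the documents that contain it,
// together with how many times the term occurs in each document.
class InvertedIndex {
public:
    // Splits `text` on whitespace (an assumption, not the project's
    // tokenizer) and records one occurrence per token for `docId`.
    // Each call is counted as one new document.
    void addDocument(int docId, const std::string& text) {
        std::istringstream stream(text);
        std::string term;
        while (stream >> term) {
            ++postings_[term][docId];
        }
        ++documentCount_;
    }

    // Returns docId -> occurrence count for `term`,
    // or an empty map if the term was never indexed.
    const std::map<int, int>& lookup(const std::string& term) const {
        static const std::map<int, int> empty;
        auto it = postings_.find(term);
        return it == postings_.end() ? empty : it->second;
    }

    int documentCount() const { return documentCount_; }

private:
    std::unordered_map<std::string, std::map<int, int>> postings_;
    int documentCount_ = 0;
};
```

With this layout, answering a query for one term is a single hash lookup followed by a walk over the postings of that term, rather than a scan of every document.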

Then run a search with:

>>> QUERY keywords
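
The README does not say which TF-IDF weighting the project uses, so the following is only a sketch of one common variant: term frequency is the term's share of the document's tokens, and inverse document frequency is `log(N / df)`. The names `tfIdf`, `scoreDocuments`, and `Document` are hypothetical, and documents are assumed to be already tokenized.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// A document is assumed to be a list of tokens (hypothetical representation).
using Document = std::vector<std::string>;

// One common TF-IDF variant: (termCount / docLength) * log(totalDocs / docsWithTerm).
// Returns 0 when any input is non-positive, e.g. the term never occurs.
double tfIdf(int termCount, int docLength, int docsWithTerm, int totalDocs) {
    if (termCount <= 0 || docLength <= 0 || docsWithTerm <= 0 || totalDocs <= 0) {
        return 0.0;
    }
    const double tf = static_cast<double>(termCount) / docLength;
    const double idf = std::log(static_cast<double>(totalDocs) / docsWithTerm);
    return tf * idf;
}

// Scores every document against the query by summing the TF-IDF weight
// of each query term. This brute-force scan is for illustration only.
std::vector<double> scoreDocuments(const std::vector<Document>& docs,
                                   const std::vector<std::string>& query) {
    std::vector<double> scores(docs.size(), 0.0);
    const int totalDocs = static_cast<int>(docs.size());
    for (const std::string& term : query) {
        // Document frequency: how many documents contain the term at all.
        int docsWithTerm = 0;
        for (const Document& doc : docs) {
            if (std::find(doc.begin(), doc.end(), term) != doc.end()) {
                ++docsWithTerm;
            }
        }
        for (std::size_t i = 0; i < docs.size(); ++i) {
            const int termCount = static_cast<int>(
                std::count(docs[i].begin(), docs[i].end(), term));
            scores[i] += tfIdf(termCount, static_cast<int>(docs[i].size()),
                               docsWithTerm, totalDocs);
        }
    }
    return scores;
}
```

Sorting documents by descending score then gives a ranked result list. Note that a term appearing in every document gets an IDF of zero under this variant, so it contributes nothing to the ranking.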

Notices

This project is intended only for studying OOP and the structure of Lucene. It uses no optimized algorithms, so it is very slow. Serialized data is stored in XML format, which takes a long time to load and a lot of space to store, so don't use this project for anything other than study.
