atthakorn/search-engine

Search Engine in Go

This is an experimental project that uses go-colly and blevesearch to build a search engine in Go.

Prerequisites

To use the packages in blevex (blevesearch's extensions), install the C dependencies with the following steps.

These steps were verified on Debian GNU/Linux 8.11 (jessie).

$ sudo apt-get install libleveldb-dev libstemmer-dev libicu-dev build-essential
$ cd ~
$ git clone https://github.com/blevesearch/cld2.git
$ cd cld2/internal/
$ ./compile_libs.sh
$ sudo cp *.so /usr/local/lib

The ICU analyzer and tokenizer require only libicu-dev. If that is all you need, install just that package:

$ sudo apt-get install libicu-dev

Installation

Use git clone or go get to download the project into your Go workspace under $GOPATH, then run dep ensure to install its dependencies.

$ go get github.com/atthakorn/search-engine
$ cd $GOPATH/src/github.com/atthakorn/search-engine
$ dep ensure

Config

These are the parameters you can set in .config.yml

# sites to crawl
entryPoint:
- https://en.wikipedia.org/wiki/Main_Page


# max depth for crawler to follow
maxDepth: 2

# max number of workers
parallelism: 1

# random delay (seconds)
delay: 1


# path to store scraped data from crawler
dataPath: "data/crawl"

# path to store indexed data
indexPath: "data/index"


# http address; 0.0.0.0:8080 or :8080 listens on all IPv4 addresses of the local machine
httpAddress: ":8080"
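
As a rough illustration of how these parameters fit together, here is a minimal sketch that mirrors the keys above in a Go struct and checks them for obvious mistakes. The Config type, its field names, and the Validate method are hypothetical and are not taken from this project; the rules (for example, rejecting a negative maxDepth) are assumptions. It uses only the standard library and does not parse YAML.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// Config mirrors the keys shown in .config.yml.
// Hypothetical type: the project's own struct may look different.
type Config struct {
	EntryPoint  []string // sites to crawl
	MaxDepth    int      // max depth for the crawler to follow
	Parallelism int      // max number of workers
	Delay       int      // random delay, in seconds
	DataPath    string   // where the crawler stores scraped data
	IndexPath   string   // where indexed data is stored
	HTTPAddress string   // listen address such as ":8080"
}

// Validate returns the first problem found, or nil if the values look sane.
// The rules below are assumptions made for this sketch.
func (c Config) Validate() error {
	if len(c.EntryPoint) == 0 {
		return errors.New("entryPoint must list at least one URL")
	}
	for _, raw := range c.EntryPoint {
		u, err := url.Parse(raw)
		if err != nil || u.Scheme == "" || u.Host == "" {
			return fmt.Errorf("entryPoint %q is not an absolute URL", raw)
		}
	}
	if c.MaxDepth < 0 {
		return errors.New("maxDepth must not be negative")
	}
	if c.Parallelism < 1 {
		return errors.New("parallelism must be at least 1")
	}
	if c.Delay < 0 {
		return errors.New("delay must not be negative")
	}
	if c.DataPath == "" || c.IndexPath == "" {
		return errors.New("dataPath and indexPath must be set")
	}
	// ":8080" splits into an empty host and port "8080";
	// a bare "8080" has no colon and is rejected.
	if _, _, err := net.SplitHostPort(c.HTTPAddress); err != nil {
		return fmt.Errorf("httpAddress: %v", err)
	}
	return nil
}

func main() {
	cfg := Config{
		EntryPoint:  []string{"https://en.wikipedia.org/wiki/Main_Page"},
		MaxDepth:    2,
		Parallelism: 1,
		Delay:       1,
		DataPath:    "data/crawl",
		IndexPath:   "data/index",
		HTTPAddress: ":8080",
	}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config looks valid")
}
```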

Crawling & Indexing

To crawl the configured websites, cd to the project root and run

$ go run cmd/crawl/main.go

Once the crawl completes, the scraped data is stored in ./data/crawl/*.json

You can now index the scraped data with the following command

$ go run cmd/index/main.go

The index is stored using boltdb; its files are located under ./data/index/*

Start Server

Search Engine comes with a simple search application. Start the HTTP server with the following command

$ go run cmd/http/main.go

Then open the application in a browser at http://localhost:8080

License

This project is licensed under the MIT License.
