Skip to content

adekunle-oOo/rummage

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Rummage

Rummage is an open, collaborative, and decentralised search mechanism for IPFS. A rummage node is able to crawl content on IPFS and add this to the index, which itself is stored in a decentralised manner on IPFS.

Rummage currenly supports parsing PDF's and any HTML files stored on IPFS.

How Rummage Works

Rummage divides the crawled data into two levels of indices.

  • High Level Index (HLI)
  • Key Word Index (KWI)

High Level Index stores the CID for the KWI of each keyword, then making this a highly indexable mesh. This data is constantly updated when a new craw is submitted to the network

We allow for two specific actions on the client

  • Search
  • Crawl

Search will be used to perform sophisticated searches, while Crawl will be used to submit crawl data by clients

Installation

git clone https://github.com/adekunle-oOo/rummage.git

Install dependencies

sudo apt-get install g++
sudo apt-get install autoconf automake libtool
sudo apt-get install autoconf-archive
sudo apt-get install pkg-config
sudo apt-get install libpng-dev
sudo apt-get install libjpeg8-dev
sudo apt-get install libtiff5-dev
sudo apt-get install zlib1g-dev
wget http://www.leptonica.org/source/leptonica-1.81.1.tar.gz
sudo tar xf leptonica-1.81.1.tar.gz
cd leptonica-1.81.1 &&\
sudo ./configure &&\
sudo apt install make
sudo make &&\
sudo make install
sudo apt-get install tesseract-ocr # or sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

Install Packages

go get -t github.com/otiai10/gosseract
go get github.com/ipfs/go-ipfs-api
go get github.com/wealdtech/go-ens/v3
go get github.com/otiai10/gosseract/v2

Build

sudo go build Rummage/CLI/.

Run (BETA)

./CLI