This project uses the Tika Python library to extract text from files in several formats (PDF, DOC, etc.).
Install the tika library on Linux with:
pip install tika
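
The extraction step could look roughly like the sketch below. It is an illustration, not this project's actual code: the helper name extract_text and its parse argument are hypothetical. It assumes tika's parser.from_file, which returns a dictionary whose "content" entry holds the extracted text (or None when nothing could be extracted). The tika library also needs a Java runtime on the machine, since it starts a local Tika server the first time it is used.

```python
def extract_text(path, parse=None):
    """Return the plain text that Tika extracts from the file at `path`.

    `parse` is a hypothetical hook added for this sketch so the function
    can be exercised without a running Tika server; by default it falls
    back to tika's parser.from_file.
    """
    if parse is None:
        # Imported lazily so the module loads even where tika is absent.
        from tika import parser
        parse = parser.from_file

    parsed = parse(path)
    # "content" may be None for files with no extractable text.
    return (parsed.get("content") or "").strip()


# Example usage (needs tika, Java, and a real file):
# text = extract_text("report.pdf")
# print(text[:200])
```

Stripping the result is a choice made for this sketch, because Tika output often starts with several blank lines.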
The project then connects to Elasticsearch, creates an index, and inserts the extracted data into that index.
It uses the elasticsearch Python library to connect to Elasticsearch and perform these operations.
Install the elasticsearch library on Linux with:
pip install elasticsearch
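
The connect, create-index, and insert steps might be sketched as follows. This is a minimal sketch under stated assumptions, not this project's actual code: the helper names (make_client, ensure_index, insert_text), the index name "documents", the field names "filename" and "content", and the URL http://localhost:9200 are all made up for illustration. It also assumes a recent elasticsearch client in which index() accepts a document= argument; older client versions take body= instead.

```python
def make_client(url="http://localhost:9200"):
    """Create an Elasticsearch client (the URL is an assumed local default)."""
    # Imported lazily so the helpers below can be used with any
    # client-like object, even where elasticsearch is not installed.
    from elasticsearch import Elasticsearch
    return Elasticsearch(url)


def ensure_index(es, index_name):
    """Create `index_name` only if it does not exist yet."""
    if not es.indices.exists(index=index_name):
        es.indices.create(index=index_name)


def insert_text(es, index_name, filename, text):
    """Store one extracted document; the field names are hypothetical."""
    doc = {"filename": filename, "content": text}
    # Assumes a client version that accepts document=; older ones use body=.
    return es.index(index=index_name, document=doc)


# Example usage (needs a running Elasticsearch server):
# es = make_client()
# ensure_index(es, "documents")
# insert_text(es, "documents", "report.pdf", "text extracted by Tika")
```

Checking for the index before creating it lets the script run repeatedly without failing on an index that already exists.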