
We use AI to tackle the problem of reading through long PDFs to find answers to specific questions.


abhishekbhatt209/Hackathon2021


BotRead

This project aims to ease the heavy job of searching through large volumes of books, images, and reference materials for a specific answer or reference.

Using the cdQA (Closed Domain Question Answering) architecture, we can scan multiple sources at once and apply NLP to find both the most relevant document and the most relevant answer to the query.

Users can upload data as images, text documents, and PDFs, which are then converted to text. TF-IDF features based on uni-grams and bi-grams are computed, and the cosine similarity between the query and each document in the database is used to select the most relevant document. We then run BERT over the paragraphs of that document together with the query to find the span of text most likely to be the desired answer.
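The document-selection step can be illustrated with a minimal, standard-library-only sketch. It is not the code used in this repository or in cdQA; the function names (`ngrams`, `fit_idf`, `vectorize`, `cosine`, `most_relevant`), the smoothed IDF formula, and the sample documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def ngrams(text):
    """Return the lower-cased uni-grams and bi-grams of a text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every n-gram."""
    n = len(documents)
    doc_freq = Counter(term for d in documents for term in set(ngrams(d)))
    return {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}


def vectorize(text, idf):
    """Build a sparse TF-IDF vector (a dict); unseen n-grams are ignored."""
    counts = Counter(ngrams(text))
    return {term: tf * idf[term] for term, tf in counts.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_relevant(query, documents):
    """Return (index, score) of the document most similar to the query."""
    idf = fit_idf(documents)
    query_vec = vectorize(query, idf)
    scores = [cosine(query_vec, vectorize(d, idf)) for d in documents]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical sample documents, standing in for the converted uploads.
    documents = [
        "Tesseract is an OCR engine that extracts text from images.",
        "Django is a web framework for building sites in Python.",
        "BERT is a language model used for question answering.",
    ]
    index, score = most_relevant("Which model is used for question answering?", documents)
    print(index, round(score, 3))
```

In the pipeline described above, the paragraphs of the selected document would then be passed, together with the query, to the BERT reader; that step is not shown here.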

Packages used

  • torch==1.7.1
  • django
  • cdqa
  • transformers==2.0
  • pydub
  • BERT_Squad1.1 [Model]
  • pytesseract
  • Pillow (imported as PIL)
  • fpdf
  • tika
  • markdown
  • prettytable
  • wget

Installation

NOTE : Tested on Python 3.6.8

pip install torch==1.7.1
pip install django
pip install cdqa
pip install transformers==2.0
pip install pydub
# BERT_Squad1.1 is a pretrained model, not a pip package; obtain it separately
pip install pytesseract
pip install Pillow
pip install fpdf
pip install tika
pip install markdown
pip install prettytable
pip install wget

git clone https://github.com/abhishekbhatt209/Hackathon2021.git
cd Hackathon2021

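Additional software: Tesseract OCR (the engine that pytesseract wraps; it must be installed separately and be available on the PATH)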

Usage

python manage.py runserver

Contributors

👤 Abhishek Bhatt - Ganpat University - @abhishekbhatt209

👤 Kunal Malvi - Ganpat University - @kunalmalvi18

👤 Anil Prajapati - Ganpat University - @anilprajapati22

👤 Viswash Mehta - Ahmedabad Institute of Technology - @ViswashMehta

👤 Sunny Kushwaha - Ahmedabad Institute of Technology - @Ares358

License

MIT
