This project aims to make it much easier to search through large collections of books, images, and reference materials for a specific answer or reference.
Using the cdQA architecture, we can scan multiple sources at once and apply NLP to find both the most relevant document and the most relevant answer to the query.
Users can upload data as images, text documents, and PDFs, which are then converted to text. TF-IDF features based on uni-grams and bi-grams are extracted, and the cosine similarity between the query and each document in the database is computed to select the most relevant document. We then pass the paragraphs of that document, together with the query, to BERT to find the span of text most likely to be the desired answer.
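The document-selection step described above can be illustrated with a small pure-Python sketch. It is not the retriever that cdQA ships; the function names (`ngrams`, `fit_idf`, `vectorize`, `cosine`, `most_relevant`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration. The BERT answer-extraction step is not shown.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all 1..n_max-grams."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every n-gram."""
    counts = [Counter(ngrams(d)) for d in documents]
    doc_freq = Counter(term for c in counts for term in c)
    n_docs = len(documents)
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def vectorize(text, idf):
    """Sparse TF-IDF vector as a dict; n-grams unseen at fit time are dropped."""
    term_freq = Counter(ngrams(text))
    return {t: c * idf[t] for t, c in term_freq.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_relevant(query, documents):
    """Return (index of the best-matching document, list of all scores)."""
    idf = fit_idf(documents)
    query_vec = vectorize(query, idf)
    scores = [cosine(query_vec, vectorize(d, idf)) for d in documents]
    best = max(range(len(documents)), key=scores.__getitem__)
    return best, scores


# Toy corpus (hypothetical data, for illustration only).
documents = [
    "Django is a web framework for Python",
    "Tesseract is an OCR engine that extracts text from images",
    "BERT is a language model used for question answering",
]
best, scores = most_relevant("which engine extracts text from images", documents)
print(best)  # index 1, the Tesseract document
```

In the real pipeline the selected document's paragraphs would then be handed to the BERT reader together with the query.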
- torch==1.7.1
- django
- cdqa
- transformers==2.0
- pydub
- BERT_Squad1.1 [Model]
- pytesseract
- PIL
- fpdf
- tika
- markdown
- prettytable
- wget
NOTE : Tested on Python 3.6.8
pip install torch==1.7.1
pip install django
pip install cdqa
pip install transformers==2.0
pip install pydub
# BERT_Squad1.1 is a pretrained model, not a pip package; download the model file separately
pip install pytesseract
pip install Pillow
pip install fpdf
pip install tika
pip install markdown
pip install prettytable
pip install wget
git clone https://github.com/abhishekbhatt209/Hackathon2021.git
cd Hackathon2021
Additional software: Tesseract OCR (install it separately; pytesseract is only a wrapper around the Tesseract binary)
python manage.py runserver
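If image uploads fail once the server is running, one thing worth checking is whether the Tesseract binary is actually reachable. The following is a minimal sketch, assuming the executable is named `tesseract`; the helper `find_executable` is a hypothetical name, not part of this project.

```python
import shutil


def find_executable(name="tesseract"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("Tesseract not found on PATH; install it before using OCR.")
    else:
        print(f"Tesseract found at {path}")
```

If the binary is installed somewhere that is not on PATH, pytesseract can be pointed at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path.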
- Django - For Django references
- Unsupervised Question Answering by Cloze Translation
- BERT FineTuning - Reference article
- cdQA - cdQA Architecture reference
- GPT3 for Response - Understanding Emails and Drafting Responses: An Approach Using GPT-3
- PyTesseract - For OCR
- PIL - PIL references
- Pyfpdf - PDF Generation
👤 Abhishek Bhatt - Ganpat University - @abhishekbhatt209
👤 Kunal Malvi - Ganpat University - @kunalmalvi18
👤 Anil Prajapati - Ganpat University - @anilprajapati22
👤 Viswash Mehta - Ahmedabad Institute of Technology - @ViswashMehta
👤 Sunny Kushwaha - Ahmedabad Institute of Technology - @Ares358