The project is a Python tool that combines natural language processing (NLP) and computer vision techniques to extract textual content from sources such as images, PDFs, and websites, and then summarize it. It uses libraries including NLTK, docx, bs4, cv2, pytesseract, and tika to preprocess the input and generate a concise, relevant summary.
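The README does not describe how the summary is scored. As a rough illustration of extractive summarization, here is a minimal sketch that ranks sentences by the average frequency of their words and keeps the top few in their original order. It deliberately uses only the standard library (a regex sentence splitter and a small, hypothetical stop-word list) instead of NLTK, so it is not the project's own method.

```python
import re
from collections import Counter

# Small, illustrative stop-word list (an assumption; NLTK ships a much larger one).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(text: str) -> list[str]:
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text: str, max_sentences: int = 2) -> str:
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))
    if not freq:
        return " ".join(sentences[:max_sentences])

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # sorted() is stable, so ties keep the earlier sentence first.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

A library-backed version would typically swap the regex splitter and stop-word set for NLTK's tokenizers and stop-word corpus, which require downloading NLTK data first.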
-
Clone the repository and change into its directory:
git clone https://github.com/4bdul4ziz/GraphInsight.git
cd GraphInsight
-
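Install the required packages using pip:
pip install nltk python-docx beautifulsoup4 opencv-python pytesseract tika
Some pip package names differ from the names used in import statements:
- nltk
- docx (pip package: python-docx)
- bs4 (pip package: beautifulsoup4)
- cv2 (pip package: opencv-python)
- pytesseract
- tika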
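To confirm the packages above are installed, a small check like the following can help. It is a sketch, not part of the repository: `missing_packages` and `REQUIRED` are hypothetical names. It relies on `importlib.util.find_spec`, which locates a top-level module without importing it.

```python
import importlib.util

# Import name -> pip package name for the libraries listed above.
REQUIRED = {
    "nltk": "nltk",
    "docx": "python-docx",
    "bs4": "beautifulsoup4",
    "cv2": "opencv-python",
    "pytesseract": "pytesseract",
    "tika": "tika",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Print a ready-to-run install command for whatever is missing.
        print("pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```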
Note: pytesseract is a wrapper around the Tesseract OCR engine, which must be installed separately on your system. Follow the Tesseract installation instructions for your operating system.
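One way to check that the Tesseract executable is reachable is to look it up on PATH. This sketch assumes the binary is named `tesseract`, the usual name on Linux and macOS; on Windows, `shutil.which` also matches `tesseract.exe` through the PATHEXT variable. `find_tesseract` is a hypothetical helper, not part of the project.

```python
import shutil
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the `tesseract` executable on PATH, or None if absent."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract OCR was not found on PATH.")
    else:
        print(f"Tesseract found at {path}")
```

If Tesseract is installed in a directory that is not on PATH, pytesseract can be pointed at it by assigning the full path to `pytesseract.pytesseract.tesseract_cmd` before running OCR.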
-
Run main.py:
python main.py
-
When prompted, enter the path to the input file. Make sure the input files are in the same directory as main.py.
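The README does not say how main.py interprets the path it receives. The sketch below shows one hypothetical way to handle it: bare file names are resolved against a base directory (for example, the folder containing main.py), and the input is classified by URL scheme or file extension so it could be routed to an OCR, PDF, .docx, or web extractor. `detect_source_type`, `resolve_input`, and the extension table are illustrative assumptions.

```python
from pathlib import Path
from typing import Union

# Hypothetical mapping from file extension to the kind of extractor needed.
EXTENSION_TYPES = {
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".tif": "image", ".tiff": "image",
    ".pdf": "pdf",
    ".docx": "docx",
}


def detect_source_type(source: str) -> str:
    """Classify the input as 'website', 'image', 'pdf', 'docx' or 'unknown'."""
    if source.lower().startswith(("http://", "https://")):
        return "website"
    return EXTENSION_TYPES.get(Path(source).suffix.lower(), "unknown")


def resolve_input(source: str, base_dir: Union[str, Path]) -> Path:
    """Resolve a relative file name against base_dir; absolute paths are kept as-is."""
    path = Path(source)
    return path if path.is_absolute() else Path(base_dir) / path
```

Inside a script, the base directory could be obtained with `Path(__file__).resolve().parent`, which would let the program find input files next to main.py even when it is launched from a different working directory.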