GitHub - Meghana-rao/Document-Classifier

Document-Classifier

Required libraries: pdf2image, PyPDF2, numpy, sklearn, cv2, and tensorflow. The os, subprocess, and csv modules are also used, but they ship with Python's standard library.
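A quick way to confirm the environment is ready is to check that each module can be located before running any script. The sketch below is not part of the repository; `REQUIRED` and `missing_modules` are hypothetical names used only for illustration.

```python
import importlib.util

# Import names taken from the list above (assumed to match how the scripts import them).
# os, subprocess, and csv are part of the standard library and need no check.
REQUIRED = ["pdf2image", "PyPDF2", "numpy", "sklearn", "cv2", "tensorflow"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Note that some import names differ from their pip package names: sklearn is installed as scikit-learn and cv2 as opencv-python.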

Files to download

retrained_graph.pb
retrained_labels.txt
count_ops.py
evaluate.py
graph_pb2tb.py
__init__.py
label_image.py
retrain.py
quantize_graph.py
show_image.py
net_tester.py

To retrain the model

python scripts/retrain.py --image_dir=./tf_files/train --output_graph=tf_files/retrained_graph.pb --output_labels=tf_files/retrained_labels.txt --how_many_training_steps=1000

Adjust --how_many_training_steps to suit your dataset.

To test it on unlabelled data (single file)

python3 scripts/label_image.py --graph=tf_files/retrained_graph.pb --image=test.png
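The command above labels one image at a time. To label a whole folder, it can be wrapped in a small loop. This is a minimal sketch, not a script from the repository: `label_command` and `label_folder` are hypothetical helpers, and it assumes the command is run from the repository root with PNG inputs.

```python
import subprocess
from pathlib import Path


def label_command(image_path, graph="tf_files/retrained_graph.pb"):
    """Build the label_image.py invocation shown above for a single image."""
    return [
        "python3",
        "scripts/label_image.py",
        f"--graph={graph}",
        f"--image={image_path}",
    ]


def label_folder(folder, pattern="*.png"):
    """Run the classifier on every matching image and collect its raw text output."""
    results = {}
    for image_path in sorted(Path(folder).glob(pattern)):
        completed = subprocess.run(
            label_command(image_path),
            capture_output=True,  # keep stdout so it can be stored per file
            text=True,
            check=True,           # raise if label_image.py exits with an error
        )
        results[image_path.name] = completed.stdout
    return results
```

Calling `label_folder("some_folder")` would return a dictionary mapping each file name to whatever label_image.py printed for it; parsing that output is left out because its exact format is not described here.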

To test it on labelled data and compute statistics (requires a .csv file listing each file name and its label)

python3 tester.py

Adjust the src_label, src_graph, folder_name, and csv_file paths to match your file locations.

tester.py outputs the classification report and the confusion matrix.
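To make these two outputs concrete, the sketch below computes a confusion matrix and per-class precision and recall in plain Python. It is not the code in tester.py, which presumably relies on sklearn. The function names, the assumed CSV layout (two columns, file name then label, no header row), and the class names in the usage example are all hypothetical.

```python
import csv
from collections import Counter


def read_labels(csv_file):
    """Read (file name, label) rows into a dict; assumes two columns and no header."""
    with open(csv_file, newline="") as handle:
        return {row[0]: row[1] for row in csv.reader(handle) if len(row) >= 2}


def build_confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def per_class_report(matrix, classes):
    """Compute precision and recall for each class from the confusion matrix."""
    report = {}
    for i, name in enumerate(classes):
        true_positive = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        report[name] = {
            "precision": true_positive / predicted if predicted else 0.0,
            "recall": true_positive / actual if actual else 0.0,
        }
    return report


# Usage with made-up labels (class names are illustrative only).
truth = ["invoice", "invoice", "letter", "letter"]
preds = ["invoice", "letter", "letter", "letter"]
classes = ["invoice", "letter"]
matrix = build_confusion_matrix(truth, preds, classes)
report = per_class_report(matrix, classes)
```

The row and column orientation (true classes as rows, predicted classes as columns) matches the convention used by sklearn.metrics.confusion_matrix, so the numbers can be compared directly with the sklearn output.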

