# OLT Answer Checker
This code compares a screenshot of your question with the stored screenshots of solutions and finds the closest match, so you don't have to scroll through hundreds of questions to find your answer.


## Prerequisites:
1. Install Tesseract:
    1. Install Tesseract using the Windows installer available at: https://github.com/UB-Mannheim/tesseract/wiki
    2. Note the Tesseract path from the installation. The default installation path at the time of this edit was: C:\Users\USER\AppData\Local\Tesseract-OCR. It may change, so please check the installation path.
    3. `pip install pytesseract`
2. Install Levenshtein:
    1. `pip install python-Levenshtein`
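
Since the installation path can differ between machines, a small helper can look for the executable instead of hard-coding it. The sketch below is only an illustration: `find_tesseract`, its `use_path` parameter, and the `CANDIDATES` list are hypothetical names, and the two folders are examples you should replace with your own installation path.

```python
import os
import shutil

def find_tesseract(candidates, use_path=True):
    """Return the first Tesseract executable found, or None if there is none."""
    # Optionally prefer an executable that is already on PATH.
    if use_path:
        on_path = shutil.which("tesseract")
        if on_path:
            return on_path
    # Otherwise try each candidate install folder in order.
    for folder in candidates:
        for name in ("tesseract.exe", "tesseract"):
            exe = os.path.join(folder, name)
            if os.path.isfile(exe):
                return exe
    return None

# Example folders only (assumptions); check your own installation path.
CANDIDATES = [
    r"C:\Users\USER\AppData\Local\Tesseract-OCR",
    r"C:\Program Files\Tesseract-OCR",
]

print(find_tesseract(CANDIDATES))
```

If a path is returned, it can be assigned to `pytesseract.pytesseract.tesseract_cmd` before any image is scanned.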

### Steps:
1. Put screenshots of the solutions in the `data` folder.
2. Put a screenshot of the question in the `input` folder.
3. Run the code.
4. Select `Create dataset` to scan the images in the `data` folder.
5. Select `Provide input image` to scan the images in the `input` folder.
6. Select `Find Answer` to search for the solution to your question.
7. Select `Refresh dataset` if a new image has been added to the `data` folder.
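
Steps 1 and 2 assume that the `data` and `input` folders exist next to the code. A minimal sketch for creating them and checking what they contain is shown here; `prepare_folders` and `IMAGE_EXTENSIONS` are hypothetical names that are not part of the original code, and the extension list is an assumption about which files count as screenshots.

```python
import os

# Assumed set of screenshot file types; extend it if you use other formats.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff")

def prepare_folders(base="."):
    """Create the data and input folders if missing and list their image files."""
    listing = {}
    for name in ("data", "input"):
        folder = os.path.join(base, name)
        os.makedirs(folder, exist_ok=True)
        # Keep only files that look like images, sorted for stable output.
        listing[name] = sorted(
            f for f in os.listdir(folder)
            if f.lower().endswith(IMAGE_EXTENSIONS)
        )
    return listing

print(prepare_folders())
```

Running it before step 3 shows at a glance whether the screenshots landed in the right folder.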



### Code:

# Importing the necessary packages
# Levenshtein distance is a string metric for measuring the difference between two sequences
import Levenshtein
# PIL is the Python Imaging Library, which gives the Python interpreter image-handling capabilities
from PIL import Image
# The os module provides functions for interacting with the operating system
import os
# Tesseract is an optical character recognition (OCR) engine for various operating systems
import pytesseract
# Point pytesseract at the Tesseract executable; change this to match your installation path
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'

def create_dataset():
    '''Convert all available images in the data folder to text.'''
    data_dict={}
    #listing the data folder and keeping only image files (skips hidden or non-image entries)
    img_ext=('.png','.jpg','.jpeg','.bmp','.gif','.tif','.tiff')
    img_lst=[f for f in os.listdir("data") if f.lower().endswith(img_ext)]
    print("-----Getting images----")
    print("--wait--")
    for i in img_lst:
        print(i)
        loc=os.path.join("data",i)
        #converting image to text
        actual_txt=pytesseract.image_to_string(loc,lang='eng')
        #replacing line breaks with spaces
        actual_txt=actual_txt.replace('\n',' ')
        #storing acquired data in a dictionary
        data_dict[i]=actual_txt
    #the dictionary size is the number of images read
    print("number of images read=",len(data_dict),"\n")
    return data_dict

def give_input():
    '''Convert the images in the input folder to text.'''
    data_dict={}
    #listing the files in the input folder
    img_lst=os.listdir("input")
    print(img_lst,"\nThis is your input image")
    for i in img_lst:
        loc=os.path.join("input",i)
        #converting image to text
        actual_txt=pytesseract.image_to_string(loc,lang='eng')
        #replacing line breaks with spaces
        actual_txt=actual_txt.replace('\n',' ')
        #storing acquired data in a dictionary
        data_dict[i]=actual_txt
    return data_dict


def show_img(path):
    '''Show the image at the given path to the user.'''
    try:
        # an object of Image type is returned and stored in img variable
        img=Image.open(path)
    except IOError:
        #nothing to show if the file cannot be opened
        print("could not open image:",path)
        return
    img.show()
    
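def check(input_img,data):
    '''Find the dataset image whose text is the closest match to the input image.'''
    if not input_img or not data:
        print("nothing to compare: create the dataset and provide an input image first")
        return
    #using the text of the last image found in the input folder
    inp_txt=list(input_img.values())[-1]
    min_diff=None
    ans=""
    for i in data:
        #computing Levenshtein distance
        diff=Levenshtein.distance(inp_txt,data[i])
        #keeping the smallest distance seen so far
        if min_diff is None or diff <= min_diff:
            min_diff=diff
            ans=i
    path=os.path.join("data",ans)
    #showing image
    show_img(path)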

    
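def main():
    #nothing is scanned until the user asks for it
    data=None
    input_img=None
    chk=1
    while chk!=0:
        print("Which operation do you want to perform?")
        print("1. Create Dataset")
        print("2. Refresh Dataset")
        print("3. Provide Input Image")
        print("4. Find Answer")
        try:
            qry=int(input())
        except ValueError:
            qry=-1 #non-numeric entries are treated as invalid input
        if qry==1 or qry==2:
            #refreshing simply rescans the data folder
            data=create_dataset()
        elif qry==3:
            input_img=give_input()
        elif qry==4:
            if data is None or input_img is None:
                print("create the dataset and provide an input image first")
            else:
                check(input_img,data)
        else:
            print("invalid input")
        print("Do you wish to continue? 0/1")
        try:
            chk=int(input())
        except ValueError:
            chk=1 #keep going if the reply is not a number

main()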
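
The matching step can be tried without Tesseract or any screenshots. The sketch below is a rough illustration only: `edit_distance` is a plain-Python stand-in for `Levenshtein.distance`, `closest_match` is a hypothetical helper, and the `sample` texts are made up, not real OCR output.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, by dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]

def closest_match(query, dataset):
    """Return (name, distance) of the entry nearest to query, or (None, None)."""
    best_name, best_distance = None, None
    for name, text in dataset.items():
        d = edit_distance(query, text)
        if best_distance is None or d < best_distance:
            best_name, best_distance = name, d
    return best_name, best_distance

# Made-up text standing in for OCR output of two solution screenshots.
sample = {
    "q1.png": "What is the capital of France? Paris",
    "q2.png": "Define Levenshtein distance",
}
print(closest_match("What is the capital of France", sample))
```

A solution screenshot usually contains the question plus the answer, so its text is longer than the query and the distance is rarely zero; the smallest distance is what identifies the match.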

### References:
1. Calculating String Similarity in Python
https://towardsdatascience.com/calculating-string-similarity-in-python-276e18a7d33a
2. Extract text with OCR for all image types in python using pytesseract
https://medium.com/@MicroPyramid/extract-text-with-ocr-for-all-image-types-in-python-using-pytesseract-ec3c53e5fc3a
3. Working with Images in Python
https://www.geeksforgeeks.org/working-images-python/?ref=rp
4. Idea credit goes to Aryaveer Bajpai