# QuestionBankCreatorGimini

##### Tutors at academic institutions face a significant challenge: assessing students appropriately against the course material presented. This simple tool uses Google's generative model, Gemini, to create question banks. Its input is the course content, usually a set of PDF files or presentation files. The tool extracts text portions from that content, calls the Gemini API to generate questions of different styles and depth, and writes them to an Excel sheet. The generated questions are varied, clear, and reasonably deep, so a tutor can assemble full exams from the prepared sheet with a simple randomization tool. Preliminary results on sample course content suggest the tool is practical, and it is open to extension.

In [1]:
import google.generativeai as genai
import pathlib
import textwrap
from IPython.display import display
from IPython.display import Markdown

In [2]:
def to_markdown(text):# simple text editing function to format the output of Gemini as a Markdown block quote
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
import os
key=os.environ["GOOGLE_API_KEY"] # assumption: the API key is supplied through this environment variable instead of being hard-coded
genai.configure(api_key=key)
model = genai.GenerativeModel('gemini-pro')

#### The list of operation parameters

In [50]:
coursePath="D:\\sampleCourse\\" #the path containing the course presentations
minTxtContent=100 #minimum text size (in characters) a slide or page needs before it is queried
nMCQQuestionsPerSlid=2 #number of questions requested per style for each text block
mxQperLec=3 #maximum number of text blocks queried per file
questionStyle=['MCQ', 'true and false questions','essay questions']# list of question styles

In [51]:
from pptx import Presentation
import glob

### The main function

##### This part of the code extracts text from each input file, passes it to Gemini with an appropriate question prompt, and collects the results

In [64]:
questionBatch=[]
WeekN=1
for eachfile in glob.glob(coursePath+"*.pptx"):#for each file in the course
    prs = Presentation(eachfile)
    print(eachfile)
    print("_"*len(eachfile))
    nQperLec=0
    for slide in prs.slides:#for each slide
        if nQperLec>=mxQperLec:#stop once the per-file limit is reached
            break
        for shape in slide.shapes:
            if hasattr(shape, "text"):
                if len(shape.text)<minTxtContent:# too short text to query for
                    continue
                for styl in questionStyle:
                    response = model.generate_content("generate "+str(nMCQQuestionsPerSlid)+' '+styl+" with answers from "+shape.text)
                    questionBatch.append([response.text, WeekN])
                nQperLec=nQperLec+1
    WeekN=WeekN+1

D:\sampleCourse\1- Introduction to AI.pptx
__________________________________________
D:\sampleCourse\2-Search.pptx
_____________________________
D:\sampleCourse\3-State value.pptx
__________________________________
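
The selection and prompting logic of the loop above can be separated from the file handling, which makes it easier to check without calling the API. The following is a minimal sketch, not the notebook's own code: `build_prompt` and `iter_prompts` are hypothetical helper names, and the prompt wording simply mirrors the string concatenation used above. The sketch stops as soon as `max_blocks` text blocks have been used.

```python
def build_prompt(n_questions, style, text):
    """Build the same prompt string the loop above sends to the model."""
    return "generate " + str(n_questions) + " " + style + " with answers from " + text


def iter_prompts(text_blocks, styles, n_questions, min_chars, max_blocks):
    """Yield (style, prompt) pairs for at most max_blocks sufficiently long blocks."""
    used = 0
    for text in text_blocks:
        if used >= max_blocks:
            break  # per-file limit reached
        if len(text) < min_chars:
            continue  # too little text to ask questions about
        for style in styles:
            yield style, build_prompt(n_questions, style, text)
        used += 1
```

Each yielded prompt could then be passed to `model.generate_content`, with the text blocks coming from slide shapes or PDF pages.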


In [65]:
from pypdf import PdfReader
for eachfile in glob.glob(coursePath+"*.pdf"):#for each file in the course
    prs  = PdfReader(eachfile)
    print(eachfile)
    print("_"*len(eachfile))
    nQperLec=0
    for page in prs.pages:#for each page
        text = page.extract_text()
        if nQperLec>=mxQperLec:#stop once the per-file limit is reached
            break
        if len(text)<minTxtContent:# too short text to query for
            continue
        for styl in questionStyle:
            query="generate "+str(nMCQQuestionsPerSlid)+' '+styl+" with answers from "+text
            response = model.generate_content(query)
            questionBatch.append([response.text, WeekN])
        nQperLec=nQperLec+1
    WeekN=WeekN+1

D:\sampleCourse\01_DNN.pdf
__________________________
D:\sampleCourse\M818-B_03_RNN.pdf
_________________________________
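
Both loops read `response.text` directly, so a single failed or blocked request would stop the whole run. One way to make the run more tolerant is to wrap the call. The sketch below is an assumption about how that might look, not part of the original notebook: `safe_generate` is a hypothetical helper, and it relies only on the model object having a `generate_content` method whose result exposes `.text`, as used above.

```python
import time


def safe_generate(model, prompt, retries=2, wait_seconds=1.0):
    """Return the response text, or None if every attempt fails.

    Any exception (network error, blocked response, missing text) counts
    as a failed attempt; this broad catch is a simplification.
    """
    for attempt in range(retries + 1):
        try:
            return model.generate_content(prompt).text
        except Exception as err:  # deliberately broad for a sketch
            print("attempt", attempt + 1, "failed:", err)
            if attempt < retries:
                time.sleep(wait_seconds)
    return None
```

The export step relies on every text block contributing one entry per style, in order, so a failed call should still append a placeholder (for example an empty string) to `questionBatch` rather than be skipped.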


##### This part of the code applies simple text processing, formats the questions, and writes them to an Excel sheet

In [66]:
def exportQuestionsToXLSX(coursePath,questionBatch,questionStyles):
    import re
    import xlsxwriter
    workbook = xlsxwriter.Workbook(coursePath+'questionBank.xlsx')
    for sIndex,style in enumerate(questionStyles):
        worksheet = workbook.add_worksheet(style)
        nQ=1
        for ind,st in enumerate(questionBatch):
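            # questionBatch holds one entry per style, in questionStyles order, for each text block,
            # so the entry's position modulo the number of styles identifies its style
            if not ((ind%len(questionStyles))==sIndex):
                continue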
            qs = re.split(r"\*\*MCQ", st[0])
#             if len(qs)<nMCQQuestionsPerSlid:#if not appropriatly found questions' tags
#                 continue
            for q in qs:
                if len(q)<3:#question with answer text is too short
                    continue
                quest,choices,answ=splitQuestionTxt(q)
                if len(quest)<2:#too short question
                    continue
                worksheet.write('A'+str(nQ), str(st[1]))#adding week no to xl file
                worksheet.write('B'+str(nQ), str(q))#adding the question text
                nQ=nQ+1
    workbook.close()
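
The export function calls `splitQuestionTxt`, which is not defined in the cells shown here. The following is a hypothetical stand-in so the cell can run, not the author's implementation. It assumes that answer choices start with a letter followed by `)` or `.`, and that the answer sits on a line beginning with the word "Answer"; both are assumptions about the format of the model's output, which can vary.

```python
import re


def splitQuestionTxt(q):
    """Split one generated question into (question, choices, answer).

    Hypothetical sketch: the choice and answer markers are assumed, not
    guaranteed by the model.
    """
    # Separate the answer part at the first line that starts with 'Answer'.
    parts = re.split(r"^[*\s]*answer[*\s:]*", q, maxsplit=1,
                     flags=re.IGNORECASE | re.MULTILINE)
    body = parts[0]
    answer = parts[1].strip() if len(parts) > 1 else ""

    question_lines, choices = [], []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        # Lines such as '(a) ...', 'B) ...' or 'c. ...' are treated as choices.
        if re.match(r"^\(?[A-Ea-e][\).]\s+", line):
            choices.append(line)
        else:
            question_lines.append(line)
    return " ".join(question_lines), choices, answer
```

For true/false and essay questions the choice list simply stays empty.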

In [67]:
exportQuestionsToXLSX(coursePath,questionBatch,questionStyle)
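
The introduction mentions assembling exams from the bank with a simple randomization tool. A minimal sketch of that idea follows. It works on in-memory `(week, question)` rows rather than reading the workbook, and `draw_exam` is a hypothetical helper, not part of the notebook.

```python
import random


def draw_exam(rows, n_questions, seed=None):
    """Pick n_questions distinct rows at random, ordered by week.

    rows is a list of (week, question_text) tuples; a fixed seed makes
    the draw repeatable.
    """
    rng = random.Random(seed)
    n = min(n_questions, len(rows))  # never ask for more than is available
    picked = rng.sample(rows, n)
    return sorted(picked, key=lambda row: row[0])
```

Rows in this shape could be built from `questionBatch` after splitting each response into individual questions, or read back from the exported sheet.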