I made this project to check how strong my resume is before applying for a job. It uses Google Gemini Pro. To use this model we need a Google API key, which we can get from: https://makersuite.google.com/app/apikey
On that page, click "Create API key in new project", then copy the key and paste it into the code.
In the PDF2Image.py file the resume (in PDF format) is converted into images. To use the Gemini model we first have to set up the configuration: genai.configure(api_key="your_api_key").
Then we create a function to get responses from Gemini. It takes "input" (the job description we want to apply for), "pdf_content" (the resume converted from PDF), and "prompt" (exactly what we want from the model: an ATS score, a summary of the resume, or an overall impression of the candidate). Inside the function we choose the model we will be using: model = genai.GenerativeModel('gemini-pro-vision').
Then we request a response from the model, response = model.generate_content([input, pdf_content[0], prompt]), and return the generated text, return response.text. The prompt plays a very important role in the quality of the response: for an LLM a good prompt is essential, so we should experiment with it and apply some prompt engineering to improve the results.
Then we write a second function that converts the PDF into an image (JPEG format) so the input to the model is in an appropriate format: images = pdf2image.convert_from_bytes(uploaded_file.read()). Of these images, the first page represents the content of the PDF, first_page = images[0]. To convert it into bytes we use Python's io and base64 libraries: the page is written to an in-memory buffer and then base64-encoded. We should also add a check that raises a FileNotFoundError if no PDF has been uploaded.
Then we build the web application using Streamlit. We add a text area where the user can write the job description (this becomes the input used to generate responses), and a file uploader for the resume in PDF format, type=["pdf"]. Depending on our needs we can add different features such as "Describe the Resume", "How can I improve my Skills", "Percentage Match", etc., writing an appropriate prompt for each.
Then, to show the response when feature 1 is selected: check whether a PDF has been uploaded, extract pdf_content with the second function, and generate the response with the first function by passing the input_prompt of feature 1, pdf_content (the output of the second function), and input (the job description). Use st.write(response) to display the result, and repeat the same pattern for the other features.
The pdf2image library we use to convert the PDF into images will throw an error on Windows: it cannot do the conversion because a dependency, Poppler, is missing. To resolve this, download and extract poppler-windows from https://github.com/oschwartz10612/poppler-windows/releases/ . After extraction you will see two folders, "Library" and "share". Create a poppler folder inside C:\Program Files (x86) and copy these two folders into it. Then go into Library\bin, copy the bin path, and add it to the Environment Variables:
Environment Variables → System Variables → Path → Edit → New → paste the bin path → OK
In the PDF2Text.py file, instead of converting the PDF to images we convert it into text, using the PyPDF2 library. The first function (getting the response) stays the same, but the second function is modified: first we read the PDF, reader = pdf.PdfReader(uploaded_file). The reader contains multiple pages, so we loop over every page, page = reader.pages[page], and append its extracted text to a text variable, text += str(page.extract_text()). Then we build a prompt template, and the rest follows the same procedure.
With the same code, just by changing the prompt, we can use this model to summarize research papers for us. The result can be seen in the ResearchPaperSummary.py file.