# <font color="Blue">Abstract</font>

In this notebook, we use [Gradio](https://www.gradio.app/) to let people interact with the first draft of our app.

**<font color="red">Our goal</font>** is to learn how Gradio works and how to run our model inside a Gradio app.

We relied on the following resources:
1. [Gradio Playground](https://www.gradio.app/playground)
2. [Gradio Docs](https://www.gradio.app/docs/interface)

In [1]:
!pip install gradio
!pip install transformers
import gradio as gr

Collecting gradio
  Downloading gradio-3.50.2-py3-none-any.whl (20.3 MB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl (15 kB)
Collecting fastapi (from gradio)
  Downloading fastapi-0.104.0-py3-none-any.whl (92 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.3.1.tar.gz (5.5 kB)
  Preparing metadata (setup.py) ... done
Collecting gradio-client==0.6.1 (from gradio)
  Downloading gradio_client-0.6.1-py3-none-any.whl (299 kB)
Collecting httpx (from gradio)
  Downloading httpx-0.25.0-py3-none-any.whl (75 kB)

## Example

In [2]:
def greet(name):
    return "Hello " + name + "!!"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")

demo.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://bb722ad3fd50cc65a8.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
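Because `gr.Interface` only wraps an ordinary Python function, that function can be sanity-checked directly before launching the UI. A minimal sketch (no Gradio needed) that mirrors one submit of the text box:

```python
def greet(name: str) -> str:
    """Same function as in the cell above: builds a greeting from the textbox value."""
    return "Hello " + name + "!!"


# Gradio passes the textbox content to `greet` as a string and shows the
# returned string in the output textbox, so a plain call mirrors one submit.
print(greet("Gradio"))
```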




# Demo 1

## Set layout

In [37]:
import gradio as gr

with gr.Blocks() as demo:
    with gr.Row():
        text1_1 = gr.Textbox(label="Company Name*")
        text1_2 = gr.Textbox(label="Job Title*",
                             placeholder="The job title you are going to apply for goes here ...")
        drop1_3 = gr.Dropdown(["Job Recruiter", "HR Manager", "Hiring Manager",
                               "Executives", "Potential Coworkers"],
                              label="Who do you want to interview with?*")
    with gr.Row():
        with gr.Column():
            text2_1 = gr.Textbox(label="Company Info (Optional)",
                                 placeholder="Special information about this company you want to include")
            text2_2 = gr.Textbox(label="Job Description (Optional)",
                                 placeholder="Paste the job description here ...")
            text2_3 = gr.Textbox(label="Interviewer's LinkedIn URL (Optional)",
                                 placeholder="Paste the URL here ...")

    with gr.Row():
        with gr.Column():
            with gr.Tab("Your Resume*"):
                file4_1 = gr.File(file_types=[".pdf"])
            with gr.Tab("Cover Letter"):
                file4_2 = gr.File(file_types=[".pdf"])
            # Layout only for now: the button is not connected to a handler yet
            btn = gr.Button("Resume Review")
    with gr.Row():
        textout = gr.Textbox(label="Feedback on your resume")
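The `Resume Review` button above is not connected to anything yet. In Gradio it would be wired with `btn.click(fn, inputs=[...], outputs=textout)`, and the handler would probably want to reject submissions that leave a required (`*`) field empty. A minimal plain-Python sketch of that check; `missing_required_fields` and its labels are hypothetical, not part of Gradio:

```python
from typing import List, Optional

# Hypothetical helper (not part of Gradio): the layout above marks required
# fields with "*", but nothing enforces that yet.
REQUIRED_LABELS = ("Company Name", "Job Title", "Interviewer", "Resume")


def missing_required_fields(company_name: str, job_title: str,
                            interviewer: Optional[str],
                            resume: Optional[object]) -> List[str]:
    """Return the labels of required inputs that are still empty."""
    values = (company_name, job_title, interviewer, resume)
    missing = []
    for label, value in zip(REQUIRED_LABELS, values):
        # Blank textboxes arrive as "" (or whitespace); an unselected dropdown
        # or a missing file upload typically arrives as None.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(label)
    return missing


print(missing_required_fields("Acme", "", None, None))
```

A click handler could return a message listing these labels instead of calling the model.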

## Output in Gradio

In [38]:
demo.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://8affdb01d792b5af36.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




# Demo 2

## Set up packages and API key

In [40]:
# 1. Install required packages in Colab

!pip install -q langchain
!pip install -q openai
!pip install -q chromadb
!pip install -q tiktoken
!pip install -q duckduckgo-search
!pip install -q textract

# 2. Import packages

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.chains import LLMChain, RetrievalQA
from langchain import ConversationChain
from langchain.agents import load_tools, initialize_agent, AgentType
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.tools import DuckDuckGoSearchRun
from langchain.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper
from langchain.schema import HumanMessage, AIMessage, SystemMessage
import os
from getpass import getpass

import textract

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires tiktoken, which is not installed.

In [41]:
# 3. Set up the OpenAI API key

openai_api_key = getpass()
os.environ["OPENAI_API_KEY"] = openai_api_key

··········
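Re-running the cell above asks for the key every time. A small sketch, assuming you are happy to reuse a key that is already set in the environment (for example from an earlier run of the cell); `get_openai_api_key` is a hypothetical helper:

```python
import os
from getpass import getpass


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Reuse a key already in the environment; prompt only if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        # Hidden input, same as the original cell
        key = getpass(f"{env_var}: ")
        os.environ[env_var] = key
    return key
```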


## Define resume_review function

In [203]:
def resume_review(comp_name, job_title, interviewer_title,
                  comp_info="", job_des="", interviewer_url="",
                  your_resume=None):
  # Note: interviewer_url is collected in the UI but not used in the prompt yet.

  #### Step 1: Create prompt ####
  # 1. Define variables in prompt
  input_var = ["interviewer_role", "company_name", "resume", "role_title", "role_description", "company_description"]

  # 2. Define prompt template
  template = """
  You are a {interviewer_role}, for {company_name}. Please review the following resume {resume}.
  For the role of {role_title}.
  (optional: Here is the job description: {role_description})
  (optional: Here is a description of the company: {company_description})
  Give point by point feedback with rationale and suggested edits.
  """

  # 3. Create prompt template
  prompt = PromptTemplate(
    input_variables = input_var,
    template = template
  )
  ###############################

  #### Step 2: Read resume PDF to text ####
  if not your_resume:
    return "Please upload your resume as a PDF."
  # gr.File (Gradio 3.x) passes a temporary-file object; .name is its path on disk.
  # textract returns bytes, so decode to avoid a b'...' literal in the prompt.
  extracted_resume = textract.process(your_resume.name, method='pdfminer').decode("utf-8", errors="replace")
  #########################################

  #### Step 3: Format prompt with user input ####
  new_prompt = prompt.format(interviewer_role=interviewer_title,
                             company_name=comp_name,
                             resume=extracted_resume,
                             role_title=job_title,
                             role_description=job_des,
                             company_description=comp_info)
  ###############################################

  #### Step 4: Use chat model to answer the prompt ####
  chat = ChatOpenAI(model_name = "gpt-4")
  result = chat([HumanMessage(content=new_prompt)])
  ####################################################

  return result.content
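With the template in `resume_review`, an empty optional field still leaves its line in the prompt (for example `(optional: Here is the job description: )`). A minimal plain-Python sketch, assuming we would rather drop empty optional lines; `build_review_prompt` is a hypothetical helper (not part of LangChain) and its wording is a lightly reworded variant of the template above:

```python
def build_review_prompt(interviewer_role: str, company_name: str, resume: str,
                        role_title: str, role_description: str = "",
                        company_description: str = "") -> str:
    """Assemble the review prompt, skipping optional context that is blank."""
    lines = [
        f"You are a {interviewer_role} at {company_name}.",
        f"Please review the following resume for the role of {role_title}:",
        resume,
    ]
    # Only include optional context the user actually provided.
    if role_description.strip():
        lines.append(f"Here is the job description: {role_description}")
    if company_description.strip():
        lines.append(f"Here is a description of the company: {company_description}")
    lines.append("Give point-by-point feedback with rationale and suggested edits.")
    return "\n".join(lines)


print(build_review_prompt("HR Manager", "Acme", "RESUME TEXT", "Data Scientist"))
```

The `resume` argument is expected to be a decoded `str`, not the raw `bytes` that `textract.process` returns.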

In [204]:
demo2 = gr.Interface(
    fn = resume_review,
    # Input order must match the parameter order of resume_review
    inputs = [gr.Textbox(label="Company Name*"),
              gr.Textbox(label="Job Title*",
                         placeholder="The job title you are going to apply for goes here ..."),
              gr.Dropdown(["Job Recruiter", "HR Manager", "Hiring Manager", "Executives", "Potential Coworkers"],
                          label="Who do you want to interview with?*"),
              gr.Textbox(label="Company Info (Optional)",
                         placeholder="Special information about this company you want to include"),
              gr.Textbox(label="Job Description (Optional)",
                         placeholder="Paste the job description here ..."),
              gr.Textbox(label="Interviewer's LinkedIn URL (Optional)",
                         placeholder="Paste the URL here ..."),
              gr.File(label="Your Resume (PDF)*", file_types=[".pdf"])
    ],
    outputs = gr.Textbox(label="Feedback", lines=50))
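`gr.Interface` passes the values of the `inputs` components to `fn` positionally, in list order, so the seven components above must line up with the seven parameters of `resume_review`. A small sketch using the standard `inspect` module to catch a mismatch before launching; `check_interface_arity` and `resume_review_stub` are hypothetical names:

```python
import inspect


def check_interface_arity(fn, n_inputs: int) -> None:
    """Raise ValueError if `fn` cannot accept exactly `n_inputs` positional arguments."""
    sig = inspect.signature(fn)
    try:
        # bind() applies the normal call rules without actually calling fn
        sig.bind(*([None] * n_inputs))
    except TypeError as err:
        raise ValueError(f"{fn.__name__} cannot take {n_inputs} inputs: {err}") from err


# Stand-in with the same signature as resume_review (no model call).
def resume_review_stub(comp_name, job_title, interviewer_title,
                       comp_info="", job_des="", interviewer_url="",
                       your_resume=None):
    return ""


check_interface_arity(resume_review_stub, 7)  # 7 components, 7 parameters
```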

## Output in Gradio

In [205]:
demo2.launch(debug=True)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://7917f84a355f678e50.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7880 <> https://7917f84a355f678e50.gradio.live




# Appendix

## Compare different PDF readers

### PyPDF2 (loses the layout)

In [42]:
!pip install PyPDF2

import PyPDF2

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [43]:
reader = PyPDF2.PdfReader('drive/MyDrive/sample_resume.pdf')

In [50]:
reader.pages[0].extract_text()

"EDUCATION \n EXPERIENCE Washington University School of Medicine – Institute for Informatics Data Science Intern, Advisor: Dr. Ruijin Lu      St. Louis, MO May 2023 – Aug. 2023 North American Prodromal Synucleinopathy (NAPS) Project \x9f Data Quality Control Automating: Implemented Python-based automation, reducing document download time by 99.2%, equivalent to saving a full week's work for one individual. Minimized manual & repetitive tasks, enhancing project efficiency by improving data quality, and aiding the group in identifying Alzheimer's Disease biomarkers across ten US sites. \x9f Database & Data Management: Leveraged Python and Rave to optimize database updates, achieved a 75% time reduction updating birth records. Integrated with Amazon Textract API to halve the time of handwritten text extraction, enhanced overall efficiency, accuracy, and expanded system scalability. Minimized human bias through automated handwritten text recognition. \x9f Collaboration & Problem Solving: 

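The cell above only reads `pages[0]`, and its output shows the bullet glyphs coming through as `\x9f`. A minimal sketch that joins every page and tidies that text, assuming `\x9f` is how this particular resume's bullets are encoded (other PDFs may use different characters); both helper names are hypothetical:

```python
import re


def extract_all_pages(reader) -> str:
    """Join the text of every page of a PyPDF2 PdfReader (not just pages[0])."""
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def clean_pdf_text(text: str) -> str:
    """Map the '\x9f' bullet glyph to a real bullet and collapse extra spaces."""
    text = text.replace("\x9f", "\n•")
    text = re.sub(r"[ \t]+", " ", text)
    return "\n".join(line.strip() for line in text.splitlines()).strip()
```

Usage in the notebook: `clean_pdf_text(extract_all_pages(reader))`.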
### textract (keeps the layout, but adds many formatting characters)

In [51]:
!pip install textract

import textract

Collecting textract
  Downloading textract-1.6.5-py3-none-any.whl (23 kB)
Collecting argcomplete~=1.10.0 (from textract)
  Downloading argcomplete-1.10.3-py2.py3-none-any.whl (36 kB)
Collecting beautifulsoup4~=4.8.0 (from textract)
  Downloading beautifulsoup4-4.8.2-py3-none-any.whl (106 kB)
Collecting chardet==3.* (from textract)
  Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting docx2txt~=0.8 (from textract)
  Downloading docx2txt-0.8.tar.gz (2.8 kB)
  Preparing metadata (setup.py) ... done
Collecting extract-msg<=0.29.* (from textract)
  Downloading extract_msg-0.28.7-py2.py3-none-any.whl (69 kB)

In [52]:
text = textract.process('drive/MyDrive/sample_resume.pdf', method='pdfminer')

In [None]:
text
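`textract.process` returns `bytes`, not `str`, so inserting its result into a prompt as-is embeds a `b'...'` literal with escaped `\n` sequences. A minimal sketch of turning that output into clean text; `textract_bytes_to_text` is a hypothetical helper, and treating the form feed (`\x0c`) as the page separator is an assumption about pdfminer's output, so adjust it to what you actually see:

```python
import re


def textract_bytes_to_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode textract output and collapse the extra layout whitespace."""
    text = raw.decode(encoding, errors="replace")
    text = text.replace("\x0c", "\n")       # page breaks -> newline (assumed separator)
    text = re.sub(r"[ \t]+\n", "\n", text)  # drop trailing spaces at line ends
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line in a row
    return text.strip()
```

Usage in the notebook: `textract_bytes_to_text(text)`.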

## Test extracting text from a PDF uploaded through `gr.File()`

### Define test_read_pdf

In [187]:
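def test_read_pdf(resume_file):
  # gr.File passes a temporary-file object; .name is its path on disk.
  # textract returns bytes, so decode to get a readable string.
  text = textract.process(resume_file.name, method='pdfminer')
  return text.decode("utf-8", errors="replace")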
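In Gradio 3.x (the version installed above), `gr.File` hands the function a temporary-file object whose `.name` is the path on disk, which is why `test_read_pdf` reads `.name`. In Gradio 4.x the default `type="filepath"` passes a plain path string instead, and with no upload the value is `None`. A small version-tolerant sketch; `uploaded_file_path` is a hypothetical helper:

```python
import os
from typing import Optional


def uploaded_file_path(file_value) -> Optional[str]:
    """Return the on-disk path of a gr.File value, or None if nothing was uploaded."""
    if file_value is None:
        return None
    if isinstance(file_value, (str, os.PathLike)):
        # Gradio 4.x style: the value already is a path
        return os.fspath(file_value)
    # Gradio 3.x style: temp-file wrapper exposing its path as `.name`
    return file_value.name
```

Usage: `textract.process(uploaded_file_path(resume_file), method="pdfminer")`, after checking that the path is not `None`.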

### Test whether it works in Gradio

In [188]:
test_demo = gr.Interface(
    fn = test_read_pdf,
    inputs = gr.File(file_types=[".pdf"]),
    outputs = gr.Textbox(label="resume")
)

test_demo.launch(debug=True)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://ec9148c7e8ea6a6e2c.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7880 <> https://ec9148c7e8ea6a6e2c.gradio.live


