<a href="https://colab.research.google.com/github/EmicoBinsfinder/EPOCodeFestProject/blob/main/DrillDownV1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [19]:
#@title Configure OpenAI API key

# access your OpenAI API key

# installing llmx first isn't necessary but avoids a confusing error when installing openai
!pip install -q llmx
!pip install -q openai
from openai import OpenAI
from google.colab import userdata


openai_api_secret_name = 'Test'  # @param {type: "string"}

try:
  OPENAI_API_KEY=userdata.get(openai_api_secret_name)
  OpenAIclient = OpenAI(
    api_key=OPENAI_API_KEY
  )
except userdata.SecretNotFoundError as e:
   print(f'''Secret not found\n\nThis expects you to create a secret named {openai_api_secret_name} in Colab\n\nVisit https://platform.openai.com/api-keys to create an API key\n\nStore that in the secrets section on the left side of the notebook (key icon)\n\nName the secret {openai_api_secret_name}''')
   raise e
except userdata.NotebookAccessError as e:
  print(f'''You need to grant this notebook access to the {openai_api_secret_name} secret in order for the notebook to access OpenAI on your behalf.''')
  raise e
except Exception as e:
  # unknown error
  print(f"There was an unknown error. Ensure you have a secret {openai_api_secret_name} stored in Colab and it's a valid key from https://platform.openai.com/api-keys")
  raise e

### System Setup

In [20]:
!pip install gradio
!pip install elasticsearch
!pip install langchain



In [21]:
########## IMPORTING REQUIRED PYTHON PACKAGES ##########
import csv
import json
import math
import os
import pprint
import string
import sys
import time
from uuid import uuid4

import pandas as pd
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModel
import torch
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
nltk.download('stopwords')
nltk.download('punkt')
import gradio
from elasticsearch import Elasticsearch
# Note: this rebinds the name OpenAI imported from the openai package earlier;
# the OpenAIclient instance created above is unaffected.
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ElasticsearchChatMessageHistory

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [22]:
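# Store the Elasticsearch password in the environment, then read it back.
# NOTE: the password is hard-coded here for the prototype; avoid committing real credentials.
os.environ['ELASTICSEARCH_PASSWORD'] = 'l0ng-r4nd0m-p@ssw0rd'
pwd = os.environ["ELASTICSEARCH_PASSWORD"]

# Password for the Elasticsearch user passed to basic_auth below
ELASTIC_PASSWORD = pwd

# Despite its name, CLOUD_ID holds a plain endpoint URL, not an Elastic Cloud ID
CLOUD_ID = "http://AnkarDev-Elasticsearch-1891076460.eu-west-2.elb.amazonaws.com:9200"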

# Create the client instance
client = Elasticsearch(
    CLOUD_ID,
    basic_auth=("eogbomo", ELASTIC_PASSWORD),
    verify_certs=False
)
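The cell above writes a literal password into the environment and immediately reads it back. A minimal sketch of reading the value without hard-coding it follows, assuming the variable has been set outside the notebook. `read_password` is a hypothetical helper name, not part of the notebook, and the demo uses a stand-in mapping instead of the real environment.

```python
import os


def read_password(env, name="ELASTICSEARCH_PASSWORD"):
    """Return the password stored under `name`, or raise a clear error."""
    password = env.get(name)
    if not password:
        raise RuntimeError(f"Set the {name} environment variable before connecting")
    return password


# Demo with a stand-in mapping; in the notebook you would pass os.environ.
demo_env = {"ELASTICSEARCH_PASSWORD": "example-only"}
password = read_password(demo_env)
```

Passing the mapping as an argument keeps the helper easy to exercise without touching the real environment.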

### Gradio App

In [23]:
example_query1 = {"size": 1,"sort": [{"publicationDate": {"order": "desc"}}],"query": {"bool": {"must": [{"match": {"applicants": "apple"}}]}}}
example_query2 = {"size": 1,"sort": [{"publicationDate": {"order": "desc"}}],"query": {"bool": {"must": [{"match": {"applicants": "apple"}}]}}}

In [48]:
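def loadprevresponses():
  # Load the saved inputs; return them as a dict and as one comma-separated string.
  data, datastr = {}, ''
  try:
    with open('responses.json', 'r') as file:
      try:
        data = json.load(file)
        datastr = ', '.join(str(value) for value in data.values())
      except (json.JSONDecodeError, AttributeError):
        # Unreadable or unexpected contents: fall back to an empty history
        print('Error loading responses')
        data, datastr = {}, ''
  except FileNotFoundError:
    # First run: create an empty history file
    with open('responses.json', 'w') as file:
      json.dump(data, file)
  return data, datastr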

data, datastr = loadprevresponses()
print(data, datastr)

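def saveresponse(input):
  # Append the new input under the next 'InputN' key and rewrite the file.
  history, historystr = loadprevresponses()

  history[f'Input{len(history)+1}'] = input
  # 'w' truncates the file first, so no bytes from the old contents are left behind
  with open('responses.json', 'w') as file:
    json.dump(history, file)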


{} 
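The history helpers above persist each input under an `InputN` key in `responses.json`. The round trip can be sketched on a throwaway file as follows. `load_history` and `save_input` are hypothetical path-taking variants of the notebook's functions, introduced only for illustration.

```python
import json
import os
import tempfile


def load_history(path):
    """Return (history dict, comma-joined string); empty values if unreadable."""
    try:
        with open(path, "r") as file:
            history = json.load(file)
    except (FileNotFoundError, json.JSONDecodeError):
        history = {}
    return history, ", ".join(str(value) for value in history.values())


def save_input(path, user_input):
    """Store user_input under the next 'InputN' key and rewrite the file."""
    history, _ = load_history(path)
    history[f"Input{len(history) + 1}"] = user_input
    with open(path, "w") as file:  # "w" truncates before writing
        json.dump(history, file)


# Demonstrate on a temporary file rather than the notebook's responses.json.
demo_path = os.path.join(tempfile.mkdtemp(), "responses_demo.json")
save_input(demo_path, "wiper blade patents")
save_input(demo_path, "only in China")
history, history_text = load_history(demo_path)
```

After the two calls, `history` holds both inputs keyed `Input1` and `Input2`, and `history_text` is the comma-joined string that gets appended to the next prompt.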


In [55]:
def DrillDown(input):

  history, historystr = loadprevresponses()
  saveresponse(input)

  # Append earlier inputs as context, separated from the new input
  if historystr:
    input = f"{input}, {historystr}"

  prompt = """You are an expert in translating natural language queries about patents into Elasticsearch queries.
    Given a user input, create an Elasticsearch query that returns as many relevant patents as possible when run against Elasticsearch.

    input: {input}

    """.format(input=input)

  additional_prompt="""
    Instructions:
    1. Generate Elasticsearch queries based on the provided natural language queries.
    2. Only use fields present in the mapping. If the user is asking about a field that is not in the mapping ignore it.
    3. Ensure that the generated queries follow Elasticsearch's query DSL syntax and structure.
    4. You can correct or reformulate the user's query if it has errors.
    5. Return all fields in your response when applicable.
    6. Make sure that the query only performs full text search when applicable i.e. don't use keyword search
    7. When returning the JSON portion of the answer, compress the JSON output by removing spaces. Remove any mention of json or triple backticks from the output, and make sure that it is valid JSON.
    8. Ensure that as many aspects of the user input are captured as possible.

    Examples of expected behavior:
    Natural Language Query: "What is the title of the most recent Apple patent"
    Expected Elasticsearch Query:
    {
      "size": 1,
      "sort": [
        {
          "publicationDate": {
            "order": "desc"
          }
        }
      ],
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "applicants": "apple"
              }
            }
          ]
        }
      }
    }

    Natural Language Query: "What are the most recent methods to deal with cell group failure?"
    Expected Elasticsearch Query:
    {
      "query": {
        "bool": {
          "must": [
            {
              "bool": {
                "should": [
                  {
                    "match": {
                      "patentTitle": "cell group failure"
                    }
                  },
                  {
                    "match": {
                      "patentAbstract": "cell group failure"
                    }
                  },
                  {
                    "match": {
                      "claims.claimText": "cell group failure"
                    }
                  },
                  {
                    "match": {
                      "patentDescription": "cell group failure"
                    }
                  }
                ]
              }
            }
          ],
          "filter": [
            {
              "range": {
                "publicationDate": {
                  "gte": "now-5y/d"
                }
              }
            }
          ]
        }
      },
      "_source": ["*"]
    }"""
  prompt += additional_prompt

  completion = OpenAIclient.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=[
      {"role": "user", "content": f'Your function is that of a bot optimised for summarising patent text. Answer the following query as accurately as possible based on your function {prompt}'}
    ]
  )
  response = completion.choices[0].message.content

  response = '\n'.join([response, f"{'#'*120} \n"])
  response = '\n'.join([response, f"History USED TO GENERATE RESPONSE:\n {history}"])

  return response

inputs = gradio.Textbox(lines=7, label="Generate Queries for use with Elastic Search, allowing for search refinement")
outputs = gradio.Textbox(label="Reply")

gradio.Interface(fn=DrillDown, inputs=inputs, outputs=outputs, title="Patent DrillDown Prototype",
             theme="compact").launch(share=True, debug=True)




Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://d0413c1bc5e5fa2b0e.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://d0413c1bc5e5fa2b0e.gradio.live
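The prompt asks the model for compact, valid JSON without code fences, but a reply may still arrive wrapped in fences or as plain prose. A minimal sketch of validating a reply before it is sent to Elasticsearch follows. `extract_query` is a hypothetical helper, and it is assumed to run on the raw model reply, before `DrillDown` appends the separator line and the history.

```python
import json


def extract_query(reply):
    """Parse the span from the first '{' to the last '}' as JSON, or return None."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        parsed = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Simulated replies: one wrapped in a code fence, one that is not JSON at all.
fence = "`" * 3
fenced_reply = fence + 'json\n{"size":1,"query":{"match":{"applicants":"apple"}}}\n' + fence
parsed_query = extract_query(fenced_reply)
not_json = extract_query("Sorry, I cannot help with that.")
```

Returning `None` instead of raising lets the caller decide whether to retry the model call or show the raw reply to the user.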




### Get search response

In [56]:
query = {
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {"match": {"patentTitle": "Aquablade"}},
              {"match": {"patentAbstract": "liquid passing through the wiper blade"}},
              {"match": {"claims.claimText": "projected from the blade onto the windshield"}},
              {"match": {"patentDescription": "Aquablade"}}
            ]
          }
        },
        {"term": {"jurisdiction": "CN"}}
      ],
      "filter": [
        {"terms": {"status": ["pending", "granted"]}}
      ]
    }
  }
}
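The query above combines per-field `match` clauses under `should`, a `term` clause on jurisdiction, and a `terms` filter on status. The same shape can be assembled programmatically, as in this sketch. `build_text_query` is a hypothetical helper, and the field names are copied from the query above.

```python
def build_text_query(field_texts, jurisdiction=None, statuses=None):
    """Build a bool query: any field may match, with optional exact-value limits."""
    should = [{"match": {field: text}} for field, text in field_texts.items()]
    must = [{"bool": {"should": should}}]
    if jurisdiction is not None:
        must.append({"term": {"jurisdiction": jurisdiction}})
    bool_query = {"must": must}
    if statuses:
        bool_query["filter"] = [{"terms": {"status": list(statuses)}}]
    return {"query": {"bool": bool_query}}


aquablade_query = build_text_query(
    {
        "patentTitle": "Aquablade",
        "patentAbstract": "liquid passing through the wiper blade",
        "claims.claimText": "projected from the blade onto the windshield",
        "patentDescription": "Aquablade",
    },
    jurisdiction="CN",
    statuses=["pending", "granted"],
)
```

Called with these arguments, the helper produces a dict equal to the hand-written `query` above.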


In [57]:
resp = client.search(index="patents",
                     body=query)

print(resp)

{'took': 1, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 0, 'relation': 'eq'}, 'max_score': None, 'hits': []}}
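The printed response reports zero hits. A small helper can pull the total and the matched documents out of a response of this shape, as sketched below. `summarise_hits` is a hypothetical name, and the sketch assumes the response can be indexed like the dict printed above.

```python
def summarise_hits(resp):
    """Return (total hit count, list of _source dicts) from a search response."""
    hits = resp["hits"]
    total = hits["total"]["value"]
    sources = [hit.get("_source", {}) for hit in hits["hits"]]
    return total, sources


# The response printed above, reproduced as a plain dict.
empty_resp = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {"total": {"value": 0, "relation": "eq"}, "max_score": None, "hits": []},
}
total, sources = summarise_hits(empty_resp)
```

For the response above, `total` is 0 and `sources` is an empty list, which is a cue to loosen the query, for example by dropping the jurisdiction or status restriction.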
