# Retrieve data from OPS

In this notebook, we will retrieve data from the OPS (Open Patent Services) API. The OPS API provides access to a wide range of patent data, including bibliographic information, legal status, and full-text documents.

## Requirements

Please make sure you are in the right environment. To install the required packages, run the following command:

```bash
poetry install --with dev
```

Finally, please make sure you configure your OPS API credentials in the `.env` file at the root of the repository. Copy the `.env.example` file found there to `.env`, then fill in `CONSUMER_KEY`, `CONSUMER_SECRET_KEY`, and `OPS_API_URL`.
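
The cells below read the credentials from `os.environ`, but this notebook does not show how the `.env` file gets loaded into the environment. If your setup does not do this already, the following standard-library sketch is one way to do it. The helper name `load_env_file` is hypothetical, and the parser only handles simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines and export them to os.environ."""
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without a key/value separator
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Variables already set in the environment take precedence
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded
```

Calling `load_env_file()` before the first cell would make `CONSUMER_KEY`, `CONSUMER_SECRET_KEY`, and `OPS_API_URL` available to `os.environ.get`.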

In [1]:
import os

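def mask_value(value, show_last=10):
    """Mask a secret, leaving at most the last `show_last` characters visible."""
    if value is None:
        return None
    # Never reveal more than half of the value, so short secrets stay masked
    visible = min(show_last, len(value) // 2)
    return '*' * (len(value) - visible) + value[len(value) - visible:]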

consumer_key = os.environ.get('CONSUMER_KEY')
consumer_secret_key = os.environ.get('CONSUMER_SECRET_KEY')
ops_api_url = os.environ.get('OPS_API_URL')

print(f"ConsumerKey: {mask_value(consumer_key)}")
print(f"ConsumerSecret: {mask_value(consumer_secret_key)}")
print(f"OpsApiUrl: {ops_api_url}")

ConsumerKey: **************************************0aQizgAP8M
ConsumerSecret: ******************************************************DdvfLjVjV0
OpsApiUrl: https://ops.epo.org/3.2


In [2]:
DEBUG = os.environ.get('DEBUG', 'false').lower() == 'true'
if DEBUG:
    print("Debug mode is enabled.")
else:
    print("Debug mode is disabled.")

Debug mode is enabled.


## Use the OPS API

In this second part, we use the OPS API to retrieve data. The goal is first to retrieve the description and claims of a single patent, then to automate the process to get all the patents published between two dates. The output is first written in JSON format; later, the data will be used to create a PostgreSQL database.

### Retrieve the description and claims of a patent

In [3]:
import requests
import base64

def get_access_token(api_url: str, consumer_key: str, consumer_secret_key: str) -> str:
    """Get access token from Ops API.

    Args:
        api_url (str): The Ops API URL.
        consumer_key (str): The consumer key for authentication.
        consumer_secret_key (str): The consumer secret key for authentication.
    Returns:
        str: The access token.
    Raises:
        Exception: If the request fails or the access token is not found.
    """
    
    # Encode the consumer key and secret key in base64
    base_64_encoded = base64.b64encode(bytes(f"{consumer_key}:{consumer_secret_key}", 'utf-8')).decode('utf-8')
    
    
    url = f"{api_url}/auth/accesstoken"
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Authorization': f'Basic {base_64_encoded}'
    }
    data = {
        'grant_type': 'client_credentials'
    }
    
    try:
        # Make the request to get the access token
        response = requests.post(url, headers=headers, data=data)
        response.raise_for_status()  # Raise an error for bad responses
        
        # Extract the access token from the response
        access_token = response.json().get('access_token')
        if not access_token:
            raise ValueError("Access token not found in the response.")
        
        return access_token
    
    except requests.exceptions.RequestException as e:
        raise Exception(f"Request failed: {e}")

access_token = get_access_token(ops_api_url, consumer_key, consumer_secret_key)
print(f"AccessToken: {mask_value(access_token)}")

AccessToken: ******************5Igv4seLWt
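
Access tokens expire, so a long-running retrieval may need to request a new one partway through. The sketch below caches a token and refreshes it shortly before an assumed expiry. The `TokenCache` class is hypothetical, and the 20-minute default lifetime is an assumption that should be checked against the OPS documentation (or replaced by the lifetime reported in the token response, if any).

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        lifetime_s: float = 1200.0,  # assumed token lifetime, not taken from the API
        margin_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch_token = fetch_token
        self._lifetime_s = lifetime_s
        self._margin_s = margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one when it is about to expire."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin_s:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime_s
        return self._token
```

With the function defined above, this could be used as `cache = TokenCache(lambda: get_access_token(ops_api_url, consumer_key, consumer_secret_key))`, calling `cache.get()` before each request.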


In [4]:
def get_patent_description(api_url: str, access_token: str, type: str = "publication", format: str = "epodoc", number: str = "EP1000000") -> list[str]:
    """Get the description of a patent from the Ops API.

    Args:
        api_url (str): The Ops API URL.
        access_token (str): The access token for authentication.
        type (str): Reference type (application, priority, publication).
        format (str): The format of the patent data (docdb, epodoc).
        number (str): The patent number.
    Returns:
        list[str]: The paragraphs of the patent description.
    Raises:
        Exception: If the request fails or the patent data is not found.
    """
    
    url = f"{api_url}/rest-services/published-data/{type}/{format}/{number}/description"
    headers = {
        'Authorization': f'Bearer {access_token}',
        'Accept': 'application/json'
    }
    
    try:
        # Make the request to get the patent data
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # Raise an error for bad responses
        
        # Extract the patent data from the response
        patent_data = response.json()
        if not patent_data:
            raise ValueError("Patent data not found in the response.")
        
        # Extract only the description from the patent data
        description_data = patent_data.get("ops:world-patent-data", {}).get("ftxt:fulltext-documents", {}).get("ftxt:fulltext-document", {}).get("description", {}).get("p", [])
        
        # Ensure the description is a list of strings
        if isinstance(description_data, dict):
            description_data = [description_data]
        
        description = [p["$"] for p in description_data if "$" in p]
        
        return description
    
    except requests.exceptions.RequestException as e:
        raise Exception(f"Request failed: {e}")
    except KeyError as e:
        raise Exception(f"Unexpected response structure: {e}")
    
patent_data = get_patent_description(ops_api_url, access_token, type="publication", format="docdb", number="WO2023028077")
print(f"PatentDescription: {patent_data[:5]}")

PatentDescription: ['SODIUM CHANNEL INHIBITORS AND METHODS OF DESIGNING SAME CLAIM OF PRIORITY [0001] This application claims the benefit of priority under 35 U.S.C. § 119(e) to U.S. provisional application serial no.63/236,594, filed on August 24, 2021, which application is incorporated herein by reference. BACKGROUND [0002] The present invention relates to organic compounds useful for therapy in a mammal, particularly a human, and in particular to inhibitors of sodium channel (e.g., NaV1.7) that are useful for treating sodium channel-mediated diseases or conditions, such as pain, as well as other diseases and conditions associated with the modulation of sodium channels. The invention further includes methods of designing organic compounds that inhibit the NaV1.7 channel based on atom-resolution structures thereof, such as obtained by cryogenic electron microscopy (“Cryo- EM”, or “cryoEM”). [0003] Voltage-gated sodium channels are transmembrane proteins that initiate action potentials

In [5]:
def get_patent_claims(api_url: str, access_token: str, type: str = "publication", format: str = "epodoc", number: str = "EP1000000") -> list[str]:
    """Get patent claims from Ops API.

    Args:
        api_url (str): The Ops API URL.
        access_token (str): The access token for authentication.
        type (str): Reference type (application, priority, publication).
        format (str): The format of the patent data (docdb, epodoc).
        number (str): The patent number.
    Returns:
        list[str]: A list of patent claims in the specified format.
    Raises:
        Exception: If the request fails or the patent claims are not found.
    """
    
    url = f"{api_url}/rest-services/published-data/{type}/{format}/{number}/claims"
    headers = {
        'Authorization': f'Bearer {access_token}',
        'Accept': 'application/json'
    }
    
    try:
        # Make the request to get the patent claims
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # Raise an error for bad responses
        
        # Extract the patent claims from the response
        patent_data = response.json()
        if not patent_data:
            raise ValueError("Patent data not found in the response.")
        
        # Extract only the claims from the patent data
        claims_data = patent_data.get("ops:world-patent-data", {}).get("ftxt:fulltext-documents", {}).get("ftxt:fulltext-document", {}).get("claims", {}).get("claim", {}).get("claim-text", [])
        
        # Ensure claims_data is a list
        if isinstance(claims_data, dict):  # Single claim case
            claims_data = [claims_data]
        
        # Extract the claim text
        claims_text = [claim.get("$", "") for claim in claims_data]
        
        return claims_text
    
    except requests.exceptions.RequestException as e:
        raise Exception(f"Request failed: {e}")
    except KeyError as e:
        raise Exception(f"Unexpected response structure: {e}")
    
patent_claims = get_patent_claims(ops_api_url, access_token, type="publication", format="docdb", number="WO2023237174")
print(f"PatentClaims: {patent_claims}")

PatentClaims: ['CLAIMS\n1 . A method for controlling a yaw motion of a vehicle (1 ), the vehicle (1 ) comprising a first set of steerable wheels (10) and a second set of wheels (20), the vehicle (1 ) comprising a first set of motion support devices (11 ) for controlling a movement of said first set of steerable wheels (10), and at least one second motion support device (21 ) for controlling a movement of said second set of wheels (20), whereby each motion support device in said first set of motion support devices (11 ) is drivingly connected, directly or indirectly, to a respective individual wheel of the first set of steerable wheels (10), such that each motion support device out of the first set of motion support devices (11 ) can produce a load via the respective individual wheel, the method comprising:\n- obtaining (301 ) an indication of a desired yaw motion to be applied by the vehicle (1 ),\n- determining (302) whether or not the desired yaw motion is obtainable by the first set
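
Both functions above handle the same quirk of the JSON responses: an element that occurs once arrives as a dict, while repeated elements arrive as a list. A small helper can centralize that normalization. This is a minimal sketch; the names `as_list` and `extract_texts` are hypothetical and not part of the notebook.

```python
def as_list(node):
    """Return `node` as a list: None -> [], list -> unchanged, anything else -> [node]."""
    if node is None:
        return []
    if isinstance(node, list):
        return node
    return [node]


def extract_texts(nodes) -> list[str]:
    """Collect the "$" text of each dict node, skipping nodes that carry no text."""
    return [n["$"] for n in as_list(nodes) if isinstance(n, dict) and "$" in n]
```

For example, `extract_texts(description_data)` would cover both the single-paragraph and the multi-paragraph case.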

In [15]:
def get_patent_biblio(api_url: str, access_token: str, type: str = "publication", format: str = "epodoc", number: str = "EP1000000") -> dict:
    """Get patent bibliographic data from Ops API.

    Args:
        api_url (str): The Ops API URL.
        access_token (str): The access token for authentication.
        type (str): Reference type (application, priority, publication).
        format (str): The format of the patent data (docdb, epodoc).
        number (str): The patent number.
    Returns:
        dict: The patent bibliographic data in the specified format.
        The dictionary contains the patent number, title, and country code.
        The title is a dictionary with language codes as keys and title text as values.
        The country code is extracted from the first applicant name.
    Example:
        {
            "number": "EP1000000",
            "title": {
                "en": "Example Title in English",
                "fr": "Titre d'exemple en français"
            },
            "country": "EP"
        }
    Raises:
        Exception: If the request fails or the bibliographic data is not found.
    """
    
    if DEBUG:
        print(f"DEBUG: get_patent_biblio: api_url={api_url}, access_token={mask_value(access_token)}, type={type}, format={format}, number={number}")
    
    url = f"{api_url}/rest-services/published-data/{type}/{format}/{number}/biblio"
    headers = {
        'Authorization': f'Bearer {access_token}',
        'Accept': 'application/json'
    }
    
    try:
        # Make the request to get the patent bibliographic data
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # Raise an error for bad responses
        
        # Extract the bibliographic data from the response
        patent_data = response.json()
        if not patent_data:
            raise ValueError("Patent data not found in the response.")
        
        # Extract the first exchange document (a single document arrives as a dict, several as a list)
        exchange_documents = patent_data.get("ops:world-patent-data", {}).get("exchange-documents", {}).get("exchange-document", [])
        if isinstance(exchange_documents, dict):
            exchange_documents = [exchange_documents]
        first_exchange_document = exchange_documents[0] if exchange_documents else {}
        bibliographic_data = first_exchange_document.get("bibliographic-data", {})

        # Extract the country code (assume that it is the code in brackets given at the end of the first applicant name)
        country_code = None
        applicants = bibliographic_data.get("parties", {}).get("applicants", {}).get("applicant", [])
        if applicants:
            first_applicant = applicants[0] if isinstance(applicants, list) else applicants
            applicant_name = first_applicant.get("applicant-name", {}).get("name", {}).get("$", "")

            # Extract the country code from the applicant name
            if applicant_name:
                country_code = applicant_name.split()[-1].strip("[]")

        # Extract the patent titles (a single title may arrive as a dict rather than a list)
        invention_titles = bibliographic_data.get("invention-title", [])
        if isinstance(invention_titles, dict):
            invention_titles = [invention_titles]
        titles = {}
        for title in invention_titles:
            lang = title.get("@lang")
            title_text = title.get("$")
            if lang and title_text:
                titles[lang] = title_text
        
        
        return {
            "number": number,
            "title": titles,
            "country": country_code
        }
    
    except requests.exceptions.RequestException as e:
        raise Exception(f"Request failed: {e}")
    except KeyError as e:
        raise Exception(f"Unexpected response structure: {e}")
    
patent_biblio = get_patent_biblio(ops_api_url, access_token, type="publication", format="docdb", number="EP4113390A2")
print(f"PatentBiblio: {patent_biblio}")

DEBUG: get_patent_biblio: api_url=https://ops.epo.org/3.2, access_token=******************5Igv4seLWt, type=publication, format=docdb, number=EP4113390A2
PatentBiblio: {'number': 'EP4113390A2', 'title': {'de': 'VERFAHREN ZUR DATENVERARBEITUNG, UND ELEKTRONISCHE VORRICHTUNG', 'fr': 'PROCEDE DE TRAITEMENT DES DONNEES, ET DISPOSITIF ELECTRONIQUE', 'en': 'METHOD FOR PROCESSING DATA, AND ELECTRONIC DEVICE'}, 'country': 'CN'}
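
The country code is taken from the last whitespace-separated token of the first applicant name, which silently returns an ordinary word when the name has no bracketed suffix. A stricter variant, sketched here under the assumption that the code is always two uppercase letters in square brackets at the end of the name, returns `None` instead. The function name and the sample names in the usage note are hypothetical.

```python
import re
from typing import Optional

# Two uppercase letters in square brackets at the very end of the name
_COUNTRY_SUFFIX = re.compile(r"\[([A-Z]{2})\]\s*$")


def country_from_applicant(name: Optional[str]) -> Optional[str]:
    """Return the bracketed country code at the end of `name`, or None if absent."""
    match = _COUNTRY_SUFFIX.search(name or "")
    return match.group(1) if match else None
```

For instance, `country_from_applicant("ACME CORP [CN]")` yields `"CN"`, whereas `country_from_applicant("ACME CORP")` yields `None`.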


At this point, we can obtain an access token for the OPS API and retrieve the description, claims, and bibliographic data of a patent. Next, we want to get all patents published between two dates.
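
The `get_patents` function defined in the next cell searches a single publication date. To cover a period between two dates, one option is to generate each day in the range and call it once per day. The generator below is a minimal sketch of that idea; `iter_dates` is a hypothetical helper, and the dates use the same `YYYYMMDD` format as the `date` argument of `get_patents`.

```python
from datetime import datetime, timedelta
from typing import Iterator


def iter_dates(start: str, end: str) -> Iterator[str]:
    """Yield every day from `start` to `end` (inclusive), formatted as YYYYMMDD."""
    first = datetime.strptime(start, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    if last < first:
        raise ValueError("end date must not be earlier than start date")
    current = first
    while current <= last:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)
```

A loop such as `for day in iter_dates("20230104", "20230106"): ...` could then call `get_patents` with `day` and write one JSON file per date.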

In [22]:
def get_patents(api_url: str, consumer_key: str, consumer_secret_key: str, date: str, patent_type: str = "publication") -> list[dict]:
    """Get patents from Ops API based on the given date and type.
    Args:
        api_url (str): The Ops API URL.
        consumer_key (str): The consumer key for authentication.
        consumer_secret_key (str): The consumer secret key for authentication.
        date (str): The date to search for patents (YYYYMMDD).
        patent_type (str): The type of patent to search for (application, priority, publication).
    Returns:
        list[dict]: A list of patents with detailed information.
    """
    
    # Get the access token
    access_token = get_access_token(api_url, consumer_key, consumer_secret_key)
    
    headers = {
        'Authorization': f'Bearer {access_token}',
        'Accept': 'application/json'
    }
    
    total_count = 2000
    first_range = 1
    last_range = 100
    patents = []
    
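    # Loop while a range remains to be fetched (comparing last_range instead would skip the final page)
    while first_range <= total_count: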
        # Construct the URL for the patent search
        url = f"{api_url}/rest-services/published-data/search?Range={first_range}-{last_range}&q=pd=\"{date}\" and pn any \"EP\""
    
        try:
            # Make the request to get the patents matching the criteria
            response = requests.get(url, headers=headers)
            response.raise_for_status()  # Raise an error for bad responses
            
            # Extract the patents from the response
            patent_data = response.json()
            if not patent_data:
                raise ValueError("Patent data not found in the response.")
            
            # Extract the total count of patents
            total_count = int(patent_data.get("ops:world-patent-data", {}).get("ops:biblio-search", {}).get("@total-result-count", 0))
            
            if DEBUG:
                print(f"DEBUG: Total count of patents: {total_count}")
                print(f"DEBUG: First range: {first_range}, Last range: {last_range}")
            
            # Update the range for the next request
            total_count = min(total_count, 2000)
            first_range = last_range + 1
            last_range = last_range + 100
            if last_range > total_count:
                last_range = total_count
            
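            # Extract publication references (a single result may arrive as a dict rather than a list)
            publications = patent_data.get("ops:world-patent-data", {}).get("ops:biblio-search", {}).get("ops:search-result", {}).get("ops:publication-reference", [])
            if isinstance(publications, dict):
                publications = [publications]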
            
            # Process each publication
            for publication in publications:
                try:
                    document_id = publication.get("document-id", {})
                    doc_number = document_id.get("doc-number", {}).get("$", "")
                    format = document_id.get("@document-id-type", "")
                    kind = document_id.get("kind", {}).get("$", "")
                    country = document_id.get("country", {}).get("$", "")
                    number = f"{country}{doc_number}{kind}"
                    
                    # Fetch detailed data for each patent
                    biblio = get_patent_biblio(api_url, access_token, type=patent_type, format=format, number=number)
                    description = get_patent_description(api_url, access_token, type=patent_type, format=format, number=number)
                    claims = get_patent_claims(api_url, access_token, type=patent_type, format=format, number=number)
                    
                    # Add the patent to the list
                    patents.append({
                        "number": number,
                        "title": biblio.get("title"),
                        "country": biblio.get("country"),
                        "format": format,
                        "type": patent_type,
                        "publicationDate": f"{date}",
                        "description": description,
                        "claims": claims
                    })
                
                except Exception as e:
                    # Log the error and continue with the next publication
                    print(f"An error occurred while processing patent {number}: {e}")
        
        except Exception as e:
            # Log the error and stop: the range only advances after a successful response,
            # so retrying a failing request unchanged could loop forever
            print(f"An error occurred while processing range {first_range}-{last_range}: {e}")
            break
        
    return patents
    
# Example usage
date = "20230104"
patent_type = "publication"
patents = get_patents(ops_api_url, consumer_key, consumer_secret_key, date, patent_type)
print(f"Patents: {patents[:5]}")  # Print the first 5 patents

DEBUG: Total count of patents: 7039
DEBUG: First range: 1, Last range: 100
DEBUG: get_patent_biblio: api_url=https://ops.epo.org/3.2, access_token=******************6IgM5vPeVW, type=publication, format=docdb, number=EP4113390A2
DEBUG: get_patent_biblio: api_url=https://ops.epo.org/3.2, access_token=******************6IgM5vPeVW, type=publication, format=docdb, number=EP4113399A2
DEBUG: get_patent_biblio: api_url=https://ops.epo.org/3.2, access_token=******************6IgM5vPeVW, type=publication, format=docdb, number=EP4113394A2
DEBUG: get_patent_biblio: api_url=https://ops.epo.org/3.2, access_token=******************6IgM5vPeVW, type=publication, format=docdb, number=EP4113393A2
DEBUG: get_patent_biblio: api_url=https://ops.epo.org/3.2, access_token=******************6IgM5vPeVW, type=publication, format=docdb, number=EP4113956A2
DEBUG: get_patent_biblio: api_url=https://ops.epo.org/3.2, access_token=******************6IgM5vPeVW, type=publication, format=docdb, number=EP4113398A2
DEBUG: 

In [None]:
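# Save the patents to a file
import json

os.makedirs("outputs", exist_ok=True)  # make sure the output directory exists
output_file = f"outputs/patents_{date}.json"
with open(output_file, 'w', encoding='utf-8') as f:
    # ensure_ascii=False keeps non-ASCII characters (e.g. accented titles) readable
    json.dump(patents, f, indent=4, ensure_ascii=False)
print(f"Patents saved to {output_file}")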

Patents saved to patents_20230104.json
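
As a possible bridge to the PostgreSQL step mentioned earlier, each saved record can be flattened into a single row with scalar values. This is only a sketch: the function `patent_to_row` and the column names are hypothetical, and keeping only the English title is an illustrative choice, not something the notebook prescribes.

```python
def patent_to_row(patent: dict) -> dict:
    """Flatten one patent record (as built by get_patents) into scalar columns."""
    titles = patent.get("title") or {}
    return {
        "number": patent.get("number"),
        "country": patent.get("country"),
        "publication_date": patent.get("publicationDate"),
        "title_en": titles.get("en"),
        # Paragraphs and claims are stored as lists; join them into one text field
        "description": "\n".join(patent.get("description") or []),
        "claims": "\n".join(patent.get("claims") or []),
    }
```

Applied to the saved file, `rows = [patent_to_row(p) for p in patents]` gives a list of dictionaries that is straightforward to insert into a table.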
