# Classify Patent — v0.1.0

This notebook analyzes the content of a patent and associates each of its text blocks with one of the 17 United Nations Sustainable Development Goals (SDGs), or with a neutral fallback when no goal applies.
The process has two main stages:

1. **Classify Description**: Assign each text block in the patent to one of the 17 SDGs or mark it as neutral.
2. **Generate Summary**: For each identified SDG, extract relevant text blocks and generate a concise summary.


## Requirements

Run this notebook from the project's environment. To install the required packages, run:

```bash
poetry install --with dev
```

Finally, configure the Ollama host address in `backend/src/api/config/config.yaml`. A `config-example.yaml` file sits in the same directory: copy it to `config.yaml` and fill in your Ollama host address.
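The copy step can also be scripted. The following is a minimal sketch, not part of the project: `ensure_config` is a hypothetical helper that copies the example file only when `config.yaml` does not exist yet, so an existing configuration is never overwritten.

```python
import shutil
from pathlib import Path


def ensure_config(config_dir: Path) -> Path:
    """Copy config-example.yaml to config.yaml if config.yaml is missing."""
    example = config_dir / "config-example.yaml"
    target = config_dir / "config.yaml"
    if not target.exists():
        # Keep any existing config.yaml untouched.
        shutil.copyfile(example, target)
    return target
```

Assuming the notebook is run from the repository root, the call would be `ensure_config(Path("backend/src/api/config"))`. The Ollama host address still has to be filled in by hand afterwards.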

## 1. Classify Description

In this section, we classify the patent's description blocks with a zero-shot classification approach, using one of [HuggingFace's zero-shot classification models](https://huggingface.co/models?pipeline_tag=zero-shot-classification&sort=trending).

The classification model evaluates the semantic content of each block and assigns it to one of the following:
- **17 Sustainable Development Goals (SDGs)**  
- **Neutral**, if no SDG alignment is detected


### 🔹 Step 1.1. Load a Patent

We begin by retrieving a full patent from our data source.

The patent is loaded into a `FullPatent` object.

In [1]:
from api.services.patent_service import get_full_patent_by_number

# Retrieve a full patent by its number
full_patent = get_full_patent_by_number("EP3653777A1")

### 🔹 Step 1.2. Load the SDG Classifier

We use HuggingFace's `facebook/bart-large-mnli` zero-shot classification model to associate text blocks with one of the 17 SDGs.

In [2]:
from transformers import pipeline

# Initialize the classifier
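# Name the task explicitly so the pipeline type does not depend on model metadata
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")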

# Dict of SDG candidate labels
sdg_labels_dict = {
    "SDG1": "End poverty in all its forms everywhere", 
    "SDG2": "End hunger, achieve food security and improved nutrition and promote sustainable agriculture", 
    "SDG3": "Ensure healthy lives and promote well-being for all at all ages", 
    "SDG4": "Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all", 
    "SDG5": "Achieve gender equality and empower all women and girls", 
    "SDG6": "Ensure availability and sustainable management of water and sanitation for all", 
    "SDG7": "Ensure access to affordable, reliable, sustainable and modern energy for all", 
    "SDG8": "Promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all", 
    "SDG9": "Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation", 
    "SDG10": "Reduce inequality within and among countries", 
    "SDG11": "Make cities and human settlements inclusive, safe, resilient and sustainable", 
    "SDG12": "Ensure sustainable consumption and production patterns", 
    "SDG13": "Take urgent action to combat climate change and its impacts", 
    "SDG14": "Conserve and sustainably use the oceans, seas and marine resources for sustainable development", 
    "SDG15": "Protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, and halt and reverse land degradation and halt biodiversity loss", 
    "SDG16": "Promote peaceful and inclusive societies for sustainable development, provide access to justice for all and build effective, accountable and inclusive institutions at all levels", 
    "SDG17": "Strengthen the means of implementation and revitalize the Global Partnership for Sustainable Development"
}

def get_sdg_code_from_label(label: str, label_dict: dict) -> str:
    """Reverse lookup SDG code from full label text."""
    for code, text in label_dict.items():
        if label == text:
            return code
    return "None"

candidate_label_values = list(sdg_labels_dict.values())

Device set to use cpu
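`get_sdg_code_from_label` scans the dictionary on every call. Since the label texts are unique, an inverted dictionary gives the same answer with a single lookup. Below is a minimal sketch using a shortened, two-entry dictionary; `make_code_lookup` is a hypothetical name, not part of the project.

```python
def make_code_lookup(label_dict: dict[str, str]):
    """Build a label-text -> SDG-code lookup that falls back to "None"."""
    # Invert the mapping once: full label text becomes the key.
    text_to_code = {text: code for code, text in label_dict.items()}

    def lookup(label: str) -> str:
        return text_to_code.get(label, "None")

    return lookup


# Two entries copied from sdg_labels_dict, for brevity.
labels = {
    "SDG1": "End poverty in all its forms everywhere",
    "SDG5": "Achieve gender equality and empower all women and girls",
}
lookup = make_code_lookup(labels)
print(lookup("End poverty in all its forms everywhere"))  # SDG1
print(lookup("unknown label"))                            # None
```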


### 🔹 Step 1.3. Classify Full Patent Descriptions

We classify each description block of the `FullPatent` object.

The top SDG label is assigned to the `sdg` field of each `Description` object.
If the top score falls below `threshold`, we assign `"None"` instead. The default `threshold` of 0 accepts every prediction, so no block is marked neutral unless a higher value is passed.
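The zero-shot pipeline returns a dictionary whose `labels` and `scores` lists are sorted by descending score, so the top prediction sits at index 0. The selection rule can be sketched without loading the model. The result dictionary below is made up for illustration, and `pick_label` is a hypothetical helper, not part of the project.

```python
def pick_label(result: dict, threshold: float) -> tuple[str, float]:
    """Return (label, score) for the top prediction, or ("None", -1.0) below threshold."""
    top_label = result["labels"][0]
    top_score = result["scores"][0]
    if top_score >= threshold:
        return top_label, top_score
    return "None", -1.0


# Made-up result in the shape returned by the zero-shot pipeline.
fake_result = {
    "sequence": "A tub for a washing machine.",
    "labels": ["label A", "label B", "label C"],
    "scores": [0.5, 0.3, 0.2],
}

print(pick_label(fake_result, threshold=0.0))  # ('label A', 0.5)
print(pick_label(fake_result, threshold=0.6))  # ('None', -1.0)
```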


In [None]:
from tqdm.notebook import tqdm
from api.models.Patent import FullPatent

def classify_full_patent_description(patent: FullPatent,
                                     classifier=classifier,
                                     candidate_labels=candidate_label_values,
                                     label_dict=sdg_labels_dict,
                                     threshold: float = 0) -> FullPatent:
    """
    Classify every description block in a FullPatent and enrich it with an SDG label.

    Args:
        patent (FullPatent): The patent to analyze.
        classifier: HuggingFace classifier.
        candidate_labels (list): SDG label texts.
        label_dict (dict): Map from SDG code to label text.
        threshold (float): Minimum score to accept prediction.

    Returns:
        FullPatent: Enriched object.
    """

    for desc in tqdm(patent.description, desc=f"Classifying {len(patent.description)} descriptions"):
        try:
            result = classifier(desc.description_text, candidate_labels)
            top_score = result["scores"][0]

            if top_score >= threshold:
                label_text = result["labels"][0]
                label_code = get_sdg_code_from_label(label_text, label_dict)
                score = top_score
            else:
                label_code = "None"
                score = -1.0

            desc.sdg = label_code

            print(f"[{desc.description_number}] Label: {desc.sdg} | Score: {score:.3f} | Text: {desc.description_text}")

        except Exception as e:
            print(f"Error on description {desc.description_number}: {e}")
            desc.sdg = "Error"

    patent.is_analyzed = True
    return patent

### 🔹 Step 1.4. Run Classification on the Patent and Update the Database

We now apply the classifier to enrich the `FullPatent` with SDG labels.

In [5]:
# Run the classification
full_patent = classify_full_patent_description(full_patent)

Classifying 298 descriptions:   0%|          | 0/298 [00:00<?, ?it/s]

[1] Label: SDG12 | Score: 0.166 | Text: The present disclosure relates to a tub for a washing machine and a washing machine having the same.
[2] Label: SDG9 | Score: 0.130 | Text: In general, a washing machine is a home appliance for removing contaminants on clothes, bedding, or the like (hereinafter referred to as laundry) through processes such as washing, rinsing, dehydrating, drying, and the like, using water, detergent, a mechanical action, and the like.
[3] Label: SDG12 | Score: 0.149 | Text: Such washing machine may include a cabinet forming an outer shape of the washing machine, a tub installed inside the cabinet, a drum rotatably installed inside the tub and having a plurality of through-holes through which washing water or foam flows in and out, and a motor installed in the tub to rotate the drum. A rotation shaft of the motor may pass through one side of the tub to be connected to the drum.
[4] Label: SDG9 | Score: 0.129 | Text: The tub may define a washing space therein for
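By default the pipeline normalizes the scores across all candidate labels, so with 17 labels even the top score can be small, as in the output above. Before choosing a threshold it can help to see how the assigned labels are distributed. Below is a minimal sketch with stand-in objects; `label_distribution` is a hypothetical helper that only relies on the `sdg` attribute.

```python
from collections import Counter
from types import SimpleNamespace


def label_distribution(descriptions) -> Counter:
    """Count how many description blocks were assigned to each SDG code."""
    return Counter(desc.sdg for desc in descriptions)


# Stand-ins for Description objects; only the `sdg` attribute is used here.
blocks = [SimpleNamespace(sdg=code) for code in ["SDG12", "SDG9", "SDG12", "SDG9"]]
for code, count in label_distribution(blocks).most_common():
    print(f"{code}: {count}")
```

On the real data the call would be `label_distribution(full_patent.description)`.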

In [7]:
from api.repositories.patent_repository import update_full_patent

# Update the Patent in Database
update_full_patent(full_patent.model_dump())

## 2. Generate Summary

Once the relevant SDGs are identified, we group all associated text blocks under each detected SDG.

For each SDG:
- The collected text blocks are aggregated.
- A concise and coherent summary is generated using a language model, providing a high-level overview of the patent’s contributions to that SDG.

The result is a quick overview of how the patent relates to each detected SDG.
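The grouping and aggregation steps can be sketched in isolation. The objects below are stand-ins that carry only the two attributes used here (`sdg` and `description_text`), and `group_by_sdg` is a hypothetical helper, not part of the project.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_sdg(descriptions, skip=("None", "Error")) -> dict:
    """Map each SDG code to the list of description texts labelled with it."""
    groups = defaultdict(list)
    for desc in descriptions:
        if desc.sdg not in skip:
            groups[desc.sdg].append(desc.description_text)
    return dict(groups)


blocks = [
    SimpleNamespace(sdg="SDG6", description_text="first block"),
    SimpleNamespace(sdg="None", description_text="unlabelled block"),
    SimpleNamespace(sdg="SDG6", description_text="second block"),
    SimpleNamespace(sdg="SDG12", description_text="third block"),
]
groups = group_by_sdg(blocks)

# Aggregate each group into one text, one block per line.
combined = {code: "\n".join(texts) for code, texts in groups.items()}
print(combined["SDG6"])
```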

### 🔹 Step 2.1. Generate Full Patent Summaries

We generate one summary for each SDG detected in the `FullPatent` object.

In [8]:
from api.models.SDGSummary import SDGSummary
from api.repositories.sdg_summary_repository import create_sdg_summary

def generate_summary(patent: FullPatent,
                     ai_client,
                     ai_model: str,
                     sdg_labels_dict: dict) -> list[SDGSummary]:
    """
    Generate SDG justification summaries for a FullPatent.

    This function groups description blocks by their assigned SDG labels 
    (excluding "None" or "Error"), then uses an AI model to generate a 
    summary explaining how the content supports the respective SDG.

    Args:
        patent (FullPatent): The patent object containing the description blocks.
        ai_client: AI client used to generate summaries.
        ai_model (str): The identifier of the AI model to be used.
        sdg_labels_dict (dict): Mapping from SDG codes to their textual descriptions.

    Returns:
        list[SDGSummary]: A list of SDGSummary objects created for the patent.
    """

    summaries = []

    # Group descriptions by SDG label, excluding "None" and "Error"
    sdg_to_descriptions = {}
    for desc in patent.description:
        if desc.sdg not in ["None", "Error"]:
            sdg_to_descriptions.setdefault(desc.sdg, []).append(desc)

    for sdg_code, descriptions in tqdm(sdg_to_descriptions.items(), desc=f"Generating summaries for patent {patent.number}"):
        try:
            sdg_description = sdg_labels_dict[sdg_code]
            combined_text = "\n".join(desc.description_text for desc in descriptions)

            system_prompt = f"""
            You are an AI specialized in sustainable development and patents. Read the following patent excerpt and explain how it contributes to this Sustainable Development Goal (SDG): {sdg_code} - {sdg_description}.

            Focus on:
            - The main innovation or idea in the patent.
            - How it supports the targets of the SDG.
            - Any positive impact (social, environmental, or economic) it may have.

            Patent text:
            {combined_text}

            Write a short, clear summary showing the link between the patent and the SDG.
            """

            user_prompt = f"""
            Summarize how this patent helps achieve the SDG: {sdg_code} - {sdg_description}."""

            response = ai_client.chat(
                model=ai_model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt}],
                options={"num_predict": 512}
            )

            summary_text = response["message"]["content"].strip()

            summary = SDGSummary(
                patent_number=str(patent.number),
                sdg=str(sdg_code),
                sdg_description=summary_text
            )

            create_sdg_summary(summary.model_dump())
            summaries.append(summary)

            print(f"[{patent.number}] SDG: {sdg_code} | Summary:\n{summary_text}\n")

        except Exception as e:
            print(f"Error generating summary for SDG {sdg_code} in patent {patent.number}: {e}")

    return summaries
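Because the prompts in `generate_summary` are indented triple-quoted strings, the leading whitespace of each line is sent to the model as well. One way to avoid that is to dedent a template first and substitute the values afterwards. This is a minimal sketch of that idea, not the project's implementation; `build_messages` and the two template constants are hypothetical names.

```python
from textwrap import dedent

# Dedent the template before substituting values, so that multi-line
# patent text cannot interfere with the indentation detection.
SYSTEM_TEMPLATE = dedent("""\
    You are an AI specialized in sustainable development and patents. Read the following patent excerpt and explain how it contributes to this Sustainable Development Goal (SDG): {sdg_code} - {sdg_description}.

    Focus on:
    - The main innovation or idea in the patent.
    - How it supports the targets of the SDG.
    - Any positive impact (social, environmental, or economic) it may have.

    Patent text:
    {combined_text}

    Write a short, clear summary showing the link between the patent and the SDG.""")

USER_TEMPLATE = "Summarize how this patent helps achieve the SDG: {sdg_code} - {sdg_description}."


def build_messages(sdg_code: str, sdg_description: str, combined_text: str) -> list[dict]:
    """Return the system and user messages for one SDG."""
    system = SYSTEM_TEMPLATE.format(
        sdg_code=sdg_code,
        sdg_description=sdg_description,
        combined_text=combined_text,
    )
    user = USER_TEMPLATE.format(sdg_code=sdg_code, sdg_description=sdg_description)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

The returned list has the same shape as the `messages` argument passed to `ai_client.chat` above.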


### 🔹 Step 2.2. Run Summarization on the Patent and Update the Database

In [9]:
from api.config.ai_config import ai_model, ai_client 

generate_summary(full_patent, ai_client, ai_model, sdg_labels_dict)

Generating summaries for patent EP3653777A1:   0%|          | 0/8 [00:00<?, ?it/s]

2025-05-15 18:24:16,889 - httpx - INFO - _send_single_request - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"


[EP3653777A1] SDG: SDG12 | Summary:
The invention relates to a method for generating and using renewable energy based on solar radiation, which has particular advantages in achieving the sustainable use of resources.

SDG 12 aims at ensuring sustained consumption and production without degrading natural systems or causing environmental degradation. The invention's aim is to contribute towards SDG 12 by utilizing renewable energy sources effectively and sustainably. This can be achieved through efficient generation, utilization, and distribution of solar energy that reduces reliance on non-renewable resources. By reducing the need for fossil fuels, this invention supports a lower carbon footprint in its production and consumption patterns, helping to mitigate climate change effects while also promoting sustainable resource use.

Moreover, by generating and utilizing renewable energy from sources like solar radiation efficiently, it contributes towards SDG 12's goal of sustainable consum

2025-05-15 18:24:23,795 - httpx - INFO - _send_single_request - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"


[EP3653777A1] SDG: SDG9 | Summary:
This invention relates to a method of detecting defects in a magnetic stripe on an identification card or other object.

The invention has the following potential impact on global access to banking services:

1) This method enables detection of defects that can affect user safety.
2) It can be used for high-speed screening at bank ATMs and other financial establishments. 
3) It is intended for use with existing magnetic stripe cards without requiring any changes in card manufacturing or design.

4) The invention reduces the need to manually search through a pile of cards, thereby improving work efficiency.
5) This detection method can be incorporated into automatic identification systems used for checking deposits and withdrawals at banks.



2025-05-15 18:25:10,294 - httpx - INFO - _send_single_request - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"


[EP3653777A1] SDG: SDG6 | Summary:
This patent invention focuses on enhancing the stability and sealing integrity of a washing machine's tub, particularly in preventing leakage of water between its first case and second case. This is crucial to achieving SDG6 as it ensures that water resources are managed sustainably. The improved design facilitates better integration of components within the main body while maintaining structural integrity, thereby maximizing both the size of the drum and overall capacity for washing tasks.

Moreover, by reducing leakage issues, the patent indirectly contributes to improving hygiene standards by minimizing the accumulation of contaminants in the tub, which can lead to better water quality. This reduction also helps prevent health risks associated with improper sanitation practices, thus directly supporting SDG6 through improved access to clean drinking water and sanitation facilities.

The enhanced sealing capability ensures a more efficient cycle by 

2025-05-15 18:25:31,136 - httpx - INFO - _send_single_request - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"


[EP3653777A1] SDG: SDG15 | Summary:
The present application discloses a system for automatic identification of environmental change in vegetation monitoring which can detect forest disturbances, fires, deforestation, water quality changes, etc. The system is based on the analysis of spectral information from satellites or ground-based sensors that are used to classify land use and land cover types such as forests and grasslands, but also non-target classes (e.g., human activities, trees with diseases, etc.) which can be identified based on specific features extracted through a supervised learning algorithm. The system is designed to provide efficient monitoring of the ecological environment in real time, thus allowing for effective prediction and management of environmental changes that could lead to biodiversity loss.

The application describes how this system would help achieve SDG15 - Protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests,

2025-05-15 18:26:46,897 - httpx - INFO - _send_single_request - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"


[EP3653777A1] SDG: SDG4 | Summary:
The patent disclosed in FIGS. 1-11 describes a novel design for a washing machine's internal configuration, particularly focusing on improving the structural integrity and durability of the parts that connect the front case (first case) to the rear case (second case). This innovation contributes to ensuring inclusive and equitable quality education by:

1. **Structural Integrity**: The patent addresses the need for reliable joint connections between components, which is essential in maintaining the functionality and longevity of the washing machine. This ensures that the parts can withstand repeated use without compromising their structural integrity or causing premature failure.

2. **Durability and Longevity**: By enhancing the welding structure around key components (such as coupling protrusions), this design aims to reduce the likelihood of these areas experiencing failures, thereby extending the overall lifespan of the washing machine. This aspec

2025-05-15 18:27:08,939 - httpx - INFO - _send_single_request - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"


[EP3653777A1] SDG: SDG16 | Summary:
The patent excerpt describes a design feature of a first case or casing that allows for only one main coupling protrusion 321 in the vertically straight portion 311 on both left and right sides as well as on the upper side of the first case 300. This innovation is likely intended to contribute positively towards achieving SDG 16 by providing access to justice for all and building effective, accountable, and inclusive institutions at all levels.

The main idea revolves around simplifying or streamlining the design process while ensuring that there are no unnecessary complexities in certain areas of the case (such as vertical straight portions). This could potentially reduce costs associated with manufacturing, maintenance, or compliance requirements related to legal documents. By focusing on a key feature rather than a broad design overhaul, it addresses one specific element that may have been overlooked during previous iterations of the design.

This

2025-05-15 18:27:17,605 - httpx - INFO - _send_single_request - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"


[EP3653777A1] SDG: SDG3 | Summary:
The patent excerpt describes front perspective views of a second case, indicating that it provides visual representation or illustration related to product design or manufacturing processes, which can contribute positively towards promoting health and well-being in the context of ensuring healthy lives and promoting well-being for all at all ages (SDG 3). The specific innovation illustrated is not detailed here but suggests potential contributions could include improved accessibility, ergonomics, or user-friendly designs that enhance overall quality of life.



2025-05-15 18:27:34,461 - httpx - INFO - _send_single_request - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"


[EP3653777A1] SDG: SDG13 | Summary:
This patent is focused on improving the design of a coupling system, specifically for a flash drum (like those used in the production of semiconductors). The main innovation involves modifying the constraining protrusion to be positioned outside the coupling protrusion rather than inside it.

### How It Supports SDG13:

The key contribution of this patent is its focus on reducing emissions and improving environmental impact through more efficient heat management systems. By preventing flash from escaping out of the tub through spaces between surfaces, this innovation helps reduce energy waste and overall operational efficiency within a production process that relies heavily on heat transfer.

### Positive Impact:

- **Environmental:** Reduces unnecessary energy consumption by minimizing heat losses through uncontrolled emission.
- **Sustainable Development:** Helps decrease carbon footprint associated with excessive material usage and waste disposal.

[SDGSummary(patent_number='EP3653777A1', sdg='SDG12', sdg_description="The invention relates to a method for generating and using renewable energy based on solar radiation, which has particular advantages in achieving the sustainable use of resources.\n\nSDG 12 aims at ensuring sustained consumption and production without degrading natural systems or causing environmental degradation. The invention's aim is to contribute towards SDG 12 by utilizing renewable energy sources effectively and sustainably. This can be achieved through efficient generation, utilization, and distribution of solar energy that reduces reliance on non-renewable resources. By reducing the need for fossil fuels, this invention supports a lower carbon footprint in its production and consumption patterns, helping to mitigate climate change effects while also promoting sustainable resource use.\n\nMoreover, by generating and utilizing renewable energy from sources like solar radiation efficiently, it contributes towa

In [7]:
from api.repositories.sdg_summary_repository import get_sdg_summary_by_patent_number

get_sdg_summary_by_patent_number("EP3653777A1")

[{'patent_number': 'EP3653777A1',
  'sdg': 'SDG12',
  'sdg_description': 'This patent relates to a method for producing carbon neutral hydrogen from natural gas using water vapor as a catalyst, which is beneficial towards achieving Sustainable Development Goal 12 (Ensuring sustainable consumption and production patterns). By reducing greenhouse gas emissions through the efficient utilization of fossil fuels like natural gas, it contributes positively toward environmental preservation. Additionally, by decreasing reliance on imported hydrocarbons and promoting cleaner energy sources that reduce air pollution, this technology facilitates a shift towards more sustainable economic practices.'},
 {'patent_number': 'EP3653777A1',
  'sdg': 'SDG13',
  'sdg_description': 'The patent text addresses a critical issue related to flash discharge in tubular devices, particularly focusing on preventing heat from escaping through an outer surface of a tub. This problem is directly linked to climate cha