# 🧠 Classify Patent — v0.1.0

In this notebook, we automatically analyze the content of a patent and associate its text with the United Nations Sustainable Development Goals (SDGs), with a neutral fallback when no goal applies.
The process is divided into two main stages:

1. **Classify Description**: Assign each text block in the patent to one of the 17 SDGs or mark it as neutral.
2. **Generate Summary**: For each identified SDG, extract relevant text blocks and generate a concise summary.


## Requirements

Make sure you are working in the project's environment. To install the required packages, run:

```bash
poetry install --with dev
```

## 🔍 1. Classify Description

In this section, we classify each description block of the patent with a zero-shot classification approach, using a [Hugging Face zero-shot classification model](https://huggingface.co/models?pipeline_tag=zero-shot-classification&sort=trending).

The classification model evaluates the semantic content of each block and assigns it to one of the following:
- **17 Sustainable Development Goals (SDGs)**  
- **Neutral**, if no SDG alignment is detected


### 🔹 Step 1.1. Load a Patent

We begin by retrieving a full patent from our data source.

The patent is loaded into a `FullPatent` object.

In [1]:
from api.services.patent_service import get_full_patent_by_number

# Retrieve a full patent by its number
full_patent = get_full_patent_by_number("EP3653777A1")

### 🔹 Step 1.2. Load the SDG Classifier

We use HuggingFace's `facebook/bart-large-mnli` zero-shot classification model to associate text blocks with one of the 17 SDGs.

In [2]:
from transformers import pipeline

# Initialize the classifier
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Dict of SDG candidate labels
sdg_labels_dict = {
    "SDG1": "End poverty in all its forms everywhere", 
    "SDG2": "End hunger, achieve food security and improved nutrition and promote sustainable agriculture", 
    "SDG3": "Ensure healthy lives and promote well-being for all at all ages", 
    "SDG4": "Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all", 
    "SDG5": "Achieve gender equality and empower all women and girls", 
    "SDG6": "Ensure availability and sustainable management of water and sanitation for all", 
    "SDG7": "Ensure access to affordable, reliable, sustainable and modern energy for all", 
    "SDG8": "Promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all", 
    "SDG9": "Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation", 
    "SDG10": "Reduce inequality within and among countries", 
    "SDG11": "Make cities and human settlements inclusive, safe, resilient and sustainable", 
    "SDG12": "Ensure sustainable consumption and production patterns", 
    "SDG13": "Take urgent action to combat climate change and its impacts", 
    "SDG14": "Conserve and sustainably use the oceans, seas and marine resources for sustainable development", 
    "SDG15": "Protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, and halt and reverse land degradation and halt biodiversity loss", 
    "SDG16": "Promote peaceful and inclusive societies for sustainable development, provide access to justice for all and build effective, accountable and inclusive institutions at all levels", 
    "SDG17": "Strengthen the means of implementation and revitalize the Global Partnership for Sustainable Development"
}

def get_sdg_code_from_label(label: str, label_dict: dict) -> str:
    """Reverse lookup SDG code from full label text."""
    for code, text in label_dict.items():
        if label == text:
            return code
    return "None"

candidate_label_values = list(sdg_labels_dict.values())

Device set to use cpu
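
For a single input string, the zero-shot pipeline returns a dict with the keys `sequence`, `labels`, and `scores`, where the labels are sorted by descending score. The sketch below shows how the top entry can be mapped back to an SDG code. The helper `top_prediction`, the inverted `label_to_code` map, and the example scores are illustrative assumptions, not part of the notebook or real model output.

```python
from typing import Dict, Tuple

# Two labels copied from sdg_labels_dict, to keep the example short
labels = {
    "SDG6": "Ensure availability and sustainable management of water and sanitation for all",
    "SDG12": "Ensure sustainable consumption and production patterns",
}

# Inverted map: label text -> SDG code (hypothetical helper structure)
label_to_code = {text: code for code, text in labels.items()}


def top_prediction(result: Dict, label_to_code: Dict[str, str]) -> Tuple[str, float]:
    """Return (SDG code, score) for the best-ranked label of one pipeline result."""
    best_label = result["labels"][0]
    best_score = result["scores"][0]
    # Fall back to the string "None", mirroring get_sdg_code_from_label
    return label_to_code.get(best_label, "None"), best_score


# Hand-written stand-in for a pipeline result; the scores are made up
example_result = {
    "sequence": "A tub for a washing machine.",
    "labels": [labels["SDG12"], labels["SDG6"]],
    "scores": [0.7, 0.3],
}

code, score = top_prediction(example_result, label_to_code)
print(code, score)
```

Building the inverted map once avoids scanning the label dict for every block, which may matter little for 17 labels but keeps the lookup explicit.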


### 🔹 Step 1.3. Classify Full Patent Descriptions

We classify each description block from the `FullPatent` object.

The top SDG label is assigned to the `sdg` field of each `Description` object.  
If the model is not confident enough (i.e., the top label's score falls below `threshold`), we assign `"None"`. With the default `threshold=0`, every block keeps its top label.


In [None]:
from tqdm.notebook import tqdm
from api.models.Patent import FullPatent

def classify_full_patent_description(
    patent: FullPatent,
    classifier=classifier,
    candidate_labels=candidate_label_values,
    label_dict=sdg_labels_dict,
    threshold: float = 0,
    verbose: bool = False,
) -> FullPatent:
    """
    Classify all description blocks in a FullPatent and enrich them with an SDG label.

    Args:
        patent (FullPatent): The patent to analyze.
        classifier: HuggingFace classifier.
        candidate_labels (list): SDG label texts.
        label_dict (dict): Map from SDG code to label text.
        threshold (float): Minimum top score required to accept a prediction.
        verbose (bool): If True, print classification results.

    Returns:
        FullPatent: Enriched object.
    """

    for desc in tqdm(patent.description, desc=f"Classifying {len(patent.description)} descriptions"):
        try:
            result = classifier(desc.description_text, candidate_labels)
            top_score = result["scores"][0]

            if top_score >= threshold:
                label_text = result["labels"][0]
                label_code = get_sdg_code_from_label(label_text, label_dict)
                score = top_score
            else:
                label_code = "None"
                score = -1.0

            desc.sdg = label_code

            if verbose:
                print(f"[{desc.description_number}] Label: {desc.sdg} | Score: {score:.3f} | Text: {desc.description_text}")

        except Exception as e:
            print(f"Error on description {desc.description_number}: {e}")
            desc.sdg = "Error"

    patent.is_analyzed = True
    return patent
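
The function above accepts a prediction when the top score reaches `threshold`. If a margin between the two best labels is preferred instead, a minimal sketch might look like the following. The helper name `accept_by_margin` and the default `min_margin=0.05` are assumptions chosen for illustration, not values from the notebook.

```python
from typing import Sequence


def accept_by_margin(scores: Sequence[float], min_margin: float = 0.05) -> bool:
    """Return True when the best score beats the runner-up by at least min_margin."""
    if not scores:
        return False  # nothing to accept
    if len(scores) == 1:
        return True  # a single candidate has no competitor
    ranked = sorted(scores, reverse=True)
    return (ranked[0] - ranked[1]) >= min_margin


# Illustrative score lists (hand-written, not model output)
print(accept_by_margin([0.60, 0.20, 0.20]))  # clear winner
print(accept_by_margin([0.166, 0.130, 0.10]))  # top two labels are close
```

Inside the classification loop, this check could replace `top_score >= threshold` by passing `result["scores"]`; blocks that fail it would then receive `"None"`.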

### 🔹 Step 1.4. Run Classification on the Patent

We now apply the classifier to enrich the `FullPatent` with SDG labels.

In [None]:
# Run the classification
full_patent = classify_full_patent_description(full_patent, verbose=True)

Classifying 298 descriptions:   0%|          | 0/298 [00:00<?, ?it/s]

[1] Label: SDG12 | Score: 0.166 | Text: The present disclosure relates to a tub for a washing machine and a washing machine having the same.
[2] Label: SDG9 | Score: 0.130 | Text: In general, a washing machine is a home appliance for removing contaminants on clothes, bedding, or the like (hereinafter referred to as laundry) through processes such as washing, rinsing, dehydrating, drying, and the like, using water, detergent, a mechanical action, and the like.
[3] Label: SDG12 | Score: 0.149 | Text: Such washing machine may include a cabinet forming an outer shape of the washing machine, a tub installed inside the cabinet, a drum rotatably installed inside the tub and having a plurality of through-holes through which washing water or foam flows in and out, and a motor installed in the tub to rotate the drum. A rotation shaft of the motor may pass through one side of the tub to be connected to the drum.
[4] Label: SDG9 | Score: 0.129 | Text: The tub may define a washing space therein for
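
With several hundred blocks, a per-label tally can be easier to read than the verbose log. The sketch below counts how often each code was assigned; the helper `sdg_distribution` is a hypothetical name, and the sample list only repeats the four labels visible in the output above.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def sdg_distribution(codes: Iterable[str]) -> List[Tuple[str, int]]:
    """Count assigned SDG codes and return them ordered by frequency."""
    return Counter(codes).most_common()


# Labels of the first four blocks shown in the log
sample_codes = ["SDG12", "SDG9", "SDG12", "SDG9"]
distribution = sdg_distribution(sample_codes)
print(distribution)

# On the enriched patent, the same call could be written as:
# sdg_distribution(desc.sdg for desc in full_patent.description)
```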

## 📝 2. Generate Summary

Once the relevant SDGs are identified, we group all associated text blocks under each detected SDG.

For each SDG:
- The collected text blocks are aggregated.
- A concise and coherent summary is generated using a language model, providing a high-level overview of the patent’s contributions to that SDG.

This summarization offers a quick yet insightful perspective on the sustainable impact of the patent.
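
The grouping and aggregation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: `Block` is a hypothetical stand-in for the `Description` object (only the fields used in this notebook), `group_text_by_sdg` and `summarize_by_sdg` are illustrative helper names, and the summarizer is passed in as a plain callable so that any language model can be plugged in.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass
class Block:
    """Hypothetical stand-in for a Description object."""
    description_number: int
    description_text: str
    sdg: str


def group_text_by_sdg(
    blocks: Iterable[Block], skip: Tuple[str, ...] = ("None", "Error")
) -> Dict[str, str]:
    """Concatenate the text of all blocks that share an SDG code."""
    grouped = defaultdict(list)
    for block in blocks:
        if block.sdg in skip:
            continue  # ignore neutral and failed blocks
        grouped[block.sdg].append(block.description_text)
    return {code: "\n".join(texts) for code, texts in grouped.items()}


def summarize_by_sdg(
    grouped: Dict[str, str], summarize: Callable[[str], str]
) -> Dict[str, str]:
    """Apply a summarizer callable to the aggregated text of each SDG."""
    return {code: summarize(text) for code, text in grouped.items()}


blocks = [
    Block(1, "A tub for a washing machine.", "SDG12"),
    Block(2, "A washing machine removes contaminants.", "SDG9"),
    Block(3, "The tub is installed inside a cabinet.", "SDG12"),
    Block(4, "Unrelated boilerplate.", "None"),
]

grouped = group_text_by_sdg(blocks)

# Placeholder summarizer that keeps only the first line of each group;
# a real run would wrap a language model call here instead.
summaries = summarize_by_sdg(grouped, lambda text: text.split("\n")[0])
print(summaries)
```

Keeping the summarizer as a parameter separates the grouping logic, which is deterministic and easy to test, from the model call, which is slower and may need its own length handling.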