# A brief introduction to the patent system
A patent is a record, typically a document, that describes a discovery, invention, or method and gives the patent holder exclusive rights over it.

<!-- TODO: Explain the T,O method of organizing patents -->

To organize patents and structure their information, a commonly used method describes each patent by two characteristics:
1. **Task:** the action performed by the method described in the patent. It can be, for example, compressing something or accelerating an effect.
2. **Object:** the "target" of the task. It can be a food, a construction material, or any other object that, combined with the task, defines the patent.

This method is defined by the Halbach matrix, which provides a list of Tasks and Objects that can be extracted from the title or the abstract (`resumo`) of the patent.
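
As a minimal sketch of this idea, a patent's classification can be held as a (Task, Object) pair and checked against a set of known pairs. The rows below are copied from the first rows of the physical-effects table loaded later in this notebook; the `TaskObject` name and the `known_pairs` set are illustrative assumptions, not part of the original method.

```python
from typing import NamedTuple


class TaskObject(NamedTuple):
    """Hypothetical container for the two characteristics of a patent."""
    task: str
    obj: str


# (TAREFA, OBJETO) rows as they appear in the physical-effects table
rows = [
    ("Apertar", "Sólido"),
    ("Apertar", "Sólido dividido"),
    ("Concentrar", "Campo"),
    ("Concentrar", "Sólido dividido"),
    ("Depositar", "Sólido dividido"),
]

# A set gives constant-time membership checks for a candidate pair
known_pairs = {TaskObject(task, obj) for task, obj in rows}
tasks = sorted({pair.task for pair in known_pairs})

candidate = TaskObject("Concentrar", "Campo")
print(candidate in known_pairs)
print(tasks)
```

Keeping the pair as a single value makes it explicit that a Task alone does not define the patent; it only does so in combination with its Object.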

# T,O Finder
The T,O Finder is the method that identifies the Task and the Object of a given patent. In this notebook we build such a method.

In [1]:
import numpy as np
import pandas as pd
import string
import spacy
import unicodedata

In [2]:
# Loading the patents dataset
df = pd.read_csv('../../data/processed/patentes_inpi_lemmatized.csv')
df.head()

Unnamed: 0,id_pedido,data_deposito,titulo,ipc,url,resumo,classifica_ipc,titulo_lemmatized,resumo_lemmatized
0,BR 11 2021 018393 0,02/03/2020,TRATAMENTO DE COLISÕES EM UPLINK,H04L 1/18,https://busca.inpi.gov.br/pePI/servlet/Patente...,"A presente invenção se refere a métodos, sis...",H04L 1/18,tratamento colisao uplink,presente invencao referir metodo sistema di...
1,BR 11 2021 018071 0,02/03/2020,ALOJAMENTO DE VELA DE IGNIÇÃO COM PROTEÇÃO ANT...,H01T 13/14,https://busca.inpi.gov.br/pePI/servlet/Patente...,ALOJAMENTO DE VELA DE IGNIÇÃO COM PROTEÇÃO A...,H01T 13/14 ; H01T 13/20 ; H01T 13/32 ; H0...,alojamento vela ignicao protecao anticorrosivo...,alojamento vela ignicao protecao anticorros...
2,BR 11 2021 016947 4,02/03/2020,ANTICORPOS QUE RECONHECEM TAU,C07K 16/18,https://busca.inpi.gov.br/pePI/servlet/Patente...,ANTICORPOS QUE RECONHECEM TAU. A invenção fo...,C07K 16/18 ; G01N 33/68,anticorpo reconhecer tau,anticorpo reconhecer tau invencao fornecer ...
3,BR 10 2020 004169 0,02/03/2020,AQUECEDOR DE AR A LENHA COM DUPLA EXAUSTÃO PAR...,F24H 3/00,https://busca.inpi.gov.br/pePI/servlet/Patente...,AQUECEDOR DE AR A LENHA COM DUPLA EXAUSTAO P...,F24H 3/008 ; F24H 4/06,aquecedor ar lenha dupla exaustao utilizar amb...,aquecedor ar lenha dupla exaustao utilizar ...
4,BR 11 2021 006234 3,02/03/2020,BIBLIOTECAS DE CÉLULAS ÚNICAS E NÚCLEOS ÚNICOS...,C12N 15/10,https://busca.inpi.gov.br/pePI/servlet/Patente...,BIBLIOTECAS DE CÉLULAS ÚNICAS E NÚCLEOS ÚNIC...,C12N 15/10,biblioteca celula unico nucleo unico alto rend...,biblioteca celula unico nucleo unico alto r...


In [13]:
df_match = pd.read_csv("../../data/processed/base_efeitos_físicos_publicada_lemmatized.csv")
df_match.head()

Unnamed: 0,TIPO DE EFEITO,TAREFA,OBJETO,EFEITO FÍSICO,SINONIMO 1 EFEITO FISICO,SINONIMO 2 EFEITO FISICO,PT Link,PT Description,Link Wiki (English),TAREFA_lemmatized
0,Aplicação,Apertar,Sólido,Matriz de Halbach,,,,,http://en.wikipedia.org/wiki/Halbach_array,apertar
1,Aplicação,Apertar,Sólido dividido,Matriz de Halbach,,,,,http://en.wikipedia.org/wiki/Halbach_array,apertar
2,Aplicação,Concentrar,Campo,Matriz de Halbach,,,,,http://en.wikipedia.org/wiki/Halbach_array,concentrar
3,Aplicação,Concentrar,Sólido dividido,Matriz de Halbach,,,,,http://en.wikipedia.org/wiki/Halbach_array,concentrar
4,Aplicação,Depositar,Sólido dividido,Matriz de Halbach,,,,,http://en.wikipedia.org/wiki/Halbach_array,depositar


# POS tagging
**POS (Part-of-Speech) Tagging** is the process of labeling each word in a text with its corresponding part of speech, such as noun, verb, adjective, etc. This is a fundamental step in Natural Language Processing (NLP) as it helps in understanding the grammatical structure and meaning of a sentence.

For example:
- Sentence: "The cat is sleeping."
- POS Tags: `The (Determiner)`, `cat (Noun)`, `is (Auxiliary)`, `sleeping (Verb)`.

POS tagging is useful for tasks like:
- Text parsing and syntactic analysis.
- Named Entity Recognition (NER).
- Sentiment analysis and text classification.

## Example Code
```python
import spacy

# Load a spaCy language model (a Portuguese one, since the input text is Portuguese)
nlp = spacy.load("pt_core_news_lg")  # Replace with your desired language model

# Input text
text = "tratamento colisao uplink."

# Process the text
doc = nlp(text)

# Print each token and its POS tag
for token in doc:
    print(f"{token.text} -> {token.pos_} ({token.tag_})")
```

It gives the following output:

```text
tratamento -> NOUN (NOUN)
colisao -> ADJ (ADJ)
uplink -> VERB (VERB)
```

## POS Tagging with spaCy
spaCy provides an efficient and easy-to-use method for [POS](https://universaldependencies.org/u/pos/) tagging. Each token exposes two POS attributes:
- `token.pos_`: The coarse-grained part-of-speech tag (e.g., NOUN, VERB).
- `token.tag_`: The fine-grained part-of-speech tag (e.g., VBZ, NN).

In spaCy, each token in a text is assigned several attributes:

1. Text: The original word text.
2. Lemma: The base form of the word.
3. **POS:** The simple UPOS part-of-speech tag.
4. **Tag:** The detailed part-of-speech tag.
5. Dep: Syntactic dependency, i.e. the relation between tokens.
6. Shape: The word shape – capitalization, punctuation, digits.
7. is alpha: Does the token consist of alphabetic characters?
8. is stop: Is the token part of a stop list, i.e. the most common words of the language?


## Reference
Common `pos_` Tags

| Tag   | Description           |
|-------|-----------------------|
| `ADJ` | Adjective             |
| `ADP` | Adposition            |
| `ADV` | Adverb                |
| `AUX` | Auxiliary verb        |
| `CCONJ`| Coordinating conjunction |
| `DET` | Determiner            |
| `INTJ`| Interjection          |
| `NOUN`| Noun                  |
| `NUM` | Numeral               |
| `PART`| Particle              |
| `PRON`| Pronoun               |
| `PROPN`| Proper noun          |
| `PUNCT`| Punctuation          |
| `SCONJ`| Subordinating conjunction |
| `SYM` | Symbol                |
| `VERB`| Verb                  |
| `X`   | Other                 |

Common `tag_` Tags (English Example)

| Tag   | Description                          |
|-------|--------------------------------------|
| `NN`  | Noun, singular                      |
| `NNS` | Noun, plural                        |
| `VB`  | Verb, base form                     |
| `VBD` | Verb, past tense                    |
| `VBG` | Verb, gerund or present participle  |
| `VBN` | Verb, past participle               |
| `VBZ` | Verb, 3rd person singular present   |
| `JJ`  | Adjective                           |
| `RB`  | Adverb                              |
| `IN`  | Preposition or subordinating conjunction |
| `DT`  | Determiner                          |

In [4]:
nlp = spacy.load("pt_core_news_lg")

In [5]:
# Process the lemmatized title
doc = nlp(df.loc[25, "titulo_lemmatized"])

# Print each token and its POS tag
print("Processed text POS")
for token in doc:
    print(f"{token.text} -> {token.pos_} ({token.tag_})")

# Process the original title
doc = nlp(df.loc[25, "titulo"])

# Print each token and its POS tag
print("\n\nOriginal text POS")
for token in doc:
    print(f"{token.text} -> {token.pos_} ({token.tag_})")

Processed text POS
metodo -> NOUN (NOUN)
codificacao -> ADJ (ADJ)
video -> PROPN (PROPN)
codificador -> ADJ (ADJ)
decodificador -> ADJ (ADJ)
produto -> NOUN (NOUN)
programa -> ADJ (ADJ)
computador -> ADJ (ADJ)


Original text POS
MÉTODO -> PROPN (PROPN)
DE -> ADP (ADP)
CODIFICAÇÃO -> PROPN (PROPN)
DE -> PROPN (PROPN)
VÍDEO -> PROPN (PROPN)
, -> PUNCT (PUNCT)
CODIFICADOR -> PROPN (PROPN)
, -> PUNCT (PUNCT)
DECODIFICADOR -> PROPN (PROPN)
E -> CCONJ (CCONJ)
PRODUTO -> PROPN (PROPN)
DE -> ADP (ADP)
PROGRAMA -> PROPN (PROPN)
DE -> ADP (ADP)
COMPUTADOR -> NOUN (NOUN)


As seen, Portuguese POS tagging is not very effective for this matching: it does not reliably identify the Tasks and Objects, which makes the proposed work harder to complete.

It is also worth noting that many of these issues seem to arise because the Portuguese model is less developed. The need for language-specific models and tools is a major limitation of more conventional NLP techniques.

As a final test of another approach, we will use the cosine similarity between word vectors to try to find Tasks and Objects.
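
For reference, the cosine similarity of two vectors is their dot product divided by the product of their norms: it is 1 for vectors pointing in the same direction and 0 for orthogonal ones. The sketch below uses plain Python lists and toy vectors that are purely illustrative. Returning 0.0 for an all-zero vector is an assumption made here to avoid a division by zero, not something this notebook prescribes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat an all-zero vector as "no similarity"
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal
print(cosine_similarity([0.0, 0.0], [1.0, 1.0]))  # zero vector
```

Because the measure depends only on direction, two vectors of very different magnitude can still score as identical.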

In [6]:
def find_matches(input_string, list_of_vectors, threshold=None, top_n=None):
    """
    Vectorizes every word in the input string and compares its vectors to a list of precomputed vectors.
    Returns matches based on a similarity threshold or the top_n most similar terms.

    Args:
        input_string (str): The input string to be vectorized.
        list_of_vectors (list): Precomputed vectors of the list of strings.
        threshold (float, optional): Similarity threshold for matches. Default is None.
        top_n (int, optional): Number of top matches to return based on similarity. Default is None.

    Returns:
        list: Indices of matches in the list_of_vectors.
    """
    if threshold is None and top_n is None:
        raise ValueError("You must provide either a threshold or top_n.")

    words = input_string.split()
    all_similarities = []

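    for word in words:
        input_vector = nlp(word).vector
        input_norm = np.linalg.norm(input_vector)
        if input_norm == 0:
            # Words without a vector are all zeros; skip them to avoid dividing by zero
            continue

        # Calculate the cosine similarity against each precomputed vector
        for i, vector in enumerate(list_of_vectors):
            vector_norm = np.linalg.norm(vector)
            if vector_norm == 0:
                continue
            similarity = np.dot(input_vector, vector) / (input_norm * vector_norm)
            all_similarities.append((i, similarity))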

    # Filter by threshold if provided
    matches = []
    if threshold is not None:
        matches.extend([i for i, sim in all_similarities if sim >= threshold])

    # Get top_n matches across all words if provided
    if top_n is not None:
        top_matches = sorted(all_similarities, key=lambda x: x[1], reverse=True)[:top_n]
        matches.extend([i for i, _ in top_matches])

    # Remove duplicates (keeping first-seen order) and return
    return list(dict.fromkeys(matches))

In [8]:
# Path to the saved vectors
tarefas_vectors_path = "../../data/processed/tarefas_vectors.npz"

try:
    # Try to load the precomputed vectors
    tarefas_vectors = np.load(tarefas_vectors_path)["vectors"]
    print("Loaded precomputed vectors successfully.")
except (FileNotFoundError, KeyError):
    # If loading fails, recalculate the vectors
    print("Precomputed vectors not found. Recalculating...")
    tarefas = df_match["TAREFA_lemmatized"].astype(str).tolist()
    tarefas_vectors = np.array([nlp(tarefa).vector for tarefa in tarefas])
    
    # Save the recalculated vectors
    np.savez(tarefas_vectors_path, vectors=tarefas_vectors)
    print("Vectors recalculated and saved.")

Loaded precomputed vectors successfully.


In [14]:
tarefas = df_match["TAREFA_lemmatized"].astype(str).tolist()

In [10]:
# Apply the function to df.loc[25, "titulo_lemmatized"]
matches = find_matches(df.loc[25, "titulo_lemmatized"], tarefas_vectors, threshold=0.45)

# Print the matched indices and corresponding strings
print("\n\n")
print(df.loc[25, "titulo"])
print("Matched indices:", matches)
print("Matched strings:", set([tarefas[i] for i in matches]))

  similarity = np.dot(input_vector, vector) / (np.linalg.norm(input_vector) * np.linalg.norm(vector))





MÉTODO DE CODIFICAÇÃO DE VÍDEO, CODIFICADOR, DECODIFICADOR E PRODUTO DE PROGRAMA DE COMPUTADOR
Matched indices: [28, 65, 171, 174, 359, 406, 407, 482, 483, 484, 485, 540, 660, 692, 707, 708, 723, 793, 794, 795, 796, 809, 810, 840, 847, 863, 864, 865, 866, 918, 922, 926, 976, 986, 987, 988, 1012, 1013, 1014, 1031, 1155, 1156, 1157, 1158, 1170, 1171, 1213, 1214, 1282, 1313, 1314, 1315, 1316, 1317, 1390, 1426, 1448, 1449, 1481, 1482, 1483, 1484, 1503, 1583, 1584, 1585, 1586, 1704, 1723, 1724, 1725, 1726, 1748, 1749, 1750, 1757, 1767, 1768, 1824, 1825, 1826, 1827, 1900, 1991, 2132, 2133, 2134, 2151, 2152, 2153, 2225, 2226, 2243, 2399, 2410, 2420, 2421, 2500, 2547, 2549, 2558, 2568, 2605, 2620, 2643, 2644, 2645, 2646, 2666, 2667, 2668, 2669, 2712, 2713, 2714, 2778, 2812, 2958, 3057, 3058, 3102, 3256, 3257, 3258, 3280, 3281, 3292, 3293, 3328, 3378, 3379, 3380, 3381, 3402, 3445, 3470, 3471, 3472, 3473, 3562, 3582, 3583, 3636, 3637, 3654, 3670, 3715, 3716, 3834, 4122, 4138, 4161, 4162, 4163

In [11]:
# Apply the function to df.loc[25, "titulo_lemmatized"]
matches = find_matches(df.loc[25, "titulo_lemmatized"], tarefas_vectors, top_n=5)

# Print the matched indices and corresponding strings
print("\n\n")
print(df.loc[25, "titulo"])
print("Matched indices:", matches)
print("Matched strings:", set([tarefas[i] for i in matches]))

  similarity = np.dot(input_vector, vector) / (np.linalg.norm(input_vector) * np.linalg.norm(vector))





MÉTODO DE CODIFICAÇÃO DE VÍDEO, CODIFICADOR, DECODIFICADOR E PRODUTO DE PROGRAMA DE COMPUTADOR
Matched indices: [28, 65, 171, 174, 359]
Matched strings: {'comprimir'}
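
The five best matches above all map to the same string, `comprimir`, because the task list holds one entry per row of the effects table and the same task appears on many rows. One possible way to obtain distinct candidates is to keep only the best score per task label before taking the top N. The sketch below works on `(index, similarity)` pairs like those built inside `find_matches`; the function name `top_distinct_tasks` and the toy scores are hypothetical.

```python
def top_distinct_tasks(scored, labels, top_n):
    """Return the top_n distinct labels ranked by their best similarity.

    scored: iterable of (index, similarity) pairs
    labels: list mapping each index to its task label
    """
    best = {}
    for index, similarity in scored:
        label = labels[index]
        # Keep only the highest similarity seen for each label
        if label not in best or similarity > best[label]:
            best[label] = similarity
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:top_n]]


# Toy data: the label list repeats tasks, as the real one does
labels = ["comprimir", "comprimir", "orientar", "comprimir", "preservar"]
scored = [(0, 0.61), (1, 0.61), (2, 0.48), (3, 0.61), (4, 0.30)]

print(top_distinct_tasks(scored, labels, top_n=2))
```

With this ranking, `top_n` counts distinct tasks rather than table rows, so a frequently repeated task no longer fills every slot.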


In [65]:
from tqdm import tqdm

result = []
for index, row in tqdm(df.iterrows(), total=df.shape[0]):
    res = find_matches(str(row["titulo_lemmatized"]), tarefas_vectors, top_n=10)
    result.append(res)

  similarity = np.dot(input_vector, vector) / (np.linalg.norm(input_vector) * np.linalg.norm(vector))
100%|██████████| 3299/3299 [20:22<00:00,  2.70it/s]


In [69]:
# Ensure each row in the DataFrame gets its corresponding set of matches
df["match_top_10_title"] = [set([tarefas[i] for i in res]) for res in result]

# Save the updated DataFrame to a CSV file
df.to_csv("../../data/processed/patentes_inpi_lemmatized_matched.csv", index=False)

# Revisiting T,O Finder

In [2]:
df_patents_sample = pd.read_csv("../../data/processed/patents_inpi_llm_matched_10percent_v2.csv")
df_patents_sample.head()

Unnamed: 0,id_pedido,data_deposito,titulo,ipc,url,resumo,classifica_ipc,titulo_english,match_top_10_title,kind_effect,task,object,physical_effect,derived_from
0,BR 20 2020 001664 0,24/01/2020,CONFIGURAÇÃO APLICADA EM SOBRETAMPA METÁLICA,B65D 43/02,https://busca.inpi.gov.br/pePI/servlet/Patente...,RESUMO CONFIGURAÇÃO APLICADA EM SOBRETAMPA MET...,B65D 43/02,Configuration applied in metallic substrate,"{'Move', 'Break Down', 'Remove', 'Change Phase...",,Vedação,Embalagem,Impermeabilização,"O/a ""Apertar"" atua no/na Sólido dividido para ..."
1,BR 11 2021 015521 0,11/02/2020,"MISTURAS DE PESTICIDAS, MÉTODO PARA CONTROLAR ...",A01N 43/58,https://busca.inpi.gov.br/pePI/servlet/Patente...,. Misturas de pesticidas compreendendo como co...,A01N 43/58 ; A01N 33/20 ; A01N 37/34 ; A0...,"Pesticide mixture, method for controlling phyt...","{'Produce', 'Prevents', 'Change Phase', 'Mix',...",,Controlar,Fungos fitopatogênicos,Inibição de processos metabólicos,"O/a ""Misturar"" atua no/na Campo para produzir ..."
2,BR 11 2021 012872 7,08/01/2020,"CHAPA DE AÇO ELÉTRICO COM GRÃO ORIENTADO, MÉTO...",C21D 8/12,https://busca.inpi.gov.br/pePI/servlet/Patente...,. É provida uma chapa de aço magnética com grã...,C21D 8/12 ; C22C 38/00 ; C22C 38/60 ; H01...,"Electric power sheet with large-oriented, meth...","{'Produce', 'Move', 'Break Down', 'Change Phas...",,Orientar,Sólido dividido,Pó ferromagnético,
3,BR 11 2021 012274 5,21/01/2020,MÉTODO E SERVIDOR DE REDE PARA AUTENTICAÇÃO E ...,H04W 12/04,https://busca.inpi.gov.br/pePI/servlet/Patente...,MÉTODO E SERVIDOR DE REDE PARA AUTENTICAÇÃO E ...,H04W 12/04 ; H04W 12/00 ; H04W 12/06,Network method and server for key authenticati...,"{'Produce', 'Move', 'Break Down', 'Change Phas...",,Autenticar,Dispositivo de terminal,Validação de identidade,"O/a ""Proteger"" atua no/na Gás para produzir um..."
4,BR 11 2021 014438 2,23/01/2020,PRESERVAÇÃO DE CÉLULAS-TRONCO,A01N 1/02,https://busca.inpi.gov.br/pePI/servlet/Patente...,. A invenção refere-se a um campo de preservaç...,A01N 1/02 ; C12N 5/07,Preservation of Tronco Cells,"{'Produce', 'Move', 'Break Down', 'Preserves',...",,Preservar,Célula-tronco,Inibição da degradação celular,"O/a ""Preservar"" atua no/na Sólido para produzi..."


In [3]:
nlp = spacy.load("pt_core_news_lg")


def lemmatizer_spacy(text: str):
    """
    Lemmatizes the input text using spaCy.

    This function processes the input text with a spaCy language model to extract 
    the lemmatized (base) form of each word. It excludes stop words and punctuation 
    from the output.

    Args:
        text (str): The input text to be lemmatized.

    Returns:
        str: A string containing the lemmatized words, separated by spaces.

    Example:
        >>> lemmatizer_spacy("Os carros estão correndo rapidamente.")
        'carro correr rapidamente'

    Notes:
        - This function assumes that the `nlp` object (spaCy language model) is 
          already loaded and available in the global scope.
        - Stop words (e.g., "os", "estão") and punctuation are removed from the output.
        - The input text is converted to lowercase before processing.
    """
    str_text = str(text)
    doc = nlp(str_text.lower())
    return " ".join([token.lemma_ for token in doc if not token.is_stop and not token.is_punct])


def remove_punctuation_and_accents(text: str) -> str:
    """
    Removes punctuation and accents from the input text.

    This function removes all punctuation and converts accented characters 
    (e.g., "ã", "ç") to their unaccented equivalents (e.g., "a", "c").

    Args:
        text (str): The input text to be processed.

    Returns:
        str: A cleaned string without punctuation or accents.

    Example:
        >>> remove_punctuation_and_accents("pão, maçã!")
        'pao maca'
    """
    # Normalize text to decompose accents
    normalized_text = unicodedata.normalize('NFD', text)
    # Remove accents by filtering out combining characters
    no_accents = ''.join(char for char in normalized_text if unicodedata.category(char) != 'Mn')
    # Remove punctuation
    cleaned_text = no_accents.translate(str.maketrans('', '', string.punctuation))
    return cleaned_text


def process_text(text: str) -> str:
    """Lemmatizes the text, then removes punctuation and accents."""
    text = lemmatizer_spacy(text)
    return remove_punctuation_and_accents(text)

In [4]:
df_patents_sample["titulo_lemmatized"] = df_patents_sample["titulo"].apply(process_text)
df_patents_sample.head()

Unnamed: 0,id_pedido,data_deposito,titulo,ipc,url,resumo,classifica_ipc,titulo_english,match_top_10_title,kind_effect,task,object,physical_effect,derived_from,titulo_lemmatized
0,BR 20 2020 001664 0,24/01/2020,CONFIGURAÇÃO APLICADA EM SOBRETAMPA METÁLICA,B65D 43/02,https://busca.inpi.gov.br/pePI/servlet/Patente...,RESUMO CONFIGURAÇÃO APLICADA EM SOBRETAMPA MET...,B65D 43/02,Configuration applied in metallic substrate,"{'Move', 'Break Down', 'Remove', 'Change Phase...",,Vedação,Embalagem,Impermeabilização,"O/a ""Apertar"" atua no/na Sólido dividido para ...",configuracao aplicar sobretampa metalico
1,BR 11 2021 015521 0,11/02/2020,"MISTURAS DE PESTICIDAS, MÉTODO PARA CONTROLAR ...",A01N 43/58,https://busca.inpi.gov.br/pePI/servlet/Patente...,. Misturas de pesticidas compreendendo como co...,A01N 43/58 ; A01N 33/20 ; A01N 37/34 ; A0...,"Pesticide mixture, method for controlling phyt...","{'Produce', 'Prevents', 'Change Phase', 'Mix',...",,Controlar,Fungos fitopatogênicos,Inibição de processos metabólicos,"O/a ""Misturar"" atua no/na Campo para produzir ...",mistura pesticida metodo controlar fungo nociv...
2,BR 11 2021 012872 7,08/01/2020,"CHAPA DE AÇO ELÉTRICO COM GRÃO ORIENTADO, MÉTO...",C21D 8/12,https://busca.inpi.gov.br/pePI/servlet/Patente...,. É provida uma chapa de aço magnética com grã...,C21D 8/12 ; C22C 38/00 ; C22C 38/60 ; H01...,"Electric power sheet with large-oriented, meth...","{'Produce', 'Move', 'Break Down', 'Change Phas...",,Orientar,Sólido dividido,Pó ferromagnético,,chapa aco eletrico grao orientar metodo fabric...
3,BR 11 2021 012274 5,21/01/2020,MÉTODO E SERVIDOR DE REDE PARA AUTENTICAÇÃO E ...,H04W 12/04,https://busca.inpi.gov.br/pePI/servlet/Patente...,MÉTODO E SERVIDOR DE REDE PARA AUTENTICAÇÃO E ...,H04W 12/04 ; H04W 12/00 ; H04W 12/06,Network method and server for key authenticati...,"{'Produce', 'Move', 'Break Down', 'Change Phas...",,Autenticar,Dispositivo de terminal,Validação de identidade,"O/a ""Proteger"" atua no/na Gás para produzir um...",metodo servidor rede autenticacao gerenciament...
4,BR 11 2021 014438 2,23/01/2020,PRESERVAÇÃO DE CÉLULAS-TRONCO,A01N 1/02,https://busca.inpi.gov.br/pePI/servlet/Patente...,. A invenção refere-se a um campo de preservaç...,A01N 1/02 ; C12N 5/07,Preservation of Tronco Cells,"{'Produce', 'Move', 'Break Down', 'Preserves',...",,Preservar,Célula-tronco,Inibição da degradação celular,"O/a ""Preservar"" atua no/na Sólido para produzi...",preservacao celulastronco
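
Note that the last title above becomes `preservacao celulastronco`: deleting the hyphen fuses the two halves of `células-tronco` into a single token. If that is undesirable, one option is to replace punctuation with spaces instead of deleting it. The following is a sketch of such a variant; the name `replace_punctuation_and_accents` is hypothetical, and this is not the function that produced the outputs in this notebook.

```python
import string
import unicodedata


def replace_punctuation_and_accents(text: str) -> str:
    """Strip accents and turn punctuation into spaces (hypothetical variant)."""
    # Decompose accented characters, then drop the combining marks
    decomposed = unicodedata.normalize("NFD", text)
    no_accents = "".join(
        char for char in decomposed if unicodedata.category(char) != "Mn"
    )
    # Map every ASCII punctuation character to a space
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    spaced = no_accents.translate(table)
    # Collapse the runs of whitespace that the replacement may create
    return " ".join(spaced.split())


print(replace_punctuation_and_accents("preservação células-tronco"))
print(replace_punctuation_and_accents("pão, maçã!"))
```

Whether the split form is preferable depends on the vocabulary of the word vectors: a compound may have no vector of its own while its parts do.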


In [9]:
from tqdm import tqdm

result = []
for index, row in tqdm(df_patents_sample.iterrows(), total=df_patents_sample.shape[0]):
    res = find_matches(str(row["titulo_lemmatized"]), tarefas_vectors, top_n=10)
    result.append(res)

  similarity = np.dot(input_vector, vector) / (np.linalg.norm(input_vector) * np.linalg.norm(vector))
100%|██████████| 358/358 [01:32<00:00,  3.89it/s]


In [15]:
df_patents_sample["match_top_10_vec_distance_tasks"] = [set([tarefas[i] for i in res]) for res in result]
df_patents_sample.head()

Unnamed: 0,id_pedido,data_deposito,titulo,ipc,url,resumo,classifica_ipc,titulo_english,match_top_10_title,kind_effect,task,object,physical_effect,derived_from,titulo_lemmatized,match_top_10_vec_distance_tasks
0,BR 20 2020 001664 0,24/01/2020,CONFIGURAÇÃO APLICADA EM SOBRETAMPA METÁLICA,B65D 43/02,https://busca.inpi.gov.br/pePI/servlet/Patente...,RESUMO CONFIGURAÇÃO APLICADA EM SOBRETAMPA MET...,B65D 43/02,Configuration applied in metallic substrate,"{'Move', 'Break Down', 'Remove', 'Change Phase...",,Vedação,Embalagem,Impermeabilização,"O/a ""Apertar"" atua no/na Sólido dividido para ...",configuracao aplicar sobretampa metalico,"{concentrar, incorporar}"
1,BR 11 2021 015521 0,11/02/2020,"MISTURAS DE PESTICIDAS, MÉTODO PARA CONTROLAR ...",A01N 43/58,https://busca.inpi.gov.br/pePI/servlet/Patente...,. Misturas de pesticidas compreendendo como co...,A01N 43/58 ; A01N 33/20 ; A01N 37/34 ; A0...,"Pesticide mixture, method for controlling phyt...","{'Produce', 'Prevents', 'Change Phase', 'Mix',...",,Controlar,Fungos fitopatogênicos,Inibição de processos metabólicos,"O/a ""Misturar"" atua no/na Campo para produzir ...",mistura pesticida metodo controlar fungo nociv...,{segurar}
2,BR 11 2021 012872 7,08/01/2020,"CHAPA DE AÇO ELÉTRICO COM GRÃO ORIENTADO, MÉTO...",C21D 8/12,https://busca.inpi.gov.br/pePI/servlet/Patente...,. É provida uma chapa de aço magnética com grã...,C21D 8/12 ; C22C 38/00 ; C22C 38/60 ; H01...,"Electric power sheet with large-oriented, meth...","{'Produce', 'Move', 'Break Down', 'Change Phas...",,Orientar,Sólido dividido,Pó ferromagnético,,chapa aco eletrico grao orientar metodo fabric...,{orientar}
3,BR 11 2021 012274 5,21/01/2020,MÉTODO E SERVIDOR DE REDE PARA AUTENTICAÇÃO E ...,H04W 12/04,https://busca.inpi.gov.br/pePI/servlet/Patente...,MÉTODO E SERVIDOR DE REDE PARA AUTENTICAÇÃO E ...,H04W 12/04 ; H04W 12/00 ; H04W 12/06,Network method and server for key authenticati...,"{'Produce', 'Move', 'Break Down', 'Change Phas...",,Autenticar,Dispositivo de terminal,Validação de identidade,"O/a ""Proteger"" atua no/na Gás para produzir um...",metodo servidor rede autenticacao gerenciament...,{orientar}
4,BR 11 2021 014438 2,23/01/2020,PRESERVAÇÃO DE CÉLULAS-TRONCO,A01N 1/02,https://busca.inpi.gov.br/pePI/servlet/Patente...,. A invenção refere-se a um campo de preservaç...,A01N 1/02 ; C12N 5/07,Preservation of Tronco Cells,"{'Produce', 'Move', 'Break Down', 'Preserves',...",,Preservar,Célula-tronco,Inibição da degradação celular,"O/a ""Preservar"" atua no/na Sólido para produzi...",preservacao celulastronco,{preservar}


In [16]:
df_patents_sample.to_csv("../../data/processed/patents_inpi_llm_matched_10percent.csv", index=False)