# 1. Introduction
## 1.1 Definition
* **Relation Extraction (RE)** in Natural Language Processing (NLP) is the process of identifying and categorizing semantic relationships between entities in a text.
* It aims to extract meaningful connections, such as relationships between people, organizations, locations, or events.
* These relationships are often represented in the form of triplets: (entity1, relation, entity2), where "relation" describes how the two entities are connected.
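Here is a minimal sketch of the triplet representation in Python. It assumes a simple frozen dataclass; the class name `RelationTriplet` is illustrative. The example sentence and the label `per:employee_of` (a TACRED relation name) only show the shape of the data. Order matters: (entity1, relation, entity2) is generally not the same fact as (entity2, relation, entity1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTriplet:
    """One extracted relation in (entity1, relation, entity2) form."""

    entity1: str
    relation: str
    entity2: str

    def as_tuple(self) -> tuple[str, str, str]:
        """Return the triplet as a plain (entity1, relation, entity2) tuple."""
        return (self.entity1, self.relation, self.entity2)


# Illustrative only: a triplet one might extract from the sentence
# "Tim Cook is the CEO of Apple."
triplet = RelationTriplet("Tim Cook", "per:employee_of", "Apple")
print(triplet.as_tuple())  # ('Tim Cook', 'per:employee_of', 'Apple')
```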

## 1.2 Implementation
* **Relation Extraction** can be implemented with deep learning models, including transformer models such as BERT.
* Here we use **Hugging Face's** `transformers` library to fine-tune a BERT model for relation extraction.


# 2. Install Required Libraries
Install the **Hugging Face** `transformers` and `datasets` libraries, together with PyTorch and scikit-learn:

* **transformers:** Provides pre-trained models such as **BERT**.
* **datasets:** Includes ready-to-use datasets and tools for dataset preparation.
* **torch:** Provides **PyTorch**, the deep learning framework the **BERT** model runs on.
* **scikit-learn:** Provides tools for computing evaluation metrics.


```
!pip install transformers datasets torch scikit-learn
```
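Here is a minimal sketch that uses only the standard library to confirm the packages are importable after installation. Note that scikit-learn is installed as `scikit-learn` but imported as `sklearn`. The `REQUIRED` mapping and the `missing_packages` helper are illustrative names, not part of any library.

```python
import importlib.util

# pip package name -> name used in `import` statements (illustrative mapping).
REQUIRED = {
    "transformers": "transformers",
    "datasets": "datasets",
    "torch": "torch",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing:", " ".join(missing))
else:
    print("All required libraries are importable.")
```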


# 3. Prepare the Dataset
* For this example, we use the **TACRED** dataset (TAC Relation Extraction Dataset), which contains sentences annotated with relationships between entity pairs.
* Preparing the dataset involves formatting each example as an input triple (**sentence, entity1, entity2**) together with a label for the relationship.
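One common way (among several) to pack a (sentence, entity1, entity2) triple into a single BERT input is to wrap each entity in marker tokens. The sketch below assumes a TACRED-style token list with inclusive (start, end) spans. The marker strings `[E1]`, `[/E1]`, `[E2]`, `[/E2]`, the example sentence, and the function name `mark_entities` are hypothetical.

```python
def mark_entities(tokens, e1_span, e2_span):
    """Wrap two entity spans in marker tokens and return one input string.

    Each span is (start, end) over token indices with `end` inclusive
    (an assumption about the data format). The spans must not overlap.
    """
    (s1, t1), (s2, t2) = e1_span, e2_span
    if not (t1 < s2 or t2 < s1):
        raise ValueError("entity spans overlap")
    marked = []
    for i, tok in enumerate(tokens):
        # Opening markers go before the first token of each span.
        if i == s1:
            marked.append("[E1]")
        if i == s2:
            marked.append("[E2]")
        marked.append(tok)
        # Closing markers go after the last token of each span.
        if i == t1:
            marked.append("[/E1]")
        if i == t2:
            marked.append("[/E2]")
    return " ".join(marked)


# Hypothetical TACRED-style example: tokens, two spans, and a relation label.
example = {
    "tokens": ["Tim", "Cook", "is", "the", "CEO", "of", "Apple", "."],
    "e1_span": (0, 1),
    "e2_span": (6, 6),
    "relation": "per:employee_of",
}
text = mark_entities(example["tokens"], example["e1_span"], example["e2_span"])
print(text)  # [E1] Tim Cook [/E1] is the CEO of [E2] Apple [/E2] .
```

If markers like these are used, they should also be registered with the tokenizer so they are not split into sub-word pieces.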

## 3.1 Load the BERT Model:
* **BertTokenizer** is used to tokenize the input sentences.
* **BertForSequenceClassification** is loaded from a pre-trained BERT checkpoint. Setting **num_labels** to the number of unique relation types in the dataset gives it a newly initialized classification head of that size, which fine-tuning then trains for relation classification.
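Here is one way the loading step might look. The sketch assumes the `bert-base-uncased` checkpoint and assumes the inputs wrap entities in marker tokens. The marker strings and the helper names `build_label_maps` and `load_re_model` are hypothetical. Passing `label2id`/`id2label` stores the mapping in the model config, and `resize_token_embeddings` adds embedding rows for the newly registered tokens.

```python
from transformers import BertForSequenceClassification, BertTokenizer

# Hypothetical marker tokens; they must match whatever preprocessing inserts.
ENTITY_MARKERS = ["[E1]", "[/E1]", "[E2]", "[/E2]"]


def build_label_maps(relations):
    """Map each distinct relation string to a contiguous integer id."""
    labels = sorted(set(relations))
    label2id = {label: idx for idx, label in enumerate(labels)}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


def load_re_model(relations, checkpoint="bert-base-uncased"):
    """Load a tokenizer and a classification model sized for `relations`."""
    label2id, id2label = build_label_maps(relations)
    tokenizer = BertTokenizer.from_pretrained(checkpoint)
    # Register the markers so WordPiece keeps each one as a single token.
    tokenizer.add_special_tokens({"additional_special_tokens": ENTITY_MARKERS})
    model = BertForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(label2id),  # one output per relation type
        label2id=label2id,
        id2label=id2label,
    )
    # The added marker tokens need rows in the embedding matrix.
    model.resize_token_embeddings(len(tokenizer))
    return tokenizer, model, label2id
```

Calling `load_re_model(relations)` downloads the checkpoint on first use. Here `relations` is a placeholder for the relation labels collected from the training split.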

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
```
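The scikit-learn metrics imported above are typically used inside a `compute_metrics` function passed to `Trainer`. The sketch below shows one way to write it. TACRED results are usually reported as micro-averaged precision, recall, and F1 that ignore the `no_relation` class, so the factory can optionally exclude one label id from those three scores. Which id corresponds to `no_relation` depends on your label mapping and is an assumption here. `make_compute_metrics` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def make_compute_metrics(no_relation_id=None):
    """Build a Trainer-compatible compute_metrics function.

    If no_relation_id is given, precision/recall/F1 are micro-averaged over
    all other labels; accuracy always uses every label.
    """

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        preds = np.argmax(logits, axis=-1)
        # Labels that appear in either the gold labels or the predictions.
        label_ids = np.unique(np.concatenate([labels, preds]))
        if no_relation_id is not None:
            label_ids = label_ids[label_ids != no_relation_id]
        if label_ids.size == 0:
            # Everything is no_relation: no positive predictions to score.
            precision = recall = f1 = 0.0
        else:
            precision, recall, f1, _ = precision_recall_fscore_support(
                labels, preds, labels=label_ids, average="micro", zero_division=0
            )
        return {
            "accuracy": accuracy_score(labels, preds),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }

    return compute_metrics
```

The returned function can be passed as `Trainer(..., compute_metrics=make_compute_metrics(no_relation_id))`. Using a factory keeps the excluded label id out of global state.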

## 3.2 Load the Dataset:
The **datasets** library is used to load a relation extraction dataset like **TACRED**.


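```python
# Load TACRED through the Hub loader. Replace '/path/to/tacred' with the folder
# that contains the extracted TACRED files obtained from the LDC.
dataset = load_dataset('DFKI-SLT/tacred', data_dir='/path/to/tacred')
```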

Running this cell with the placeholder path fails with `FileNotFoundError: /path/to/tacred does not exist`. TACRED is not freely downloadable: it must be obtained from the LDC at https://catalog.ldc.upenn.edu/LDC2018T24. Extract all files into one folder and pass that folder as `data_dir`, e.g. `datasets.load_dataset('DFKI-SLT/tacred', data_dir='path/to/folder/folder_name')`.