# What will we do ‚Åâ

### In this Notebook, we will perform the following steps with some changes

1. Compare Model Architectures: We will start by comparing two different model architectures suitable for the target task. This includes understanding the differences between the models in terms of their structure, complexity, and capabilities.

2. Data Loading and Preprocessing: We will implement data loading and preprocessing routines for JSON format file. This will involve reading data from the respective files, handling missing values, and preparing the input features and labels for each model.

3. Model Training and Evaluation: With the data loaded and preprocessed, we will train both model architectures separately using data generated. We will then evaluate the performance of each model on a test dataset to measure their accuracy and generalization capabilities.

4. Compare Results: Once the models are trained and evaluated, we will compare their performance and analyze the results to understand how the different model architectures and data formats impact the model's performance and predictive capabilities.

5. Considerations for Model and Data Comparison: After the comparison, we will discuss the insights gained from the experiment. We will consider the implications of using different model architectures and data formats, and how they affect the model's strengths and weaknesses in tackling the target task.

6. Best Model Selection: Based on the comparison results, we will identify the best-performing model for the specific task. The selected model will be chosen considering its performance, computational efficiency, and other relevant criteria.

----------


By conducting this experiment with different model architectures and data formats, we aim to gain valuable insights into the interplay between model choice and data representation. This will help us make informed decisions in future projects when selecting appropriate models and data formats for specific tasks and datasets.

### Data Descripsion


Format: Each line represents a single word in a sentence.
- Column 1 (Sentence ID): The sentence ID is listed in the first column.

- Column 2 (Word): This column contains the word itself.

- Column 3 (POS Tag): It contains the Part-of-Speech (POS) tag for the word.

- Column 4 (Chunking Tag): This column contains the chunking tag for the word. Chunking is the process of dividing text into syntactically related chunks or phrases.

- Column 5 (NE Label): If a word is part of a named entity, the Named Entity (NE) label is provided in this column. Otherwise, it is filled with "O" to indicate that the word does not have an NE label.

- Column 6 (Nested NE Label): This column is not used in this format and is also filled with "O".

NE labels are annotated using the IOB notation as in the CoNLL Shared Tasks. There are 7 labels: B-PER and I-PER are used for persons, B-ORG and I-ORG are used for organizations, B-LOC and I-LOC are used for locations, and O is used for other elements.


# Let's do this.

In [1]:
!pip install pandas
!pip install spacy
!pip install pyvi
!pip install spacy-transformers

Collecting pyvi
  Downloading pyvi-0.1.1-py2.py3-none-any.whl (8.5 MB)
[2K     [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m8.5/8.5 MB[0m [31m40.7 MB/s[0m eta [36m0:00:00[0m
Collecting sklearn-crfsuite (from pyvi)
  Downloading sklearn_crfsuite-0.3.6-py2.py3-none-any.whl (12 kB)
Collecting python-crfsuite>=0.8.3 (from sklearn-crfsuite->pyvi)
  Downloading python_crfsuite-0.9.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (993 kB)
[2K     [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m993.5/993.5 kB[0m [31m42.9 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: python-crfsuite, sklearn-crfsuite, pyvi
Successfully installed python-crfsuite-0.9.9 pyvi-0.1.1 sklearn-crfsuite-0.3.6
Collecting spacy-transformers
  Downloading spacy_transformers-1.2.5-cp310-cp310-manylinux_2_17_x86_64.manyli

In [2]:
import spacy
import spacy.cli
import string
import spacy_transformers
from spacy.lang.vi import Vietnamese
import pandas as pd

## Data cleaning and something else...

#### Load dataset

In [18]:
df = pd.read_csv("spaCy.csv")
df

Unnamed: 0,sentence_id,word,pos_tag,chunk_tag,ne_label,nested_ne_label
0,vn-01,B√†,N,B-NP,O,O
1,vn-01,Mai,Np,I-NP,B-PER,O
2,vn-01,l√†,V,B-VP,O,O
3,vn-01,gi√°o_vi√™n,N,B-NP,O,O
4,vn-01,t·∫°i,E,B-PP,O,O
...,...,...,...,...,...,...
919,vn-80,b√†,N,B-PER,O,O
920,vn-80,T√¥,Np,I-PER,B-PER,O
921,vn-80,Y·∫øn,N,I-PER,I-PER,O
922,vn-80,Hoa,Np,I-PER,I-PER,O


Check the NULL values in table

In [19]:
df.isnull().sum()

sentence_id         0
word               32
pos_tag             3
chunk_tag           0
ne_label            0
nested_ne_label     5
dtype: int64

#### Drop 'NULL' values before set new index of the data

In [20]:
df = df.dropna()
df

Unnamed: 0,sentence_id,word,pos_tag,chunk_tag,ne_label,nested_ne_label
0,vn-01,B√†,N,B-NP,O,O
1,vn-01,Mai,Np,I-NP,B-PER,O
2,vn-01,l√†,V,B-VP,O,O
3,vn-01,gi√°o_vi√™n,N,B-NP,O,O
4,vn-01,t·∫°i,E,B-PP,O,O
...,...,...,...,...,...,...
919,vn-80,b√†,N,B-PER,O,O
920,vn-80,T√¥,Np,I-PER,B-PER,O
921,vn-80,Y·∫øn,N,I-PER,I-PER,O
922,vn-80,Hoa,Np,I-PER,I-PER,O


In [21]:
df.isnull().sum()

sentence_id        0
word               0
pos_tag            0
chunk_tag          0
ne_label           0
nested_ne_label    0
dtype: int64

In [22]:
df = df.reset_index()
df

Unnamed: 0,index,sentence_id,word,pos_tag,chunk_tag,ne_label,nested_ne_label
0,0,vn-01,B√†,N,B-NP,O,O
1,1,vn-01,Mai,Np,I-NP,B-PER,O
2,2,vn-01,l√†,V,B-VP,O,O
3,3,vn-01,gi√°o_vi√™n,N,B-NP,O,O
4,4,vn-01,t·∫°i,E,B-PP,O,O
...,...,...,...,...,...,...,...
882,919,vn-80,b√†,N,B-PER,O,O
883,920,vn-80,T√¥,Np,I-PER,B-PER,O
884,921,vn-80,Y·∫øn,N,I-PER,I-PER,O
885,922,vn-80,Hoa,Np,I-PER,I-PER,O


In [23]:
df = df.set_index("index")

You can see, the presence of duplicate IDs within each dataset is undesirable, so we will remove them.

In [24]:
df = df.drop(["sentence_id"], axis=1)
df

Unnamed: 0_level_0,word,pos_tag,chunk_tag,ne_label,nested_ne_label
index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0,B√†,N,B-NP,O,O
1,Mai,Np,I-NP,B-PER,O
2,l√†,V,B-VP,O,O
3,gi√°o_vi√™n,N,B-NP,O,O
4,t·∫°i,E,B-PP,O,O
...,...,...,...,...,...
919,b√†,N,B-PER,O,O
920,T√¥,Np,I-PER,B-PER,O
921,Y·∫øn,N,I-PER,I-PER,O
922,Hoa,Np,I-PER,I-PER,O


Save the cleaned dataset

In [25]:
import csv
df.to_csv("/content/spaCy_vs2.csv")

## Now we need to create a new dataset for Fine tune process.

Why we need create a new dataset for Fine tune procces ‚Åâ üòï


---


In `version 1`, you can see in this [link]().
How the data significantly influences a model's results can be demonstrated by using a data format similar to the previous JSON format. This format allows us to observe the impact of data representation on the model's performance. JSON organizes data hierarchically with nested objects, which can affect how the model processes and learns from the information.

The quality, quantity, and relevance of the data play a crucial role in determining how well the model generalizes to new, unseen examples. In supervised learning, where the model learns from labeled data, the training data directly influences the model's ability to learn patterns and make accurate predictions.

By using JSON data, we can assess how the model performs compared to other data formats. We may encounter variations in data loading, preprocessing, and input representations. It will be essential to ensure that the dataset remains relevant to the task at hand, and any changes in data format do not introduce biases or inconsistencies that could affect the overall evaluation.

Throughout this experiment, we will maintain the dataset's integrity and relevance, focusing on how data preparation impacts the model's behavior and predictions. This analysis will help us understand the importance of data processing and its role in achieving optimal model performance. üòÄ

In [26]:
# Convert JSON file dataset from CSV file
import csv
import json

def csv_to_json(csv_path, json_path):
  jsonArr = []

  with open(csv_path, "r", encoding="utf-8") as csv_file:
    # Load csv file data using csv library's dictionary reader
    csvReader = csv.DictReader(csv_file)

    # Convert each csv row into Python dict
    for row in csvReader:
      jsonArr.append(row)

  with open(json_path, "w", encoding="utf-8") as json_file:
    # Use the json.dump() method with the ensure_ascii=False parameter
    # to ensure that Unicode characters are written as-is without being escaped
    json.dump(jsonArr, json_file, ensure_ascii=False, indent=4)
    # print(jsonString)

  print("Completed")

csv_path = "/content/spaCy_vs2.csv"
json_path = "/content/spaCy_vs2.json"
csv_to_json(csv_path, json_path)

Completed


And now we have the JSON file like this:
```JSON
 [
    {
        "index": "0",
        "word": "B√†",
        "pos_tag": "N",
        "chunk_tag": "B-NP",
        "ne_label": "O",
        "nested_ne_label": "O"
    },
    {
        "index": "1",
        "word": "Mai",
        "pos_tag": "Np",
        "chunk_tag": "I-NP",
        "ne_label": "B-PER",
        "nested_ne_label": "O"
    },
    {
        "index": "2",
        "word": "l√†",
        "pos_tag": "V",
        "chunk_tag": "B-VP",
        "ne_label": "O",
        "nested_ne_label": "O"
    },
    ...
 ]
```


Or you can see documentation [here](https://spacy.io/api/cli#convert) to convert from `.csv` to `.json` file.


> **In the next steps, we will use the method done in version 1 and then evaluate the effectiveness of the model.**


Make sure GPU used

In [12]:
import torch
torch.cuda.is_available()

False

# Fine-tuning model using generated datasets

Load the pre-trained model

In [13]:
nlp = Vietnamese()
nlp

<spacy.lang.vi.Vietnamese at 0x7b4fb83febf0>

Test model

In [14]:
doc_string = "Th·∫©m ph√°n - Ch·ªß t·ªça phi√™n t√≤a B√† ƒê·∫∑ng Th·ªã Tuy·∫øt H·∫£i"
doc = nlp(doc_string)
for token in doc:
    print(token)

Th·∫©m ph√°n
-
Ch·ªß
t·ªça
phi√™n t√≤a
B√†
ƒê·∫∑ng Th·ªã Tuy·∫øt H·∫£i


Look it's good, right?
But try another example.

In [15]:
docs_string = "Th∆∞ k√Ω phi√™n t√≤a: B√† Tr√† Th·ªã Th√∫y Di·ªÖm ‚Äì Th∆∞ k√Ω T√≤a √°n nh√¢n d√¢n Qu·∫≠n 10, Th√†nh ph·ªë H·ªì Ch√≠ Minh."
tokens = nlp(docs_string)
for token in tokens:
  print(token)

Th∆∞ k√Ω
phi√™n t√≤a
:
B√†
Tr√† Th·ªã
Th√∫y Di·ªÖm
‚Äì
Th∆∞ k√Ω
T√≤a √°n
nh√¢n d√¢n
Qu·∫≠n
10
,
Th√†nh ph·ªë
H·ªì Ch√≠ Minh
.


Uhh, maybe something is wrong üòü  
Ok let's start next step.

#### Import json file

In [16]:
import json

with open("/content/spaCy_vs2.json", "r") as f:
  data = json.load(f)

#### Convert the data

In this project, we need to recognize human names, so I have added some conditions to filter out human names, reduce the size of the data file, and speed up the training process. If you want to recognize more components within a sentence, replace
```python
training_data = []
for example in data:
  ...
  entities = [(0, len(text), tag) for tag in (pos_tag, ne_label) if (pos_tag == "Np" and ne_label in ("B-PER", "I-PER"))]
    if entities:
      training_data.append({"text": text, "entities": entities})
```
with the following code:
```python
training_data = []
for example in data:
  ...
  entities = [(0, len(tag), tag) for tag in (pos_tag, chunk_tag, ne_label)]
  training_data.append({"text": text, "entities": entities})
```

In [None]:
training_data = []
for example in data:
  text = example["word"].replace(string.punctuation, "")
  pos_tag = example["pos_tag"]
  chunk_tag = example["chunk_tag"]
  ne_label = example["ne_label"]

  # filter to make sure we can collect all person names
  # entities = [(0, len(text), tag) for tag in (pos_tag, ne_label) if (pos_tag == "Np" and ne_label in ("B-PER", "I-PER"))]
  entitites = [(0, len(text), tag) for tag in ne_label if ne_label in ("B-PER", "I-PER")]
  if entities:
    training_data.append({"text": text, "entities": entities})

print([x for x in training_data])

#### Import training libraries

In [86]:
from spacy.tokens import DocBin
from tqdm import tqdm
from spacy.util import filter_spans

In [87]:
nlp = spacy.blank("vi")
nlp

<spacy.lang.vi.Vietnamese at 0x7b4fb23cc760>

#### Train the model
The below code will create a custom model with the data that we give. A binary file  named `train.spacy` will be generated at the end.

In [88]:
doc_bin = DocBin()
for training_example in tqdm(training_data):
    text = training_example['text']
    labels = training_example['entities']
    doc = nlp.make_doc(text)
    ents = []
    # Process each training example and add to DocBin
    for training_example in training_data:
      text = training_example['text']
      labels = training_example['entities']
      doc = nlp.make_doc(text)
      ents = []
      for start, end, ent_label in labels:
        for tag in ent_label:
          span = doc.char_span(start, end, label=ent_label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    filtered_ents = filter_spans(ents)
    doc.ents = filtered_ents
    doc_bin.add(doc)

doc_bin.to_disk("train.spacy")

100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 140/140 [00:01<00:00, 111.41it/s]


Or you can convert the training JSON files to .spacy binary file using this command (update the file path with your own):

`!python -m spacy convert content/spaCy_vs2.json ./ -t spacy`  

see more [here](https://spacy.io/api/cli#convert)

In [89]:
!python -m spacy init fill-config base_config.cfg config.cfg

[38;5;2m‚úî Auto-filled config with all values[0m
[38;5;2m‚úî Saved config[0m
config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy


Debuging

In [90]:
!python -m spacy debug data ./config.cfg

[1m
[38;5;2m‚úî Pipeline can be initialized with data[0m
[38;5;2m‚úî Corpus is loadable[0m
[1m
Language: vi
Training pipeline: tok2vec, ner
140 training docs
140 evaluation docs
[38;5;3m‚ö† 1 training examples also in evaluation data[0m
[38;5;3m‚ö† Low number of examples to train a new pipeline (140)[0m
[1m
[38;5;4m‚Ñπ 140 total word(s) in the data (1 unique)[0m
[38;5;4m‚Ñπ No word vectors present in the package[0m
[1m
[38;5;4m‚Ñπ 1 label(s)[0m
0 missing value(s) (tokens with '-' label)
[38;5;2m‚úî Good amount of examples for all labels[0m
[38;5;2m‚úî Examples without occurrences available for all labels[0m
[38;5;2m‚úî No entities consisting of or starting/ending with whitespace[0m
[38;5;2m‚úî No entities crossing sentence boundaries[0m
[1m
[38;5;2m‚úî 6 checks passed[0m


In [91]:
!python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./train.spacy

[38;5;2m‚úî Created output directory: output[0m
[38;5;4m‚Ñπ Saving to output directory: output[0m
[38;5;4m‚Ñπ Using CPU[0m
[1m
[2023-07-22 08:28:28,974] [INFO] Set up nlp object from config
[2023-07-22 08:28:29,011] [INFO] Pipeline: ['tok2vec', 'ner']
[2023-07-22 08:28:29,018] [INFO] Created vocabulary
[2023-07-22 08:28:29,018] [INFO] Finished initializing nlp object
[2023-07-22 08:28:29,323] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
[38;5;2m‚úî Initialized pipeline[0m
[1m
[38;5;4m‚Ñπ Pipeline: ['tok2vec', 'ner'][0m
[38;5;4m‚Ñπ Initial learn rate: 0.001[0m
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00     83.33  100.00  100.00  100.00    1.00
192     200          0.05     96.68  100.00  100.00  100.00    1.00
392     400          0.00      0.00  100.00  100.00  100.00    1.00
592     600          0.00      0.00  100.00  100.00  100.00    1.0

In [92]:
nlp_ner = spacy.load("output/model-best")
nlp_ner

<spacy.lang.vi.Vietnamese at 0x7b4fb505b6a0>

### Test our model

In [93]:
doc = nlp_ner("√îng T√¥ B√¨nh Yi, sinh nƒÉm 1970 (C√≥ ƒë∆°n xin v·∫Øng m·∫∑t)")

spacy.displacy.render(doc, style="ent", jupyter=True)

In [94]:
doc1 = nlp_ner("Th∆∞ k√Ω phi√™n t√≤a: B√† Tr√† Th·ªã Th√∫y Di·ªÖm ‚Äì Th∆∞ k√Ω T√≤a √°n nh√¢n d√¢n Qu·∫≠n 10, Th√†nh ph·ªë H·ªì Ch√≠ Minh. ")
spacy.displacy.render(doc1, style="ent", jupyter=True)

In [101]:
doc2 = nlp_ner("B·ªã ƒë∆°n: √îng Nguy·ªÖn ƒêƒÉng T, sinh nƒÉm: 1989")
spacy.displacy.render(doc2, style="ent", jupyter=True)

In [103]:
ents = [(e.text, e.label_) for e in doc.ents]
ents

[('√îng', 'Np'), ('T√¥ B√¨nh Yi, sinh nƒÉm 1970 (C√≥ ƒë∆°n xin v·∫Øng m·∫∑t)', 'Np')]

# Transformer BERT using the same dataset

In [105]:
!python -m spacy init fill-config base_config_transfer.cfg config_transfer.cfg

[38;5;2m‚úî Auto-filled config with all values[0m
[38;5;2m‚úî Saved config[0m
config_transfer.cfg
You can now add your data and train your pipeline:
python -m spacy train config_transfer.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy


In [106]:
!python -m spacy train config_transfer.cfg --output ./output --paths.train ./train.spacy --paths.dev ./train.spacy

[38;5;4m‚Ñπ Saving to output directory: output[0m
[38;5;4m‚Ñπ Using CPU[0m
[1m
[2023-07-22 09:46:08,017] [INFO] Set up nlp object from config
[2023-07-22 09:46:08,036] [INFO] Pipeline: ['transformer', 'ner']
[2023-07-22 09:46:08,040] [INFO] Created vocabulary
[2023-07-22 09:46:08,041] [INFO] Finished initializing nlp object
Downloading (‚Ä¶)okenizer_config.json: 100% 28.0/28.0 [00:00<00:00, 163kB/s]
Downloading (‚Ä¶)lve/main/config.json: 100% 625/625 [00:00<00:00, 3.79MB/s]
Downloading (‚Ä¶)solve/main/vocab.txt: 100% 872k/872k [00:00<00:00, 18.3MB/s]
Downloading (‚Ä¶)/main/tokenizer.json: 100% 1.72M/1.72M [00:00<00:00, 46.6MB/s]
Downloading model.safetensors: 100% 672M/672M [00:03<00:00, 212MB/s]
Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.de

In [108]:
nlp = spacy.load("output/model-best")
nlp

<spacy.lang.vi.Vietnamese at 0x7b4fb0f92410>

### Testing

In [109]:
doc = nlp("√îng T√¥ B√¨nh Yi, sinh nƒÉm 1970 (C√≥ ƒë∆°n xin v·∫Øng m·∫∑t)")

spacy.displacy.render(doc, style="ent", jupyter=True)

In [110]:
doc1 = nlp("Th∆∞ k√Ω phi√™n t√≤a: B√† Tr√† Th·ªã Th√∫y Di·ªÖm ‚Äì Th∆∞ k√Ω T√≤a √°n nh√¢n d√¢n Qu·∫≠n 10, Th√†nh ph·ªë H·ªì Ch√≠ Minh. ")
spacy.displacy.render(doc1, style="ent", jupyter=True)

In [111]:
doc2 = nlp("B·ªã ƒë∆°n: √îng Nguy·ªÖn ƒêƒÉng T, sinh nƒÉm: 1989")
spacy.displacy.render(doc2, style="ent", jupyter=True)

In [None]:
text = 
'''N∆Ø·ªöC C·ªòNG H√íA X√É H·ªòI CH·ª¶ NGHƒ®A VI·ªÜT NAM
T√íA √ÅN NH√ÇN D√ÇN QU·∫¨N 10, TH√ÄNH PH·ªê H·ªí CH√ç MINH
- Th√†nh ph·∫ßn H·ªôi ƒë·ªìng x√©t x·ª≠ s∆° th·∫©m g·ªìm c√≥:
Th·∫©m ph√°n - Ch·ªß t·ªça phi√™n t√≤a: B√† L√™ Th·ªã Lan 
C√°c H·ªôi th·∫©m nh√¢n d√¢n:
1. B√† Nguy·ªÖn Th·ªã Thu H·∫±ng
2. √îng Nguy·ªÖn Vi T∆∞·ªùng Th·ª•y 
- Th∆∞ k√Ω phi√™n t√≤a: B√† Ph·∫°m H√† Thi√™n T√¢m - Th∆∞ k√Ω T√≤a √°n, T√≤a √°n nh√¢n d√¢n Qu·∫≠n 10, Th√†nh ph·ªë H·ªì Ch√≠ Minh.
- ƒê·∫°i di·ªán Vi·ªán ki·ªÉm s√°t nh√¢n d√¢n Qu·∫≠n 10, Th√†nh ph·ªë H·ªì Ch√≠ Minh tham gia phi√™n t√≤a: √îng Nguy·ªÖn Tu·∫•n Anh - Ki·ªÉm s√°t vi√™n
Ng√†y 06 th√°ng 01 nƒÉm 2020 t·∫°i tr·ª• s·ªü To√† √°n nh√¢n d√¢n Qu·∫≠n 10, Th√†nh ph·ªë H·ªì Ch√≠ Minh, x√©t x·ª≠ s∆° th·∫©m c√¥ng khai v·ª• √°n th·ª• l√Ω s·ªë: 629/2019/TLST-HNGƒê ng√†y 07 th√°ng 10 nƒÉm 2019 v·ªÅ tranh ch·∫•p ly h√¥n, theo Quy·∫øt ƒë·ªãnh ƒë∆∞a v·ª• √°n ra x√©t x·ª≠ s·ªë: 331/2019/QƒêXXST-HNGƒê ng√†y 12 th√°ng 12 nƒÉm 2019 v√† Quy·∫øt ƒë·ªãnh ho√£n phi√™n to√† s·ªë: 231/2019/QƒêST-HNGƒê ng√†y 25 th√°ng 12 nƒÉm 2019, gi·ªØa c√°c ƒë∆∞∆°ng s·ª±:
- Nguy√™n ƒë∆°n: B√† L√™ Ng√¢n H, sinh nƒÉm: 1989
ƒê·ªãa ch·ªâ: S·ªë 73 ƒë∆∞·ªùng Ph√≥ ƒê·ª©c Ch√≠nh, ph∆∞·ªùng V, Th√†nh ph·ªë Nha Trang, t·ªânh Kh√°nh Ho√†. (C√≥ ƒë∆°n xin v·∫Øng m·∫∑t)
- B·ªã ƒë∆°n: √îng Nguy·ªÖn ƒêƒÉng T, sinh nƒÉm: 1989
ƒê·ªãa ch·ªâ: S·ªë 132 ƒë∆∞·ªùng H√πng v∆∞∆°ng, Ph∆∞·ªùng X, Qu·∫≠n D, Th√†nh ph·ªë H·ªì Ch√≠ Minh. (V·∫Øng m·∫∑t)
N·ªòI DUNG V·ª§ √ÅN:
- T·∫°i ƒë∆°n kh·ªüi ki·ªán ng√†y 23/9/2019, c√πng c√°c t√†i li·ªáu, ch·ª©ng c·ª© c√≥ trong h·ªì s∆°, nguy√™n ƒë∆°n b√† L√™ Ng√¢n H tr√¨nh b√†y: B√† v√† √¥ng Nguy·ªÖn ƒêƒÉng T t·ª± nguy·ªán chung s·ªëng v√† ƒëƒÉng k√Ω k·∫øt h√¥n t·∫°i U·ª∑ ban nh√¢n d√¢n Ph∆∞·ªùng X, Qu·∫≠n D, Th√†nh ph·ªë H·ªì Ch√≠ Minh, theo gi·∫•y ch·ª©ng nh·∫≠n k·∫øt h√¥n s·ªë 98, quy·ªÉn s·ªë 01/2014 ng√†y 06/11/2014.
Sau khi k·∫øt h√¥n, v√¨ nhi·ªÅu nguy√™n nh√¢n trong ƒë√≥ c√≥ vi·ªác √¥ng T c√≥ quan h·ªá t√¨nh c·∫£m v·ªõi ng∆∞·ªùi ph·ª• n·ªØ kh√°c d·∫´n ƒë·∫øn v·ª£ ch·ªìng ƒë√£ b·∫Øt ƒë·∫ßu ph√°t sinh nhi·ªÅu m√¢u thu·∫´n. V√¨ mu·ªën n√≠u k√©o h·∫°nh ph√∫c gia ƒë√¨nh, b√† H ƒë√£ nhi·ªÅu l·∫ßn b·ªè qua nh∆∞ng √¥ng T v·∫´n kh√¥ng thay ƒë·ªïi. T·ª´ th√°ng 9 nƒÉm 2018, b√† H ƒë√£ d·ªçn ra kh·ªèi nh√† v√† v·ª£ ch·ªìng s·ªëng ly th√¢n cho ƒë·∫øn nay. Nh·∫≠n th·∫•y t√¨nh c·∫£m v·ª£ ch·ªìng kh√¥ng c√≤n kh·∫£ nƒÉng h√†n g·∫Øn n√™n b√† y√™u c·∫ßu To√† gi·∫£i quy·∫øt cho ly h√¥n ƒë·ªÉ ·ªïn ƒë·ªãnh cu·ªôc s·ªëng.
V·ªÅ con chung: B√† H khai, gi·ªØa b√† v√† √¥ng T chung s·ªëng kh√¥ng c√≥ con chung.
V·ªÅ t√†i s·∫£n chung: B√† H kh√¥ng y√™u c·∫ßu To√† √°n gi·∫£i quy·∫øt.
V√† n·ª£ chung: B√† H khai kh√¥ng c√≥
Ng√†y 10/12/2019, b√† H c√≥ ƒë∆°n ƒë·ªÅ ngh·ªã To√† √°n x√©t x·ª≠ v·∫Øng m·∫∑t.
To√† √°n t·ªëng ƒë·∫°t th√¥ng b√°o th·ª• l√Ω, c√°c vƒÉn b·∫£n t·ªë t·ª•ng kh√°c cho √¥ng T nh∆∞ng √¥ng T v·∫Øng m·∫∑t kh√¥ng c√≥ l√Ω do.
ƒê·∫°i di·ªán Vi·ªán Ki·ªÉm s√°t nh√¢n d√¢n Qu·∫≠n D ph√°t bi·ªÉu quan ƒëi·ªÉm v·ªÅ vi·ªác tu√¢n th·ªß ph√°p lu·∫≠t v·ªÅ t·ªë t·ª•ng c·ªßa Th·∫©m ph√°n v√† H·ªôi ƒë·ªìng x√©t x·ª≠ t·ª´ giai ƒëo·∫°n th·ª• l√Ω ƒë·∫øn khi ngh·ªã √°n l√† tu√¢n th·ªß ƒë√∫ng quy ƒë·ªãnh ph√°p lu·∫≠t, ƒë·∫ßy ƒë·ªß.
V·ªÅ n·ªôi dung: Ki·ªÉm s√°t vi√™n ƒë·ªÅ ngh·ªã ch·∫•p nh·∫≠n y√™u c·∫ßu c·ªßa nguy√™n ƒë∆°n. 
'''

documents = nlp(text)
ents = [(ents.text, ents.start_char, ents.end_char, ents.label_) for ents in documents.ents]
ents

You can see the the result is not too much better, cause 1 part of data in missing many tag likes: pos_tag, chunk_tag so the model does not reach the best state, you can see version 3 we using all tag in dataset, or Name_Entity_Recognition model in here