If you're opening this notebook on Colab, you will probably need to install 🤗 `Transformers` and 🤗 `Datasets`, as well as the other dependencies listed below:

* `datasets`
* `transformers`
* `rouge-score`
* `nltk`
* `pytorch`
* `ipywidgets`

*Note*: Since we use a GPU to speed up training of the deep learning model, `CUDA` needs to be installed on the device.

In [1]:
! pip install datasets transformers rouge-score nltk ipywidgets

Collecting datasets
  Downloading datasets-1.18.3-py3-none-any.whl (311 kB)
Collecting transformers
  Downloading transformers-4.17.0-py3-none-any.whl (3.8 MB)
Collecting rouge-score
  Downloading rouge_score-0.0.4-py2.py3-none-any.whl (22 kB)
Collecting fsspec[http]>=2021.05.0
  Downloading fsspec-2022.2.0-py3-none-any.whl (134 kB)
Collecting xxhash
  Downloading xxhash-3.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)
Collecting aiohttp
  Downloading aiohttp-3.8.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
Collecting huggingface-hub<1.0.0,>=0.1.0
  Downloading huggingface_hub-0.4.0-p

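When using `nltk`, the `punkt` sentence tokenizer models also need to be downloaded, since they are not installed automatically with the package. Without `punkt`, any call that splits text into sentences (such as `nltk.sent_tokenize`) will fail with a `LookupError`.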

In [2]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

If you're opening this notebook locally, make sure your environment has the latest versions of these libraries installed.

To be able to share your model with the community and generate results like the one shown in the picture below via the inference API, there are a few more steps to follow.

First, store your authentication token from the Hugging Face website (sign up [here](https://huggingface.co/join) if you haven't already!). Then execute the following cell and enter your username and password:

In [3]:
from huggingface_hub import notebook_login

notebook_login()

Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-credential store but this isn't the helper defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default

git config --global credential.helper store


Then you need to install `Git-LFS`.

If you are not using `Google Colab`, you may need to install `Git-LFS` manually, since the `apt` command below assumes a Debian-based Linux distribution (such as Ubuntu) and may not work on your operating system. You can read about `Git-LFS` and how to install it [here](https://git-lfs.github.com/).

In [4]:
! apt install git-lfs

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-470
Use 'apt autoremove' to remove it.
The following NEW packages will be installed:
  git-lfs
0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.
Need to get 2,129 kB of archives.
After this operation, 7,662 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 git-lfs amd64 2.3.4-1 [2,129 kB]
Fetched 2,129 kB in 1s (1,656 kB/s)
Selecting previously unselected package git-lfs.
(Reading database ... 155320 files and directories currently installed.)
Preparing to unpack .../git-lfs_2.3.4-1_amd64.deb ...
Unpacking git-lfs (2.3.4-1) ...
Setting up git-lfs (2.3.4-1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...


Make sure your version of `Transformers` is at least 4.11.0, since some of the functionality used in this notebook was introduced in that version:

In [5]:
import transformers

print(transformers.__version__)

4.17.0
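If you would rather have the notebook fail fast than check the printed version by eye, a small comparison helps. This is a minimal, standard-library-only sketch; `release_tuple` and `is_at_least` are hypothetical helpers (not part of 🤗 `Transformers`) that compare only the numeric release segment, so a pre-release such as `4.11.0.dev0` is treated as `4.11.0`. The `packaging.version` module implements the full PEP 440 ordering if you need it.

```python
import re


def release_tuple(version: str) -> tuple:
    """Parse the leading numeric release segment, e.g. '4.17.0.dev0' -> (4, 17, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed: str, required: str) -> bool:
    """Return True if `installed` >= `required`, ignoring pre-release/dev suffixes."""
    a, b = release_tuple(installed), release_tuple(required)
    # Pad the shorter tuple with zeros so that "4.11" compares equal to "4.11.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

In the notebook, you could then write `assert is_at_least(transformers.__version__, "4.11.0"), "Please upgrade Transformers"` right after the import.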


You can find a script version of this notebook to fine-tune your model in a distributed fashion using multiple GPUs or TPUs [here](https://github.com/huggingface/transformers/tree/master/examples/seq2seq).

# Fine-tuning a model on a summarization task

In this notebook, we will see how to fine-tune one of the [🤗 `Transformers`](https://github.com/huggingface/transformers) models for a summarization task. We will use the [PubMed Summarization dataset](https://huggingface.co/datasets/ccdv/pubmed-summarization), which contains PubMed articles paired with their abstracts.

![Widget inference on a summarization task](https://github.com/huggingface/notebooks/blob/master/examples/images/summarization.png?raw=1)

We will see how to easily load the dataset for this task using 🤗 `Datasets` and how to fine-tune a model on it using the `Trainer` API.

In [6]:
model_checkpoint = "deep-learning-analytics/wikihow-t5-small"

This notebook is built to run with any model checkpoint from the [Model Hub](https://huggingface.co/models), as long as that model has a sequence-to-sequence version in the Transformers library. Here we picked the [`deep-learning-analytics/wikihow-t5-small`](https://huggingface.co/deep-learning-analytics/wikihow-t5-small) checkpoint.

## Loading the dataset

We will use the [🤗 `Datasets`](https://github.com/huggingface/datasets) library to download the data and get the metric we need to use for evaluation (to compare our model to the benchmark). This can be easily done with the functions `load_dataset` and `load_metric`.  

In [7]:
from datasets import load_dataset, load_metric

raw_datasets = load_dataset("ccdv/pubmed-summarization")
metric = load_metric("rouge")


No config specified, defaulting to: pub_med_summarization_dataset/document


Downloading and preparing dataset pub_med_summarization_dataset/document to /root/.cache/huggingface/datasets/ccdv___pub_med_summarization_dataset/document/1.0.0/5792402f4d618f2f4e81ee177769870f365599daa729652338bac579552fec30...



Dataset pub_med_summarization_dataset downloaded and prepared to /root/.cache/huggingface/datasets/ccdv___pub_med_summarization_dataset/document/1.0.0/5792402f4d618f2f4e81ee177769870f365599daa729652338bac579552fec30. Subsequent calls will reuse this data.


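`load_metric("rouge")` wraps Google's `rouge_score` package. To make the numbers it reports less opaque, here is a simplified, pure-Python sketch of ROUGE-N, the clipped n-gram overlap between a generated summary and a reference. It is an illustration only, not the library's implementation: `tokenize` and `rouge_n` are hypothetical helpers, the tokenizer only approximates `rouge_score`'s default one (lowercase, alphanumeric runs), and optional stemming, ROUGE-L, and bootstrap aggregation are left out.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters/digits (an approximation of rouge_score's tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: List[str], n: int) -> Counter:
    # Count every contiguous n-token window; yields nothing if there are fewer than n tokens.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> Dict[str, float]:
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    # Counter intersection keeps the minimum count, so repeated n-grams are clipped.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if precision + recall == 0:
        fmeasure = 0.0
    else:
        fmeasure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}
```

For example, `rouge_n("the cat sat", "the cat sat on the mat")` has perfect precision but only half the recall, because the candidate covers three of the reference's six unigrams. In the notebook itself, you would keep using `metric.compute(predictions=..., references=...)`, which reports ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-Lsum by default.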

The `raw_datasets` object itself is a [`DatasetDict`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasetdict), which contains one key for each of the training, validation, and test sets:

In [8]:
raw_datasets

DatasetDict({
    train: Dataset({
        features: ['article', 'abstract'],
        num_rows: 119924
    })
    validation: Dataset({
        features: ['article', 'abstract'],
        num_rows: 6633
    })
    test: Dataset({
        features: ['article', 'abstract'],
        num_rows: 6658
    })
})

To access an actual element, you need to select a split first, then give an index:

In [9]:
raw_datasets["train"][0]

{'abstract': "<S> background : the present study was carried out to assess the effects of community nutrition intervention based on advocacy approach on malnutrition status among school - aged children in shiraz , iran.materials and methods : this case - control nutritional intervention has been done between 2008 and 2009 on 2897 primary and secondary school boys and girls ( 7 - 13 years old ) based on advocacy approach in shiraz , iran . </S> <S> the project provided nutritious snacks in public schools over a 2-year period along with advocacy oriented actions in order to implement and promote nutritional intervention . for evaluation of effectiveness of the intervention growth monitoring indices of pre- and post - intervention were statistically compared.results:the frequency of subjects with body mass index lower than 5% decreased significantly after intervention among girls ( p = 0.02 ) . </S> <S> however , there were no significant changes among boys or total population . </S> <S> 
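As the output shows, the abstracts wrap each sentence in `<S>` and `</S>` tags. If you want plain text instead, for example one sentence per line, which is the format ROUGE-Lsum expects since it treats newlines as sentence boundaries, a minimal regex-based sketch could look like this. `split_abstract` and `clean_abstract` are hypothetical helpers, not part of the original notebook, and a sentence whose closing `</S>` is missing is dropped.

```python
import re
from typing import List

# Non-greedy match of the text between an opening <S> and the next </S>.
SENTENCE_TAG = re.compile(r"<S>\s*(.*?)\s*</S>", re.DOTALL)


def split_abstract(abstract: str) -> List[str]:
    """Return the tagged sentences of an abstract, without the <S>/</S> markers."""
    return [sentence for sentence in SENTENCE_TAG.findall(abstract) if sentence]


def clean_abstract(abstract: str) -> str:
    """Join the extracted sentences with newlines, one sentence per line."""
    return "\n".join(split_abstract(abstract))
```

Applied to a whole split, this could be combined with `Dataset.map`, e.g. `raw_datasets.map(lambda ex: {"abstract": clean_abstract(ex["abstract"])})`; whether you want the tags removed depends on how you preprocess the targets later.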

Since the PubMed dataset is very large, we keep only a subset of each split: 8,000 examples for training, 2,000 for validation, and 2,000 for testing.

In [10]:
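# Keep a subset of each split. range(1, N + 1) selects indices 1..N,
# so the first example (index 0) of each split is skipped.
raw_datasets["train"] = raw_datasets["train"].select(range(1, 8001))
raw_datasets["validation"] = raw_datasets["validation"].select(range(1, 2001))
raw_datasets["test"] = raw_datasets["test"].select(range(1, 2001))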

To get a sense of what the data looks like, the following function shows some examples picked at random from the dataset.

In [11]:
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=5):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))
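The `while pick in picks` loop above redraws indices until it finds unused ones. A minimal alternative sketch, using only the standard library: `random.sample` draws distinct indices in a single call. `pick_distinct_indices` is a hypothetical helper that could replace the loop, and the optional `seed` is an added assumption for reproducibility.

```python
import random
from typing import List, Optional


def pick_distinct_indices(population_size: int, num_examples: int = 5,
                          seed: Optional[int] = None) -> List[int]:
    """Return `num_examples` distinct indices drawn uniformly from range(population_size)."""
    if num_examples > population_size:
        raise ValueError("Can't pick more elements than there are in the dataset.")
    # A local generator avoids touching the global random state; seeding it makes picks repeatable.
    rng = random.Random(seed)
    return rng.sample(range(population_size), num_examples)
```

Inside `show_random_elements`, the loop could then become `df = pd.DataFrame(dataset[pick_distinct_indices(len(dataset), num_examples)])`, since a `Dataset` accepts a list of integer indices just like the original `picks` list.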

In [12]:
show_random_elements(raw_datasets["train"])

index,article,abstract
0,"interdigitating dendritic cell sarcoma ( idcs ) is a very rare neoplasm arising from interdigitating reticular cells , which participate in the immune response as antigen presenting cells that stimulate t lymphocytes [ 1 , 2 ] . tumor occurrence is usually seen at t - cell rich areas of lymph nodes in the cervical , mediastinal , and axillary regions ; however , the involvement of extra - nodal sites such as the spleen , testis , urinary bladder , and pleura have also been reported [ 3 - 6 ] . although various treatment modalities , including surgery , radiation therapy , chemotherapy , and combinations of these , have been tried , to date there is no consensus on the preferred treatment . among chemotherapeutic regimens such as chop ( cyclophosphamide , doxorubicin , vincristine , and prednisone ) , abvd ( doxorubicin , bleomycin , vinblastine , and dacarbazine ) , dhap ( dexamethasone , cisplatin , and high - dose cytarabine ) , epoch ( etoposide , prednisone , vincristine , cyclophosphamide , and doxorubicin ) , ice ( ifosfamide , carboplatin , and etoposide ) , and cisplatin / epirubicin [ 1 , 2 , 7 ] , only abvd , currently used for the treatment of hodgkin 's lymphoma , has been successful for the treatment of disseminated idcs . until now , 2 cases of idcs have been reported in korea [ 3 , 9 ] ; 1 patient with localized idcs showed a nearly complete response to a combination of chop chemotherapy and adjuvant radiation therapy ; however , the other patient , who presented with extra - nodal involvement of the pleura , died of progressive disease after 2 cycles of chop and 1 cycle of imep ( ifosfamide , methotrexate , etoposide , and prednisolone ) . we report the first case in korea of successful disseminated idcs treatment using only abvd chemotherapy . a 64-year - old man presented to the primary care physician with a 1-year history of nasal congestion . he was referred to our hospital because of abnormal physical findings in the nasal cavity . 
rhinoscopic examination showed an ulcerative lesion in the inferior concha , and physical examination indicated multiple lymphadenopathies in both axillae and subcutaneous nodules in the left back . except for thrombocytosis ( 47210/l ) , laboratory tests did not show abnormal findings . after obtaining written informed consent , an excisional biopsy of the inferior concha the lesion showed diffuse infiltration of spindle cells , large pleomorphic cells , foamy histiocytes , lymphocytes , and plasma cells ( fig . the distinction between inflammatory conditions , such as rhinoscleroma , and neoplastic lesions , such as rosai - dorfman disease , idcs , langerhans cell histiocytosis ( lch ) , and follicular dendritic cell sarcoma ( fdcs ) , was difficult ; therefore , various immunohistochemical studies were performed . on the basis of the immunohistochemical results , i.e. , a strong positive reaction for cd68 , lysozyme , lca , and s-100 protein but a negative reaction for cd34 , cd1a , smooth muscle actin , and cd21 ( fig . computed tomography ( ct ) scans of the chest , abdomen , and pelvis showed multiple enhancing nodules in the subcutaneous layer of the back , with lymphadenopathies in both axillae ( fig . 2a ) . a head and neck ct scan showed a soft tissue attenuating lesion in the right anterior ethmoid and nasal cavities and multiple lymphadenopathies on both sides of the neck level ii ( fig . further to the previously reported successful treatment of idcs with abvd chemotherapy , the same abvd doses ( 25 mg / m adriamycin , 10 mg / m bleomycin , 6 mg / m vinblastine , and 375 mg / m dacarbazine ) were infused on days 1 and 15 every 4 weeks . during chemotherapy , no significant complications were observed . after 8 cycles , ct scans of the chest , abdomen , pelvis , and neck showed complete resolution in the lymph nodes of both axillae and the subcutaneous nodules ( fig . 3b ) . ct and pet scans showed no evidence of relapse after 1 year . 
dendritic cell neoplasms are rare tumors , but they are being diagnosed with increasing frequency . the world health organization ( who ) classifies dendritic cell neoplasms into 5 groups : lch , langerhans cell sarcoma , idcs , fdcs , and not specified otherwise . among the tumors arising from reticular cells , idcs is difficult to diagnose because of histopathologic features that are similar to other tumor types ; therefore , a high index of suspicion is required , particularly in cases with extra - nodal involvement , considering its rarity . histological findings of an excisional biopsy specimen from the nasal concha showed diffuse infiltration of spindle - shaped cells and pleomorphic cells , in addition to foamy cells with various inflammatory cells . tumor cells in lch are positive for cd1a and s-100 protein , whereas tumor cells in fdcs are positive for cd21 , cd35 , and clusterin . in this case , the tumor cells were positive for cd68 , lysozyme , lca , and s-100 protein but were negative for cd21 and cd1a ; these findings were consistent with the idcs immunophenotype [ 1 , 2 , 11 - 14 ] . wide dissemination of tumors , as in this patient , has been described in most cases of idcs . the age range in reported cases of idcs is 61 - 88 years with a mean age of 71.2 years . because of its rarity , data regarding the clinical behavior of idcs and treatment outcomes are relatively scarce . in patients with localized disease , surgical resection has been reported to be the mainstay of treatment ; however , a recurrence rate of up to 40% has been reported . the role of chemotherapy and radiotherapy in idcs treatment has not been established . to date , chemotherapy for idcs includes the use of chemotherapeutic regimens for non - hodgkin 's disease or hodgkin 's disease ; however , results of these combination chemotherapies have been very disappointing . the duration of remission was usually short with frequent recurrences . 
because of a lack of clinical data , high - dose chemotherapy and autologous bone marrow transplantation have also not been recommended [ 1 , 2 , 11 , 15 ] . the abvd regimen is the only treatment known to have resulted in complete remission for disseminated idcs to date . here , we report the first case in korea of the successful treatment of disseminated idcs with 8 cycles of chemotherapy using the abvd regimen . further studies are warranted to arrive at a consensus for the successful treatment of idcs . however , the case reported here highlighted the efficacy of the abvd regimen , and we recommend abvd chemotherapy as a feasible treatment option for disseminated idcs in the future .","<S> interdigitating dendritic cell sarcoma ( idcs ) is a very rare and aggressive neoplasm that arises from antigen presenting cells . </S> <S> idcs usually involves lymph nodes ; however , extra - nodal involvement has also been reported . </S> <S> because a consistent standard therapy for idcs has not been established to date , we report a case of the successful treatment of disseminated idcs using abvd chemotherapy ( doxorubicin , bleomycin , vinblastine , and dacarbazine ) . </S> <S> a 64-year - old man was diagnosed with idcs on the basis of immunohistochemical findings of a biopsy specimen of the inferior nasal concha . </S> <S> immunohistochemical staining showed a positive reaction for cd68 , leukocyte common antigen , and s-100 protein , but a negative reaction for cd34 , cd1a , and cd21 . </S> <S> imaging studies showed cervical and axillary lymphadenopathies , subcutaneous nodules , and a soft tissue lesion in the nasal cavity . </S> <S> treatment with the abvd regimen resulted in complete remission after 8 cycles of chemotherapy . </S>"
1,"early loss of primary teeth is most commonly caused by inappropriate oral hygiene , dental injuries , and tooth decay . tooth decay continues to be the main causative factor for the high rate of loss . as a result , those of us in the pedodontics and orthodontics practice frequently encounter malocclusion problems that occur from the premature loss of these primary teeth in children . the premature loss of primary teeth is a major factor that can cause malocclusion in the sagittal , transverse , and vertical planes . studies have shown that the premature loss of primary teeth is associated with the reduction of the dental arch length and migration of the marginal and antagonist teeth , leading to rotation , crowding , and impaction of the permanent teeth . in addition , the reduction of the dental arch length is greater in the mandible than in the maxilla if a primary second molar , rather than primary first molar , is lost . in addition , this effect is also apparent if tooth loss occurs at an earlier age , and if it occurs in crowded dentition as opposed to well - spaced dentition . for instance , early loss of primary second molar , especially in the maxillary arch , results in arch length reduction due to mesial migration of permanent first molars . premature loss in the maxillary arch may require extractions of the permanent teeth to align the dental arch , whereas premature loss in the mandible may require long - term orthodontic treatment in most cases . in yemeni children , early loss of primary teeth correlates to a marked increase in the prevalence of primary teeth extractions that are not further evaluated for spacing treatment needs . because of the severity of the consequences that arise with the premature loss of primary teeth , we decided to study the prevalence of early loss of primary teeth among children in thamar city . 
a cross - sectional study was conducted involving all the children aged between 5 and 10 years , attending the clinic of child dentistry of thamar university dental school for dental care , during the academic year 2014 - 2015 . all the children ( n = 185 ) present in the clinic of child dentistry were invited to participate in the study . all the children who fulfilled the study inclusion criteria , ( a ) 5 to 10 years of age and ( b ) a parent or guardian agreeing to participate in the study , were included in the study . the exclusion criteria included : ( a ) medically comprised children , ( b ) a parent or guardian not willing to participate , and ( c ) children with uncooperative behavior to receive a clinical examination . the study was approved by the research ethics committee of faculty of dentistry , university of thamar . all procedures were performed with adequate understanding and written consent of the parents / guardians . detailed information of all participants personal data and general health was recorded through individual interviews conducted by a researcher on the day of the dental check - up . the early loss was classified according to the chronological age table of eruption of the permanent teeth proposed by kronfeld , and decreasing 12 months as proposed by cardoso et al . we took into consideration age , sex , general health , and type of missing tooth . all statistical analyses were performed with the statistical package for social sciences , version 10 ( spss inc . , the data were analyzed using descriptive statistics techniques to obtain the absolute and percentage frequency . chi - square tests were applied to verify the existence of significant associations among the variables at a level of significance of 5% ( p < 0.05 ) . 
of the total number of children included in this study group , 75 ( 40.54% ) had prematurely lost primary teeth ( 49.33% in boys and 50.67% in girls ) , and the prevalence peak was registered at approximately the age of 8 years [ table 1 ] . distribution of children with premature loss of primary teeth according to age and gender we found a total of 170 primary teeth that were prematurely lost , with fdi tooth number 75 ( 13.5% ) [ table 2 ] being lost at the highest rate . distribution of primary teeth that were affected the most by premature loss according to gender according to the number of prematurely lost teeth per person , 30 children ( 40% ) had 1 missing tooth , 22 children ( 29.3% ) had 2 missing teeth , 11 children ( 14.7% ) had 3 missing teeth , 4 children ( 5.3% ) had 4 missing teeth , 3 children ( 4% ) had 5 missing teeth , 4 children ( 5.3% ) had 6 missing teeth , and 1 child ( 1.3% ) had 8 prematurely lost primary teeth . according to the distribution of prematurely lost teeth relative to tooth type , the molars ( 60.6% ) were the most commonly prematurely lost teeth followed by the cuspids ( 27.6% ) and the incisors ( 11.8% ) [ tables 3 and 4 ] . distribution of primary teeth that were affected by premature loss according to tooth types ( central incisors , lateral incisors , cuspids , first molars , second molars ) relative to gender distribution of primary teeth that were affected by premature loss according to tooth types ( incisors , cuspids , molars ) relative to gender according to the distribution of prematurely lost teeth relative to the dental arch , the mandibular arch ( 53.5% ) had more cases than the maxillary arch ( 46.5% ) . the mandibular left quadrant had the highest loss of primary teeth ( 28.8% ) [ tables 5 and 6 ] . 
distribution of primary teeth that were affected by premature loss according to dental quadrants relative to gender distribution of primary teeth that were affected by premature loss according to dental arches relative to gender there was a noticeable increase in the prevalence of extractions of primary teeth which were not followed by space maintenance , particularly for cases with the early loss of primary teeth among yemeni children . therefore , the major aim of this cross sectional study to give a comprehensive overview of the premature loss of primary teeth condition in the target sample in order to predict the future health care needs in preventing the disturbances in the development of normal occlusion in children , with no interest in generalizing our findings to the total yemeni population . in addition , this study is , to the best of our knowledge , the first study to explore this issue among the children in yemen . in our study , the status of premature loss of primary teeth in the study group was high with a prevalence rate of 40.54% , which is in accordance to a recent study conducted in saudi arabia . the early loss of primary teeth status in the present study was represented by a sample of children who sought dental care at the clinic of child dentistry of thamar university dental school . therefore , we can expect that these children present with more dental treatment needs than the general population . this supposition may also be made because parents , and occasionally even dentists , do not emphasize the importance of the primary dentition of the child . these parents and dentists may believe that the prevention and treatment of the primary teeth is unimportant because these teeth will eventually be replaced anyhow , and these might explain this high rate of premature loss of these teeth in the present study . 
the present study also showed that there was not a statistically significant difference in the premature loss of primary teeth between boys and girls . this finding implies that premature loss of primary teeth in the study group is due to poor oral health care , rather than gender . in this study , the majority of children had one or two teeth prematurely lost , similar to that reported in a previous study . the present study revealed that the highest percentage of premature tooth loss was at the age of 8 years , and the primary molars were more prematurely lost than other primary teeth . the most frequently missing teeth were the lower left primary second molars , which is similar to the results reported by previous studies . this finding could be observed because the likelihood of streptococci mutans acquisition in infants increases with age or as the number of erupted teeth increases . the primary molars may be particularly susceptible to initial s. mutans colonization because they emerge into the oral cavity between 16 and 29 months of age and impact both fissured occlusal surfaces and concave approximal surfaces . this can result in caries of the primary molars , that if left untreated , might result in premature extraction and thus contributing to early loss . in addition , by analyzing the distribution of premature loss on the arches , we observed a higher prevalence in the mandibular arch . the greater loss of primary teeth in the mandibular arch may be due to food packing potential and greater plaque accumulation in the mandibular posterior region . in addition , saliva has anticarious properties and is relatively abundant in the maxillary molar teeth , thus reducing the rate of premature loss of these teeth . the premature loss of the primary teeth can cause problems due to loss of function and the increased possibility of migration of other teeth . the incidence of space closure increases with the time that elapses from the moment of extraction . 
previous studies have demonstrated that the closing rate of a space is higher for the maxillary than for the mandible arch but decreases after the first 6 months . a greater amount of time elapsed because extraction positively correlates with greater space loss , especially in extractions performed for primary second molars . therefore , it is necessary to increase awareness of the importance of oral health in our children and to inform parents of the potential for malocclusion problems caused by the early loss of primary teeth . parents of children with early loss of primary teeth should be advised to bring their children to the dental clinic to have space maintainers if necessary . the main limitation in the present study is the fact that sample size was small . in addition , this study was conducted on children who sought dental care at the clinic of child dentistry of thamar university dental school , hence , we can expect that these children present with more dental treatment needs than does the general population . therefore , generalization must be made carefully as this study population may not reflect the prevelance of premature loss of primary teeth in the children in yemen . further studies are needed to address this issue by observation of large groups of children from different regions in order to form reliable conclusion . the following conclusions can be drawn from the findings of this study : \n prevalence of early loss of primary teeth was high ( 40.54% ) , and was higher at 8 years of age.majority of the children had one missing tooth ( 40%).more number of teeth was lost in the mandibular arch ( 53.5%).the lower left primary second molar was the most commonly missing teeth ( 13.5% ) . \n prevalence of early loss of primary teeth was high ( 40.54% ) , and was higher at 8 years of age . the lower left primary second molar was the most commonly missing teeth ( 13.5% ) . 
this conclusion is important in light of existing studies that revealed the malocclusion problems associated with early loss of primary teeth . therefore , it is imperative to increase oral health awareness for our children and their parents to make them realize the significance of the primary teeth and how to care for them as well as the deleterious effect of the early loss of primary teeth . parents of children with early loss of primary teeth should be advised to bring their children to the dental clinic to have space maintainers if necessary .","<S> objectives : the premature loss of primary teeth is a potential risk factor for poor arch length development . </S> <S> adequate arch length is important to the progression of the permanent teeth . </S> <S> poor arch length can lead to crowding , ectopic eruption , or impaction of these teeth . </S> <S> this study is designed to assess the prevalence of premature loss of primary teeth in the 5 - 10-year - old age group.materials and methods : the study group included 185 children , that is , 91 boys and 94 girls . </S> <S> the dental examination was conducted by an experienced examiner under sufficient artificial light . </S> <S> data including patient age and missing teeth were collected . </S> <S> descriptive statistics were applied for data analysis , and from the results , chi - square tests were used at a level of significance of 5% ( p < 0.05).results : we observed a 40.54% prevalence of premature loss of primary teeth with no statistically significant difference between genders . </S> <S> the lower left primary second molar was the most commonly absent tooth in the dental arch ( 13.5%).conclusion : the status of premature loss of primary teeth was high in the study group . </S> <S> implementation of efficient educational and preventive programs to promote oral health would help children maintain a healthy primary dentition and eventually prevent the disturbances in the future development of normal occlusion . 
</S> <S> early detection and management of the space problems associated with the early loss of primary teeth would help in reducing malocclusion problems . </S>"
2,"hip resurfacing arthroplasty ( hra ) is a conservative alternative to total hip arthroplasty in a young and active patient , with the midterm survival reported between 95% and 96% . clinical outcomes in hip resurfacing have been shown to be dependent on both patient selection and surgical technique . femoral neck fracture remains a common failure mode in hip resurfacing and mechanical error , while preparing the femoral head has been well established as a risk factor for catastrophic neck fracture . the use of computer navigation has been shown to improve the accuracy of femoral component placement , thus reducing the likelihood of preparatory error . compared to conventional instrumentation , imageless computer navigation increases component alignment accuracy and reduces outliers . there is a challenging learning curve associated with hip resurfacing , with many technical errors occurring early within a surgeon 's experience . the use of computer navigation has been demonstrated to reduce the length of the initial learning curve and improve the surgeon 's ability to perform the procedure safely . despite these demonstrated advantages , imageless computer navigation is sparsely used in many surgical centers . the lack of widespread use may be attributed to availability as well as cost of the navigation systems . considering the predisposition to technical error early on in hip resurfacing , it would be advantageous for the surgeon as a trainee to utilize computer - based methods to optimize the surgical technique and solidify component implantation methodology . evidence suggests that the use of computer navigation in the operating room may improve the accuracy of freehand component placement in the absence of navigation . thus , there may be a role for computer navigation as a training device for novice surgeons , particularly in the context of learning challenging orthopedic procedures , to improve component implantation once navigation is discontinued . 
the aim of this study was to examine whether femoral component alignment improved with conventional mechanical guidewire jig following experience with using imageless computer navigation in hsa . 213 consecutive hip resurfacings were performed by a single surgeon ( ehs ) between december 2004 and december 2008 . we retrospectively compared the first 17 ( cohort 1 ) and last 9 ( cohort 2 ) hip resurfacings performed using the conventional lateral pin guidewire alignment jig [ figure 1 ] . cohort 1 was the surgeon 's initial 17 cases of hip resurfacing , which were performed prior to our center 's acquisition of an imageless computer navigation system ( vectorvision sr , brainlab , feldkirchen , germany ) . after the center acquired the navigation system , the surgeon performed 187 hip resurfacings using the computer navigation . in december 2008 , the navigation unit required replacing . in the period pending replacement of the unit , the surgeon performed nine birmingham hip resurfacings ( bhrs ) using the conventional jig ; these nine patients comprise cohort 2 . thus , the hip resurfacings in cohort 2 were performed after the surgeon had gained significant experience with using imageless computer navigation [ figure 2 ] . the birmingham hip resurfacing conventional lateral pin femoral guidewire alignment jig ( smith and nephew inc . ) ( a ) anteroposterior ( ap ) and lateral radiographs of a right bhr implanted using a conventional guidewire alignment jig prior to any experience with imageless computer navigation . ( b ) ap and lateral radiographs of a right bhr implanted using conventional guidewire alignment jig after experience with imageless computer navigation . 
( c ) ap and lateral radiographs of a right bhr implanted using imageless computer navigation of the 17 patients comprising cohort 1 , 16 patients had a preoperative diagnosis of osteoarthritis and one patient was diagnosed with avascular necrosis , who , as a result of this diagnosis was excluded from the analysis . the mean age of the patients was 48.7 years ( sd 6.6 , range 39 - 63 ) with a mean body mass index ( bmi ) of 30.4 kg / m ( sd 3.9 , range 23.3 - 40.4 ) . cohort 2 included nine patients all of whom were males with a preoperative diagnosis of osteoarthritis . the mean age of this group was 52.6 years ( sd 10.8 , range 29 - 71 years ) with a mean bmi of 28.5 kg / m ( sd 3.0 , range 25.1 - 34.3 kg / m2 ) . the differences in age and bmi between cohorts were not found to be significant ( p > 0.203 ) . the 3-month postoperative digital anteroposterior and cross - table lateral x - rays were used for comparison . images were obtained via a computed radiography system ( directview cr850/950 ; eastman kodak , rochester , ny , usa ) using a standardized imaging technique and positioning protocol , and were stored on our institutional picture archive and communication systems server ( sienet magicstore ve50 ; siemens medical , erlangen , germany ) . an observer ( zm ) experienced in using digital radiograph templating software ( magicview 300 , siemens medical ) analyzed the radiographs , and was blinded to all patient data and operative dates . the component positions in both the coronal and sagittal planes were measured . the coronal stem shaft angle ( ssa ) was defined as the angle subtended by the diaphyseal axis of the femur and a line drawn from the center of the prosthesis along the component stem toward the lateral cortex of the femur . the sagittal stem neck angle ( sna ) was defined as the angle subtended by the neck and component stem axis . 
measured values for component alignment were compared to the preoperatively planned position determined by the senior surgeon 's ( ehs ) surgical protocol . the preoperative plan in each case positioned the component in 10 of valgus relative to the native neck shaft angle ( nsa ) of the femur in the coronal plane and neutral to the neck axis in the sagittal plane . the component was considered neutral in the sagittal plane if the degree of component anteversion or retroversion was within 10 of the native neck version . descriptive statistics were calculated using microsoft excel ( microsoft inc . , redmond , wa , usa ) to determine differences between the final component placement and the target position . spss 16 ( spss inc . , chicago , il , usa ) was used to calculate two - sample t - tests for comparison of demographics as well as alignment values between the two cohorts . statistical power was determined to be 85.2% ( = 0.05 , effect size d = 1.02 ) for the comparison of ssa . descriptive statistics were calculated using microsoft excel ( microsoft inc . , redmond , wa , usa ) to determine differences between the final component placement and the target position . chicago , il , usa ) was used to calculate two - sample t - tests for comparison of demographics as well as alignment values between the two cohorts . statistical power was determined to be 85.2% ( = 0.05 , effect size d = 1.02 ) for the comparison of ssa . coronal alignment of the femoral component in cohort 2 was more accurate than cohort 1 . the mean deviation of the ssa from the target alignment was 2.2 ( sd 2.2 , 95% ci 0.8-3.7 ) in cohort 2 and 5.6 ( sd 4.3 , 95% ci 3.6-7.6 ) in cohort 1 [ figure 3 ] . the variance of cohort 2 ( 4.9 , range 4 varus to 7 valgus ) was threefold less than cohort 1 ( 17.6 , range 14 varus to 1 valgus ) . the mean coronal alignment in cohort 1 erred in varus relative to the planned ssa [ figure 3 ] . 
the component version in cohort 2 was also more accurate than cohort 1 [ figure 4 ] . the mean deviation from the target sna of cohort 2 had a mean difference of 4.0 ( sd 2.2 , 95% ci 2.6-5.4 ) , while that of cohort 1 was 7.3 ( sd 5.3 , 95% ci 4.8-9.9 ) . the variance in cohort 2 ( 27.7 , range 8.2 retroversion to 3.6 anteversion ) was half that of cohort 1 ( 4.7 , range 17.2 retroversion to 5.8 anteversion ) [ figure 4 ] . four implants in cohort 1 were considered to be retroverted ( > 10 ) [ figure 5 ] . box and whisker plot of stem shaft angle accuracy for the two conventional jig cohorts ( negative values denote relative varus and positive values denote relative valgus ) box and whisker plot of stem - neck angle accuracy of the two conventional jig cohorts ( negative values denote retroversion and positive values denote anteversion ) comparison of the accuracy of implant positioning using a conventional jig between the pre and post navigation cohorts hip resurfacing provides a viable bone conserving option for a young , active patient with end - stage hip disease . in addition to patient selection , surgical technique contributes greatly to the clinical outcomes of the procedure . in spite of many advances in surgical technique , femoral neck fracture remains a concern with hip resurfacing and continues to be the most common reason for revision . the etiology of femoral neck fracture in hip resurfacing has been studied thoroughly , and although the causes are often multifactorial , the biomechanics of implant alignment play a large role in resurfacing construct strength and resilience . previous biomechanical studies investigating implant alignment have shown that relative valgus alignment of the femoral component strengthens the proximal femur and may be protective against neck fracture . in addition , studies looking at femoral neck notching have demonstrated that as little as a 2-mm superior femoral neck notch may increase the risk of neck fracture . 
despite this knowledge , notching of the femoral neck and femoral components implanted in relative varus are still encountered , particularly during the surgical learning period for this procedure . these adverse events may be attributed to the difficulty of the procedure or the lack of experience of the surgeon . in this study , we found that the cohort of hip resurfacing patients following experience using computer navigation ( cohort 2 ) was more accurate and showed less variance in component positioning . the mean ssa for cohort 2 was 3.4 less than the mean ssa for cohort 1 . this decrease in mean ssa for component positioning results in a decrease of stress across the superior neck , potentially reducing the risk of femoral neck fracture . further , improved accuracy of positioning in the sagittal plane may theoretically reduce the risk of impingementn [ figure 5 ] . it has been well documented that computer - assisted surgery by way of imageless navigation functions to curtail femoral implant malalignment in hip resurfacing . however , the cost and availability of current navigation systems make ubiquitous use unrealistic , particularly for those centers that perform only small volumes of hip resurfacings . in order for surgeons new to hip resurfacing to perform optimally using conventional instrumentation , it may be necessary to first train using computer - assisted methods in order to enhance both surgical technique and component insertion protocol . a concern of using computer navigation for training purposes is reliance on the technology with poor retention performance following discontinuation . thus , this study looked to establish whether femoral component implantation accuracy utilizing a conventional guidewire alignment jig improves following the use of imageless computer navigation in hsa . 
in this study , a limitation of this study is the inability to account for the learning process that would occur normally after performing a series of hip resurfacings . in a study by seyler et al . , fellowship trained staff surgeons with experience in hip resurfacing ( > 75 cases ) exhibited a greater scatter of insertion angles when using conventional instrumentation than less experienced residents using imageless navigation as a surgical aid . this not only demonstrates the accuracy of computer navigation but also that experience alone may not prevent a greater degree of inaccuracy when using conventional manual instrumentation . a second limitation is that the number of resurfacings performed using a conventional jig is small relative to the number performed using computer navigation . the optimal number of procedures to achieve competency and a higher level of accuracy using conventional guidewire alignment jigs may be smaller than in the current study . further investigation is required to determine the ideal number of training cases required in order to obtain proficiency using conventional instrumentation in hip resurfacing . lastly , the lateral pin jig utilized in this study may not be representative of other guidewire alignment devices . the results of this study may not be extrapolated to other conventional guidewire alignment jigs in hip resurfacing . in this study , all femoral components implanted with a manual jig after acquiring experience using imageless navigation achieved the desired minimum of 10 of valgus relative to the native nsa and all were considered to have neutral sna angles . this is compared to three implants which were positioned more than 10 varus relative to the target ssa and another four which were considered retroverted in the group performed prior to experience using navigation . the improved use of the manual jig may be attributable to an increased familiarity with the location of the optimal guidewire insertion point . 
often the native anatomy of the end - stage hip disease patient is distorted with osteophytes and remodeled bone which can prove problematic when using a manual jig , as alignment and positioning depend largely on a visual assessment of the local anatomy for guidewire placement . the results from this study show improved accuracy and precision using a conventional guidewire alignment jig after training with computer navigation . this improvement conflicts with the literature on cognitive motor learning which suggests that the form of feedback that computer - assisted surgery provides may actually be detrimental to learning . according to motor learning theory , individuals learn new motor skills by evaluating available feedback to alter future performance . feedback can either be intrinsic ( as a natural consequence of the action ) or extrinsic ( from an external source such as an instructor or a computer ) . computer navigation provides a form of extrinsic feedback , or continuous concurrent feedback , in which continuous visual feedback guides the trainee to the correct position , thus minimizing errors and reinforcing proper technique . it has been hypothesized , however , that concurrent feedback does not contribute to retention of task performance as a result of the learner developing dependence on extrinsic feedback or being distracted from using intrinsic feedback . in contrast , a prospective randomized study by gofton et al . analyzing the effect of computer navigation on the learning of surgical skills by trainees demonstrated that concurrent feedback during the insertion of the acetabular cup in hip replacement did not compromise the learning process of trainees . 
this finding is supported by a systematic review by saithna and dekker looking at the influence of computer navigation in hip resurfacing training in which they concluded that there exists minimal evidence to support concerns regarding the detrimental impact of computer navigation on trainee learning and subsequent performance in hip resurfacing . the study demonstrates that femoral component placement utilizing conventional instrumentation may be more accurate following experience using imageless computer navigation . training or experience using computer navigation may provide the surgeon with appropriate feedback to facilitate adequate motor skill acquisition and spatial awareness that can be transferred in turn to conventional instrumentation . the success of hip resurfacing is particularly sensitive to surgical technique and component alignment ; training with computer navigation early in the learning curve may help optimize the subsequent use of conventional hip resurfacing instrumentation .","<S> background : the use of computer navigation has been shown to improve the accuracy of femoral component placement compared to conventional instrumentation in hip resurfacing . whether exposure to computer navigation improves accuracy when the procedure is subsequently performed with conventional instrumentation without navigation has not been explored . </S> <S> we examined whether femoral component alignment utilizing a conventional jig improves following experience with the use of imageless computer navigation for hip resurfacing.materials and methods : between december 2004 and december 2008 , 213 consecutive hip resurfacings were performed by a single surgeon . </S> <S> the first 17 ( cohort 1 ) and the last 9 ( cohort 2 ) hip resurfacings were performed using a conventional guidewire alignment jig . in 187 cases , </S> <S> the femoral component was implanted using the imageless computer navigation . 
</S> <S> cohorts 1 and 2 were compared for femoral component alignment accuracy.results:all components in cohort 2 achieved the position determined by the preoperative plan . </S> <S> the mean deviation of the stem shaft angle ( ssa ) from the preoperatively planned target position was 2.2 in cohort 2 and 5.6 in cohort 1 ( p = 0.01 ) . </S> <S> four implants in cohort 1 were positioned at least 10 varus compared to the target ssa position and another four were retroverted.conclusions:femoral component placement utilizing conventional instrumentation may be more accurate following experience using imageless computer navigation . </S>"
3,"aortoesophageal fistula is a rare , devastating and usually fatal condition which has multiple aetiological factors . the first report , in 1818 , described the death of a 28-year - old soldier who exsanguinated after ingesting a beef bone fragment . a comprehensive review , published by hollander and quick , which included 500 cases of fistula , identified three major causes of aortoesophageal fistula , the main aetiologic factor being aortic disease with 54.2% of cases secondary to rupture of a descending thoracic aorta aneurysm into the oesophagus . foreign body ingestion ( 19.2% ) and advanced esophageal carcinoma ( 17.0% ) were the next commonest causes . regardless of cause , the optimal management of the fistula remains controversial , and recent literature has focused on the role of endovascular techniques and open thoracic surgery . however , high mortality from the condition usually results from massive uncontrolled haemorrhage prior to these interventions being possible . although non - surgical measures to control the initial haemodynamic insult caused by haemorrhage have been described , this remains a relatively unexplored area of management . a 47-year - old caucasian man with recently diagnosed advanced esophageal cancer attended the imaging department for an ultrasound scan . in the waiting area , the patient had a massive haematemesis and proceeded into a peri - arrest state . tech , seoul , distributed by mtw , wesel , germany ) was inserted across the lesion . ct confirmed an esophageal tumour in continuity with the right main bronchus with an associated right lung abscess . an overlapping fully covered 12 18 alveolus aero stent ( alveolus , charlotte , n.c . , 1 ) , presumably because the bronchial fistula had been adequately sealed with the stents , creating an undrained collection in the lung which subsequently fistulated through to the pleural cavity . 
while awaiting a video - assisted thoracoscopic surgical procedure to drain and seal the abscess and empyema , he spontaneously drained the abscess by coughing up nearly 1 litre of pus in 24 h ; this led to a dramatic improvement in his clinical state and resolution of his chest x - ray changes . a week before the ultrasound scan he experienced mid - thoracic pain and small episodes of haematemesis . when he collapsed , he was tachycardic ( heart rate 128 bpm ) with unrecordable blood pressure . its clearance identified a large arterial bleeding point , just above the proximal end of the stent in the mid - esophagus ( fig . a further 28 mm diameter niti - s covered esophageal stent ( taewong medical , seoul , korea ) was inserted to tamponade the bleeding point ( fig . despite aggressive resuscitation with fluids and 8 units of blood , there was no improvement in his haemodynamic status until after stent deployment . the patient was transferred to the intensive care unit where fresh bleeding continued to be aspirated from his nasogastric tube overnight . a further 8 units of blood and 6 units of fresh frozen plasma were transfused to maintain his haemoglobin between 7 and 7.7 g / dl . with this ct angiography was performed and demonstrated an aortoesophageal fistula at the level of the carina with ongoing bleeding into the mediastinum ( fig . was inserted into the thoracic aorta distal to the origin of the left subclavian artery sealing the fistula . he survived for 2 months at home , during which he was able to attend to his personal affairs , before dying of disseminated malignancy . this case highlights the successful non - surgical management of a massive bleed caused by aortoesophageal fistula . the esophageal stent acted as a means to tamponade and provided partial initial control of the bleeding point . 
this allowed time for appropriate endovascular intervention in the form of a thoracic endoluminal stent device to further stabilise the haemodynamic status of the patient . the challenge to recognise and treat successfully before the patient exsanguinates is probably why success stories for closure of aortoesophageal fistulas are few . high clinical suspicion and recognition of warning symptoms such as mid - thoracic pain and sentinel arterial upper gastrointestinal haemorrhage is crucial . transient self - limiting ' herald ' bleeds may precede fatal exsanguination by more than 24 h . this ' window of opportunity ' allows early transfer to a specialist centre . high clinical suspicion and first reported using an endoluminal stent graft for repair of abdominal aortic aneurysms , thoracic endovascular aortic repair ( tevar ) has become more common in the management of aortoesophageal fistula , with much of the literature focusing on endovascular and open surgical management . a recently reported successful surgical approach advocated open transthoracic esophageal resection , cervical esophagectomy and gastrostomy . the patient had developed a fistula secondary to previous tevar , so the authors could not decide the best approach to salvage the thoracic aorta . although successful in stabilising the patient , this highly invasive approach may not be suitable for patients with advanced esophageal malignancy . less invasive successful endovascular management of fistulas in patients with esophageal malignancy has been reported . one report describes the use of a dacron prosthesis interposed into the descending thoracic aorta to restore aortic flow as successful endovascular intervention ; the same patient later underwent an esophagectomy for definitive treatment . a separate report describes successful endovascular intervention to stabilise bleeding , although again this patient also went on to require further definitive gastrointestinal surgical management . 
further reports [ 9 , 10 , 11 ] appear to suggest that although tevar is a useful method for achieving hemodynamic stability , for definitive long - term management the patient will require some form of open surgical intervention . of course with advanced esophageal malignancy , patients may simply not be suitable for surgery , leaving tevar as their only option . the problem with tevar remains that it does not prevent immediate exsanguination in patients admitted with fistula . it was found in patients who underwent tevar that early esophageal repair appeared to improve survival . literature on the successful esophageal management of the initial exsanguination remains limited , with the successful use of a sengstaken - blakemore tube prior to tevar being reported . newer techniques have been emerging , such as the use of cyanoacrylate embolisation of the fistula followed by tevar . the use of esophageal stenting as a means of controlling initial exsanguination remains a relatively unexplored area of management . in conclusion , with early recognition , esophageal stenting may have a role in the initial emergency control of bleeding due to aortoesophageal fistula , but clearly , immediate access to anaesthetic , interventional endoscopic and radiological services is mandatory .","<S> aortoesophageal fistulas are a rare but commonly fatal complication of esophageal cancer </S> <S> . reports of successfully managed cases are few , with high mortality and morbidity usually resulting from failure to control the initial massive haemodynamic insult . </S> <S> we report the case of a 47-year - old caucasian man with recently diagnosed advanced esophageal cancer who suffered an episode of massive haematemesis . </S> <S> emergency gastroscopy revealed an arterial bleeding point in the proximal esophagus . </S> <S> a self - expanding metal esophageal stent was placed to achieve initial partial haemostasis . </S> <S> ct angiography confirmed an aortoesophageal fistula . 
</S> <S> an endoluminal stent device was thus inserted within the thoracic aorta stabilising the bleeding point . </S> <S> the patient subsequently made an uneventful recovery and was discharged on long - term antibiotics for palliative care . </S> <S> he survived for 2 months at home before dying of disseminated malignancy . </S> <S> the successful use of esophageal stenting as a means of achieving haemostasis , allowing time for endovascular intervention , is as yet a relatively unexplored area of management of this rare condition . </S>"
4,"the most common site is legs and melanomas in men are most common on the back . melanoma of the clivus is an extremely rare case presentation with only a few cases reported in the literature . conventional imaging techniques like computed tomography ( ct ) and magnetic resonance imaging ( mri ) may be suboptimal in evaluating such tumor , and may lead to inaccurate staging . a multimodality whole body imaging technique , 2-deoxy-2-[18f ] fluoro - d - glucose positron emission tomography / ct ( 18f - fdg pet / ct ) is being increasingly used in oncology for staging of multiple malignancies to know the spread of the tumor in the body . this rare case is important because it highlights the extensive disease that can be caused by a clival tumor and the role of noninvasive imaging , that is , 18f - fdg pet / ct in correct staging and hence , guiding further management of the disease . a 55-year - old woman , presented to the hospital with chief complaints of headache , decreased vision in the left eye , and occasional episodes of vomiting since 3 months . mri brain revealed altered signal intensity lesion with solid , hemorrhagic , and few cystic components in basiocciput , basisphenoid , clivus , sella , and right petrous apex ; displacing optic chiasma superiorly . there was associated soft tissue component extending into cavernous sinus with partial encasement of cavernous segment of right internal carotid artery . cemr study revealed a large moderately enhancing mass lesion involving the clivus with sellar - suprasellar extension with encasement of bilateral internal carotid arteries suggestive of plasmacytoma / chordoma or metastasis [ figure 1 ] . she underwent endonasal transsphenoidal excision of clival tumor and the black colored , relatively avascular tumor was confirmed to be melanocytic melanoma of clivus on histopathological examination . 
the patient was thoroughly examined to rule out any lesion on the skin and mucosa with other investigations including chest x - ray . a week after the surgery , this patient was referred to our department for a whole body 18f - fdg pet / ct scan for restaging . whole body pet - ct scan was performed after intravenous ( iv ) administration of 10 mci of 18f - fdg . pet and contrast - enhanced ct images were acquired and reconstructed to obtain transaxial , coronal , and sagittal views . the study revealed residual hypermetabolic well - defined lobulated soft tissue lesion in the basisphenoid and sella turcica region extending into the extraaxial space of right middle cranial fossa causing destruction of the sella turcica , sphenoid sinus , dorsal sella , and clivus ; suggestive of residual disease . also multiple metabolically active skeletal lesions were noted suggestive of skeletal metastasis [ figure 2 ] . ( a and b ) magnetic resonance images - t1 and t2 weighted axial sections of the brain ( preoperative ) showing altered signal intensity lesion with solid , hemorrhagic , and few cystic components in basiocciput , basisphenoid , clivus , sella , and right petrous apex ; displacing optic chiasma superiorly associated with soft tissue component extending into cavernous sinus with partial encasement of cavernous segment of right internal carotid artery ( a ) maximal intensity projection image of the patient from base of skull to mid - thigh showing focal areas of hypermetabolism throughout the body corresponding to multiple metastatic skeletal lesions . 
physiological uptake noted in heart , liver , bowel , kidneys , and urinary bladder , ( b ) sagittal positron emission tomography and fused pet - computed tomography images reveal abnormal fluoro-2-deoxy - d - glucose uptake in spinal column corresponding to lytic lesions on ct , ( c ) metabolically active well - defined lobulated soft tissue lesion in basisphenoid and sella turcica region , extending into the extraaxial space of right middle cranial fossa and indenting the medial temporal lobe causing destruction of the sella turcica , sphenoid sinus , dorsal sella , and clivus , ( d ) hypermetabolic lytic intradiploic lesions noted in left anterior frontal , high frontal , and parietal region eighty five percent of the patients diagnosed in early stages can be cured with surgery . primary intracranial malignant melanoma is a rare entity with incidence estimated to be 0.005 cases per 100,000 population . the age of the patients usually range from 15 - 71 years , with a peak incidence in the 5 decade . symptoms at presentation include headache ; vomiting due to intracranial hypertension ; hydrocephalus ; focal neurological deficits due to compression of the brain , spinal cord , or cauda equina ; subarachnoid hemorrhage ; and seizures . to our knowledge , very few cases of primary melanoma of the clivus have been cited in the literature previously . metastases involving this area have been previously described as a single case report or included in series with other skull base tumors . in 2009 , a literature review was performed by pallini et al . , which reveals that out of 46 patients who underwent surgery for clival bone tumor , seven proved to be metastatic , representing 0.18 and 0.42% , respectively of intracranial and skull base tumors which were treated in their institution in the study period between january 1995 and december 2007 . 
the primary tumors associated were lung adenocarcinoma ( n = 2 ) , prostate carcinoma ( n = 2 ) , skin melanoma ( n = 1 ) , hepatocarcinoma ( n = 1 ) , and lung squamous cell carcinoma ( n = 1 ) . in 2010 , chaudhary et al . , presented a case of an atypical clival meningeal melanoma treated with a multidisciplinary staged transcrural and transsphenoidal endoscopic surgical approach . no other metastases was evident for 2 years after initial symptoms and with no evidence of a cutaneous source , diagnosis of a primary meningeal lesion of the clivus was made . bone metastases occur in a significant proportion of patients with metastatic melanoma . in such patients survival , in 108 patients in 2008 , which revealed median survival following diagnosis of bone metastases in malignant melanoma to be 3.2 months ( range 0.3 - 47.4 months ) . bone metastases most commonly occurred in patients with the primary melanoma originating on the back and lower limbs and spine was the commonest site of bone involvement , followed by ribs , pelvis , long bones , and skull . fdg pet is a sensitive and specific technique for patients with melanoma but has limitations with small ( less than 1 cm ) , pulmonary , and brain metastases . it is felt to be superior to ct alone in detecting abdominal , nodal , subcutaneous , and skin sites . it is useful in assessing extent of disease in patients with surgically resectable disease by conventional methods as it may render them unresectable in a considerable population . in our patient , surgical removal of the tumor was done from an outside institution and was referred to us for further management . pet - ct scan was performed for the patient in view of histological diagnosis of melanocytic melanoma . also , no primary site could be localized after thorough general examination of the skin . the findings on pet - ct scan suggest that the clival mass represents the primary site of malignancy . 
this case adds to the literature on the occurrence of intracranial malignant melanoma in patients with extensive skeletal metastasis , and supports the finding that such tumors need to be followed carefully .","<S> malignant melanoma of the clivus is a rare entity , for which there is little evidence - based literature for guiding clinicians to understand the importance of disease staging via noninvasive imaging strategy . </S> <S> this report highlights the case of a 55-year - old lady with histopathologically confirmed melanocytic melanoma of the clivus postoperative status , with multiple skeletal metastasis , demonstrated on 2-deoxy-2-[18f ] fluoro - d - glucose positron emission tomography / computed tomography ( 18f - fdg pet / ct scan ) . the experience gained with this patient demonstrates the feasibility and usefulness of this noninvasive application in accurate staging and hence , correct decision making regarding further treatment . </S>"


The metric is an instance of [`datasets.Metric`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Metric):

In [13]:
metric

Metric(name: "rouge", features: {'predictions': Value(dtype='string', id='sequence'), 'references': Value(dtype='string', id='sequence')}, usage: """
Calculates average rouge scores for a list of hypotheses and references
Args:
    predictions: list of predictions to score. Each predictions
        should be a string with tokens separated by spaces.
    references: list of reference for each prediction. Each
        reference should be a string with tokens separated by spaces.
    rouge_types: A list of rouge types to calculate.
        Valid names:
        `"rouge{n}"` (e.g. `"rouge1"`, `"rouge2"`) where: {n} is the n-gram based scoring,
        `"rougeL"`: Longest common subsequence based scoring.
        `"rougeLSum"`: rougeLsum splits text using `"
"`.
        See details in https://github.com/huggingface/datasets/issues/617
    use_stemmer: Bool indicating whether Porter stemmer should be used to strip word suffixes.
    use_agregator: Return aggregates if this is set to True
Retu
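The `rougeL` entry in this description scores the longest common subsequence (LCS) between prediction and reference tokens. Below is a minimal sketch of that idea. It assumes lowercase whitespace tokenization and no stemming. The `rouge_score` package behind this metric has its own tokenizer and optional Porter stemming, so its numbers can differ from this illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences (dynamic programming)."""
    # prev[j] holds the LCS length of the prefix of `a` processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """Sketch of ROUGE-L precision/recall/F1 with lowercase whitespace tokenization (no stemming)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(rouge_l("the cat sat on the mat", "the cat lay on the mat"))
```

For the example pair, the LCS is `the cat on the mat`. That covers 5 of the 6 tokens on each side, so precision, recall and F1 are all 5/6.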

You can call its `compute` method with your predictions and labels, which need to be lists of decoded strings:

In [14]:
fake_preds = ["hello there", "general kenobi"]
fake_labels = ["hello there", "general kenobi"]
metric.compute(predictions=fake_preds, references=fake_labels)

{'rouge1': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rouge2': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rougeL': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rougeLsum': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0))}
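Every score is 1.0 because each prediction is identical to its reference. The `low`, `mid` and `high` fields are bootstrap confidence bounds computed over the examples, and `compute_metrics` later keeps only `mid`. The `rouge1` score counts unigram overlap. Below is a minimal sketch of ROUGE-N precision, recall and F1 for a single pair. It assumes lowercase whitespace tokenization and no stemming, and it is an illustration rather than the `rouge_score` implementation.

```python
from collections import Counter


def rouge_n(prediction, reference, n=1):
    """Sketch of ROUGE-N: clipped n-gram overlap between a prediction and a reference."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred_counts, ref_counts = ngrams(prediction), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return precision, recall, 2 * precision * recall / (precision + recall)


print(rouge_n("hello there", "hello there"))          # -> (1.0, 1.0, 1.0)
print(rouge_n("general kenobi", "general grievous"))  # -> (0.5, 0.5, 0.5)
```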

## Preprocessing the data

Before we can feed those texts to our model, we need to preprocess them. This is done by a 🤗 `Transformers` `Tokenizer`, which will (as the name indicates) tokenize the inputs (including converting the tokens to their corresponding IDs in the pretrained vocabulary), put them in the format the model expects, and generate the other inputs that the model requires.

To do all of this, we instantiate our tokenizer with the `AutoTokenizer.from_pretrained` method, which will ensure:

- we get a tokenizer that corresponds to the model architecture we want to use,
- we download the vocabulary used when pretraining this specific checkpoint.

That vocabulary will be cached, so it's not downloaded again the next time we run the cell.

For this, we need `sentencepiece` installed, since the T5 tokenizer is built on a SentencePiece vocabulary.

In [15]:
! pip install sentencepiece

Collecting sentencepiece
  Downloading sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)

Now we can instantiate the tokenizer.

In [16]:
from transformers import AutoTokenizer
    
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)


By default, the call above will use one of the fast tokenizers (backed by Rust) from the 🤗 `Tokenizers` library.

You can directly call this tokenizer on one sentence or a pair of sentences:

In [17]:
tokenizer("Hello, this one sentence!")

{'input_ids': [8774, 6, 48, 80, 7142, 55, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}

Depending on the model you selected, you will see different keys in the dictionary returned by the cell above. They don't matter much for what we're doing here (just know they are required by the model we will instantiate later). You can learn more about them in [this tutorial](https://huggingface.co/transformers/preprocessing.html) if you're interested.

Instead of one sentence, we can pass along a list of sentences:

In [18]:
tokenizer(["Hello, this one sentence!", "This is another sentence."])

{'input_ids': [[8774, 6, 48, 80, 7142, 55, 1], [100, 19, 430, 7142, 5, 1]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]}
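The two sentences produce sequences of different lengths (7 and 6 ids), so they cannot be stacked into one tensor as they are. Below is a minimal sketch of what padding to the longest sequence does. It assumes T5's pad id of 0 and right padding, which is the T5 tokenizer's default. With a real tokenizer, read `tokenizer.pad_token_id` instead of hard-coding the id.

```python
def pad_batch(batch_ids, pad_token_id=0):
    """Right-pad each id list to the longest length and build the matching attention masks."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        # 1 marks real tokens the model should attend to, 0 marks padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Token ids copied from the tokenizer output above (T5 vocabulary, 1 = </s>)
batch = [[8774, 6, 48, 80, 7142, 55, 1], [100, 19, 430, 7142, 5, 1]]
print(pad_batch(batch))
```

The second sequence gains one trailing `0`, and its attention mask ends in `0` so the model ignores that position.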

To prepare the targets for our model, we need to tokenize them inside the `as_target_tokenizer` context manager. This will make sure the tokenizer uses the special tokens corresponding to the targets:

In [19]:
with tokenizer.as_target_tokenizer():
    print(tokenizer(["Hello, this one sentence!", "This is another sentence."]))

{'input_ids': [[8774, 6, 48, 80, 7142, 55, 1], [100, 19, 430, 7142, 5, 1]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]}


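If you are using one of the five T5 checkpoints, we have to prefix the inputs with "summarize:" (the model can also translate, so it needs the prefix to know which task to perform). The checkpoint used here, `deep-learning-analytics/wikihow-t5-small`, is not in that list, so `prefix` ends up empty.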

In [20]:
if model_checkpoint in ["t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"]:
    prefix = "summarize: "
else:
    prefix = ""

We can then write the function that will preprocess our samples. We just feed them to the `tokenizer` with the argument `truncation=True`. This ensures that any input longer than the selected model can handle is truncated to the maximum length accepted by the model. Padding will be dealt with later on (in a data collator), so that we pad examples to the longest length in the batch rather than in the whole dataset.

The maximum input length of `deep-learning-analytics/wikihow-t5-small` is 512 tokens, so we set `max_input_length = 512`. The target abstracts are truncated to `max_target_length = 256` tokens.
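Truncation leaves room for the special tokens the tokenizer appends, so a truncated T5 input still ends with the end-of-sequence token. Below is a minimal sketch of that behaviour on plain id lists. It assumes a single trailing EOS token with id 1, as in T5. The real tokenizer does this internally when `truncation=True`, and the helper name and article ids here are hypothetical.

```python
def truncate_with_eos(token_ids, max_length, eos_token_id=1):
    """Hypothetical helper: truncate raw ids so that, after appending EOS, the result fits in max_length."""
    if max_length < 1:
        raise ValueError("max_length must leave room for the EOS token")
    # Reserve one position for the EOS token that is appended after truncation
    return token_ids[: max_length - 1] + [eos_token_id]


article_ids = list(range(100, 1100))  # hypothetical 1000-token article (ids are made up)
model_input = truncate_with_eos(article_ids, max_length=512)
print(len(model_input), model_input[-1])
```

The same logic applies to the targets, with `max_length=256`.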

In [21]:
max_input_length = 512
max_target_length = 256

def preprocess_function(examples):
    inputs = [prefix + doc for doc in examples["article"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)

    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(examples["abstract"], max_length=max_target_length, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

This function expects a batch of examples, i.e. a dict mapping each column name to a list of values, which is what slicing the dataset returns. In that case, the tokenizer returns a list of lists for each key:

In [22]:
preprocess_function(raw_datasets['train'][:2])

{'input_ids': [[34, 6986, 16, 72, 145, 5743, 13, 1221, 11, 164, 1535, 12669, 16, 824, 1308, 13, 1874, 7, 3, 6, 902, 16, 1221, 3, 22725, 26324, 11, 87, 127, 11423, 3918, 5, 536, 46, 11658, 19, 4802, 38, 46, 22666, 3, 30715, 593, 13, 24731, 14063, 77, 41, 3, 107, 115, 3, 61, 41, 3, 107, 115, 3, 2, 586, 3, 122, 3, 87, 3, 26, 40, 3, 61, 11, 164, 7931, 38, 3, 9, 741, 13, 8, 3, 10067, 1994, 3, 6, 19021, 3, 6, 2714, 7470, 3, 6, 26324, 3, 6, 42, 11423, 3918, 3, 5, 17413, 2116, 3130, 24, 9990, 11, 2072, 32, 3, 18, 3518, 610, 227, 11423, 3918, 3, 6, 902, 16, 819, 11, 5378, 1874, 7, 3, 6, 164, 36, 22001, 57, 46, 11658, 5, 2266, 46, 11658, 557, 4131, 29, 7, 3976, 224, 38, 13034, 3, 6, 18724, 3, 6, 11, 16633, 102, 29, 15, 9, 3, 6, 11, 2932, 164, 43, 3, 9, 2841, 1504, 30, 463, 13, 280, 41, 3, 1824, 32, 40, 3, 61, 11, 821, 2637, 16, 1221, 28, 1874, 3, 5, 2932, 3, 6, 12, 1172, 1722, 11850, 3, 6, 3, 1824, 32, 40, 3, 6, 11, 813, 6715, 7, 159, 16, 1221, 28, 1874, 3, 6, 34, 133, 36, 4360, 12, 240, 3, 9, 2
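Slicing a 🤗 `Datasets` split, as in `raw_datasets['train'][:2]`, returns column-oriented data: a dict with one list per column, not a list of rows. This is why `preprocess_function` iterates over `examples["article"]`. Below is a small sketch of converting between the two layouts. The helper names and toy rows are hypothetical.

```python
def rows_to_batch(rows):
    """Convert a list of row dicts into the column-oriented dict that batched functions receive."""
    return {key: [row[key] for row in rows] for key in rows[0]}


def batch_to_rows(batch):
    """Inverse of rows_to_batch: split a dict of equal-length lists back into row dicts."""
    keys = list(batch)
    return [dict(zip(keys, values)) for values in zip(*(batch[k] for k in keys))]


# Hypothetical toy rows with the same column names as the PubMed data used here
rows = [
    {"article": "first article text", "abstract": "first abstract"},
    {"article": "second article text", "abstract": "second abstract"},
]
batch = rows_to_batch(rows)
print(batch["article"])  # one list per column
```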

To apply this function to all the article/abstract pairs in our dataset, we use the `map` method of the `raw_datasets` object we created earlier. This applies the function to every element of every split in `raw_datasets`, so our training, validation and test data are preprocessed with a single command.

In [23]:
tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)

  0%|          | 0/8 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Even better, the results are automatically cached by the 🤗 `Datasets` library to avoid spending time on this step the next time you run your notebook. 🤗 `Datasets` is normally smart enough to detect when the function you pass to `map` has changed (in which case the cached data cannot be reused). For instance, it will detect if you change the task in the first cell and rerun the notebook. 🤗 `Datasets` warns you when it uses cached files. You can pass `load_from_cache_file=False` in the call to `map` to ignore the cached files and force the preprocessing to be applied again.
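Change detection relies on fingerprinting. A hash is derived from the previous dataset state and the transform applied to it, and the cached result is reused only when that hash matches. Below is a much-simplified sketch of the idea that hashes a function's bytecode and constants with `hashlib`. This is an illustration under that assumption, not the library's actual hashing scheme, which serializes the function with a custom pickler. The helper and the two toy functions are hypothetical.

```python
import hashlib


def function_fingerprint(fn, previous_fingerprint=""):
    """Hypothetical cache key: hash of the previous state plus the function's bytecode and constants."""
    code = fn.__code__
    hasher = hashlib.sha256()
    hasher.update(previous_fingerprint.encode())
    hasher.update(code.co_code)  # the compiled instructions
    # Literals such as prefixes or lengths (nested code objects are only repr'd here)
    hasher.update(repr(code.co_consts).encode())
    return hasher.hexdigest()[:16]


def add_prefix_v1(text):
    return "summarize: " + text


def add_prefix_v2(text):
    return "summarise: " + text  # only a constant changed


print(function_fingerprint(add_prefix_v1) == function_fingerprint(add_prefix_v1))  # same key: cache reusable
print(function_fingerprint(add_prefix_v1) == function_fingerprint(add_prefix_v2))  # different key: recompute
```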

Note that we passed `batched=True` to encode the texts in batches. This takes full advantage of the fast tokenizer we loaded earlier, which uses multi-threading to process the texts in a batch concurrently.
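With `batched=True`, `map` passes slices of the dataset to the function rather than single rows. The default batch size is 1,000 examples, so the 8,000 training examples are processed in 8 batches. Below is a minimal pure-Python sketch of that loop over a column-oriented dict. The real `map` also handles caching, multiprocessing, writing Arrow files and keeping the original columns. `batched_map` and `count_words` are hypothetical names.

```python
def batched_map(columns, function, batch_size=1000):
    """Apply `function` to consecutive slices of a column-oriented dataset and concatenate its outputs.

    Unlike Dataset.map, the input columns are not carried over into the result.
    """
    num_rows = len(next(iter(columns.values())))
    outputs = {}
    for start in range(0, num_rows, batch_size):
        batch = {key: values[start:start + batch_size] for key, values in columns.items()}
        result = function(batch)
        for key, values in result.items():
            outputs.setdefault(key, []).extend(values)
    return outputs


def count_words(batch):
    # Hypothetical stand-in for preprocess_function: one output value per input row
    return {"num_words": [len(text.split()) for text in batch["article"]]}


toy = {"article": [f"word {i}" for i in range(2500)]}  # hypothetical 2,500-row dataset
result = batched_map(toy, count_words)
print(len(result["num_words"]))  # processed as batches of 1000, 1000 and 500 rows
```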

## Fine-tuning the model

Now that our data is ready, we can download the pretrained model and fine-tune it. Since our task is of the sequence-to-sequence kind, we use the `AutoModelForSeq2SeqLM` class. Like with the tokenizer, the `from_pretrained` method will download and cache the model for us.

In [24]:
from transformers import AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer

model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)


Note that we don't get a warning about newly initialized weights, as we would in our classification example. This means all the weights of the pretrained model were used and there is no randomly initialized head in this case.

To instantiate a `Seq2SeqTrainer`, we will need to define three more things. The most important is the [`Seq2SeqTrainingArguments`](https://huggingface.co/transformers/main_classes/trainer.html#transformers.Seq2SeqTrainingArguments), which is a class that contains all the attributes to customize the training. It requires one folder name, which will be used to save the checkpoints of the model, and all other arguments are optional:

In [25]:
batch_size = 2
model_name = model_checkpoint.split("/")[-1]
args = Seq2SeqTrainingArguments(
    f"{model_name}-finetuned-pubmed",
    evaluation_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=5,
    predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
    seed = 42,
)

Here we set the evaluation to be done at the end of each epoch, tweak the learning rate, use the `batch_size` defined at the top of the cell and customize the weight decay. Since the `Seq2SeqTrainer` saves the model regularly and our dataset is quite large, we set `save_total_limit=3` so that at most three checkpoints are kept. Lastly, we use the `predict_with_generate` option (to properly generate summaries) and activate mixed precision training with `fp16=True` (to go a bit faster).
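With these arguments, the number of optimization steps follows from the dataset size. The training log reports 8,000 training examples, and 8,000 / batch size 2 gives 4,000 steps per epoch, which over 5 epochs makes 20,000 steps. Because `lr_scheduler_type` and warmup are left at their defaults, the learning rate decays linearly from `learning_rate` to 0 with no warmup. Below is a small sketch of both computations. It assumes a single device and no gradient accumulation, which the training log confirms for this run. The helper names are hypothetical.

```python
import math


def total_training_steps(num_examples, batch_size, num_epochs, grad_accum_steps=1, num_devices=1):
    """Optimizer steps for a run: batches per epoch (last partial batch included) times epochs."""
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return steps_per_epoch * num_epochs


def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Linear warmup (optional) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


steps = total_training_steps(num_examples=8000, batch_size=2, num_epochs=5)
print(steps)  # 20000, matching "Total optimization steps" in the training log
print(linear_lr(10000, steps, base_lr=2e-5))  # halfway through training: half the initial rate
```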

The `push_to_hub=True` argument sets everything up so we can push the model to the [Hub](https://huggingface.co/models) regularly during training. Remove it if you didn't follow the installation steps at the top of the notebook. You may want to save your model locally under a name that is different from the repository it will be pushed to, or push it under an organization rather than your own namespace. In either case, use the `hub_model_id` argument to set the repo name. It needs to be the full name, including the namespace, for instance `"sgugger/t5-finetuned-xsum"` or `"huggingface/t5-finetuned-xsum"`.

Then, we need a special kind of data collator, which will not only pad the inputs to the maximum length in the batch, but also the labels:

In [26]:
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
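Labels are padded with `-100` rather than the pad id. PyTorch's cross-entropy loss ignores that index by default, so padded label positions do not contribute to the loss. Because we also pass `model`, the collator can build `decoder_input_ids` from the labels. For T5, this means shifting the labels one position to the right behind a decoder start token (id 0, the same as the pad id). Below is a plain-Python sketch of both steps. The function names are hypothetical and the token ids are made up.

```python
def collate_seq2seq(features, pad_token_id=0, label_pad_token_id=-100):
    """Pad inputs with pad_token_id and labels with -100 to the longest length in the batch."""
    max_in = max(len(f["input_ids"]) for f in features)
    max_lab = max(len(f["labels"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_in = max_in - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_in)
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_in)
        # -100 is ignored by the cross-entropy loss, so padded label positions cost nothing
        batch["labels"].append(f["labels"] + [label_pad_token_id] * (max_lab - len(f["labels"])))
    return batch


def shift_right(labels, decoder_start_token_id=0, pad_token_id=0):
    """T5-style decoder inputs: prepend the start token, drop the last label, map -100 back to pad."""
    shifted = [decoder_start_token_id] + labels[:-1]
    return [pad_token_id if tok == -100 else tok for tok in shifted]


# Hypothetical token ids, just to show the shapes
features = [
    {"input_ids": [37, 1782, 1], "attention_mask": [1, 1, 1], "labels": [1782, 1]},
    {"input_ids": [37, 1], "attention_mask": [1, 1], "labels": [5, 6, 1]},
]
batch = collate_seq2seq(features)
print(batch["labels"])  # [[1782, 1, -100], [5, 6, 1]]
print([shift_right(lab) for lab in batch["labels"]])
```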

The last thing to define for our `Seq2SeqTrainer` is how to compute the metrics from the predictions. We need to define a function for this, which will just use the `metric` we loaded earlier, and we have to do a bit of pre-processing to decode the predictions into texts:

In [27]:
import nltk
import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    
    # Rouge expects a newline after each sentence
    decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = ["\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels]
    
    result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    # Extract a few results
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
    
    # Add mean generated length
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions]
    result["gen_len"] = np.mean(prediction_lens)
    
    return {k: round(v, 4) for k, v in result.items()}
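Two steps in `compute_metrics` are less obvious than the rest. The first restores the `-100` label padding to a decodable id. The second measures generated length by counting non-pad tokens, so T5's leading decoder start token (id 0, equal to the pad id) is not counted. Below is a pure-Python sketch of just those two steps on hypothetical id lists, without decoding or ROUGE, assuming pad id 0 as in T5.

```python
def restore_label_padding(labels, pad_token_id=0):
    """Replace the -100 loss-masking value with the pad id so the labels can be decoded."""
    return [[tok if tok != -100 else pad_token_id for tok in seq] for seq in labels]


def mean_generated_length(predictions, pad_token_id=0):
    """Average number of non-pad tokens per generated sequence (the `gen_len` metric)."""
    lengths = [sum(1 for tok in seq if tok != pad_token_id) for seq in predictions]
    return sum(lengths) / len(lengths)


labels = [[1782, 1, -100, -100], [5, 6, 7, 1]]  # hypothetical padded labels
predictions = [[0, 1782, 1, 0], [0, 5, 6, 1]]   # hypothetical generations; leading 0 is the decoder start token
print(restore_label_padding(labels))        # [[1782, 1, 0, 0], [5, 6, 7, 1]]
print(mean_generated_length(predictions))   # 2.5
```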

Then we just need to pass all of this along with our datasets to the `Seq2SeqTrainer`:

In [28]:
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

Cloning https://huggingface.co/Kevincp560/wikihow-t5-small-finetuned-pubmed into local empty directory.
Using amp half precision backend


We can now fine-tune our model by just calling the `train` method:

In [29]:
trainer.train()

The following columns in the training set  don't have a corresponding argument in `T5ForConditionalGeneration.forward` and have been ignored: abstract, article. If abstract, article are not expected by `T5ForConditionalGeneration.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 8000
  Num Epochs = 5
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 1
  Total optimization steps = 20000


| Epoch | Training Loss | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Gen Len |
|---|---|---|---|---|---|---|---|
| 1 | 2.5984 | 2.369621 | 10.237 | 3.8609 | 8.9776 | 9.677 | 19.0 |
| 2 | 2.5677 | 2.313223 | 9.302 | 3.4499 | 8.3816 | 8.8831 | 19.0 |
| 3 | 2.5038 | 2.28838 | 9.0578 | 3.3103 | 8.23 | 8.6723 | 19.0 |
| 4 | 2.4762 | 2.275827 | 9.0001 | 3.2882 | 8.1845 | 8.6084 | 19.0 |
| 5 | 2.4393 | 2.27024 | 8.9619 | 3.2719 | 8.1558 | 8.5714 | 19.0 |


Saving model checkpoint to wikihow-t5-small-finetuned-pubmed/checkpoint-500
Configuration saved in wikihow-t5-small-finetuned-pubmed/checkpoint-500/config.json
Model weights saved in wikihow-t5-small-finetuned-pubmed/checkpoint-500/pytorch_model.bin
tokenizer config file saved in wikihow-t5-small-finetuned-pubmed/checkpoint-500/tokenizer_config.json
Special tokens file saved in wikihow-t5-small-finetuned-pubmed/checkpoint-500/special_tokens_map.json
Copy vocab file to wikihow-t5-small-finetuned-pubmed/checkpoint-500/spiece.model
tokenizer config file saved in wikihow-t5-small-finetuned-pubmed/tokenizer_config.json
Special tokens file saved in wikihow-t5-small-finetuned-pubmed/special_tokens_map.json
Copy vocab file to wikihow-t5-small-finetuned-pubmed/spiece.model
Saving model checkpoint to wikihow-t5-small-finetuned-pubmed/checkpoint-1000
Configuration saved in wikihow-t5-small-finetuned-pubmed/checkpoint-1000/config.json
Model weights saved in wikihow-t5-small-finetuned-pubmed/checkp

TrainOutput(global_step=20000, training_loss=2.543815234375, metrics={'train_runtime': 3303.2155, 'train_samples_per_second': 12.109, 'train_steps_per_second': 6.055, 'total_flos': 5408735783878656.0, 'train_loss': 2.543815234375, 'epoch': 5.0})
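The reported throughput can be cross-checked against the run configuration. 8,000 examples over 5 epochs give 40,000 samples and 20,000 optimization steps, all processed within the reported `train_runtime`. Below is a quick check using only values copied from the output above and the training log.

```python
# Values copied from the TrainOutput above and the training log
train_runtime = 3303.2155  # seconds
num_examples, num_epochs, total_steps = 8000, 5, 20000

samples_per_second = num_examples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime
print(round(samples_per_second, 3), round(steps_per_second, 3))  # 12.109 6.055
print(f"{train_runtime / 60:.1f} minutes")
```

Both rates match the reported metrics, and the whole run took roughly 55 minutes.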

You can now upload the final result of the training to the Hub by executing this instruction:

In [30]:
trainer.push_to_hub()

Saving model checkpoint to wikihow-t5-small-finetuned-pubmed
Configuration saved in wikihow-t5-small-finetuned-pubmed/config.json
Model weights saved in wikihow-t5-small-finetuned-pubmed/pytorch_model.bin
tokenizer config file saved in wikihow-t5-small-finetuned-pubmed/tokenizer_config.json
Special tokens file saved in wikihow-t5-small-finetuned-pubmed/special_tokens_map.json
Copy vocab file to wikihow-t5-small-finetuned-pubmed/spiece.model
Several commits (2) will be pushed upstream.
The progress bars may be unreliable.



To https://huggingface.co/Kevincp560/wikihow-t5-small-finetuned-pubmed
   8dca54f..0bc5343  main -> main

To https://huggingface.co/Kevincp560/wikihow-t5-small-finetuned-pubmed
   0bc5343..abecc80  main -> main



'https://huggingface.co/Kevincp560/wikihow-t5-small-finetuned-pubmed/commit/0bc5343d659404ffb18cd5388ec016d7ed7b861a'

You can now share this model with all your friends, family and favorite pets: they can all load it with the identifier `"your-username/the-name-you-picked"`. For instance:

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("sgugger/my-awesome-model")
```