If you're opening this Notebook on colab, you will probably need to install 🤗 `Transformers` and 🤗 `Datasets` as well as other dependencies. 

* `datasets`
* `transformers`
* `rogue-score`
* `nltk`
* `pytorch`
* `ipywidgets`

*Note*: Since we are using the GPU to optimize the performance of the deep learning algorithms, `CUDA` needs to be installed on the device.

In [None]:
! pip install datasets transformers rouge-score nltk torch ipywidgets

Collecting datasets
  Downloading datasets-1.18.3-py3-none-any.whl (311 kB)
[?25l[K     |█                               | 10 kB 28.6 MB/s eta 0:00:01[K     |██                              | 20 kB 18.0 MB/s eta 0:00:01[K     |███▏                            | 30 kB 10.2 MB/s eta 0:00:01[K     |████▏                           | 40 kB 8.3 MB/s eta 0:00:01[K     |█████▎                          | 51 kB 4.4 MB/s eta 0:00:01[K     |██████▎                         | 61 kB 5.2 MB/s eta 0:00:01[K     |███████▍                        | 71 kB 5.3 MB/s eta 0:00:01[K     |████████▍                       | 81 kB 5.3 MB/s eta 0:00:01[K     |█████████▌                      | 92 kB 5.9 MB/s eta 0:00:01[K     |██████████▌                     | 102 kB 5.0 MB/s eta 0:00:01[K     |███████████▋                    | 112 kB 5.0 MB/s eta 0:00:01[K     |████████████▋                   | 122 kB 5.0 MB/s eta 0:00:01[K     |█████████████▊                  | 133 kB 5.0 MB/s eta 0:00:01

When using `nltk`, `punkt` also needs to be installed. I guess it is not installed automatically. Not having `punkt` will result in an error during the analysis.

In [None]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

If you're opening this notebook locally, make sure your environment has an install from the last version of those libraries.

To be able to share your model with the community and generate results like the one shown in the picture below via the inference API, there are a few more steps to follow.

First you have to store your authentication token from the Hugging Face website (sign up [here](https://huggingface.co/join) if you haven't already!) then execute the following cell and input your username and password:

In [None]:
from huggingface_hub import notebook_login

notebook_login()

Login successful
Your token has been saved to /root/.huggingface/token
[1m[31mAuthenticated through git-credential store but this isn't the helper defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default

git config --global credential.helper store[0m


Then you need to install `Git-LFS`.

If you are not using `Google Colab`, you may need to install `Git-LFS` manually, since the code below may not work and depending on your operating system. You can read about `Git-LFS` and how to install it [here](https://git-lfs.github.com/).

In [None]:
! apt install git-lfs

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-470
Use 'apt autoremove' to remove it.
The following NEW packages will be installed:
  git-lfs
0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.
Need to get 2,129 kB of archives.
After this operation, 7,662 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 git-lfs amd64 2.3.4-1 [2,129 kB]
Fetched 2,129 kB in 1s (2,443 kB/s)
Selecting previously unselected package git-lfs.
(Reading database ... 155320 files and directories currently installed.)
Preparing to unpack .../git-lfs_2.3.4-1_amd64.deb ...
Unpacking git-lfs (2.3.4-1) ...
Setting up git-lfs (2.3.4-1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...


Make sure your version of `Transformers` is at least 4.11.0 since the functionality was introduced in that version:

In [None]:
import transformers

print(transformers.__version__)

4.16.2


You can find a script version of this notebook to fine-tune your model in a distributed fashion using multiple GPUs or TPUs [here](https://github.com/huggingface/transformers/tree/master/examples/seq2seq).

# Fine-tuning a model on a summarization task

In this notebook, we will see how to fine-tune one of the [🤗`Transformers`](https://github.com/huggingface/transformers) model for a summarization task. We will use the [PubMed Summarization dataset](https://huggingface.co/datasets/ccdv/pubmed-summarization) which contains PubMed articles accompanied with abstracts.

![Widget inference on a summarization task](https://github.com/huggingface/notebooks/blob/master/examples/images/summarization.png?raw=1)

We will see how to easily load the dataset for this task using 🤗 `Datasets` and how to fine-tune a model on it using the `Trainer` API.

In [None]:
model_checkpoint = "facebook/bart-base"

This notebook is built to run  with any model checkpoint from the [Model Hub](https://huggingface.co/models) as long as that model has a sequence-to-sequence version in the Transformers library. Here we picked the [`facebook/bart-base`](https://huggingface.co/facebook/bart-base?text=World+War+II+or+the+Second+World+War%2C+often+abbreviated+as+WWII+or+WW2%2C+was+a+global+war+that+lasted+from+1939+to+1945.+It+involved+the+vast+majority+of+the+world%27s+countries%E2%80%94including+all+of+the+great+powers%E2%80%94forming+two+opposing+military+alliances%3A+the+Allies+and+the+Axis+powers.+In+a+total+war+directly+involving+more+than+100+million+personnel+from+more+than+30+countries%2C+the+major+participants+threw+their+entire+economic%2C+industrial%2C+and+scientific+capabilities+behind+the+war+effort%2C+blurring+the+distinction+between+civilian+and+military+resources.+Aircraft+played+a+major+role+in+the+conflict%2C+enabling+the+strategic+bombing+of+population+centres+and+the+only+two+uses+of+nuclear+weapons+in+war.+World+War+II+was+by+far+the+deadliest+conflict+in+human+history%3B+it+resulted+in+70+to+85+million+fatalities%2C+a+majority+being+civilians.+Tens+of+millions+of+people+died+due+to+genocides+%28including+the+Holocaust%29%2C+starvation%2C+massacres%2C+and+disease.+In+the+wake+of+the+Axis+defeat%2C+Germany+and+Japan+were+occupied%2C+and+war+crimes+tribunals+were+conducted+against+German+and+Japanese+leaders.) checkpoint. 

## Loading the dataset

We will use the [🤗 `Datasets`](https://github.com/huggingface/datasets) library to download the data and get the metric we need to use for evaluation (to compare our model to the benchmark). This can be easily done with the functions `load_dataset` and `load_metric`.  

In [None]:
from datasets import load_dataset, load_metric

raw_datasets = load_dataset("ccdv/pubmed-summarization")
metric = load_metric("rouge")

Downloading:   0%|          | 0.00/4.88k [00:00<?, ?B/s]

No config specified, defaulting to: pub_med_summarization_dataset/document


Downloading and preparing dataset pub_med_summarization_dataset/document to /root/.cache/huggingface/datasets/ccdv___pub_med_summarization_dataset/document/1.0.0/5792402f4d618f2f4e81ee177769870f365599daa729652338bac579552fec30...


Downloading:   0%|          | 0.00/779M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/43.7M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/43.8M [00:00<?, ?B/s]

0 examples [00:00, ? examples/s]

0 examples [00:00, ? examples/s]

0 examples [00:00, ? examples/s]

Dataset pub_med_summarization_dataset downloaded and prepared to /root/.cache/huggingface/datasets/ccdv___pub_med_summarization_dataset/document/1.0.0/5792402f4d618f2f4e81ee177769870f365599daa729652338bac579552fec30. Subsequent calls will reuse this data.


  0%|          | 0/3 [00:00<?, ?it/s]

Downloading:   0%|          | 0.00/2.16k [00:00<?, ?B/s]

The `dataset` object itself is [`DatasetDict`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasetdict), which contains one key for the training, validation and test set:

In [None]:
raw_datasets

DatasetDict({
    train: Dataset({
        features: ['article', 'abstract'],
        num_rows: 119924
    })
    validation: Dataset({
        features: ['article', 'abstract'],
        num_rows: 6633
    })
    test: Dataset({
        features: ['article', 'abstract'],
        num_rows: 6658
    })
})

To access an actual element, you need to select a split first, then give an index:

In [None]:
raw_datasets["train"][0]

{'abstract': "<S> background : the present study was carried out to assess the effects of community nutrition intervention based on advocacy approach on malnutrition status among school - aged children in shiraz , iran.materials and methods : this case - control nutritional intervention has been done between 2008 and 2009 on 2897 primary and secondary school boys and girls ( 7 - 13 years old ) based on advocacy approach in shiraz , iran . </S> <S> the project provided nutritious snacks in public schools over a 2-year period along with advocacy oriented actions in order to implement and promote nutritional intervention . for evaluation of effectiveness of the intervention growth monitoring indices of pre- and post - intervention were statistically compared.results:the frequency of subjects with body mass index lower than 5% decreased significantly after intervention among girls ( p = 0.02 ) . </S> <S> however , there were no significant changes among boys or total population . </S> <S> 

Since the `pubmed` data is extremely large, we are going to remove rows so that we have a training set of 8,000, a validation set of 2,000, and a test set of 2,000. 

In [None]:
raw_datasets["train"] = raw_datasets["train"].select(range(1, 8001))
raw_datasets["validation"] = raw_datasets["validation"].select(range(1, 2001))
raw_datasets["test"] = raw_datasets["test"].select(range(1, 2001))

To get a sense of what the data looks like, the following function will show some examples picked randomly in the dataset.

In [None]:
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=5):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [None]:
show_random_elements(raw_datasets["train"])

Unnamed: 0,article,abstract
0,"it is characterized by progressive thinning of the scalp hair and a reduction in hair density and diameter . male aga presents with a typical pattern of bitemporal and frontal recession of the hair line or vertex thinning which gradually extends anteriorly . the prevalence increases with age , from 30% for men in their 30s to 50% for men in their 50s . nongenetic causes have received little scientific attention , and data on environmental factors that may aggravate male aga remain sparse . the present population - based cross - sectional study was carried out between may and september 2011 . the study sample consisted of caucasian men aged 30 to 40 years who had uploaded a profile to one of two dating websites : jdate , which targets jewish subjects , and okcupid , a general dating service from which we selected non - jewish subjects using the search criteria . a total of 26,340 profiles were examined , each containing several photographs of the individual subject . the photographs were magnified and graded for the presence of alopecia according to the norwood classification . thereafter , randomly selected photographs were again evaluated by an independent dermatologist blinded to the results of the first observer . intra- and interobserver variability were analyzed with cramer 's reliability test and joint probability of agreement statistics . on the basis of the findings , subjects were divided into three groups : severe aga ( norwood type vi or vii ) , non - severe aga ( norwood type < vi ) , and unknown . profiles for which we were unable to ascertain the aga status were excluded from the analysis . besides religion , data on age and place of residence were collected for each subject from the websites . we calculated the age - specific prevalence of aga and the prevalence of severe aga for every country of residence that was cited by at least 50 subjects . in addition , body mass index ( bmi ) was determined for subjects for whom data on weight and height were available . logistic regression analysis was used to identify potential risk factors of aga , with aga status as the dependent variable , age and bmi as independent continuous variables , and website as an independent categorical variable . several volunteers provided written informed consent to be photographed from distances of several meters using their personal cameras ( regular photographs ) , followed by close - up head photographs of the frontal , temporal , mid - pattern , and vertex regions , similar to global photographs used in clinical trials and follow - up studies of alopecia . the regular and close - up photographs were compared for their ability to serve as a tool for predicting severe alopecia by an observer . of the 26,340 profiles included in the study , 15,091 were uploaded in jdate and 11,249 in okcupid . on average , the jdate profiles contained 3 photographs of each subject ( sd 1.9 ) , and the okcupid profiles , 4.9 photographs ( sd 4.2 ) . on the initial evaluation , intraobserver variability was 84% by cramer 's reliability test and 96% by joint probability of agreement . interobserver variability was 81% by cramer 's reliability test and 96% by joint probability of agreement . a total of 1638 subjects were excluded because their aga status could not be determined on the basis of the photographs , leaving 24,702 profiles for analysis : 14,709 on jdate and 9993 on okcupid . the overall success rate of indirect evaluation of severe aga using photographs was 94% , with a significant difference between websites : 97.46% for jdate profiles and 88.83% okcupid profiles ( p < 0.001 ) . the ability to clearly distinguish severe from nonsevere aga using this method was supported by the evaluation of the regular photographs of the volunteers [ figure 1 ] . severe aga was identified in 3786 subjects ( 2,419 on jdate and 1,367 on okcupid ) , for an overall prevalence of 15.33% . the top photographs are of two representative volunteers taken from a distance of several meters using the individual 's personal camera ( regular photographs ) . underneath each regular photograph are close - up head photographs of the same volunteers showing their frontal , temporal , mid - pattern , or vertex regions . the regular and close - up photographs were compared for their ability to serve as a tool for predicting severe alopecia by an observer table 1 shows the findings for potential risk factors . logistic regression analysis of the total 24,702 subjects yielded an increased risk of 1.092 for every yearly increase in age between 30 and 40 ( p < 0.0001 ) [ figure 2 ] . in addition , there was a positive association between the presence of severe aga and higher bmi . logistic regression analysis of 10,691men on jdate for whom data on height and weight were available yielded an increased risk of 1.027 for each unit of bmi for individuals of the same age ( p < 0.001 ) . the risk of having severe aga was higher by 1.426 for jewish men on jdate than for non - jewish men of the same age on okcupid ( p < 0.0001 ) . logistic regression analysis with estimates of odds ratios and 95% confidence intervals and p values of potential risk factors of severe androgenetic alopecia logistic regression analysis of the relationship of aga with age . there was an increased risk of 1.092 for every yearly increase in age between 30 and 40 years ( p < 0.0001 ) . the results from the two websites are compared table 2 shows the findings for the geographical analysis . comparison of the two countries listed most often by the study sample yielded a 19.89% prevalence of severe aga in the 5886 jewish men from israel and 13.75% for the 8066 jewish men from the usa ( p < 0.001 ) . the main purpose of the present study was to suggest a novel method for conducting epidemiological studies of aga using photographs and data from the internet . the secondary purpose of the study was to compare the prevalence of aga among different countries and to investigate potential risk factors for aga such as age , bmi and genetic background . obviously , we could not authenticate the collected photographs and data viewed on the internet . therefore , our working assumption was that we would find similar amount of true and false data uploaded by internet users among the different countries enabling us to compare the prevalence of aga and the potential risk factors between the investigated countries . the strengths of this method are its ease , rapidity , and low cost , and the access it provides to large populations with an international distribution . the limitations of this method are low accuracy for individual diagnoses of aga compared to the traditional method of face - to - face examination , the inert inability to verify the authenticity of the gathered data and the limited background data available for the subjects . for instance , the prevalence may be underestimated because people are more likely to upload their photograph with more hair . furthermore , we focused only on single men as a subgroup of the general male population of the studied age . for these reasons the overall studied prevalence may differ from that of the general population . yet , we were able to compare the prevalence of the particular studied population between different countries . it was also impossible to exclude other potential types of hair loss that mimic aga , such as acute and chronic telogen effluvium , diffuse or reverse ophiasis , alopecia areata , and early cicatricial alopecia . we selected a narrow study population of 24,702 caucasian men aged 30 - 40 years who had uploaded profiles to one of two major dating websites . we defined severe aga as type vi or vii in the norwood classification , and non - severe aga as any level below vi . similarly , previous epidemiological studies classified aga into two or three levels . by examining the photographs , we identified severe aga in 15.33% of the subjects . to determine the extent to which our findings might be affected by bias or confounding as a consequence of the indirect method of evaluation , we searched the medical literature for previous studies based on direct examination by face - to - face interviews in similar age groups . reported a 19% rate of severe ( hamilton - norwood types iv - vii ) frontal and vertex aga in men aged 40 to 55 years , and in a study from india , krupa shankar et al . reported an 18.52% rate of severe ( hamilton - norton type vi ) aga in men aged 30 to 35 years . in the dayton study , rhodes et al . found that 23% of men aged 30 - 39 years had severe aga ( hamilton - norwood type vi or vii ) compared to 22% in the hamilton study and only 3% in the norwood study . in a self - report study of 7250 men aged 20 - 50 years several others noted that 30% of caucasian men in their 30s have some level of aga . the similar prevalence of aga in all these studies , in the same age group as in the present study , supports the use of our novel method . the evaluation of severe aga using regular photographs was based on the rationale that frontal - pattern aga is the most common type of aga in caucasian males and vertex involvement is apparently associated with temporal and frontal involvement in virtually all patients . therefore , photographs that show the face and the frontal and temporal regions of the scalp can be used to differentiate severe from nonsevere aga [ figure 1 ] . pathomvanich performed an indirect survey of 20,000 males in bangkok based on quick and distant visual studies in shopping malls and on the street in order to evaluate the prevalence of aga in all age groups . our study offers the extra advantages of access to several photographs of each subject , several times over and at high magnitude . using the websites search criteria , we were also able to collect epidemiological data on age , religion , place of residence , and bmi , which made it possible to conduct more in - depth analyses for potential risk factors . although previous studies noted a link between aga and age , our study refined this association by focusing on a specific population within a narrow age range . our results show that in caucasian men , the risk of aga increased by 1.092 for every year between ages 30 and 40 years . in addition , the jewish population is known to have a high prevalence of several genetic diseases such as tay sachs , gaucher 's disease , familial mediterranean fever , phenylketonuria , and beta - thalassemia . aga has a known polygenetic mode of inheritance , with newly identified susceptibility genes on chromosomes 3q26 and 20p11 . thus , to determine if genetic background plays a role in the prevalence of aga , we compared subjects from jdate , a website targeted to the jewish population , with subjects from okcupid , a website targeted to the general population from which we selected the non - jewish subjects . a significant difference in the rate of aga was found ( 16.45% vs 13.68% , p < 0.0001 ) . however , several differences between the websites may have affected these results , such as the number and quality of the photographs per profile and the number of profiles from which we were able to identify the aga status . we presume that some of these differences are related to the fact that jdate charges a membership fee whereas okcupid does not . we attempted to overcome these differences by excluding the profiles from which we were unable to identify the aga status . we suggest that future comparisons be done between websites that share more features . the involvement of environmental and other nongenetic factors in aga has received little scientific attention . found significant variations in the prevalence of aga among six cities in china , which they presumptively attributed to differences in climate , lifestyle , and socioeconomic levels . in the present study , the analysis by geographical region was based on the assumption that men within the same age group who subscribe to dating websites have a similar tendency to acquire severe aga . the results showed a significant difference in prevalence among the 19 different countries cited by at least 50 subjects each ( p < 0.0001 ) . when we compared the subjects from the two most - cited locations , we found a prevalence of 19.89% among 5,886 jewish men from israel as opposed to 13.75% among 8066 jewish men from the usa ( p < 0.001 ) . these findings are consistent with those of wang et al . and suggest a possible effect of local environmental risk factors on severe aga . metabolic syndrome ( mets ) is defined by the national cholesterol education program adult treatment panel iii as the combination of three or more of the following criteria : waist circumference > 90 cm , serum triglyceride level > 150 mg / dl , high - density lipoprotein cholesterol level < 40 mg / dl , impaired fasting glucose level 110125 mg / dl , and blood pressure > 130/85 mmhg or treated hypertension . aga was found to be linked to cardiovascular diseases , coronary heart diseases , insulin resistance , hypertension , abnormal serum lipid profiles and obesity . su and chen noted that patients with severe aga ( type v or above ) had a 2.6-fold higher prevalence of mets than patients with moderate aga ( types iii and iv ) , and pathomvanich et al . suggested that the increasing incidence of obesity in bangkok may be contributing to the higher prevalence of male aga compared to other asian countries . recently reported that the risk of acquiring norwood type iv aga or greater was not increased in subjects with mets relative to those without mets . we examined the bmi of 10,691men on jdate who had included data on height and weight in their profile . there was a positive association between the presence of severe aga and higher bmi , with an increased risk of 1.027 for every unit of bmi for men of the same age . if this finding is confirmed , severe aga may serve as a predictive factor for the early diagnosis of mets and aid physicians in the prevention of its complications . in conclusion , the present study describes a novel method for conducting epidemiological research of aga using photographs and data from the internet . to the best of our knowledge , this is the largest epidemiological study to investigate aga and the first to use this method . we focused on young caucasian men within a narrow age range of 30 - 40 years . our findings link aga with advancement in age and with higher bmi , and show significant differences in the prevalence of aga among countries pointing to some potential environmental risk factors . \n evaluation of photographs and data from the internet can serve as a novel method of studying the international epidemiology of androgenetic alopecia.high body mass index and exposure to high environmental levels of ultraviolet radiation may aggravate aga . \n evaluation of photographs and data from the internet can serve as a novel method of studying the international epidemiology of androgenetic alopecia . high body mass index and exposure to high environmental levels of ultraviolet radiation may aggravate aga .","<S> background : the epidemiological evaluation of androgenetic alopecia ( aga ) is based mainly on direct observation and questionnaires . </S> <S> the international epidemiology and environmental risk factors of aga in young caucasian men remain unknown.aim:to use photographs and data from the internet to evaluate severe aga and generate greater understanding of the international epidemiology of the disorder in young caucasian men.materials and methods : a population - based cross - sectional study design was used . </S> <S> the sample included 26,340 caucasian men aged 30 to 40 years who had uploaded profiles to two dating websites . </S> <S> their photographs were evaluated for aga and graded as follows : severe aga ( norwood type vi - vii ) , non - severe aga , and unknown . </S> <S> epidemiological data were collected from the sites . </S> <S> logistic regression was used to analyze the effect of risk factors on the prevalence of severe aga.resultsthe overall success rate for identifying severe aga by indirect evaluation of internet photographs was 94% . </S> <S> the prevalence of severe aga was 15.33% overall and varied significantly by geographical region . </S> <S> the risk of having severe aga was increased by 1.092 for every year of age between 30 and 40 years . </S> <S> severe aga was more prevalent in subjects with higher body mass index.conclusions:photographs from the internet can be used to evaluate severe aga in epidemiological studies . </S> <S> the prevalence of severe aga in young caucasian men increases with age and varies by geographical region . </S> <S> body mass index is an environmental risk factor for severe aga . </S>"
1,"vision and the somatosensory and vestibular sensory systems play an important role in \n posture control1,2,3,4,5 . vision , based on a change \n in the information projected on the retina , guides the relationship between the environment \n and the body . plantar sensation provides information on the base of support and position of \n the center of gravity . vestibular sensation detects gravity and acceleration to provide \n information about the position and movement of the head . if these sensory systems are \n damaged due to brain diseases , failures in standing posture control are elicited , even \n without motor paralysis . pusher syndrome , wallenberg s syndrome , thalamic astasia , \n unilateral spatial neglect , etc . although there are previous studies about the \n change in standing posture due to vibration stimulation and/or vestibular stimulation10,11,12 , there are few studies about the role of \n visual intervention in standing posture . prism glasses can bias a view on sagittal and/or \n horizontal planes , but can not shift it on coronal plane13 , 14 . although there is a \n method of tilting seat surface during sitting , this method can not be an intervention of only \n vision because it influences on somatosensory and vestibular sensory15 . a tool to cause a lateral tilt of a view on the coronal \n plane was only a large screen16 , 17 . however , recently , an inexpensive \n immersive head - mounted display ( hmd ) has been developed and applied in rehabilitation18 . in the present study , we developed a \n visual inclination system by using an hmd for investigating a visual intervention ( tilted \n view ) on standing posture control . eleven healthy male university students ( mean standard deviation : age , 21.5 1.5 years ; \n height , 170.9 6.1 cm ; weight , 64.4 8.0 kg ) participated in this experiment . we excluded participants \n who wore glasses on a daily basis and those with any orthopedic , neurological , ophthalmic or \n otolaryngological disease . the ethics committee of the international university of health \n and welfare approved all study procedures ( no . 15-io-58 ) , which were consistent with the \n principles of the declaration of helsinki . the authors obtained written informed consent \n from all the subjects prior to their participation in the study . ca , usa ) , a small stereo camera \n ( ovrvision 1 , shinobiya inc . , osaka , japan ) , and a laptop pc ( 15x8550-i7-vsb , unitcom , \n osaka , japan ) were employed to constitute a visual inclination system for the experiment \n ( fig . ( c ) an hmd shows the tilted \n visual information to the wearer . ) . the subjects were asked to maintain the standing posture twice for 5 s while \n wearing the experimental system . they were instructed to perform the experiment under two \n visual conditions : normal view and 20 leftward inclined view conditions , in that order . two \n force plates were used to measure the vertical component of the floor reaction force of each \n leg . a three - dimensional motion analysis system was used to quantify the subjects body \n movements ( i.e. inclination angles of the head ( h ) , trunk \n ( t ) , and pelvis ( p ) in \n absolute coordinates ) . we adopted several angle definitions , h , \n t , and p , which are described \n in fig 2fig . 2.definition of the h , t , and \n p . the black dots are reflective markers on top of \n the head , bilateral acromions , anterior superior iliac spine , and posterior superior \n iliac spine . point b is the midpoint of points c and \n d. points c and d are iliac \n crests estimated from the position of the markers on the pelvis . points \n e and f are midpoints of the anterior superior \n iliac spine and posterior superior iliac spine . h is the \n angle between the axis connecting from the top of the head to point a \n and the vertical axis . t is the angle between the axis \n connecting from point a to b and the vertical axis . \n p is the angle between the axis connecting from \n point e to f and the horizontal plane .. furthermore , we defined the relative head and trunk bending angles as the difference \n between the inclination angles of the head and trunk , and those of the trunk and pelvis , \n respectively . we defined a leftward inclination angle as positive and a rightward \n inclination angle as negative . visual inclination system . definition of the h , t , and \n p . the black dots are reflective markers on top of \n the head , bilateral acromions , anterior superior iliac spine , and posterior superior \n iliac spine . point b is the midpoint of points c and \n d. points c and d are iliac \n crests estimated from the position of the markers on the pelvis . points \n e and f are midpoints of the anterior superior \n iliac spine and posterior superior iliac spine . h is the \n angle between the axis connecting from the top of the head to point a \n and the vertical axis . t is the angle between the axis \n connecting from point a to b and the vertical axis . \n p is the angle between the axis connecting from \n point e to f and the horizontal plane . a three - dimensional motion analysis system consisting of 10 infrared cameras ( vicon mx , \n vicon , oxford , uk ) and two force plates ( amti , watertown , ma , usa ) was used to record \n three - dimensional marker displacements and floor reaction force data at a sampling frequency \n of 100 hz . thirty - three reflective markers ( helen - hayes marker set ) were attached to each \n subject . in the analysis , we used seven markers , which were attached to the top of the head , \n the bilateral acromions , anterior superior iliac spine , and posterior superior iliac spine \n of the participants . a two - tailed paired t - test was used to assess individual differences between the inclined \n view and normal view conditions . the statistical analysis was conducted using the software package spss version \n 20 ( ibm inc . the result of the paired t - test demonstrated that the vertical component of floor reaction \n forces in both legs significantly changed in the inclined view condition . the vertical \n component of floor reaction forces decreased in the right leg ( 324.4 38.6 vs. 303.4 51.3 \n n , p=0.03 ) , whereas it increased in the left leg ( 322.3 53.3 vs. 345.9 52.2 n , \n p=0.02 ) . table 1table 1.the mean inclination angles of the head , trunk , and pelvis , and the relative \n bending angle of the head and trunknormal view conditioninclined view conditionhead leftward inclination angle ( )0.2 2.31.7 3.7*trunk leftward inclination angle ( )0.8 0.82.0 0.9*pelvis leftward inclination angle ( )1.2 2.00.9 1.8neck leftward bending angle ( )1.0 2.70.3 3.3trunk leftward bending angle ( )0.4 1.91.1 2.1*values are expressed as a mean standard deviation . * significant difference \n ( p<0.05 ) between the normal view condition and the inclined view condition . represents the mean inclination angles of the head , trunk , and pelvis , and the \n relative bending angle of the head and trunk . in the comparison between the normal view and \n inclined view conditions , the paired t - test indicated that there was a significant increase \n in the inclination angle of the head and trunk and the relative bending angle of the trunk , \n but not in the inclination angle of the pelvis and the relative bending angle of the \n head . * significant difference \n ( p<0.05 ) between the normal view condition and the inclined view condition . in this study , we created a novel hmd system to alter inclined standing posture in a group \n of male university students . by using this system , the head and trunk of participants \n inclined leftward and the vertical component of the floor reaction force of the lower \n extremities inclined leftward due to the view presented on the display . these results are \n identical to those of a previous experiment with a large - sized screen16 , 17 . the present \n study proves that it is possible to elicit a change of standing posture due to a visual \n stimulus using an immersive hmd , and a large - scale apparatus is no longer necessary . as for the inclination angle of each body segment , even though the head and trunk angles \n inclined toward the tilted direction , the angle of the pelvis did not incline . the reason \n why inclination in the pelvis did not occur is that it could not physically occur when the \n participant stood with both legs extended . therefore , inclination of the pelvis might occur \n in dynamic movements such as walking . as for the relative head and trunk bending angles , the effect of the tilted view was \n confirmed only in the trunk . the reason why the neck did not bend to the side is that a very \n large number of muscle spindles were distributed in the neck muscles19 ; therefore , it seemed that these muscle spindles \n compensated for the proprioceptive sensation . the previous research about siting posture \n using an electric balance board also showed lateral bending of trunk and no lateral bending \n of the neck during tilting the seat surface15 . these experimental results suggest that lateral bending of the \n trunk is more available than the neck in the postural control using vision . the inclination angle of the standing posture observed in the present study was smaller \n than the tilt angle of view . it seemed that the reason was compensation by the somatosensory \n or the vestibular sensory systems . somatosensory function decreases with age , and therefore \n the elderly tend to rely on vision for postural control20 , 21 . thus , it seems that \n visual inclination has a large effect in the elderly . in addition , if the presented tilted \n view is combined with a vibration stimulus or vestibular stimulation , compensation due to \n the somatosensory and the vestibular sensory systems will be difficult . therefore , the body \n inclination effect might be enhanced . when the tilt angle is too small , the \n effect of inclining the body is low . on the other hand , even if the tilt angle is too large , \n the effect of inclining the body becomes low too , because the subject is no longer trust the \n view . there is a previous study examining the effect of the standing posture when changing \n the tilt angle of the view projected on a large screen17 . the study compared three tilt angle conditions ( 5.1 , 9.1 and \n 20.1 ) and reported that 20.1 is the most effective . it suggests that the tilt angle of \n view in the present study is appropriate . since the immersive hmd is wearable , unlike large screens or mirrors , its advantage is that \n it continues to provide visual information to the wearer even when moving forward or \n changing direction . therefore , it is possible to use this system to reveal the effects of \n the tilted view during walking or turning movements . furthermore , we may be able to use the \n visual inclination system for balance exercises to treat vertical misperception in brain \n disease patients . the balance exercises should be adjusted to the degree of difficulty for \n maintaining balance for individuals22 . if \n patients who can not maintain an upright posture because of severe impairment of vertical \n perception , it may be easier for them to hold the upright posture by seeing a view that is \n inclined opposite to the inclination of the vertical axis that the patient is aware of . on \n the other hand , there are patients with unstable gait due to the mild impairment of vertical \n perception . for such patients , the balance exercises in normal visual conditions are less \n effective because the degree of difficulty is low . thus , presenting an inclined view that \n emphasizes the inclination of the vertical axis that the patients are aware of might be able \n to increase the degree of difficulty of the balance exercises to an appropriate level . \n moreover , if the presented tilted view is combined with vibration and vestibular stimuli , \n the training effect may be increased further10,11,12 . the first is the possibility that the weight of the hmd \n affected somatosensory perception , although it was truly a lightweight device . use of a \n lighter hmd may increase the effect of inclination in the standing posture . the second is \n that we did not strictly define the visual environment in the experimental room . however , we \n wanted to ensure that our results would not be affected by the presence or absence of \n vertical products . the third is that what we measured is only the effect of the presentation \n of the leftward tilted view in order to avoid the burdens of the subjects . inclination of standing posture due \n to brain disease is likely to occur more toward the left than right6 . therefore , the effect of the presentation of the rightward \n tilted view may be less than that of the leftward tilted view . the fourth is that we did not \n measure the sensory modality that the subject was focused upon . a method to determine the \n sensory modality that the subject focused on should be developed . the fifth limitation is \n that we did not consider the effects of aging , since our study group consisted of young male \n participants . however , the current study revealed that presenting a tilted view using immersive hmd can \n shift the relative trunk bending angle and center of gravity toward the same direction of \n the tilted view . the developed visual inclination system seems useful and it can be \n applicable to various psychophysical experiments in future research .","<S> [ purpose ] the purpose of the present study is to clarify whether tilted scenery presented \n through an immersive head - mounted display ( hmd ) causes the inclination of standing \n posture . </S> <S> [ subjects and methods ] eleven healthy young adult males who provided informed \n consent participated in the experiment . </S> <S> an immersive hmd and a stereo camera were employed \n to develop a visual inclination system . </S> <S> the subjects maintained a standing posture twice \n for 5s each while wearing the visual inclination system . </S> <S> they performed this task under \n two conditions : normal view and 20 leftward tilted view . </S> <S> a three - dimensional motion \n analysis system was used to measure the subjects postures , and two force plates were used \n to measure the vertical component of the floor reaction force of each leg . </S> <S> [ results ] in \n the 20 leftward tilted view , the head and trunk angles in the frontal plane were \n similarly inclined toward the left , and the vertical component of the floor reaction force \n increased in the left leg , whereas it decreased in the right leg . </S> <S> [ conclusion ] when the \n view in the immersive hmd was tilted , the participants trunk side bent toward the same \n side as that of the view . </S> <S> this visual inclination system seems to be a simple intervention \n for changing standing posture . </S>"
2,"von hippel - lindau disease ( vhl ; mim # 193300 ) is a hereditary multi - systemic tumour syndrome that pre - disposes affected individuals to haemangioblastomas of the central nervous system and retina , pheochromocytomas , clear - cell renal carcinomas , adenomas and carcinomas of the pancreas , paragangliomas , renal and pancreatic cysts , papillary cystadenomas of the epididymis and , rarely , cystadenomas of the endolymphatic sac and broad ligament . vhl affects approximately 1 in 36,000 newborns and is transmitted in an autosomal dominant manner with a penetrance of more than 90% by the age of 65 years [ 1 , 2 ] . vhl is a tumour suppressor gene located on chromosome 3p2526 [ 3 , 4 ] . the gene consists of three exons , is highly conserved across species , and is ubiquitously expressed in both foetal and adult tissues [ 5 , 6 ] . expression of the vhl gene is not restricted to the organs affected in vhl [ 3 , 79 ] . the vhl protein pvhl ) has been implicated in a variety of functions , including transcriptional regulation , posttranscriptional gene expression , extracellular matrix assembly , protein folding and ubiquitination as reviewed by kaelin . an increasing number of germline mutations have been reported in vhl - affected individuals [ 1114 ] , and genotype - to - phenotype correlations are now emerging . somatic mutations in vhl have also been detected in several types of sporadic and hereditary tumours [ 1618 ] . phenotypes vary among families , reflecting genotypic differences [ 1 , 12 , 14 , 19 ] . clinically , vhl is classified in type 1 or type 2 based on the absence or presence of pheochromocytomas . the occurrence of renal cell carcinoma ( rcc ) allows a further distinction between type 2 a ( low risk of rcc ) and type 2 b ( high risk of rcc ) . some type 2 families develop pheochromocytomas only , with no other neoplastic findings of vhl ( vhl - type 2 c ) . vhl tumours , including pheochromocytomas and paragangliomas , may appear clinically to be sporadic but represent milder cases of vhl , with the attenuated phenotype resulting from either a mild impairment in function of the mutated pvhl or somatic mosaicism . the latter is a condition in which genetically different cells coexist in tissues of the same individual , and the intratumoural mixture of vhl - mutated and vhl - non - mutated cells clearly can modulate the resulting phenotype . we have analysed the vhl gene in the available members of a vhl family in which the pro - band presented with bilateral pheochromocytomas and multiple paragangliomas . her father showed what we believe is a very mild and relatively late - onset vhl phenotype . in this study , we describe the somatic mosaicism of the pro - band 's father and we reviewed the literature for all the described cases of vhl - mosaicism . the clinical features of the female pro - band have been reported in our previously published study , briefly , the young girl now 26 years old , underwent surgery at 11 years of age to resect a pheochromocytoma associated with hypertension . at age 18 years , she underwent further surgery to remove a pheochromocytoma in the contralateral adrenal gland and two concurrent paragangliomas of the abdominal aorta and urinary bladder . one year ago , she was also found to have a right - sided , extra - axial , 1.6-cm supratentorial frontal meningioma ( fig . post - operatively , neuroendocrine serum markers ( plasma free metanephrines , chromogranin a , neuron- specific enolase , and gastrointestinal hormones carcinoembryonic antigen [ cea ] and calcitonin ) have remained negative . a family history obtained from her parents , one brother and one sister was uninformative , except for the father 's history . mr image of the pro - band 's brain.the arrow indicates a 1.6-cm - diameter right - sided supratentorial frontal meningioma . the patient 's father , now 51 years old , was found to have an angioma of the glans penis and had had surgery for a mandibular cyst and epididymal cystadenomas at age 43 years . indeed , abdominal ultrasonography and total - body magnetic resonance imaging ( mri ) revealed a 2.3-cm cyst in his right kidney . his blood pressure and levels of plasma free metanephrines , fractionated urinary metanephrines , chromogranin a were normal . genomic dna was extracted from peripheral blood lymphocytes ( pbls ) of the pro - band and her four first - degree relatives . to assess the possibility that the pro - band 's father had an attenuated vhl phenotype caused by mosaicism , somatic dna was extracted from his oral epithelial cells , hair roots and skin fibroblasts . the entire coding sequence of the vhl gene was pcr - amplified with 36-cycle reactions using conditions and primers , with minimum variations , described previously . pcr products were analysed by denaturing high - performance liquid chromatography ( dhplc ) using a wave 2100 dna fragment analysis system ( transgenomic wave system , omaha , ne ) at column temperatures recommended by the wavemaker version 4.1.31 software ( transgenomic ) and at melt temperatures determined by the dhplc melt software ( http://insertion.stanford.edu/melt.html ) . nucleic acids were separated in the column according to size and degree of denaturation in a gradient of two buffers ( a : 0.1 m triethylammonium acetate , [ teaa , ph 7 ] ; b : 0.1 m teaa , 25% acetonitrile ) . dhplc analysis was performed at a melt temperature of 60c at a constant flow rate of 0.9 ml / min using a linear gradient of acetonitrile . to 13.75% , increased over 5 min . to 16.25% , was kept constant for 1 min . , increased over 1 min . to 25% , was kept constant for 1 min . ( wash ) , decreased over 1 min . to 8.75% , and was kept constant for 1 min . heteroduplex molecules can be detected as an additional peak , or shoulder , in the chromatogram . amplimers with abnormal denaturing profiles were purified ( microcon pcr , millipore , bedford , ma ) and sequenced bidirectionally using an abi bigdye terminator cycle sequencing kit v.3.1 and an abi prism 310 genetic analyser ( both from applied biosystems , foster city , ca , usa ) . sequencing results were analysed using the sequencing analysis v.3.6.1 and autoassembler v.2.1 software packages ( both from applied biosystems ) . for semiquantitative analysis , the pcr products were cloned into a vector ( topo ta cloning , invitrogen carlsbad , ca , usa ) according to the manufacturer 's instructions , directly amplified from bacterial colonies , and sequenced . the clinical features of the female pro - band have been reported in our previously published study , briefly , the young girl now 26 years old , underwent surgery at 11 years of age to resect a pheochromocytoma associated with hypertension . at age 18 years , she underwent further surgery to remove a pheochromocytoma in the contralateral adrenal gland and two concurrent paragangliomas of the abdominal aorta and urinary bladder . one year ago , she was also found to have a right - sided , extra - axial , 1.6-cm supratentorial frontal meningioma ( fig . post - operatively , neuroendocrine serum markers ( plasma free metanephrines , chromogranin a , neuron- specific enolase , and gastrointestinal hormones carcinoembryonic antigen [ cea ] and calcitonin ) have remained negative . a family history obtained from her parents , one brother and one sister was uninformative , except for the father 's history . mr image of the pro - band 's brain.the arrow indicates a 1.6-cm - diameter right - sided supratentorial frontal meningioma . the patient 's father , now 51 years old , was found to have an angioma of the glans penis and had had surgery for a mandibular cyst and epididymal cystadenomas at age 43 years . indeed , abdominal ultrasonography and total - body magnetic resonance imaging ( mri ) revealed a 2.3-cm cyst in his right kidney . his blood pressure and levels of plasma free metanephrines , fractionated urinary metanephrines , chromogranin a were normal . genomic dna was extracted from peripheral blood lymphocytes ( pbls ) of the pro - band and her four first - degree relatives . to assess the possibility that the pro - band 's father had an attenuated vhl phenotype caused by mosaicism , somatic dna was extracted from his oral epithelial cells , hair roots and skin fibroblasts . the entire coding sequence of the vhl gene was pcr - amplified with 36-cycle reactions using conditions and primers , with minimum variations , described previously . pcr products were analysed by denaturing high - performance liquid chromatography ( dhplc ) using a wave 2100 dna fragment analysis system ( transgenomic wave system , omaha , ne ) at column temperatures recommended by the wavemaker version 4.1.31 software ( transgenomic ) and at melt temperatures determined by the dhplc melt software ( http://insertion.stanford.edu/melt.html ) . nucleic acids were separated in the column according to size and degree of denaturation in a gradient of two buffers ( a : 0.1 m triethylammonium acetate , [ teaa , ph 7 ] ; b : 0.1 m teaa , 25% acetonitrile ) . dhplc analysis was performed at a melt temperature of 60c at a constant flow rate of 0.9 ml / min using a linear gradient of acetonitrile . to 13.75% , increased over 5 min . to 16.25% , was kept constant for 1 min . , increased over 1 min . to 25% , was kept constant for 1 min . ( wash ) , decreased over 1 min . to 8.75% , and was kept constant for 1 min . heteroduplex molecules can be detected as an additional peak , or shoulder , in the chromatogram . amplimers with abnormal denaturing profiles were purified ( microcon pcr , millipore , bedford , ma ) and sequenced bidirectionally using an abi bigdye terminator cycle sequencing kit v.3.1 and an abi prism 310 genetic analyser ( both from applied biosystems , foster city , ca , usa ) . sequencing results were analysed using the sequencing analysis v.3.6.1 and autoassembler v.2.1 software packages ( both from applied biosystems ) . for semiquantitative analysis , the pcr products were cloned into a vector ( topo ta cloning , invitrogen carlsbad , ca , usa ) according to the manufacturer 's instructions , directly amplified from bacterial colonies , and sequenced . the vhl gene sequence was altered in both the proband and her father , though to different extents in each . dhplc analysis of vhl exon 3 pbls dna showed quantitative differences in the dna elution profiles between daughter and father ( fig . this finding suggested a heterozygous mutation in both relatives and mosaicism in the father . to test for mosaicism dna exon 3 amplicons extracted from the father 's buccal mucosa , hair roots , and skin fibroblasts showed different levels of intensity of the same altered dhplc peaks , although the intensity of the peaks in these tissues was comparable to that seen in the pbl dna ( fig . 2b ) . denaturing high - performance liquid chromatography ( dhplc ) analysis of exon 3 of the von hippel - lindau disease ( vhl ) gene in all the family members . ( a ) dhplc analysis of the pcr products of exon 3 of the vhl gene in the pro - band ( dhplc 3 ) , her father ( dhplc 1 ) , her mother ( dhplc 2 ) and her brother and sister ( dhplc 4 , 5 ) . in the first - degree pedigree of this family , the pro - band is indicated by number 03 , while her father , mother , sister and brother are indicated by the numbers 01 , 02 , 04 and 05 , respectively . dhplc analysis of the pro - band 's dna shows an extra peak that is barely visible ( but reproducibly so ) in her father 's dna . ( b ) dhplc analysis of dna extracted from a normal control ( nc ) sample , the father 's pbls , and the father 's oral mucosa ( tissue 1 ) , hair roots ( tissue 2 ) and fibroblasts ( tissue 3 ) . sequence analysis demonstrated that the abnormal elution profile ( abnormal peak ) seen in the proband and at various levels in the different cells from the pro - band 's father was the result of a missense mutation in exon 3 . this mutation was a g - to - a substitution at cdna nucleotide 695 , predicting the replacement of wild - type arginine with glutamine codon 161 of pvhl ( r161q ) . the mutation - related peak in the pbls dna of the father was smaller than the same peak in the daughter 's pbls dna ( compare fig . 3a with fig . amplicon cloning revealed that only a few clones contained the g - to - a mutant allele . in other words , in the dna extracted from the father 's circulating pbls , the ratio of wild - type to mutated gene was approximately 85:15 , instead of the 50:50 ratio expected in the absence of mosaicism . sequencing analysis of exon 3 of the vhl gene in dna from pbls of the pro - band ( a ) and her father ( b ) . to summarize , direct sequencing analysis suggested that only a small fraction ( about 15% ) of pbls contained the vhl mutation , lower than the typical 50% observed in a heterozygous individual . furthermore , only in the dna extracted from buccal cells , hair roots and cultured skin fibroblast cells was this mutation evident . the mosaic individual we described here presented at age 51 years with an angioma of the glans penis and a renal cyst . at age his daughter , in contrast , had a full - blown germline vhl gene mutation at a much younger age ( 11 years ) . in the mosaic subject , the late disease onset and mild vhl phenotype might have been mediated by the presence of two different cell populations , a prevailing population with a normal vhl gene and a smaller one with a mutated vhl gene . very few abnormal cells are likely to be present in the father 's chromaffin cells , thus explaining the absence of pheochromocytomas / paragangliomas , because decreased / lower likelihood to have a so - called second hit event with subsequent movement of mutated cells more towards homozygosity , as it has been shown in multiple endocrine neoplasia type 2 associated tumours [ 23 , 24 ] . the inaccessibility of the chromaffin cells in the father 's adrenal medulla and paraganglia precluded the experimental quantification the ratio of normal to mutated vhl . as a consequence of the atypical phenotype of vhl in the father , the presence of familial vhl was not recognized initially . in recent years , the role of pvhl in the regulation of hypoxia - inducible genes through the targeted ubiquitination and degradation of hif1 has been elucidated , leading to a model of how disruption of the vhl gene can result in highly vascularized tumours . when pvhl is absent or mutated , hif1 subunits accumulate , resulting in cell proliferation and neovascularization in vhl tumours . vhl is inherited in an autosomal dominant fashion , with about 80% of cases being familial and about 20% sporadic as a result of a de novo mutation . the family history can sometimes be falsely negative because of failure to recognize the disorder in some family members , reduced penetrance , intrafamilial variability of clinical expression , death of the affected parent before the onset of symptoms or late onset of the disease in the affected parent . another reason why vhl may go unrecognized in either parent , so that the disease in the child is erroneously considered to be sporadic , is somatic mosaicism . first described vhl mosaicism in 2 ( 5% ) of 42 unrelated families , both of which lacked a history of vhl . the two patients ( one man and one woman ) had clinical evidence of vhl , but no vhl mutations were detected in the initial genetic test performed on dna from their pbls ; in contrast , their clinically affected offspring tested positive for vhl mutations in their pbls . another case of parental mosaicism was described by murgia et al . in kindred in whom this pro - band presented at age 26 years with cerebellar haemangioblastoma , retinal haemangioma , multiple bilateral renal cysts and bilateral pheochromocytomas . in contrast , his asymptomatic ( mosaic ) mother , whose pbls dna showed a barely visible single - strand conformation polymorphism bandshift identical to that seen in her son , presented with only a small pheochromocytoma and renal microcysts by age 48 years . in both of the above studies , dna was also extracted from tissues , such as buccal cells and skin fibroblasts to confirm the somatic mosaicism . in similar families , the mosaic individual may be mildly or minimally affected , generally tends to have less severe disease than his or her offspring , and may be asymptomatic or present with less severe features of the disease . disease severity varies among mosaic individuals depending on whether mutations occur early or late in embryogenesis . asymptomatic carriers have been described in a number of heritable tumour syndromes reviewed by zlotogora and in many other heritable diseases , such as hutchinson - gilford progeria . mosaicism has also been demonstrated to be the cause of many cases of clinical recurrence . in a mosaic individual , the co - existence of mutated cells with even a small population of normal cells is an important parameter in predicting phenotype and overall prognosis and can increase the difficulty of obtaining a correct diagnosis of vhl . vhl in a mosaic individual may be difficult to recognize merely on clinical grounds , but it should always be considered when evaluating patients with isolated vhl - related tumours or parents of affected individuals . under these conditions , such individuals should be analysed for low - level mutations by emerging dna analysis techniques . it is presumed that the genotype - phenotype correlation in vhl reflects the degree to which the functions of pvhl are quantitatively and qualitatively altered by different mutations . a number of mutation carriers have been described , but not in sufficient numbers to define mutation - based phenotypes . furthermore , the percentages of symptomatic and asymptomatic vhl mutation carriers and the most important variable affecting disease penetrance and severity ( age at diagnosis , sex distribution , genetic co - factors or environmental modifiers ) have yet to be evaluated . this would allow the development of unified surveillance guidelines for vhl patients or those at risk for this disease . in summary , we have provided molecular evidence of somatic mosaicism in the father of a patient with full vhl . because of the incomplete penetrance of the disease the diagnosis of vhl was first not recognized . because of penetrance age related in vhl , a long - term follow up is warranted . counselling of patients and closely related family members must take a central place in the management of hereditary multi - organ cancer syndromes , such as vhl . a careful and complete clinical examination in the parents of each patient affected by an apparently de novo vhl germline mutation is warranted . the evaluation of the parents of a pro - band with an apparent de novo vhl gene mutation should include molecular genetic testing if the vhl disease - causing mutation in the pro - band is known . if the disease - causing vhl mutation in the pro - band is unknown , both parents should be offered a complete and extensive clinical and images examination , such as neurological test including mri of the craniospinal axis ; ophthalmologic evaluation ; measurement of plasma free metanephrines , chromogranin a , and fractionated urinary metanephrines ; and abdominal ultrasonography , mri or computed tomography . the real incidence of mosaicism in vhl remains uncertain , but such a phenomenon has important consequences for molecular testing , clinical diagnosis and genetic counselling , in terms of prediction of phenotype and risk of recurrence after the initial diagnosis . the real incidence of mosaicism will hopefully yield better data on the real incidence of de novo vhl mutations .","<S> abstractvon hippel - lindau disease ( vhl ) is an autosomal dominant , familial neoplastic disorder with variable interfamilial and intrafamilial expression . </S> <S> vhl is characterized by pre - disposition to development of a combination of benign and malignant tumours affecting multiple organs . </S> <S> we provide molecular evidence of somatic mosaicism in nearly asymptomatic man whose daughter had vhl . </S> <S> the mosaic subject was found to have a cyst of the kidney and an angioma of the glans penis and had had surgery for a mandibular cyst and epididymal cystadenomas . </S> <S> mosaicism could provide a genetic explanation for the clinical heterogeneity and variable severity of vhl . </S> <S> the real incidence of mosaicism is still unclear and the identification of mosaicism has important consequences in genetic counseling of vhl patients who appear to have de novo vhl mutations and should be considered when evaluating patients with isolated vhl - related tumours . </S> <S> our results strongly suggest a complete and extensive clinical examination in the parents of each patient affected by an apparently de novo vhl germline mutation.we recommend performing a mutation screening of both parents of a proband with techniques that permit detection of low percentages of mosaicism before concluding that the proband has a de novo vhl mutation . </S>"
3,"children s oncology group ( cog ) member institutions care for the majority of infants , children , and adolescents with acute lymphoblastic leukemia in north america and oceania . work by the legacy children s cancer group ( ccg ) , dating back more than 40 years , serves as the foundation for many current cog trials . studies prior to 1983 built on the pioneering work of donald pinkel and his colleagues at st . jude children s research hospital,(1 ) and introduced berlin frankfurt mnster ( bfm)-based post induction intensification ( protocol ib or consolidation and protocol ii or delayed intensification ( di),(2 ) and a widely - used age - based dosing schedule for it ( it ) therapy.(3 ) the prognostic significance of early marrow response , assessed by marrow blast percentage 7 and 14 days into induction , was defined . ( 4 ) event - free survival ( efs ) improved with vincristine and prednisone pulses as the sole post - induction intensification and extended maintenance it methotrexate replaced pre - symptomatic whole brain irradiation for lowest risk patients.(5 ) thirteen trials , conducted from 1983 through 1995 were summarized in the december 2000 issue of leukemia . ( 6 ) two 19831988 studies ( 100 series ) , namely , ccg-106 ( 7 ) and ccg-123 , ( 8) proved the advantage of early bfm - based strategies , prior to the introduction of bfm protocol m or methotrexate 5 g / m , over previous ccg efforts for higher risk ( hr ) children . a third study , ccg-105 , showed that more effective systemic therapy and extended it methotrexate could spare all cns negative standard risk ( sr ) children from whole brain irradiation and proved the value of post induction intensification.(9 , 10 ) induction anthracycline , higher dose induction prednisone , and intensive consolidation added no further benefit for standard risk patients receiving di and vincristine - prednisone pulses in maintenance . the 19891995 studies ( 1800 series ) further restricted whole brain irradiation ( 11 ) and showed the value of longer and stronger post - induction intensification , the so - called augmented bfm regimen for hr patients ( 12 ) and dexamethasone for sr patients . ( 13 ) patients received monthly vincristine and prednisone or dexamethasone pulses through maintenance in all of these trials , unlike contemporary bfm practice . this report provides further follow - up on past 19831995 studies , and adds 4 additional trials and 3482 additional patients from 19962002 ( 1900 series ) . replacement of 6-mercaptopurine ( 6mp ) with 6-thioguanine ( 6tg ) provided an efs advantage but with unacceptable liver toxicity for sr patients on ccg-1952.(14 ) it triple therapy , i.e. , cytarabine , methotrexate , and hydrocortisone , halved cns relapse rates compared to it methotrexate alone but allowed excess marrow and testes relapses on a methotrexate - poor platform , resulting in an inferior survival.(15 ) ccg-1962 showed that pegylated asparaginase safely and effectively replaced native asparaginase.(16 ) ccg-1961 explored the components of the augmented bfm regimen for higher risk patients with a rapid day 7 response and showed the advantage for stronger , not longer post induction intensification.(17 ) this report excludes the final ccg trial , ccg-1991 , which completed accrual only in 2005.(18 , 19 ) the clinical trials evaluation panel of the national cancer institute of the united states approved all protocols . details of all studies have been published . between 1983 and 2002 , 13,298 infants , children , and adolescents , age < 21 years at diagnosis enrolled on one of 16 treatment protocol . the diagnosis of all was based on morphology,(20 ) histochemistry , and increasingly on flow cytometry . patients with fab l3 morphology and myeloperoxidase positivity were excluded . between 1983 and 1988 , a total of 3713 eligible , evaluable patients were entered on the ccg-100 series studies . patients were stratified by age , white blood cell count ( wbc ) , gender , platelet count , fab classification,(21 ) and lymphomatous features.(22 ) the lowest risk patients received vincristine , prednisone , and l - asparaginase during induction , it methotrexate in induction , consolidation , and maintenance , and daily oral 6mp , weekly oral methotrexate , and monthly vincristine / prednisone pulses in maintenance on ccg-104 . with a 222 factorial design , intermediate risk patients were randomly allocated to receive standard or intensive induction / consolidation , di or no intensification , and 18 gy cranial irradiation or every 12 week it methotrexate on ccg-105.(9 , 10 ) a small number of intermediate risk patients were enrolled on ccg-139 and received either intermediate dose methotrexate 0.5 g / m2 with leucovorin rescue or oral methotrexate . no patient received di.(23 ) higher risk patients with lymphomatous features were randomly allocated to lsa2l2 with or without cranial irradiation , the new york ( ny ) i regimen , or the ccg modified bfm regimen on ccg-123.(8 ) higher risk patients without lymphomatous features were randomly allocated to the standard ccg regimen , ny i regimen , or the ccg - modified bfm regimen on ccg-106.(7 ) infants were treated on ccg-107 , which employed very high - dose methotrexate ( 33.6 g / m2 ) with leucovorin rescue . ( 24 ) between 1989 and 1995 , a total of 5121 eligible , evaluable patients were entered on the ccg-1800 series studies . ( 25 ) intermediate risk patients, now excluding anyone 10 years of age or older , all received prednisone in induction and a single di phase and were randomly allocated to receive or not a second di phase and vincristine / prednisone pulses every 3 or 4 weeks on ccg-1891.(26 ) upon completion of these initial studies in 1992 and 1993 , subsequent sr patients(27 ) were enrolled on ccg-1922 , which compared oral vs parenteral 6mp and dexamethasone vs prednisone in induction and maintenance . all patients received dexamethasone during a single di phase and it methotrexate every 12 weeks in maintenance . ( 13 ) cranial irradiation was reserved for those with overt cns disease at diagnosis . higher risk patients with lymphomatous features were randomly allocated to nyi or nyii therapy on ccg-1901 . all received cranial irradiation . ( 28 ) higher risk patients with wbc 50,000/l or age 10 years who lacked lymphomatous features were assigned to ccg-1882 . on ccg-1882 , patients with no cns disease at diagnosis ( < 5 leukocytes/l or no blasts in the cerebrospinal fluid ) and < 25% marrow blasts on day 7 of an induction phase consisting of vincristine , prednisone , l - asparaginase , and daunorubicin ( rapid early responders , rer ) were randomly allocated to receive 18 gy cranial irradiation or additional it methotrexate.(11 ) patients on ccg-1882 with > 25% marrow blasts on day 7 of induction ( slow early responders , ser ) were initially treated on a pilot study of longer and stronger post induction intensification , the augmented bfm regimen . after an initial cohort demonstrated the safety of this regimen , ser patients were randomly allocated to our standard ccg - modified bfm regimen or to the augmented bfm regimen.(12 ) infants < 1 year of age , were treated on ccg-1883 and received intensive induction , consolidation including very high - dose methotrexate ( 33.6 g / m2 ) , and intensive post - consolidation therapy without cranial irradiation.(24 ) classification as b - precursor and t - lineage was determined centrally . treatment was allocated by age , wbc , and day 7 or 14 marrow response . specifically , t - cell patients who met sr age and wbc criteria were now classified as sr . on ccg-1952 , sr patients received three drug vincristine , prednisone , and native e. coli asparaginase induction , two 2-month di phases , and daily oral 6mp , weekly oral methotrexate , every 4 week vincristine / prednisone pulses , and every 12 week it therapy . ( 14 , 15 ) all patients received it cytarabine at the start of treatment , it methotrexate in induction and 6tg in di . patients were randomly assigned to receive it methotrexate or it triple therapy after induction and to receive either 6tg or 6mp in consolidation , interim maintenance , and maintenance . sr patients with marrow blasts > 25% on day 14 of induction received the augmented bfm regimen after induction . patients were randomized to receive native ( 21 doses ) or pegylated ( 3 doses ) asparaginase . ( 16 ) on ccg-1961 , hr rer patients were randomly assigned to receive standard or longer duration and standard or stronger intensity post induction intensification . hr ser patients received the augmented bfm regimen and were randomly assigned to either weekly doxorubicin or sequential idarubicin / cyclophosphamide in each of two di phases . ( 17 ) at the start of the study , b - precursor ser patients were randomly assigned to receive or not to receive b43-pap , an anti - cd19 pokeweed antiviral protein immunotoxin . however , the manufacturer withdrew the drug from study . ( 29 ) infants , < 1 year of age , were treated on ccg-1953 ( 30 ) with an intensive triple induction strategy shared with pog 9407 ( 31 , 32 ) and received intensive induction , consolidation including high - dose methotrexate ( 5 g / m ) , and intensive post - consolidation therapy with no cranial irradiation . classification as b - precursor and t - lineage was determined at a central reference laboratory . efs time was defined as the time from diagnosis to first event ( induction failure , relapse , death , or second malignant neoplasm ) or last contact for those who did not have an event . overall survival ( os ) time was defined as time from diagnosis to death or last contact . event - free survival and os rates were computed by the method of kaplan - meier ( 33 ) and were compared using the log - rank test . cox proportional hazards regression was used to identify independent prognostic factors for efs . for patients who achieved complete remission , cumulative incidence rates of isolated cns or any ( isolated plus combined ) cns relapse , therapy - related second malignancies , and remission deaths , were computed and compared using gray s method ( 34 ) adjusting for competing events . efs time was defined as the time from diagnosis to first event ( induction failure , relapse , death , or second malignant neoplasm ) or last contact for those who did not have an event . overall survival ( os ) time was defined as time from diagnosis to death or last contact . event - free survival and os rates were computed by the method of kaplan - meier ( 33 ) and were compared using the log - rank test . cox proportional hazards regression was used to identify independent prognostic factors for efs . for patients who achieved complete remission , cumulative incidence rates of isolated cns or any ( isolated plus combined ) cns relapse , therapy - related second malignancies , and remission deaths , were computed and compared using gray s method ( 34 ) adjusting for competing events . table 1 summarizes the 21 randomized questions posed in the twelve studies that posed a randomized question . in all three periods , patients with marrow blasts 25% at the end of induction were removed from protocol therapy as induction failures ( an event ) and may have later undergone allogeneic stem cell transplantation . the ccg-100 series ( 19831988 ) and ccg-1800 series ( 19891995 ) studies made no specific allowance for first remission transplantation . patients with t(4;11 ) , t(9;22 ) , hypodiploidy ( chromosomes 44 ) or induction failure were eligible . in addition , infants ( 212 months ) with cd10 negativity , presenting wbc 100,000/l or day 14 marrow blasts > 5% and older children , age 10 years , with presenting wbc 200,000/l were also eligible . ( 35 ) on the ccg-1900 series ( 19962002 ) , patients with t(4;11 ) , t(9;22 ) , hypodiploidy < 44 chromosomes , or marrow blasts between 5% and 25% at the end of induction were eligible for allogeneic transplant , if a suitable donor might be found . any transplanted patients are included in all analyses . over time , the percentage of patients receiving cranial irradiation decreased substantially with 65% , 35% , and 15% of patients receiving 18 gy pre - symptomatic whole brain irradiation therapy in the 2 month of therapy on the ccg-100 series ( 19831988 ) , ccg-1800 series ( 19891995 ) , and ccg-1900 series ( 19962002 ) , respectively . table 2 summarizes the data on induction failures , induction deaths , relapses , secondary malignant neoplasm and remission deaths for the three series . the data are presented separately for b - precursor sr and hr , infant , and t - cell all . induction failure rates for the b - precursor sr patients ranged from 0% to 0.4% across the three series and between 0.9% and 1.3% for the hr patients . induction death rates fell from 1.1% to 0.2% for sr patients ; and from 2.5% to 1.4% for the hr patients . induction death rates for t - cell all fell from 2.2% to 1.3% across series . however the induction death rates increased significantly ( 3.1% , 1.5% , and 13.0% , respectively ) in the last time period . relapses are broken down by site , namely , isolated marrow , isolated cns , and combined or other sites . isolated marrow relapse comprised about one - half of all relapses across the three series for b - precursor sr patients . isolated marrow relapse among b - precursor hr patients comprised a similar proportions in the 100 and 1800 series , namely 72% and 78% , but decreased significantly to 59% in the most recent 1900 series . for infants , the proportion of isolated marrow relapses increased ( 60% vs 73% vs 83% ) while the proportion of cns relapse fell across series . for t - cells , the proportion of isolated marrow relapses remained the same across series ( 48% to 55% ) . analyses include estimation of outcomes by lineage and nci risk classification and by study series . gender , age , wbc , and early marrow response maintained prognostic significance over all three time periods . the prognostic significance of cns disease at diagnosis increased over the three time intervals as outcomes did not improve for this challenging subset while impropving substantially for patients without cns disease at diagnosis . ethnicity lost significance . at 10-years , efs improved from 51% for black patients diagnosed between 1983 and 1988 ( 100 series ) to 67% for patients diagnosed between 1996 and 2002 ( 1900 series ) , while efs for white patients improved from 63% to 73% . the 5-year efs for t(1;19 ) , t(4;11 ) , and t(9;22 ) also improved from 69% , 24% , and 30% , respectively , for the 1800 series to 78% , 44% and 37% , respectively , for the 1900 series . hypodiploid ( < 45 chromosomes ) and hyperdiploid ( > 50 chromosomes ) patients went from 35% and 80% to 54% and 83% , respectively , over the same time periods . as the mix of infants and higher and lower risk patients differed over time , figures 13 display efs by study series for sr and hr b - precursor , t - cell , and infants , respectively , in order to facilitate cross series comparisons . the efs and os improved significantly overtime for sr b - precursor patients ( p<0.0001 and p=0.0001 , respectively ) and for hr b - precursor patients . for hr t cell patients , 5-year efs was 58% and 73% in 19831988 and 19962002 ( tables 3 and 5 ) . for sr t cell patients , 5-year efs was 68% and 73% in 19831988 and 19962002 ( tables 3 and 5 ) with gains to 80% in 19891995 ( table 4 ) , which were subsequently lost when sr t - cell patients were assigned to less intensive therapy . the change in outcome for infants was not statistically significant . for 100 series , 1800 series , and 1900 series patients , the 10-year cumulative incidence rates for death in remission were 2.60.3% , 3.00.3% , and 3.60.7% , respectively ( table 7 , figure 4a , b , c ) . rates were highest in the infant studies with 5-year remission death rates of 7.02.9% and 31.35.5% on ccg-1883 and ccg-1953 . ten - year rates were between 1% and 1.5% in the sr studies . on the hr study ccg-1961 , the remission death rate was 3.20.4% at 5 years and increased to 5.01.5% at 10 years . two of the 4 late deaths are attributed to the late complications of bone marrow transplantation ; 1 death was accidental and 1 was unknown . for 100 series and 1900 series patients , the 10-year cumulative incidence of isolated and combined cns relapse ( table 7 , figure 4a , b , c ) decreased from 7.00.5% to 4.60.3% and 9.50.5% to 7.20.5% , despite less use of brain irradiation . for 100 series , 1800 series , and 1900 series patients , the 10-year cumulative incidence of second malignant neoplasm ( table 7 , figure 4a , b , c ) was 0.70.2% , 1.10.2% , and 1.00.2% , respectively . for sr patients in this report , we review the outcome of 13,298 children with all and enrolled in one of sixteen ccg trials between 1983 and 2002 . during this period , efs and os increased significantly for all groups except infants < 1 year of age , who had only a 4-percentage point improvement in 5-year os . the smallest gains were attained for t - all patients with sr features for whom outcomes actually deteriorated between 19891995 and 19962002 , likely due to allocation to less aggressive sr regimens on the ccg-1900 series studies as opposed to treatment on hr regimens in earlier eras . overall , patients in first remission at 5 years had a consistent 4% risk for an adverse event between 5 and 10 years from diagnosis . the results of the randomized questions of these trials have shaped contemporary cog all therapy . vincristine and prednisone pulses , shown to be effective for sr patients as the sole post induction intensification on ccg-161 , were the first effective post induction intensification introduced in ccg ( 5 ) and remain a part of current cog regimens . subsequently , every three - week pulses had no advantage over four - weekly pulses on ccg 1891 . ( 26 ) recent ibfm data show no advantage for vincristine / dexamethasone pulses in the context of intensive bfm - based therapy ( 36 ) but yet more recent eortc data differ ( 37 ) for uncertain reasons . nonetheless , maintenance vincristine and steroid pulses may now be redundant in the context of more aggressive current bfm - based therapy . ( 38 , 39 ) ccg-105 showed that induction anthracycline added nothing to a three - drug , vincristine , l - asparaginase , and prednisone induction for sr patients who received di . ( 9 , 10 ) on ccg-1922 , omission of induction anthracycline facilitated near iso- toxic substitution of induction and maintenance dexamethasone for prednisone at a dose ratio of 1 to 6.7 ( 13 ) recent mrc ( 40 ) and bfm ( 41 ) data support this advantage at ratios of 1 to 6.1 and 1 to 6 , with no advantage evident in japanese ( 42 ) and eortc ( 43 ) trials with ratios of 1 to 7.5 and 1 to 10 . the ccg-1922 results were not available when ccg 1952 opened and ccg 1952 patients received induction prednisone but subsequent ccg and cog sr all trials have used dexamethasone in three - drug induction to good effect . the augmented bfm regimen , employing longer and stronger post induction intensification , was found superior for hr ser patients on ccg 1882 and has become the mainstay of current cog therapy . ( 12 ) the successor trial , ccg 1961 , trial found that stronger intensification , derived from the augmented bfm regimen improved outcome for hr rer patients also , but that longer intensification did not . ( 17 ) longer intensification also added nothing for sr patients who received induction dexamethasone on ccg-1991 . ( 18 ) these findings focus attention on improving the quality of the first six months of post induction therapy . administration of the second block of therapy , termed protocol ib by the bfm group and consolidation by cog , requires approximately two months . together with similar cyclophosphamide , cytarabine , and 6tg block of di , this element occupies 3 of the first 7 months of treatment . despite its long - standing place in treatment , augmented consolidation introduced vincristine and asparaginase during the neutropenic periods that follow administration of cyclophosphamide , cytarabine and 6mp . the mrc ( uk ) reports that the addition of vincristine and asparaginase in consolidation increases the clearance of minimal residual disease for patients who are still positive at the end of the first month of therapy . ( 44 ) cog is now testing augmented consolidation for sr b - precursor patients on aall0331 . the two months of therapy following ib ( bfm ) or consolidation ( cog ) and preceding protocol ii ( bfm ) or di ( cog ) , termed interim maintenance ( i m ) by cog and consolidation by the bfm group , have diverged over the years . earliest bfm and ccg trials employed daily oral 6 mp and weekly oral methotrexate . bfm all 86 replaced weekly oral methotrexate with 4 courses of parenteral methotrexate 5 gram / m with leucovorin rescue in protocol m. ( 45 ) ccg introduced five courses of vincristine and escalating - dose intravenous methotrexate given every 1011 days without leucovorin rescue followed by asparaginase during this i m phase on ccg 1882 . ( 12 ) this augmented i m phase is now compared to 4 courses of parenteral methotrexate 5 gram / m with leucovorin rescue in the current cog hr b - precursor ( aall0232 ) and t - cell ( aall0434 ) trials . of interest , ccg 5971 found no advantage for every 2 week 5 gram / m methotrexate and leucovorin over weekly oral 20 mg / m methotrexate for patients with lymphoblastic lymphoma . ( 46 ) ccg 1991 found better efs with five courses of vincristine and escalating - dose intravenous methotrexate given every 1011 days without leucovorin given before and after di versus of oral methotrexate and 6mp . thus after 60 years , investigators are still exploring the best ways to administer methotrexate . ( 19 ) ccg 1962 showed that 3 intramuscular ( i m ) doses of pegylated asparaginase can safely replace 21 i m doses of native asparaginase . the pegylated product provided a superior day 14 marrow response and lower rates of antibody development . ( 16 ) ccg 1961 employed pegylated asparaginase after induction for the augmented arms . ( 17 ) pog 9900 changed from 6 doses of native asparaginase 10,000 u / m administered three times a week to one dose of pegylated asparaginase in induction for sr patients . the incidence of end induction minimum residual disease ( mrd ) positivity ( > 10 ) went from 18.9% to 14.3% . ( 47 ) following this , all cog all trials now use pegylated rather than native asparaginase , thus sparing children unneeded intramuscular injections . ( 48 ) freyer et al examined survival after relapse for hr patients on ccg-1961 randomized to more or less effective post induction intensification . ( 49 ) contrary to intuition but in agreement with other observations excluding intravenous 6mp , post relapse survival was identical for patients relapsing from more and less effective regimens . as relapse rates decrease over time with improved therapy , remission deaths become larger contributors to overall death rates . 34% for hr studies , with adolescent patients being at higher risk . among patients older than 15 years , remission deaths comprise 25% of adverse events.(50 ) remission deaths after 5 years may be increasing as more patients receive hematopoietic stem cell transplant in first remission . bhatia et al comment that about 16% of all patients alive and in remission at two years after allogeneic bone marrow will expire in the next 8 years.(51 ) transplantation accounts for the increased remission death rate among adolescents and young adults treated according to adult versus pediatric protocols.(52 ) unfortunately , improvements in therapy have been accompanied by increases in toxicity . the most striking has been the increase in avascular necrosis of bone ( avn ) , which was rarely recognized in patients diagnosed before 1986 , but became more common , especially in adolescents and young adults , with the ccg 1800 era trials.(53,54 ) this complication can lead to significant life - long morbidity with many patients requiring joint replacement surgery during adolescence or early adulthood . altered dosing of dexamethasone during di , i.e. , days 17 and days 1521 , rather than days 121 , ( 55 ) provide some decrease in the incidence but excessive avn led to a suspension of the randomization to induction dexamethasone for adolescents on aall0232.(56 ) screening for exceedingly rare anthracycline cardiotoxicity is standard while screening for avn in older populations with a risk that exceeds 10% has not been adapted . while identification of lesions prior to collapse seems desirable ( 57 ) , the significance of early mri findings remains in doubt.(58 ) another lesson learned from these twenty years of trials is the need for adequate sample size to answer critical questions . statistical power depends on the magnitude of the impact of an intervention and the number of captured events not patients . as trial planning is based on prior data and outcomes tend to improve over time , baseline event rates are often overestimated . if the trials had been designed to detect a 50% or greater reduction in risk of failure , effective interventions , such as dexamethasone for sr all , or augmented bfm for hr all , may have been missed . marginal sample size limits opportunity for exploration of potential interactions and generation of novel hypotheses that will support future trials . sample size estimates should be based on most recent event rates and moderate treatment impact . with improved outcomes , a geometrically increasing number of patients must be treated to prevent one event ( number needed to treat ) . when efs went from 40% to 60% on ccg-106 , ( 7 ) a one - third reduction in failures , only five patients had to be exposed to a novel therapy to benefit one patient . for example , increasing efs from 88% to 92% , a one third reduction in failure , requires that 25 patients receive the experimental intervention to benefit one patient . on ccg 1991 , ( 19 ) increasing efs through better primary treatment can obviate the need for salvage treatment of prevented relapses , usually morbid and too often ineffective , and provide a net decrease in the use of medical services . ( 59 ) for the future , better ascertainment of patients at higher and lower risk of relapse is critical , and new therapies must be developed that are targeted at the molecular abnormalities that cause leukemia and/or treatment failure . most cooperative treatment groups have incorporated minimal residual disease ( mrd ) testing to identify patients at higher or lower risk of relapse . patients with an mrd burden greater than 0.01% at end induction have an increased risk of relapse . however in contemporary cog trials , half of relapses still arise among patients with end - induction mrd < 0.01% . ( 60 ) adding a second mrd time point earlier ( 60 ) or later ( 61 ) in therapy can help to refine mrd - based risk assessment . minimal residual disease is prognostic in t - cell as well as b - precursor leukemia . ( 62 ) the newer genomic technologies including gene expression profiles ( 63 ) and arrays to detect genomic copy number alterations ( 64 ) may lead to better insight into the molecular basis of leukemogenesis ( 65 ) and identify new potential therapeutic targets like jak2 . ( 66 ) the roles of pharmacogenomics ( 67 ) and patient / family treatment adherence ( 68 ) are under study . in philadelphia chromosome positive chronic myelogenous leukemia(69 ) and all ( 70 ) ( 70 ) understanding the mechanism(s ) of imatinib resistance has led to novel , effective treatments . ( 71 ) one might reasonably hope that understanding of the mechanism(s ) of treatment failure in childhood all holds similar promise . over the past 40 years , cure of this once incurable disease has become commonplace . with deeper insight into leukemia biology","<S> the children s cancer group enrolled 13,298 young people age < 21 years on one of 16 protocols between 1983 and 2002 . </S> <S> outcomes were examined in three time periods , 19831988 , 19891995 , 19962002 . over the three intervals , 10-year event - free survival ( efs ) for rome </S> <S> / nci standard risk and higher risk b - precursor patients was 68% and 58% , 77% and 63% , and 78% and 67% , respectively ; while for standard risk and higher risk t - cell patients , efs was 65% and 56% , 78% and 68% , and 70% and 72% , respectively . </S> <S> five - year efs for infants was 36% , 38% , and 43% , respectively . </S> <S> seminal randomized studies led to a number of important findings . </S> <S> stronger post induction intensification improved outcome for both standard and higher risk patients . with improved systemic therapy , </S> <S> additional it methotrexate effectively replaced cranial radiation . for standard risk patients receiving three - drug induction , iso - toxic substitution of dexamethasone for prednisone improved efs . pegylated asparaginase safely and effectively replaced native asparaginase . </S> <S> thus , rational therapy modifications yielded better outcomes for both standard and higher risk patients . </S> <S> these trials provide the platforms for current children s oncology group trials . </S>"
4,"errors in the health care system are due to a diverse interaction of human behavior , socio - cultural aspects , technical aspects of the system , as well as a range of system weaknesses . various categories of errors present as an overlap between human and system causes . when working conditions lead to circumstances in which it is easy to commit an error , this is known as a latent error or system failure. lack of experienced staff on duty , leading to staff fatigue and poor administration verdicts for example , may lead to latent errors or system failure . this , in turn , can lead to violation - producing conditions , where an individual has little choice but to violate protocol . this includes knowledge - based errors , rule - based errors , skill - based errors , technical errors and violations . tackling only the active failures will lead to an accretion of latent conditions , and an inevitable error will ultimately occur , completing the cascade and resulting in a tragic outcome . a holistic approach to incident reporting would allow for the possibility that an error or adverse event suffered by a patient in one part of the world would be a transmitted source of learning that benefits future patients in many other countries . learning from both adverse events and near misses is essential for improving the quality of care however one of the greatest frustrations for patients and professionals alike is the apparent failure of the health care systems to learn from their mistakes . commonly , neither health care professionals nor health care organizations counsel others when a mishap occurs , nor do they share what they have learned when an investigation has been carried out . consequently , the same mistakes occur repeatedly in many settings and patients continue to be harmed by preventable errors . health - care organizations and individuals benefit from incident reporting if they receive back useful information , gained by analysis of similar cases at other institutions . if the event and the results of the analysis are not reported to an external authority , the lessons learned are trapped within the walls of that hospital . the opportunity to analyze the problem although the importance of incident reporting has been established , under - reporting remains a significant problem occurring for example , at a rate of 50%96% annually in the united states . one solution to this dilemma is reporting by the primary care providers within the hospital or health - care organization , and by the organization to a broader audience through a system - wide , regional , or national reporting system . researchers in the field of quality in health care believe that an effective reporting system is the foundation of safe practice and , within a hospital or other health - care organization , a corner stone towards achieving a culture of safety . at a minimum , reporting can help identify hazards and risks , and provide information as to where the system is breaking down . this can help target improvement efforts and systems changes to reduce the likelihood of injury to future patients . extensive work has been done in the west regarding the role of incident reporting systems in preventing harm to patients thus improving the quality and safety of health care . a report prepared for the department of health in the uk indicated that an adverse event was associated with 10% of hospital admissions . with over 850,000 events per year , costing more than 2 billion per year in direct health care costs . in one year , errors involving medical devices led to death or serious injury in 400 people . the cost of hospital - acquired infection was over 1 billion in direct health care costs alone , of which 15% were considered to be preventable . clinical negligence claims currently amount to 400 million annually , with an estimated potential liability of 2.4 billion in existing and expected claims . in the united states , analysis of the harvard medical practice study of 1984 medical records and the colorado / utah study of 1992 records showed adverse events to have been associated with 3.7% and 2.9% of admissions , 13.6% and 8.8% deaths respectively . peer review indicated that 55% of these events were preventable , and almost 28% were due to negligence . medication errors , technical errors , diagnostic errors and failure to prevent injury were the most common type of incidences reported . this report estimated that the total cost of preventable adverse events is between $ 17 billion and $ 29 billion , with direct health care costs accounting for over half . adverse events were associated with 16.6% of hospital admissions ( with approximately half leading to the admission , and half occurring during the admission ) , 4.9% mortality and permanent disability in 13.7% . of all adverse events , the preventable cost of adverse events may be as much as $ 2 billion annually , or 5% of the $ 40 billion spent each year on health care . in addition , costs arising from legal expenses and compensation for medical error currently total $ 400 million per year , which consumes a further 1% of the health budget . since the publication of the us institute of medicine report to err is human , and the uk department of health report an organization with a memory , there has been increasing recognition of the need for healthcare organizations to monitor and learn from patient safety incidents . over the last few years , several countries have established national or system - wide reporting systems to facilitate large scale monitoring and analysis of incident data[1517 ] . the national reporting and learning system ( nrls ) for england and in wales , established by the national patient safety agency , was rolled out in late 2003 and has now received over one million reports , mainly from acute hospitals . limited framework for incident reporting system exists in most of the health care system in pakistan and therefore poses risk to the patients and results in compromised quality of care . development of a nationwide incident reporting system is inevitable in pakistan . recognizing the attitudes and perceptions of health professionals who will implement this system is mandatory for its success . this study aims assess the attitudes and perceptions of doctors and nurses towards incident / error reporting in tertiary level health care of pakistan and to identify potential barriers at the grass root level to the implementation of an error reporting system . to the best of our knowledge the study was conducted in shifa international hospital ( sih ) , a 600 bed tertiary care facility , employing 520 registered health professionals . fifty percent reporting of error was taken as identified factor . for 95% confidence interval and precision of 5% , the questionnaire was designed by modifying those currently used by agency of health related quality ( ahrq ) and other researchers . a small description of key terminology such as incidence , error , adverse events , near misses or close calls , and medication errors was attached to each copy of the questionnaire . the questionnaire consisted of 3 sections , which encompasses determination of : the support or lack thereof provided by the working environment to affirm incident reporting ; health care professional 's perception regarding attitudes of managers and most important barriers to incident reporting ; the motivators to incident reporting ; and patient outcomes that influence reporting behavior of health professionals . variables explored were : working environment ( supportive , culture of blame and shame ) ; attitudes of managers ( "" we are informed about the errors that happen in this unit "" ) ; reasons to report the incident ( to get immediate help for patient , system development so that repetition of incidents can be minimized ) ; to whom incident reporting would be easy ( administration , head of the department ) ; and perceived barriers to incident reporting ( lack of feedback , legal and financial penalties and administrative sanctions ) . mean standard deviation ( sd ) of age and working hours per week were reported . frequency ( percentage % ) were presented for gender , staff position , primary area of employment , patient 's outcome influencing reporting behavior , and individual reporting of an un - witnessed incident . chi square test was used to test the significance of association of professional groups ( doctors and nurses ) with reasons to report , to whom incident reporting would be easy , perceived barriers to incident reporting and patient outcomes that influence reporting behavior . the only exclusion criteria used in the study was medical and nursing students . the ethical approval of the study was obtained from the institutional review board ( irb ) of shifa college of medicine . written informed consent was obtained from all participants . one hundred and fourteen doctors ( 52.5% ) and 103 nurses ( 47.5% ) completed and returned the questionnaire . of these participants , 116 ( 53.5% ) were men and 101 ( 46.5% ) were women . background details of the sample ( n=217 ) considerable homogeneity is found in the incident reporting attitude among different health professionals : 100% among consultants and registrars , 94% among medical officers and 97% among nurses are ready to report the incident happened through them . house officers are reluctant to report the incident happened through them , that is , 75% responded impartially ( neither likely / unlikely ) to report the incident . ( n=217 ) only 19.3% ( n=42 ) doctors and nurses believe that tertiary health care centers have enough staff to handle the workload . this result matches up with the findings that 70% percent ( n= 151 ) health professional believe that their working hours are too long and 60.4% ( n=131 ) health professional are working more than 80 hours per week . some other characteristics of the working environment ( such as mutual respect among workers ) and attitudes of management towards patient safety ( working fast by taking shortcuts ) are depicted in table 2 . working environment and attitudes of management ( agreed ) frequency - percentage ( n=217 ) table 3 shows the main motivator for incident reporting ; to whom reporting is easy ; perceived barriers to incident reporting and patient outcome that influence the reporting behavior of doctors and nurses . a statistically significant difference ( p<0.001 , or 5.035 , 95%ci 2.52 , 10.04 ) was found between doctors ( 42% ) and nurses ( 13% ) in learning for self and others from your mistake as the main reason for incident reporting . eighty percent doctors and 84% nurses think that system development to minimize the repetition of particular incidents is the main reason for incident reporting , although this association is not significant ( or 0.727 . sixty percent doctors ( n=69 ) and 80% nurses ( n= 83 ) think that incidents should be reported to the head of the department ( or 0.37 , 95% ci 0.19 , 0.68 . eighty eight percent of doctors ( n=101 ) and 84% of nurses ( n=87 ) share a common barrier to incident reporting as lack of feedback generation while the significance of association is low ( or 1.42 , 95% ci 0.65 , 3.13 . reasons ( motivators ) to report , feasible to report to and barriers to incident reporting ( n=217 ) we presented three hypothetical situations , in which different outcomes of patients could influence the reporting behavior of health professionals . in first situation , an incident occurred but was corrected before affecting the patient . in the second , the incident happened but has no potential harm to the patient and lastly , an incident happened that can harm the patient but does not . in all three situations nurses tend to report more than doctors and the associations were statically significant ( p<0.001 ) . only 37% doctors will report the incident that could harm the patient contrary to their counterparts nurses ( 79% ) who reported significantly more in this situation ( or 0.13 , 95% ci 0.07 , 0.24 . overall results of health professional 's incident reporting behaviors in different situations are shown in table 4 . any program that aims to improve patient safety must contain all - inclusive information on incidents , near misses , adverse events or errors so that , as a source , it can be used for learning and grounds for precautionary action in the future . some systems focus on specific types of incidents / errors concerning technologies or on areas where incidents / errors occur frequently ( i.e. beeping equipment , infusion pumps , and blood transfusion ) . some systems are open ended taking into account all incidents/ errors along with the entire spectrum of quality of care provided . the rationale for any reporting system is learning . reporting can lead to learning and patient safety in several ways . first , through generating alerts regarding new hazards ( e.g. complications or adverse effects of new drugs ) . finally , report analysis can provide insight into recognizing hazard trends and system failures to aid in the establishment of best practices guidelines . our study shows that incident reporting for the purpose of learning is not well avowed by health professionals , particularly nurses . significant differences exist between doctors ( 42% ) and nurses ( 12% ) for learning as the main reason for incident reporting ( or 5.035 , 95% ci 2.52 , 10.04 . p<0.001 ) . whereas the majority of health professionals ( doctors 80% and nurses 84% ) will report an incident in order to minimize its repetition in the future . incident - reporting behavior differs between doctors and nursing professional groups , with nurses reporting significantly more often than doctors . a study in the uk indicated that health professionals are reluctant to report an incident in which there was a negative outcome for the patient . our study showed similar findings in that nurses are more willing to report than doctors . an incident which harmed the patient negatively influenced the reporting behavior of both doctors and nurses . this may be because health professionals feel insecure about their job and are afraid that they will have to face administrative fury after committing and reporting an error . this is supported by the finding in our study that 69% of doctors ( n=79 ) and 68% of nurses ( n=70 ) believe that administrative sanction is the most important barrier to incident reporting . it is vital to note that a reporting system itself does not bring about or improve patient safety . it is the action or response to the reporting that brings the change . within an organization , reporting of incidents/ adverse events should lead to an in - depth investigation to assess the etiological factors ( active or latent ) so the system can be changed and recurrence can be prevented . at a national level , report analysis by experts and dissemination of information is required to improve patient safety through incident reporting . in this study more than 88% of doctors and 84% of nurses believe that the lack of feedback generation is the most influential barrier to incident reporting . a similar study conducted in south australia ( 2006 ) also found that almost two thirds of the health professionals ( doctors and nurses ) believed lack of feedback was the greatest deterrent to reporting . a non - supportive environment , a culture of blame and shame and the culture of medicine , with its emphasis on professional autonomy , collegiality , and self - regulation , is unlikely to foster incident reporting . our study identified that only 54% of health professionals believe that their hospital environment is supportive . moreover 57.1% of health professionals perceive lack of value in incident reporting because when an event is reported , it feels like the person is written up , not the problem. some other barriers to incident reporting identified from peer reviewed literature is the lack of knowledge about how , what and whom to report . the evidence suggests that an autonomous body to collect and analyze incident reports should be established within the hospitals and that it should not work under the influence of manager / supervisors , head of the department or senior faculty members . our study shows that a significant proportion of doctors ( 60% ) and nurses ( 80% ) are in favor of reporting an incident to the head of the department , while , only 19% of doctors and 9% of nurses prefer reporting to the hospital administration . this preference may be because department heads are more accessible , offer a certain level of confidentiality and feedback may be pursued easily . our research confirms the previous finding that , in the presence of written protocols and guidelines , an incident is more likely to be reported . this finding may provide an initiative to introduce protocols and guidelines in writing , as these are less likely to be violated and violations are more likely to be reported . the willingness of health professionals to report incidents in order to improve patient safety indicates that fertile grounds are available for development of an incident reporting system in pakistan . the core and theme of any incident / error reporting system is to learn from mistakes . this fact however , is not well acknowledged by health professionals in pakistan . more work is needed to raise the awareness among health professionals pertaining to incident reporting . furthermore , any system of incident reporting that might be implemented in the future would need to consider providing : a supportive working environment ; prompt feedback ; and immunity from penalties ( administrative and financial ) .","<S> background : a limited framework of incident reporting exists in most of the health care system in pakistan . </S> <S> this poses a risk to the patient population and therefore there is a need to find the causes behind the lack of such a system in healthcare settings in pakistan.aims:to determine the attitudes and perceived barriers towards incident reporting among tertiary care health professionals in pakistanmaterials and methods : the study was done in shifa international hospitals and consisted of a questionnaire given to 217 randomly selected doctors and nurses . </S> <S> mean sd of continuous variables and frequency ( percentage % ) of categorical variables are presented . </S> <S> chi square statistical analysis was used to test the significance of association among doctors and nurses with various outcome variables ( motivators to report , perceived barriers , preferred person to report and patient 's outcome that influence reporting behaviors ) . </S> <S> p value of < 0.05 was considered significant . </S> <S> student doctors and student nurses were not included in the study.results:unlike consultant , registrars , medical officers and nurses ( more than 95% are willing to report ) , only 20% of house officers will report the incident happened through them . </S> <S> sixty nine percent of doctors and 67% of nurses perceive </S> <S> administration sanction as a common barrier to incident reporting </S> <S> . sixty percent of doctors and 80% of nurses would prefer reporting to the head of the department.conclusions:by giving immunity from administrative sanction , providing prompt feedback and assurance that the incident reporting will be used to make changes in the system , there is considerable willingness of doctors and nurses to take time out of their busy schedules to submit reports . </S>"


The metric is an instance of [`datasets.Metric`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Metric):

In [None]:
metric

Metric(name: "rouge", features: {'predictions': Value(dtype='string', id='sequence'), 'references': Value(dtype='string', id='sequence')}, usage: """
Calculates average rouge scores for a list of hypotheses and references
Args:
    predictions: list of predictions to score. Each predictions
        should be a string with tokens separated by spaces.
    references: list of reference for each prediction. Each
        reference should be a string with tokens separated by spaces.
    rouge_types: A list of rouge types to calculate.
        Valid names:
        `"rouge{n}"` (e.g. `"rouge1"`, `"rouge2"`) where: {n} is the n-gram based scoring,
        `"rougeL"`: Longest common subsequence based scoring.
        `"rougeLSum"`: rougeLsum splits text using `"
"`.
        See details in https://github.com/huggingface/datasets/issues/617
    use_stemmer: Bool indicating whether Porter stemmer should be used to strip word suffixes.
    use_agregator: Return aggregates if this is set to True
Retu

You can call its `compute` method with your predictions and labels, which need to be list of decoded strings:

In [None]:
fake_preds = ["hello there", "general kenobi"]
fake_labels = ["hello there", "general kenobi"]
metric.compute(predictions=fake_preds, references=fake_labels)

{'rouge1': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rouge2': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rougeL': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rougeLsum': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0))}

## Preprocessing the data

Before we can feed those texts to our model, we need to preprocess them. This is done by a 🤗 `Transformers` `Tokenizer` which will (as the name indicates) tokenize the inputs (including converting the tokens to their corresponding IDs in the pretrained vocabulary) and put it in a format the model expects, as well as generate the other inputs that the model requires.

To do all of this, we instantiate our tokenizer with the `AutoTokenizer.from_pretrained` method, which will ensure:

- we get a tokenizer that corresponds to the model architecture we want to use,
- we download the vocabulary used when pretraining this specific checkpoint.

That vocabulary will be cached, so it's not downloaded again the next time we run the cell.

In [None]:
from transformers import AutoTokenizer
    
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

Downloading:   0%|          | 0.00/1.65k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/878k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/446k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.29M [00:00<?, ?B/s]

By default, the call above will use one of the fast tokenizers (backed by Rust) from the 🤗 `Tokenizers` library.

You can directly call this tokenizer on one sentence or a pair of sentences:

In [None]:
tokenizer("Hello, this one sentence!")

{'input_ids': [0, 31414, 6, 42, 65, 3645, 328, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1]}

Depending on the model you selected, you will see different keys in the dictionary returned by the cell above. They don't matter much for what we're doing here (just know they are required by the model we will instantiate later), you can learn more about them in [this tutorial](https://huggingface.co/transformers/preprocessing.html) if you're interested.

Instead of one sentence, we can pass along a list of sentences:

In [None]:
tokenizer(["Hello, this one sentence!", "This is another sentence."])

{'input_ids': [[0, 31414, 6, 42, 65, 3645, 328, 2], [0, 713, 16, 277, 3645, 4, 2]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1]]}

To prepare the targets for our model, we need to tokenize them inside the `as_target_tokenizer` context manager. This will make sure the tokenizer uses the special tokens corresponding to the targets:

In [None]:
with tokenizer.as_target_tokenizer():
    print(tokenizer(["Hello, this one sentence!", "This is another sentence."]))

{'input_ids': [[0, 31414, 6, 42, 65, 3645, 328, 2], [0, 713, 16, 277, 3645, 4, 2]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1]]}


If you are using one of the five T5 checkpoints we have to prefix the inputs with "summarize:" (the model can also translate and it needs the prefix to know which task it has to perform).

In [None]:
if model_checkpoint in ["t5-small", "t5-base", "t5-larg", "t5-3b", "t5-11b"]:
    prefix = "summarize: "
else:
    prefix = ""

We can then write the function that will preprocess our samples. We just feed them to the `tokenizer` with the argument `truncation=True`. This will ensure that an input longer that what the model selected can handle will be truncated to the maximum length accepted by the model. The padding will be dealt with later on (in a data collator) so we pad examples to the longest length in the batch and not the whole dataset.

The max input length of `facebook/bart-base` is 1024, so `max_input_length = 1024`.

In [None]:
max_input_length = 1024
max_target_length = 256

def preprocess_function(examples):
    inputs = [prefix + doc for doc in examples["article"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)

    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(examples["abstract"], max_length=max_target_length, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

This function works with one or several examples. In the case of several examples, the tokenizer will return a list of lists for each key:

In [None]:
preprocess_function(raw_datasets['train'][:2])

{'input_ids': [[0, 405, 11493, 11, 55, 87, 654, 207, 9, 1484, 8, 189, 1338, 1814, 207, 11, 1402, 3505, 9, 16640, 2156, 941, 11, 1484, 11793, 17930, 8, 73, 368, 13785, 5804, 4, 134, 41, 23249, 16, 6533, 25, 41, 15650, 17215, 672, 9, 23385, 43202, 36, 1368, 428, 4839, 36, 1368, 428, 28696, 316, 821, 1589, 385, 462, 4839, 8, 189, 16072, 25, 10, 898, 9, 5, 7482, 2199, 2156, 13162, 2156, 2129, 10894, 2156, 17930, 2156, 50, 13785, 5804, 479, 6104, 3218, 3608, 14, 7967, 8, 18327, 139, 111, 2174, 797, 71, 13785, 5804, 2156, 941, 11, 471, 8, 5397, 16640, 2156, 189, 28, 13969, 30, 41, 23249, 4, 1978, 41, 23249, 747, 41089, 1290, 5298, 215, 25, 16069, 2156, 8269, 2156, 8, 25599, 642, 22423, 2156, 8, 4634, 189, 33, 10, 2430, 1683, 15, 1318, 9, 301, 36, 2231, 1168, 4839, 8, 819, 2194, 11, 1484, 19, 1668, 479, 4634, 2156, 7, 1477, 2166, 13838, 2156, 2231, 1168, 2156, 8, 17618, 32444, 11, 1484, 19, 1668, 2156, 24, 74, 28, 5701, 7, 185, 10, 16300, 1548, 11, 9397, 9883, 54, 240, 1416, 13, 1668, 111, 30

To apply this function on all the pairs of sentences in our dataset, we just use the `map` method of our `dataset` object we created earlier. This will apply the function on all the elements of all the splits in `dataset`, so our training, validation and testing data will be preprocessed in one single command.

In [None]:
tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)

  0%|          | 0/8 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Even better, the results are automatically cached by the 🤗 `Datasets` library to avoid spending time on this step the next time you run your notebook. The 🤗 `Datasets` library is normally smart enough to detect when the function you pass to map has changed (and thus requires to not use the cache data). For instance, it will properly detect if you change the task in the first cell and rerun the notebook. 🤗 `Datasets` warns you when it uses cached files, you can pass `load_from_cache_file=False` in the call to `map` to not use the cached files and force the preprocessing to be applied again.

Note that we passed `batched=True` to encode the texts by batches together. This is to leverage the full benefit of the fast tokenizer we loaded earlier, which will use multi-threading to treat the texts in a batch concurrently.

## Fine-tuning the model

Now that our data is ready, we can download the pretrained model and fine-tune it. Since our task is of the sequence-to-sequence kind, we use the `AutoModelForSeq2SeqLM` class. Like with the tokenizer, the `from_pretrained` method will download and cache the model for us.

In [None]:
from transformers import AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer

model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

Downloading:   0%|          | 0.00/532M [00:00<?, ?B/s]

Note that  we don't get a warning like in our classification example. This means we used all the weights of the pretrained model and there is no randomly initialized head in this case.

To instantiate a `Seq2SeqTrainer`, we will need to define three more things. The most important is the [`Seq2SeqTrainingArguments`](https://huggingface.co/transformers/main_classes/trainer.html#transformers.Seq2SeqTrainingArguments), which is a class that contains all the attributes to customize the training. It requires one folder name, which will be used to save the checkpoints of the model, and all other arguments are optional:

In [None]:
batch_size = 2
model_name = model_checkpoint.split("/")[-1]
args = Seq2SeqTrainingArguments(
    f"{model_name}-finetuned-pubmed",
    evaluation_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=5,
    predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
    seed = 42,
    generation_max_length
)

Here we set the evaluation to be done at the end of each epoch, tweak the learning rate, use the `batch_size` defined at the top of the cell and customize the weight decay. Since the `Seq2SeqTrainer` will save the model regularly and our dataset is quite large, we tell it to make three saves maximum. Lastly, we use the `predict_with_generate` option (to properly generate summaries) and activate mixed precision training (to go a bit faster).

The last argument to setup everything so we can push the model to the [Hub](https://huggingface.co/models) regularly during training. Remove it if you didn't follow the installation steps at the top of the notebook. If you want to save your model locally in a name that is different than the name of the repository it will be pushed, or if you want to push your model under an organization and not your name space, use the `hub_model_id` argument to set the repo name (it needs to be the full name, including your namespace: for instance `"sgugger/t5-finetuned-xsum"` or `"huggingface/t5-finetuned-xsum"`).

Then, we need a special kind of data collator, which will not only pad the inputs to the maximum length in the batch, but also the labels:

In [None]:
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

The last thing to define for our `Seq2SeqTrainer` is how to compute the metrics from the predictions. We need to define a function for this, which will just use the `metric` we loaded earlier, and we have to do a bit of pre-processing to decode the predictions into texts:

In [None]:
import nltk
import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    
    # Rouge expects a newline after each sentence
    decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = ["\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels]
    
    result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    # Extract a few results
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
    
    # Add mean generated length
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions]
    result["gen_len"] = np.mean(prediction_lens)
    
    return {k: round(v, 4) for k, v in result.items()}

Then we just need to pass all of this along with our datasets to the `Seq2SeqTrainer`:

In [None]:
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

Cloning https://huggingface.co/Kevincp560/bart-base-finetuned-pubmed into local empty directory.
Using amp half precision backend


We can now finetune our model by just calling the `train` method:

In [None]:
trainer.train()

The following columns in the training set  don't have a corresponding argument in `BartForConditionalGeneration.forward` and have been ignored: article, abstract.
***** Running training *****
  Num examples = 8000
  Num Epochs = 5
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 1
  Total optimization steps = 20000


Epoch,Training Loss,Validation Loss,Rouge1,Rouge2,Rougel,Rougelsum,Gen Len
1,2.3706,2.124483,9.1644,3.8264,8.2223,8.718,20.0
2,2.2246,2.081059,9.023,3.7716,8.1453,8.5998,20.0
3,2.1034,2.046894,9.4412,4.0783,8.4949,8.9977,20.0
4,2.0137,2.038968,9.2261,3.9307,8.3154,8.7937,20.0
5,1.9288,2.027748,9.3963,4.0473,8.4526,8.9659,20.0


Saving model checkpoint to bart-base-finetuned-pubmed/checkpoint-500
Configuration saved in bart-base-finetuned-pubmed/checkpoint-500/config.json
Model weights saved in bart-base-finetuned-pubmed/checkpoint-500/pytorch_model.bin
tokenizer config file saved in bart-base-finetuned-pubmed/checkpoint-500/tokenizer_config.json
Special tokens file saved in bart-base-finetuned-pubmed/checkpoint-500/special_tokens_map.json
tokenizer config file saved in bart-base-finetuned-pubmed/tokenizer_config.json
Special tokens file saved in bart-base-finetuned-pubmed/special_tokens_map.json
Saving model checkpoint to bart-base-finetuned-pubmed/checkpoint-1000
Configuration saved in bart-base-finetuned-pubmed/checkpoint-1000/config.json
Model weights saved in bart-base-finetuned-pubmed/checkpoint-1000/pytorch_model.bin
tokenizer config file saved in bart-base-finetuned-pubmed/checkpoint-1000/tokenizer_config.json
Special tokens file saved in bart-base-finetuned-pubmed/checkpoint-1000/special_tokens_map.js

TrainOutput(global_step=20000, training_loss=2.1619785064697266, metrics={'train_runtime': 7576.1735, 'train_samples_per_second': 5.28, 'train_steps_per_second': 2.64, 'total_flos': 2.4342060847104e+16, 'train_loss': 2.1619785064697266, 'epoch': 5.0})

You can now upload the result of the training to the Hub, just execute this instruction:

In [None]:
trainer.push_to_hub()

Saving model checkpoint to bart-base-finetuned-pubmed
Configuration saved in bart-base-finetuned-pubmed/config.json
Model weights saved in bart-base-finetuned-pubmed/pytorch_model.bin
tokenizer config file saved in bart-base-finetuned-pubmed/tokenizer_config.json
Special tokens file saved in bart-base-finetuned-pubmed/special_tokens_map.json
Several commits (2) will be pushed upstream.
The progress bars may be unreliable.


Upload file pytorch_model.bin:   0%|          | 3.37k/532M [00:00<?, ?B/s]

Upload file runs/Mar01_09-51-56_57778c9a3039/events.out.tfevents.1646128346.57778c9a3039.84.0:  25%|##5       …

To https://huggingface.co/Kevincp560/bart-base-finetuned-pubmed
   d61c74a..1f2489b  main -> main

To https://huggingface.co/Kevincp560/bart-base-finetuned-pubmed
   1f2489b..782b3e0  main -> main



'https://huggingface.co/Kevincp560/bart-base-finetuned-pubmed/commit/1f2489bdf874634c883d7ef24ff37f278f47698f'

You can now share this model with all your friends, family, favorite pets: they can all load it with the identifier `"your-username/the-name-you-picked"` so for instance:

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("sgugger/my-awesome-model")
```