If you're opening this notebook on Colab, you will probably need to install 🤗 `Transformers` and 🤗 `Datasets`, as well as a few other dependencies:

* `datasets`
* `transformers`
* `rouge-score`
* `nltk`
* `pytorch` (preinstalled on Colab, so it is not part of the install command below)
* `ipywidgets`

*Note*: Training runs on the GPU, so the device needs a CUDA-capable GPU with `CUDA` installed.
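A quick way to confirm that a GPU is visible before training is to ask PyTorch directly. The sketch below is only illustrative: the helper name `pick_device` is hypothetical, and it falls back to `"cpu"` when PyTorch is not importable so that it can run in any environment.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # assumed to be installed, as it is on Colab
    except ImportError:
        # Without PyTorch there is nothing to run on the GPU anyway.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints `cpu` on Colab, the runtime type probably needs to be switched to a GPU runtime before continuing.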

In [1]:
! pip install datasets transformers rouge-score nltk ipywidgets

Collecting datasets
  Downloading datasets-1.18.3-py3-none-any.whl (311 kB)
Collecting transformers
  Downloading transformers-4.17.0-py3-none-any.whl (3.8 MB)
Collecting rouge-score
  Downloading rouge_score-0.0.4-py2.py3-none-any.whl (22 kB)
Collecting aiohttp
  Downloading aiohttp-3.8.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
Collecting xxhash
  Downloading xxhash-3.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)
Collecting fsspec[http]>=2021.05.0
  Downloading fsspec-2022.2.0-py3-none-any.whl (134 kB)
Collecting huggingface-hub<1.0.0,>=0.1.0
  Downloading huggingface_hub-0.4.0-py3-non

When using `nltk`, the `punkt` tokenizer data also needs to be downloaded, because it is not installed together with the package. Without `punkt`, the analysis later in the notebook fails with an error.

In [2]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

If you're opening this notebook locally, make sure your environment has the latest versions of those libraries installed.

To be able to share your model with the community and generate results like the one shown in the picture below via the inference API, there are a few more steps to follow.

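First you have to store your authentication token from the Hugging Face website (sign up [here](https://huggingface.co/join) if you haven't already!), then execute the following cell and enter your credentials when prompted. Older versions of `huggingface_hub` ask for your username and password; more recent ones ask for an access token instead.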

In [3]:
from huggingface_hub import notebook_login

notebook_login()

Login successful
Your token has been saved to /root/.huggingface/token
[1m[31mAuthenticated through git-credential store but this isn't the helper defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default

git config --global credential.helper store[0m


Then you need to install `Git-LFS`.

If you are not using `Google Colab`, you may need to install `Git-LFS` manually, since the command below depends on your operating system and may not work on your machine. You can read about `Git-LFS` and how to install it [here](https://git-lfs.github.com/).

In [4]:
! apt install git-lfs

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-470
Use 'apt autoremove' to remove it.
The following NEW packages will be installed:
  git-lfs
0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.
Need to get 2,129 kB of archives.
After this operation, 7,662 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 git-lfs amd64 2.3.4-1 [2,129 kB]
Fetched 2,129 kB in 1s (1,655 kB/s)
Selecting previously unselected package git-lfs.
(Reading database ... 155320 files and directories currently installed.)
Preparing to unpack .../git-lfs_2.3.4-1_amd64.deb ...
Unpacking git-lfs (2.3.4-1) ...
Setting up git-lfs (2.3.4-1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...


Make sure your version of `Transformers` is at least 4.11.0, since the functionality for sharing models on the Hub used in this notebook was introduced in that version:

In [5]:
import transformers

print(transformers.__version__)

4.17.0
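Comparing version strings directly can be misleading (as plain strings, `"4.9.1"` sorts after `"4.11.0"`), so a programmatic check should compare the numeric components. The sketch below uses only the standard library; the helper names `parse_version` and `version_at_least` are hypothetical, and the parsing is deliberately simple (it keeps the first three numeric components and ignores suffixes such as `.dev0`).

```python
from itertools import takewhile


def parse_version(version: str) -> tuple:
    """Turn a string like "4.17.0" into a tuple of three integers."""
    numbers = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits of each component ("0rc1" -> 0).
        digits = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(digits) if digits else 0)
    # Pad short versions such as "4.11" to three components.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def version_at_least(installed: str, required: str) -> bool:
    """Return True if `installed` is the same as or newer than `required`."""
    return parse_version(installed) >= parse_version(required)


print(version_at_least("4.17.0", "4.11.0"))
```

In the notebook, the first argument would be `transformers.__version__`.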


You can find a script version of this notebook to fine-tune your model in a distributed fashion using multiple GPUs or TPUs [here](https://github.com/huggingface/transformers/tree/master/examples/seq2seq).

# Fine-tuning a model on a summarization task

In this notebook, we will see how to fine-tune one of the [🤗 `Transformers`](https://github.com/huggingface/transformers) models for a summarization task. We will use the [PubMed Summarization dataset](https://huggingface.co/datasets/ccdv/pubmed-summarization), which contains PubMed articles paired with their abstracts.

![Widget inference on a summarization task](https://github.com/huggingface/notebooks/blob/master/examples/images/summarization.png?raw=1)

We will see how to easily load the dataset for this task using 🤗 `Datasets` and how to fine-tune a model on it using the `Trainer` API.

In [6]:
model_checkpoint = "sshleifer/distilbart-cnn-12-3"

This notebook is built to run with any model checkpoint from the [Model Hub](https://huggingface.co/models), as long as that model has a sequence-to-sequence version in the `Transformers` library. Here we picked the [`sshleifer/distilbart-cnn-12-3`](https://huggingface.co/sshleifer/distilbart-cnn-12-3) checkpoint.

## Loading the dataset

We will use the [🤗 `Datasets`](https://github.com/huggingface/datasets) library to download the data and get the metric we need to use for evaluation (to compare our model to the benchmark). This can be easily done with the functions `load_dataset` and `load_metric`.  

In [7]:
from datasets import load_dataset, load_metric

raw_datasets = load_dataset("ccdv/pubmed-summarization")
metric = load_metric("rouge")

Downloading:   0%|          | 0.00/4.88k [00:00<?, ?B/s]

No config specified, defaulting to: pub_med_summarization_dataset/document


Downloading and preparing dataset pub_med_summarization_dataset/document to /root/.cache/huggingface/datasets/ccdv___pub_med_summarization_dataset/document/1.0.0/5792402f4d618f2f4e81ee177769870f365599daa729652338bac579552fec30...


Downloading:   0%|          | 0.00/779M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/43.7M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/43.8M [00:00<?, ?B/s]

0 examples [00:00, ? examples/s]

0 examples [00:00, ? examples/s]

0 examples [00:00, ? examples/s]

Dataset pub_med_summarization_dataset downloaded and prepared to /root/.cache/huggingface/datasets/ccdv___pub_med_summarization_dataset/document/1.0.0/5792402f4d618f2f4e81ee177769870f365599daa729652338bac579552fec30. Subsequent calls will reuse this data.


  0%|          | 0/3 [00:00<?, ?it/s]

Downloading:   0%|          | 0.00/2.16k [00:00<?, ?B/s]
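For intuition about what the `rouge` metric measures, here is a simplified ROUGE-1 F1 computation in plain Python. It is a sketch, not the `rouge_score` implementation: the function name `rouge1_f1` is hypothetical, and it assumes lowercased whitespace tokenization with no stemming.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between two texts."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())  # share of candidate tokens that match
    recall = overlap / sum(ref_counts.values())      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("the cat sat on the mat", "the cat sat"))
```

A short summary that only uses words from the reference gets perfect precision but low recall, which is why the F1 score is usually the figure reported.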

The `raw_datasets` object is a [`DatasetDict`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasetdict), which contains one key each for the training, validation and test sets:

In [8]:
raw_datasets

DatasetDict({
    train: Dataset({
        features: ['article', 'abstract'],
        num_rows: 119924
    })
    validation: Dataset({
        features: ['article', 'abstract'],
        num_rows: 6633
    })
    test: Dataset({
        features: ['article', 'abstract'],
        num_rows: 6658
    })
})

To access an actual element, you need to select a split first, then give an index:

In [9]:
raw_datasets["train"][0]

{'abstract': "<S> background : the present study was carried out to assess the effects of community nutrition intervention based on advocacy approach on malnutrition status among school - aged children in shiraz , iran.materials and methods : this case - control nutritional intervention has been done between 2008 and 2009 on 2897 primary and secondary school boys and girls ( 7 - 13 years old ) based on advocacy approach in shiraz , iran . </S> <S> the project provided nutritious snacks in public schools over a 2-year period along with advocacy oriented actions in order to implement and promote nutritional intervention . for evaluation of effectiveness of the intervention growth monitoring indices of pre- and post - intervention were statistically compared.results:the frequency of subjects with body mass index lower than 5% decreased significantly after intervention among girls ( p = 0.02 ) . </S> <S> however , there were no significant changes among boys or total population . </S> <S> 
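The abstract above wraps each sentence in `<S>` and `</S>` markers. Whether to keep them is a modeling choice that this notebook does not make; if plain text is preferred, a minimal sketch for stripping them could look like the following. The helper name `strip_sentence_markers` is hypothetical.

```python
import re


def strip_sentence_markers(abstract: str) -> str:
    """Remove <S> and </S> markers and collapse the leftover whitespace."""
    without_tags = re.sub(r"</?S>", " ", abstract)
    return " ".join(without_tags.split())


sample = "<S> background : the study . </S> <S> however , no change . </S>"
print(strip_sentence_markers(sample))
```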

Since the `pubmed` data is very large, we keep only a subset of each split: 8,000 training examples, 2,000 validation examples, and 2,000 test examples. Note that the ranges below start at index 1, so the first row of each split is skipped.

In [10]:
raw_datasets["train"] = raw_datasets["train"].select(range(1, 8001))
raw_datasets["validation"] = raw_datasets["validation"].select(range(1, 2001))
raw_datasets["test"] = raw_datasets["test"].select(range(1, 2001))
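The sizes of these subsets follow directly from the `range` arguments, which can be checked without loading any data. The sketch below is plain Python; the helper name `subset_indices` is hypothetical and only illustrates the index arithmetic.

```python
def subset_indices(size: int, start: int = 0) -> range:
    """Indices of `size` consecutive rows beginning at `start`."""
    return range(start, start + size)


train_indices = subset_indices(8000, start=1)  # same as range(1, 8001)
# Number of rows, first index and last index of the selection.
print(len(train_indices), train_indices[0], train_indices[-1])
```

Starting at 0 instead (`range(8000)`) would select the same number of rows while including the first one.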

To get a sense of what the data looks like, the following function will show some examples picked randomly in the dataset.

In [11]:
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=5):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))
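The retry loop in `show_random_elements` draws indices until it has enough distinct ones. As an alternative sketch, `random.sample` from the standard library returns unique picks in a single call. The helper name `pick_unique_indices` and its `seed` parameter are hypothetical additions for illustration.

```python
import random


def pick_unique_indices(dataset_length: int, num_examples: int = 5, seed=None) -> list:
    """Pick `num_examples` distinct row indices from a dataset of the given length."""
    if num_examples > dataset_length:
        raise ValueError("Can't pick more elements than there are in the dataset.")
    rng = random.Random(seed)  # a local generator keeps the global random state untouched
    return rng.sample(range(dataset_length), num_examples)


print(pick_unique_indices(8000, num_examples=5, seed=0))
```

The resulting list can be used to index a dataset in the same way as `picks` in the function above.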

In [12]:
show_random_elements(raw_datasets["train"])

Unnamed: 0,article,abstract
0,"the temporal and spatial coupling between increased neuronal activity and cerebral blood flow ( cbf ) , known as functional hyperemia or neurovascular coupling , is a highly regulated phenomenon that ensures adequate supply of oxygen and glucose to the neurons at work during a given task . although intuitively appealing , a direct link between energy state and blood flow is not universally accepted , and the physiological basis of neurovascular coupling still remains uncertain . indeed , neither the lack of glucose or oxygen appears to fully justify the hemodynamic response ( powers et al . , 1996 ; wolf et al . 2010 ) , which may serve as a safety measure for substrate delivery during functional activation ( leithner et al . , 2010 ) . despite these limitations , the changes in hemodynamic signals ( bold response , cbf or cerebral blood volume , cbv ) are commonly used as surrogate markers to map changes in neural activity in brain imaging procedures , such as functional magnetic resonance ( fmri ) , positron emission tomography ( pet ) , or diffuse optical imaging ( doi ) under both physiological and pathological conditions . accordingly , an adequate interpretation of imaging data imperatively requires understanding of the cellular basis of the activated neurocircuitry and its interaction with astrocytic and vascular targets . in contrast to the innervation of large cerebral arteries by ganglia from the peripheral nervous system , which is mainly involved in autoregulation ( for a review , hamel , 2006 ) , the neuronal circuitry at play here refers to pathways of the central nervous system that interact with the brain microcirculation . as such , it is known that the changes in activity are triggered by the incoming afferents , but that it is their local processing by the targeted cells that drives the perfusion changes ( logothetis et al . 
, 2001 ; these hemodynamic changes are mainly , if not exclusively , achieved by the control of the vasculature at the arteriolar level ( hillman et al . , 2007 ) , from the pial surface down to the precapillary level where vascular pericytes stand ( jones , 1970 ) , the latter contractile elements being likely involved in a localized control of capillary tone ( peppiatt et al . , 2006 ) . additionally , it has been shown that the hemodynamic response correlates with synchronized synaptic activity , a highly energy consuming process ( arthurs et al . , 2000 ) , and that it is controlled by signaling molecules released during increased synaptic activity ( devor et al . , 2007 ) . recently , using the neurovascular coupling response to various sensory stimuli ( whisker , forepaw or hindpaw , visual ) , elegant experiments demonstrated that the spread of the hemodynamic activity accurately reflects the neural response ( berwick et al . , 2008 ) , that it is driven by synaptic activity generated by intracortical processing ( franceschini et al . , 2008 ) , and that the latter reflected the balance between excitatory and inhibitory signals ( devor et al . , 2005 , 2007 ; shmuel et al . , 2006 ) . particularly , suppressed neuronal activity or functional neuronal inhibition has been associated with decrease in blood oxygenation and perfusion , which could explain the negative bold signals ( shmuel et al . , 2002 , 2006 ) as it occurred concurrently with arteriolar constriction ( devor et al . , 2007 ) . as an attempt to identify the underlying neuronal circuitry , an interesting study by lu et al . ( 2004)showed that laminar bold and cbv responses to rat whisker stimulation spatially correlated with increased neuronal activity evaluated by c - fos upregulation . 
in an unrelated study in rats treated with the serotonin releasing drug m - chlorophenylpiperazine , positive bold signals and c - fos immunoreactivity correlated in areas of increased activity , but not in those that displayed diminished bold signals , presumably due to decreased neuronal signaling ( stark et al . , 2006 ) . recently , it was shown that stimulation of corticocortical and thalamocortical inputs to the same area of the somatosensory cortex induced completely distinct frequency - dependent changes in cbf and oxygen consumption , and evoked activity in different populations of cortical excitatory pyramidal cells and inhibitory gaba interneurons ( enager et al . , 2009 ) . stimulation intensity increases occurred with the silencing or recruitment of distinct inhibitory interneurons , indicating that neural network activation is both stimulus- and frequency - dependent . particularly , the hemodynamic responses were attributed to activation of cyclooxygenase-2 ( cox-2 ) pyramidal cells and somatostatin ( som)/nitric oxide synthase ( nos ) inhibitory interneurons , consistent with pharmacological studies that implicated cox-2- and nos - derived vasodilator messengers in these neurovascular pathways ( niwa et al . , 2000 ; moreover , increased activity in inhibitory interneurons has been associated with the initiation of the hemodynamic response triggered by synchronized cortical activity , as induced by activation of the basal forebrain ( niessing et al . , 2005 ) . stimulation of this basalocortical afferent input further indicated selective activation of cholinoceptive layers ii to vi som and neuropeptide y ( npy ) interneurons , as well as layer i gaba interneurons , with widespread activation of pyramidal cells , including those that contain cox-2 ( kocharyan et al . 
although it was demonstrated that the cbf response was triggered by cholinergic afferents , its full expression required gabaa - mediated transmission on neuronal , vascular and/or astrocytic targets ( kocharyan et al . , 2008 ) . together these anatomical , neurochemical and functional studies demonstrate the importance of identifying the cellular ensemble that underlies hemodynamic signals , highlighting that specific subsets of neurons are activated by a given stimulus , depending on the afferent input they receive and integrate . in addition to the difficulty in identifying the exact contribution of excitatory and inhibitory neurotransmissions in the evoked hemodynamic response , the effect of these neurotransmitter systems on perivascular astrocytes needs to be considered ( for a detailed review , carmignoto and gomez - gonzalo , 2009 ) . the enwrapping of synapses and blood vessels ( kacem et al . , 1998 ) by glial processes and , particularly , their intervening endfeet in multiple neurovascular appositions identified at the ultrastructural level led to the concept of a neuronal - astrocytic - vascular tripartite functional unit ( vaucher and hamel , 1995 ; cohen et al . , 1996 ; paspalas and papadopoulos , 1996 ; vaucher et al . , 2000 ) . the significance of these interactions in the regulation of cbf was first substantiated by the demonstration that astrocytes could synthesize vasodilatory messengers ( table 1 ) , particularly epoxyeicosatrienoic acids ( eets ) generated from p450 arachidonic acid epoxygenase activity ( alkayed et al . , 1996 ) that were involved in the cbf response to glutamate ( alkayed et al . , 1997 ; harder et al . 
, soon after came the first in vitro ( in cortical slices ) and in vivo demonstrations for a role of astrocytes , through metabotropic glutamate receptor ( mglurs)-induced ca transients , in microarteriolar dilation and the increase blood flow to forepaw stimulation , a response mediated by arachidonic acid products , possibly prostaglandin e2 ( pge2 ) ( zonta et al . , a role for astrocytes in neurovascular coupling has been reaffirmed in various paradigms ( carmignoto and gomez - gonzalo , 2009 ) . neuronal and glial metabolism by - products ( e.g. , co2 , h , adenosine ) are not included . no , nitric oxide ; vip , vasoactive intestinal polypeptide ; pge2 , prostaglandin e2 ; eets , epoxyeicosatrienoic acids ; nos , nitric oxide synthase ; npy , neuropeptide y ; cox , cyclooxygenase , nmda - r , n - methyl d aspartate receptors , iglurs ; ionotropic glutamate receptors , mglur , metabotropic glutamate receptors ; vsmc , vascular smooth muscle cell , sgc , soluble guanylate cyclase , vpac1 , vip / pacap receptor type 1 ; ep , prostaglandin e2 receptors ; gpcr , g protein coupled receptor . kca , ca+-activated k channels ; kir , inward rectifier k channels ; g , conductance . another pathway activated in astrocytic endfeet following ca increases is the large - conductance , calcium - sensitive potassium ( bk ) channels that induce k release ( table 1 ) , activation of smooth muscle kir channels and relaxation ( filosa et al . , 2006 ) . recently , it was shown that the extent of the ca increases in astrocytic endfeet determined the dilatory or contractile nature of the vascular response , both mediated by extracellular k ( girouard et al . , 2010 ) . this novel mechanism would reunify previous apparently contradictory findings , in brain slices or in retina , of dilation and constriction being induced by increased ca signaling in astrocytes , and explained by the levels of no ( mulligan and macvicar , 2004 ; metea and newman , 2006 ) , oxygen ( gordon et al . 
, 2008 ) , or the pre - existing tone of the vessels ( blanco et al . , 2008 ) . ( 2010 ) showed that neuronal activation in vitro ( electrical field stimulation ) similarly acted through serial activation of astrocytic bk and smooth muscle kir channels . considering that gaba ( nilsson et al . 2008 ) and peptides such as som that colocalize with gaba ( somogyi et al . , 1984 ; cauli et al . , 2000 ) in interneurons also increase ca signaling in astrocytes , ( straub et al . , 2006 ) , the latter could act as intermediaries to gaba in neurovascular coupling ( table 1 ) . whether or not the astrocytic / smooth muscle bk and kir channel activation cascade is involved still remains to be determined . similarly , it would be interesting to evaluate if the latter , or other contractile mechanisms of neuronal ( cauli et al . 2006 ) or vascular origins ( mulligan and macvicar , 2004 ) could explain the dilatory and constrictive phases seen in the central core of neuronal depolarization and surround region of hyperpolarization , respectively , after somatosensory stimulation ( devor et al . , 2007 ) . to date the cellular and molecular mechanisms or even the functional significance of this response remains unknown in addition to the astrocyte - derived vasodilatory messengers pge2 ( zonta et al . , 2003 ) , eets ( alkayed et al . , 1996 ) , or k ions ( filosa et al . , 2006 ) , iii pyramidal cells ( yamagata et al . , 1993 ; breder et al . , 1995 ) , nitric oxide ( no ) ( gotoh et al . , 2001 ) whose synthetic enzyme is expressed by discrete subpopulations of cortical gaba interneurones ( kubota et al . , 1994 ) , vip ( yaksh et al . , 1987 ) , acetylcholine ( scremin et al . , 1973 ) and corticotropin - releasing factor ( de michele et al . , 2005 ) synthesized by bipolar / bitufted gaba interneurones ( morrison et al . , 1984 ; chdotal et al . , 1994a ; cauli et al . , 1997 ; gallopin et al . 
, 2006 for instance the highly ca permeable nmda receptors , expressed by cortical neurons ( monyer et al . , 1994 ; cauli et al . , 2000 ) , promote the release of pge2 ( pepicelli et al . , astrocytes , some gaba interneurons produce substances with vasocontractile properties , namely npy ( abounader et al . , 1995 ; cauli et al . , 2004 ) and som ( long et al . , 1992 ; cauli et al . , cortical neurons producing these vasoactive peptides are intimately associated with blood vessels through neuronal - astrocytic - vascular appositions described above ( chdotal et al . , 1994b ; abounader and hamel , 1997 ; estrada and defelipe , 1998 ; vaucher et al . , 2000 ; wang et al . , 2005 ) , and their receptors are expressed by smooth muscle cells and astrocytes ( chalmers et al . , 1995 ; bao et al . , 1997 ; abounader et al . , 1999 ; , 2000 ; cauli et al . , 2004 ; straub et al . , 2006 ; cahoy et al . , this raises the intriguing question of whether or not astrocytes are intermediaries for neuron - derived vasoactive messengers or , alternatively , if the latter exert direct effects on the microcirculation . it is widely admitted that an increase in intracellular ca ( table 1 ) is a required early event for the production and/or release of vasoactive messengers from neurons ( lauritzen , 2005 ) and astrocytes ( straub and nelson , 2007 ) . examination of ca dynamics in these cell types could provide a clue to decipher their relative and temporal contribution to functional hyperemia . the general view is that rapid ca events reflect an entry following fast ( 1012 ms ) spiking response of neurons ( petersen et al . , 2003 ) and/or activation of ca permeable ionotropic receptors , whereas slower dynamics are mainly driven by activation of metabotropic receptors leading to the release of ca from intracellular stores ( perea and araque , 2005 ) . hence , cortical neurons , which express more frequently and abundantly ionotropic glutamate receptors ( monyer et al . 
, 1994 ; 2008 ) , are likely to be responsible for the majority of fast ca responses . in contrast , group i mglurs ubiquitously expressed by cortical neurons ( baude et al . , 1993 ; cauli et al . , 2000 ) and astrocytes ( porter and mccarthy , 1996 ) would be responsible for slower ca dynamics in both cell types . consistent with this , in somatosensory or visual cortex , evoked ca events in neurons are virtually locked with sensory stimulations and precede those in astrocytes by a few seconds ( stosiek et al . wang et al . , 2006 ; schummers et al . , 2008 ; murayama et al . , 2009 ) , although a small proportion ( 5% ) of astrocytes can exhibit ca responses as fast as neurons ( winship et al . , 2007 ) . calcium uncaging in astrocytic endfeet in vivo showed that arterioles start to dilate 500 ms after the onset of ca increase ( takano et al . , 2006 ) indicating that synthesis , release and effects of vasodilatory messengers must be achieved within this time window . since hemodynamic responses initiate 600 ms after the onset of sensory stimulations ( kleinfeld et al . , 1998 ; 2003 ) , it appears that only cell types exhibiting fast evoked ca events ( i.e. , less than 100 ms ) can account for the early phase of the hemodynamic response . therefore , vasoactive messengers produced by neurons and , possibly , also by astrocytes with fast ca events , could explain this response ( figure 1 ) . summary of the proposed regulation of cortical microvessels by pyramidal cells , gaba interneurons and astrocytes ( a ) , and how their respective effects can temporally regulate cbf changes ( b ) . ( a ) subcortical afferents from a variety of brain areas target distinct populations of neurons in the cerebral cortex . 
these activated neuronal networks can either directly act on local microvessels , which are endowed with receptors ( geometric forms on the vessel wall ) for most neurotransmitters / neuromediators , or indirectly via astrocytes that act as intermediaries to both pyramidal cells and interneurons . known direct vasoactive mediators released from pyramidal cells and interneurons correspond respectively to cox-2 derivatives like prostaglandin e2 ( pge2 ) and no and , possibly , gaba , whereas astrocytes act chiefly by releasing dilatory eets , an effect comparatively slow as opposed to that of no and pge2 ( or other neurally released vasoactive molecules or peptides ) . the possibility that sub - cortical afferents directly contact and act upon cortical astrocytes or microvessels also has to be taken into consideration . modified from figure 3 in hamel ( 2006 ) . ( b ) schematic representation of the relative and temporal contributions of selected vasoactive mediators produced by pyramidal cells ( pge2 ) , interneurons ( no ) and astrocytes ( eets ) , to the cbf response evoked by sensory stimulation ( see table 1 for a more complete list ) . brief stimulations ( 1 s ) are more likely to involve neurally - derived mediators whereas sustained stimulation ( 1 min ) are more susceptible to recruit astrocyte - derived messengers . no being transiently released its contribution to cbf response during sustained stimulation is minor and could account for its permissive role . in the cerebral cortex , none of the vasoactive messengers implicated in neurovascular coupling ( girouard and iadecola , 2006 ) , whether of neuronal or astroglial origin , can individually account for the hemodynamic response , as demonstrated by genetic invalidation ( ma et al . kitaura et al . , 2007 ) or synthesis inhibition ( lindauer et al . , 1999 ; peng et al . , 2002 ; hoffmeyer et al . , 2007 ; leithner et al . , 2010 ) . 
when individually summed the inhibition of these messengers largely exceeds the expected value of 100% ( iadecola , 2004 ) , which suggests that their kinetics of action , temporal and spatial recruitment must be carefully considered to elucidate their relative contributions . alternatively , this may suggest that the activated pathways do not obligatorily operate independently from each other , and that , under certain circumstances , some may act like modulator rather than mediator of the perfusion responses , as documented for no in the somatosensory cortex ( lindauer et al . , 1999 ) . no , one of the fastest diffusible ( wood and garthwaite , 1994 ) vasodilator produced by a subset npy - expressing interneurons ( dawson et al . , correspondingly , shibuki 's group showed that neuronal no can account for up to 50% of the cbf response evoked by a brief ( 1 s ) sensory stimulation ( kitaura et al . , 2007 ) . in contrast , others only found a permissive role for no when long ( 1 min ) stimulations were used ( lindauer et al . these differences likely reflect the fact that no release is transient ( buerk et al . , 2003 ) which can be explained by the no scavenging effect of hemoglobin and/or by the adaptation of no producing interneurons ( karagiannis et al . , 2009 ) . in contrast , vasodilatory prostanoids produced by cox-2 , which is chiefly expressed by pyramidal cells ( yamagata et al . , 1993 ; breder et al . , 1995 ) , account for 50% of the cbf response evoked by both sustained ( niwa et al . , 2000 ) and brief stimulations ( kitaura et al . , 2007 ) . consistent with an involvement of neuron - derived messengers in the early phase of neurovascular coupling , blockade of prostanoids and no synthesis almost completely abolished hemodynamic responses evoked by brief sensory stimulations ( kitaura et al . , 2007 ) . during long lasting sensory stimulation , blockade of eets synthesis ( peng et al . , 2002 ) or their receptors ( liu et al . 
, 2008 ) blocked about 50% of the blood flow response , demonstrating that eets produced by astrocytes ( alkayed et al . , 1996 ) similarly , the local release of k from astrocytic endfeet ( filosa et al . , 2006 ) accounts for up to 50% of the cbf increase evoked by long lasting sensory stimulations ( girouard et al . , 2010 ; leithner et al . , 2010 ) surprisingly , combined blockade of nos , coxs , p450 epoxygenase , bk channels and adenosine receptors ( leithner et al . , 2010 ) did not reach a total inhibition of the late phase of the hemodynamic response as expected from individual blockades ( see above ) . this suggests that either multiple inhibition was incomplete or that other long lasting vasodilatory messengers are involved ( cauli et al . , 2004 ) , possibly vip , which is contained within gaba interneurons targeted by thalamocortical afferents ( staiger et al . , 1996 ) and similarly vasodilatory messengers derived from the endothelium ( rosenblum , 1986 ) such as no , prostacyclin ( faraci and heistad , 1998 ) but also eets ( campbell and fleming , 2010 ) , might be recruited under certain circumstances as it was reported for endothelial no after muscarinic m5 receptor activation ( elhusseiny and hamel , 2000 ; yamada et al . , 2001 ) . current evidence suggests that neuronal and astroglial signals that transduce changes in neuronal activity into an integrated vascular response are highly dependent upon the neurotransmitter released by the incoming afferents , and strictly determined by the target neurons within the activated area . 
particularly , depending on the nature of the afferent input ( i ) different neuronal or astroglial messengers , likely acting in sequence , mediate the hemodynamic changes , ( ii ) some recruited neurons release messengers that can directly alter blood vessel tone , ( iii ) others act by modulating neuronal and astroglial activity , and ( iv ) astrocytes may act as intermediaries for both excitatory and inhibitory neurotransmitters ( figure 1 ) . probably due to the large diversity of cortical neurons ( ascoli et al . , 2008 ) , to our knowledge , no in vivo study has yet investigated ca dynamics in identified neurons producing vasoactive substances or the vascular effects of their stimulation . the growing development of transgenic mice expressing genetically encoded fluorescent reporters and/or optogenetic tools ( cardin et al . 2009 ) in discrete subsets of cortical neurons ( heintz , 2001 ) together with the emergence of ultrafast multispectral imaging systems ( bouchard et al . , 2009 ) that allow simultaneous monitoring of ca events and hemodynamics should help evaluate the contribution of specific neuronal types in neurovascular coupling . this should provide decisive conclusions on the temporal , spatial and extent of the neurally - driven hemodynamic alterations and how the latter can be interpreted in the context of brain imaging of normal or pathological physiology . indeed , since astrocytes appear as intermediary effectors in conveying signals for sustained hemodynamic responses , their alteration primarily expressed by a state of chronic activation in several chronic diseases of the central nervous system such as alzheimer 's disease or epilepsy , has to be seriously considered . altered perfusion signals detected by fmri , pet or doi may represent astroglial dysfunction and not necessary impaired neuronal activity . 
extending such thinking to the microcirculation itself , the functional endpoint in the intricate cascade of neuronal - astrocytic - vascular events evoked by increased brain activity , any diseases of the blood vessels themselves or alterations in their physical capacity to dilate or constrict , as seen in pathologies such as hypertension , diabetes , hypercholesterolemia and , even alzheimer 's disease ( iadecola , 2004 ; zlokovic , 2008 ) would hinder the correct vascular response to totally normal neuronal activities . this further highlights that extreme caution should be applied to perfusion signals when making direct inference to altered neural activity ( schleim and roiser , 2009 ; ekstrom , 2010 ) . hence , a careful understanding of the neuronal circuitry at work will need to be interpreted in the context of a healthy or sick brain taking neuroinflammation and vascular diseases as possible confound factors . the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest .","<S> in this article , we will review molecular , anatomical , physiological and pharmacological data in an attempt to better understand how excitatory and inhibitory neurons recruited by distinct afferent inputs to the cerebral cortex contribute to the coupled hemodynamic response , and how astrocytes can act as intermediaries to these neuronal populations . we aim at providing the pros and cons to the following statements that , depending on the nature of the afferent input to the neocortex , ( i ) different neuronal or astroglial messengers , likely acting in sequence , mediate the hemodynamic changes , ( ii ) some recruited neurons release messengers that directly alter blood vessel tone , ( iii ) others act by modulating neuronal and astroglial activity , and ( iv ) astrocytes act as intermediaries for both excitatory and inhibitory neurotransmitters . 
</S> <S> we will stress that a given afferent signal activates a precise neuronal circuitry that determines the mediators of the hemodynamic response as well as the level of interaction with surrounding astrocytes . </S>"
1,"although most of these measurements were based on manual inspection and intervention , with the advent of fluorescence microscopy , many studies also involved quantitative imaging of living cells either using video or ccd cameras ( inoue , 1981 ; allen and allen , 1983 ) . in the early years of live - cell microscopy , methods for segmentation and tracking of cells ( berg and brown , 1972 ; berns and berns , 1982 ) were rapidly developed and adapted from other areas . nowadays , techniques for fully automated analysis and time - space visualization of time series from living cells involve either segmentation and tracking of individual structures , or continuous motion estimation ( for an overview , see fig . 1 ) . for tracking a large number of small particles that move individually and independently from each other , single particle tracking approaches are most appropriate ( qian et al . , 1991 ) . once the images have been acquired by microscopy and preprocessed to improve the signal - to - noise ratio , they can be directly visualized by methods like volume rendering . for multiple objects in motion , single particle tracking , in which a particle is tracked over different time - steps , is the most direct method used . surface rendering is obtained after segmentation of contours in each individual section and gives rise to volumetric measurements such as volume and surface area . measurements of concentration changes for segmented areas in frap or fluorescence loss in photobleaching experiments give rise to estimates of kinetic parameters such as diffusion and binding coefficients . the estimation of flow of gray values is an approach to quantify mobility in continuous space . all these processes lead to accurate estimates of quantitative parameters . for the determination of more complex movement , two independent approaches were initially developed , but recently have been merged . 
these are optical flow methods ( mitiche and bouthemy , 1996 ) and image registration . image registration ( terzopoulos et al . , 1991 ; lavallee and szeliski , 1995 ) aims at identifying and allocating certain objects in the real world as they appear in an internal computer model . the main application of image registration in cell biology is the automated correction of rotational and translational movements over time ( rigid transformation ) . this allows the identification of local dynamics , in particular when the movement is a result of the superposition of two or more independent dynamics . registration also helps to identify global movements when local changes are artifacts and should be neglected . the basic principle of single particle tracking is to find for each object in a given time frame its corresponding object in the next frame . the correspondence is based on object features , nearest neighbor information , or other inter - object relationships . object features can be dynamic criteria such as displacement and acceleration of an object as well as area / volume or mean gray value of the object . optical flow has been defined as the motion flow ( i.e. , the motion vector field ) that is derived from two consecutive images in a time series ( jähne , 2002 ) . however , due to high levels of noise , this assumption is usually distorted , and standard region - based matching techniques give unsatisfactory results ( anandan , 1989 ) . a more reliable tracking approach involves fuzzy logic - based analysis of the tracking parameters ( tvarusko et al . , 1999 ) . initially , only rigid transformations were used to superimpose the images , whereas nowadays , research is focused on the integration of local deformations . 
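the frame - to - frame correspondence step of single particle tracking described above can be sketched in a few lines of python . this is only a minimal illustration that assumes correspondence is decided by nearest neighbor distance alone ; the function name `link_nearest_neighbor` , the greedy matching strategy and the `max_distance` cutoff are assumptions made for the example and are not taken from the cited studies .

```python
import math


def link_nearest_neighbor(prev_points, next_points, max_distance):
    """Greedily link particles in one frame to particles in the next frame.

    prev_points, next_points: lists of (x, y) centroids.
    max_distance: hypothetical cutoff; pairs farther apart are never linked.
    Returns a sorted list of (index_in_prev, index_in_next) pairs.
    """
    # Collect every candidate pair that lies within the cutoff distance.
    candidates = []
    for i, p in enumerate(prev_points):
        for j, q in enumerate(next_points):
            d = math.dist(p, q)
            if d <= max_distance:
                candidates.append((d, i, j))

    # Accept the closest pairs first; each particle is used at most once.
    candidates.sort()
    used_prev, used_next, links = set(), set(), []
    for _, i, j in candidates:
        if i in used_prev or j in used_next:
            continue
        used_prev.add(i)
        used_next.add(j)
        links.append((i, j))
    return sorted(links)


frame_t = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
frame_t1 = [(10.5, 9.5), (0.4, 0.3), (40.0, 40.0)]
links = link_nearest_neighbor(frame_t, frame_t1, max_distance=2.0)
print(links)
```

particles without a partner inside the cutoff ( here the third particle of each frame ) simply stay unlinked . a more robust tracker would also weigh the other object features mentioned in the text , such as area or mean gray value .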
a parametric image registration algorithm specifies the parameters of a transformation in a way that physically corresponding points at two consecutive time steps are brought together as closely as possible . such algorithms have been broadly studied in medical imaging and cell biology ( maintz and viergever , 1998 ; bornfleth et al . , 1999 ) . although one class of algorithms operates on previously extracted surface points ( lavallee and szeliski , 1995 ) , other algorithms register the images directly based on the gray - value changes . nonrigid deformations , i.e. , transformations other than rotation and translation , present an active body of research in computer vision . nonrigid approaches differ with respect to the underlying motion model ( terzopoulos et al . , 1991 ; szeliski , 1996 ) . most commonly , a cost or error function is defined and an optimization method is chosen that iteratively adjusts the parameters until an optimum has been achieved . other approaches extract specific features ( e.g. , correspondence between points ) that serve as a basis for directly calculating the model parameters ( arun et al . , 1987 ; rohr , 1997 ) . computer vision is a discipline that focuses on information extraction from the output of optical sensors , and on the representation of this information in an internal computer model ( faugeras , 1993 ) . a computer vision framework for detecting and tracking diffraction images of linear structures in differential interference contrast microscopy was developed for measuring deflections of clamped microtubules with a freely moving second end ( danuser et al . , 2000 ) , based on measurements of thermal fluctuations . further , prior knowledge based on geometric and dynamic models of the scene can lead to restoration of information beyond the resolution limit of an imaging system ( danuser , 2001 ) . 
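to make the idea of calculating rigid transformation parameters directly from point correspondences concrete , here is a small python sketch . it uses the closed - form least - squares solution for the 2-d case ( one rotation angle plus a translation ) , which is a simplification chosen for illustration and not the method of the cited references ; the function name `fit_rigid_2d` and the test points are hypothetical .

```python
import math


def fit_rigid_2d(source, target):
    """Least-squares rotation angle and translation mapping source onto target.

    source, target: equally long lists of corresponding (x, y) points.
    Returns (theta, (shift_x, shift_y)) such that
    target ~= rotate(source, theta) + shift.
    """
    n = len(source)
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(q[0] for q in target) / n
    ty = sum(q[1] for q in target) / n

    # Accumulate dot and cross products of the centered point pairs.
    dot = 0.0
    cross = 0.0
    for (x, y), (u, v) in zip(source, target):
        xc, yc = x - sx, y - sy
        uc, vc = u - tx, v - ty
        dot += xc * uc + yc * vc
        cross += xc * vc - yc * uc

    # The optimal rotation maximizes cos(theta) * dot + sin(theta) * cross.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # The translation moves the rotated source centroid onto the target centroid.
    shift = (tx - (c * sx - s * sy), ty - (s * sx + c * sy))
    return theta, shift


# Synthetic check: rotate three points by 30 degrees and shift them by (2, -1).
angle = math.radians(30)
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
target = [
    (math.cos(angle) * x - math.sin(angle) * y + 2.0,
     math.sin(angle) * x + math.cos(angle) * y - 1.0)
    for x, y in source
]
theta, shift = fit_rigid_2d(source, target)
print(math.degrees(theta), shift)
```

with noise - free correspondences the fit recovers the applied rotation and shift up to floating - point error . iterative , cost - function - based registration becomes necessary when correspondences are unknown or the motion model is nonrigid .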
this super - resolution concept was illustrated by the stereo reconstruction of a micropipette moving in close proximity to a stationary target object . complex dynamic processes in cells should ideally be studied in three spatial dimensions over time . thereby , large and complex data sets typically consisting of 5,000 - 10,000 single images are generated . such data are virtually impossible to interpret without computational tools for visual inspection in space and time . typically , 3-d images have been represented as stereoscopic pairs or as anaglyphs by the pixel shift method ( white , 1995 ) . displaying time series as movies is still a widely used method for visual interpretation . for fast - moving objects such as trafficking vesicles imaged with high time resolution , time - lapse movies are very informative . however , for much slower nuclear processes or for processes with mixed kinetics that need to be observed over a longer period of time , the total number of time points for imaging is limited due to the photo toxicity of the light exposure during in vivo observation ( konig et al . , 1996 ) . therefore , an interpolation between consecutive time steps is required to reconstruct intermediate time steps . as a side effect , additional information about the continuous development of the observed processes between the imaged time steps ( subpixel resolution in time ) is achieved , and quantitative information can be derived ( see next section ) . although early studies explored 4-d data sets by simply browsing through an image gallery and highlighting interactively selected structures ( thomas et al . , 1996 ) , two commonly used rendering algorithms for displaying 3-d structures are volume rendering and surface rendering ( chen et al . , 1995 ; fig . 1 ) . volume rendering is a technique for visualizing complex 3-d data sets without explicit definition of surface geometry . 
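ray casting is one common way to implement volume rendering , and its core accumulation step is easy to sketch in python . the example below assumes that every sample along a ray has already been assigned a gray - level color and an opacity , and it uses standard front - to - back alpha compositing ; the function name `composite_ray` and the early - termination threshold `opacity_cutoff` are assumptions made for this illustration .

```python
def composite_ray(samples, opacity_cutoff=0.99):
    """Front-to-back compositing of (color, opacity) samples along one ray.

    samples: list of (color, opacity) pairs ordered from the viewer outward,
             with color and opacity both in the range 0..1.
    Returns the accumulated (color, opacity) for the pixel hit by the ray.
    """
    color_acc = 0.0
    alpha_acc = 0.0
    for color, alpha in samples:
        # Only the light not yet blocked by nearer samples can contribute.
        weight = (1.0 - alpha_acc) * alpha
        color_acc += weight * color
        alpha_acc += weight
        # Stop early once the ray is (almost) fully opaque.
        if alpha_acc >= opacity_cutoff:
            break
    return color_acc, alpha_acc


pixel = composite_ray([(1.0, 0.5), (0.0, 0.5), (1.0, 1.0)])
print(pixel)
```

repeating this for one ray per pixel of the projection plane yields the rendered image ; no surface geometry is ever extracted .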
the classification step assigns a level of opacity , contrast , and color to each voxel in the 3-d volume ( e.g. , wright et al . ) . then , shading techniques are used to simulate both the object surface characteristics and the position and orientation of surfaces with respect to light sources and the observer . for the colored , semitransparent volume , a ray is cast into the volume through each grid point on the projection plane . as the ray progresses through the volume , it computes the color and opacity at evenly spaced sample locations , and finally yields a single pixel color . although volume rendering techniques provide a satisfactory display of biological structures , this method is limited to pure visualization and does not deliver quantitative information . in addition , the high anisotropy typical for live - cell imaging with low z - resolution limits the quality of this visualization technique . these limitations are overcome by surface - rendering techniques , where the object surface is represented by polygons . the polygonal surface is displayed by projecting all the polygons onto a plane that is perpendicular to a selected viewing direction . the most commonly used method to triangulate the 3-d surface is the marching cubes algorithm ( cline et al . , 1988 ) . the 3-d structure is defined by a threshold value throughout the data set , constructing an isosurface . the drawback of this method is that the surface of many biological structures cannot be defined using a single intensity value , resulting in loss of relevant information . a great advantage of the combination of segmentation and surface reconstruction is the immediate access to quantitative information that corresponds to visual data ( eils et al . , 1996 ) . these approaches were designed to deal particularly with the high degree of anisotropy typical for 4-d live - cell recordings and to directly estimate quantitative parameters , e.g. 
, the gray values in the segmented area of corresponding images can be measured to determine the amount and concentration of fluorescently labeled proteins in the segmented cellular compartments . measuring concentration changes by frap and fluorescence loss in photobleaching have become standard methods to evaluate diffusion , binding , and trafficking in live cells ( for review see phair and misteli , 2001 ) . these methods give direct access to kinetic parameters such as the diffusion coefficient of molecules ( axelrod et al . 2000 ) or exchange rates of molecules between different compartments ( hirschberg et al . , 1998 ; phair and misteli , 2001 ) . in combination with motion estimation techniques , parameters such as the velocity of the mass center for individual objects or for each point on the object surface can be readily accessed . further , local parameters such as acceleration , tension , or bending ( bookstein , 1989 ) can be estimated . during motion estimation , global quantities are estimated such as the parameters of rotation and translation ( germain et al . , 1999 ) . the evolution of these eigenvalues can be used to characterize and analyze the observed motion . statistical analysis of velocity histograms can be applied to compute peak velocities corresponding to the most frequently occurring velocity values ( uttenweiler et al . , 2000 ) . an alternative technique for statistical analysis is the confinement tree analysis of the intensity image ( mattes et al . , 2001 ) . for different threshold levels , objects ( confiners ) are segmented in the image . calculated for different levels , besides the estimation of global quantitative values ( e.g. , the global homogeneity of the motion ) , this approach allows the analysis and comparison of movements . a challenge for future work is to better understand the biomechanical behavior of cellular structures , e.g. 
, cellular membranes , by fitting a biophysical model to the data an approach already successfully implemented in various fields of medical image analysis ( ferrant et al . , 2001 ) . in vivo images of gfp - tagged proteins combined with computational imaging has revealed the dynamic organization of various nuclear subcompartments in the interphase nucleus . live - cell microscopy images of labeled pre - mrna splicing factors were examined for evidence of regulated dynamics by computational segmentation and tracking ( eils et al . , 2000 ) . it was shown that the velocity and morphology of speckles , as well as budding events , were related to transcriptional activity . 1997 ) undergoes slow diffusional motion , and that this movement is confined to relatively small regions in the nucleus . importantly , the constraint on diffusional motion is regulated throughout the cell cycle ( heun et al . . a long - standing question has been whether nuclear compartments can also undergo directed , energy - dependent movements , thereby providing a potential mechanism of regulated gene expression . computational imaging revealed that several nuclear subcompartments do undergo directional transport dependent on metabolic energy ( calapez et al . , 2002 ; muratani et al . , 2002 ; platani et al . , 2002 ) . the role of dynamic tension in actin polymerization in motile cells was investigated by analyzing polarized light images of the flow of the actin network and the motion of actin bundles and filopodia in crawling neurons ( oldenbourg et al . , 2000 ) . in a study on nuclear envelope breakdown , quantification and visualization of four - channel images with labeled chromatin , lamin - b receptor , nucleoporin , and tubulin ( beaudouin et al . , 2002 ) revealed that piercing of the nuclear envelope by spindle microtubules was the mechanism responsible for forming the initial hole during nuclear envelope breakdown . 
to further investigate stresses on the nuclear envelope during breakdown stresses detected during hole formation were compared for the different grid vertices with respect to the positions of the hole , thus providing information about localized stresses during the tearing process ( mattes et al . , 2001 ) . conversely , the effect of stress on the morphology of cells has been measured using an experimental approach of imposing known stresses on cells in solid - state culture . changes in height , width , volume , and surface area of the cell are measured from 3-d confocal microscopy images , helping to understand the mechano - transduction response ( guilak , 1995 ) . the positioning of chromosomes during the cell cycle was investigated in live mammalian cells with a combined experimental and computational approach . in contrast to the random behavior predicted by a computer model of chromosome dynamics , a striking order of chromosomes was observed throughout mitosis ( gerlich et al . , 2003 ) . further , strong similarities between daughter and mother cells were found for mitotic single chromosome positioning . these results support the existence of an active mechanism that transmits chromosomal positions from one cell generation to the next . computational imaging has been proven to be a powerful and integral part of cell biology . computational imaging provides an important building block for the description of biological phenomena on a quantitative level , which is a prerequisite for mathematical models of dynamic structures and processes in the cell . 
in combination with models of biochemical processes and regulatory networks , computational imaging as part of the emerging field of systems biology ( kitano , 2002 ) will lead to the identification of novel principles of cellular regulation derived from the huge amount of experimental data that are currently generated .","<S> microscopy of cells has changed dramatically since its early days in the mid - seventeenth century . </S> <S> image analysis has concurrently evolved from measurements of hand drawings and still photographs to computational methods that ( semi- ) automatically quantify objects , distances , concentrations , and velocities of cells and subcellular structures . </S> <S> today 's imaging technologies generate a wealth of data that requires visualization and multi - dimensional and quantitative image analysis as prerequisites to turning qualitative data into quantitative values . </S> <S> such quantitative data provide the basis for mathematical modeling of protein kinetics and biochemical signaling networks that , in turn , open the way toward a quantitative view of cell biology . here , we will review technologies for analyzing and reconstructing dynamic structures and processes in the living cell . </S> <S> we will present live - cell studies that would have been impossible without computational imaging . </S> <S> these applications illustrate the potential of computational imaging to enhance our knowledge of the dynamics of cellular structures and processes . </S>"
2,"at least two overlapping signaling pathways are regulated by the family of secreted glycoproteins known as wnts . the highly conserved canonical wnt/β-catenin signaling pathway is activated by the binding of wnt ligand to the receptors frizzled ( fzd ) and low - density lipoprotein receptor related protein 5/6 ( lrp5/6 ) , triggering a series of downstream events that culminate in the cytosolic accumulation and nuclear translocation of the multifunctional protein β-catenin . interaction of β-catenin with transcription factors of the tcf and lef family results in the regulation of certain target genes that mediate the ultimate effects of this pathway on cellular processes including cell fate , proliferation , and migration . there are also one or more non - canonical or β-catenin independent wnt signaling pathways that are less well understood , and that lead to changes in cytoskeletal dynamics , adhesion , and motility . interestingly , β-catenin independent wnt signaling can antagonize wnt/β-catenin signaling in development , regeneration , and cancer [ 4 , 5 ] , highlighting the complex interplay between downstream effectors of wnt signaling . wnt pathways have been intimately linked to cancer ever since the original realization that the mouse mammary oncogene int-1 is a homologue of the drosophila wingless ( wg ) gene , resulting in the portmanteau family designation of wnt . subsequent studies have implicated wnt signaling in almost every major disease and cancer model , reflecting the importance of major developmental pathways in the pathogenesis of adult disease processes [ 7 , 8 ] . most dramatically , almost all colorectal carcinomas harbor inactivating mutations in the gene for adenomatous polyposis coli ( apc ) , which forms a complex with axin and glycogen synthase kinase 3-β ( gsk3b ) that normally phosphorylates β-catenin to target the protein for proteasomal degradation . 
mutations or loss of apc in colorectal carcinoma therefore prevent degradation of β-catenin and consequently lead to constitutive activation of the pathway . further studies using both cell - based models and transgenic animal models have validated the essential role of wnt dysregulation in the formation of colorectal cancer , establishing this disease paradigm as a primary model for studying the molecular mechanisms of wnt/β-catenin signaling in oncogenesis [ 9 , 10 ] . since the initial demonstration that wnt signaling regulates the stability and translocation of β-catenin , the immunohistochemical detection of nuclear β-catenin in both laboratory models and in patient tumors has been widely employed as a surrogate for demonstrating activation of the wnt/β-catenin pathway . in several cancer models including colorectal carcinoma , breast cancer , and esophageal carcinoma , the presence of nuclear β-catenin in cancer tissue compared to normal tissue has implicated this signaling pathway in cancer biology . further studies have observed that the presence of nuclear β-catenin can predict decreased survival in these cancers , solidifying the importance of this pathway in oncogenesis and in cancer progression . not surprisingly , wnt/β-catenin signaling has also been implicated in a broad variety of noncancerous medical conditions . genetic polymorphisms in lrp5/6 that decrease wnt/β-catenin signaling have been linked to altered bone density , metabolic syndrome , and to alzheimer 's disease . in normal tissues and organs , without genetic polymorphisms or mutations , wnt/β-catenin signaling is activated in every animal that displays regeneration , and β-catenin signaling is also activated in traumatic brain injury , which does not display extensive regeneration . 
moreover , it is clear that attenuating β-catenin signaling delays regeneration while augmenting β-catenin signaling often enhances the rate of regeneration , as determined by analysis of tail fin regeneration in zebrafish , and liver regeneration in both mouse and zebrafish . given that regeneration employs progenitor cells it should come as no surprise that wnts regulate embryonic stem cells , though there is not a consensus on the precise roles . the increasing body of literature on wnt/β-catenin signaling in disease has generated tremendous interest in the potential therapeutic targeting of this pathway . until recently , the only modulator of wnt/β-catenin signaling approved by the us food and drug administration was lithium chloride , which prevents the degradation of β-catenin by inhibiting its phosphorylation by gsk3b . more recent studies have identified small molecule activators as well as inhibitors of wnt/β-catenin signaling that may eventually have therapeutic utility in patients [ 17 , 18 , 19 , 20 , 21 , 22 ] . in parallel with studies on wnt/β-catenin signaling in cancer and other diseases , substantial progress has also been made in understanding how this pathway regulates developmental processes such as melanocyte differentiation . wnt/β-catenin signaling is a major regulator of the pigmented cell lineage , playing a major role in determining the fate of neural crest cells and its derivative pigment cell lineages . wnt/β-catenin signaling directly regulates the expression of microphthalmia transcription factor ( mitf ) , a major determinant of both melanocyte development and melanoma progression [ 23 - 25 ] . wnt3a ligand is one of only three factors required to differentiate a pluripotent human embryonic stem cell into a functional melanocyte , further highlighting the critical role of this pathway in pigment cell biology . 
not surprisingly , the wnt/β-catenin pathway has been implicated in the pathogenesis of both benign melanocytic nevi as well as in malignant melanoma . in the last two decades since the initial identification of activated wnt/β-catenin signaling in the murine breast cancer model , the role of this pathway in promoting proliferation has fostered the prevailing view that wnt/β-catenin signaling is uniformly oncogenic . the wnt/β-catenin pathway would not fit the original definition of an oncogene as a gene or pathway that causes cancer when aberrantly activated , since the forced expression of a melanocyte - specific , nondegradable , constitutively active β-catenin mutant in either transgenic or cre / lox systems is not enough to induce melanoma in mice . as our understanding of cancer has advanced , the term oncogenic has seemingly broadened to include any gene or pathway implicated in cancer progression , a benchmark that is more difficult to define . the finding that constitutive activation of wnt/β-catenin signaling increases the proliferation of murine melanoma cells in vitro , accompanied by mitf - dependent increases in clonogenic growth , implicates this pathway as a promoter of melanoma progression . likewise , the activation of this pathway acts in concert with activation of ras to promote increased tumor formation in transgenic mice , further suggesting a role in tumor promotion . in contrast to these studies in cell culture and mouse - based models , there have been several recent reports that activation of wnt/β-catenin signaling in patient tumors , as monitored by increased levels of nuclear β-catenin , correlates with an improved rather than poorer prognosis [ 5 , 30 - 32 ] . consistent with the data seen in human patients , forced expression of wnt3a in b16 melanoma cells leads to decreased proliferation in vitro and in vivo , along with the upregulation of genes associated with melanocyte differentiation that are frequently lost with melanoma progression . 
furthermore , almost all benign nevi are positive for nuclear β-catenin , and studies have observed a loss of nuclear β-catenin with melanoma progression to metastases . since the rate of transformation of nevi to melanomas is estimated to be very low , these collective observations from patients bring up the possibility that wnt/β-catenin signaling may not be oncogenic in any sense , but rather is required to maintain a homeostatic balance that , when disrupted or lost , can lead to early melanoma transformation . another interesting aspect of wnt/β-catenin signaling in melanoma involves β-catenin independent wnt signaling activated in many ( but not all ) contexts by wnt5a . wnt5a was first linked to disease as a gene that was more highly expressed in aggressive and late - stage melanomas , and the immunohistochemical detection of high wnt5a in tumors was subsequently correlated with decreased patient survival . studies on wnt5a in melanoma have focused largely on the effects of this ligand on cell motility , based on the precedent that wnt5a regulates the movement of cells during the convergent - extension phase of vertebrate gastrulation [ 2 , 36 ] . interestingly , at least two studies have shown that wnt5a inhibits the transcription of wnt/β-catenin target genes in melanoma , recapitulating the ability of β-catenin independent wnt signaling to inhibit wnt/β-catenin signaling that was first observed in developmental models [ 4 , 5 ] . this finding suggests that the acquisition of increased wnt5a with later - stage tumors may be involved with the inhibition and/or loss of wnt/β-catenin signaling seen during melanoma progression from benign nevi to metastases . if one looks at the available data from patient - based studies of tumor tissue , the notion that wnt/β-catenin signaling may not be universally oncogenic is supported by studies in a growing number of diseases . 
in the case of medulloblastoma , all patients with extensive nuclear β-catenin staining had a 5-year overall survival rate of 92.3% versus a rate of 65.3% in patients with nucleonegative tumors . furthermore , in a study of 72 children , all six children who had an activating mutation in the ctnnb1 gene were alive and free of disease 5 years following diagnosis , compared to a survival rate of 53.7% for the ctnnb1 wild - type group [ 38 ] . in addition , these favorable activating mutations of β-catenin correlate with a gene expression profile that distinguishes it from other medulloblastoma subtypes . more limited studies in prostate cancer , ovarian cancer , and even in late - stage colorectal carcinoma also show that the presence of active wnt/β-catenin signaling is correlated with improved patient outcomes . ultimately , debates on whether wnt/β-catenin signaling is oncogenic are less relevant than understanding whether the therapeutic activation of this pathway could be beneficial for various diseases , including melanoma and other cancers . interestingly , forced activation of wnt/β-catenin signaling in the setting of kras - driven murine pancreatic cancer models can antagonize the development of pancreatic intraepithelial neoplasia lesions , and drive the development of tumors that resemble more benign human solid pseudopapillary neoplasms rather than more malignant pancreatic ductal adenocarcinoma [ 43 , 44 ] . this type of intriguing observation invites the hypothesis that wnt/β-catenin signaling can be leveraged clinically to prevent aggressive tumors at the expense of potentially acquiring tumors that are more benign and treatable . 
similarly , the improved survival seen with activating β-catenin mutations in medulloblastoma , which is largely a hedgehog - driven tumor , may also reflect the ability of wnt/β-catenin signaling to promote cell fates that are less aggressive , a hypothesis supported by observations with genome - wide transcriptional profiling of patient tumors . although in theory the idea of activating wnt/β-catenin signaling in cancer patients could be achieved , further studies will be needed to address the timing , strength , and duration of wnt/β-catenin signaling required for the desired outcome . in melanoma , activation of wnt/β-catenin signaling through the forced expression of wnt3a or through treatment of cells with soluble wnt3a ligand results in decreased tumor cell proliferation in vitro and in vivo , as well as the transcriptional upregulation of genes associated with melanocyte differentiation . the ability of wnt/β-catenin signaling to drive the expression of certain differentiation - related genes likely reflects its role as a major regulator of melanocyte development . further studies will ultimately address whether the therapeutic activation of wnt/β-catenin signaling in human melanoma patients is either practicable or beneficial . perhaps the time has come when we should not categorize signaling pathways as either oncogenic or tumor - suppressive , but instead view them in the context of cellular homeostasis . most definitions of cellular homeostasis include the property of cells to maintain a viable and healthy state through the constant adjustment of various inputs such as biochemical pathways . in this context , it is not hard to envision a role for wnt/β-catenin signaling in maintaining cell fate and regulating proliferation for both cancerous and noncancerous cells . 
the transcriptional response of wnt/β-catenin signaling itself illustrates the importance of ongoing homeostasis , since many of the classical β-catenin dependent target genes , including dkk1 , axin2 , tcf7 , and lef1 , can function as feedback inhibitors to modulate the pathway . there are several well - studied examples where wnt/β-catenin signaling maintains normal adult tissue , including in intestinal crypts and in hair follicle units . benign melanocytic nevi are similar , since they are almost all positive by immunohistochemistry for nuclear β-catenin . taking into consideration that nevi are thought ( based on histology of patient tumors ) to be precursors for a significant portion of melanomas , it is not so surprising that in some systems , the presence of active wnt/β-catenin signaling may be needed for melanoma development . the dysregulation of wnt/β-catenin homeostasis is reflected in the observation that nuclear β-catenin decreases with melanoma progression . further studies will be needed to determine whether this loss of signaling represents a cause or effect of melanoma - genesis . unlike colon cancer , where mutation or loss of apc plays a primary role in constitutive activation of the pathway , studies have found that mutations leading to constitutive wnt/β-catenin signaling in melanoma are rare . as a result , it is thought that the activation of wnt/β-catenin signaling in benign nevi and melanomas is likely driven by wnt ligand , either secreted by tumor cells themselves or from cells in the surrounding environment . based on studies in cell - based systems , the signaling induced by secreted wnt ligand is more susceptible to regulation than constitutive activation of the pathway by downstream mutations , suggesting that from a therapeutic point of view , the cause of activated wnt/β-catenin signaling in different cancer contexts is an essential consideration . 
additionally , in cases like melanoma where wnt/β-catenin homeostasis is largely mediated by wnt ligand , the role of this pathway in cancer progression may be more difficult to dissect since it likely involves complex interactions between wnt/β-catenin signaling and other pathways activated by genetic amplification , deletion , or mutation . accumulating evidence suggests that wnt/β-catenin signaling is subject to regulation by cellular , temporal , and spatial contexts that make it difficult to generalize the end results of pathway activation or inhibition . to this end , it is overly simplistic to think of the pathway as oncogenic or tumor - suppressive . while it is easy to draw conclusions from experiments in cell culture dishes or in mouse models , these findings should be reconciled particularly with data on wnt/β-catenin activation in human tumors . in studying the role of wnt/β-catenin signaling in melanoma , two questions arise . first , how well do the mouse models of melanoma recapitulate what is seen in patient tumors with regard to the loss of wnt/β-catenin signaling during tumor progression ? second , how reliable is our evaluation of wnt/β-catenin activation in patient tumors using nuclear β-catenin as our sole measurement ? the context - specific nature of wnt/β-catenin signaling may preclude the development of universally applicable assays based on gene targets , but further studies could identify context - specific readouts that could be used in melanoma or other diseases to validate the presence or absence of nuclear β-catenin and provide a robust indicator of pathway activation in tumors . 
by developing reliable models and a faithful set of cell and tissue assays for human diseases , we can begin to fully grasp the cellular , spatial , and temporal contexts that determine the consequences of wnt/-catenin homeostasis and its dysregulation during disease .","<S> in cancer , wnt/-catenin signaling is ubiquitously referred to as an oncogenic pathway that promotes tumor progression . </S> <S> this review examines how the regulation and downstream effects of wnt/-catenin signaling in cancer varies depending on cellular context , with a focus on malignant melanoma . </S> <S> we emphasize that the cellular homeostasis of wnt/-catenin signaling may represent a more appropriate concept than the simplified view of the wnt/-catenin pathway as either oncogenic or tumor - suppressing </S> <S> . ultimately , a more refined understanding of the contextual regulation of wnt/-catenin signaling will be essential for addressing if and how therapeutic targeting of this pathway could be leveraged for patient benefit . </S>"
3,"thrombolysis with recombinant tissue plasminogen activator ( rtpa , alteplase ) is the only effective specific treatment for acute ischemic stroke patients coming in window ( 4.5 h from onset of symptoms ) . a milestone study of national institute of neurological disorders and stroke in 1995 , demonstrated the benefits of rtpa in to patients of acute ischemic stroke ( ais ) who came within window period . these patients of ais who were thrombolysed were 30% more likely to survive with minimal disability resulting in a 12% absolute increase in the proportion having excellent functional outcomes at 3 months . stroke study iii trail ( 2008 ) window period was extended to 4.5 h. for every 15 min reduction in door to needle time ( dtnt ) there is 5% reduction in odds of in hospital mortality ( odd ratio , 0.95 ; 95% confidence interval , 0.92 - 0.98 : p = 0.0007 ) . all patients who presented with stroke to emergency department ( er ) from january 2011 to december 2013 were included in the study . after initial assessment by causality personnel a medical / neuro - resident evaluated the patient . radiological diagnosis was obtained with noncontrasted brain computed tomography and/or diffusion weighted magnetic resonance imaging ( dwi ) . after neuro physician opinion or after telephonic discussion by neuro - resident with neuro physician ( telestroke ) the treatment plan was decided . for the patients who presented within window period thrombolysis was planned . severity of stroke was documented by the national institutes of health stroke scale ( nihss ) contra - indications for thrombolysis were checked and consent of relatives / patient was taken . those patients who presented out of window period ( > 4.5 h after onset of symptom / symptom to door time [ std ] > 4.5 h ) , or patients who had hemorrhagic stroke and those who were not willing for giving consent were excluded [ figure 1 ] . 
algorithm for patient presenting with stroke in emergency room ( protocol used in the study ) patients record files and charts were used to extract retrospective data . the collected data use to evaluate er to needle [ door to needle time-(dtnt ) ] time and reasons for delay in thrombolysis therapy in acute stroke patients . the following parameters were studied \n onset of symptoms to er time , assessment by physician / medical chief resident time ( door to physician time [ dtpt])er to imaging time ( door to imaging time [ dtit]),er to needle time ( dtnt)contraindications for thrombolysis . \n onset of symptoms to er time , assessment by physician / medical chief resident time ( door to physician time [ dtpt ] ) er to imaging time ( door to imaging time [ dtit ] ) , er to needle time ( dtnt ) contraindications for thrombolysis . the onset of symptom time for patients with wake up was accepted as last time the patient was seen as healthy . the baseline characteristics of patient with acute ischemic stroke brought / admitted to er , clinical features , arrival time ( door time ) to er , severity of stroke , imaging time , radiological findings , contraindication for thrombolytic treatment , time of starting recombinant tissue plasminogen activator ( rt - pa ) and thereafter complications were recorded [ table 2 ] . the data abstracted were transferred to the spss 17.0 program ( spss statistics is a software package used for statistical analysis . it is statistical package for social science and is produced by spss inc . ) of the computer for statistical analysis . six hundred and ninety - five patients with symptoms of stroke were presented to our emergency department in the study period . out of these five hundred and forty seven ( 78.7% ) were excluded as they had come out of window period that is , they had arrived 4.5 h after the onset of stroke symptoms . algorithm for patient presenting with stroke in er ( protocol used in the study ) . 
further after imaging of these one hundred and forty eight patients , one hundred four ( 70.27% ) were excluded . sixty - two ( 59.6% ) had intra cerebral bleed , 1 ( 0.9% ) had hemoglobin - 3.1 g , 13 ( 13% ) of them had transient ischemic attack ( neurological symptoms improved ) and dwi images of these patient were normal . 6 ( 5.7 ) patients were diagnosed to have metabolic de - rrangement ( hypoglycemia , hyperglycemia , hyponatremia ) . other reasons for exclusion in our study were post - ictal status , financial problem , recent thrombolysis , recent surgery , and delay in contacting senior radiologist [ table 1 ] . distribution of contraindications for thrombolytic therapy of patients baseline clinical characteristics of thrombolysed patients ( n=44 ) total 44 ( 29.7% ) patients with ais were thrombolysed . thirty - four ( 79.5% ) were male and nine ( 20.45% ) were female . co - morbid illness in the form of hypertension 6 ( 13.6% ) , diabetes 2 ( 4.5% ) , ischemic heart diseases ( ihd ) 2 ( 4.5% ) , previous stroke cva 3 ( 6.5% ) , > 1 co - morbidity ( ht / cva / dm / ihd / hypothyriod ) 19 ( 43% ) , seizure 1 ( 2% ) and alcoholic liver diseases 1 ( 2% ) patients respectively . the mean time for arrival of patients from onset of symptoms to hospital ( std ) 1.23 h ( 15 min-3 h ) . the mean door to neuro - physician time dtpt was 32 min ( 5 min-2.23 h ) . the mean dtnt 1.44 h ( 40 min-3.3 h ) [ tables 3 and 4 ] , [ figures 24 ] . our study dtpt , dtit and dtnt compared with aha guideline interval number of patients thrombolysed per hour door to physician time door to imaging time ( recommended standard time : 45 min ) analysis of our study clearly states that stdt , dtpt , dtit , and dtnt time are significantly more . thus , we had many hurdles in delivering thrombolysis therapy to these 44 patients . only 7 ( 15% ) patients had dtnt 60 min . 
the problems / barriers in our study were categorized into three factors : mean symptom to door time was 83 min ( median : 69 ) . poor recognition of stroke signs , especially in older patients caused delay in arrival time to hospital . public and emergency medical services staff education play important role in shortening the pre hospital period . in our study - door to physician , door to imaging and door to needle time were significantly ore compared to standard recommendations ( aha ) [ table 3 ] . there was lack of handling stroke patients with high priority at each level er , imaging unit , stroke unit . thus lack of triaging stroke patient at all level of intervention was our weakest point . in one of the patients the on call doctor was very busy attending emergency calls so causing increase in dtpt time . the concept of having second on call doctor who takes care only of patients with acute stroke has being recommended by kobayashi et al . lack of triaging at radiology unit and performing entire sequences of magnetic resonance imaging ( mri ) scan lead to increase in dtit . thus again dwi has high sensitivity ( 88 - 100% ) and specificity ( 95 - 100% ) for detecting infarcted regions , within minutes of onset of symptoms . study suggested brain attack team mri sequence of < 10 min to confirm acute ischemia stroke and assess candidacy for iv - rtpa . lack of triaging of bed for stroke patients resulted in increase in dtnt time . to prevent these delay we started thrombolysing ais patient in er . to prevent delay due to inavailability of drug , we have started keeping rtpa in our drug stock our dtnt was 104 min ( door to needle time median - 100 ) . relatives with geriatric patient ( 79 years ) took longer time to give consent due to age of the patient and secondly due to financial burden . one of our patients had liver diseases so we had to wait for international normal ratio report for prothrombin time ( international normalized ratio ) . 
transient ischemic attack patients were not thrombolysed , but in latter half of study dwi images helped us to prevent delay . poor recognition of stroke signs , especially in older patients caused delay in arrival time to hospital . public and emergency medical services staff education play important role in shortening the pre hospital period . in our study - door to physician , door to imaging and door to needle time were significantly ore compared to standard recommendations ( aha ) [ table 3 ] . there was lack of handling stroke patients with high priority at each level er , imaging unit , stroke unit . thus lack of triaging stroke patient at all level of intervention was our weakest point . education of emergency medical services of stroke symptoms will help to triage stroke patient . in one of the patients the on call doctor was very busy attending emergency calls so causing increase in dtpt time . the concept of having second on call doctor who takes care only of patients with acute stroke has being recommended by kobayashi et al . lack of triaging at radiology unit and performing entire sequences of magnetic resonance imaging ( mri ) scan lead to increase in dtit . thus again dwi has high sensitivity ( 88 - 100% ) and specificity ( 95 - 100% ) for detecting infarcted regions , within minutes of onset of symptoms . study suggested brain attack team mri sequence of < 10 min to confirm acute ischemia stroke and assess candidacy for iv - rtpa . lack of triaging of bed for stroke patients resulted in increase in dtnt time . to prevent these delay we started thrombolysing ais patient in er . to prevent delay due to inavailability of drug , we have started keeping rtpa in our drug stock our dtnt was 104 min ( door to needle time median - 100 ) . relatives with geriatric patient ( 79 years ) took longer time to give consent due to age of the patient and secondly due to financial burden . 
one of our patients had liver diseases so we had to wait for international normal ratio report for prothrombin time ( international normalized ratio ) . transient ischemic attack patients were not thrombolysed , but in latter half of study dwi images helped us to prevent delay . poor recognition of stroke signs , especially in older patients caused delay in arrival time to hospital . public and emergency medical services staff education play important role in shortening the pre hospital period . in our study - door to physician , door to imaging and door to needle time were significantly ore compared to standard recommendations ( aha ) [ table 3 ] . there was lack of handling stroke patients with high priority at each level er , imaging unit , stroke unit . thus lack of triaging stroke patient at all level of intervention was our weakest point . education of emergency medical services of stroke symptoms will help to triage stroke patient . in one of the patients the on call doctor was very busy attending emergency calls so causing increase in dtpt time . the concept of having second on call doctor who takes care only of patients with acute stroke has being recommended by kobayashi et al . lack of triaging at radiology unit and performing entire sequences of magnetic resonance imaging ( mri ) scan lead to increase in dtit . thus again dwi has high sensitivity ( 88 - 100% ) and specificity ( 95 - 100% ) for detecting infarcted regions , within minutes of onset of symptoms . study suggested brain attack team mri sequence of < 10 min to confirm acute ischemia stroke and assess candidacy for iv - rtpa . lack of triaging of bed for stroke patients resulted in increase in dtnt time . to prevent these delay we started thrombolysing ais patient in er . to prevent delay due to inavailability of drug our dtnt was 104 min ( door to needle time median - 100 ) . patient with raised blood pressure requiring labetalol infusion caused delay . 
relatives with geriatric patient ( 79 years ) took longer time to give consent due to age of the patient and secondly due to financial burden . one of our patients had liver diseases so we had to wait for international normal ratio report for prothrombin time ( international normalized ratio ) . transient ischemic attack patients were not thrombolysed , but in latter half of study dwi images helped us to prevent delay . the barriers of thrombolysis in our study included : \n lack of public awareness and inaccessiblity to emergency medical serviceslack of prioritizing triage system at er , radiology unit and stroke unitlack of a multi - disciplinary stroke care team . \n lack of public awareness and inaccessiblity to emergency medical services lack of prioritizing triage system at er , radiology unit and stroke unit lack of a multi - disciplinary stroke care team . a multi - disciplinary stroke care team consists of well - established emergency medical services , physicians , neurologist , nurses , radiology staff , neuro - radiologist , and pharmacist . forming a one - call comprehensive stroke code will help in co - ordination at all level . time to time audit of quality indicator of stroke code team may help to overcome the factors for delay in dtnt .","<S> aim:(1 ) to evaluate the number of patients thrombolysed within 1 h of arrival to emergency room ( er ) ( 2 ) to identify reasons for delay in thrombolysis of acute stroke patients.materials and methods : all patients admitted to er with symptoms suggestive of stroke from january 2011 to november 2013 were studied . </S> <S> retrospective data were collected to evaluate er to needle ( door to needle time [ dtnt ] ) time and reasons for delay in thrombolysis . 
</S> <S> the parameters studied ( 1 ) onset of symptoms to er time , ( 2 ) er to imaging time ( door to imaging time [ dtit ] ) , ( 4 ) er to needle time ( door to needle ) and ( 5 ) contraindications for thrombolysis.results:a total of 695 patients with suspected stroke were admitted during study period . </S> <S> 547 ( 78% ) patients were out of window period . </S> <S> 148 patients ( 21% , m = 104 , f = 44 ) arrived within window period ( < 4.5 h. ) . </S> <S> 104 ( 70.27% ) were contraindicated for thrombolysis . </S> <S> majority were intracerebral bleeds . </S> <S> 44 ( 29.7% ) were eligible for thrombolysis . </S> <S> 7 ( 15.9% ) were thrombolysed within 1 h. the mean time for arrival of patients from onset of symptoms to hospital ( symptom to door ) 83 min ( median - 47 ) . </S> <S> the mean door to neuro - physician time ( dtpt ) was 32 min ( median - 15 min ) . </S> <S> the mean dtit was 58 min ( median - 50 min ) . </S> <S> the mean dtnt 104 ( median - 100 min).conclusion : reasons for delay in thrombolysis are : absence of stroke education program for common people . </S> <S> lack of priority for triage and imaging for stroke patients . </S>"
4,"we use density functional theory ( dft ) to compute the effects of substitutional al , b , cu , mn , and si solutes , and octahedral interstitial c and n solutes on the lattice parameters and elastic stiffness coefficients cij of bcc fe . the purefe.csv file contains the computed lattice parameter , magnetic moment , cij , and the derivatives of the cij with respect to lattice parameter for pure fe . the computational methodology we developed in ref . calculates a strain - misfit tensor for each solute which determines changes in the lattice parameter and volumetric contributions to the derivatives of the cij with respect to solute concentration . we also compute chemical contributions from each solute to the derivatives of the cij with respect to solute concentration . the sum of the volumetric and the chemical contributions gives the total derivatives of the cij with respect to solute concentration . the soluteeffects.csv file contains the diagonal components of the solute strain - misfit tensors and their average values , the volumetric and chemical contributions to the cij derivatives , the sum of the two contributions , and direct calculations of the total derivatives that encompass both contributions . we compute the solute data using 222 ( 16 atoms ) , 333 ( 54 atoms ) , and 444 ( 128-atom ) supercells . the calculation details , including the exchange - correlation functional , pseudopotentials , and all numerical convergence parameters used in generating the data , are given in ref . . the vasp input files incar and kpoints , and output files contcar , outcar , and oszicar for all the calculations are stored in the nist dspace repository ( http://hdl.handle.net/11256/67 ) , along with the analyzed data stored in the purefe.csv and soluteeffects.csv files . the repository also stores unix shell scripts we developed for calculating the data in the csv files from the raw vasp output files . 
the fundamental quantities necessary for computing strain misfit tensors and elastic stiffness coefficients are the numbers of atoms in the computational supercells , lattice parameters , applied strain magnitudes , and stresses . the scripts compute the elastic stiffness coefficients from derivatives of stress with respect to strain , approximated using a standard four - point central finite - difference formula . table 1 , table 2 list the properties contained in the purefe.csv and soluteeffects.csv files , respectively , along with identifying tags that label the properties in the files and their units .","<S> we present computed datasets on changes in the lattice parameter and elastic stiffness coefficients of bcc fe due to substitutional al , b , cu , mn , and si solutes , and octahedral interstitial c and n solutes . </S> <S> the data is calculated using the methodology based on density functional theory ( dft ) presented in ref . </S> <S> ( m.r . </S> <S> fellinger , l.g . </S> <S> hector jr . , d.r . </S> <S> trinkle , 2017 ) [ 1 ] . </S> <S> all the dft calculations were performed using the vienna ab initio simulations package ( vasp ) ( g. kresse , j. furthmller , 1996 ) [ 2 ] . </S> <S> the data is stored in the nist dspace repository ( http://hdl.handle.net/11256/671 ) . </S>"


The metric is an instance of [`datasets.Metric`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Metric):

In [13]:
metric

Metric(name: "rouge", features: {'predictions': Value(dtype='string', id='sequence'), 'references': Value(dtype='string', id='sequence')}, usage: """
Calculates average rouge scores for a list of hypotheses and references
Args:
    predictions: list of predictions to score. Each predictions
        should be a string with tokens separated by spaces.
    references: list of reference for each prediction. Each
        reference should be a string with tokens separated by spaces.
    rouge_types: A list of rouge types to calculate.
        Valid names:
        `"rouge{n}"` (e.g. `"rouge1"`, `"rouge2"`) where: {n} is the n-gram based scoring,
        `"rougeL"`: Longest common subsequence based scoring.
        `"rougeLSum"`: rougeLsum splits text using `"
"`.
        See details in https://github.com/huggingface/datasets/issues/617
    use_stemmer: Bool indicating whether Porter stemmer should be used to strip word suffixes.
    use_agregator: Return aggregates if this is set to True
Retu

You can call its `compute` method with your predictions and labels, which need to be lists of decoded strings:

In [14]:
fake_preds = ["hello there", "general kenobi"]
fake_labels = ["hello there", "general kenobi"]
metric.compute(predictions=fake_preds, references=fake_labels)

{'rouge1': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rouge2': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rougeL': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rougeLsum': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0))}
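
To make the numbers above less opaque, here is a minimal sketch of how a ROUGE-1 style score can be computed from unigram overlap. It is a simplification and not the `rouge_score` implementation: it assumes plain whitespace tokenization, applies no stemming, and skips the low/mid/high aggregation. The function name `rouge1_sketch` is hypothetical.

```python
from collections import Counter


def rouge1_sketch(prediction: str, reference: str) -> dict:
    """Simplified ROUGE-1: unigram overlap on whitespace-separated tokens."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((pred_counts & ref_counts).values())
    pred_total = sum(pred_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / pred_total if pred_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    fmeasure = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


# Identical strings give a perfect score, as in the cell above.
perfect = rouge1_sketch("hello there", "hello there")
# One extra predicted word lowers precision but leaves recall untouched.
partial = rouge1_sketch("hello there general", "hello there")
print(perfect)
print(partial)
```

The second call shows why precision and recall are reported separately: a prediction that is longer than the reference can still recover every reference word.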

## Preprocessing the data

Before we can feed those texts to our model, we need to preprocess them. This is done by a 🤗 `Transformers` `Tokenizer`, which will (as the name indicates) tokenize the inputs (including converting the tokens to their corresponding IDs in the pretrained vocabulary) and put them in a format the model expects, as well as generate the other inputs that the model requires.

To do all of this, we instantiate our tokenizer with the `AutoTokenizer.from_pretrained` method, which will ensure:

- we get a tokenizer that corresponds to the model architecture we want to use,
- we download the vocabulary used when pretraining this specific checkpoint.

That vocabulary will be cached, so it's not downloaded again the next time we run the cell.

In [15]:
from transformers import AutoTokenizer
    
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

Downloading:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.69k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/878k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/446k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/238 [00:00<?, ?B/s]

By default, the call above will use one of the fast tokenizers (backed by Rust) from the 🤗 `Tokenizers` library.

You can directly call this tokenizer on one sentence or a pair of sentences:

In [16]:
tokenizer("Hello, this one sentence!")

{'input_ids': [0, 31414, 6, 42, 65, 3645, 328, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1]}

Depending on the model you selected, you will see different keys in the dictionary returned by the cell above. They don't matter much for what we're doing here (just know they are required by the model we will instantiate later). You can learn more about them in [this tutorial](https://huggingface.co/transformers/preprocessing.html) if you're interested.

Instead of one sentence, we can pass along a list of sentences:

In [17]:
tokenizer(["Hello, this one sentence!", "This is another sentence."])

{'input_ids': [[0, 31414, 6, 42, 65, 3645, 328, 2], [0, 713, 16, 277, 3645, 4, 2]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1]]}

To prepare the targets for our model, we need to tokenize them inside the `as_target_tokenizer` context manager. This will make sure the tokenizer uses the special tokens corresponding to the targets:

In [18]:
with tokenizer.as_target_tokenizer():
    print(tokenizer(["Hello, this one sentence!", "This is another sentence."]))

{'input_ids': [[0, 31414, 6, 42, 65, 3645, 328, 2], [0, 713, 16, 277, 3645, 4, 2]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1]]}


If you are using one of the five T5 checkpoints, we have to prefix the inputs with "summarize:" (the model can also translate, and it needs the prefix to know which task it has to perform).

In [19]:
if model_checkpoint in ["t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"]:
    prefix = "summarize: "
else:
    prefix = ""

We can then write the function that will preprocess our samples. We just feed them to the `tokenizer` with the argument `truncation=True`. This ensures that an input longer than what the selected model can handle is truncated to the maximum length accepted by the model. Padding is dealt with later on (in a data collator), so that we pad examples to the longest length in the batch and not the whole dataset.

The max input length of `sshleifer/distilbart-cnn-12-3` is 1024, so `max_input_length = 1024`.
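
As a rough illustration of what truncating to `max_input_length` means, the sketch below cuts a list of token IDs so that the total, including the two special tokens, never exceeds the limit. It assumes the sequence is wrapped in a start token with ID 0 and an end token with ID 2, as the tokenizer outputs above suggest. The helper `truncate_with_specials` is hypothetical, and the real tokenizer supports more truncation strategies than this.

```python
def truncate_with_specials(token_ids, max_length, bos_id=0, eos_id=2):
    """Keep at most `max_length` IDs in total, including the two special tokens."""
    budget = max_length - 2  # room left once the start and end tokens are counted
    if budget < 0:
        raise ValueError("max_length must be at least 2")
    return [bos_id] + list(token_ids[:budget]) + [eos_id]


# Hypothetical content-token IDs for an over-long article (the values are arbitrary).
long_doc = list(range(100, 1500))  # 1400 content tokens
encoded = truncate_with_specials(long_doc, max_length=1024)
print(len(encoded), encoded[:3], encoded[-1])

# A short input is left untouched apart from the special tokens.
short = truncate_with_specials([5, 6, 7], max_length=1024)
print(short)
```

Everything past the limit is simply dropped, so for long articles the model only ever sees the beginning of the text.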

In [20]:
max_input_length = 1024
max_target_length = 256

def preprocess_function(examples):
    inputs = [prefix + doc for doc in examples["article"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)

    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(examples["abstract"], max_length=max_target_length, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

This function works with one or several examples. In the case of several examples, the tokenizer will return a list of lists for each key:

In [21]:
preprocess_function(raw_datasets['train'][:2])

{'input_ids': [[0, 405, 11493, 11, 55, 87, 654, 207, 9, 1484, 8, 189, 1338, 1814, 207, 11, 1402, 3505, 9, 16640, 2156, 941, 11, 1484, 11793, 17930, 8, 73, 368, 13785, 5804, 4, 134, 41, 23249, 16, 6533, 25, 41, 15650, 17215, 672, 9, 23385, 43202, 36, 1368, 428, 4839, 36, 1368, 428, 28696, 316, 821, 1589, 385, 462, 4839, 8, 189, 16072, 25, 10, 898, 9, 5, 7482, 2199, 2156, 13162, 2156, 2129, 10894, 2156, 17930, 2156, 50, 13785, 5804, 479, 6104, 3218, 3608, 14, 7967, 8, 18327, 139, 111, 2174, 797, 71, 13785, 5804, 2156, 941, 11, 471, 8, 5397, 16640, 2156, 189, 28, 13969, 30, 41, 23249, 4, 1978, 41, 23249, 747, 41089, 1290, 5298, 215, 25, 16069, 2156, 8269, 2156, 8, 25599, 642, 22423, 2156, 8, 4634, 189, 33, 10, 2430, 1683, 15, 1318, 9, 301, 36, 2231, 1168, 4839, 8, 819, 2194, 11, 1484, 19, 1668, 479, 4634, 2156, 7, 1477, 2166, 13838, 2156, 2231, 1168, 2156, 8, 17618, 32444, 11, 1484, 19, 1668, 2156, 24, 74, 28, 5701, 7, 185, 10, 16300, 1548, 11, 9397, 9883, 54, 240, 1416, 13, 1668, 111, 30

To apply this function to all the examples in our dataset, we just use the `map` method of the `raw_datasets` object we created earlier. This will apply the function to all the elements of all the splits in `raw_datasets`, so our training, validation and testing data will be preprocessed in one single command.

In [22]:
tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)

  0%|          | 0/8 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Even better, the results are automatically cached by the 🤗 `Datasets` library to avoid spending time on this step the next time you run your notebook. The 🤗 `Datasets` library is normally smart enough to detect when the function you pass to `map` has changed (and thus that the cached data should not be used). For instance, it will properly detect if you change the task in the first cell and rerun the notebook. 🤗 `Datasets` warns you when it uses cached files; you can pass `load_from_cache_file=False` in the call to `map` to ignore the cached files and force the preprocessing to be applied again.

Note that we passed `batched=True` to encode the texts by batches together. This is to leverage the full benefit of the fast tokenizer we loaded earlier, which will use multi-threading to treat the texts in a batch concurrently.
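
To picture what `batched=True` does, here is a small sketch of slicing a split into batches and handing each slice to the function as a dictionary of lists. It assumes the default batch size of 1000 used by `map` and a training split of 8000 rows (the size reported in the training log further down), which matches the `0/8` progress bar above. The helper `iter_batches` and the toy data are hypothetical.

```python
import math


def iter_batches(columns, batch_size=1000):
    """Yield dict-of-lists slices, the shape a batched function receives."""
    n_rows = len(next(iter(columns.values())))
    for start in range(0, n_rows, batch_size):
        yield {name: values[start:start + batch_size] for name, values in columns.items()}


# Toy stand-in for the training split: 8000 rows with the two text columns.
toy_split = {
    "article": [f"article {i}" for i in range(8000)],
    "abstract": [f"abstract {i}" for i in range(8000)],
}
batches = list(iter_batches(toy_split))
print(len(batches), math.ceil(8000 / 1000))
```

Each batch holds up to 1000 articles, which is what lets the fast tokenizer process many texts in a single call.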

## Fine-tuning the model

Now that our data is ready, we can download the pretrained model and fine-tune it. Since our task is of the sequence-to-sequence kind, we use the `AutoModelForSeq2SeqLM` class. Like with the tokenizer, the `from_pretrained` method will download and cache the model for us.

In [23]:
from transformers import AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer

model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

Downloading:   0%|          | 0.00/973M [00:00<?, ?B/s]

Note that we don't get a warning like in our classification example. This means we used all the weights of the pretrained model and there is no randomly initialized head in this case.

To instantiate a `Seq2SeqTrainer`, we will need to define three more things. The most important is the [`Seq2SeqTrainingArguments`](https://huggingface.co/transformers/main_classes/trainer.html#transformers.Seq2SeqTrainingArguments), which is a class that contains all the attributes to customize the training. It requires one folder name, which will be used to save the checkpoints of the model, and all other arguments are optional:

In [24]:
batch_size = 2
model_name = model_checkpoint.split("/")[-1]
args = Seq2SeqTrainingArguments(
    f"{model_name}-finetuned-pubmed",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=5,
    predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
    seed=42,
)

Here we set the evaluation to be done at the end of each epoch, tweak the learning rate, use the `batch_size` defined at the top of the cell and customize the weight decay. Since the `Seq2SeqTrainer` will save the model regularly and our dataset is quite large, we tell it to make three saves maximum. Lastly, we use the `predict_with_generate` option (to properly generate summaries) and activate mixed precision training (to go a bit faster).

The `push_to_hub=True` argument sets everything up so we can push the model to the [Hub](https://huggingface.co/models) regularly during training. Remove it if you didn't follow the installation steps at the top of the notebook. If you want to save your model locally under a name that is different from the name of the repository it will be pushed to, or if you want to push your model under an organization and not your own namespace, use the `hub_model_id` argument to set the repo name (it needs to be the full name, including your namespace: for instance `"sgugger/t5-finetuned-xsum"` or `"huggingface/t5-finetuned-xsum"`).
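
To get a feel for what these arguments imply, the following back-of-the-envelope calculation estimates the number of optimization steps and checkpoints. It assumes a single GPU, no gradient accumulation, the default checkpoint interval of 500 steps, and the 8000 training examples reported in the training log further down. All variable names are illustrative.

```python
import math

num_train_examples = 8000  # "Num examples" reported in the training log
per_device_batch_size = 2  # batch_size in the cell above
num_devices = 1            # assumption: a single GPU
grad_accum_steps = 1       # assumption: no gradient accumulation
num_epochs = 5             # num_train_epochs in the cell above
save_every = 500           # assumption: default checkpoint interval

effective_batch = per_device_batch_size * num_devices * grad_accum_steps
steps_per_epoch = math.ceil(num_train_examples / effective_batch)
total_steps = steps_per_epoch * num_epochs
checkpoints_written = total_steps // save_every
print(steps_per_epoch, total_steps, checkpoints_written)
```

With a batch size of 2 the run takes 20000 optimization steps and would write 40 checkpoints, which is why capping the number of kept checkpoints with `save_total_limit` matters for disk usage.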

Then, we need a special kind of data collator, which will not only pad the inputs to the maximum length in the batch, but also the labels:

In [25]:
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
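
The idea behind this collator can be sketched in plain Python: each batch is padded only up to its own longest sequence, and labels are padded with `-100` so that the loss ignores those positions. The helper `pad_batch` below is a hypothetical simplification, not the library code, and it assumes a padding token ID of 1. The label IDs in the example are made up.

```python
def pad_batch(features, pad_token_id=1, label_pad_id=-100):
    """Pad `input_ids` and `labels` to the longest sequence in this batch only."""
    max_input = max(len(f["input_ids"]) for f in features)
    max_label = max(len(f["labels"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_input - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        # The attention mask marks real tokens with 1 and padding with 0.
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        # Labels are padded with -100 so the loss skips those positions.
        n_label_pad = max_label - len(f["labels"])
        batch["labels"].append(f["labels"] + [label_pad_id] * n_label_pad)
    return batch


features = [
    {"input_ids": [0, 31414, 6, 42, 65, 3645, 328, 2], "labels": [0, 713, 2]},
    {"input_ids": [0, 713, 16, 277, 3645, 4, 2], "labels": [0, 31414, 6, 2]},
]
padded = pad_batch(features)
print(padded["input_ids"][1])
print(padded["attention_mask"][1])
print(padded["labels"][0])
```

Padding per batch instead of per dataset keeps short batches short, which saves memory and compute when article lengths vary a lot.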

The last thing to define for our `Seq2SeqTrainer` is how to compute the metrics from the predictions. We need to define a function for this, which will just use the `metric` we loaded earlier, and we have to do a bit of pre-processing to decode the predictions into texts:

In [26]:
import nltk
import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    
    # Rouge expects a newline after each sentence
    decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = ["\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels]
    
    result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    # Extract a few results
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
    
    # Add mean generated length
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions]
    result["gen_len"] = np.mean(prediction_lens)
    
    return {k: round(v, 4) for k, v in result.items()}
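
Two steps of `compute_metrics` are easy to check by hand on toy data: swapping the `-100` label padding back to a real token ID before decoding, and averaging the number of non-padding tokens in the predictions. The sketch below does both with plain lists instead of NumPy arrays. The helper names and the padding ID of 1 are assumptions made for illustration.

```python
def restore_pad_ids(label_rows, pad_token_id=1, ignore_index=-100):
    """Replace the ignore index with the pad ID so the rows can be decoded."""
    return [
        [pad_token_id if tok == ignore_index else tok for tok in row]
        for row in label_rows
    ]


def mean_generated_length(prediction_rows, pad_token_id=1):
    """Average number of non-padding tokens per generated sequence."""
    lengths = [sum(tok != pad_token_id for tok in row) for row in prediction_rows]
    return sum(lengths) / len(lengths)


labels = [[0, 713, 2, -100], [0, 31414, 6, 2]]
predictions = [[0, 713, 16, 2, 1, 1], [0, 31414, 6, 42, 65, 2]]
clean_labels = restore_pad_ids(labels)
gen_len = mean_generated_length(predictions)
print(clean_labels)
print(gen_len)
```

The replacement is needed because `-100` is not a valid vocabulary index, so the tokenizer cannot decode it.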

Then we just need to pass all of this along with our datasets to the `Seq2SeqTrainer`:

In [27]:
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

Cloning https://huggingface.co/Kevincp560/distilbart-cnn-12-3-finetuned-pubmed into local empty directory.
Using amp half precision backend


We can now finetune our model by just calling the `train` method:

In [28]:
trainer.train()

The following columns in the training set  don't have a corresponding argument in `BartForConditionalGeneration.forward` and have been ignored: article, abstract. If article, abstract are not expected by `BartForConditionalGeneration.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 8000
  Num Epochs = 5
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 1
  Total optimization steps = 20000


| Epoch | Training Loss | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|
| 1 | 2.469 | 2.295583 | 38.3713 | 15.2594 | 23.6734 | 34.1634 | 141.707 |
| 2 | 2.2527 | 2.199446 | 39.5939 | 16.2376 | 24.6363 | 35.5106 | 141.831 |
| 3 | 2.0669 | 2.178034 | 40.078 | 16.6705 | 25.1119 | 35.9605 | 141.8475 |
| 4 | 1.9275 | 2.166949 | 40.0825 | 16.6169 | 24.9702 | 36.0191 | 141.928 |
| 5 | 1.8102 | 2.174292 | 40.5642 | 16.9812 | 25.3449 | 36.46 | 141.95 |


Saving model checkpoint to distilbart-cnn-12-3-finetuned-pubmed/checkpoint-500
Configuration saved in distilbart-cnn-12-3-finetuned-pubmed/checkpoint-500/config.json
Model weights saved in distilbart-cnn-12-3-finetuned-pubmed/checkpoint-500/pytorch_model.bin
tokenizer config file saved in distilbart-cnn-12-3-finetuned-pubmed/checkpoint-500/tokenizer_config.json
Special tokens file saved in distilbart-cnn-12-3-finetuned-pubmed/checkpoint-500/special_tokens_map.json
tokenizer config file saved in distilbart-cnn-12-3-finetuned-pubmed/tokenizer_config.json
Special tokens file saved in distilbart-cnn-12-3-finetuned-pubmed/special_tokens_map.json
Saving model checkpoint to distilbart-cnn-12-3-finetuned-pubmed/checkpoint-1000
Configuration saved in distilbart-cnn-12-3-finetuned-pubmed/checkpoint-1000/config.json
Model weights saved in distilbart-cnn-12-3-finetuned-pubmed/checkpoint-1000/pytorch_model.bin
tokenizer config file saved in distilbart-cnn-12-3-finetuned-pubmed/checkpoint-1000/token

TrainOutput(global_step=20000, training_loss=2.1418580413818358, metrics={'train_runtime': 17711.3775, 'train_samples_per_second': 2.258, 'train_steps_per_second': 1.129, 'total_flos': 4.943637338112e+16, 'train_loss': 2.1418580413818358, 'epoch': 5.0})
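
As a quick sanity check, the figures in the training log and in `TrainOutput` above can be reproduced with a few lines of arithmetic. This sketch assumes a single device and that a trailing partial batch would count as one step (the `math.ceil` below); with 8000 examples and a batch size of 2, the division is exact, so that assumption does not affect the result here.

```python
import math

# Values copied from the training log and TrainOutput above.
num_examples = 8000
batch_size = 2
grad_accum = 1
num_epochs = 5
train_runtime = 17711.3775  # seconds

# One optimization step consumes batch_size * grad_accum examples.
steps_per_epoch = math.ceil(num_examples / (batch_size * grad_accum))
total_steps = steps_per_epoch * num_epochs

# Throughput figures follow from the total runtime.
steps_per_second = total_steps / train_runtime
samples_per_second = num_examples * num_epochs / train_runtime

print(total_steps, round(steps_per_second, 3), round(samples_per_second, 3))
```

The run therefore took a little under five hours (17711 seconds) for 20000 steps.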

You can now upload the result of the training to the Hub by executing this instruction:

In [29]:
trainer.push_to_hub()

Saving model checkpoint to distilbart-cnn-12-3-finetuned-pubmed
Configuration saved in distilbart-cnn-12-3-finetuned-pubmed/config.json
Model weights saved in distilbart-cnn-12-3-finetuned-pubmed/pytorch_model.bin
tokenizer config file saved in distilbart-cnn-12-3-finetuned-pubmed/tokenizer_config.json
Special tokens file saved in distilbart-cnn-12-3-finetuned-pubmed/special_tokens_map.json
Several commits (2) will be pushed upstream.
The progress bars may be unreliable.


To https://huggingface.co/Kevincp560/distilbart-cnn-12-3-finetuned-pubmed
   a1f4f89..8c1b637  main -> main

To https://huggingface.co/Kevincp560/distilbart-cnn-12-3-finetuned-pubmed
   8c1b637..6f18773  main -> main



'https://huggingface.co/Kevincp560/distilbart-cnn-12-3-finetuned-pubmed/commit/8c1b637c8d89ff7d9666a9550d44f1ca14f2a742'

You can now share this model with all your friends, family, and favorite pets: they can all load it with the identifier `"your-username/the-name-you-picked"`, for instance:

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("sgugger/my-awesome-model")
```