If you're opening this notebook on Colab, you will probably need to install 🤗 `Transformers` and 🤗 `Datasets`, as well as the other dependencies listed below.

* `datasets`
* `transformers`
* `rouge-score`
* `nltk`
* `pytorch` (installed as `torch`)
* `ipywidgets`

*Note*: Since we use the GPU to speed up training, `CUDA` needs to be installed on the device.
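A quick way to confirm that a GPU is visible before training is sketched below. It assumes PyTorch is the backend, as the dependency list suggests, and falls back to the CPU if `torch` is not installed. The helper name `describe_device` is hypothetical and only used for illustration.

```python
def describe_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # assumed backend; not part of the pip install command in this notebook
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = describe_device()
print(f"Training would run on: {device}")
```

If this prints `cpu` on Colab, the runtime type is probably not set to GPU.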

In [1]:
! pip install datasets transformers rouge-score nltk ipywidgets

Collecting datasets
  Downloading datasets-1.18.3-py3-none-any.whl (311 kB)
Collecting transformers
  Downloading transformers-4.16.2-py3-none-any.whl (3.5 MB)
Collecting rouge-score
  Downloading rouge_score-0.0.4-py2.py3-none-any.whl (22 kB)
Collecting fsspec[http]>=2021.05.0
  Downloading fsspec-2022.2.0-py3-none-any.whl (134 kB)
Collecting aiohttp
  Downloading aiohttp-3.8.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
Collecting xxhash
  Downloading xxhash-3.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)
Collecting huggingface-hub<1.0.0,>=0.1.0
  Downloading huggingface_hub-0

When using `nltk`, the `punkt` tokenizer data must also be downloaded, because it is not installed automatically with the package. Without `punkt`, the analysis will fail with an error.

In [2]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

If you're opening this notebook locally, make sure your environment has the latest version of those libraries installed.

To be able to share your model with the community and generate results like the one shown in the picture below via the inference API, there are a few more steps to follow.

First, store your authentication token from the Hugging Face website (sign up [here](https://huggingface.co/join) if you haven't already!), then execute the following cell and enter your credentials when prompted:

In [3]:
from huggingface_hub import notebook_login

notebook_login()

Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-credential store but this isn't the helper defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default

git config --global credential.helper store


Then you need to install `Git-LFS`.

If you are not using `Google Colab`, you may need to install `Git-LFS` manually, since the command below depends on your operating system and may not work on it. You can read about `Git-LFS` and how to install it [here](https://git-lfs.github.com/).

In [4]:
! apt install git-lfs

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-470
Use 'apt autoremove' to remove it.
The following NEW packages will be installed:
  git-lfs
0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.
Need to get 2,129 kB of archives.
After this operation, 7,662 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 git-lfs amd64 2.3.4-1 [2,129 kB]
Fetched 2,129 kB in 1s (1,643 kB/s)
Selecting previously unselected package git-lfs.
(Reading database ... 155320 files and directories currently installed.)
Preparing to unpack .../git-lfs_2.3.4-1_amd64.deb ...
Unpacking git-lfs (2.3.4-1) ...
Setting up git-lfs (2.3.4-1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...


Make sure your version of `Transformers` is at least 4.11.0, since the Hub-sharing functionality used in this notebook was introduced in that version:

In [5]:
import transformers

print(transformers.__version__)

4.16.2
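The check can also be automated instead of read off by eye. The sketch below uses only the standard library. The helper names `version_tuple` and `meets_minimum` are hypothetical, and it assumes plain dotted release numbers, ignoring any suffix such as `.dev0`.

```python
import re

def version_tuple(version):
    """Turn "4.16.2" into (4, 16, 2); non-numeric suffixes such as ".dev0" are ignored."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def meets_minimum(installed, minimum="4.11.0"):
    """Compare numerically, since plain string comparison would rank "4.9.0" above "4.11.0"."""
    return version_tuple(installed) >= version_tuple(minimum)

print(meets_minimum("4.16.2"))  # True
print(meets_minimum("4.9.0"))   # False
```

In the notebook, `meets_minimum(transformers.__version__)` would perform the same comparison on the installed version.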


You can find a script version of this notebook to fine-tune your model in a distributed fashion using multiple GPUs or TPUs [here](https://github.com/huggingface/transformers/tree/master/examples/seq2seq).

# Fine-tuning a model on a summarization task

In this notebook, we will see how to fine-tune one of the [🤗`Transformers`](https://github.com/huggingface/transformers) models for a summarization task. We will use the [PubMed Summarization dataset](https://huggingface.co/datasets/ccdv/pubmed-summarization), which contains PubMed articles paired with their abstracts.

![Widget inference on a summarization task](https://github.com/huggingface/notebooks/blob/master/examples/images/summarization.png?raw=1)

We will see how to easily load the dataset for this task using 🤗 `Datasets` and how to fine-tune a model on it using the `Trainer` API.

In [6]:
model_checkpoint = "t5-base"

This notebook is built to run with any model checkpoint from the [Model Hub](https://huggingface.co/models), as long as that model has a sequence-to-sequence version in the 🤗 `Transformers` library. Here we picked the [`t5-base`](https://huggingface.co/t5-base) checkpoint.

## Loading the dataset

We will use the [🤗 `Datasets`](https://github.com/huggingface/datasets) library to download the data and get the metric we need to use for evaluation (to compare our model to the benchmark). This can be easily done with the functions `load_dataset` and `load_metric`.  

In [7]:
from datasets import load_dataset, load_metric

raw_datasets = load_dataset("ccdv/pubmed-summarization")
metric = load_metric("rouge")

No config specified, defaulting to: pub_med_summarization_dataset/document

Downloading and preparing dataset pub_med_summarization_dataset/document to /root/.cache/huggingface/datasets/ccdv___pub_med_summarization_dataset/document/1.0.0/5792402f4d618f2f4e81ee177769870f365599daa729652338bac579552fec30...

Dataset pub_med_summarization_dataset downloaded and prepared to /root/.cache/huggingface/datasets/ccdv___pub_med_summarization_dataset/document/1.0.0/5792402f4d618f2f4e81ee177769870f365599daa729652338bac579552fec30. Subsequent calls will reuse this data.
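To give a feel for what the ROUGE metric measures, here is a deliberately simplified sketch of ROUGE-1: the F1 score of unigram overlap between a predicted summary and a reference. It is not the implementation behind `load_metric("rouge")`; the `rouge_score` package applies its own tokenization and optional stemming, and also reports other variants such as ROUGE-2 and ROUGE-L. The function name `rouge1_f1` and the example sentences are hypothetical.

```python
from collections import Counter

def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps, for each word, the smaller of the two counts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 3))  # 0.833
```

Five of the six words overlap in each direction, so precision and recall are both 5/6, and so is the F1 score.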

The `raw_datasets` object is a [`DatasetDict`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasetdict), which contains one key each for the training, validation, and test sets:

In [8]:
raw_datasets

DatasetDict({
    train: Dataset({
        features: ['article', 'abstract'],
        num_rows: 119924
    })
    validation: Dataset({
        features: ['article', 'abstract'],
        num_rows: 6633
    })
    test: Dataset({
        features: ['article', 'abstract'],
        num_rows: 6658
    })
})

To access an actual element, you need to select a split first, then give an index:

In [9]:
raw_datasets["train"][0]

{'abstract': "<S> background : the present study was carried out to assess the effects of community nutrition intervention based on advocacy approach on malnutrition status among school - aged children in shiraz , iran.materials and methods : this case - control nutritional intervention has been done between 2008 and 2009 on 2897 primary and secondary school boys and girls ( 7 - 13 years old ) based on advocacy approach in shiraz , iran . </S> <S> the project provided nutritious snacks in public schools over a 2-year period along with advocacy oriented actions in order to implement and promote nutritional intervention . for evaluation of effectiveness of the intervention growth monitoring indices of pre- and post - intervention were statistically compared.results:the frequency of subjects with body mass index lower than 5% decreased significantly after intervention among girls ( p = 0.02 ) . </S> <S> however , there were no significant changes among boys or total population . </S> <S> 

Since the `pubmed` data is very large, we keep only a subset of the rows: a training set of 8,000 examples, a validation set of 2,000, and a test set of 2,000.

In [10]:
raw_datasets["train"] = raw_datasets["train"].select(range(1, 8001))
raw_datasets["validation"] = raw_datasets["validation"].select(range(1, 2001))
raw_datasets["test"] = raw_datasets["test"].select(range(1, 2001))
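Note that `range(1, 8001)` starts at index 1, so the first row of each split is skipped while the resulting sizes still match the numbers stated above. The sketch below illustrates the index arithmetic with a plain list standing in for a dataset split. The `select` helper here is a hypothetical stand-in that only mimics the idea of `Dataset.select`.

```python
# A plain list stands in for a dataset split (hypothetical data, for illustration only).
rows = [f"example-{i}" for i in range(10)]

def select(rows, indices):
    """Mimic the idea of Dataset.select: keep only the rows at the given indices."""
    return [rows[i] for i in indices]

subset = select(rows, range(1, 5))
print(len(subset))  # 4
print(subset[0])    # example-1

# The same arithmetic gives the split sizes used above.
print(len(range(1, 8001)), len(range(1, 2001)))  # 8000 2000
```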

To get a sense of what the data looks like, the following function will show some examples picked randomly in the dataset.

In [11]:
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=5):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))
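
The retry loop in `show_random_elements` draws indices until it has enough distinct ones. As a possible alternative, `random.sample` from the standard library draws without replacement in a single call. The sketch below shows only the index-picking step; the helper name `pick_unique_indices` is hypothetical.

```python
import random

def pick_unique_indices(dataset_length, num_examples=5):
    """Return num_examples distinct indices in [0, dataset_length)."""
    if num_examples > dataset_length:
        raise ValueError("Can't pick more elements than there are in the dataset.")
    # random.sample draws without replacement, so no duplicate check is needed.
    return random.sample(range(dataset_length), num_examples)

picks = pick_unique_indices(8000)
print(picks)
```

The returned list could be used to index the dataset in the same way as `picks` in the function above.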

In [12]:
show_random_elements(raw_datasets["train"])

Unnamed: 0,article,abstract
0,"the rising level of morbidity and mortality is a sign of social as well as individual malaise . in most parts of the world , mental health and mental illness are largely ignored or neglected , resulting in increasing burden of mental disorders in the community and a widening of treatment gap . meta - analysis of epidemiological studies reported prevalence of mental illness as 58.2 and 73 per 1000 population in india . however , even after three decades of its launch , national mental health programme is restricted only to 123 districts , that too in the rural region . urban community in the country , which is exposed to stress of migration , change in family and social dynamics , widening inequalities in economic status , widespread poverty , poor living conditions , and insecurity , hardly gets due attention in the program . lack of organised public health infrastructure and expensive treatment at private settings add to the problem . patients with mental illness have been stigmatised since long back in any community , and this stigmatisation is beyond just labelling the patients . the condition is perceived as frightening , shameful , imaginary , feigned , and incurable , while the patients are characterised as dangerous , unpredictable , untrustworthy , unstable , lazy , weak , worthless , and/or helpless in the community . furthermore , it is important to study the perception , attitude , and health - seeking behavior in the community regarding mental illness , which will help in providing mental healthcare services for the community . this was a community based cross - sectional study conducted from august 2009 to november 2009 in an urban community , south delhi , in northern india . six blocks of area were included in intensive field practice area under urban health programme of centre for community medicine of all india institute of medical sciences , new delhi . out of which , one block was randomly selected for the study by lottery method . 
in the selected block , any adult member ( 18 years ) residing in a selected household for more than 6 months and volunteering for interview was recruited in the study to complete a predetermined sample size of 100 participants . focus group discussion ( fgd ) guide was prepared after extensive review of literature and discussion with experts in this field to identify various domains related with mental illness . focus group discussion was carried out among males and females separately in the two different blocks of the study area other than block selected for the study . on the basis of fgd , semi - structured interview schedule was designed to collect information about mental illness including causative factors , preventive measures , identifying features , treatment seeking places , and related practices in the community . this interview schedule was translated into hindi language , pretested , and then modified before use . attitude about mental illness was studied by using opinion about mental illness for chinese community ( omicc ) scale . hindi version was again back translated into english for checking integrity of concepts of various domains . , on the basis of face validity ; building on the scale of opinion about mental illness ( omi ) developed by cohen and strunning and small scale survey with mental health professionals . 
items emphasize the distinctiveness of people with mental illness and to keep them away at a safe distancestereotyping items that fixed people with mental illness in a particular behavioral pattern , mental ability , and mannerismrestrictiveness items that held a doubtful view on the right of people with mental illnesspessimistic prediction items that held the view that people with mental illness are unlikely to improve and that how society treats them , is not optimisticstigmatization items that perceived mental illness as shameful , that sufferers should be kept from being known to others \n benevolence items related to kind orientation towards people with mental illness separatism items emphasize the distinctiveness of people with mental illness and to keep them away at a safe distance stereotyping items that fixed people with mental illness in a particular behavioral pattern , mental ability , and mannerism restrictiveness items that held a doubtful view on the right of people with mental illness pessimistic prediction items that held the view that people with mental illness are unlikely to improve and that how society treats them , is not optimistic stigmatization items that perceived mental illness as shameful , that sufferers should be kept from being known to others factor analysis of the omicc scale with 34 questions in six domains has been reported to yield a cronbach 's alpha of 0.866 . for perception about mental illness , frequency distribution of the responses was calculated . analysis of omicc attitude scale was done like any other likert scale with five point responses . all responses under each domain were coded from one to five and reverse coded for benevolence domain . attitude in each domain was determined on the basis of pooled mean values for the respective domain . values higher than two were consider as negative attitude in the respective domain . 
difference in attitude across age , sex , and literacy in six domains was assessed by independent sample t test . was obtained from institute 's ethical committee of all india institute of medical sciences , new delhi . , on the basis of face validity ; building on the scale of opinion about mental illness ( omi ) developed by cohen and strunning and small scale survey with mental health professionals . items emphasize the distinctiveness of people with mental illness and to keep them away at a safe distancestereotyping items that fixed people with mental illness in a particular behavioral pattern , mental ability , and mannerismrestrictiveness items that held a doubtful view on the right of people with mental illnesspessimistic prediction items that held the view that people with mental illness are unlikely to improve and that how society treats them , is not optimisticstigmatization items that perceived mental illness as shameful , that sufferers should be kept from being known to others \n benevolence items related to kind orientation towards people with mental illness separatism items emphasize the distinctiveness of people with mental illness and to keep them away at a safe distance stereotyping items that fixed people with mental illness in a particular behavioral pattern , mental ability , and mannerism restrictiveness items that held a doubtful view on the right of people with mental illness pessimistic prediction items that held the view that people with mental illness are unlikely to improve and that how society treats them , is not optimistic stigmatization items that perceived mental illness as shameful , that sufferers should be kept from being known to others factor analysis of the omicc scale with 34 questions in six domains has been reported to yield a cronbach 's alpha of 0.866 . analysis of omicc attitude scale was done like any other likert scale with five point responses . 
all responses under each domain were coded from one to five and reverse coded for benevolence domain . attitude in each domain was determined on the basis of pooled mean values for the respective domain . values higher than two were consider as negative attitude in the respective domain . difference in attitude across age , sex , and literacy in six domains was assessed by independent sample t test . ethical clearance for the study was obtained from institute 's ethical committee of all india institute of medical sciences , new delhi . mean age of the participants was 35.8 ( sd : 12.6 ) years with almost half ( 47% ) belonging to the 3049 years age group . majority of the participants were female ( 57% ) , literate ( 84% ) , currently married ( 81% ) , and did not report mental illness in the family ( 95% ) [ table 1 ] . socio - demographic profile of the study participants most of the participants ( 32% ) perceived that living without tension , living happily , and satisfied in routine life as indicators of a healthy mental status . almost one - fourth ( 24% ) of the participants did not know the meaning of being mentally healthy . sudden change in behavior like remaining quite or over talkativeness ( 59% ) and abusing or fighting with others ( 53% ) were among the most common symptoms / signs of mental illness identified by the participants . almost one - third of the participants perceived that symptoms of mental illness were overt , which makes a person with mental illness easily identifiable . only few ( 3% ) of them perceived that it would require expert to diagnose the mental illness . tension / mental stress in routine day to day life were perceived as the most common cause of mental illness ( 79% ) . importantly , one - fourth of the participants perceived the role of uppari chakkar / evil spirits in the development of mental illness . 
almost one - fourth of the participants perceived that mental illness is transmitted from person to person ( 21% ) and from the mother to her child ( 27% ) like any other communicable disease . most of the participants perceived that mental illness could be curable , but one - fifth of them ( 20% ) perceived that these are not completely curable . almost one - third ( 29% ) perceived that these disorders can be prevented by keeping friendly home environment and sharing problems , thoughts with others . stress relieving activities like yoga and meditations were perceived as one of the important preventive measures for development of mental illness . almost half of the participant felt that patient with mental illness can get specialised care at a mental hospital only ; 21% of the participants perceived the role of faith healers ( tantrik , ojha ) in the treatment of mental illness . according to the participants , community ( 39% ) as well they themselves will prefer mental hospital over general physician for seeking care for mental illness . faith healers ( tantric , ojha ) were identified as health seeking places both for the community as well as for them by ( 12% ) . most of the participant felt that community ( 80% ) ignores mentally ill patients and their families . almost one - fourth ( 27% ) of the participants felt that people from the community tease and make fun of a person with mental illness instead of getting them treated . analysis of omicc scale showed higher ( mean ) scores for stereotyping ( 4.5 ) , restrictiveness ( 3.9 ) , and pessimistic prediction ( 3.8 ) domains and lower values for separatism ( 2.6 ) , benevolence ( 1.8 ) , and stigmatisation ( 2.3 ) domains [ figure 1 ] . community showed kind , non - stigmatising but pessimistic attitude toward the future of the people with mental illness . at the same time , participants also felt that social relationship with these people should be restricted . 
difference in attitude towards mental illness across age , sex , and literacy status was found statistically non - significant ( p > 0.05 ) . most of the participants ( 32% ) perceived that living without tension , living happily , and satisfied in routine life as indicators of a healthy mental status . almost one - fourth ( 24% ) of the participants did not know the meaning of being mentally healthy . sudden change in behavior like remaining quite or over talkativeness ( 59% ) and abusing or fighting with others ( 53% ) were among the most common symptoms / signs of mental illness identified by the participants . almost one - third of the participants perceived that symptoms of mental illness were overt , which makes a person with mental illness easily identifiable . only few ( 3% ) of them perceived that it would require expert to diagnose the mental illness . tension / mental stress in routine day to day life were perceived as the most common cause of mental illness ( 79% ) . importantly , one - fourth of the participants perceived the role of uppari chakkar / evil spirits in the development of mental illness . almost one - fourth of the participants perceived that mental illness is transmitted from person to person ( 21% ) and from the mother to her child ( 27% ) like any other communicable disease . most of the participants perceived that mental illness could be curable , but one - fifth of them ( 20% ) perceived that these are not completely curable . almost one - third ( 29% ) perceived that these disorders can be prevented by keeping friendly home environment and sharing problems , thoughts with others . stress relieving activities like yoga and meditations were perceived as one of the important preventive measures for development of mental illness . 
almost half of the participant felt that patient with mental illness can get specialised care at a mental hospital only ; 21% of the participants perceived the role of faith healers ( tantrik , ojha ) in the treatment of mental illness . according to the participants , community ( 39% ) as well they themselves will prefer mental hospital over general physician for seeking care for mental illness . faith healers ( tantric , ojha ) were identified as health seeking places both for the community as well as for them by ( 12% ) . most of the participant felt that community ( 80% ) ignores mentally ill patients and their families . almost one - fourth ( 27% ) of the participants felt that people from the community tease and make fun of a person with mental illness instead of getting them treated . analysis of omicc scale showed higher ( mean ) scores for stereotyping ( 4.5 ) , restrictiveness ( 3.9 ) , and pessimistic prediction ( 3.8 ) domains and lower values for separatism ( 2.6 ) , benevolence ( 1.8 ) , and stigmatisation ( 2.3 ) domains [ figure 1 ] . community showed kind , non - stigmatising but pessimistic attitude toward the future of the people with mental illness . at the same time , participants also felt that social relationship with these people should be restricted . difference in attitude towards mental illness across age , sex , and literacy status was found statistically non - significant ( p > 0.05 ) . this study describes the perception of the community regarding mental health and derangement of mental health , similar to who definition of mental health as a positive sense of well - being and not merely the absence of a illness . awareness of the community about symptoms / signs of mental illness is limited to symptoms that manifest in severe mental illness or in later stage of the illness . 
the reason could be either lack of awareness of participants about other common symptoms like sense of hopelessness , aloofness , and anxiety or may be that these symptoms are too common to be recognised as abnormal . this fact is supported by the observation that only few of the participants agreed that mental illness could be present in a person with normal behavior and it is not possible to diagnose it by mere observation ( 3% ) . singh et al . , reported stressful conditions as a cause for development of mental illness similar to the observations in the present study . attribution of mental illness to upari chakkar / evil spirit / black magic was present in a substantial proportion ( 25% ) and shows the lack of awareness of the community about bio - medical concept of causation of these disorders . the knowledge of the participant has been reflected in the health - seeking behavior of the community as tantric and ojha were reflected among health - seeking places for mental illness to get rid of evil spirits . , this fact is also supported by observation that participants do regard transmission of the mental illness from person to person ( 21% ) and from mother to child ( 27% ) . this lack of knowledge in the study population from the capital city that has a well - known institute for care for mentally ill is serious because the ignorance could possibly be more in other regions of the country . community identified stress relieving activities like yoga and satsang , supportive family environment , and sharing thoughts with others as preventive measures against mental illness . these findings are in coherence with the knowledge of the community about causality of mental illness . community attitude toward patient with mental illness was kind and non - stigmatising , which was similar to the attitude of the community reported in various studies . 
although participants did not support isolating person with mental illness from the society , restrictive attitude was observed with regards to marriage or child bearing . this finding corroborates with findings of the study by singh et al . , and kermode et al . , also , participants are pessimistic when it comes to career or job opportunity for a person with mental illness . this socially restrictive attitude is reflected in the practices of community toward psychiatric ill patients in the form of restricting visits to patient 's home and ignoring the patients . in the era of economic and social development , community still approaches tantric / ojha in the capital city . community reported restrictive , stereotyping , pessimistic , and non - stigmatizing attitude toward patient with mental illness that can be the barrier in health - seeking behavior for mental illnesses . this study reported lack of awareness about bio - medical concepts of mental illness in a community in the capital city . there is a need for creating awareness regarding biomedical concepts , availability of effective treatment for mental illness , for identification and better care for these disorders in a community as a part of national mental health programme . health education and increase in public awareness regarding factual information about mental illness can decrease the stigma attached with mental illness and improve help - seeking behavior of the community .","<S> background : mental illness have been largely ignored or neglected because of a community 's perception and attached social stigma.materials and methods : a community based cross - sectional study was conducted in an urban community in south delhi to study perception and attitude of the community about towards mental illness . 
</S> <S> an adult member in household selected by systematic random sampling was interviewed using semi - structured interview schedule for perception about mental illness and 34 item opinion about mental illness for chinese community ( omicc ) scaleresults : a total of 100 adults were interviewed . </S> <S> mean age of the participants was 35.8 ( sd : 12.6 ) years . </S> <S> living without tension and satisfaction in routine life were identified as indicators of healthy mental status . </S> <S> change in the behavior was perceived as the most common symptom of mental illness . </S> <S> although mental stress was identified as the most common cause of mental illness , 25% attributed it to evil spirits . keeping surroundings friendly and sharing problems with others </S> <S> were identified as - important preventive measures against mental illness . </S> <S> mental illness was perceived as treatable ; 12% preferred treatment from tantric / ojha . </S> <S> community showed negative attitude for stereotyping , restrictiveness , and pessimistic prediction domains of omicc scale with mean score of 4.5 ( sd : 0.2 ) , 3.9 ( sd : 0.9 ) , and 3.8 ( sd : 0.4 ) , respectively , with no statistically significant difference across age , sex , and literacy.conclusion:study observed lack of awareness regarding bio - medical concept of mental illness with socially restrictive , stereotyping , pessimistic , and non - stigmatizing attitude toward mental illness in the capital city . </S>"
1,"hypertension is common in hemolytic uremic syndrome ( hus ) , and often difficult to control in an acute stage . malignant hypertension associated with hus leads to reversible posterior encephalopathy syndrome , seizures , heart failure , and other adverse consequences which increase the morbidity and mortality of the disease . renin - mediated mechanism is believed to be the main factor responsible for hypertension seen in these cases . drugs that act by blocking renin - angiotensin axis ( ras ) are thus ideal for such cases , however , due to concern of progression of renal failure and lack of experience of these agents in children , these are not preferred or used commonly in acute stages . we hereby report two cases of hus with severe refractory malignant hypertension in which we targeted ras by using intravenous ( iv ) enalaprilat , oral aliskiren , and oral enalapril with quick and dramatic response of blood pressure ( bp ) . a 6-year - old male was admitted with a history of vomiting , fever since 2 weeks , hematuria and decreased urine output since 1 week . on evaluation by his local practitioner , he was found to have anemia ( hemoglobin [ hb ] 6.3 g / dl ) , thrombocytopenia ( platelet 72,000/mm ) , active urine sediment ( red blood cell [ rbc ] 4060/hpf , albumin 3 + ) , and azotemia ( blood urea 200 mg / dl , creatinine 4.2 mg / dl ) . he had an episode of seizure ( due to accelerated hypertension ) , hence was brought to our hospital for further management . on evaluation , he was hypertensive ( bp 150/100 mmhg ) with generalized edema , oliguria , and a normal systemic examination . investigations were suggestive of hus ( hb 4.8 g / dl , white blood cell [ wbc ] 11,190 cmm , platelet 1.84/mm , peripheral smear : schistocytes positive , reticulocyte count 6.8% , lactate dehydrogenase [ ldh ] 4300 u / l , direct coombs test and indirect coombs tests were negative , urea 67 mg / dl , creatinine 2.6 mg / dl ) . 
septic work up , dengue serology , and malarial antigen were negative , and he became afebrile on the 4 day of admission . his antinuclear antibodies ( ana ) and antineutrophil cytoplasmic antibody ( anca ) were negative . he was started on empiric antibiotics ( injection ceftriaxone ) and daily plasmapheresis for hus . detailed complement regulator assay showed very high anti - factor h antibody ( 41,000 iu ) . c3 , c4 , antigenic levels of factor h , factor i , factor b , and cd46 were normal [ table 1 ] . he was given a blood transfusion and initiated on hemodialysis and daily plasma exchanges in view of oligo - anuric acute kidney injury ( aki ) . complement assay in cases * for the child 's height percentile , the bp percentiles were : 90 percentile : 113/72 mmhg and 95 percentile 117/76 mmhg ( blood pressure references used were as per the fourth report ) . for arterial hypertension [ figure 1 ] , he was started on sustained release nifedepine , clonidine , and metoprolol and subsequently prazosin with a gradual increase in dosage . however , arterial bp remained persistently high ( > 99 centile ; up to 170/120 mmhg ) , and he developed blurring of vision , with abdominal pain and vomiting on the 3 day of admission necessitating need for iv nitroglycerine ( up to 5 mcg / kg / min ) and subsequently labetolol infusion ( up to 2 mg / kg / h ) for refractory hypertension . child had persistent arterial hypertension ( > 99 centile for his age ) , despite vigorous fluid removal in hemodialysis sessions . response to antihypertensive medications in case 1 oral enalapril and minoxidil were also added and dosage of other oral antihypertensives optimized to the maximal doses [ figure 1 ] but arterial bp remained high and was difficult to control . but since within 48 h of adding oral enalapril and oral minoxidil , child went into hypertensive emergency ( bp 180/120 mmhg ) , with hallucinations and visual blurring , iv enalaprilat was added . 
hypertension showed a significant improvement after addition of iv enalaprilat ( 10 g / kg / dose q 8 hourly ) on the 5 day of admission . there was a consistent fall of bp within hours of giving individual iv enalaprilat boluses ( average fall in mean bp 9.5 mmhg ) . arterial bp decreased to 110/78 mmhg ; patient became asymptomatic , and nitroglycerine and labetolol infusions were tapered off successfully . he developed neutropenia a week following enalaprilat therapy ( wbc 1500 cmm , 50% neutrophils ) , which could not be attributed to any other cause ; hence , it was stopped followed by a rebound in hypertension ( arterial bp 190/136 mmhg ) . he was not dialysis dependent at this stage with a good urine output , and serum creatinine had fallen to 0.6 mg / dl , and his hemodialysis catheter was removed . aliskiren ( 2 mg / kg / dose ) was then added with good response ( arterial bp decreased to 150/110 mmhg over 24 h , and 138/110 mmhg over 48 h ) ; however , it was withdrawn after 4 days due to hyperkalemia ( serum potassium 6.5 mmol / l ) . after improvement of neutopenia , oral enalapril was reintroduced along with telmisartan . bp control improved within 48 h on this angiotensin - converting enzyme inhibitor - angiotensin receptor blockade ( acei - arb ) combination ( bp < 90 percentile for age ) along with other oral antihypertensive agents . he received seven daily plasmapheresis sessions till active hemolysis subsided followed by alternate day sessions . he received prednisolone ( 1 mg / kg / day ) and iv immunoglobulin 2 g / day ( day 10 of admission ) , iv cyclophosphamide during his hospital stay . he received total six doses of iv cyclophosphamide followed by maintenance azathioprine . at his 1 year of follow - up , he is doing well , normotensive ( bp < 90 percentile for age ) and no proteinuria ( urine protein / creatinine ratio < 0.2 ) and a normal urine examination on angiotensin - converting enzyme ( ace ) and arb combination . 
a 7-month - old male was admitted with a history of vomiting , fever since 5 days and anuria since 2 days . , he was hypertensive ( bp 130/60 mmhg ) , with pallor , facial puffiness , and normal systemic examination . investigations were suggestive of atypical hus - microangiopathic hemolytic anemia with aki ( hb 7 g / dl , wbc 13,280/mm , platelets 100,000/mm , peripheral smear : schistocytes , elliptocytes , reticulocyte count 5.6% , ldh 4300 u / l ) and active urine sediment ( rbc 1020/hpf , albumin 2 + ) . the child had no evidence of pneumonia or sepsis , on clinical evaluation , and all cultures were sterile . the child had advanced azotemia ( urea 216 mg / dl , creatinine 7.2 mg / dl ) with severe hyperkalemia and metabolic acidosis . for hus with evidence of ongoing hemolysis and dialysis dependence , he was started on daily plasmapheresis . a detailed complement assay ( including c3 , c4 , antigenic levels of factor h , factor i , factor b , cd46 , and autoantibodies to factor h ) were normal [ table 1 ] . bp was observed to be high since admission and increased up to 160/110 mmhg on serial monitoring , although patient remained asymptomatic . for his arterial hypertension ( > 99 centile for age ) [ figure 2 ] , he was started on amlodipine and prazosin initially ; dosage was increased to the maximal dose , and clonidine and oral enalapril were added on the 3 and 4 day of admission , respectively . despite the addition of multiple antihypertensive agents and dosage optimization and aggressive ultrafiltration in hemodialysis sessions , arterial bp showed only marginal decrease and mean arterial bp remained high ( > 99 centile for age ; 100110 mmhg ) . intravenous enalaprilat ( 10 g / kg / dose q 8 hourly ) was added on day 6 of admission . the mean arterial bp improved ( 80 mmhg ) within 12 h of addition of enalaprilat . 
once arterial bp was controlled with no further increase , dose of oral enalapril was maximized , while tapering off iv enalaprilat and other antihypertensive medications were continued . response to antihypertensive medications in case 2 daily hemolytic parameters and renal function were monitored . he was discharged in 2 weeks time in a stable condition , with adequate urine output and bp controlled on oral antihypertensive therapy . at a follow - up of 1 year , currently the infant is doing well , with serum creatinine 0.7 mg / dl , urine protein / creatinine ratio 1.5 , and normal bp ( bp < 90 percentile for age ) on oral enalapril 0.4 mg / kg / day . a 6-year - old male was admitted with a history of vomiting , fever since 2 weeks , hematuria and decreased urine output since 1 week . on evaluation by his local practitioner , he was found to have anemia ( hemoglobin [ hb ] 6.3 g / dl ) , thrombocytopenia ( platelet 72,000/mm ) , active urine sediment ( red blood cell [ rbc ] 4060/hpf , albumin 3 + ) , and azotemia ( blood urea 200 mg / dl , creatinine 4.2 mg / dl ) . he had an episode of seizure ( due to accelerated hypertension ) , hence was brought to our hospital for further management . on evaluation , he was hypertensive ( bp 150/100 mmhg ) with generalized edema , oliguria , and a normal systemic examination . investigations were suggestive of hus ( hb 4.8 g / dl , white blood cell [ wbc ] 11,190 cmm , platelet 1.84/mm , peripheral smear : schistocytes positive , reticulocyte count 6.8% , lactate dehydrogenase [ ldh ] 4300 u / l , direct coombs test and indirect coombs tests were negative , urea 67 mg / dl , creatinine 2.6 mg / dl ) . septic work up , dengue serology , and malarial antigen were negative , and he became afebrile on the 4 day of admission . his antinuclear antibodies ( ana ) and antineutrophil cytoplasmic antibody ( anca ) were negative . he was started on empiric antibiotics ( injection ceftriaxone ) and daily plasmapheresis for hus . 
detailed complement regulator assay showed very high anti - factor h antibody ( 41,000 iu ) . c3 , c4 , antigenic levels of factor h , factor i , factor b , and cd46 were normal [ table 1 ] . he was given a blood transfusion and initiated on hemodialysis and daily plasma exchanges in view of oligo - anuric acute kidney injury ( aki ) . complement assay in cases * for the child 's height percentile , the bp percentiles were : 90 percentile : 113/72 mmhg and 95 percentile 117/76 mmhg ( blood pressure references used were as per the fourth report ) . for arterial hypertension [ figure 1 ] , he was started on sustained release nifedepine , clonidine , and metoprolol and subsequently prazosin with a gradual increase in dosage . however , arterial bp remained persistently high ( > 99 centile ; up to 170/120 mmhg ) , and he developed blurring of vision , with abdominal pain and vomiting on the 3 day of admission necessitating need for iv nitroglycerine ( up to 5 mcg / kg / min ) and subsequently labetolol infusion ( up to 2 mg / kg / h ) for refractory hypertension . child had persistent arterial hypertension ( > 99 centile for his age ) , despite vigorous fluid removal in hemodialysis sessions . response to antihypertensive medications in case 1 oral enalapril and minoxidil were also added and dosage of other oral antihypertensives optimized to the maximal doses [ figure 1 ] but arterial bp remained high and was difficult to control . but since within 48 h of adding oral enalapril and oral minoxidil , child went into hypertensive emergency ( bp 180/120 mmhg ) , with hallucinations and visual blurring , iv enalaprilat was added . hypertension showed a significant improvement after addition of iv enalaprilat ( 10 g / kg / dose q 8 hourly ) on the 5 day of admission . there was a consistent fall of bp within hours of giving individual iv enalaprilat boluses ( average fall in mean bp 9.5 mmhg ) . 
arterial bp decreased to 110/78 mmhg ; patient became asymptomatic , and nitroglycerine and labetolol infusions were tapered off successfully . he developed neutropenia a week following enalaprilat therapy ( wbc 1500 cmm , 50% neutrophils ) , which could not be attributed to any other cause ; hence , it was stopped followed by a rebound in hypertension ( arterial bp 190/136 mmhg ) . he was not dialysis dependent at this stage with a good urine output , and serum creatinine had fallen to 0.6 mg / dl , and his hemodialysis catheter was removed . aliskiren ( 2 mg / kg / dose ) was then added with good response ( arterial bp decreased to 150/110 mmhg over 24 h , and 138/110 mmhg over 48 h ) ; however , it was withdrawn after 4 days due to hyperkalemia ( serum potassium 6.5 mmol / l ) . after improvement of neutopenia , oral enalapril was reintroduced along with telmisartan . bp control improved within 48 h on this angiotensin - converting enzyme inhibitor - angiotensin receptor blockade ( acei - arb ) combination ( bp < 90 percentile for age ) along with other oral antihypertensive agents . he received seven daily plasmapheresis sessions till active hemolysis subsided followed by alternate day sessions . he received prednisolone ( 1 mg / kg / day ) and iv immunoglobulin 2 g / day ( day 10 of admission ) , iv cyclophosphamide during his hospital stay . he received total six doses of iv cyclophosphamide followed by maintenance azathioprine . at his 1 year of follow - up , he is doing well , normotensive ( bp < 90 percentile for age ) and no proteinuria ( urine protein / creatinine ratio < 0.2 ) and a normal urine examination on angiotensin - converting enzyme ( ace ) and arb combination . a 7-month - old male was admitted with a history of vomiting , fever since 5 days and anuria since 2 days . there was no history of diarrhea or dysentery in the past . 
on admission , he was hypertensive ( bp 130/60 mmhg ) , with pallor , facial puffiness , and normal systemic examination . investigations were suggestive of atypical hus - microangiopathic hemolytic anemia with aki ( hb 7 g / dl , wbc 13,280/mm , platelets 100,000/mm , peripheral smear : schistocytes , elliptocytes , reticulocyte count 5.6% , ldh 4300 u / l ) and active urine sediment ( rbc 1020/hpf , albumin 2 + ) . the child had no evidence of pneumonia or sepsis , on clinical evaluation , and all cultures were sterile . the child had advanced azotemia ( urea 216 mg / dl , creatinine 7.2 mg / dl ) with severe hyperkalemia and metabolic acidosis . for hus with evidence of ongoing hemolysis and dialysis dependence , he was started on daily plasmapheresis . a detailed complement assay ( including c3 , c4 , antigenic levels of factor h , factor i , factor b , cd46 , and autoantibodies to factor h ) were normal [ table 1 ] . bp was observed to be high since admission and increased up to 160/110 mmhg on serial monitoring , although patient remained asymptomatic . echocardiography and fundus were normal . for his arterial hypertension ( > 99 centile for age ) [ figure 2 ] , he was started on amlodipine and prazosin initially ; dosage was increased to the maximal dose , and clonidine and oral enalapril were added on the 3 and 4 day of admission , respectively . despite the addition of multiple antihypertensive agents and dosage optimization and aggressive ultrafiltration in hemodialysis sessions , arterial bp showed only marginal decrease and mean arterial bp remained high ( > 99 centile for age ; 100110 mmhg ) . intravenous enalaprilat ( 10 g / kg / dose q 8 hourly ) was added on day 6 of admission . the mean arterial bp improved ( 80 mmhg ) within 12 h of addition of enalaprilat . once arterial bp was controlled with no further increase , dose of oral enalapril was maximized , while tapering off iv enalaprilat and other antihypertensive medications were continued . 
response to antihypertensive medications in case 2 daily hemolytic parameters and renal function were monitored . he was discharged in 2 weeks time in a stable condition , with adequate urine output and bp controlled on oral antihypertensive therapy . at a follow - up of 1 year , currently the infant is doing well , with serum creatinine 0.7 mg / dl , urine protein / creatinine ratio 1.5 , and normal bp ( bp < 90 percentile for age ) on oral enalapril 0.4 mg / kg / day . the extent of renal microangiopathic involvement appears to be responsible for the development of hypertension and renal failure in atypical hus . renal ischemia is triggered which leads to maximal activation of ras resulting in accelerated hypertension which is often very severe and resistant to antihypertensive therapy . local ras activation is believed to be an important key factor in the thrombotic microangiopathy in hus leading to intractable hypertension . this has been demonstrated to occur via enhanced tissue factor expression on glomerular endothelial cells which is enhanced by angiotension ii . on the other hand two studies have shown elevated plasma renin levels in children with hus , irrespective of systemic hypertension . severe hypertension that ensues is a clinician 's nightmare as it is hard to control despite use of multiple drug combinations and careful drug titration . in extreme cases , bilateral nephrectomy is ultimately required as a life - saving measure for achieving control of bp , thus again underlining the pivotal role of hyperreninemia in the development of hypertension in hus . although there is plenty of evidence in favor of renin - mediated mechanism in the pathogenesis of hypertension in hus , in practice there is hesitation to use ras inhibitors or their combinations in the acute stage for hypertension . this is due to fear of worsening of renal failure and lack of pediatric experience with newer ras inhibitors . 
oral aceis ( enalapril , captopril ) have been shown to be renoprotective and of benefit for long - term bp control and reduction in proteinuria in patients with persistent disease . however , they are not preferred in the acute stage of disease , used with great caution and generally initiated after improvement of renal function . moreover , aceis including enalaprilat are seldom used in hypertensive emergencies due to concerns regarding slow onset of action and variable effectiveness , especially in children . we used iv enalaprilat in both patients for acute hypertension who showed significant and consistent decline in bp . it helped in successful reversal of hypertensive urgency in the older child whose bp remained high and was extremely difficult to control despite multiple drugs including iv nitroglycerine and labetolol . the onset of action begins in 15 min , but the peak effect may take 14 h. the duration of action is usually 46 h. the half life of the drug is usually 11.1 h in infants and children . there is one case series reporting the neonatal use of enalaprilat that reported that doses even at the lower end of what was used in this cohort may lead to significant , prolonged hypotension and oliguric acute renal failure . if it is used in the newborn or children , it should be used with caution with ongoing monitoring of bp , serum potassium , and renal function . the adverse effects of enalaprilat are hyperkalemia , hypotension , cough , diarrhea , angioedema , and leucopenia ( agranulocytosis ) . we encountered neutropenia in one of our patients a week after initiation of iv enalaprilat which reversed quickly on drug withdrawal . however , severe rebound hypertension also occurred on stopping enalaprilat , while other drugs were continued , which reiterates its effectiveness in the management of the hypertensive urgency . 
studied the bp response to iv enalaprilat in 35 patients with hypertensive crisis , and found that the extent of systolic and diastolic bp reduction correlated well with the pretreatment plasma renin and angiotensin ii levels . thus , it appears that the status of ras determines the efficacy of iv enalaprilat , and hence it was successful in both our patients with hus - induced hypertension . therapeutic enalaprilat levels can probably be achieved with 1/4 total cumulative dose of enalapril , administered as 6 hourly enalaprilat : recommended pediatric dosing is 510 mcg / kg / dose . we used a dose of 10 mcg / kg / dose q 8 hourly in both of our cases based on the previous report . we used aliskiren in the older child which too was very effective in lowering bp . aliskiren is the first direct renin inhibitor available in an oral form approved for adults by the us food and drug administration in 2007 . as renin is the first and rate - limiting step in angiotensin ii synthesis in ras , its direct inhibition is theoretically more advantageous as compared to ace inhibition / arb . as it is a nonpeptide molecule , it has better bioavailability and a long half - life and can , therefore , lower bp effectively . once administered orally , the effect of the drug peaks in 13 h , achieves its steady state in 57 days and has a half - life of 40 h. aliskiren produces dose - dependent bp reduction and its potency has been shown to be equivalent or better than aceis , arbs in various trials . moreover , when administered as a combination with acei , it helps to block an increase in plasma renin activity induced by acei monotherapy . combination therapy ( aliskiren + acei ) has been demonstrated to have greater bp - lowering potential as compared to either alone . dual ras blockade however must be cautiously monitored as there is a higher chance of adverse effects . 
a case series of children with chronic kidney disease receiving combination aliskiren / acei showed > 45% proteinuria reduction , however side effects in the form of hyperkalemia , worsening of renal function , and hypotension were seen . mild hyperkalemia was seen in our patient but was asymptomatic , and potassium normalized quickly on stopping the drug . other adverse effects including nausea , angioedema , diarrhea , abdominal pain , and headache may be seen with aliskiren but are usually mild and not dose related . adult studies have shown aliskiren to be a safe and effective antihypertensive drug ; however , its use in children has been restricted so far due to the paucity of data . a recent prospective , randomized controlled in children between ages 6 and 17 years concluded that aliskiren in once daily doses of 2 mg / kg or 6 mg / kg was well tolerated and safe . we used a dose of 2 mg / kg in case 1 . in the above two patients , we used two unconventional drugs : aliskiren and iv enalaprilat , both of which were very quick and effective in controlling high bp refractory to multiple antihypertensive medications and aggressive ultrafiltration during dialysis sessions . the limitation of the report is small number , multiple antihypertensives in these patients , and simultaneous use of plasma exchanges and immunosuppression in anti - factor h antibody positive case 1 to help resolution of illness , which might make interpretation difficult . these appear to be promising alternatives in the treatment of severe atypical hus - induced hypertension and hypertensive emergency , and there is a need to have more trials targeting renin in these cases .","<S> hypertension is common in hemolytic uremic syndrome ( hus ) and often difficult to control . </S> <S> local renin - angiotensin activation is believed to be an important part of thrombotic microangiopathy , leading to a vicious cycle of progressive renal injury and intractable hypertension . 
</S> <S> this has been demonstrated in vitro via enhanced tissue factor expression on glomerular endothelial cells which is enhanced by angiotensin ii . </S> <S> we report two pediatric cases of atypical hus with severe refractory malignant hypertension , in which we targeted the renin - angiotensin system by using intravenous ( iv ) enalaprilat , oral aliskiren , and oral enalapril with quick and dramatic response of blood pressure . </S> <S> both drugs , aliskiren and iv enalaprilat , were effective in controlling hypertension refractory to multiple antihypertensive medications . </S> <S> these appear to be promising alternatives in the treatment of severe atypical hus - induced hypertension and hypertensive emergency . </S>"
2,"the contributing causes of dn pathogenesis and progression are still poorly understood but chronic hyperglycemia and high blood pressure represent the main risk factors for disease onset . \n , high systemic blood pressure usually determines an increase of the intraglomerular pressure and glomerular filtration rate ( gfr ) which results in glomerular hyperfiltration . from the biochemical point of view , hyperglycemia per se sustains the accumulation of advanced glycation end products ( ages ) , altering the electronegativity of the cell ; additionally ages bind proteins of the extracellular matrix ( ecm ) inhibiting their degradation . ages accumulation can induce an increased production of reactive oxygen species ( ros ) and a transcriptional activation of different proinflammatory and profibrotic molecules , including tgf - beta [ 2 , 3 ] . the high glucose - mediated induction of tgf - beta and the central role of this growth factor in dn progression represent the few defining constants in the pathogenesis of dn . \n the earliest clinical signs of dn include a slight but persistent urinary excretion of albumin ( microalbuminuria ) and a temporary increase of the glomerular filtration rate ( gfr ) . these clinical signs , along with the presence of hyperglycemia , are often considered sufficient indicators of dn [ 5 , 6 ] . today , extensive evidence shows that dn is not the only type of renal damage that can be found in diabetic patients [ 7 , 8 ] and kidney biopsy , although highly invasive , remains the diagnostic gold standard . the histological hallmarks of dn include hyperproliferation of the mesangial cells , thickening of the glomerular basement membrane ( gbm ) , podocyte effacement , tubulointerstitial fibrosis , and nodular accumulations of ecm ( kimmelstiel - wilson lesions ) in the glomerulus . 
given the high prevalence of type 2 diabetes ( t2d ) and the diagnostic limitations currently associated with kidney biopsy , there is an impending need for new , accurate , and easily accessible biomarkers of disease . in this review we will try to outline a system biology overview on dn by recapitulating the main annotations obtained at different levels of molecular investigation . only those studies investigating human samples will be described ; the murine models of dn in fact , although undergoing albuminuria , mesangial expansion , and podocyte loss , do not develop severe glomerulosclerosis and tubulointerstitial fibrosis . also , as substantial differences exist in the etiology and prevalence of type 1 and type 2 dn , the articles discussed in this paper apply to dn secondary to type 2 diabetes ( t2dn ) . as an exception , works describing biomarkers of kidney damage in t1d that have been further validated in t2 dm and vice versa and those reporting potential prognostic biomarkers , because of their particular importance in predicting the progression of renal damage , have been also discussed in the present work . all the annotations discussed in this review are also listed in tables 1 , 2 , 3 , 4 , and 5 , categorized according to whether they summarize the genetic and transcriptomic signature of coding or noncoding rna molecules and the epigenetic proteomic and metabolomic markers , respectively . genetic variation is present under different forms in the human genome , ranging from single nucleotide polymorphisms ( snps ) to large , structural , chromosomal rearrangements . today we know that genetic variation infers disease susceptibility and collective effort aims at identifying the precise loci for dn susceptibility . different methodological strategies can be used to characterize the genetic risk for a disease , either targeted or genome - wide , according to whether a priori hypothesis of the candidate regions for disease susceptibility exists . 
in genome - wide association studies ( gwas ) , for instance , the whole genome is screened for new , previously uncharacterized single nucleotide polymorphisms ( snps ) . prior to the development of the modern high - throughput technologies such as chip - based microarray analysis and next - generation sequencing , the inheritance of disease susceptibility was investigated through genetic linkage in families . basically , individuals within the same families were sequenced for a collection of genetic snps in order to identify those snps segregating with the disease . this approach led to the identification of many variants responsible for disease susceptibility but it proved mostly suitable for the study of single gene disorders . for complex , common complications like t2d in fact , progression is very likely driven by multiple alleles simultaneously , each having a small correlation to disease progression if inherited individually . this implies that a big population needs to be genotyped in order to detect the common variants responsible for the increased genetic risk . in the field of dn , there is extensive evidence for genetic contribution to disease susceptibility . in 1989 , seaquist et al . showed that diabetic siblings of patients with dn were more at risk for developing dn compared to diabetic siblings of diabetic patients without proteinuria ; epidemiologic studies also indicate that the prevalence of dn varies among ethnic groups . these observations , along with the consideration that only a subset of patients with diabetes develops dn , drove the search for the genetic determinants of dn susceptibility . one of the most consistent annotations in the field is probably the genetic variation on chromosome 18 . in 2002 , a family - based linkage analysis performed in t2dn turkish families and affected sibling pairs of pima indians reported a strong evidence for the localization of a dn susceptibility locus mapping to chromosome 18q22.3 - 23 . 
researchers were not able to pinpoint the precise susceptibility gene but the same locus was also detected in a t2dn african american population . later studies on chromosome 18 led to the identification of a susceptibility marker within the carnosine dipeptidase 1 ( cndp1 ) gene , and it was also described how the shortest allelic form of the cndp1 gene was more common in the absence of nephropathy . the cndp1 gene encodes the secreted enzyme serum carnosinase that degrades carnosine , a protein controlling the formation of age molecules . as previously discussed similar results were obtained in a meta - analysis study when investigating a multiethnic population with t2d - esrd ; a recently published meta - analysis confirmed the association of the carnosinase d18s880 microsatellite polymorphism with dn susceptibility in a t2d caucasian population although no significant association with t1dn could be found . in a very recent candidate - gene driven study , palmer et al . performed a genotyping of several snps across 22 dn candidate genes in a large cohort of african americans with t2d and esrd . after adjustment for the apol1 g1/g2 alleles , known to be associated with nondiabetic esrd in this population , the most significant signals were observed downstream of the cndp1 gene , at chimerin 2 ( chn2 ) locus and within angiotensin ii receptor type 1 ( agtr1 ) gene . in another work , to investigate the impact of oxidative stress on disease initiation , the polymorphic variants of 7 genes involved in the antioxidant defense were evaluated : sod2 , p22 phox , cat , mpo , gstp1 , gstt1 , and gstm1 . despite the commonly recognized link between oxidative stress and diabetes , authors claim that no association could be found in caucasian t2d patients . 
in one of the first dn genome - wide genotyping studies , authors reported the engulfment and cell motility 1 ( elmo1 ) gene on chromosome 7p as a likely candidate for disease susceptibility in a japanese patients cohort with t2d . in a cellular system engineered to overexpress elmo1 , they furthermore observed increased expression of extracellular matrix ( ecm ) protein genes and decreased expression of matrix metalloproteinases . finally , recent data from a meta - analysis study suggests the elmo1 association with dn exclusively in the t2d asian subgroup . in a population of pima indians with t2d , the gwas of over 100,000 snps led to the identification of several loci with significant association for esrd susceptibility , with the strongest signal located in the intronic region of the of pvt1 gene . some of these findings were also replicated in an ethnically different population with t1d . in a gwas performed on a large cohort of african americans with t2d and esrd , five gene regions with evidence of association with dn were detected , nominally , sash1 , rps12 , auh , msrb3-hmga2 , and limk2-sfi1 . some of these snps however were later found to contribute to all - cause esrd . in order to establish a comprehensive , well - defined dna biobank for the genotyping of dn in t1d in particular , the first results of this genome - wide scan were reported by pezzolesi et al . in 2009 . authors claimed that although no snp achieved genome - wide significance , strong association was found near the 4.1 protein ezrin , radixin , and moesin [ ferm ] domain containing 3 ( frmd3 ) locus and near the cysteinyl - trna synthetase ( cars ) locus . further studies confirmed the 9q21.32 region ( upstream of frmd3 ) as a susceptibility locus for t2dn in several unrelated study populations [ 20 , 21 ] . despite all the effort currently invested into this field of research , at present it is still impossible to predict those diabetic patients with a higher risk for developing dn . 
indeed , in almost all the studies published so far on dn susceptibility , diagnosis was based almost exclusively on the presence of hyperglycemia and proteinuria ; therefore , it is not possible to exclude that the inconsistencies among the findings could be linked to a misclassification of the renal damage in the diabetic population . the transcriptome represents the part of genome that is transcribed and includes both coding and noncoding rna molecules . when studying the transcriptome , as for genetic studies , either targeted or genome - wide approaches can be used . rna - sequencing ( rna - seq ) , arrays , and quantitative pcr ( qpcr ) are the techniques employed routinely to assess rna expression . qpcr is very sensitive and even subtle changes can be detected precisely ; arrays on the other hand are very high - throughput but also less sensitive . rna - seq takes advantage of the recent next - generation sequencing platforms and it has rapidly become the method of choice for transcriptome profiling . the main advantages of rna - seq are its very high resolution ( down to a single nucleotide ) , its potential to detect novel transcripts , its ability to measure either primary transcripts or spliced mature mrnas . given the plethora of gene expression data available in the literature , only the research on dn kidney tissue or urine will be discussed . all the coding and noncoding rna markers cited in this paper the first transcriptomic signature of dn kidney was published in 2004 . using an array - based approach , baelde et al . the results of this genome - wide analysis indicated that 96 genes were upregulated in t2dn , including aquaporin 1 ( aqp1 ) , calpain 3 ( capn3 ) , hyaluronoglucosidase , and platelet / endothelial cell adhesion molecule ( pecam-1 ) . 
over 500 genes were downregulated , including bone morphogenetic protein 2 ( bmp2 ) , vascular endothelial growth factor ( vegf ) , fibroblast growth factor 1 ( igf-1 ) , insulin - like growth factor binding protein 2 ( igfbp-2 ) , and nephrin . in the same manuscript , authors confirmed reduced expression of vegf and nephrin in renal biopsy specimens from additional dn patients at both the protein and rna levels . to explain the existing inconsistencies between human and murine progressive dn , microdissected biopsies from controls , early and progressive t2dn patients underwent global gene expression profiling through microarray hybridization . preliminary results , later confirmed using qpcr , revealed an upregulation of jak-2 and a compromised expression of several members within the jak / stat signaling pathway which could not be detected in either db / db c57blks or diabetic stz - treated dba/2j mice . more recently , woroniecka et al . performed the transcriptome analysis on microdissected kidney biopsies from dn patients , healthy living transplant donors , and patients undergoing tumor nephrectomies ( analyzing the histologically normal kidney tissue ) . the microarray - derived expression profiles indicated that several podocyte - specific transcripts were downregulated , including plce1 , ptgds , nphs1 , nphs2 , synpo , pla2r1 , wt1 , clic5 , and podxl . glomerular transcripts showing upregulation included igh , c3 , col1a2 , cxcl6 , and col6a3 . in the tubular compartment instead , authors detected increased expression of different transcripts including igh , igl , col1a2 , and col3a1 . several reports analyzed the gene expression of both the glomerular and tubular compartments of t2dn kidney biopsies . among the mrna transcripts detected as enriched in the glomerular compartment of t2dn individuals are mrp8 , wnt1 , wnt2b , wnt4 , wnt6 , wnt16 , dkk3 , and lef1 , pkc , fsp1 , angptl2 , and ace . 
decreased expression was reported for ace2 , vegf [ 64 , 106 ] , ctgf , nephrin , podocin , and wt1 . in tubule - rich renal biopsies from patients with t2dn , ihg-1 , il6 , ccl2 , cd68 , and ccr5 were increased , while tlr4 was overexpressed in both glomeruli and tubules of microalbuminuric and overt dn . using biopsy material collected by the european renal cdna bank , the gene expression of tubulointerstitial mrna from human dn kidneys was compared to that of living donors , cadaveric donors , and patients with minimal change disease through a combined microarray profiling and qpcr validation approach . results indicated dysregulation of specific nf-κb targets , highlighting the existence of an inflammatory signature characteristic of progressive dn . eight genes in particular were induced in t1dn and t2dn relative to controls : ccl5/rantes , cxcl10/ip10 , edn1 , vcam1 , hla - a , hla - b , ifnb1 , and b2m . further work performed using the european renal cdna bank material highlighted additional mrna transcripts as dysregulated in t2dn kidney when compared to normal tissue . within the glomerular compartment in particular , nrp1 and nrp2 were significantly lower in t2dn , while smpdl3b was increased . within the tubulointerstitial compartment , upregulation was reported for mmp7 and fgf-2 , for the unfolded protein response genes hspa5 , hyou1 , and xbp1 , and for the apoptosis - related genes trail and opg . the expression of several transcripts was also assessed on whole t2dn kidney tissue . upregulated mrnas included hdac2 , hdac4 , and hdac5 , b7 - 1 , stat1 , tnfaip8 and tipe2 , prkc - beta , vegf , uii and ut , pdgf - a and pdgf - b , lox1 , ldlr , and cd36 , jagged1/hes1 , and gremlin [ 45 , 46 ] . decreased transcription was detected for autophagy - related genes beclin 1 , lc3 [ 33 , 34 ] and atg7 , cxcl16 , abca1 , abcg1 , and apoe , timp3 , foxo1 and foxo3a , atg5 , and atg8 , ankrd56 and entpd8 , and nephrin . 
in other works the study design was developed to compare t2dn with other glomerulopathies . using a qpcr - based approach , the tubulointerstitial compartment isolated from kidney biopsies of dn patients , living donors , and minimal change disease patients was profiled specifically for the expression of 202 candidate genes involved in molecular pathways contributing to dn progression . results showed a decreased expression of vegf and egf , while collagens i and iv , fibronectin 1 , and vimentin as well as matrix metalloproteinases 2 , 7 , and 14 and tissue inhibitor of metalloproteinases 1 and 3 were increased . in another study , increased irs2 mrna was detected in dn patients compared to controls , while no significant changes in irs2 expression were present in biopsies from patients with focal - segmental glomerulosclerosis or membranous nephropathy . low expression of robo2 mrna was present in dn compared to nephrosclerosis , focal - segmental glomerulosclerosis , membranous nephropathy , and control pretransplant biopsies . a strong specific induction of col8a1 and col8a2 mrna expression was found in both glomerular and tubular compartments of biopsies from patients with t2dn versus control pretransplant biopsies , benign nephrosclerosis , and focal - segmental glomerulosclerosis . finally , increased ace expression was observed in t2dn biopsies compared to benign nephrosclerosis , minimal change nephrotic syndrome , and lupus nephritis . aiming to develop a diagnostic tool for early dn diagnosis , zheng et al . designed a pcr - array platform to detect expression changes in 88 genes simultaneously and employed it in a pilot study where the urinary sediment of dn patients was assayed . the authors found that several mrnas were significantly increased in dn compared to healthy controls , in particular , notch3 , actn4 , cdh2 , ace , fat1 , col4a1 , synpo , and twist1 . 
increased mrna levels of podocalyxin , cd2-ap , nephrin , wt-1 , α-actinin 4 , podocin , and synaptopodin [ 28 , 29 ] were found in the dn group compared with controls . finally , in another work , the authors claim that urinary expression of nephrin and podocin was useful for distinguishing diagnostic groups ( iga nephropathy , minimal change disease , and membranous nephropathy ) as well as predicting renal function decline . until a few years ago , the molecular profiling of dn was mainly focused on the characterization of mrna transcripts . over the last decade , however , much interest has converged toward the profiling of noncoding rna ( ncrna ) molecules . the ability of ncrnas to modulate gene expression , along with the discovery that they can be detected in biofluids and are fairly stable , makes them ideal biomarker candidates . micrornas ( mirnas ) are probably the most studied ncrnas ; they are short , single - stranded , highly conserved , and tissue - specific . the partial match binding feature allows mirnas to bind hundreds of targets simultaneously ; accordingly , the dysregulation of even a single mirna molecule can profoundly influence the gene expression profile of the surrounding environment . for a complete review of mirna biogenesis and function refer to [ 107 , 108 ] . in the field of dn , the majority of mirna profiling studies were performed on cellular and animal models . more recently , with the surprising discovery that mirnas can be released and carried into the extracellular environment , different body fluids are being characterized for their mirna content . initially identified in a mouse model of dn , mir-192 , along with mir-377 , mir-337 , and mir-129 , was later discovered as being enriched in human mesangial cells ( mcs ) exposed to high glucose . 
interestingly , when assessing mir-192 in human dn kidney , expression levels not only are reduced but also inversely correlate with severity of kidney disease , raising once again the issue about the appropriateness of the currently available animal models for dn . mir-21 has recently emerged as a marker for fibrosis in many complications [ 110 , 111 ] ; unsurprisingly , increased mir-21 expression was also detected in human t2dn kidney biopsies relative to healthy controls . except for the previously mentioned dn kidney profiling from krupa et al . , the array - based mirnome analysis of t2dn kidneys was recently published by huang et al . and uncovered mir-155 and mir-146a enrichment in these samples . these two are the only works describing the mirnome of human dn kidney ; noteworthy , the existence of strict renal biopsy policies in most nephrology clinics might be a limiting factor in terms of sample collection and availability . in parallel , the urgent need for novel biomarkers of diagnosis and progression shifted priority to the profiling of more accessible samples , such as biological fluids . using a qpcr based approach , argyropoulos et al . were the first to perform the urinary mirna profiling of t1d patients with and without proteinuria . results showed that mir-323b-5p , mir-221 - 3p , mir-524 - 5p , and mir-188 - 3p were underexpressed in albuminuric relative to nonalbuminuric patients , while mir-214 - 3p , mir-92b-5p , hsa - mir-765 , hsa - mir-429 , mir-373 - 5p , mir-1913 , and mir-638 were overexpressed . in a similar study performed on the rna content of urinary exosomes , authors showed that mir-130a and mir-145 were enriched in t1d patients with microalbuminuria compared to normoalbuminuric subjects , while mir-155 and mir-424 were reduced . 
in a work aimed to determine the urinary levels of all mir-29 family members ( mir-29a , mir-29b , and mir-29c ) , mir-29a was significantly increased in albuminuric t2dn patients compared to normoalbuminuric patients and it also correlated with the degree of albuminuria . in the work from szeto et al . , when comparing the urinary sediment of patients with either iga nephropathy , dn , or hypertensive nephrosclerosis , mir-15 was decreased in dn samples compared to other groups . similarly , in another work authors found that mir-192 levels were reduced in urinary sediment of dn patients compared to both healthy controls and patients with either minimal change nephropathy , focal glomerulosclerosis , membranous nephropathy , or other diagnosis groups . mirnas expression was also measured in venous blood from t2d han chinese patients with and without albuminuria . using a microarray - based approach , authors identified several differentially expressed mirnas in the different study population and confirmed mirna let-7a downregulation using qpcr . very interestingly , authors also observed how the distribution of a specific variant within let-7a ( rs1143770 ) was significantly higher in diabetic patients ( with and without albuminuria ) relative to control subjects . finally , dysregulation of a new class of noncoding rna molecules has emerged as being potentially involved in different complications , including kidney disease . among these noncoding rna molecules , recent effort aims to characterize the so - called long noncoding rnas ( lncrnas ) . this led to the initial assumption that lncrnas were not biologically relevant . today we know that lncrnas contain individual domains and structural motifs that allow them to specifically associate with dna , rna , and/or protein and thus regulate their function . 
as previously discussed , multiple lines of experimental evidence , from different ethnic populations , suggested a link between diabetic kidney disease and genetic variants within the pvt1 locus [ 22 , 23 ] . pvt1 , whose increase is significant in mesangial cells stimulated with high glucose , can induce the expression of plasminogen activator inhibitor 1 ( pai-1 ) and transforming growth factor beta 1 ( tgf-β1 ) . notably , six different mirnas are encoded within the pvt1 gene ; therefore , the authors investigated whether an alteration in pai-1 and tgf-β1 gene expression was ascribable to the pvt1 lncrna transcript itself or whether it was the result of a mutation within the mirnas encoded in the pvt1 gene . results showed that both pvt1 lncrna and mir-1207 - 5p were induced by high glucose independently and that they both contributed to ecm accumulation in the kidney . the term epigenetics refers to all those dynamic structural changes that , while not resulting from an alteration in the dna sequence , affect gene expression and can be inherited . epigenetic modifications , such as dna methylation , histone methylation , and histone acetylation , modify the accessibility of the chromatin and thus modulate transcription . they are responsible for the phenotypic differences within cell types and explain why the gene expression profile of an organism can change so profoundly during development . unlike genetics , epigenetics is highly susceptible to influences from the environment ; therefore , the understanding of its regulatory machinery offers an incredible opportunity for disease management . the study of epigenetics in diabetic kidney disease is still in its embryonic phase , although increasing evidence indicates metabolic memory as a consequence of long - lasting epigenetic modifications contributing to dn progression . in 2007 geisel et al . 
analyzed the promoter methylation of the stress response protein p66shc , previously shown to increase susceptibility to oxidative stress and atherosclerosis . in peripheral blood mononuclear cells isolated from esrd patients and control subjects , authors demonstrated that increased p66shc expression in esrd group was linked to a significant reduction in the methylation of its promoter region . using an array based approach , the genome - wide promoter dna methylation of 192 t1d patients was analyzed searching for any possible association with dn . the analysis was conducted using dna extracted from peripheral blood cells as these include the t cell population responsible for islet beta cells destruction in t1d . importantly , among the several cpg islands showing correlation with dn development , results uncovered one in particular ( rs10081672 ) , located upstream of the unc13b gene . additionally , this region is in strong linkage disequilibrium with rs13293564 , a variant associated with dn susceptibility . importantly , depending on which allele is present in rs10081672 , a cpg site is either created or abrogated , thereby affecting transcription factor binding . in another work , the genome - wide dna methylation of diabetic patients with esrd and diabetic patients without nephropathy was compared with the aim to identify novel disease biomarkers for noninvasive diagnosis . patients ' saliva was employed as starting material for dna extraction while the study population included african americans and hispanic individuals . results highlighted differential methylation at two or more cpg sites in 187 genes between the two groups . 
interestingly , many of these genes are involved in inflammation , oxidative stress , ubiquitination , fibrosis , and drug metabolism , and some in particular are even known for their genetic association with dn , suggesting once again a very close connection between genetic dysregulation and epigenetic dysregulation in the pathogenesis of dn . a recent paper from hasegawa et al . demonstrated that sirt1 , a protein deacetylase that targets histones and transcription factors , is reduced in stz - treated mice . using a transgenic mouse model authors also elucidated the interaction between sirt1 expression and cpg methylation of cldn1 , a gene encoding for the protein claudin-1 . claudin-1 is a tight junction protein involved in cell - to - cell adhesion and authors suggest that its epigenetic - mediated induction is responsible for podocyte effacement and proteinuria . in support of this hypothesis authors also revealed the correlation between proteinuria and sirt1 expression in human dn kidney . finally , reddy et al . elegantly demonstrated the link between the protective effect of angiotensin ii receptor antagonist , losartan , and its ability to reverse specific epigenetic modifications in the glomeruli of diabetic db / db mice . all these experimental evidences show that epigenetics holds the potential to allow a temporary and reversible manipulation of the gene expression , conferring protection from disease progression . the proteome probably represents the most complete expression of the potentialities of a living organism since it focuses on the set of proteins , expressed by the genome , that regulate biological and metabolic cell function . 
proteomics , formally defined as the large - scale , mass spectrometry - based analysis of the proteome , is a complex and interdisciplinary field requiring expertise spanning from chemistry to biology and bioinformatics , in order to reveal the meaning of complex protein datasets of a biological sample in physiological and pathological conditions . unlike genomics studies , which are based on the analysis of biological samples that may be expanded artificially , making complex studies from little starting material possible , proteomics requires a larger amount of starting sample , which is more easily available in biological fluids than in tissues or cells . for this reason , proteomic studies in nephrology are more oriented to the analysis of biological fluids and have led , in the last decade , to the identification of a number of putative biomarkers that are expected to enter clinical practice shortly . in the next paragraphs we will discuss the main applications of proteomics to the identification of new potential biomarkers of dn in kidney tissues and biological fluids , with a special emphasis on the emerging potential of post - translational modification ( ptm ) screenings . glomerular damage plays a critical role in the onset of dn , making this renal compartment a key target for proteomic investigation . however , only a few proteomic studies have been carried out on isolated glomeruli since , in general , renal biopsy is rarely carried out on diabetic patients and the number of isolated glomeruli , when starting from biopsy material , is too scarce to produce homogeneous preparations of individual specimens and to extract adequate glomerular protein amounts for deep proteomic studies . recent methodological improvements have now permitted the extraction of intact and unmodified proteins from formalin - fixed paraffin - embedded ( ffpe ) samples , thus making a vast archive of kidney tissues available for proteomic analysis . 
proteomic analysis of isolated glomeruli , obtained by laser capture microdissection ( lcm ) , allowed the identification of over 100 differentially expressed tissue proteins between dn and nondiabetic glomeruli . notably , the results of this study probably underestimate the differences in the glomerular proteome since it was carried out on ffpe tissues derived from autopsy cases undergoing postmortem proteolysis . however , among the differentially expressed proteins , nephronectin , a protein implicated in the assembly of extracellular matrix and nephrogenesis , was confirmed as differentially expressed in dn tissue specimens using immunohistochemistry . a similar study reported increased expression of c3 and the membrane attack complex ( c5b-9 ) and a marked reduction of podocyte - associated proteins and antioxidant proteins in dn . even if these proof - of - concept studies demonstrate the usefulness of ffpe tissue proteomics , the potential of this approach is still limited by the poor availability of tissue specimens , which hampers the identification of the key molecular events involved in the onset and progression of dn . biofluids encompass any liquid originating from inside the body of a living organism . among the body fluids proteomics colleagues reported , in urine of t1d patients with dn , a panel of 65 urine biomarkers , mainly composed of collagen fragments , that was further validated in a multicentre independent cohort of t2 dm patients [ 124 , 125 ] . expanded the 65 - peptide classifier to 273 and demonstrated its ability to predict the occurrence of microalbuminuria in t1d and t2 dm normoalbuminuric patients [ 126 , 127 ] . 
these data were recently confirmed in another independent study that specifically identified subsets of urine biomarkers able to predict the transition from normo- to microalbuminuria or from micro- to macroalbuminuria , indicating that the appearance of collagen fragments in the urine of t2 dm patients may have both diagnostic and prognostic value . lc / ms / ms analysis of 22 t1d normoalbuminuric patients developing microalbuminuria after a median follow - up of 6 years allowed the identification of a set of potential predictive biomarkers that were further validated by elisa . of note , the introduction of these proteomic biomarkers ( thp , progranulin , alpha-1-glycoprotein , and clusterin ) into the baseline model that included diabetes duration , baseline albumin excretion rate ( aer ) , hba1c , cystatin c , and uric acid improved the prediction of renal function worsening from 84% to 89% . jin et al . used isobaric tags for relative and absolute quantification ( itraq ) and lc / ms / ms to quantify and identify a set of urinary proteins differentially excreted between normoalbuminuric and microalbuminuric t2 dm patients . three protein biomarkers , namely , alpha-1-antitrypsin , alpha-1-acid glycoprotein 1 , and prostate stem cell antigen , were included in a multiplex assay that was able to correctly classify normoalbuminuric and microalbuminuric t2 dm patients with about 92% accuracy . two mass peaks corresponding to b2-microglobulin and ubiquitin ribosomal fusion protein that were selectively and differently excreted in nephropathic diabetic patients . we further refined this study by selecting only diabetic patients with biopsy - proven kimmelstiel - wilson lesions and identifying both urinary b2-microglobulin and free ubiquitin as specific biomarkers of diabetic glomerulosclerosis over other nondiabetic kidney lesions . 
although the overall analysis of the urine proteome has so far been the most used way to search for disease - specific biomarkers , the future of this field will be the analysis of well - purified protein subfractions , since it may provide more detailed information about simplified proteomes and potentially improve the knowledge of specific pathways . until a few years ago , the most useful way to reduce proteome complexity was the selective antibody - based depletion of the most abundant proteins . in the last few years , the enrichment of post - translationally modified proteins has become a new strategy to highlight functionally interesting proteins . protein phosphorylation is a key player in the regulation of most cell pathways ; thus , phosphoproteome screening of urine samples may represent a precious source of information about deregulated cell processes in many kidney diseases including dn . however , up to now , urine phosphoproteome analysis has not been applied to soluble proteins in dn and other ckd , probably because most of the historical collections of urine samples have not been prepared and stored in the presence of phosphatase inhibitors that , by counteracting the lability of these ptms , may ensure more reproducible results . conversely , the analysis of the microvesicular fraction ( i.e. , exosomes ) that originates from renal epithelial cells and is released into urine may be , at the moment , more useful to study this kind of ptm , as the presence of the exosomal membrane may preserve ptms by protecting the protein content from spontaneous degradation and dephosphorylation by proteases or phosphatases , respectively . have already published the first proteomic study on urine exosomes of dn patients , demonstrating the potential of this microvesicular screening for identifying dn - specific biomarkers . 
specifically , 3 of the 25 most significant differentially expressed proteins , namely , voltage - dependent anion - selective channel protein 1 ( vdac1 ) , isoform 1 of histone - lysine n - methyltransferase mll3 , and alpha-1-microglobulin / bikunin precursor ( amb ) , were also validated . of note , mll3 , a specific tag for epigenetic transcriptional activation , was detected only in dn exosomes , thus emphasizing the potential importance of epigenetic mechanisms in the pathophysiology of dn . it is reasonable to expect the forthcoming application of exosome phosphoproteomics as a new way to identify specific deregulated patterns in kidney diseases . as for phosphoproteomics , only one paper has applied this approach to the study of ckd , identifying a number of urinary proteins involved in immune / stress response and many biological functions like homeostasis , platelet degranulation and coagulation , transport , and secretion . due to the importance of glycoproteomics in cell - cell interaction and signalling cascades , it is reasonable to expect that many further studies will be planned in the coming years to understand , by screening this specific subset of proteins , the molecular mechanisms involved in damage progression of specific nephropathies including dn . interestingly , the usefulness of glycoproteomics for the diagnosis of dn has been recently reported in plasma , where thirteen significantly upregulated glycoproteins were described in dn patients compared to t2 dm patients without nephropathy . among these , increased plasma levels of glycated lumican , vasorin , and retinol binding protein-4 were validated by immunoblotting and showed potential specificity for dn . 
by using a different proteomic strategy , kim and coworkers reported that increased plasma levels of glycated pedf , apolipoprotein j precursor , hemopexin , immunoglobulin mu heavy chain , and immunoglobulin kappa chain correlated with poor glycaemic control in t2 dm patients while glycated prekallikrein and complement factor c4b3 correlated with microalbuminuria and other glycated proteins such as hemopexin precursor , serine proteinase inhibitor , alpha-1-antitrypsin , and haptoglobin - related protein were associated with dn . these studies confirmed the potentiality of the plasma glycoproteome for the identification of reliable biomarkers of dn and their importance is emphasized by the consideration that the overall analysis of serum / plasma proteome is challenging because the candidate biomarkers are generally present in trace amounts . of note , there is an alternative way to reduce the complexity of this biological fluid , namely , the prefractionation of the samples , achieved by several known strategies before the analysis , that allow removing the large background of nonrelevant and abundant proteins and may favour the discovery of potential candidate biomarkers . up to now only few studies have used this approach to analyse the serum [ 89 , 132 ] or plasma proteome of t2 dm patients . these studies have reported extracellular glutathione peroxidase ( egpx ) and apolipoprotein ( apoe ) as potential diagnostic biomarkers of dn and vitamin d - binding protein ( dbp ) as early biomarker of renal damage in t2 dm . overall many independent studies are showing an increasing number of new biomarkers that are potentially useful for both early diagnosis and monitoring of the disease and to understand ever more deeply its pathogenesis . metabolomics is a systematic evaluation of small molecules ( i.e. , metabolites ) that may provide fundamental biochemical insights into disease pathways , drug toxicity , and gene function . 
metabolomics profiling is generally carried out by nuclear magnetic resonance ( nmr ) and ms - based profiling , each with advantages and limitations . two main strategies may be adopted for metabolomics analysis of biological samples : targeted and untargeted profiling . targeted profiling focuses only on sets of a few metabolites , generally included in specific metabolic pathways , while untargeted analysis provides a comprehensive evaluation of the metabolome without any a priori hypothesis on the metabolic pathways . targeted analysis is an essential tool for the investigation of biological mechanisms rather than for biomarker discovery ; in fact , it is a quantitative approach that allows quantification of each metabolite of a metabolic pathway of interest through the use of isotope - labelled standards . the untargeted approach is instead more suitable for biomarker discovery since the whole metabolic profile of cases and controls may allow identification of disease - correlated biomarkers . clearly , the latter approach needs , as for proteomics , further data analysis through supervised statistical methods in order to construct disease - specific metabolomics classifiers that are further characterized by mass spectrometry . in recent years , the optimization of separation techniques has allowed the selective purification of specific classes of metabolites such as phospholipids and fatty acids , leading to the development of new , more focused untargeted analyses such as phospholipidomics . as for proteomics , most of the metabolomics studies have been carried out on biofluids , namely , urine and serum / plasma . urine metabolomics may offer direct insights into biochemical pathways linked to kidney dysfunction since a variety of metabolites are concentrated by the kidney and excreted in urine . used targeted analysis to investigate the urinary excretion of 94 metabolites in healthy subjects ( hs ) and t2 dm patients with ( dm+ckd ) or without ( dm - ckd ) ckd . 
thirteen metabolites differently excreted between t2 dm patients and hs were also useful to differentiate dm+ckd from dm - ckd . interestingly , 5 out of 13 metabolites were differently excreted between dn and other ckd , thus being specifically associated with diabetic kidney disease , while 8 out of 13 reflected metabolic changes shared by diabetic and nondiabetic ckd . most of the less excreted metabolites in the dn group were water - soluble organic anions , and functional analysis correlated them with impaired mitochondrial function in dn . very recently , pena and colleagues carried out an untargeted analysis of the urine and plasma metabolome by gc - ms and reported the possible usefulness of a set of metabolites to predict the development of dn on top of the traditional renal risk markers , namely , baseline urinary albumin excretion and baseline estimated glomerular filtration rate . in this prospective study , 24 normo- to microalbuminuria case / control pairs and 21 micro- to macroalbuminuria case / control pairs were enrolled . the metabolomic profiles of micro- to macroalbuminuria case / control pairs showed significant differences , while no differences emerged for normo- to microalbuminuria pairs . specifically , they reported two plasma metabolites ( butenoylcarnitine and histidine ) and three urine metabolites ( hexose , glutamine , and tyrosine ) that differed significantly in microalbuminuric patients prone to develop macroalbuminuria . the area under the receiver operating characteristic ( roc ) curve , after adding these urine and plasma metabolites to a reference model based on baseline egfr and urine albumin excretion , rose from 84% to 99% . although these results appear impressive , as the authors suggest , they still need to be treated with care until a validation study on larger and independent cohorts is set up . 
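The comparison above, between a reference model and the same model with metabolites added, rests on the area under the ROC curve. As a minimal sketch of how that quantity can be computed, the function below uses the pairwise (Mann-Whitney) formulation: the AUC is the fraction of case/control pairs in which the case receives the higher score, with ties counted as half. The labels and scores are invented toy values, not data from the study, and the names `roc_auc`, `baseline_scores`, and `combined_scores` are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise case/control comparison."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # cases (progressors)
    neg = [s for y, s in zip(labels, scores) if y == 0]  # controls
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 1 = progressed to macroalbuminuria, 0 = did not.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical risk scores from a reference model (e.g. egfr + albumin excretion).
baseline_scores = [0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
# Hypothetical risk scores after adding metabolite measurements to the model.
combined_scores = [0.8, 0.6, 0.9, 0.5, 0.3, 0.2]

auc_baseline = roc_auc(labels, baseline_scores)
auc_combined = roc_auc(labels, combined_scores)
print(f"baseline AUC: {auc_baseline:.2f}, combined AUC: {auc_combined:.2f}")
```

On such a small toy set the improvement is exaggerated, which mirrors the caution expressed above: an AUC estimated on a few dozen case/control pairs needs confirmation in a larger independent cohort.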
some of the identified metabolites may have a direct link with the pathophysiology of diabetes and its chronic complications since , for example , butenoylcarnitine plasma accumulation has been related to the excessive yet incomplete mitochondrial oxidation of fatty acids , possibly attributable to a lower mitochondrial number and reduced oxidation capacity in t2d tissues , while histidine , a modulator of inflammation and oxidative stress , may be correlated with impaired inflammation and oxidative stress in t2 dm and ckd patients . it is worth noting that both studies stressed the importance of mitochondrial dysregulation in the pathogenesis of dn . urine metabolomics has also been applied to type 1 diabetic patients in order to identify predictive biomarkers of renal function worsening . metabolite profiling of baseline 24 h urine samples of 52 type 1 diabetic patients ( 26 stable normoalbuminuric and 26 who progressed toward microalbuminuria over 5.5 years of follow - up ) was carried out by lc / ms and gc - ms . multivariate logistic regression analysis of the gc - ms and lc / ms datasets showed 65% and 75% predictive power after cross - validation , respectively . twenty - one and 14 compounds showed a significant contribution to the logistic regression model based on the gc - ms and lc / ms datasets , respectively . most of the identified gc - ms compounds were carboxylic compounds , acidic metabolites , and endogenous amino acids not showing a documented direct relation to dn , while the lc / ms dataset revealed specific compounds related to impaired fatty acid metabolism , the detoxification system , and the gut microbiome . serum and plasma metabolomics has been carried out on both whole samples and specific subfractions . 
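The cross-validated logistic regression mentioned above for the urine gc - ms and lc / ms datasets can be illustrated with a small sketch. It fits a logistic model by plain gradient descent and estimates predictive power by leave-one-out cross-validation, where each sample is predicted by a model trained on all the others. This is only an illustration under stated assumptions: the source does not say which cross-validation scheme or software was used, the two-feature dataset is invented and deliberately well separated, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by full-batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # err is (predicted probability - true label) for this sample
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (progressor) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def loo_accuracy(X, y):
    """Leave-one-out cross-validation: train on n-1 samples, test on the held-out one."""
    correct = 0
    for i in range(len(X)):
        w, b = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(w, b, X[i]) == y[i]
    return correct / len(X)


# Toy data (assumption): standardized levels of two hypothetical metabolites.
X = [[2.0, 1.5], [1.5, 2.0], [2.5, 2.5], [-2.0, -1.5], [-1.5, -2.0], [-2.5, -2.5]]
y = [1, 1, 1, 0, 0, 0]  # 1 = progressed to microalbuminuria, 0 = stable

print(f"leave-one-out accuracy: {loo_accuracy(X, y):.2f}")
```

Because the held-out sample never contributes to the fit, the resulting accuracy is a less optimistic estimate than accuracy on the training data, which is why cross-validated figures such as those quoted above are the ones worth reporting.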
marrachelli and coworkers performed both genomic and metabolomic screening of over 1500 caucasian t2 dm patients , characterized the serum metabolome profile of the microalbuminuric patients by nuclear magnetic resonance ( nmr ) , and correlated it with specific genotypes , thus reporting a potential predictive value of the genotype on the onset of microalbuminuria in t2 dm . furthermore , hirayama et al . reported , in t2 dm patients , 19 serum metabolites including creatinine , aspartic acid , γ-butyrobetaine , citrulline , symmetric dimethylarginine ( sdma ) , kynurenine , azelaic acid , and galactaric acid that were positively correlated with albuminuria and negatively with egfr . multiple logistic regression , carried out on the identified metabolites , recognized 4 features , namely , aspartic acid , azelaic acid , galactaric acid , and sdma , as relevant for the model and allowed correct identification of dn patients with about 75% accuracy . zhang et al . carried out serum metabolomic profiling of 8 dn patients , 33 type 2 diabetes mellitus ( t2 dm ) patients , and 25 healthy volunteers in order to investigate the presence of dn biomarkers . importantly , they reported significant changes of leucine , dihydrosphingosine , and phytosphingosine specifically in the dn cohort , thus suggesting perturbations of amino acid metabolism and phospholipid metabolism as key events in diabetic disease . other authors have instead investigated specific subfractions of the metabolome , namely , compounds linked to purine and pyrimidine metabolism , phospholipids , and fatty acids . xia et al . standardized an analytical method for analysis and quantification of purine and pyrimidine metabolites in dn patients and matched healthy controls . 
consistent with the well - established association of the purine and pyrimidine metabolic pathway with the development of dn , they found that uric acid , xanthine , and adenosine were significantly increased in dn patients ( especially in those at stage v according to mogensen classification ) while inosine was reduced , probably as a result of inhibition of adenosine deaminase , the enzyme that catalyzes inosine formation from adenosine . several phospholipids ( pls ) , significantly upregulated or downregulated in disease models , have been already recognized as potential biomarkers of t2 dm or dn [ 138 , 139 ] . comprehensive and quantitative analysis of plasma pls , such as phosphatidylethanolamine , phosphatidylglycerol , phosphatidylcholine , phosphatidylinositol , phosphatidylserine , sphingomyelin , and lysophosphatidylcholine , may selectively distinguish t2 dm from dn patients . targeted quantification of the phospholipids revealed a proportional decrease of phosphatidylinositol and a linear increase of sphingomyelin in dn patients . although the molecular pathogenetic mechanisms leading to impaired metabolism of phospholipids are not clear , the authors suggest that reduced phosphatidylinositol may reflect increased sorbitol pathway activation in t2 dm while increased sphingomyelin may depend on glucocorticoid - mediated sphingolipid metabolism . plasma fatty acids ( fas ) may also have a direct impact on the occurrence and development of diabetes since their abnormal accumulation in parenchymal cells of multiple tissues , called lipotoxicity , has been suggested as a trigger of t2 dm and its chronic complications . specific metabolomics screening of fas , namely , lipidomics , may contribute to the understanding of this disease . han and colleagues reported a standardized method based on gas chromatography - mass spectrometry ( gc - ms ) useful for the specific assessment of nonesterified and esterified fatty acids ( nefas and efas , resp . ) . 
lipidomics screening of 150 patients including diabetics with and without nephropathy showed high discrimination power across the different stages of dn . disease progression was specifically correlated with plasma levels of arachidonic acid , which is involved in the anabolism of prostaglandins , thus suggesting a key role of the inflammatory processes in the progression of dn . as genetic studies conducted so far are still inconclusive , it is difficult to envisage a common genetic basis for the development of dn . quite possibly a number of environmental factors contribute significantly toward the evolution of the diabetic patient to this specific complication . however , there is no doubt that , from the earliest stages of the disease , many molecular changes , observed at the transcriptomics , proteomics , and metabolomics level , anticipate the onset of a clinical phenotype and may allow us to reconstruct in detail the pathogenetic basis of kidney damage in t2 dm . although new omics challenges such as the analysis of protein post - translational modifications and of multiprotein complexes , mimicking what naturally happens in intracellular behavior , will further broaden our understanding of dn pathogenesis , we are already able to identify the common thread that unites all the disparate molecular changes described in the literature by performing bioinformatic - based analysis of genes , transcripts , proteins , and metabolites described so far . we can envisage that the selection of specific omic biomarkers and clinical phenotypes might lead to a better stratification of each patient 's specific type of renal damage in t2 dm and might allow the identification of patients that progress or respond to a specific therapy . 
to accomplish this task and go forward , however , there is an urgent need to build up disease - specific platforms containing personal , clinical , and omics profiles that will allow the full potential application of systems biology analysis and the development of specific disease phenotype models . we can expect in the near future the development of new paradigms of renal damage in t2 dm that will contribute to defining the road to molecular medicine as a global , organized approach applicable to dn as well as to other relevant renal conditions .","<S> diabetic nephropathy ( dn ) , a microvascular complication occurring in approximately 20 - 40% of patients with type 2 diabetes mellitus ( t2 dm ) , is characterized by the progressive impairment of glomerular filtration and the development of kimmelstiel - wilson lesions leading to end - stage renal failure ( esrd ) . </S> <S> the causes and molecular mechanisms mediating the onset of t2 dm chronic complications are still sketchy and it is not clear why disease progression occurs only in some patients . </S> <S> we performed a systematic analysis of the most relevant studies investigating genetic susceptibility and specific transcriptomic , epigenetic , proteomic , and metabolomic patterns in order to summarize the most significant traits associated with the disease onset and progression . </S> <S> the picture that emerges is complex and fascinating as it includes the regulation / dysregulation of numerous biological processes , converging toward the activation of inflammatory processes , oxidative stress , remodeling of cellular function and morphology , and disturbance of metabolic pathways . </S> <S> the growing interest in the characterization of protein post - translational modifications and the importance of handling large datasets using a systems biology approach are also discussed . </S>"
3,"dental caries is one of the causes of tooth loss for all human beings across age and gender . numerous studies have been carried out , which have helped to increase our knowledge of dental caries and reduce the prevalence of dental caries . however , according to the world oral health report , dental caries still remains a major dental disease . the caries process is well understood as a process of alternating demineralization and remineralization of tooth mineral ( featherstone 1999 ) . the major shortcoming of currently available anti - caries products is the fact that their ability to remineralize enamel is limited by the low concentration of calcium and phosphate ions available in saliva . this has led to research into many new materials that can provide essential elements for remineralization . among them are bioactive glass ( bag ) and casein phosphopeptide - amorphous calcium phosphate ( cpp - acp ) . bag is a unique material with numerous novel features ; the most important feature is its ability to act as a biomimetic mineralizer , matching the body 's own mineralizing traits . cpp - acp is also a novel calcium phosphate remineralization technology that has been shown to promote remineralization of enamel subsurface lesions in various in - vitro and in - vivo studies . the in - vitro ph - cycling technique was introduced over 20 years ago to study the effect of caries - preventive regimens and treatments . many researchers have utilized and modified this ph - cycling model to suit their own studies to test different caries - preventive agents . hence the aim of this study was to evaluate and compare the remineralization potential of bag and cpp - acp on early enamel carious lesions . the null hypothesis for the present study was that there is no significant difference in the mean microhardness values of the two groups . a total of 30 healthy human premolars , which were extracted for orthodontic purposes , were collected . 
teeth were sectioned 1 mm below the cemento - enamel junction with a slow speed diamond disc . the roots were discarded and the crowns were used for the study . to remove variability in samples , 200 µm of surface enamel was removed from the buccal surface of all teeth with the help of 600 grit abrasive paper and confirmed with a digital caliper . a 4 mm × 4 mm working window was marked on the buccal surfaces of all the samples . group a : bag containing dentifrice ( shy - nm ; group pharmaceuticals ; india ) and group b : cpp - acp ( gc tooth mousse ; recaldent ; gccorp ; japan ) containing dentifrice . the area of the crown other than the working window was covered with nail varnish . b - smh was checked with vicker 's microhardness testing machine ( vmt ) for all the tooth samples in the area of the working window . the indentations were made with vmt at the rate of 25 gram load for 5 seconds . the average microhardness of the specimen was determined from 3 indentations to avoid any discrepancy . the buffered de-/re - mineralizing solutions were prepared using analytical grade chemicals and deionized water . the demineralizing solution contained 2.2 mm calcium chloride , 2.2 mm sodium phosphate , and 0.05 m acetic acid ; the ph was adjusted with 1 m potassium hydroxide to 4.4 . the remineralizing solution , which contained 1.5 mm calcium chloride , 0.9 mm sodium phosphate , and 0.15 m potassium chloride , had a ph of 7.0 . samples were kept in demineralizing solution for 96 h to produce the artificial carious lesion in the enamel . d - smh was checked with vmt , similarly as done for b - smh . dentifrice supernatants were prepared by suspending 12 g of the respective dentifrice in 36-ml deionized water to create a 1:3 dilution . the suspensions were thoroughly stirred with a stirring rod and mechanically agitated by means of a vortex mixer for 1 min . 
the suspensions were then centrifuged at 3500 rpm for 20 min at room temperature , once daily before starting the ph - cycling . the specimens were placed in the ph - cycling system on a cylindrical beaker for 10 days . each cycle involved 3 h of demineralization twice a day with a 2 h immersion in a remineralizing solution in between [ figure 1 ] . a 1-min treatment with a toothpaste solution of 3:1 deionized water to toothpaste , after centrifugation ( 5 ml / section ) , was given before the first demineralizing cycle and both before and after the second demineralizing cycle and sections were placed in a remineralizing solution overnight . r - smh was checked with vmt , similarly as done for b - smh . statistical analysis of data were conducted using anova and multiple comparisons within groups was done using bonferroni method ( post - hoc tests ) the decision criterion was to reject the null hypothesis if the p < 0.05 . if there was a significant difference between the groups , multiple comparisons ( post - hoc test ) using bonferroni test was carried out . 
table 1 gives us the results of comparison of microhardness within each group , from anova and the p value . mean microhardness in group a was found to be 385.93 at baseline , 300.40 after demineralization and 371.67 after remineralization . while in group b mean microhardness was found to be 385.73 at baseline , 314.73 after demineralization and 357.07 after remineralization . 
the changes in these mean values for group a and group b were found to be statistically significant from baseline to after demineralization ( p < 0.001 * * ) as well as from after demineralization to after remineralization ( p < 0.001 * * ) . [ table 1 : mean microhardness within the groups ] table 2 gives us the results of comparison of microhardness between group a and group b at different phases of the study . no significant difference in mean microhardness was observed between the groups at baseline ( p > 0.05 ) and after demineralization ( p > 0.05 ) . this indicates that the tooth samples were demineralized to almost the same level of hardness , hence giving a non - significant result on statistical analysis . [ table 2 : mean microhardness between the groups ] however , the difference in mean microhardness between the groups after remineralization was found to be statistically significant ( p < 0.05 ) , indicating changes in the mineralization of the tooth samples . despite major advances in the field of cariology , dental caries still remains a major problem affecting human population across the globe . however , the caries process is now well - understood ; much of it has been described extensively in the dental literature . early enamel carious lesion appears white because the normal translucency of the enamel is lost . even though initial enamel lesions have intact surfaces , they have a low mineral content at the surface layer when compared to sound enamel ; thus showing a lower hardness value at the surface than for sound enamel tissue . when there is acid attack on the tooth surface , the acids lower the surface ph and diffuse through the plaque , which causes loss of minerals from the enamel and dentine . this mineral loss compromises the mechanical structure of the tooth and leads to cavitation over a long period of time . the subsequent re - mineralization process is nearly the reverse . 
when oral ph returns to near neutral , ca and po4 ions in saliva incorporate themselves into the depleted mineral layers of enamel as new apatite . the demineralized zones in the crystal lattice act as nucleation sites for new mineral deposition . essentially , the sudden drop in ph following meals produces an under saturation of those essential ions ( ca and po4 ) in the plaque fluid with respect to tooth mineral . this promotes the dissolution of the enamel . at elevated ph , the ionic super - saturation of plaque shifts the equilibrium the other way , causing a mineral deposition in the tooth . over the course of human life , enamel and dentin undergo unlimited cycles of de - mineralization and re - mineralization . for many years however recent reviews have concluded that the decline in caries may be at an end or even in reversal , with levels increasing in some areas . thus , there is a need for developing new biomaterials which can act as an adjunct to the existing fluorides or can individually act as an agent for arresting the carious lesion and their remineralization . in the present study , we have compared the remineralization potential of two different dentifrices , which are not normally the key active ingredients in the dentifrices largely used . considering the importance of the surface layer in caries progression , the evaluation of changes in this region is relevant , thus smh measurement is a suitable technique for studying de - remineralization process . micro hardness measurement is appropriate for a material having fine microstructure , non - homogenous or prone to cracking like enamel . smh indentations provide a relatively simple , non - destructive and rapid method in demineralization and remineralization studies . rather than using the traditional ph - cycling method , a modified version was utilized in our study , in an attempt to simulate the real - life situation . 
this included a 3 h demineralizing cycle twice a day , with one 2 h and one intervening overnight remineralizing cycle , respectively . to replicate early morning , midday , and bedtime tooth brushing , toothpaste was applied thrice daily . the remineralizing solutions used in the study were created to replicate supersaturation by apatite minerals found in saliva and were similar to those previously utilized by ten cate and duijsters . even though all the specimens were sectioned from different teeth , the variations among them did not yield any major effect on the progression of demineralization . this was confirmed by the p - value obtained for all the hardness measurements ( p > 0.05 ) before the in - vitro ph cycling commenced . it was therefore reasonable to disregard such variations when analyzing the data after ph cycling . after the treatment regime with the respective dentifrices , an increase in mean microhardness was observed in both groups [ table 1 ] ; this is in accordance with various previous studies carried out for determining the remineralization potential of bag and cpp - acp . when compared , group a and group b showed a statistically significant difference after the ph - cycling regimen [ table 2 ] . bag is a ceramic material consisting of amorphous sodium - calcium - phosphosilicate which is highly reactive in water and as a fine particle size powder can physically occlude dentinal tubules . in the aqueous environment around the tooth , i.e. , saliva in the oral cavity , sodium ions from the bag particles rapidly exchange with hydrogen cations ( in the form of h3o+ ) and this brings about the release of calcium and phosphate ( po4 ) ions from the glass . a localized , transient increase in ph occurs during the initial exposure of the material to water due to the release of sodium . this increase in ph helps to precipitate the extra calcium and phosphate ions provided by the bag to form a calcium phosphate layer . 
as these reactions continue , this layer crystallizes into hydroxycarbonate apatite ( hca ) . unlike other calcium phosphate technologies , the ions that bag releases form hca ( a mineral that is chemically similar to natural tooth mineral ) directly , without the intermediate acp phase . these particles also attach to the tooth surface and continue to release ions and re - mineralize the tooth surface after the initial application . these particles have been shown , in in - vitro studies , to release ions and transform into hca for up to 2 weeks . a study was carried out to determine the remineralizing effects of bag on bleached enamel . it concluded that bag deposits were found on the enamel surface of all the specimens , suggesting that they may act as a reservoir of ions available for remineralization at sites of possible demineralization . the present study also revealed that cpp - acp remineralized enamel lesions in human enamel in - vitro . cpp - acp is a calcium phosphate - based delivery system containing high concentrations of calcium phosphate . the role of cpp - acp has been described as localizing the acp on the tooth surface and buffering the free calcium and phosphate ion activity , thereby helping to maintain a state of supersaturation . the cpp stabilizes the calcium and phosphate in a metastable solution , facilitating a high concentration of ca and po4 , which diffuse into the enamel lesion when cpp - acp comes in contact with the lesion . however , the lower hardness values for cpp - acp may be due to its amorphous nature : unlike bag , which attaches to the tooth , it does not adhere to the enamel surface and hence does not remineralize the tooth surface for a long enough period of time to enhance its hardness . 
a study that investigated the enamel remineralization potential of cpp - acp and bag showed , on scanning electron microscope analysis , that although both group samples had plugs that sealed the fissures formed by demineralization , the bag plug appeared to be more compact and intimately attached to the enamel surface . the deposits formed by cpp - acp were smaller and amorphous , while bag created larger , more angular deposits . this may also explain the higher values of hardness for bag as compared to cpp - acp in the current study , as bag attaches more intimately and compactly to the tooth surface . under the current in - vitro experimental conditions it can be concluded that both bag and cpp - acp are capable of remineralizing early carious lesions and also that bag has a better remineralization potential than cpp - acp .","<S> aims : the aim of this study was to evaluate and compare the remineralization potential of bioactive - glass ( bag ) ( novamin/calcium - sodium - phosphosilicate ) and casein phosphopeptide - amorphous calcium phosphate ( cpp - acp ) containing dentifrice . materials and methods : a total of 30 sound human premolars were decoronated , coated with nail varnish except for a 4 mm × 4 mm window on the buccal surface of crown and were randomly divided in two groups ( n = 15 ) . </S> <S> group a bag dentifrice and group b cpp - acp dentifrice . </S> <S> the baseline surface microhardness ( smh ) was measured for all the specimens using the vickers microhardness testing machine . </S> <S> artificial enamel carious lesions were created by inserting the specimens in de - mineralizing solution for 96 h. smh of demineralized specimens was evaluated . </S> <S> 10 days of ph - cycling regimen was carried out . 
</S> <S> smh of remineralized specimens was evaluated . statistical analysis : data was analyzed using anova and multiple comparisons within groups was done using bonferroni method ( post - hoc tests ) to detect significant differences at p < </S> <S> 0.05 levels . results : group a showed significantly higher values ( p < 0.05 ) when compared with the hardness values of group b . conclusions : within the limits ; the present study concluded that ; both bag and cpp - acp are effective in remineralizing early enamel caries . </S> <S> application of bag more effectively remineralized the carious lesion when compared with cpp - acp . </S>"
4,"the objective signs and subjective symptoms of dry eye fluctuate greatly across patient populations , creating a significant challenge for the clinician to diagnose and treat the condition effectively . a more recent area of study in dry eye is the implications for visual function . the decreased blink rate experienced during visual function tasks ( eg , extended computer use , reading , video gaming , watching tv ) can exacerbate dry eye and the signs and symptoms of dry eye ( eg , blurred vision , ocular surface staining , short tear film break - up time [ tfbut ] ) , which , in turn , can limit patients ' visual functioning capabilities . not surprisingly , this reciprocal relationship between dry eye and visual tasking can confound the clinician 's diagnosis and treatment even further . patient complaints of the effect of dry eye on visual function may include difficulty driving , reading , and watching television.1 during each blink of the eye , the action of the upper lid serves the essential purpose of reestablishing the tear film over the ocular surface epithelium.2 the tear film instability and increased rates of tear film break - up following a blink that are characteristic of dry eye can be caused by insufficiencies in any of the three major tear film components : lipid , mucin , and aqueous.3,4 ocular surface protection is contingent upon the patient 's tfbut matching or exceeding his or her inter - blink interval ( ibi).5 blink rate is thereby closely integrated with ocular surface integrity . the effects of dry eye on visual function can be assessed by testing the visual acuity degradation during the ibi . the inter - blink interval visual acuity decay ( ivad ) test is a novel diagnostic tool that evaluates functional visual acuity between blinks . c s at individualized best - corrected visual acuity ( bcva ) between blinks and measures parameters based on patient responses . 
during the test , the rotating optotype c is presented and the subject responds using a keypad to indicate the direction of the c.1 by using a standardized task for all patients before and after treatment , the test is designed to establish an accurate representation of the effects of dry eye treatments on visual function . a new artificial tear , systane ultra lubricant eye drops ( alcon , inc . , fort worth , tx , usa ) , contains polyethylene glycol 400 ( peg 400 ) and propylene glycol ( pg ) as active demulcents , gelling agent hp - guar , sorbitol , and preservative polyquad . the viscosity is optimized in the bottle with sorbitol to allow for efficient ocular spreading with minimal blur . upon instillation , the solution then interacts with the mucins and divalent ions of the natural tear film and releases sorbitol , allowing enhanced cross - linking of borate with hp - guar , and forming a matrix on the ocular surface.6 optive ( allergan , inc . , irvine , ca , usa ) contains carboxymethylcellulose sodium ( cmc ) and glycerin , and is preserved with purite.7 the drops were chosen as comparators because they contain different polymeric formulations and target similar dry eye populations . over - the - counter ( otc ) artificial tears are the mainstay in dry eye treatment , and impact the visual function capabilities of patients . clinical studies of artificial tears have demonstrated differences in blurring effects upon instillation between marketed products , as well as the relationship of residence time to this blurring upon - instillation.8,9 knowledge of the considerable impact of dry eye on visual function and the implications for quality - of - life led to further investigation of the effects of dry eye treatment on visual function . the present study was designed to evaluate the effect of two lubricant eye drops ( systane ultra and optive ) on visual function of dry eye patients using the ivad test . 
this study utilized a single - center , randomized , double masked , cross - over design to evaluate the visual function effects of two lubricant eye drops in patients with dry eye . the study protocol , protocol amendments , informed consent form , investigator qualifications and site , and recruiting materials were approved by an institutional review board ( southwest independent ; fort worth , tx ) . the study was conducted in accordance with current good clinical practice guidelines and the declaration of helsinki . subjects were recruited from an existing database of dry eye patients , and were enrolled in the study if they met all inclusion and no exclusion criteria . enrolled subjects were at least 18 years of age , provided written informed consent , and had a reported history of dry eye in both eyes . enrolled subjects also had a documented history within the previous 6 months of both a tfbut 5 seconds and a sodium fluorescein corneal staining sum score of 1 ( on a standardized 0 [ none ] to 4 [ worst ] scale)10 in both eyes . subjects were enrolled if they reported a use and/or desire to use an artificial tear within the past year , and had a bcva of 0.6 logmar or better in each eye using the early treatment of diabetic retinopathy study ( etdrs ) chart . subjects were excluded if they had undergone ocular surgery in the previous 6 months , had current punctal occlusion , had a history of intolerance or hypersensitivity to any component of the study medications , or had a history or evidence of external ocular infection . subjects who had a history or evidence of glaucoma , ocular hypertension , and intraocular inflammation ; and those who used systemic medications known to cause ocular drying on an unstable dose for the 30 days prior to visit 1 were also excluded from the study . 
subjects who had any ocular or systemic medical condition that might , in the opinion of the investigator , preclude the safe administration of test product , were excluded as well . use of topical ocular drops within 3 hours of visit 1 , use of restasis ( allergan ) within 30 days of visit 1 , and use of either ( except in the case of artificial tears ) throughout the duration of the study were disallowed . at visit 1 ( day 0 ) subjects provided written informed consent and signed a health insurance portability and accountability act ( hipaa ) privacy document prior to any study procedures . each subject s bcva was tested using the etdrs chart , and primary gaze blink rate was obtained using a digital microcamera and infrared illuminator while the subject was isolated and asked to complete a standard visual task ( subjects were not instructed that their blink rate was being monitored ) . each subject s functional blink rate was acquired during a practice session of the ivad test . reading rate was determined by timing each subject while reading a list of 16 words , one eye at a time . slit lamp examination of the anterior eye was conducted on both eyes of the subjects . as subjects qualified , they were assigned enrollment numbers in numerical sequence , and these numbers corresponded to the treatment order for the two test products . after resting for at least 5 minutes , 1 drop of randomized treatment was instilled in both eyes by a designated member of the investigative staff who was not involved in clinical assessment , data management , or data analysis , and who did not disclose treatment assignments to the investigator , sponsor , or patients . subjects then underwent functional blink rate assessment , ivad testing , and reading rate evaluation at 15 , 45 , and 90 minutes post - dose . finally , any adverse events reported or observed after instillation of test product were recorded and assessed and subjects were scheduled for their next visit . 
subjects returned for visit 2 on day 7 3 , and were queried for artificial tear use in the 3 hours prior to their visit and any adverse event occurrence since visit 1 , and the same method was applied for the cross - over treatment , followed by study exit . all patients receiving test product were considered evaluable for the safety analysis and all patients receiving test product and completing both visits were considered evaluable for the intent - to - treat ( itt ) analysis . all patients who met the criteria for itt analysis , and satisfied all inclusion and exclusion criteria were considered evaluable for the per - protocol ( pp ) analysis . the primary efficacy variables were ivad test recorded time at bcva and reading rate of the worse eye as assessed at baseline and 15 , 45 , and 90 minutes post - dose . descriptive statistics were presented for each of the efficacy variables ( including the mean , standard deviation , sample size , minimum , and maximum ) and by treatment and time points . repeated measures analysis of variance ( anova ) were used to test for treatment differences in the ivad test recorded time and reading rate of the worse eye . descriptive statistics for the secondary variable of functional blink rate assessed at baseline and 15 , 45 , and 90 minutes post - treatment were generated by treatment and time points . repeated measures of variance were used to test for treatment differences in functional blink rate . post hoc exploratory efficacy analysis was conducted using survival analyses ( right censored ) performed for ivad time at bcva at 15 , 45 , and 90 minutes post - dose . 
fifty - three subjects were screened and 48 subjects were enrolled in the study , completed all visits , and were evaluable for safety , itt and pp analyses ( mean age = 61.1 14.8 years ; 20.8% male ; 95.8% white , 2.1% black , 2.1% other ) . since the itt and pp data sets were identical , analyses were performed only on the itt set . the results for time at bcva , reading rate , and functional blink rate measurements are listed in table 1 . 
a significant difference in survival distribution for time at bcva ( time to one - line loss of bcva ) evaluated using the ivad test was observed at 90 minutes post - dose : 50% of patients demonstrated time to one - line loss of bcva greater than 9.17 seconds in peg 400/pg treatment group compared to 6.84 seconds in cmc / glycerin treatment group ( wilcoxon test , p = 0.037 ; see table 2 , figure 1 ) . repeated measures anova ( used to compare treatment differences in mean scores by time because anova allows tests for treatment - by - time and treatment - by - sequence interactions ) showed no significant between - treatment differences for time at bcva overall ( p = 0.49 ) or any time point ( p > 0.36 ) , for reading rate overall ( p = 0.082 ) or at any timepoint ( p > 0.20 ) , or for functional blink rate overall ( p = 0.1408 ) or at any timepoint ( p > 0.14 ) . no adverse events related to the test product were reported and no patients discontinued due to an adverse event . the current study was designed to demonstrate the visual function impact of two lubricant eye drops through the use of the ivad test . the ivad test has been utilized in prior studies , demonstrating the statistically significantly shorter time at which dry eye patients could maintain their bcvas compared to normals ( p = 0.0001 ) . 1 the present study utilized this clinical tool in order to investigate the potential benefits of artificial tear use on these visual function parameters in dry eye patients . although no significant between - treatment differences were observed in the mean values of reading rate or functional blink rate at any time point post - treatment , the survival distribution for time at bcva demonstrated a statistically significant difference between treatments at 90 minutes post - dose favoring the peg 400/pg formulation ( wilcoxon test , p = 0.0365 ) . 
the ability of the peg 400/pg formulation to extend the amount of time to one - line loss of bcva for prolonged periods of time , as demonstrated by the ivad test , may suggest beneficial visual effects for patients . the link between visual function loss and diminished quality - of - life is well established in literature.11,12 the ability of ocular lubricants to stabilize the tear film during visual tasking and to retard the visual acuity decay between blinks likely translates to enhanced visual functioning capabilities for patients . previous research has explored the residence time and duration of action of lubricant eye drops . a prior study compared the duration of tear film stability achieved by the cmc / glycerin tear to that of a peg 400/pg tear ( a formulation with similar active ingredients to that used in the present study ) . the results of the study demonstrated that the peg 400/pg drop achieved significantly greater extension of tfbut at 45 , 60 , and 90 minutes post - instillation than the cmc / glycerin drop ( p < 0.05 ) . while similar ocular surface protection was determined to result with both artificial tears immediately following instillation , the peg 400/pg tear was able to prolong tfbut and maintain positive changes in ocular protection index up to 90 minutes post - instillation.13 these results may suggest the prolonged duration of action of the peg 400/pg formulation used in the current study , as noted in the ability to positively impact the time to one - line loss of bcva 90 minutes post - instillation . 
additionally , previous work has shown that the peg 400/pg formulation used in this study has enhanced rheological properties , including those resulting from the dilution of sorbitol upon introduction to the tear film , which may also help to explain the effect at 90 minutes.14 while this data is important , no significant differences from baseline in bcva maintenance existed for either drug until 90 minutes post - instillation , when both tears appeared to show their maximum effectiveness . the ability of an artificial tear to promote visual function abilities has the potential to benefit various dry eye patient populations . in order to elaborate upon the extent of these visual effects , future research could examine the impact of visual function improvement on patient quality - of - life over longer - term artificial tear use ; the potential correlation of sign and symptom reduction with visual function improvement ; or the use of artificial tears while performing specific visual function tasks ( eg , driving at night , computer use , watching television ) . investigating the potential effects of artificial tear use on visual function in patients who may not be diagnosed with dry eye , but who may experience ocular discomfort when acutely exacerbated by prolonged periods of visual tasking , could illuminate potential benefits of ocular lubricants in that population as well . no significant between - treatment differences were observed in mean scores for reading rate at any of the time points measured post - treatment . median time to one - line loss of bcva as measured using the more sensitive measure of the ivad test , however , was significantly longer with the peg 400/pg ocular lubricant than the cmc / glycerin product 90 minutes post - instillation . 
this is the first clinical trial using the novel ivad test to demonstrate the ability of lubricant eye drops to extend visual acuity maintenance between blinks .","<S> objective : the purpose of the current study was to evaluate the effects of two marketed ocular lubricants on the visual decay in dry eye patients using the inter - blink interval visual acuity decay ( ivad ) test.methods:this controlled , randomized , double - masked crossover study compared the effects of a polyethylene glycol / propylene glycol - based ( peg / pg ) tear and a carboxymethylcellulose sodium ( cmc)/glycerin tear on the visual acuity decay between blinks of dry eye patients . at visit 1 ( day 0 ) , baseline ivad measurements were recorded prior to instillation of a single drop of randomized study medication . </S> <S> ivad testing was repeated at 15- , 45- , and 90-minutes post - instillation . </S> <S> reading rate and functional blink rate were also evaluated . at the second visit ( day 7 3 ) </S> <S> , study procedures were repeated using crossover treatment.results:forty-eight ( 48 ) subjects with dry eye ( 61.1 14.8 years old , 79.2% female , 95.8% white ) completed the study . </S> <S> treatment with the peg / pg - based tear demonstrated statistically significantly longer time to one - line loss of best - corrected visual acuity ( bcva ) as determined by the ivad test at 90 minutes post - instillation compared to the cmc / glycerin tear ( p = 0.0365 ) . </S> <S> measurements of median time at bcva , reading rate , and functional blink rate were similar for both treatments . </S> <S> both formulations were well tolerated in the population studied.conclusions:treatment with the peg / pg - based tear demonstrated statistically significant improved maintenance of visual acuity between blinks at 90 minutes post - instillation compared to the cmc / glycerin tear . 
</S> <S> this is the first study to demonstrate the ability of an artificial tear to extend visual acuity maintenance between blinks , as measured by the ivad test . </S>"


The metric is an instance of [`datasets.Metric`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Metric):

In [13]:
metric

Metric(name: "rouge", features: {'predictions': Value(dtype='string', id='sequence'), 'references': Value(dtype='string', id='sequence')}, usage: """
Calculates average rouge scores for a list of hypotheses and references
Args:
    predictions: list of predictions to score. Each predictions
        should be a string with tokens separated by spaces.
    references: list of reference for each prediction. Each
        reference should be a string with tokens separated by spaces.
    rouge_types: A list of rouge types to calculate.
        Valid names:
        `"rouge{n}"` (e.g. `"rouge1"`, `"rouge2"`) where: {n} is the n-gram based scoring,
        `"rougeL"`: Longest common subsequence based scoring.
        `"rougeLSum"`: rougeLsum splits text using `"
"`.
        See details in https://github.com/huggingface/datasets/issues/617
    use_stemmer: Bool indicating whether Porter stemmer should be used to strip word suffixes.
    use_agregator: Return aggregates if this is set to True
Retu

You can call its `compute` method with your predictions and labels, which both need to be lists of decoded strings:

In [14]:
fake_preds = ["hello there", "general kenobi"]
fake_labels = ["hello there", "general kenobi"]
metric.compute(predictions=fake_preds, references=fake_labels)

{'rouge1': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rouge2': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rougeL': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rougeLsum': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0))}
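
To make the precision, recall and F-measure fields above more concrete, here is a minimal sketch of a ROUGE-1 style score computed by hand. It is a simplification and not the `rouge_score` implementation: the function name `rouge1` is hypothetical, tokens are split on whitespace, and there is no stemming or bootstrap aggregation (the `low`/`mid`/`high` values).

```python
from collections import Counter


def rouge1(prediction: str, reference: str) -> dict:
    """Simplified ROUGE-1: clipped unigram overlap on whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Counter intersection keeps, per token, the smaller of the two counts,
    # so a token is never matched more often than it occurs in either text.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    total = precision + recall
    fmeasure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


# Identical strings score 1.0 on every field, as in the cell above.
perfect = rouge1("hello there", "hello there")
# One extra predicted word lowers precision but leaves recall untouched.
partial = rouge1("hello there general", "hello there")
print(perfect)
print(partial)
```

In the partial case, two of the three predicted words appear in the reference, so precision drops to 2/3 while recall stays at 1.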

## Preprocessing the data

Before we can feed those texts to our model, we need to preprocess them. This is done by a 🤗 `Transformers` `Tokenizer`, which will (as the name indicates) tokenize the inputs (including converting the tokens to their corresponding IDs in the pretrained vocabulary) and put them in a format the model expects, as well as generate the other inputs that the model requires.

To do all of this, we instantiate our tokenizer with the `AutoTokenizer.from_pretrained` method, which will ensure:

- we get a tokenizer that corresponds to the model architecture we want to use,
- we download the vocabulary used when pretraining this specific checkpoint.

That vocabulary will be cached, so it's not downloaded again the next time we run the cell.

In [15]:
from transformers import AutoTokenizer
    
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

Downloading:   0%|          | 0.00/1.17k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/773k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.32M [00:00<?, ?B/s]

By default, the call above will use one of the fast tokenizers (backed by Rust) from the 🤗 `Tokenizers` library.

You can directly call this tokenizer on one sentence or a pair of sentences:

In [16]:
tokenizer("Hello, this one sentence!")

{'input_ids': [8774, 6, 48, 80, 7142, 55, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}

Depending on the model you selected, you will see different keys in the dictionary returned by the cell above. They don't matter much for what we're doing here (just know they are required by the model we will instantiate later); you can learn more about them in [this tutorial](https://huggingface.co/transformers/preprocessing.html) if you're interested.

Instead of one sentence, we can pass along a list of sentences:

In [17]:
tokenizer(["Hello, this one sentence!", "This is another sentence."])

{'input_ids': [[8774, 6, 48, 80, 7142, 55, 1], [100, 19, 430, 7142, 5, 1]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]}

To prepare the targets for our model, we need to tokenize them inside the `as_target_tokenizer` context manager. This will make sure the tokenizer uses the special tokens corresponding to the targets:

In [18]:
with tokenizer.as_target_tokenizer():
    print(tokenizer(["Hello, this one sentence!", "This is another sentence."]))

{'input_ids': [[8774, 6, 48, 80, 7142, 55, 1], [100, 19, 430, 7142, 5, 1]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]}


If you are using one of the five T5 checkpoints, the inputs have to be prefixed with `"summarize: "` (the model can also translate, and it needs the prefix to know which task it has to perform).

In [19]:
if model_checkpoint in ["t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"]:
    prefix = "summarize: "
else:
    prefix = ""

We can then write the function that will preprocess our samples. We just feed them to the `tokenizer` with the argument `truncation=True`. This ensures that an input longer than what the selected model can handle is truncated to the maximum length accepted by the model. The padding will be dealt with later on (in a data collator), so that we pad examples to the longest length in the batch and not in the whole dataset.

The max input length of `t5-base` is 512, so `max_input_length = 512`.
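
The difference between padding per batch and padding over the whole dataset can be illustrated with a small sketch in plain Python. The helpers `truncate` and `pad_batch` are hypothetical and only mimic the idea; the IDs `0` (padding) and `1` (end of sequence) are assumed here to follow the T5 convention, and a tiny maximum length of 4 stands in for 512.

```python
def truncate(ids, max_length, eos_id=1):
    """Cut a sequence down to max_length tokens, keeping the end-of-sequence token."""
    if len(ids) <= max_length:
        return ids
    return ids[: max_length - 1] + [eos_id]


def pad_batch(batch, pad_id=0):
    """Pad every sequence in the batch to the length of its longest member."""
    longest = max(len(ids) for ids in batch)
    return [ids + [pad_id] * (longest - len(ids)) for ids in batch]


def count_padding(batch, pad_id=0):
    return sum(ids.count(pad_id) for ids in batch)


toy = [[3, 1], [4, 4, 1], [5, 6, 7, 8, 9, 1], [2, 2, 2, 1]]
truncated = [truncate(ids, max_length=4) for ids in toy]

# Dynamic padding: split into batches of 2, then pad each batch on its own.
batches = [truncated[i : i + 2] for i in range(0, len(truncated), 2)]
dynamic_pad_count = sum(count_padding(pad_batch(b)) for b in batches)

# Global padding: pad everything to the longest sequence in the dataset.
global_pad_count = count_padding(pad_batch(truncated))

print(truncated)
print(dynamic_pad_count, global_pad_count)
```

On this toy data, per-batch padding adds one padding token while dataset-wide padding adds three; the gap grows with the spread of sequence lengths.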

In [20]:
max_input_length = 512
max_target_length = 256

def preprocess_function(examples):
    inputs = [prefix + doc for doc in examples["article"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)

    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(examples["abstract"], max_length=max_target_length, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

This function works with one or several examples. In the case of several examples, the tokenizer will return a list of lists for each key:

In [21]:
preprocess_function(raw_datasets['train'][:2])

{'input_ids': [[21603, 10, 34, 6986, 16, 72, 145, 5743, 13, 1221, 11, 164, 1535, 12669, 16, 824, 1308, 13, 1874, 7, 3, 6, 902, 16, 1221, 3, 22725, 26324, 11, 87, 127, 11423, 3918, 5, 536, 46, 11658, 19, 4802, 38, 46, 22666, 3, 30715, 593, 13, 24731, 14063, 77, 41, 3, 107, 115, 3, 61, 41, 3, 107, 115, 3, 2, 586, 3, 122, 3, 87, 3, 26, 40, 3, 61, 11, 164, 7931, 38, 3, 9, 741, 13, 8, 3, 10067, 1994, 3, 6, 19021, 3, 6, 2714, 7470, 3, 6, 26324, 3, 6, 42, 11423, 3918, 3, 5, 17413, 2116, 3130, 24, 9990, 11, 2072, 32, 3, 18, 3518, 610, 227, 11423, 3918, 3, 6, 902, 16, 819, 11, 5378, 1874, 7, 3, 6, 164, 36, 22001, 57, 46, 11658, 5, 2266, 46, 11658, 557, 4131, 29, 7, 3976, 224, 38, 13034, 3, 6, 18724, 3, 6, 11, 16633, 102, 29, 15, 9, 3, 6, 11, 2932, 164, 43, 3, 9, 2841, 1504, 30, 463, 13, 280, 41, 3, 1824, 32, 40, 3, 61, 11, 821, 2637, 16, 1221, 28, 1874, 3, 5, 2932, 3, 6, 12, 1172, 1722, 11850, 3, 6, 3, 1824, 32, 40, 3, 6, 11, 813, 6715, 7, 159, 16, 1221, 28, 1874, 3, 6, 34, 133, 36, 4360, 12, 2

To apply this function to all the article/abstract pairs in our dataset, we just use the `map` method of the `raw_datasets` object we created earlier. This will apply the function to all the elements of all the splits in `raw_datasets`, so our training, validation and testing data will be preprocessed in one single command.

In [22]:
tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)

  0%|          | 0/8 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

Even better, the results are automatically cached by the 🤗 `Datasets` library to avoid spending time on this step the next time you run your notebook. The 🤗 `Datasets` library is normally smart enough to detect when the function you pass to `map` has changed (and thus that the cached data should not be used). For instance, it will properly detect if you change the task in the first cell and rerun the notebook. 🤗 `Datasets` warns you when it uses cached files; you can pass `load_from_cache_file=False` in the call to `map` to ignore the cached files and force the preprocessing to be applied again.

Note that we passed `batched=True` to encode the texts in batches. This is to leverage the full benefit of the fast tokenizer we loaded earlier, which will use multi-threading to treat the texts in a batch concurrently.
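
As a rough illustration of what batched mapping means, the sketch below slices a column-oriented dataset into chunks and hands each chunk to the function as a dictionary of lists. `batched_map` and `toy_preprocess` are hypothetical names, not part of 🤗 `Datasets`; the chunk size of 1000 mirrors the library's default `batch_size`, which is consistent with the 8 steps shown in the progress bar for the 8000 training examples. Unlike the real `map`, this sketch returns only the new columns.

```python
def batched_map(dataset, function, batch_size=1000):
    """Apply `function` to successive slices of a dict of equal-length columns."""
    num_rows = len(next(iter(dataset.values())))
    output = {}
    num_batches = 0
    for start in range(0, num_rows, batch_size):
        # Each batch is a dict mapping column name -> list of values.
        batch = {name: values[start : start + batch_size] for name, values in dataset.items()}
        result = function(batch)
        for name, values in result.items():
            output.setdefault(name, []).extend(values)
        num_batches += 1
    return output, num_batches


def toy_preprocess(examples):
    # Stand-in for the tokenizer: one output value per input example.
    return {"n_words": [len(text.split()) for text in examples["article"]]}


toy_dataset = {"article": [f"document number {i}" for i in range(8000)]}
processed, num_batches = batched_map(toy_dataset, toy_preprocess)
print(num_batches, len(processed["n_words"]))
```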

## Fine-tuning the model

Now that our data is ready, we can download the pretrained model and fine-tune it. Since our task is of the sequence-to-sequence kind, we use the `AutoModelForSeq2SeqLM` class. Like with the tokenizer, the `from_pretrained` method will download and cache the model for us.

In [23]:
from transformers import AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer

model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

Downloading:   0%|          | 0.00/850M [00:00<?, ?B/s]

Note that we don't get a warning like in our classification example. This means we used all the weights of the pretrained model and there is no randomly initialized head in this case.

To instantiate a `Seq2SeqTrainer`, we will need to define three more things. The most important is the [`Seq2SeqTrainingArguments`](https://huggingface.co/transformers/main_classes/trainer.html#transformers.Seq2SeqTrainingArguments), which is a class that contains all the attributes to customize the training. It requires one folder name, which will be used to save the checkpoints of the model, and all other arguments are optional:

In [24]:
batch_size = 2
model_name = model_checkpoint.split("/")[-1]
args = Seq2SeqTrainingArguments(
    f"{model_name}-finetuned-pubmed",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=5,
    predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
    seed=42,
)

Here we set the evaluation to be done at the end of each epoch, tweak the learning rate, use the `batch_size` defined at the top of the cell and customize the weight decay. Since the `Seq2SeqTrainer` will save the model regularly and our dataset is quite large, we tell it to make three saves maximum. Lastly, we use the `predict_with_generate` option (to properly generate summaries) and activate mixed precision training (to go a bit faster).

The `push_to_hub=True` argument sets everything up so we can push the model to the [Hub](https://huggingface.co/models) regularly during training. Remove it if you didn't follow the installation steps at the top of the notebook. If you want to save your model locally under a name that is different from the name of the repository it will be pushed to, or if you want to push your model under an organization and not your own namespace, use the `hub_model_id` argument to set the repo name (it needs to be the full name, including your namespace: for instance `"sgugger/t5-finetuned-xsum"` or `"huggingface/t5-finetuned-xsum"`).

Then, we need a special kind of data collator, which will not only pad the inputs to the maximum length in the batch, but also the labels:

In [25]:
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
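
To show what padding the labels amounts to, here is a small sketch in plain Python. The helper `pad_labels` is hypothetical; it assumes the collator's default label padding value of `-100`, which is also the index that PyTorch's cross-entropy loss ignores by default, so padded positions do not contribute to the loss.

```python
def pad_labels(batch_labels, label_pad_id=-100):
    """Pad label sequences to the longest one in the batch with the ignore index."""
    longest = max(len(labels) for labels in batch_labels)
    return [labels + [label_pad_id] * (longest - len(labels)) for labels in batch_labels]


toy_labels = [[100, 19, 1], [8774, 6, 48, 80, 1]]
padded_labels = pad_labels(toy_labels)

# Positions that would count towards the loss (everything except -100).
counted = [sum(1 for token in row if token != -100) for row in padded_labels]

print(padded_labels)
print(counted)
```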

The last thing to define for our `Seq2SeqTrainer` is how to compute the metrics from the predictions. We need to define a function for this, which will just use the `metric` we loaded earlier, and we have to do a bit of pre-processing to decode the predictions into texts:

In [26]:
import nltk
import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    
    # Rouge expects a newline after each sentence
    decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = ["\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels]
    
    result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    # Extract a few results
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
    
    # Add mean generated length
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions]
    result["gen_len"] = np.mean(prediction_lens)
    
    return {k: round(v, 4) for k, v in result.items()}
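
Two steps of `compute_metrics` are easy to misread, so here is a toy version of them in plain Python: swapping `-100` for the padding ID before decoding, and computing the mean generated length while skipping padding. The helper names are hypothetical and the padding ID of `0` is an assumption (it is the value T5 uses); the list comprehensions play the role of `np.where` and `np.count_nonzero` in the function above.

```python
PAD_ID = 0  # assumed padding ID (T5 convention)


def replace_ignore_index(labels, pad_id=PAD_ID, ignore_index=-100):
    """Replace the ignore index with the padding ID so the labels can be decoded."""
    return [[pad_id if token == ignore_index else token for token in row] for row in labels]


def mean_generated_length(predictions, pad_id=PAD_ID):
    """Average number of non-padding tokens per prediction."""
    lengths = [sum(1 for token in row if token != pad_id) for row in predictions]
    return sum(lengths) / len(lengths)


toy_labels = [[100, 19, 1, -100, -100], [8774, 6, 48, 80, 1]]
cleaned_labels = replace_ignore_index(toy_labels)

toy_predictions = [[0, 100, 19, 1, 0, 0], [0, 8774, 6, 48, 80, 1]]
gen_len = mean_generated_length(toy_predictions)

print(cleaned_labels)
print(gen_len)
```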

Then we just need to pass all of this along with our datasets to the `Seq2SeqTrainer`:

In [27]:
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

Cloning https://huggingface.co/Kevincp560/t5-base-finetuned-pubmed into local empty directory.
Using amp half precision backend


We can now fine-tune our model by just calling the `train` method:

In [28]:
trainer.train()

The following columns in the training set  don't have a corresponding argument in `T5ForConditionalGeneration.forward` and have been ignored: abstract, article.
***** Running training *****
  Num examples = 8000
  Num Epochs = 5
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 1
  Total optimization steps = 20000


| Epoch | Training Loss | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|
| 1 | 2.0957 | 1.900611 | 8.6968 | 3.2473 | 7.9565 | 8.3224 | 19.0 |
| 2 | 2.0489 | 1.857106 | 8.6877 | 3.2461 | 7.9311 | 8.2991 | 19.0 |
| 3 | 2.7345 | 2.611209 | 9.585 | 3.0129 | 8.4729 | 9.1109 | 19.0 |
| 4 | 3.0585 | 2.722198 | 9.7011 | 3.3549 | 8.6588 | 9.2646 | 19.0 |
| 5 | 2.9437 | 2.631093 | 9.3771 | 3.7042 | 8.4912 | 9.0013 | 19.0 |


Saving model checkpoint to t5-base-finetuned-pubmed/checkpoint-500
Configuration saved in t5-base-finetuned-pubmed/checkpoint-500/config.json
Model weights saved in t5-base-finetuned-pubmed/checkpoint-500/pytorch_model.bin
tokenizer config file saved in t5-base-finetuned-pubmed/checkpoint-500/tokenizer_config.json
Special tokens file saved in t5-base-finetuned-pubmed/checkpoint-500/special_tokens_map.json
tokenizer config file saved in t5-base-finetuned-pubmed/tokenizer_config.json
Special tokens file saved in t5-base-finetuned-pubmed/special_tokens_map.json
Saving model checkpoint to t5-base-finetuned-pubmed/checkpoint-1000
Configuration saved in t5-base-finetuned-pubmed/checkpoint-1000/config.json
Model weights saved in t5-base-finetuned-pubmed/checkpoint-1000/pytorch_model.bin
tokenizer config file saved in t5-base-finetuned-pubmed/checkpoint-1000/tokenizer_config.json
Special tokens file saved in t5-base-finetuned-pubmed/checkpoint-1000/special_tokens_map.json
Saving model checkpoi

TrainOutput(global_step=20000, training_loss=2.5142992614746094, metrics={'train_runtime': 8559.1691, 'train_samples_per_second': 4.673, 'train_steps_per_second': 2.337, 'total_flos': 2.433621949019136e+16, 'train_loss': 2.5142992614746094, 'epoch': 5.0})
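
The step count and throughput figures reported in the log and in `TrainOutput` can be checked with a few lines of arithmetic. The sketch below only reuses numbers printed above; the checkpoint interval of 500 steps is read off the `checkpoint-500` and `checkpoint-1000` folder names and is otherwise an assumption about the trainer's defaults.

```python
import math

num_examples = 8000       # "Num examples" in the training log
batch_size = 2            # per-device batch size, single device
num_epochs = 5
grad_accum_steps = 1      # "Gradient Accumulation steps" in the log
train_runtime = 8559.1691  # seconds, from TrainOutput

steps_per_epoch = math.ceil(num_examples / (batch_size * grad_accum_steps))
total_steps = steps_per_epoch * num_epochs

steps_per_second = total_steps / train_runtime
samples_per_second = num_examples * num_epochs / train_runtime

# Assumed checkpoint interval, inferred from the folder names in the log.
save_every = 500
checkpoints_written = total_steps // save_every

print(total_steps)
print(round(steps_per_second, 3), round(samples_per_second, 3))
print(checkpoints_written)
```

With `save_total_limit=3`, only the three most recent of those checkpoints are kept on disk at any time.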

You can now upload the result of the training to the Hub by executing this instruction:

In [29]:
trainer.push_to_hub()

Saving model checkpoint to t5-base-finetuned-pubmed
Configuration saved in t5-base-finetuned-pubmed/config.json
Model weights saved in t5-base-finetuned-pubmed/pytorch_model.bin
tokenizer config file saved in t5-base-finetuned-pubmed/tokenizer_config.json
Special tokens file saved in t5-base-finetuned-pubmed/special_tokens_map.json
Several commits (2) will be pushed upstream.
The progress bars may be unreliable.


Upload file pytorch_model.bin:   0%|          | 3.38k/850M [00:00<?, ?B/s]

Upload file runs/Mar03_13-28-16_17b24cb1cd1b/events.out.tfevents.1646314124.17b24cb1cd1b.80.0:  26%|##5       …

To https://huggingface.co/Kevincp560/t5-base-finetuned-pubmed
   1868be0..f4833bf  main -> main

To https://huggingface.co/Kevincp560/t5-base-finetuned-pubmed
   f4833bf..813ed7d  main -> main



'https://huggingface.co/Kevincp560/t5-base-finetuned-pubmed/commit/f4833bf85d36d822d88e2a5b2a16621802f12d0b'

You can now share this model with all your friends, family, favorite pets: they can all load it with the identifier `"your-username/the-name-you-picked"` so for instance:

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("sgugger/my-awesome-model")
```