# Sample t5_large_arxiv_abstract_title

## Import libraries

In [15]:
from pyspark.ml import Pipeline

import sparknlp
from sparknlp.annotator import T5Transformer
from sparknlp.base import DocumentAssembler, LightPipeline

## Start Spark

In [16]:
spark = sparknlp.start()
print("Spark NLP Version :", sparknlp.version())

Spark NLP Version : 5.5.3


## Document Assembler

In [17]:
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")

## Model

In [18]:
t5 = (
    # pretrained() is a static loader, so call it on the class rather than on a new instance
    T5Transformer.pretrained("t5_large_arxiv_abstract_title", "en")
    .setTask("generate title from abstract:")
    .setInputCols(["documents"])
    .setOutputCol("output")
)

t5_large_arxiv_abstract_title download started this may take some time.

Approximate size to download 2.8 GB
[OK!]


## Pipeline

In [23]:
summarizer_pp = Pipeline(stages=[document_assembler, t5])
empty_df = spark.createDataFrame([[""]]).toDF("text")
pipeline_model = summarizer_pp.fit(empty_df)
sum_lmodel = LightPipeline(pipeline_model)

## Input data

In [24]:
example_txt = """
Transfer learning, where a model is first pre-trained on a data-rich task before being finetuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
"""
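The triple-quoted string above begins and ends with a newline, and abstracts pasted from PDFs often carry stray line breaks as well. A minimal sketch of a cleanup step follows. The helper name `normalize_abstract` and the sample string are hypothetical, and whether this normalization changes the generated title was not checked in this notebook.

```python
import re


def normalize_abstract(text: str) -> str:
    """Collapse every run of whitespace (including newlines) into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


# Stand-in input for illustration only; it is not the abstract used in this notebook.
sample = "\n  Transfer learning has emerged\nas a powerful technique.  \n"
print(normalize_abstract(sample))
```

The cleaned string can then be passed to the pipeline in place of the raw text, for example `normalize_abstract(example_txt)`.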

## Output

In [25]:
res = sum_lmodel.fullAnnotate(example_txt)[0]
print("Summary:", res["output"][0].result)

Summary: Transfer Learning for Natural Language Processing: A Unified Framework and Scalable Models
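`fullAnnotate` returns one dictionary per input document, keyed by output column, with each value a list of annotation objects that expose a `.result` string. A small helper can collect the first generated title per document. This is a sketch under assumptions: `extract_titles` is a hypothetical name, and the stub objects below only mimic that shape so the snippet runs without Spark.

```python
from types import SimpleNamespace


def extract_titles(annotated, column="output"):
    """Return the first generated string per document, or None when the column is empty or absent."""
    titles = []
    for doc in annotated:
        annotations = doc.get(column, [])
        titles.append(annotations[0].result if annotations else None)
    return titles


# Stub data standing in for fullAnnotate output; real annotations come from the pipeline.
fake_output = [
    {"output": [SimpleNamespace(result="A Placeholder Title")]},
    {"output": []},
]
print(extract_titles(fake_output))
```

With the pipeline from this notebook, the call would look like `extract_titles(sum_lmodel.fullAnnotate([abstract_1, abstract_2]))`, where `abstract_1` and `abstract_2` are placeholder strings.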
