<a href="https://colab.research.google.com/github/akashe/arxiv_hunter/blob/main/Summarization_using_BART.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Idea: Try out simple summarization using the BART model. 
Why BART:
1. Strong ROUGE-1 score: BART was a significant jump over earlier models. Later models improve the score further, but not by as large a margin.

Notes from the BART paper (https://arxiv.org/pdf/1910.13461v1.pdf):
1. The entire seq2seq model can be used as the pretrained model during fine-tuning. For translation, for example, they keep the pretrained English-to-English model and put an additional Transformer encoder in front of it, which maps the foreign-language input into something the pretrained model can denoise into English.
2. Masking: Multiple papers suggest that masking helps language understanding (tasks like SQuAD). However, as the paper mentions, it is less well suited to language generation tasks. 
3. BART seems to work better for language generation tasks because of the cross-attention between the source and target sequences.
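
To make point 3 a little more concrete, here is a minimal sketch of cross-attention in plain Python. It is a simplification for illustration only: real models apply learned query/key/value projections and multiple heads, which are left out here, and the function names are made up for this note. Each target position (query) looks at every source position (key) and takes a weighted average of the source values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention from target positions to source positions.

    queries: one vector per target (decoder) position
    keys, values: one vector per source (encoder) position
    """
    dim = len(keys[0])
    outputs = []
    for query in queries:
        # Similarity of this target position to every source position.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ]
        weights = softmax(scores)
        # Weighted average of the source values.
        mixed = [
            sum(w * value[j] for w, value in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# One target position that strongly matches the first source position.
source_keys = [[10.0, 0.0], [0.0, 10.0]]
source_values = [[1.0, 0.0], [0.0, 1.0]]
print(cross_attention([[1.0, 0.0]], source_keys, source_values))
```

The printed vector is dominated by the first source value, because the query matches the first key far more closely than the second.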

Problems:
1. For very long sequences, BART and Pegasus fail when looking up positional embeddings.
2. Other available models like BigBird are fine-tuned on arXiv or other long-document datasets, so summarization quality on other kinds of text isn't good.
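
A rough way to see why problem 1 happens: a position table has a fixed number of rows, so a token position past the last row has nothing to look up. The toy below uses a plain Python list as the table. `MAX_POSITIONS = 1024` and the helper names are assumptions for illustration and are not read from any real checkpoint; 2455 is the token count this notebook reports for the article.

```python
# Toy illustration of a fixed-size position table (not a real model).
MAX_POSITIONS = 1024  # assumed limit, for illustration only

# One dummy embedding vector per supported position.
position_table = [[float(pos)] for pos in range(MAX_POSITIONS)]


def lookup_positions(num_tokens):
    """Return one position vector per token; fails past the table size."""
    return [position_table[pos] for pos in range(num_tokens)]


def fits(num_tokens, max_positions=MAX_POSITIONS):
    """Cheap pre-check to run before handing the input to the model."""
    return num_tokens <= max_positions


# A short input is fine.
assert len(lookup_positions(512)) == 512

# An input as long as the article in this notebook is not.
try:
    lookup_positions(2455)
except IndexError as err:
    print("lookup failed:", err)

print("2455 tokens fit:", fits(2455))
```

Checking the token count against the limit before calling the model turns an opaque index error into an explicit decision to truncate or split the input.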

Ideas???
1. Break the document into smaller chunks, summarize each chunk, then summarize the combined chunk summaries.
2. Identify the important parts of a longer document first, then summarize only those.
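
Idea 1 can be sketched without committing to a particular model. The code below is only a sketch under stated assumptions: it counts whitespace-separated words as a stand-in for tokens (a real tokenizer will count differently), and `split_into_chunks`, `summarize_long`, `max_words` and `max_rounds` are hypothetical names. The `summarize` argument is any function that maps a string to a shorter string.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_words words.

    A paragraph longer than max_words is split on word boundaries.
    """
    chunks: List[str] = []
    current: List[str] = []
    for paragraph in text.split("\n"):
        words = paragraph.split()
        if not words:
            continue  # skip blank lines
        # Break oversized paragraphs into windows of max_words words.
        pieces = [words[i:i + max_words] for i in range(0, len(words), max_words)]
        for piece in pieces:
            if current and len(current) + len(piece) > max_words:
                chunks.append(" ".join(current))
                current = []
            current.extend(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 400,
    max_rounds: int = 5,
) -> str:
    """Summarize chunk by chunk, then summarize the joined summaries.

    Repeats until the text fits in one chunk or max_rounds is reached;
    in the latter case the final call may still receive a long input.
    """
    for _ in range(max_rounds):
        chunks = split_into_chunks(text, max_words)
        if len(chunks) <= 1:
            break
        text = "\n".join(summarize(chunk) for chunk in chunks)
    return summarize(text)
```

In this notebook, `summarize` could wrap the tokenize, `generate`, decode steps of whichever model is being tried. One known weakness of this approach is that each chunk is summarized without seeing the rest of the document, so cross-chunk context is lost.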

In [23]:
!pip install transformers



In [24]:
ARTICLE_TO_SUMMARIZE = """The history of artificial intelligence has been marked by repeated cycles of extreme optimism and promise followed by disillusionment and disappointment. Today’s AI systems can perform complicated tasks in a wide range of areas, such as mathematics, games, and photorealistic image generation. But some of the early goals of AI like housekeeper robots and self-driving cars continue to recede as we approach them.

Part of the continued cycle of missing these goals is due to incorrect assumptions about AI and natural intelligence, according to Melanie Mitchell, Davis Professor of Complexity at the Santa Fe Institute and author of Artificial Intelligence: A Guide For Thinking Humans.

In a new paper titled “Why AI is Harder Than We Think,” Mitchell lays out four common fallacies about AI that cause misunderstandings not only among the public and the media, but also among experts. These fallacies give a false sense of confidence about how close we are to achieving artificial general intelligence, AI systems that can match the cognitive and general problem-solving skills of humans.
Narrow AI and general AI are not on the same scale

The kind of AI that we have today can be very good at solving narrowly defined problems. They can outmatch humans at Go and chess, find cancerous patterns in x-ray images with remarkable accuracy, and convert audio data to text. But designing systems that can solve single problems does not necessarily get us closer to solving more complicated problems. Mitchell describes the first fallacy as “Narrow intelligence is on a continuum with general intelligence.”

“If people see a machine do something amazing, albeit in a narrow area, they often assume the field is that much further along toward general AI,” Mitchell writes in her paper.

For instance, today’s natural language processing systems have come a long way toward solving many different problems, such as translation, text generation, and question-answering on specific problems. At the same time, we have deep learning systems that can convert voice data to text in real-time. Behind each of these achievements are thousands of hours of research and development (and millions of dollars spent on computing and data). But the AI community still hasn’t solved the problem of creating agents that can engage in open-ended conversations without losing coherence over long stretches. Such a system requires more than just solving smaller problems; it requires common sense, one of the key unsolved challenges of AI.
The easy things are hard to automate

Above: Vision, one of the problems every living being solves without effort, remains a challenge for computers.

When it comes to humans, we would expect an intelligent person to do hard things that take years of study and practice. Examples might include tasks such as solving calculus and physics problems, playing chess at grandmaster level, or memorizing a lot of poems.

But decades of AI research have proven that the hard tasks, those that require conscious attention, are easier to automate. It is the easy tasks, the things that we take for granted, that are hard to automate. Mitchell describes the second fallacy as “Easy things are easy and hard things are hard.”

“The things that we humans do without much thought—looking out in the world and making sense of what we see, carrying on a conversation, walking down a crowded sidewalk without bumping into anyone—turn out to be the hardest challenges for machines,” Mitchell writes. “Conversely, it’s often easier to get machines to do things that are very hard for humans; for example, solving complex mathematical problems, mastering games like chess and Go, and translating sentences between hundreds of languages have all turned out to be relatively easier for machines.”

Consider vision, for example. Over billions of years, organisms have developed complex apparatuses for processing light signals. Animals use their eyes to take stock of the objects surrounding them, navigate their surroundings, find food, detect threats, and accomplish many other tasks that are vital to their survival. We humans have inherited all those capabilities from our ancestors and use them without conscious thought. But the underlying mechanism is indeed more complicated than large mathematical formulas that frustrate us through high school and college.

Case in point: We still don’t have computer vision systems that are nearly as versatile as human vision. We have managed to create artificial neural networks that roughly mimic parts of the animal and human vision system, such as detecting objects and segmenting images. But they are brittle, sensitive to many different kinds of perturbations, and they can’t mimic the full scope of tasks that biological vision can accomplish. That’s why, for instance, the computer vision systems used in self-driving cars need to be complemented with advanced technology such as lidars and mapping data.

Another area that has proven to be very difficult is sensorimotor skills that humans master without explicit training. Think of the how you handle objects, walk, run, and jump. These are tasks that you can do without conscious thought. In fact, while walking, you can do other things, such as listen to a podcast or talk on the phone. But these kinds of skills remain a large and expensive challenge for current AI systems.

“AI is harder than we think, because we are largely unconscious of the complexity of our own thought processes,” Mitchell writes.
Anthropomorphizing AI doesn’t help

The field of AI is replete with vocabulary that puts software on the same level as human intelligence. We use terms such as “learn,” “understand,” “read,” and “think” to describe how AI algorithms work. While such anthropomorphic terms often serve as shorthand to help convey complex software mechanisms, they can mislead us to think that current AI systems work like the human mind.

Mitchell calls this fallacy “the lure of wishful mnemonics” and writes, “Such shorthand can be misleading to the public trying to understand these results (and to the media reporting on them), and can also unconsciously shape the way even AI experts think about their systems and how closely these systems resemble human intelligence.”

The wishful mnemonics fallacy has also led the AI community to name algorithm-evaluation benchmarks in ways that are misleading. Consider, for example, the General Language Understanding Evaluation (GLUE) benchmark, developed by some of the most esteemed organizations and academic institutions in AI. GLUE provides a set of tasks that help evaluate how a language model can generalize its capabilities beyond the task it has been trained for. But contrary to what the media portray, if an AI agent gets a higher GLUE score than a human, it doesn’t mean that it is better at language understanding than humans.

“While machines can outperform humans on these particular benchmarks, AI systems are still far from matching the more general human abilities we associate with the benchmarks’ names,” Mitchell writes.

A stark example of wishful mnemonics is a 2017 project at Facebook Artificial Intelligence Research, in which scientists trained two AI agents to negotiate on tasks based on human conversations. In their blog post, the researchers noted that “updating the parameters of both agents led to divergence from human language as the agents developed their own language for negotiating [emphasis mine].”

This led to a stream of clickbait articles that warned about AI systems that were becoming smarter than humans and were communicating in secret dialects. Four years later, the most advanced language models still struggle with understanding basic concepts that most humans learn at a very young age without being instructed.
AI without a body

Can intelligence exist in isolation from a rich physical experience of the world? This is a question that scientists and philosophers have puzzled over for centuries.

One school of thought believes that intelligence is all in the brain and can be separated from the body, also known as the “brain in a vat” theory. Mitchell calls it the “Intelligence is all in the brain” fallacy. With the right algorithms and data, the thinking goes, we can create AI that lives in servers and matches human intelligence. For the proponents of this way of thinking, especially those who support pure deep learning–based approaches, reaching general AI hinges on gathering the right amount of data and creating larger and larger neural networks.

Meanwhile, there’s growing evidence that this approach is doomed to fail. “A growing cadre of researchers is questioning the basis of the ‘all in the brain’ information processing model for understanding intelligence and for creating AI,” she writes.

Human and animal brains have evolved along with all other body organs with the ultimate goal of improving chances of survival. Our intelligence is tightly linked to the limits and capabilities of our bodies. And there is an expanding field of embodied AI that aims to create agents that develop intelligent skills by interacting with their environment through different sensory stimuli.

Mitchell notes that neuroscience research suggests that “neural structures controlling cognition are richly linked to those controlling sensory and motor systems, and that abstract thinking exploits body-based neural ‘maps.’” And in fact, there’s growing evidence and research that proves feedback from different sensory areas of the brain affects both our conscious and unconscious thoughts.

Mitchell supports the idea that emotions, feelings, subconscious biases, and physical experience are inseparable from intelligence. “Nothing in our knowledge of psychology or neuroscience supports the possibility that ‘pure rationality’ is separable from the emotions and cultural biases that shape our cognition and our objectives,” she writes. “Instead, what we’ve learned from research in embodied cognition is that human intelligence seems to be a strongly integrated system with closely interconnected attributes, including emotions, desires, a strong sense of selfhood and autonomy, and a commonsense understanding of the world. It’s not at all clear that these attributes can be separated.”
Common sense in AI

Developing general AI needs an adjustment to our understanding of intelligence itself. We are still struggling to define what intelligence is and how to measure it in artificial and natural beings.

“It’s clear that to make and assess progress in AI more effectively, we will need to develop a better vocabulary for talking about what machines can do,” Mitchell writes. “And more generally, we will need a better scientific understanding of intelligence as it manifests in different systems in nature.”

Another challenge that Mitchell discusses in her paper is that of common sense, which she describes as “a kind of umbrella for what’s missing from today’s state-of-the-art AI systems.”

Common sense includes the knowledge that we acquire about the world and apply it every day without much effort. We learn a lot without being explicitly instructed, by exploring the world when we are children. These include concepts such as space, time, gravity, and the physical properties of objects. For example, a child learns at a very young age that when an object becomes occluded behind another, it has not disappeared and continues to exist, or when a ball rolls across a table and reaches the ledge, it should fall off. We use this knowledge to build mental models of the world, make causal inferences, and predict future states with decent accuracy.

This kind of knowledge is missing in today’s AI systems, which makes them unpredictable and data-hungry. In fact, housekeeping and driving, the two AI applications mentioned at the beginning of this article, are things that most humans learn through common sense and a little bit of practice.

Common sense also includes basic facts about human nature and life, things that we omit in our conversations and writing because we know our readers and listeners know them. For example, we know that if two people are “talking on the phone,” it means that they aren’t in the same room. We also know that if “John reached for the sugar,” it means that there was a container with sugar inside it somewhere near John. This kind of knowledge is crucial to areas such as natural language processing.

“No one yet knows how to capture such knowledge or abilities in machines. This is the current frontier of AI research, and one encouraging way forward is to tap into what’s known about the development of these abilities in young children,” Mitchell writes.

While we still don’t know the answers to many of these questions, a first step toward finding solutions is being aware of our own erroneous thoughts. “Understanding these fallacies and their subtle influences can point to directions for creating more robust, trustworthy, and perhaps actually intelligent AI systems,” Mitchell writes.

Ben Dickson is a software engineer and the founder of TechTalks, a blog that explores the ways technology is solving and creating problems."""

#### BigBird-Pegasus

In [3]:
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

In [4]:
tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")

In [5]:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv")

In [6]:
inputs = tokenizer(ARTICLE_TO_SUMMARIZE, return_tensors='pt')

In [8]:
len(inputs[0])

2455

In [19]:
prediction = model.generate(**inputs, num_beams=4, max_length=1024, early_stopping=True)

In [20]:
summaries = tokenizer.batch_decode(prediction)

In [22]:
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in prediction])

['the challenge of the century is to create an artificial intelligence that can match the performance of human beings . <n> the challenge of the century is to create an artificial intelligence that can match the performance of human beings . <n> the challenge of the century is to create an artificial intelligence that can match the performance of human beings . <n> the challenge of the century is to create an artificial intelligence that can match the performance of human beings .<n> the challenge of the century is to create an artificial intelligence that can match the performance of human beings .']
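
The output above repeats one sentence five times, separated by the `<n>` newline marker. A small post-processing step can at least make such output readable. This is a sketch only: `clean_summary` is a hypothetical helper, and it hides the repetition rather than fixing the generation itself.

```python
def clean_summary(decoded: str, newline_token: str = "<n>") -> str:
    """Split on the newline marker and keep the first copy of each sentence."""
    seen = set()
    kept = []
    for part in decoded.split(newline_token):
        sentence = " ".join(part.split())  # normalize whitespace
        if sentence and sentence not in seen:
            seen.add(sentence)
            kept.append(sentence)
    return "\n".join(kept)


raw = "the cat sat . <n> the cat sat .<n> the dog barked ."
print(clean_summary(raw))
```

To address the repetition at generation time instead, `generate` accepts arguments such as `no_repeat_ngram_size` and `repetition_penalty`; whether they improve this particular output has not been tried here.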


#### Pegasus-Large

The Pegasus tokenizer is not working right now. Raised an issue on GitHub.

In [25]:
from transformers import AutoTokenizer, PegasusForConditionalGeneration

In [26]:
model = PegasusForConditionalGeneration.from_pretrained('google/pegasus-xsum')

In [27]:
tokenizer = AutoTokenizer.from_pretrained('google/pegasus-xsum')

In [33]:
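# Note: max_length only takes effect together with truncation=True; as written, the whole article is encoded.
inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1000, return_tensors='pt')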

In [35]:
summary_ids = model.generate(inputs['input_ids'])

IndexError: ignored

In [None]:
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
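
The `IndexError` above is consistent with the encoded article being longer than the model's position table, although that has not been confirmed here. A minimal sketch of a workaround, assuming the model config exposes its limit as `max_position_embeddings`, is to tokenize with `truncation=True` so the input is cut to that limit. `summarize_truncated` is a hypothetical helper name; the tokenizer and model are passed in, so nothing is downloaded by this block.

```python
def summarize_truncated(text, tokenizer, model, max_input_tokens=None):
    """Tokenize with truncation so the input never exceeds the position limit."""
    if max_input_tokens is None:
        # Assumption: the config stores the position limit under this name.
        max_input_tokens = model.config.max_position_embeddings
    inputs = tokenizer(
        [text],
        truncation=True,  # without this, max_length is not enforced
        max_length=max_input_tokens,
        return_tensors="pt",
    )
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
    )
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
```

With the objects loaded earlier in this section, the call would be `summarize_truncated(ARTICLE_TO_SUMMARIZE, tokenizer, model)`. Truncation simply drops everything past the limit, so the summary would only reflect the opening part of the article.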