# arxiv-txt.org for literature reviews

In this example, we will use [arxiv-txt.org](https://arxiv-txt.org) to generate a literature review for a given topic.
First, let's identify a list of relevant papers on a topic we want to summarize.
Here we will focus on "Masked Autoencoders".

Here is a list of papers we will use:

In [1]:
paper_list = [
    "https://arxiv.org/abs/2205.09113",
    "https://arxiv.org/abs/2304.00571",
    "https://arxiv.org/abs/2211.09120",
    "https://arxiv.org/abs/2212.05922",
    "https://arxiv.org/abs/2301.06018",
]
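
The list above uses full `abs` URLs. If your own list mixes bare IDs, `abs` links, and `pdf` links, a small normalizer can bring them to one form first. The sketch below is an illustration, not part of arxiv-txt.org: `to_abs_url` is a hypothetical helper, and it assumes new-style arXiv IDs only (`YYMM.NNNNN` with an optional version suffix).

```python
import re

# Assumption: new-style arXiv identifiers only, e.g. 2205.09113 or 2205.09113v2.
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}(?:v\d+)?")


def to_abs_url(identifier: str) -> str:
    """Hypothetical helper: normalize a bare ID or an abs/pdf link to an abs URL."""
    identifier = identifier.strip()
    # Drop a known arxiv.org prefix if one is present.
    for prefix in ("https://arxiv.org/abs/", "https://arxiv.org/pdf/"):
        if identifier.startswith(prefix):
            identifier = identifier[len(prefix):]
            break
    # PDF links sometimes carry a ".pdf" suffix.
    if identifier.endswith(".pdf"):
        identifier = identifier[: -len(".pdf")]
    if not ARXIV_ID.fullmatch(identifier):
        raise ValueError(f"Not a recognized arXiv identifier: {identifier!r}")
    return f"https://arxiv.org/abs/{identifier}"


normalized = [to_abs_url(p) for p in ["2205.09113", "https://arxiv.org/pdf/2304.00571"]]
```

Each normalized entry then starts with `https://arxiv.org/`, which is the form the rest of this example expects.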


Let's define some helper functions to get the summaries of the papers for the lit review.

In [2]:
import requests

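def get_paper_summary(arxiv_url: str) -> str:
    """
    Get the plain-text summary of a paper from arxiv-txt.org
    """
    prefix = "https://arxiv.org/"
    # Raise instead of assert so the check still runs under `python -O`
    if not arxiv_url.startswith(prefix):
        raise ValueError(f"Invalid arxiv url: {arxiv_url}, must start with {prefix}")
    # Swap only the host prefix: https://arxiv.org/abs/<id> -> https://arxiv-txt.org/raw/abs/<id>
    arxiv_txt_url = "https://arxiv-txt.org/raw/" + arxiv_url[len(prefix):]
    response = requests.get(arxiv_txt_url, timeout=30)
    response.raise_for_status()  # Fail loudly rather than pass an error page to the LLM
    return response.text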


def get_summaries(paper_list) -> str:
    """
    Get the summaries of a list of papers
    """
    summary_list = []
    for paper in paper_list:
        print(f"Getting summary for {paper}:")
        summary = get_paper_summary(paper)
        summary_list.append(summary)
    return "\n---\n".join(summary_list)

paper_summaries = get_summaries(paper_list)
print("Summaries:\n---\n")
print(paper_summaries)

Getting summary for https://arxiv.org/abs/2205.09113:
Getting summary for https://arxiv.org/abs/2304.00571:
Getting summary for https://arxiv.org/abs/2211.09120:
Getting summary for https://arxiv.org/abs/2212.05922:
Getting summary for https://arxiv.org/abs/2301.06018:
Summaries:
---

# Masked Autoencoders As Spatiotemporal Learners

## Authors
Christoph Feichtenhofer, Haoqi Fan, Yanghao Li, Kaiming He

## Categories
cs.CV, cs.LG

## Publication Details
- Published: May 18, 2022
- arXiv ID: 2205.09113v2



## Abstract
This paper studies a conceptually simple extension of Masked Autoencoders
(MAE) to spatiotemporal representation learning from videos. We randomly mask
out spacetime patches in videos and learn an autoencoder to reconstruct them in
pixels. Interestingly, we show that our MAE method can learn strong
representations with almost no inductive bias on spacetime (only except for
patch and positional embeddings), and spacetime-agnostic random masking
performs the best. We observ
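
Each record in the output above follows the same layout: a `# ` title line, a `## Authors` heading followed by a comma-separated line, and an `- arXiv ID:` bullet. If you want to check the citations in the generated review against the papers you actually supplied, a minimal parser along these lines may help. `parse_summary` is a hypothetical helper, and it assumes the layout shown in the output above; it is not an official arxiv-txt.org schema.

```python
def parse_summary(text: str) -> dict:
    """Hypothetical helper: pull title, authors, and arXiv ID from one record."""
    fields = {"title": None, "authors": [], "arxiv_id": None}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("# ") and fields["title"] is None:
            # First top-level heading is taken to be the paper title.
            fields["title"] = stripped[2:].strip()
        elif stripped == "## Authors" and i + 1 < len(lines):
            # Authors are assumed to sit on the line right after the heading.
            fields["authors"] = [a.strip() for a in lines[i + 1].split(",") if a.strip()]
        elif stripped.startswith("- arXiv ID:"):
            fields["arxiv_id"] = stripped.split(":", 1)[1].strip()
    return fields


# Sample record copied from the header of the first summary printed above.
sample = """# Masked Autoencoders As Spatiotemporal Learners

## Authors
Christoph Feichtenhofer, Haoqi Fan, Yanghao Li, Kaiming He

## Publication Details
- Published: May 18, 2022
- arXiv ID: 2205.09113v2
"""
record = parse_summary(sample)
```

Running the same function over each summary before joining them gives a small table of titles and first authors to compare with whatever the model cites.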

Now let's get the lit review from the LLM.

In [3]:
# Install litellm for LLM calls
!pip -q install litellm

import os
import litellm

def get_completion(messages, model: str = "gpt-4o-mini") -> str:
    """
    Send the chat messages to the model through litellm and return the reply text
    """
    response = litellm.completion(model=model, messages=messages)
    return response.choices[0].message.content


model = "gpt-4o-mini"  # Replace with any litellm-supported model; this one requires OPENAI_API_KEY to be set

system_prompt = """
You are a helpful assistant that reviews papers.
You are given a list of papers and their abstracts.
Your goal is to review the papers for a research paper on the given topic.
"""

user_prompt = f"""This will be a paragraph in a scientific paper.
Explain what Masked Autoencoders are, why they are important, and the different innovations listed in the follow-up papers.
Cite the papers and their different contributions to the field of Masked Autoencoders.

Here are the relevant papers and their abstracts:
{paper_summaries}
"""


messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
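
The prompts above hard-code the topic "Masked Autoencoders". To reuse the same setup for another topic, the message construction can be wrapped in a function. This is a sketch under the assumption that only the topic and the summaries change between runs; `build_messages` is a hypothetical name, and the prompt wording simply mirrors the cell above.

```python
def build_messages(topic: str, paper_summaries: str) -> list[dict[str, str]]:
    """Hypothetical helper: build the system and user messages for a given topic."""
    system_prompt = (
        "You are a helpful assistant that reviews papers.\n"
        "You are given a list of papers and their abstracts.\n"
        "Your goal is to review the papers for a research paper on the given topic.\n"
    )
    user_prompt = (
        "This will be a paragraph in a scientific paper.\n"
        f"Explain what {topic} are, why they are important, "
        "and the different innovations listed in the follow-up papers.\n"
        f"Cite the papers and their different contributions to the field of {topic}.\n\n"
        "Here are the relevant papers and their abstracts:\n"
        f"{paper_summaries}\n"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Placeholder text stands in for the real summaries fetched earlier.
example_messages = build_messages("Masked Autoencoders", "<paper summaries go here>")
```

The returned list has the same shape as `messages`, so it can be passed to `get_completion` unchanged.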


In [4]:
lit_review = get_completion(messages, model=model)  # Pass the model chosen above instead of relying on the default
print(lit_review)

Masked Autoencoders (MAEs) are a class of self-supervised learning models that have gained prominence for their ability to learn robust representations by reconstructing masked portions of input data. Initially popularized in the context of images, these models have been extended to spatiotemporal domains, particularly for video data. The significance of MAEs lies in their capacity to efficiently utilize unlabeled data, enabling the extraction of meaningful patterns without costly annotation efforts. These approaches have shown remarkable performance in various computer vision tasks, establishing a framework for learning that parallels successful techniques in natural language processing, such as BERT.

Recent innovations in the field of MAEs have built upon this foundational architecture to address specific limitations and enhance representation learning. For instance, **Feichtenhofer et al. (2022)** introduced a spatiotemporal extension of MAEs, demonstrating that high masking ratios