# In-Context Learning


In-context learning generalises few-shot learning: the LLM is given a context as part of the prompt and asked to respond using the information in that context.

* Example: *"Summarize this research article into one paragraph highlighting its strengths and weaknesses: [insert article text]"*
* Example: *"Extract all the quotes from this text and organize them in alphabetical order: [insert text]"*
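
As a minimal sketch of how prompts like these can be assembled in code, the helper below concatenates an instruction with the context the model should use. The name `build_prompt` and the `Context:` label are assumptions made for illustration; they are not part of any library.

```python
def build_prompt(instruction: str, context: str) -> str:
    """Combine a task instruction with the context the model should rely on."""
    return f"{instruction.strip()}\n\nContext:\n{context.strip()}"


prompt = build_prompt(
    "Extract all the quotes from this text and organize them in alphabetical order:",
    'She said "zebra crossing". He replied "apple pie".',
)
print(prompt)
```

Keeping the instruction and the context as separate arguments makes it easy to reuse one instruction across many documents.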

Retrieval-Augmented Generation (RAG), a very popular technique that you will learn in week 5, is a form of in-context learning, where:
* a search engine is used to retrieve some relevant information
* that information is then provided to the LLM as context
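
The two steps above can be sketched with a toy retriever. This is only an illustration under strong assumptions: word overlap stands in for a real search engine, and the names `score`, `retrieve` and `build_rag_prompt` are hypothetical.

```python
def score(query: str, document: str) -> int:
    """Count how many distinct query words appear in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents with the highest word overlap with the query."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Retrieve context for the query and place it in the prompt."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer the question using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Gist tokens compress long contexts for language models.",
    "Collaborative filtering is used in recommendation systems.",
]
rag_prompt = build_rag_prompt("How are long contexts compressed?", docs)
print(rag_prompt)
```

Only the retrieved document reaches the model, which is what keeps the prompt short even when the document collection is large.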


In this example we download some recent research papers from arXiv, extract the text from the PDF files, and ask Gemini to summarize each article and list its main strengths and weaknesses. Finally, we write the summaries to a local HTML file and display them as markdown.

In [4]:
import os
import requests
from bs4 import BeautifulSoup
import google.generativeai as genai
from urllib.request import urlopen, urlretrieve
from IPython.display import Markdown, display
from pypdf import PdfReader
from datetime import date
from tqdm import tqdm

In [5]:
API_KEY = os.environ.get("GEMINI_API_KEY")
genai.configure(api_key=API_KEY)
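
`os.environ.get` returns `None` when the variable is missing, so a missing key only surfaces later as a failed API call. A small guard such as the one below fails earlier with a clearer message; `require_env` is a hypothetical helper name, not part of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this helper, the key could be loaded as `API_KEY = require_env("GEMINI_API_KEY")`.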

We select the papers currently featured on the Hugging Face papers page.

In [6]:
BASE_URL = "https://huggingface.co/papers"
page = requests.get(BASE_URL)
soup = BeautifulSoup(page.content, "html.parser")
h3s = soup.find_all("h3")

papers = []

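for h3 in h3s:
    a = h3.find("a")
    # Skip headings that do not link to a paper page
    if a is None or not a.get("href", "").startswith("/papers/"):
        continue
    title = a.text.strip()
    # "/papers/2412.17743" -> "/2412.17743"
    link = a["href"].replace("/papers", "", 1)

    papers.append({"title": title, "url": f"https://arxiv.org/pdf{link}"})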
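
The link rewriting in the cell above relies on each `href` having the form `/papers/<arXiv id>`. A stricter variant, sketched below, validates that shape with a regular expression. The pattern and the name `arxiv_pdf_url` are assumptions for illustration; they cover identifiers such as `2412.17743` and nothing more.

```python
import re

# Assumed pattern for arXiv identifiers such as 2412.17743
ARXIV_ID = re.compile(r"^/papers/(\d{4}\.\d{4,5})$")


def arxiv_pdf_url(href):
    """Map a '/papers/<id>' link to an arXiv PDF URL, or None if it does not match."""
    match = ARXIV_ID.match(href or "")
    if match is None:
        return None
    return f"https://arxiv.org/pdf/{match.group(1)}"


print(arxiv_pdf_url("/papers/2412.17743"))
```

Returning `None` for unexpected links lets the caller skip them instead of building a URL that does not exist.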

Helper functions to extract text from a web page or a PDF, and to render markdown in the notebook.

In [7]:
def extract_paper(url):
    html = urlopen(url).read()
    soup = BeautifulSoup(html, features="html.parser")

    # kill all script and style elements
    for script in soup(["script", "style"]):
        script.extract()    # rip it out

    # get text
    text = soup.get_text()

    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)

    return text


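def extract_pdf(url):
    # Download the PDF to a local file, then read it page by page
    urlretrieve(url, "pdf_file.pdf")
    reader = PdfReader("pdf_file.pdf")
    text = ""
    for page in reader.pages:
        # Guard against pages with no extractable text
        text += (page.extract_text() or "") + "\n"
    return text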


def printmd(string):
    display(Markdown(string))
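
Text extracted from PDFs often contains words split across line breaks and irregular whitespace. The sketch below shows one possible clean-up step before the text is sent to a model; `clean_extracted_text` is a hypothetical helper and its rules are assumptions, not something the notebook applies.

```python
import re


def clean_extracted_text(text: str) -> str:
    """Tidy text extracted from a PDF before sending it to a model."""
    # Rejoin words hyphenated across a line break: "optimi-\nzation" -> "optimization"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into one blank line
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


print(clean_extracted_text("optimi-\nzation  is   hard\n\n\n\nNext"))
```

The de-hyphenation rule is a heuristic: it also merges genuine compounds that happen to break at the hyphen, so it may not suit every document.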

In [8]:
LLM = "gemini-1.5-flash"
model = genai.GenerativeModel(LLM)

We use Gemini to summarize the papers.

In [9]:
for paper in tqdm(papers):
    try:
        paper["summary"] = model.generate_content("Summarize this research article into one paragraph without formatting highlighting its strengths and weaknesses. " + extract_pdf(paper["url"])).text
    except Exception as e:
        print(f"Generation failed for {paper['title']}: {e}")
        paper["summary"] = "Paper not available"

100%|██████████| 4/4 [00:23<00:00,  5.89s/it]
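
API calls can fail for transient reasons such as rate limits or network errors. Rather than giving up after the first failure, the call can be retried with growing delays. The sketch below is generic and makes no assumption about the Gemini client; `with_retries` is a hypothetical helper, and the `sleep` parameter exists only so the delays can be observed without waiting.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Re-raise once the final attempt has failed
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in function that fails twice before succeeding
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


delays = []
result = with_retries(flaky, sleep=delays.append)
print(result, delays)
```

In the summarization loop, the generation call could be wrapped as `with_retries(lambda: model.generate_content(prompt).text)`.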


We write the results to an HTML file.

In [16]:
# Write a well-formed page: metadata in <head>, visible content in <body>
with open("papers.html", "w") as f:
    f.write(f"<html> <head> <title>Daily Dose of AI Research</title> </head> <body> <h1>Daily Dose of AI Research</h1> <h4>{date.today()}</h4> <p><i>Summaries generated with: {LLM}</i></p>")
    for paper in papers:
        f.write(f'<h2><a href="{paper["url"]}">{paper["title"]}</a></h2> <p>{paper["summary"]}</p>')
    f.write("</body> </html>")
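
Titles and model-generated summaries may contain characters such as `<` or `&` that have special meaning in HTML. A minimal sketch of escaping them with the standard library's `html.escape` follows; `paper_to_html` and the example entry are hypothetical.

```python
from html import escape


def paper_to_html(paper: dict) -> str:
    """Render one paper entry as HTML, escaping text that came from outside."""
    url = escape(paper["url"], quote=True)
    title = escape(paper["title"])
    summary = escape(paper["summary"])
    return f'<h2><a href="{url}">{title}</a></h2> <p>{summary}</p>'


example = {
    "title": "A <b>bold</b> title & more",
    "url": "https://arxiv.org/pdf/2412.17743",
    "summary": "Context length is < 32K tokens.",
}
rendered = paper_to_html(example)
print(rendered)
```

Escaping means any markup in the model output is shown as literal text instead of being interpreted by the browser.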

We can also display the results in this notebook as markdown.

In [17]:
for paper in papers:
    printmd("**[{}]({})**<br>{}<br><br>".format(paper["title"],
                                                paper["url"],
                                                paper["summary"]))

**[YuLan-Mini: An Open Data-efficient Language Model](https://arxiv.org/pdf/2412.17743)**<br>This research paper introduces YuLan-Mini, a 2.42B parameter language model achieving top-tier performance among similarly sized models.  Its strengths lie in a data-efficient pre-training approach encompassing three key contributions: a refined data pipeline combining cleaning and scheduling strategies, a robust optimization method to mitigate training instability (using techniques like µP initialization and WeSaR re-parameterization), and an effective annealing approach incorporating targeted data selection and long-context training.  Remarkably, YuLan-Mini achieves performance comparable to larger models trained on substantially more data.  However, a weakness is the limited context length (28K) due to resource constraints, hindering a full comparison with models offering longer contexts.  Another potential weakness is the reliance on a relatively small number of benchmark tests, which might not fully capture the model's capabilities across all tasks.  Despite these limitations, the open-sourcing of the model and training details is a significant strength, facilitating reproducibility within the research community.
<br><br>

**[A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression](https://arxiv.org/pdf/2412.17483)**<br>This research paper comprehensively investigates gist token-based context compression methods for enhancing long-context processing in large language models (LLMs).  The study finds that while this approach achieves near-lossless performance on tasks like retrieval-augmented generation and long-document QA, especially with a fine-grained key-value cache architecture, it struggles with tasks requiring precise recall, such as synthetic recall.  The authors identify three key failure patterns stemming from compression bottlenecks: information loss at segment boundaries, preferential retention of contextually relevant information, and gradual information loss during multi-step processes. To address these limitations, they propose two effective strategies: fine-grained autoencoding and segment-wise token importance estimation. While these strategies significantly improve performance, particularly under low compression ratios, the study is limited by the scale of models and the scope of compression methods considered, leaving room for future investigation into larger models and broader comparisons with alternative compression techniques.
<br><br>

**[Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation](https://arxiv.org/pdf/2412.18176)**<br>Molar is a novel multimodal large language model (MLLM)-based sequential recommendation framework that addresses limitations of existing LLM approaches by integrating collaborative filtering.  Its strength lies in using an MLLM to generate unified item representations from textual and non-textual data, followed by a post-alignment mechanism that aligns user representations from content-based and ID-based models, improving personalization and robustness.  Experiments show significant performance gains over traditional and other LLM-based methods across multiple datasets. However, a weakness is the computationally intensive multi-task fine-tuning of the MLLM, potentially hindering real-time applications.  Further limitations include reliance on the underlying MLLM's capabilities; suboptimal base models could negatively impact performance.  Future work aims to address these limitations through an end-to-end training framework and the use of larger LLMs.
<br><br>

**[MMFactory: A Universal Solution Search Engine for Vision-Language Tasks](https://arxiv.org/pdf/2412.18072)**<br>MMFactory is a novel framework designed as a universal solution search engine for vision-language tasks.  Its strength lies in its ability to generate a diverse pool of programmatic solutions tailored to user-specified tasks, constraints (e.g., computational resources, performance targets), and a few input-output examples.  This is achieved through a multi-agent LLM system that proposes and refines solutions iteratively, leveraging a repository of vision, language, and vision-language models.  MMFactory also incorporates a metric router to benchmark and compare the performance and resource usage of each proposed solution, allowing users to make informed decisions.  However, a weakness is the computational cost of the multi-agent system, particularly as the number of existing solutions grows. While the framework reduces API calls compared to sample-specific solutions, the initial solution generation time is significant. Additionally, the reliance on powerful LLMs (like GPT-4) limits accessibility and the generalizability of the framework to users without access to such resources.
<br><br>

In [10]:
# Modified prompt for tabulated analysis
for paper in tqdm(papers):
    try:
        prompt = """Analyze this research article and provide:
1. A brief one-sentence summary
2. Key strengths (list 2-3 points)
3. Key weaknesses (list 2-3 points)

Format the response as follows:
Summary: [one sentence]
| Strengths | Weaknesses |
| --- | --- |
| [strength 1] | [weakness 1] |
| [strength 2] | [weakness 2] |
| [strength 3] | [weakness 3] |

Article text: """ + extract_pdf(paper["url"])

        paper["analysis"] = model.generate_content(prompt).text
    except Exception as e:
        print(f"Generation failed for {paper['title']}: {e}")
        paper["analysis"] = "Paper not available"

# Modified markdown printing
for paper in papers:
    printmd(f"""**[{paper['title']}]({paper['url']})**\n
{paper['analysis']}\n\n---\n""")

100%|██████████| 4/4 [00:19<00:00,  4.93s/it]


**[YuLan-Mini: An Open Data-efficient Language Model](https://arxiv.org/pdf/2412.17743)**

Summary: YuLan-Mini is a data-efficient 2.42B parameter language model achieving top-tier performance among similarly sized models by employing an elaborate data pipeline, robust optimization methods, and an effective annealing approach.

| Strengths | Weaknesses |
|---|---|
| Achieves top-tier performance comparable to much larger models with significantly less data, demonstrating high data efficiency. | Limited long context capabilities due to resource constraints; context window only extended to 28K tokens. |
| Open-source and reproducible:  The authors provide full details of data composition and training processes, facilitating reproduction by the research community. |  Difficulty in fully reproducing baseline model results due to incomplete information provided by some baseline studies, affecting the fairness of the comparison. |
|  Strong performance across various benchmarks, showcasing its versatility in mathematical reasoning, code generation, and general language understanding tasks. |  |



---


**[A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression](https://arxiv.org/pdf/2412.17483)**

Summary: This research paper comprehensively investigates gist token-based context compression methods for improving long-context processing in large language models, identifying key failure patterns and proposing effective strategies to mitigate them.

| Strengths | Weaknesses |
|---|---|
| Comprehensive evaluation of various gist-based architectures and their performance across diverse tasks. | Limited model scale and context length due to computational resource constraints.  |
| Identification of three key failure patterns (lost by the boundary, lost if surprise, lost along the way) arising from compression bottlenecks, providing valuable insights. |  Focus solely on gist token-based methods; exclusion of other context compression techniques limits the generalizability of findings. |
| Proposal of two novel strategies (fine-grained autoencoding and segment-wise token importance estimation) to significantly improve the effectiveness of context compression. |  |




---


**[Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation](https://arxiv.org/pdf/2412.18176)**

Summary: Molar, a novel multimodal large language model framework, enhances sequential recommendation by integrating multimodal item information with collaborative filtering signals through a post-alignment mechanism, outperforming existing LLM-based and traditional methods.

| Strengths | Weaknesses |
|---|---|
| Effectively integrates multimodal (text and image) data with ID-based collaborative filtering information for improved recommendation accuracy. |  Requires multi-task fine-tuning, which is computationally expensive and may hinder real-time applications. |
|  Consistently outperforms traditional and other LLM-based sequential recommendation models across multiple datasets. | The performance heavily relies on the underlying capabilities of the MLLM used; suboptimal base models can lead to degraded performance. |
| Employs a post-alignment contrastive learning mechanism to effectively combine content-based and ID-based user embeddings, avoiding the limitations of early fusion. |  Larger LLMs could not be fully trained due to computational constraints, limiting the potential performance gains. |



---


**[MMFactory: A Universal Solution Search Engine for Vision-Language Tasks](https://arxiv.org/pdf/2412.18072)**

Summary: MMFactory is a universal framework that acts as a solution search engine for vision-language tasks, proposing multiple programmatic solutions tailored to user specifications and constraints by combining various vision, language, and vision-language models.

| Strengths | Weaknesses |
|---|---|
| Proposes multiple solutions with performance and resource cost analysis, allowing users to choose the best fit for their needs. | The framework relies heavily on large language models (LLMs), which can be computationally expensive and may not always be accessible to all users. |
| Addresses the limitations of existing methods by generating generalizable solutions applicable to all instances of a task, not just individual samples. |  The evaluation of solutions may be biased by the choice of LLMs used for metric selection and performance assessment.  |
| Uses a multi-agent LLM conversation to create robust and diverse solutions. | The paper lacks a detailed discussion on the limitations and potential biases introduced by using specific LLMs for core functionalities, such as solution and metric routing. |



---
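
Because the prompt asks for a fixed layout (a `Summary:` line followed by a two-column table), the responses can be parsed into structured data. The sketch below assumes the model follows that layout and that cells contain no `|` characters; `parse_analysis` is a hypothetical helper.

```python
def parse_analysis(text: str) -> dict:
    """Split a 'Summary: ...' line and a two-column markdown table into fields."""
    result = {"summary": "", "strengths": [], "weaknesses": []}
    for line in text.splitlines():
        line = line.strip()
        if line.lower().startswith("summary:"):
            result["summary"] = line[len("summary:"):].strip()
        elif line.startswith("|"):
            cells = [cell.strip() for cell in line.strip("|").split("|")]
            # Skip the header row and the |---|---| separator row
            if cells[0].lower() == "strengths" or set(line) <= set("|-: "):
                continue
            if cells[0]:
                result["strengths"].append(cells[0])
            if len(cells) > 1 and cells[1]:
                result["weaknesses"].append(cells[1])
    return result


sample = """Summary: A small model trained efficiently.

| Strengths | Weaknesses |
|---|---|
| Data efficient | Short context |
| Open source |  |
"""
parsed = parse_analysis(sample)
print(parsed)
```

Empty cells are ignored, so a response with three strengths and two weaknesses still parses cleanly.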


In [11]:
# Modified HTML printing
page = f"""<html>
<head>
    <style>
        table {{
            border-collapse: collapse;
            width: 100%;
            margin: 20px 0;
        }}
        th, td {{
            border: 1px solid #ddd;
            padding: 8px;
            text-align: left;
        }}
        th {{
            background-color: #f2f2f2;
        }}
    </style>
</head>
<body>
    <h1>Daily Dose of AI Research</h1>
    <h4>{date.today()}</h4>
    <p><i>Analysis generated with: {LLM}</i></p>"""

with open("papers_table.html", "w") as f:
    f.write(page)


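for paper in papers:
    rows = []
    for line in paper['analysis'].splitlines():
        line = line.strip()
        # Skip blank lines and the markdown separator row (|---|---|)
        if not line or set(line) <= set("|-: "):
            continue
        # Drop the outer pipes, then split the remaining text into cells
        cells = [cell.strip() for cell in line.strip('|').split('|')]
        rows.append("<tr>" + "".join(f"<td>{cell}</td>" for cell in cells) + "</tr>")
    analysis_html = "\n".join(rows)
    page = f"""
    <h2><a href="{paper['url']}">{paper['title']}</a></h2>
    <table>
        {analysis_html}
    </table>
    <hr>"""
    with open("papers_table.html", "a") as f:
        f.write(page)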


end = "</body></html>"
with open("papers_table.html", "a") as f:
    f.write(end)

In [13]:
# Open source model setup
# Note: `model` is rebound below, replacing the Gemini model defined earlier
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    device_map="cuda" if torch.cuda.is_available() else "cpu", # Use GPU if available
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

generation_args = {
    "max_new_tokens": 500,       # Cap on the length of the generated analysis
    "return_full_text": False,   # Return only the completion, not the prompt
    "temperature": 0.1,          # Low temperature for near-deterministic output
    "do_sample": True,
    "top_p": 0.9,                # Nucleus sampling threshold
}

# Modified paper analysis loop for the open source model
for paper in tqdm(papers):
    try:
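        # Limit the input length. The limit is in characters, not tokens,
        # so the model sees only the first 2000 characters of each paper.
        pdf_text = extract_pdf(paper["url"])
        max_input_length = 2000
        truncated_text = pdf_text[:max_input_length]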

        messages = [{
            "role": "system",
            "content": "You are a research paper analyzer. Provide analysis in a table format with strengths and weaknesses."
        }, {
            "role": "user",
            "content": f"""Analyze this research article and provide:
1. A brief one-sentence summary
2. Key strengths (list 2-3 points)
3. Key weaknesses (list 2-3 points)

Format the response as follows:
Summary: [one sentence]
| Strengths | Weaknesses |
| --- | --- |
| [strength 1] | [weakness 1] |
| [strength 2] | [weakness 2] |
| [strength 3] | [weakness 3] |

Article text: {truncated_text}"""
        }]

        paper["analysis"] = pipe(messages, **generation_args)[0]['generated_text']
    except Exception as e:
        print(f"Generation failed for {paper['title']}: {e}")
        paper["analysis"] = "Paper not available"

# Modified markdown printing
for paper in papers:
    printmd(f"""**[{paper['title']}]({paper['url']})**\n
{paper['analysis']}\n\n---\n""")


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda
  0%|          | 0/4 [00:00<?, ?it/s]The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48
100%|██████████| 4/4 [01:54<00:00, 28.59s/it]


**[YuLan-Mini: An Open Data-efficient Language Model](https://arxiv.org/pdf/2412.17743)**

 Summary: YuLan-Mini is a highly efficient language model with 2.42B parameters that achieves top performance with significantly less data than industry-leading models, thanks to its innovative pre-training approach.

| Strengths | Weaknesses |
| --- | --- |
| 1. Achieves top-tier performance with a significantly smaller dataset (1.08T tokens) compared to industry standards, demonstrating data efficiency. | 1. The paper may not fully address the potential limitations or challenges in scaling the model beyond the current parameter size or data volume. |
| 2. Introduces a novel pre-training approach with three key technical contributions: an elaborate data pipeline, a robust optimization method, and an effective annealing approach, which could be beneficial for future research and development in the field. | 2. The paper's detailed technical report may be complex and require a deep understanding of machine learning and language modeling, potentially limiting accessibility for a broader audience. |
| 3. Facilitates reproducibility and further research by releasing full details of the data composition for each training phase and providing access to project details on GitHub, promoting transparency and collaboration in the AI community. | 3. The paper does not discuss the potential ethical considerations or biases that may arise from using a highly efficient but large-scale language model, which is an important aspect of responsible AI development. |

Note: The weaknesses listed are inferred based on common challenges and considerations in the field of AI research and development. The actual weaknesses may vary depending on the specific context and further analysis of the paper.

---


**[A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression](https://arxiv.org/pdf/2412.17483)**

 Summary: The study investigates gist-based context compression methods to enhance long-context processing in large language models, revealing near-lossless performance in certain tasks but identifying challenges and failure patterns in others, and proposing strategies to mitigate these issues.

| Strengths | Weaknesses |
| --- | --- |
| 1. Comprehensive investigation of gist-based context compression methods, providing valuable insights into their effectiveness and limitations. | 1. Identified specific tasks (e.g., synthetic recall) where gist-based compression faces challenges, indicating potential limitations in broader applications. |
| 2. Proposed practical strategies (fine-grained autoencoding and segment-wise token importance estimation) to improve compression capabilities, offering actionable solutions for researchers and practitioners. | 2. The study's findings on failure patterns (lost by the boundary, lost if surprise, lost along the way) suggest that further research may be needed to fully understand and address these issues in different contexts. |
| 3. Extensive experimental validation across various tasks (retrieval-augmented generation, long-document QA), demonstrating the potential of gist-based compression in enhancing long-context processing in large language models. | 3. The study's focus on specific tasks and failure patterns may limit the generalizability of its findings, suggesting a need for broader exploration of gist-based context compression in diverse scenarios. |

---


**[Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation](https://arxiv.org/pdf/2412.18176)**

 Summary: Molar is a novel framework for sequential recommendation that integrates multimodal content with collaborative filtering signals, significantly outperforming traditional and LLM-based baselines.

| Strengths | Weaknesses |
| --- | --- |
| 1. **Integration of Multimodal Data**: Molar effectively combines textual and non-textual data, enriching item embeddings and capturing a more comprehensive representation of items. | 1. **Complexity and Resource Intensity**: The framework may require significant computational resources and expertise to implement, which could limit its accessibility and scalability. |
| 2. **Collaborative Filtering Alignment**: The post-alignment mechanism aligns user representations from content-based and ID-based models, ensuring personalized recommendations and robust performance. | 2. **Potential Overfitting**: The sophisticated modeling approach might lead to overfitting, especially if not properly regularized or if the training data is not sufficiently diverse. |
| 3. **Superior Performance**: Extensive experiments demonstrate that Molar significantly outperforms traditional and LLM-based baselines, indicating its effectiveness in leveraging multimodal data and collaborative signals for sequential recommendation tasks. | 3. **Generalization Across Diverse Domains**: While Molar shows superior performance in tested scenarios, its effectiveness across different domains or with varying types of data (e.g., social media, e-commerce) may not be as pronounced without further adaptation or tuning. |

---


**[MMFactory: A Universal Solution Search Engine for Vision-Language Tasks](https://arxiv.org/pdf/2412.18072)**

 Summary: MMFactory is a universal framework that acts as a solution search engine for vision-language tasks, offering a diverse pool of programmatic solutions based on task descriptions, sample inputs/outputs, and user-defined constraints.

| Strengths | Weaknesses |
| --- | --- |
| 1. Provides a universal framework that can handle a variety of vision-language tasks, reducing the need for specialized models for each task. <br> 2. Offers a user-friendly interface by allowing users to input task descriptions, sample inputs/outputs, and constraints, simplifying the process of finding suitable solutions. <br> 3. Incorporates a committee-based solution proposer that leverages multi-agent LLM conversation, enhancing the generation of diverse, universal, and robust solutions. | 1. The effectiveness of MMFactory heavily relies on the quality and diversity of the models available in its repository, which may limit its performance on tasks requiring highly specialized models. <br> 2. The framework's ability to suggest suitable solutions is contingent on the user's ability to accurately describe the task and provide relevant sample inputs/outputs, which may pose a challenge for users with limited technical knowledge or language proficiency. <br> 3. The proposed metrics and benchmarks for performance and resource characteristics may not fully capture the complexity or nuances of certain tasks, potentially leading to suboptimal solution selection.

---
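
The open-source model loop truncates each paper with a plain character slice, which can cut the input mid-sentence. A minimal sketch of a gentler alternative follows, assuming that ending on a full sentence is preferable; `truncate_at_sentence` is a hypothetical helper, and it treats a period followed by a space as a sentence boundary.

```python
def truncate_at_sentence(text: str, max_chars: int) -> str:
    """Cut text to at most max_chars, preferring to end on a full sentence."""
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]
    # Find the last sentence-ending period inside the window
    cut = window.rfind(". ")
    if cut == -1:
        # No sentence boundary found: fall back to a hard cut
        return window
    return window[: cut + 1]


print(truncate_at_sentence("First sentence. Second sentence. Third one.", 30))
```

This still limits the input by characters. A token-based limit would need the model's tokenizer and is not shown here.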
