# In-Context Learning


In-context learning is a generalisation of few-shot learning in which the LLM is given a context as part of the prompt and asked to respond using the information in that context.

* Example: *"Summarize this research article into one paragraph highlighting its strengths and weaknesses: [insert article text]"*
* Example: *"Extract all the quotes from this text and organize them in alphabetical order: [insert text]"*
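Both examples follow the same pattern: an instruction followed by the material the model should work from. The sketch below assembles such a prompt. The function name `build_in_context_prompt` and the triple-quote delimiter around the context are illustrative assumptions, not a required format; delimiters simply help the model tell the instruction apart from the material.

```python
def build_in_context_prompt(instruction, context, delimiter='"""'):
    """Combine a task instruction with the context the model should use (hypothetical helper)."""
    # The context is wrapped in delimiters (an assumption, not a requirement)
    # so the model can separate the instruction from the material to use.
    return f"{instruction.strip()}\n\n{delimiter}\n{context.strip()}\n{delimiter}"


prompt = build_in_context_prompt(
    "Extract all the quotes from this text and organize them in alphabetical order:",
    'She said "zebra". He replied "apple".',
)
print(prompt)
```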

Retrieval-Augmented Generation (RAG), a popular technique covered in week 5, is a form of in-context learning where:
* a search engine is used to retrieve some relevant information
* that information is then provided to the LLM as context
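The two RAG steps can be sketched in a few lines. Here a toy word-overlap ranking stands in for the search engine; that retriever, and the names `retrieve` and `rag_prompt`, are hypothetical simplifications for illustration, not how a production RAG system scores documents.

```python
import re


def tokenize(text):
    """Lower-cased alphanumeric words of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Step 1: rank documents by word overlap with the query (toy stand-in for a search engine)."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def rag_prompt(query, documents, k=2):
    """Step 2: place the retrieved documents in the prompt as context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


docs = [
    "Gemini is a family of models from Google.",
    "Phi-3.5-mini is a small model from Microsoft.",
    "Paris is the capital of France.",
]
print(rag_prompt("Which company makes Phi-3.5-mini?", docs, k=1))
```

The resulting string would then be sent to the LLM exactly like any other prompt; only the way the context is obtained differs.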


In this example we download recent research papers from arXiv, extract the text from the PDF files and ask Gemini to summarize each article and state its main strengths and weaknesses. Finally, we write the summaries to a local HTML file and display them as Markdown.

In [16]:
import os
import requests
from bs4 import BeautifulSoup
import google.generativeai as genai
from urllib.request import urlopen, urlretrieve
from IPython.display import Markdown, display
from pypdf import PdfReader
from datetime import date
from tqdm import tqdm

In [17]:
# Read the key from an environment variable; never hard-code API keys in a notebook
API_KEY = os.environ.get("GEMINI_API_KEY")
genai.configure(api_key=API_KEY)

We select the papers featured on the Hugging Face Papers page.

In [18]:
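BASE_URL = "https://huggingface.co/papers"
page = requests.get(BASE_URL, timeout=30)
page.raise_for_status()  # fail early if the page could not be fetched
soup = BeautifulSoup(page.content, "html.parser")
h3s = soup.find_all("h3")

papers = []

for h3 in h3s:
    a = h3.find("a")
    if a is None or not a.get("href"):
        continue  # skip headings that do not link to a paper
    title = a.get_text(strip=True)
    # "/papers/2412.17743" -> "/2412.17743"
    link = a["href"].replace('/papers', '')

    papers.append({"title": title, "url": f"https://arxiv.org/pdf{link}"})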
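The scraping step relies on one structural assumption: every paper title on the page is an `<a>` inside an `<h3>`, and its `href` ends in the arXiv ID. The dependency-free sketch below makes that assumption explicit with Python's standard `html.parser` and can be tested offline on a hand-written snippet. The class `PaperLinkParser` and the helper `to_arxiv_pdf` are hypothetical names, and the sample HTML is invented, not copied from the live page.

```python
from html.parser import HTMLParser


class PaperLinkParser(HTMLParser):
    """Collect (title, href) for every <a> nested inside an <h3>."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.href = None   # href of the <a> currently being read, if any
        self.text = []     # text fragments of that <a>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            self.links.append(("".join(self.text).strip(), self.href))
            self.href = None
        elif tag == "h3":
            self.in_h3 = False


def to_arxiv_pdf(href):
    # "/papers/2412.17743" -> "https://arxiv.org/pdf/2412.17743"
    return "https://arxiv.org/pdf/" + href.rstrip("/").rsplit("/", 1)[-1]


sample = ('<h3><a href="/papers/2412.17743">YuLan-Mini: An Open Data-efficient Language Model</a></h3>'
          '<h3>No link here</h3><p><a href="/other">not a paper</a></p>')
parser = PaperLinkParser()
parser.feed(sample)
sample_papers = [{"title": t, "url": to_arxiv_pdf(h)} for t, h in parser.links]
print(sample_papers)
```

Headings without a link and links outside `<h3>` are ignored, so only paper entries survive.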

Helper functions to extract text from HTML pages and PDF files, and to display Markdown in the notebook.

In [19]:
def extract_paper(url):
    html = urlopen(url).read()
    soup = BeautifulSoup(html, features="html.parser")

    # kill all script and style elements
    for script in soup(["script", "style"]):
        script.extract()    # rip it out

    # get text
    text = soup.get_text()

    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)

    return text


def extract_pdf(url):
    # Download the PDF to a local file, then concatenate the text of every page
    urlretrieve(url, "pdf_file.pdf")
    reader = PdfReader("pdf_file.pdf")
    text = ""
    for page in reader.pages:
        text += page.extract_text() + "\n"
    return text


def printmd(string):
    display(Markdown(string))
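The whitespace clean-up inside `extract_paper` is easiest to understand on a small string. The sketch below isolates those three steps in a standalone function; the name `normalize_whitespace` is hypothetical. Note that `extract_paper` is not called later in this section, so this only illustrates what it would do to scraped HTML text.

```python
def normalize_whitespace(text):
    """The same clean-up steps as extract_paper, applied to an arbitrary string."""
    # 1. Strip leading/trailing whitespace from every line.
    lines = (line.strip() for line in text.splitlines())
    # 2. Treat runs of two spaces as separators between "headlines" on one line.
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # 3. Drop the empty chunks left by blank lines and extra spaces.
    return "\n".join(chunk for chunk in chunks if chunk)


print(normalize_whitespace("  Title  \n\n  Abstract  Introduction  "))
# Title
# Abstract
# Introduction
```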

In [20]:
LLM = "gemini-1.5-flash"
model = genai.GenerativeModel(LLM)

We use Gemini to summarize the papers.

In [21]:
for paper in tqdm(papers):
    try:
        paper["summary"] = model.generate_content("Summarize this research article into one paragraph, without formatting, highlighting its strengths and weaknesses. " + extract_pdf(paper["url"])).text
    except Exception as e:  # report the error instead of silently swallowing it
        print(f"Generation failed for {paper['title']}: {e}")
        paper["summary"] = "Paper not available"

100%|██████████| 4/4 [00:20<00:00,  5.07s/it]
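The loop above gives up on a paper after a single failed call, although API and download errors are often transient. A common remedy is to retry with exponential backoff. The helper below is a generic sketch (the name `with_retries` and its defaults are assumptions); it is not part of the `google.generativeai` API and wraps any zero-argument callable.

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); after a failure wait base_delay * 2**i seconds and try again."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: let the caller handle the error
            time.sleep(base_delay * 2 ** i)


# Demonstration with a function that fails twice before succeeding
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"


result = with_retries(flaky, attempts=3, base_delay=0)
print(result)  # → ok
```

Inside the summarization loop it could be used as `with_retries(lambda: model.generate_content(prompt).text)`, keeping the existing `except` as the final fallback.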


We write the results to an HTML file.

In [22]:
import html

# Build a well-formed page: metadata in <head>, visible content in <body>
page = f"""<html>
<head><meta charset="utf-8"><title>Daily Dose of AI Research</title></head>
<body>
<h1>Daily Dose of AI Research</h1> <h4>{date.today()}</h4> <p><i>Summaries generated with: {LLM}</i></p>
"""
for paper in papers:
    # Escape model output so stray <, > or & characters cannot break the markup
    page += f'<h2><a href="{paper["url"]}">{html.escape(paper["title"])}</a></h2> <p>{html.escape(paper["summary"])}</p>\n'
page += "</body></html>"

with open("papers.html", "w", encoding="utf-8") as f:
    f.write(page)

We can also display the results in this notebook as Markdown.

In [23]:
for paper in papers:
    printmd("**[{}]({})**<br>{}<br><br>".format(paper["title"],
                                                paper["url"],
                                                paper["summary"]))

**[YuLan-Mini: An Open Data-efficient Language Model](https://arxiv.org/pdf/2412.17743)**<br>This research paper details the development of YuLan-Mini, a 2.42B parameter language model achieving state-of-the-art performance for its size.  Its strengths lie in a meticulously designed data pipeline incorporating data cleaning, scheduling strategies, and synthetic data generation, particularly for reasoning tasks.  A robust optimization method effectively mitigates training instability, even with a large learning rate,  and an annealing approach with targeted data selection and long-context training further enhances performance.  The authors' open-sourcing of the training details and data composition promotes reproducibility. However, a weakness is the relatively limited training data (1.08T tokens) compared to industry models, although still significantly less than those achieving comparable performance.  Further limitations include the relatively short context length achieved (28K tokens) due to resource constraints and the challenge of fully replicating baseline model results due to incomplete public reporting of their methodologies.

**[A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression](https://arxiv.org/pdf/2412.17483)**<br>This research paper comprehensively investigates gist token-based context compression methods for improving long-context processing in large language models (LLMs).  The study finds that while these methods achieve near-lossless performance on tasks like retrieval-augmented generation and long-document QA, they struggle with tasks requiring precise recall, like synthetic recall.  The authors identify three key failure patterns stemming from compression bottlenecks: information loss at segment boundaries, preferential retention of contextually relevant information, and information loss during multi-step reasoning. To address these weaknesses, they propose two effective strategies: fine-grained autoencoding and segment-wise token importance estimation, which significantly improve performance, especially under lower compression ratios. A strength of the study is its thorough experimental evaluation across diverse tasks and its insightful analysis of failure modes. However, a limitation is the scope of compression methods considered, focusing solely on gist token-based approaches and not accounting for the potential impact of using larger model sizes or different training data.

**[Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation](https://arxiv.org/pdf/2412.18176)**<br>Molar is a novel multimodal large language model (MLLM) framework for sequential recommendation that addresses the limitations of existing LLM-based approaches by integrating collaborative filtering.  Its strengths lie in its use of an MLLM to generate unified item representations from textual and non-textual data, enriching item embeddings, and a post-alignment mechanism that effectively combines content-based and ID-based user representations for improved personalization.  Experiments show significant performance improvements over traditional and other LLM-based methods across multiple datasets. However, a weakness is the computationally expensive multi-task fine-tuning required for optimal performance, potentially hindering real-time applications.  Furthermore, the reliance on a pre-trained MLLM means that the quality of the recommendations is dependent on the capabilities of that underlying model.

**[MMFactory: A Universal Solution Search Engine for Vision-Language Tasks](https://arxiv.org/pdf/2412.18072)**<br>MMFactory is a novel framework designed as a universal solution search engine for vision-language tasks.  It addresses limitations of existing methods by proposing a diverse pool of programmatic solutions composed of various vision, language, and vision-language models, tailored to user-specified tasks, sample input-output pairs, and resource constraints.  A committee-based solution proposer, leveraging multi-agent LLM conversation, generates executable and robust solutions.  Experimental results demonstrate state-of-the-art performance on benchmark datasets.  However,  a weakness is the computational cost of the multi-agent system, although the framework mitigates this by generating reusable solutions applicable across all task instances, reducing the overall API call cost compared to sample-specific approaches.  Further research could explore optimizing the multi-agent conversation process to reduce runtime while maintaining solution quality.

In [24]:
# Modified prompt for tabulated analysis
for paper in tqdm(papers):
    try:
        prompt = """Analyze this research article and provide:
1. A brief one-sentence summary
2. Key strengths (list 2-3 points)
3. Key weaknesses (list 2-3 points)

Format the response as follows:
Summary: [one sentence]
| Strengths | Weaknesses |
| --- | --- |
| [strength 1] | [weakness 1] |
| [strength 2] | [weakness 2] |
| [strength 3] | [weakness 3] |

Article text: """ + extract_pdf(paper["url"])

        paper["analysis"] = model.generate_content(prompt).text
    except Exception as e:  # report the error instead of silently swallowing it
        print(f"Generation failed for {paper['title']}: {e}")
        paper["analysis"] = "Paper not available"

# Modified markdown printing
for paper in papers:
    printmd(f"""**[{paper['title']}]({paper['url']})**\n
{paper['analysis']}\n\n---\n""")

100%|██████████| 4/4 [00:20<00:00,  5.12s/it]


**[YuLan-Mini: An Open Data-efficient Language Model](https://arxiv.org/pdf/2412.17743)**

Summary: YuLan-Mini is a data-efficient 2.42B parameter language model achieving top-tier performance among similarly sized models by employing an elaborate data pipeline, robust optimization methods, and an effective annealing approach.

| Strengths | Weaknesses |
|---|---|
| Achieves top-tier performance comparable to much larger models with significantly less training data (data efficiency). | Limited long-context capabilities due to resource constraints on long context training. |
| Openly releases full training details and data composition to facilitate reproducibility. |  Lack of detailed comparison with baseline models due to missing information on baseline evaluation setups. |
| Employs multiple innovative techniques to improve training stability and efficiency, such as a combined µP-like initialization with WeSaR re-parameterization, and fused kernels for faster training. |




---


**[A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression](https://arxiv.org/pdf/2412.17483)**

Summary: This research comprehensively investigates gist token-based context compression for long-context processing in large language models, identifying key failure patterns and proposing strategies to mitigate them.

| Strengths | Weaknesses |
|---|---|
| Thorough investigation and comprehensive evaluation of gist token-based context compression across various tasks and architectures. | Limited model scale and context length in experiments, restricting generalizability to larger models and longer contexts. |
| Identification of three critical failure patterns (lost by the boundary, lost if surprise, and lost along the way) providing valuable insights into the limitations of current methods. |  Focus solely on gist token-based compression; exclusion of other context compression methods prevents a broader comparison and limits the conclusions. |
| Proposal of two effective strategies (fine-grained autoencoding and segment-wise token importance estimation) to improve compression performance. |




---


**[Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation](https://arxiv.org/pdf/2412.18176)**

Summary: Molar, a novel multimodal large language model framework, enhances sequential recommendation by integrating multiple content modalities with collaborative filtering signals through a post-alignment mechanism, significantly outperforming traditional and existing LLM-based methods.

| Strengths | Weaknesses |
|---|---|
|  Effectively integrates multimodal data (text, images) with collaborative filtering signals for improved accuracy and robustness. | Requires multi-task fine-tuning, which can be computationally expensive and time-consuming. |
|  Post-alignment mechanism prevents premature integration of collaborative filtering, preserving the strengths of both LLM and traditional methods. | Performance heavily depends on the underlying capabilities of the MLLM used; suboptimal base models can degrade overall performance. |
| Consistently outperforms traditional and state-of-the-art LLM-based baselines across multiple datasets. | Unable to train larger LLMs due to computational constraints. |



---


**[MMFactory: A Universal Solution Search Engine for Vision-Language Tasks](https://arxiv.org/pdf/2412.18072)**

Summary: MMFactory is a universal framework that acts as a solution search engine for vision-language tasks, suggesting multiple programmatic solutions based on user-defined tasks, sample input-output pairs, and constraints, and benchmarking their performance and resource characteristics.

| Strengths | Weaknesses |
|---|---|
| Proposes multiple programmatic solutions for a given task, allowing users to choose the best option based on their constraints. | The framework relies on the availability of a suitable pool of pre-trained models, which might limit its applicability if such models are not accessible or insufficient for a specific task. |
| Generates solutions applicable to all instances of a user-defined task, reducing the need for repeated solution generation for individual samples. | The computational cost of running the multi-agent system and generating solutions can be high, especially as the number of solutions in the pool increases. |
| Leverages a multi-agent system for solution proposal, enhancing the robustness and quality of generated solutions.  |  The reliance on large language models as the core component makes the method dependent on the capabilities and biases of these models. |



---
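The model does not always follow the requested format exactly: in the YuLan-Mini and gist-token tables above, the last row has a strength but no weakness cell. Before reusing the analyses programmatically, it can help to parse and check them. The sketch below assumes the `Summary:` line plus Markdown-table layout requested in the prompt; `parse_analysis` is a hypothetical helper, and padding short rows with an empty string is a design choice, not something the model guarantees.

```python
def parse_analysis(reply):
    """Split a reply in the requested format into (summary, [(strength, weakness), ...])."""
    summary, rows = None, []
    for line in reply.splitlines():
        line = line.strip()
        if line.startswith("Summary:"):
            summary = line[len("Summary:"):].strip()
        elif line.startswith("|"):
            cells = [c.strip() for c in line.strip("|").split("|")]
            # Skip the header row and the |---|---| separator row
            if cells[0] == "Strengths" or all(set(c) <= set("-: ") for c in cells):
                continue
            cells += [""] * (2 - len(cells))  # pad a row that lacks a weakness cell
            rows.append(tuple(cells[:2]))
    return summary, rows


sample_reply = """Summary: A small, data-efficient model.

| Strengths | Weaknesses |
|---|---|
| Data efficient | Short context |
| Open training details |
"""
summary, rows = parse_analysis(sample_reply)
if summary is None or not rows:
    print("Reply did not follow the requested format")
print(summary, rows)
```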


In [25]:
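# Modified HTML printing: render each analysis as a proper HTML table
import html


def analysis_to_html(analysis):
    """Convert a 'Summary: ...' line plus a Markdown table into HTML."""
    out, in_table = [], False
    for line in analysis.strip().splitlines():
        line = line.strip()
        if line.startswith("|"):
            cells = [html.escape(c.strip()) for c in line.strip("|").split("|")]
            if all(set(c) <= set("-: ") for c in cells):
                continue  # skip the |---|---| separator row
            tag = "td" if in_table else "th"  # the first table row is the header
            if not in_table:
                out.append("<table>")
                in_table = True
            out.append("<tr>" + "".join(f"<{tag}>{c}</{tag}>" for c in cells) + "</tr>")
        else:
            if in_table:  # any non-table line closes the current table
                out.append("</table>")
                in_table = False
            if line:
                out.append(f"<p>{html.escape(line)}</p>")
    if in_table:
        out.append("</table>")
    return "\n".join(out)


page = f"""<html>
<head>
    <meta charset="utf-8">
    <title>Daily Dose of AI Research</title>
    <style>
        table {{
            border-collapse: collapse;
            width: 100%;
            margin: 20px 0;
        }}
        th, td {{
            border: 1px solid #ddd;
            padding: 8px;
            text-align: left;
        }}
        th {{
            background-color: #f2f2f2;
        }}
    </style>
</head>
<body>
    <h1>Daily Dose of AI Research</h1>
    <h4>{date.today()}</h4>
    <p><i>Analysis generated with: {LLM}</i></p>
"""

for paper in papers:
    page += f"""
    <h2><a href="{paper['url']}">{html.escape(paper['title'])}</a></h2>
    {analysis_to_html(paper['analysis'])}
    <hr>"""

page += "</body></html>"

with open("papers_table.html", "w", encoding="utf-8") as f:
    f.write(page)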

In [26]:
# Open source model setup
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    device_map="cuda" if torch.cuda.is_available() else "cpu", # Use GPU if available
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

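generation_args = {
    "max_new_tokens": 500,      # upper bound on the length of each analysis
    "return_full_text": False,  # return only the generated reply, not the prompt
    "temperature": 0.1,         # low temperature keeps sampling close to greedy
    "do_sample": True,
    "top_p": 0.9,               # nucleus sampling over the top 90% of probability mass
}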

# Modified paper analysis loop for the open source model
for paper in tqdm(papers):
    try:
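        # Keep only the first 2,000 characters (not tokens) of the paper,
        # which usually covers little more than the title, authors and abstract
        pdf_text = extract_pdf(paper["url"])
        max_input_length = 2000
        truncated_text = pdf_text[:max_input_length]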

        messages = [{
            "role": "system",
            "content": "You are a research paper analyzer. Provide analysis in a table format with strengths and weaknesses."
        }, {
            "role": "user",
            "content": f"""Analyze this research article and provide:
1. A brief one-sentence summary
2. Key strengths (list 2-3 points)
3. Key weaknesses (list 2-3 points)

Format the response as follows:
Summary: [one sentence]
| Strengths | Weaknesses |
| --- | --- |
| [strength 1] | [weakness 1] |
| [strength 2] | [weakness 2] |
| [strength 3] | [weakness 3] |

Article text: {truncated_text}"""
        }]

        paper["analysis"] = pipe(messages, **generation_args)[0]['generated_text']
    except Exception as e:
        print(f"Generation failed for {paper['title']}: {e}")
        paper["analysis"] = "Paper not available"

# Modified markdown printing
for paper in papers:
    printmd(f"""**[{paper['title']}]({paper['url']})**\n
{paper['analysis']}\n\n---\n""")


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda
100%|██████████| 4/4 [01:59<00:00, 29.97s/it]


**[YuLan-Mini: An Open Data-efficient Language Model](https://arxiv.org/pdf/2412.17743)**

 Summary: YuLan-Mini is a highly efficient language model with 2.42B parameters that delivers top-tier performance with a significantly reduced data requirement compared to industry-leading models, achieved through an innovative pre-training approach.

| Strengths | Weaknesses |
| --- | --- |
| 1. Achieves top-tier performance with a significantly smaller dataset (1.08T tokens) compared to industry standards, demonstrating data efficiency. | 1. The paper may not fully address the potential limitations or challenges in scaling the model beyond the current parameter size or data volume. |
| 2. Introduces a novel pre-training approach with three key technical contributions (data pipeline, robust optimization, and effective annealing), which could be beneficial for future research and development in the field. | 2. The paper's detailed technical report might be complex and require a deep understanding of machine learning and language modeling, potentially limiting accessibility for a broader audience. |
| 3. Facilitates reproducibility and further research by releasing full details of the data composition for each training phase and project details on GitHub, promoting transparency and collaboration in the AI community. | 3. The paper does not discuss the potential ethical considerations or biases that may arise from the model's performance, which is an important aspect of responsible AI development and deployment. |

Note: The weaknesses listed are inferred based on common challenges and considerations in AI research and development. The actual weaknesses may vary depending on the specific context and depth of the paper's analysis.

---


**[A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression](https://arxiv.org/pdf/2412.17483)**

 Summary: The study investigates gist-based context compression methods to enhance long-context processing in large language models, revealing near-lossless performance in certain tasks but identifying challenges and failure patterns in others, and proposing strategies to mitigate these issues.

| Strengths | Weaknesses |
| --- | --- |
| 1. Comprehensive investigation of gist-based context compression methods, providing valuable insights into their effectiveness. | 1. Identified specific tasks (e.g., synthetic recall) where gist-based compression faces challenges, indicating limitations in its applicability. |
| 2. Proposed practical strategies (fine-grained autoencoding and segment-wise token importance estimation) to improve compression capabilities and address identified issues. | 2. The study's findings are based on extensive experiments, but the generalizability of the results to other models or tasks may not be fully established. |
| 3. Contribution to the broader field of artificial intelligence by advancing understanding of long-context processing in large language models, which is crucial for future AI development. | 3. The study focuses on gist-based compression, which may not encompass all possible context compression techniques, potentially limiting the scope of future research in this area. |

Note: The strengths and weaknesses provided are inferred from the given abstract and may not cover all aspects of the full research article. A more detailed analysis would require access to the complete text.

---


**[Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation](https://arxiv.org/pdf/2412.18176)**

 Summary: Molar is a novel framework for sequential recommendation that integrates multimodal content with collaborative filtering signals to enhance recommendation accuracy.

| Strengths | Weaknesses |
| --- | --- |
| 1. **Integration of Multimodal Data**: Molar effectively combines textual and non-textual data, enriching item embeddings and capturing a more comprehensive representation of items. | 1. **Complexity and Resource Intensity**: The framework may require significant computational resources due to the integration of multiple modalities and the use of large language models, potentially limiting its scalability or accessibility. |
| 2. **Collaborative Filtering Alignment**: By aligning user representations from content-based and ID-based models, Molar ensures precise personalization, leading to robust performance in recommendation tasks. | 2. **Data Privacy Concerns**: The use of ID information for collaborative filtering could raise privacy concerns, especially if not handled with proper anonymization and security measures. |
| 3. **Superior Performance**: Extensive experiments demonstrate that Molar significantly outperforms traditional and LLM-based baselines, indicating its effectiveness in leveraging multimodal data and collaborative signals for sequential recommendation. | 3. **Generalization Across Diverse Domains**: While Molar shows strong performance, its effectiveness may vary across different domains or datasets, requiring further investigation to ensure its adaptability and generalizability. |

---


**[MMFactory: A Universal Solution Search Engine for Vision-Language Tasks](https://arxiv.org/pdf/2412.18072)**

 Summary: MMFactory is a universal framework that acts as a solution search engine for vision-language tasks, offering a diverse pool of programmatic solutions based on task descriptions, sample inputs/outputs, and user-defined constraints.

| Strengths | Weaknesses |
| --- | --- |
| 1. Provides a universal framework that can handle a variety of vision-language tasks, offering flexibility and accessibility. | 1. The effectiveness of the suggested solutions may depend on the quality and relevance of the input-output pairs provided, which could limit the framework's applicability in some scenarios. |
| 2. Incorporates user-defined constraints such as performance and resource requirements, allowing for more tailored and efficient solutions. | 2. The framework's performance and accuracy might be influenced by the quality and diversity of the models available in its repository, which could limit its effectiveness if the repository is not comprehensive. |
| 3. Utilizes a committee-based solution proposer and leverages multi-agent LLM conversation, potentially enhancing the generation of diverse, universal, and robust solutions. | 3. The complexity of the framework and its reliance on advanced AI components might pose challenges for users with limited technical expertise, potentially limiting its accessibility and ease of use. |
| 4. Offers metrics and benchmarks for performance and resource characteristics, enabling users to make informed decisions based on their unique design constraints. | 4. The framework's ability to synthesize solutions and propose metrics might require significant computational resources, which could be a barrier for users with limited access to such resources. |
| 5. Acts as a solution search engine, streamlining the process of finding suitable models and tools for vision-language tasks, which could save time and effort for users. | 5. The framework's reliance on a model repository means that its effectiveness is contingent on the continuous updating and maintenance of this repository, which could be a potential limitation. |

---
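Because the open-source loop keeps only the first 2,000 characters, the Phi-3.5 notes above state that their weaknesses are inferred from the abstract. A hard character cut can also end mid-word. The sketch below is one possible refinement (the name `truncate_at_boundary` and the "second half of the window" rule are assumptions): it still respects the character budget but prefers to stop at a paragraph or sentence break. Counting tokens with the model's tokenizer would be a more precise budget than characters.

```python
def truncate_at_boundary(text, max_chars=2000):
    """Cut text to at most max_chars, preferring to end at a paragraph or sentence break."""
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]
    for sep in ("\n\n", ". ", "\n"):
        cut = window.rfind(sep)
        # Only accept a break in the second half, so we never discard most of the window
        if cut > max_chars // 2:
            return window[: cut + len(sep)].rstrip()
    return window  # no suitable break: fall back to a hard cut


text = "First sentence. Second sentence. Third sentence is long."
print(truncate_at_boundary(text, 35))  # → First sentence. Second sentence.
```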
