In [1]:
%%time

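import fitz  # PyMuPDF

pdf_path = "XCon Learning with Experts for Fine-grained Category Discovery.pdf"

def extract_text_from_pdf(pdf_path):
    """Return the text of every page in the PDF, concatenated in page order."""
    # The context manager closes the document even if extraction fails.
    with fitz.open(pdf_path) as doc:
        return "".join(page.get_text() for page in doc)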

pdf_text = extract_text_from_pdf(pdf_path)
print(pdf_text)

Y. FEI ET AL.: LEARNING WITH EXPERTS FOR FINE-GRAINED CATEGORY DISCOVERY 1
XCon: Learning with Experts for
Fine-grained Category Discovery
Yixin Fei1
yixin.feiyx@gmail.com
Zhongkai Zhao1
zhongkai.zhaok@gmail.com
Siwei Yang1,3
swyang.ac@gmail.com
Bingchen Zhao2,3
zhaobc.gm@gmail.com
1 Tongji University
Shanghai, China
2 University of Edinburgh,
Edinburgh, UK
3 LunarAI
Abstract
We address the problem of generalized category discovery (GCD) in this paper, i.e.
clustering the unlabeled images leveraging the information from a set of seen classes,
where the unlabeled images could contain both seen classes and unseen classes. The seen
classes can be seen as an implicit criterion of classes, which makes this setting different
from unsupervised clustering where the cluster criteria may be ambiguous. We mainly
concern the problem of discovering categories within a ﬁne-grained dataset since it is one
of the most direct applications of category discovery, i.e. helping experts discover novel
conce
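The extracted text keeps PDF artifacts such as the "ﬁ" ligature and words hyphenated across line breaks, and both carry over into the summaries. A minimal cleanup sketch, assuming it is acceptable to normalize the text before summarizing, is shown here. The helper name `clean_pdf_text` is hypothetical, and the de-hyphenation rule is a heuristic: it would also join a genuine compound that happens to break at a line end.

```python
import re
import unicodedata

def clean_pdf_text(raw: str) -> str:
    """Heuristic cleanup for text extracted from a PDF (illustrative only)."""
    # NFKC folds compatibility characters, e.g. the single-glyph "ﬁ" ligature, into "fi".
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words split by a hyphen at a line break ("recogni-\ntion" -> "recognition").
    text = re.sub(r"([a-z])-\n([a-z])", r"\1\2", text)
    # Turn single line breaks into spaces; blank lines (paragraph breaks) are kept.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

It could be applied as `pdf_text = clean_pdf_text(pdf_text)` before the text is chunked.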

In [4]:
%%time

from transformers import BartTokenizer, BartForConditionalGeneration

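def chunk_text(text, max_chars=1024):
    # Note: this slices by characters, not tokens. For English text a
    # 1024-character slice is typically far fewer than BART's 1024-token limit,
    # so the truncation in summarize_chunks rarely triggers.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]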

def summarize_chunks(chunks):
    summaries = []
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
    for chunk in chunks:
        inputs = tokenizer(chunk, return_tensors="pt", max_length=1024, truncation=True)
        summary_ids = model.generate(inputs.input_ids, max_length=150, early_stopping=True)
        summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
        summaries.append(summary)
    return summaries

def concatenate_summaries(summaries):
    return " ".join(summaries)

# Your long text
long_text = pdf_text

# Step 1: Chunk the text
text_chunks = chunk_text(long_text)

# Step 2: Summarize each chunk
chunk_summaries = summarize_chunks(text_chunks)

# Step 3: Concatenate the summaries
final_summary = concatenate_summaries(chunk_summaries)
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
tokens = tokenizer.tokenize(final_summary)
number_of_tokens = len(tokens)
print("number_of_tokens: ", number_of_tokens)

print(final_summary)

number_of_tokens:  2636
Y. FEI ET AL.: LEARNING WITH EXPERTS FOR FINE-GRAINED CATEGORY DISCOVERY 1                XCon: Learning with Experts forFine-grained Category Discovery. We address the problem of generalized category discovery (GCD) in this paper. Expert-Contrastive Learning (XCon) is a novel method to mine useful information from the images. It uses k-means clustering and then performing contrastive learning on each sub-dataset to learn discriminative features. Experiments show a clear improved performance over the previous best methods, demonstrating the effectiveness of our method. ations are available, such as image recogni-tion [5] and object detection. However, collecting a dataset at scales like ImageNet or COCO is not always possible. The problem of generalized category discovery was recently formalized in [26] The aim is to discover categories within the unlabeled data by leveraging the information. Clusters formed by DINO features are mainly based on the class irrelev
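Because the cell above slices by characters, each chunk uses only part of the model's input window and may cut a word in half. A sketch of chunking by token count instead is given here. The function `chunk_by_tokens` and its `encode`/`decode` parameters are hypothetical names introduced for illustration, so that the logic can be exercised without downloading a model.

```python
from typing import Callable, List, Sequence

def chunk_by_tokens(
    text: str,
    encode: Callable[[str], Sequence],
    decode: Callable[[Sequence], str],
    max_tokens: int = 1024,
) -> List[str]:
    """Split text into pieces of at most max_tokens tokens each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    ids = encode(text)
    # Slice the token sequence, then turn each slice back into text.
    return [decode(ids[i:i + max_tokens]) for i in range(0, len(ids), max_tokens)]

# Toy demonstration with whitespace "tokens".
print(chunk_by_tokens("a b c d e", str.split, " ".join, max_tokens=2))

# With the BART tokenizer from the cell above, one possible wiring (an
# assumption, not tested here) is:
#   encode = lambda t: tokenizer.encode(t, add_special_tokens=False)
#   decode = tokenizer.decode
# using a max_tokens slightly below 1024 to leave room for special tokens.
```

Decoding and re-encoding a slice does not always round-trip to exactly the same token count, so keeping `truncation=True` in the summarization call remains a sensible safeguard.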

In [3]:
%%time

from transformers import pipeline, BertTokenizer

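# Note: this BERT tokenizer is only used to count tokens in the final summary.
# Its count will differ from one produced by BART's own tokenizer.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')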
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize_text(text, max_chunk=1024):
    # max_chunk is measured in characters, not tokens.
    chunks = [text[i:i+max_chunk] for i in range(0, len(text), max_chunk)]
    summarized_text = ""
    for chunk in chunks:
        summary = summarizer(chunk, max_length=200, min_length=50, do_sample=False)
        summarized_text +=  "\n"
        summarized_text += summary[0]['summary_text']
    return summarized_text


summary = summarize_text(pdf_text)
tokens = tokenizer.tokenize(summary)
number_of_tokens = len(tokens)
print("number_of_tokens: ", number_of_tokens)

print(summary)

number_of_tokens:  2258

We address the problem of generalized category discovery (GCD) in this paper, i.e. leveraging the information from a set of seen classes. The seen classes can be seen as an implicit criterion of classes, which makes this setting different from unsupervised clustering where the cluster criteria may be ambiguous. We mainly Concern the problem. of discovering categories within a ﬁne-grained dataset.
Expert-Contrastive Learning (XCon) is a novel method to mine useful information from images. It uses k-means clustering and contrastive learning on each sub-datasets to learn discriminative features. Experiments on ﬁne-grained datasets show a clear improved performance over the previous best methods.
 collecting a dataset at scales like ImageNet or COCO is not always possible. The problem of generalized category discovery was recently formalized in [26] The aim is to discover categories within the unlabeled data by leveraging the information.
Clusters formed by DINO fe
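The combined summary is still over two thousand tokens long, which may be more than is comfortable to place in a single prompt. One common option is to summarize the summary again until it fits a budget. The sketch below is an assumption about how that could be done, not part of the original notebook. `summarize_to_budget` and all of its parameters are hypothetical, and the callables are injected so the loop can be checked without a model.

```python
from typing import Callable, List

def summarize_to_budget(
    text: str,
    summarize: Callable[[str], str],
    split: Callable[[str], List[str]],
    count_tokens: Callable[[str], int],
    budget: int,
    max_rounds: int = 5,
) -> str:
    """Re-summarize text chunk by chunk until it fits the token budget."""
    for _ in range(max_rounds):
        if count_tokens(text) <= budget:
            break
        shorter = " ".join(summarize(chunk) for chunk in split(text))
        # Stop if a round makes no progress, to avoid looping uselessly.
        if count_tokens(shorter) >= count_tokens(text):
            break
        text = shorter
    return text

# Toy stand-ins: a "summary" is the first word of a chunk of four words.
def toy_split(t: str) -> List[str]:
    words = t.split()
    return [" ".join(words[i:i + 4]) for i in range(0, len(words), 4)]

toy_text = " ".join(f"w{i}" for i in range(16))
print(summarize_to_budget(toy_text, lambda c: c.split()[0], toy_split,
                          lambda t: len(t.split()), budget=2))
```

In the notebook, `summarize` could wrap the `summarizer` pipeline and `count_tokens` could wrap `tokenizer.tokenize`. Each extra round loses detail, so the budget is a trade-off rather than a fixed rule.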

In [3]:
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="h2oai/h2o-danube-1.8b-chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# We use the HF Tokenizer chat template to format each message
# https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {"role": "user", "content": "Please convert the following text into a presentation. Give title and content for each slide. " + summary},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
res = pipe(
    prompt,
    max_new_tokens=256,
)
print(res[0]["generated_text"])

KeyboardInterrupt: 
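The generation cell was interrupted, so the format of the model's reply is unknown. If the reply used headers such as `Slide 1: Title` followed by content lines, it could be turned into `(title, content)` pairs with a small parser like the one below. The header pattern and the name `parse_slides` are assumptions made for illustration. A real reply may need a different pattern.

```python
import re
from typing import List, Tuple

# Assumed header shape: "Slide 3: Some Title" (optionally prefixed with "#").
SLIDE_HEADER = re.compile(
    r"^\s*(?:#+\s*)?slide\s*\d+\s*[:.\-]\s*(.+?)\s*$", re.IGNORECASE
)

def parse_slides(generated: str) -> List[Tuple[str, str]]:
    """Split generated text into (title, content) pairs, one per slide header."""
    slides: List[Tuple[str, str]] = []
    title, body = None, []
    for line in generated.splitlines():
        match = SLIDE_HEADER.match(line)
        if match:
            # A new header closes the previous slide, if there was one.
            if title is not None:
                slides.append((title, "\n".join(body).strip()))
            title, body = match.group(1), []
        elif title is not None:
            body.append(line)
        # Lines before the first header (e.g. an echoed prompt) are ignored.
    if title is not None:
        slides.append((title, "\n".join(body).strip()))
    return slides

sample = "Sure!\nSlide 1: Introduction\n- GCD problem\n- XCon method\n\nSlide 2: Results\nBetter accuracy\n"
print(parse_slides(sample))
```

The resulting list has the same shape as a hand-written list of `(title, content)` tuples, so it can feed a slide-building loop directly.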

In [5]:
%%time

from pptx import Presentation

def add_slide(prs, title, content):
    slide_layout = prs.slide_layouts[1]  # Layout 1 is "Title and Content" in the default template
    slide = prs.slides.add_slide(slide_layout)
    title_placeholder = slide.shapes.title
    content_placeholder = slide.placeholders[1]

    title_placeholder.text = title
    content_placeholder.text = content

# Create a PowerPoint presentation object
prs = Presentation()

# Add slides with titles and content
slides = [
    ("Introduction", "Paper Title: XCon: Learning with Experts for Fine-grained Category Discovery\n\nProblem Addressed: Generalized Category Discovery (GCD)\n\nMethodology: Expert-Contrastive Learning (XCon)\n\nKey Results: Improved performance over previous methods"),
    ("Background and Motivation", "Challenge: Generic category discovery requires large datasets like ImageNet or COCO, which may not always be feasible.\n\nFormalization of GCD: Leveraging unlabeled data to discover categories, focusing on fine-grained concepts.\n\nLimitation of Existing Approaches: Unsupervised representations may cluster data based on irrelevant cues.\n\nProposed Solution: Expert Contrastive Learning (XCon) to eliminate negative influences and discover fine-grained categories effectively."),
    ("Methodology Overview", "XCon Method: Partition data into k expert sub-datasets using k-means clustering.\n\nEach sub-dataset treated as an expert dataset to eliminate negative influences.\n\nObjective: Learn discriminative features for fine-grained category discovery."),
    ("Contrastive Learning in XCon", "Utilizing k-means grouping on self-supervised features for informative contrastive pairs.\n\nJoint contrastive representation learning on partitioned sub-datasets.\n\nClear performance improvements over previous GCD methods with contrastive learning."),
    ("Representation Learning Challenges", "Challenge: Representations need to be sensitive to detailed discriminative traits.\n\nLeveraging self-supervised representations for rough clustering based on overall image statistics.\n\nProposed approach: Supervised and self-supervised contrastive loss to fine-tune the model."),
    ("Evaluation Metrics", "Splitting training data into labeled (Dl) and unlabeled (Du) datasets.\n\nMeasuring performance using clustering accuracy (ACC) on the unlabeled set."),
    ("Experimental Setup", "Backbone: ViT-B-16\n\nBatch size: 256\n\nTraining epochs: 60 for ImageNet dataset\n\nImplementation: Projection heads as three-layer MLPs"),
    ("Results on Generic Datasets", "Comparison with state-of-the-art methods on CIFAR-10, CIFAR-100, and ImageNet-100.\n\nXCon consistently outperforms baseline methods, demonstrating robust effectiveness."),
    ("Results on Fine-grained Datasets", "Performance improvements on CUB-200 and Stanford Cars benchmarks.\n\nXCon's effectiveness across different α values analyzed."),
    ("Qualitative Analysis", "Visualization of features using t-SNE for qualitative comparison.\n\nClear boundaries between different groups with XCon, corresponding to specific categories."),
    ("Conclusion", "Proposal of XCon for generalized category discovery with self-supervised representation.\n\nImproved performance on image classification benchmarks, validating the method's effectiveness."),
    ("Acknowledgments", "Acknowledgment of compute support from LunarAI."),
    ("References", "Relevant papers and resources cited in the presentation for further reading.")
]

for slide_title, slide_content in slides:
    add_slide(prs, slide_title, slide_content)

# Save the presentation
prs.save("presentation.pptx")

CPU times: user 71.7 ms, sys: 7.98 ms, total: 79.6 ms
Wall time: 99.5 ms
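Assigning the whole content string to the placeholder leaves empty paragraphs between points wherever the string contains `\n\n`. A possible alternative, sketched here under the assumption that each blank-line-separated block should become its own bullet, writes one paragraph per point. `split_bullets` and `fill_bullets` are hypothetical helpers. `text_frame` is expected to be a python-pptx text frame, such as `slide.placeholders[1].text_frame`.

```python
from typing import List

def split_bullets(content: str) -> List[str]:
    """Split slide content on blank lines and drop empty pieces."""
    return [part.strip() for part in content.split("\n\n") if part.strip()]

def fill_bullets(text_frame, content: str) -> None:
    """Write one paragraph per bullet into a python-pptx text frame."""
    bullets = split_bullets(content)
    if not bullets:
        return
    # Setting .text replaces existing content with a single first paragraph.
    text_frame.text = bullets[0]
    for bullet in bullets[1:]:
        paragraph = text_frame.add_paragraph()
        paragraph.text = bullet

print(split_bullets("Backbone: ViT-B-16\n\nBatch size: 256"))
```

Inside `add_slide`, the line `content_placeholder.text = content` could then be replaced with `fill_bullets(content_placeholder.text_frame, content)`.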
