Zero-Shot #1982
Maarten Grootendorst (Fri, May 10, 2024 at 1:44 AM):

As a general tip, GPT-4, albeit an amazing LLM, is not necessarily the best tool for fact-based information, even if you supply it with the source material. As you noticed, there is a risk that GPT-4 gives the wrong answer without realizing it. When it comes to facts, I would advise always checking the source material first, as it is important to be able to read the docs as well as the underlying code. Having said that, you can find more about the technique in the documentation (https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html):

This method works as follows. First, we create a number of labels for our predefined topics and embed them using any embedding model. Then, we compare the embeddings of the documents with the predefined labels using cosine similarity. If they pass a user-defined threshold, the zero-shot topic is assigned to a document. If it does not, then that document, along with others, will be put through a regular BERTopic model.

In other words, the zero-shot topics are assigned first and precede the HDBSCAN clustering. Then, both models are merged.

Thank you for explaining. That is a very helpful explanation.
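To make the order of operations concrete, here is a minimal, library-free Python sketch of the procedure described in the documentation excerpt: zero-shot assignment by cosine similarity first, clustering of only the remaining documents second, and a merge at the end. It is not BERTopic's implementation. The 2-D "embeddings", the labels, the 0.8 threshold, and `toy_cluster` (a hypothetical stand-in for HDBSCAN) are made up for illustration, and the merge step here simply combines the two sets of labels, whereas the documentation describes merging two models. In BERTopic itself, the predefined labels and the threshold are set through the `zeroshot_topic_list` and `zeroshot_min_similarity` parameters.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def zero_shot_then_cluster(
    doc_embeddings: List[Vector],
    label_embeddings: List[Vector],
    labels: List[str],
    threshold: float,
    cluster_fn: Callable[[List[Vector]], List[int]],
) -> List[str]:
    assigned = {}
    leftover = []

    # Step 1 (zero-shot): compare each document with every predefined label
    # and keep the best label only if it passes the similarity threshold.
    for i, doc in enumerate(doc_embeddings):
        sims = [cosine(doc, lab) for lab in label_embeddings]
        best = max(range(len(labels)), key=lambda j: sims[j])
        if sims[best] >= threshold:
            assigned[i] = labels[best]
        else:
            leftover.append(i)

    # Step 2 (clustering): only the documents that failed Step 1 are clustered.
    if leftover:
        cluster_ids = cluster_fn([doc_embeddings[i] for i in leftover])
        for i, cid in zip(leftover, cluster_ids):
            assigned[i] = f"cluster_{cid}"

    # Step 3 (merge): combine both sets of assignments in document order.
    return [assigned[i] for i in range(len(doc_embeddings))]


def toy_cluster(embeddings: List[Vector]) -> List[int]:
    """Hypothetical stand-in for HDBSCAN: splits on the sign of the 2nd component."""
    return [0 if e[1] >= 0 else 1 for e in embeddings]


labels = ["sports", "finance"]
label_embeddings = [[1.0, 0.0], [0.0, 1.0]]
doc_embeddings = [[0.9, 0.1], [0.1, 0.95], [-1.0, 0.2], [-0.9, -0.3]]

result = zero_shot_then_cluster(doc_embeddings, label_embeddings, labels, 0.8, toy_cluster)
print(result)  # ['sports', 'finance', 'cluster_0', 'cluster_1']
```

The point relevant to the question is structural: the clustering function only ever receives the documents that did not pass the zero-shot threshold, so in this sketch the two steps run sequentially rather than as two parallel paths.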
I have a question about zero-shot topic modeling. I used zero-shot BERTopic for topic mining in my dissertation, and I need to explain the process in more detail. Are zero-shot classification and HDBSCAN clustering run concurrently, or does zero-shot classification precede HDBSCAN clustering? When I first asked GPT-4, it said to run HDBSCAN first and then use zero-shot classification to label the documents. When I then gave GPT-4 the flowchart, it said to run zero-shot classification first and then HDBSCAN. After a few more questions, GPT-4 described it as "simultaneous processing paths", with zero-shot and HDBSCAN as two paths. I would very much appreciate a more detailed explanation of the process, since the committee may ask such questions. Thanks again.