In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set $S$ to find a best-fitting function $f(x)$ in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query $x$ and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing $S$ into a single task vector $\boldsymbol{\theta}(S)$ and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.
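The claimed mechanism, compress the demonstrations $S$ into a single vector $\boldsymbol{\theta}(S)$, then run the model on the query $x$ modulated by that vector alone, can be sketched in miniature. The snippet below is not the paper's method: it replaces the transformer's hidden state with toy linear token embeddings so the arithmetic works exactly, and all names (`extract_task_vector`, `apply_task_vector`, `EMB`) are illustrative.

```python
import numpy as np

# Toy vocabulary: letters "a".."h". Each token is embedded as
# (alphabet index, constant bias) -- a deliberate simplification,
# standing in for a transformer's intermediate hidden states.
VOCAB = list("abcdefgh")
EMB = {tok: np.array([float(i), 1.0]) for i, tok in enumerate(VOCAB)}

def extract_task_vector(demos):
    """theta(S): compress the demonstration set S into one vector.
    Here: the mean (output - input) embedding difference. In the paper's
    setting this role is played by a hidden state computed from the demos."""
    return np.mean([EMB[y] - EMB[x] for x, y in demos], axis=0)

def apply_task_vector(theta, query):
    """Run the 'model' on the query alone, modulated by theta:
    return the vocabulary token nearest to emb(query) + theta."""
    target = EMB[query] + theta
    return min(VOCAB, key=lambda t: float(np.linalg.norm(EMB[t] - target)))

# Task implied by the demonstrations: map each letter to its successor.
demos = [("a", "b"), ("c", "d"), ("e", "f")]
theta = extract_task_vector(demos)        # -> array([1., 0.])
print(apply_task_vector(theta, "g"))      # prints "h"
```

Note that the query `"g"` never appears in the demonstrations; the task vector alone carries the rule, which is the paper's point: $S$ is consulted only through $\boldsymbol{\theta}(S)$, not token-by-token at query time.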