
Converting from integer tokens to one-hot tokens gives different results #179

Closed
codetalker7 opened this issue May 14, 2024 · 2 comments

@codetalker7

I tried to use the "colbert-ir/colbertv2.0" pretrained checkpoint for a task (it's essentially a BERT model plus a linear layer; for this issue we focus only on the BERT model). Here is how I loaded the model:

using CUDA
using Flux
using OneHotArrays
using Test
using Transformers
using Transformers.TextEncoders

const PRETRAINED_BERT = "colbert-ir/colbertv2.0"

bert_config = Transformers.load_config(PRETRAINED_BERT)
bert_tokenizer = Transformers.load_tokenizer(PRETRAINED_BERT)
bert_model = Transformers.load_model(PRETRAINED_BERT)

const VOCABSIZE = size(bert_tokenizer.vocab.list)[1]
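
Before going further, a quick check on what was loaded can be useful (a small sketch; nothing below is verified output from the issue):

@show VOCABSIZE            # colbertv2.0 is BERT-base-style, so expect a vocabulary of roughly 30K entries
@show typeof(bert_model)   # an HGFBertModel, as the stack trace further down also shows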

Now, we'll simply run the bert_model over a bunch of sentences.

docs = [
    "hello world",
    "thank you!",
    "a",
    "this is some longer text, so length should be longer",
]

encoded_text = encode(bert_tokenizer, docs)
ids, mask = encoded_text.token, encoded_text.attention_mask

Above, by default, ids is a OneHotArray. We convert it to an integer matrix containing the integer token IDs:

integer_ids = Matrix(onecold(ids))
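
For reference (a sketch; the values are not taken from the issue), onecold drops the leading vocabulary dimension, so integer_ids has one column per document, padded to the longest tokenized sequence, with every entry being an index into 1:VOCABSIZE:

size(integer_ids)    # (sequence_length, number_of_documents); 4 documents here
eltype(integer_ids)  # an integer type; each entry is a vocabulary index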

As expected, bert_model gives the same results on the integer IDs as on the one-hot encodings:

julia> @test isequal(bert_model((token = integer_ids, attention_mask=mask)), bert_model((token = ids, attention_mask=mask)))
Test Passed
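
This equality makes sense: the token-embedding layer ultimately just selects columns of its weight matrix (via NNlib.gather, which also appears in the stack trace below), whether the tokens arrive as integer indices or as a supported one-hot representation. A toy sketch of that equivalence, not tied to the actual checkpoint:

using NNlib

W = randn(Float32, 4, 10)    # toy embedding table: 4-dimensional vectors for a 10-token vocabulary
token_ids = [3, 7, 7]        # toy integer token IDs
NNlib.gather(W, token_ids) == W[:, token_ids]   # true: gather picks out the same columns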

Note that we can also convert integer_ids back to a OneHotArray using the onehotbatch function. Here's a test as a sanity check:

julia> @test isequal(ids, onehotbatch(integer_ids, 1:VOCABSIZE))             # test passes
Test Passed

However, if we convert the integer IDs back to one-hot encodings and pass the converted one-hot encodings to bert_model, the model throws an error:

julia> bert_model((token = onehotbatch(integer_ids, 1:VOCABSIZE), attention_mask=mask))
ERROR: ArgumentError: invalid index: false of type Bool
Stacktrace:
  [1] to_index(i::Bool)
    @ Base ./indices.jl:293
  [2] to_index(A::Matrix{Float32}, i::Bool)
    @ Base ./indices.jl:277
  [3] _to_indices1(A::Matrix{Float32}, inds::Tuple{Base.OneTo{Int64}}, I1::Bool)
    @ Base ./indices.jl:359
  [4] to_indices
    @ ./indices.jl:354 [inlined]
  [5] to_indices
    @ ./indices.jl:355 [inlined]
  [6] to_indices
    @ ./indices.jl:344 [inlined]
  [7] view
    @ ./subarray.jl:176 [inlined]
  [8] _view(X::Matrix{Float32}, colons::Tuple{Colon}, k::Bool)
    @ NNlib ~/.julia/packages/NNlib/Fg3DQ/src/scatter.jl:38
  [9] gather!(dst::Array{Float32, 4}, src::Matrix{Float32}, idx::OneHotArrays.OneHotArray{UInt32, 2, 3, Matrix{UInt32}})
    @ NNlib ~/.julia/packages/NNlib/Fg3DQ/src/gather.jl:107
 [10] gather
    @ ~/.julia/packages/NNlib/Fg3DQ/src/gather.jl:46 [inlined]
 [11] Embed
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/embed.jl:43 [inlined]
 [12] macro expansion
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/architecture.jl:108 [inlined]
 [13] WithArg
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/architecture.jl:103 [inlined]
 [14] apply_on_namedtuple
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/architecture.jl:80 [inlined]
 [15] macro expansion
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/layer.jl:0 [inlined]
 [16] (::Transformers.Layers.CompositeEmbedding{Tuple{Transformers.Layers.WithArg{(:token,), Transformers.Layers.Embed{Nothing, Matrix{Float32}}}, Transformers.Layers.WithOptArg{(:hidden_state,), (:position,), Transformers.Layers.ApplyEmbed{Base.Broadcast.BroadcastFunction{typeof(+)}, Transformers.Layers.FixedLenPositionEmbed{Matrix{Float32}}, typeof(identity)}}, Transformers.Layers.WithOptArg{(:hidden_state,), (:segment,), Transformers.Layers.ApplyEmbed{Base.Broadcast.BroadcastFunction{typeof(+)}, Transformers.Layers.Embed{Nothing, Matrix{Float32}}, typeof(Transformers.HuggingFace.bert_ones_like)}}}})(nt::NamedTuple{(:token, :attention_mask), Tuple{OneHotArrays.OneHotArray{UInt32, 2, 3, Matrix{UInt32}}, NeuralAttentionlib.LengthMask{1, Vector{Int32}}}})
    @ Transformers.Layers ~/.julia/packages/Transformers/lD5nW/src/layers/layer.jl:620
 [17] apply_on_namedtuple
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/architecture.jl:80 [inlined]
 [18] macro expansion
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/architecture.jl:0 [inlined]
 [19] Chain
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/architecture.jl:319 [inlined]
 [20] (::Transformers.HuggingFace.HGFBertModel{Transformers.Layers.Chain{Tuple{Transformers.Layers.CompositeEmbedding{Tuple{Transformers.Layers.WithArg{(:token,), Transformers.Layers.Embed{Nothing, Matrix{Float32}}}, Transformers.Layers.WithOptArg{(:hidden_state,), (:position,), Transformers.Layers.ApplyEmbed{Base.Broadcast.BroadcastFunction{typeof(+)}, Transformers.Layers.FixedLenPositionEmbed{Matrix{Float32}}, typeof(identity)}}, Transformers.Layers.WithOptArg{(:hidden_state,), (:segment,), Transformers.Layers.ApplyEmbed{Base.Broadcast.BroadcastFunction{typeof(+)}, Transformers.Layers.Embed{Nothing, Matrix{Float32}}, typeof(Transformers.HuggingFace.bert_ones_like)}}}}, Transformers.Layers.DropoutLayer{Transformers.Layers.LayerNorm{Vector{Float32}, Vector{Float32}, Float32}, Nothing}}}, Transformer{NTuple{12, Transformers.Layers.PostNormTransformerBlock{Transformers.Layers.DropoutLayer{Transformers.Layers.SelfAttention{NeuralAttentionlib.MultiheadQKVAttenOp{Nothing}, Transformers.Layers.Fork{Tuple{Transformers.Layers.Dense{Nothing, Matrix{Float32}, Vector{Float32}}, Transformers.Layers.Dense{Nothing, Matrix{Float32}, Vector{Float32}}, Transformers.Layers.Dense{Nothing, Matrix{Float32}, Vector{Float32}}}}, Transformers.Layers.Dense{Nothing, Matrix{Float32}, Vector{Float32}}}, Nothing}, Transformers.Layers.LayerNorm{Vector{Float32}, Vector{Float32}, Float32}, Transformers.Layers.DropoutLayer{Transformers.Layers.Chain{Tuple{Transformers.Layers.Dense{typeof(gelu), Matrix{Float32}, Vector{Float32}}, Transformers.Layers.Dense{Nothing, Matrix{Float32}, Vector{Float32}}}}, Nothing}, Transformers.Layers.LayerNorm{Vector{Float32}, Vector{Float32}, Float32}}}, Nothing}, Transformers.Layers.Branch{(:pooled,), (:hidden_state,), Transformers.HuggingFace.BertPooler{Transformers.Layers.Dense{typeof(tanh_fast), Matrix{Float32}, Vector{Float32}}}}})(nt::NamedTuple{(:token, :attention_mask), Tuple{OneHotArrays.OneHotArray{UInt32, 2, 3, Matrix{UInt32}}, NeuralAttentionlib.LengthMask{1, Vector{Int32}}}})
    @ Transformers.HuggingFace ~/.julia/packages/Transformers/lD5nW/src/huggingface/implementation/bert/load.jl:51
 [21] top-level scope
    @ REPL[26]:1
 [22] top-level scope
    @ ~/.julia/packages/CUDA/s5N6v/src/initialization.jl:190

Am I missing something here?

@chengchingwen (Owner)

You should use integer_ids = reinterpret(Int32, ids) and OneHotArray{VOCABSIZE}(integer_ids). The OneHotArray used in Transformers is different from the one in Flux, and the error happens because that OneHotArray does not overload gather.
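
A minimal sketch of that suggestion (assuming the OneHotArray meant here is the type Transformers works with internally, provided by PrimitiveOneHot.jl rather than OneHotArrays.jl; the import below is an assumption):

import PrimitiveOneHot   # assumption: the package providing the OneHotArray that Transformers uses

integer_ids = reinterpret(Int32, ids)                                  # integer view of the encoded tokens
roundtrip_ids = PrimitiveOneHot.OneHotArray{VOCABSIZE}(integer_ids)    # back to the one-hot type Transformers expects

bert_model((token = roundtrip_ids, attention_mask = mask))             # should no longer hit the Bool-indexing error

The failing call above instead passed an OneHotArrays.OneHotArray{UInt32, 2, 3, Matrix{UInt32}} built by onehotbatch (visible in the stack trace), which is the type without a gather overload.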

@codetalker7 (Author)

Thanks for this! I didn't notice that the package was using its own OneHotArray.
