Why use of CNN char embedding? #41

Closed
cbockman opened this issue May 9, 2019 · 6 comments
Comments

cbockman commented May 9, 2019

"To keep things simple, we use minimal task specific architectures atop BERT-Base and SCIBERT embeddings. Each token is represented as the concatenation of its BERT embedding with a
CNN-based character embedding. If the token has multiple BERT subword units, we use the first one."
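For reference, the quoted token representation corresponds roughly to the sketch below. This is an illustration only, not the authors' code: the class/function names and dimensions (16-dim char embeddings, 128 filters, width-3 convolution) are made up, and the (frozen) BERT subword outputs are assumed to be computed elsewhere.

```python
import torch
import torch.nn as nn

class CharCNNEmbedder(nn.Module):
    """Character-level CNN embedding for each token (illustrative dimensions)."""
    def __init__(self, n_chars=262, char_dim=16, num_filters=128, kernel_size=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, num_filters, kernel_size, padding=1)

    def forward(self, char_ids):
        # char_ids: (num_tokens, max_chars_per_token)
        x = self.char_emb(char_ids).transpose(1, 2)   # (num_tokens, char_dim, max_chars)
        x = torch.relu(self.conv(x))                  # (num_tokens, num_filters, max_chars)
        return x.max(dim=-1).values                   # max-pool over characters

def token_representations(bert_subword_embs, first_subword_idx, char_ids, char_cnn):
    """Concatenate each token's first-subword BERT embedding with its char-CNN embedding.

    bert_subword_embs: (num_subwords, hidden)  output of the (frozen) BERT encoder
    first_subword_idx: (num_tokens,)           index of each token's first wordpiece
    char_ids:          (num_tokens, max_chars) character ids per original token
    """
    bert_token_embs = bert_subword_embs[first_subword_idx]        # first subword unit per token
    char_token_embs = char_cnn(char_ids)                          # CNN over characters
    return torch.cat([bert_token_embs, char_token_embs], dim=-1)  # (num_tokens, hidden + num_filters)
```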

Why use an additional CNN-based char embedding? Many (most?) papers using BERT (or similar) use only the embeddings coming out of the LM-based model.

Was there a big additional uptick from layering in the CNN-based char embeddings?

@brendan-ai2

@ibeltagy , @kyleclo we're getting questions about this issue on AllenNLP's project. Any chance you could follow up? Thanks!

FYI @arunzz

kyleclo self-assigned this Jun 28, 2019
kyleclo (Collaborator) commented Jun 28, 2019

Hey @cbockman, your point is well taken. I'm currently redoing the evaluations without these embeddings, and there's a minor uptick in performance from including them. Since BERT-Base was also evaluated with char embeddings, the relative difference between BERT-Base and SciBERT hasn't changed. For example, removing the char embeddings changes BC5CDR performance from 88.94 -> 88.73 (SciBERT-SciVocab) and from 85.72 -> 85.08 (BERT-Base). I'll release the full set of results once they're ready.

kyleclo closed this as completed Jun 28, 2019
@cbockman (Author)

Thanks. What was the rationale for including them? (Empirical > theoretical, of course...) BERT is "supposed" to encapsulate this information (via subwords) anyway. Was this an attempt to deal with the fact that the BERT layer was frozen (and thus perhaps not able to fully integrate the domain-specific learnings)?

kyleclo (Collaborator) commented Jun 28, 2019

@cbockman There wasn't any particular rationale for including char embeddings. We ran the experiments with a standard NER configuration in AllenNLP & only noticed afterwards that it included character-level embeddings. Since experiments are a bit expensive & we felt it was still a fair comparison between BERT-Base and SciBERT, we didn't rerun everything & reported in the arXiv draft that we included char embeddings. We've since decided that it's worth redoing the experiments to exclude the char embeddings & will update the draft when they're done.
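For readers unfamiliar with that setup: in AllenNLP's 0.x releases, character-level embeddings of this kind are declared as an extra token embedder next to the BERT one inside the text-field embedder. A rough sketch of the character path only (the vocabulary size and dimensions are made up, and the BERT token embedder is omitted for brevity):

```python
from allennlp.modules.seq2vec_encoders import CnnEncoder
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
from allennlp.modules.token_embedders import Embedding, TokenCharactersEncoder

# Character path: embed the characters of each token, then run a CNN over them.
char_embedding = Embedding(num_embeddings=262, embedding_dim=16)
char_cnn = CnnEncoder(embedding_dim=16, num_filters=128, ngram_filter_sizes=(3,))
token_characters = TokenCharactersEncoder(char_embedding, char_cnn)

# In a full NER config this entry sits alongside the BERT token embedder, e.g.
# BasicTextFieldEmbedder({"bert": bert_embedder, "token_characters": token_characters});
# dropping the "token_characters" entry is what removes the char embeddings.
text_field_embedder = BasicTextFieldEmbedder({"token_characters": token_characters})
```

BasicTextFieldEmbedder concatenates the outputs of its token embedders per token, so removing the character entry simply shrinks the token representation to the BERT embedding alone.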

@cbockman (Author)

Thanks! Love the paper (probably obviously).

arunzz commented Jul 1, 2019

Thanks! I'll wait for the scores without char embeddings. Thanks again!
