Question about tuple_filter.py #18

MrRace · 2019-12-11T01:14:20Z

The GetData_train function in tuple_filter.py contains the following code:

        for t in candidate_tuples:
            features = candidate_tuples[t]
            if len(gold_tuple) == len(set(gold_tuple).intersection(set(t))):
                X.append([features[9][0][1]])
                Y.append([1])
            else:
                prop = random.random()
                if prop<0.5:
                    X.append([features[9][0][1]])
                    Y.append([0])

Why does it take [features[9][0][1]]? What is the reasoning behind this? Thanks!


duterscmy · 2019-12-11T02:31:51Z

Here is the background: this step used to include not only the BERT similarity feature but also some literal-matching features. They turned out not to help, so I dropped them, but the data still carries them. Index 9 is the position of the BERT feature, and [0][1] is simply how the output of the BERT package I called has to be indexed.
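As a rough illustration of the indexing described above, the sketch below assumes a hypothetical layout: a ten-slot feature list whose slot 9 holds the BERT output as a nested `[[score_0, score_1]]` pair. That layout and every value in it are assumptions made for illustration. The thread only states that index 9 is the BERT feature and that `[0][1]` is how the package's output has to be indexed.

```python
# Hypothetical feature vector for one candidate tuple (layout assumed).
# Slots 0-8 stand in for the older literal-matching features;
# slot 9 stands in for the BERT output, assumed here to be a
# batch-of-one list holding two scores.
features = [0.3, 0.1, 2, 5, 0.0, 1, 0.4, 0.2, 7, [[0.12, 0.88]]]

bert_output = features[9]   # [[0.12, 0.88]]
first_row = bert_output[0]  # [0.12, 0.88]
bert_score = first_row[1]   # 0.88

# One-element row, built the same way as X.append([features[9][0][1]])
x_row = [features[9][0][1]]
print(x_row)
```

If the saved features no longer have this nested shape, the same expression will raise an error or pick the wrong value, so it is worth printing one entry of the data before choosing an index.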

ZainZhou · 2019-12-11T06:35:36Z

@MrRace What entity recall do you get after running entity_filter.py?

MrRace · 2019-12-11T07:27:55Z

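@duterscmy So the version currently uploaded actually uses only the BERT feature?
In the version currently uploaded, the features data looks like this:

(1) In this case, how should X.append() be written?
(2) When generating negative samples, how does this random-number approach guarantee a negative-sample ratio of 0.05?
Thanks~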

ZainZhou · 2019-12-11T08:14:29Z

@MrRace I simply use X.append([features[2]]).

MrRace · 2019-12-11T08:23:01Z

@MrRace I simply use X.append([features[2]]).
Do your features have a similar structure?

duterscmy · 2019-12-11T08:24:37Z

Then just use append(x[-1]). It cannot guarantee the ratio; it only gives an approximate proportion of negative examples.
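The point about the ratio being only approximate can be checked with a small simulation. The sketch below is not the repository's code: `downsample_negatives`, `keep_prob`, the class counts, and the seed are illustrative names and values. Each negative is kept independently with probability `keep_prob`, so the kept fraction fluctuates around `keep_prob`, and the final positive-to-negative ratio also depends on how many negatives there were to begin with.

```python
import random


def downsample_negatives(labels, keep_prob, rng):
    """Keep every positive; keep each negative with probability keep_prob."""
    kept = []
    for y in labels:
        if y == 1 or rng.random() < keep_prob:
            kept.append(y)
    return kept


rng = random.Random(0)  # fixed seed only so the run is repeatable
labels = [1] * 100 + [0] * 10000  # hypothetical class counts
kept = downsample_negatives(labels, keep_prob=0.5, rng=rng)

n_pos = kept.count(1)
n_neg = kept.count(0)
# The kept fraction of negatives is near keep_prob but not fixed at it.
print(n_pos, n_neg, n_neg / 10000)
```

If an exact number of negatives is needed, `random.sample` over the list of negatives is one alternative, at the cost of collecting them first.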


MrRace · 2019-12-11T09:22:58Z

@duterscmy Then in SaveFilterCandiT, should new_features = features[0:2]+[features[9][0][1]] be changed to:
new_features = features? Or something else?

MrRace · 2019-12-11T09:27:44Z

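Single-entity questions, proportion whose candidate answers can be recalled: 0.730
Proportion of candidate answers that cover the gold query path: 0.461

Top-10 recall after logistic regression filtering on the validation set: 0.72
Single-entity questions, proportion whose candidate answers can be recalled: 0.731
Proportion of candidate answers that cover the gold query path: 0.560

@ZainZhou What about yours?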
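For readers comparing numbers, one common way to compute a top-k recall figure like the one reported above is sketched below. This is an assumption about the metric, not the repository's evaluation code: `topk_recall` and the sample data are hypothetical, and a question counts as recalled if any gold candidate survives the top-k cut by score.

```python
def topk_recall(questions, k):
    """Fraction of questions with at least one gold candidate in the top k.

    questions: list of lists of (score, is_gold) pairs, one list per question.
    """
    hits = 0
    for candidates in questions:
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        if any(is_gold for _, is_gold in ranked[:k]):
            hits += 1
    return hits / len(questions) if questions else 0.0


# Hypothetical scores, e.g. classifier probabilities per candidate.
questions = [
    [(0.9, True), (0.4, False), (0.1, False)],  # gold ranked 1st
    [(0.8, False), (0.7, False), (0.6, True)],  # gold ranked 3rd
    [(0.5, False), (0.2, False)],               # gold never retrieved
]
print(topk_recall(questions, k=1))
print(topk_recall(questions, k=3))
```

The third question has no gold candidate at all, so no value of k can recover it. Under this definition, recall after filtering is bounded by the recall of the candidate-generation step upstream.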

1234560o · 2019-12-11T09:28:43Z

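Does the second logistic regression model use only the BERT feature, without the earlier features such as word frequency, length, and character overlap? My understanding is that the feature BERT returns is a single number, namely the probability of the positive class. Is that right?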

ZainZhou · 2019-12-11T09:58:13Z

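@MrRace My tuple_filter results are much lower than yours, because my entity-extraction recall earlier in the pipeline is already on the low side. That is why I asked how many entities entity_filter.py recalls for you.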

MrRace · 2019-12-11T10:06:16Z

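@MrRace My tuple_filter results are much lower than yours, because my entity-extraction recall earlier in the pipeline is already on the low side. That is why I asked how many entities entity_filter.py recalls for you.
With entity_filter.py:
On the validation set, after top-5 filtering with logistic regression, entity recall over all questions is 0.774 and entity recall on single-entity questions is 0.820.
On the training set it is roughly 0.8.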

ZainZhou · 2019-12-11T10:23:08Z

@MrRace Then we are actually not far apart, but I don't know why my tuple_filter result ends up 20 points lower. I'll keep looking into it.

duterscmy · 2019-12-11T10:58:36Z

Right, they were not added. Using only BERT already works well.


MrRace · 2019-12-12T00:57:30Z

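@duterscmy My results from running tuple_filter.py:

Single-entity questions, proportion whose candidate answers can be recalled: 0.730
Proportion of candidate answers that cover the gold query path: 0.461
Single-entity questions, proportion whose candidate answers can be recalled: 0.772
Proportion of candidate answers that cover the gold query path: 0.638

Top-10 recall after logistic regression filtering on the validation set: 0.72
Single-entity questions, proportion whose candidate answers can be recalled: 0.731
Proportion of candidate answers that cover the gold query path: 0.560

Is this result on the low side? Roughly what do you get?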

duterscmy · 2019-12-12T08:31:14Z

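Yes, that is low. On my side single-entity is 0.92, and 0.902 after filtering down to 5. I may have uploaded the wrong code version, but I haven't had a free GPU for the last two days. Once I have confirmed a correct version I will upload it.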


duterscmy · 2019-12-12T08:32:49Z

I misunderstood: these are the candidate-answer numbers. I'll rerun the pipeline tonight and let you know.


Keerlsm · 2020-03-23T15:42:46Z

I misunderstood: these are the candidate-answer numbers. I'll rerun the pipeline tonight and let you know.

My results from running tuple_filter.py are close to the ones above. Has a parameter or the model changed somewhere? I have been working on a related task recently and would like to reproduce the results you submitted.

counten · 2020-04-19T07:48:23Z

I misunderstood: these are the candidate-answer numbers. I'll rerun the pipeline tonight and let you know.

My results from running tuple_filter.py are close to the ones above. Has a parameter or the model changed somewhere? I have been working on a related task recently and would like to reproduce the results you submitted.

Did you ever solve this? My results are about the same. Any pointers would be appreciated:

Single-entity questions, proportion whose candidate answers can be recalled: 0.745
Proportion of candidate answers that cover the gold query path: 0.471
Single-entity questions, proportion whose candidate answers can be recalled: 0.755
Proportion of candidate answers that cover the gold query path: 0.579

Top-10 recall after logistic regression filtering on the validation set: 0.74
Single-entity questions, proportion whose candidate answers can be recalled: 0.748
Proportion of candidate answers that cover the gold query path: 0.573

liupenggg · 2020-07-21T08:25:59Z

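@duterscmy My results from running tuple_filter.py:

Single-entity questions, proportion whose candidate answers can be recalled: 0.730
Proportion of candidate answers that cover the gold query path: 0.461
Single-entity questions, proportion whose candidate answers can be recalled: 0.772
Proportion of candidate answers that cover the gold query path: 0.638

Top-10 recall after logistic regression filtering on the validation set: 0.72
Single-entity questions, proportion whose candidate answers can be recalled: 0.731
Proportion of candidate answers that cover the gold query path: 0.560

Is this result on the low side? Roughly what do you get?

Why do I get all zeros when I run it? Did something go wrong somewhere?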

binglinchengxiash · 2020-10-16T00:58:22Z

@duterscmy Then in SaveFilterCandiT, should new_features = features[0:2]+[features[9][0][1]] be changed to:
new_features = features? Or something else?

How should this features line be written? Has it been solved?

JeffSuu · 2021-03-05T07:52:07Z

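@duterscmy Then in SaveFilterCandiT, should new_features = features[0:2]+[features[9][0][1]] be changed to:
new_features = features? Or something else?

How should this features line be written? Has it been solved?

Has this features problem been solved? Writing it as new_features = features gives very poor results.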
