Customizing your own CRF model: the example code does not load the trained model? #314

Closed
flystarhe opened this Issue Jul 26, 2016 · 9 comments

flystarhe commented Jul 26, 2016

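// Set the model path
MyStaticValue.CRF.put(MyStaticValue.CRF_DEFAULT, "/Users/sunjian/Documents/src/CRF++-0.58/test/model.txt");
// Run segmentation
System.out.println(NlpAnalysis.parse("欢迎使用Ansj的CRF功能!"));

However, the code in NlpAnalysis that loads the default model still loads "crf.model", that is, the one in the resources directory inside the jar.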

private static synchronized SplitWord initDefaultModel() {

    Object obj = CRF.get(CRF_DEFAULT);
    if (obj != null && obj instanceof SplitWord) {
        return (SplitWord) obj;
    }
    try {
        LIBRARYLOG.info("init deafult crf model begin !");
        CRFModel model = new CRFModel(CRF_DEFAULT);
        model.loadModel(DicReader.getInputStream("crf.model"));
        SplitWord splitWord = new SplitWord(model);
        CRF.put(CRF_DEFAULT, splitWord);
        return splitWord;
    } catch (Exception e) {
        e.printStackTrace();
        LIBRARYLOG.error("init err!", e);
    }
    return null;
}
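
Read literally, the method above reuses the cached entry only when it is already a SplitWord. The example code stores a String path under CRF_DEFAULT, so the instanceof check fails and the bundled crf.model is loaded instead. The following is a minimal, self-contained sketch of that control flow. It uses a plain HashMap and stand-in names (FakeSplitWord, the key "crf") that are hypothetical and not part of ansj.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for a loaded segmenter; hypothetical, not the ansj SplitWord class. */
final class FakeSplitWord {
    final String source;

    FakeSplitWord(String source) {
        this.source = source;
    }
}

final class DefaultModelCacheSketch {
    // Hypothetical key that plays the role of CRF_DEFAULT.
    static final String KEY = "crf";
    static final Map<String, Object> CACHE = new HashMap<>();

    /** Mirrors the get-or-init logic quoted above: only a cached FakeSplitWord is reused. */
    static synchronized FakeSplitWord initDefaultModel() {
        Object obj = CACHE.get(KEY);
        if (obj instanceof FakeSplitWord) {
            return (FakeSplitWord) obj;
        }
        // Any other cached value, such as a String path, is ignored and overwritten.
        FakeSplitWord bundled = new FakeSplitWord("crf.model");
        CACHE.put(KEY, bundled);
        return bundled;
    }

    public static void main(String[] args) {
        // Store a path string under the key, as the example code does.
        CACHE.put(KEY, "/path/to/model.txt");
        // The path is not a FakeSplitWord, so the bundled model is used.
        System.out.println(initDefaultModel().source);
    }
}
```

Under this reading, the value stored under the key would need to be an already loaded segmenter for the custom model to take effect.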

flystarhe commented Jul 26, 2016

@ansjsun
I tried loading the model manually with "new CRFModel" and got an error:

val model  = new CRFModel("xxx");
model.loadModel("C:\\Users\\jian\\Desktop\\model.txt")

Exception in thread "main" java.util.zip.ZipException: Not in GZIP format
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)

It says the file format is wrong.
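
The trace above shows a GZIPInputStream rejecting the file header, which suggests that this loader expects a gzip-compressed model, while model.txt is presumably plain text. A quick way to tell the two apart before choosing a loader is to look for the GZIP magic number (bytes 0x1f 0x8b) at the start of the file. This is a small sketch; GzipCheck and looksLikeGzip are illustrative names, not ansj API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class GzipCheck {
    /** Returns true if the first two bytes are the GZIP magic number 0x1f 0x8b. */
    static boolean looksLikeGzip(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            int first = in.read();
            int second = in.read();
            return first == 0x1f && second == 0x8b;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GzipCheck <model-file>");
            return;
        }
        Path model = Paths.get(args[0]);
        System.out.println(model + (looksLikeGzip(model)
                ? " is gzip-compressed"
                : " is not gzip-compressed"));
    }
}
```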

ansjsun (Member) commented Jul 26, 2016

Try model.loadCRFModel.

flystarhe commented Jul 27, 2016

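Thanks. For a text model, the class to instantiate should be CRFppTxtModel rather than CRFModel. There is no loadCRFModel, though; the method is still loadModel.
One more question: I read through your source code. Does it only support models trained on single-character features? A model trained on data like the following cannot be loaded:

我们 O
中国 B
网安 E
。 O

That is, each feature spans several characters. When reading the code I saw that makeFeatureArr works on char. Is that right?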

ansjsun (Member) commented Jul 27, 2016

Yes, only single characters are supported.
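
Given that restriction, it can help to check training data before training. The sketch below lists every first-column token that is longer than one Unicode code point. It assumes whitespace-separated columns with the token in the first column, as in the sample rows earlier in the thread; SingleCharTokenCheck and multiCharTokens are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SingleCharTokenCheck {
    /** Returns the first-column tokens that are longer than one Unicode code point. */
    static List<String> multiCharTokens(List<String> rows) {
        List<String> offenders = new ArrayList<>();
        for (String row : rows) {
            String trimmed = row.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines separate sentences in CRF++ training data
            }
            String token = trimmed.split("\\s+")[0];
            if (token.codePointCount(0, token.length()) > 1) {
                offenders.add(token);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // The sample rows from the question above.
        List<String> rows = Arrays.asList("我们 O", "中国 B", "网安 E", "。 O");
        System.out.println(multiCharTokens(rows));
    }
}
```

For the sample rows, the three two-character tokens are reported and the full stop is not.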

moshangshaoguang commented Aug 20, 2016

    Model  model = new CRFppTxtModel("");
    model.loadModel("D:\\fenci\\CRF++-0.58\\test\\model.txt");

This is the error I get when I run it:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.lang.String.charAt(String.java:658)
at org.ansj.app.crf.model.CRFppTxtModel.loadTagCoven(CRFppTxtModel.java:237)
at org.ansj.app.crf.model.CRFppTxtModel.loadModel(CRFppTxtModel.java:52)
at anjs.AnjsTest.main(AnjsTest.java:18)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

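ansjsun (Member) commented Aug 20, 2016

@moshangshaoguang This may be a bug on my side. It has been fixed in the latest code.

https://github.com/NLPchina/ansj_seg/blob/b1df7ee91539805fb45241ee0740d52da6a2203a/src/main/java/org/ansj/app/crf/model/CRFppTxtModel.java

See around line 229.

You need to update to the latest code to pick up the fix.

I am not certain, but the version at http://maven.nlpcn.org/org/ansj/ansj_seg/5.0.2/ should already include the fix. You can give it a try.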
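
The message "String index out of range: 0" from String.charAt means that charAt(0) was called on an empty string. One plausible cause, offered here as an assumption and not as a description of the actual fix, is an empty line in the text model reaching that call. A minimal sketch of the failure mode and a guard follows; EmptyLineGuard and firstChars are hypothetical names, not the ansj method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EmptyLineGuard {
    /** Collects the first character of each non-empty line and skips empty ones. */
    static List<Character> firstChars(List<String> lines) {
        List<Character> out = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // "".charAt(0) would throw StringIndexOutOfBoundsException
            }
            out.add(line.charAt(0));
        }
        return out;
    }

    public static void main(String[] args) {
        // The empty string in the middle is skipped instead of crashing the loop.
        System.out.println(firstChars(Arrays.asList("B", "", "E", "S")));
    }
}
```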

moshangshaoguang commented Aug 21, 2016

Thanks.

moshangshaoguang commented Aug 21, 2016

One more question: how do I train on a corpus so that I get the result I want? For example, I would like 内蒙古伊泰煤炭有限责任公司 to be tagged directly as nt, so that calling NlpAnalysis.parse returns 内蒙古伊泰煤炭有限责任公司/nt.

ansjsun (Member) commented Jan 26, 2017

With fewer tags and fewer features, the segmented words come out longer, but the results are less stable.
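
Because only single-character features are supported, a long name has to appear in the training data as one row per character. The sketch below expands one word into such rows. It assumes a B/M/E/S tag set (begin, middle, end, single), which is a common convention but is not confirmed anywhere in this thread, and a tab as the column separator. WordToRows and toRows are hypothetical names. It covers segmentation only and does not show how a part-of-speech label such as nt is assigned.

```java
import java.util.ArrayList;
import java.util.List;

final class WordToRows {
    /** Expands one word into "character<TAB>tag" rows using an assumed B/M/E/S scheme. */
    static List<String> toRows(String word) {
        int[] codePoints = word.codePoints().toArray();
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < codePoints.length; i++) {
            String tag;
            if (codePoints.length == 1) {
                tag = "S"; // a word made of a single character
            } else if (i == 0) {
                tag = "B"; // first character of the word
            } else if (i == codePoints.length - 1) {
                tag = "E"; // last character of the word
            } else {
                tag = "M"; // any character in between
            }
            rows.add(new String(Character.toChars(codePoints[i])) + "\t" + tag);
        }
        return rows;
    }

    public static void main(String[] args) {
        // The company name from the question, kept together as one word.
        for (String row : toRows("内蒙古伊泰煤炭有限责任公司")) {
            System.out.println(row);
        }
    }
}
```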

ansjsun closed this Jan 26, 2017
