Runtime exceptions #36

Closed
susachintha opened this issue Jan 4, 2017 · 10 comments

@susachintha

Hello there,
I'm new to Baleen, I read most of the documentation. Baleen is running in the background. But when I run my test application, I get some runtime exceptions. I ran my test application with -verbose on so I could see all the messages. I have copied my code at the bottom.

Error1:
This occurs when I link my test application against the Baleen library only. I get the exception "java.lang.NoClassDefFoundError: org/apache/http/config/Lookup".

error1

Error2:
I then linked the httpCore 4.4.x jar (which I suspect I'm not supposed to do) and ran it again. The error above went away, but I got a new exception: java.lang.NoSuchMethodError: org.apache.http.entity.ContentType.withCharset(Ljava/lang/String;)Lorg/apache/http/entity/ContentType;

error2

I assume I must not link the Apache libraries myself, since Baleen already references them and the copies I add may conflict with Baleen's. I'm on 64-bit Windows 10, my IDE is NetBeans, and I'm using Baleen 2.2.0. Could someone please help me figure out what I'm missing?


Following is my test program.
package testbaleen;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.ExternalResourceFactory;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ExternalResourceDescription;
import org.apache.uima.resource.ResourceInitializationException;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;
import uk.gov.dstl.baleen.consumers.ElasticsearchRest;
import uk.gov.dstl.baleen.resources.SharedElasticsearchRestResource;

/**
 * @author Susantha
 */
public class TestBaleen {

    private static Path tmpDir;
    private static final String ELASTICSEARCH = "elasticsearchRest";
    protected static Client client;
    protected static JCas jCas;
    protected static AnalysisEngine ae;
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {

       try {
           tmpDir = Files.createTempDirectory("elasticsearch");
      
           String s = tmpDir.toString();
           
           Settings settings = Settings.builder()
                   .put("path.home", tmpDir.toString())
                   .put("http.port", "19600")		//Don't use the default ports for testing purposes
                   .put("transport.tcp.port", "19300")
                   .build();
           
           Node node = NodeBuilder.nodeBuilder()
                   .settings(settings)
                   .data(true)
                   .local(true)
                   .clusterName("SusanthaSearch")
                   .node();
           
           ExternalResourceDescription erd = ExternalResourceFactory.createExternalResourceDescription(ELASTICSEARCH, SharedElasticsearchRestResource.class, SharedElasticsearchRestResource.PARAM_URL, "http://localhost:19600");
           AnalysisEngineDescription aed = AnalysisEngineFactory.createEngineDescription(ElasticsearchRest.class, ELASTICSEARCH, erd);
           
           try
           {
               System.out.println("Now creating the engine");
               ae = AnalysisEngineFactory.createEngine(aed);
           }catch(ResourceInitializationException ex)
           {
               System.out.println("Caught"+ex.getMessage());
           }catch(Exception e)
           {
               System.out.println("Caught"+e.getMessage());
           }
           client = node.client();
           System.out.println("Done and dusted...");
           
       } catch (IOException ex) {
       Logger.getLogger(TestBaleen.class.getName()).log(Level.SEVERE, null, ex);
           //Logger.getLogger(ContentScrapper.class.getName()).log(Level.SEVERE, null, ex);
       } catch (ResourceInitializationException ex) {
       Logger.getLogger(TestBaleen.class.getName()).log(Level.SEVERE, null, ex);
      

      }

}

}

@jbaker-dstl
Contributor

I think there might be several things going on here.

Firstly, you're right in saying that some of the Apache HTTP Components classes seem to be missing from Baleen 2.2.0 - I'm not sure why that is (it's a sub-dependency of one of Baleen's dependencies rather than something we use directly, so it might be something to do with the version we're using there). However, in the latest snapshots of Baleen 2.3.0 it does seem to be included so perhaps consider using that instead?
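As a general diagnostic for this kind of error, it can help to check which jar (if any) supplies the classes named in the exceptions. A NoClassDefFoundError usually means no jar on the classpath provides the class, and a NoSuchMethodError usually means a different version of the class was loaded than the one the caller was compiled against. The following is a minimal sketch using only standard JDK calls; `ClasspathProbe` and `locate` are hypothetical names for illustration and are not part of Baleen.

```java
import java.security.CodeSource;

/** Hypothetical helper: reports where a class was loaded from, to spot missing or duplicate jars. */
final class ClasspathProbe {

    /** Returns the jar or directory that supplied the class, or a short diagnostic string. */
    static String locate(String className) {
        try {
            // initialize=false: only find the class, do not run its static initialisers
            Class<?> type = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source
            return source == null ? "bootstrap (JDK)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // The two classes named in the exceptions reported in this thread
        System.out.println(locate("org.apache.http.config.Lookup"));
        System.out.println(locate("org.apache.http.entity.ContentType"));
    }
}
```

Run with the same classpath as the failing application. If the second line points at an unexpected jar, that jar is probably shadowing the version Baleen expects.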

However, looking at the test code you've provided, I'm not sure you're using Baleen in the correct way. Instantiating a single annotator on its own almost certainly won't work, as it will be missing a lot of the additional functionality provided by Baleen and required by a Baleen annotator.

What is it you're trying to achieve? If you're trying to run Baleen inside your application, have you looked at this Wiki page: https://github.com/dstl/baleen/wiki/Run-as-a-Standalone-Application

@susachintha
Author

Thanks for the quick reply. I'm not sure where I can get 2.3.0. Under releases, 2.2.0 appears as the latest release. I will definitely try that version.

I came across these issues in my application, so I copied part of it into a sample application to keep things simple. To give you an overview of what I'm trying to do, I have copied a few more lines of my program into this test application. Basically, I'm reading some external files (for the moment only a csv file), extracting their data (employee data), and storing it in an Elasticsearch cluster through Baleen, so that the data can be retrieved easily when a search is performed later.

The logic is something like this. (This code may not compile, as I have just copied some lines from the main application.)

public static void main(String[] args) {

        try {
            tmpDir = Files.createTempDirectory("elasticsearch");

            String s = tmpDir.toString();
            
            Settings settings = Settings.builder()
                    .put("path.home", tmpDir.toString())
                    .put("http.port", "19600")		//Don't use the default ports for testing purposes
                    .put("transport.tcp.port", "19300")
                    .build();
            
            Node node = NodeBuilder.nodeBuilder()
                    .settings(settings)
                    .data(true)
                    .local(true)
                    .clusterName("SusanthaSearch")
                    .node();
            
            ExternalResourceDescription erd = ExternalResourceFactory.createExternalResourceDescription(ELASTICSEARCH, SharedElasticsearchRestResource.class, SharedElasticsearchRestResource.PARAM_URL, "http://localhost:19600");
            AnalysisEngineDescription aed = AnalysisEngineFactory.createEngineDescription(ElasticsearchRest.class, ELASTICSEARCH, erd);
            
            try
            {
                System.out.println("Now creating the engine");
                ae = AnalysisEngineFactory.createEngine(aed);
            }catch(ResourceInitializationException ex)
            {
                System.out.println("Caught"+ex.getMessage());
            }catch(Exception e)
            {
                System.out.println("Caught"+e.getMessage());
            }
            client = node.client();
            
            String path = "RealFolderPathMustBeGiven"; // folder path where all the files resides
            Files.walk(Paths.get(path)).forEach(filePath -> {
                
            
            FileInputStream fis; // Finds the workbook instance for XLSX file 
                    fis = new FileInputStream(filePath.toString());
            XSSFWorkbook myWorkBook = new XSSFWorkbook(fis); // Return first sheet from the XLSX workbook 
                    XSSFSheet sheet = myWorkBook.getSheetAt(0);
                    ArrayList<String> columnList = new ArrayList();
                    
                    Iterator<Row> rowIterator = sheet.iterator();
                    if (rowIterator.hasNext()) {
                        //First Row
                        // excelsheet header values can be retrieved and store
                    }
            while (rowIterator.hasNext()) {
                  //2nd row on wards
                        Employee employee = new Employee(jCas);
                        
                        Row row = rowIterator.next();
                        for (int i = 0; i < columnList.size(); i++) {
                            if (row.getCell(i) != null) {
                                if(columnList.get(i).equals("LastName"))
                                {
                                    employee.setLastName(row.getCell(i).toString());
                                }
                                else if(columnList.get(i).equals("FirstName"))
                                {
                                    employee.setFirstName(row.getCell(i).toString());
                                }
                                else if(columnList.get(i).equals("Gender"))
                                {
                                    employee.setGender(row.getCell(i).toString());
                                }
                                else if(columnList.get(i).equals("Birthday"))
                                {
                                    employee.setBirthday(row.getCell(i).toString());
                                }
                                else if(columnList.get(i).equals("BirthCountry"))
                                {
                                    employee.setBirthCountry(row.getCell(i).toString());
                                }
                                employee.addToIndexes();
                            }
                            
                        }
                        
                    }
            
            ae.process(jCas);
            
            }
            
        } catch (IOException ex) {
        Logger.getLogger(TestBaleen.class.getName()).log(Level.SEVERE, null, ex);
            //Logger.getLogger(ContentScrapper.class.getName()).log(Level.SEVERE, null, ex);
        } catch (ResourceInitializationException ex) {
        Logger.getLogger(TestBaleen.class.getName()).log(Level.SEVERE, null, ex);
    }

}


//Then from somewhere else I can perform searches like this
{
SearchHit result = client.search(new SearchRequest()).actionGet().getHits().hits()[0];
List<Map<String, Object>> entities = (List<Map<String, Object>>) result.getSource().get("entities");
//I hope this will give all the employees
//Later I need to perform some other searches. Like, for the given parameter (lastName or Birthday), retrieving the employee.
}


Here the Employee class is similar to the Person class in Baleen, which inherits from Entity. There is also an Employee_Type, similar to Person_Type.
The idea of using Baleen in our project is to annotate important data so that searching is comprehensive and easy. Our web application needs to perform advanced searches like the ones above, which I haven't written yet. I'm still writing the text processing, reading, and indexing into the ES cluster.

@jbaker-dstl
Contributor

To get Baleen 2.3.0-SNAPSHOT, you will need to compile it yourself (i.e. clone the repo and run mvn package).

In terms of what you're trying to achieve, it sounds like what you really want to do is develop additional components for Baleen and then use Baleen as normal with a pipeline including your components. Any components that are on the Classpath are automatically picked up and available for use. You can also add in additional types (Employee in your case) using a similar method (i.e. making them available on the Classpath).

Have you read the Development Guides included in Baleen? Based on the code above, you probably want to develop your own ContentExtractor to handle your CSV files (or possibly your own CollectionReader) to read in the data and annotate it as appropriate. Your pipeline would then look something like:

collectionreader:
  class: your.class.on.the.classpath.XlsxCollectionReader
  file: C:\your\file.xlsx

annotators:
# None required here unless you want to do additional extraction, e.g. finding e-mail addresses and phone numbers?

consumers:
- Elasticsearch

Have a look also at the following pages for some examples: https://github.com/dstl/baleen/wiki/Additional-Components
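Whichever component ends up reading the spreadsheet, it will need to map each data row onto named fields using the header row, which is the part left as a stub in the code above. The following is a minimal sketch of that step in plain Java, with no Baleen or UIMA calls; `HeaderRowMapper`, `toRecords`, and the sample values are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: pairs each data row with the column names from the first row. */
final class HeaderRowMapper {

    /** Turns rows into maps keyed by column name; the first row is treated as the header. */
    static List<Map<String, String>> toRecords(List<String[]> rows) {
        List<Map<String, String>> records = new ArrayList<>();
        if (rows.isEmpty()) {
            return records;
        }
        String[] header = rows.get(0);
        for (String[] row : rows.subList(1, rows.size())) {
            Map<String, String> record = new LinkedHashMap<>();
            // Stop at the shorter of the two so ragged rows do not throw
            for (int i = 0; i < header.length && i < row.length; i++) {
                record.put(header[i], row[i]);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample data using two of the column names from the code above
        List<String[]> rows = Arrays.asList(
                new String[] {"LastName", "FirstName"},
                new String[] {"Lovelace", "Ada"});
        System.out.println(toRecords(rows));
    }
}
```

A reader built this way could then look fields up by name (for example `record.get("LastName")`) instead of chaining comparisons on the column index.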

@susachintha
Author

susachintha commented Jan 4, 2017

1. Thanks. I'll go through the docs again.

2. Some build errors are preventing compilation.
I followed this (https://github.com/dstl/baleen/blob/master/BUILD.md), but before step 5 I get some compiler errors. The console output is copied at the bottom.

3. I'm wondering whether you could make Baleen support NetBeans too.

{{
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Baleen
[INFO] Baleen Core
[INFO] Baleen UIMA
[INFO] Baleen Resources
[INFO] Baleen Annotators
[INFO] Baleen Collection Readers
[INFO] Baleen Consumers
[INFO] Baleen Jobs
[INFO] Baleen History
[INFO] Baleen Runner
[INFO] Baleen Javadoc
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Baleen 2.2.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- jacoco-maven-plugin:0.7.4.201502262128:prepare-agent (default-prepare-agent) @ baleen ---
[INFO] argLine set to "-javaagent:C:\Users\Susantha\.m2\repository\org\jacoco\org.jacoco.agent\0.7.4.201502262128\org.jacoco.agent-0.7.4.201502262128-runtime.jar=destfile=D:\baleen-master\baleen-master\baleen\target\jacoco.exec"
[INFO]
[INFO] --- jacoco-maven-plugin:0.7.4.201502262128:report (default-report) @ baleen ---
[INFO] Skipping JaCoCo execution due to missing execution data file:D:\baleen-master\baleen-master\baleen\target\jacoco.exec
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Baleen Core 2.2.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- jacoco-maven-plugin:0.7.4.201502262128:prepare-agent (default-prepare-agent) @ baleen-core ---
[INFO] argLine set to "-javaagent:C:\Users\Susantha\.m2\repository\org\jacoco\org.jacoco.agent\0.7.4.201502262128\org.jacoco.agent-0.7.4.201502262128-runtime.jar=destfile=D:\baleen-master\baleen-master\baleen\baleen-core\target\jacoco.exec"
[INFO]
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ baleen-core ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 66 resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ baleen-core ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ baleen-core ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 22 resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ baleen-core ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:2.10:test (default-test) @ baleen-core ---
[INFO] Surefire report directory: D:\baleen-master\baleen-master\baleen\baleen-core\target\surefire-reports


T E S T S

Results :

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[INFO]
[INFO] --- jacoco-maven-plugin:0.7.4.201502262128:report (default-report) @ baleen-core ---
[INFO] Analyzed bundle 'Baleen Core' with 75 classes
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ baleen-core ---
[INFO]
[INFO] --- maven-jar-plugin:2.4:test-jar (default) @ baleen-core ---
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Baleen UIMA 2.2.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- jacoco-maven-plugin:0.7.4.201502262128:prepare-agent (default-prepare-agent) @ baleen-uima ---
[INFO] argLine set to "-javaagent:C:\Users\Susantha\.m2\repository\org\jacoco\org.jacoco.agent\0.7.4.201502262128\org.jacoco.agent-0.7.4.201502262128-runtime.jar=destfile=D:\baleen-master\baleen-master\baleen\baleen-uima\target\jacoco.exec"
[INFO]
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ baleen-uima ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 9 resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ baleen-uima ---
[INFO] Compiling 86 source files to D:\baleen-master\baleen-master\baleen\baleen-uima\target\classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\types\semantic\Entity.java:[19,7] error: Entity is not abstract and does not override abstract method getTypeName() in Recordable
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\types\semantic\Event.java:[22,7] error: Event is not abstract and does not override abstract method getTypeName() in Recordable
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\org\apache\uima\jcas\tcas\DocumentAnnotation.java:[286,60] error: incompatible types: String cannot be converted to String[]
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\types\semantic\Relation.java:[19,7] error: Relation is not abstract and does not override abstract method getTypeName() in Recordable
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\uima\grammar\ParseTree.java:[72,68] error: cannot find symbol
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\uima\grammar\ParseTree.java:[73,20] error: cannot find symbol
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\uima\grammar\ParseTree.java:[156,31] error: cannot find symbol
[ERROR]\baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\uima\grammar\ParseTree.java:[157,40] error: incompatible types: invalid method reference
[INFO] 8 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Baleen ............................................ SUCCESS [4.146s]
[INFO] Baleen Core ....................................... SUCCESS [11.700s]
[INFO] Baleen UIMA ....................................... FAILURE [3.081s]
[INFO] Baleen Resources .................................. SKIPPED
[INFO] Baleen Annotators ................................. SKIPPED
[INFO] Baleen Collection Readers ......................... SKIPPED
[INFO] Baleen Consumers .................................. SKIPPED
[INFO] Baleen Jobs ....................................... SKIPPED
[INFO] Baleen History .................................... SKIPPED
[INFO] Baleen Runner ..................................... SKIPPED
[INFO] Baleen Javadoc .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19.299s
[INFO] Finished at: Wed Jan 04 11:07:26 GMT+05:30 2017
[INFO] Final Memory: 20M/47M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project baleen-uima: Compilation failure: Compilation failure:
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\types\semantic\Entity.java:[19,7] error: Entity is not abstract and does not override abstract method getTypeName() in Recordable
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\types\semantic\Event.java:[22,7] error: Event is not abstract and does not override abstract method getTypeName() in Recordable
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\org\apache\uima\jcas\tcas\DocumentAnnotation.java:[286,60] error: incompatible types: String cannot be converted to String[]
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\types\semantic\Relation.java:[19,7] error: Relation is not abstract and does not override abstract method getTypeName() in Recordable
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\uima\grammar\ParseTree.java:[72,68] error: cannot find symbol
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\uima\grammar\ParseTree.java:[73,20] error: cannot find symbol
[ERROR]\baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\uima\grammar\ParseTree.java:[156,31] error: cannot find symbol
[ERROR] \baleen-master\baleen-master\baleen\baleen-uima\src\main\java\uk\gov\dstl\baleen\uima\grammar\ParseTree.java:[157,40] error: incompatible types: invalid method reference
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :baleen-uima
}}

@jbaker-dstl
Contributor

Have you tried building from the command line rather than through Eclipse? The Eclipse and Maven integration can be somewhat buggy, so it's possible that's the issue. The other thing to try might be building it on a path without spaces. I will try doing a clean compile of the code here later, but have built it before without issues.

What JDK are you using to build Baleen?

With regards to NetBeans, the files on GitHub don't include the Eclipse specific project files, so you should be able to import the Maven projects into NetBeans.

@jbaker-dstl
Contributor

I've just built the latest code from the command line and it worked fine, suggesting it's something with your setup (or possibly an Eclipse bug).

@jamesfry
Contributor

jamesfry commented Jan 4, 2017

The compile errors in your logs about getTypeName() not being overridden refer to a default method. Default methods were introduced in Java 8 - are you building with JDK 7?
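To illustrate the point, here is a minimal sketch of a Java 8 default method. The names `Recordable`, `Entity`, and `getTypeName()` are taken from the compiler output above, but the bodies are hypothetical stand-ins and not Baleen's actual code.

```java
/** Hypothetical stand-in for the interface named in the compiler output. */
interface Recordable {
    // Java 8 default method: implementers inherit this body unless they override it
    default String getTypeName() {
        return getClass().getSimpleName();
    }
}

/** Does not override getTypeName(); this is valid on Java 8 and later because the default body is inherited. */
class Entity implements Recordable {
}

final class DefaultMethodDemo {
    public static void main(String[] args) {
        System.out.println(new Entity().getTypeName());
    }
}
```

On Java 8 or later this compiles without `Entity` declaring the method. A build that treats `getTypeName()` as abstract reports an error like the one in the log.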

@susachintha
Author

Thanks a lot. I was able to compile and build Baleen 2.3.0 without errors after fixing a couple of issues on my side.
I removed the whitespace from my folder path, which solved some of the compiler errors.
I was using the Eclipse Java EE IDE for Web Developers, which for some reason only allows Java versions up to 1.7, so I couldn't move forward with Eclipse. I was probably using the wrong version of Eclipse.

Then I imported the Maven project into NetBeans and, with the whitespace removed from the project path, I was able to build Baleen. The missing-jar errors occurred because I had been referencing only Baleen-core; after importing into NetBeans I realised that the complete set of jar dependencies is available in the root Baleen project, so referencing the root project solved those errors too.
What I'm going to do next is, rather than trying to load annotators on their own, use the framework and callbacks. I'll be back if I run into any more problems.

One last question: could you please send me the pipeline configuration file for this (https://github.com/jamesdbaker/Baleen-Components)? It would be handy to look at a real example.

I was referring to this pipeline (https://github.com/dstl/baleen/wiki/Sample-Pipeline) and am not too sure about some of the information there. For example:
annotators:
  - language.OpenNLP
  - class: misc.DocumentTypeByLocation
    baseDirectory: C:\baleen\data
  - gazetteer.Country
  - class: gazetteer.Mongo
    type: Buzzword
    collection: buzzwords
  - class: gazetteer.Mongo
    type: Location
    collection: location
  - class: gazetteer.Mongo
    type: Organisation
    collection: organisations
  - class: gazetteer.Mongo
    type: Person
    collection: people

What is the convention for these definitions?

  1. "class: gazetteer.Mongo": does it refer to a Baleen Mongo class or a user-defined class? Baleen doesn't have a Mongo class, but it has MongoGazetteer.java.
  2. "misc.DocumentTypeByLocation": is it a user-defined class? What is the baseDirectory there?
  3. What do 'type' and 'collection' mean under class: gazetteer.Mongo?
  4. I read in the documentation that annotators, consumers and collectionReader are the main clauses to define in a pipeline. (I can see these are the high-level package names of some Baleen projects.) So, other than Baleen Annotators, Baleen Collection Readers and Baleen Consumers, if I need to use classes in other packages, can I define them by their high-level package name? For example, suppose I need to use Entity.java, which is in the baleen.types.semantic package. Similar to 'annotators:' above, can I write something like this:

types:
-semantic.Entity

Thanks in advance for your answers.

@jbaker-dstl
Contributor

Please read the 'Running Baleen with Additional Annotators' guide included within Baleen and the Javadoc for PipelineCpeBuilder; these will answer some of your questions.

Also, consider using the Plankton tool to play around with building pipelines as this will produce the correct YAML for you. The one from that additional components project would look something like:

annotators:
  - jamesbaker.baleen.annotators.HashTag

gazetteer.Mongo refers to the Mongo class in uk.gov.dstl.baleen.annotators.gazetteer. All built in annotators are under uk.gov.dstl.baleen.annotators, so we only need to specify the end of the class. If the annotator you want to load is not under this package, then you need to provide the full name.
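The naming rule described above can be sketched roughly as follows. This is an illustration of the idea only, not Baleen's actual implementation: `ComponentNameResolver` and `resolve` are hypothetical names, the lookup order is an assumption, and JDK packages stand in for uk.gov.dstl.baleen.annotators so that the example runs on its own.

```java
/** Hypothetical sketch of resolving a short component name against a default package. */
final class ComponentNameResolver {

    /** Tries the name as given, then falls back to defaultPackage + "." + name. */
    static Class<?> resolve(String name, String defaultPackage) {
        try {
            // A fully qualified name resolves directly
            return Class.forName(name);
        } catch (ClassNotFoundException notFullyQualified) {
            try {
                // Otherwise treat the name as relative to the default package
                return Class.forName(defaultPackage + "." + name);
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("Cannot resolve component: " + name, e);
            }
        }
    }

    public static void main(String[] args) {
        // "java.util" plays the role of the built-in annotators package here
        System.out.println(resolve("concurrent.ConcurrentHashMap", "java.util").getName());
        System.out.println(resolve("java.lang.String", "java.util").getName());
    }
}
```

Under this scheme a short name such as gazetteer.Mongo only needs its trailing part, while a component outside the default package has to be given by its full name.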

type and collection are configuration properties for the Mongo gazetteer annotator. Information about the configuration properties for each annotator (or any component) can be found in the Javadoc.

To include new entity types, I believe they just need to be on the classpath and defined in such a way that UimaFIT will detect them. Please refer to the UimaFIT documentation for how to do this.

@susachintha
Author

Thank you for the references, they are very helpful.
