Watson Knowledge Studio is a relatively new tool from IBM that lets you train a natural language model for your own custom domain. You upload documents from the domain to the service and annotate them using a custom system of entities and relationships. The result is a model that you can publish to the Alchemy Language API, which can then extract information using your model instead of the standard one.
The first step in using Watson Knowledge Studio (WKS), before signing up for a free trial and before doing any annotation, is making sure you have the data you will be working with. You need text that is typical of the data your application will process. In our case, we wanted to use Amazon product reviews, so we took 100 reviews from different parts of Amazon and saved them as .csv files for import into WKS. In general, you want as many example texts as you can feasibly annotate. Several hundred examples are recommended, but in the sample application we annotated 100, and we have trained models on as few as 60 and still achieved decent results.
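Because WKS imports the reviews as .csv files, it can help to script their creation. The following is a minimal sketch. It assumes each row holds a document name followed by the document text, with no header row. That is our understanding of the WKS CSV import format, so confirm it against the WKS documentation. The function name `write_wks_csv` and the sample reviews are hypothetical.

```python
import csv


def write_wks_csv(reviews, path):
    """Write (name, text) pairs to a CSV file, one document per row.

    ASSUMPTION: WKS expects two columns (document name, document text)
    and no header row; check the WKS import documentation.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for name, text in reviews:
            # Collapse runs of whitespace (including newlines) into single spaces.
            writer.writerow([name, " ".join(text.split())])


# Hypothetical sample reviews; in practice these come from your collected data.
reviews = [
    ("review_001", "The camera is great, but the battery dies by noon."),
    ("review_002", "Customer service replaced my broken charger quickly."),
]
write_wks_csv(reviews, "amazon_reviews.csv")
```

The csv module quotes any field that contains a comma, so punctuation inside a review does not split it into extra columns.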

The next step is to go into Watson Knowledge Studio itself. On the Watson Knowledge Studio page, click the “Free 30-Day Trial” button. Log in with your IBM ID, or create one if you don’t have one yet. After you fill out all of the fields, a service instance starts provisioning. This usually takes a few minutes, and you should receive an email once it completes.

Once you receive that email, you are ready for the next step. If you are annotating alone, go ahead and start the service. If you plan on annotating as a team, expand the option box and use the “Add new user:” option. If a team member should only annotate, leave the “Make administrator” box unchecked. If they should also help configure the documents, the type system, or the annotator, check it.
Once you have started the service, click the URL that is displayed; it leads to the WKS instance you just started. From there, you’re ready to start your project. Create a new project and give it a name; you do not need to worry about any of the other options.

From there, you’ll be taken to your project dashboard. The first screen you see is the type system management screen, which lists all of the entities and relationships you have defined for this project (none yet). At this point, you need to get a little creative: for each type of data that you want to extract later, you need to define an entity type. For instance, we wanted information about the features of a product, about customer service, and about the defects people find, so we defined Feature, Customer_Service, and Defect entity types.

You will most likely not need to define subtypes or mentions for your entities, but you can if your domain calls for more detail. Entity types can also have roles. A role describes how an entity of one type can act as another type. For instance, a camera is a product, but it can also be a feature of something else: if somebody is talking about their phone’s camera, the camera is a product that has the role of a feature. A subtype is a more specific class of an entity type; for example, online customer service is a subtype of customer service. We did not find much use for subtypes in our application, and we advise against them in a case like this because they add unneeded complexity. You can also define relationships between entities if you want them as an extra data source, but we found them unnecessary in our application. For more information on these, please visit the WKS documentation on Type Systems.
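To make the difference between an entity’s type and its role concrete, here is a hypothetical model of our type system in plain Python. It is not WKS’s own type-system format. The `Product` type, the class names, and the `validate` helper are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntityType:
    name: str
    # Subtypes are shown for illustration only; we advise against them here.
    subtypes: List[str] = field(default_factory=list)


@dataclass
class EntityMention:
    text: str
    entity_type: str             # what the thing is
    role: Optional[str] = None   # what it acts as in this context, if anything


# Hypothetical type system: our three entity types plus an assumed Product type.
TYPE_SYSTEM = {
    t.name: t
    for t in [
        EntityType("Product"),
        EntityType("Feature"),
        EntityType("Customer_Service", subtypes=["Online_Customer_Service"]),
        EntityType("Defect"),
    ]
}


def validate(mention, type_system):
    """Check that a mention's type and role are both defined entity types."""
    if mention.entity_type not in type_system:
        raise ValueError(f"unknown entity type: {mention.entity_type}")
    if mention.role is not None and mention.role not in type_system:
        raise ValueError(f"unknown role: {mention.role}")
    return True


# "My phone's camera is blurry": the camera is a Product acting as a Feature.
camera = EntityMention("camera", entity_type="Product", role="Feature")
validate(camera, TYPE_SYSTEM)
```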

Next, click the “Documents” link in the top bar of the page. This is where you upload the .csv files we created previously: click Import, select the files, and import them into WKS. It seems counterintuitive, but you cannot annotate the imported documents as they are; they first need to be divided into document sets. Click the “Create Sets” button, where you can define the properties of the sets you are about to create. Overlap is the percentage of documents that are shared among the sets. If there is more than one annotator, the overlapping documents are used to measure how well the annotators agree with one another. If you are annotating by yourself, overlap can show how consistent you are, but we recommend setting it low so that you are not annotating the same documents again and again. We have found that it is almost always better to create many small sets rather than one large one. The machine learning component will not accept documents from sets that aren’t completely annotated, so small sets let you add documents to the machine learning component more frequently.
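WKS builds the sets for you, but a small sketch helps show what overlap means. It assumes the overlapping documents are copied into every set while the rest are dealt out so that each appears in exactly one set. WKS’s actual assignment may differ, and `create_sets` is a hypothetical helper.

```python
import random


def create_sets(doc_ids, num_sets, overlap_pct, seed=0):
    """Split documents into annotation sets that share an overlap subset.

    A fraction overlap_pct/100 of the documents is copied into every set;
    the remaining documents are dealt round-robin, one set each.
    """
    docs = list(doc_ids)
    random.Random(seed).shuffle(docs)
    n_shared = round(len(docs) * overlap_pct / 100)
    shared, rest = docs[:n_shared], docs[n_shared:]
    sets = [list(shared) for _ in range(num_sets)]
    for i, doc in enumerate(rest):
        sets[i % num_sets].append(doc)
    return sets


sets = create_sets([f"review_{i:03d}" for i in range(100)], num_sets=5, overlap_pct=10)
for s in sets:
    print(len(s))  # 10 shared + 18 unique = 28 per set
```

Raising `num_sets` makes each set smaller, so each one is fully annotated, and usable for training, sooner.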

Now that your documents are set up, you can add dictionaries if you want. Dictionaries are not necessary, but they can help if you find yourself annotating the same word over and over again: a dictionary pre-annotates case-sensitive string matches of its entries. To use one, create a dictionary and add entries to it. When adding entries, note that the surface form of a word is an inflected version that may appear in the text, while the lemma is the word’s base form. “Running” is one surface form of the lemma “Run”.
Note that if you use dictionaries, you need to run the dictionary pre-annotator on your document sets before you start annotating. For more information on this and on dictionaries in general, please visit the WKS documentation on Dictionaries.
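The matching itself happens inside WKS, but the idea is easy to sketch. This sketch assumes whole-word, case-sensitive matching, as described above. `DictionaryEntry` and `pre_annotate` are hypothetical names, and the sample entry is our own.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class DictionaryEntry:
    lemma: str
    surface_forms: List[str]
    entity_type: str


def pre_annotate(text, entries):
    """Return (begin, end, surface_form, lemma, entity_type) for every case-sensitive match."""
    annotations = []
    for entry in entries:
        for form in entry.surface_forms:
            # No re.IGNORECASE flag, so "Battery" and "battery" are different strings;
            # \b restricts matches to whole words.
            for m in re.finditer(rf"\b{re.escape(form)}\b", text):
                annotations.append((m.start(), m.end(), form, entry.lemma, entry.entity_type))
    return sorted(annotations)


entries = [DictionaryEntry("battery", ["battery", "batteries"], "Feature")]
text = "Battery life is poor, and both batteries drain overnight."
for begin, end, form, lemma, etype in pre_annotate(text, entries):
    print(text[begin:end], lemma, etype)  # batteries battery Feature
```

The capitalized “Battery” at the start of the sentence is not pre-annotated because the entry lists only lowercase surface forms. Case sensitivity is why you may need several surface forms per lemma.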

Now it is finally time to annotate. Follow the “Human Annotation” link in the top bar to reach the annotation page. From here, you can check on the status of annotation tasks and configure hotkeys and some cosmetic settings of the editor. Click the “Add Task” button and add all of the document sets that you would like to annotate. Once you’re set up, click one of the document sets, and then one of its documents, to start annotating.
Whenever your text mentions something that fits one of the entity types you defined, click it and then click the type in the sidebar (or press its hotkey). Once you have combed through the entire text for entities, you can annotate relationships, if you defined any. Select the relationship view on the left side, click the first entity in the relationship, then the second, and then click the relationship type in the sidebar (or press its hotkey).
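Behind the editor, an entity annotation is essentially a typed character span, and a relationship links two annotated spans. The sketch below models that idea. The class names, the `hasDefect` relationship type, and the data layout are hypothetical and do not reflect WKS’s export format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Mention:
    mention_id: str
    begin: int          # offset of the first character
    end: int            # offset one past the last character
    entity_type: str


@dataclass
class Relation:
    rel_type: str
    first: str          # id of the first entity clicked
    second: str         # id of the second entity clicked


@dataclass
class AnnotatedDocument:
    text: str
    mentions: Dict[str, Mention] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)

    def annotate(self, mention_id, surface, entity_type, start=0):
        """Mark the first occurrence of `surface` at or after `start` as an entity."""
        begin = self.text.index(surface, start)
        self.mentions[mention_id] = Mention(mention_id, begin, begin + len(surface), entity_type)

    def relate(self, rel_type, first_id, second_id):
        """Link two existing mentions, mirroring the click-first, click-second workflow."""
        if first_id not in self.mentions or second_id not in self.mentions:
            raise KeyError("annotate both entities before relating them")
        self.relations.append(Relation(rel_type, first_id, second_id))

    def covered_text(self, mention_id):
        m = self.mentions[mention_id]
        return self.text[m.begin:m.end]


doc = AnnotatedDocument("The screen cracked after one week.")
doc.annotate("m1", "screen", "Feature")
doc.annotate("m2", "cracked", "Defect")
doc.relate("hasDefect", "m1", "m2")  # hypothetical relationship type
print(doc.covered_text("m1"), "->", doc.covered_text("m2"))  # screen -> cracked
```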

Next, you can annotate coreference. Coreference is when two or more expressions in a text refer to the same thing. For instance, in the sentence “Joe went to the store because he was out of milk,” Joe and he are coreferent: they refer to the same person. So, whenever the same thing is mentioned more than once (directly or with pro-forms such as pronouns), you can mark the mentions as a coreference chain. To do this, click each word that refers to the same thing, and when you’re done, double-click the last word. There are also options on the right side to combine or delete chains.
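A coreference chain can be thought of as a set of mention spans that all point to the same thing, and combining two chains as taking their union. The sketch below is a hypothetical model of that idea, not WKS code. The second sentence of the example text is our own. Note the whole-word match, which keeps “he” from matching inside “the”.

```python
import re


def span(text, word, occurrence=0):
    """(begin, end) of the nth whole-word, case-sensitive occurrence of `word`."""
    matches = [m.span() for m in re.finditer(rf"\b{re.escape(word)}\b", text)]
    return matches[occurrence]


def combine_chains(chains, i, j):
    """Merge chain j into chain i, drop chain j, and keep mentions in text order."""
    merged = sorted(set(chains[i]) | set(chains[j]))
    return [merged if k == i else chain for k, chain in enumerate(chains) if k != j]


def delete_chain(chains, i):
    """Remove chain i entirely."""
    return [chain for k, chain in enumerate(chains) if k != i]


text = "Joe went to the store because he was out of milk. He forgot his wallet."
chains = [
    [span(text, "Joe"), span(text, "he")],   # \b keeps "he" from matching inside "the"
    [span(text, "He"), span(text, "his")],
]
chains = combine_chains(chains, 0, 1)
print([text[b:e] for b, e in chains[0]])  # ['Joe', 'he', 'He', 'his']
```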
    
Once you are finished annotating, it is time to accept the annotated documents into the ground truth. Navigate back to the screen where you can check the status of your annotation task, and accept the document sets. If there was any overlap, there will most likely be conflicts between the overlapping documents. You can use the provided tool to review them, going through and accepting the correct annotation for each conflict. (For more detailed instructions, please visit the WKS documentation.) Be careful here: if one document set is accepted and another is accepted later, the second overwrites the first. With a high document overlap, the second set would overwrite the documents it shares with the first, with no chance to resolve conflicts.
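The sketch below models the two behaviors described above. The first is finding conflicts between annotators on an overlapping document. The second is a later acceptance overwriting an earlier one. It is a hypothetical illustration: WKS performs both steps itself, and `find_conflicts`, `accept`, and the span data are our own.

```python
def find_conflicts(ann_a, ann_b):
    """Spans that two annotators labeled differently in the same document.

    Each argument maps a (begin, end) span to an entity type. A span that
    only one annotator marked shows up with None on the other side.
    """
    conflicts = {}
    for span in sorted(set(ann_a) | set(ann_b)):
        a, b = ann_a.get(span), ann_b.get(span)
        if a != b:
            conflicts[span] = (a, b)
    return conflicts


def accept(ground_truth, document_set):
    """Accept a document set into the ground truth.

    dict.update replaces any document already present, so the set accepted
    later wins for every document the two sets share.
    """
    ground_truth.update(document_set)
    return ground_truth


# Two annotators' labels for the same overlapping review (hypothetical spans).
alice = {(4, 10): "Feature", (11, 18): "Defect"}
bob = {(4, 10): "Feature", (11, 18): "Customer_Service", (30, 37): "Feature"}
print(find_conflicts(alice, bob))
# {(11, 18): ('Defect', 'Customer_Service'), (30, 37): (None, 'Feature')}

ground_truth = {}
accept(ground_truth, {"review_001": alice, "review_002": {(0, 5): "Feature"}})
accept(ground_truth, {"review_001": bob})  # overwrites alice's review_001
```

Resolving conflicts before the second acceptance is what keeps the overwrite from silently discarding one annotator’s work.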
    
After that last step, your ground truth is set up, which means you are ready to let the machine take over. Navigate to “Annotator Component” using the top bar. Click the button to create an annotator, and select the option to create a machine learning annotator. Then select all of the document sets that you have annotated and define a training/testing split (the default is usually good). Click the option to train and evaluate, then come back in about ten minutes to see how you did.
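WKS performs this split for you when you train the annotator. Purely to show what a document-level split means, here is a sketch. The 70/30 ratio is an assumed example and not necessarily WKS’s default, and `split_documents` is hypothetical.

```python
import random


def split_documents(doc_ids, train_fraction=0.7, seed=42):
    """Shuffle documents reproducibly and cut them into training and test lists."""
    docs = sorted(doc_ids)               # fixed starting order so the seed fully determines the result
    random.Random(seed).shuffle(docs)
    cut = round(len(docs) * train_fraction)
    return docs[:cut], docs[cut:]


train_docs, test_docs = split_documents([f"review_{i:03d}" for i in range(100)])
print(len(train_docs), len(test_docs))  # 70 30
```

Splitting whole documents, rather than individual annotations, keeps test documents unseen during training, so the evaluation reflects new text.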
    
No matter what your final numbers are, you can publish the model you just made and use the validation script validateModels.py to determine whether your model’s F1 score is high enough to continue. Go into the detail view of the machine learning annotator and click the “Take Snapshot” button. Saving the snapshot takes a few minutes. When it completes, click “Publish”. Copy the API key of your Alchemy Language service in Bluemix (which you should have created in a previous step) and paste it into the corresponding field. You should now have the model ID of a machine learning natural language processing model that you can plug into Alchemy Language in place of its standard model.
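The F1 score is the harmonic mean of precision and recall. We do not reproduce validateModels.py here. The sketch below only shows the arithmetic that such a check relies on. It uses a hypothetical `meets_threshold` helper and an arbitrary example threshold of 0.7.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def meets_threshold(f1, threshold=0.7):
    """Hypothetical gate: 0.7 is an arbitrary example, not a WKS or script default."""
    return f1 >= threshold


# Example counts: 80 correct mentions, 20 spurious, 40 missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.3f}")  # precision=0.80 recall=0.67 f1=0.727
print(meets_threshold(f1))  # True
```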

As long as your model performs the way you want it to, the F1 score does not matter; it is just a way to quantify what is happening. That said, if you want to improve your score, the official documentation explains in detail why each measure might be low and how to bring it up. In our experience, if you see entity types being confused with one another, your type system may be too generic, with types that are too similar to tell apart. Low precision can have the same cause. We have found that the best way to improve both precision and recall is to annotate more documents. If the score still does not improve after more annotation, it is time to change the type system.
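You may be able to get the gold and predicted annotations for your test documents. If so, a tally like the following shows which pairs of types the model mixes up most often. This sketch assumes both sides use identical span boundaries. `confused_pairs` and the sample data are hypothetical.

```python
from collections import Counter


def confused_pairs(gold, predicted):
    """Count (gold type, predicted type) pairs for spans found with the wrong type.

    Both arguments map (begin, end) spans to entity types. Spans the model
    missed or invented are ignored here, since they are not confusions between types.
    """
    counts = Counter()
    for span, gold_type in gold.items():
        pred_type = predicted.get(span)
        if pred_type is not None and pred_type != gold_type:
            counts[(gold_type, pred_type)] += 1
    return counts.most_common()


# Hypothetical gold and predicted annotations for one test document.
gold = {(0, 6): "Feature", (10, 17): "Defect", (20, 28): "Feature", (30, 40): "Customer_Service"}
predicted = {(0, 6): "Defect", (10, 17): "Defect", (20, 28): "Defect", (30, 40): "Customer_Service"}
print(confused_pairs(gold, predicted))  # [(('Feature', 'Defect'), 2)]
```

A pair that dominates the tally is a hint that those two types may be too similar and that the type system needs revisiting.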
