# Watson Studio Language Translator - Basics & Custom Models

Examples of using the IBM Watson Translation API, made by
- **Stephanie Wagenaar** BOLD.lab
- **Willem Hendriks** IBM / BOLD.lab

1. Basic authentication & Testing
2. Customize Models


### API Documentation

https://cloud.ibm.com/apidocs/language-translator

## From the Documentation

The [documentation](https://cloud.ibm.com/docs/language-translator?topic=language-translator-customizing) states:

To create a model that is customized with **both parallel corpora and a forced glossary**, proceed in two steps:

1. Customize with at least one parallel corpus file. You can upload multiple parallel corpus files with a single request. To successfully train with parallel corpora, all corpus files combined must contain at least 5000 parallel sentences. The cumulative size of all uploaded corpus files for a custom model is limited to 250 MB.
2. Customize the resulting model with a forced glossary. You can upload a single forced glossary file for a custom model. The size of a forced glossary for a custom model is limited to 10 MB.

You can store a maximum of 10 custom models for each language pair in a service instance.
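
The limits quoted above can be checked locally before uploading. The following is a minimal sketch, not part of the service API: the helper name `check_parallel_corpora` is hypothetical, one sentence pair per non-empty line is assumed, and "MB" is assumed to mean 1024 * 1024 bytes, which the documentation excerpt does not specify.

```python
import os

# Limits quoted from the documentation excerpt above.
MIN_PARALLEL_SENTENCES = 5000
MAX_CORPUS_BYTES = 250 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_parallel_corpora(paths, has_header=True):
    """Return a list of problems; an empty list means the files look fine.

    Assumes one sentence pair per non-empty line and, when has_header is
    True, a single header row (such as "en,es") in every file.
    """
    total_bytes = 0
    total_sentences = 0
    for path in paths:
        total_bytes += os.path.getsize(path)
        with open(path, encoding="utf-8") as handle:
            rows = sum(1 for line in handle if line.strip())
        total_sentences += max(rows - 1, 0) if has_header else rows

    problems = []
    if total_sentences < MIN_PARALLEL_SENTENCES:
        problems.append(
            "only {} sentence pairs, need at least {}".format(
                total_sentences, MIN_PARALLEL_SENTENCES
            )
        )
    if total_bytes > MAX_CORPUS_BYTES:
        problems.append(
            "corpus files total {} bytes, limit is {}".format(
                total_bytes, MAX_CORPUS_BYTES
            )
        )
    return problems
```

Calling `check_parallel_corpora(['parallel_corpus.csv'])` on the file generated later in this notebook should return an empty list, since that file holds 5005 sentence pairs.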

---


## Get the service credentials and store them in `credentials` dict

### Fetch your credentials for the Watson Language Translator service

1. Open https://cloud.ibm.com/resources
2. Find the **Language Translator** in the **Services** list
3. From **Service credentials**, copy the JSON holding the credentials and store it in `credentials`, e.g.

```
credentials = {
  "apikey": "XXXXXXXX-XXXXXXXXXXXXX-XXXXXXXXXXXXXXXXX",
  "iam_apikey_description": "Auto-generated for key XXXXXXXX-XXXXXXXXXXXXX-XXXXXXXXXXXXXXXXX",
  "iam_apikey_name": "Auto-generated service credentials",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Manager",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/a126878c4ab9a3456456456451cf2081b8::serviceid:XXXXXXXX-XXXXXXXXXXXXX-XXXXXXXXXXXXXXXXX",
  "url": "https://api.us-south.language-translator.watson.cloud.ibm.com/instances/XXXXXXXX-XXXXXXXXXXXXX-XXXXXXXXXXXXXXXXX"
}
```

In [None]:
credentials = {
  "apikey": "XXXXXXXX-XXXXXXXXXXXXX-XXXXXXXXXXXXXXXXX",
  "iam_apikey_description": "Auto-generated for key XXXXXXXX-XXXXXXXXXXXXX-XXXXXXXXXXXXXXXXX",
  "iam_apikey_name": "Auto-generated service credentials",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Manager",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/a126878c4ab9a3456456456451cf2081b8::serviceid:XXXXXXXX-XXXXXXXXXXXXX-XXXXXXXXXXXXXXXXX",
  "url": "https://api.us-south.language-translator.watson.cloud.ibm.com/instances/XXXXXXXX-XXXXXXXXXXXXX-XXXXXXXXXXXXXXXXX"
}
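
Instead of pasting the key into the notebook, the copied JSON can be kept in a separate file and loaded at runtime. This is a minimal sketch under assumptions: the helper name `load_credentials` and the file name `credentials.json` are hypothetical, and only the two fields this notebook actually uses (`apikey` and `url`) are checked.

```python
import json

# The only fields the requests in this notebook rely on.
REQUIRED_KEYS = ("apikey", "url")


def load_credentials(path):
    """Load the service credentials JSON and verify the fields used here."""
    with open(path, encoding="utf-8") as handle:
        loaded = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if not loaded.get(key)]
    if missing:
        raise ValueError("missing credential fields: " + ", ".join(missing))
    return loaded
```

With such a file in place, `credentials = load_credentials('credentials.json')` would replace the cell above, and the rest of the notebook works unchanged.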

## 1. Basic Test: Authenticate & Translate Sample Sentences



In [None]:
data = {'text' : ['Hello Michael Bachrach', 'Hello, World', "How Are you?" ,  'YoYo Kenneth Mcclanahan', 'Supermarket', "where is the bank"], 'model_id' : '6c08c11b-36b4-4058-a3e4-7f71a8cfc669'}

In [None]:
import requests

headers = {
    'Content-Type': 'application/json',
}

params = (
    ('version', '2018-05-01'),
)

In [None]:
response = requests.post('{url}/v3/translate'.format(**credentials), headers=headers, params=params, json=data, auth=('apikey', '{apikey}'.format(**credentials)))


In [None]:
response.json()
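
The raw JSON is easier to read when each input sentence is paired with its translation. The sketch below assumes the v3 translate response contains a `translations` list whose items each carry a `translation` string, in the same order as the input; the helper name `pair_translations` and the sample body are illustrative only, not real service output.

```python
def pair_translations(source_texts, response_body):
    """Pair each input sentence with the translation at the same position."""
    if "translations" not in response_body:
        raise ValueError("response has no 'translations' field: {}".format(response_body))
    translations = [item["translation"] for item in response_body["translations"]]
    if len(translations) != len(source_texts):
        raise ValueError("number of translations does not match number of inputs")
    return list(zip(source_texts, translations))


# Illustrative body in the assumed response shape (not real service output).
sample_body = {
    "translations": [{"translation": "Hola, Mundo"}, {"translation": "Supermercado"}],
    "word_count": 3,
    "character_count": 23,
}
for source, target in pair_translations(["Hello, World", "Supermarket"], sample_body):
    print("{!r} -> {!r}".format(source, target))
```

In the notebook this would be called as `pair_translations(data['text'], response.json())`.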

## 2. List all Models - default & customized

The following cells list all models available to the service instance, both default and customized.

In [None]:
response = requests.get('{url}/v3/models'.format(**credentials),  params=params, auth=('apikey', '{apikey}'.format(**credentials)))

In [None]:
# All Models
response.json()

In [None]:
# Print all non-default (customized) models, including their training status

for model in response.json()['models']:
    if not model['default_model']:
        print(model)
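
Printing whole model dictionaries gets noisy once several custom models exist. A compact summary can be built with the same `default_model` filter as the loop above. This is a sketch: the helper name `summarize_custom_models` is hypothetical, and the `name`, `model_id` and `status` fields are assumed to be present in each model entry (missing ones are reported as `None`).

```python
def summarize_custom_models(models_body):
    """Return (name, model_id, status) for every non-default model."""
    summary = []
    for model in models_body.get("models", []):
        if model.get("default_model"):
            continue  # skip the default models, as in the loop above
        summary.append((model.get("name"), model.get("model_id"), model.get("status")))
    return summary


# Illustrative body in the assumed shape (not real service output).
sample_models = {
    "models": [
        {"model_id": "en-es", "name": "en-es", "default_model": True, "status": "available"},
        {"model_id": "abc-123", "name": "custom_model_v10", "default_model": False, "status": "training"},
    ]
}
print(summarize_custom_models(sample_models))
```

In the notebook this would be called as `summarize_custom_models(response.json())`.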

## 3. Create Custom Model -  `parallel_corpus`

The [API](https://cloud.ibm.com/docs/language-translator?topic=language-translator-customizing) documentation uses the curl `--form` parameter to upload custom training files. The same multipart upload can be done with `requests` in Python, as described at

https://stackoverflow.com/questions/42215356/convert-curl-with-form-to-python-requests

- We use the `names` package to generate random names.
- We write them to a `.csv` file that serves as the parallel corpus.
- The resulting `model_id` can later be used as the base model when adding a `forced_glossary`.

In [None]:
%%capture pip_install
!pip install names

In [None]:
import names

In [None]:
# Write a header row plus 5005 sentence pairs, just above the 5000-pair minimum
with open('parallel_corpus.csv', 'w') as outfile:
    outfile.write("en,es\n")
    for _ in range(5005):
        random_name = names.get_full_name()
        outfile.write("YoYo {},".format(random_name))
        outfile.write("YososYoses {}\n".format(random_name))

### Inspect the Parallel Corpus

In [None]:
!head parallel_corpus.csv

In [None]:
import requests

params = (
    ('version', '2018-05-01'),
    ('base_model_id', 'en-es'),
    ('name', 'custom_model_v10'),
)

# No explicit Content-Type header here: `requests` builds the multipart
# header (including the boundary) itself when `files` is passed.

test_files = [("parallel_corpus", open("parallel_corpus.csv", "rb"))]

In [None]:
# `files` alone defines the multipart body; the translate payload in `data` does not belong in this request
response = requests.post('{url}/v3/models'.format(**credentials), params=params, files=test_files, auth=('apikey', '{apikey}'.format(**credentials)))

In [None]:
custom_corpus_model = response.json()

## If the response shows `'status': 'dispatching'`, your model is being trained

In [None]:
custom_corpus_model
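
Training takes a while, so the model cannot be used for translation right away. A simple polling loop can wait for it. The sketch below makes several assumptions: the helper name `wait_until_available` is hypothetical, the terminal status values are taken to be `'available'` (ready) and `'error'` (failed), and the status lookup is passed in as a function so the loop can be tried without calling the service.

```python
import time


def wait_until_available(fetch_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call fetch_status() repeatedly until the model is ready.

    fetch_status is any zero-argument callable returning the current status
    string. 'available' and 'error' are assumed to be the terminal values.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status == "available":
            return status
        if status == "error":
            raise RuntimeError("model training failed")
        sleep(poll_seconds)
    raise TimeoutError("model not available after {} polls".format(max_polls))
```

In the notebook, `fetch_status` could be a small lambda that sends `requests.get` to `'{url}/v3/models/'.format(**credentials) + custom_corpus_model['model_id']` with the same `version` parameter and `auth` tuple as the other cells, and returns `['status']` from the JSON response.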

### Test the Parallel Corpus Model - Note that YoYo is translated correctly

In [None]:
data = {'text' : ['YoYo Kenneth Mcclanahan', 'Hello, World', "How Are you?" ,  'Supermarket', "where is the bank"], 'model_id' : '7a105127-0c86-465a-9831-4fd3d7499380'}

In [None]:
import requests

headers = {
    'Content-Type': 'application/json',
}

params = (
    ('version', '2018-05-01'),
)

In [None]:
response = requests.post('{url}/v3/translate'.format(**credentials), headers=headers, params=params, json=data, auth=('apikey', '{apikey}'.format(**credentials)))


In [None]:
response.json()

## 4. Create Custom Model -  `forced_glossary`

The [API](https://cloud.ibm.com/docs/language-translator?topic=language-translator-customizing) documentation uses the curl `--form` parameter to upload custom training files. The same multipart upload can be done with `requests` in Python, as described at

https://stackoverflow.com/questions/42215356/convert-curl-with-form-to-python-requests

In [None]:
!echo "en,es" > custom.csv
!echo "hi,olaaaa" >> custom.csv
!echo "mexico,meggiko" >> custom.csv

In [None]:
!cat custom.csv
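
The shell `echo` commands above work for simple entries but would break on terms that contain a comma. As a hedged alternative, the same file can be written with Python's `csv` module, which quotes such entries. The helper name `write_glossary` is hypothetical, and the header row simply mirrors the `en,es` header used above.

```python
import csv


def write_glossary(path, pairs, source_lang="en", target_lang="es"):
    """Write a glossary CSV with a language-code header row.

    Entries containing commas or quotes are quoted by the csv module.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow([source_lang, target_lang])
        writer.writerows(pairs)
```

Calling `write_glossary('custom.csv', [('hi', 'olaaaa'), ('mexico', 'meggiko')])` produces the same three lines as the `echo` cell.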

## 5. Change the `base_model_id` to either 'en-es' or similar, or a previously generated custom model id

In [None]:
import requests

params = (
    ('version', '2018-05-01'),
    ('base_model_id', 'es-en'),
    ('name', 'custom_model'),
)

# No explicit Content-Type header here: `requests` builds the multipart
# header (including the boundary) itself when `files` is passed.

test_files = [("forced_glossary", open("custom.csv", "rb"))]


In [None]:
# `files` alone defines the multipart body; the translate payload in `data` does not belong in this request
response = requests.post('{url}/v3/models'.format(**credentials), params=params, files=test_files, auth=('apikey', '{apikey}'.format(**credentials)))

In [None]:
response.json()['model_id']

In [None]:
response.json()