Make sure you have python3.x and pip3 installed after that just run
source build_env
This will create a virtual environemnt named env
and also install the requirements!
Your model can be easily configured in the file model_config.py
Ex:
class Config:
EMBEDDING_SIZE = 100
DROPOUT = 0.25
N_EPOCHS = 20
LOSS = crf_loss
OPTIMIZER = 'rmsprop'
MODEL = 'GRU_CRF'
DATA_FILE = 'atis.pkl'
WORD_EMBEDDINGS = 'glove' # Can be 'word2vec' / 'glove' / 'None'
PORT = '5004'
Before training your model, add the DATA_FILE
key to your config which is your dataset file saved in the /data
folder. Make sure you mention only file name NOT the path to the file.
To build your own custom word embeddings from your dataset, we use the CustomEmbedding
class in embeddings/custom.py
Initialization
In main.py
from embeddings.custom import CustomEmbedding
word_model = CustomEmbedding()
OR can add your own configuration else default values used.
word_model = CustomEmbedding(config={
'min_count': 5,
'window': 5,
'sg': 0,
'pre_trained': True / False, # (default is False)
'iter': 1000
})
model = Sequential()
model.add(word_model.EmbeddingLayer())
...
OR Can use keras in-built Embedding Layer :
from keras.layers.embeddings import Embedding
model = Sequential()
model.add(Embedding(n_vocab, Config.EMBEDDING_SIZE))
...
sentance_idx | word | tag |
---|---|---|
Sentance: 1 | philadelphia | B-fromloc.city_name |
to | O | |
san | B-toloc.city_name | |
francisco | I-toloc.city_name |
python3 main.py --train
Validation is perfomed on every epoch, and based on the F1-Score
the weights of the best model is saved in the /trained_model
folder following the format - trained_model_<N_EPOCHS>_<MODEL>_<F1_score>.h5
.
After training is done, the model with the best F1-Score is automatically kept in the /trained_model
folder.
Validation script in the /metrics
folder, we're using the already exsiting conlleval.pl
pearl script for finding out the Precision
, Recall
and F1-Score
after every epoch.
The metrics works only for BIO tagged datasets.
You can directly test using the above command as the repo includes the trained_model
as well :D
Go to tests/text_sentances.txt
:
Where is the stop for USAir flight number thirty-seven from Philadelphia to San Francisco?
Make sure you add every sentance on one line and the run :
python3 main.py --test
Go to tests/slots.txt
to find the output :
Where is the stop for USAir flight number 37 from Philadelphia to San Francisco
37 - B-flight_number
philadelphia - B-fromloc.city_name
san - B-toloc.city_name
francisco - I-toloc.city_name
----------------------------------------------------------------------------------------------------
The test command will use the best model based on the F1 score of each model, irrespective of configuration in model_config.py
.
Once, the model is put into training, logs are generated and saved in /logs
folder according to the format - model_<N_EPOCHS>_<MODEL>.log
as mentioned in your configuration file. If you need different log outputs modify /logs/logger.py
.
python3 app.py
Configure you app PORT
in the configuration file, model_config.py
Go to your browser use the above example sentance:
SENTENCE:
Where is the stop for USAir flight number 37 from Philadelphia to San Francisco flying next friday
API ENDPOINT:
/api/v1/slots?sentence=
EXAMPLE URL:
GET http://localhost:9009/api/v1/slots?sentence=Where%20is%20the%20stop%20for%20USAir%20flight%20number%2037%20from%20Philadelphia%20to%20San%20Francisco%20flying%20next%20friday
RESPONSE: (JSON format)
{
"response_time": "0.79s",
"sentance": "Where is the stop for USAir flight number 37 from Philadelphia to San Francisco flying next friday",
"slots": {
"depart_date.date_relative": [
"next"
],
"depart_date.day_name": [
"friday"
],
"flight_number": [
"37"
],
"fromloc.city_name": [
"philadelphia"
],
"toloc.city_name": [
"san francisco"
]
}
}
http://localhost:{PORT}/demo
Make sure your don't change tensorflow==1.13.1
and keras==2.2.5
versions, the model breaks for a different combination, but you're free to test other versions! 😛