
Storage and Buffers




Overview

In our tests, collecting (caching) more data and sampling from it at random worked better than training on the sample sets as soon as the Data Collectors sent them. A batch drawn from a larger cache contains samples collected over a longer period of time, from other parts of the map, and with different data characteristics. Such batches represent the data as a whole better, so the model does not fit only to where the NPCs are right now and to what they are currently doing.



Buffers

When Data Collectors send data to the Trainer through the Server, it is not used to train the model immediately. Instead, the buffers are filled first, currently to 25,000 samples each, and training starts once all of the buffers are full. During training, a batch is formed by randomly drawing the same number of samples from each buffer. The drawn samples are then removed from the buffers and never used again. This way each batch consists of random data from different in-game points in time and areas. We found that without buffers the model learned significantly worse, because the data consisted of very similar images and actions that changed only over time. The buffers, combined with good Balancing, let us build a live-training system that feeds the model data as fast as possible, but in the balanced and shuffled fashion that matters for good model training and generalization.
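The buffering scheme described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `SampleBuffer`, `ready_to_train`, and `form_batch` are hypothetical, and dropping incoming samples while a buffer is full is an assumption, since the text does not say what happens in that case.

```python
import random


class SampleBuffer:
    """One buffer of samples; drawn samples are removed and never reused."""

    def __init__(self, capacity=25_000):
        self.capacity = capacity
        self.samples = []

    def add(self, sample):
        # Assumption: a full buffer simply rejects new samples.
        if len(self.samples) >= self.capacity:
            return False
        self.samples.append(sample)
        return True

    def is_full(self):
        return len(self.samples) >= self.capacity

    def draw(self, count, rng=random):
        """Remove and return `count` samples chosen uniformly at random."""
        if count > len(self.samples):
            raise ValueError("not enough samples in the buffer")
        drawn = []
        for _ in range(count):
            # Swap a random element to the end, then pop it (O(1) removal).
            i = rng.randrange(len(self.samples))
            self.samples[i], self.samples[-1] = self.samples[-1], self.samples[i]
            drawn.append(self.samples.pop())
        return drawn


def ready_to_train(buffers):
    # Training may start only once every buffer has been filled.
    return all(buffer.is_full() for buffer in buffers)


def form_batch(buffers, per_buffer, rng=random):
    """Draw the same number of random samples from each buffer."""
    batch = []
    for buffer in buffers:
        batch.extend(buffer.draw(per_buffer, rng))
    rng.shuffle(batch)  # mix samples coming from different buffers
    return batch
```

With three buffers and `per_buffer=2`, for example, each batch would hold six samples, and each buffer would shrink by two after every batch until the Data Collectors refill it.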



Storage

Filling the buffers takes time, and we sometimes need to restart the server or recover from other circumstances such as power loss or code issues. We therefore added storage to the system: it keeps a copy of the buffers on disk in the form of individual samples that are constantly rotated. This way we can quickly refill the Buffers without waiting for the Data Collectors to fill them, so training can start shortly after the Trainer is launched.
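One way such rotating on-disk storage could work is sketched below. Everything here is an assumption made for illustration: the class name `SampleStorage`, one pickle file per sample, numeric file names, and deleting the oldest files once a limit is exceeded. The actual file format and rotation policy are not described in this document.

```python
import pickle
from pathlib import Path


class SampleStorage:
    """Keeps up to `max_samples` samples on disk, one file per sample."""

    def __init__(self, root, max_samples=25_000):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.max_samples = max_samples
        # Continue numbering after a restart so files are never overwritten.
        existing = self._files()
        self.next_id = int(existing[-1].stem) + 1 if existing else 0

    def _files(self):
        # Oldest first, ordered by the numeric file name.
        return sorted(self.root.glob("*.pkl"), key=lambda p: int(p.stem))

    def save(self, sample):
        path = self.root / f"{self.next_id}.pkl"
        with open(path, "wb") as f:
            pickle.dump(sample, f)
        self.next_id += 1
        # Rotate: delete the oldest files beyond the limit.
        files = self._files()
        excess = max(len(files) - self.max_samples, 0)
        for old in files[:excess]:
            old.unlink()

    def load_all(self):
        """Read every stored sample back, oldest first, to refill a buffer."""
        samples = []
        for path in self._files():
            with open(path, "rb") as f:
                samples.append(pickle.load(f))
        return samples
```

After a restart, the Trainer could call `load_all()` for each buffer's directory and push the results into the in-memory buffers before accepting new data. Listing the directory on every `save` keeps the sketch short; a real system would more likely track the oldest file id in memory.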