Make sure that when each dataset is built, it is aware of all drugs from every dataset built so far, not just those from the immediately preceding dataset in the build. I think this is possible by passing multiple drugs.tsv files into the build_drugs.sh script.
This could also fix #323, although the open PR #420 resolves that issue as well.
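
A rough sketch of the idea, in case it helps the discussion: collect the drug IDs from every previously built drugs.tsv, so that a later build step can check against the full set rather than only the last file. This is not how build_drugs.sh currently works. The function name `collect_known_drug_ids` and the `drug_id` column name are assumptions made up for illustration, and the real column header may differ.

```python
import csv
from typing import Iterable, Set


def collect_known_drug_ids(tsv_paths: Iterable[str], id_column: str = "drug_id") -> Set[str]:
    """Return the union of drug IDs found in all given drugs.tsv files.

    `id_column` is a hypothetical header name; adjust it to the real one.
    """
    known: Set[str] = set()
    for path in tsv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            for row in reader:
                drug_id = row.get(id_column)
                if drug_id:  # skip rows with a missing or empty ID
                    known.add(drug_id)
    return known
```

Each dataset build could then be handed the drugs.tsv files of all datasets built before it, instead of only the most recent one.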