Data transformers should be able to preprocess the data, for example by cleaning it. Some machine learning algorithms, such as PCA, need the data to be scaled before they are applied. See the transformer API defined in scikit-learn: http://scikit-learn.org/stable/data_transforms.html
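To make the request concrete, here is a minimal sketch of what such a transformer could look like. It only assumes the `fit` / `transform` / `fit_transform` convention used by scikit-learn; the class name `StandardScalerSketch` and its attributes are hypothetical, chosen for illustration, and it uses only the Python standard library rather than scikit-learn itself.

```python
from statistics import fmean, pstdev


class StandardScalerSketch:
    """Hypothetical transformer: centers each column and scales it to unit variance."""

    def fit(self, rows):
        # Learn the per-column mean and population standard deviation.
        columns = list(zip(*rows))
        self.means_ = [fmean(col) for col in columns]
        # Fall back to 1.0 for constant columns to avoid dividing by zero.
        self.scales_ = [pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows):
        # Apply the statistics learned in fit() to every row.
        return [
            [(value - m) / s for value, m, s in zip(row, self.means_, self.scales_)]
            for row in rows
        ]

    def fit_transform(self, rows):
        return self.fit(rows).transform(rows)


# Two features on very different scales, as PCA would see them before scaling.
data = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = StandardScalerSketch().fit_transform(data)
```

After scaling, both columns have zero mean and unit variance, so a variance-based method such as PCA would no longer be dominated by the second feature just because of its larger units.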