Machine Learning Safari is an esoteric Python package comprised of from-scratch implementations of popular machine learning algorithms.
The goal of the package is to provide efficient and easy-to-understand implementations of popular machine learning algorithms, whilst maintaining a simple package structure. This makes it easier for beginners to understand the purpose of each element of the package than in, say, scikit-learn, which can be harder to navigate.
The focus of this package is largely on the following types of machine learning algorithms:
- Regression and Classification
- Dimensionality Reduction
- Clustering
We hope that this package can be used both for the practical application of machine learning and, for educational purposes, as a demonstration of how such methods are implemented.
The package is composed of nested (sub-)modules, each corresponding to a particular machine learning algorithm implemented as a class. An instance of such a class is called a model and (loosely inspired by scikit-learn) has `fit` and `apply` methods. On top of this, each class has an `inspect` method, which can be used to display information about the model's internal state. Note that we use the `apply` method in both supervised and unsupervised settings, rather than including separate `predict`/`transform` methods.
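As a rough illustration of this interface, here is a minimal sketch of an unsupervised model exposing `fit`, `apply`, and `inspect`. The class `CenteringModel` is hypothetical and is not part of the package; the `fit(X, y=None)` signature, returning `self` from `fit`, and returning a dictionary from `inspect` are all assumptions made for the example.

```python
import numpy as np


class CenteringModel:
    """Hypothetical model that subtracts column means (illustration only)."""

    def __init__(self):
        self.means_ = None  # learned in fit

    def fit(self, X, y=None):
        # Learn one mean per feature column; y is unused in this unsupervised case.
        self.means_ = np.asarray(X, dtype=float).mean(axis=0)
        return self  # assumption: returning self allows call chaining

    def apply(self, X):
        # One method name, whether the model predicts or transforms.
        if self.means_ is None:
            raise RuntimeError("call fit before apply")
        return np.asarray(X, dtype=float) - self.means_

    def inspect(self):
        # Assumption: summarise internal state as a plain dictionary.
        return {"fitted": self.means_ is not None, "means": self.means_}


X = np.array([[1.0, 2.0], [3.0, 6.0]])
model = CenteringModel().fit(X)
centered = model.apply(X)  # each column now has mean zero
state = model.inspect()
```

Here `apply` plays the role that `transform` would play in scikit-learn; a supervised model would use the same method name to return predictions.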
For supervised models, rather than having separate classifiers and regressors, we bundle the functionality into one class and provide an `objective` parameter, which can be either `regression` or `classification`.
For example, we can fit and apply the null model for classification as follows.
import numpy as np
import mlsafari as mls
X_train = np.empty((4, 2))
y_train = np.array([1, 2, 3, 3])
X_test = np.empty((3, 2))
mod = mls.NullModel(objective='classification')
mod.fit(X_train, y_train)
mod.apply(X_test)
#> array([3, 3, 3])
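To show how a single class can serve both objectives, here is a minimal from-scratch sketch of a null model. It is not the package's actual implementation: the class name `NullModelSketch` is hypothetical, and it assumes the null model predicts the most frequent label for classification and the mean target for regression, which is consistent with the output in the example above.

```python
import numpy as np


class NullModelSketch:
    """Hypothetical constant-prediction model (illustration only)."""

    def __init__(self, objective="regression"):
        if objective not in ("regression", "classification"):
            raise ValueError("objective must be 'regression' or 'classification'")
        self.objective = objective
        self.constant_ = None  # learned in fit

    def fit(self, X, y):
        # The features are ignored; only the targets determine the prediction.
        y = np.asarray(y)
        if self.objective == "classification":
            # Most frequent label; ties resolve to the smallest label.
            values, counts = np.unique(y, return_counts=True)
            self.constant_ = values[np.argmax(counts)]
        else:
            self.constant_ = y.mean()
        return self

    def apply(self, X):
        if self.constant_ is None:
            raise RuntimeError("call fit before apply")
        # Repeat the learned constant once per row of X.
        return np.full(len(X), self.constant_)


y_train = np.array([1, 2, 3, 3])
X_test = np.empty((3, 2))

clf = NullModelSketch(objective="classification").fit(None, y_train)
clf_pred = clf.apply(X_test)

reg = NullModelSketch(objective="regression").fit(None, y_train)
reg_pred = reg.apply(X_test)
```

Switching `objective` changes only the statistic learned in `fit`; `apply` is identical in both cases, which is the design choice the single-class approach relies on.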
The package is developed using the methods discussed in Hypermodern Python. Most notably, this includes using Poetry for packaging and dependency management.
Contributions to the package are welcome. Before writing code, we suggest opening an issue detailing the algorithm you wish to implement or selecting an already open issue.
Please ensure that all contributions are documented, are fully covered by unit tests, and follow the Black code style.