This repository contains materials that introduce PyDAAL, the Python API of the Intel Data Analytics Acceleration Library (Intel DAAL), and help Python and machine learning practitioners get started with PyDAAL concepts.
Additionally, helper functions and classes are provided to simplify frequently performed PyDAAL operations.
Volumes 1, 2, and 3 of the PyDAAL Gentle Introduction series are available as Jupyter Notebooks. These volumes are designed to provide a quick introduction to the essential features of PyDAAL. The notebooks offer a collection of code examples that can be executed in the interactive command shell, along with helper functions that automate common PyDAAL functionality.
Install the Intel Distribution for Python (IDP) through conda. IDP includes a large set of commonly used mathematical and statistical Python packages that are optimized for Intel architectures.
- Install the latest version of Anaconda.
- Choose the Python 3.5 version.
- From the shell prompt (on Windows, use Anaconda Prompt), execute these commands:
conda create --name idp intelpython3_full python=3 -c intel
source activate idp (on Linux and OS X)
activate idp (on Windows)
After these steps, the IDP environment is installed with the necessary packages and activated, ready to run these notebooks.
More detailed instructions can be found in this online article.
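As a quick sanity check that PyDAAL is available in the activated environment, the snippet below (a minimal sketch; the sample array is illustrative only) imports the core data management module and builds a small numeric table from a NumPy array:

```python
import numpy as np
from daal.data_management import HomogenNumericTable

# Build a 2x3 numeric table from a C-contiguous float64 NumPy array
array = np.ascontiguousarray([[1.0, 2.0, 3.0],
                              [4.0, 5.0, 6.0]])
table = HomogenNumericTable(array)

# If the import and the construction succeed, PyDAAL is ready to use
print(table.getNumberOfRows(), table.getNumberOfColumns())
```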
The various stages of the machine learning model-building process are bundled together into a single helper function class per algorithm. These classes are constructed with PyDAAL's data management and algorithm libraries to cover complete model deployment (a minimal end-to-end sketch using raw PyDAAL calls is shown after the list below):
- Training
- Prediction
- Model Evaluation and Quality Metrics
- Trained Model Storage and Portability
More details on all these stages are available in Volume 3.
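For orientation, here is a minimal sketch of what these stages look like when written directly against the PyDAAL batch API, using linear regression as an example. The helper classes wrap calls of this kind; the variable names and random data below are illustrative only, and the model evaluation stage is omitted for brevity:

```python
import numpy as np
from daal.data_management import HomogenNumericTable, InputDataArchive
from daal.algorithms.linear_regression import training, prediction

# Illustrative training data: 100 observations, 3 features, 1 response
X = np.ascontiguousarray(np.random.rand(100, 3))
y = np.ascontiguousarray(np.random.rand(100, 1))
xTable, yTable = HomogenNumericTable(X), HomogenNumericTable(y)

# Training
trainAlg = training.Batch()
trainAlg.input.set(training.data, xTable)
trainAlg.input.set(training.dependentVariables, yTable)
model = trainAlg.compute().get(training.model)

# Prediction
predAlg = prediction.Batch()
predAlg.input.setTable(prediction.data, xTable)
predAlg.input.setModel(prediction.model, model)
predictions = predAlg.compute().get(prediction.prediction)

# Trained model storage: serialize the model into a byte buffer for saving to disk
archive = InputDataArchive()
model.serialize(archive)
modelBuffer = np.zeros(archive.getSizeOfArchive(), dtype=np.ubyte)
archive.copyArchiveToArray(modelBuffer)
```

Helper function classes are available for the following algorithms: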
- Linear Regression
- Ridge Regression
- SVM - Binary and Multi-Class classifier
- Decision Forest (classification and regression)
- Kmeans
- PCA
Usage examples with sample datasets are also provided for practice; they demonstrate how to use these helper function classes.
PyDAAL APIs have been used to build custom Python modules that support common operations on DAAL's Data Management library.
Import the customUtils module to explore the basic utilities provided for data retrieval and manipulation on DAAL's Data Management library (a sketch of the underlying raw PyDAAL calls follows the list):
- getArrayFromNT(): Extracts a NumPy array from a numeric table
- getBlockOfNumericTable(): Slices a block of a numeric table over a specific range of rows and columns
- getBlockOfCols(): Extracts a block of a numeric table over a specific range of columns
- getNumericTableFromCSV(): Reads a CSV file into a numeric table
- serialize(): Serializes any input data object and saves it into a local variable or to disk
- deserialize(): Deserializes serialized data from a local variable or from disk
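For reference, the sketch below shows the kind of raw PyDAAL data management calls these utilities wrap: extracting a NumPy array from a numeric table and an in-memory serialization round trip (the sample array is illustrative only):

```python
import numpy as np
from daal.data_management import (
    HomogenNumericTable, BlockDescriptor, readOnly,
    InputDataArchive, OutputDataArchive
)

# Illustrative numeric table built from a 4x3 NumPy array
data = np.ascontiguousarray(np.arange(12, dtype=np.float64).reshape(4, 3))
table = HomogenNumericTable(data)

# Extract a NumPy array from the numeric table (what getArrayFromNT() wraps)
block = BlockDescriptor()
table.getBlockOfRows(0, table.getNumberOfRows(), readOnly, block)
array = block.getArray()
table.releaseBlockOfRows(block)

# Serialize the numeric table into a byte buffer (what serialize() wraps)
inArchive = InputDataArchive()
table.serialize(inArchive)
buffer = np.zeros(inArchive.getSizeOfArchive(), dtype=np.ubyte)
inArchive.copyArchiveToArray(buffer)

# Rebuild the numeric table from the buffer (what deserialize() wraps)
outArchive = OutputDataArchive(buffer)
restored = HomogenNumericTable()
restored.deserialize(outArchive)
```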
These tutorials are spread across a collection of Jupyter notebooks that combine a theoretical explanation of each algorithm with interactive command-shell examples executed through the PyDAAL API.
Data files used in the tutorials are located in the mldata folder; they were downloaded from the UCI Machine Learning Repository.