Handy Spark Tools Accessible From Spark SQL
Spark and PySpark packages that extend Apache Spark with additional SQL functions, implemented as user-defined functions (UDFs). Available for both Scala and Python.
For Data Scientists and Engineers using Spark
"As a data scientist, I can get more done by using SQL functions"
"As a data engineer, I can get real-time aggregations via SQL functions"
|Function||Description||Example||Status|
|last_k||Returns the last k occurrences||SELECT user_id, last_k(page_id, timestamp, 100, "unique") FROM dataset||In Development|
|approx_topk||Returns the most frequent items using a fast approximation algorithm with limited memory||SELECT approx_topk(ip_address, 1000, "10MB") FROM dataset||Available|
|approx_cond_topk||Returns the most frequent items conditioned on another item, using a fast approximation algorithm with limited memory||NA||In Development|
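To give a feel for what "most frequent items with limited memory" can mean, here is a minimal pure-Python sketch of the Misra-Gries frequent-items summary. This is an illustration only: this README does not say which algorithm approx_topk uses, and the function name misra_gries_summary, its parameters, and the sample IP addresses are all hypothetical.

```python
def misra_gries_summary(stream, num_counters):
    """Approximate frequent-items summary using at most `num_counters` counters.

    Hypothetical helper for illustration; not part of mlstream-spark-udfs.
    Each reported count underestimates the true count by at most
    n / (num_counters + 1), where n is the length of the stream.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: bump its counter.
            counters[item] += 1
        elif len(counters) < num_counters:
            # Free slot available: start tracking the item.
            counters[item] = 1
        else:
            # No free slot: decrement every counter and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    # Most frequent (by approximate count) first.
    return sorted(counters.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample data: two heavy hitters plus 20 addresses seen once each.
ips = ["10.0.0.1"] * 50 + ["10.0.0.2"] * 30 + [f"10.0.1.{i}" for i in range(20)]
summary = misra_gries_summary(ips, num_counters=3)
print(summary)
```

Memory use is bounded by the number of counters rather than by the number of distinct items, which is the trade-off the "limited memory" wording in the table refers to. The counts are lower bounds, so results should be read as approximate rankings rather than exact frequencies.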
Start with a Prepared Docker Image with PySpark and Jupyter
Browse the Jupyter notebooks.
Clone the repository
git clone email@example.com:MLStream/mlstream-spark-udfs.git
- Run the demo (Linux and Mac only)
The following command starts a demo Jupyter server which is ready to use with local files.
Install via Python pip or as a Spark Package
The project mlstream-spark-udfs is distributed in the hope that it will be useful and help you solve pressing problems. At the same time, it is still early days for mlstream-spark-udfs. It may contain many bugs, known or unknown; it may crash, force your computer to run out of memory, or produce erroneous results. Please carry out due diligence before using and deploying it in your organization. The developers of mlstream-spark-udfs, be they organizations or people, should not be held liable for any damages which result from running the code. The code is distributed under the Apache License, which should be consulted for warranties and liabilities. This disclaimer does not replace the license.
The code is distributed under the Apache License. Please check the source files in the repositories for the third-party libraries used. The project also uses source code derived from the GoLang sort implementation. Please consult the GoLang LICENSE.
Looking for Help with Spark and Spark Extensions for Your Organization
Code of Conduct
Please file an issue or contact firstname.lastname@example.org.