christopherkindl/Machine-Learning-with-Apache-Spark-Quick-Start-Guide

Machine-Learning-with-Apache-Spark-Quick-Start-Guide

Machine Learning with Apache Spark Quick Start Guide, published by Packt


This is the code repository for Machine Learning with Apache Spark Quick Start Guide, published by Packt.

Uncover patterns, derive actionable insights, and learn from big data using MLlib

What is this book about?

Every person and every organisation in the world manages data, whether they realise it or not. Data is used to describe the world around us and can be used for almost any purpose, from analysing consumer habits to fighting disease and serious organised crime. Ultimately, we manage data in order to derive value from it, and many organisations around the world have traditionally invested in technology to help them process their data faster and more efficiently.

This book covers the following exciting features:

  • Understand how Spark fits into the Big Data ecosystem
  • Learn to deploy and configure a local development environment using Apache Spark
  • Learn to design supervised and unsupervised learning models
  • Build models for NLP, deep learning, and cognitive services using Spark ML libraries
  • Design real-time machine learning pipelines in Apache Spark

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigation

All of the code is organized into folders, for example Chapter02.

The code will look like the following:

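```python
# Locate the local Spark installation (findspark uses SPARK_HOME) and add PySpark to sys.path
import findspark
findspark.init()

from pyspark import SparkContext, SparkConf
import random
```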
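Imports like these are commonly followed by creating a `SparkContext` and running a small job. The sketch below, in the style of the Monte Carlo π example that ships with Spark, shows one way that could look. It assumes Spark is installed locally with `SPARK_HOME` set and that the `findspark` package is available; the function names `is_inside_unit_circle` and `estimate_pi`, the app name, and the `local[*]` master are illustrative choices, not the book's Chapter02 code. The PySpark imports are deferred into `estimate_pi` so the sampling helper can be exercised without a Spark installation.

```python
import random

def is_inside_unit_circle(_):
    """Draw a random point in the unit square and report whether it lies
    inside the quarter circle of radius 1 (the argument is ignored)."""
    x, y = random.random(), random.random()
    return x * x + y * y < 1.0

def estimate_pi(num_samples=1_000_000, master="local[*]"):
    """Estimate pi with Spark by Monte Carlo sampling (illustrative sketch)."""
    # Deferred imports: the sampling helper above stays usable without Spark.
    import findspark
    findspark.init()  # locate Spark via SPARK_HOME and put pyspark on sys.path
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("estimate-pi").setMaster(master)
    sc = SparkContext(conf=conf)
    try:
        # Each RDD element triggers one independent sample; count the hits.
        hits = sc.parallelize(range(num_samples)).filter(is_inside_unit_circle).count()
    finally:
        sc.stop()  # release the local Spark resources
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * hits / num_samples
```

Calling `estimate_pi()` on a working local installation should return a value close to 3.14; increasing `num_samples` tightens the estimate at the cost of a longer job.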

Following is what you need for this book: this book is aimed at business analysts, data analysts, and data scientists who want a hands-on start with modern Big Data technologies combined with advanced analytics.

With the following software and hardware list, you can run all of the code files in the book (Chapters 1-8).

Software and Hardware List

| Chapter | Software required | OS required |
| ------- | ----------------- | ----------- |
| 1-8 | Java SE Development Kit (JDK) 8 (8u*) | CentOS Linux 7 |
| | Anaconda 5.2 (with Python 3.6) | |
| | Apache Spark 2.3.2 | |
| | Apache Kafka 2.0.0 | |
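To see whether a machine matches the Software and Hardware List, a small script can compare the running interpreter and the importable PySpark version against the listed ones. This is a minimal sketch: the helper names `installed_pyspark_version`, `find_mismatches`, and `check_current_environment` are hypothetical, and it only checks Python and PySpark, not the JDK, Anaconda, or Kafka.

```python
import sys

# Versions listed in the Software and Hardware List (Chapters 1-8)
EXPECTED_PYTHON = (3, 6)
EXPECTED_PYSPARK = "2.3.2"

def installed_pyspark_version():
    """Return the importable PySpark version string, or None if unavailable."""
    try:
        import pyspark
    except ImportError:
        return None
    return getattr(pyspark, "__version__", None)

def find_mismatches(python_version, pyspark_version):
    """Compare versions against the list; return human-readable differences."""
    problems = []
    found_python = tuple(python_version[:2])
    if found_python != EXPECTED_PYTHON:
        problems.append("Python %d.%d found, list specifies %d.%d"
                        % (found_python + EXPECTED_PYTHON))
    if pyspark_version is None:
        problems.append("PySpark not found or version unknown "
                        "(is SPARK_HOME set and findspark.init() called?)")
    elif pyspark_version != EXPECTED_PYSPARK:
        problems.append("PySpark %s found, list specifies %s"
                        % (pyspark_version, EXPECTED_PYSPARK))
    return problems

def check_current_environment():
    """Run the comparison for the interpreter executing this script."""
    return find_mismatches(sys.version_info, installed_pyspark_version())
```

On a machine that matches the list, `check_current_environment()` returns an empty list. A newer Python or Spark shows up as a difference, which does not necessarily mean the book's code will fail.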


Get to Know the Author

Jillur Quddus is a lead technical architect, polyglot software engineer, and data scientist with over 10 years of hands-on experience in architecting and engineering distributed, scalable, high-performance, and secure solutions used to combat serious organized crime, cybercrime, and fraud. Jillur has extensive experience working within central government, intelligence, law enforcement, and banking, and has worked around the world, including in Japan, Singapore, Malaysia, Hong Kong, and New Zealand. Jillur is both the founder of Keisan, a UK-based company specializing in open source distributed technologies and machine learning, and the lead technical architect at Methods, the leading digital transformation partner for the UK public sector.


