scalding-io/ProgrammingWithScalding: Programming MapReduce with Scalding

Source code for the Packt book 'Programming MapReduce with Scalding'.

Find more information at http://scalding.io/

The book consists of nine chapters:

1. Introduction to MapReduce - An introduction to Hadoop, MapReduce, pipelining, Cascading, Pig, and Hive. The chapter presents the benefits (concepts and capabilities) of higher-level abstractions over MapReduce.
2. Get Ready for Scalding - Theory behind Scalding, the Scala domain-specific language built on Cascading. Setting up the development environment, including a local Hadoop cluster for development, and executing a first Hello World Scalding example.
3. Scalding by Example - The core capabilities of Scalding: (i) map-like functions, (ii) grouping/reducing functions, and (iii) join operations.
4. Intermediate Examples - A Scalding log-processing flow for a news company that aggregates multiple data sources. An example with multiple pipelines introduces some more advanced concepts.
5. Scalding Design Patterns - Design patterns applicable to Scalding data-processing applications. The 'External Operations' pattern enables unit testing and a modular application structure.
6. Testing & TDD - Best practices for first defining behaviour (Behaviour-Driven Development), then writing tests (Test-Driven Development), and only then completing the implementation. How to write unit and integration tests, and how to apply black-box testing methodologies in the context of Big Data.
7. Running Scalding in Production - Tips and tricks for executing and scheduling jobs, and for coordinating the execution of Scalding, Scala, Java, and even external system processes. Finally, how to configure Scalding jobs using property files or Hadoop parameters, how to monitor and optimize jobs, and other useful tips.
8. Using External Data Stores - Interacting with external SQL, NoSQL, and in-memory data stores such as HBase, SQL databases, and Elasticsearch.
9. Matrix Calculations and Machine Learning - Using the Matrix API and Algebird to calculate text similarity (TF-IDF) and set similarity (Jaccard), followed by an example of Mahout K-Means clustering and outlier detection.

Repository contents (one folder per chapter):

chapter1
chapter2
chapter3
chapter4
chapter5
chapter6
chapter7
chapter8
chapter9
.gitignore
LICENSE
README.md
pom.xml
