PSR-4 compliant PHP library for easily running local, non-distributed map-reduce jobs.
Via Composer
$ composer require jotaelesalinas/php-mapreduce
A very simple example where we have a CSV file with the full order history of a hypothetical online shop and we want to know the average order value.
use JLSalinas\MapReduce\MapReduce;
use JLSalinas\RWGen\Readers\Csv;
$mapper = function($order) {
    return [
        'orders' => 1,
        'revenue' => $order['total_amount']
    ];
};
$reducer = function ($carry, $item) {
    if ( is_null($carry) ) {
        $item['avg_order_value'] = $item['revenue'] / $item['orders'];
        return $item;
    }
    $orders = $carry['orders'] + $item['orders'];
    $revenue = $carry['revenue'] + $item['revenue'];
    $avg_order_value = $revenue / $orders;
    return compact('orders', 'revenue', 'avg_order_value');
};
$mapreducer = (new MapReduce(new Csv('/path/to/file.csv')))
    ->map($mapper)
    ->reduce($reducer)
    ->run();
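To make explicit what the mapper and reducer above compute, here is a minimal sketch in plain PHP that does not use the library at all. It applies the same two closures to a small in-memory array with array_map and array_reduce. The sample rows and the $sample_orders variable are made up for illustration, and how the library itself drives the closures (or what run() returns) may differ.

```php
<?php
// Hypothetical sample data standing in for the rows of the CSV file.
$sample_orders = [
    ['order_id' => 1, 'total_amount' => 10.0],
    ['order_id' => 2, 'total_amount' => 20.0],
    ['order_id' => 3, 'total_amount' => 60.0],
];

// Same mapper as above: each order becomes a partial aggregate.
$mapper = function ($order) {
    return [
        'orders'  => 1,
        'revenue' => $order['total_amount'],
    ];
};

// Same reducer as above: merges two partial aggregates into one.
$reducer = function ($carry, $item) {
    if (is_null($carry)) {
        // First item: there is nothing to merge with yet.
        $item['avg_order_value'] = $item['revenue'] / $item['orders'];
        return $item;
    }
    $orders = $carry['orders'] + $item['orders'];
    $revenue = $carry['revenue'] + $item['revenue'];
    $avg_order_value = $revenue / $orders;
    return compact('orders', 'revenue', 'avg_order_value');
};

// array_reduce() starts with a null carry when no initial value is given.
$result = array_reduce(array_map($mapper, $sample_orders), $reducer);
print_r($result);
```

For these three rows the result holds 3 orders, a revenue of 90 and an average order value of 30.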
The next example also reads the order history of an online shop from a CSV file, but writes its output to another CSV file. For each customer, we want to know:
- date of the last order
- number of orders since the beginning
- amount spent since the beginning
- average order value since the beginning
- number of orders in the last 12 months
- amount spent in the last 12 months
- average order value in the last 12 months
use JLSalinas\MapReduce\MapReduce;
// Both classes are named Csv, so they need aliases to be imported in the same file.
use JLSalinas\RWGen\Readers\Csv as CsvReader;
use JLSalinas\RWGen\Writers\Csv as CsvWriter;
$mapper = function($order) {
    return [
        'customer_id' => $order['customer_id'],
        'date_last_order' => $order['date'],
        'orders' => 1,
        'orders_last_12m' => strtotime($order['date']) > strtotime('-12 months') ? 1 : 0,
        'revenue' => $order['total_amount'],
        'revenue_last_12m' => strtotime($order['date']) > strtotime('-12 months') ? $order['total_amount'] : 0
    ];
};
$reducer = function ($carry, $item) {
    if ( is_null($carry) ) {
        $item['avg_revenue'] = $item['revenue'] / $item['orders'];
        $item['avg_revenue_last_12m'] = $item['orders_last_12m'] ? $item['revenue_last_12m'] / $item['orders_last_12m'] : 0;
        return $item;
    }
    // Comparing dates as strings assumes a sortable format such as YYYY-MM-DD.
    $date_last_order = max($carry['date_last_order'], $item['date_last_order']);
    $orders = $carry['orders'] + $item['orders'];
    $orders_last_12m = $carry['orders_last_12m'] + $item['orders_last_12m'];
    $revenue = $carry['revenue'] + $item['revenue'];
    $revenue_last_12m = $carry['revenue_last_12m'] + $item['revenue_last_12m'];
    $avg_revenue = $revenue / $orders;
    $avg_revenue_last_12m = $orders_last_12m > 0 ? $revenue_last_12m / $orders_last_12m : 0;
    return compact('date_last_order', 'orders', 'orders_last_12m', 'revenue', 'revenue_last_12m', 'avg_revenue', 'avg_revenue_last_12m');
};
$mapreducer = (new MapReduce(new CsvReader('/path/to/input_file.csv')))
    ->map($mapper)
    ->reduce($reducer, true)
    ->writeTo(new CsvWriter('/path/to/output_file.csv'))
    ->run();
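One way to picture a per-customer aggregation is to keep a separate carry for each customer_id. The plain-PHP sketch below does exactly that with a hypothetical $reduce_by_key helper, a simplified reducer (orders and revenue only) and made-up mapped items. It is an illustration of the idea, not the library's implementation: this section does not document how the library groups items or what the second argument of reduce() means.

```php
<?php
// Hypothetical helper (not part of the library): reduces mapped items
// separately for each distinct value found under $key.
$reduce_by_key = function (iterable $items, callable $reducer, string $key): array {
    $groups = [];
    foreach ($items as $item) {
        $id = $item[$key];
        // The carry is null the first time a given key is seen.
        $carry = $groups[$id] ?? null;
        $groups[$id] = $reducer($carry, $item);
    }
    return $groups;
};

// Simplified reducer for the sketch: totals and average only.
$reducer = function ($carry, $item) {
    $orders = ($carry['orders'] ?? 0) + $item['orders'];
    $revenue = ($carry['revenue'] ?? 0) + $item['revenue'];
    return [
        'orders' => $orders,
        'revenue' => $revenue,
        'avg_revenue' => $revenue / $orders,
    ];
};

// Made-up mapper output for two customers.
$mapped = [
    ['customer_id' => 'A', 'orders' => 1, 'revenue' => 10.0],
    ['customer_id' => 'B', 'orders' => 1, 'revenue' => 5.0],
    ['customer_id' => 'A', 'orders' => 1, 'revenue' => 30.0],
];

$per_customer = $reduce_by_key($mapped, $reducer, 'customer_id');
print_r($per_customer);
```

With this input, customer A ends up with 2 orders and an average of 20, and customer B with 1 order and an average of 5.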
You can find more elaborate examples in the docs folder.
Please see CHANGELOG for more information on what has changed recently.
$ composer test
Please see CONTRIBUTING and CONDUCT for details.
If you discover any security-related issues, please DM me at @jotaelesalinas instead of using the issue tracker.
- Test events in MapReduce
- Add docs
- Insurance example
  - adapt to new library
  - add insured values
  - improve kml output (info, markers)
- (Enhancement) withBuffer(int $max_size) to allow mapping and reducing in batches
- (Enhancement) Pipelining: map while reading, reduce while mapping
- (Enhancement) Multithread (requires pthreads)
- Mention that it is possible to work both with local and cloud data by implementing the right Reader/Writer, possibly using Flysystem by Frank de Jonge.
- Move this to-do list to Issues
- Create milestones in GitHub for: sequential (v1.0), buffered (v1.1), multithreaded (v1.2), pipelined (v1.3).
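The buffered mode mentioned in the list above is not implemented yet. As a rough sketch of the idea only, the plain-PHP snippet below collects input items into a buffer of at most $max_size elements, maps and reduces each full buffer, and carries the partial result over to the next batch. The $map_reduce_in_batches closure and the sample generator are hypothetical names for illustration and say nothing about how withBuffer() will eventually be designed.

```php
<?php
// Hypothetical helper (not part of the library): maps and reduces the
// input in batches of at most $max_size items.
$map_reduce_in_batches = function (iterable $input, callable $mapper, callable $reducer, int $max_size) {
    $carry = null;
    $buffer = [];

    // Maps the buffered items, folds them into the running carry,
    // then empties the buffer.
    $flush = function () use (&$buffer, &$carry, $mapper, $reducer) {
        $carry = array_reduce(array_map($mapper, $buffer), $reducer, $carry);
        $buffer = [];
    };

    foreach ($input as $item) {
        $buffer[] = $item;
        if (count($buffer) >= $max_size) {
            $flush();
        }
    }
    if ($buffer) {
        $flush(); // process the last, possibly incomplete, batch
    }
    return $carry;
};

// Sample input: a generator yielding 1..10, so items are read lazily.
$numbers = (function () {
    for ($i = 1; $i <= 10; $i++) {
        yield $i;
    }
})();

$double = function ($n) { return $n * 2; };
$sum = function ($carry, $item) { return ($carry ?? 0) + $item; };

$total = $map_reduce_in_batches($numbers, $double, $sum, 3);
echo $total, PHP_EOL; // 110
```

Because only one batch is held in memory at a time, this approach would fit readers that stream their input, at the cost of requiring a reducer that can continue from a previous carry.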
The MIT License (MIT). Please see License File for more information.