A local implementation of the map/reduce strategy in PHP

cyberwolf/php-mapreduce

php-mapreduce

A PSR-4 compliant PHP library for running local, non-distributed map-reduce jobs.

Install

Via Composer

$ composer require jotaelesalinas/php-mapreduce

Usage

A very simple example: we have a CSV with the full order history of a hypothetical online shop and we want to know the average order value.

use JLSalinas\MapReduce\MapReduce;
use JLSalinas\RWGen\Readers\Csv;

$mapper = function($order) {
    return [
        'orders'  => 1,
        'revenue' => $order['total_amount']
    ];
};

$reducer = function ($carry, $item) {
    if ( is_null($carry) ) {
        $item['avg_order_value'] = $item['revenue'] / $item['orders'];
        return $item;
    }
    
    $orders          = $carry['orders'] + $item['orders'];
    $revenue         = $carry['revenue'] + $item['revenue'];
    $avg_order_value = $revenue / $orders;
    
    return compact('orders', 'revenue', 'avg_order_value');
};

$mapreducer = (new MapReduce(new Csv('/path/to/file.csv')))
                ->map($mapper)
                ->reduce($reducer)
                ->run();
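
To make the data flow concrete, the same computation can be sketched with PHP's built-in array_map and array_reduce on a small in-memory array. This is only an illustration of the map and reduce steps, not the library's implementation. The sample orders are made up, and $mapper and $reducer are copies of the closures above.

```php
<?php
// Hypothetical sample data standing in for the rows read from the CSV.
$orders = [
    ['total_amount' => 10.0],
    ['total_amount' => 20.0],
    ['total_amount' => 60.0],
];

// Map step: turn each order into a partial result.
$mapper = function ($order) {
    return [
        'orders'  => 1,
        'revenue' => $order['total_amount'],
    ];
};

// Reduce step: merge two partial results into one.
$reducer = function ($carry, $item) {
    if (is_null($carry)) {
        // First item: there is nothing to merge with yet.
        $item['avg_order_value'] = $item['revenue'] / $item['orders'];
        return $item;
    }

    $orders          = $carry['orders'] + $item['orders'];
    $revenue         = $carry['revenue'] + $item['revenue'];
    $avg_order_value = $revenue / $orders;

    return compact('orders', 'revenue', 'avg_order_value');
};

// array_reduce() starts with a null carry when no initial value is given,
// which is the case the reducer checks for with is_null().
$result = array_reduce(array_map($mapper, $orders), $reducer);

print_r($result);
```

With these three orders the result holds 3 orders, a revenue of 90 and an average order value of 30.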

The next example also reads the order history of an online shop from a CSV, but writes the output to another CSV. For each customer we want to know:

  • date of the last order
  • number of orders since the beginning
  • amount spent since the beginning
  • average order value since the beginning
  • number of orders in the last 12 months
  • amount spent in the last 12 months
  • average order value in the last 12 months

use JLSalinas\MapReduce\MapReduce;
// Both classes are named Csv, so they need aliases to be imported together.
use JLSalinas\RWGen\Readers\Csv as CsvReader;
use JLSalinas\RWGen\Writers\Csv as CsvWriter;

$mapper = function($order) {
    // True when the order was placed within the last 12 months.
    $in_last_12m = strtotime($order['date']) > strtotime('-12 months');

    return [
        'customer_id'      => $order['customer_id'],
        'date_last_order'  => $order['date'],
        'orders'           => 1,
        'orders_last_12m'  => $in_last_12m ? 1 : 0,
        'revenue'          => $order['total_amount'],
        'revenue_last_12m' => $in_last_12m ? $order['total_amount'] : 0
    ];
};

$reducer = function ($carry, $item) {
    if ( is_null($carry) ) {
        $item['avg_revenue'] = $item['revenue'] / $item['orders'];
        $item['avg_revenue_last_12m'] = $item['orders_last_12m'] ? $item['revenue_last_12m'] / $item['orders_last_12m'] : 0;
        return $item;
    }
    
    // Keep the customer id so every output row has the same columns.
    $customer_id          = $carry['customer_id'];
    $date_last_order      = max($carry['date_last_order'], $item['date_last_order']);
    $orders               = $carry['orders'] + $item['orders'];
    $orders_last_12m      = $carry['orders_last_12m'] + $item['orders_last_12m'];
    $revenue              = $carry['revenue'] + $item['revenue'];
    $revenue_last_12m     = $carry['revenue_last_12m'] + $item['revenue_last_12m'];
    $avg_revenue          = $revenue / $orders;
    $avg_revenue_last_12m = $orders_last_12m > 0 ? $revenue_last_12m / $orders_last_12m : 0;
    
    return compact('customer_id', 'date_last_order', 'orders', 'orders_last_12m', 'revenue', 'revenue_last_12m', 'avg_revenue', 'avg_revenue_last_12m');
};

$mapreducer = (new MapReduce(new CsvReader('/path/to/input_file.csv')))
                ->map($mapper)
                ->reduce($reducer, true)
                ->writeTo(new CsvWriter('/path/to/output_file.csv'))
                ->run();
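
This example aims at one result row per customer, which implies that each customer's orders are reduced separately. How the library does this internally is not described in this README. The sketch below shows one way such a grouped reduce could work in plain PHP, assuming the items are grouped by their customer_id value. The reduceByKey helper, the simplified $sum reducer and the sample data are hypothetical and not part of the library.

```php
<?php
// Hypothetical helper: reduce the items separately for each value of $key.
function reduceByKey(iterable $items, callable $reducer, string $key): array
{
    $groups = [];
    foreach ($items as $item) {
        $id = $item[$key];
        // The carry is null the first time a key is seen,
        // which is what the reducers above check for.
        $groups[$id] = $reducer($groups[$id] ?? null, $item);
    }
    return $groups;
}

// Made-up mapped items for two customers.
$mapped = [
    ['customer_id' => 'a', 'orders' => 1, 'revenue' => 10.0],
    ['customer_id' => 'b', 'orders' => 1, 'revenue' => 5.0],
    ['customer_id' => 'a', 'orders' => 1, 'revenue' => 30.0],
];

// Simplified reducer that only sums orders and revenue.
$sum = function ($carry, $item) {
    if (is_null($carry)) {
        return $item;
    }
    return [
        'customer_id' => $carry['customer_id'],
        'orders'      => $carry['orders'] + $item['orders'],
        'revenue'     => $carry['revenue'] + $item['revenue'],
    ];
};

$perCustomer = reduceByKey($mapped, $sum, 'customer_id');

print_r($perCustomer);
```

Each key of $perCustomer is a customer id and each value is the reduced row for that customer, so only one partial result per customer has to be kept in memory.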

You can find more elaborate examples in the docs folder.

Change log

Please see CHANGELOG for more information on what has changed recently.

Testing

$ composer test

Contributing

Please see CONTRIBUTING and CONDUCT for details.

Security

If you discover any security-related issues, please send a direct message to @jotaelesalinas instead of using the issue tracker.

To do

  • Test events in MapReduce
  • Add docs
  • Insurance example
    • adapt to new library
    • add insured values
    • improve kml output (info, markers)
  • (Enhancement) withBuffer(int $max_size) to allow mapping and reducing in batches
    • (Enhancement) Multithread (requires pthreads)
      • (Enhancement) Pipelining: map while reading, reduce while mapping
  • Mention that it is possible to work both with local and cloud data by implementing the right Reader/Writer, possibly using Flysystem by Frank de Jonge.
  • Move this to-do list to Issues
  • Create milestones in GitHub for: sequential (v1.0), buffered (v1.1), multithreaded (v1.2), pipelined (v1.3).

Credits

License

The MIT License (MIT). Please see License File for more information.
