Mirror of Apache Mnemonic (Incubating)

Apache Mnemonic is an advanced library oriented toward hybrid memory storage. It provides a non-volatile (durable) Java object model and a durable computing service that can significantly improve the performance of massive real-time data processing and analytics. Developers can use this library to design cache-less and SerDe-less high-performance applications.

Features:

  • In-place data storage on local non-volatile memory
  • Durable Object Model (DOM)
  • Durable Native Computing Model (DNCM)
  • Lazy loading and sharing of object graphs
  • Automatic reclamation of memory resources and Mnemonic objects
  • Hierarchical cache pool for massive data caching
  • Extensible memory services for new device adoption and allocation optimization
  • Durable data structure collection (WIP)
  • Durable computing service
  • Minimized on-heap memory footprint
  • Reduced GC overhead, as shown in the following chart (collected from Apache Spark experiments)
  • Drop-in Hadoop MapReduce support

[Figure: Mnemonic GC statistics]

Mnemonic Way

[Figure: the Mnemonic Way]

[Figure: Mnemonic modes]

How to use it?

Define a non-volatile class:

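import org.apache.mnemonic.*; // Mnemonic core types (Durable, DurableEntity, DurableGetter, DurableSetter, ...)

/**
 * a durable class must be abstract, implement the Durable interface, and be marked with the @DurableEntity annotation
 */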
@DurableEntity
public abstract class Person<E> implements Durable, Comparable<Person<E>> {
        E element; // Generic Type

        /**
         * callback invoked after this durable object is created
         */
        @Override
        public void initializeAfterCreate() { 
                System.out.println("Initializing After Created");
        }

        /**
         * callback invoked after this durable object is restored
         */
        @Override
        public void initializeAfterRestore() { 
                System.out.println("Initializing After Restored");
        }

        /**
         * set up generic type info manually to avoid a performance penalty
         */
        @Override
        public void setupGenericInfo(EntityFactoryProxy[] efproxies, GenericField.GType[] gftypes) {

        }

        /**
         * print a one-line summary of this person
         */
        public void testOutput() throws RetrieveDurableEntityError {
                System.out.printf("Person %s, Age: %d ( %s )%n", getName(), getAge(),
                                null == getMother() ? "No Recorded Mother" : "Has Recorded Mother");
        }

        @Override
        public int compareTo(Person<E> anotherPerson) {
                // order by age first, then by name on ties
                int ret = getAge().compareTo(anotherPerson.getAge());
                if (0 == ret) ret = getName().compareTo(anotherPerson.getName());
                return ret;
        }

        /**
         * Getters and Setters for non-volatile fields marked with @DurableGetter and @DurableSetter
         */
        @DurableGetter(Id = 1L)
        abstract public Short getAge();
        @DurableSetter
        abstract public void setAge(Short age);

        @DurableGetter(Id = 2L)
        abstract public String getName() throws RetrieveDurableEntityError;
        @DurableSetter
        abstract public void setName(String name, boolean destroy) throws OutOfPersistentMemory, RetrieveDurableEntityError;

        @DurableGetter(Id = 3L)
        abstract public Person<E> getMother() throws RetrieveDurableEntityError;
        @DurableSetter
        abstract public void setMother(Person<E> mother, boolean destroy) throws RetrieveDurableEntityError;

        @DurableGetter(Id = 4L)
        abstract public Person<E> getFather() throws RetrieveDurableEntityError;
        @DurableSetter
        abstract public void setFather(Person<E> father, boolean destroy) throws RetrieveDurableEntityError;
}
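The `compareTo` above orders persons by age and breaks ties by name. As an illustration only, the same ordering can be sketched with `java.util.Comparator` on an ordinary on-heap class (Java 8 or later). `PlainPerson` and its `sorted` helper are hypothetical stand-ins, not Mnemonic durable types, so no durable getters or allocator are involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical on-heap stand-in for Person, used only to illustrate the ordering;
// it is not a Mnemonic durable entity and has no durable getters or setters.
final class PlainPerson {
    final short age;
    final String name;

    PlainPerson(short age, String name) {
        this.age = age;
        this.name = name;
    }

    // Same ordering as Person.compareTo: by age first, then by name on ties.
    static final Comparator<PlainPerson> ORDER =
            Comparator.comparingInt((PlainPerson p) -> p.age)
                      .thenComparing(p -> p.name);

    // Return a sorted copy, leaving the input list untouched.
    static List<PlainPerson> sorted(List<PlainPerson> people) {
        List<PlainPerson> copy = new ArrayList<>(people);
        copy.sort(ORDER);
        return copy;
    }
}
```

Expressing the tie-break with `thenComparing` keeps the ordering rule in one declarative chain, which is easy to extend with further keys.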

Use a non-volatile class:

Set up an allocator for non-volatile object graphs.
        // create an allocator instance
        NonVolatileMemAllocator act = new NonVolatileMemAllocator(1024 * 1024 * 8, "./pobj_person.dat", true);

        // fetch the handler store capacity of the non-volatile storage managed by this allocator
        KEYCAPACITY = act.handlerCapacity();
        ....
        // close it after use
        act.close();
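The allocator's handler store behaves like a small fixed-capacity key-value table of root handlers: `setHandler` records a handler under a key, `getHandler` reads it back, `handlerCapacity` reports the number of keys, and the iteration loop in this README treats `0L` as an empty slot. The sketch below is a hypothetical on-heap model of that contract only (`HandlerStoreModel` is not a Mnemonic class, and the real store is persisted together with the data).

```java
// Hypothetical on-heap model of the allocator's fixed-capacity handler store.
// It mimics only the lookup contract used in this README: a handler of 0L marks an empty slot.
final class HandlerStoreModel {
    private final long[] slots;

    HandlerStoreModel(int capacity) {
        slots = new long[capacity]; // Java zero-initializes arrays, so every slot starts empty
    }

    long handlerCapacity() {
        return slots.length;
    }

    void setHandler(long key, long handler) {
        slots[Math.toIntExact(key)] = handler;
    }

    long getHandler(long key) {
        return slots[Math.toIntExact(key)];
    }

    // Count the roots stored contiguously from key 0, stopping at the first empty (0L) slot,
    // the same way the iteration loop stops when getHandler returns 0L.
    int countLeadingRoots() {
        int n = 0;
        while (n < slots.length && slots[n] != 0L) {
            ++n;
        }
        return n;
    }
}
```

Because the loop stops at the first `0L`, roots are expected to be stored under consecutive keys starting at 0.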
Generate structured non-volatile objects.
        // create a new non-volatile person object from this specific allocator
        person = PersonFactory.create(act);

        // set attributes
        person.setAge((short)rand.nextInt(50));
        person.setName(String.format("Name: [%s]", UUID.randomUUID().toString()), true);

        // keep this person's handler in the non-volatile handler store under key keyidx
        act.setHandler(keyidx, person.getHandler());

        // draw the chain depth once so the loop bound does not change on every iteration
        int depth = rand.nextInt(100);
        for (int deep = 0; deep < depth; ++deep) {

                // create another person as mother
                Person<Integer> mother = PersonFactory.create(act);
                mother.setAge((short)(50 + rand.nextInt(50)));
                mother.setName(String.format("Name: [%s]", UUID.randomUUID().toString()), true);

                // set the person's mother
                person.setMother(mother, true);

                // continue the chain from the mother
                person = mother;
        }
Use the non-volatile objects.
        for (long i = 0; i < KEYCAPACITY; ++i) {

                System.out.printf("----------Key %d--------------%n", i);
                // iterate non-volatile handlers from the handler store of this specific allocator
                long val = act.getHandler(i);
                if (0L == val) {
                        // a 0L handler marks an empty slot; stop here
                        break;
                }

                // restore person objects from this specific allocator
                Person<Integer> person = PersonFactory.restore(act, val, true);

                while (null != person) {
                        person.testOutput();
                        // move up to the mother to visit all maternal ancestors
                        person = person.getMother();
                }
        }
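Together, the generation and restore loops build and then walk a chain of mother references that ends at `null`. As a rough on-heap analog (hypothetical `AncestorChain` and `Node` types, no Mnemonic calls and no persistence), the same shape can be modeled with plain Java references, drawing the random depth once before the loop.

```java
import java.util.Random;

// Hypothetical on-heap analog of the Person chain; it models only the mother links,
// not Mnemonic's durable storage, handlers, or restore step.
final class AncestorChain {
    static final class Node {
        final short age;
        Node mother; // null ends the chain, like getMother() returning null

        Node(short age) {
            this.age = age;
        }
    }

    // Build a chain of 1 + depth nodes. The depth is passed in (drawn once by the caller),
    // so the loop bound does not change between iterations.
    static Node build(Random rand, int depth) {
        Node root = new Node((short) rand.nextInt(50));
        Node person = root;
        for (int deep = 0; deep < depth; ++deep) {
            Node mother = new Node((short) (50 + rand.nextInt(50)));
            person.mother = mother;
            person = mother;
        }
        return root;
    }

    // Walk from a root through its maternal ancestors, as the restore loop does.
    static int length(Node root) {
        int count = 0;
        for (Node p = root; p != null; p = p.mother) {
            ++count;
        }
        return count;
    }
}
```

For example, `AncestorChain.length(AncestorChain.build(new Random(), rand.nextInt(100)))` is always one more than the drawn depth.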
Perform durable native computing (e.g. printing) without packing/unpacking massive object graphs.
        // fetch the print service
        GeneralComputingService gcsvr = Utils.getGeneralComputingService("print");
        // instantiate a value info for a value matrix
        ValueInfo vinfo = new ValueInfo();
        // instantiate an object stack
        List<long[][]> objstack = new ArrayList<long[][]>();
        // fill it with the native field info of each level of durable objects, in order
        objstack.add(firstnv.getNativeFieldInfo());
        objstack.add(person.getNativeFieldInfo());
        // configure the field Id stack for each level of durable objects
        long[][] fidinfostack = {{2L, 1L}, {0L, 1L}};
        // configure the handler of the value matrix
        vinfo.handler = handler;
        // set the translate table from the handler's allocator
        vinfo.transtable = act.getTranslateTable();
        // specify the durable type of the value
        vinfo.dtype = DurableType.SHORT;
        // generate frames for this value matrix from both stacks
        vinfo.frames = Utils.genNativeParamForm(objstack, fidinfostack);
        // form an array of value infos
        ValueInfo[] vinfos = {vinfo};
        // perform the print operation
        gcsvr.perform(vinfos);

How to build it?

Please see the file LICENSE for information on how this library is licensed.

  • mnemonic-core -- the submodule project for core
  • mnemonic-collections -- the submodule project for generic collections
  • mnemonic-examples -- the submodule project for examples; please refer to the test cases of the respective modules for complete examples.
  • mnemonic-memory-services/mnemonic-pmalloc-service -- the submodule project for pmalloc memory service
  • mnemonic-memory-services/mnemonic-nvml-vmem-service -- the submodule project for vmem memory service
  • mnemonic-memory-services/mnemonic-nvml-pmem-service -- the submodule project for pmem memory service
  • mnemonic-memory-services/mnemonic-sys-vmem-service -- the submodule project for system vmem memory service
  • mnemonic-memory-services/service-dist -- the location of extensible memory services (auto-generated)
  • mnemonic-computing-services/mnemonic-utilities-service -- the submodule project for utilities computing service
  • mnemonic-computing-services/service-dist -- the location of extensible computing services (auto-generated)
  • mnemonic-hadoop/mnemonic-hadoop-mapreduce -- the submodule project for hadoop mapreduce computing

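Durable Memory Service Comparison Table

| Feature                 | NVML-VMEM | PMALLOC | NVML-PMEM      | SYS-VMEM |
|-------------------------|-----------|---------|----------------|----------|
| Fixed Durable K-V Store | NA        | O       | O              | NA       |
| Support DOM             | O         | O       | O              | O        |
| Support DNCM            | O         | O       | O              | O        |
| Support OS X            | NA        | O       | NA             | O        |
| Memory Map Sync.        | NA        | O       | O              | NA       |
| PM Flush                | NA        | NA      | O              | NA       |
| PM Drain                | NA        | NA      | O              | NA       |
| PM Persist              | NA        | NA      | O              | NA       |
| PM Atomic Ops.          | NA        | NA      | O              | NA       |
| Expected Performance    | Average   | Average | Slow (on disk) | Fast     |

O = supported; NA = not available.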

To build this library, you may need to install the following packages on the build system:

  • Maven -- the build tool, v3.2.1 or above [Required]
  • NVML -- the NVM library (please build it at revision 630862e82f) (http://pmem.io) [Optional if mnemonic-nvml-vmem-service and mnemonic-nvml-pmem-service are excluded, e.g. on Mac OS X]
  • JDK -- the Java Development Kit 1.6 or above (please configure JAVA_HOME properly) [Required]
  • PMFS -- PMFS should be properly installed and configured on the Linux system if you want to simulate read latency [Optional]
  • PMalloc -- a supported durable memory native library (latest) at https://github.com/NonVolatileComputing/pmalloc.git [Optional if mnemonic-pmalloc-service is excluded]

Once the build system is set up, build the library with this command at the top level:

  $ git clean -xdf # if pulled from a git repo
  $ mvn clean package install

To exclude a particular memory service for your platform (e.g. OS X), use Maven's -pl '!<module>' exclusion syntax as shown below. Note that if you exclude one or more memory services, some or all test cases and examples will fail because the memory services they depend on are unavailable.

  $ git clean -xdf # if pulled from a git repo
  $ mvn -pl '!mnemonic-memory-services/mnemonic-nvml-vmem-service' clean package install

To install this package to the local Maven repository (required to run the examples and test cases):

  $ mvn clean install

To run an example:

  $ # requires the 'vmem' memory service to run; please refer to the test case code for more examples.
  $ mvn exec:exec -Pexample -pl mnemonic-examples

To run several test cases:

  $ # a test case for module "mnemonic-core" that requires the 'pmalloc' memory service to pass
  $ mvn -Dtest=DurablePersonNGTest test -pl mnemonic-core -DskipTests=false

  $ # a test case for module "mnemonic-core" that requires the 'pmalloc' memory service to pass
  $ mvn -Dtest=NonVolatileMemAllocatorNGTest test -pl mnemonic-core -DskipTests=false

  $ # a test case for module "mnemonic-core" that requires the 'vmem' memory service to pass
  $ mvn -Dtest=VolatileMemAllocatorNGTest test -pl mnemonic-core -DskipTests=false

  $ # a test case for module "mnemonic-core" that requires the 'vmem' memory service to pass
  $ mvn -Dtest=MemClusteringNGTest test -pl mnemonic-core -DskipTests=false

  $ # a test case for module "mnemonic-collections" that requires the 'pmalloc' memory service to pass
  $ mvn -Dtest=DurableSinglyLinkedListNGTest test -pl mnemonic-collections -DskipTests=false

  $ # a test case for module "mnemonic-collections" that requires the 'pmalloc' memory service to pass
  $ mvn -Dtest=DurablePersonNGTest test -pl mnemonic-collections -DskipTests=false

  $ # a test case for module "mnemonic-computing-services/mnemonic-utilities-service" that requires the 'pmalloc' memory service to pass
  $ mvn -Dtest=DurableSinglyLinkedListNGPrintTest test -pl mnemonic-computing-services/mnemonic-utilities-service -DskipTests=false

  $ # a test case for module "mnemonic-computing-services/mnemonic-utilities-service" that requires the 'pmalloc' memory service to pass
  $ mvn -Dtest=DurableSinglyLinkedListNGSortTest test -pl mnemonic-computing-services/mnemonic-utilities-service -DskipTests=false

  $ # a test case for module "mnemonic-hadoop/mnemonic-hadoop-mapreduce" that requires the 'pmalloc' memory service to pass
  $ mvn -Dtest=MneMapreduceIOTest test -pl mnemonic-hadoop/mnemonic-hadoop-mapreduce -DskipTests=false

How to benchmark?

To run the sort benchmark workloads:

  $ # generate some input data files; the count parameter sets how many random numbers to generate
  $ mnemonic-benches/mnemonic-sort-bench/bin/gen_data.py 20000
  $ # create a configuration file that lists the absolute paths of all generated input data files to process in a batch
  $ # call run.py to run the benchmark workloads with the configuration file
  $ mnemonic-benches/mnemonic-sort-bench/bin/run.py ./sort-files.conf
  $ # when finished, the results are in sort_bench_result.log
  $ less mnemonic-benches/mnemonic-sort-bench/sort_bench_result.log
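The configuration-file step above has no command. As a hedged sketch in Java (the language of the other examples here), the hypothetical `SortBenchConf.write` helper lists the files in a data directory that match a glob and writes their absolute paths to the configuration file. It assumes that `run.py` reads one absolute path per line and that you know a glob matching the files produced by `gen_data.py`; neither assumption is confirmed by this README, so check `run.py` before relying on it.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: writes the absolute path of every regular file in dataDir that
// matches glob into confFile, one path per line. The one-path-per-line format and the
// glob are assumptions, not a documented format of run.py.
final class SortBenchConf {
    static List<String> write(Path dataDir, String glob, Path confFile) throws IOException {
        List<String> paths = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir, glob)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    paths.add(f.toAbsolutePath().normalize().toString());
                }
            }
        }
        Collections.sort(paths); // deterministic batch order across runs
        Files.write(confFile, paths); // each path followed by a line separator
        return paths;
    }
}
```

For instance, with a hypothetical data directory `data` and glob `*.dat`, `SortBenchConf.write(Paths.get("data"), "*.dat", Paths.get("sort-files.conf"))` would produce the file passed to `run.py` above, under those assumptions.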

Where is the documentation?

How to apply it to other projects?