GitHub - IcebrakerCZ/mesosphere: Mesosphere

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
Makefile		Makefile
README		README
application.cpp		application.cpp
application.hpp		application.hpp
cluster.cpp		cluster.cpp
cluster.hpp		cluster.hpp
cluster_job.cpp		cluster_job.cpp
cluster_job.hpp		cluster_job.hpp
cluster_new.cpp		cluster_new.cpp
cluster_new.hpp		cluster_new.hpp
cluster_node.cpp		cluster_node.cpp
cluster_node.hpp		cluster_node.hpp
log.cpp		log.cpp
log.hpp		log.hpp
main.cpp		main.cpp

Repository files navigation

Input resources in form (node units):

(2 3) (7 1) (1 23) (8 5) (2 8)


Input jobs in form (job units):

(3 4) (1 4) (4 7) (1 3) (5 4) (9 3)



Question: What the unit mean? How is it counted?
          Is it just hardware resource consumption as CPU or Memory
          or is there counted time complexity and is there counted
          if it can be processed on many cores in parallel?


Issue:

FIFO - worst case will be when huge job will come and all nodes will be finishing
       small jobs so free processing units will grow slowly over the nodes
       and all nodes will have almost the same processing units grow

       In the end all nodes will be able to process the huge job so it will be sent
       to one node and job processing will run normally again.

       Huge job will result in a bottleneck and cluster will hiccups :)


Solutions:

1) When bigger job comes to the job queue save some node for it
   so no other task will be sent to the saved node until the bigger task
   is sent for processing.

    Positives - simple solution
              - fast for development
              - can be easily improved as solution (2)

    Negatives - another node could have enough free units faster
                than the saved node

    Summary   - can be developed as quick fix before better solution
                is designed


2) Divide nodes to buckets where every bucket must be able to process
   bigger jobs in short time.

   So for example 25% of nodes will be free to do every small job.
   Next 25% will be reserved for medium sized jobs next 25% of nodes
   will be reserved to process big jobs and last 25% will be reserved
   to process huge jobs.

   The limits can be easily configurable so the nodes percentage
   and what small/medium/big/huge job means can be configurable too.

   On top of that a cluster stats will be counted so the nodes percentage
   and jobs difficulty can be update in runtime and can be even predicted
   based on date time as peaks occures periodically.


3) Count to the job units even how long they are in the queue already
   so even smaller jobs can be processed before bigger jobs
   when the bigger job blocks the queue.

    Positives - the system will process new jobs even the queue is blocked

    Negatives - it must be counted and proved precisely how the algorithm
                will behave

    Summary   - better solution, but the system will still have hiccups
                much smaller than in FIFO scenario but still it will hiccups



From my point of view it is better to implement solution (1) now.

Everytime bigger job comes to the beginning of the queue one node
is saved for it.