Next Generation Bears [$110 awarded] #725

Closed · sils opened this issue Jul 2, 2015 · 21 comments · 4 participants

sils (Member) commented Jul 2, 2015

This is an umbrella bug for #724, #722 and #720, together with the complete removal of the old Bear class.

I opened this because I'll be putting a bounty on it soon :)

@sils sils added the type/feature label Jul 2, 2015

@sils sils changed the title Next Generation Bears Next Generation Bears [$55] Jul 2, 2015

@sils sils added the bounty label Jul 2, 2015

@sils sils changed the title Next Generation Bears [$55] Next Generation Bears [$60] Jul 2, 2015

sils (Member) commented Jul 2, 2015

I'm always available to help design this and will try to add info here and there.

sils (Member) commented Jul 2, 2015

Requirements, in the order I'd try to implement them:

  • A coala pool exists. Ideally it should provide the same functionality as multiprocessing.Pool.
    • If coala runs with only one process, it simply uses the traditional map/imap functions.
    • Otherwise it parallelizes the execution over the processes coala has spawned anyway. This is at the same time a full abstraction over the hardware, i.e. this could easily run across multiple machines if coala supports that later. Data-parallel programming in a pythonic and easy way, managed by coala. (Yeah!)
  • Results can be yielded or returned as a list (i.e. the function can be written as a generator).
    • If results are yielded, they should be immediately transferred to the other process, allowing e.g. a GUI to display a result while the bear still runs.
  • A bear decorator exists.
    • It postprocesses all results and adds the origin field correctly. One trivial thing less for bear writers.
    • That decorator can be chained with others to automatically use coalib.pool for parallelizing over files, e.g.
      @bear @file_parallel def some_bear(file, filename): pass
      • A pool can be requested by simply adding a pool parameter (so file_parallel can simply demand one).
      • If no coala pool is given, the bear decorator just passes a multiprocessing pool.
      • file_parallel preprocesses dependency results and only passes on the ones associated with the current file in file-parallel mode.
      • file_parallel postprocesses the results and adds the filename for file-parallel stuff.
  • Bears can set dependencies in their signature (@bear def some_bear(dependencies: [some_bear])) and will then get a dict with the bear names as keys and their results as values.
    • If no dependency results are given, the bear decorator executes some_bear automatically so it can be used without coala. This can get complex with multiple dependencies...
  • All bears are refactored to use that (easy :)).
  • The Bear class doesn't exist anymore.
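The pool requirement above could be sketched roughly like this. Everything here is hypothetical (CoalaPool is not a real coala class); a ThreadPool stands in for the processes coala would have spawned anyway:

```python
from multiprocessing.pool import ThreadPool

class CoalaPool:
    """Hypothetical sketch of a coala pool that mirrors the
    multiprocessing pool API but falls back to the builtin
    map/imap when coala runs with a single process."""

    def __init__(self, processes=1):
        self.processes = processes
        # Only spin up real workers when parallelism is requested.
        self._pool = ThreadPool(processes) if processes > 1 else None

    def map(self, func, iterable):
        if self._pool is None:
            # Single-process mode: no IPC overhead, just the builtin map.
            return list(map(func, iterable))
        # Multi-process mode: distribute over the already spawned workers.
        return self._pool.map(func, iterable)

    def imap(self, func, iterable):
        if self._pool is None:
            return map(func, iterable)
        return self._pool.imap(func, iterable)
```

A bear would then use `pool.map(...)` without caring whether coala parallelized at all; this is the hardware abstraction the list describes.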
sils (Member) commented Jul 2, 2015

Example of a local bear:

from coalib.bears.bear import bear

def without_comments(line): pass  # DEMO

@bear
@file_parallel
def remove_comments(filename, file, pool):
    # The bear decorator postprocesses the result and adds the function name
    # as origin, and the file in case of file-parallel data processing.
    return HiddenResult(pool.map(without_comments, file))

@bear
@file_parallel
def find_keyword(filename, file, keyword: str, dependencies: [remove_comments]):
    # file_parallel knows that only dependency results for the current file
    # are interesting and filters out all others; we know remove_comments
    # only yields one result per file.
    stripped_lines = dependencies[remove_comments][0]
    for line_nr, line in enumerate(stripped_lines):
        if keyword in line:
            yield Result("Keyword found!", line_nr)

@sils sils added this to the 0.2 alpha milestone Jul 2, 2015

AbdealiJK (Contributor) commented Jul 4, 2015

@sils1297 - Let's take an example.
If I want to write a global bear that finds some specific info about each file and then concatenates all of these into one final output, how would I do this?
Currently, @bear @file_parallel would run a function on every file. But what if I want to use the results of all these to do something else?

For example: something like finding the number of lines of code in the whole project.

sils (Member) commented Jul 4, 2015

@bear
@file_parallel
def count_file_lines(filename, file):
    yield HiddenResult(len(file))

@bear
@depends_on(count_file_lines)
def count_all_lines(file_dict, dependency_results):
    total = sum(result.value for result in dependency_results[count_file_lines])
    yield Result("All files have {} lines.".format(total))
AbdealiJK (Contributor) commented Jul 4, 2015

Ah - looks cool.
So global bears aren't parallelized?

sils (Member) commented Jul 4, 2015

@AbdealiJK there are no local or global bears anymore - there are only decorators like file_parallel that do data-parallel execution of your bear. There are just bears.

AbdealiJK (Contributor) commented Jul 4, 2015

So the decorators needed will be depends_on, bear and file_parallel. Anything else to add?

AbdealiJK (Contributor) commented Jul 4, 2015

Another question: as the bears have become functions now, there are a few ways to implement the bear gathering.

  1. We set a condition that the file name and function name have to be the same - which is a little irritating.
  2. We search all *.py files in the bear_dir to find the function whose name is the same as the bear's.

Also, the bear names in config files are currently camel case (e.g. SpaceConsistencyBear); do we want to change that to snake case (space_consistency_bear) now that they are functions?

sils (Member) commented Jul 4, 2015

Actually I think the cleanest way is to use python annotations. Basically we want to import everything that is marked with the @bear decorator. We could just make the @bear decorator alter the signature, i.e. add an annotation for the return value. Think of it: a bear is anything that yields results, so we could annotate it like

from coalib.results.Result import Result

def this_is_a_bear_without_using_a_decorator(file_dict) -> Result:
    pass

So the @bear decorator does nothing but add this Result annotation, and coala just checks for this annotation. This way we don't even rely on our decorators or their names.
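A minimal sketch of how that detection could work. Result here is a stand-in for coalib.results.Result.Result, and is_bear is a hypothetical helper, not an actual coala API:

```python
class Result:
    """Stand-in for coalib.results.Result.Result."""

def bear(func):
    # The decorator does nothing but add the Result return annotation;
    # bear collection then only has to inspect __annotations__.
    func.__annotations__['return'] = Result
    return func

def is_bear(obj):
    # Anything callable whose return annotation is Result counts as a bear,
    # regardless of which decorator (if any) put the annotation there.
    return (callable(obj)
            and getattr(obj, '__annotations__', {}).get('return') is Result)

@bear
def some_bear(file_dict):
    pass

def a_helper():
    pass
```

This also covers the undecorated case from the comment above: a plain `def f(file_dict) -> Result: ...` would pass the same check.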

AbdealiJK (Contributor) commented Jul 4, 2015

But how would you differentiate between names? If I write SpaceConsistencyBear in my coafile, how do I know which function to run?

The bear developer would have to specify somewhere what he wants his bear to be called.

sils (Member) commented Jul 4, 2015

It would collect all functions from the SpaceConsistencyBear.py file. We thought a lot about this problem and chose this way because it allows grouping bears in an easy way (we also added an extra mechanism for that) and we know which files to search.
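The collection step could look roughly like this. This is a sketch, not coala's actual collector; the `__is_bear__` marker attribute is a hypothetical stand-in for whatever the @bear decorator really sets:

```python
import importlib.util
import inspect
import pathlib
import tempfile

BEAR_MARK = "__is_bear__"  # hypothetical marker the @bear decorator could set

def collect_bears(path):
    """Import a bear file and return every function marked as a bear.

    All bears in the file are imported together, matching the design
    discussed above: the user can execute both or none of them.
    """
    spec = importlib.util.spec_from_file_location(pathlib.Path(path).stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return {name: obj
            for name, obj in inspect.getmembers(module, inspect.isfunction)
            if getattr(obj, BEAR_MARK, False)}
```

So looking up `SpaceConsistencyBear` in a coafile would resolve to `SpaceConsistencyBear.py` and import every marked function in it.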

AbdealiJK (Contributor) commented Jul 4, 2015

I see - so I cannot define two bears in the same file?
Or rather, what would you do if someone declares two @bears in the same file?

sils (Member) commented Jul 4, 2015

If you define two bears in the same file, your user can only execute both or none of them. They will both be imported.

Makman2 (Member) commented Jul 13, 2015

For the @bear decorator: maybe we want to use a parameter that determines the parallelization, since this field is always associated with a bear and is required. We could default to global, since this is the simplest parallelization model.

For example:

@bear(Parallelization.FILE_PARALLEL)
def my_bear(whatever_input):
    pass
Makman2 (Member) commented Jul 13, 2015

And after reading a bit about annotations, this is imo the way to go, since we can't remove the Bear class completely. The Bear class works in the background; the new annotations shall simply create this class for us, so we just need to write a decorated function.

sils (Member) commented Jul 14, 2015

For the @bear decorator: maybe we want to use a parameter that determines the parallelization,
since this field is always associated with a bear and is required. We could default to global, since
this is the simplest parallelization model.

For example:

@bear(Parallelization.FILE_PARALLEL)
def my_bear(whatever_input):
    pass

The idea is to split parallelization off from the coala internals. The bear doesn't need to have anything to do with parallelization, so we can split it off. That being said, if you want to do a global bear it gets easier for you. Second thing: we'll allow arbitrary parallelization, not only file-wise but whatever-wise; you'll be able to write decorators for whatever you want. You can only do that with own decorators, otherwise you would need to come up with a parallelization object or so. IMO this is more generic, and you can even chain decorators.
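A sketch of what such a chainable parallelization decorator could look like. This is hypothetical, not coala's implementation: a plain loop stands in for the coala pool, and the decorator names follow the examples earlier in the thread:

```python
import functools

def file_parallel(func):
    """Hypothetical sketch: turn a per-file bear into one that runs over a
    whole file dict, collecting (filename, result) pairs. A real
    implementation would hand the per-file calls to the coala pool
    instead of looping sequentially."""
    @functools.wraps(func)
    def run_on_all_files(file_dict, **kwargs):
        results = []
        for filename, file in file_dict.items():
            # Each per-file call yields results; tag them with their file.
            for result in func(filename=filename, file=file, **kwargs):
                results.append((filename, result))
        return results
    return run_on_all_files

@file_parallel
def line_count_bear(filename, file):
    yield len(file)
```

Because the decorator only transforms the function's input and output, a `dir_parallel` or `commit_parallel` decorator could be written the same way and chained with @bear, which is the "whatever-wise" flexibility described above.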

Makman2 (Member) commented Jul 14, 2015

Okay, if the next-gen system allows that, this is definitely cooler and more flexible :)

@sils sils modified the milestones: 0.3 alpha, 0.2 alpha Jul 15, 2015

sils (Member) commented Aug 10, 2015

A simple bear decorator solving the metadata problem:

def copy_metadata(source, target, omit):
    if hasattr(source, "__metadata__"):
        target.__metadata__ = source.__metadata__
    else:
        target.__metadata__ = FunctionMetadata.from_function(source, omit)

def bear(func):
    def invoke_bear(**kwargs):
        return func(**kwargs)

    copy_metadata(func, invoke_bear, ('self', ))
    return invoke_bear
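A self-contained illustration of the idea above, repeating the decorator with a minimal stand-in for coala's FunctionMetadata class (the real one extracts far more, e.g. parameter documentation):

```python
import inspect

class FunctionMetadata:
    """Stand-in for coala's FunctionMetadata: records the parameter
    names of a function, minus the ones listed in ``omit``."""

    def __init__(self, params):
        self.params = params

    @classmethod
    def from_function(cls, func, omit=()):
        names = [name for name in inspect.signature(func).parameters
                 if name not in omit]
        return cls(names)

def copy_metadata(source, target, omit=()):
    # Reuse already-computed metadata so chained decorators keep
    # forwarding the innermost function's signature.
    if hasattr(source, "__metadata__"):
        target.__metadata__ = source.__metadata__
    else:
        target.__metadata__ = FunctionMetadata.from_function(source, omit)

def bear(func):
    def invoke_bear(**kwargs):
        return func(**kwargs)

    copy_metadata(func, invoke_bear, ('self',))
    return invoke_bear

@bear
def some_bear(filename, file):
    return len(file)
```

The point of the `hasattr` branch is that wrapping an already-decorated bear again still exposes the original parameter list, so coala can keep asking users for the right settings no matter how many decorators are stacked.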

@sils sils changed the title Next Generation Bears [$60] Next Generation Bears [$110] Aug 17, 2015

fneu (Contributor) commented Aug 17, 2015

But how would you differentiate between names? If I write SpaceConsistencyBear in my coafile, how do I know which function to run?

Or we name them module.bear_name. That is not beautiful but convenient, and most other options are bad:

  • The bear name is interpreted as a file name and all functions are imported: no flexibility to import just one bear.
  • The bear name is interpreted as a function name: we need to search, and there can be naming conflicts.
sils (Member) commented Aug 17, 2015

Closing because this is growing chaotic; I'll reopen a new bug soonish.

@sils sils closed this Aug 17, 2015

@sils sils reopened this Aug 17, 2015

@sils sils closed this Aug 17, 2015

@sils sils changed the title Next Generation Bears [$110] Next Generation Bears [$110 awarded] Jan 10, 2016

AbdealiJK referenced this issue Mar 16, 2016: Next generation bears (v2) [$110] #933 (Open, 0 of 3 tasks complete)