Taking more time to analyse with many processes #29

@KarthickRaja2002

Hi @RootLUG ,

I am invoking Aura through Java's ProcessBuilder as 30 parallel processes with the same zips as input, and the analysis is taking much longer than expected. When a zip is analysed by a single process, it completes within 3 minutes. But when 30 zips are analysed as 30 processes, it takes more than an hour.
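One possible contributor to this kind of slowdown is oversubscription: 30 processes, each with its own 10-thread pool, can compete for far fewer CPU cores and for the same disk. The following is a minimal sketch, in Python rather than Java, of capping how many child processes run at once. The helper name `run_bounded` and the placeholder command are hypothetical and are not part of Aura; the real Aura command line would replace the placeholder.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_bounded(commands, max_parallel=None):
    """Run each command as a child process, at most max_parallel at a time.

    Hypothetical helper for illustration; returns the exit codes in input order.
    """
    if max_parallel is None:
        # Assumption: one analysis process per CPU core is a reasonable cap.
        max_parallel = os.cpu_count() or 1

    def run(cmd):
        # Each worker thread blocks on one child process until it exits.
        return subprocess.run(cmd, capture_output=True, text=True).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Placeholder command standing in for one analysis run per zip.
    placeholder = [sys.executable, "-c", "print('scan placeholder')"]
    print(run_bounded([placeholder] * 30))
```

Comparing the total wall-clock time with the cap set to the core count against the time with all 30 started at once would show whether contention is the cause.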

Moreover, the zip contains many nested zips, so I have used a ThreadPoolExecutor with max_workers set to 10 for extraction alone. I have also changed max-depth in the aura_config.yaml file to 50.

Below is the modified ThreadPoolExecutor code from the package_analyzer.py file. Kindly check it and let me know why the analysis takes so long when invoked through Java with 30 processes.

Thanks in advance!

```python
@staticmethod
def scan_directory(item: base.ScanLocation):
    print(f"Collecting files in a directory '{item.str_location}'")
    dir_executor = futures.ThreadPoolExecutor(max_workers=10)
    dir_executor.submit(Analyzer.scan_dir_by_ThreadPool, item)
    collected = Analyzer.scan_dir_by_ThreadPool(item=item)
    dir_executor.shutdown()
    return collected

@staticmethod
def scan_dir_by_ThreadPool(item: base.ScanLocation):
    """Scanning input directory"""
    topo = TopologySort()
    collected = []
    for f in utils.walk(item.location):
        if str(f).endswith((".py", ".zip", ".jar", ".war", ".whl", ".egg", ".gz", ".tgz")):
            new_item = item.create_child(
                f,
                parent=item.parent,
                strip_path=item.strip_path
            )
            collected.append(new_item)
            topo.add_node(Path(new_item.location).absolute())
            logger.debug("Computing import graph")
            for x in collected:
                if not x.metadata.get('py_imports'):
                    continue
                node = Path(x.location).absolute()
                topo.add_edge(node, x.metadata['py_imports']['dependencies'])
            topology = topo.sort()
            collected.sort(
                key=lambda x: topology.index(x.location) if x.location in topology else 0
            )
            logger.debug("Topology sorting finished")
    return collected
```
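One thing worth checking in the snippet above: `scan_directory` submits `scan_dir_by_ThreadPool` to the executor and then calls the same function again directly, so the directory walk appears to run twice and the submitted result is never read. The sketch below reproduces that pattern with a hypothetical stand-in function `scan` (not Aura code) and contrasts it with reading the result from the returned future. A single `submit` runs on one worker thread, so a pool of 10 workers does not by itself parallelise the walk.

```python
from concurrent import futures

calls = []  # records every invocation so duplicated work is visible


def scan(location):
    """Hypothetical stand-in for Analyzer.scan_dir_by_ThreadPool."""
    calls.append(location)
    return [f"{location}/a.py", f"{location}/b.zip"]


def scan_twice(location):
    # Mirrors the pattern in the issue: submit, then call again directly.
    executor = futures.ThreadPoolExecutor(max_workers=10)
    executor.submit(scan, location)  # this result is never read
    collected = scan(location)       # second, synchronous run of the same work
    executor.shutdown()              # waits for the submitted task to finish
    return collected


def scan_once(location):
    # Submit once and take the result from the future instead.
    with futures.ThreadPoolExecutor(max_workers=10) as executor:
        future = executor.submit(scan, location)
        return future.result()
```

Calling `scan_twice("pkg")` leaves two entries in `calls`, while `scan_once("pkg")` leaves one and returns the same list.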
