Taking more time to analyse with many processes

Hi @RootLUG,

I am invoking Aura through __Java ProcessBuilder__ as **30 processes**, each with the same zips as input, and the analysis takes much longer than expected. When a zip is analysed by a **single process,** it completes **within 3 minutes**. But running **30 zips as 30 processes takes more than an hour**.
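One way to check whether the slowdown comes from running more CPU-bound processes than the machine has cores is to cap how many run at once. The following is a minimal sketch in Python rather than Java, assuming each analysis is an independent command line. `run_bounded` and the placeholder commands are hypothetical names for illustration and are not part of Aura.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_bounded(commands: Sequence[List[str]], max_parallel: int = 0) -> List[int]:
    """Run each command as a child process, at most `max_parallel` at a time.

    Returns the exit codes in the same order as `commands`.
    """
    # Default to the number of CPU cores when no explicit limit is given.
    limit = max_parallel or os.cpu_count() or 1

    def run_one(cmd: List[str]) -> int:
        # Each worker thread blocks on one child process until it exits.
        completed = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return completed.returncode

    with ThreadPoolExecutor(max_workers=limit) as pool:
        # map() keeps results in input order.
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Placeholder commands; substitute the real analysis command line here.
    placeholder = [[sys.executable, "-c", "pass"] for _ in range(4)]
    print(run_bounded(placeholder, max_parallel=2))
```

Comparing the total time with the cap set to the core count against the uncapped 30-process run would show whether contention between processes accounts for the difference.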

Moreover, the zip contains many nested zips, so I have used a **ThreadPoolExecutor with max_workers set to 10** for the extraction step alone. I have also changed **max-depth** in the aura_config.yaml file to 50.

Below is the modified ThreadPoolExecutor code in the package_analyzer.py file. Kindly check it and let me know why the analysis takes so long when invoked through Java with 30 processes.

Thanks in advance!

```python
@staticmethod
def scan_directory(item: base.ScanLocation):
    print(f"Collecting files in a directory '{item.str_location}")
    dir_executor = futures.ThreadPoolExecutor(max_workers=10)
    dir_executor.submit(Analyzer.scan_dir_by_ThreadPool, item)
    collected = Analyzer.scan_dir_by_ThreadPool(item=item)
    dir_executor.shutdown()
    return collected

@staticmethod
def scan_dir_by_ThreadPool(item: base.ScanLocation):
    """Scanning input directory"""
    topo = TopologySort()
    collected = []
    for f in utils.walk(item.location):
        if str(f).endswith((".py", ".zip", ".jar", ".war", ".whl", ".egg", ".gz", ".tgz")):
            new_item = item.create_child(
                f,
                parent=item.parent,
                strip_path=item.strip_path
            )
            collected.append(new_item)
            topo.add_node(Path(new_item.location).absolute())
            logger.debug("Computing import graph")
            for x in collected:
                if not x.metadata.get('py_imports'):
                    continue
                node = Path(x.location).absolute()
                topo.add_edge(node, x.metadata['py_imports']['dependencies'])
            topology = topo.sort()
            collected.sort(
                key=lambda x: topology.index(x.location) if x.location in topology else 0
            )
            logger.debug("Topology sorting finished")
    return collected
```
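As posted, the snippet discards the future returned by `submit` and then calls `scan_dir_by_ThreadPool` again directly, so the directory is walked twice and the pool only ever receives one task. The graph and sort steps also sit inside the loop, so they repeat for every collected file. A minimal sketch of the usual pattern follows, assuming the work can be split per directory. `collect_files` and `scan_directories` are hypothetical stand-ins that use only the standard library, not Aura's API.

```python
from concurrent import futures
from pathlib import Path
from typing import List

SUFFIXES = (".py", ".zip", ".jar", ".war", ".whl", ".egg", ".gz", ".tgz")


def collect_files(root: Path) -> List[Path]:
    """Walk one directory tree once and return the matching files."""
    collected = [
        p for p in root.rglob("*")
        if p.is_file() and p.name.endswith(SUFFIXES)
    ]
    # Ordering runs once, after the walk, not once per collected file.
    collected.sort()
    return collected


def scan_directories(roots: List[Path], max_workers: int = 10) -> List[Path]:
    """Submit one task per directory and reuse each task's result."""
    with futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() submits one task per root and yields results in input order;
        # the function is never called a second time outside the pool.
        batches = pool.map(collect_files, roots)
        return [path for batch in batches for path in batch]
```

Threads mainly help here when the work is I/O-bound, such as reading or extracting archives. CPU-bound analysis in CPython is still limited by the global interpreter lock, so a larger `max_workers` may not shorten it.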

Taking more time to analyse with many processes #29