Description
Hi @RootLUG ,
I am invoking Aura through Java ProcessBuilder as 30 processes, using the same zips as input. The analysis takes much longer this way. When a single zip is analyzed by a single process, it completes within 3 minutes, but analyzing 30 zips as 30 processes takes more than an hour.
Moreover, the zips contain further nested zips, so I have used a ThreadPoolExecutor with max_workers set to 10 for the extraction alone. I have also changed max-depth in the aura_config.yaml file to 50.
Below is my modified ThreadPoolExecutor code from the package_analyzer.py file. Please check it and let me know why the analysis takes so long when Aura is invoked through Java with 30 processes.
Thanks in advance!
```python
    @staticmethod
    def scan_directory(item: base.ScanLocation):
        print(f"Collecting files in a directory '{item.str_location}'")
        dir_executor = futures.ThreadPoolExecutor(max_workers=10)
        # Submit the scan to the pool as a single task (its future is not kept)
        dir_executor.submit(Analyzer.scan_dir_by_ThreadPool, item)
        # Run the same scan again in the calling thread and use this result
        collected = Analyzer.scan_dir_by_ThreadPool(item=item)
        dir_executor.shutdown()
        return collected

    @staticmethod
    def scan_dir_by_ThreadPool(item: base.ScanLocation):
        """Scanning input directory"""
        topo = TopologySort()
        collected = []

        for f in utils.walk(item.location):
            # Only collect Python sources and archive files
            if str(f).endswith((".py", ".zip", ".jar", ".war", ".whl", ".egg", ".gz", ".tgz")):
                new_item = item.create_child(
                    f,
                    parent=item.parent,
                    strip_path=item.strip_path
                )
                collected.append(new_item)
                topo.add_node(Path(new_item.location).absolute())

        logger.debug("Computing import graph")
        for x in collected:
            if not x.metadata.get('py_imports'):
                continue

            node = Path(x.location).absolute()
            topo.add_edge(node, x.metadata['py_imports']['dependencies'])

        topology = topo.sort()
        # Order the collected files by their position in the import topology
        collected.sort(
            key=lambda x: topology.index(x.location) if x.location in topology else 0
        )
        logger.debug("Topology sorting finished")

        return collected
```
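
For comparison, here is a minimal sketch of how a thread pool is usually given several independent tasks, since the snippet above submits only one task to the pool. It is not Aura code: `collect_dir`, `collect_parallel` and `SUFFIXES` are hypothetical names chosen for illustration, and it only lists matching files with the standard library rather than building Aura scan locations. In CPython, threads mainly help with I/O-bound work such as directory listing or extraction, so any speed-up is an assumption to measure rather than a given.

```python
from concurrent import futures
from pathlib import Path

# Hypothetical stand-in for the extension filter used in the snippet above
SUFFIXES = (".py", ".zip", ".jar", ".war", ".whl", ".egg", ".gz", ".tgz")


def collect_dir(directory: Path) -> list:
    """Return the matching files directly inside one directory (non-recursive)."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.name.endswith(SUFFIXES)
    )


def collect_parallel(root: Path, max_workers: int = 10) -> list:
    """Scan every directory under root as a separate task and merge the results."""
    directories = [root] + [p for p in root.rglob("*") if p.is_dir()]
    with futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per directory; map() yields results in input order
        chunks = pool.map(collect_dir, directories)
        return [path for chunk in chunks for path in chunk]
```

Calling `collect_parallel(Path("some_directory"))` returns a flat list of matching paths. Each result is produced exactly once, because the pool's output is consumed instead of calling the scan function a second time in the main thread.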