
@njzjz (Member) commented Apr 28, 2021

The original implementation removes files one by one.
On some supercomputers, removing large numbers of files
(e.g. a directory containing trajectories) is very slow due to poor I/O performance.
Also, if the server latency is high, executing thousands of commands
synchronously takes a very long time.
Instead, we can call `nohup rm -rf $directory >/dev/null &`
to remove a directory asynchronously, which will save a lot
of time in some situations.
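The asynchronous idea can be tried locally with a minimal Python sketch, assuming a POSIX `rm` on `PATH`. This is a local analog only, not the dpgen implementation: `remove_dir_async` is a hypothetical helper, and `start_new_session=True` plays a role similar to `nohup` by detaching the child from the caller's terminal session.

```python
import subprocess


def remove_dir_async(directory: str) -> subprocess.Popen:
    """Start `rm -rf` on *directory* and return without waiting for it.

    Hypothetical local analog of `nohup rm -rf $directory >/dev/null &`.
    """
    return subprocess.Popen(
        # An argument list (no shell) avoids quoting problems with paths that
        # contain spaces or shell metacharacters; `--` keeps a path starting
        # with "-" from being parsed as an option.
        ["rm", "-rf", "--", directory],
        # Discard output, like `>/dev/null`.
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        # New session: a hangup of the caller's terminal does not reach the
        # child, similar to what `nohup` provides in the shell version.
        start_new_session=True,
    )
```

Calling `remove_dir_async("traj")` returns a `Popen` handle immediately; call `wait()` on it only when the caller actually needs the directory to be gone.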

@felix5572 said he may have a better way to implement it.

@codecov-commenter commented Apr 28, 2021

Codecov Report

Merging #385 (ce9c7b4) into devel (288f0b1) will increase coverage by 0.01%.
The diff coverage is 20.00%.


@@            Coverage Diff             @@
##            devel     #385      +/-   ##
==========================================
+ Coverage   32.35%   32.37%   +0.01%     
==========================================
  Files          85       85              
  Lines       14381    14373       -8     
==========================================
  Hits         4653     4653              
+ Misses       9728     9720       -8     

| Impacted Files | Coverage Δ |
|---|---|
| dpgen/dispatcher/SSHContext.py | 21.55% <20.00%> (+0.76%) ⬆️ |


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 288f0b1...ce9c7b4.

@amcadmus (Member)
Sometimes the user does not want asynchronous directory removal, so we should provide an option for them.

@njzjz marked this pull request as draft on April 29, 2021, 02:20
@njzjz (Member, Author) commented May 14, 2021

> Sometimes the user does not want asynchronous directory removal, so we should provide an option for them.

How do we provide the option?
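One possible answer is a boolean option that defaults to synchronous removal, so users who need the directory gone before continuing keep that behavior unless they opt in. The sketch below only builds the command string; the option name `asynchronously` and the helper `build_remote_rm_command` are hypothetical, and the string would be sent through whatever remote-execution call the SSH context uses (for example paramiko's `SSHClient.exec_command`).

```python
import shlex


def build_remote_rm_command(directory: str, asynchronously: bool = False) -> str:
    """Return a shell command that removes *directory* on the remote host.

    `asynchronously` is a hypothetical option name used for illustration.
    """
    # Quote the path so spaces or metacharacters cannot change the command.
    rm = "rm -rf -- " + shlex.quote(directory)
    if not asynchronously:
        # Synchronous: the command returns only after the directory is gone.
        return rm
    # Asynchronous: detach with nohup and send both stdout and stderr to
    # /dev/null; an output stream left attached to the SSH channel could keep
    # the session open until rm finishes.
    return "nohup " + rm + " >/dev/null 2>&1 &"
```

Quoting with `shlex.quote` matters because an unquoted `$directory` breaks on paths that contain spaces.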

njzjz added a commit to njzjz/dpdispatcher that referenced this pull request May 26, 2021
The original implementation removes files one by one using SFTP.
If the latency of the remote server is high, this is very slow.
Thus, it's better to use the system's `rm` to remove a directory, which may
save a lot of time.

Also, on some supercomputers, removing large numbers of files
(e.g. a directory containing trajectories) is very slow due to poor I/O performance.
So an asynchronous option is provided.

Implement deepmodeling/dpgen#385. Close deepmodeling/dpgen#385.
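The latency argument in the commit message can be made concrete by counting how many separate requests a one-by-one deletion issues. This sketch walks a local directory with `os.walk`; the helper name is hypothetical, and it assumes every removed entry costs one request, as with per-file SFTP remove/rmdir calls.

```python
import os


def count_per_file_removals(root: str) -> int:
    """Count the remove/rmdir calls needed to delete *root* entry by entry.

    Mirrors a one-by-one deletion in which each call is a separate request.
    """
    ops = 0
    # Walk bottom-up; symlinks to directories are not followed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        ops += len(filenames)  # one remove() per file (or file symlink)
        # Symlinks to directories appear in dirnames but are not walked;
        # each needs a single remove(), not rmdir().
        ops += sum(os.path.islink(os.path.join(dirpath, d)) for d in dirnames)
        ops += 1  # one rmdir() for this directory itself
    return ops
```

With a hypothetical 10,000 files in one directory and 50 ms round-trip latency, that is 10,001 requests, roughly 500 s spent waiting on the network, while a single `rm -rf` command costs one round trip.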
@njzjz closed this on May 26, 2021
@njzjz deleted the rm_directory branch on May 26, 2021, 13:00

3 participants