New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: hashmap based caching #82
Conversation
lossy shouldn't be used here.
crates/cli/src/package.rs
Outdated
.join(&package_version.name); | ||
|
||
if let Some(mut receiver) = package_cache.get_mut(&saved_path) { | ||
// TODO: is this loop necessary? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use a Mutex<Receiver<PackageState>>
we don't have to have a loop here. Similar to: https://medium.com/@polyglot_factotum/rust-concurrency-patterns-condvars-and-locks-e278f18db74f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not what my question was referring to. I am wonder whether I should remove the whole code block inside loop (leaving only return Ok(())
) or keep it. This would only be clear after further work. Hence the TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the package is InProgress
we need to wait, since after this function, we call package import functions to link the store files with node_modules.
crates/tarball/src/lib.rs
Outdated
Ok(cas_files) | ||
}) | ||
.await? | ||
verify_checksum(&response, package_integrity)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we remove tokio spawn in here? The tarball extraction is a CPU bounded problem, and solving it in a different thread helps the cause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were going to .await
on the handle right afterward. So even if we spawn another thread, the current thread will be waiting for the spawned thread to complete. It will just like not spawning any thread at all.
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #82 +/- ##
==========================================
- Coverage 85.82% 83.92% -1.91%
==========================================
Files 24 24
Lines 1270 1306 +36
==========================================
+ Hits 1090 1096 +6
- Misses 180 210 +30
☔ View full report in Codecov by Sentry. |
Benchmark ResultsLinux
|
Could you add a description what this PR is about? |
done |
.join("node_modules") | ||
.join(&package_version.name); | ||
|
||
// TODO: skip when it already exists in store? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the virtual store at node_modules/.pnpm
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is global store.
crates/cli/src/package.rs
Outdated
|
||
let saved_path = config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think import_destination
or import_dest
would be a better name for the variable.
However, the argument name in the function is save_path, so better name the same, not "saveD_path"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
@@ -3,7 +3,7 @@ use serde::{Deserialize, Serialize}; | |||
#[derive(Serialize, Deserialize, Debug, Default, Clone, Eq)] | |||
#[serde(rename_all = "camelCase")] | |||
pub struct PackageDistribution { | |||
pub integrity: String, | |||
pub integrity: String, // TODO: why fetching typescript cause error here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we merge the PR with TODO items? It is probably best to open issues instead of adding TODO comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer code todos as well as issues. It makes it easier to navigate through the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crates/tarball/src/lib.rs
Outdated
} | ||
} | ||
drop(cache_lock); | ||
sleep(Duration::from_millis(100)).await; // TODO: millis can be any small number, even 0, further testing is required to find the ideal number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK to use sleep for this? Looks like a workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we even need sleep in here? We need more comments on the code in general.
&self.http_client, | ||
name, | ||
version, | ||
&self.config.modules_dir, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already passing self.config to this function. We dont need this do we?
crates/cli/src/package.rs
Outdated
|
||
let saved_path = config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
.join("node_modules") | ||
.join(&package_version.name); | ||
|
||
// TODO: skip when it already exists in store? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is global store.
@@ -3,7 +3,7 @@ use serde::{Deserialize, Serialize}; | |||
#[derive(Serialize, Deserialize, Debug, Default, Clone, Eq)] | |||
#[serde(rename_all = "camelCase")] | |||
pub struct PackageDistribution { | |||
pub integrity: String, | |||
pub integrity: String, // TODO: why fetching typescript cause error here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer code todos as well as issues. It makes it easier to navigate through the codebase.
@@ -30,13 +29,18 @@ impl PackageVersion { | |||
http_client: &reqwest::Client, | |||
registry: &str, | |||
) -> Result<Self, RegistryError> { | |||
let url = || format!("{0}{name}/{version}", ®istry); // TODO: use reqwest url type | |||
let network_error = |error| NetworkError { error, url: url() }; | |||
|
|||
http_client | |||
.get(format!("{0}{name}/{version}", ®istry)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line needs to be updated to use local variable url.
@@ -30,13 +29,18 @@ impl PackageVersion { | |||
http_client: &reqwest::Client, | |||
registry: &str, | |||
) -> Result<Self, RegistryError> { | |||
let url = || format!("{0}{name}/{version}", ®istry); // TODO: use reqwest url type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this todo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current type of registry
is &str
, and it only works correctly when it ends with a /
. The Url
type is more efficient and less error-prone.
crates/tarball/src/lib.rs
Outdated
} | ||
} | ||
drop(cache_lock); | ||
sleep(Duration::from_millis(100)).await; // TODO: millis can be any small number, even 0, further testing is required to find the ideal number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we even need sleep in here? We need more comments on the code in general.
`sleep` is just `yield_now` with extra steps
serde = { workspace = true } | ||
serde_json = { workspace = true } | ||
tokio = { workspace = true } | ||
async-recursion = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using async-recursion? I couldn't find any reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[async_recursion] |
Before this PR,
pacquet install
was installing packages by "generations": All direct dependencies are installed first, then dependencies of direct dependencies, then dependencies of dependencies of dependencies, and so on. This was not optimal, but it was done to avoid race condition where some packages would be fetched twice should they be invoked in the window of time where its duplicated has begun fetching but not yet saved to local disk.This PR aims to change this by using an in-memory hashmap cache to keep track of all fetched packages, then re-enable full parallelization.
NOTE: This PR only affects
pacquet install
,pacquet add
still uses the old algorithm.NOTE: New tests with multiple packages for
pacquet install
is required, but I am too tired right now.