Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Git Only feature #1497

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from
Draft

Conversation

ozgunozerk
Copy link

Aims to fix #1144.

I'll edit the PR explanation once the discussions below are finalized.

@ozgunozerk
Copy link
Author

Here is the summary of changes:

Background info on repo (to double check my understanding):

  • release command is just for releasing, it doesn't generate changelog nor bumping versions. However, it can be used together with release_pr to automate the whole process
  • both update and release_pr commands calls update() function internally to generate the changelog and bumping the versions
  • update() function calls next_versions(), and this is the most critical part imo. Due to:
    • next_versions() fetches the "to be compared" packages from either registry (crates.io by default) or local file system
    • then compares the fetched crates (registry_packages) against local_packages
    • for git-only feature, we basically want to fetch the registry_packages from local file system (details will be discussed below)
    • after fetching the registry_packages, all should be good, because rest of the business logic does not need to know whether packages came from git or not. Rest of the business logic will just generate the changelog and bump versions based on the content of the packages, not based on where they come from.

Approach:

  • in next_versions() function -> if git-only flag is present, and registry_manifest is not provided:
    • fetch the latest tag for the repo that follows semver (vX.X.X)
    • checkout to latest tag
    • find the path to cargo.toml
    • use that path to override registry_manifest

That's it!

P.S. even for the case there are multiple packages, there shouldn't be a problem. Because, it is already working with multiple packages without the git-only option. From what I understood, it fetches the workspace cargo.toml and then processes each package. Why it should be different in git-only mode :)

The changes I propose will also fetch the workspace cargo.toml file if that's the case, so it should be good to go imo.

@MarcoIeni please let me know if I missed anything.

Also, we can discuss how release command should be modified later after we settle the first part.

Remaining work:

  • release command itself has not been touched. I haven't checked it yet, so I don't know whether it will require modifications (I assume it will).

@MarcoIeni
Copy link
Owner

release command is just for releasing

Yes, it "releases" the current state of the repo, i.e. unpublished crates, running cargo publish, creating tags, GitHub release

The rest of the background info looks correct 👍

The approach looks correct, too!

I will review this PR this soon!

release-plz release

we can discuss how release command should be modified later after we settle the first part.

Yes, probably in another PR we want to avoid cargo publish if git-only is true. Plus, to check if a crate is already published, we look at git tags, instead of looking in the cargo registry.

It would be great to have some integration tests. You can find tests here

One test I would write is:

  • create a new project with the git_only option
  • run release-plz release-pr
  • merge the release PR
  • Run release-plz release
  • Assert that the crate is NOT published to the registry
  • Assert that the Git tag is created
  • do a change to the crate
  • run release-plz release-pr again
  • Assert the content of the PR (it should suggest 1.1.0 for example)

@MarcoIeni
Copy link
Owner

I added the new line to the schema in #1500

so that your tests don't fail 👍

@ozgunozerk
Copy link
Author

Draft implementation is complete. I'll take a look at the tests and try to write the test you mentioned as the next step. I might be busier than usual this week, but I'm definitely committed to finish this feature. If you @MarcoIeni or someone else wants to chip in to make the process faster, feel free to do so!

repository.repo.checkout(&tag)?;
Ok(input.local_manifest.clone())
}
None => return Err(anyhow!("there is no previous tag to compare with!")),
Copy link
Owner

@MarcoIeni MarcoIeni Jun 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case, instead of throwing an error, we should release v0.1.0

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to ask that. Didn't inspect the code for that path (where registry package is not found). So, release-plz already does not require previous version to exist, and can release the first version (a.k.a v0.1.0) when registry package is not found, am I right?

/// Returns the latest tag of the repository in the form of `vX.X.X`
/// * `None` if there are no version tags found matching the regex,
/// * `Tag` if at least one tag is matching the regex
pub fn get_repo_versions(repo: &Repo) -> Option<String> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function name is plural, but you are returning a string.

Also, workspaces tags are in the form of {package}-v{version}. E.g. https://github.com/MarcoIeni/release-plz/tags
Does the implementation of this PR recognize the right tag for the right package?

Copy link
Author

@ozgunozerk ozgunozerk Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function name is plural, but you are returning a string.

Will fix the name, thanks for the catch!

Also, workspaces tags are in the form of {package}-v{version}.

regex can detect package and v{version} separately via subgroups and we can do anything we want with them. I'll update the function so that it will return package-v{version} instead of v{version}

Does the implementation of this PR recognize the right tag for the right package?

Ok, here is the confusion on my end about this topic.

  • Let's say we want to release package a-v1.0.1.
  • And assume we have these two following releases in the repo:
    • a-v1.0.0 -> released at last year
    • b-v1.0.0 -> released just yesterday (more recent compared to a)

Even though we will be updating the package a, should we use the tag a-v1.0.0 as the comparison for the workspace repo, or should we checkout to b-v1.0.0? I'm honestly confused about that, and I believe you can make a better decision than me @MarcoIeni on this. Although it's counter-intuitive, I argue b-v1.0.0 is a better choice. Due to:

  • in tag b-v1.0.0, the version of package a is also v1.0.0, so nothing changed about a.
  • for the changelog generation, it might be simpler to compare the current version against the latest release to eliminate duplications across release notes.

However, I don't know how differences are calculated and how changelog is generated (did not spend that much time surfing the repo). So my assumptions may be invalid.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use the tag a-v1.0.0 as the comparison for the workspace repo, or should we checkout to b-v1.0.0?

I would use tag a-v1.0.0 if we are comparing a, and tag bv-1.0.0 if we are comparing b.
This mirrors better the current behavior of release-plz, when we compare each package with the version published in crates.io

in tag b-v1.0.0, the version of package a is also v1.0.0, so nothing changed about a.

I'm not sure about this, maybe people released b, but they didn't release a yet. But this new tag contains unreleased changes of a.

The difference is calculated by analyzing every commit that touches the files of a package. Each commit is added to the changelog.
So I don't see issues of duplication across release notes. 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe people released b, but they didn't release a yet.

Makes sense! I'll update the logic according to this, so it will detect the latest version for each package

@ozgunozerk
Copy link
Author

ozgunozerk commented Jun 3, 2024

For the scope of this PR (update and release-pr functionalities), I think there is only 1 thing left: get_registry_packages(). After finalizing that, I'll start on writing tests.

I could use some clarifications on the behavior of this function:

First point:

  • forget about git-only, and assume if there is no released version of the package. Then, this function will still return a PackagesCollection object, however it will be effectively empty, right? Due to no package could be downloaded from registry...
    • in this case, changelog and diff generator logics should be fine working with an effectively empty PackagesCollection object.

If these assumptions above are true, I think we can leave the registry_manifest as None if there are no valid tags present in the repo.

Second point:

  • if registry_manifest is provided to this function, then it tries to read from local (already downloaded) files.
    • in git-only case, our files are already downloaded locally. So, it make sense to use the registry_manifest
      • however, the registry_manifest and the local_manifest will be pointing towards the same path. Since we are comparing a version of the repo with another version of the repo, the path to cargo.toml will be the same.
      • this function does not checkout to any specific tag or anything. So, I believe the registry_manifest provided should be different than the local_manifest for this function to work properly.

If above understanding is correct, then I believe the code should be changed as the following:

  1. this function will take both git-only parameter and the optional tag parameter
  2. if git-only
  3. create a temp directory as usual
  4. check out to given tag
  5. apply the registry_manifest path to the new temp directory that we checked out to provided version

No change should be required from what I can tell.

When you have the time, I'd appreciate a sanity check on these @MarcoIeni

@MarcoIeni
Copy link
Owner

First point

Your assumptions seem correct to me. However, I'm not sure about the following:

I think we can leave the registry_manifest as None if there are no valid tags present in the repo.

If registry_manifest is None in the get_registry_packages function, release-plz will look to crates.io to find the package, right? Which is what we want to avoid.
Maybe I misunderstood 🤔

Second point

Everything looks good here. More specifically, you are right that the registry_manifest and the local_manifest should be different. Copying to a temporary directory is what I had in mind. 👍
The task list looks correct. 👍
The only thing is that from a quick look at the code, it doesn't seem right that the get_registry_packages takes the "tag" as input, because get_registry_packages should work for a workspace, too, so there can be multiple tags (one per package). Again, I might be misunderstanding your explanation 👍

@ozgunozerk
Copy link
Author

ozgunozerk commented Jun 9, 2024

Ok good that we are discussing this, because maybe I misunderstood something.

First point

If registry_manifest is None in the get_registry_packages function, release-plz will look to crates.io to find the package, right? Which is what we want to avoid.

You are right, and I think we already got it covered. But I probably did not clearly explain my ideas. My understanding was:

  • get_registry_packages() will return an empty object when the package is not yet released on crates.io.
    • good that I already got confirmation of this 👍

This means, I can write the below logic for get_registry_packages() function:

  • if there is no tag present, and git-only is true:
    • return early from get_registry_packages() by returning the empty object (as in the case of the not yet released package in crates.io).
    • the changelog and diff generation should work fine in this case (empty PackagesCollection object when no tag is present in GitHub).

Second point

This is where I'm still confused.

because get_registry_packages should work for a workspace, too, so there can be multiple tags (one per package)

Unfortunately, this doesn't make sense to me. I'm probably missing some details for the multi-package scenario.

When you have the time, if you can see my comment, maybe we can continue this discussion there.

@MarcoIeni
Copy link
Owner

MarcoIeni commented Jun 10, 2024

will return an empty object when the package is not yet released on crates.io

Yes, if your project doesn't have naming conflicts with projects present in crates.io! For example, if your project is called rand, then release-plz will download the rand crate.
Imo if git-only is enabled, people should be able to call their crates with names that conflict with something in crates.io and release-plz should work anyway (i.e. we shouldn't download crates from crates.io if people use the git-only option). The logic you described covers this case imo 👍

I replied to that comment (sorry, I missed it!). Happy to provide more help if needed 👍

@ozgunozerk
Copy link
Author

ozgunozerk commented Jun 14, 2024

Hi @MarcoIeni, was a busy week so I'm taking a look just right now.

Btw, thanks a lot for your responsiveness, great communication!

The code changes are trivial, however the biggest problem is not changing the code for this PR. Whilst I was changing the code, I found the root of my confusion. I'll explain it below:

this is crates.io scenario, with multiple packages -> a.k.a main branch

  • we have a single registry_manifest file even though we have multiple publishable packages
  • next_versions() is just taking the InputRequest and then returns PackagesUpdate. So, the transition from a single Cargo.toml file to multiple packages is happening in next_versions().
    • this is especially happening in get_registry_packages() function, which takes input.registry_manifest as the input, and outputting PackagesCollection
      • inside get_registry_packages(), if we have provided a registry_manifest path, we extract all the publishable packages with publishable_packages_from_manifest() method. This is the first concrete step for handling multiple packages from what I've seen.

coming back to git-only feature, with multiple packages

  • since we provided a single registry_manifest (as the workspace cargo.toml I assume) in the crates.io scenario, we can do the same for the git-only option.
  • hence, I thought we should provide a single registry_manifest, instead of multiple cargo.tomls for each package.

Discussion:

I agree that we should compare each package with their own previous version, but I'm puzzled on how does it even work with multiple packages in main branch. My confusion arises from this:

  • monorepos are not publishable in crates.io IIRC, but their individual packages can be published individually.
  • So, how can we achieve the correct behavior by providing only a single registry_manifest path?

In short: since I assumed everything works for crates.io case, I thought I can follow the same approach, and provide a single registry_manifest. And that registry_manifest should be the latest tag of the repo (corresponds to the latest release in crates.io).

I hope this long convos are not bothering you.
I'm an outsider to this repo, and I don't want to code my solution before we are on the same page.

Appreciated ^^

@MarcoIeni
Copy link
Owner

Btw, thanks a lot for your responsiveness, great communication!

Thank you for spending time on this issue 😄

we have a single registry_manifest file even though we have multiple publishable packages

Yes, because the manifest can be a "workspace manifest". For example, take a look at the Cargo.toml in the root of this repository.

hence, I thought we should provide a single registry_manifest, instead of multiple cargo.tomls for each package.

Yes 👍

monorepos are not publishable in crates.io IIRC, but their individual packages can be published individually.

Yes 👍

I hope this long convos are not bothering you.
I'm an outsider to this repo, and I don't want to code my solution before we are on the same page.

No problem, this issue is very difficult, it's normal that you ask doubts!

how can we achieve the correct behavior by providing only a single registry_manifest path?

I thought about the following algorithm, but maybe I'm missing something:

  1. read packages from the main branch by providing the registry_manifest file.
  2. Determine the latest tag of each package
  3. When comparing each package, checkout to the latest tag of that package

@ozgunozerk
Copy link
Author

I thought about the following algorithm, but maybe I'm missing something:

  1. read packages from the main branch by providing the registry_manifest file.
  2. Determine the latest tag of each package
  3. When comparing each package, checkout to the latest tag of that package

Great! This resolves my biggest confusion.

I was and am planning to provide a single cargo.toml file as the registry_manifest. Then in our discussions, you asked if the function get_the_latest_repo_tag is able to retrieve all the latest tags for each package. Then I was confused, because I thought we should provide a single cargo.toml file as we discussed:

hence, I thought we should provide a single registry_manifest, instead of multiple cargo.tomls for each package.

Yes 👍

But now I realized you asked if the function is able to retrieve latest tags for all packages NOT for the registry_manifest file to be fetched from GitHub for the comparison. These latest_tags are to be used later for the individual package comparison. Now it makes sense 👍

This whole discussion is basically about me being lazy by not looking at the code and trying to understand how this project works for the multiple packages case 😅

Coding the solution is pretty easy after understanding all these.

I'm on vacation for the last week, hence the late reply. I'll take a look at the codebase once my vacation is over.

@MarcoIeni
Copy link
Owner

Great, enjoy your vacation :D

@ozgunozerk
Copy link
Author

ozgunozerk commented Jun 19, 2024

registry_manifest

For the sake of brevity, I’ll refer to the case: monorepo with registry_manifest provided (current main-branch) as main-multiple.

I took a look at the code, and I think your suggestions & questions has discrepancies with the current main-multiple (or even more likely, I’m missing some details).

I was trying to replicate the main-multiple case for the git-only feature, since they should be roughly identical in terms of business logic.

Let’s focus on the get_registry_packages() function, because after that, everything should be identical afaik.

get_registry_packages():

  1. takes the inputs:
    1. registry_manifest → this is the main discussion of this PR, I’ll come back to it.
    2. publishable packages of local project → should require no change for git-only
    3. registry → we shouldn’t need one for git-only case, and as expected, it is also not used in main-multiple case, so I believe we can ignore this one as well for the sake of this discussion
  2. if registry_manifest is provided:
    1. finds the publishable packages by parsing the registry_manifest
      1. in the case of main-multiple, the directories of registry_manifest and the local_manifest are already different. So in git-only case, we have to create an additional temporary directory and check out to given tag/commit prior to calling get_registry_packages(), so that the directory of registry_manifest can be different from local_manifest, and everything can work as usual.

And that’s it for the release-pr phase I guess! The rest should be related to release part, for which, I’ll probably open another PR.

Coming back to our previous discussion, you asked the following:

Does the implementation of this PR recognize the right tag for the right package?

My initial approach has not changed after I read the codebase tbh. The right tag for the right package should be already detected by the existing code. Else, how will main-multiple work correctly in the first place?

So, I think, I should only find the correct cargo.toml to be provided as the registry_manifest, and that’s it. This PR should not be responsible from detecting the right tag for each package.

TL;DR

I was focused on finding the latest released cargo.toml, hence, finding the latest tag only. Your focus was on detecting each latest tag for the each package separately. I think this is the crux of the discussion.


Potential bug in main-multiple

Why I think main-multiple might not be working correctly:

  • I thought I should find the latest tag, and then supply the cargo.toml file of that commit as the registry_manifest. However, your argument convinced me:

in tag b-v1.0.0, the version of package a is also v1.0.0, so nothing changed about a.

I'm not sure about this, maybe people released b, but they didn't release a yet. But this new tag contains unreleased changes of a.

Now, I think using the latest tag’s cargo.toml might be an erroneous approach. But immediately after that, a new question arises:

  • How does it even work for the main-multiple?
    • we are also providing a single cargo.toml file in main-multiple case. And I think your counter-argument is still valid for main-multiple,

      I'm not sure about this, maybe people released b, but they didn't release a yet. But this new tag contains unreleased changes of a.

    • now I started to think main-multiple might not be working correctly 😅

If you prefer to have a short sync/call, I'm all for it. I really like the utility of this crate, and after finishing this PR, I was thinking about submitting other PR's to improve the documentation and possible refactorings that I think you might appreciate.

Best!

@MarcoIeni
Copy link
Owner

Let's have a call, yes. We can even work on this together with pair programming 👍
I sent you a DM on linkedin :)

@MarcoIeni
Copy link
Owner

Summary of our call

get_registry_packages

For the git-only feature, IN THEORY we can't rely on the get_registry_packages function in its current state because when we run cargo metadata (by calling cargo_utils::get_manifest_metadata) we might get unpublished results from certain crates. Instead this function should parse the packages as they were published.

For example, one could:

  • publish crate_a
  • edit the readme field of crate_a Cargo.toml
  • publish crate_b

Now when we run cargo metadata on the latest tag of the repo (crate_b), we parse the Cargo.toml of crate_a in the wrong way.

The solution to this is to git checkout latest_tag_of_each_crate before running cargo metadata.

To avoid performance issues, we could run cargo_metadata inside the directory of the crate, so that we don't parse the entire workspace every time, but this can also be a TODO in the code for this PR 👍

diff algorithm

In the diff algorithm, we need to git checkout to the latest version of each package in the registry_packages temporary directory.
Right now the temporary directory field is unused. Imo we should use it to run git checkout latest_tag before doing the diff (around here)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a "git only" mode
2 participants