Redocly CLI Hangs When Running In A Container #1592

Open
its-hammer-time opened this issue Jun 18, 2024 · 9 comments
Labels
Type: Bug Something isn't working

Comments

its-hammer-time commented Jun 18, 2024

Describe the bug

We're using the Redocly CLI to bundle our OpenAPI specs, but for some reason it seems to hang randomly. We believe this started with Docker image redocly/cli:1.13.0, but we're not 100% certain. For now we've pinned ourselves to 1.12.0.

For context, the CLI works when I install it directly with NPM or use the docker pull ... && docker run ... commands found in your documentation. However, if I intentionally use the desktop-linux builder in Docker Hub, that's when it ends up hanging.

Unfortunately, it doesn't look like there's any way to enable debug logs with the CLI, so I can't tell whether it's pausing on something related to our OpenAPI spec or whether it's a genuine CLI issue. That makes it hard to determine where the hang is coming from.

To Reproduce
To reproduce this issue on my ARM Mac, I ran the following:

  1. Dockerfile content:
FROM redocly/cli AS bundle-spec
WORKDIR /spec
COPY . .
RUN redocly bundle --remove-unused-components --dereferenced openapi.yaml -o dist/bundled_openapi.yaml
  2. export DOCKER_BUILDKIT=1
  3. docker buildx create --name linuxbuilder --use
  4. docker buildx inspect linuxbuilder --bootstrap
  5. docker buildx build --platform linux/amd64 -t your-image-name .

Expected behavior

The spec should bundle successfully or at least error out with some sort of reason why it failed.

Logs

After running the steps above, I can see this. Notice that this step has already been running for 481 seconds (and climbing):

 => [bundle-spec 4/8] RUN redocly bundle --remove-unused-components --dereferenced openapi.yaml -o dist/bundled_openapi.yaml            481.5s
 => => # (node:1) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
 => => # (Use `node --trace-deprecation ...` to show where the warning was created)
 => => # bundling openapi.yaml...

OpenAPI description

We are using OpenAPI 3.1.0. I'm not sure if I can post my spec here so I will try to find a test example that reproduces this issue. If I do, I will post it as a comment below.

Redocly Version(s)

For the test scenario above, I'm pulling the latest image, which I believe is 1.16.0.

Node.js Version(s)

Using the Docker image provided by you

Additional context

N/A

its-hammer-time added the "Type: Bug" label on Jun 18, 2024
its-hammer-time changed the title from "Redocly CLI Hangs When Running In" to "Redocly CLI Hangs When Running In A Container" on Jun 18, 2024
Contributor

tatomyr commented Jun 18, 2024

Hi @its-hammer-time! Could you check whether it also hangs outside the docker container (when installed globally with npm install -g @redocly/cli and run via redocly bundle --remove-unused-components --dereferenced openapi.yaml -o dist/bundled_openapi.yaml)?

Author

its-hammer-time commented Jun 19, 2024

Hey @tatomyr, it seems to work when I run it in a "Mac" environment. What I mean by that is that NPM and the Docker CLI both work, but as soon as I try to build it with the desktop-linux setup I described above, it hangs.

I was doing some testing yesterday with my own spec and it seems to be stemming from a portion of our OpenAPI spec. Essentially, I commented out all of our paths and slowly uncommented them to see if it was potentially coming from a circular dependency that wouldn't resolve when the environment changed. Again, this is just a theory, but maybe something in the OS integration differs so that Mac works but Linux doesn't, or something along those lines.

Anyways, I narrowed it down to a few endpoints, but what was interesting is that it would work if my paths were ordered in a certain way, and break if I moved an endpoint around. I'm still trying to dig into this, but my theory is that the internal $ref resolution mechanism is breaking under a very specific case. It would be great if there was a way to enable debug logs on the CLI, but for now I'll keep playing around with it.

For example, this does NOT work:

/endpoint/A
/endpoint/B
/endpoint/C
/endpoint/D
/endpoint/E

But this does?

/endpoint/A
/endpoint/B
/endpoint/D
/endpoint/E

/endpoint/C

@its-hammer-time
Author

Okay, this is very interesting. I was able to isolate a failure case for two endpoints like the following:

# This does NOT work
/endpoint/A
/endpoint/B

# This works
/endpoint/B
/endpoint/A

I then iteratively removed pieces all the way down endpoint A to see if it was maybe a $ref issue like I explained above. However, what I found is that it's a single minimum value on an integer property that breaks the bundler. I'm honestly not quite sure why, but here's the schema that I edited. As long as I comment out that minimum field, it works?

  day:
    description: The day of the month to run the report schedule.
    examples:
      - 23
    type:
      - integer
      - "null"
    minimum: -1
    maximum: 31

I'm going to try and create an example spec that reproduces this issue so we have something to work off of together. I know it can be hard to debug these types of issues when you don't get an example spec.

Author

its-hammer-time commented Jun 19, 2024

Okay, I'm starting to believe the minimum piece I mentioned above was a red herring. After I found that, I wanted to isolate the two endpoints, so I commented out all of our components as well, and all of a sudden it started working with the minimum field. I slowly added components back in until it failed again, at which point I ran the bundler multiple times. Depending on how "complex" the spec is (i.e. how much I uncomment), I'm getting inconsistent results.

For example, with ~20% of the spec, I get build issues maybe 2/10 times, but if I filter it down to ~10% of my spec I don't get any build issues. Then with 100% of the spec I get build issues every time. For context, I'm just running the following command over and over after making a change in my openapi.yaml file. It's the same as in my original post, but I added the --no-cache arg so I can re-run the build.

docker buildx build --no-cache --platform linux/amd64 -t your-image-name .

With that said, it seems like this may be a race condition or a resource constraint. I noticed that there's a different bug ticket regarding CPU constraints which you then linked to a memory issue.

Contributor

tatomyr commented Jun 19, 2024

I don't think the issue you've mentioned is related to your case. The memory issue crops up with the build-docs command (which runs the React renderer) while you're using bundle.

I hope you'll be able to come up with some repro because it's indeed very hard to figure out what's wrong without it.

Author

its-hammer-time commented Jun 19, 2024

Summary

PLEASE READ: Refer to "Don't Specify Platform To Docker Build" below as I imagine this is the real culprit

Okay, I was able to get a small working example from our existing spec "working"... in other words, it's broken 😄. I'll put my steps below on how I reproduce it as well, but what's funny is that the minimum: -1 value is still a culprit in it breaking. If I comment that out or even change the value to 0, it works all of a sudden. I'm not sure why that would break the bundler, but that's what I'm noticing.

example.zip

System Setup

  • 2021, 14-inch, MacBook Pro
  • 64 GB memory
  • macOS 14.5
  • Docker version: 26.0.0

Ensure you're using default as your builder

I'm fairly certain this builder is provided by docker so I imagine everyone has it? If not, let me know, but here's how to use it:

# Lists the builders you currently have, ensure default is an option
docker context list

# Check your current builder and write it down somewhere in case you need to swap back
docker context inspect

# Swap to default
docker context use default

Note that I also tried using the desktop-linux builder, but I got the same results.

Build The Image

From now on, I essentially just re-run this command whenever I want to "test" the container build. Note that we're using the no cache arg so we can ensure a full build every time.

docker build --no-cache --platform linux/amd64 -t your-image-name .
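
The pass/fail tallies in the test cases can be collected with a small loop instead of re-running the build by hand. This is only a sketch: `count_passes` is a hypothetical helper name, it assumes a POSIX shell with the `timeout` utility available (GNU coreutils; not installed by default on macOS), and any time limit you pass is an arbitrary cut-off for deciding that a build has hung.

```shell
# count_passes RUNS LIMIT_SECONDS COMMAND [ARGS...]
# Runs COMMAND RUNS times, kills any run that exceeds LIMIT_SECONDS,
# and prints how many runs exited successfully.
# The body runs in a subshell so its variables do not leak into the caller.
count_passes() (
  runs=$1
  limit=$2
  shift 2
  passed=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # `timeout` exits non-zero when the limit is hit, so a hang counts as a failure.
    if timeout "$limit" "$@" >/dev/null 2>&1; then
      passed=$((passed + 1))
    fi
    i=$((i + 1))
  done
  echo "${passed}/${runs} builds passed"
)
```

For example, `count_passes 10 600 docker build --no-cache --platform linux/amd64 -t your-image-name .` would run ten uncached builds and treat any build that takes longer than ten minutes as a failure.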

Test Cases

I've also noticed the following changes to the spec seem to resolve the issue. At this point, I'm honestly not sure why it's behaving like this. The "fixes" seem random to me, but hopefully you will have more insight as to what may be happening.

Don't Use -1 Minimum Value

As mentioned above, in open_api_spec/components/schema/report_schedule.yaml, there's a property called day that has minimum: -1 set. If you comment that out or change the value to 0 it starts to work all of a sudden.

No Changes: 0% success rate (0/10 builds passed)

Using 0: 100% success rate (10/10 builds passed)

Commented Out: 100% success rate (10/10 builds passed)

Don't Include Campaign Schema

In open_api_spec/example.yaml, under components/schemas there's a field called Campaign. If you comment this out, it appears to start working even with the minimum still set to -1?

No Changes: 0% success rate (0/10 builds passed)

Campaign Commented Out: 100% success rate (10/10 builds passed)

Don't Specify Platform To Docker Build

I imagine this is actually the real culprit. The M1 Macs are based on ARM and I'm requesting a build for linux/amd64, so Docker has to perform some magic to get it working. If I don't specify the platform and let Docker use its default platform, the CLI actually works. Assuming this is related to the platform, it's very strange that minor things like changing a minimum value would make things work 🤔

However, my company's build systems build for Linux since we deploy to K8s pods, which are linux/amd64.

docker build --no-cache -t your-image-name .
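
Whether a given build runs natively or under emulation can be checked by comparing the host architecture with the requested platform. The sketch below assumes a POSIX shell; `docker_platform_for` and `is_emulated` are hypothetical helper names, and only the two architectures discussed in this issue are mapped.

```shell
# docker_platform_for MACHINE
# Maps a `uname -m` machine name to the matching Docker platform string.
docker_platform_for() {
  case "$1" in
    arm64 | aarch64) echo "linux/arm64" ;;
    x86_64 | amd64)  echo "linux/amd64" ;;
    *)               echo "unknown" ;;
  esac
}

# is_emulated TARGET_PLATFORM [MACHINE]
# Succeeds when TARGET_PLATFORM differs from the host's native platform.
# MACHINE defaults to the output of `uname -m`; unmapped hosts are treated as non-native.
is_emulated() (
  native=$(docker_platform_for "${2:-$(uname -m)}")
  [ "$1" != "$native" ]
)
```

On an Apple Silicon Mac, where `uname -m` reports `arm64`, `is_emulated linux/amd64` succeeds, which matches the case that hangs here, while `is_emulated linux/arm64` fails, which matches the case that works.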

Contributor

tatomyr commented Jun 19, 2024

Thank you! I'll review the example a bit later.

Author

its-hammer-time commented Jun 19, 2024

Using the example.zip above, I was able to confirm that using redocly/cli:1.12.0 works whereas redocly/cli:1.13.0 does not! Perhaps a dependency changed between these two versions and doesn't support the amd64 platform on ARM? Looking at the release notes, it does look like 1.13.0 upgraded @redocly/openapi-core.

https://github.com/Redocly/redocly-cli/releases/tag/%40redocly%2Fcli%401.13.0
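
To compare more image tags side by side, the same bundle command can be generated per tag. This is a sketch under assumptions: `bundle_cmd_for` and `print_version_matrix` are hypothetical helper names, the spec is assumed to be mounted at /spec (the same path the Dockerfile above uses as its WORKDIR), and the commands are only printed, not executed. Running them through `docker run` rather than `docker build` may not reproduce the hang.

```shell
# bundle_cmd_for TAG
# Prints (does not run) a docker command that bundles openapi.yaml
# with the given redocly/cli image tag on the linux/amd64 platform.
bundle_cmd_for() {
  printf 'docker run --rm --platform linux/amd64 -v "$PWD:/spec" redocly/cli:%s bundle --remove-unused-components --dereferenced openapi.yaml -o dist/bundled_openapi.yaml\n' "$1"
}

# print_version_matrix TAG [TAG...]
# Prints one bundle command per tag so each version can be tried in turn.
print_version_matrix() {
  for tag in "$@"; do
    bundle_cmd_for "$tag"
  done
}
```

Calling `print_version_matrix 1.12.0 1.13.0` prints one command per version; each line can then be pasted into a terminal from the directory that contains openapi.yaml.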

@its-hammer-time
Author

Hey @tatomyr, any update on this?

Projects
None yet
Development

No branches or pull requests

2 participants