Conversation

@idryomov
Contributor

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@idryomov
Contributor Author

No related failures:

https://pulpito.ceph.com/dis-2024-07-27_20:04:43-rbd-wip-dis-testing-distro-default-smithi/
https://pulpito.ceph.com/dis-2024-07-28_14:24:41-rbd-wip-dis-testing-distro-default-smithi/

(I suspect that there is still something wrong with built-in LUKS on EC because of how https://pulpito.ceph.com/dis-2024-07-28_14:24:41-rbd-wip-dis-testing-distro-default-smithi/7822183 and https://pulpito.ceph.com/dis-2024-07-28_14:24:41-rbd-wip-dis-testing-distro-default-smithi/7822192 failed in the rerun. It's reproducible, but very unlikely to have anything to do with this PR -- I will keep trying to pinpoint it in the background.)

idryomov added 2 commits July 30, 2024 17:24
Dead code after return and an unused variable.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add missing spaces, don't use the word "stream" when reporting errors
on POSIX file operations (open() and lseek64()), and fix a cut-and-paste
typo in RawSnapshot.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
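
A generic sketch of the kind of wording fix described above (function name and path handling are illustrative, not the actual librbd code): for POSIX calls such as open(), the error message should name the file or descriptor rather than a "stream".

#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <iostream>

// Illustrative only: report open() failures against the file, not a "stream".
int open_data_file(const char* path) {
  int fd = ::open(path, O_RDONLY);
  if (fd < 0) {
    int err = errno;  // capture errno before any other call can clobber it
    std::cerr << "failed to open file " << path << ": " << std::strerror(err)
              << std::endl;
    return -err;
  }
  return fd;
}
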
@idryomov idryomov force-pushed the wip-native-format-dispatch branch from 8e9edf0 to 6e927fc on July 30, 2024 15:26
@idryomov idryomov marked this pull request as ready for review July 30, 2024 15:29
@idryomov idryomov requested a review from a team as a code owner July 30, 2024 15:29
@idryomov
Contributor Author

idryomov commented Jul 30, 2024

Enhanced commit messages and made a couple of tweaks. No functional changes.

@idryomov
Contributor Author

-- The CXX compiler identification is Clang 14.0.0
-- The C compiler identification is Clang 14.0.0
-- The ASM compiler identification is Clang with GNU-like command-line
-- Found assembler: /usr/bin/clang-14
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Check for working CXX compiler: /usr/bin/clang++-14
-- Check for working CXX compiler: /usr/bin/clang++-14 - broken
CMake Error at /usr/share/cmake-3.22/Modules/CMakeTestCXXCompiler.cmake:62 (message):
  The C++ compiler

    "/usr/bin/clang++-14"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /home/jenkins-build/build/workspace/ceph-pull-requests/build/CMakeFiles/CMakeTmp
    
    Run Build Command(s):/usr/bin/ninja cmTC_ee667 && [1/2] Building CXX object CMakeFiles/cmTC_ee667.dir/testCXXCompiler.cxx.o
    [2/2] Linking CXX executable cmTC_ee667
    FAILED: cmTC_ee667 
    : && /usr/bin/clang++-14   CMakeFiles/cmTC_ee667.dir/testCXXCompiler.cxx.o -o cmTC_ee667   && :
    /usr/bin/ld: cannot find -lstdc++: No such file or directory
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    ninja: build stopped: subcommand failed.

@idryomov
Contributor Author

jenkins test make check

Contributor

@ajarr ajarr left a comment

I had the following question when trying to understand the surrounding code.

Does the first parameter in OpenSourceImageRequest's constructor and create methods refer to the destination IoCtx? If so, maybe it'd have been better to name it dst_ioctx instead of just io_ctx, similar to dst_image_ctx?

@ajarr
Contributor

ajarr commented Jul 30, 2024

The code looks good except for the one minor comment.

@idryomov
Contributor Author

Does the first parameter in OpenSourceImageRequest's constructor and create methods refer to the destination IoCtx?

Yes.

If so, maybe it'd have been better to name it dst_ioctx instead of just io_ctx, similar to dst_image_ctx?

Until support for migrating from external clusters is added, IoCtx instances are interchangeable in the sense that they can be freely duped/cloned from one another because they belong to the same Rados instance, so it's not a big deal. I'll make a note to rename it then.
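
For context, a minimal librados sketch of the point above (pool name and function are made up): IoCtx instances created from the same Rados handle share the cluster connection, so one can simply be copied from another.

#include <rados/librados.hpp>

// Minimal sketch: both IoCtx instances below are backed by the same Rados
// (cluster) handle, so one can simply be duped/cloned from the other.
int make_io_contexts(librados::Rados& cluster) {
  librados::IoCtx dst_io_ctx;
  int r = cluster.ioctx_create("dst-pool", dst_io_ctx);  // hypothetical pool
  if (r < 0) {
    return r;
  }

  // The "source" IoCtx starts out as a plain copy of the destination one;
  // this only works because both belong to the same Rados instance.
  librados::IoCtx src_io_ctx = dst_io_ctx;
  return 0;
}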

idryomov added 7 commits July 30, 2024 23:01
For now, this is just slightly clearer.  The distinction would become
important with planned support for migrating from external clusters.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In preparation for divorcing NativeFormat from FormatInterface and
changing when/how src_image_ctx is created, make parse_source_spec()
independent of src_image_ctx.  The "invalid source-spec JSON" error is
duplicated by the "failed to parse migration source-spec" error, so
just get rid of the former to spare having to pass CephContext to
parse_source_spec().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
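
For reference, roughly what a "type: native" source-spec looks like as a C++ string literal. Key names follow the RBD live-migration documentation; the pool/image/snap values are placeholders and the exact set of optional keys may vary by release.

// Illustrative source-spec only -- names are placeholders.
const char* kNativeSourceSpec = R"({
    "type": "native",
    "pool_name": "rbd",
    "image_name": "source-image",
    "snap_name": "migration-snap"
})";
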
A Rados instance is sufficient to map the pool name to the pool ID,
no need to involve an IoCtx instance as well.  While at it, report
distinct errors for a non-existent pool and for an invalid JSON value
for the pool_name key.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
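
A minimal sketch of the lookup described above, using only a Rados handle (the wrapper function is made up; librados::Rados::pool_lookup() is the relevant call):

#include <string>
#include <rados/librados.hpp>

// Resolve a pool name to a pool ID with just a Rados handle -- no IoCtx
// needed.  pool_lookup() returns the pool ID on success or a negative errno
// (e.g. -ENOENT for a non-existent pool).
int64_t lookup_pool_id(librados::Rados& rados, const std::string& pool_name) {
  return rados.pool_lookup(pool_name.c_str());
}
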
In preparation for not instantiating NativeFormat and losing a copy of
the source spec JSON object in m_json_object, refactor the parsing code
to use only const methods (which std::map's operator[] isn't) and local
variables where possible.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
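
A generic illustration of the const-correctness point, using a plain std::map with string values rather than the actual JSON types used in librbd: operator[] is non-const and inserts a default-constructed value for a missing key, so read-only parsing code should use find() instead.

#include <map>
#include <string>

// Const-safe lookup: find() never mutates the map, unlike operator[].
bool get_value(const std::map<std::string, std::string>& object,
               const std::string& key, std::string* value) {
  auto it = object.find(key);
  if (it == object.end()) {
    return false;
  }
  *value = it->second;
  return true;
}
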
Trying to shoehorn NativeFormat under FormatInterface doesn't really
work.  It fundamentally doesn't fit in:

- Unlike for RawFormat and QCOWFormat, src_image_ctx for NativeFormat
  is not dummy -- it's an ImageCtx for a real RBD image.  Pre-creating
  it in OpenSourceImageRequest with the expectation that placeholder
  values would be overridden later forces NativeFormat to reach into
  ImageCtx guts, duplicating the logic in the constructor.  This also
  necessitates calling snap_set() in a separate step, since snap_id
  isn't known at the time ImageCtx is created.

- Unlike for RawFormat and QCOWFormat, get_image_size() and
  get_snapshots() implementations for NativeFormat are dummy.

- read() and list_snaps() implementations for NativeFormat are
  inconsistent: read() passes through the io::ImageDispatch layer, but
  list_snaps() doesn't.  Both could simply pass through, meaning that in
  essence these are also dummy.

All of this is with today's code.  Additional complications arise with
planned support for migrating from external clusters, where src_image_ctx
would require more invasive patching to "move" to an IoCtx belonging to
an external cluster's CephContext, and also with other work.

With the above in mind, NativeFormat actually consists of:

1. Code that parses the "type: native" source spec
2. Code that patches ImageCtx, working around the fact that it's
   pre-created in OpenSourceImageRequest
3. A bunch of dummy implementations for FormatInterface

With this change, (1) is wrapped into a static method that also creates
ImageCtx after all required parameters are known and (2) and (3) go away
entirely.  NativeFormat no longer implements FormatInterface and doesn't
get instantiated at all.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
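
A self-contained sketch of the shape described above, with placeholder types and names rather than the actual librbd declarations: a single static entry point parses the spec and constructs the image context only once every required parameter is known.

#include <cerrno>
#include <map>
#include <string>

// Placeholder stand-in for the real ImageCtx.
struct FakeImageCtx {
  std::string image_name;
  std::string snap_name;
};

struct NativeFormatSketch {
  // Parse first, then create -- no pre-created placeholder to patch up and
  // no separate snap_set() step, since everything is known up front.
  static int create_image_ctx(const std::map<std::string, std::string>& spec,
                              FakeImageCtx** out_image_ctx) {
    auto name_it = spec.find("image_name");
    if (name_it == spec.end()) {
      return -EINVAL;  // required key missing
    }
    auto snap_it = spec.find("snap_name");

    *out_image_ctx = new FakeImageCtx{
        name_it->second,
        snap_it != spec.end() ? snap_it->second : std::string()};
    return 0;
  }
};
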
Currently, on errors in FormatInterface::open(), RawFormat disposes
of src_image_ctx, but QCOWFormat doesn't, which is a leak.  Rather than
having each format do it internally, do it in OpenSourceImageRequest.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
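
A small sketch of the centralized-cleanup idea (names and callbacks are illustrative, not the librbd code): the request that created src_image_ctx releases it when open() fails, so individual formats don't each have to remember to.

#include <functional>

struct OpenSourceImageSketch {
  std::function<void()> destroy_src_image_ctx;  // disposes of src_image_ctx
  std::function<void(int)> finish;              // completes the request

  void handle_open(int r) {
    if (r < 0) {
      // One cleanup path for every format, instead of RawFormat cleaning up
      // internally while QCOWFormat leaked.
      destroy_src_image_ctx();
      finish(r);
      return;
    }
    finish(0);
  }
};
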
Now that NativeFormat is handled via dispatch, FormatInterface::read()
can be void again for consistency with FormatInterface::list_snaps().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov idryomov force-pushed the wip-native-format-dispatch branch from 6e927fc to b6c7f69 on July 30, 2024 21:11
@idryomov
Contributor Author

Until support for migrating from external clusters is added, IoCtx instances are interchangeable in the sense that they can be freely duped/cloned from one another because they belong to the same Rados instance, so it's not a big deal. I'll make a note to rename it then.

I realized that NativeFormat would need the same, and as it's being effectively rewritten here, I went ahead and did the rename in this PR.

@ajarr
Contributor

ajarr commented Jul 30, 2024

LGTM. Issues with the make check job?

@idryomov
Contributor Author

-- Found yaml-cpp: /usr/lib/x86_64-linux-gnu/libyaml-cpp.so (found suitable version "0.7.0", minimum required is "0.5.1") 
CMake Error at /usr/lib/x86_64-linux-gnu/cmake/protobuf/protobuf-targets.cmake:111 (message):
  The imported target "protobuf::protoc" references the file

     "/usr/bin/protoc-25.1.0"

  but this file does not exist.  Possible reasons include:

  * The file was deleted, renamed, or moved to another location.

  * An install or uninstall procedure did not complete successfully.

  * The installation package was faulty and contained

     "/usr/lib/x86_64-linux-gnu/cmake/protobuf/protobuf-targets.cmake"

  but not all the files it references.

Call Stack (most recent call first):
  /usr/lib/x86_64-linux-gnu/cmake/protobuf/protobuf-config.cmake:14 (include)
  src/CMakeLists.txt:384 (_find_package)
  src/seastar/cmake/SeastarDependencies.cmake:162 (find_package)
  src/seastar/CMakeLists.txt:399 (seastar_find_dependencies)

@idryomov
Contributor Author

jenkins test make check

@idryomov
Contributor Author

[1403/2759] Linking CXX executable bin/unittest-seastar-buffer
FATAL: command execution failed
java.nio.channels.ClosedChannelException
	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:241)
	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:221)
	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:825)
	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:289)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:177)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:279)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:501)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:244)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:196)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:209)
	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:793)
	at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:172)
	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:343)
	at hudson.remoting.Channel.close(Channel.java:1497)
	at hudson.remoting.Channel.close(Channel.java:1450)
	at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:949)
	at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:823)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
	at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused: java.io.IOException: Backing channel 'JNLP4-connect connection from 8.43.84.3/8.43.84.3:37808' is disconnected.
	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:215)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
	at jdk.proxy2/jdk.proxy2.$Proxy197.isAlive(Unknown Source)
	at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1212)
	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1204)
	at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:195)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:145)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
	at hudson.model.Build$BuildExecution.build(Build.java:199)
	at hudson.model.Build$BuildExecution.doRun(Build.java:164)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
	at hudson.model.Run.execute(Run.java:1894)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
	at hudson.model.ResourceController.execute(ResourceController.java:101)
	at hudson.model.Executor.run(Executor.java:442)
FATAL: Unable to delete script file /tmp/jenkins6012929160526823645.sh
java.nio.channels.ClosedChannelException
	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:241)
	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:221)
	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:825)
	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:289)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:177)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:279)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:501)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:244)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:196)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:209)
	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:793)
	at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:172)
	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:343)
	at hudson.remoting.Channel.close(Channel.java:1497)
	at hudson.remoting.Channel.close(Channel.java:1450)
	at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:949)
	at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:823)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
	at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@a1c8c42:JNLP4-connect connection from 8.43.84.3/8.43.84.3:37808": Remote call on JNLP4-connect connection from 8.43.84.3/8.43.84.3:37808 failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:996)
	at hudson.FilePath.act(FilePath.java:1226)
	at hudson.FilePath.act(FilePath.java:1215)
	at hudson.FilePath.delete(FilePath.java:1762)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:163)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
	at hudson.model.Build$BuildExecution.build(Build.java:199)
	at hudson.model.Build$BuildExecution.doRun(Build.java:164)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
	at hudson.model.Run.execute(Run.java:1894)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
	at hudson.model.ResourceController.execute(ResourceController.java:101)
	at hudson.model.Executor.run(Executor.java:442)
Build step 'Execute shell' marked build as failure

@idryomov
Contributor Author

jenkins test make check
