Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COPY FROM does not work on all files inside folder #15834

Closed
jphjsoares opened this issue Apr 11, 2024 · 10 comments
Closed

COPY FROM does not work on all files inside folder #15834

jphjsoares opened this issue Apr 11, 2024 · 10 comments

Comments

@jphjsoares
Copy link

jphjsoares commented Apr 11, 2024

CrateDB version

4.6

CrateDB setup information

Number of nodes: 1

Problem description

According to crate documentation, the following syntax is used to import all files inside a directory: COPY quotes FROM '/tmp/import_data/qu*.json'; but unfortunately I was unable to use that query when trying to import files that were exported by cratedb as well.
When instead of providing a directory with *, a filename is specified, the COPY works as expected

Steps to Reproduce

  1. Export data from a table I want to copy, into a directory, using COPY TO DIRECTORY
  2. Import that data into a new table, with the same schema using COPY FROM

Actual Result

io.crate.exceptions.SQLParseException: Illegal char <*> at index 9: /test/*
	at io.crate.exceptions.SQLExceptions.esToCrateException(SQLExceptions.java:137)
	at io.crate.exceptions.SQLExceptions.prepareForClientTransmission(SQLExceptions.java:126)
	at io.crate.rest.action.SqlHttpHandler.sendResponse(SqlHttpHandler.java:161)
	at io.crate.rest.action.SqlHttpHandler.lambda$channelRead0$0(SqlHttpHandler.java:117)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883)
	at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2321)
	at io.crate.rest.action.SqlHttpHandler.channelRead0(SqlHttpHandler.java:115)
	at io.crate.rest.action.SqlHttpHandler.channelRead0(SqlHttpHandler.java:78)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.crate.protocols.http.HttpBlobHandler.channelRead0(HttpBlobHandler.java:166)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.crate.auth.HttpAuthUpstreamHandler.handleHttpChunk(HttpAuthUpstreamHandler.java:135)
	at io.crate.auth.HttpAuthUpstreamHandler.channelRead0(HttpAuthUpstreamHandler.java:84)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at org.elasticsearch.http.netty4.cors.Netty4CorsHandler.channelRead(Netty4CorsHandler.java:85)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.base/java.lang.Thread.run(Thread.java:831)
Caused by: java.nio.file.InvalidPathException: Illegal char <*> at index 9: /test/*
	at java.base/sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
	at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)
	at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
	at java.base/sun.nio.fs.WindowsPath.parse(WindowsPath.java:92)
	at java.base/sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:230)
	at java.base/java.nio.file.Path.of(Path.java:147)
	at java.base/java.nio.file.Paths.get(Paths.java:69)
	at io.crate.execution.engine.collect.files.FileReadingIterator.toURI(FileReadingIterator.java:350)
	at io.crate.execution.engine.collect.files.FileReadingIterator.getUrisWithGlob(FileReadingIterator.java:311)
	at io.crate.execution.engine.collect.files.FileReadingIterator.<init>(FileReadingIterator.java:105)
	at io.crate.execution.engine.collect.files.FileReadingIterator.newInstance(FileReadingIterator.java:132)
	at io.crate.execution.engine.collect.sources.FileCollectSource.getIterator(FileCollectSource.java:80)
	at io.crate.execution.engine.collect.sources.ProjectorSetupCollectSource.getIterator(ProjectorSetupCollectSource.java:49)
	at io.crate.execution.engine.collect.MapSideDataCollectOperation.createIterator(MapSideDataCollectOperation.java:58)
	at io.crate.execution.engine.collect.CollectTask.start(CollectTask.java:162)
	at io.crate.execution.jobs.RootTask.start(RootTask.java:191)
	at io.crate.execution.engine.JobLauncher.setupTasks(JobLauncher.java:240)
	at io.crate.execution.engine.JobLauncher.execute(JobLauncher.java:153)
	at io.crate.planner.statement.CopyFromPlan.executeOrFail(CopyFromPlan.java:123)
	at io.crate.planner.Plan.execute(Plan.java:73)
	at io.crate.action.sql.Session.singleExec(Session.java:678)
	at io.crate.action.sql.Session.exec(Session.java:541)
	at io.crate.action.sql.Session.triggerDeferredExecutions(Session.java:515)
	at io.crate.action.sql.Session.sync(Session.java:499)
	at io.crate.rest.action.SqlHttpHandler.executeSimpleRequest(SqlHttpHandler.java:266)
	at io.crate.rest.action.SqlHttpHandler.handleSQLRequest(SqlHttpHandler.java:205)
	at io.crate.rest.action.SqlHttpHandler.channelRead0(SqlHttpHandler.java:114)
	... 49 more

Expected Result

All the data should be imported correctly from all files and the * should work as expectedly.

@jphjsoares jphjsoares added the triage An issue that needs to be triaged by a maintainer label Apr 11, 2024
@BaurzhanSakhariev
Copy link
Contributor

BaurzhanSakhariev commented Apr 12, 2024

@BaurzhanSakhariev BaurzhanSakhariev added bug Clear identification of incorrect behaviour and removed triage An issue that needs to be triaged by a maintainer labels Apr 12, 2024
@BaurzhanSakhariev
Copy link
Contributor

Hi @jphjsoares, thanks for reporting! Looks like Windows specific bug.

@jphjsoares
Copy link
Author

I forgot to mention it, but I'm also unable to copy all the data from one table to the other by using the good and old: INSERT INTO table2 (SELECT * FROM table1). The command will copy the shards, but Crate reports 0 total records. Does that also have to do with this bug or should I create a separate issue?

@matriv
Copy link
Contributor

matriv commented Apr 12, 2024

I forgot to mention it, but I'm also unable to copy all the data from one table to the other by using the good and old: INSERT INTO table2 (SELECT * FROM table1). The command will copy the shards, but Crate reports 0 total records. Does that also have to do with this bug or should I create a separate issue?

Yes please, with details of SHOW CREATE TABLE for both tables and any error you get in the logs.
Keep in mind, that before opening a concrete github issue, you can also use our community portal: https://community.cratedb.com/

Keep in mind we have an open issue to address this scenario of INSERT INTO ... SELECT where no error is returned, just number of inserted rows: #12218

@jphjsoares

This comment was marked as off-topic.

@BaurzhanSakhariev

This comment was marked as off-topic.

@jphjsoares

This comment was marked as off-topic.

@BaurzhanSakhariev
Copy link
Contributor

Hi @jphjsoares, I could reproduce on a Windows machine but the problem can be resolved by applying a tip from the docs

Tip
If you are using Microsoft Windows, you must include the drive letter in the file URI.
For example:
file://C:/tmp/import_data/quotes.json

copy quotes from 'file:///C:/full/path/te*.json' ; worked for me locally. Could you please try it?

@BaurzhanSakhariev BaurzhanSakhariev added needs info or feedback and removed bug Clear identification of incorrect behaviour labels Apr 18, 2024
@BaurzhanSakhariev
Copy link
Contributor

ftr: We do have tests covering such scenario and we even used to run Windows tests on CI in the past.
Tests passes on Windows since we don't use path as a string but compute it like getResource("test_file").toURI()... and this gets resolved to a correct path with drive, as hinted in docs.

@BaurzhanSakhariev
Copy link
Contributor

Closing this since I have tested documented hint and it worked on a Windows machine (see comment above).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants