Cannot Insert data to table with a partitions in Spark in EMR #4212

Open
Lior-AI opened this issue Sep 2, 2021 · 2 comments

Lior-AI commented Sep 2, 2021

After 42f6982, I can successfully create a table with partitions, but when I try to insert data, the job ends with a success status
while the segment is marked as "Marked for Delete".

I am running:

CREATE TABLE lior_carbon_tests.mark_for_del_bug(
timestamp string,
name string
)
STORED AS carbondata
PARTITIONED BY (dt string, hr string);
INSERT INTO lior_carbon_tests.mark_for_del_bug select '2021-07-07T13:23:56.012+00:00','spark','2021-07-07','13';
select * from lior_carbon_tests.mark_for_del_bug;

gives

+---------+----+---+---+
|timestamp|name| dt| hr|
+---------+----+---+---+
+---------+----+---+---+

And

show segments for TABLE lior_carbon_tests.mark_for_del_bug

gives

+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|ID |Status           |Load Start Time        |Load Time Taken|Partition|Data Size|Index Size|File Format|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S        |NA       |NA       |NA        |columnar_v3|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+

I took a look at the folder structure in S3, and it seems fine.
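The segment status shown in the SHOW SEGMENTS output above can also be checked programmatically. The following is a minimal sketch, not part of the original report: it parses the printed ASCII table and lists every segment whose status is not "Success". The helper names (`parse_ascii_table`, `unsuccessful_segments`) are hypothetical, and the assumption that a healthy segment reports the status string "Success" should be verified against your CarbonData version.

```python
def parse_ascii_table(text):
    """Parse a Spark-style ASCII table (as printed by show()) into a list of dicts."""
    rows = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        # Skip border lines such as +---+----+ and anything that is not a row.
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unsuccessful_segments(rows):
    """Return (ID, Status) for each segment whose status is not 'Success' (assumed label)."""
    return [(r["ID"], r["Status"]) for r in rows if r["Status"] != "Success"]


# Output copied from the report above.
SHOW_SEGMENTS_OUTPUT = """
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|ID |Status           |Load Start Time        |Load Time Taken|Partition|Data Size|Index Size|File Format|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S        |NA       |NA       |NA        |columnar_v3|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
"""

segments = parse_ascii_table(SHOW_SEGMENTS_OUTPUT)
print(unsuccessful_segments(segments))
```

A check like this could run right after each insert, so that a load which reports success but leaves no readable segment is caught early.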

Lior-AI changed the title from "Cannot Insert data to table with a partitions" to "Cannot Insert data to table with a partitions in Spark in EMR" on Sep 2, 2021
Indhumathi27 (Contributor) commented

Hi,
Please check the following:

1. Did any exception occur during the insert? The segment here is Marked for Delete.
2. Does the scenario work fine with a non-partitioned table?


Lior-AI commented Oct 10, 2021

1. No.
These are the logs:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/livy/filecache/48/__spark_libs__3665716770347383703.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/09/29 15:44:36 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 18902@ip-10-4-181-156
21/09/29 15:44:37 INFO SignalUtils: Registered signal handler for TERM
21/09/29 15:44:37 INFO SignalUtils: Registered signal handler for HUP
21/09/29 15:44:37 INFO SignalUtils: Registered signal handler for INT
21/09/29 15:44:37 INFO SecurityManager: Changing view acls to: yarn,livy
21/09/29 15:44:37 INFO SecurityManager: Changing modify acls to: yarn,livy
21/09/29 15:44:37 INFO SecurityManager: Changing view acls groups to: 
21/09/29 15:44:37 INFO SecurityManager: Changing modify acls groups to: 
21/09/29 15:44:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, livy); groups with view permissions: Set(); users  with modify permissions: Set(yarn, livy); groups with modify permissions: Set()
21/09/29 15:44:38 INFO TransportClientFactory: Successfully created connection to ip-10-4-137-125.eu-west-1.compute.internal/10.4.137.125:34545 after 78 ms (0 ms spent in bootstraps)
21/09/29 15:44:38 INFO SecurityManager: Changing view acls to: yarn,livy
21/09/29 15:44:38 INFO SecurityManager: Changing modify acls to: yarn,livy
21/09/29 15:44:38 INFO SecurityManager: Changing view acls groups to: 
21/09/29 15:44:38 INFO SecurityManager: Changing modify acls groups to: 
21/09/29 15:44:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, livy); groups with view permissions: Set(); users  with modify permissions: Set(yarn, livy); groups with modify permissions: Set()
21/09/29 15:44:38 INFO TransportClientFactory: Successfully created connection to ip-10-4-137-125.eu-west-1.compute.internal/10.4.137.125:34545 after 1 ms (0 ms spent in bootstraps)
21/09/29 15:44:38 INFO DiskBlockManager: Created local directory at /mnt2/yarn/usercache/livy/appcache/application_1632902169938_0005/blockmgr-5aa03748-2d6d-4c78-9da5-1ef0e23cc506
21/09/29 15:44:38 INFO DiskBlockManager: Created local directory at /mnt1/yarn/usercache/livy/appcache/application_1632902169938_0005/blockmgr-2dba9cef-1782-4baa-a13f-fe379e090118
21/09/29 15:44:38 INFO DiskBlockManager: Created local directory at /mnt/yarn/usercache/livy/appcache/application_1632902169938_0005/blockmgr-d279178b-8dc9-4319-a64c-1e5bad11fe29
21/09/29 15:44:38 INFO MemoryStore: MemoryStore started with capacity 4.0 GB
21/09/29 15:44:38 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@ip-10-4-137-125.eu-west-1.compute.internal:34545
21/09/29 15:44:38 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
21/09/29 15:44:38 INFO Executor: Starting executor ID 4 on host ip-10-4-181-156.eu-west-1.compute.internal
21/09/29 15:44:38 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38947.
21/09/29 15:44:38 INFO NettyBlockTransferService: Server created on ip-10-4-181-156.eu-west-1.compute.internal:38947
21/09/29 15:44:38 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/09/29 15:44:38 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(4, ip-10-4-181-156.eu-west-1.compute.internal, 38947, None)
21/09/29 15:44:38 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(4, ip-10-4-181-156.eu-west-1.compute.internal, 38947, None)
21/09/29 15:44:38 INFO BlockManager: external shuffle service port = 7337
21/09/29 15:44:38 INFO BlockManager: Registering executor with local external shuffle service.
21/09/29 15:44:38 INFO TransportClientFactory: Successfully created connection to ip-10-4-181-156.eu-west-1.compute.internal/10.4.181.156:7337 after 2 ms (0 ms spent in bootstraps)
21/09/29 15:44:38 INFO BlockManager: Initialized BlockManager: BlockManagerId(4, ip-10-4-181-156.eu-west-1.compute.internal, 38947, None)
21/09/29 15:44:38 INFO Executor: Using REPL class URI: spark://ip-10-4-137-125.eu-west-1.compute.internal:34545/classes
21/09/29 15:44:38 INFO CoarseGrainedExecutorBackend: Got assigned task 1
21/09/29 15:44:38 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
21/09/29 15:44:39 INFO TorrentBroadcast: Started reading broadcast variable 4
21/09/29 15:44:39 INFO TransportClientFactory: Successfully created connection to ip-10-4-137-125.eu-west-1.compute.internal/10.4.137.125:38079 after 5 ms (0 ms spent in bootstraps)
21/09/29 15:44:39 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 74.5 KB, free 4.0 GB)
21/09/29 15:44:39 INFO TorrentBroadcast: Reading broadcast variable 4 took 147 ms
21/09/29 15:44:39 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 187.1 KB, free 4.0 GB)
21/09/29 15:44:40 INFO TransportClientFactory: Successfully created connection to ip-10-4-137-125.eu-west-1.compute.internal/10.4.137.125:34545 after 2 ms (0 ms spent in bootstraps)
21/09/29 15:44:40 INFO CodeGenerator: Code generated in 471.203622 ms
21/09/29 15:44:40 INFO CodeGenerator: Code generated in 33.323995 ms
21/09/29 15:44:40 INFO CodeGenerator: Code generated in 22.954171 ms
21/09/29 15:44:40 INFO CodeGenerator: Code generated in 30.408357 ms
21/09/29 15:44:41 INFO CodeGenerator: Code generated in 81.831165 ms
21/09/29 15:44:41 INFO CoarseGrainedExecutorBackend: eagerFSInit: Eagerly initialized FileSystem at s3://does/not/exist in 2268 ms
21/09/29 15:44:41 INFO SQLConfCommitterProvider: Getting user defined output committer class org.apache.carbondata.hadoop.api.CarbonOutputCommitter
21/09/29 15:44:41 INFO FileOutputCommitter: File Output Committer Algorithm version is 2
21/09/29 15:44:41 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
21/09/29 15:44:41 INFO SQLConfCommitterProvider: Using output committer class org.apache.carbondata.hadoop.api.CarbonOutputCommitter
21/09/29 15:44:41 INFO CodeGenerator: Code generated in 12.314282 ms
21/09/29 15:44:41 INFO CodeGenerator: Code generated in 12.462397 ms
21/09/29 15:44:42 INFO CodeGenerator: Code generated in 42.357031 ms
21/09/29 15:44:42 INFO CarbonProperties: Property file path: /mnt1/yarn/usercache/livy/appcache/application_1632902169938_0005/container_1632902169938_0005_01_000008/../../../conf/carbon.properties
21/09/29 15:44:42 INFO CarbonProperties: ------Using Carbon.properties --------
21/09/29 15:44:42 INFO CarbonProperties: {}
21/09/29 15:44:42 INFO CarbonProperties: Considered file format is: V3
21/09/29 15:44:42 INFO CarbonProperties: Blocklet Size Configured value is "64"
21/09/29 15:44:42 WARN CarbonProperties: The enable mv value "null" is invalid. Using the default value "true"
21/09/29 15:44:42 WARN CarbonProperties: The value "LOCALLOCK" configured for key carbon.lock.type is invalid for current file system. Use the default value HDFSLOCK instead.
21/09/29 15:44:42 INFO CarbonProperties: Considered value for min max byte limit for string is: 200
21/09/29 15:44:42 INFO CarbonProperties: Using default value for carbon.detail.batch.size 100
21/09/29 15:44:42 INFO CarbonDataProcessorUtil: Successfully created dir: /mnt2/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001
21/09/29 15:44:42 INFO CarbonDataProcessorUtil: Successfully created dir: /mnt/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001
21/09/29 15:44:42 INFO CarbonDataProcessorUtil: Successfully created dir: /mnt1/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001
21/09/29 15:44:42 INFO DataLoadExecutor: Data Loading is started for table mark_for_del_bug4
21/09/29 15:44:42 INFO CarbonDataProcessorUtil: Successfully created dir: /mnt2/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001/Fact/Part0/Segment_0/100100000100001
21/09/29 15:44:42 INFO CarbonDataProcessorUtil: Successfully created dir: /mnt/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001/Fact/Part0/Segment_0/100100000100001
21/09/29 15:44:42 INFO CarbonDataProcessorUtil: Successfully created dir: /mnt1/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001/Fact/Part0/Segment_0/100100000100001
21/09/29 15:44:42 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
21/09/29 15:44:42 INFO AbstractFactDataWriter: Total file size: 1073741824 and dataBlock Size: 966367642
21/09/29 15:44:42 INFO AbstractFactDataWriter: Carbondata will write temporary fact data to local disk.
21/09/29 15:44:42 INFO CarbonFactDataWriterImplV3: Sort Scope : NO_SORT
21/09/29 15:44:43 INFO AbstractFactDataWriter: Randomly choose factdata temp location: /mnt1/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001/Fact/Part0/Segment_0/100100000100001
21/09/29 15:44:43 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
21/09/29 15:44:43 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
21/09/29 15:44:43 WARN UnsafeMemoryManager: It is not recommended to set off-heap working memory size less than 512MB, so setting default value to 512
21/09/29 15:44:43 INFO UnsafeMemoryManager: Off-heap Working Memory manager is created with size 536870912 with OFFHEAP
21/09/29 15:44:43 INFO CarbonFactDataWriterImplV3: Number of Pages for blocklet is: 1 :Rows Added: 1
21/09/29 15:44:43 INFO CarbonUtil: Copying /mnt1/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001/Fact/Part0/Segment_0/100100000100001/part-0-100100000100001_batchno0-0-0-1632930271956.snappy.carbondata to s3a://coralogix-bigicecream/CarbonDataTests/bla2.db/mark_for_del_bug4/dt=2021-07-07/hr=13, operation id 1632930283187
21/09/29 15:44:43 INFO CarbonUtil: Total copy time is 235 ms, operation id 1632930283187
21/09/29 15:44:43 INFO AbstractFactDataWriter: Randomly choose index file location: /mnt2/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001/Fact/Part0/Segment_0/100100000100001
21/09/29 15:44:43 INFO CarbonUtil: Copying /mnt2/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001/Fact/Part0/Segment_0/100100000100001/100100000100001_batchno0-0-0-1632930271956.carbonindex to s3a://coralogix-bigicecream/CarbonDataTests/bla2.db/mark_for_del_bug4/dt=2021-07-07/hr=13/0_1632930271956.tmp, operation id 1632930283434
21/09/29 15:44:43 INFO CarbonUtil: Total copy time is 244 ms, operation id 1632930283434
21/09/29 15:44:43 INFO AbstractDataLoadProcessorStep: Total rows processed in step Data Writer: 1
21/09/29 15:44:43 INFO AbstractDataLoadProcessorStep: Total rows processed in step Input Processor: 1
21/09/29 15:44:43 INFO CarbonTableOutputFormat: Closed writer task attempt_20210929154434_0001_m_000000_1
21/09/29 15:44:43 INFO CarbonLoaderUtil: Deleted the local store location: /mnt2/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001:/mnt/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001:/mnt1/yarn/usercache/livy/appcache/application_1632902169938_0005/carbon15ae7277d0784335b1396e8f8687ff55_100100000100001 : Time taken: 2
21/09/29 15:44:43 INFO SparkHadoopMapRedUtil: No need to commit output of task because needsTaskCommit=false: attempt_20210929154434_0001_m_000000_1
21/09/29 15:44:43 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 2799 bytes result sent to driver
2. Yes, when the table doesn't have partitions, it works fine.
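To answer the exception question for logs like the ones above without reading them line by line, a small filter can help. The sketch below is illustrative only: the function name `suspicious_lines` and the regular expression are assumptions based on the line format visible in this thread, and the sample lines are copied from the log above. It keeps lines at WARN level or higher, plus any line whose message mentions "Exception".

```python
import re

# Matches log lines such as "21/09/29 15:44:42 WARN CarbonProperties: message".
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<logger>[\w.$]+): (?P<msg>.*)$"
)


def suspicious_lines(log_text, levels=("WARN", "ERROR", "FATAL")):
    """Return (level, logger, message) for lines worth a closer look."""
    hits = []
    for line in log_text.splitlines():
        m = LOG_LINE.match(line.strip())
        if m and (m.group("level") in levels or "Exception" in m.group("msg")):
            hits.append((m.group("level"), m.group("logger"), m.group("msg")))
    return hits


# Three lines copied from the log posted in this thread.
SAMPLE = """\
21/09/29 15:44:42 INFO CarbonProperties: Considered file format is: V3
21/09/29 15:44:42 WARN CarbonProperties: The value "LOCALLOCK" configured for key carbon.lock.type is invalid for current file system. Use the default value HDFSLOCK instead.
21/09/29 15:44:43 INFO AbstractDataLoadProcessorStep: Total rows processed in step Data Writer: 1
"""

for level, logger, msg in suspicious_lines(SAMPLE):
    print(level, logger, msg)
```

The log posted here starts with CoarseGrainedExecutorBackend, so it appears to come from an executor. Running the same filter over the driver log as well may be worthwhile, since a failure outside the executor task would not show up in this output.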
