Insertion of duplicated data

Hi. 

I have a cluster of 4 machines with ClickHouse installed, and I want to create a replicated and distributed table across them. I want to be able to run a query on all the machines, so I set up the following configuration:
- 4 shards
- 2 replicas per shard

![image](https://user-images.githubusercontent.com/23501646/33842398-b402b1f8-de9a-11e7-97ec-5c54f7981c20.png)


The config file is:

[config.txt](https://github.com/yandex/ClickHouse/files/1548498/config.txt)



**I have a ZooKeeper cluster installed on three of the machines:**

![image](https://user-images.githubusercontent.com/23501646/33842373-9f51413e-de9a-11e7-939d-a4dd8529e592.png)
     


**CREATING THE MERGETREE TABLE**

This is the table definition that I run on all the servers:
```sql
CREATE TABLE TEST_DB.Test_Table (
  Md5 FixedString(32),
  InsertionDate Date,
  EventDatetime DateTime,
  Field1 UInt8,
  Field2 UInt8
) ENGINE = MergeTree(InsertionDate, (Md5, InsertionDate), 8192)
```

**Then, on clusch04, I create the distributed table:**
```sql
CREATE TABLE TEST_DB.Test_Table_dist AS Test_Table ENGINE = Distributed(tlevents, TEST_DB, Test_Table, rand())
```
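For reference, the ClickHouse documentation describes how a Distributed table picks the shard for each inserted row. It takes the sharding expression (here `rand()`) modulo the total shard weight and sends the row to the shard whose weight interval contains the remainder. The Python sketch below models only that rule. It assumes 4 shards with the default weight of 1. `pick_shard` and `distribute` are hypothetical helper names, not ClickHouse APIs.

```python
import random
from collections import Counter


def pick_shard(key: int, weights: list[int]) -> int:
    """Return the 0-based shard index for one sharding-key value.

    Models the documented rule: remainder = key % total_weight, and the row goes
    to the shard whose half-open interval [prev_weights, prev_weights + weight)
    contains that remainder.
    """
    remainder = key % sum(weights)
    prev = 0
    for index, weight in enumerate(weights):
        if prev <= remainder < prev + weight:
            return index
        prev += weight
    raise AssertionError("unreachable: remainder is always < total weight")


def distribute(n_rows: int, weights: list[int], seed: int = 0) -> list[int]:
    """Simulate inserting n_rows with a rand()-style key; return rows per shard."""
    rng = random.Random(seed)
    # Assumption: ClickHouse rand() yields a UInt32, emulated here with 32 random bits.
    counts = Counter(pick_shard(rng.getrandbits(32), weights) for _ in range(n_rows))
    return [counts.get(i, 0) for i in range(len(weights))]


per_shard = distribute(100, [1, 1, 1, 1])  # hypothetical: 4 shards, weight 1 each
print(per_shard, sum(per_shard))  # the sum is always 100
```

Under this model every row lands on exactly one shard, so the per-shard counts always add up to the number of inserted rows (100 here). The sketch does not model replication. According to the docs, when `internal_replication` is false (the default), the Distributed table also writes each row to every replica of its shard.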


As a result, I get this:
![image](https://user-images.githubusercontent.com/23501646/33841352-c849c758-de97-11e7-82e4-4e0afd381c30.png)

**INSERTING DATA:**
I have a file with 100 unique rows that I am going to insert with clickhouse-client.
```
wc -l TestFile.txt
100  202 7299 TestFile.txt
```

[TestFile.txt](https://github.com/yandex/ClickHouse/files/1548511/TestFile.txt)
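To rule out duplicates in the input itself, the file can be checked independently of ClickHouse. Below is a minimal Python sketch. It assumes TestFile.txt is plain CSV with the Md5 value in the first column, matching the column order of Test_Table. `summarize_csv` is a hypothetical helper name.

```python
import csv
from typing import Iterable


def summarize_csv(lines: Iterable[str]) -> dict:
    """Count total rows, distinct full rows and distinct Md5 values (first CSV column)."""
    # csv.reader yields [] for blank lines; skip them so they are not counted as rows.
    rows = [tuple(row) for row in csv.reader(lines) if row]
    return {
        "rows": len(rows),
        "distinct_rows": len(set(rows)),
        "distinct_md5": len({row[0] for row in rows}),
    }


# Usage (assumes TestFile.txt is in the current directory):
# with open("TestFile.txt", newline="") as f:
#     print(summarize_csv(f))
```

If all three numbers are 100, the file contains no duplicated rows or keys, so the extra rows are not coming from the input file.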

Then I import the file into the distributed table with clickhouse-client:
```
clickhouse-client --database="TEST_DB" --query="INSERT INTO TEST_DB.Test_Table_dist FORMAT CSV" < TestFile.txt
```


If I run COUNT(*) on the distributed table:

![image](https://user-images.githubusercontent.com/23501646/33842114-ec39b45a-de99-11e7-99f3-ddfcc80db3be.png)

If I run COUNT(*) on the MergeTree table:
![image](https://user-images.githubusercontent.com/23501646/33842212-331965e6-de9a-11e7-982f-162b843587af.png)

**Why do I now have 175 rows when I inserted only 100?**

If I run the query on all the servers:
![image](https://user-images.githubusercontent.com/23501646/33842267-56785eac-de9a-11e7-8207-87dff04cdeca.png)


- Is the data really replicated? How can I check this?
- It seems that the data is well distributed.
- **But I have duplicated data: the file has 100 rows, but the client inserted 175 rows!**

Am I doing something wrong? If so, where: in the configuration? Or is this a bug in ClickHouse?

If you need more information, do not hesitate to tell me. 

Regards. 

Insertion of duplicated Data #1621