
[Bug]: the datanodes is not WRITABLE #3173

Open
corffee opened this issue Mar 5, 2024 · 10 comments
Labels
bug Something isn't working

corffee commented Mar 5, 2024

Contact Details

No response

Is there an existing issue for this?

  • I have searched all the existing issues

Priority

fatal

Environment

- CubeFS version: v3.3
- Deployment mode(docker or standalone or cluster):standalone 
- Dependent components:
- OS kernel version(Ubuntu or CentOS):CentOS
- CPU/Memory:
- Others:

Current Behavior

When I use CubeFS, creating a volume fails. Checking the node status shows that the datanodes are not writable.

./cfs-cli datanode list
[Data nodes]
ID ADDRESS WRITABLE STATUS
6 172.16.1.101:17310 No Active
7 172.16.1.102:17310 No Active
8 172.16.1.103:17310 No Active
9 172.16.1.104:17310 No Active

Expected Behavior

No response

Steps To Reproduce

No response

CubeFS Log

more dataNode_error.log
2024/03/05 10:19:19.348249 [ERROR] repl_protocol.go:410: id[Req(16)_Partition(18)_Extent(0)_ExtentOffset(0)_KernelOffset(0)_Size(30302)_Opcode(OpWrite)_CRC(119113069)_ResultMesg(IntraGroupNetErr)] isPrimaryBackReplLeader[true] remote[172.16.1.101:42490], err[ActionPreparePkt_addExtentInfo partition 18 GetAvailableTinyExtent error no available extent]
2024/03/05 10:19:19.558907 [ERROR] repl_protocol.go:410: id[Req(51)_Partition(21)_Extent(0)_ExtentOffset(0)_KernelOffset(0)_Size(3456)_Opcode(OpWrite)_CRC(1759042440)_ResultMesg(IntraGroupNetErr)] isPrimaryBackReplLeader[true] remote[172.16.1.101:42496], err[ActionPreparePkt_addExtentInfo partition 21 GetAvailableTinyExtent error no available extent]
2024/03/05 10:19:19.625427 [ERROR] repl_protocol.go:410: id[Req(84)_Partition(28)_Extent(0)_ExtentOffset(0)_KernelOffset(0)_Size(3455)_Opcode(OpWrite)_CRC(1451489064)_ResultMesg(IntraGroupNetErr)] isPrimaryBackReplLeader[true] remote[172.16.1.101:42492], err[ActionPreparePkt_addExtentInfo partition 28 GetAvailableTinyExtent error no available extent]

Anything else? (Additional Context)

No response

@corffee corffee added the bug Something isn't working label Mar 5, 2024
hooklee2000 (Contributor) commented Mar 5, 2024

Maybe the size of your hard disk is less than the RETAIN size.

datanode_disks | string array | Format: PATH:RETAIN. PATH: disk mount path. RETAIN: the minimum reserved space under this path; the disk is considered full if the remaining space is less than this value. Unit: bytes. (Recommended value: 20G~50G)
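To make the PATH:RETAIN format concrete, here is a minimal Go sketch (not CubeFS code; parseDiskEntry is a hypothetical helper written for illustration) that splits a disks entry and reports the reserved space:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseDiskEntry splits a "PATH:RETAIN" disk entry into the mount path
// and the reserved space in bytes, per the format described above.
func parseDiskEntry(entry string) (path string, retain uint64, err error) {
	i := strings.LastIndex(entry, ":")
	if i < 0 {
		return "", 0, fmt.Errorf("missing RETAIN in %q", entry)
	}
	retain, err = strconv.ParseUint(entry[i+1:], 10, 64)
	return entry[:i], retain, err
}

func main() {
	// The entry from the reporter's config, discussed later in the thread.
	path, retain, err := parseDiskEntry("/home/aspire/cubefs/data/data1/disk:1000000000")
	if err != nil {
		panic(err)
	}
	// 1000000000 bytes is roughly 0.93 GiB reserved, far below the
	// recommended 20G~50G.
	fmt.Printf("path=%s retain=%.2f GiB\n", path, float64(retain)/(1<<30))
}
```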

corffee (Author) commented Mar 6, 2024

bin/cfs-cli cluster info
[Cluster]
Cluster name : cfs_dev
Master leader : 172.16.1.102:17010
Master-1 : 172.16.1.101:17010
Master-2 : 172.16.1.102:17010
Master-3 : 172.16.1.103:17010
Auto allocate : Enabled
MetaNode count : 4
MetaNode used : 0 GB
MetaNode total : 54 GB
DataNode count : 4
DataNode used : 15 GB
DataNode total : 49 GB
Volume count : 1
Allow Mp Decomm : Enabled
EbsAddr :
LoadFactor : 0
BatchCount : 0
MarkDeleteRate : 0
DeleteWorkerSleepMs: 0
AutoRepairRate : 0
MaxDpCntLimit : 3000

tail -f dataNode_error.log
2024/03/06 09:58:46.253073 [ERROR] partition_raft.go:144: [FATAL] stop raft partition(58)
2024/03/06 09:58:46.263252 [ERROR] partition_raft.go:144: [FATAL] stop raft partition(41)
2024/03/06 09:58:46.273050 [ERROR] partition_raft.go:144: [FATAL] stop raft partition(56)
2024/03/06 09:58:46.279611 [ERROR] partition_raft.go:144: [FATAL] stop raft partition(45)
2024/03/06 09:58:46.289317 [ERROR] partition_raft.go:144: [FATAL] stop raft partition(52)
2024/03/06 09:58:46.292559 [ERROR] partition_raft.go:144: [FATAL] stop raft partition(42)
2024/03/06 09:58:46.302727 [ERROR] partition_raft.go:144: [FATAL] stop raft partition(46)

NaturalSelect (Collaborator)
@corffee Please show the output of cfs-cli datanode info [IP].

corffee (Author) commented Mar 6, 2024

cfs-cli datanode info [IP].

[root@worker1 ~]# bin/cfs-cli datanode info 172.16.1.101:17310
[Data node info]
ID : 6
Address : 172.16.1.101:17310
Carry : 0.6645600532184904
Allocated ratio : 0.3187813141076219
Allocated : 3.95 GB
Available : 8.43 GB
Total : 12.38 GB
Zone : default
IsActive : Active
Report time : 2024-03-06 12:00:37
Partition count : 8
Bad disks : []
Persist partitions : [8 10 2 5 9 6 4 1]
[root@worker1 ~]# /bin/cfs-cli datanode info 172.16.1.102:17310
[Data node info]
ID : 7
Address : 172.16.1.102:17310
Carry : 0.4377141871869802
Allocated ratio : 0.3238310221713745
Allocated : 4.01 GB
Available : 8.37 GB
Total : 12.38 GB
Zone : default
IsActive : Active
Report time : 2024-03-06 12:00:49
Partition count : 7
Bad disks : []
Persist partitions : [8 10 2 6 3 4 7]
[root@worker1 ~]# /bin/cfs-cli datanode info 172.16.1.103:17310
[Data node info]
ID : 8
Address : 172.16.1.103:17310
Carry : 0.4246374970712657
Allocated ratio : 0.3238310221713745
Allocated : 4.01 GB
Available : 8.37 GB
Total : 12.38 GB
Zone : default
IsActive : Active
Report time : 2024-03-06 12:00:55
Partition count : 8
Bad disks : []
Persist partitions : [10 2 5 9 6 3 7 1]

NaturalSelect (Collaborator)


Please show the configuration of the datanode.

NaturalSelect (Collaborator)

By the way, please also show the output of cfs-cli vol [VOLUME] info -d.

corffee (Author) commented Mar 6, 2024

Please show the configuration of datanode.

[root@worker1 data]# more conf/data1.conf
{
  "role": "datanode",
  "listen": "17310",
  "localIP": "172.16.1.101",
  "bindIp": "true",
  "raftHeartbeat": "17330",
  "raftReplica": "17340",
  "raftDir": "/home/aspire/cubefs/data/data1/raftlog/datanode",
  "logDir": "/home/aspire/cubefs/data/data1/logs",
  "warnLogDir": "/home/aspire/cubefs/data/data1/logs",
  "logLevel": "debug",
  "disks": [
    "/home/aspire/cubefs/data/data1/disk:1000000000"
  ],
  "enableSmuxConnPool": "true",
  "masterAddr": [
    "172.16.1.101:17010", "172.16.1.102:17010", "172.16.1.103:17010"
  ]
}

/bin/cfs-cli vol info objtest
Summary:
ID : 10
Name : objtest
Owner : obj
Authenticate : Disabled
Capacity : 1 GB
Create time : 2024-03-05 10:17:21
DeleteLockTime : 0
Cross zone : Disabled
DefaultPriority : false
Dentry count : 1
Description :
DpCnt : 10
DpReplicaNum : 3
Follower read : Disabled
Inode count : 2
Max metaPartition ID : 3
MpCnt : 3
MpReplicaNum : 3
NeedToLowerReplica : Disabled
RwDpCnt : 10
Status : Normal
ZoneName : default
VolType : 0
DpReadOnlyWhenVolFull : false
Transaction Mask : off
Transaction timeout : 1
Tx conflict retry num : 10
Tx conflict retry interval(ms) : 20
Tx limit interval(s) : 0
DisableAuditLog : false
Quota : Disabled

morphes1995 (Contributor)

Was the dir /home/aspire/cubefs/data/data1/disk created? @corffee

corffee (Author) commented Mar 7, 2024

Was the dir /home/aspire/cubefs/data/data1/disk created? @corffee

Yes, the folder already exists. And restarting all services does not resolve the issue.

hooklee2000 (Contributor) commented Mar 9, 2024

The available space of a data node must be more than 10 GB; check the code in master/data_node.go:

func (dataNode *DataNode) isWriteAble() (ok bool) {
	dataNode.RLock()
	defer dataNode.RUnlock()

	if dataNode.isActive && dataNode.AvailableSpace > 10*util.GB && !dataNode.RdOnly {
		ok = true
	}

	return
}
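As a self-contained illustration (a simplified copy of the check above, without the node lock), plugging in the Available value reported for node 6 shows why every node here is marked not writable:

```go
package main

import "fmt"

const GB = 1 << 30 // stands in for util.GB

// isWriteAble is a simplified copy of the master-side check quoted above:
// the node must be active, not read-only, and have more than 10 GB free.
func isWriteAble(isActive, rdOnly bool, availableSpace uint64) bool {
	return isActive && availableSpace > 10*GB && !rdOnly
}

func main() {
	// Node 6 reports "Available : 8.43 GB" in the cfs-cli output above.
	availGB := 8.43
	avail := uint64(availGB * GB)
	fmt.Println(isWriteAble(true, false, avail)) // prints "false": 8.43 GB is below the 10 GB threshold
}
```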

ID : 6
Address : 172.16.1.101:17310
Carry : 0.6645600532184904
Allocated ratio : 0.3187813141076219
Allocated : 3.95 GB
Available : 8.43 GB # less than 10 GB
Total : 12.38 GB
