
dubbo service repeat register to registry #7124

Closed
Donaldhan opened this issue Jan 22, 2021 · 57 comments
Labels
type/bug Bugs to being fixed

Comments

@Donaldhan commented Jan 22, 2021

Environment

  • Dubbo version: 2.7.8
  • Operating System version: win10
  • Java version: 1.8

Information

The service is registered repeatedly. I am using dubbo 2.7.8 with spring-boot 2.3.0.RELEASE. My pom:

    <dependency>
        <groupId>org.apache.dubbo</groupId>
        <artifactId>dubbo-spring-boot-starter</artifactId>
        <version>${dubbo.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.dubbo</groupId>
        <artifactId>dubbo-dependencies-zookeeper</artifactId>
        <version>${dubbo.version}</version>
        <type>pom</type>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

The dubbo service:

@DubboService(version = "1.0.0", validation = "true", dynamic = true)
public class AccountRemoteServiceImpl implements AccountRemoteService {
    @Autowired
    AccountBizService accountBizService;

    @Override
    public Boolean debit(String userId, BigDecimal money) {
        return  accountBizService.debit(userId, money);
    }
}

Inspecting the service nodes in zk (screenshot omitted): the IP addresses are the same.

Judging from #4213, the same problem exists there. That issue said it was a matter of dynamic: the default false registers a persistent path, while true registers an ephemeral path. But on 2.7.8 dynamic defaults to true and it still doesn't help.

The @DubboService annotation:

```java
/**
 * Whether the service is dynamic, default value is true
 */
boolean dynamic() default true;
```

Debugging result (screenshot omitted):

These are ephemeral nodes, which zk deletes automatically once the connection is closed. This shouldn't be happening!

Our other setup, dubbo 2.6.9 with dubbo-spring-boot-starter 0.2.1.RELEASE and spring-boot 2.1..RELEASE, has the same duplicate-registration problem.

pom

<!--0.2.1.RELEASE start-->
        <dependency>
            <groupId>com.alibaba.boot</groupId>
            <artifactId>dubbo-spring-boot-starter</artifactId>
            <version>${alibaba.boot.dubbo.springboot.starter}</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>dubbo</artifactId>
            <version>${dubbo.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.curator</groupId>
            <artifactId>curator-framework</artifactId>
            <version>2.8.0</version>
            <exclusions>
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.curator</groupId>
            <artifactId>curator-recipes</artifactId>
            <version>2.8.0</version>
        </dependency>
        <dependency>
            <groupId>io.netty</groupId>
            <artifactId>netty-all</artifactId>
            <version>4.1.15.Final</version>
        </dependency>
        <!--0.2.1.RELEASE end-->

The dubbo service:

@Service(version = "1.0.0", validation = "true", dynamic = true)
public class AccountRemoteServiceImpl implements AccountRemoteService {
    @Autowired
    AccountBizService accountBizService;

    @Override
    public Boolean debit(String userId, BigDecimal money) {
        return  accountBizService.debit(userId, money);
    }
}

However, the @Service annotation defaults dynamic to false:

public @interface Service {
...
 boolean dynamic() default false;
...
}

Even with dynamic = false the duplicates are still there.

So judging from these two cases, this has nothing to do with dynamic.

That is one problem. Separately, I'd like to ask: what is the timestamp in the service node path used for?

/dubbo/com.story.home.cherry.api.domain.account.service.AccountRemoteService/providers/dubbo%3A%2F%2F10.242.214.156%3A20880%2Fcom.story.home.cherry.api.domain.account.service.AccountRemoteService%3Fanyhost%3Dtrue%26application%3Dcomosus-cherry%26deprecated%3Dfalse%26dubbo%3D2.0.2%26dynamic%3Dtrue%26generic%3Dfalse%26interface%3Dcom.story.home.cherry.api.domain.account.service.AccountRemoteService%26metadata-type%3Dremote%26methods%3Ddebit%26pid%3D14688%26release%3D2.7.8%26retries%3D0%26revision%3D1.0.0%26service.filter%3DelapsedTimeFilter%26side%3Dprovider%26threads%3D400%26timeout%3D6000000%26timestamp%3D1611295886073%26version%3D1.0.0

If the timestamp were removed, wouldn't only one node be kept? But that is not the root cause.

@Donaldhan (Author) commented Jan 22, 2021

#4213 did not actually solve the problem, or maybe I misunderstood it.

@Donaldhan (Author)

The reply in #4213 says:

Recently we have kept receiving issue reports from the community about duplicate URL data in the registry (mostly zookeeper) that cannot be deleted after upgrading to 2.7.1. The following issues show a few typical symptoms: #3785 #3770 #3920 #4013

After investigation we located the cause: in 2.7.1 the URL is registered in zookeeper as a persistent node (2.7.0 and earlier registered ephemeral nodes), so when the server process terminates abnormally and cannot go through the normal graceful shutdown flow (a crash, a forced kill -9, and so on), the stale persistent nodes in zookeeper cannot be cleaned up, which eventually leads to dirty data.

Fixes and workarounds:

Avoid force-killing processes when taking services up or down; make sure Dubbo goes through its graceful shutdown flow.
The official 2.7.2 release will fix this, expected in early June.
For dirty data that already exists, clean it up manually or with a ZK script, but be careful to apply strict filter conditions.
On 2.7.1, add the configuration <dubbo:provider dynamic="true"/> or <dubbo:service dynamic="true"/>.

We are on 2.7.8 now, and it is still not fixed?

@Donaldhan (Author)

Judging from ZooKeeper's ephemeral-node lifecycle and the official documentation (Nodes+and+ephemeral+nodes), this is clearly not just a dynamic problem:

ZooKeeper also has the notion of ephemeral nodes. These znodes exists as long as the session that created the znode is active. When the session ends the znode is deleted. Ephemeral nodes are useful when you want to implement [tbd].

When the session ends, the ephemeral node should be deleted.

@Donaldhan (Author)

A test of ephemeral nodes shows no problem:

@Slf4j
public class CuratorZookeeperClientTest {
   private static CuratorZookeeperClient zookeeperClient = null;
    /**
     *
     */
    @BeforeClass
    public static void initClient(){
        URL url = new URL("zookeeper","127.0.0.1", 2181);
        zookeeperClient = new CuratorZookeeperClient(url);
        log.info("zookeeperClient init done");

    }
    /**
     *
     */
    @AfterClass
    public static void closeClient(){
        if(!ObjectUtils.isEmpty(zookeeperClient)){
            zookeeperClient.close();
            log.info("zookeeperClient close done");
        }
    }

    /**
     * https://zookeeper.apache.org/doc/r3.2.2/zookeeperOver.html#Nodes+and+ephemeral+nodes
     */
    @Test
    public void testEphemeral(){
        zookeeperClient.createEphemeral("/cherry");
        log.info("zookeeperClient createEphemeral /cherry");
    }
}

@Donaldhan (Author)

Could it be that there was no graceful shutdown? I start the service in IDEA and then stop it.

@Donaldhan (Author)

Damn! With a graceful shutdown it is OK; the problem is solved:

$ curl -X POST localhost:8888/actuator/shutdown
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    35    0    35    0     0     56      0 --:--:-- --:--:-- --:--:--    58{"message":"Shutting down, bye..."}

For how to shut down gracefully, see https://www.baeldung.com/spring-boot-shutdown.

But with dubbo-spring-boot-starter 0.1.0 you don't need to care about this point.

@Donaldhan Donaldhan reopened this Jan 22, 2021
@Donaldhan (Author)

Why is it that with dubbo-spring-boot-starter 0.1.0 you don't need to care about this point?


@Donaldhan (Author)

Why is it that with dubbo-spring-boot-starter 0.1.0 you don't need to care about this point?

Still waiting for an answer.

@xiaoheng1 (Contributor)

Could you provide an example that reproduces this reliably?

@Donaldhan (Author)

Could you provide an example that reproduces this reliably?

There are three versions involved: 2.7.8, 2.6.9, and 2.5.x. Which one do you need? 2.5.x has no problem; for the other two, only a graceful shutdown avoids duplicate registration of the same IP.

@xiaoheng1 (Contributor)

Please provide a 2.7.8 demo.

@Donaldhan (Author)

Spring Boot version: 2.3.0.RELEASE

Properties:

<properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <spring.boot.version>2.3.0.RELEASE</spring.boot.version>
        <dubbo.version>2.7.8</dubbo.version>
</properties>

pom

<!-- dubbo 2.7.8 start -->
    <dependency>
        <groupId>org.apache.dubbo</groupId>
        <artifactId>dubbo-spring-boot-starter</artifactId>
        <version>${dubbo.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.dubbo</groupId>
        <artifactId>dubbo-dependencies-zookeeper</artifactId>
        <version>${dubbo.version}</version>
        <type>pom</type>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
<!-- dubbo 2.7.8 end -->

The dubbo service:

public interface AccountRemoteService {
    /**
     * Debit money from the user's account
     */
    Boolean debit(String userId, BigDecimal money);
}

@Slf4j
@DubboService(version = "1.0.0", validation = "true", dynamic = true)
public class AccountRemoteServiceImpl implements AccountRemoteService {
    @Autowired
    AccountBizService accountBizService;

    @Override
    public Boolean debit(String userId, BigDecimal money) {
        return  accountBizService.debit(userId, money);
    }
}

Application

@SpringBootApplication
public class Application
{
    public static void main(String[] args)
    {
        SpringApplication.run(Application.class, args);
    }

    @Bean
    public RegistryConfig dubboRegistry() {
        RegistryConfig registry = new RegistryConfig();
        registry.setAddress("zookeeper://127.0.0.1:2181");
        return registry;
    }
}

dubbo config

dubbo.config.multiple=false
dubbo.application.id=home-cherry
dubbo.application.name=home-cherry
dubbo.scan.basePackages=com.home.cherry.dubbo.provider
# ProtocolConfig
dubbo.protocol.id=home-cherry-protocol
dubbo.protocol.name=dubbo
dubbo.protocol.port=20880
dubbo.protocol.threads=400
dubbo.service.shutdown.wait=10000
dubbo.provider.timeout=6000000
dubbo.provider.retries=0
dubbo.protocol.dynamic=true
# ConsumerConfig
dubbo.consumer.client=netty
dubbo.consumer.filter=elapsedTimeFilter
dubbo.consumer.timeout=60000000
dubbo.consumer.retries=0

For the AccountRemoteServiceImpl implementation you can leave the body empty and give it a try.

@Donaldhan
Copy link
Author

@xiaoheng1

@xiaoheng1 (Contributor)

OK, I'll take a look.

@Donaldhan (Author)

Thanks.

@AlbumenJ AlbumenJ added the type/bug Bugs to being fixed label Jan 25, 2021
@xiaoheng1 (Contributor)

May I ask, did you do anything else after starting? In my test there is only one node.

@xiaoheng1 (Contributor)

@Donaldhan

@dkisser commented Jan 26, 2021

After a non-graceful shutdown the ephemeral node is not deleted immediately; it is only removed once the zookeeper session heartbeat times out. If you register again before zookeeper deletes the stale node, you get two identical nodes (with different timestamps). This does not affect service invocation.

@Donaldhan (Author)

May I ask, did you do anything else after starting? In my test there is only one node.

Just start a single node: start it, watch the zk service registration nodes, do nothing else, then stop the application with IDEA's stop button, restart it, and watch the zk registration nodes again.

@Donaldhan (Author)

The result: duplicate registration.

@Donaldhan (Author)

After a non-graceful shutdown the ephemeral node is not deleted immediately; it is only removed once the zookeeper session heartbeat times out. If you register again before zookeeper deletes the stale node, you get two identical nodes (with different timestamps). This does not affect service invocation.

  1. Try this: start and stop the application on day one, then start it again the next day. You will find the zk service node still has not been deleted.
  2. The timestamps are indeed different. Since re-registration uses a new timestamp, it is a different node, so it will certainly not overwrite the old one:
public static void appendRuntimeParameters(Map<String, String> map) {
    map.put(DUBBO_VERSION_KEY, Version.getProtocolVersion());
    map.put(RELEASE_KEY, Version.getVersion());
    map.put(TIMESTAMP_KEY, String.valueOf(System.currentTimeMillis()));
    if (ConfigUtils.getPid() > 0) {
        map.put(PID_KEY, String.valueOf(ConfigUtils.getPid()));
    }
}
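To illustrate why the timestamp matters here, a minimal sketch (hypothetical path-building helper, not Dubbo's actual code): since the znode name is derived from the full provider URL, parameters included, two exports of the same service from the same host still produce distinct znode paths whenever their export timestamps differ.

```java
public class TimestampPathDemo {
    // Hypothetical sketch: the znode name is the URL-encoded provider URL,
    // and appendRuntimeParameters stamps that URL with
    // System.currentTimeMillis() at export time.
    static String providerPath(String iface, String host, long timestamp) {
        return "/dubbo/" + iface + "/providers/dubbo://" + host + "/" + iface
                + "?timestamp=" + timestamp;
    }

    public static void main(String[] args) {
        // Same interface, same host and port -- only the export timestamp differs.
        String firstStart = providerPath("com.example.AccountRemoteService",
                "10.0.0.1:20880", 1611295886073L);
        String restart = providerPath("com.example.AccountRemoteService",
                "10.0.0.1:20880", 1611295999999L);
        // Distinct paths, so a new registration cannot overwrite the stale
        // node left behind by a non-graceful shutdown.
        System.out.println(firstStart.equals(restart)); // false
    }
}
```

This is why a stale node can coexist with the fresh one until the old session's ephemeral nodes are actually expired by zk.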

@xiaoheng1 (Contributor)

OK.

@dkisser commented Jan 26, 2021

#7124 (comment)
Then your session timeout is set too long. Your debugging already reached the point where the ephemeral node is created, which means the expired-node cleanup mechanism is misbehaving; you need to check the zookeeper configuration.

@Donaldhan (Author)

The default is 60s:

protected int DEFAULT_SESSION_TIMEOUT_MS = 60 * 1000;
public CuratorZookeeperClient(URL url) {
        super(url);
        try {
            int timeout = url.getParameter(TIMEOUT_KEY, DEFAULT_CONNECTION_TIMEOUT_MS);
            int sessionExpireMs = url.getParameter(ZK_SESSION_EXPIRE_KEY, DEFAULT_SESSION_TIMEOUT_MS);
            CuratorFrameworkFactory.Builder builder = CuratorFrameworkFactory.builder()
                    .connectString(url.getBackupAddress())
                    .retryPolicy(new RetryNTimes(1, 1000))
                    .connectionTimeoutMs(timeout)
                    .sessionTimeoutMs(sessionExpireMs);
            String authority = url.getAuthority();
            if (authority != null && authority.length() > 0) {
                builder = builder.authorization("digest", authority.getBytes());
            }
            client = builder.build();
            client.getConnectionStateListenable().addListener(new CuratorConnectionStateListener(url));
            client.start();
            boolean connected = client.blockUntilConnected(timeout, TimeUnit.MILLISECONDS);
            if (!connected) {
                throw new IllegalStateException("zookeeper not connected");
            }
        } catch (Exception e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
    }

@Donaldhan (Author)

@dkisser

@Donaldhan (Author) commented Jan 26, 2021

Check the zookeeper configuration; cleaning up expired nodes is zookeeper's job and has nothing to do with your client.

When zk's session tracker SessionTrackerImpl detects that a client session has timed out, it closes the session and cleans up the session's ephemeral nodes (they could not possibly go uncleaned for two whole days). The session timeout is the 60s set by the client.

//org.apache.zookeeper.server.DataTree#killSession

    /**
     * Clear the session's ephemeral nodes
     * @param session
     * @param zxid
     */
    void killSession(long session, long zxid) {
        // the list is already removed from the ephemerals
        // so we do not have to worry about synchronizing on
        // the list. This is only called from FinalRequestProcessor
        // so there is no need for synchronization. The list is not
        // changed here. Only create and delete change the list which
        // are again called from FinalRequestProcessor in sequence.
        Set<String> list = ephemerals.remove(session);
        if (list != null) {
            for (String path : list) {
                try {
                    deleteNode(path, zxid);
                    if (LOG.isDebugEnabled()) {
                        LOG
                                .debug("Deleting ephemeral node " + path
                                        + " for session 0x"
                                        + Long.toHexString(session));
                    }
                } catch (NoNodeException e) {
                    LOG.warn("Ignoring NoNodeException for path " + path
                            + " while removing ephemeral for dead session 0x"
                            + Long.toHexString(session));
                }
            }
        }
    }

@Donaldhan (Author) commented Jan 26, 2021

From debugging, the timeout is 3 seconds (screenshot omitted).

But looking at the zk nodes, the sessions are all the same one (screenshots omitted).

Amazing!

@Donaldhan (Author) commented Jan 27, 2021

The node from Jan 26 is still there on Jan 27 (screenshot omitted).

@Donaldhan (Author)

(screenshot omitted)

@xiaoheng1 (Contributor)

Please paste your zk configuration file so we can take a look.

@Donaldhan (Author)

Today I suddenly noticed that at zk startup the maximum and minimum session timeouts were reported as -1 (screenshot omitted); I want to check whether that is the problem.

zk session timeout negotiation:

int sessionTimeout = connReq.getTimeOut();
byte passwd[] = connReq.getPasswd();
int minSessionTimeout = getMinSessionTimeout();
if (sessionTimeout < minSessionTimeout) {
    sessionTimeout = minSessionTimeout;
}
int maxSessionTimeout = getMaxSessionTimeout();
if (sessionTimeout > maxSessionTimeout) {
    sessionTimeout = maxSessionTimeout;
}
cnxn.setSessionTimeout(sessionTimeout);

The negotiated value is the client's sessionTimeout clamped by the minSessionTimeout and maxSessionTimeout configured in zk, so I set both minSessionTimeout and maxSessionTimeout in the zk configuration to 60s:

# The number of milliseconds of each tick
tickTime=20000000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
minSessionTimeout=60000
maxSessionTimeout=60000
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

Restart again, register the service, shut it down, and check the zk nodes again (screenshots omitted): well over 60s after the client was shut down, the node still has not been removed.

Looking at zk's documentation on session timeouts (screenshot omitted):
https://zookeeper.apache.org/doc/r3.4.5/zookeeperProgrammers.html#ch_zkSessions

It has nothing to do with zk.

@Donaldhan Donaldhan reopened this Jan 27, 2021
@xiaoheng1 (Contributor)

tickTime=20000000 ("the number of milliseconds of each tick") causes the sessionTimeout configured in dubbo to be ignored.

The sessionTimeout configured in dubbo must fall within the range 2 * tickTime to 20 * tickTime.
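To illustrate the clamping described above, a minimal sketch (hypothetical helper, not ZooKeeper's actual code; note the bounds can also be overridden via minSessionTimeout/maxSessionTimeout in zoo.cfg):

```java
public class SessionTimeoutClampDemo {
    // Sketch of the negotiation: unless overridden in zoo.cfg, the server
    // bounds the requested timeout by 2 * tickTime and 20 * tickTime.
    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        // Default zoo.cfg: tickTime = 2000 ms, so the bounds are [4s, 40s]
        // and Dubbo's requested 60s is clamped down to 40s.
        System.out.println(negotiate(60_000, 2_000));      // 40000
        // tickTime = 20000000 ms: the minimum alone is over 11 hours, so the
        // client's 60s request is silently raised to 2 * tickTime.
        System.out.println(negotiate(60_000, 20_000_000)); // 40000000
    }
}
```

So with the tickTime=20000000 configuration above, the 60s session timeout Dubbo asks for never takes effect.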

@Donaldhan (Author)

I think I have seen this somewhere before; let me try it first.

@Donaldhan (Author)

One of the parameters to the ZooKeeper client library call to create a ZooKeeper session is the session timeout in milliseconds. The client sends a requested timeout, the server responds with the timeout that it can give the client. The current implementation requires that the timeout be a minimum of 2 times the tickTime (as set in the server configuration) and a maximum of 20 times the tickTime. The ZooKeeper client API allows access to the negotiated timeout.

@Donaldhan (Author)

It is indeed this problem: the expiry-check interval in the session tracker is tickTime.

 public SessionTrackerImpl(SessionExpirer expirer,
            ConcurrentMap<Long, Integer> sessionsWithTimeout, int tickTime,
            long serverId, ZooKeeperServerListener listener)
    {
        super("SessionTracker", listener);
        this.expirer = expirer;
        // the check interval for expired sessions is tickTime
        this.sessionExpiryQueue = new ExpiryQueue<SessionImpl>(tickTime);
        this.sessionsWithTimeout = sessionsWithTimeout;
        this.nextSessionId.set(initializeNextSession(serverId));
        for (Entry<Long, Integer> e : sessionsWithTimeout.entrySet()) {
            addSession(e.getKey(), e.getValue());
        }
    }
   public ExpiryQueue(int expirationInterval) {
        this.expirationInterval = expirationInterval;
        nextExpirationTime.set(roundToNextInterval(Time.currentElapsedTime()));
    }
 /**
     * Remove the next expired set of elements from expireMap. This method needs
     * to be called frequently enough by checking getWaitTime(), otherwise there
     * will be a backlog of empty sets queued up in expiryMap.
     *
     * @return next set of expired elements, or an empty set if none are
     *         ready
     */
    public Set<E> poll() {
        long now = Time.currentElapsedTime();
        long expirationTime = nextExpirationTime.get();
        if (now < expirationTime) {
            return Collections.emptySet();
        }

        Set<E> set = null;
        long newExpirationTime = expirationTime + expirationInterval;
        if (nextExpirationTime.compareAndSet(
              expirationTime, newExpirationTime)) {
            set = expiryMap.remove(expirationTime);
        }
        if (set == null) {
            return Collections.emptySet();
        }
        return set;
    }

Because tickTime is far too large, the expiry-check time never arrives, so expired sessions are never removed.
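The bucketing arithmetic behind this can be sketched as follows (hypothetical helper mirroring the rounding in ExpiryQueue; values are illustrative):

```java
public class ExpiryBucketDemo {
    // Sketch of ExpiryQueue's roundToNextInterval: session expiry times are
    // rounded up to the next multiple of the check interval (tickTime).
    static long roundToNextInterval(long timeMs, long intervalMs) {
        return (timeMs / intervalMs + 1) * intervalMs;
    }

    public static void main(String[] args) {
        long sessionDeadline = 5_000; // ms since server start, for illustration
        // Default tickTime (2s): the session lands in the bucket checked at 6s.
        System.out.println(roundToNextInterval(sessionDeadline, 2_000));      // 6000
        // tickTime = 20000000 ms: the first bucket is ~5.5 hours away, so an
        // "expired" session (and its ephemeral nodes) lingers for hours.
        System.out.println(roundToNextInterval(sessionDeadline, 20_000_000)); // 20000000
    }
}
```

In other words, with a huge tickTime the poll() shown above keeps returning an empty set until the distant bucket boundary is reached.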

@Donaldhan (Author)

@xiaoheng1, what cause did you find when you reproduced it? Doing a final check before registering, or managing by IP, would avoid the duplicate registration. Even though this was caused by tickTime being set far too large, and zk itself works correctly, it would be better if the problem could be avoided. External factors like this (...) can be tolerated, though.

@xiaoheng1 (Contributor)

I discovered it while trying to reproduce the problem you described: the nodes registered to zk expired after about 40s, which was unexpected.
I think fixing it by recording the IP in a file would be the better approach.

@Donaldhan (Author)

Temporary file storage? That works too and reduces changes to the existing directory structure. With the first approach the registration info is maintained in two places; with the second it is maintained only in ZK, which is easier to manage. Either is fine.

@Donaldhan (Author) commented Jan 27, 2021

1. Temporary file storage

Pros: fewer changes to the existing directory structure.
Cons: registration info is maintained in two places.

2. Adding an IP node

Pros: registration info is maintained solely in ZK, which is clearer and more intuitive.
Cons: the existing service registration node layout has to change.

For minimal disruption, option 1; for overall maintainability and future development, option 2.

It depends on how you weigh the trade-offs.
@xiaoheng1

@Donaldhan (Author)

expired after about 40s

So after the application is shut down, the session expires after about 40s and the registered ephemeral nodes disappear with it?

@r13ljj commented Jan 29, 2021

@xiaoheng1
I ran into this problem too. For service governance I added a tag to a service and updated the provider, so the new provider's ephemeralOwner became the governance application's session; as a result, zk would not clean up the updated provider after the application itself restarted.
Granted, this is a problem with how I used it, but it also exposes how dubbo instance registration info can get lost. We used to maintain the registration data for our dubbo service governance externally as well; I suggest strengthening this area in future versions. Just my humble opinion.

@xiaoheng1 (Contributor) commented Jan 29, 2021

So after the application is shut down, the session expires after about 40s and the registered ephemeral nodes disappear with it?

With the default zk configuration, tickTime is 2s, so minSessionTimeout is 4s and maxSessionTimeout is 40s. Dubbo's default sessionTimeout is 60s, so the negotiated value is 40s, which matches the roughly 40s I observed before the ephemeral nodes registered in zk disappeared.

@xiaoheng1 (Contributor)

I ran into this problem too. For service governance I added a tag to a service and updated the provider, so the new provider's ephemeralOwner became the governance application's session; as a result, zk would not clean up the updated provider after the application itself restarted.

Could you describe in detail how this problem arises? Listing the reproduction steps would be even better.

@r13ljj commented Jan 30, 2021

Could you describe in detail how this problem arises? Listing the reproduction steps would be even better.

A non-provider application updates a provider through the zk registry, so the provider's ephemeral znode is owned by the session of that non-real provider. When the real provider is then shut down abnormally (kill -9), zk will not clean up the ephemeral node.

@xiaoheng1 (Contributor)

A non-provider application updates a provider through the zk registry, so the provider's ephemeral znode is owned by the session of that non-real provider. When the real provider is then shut down abnormally (kill -9), zk will not clean up the ephemeral node.

So you used a third-party application with some code that connects to zk and modifies the attributes of the zk provider node, causing the ephemeral node not to be cleaned up when the real provider shuts down abnormally?

Please paste your zk configuration. Also, was the ephemeral node never cleaned up afterwards?

@r13ljj commented Feb 1, 2021

So you used a third-party application with some code that connects to zk and modifies the attributes of the zk provider node, causing the ephemeral node not to be cleaned up when the real provider shuts down abnormally?

Yes. The third-party application connects to zk and operates on the provider; after the update, the provider's ephemeral node is held by the third-party application's session. That is a problem with my usage. The zk configuration is fine: tickTime=2000, minSessionTimeout=30000, maxSessionTimeout=60000.

@xiaoheng1 (Contributor)

Understood.

@lin-mt commented Apr 28, 2021

2.7.8 with nacos has this problem too.

@CrazyHZM (Member)

@Donaldhan @lin-mt does this problem still exist in 2.7.14?

@CrazyHZM (Member) commented Dec 3, 2021

There has been no feedback for a long time, so we are closing the issue for now. If the problem persists, you can reopen it.

@CrazyHZM CrazyHZM closed this as completed Dec 3, 2021