Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rlm_redis - add Redis Sentinel support #2741

Open
jobec opened this issue Jun 12, 2019 · 8 comments

Comments

Projects
None yet
2 participants
@jobec
Copy link

commented Jun 12, 2019

Issue type

  • Feature request.

Feature description

Add Redis Sentinel support to rlm_redis

https://redis.io/topics/sentinel

When running more then 1 redis server in master slave setup, the writes must go to the master. Currently, things like accounting data cannot be in such sentinel setup.

@arr2036

This comment has been minimized.

Copy link
Member

commented Jun 12, 2019

As I understand it, Sentinel exists to perform automatic promotion of nodes. As we already support Redis cluster in FreeRADIUS master branch, I don't see that there's any work explicitly required to support sentinel. If you want HA support in FreeRADIUS 3 then you should use one of the many Redis proxies available, which also implement support for redis cluster natively.

@arr2036 arr2036 closed this Jun 12, 2019

@jobec

This comment has been minimized.

Copy link
Author

commented Jun 13, 2019

Redis sentinel is something different then redis cluster.

Cluster is sharding, where there are multiple redis instances serving a piece of the keys. Request a key on a wrong instance and it'll redirect you to the right instance. This setup is not meant for HA but for sizing/performance.

Redis Sentinel on the other hand, is a HA setup. You have again multiple redis instances, but they are setup in a master+slaves mode. Slaves are read only replicas of the master.
In front of these master/slaves are multiple sentinels (at least 3) that keep an eye on the master/slaves. Whenever a master becomes unavailable, they promote another instance as master.

The big difference is that you communicate with the sentinels. You ask them who is the master and then connect to that master.
https://redis.io/topics/sentinel#obtaining-the-address-of-the-current-master
Whenever there are connection problems with the master, you request the new master address again.
If there are issues with the sentinel, you skip to the next sentinel.

As far as I can tell, this mechanism is not in the current code in the master branch.

@arr2036

This comment has been minimized.

Copy link
Member

commented Jun 13, 2019

Clustering also does replica promotion, so it handles both sizing/performance and HA. You can also communicate with replicas or masters or whichever nodes you like, and ask them for the current map of the cluster, and run read only queries against the replicas. Is there a compelling reason to implement support for sentinels natively in FreeRADIUS instead of using an external proxy? For clustering there was performance and being able to implement more advanced failover algorithms.

@arr2036 arr2036 reopened this Jun 13, 2019

@jobec

This comment has been minimized.

Copy link
Author

commented Jun 13, 2019

There are a couple reasons:

  • Making cluster HA needs a lot more servers then sentinel (3 for the cluster, and 3 slaves)
  • For us, cluster is overkill. We don't need performance, we need things to remain online in case of a failure.
  • For some setups, redis cluster makes no sense. Celery for example, uses lists to keep it's queues. So that list would always be on the exact same server. Using the same Redis setup for both freeradius data and celery and getting the same HA level as with sentinel, you would need 9 (!!) servers
  • I don't know of any redis proxy that has sentinel support, they all seem to be focused on cluster. If you know about one that itself is is HA, I'm all ears. My google-fu might be failing me.
@arr2036

This comment has been minimized.

Copy link
Member

commented Jun 13, 2019

https://github.com/RedisLabs/sentinel_tunnel. seems pretty official, though it hasn't had any commits in a couple of years. You're right Redis Cluster is too heavyweight for some applications.

I'm happy to leave this open, but be aware it probably won't receive much attention for a few months unless you're willing to help out.

@jobec

This comment has been minimized.

Copy link
Author

commented Jun 13, 2019

I'm would be willing to help out, but I'm not skilled enough in C/C++ to be of any value.

I'll try to see if I can get around with configuring multiple redis modules and using the load balancing features of freeradius. I should be able to then check the response and in case of a slave, skip to the next until we get to a master.

Performance wise this won't be great, but if I can make the master state sort of predictable it should get the first try on 99% of the situations.

@jobec

This comment has been minimized.

Copy link
Author

commented Jun 13, 2019

Meanwhile, I got a workaround with this config. Not sure if it's the best way to do it, but it works.

redis1 points to the redis instance that should be master in normal circumstances. Sadly enough, when redis1 goes down, there's a 1 second delay for every request before redis2 is attempted.

file mods-enabled/redundant_redis

redis redis1 {
    server = redis1.local
    port = 6379
    query_timeout = 1
    pool {
        start = 0
        connect_timeout = 1.0
    }
}
redis redis2 {
    server = redis2.local
    port = 6379
    query_timeout = 1
    pool {
        start = 0
        connect_timeout = 1.0
    }
}
redis redis3 {
    server = redis3.local
    port = 6379
    query_timeout = 1
    pool {
        start = 0
        connect_timeout = 1.0
    }
}

file sites-enabled/some_config

redundant {
    # The redis configs redis1 -> redisX are defined in the
    # redundant redis module in the mods-enabled directory
    group {
        if ("%{redis1:EXPIRE %{toupper:%{User-Name}} 1234}" == "1") {
            ok
        } else {
            fail
        }
    }
    group {
        if ("%{redis2:EXPIRE %{toupper:%{User-Name}} 1234}" == "1") {
            ok
        } else {
            fail
        }
    }
    group {
        if ("%{redis3:EXPIRE %{toupper:%{User-Name}} 1234}" == "1") {
            ok
        } else {
            fail
        }
    }
}
@arr2036

This comment has been minimized.

Copy link
Member

commented Jun 13, 2019

Yeah, that's about as good as you're going to get with v3.0.x. IIRC master at least gives you sub-second timeouts.

The architecture of v3.0.x just doesn't support this kind of thing well. The module states are only ever advanced by requests passing through them, there's no concepts of timers that can fire independently of request processing, or asynchronous I/O events or any of that good stuff.

At least once all connections have aged out failover should be instantaneous... Except if the connection pool decides to try and establish a new connection... In which case it blocks the request whilst it tries... :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.