# Semaphores

## What is a semaphore?

A semaphore is a variable or abstract data type that is used to control access to common resources by multiple processes in a concurrent system.

## Visualization

Think of a semaphore as a number of units of a particular resource that can be consumed to perform operations.

We start with a fixed number of units. Every time we want to access the resource, we decrement the number of available units. If no units are available, we wait for an earlier operation to complete, which increments the count again.
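
The unit-counting picture above can be sketched as a toy, single-threaded model. `UnitCounter` is a hypothetical name used only for illustration; it returns `False` instead of waiting, and it is not how a real semaphore is implemented.

```python
class UnitCounter:
    """Toy model of a semaphore's unit count (illustrative only)."""

    def __init__(self, units):
        self.available = units

    def try_acquire(self):
        # Take one unit if any are left; report whether we succeeded.
        if self.available == 0:
            return False
        self.available -= 1
        return True

    def release(self):
        # Hand the unit back so a waiting operation can proceed.
        self.available += 1


pool = UnitCounter(2)
results = [pool.try_acquire() for _ in range(3)]  # third attempt finds no units
pool.release()
after_release = pool.try_acquire()  # succeeds again once a unit is returned
```

A real semaphore would block the caller (or time out) rather than return `False` immediately.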

## Types of Semaphores

- Binary semaphores are restricted to the values 0 and 1, representing locked/unlocked (or unavailable/available). They can be used to implement locks.
- Counting semaphores allow an arbitrary resource count.
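
As a quick illustration of the two types, here is a sketch using Python's standard `threading.Semaphore` (an in-process semaphore, not the Redis one built later in these notes). Non-blocking acquires are used so the example never waits.

```python
import threading

# Binary semaphore: one unit, so it behaves like a lock.
binary = threading.Semaphore(1)
first = binary.acquire(blocking=False)   # takes the only unit
second = binary.acquire(blocking=False)  # no unit left
binary.release()

# Counting semaphore: three units can be held at the same time.
counting = threading.Semaphore(3)
grants = [counting.acquire(blocking=False) for _ in range(4)]
```

Here `first` succeeds and `second` fails, while the counting semaphore grants three of the four requests.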

## Implementation in Redis

How do we implement semaphores in Redis?

## Data type

Sorted sets

## Approach

We can use a sorted set to track the units currently in use.
- Before each attempt, we remove entries whose timestamp is older than the timeout, so that holders that never released do not occupy a unit forever.
- Every time we attempt to acquire the semaphore, we add a unique identifier to the sorted set, with the current timestamp as its score.
- Then we get the rank of the identifier we inserted. This is akin to getting the index of an item in a sorted array.
- If the rank (index) is less than the limit, we have acquired the semaphore.
- Otherwise, we have exceeded the limit and should remove the identifier we just added.

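```python
import time
import uuid


def acquire_semaphore(conn, semname, limit, timeout=10):
    # A unique identifier marks this holder in the sorted set.
    identifier = str(uuid.uuid4())
    now = time.time()

    pipeline = conn.pipeline(True)
    # Drop holders whose timestamp is older than the timeout.
    pipeline.zremrangebyscore(semname, '-inf', now - timeout)
    # redis-py 3.0+ expects a {member: score} mapping for ZADD.
    pipeline.zadd(semname, {identifier: now})
    pipeline.zrank(semname, identifier)
    # A rank below the limit means we hold one of the units.
    if pipeline.execute()[-1] < limit:
        return identifier

    # Discard the identifier when we fail to get our semaphore.
    conn.zrem(semname, identifier)
    return None
```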

```python
def release_semaphore(conn, semname, identifier):
    # ZREM returns 1 if the identifier was still present, 0 otherwise.
    return conn.zrem(semname, identifier)
```

## Fair semaphores

The implementation above relies on each client's system clock. When two systems have different clocks (say System A runs 10 ms faster than System B), clients on the system with the slower clock can steal the semaphore from clients on the system with the faster clock.

A lock or semaphore is considered unfair when such a slight difference in system clocks can drastically affect who gets it.
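
To make the stealing concrete, here is a small in-memory simulation. It is only a sketch: a plain dict stands in for the Redis sorted set, and the helper names `rank` and `try_acquire` as well as the timestamps are made up for illustration.

```python
def rank(zset, member):
    """Index of member when entries are ordered by (score, member), like ZRANK."""
    ordered = sorted(zset, key=lambda m: (zset[m], m))
    return ordered.index(member)


def try_acquire(zset, member, local_clock, limit):
    # Mirrors the timestamp approach: add with the local time, then check rank.
    zset[member] = local_clock
    if rank(zset, member) < limit:
        return True
    del zset[member]
    return False


holders = {}  # stands in for the Redis sorted set
limit = 1

# Real time 100.000: client A's clock runs 10 ms fast, so it records 100.010.
a_acquired = try_acquire(holders, "client-a", 100.000 + 0.010, limit)

# Real time 100.005: client B arrives later, but its clock is not ahead.
b_acquired = try_acquire(holders, "client-b", 100.005, limit)

# Client A has been pushed to rank 1, yet it still believes it holds the unit.
a_rank_now = rank(holders, "client-a")
```

Both calls report success even though the limit is 1, because client B's smaller timestamp sorts ahead of client A's.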

However, not every case requires this kind of fairness. The key is to understand which scenarios require a fair semaphore.

## Scenario where fairness is not needed

If you have N identical worker threads, it doesn't matter which one gets a task to schedule.

## Scenario where fairness is needed

If you have N task queues, you don't want one queue to be waiting forever and never acquiring the lock.


## Implementing a fair semaphore

- Instead of a system clock timestamp, we can use an auto-incrementing counter.
- The counter acts as a steadily increasing, timer-like mechanism: whoever increments the counter first should be the one to get the semaphore.
- To enforce this, we keep a second "owner" ZSET whose scores are the counter values, and check our identifier's rank in that ZSET to determine which client got the semaphore. The timestamp ZSET is still kept so that timed-out holders can be expired.
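
The ordering idea can be sketched in memory before looking at the Redis version. This is an illustration under assumptions: `itertools.count` stands in for `INCR` on the counter key, a dict stands in for the owner ZSET, and `try_acquire_fair` is a hypothetical helper name.

```python
import itertools

counter = itertools.count(1)  # stands in for INCR on the counter key
owners = {}                   # stands in for the "owner" ZSET


def try_acquire_fair(owners, member, limit):
    # The score comes from the shared counter, not from the client's clock.
    owners[member] = next(counter)
    ordered = sorted(owners, key=owners.get)
    if ordered.index(member) < limit:
        return True
    del owners[member]
    return False


# Client A asks first and client B second; their local clocks play no part.
a_acquired = try_acquire_fair(owners, "client-a", limit=1)
b_acquired = try_acquire_fair(owners, "client-b", limit=1)
```

Because the counter value only depends on the order of requests, the later client cannot sort ahead of the earlier one.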

```python
def acquire_fair_semaphore(conn, semname, limit, timeout=10):
    identifier = str(uuid.uuid4())
    czset = f'{semname}:owner'
    ctr = f'{semname}:counter'

    now = time.time()

    pipeline = conn.pipeline(True)
    # Expire timed-out holders from the timestamp ZSET.
    pipeline.zremrangebyscore(semname, '-inf', now - timeout)
    # Keep only owners that still exist in the timestamp ZSET,
    # preserving their counter scores (weight 1 vs. weight 0).
    pipeline.zinterstore(czset, {czset: 1, semname: 0})

    # Get our position in line from the shared counter.
    pipeline.incr(ctr)
    counter = pipeline.execute()[-1]

    # redis-py 3.0+ expects a {member: score} mapping for ZADD.
    pipeline.zadd(semname, {identifier: now})
    pipeline.zadd(czset, {identifier: counter})

    pipeline.zrank(czset, identifier)
    # If we get the semaphore, return the identifier.
    if pipeline.execute()[-1] < limit:
        return identifier

    # Else, we clear the data that we set.
    pipeline.zrem(semname, identifier)
    pipeline.zrem(czset, identifier)
    pipeline.execute()
    return None
```

```python
def release_fair_semaphore(conn, semname, identifier):
    pipeline = conn.pipeline(True)
    pipeline.zrem(semname, identifier)
    pipeline.zrem(f'{semname}:owner', identifier)

    # Returns 1 (truthy) if the semaphore was properly released
    # or 0 (falsy) if it had already timed out.
    return pipeline.execute()[0]
```

## Refreshing a semaphore

The default timeout is 10 seconds, but sometimes a client needs to hold the semaphore for longer.

We can implement a `refresh_fair_semaphore()` function that extends the hold, as long as the semaphore has not already timed out.

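```python
def refresh_fair_semaphore(conn, semname, identifier):
    # ZADD returns 1 when the identifier had to be added as a new member,
    # which means our earlier entry had already been removed.
    if conn.zadd(semname, {identifier: time.time()}):
        # We lost our semaphore, report back.
        release_fair_semaphore(conn, semname, identifier)
        return False
    # We still have our semaphore.
    return True
```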

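```python
def acquire_semaphore_with_lock(conn, semname, limit, timeout=10):
    # acquire_lock and release_lock are not defined in these notes; they are
    # assumed to be a distributed-lock helper pair defined elsewhere.
    # Holding a short-lived lock keeps one client's counter increment and
    # rank check from interleaving with another client's.
    identifier = acquire_lock(conn, semname, acquire_timeout=0.01)
    if identifier:
        try:
            return acquire_fair_semaphore(conn, semname, limit, timeout)
        finally:
            release_lock(conn, semname, identifier)
    # Falls through and returns None if the lock could not be acquired.
```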