# 08/09/2025

## EXERCISES:

### QUESTIONS

1. How might the clocks in two computers that are linked by a local network be
synchronized without reference to an external time source? What factors limit the
accuracy of the procedure you have described? How could the clocks in a large number of computers connected by the Internet be synchronized? Discuss the accuracy of that procedure.

2. What is the main disadvantage of distributed systems which exploit the infrastructure offered by the Internet? How can this be overcome?


3. The host computers used in peer-to-peer systems are often simply desktop computers in users’ offices or homes. What are the implications of this for the availability and security of any shared data objects that they hold and to what extent can any weaknesses be overcome through the use of replication?


4. There exist services (e.g.: Network Time Protocol service) that can be used to synchronize computer clocks. Explain why, even with these services, no guaranteed bound is given for the difference between two clocks.



5. Speedup Calculation
A program spends 60% of its execution time in a part that can be parallelized. The rest (40%) must remain sequential.
According to Amdahl's Law, what is the maximum theoretical speedup if the parallel part is executed on:
a) 2 processors
b) 4 processors
c) An infinite number of processors

6. Finding the Parallel Fraction
An application achieves a speedup of 5 when running on 8 processors. Use Amdahl's Law to determine the fraction of the program that was parallelized.

### ANSWERS

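1. Two computers on a local network can synchronize their clocks without an external time source by exchanging their current time values. One machine sends a request, the other replies with its clock reading, and the requester measures the round-trip time. Assuming the delay is the same in both directions, it adds half the round trip to the received value, compares the result with its own clock, and adjusts accordingly. The accuracy is limited by how variable and asymmetric the message delay is, by the time each machine spends processing the messages, and by clock drift between synchronization rounds. For a large number of computers on the Internet, the same exchange can be organized hierarchically, as the Network Time Protocol (NTP) does, with each machine synchronizing against servers closer to a reference clock. Accuracy is lower than on a local network because Internet delays are larger and less predictable.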

2. A major disadvantage of distributed systems is the dependency between components—failure in one can affect the whole system. This issue can be mitigated through redundancy, ensuring backup resources are available when failures occur.

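3. Peer-to-peer hosts are typically ordinary desktop machines in homes or offices. Availability is poor because such machines are switched off, disconnected, or rebooted at their owners' convenience, and security is weak because they are not professionally administered and their owners have full access to the data stored on them. Replicating each object on several independent hosts improves availability, since the object stays reachable as long as at least one replica is online. Replication alone does not solve the security problem: more copies mean more places where data can be read or tampered with, so it has to be combined with measures such as encryption and integrity checks.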

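4. Synchronization services such as NTP work by exchanging messages, and the delay of each message on the Internet is variable and has no known upper bound. A client can only estimate how long the server's reply spent in transit, so the error in the estimate is as unbounded as the delay itself. In addition, clocks drift at different rates between synchronization rounds. Even when the time source is an atomic clock, no guaranteed bound on the difference between two clocks can therefore be given.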

5. Amdahl’s Law is given by:
   $$
   S(N) = \frac{1}{(1 - P) + \frac{P}{N}}
   $$
   For a parallel fraction $P = 0.60$:
   - With $N = 2$ processors:
     $$
     S(2) = \frac{1}{(1 - 0.60) + \frac{0.60}{2}} = 1.43
     $$
   - With $N = 4$ processors:
     $$
     S(4) = \frac{1}{(1 - 0.60) + \frac{0.60}{4}} = 1.82
     $$
   - With infinitely many processors:
     $$
     S(\infty) = \frac{1}{1 - P} = \frac{1}{0.40} = 2.5
     $$

6. Again, by Amdahl’s Law:
   $$
   S(N) = \frac{1}{(1 - P) + \frac{P}{N}}
   $$
   Given $S(N) = 5$ and $N = 8$:
   $$
   5 = \frac{1}{(1 - P) + \frac{P}{8}}
   $$
   Rearranging:
   $$
   (1 - P) + \frac{P}{8} = \frac{1}{5}
   $$
   $$
   1 - \frac{8P}{8} + \frac{P}{8} = 0.2
   $$
   $$
   1 - \frac{7P}{8} = 0.2
   $$
   $$
   0.8 = \frac{7P}{8}
   $$
   $$
   P = \frac{0.8 \times 8}{7} = \frac{6.4}{7} \approx 0.914
   $$
   Hence, the parallel fraction is approximately **91.4%**.
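The two Amdahl's Law calculations above can be checked with a short script. This is a minimal sketch; the function names `amdahl_speedup` and `parallel_fraction` are chosen here for illustration and are not part of the exercise.

```python
def amdahl_speedup(p: float, n: float) -> float:
    """Speedup S(N) for parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(speedup: float, n: int) -> float:
    """Solve S = 1 / ((1 - P) + P / N) for P."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


print(round(amdahl_speedup(0.60, 2), 2))             # 1.43
print(round(amdahl_speedup(0.60, 4), 2))             # 1.82
print(round(amdahl_speedup(0.60, float("inf")), 2))  # 2.5
print(round(parallel_fraction(5, 8), 3))             # 0.914
```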


# LECTURE 09/09/2025

# EXERCISES

# QUESTIONS

- Exercise 1:
Explain under which assumptions the fail-recovery and the fail-silent models are similar
(note that in both models any process can commit omission faults).
- Exercise 2:
Show how to make a stubborn point-to-point link by calling an instance of a fair-loss link
API.
- Exercise 3:
Describe the implementation of a perfect failure detector for a synchronous system
- Exercise 4:
Does the following statement satisfy the synchronous-computation assumption? 
On my server, no request ever takes more than 1 week to be processed.
- Exercise 5:
Is it possible to design a perfect failure detector for byzantine faults?
- Exercise 6:
Assume you have provided a cloud application alone, and aim to provide an SLA for
potential customers. Would you offer 90%, 99%, or 99.9% uptime per month to the
customer? If you give a 99% per month SLA, how many crashes of your application do
you expect to have to deal with in a month, and how quickly do you think this can be
dealt with?
Provide a detailed calculation of how many minutes of downtime each option allows
you to fix issues.
- Exercise 7
MS Azure SLA for VMs: https://www.azure.cn/en-us/support/sla/virtual-machines/
How much time do they get to fix an issue on their side for each SLA listed there? How
do you think they expect to achieve such high numbers. What do you think is their
sampling frequency (must search for this online)?
- Exercise 8
Consider a network as seen here:

![image](../images/Screenshot%202025-09-09%20at%2009.35.19.png)

In this network, what (minimum) percent of messages of node 1 to node 2 go through
node 6?
- Exercise 8.1
Which links would you add so that this network can tolerate any two node crashes
without getting any partitions?

# ANSWERS

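1. The fail-recovery and fail-silent models are similar because in both a process can stop taking steps at any moment and the other processes cannot reliably tell whether it has failed. In the fail-silent model a crashed process halts and the crash is not reliably detected. In the fail-recovery model a process may crash and later recover, and while it is down the messages addressed to it are lost, which to the other processes looks like omission faults. The two models therefore behave alike if we assume that a process saves every state update in stable storage, so that nothing is lost in a crash, and that it is not aware of having crashed and recovered. Under these assumptions a crash followed by a recovery is indistinguishable from a period of omissions.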

---

2. A stubborn point-to-point link can be built on top of a fair-loss link by retransmitting every message periodically.
A fair-loss link may drop messages, but a message that is sent infinitely often is delivered infinitely often. The “Retransmit Forever” idea exploits this: if the sender keeps retransmitting every message it was asked to send, each message eventually gets through, no matter how many copies are dropped. The variant below additionally uses acknowledgements to stop retransmitting a message once the receiver has confirmed it, and a `delivered` set to filter duplicates.

```{r, tidy=FALSE, eval=FALSE, highlight=FALSE }
Implements:
    StubbornPointToPointLink, instance S

Uses:
    FairLossLink, instance F

upon event <S, Init> do
    retransmissionSet := ∅;   // messages that still need to be (re)sent
    delivered := ∅;           // messages already delivered, used to filter duplicates
    starttimer(Δ);

// Sending a message m to a process q only records the pair for (re)transmission
upon event <S, Send | q, m> do
    retransmissionSet := retransmissionSet ∪ {(q, m)};

// On every timeout, send each pending message over F, labeled as DATA
upon event <Timeout> do
    forall (p, msg) ∈ retransmissionSet do
        trigger <F, Send | p, [DATA, msg]>;
    starttimer(Δ);

// When F delivers a DATA message from q, acknowledge it and
// deliver m through S unless this pair was already delivered
upon event <F, Deliver | q, [DATA, m]> do
    trigger <F, Send | q, [ACK, m]>;
    if (q, m) ∉ delivered then
        delivered := delivered ∪ {(q, m)};
        trigger <S, Deliver | q, m>;

// Once q has acknowledged m, stop retransmitting it
upon event <F, Deliver | q, [ACK, m]> do
    retransmissionSet := retransmissionSet \ {(q, m)};
```
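To see why retransmission overcomes message loss, the following sketch simulates the sender side in Python. It is a simplified model and not the module above: the fair-loss link is reduced to a hypothetical function `fair_loss_send` that drops each transmission independently with a fixed probability, and `send_until_acked` counts how many timer rounds pass until both the DATA message and its ACK get through.

```python
import random


def fair_loss_send(rng: random.Random, loss_probability: float) -> bool:
    """Hypothetical fair-loss link: True if this single transmission arrives."""
    return rng.random() >= loss_probability


def send_until_acked(rng: random.Random, loss_probability: float,
                     max_rounds: int = 10_000) -> int:
    """Retransmit once per round until DATA and its ACK both arrive.

    Returns the number of rounds that were needed.
    """
    for round_no in range(1, max_rounds + 1):
        data_arrived = fair_loss_send(rng, loss_probability)
        # The receiver only sends an ACK for DATA that actually arrived
        ack_arrived = data_arrived and fair_loss_send(rng, loss_probability)
        if ack_arrived:
            return round_no
    raise RuntimeError("no acknowledgement within max_rounds")


rng = random.Random(1)
print("rounds needed with 50% loss:", send_until_acked(rng, 0.5))
```

With any loss probability below 1, the chance that every round fails shrinks geometrically with the number of rounds, which is the intuition behind "in the limit every message is delivered".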

---

3. A perfect failure detector for a synchronous system ("Exclude on Timeout"): every process periodically asks all others for a heartbeat and reports a crash for every process that did not reply within the timeout.

```{r, tidy=FALSE, eval=FALSE, highlight=FALSE }

Implements:
    PerfectFailureDetector, instance P

Uses:
    PointToPointLinks, instance pl

// δ is the known maximum message delay; a request plus its reply
// takes at most 2δ, assuming negligible processing time

upon event <P, Init> do
    alive := Π;       // processes that replied in the current round
    detected := ∅;    // processes already reported as crashed
    starttimer(2δ);

upon event <Timeout> do
    forall p ∈ Π do
        // no reply within 2δ can only mean that p has crashed
        if (p ∉ alive) ∧ (p ∉ detected) then
            detected := detected ∪ {p};
            trigger <P, Crash | p>;
        trigger <pl, Send | p, [HEARTBEAT_REQUEST]>;
    alive := ∅;
    starttimer(2δ);

upon event <pl, Deliver | q, [HEARTBEAT_REQUEST]> do
    trigger <pl, Send | q, [HEARTBEAT_REPLY]>;

upon event <pl, Deliver | q, [HEARTBEAT_REPLY]> do
    alive := alive ∪ {q};

```
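The round structure of a timeout-based detector can be illustrated with a small Python sketch. It abstracts away timers and links entirely: the hypothetical function `run_detector` receives, for each round, the set of processes whose heartbeat arrived before the timeout, and returns the processes reported as crashed, in detection order. In a synchronous system a missing heartbeat can only mean a crash, so under that assumption no correct process is ever reported.

```python
def run_detector(processes, replies_per_round):
    """Report each process the first time its heartbeat is missing.

    replies_per_round[i] is the set of processes heard from in round i.
    """
    detected = []
    for replies in replies_per_round:
        for p in sorted(processes):
            if p not in replies and p not in detected:
                detected.append(p)  # corresponds to triggering a Crash event
    return detected


rounds = [{"p", "q", "r"}, {"p", "r"}, {"p"}]
print(run_detector({"p", "q", "r"}, rounds))  # ['q', 'r']
```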

---

4. Yes, that statement satisfies the synchronous-computation assumption.\
The core requirement for the synchronous model is that there is a known and finite upper bound on the time it takes to perform a computation. Here the bound is one week: very large and of little practical use, but known and finite.

---

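5. No, a perfect failure detector for Byzantine faults is generally considered impossible. A Byzantine process can behave arbitrarily and present different symptoms to different observers: it can, for example, answer every heartbeat on time while sending wrong data elsewhere. Timeouts can only reveal that a process has stopped responding, so the other processes have imperfect information on whether a component has failed, and whether a behaviour counts as faulty depends on the algorithm being run.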

---

6. 

For a cloud application provided by a single person, I would offer a **99% uptime SLA** per month.

This choice represents a realistic balance. A **90% SLA** allows for over 3 days of downtime a month, which is commercially unacceptable and would drive customers away. Conversely, a **99.9% SLA** is extremely demanding for a solo operator, requiring sophisticated automated failover systems and 24/7 availability that are difficult to manage alone. A **99% uptime** is a professional standard that assures customers of reliability while providing a practical buffer for unforeseen issues, maintenance, and manual recovery.



**Downtime Calculations**

To calculate the allowed downtime, we first find the number of minutes in an average month: 30.4375 days/month * 24 hours/day * 60 minutes/hour = 43,830 minutes/month
Using this, the allowed monthly downtime for each SLA is:

* **90% Uptime:**
    ```
    10% downtime * 43,830 min = 4383 minutes (or 73.05 hours)
    ```
* **99% Uptime:**
    ```
    1% downtime * 43,830 min = 438.3 minutes (or 7.31 hours)
    ```
* **99.9% Uptime:**
    ```
    0.1% downtime * 43,830 min = 43.83 minutes (or 0.73 hours)
    ```



#### **Managing a 99% SLA**

With **438.3 minutes** of downtime allowed per month, the number of crashes you can handle depends entirely on your **Mean Time To Recovery (MTTR)**—the average time it takes to fix a problem from the moment you're alerted.

Let's assume your MTTR is 2 hours (120 minutes). This includes the time to be notified, diagnose the problem, and deploy a fix. In this scenario, you could handle: 438.3 minutes / 120 minutes/crash ≈ 3.65 crashes per month

So, you could manage about **3 to 4 significant outages** per month.

Achieving a 2-hour MTTR as a solo provider is challenging. It requires:
* **Excellent Monitoring:** Automated alerts sent directly to your phone.
* **Rapid Diagnosis:** Good logging and diagnostic tools to find the root cause quickly.
* **Efficient Resolution:** A streamlined process for deploying fixes or restarting services.

If a single crash takes longer to fix (e.g., it happens overnight while you're asleep), it could consume your entire monthly downtime budget in one incident.
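The downtime budgets and the MTTR estimate above follow from two one-line formulas, sketched here in Python. The function names are illustrative, and the 2-hour MTTR is the same assumption made in the text.

```python
MINUTES_PER_MONTH = 30.4375 * 24 * 60  # average month, as in the calculation above


def downtime_budget_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per month allowed by an uptime SLA."""
    return (100.0 - uptime_percent) / 100.0 * MINUTES_PER_MONTH


def outages_within_budget(uptime_percent: float, mttr_minutes: float) -> float:
    """How many outages of average length mttr_minutes fit in the budget."""
    return downtime_budget_minutes(uptime_percent) / mttr_minutes


for sla in (90.0, 99.0, 99.9):
    print(f"{sla}% uptime -> {downtime_budget_minutes(sla):.1f} minutes of downtime per month")
print(f"{outages_within_budget(99.0, 120):.2f} outages of 2 hours fit in the 99% budget")
```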

---

7. 
Based on the provided Azure SLA for Virtual Machines, here is the allowed time they have to fix an issue for different configurations. The calculations use the same average of `43,830` minutes per month.

* **99.9% (Single VM with Premium Storage):**
    ```
    0.1% * 43,830 = 43.83 minutes/month
    ```
* **99.95% (Multiple VMs in an Availability Set):**
    ```
    0.05% * 43,830 = 21.92 minutes/month
    ```
* **99.99% (Multiple VMs across Availability Zones):**
    ```
    0.01% * 43,830 = 4.38 minutes/month
    ```



#### **Achieving High Availability**

Azure achieves these incredibly high uptime numbers through massive investment in **redundancy and automation**.

1.  **Redundancy at All Levels:**
    * **Hardware:** Individual servers have redundant power supplies, network cards, and RAID storage.
    * **Availability Sets:** This feature ensures your VMs are placed in different server racks within the same datacenter. This protects against rack-level failures like a faulty power distribution unit or network switch.
    * **Availability Zones:** This is the highest level of protection. It places your VMs in physically separate datacenters within the same region, each with independent power, cooling, and networking. This protects against catastrophic events like a fire or total power loss to an entire building.

2.  **Automation and Orchestration:**
    * **Health Monitoring:** Systems constantly monitor the health of the underlying hardware.
    * **Automated Failover:** If a server is about to fail, Azure can automatically live-migrate a VM to a healthy server with no downtime.
    * **Rapid Provisioning:** Software-defined infrastructure allows for the near-instant deployment of new resources to replace failed ones.

3.  **Expert Staff:** Microsoft employs a global team of engineers working 24/7/365 in Network Operations Centers (NOCs) to respond to incidents immediately.



#### **Sampling Frequency**

Microsoft Azure's standard **sampling frequency for VM health metrics is one minute**.

The SLA document specifies that "Downtime is defined as two consecutive minutes of no External Connectivity." This implies that their monitoring system must be checking the VM's status at least once per minute to be able to reliably detect a two-minute continuous outage. This one-minute interval is a standard feature of Azure Monitor for collecting performance and health metrics from virtual machines.

---

8. 1->3->6->8->2
   1->4->6->8->2

These are the shortest paths from 1 to 2 passing through the node 6.

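   1. To survive any two node crashes without a partition, the network must stay connected after removing any two nodes (it must be 3-connected), so in particular every node needs at least three neighbours. Adding a link between every pair of nodes that are not adjacent to each other is sufficient, but adds more links than necessary.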
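Whether a given topology tolerates two node crashes can be checked by brute force: remove every possible set of at most two nodes and test that the rest is still connected. The sketch below does this in Python. The graphs in the usage example (a 5-node ring and a 5-node complete graph) are hypothetical and are not the network from the figure; the edge list of the real network would have to be filled in by hand.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """True if the graph restricted to `nodes` is connected (or empty)."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:  # depth-first search from an arbitrary node
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes


def tolerates_two_crashes(nodes, edges):
    """True if no set of at most two crashed nodes partitions the rest."""
    nodes = set(nodes)
    return all(
        is_connected(nodes - set(crashed), edges)
        for k in range(3)
        for crashed in combinations(nodes, k)
    )


ring = [(i, (i + 1) % 5) for i in range(5)]
complete = list(combinations(range(5), 2))
print(tolerates_two_crashes(range(5), ring))      # False
print(tolerates_two_crashes(range(5), complete))  # True
```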

---

# LECTURE 15/09/2025

# EXERCISES
# QUESTIONS

- Exercise 1
Write the exact conversion formula between Unix time and ISO 8601.
- Exercise 2
Give (at least) one reason why not to just switch all timing information regarding software, computers, and distributed systems on the ISO 8601 time.
- Exercise 3
Assume that the function currentTime() returns the time of your system, which periodically updates via an NTP server and that the function fun() takes 50ms to run on your machine. 
You run the following code:
long startTime = System.currentTime();
fun();
long endTime = System.currentTime();
long elapsedTime = endTime-startTime;
Give bounds on elapsedTime.
- Exercise 4
Calculate all the happens-before pairs in the Lamport diagrams on slides 20 and 21.
- Exercise 5
Prove that for two events a and b in a distributed system, either a -> b, b -> a, or a || b.
- Exercise 6
A relation R is a strict partial order if it is irreflexive ( not exists event a such that (a,a) is in R) and 
transitive for any events a, b, c : if (a,b) in R and (b,c) in R implies (a,c) in R. 
(These two conditions also imply that R is asymmetric, i.e. that for any events a,b, if (a,b) in R then (b,a) not in R.) 
Prove that the happens-before relation is a strict partial order.
You may assume that any two nodes are a nonzero distance apart, as well as the physical principle that information
cannot travel faster than the speed of light.
- Exercise 7
Assume an NTP client which at time t1 = 3:31am sends a request to an NTP server, which arrives at time 3:30 (at the server side). 
The server then responds with a message "response(3:31, 3:30, 3:31)", which is received by the NTP client at time 3:34. 
What is the estimated network delay, and what is the estimated clock skew of the client?
Discuss what a clock correction would look like based on the skew you calculated.
- Exercise 8
In a fail-stop model, which of the following properties are safety properties?
  1. every process that crashes is eventually detected;
  2. no process is detected before it crashes;
  3. no two processes decide differently;
  4. no two correct processes decide differently;
  5. every correct process decides before t time units;
  6. if some correct process decides then every correct process decides.

# ANSWERS

1. A direct conversion is not a single formula but an algorithm. Here are the mathematical steps to perform the conversions.

---

## Unix Time to ISO 8601

This process decomposes the total number of seconds into date and time components.

Let $t_{unix}$ be the input Unix timestamp.

First, separate the total seconds into full days and the remaining seconds within the current day.
* Seconds per day: $S_{day} = 86400$
* Number of full days since epoch: $D_{total} = \lfloor \frac{t_{unix}}{S_{day}} \rfloor$
* Remaining seconds into the current day: $s_{rem} = t_{unix} \pmod{S_{day}}$

The remaining seconds are used to find the hour, minute, and second.
* Hour ($H$): $H = \lfloor \frac{s_{rem}}{3600} \rfloor$
* Minute ($M_{time}$): $M_{time} = \lfloor \frac{s_{rem} \pmod{3600}}{60} \rfloor$
* Second ($S$): $S = s_{rem} \pmod{60}$

This is an iterative algorithm to find the year, month, and day from $D_{total}$. You must account for leap years.

**a. Find the Year ($Y$)**
Start with the epoch year and subtract days for each year until the remainder is found.
1.  Set $Y = 1970$ and $D_{rem} = D_{total}$.
2.  Define a function for days in a given year:
    $DaysInYear(Y) = \begin{cases} 366 & \text{if } (Y \pmod 4 = 0 \text{ and } Y \pmod{100} \neq 0) \text{ or } (Y \pmod{400} = 0) \\ 365 & \text{otherwise} \end{cases}$
3.  Loop: While $D_{rem} \ge DaysInYear(Y)$:
    * $D_{rem} = D_{rem} - DaysInYear(Y)$
    * $Y = Y + 1$

**b. Find the Month ($M_{date}$) and Day ($D$)**
The final $D_{rem}$ is the day of the year (from 0 to 364 or 365). Convert this to a month and day by iterating through the month lengths of year $Y$.

Combine the calculated components into the ISO 8601 string format: **`YYYY-MM-DDTHH:MM:SSZ`**.
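The steps above translate directly into code. The following Python sketch follows the same algorithm (day/second split, year loop, month loop) instead of calling a date library. It assumes a non-negative integer timestamp and, like Unix time itself, ignores leap seconds; the function names are chosen here for illustration.

```python
def is_leap(year: int) -> bool:
    return (year % 4 == 0 and year % 100 != 0) or year % 400 == 0


def unix_to_iso8601(t_unix: int) -> str:
    """Convert a non-negative Unix timestamp to an ISO 8601 UTC string."""
    days, s_rem = divmod(t_unix, 86400)  # full days and seconds into the day
    hour, rest = divmod(s_rem, 3600)
    minute, second = divmod(rest, 60)

    # Subtract whole years, starting from the epoch year
    year = 1970
    while days >= (366 if is_leap(year) else 365):
        days -= 366 if is_leap(year) else 365
        year += 1

    # `days` is now the zero-based day of the year; subtract whole months
    month_lengths = [31, 29 if is_leap(year) else 28, 31, 30, 31, 30,
                     31, 31, 30, 31, 30, 31]
    month = 1
    for length in month_lengths:
        if days < length:
            break
        days -= length
        month += 1

    return (f"{year:04d}-{month:02d}-{days + 1:02d}"
            f"T{hour:02d}:{minute:02d}:{second:02d}Z")


print(unix_to_iso8601(0))              # 1970-01-01T00:00:00Z
print(unix_to_iso8601(1_000_000_000))  # 2001-09-09T01:46:40Z
```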

---

2. 
We don't use the human-readable ISO 8601 format for all system timing because it is inefficient for machines. Computers use compact numerical timestamps, like Unix time, because they take less storage and are faster to process: comparing two instants or computing a duration is a single integer operation, whereas ISO 8601 strings must first be parsed, with time zones, variable month lengths, and leap years taken into account. Numerical counters are also what internal logic based on monotonic clocks relies on; a monotonic clock only moves forward from an arbitrary starting point and does not correspond to any calendar date, so a wall-clock standard like ISO 8601 cannot represent it.

---

3. 
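If the clock is not adjusted between the two calls:\
long startTime = System.currentTime(); -> it reads 10:00:00.000\
fun(); -> the clock advances to 10:00:00.050\
long endTime = System.currentTime(); -> it reads 10:00:00.050\
long elapsedTime = endTime-startTime; -> it's 50ms

However, currentTime() is a time-of-day clock that NTP may correct while fun() is running. If the clock is stepped forward, elapsedTime is larger than 50ms; if it is stepped backward, elapsedTime is smaller than 50ms and can even be negative. Without a bound on the size of the correction there is no bound on elapsedTime; 50ms is only the value observed when no correction happens. A monotonic clock should be used to measure elapsed time.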
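The difference between a time-of-day clock and a monotonic clock can be shown in Python, where `time.time()` reads the wall clock (which NTP may adjust) and `time.monotonic()` reads a clock that never goes backward. This is a sketch in a different language than the exercise's Java-style snippet; `measure` is a helper name invented here, and `time.sleep(0.05)` stands in for `fun()`.

```python
import time


def measure(fn):
    """Return (wall_elapsed, monotonic_elapsed) in seconds for one call of fn."""
    wall_start = time.time()        # time-of-day clock, may be stepped by NTP
    mono_start = time.monotonic()   # monotonic clock, unaffected by clock steps
    fn()
    mono_elapsed = time.monotonic() - mono_start
    wall_elapsed = time.time() - wall_start
    return wall_elapsed, mono_elapsed


wall, mono = measure(lambda: time.sleep(0.05))
print(f"wall clock: {wall * 1000:.1f} ms, monotonic clock: {mono * 1000:.1f} ms")
```

On a machine whose clock is not corrected during the call, both values are close to 50 ms. Only the monotonic value is guaranteed to be non-negative.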

---

4. 

Slide 20:
![image](../images/Screenshot%202025-09-16%20at%2009.59.17.png)

happens-before relations:

bob1 -> bobf1 -> carolrec1\
bob1 -> bobf1 -> bobs1 -> alicerec3

bob2 -> bobs2 -> carolrec2\
bob2 -> bobs3 -> alicerec2

bob3 -> bobf3 -> carolrec3\
bob3 -> bobf4 -> alicerec1



Slide 21:
![image](../images/Screenshot%202025-09-16%20at%2009.59.27.png)

happens-before relations:

t1 -> t2\
t3 -> t4\
t3 -> t5\
t1 -> t6 (through m1)

we can say that t2 happens before t6 due to the fact that t1 -> t2 and then through t2 it will send other messages


---

5. 
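By definition, a -> b holds when there is a path from a to b built from local ordering on one node, message passing (a send happens before the matching receive), and transitivity.\
b -> a is defined the same way with the roles swapped.\
a || b is defined as the case where neither a -> b nor b -> a holds, so the two events are concurrent.\
For any two events, either at least one of a -> b and b -> a holds, or neither does, and the latter case is exactly a || b. The three cases therefore cover every possibility.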

---

6. 
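Irreflexive: a -> a would require a chain of local steps and messages leading from a back to a itself. Within a node the local order is strict, and because nodes are a nonzero distance apart and information cannot travel faster than light, every message is received strictly later than it was sent. Physical time therefore strictly increases along any chain, so no chain can return to its starting event.\
Transitive: given a -> b and b -> c, joining the path from a to b with the path from b to c gives a path from a to c, so a -> c (this is part of the definition of happens-before).\
Asymmetric: if both a -> b and b -> a held, transitivity would give a -> a, contradicting irreflexivity; so if a -> b then b -> a does not hold.\
Hence the happens-before relation is a strict partial order.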

---

7. 
t1 = 3:31\
t2 = 3:30\
t3 = 3:31\
t4 = 3:34

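network delay = δ = (t4 − t1) − (t3 − t2) = (3 min) − (1 min) = 180s − 60s = 120s

clock skew = θ = t3 + (δ ÷ 2) − t4 = 3:31 + 60s − 3:34 = −2 min = −120s

The estimated clock skew is −2 minutes (−120,000 ms). The negative value means the client's clock is 2 minutes ahead of the server's clock. Because this is far more than NTP would correct by slewing (gradually slowing the clock down), the client would step its clock, setting it back by 2 minutes in one jump; applications reading the clock around that moment see time go backward.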
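The same calculation as a small Python sketch, with times converted to seconds since midnight. The helper names `hm` and `ntp_estimate` are introduced here for illustration only.

```python
def hm(hours: int, minutes: int) -> int:
    """Time of day in seconds since midnight."""
    return (hours * 60 + minutes) * 60


def ntp_estimate(t1, t2, t3, t4):
    """Return (network delay, clock skew) from the four NTP timestamps.

    t1: request sent (client clock), t2: request received (server clock),
    t3: response sent (server clock), t4: response received (client clock).
    """
    delay = (t4 - t1) - (t3 - t2)
    skew = t3 + delay / 2 - t4
    return delay, skew


delay, skew = ntp_estimate(hm(3, 31), hm(3, 30), hm(3, 31), hm(3, 34))
print(delay, skew)  # 120 -120.0
```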

---

8.  
In a fail-stop model, a **safety property** is a property that states "something bad never happens." It can be proven false by observing a finite execution of the system. 
In contrast, a liveness property states that "something good will eventually happen," which requires an infinite observation to prove false.

Based on this, the safety properties from the list are: **2, 3, 4, and 5**.

* **1. Every process that crashes is eventually detected.**
    * **Liveness:** This is a classic liveness property. It states that something good (detection) will *eventually* happen. To prove it false, you would have to observe the system forever to confirm that a crashed process is *never* detected.

* **2. No process is detected before it crashes.**
    * **Safety:** This property specifies that a "bad thing" (a false detection) should never happen. You can prove it false in a finite execution: if you observe a process being marked as crashed while it is still running, the property is violated.

* **3. No two processes decide differently.**
    * **Safety:** This is an agreement or consistency property. The "bad thing" is having two different decisions. If process A decides '1' and process B decides '0' at any point, you have a finite trace that violates the property.

* **4. No two correct processes decide differently.**
    * **Safety:** This is a slightly weaker version of the previous property, but it is still a safety property for the same reason. The "bad thing" is disagreement between correct processes. A finite observation of two correct processes making different decisions is enough to prove it false.

* **5. Every correct process decides before t time units.**
    * **Safety:** This is a timeliness property, often called "bounded liveness," but it fits the definition of a safety property. The "bad thing" is a correct process not having decided by time `t`. You can prove this property false by simply running the system for `t` units of time. If a correct process hasn't decided by then, the property is violated.

* **6. If some correct process decides then every correct process decides.**
    * **Liveness:** This property states that if a good thing happens (one process decides), then another good thing (all other correct processes deciding) must *eventually* happen. To prove it false, you would need to see one correct process decide and then watch for an infinite amount of time to confirm another correct process *never* decides.

---

# LECTURE 16/09/2025
## QUESTIONS 

- Exercise 1:
Write pseudocode for Lamport clock algorithm in the API style from lecture 1 (see slide example for failure detector for formatting ideas).
- Exercise 2:
Calculate the Lamport clocks for the Lamport diagram in slide 9. Explain if they can be used to fix the issue presented in that slide.
- Exercise 3:
Prove that if $a \rightarrow b$ then $LT(a) < LT(b)$.
- Exercise 4:
Assume a set of nodes, and two events occurring in two different nodes with $LT(e1) = 5$ and $LT(e2) = 3$. What can you say about the causality of events $e1$ and $e2$?
- Exercise 5:
In slide 13 list all possible FIFO broadcast allowed orders of the messages a, b, and c. In slide 14 list all possible causal broadcast allowed orders of the messages a, b, and c.
- Exercise 6:
Prove that all Causal broadcast protocols are also FIFO broadcast protocols.
- Exercise 7:
Write pseudocode for a causal broadcast. You will need to keep track of a vector clock structure.
- Exercise 8:
Can we devise a uniform reliable broadcast algorithm with an eventually perfect failure detector but without assuming a majority of correct processes?
- Exercise 9:
Compare the causal delivery property with the following property: “If a process delivers messages m1 and m2, and m1 → m2, then the process must deliver m1 before m2.”
- Exercise 10:
Calculate the vector clocks of the following distributed system. Show the contents of two different exchanged messages. Give an example of 2 different pairs of concurrent events.
![image](../images/Screenshot%202025-09-16%20at%2010.22.22.png)

## ANSWERS

1. 

```
implements:
    LamportClock, instance L

upon event <L, init> do
    clock := 0

// Increment the clock before sending and attach it to the message
upon event <L, send | p, m> do
    clock := clock + 1
    trigger <Network, Send | p, [m, clock]> // sending over a network

// On receipt, jump past both the local and the sender's clock
upon event <Network, Deliver | p, [m_content, sender_clock]> do
    clock := max(clock, sender_clock) + 1
    trigger <L, Deliver | p, m_content>
```
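A minimal Python sketch of the same rules may help to experiment with them. The class name `LamportClock` mirrors the module above, but the method names (`tick`, `send`, `receive`) are choices made here, and the network is left out: `send` returns the timestamp to attach to a message and `receive` takes the timestamp found in a received message.

```python
class LamportClock:
    """Logical clock following Lamport's rules."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event too; the result is attached to the message."""
        return self.tick()

    def receive(self, sender_time: int) -> int:
        """Move past both the local clock and the sender's timestamp."""
        self.time = max(self.time, sender_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                  # local event on a, clock becomes 1
stamp = a.send()          # send event on a, clock becomes 2
print(b.receive(stamp))   # 3
```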

---

2. 
![image](../images/Screenshot%202025-09-16%20at%2009.59.17.png)

Solution:

![image](../images/lamport_diagram_1.png)

---

3. 
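a -> b is a happens-before relation, so one of three cases applies.\
If a and b are on the same node and a occurs first, every event increments the clock, therefore LT(a) < LT(b).\
If a is the sending of a message and b its receipt, the receiver sets its clock to max(clock, LT(a)) + 1, therefore LT(a) < LT(b).\
If there is an event c with a -> c and c -> b, then by induction LT(a) < LT(c) < LT(b).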

---

4. 
Given a set of nodes and two events on different nodes such that
LT(e1)=5 and LT(e2)=3, we can only rule out e1 -> e2: if e1 had happened before e2 we would have LT(e1) < LT(e2). The timestamps do not tell us whether the events are causally connected, thus
LT(e2) < LT(e1) does not imply e2 -> e1. Either e2 -> e1 or the two events are concurrent.

---

5. 

Slide 13 picture
![image](../images/Screenshot%202025-09-16%20at%2011.25.41.png)

All possible solutions:
User A: Must deliver a before c (FIFO). Must also deliver b before c (Causality). And since a → b, the only valid order is (a, b, c).\
User B: Must respect the global causal chain a → b → c. The only valid order is (a, b, c).\
User C: Must respect the global causal chain a → b → c. The only valid order is (a, b, c).

Slide 14 picture
![image](../images/Screenshot%202025-09-16%20at%2011.25.57.png)

All possible solutions:\
The rules we derived are:\
a must be delivered before b.\
a must be delivered before c.

This means a must always be the first message delivered. Since b and c can be delivered in any order relative to each other, the two possible total orders are:\
(a, b, c)\
(a, c, b)

---

6. 

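All causal broadcast protocols are also FIFO broadcast protocols because the "happens-before" relationship of causality inherently includes the sequence of events at a single process. If a process broadcasts m1 and later m2, then broadcast(m1) → broadcast(m2) by local order, so causal delivery requires every process to deliver m1 before m2, which is exactly the FIFO property.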

---

7. 

```
Implements:
    CausalBroadcast, instance cb.

Uses:
    BestEffortBroadcast, instance beb.

upon event < cb, Init > do
    VC := [0] * N;  // VC[k] = number of messages from process k delivered locally
    sent := 0;      // number of messages broadcast by this process
    pending := ∅;

upon event < cb, Broadcast | m > do
    VC_m := VC;                 // causal dependencies of m
    VC_m[self] := sent + 1;     // m is this process's (sent + 1)-th message
    sent := sent + 1;
    trigger < beb, Broadcast | [VC_m, m] >;

upon event < beb, Deliver | p, [VC_m, m] > do
    pending := pending ∪ {(p, VC_m, m)};
    // deliver every pending message whose dependencies are satisfied
    while exists (p', VC_m', m') ∈ pending such that
            VC_m'[p'] = VC[p'] + 1 and
            (∀ k ≠ p': VC_m'[k] ≤ VC[k]) do
        pending := pending \ {(p', VC_m', m')};
        VC[p'] := VC[p'] + 1;
        trigger < cb, Deliver | p', m' >;
```
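The delivery condition can be tried out with a small Python sketch. It is a simplified model with no network: `broadcast` returns the tagged message and the caller passes it to each node's `receive` in whatever order it likes. The class name `CausalNode`, the separate `sent` counter, and the `log` list are assumptions of this sketch.

```python
class CausalNode:
    """One process of a vector-clock-based causal broadcast (no real network)."""

    def __init__(self, node_id: int, n: int) -> None:
        self.node_id = node_id
        self.delivered = [0] * n  # delivered[k] = messages from k delivered here
        self.sent = 0             # messages broadcast by this node
        self.pending = []         # received but not yet deliverable
        self.log = []             # payloads in delivery order

    def broadcast(self, payload):
        vc = list(self.delivered)
        vc[self.node_id] = self.sent + 1  # sequence number of this message
        self.sent += 1
        return (self.node_id, vc, payload)

    def _deliverable(self, sender, vc) -> bool:
        return all(
            vc[k] == self.delivered[k] + 1 if k == sender
            else vc[k] <= self.delivered[k]
            for k in range(len(vc))
        )

    def receive(self, message) -> None:
        self.pending.append(message)
        progress = True
        while progress:  # repeat until no pending message can be delivered
            progress = False
            for msg in list(self.pending):
                sender, vc, payload = msg
                if self._deliverable(sender, vc):
                    self.pending.remove(msg)
                    self.delivered[sender] += 1
                    self.log.append(payload)
                    progress = True


n0, n1, n2 = (CausalNode(i, 3) for i in range(3))
m1 = n0.broadcast("m1")
n1.receive(m1)
m2 = n1.broadcast("m2")   # causally after m1
n2.receive(m2)            # arrives first, has to wait
n2.receive(m1)
print(n2.log)             # ['m1', 'm2']
```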

---

8. 
Can we devise a uniform reliable broadcast algorithm with an eventually perfect failure detector but without assuming a majority of correct processes?

No. Uniform Reliable Broadcast requires Uniform Agreement: if any process delivers a message m, all correct processes must also deliver m.

Without a majority, the network can partition into groups. A faulty process in one partition could deliver a message m that is never seen by the correct processes in another partition. This violates Uniform Agreement. An eventually perfect failure detector can't prevent this, as the partition and delivery can happen before the detector becomes accurate.

---

9. 
Compare the causal delivery property with the following property: “If a process delivers messages m1 and m2, and m1 → m2, then the process must deliver m1 before m2.”

Let's call the standard definition Property 1 and the alternative one Property 2.

The standard Property 1 is stronger because it also constrains a process that delivers only m2. It means that if broadcast(m1) → broadcast(m2), a process is not allowed to deliver m2 unless it has already delivered m1.

Property 2 is only an ordering guarantee. It says that if a process happens to deliver both, the order must be correct. It doesn't force the process to deliver m1 before it may deliver m2.

In short, if m1 is lost but m2 arrives at a process and is delivered:

* Property 1 is violated, so a protocol that guarantees it must hold m2 back.
* Property 2 is satisfied: since the process never delivers both m1 and m2, the condition doesn't apply.

---

10. 

![img](../images/IMG_1030.jpg)

---

# LECTURE 23/09/2025

## EXERCISES


1. Give an example why the centralized algorithm (with leader and token) for Mutex does not satisfy the ordering property. 
What mechanism could the leader use to fix this? 

2. Give a summary for the total number of messages for Centralized, Token ring, Maekawa, and decentralized (R&A) algorithm for each one of: client delay, synchronization delay, bandwidth, and a major problem you know with this algorithm. 
Would you say token ring is "more" fault tolerant than the leader token based algorithm? Explain why. Hint: Consider if processes can crash-recover what happens each time. 

3. If the whole system of clients and servers is fully synchronous and each process is single threaded, is mutual exclusion condition ME3, which specifies entry in happened-before order, relevant? Can it be mitigated without the algorithm changing? 

4. Give a formula for the maximum throughput of a mutual exclusion system in terms of the synchronization delay.

5. Adapt the central server algorithm for mutual exclusion to handle the crash failure of any client (in any state), assuming that the server is correct and given a reliable failure detector. Comment on whether the resultant system is fault-tolerant. What would happen if a client that possesses the token is wrongly suspected to have failed?

6. Give the module pseudocode for all 4 algorithms seen in the message passing case. 

7. In a certain system, each process typically uses a critical section many times before another process requires it. Explain why Ricart and Agrawala’s multicast-based mutual exclusion algorithm is inefficient for this case, and describe how to improve its performance. Does your adaptation satisfy liveness condition ME2?

8. Make a process based description of Dekker's algorithm. 

8.1. In Dekker's algorithm, let process i request the critical resource CS and let a process j with j < i request it when queue = k (k < j) and again when queue = j + 1. How many times before i can j be granted entrance to the resource? 

9. When we run the two following processes (initially all variables used are 0)

| Line | P1 | P2 |
|------|----|----|
| 1 | x := 1 | y := 1 |
| 2 | x := 2 | b := x |
| 3 | a := y | if b = 0 then |
| 4 | if a = 0 then | CS |
| 5 | if x=0 | |
| 6 | CS | |


under TSO, which values is it possible for P1 to read at line 4 and for P2 at line 2? 
Can both processes enter the CS at the same time?  Why yes or no? What if we replace line 5 for P1 with if x=1?

## Solutions

1. 

The algorithm with leader and token does not satisfy the order:
![img](../images/IMG_0994.jpg)

However, this could be solved with Lamport logical clocks:
![img](../images/IMG_0995.png)

This way the leader knows which request was sent first, because the timestamps increase with every send.

2. 
Centralized: Low message overhead (3 per entry) but the leader is a single point of failure.  

Token Ring: Variable delay to enter, constant background traffic. Failure of any node or token loss breaks the ring.  

Ricart & Agrawala: High message overhead (2*(N−1) per entry) and poor scalability.  

Maekawa: Moderate overhead (~3√N per entry) but can deadlock.  

**Fault Tolerance:**  
The Token Ring is generally *more fault-tolerant* than the centralized (leader-based) algorithm because no single node controls access. In a ring, a crashed node or a lost token can be detected and the ring repaired around it, whereas a centralized system halts if the leader fails, at least until a new leader is elected.
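To make the overhead figures concrete, the per-entry message counts quoted above can be tabulated for a given number of processes. This is only arithmetic on those formulas; the function name `messages_per_entry` is hypothetical, and the token ring is left out because its cost depends on where the token currently is.

```python
import math
from typing import Dict


def messages_per_entry(n: int) -> Dict[str, float]:
    """Per-entry message counts for n processes, using the formulas above."""
    return {
        "centralized": 3,                 # request, grant, release
        "ricart_agrawala": 2 * (n - 1),   # a request and a reply per peer
        "maekawa": 3 * math.sqrt(n),      # approximate, quorum of about sqrt(n)
    }


costs = messages_per_entry(16)
```

For 16 processes the gap is already visible: Ricart & Agrawala needs ten times as many messages per entry as the centralized algorithm.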

3. 
**ME3 relevance in a fully synchronous, single-threaded system:**  
No, ME3 (entry in happened-before order) is largely irrelevant there. A fully synchronous system with single-threaded processes already enforces a global round/real-time ordering, so entries naturally follow the system schedule.  
Mitigation without algorithm change: use the system clock/round number or message timestamps to decide entry order; that requires no change to the mutual-exclusion protocol logic.

4. 
Let $T_{sync}$ = synchronization delay (round-trip / negotiation cost) and $T_{cs}$ = critical-section duration.

Maximum throughput (entries per unit time) is given by:
$$\text{Throughput}_{\max} = \frac{1}{T_{sync} + T_{cs}}$$

If $T_{cs}$ is negligible, the formula simplifies to:
$$\text{Throughput}_{\max} \approx \frac{1}{T_{sync}}$$
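
As a worked example with assumed values (chosen only for illustration, not taken from the exercise), take $T_{sync} = 2\,\text{ms}$ and $T_{cs} = 8\,\text{ms}$:

$$\text{Throughput}_{\max} = \frac{1}{2\,\text{ms} + 8\,\text{ms}} = 100 \text{ entries/s}$$

If the critical section is negligible, the same $T_{sync}$ gives:

$$\text{Throughput}_{\max} \approx \frac{1}{2\,\text{ms}} = 500 \text{ entries/s}$$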

5. 
Modify the server to maintain `tokenOwner`, a waiting queue, and a token epoch. On a failure report, remove the client from the queue; if it was the `tokenOwner`, increment the epoch, clear `tokenOwner`, and grant a new token (tagged with the new epoch) to the next requester.
With a correct (reliable) failure detector this tolerates crash-stop clients. If a token holder is wrongly suspected and the server reissues the token, duplicate tokens (and thus a safety violation) can occur unless epoch/version checks or leases are added so that stale tokens are discarded.
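The server-side bookkeeping described above can be sketched as follows. This is a minimal single-threaded model under stated assumptions: the class `TokenServer` and its method names are hypothetical, messages are modelled as method calls, and each method returns the grant `(client, epoch)` it would send, or `None`.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Grant = Tuple[str, int]  # (client id, epoch of the token)


class TokenServer:
    def __init__(self) -> None:
        self.token_owner: Optional[str] = None
        self.waiting: Deque[str] = deque()
        self.epoch = 0

    def _grant_next(self) -> Optional[Grant]:
        # Hand the token to the next waiting client if it is free.
        if self.token_owner is None and self.waiting:
            self.token_owner = self.waiting.popleft()
            return (self.token_owner, self.epoch)
        return None

    def on_request(self, client: str) -> Optional[Grant]:
        self.waiting.append(client)
        return self._grant_next()

    def on_release(self, client: str, epoch: int) -> Optional[Grant]:
        # Ignore releases of stale tokens (wrong owner or old epoch).
        if client != self.token_owner or epoch != self.epoch:
            return None
        self.token_owner = None
        return self._grant_next()

    def on_crash(self, client: str) -> Optional[Grant]:
        # Failure report from the detector: forget the client entirely.
        if client in self.waiting:
            self.waiting.remove(client)
        if client == self.token_owner:
            self.epoch += 1            # invalidate the lost token
            self.token_owner = None
        return self._grant_next()


server = TokenServer()
first = server.on_request("a")         # a gets the token in epoch 0
blocked = server.on_request("b")       # b has to wait
reissued = server.on_crash("a")        # a crashes while holding the token
stale = server.on_release("a", 0)      # a late release from a is ignored
```

The epoch check only protects the server's own state. If a wrongly suspected client is still inside the critical section, mutual exclusion is violated regardless, unless the shared resource also checks the epoch.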

6. 
TODO


7. 
Ricart & Agrawala's algorithm is inefficient in this scenario because it has a high fixed overhead: every entry to the critical section costs 2(N−1) messages, even when no other process is competing for it.
Performance improves significantly if a process is allowed to cache the permissions it has already collected.
The adapted rule is simple: when a process exits the critical section, it checks whether it has deferred any requests.
- If no other process is waiting, it keeps the permission and can re-enter immediately with zero message cost.
- If another process has sent a REQUEST, it must release the lock by replying, following the original protocol.

ME2 (liveness) still holds, provided a process that holds cached permission while outside the critical section replies to an incoming REQUEST immediately and asks again before its next entry.
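
The saving can be illustrated with a small message-count model. It is a sketch under strong simplifications: entries happen one at a time, so timestamps and deferred replies are not modelled, and the names `CachingProcess` and `enter` are hypothetical.

```python
from typing import Dict, Iterable, Set


class CachingProcess:
    def __init__(self, pid: int, peers: Iterable[int]) -> None:
        self.pid = pid
        self.peers: Set[int] = set(peers)
        self.granted: Set[int] = set()   # peers whose permission is cached

    def missing(self) -> Set[int]:
        return self.peers - self.granted


def enter(requester: CachingProcess, processes: Dict[int, CachingProcess]) -> int:
    """Return the number of messages needed for one uncontended entry."""
    messages = 0
    for peer_id in sorted(requester.missing()):
        messages += 2                                   # REQUEST + REPLY
        requester.granted.add(peer_id)
        # By replying, the peer gives up any permission it had cached
        # from the requester and must ask again next time.
        processes[peer_id].granted.discard(requester.pid)
    return messages


n = 4
processes = {
    pid: CachingProcess(pid, [q for q in range(n) if q != pid])
    for pid in range(n)
}
repeated = [enter(processes[0], processes) for _ in range(3)]
other = enter(processes[1], processes)
again = enter(processes[0], processes)
```

Only the first entry of process 0 pays the full 2(N−1) messages. After process 1 has entered once, process 0 needs to ask process 1 alone.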


8. 
```
Implements:
    Dekker, instance D.

upon event <D, Init> do:
    flag := [FALSE, FALSE]        // flag[i] = TRUE if process i wants to enter CS
    turn := 0                     // whose turn it is to enter the critical section
    self_id := get_process_id()   // Assume this function returns 0 or 1
    other_id := 1 - self_id

upon event <D, request> do:
    flag[self_id] := TRUE
    while flag[other_id] do
        if turn ≠ self_id then
            flag[self_id] := FALSE
            wait until turn = self_id
            flag[self_id] := TRUE
        end if
    end while
    trigger <D, enter>         // enter critical section

upon event <D, release> do:
    turn := other_id
    flag[self_id] := FALSE
    trigger <D, exit>          // indicate critical section is released

```
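
As a sanity check, the pseudocode above can be encoded as a small state machine and every interleaving of the two processes explored exhaustively. This is a sketch that assumes sequential consistency, where each read or write of `flag` and `turn` is one atomic step; it says nothing about weaker memory models such as TSO. The program-counter names are hypothetical labels for the lines of the pseudocode.

```python
from collections import deque

# Program counters, one per atomic step of the pseudocode.
IDLE, TEST, CHECK_TURN, BACK_OFF, WAIT, CS, RETRY, EXIT = range(8)


def step(state, i):
    """Return the state after process i executes one atomic step."""
    pcs, flags, turn = list(state[0]), list(state[1]), state[2]
    other = 1 - i
    pc = pcs[i]
    if pc == IDLE:                      # flag[self] := TRUE
        flags[i], pcs[i] = True, TEST
    elif pc == TEST:                    # while flag[other] do
        pcs[i] = CHECK_TURN if flags[other] else CS
    elif pc == CHECK_TURN:              # if turn != self then
        pcs[i] = BACK_OFF if turn != i else TEST
    elif pc == BACK_OFF:                # flag[self] := FALSE
        flags[i], pcs[i] = False, WAIT
    elif pc == WAIT:                    # wait until turn = self
        if turn == i:
            pcs[i] = RETRY
    elif pc == RETRY:                   # flag[self] := TRUE
        flags[i], pcs[i] = True, TEST
    elif pc == CS:                      # release: turn := other
        turn, pcs[i] = other, EXIT
    elif pc == EXIT:                    # release: flag[self] := FALSE
        flags[i], pcs[i] = False, IDLE
    return (tuple(pcs), tuple(flags), turn)


def explore():
    """Breadth-first search over all states reachable by any interleaving."""
    initial = ((IDLE, IDLE), (False, False), 0)
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        for i in (0, 1):
            nxt = step(state, i)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


reachable = explore()
mutual_exclusion_holds = all(pcs != (CS, CS) for pcs, _, _ in reachable)
```

The search covers safety only (both processes are never in the critical section together); it does not check liveness or fairness.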

9. 


---

# LECTURE 06/10/2025

## EXERCISES

1.    We said that consensus is not consistency. How can you implement consistency using consensus? Which kind of consistency did you implement?

2.    Argue if non-Partition tolerant systems can still be considered distributed systems.

3.    Consider slide 21 (interleavings)
    .    Construct an interleaving that is not linearizable but sequentially consistent
    .    Construct an interleaving that is not sequentially consistent
    .    What is the minimum number of clients with which you can construct a non-sequentially consistent interleaving?

4.    What is the message complexity of passive replication? What is its delay?

5.    What is the meaning of "Sacrifice linearizability => offload reads to backups!"? What kind of consistency do you end up with?

6.    What is the message complexity of active replication? What is its delay?

7.    So, why should somebody use active replication?

8.    What is the message complexity in the case of the gossip architecture? What is its delay?

9.    Read operations in the gossip architecture:
    .    What happens when your second read operation ends on an outdated replica?

10.    Write operations in the gossip architecture:
    .    Should you apply all the updates in the log when you receive a read request?
    .    In slide 37, we say "actually, it uses an Executed operation table not to re-apply them, but keep them forever". Why can't we delete the updates?

11.    Chang-Roberts: how can you overcome a crash, after you detected it?

12.    In the bully algorithm, why is safety broken if:
    .    the deadline is too tight?
    .    process IDs reappear?
    .    the system is not synchronous?


## Solutions

1. Consistency is not consensus

Definition of consistency:
A consistency model specifies which orderings of operations the processes of a distributed system are allowed to observe. Strong models make all operations across processes appear in a single, unified order.

Definition of consensus:
It is the process of agreeing on a single data value among distributed processes or systems.

To implement consistency using consensus, we can use a consensus algorithm to agree on the order of operations. Each process proposes its operation to the consensus algorithm, which then decides on a single operation to be executed next. This ensures that all processes see the same sequence of operations, achieving consistency.
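Because every replica applies the same agreed sequence, the result is a strong form of consistency rather than eventual consistency: sequential consistency, or linearizability if reads are also ordered through consensus. Crashes do not weaken this guarantee as long as the consensus algorithm itself can still make progress.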
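
The idea of agreeing on one operation per position can be sketched as follows. The class `FirstProposalWins` is a hypothetical in-memory stand-in for a real consensus protocol (it decides each slot in favour of the first proposal it sees), and `Replica` is likewise an illustrative name; nothing here handles failures or real messaging.

```python
from typing import Dict, List, Tuple

Operation = Tuple[str, str, int]  # ("set", key, value)


class FirstProposalWins:
    """Stand-in for consensus: one decided operation per log slot."""

    def __init__(self) -> None:
        self.decided: List[Operation] = []

    def propose(self, slot: int, op: Operation) -> Operation:
        if slot > len(self.decided):
            raise ValueError("slots must be decided in order")
        if slot == len(self.decided):
            self.decided.append(op)      # first proposal for this slot wins
        return self.decided[slot]


class Replica:
    def __init__(self, log: FirstProposalWins) -> None:
        self.log = log
        self.state: Dict[str, int] = {}
        self.applied = 0                 # number of slots applied so far

    def _apply(self, op: Operation) -> None:
        _, key, value = op
        self.state[key] = value
        self.applied += 1

    def submit(self, op: Operation) -> None:
        # Propose op for successive slots until it is the decided value,
        # applying whatever was decided for each slot along the way.
        while True:
            decided = self.log.propose(self.applied, op)
            self._apply(decided)
            if decided == op:
                return

    def catch_up(self) -> None:
        while self.applied < len(self.log.decided):
            self._apply(self.log.decided[self.applied])


log = FirstProposalWins()
r1, r2 = Replica(log), Replica(log)
r1.submit(("set", "x", 1))
r2.submit(("set", "x", 2))   # loses slot 0, wins slot 1
r1.catch_up()
```

Both replicas end up applying the same two operations in the same order, so their states agree.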

2. 
Non-partition tolerant systems can't be considered distributed systems because network partitions are an unavoidable reality in distributed environments.

3. 

Definition of interleaving:
An interleaving is a sequence of operations from multiple processes that reflects the order in which they are executed in a concurrent system.

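3.1 - Operations issued by different processes can overlap in time. An interleaving is sequentially consistent but not linearizable when it preserves each process's program order yet places two non-overlapping operations against their real-time order. For example, client A completes write(x, 1) and only afterwards client B reads x and still gets the initial value 0: ordering the read before the write is a valid sequential order, but it contradicts real time.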
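
The difference between the two conditions can be checked mechanically on small histories. The sketch below is a brute-force checker for a single integer register with initial value 0; it is not taken from the slides, the names `Op`, `sequentially_consistent` and `linearizable` are hypothetical, and trying every permutation is only feasible for a handful of operations.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    client: str
    kind: str     # "w" (write) or "r" (read)
    value: int    # value written, or value the read returned
    start: float  # invocation time
    end: float    # response time


def is_legal(seq, initial=0):
    # Every read must return the most recently written value.
    current = initial
    for op in seq:
        if op.kind == "w":
            current = op.value
        elif op.value != current:
            return False
    return True


def keeps_program_order(seq):
    # Operations of the same client appear in the order they were issued.
    last_start = {}
    for op in seq:
        if op.start < last_start.get(op.client, float("-inf")):
            return False
        last_start[op.client] = op.start
    return True


def keeps_real_time_order(seq):
    # An operation that finished before another started must come first.
    for pos, later in enumerate(seq):
        if any(later.end < earlier.start for earlier in seq[:pos]):
            return False
    return True


def sequentially_consistent(history):
    return any(is_legal(p) and keeps_program_order(p)
               for p in permutations(history))


def linearizable(history):
    return any(is_legal(p) and keeps_real_time_order(p)
               for p in permutations(history))


# A finishes write(1) before B starts a read that returns the initial 0.
history = [Op("A", "w", 1, 0.0, 1.0), Op("B", "r", 0, 2.0, 3.0)]
is_sc = sequentially_consistent(history)
is_lin = linearizable(history)
```

For this history the checker reports sequential consistency but not linearizability.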

3.2 - 

3.3 - 

4. 


5. 


6. 


7. 


8. 



# LECTURE 07/10/2025

1. Why do you have non-POSIX primitives in GFS? In particular, which requirements led you to needing:
- snapshot
- record append

2. Your GFS is configured with the usual chunk size (64 MB).
   Your GFS master is storing this metadata:
   - /foo.txt: (0x2ef0, 172.31.177.226:8081) (0x2551, 172.31.177.223:8082)
   - /bar.txt: (0x144f, 172.31.177.223:8082) (0xaaaa, 172.31.177.226:8081) (0x9233, 172.31.177.226:8081)

   Your client wants to access the following data in its filesystem. Write down the queries that the client will issue to the GFS master, and what the GFS master will send back to the client.
- /foo.txt, byte 1000
- /bar.txt, first byte in its 66th MB
- /foo.txt, first byte in its 130th MB

3. If I decide to append 10 MB to a file that is already 60 MB long, what actions does the GFS master take?

4. We said that GFS uses some kind of passive replication.
   Which characteristics of passive replication are respected by GFS implementation?
   Which aspects of GFS implementation are not compliant with the usual passive replication approach?

5. Regarding the fault tolerance of the GFS master, we talked about the operations log and the checkpoints.
   Why do we need two different mechanisms?
   Where are operations logs and checkpoints saved?
   How do we decide which one to use, between operations log and checkpoints?

6. How many computers are usually served by one Chubby cell?

7. Why does Chubby use Multi-Paxos instead of Paxos?

8. In Chubby client sessions, 
   what happens to the client sessions when a client crashes? Why?
   what happens to the client sessions when a master crashes? Why? (feel free to consider or ignore the "jeopardy" mechanism)

9. In BigTable,
   why isn't the master serving meta-data to clients (such as on which tabletserver data are located)?
   which operations are the responsibility of the master?

10. In BigTable, what do you get from Chubby when you look for data?

11. Both GFS and Bigtable make the same core design choice – to have a single master. What are the repercussions of a failure of this single master in each case?