# 08/09/2025

## EXERCISES:

### QUESTIONS

1. How might the clocks in two computers that are linked by a local network be
synchronized without reference to an external time source? What factors limit the
accuracy of the procedure you have described? How could the clocks in a large number of computers connected by the Internet be synchronized? Discuss the accuracy of that procedure.

2. What is the main disadvantage of distributed systems which exploit the infrastructure offered by the Internet? How can this be overcome?


3. The host computers used in peer-to-peer systems are often simply desktop computers in users’ offices or homes. What are the implications of this for the availability and security of any shared data objects that they hold and to what extent can any weaknesses be overcome through the use of replication?


4. There exist services (e.g., the Network Time Protocol service) that can be used to synchronize computer clocks. Explain why, even with these services, no guaranteed bound is given for the difference between two clocks.



5. Speedup Calculation
A program spends 60% of its execution time in a part that can be parallelized. The rest (40%) must remain sequential.
According to Amdahl's Law, what is the maximum theoretical speedup if the parallel part is executed on:
a) 2 processors
b) 4 processors
c) An infinite number of processors

6. Finding the Parallel Fraction
An application achieves a speedup of 5 when running on 8 processors. Use Amdahl's Law to determine the fraction of the program that was parallelized.

### ANSWERS

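1. If two computers are connected through a local network, they can synchronize their clocks without relying on an external time source by exchanging their current time values. One machine sends a request, the other replies with its clock reading, and the requester measures the round-trip time. Taking half of the round trip as the one-way delay, it can calculate the difference between the clocks and adjust its own clock accordingly. Accuracy is limited by the variability and asymmetry of the message delay, by the time each machine spends handling the message, and by the drift of the clocks between synchronizations. A large number of computers on the Internet can be synchronized with a hierarchical service such as the Network Time Protocol (NTP). Because Internet delays are larger and more variable than on a local network, the accuracy is worse: typically on the order of tens of milliseconds, compared with about a millisecond on a local network.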
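
The delay estimate in answer 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `estimate_offset` and its parameters are hypothetical, the remote machine's processing time is taken to be negligible, and the delay is assumed to be the same in both directions.

```python
def estimate_offset(t_send, t_server, t_recv):
    """Estimate how far the remote clock is ahead of the local one.

    t_send and t_recv are local clock readings taken when the request left
    and when the reply arrived; t_server is the remote clock reading carried
    in the reply. All values are in seconds.
    """
    round_trip = t_recv - t_send
    # Assumption: the reply spent half of the round trip on the wire.
    estimated_remote_now = t_server + round_trip / 2
    offset = estimated_remote_now - t_recv
    # If the two directions are not equally fast, the estimate can be
    # wrong by up to half the round trip.
    max_error = round_trip / 2
    return offset, max_error


offset, max_error = estimate_offset(t_send=100.000, t_server=105.010, t_recv=100.020)
print(f"adjust local clock by {offset:+.3f} s (error up to {max_error:.3f} s)")
```

The returned error bound makes the limiting factor explicit: the shorter and more stable the round trip, the tighter the synchronization.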

2. A major disadvantage of distributed systems is the dependency between components: a failure in one can affect the whole system. This issue can be mitigated through redundancy, ensuring backup resources are available when failures occur.

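3. The hosts in a peer-to-peer system are typically just desktop machines in homes or offices. Availability is poor because each object lives on a single device that may be switched off or disconnected at its owner's convenience, and security is weak because such machines are not professionally administered. Replicating each shared object on several hosts improves availability, since the object stays reachable as long as at least one replica is online. Replication alone does not protect the data held on a compromised host.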

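4. No guaranteed bound can be given because synchronization messages travel over a network whose delays are variable and have no known upper limit. A service such as NTP can only estimate the delay, so its reading of the remote clock always carries some error, and each clock keeps drifting at its own rate between synchronizations. Even when the time source is an atomic clock, absolute precision cannot be achieved because of these unavoidable communication latencies.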

5. Amdahl’s Law is given by:
   $$
   S(N) = \frac{1}{(1 - P) + \frac{P}{N}}
   $$
   For a parallel fraction $P = 0.60$:
   - With $N = 2$ processors:
     $$
     S(2) = \frac{1}{(1 - 0.60) + \frac{0.60}{2}} \approx 1.43
     $$
   - With $N = 4$ processors:
     $$
     S(4) = \frac{1}{(1 - 0.60) + \frac{0.60}{4}} \approx 1.82
     $$
   - With infinitely many processors:
     $$
     S(\infty) = \frac{1}{1 - P} = \frac{1}{0.40} = 2.5
     $$
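
The three results in answer 5 can be checked numerically. The sketch below is a direct transcription of the formula; the function name `amdahl_speedup` is chosen here for illustration, and the infinite case relies on Python evaluating `p / math.inf` as `0.0`.

```python
import math


def amdahl_speedup(p, n):
    """Theoretical speedup for a parallel fraction p on n processors."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # With n = math.inf the term p / n becomes 0.0, leaving 1 / (1 - p).
    return 1.0 / ((1.0 - p) + p / n)


for n in (2, 4, math.inf):
    print(f"N = {n}: speedup = {amdahl_speedup(0.60, n):.2f}")
```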

6. Again, by Amdahl’s Law:
   $$
   S(N) = \frac{1}{(1 - P) + \frac{P}{N}}
   $$
   Given $S(N) = 5$ and $N = 8$:
   $$
   5 = \frac{1}{(1 - P) + \frac{P}{8}}
   $$
   Rearranging:
   $$
   (1 - P) + \frac{P}{8} = \frac{1}{5}
   $$
   $$
   1 - \frac{8P}{8} + \frac{P}{8} = 0.2
   $$
   $$
   1 - \frac{7P}{8} = 0.2
   $$
   $$
   0.8 = \frac{7P}{8}
   $$
   $$
   P = \frac{0.8 \times 8}{7} = \frac{6.4}{7} \approx 0.914
   $$
   Hence, the parallel fraction is approximately **91.4%**.
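
The rearrangement in answer 6 generalizes to $P = \frac{1 - 1/S}{1 - 1/N}$. A small sketch, with the hypothetical helper name `parallel_fraction`, solves for $P$ and then substitutes the result back into Amdahl's Law as a sanity check:

```python
def parallel_fraction(speedup, n):
    """Solve Amdahl's Law for the parallel fraction P, given S and N."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    if not 1.0 <= speedup <= n:
        raise ValueError("speedup must lie between 1 and n")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


p = parallel_fraction(speedup=5, n=8)
print(f"P = {p:.3f}")

# Sanity check: plugging P back into the formula should give S = 5 again.
recovered_speedup = 1.0 / ((1.0 - p) + p / 8)
print(f"S(8) = {recovered_speedup:.3f}")
```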


# LECTURE 09/09/2025

## EXERCISES

### QUESTIONS

- Exercise 1:
Explain under which assumptions the fail-recovery and the fail-silent models are similar
(note that in both models any process can commit omission faults).
- Exercise 2:
Show how to make a stubborn point-to-point link by calling an instance of a fair-loss link
API.
- Exercise 3:
Describe the implementation of a perfect failure detector for a synchronous system.
- Exercise 4:
Does the following statement satisfy the synchronous-computation assumption? 
On my server, no request ever takes more than 1 week to be processed.
- Exercise 5:
Is it possible to design a perfect failure detector for Byzantine faults?
- Exercise 6:
Assume you run a cloud application on your own and aim to provide an SLA for
potential customers. Would you offer 90%, 99%, or 99.9% uptime per month to the
customer? If you give a 99% per month SLA, how many crashes of your application do
you expect to have to deal with in a month, and how quickly do you think this can be
dealt with?
Provide a detailed calculation of how many minutes of downtime each option allows
you to fix issues.
- Exercise 7:
MS Azure SLA for VMs: https://www.azure.cn/en-us/support/sla/virtual-machines/
How much time do they get to fix an issue on their side for each SLA listed there? How
do you think they expect to achieve such high numbers? What do you think is their
sampling frequency (must search for this online)?
- Exercise 8:
Consider a network as seen here:

![image](../images/Screenshot%202025-09-09%20at%2009.35.19.png)

In this network, what (minimum) percentage of messages from node 1 to node 2 go through
node 6?
- Exercise 8.1:
Which links would you add so that this network can tolerate any two node crashes
without getting any partitions?

### ANSWERS

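1. The fail-recovery and fail-silent models are similar in that neither gives any certainty that a process is actually running. In the fail-silent model a crash halts the process but the failure is not reliably detected, while in the fail-recovery model a process may crash and recover later. Seen from the outside, a crash followed by a recovery looks like a period of omission faults. The two models are therefore similar if we assume that processes save every state update to stable storage, so that nothing is lost in a crash, and that a process is not aware that it has crashed and recovered.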
2. To create a stubborn point-to-point link using a fair-loss link API, we retransmit every message over the fair-loss link again and again.
A stubborn link is built upon a fair-loss link and guarantees that a message sent between two correct processes does reach its destination. It can be implemented with a "Retransmit Forever" algorithm on top of the underlying fair-loss link. The fair-loss link may drop any individual message, but a message that is sent over and over is not dropped every time. If we keep retransmitting all messages sent, in the limit we overcome any possible dropped messages and can guarantee that every message is delivered. The version below additionally uses acknowledgments, so that a message is no longer retransmitted once the receiver has confirmed it, and a `delivered` set, so that the receiver delivers each message only once.

```{r, tidy=FALSE, eval=FALSE, highlight=FALSE }
Implements:
    StubbornPointToPointLink, instance S

Uses:
    FairLossLink, instance F

upon event <S, Init> do
    retransmissionSet := ∅; // (destination, message) pairs not yet acknowledged
    delivered := ∅;         // (sender, message) pairs already delivered, prevents duplication
    starttimer(Δ);

// Sending a message m to a process q only records the pair;
// the actual transmission happens on the next timeout.
upon event <S, Send | q, m> do
    retransmissionSet := retransmissionSet ∪ {(q, m)};

// On every timeout, resend each pending message over F, labeled DATA.
upon event <Timeout> do
    forall (p, msg) ∈ retransmissionSet do
        trigger <F, Send | p, [DATA, msg]>;
    starttimer(Δ);

// When DATA arrives from q, always acknowledge it over F (an earlier ACK
// may have been lost), but deliver m through S only the first time.
upon event <F, Deliver | q, [DATA, m]> do
    trigger <F, Send | q, [ACK, m]>;
    if (q, m) ∉ delivered then
        delivered := delivered ∪ {(q, m)};
        trigger <S, Deliver | q, m>;

// Once q has acknowledged m, remove the pair so it is not retransmitted again.
upon event <F, Deliver | q, [ACK, m]> do
    retransmissionSet := retransmissionSet \ {(q, m)};
    
```
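
To see the retransmit-until-acknowledged idea from the pseudocode at work, here is a toy Python simulation. It is a sketch under assumptions, not a real network implementation: `FairLossLink` and `send_stubbornly` are hypothetical names, the link drops each message independently with a fixed probability, one loop iteration stands for one timeout, and `max_rounds` is only a guard to keep the simulation finite.

```python
import random


class FairLossLink:
    """Toy link that drops each message independently with probability `loss`."""

    def __init__(self, loss, rng):
        self.loss = loss
        self.rng = rng
        self.sent = 0  # number of transmissions attempted over this link

    def send(self, message, deliver):
        self.sent += 1
        if self.rng.random() >= self.loss:
            deliver(message)


def send_stubbornly(link, message, max_rounds=1000):
    """Retransmit `message` once per timeout round until an ACK comes back."""
    delivered = set()  # receiver side: suppresses duplicate deliveries
    acked = []         # sender side: filled in when an ACK arrives

    def on_ack(msg):
        acked.append(msg)

    def on_data(msg):
        delivered.add(msg)      # deliver at most once
        link.send(msg, on_ack)  # the ACK crosses the same lossy link

    for rounds in range(1, max_rounds + 1):
        link.send(message, on_data)  # one retransmission per timeout
        if acked:
            return rounds, delivered
    raise RuntimeError("no ACK within max_rounds")


link = FairLossLink(loss=0.5, rng=random.Random(42))
rounds, delivered = send_stubbornly(link, "hello")
print(f"acknowledged after {rounds} round(s), {link.sent} transmissions, delivered: {delivered}")
```

Even with half of all transmissions lost, the message is delivered exactly once and the sender eventually stops retransmitting. The cost is the extra traffic counted in `link.sent`.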

3. A perfect failure detector for a synchronous system can be built from heartbeat messages and timeouts:

```{r, tidy=FALSE, eval=FALSE, highlight=FALSE }

Implements:
    PerfectFailureDetector, instance P

Uses:
    PointToPointLinks, instance pl

// δ is the known maximum message delay, so a request and its reply
// take at most 2δ (processing time is assumed negligible).

upon event <P, Init> do
    alive := Π;      // processes that replied in the current round, initially all
    detected := ∅;   // processes already reported as crashed
    starttimer(2δ);

upon event <Timeout> do
    for all p ∈ Π do
        // No reply within a full round: in a synchronous system p must have crashed.
        if p ∉ alive ∧ p ∉ detected then
            detected := detected ∪ {p};
            trigger <P, Crash | p>;
        trigger <pl, Send | p, [HEARTBEAT_REQUEST]>;
    alive := ∅;
    starttimer(2δ);

upon event <pl, Deliver | q, [HEARTBEAT_REQUEST]> do
    trigger <pl, Send | q, [HEARTBEAT_REPLY]>;

upon event <pl, Deliver | q, [HEARTBEAT_REPLY]> do
    alive := alive ∪ {q};

```
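
The heartbeat-and-timeout idea can also be simulated in lock-step rounds. The Python sketch below is a simplification under assumptions: `run_detector` and its parameters are hypothetical names, one round stands for one timer period, and synchrony is modeled by letting every live process answer within the round.

```python
def run_detector(processes, crash_round, rounds):
    """Simulate a heartbeat-based failure detector in lock-step rounds.

    crash_round maps a process name to the round in which it crashes;
    processes missing from the map never crash. Because the system is
    assumed synchronous, every live process replies within the round.
    """
    detected = []      # (round, process) pairs, in detection order
    suspected = set()  # processes already reported as crashed
    for rnd in range(1, rounds + 1):
        # Heartbeats go to everyone; only processes still running reply in time.
        alive = {p for p in processes if crash_round.get(p, rounds + 1) > rnd}
        for p in processes:
            if p not in alive and p not in suspected:
                suspected.add(p)
                detected.append((rnd, p))
    return detected


events = run_detector(["p1", "p2", "p3"], crash_round={"p2": 3}, rounds=5)
print(events)
```

Under these assumptions a crashed process is reported exactly once, and a process that never crashes is never reported. Those are the two properties (completeness and accuracy) a perfect failure detector has to provide.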

4. Yes, that statement satisfies the synchronous-computation assumption.\
The core requirement for the synchronous model is that there is a known and finite upper bound on the time it takes to perform a computation. Here the bound is known (one week), even though it is very large.

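5. A perfect failure detector for Byzantine faults is generally considered impossible. A Byzantine fault is a condition where a component of a distributed system presents different symptoms to different observers. A faulty process can, for example, answer heartbeats from some processes while ignoring or misleading others, so the remaining processes only have imperfect information on whether a component has failed.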

6. 
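
For the downtime part of Exercise 6, the arithmetic can be sketched as follows. The 30-day month is an assumption made here for illustration (the exercise does not fix the month length), and `allowed_downtime_minutes` is a hypothetical helper name.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month, i.e. 43,200 minutes


def allowed_downtime_minutes(uptime_percent, total_minutes=MINUTES_PER_MONTH):
    """Minutes per month the service may be down without breaking the SLA."""
    return total_minutes * (1 - uptime_percent / 100)


for sla in (90, 99, 99.9):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla}% uptime: {minutes:.1f} minutes ({minutes / 60:.1f} hours) of downtime per month")
```

Under this assumption, 90% leaves 72 hours, 99% leaves about 7.2 hours, and 99.9% leaves about 43 minutes per month to detect and fix all issues combined.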