# <center>Introduction to MPI</center>

# Shared Memory System vs Distributed Memory System 

## Shared memory Systems 
All compute elements share access to the same memory space. There are two memory-access strategies:

### Uniform Memory Access (UMA)
![Uniform Memory Access (UMA)](./img/shared_mem.gif)


### Non-Uniform Memory Access (NUMA)
![Non-Uniform Memory Access (NUMA)](./img/numa.gif)

Both strategies operate within a single node, so processing capacity can only grow as far as that node allows. **(Vertical Scaling)**

When we need more capacity than one node can offer, we need a different approach. **(Horizontal Scaling)**

## Distributed Memory System

Adding multiple nodes, each with its own memory space, and making them work together as a single compute unit introduces the challenges of distributed computing.

![Distributed Memory System](./img/hybrid_mem.gif)

In [None]:
!lscpu  # Shows the CPU information and the NUMA node information; pay special attention to the NUMA lines
!numactl -H # Shows the NUMA hardware layout

## MPI (Message Passing Interface)

### Overview
The Message Passing Interface (MPI) is a standardized and portable message-passing system designed to function on parallel computing architectures. MPI is widely used for parallel programming in high-performance computing (HPC) environments.

MPI addresses the message-passing parallel programming model: data is **moved from the address space** of one process to that of another process through cooperative operations on each process.

### MPI Standard
The MPI standard defines the syntax and semantics of library routines that can be used to write portable message-passing programs in C, C++, and Fortran. The standard has gone through a number of revisions: MPI-3.1 is the version reported by the installation used in the examples below, MPI-4.x followed, and MPI-5.0 was approved on June 5, 2025. See the [MPI Standard](https://www.mpi-forum.org/docs/) documents for details.

### MPI Implementations
There are several implementations of the MPI standard. Three of the most widely used are:
- **MPICH**: A high-performance and widely portable implementation of MPI.
- **Intel MPI**: Intel's implementation of MPI.
- **OpenMPI**: An open-source MPI implementation that is developed and maintained by a consortium of academic, research, and industry partners.

OpenMPI provides the following compiler wrapper scripts for building MPI programs on Linux clusters:


|Implementation|Language|Script name|Underlying compiler|
| --- | --- | --- | --- |
|Open MPI|C|mpicc|C compiler for loaded compiler package|
| |C++|- mpiCC <br/> - mpic++ <br/> - mpicxx|C++ compiler for loaded compiler package|
| |Fortran|- mpif77 <br/> - mpif90|Fortran77 compiler for loaded compiler package <br/>Fortran90 compiler for loaded compiler package. Points to mpifort.|




## Setting Up the Environment
To start programming with MPI in C or C++, you need to have an MPI library installed. For this tutorial, we'll use OpenMPI. Below are the steps to install and compile MPI programs using GCC and OpenMPI.

### Installation of OpenMPI
You can install OpenMPI on a Unix-based system using a package manager. For example, on Ubuntu, you can use:
```bash
sudo apt-get update
sudo apt-get install openmpi-bin openmpi-common libopenmpi-dev
```
***Note:*** All libraries and wrappers are already installed on the cluster, so please do not run these commands there.

## Compiling MPI Programs
MPI programs are compiled using the `mpicc` or `mpiCC` compiler wrappers, which are part of the OpenMPI package. These wrappers call the underlying compiler (e.g., GCC) with the correct flags and libraries.


`hello_world.c`
```C
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    printf("Hello world from rank %d out of %d processors\n", world_rank, world_size);

    MPI_Finalize();
    return 0;
}

```


Let's compile the program using the MPI compiler wrapper:

In [22]:
!mpicc code/hello_world.c -o code/hello_world 

This should create an executable file called `hello_world`. Now run the program with the `mpirun` command, or submit it to the cluster through the Slurm job manager with the `srun` command and the parameters shown below:

In [3]:
#!mpirun -np 4 ./hello_world # This example uses 4 processes on the same machine 
!srun -N 2 -n 4 --ntasks-per-node=2 ./hello_world  # This example uses 4 processes on 2 nodes

I am rank 3 out of 4 on host ss-19 
I am rank 1 out of 4 on host ss-18 
I am rank 2 out of 4 on host ss-19 
I am rank 0 out of 4 on host ss-18 


Note that the execution block is enclosed in a function called `main()`, which returns the value 0 if it is completed successfully. The declaration of `main()` is mandatory in C/C++.


## Unit 1: MPI Basics

### Subtopic 1.1: MPI Initialization and Finalization
#### Explanation
- **MPI_Init**: Initializes the MPI execution environment.
- **MPI_Finalize**: Terminates the MPI execution environment.

#### Example: Initialization and Finalization basic.cpp
```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    int version, subversion;

    MPI_Init(&argc, &argv);
    // During MPI_Init, all of MPI's global and internal variables are constructed.
    printf("MPI environment initialized.\n");
    // MPI_Get_version reports the version and subversion of the MPI standard
    // that the implementation supports.
    MPI_Get_version(&version, &subversion);
    printf("MPI Version: %d.%d\n", version, subversion);
    MPI_Finalize();
    // MPI_Finalize cleans up the MPI environment. No more MPI calls can be made after this one.
    printf("MPI environment finalized.\n");
    return 0;
}
```

In [7]:
!mpicc code/basic_mpi.cpp -o code/unit_1_basic
!srun -N 2 -n 4 --ntasks-per-node=2 ./code/unit_1_basic

MPI environment initialized.
MPI Version: 3.1
MPI environment finalized.
MPI environment initialized.
MPI Version: 3.1
MPI environment finalized.
MPI environment initialized.
MPI Version: 3.1
MPI environment finalized.
MPI environment initialized.
MPI Version: 3.1
MPI environment finalized.


## Basic Anatomy of an MPI Program

### Simple Communication

![Simple MPI Send and Receive ](https://cvw.cac.cornell.edu/mpip2p/intro/SimpleSendAndRecv.gif)


```c
// Include the MPI and standard I/O headers
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialization
    MPI_Init(&argc, &argv);

    // Get this process's rank and the size of the communicator
    int world_rank;
    int world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the name of the node this process runs on
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);
    printf("Running on node: %s\n", processor_name);
    printf("Number of ranks: %d\n", world_size);

    // Branch on the rank to decide what each process does.
    // By convention, rank 0 usually acts as the master process.
    if (world_rank == 0) {
        int data = 100;
        // Send one int to rank 1 with tag 0
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("Process 0 sent data %d to process 1\n", data);
    } else if (world_rank == 1) {
        int data;
        // Receive one int from rank 0 with tag 0
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received data %d from process 0\n", data);
    }

    MPI_Finalize();
    printf("MPI environment finalized.\n");
    return 0;
}
```

### Types of communications. 

In MPI (Message Passing Interface), communications are divided into two main **categories**: point-to-point and collective communications. Each category has subtypes that depend on synchronization, buffering, and topology.


### Point-to-Point Communication
#### Blocking communication
- **MPI_Send**: Sends a message to another process.
- **MPI_Recv**: Receives a message from another process.

![Simple MPI Blocking Communications ](https://cvw.cac.cornell.edu/mpip2p/communication-modes/SynchSendAndRecv.gif)



#### Example: Send and Receive
```c
#include <stdio.h>
#include "mpi.h"


int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv); // alt.: NULL,NULL
  printf("MPI environment initialized.\n");
  // Get this process's rank and the size of the communicator
  int world_rank;
  int world_size;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);
  printf("Running on node: %s\n",processor_name);
  printf("Number of ranks: %d\n",   world_size);
    if (world_rank == 0) {
        int data = 100;
        double start = MPI_Wtime();
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        double end = MPI_Wtime();
        printf("Time taken on send : %f seconds\n", end - start);
        printf("Process 0 sent data %d to process 1\n", data);
    } else if (world_rank == 1) {
        int data;
        double start = MPI_Wtime();
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        double end = MPI_Wtime();
        printf("Time taken on received : %f seconds\n", end - start);
        printf("Process 1 received data %d from process 0\n", data);
    }
  MPI_Finalize();
  printf("MPI environment finalized.\n");

  return 0;
}
```
This type of communication is **blocking**: the sender blocks until its buffer is safe to reuse, and the receiver blocks until the message has been received.

In [11]:
!mpicc code/blocking_comm.cpp -o code/blocking_comm
!srun -N 2 -n 4 --ntasks-per-node=2 ./code/blocking_comm

MPI environment initialized.
Running on node: ss-19
Number of ranks: 4
MPI environment finalized.
MPI environment initialized.
Running on node: ss-18
Number of ranks: 4
Time taken on received : 0.000238 seconds
Process 1 received data 100 from process 0
MPI environment finalized.
MPI environment initialized.
Running on node: ss-19
Number of ranks: 4
MPI environment finalized.
MPI environment initialized.
Running on node: ss-18
Number of ranks: 4
Time taken on send : 0.000020 seconds
Process 0 sent data 100 to process 1
MPI environment finalized.


### Buffered Send
The blocking buffered send MPI_Bsend copies the data from the message buffer to a user-supplied buffer and then returns. The data will be copied from the user-supplied buffer over the network once the "ready to receive" notification has arrived.
![Simple MPI Buffered Send Communications ](https://cvw.cac.cornell.edu/mpip2p/communication-modes/BuffSendAndRecv.gif)

```c
#include <stdio.h>
#include <stdlib.h>  // malloc, free
#include "mpi.h"


int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv); // alt.: NULL,NULL
  printf("MPI environment initialized.\n");
  // Get this process's rank and the size of the communicator
  int world_rank;
  int world_size;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);
  printf("Running on node: %s (rank %d of %d)\n", processor_name, world_rank, world_size);
  // Timestamps for the receive on rank 1
  double start_i, end_i;

    if (world_size < 2) {
        if (world_rank == 0) {
            fprintf(stderr, "This example needs at least two processes.\n");
        }
        MPI_Finalize();
        return 1;
    }

    if (world_rank == 0) {
        int data = 42;

        // Calculate size needed for the buffer
        int bufsize;
        MPI_Pack_size(1, MPI_INT, MPI_COMM_WORLD, &bufsize);
        // Add the per-message overhead to the buffer size.
        // MPI_BSEND_OVERHEAD is an upper bound, in bytes, on the memory overhead
        // incurred each time MPI_Bsend or MPI_Ibsend is issued.
        bufsize += MPI_BSEND_OVERHEAD;

        // Allocate and attach buffer
        void* buffer = malloc(bufsize);
        if (buffer == NULL) {
            fprintf(stderr, "Could not allocate buffer\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        // Set up the buffer
        MPI_Buffer_attach(buffer, bufsize);
         
        // Send data using buffered send
        double start = MPI_Wtime();
        MPI_Bsend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        double end = MPI_Wtime();
        printf("Time taken: %f seconds\n", end - start);
        printf("Process 0 sent buffered data %d to process 1\n", data);

        // Detach buffer (waits for completion of internal copies)
        void* detached_buffer;
        int detached_size;
        MPI_Buffer_detach(&detached_buffer, &detached_size);
        free(detached_buffer);
    }

    else if (world_rank == 1) {
        int received;
        start_i = MPI_Wtime();
        MPI_Recv(&received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        end_i = MPI_Wtime();
        printf("Time taken: %f seconds\n", end_i - start_i);
        printf("Process 1 received data %d from process 0\n", received);
    }

    MPI_Finalize();
    return 0;
}
```

## Differences

The **real difference** between **buffered** and **non-buffered** communication in MPI lies in **how the send operation completes** and **who manages the memory and synchronization of the data transfer**.

---

### a. Conceptual Differences

| Aspect                 | **Buffered Communication (`MPI_Bsend`)**                                         | **Standard Communication (`MPI_Send`)**                                                    |
| ---------------------- | -------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
| **Completion**         | Returns once the message has been **copied** into the user-attached buffer.      | May block until the receiver has **started receiving** or MPI has safely handled the data. |
| **Buffer Management**  | **User** must allocate and manage a buffer via `MPI_Buffer_attach`.              | **MPI implementation** decides whether to use buffering or synchronous send.               |
| **Deadlock Potential** | Lower risk; sender does not wait for receiver.                                   | Higher risk; sender may block if receiver is not ready (especially in circular waits).     |
| **Use Case**           | Useful when you need to **avoid blocking** without using non-blocking functions. | Suitable for simple, tightly-coupled communication patterns.                               |
| **Control Level**      | Medium-level; user controls when to buffer, how much.                            | Low-level; MPI decides buffering or synchronization strategy.                              |

---

### b. Performance Considerations

| Factor               | Buffered Send (`MPI_Bsend`)                                                        | Standard Send (`MPI_Send`)                                                |
| -------------------- | ---------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| **Latency**          | Typically **higher** (extra memory copy).                                          | May be **lower** if receiver is ready.                                    |
| **Throughput**       | May degrade with large messages or limited buffer size.                            | Can be optimized by MPI via protocol (eager vs rendezvous).               |
| **Memory Footprint** | Requires explicit buffer allocation by the user.                                   | Depends on internal MPI behavior (may still use buffers internally).      |
| **Scalability**      | Less scalable due to user-managed buffers, especially with many messages or ranks. | More scalable when combined with non-blocking (`Isend/Irecv`) techniques. |

---
### c. Monitor System Resources

Use tools like:

* `valgrind` to check memory usage, and `gprof` for CPU-time profiling.
* `perf`, `mpiP`, or `TAU` for MPI event profiling.
* `top` / `htop` / `mpstat` for CPU and memory under pressure.

### Final Insight

In practice:

* Prefer **non-blocking (`Isend`/`Irecv`)** for high performance and overlapping communication/computation.
* Use `Bsend` **only when non-blocking is not viable** and you want simpler code **with a lower risk of deadlock**, provided the attached buffer is large enough.
* Use `Send` when communication is simple and tightly coordinated.


In [10]:
!mpicc code/buffered_comm.cpp -o code/buffered_comm
!srun -N 2 -n 4 --ntasks-per-node=2 code/buffered_comm

MPI environment initialized.
Running on node: ss-18 (rank 1 of 4)
Time taken: 0.000270 seconds
Process 1 received data 42 from process 0
MPI environment initialized.
Running on node: ss-19 (rank 3 of 4)
MPI environment initialized.
Running on node: ss-18 (rank 0 of 4)
Time taken: 0.000027 seconds
Process 0 sent buffered data 42 to process 1
MPI environment initialized.
Running on node: ss-19 (rank 2 of 4)


### Benchmarking both approaches


In [26]:
!mpicc -O2  code/mpi_bench_latency.cpp -o code/mpi_bench_latency
!srun -N 1 -n 2  code/mpi_bench_latency

Running benchmark on 2 MPI ranks...
[MPI_Send] size: 1 bytes, avg latency: 0.410186 us
[MPI_Send] size: 2 bytes, avg latency: 0.290445 us
[MPI_Send] size: 4 bytes, avg latency: 0.291977 us
[MPI_Send] size: 8 bytes, avg latency: 0.287632 us
[MPI_Send] size: 16 bytes, avg latency: 0.331509 us
[MPI_Send] size: 32 bytes, avg latency: 0.322792 us
[MPI_Send] size: 64 bytes, avg latency: 0.319868 us
[MPI_Send] size: 128 bytes, avg latency: 0.427865 us
[MPI_Send] size: 256 bytes, avg latency: 0.505741 us
[MPI_Send] size: 512 bytes, avg latency: 1.677368 us
[MPI_Send] size: 1024 bytes, avg latency: 1.889915 us
[MPI_Send] size: 2048 bytes, avg latency: 2.456696 us
[MPI_Send] size: 4096 bytes, avg latency: 6.586184 us
[MPI_Send] size: 8192 bytes, avg latency: 4.932796 us
[MPI_Send] size: 16384 bytes, avg latency: 5.934676 us
[MPI_Send] size: 32768 bytes, avg latency: 6.455069 us
[MPI_Send] size: 65536 bytes, avg latency: 8.161791 us
[MPI_Send] size: 131072 bytes, avg latency: 12.611481 us
[MPI_Se

In [29]:
# Only works if we have permission to read the kernel performance counters
!perf stat -e cache-misses,cache-references,context-switches,page-faults,cycles,instructions \
  mpirun -np 2 ./mpi_bench_latency

[sudo] password for ss2: Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 4:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)




### Non-Blocking communication

MPI provides both blocking and nonblocking point-to-point communication:

- **Blocking communication** means that the process waits to ensure the message data have achieved a particular state before processing can continue.
- **Nonblocking communication** means that the process merely requests to start an operation and continues processing.

**Nonblocking** calls merely initiate the communication process. The status of the data transfer, and the success of the communication, must be verified at a later point in the program. The purpose of a nonblocking send is mostly to notify the system of the existence of an outgoing message: the actual transfer might take place later. It is up to the programmer to keep the send buffer intact until it can be verified that the message has actually been copied someplace else. Likewise, a nonblocking receive signals the system that a buffer is prepared for an incoming message, without waiting for the actual data to arrive.


- **MPI_Isend / MPI_Irecv**: Initiate communication and return immediately; each requires `MPI_Wait` or `MPI_Test` to complete.
- **MPI_Wait / MPI_Waitall**: Block the caller until the operation (or operations) associated with the **MPI_Request** is complete.

```c
#include <mpi.h> //nonblocking_comm.cpp
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank;
    int world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Running on node: %s\n", processor_name);
    printf("Number of ranks: %d\n", world_size);

    if (world_rank == 0) {
        int data = 100;
        MPI_Request request;
        MPI_Isend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);  // Non-blocking send
        MPI_Wait(&request, MPI_STATUS_IGNORE);  // Wait for completion
        printf("Process 0 sent data %d to process 1\n", data);
    } else if (world_rank == 1) {
        int data;
        MPI_Request request;
        MPI_Irecv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);  // Non-blocking receive
        MPI_Wait(&request, MPI_STATUS_IGNORE);  // Wait for completion
        printf("Process 1 received data %d from process 0\n", data);
    }

    MPI_Finalize();
    printf("MPI environment finalized.\n");

    return 0;
}

```

In [2]:
!mpicxx code/nonblocking_comm.cpp -o code/nonblocking_comm
!srun -N 2 -n 4 --ntasks-per-node=2 ./code/nonblocking_comm

[01m[Kcc1plus:[m[K [01;31m[Kfatal error: [m[Kcode/nonblocking_comm.cpp.cpp: No such file or directory
compilation terminated.
slurmstepd-ss-19: error: execve(): /home/ss2/hpcsummer2025/material/Track3/./code/nonblocking_comm: No such file or directory
slurmstepd-ss-18: error: execve(): /home/ss2/hpcsummer2025/material/Track3/./code/nonblocking_comm: No such file or directory
slurmstepd-ss-19: error: execve(): /home/ss2/hpcsummer2025/material/Track3/./code/nonblocking_comm: No such file or directory
slurmstepd-ss-18: error: execve(): /home/ss2/hpcsummer2025/material/Track3/./code/nonblocking_comm: No such file or directory
srun: error: ss-19: tasks 2-3: Exited with exit code 2
srun: error: ss-18: tasks 0-1: Exited with exit code 2


### Exercise No 1
Create a program that uses blocking and nonblocking communication to compute a distributed sum, compile it, and run it on two nodes with 4 processes each.

In [None]:
!mpicc exercises/exercise1.cpp -o exercises/exercise1 #for blocking communication
!srun -N #TODO# --ntasks-per-node=2 ./exercises/exercise1
!mpicc exercises/exercise2.cpp -o exercises/exercise2 #for nonblocking communication
!srun -N #TODO# --ntasks-per-node=2 ./exercises/exercise2

In [8]:
!mpic++ solutions/exercise2.cpp -o solutions/exercise2
!srun -N 3 -n 6 --ntasks-per-node=2 solutions/exercise2

Total sum computed at root: 21



## MPI Communicators

Collective communication involves all the processes in a **communicator**. The purpose of collective communication is to manipulate a shared piece or set of information. The collective communication routines were built upon point-to-point communication routines. You could build your own collective communication routines in this way, but it might involve a lot of tedious work and might not be as efficient.

Parallel applications in a **distributed memory environment** sometimes require explicit or implicit synchronization. Like other message-passing libraries, MPI provides a routine, **MPI_BARRIER**, to synchronize all processes within a communicator. A barrier is simply a synchronization primitive. Any process calling it will be blocked until all the processes within the group have called it. Once all the processes in the communicator group have reached the barrier, the function will return, and all processes in the group can continue.

MPI provides **three categories** of collective data-movement routines in which one process either sends to or receives from **all processes**: broadcast, gather, and scatter. There are also **allgather and alltoall routines**, which require all processes both to send and receive data. The gather, scatter, allgather, and alltoall routines have variable-data versions. For their variable data ("v") versions, each process can send and/or receive a different number of elements. The MPI collective data-movement routines are:

- broadcast
- gather, gatherv
- scatter, scatterv
- allgather, allgatherv
- alltoall, alltoallv

#### Explanation
- **MPI_COMM_WORLD**: Default communicator including all processes.
- **MPI_Comm_size**: Determines the size of the group associated with a communicator.
- **MPI_Comm_rank**: Determines the rank of the calling process in the communicator.


## MPI Collective Communication

The types of collective communication in MPI are:
![Collective](./img/collective_comm.gif)
<br>
![MPI_AllGather](https://mpitutorial.com/tutorials/mpi-scatter-gather-and-allgather/allgather.png)

### Broadcast
#### Explanation
- **MPI_Bcast**: Broadcasts a message from the process with rank "root" to all other processes in the communicator.

#### Example: Broadcast
```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    int data = 0;
    if (world_rank == 0) {
        data = 100;
    }
    MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d received data %d\n", world_rank, data);

    MPI_Finalize();
    return 0;
}
```

### MPI Data Types

To exchange data between nodes, the sender and the receiver must use matching data types so that values are interpreted the same way on both sides. MPI predefines its primitive data types:


|C Data Types 1|C Data Types 2|
| --- | --- |
|MPI_CHAR<br/>MPI_WCHAR<br/>MPI_SHORT<br/>MPI_INT<br/>MPI_LONG<br/>MPI_LONG_LONG_INT<br/>MPI_LONG_LONG<br/>MPI_SIGNED_CHAR<br/>MPI_UNSIGNED_CHAR<br/>MPI_UNSIGNED_SHORT<br/>MPI_UNSIGNED_LONG<br/>MPI_UNSIGNED<br/>MPI_FLOAT<br/>MPI_DOUBLE<br/>MPI_LONG_DOUBLE|MPI_C_COMPLEX<br/>MPI_C_FLOAT_COMPLEX<br/>MPI_C_DOUBLE_COMPLEX<br/>MPI_C_LONG_DOUBLE_COMPLEX<br/>MPI_C_BOOL<br/>MPI_LOGICAL<br/>MPI_INT8_T<br/>MPI_INT16_T<br/>MPI_INT32_T<br/>MPI_INT64_T<br/>MPI_UINT8_T<br/>MPI_UINT16_T<br/>MPI_UINT32_T<br/>MPI_UINT64_T<br/>MPI_BYTE<br/>MPI_PACKED|



|MPI Reduction Operation|Meaning|C Data Types|
| --- | --- | --- |
|MPI_MAX|maximum|integer, float|
|MPI_MIN|minimum|integer, float|
|MPI_SUM|sum|integer, float|
|MPI_PROD|product|integer, float|
|MPI_LAND|logical AND|integer|
|MPI_BAND|bit-wise AND|integer, MPI_BYTE|
|MPI_LOR|logical OR|integer|
|MPI_BOR|bit-wise OR|integer, MPI_BYTE|
|MPI_LXOR|logical XOR|integer|
|MPI_BXOR|bit-wise XOR|integer, MPI_BYTE|
|MPI_MAXLOC|max value and location|float, double and long double|
|MPI_MINLOC|min value and location|float, double and long double|


### Exercise No 2

Compile `distributed_sum.c` with MPI support and submit it to the job manager to run. Try several runtime configurations, for example 2 nodes with 2 processes per node, 2 nodes with 4 processes per node, and also 3 and 4 nodes. Comment on the results.