
# GPU PROGRAMMING ASSIGNMENT
# Name : Somanathan R
# Reg no : RA2111028010038

**Q1. Update Directive in OpenACC**

In [4]:
%%writefile update_directive_example.c

#include <stdio.h>
#include <stdlib.h>

#define N 10

int main() {
    float *a;
    int i;

    // Allocate memory for array 'a' on the host (CPU)
    a = (float *)malloc(N * sizeof(float));

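    // Create an uninitialized device copy of 'a' for the duration of this region
    #pragma acc data create(a[0:N])
    {
        // Initialize array 'a' on the device (GPU)
        #pragma acc parallel loop present(a[0:N])
        for (i = 0; i < N; i++) {
            a[i] = i;
        }

        // Update array 'a' on the host with values from the device.
        // 'a' is a pointer, so the array section a[0:N] must be given explicitly.
        #pragma acc update self(a[0:N])
    }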

    // Print the updated array
    printf("Array a after update:\n");
    for (i = 0; i < N; i++) {
        printf("%f\n", a[i]);
    }

    // Free allocated memory
    free(a);

    return 0;
}


Overwriting update_directive_example.c


In [5]:
!gcc -o update_directive_example -fopenacc update_directive_example.c
!./update_directive_example


Array a after update:
0.000000
1.000000
2.000000
3.000000
4.000000
5.000000
6.000000
7.000000
8.000000
9.000000
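
The example above only moves data from the device to the host. As a rough sketch of the opposite direction, the function below uses `update device` to push host-side changes to an existing device copy before computing on it, then `update self` to bring the result back. The name `scale_after_host_edit` and its parameters are hypothetical and not part of the assignment code.

```c
// Hypothetical helper (not from the assignment): shows both directions
// of the update directive inside one structured data region.
void scale_after_host_edit(float *a, int n) {
    // Copy a[0:n] to the device when the region starts
    #pragma acc data copyin(a[0:n])
    {
        // Host-side change made while the device copy already exists
        for (int i = 0; i < n; i++) {
            a[i] += 1.0f;
        }

        // Push the modified host values to the device copy
        #pragma acc update device(a[0:n])

        // Double every element on the device
        #pragma acc parallel loop present(a[0:n])
        for (int i = 0; i < n; i++) {
            a[i] *= 2.0f;
        }

        // Pull the device results back to the host
        #pragma acc update self(a[0:n])
    }
}
```

Called on an array holding 0, 1, 2, 3, this should leave 2, 4, 6, 8 in the host array. Without the `update device` line, a build that really offloads would double the stale values copied in at region entry.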


**Q4. Example of loop optimization clauses**

In [11]:
%%writefile loop_optimization_example.c

#include <stdio.h>
#include <stdlib.h>

#define N 10

int main() {
    float A[N][N], B[N][N], C[N][N];
    int i, j;

    // Initialize matrices A and B
    printf("Matrix A:\n");
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            A[i][j] = i + j;
            printf("%6.1f ", A[i][j]);
        }
        printf("\n");
    }
    printf("\n");

    printf("Matrix B:\n");
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            B[i][j] = i - j;
            printf("%6.1f ", B[i][j]);
        }
        printf("\n");
    }
    printf("\n");

    // Compute element-wise addition of matrices A and B, storing the result in matrix C
    #pragma acc parallel loop collapse(2)
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            C[i][j] = A[i][j] + B[i][j];
        }
    }

    // Print a few elements of matrix C to verify the computation
    printf("First few elements of matrix C:\n");
    for (i = 0; i < 5 && i < N; i++) {
        for (j = 0; j < 5 && j < N; j++) {
            printf("%6.1f ", C[i][j]);
        }
        printf("\n");
    }

    return 0;
}



Overwriting loop_optimization_example.c


In [12]:
!gcc -o loop_optimization_example -fopenacc loop_optimization_example.c
!./loop_optimization_example


Matrix A:
   0.0    1.0    2.0    3.0    4.0    5.0    6.0    7.0    8.0    9.0 
   1.0    2.0    3.0    4.0    5.0    6.0    7.0    8.0    9.0   10.0 
   2.0    3.0    4.0    5.0    6.0    7.0    8.0    9.0   10.0   11.0 
   3.0    4.0    5.0    6.0    7.0    8.0    9.0   10.0   11.0   12.0 
   4.0    5.0    6.0    7.0    8.0    9.0   10.0   11.0   12.0   13.0 
   5.0    6.0    7.0    8.0    9.0   10.0   11.0   12.0   13.0   14.0 
   6.0    7.0    8.0    9.0   10.0   11.0   12.0   13.0   14.0   15.0 
   7.0    8.0    9.0   10.0   11.0   12.0   13.0   14.0   15.0   16.0 
   8.0    9.0   10.0   11.0   12.0   13.0   14.0   15.0   16.0   17.0 
   9.0   10.0   11.0   12.0   13.0   14.0   15.0   16.0   17.0   18.0 

Matrix B:
   0.0   -1.0   -2.0   -3.0   -4.0   -5.0   -6.0   -7.0   -8.0   -9.0 
   1.0    0.0   -1.0   -2.0   -3.0   -4.0   -5.0   -6.0   -7.0   -8.0 
   2.0    1.0    0.0   -1.0   -2.0   -3.0   -4.0   -5.0   -6.0   -7.0 
   3.0    2.0    1.0    0.0   -1.0   -2.0   -3.0   -4.0 
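
The `collapse(2)` loop in the program above relies on implicit data movement for its fixed-size arrays. A possible variant for heap-allocated, row-major matrices is sketched below, with the transfers spelled out through `copyin` and `copyout`. The function name `mat_add` and the flattened `i * n + j` indexing are assumptions made for illustration.

```c
// Hypothetical helper (not from the assignment): element-wise addition of
// two n x n row-major matrices with explicit data clauses.
void mat_add(const float *A, const float *B, float *C, int n) {
    // collapse(2) merges both loops into a single n*n iteration space.
    // A and B are only read on the device; C is only written there.
    #pragma acc parallel loop collapse(2) copyin(A[0:n*n], B[0:n*n]) copyout(C[0:n*n])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            C[i * n + j] = A[i * n + j] + B[i * n + j];
        }
    }
}
```

Collapsing is mainly useful when the outer loop alone has too few iterations to keep the device busy, as with `N` set to 10 here.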

**Q2. Data directive in OpenACC**

In [13]:
%%writefile openacc_data_directive.c

#include <stdio.h>
#include <stdlib.h>

#define N 10

int main() {
    float *a;
    int i;

    // Allocate memory for array 'a' on the host (CPU)
    a = (float *)malloc(N * sizeof(float));

    // Initialize array 'a' on the host
    for (i = 0; i < N; i++) {
        a[i] = i;
    }

    // Print the initial values of array 'a'
    printf("Initial values of array a:\n");
    for (i = 0; i < N; i++) {
        printf("%6.1f ", a[i]);
    }
    printf("\n");

    // Transfer array 'a' to the device (GPU)
    #pragma acc enter data copyin(a[0:N])

    // Perform some computations on the device (GPU); 'a' is already present there
    #pragma acc parallel loop present(a[0:N])
    for (i = 0; i < N; i++) {
        a[i] *= 2;
    }

    // Transfer array 'a' back to the host (CPU)
    #pragma acc exit data copyout(a[0:N])

    // Print the updated values of array 'a'
    printf("Updated values of array a after computation:\n");
    for (i = 0; i < N; i++) {
        printf("%6.1f ", a[i]);
    }
    printf("\n");

    // Free allocated memory
    free(a);

    return 0;
}


Writing openacc_data_directive.c


In [15]:
!gcc -o openacc_data_directive -fopenacc openacc_data_directive.c
!./openacc_data_directive


Initial values of array a:
   0.0    1.0    2.0    3.0    4.0    5.0    6.0    7.0    8.0    9.0 
Updated values of array a after computation:
   0.0    2.0    4.0    6.0    8.0   10.0   12.0   14.0   16.0   18.0 
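
The program above uses the unstructured `enter data` / `exit data` pair. When the data lifetime fits inside a single block, a structured `data` construct is one alternative way to express the same transfers. The sketch below assumes a hypothetical function `double_in_place`, which is not part of the assignment code.

```c
// Hypothetical helper (not from the assignment): structured counterpart of
// the enter data copyin / exit data copyout pair.
void double_in_place(float *a, int n) {
    // copy = copyin at region entry plus copyout at region exit
    #pragma acc data copy(a[0:n])
    {
        // The array is already on the device, so only assert its presence
        #pragma acc parallel loop present(a[0:n])
        for (int i = 0; i < n; i++) {
            a[i] *= 2.0f;
        }
    }
}
```

The unstructured form is the better fit when allocation and release happen in different functions. The structured form keeps the transfer points visible in one place.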


**Q3. Example of worker and vector operations**

In [16]:
%%writefile openacc_worker_vector.c

#include <stdio.h>

#define N 100

int main() {
    float A[N], B[N], C[N];
    int i;

    // Initialize arrays A and B
    for (i = 0; i < N; i++) {
        A[i] = i;
        B[i] = 2 * i;
    }

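    // Perform element-wise addition of arrays A and B, storing the result in array C.
    // 'gang' partitions the iterations across gangs; without it, every gang
    // would execute the whole loop redundantly.
    #pragma acc parallel loop gang worker vector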
    for (i = 0; i < N; i++) {
        C[i] = A[i] + B[i];
    }

    // Print the first few elements of array C
    printf("First few elements of array C:\n");
    for (i = 0; i < 5 && i < N; i++) {
        printf("%6.1f ", C[i]);
    }
    printf("\n");

    return 0;
}


Writing openacc_worker_vector.c


In [17]:
!gcc -o openacc_worker_vector -fopenacc openacc_worker_vector.c
!./openacc_worker_vector


First few elements of array C:
   0.0    3.0    6.0    9.0   12.0 
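
The levels of parallelism are easier to tell apart on nested loops. In the sketch below, the outer loop is spread across gangs and the inner loop across workers and vector lanes, with a `reduction` clause combining the partial sums. The function `row_sums` and the row-major layout are assumptions for illustration only.

```c
// Hypothetical helper (not from the assignment): sums each row of a
// rows x cols row-major matrix into out[0:rows].
void row_sums(const float *m, float *out, int rows, int cols) {
    // Outer loop: one row per gang iteration
    #pragma acc parallel loop gang copyin(m[0:rows*cols]) copyout(out[0:rows])
    for (int r = 0; r < rows; r++) {
        float sum = 0.0f;

        // Inner loop: workers and vector lanes share the columns of this row,
        // and the reduction merges their partial sums
        #pragma acc loop worker vector reduction(+:sum)
        for (int c = 0; c < cols; c++) {
            sum += m[r * cols + c];
        }
        out[r] = sum;
    }
}
```

For a 2 x 3 matrix holding 1 through 6, the result should be 6 and 15. How many gangs, workers, and vector lanes are actually used is left to the compiler unless `num_gangs`, `num_workers`, or `vector_length` is specified.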
