# 3D Temperature Diffusion Model: Bringing it all together


A 3D temperature diffusion model is used to highlight the difference in performance between NumPy and CuPy that can be achieved for computationally intensive tasks.

# Data 

This task requires 3-dimensional ocean temperature data as a starting point, which can be downloaded from the [Copernicus Marine Data Service](https://data.marine.copernicus.eu/product/GLOBAL_ANALYSISFORECAST_PHY_001_024/description).

## Downloading Data 

A utility function for this is included with the course repository and is run through Poetry, as explained in the [README](../README.md).

``` bash 
poetry run download_data
```

This command downloads the required dataset for this course into the `./data` directory. The dataset that is downloaded is:

**Description**:  
This dataset was downloaded from the **Global Ocean Physics Analysis and Forecast** service. It provides data for global ocean physics, focusing on sea water potential temperature.

- **Product Identifier**: `GLOBAL_ANALYSISFORECAST_PHY_001_024`
- **Product Name**: Global Ocean Physics Analysis and Forecast
- **Dataset Identifier**: `cmems_mod_glo_phy-thetao_anfc_0.083deg_PT6H-i`

**Variable Visualized**:  
- **Sea Water Potential Temperature (thetao)**: Measured in degrees Celsius [°C].

**Geographical Area of Interest**:  
- **Region**: Around the United Kingdom
- **Coordinates**:
  - **Northern Latitude**: 65.312
  - **Eastern Longitude**: 6.1860
  - **Southern Latitude**: 46.829
  - **Western Longitude**: -13.90

**Depth Range**:  
- **Minimum Depth**: 0.49 meters  
- **Maximum Depth**: 5727.9 meters

**File Size**:  
- **267.5 MB**

## Visualising Data 

To make visualising the data easier, three utility functions have been created. By default, the visualisations are written to the `output` directory.

### Visualise Slice (Static) 

Visualise a 2D temperature slice as a static image. The depth targeted is the surface, i.e. 0.49 m.

``` bash 
poetry run visualise_slice_static
```

The output produced will be a `.png` file, such as:

![Temperature Slice](../_static/temperature_slice_static.png)


### Visualise Slice - Interactive HTML file

Visualise a 2D temperature slice as an interactive HTML file, which lets you step through the time series.

``` bash 
poetry run visualise_slice --target_depth 0 --animation_speed 100
```

The command above creates an interactive HTML file in which each timestep of the animation lasts 100 milliseconds (`--animation_speed`) and which shows the depth level nearest to the requested depth (`--target_depth`), in this case 0.49 m. For the above command the output produced is:

<iframe
  src="../_static/temperature_slice.html"
  width="800"
  height="600"
  frameborder="0"
  allowfullscreen
></iframe>

When you run the command yourself, the file produced is `output/original_temperature_2d_interactive.html`.
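
How a requested depth of 0 ends up at the 0.49 m level can be illustrated with a nearest-value lookup. The following is a minimal sketch, not the course's implementation: the function name `nearest_depth_index` is hypothetical, and the depth values are only a few of the levels from the dataset's depth coordinate.

```python
import numpy as np

def nearest_depth_index(depths, target_depth):
    """Return the index of the depth level closest to target_depth (hypothetical helper)."""
    depths = np.asarray(depths, dtype=float)
    return int(np.abs(depths - target_depth).argmin())

# A handful of levels from the depth coordinate, in metres; the real file has 50.
depths = [0.494, 1.541, 2.646, 5275.0, 5728.0]

idx = nearest_depth_index(depths, target_depth=0)
print(f"Requested 0 m, nearest level is index {idx} at {depths[idx]} m")
```

If the data is held in an xarray `Dataset` called `ds` (an assumed name), the same kind of lookup is available through `ds.sel(depth=0, method="nearest")`.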

### Visualise Cube - Interactive HTML file

Visualise a 3D temperature cube as an interactive HTML file, which lets you step through the time series.

``` bash 
poetry run visualise_cube --num_depths 5 --num_time_steps 3
```

The command above creates an interactive HTML file that visualises the first 5 depth levels for 3 time steps. For the above command the output produced is:

<iframe
  src="../_static/temperature_cube.html"
  width="800"
  height="600"
  frameborder="0"
  allowfullscreen
></iframe>

When you run the command yourself, the file produced is `output/original_temperature_3d_interactive.html`.

## Summarising Data 

The `summary` command calculates and prints summary statistics (mean, max, min, and standard deviation) for the temperature data in a specified NetCDF file. It also prints the dataset's dimensions and coordinates.

``` bash 
poetry run summary
```


The above command prints a summary of the original data file downloaded from Copernicus:

``` 
The dimensions of the data is: (5, 50, 222, 241)
Temperature Summary Statistics:
Mean temperature: 8.56154727935791
Max temperature: 14.050389289855957
Min temperature: -2.591400146484375
Standard deviation: 3.1273183822631836

Dataset Dimensions and Coordinates:
<xarray.Dataset>
Dimensions:    (depth: 50, latitude: 222, longitude: 241, time: 5)
Coordinates:
  * depth      (depth) float32 0.494 1.541 2.646 ... 5.275e+03 5.728e+03
  * latitude   (latitude) float32 46.83 46.92 47.0 47.08 ... 65.08 65.17 65.25
  * longitude  (longitude) float32 -13.83 -13.75 -13.67 ... 6.0 6.083 6.167
  * time       (time) datetime64[ns] 2024-01-01 ... 2024-01-02
Data variables:
    thetao     (time, depth, latitude, longitude) float32 13.47 13.42 ... nan
Attributes: (12/14)
    Conventions:                   CF-1.6
    area:                          GLOBAL
    contact:                       servicedesk.cmems@mercator-ocean.eu
    credit:                        E.U. Copernicus Marine Service Information...
    institution:                   Mercator Ocean
    licence:                       http://marine.copernicus.eu/services-portf...
    ...                            ...
    product_user_manual:           http://marine.copernicus.eu/documents/PUM/...
    quality_information_document:  http://marine.copernicus.eu/documents/QUID...
    references:                    http://marine.copernicus.eu
    source:                        MERCATOR GLO12
    title:                         Instantaneous fields for product GLOBAL_AN...
    copernicusmarine_version:      1.3.4
```
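
The `thetao` values include `nan` entries, so statistics like those above need NaN-aware reductions; a plain `mean` would return `nan`. The sketch below shows one way such a summary could be computed. It is an illustration under assumptions, not the `summary` script itself: the `summarise` helper is hypothetical, and the array is a small synthetic stand-in that only mimics the `(time, depth, latitude, longitude)` layout.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for thetao: (time, depth, latitude, longitude).
# The real array is (5, 50, 222, 241); these values are random, not ocean data.
thetao = rng.uniform(-2.0, 14.0, size=(2, 3, 4, 5)).astype(np.float32)
thetao[:, -1, :, :2] = np.nan  # pretend some cells are invalid

def summarise(data):
    """Return NaN-aware summary statistics (hypothetical helper)."""
    return {
        "mean": float(np.nanmean(data)),
        "max": float(np.nanmax(data)),
        "min": float(np.nanmin(data)),
        "std": float(np.nanstd(data)),
    }

stats = summarise(thetao)
print("Shape:", thetao.shape)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```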



# Leveraging GPUs

## Pseudocode

The pseudocode that implements the diffusion loop is:

```
1. For each timestep from 1 to num_timesteps:
   2. Copy the current temperature values to a temporary array (temp_copy)
   3. Initialize arrays for neighbor sums and neighbor counts with zeros
   4. For each valid cell (ignoring boundaries):
      5. Calculate the sum of neighboring cells:
         - Add the value of the front neighbor if valid
         - Add the value of the back neighbor if valid
         - Add the value of the left neighbor if valid
         - Add the value of the right neighbor if valid
         - Add the value of the top neighbor if valid
         - Add the value of the bottom neighbor if valid
      6. Count the number of valid neighbors for each direction
   7. Update the cell's temperature:
      - New temperature = current temperature + diffusion coefficient * (neighbor_sum - 6 * current temperature) / neighbor_count
   8. Ensure invalid points (NaN) remain unchanged
   9. Update the main temperature array with the new values
```
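
The steps above can be sketched as a vectorised NumPy function. This is a minimal illustration under stated assumptions rather than the course's `diffusion_numpy` code: the name `diffusion_step` and the default coefficient are hypothetical. There is also one deliberate deviation: the pseudocode subtracts `6 * current temperature`, whereas this sketch subtracts `neighbour_count * current temperature`, so that a uniform field stays uniform next to invalid cells. For cells with six valid neighbours the two forms agree.

```python
import numpy as np

def diffusion_step(temp, diffusion_coeff=0.1):
    """Advance a 3D temperature field by one timestep (hypothetical sketch)."""
    new_temp = temp.copy()
    centre = temp[1:-1, 1:-1, 1:-1]  # interior cells only; boundaries are left untouched

    # The six face neighbours of every interior cell, as shifted views.
    neighbours = (
        temp[:-2, 1:-1, 1:-1], temp[2:, 1:-1, 1:-1],
        temp[1:-1, :-2, 1:-1], temp[1:-1, 2:, 1:-1],
        temp[1:-1, 1:-1, :-2], temp[1:-1, 1:-1, 2:],
    )

    neighbour_sum = np.zeros_like(centre)
    neighbour_count = np.zeros_like(centre)
    for nb in neighbours:
        valid = ~np.isnan(nb)
        neighbour_sum += np.where(valid, nb, 0.0)  # invalid neighbours contribute nothing
        neighbour_count += valid

    # Assumption: scale by the number of valid neighbours rather than a fixed 6.
    safe_count = np.where(neighbour_count > 0, neighbour_count, 1.0)
    delta = diffusion_coeff * (neighbour_sum - neighbour_count * centre) / safe_count

    # NaN cells, and cells with no valid neighbours, keep their current value.
    update = ~np.isnan(centre) & (neighbour_count > 0)
    new_temp[1:-1, 1:-1, 1:-1] = np.where(update, centre + delta, centre)
    return new_temp

# Usage: a single warm cell spreading into a cold block over a few timesteps.
field = np.zeros((5, 5, 5))
field[2, 2, 2] = 6.0
for _ in range(3):
    field = diffusion_step(field)
print(field[2, 2, 2], field[1, 2, 2])
```

Working on shifted views instead of looping over cells keeps every operation an array expression, which is the style that translates most directly to CuPy.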

## Running with NumPy 

``` bash 
poetry run diffusion_numpy --num_timesteps 100
```

The above command runs the 3D diffusion model using the NumPy version of the code for 100 timesteps. Once execution has finished, the time taken is reported. When running on an AMD EPYC 7552 48-Core Processor, the output is:

```  
NumPy model completed in 489.2647 seconds. Average time per timestep: 4.8926 seconds.
```

You can visualise the model outputs produced with

``` bash 
poetry run visualise_slice --target_depth 0 --animation_speed 100 --data_file predicted_temperatures_numpy.nc 
```

Note that the file `predicted_temperatures_numpy.nc` is written by the `diffusion_numpy` script when the model is run. The visualisation command above then generates a new interactive HTML file, `output/predicted_temperature_2d_interactive.html`.


## Running With CuPy

As the same code has been written in CuPy, you can experiment with the difference between CPU and GPU code with the following:

``` bash 
poetry run diffusion_cupy --num_timesteps 100
```

The above command runs the 3D diffusion model using the CuPy version of the code for 100 timesteps. Once execution has finished, the time taken is reported. When running on an NVIDIA A40 GPU, the output is:

``` 
CuPy model completed in 171.9884 seconds. Average time per timestep: 1.7199 seconds.
```

You can visualise the model outputs produced with

``` bash 
poetry run visualise_slice --target_depth 0 --animation_speed 100 --data_file predicted_temperatures_cupy.nc 
```

Note that the file `predicted_temperatures_cupy.nc` is written by the `diffusion_cupy` script when the model is run. The visualisation command above then generates a new interactive HTML file, `output/predicted_temperature_2d_interactive.html`.
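
Because CuPy mirrors much of the NumPy API, array code can often be written once against a module handle and run on either backend. The sketch below illustrates that idea and how a fair timing might be taken. It is not the course's `diffusion_cupy` code: the names `relax` and `time_backend` are hypothetical, the update is a simplified smoothing pass along one axis rather than the full 3D model, and the script falls back to NumPy alone when CuPy or a CUDA device is unavailable.

```python
import time
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # expected to raise if no usable CUDA device
except Exception:
    cp = None  # fall back to NumPy only

def relax(xp, field, coeff=0.1):
    """One smoothing pass along the first axis; `xp` is numpy or cupy."""
    out = field.copy()
    out[1:-1] = field[1:-1] + coeff * (field[:-2] + field[2:] - 2 * field[1:-1])
    return out

def time_backend(xp, shape=(50, 64, 64), steps=10):
    """Run `steps` passes on the given backend and return (checksum, seconds)."""
    field = xp.ones(shape, dtype=xp.float32)
    start = time.perf_counter()
    for _ in range(steps):
        field = relax(xp, field)
    if cp is not None and xp is cp:
        # GPU work is queued asynchronously, so wait for it before stopping the clock.
        cp.cuda.Stream.null.synchronize()
    elapsed = time.perf_counter() - start
    return float(field.sum()), elapsed

checksum, seconds = time_backend(np)
print(f"NumPy: {seconds:.4f} s")
if cp is not None:
    checksum_gpu, seconds_gpu = time_backend(cp)
    print(f"CuPy:  {seconds_gpu:.4f} s")
```

The array is created directly on the chosen backend and stays there for every pass, so no host-to-device copies occur inside the timed loop.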

## Performance Comparison: CPU vs GPU

### Overall Speedup
- **CPU runtime**: 489.2647 seconds  
- **GPU runtime**: 171.9884 seconds  
- **Speedup factor**:  
  \[
  \text{Speedup} = \frac{\text{CPU time}}{\text{GPU time}} = \frac{489.2647}{171.9884} \approx 2.84
  \]  
  The GPU completed the task approximately 2.84 times faster than the CPU.

### Per-Timestep Speedup
- **CPU average timestep**: 4.8926 seconds  
- **GPU average timestep**: 1.7199 seconds  
- **Speedup factor per timestep**:  
  \[
  \text{Speedup per timestep} = \frac{\text{CPU timestep}}{\text{GPU timestep}} = \frac{4.8926}{1.7199} \approx 2.84
  \]  
  On a per-timestep basis, the GPU is about 2.84 times faster.
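
The arithmetic can be checked with a few lines of Python, using the unrounded totals from the program output. The variable names here are illustrative only.

```python
# Total runtimes in seconds, as reported by the two runs above.
cpu_total = 489.2647
gpu_total = 171.9884
num_timesteps = 100

speedup = cpu_total / gpu_total
per_step_speedup = (cpu_total / num_timesteps) / (gpu_total / num_timesteps)
runtime_reduction = 1 - gpu_total / cpu_total

print(f"Overall speedup:      {speedup:.2f}x")
print(f"Per-timestep speedup: {per_step_speedup:.2f}x")
print(f"Runtime reduction:    {runtime_reduction:.1%}")
```

Since each per-timestep average is the total divided by the same number of timesteps, the two speedup factors are equal up to rounding.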

### Efficiency Observation
- The per-timestep averages are the total runtimes divided by the 100 timesteps, so the overall and per-timestep speedup factors agree by construction. Separating compute time from data-transfer and kernel-launch overhead requires profiling those phases individually.

### Implications
- **Computational Efficiency**:  
  Using a GPU provides substantial performance gains, especially for tasks with repetitive, parallelizable computations such as numerical modeling or simulations.
- **Observed Speedup** (~2.84x improvement) suggests:  
  - The task is well-suited for GPU acceleration.  
  - Full potential of the GPU might not yet be realized due to:
    - Limited parallelism in the workload.  
    - Overheads from memory transfers between CPU and GPU.  
    - Suboptimal use of GPU-specific optimizations.

The GPU's performance significantly outpaces the CPU for this task, reducing runtime by approximately 65%. Note that this approach is a direct port from NumPy to CuPy, which takes minimal effort. Further optimization of the GPU code, for example reducing known GPU bottlenecks such as data transfer between CPU and GPU, could improve performance further.
