## Introduction

Why repeat analysis? When members leave an organization, valuable information often leaves with them, because critical analysis lives in Jupyter notebooks, Word documents, or PDFs on the departing members' desktops. New members then spend many person-hours repeating work that has already been done. Modern data infrastructure allows analysis to persist beyond any one member's time at an organization. This project examines this knowledge-management problem in the context of physics-aware machine learning of highly energetic materials. Specifically, it addresses the storage of data from direct numerical simulations of a shock traveling through porous explosive material with a single void. When the shock occurs, the void collapses and chemical reactions lead to a detonation; this process is modeled with neural networks instead of direct numerical simulations.

The specific goal of this project is to ingest the simulation dataset and present an analysis of how the state of the system changes over time, before any machine learning takes place. After the dataset is loaded, a dashboard will display the distributions of the fields of interest, elementary statistics (mean, median, and standard deviation) of the variables of interest, and more advanced analyses such as spatial gradients and an animation of the simulation. The current analysis of this dataset resides in a Jupyter notebook with handpicked visualizations of a subset of the simulation data. The main contribution of this project is the first shareable InfluxDB database for physics-informed scientific machine learning.

## Background

### PARC
This research uses Physics-Aware Recurrent Convolutional Neural Networks version 2 (PARCv2) [1] to predict how the energetic material changes over time once a shock is applied to the system. This process can be described formally as an advection-diffusion-reaction system and is modeled by the partial differential equation (PDE) below.
$$ \frac{\partial \textbf{x}}{\partial t} = k \Delta \textbf{x} - \textbf{u} \cdot \nabla \textbf{x} + \textbf{R}_{\textbf{x}}(\textbf{x},\textbf{u},\textbf{c}) $$


With initial conditions

$$ \textbf{u}(t=0) = \textbf{u}_{0} $$
$$ \textbf{x}(t=0) = \textbf{x}_{0} $$


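In the above PDE, $\textbf{x}$ is the variable of interest (temperature, pressure, or microstructure), $\textbf{u}$ is the velocity field, $k$ is the diffusion coefficient, $\Delta$ is the Laplacian, and $\nabla$ is the gradient. If $\textbf{R}_{\textbf{x}}$ is zero, the PDE reduces to an advection-diffusion equation; when the advected quantity is the velocity itself ($\textbf{x} = \textbf{u}$), this is Burgers' equation. In essence, PARCv2 learns to predict the next state of the system, which would otherwise be obtained by solving the PDE numerically.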
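
To make "solved numerically" concrete, the sketch below advances a single 2-D field by one explicit Euler step of the advection-diffusion-reaction equation using central finite differences. It is an illustration only, not the solver that produced the dataset: the periodic boundaries, the constant coefficient `k`, the prescribed velocity components `ux` and `uy`, and the optional `reaction` callable standing in for $\textbf{R}_{\textbf{x}}$ are all assumptions.

```python
import numpy as np

def laplacian(f, h):
    """Five-point Laplacian of a 2-D array on a periodic grid with spacing h."""
    return (
        np.roll(f, 1, axis=0) + np.roll(f, -1, axis=0)
        + np.roll(f, 1, axis=1) + np.roll(f, -1, axis=1)
        - 4.0 * f
    ) / h**2

def central_gradient(f, h):
    """Central-difference derivatives along rows (axis 0) and columns (axis 1), periodic."""
    d_rows = (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2.0 * h)
    d_cols = (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2.0 * h)
    return d_rows, d_cols

def adr_step(state, ux, uy, k, dt, h, reaction=None):
    """One explicit Euler step of d(state)/dt = k*Lap(state) - u.grad(state) + R(state).

    ux is the velocity along columns and uy the velocity along rows; both may be
    scalars or arrays shaped like `state`. `reaction` is a hypothetical callable
    standing in for R_x.
    """
    d_rows, d_cols = central_gradient(state, h)
    rhs = k * laplacian(state, h) - (ux * d_cols + uy * d_rows)
    if reaction is not None:
        rhs = rhs + reaction(state)
    return state + dt * rhs

# Usage: diffuse and advect a Gaussian hot spot on a 64 x 128 grid.
rows, cols = np.mgrid[0:64, 0:128]
hot_spot = 300.0 + 1000.0 * np.exp(-((rows - 32) ** 2 + (cols - 40) ** 2) / 50.0)
k, h = 1.0, 1.0
dt = 0.1  # below the explicit-diffusion limit dt <= h**2 / (4 * k) = 0.25
for _ in range(10):
    hot_spot = adr_step(hot_spot, ux=0.5, uy=0.0, k=k, dt=dt, h=h)
```

Explicit central differencing is simple but only stable for small time steps, which is one reason learned surrogates such as PARCv2 are attractive for long rollouts.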

### Influx
Created by InfluxData Inc., InfluxDB is a NoSQL database focused on storing and visualizing time-series data.

Recent versions (InfluxDB 3) store time-series data in the Apache Parquet format. InfluxDB has an official Docker image at https://hub.docker.com/_/influxdb. Its main utility for this project is the ability to query time-series data quickly [2].
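
InfluxDB ingests points written in its text-based line protocol: a measurement name, optional comma-separated tags, one or more fields, and an optional timestamp (nanoseconds by default). The sketch below formats one point by hand to show that structure; in practice a client library would do this. The measurement name `dns`, the tag keys `simulation`, `x`, and `y`, and the mapping of simulation time to nanoseconds after the Unix epoch are hypothetical schema choices, not an existing schema.

```python
def escape_key(text: str) -> str:
    """Escape commas, equals signs, and spaces in tag keys, tag values, and field keys."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")

def escape_measurement(text: str) -> str:
    """Escape commas and spaces in a measurement name."""
    return text.replace(",", r"\,").replace(" ", r"\ ")

def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Format one point as a line-protocol string; all fields are written as floats."""
    # Tags are sorted by key, which InfluxDB recommends for write performance.
    tag_part = "".join(
        f",{escape_key(str(key))}={escape_key(str(value))}"
        for key, value in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_key(str(key))}={float(value)!r}" for key, value in fields.items()
    )
    return f"{escape_measurement(measurement)}{tag_part} {field_part} {timestamp_ns}"

# Hypothetical point: pixel (x=10, y=5) of simulation 00_002 at t = 3 ns,
# stored as 3 nanoseconds after the Unix epoch.
line = to_line_protocol(
    "dns",
    {"simulation": "00_002", "x": 10, "y": 5},
    {"temperature": 1520.5, "pressure": 12.25},
    3,
)
print(line)  # dns,simulation=00_002,x=10,y=5 temperature=1520.5,pressure=12.25 3
```

Storing pixel coordinates as tags makes every pixel of every simulation its own series (200 × 64 × 128, about 1.6 million series), so whether `x` and `y` belong in tags or fields is a schema decision worth testing. Non-finite values (NaN, infinity) are not valid field values and would need to be filtered before writing.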


## Data

The primary data source is a set of 200 direct numerical simulations of highly energetic material receiving a shock. Each simulation is stored as an npy file containing temperature, pressure, microstructure, and velocity fields over time on a 64 by 128 pixel grid, where each pixel corresponds to an $(x,y)$ location in the energetic material at time $t$.

Each simulation lasts 20 to 40 nanoseconds. Temperature is measured in kelvin and takes values in the range $[300, 5000]$. Velocity is measured in micrometers per nanosecond, pressure is measured in gigapascals, and microstructure takes values in the range $[0,1]$. All files are hosted in the Visual Intelligence Laboratory project folder on the University of Virginia High-Performance Computing (HPC) system.
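
A minimal sketch of loading one simulation and checking it against the ranges stated above, assuming each npy file holds a single array shaped (time, height, width, channel). The channel order in `CHANNELS`, including the split of velocity into two components, is a guess that must be checked against the real files, and the synthetic array in the usage lines stands in for actual data.

```python
import os
import tempfile

import numpy as np

# Assumed channel order along the last axis; verify against the real files.
CHANNELS = ("temperature", "pressure", "microstructure", "velocity_x", "velocity_y")

# Ranges stated in the data description; pressure and velocity are not checked.
EXPECTED_RANGES = {
    "temperature": (300.0, 5000.0),  # kelvin
    "microstructure": (0.0, 1.0),
}

def load_simulation(path):
    """Load one simulation, assumed to be shaped (time, height, width, channel)."""
    array = np.load(path)
    if array.ndim != 4 or array.shape[-1] != len(CHANNELS):
        raise ValueError(f"unexpected shape {array.shape} in {path}")
    return {name: array[..., i] for i, name in enumerate(CHANNELS)}

def out_of_range_fields(fields, tol=1e-6):
    """Return the names of fields with values outside their expected range."""
    bad = []
    for name, (low, high) in EXPECTED_RANGES.items():
        values = fields[name]
        if values.min() < low - tol or values.max() > high + tol:
            bad.append(name)
    return bad

# Usage with a synthetic file standing in for a real simulation.
rng = np.random.default_rng(0)
demo = rng.uniform(0.0, 1.0, size=(3, 64, 128, len(CHANNELS)))
demo[..., 0] = 300.0 + 4700.0 * demo[..., 0]  # rescale the temperature channel
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.npy")
    np.save(path, demo)
    fields = load_simulation(path)
print(out_of_range_fields(fields))  # []
```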

## Potential Analysis

In data science, one should understand the problem before making an inference. Therefore, this project will produce a dashboard that presents an exploratory analysis of the simulation data before the data are fed to the neural network within PARCv2. The work consists of moving the npy files into a database and building the dashboard on top of it. The dashboard will have a dropdown list to select a particular simulation and will then show the elementary statistics and the distributions of the variables of interest. Additionally, the dashboard could let the user watch the simulation play out over time, as shown in Figure Z for the temperature field. An example of elementary statistics is shown in Table X, and Figure Y shows an example histogram of the temperature field from a simulation. Another goal is to use the PARC model to predict the future state of a variable of interest.
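
The per-simulation quantities the dashboard would show can be computed with a few NumPy calls. This sketch assumes one field has already been extracted as an array shaped (time, height, width); the function names are hypothetical, and the grid spacing defaults to one pixel because the physical spacing is not stated here. A fixed `value_range` (for temperature, the stated $[300, 5000]$ K) keeps histogram bins comparable across time steps and simulations.

```python
import numpy as np

def timestep_statistics(field):
    """Mean, median, and standard deviation of a (time, height, width) field per time step."""
    flat = field.reshape(field.shape[0], -1)
    return {
        "mean": flat.mean(axis=1),
        "median": np.median(flat, axis=1),
        "std": flat.std(axis=1),
    }

def field_histogram(field, t, bins=50, value_range=None):
    """Histogram of one time step; returns (counts, bin_edges)."""
    return np.histogram(field[t].ravel(), bins=bins, range=value_range)

def gradient_magnitude(field, t, spacing=1.0):
    """Magnitude of the spatial gradient of one time step (finite differences)."""
    d_rows, d_cols = np.gradient(field[t], spacing)
    return np.hypot(d_rows, d_cols)

# Usage on synthetic data standing in for a temperature field.
rng = np.random.default_rng(1)
temperature = rng.uniform(300.0, 5000.0, size=(4, 64, 128))
stats = timestep_statistics(temperature)
counts, edges = field_histogram(temperature, t=0, bins=10, value_range=(300.0, 5000.0))
grad = gradient_magnitude(temperature, t=0)
```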

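Animating the simulation data as a GIF shows how a field evolves over time. Figure Z shows the temperature field of one simulation.

![Figure Z: temperature field over time](images/00_002.gif)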

Figure Y shows an example histogram of the temperature data from a simulation, and below it are some statistics about the data.

![title](images/pressure_histogram.png)

## Challenges

The first challenge of this project will be scope. Exploratory data analysis of this dataset has utility beyond this class project, so I envision continuing this work well beyond the fall semester. The main technical challenge will be working with the multi-dimensional arrays stored in the npy files, which must be read, unpacked, and loaded into InfluxDB. InfluxDB is a NoSQL database, and I have no experience with this type of backend. Its queries are written in Flux, InfluxDB's scripting and query language. The full pipeline requires reading from the data source outside the HPC environment, defining a schema for the InfluxDB database, writing the npy files to the database, and querying it to populate and create the visualizations. The npy files may also have to be converted to CSV files to facilitate populating the database, which would be a challenge of its own.
An alternative course of action is to load the data into PostgreSQL. I will ask the University of Iowa team that created the data which license applies to the simulation data. A fallback is to use the data at https://github.com/pdebench/PDEBench.
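
If the npy files are converted to CSV before loading, a long layout with one row per simulation, time step, and pixel keeps the table schema independent of the grid size. The sketch below is one possible layout, assuming the fields are already split into a dictionary mapping field name to an array shaped (time, height, width); the column names and the `simulation_to_csv` helper are hypothetical. The same rows could feed InfluxDB (identifying columns as tags or timestamps, value columns as fields) or PostgreSQL's `COPY` command in CSV format.

```python
import csv
import io
from itertools import repeat

import numpy as np

def simulation_to_csv(fields, simulation_id, out, times=None):
    """Write fields shaped (time, height, width) to `out` as long-format CSV.

    Each row is: simulation, t, row, col, then one column per field (sorted by name).
    `times` optionally maps time-step indices to physical times (e.g. nanoseconds).
    """
    names = sorted(fields)
    n_t, n_rows, n_cols = fields[names[0]].shape
    times = np.arange(n_t) if times is None else np.asarray(times)
    # Index grids flattened in the same C order as the field arrays below.
    t_idx, r_idx, c_idx = np.indices((n_t, n_rows, n_cols)).reshape(3, -1)
    columns = [fields[name].reshape(-1).tolist() for name in names]
    writer = csv.writer(out)
    writer.writerow(["simulation", "t", "row", "col", *names])
    writer.writerows(
        zip(repeat(simulation_id), times[t_idx].tolist(), r_idx.tolist(), c_idx.tolist(), *columns)
    )

# Usage with a tiny synthetic simulation written to an in-memory buffer.
demo = {
    "temperature": np.full((2, 3, 4), 300.0),
    "pressure": np.zeros((2, 3, 4)),
}
buffer = io.StringIO()
simulation_to_csv(demo, "00_002", buffer)
print(buffer.getvalue().splitlines()[:2])
```

A full simulation of 40 time steps on a 64 by 128 grid produces about 330,000 rows, so writing directly to the database with a client library may be preferable to an intermediate CSV once the schema is settled.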
