# 30/10:

## Comments on the article:
Patel, Tirthak, et al. "What does power consumption behavior of hpc jobs reveal?: Demystifying, quantifying, and predicting power consumption characteristics." 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2020.

### INTRODUCTION
Most HPC systems are highly utilized, but a fraction of their power is "stranded", i.e., the power allocated to the cluster is not fully used while still being paid for.

Most jobs consume much lower power than the node-level thermal design power (**TDP**).

- **TDP:** the maximum power that one should be designing the system for.

Applying system-level **power-capping** and **hardware over-provisioning** helps not only large supercomputers but also smaller academic HPC systems, where it saves on electricity costs.

- **Power-capping:** limiting the power consumption of a computing system to a predefined or dynamically adjusted power level.
- **Hardware over-provisioning:** deploying more compute nodes than the power budget could sustain if all of them ran at full power (TDP) simultaneously, relying on the fact that jobs rarely draw their worst-case power.

The order of applications based on per-node power use varies between systems. Just switching the architecture can change how much power each application consumes. Operators and designers can't assume that the most power-hungry app on one system will be the same on others.

When HPC jobs are running on a large computer system, the amount of power they use can vary from node to node, even though the workload is the same for all nodes. This is because of differences in the way the nodes are built and how the jobs are executed.

In the future, there will be a need for new techniques that can automatically distribute power evenly to all nodes in a large computer system. However, these techniques must also take into account the fact that different nodes can use different amounts of power, even when they are running the same job.

- **Temporal variance:** how much the power consumption of an HPC job changes over time. HPC jobs have limited temporal variance, meaning that their power consumption does not change very much over time.
- **Spatial variance:** how much the power consumption of an HPC job varies from node to node. HPC jobs have a high degree of spatial variance, meaning that their power consumption can vary significantly from node to node.
- **Workload imbalance:** a situation where some nodes in a computer system are doing more work than others. This can cause the power consumption of the nodes to vary significantly.
- **Manufacturing variability:** no two machines are exactly the same. Even if two nodes are made from the same parts, there will be small differences in their construction that can affect their power consumption.
- **Equal power allocation:** giving every node the same share of the power budget. Because of spatial variance, an equal split can leave power unused on low-consuming nodes while constraining high-consuming ones.
- **Power-aware scheduling:** a technique for scheduling HPC jobs in a way that takes into account the power consumption of the nodes. This can be used to improve the overall energy efficiency of a computer system.

About 20% of users typically account for the majority of HPC system power consumption, a group that largely aligns with users who use the most node-hours.

- **Node-hours:** one hour of computation time on a single compute node.

HPC operators can enhance energy efficiency by targeting improvements for this specific user subset, using "node-hours" as a proxy for energy consumption.

Jobs submitted by the same user can differ significantly in power consumption, so applying a uniform policy to all jobs from the same user may be insufficient.

Clustering jobs based on the number of nodes and requested wall time reduces power variation. User ID, number of nodes, and wall time are effective predictive features for job power consumption, allowing high-accuracy predictions even before job execution begins.

- **Wall time:** total amount of time that a job is allowed to run.

### Data Collection Methodology

A one-minute sampling granularity was found to keep the monitoring overhead acceptable in the production environment without compromising accuracy.

**One minute granularity:** the system monitoring samples data every minute.

**Observed to achieve acceptable overhead:** researchers found that taking data samples every minute did not significantly slow down the system or interfere with its normal operation.

**Without compromising accuracy:** taking data samples every minute is still frequent enough to provide a good understanding of how the system is performing.

### ANALYZING SYSTEM-LEVEL POWER UTILIZATION TRENDS

**Motivation:** quantify and understand the utilization level and corresponding power consumption level of nodes.
**Questions:**
- What is the level of system utilization of both HPC systems?
- Are the HPC systems utilizing their power budget at the same level as their system utilization?

Is the power consumption of HPC systems always proportional to their workload? In other words, if an HPC system is running at 100% utilization, is it also using 100% of its power budget?

This is not the case: even when an HPC system is highly utilized, the power it actually draws can be well below its power budget. The difference is power that is provisioned (and paid for) but never used.

Mid-scale academic HPC systems may waste over 30% of power, known as the "stranded power" problem.

Reducing the system's power cap below the worst-case provisioning level can mitigate stranded power, guided by dynamic observations of system power consumption.

### JOB-LEVEL POWER CONSUMPTION CHARACTERISTICS IN HPC SYSTEMS

**Motivation:** understand the reason for “stranded power” in compute nodes.
**Questions:**
- Do HPC jobs consume less power than the node’s TDP level?
- Do job-level power consumption characteristics of key applications vary between two different systems?

**Per-node power consumption:** the power consumption of a job averaged over its entire runtime and also over all of its nodes.
*"Note that the per-node power consumption metric is useful when distinguishing among jobs with different power consumption profiles (as opposed to using a job’s total power consumption aggregated across time and nodes – that is, the total energy consumed by a job) as it eliminates the effect of a job’s runtime and the number of nodes."*

The **per-node power consumption metric** is a better way to compare the power consumption of different HPC jobs than using the total energy consumed by a job. This is because the total energy consumed by a job is affected by the job's runtime and the number of nodes it uses. The per-node power consumption metric, on the other hand, is not affected by these factors, so it is a more accurate measure of the power consumption of a job.
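To make the normalization concrete, here is a minimal sketch assuming a job's power samples are stored as a NumPy array of shape (time steps, nodes), in watts, sampled at a fixed interval; the function names and numbers are hypothetical. Two jobs with the same per-node profile get the same per-node power, even though their total energy differs by a factor of 40.

```python
import numpy as np


def per_node_power(samples):
    """Mean power (W) over all time steps and all nodes of one job.

    samples: array of shape (n_timesteps, n_nodes), uniformly sampled.
    """
    return float(np.mean(samples))


def total_energy(samples, dt_seconds):
    """Total energy (J) of the job: sum over nodes and time of P * dt."""
    return float(np.sum(samples) * dt_seconds)


# Two hypothetical jobs with the same power profile on every node
small = np.full((60, 2), 300.0)    # 60 samples x 2 nodes at 300 W
large = np.full((600, 8), 300.0)   # 600 samples x 8 nodes at 300 W

print(per_node_power(small), per_node_power(large))        # identical per-node power
print(total_energy(small, 60), total_energy(large, 60))    # very different total energy
```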

HPC jobs exhibit a wide range of power consumption characteristics, with some jobs using significantly less per-node power than others.

The ranking of applications based on per-node power consumption changes when the underlying system architecture is altered. Different applications are impacted in distinct ways and degrees.

This diversity in power consumption has implications for making better decisions regarding power allocation and system over-provisioning, such as applying power-capping for individual jobs.

System operators should not assume that the most power-hungry application on one system will be the same on other systems, highlighting the challenge of porting power consumption characteristics across systems, even if they use CPUs from the same vendor.

A small positive correlation exists between per-node power consumption, execution time, and the number of nodes for jobs.

In power-consumption aware pricing, using total execution time and job size as proxies for fair pricing may not be accurate. Longer-running and larger-size jobs tend to have higher per-node power consumption and, therefore, higher energy costs per node and per unit time compared to shorter-running and smaller-size jobs.

Longer (larger) jobs show less per-node power consumption variation compared to shorter (smaller) jobs, adding complexity to fair pricing considerations.

**Motivation:** Check whether HPC jobs' power consumption varies considerably during their run, due to intensive phases of compute, memory, network, and I/O activity.
**Question:** How does the power consumption of an HPC job vary during its runtime and across the nodes it is running on?

Jobs show limited variation in power consumption over time (temporal variance) but large differences across the nodes they run on (spatial variance), possibly because of workload imbalance or manufacturing variability.
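One way to quantify the two kinds of variance, sketched under assumptions: the coefficient of variation (std/mean) is used as the dispersion measure, which may not be the exact metric used in the paper, and the job is synthetic (4 nodes running at different but steady power levels).

```python
import numpy as np


def temporal_cv(samples):
    """Average, over nodes, of each node's coefficient of variation over time."""
    per_node = samples.std(axis=0) / samples.mean(axis=0)
    return float(per_node.mean())


def spatial_cv(samples):
    """Coefficient of variation of the nodes' time-averaged power."""
    node_means = samples.mean(axis=0)
    return float(node_means.std() / node_means.mean())


# Hypothetical job: 4 nodes, each fairly flat over time, but at different levels
rng = np.random.default_rng(0)
levels = np.array([200.0, 250.0, 300.0, 350.0])          # per-node baseline (W)
samples = levels + rng.normal(0.0, 5.0, size=(120, 4))   # small temporal noise

print(f"temporal CV: {temporal_cv(samples):.3f}")
print(f"spatial CV:  {spatial_cv(samples):.3f}")
```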

For future large-scale systems, which are expected to over-provision hardware and to allocate power evenly based on how jobs behave over time, the study argues that this approach overlooks how differently the nodes of the same job consume power.

### USER-LEVEL POWER USAGE ANALYSIS
**Motivation:** identify user-level power consumption patterns and understand their implications.
**Question:** Are a small fraction of users responsible for most of the energy consumed by the HPC systems?

As expected, a small group of users use most of the energy on an HPC system, and interestingly, this group is pretty much the same as the users who use the most computing time (node-hours).

This discovery has important implications. First, system operators can concentrate on a small group of users to improve energy efficiency (for example, by optimizing the energy use of jobs from this group).

Secondly, when deciding which users to optimize, the amount of computing time a user uses (which we can easily find out) can be used as a stand-in for how much power they're using (which we might not always know).
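A minimal sketch of the 80/20 check and of the node-hours proxy, on hypothetical per-user totals: it computes the energy share of the top 20% of users and the overlap between the top users by energy and by node-hours. The dictionaries and function names are illustrative only.

```python
import numpy as np


def top_share(values, fraction=0.2):
    """Share of the total held by the top `fraction` of entries (e.g., users)."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    n_top = max(1, int(np.ceil(fraction * len(v))))
    return float(v[:n_top].sum() / v.sum())


def top_users(per_user, fraction=0.2):
    """IDs of the top `fraction` of users by the given metric."""
    ranked = sorted(per_user, key=per_user.get, reverse=True)
    n_top = max(1, int(np.ceil(fraction * len(ranked))))
    return set(ranked[:n_top])


# Hypothetical per-user totals (user_id -> value)
energy_kwh = {"u1": 900, "u2": 650, "u3": 40, "u4": 30, "u5": 25,
              "u6": 20, "u7": 15, "u8": 10, "u9": 5, "u10": 5}
node_hours = {"u1": 5000, "u2": 4200, "u3": 300, "u4": 250, "u5": 100,
              "u6": 120, "u7": 90, "u8": 80, "u9": 30, "u10": 20}

share = top_share(list(energy_kwh.values()))
overlap = top_users(energy_kwh) & top_users(node_hours)
print(f"top 20% of users consume {share:.0%} of the energy")
print("users in both top groups:", sorted(overlap))
```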

**Motivation:** investigate if jobs originating from the same user are likely to have similar power consumption behavior.
**Questions:**
- Do different jobs submitted by the same user with the same number of nodes and wall time have similar power consumption?
- Can these three job characteristics: user, number of nodes, and wall time, be used to predict the power consumption of a job?

Users submit jobs that use power in very different ways, so using a single solution for everyone might not work well.

The power consumption of a user's job can be predicted quite accurately from the number of nodes it requests and its requested wall time.

This is important because there's growing interest in making jobs more energy-efficient based on user guidance. By using these predictions and user guidance, we can explore new ways to schedule jobs and adjust power before they even start running.
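The paper's prediction model is not reproduced here. As a minimal sketch of the grouping idea (jobs with the same user, node count, and wall time tend to have similar per-node power), a lookup predictor can average past jobs in the same group, falling back to the user's mean and then to the global mean. The class name, record layout, and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


class GroupMeanPredictor:
    """Predict per-node power from past jobs sharing (user, num_nodes, wall_time)."""

    def fit(self, jobs):
        groups, users = defaultdict(list), defaultdict(list)
        for job in jobs:
            key = (job["user"], job["num_nodes"], job["wall_time"])
            groups[key].append(job["power"])
            users[job["user"]].append(job["power"])
        self.group_mean = {k: mean(v) for k, v in groups.items()}
        self.user_mean = {k: mean(v) for k, v in users.items()}
        self.global_mean = mean(j["power"] for j in jobs)
        return self

    def predict(self, user, num_nodes, wall_time):
        key = (user, num_nodes, wall_time)
        if key in self.group_mean:
            return self.group_mean[key]                      # exact group match
        return self.user_mean.get(user, self.global_mean)   # fallbacks


# Hypothetical history: per-node power (W); wall_time in hours
history = [
    {"user": "a", "num_nodes": 4, "wall_time": 24, "power": 310.0},
    {"user": "a", "num_nodes": 4, "wall_time": 24, "power": 330.0},
    {"user": "a", "num_nodes": 1, "wall_time": 2, "power": 150.0},
    {"user": "b", "num_nodes": 2, "wall_time": 12, "power": 240.0},
]
model = GroupMeanPredictor().fit(history)
print(model.predict("a", 4, 24))   # exact group match
print(model.predict("a", 8, 48))   # falls back to user a's mean
print(model.predict("c", 1, 1))    # unknown user: global mean
```

Because the prediction only needs information available at submission time, it could feed a static power allocation before the job starts.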

### Summary
**Stranded Power in HPC Systems:**
  - In mid-scale HPC systems, over 30% of the provisioned power can go unused, creating a "stranded power" problem.
  - System operators can cap power consumption, using the leftover power for other purposes or over-provisioning with more nodes for better throughput without increasing the electricity bill.
  - This power-harvesting approach is effective even for mid-scale HPC systems.

**Diverse Power Consumption in HPC Jobs:**
  - HPC jobs vary widely in power consumption characteristics, dependent on micro-architecture and system-architecture.
  - Blanket solutions that work for all applications and architectures are not effective; each application's power behavior on each system should be handled separately.

**Correlation between Job Power and Characteristics:**
  - Longer and larger HPC jobs tend to have slightly higher per-node power consumption, emphasizing the need for power-consumption-aware pricing.
  - Job execution time and size cannot be used as a fair pricing proxy, as longer and larger jobs have higher energy costs per node and time unit.

**Power Allocation Strategies and User Focus:**
  - Jobs show little temporal variance in power consumption on mid-scale HPC systems, so adjusting power allocation dynamically based on a job's temporal behavior offers limited benefit.
  - Static power allocation at the beginning of job execution can effectively minimize stranded power.
  - A small number of users consume the majority of energy and node-hours, suggesting a focus on improving energy efficiency for this user subset.

**Predictability of User Job Power Consumption:**
  - HPC users submit jobs with a wide range of power consumption behaviors.
  - Power consumption of user jobs can be predicted accurately using the number of nodes and wall time as features.
  - Predicting power consumption before execution allows for static power allocation, avoiding dynamic high-overhead policies.

# 20/11
- IPMI measurements are available for the whole month
- create gitlab repo
- visualize data

At job-table/job_info/singlenode:

- job_id
- num_cpus (allocated)
- num_nodes (single)
- run_time
- start_time
- end_time
- user_id
- node (id)

Check whether the difference between end_time and start_time equals run_time, to verify data consistency.
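A sketch of this consistency check with pandas, assuming start_time and end_time are parseable timestamps and run_time is stored in seconds (the unit is an assumption to verify against the dataset); the 1 s tolerance and the sample rows are arbitrary.

```python
import pandas as pd


def check_runtime_consistency(jobs, tolerance_s=1):
    """Return jobs whose end_time - start_time differs from run_time (seconds)."""
    start = pd.to_datetime(jobs["start_time"])
    end = pd.to_datetime(jobs["end_time"])
    elapsed = (end - start).dt.total_seconds()
    mismatch = (elapsed - jobs["run_time"]).abs() > tolerance_s
    return jobs.loc[mismatch, ["job_id", "run_time"]].assign(elapsed=elapsed[mismatch])


jobs = pd.DataFrame({
    "job_id": [1, 2, 3],
    "start_time": ["2022-08-01 10:00:00", "2022-08-01 11:00:00", "2022-08-02 09:00:00"],
    "end_time": ["2022-08-01 10:30:00", "2022-08-01 13:00:00", "2022-08-02 09:05:00"],
    "run_time": [1800, 7200, 600],   # job 3 is inconsistent: 5 min elapsed vs 600 s
})
print(check_runtime_consistency(jobs))
```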

At IPMI/total_power/singlenode (for all nodes in the system during that month):

   - total_power, collected every 30 s
   - node id
   - job_id that executed in a given timestamp

To do:

- Join these two files by job_id
- Visualize the time series of power consumption
- Compute the median power consumption
- Use the article's metrics as a basis
- See articles that cited this article
- Fill in the logbook
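A minimal sketch of the join and median items with pandas, on tiny hypothetical extracts of the two tables. Column names follow the notes above, except `timestamp`, which is an assumed name for the IPMI sampling time.

```python
import pandas as pd

job_info = pd.DataFrame({
    "job_id": [10, 11],
    "user_id": ["u1", "u2"],
    "num_cpus": [32, 8],
})
ipmi = pd.DataFrame({
    "timestamp": pd.to_datetime("2022-08-01") + pd.to_timedelta(list(range(0, 180, 30)), unit="s"),
    "node": ["n1"] * 6,
    "job_id": [10, 10, 10, 11, 11, 11],
    "total_power": [210.0, 250.0, 230.0, 120.0, 140.0, 400.0],
})

# Inner join: keep only samples whose job_id exists in the job table;
# validate checks that each job_id appears at most once in job_info
merged = ipmi.merge(job_info, on="job_id", how="inner", validate="many_to_one")

# The median is robust to isolated spikes such as the 400 W sample of job 11
median_power = merged.groupby("job_id")["total_power"].median()
print(median_power)
```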

# 27/11
1. Created github repo
2. Filled in logbook
3. Added more relevant articles
4. Checking, for each monthly dataset, whether the difference between end_time and start_time equals run_time (data consistency).
5. Visualizing the time series of power consumption for the longest running job during August 2022.
6. Calculating the median of power consumption for the longest running job in August 2022.
7. Visualizing total power consumption and median power consumption for all jobs during longest job run in August 2022.

To do:
- Fix last plot to show the whole month of August: note that the power measurements are not time-aligned across nodes
- Remove articles from public repo, add as links on logbooks.
- Explore Zotero
- Change first plot to scatter plot and zoom in to see specific behavior during smaller time frame
- Cluster jobs by user id, number of nodes and wall-time
- Read https://project.inria.fr/aaltd19/files/2019/08/AALTD_19_Boussard.pdf
- Explore the data analysis done in the first article (platform as a whole), then look at single job characteristics
- Create overleaf file for report
- Try and apply CFD-Autoperiod technique in the data to find a period of power consumption on the job

# 04/12
1. Created Overleaf file
2. Removed articles from public repo
3. Zoomed first plot in to see specific behavior during smaller time frame (scatter option didn't have good visibility)

## Comments on the article:
Puech, Tom, and Matthieu Boussard. (2019) A fully automated periodicity detection in time series. Available at: https://openreview.net/forum?id=HJMCdsC5tX

### Introduction
Time series are defined by three components:
- Trend
- Periodic
- Random

The study assumes every time series is stationary in mean and variance, so it can focus on the periodic component.

**Periodicity**: pattern that repeats itself in TS.

 - Cyclical TS: the repetition interval is neither well defined nor constant. These are harder to identify because they are irregular and need more data to reveal their periodicity. Most real-world TS are of this kind (e.g., tides, menstrual cycles).

 - Seasonal TS: time interval of repetition is well defined and constant.

 **Fourier transform:** decomposes an original signal (a sequence of values changing over time) $\{s(t_j)\}_{j \in [1,N]}$ (where $t_j$ are different time points) into a sum of complex sinusoids.
 - The sum forms a Fourier series, a way to express the original signal as a combination of simpler wave patterns.
 - Given that:
   - $N$ is the number of different frequencies (or types of waves) that are considered.
   - $P$ is the period of the signal, i.e., the length of one repetition.
   - $c_k$ is the coefficient of the $k$-th frequency component; the coefficients are complex numbers that determine the amplitude and phase of each sinusoidal component.

   The Fourier series is represented by:
   $$ s_N(t) = \sum^{N-1}_{k=0} c_k \, e^{i \frac{2 \pi k t}{P}} $$

   Where:
   - $s_N(t)$ is the signal reconstructed using $N$ components.
   - $e^{i \frac{2 \pi k t}{P}}$ is a complex sinusoid with frequency $\frac{k}{P}$, $P$ being the period of the signal.
   - $\frac{2 \pi k}{P}$ is the angular frequency of the sinusoid (how fast it completes a cycle, in radians per unit of time), so $\frac{2 \pi k t}{P}$ is its phase at time $t$.

### Discrete Fourier Transform (DFT) of a Discrete Signal: 
Used to analyze the frequency content of a discrete signal. For a discrete signal $\{s(t_j)\}$, where $t_j$ represents the discrete time values, the DFT is defined as:

$$DFT(f_k) = \sum_{j=0}^{N-1} s(t_j) \cdot e^{-i \frac{2\pi k j}{N}}$$

Here:
- $DFT(f_k)$ represents the complex coefficients obtained from the DFT for the frequency $f_k$.
- $N$ is the number of samples in the discrete signal.
- $s(t_j)$ is the signal value at time $t_j$.
- $f_k$ is the (angular) frequency associated with the $k$-th component, $f_k = \frac{2 \pi k}{N}$ radians per sample, i.e., $k/N$ cycles per sample, which corresponds to a period of $N/k$ samples.
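As a check on the formula, a direct O(N^2) evaluation of the DFT can be compared with NumPy's FFT, which computes the same coefficients (same sign convention, $e^{-i 2\pi k j / N}$) in O(N log N). The test signal is hypothetical.

```python
import numpy as np


def dft(s):
    """Direct evaluation of DFT(f_k) = sum_j s_j * exp(-i 2*pi*k*j / N)."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    j = np.arange(N)
    k = j.reshape(-1, 1)                      # one row per frequency index k
    return (s * np.exp(-2j * np.pi * k * j / N)).sum(axis=1)


N = 64
t = np.arange(N)
signal = np.sin(2 * np.pi * 4 * t / N)        # 4 cycles: energy at k = 4 (and its mirror N - 4)

coeffs = dft(signal)
print(np.allclose(coeffs, np.fft.fft(signal)))  # matches NumPy's FFT
```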

### Periodogram:
A measure of the spectral content of a signal, and it can be derived from the DFT coefficients. The formula for the Periodogram is given by:

$$P(f_k) = ||DFT(f_k)||^2 = ||c_k||^2$$

Here:
- $P(f_k)$ represents the Periodogram for the frequency $f_k$.
- $||DFT(f_k)||^2$ denotes the squared magnitude of the DFT coefficient for the corresponding frequency.
- $||c_k||^2$ represents the squared magnitude of the complex coefficient $c_k$ obtained from the DFT.

### Frequency Components:
- The variable k in $f_k$ ranges from 0 to $\frac{N-1}{2}$. This range covers the non-redundant positive frequencies because the DFT of a real signal has symmetry, and the information beyond $\frac{N-1}{2}$ is redundant.
- The frequency $f_k = \frac{2 \pi k}{N}$ corresponds to the frequency captured by each component in the DFT.
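A sketch of the periodogram restricted to the non-redundant indices $k = 0, \dots, \lfloor (N-1)/2 \rfloor$, using NumPy's FFT for the coefficients. The synthetic signal mixes a period-24 component with a weaker period-60 component, so the strongest non-zero bin should map back to the period $N/k = 24$.

```python
import numpy as np


def periodogram(s):
    """P(f_k) = |DFT(f_k)|^2 for k = 0 .. floor((N-1)/2)."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    coeffs = np.fft.fft(s)
    k = np.arange((N - 1) // 2 + 1)
    return k, np.abs(coeffs[k]) ** 2


N = 240
t = np.arange(N)
s = 3 * np.sin(2 * np.pi * t / 24) + np.sin(2 * np.pi * t / 60)
s = s - s.mean()                              # remove the k = 0 (mean) term

k, power = periodogram(s)
k_peak = k[1:][np.argmax(power[1:])]          # skip k = 0
print("dominant k:", k_peak, "-> period N/k =", N / k_peak, "samples")
```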

### Interpretation:
The Periodogram $P(f_k)$ represents the power or intensity of each frequency component in the signal. Squaring the magnitude of the DFT coefficients provides a measure of the energy or power associated with each frequency. The frequency $f_k$ corresponds to the rate of oscillation of the sinusoidal component captured by the k-th term in the DFT.

In the frequency domain, the resolution is uniform: consecutive bins are separated by a constant step ($1/N$ cycles per sample). In the time domain, however, bin $k$ corresponds to the period $N/k$, so consecutive bins lie further and further apart as the period grows (the gap is $N/(k-1) - N/k$). For long periods, this coarse spacing limits how precisely the period of a repeating pattern can be estimated.
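The uneven spacing in the time domain follows from simple arithmetic: for a hypothetical signal of $N = 1000$ samples, the gap between the period of bin $k$ and that of bin $k-1$ grows from about 0.1 samples at $k = 100$ to 500 samples at $k = 2$.

```python
N = 1000  # hypothetical signal length (samples)
ks = [100, 50, 10, 5, 2]

# Bins are evenly spaced in k, but the periods N/k they map to are not:
# the gap to the next (longer-period) bin is N/(k - 1) - N/k.
gaps = {k: N / (k - 1) - N / k for k in ks}
for k in ks:
    print(f"k={k:>3}: period {N / k:7.2f} samples, gap to bin k-1: {gaps[k]:7.2f} samples")
```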


### Autocorrelation Function (ACF) Basics:

- The ACF measures how similar one part of the signal is to another part separated from it by $\Delta t$ units of time.

### Formula Explanation:

1. **Autocorrelation Function Formula:**
     $$ACF(\Delta t) = \frac{1}{N} \sum^{N-1-\Delta t}_{j=0} s(t_j) \cdot s(t_j + \Delta t)$$

   - N is the total number of elements in the signal.
   - $t_j$ represents the time index in the signal.
   - $s(t_j)$ is the value of the signal at time $t_j$.
   - $s(t_j + \Delta t)$ is the value of the signal at a later time $t_j + \Delta t$ where $\Delta t$ is the time lag.

- The formula calculates the product of the signal value at time $t_j$ with the value at a later time $t_j + \Delta t$ for all relevant j values. It then averages these products over the entire signal length N.

- If $ACF(\Delta t)$ is high, it suggests a strong correlation between the signal values at time $t_j$ and $t_j + \Delta t$, meaning there's a repeating pattern with the specified time lag.

- As $\Delta t$ increases, the ACF is computed for larger time separations, helping to understand how the correlation between elements changes with time.

- ACF might be better at capturing patterns in signals that have longer, more spread-out repetitions, but it faces challenges when it comes to selecting the most predominant peaks in the analysis.

- It's difficult when a signal has multiple periodicities. For a given periodicity $p_1$, the autocorrelation generates peaks not only for $p_1$ but also for each multiple of $p_1$. This means that if there are several periodicities in the signal, the autocorrelation function produces peaks for each of their multiples.

- Selecting the relevant peaks becomes a challenge. When multiple periodicities contribute to a signal, the autocorrelation may create peaks for each of them and their multiples, making it hard to identify which peaks correspond to the most meaningful or predominant periodicities in the signal.
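A minimal sketch of the ACF formula above (biased estimator, dividing by $N$) applied to a synthetic sinusoid with a single periodicity $p_1 = 50$ samples: the local maxima appear at 50 and at its multiples 100 and 150, which is the peak-selection ambiguity described above.

```python
import numpy as np


def acf(s, max_lag):
    """ACF(dt) = (1/N) * sum_{j=0}^{N-1-dt} s_j * s_{j+dt}, for dt = 0 .. max_lag."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    return np.array([np.dot(s[: N - dt], s[dt:]) / N for dt in range(max_lag + 1)])


N = 600
t = np.arange(N)
s = np.sin(2 * np.pi * t / 50)                 # single periodicity p1 = 50 samples

r = acf(s, max_lag=160)
# Local maxima, excluding lag 0
peaks = [dt for dt in range(1, len(r) - 1) if r[dt] > r[dt - 1] and r[dt] > r[dt + 1]]
print(peaks)
```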

### A new methodology: CFD-Autoperiod

Idea: 
1. Apply FT to the signal, get a Periodogram
2. Apply Density Clustering to it, then a Centroids Projection
3. Go through a process of low-pass filtering, autocorrelation, and linear detrending
4. Validate
5. Get the periodicities

### Spectral leakage
- When using the Fourier Transform to select periodicity hints, a 99% confidence threshold is used to separate hints from noise.
- When dealing with noisy signals, it's common to encounter peaks in the frequency spectrum that may be due to random noise rather than meaningful periodic components.
- Peaks in the frequency spectrum that surpass the established confidence threshold are considered significant and are treated as potential periodicity hints (99% of noise-only peaks would fall below the threshold).

- First we need to find the maximum spectral power generated by the noise.

**Spectral power:** the amount of power associated with a specific frequency or frequency range in a signal.

Let $\{s'(t_j)\}_{j \in [1,N]}$ be a random permutation of a periodic sequence $\{s(t_j)\}_{j \in [1,N]}$. Because the permutation destroys the ordering, $s'$ should not exhibit any periodicity.

Its maximum spectral power should therefore not exceed that of a true periodicity in $s$, so this value can be used as a threshold to eliminate noise.
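A sketch of the permutation-based threshold, under assumptions: 100 permutations and the 99th percentile of their maximum powers are choices modeled on the 99% confidence rule above, not necessarily the paper's exact parameters, and the signal is synthetic (period 25 plus Gaussian noise).

```python
import numpy as np


def permutation_threshold(s, n_permutations=100, confidence=0.99, seed=0):
    """Power threshold separating periodicity hints from noise.

    Shuffling destroys any periodic structure, so the maximum periodogram power
    of each shuffled copy samples "the largest peak that noise alone produces".
    """
    rng = np.random.default_rng(seed)
    s = np.asarray(s, dtype=float) - np.mean(s)
    maxima = []
    for _ in range(n_permutations):
        shuffled = rng.permutation(s)
        power = np.abs(np.fft.rfft(shuffled)[1:]) ** 2   # skip the k = 0 term
        maxima.append(power.max())
    return float(np.quantile(maxima, confidence))


rng = np.random.default_rng(1)
N = 500
t = np.arange(N)
s = np.sin(2 * np.pi * t / 25) + rng.normal(0, 0.5, N)

threshold = permutation_threshold(s)
power = np.abs(np.fft.rfft(s - s.mean())) ** 2
hints = [N / k for k in range(1, len(power)) if power[k] > threshold]   # periods, in samples
print(f"threshold = {threshold:.1f}, hints (periods) = {hints}")
```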

**Problem:** Rather than producing a unique periodicity hint, spectral leakage produces multiple hints near the true one because of the finite resolution of the Fourier Transform.

The transform can only represent exactly the frequencies that fall on its discrete bins ($k/N$ cycles per sample). If a true periodic component's frequency falls between two bins, the Fourier Transform cannot represent it accurately.

Instead, the energy from that component "leaks" into adjacent frequency bins, leading to the detection of multiple hints or peaks around the true frequency.

Spectral leakage therefore generates imprecise periodicity hints above the threshold, and these need to be removed.
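Spectral leakage can be reproduced with two synthetic sinusoids of $N = 200$ samples: one completing exactly 10 cycles (its frequency falls on bin $k = 10$) and one completing 10.5 cycles (between bins 10 and 11). Only the off-bin signal spreads noticeable power over several neighboring bins; the 1% cutoff is an arbitrary choice for illustration.

```python
import numpy as np

N = 200
t = np.arange(N)
on_bin = np.sin(2 * np.pi * 10 * t / N)      # exactly 10 cycles: falls on bin k = 10
off_bin = np.sin(2 * np.pi * 10.5 * t / N)   # 10.5 cycles: falls between bins 10 and 11

strong_bins = {}
for name, s in [("on-bin", on_bin), ("off-bin", off_bin)]:
    power = np.abs(np.fft.rfft(s)) ** 2
    # Bins holding at least 1% of the peak power
    strong_bins[name] = np.flatnonzero(power >= 0.01 * power.max()).tolist()
    print(name, "-> bins above 1% of the peak:", strong_bins[name])
```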

### Density clustering
Since leaked hints concentrate around the true periodicity, we can cluster the hints and use each cluster's centroid as a single periodicity hint, reducing their number.

In density clustering, a fundamental parameter is the neighborhood radius $\epsilon$. Here, since a leaked hint may come from the adjacent DFT bin, for a hint of periodicity $\frac{N}{k}$, $\epsilon$ is the period of the next bin plus a constant:
$$ \epsilon_{\frac{N}{k}} = \frac{N}{(k-1)} + 1 $$

```python
import numpy as np

# Density clustering of periodicity hints (expressed as periods, in samples).
# A hint at period N/k comes from DFT bin k; leakage may add a hint in the adjacent
# bin k - 1, whose period is N/(k - 1). Following the epsilon defined above, a hint
# joins the current cluster if it is at most N/(k - 1) + 1, where k is the bin of
# the previous (sorted) hint.


def next_bin_value(period, N):
    """Period of the adjacent longer-period DFT bin, N/(k - 1), for a hint at period N/k."""
    k = N / period                      # DFT bin index matching this period
    if k <= 1:
        return np.inf                   # no longer-period bin exists
    return N / (k - 1)


def density_cluster(hints, N):
    """Group hints into clusters with the epsilon = N/(k - 1) + 1 rule."""
    hints = np.sort(np.asarray(hints, dtype=float))
    clusters = [[hints[0]]]
    epsilon = next_bin_value(hints[0], N) + 1
    for hint in hints[1:]:
        if hint <= epsilon:
            clusters[-1].append(hint)   # within reach of the previous hint: same cluster
        else:
            clusters.append([hint])     # too far: the hint starts a new cluster
        epsilon = next_bin_value(hint, N) + 1
    return clusters


# Simulated hints for a signal of N = 1000 samples with true periods 50 (bin k = 20)
# and 200 (bin k = 5); leakage adds hints at the neighboring bins k - 1 and k + 1.
N = 1000
hints = [N / k for k in (19, 20, 21, 4, 5, 6)]

clusters = density_cluster(hints, N)
centroids = [float(np.mean(c)) for c in clusters]  # one periodicity hint per cluster
print(centroids)
```

# 12/12
- First: Try to visualize the data and its properties, and assess its quality => follow the first article
- Second: Analyse the quality of the time series
- Apply FT to one node only to debug the code
- See the work of a single node during the month/week/day
- Align timestamps if using the sum of nodes
- Watch out for (possibly) imprecise measurements: straight lines in the time series may indicate missing data
- Check whether consecutive timestamps of each node are all 20 seconds apart
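A sketch of the timestamp check with pandas, assuming the IPMI table has `node` and `timestamp` columns (assumed names) and treating the expected sampling interval as a parameter (20 s, as in the note above); the data are hypothetical.

```python
import pandas as pd


def sampling_gaps(ipmi, expected_s=20, tolerance_s=1):
    """Return samples whose gap to the previous sample of the same node is off-schedule."""
    df = ipmi.sort_values(["node", "timestamp"]).copy()
    # First sample of each node has no predecessor: its gap is NaN and is never flagged
    df["gap_s"] = df.groupby("node")["timestamp"].diff().dt.total_seconds()
    off = (df["gap_s"] - expected_s).abs() > tolerance_s
    return df.loc[off, ["node", "timestamp", "gap_s"]]


# Hypothetical samples: node n2 misses two readings (a 60 s gap)
ipmi = pd.DataFrame({
    "node": ["n1"] * 4 + ["n2"] * 3,
    "timestamp": pd.to_datetime([
        "2022-08-01 00:00:00", "2022-08-01 00:00:20", "2022-08-01 00:00:40", "2022-08-01 00:01:00",
        "2022-08-01 00:00:05", "2022-08-01 00:00:25", "2022-08-01 00:01:25",
    ]),
})
print(sampling_gaps(ipmi))
```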

# 18/12
- Analyse quality of the time series: Apply FT to one node only to debug the code
- Align timestamps if using the sum of nodes
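One simple way to align timestamps before summing nodes, sketched with pandas: snap each sample to a common grid by flooring its timestamp, pivot to one column per node, and drop grid points where some node is missing. This is an illustrative choice, not necessarily the method of da Costa 2017; the column names and the 20 s grid are assumptions.

```python
import pandas as pd


def system_power(ipmi, freq="20s"):
    """Sum node power on a common time grid, keeping only slots where every node reported."""
    df = ipmi.assign(slot=ipmi["timestamp"].dt.floor(freq))
    per_node = df.pivot_table(index="slot", columns="node", values="total_power", aggfunc="mean")
    complete = per_node.dropna()           # drop slots with a missing node
    return complete.sum(axis=1)


# Hypothetical samples: the nodes are offset by a few seconds; n2 has no sample after 00:00:40
ipmi = pd.DataFrame({
    "node": ["n1", "n1", "n1", "n2", "n2"],
    "timestamp": pd.to_datetime([
        "2022-08-01 00:00:01", "2022-08-01 00:00:21", "2022-08-01 00:00:41",
        "2022-08-01 00:00:07", "2022-08-01 00:00:27",
    ]),
    "total_power": [200.0, 210.0, 220.0, 150.0, 160.0],
})
print(system_power(ipmi))
```

Dropping incomplete slots avoids artificial dips in the system total; interpolating the missing node would be an alternative when gaps are short.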

#### Tests to do:
- Correlate energy consumption for each job with its geometry: execution time x number of processors
    - Table 2: aggregation from time series

- Figure 11: for each user, calculate the energy consumption (J) of each of their jobs (integral of the power-over-time curve) and sum them; do this for all users, then plot a histogram of energy consumption per user (80/20 rule)

- Calculate the percentage of failed/timed-out jobs among the users that consume the most energy (timeout means run_time exceeded time_limit)

- total energy consumed vs job status (completed/timeout/failed)
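A minimal sketch of the energy computation behind Figure 11 and the energy-per-status comparison: energy is the trapezoidal integral of power over time, then summed per job state (grouping by user instead gives the per-user totals). The job data and dictionary layout are hypothetical.

```python
import numpy as np


def job_energy_joules(timestamps_s, power_w):
    """Energy (J) as the trapezoidal integral of power (W) over time (s)."""
    t = np.asarray(timestamps_s, dtype=float)
    p = np.asarray(power_w, dtype=float)
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(t)))


# Hypothetical jobs: sample times (s), power summed over the job's nodes (W), final state
jobs = {
    101: ([0, 30, 60, 90], [200, 220, 240, 220], "COMPLETED"),
    102: ([0, 30, 60], [300, 300, 300], "TIMEOUT"),
    103: ([0, 30], [100, 120], "FAILED"),
}

energy_by_state = {}
for job_id, (t, p, state) in jobs.items():
    energy_by_state[state] = energy_by_state.get(state, 0.0) + job_energy_joules(t, p)
print(energy_by_state)
```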

1. Is job geometry or job size (user_id, num_nodes, time_limit) related to the chance of failing?
- Given a job's information and its time_limit, can we predict whether it will fail or time out? (Logistic Regression)
- Static approach (look only at the job's own information) vs. dynamic approach (look at past jobs with similar characteristics)

2. Identify users with a higher job failure rate.

3. Is the proportion of timed-out (TO) and failed jobs meaningful?
    3.1. If yes, how can we predict the probability of failure for a job before execution?
    3.2. Then what should be done? Reduce its priority?

4. Monitor time series of power and memory consumption to try to find a pattern at the moment of a crash.
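A sketch of items 2 and 3 with pandas on hypothetical job records: the share of each final state per user, plus a sanity check of the timeout definition. The check uses `run_time >= time_limit`, an assumption made because a job killed at its limit may record a run time equal to the limit rather than strictly greater.

```python
import pandas as pd

# Hypothetical job records; run_time and time_limit in the same unit (minutes here)
jobs = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2", "u2", "u3"],
    "state": ["COMPLETED", "FAILED", "TIMEOUT", "COMPLETED", "COMPLETED", "FAILED"],
    "run_time": [30, 5, 120, 60, 45, 2],
    "time_limit": [60, 60, 120, 90, 60, 30],
})

# Share of each final state per user (each row sums to 1)
state_share = pd.crosstab(jobs["user_id"], jobs["state"], normalize="index")
print(state_share)

# Do the jobs that reached their limit coincide with the TIMEOUT state?
reached_limit = jobs["run_time"] >= jobs["time_limit"]
print((reached_limit == (jobs["state"] == "TIMEOUT")).all())
```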

- Check with the scholarship team whether the defense can be held on 06/02

- Change CDF plot and use describe() to find the percentiles
- Extend percentage of high consuming jobs per state to all jobs
- use one hot encoding on User ID for regression
- try and use start time (hour of the day) in the regression
- tres_per_node (needs cleaning)
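A sketch of the two feature-engineering items with pandas, on hypothetical rows: the start hour is extracted from `start_time`, and `user_id` is one-hot encoded with `pd.get_dummies`.

```python
import pandas as pd

jobs = pd.DataFrame({
    "user_id": ["u1", "u2", "u1"],
    "num_nodes": [1, 4, 2],
    "start_time": pd.to_datetime(["2022-08-01 09:15", "2022-08-01 22:40", "2022-08-02 03:05"]),
})

features = jobs.assign(start_hour=jobs["start_time"].dt.hour).drop(columns="start_time")
# One binary column per user: user_u1, user_u2
features = pd.get_dummies(features, columns=["user_id"], prefix="user", dtype=int)
print(features)
```

Treating the hour as a plain integer places 23h and 0h far apart; a sine/cosine encoding of the hour is a common alternative.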


Jobs that shared a node with other jobs and jobs that ran for less than a minute were excluded.

176 software packages running on the cluster

- RFE on features (v)
- line on 0.5 on cdf plot (v)
- add singlenodes to the experiments as well (v)
- use only COMPLETED, FAILED and TIMEOUT jobs, with undersampling (v)
- check coefficients and weights for each user_id after one-hot encoding, and find whether one is bigger than the others (or correlation coeffs) (v)
- heat map of job state classes (columns) and features (rows) (use the icefire colormap) (v)
- try to answer the question "is one user responsible for most of the timeouts?" or "does one user have a higher probability of timing out?" (v)

- do version 1 of report
- share slides of last year as model
- change date of presentation (v) 

- Parallel Computing: 4
- Industry engineering: 2
- Electronics and Electromagnetics: 5
- Quantum Chemistry and Physics: 15
- Mesh Processing: 1
- Molecular Dynamics: 16
- Data Science: 3
- Genomic Annotation and Sequence Alignment: 34
- Linear Algebra: 7
- Version Control: 2
- Variant Calling: 1
- C++ libraries: 1
- Read Mapping: 1
- Deep Learning Framework: 1
- Climate Data Analysis and Meteorological Data Processing: 2
- FITS Data Handling: 1
- Computational Geometry: 1
- Build Management: 2
- Performance Visualization: 6
- Python/C Integration: 1
- Optimization and Uncertainty Quantification: 1
- Partial Differential Equations Solvers: 1
- Eigenvalue Solver: 2
- Multimedia: 1
- Fast Fourier Transform: 1
- Compilation: 2
- Arbitrary Precision Arithmetic: 2
- Plotting Utility: 2
- Earth Science Data Visualization: 1
- Numerical Computation: 3
- Triangulated Surface Processing: 1
- Scientific Data Storage: 1
- Astrophysics: 1
- Software Development: 3
- High-Performance Computing: 4
- Runtime Environment: 1
- JSON Parsing: 1
- File Transfer: 1
- Domain names: 1
- Mathematical Computing: 3
- Graph Partitioning and Sparse Matrix Ordering: 1
- Message Passing Interface (MPI): 4
- Multimedia: 1
- File Transfer Protocol: 1
- Data Visualization: 3
- Data Manipulation: 1
- Data Formats: 3
- Neuroscience: 1
- Scientific Computing: 6
- Computer Vision: 1
- Computational Fluid Dynamics (CFD): 5
- Adaptive Octree Management: 1
- Argument Parsing: 1
- Data Interchange Format: 1
- Programming language: 1
- Machine Learning: 2
- Cross-Platform Application Framework: 1
- Virtualization: 1
- Data Compression: 2
- YAML Parsing and Emission: 1

- align timestamps (da costa 2017)
- plot power consumption over time for the whole system (look only at the IPMI table)

- add that time series analysis can be a future work
- add to report which features were selected by the RFE in each model
- explain the train test split process in report

- check that the tres_per_node column is treated correctly (tres_per_node: number of GPUs per node); use Danilo's script, treat None as 0 GPUs, and create a new int column to replace tres_per_node (e.g., gpus_per_node) (v)
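A sketch of the `tres_per_node` cleaning, independent of the script mentioned above. The string formats handled (`gres:gpu:2`, `gres/gpu:2`, `gres:gpu:a100:4`) are assumptions about what the column contains and should be checked against the real data; None/NaN and values without an explicit GPU count are mapped to 0 here. The pattern and function names are hypothetical.

```python
import re

import pandas as pd

# Assumed formats: "...gpu:<count>" or "...gpu:<type>:<count>"
GPU_PATTERN = re.compile(r"gpu(?::[A-Za-z0-9_-]+)?[:=](\d+)")


def gpus_per_node(tres):
    """Number of GPUs requested per node; 0 for None/NaN or when no count is found."""
    if not isinstance(tres, str):      # None and NaN are not strings
        return 0
    match = GPU_PATTERN.search(tres)
    return int(match.group(1)) if match else 0


df = pd.DataFrame({"tres_per_node": ["gres:gpu:2", None, "gres:gpu:a100:4", float("nan")]})
df["gpus_per_node"] = df["tres_per_node"].map(gpus_per_node).astype(int)
print(df)
```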

- look into the job status class distribution for user 2 (single node)

- add cross validation 
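A minimal k-fold cross-validation sketch in NumPy (scikit-learn's `KFold` and `cross_val_score` provide the same functionality). The `fit`/`score` callables and the majority-class baseline are hypothetical stand-ins for the real model. With imbalanced job states, a stratified split that preserves class proportions in each fold is usually preferable.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def k_fold_score(X, y, fit, score, k=5, seed=0):
    """Average of score(model, X_test, y_test) over the k folds."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit(X[train], y[train])
        scores.append(score(model, X[test], y[test]))
    return float(np.mean(scores))


def fit_majority(X_train, y_train):
    """Hypothetical baseline: always predict the most frequent training label."""
    return np.bincount(y_train).argmax()


def accuracy(model, X_test, y_test):
    return float(np.mean(y_test == model))


X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 15 + [1] * 5)
print(k_fold_score(X, y, fit_majority, accuracy, k=5))
```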