# Model execution

This notebook analyses how shared models measure up against our best practices for enabling others to use and execute the simulation model. In summary, the best practices are:

1. The authors provide a readme or other obvious instruction file for users to consult; 
2. The authors provide step by step instructions to run the DES model;
3. Models are shared with either informal or formal software dependency management; 
4. Models are shared with details of model and/or code testing;
5. The model or model code is downloadable to enable local execution;
6. The model is shared in a manner that enables execution online without the need to install locally.

## Notebook aims

The notebook analyses the following questions related to best practice:

1.  What proportion of the shared model artefacts have a readme or equivalent file?
2.  What proportion of artefacts have step by step instructions to use them?
3.  What proportion of models have formal and informal dependency management included?
4.  What proportion of models are shared with evidence that they have been tested?


## Data used in analysis

The dataset is a subset of the main review, limited to models that were shared. The type of model shared is coded as **Visual Interactive Modelling (VIM)** based (e.g. AnyLogic, Simul8, Arena) versus **CODE** (e.g. Matlab, Python, SimPy, Java, R Simmer).

> The data can be found here: https://raw.githubusercontent.com/TomMonks/des_sharing_lit_review/main/data/bp_audit.zip

The following fields are analysed in this notebook.

* `model_format` - VIM or CODE
* `readme` - is there an obvious file(s) where a user would look first? (0/1)             
* `steps_run` - are there steps to run a model? (0/1)
* `formal_dep_mgt` - has the model been shared with formal software dependency management? (0/1)
* `informal_dep_mgt` - have any informal methods of dependency management been shared?  E.g. a list of software requirements. (0/1)
* `evidence_testing` - do the model and artefacts in the repository contain any evidence that they have been tested? (0/1)
* `downloadable` - can the model and artefacts be downloaded and executed locally? (0/1)
* `interactive_online` - can the model and its artefacts be executed online without local installation? (0/1)
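
Because each audit field is coded 0/1, the mean of a column is the proportion of models that meet that criterion. A minimal sketch on a small made-up DataFrame (the values are illustrative only and are not the review data):

```python
import pandas as pd

# Made-up audit records for illustration; NOT the review dataset.
toy = pd.DataFrame({
    'model_format': ['CODE', 'CODE', 'CODE', 'VIM', 'VIM'],
    'readme': [1, 1, 0, 0, 1],
    'steps_run': [1, 0, 0, 0, 0],
})

# The mean of a 0/1 column is the proportion of rows coded 1.
overall = toy[['readme', 'steps_run']].mean()

# The same proportions split by model format.
by_format = toy.groupby('model_format')[['readme', 'steps_run']].mean()

print(overall)
print(by_format)
```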

## 1. Imports

### 1.1. Standard

In [1]:
import pandas as pd
import numpy as np

### 1.2. Preprocessing

In [2]:
from preprocessing import load_clean_bpa

## 2. Constants

In [5]:
FILE_NAME = 'https://raw.githubusercontent.com/TomMonks/' \
    + 'des_sharing_lit_review/main/data/bp_audit.zip'
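
`load_clean_bpa` is defined in the project's `preprocessing` module, which is not shown here. As a rough illustration of the loading step only, and assuming the archive holds a single CSV file, pandas can read a zipped CSV directly. The sketch below writes a tiny made-up archive to a temporary directory so that it runs without network access; the file names and values are hypothetical.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Build a tiny zip archive holding one CSV (hypothetical names and values).
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, 'toy_audit.zip')
with zipfile.ZipFile(zip_path, 'w') as zf:
    zf.writestr('toy_audit.csv', 'model_format,readme\nCODE,1\nVIM,0\n')

# pandas infers zip compression from the '.zip' extension; the archive
# must contain exactly one file for this to work.
toy = pd.read_csv(zip_path)

# Store the format label as a category, as a cleaning step might do.
toy['model_format'] = toy['model_format'].astype('category')
print(toy.dtypes)
```

`pd.read_csv` also accepts a URL in place of a local path, so the same call should work with a remote `.zip` address, provided the archive has that single-file layout.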

## 3. Analysis functions

A number of simple functions to conduct the analysis and format output.

In [4]:
def balance_of_model_format(df):
    '''
    Return the unique model formats (VIM/CODE) and the count of each.
    '''
    unique_elements, counts_elements = np.unique(df['model_format'],
                                                 return_counts=True)
    return unique_elements, counts_elements
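
As a quick illustration of what `balance_of_model_format` returns, `np.unique` with `return_counts=True` yields the sorted unique labels and a parallel array of counts. The sketch uses a made-up column, not the review data.

```python
import numpy as np
import pandas as pd

# Made-up format labels for illustration only.
toy = pd.DataFrame({'model_format': ['VIM', 'CODE', 'CODE', 'VIM', 'CODE']})

# Labels come back sorted, with counts in the matching positions.
labels, counts = np.unique(toy['model_format'], return_counts=True)

# Pair them up for easier reading.
balance = dict(zip(labels.tolist(), counts.tolist()))
print(balance)
```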

In [6]:
def field_by_sharing_tools(df, field):
    '''
    Return a DataFrame containing counts of models for each value of
    `field` (rows) by type of sharing, i.e. archive, cloud repo,
    journal supp, personal/org website, platform.
    
    Parameters:
    -----------
    df: pd.DataFrame
        Contains data to analyse, e.g. full dataset or subset.

    field: str
        Name of the column to group by, e.g. a license column.
        
    Returns:
    -------
    pd.DataFrame
        One row per unique value of `field`, one column per sharing type.
    '''
    selected_columns = ['model_archive', 'model_repo', 'model_journal_supp',
                        'model_personal_org', 'model_platform']
    license_by_sharing = df.groupby(by=field)[selected_columns].count()
    return license_by_sharing.sort_values(by='model_repo', 
                                          ascending=False)

In [7]:
def format_license_table(df):
    '''
    Format the license table.
    '''
    column_headers = ['Archive', 'Repository', 
                      'Journal', 'Personal/org', 'Platform']
    df.columns = column_headers
    return df
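
`groupby(...).count()` counts non-null entries, so each cell in the table is the number of models in that group with a value recorded for that sharing method. A minimal sketch on made-up data; the `license` column name and all values are hypothetical:

```python
import numpy as np
import pandas as pd

# Made-up data; NaN marks "not shared by this method".
toy = pd.DataFrame({
    'license': ['MIT', 'MIT', 'GPL-3.0', 'GPL-3.0', 'MIT'],
    'model_archive': ['zenodo', np.nan, np.nan, 'figshare', np.nan],
    'model_repo': ['github', 'github', 'gitlab', np.nan, 'github'],
})

# count() ignores NaN, giving models per license for each sharing method.
table = toy.groupby(by='license')[['model_archive', 'model_repo']].count()
table = table.sort_values(by='model_repo', ascending=False)
print(table)
```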

## 3. Load and inspect dataset

The clean dataset contains 39 models and 27 fields. The fields are listed below.

In [8]:
clean = load_clean_bpa(FILE_NAME)
clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39 entries, 0 to 38
Data columns (total 27 columns):
 #   Column                        Non-Null Count  Dtype   
---  ------                        --------------  -----   
 0   model_format                  39 non-null     category
 1   key                           39 non-null     object  
 2   item_type                     39 non-null     category
 3   pub_yr                        39 non-null     int64   
 4   author                        39 non-null     object  
 5   doi                           38 non-null     object  
 6   reporting_guidelines_mention  39 non-null     category
 7   covid                         39 non-null     category
 8   sim_software                  39 non-null     object  
 9   foss_sim                      39 non-null     category
 10  model_archive                 4 non-null      object  
 11  model_repo                    18 non-null     object  
 12  model_journal_supp            9 non-null      object

## 4. Results

### 4.1. What proportion of the shared model artefacts have a readme or equivalent file?

In [12]:
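# count models by readme flag (0 = no readme, 1 = readme present)
unique_elements, counts_elements = np.unique(clean['readme'],
                                             return_counts=True)

# index 1 holds the count for readme == 1 (both values occur in the data)
has_readme = counts_elements[1]
has_readme_percent = (has_readme / len(clean)) * 100
# '\\%' prints a literal backslash before the percent sign
rm_result = f'A total of {has_readme} ({has_readme_percent:.1f}\\%) models ' \
    + 'were shared with a readme or equivalent file.'
print(rm_result)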

A total of 24 (61.5\%) models were shared with a readme or equivalent file.
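
Indexing `counts_elements[1]` relies on both 0 and 1 appearing in the column. A slightly more defensive sketch, using a hypothetical helper `proportion_yes` that is not part of the original notebook, counts the rows equal to 1 directly. It is shown on made-up data:

```python
import pandas as pd


def proportion_yes(df, field):
    '''
    Hypothetical helper: count and percentage of rows where a 0/1
    field is coded 1.
    '''
    n_yes = int((df[field] == 1).sum())
    percent = 100 * n_yes / len(df)
    return n_yes, percent


# Made-up data for illustration only.
toy = pd.DataFrame({'readme': [1, 1, 0, 1], 'steps_run': [0, 0, 0, 0]})

for field in ['readme', 'steps_run']:
    n_yes, percent = proportion_yes(toy, field)
    print(f'{field}: {n_yes} ({percent:.1f}%)')
```

The same helper could be applied to each of the audit fields in turn, since all are coded 0/1.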


In [18]:
cols = ['readme', 'steps_run', 'formal_dep_mgt', 'informal_dep_mgt', 
        'evidence_testing', 'downloadable', 'interactive_online']

In [37]:
results = clean.groupby(by='model_format')[cols].describe()
results = results.T.reset_index()
# 'model_format' is the name of the column axis, not a column, so clear
# the name rather than dropping a column
results.columns.name = None
results
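
For context, a grouped `describe()` table that has been transposed carries the grouping key as the name of its column axis, not as a column, so `drop('model_format', axis=1)` has nothing to remove. A minimal sketch on made-up data (not the review dataset) that shows this and clears the axis name instead:

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    'model_format': ['CODE', 'CODE', 'VIM', 'VIM'],
    'readme': [1, 0, 1, 1],
})

summary = toy.groupby('model_format')[['readme']].describe().T.reset_index()

# 'model_format' is the name of the column axis, not a column label.
print(summary.columns.name)
print('model_format' in summary.columns)

# Clearing the axis name tidies the header without dropping any data.
summary.columns.name = None
print(summary)
```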