# Simulation software

The results in this notebook do not directly answer any of our primary research questions, but they support RQ2:

> 2. What proportion of these papers use Free and Open Source Simulation and, of these, what number are shared?

The results also show that ~11% of the literature (64 of 565 studies) does not report the simulation software used.


## 1. Imports 

### 1.1. Standard Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
# set up plot style as ggplot
plt.style.use('ggplot')

### 1.2. Imports from preprocessing module

In [2]:
# function for loading full dataset
from preprocessing import load_clean_dataset

## 2. Constants

In [3]:
FILE_NAME = 'https://raw.githubusercontent.com/TomMonks/' \
    + 'des_sharing_lit_review/main/data/share_sim_data_extract.zip'

RG_LABEL = 'reporting_guidelines_mention'
NONE = 'None'
WIDTH = 0.5

## 3. Functions

### 3.1. Functions to create summary statistics

A single function is used to generate the summary counts:

* `software_count` - counts how often each simulation software package is reported and groups packages with a count at or below `threshold` under 'Other'.

In [4]:
def software_count(column, threshold=2):
    """
    Return a count of simulation software.

    Software with a count less than or equal to `threshold` is
    labelled as 'Other'.

    Params:
    -------
    column: pandas Series
        The simulation software reported by each study.

    threshold: int, optional (default=2)
        Software with a count <= threshold is grouped as 'Other'.

    Returns:
    -------
    pd.DataFrame
    """
    counts = column.value_counts().to_frame().reset_index()
    counts.columns = ['software', 'count']
    # relabel infrequently used software as 'Other' ...
    counts.loc[counts['count'] <= threshold, 'software'] = 'Other'
    # ... and merge the relabelled rows into a single 'Other' row
    counts = counts.groupby('software').sum()

    return counts
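
The grouping behaviour can be checked on a small made-up example. The sketch below is illustrative only: the series values are invented, and `group_rare` is a hypothetical compact stand-in for `software_count` so that the block runs on its own.

```python
import pandas as pd

# Invented toy data: one entry per study (not taken from the review dataset).
toy = pd.Series(['Arena'] * 4 + ['Simul8'] * 3 + ['SimPy'] * 2 + ['WITNESS'])


def group_rare(column, threshold=2):
    """Hypothetical stand-in for `software_count`, for illustration only."""
    counts = column.value_counts().to_frame().reset_index()
    counts.columns = ['software', 'count']
    # Relabel software used in `threshold` or fewer studies as 'Other'.
    counts.loc[counts['count'] <= threshold, 'software'] = 'Other'
    # Merge the relabelled rows into a single 'Other' row.
    return counts.groupby('software').sum()


toy_counts = group_rare(toy, threshold=2)
print(toy_counts)
```

With `threshold=2`, SimPy (2 studies) and WITNESS (1 study) fall at or below the threshold and are merged into one 'Other' row, while Arena and Simul8 keep their own rows.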



## 4. Read in data

In [5]:
clean = load_clean_dataset(FILE_NAME)

## 5. Results

### 5.1. Overall summary table

In [6]:
software_counts = software_count(clean['sim_software'], threshold=2)
# raw strings keep the LaTeX-escaped percent sign without an invalid escape
software_counts[r'n(\%)'] = \
    software_counts['count'] / software_counts['count'].sum() * 100
software_counts = software_counts.sort_values('count', ascending=False)
software_counts[r'n(\%)'] = software_counts[r'n(\%)'].round(1)
software_counts

| software | count | n(\%) |
|:---------|------:|------:|
| Arena | 119 | 21.1 |
| AnyLogic | 76 | 13.5 |
| Unknown | 64 | 11.3 |
| Simul8 | 51 | 9.0 |
| Other | 39 | 6.9 |
| R | 34 | 6.0 |
| FlexSim | 22 | 3.9 |
| Excel | 21 | 3.7 |
| Simio | 21 | 3.7 |
| MATLAB | 19 | 3.4 |
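
As a worked check of the `n(\%)` column: the counts printed in the LaTeX output in Section 6 sum to 565, so Arena's share is 119 / 565 ≈ 21.1% and the share of studies with unknown software is 64 / 565 ≈ 11.3%. The snippet below repeats that arithmetic. The counts are copied by hand from that output rather than loaded from the dataset, and `manual_counts` and `share` are names introduced here for illustration.

```python
import pandas as pd

# Counts copied by hand from the LaTeX table printed in Section 6.
manual_counts = pd.Series({
    'Arena': 119, 'AnyLogic': 76, 'Unknown': 64, 'Simul8': 51, 'Other': 39,
    'R': 34, 'FlexSim': 22, 'Excel': 21, 'Simio': 21, 'MATLAB': 19,
    'SimPy': 18, 'R Simmer': 15, 'TreeAge': 14, 'Python': 10,
    'ExtendSim': 6, 'C++': 6, 'MedModel': 5, 'Flexsim': 5, 'ProModel': 4,
    'Salabim': 4, 'Plant Simulation': 3, 'WITNESS': 3, 'anyLogistix': 3,
    'iGrafx': 3,
})

total = manual_counts.sum()
# Same calculation as the notebook cell: percentage of all studies, 1 d.p.
share = (manual_counts / total * 100).round(1)

print(total)
print(share[['Arena', 'Unknown']])
```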


## 6. Output to LaTeX

In [7]:
print(software_counts.style.to_latex(
    hrules=True,
    label="DES Software",
    caption="Software used in DES healthcare studies"))

\begin{table}
\caption{Software used in DES healthcare studies}
\label{DES Software}
\begin{tabular}{lrr}
\toprule
 & count & n(\%) \\
software &  &  \\
\midrule
Arena & 119 & 21.100000 \\
AnyLogic & 76 & 13.500000 \\
Unknown & 64 & 11.300000 \\
Simul8 & 51 & 9.000000 \\
Other & 39 & 6.900000 \\
R & 34 & 6.000000 \\
FlexSim & 22 & 3.900000 \\
Excel & 21 & 3.700000 \\
Simio & 21 & 3.700000 \\
MATLAB & 19 & 3.400000 \\
SimPy & 18 & 3.200000 \\
R Simmer & 15 & 2.700000 \\
TreeAge & 14 & 2.500000 \\
Python & 10 & 1.800000 \\
ExtendSim & 6 & 1.100000 \\
C++ & 6 & 1.100000 \\
MedModel & 5 & 0.900000 \\
Flexsim & 5 & 0.900000 \\
ProModel & 4 & 0.700000 \\
Salabim & 4 & 0.700000 \\
Plant Simulation & 3 & 0.500000 \\
WITNESS & 3 & 0.500000 \\
anyLogistix & 3 & 0.500000 \\
iGrafx & 3 & 0.500000 \\
\bottomrule
\end{tabular}
\end{table}

