# Simulation software

### Primary research questions:

The results presented in this notebook address the following questions:




## 1. Imports 

### 1.1. Standard Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
# set up plot style as ggplot
plt.style.use('ggplot')

### 1.2. Imports from preprocessing module

In [2]:
# function for loading full dataset
from preprocessing import load_clean_dataset

## 2. Constants

In [3]:
FILE_NAME = 'https://raw.githubusercontent.com/TomMonks/' \
    + 'des_sharing_lit_review/main/data/share_sim_data_extract.zip'

RG_LABEL = 'reporting_guidelines_mention'
NONE = 'None'
WIDTH = 0.5

## 3. Functions

### 3.1. Functions to create summary statistics

Two functions are used together to generate the high-level results by year.

* `high_level_metrics` - takes a subgroup of the dataset and generates summary statistics and counts.
* `analysis_by_year` - loops through the years, passing each to `high_level_metrics`, and concatenates the resulting datasets at the end.
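
The loop-and-concatenate pattern described above can be sketched as follows. This is only an illustration: the bodies of both functions, the column names `year` and `has_code`, and the toy data are assumptions made for the example, not the notebook's actual implementation.

```python
import pandas as pd


def high_level_metrics(subgroup):
    """Hypothetical stand-in: summarise one subgroup of the dataset."""
    return pd.Series({
        'n_included': len(subgroup),
        'n_with_code': int(subgroup['has_code'].sum()),
    })


def analysis_by_year(df, year_col='year'):
    """Apply high_level_metrics to each year and concatenate the results."""
    # one summary Series per year, keyed by the year value
    results = {year: high_level_metrics(group)
               for year, group in df.groupby(year_col)}
    # columns are years, rows are the summary metrics
    return pd.concat(results, axis=1)


# toy data, invented for illustration only
toy = pd.DataFrame({'year': [2019, 2019, 2020],
                    'has_code': [True, False, True]})
by_year = analysis_by_year(toy)
```

Keying the intermediate results by year means `pd.concat` labels each column with its year, so no separate relabelling step is needed.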

In [4]:
# software used this many times or fewer is grouped under 'Other'
threshold = 2

def software_count(column):
    """
    Return a count of simulation software.

    Software that appears `threshold` times or fewer is labelled as 'Other'.

    Params:
    -------
    column: pandas Series

    Returns:
    -------
    pd.DataFrame
    """
    # one row per software package with its frequency
    counts = column.value_counts().to_frame().reset_index()
    counts.columns = ['software', 'count']
    # total for the rarely used packages
    summarised = counts[counts['count'] <= threshold].sum()
    # relabel the rare packages and collapse them into a single 'Other' row
    counts.loc[counts['count'] <= threshold, 'software'] = 'Other'
    counts = counts.groupby('software').sum()
    counts.loc['Other'] = summarised

    return counts
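
The same thresholding idea can be checked on a small example. The sketch below is a compact variant written for illustration, not a copy of `software_count`. The toy series and the names `toy`, `THRESHOLD`, `labels` and `grouped` are assumptions.

```python
import pandas as pd

# toy data, invented for illustration only
toy = pd.Series(['Arena', 'Arena', 'Arena',
                 'Simul8', 'Simul8', 'Simul8',
                 'R', 'R', 'SimPy'])

THRESHOLD = 2

# frequency of each package
counts = toy.value_counts()

# keep the package name when it is used more than THRESHOLD times,
# otherwise replace it with 'Other'
labels = counts.index.to_series().where(counts > THRESHOLD, 'Other')

# sum the frequencies under the new labels
grouped = counts.groupby(labels).sum()
```

Here `R` (2 uses) and `SimPy` (1 use) both fall at or below the threshold, so they are merged into a single `Other` row while the total count is preserved.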



## 4. Read in data

In [5]:
clean = load_clean_dataset(FILE_NAME)

## 5. Results

### 5.1. Overall summary table

In [10]:
software_counts = software_count(clean['sim_software'])
# raw string avoids Python's invalid escape sequence warning for '\%'
software_counts[r'n(\%)'] = software_counts['count'] / software_counts['count'].sum() * 100
software_counts.sort_values('count', ascending=False)

| software | count | n(\%) |
|---|---|---|
| Arena | 95 | 19.269777 |
| AnyLogic | 58 | 11.764706 |
| Unknown | 58 | 11.764706 |
| Simul8 | 42 | 8.51927 |
| Other | 39 | 7.910751 |
| R | 33 | 6.693712 |
| FlexSim | 22 | 4.462475 |
| Excel | 20 | 4.056795 |
| Simio | 16 | 3.245436 |
| SimPy | 15 | 3.042596 |
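
The percentage column above holds only the share of the total, although its name suggests a combined count-and-percentage display. A minimal sketch of one way to build such a combined column is given below. The toy counts, the column name `n (%)` and the one-decimal rounding are assumptions for illustration and are not taken from the notebook.

```python
import pandas as pd

# toy counts, invented for illustration only
toy_counts = pd.DataFrame(
    {'count': [6, 3, 1]},
    index=pd.Index(['A', 'B', 'Other'], name='software'),
)

# percentage share of the total for each row
pct = toy_counts['count'] / toy_counts['count'].sum() * 100

# combine the count and its percentage into one display string
toy_counts['n (%)'] = [f'{n} ({p:.1f})'
                       for n, p in zip(toy_counts['count'], pct)]
```

Keeping the numeric `count` column alongside the formatted string means the table can still be sorted numerically before display.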
