# Sharing Method

This notebook summarises **how** discrete-event simulation (DES) models were shared, following the methodology of Jansson et al. (2020). The summary is broken into the following categories: open science archives, online code repositories, journal supplementary material, personal or organisation websites, and online platforms.  It is further split by whether models were developed with code-based tools or Visual Interactive Modelling (VIM) software.  The latter is typically a single file.

## Data used in analysis

The dataset is a subset of the main review, limited to models that were shared.  The type of model shared is coded as **Visual Interactive Modelling (VIM)** based (e.g. AnyLogic, Simul8, Arena) or **CODE** (e.g. Matlab, Python, SimPy, Java, R Simmer).

> The data can be found here: https://raw.githubusercontent.com/TomMonks/des_sharing_lit_review/main/data/bp_audit.zip


## 1. Imports

### 1.1 Standard

In [1]:
import pandas as pd
import numpy as np

### 1.2 Preprocessing

In [2]:
from preprocessing import load_clean_bpa, drop_columns

## 2. Constants

In [3]:
FILE_NAME = 'https://raw.githubusercontent.com/TomMonks/' \
    + 'des_sharing_lit_review/main/data/bp_audit.zip'

## 3. Analysis functions

A number of simple functions to conduct the analysis and format output.

In [4]:
def get_counts(df, column):
    '''
    For a specified column return a DataFrame indexed by method with a
    single column n.  Each method is unique and n is the number of
    times it appears in the dataset.  Missing values are excluded.
    
    Params:
    ------
    df: pd.DataFrame
        The pandas dataframe containing the cohort of interest
        
    column: str
        The column containing the values to count.
        
    Returns:
    -------
    pd.DataFrame
        Counts indexed by method, sorted by n in descending order.
    '''
    # Keep only the rows where the column has a value.
    method = df[~df[column].isna()][column]
    # Count how often each unique value occurs.
    unique_elements, counts_elements = np.unique(method, return_counts=True)
    results = pd.DataFrame({'method': unique_elements, 'n': counts_elements})
    # Most frequent method first.
    return results.set_index('method').sort_values('n', ascending=False)
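
The same kind of count can also be obtained with `Series.value_counts`, which drops missing values and sorts in descending order by default. The sketch below is an alternative for comparison, not the function used in this notebook; the name `get_counts_vc` and the toy data are hypothetical, and the order of tied methods may differ from `get_counts`.

```python
import pandas as pd


def get_counts_vc(df, column):
    """Sketch of a value_counts based equivalent (hypothetical name)."""
    # value_counts excludes NaN/None and sorts by count, largest first.
    counts = df[column].value_counts()
    results = counts.rename('n').to_frame()
    results.index.name = 'method'
    return results


# Toy data for illustration only; not taken from the review dataset.
toy = pd.DataFrame({'model_repo': ['GitHub', None, 'GitLab', 'GitHub']})
print(get_counts_vc(toy, 'model_repo'))
```

The returned frame has the same shape as the one from `get_counts`: an index named `method` and a single column `n`.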

## 4. Load and inspect dataset

The clean data set has 27 fields included.  These are listed below.  

In [5]:
clean = load_clean_bpa(FILE_NAME)
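
The cell above only loads the data. Assuming `load_clean_bpa` returns an ordinary pandas DataFrame, the fields could be inspected along the following lines. The sketch uses a small stand-in frame named `example` (hypothetical values, column names borrowed from this notebook) so that it runs without the local `preprocessing` module.

```python
import pandas as pd

# Stand-in for `clean`; the values are made up for illustration.
example = pd.DataFrame({
    'model_format': ['CODE', 'VIM'],
    'model_repo': ['GitHub', None],
})

# shape gives (rows, fields); columns gives the field names.
n_rows, n_fields = example.shape
field_names = example.columns.tolist()
print(f'{n_fields} fields: {field_names}')

# info() additionally reports the dtype and non-null count of each field.
example.info()
```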

## 5. Results


### 5.1 Overall numeric summary

In [6]:
jansson_method = ['model_format', 'model_archive', 'model_repo', 'model_journal_supp',
                  'model_personal_org', 'model_platform']

clean[jansson_method].groupby(by='model_format').count().T

| model_format | CODE | VIM |
| --- | --- | --- |
| model_archive | 1 | 3 |
| model_repo | 18 | 1 |
| model_journal_supp | 5 | 4 |
| model_personal_org | 4 | 1 |
| model_platform | 5 | 6 |
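
To make the reading of this table explicit: `groupby(...).count()` counts the non-missing entries in each column per group, so each cell is the number of models of that format with a value recorded for that sharing method. A minimal sketch on toy data (hypothetical values, not the review dataset):

```python
import pandas as pd

# Toy data: None marks a sharing method that was not used.
toy = pd.DataFrame({
    'model_format': ['CODE', 'CODE', 'VIM', 'VIM'],
    'model_repo': ['GitHub', 'GitLab', None, 'GitHub'],
    'model_archive': [None, 'Zenodo', 'Zenodo', None],
})

# count() ignores missing values; .T puts sharing methods on the rows.
summary = toy.groupby(by='model_format').count().T
print(summary)
```

A row of the data with values in more than one method column contributes to more than one row of the summary.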


### 5.2 Open science archives 

In [7]:
ARCHIVE = 'model_archive'
archive_results = get_counts(clean[jansson_method], ARCHIVE)
archive_results

| method | n |
| --- | --- |
| Zenodo | 2 |
| Mendeley | 1 |
| Research Square; | 1 |


### 5.3 Model repositories

In [8]:
repo_results = get_counts(clean[jansson_method], 'model_repo')                            
repo_results

| method | n |
| --- | --- |
| GitHub | 16 |
| Github | 2 |
| GitLab | 1 |
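
The output above counts `GitHub` and `Github` as separate methods because the labels differ only in capitalisation. One possible way to merge such variants before counting is sketched below. This is an illustration rather than a step performed in this notebook; the mapping `CANONICAL`, the helper `normalise_labels`, and the toy data are all hypothetical.

```python
import pandas as pd

# Hypothetical mapping from a lower-cased label to a preferred spelling.
CANONICAL = {'github': 'GitHub', 'gitlab': 'GitLab'}


def normalise_labels(series, canonical):
    """Strip whitespace and map case variants to one spelling."""
    cleaned = series.str.strip()
    # Labels missing from the mapping fall back to their cleaned value.
    return cleaned.str.lower().map(canonical).fillna(cleaned)


# Toy data for illustration only.
toy = pd.Series(['GitHub', 'Github', ' GitLab', None, 'Bitbucket'])
print(normalise_labels(toy, CANONICAL).value_counts())
```

Missing values stay missing, so they are still excluded from the counts.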


### 5.4 Format of models stored in journal supplementary material

In [9]:
supp_results = get_counts(clean[jansson_method], 'model_journal_supp')                            
supp_results

| method | n |
| --- | --- |
| File | 4 |
| Word doc | 2 |
| PDF | 1 |
| R model in word file | 1 |
| r script | 1 |


### 5.5 Personal and organisational websites

In [10]:
org_results = get_counts(clean[jansson_method], 'model_personal_org')                            
org_results

| method | n |
| --- | --- |
| Personex | 2 |
| Google Drive | 1 |
| https://resp.core.ubc.ca/research/Specific_Projects/EPIC | 1 |
| personex | 1 |


### 5.6 Platform

In [11]:
platform_results = get_counts(clean[jansson_method], 'model_platform')                            
platform_results

| method | n |
| --- | --- |
| AnyLogic Cloud | 3 |
| Anylogic Cloud | 3 |
| CRAN | 2 |
| BinderHub | 1 |
| Google Colab | 1 |
| R Shiney | 1 |


### 5.7 Overall summary table