# BioPharma user demo

This notebook demonstrates how to use the BioPharma Python software to set up and run models of biopharmaceutical facilities.

To run everything, select 'Run All' from the Cell menu. To run a single cell, click within it then either press Control+Return together, or select 'Run Cells' from the Cell menu.

If you have made changes to the model equations in the biopharma package, select 'Restart & Run All' from the Kernel menu to ensure your changes are loaded.

If you wish to make a copy of this notebook for your own work, you can select 'Make a Copy...' from the File menu.

First we load the biopharma software.

In [None]:
import biopharma as bp

Then we set up the facility to model. This also requires defining the product(s) to be produced and the steps required to produce each product.

The parameters for all these aspects of the model are loaded from files in the [data](./data) folder. Each model component (Facility, Product, each process step) has a corresponding .yaml file giving the parameters for that component. Some of these also reference tabular data stored in .csv files.
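
Before loading parameters it can help to check which files are actually present. The following is a minimal sketch using only the Python standard library; `list_parameter_files` is a hypothetical helper written for this illustration and is not part of the biopharma package.

```python
from pathlib import Path


def list_parameter_files(data_dir):
    """Return the names of the .yaml and .csv files found directly in data_dir.

    Hypothetical helper for illustration only; not part of biopharma.
    """
    data_dir = Path(data_dir)
    return {
        "yaml": sorted(p.name for p in data_dir.glob("*.yaml")),
        "csv": sorted(p.name for p in data_dir.glob("*.csv")),
    }
```

Calling `list_parameter_files('data')` from the notebook folder would list the parameter files that `facility.load_parameters()` is expected to draw on.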

In [None]:
facility = bp.Facility(data_path='data')

# Define the steps needed to create our single product
from biopharma.process_steps import (
    
)
steps = [
]
product = bp.Product(facility, steps)

# Load all model parameters from file
facility.load_parameters()

We are now in a position to run the model and examine the outputs.

To modify the details of the steps (e.g. to change equations), edit the files in [biopharma/process_steps](./biopharma/process_steps) and select 'Restart & Run All' from the Kernel menu. If you only want to change step parameters or inputs, edit the files in [data](./data) and just re-run all cells.

In [None]:
facility.run()

print('Overall facility outputs:')
for output_name in sorted(facility.outputs.keys()):
    value = facility.outputs[output_name]
    print('  Output {} = {}'.format(output_name, value))
print()
print('Product outputs:')
for output_name in sorted(product.outputs.keys()):
    value = product.outputs[output_name]
    print('  Output {} = {}'.format(output_name, value))
print()
print('Process sequence outputs:')
for output_name in sorted(product.sequence.outputs.keys()):
    value = product.sequence.outputs[output_name]
    print('  Output {} = {}'.format(output_name, value))

It is also possible to write outputs to disk, either in a YAML file (which will contain all quantities output by every model component) or, with some extra coding to define the table, as a CSV file. This uses functionality from the [Pandas](http://pandas.pydata.org/) Python data analysis library (which is also used for reading the CSV parameter files).

The first cell below writes outputs to the file [data/saved_outputs.yaml](data/saved_outputs.yaml).
The second cell writes to [data/saved_step_info.csv](data/saved_step_info.csv) and also displays the table on screen.

In [None]:
# Save all outputs to disk in the 'data' folder
facility.save_outputs('saved_outputs.yaml')

In [None]:
# Define a table of information about the unit operations
import pandas as pd
import os
step_info = pd.DataFrame(
    {'Name': [step.name for step in steps],
     'Yield (%)': [step.parameters['effectiveYield'].magnitude for step in steps],
     'Mass In (g)': [step.inputs['mass'].magnitude for step in steps],
     'Mass Out (g)': [step.outputs['mass'].magnitude for step in steps],
     'Volume In (L)': [step.inputs['volume'].magnitude for step in steps],
     'Volume Out (L)': [step.outputs['volume'].magnitude for step in steps],
    },
    columns=('Name', 'Yield (%)', 'Mass In (g)', 'Mass Out (g)', 'Volume In (L)', 'Volume Out (L)'),
    index=pd.Index(range(1, len(steps)+1), name='Step'))
step_info.T.to_csv(os.path.join(facility.data_path, 'saved_step_info.csv'), index_label='Step')
step_info
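
Because the table is transposed before being written, each row of the CSV file holds one quantity and each column one step. The sketch below shows one way to read such a file back into the original layout. It uses a small made-up table and an in-memory buffer in place of the real file, so the step names and numbers are illustrative only.

```python
import io

import pandas as pd

# Made-up table with the same layout as step_info (illustrative values only)
example_info = pd.DataFrame(
    {"Name": ["Step A", "Step B"], "Yield (%)": [90.0, 80.0]},
    columns=["Name", "Yield (%)"],
    index=pd.Index(range(1, 3), name="Step"),
)

# Write the transposed table, as the cell above does, but to an in-memory buffer
buffer = io.StringIO()
example_info.T.to_csv(buffer, index_label="Step")
buffer.seek(0)

# Read it back and transpose again so that each row is one step
loaded = pd.read_csv(buffer, index_col="Step").T
loaded.index = loaded.index.astype(int)
loaded.index.name = "Step"
# Each CSV column mixed text and numbers, so numeric fields come back as text
loaded["Yield (%)"] = loaded["Yield (%)"].astype(float)
```

The explicit type conversions are the price of the transposed layout: every column in the file mixes names and numbers, so pandas cannot infer numeric types on its own.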

## Analysing outputs

Having run the model, we have the full power of Python available to analyse the results. In this section I show how to calculate some simple summaries of the process steps. For more about how to work with NumPy arrays, see the [NumPy website](http://www.numpy.org/).

The `ProcessSequence` class has two helper methods for extracting per-step data:
* `step_outputs` gives the value of a single output for all steps. This is used in plotting below.
* `step_increments` gives the change in a particular input/output for all steps. This is used here.
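
To illustrate the idea of a per-step increment, here is a minimal NumPy sketch. It assumes a quantity that accumulates along the sequence, so that each step's output is the next step's input; the numbers are made up, and this is not the library's own implementation.

```python
import numpy as np

# Made-up cumulative cost entering and leaving each of three steps
cost_in = np.array([0.0, 10.0, 25.0])
cost_out = np.array([10.0, 25.0, 30.0])

# The increment for a step is what that step alone added
increments = cost_out - cost_in

# Under the accumulation assumption, the increments sum to the final total
total = increments.sum()
```

This is why increments, rather than raw outputs, are the natural input for stacked bar charts and per-step averages.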

In [None]:
# For getting data on the steps it's convenient to be able to refer to the sequence directly.
sequence = product.sequence

In [None]:
water_per_step = sequence.step_increments('water')
print('Average water use: {}'.format(water_per_step.mean().to('L')))

max_index = water_per_step.argmax()
max_value = water_per_step[max_index].to('L')
max_step = sequence.steps[max_index].name
print('Maximum water use: {} at step {}'.format(max_value, max_step))
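
The same pattern works on any NumPy array: `mean` summarises the values, and `argmax` returns a position that can be used to look up the matching step. A plain NumPy sketch with made-up step names and water volumes (no units attached) might look like this:

```python
import numpy as np

# Made-up per-step water use in litres (illustrative values only)
step_names = ["Step A", "Step B", "Step C"]
water_litres = np.array([120.0, 450.0, 75.0])

average = water_litres.mean()

# argmax gives the position of the largest value, not the value itself
max_index = int(water_litres.argmax())
max_value = water_litres[max_index]
max_step = step_names[max_index]
```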

## Graphical output display

Below I demonstrate how to create different kinds of charts summarising the outputs:
1. Bar chart showing the breakdown of costs (COG/g) per unit operation per cost category.
2. Bar chart giving time per unit operation.
3. Line chart giving mass per unit operation.

In [None]:
# Set up plotting library
import operator
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook

In [None]:
# Cost breakdown data
y_units = 1000 * bp.units.GBP
labour_costs = sequence.step_increments('labourCost') / y_units
materials_costs = sequence.step_increments('rawMaterialsCost') / y_units
equipment_costs = sequence.step_increments('equipmentCost') / y_units

plt.figure()
ind = range(len(sequence.steps))  # The x locations for the bars
p_labour = plt.bar(ind, labour_costs)
p_materials = plt.bar(ind, materials_costs, bottom=labour_costs)
p_equipment = plt.bar(ind, equipment_costs, bottom=labour_costs + materials_costs)

plt.title('Cost breakdown for unit operations')
plt.ylabel('Costs per batch ({})'.format(y_units))
plt.xticks(ind, [step.name for step in sequence.steps], rotation=40, ha='right', fontsize=8)
plt.legend((p_labour[0], p_materials[0], p_equipment[0]),
           ('Labour', 'Materials', 'Equipment'))
plt.subplots_adjust(bottom=0.25)
plt.show()
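
The stacking in the chart above relies on the `bottom` argument of `plt.bar`: each new category starts where the previous ones ended. With more categories it may be tidier to keep a running total in a loop. The sketch below uses made-up plain numbers rather than model outputs, and selects the non-interactive `Agg` backend so that it runs outside the notebook; omit that line when working in the notebook itself.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit inside the notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up costs per step for three categories (illustrative values only)
step_names = ["Step A", "Step B", "Step C"]
categories = [
    ("Labour", np.array([1.0, 2.0, 0.5])),
    ("Materials", np.array([0.5, 1.5, 2.0])),
    ("Equipment", np.array([2.0, 0.5, 1.0])),
]

fig, ax = plt.subplots()
x = np.arange(len(step_names))
bottom = np.zeros(len(step_names))  # running height of each stacked bar
for label, values in categories:
    ax.bar(x, values, bottom=bottom, label=label)
    bottom = bottom + values

ax.set_xticks(x)
ax.set_xticklabels(step_names, rotation=40, ha="right", fontsize=8)
ax.legend()
total_heights = bottom  # final height of each stacked bar
```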

In [None]:
# Time breakdown data
# This one is more complicated because 3 different fields are used to track time taken!
cat1_time = sequence.step_increments('cat 1 time') / bp.units.day
cat2_time = (sequence.step_increments('cat 2a time') + sequence.step_increments('cat 2b time')) / bp.units.day

plt.figure()
ind = range(len(sequence.steps))  # The x locations for the bars
p_cat1_time = plt.bar(ind, cat1_time)
p_cat2_time = plt.bar(ind, cat2_time, bottom=cat1_time)

plt.title('Time breakdown for unit operations')
plt.ylabel('Time per batch (days)')
plt.xticks(ind, [step.name for step in sequence.steps], rotation=40, ha='right', fontsize=8)
plt.legend((p_cat1_time[0], p_cat2_time[0]), ('USP', 'DSP'))
plt.subplots_adjust(bottom=0.25)
plt.show()

In [None]:
# Mass breakdown data
mass_per_step = sequence.step_outputs('mass') / bp.units.g

plt.figure()
ind = range(len(sequence.steps))  # The x locations for the points
plt.plot(ind, mass_per_step)

plt.title('Mass breakdown for unit operations')
plt.ylabel('Mass remaining after step (g)')
plt.xticks(ind, [step.name for step in sequence.steps], rotation=40, ha='right', fontsize=8)
plt.subplots_adjust(bottom=0.25)
plt.show()