# **Making a Classifier**

## Step Zero: The Overall Idea
One of the issues in creating an "everything for everybody" biochemical ODE solver is that many of the models people will run on it have an optimal solver method, but that method is difficult to intuit from a few of the model's parameters. There are some rules of thumb for large, slow-moving populations and for small, fast-reacting populations, but everything beyond that is kind of a mixed bag of uncertainty.

This sounds like a traditional ML classification/logistic regression problem. 
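For the binary case (say, "stochastic solver" vs. "ODE solver"), logistic regression is small enough to write without TensorFlow. Below is a minimal pure-Python sketch trained by batch gradient descent on the log loss. The two features and the labels are hypothetical toy values, not anything measured from BioModels. Choosing among more than two solvers would need a one-vs-rest wrapper or a softmax (multinomial) version.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Sequence[float], x: Sequence[float]) -> float:
    # Linear score: bias w[0] plus the dot product of the remaining weights with x.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights (bias first) by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, xi)) - yi  # gradient of the log loss w.r.t. the score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return int(sigmoid(score(w, x)) >= 0.5)


# Hypothetical toy data: [standardized log10 population, standardized log10 rate],
# label 1 = "stochastic solver preferred", 0 = "ODE solver preferred".
X = [[-2.0, -1.0], [-1.5, -0.5], [-1.0, -1.5], [2.0, 1.0], [1.5, 0.5], [1.0, 1.5]]
y = [1, 1, 1, 0, 0, 0]
w = train_logistic(X, y)
print(predict(w, [-1.0, -1.0]), predict(w, [1.0, 1.0]))  # 1 0
```

The features are assumed to be standardized first. Without that, a learning rate that works on one collection of models can diverge on another.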

There are a few concerns when we take this tack:

+ We need a large dataset of varying models, and will probably have to clean and aggregate the data.

+ We will have to pickle the classifier into a persistent object, which means introducing even more dependencies to the project. (pickle is in the standard library; I'm talking about TensorFlow.)

+ Whenever a new solver is written, it has to be integrated into our classification script and the script rerun. This integration step is vulnerable to lost institutional memory and turnover.

+ We will have to choose a representation of the models that limits dimensionality and handles normalization. (Population versus rate kinetics would be my primary concern.)
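One way to approach the representation concern: models differ in how many species and reactions they have, so each one has to collapse into a fixed-length vector. Because populations and rate constants span many orders of magnitude, a log scale does most of the normalization. Here is a sketch under those assumptions. The two input dicts are hypothetical stand-ins for whatever the gillespy2-to-pandas conversion ends up producing, and the feature list is illustrative, not a tested choice.

```python
import math
from typing import Dict, List


def model_features(populations: Dict[str, float], rates: Dict[str, float]) -> List[float]:
    """Summarize a model of any size as a fixed-length feature vector.

    Populations and rate constants span many orders of magnitude, so both are
    put on a log10 scale. Populations are offset by 1 because they can be zero;
    non-positive rate constants are skipped.
    """
    log_pops = [math.log10(p + 1.0) for p in populations.values()]
    log_rates = [math.log10(k) for k in rates.values() if k > 0]
    slowest = min(log_rates, default=0.0)
    fastest = max(log_rates, default=0.0)
    return [
        float(len(populations)),      # number of species
        float(len(rates)),            # number of rate constants
        min(log_pops, default=0.0),   # smallest initial population (log10)
        max(log_pops, default=0.0),   # largest initial population (log10)
        slowest,                      # slowest rate constant (log10)
        fastest,                      # fastest rate constant (log10)
        fastest - slowest,            # timescale spread, a rough stiffness proxy
    ]


# Hypothetical model: one empty species, two populated ones, two rate constants.
features = model_features({"A": 0, "B": 99, "C": 9999}, {"k1": 0.001, "k2": 10.0})
print(features)
```

Rate constants for reactions of different order carry different units, so putting them on one axis is itself a simplification. That is part of the population vs. rate kinetics problem, not a solution to it.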

This comes with some benefits:

+ Opportunity to create something that people would find useful. (The Dream)

+ Practical Use of an ML Model. (Academic Gravitas)

Graphical Representation of Proposed Pipeline:

[BioModels Database] -> [gillespy2 SBML Module] -> [Conversion of gillespy2 models into pandas dataframes] -> [classifier training] -> [insertion of trained classifier into library]
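The last two stages amount to "train, then ship the result with the library." Below is a minimal sketch of the shipping half using the standard-library `pickle` module. The dictionary layout (`weights`, `labels`, `version`) and the file name are hypothetical. Storing plain data rather than a pickled class instance keeps the file loadable even if the classifier code is later renamed or moved. The `version` field gives whoever adds the next solver a visible place to record that the classifier was retrained.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Dict, List


def predict_solver(clf: Dict, features: List[float]) -> str:
    # clf["weights"]: bias first, then one weight per feature (hypothetical layout).
    w = clf["weights"]
    score = w[0] + sum(wj * xj for wj, xj in zip(w[1:], features))
    # A linear score >= 0 is equivalent to sigmoid(score) >= 0.5.
    return clf["labels"][1] if score >= 0 else clf["labels"][0]


def save_classifier(clf: Dict, path: Path) -> None:
    # Final pipeline stage: write the trained parameters where the library can find them.
    with open(path, "wb") as fh:
        pickle.dump(clf, fh)


def load_classifier(path: Path) -> Dict:
    # Only unpickle files you produced yourself: loading a pickle can run arbitrary code.
    with open(path, "rb") as fh:
        return pickle.load(fh)


trained = {"weights": [0.0, -1.0], "labels": ["ode", "ssa"], "version": 1}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "solver_classifier.pkl"  # hypothetical file name
    save_classifier(trained, path)
    restored = load_classifier(path)

print(predict_solver(restored, [2.0]), predict_solver(restored, [-2.0]))  # ode ssa
```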

## Step One: Getting the Data
I believe this was mentioned in one of the past research meetings: the [BioModels Database](https://wwwdev.ebi.ac.uk/biomodels/). It contains approximately 700 ODE-style SBML models.

gillespy2 is written to take SBML models. (Cue sudden looking around.) I'm writing this in medias res, so we'll see how this works.

I'm not writing a scraper script: it took me about 10 minutes to download all the models by hand, and it would take me much longer to fumble through a BeautifulSoup (bs4) script.
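With the files downloaded by hand into one folder, gathering them for the later steps takes only a few lines. Here is a small sketch with `pathlib`. The `*.xml` pattern follows the file names used in the cells below. It also splits the files by ID prefix: BioModels IDs beginning with `BIOMD` are curated entries, while `MODEL` IDs are non-curated. That makes curation status one obvious axis for the data cleaning mentioned in Step Zero.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def collect_models(folder: str, pattern: str = "*.xml") -> List[Path]:
    """Return the hand-downloaded model files in `folder`, sorted so runs are reproducible."""
    return sorted(Path(folder).glob(pattern))


def split_by_curation(paths: List[Path]) -> Tuple[List[Path], List[Path]]:
    """Separate curated BioModels entries (BIOMD...) from non-curated ones (MODEL...)."""
    curated = [p for p in paths if p.name.startswith("BIOMD")]
    other = [p for p in paths if not p.name.startswith("BIOMD")]
    return curated, other


# Demo on a throwaway folder; in the notebook this would be "/home/jackson/Classifier".
with tempfile.TemporaryDirectory() as tmp:
    for name in ["MODEL0848062679_url.xml", "BIOMD0000000017_url.xml", "notes.txt"]:
        (Path(tmp) / name).touch()
    files = collect_models(tmp)
    curated, non_curated = split_by_curation(files)

print([p.name for p in curated], [p.name for p in non_curated])
# ['BIOMD0000000017_url.xml'] ['MODEL0848062679_url.xml']
```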

In [1]:
import sys
import platform
import matplotlib.pyplot as plt
import numpy
import os
import timeit
sys.path.append("/home/jackson/Research/GillesPy2/")
import gillespy2

In [2]:
# platform.dist() was removed in Python 3.8; fall back to platform.system() there
distro = platform.dist()[0] if hasattr(platform, "dist") else platform.system()
print(f'Benchmarks of System:\nPlatform: {distro}\nProcessor: {platform.processor()}\nPython Build: {platform.python_version()}\nRunning off SSD\nMake And Model: Dell XPS 13 Developer Edition Linux Variant (with carbon fiber finish)')

Benchmarks of System:
Platform: debian
Processor: x86_64
Python Build: 3.6.2
Running off SSD
Make And Model: Dell XPS 13 Developer Edition Linux Variant (with carbon fiber finish)
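Recording the machine matters because the most obvious way to produce training labels is by timing: run every candidate solver on a model and call the fastest one "optimal." Below is a minimal sketch of that labeling step with `timeit`. The solver names and the two stand-in functions are hypothetical. The rule also ignores accuracy entirely, which a real labeling scheme would have to weigh as well.

```python
import timeit
from typing import Callable, Dict


def fastest_solver(runs: Dict[str, Callable[[], object]], repeats: int = 3) -> str:
    """Label a model with the solver whose best-of-`repeats` wall time is lowest."""
    best = {}
    for name, run in runs.items():
        # timeit.repeat returns one timing per repeat; the minimum is the least noisy.
        best[name] = min(timeit.repeat(run, number=1, repeat=repeats))
    return min(best, key=best.get)


# Hypothetical stand-ins for "run this model with solver X"; real code would wrap
# calls such as model.run(solver=...) in a lambda instead.
def slow_run():
    return sum(i * i for i in range(300_000))


def fast_run():
    return sum(i * i for i in range(100))


label = fastest_solver({"ssa": slow_run, "ode": fast_run})
print(label)  # ode
```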


In [3]:
test = gillespy2.StochMLDocument()
test.from_file("/home/jackson/Classifier/BIOMD0000000017_url.xml")
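# The code says to use a serialize function, which does not exist.
# Note: StochMLDocument reads StochKit's StochML format rather than SBML, and from_file()
# appears to be a classmethod that returns a new document, so `test` itself stays empty here.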
model = test.to_model("check")
from gillespy2.basic_ssa_solver import BasicSSASolver
results = model.run(solver=BasicSSASolver)
print(results)

{'time': [0, 0.04999999999999716, 0.09999999999999432, 0.14999999999999147, 0.19999999999998863, 0.2499999999999858, 0.29999999999998295, 0.3499999999999801, 0.39999999999997726, 0.4499999999999744, 0.4999999999999716, 0.5499999999999687, 0.5999999999999659, 0.649999999999963, 0.6999999999999602, 0.7499999999999574, 0.7999999999999545, 0.8499999999999517, 0.8999999999999488, 0.949999999999946, 0.9999999999999432, 1.0499999999999403, 1.0999999999999375, 1.1499999999999346, 1.1999999999999318, 1.249999999999929, 1.299999999999926, 1.3499999999999233, 1.3999999999999204, 1.4499999999999176, 1.4999999999999147, 1.549999999999912, 1.599999999999909, 1.6499999999999062, 1.6999999999999034, 1.7499999999999005, 1.7999999999998977, 1.8499999999998948, 1.899999999999892, 1.9499999999998892, 1.9999999999998863, 2.0499999999998835, 2.0999999999998806, 2.149999999999878, 2.199999999999875, 2.249999999999872, 2.2999999999998693, 2.3499999999998664, 2.3999999999998636, 2.4499999999998607, 2.499999999

I have no idea what this model is actually supposed to be, so I'm going to try another one. However, I'm really happy it got at least that far.

In [4]:
test = gillespy2.StochMLDocument()
test.from_file("/home/jackson/Classifier/MODEL0848062679_url.xml")
#Code says to use a serialize function which does not exist.
model = test.to_model("check")
from gillespy2.basic_ssa_solver import BasicSSASolver
results = model.run(solver=BasicSSASolver)
print(results)

{'time': [0, 0.04999999999999716, 0.09999999999999432, 0.14999999999999147, 0.19999999999998863, 0.2499999999999858, 0.29999999999998295, 0.3499999999999801, 0.39999999999997726, 0.4499999999999744, 0.4999999999999716, 0.5499999999999687, 0.5999999999999659, 0.649999999999963, 0.6999999999999602, 0.7499999999999574, 0.7999999999999545, 0.8499999999999517, 0.8999999999999488, 0.949999999999946, 0.9999999999999432, 1.0499999999999403, 1.0999999999999375, 1.1499999999999346, 1.1999999999999318, 1.249999999999929, 1.299999999999926, 1.3499999999999233, 1.3999999999999204, 1.4499999999999176, 1.4999999999999147, 1.549999999999912, 1.599999999999909, 1.6499999999999062, 1.6999999999999034, 1.7499999999999005, 1.7999999999998977, 1.8499999999998948, 1.899999999999892, 1.9499999999998892, 1.9999999999998863, 2.0499999999998835, 2.0999999999998806, 2.149999999999878, 2.199999999999875, 2.249999999999872, 2.2999999999998693, 2.3499999999998664, 2.3999999999998636, 2.4499999999998607, 2.499999999

In [4]:
# import_SBML returns a (model, errors) tuple, hence model2[0] below
model2 = gillespy2.import_SBML("/home/jackson/Classifier/MODEL0848062679_url.xml")

The SBML module looks unloved, so I'm reformatting and reintegrating it.

Notes:
+ Mostly needed to append 2 to every gillespy reference (`gillespy` → `gillespy2`).
+ The way errors are appended is hideous.
+ Ask about removing or rewriting the testing script at the bottom. (It smells like a Python 2 to 3 conversion, based on the library imports.)

Lots of errors about SBML-based rules. Real talk: my knowledge of SBML is cursory at best, so I might need to bone up on it.

http://sbml.org/Software/libSBML/Downloading_libSBML
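Before learning libSBML properly, a quick way to see which kinds of SBML rules are behind those errors is to count them across the downloaded files. Here is a rough sketch using only the standard library's `xml.etree.ElementTree`, not libSBML. The toy document is trimmed and is not valid SBML. Running this over every downloaded file would show which rule types the importer most needs to handle.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Rule element names used in SBML Levels 2 and 3.
RULE_TAGS = {"assignmentRule", "rateRule", "algebraicRule"}


def local_name(tag: str) -> str:
    # ElementTree writes namespaced tags as "{namespace-uri}name"; keep only the name.
    return tag.rsplit("}", 1)[-1]


def count_rules(root: ET.Element) -> Counter:
    """Tally the rule elements in a parsed SBML document by type."""
    return Counter(local_name(el.tag) for el in root.iter()
                   if local_name(el.tag) in RULE_TAGS)


# Trimmed toy document (real rules also carry a <math> body). For a downloaded file,
# use count_rules(ET.parse(path).getroot()) instead.
SAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfRules>
      <assignmentRule variable="x"/>
      <assignmentRule variable="y"/>
      <rateRule variable="z"/>
    </listOfRules>
  </model>
</sbml>"""

counts = count_rules(ET.fromstring(SAMPLE))
print(counts)  # Counter({'assignmentRule': 2, 'rateRule': 1})
```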

In [11]:
results = model2[0].run(solver=BasicSSASolver)
print(model2[0].listOfSpecies)

OrderedDict()


It's got a name, though! Looking at the resulting dictionaries, the only thing that gets through is time. Given the errors, I wonder how difficult it would actually be to write the SBML parsing system.
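For a sense of scale, here is a minimal sketch of the first thing such a parser has to do: find the species and reactions that the import above dropped. It uses the standard library's `xml.etree.ElementTree` and matches tags by local name, so the SBML level/version namespace does not matter. It ignores compartments, units, rules, initial assignments, and kinetic laws, which is where most of the real work (and most of those errors) would be. The toy document is trimmed and hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def local_name(tag: str) -> str:
    # Strip the "{namespace-uri}" prefix ElementTree puts on SBML tags.
    return tag.rsplit("}", 1)[-1]


def species_and_reactions(root: ET.Element) -> Tuple[Dict[str, float], List[str]]:
    """Pull species initial values and reaction ids out of a parsed SBML document."""
    species: Dict[str, float] = {}
    reactions: List[str] = []
    for el in root.iter():
        name = local_name(el.tag)
        if name == "species":
            # SBML may give an amount or a concentration; they are NOT the same unit
            # (concentration * compartment size = amount), which this sketch ignores.
            value = el.get("initialAmount", el.get("initialConcentration", "0"))
            species[el.get("id")] = float(value)
        elif name == "reaction":
            reactions.append(el.get("id"))
    return species, reactions


SAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfSpecies>
      <species id="A" compartment="c" initialAmount="100"/>
      <species id="B" compartment="c" initialConcentration="0.5"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1" reversible="false">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

species, reactions = species_and_reactions(ET.fromstring(SAMPLE))
print(species, reactions)  # {'A': 100.0, 'B': 0.5} ['r1']
```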