# Putting Machine Learning Systems in Production

Folks often talk about the technical requirements of building machine learning systems, the machine learning operations, but they rarely talk about the design requirements, the nature of scientific inquiry, or the trade-offs in machine learning systems. In this chapter we will look deeper at how to build a machine learning system through the lens of a scientific procedure.

## It Starts With a Feeling

### The Difference Between Engineering and Machine Learning Systems

Most machine learning systems in a business context do not start from a place of scientific inquiry. They begin from a place of business requirement. That means these systems are typically designed more like engineering systems, and often the engineering design process is applied to them. This has the unfortunate consequence of leading to many machine learning proofs of concept and very few machine learning systems, outside of specific domains. I will provide more evidence on why this is the case further down.

One of the key differences between engineering systems and machine learning systems is the design process. Most engineering systems that we see today are based on a well-defined set of steps and a well-defined medium. This means you can more or less go from idea to implementation on a set schedule, with clear guidance around expectations from inception of the system to delivery. Here I am of course mostly talking about websites and mobile applications. As an aside, and to be clear, I am not suggesting that every website or mobile app ever developed will succeed. Rigorous user testing is required in order to ensure that the website or mobile app is popular, useful, and bug-free. These are hard problems. That said, the implementation of these applications is both well understood and typical. What I'm saying is, building these systems is more or less straightforward.

When you are developing a machine learning system, yes, frameworks also exist and there are straightforward ways to process and analyze data. That said, all of these systems and frameworks are subject to the shape and structure of the data itself, the patterns encased in it, the correctness of the data, the emergence of the underlying phenomenon, the continuation of the underlying phenomenon, the occurrence of rare events, and the stability of the underlying frameworks, amongst other things. My point is this: there is a lot of stuff you have to consider when building a machine learning system. Because machine learning systems are exceptionally flexible and have the ability to solve an exponentially larger class of problems than even traditional statistical models or engineering individually, there is a very real chance of misuse, misunderstanding, or even explicit lying through obfuscation and complexity.

Even in the academic community, which makes use of third-party double-blind peer review, some scientifically verified phenomena are later shown to be untrue or inaccurate. And in a business setting, you can't ask for third-party double-blind review. It's on your team, more or less, to verify the correctness of your assumptions, the correctness of your data, the correctness of your implementation, and the underlying truth you are trying to capture through a machine learning system. So, you are starting at a strong disadvantage relative to the academic community.

All this said, if care is taken, it is possible to build robust and powerful machine learning systems that can solve an exceptionally large number of problems. Part of this comes from the fact that neural networks, support vector machines, and tree-based models do not assume a fixed functional form for the data, which is a huge deal. They can approximate more functions than traditional statistical models like linear regression or logistic regression. This means we can represent the mathematical equations for highly complex systems through high-level computational interfaces, which are well documented, open source, and more or less easy to use. This ease of use is dangerous precisely because these tools are powerful, more powerful than any set of computational tools we've had before.

### How The Differences Between Machine Learning and Engineering Manifest

Because machine learning systems are different from traditional engineering systems, they have different requirements. For instance, you won't always know which machine learning model is going to be effective in predicting your target. For some well-understood classes of problems you certainly can make a pretty good guess. Basically, the kind of problem you are solving narrows down which methods to try, the same way the kind of system you are building narrows down the type of engineering you are going to implement. Let's look at a few examples:

Engineering Systems and Typical tools used:

System: A webpage on a major website

Tools used: CI/CD, browser testing, mobile screen size testing, front end in JavaScript (probably React), Golang or Python (maybe Flask or something) in the backend.

System: An operating system

Tools used: C programming language

System: A CLI

Tools used: C programming language

System: A mobile application

Tools used: React Native, Objective-C, or Java

Machine Learning Problems and Typical models used:

Problem: Object detection

Models: A residual network or a Vision Transformer

Problem: Market Mix Modeling

Models: A convolutional neural network with time varying weights or a linear model with lots of preprocessing

Problem: Time Series Forecasting

Models: A recurrent neural network, XGBoost, or a penalized linear model

Problem: Named Entity Recognition

Models: A transformer of some kind

One thing to note about each of the machine learning problems above: these models are all _guesses_ based on things that have worked for me personally in the past when I've built these kinds of systems. I don't _know_ that any of them will work for future systems. In fact, there is a good chance these specific model architectures will fail!

This is the inherent difference between machine learning and software engineering: in software engineering, the tools more or less work every time you want to build a thing. Maybe the tools themselves change, like in web development, where most folks use React these days instead of jQuery (not that there is anything wrong with jQuery!), but the problems you are solving with React and jQuery are more or less the same. The way you solve the problems to build the system has changed, but that's it.

As the types of machine learning models change and as the tools get better, the classes of problems you can solve, and therefore the types of overall systems you can build, change. In some cases, you may be solving your specific problem for the first time ever! That's very exciting, but there is a lot of risk in doing that. Accepting and internalizing these risks requires more careful analysis and accepting the possibility of killing a system or feature.

A natural consequence of these risks is that it may not be possible to iteratively improve a system. It might not matter how many different things you try, the system may work one day and stop working the next. Or it may never work at all. That's because a machine learning system that goes to production must be able to generalize, and its pattern must hold within the context of the system. If it doesn't, then no amount of iteration is going to make it work. No matter how hard you try, some things will fail.

And if you can't accept that, you won't be successful as a person doing machine learning professionally. It's important to communicate to your stakeholders and management that failure is a possibility, and to set realistic expectations about the likelihood of success. If you don't, you may work on a system for months or years and, when you go to production, it may fall over. And it may never work again.

### What To Do When Your System Isn't Going To Work

This is a bit of personal conjecture and storytelling. It's anecdotal, but I believe this pattern generalizes. Oftentimes, when a business person comes to you saying they want to build a new machine learning system or have an idea about some new feature or product, they are doing that for self-interested reasons. They may care about the result, but they probably don't care about the specific system you build or how you do it. Specifically, most business people only care about "innovation", not how you innovate or what you solve. They want to use machine learning or AI because it increases the chances of them getting a promotion, most of the time. Some folks in management really do understand statistics and machine learning. But they are few and far between.

So, what do you do when the system you've been asked to build won't work?  

1. Present alternatives

Just because you can't build the thing they asked for doesn't mean you can't build something. If someone took the time to put together the data, it's likely that there is some pattern you can exploit to improve some process, make money, or lower some costs. If you have to deliver bad news on a specific idea, make sure, if at all possible, that you have an alternative idea you can present.

2. Pair with your manager

If your manager has a statistics background, then you are very fortunate! It means you can actually pair with them on the exploratory phase of building the machine learning system and searching for patterns. This means you can verify an idea is viable before you reach a failure mode, and if you fail, you can pivot quickly. In some cases, if your manager has an engineering background, you can bring them along on the journey as well, although this is typically a harder sell, as managers with an engineering background are often uncomfortable with the messiness of exploratory data work.

3. Be rigorous in showing why the idea won't work

If you have to show that your manager's idea won't work, make sure you can back it up. Try a bunch of stuff, perhaps more than you need to, to convince yourself. If your manager sees that you truly put in the effort and things really won't work, and they are good at their job, they will drop it. That isn't a guarantee though. If after everything they still don't accept that the idea won't work, it may be a good idea to look elsewhere for employment. Because even if this idea can be proven through some extensive gymnastics, they will likely present some idea in the future which will reflect poorly on you. More or less, you are eventually going to be set up to fail. And that is bad for your career.

At the end of the day, you have agency.  Remember that.

## Problem Definition

Once you are sure that you actually need to build a machine learning system, be very clear about the problem you are solving and make sure your problem is falsifiable. Because your solution is approximate, if you can't falsify whether the pattern holds, then you will be building a bridge to nowhere. You may not always be able to improve the system through iteration, because machine learning systems are built on patterns. If the pattern doesn't hold, you have nothing valuable to gain by doing the engineering.

### Falsifiability

Falsifiability means that a statement can be shown to be false: it is either true or false, and there is some observation that would tell us which. In other words, the statement or subject of inquiry is not subjective. Falsifiability is central to many questions and problems. It is even at the heart of computational systems through binary. The ability to say whether something is true or untrue is extremely powerful. Many statements and problems are subjective, so making sure you can state your business problem in falsifiable terms is imperative, because otherwise you can't ever be sure if you are truly building something of value.

That said, machine learning systems do not provide exact answers, they provide approximate ones, which further reinforces the need for falsifiability. If your system is not built on ironclad assumptions, it becomes brittle and fails easily. Stringent acceptance requirements for our machine learning model allow us to have confidence about going to production and give us a quantifiable measure of whether or not our system will deliver value. Moreover, they can even give us approximations of how much value we can expect to gain, which can further inform whether a machine learning system is worth building.

For instance, suppose that a given machine learning system will save us 10,000 dollars per year, but training the system costs 100,000 dollars per year. Then it isn't worth building the system, _even though_ the system has value.
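The arithmetic in this example is simple enough to write down as a check. The following is a minimal sketch; the function names `net_annual_value` and `worth_building` are made up for illustration, and a real analysis would likely include more costs than training alone.

```python
def net_annual_value(annual_savings: float, annual_cost: float) -> float:
    """Yearly value of a system after subtracting what it costs to run."""
    return annual_savings - annual_cost


def worth_building(annual_savings: float, annual_cost: float) -> bool:
    """A system is only worth building if it saves more than it costs."""
    return net_annual_value(annual_savings, annual_cost) > 0


# Figures from the example above: 10,000 dollars saved per year,
# 100,000 dollars per year spent on training.
savings = 10_000
training_cost = 100_000

print(net_annual_value(savings, training_cost))  # -90000
print(worth_building(savings, training_cost))    # False
```

The system does produce value (the savings are positive), but the net value is negative, which is exactly the situation described above.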

#### Examples of Falsifiable Business Problem Definitions

1. Using a machine learning system, we can improve the probability that the relevant search result will appear within the first three pages of our search engine by 60%

2. Using a machine learning system, we can reduce the claims processing time for our insurance company by 80%

3. Using a machine learning system, we can predict system failures 90% faster by monitoring our data infrastructure and then resolve failures 20% faster, leading to 10% less total system downtime
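A statement like the second one can be turned directly into a check that either passes or fails. Here is a minimal sketch, assuming we have measured processing times before and after introducing the model. The numbers below are hypothetical, and `relative_reduction` and `claim_holds` are illustrative names, not part of any library.

```python
from typing import Sequence


def relative_reduction(before: Sequence[float], after: Sequence[float]) -> float:
    """Fraction by which the average of `after` is lower than the average of `before`."""
    baseline = sum(before) / len(before)
    improved = sum(after) / len(after)
    return (baseline - improved) / baseline


def claim_holds(before: Sequence[float], after: Sequence[float],
                required_reduction: float = 0.80) -> bool:
    """True if the measured reduction meets or exceeds the stated target."""
    return relative_reduction(before, after) >= required_reduction


# Hypothetical claims processing times, in hours.
before_hours = [40.0, 50.0, 60.0]        # average of 50 hours
with_model_hours = [5.0, 10.0, 12.0]     # average of 9 hours
slower_model_hours = [20.0, 25.0, 30.0]  # average of 25 hours

print(claim_holds(before_hours, with_model_hours))    # reduction of 82%, claim holds
print(claim_holds(before_hours, slower_model_hours))  # reduction of 50%, claim is falsified
```

The point is not the code itself but that the claim names a number, so the data can contradict it.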

#### Counter Examples of Falsifiable Business Problem Definitions

1. By adding machine learning to our search engine, we will have a better overall experience for users

2. Neural networks are really cool, let's use them to predict how likely someone is to click on an advertisement and then adjust who we show ads to, based on the results of the neural net

3. Let's add an AI chatbot to our system because other companies are doing it and we don't want to get left behind

#### Analysis Of These Examples

Let's start with the examples of falsifiable business problem definitions. Each of them:

1. Sets a measurement for how much of a difference adding machine learning will make
2. Is either true or false, but not both
3. Lays out a clear plan for how we are going to test if our model will make a difference

Now let's look at the counter examples:

1. Each is more like a directive, indicating we are going to do this thing, regardless of whether it will actually help.
2. Each statement operates essentially on hype. The fundamental ethos here, which many engineers operate under, is: this thing is cool, so we better use it! This is the worst reason to use machine learning and is the most likely to lead to trouble down the line.
3. Each statement is subjective and doesn't lay out anything about how we are going to measure results and what success looks like.  If we don't know what success looks like, we can never actually succeed.  Which means success is subjective.  

Now let's compare and contrast:

Basically, the biggest difference between the first set of statements and the second is that the first set is well defined. Those statements are clearly stated. And even though the first three statements are stated in 'positive' terms, they can still be falsified. If these statements end up being false, then we clearly know, as the implementation team, that we won't succeed at getting to the results the stakeholders want. And then we can communicate this clearly to the stakeholders. This leads to more open communication and clear expectations.

It also allows for readjustment. If the stakeholders see evidence that the system won't meet their expectations, they can choose to kill the project or change the business requirements to meet new measures! So to be clear, stating things falsifiably doesn't necessarily imply failure, but it does imply the possibility of failure!

Being explicit and setting expectations is imperative for clear lines of communication, but more so, to make sure you have job security. If you can communicate failure modes clearly, then you won't be at fault for failures outside of your control. At least not from a good management team.

As an aside, there is a danger here: some managers promote based on survivorship bias. That is, they may hand out crazy ideas to a bunch of different teams, and then one or two of those crazy ideas may happen to work out. As a result, that management team may come to see that particular team as successful, because they happened to succeed. But succeeding once by chance doesn't mean the team will always succeed. If bad teams succeed because of dumb luck, rather than skill, then bad practice can be set. This bad practice can lead to systemic failures organizationally. And that leads to a lot of wasted or failed efforts, chasing crazy ideas and constantly stressing out teams. It also means the best people within an organization will quit, because their chances to succeed or fail reasonably disappear and instead random chance governs their fate. But I digress.

The second largest difference is that in the counter examples the features are getting added to the system regardless of actual business value. If we add machine learning just for the sake of adding it, then even if the system is technically integrated, this can lead to problems. Specifically: adding unnecessary costs to the system, creating bloat and unnecessary features, degrading system performance in terms of speed or accuracy, and many, many other problems. To be explicit, adding machine learning doesn't always make a system better, just as no design choice automatically makes a system better. Therefore careful measurement and falsifiability are key to the success of any system with machine learning either baked in or being added later on.

### Tools For Falsifiability

As we saw in [chapter 1](https://github.com/EricSchles/datascience_book/tree/master/1), hypothesis testing is a fantastic exploratory tool for falsification. Additionally, sometimes building simple models on your dataset can give you a sense of whether or not a given pattern exists, at least at a point in time. It's worth noting that a simple model doesn't necessarily mean one that's mathematically simple, but it almost always means one that's simple to code. If an idea is going to fail, you want it to fail as soon as possible. Otherwise people can get attached. Promises get made. People get excited. You want to fail fast.
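As one illustration of using a hypothesis test to fail fast, here is a minimal sketch of a two-sided permutation test for a difference in means, written with only the standard library. This is one possible test, not necessarily the one used in chapter 1. The data is made up, and `permutation_test` is an illustrative helper rather than a library function.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=5_000, seed=0):
    """Approximate p-value for the null hypothesis that both groups share a mean.

    Returns the fraction of random relabelings whose absolute difference in
    means is at least as large as the observed difference.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled_diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical measurements: processing time (hours) without and with a model.
without_model = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
with_model = [10.2, 10.5, 9.9, 10.1, 10.4, 10.0, 10.3, 9.8]

p_value = permutation_test(without_model, with_model)
print(f"approximate p-value: {p_value:.4f}")
```

A small p-value says the difference is unlikely to come from random labeling alone. A large one is an early, cheap signal that the pattern you were hoping for may not be there.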

For prototyping models I highly recommend [scikit-learn](https://scikit-learn.org/stable/) or [keras](https://keras.io/).  You can build machine learning models incredibly fast with both of these tools and quickly invalidate hypotheses.  
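As a rough sketch of what a quick prototype can look like with scikit-learn, the snippet below compares a simple model against a baseline that always predicts the most frequent class. The synthetic dataset stands in for your own data, and the choice of logistic regression is only an assumption for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real dataset.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=0
)

# A baseline that ignores the features entirely.
baseline = DummyClassifier(strategy="most_frequent")
# A simple model that is quick to code and quick to train.
simple_model = LogisticRegression(max_iter=1000)

# Mean accuracy across 5 cross-validation folds for each.
baseline_score = cross_val_score(baseline, X, y, cv=5).mean()
model_score = cross_val_score(simple_model, X, y, cv=5).mean()

print(f"baseline accuracy:     {baseline_score:.3f}")
print(f"simple model accuracy: {model_score:.3f}")
```

If the simple model cannot beat the baseline, that is evidence against the hypothesis that a usable pattern exists in the data, and you found out in a few lines of code.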

If you do get a positive result, I personally recommend making use of the tool I wrote, [randomizer-ml](https://github.com/EricSchles/randomizer_ml). Randomizer-ml randomizes the train-test split of your data, so you can see what the best and worst case scenarios are for testing your models. By doing so, you are able to make sure your model didn't just get a lucky train-test split.
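To make the idea concrete, here is a minimal sketch of repeated random train-test splits written directly against scikit-learn. It does not use the randomizer-ml API, and it is not a description of how that tool works internally. The dataset is synthetic and `score_over_random_splits` is a hypothetical helper.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def score_over_random_splits(X, y, n_splits=30, test_size=0.25):
    """Fit a fresh model on many random splits; return (seed, test accuracy) pairs."""
    results = []
    for seed in range(n_splits):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train, y_train)
        results.append((seed, model.score(X_test, y_test)))
    return results


# Synthetic stand-in for a real dataset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

results = score_over_random_splits(X, y)
worst_seed, worst_score = min(results, key=lambda pair: pair[1])
best_seed, best_score = max(results, key=lambda pair: pair[1])

print(f"worst split (seed {worst_seed}): {worst_score:.3f}")
print(f"best split  (seed {best_seed}): {best_score:.3f}")
```

Keeping the seed next to each score means the best and worst splits can be recreated later and inspected. A wide gap between the two is a warning that a single reported score may owe more to the split than to the model.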

Of course, this isn't enough! The next stage is to make sure your model 'makes sense'. At this point, if the model performs well, it's best to show your preliminary results to stakeholders and domain experts. Otherwise, it's best to come with a few alternatives for when you give the bad news. Of course, it's always possible a mistake was made during development and the initial assessment, so be sure to carefully go over results with your stakeholders. If an incorrect assumption was made, that could change the nature of your results.

#### Worked Example of EDA



# Putting Machine Learning Systems in Production

Folks often talk about the technical stages of building a machine learning system, but they rarely talk about the scientific steps towards building a production machine learning system.


* the stages:
    
    
## it starts with a feeling:

Getting product requirements from end users and making sure that the system is actually necessary.

A lot of ML systems don't need to be built, so make sure yours is actually necessary. Specifically, machine learning is good for a much wider range of problems than traditionally programmed systems. But there is a trade-off. You lose specificity by gaining a larger domain of problems to solve. Do the user research first. Make sure you really need the thing before you go off and build it. Machine learning systems are fuzzy. They don't operate the way other code does. There isn't a so-called closed-form solution to machine learning problems. You don't get an exact solution. You get an approximation.

## Problem Definition


### Falsifiability

Talk about randomizer ML

TODO: add a way to check for secondary metrics which make it easier to test for the realization of certain emergent properties of a machine learning model.  This can be as naive as shap values, but should probably be other measures as well.  

TODO: add a way to tell which train test splits produce different results.  Give a way to interrogate the data further for the best and worst performing train test splits

TODO: add some of the testing that's talked about in https://www.youtube.com/watch?v=BDEBF62iZx0&ab_channel=PyData


## Acceptance Criteria

Just because you found a pattern, that doesn't mean the pattern will hold. It's important to set rigorous acceptance criteria, not only with your stakeholders but also for yourself. Acceptance criteria are, to an extent, a generalization of falsifiability. They allow you to communicate various edge cases and failure modes to stakeholders and to yourself. Just because a pattern exists at a point in time, it doesn't mean the pattern will persist. This may mean the model architecture needs to change, or it may mean the underlying infrastructure needs to change because training the model to an acceptable level of accuracy takes longer. It may even mean the project needs to die. Making sure you are consistently meeting acceptance criteria is imperative for the success of any machine learning system.
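One way to keep acceptance criteria honest over time is to write them down as explicit thresholds and check every new evaluation period against them. The following is a minimal sketch. The `AcceptanceCriterion` class, the `failing_periods` helper, the 0.85 threshold, and the monthly accuracy figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AcceptanceCriterion:
    """A named metric and the minimum value agreed on with stakeholders."""
    name: str
    minimum: float


def failing_periods(metric_by_period: Dict[str, float],
                    criterion: AcceptanceCriterion) -> List[str]:
    """Return the periods, in order, where the metric fell below the minimum."""
    return [
        period
        for period, value in metric_by_period.items()
        if value < criterion.minimum
    ]


# Hypothetical monthly accuracy for a deployed model.
monthly_accuracy = {
    "2024-01": 0.91,
    "2024-02": 0.90,
    "2024-03": 0.84,
    "2024-04": 0.79,
}

criterion = AcceptanceCriterion(name="accuracy", minimum=0.85)
failures = failing_periods(monthly_accuracy, criterion)
print(f"periods failing the {criterion.name} criterion: {failures}")
```

In this made-up data the model meets the criterion for two months and then stops meeting it, which is the "pattern exists at a point in time but doesn't persist" case described above.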

Cascade error!  Talk about and define this!