# Tools of the Trade

In the preceding section, we covered all the foundational elements of NLP and how to develop NLP models. Starting with this chapter, we'll cover what you should begin to think about as you come out of the wonderful world of training magnificent models on carefully curated datasets and into the mess that is the real world.

In this chapter specifically, we will discuss mainstream machine learning software and the choices you will face as you decide what to include in your stack. Then, in Chapter 10, we build custom web apps for machine learning and data science using an easy-to-use open source Python library called Streamlit (https://streamlit.io), and we will conclude this section (in Chapter 11) with model deployment at scale using software from the industry leader, Databricks (https://databricks.com). By the end of these three chapters, you will have a good understanding of how to productionize machine learning models as web apps, APIs, and machine learning pipelines.

Let's start with a topic many developers love spending inordinate amounts of time arguing over: tools.

People who should probably be spending their time coding love hashing out the standard TensorFlow-versus-PyTorch or best-programming-language debates in endlessly long Twitter threads, but we want to take a step back and talk about some of the more practical decisions you'll have to wrestle with in the real world. After all, "applied" is in the title of this book.

Here are a few obligatory disclaimers:

- It is almost certain that what we recommend today will become outdated over time. Instead of being overly prescriptive with our advice, we want to help you develop intuition for what matters when you make decisions about what to include in your tech stack.

- We recognize that you probably have your own set of restrictions; for example, your company may already have a set of tools you are obligated to use. Or, you might be part of a large team where the choice of programming language, cloud provider, etc., has already been made for you. But hopefully, this chapter will still provide you with a sense of what else is out there.

- Making choices for your tech stack can be overwhelming. There are so many providers offering similar competing services, and their prices and features change so frequently, that picking the absolute best is a nearly impossible exercise and can lead to decision fatigue. We will do our best to keep the choices you have to make to a minimum. In fact, we will go a step further. We, the authors of this book, will pick our own favorite tools to work with when building NLP applications!

- The list here is neither comprehensive nor definitive (nor is it in any particular order or ranking). The tools we list here are simply the ones that we, the authors, have found useful, popular, or interesting. The decision on what to use is, as always, up to you.

- What may work for one person in the field may not work for you. Take our suggestions with a grain of salt, and think critically about what makes the most sense for you.

- In the end, what's most important is not what tools you use but how you use them. In fact, you'll find that a lot of the deep learning frameworks, programming languages, etc., are often very similar, and it's not too hard to learn one once you've learned another.

We've split the tools into a few categories and have listed a few under each. At the end of each section, you'll find two specific recommendations, labeled "Ankur's Pick" and "Ajay's Pick" in the classic style of The Motley Fool (https://www.fool.com). These are our individual personal favorites:

Ankur's picks  
These will tend to be more production-oriented, with a focus on tools that are stable and popular in industry, and that scale well.

Ajay's picks  
These will be more experimental and research-oriented. These tools are designed for rapid experimentation and prototyping and will help you stay on the bleeding edge of modern research.

By the end of this chapter, you should be well acquainted with the landscape of tools available to you as you build NLP applications, both for prototyping and for deploying in production.

## Deep Learning Frameworks

Let's start with the deep learning frameworks. These frameworks are the core building blocks for nearly all of NLP (that's relevant to us), and we will use them extensively throughout this book. Most deep learning frameworks do the same exact thing: they perform tensor computations on GPUs.

What differentiates them is the way they implement the various high-level features and abstractions as well as how they manage the less obvious backend implementation that governs the actual performance of your code.

Over the last decade, multiple frameworks have phased in and out of existence. Older and increasingly less popular ones that you might have heard of in passing are Theano, Chainer, Lua Torch, and Caffe. As of 2020, we think these smaller frameworks are, for the most part, obsolete and not worth exploring in great detail.

The big ones that you're familiar with and perhaps have used already are PyTorch and TensorFlow. These two frameworks were launched by two of the most successful technology companies today: Facebook and Google, respectively. Partly because of that backing, both have attracted large developer communities that adopt and support them. Both frameworks have several things in common: they are both open source and interface with Python as the primary programming language. But there are a few differences between the two, which we will highlight in detail.

> PyTorch is based on the Torch framework, and TensorFlow was heavily influenced by the Theano framework. Even though Torch and Theano have declined in popularity, their successors are now the most dominant in the deep learning space.

However, there are also a few new kids on the block, which you're probably less familiar with. Jax, Julia, and Swift for TensorFlow all promise killer new features and far better performance, and they are fairly drastic departures from what we've seen so far. They are still not as fleshed out as PyTorch and TensorFlow in terms of stability, community, and hardware support, but they show a lot of potential and have good development momentum, so be prepared to dip your toes in those as well.

### PyTorch

Let's start with PyTorch, the fastest growing deep learning framework over the past several years. It was developed by Facebook's AI Research lab (FAIR) and released publicly in October 2016. The consensus is that PyTorch is now more popular among researchers.

At the core of PyTorch lies the torch.Tensor object. It's a type of multidimensional array, almost identical to a numpy.ndarray, that can live in GPU memory and be used for fast parallel computation. Almost all of PyTorch is built for manipulating these tensors with operations such as matrix multiplication, convolution, etc.

The other big component of PyTorch is autograd. This feature automatically calculates a quantity called the gradient whenever you use PyTorch tensor operations, which is extremely useful for training neural networks.
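To make the idea of automatic gradient calculation a little more concrete, here is a minimal sketch of reverse-mode automatic differentiation on plain Python scalars. It is not PyTorch code: the `Value` class and its attributes are hypothetical names we made up for illustration, and PyTorch's autograd operates on tensors and is far more general. The sketch only shows the core idea, which is to record how each result was computed and then apply the chain rule backward.

```python
# Toy reverse-mode automatic differentiation on scalars.
# Illustration only: `Value` is a hypothetical helper, not a PyTorch class.

class Value:
    """A scalar that remembers how it was computed."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Order nodes so each one is processed after everything that uses it.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad  # chain rule


# loss = (w * x + b) ** 2, written with the two operations defined above
w, x, b = Value(2.0), Value(3.0), Value(1.0)
pred = w * x + b
loss = pred * pred
loss.backward()  # fills in w.grad, x.grad, and b.grad
```

After `loss.backward()`, each input holds the derivative of the loss with respect to itself, which is the quantity an optimizer would use to update a weight during training.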

Beyond this, the easiest way to describe PyTorch would be to call it "NumPy on the GPU" with added convenience functions for deep learning. Typically, deep learning involves repeatedly performing similar computations on large tensors, which is where GPUs excel. NumPy performs computation on the CPU, which, in most cases, is much slower than running lower-precision computations in parallel on the GPU.

For most Python programmers, PyTorch will feel natural and "Pythonic" since its interface is very similar to NumPy. This is one of the main reasons that PyTorch has continued rising in popularity over the last few years, despite the fact that it was released after TensorFlow.

Both PyTorch and TensorFlow offer distributed computation features, but PyTorch has better optimization for training because it has native support for asynchronous execution.

The job of deep learning frameworks can be described as executing a "graph" of computations on tensor data structures. In PyTorch you define the graph at runtime, which allows you to easily go back and forth between planning and execution. The ability to evaluate operations immediately, without compiling graphs explicitly, is known as eager execution.

Eager execution allows you to prototype faster and create new types of architectures, but at the cost of speed. Think of this as the difference between compiled and interpreted languages.

This used to be a big deal a few years ago since TensorFlow used static graphs back then, requiring you to define the entire graph first before pushing data through. However, both frameworks now support eager execution by default, and this has since been adopted as the go-to industry standard.
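The difference between the two styles is easier to see side by side. The following is a toy sketch in plain Python, not the PyTorch or TensorFlow API; `Placeholder`, `Op`, and `run` are hypothetical names chosen for illustration. In the static style, you first describe the whole computation and only later feed data through it. In the eager style, every line produces a concrete value right away.

```python
# Toy contrast between "define-then-run" (static graph) and eager execution.
# All names here are hypothetical; this is not a real framework API.

class Placeholder:
    """A named input whose value is supplied only when the graph runs."""

    def __init__(self, name):
        self.name = name

    def run(self, feed):
        return feed[self.name]


class Op:
    """A deferred operation: records what to compute, computes nothing yet."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def run(self, feed):
        return self.fn(*(node.run(feed) for node in self.inputs))


# Static style: first describe the whole computation y = a * b + a ...
a, b = Placeholder("a"), Placeholder("b")
graph = Op(lambda u, v: u + v, Op(lambda u, v: u * v, a, b), a)
# ... then push data through it, possibly many times.
static_result = graph.run({"a": 2.0, "b": 5.0})

# Eager style: each line executes immediately, so intermediate values can be
# inspected (or branched on) with ordinary Python.
a_val, b_val = 2.0, 5.0
product = a_val * b_val      # already a concrete number
eager_result = product + a_val
```

Both paths compute the same result. The static version can be analyzed and optimized as a whole before any data arrives, while the eager version is easier to debug because nothing is hidden behind a separate execution step.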

Following are itemized lists of things to consider before you start using PyTorch. First the pros:  

- Easier to learn and more intuitive; Python-like coding  
- Dynamic graph  
- Excellent for fast experimentation and prototyping  
- Requires less reading through documentation  
- Better integration with other Python packages  
- Rapidly gaining popularity among researchers  

Here are some of the cons of using PyTorch:  

- Relies on third party for visualization (e.g., Visdom)  
- Has a less-robust native system for edge device deployments (requires API server)

Now, let's compare PyTorch with TensorFlow, the framework that remains the most popular in the industry today despite PyTorch's rapid ascent.

### TensorFlow

Developed by the Google Brain team for internal Google use, TensorFlow was first released publicly in late 2015. It has a larger user base in industry, though this is likely due to the fact that it was released earlier and many companies have existing TensorFlow experience and legacy code. For the same reasons, TensorFlow has a larger community base overall.

In general, we would not recommend the 1.x version of TensorFlow, since it has a very bloated API and is generally more verbose and less user friendly than PyTorch (and actually slower in some cases due to problems with the backend).

However, with TensorFlow 2.0, the differences between TensorFlow and PyTorch have narrowed. TensorFlow now offers the ability to build dynamic graphs, instead of static graphs. TensorFlow 2.0 also fully integrates Keras, a very popular high-level API for TensorFlow. While TensorFlow 2.0 has resolved a lot of its issues with a complete redesign of the framework, it has also faced criticism for the drastic changes it introduced, which break nearly all 1.x code.

Compared to PyTorch, TensorFlow has excellent built-in visualization capabilities (e.g., TensorBoard) and has better support for mobile platforms with TensorFlow Lite (though this is changing with PyTorch Mobile). TensorFlow can also be easier to deploy in a production setting thanks to tools like TensorFlow Serving, which serves models over REST APIs.

TensorFlow, in general, consists of a lot more than the Python framework, though. There are now more variants than we can count, including TensorFlow Lite, TensorFlow Extended, TensorFlow Serving, TensorFlow.js, TensorFlow.jl, TensorFlow Probability, and many more. This could be a helpful ecosystem or a confusing nuisance to deal with, depending on your perspective.

We recommend TensorFlow to developers that are ready to build production-ready applications and who may have existing code/infrastructure built on the TensorFlow ecosystem.

Following are itemized lists of things to consider before you start using TensorFlow. First, the pros:  

- With Keras in TensorFlow 2.0, has a simple built-in high-level API  
- Now supports eager mode  
- Excellent visualization (TensorBoard)  
- Production-ready (TensorFlow Serving)  
- Great mobile support  
- Large developer community and comprehensive documentation  
- Has better performance at very large scale  
- The dominant framework in industry  

Following are some cons in using TensorFlow:  

- Many people complain that TensorFlow still carries the baggage from its 1.x version, which was completely different from the TensorFlow we have today and was generally much harder to use.  
- It has a steeper learning curve, and can feel at times like a new language.  

While PyTorch and TensorFlow are the two most popular deep learning frameworks available today, let's explore some of the fast-rising newcomers that may eventually challenge the incumbents.

### Jax

Jax is a new numerical computing library introduced by Google very recently. It takes the idea of "NumPy on GPUs" popularized by PyTorch to a whole new level. At its core, Jax provides autograd functionality (the ability to calculate gradients of chained functions without explicitly specifying a derivative, which is extremely important for deep learning frameworks) directly on top of the standard NumPy and Python functions. This means Jax's autograd can handle loops, conditionals, closures, and other native Python constructs without any modification to your code!
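To give a feel for what "differentiating ordinary Python code" means, here is a toy sketch in which a `grad` function takes a plain Python function containing a loop and a conditional and returns its derivative. This is not Jax and not how Jax is implemented: we use forward-mode dual numbers purely as a stand-in, and `Dual`, `grad`, and `f` are hypothetical names for this illustration only.

```python
# Toy forward-mode differentiation with dual numbers.
# Assumption: this only mimics the *style* of transforming a plain Python
# function into its derivative; it is not how Jax works internally.

class Dual:
    """A number carrying its value and its derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

    def __gt__(self, other):
        return self.value > self._lift(other).value


def grad(fn):
    """Return a function computing d fn / dx for a single scalar argument."""
    def derivative(x):
        return fn(Dual(x, 1.0)).deriv
    return derivative


def f(x):
    # Ordinary Python control flow: a loop and a conditional.
    total = 0.0
    for k in range(1, 4):      # total = x + 2x + 3x = 6x
        total = total + k * x
    if total > 10:             # which branch runs depends on the input
        return total * x       # 6x^2, so the derivative is 12x
    return total               # 6x, so the derivative is 6


df = grad(f)
```

Calling `df` at different inputs follows whichever branch `f` actually takes for that input, which is the behavior the paragraph above describes: the function itself needs no changes to become differentiable.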

But why is Google making a new library that has very similar functionality to TensorFlow? Who knows? The Jax project uses components and tools like XLA that stemmed from TensorFlow, but it seems to be a much cleaner rewrite of it. Will it eventually replace TensorFlow? Maybe. Only time will tell. But what we have now seems to indicate a promising new direction for deep learning frameworks focused on high performance on accelerators and reducing boilerplate code and syntax.

### Julia

Julia, unlike others on this list, is not just another framework or library; it is an entirely new programming language. Its creators expressed concerns that many suboptimal decisions were made from a performance perspective when Python was created. It was, after all, designed to be easy to use first and everything else second.

But today, we're using Python tools to manage large datasets, run complex scientific simulations, and train deep neural networks with billions of parameters. This doesn't seem like something that should be done in a language that sacrifices performance for simplicity.

Julia was designed from the ground up for numerical and scientific computation. While Python has many use cases, including server backends, databases, and scripting, Julia focuses on the traditional "data science stack" that Python programmers use (i.e., NumPy, pandas, matplotlib, SciPy, etc.).

We won't be covering Julia extensively in this book, but we highly recommend checking it out yourself.

> Honorable Mention: Swift for TensorFlow
> Swift for TensorFlow (sometimes abbreviated to S4TF) attempted to solve an issue similar to the one Julia does: the fundamental limitations of Python.

> The project made valuable contributions to the space of differentiable programming, compilers, and numerical computing in general, but unfortunately stopped development in 2021. We thank the S4TF team for their efforts, which have now introduced a number of upstream changes to the Swift programming language itself; its example has inspired other projects that attempt to build mainstream differentiable programming languages.

Without further ado, here are our personal picks:  

Ankur's pick  
This is a very difficult choice for me. On one hand, I love to prototype in PyTorch, given how "Pythonic" it is. On the other hand, TensorFlow is so well entrenched in industry that it's hard not to invest heavily in learning and developing in TensorFlow. My recommendation is to learn TensorFlow unless you prefer the ease and simplicity of PyTorch, which is my top pick personally.

Ajay's pick  
My deep learning framework of choice is PyTorch. While I'm super excited about some of the new ones and can't wait for deep learning frameworks to expand into other programming languages, PyTorch still seems to be the most reliable solution at the moment. It's a great tool for research, and a lot of the latest academic literature is implemented in PyTorch, which makes tweaking and testing new architectures, optimizers, etc., extremely easy.

Next, we will discuss visualization and experiment tracking software for your deep learning training needs.

## Visualization and Experiment Tracking

Often, you'll start training one model, then another, then the next, and "Oh wait, maybe if I try this..."

Once you've set up your training pipeline, it becomes extremely easy to quickly run multiple experiments, perhaps even simultaneously. At this stage, most of your effort as a deep learning practitioner will not go into writing code but into making tweaks to a few key components of your model, data, or training loop.

As you start this rapid experimentation phase, you might need to run hundreds of experiments to find the best solution. Without software to visualize and track your experiments, it would be challenging to keep track of which experiments were most promising and which directions are worth pursuing further. Debugging these models is also difficult and time-consuming without good visualization software. Also, because most of machine learning today is highly collaborative, you'll need software to track your work within a team and share progress with others to avoid issues like redundant experiments.

That's what this section is all about: tools that help you track experiments, monitor performance, version control your experiments, and share your results with the rest of your team.
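Before looking at real tools, it may help to see how little is needed for the most basic form of experiment tracking. The sketch below uses only the Python standard library; `ExperimentLog` and its methods are hypothetical names, and the metric values are placeholders rather than real results. It records one JSON line per run and then picks the best run by a chosen metric. The tools discussed next add dashboards, collaboration, and much more on top of this basic idea.

```python
# Minimal experiment log built on the standard library only.
# ExperimentLog and its methods are hypothetical names for illustration.
import json
import os
import tempfile


class ExperimentLog:
    """Appends one JSON record per run so results survive across sessions."""

    def __init__(self, path):
        self.path = path

    def log_run(self, name, hyperparams, metrics):
        record = {"name": name, "hyperparams": hyperparams, "metrics": metrics}
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    def runs(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def best(self, metric, higher_is_better=True):
        pick = max if higher_is_better else min
        return pick(self.runs(), key=lambda run: run["metrics"][metric])


# Usage with placeholder numbers standing in for real training results.
log_path = os.path.join(tempfile.mkdtemp(), "runs.jsonl")
log = ExperimentLog(log_path)
log.log_run("baseline", {"lr": 1e-3, "batch_size": 32}, {"val_accuracy": 0.81})
log.log_run("higher-lr", {"lr": 1e-2, "batch_size": 32}, {"val_accuracy": 0.78})
log.log_run("bigger-batch", {"lr": 1e-3, "batch_size": 128}, {"val_accuracy": 0.84})
best_run = log.best("val_accuracy")
```

Keeping hyperparameters and metrics together in one record is the important design choice here: it lets you answer "which settings produced this number?" long after the run has finished.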

### TensorBoard

TensorBoard is TensorFlow's built-in visualization software. It's open source and free and has a very large community of users. It allows us to visualize the graph, track and visualize metrics such as loss and accuracy, view histograms of weights and biases over time, project embeddings into a lower-dimensional space, and display images, text, and audio data.

With TensorBoard, we can run multiple experiments and track which experiments are leading to better/worse performance. This helps us optimize model performance by tuning hyperparameters more easily, for example. It is also easier to troubleshoot machine learning models with TensorBoard.

The latest version of TensorBoard, TensorBoard.dev, even allows us to host, track, and share our experiments with others; this is especially useful for collaboration within and among teams. Prior to TensorBoard.dev, we had to submit screenshots of TensorBoard to others to collaborate on work.

While TensorBoard is a good built-in solution for TensorFlow, it lacks a lot of the collaborative features that other players in the space offer. The main advantage of TensorBoard lies in the fact that it's an official, first-party, built-in tool, something that PyTorch currently does not offer.

### Weights & Biases

Some machine learning practitioners rely on tracking ML experiments with a spreadsheet. Unless you're from the 20th century, this approach is both brittle and nonscalable in industry. Given the need for great deep learning visualization and experiment tracking software, companies such as Weights & Biases have sprung up.

Founded in 2017, Weights & Biases allows teams to track their ML experiments, visualize and optimize model performance, and maintain versioning of datasets and models with just a few lines of code. TensorBoard was designed for individuals to experiment independently, but Weights & Biases was designed with collaborative teams in mind.

Weights & Biases automatically tracks hyperparameters, metrics, etc., and logs them to the cloud. You can then visualize results through an interactive dashboard that updates in real time. You can log practically anything you might care about, including plots, sample predictions, audio, video, 3D models, and even raw HTML. This tool also offers tags, filtering, grouping, and the ability to export to a wide variety of formats to keep your experiments well-organized.

### Neptune

Much like Weights & Biases, Neptune allows us to track experiments and organize work for our team. The best part about Neptune is that it easily hooks into multiple frameworks and is a very lightweight tool. It works very easily in notebook environments (e.g., Jupyter, JupyterLab, and Google Colab).

Neptune is best for users who want a lightweight experiment management tool for all model training (classic machine learning, deep learning, reinforcement learning, etc.). It also offers great notebook tracking (for Jupyter and JupyterLab). If you do most of your machine learning work in notebooks, Neptune is a top contender for experiment tracking.

### Comet

Comet is great for any model training, not just deep learning. It also offers meta machine learning capabilities (e.g., AutoML) that the other experiment tracking software platforms lack. Like Weights & Biases, Comet is a robust piece of software, one we recommend to industry practitioners. 

### MLflow

Developed in 2018 by the creators of Databricks, one of the leading data science platforms today (more on Databricks soon), MLflow is a free open source technology to track machine learning experiments, register models, and deploy models. In other words, MLflow helps manage the entire machine learning life cycle from prototyping to deployment. While MLflow is helpful to individuals who need to track many experiments, it really shines with teams. Teams can collaborate better by reproducing results of their peers and leveraging prior experimentation and the modeling others have already done. Since models are registered at a central repository, MLflow also makes it clear to team members which models are in production and how to access them.
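The idea of a central model registry can be sketched in a few lines of plain Python. To be clear, this is not MLflow's API: `ModelRegistry`, its methods, the stage names, the artifact paths, and the metric values are all hypothetical and exist only to show the concept of registering versions and marking one of them as the production model.

```python
# Toy in-memory model registry. ModelRegistry and its methods are hypothetical
# and are not MLflow's API; the stage names are also assumptions.

class ModelRegistry:
    def __init__(self):
        self._versions = {}  # model name -> list of version records

    def register(self, name, artifact_uri, metrics):
        versions = self._versions.setdefault(name, [])
        record = {
            "version": len(versions) + 1,
            "artifact_uri": artifact_uri,
            "metrics": metrics,
            "stage": "staging",
        }
        versions.append(record)
        return record["version"]

    def promote(self, name, version):
        records = self._versions[name]
        if not any(r["version"] == version for r in records):
            raise ValueError(f"unknown version {version} for model {name!r}")
        # Only one version per model is marked as production at a time.
        for record in records:
            if record["version"] == version:
                record["stage"] = "production"
            elif record["stage"] == "production":
                record["stage"] = "archived"

    def production_model(self, name):
        for record in self._versions.get(name, []):
            if record["stage"] == "production":
                return record
        return None


# Placeholder paths and scores, not real artifacts or results.
registry = ModelRegistry()
v1 = registry.register("sentiment", "models/sentiment/1", {"f1": 0.88})
v2 = registry.register("sentiment", "models/sentiment/2", {"f1": 0.91})
registry.promote("sentiment", v1)
registry.promote("sentiment", v2)   # v1 is archived, v2 takes over
current = registry.production_model("sentiment")
```

With a single shared registry like this, any team member can ask which version is live and where its artifact is stored, instead of relying on file names or word of mouth.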

MLflow is unlike TensorBoard and Weights & Biases because it manages the entire machine learning life cycle; in other words, it is more than just experiments-tracking software. But, it does have light experiments-tracking features.

The major downside of MLflow is the lack of visualization capabilities that the likes of TensorBoard, Weights & Biases, Comet, and Neptune offer. In fact, MLflow has a very limited user interface altogether. Moreover, MLflow works best when used with Databricks. As a standalone technology, it lacks many of the features enterprises will need, such as user management.

In Chapter 11, we will revisit MLflow and show where it shines best: model registry and model deployment.

Here are our picks:  
Ankur's pick  
The single best pick here is Weights & Biases. The team there truly understands how to develop software to help with machine learning work. The founders of Weights & Biases previously founded a very popular and successful data annotation firm called Figure Eight (formerly known as CrowdFlower), which I have used in the past and has since been acquired by Appen. If you want to make your experimentation process more organized with better process, Weights & Biases is your solution. Weights & Biases also integrates well with nearly all the major data science frameworks and platforms, including Databricks, which we use in Chapter 11.

Ajay's pick  
I'm biased here, since I've been using Weights & Biases much more than anything else. But that's because it's the first tool I tried, and I found it perfect for what I do. The way I look at it, Weights & Biases really helps you move from working in code space to idea space. Having all my models and results in one place has really improved my productivity as a deep learning practitioner.

Now, let's move on to automated machine learning, which may help you with your training process.  

## AutoML

Machine learning has become more mature and increasingly in demand. As a result, startups that specialize in automated machine learning (AutoML) have become a hot topic of conversation in the data science community in recent years. Let's explore the current major players in AutoML and how they could be helpful in building NLP applications.

The standard machine learning pipeline includes the following steps:  

1. Import data.  
2. Preprocess data (e.g., handle missing values and outliers, check and convert data types, etc.).  
3. Perform feature scaling, engineering, and selection.  
4. Structure data (e.g., create training, cross-validation, and test sets, etc.).  
5. Define the evaluation metric.  
6. Set up algorithms, and choose and test models with various hyperparameters.  
7. Select model(s) to deploy in production.  
8. Refactor code, write tests, and push into production.  
9. Monitor and maintain model in production.  
10. Collect actual results, and retrain model, as necessary.  

AutoML is machine learning that has been automated to some extent, reducing the effort required from human coders. AutoML may include the following:  
- Automated data preparation (e.g., imputation of missing values, feature scaling, feature selection, etc.)
- Automated grid search and hyperparameter optimization  
- Automated evaluation of multiple algorithms  
- Automated ensembling of models (e.g., ensemble selection and stacking)  

By automating some portions of the standard machine learning pipeline, AutoML frees up time for us to work on data preprocessing, feature engineering, and model deployment and maintenance.
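The automated grid search and evaluation of multiple algorithms described above can be sketched with nothing but the standard library. Everything in this example is a toy assumption: the dataset, the two model families (`fit_mean` and `fit_knn`), and the `search_space` layout are made up for illustration and do not reflect how any particular AutoML product works.

```python
# Toy automated model selection: try several algorithms and hyperparameter
# settings, score each on a validation split, and keep the best.
# The dataset and model families are made up for illustration.
from itertools import product


def fit_mean(train):
    """Baseline: always predict the average training target."""
    mean_y = sum(y for _x, y in train) / len(train)
    return lambda x: mean_y


def fit_knn(train, k):
    """Predict the average target of the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
        return sum(y for _x, y in nearest) / len(nearest)
    return predict


def mse(model, data):
    """Mean squared error of a fitted model on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Search space: each algorithm maps to a grid of hyperparameter values.
search_space = {
    fit_mean: {},
    fit_knn: {"k": [1, 3, 5]},
}

# y = 2x on integer inputs; even x for training, odd x for validation.
train = [(x, 2.0 * x) for x in range(0, 20, 2)]
valid = [(x, 2.0 * x) for x in range(1, 20, 2)]

results = []
for fit, grid in search_space.items():
    names = list(grid)
    # Every combination of hyperparameter values (one empty combo if no grid).
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        model = fit(train, **params)
        results.append((mse(model, valid), fit.__name__, params))

best_score, best_algo, best_params = min(results, key=lambda r: r[0])
```

The loop is deliberately exhaustive, which is fine for four candidates. Real AutoML systems face far larger search spaces and typically rely on smarter search strategies and ensembling rather than brute force.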

### H2O.ai

### Dataiku

### DataRobot

## ML Infrastructure and Compute

### PaperSpace

### FloydHub

### Google Colab

### Kaggle Kernels

### Lambda GPU Cloud

### Our Pick

## Edge / On-Device Inference

### ONNX

### Core ML

### Edge Accelerators

## Cloud Inference & Machine Learning as a Service (MLaaS)

### AWS

### Microsoft Azure

### Google Cloud Platform

## CI/CD

## Conclusion