[[ch12]]

# Conclusion

This brings us to the end of our journey together. Over the course of 11 chapters, we introduced the origins of natural language processing and retraced how the field has advanced over the past decade. We delved into the nitty-gritty details of the space, including preprocessing, tokenization, and several types of word embeddings, such as Word2Vec, GloVe, and fastText.

We covered everything from vanilla recurrent nets to gated variants such as LSTMs and GRUs. We also explained how attention mechanisms, contextualized word embeddings, and Transformers helped shatter previous performance records. Most importantly, we used large, pretrained language models to perform transfer learning and fine-tune models, and we discussed how to productionize those models using various tools of the trade.

Instead of getting bogged down in theory, we focused mostly on applying state-of-the-art NLP techniques to solve real-world problems. We hope this helped you build greater intuition about NLP, how it works, and how to apply it well.

By now it should be clear that getting up and running with NLP is relatively easy, partly thanks to the open sourcing of large, pretrained language models by research teams at Google, Facebook, OpenAI, and others. Companies and projects such as spaCy, Hugging Face, AllenNLP, Amazon, Microsoft, and Google have introduced great tooling for NLP, too, making it less painful to develop NLP models of your own from scratch or fine-tune existing models.

## Ten Final Lessons

But, as we said in the Preface, many organizations today still struggle with developing and productionizing NLP applications and fail to get a good return on the time, effort, and money they invest. With this in mind, we want to share some parting advice drawn from lessons we have learned the hard way.

### Lesson 1: Start with Simple Approaches First

While it is tempting to turn to the latest state-of-the-art models to build NLP applications and to strive to beat industry benchmarks in performance, it is generally better to start with simple approaches first. Based on our experience, the newer and the more complex the modeling approach, the longer it will take to build the application and push it to production.

This is bad for several reasons:

1. First, it delays the time to tangible impact from any modeling you do. Your organization could benefit faster from a simpler model that takes less time to develop and push to production.
2. Next, a long model development cycle may be demoralizing not only to the machine learning team but to the leadership team and to the investors that have backed the machine learning initiatives at your organization. It is best to ship early and often in the early stages of any machine learning initiative to deliver quick wins to all interested parties and to show that machine learning can help the organization, even if the gains are more modest at first.
3. Finally, you learn a lot more about the problem at hand by working on the machine learning solution end to end. You may also get more real-world intel on the problem once your simple model is in production and you begin to see how it performs on live data. You may begin to ask and answer questions such as: What edge cases did we fail to account for during the development process? Where does the simple model fail most dramatically? How could we better design the model given what we know now?

Simple models are not only simpler to develop and deploy, but they are also easier to interpret than more complex models. For example, a simple NLP model using Light Gradient Boosting Machine (LightGBM) (https://oreil.ly/9xBY) with some NLP-specific feature engineering is a lot easier to interpret than a more complex neural network-based model. Simple models also require far less compute and time to train, whereas the latest state-of-the-art models are generally much larger and more compute-intensive.

Of course, the definition of what is simple changes over time. For example, BERT was state of the art in 2018 and considerably more difficult to use back then than it is today. Contextualized word representations and Transformer-based pipelines are now the norm in NLP model development, too. If you started your model development with these techniques today, it would be fairly straightforward (though not necessarily so in 2018, when the techniques were first publicly released).

While neural networks are now the norm, there is still room for classical, non-neural-network-based NLP applications in the enterprise. To develop classical NLP models, you will need to perform your own feature engineering using steps such as preprocessing, tokenization, and vectorization. Sometimes these classical NLP approaches quickly lead to pretty good results for the problem at hand, whereas the latest neural networks would take considerably longer.
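
As a rough illustration of the classical steps named above, here is a minimal sketch of preprocessing, tokenization, and TF-IDF vectorization using only the Python standard library. The function names and the three sample sentences are our own assumptions for illustration. In practice you would more likely reach for a library implementation, such as scikit-learn's `TfidfVectorizer`, and feed the resulting features to a simple model.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and replace anything but letters, digits, and whitespace with a space."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def tokenize(text):
    """Split the preprocessed text on whitespace."""
    return preprocess(text).split()


def tfidf_vectorize(docs):
    """Return (vocabulary, vectors) using raw term counts and a smoothed IDF weight."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each token.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical sample documents, for illustration only.
docs = [
    "The invoice total is $42.",
    "Please send the invoice today!",
    "Lunch is at noon.",
]
vocab, vectors = tfidf_vectorize(docs)
```

Each row of `vectors` is a fixed-length feature vector aligned with `vocab`. Tokens that appear in fewer documents receive a higher weight than tokens that appear in many.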

Even rule-based methods might have a place and should not be shunned in the enterprise; not everything has to be model-based. The goal should be to deliver value to the organization fast and reliably, eventually replacing stopgap measures with better-performing ones. As Voltaire said, the perfect is the enemy of the good.
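
To make the rule-based option concrete, here is a minimal sketch of a keyword-driven text classifier built on regular expressions. The labels, patterns, and function name are hypothetical and chosen only for illustration. A real rule set would come from studying your own data.

```python
import re

# Hypothetical rules for routing support messages. The labels and patterns
# are illustrative assumptions, not taken from a real system.
RULES = [
    ("refund", re.compile(r"\b(refund|money back|chargeback)\b", re.IGNORECASE)),
    ("shipping", re.compile(r"\b(shipping|delivery|tracking)\b", re.IGNORECASE)),
]


def classify(text, default="other"):
    """Return the label of the first rule whose pattern matches the text."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return default
```

For example, `classify("I want my money back")` returns `"refund"`, while a message that matches no rule falls through to `"other"`. A stopgap like this is easy to audit and can later be replaced by a trained model.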

### Lesson 2: Leverage the Community

This brings us to another reason for starting with simple approaches first. The more simple approaches today are the ones that have been tried and tested over at least some reasonable amount of time. They are not purely theoretical and experimental; rather, they are battle-tested. Tried and tested is better than new and flashy for applied work.

These approaches have better documentation and fewer bugs, and they are supported by a larger community of practitioners on sites such as Stack Overflow (https://stackoverflow.com), which you will be able to tap into when you run into issues with your model development and deployment. These communities are full of helpful tips and suggestions.

There is comfort in knowing that others have tried and tested the simpler approaches you are starting your NLP build with today, so you are unlikely to have to pave the way from scratch. You are going down a well-paved path with open community support along the way.

Once you have achieved some modest success with the simpler approaches and deployed your model to production, you will have bought yourself more time, and you can invest more energy in the more complex and more experimental state-of-the-art approaches. Even if you run into issues building the more complex model, at least your organization has a modest-performing model delivering tangible value in production as a stopgap measure.

This should be your mantra: ship models early and often in the early days to buy yourself more time to invest in longer R&D cycles. You will get more believers and champions for your initiatives as you show tangible impact along the way. 

### Lesson 3: Do Not Create from Scratch, When Possible

Before you invest a substantial amount of time and resources into building a solution to solve your problem, spend a modest chunk of time and resources exploring open source or third-party alternatives. Perhaps there is a decent pre-built model available as an API for your particular problem; why build a model from scratch when you could cheaply access an existing model?

Even if the open source or third-party solution is not a perfect long-term fit for your problem, it is generally better to use it in the interim, since it will deliver immediate value to your organization while you build the in-house solution for your organization's long-term strategic needs.

Do not build what already exists. At the very least, do not start building until you've done the research to evaluate and rule out third-party options. It is very tempting as a programmer to want to build models and applications from scratch, wholly owning the process from start to finish. Building from scratch feeds the ego, but the better option may be to buy what you can from existing players and build only what you cannot find in the market.

The more generic and universal your problem is (such as receipt extraction), the more likely it is that a decent solution already exists for you to buy. The more custom and specific your problem is, the more likely it is that you will have to build in-house. Choose wisely what to spend your time working on.

### Lesson 4: Intuition and Experience Trounce Theory

Our stance here remains consistent throughout the book: get your hands dirty fast with code and data if you want to advance in the field quickly. While it is certainly important to learn the theory, it is not where you should spend the majority of your time as an applied NLP practitioner.

Theory is most vital for researchers who want to build on the work of prior researchers and develop newer state-of-the-art approaches. But if your goal is to deliver tangible value to your organization fast, it's best to start working with code and data as early as you can.

Our recommendation is for you to start with applied books such as this one (kudos to an excellent start already!) and software with wrappers that allow you to easily work with large, pretrained language models. Our favorite places to start include spaCy, Hugging Face, and fast.ai, all of which we have explored in this book.

These companies have toy datasets and starter code to help you advance in your NLP journey fast. All three players, like us, favor intuition over theory and are biased toward action. Of the three, fast.ai has the best course materials and will help you build more of your foundational knowledge of NLP. spaCy and Hugging Face are better to explore once you have worked your way through several toy datasets and are ready to transition to performing NLP on larger datasets.

Even if you are a seasoned vet, you will likely need a resource to absorb the latest advances in NLP since the field is constantly changing; fast.ai is the place to go to make this continuing education as painless as possible. Afterward, you can turn to official blog posts, research papers on arXiv, third-party blog posts, Medium, YouTube, and other resources.

One last caveat: while we recommend that you start on toy datasets if you are new to NLP, it is critical that you transition to a real-world project before long. Working on a real-world problem (and all the other issues that come from working with data in the wild) will really push you to develop as an applied practitioner in a way that working on toy datasets simply won't.

### Lesson 5: Fight Decision Fatigue

As a newcomer to NLP, it is easy to succumb to indecision, especially when choosing among all the various tools of the trade that we explored in Chapter 9. Do not succumb; fight the urge. We recommend starting simple and being biased toward action, as always.

Start with fast.ai as a resource. Choose one of the two main frameworks; we recommend PyTorch if you are new to machine learning. Begin coding on Google Colab or on your local environment. Do not worry about all the different cloud compute providers or experiment tracking or productionizing models just yet. All of this can come later with experience and practice. The main goal is getting started as fast and painlessly as possible.

### Lesson 6: Data Is King

While we have spent the majority of the book discussing how to develop and productionize NLP models, what makes or breaks performance on many applied use cases is not the modeling approach, but rather the quality and quantity of data you have available to train the model. The more data, the better.

It is best to leverage publicly available datasets for your problem, where possible. You may also find datasets available for purchase online. But, to develop a truly performant model, you will likely need to build first-party data capture into your application so that you control the data off of which you build models. At the very least, you will likely need to have a partnership with a player that has great data capture.

Once you have data, annotations are vital. You could perform the annotation yourself, which we recommend you do initially to get started fast and to learn more about how to annotate the data well. Or you could hire an annotation firm, such as Appen or Scale AI. You could also hire low-cost labor through firms such as Invisible Technologies or Odetta to perform the annotation. Amazon Mechanical Turk is also a good option, but it requires more hands-on oversight than the other annotation companies.

There is good off-the-shelf and open source annotation software to perform the annotations, including Prodigy, which we explored in Chapter 3. However, to get the highest-quality annotations, you may need your organization to build a custom annotation UI. But again, start with off-the-shelf third-party tools, where possible. Don't build from scratch unless you absolutely need to.

### Lesson 7: Lean on Humans

When developing an ML-based product, you will need to leverage humans in the loop to handle edge cases that the model fails on and to perform active learning, which is the process of having humans annotate data points where the model performs poorly. Without a human in the loop, even if your NLP application is 90% accurate, it may not be ready for production because your users demand more than 99% accuracy. To deliver this accuracy to your users, you can rely on the humans in the loop to deal with the 10% of cases in which your model performs poorly. AI is not magic. For AI to be production ready, you will likely need to pair it with humans, at least initially.

More generally, build fault-tolerant experiences for your users when building NLP applications. Your model will fail in ways that you may never have anticipated, and, unless your application gracefully handles the failure, the model's failures may frustrate or anger your users. For example, Google Assistant asks users to confirm a question when it is not sure what is being asked, and it responds with "I'm not sure I can help you with that" when it is truly befuddled. This softens the poor experience for the user.
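
One simple way to build this kind of graceful fallback is to route each prediction by the model's confidence score. The sketch below is illustrative only: the function name, the action names, and both thresholds are our own assumptions, and suitable thresholds depend on how well calibrated your model's scores are.

```python
def route_prediction(label, confidence, act_threshold=0.9, confirm_threshold=0.6):
    """Decide how the application should respond to a model prediction.

    The threshold values are hypothetical defaults; tune them on your own data.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= act_threshold:
        # The model is confident: act on the prediction directly.
        return ("act", label)
    if confidence >= confirm_threshold:
        # The model is unsure: ask the user to confirm before acting.
        return ("confirm", label)
    # The model is befuddled: decline gracefully and hand off to a human.
    return ("escalate", None)
```

For example, `route_prediction("set_alarm", 0.7)` returns `("confirm", "set_alarm")`. Cases that end in `"escalate"` are natural candidates for human review and annotation, which ties back to the human-in-the-loop approach described earlier in this lesson.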

### Lesson 8: Pair Yourself with Really Great Engineers

If your strongest skill set is in developing NLP models, pair yourself with really great engineers to help you productionize your NLP models more robustly and easily. Great engineers will bring much-needed systems thinking to your NLP pipeline by designing tests, managing MLOps, and more. In general, pair yourself with others who best complement your particular skill sets, because you will not be able to master everything on your own.

### Lesson 9: Ensemble

Ensembling is the closest thing to a free lunch in machine learning. Once you have a good model in production, design more models to complement what you have and include all the models together in an ensemble. To the extent the models have similarly good performance but uncorrelated errors, the ensemble will outperform any of the standalone models in the ensemble. It's one of the easiest ways to improve the overall performance of your application in the enterprise.
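
The effect of uncorrelated errors can be seen in a small majority-vote sketch. The labels and model predictions below are toy data that we constructed so that no two models are wrong on the same example. Real models rarely have fully independent errors, so expect smaller gains in practice.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists by taking the most common label for each example."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy data (an assumption for illustration): each model errs on a different example.
truth = [1, 0, 1, 1, 0, 1]
model_a = [0, 0, 1, 1, 0, 1]  # wrong on example 0
model_b = [1, 1, 1, 1, 0, 1]  # wrong on example 1
model_c = [1, 0, 0, 1, 0, 1]  # wrong on example 2

ensemble = majority_vote([model_a, model_b, model_c])
```

Each standalone model gets five of the six examples right, while the ensemble gets all six, because every individual mistake is outvoted by the other two models.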

### Lesson 10: Have Fun

This brings us to our very final piece of advice: have fun and enjoy the journey you are on. NLP is hard, and the path to mastery is long. Much like neural nets learn layer by layer to solve pretty complex problems, you will learn how to master NLP a step at a time. Just be patient, start simple, and, most importantly, celebrate the small wins along the way. The more you allow yourself to experience joy, the greater the sense of flow you will experience and the faster you will become a true master of NLP.