# The rise of AI consultancy



AI consultancy is all the rage. Accenture says that it now does USD2.5bln in ARR on Generative AI projects and BCG says 20% / USD2.5bln of its revenue this year and 40% in 2026 will come from AI!

Now of course, some of these claims are just marketing and re-labelling of non-AI revenue. But there is certainly real and voracious demand from companies for AI implementation projects. That is not surprising considering the vast cost-saving opportunity from LLMs even today - and the potential for genuinely superior work from future LLM generations.

In this article, I'll write about Accenture and my thoughts on their research and strategy in AI.


## Accenture, winner or loser to generative AI?

Accenture is the largest technology consulting firm in the world. When ChatGPT first emerged I said to friends that they would be one of the biggest winners of LLMs as implementation work would be hugely important to this technology. However, I would say that they have been very slow to respond, and that they don't have a sufficiently strong technical team today to really be a leader in this market. There's still plenty of time to turn things around, and they have a great distribution position, but the real strategic leadership in AI today is not at Accenture, who likely need stronger technical people in AI leadership positions.

Recently, Accenture published their [Tech Vision 2024](https://www.accenture.com/content/dam/accenture/final/accenture-com/document-2/Accenture-Tech-Vision-2024.pdf#zoom=40) report, their flagship research view of the state of technology. Unsurprisingly, a significant portion is devoted to LLMs. 

In this article, I will discuss the technical errors that their research team made, followed by some thoughts on areas where Accenture can generate significant revenue using its scale and distribution. I also will lay out my opinion about how Accenture should position itself to best benefit from AI. 

This discussion is particularly interesting as Accenture has a huge opportunity if they get generative AI right, but they also have $30bln of technology implementation revenue at significant risk from LLM disintermediation. They have to get AI right for the company to be even moderately successful from here. 

Key questions investors should ask themselves:
- Can Accenture succeed in this highly technical field without that expertise?
- Is deep technical expertise even necessary for them to succeed? 
- Who stands to benefit if they fall short?

# 1. What did Accenture get wrong?

Here are a few technical issues that Accenture got wrong in their research:

- Alpaca and Chinchilla as small language models
- The RAG method
- RAG/few shot prompting vs fine tuning
- CoT prompting and AutoGPT
- Vector Databases are essential to represent high dimensional data


### Smaller Language Models

On page 21, Accenture wrote a paragraph that should be a significant red flag for investors expecting them to succeed in the Generative AI space:

> "A slight variation on this is also gaining traction.
Enterprises are beginning to fine-tune smaller
language models (SLMs) for specialized use
cases. SLMs like DeepMind’s Chinchilla and
Stanford’s Alpaca have started to rival larger
models while requiring only a fraction of the
computing resources. These SLMs are not
only more efficient, running at lower cost with
smaller carbon footprints, but they can be
trained more quickly and used on smaller,
edge devices."

This paragraph is broadly wrong.

1. Chinchilla cannot be finetuned or used in any way. It is a research model that is not available to users or companies.
2. The main Chinchilla model was among the largest LLMs at the time it was built. It is not a 'smaller' language model. Even today, outside of GPT-4 class models, it would be a big model. 
3. Smaller Language Models are not trained more quickly. Instead, they accept longer training in exchange for being faster and cheaper at inference time. The clearest way to see this progression is at the ~7bln parameter mark. The Chinchilla [paper](https://arxiv.org/pdf/2203.15556) (see my discussion of the Chinchilla frontier below) suggests that a 7bln parameter model should be trained for around 150bln tokens to be compute optimal. Starting with Llama-1, we've seen a succession of models of this size trained for much longer than this (Llama-1 7B params 1T tokens, Llama-2 7B params 2T tokens, Mistral 7B params 3T tokens, Llama-3 8B params 15T tokens) in order to build more performant models that are smaller. In fact, Llama-3 8B was trained for more tokens than the original GPT-4 (see my discussion of GPT-4 compute [here](https://github.com/HNx1/dl_comments/blob/master/gpt4_semianalysis_leak.ipynb)). The complete opposite of Accenture's claim on training speed applies to smaller language models.
4. As touched on in 3, the entire point of the Chinchilla paper is opposed to small language models. Chinchilla discusses building compute optimal models which are much larger than actual models used in practice, not smaller. A model trained for 15T tokens like Llama-3 8B/70B would be 600bln params at Chinchilla optimality on a token-equivalent basis (roughly 100x/10x bigger respectively); on a compute-equivalent basis it would be around 70bln/200bln params respectively. Chinchilla is the opposite of a 'smaller language model' philosophy.
5. Alpaca is not relevant to this topic. It is a finetuning of the Llama model family, which is in fact the main flag-bearer for inference-optimized open-source LLMs.  If I had written this report, I would have focused on the Llama / Mistral family when discussing these smaller LLMs used for cheaper inference. Microsoft's Phi is also a highly important model family in this area which Accenture could have used.
6. Alpaca is not about making a language model smaller. Instead, Alpaca was an early effort to bring instruction tuning to open-source LLMs. In [InstructGPT](https://arxiv.org/pdf/2203.02155) OpenAI introduced a method of teaching LLMs to effectively follow human instructions, which became the basis for the success of ChatGPT. Alpaca was simply an early effort to replicate that instruction tuning process on an open-source model. OpenAI expended significant commercial resources to build their instruction dataset, and the core contribution of Alpaca was a synthetic open counterpart to this data that had manageable cost to build. 
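
The Chinchilla arithmetic in points 3 and 4 can be sketched in a few lines. This is a rough illustration, not the paper's fitted scaling law: it assumes the common rule of thumb of about 20 training tokens per parameter and the standard approximation that training compute is about 6 x params x tokens. Both constants and all function names are my own assumptions, so the outputs land in the same ballpark as the figures quoted above rather than matching them exactly.

```python
import math

# Rule-of-thumb constants (assumptions, not exact figures from the paper).
TOKENS_PER_PARAM = 20.0       # approximate compute-optimal tokens per parameter
FLOPS_PER_PARAM_TOKEN = 6.0   # training compute C ~ 6 * params * tokens


def optimal_tokens(params: float) -> float:
    """Compute-optimal training tokens for a model with `params` parameters."""
    return TOKENS_PER_PARAM * params


def params_for_tokens(tokens: float) -> float:
    """Model size for which `tokens` would be the compute-optimal data budget."""
    return tokens / TOKENS_PER_PARAM


def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


def params_for_compute(flops: float) -> float:
    """Compute-optimal model size for a FLOP budget: solve 6 * N * (20 * N) = C."""
    return math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))


if __name__ == "__main__":
    print(f"7bln params -> {optimal_tokens(7e9) / 1e9:.0f}bln tokens")
    print(f"15T tokens  -> {params_for_tokens(15e12) / 1e9:.0f}bln params")
    for name, n in [("Llama-3 8B", 8e9), ("Llama-3 70B", 70e9)]:
        equivalent = params_for_compute(train_flops(n, 15e12))
        print(f"{name}: compute-equivalent optimum ~{equivalent / 1e9:.0f}bln params")
```

Under these assumptions a 7bln parameter model is compute optimal at about 140bln tokens, 15T tokens corresponds to a roughly 750bln parameter model, and the compute spent on Llama-3 8B/70B would be optimally allocated to models of roughly 77bln/229bln parameters. The exact numbers shift with the tokens-per-parameter ratio you pick, but the direction of the argument does not.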

In mitigation, the Chinchilla inclusion can be slightly justified by these sentences in the abstract: 

>"Chinchilla uniformly and significantly outperforms ... [larger models] ... on a large range of downstream evaluation tasks.
This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly
facilitating downstream usage."

These sentences possibly tricked Accenture about the true nature of Chinchilla. The modern philosophy of inference-optimized models (training for as long as possible, given no convergence observed + available data and training compute budget) is clearly different to the idea of compute-optimal scaling (stopping training far short of convergence) which is the core idea of the Chinchilla paper. The Chinchilla paper sits most clearly as a follow-up to [Scaling laws for neural language models](https://arxiv.org/pdf/2001.08361) from OpenAI. These papers studied how model performance scaled with added size, data and compute.  They are not about making language models smaller, and there are numerous other models Accenture could have used that would have been more illustrative than Chinchilla about the very real phenomenon of using smaller models, trained for longer, to create faster and cheaper applications of LLMs.



### Misunderstandings about Retrieval Augmentation

On page 22, Accenture make a misleading claim about RAG:

>"The LLM is, of course, trained on a huge amount
of data initially, but only uses the specific
information it receives to generate its response
to the user. "

One might believe on reading this that any information retrieved in RAG will override the knowledge encoded in the model weights. This is not the case. The model will still use its own knowledge in a RAG response, not only what is retrieved. It is indeed sometimes the case that irrelevant information from RAG will cause hallucination from the model, but other times the model will ignore the irrelevant information.

### RAG cost trade-offs

Again on page 22, Accenture make the following claim:

>"Grounding an LLM through in-context learning
and RAG takes much less time and compute
power ... than ... finetuning"

Compute-wise this is not always true, but it holds unless usage is very high relative to the dataset size - in most practical applications RAG will be cheaper compute-wise. 
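
To make "very high usage relative to the dataset size" concrete, here is a back-of-the-envelope break-even sketch. It rests on assumptions that are mine rather than Accenture's: full finetuning (not LoRA or similar) costing about 6 x params x tokens FLOPs, a forward pass costing about 2 x params FLOPs per token, and no accounting for the quadratic attention term, embedding, or vector search. All function names are illustrative.

```python
def rag_extra_flops_per_query(n_params: float, extra_context_tokens: float) -> float:
    """Extra forward-pass compute from the retrieved context on a single query."""
    return 2.0 * n_params * extra_context_tokens


def finetune_flops(n_params: float, dataset_tokens: float, epochs: int = 1) -> float:
    """One-off compute for full finetuning over the dataset."""
    return 6.0 * n_params * dataset_tokens * epochs


def break_even_queries(dataset_tokens: float, extra_context_tokens: float,
                       epochs: int = 1) -> float:
    """Number of queries at which cumulative RAG overhead equals finetuning compute.

    The parameter count cancels out: 6 * N * D * epochs = q * 2 * N * T.
    """
    return 3.0 * dataset_tokens * epochs / extra_context_tokens


if __name__ == "__main__":
    # Hypothetical workload: 10mln token dataset, 3 epochs, 2,000 retrieved tokens per query.
    print(break_even_queries(10_000_000, 2_000, epochs=3))
```

For that hypothetical workload the break-even sits at 45,000 queries: below it RAG uses less compute in total, above it the finetuned model does, at least under these simplified assumptions.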

Time-wise - it's not true. I would say it's very case dependent in training/preparation. At inference time RAG will always take a bit longer than a finetuned model.  

At a basic level, this happens because RAG / extra context increases the input token length, which increases the quadratic attention cost of generating the first token. Subsequent tokens use KV caching in all but very rare edge cases, so extra input tokens add most of their compute cost on the first token generated. So it is definitely possible to have higher costs when using RAG vs an effectively finetuned model.

Moreover, RAG is an extra process that involves vector search, which introduces extra latency into the pipeline, particularly if it's not very well configured (e.g. your vector store and model call are in different AWS regions). This extra latency can be significant for simple use cases / small models / low output length.

RAG can in some cases actually add to hallucination - if an irrelevant piece of information is retrieved, then it can confuse the model - but for the most part RAG will reduce hallucination and of course creates much better verifiability for the model.

From a practical point of view, these are small issues but can be amplified for the right use case. Accenture should be able to advise clients when these nuances exist and what their impacts can be.

Generally I advise companies to not finetune (see section 3 [here](https://github.com/HNx1/dl_comments/blob/master/consultants_building_llms.ipynb)) but for a mostly different set of reasons - among those reasons I agree with Accenture that complexity is a good reason not to finetune. [Azure](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/fine-tuning-considerations) pretty aggressively steer companies away from finetuning for this reason. Scale AI recently published a report finding that 43% of ML practitioners are finetuning generative AI models vs 38% doing RAG which suggests that at least among companies actively developing generative AI applications at the cutting edge, finetuning is popular. As larger enterprises spend more time building these systems, I would certainly expect less finetuning due to its complexity and the other reasons I discuss in that report I linked.



### AutoGPT and CoT prompting

On pages 34-35:

>"Chain-of-thought prompting is an approach developed
to help LLMs better understand steps in a
complex task. It started with researchers
realizing they could provoke better outcomes
by breaking down prompts into explicit steps,
or even prompting the model to “think about
this step by step.”"

'Let's think step by step' (zero shot CoT) came after few shot CoT, so it's not accurate to say that CoT started with it.

>"AutoGPT and BabyAGI are two
open-source applications that leverage LLMs
and automate chain-of-thought prompting."

Chain Of Thought, or CoT, has become quite a common thing to see in modern model technical reports as an evaluation tool - Gemini and Claude 3 both had quite a few CoT benchmarks. The term was defined in this [paper](https://arxiv.org/pdf/2201.11903) from Google Brain. 

The core idea of CoT is that it provides extra reasoning steps to the model through in-context examples. Then, when the model gives an answer it should try to produce its own intermediate reasoning steps which help the model get to the right answer with a single model response. You can find another description of it [here](https://arxiv.org/pdf/2210.09261) which again stresses the presence of in-context exemplars followed by a single model call.

The processes of AutoGPT/BabyAGI do not exhibit in-context exemplars.

Now you could correctly say - what about [zero shot CoT](https://arxiv.org/pdf/2205.11916) which of course doesn't have in-context exemplars either? To that I would say that zero shot CoT is defined by the phrase 'Let's think step by step' which doesn't appear in BabyAGI or AutoGPT.

Finally both systems rely on chains of model calls and self-interaction, whereas CoT is a single model response. It's possible the authors were confused between a chain of tool/model calls and a chain of thoughts - these two chains are very different in definition!
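
The distinction can be sketched in code. The snippet below is a simplified illustration, not the actual implementation of AutoGPT, BabyAGI, or any paper: the exemplar is one I wrote in the style of few shot CoT, `llm` stands for any hypothetical function that maps a prompt string to a response string, and `agent_loop` is a deliberately minimal stand-in for an agent. The two CoT helpers each produce one prompt for one model call, whereas the agent loop makes repeated calls and feeds its own outputs back in.

```python
from typing import Callable, List

# A hand-written exemplar with worked reasoning (illustrative, not from the paper).
EXEMPLAR = (
    "Q: A shop has 12 apples and sells 5. It then receives 8 more. How many apples are there?\n"
    "A: The shop starts with 12 apples. After selling 5 it has 12 - 5 = 7. "
    "After receiving 8 more it has 7 + 8 = 15. The answer is 15.\n"
)


def few_shot_cot_prompt(question: str, exemplars: List[str]) -> str:
    """Few shot CoT: in-context exemplars with reasoning, then one model call."""
    return "\n".join(exemplars) + f"\nQ: {question}\nA:"


def zero_shot_cot_prompt(question: str) -> str:
    """Zero shot CoT: no exemplars, just the trigger phrase, then one model call."""
    return f"Q: {question}\nA: Let's think step by step."


def agent_loop(llm: Callable[[str], str], goal: str, max_steps: int = 3) -> List[str]:
    """A chain of model calls: each call sees the results of the previous ones."""
    history: List[str] = []
    for _ in range(max_steps):
        action = llm(f"Goal: {goal}\nCompleted so far: {history}\nNext action:")
        if action.strip() == "DONE":
            break
        history.append(action)
    return history


if __name__ == "__main__":
    print(few_shot_cot_prompt("I have 3 boxes of 4 pens. How many pens?", [EXEMPLAR]))
    print(zero_shot_cot_prompt("I have 3 boxes of 4 pens. How many pens?"))
```

Neither prompt builder calls a model more than once, and the agent loop contains neither exemplars nor the trigger phrase, which is the gap between the two ideas in a nutshell.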





### Vector DBs

On page 20:

>"vector databases are
essential to represent high-dimensional data
for inferring relationships and similarity"

High dimensional data has been stored and accessed for many years in non vector formats. Usage of vector databases even for high dimensional data is very use case dependent.

Enterprises should use vector databases only when three additional conditions apply:

1) The data cannot be easily disaggregated on dimensional lines
2) 100% recall is not needed
3) There is too much data for a fast exact search

Disaggregate on dimensions - If you take the US census as an example, it collects 100s of millions of people's information across a number of different characteristics, some of which can take on many values (e.g. employment). Clearly this dataset has high dimensionality, even from a statistical point of view. But if I wanted to search this data, there would be no need for a vector db, or even any embeddings at all - I could simply run a straightforward database query filtering each characteristic for the values I want. 
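
As a minimal sketch of that point, here is what the census-style search looks like as a plain indexed database query, using Python's built-in `sqlite3`. The table schema, the rows, and the function names are all made up for illustration; a real census dataset would have far more columns, but the query pattern is the same and involves no embeddings at all.

```python
import sqlite3
from typing import List


def build_toy_census() -> sqlite3.Connection:
    """Create a tiny in-memory table with a hypothetical census-like schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE census ("
        "person_id INTEGER PRIMARY KEY, state TEXT, age INTEGER, employment TEXT)"
    )
    conn.executemany(
        "INSERT INTO census VALUES (?, ?, ?, ?)",
        [
            (1, "TX", 34, "teacher"),
            (2, "TX", 36, "nurse"),
            (3, "CA", 35, "nurse"),
            (4, "TX", 61, "nurse"),
        ],
    )
    # An ordinary index on the filtered columns keeps lookups fast as the table grows.
    conn.execute("CREATE INDEX idx_state_employment ON census (state, employment)")
    return conn


def find_people(conn: sqlite3.Connection, state: str, employment: str,
                min_age: int, max_age: int) -> List[int]:
    """Exact filter on each characteristic; every matching row is returned."""
    rows = conn.execute(
        "SELECT person_id FROM census "
        "WHERE state = ? AND employment = ? AND age BETWEEN ? AND ? "
        "ORDER BY person_id",
        (state, employment, min_age, max_age),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(find_people(build_toy_census(), "TX", "nurse", 30, 40))
```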

100% recall - if a user logs in to my website, I need to be able to find that exact user with 100% accuracy. An approximate search using vector similarity would not be acceptable. If a solution that needs 100% recall isn't scaling effectively, I can't just plug in a vector DB - instead I need an alternative solution.

Too much data - If you have 100,000 documents, you can easily fit them in local server memory and do exact search locally with minimal latency. For 100mln documents you would want a vector db.
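
For the small-corpus case, exact search is only a few lines. The sketch below uses the standard library so that it runs anywhere; in practice you would more likely hold the embeddings in a NumPy array and use a matrix product, but the idea is the same: score every document and keep the best `k`, with no approximate index and therefore no missed results. The function names and toy vectors are illustrative.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query: Sequence[float], corpus: Sequence[Sequence[float]],
                k: int = 3) -> List[Tuple[float, int]]:
    """Brute-force search: score every vector, return the k best as (score, index)."""
    scored = ((cosine(query, vec), idx) for idx, vec in enumerate(corpus))
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for score, idx in exact_top_k([1.0, 0.1], corpus, k=2):
        print(idx, round(score, 3))
```

The cost grows linearly with the number of documents, which is why this approach stops being attractive somewhere between the two corpus sizes mentioned above.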

It's worth remembering that Google served internet scale search long before vector DBs and embedding models existed and apparently does not use either in its main search today.




# 2. Agreements

What does Accenture talk about in the report that represents a significant revenue opportunity for them?

- The importance of agents
- LLMs change how we interact with information in a significant and disruptive way - page 17
- Data security and access is a huge risk of LLM usage
- Golden questions and correct answers

### The importance of agents

### Interaction with data

### The risks of data leakage

### Golden Questions

Can we do SEO through dataset poisoning?


# 3. Where can Accenture add value that they didn't mention?

Accenture also omitted some ideas that I think offer significant opportunity for them.

- Model evaluation
- Synthetic Data processes
- Should companies publish lots of their own data?
- Code/knowledge documentation practices for LLMs
- Extracting data and expertise from existing employees
- Collect high quality feedback without asking for it


### Model evaluation

When LLMs are deployed, companies will want to know how they are performing. Evaluation is a hard problem as many problems don't have a clear answer. However, Accenture has the scale to build great evaluations. 

There are a couple of ways Accenture can help build internal evaluation systems:
- Identify tasks with concrete correct outcomes and measure model success in achieving those outcomes
- Ask the model to generate solutions for historic tasks where an employee already generated a solution
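
Both approaches reduce to the same basic harness: a set of tasks, a reference outcome for each (either a concrete correct answer or the solution an employee produced historically), and a scoring rule. The sketch below is a minimal version under my own assumptions: `model` is any hypothetical callable from prompt to response, the task format is invented for illustration, and scoring is a simple normalized exact match. Real enterprise tasks would usually need a richer scoring rule than string equality.

```python
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(text.lower().split())


def exact_match_accuracy(model: Callable[[str], str],
                         tasks: List[Dict[str, str]]) -> float:
    """Fraction of tasks where the model output matches the reference answer."""
    if not tasks:
        raise ValueError("tasks must not be empty")
    correct = sum(
        normalize(model(task["prompt"])) == normalize(task["reference"])
        for task in tasks
    )
    return correct / len(tasks)


if __name__ == "__main__":
    # Hypothetical tasks: one with a concrete answer, one from a historic employee solution.
    tasks = [
        {"prompt": "What is the VAT rate on invoice 17?", "reference": "20%"},
        {"prompt": "Which team owns the billing service?", "reference": "Payments"},
    ]
    canned = {tasks[0]["prompt"]: " 20% ", tasks[1]["prompt"]: "Platform"}
    print(exact_match_accuracy(lambda prompt: canned[prompt], tasks))
```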

Public evaluation can be used to generate publicity and a strong reputation for Accenture in the LLM community.

Accenture has an unparalleled understanding of the real business problems and real software problems that businesses face, making it uniquely capable of building practical enterprise evaluation sets for LLMs. Accenture evaluations should be appearing in every model release.

### Synthetic data process


### Should companies generate lots of their own data on the internet?

One of the biggest recommendations I have for a startup is to generate lots of synthetic data representing good usage of their product. Not for humans, but to be scraped for LLM training data. You want the latest models to know how to use your product/system well! It's also worth considering whether you should organize your data in a way that makes it easy for browser tools to find.




### Information documentation practices for LLMs

Companies should build RAG ready versions of their information for clients, of their code documentation etc. This allows far more utility for LLMs when interacting with their products. 
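
What "RAG ready" means will vary by company, but one plausible minimal form is documentation pre-split into self-contained chunks with enough metadata to cite the source. The sketch below assumes markdown documentation and splits it on headings; the record fields (`source`, `heading`, `text`) and the function name are my own choices rather than any standard, and the splitter is naive (for example, a `#` comment inside a fenced code block would be treated as a heading).

```python
import json
import re
from typing import Dict, List


def chunk_markdown(doc: str, source: str) -> List[Dict[str, str]]:
    """Split a markdown document into one record per heading-delimited section."""
    chunks: List[Dict[str, str]] = []
    heading = ""
    body: List[str] = []

    def flush() -> None:
        # Emit the section collected so far, skipping empty ones.
        text = "\n".join(body).strip()
        if text:
            chunks.append({"source": source, "heading": heading, "text": text})

    for line in doc.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)", line)
        if match:
            flush()
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    doc = "# Install\nRun the installer.\n\n## Usage\nCall the client with an API key.\n"
    for record in chunk_markdown(doc, source="docs/quickstart.md"):
        print(json.dumps(record))
```

Publishing records like these alongside the human-readable docs lets a client load them straight into whatever retrieval setup they use.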

### Extracting data and expertise from existing employees

On page 40, Accenture briefly mention human talent extracting their skills and knowledge to help agent performance but really the surrounding text talks about data that is already documented - they give examples of Morningstar and Morgan Stanley using existing documentation. Companies should be going beyond and more extensively documenting their processes and expertise.

In my view, one of the best ways for Accenture to win in AI is to systematically document the expertise, solutions and actions of its employees - which becomes training data for models that Accenture can use. Same goes for their clients - companies who systematically extract high quality data from their employees will be able to build better models in the future. An extant example of this from a slightly different angle is Adept, which teaches models how to interact with software by tracking how actual employees at their clients do it.

### Feedback without solicitation




# 4. The vision

So what should Accenture's plan for generative AI be then?  They obviously can't train models better, or build better infrastructure, but they can differentiate in a couple of key areas where scale and client access really matter.

- Evaluation for business
- Inference
- Data for business

## Evaluation

The biggest missed opportunity for Accenture to be involved in the Generative AI zeitgeist is model evaluation. They have the scale and access to build great benchmarks for model performance in business. Their brand could be all over every new model release if they built great evals!

## Inference

Accenture is well positioned to build a deep understanding of how models can be prompted and tooled to achieve specific business outcomes. They should be consistently publishing best practices and guides in this area. If they do $500mln in AI projects a quarter they should know what works and what doesn't. They should be marketing this expertise aggressively. The enterprise world is struggling today to understand how to get the best out of models - and Accenture should already know. 

## Data for business

Accenture should build datasets that are uniquely valuable for business. They work across so many industries and problems that their project work should be a useful source of model training and RL data. They are 500x the size of OpenAI. They have vast scale and expertise to leverage for model improvement.


