Description
What happened?
Token tracking is not working as expected when using a streamified module with Amazon Bedrock. completion_tokens is populated as expected, but all other fields have a value of 0. The one exception is total_tokens, but that just ends up equal to completion_tokens since all the other fields are 0.
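For illustration, the usage dict returned by get_lm_usage() has roughly this shape (N stands for the completion token count, which varies per run; nested detail fields elided):

{'bedrock/us.amazon.nova-lite-v1:0': {'prompt_tokens': 0, 'completion_tokens': N, 'total_tokens': N, ...}}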
I have spent a lot of time trying to track down this issue. I've stepped through the LiteLLM code in the debugger and can see that all usage fields are being set, but at some point in the chunk iterator they seem to get stripped. The LiteLLM code is too dense and indirect for me to follow exactly where that happens.
The interesting thing is that while stepping through the LiteLLM code, all of the usage fields looked right: total_tokens was the sum of prompt and completion tokens, which tells me it's getting re-computed at some point.
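For reference, a direct LiteLLM streaming call can help confirm whether usage survives at that layer before DSPy touches it. A minimal diagnostic sketch, assuming the Bedrock provider honors stream_options={"include_usage": True} and attaches usage to the final chunk:

import litellm

response = litellm.completion(
    model="bedrock/us.amazon.nova-lite-v1:0",
    messages=[{"role": "user", "content": "What is the capital of Maine?"}],
    stream=True,
    stream_options={"include_usage": True},  # ask for usage on the stream itself
)
for chunk in response:
    # Usage, when reported, is typically attached only to the final chunk
    usage = getattr(chunk, "usage", None)
    if usage:
        print(usage)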
To Recap
Token usage is getting set properly inside LiteLLM, but somewhere between LiteLLM and DSPy every usage field except completion_tokens is zeroed out.
I have tried updating to the newest version of LiteLLM and that didn't make a difference.
LiteLLM versions tested: 1.71.1, 1.72.6.post1
Steps to reproduce
I have tried many different parameters for the streamify call. The example below uses async_streaming=False because the results are the same either way, and the example is less cumbersome without async streaming.
Here is a minimal example that will reproduce the error:
import os
import dspy
# If no AWS profile is configured you can use access key and secret key instead
os.environ["AWS_PROFILE"] = "YOUR_AWS_PROFILE"
# os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_AWS_ACCESS_KEY_ID"
# os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_AWS_SECRET_ACCESS_KEY"
# os.environ["AWS_SESSION_TOKEN"] = "YOUR_AWS_SESSION_TOKEN" <-- Optional
os.environ["AWS_REGION"] = "us-east-1"
class QAPredictor(dspy.Signature):
    """Answer questions with helpful, accurate responses."""

    question = dspy.InputField()
    answer = dspy.OutputField()
predictor = dspy.Predict(QAPredictor)
lm = dspy.LM("bedrock/us.amazon.nova-lite-v1:0")
dspy.configure(lm=lm, track_usage=True)
streaming_predictor = dspy.streamify(
    predictor,
    async_streaming=False,  # <--- Doesn't matter if this is True or False for token tracking
    include_final_prediction_in_output_stream=True,
)
def generate_response(question):
    for chunk in streaming_predictor(question=question):
        if isinstance(chunk, dspy.Prediction):
            usage = chunk.get_lm_usage()
            print(usage)
if __name__ == "__main__":
    generate_response(question="What is the capital of Maine?")
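For comparison, calling the same predictor without streamify shows whether full usage is reported on the non-streaming path. A quick sketch reusing the predictor and LM configured above; if prompt_tokens is non-zero here, the stripping is specific to streaming:

def generate_response_non_streaming(question):
    # Same module, no streamify: usage should be fully populated if the
    # bug is confined to the streaming chunk iterator
    prediction = predictor(question=question)
    print(prediction.get_lm_usage())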
DSPy version
3.0.0b1, also tested with 2.6.27