You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I've been running a Claude API relay called Feiyuan API (feiyuanapi.com) for a few months and wanted to share some cost analysis that might be useful for CherryStudio users.
What makes it different from typical reverse-proxy relays:
1. Built on Anthropic's official paid API (no reverse-proxy / no jailbreak)
Feiyuan uses a paid Anthropic commercial account and calls the official API directly. No risk of the "401 account banned mid-workflow" issue that has been hitting many relay services.
2. Prompt/response content not stored
The backend only logs: model name, token count, timestamp. Prompt content and responses are not written to any database.
3. Native Prompt Caching pass-through
The relay passes through Anthropic's cache_control headers without modification, so if you use a fixed system prompt, caching works exactly as documented.
Cost projections(based on Anthropic's published cache pricing — cache reads billed at 10% of input token cost; these are projections, not from a controlled benchmark):
Scenario
Without Cache
With Cache
Est. Saving
5K-token system prompt, 100 calls/day
~500K tokens/month
~50K tokens/month
~89% on cached input
The ~89% figure applies to the cached portion of input tokens only. Actual results depend on your cache hit rate.
Integration with CherryStudio:
Settings → Model Provider → Add OpenAI-compatible provider:
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Hi all,
I've been running a Claude API relay called Feiyuan API (feiyuanapi.com) for a few months and wanted to share some cost analysis that might be useful for CherryStudio users.
What makes it different from typical reverse-proxy relays:
1. Built on Anthropic's official paid API (no reverse-proxy / no jailbreak)
Feiyuan uses a paid Anthropic commercial account and calls the official API directly. No risk of the "401 account banned mid-workflow" issue that has been hitting many relay services.
2. Prompt/response content not stored
The backend only logs: model name, token count, timestamp. Prompt content and responses are not written to any database.
3. Native Prompt Caching pass-through
The relay passes through Anthropic's
cache_controlheaders without modification, so if you use a fixed system prompt, caching works exactly as documented.Cost projections (based on Anthropic's published cache pricing — cache reads billed at 10% of input token cost; these are projections, not from a controlled benchmark):
The ~89% figure applies to the cached portion of input tokens only. Actual results depend on your cache hit rate.
Integration with CherryStudio:
Settings → Model Provider → Add OpenAI-compatible provider:
https://feiyuanapi.com/v1claude-sonnet-4-6/claude-opus-4-7/deepseek-chat/qwen3Models available: Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 + DeepSeek-V3/R1, Qwen3, Kimi
Docs: https://feiyuanapi.com/docs/?utm_source=github&utm_medium=discussion&utm_campaign=feiyuan&utm_content=cherry-studio
Happy to answer questions about caching setup.
Telegram group: https://t.me/feiyuanapi_group
Beta Was this translation helpful? Give feedback.
All reactions