In [1]:
import litellm
import agentops

## Science experiment: Non-stochastic Parrot
I get frustrated when I ask an LLM to repeat text back *exactly* as given and it doesn't. Why is this always such a problem?

This is just a fun experiment to build a better mental model of how much I can trust these things. Here are the categories of datasets I want to try:

* Sequences of numbers
* Book passages
* Foreign language passages
* Random-ish copy+pastes
* Code

All included with and without typos!
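
One way to produce the "with typos" variants, as a minimal sketch: swap adjacent characters at a small random rate. `add_typos`, its `rate` and `seed` parameters, and the swap strategy are all assumptions for illustration, not something the notebook defines.

```python
import random


def add_typos(text: str, rate: float = 0.02, seed: int = 0) -> str:
    """Hypothetical helper: transpose adjacent characters at roughly `rate` of positions."""
    rng = random.Random(seed)  # local RNG so the corruption is reproducible
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if rng.random() < rate:
            # Swap this character with its neighbour to mimic a transposition typo.
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)
```

Transpositions keep the length and the character set unchanged, so any extra or missing characters in a model's echo can be blamed on the model rather than on the corruption step.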

# Find/generate data

In [2]:
# Random number sequence: 1000 integers in [-9999, 9999], concatenated with no separator
import random
rng_string = ''.join(str(random.randint(-9999, 9999)) for _ in range(1000))

In [3]:
rng_string[:50]

'67772958-616-94919626-232376845722690916-3832-3253'
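
Since the point of the experiment is *exact* repetition, it helps to have a way to score an echoed string against the original. The sketch below is an assumption about how that comparison could be done, using only the standard library; `parrot_score` and the keys it returns are hypothetical names, and `echoed` stands for whatever text the model sends back.

```python
import difflib


def parrot_score(original: str, echoed: str) -> dict:
    """Hypothetical helper: compare a model's echo against the original text."""
    # autojunk=False stops difflib from treating frequent characters as junk,
    # which matters for long, repetitive inputs such as digit strings.
    matcher = difflib.SequenceMatcher(None, original, echoed, autojunk=False)

    # Index of the first differing character, or None if the strings are identical.
    first_diff = next(
        (i for i, (a, b) in enumerate(zip(original, echoed)) if a != b),
        None,
    )
    if first_diff is None and len(original) != len(echoed):
        # One string is a strict prefix of the other.
        first_diff = min(len(original), len(echoed))

    return {
        "exact": original == echoed,
        "ratio": matcher.ratio(),
        "first_diff": first_diff,
    }
```

`exact` answers the yes/no question, while `ratio` and `first_diff` show how close a failed echo came and where it first diverged. `SequenceMatcher` can be slow on very long strings, so the per-character scan may be enough for a first pass.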

In [4]:
# Book passage
passage_string = """To avoid fixating on pen-based solutions, we could rephrase the
problem space as: “a writing instrument that works in zero gravity.”
That would allow for a pencil as a solution. But that’s still anchored
on “a writing instrument” solution. We can do even better than that:
“a way to record notes in zero gravity for later reference that is easy
touse.”Thatproblemspacestatementwouldallowformorecreative
solutions such as voice recording with playback. In fact, considering
out-of-the-boxsolutionideascanhelpyourefineyourproblemspace
definition, even if they aren’t feasible. In this case, a voice recorder
would probably not be as good a solution as a Space Pen. It would
needapowersourceandwouldrequireplaybacktorefertothenotes
again, which would be less convenient than being able to scan and
read them. But undergoing this thought exercise would allow us to
furtherrefineourproblemspacedefinitionto:“awaytorecordnotes
in zero gravity for convenient reference later on that is easy to use, is
inexpensive, and does not require an external power source.”
Ialwaysliketoclarifythatthisexampleisbynomeansanattempt
to make fun of NASA. I tell the story a certain way to highlight the
point I want to make. Indeed, the conclusion that NASA came to
turned out to be the best one. There are good reasons not to use
pencils in space: the lead tips can break off and float into an astro-
naut’seyeorcauseashortinanelectricalconnection.Afterthetragic
Problem Space versus Solution Space 15
Apollo 1 fire in 1967, NASA required all objects in the cabin to
be nonflammable, including the writing instruments. So the Space
Penactuallywas ausefulinnovation,whichtheRussianspaceagency
also adopted.
When I mention the space pen in my talks, there is often someone
who claims that the story is an urban legend. However, it isn’t,
as NASA explains at http://history.nasa.gov/spacepen.html, and the
Fisher Space Pen Company confirms at http://fisherspacepen.com/
pages/company-overview. The key point of debate usually is, who
spent the money on research and development: NASA or Fisher?
Fisher did, as I pointed out above.
PROBLEMS DEFINE MARKETS
Early in my product career, Intuit’s founder Scott Cook helped
me solidify the concept of problem space versus solution space
when I heard him talk about TurboTax. Speaking to a group of
product managers, Scott asked us, “Who is TurboTax’s biggest
competitor?” Multiple hands shot up. At the time, the other major
tax preparation software in the market was TaxCut by H&R Block.
After someone confidently answered, “TaxCut,” Scott surprised us
all by saying that the biggest competitor to TurboTax was actually
pen and paper. He pointed out that, at the time, more Americans
were still preparing their taxes by hand using IRS forms than all tax
software combined.
This example highlights another advantage of clear problem space
thinking: having a more accurate understanding of the market in
which your product is really competing. Those of us in the audience
were narrowly thinking in solution space of the “tax preparation
software” market, as defined by the two main software products.
Scott was thinking in problem space of the broader “tax prepa-
ration” market—one that would also include tax accountants to
whom customers delegate their tax preparation. As the previous
chapter discusses, a market is a set of related customer needs, which
rests squarely in problem space. A market is not tied to any specific
solutions that meet those needs. That is why you see “market
disruptions”: when a new type of product (solution space) better
meets the market needs (problem space). New technology can often
16 The Lean Product Playbook
enable a market disruption to deliver similar benefits at a much
lower cost. Voice-over-Internet-Protocol (VOIP) is a great example
of a disruptive technology that has replaced traditional telephone
service. At first, the sound quality of VOIP calls couldn’t compare to
that of traditional phone lines, but the cost was so much lower that
it offered a superior solution for much of the telephone market.
THE WHAT AND THE HOW
As a product manager at Intuit, I learned to write detailed product
requirements that stayed in the problem space without getting into
the solution space. We were trained to first focus on “what” the
product needed to accomplish for customers before getting into
“how” the product would accomplish it. You often hear strong
product teams distinguishing between the “what” versus the “how.”
The “what” describes the benefits that the product should give the
customer—what the product will accomplish for the user or allow
the user to accomplish. The “how” is the way in which the product
delivers the “what” to the customer. The “how” is the design of the
product and the specific technology used to implement the product.
“What” is problem space and “how” is solution space.
OUTSIDE-IN PRODUCT DEVELOPMENT
A failure to gain a clear understanding of the problem space before
proceedingto the solution space is prevalentin companiesand teams
that practice “inside-out” product development, where “inside”
refers to the company and “outside” refers to customers and the
market. In such teams, the genesis of product ideas is what one or
more employees think would be good to build. They don’t test the
ideas with customers to verify if the product would solve actual
customer needs. The best way to mitigate the risk of an “inside-out”
mindset is to ensure your team is talking with customers. That’s
why Steve Blank urges product teams to “get out of the building”
(GOOB for short).
In contrast, “outside-in” product development starts with an
understanding of the customer’s problem space. By talking with
Problem Space versus Solution Space 17
customers to understand their needs, as well as what they like and
don’t like about existing solutions, outside-in product teams can
form a robust problem-space definition before starting product
design.Leanproductteamsarticulatethehypothesestheyhavemade
and solicit customer feedback on early design ideas to test those
hypotheses. This approach is the essence of Lean—and was actually
firstadvocatedforyearsagobypractitionersofuser-centereddesign.
SHOULD YOU LISTEN TO CUSTOMERS?
Somepeoplecriticizeuser-centereddesignbysayingthattalkingwith
userswillnotleadyoutocomeupwithnew,breakthroughsolutions.
Those critics like to quote Henry Ford, who famously said: “If I had
askedpeoplewhattheywanted,theywouldhavesaidafasterhorse.”
They also like to point out the example of Steve Jobs and how Apple
haslaunchedmanysuccessfulproductsusingwhatseemstobeavery
“inside-out” product development process. In fact, Steve Jobs cited
the same Henry Ford quote in a 2008 interview with Forbes.
It is true that customers are not likely to identify the next break-
through solution in your product category. But why would anyone
expect them to? They are not product designers, product managers,
or technologists. The fallacious thinking comes in when people use
this argument to rationalize why it’s not important to talk with cus-
tomers or to understand their needs and preferences. Most people
who make that argument are really using it as an excuse to not talk
with customers because they want to adopt an “inside-out” philoso-
phy. They think that they have all the answers and that talking with
customers is a waste of time. They don’t understand problem space
versus solution space.
It’s likely true that customers won’t invent a breakthrough prod-
uct for you; but that doesn’t mean it’s a waste of time to understand
their needs and preferences. On the contrary, a good understanding
of customer needs and preferences helps product teams explore new
potential solutions and estimate how valuable customers are likely
to find each one to be.
Critics of user-centered design like to justify their views by
saying, “Apple doesn’t talk to customers.” At Apple’s 1997
Worldwide Developers Conference, Steve Jobs shared a more
18 The Lean Product Playbook
enlightened perspective that is consistent with the Lean Product
Process when he said:
You’vegottostartwiththecustomerexperienceandwork
backwards to the technology. You can’t start with the
technology and try to figure out where you’re going to try
to sell it.... As we have tried to come up with a strategy
and a vision for Apple, it started with: What incredible
benefits can we give to the customer?. . . Not starting
with: Let’ssitdownwiththeengineersandfigureout what
awesome technology we have and then how we’re going
to market that. And I think that’s the right path to take.
A TALE OF TWO APPLE FEATURES
Even though Apple does indeed have a reputation for not solicit-
ing customer feedback on products before they’re launched, a large
part of why their products are so successful is because, despite that,
theyhaveanin-depthunderstandingofcustomerneeds.Considerthe
TouchIDfingerprintsensorthatAppleintroducedwiththeiPhone5S.
Touch ID utilizes advanced technology: the high-resolution sensor is
only 170 microns thick and captures 500 dots per inch. The button
is made of sapphire crystal—one of the clearest, hardest materials
available—toprotectthesensor.Thebuttonalsoactsasalenstopre-
ciselyfocusthesensorontheuser’sfinger.TouchIDmapsoutindivid-
ualdetailsintheridgesoffingerprintsthataresmallerthanthehuman
eyecanseeandcanrecognizemultiplefingerprintsinanyorientation.
It’s unlikely that any iPhone customer would have come up with
suchasolution.IwouldguessthatAppledidn’ttestthesolutionwith
many customers before launching it. Despite that, I argue that the
iPhone team had a good understanding of the problem space and
couldbeconfidentthatcustomerswouldconsiderTouchIDvaluable.
TouchIDofferedanewalternativetothetraditionalwayofunlocking
your iPhone and logging in to the App Store to make a purchase.
Touch ID is better because what matters to customers when they’re
authenticatingishowconvenientandhowsecureitis.Usually,thereis
atensionbetweenthosetwocustomerbenefits,withmoreconvenient
authentication mechanisms being less secure (and vice versa).
Problem Space versus Solution Space 19
Most iPhone users will tell you that they unlock their phones quite
frequently, often multiple times per day. Because people value their
time, reducing the time it takes to unlock is a clear benefit. iPhone
users value security, too. They don’t want unauthorized people to be
able to access their phone, especially if it is lost or stolen. With a
four-digit passcode, the odds of someone guessing your passcode are
1 in 10,000. According to Apple, the odds that two fingerprints are
similarenoughforTouchIDtoconsiderthemthesameis1in50,000
(and it’s much harder to try different fingers than it is to type in
different numbers).
Touch ID makes authenticating much quicker than having to enter
an unlock passcode or App Store password. It’s also more conve-
nient because users no longer have to worry about forgetting these
passcodes.
Because Touch ID clearly saves time, is more convenient, and is
more secure than the previous solution, the iPhone team could be
confident that customers would consider the feature valuable, even
without explicitly validating it with them. However, if Apple didn’t
test Touch ID with customers, it still ran the risk of some unforeseen
negative consequence. It’s worth pointing out that Apple does test
their products internally with their employees (who are often a good
proxy for customers). This internal testing tactic where you use your
own product is called “dogfooding.”
That being said, Apple isn’t perfect. For example, customers were
not happy with a product “improvement” that Apple made with the
power button on the 2013 MacBook Pro. In the prior version of the
laptop, the power button was located away from the keyboard keys,
was smaller, had a different color, and was inset, all of which made
it difficult to press by accident. When users pressed the button in the
prior version, a dialog window would appear, providing options to
restart, sleep, or shut down their laptop, along with the option to
cancel any action. But Apple decided to change the power button
designforthe2013version:theymadeitlookliketheotherkeysand
incorporated it into the keyboard (in the upper right, where the eject
key used to be). The new power button was placed right next to the
“delete”key as well as the key that increasesthe sound volume, both
of which are used frequently. As a result, users started accidentally
pressing the power button (and then had to click the cancel button).
20 The Lean Product Playbook
To add insult to injury, Apple’s subsequent operating system
update—OSX Mavericks—changed the behavior of the power
button. When the power button is pressed in Mavericks, you no
longer get the dialog window with its various choices; instead
your computer goes right to sleep. The combined effect of those
two changes (moving the power button and changing its behavior)
resulted in frustrated users whose laptops would suddenly go to
sleep unexpectedly. Usability issues such as this are easy to identify
through customer testing—even with a small number of testers.
Let’s compare these two Apple examples. In the case of the
Touch ID, there were clear benefits and no unforeseen risks arose.
In the case of the power button changes, what were the intended
customer benefits? It’s unclear what they were. Perhaps the new
power button design addressed internal company objectives related
to aesthetics or reduced cost. Regardless, the button’s new design
and behavior resulted in dissatisfaction for customers. It’s true that
customers aren’t going to lead you to the Promised Land of a break-
throughinnovativeproduct,butcustomerfeedbackislikeaflashlight
inthenight:itkeepsyoufromfallingoffacliffasyoutrytofindyour
way there.
USING THE SOLUTION SPACE TO DISCOVER THE PROBLEM SPACE
Customers are also not likely to serve you their problem space needs
on a silver platter. It’s hard for them to talk about abstract benefits
and the relative importance of each—and when they do, it’s often
fraught with inaccuracies. It’s therefore the product team’s job to
unearth these needs and define the problem space. One way is to
interview customers and observe them using existing products. Such
techniques are called “contextual inquiry” or “customer discovery.”
You can observe what pain points they run into even if they don’t
explicitlymentionthemtoyou.Youcanaskthemwhattheylikeand
don’tlikeaboutthecurrentsolutions.Asyouformhypothesesabout
the customer needs and their relative importance, you can validate
and improve your hypotheses using these techniques.
Therealityisthatcustomersaremuchbetteratgivingyoufeedback
in the solution space. If you show them a new product or design,
they can tell you what they like and don’t like. They can compare
it to other solutions and identify pros and cons. Having solution
Problem Space versus Solution Space 21
space discussions with customers is much more fruitful than trying
to explicitly discuss the problem space with them. The feedback you
gather in the solution space actually helps you test and improve your
problem space hypotheses. The best problem space learning often
comes from feedback you receive from customers on the solution
space artifacts you have created.
Problem space and solution space are an integral part of the
Product-Market Fit Pyramid, as shown in Figure 2.1. Your product’s
feature set and UX live in solution space—they’re what customers
can see and react to. The other three layers of the pyramid live in
problem space. The important interface between problem space
and solution space occurs between your value proposition and your
featureset.Itis,ofcourse,withinyourcontroltochangeyourfeature
set and UX as you like. Unlike customers and their needs, which you
can target but can’t change, value proposition is the problem space
layer over which you have the most control.
As Dave McClure of 500 Startups said, “Customers don’t care
aboutyoursolution.Theycareabouttheirproblems.”Keepingprob-
lem space and solution space separate and alternating between them
as you iteratively test and improve your hypotheses is the best way
to achieve product-market fit. The Lean Product Process gives you
step-by-stepguidanceonhowtodothat.Let’sjumpintothefirststep
of the process: identifying your target customer."""

In [5]:
passage_string[:100]

'To avoid fixating on pen-based solutions, we could rephrase the\nproblem space as: “a writing instrum'

In [6]:
# https://www.lemonde.fr/international/article/2024/11/02/election-presidentielle-americaine-2024-pour-les-democrates-la-defense-de-l-avortement-est-une-arme-a-double-tranchant_6372755_3210.html
french_article = """Vous pouvez partager un article en cliquant sur les icônes de partage en haut à droite de celui-ci. 
La reproduction totale ou partielle d’un article, sans l’autorisation écrite et préalable du Monde, est strictement interdite. 
Pour plus d’informations, consultez nos conditions générales de vente. 
Pour toute demande d’autorisation, contactez syndication@lemonde.fr. 
En tant qu’abonné, vous pouvez offrir jusqu’à cinq articles par mois à l’un de vos proches grâce à la fonctionnalité « Offrir un article ». 

https://www.lemonde.fr/international/article/2024/11/02/election-presidentielle-americaine-2024-pour-les-democrates-la-defense-de-l-avortement-est-une-arme-a-double-tranchant_6372755_3210.html

Avec des référendums sur le sujet dans dix Etats américains, le scrutin du 5 novembre pourrait se révéler comme un plébiscite pour la défense de l’avortement aux Etats-Unis. Il n’est pas sûr, en revanche, que la mobilisation autour d’un droit que les femmes américaines croyaient intouchable profite autant aux démocrates qu’ils l’espéraient. Leur candidate, Kamala Harris, qui a fait de la défense de la « liberté de choisir » l’un des principaux axes de sa candidature, pourrait même indirectement pâtir du clivage que la campagne sur l’avortement a accentué.
Dans ces dix Etats (Arizona, Colorado, Dakota du Sud, Floride, Maryland, Missouri, Montana, Nebraska, Nevada et New York), les électeurs sont saisis, en même temps que du choix du président et des membres du Congrès, d’amendements constitutionnels visant à élargir ou à protéger l’accès à l’avortement. Deux font partie des Etats-clés pour l’élection présidentielle : l’Arizona et le Nevada. Deux autres ont une importance cruciale pour le contrôle du Sénat : le Montana et la Floride.
Les questions sont diversement formulées. Dans les Etats où l’avortement est légal au-delà de la quinzième semaine (Colorado, Nevada, Maryland et Montana), il s’agit d’en garantir l’accès ou de l’élargir, en autorisant, par exemple (Colorado), l’utilisation de fonds publics pour le remboursement des interruptions volontaires de grossesse (IVG). Dans les autres (Arizona, Missouri, Dakota du Sud et Floride), c’est la levée des restrictions actuelles qui est en jeu, avec des conséquences concrètes pour des millions de femmes."""

In [7]:
# https://www.spiegel.de/deinspiegel/so-arbeitet-ein-hufschmied-haemmern-schmieden-raspeln-schleifen-a-8fbe3967-a855-4224-be03-67a9f8648f08
german_article = """Mehrere Zehntausend Pferdehufe hat Hannes seitdem beschlagen. Alle sechs Wochen sollten die Tiere zur »Pferde-Fußpflege«. Wenn die Hauspferdehufe nicht gekürzt und zugeschnitten werden, kommt es leicht zu Fehlstellungen des Fußes, Sehnen am Pferdebein können sich entzünden, das Tier hat Schmerzen und läuft nicht mehr rund.

 Hufschmied Hannes Heuschneider mit Pferdebesitzer André Kühn auf einem Gestüt in Landshut

Bild vergrößern
Hufschmied Hannes Heuschneider mit Pferdebesitzer André Kühn auf einem Gestüt in Landshut Foto: Frank Bauer / Dein SPIEGEL
Hufschmied ist ein harter Job. Es kommt schon mal vor, dass ein Pferd nervös reagiert, sich plötzlich bewegt oder mit den Beinen ausschlägt. Immer wieder mal steigt ein 700-Kilogramm-Pferd auf den Fuß des Schmieds – deshalb trägt Hannes Schuhe mit Metallkappen. »Ich muss schnell reagieren, um auszuweichen«, sagt er. Trotzdem: »Gerissene Daumensehnen, gebrochene Rippen, ausgekugelte Gelenke, geklemmte Finger: In meinem Beruf ist man wohl ähnlich oft verletzt wie in dem des Eishockey- oder Football-Spielers.«

Um den Nutztieren Schmerzen zu ersparen und sie einsatzfähiger zu machen, versuchen die Menschen seit mehreren Tausend Jahren, die Hufe zu schützen. Im alten Griechenland und im Römischen Reich flocht man eine Art Socken aus Gräsern, Bast, Ginster oder Binsen um die Hufe. Man versuchte es mit Lappen, später mit »Hippo-Sandalen«. Das waren Eisenschuhe, in die die Pferdefüße gesteckt wurden. Vermutlich erfand dann das asiatische Reitervolk der Skythen die ersten Hufbeschläge. Sie brachten Eisen mit Nägeln an den Hufen an. Bis vor etwa 100 Jahren stellten Hufschmiede noch ihre eigenen Hufeisen her. Heute kauft man sie beim Fachhändler oder im Internet in vielen Größen und Materialien."""

## Chinese
https://www.w3newspapers.com/chinese/


In [8]:
chinese_article = """　　云南红河哈尼梯田俯瞰。
　　新华社记者 张 驰摄
　　山东曲阜孔子博物馆的明代竹雕摆件“万象回春”，以象为乘、寄托吉祥；
　　陕西历史博物馆的文物彩绘雁鱼铜灯，是西汉的“黑科技”环保灯；
　　北京中轴线参照天地宇宙秩序而建，体现出人对自然的认识和思考；
　　…………
　　与文物对话，我们不难发现，顺天造物、和谐共生、天人合一，这些中华优秀传统文化中的生态理念，贯穿其中、赓续延绵，融入一代代人的生态保护实践。
　　顺天造物
　　文物蕴含了人们顺应自然、保护自然、开发自然的理念
　　近两年，云南西双版纳一群亚洲象闯入人们视野，它们或觅食、或嬉戏，一路得到沿途群众的关照，最终安全回到保护区。野生大象在人们的爱护下享受着安全自由的生活，呈现出一派人与自然和谐相处的动人景象。
　　象，在中国传统文化中，有着“祥”的象征意义。东汉班固在《白虎通义·礼乐》中说：“武王曰《象》者，象太平而作乐，示已太平也”。象，成为中华传统文化中重要的动物形象之一。
　　中国社会科学院学部委员、考古研究所研究员冯时说：“智慧的先人驯服大象，借助大象行路、运输，适度为己所用，巧妙地顺应自然界的特征，追求人和自然和谐共处。”
　　山东曲阜孔子博物馆内，明代竹雕“万象回春”图摆件，大象身高腹大，回首卷尾，通体饰回纹。象背上，两人单跪跨步，双手捧抬盆栽万年青，寓意为“万象回春”。首都师范大学历史学院教授王涛说，儒家思想与大象有着千丝万缕的关联。大象诚实敦厚，象征诚信；长鼻竖立，可象征儒家的“中庸之道”；象鼻下垂，可象征儒家提倡的“礼”；象牙虽锋利，却从不杀戮无辜，又可代表“仁”。“所以，大象也被视为君子的象征。”
　　中华文明是世界上最早发展农业的文明之一，诸多文物体现了中国人因地制宜、因势而动、借势而为等理念。
　　走进广东韶关市博物馆，一件东晋时期的水田作业模型引人注目。模型为酱褐釉陶质地，底部是长方形水田，中央纵贯一条田埂，左右两侧各有一人一牛，牛颈部套有绳索，身后二人手持农具，作犁地耙田状。王涛认为，这件陶器展示出的水田农业，顺应了南方地区的湿润气候，是我国劳动人民顺应自然的创造。
　　顺天造物的观念，贯穿中华民族的发展历史，留下了一个个伟大的创造。浙江杭州市民周雪妮喜欢周末带着孩子到苏堤参观：“苏堤不仅可以让孩子领略一年四季不同的景色，还可以了解背后的人文内涵。”
　　始建于北宋元祐五年的苏堤，是苏东坡任杭州知州时疏浚西湖，利用挖出的淤泥葑草堆筑起一条南北走向的堤岸。这条将美学、生态与水利工程紧密结合的堤坝，具有极高的人文和生态价值，穿越时间的长河，成为杭州文化标识之一。
　　京杭大运河江苏扬州河段，著名的“三湾抵一坝”同样是顺应自然的创新。为降低流速，惯用方法多为裁弯取直，扬州当地人却反其道而行之，以三道河湾阻滞水流、提升水位，以保运河安澜。
　　近年来，扬州进一步将生态文明与历史文化相结合，建设扬州运河三湾生态文化公园，古老运河成为市民休闲游玩好去处。
　　和谐共生
　　先人的环保意识与今日的可持续发展思想一脉相承
　　秋天，云南红河哈尼梯田进入观赏季，阳光洒在层层叠叠的梯田上，灌满水的梯田映照着天光云影，美不胜收。
　　这是以哈尼族为主的各族人民利用“一山分四季，十里不同天”“山有多高，水有多高”的特殊地理气候，发挥聪明才智和创造精神开垦的上百万亩农业生态奇观。2013年，它被正式列入《世界遗产名录》。
　　“不同于其他世遗，哈尼梯田是一种新型的文化遗产。”国家文物局相关负责人说，红河哈尼梯田文化景观的历史可追溯到唐代，体现了人文文化和农耕文化的有机结合。1300多年来，哈尼梯田森林、村寨、梯田、水系“四素同构”的农业生态系统延续至今，呈现了人与自然和谐相处的画卷，给恒久发展提供启示。
　　和谐共生、永续发展，倡导人和自然、人和人之间的和谐共存，共同发展。
　　冯时介绍，早在商代，和谐共生的环保思想就已出现。研究甲骨文史料发现，商王田猎后，将所得猎物分类记录，一般的猎物命名为“获”和“擒”。但在用火烧的方法捕猎时，有可能误伤，商人将原本禁止猎取的动物单独列出，如幼兽等，此类动物称为“蔺赤”，并不算入打猎所得，以表禁忌。
　　湖北省博物馆展出的云梦睡虎地秦简，用大量的墨书秦篆记录着公元前的律令历法。其中的《田律》是我国迄今为止发现的最早的环保法律条文。《田律》记载，春二月，禁止“雍（壅）堤水”，即禁止堵塞河道；同时，也有不准采摘刚发芽的植物，不准捕捉幼兽、卵和幼鸟，不准毒杀鱼鳖，放置陷阱、渔网的记载，直到七月才能解除这些禁令。按照节令而有所区别的生态保护规定，体现出“顺时施政”的智慧。
　　片片龟甲、道道简牍，记录着先人的环保意识，与今日的可持续发展思想一脉相承。
　　环保、节能、可持续，也融入了器物设计的理念和工艺，推动了技术进步。距今7000—5000年的仰韶文化，常见一类陶器—小口尖底瓶，此前常被推测是汲水用具。最新的研究显示，很多陶器底部的残留物是以黍、粟等谷物为原料制作的发酵酒。王涛说，小口尖底，有利于减少酒精挥发，保存酒香，尖底则有助于酒液中的杂质沉淀，使酒质更加清澈。这种设计体现了先民对酿酒工艺的深刻理解和精湛工艺。
　　陕西历史博物馆的文物彩绘雁鱼铜灯，“定格”了鸿雁回首衔鱼的瞬间。造型优美、栩栩如生的器型，由雁首颈、雁体、灯盘、灯罩套合而成，雁腹内部装水，灯盘中盛放燃料，灯罩由两爿构成，转动其中一爿，可调节灯光亮度。细长的雁颈和雁首，会将烟雾下沉排入雁腹的水中，起到灭烟除尘的作用。
　　可以说，和谐共生的理念促进技术创新、产业升级，并不断造福于人类。
　　天人合一
　　思考自然之道，处理好天、地、人三者的关系
　　近日，一款天宫藻井文创冰箱贴的畅销，带火了北京古代建筑博物馆。走进北京古代建筑博物馆，高约4米、共6层的天宫藻井华丽夺目。藻井巨大的圆形外围装饰有68座精雕细琢、繁复精美的天宫楼阁建筑。藻井最上方的星空共有1472颗金色星宿，构成一幅星空图。博物馆志愿者、北京八中教师南洋说，藻井上重檐歇山的楼阁和星星似乎在诉说着中国人千年来对自然、对宇宙的追求，体现了中国古人敬畏天、探索天文规律的努力。
　　“天地与我并生，而万物与我为一”，《庄子·齐物论》对“天人合一”思想作出了阐释。冯时说：“天人合一启示我们，应思考自然之道，处理好天、地、人三者的关系。”
　　河南省博物院内，高大庄严的杜岭方鼎，吸引着慕名而来的观众。大鼎双耳、方腹，4个圆柱形空足，器表饰饕餮纹与乳钉纹。鼎，是“天人合一”在器物层面的体现。
　　“中国传统文化奉行‘所祭必象其类’的思想，古人认为天圆地方，天为阳，地为阴，故以圆鼎祭天，以属阳的牛、羊等动物作为牺牲；方鼎祭地，祭祀用属阴的谷物。”冯时说，天人合一的文化思想以器物为载体，在祭祀活动中尤为明显。“祭天以圜丘，礼地以方丘，都是这种观念的反映。”
　　天人合一的理念把人置于自然中，认为人与自然相互依赖，互为一体，体现出中华民族尊崇自然、尊重自然规律、崇尚和谐的精神追求。
　　玉有领璧，收藏于故宫博物院，双面各饰有同心圆形状的三圈弦纹。冯时说，古人用玉璧祭天，圆形的天可以描述为同心圆，表现的是太阳的运动轨迹。而祭地则以方中纳圆的玉琮，出土于安阳殷墟妇好墓的弦纹玉琮，正是这种中心呈圆筒状的方形柱体。《礼记》认为“地载万物，天垂象。取财于地，取法于天”，故内圆外方，意为地载万物、因天立法。受天人合一观念的影响，璧、琮两种礼器的固有形制确定下来并得以不断延续。
　　一条中轴线，贯穿南北、联通古今。今年，“北京中轴线—中国理想都城秩序的杰作”被列入《世界遗产名录》。这条始建于13世纪，形成于16世纪，不断演进发展至今的北京中轴线，是中国传统都城中轴线发展至成熟阶段的杰出范例，至今仍发挥着重要作用。
　　清华大学建筑学院教授吕舟说：“在中国文化传统中，都城位置的选择具有极其重要的意义，它在国土中的位置要符合观念中的中心地位，这也是中华文明‘天人合一’观念的重要组成部分。”可以说，北京中轴线不仅为中华文化传统和精神追求提供了特殊的物质见证，其中蕴藏的“天人合一”理念，也体现出中国人对人与自然和谐发展的认识和思考。
　　“天人合一”不仅体现在祭祀与礼仪，也要求顺应四时变化。冯时认为，一年四季寒暑更替，古人相信，器用须合于春生、夏长、秋收、冬藏的时令变化，才能达成天、地、人三者的和谐。
　　顺天造物、和谐共生、天人合一等理念沉淀在中华优秀传统文化基因中赓续传承。今天，“绿水青山就是金山银山”的生态文明观念已深入人心。爱绿护绿，爱鸟护鸟，从孩子教育开始，就已播下了生态环保的种子。绿色出行、低碳生活、减少排放、循环发展的观念，渗透到经济发展、社会生活的方方面面。古老智慧融入现代化建设，人与自然和谐共生的中国式现代化画卷，徐徐展开。"""

## Random strings
A bunch of snippets from random bookmarks/websites stitched together

In [9]:
random_strings = """ of Visuals.
When it comes to Data, visualization is a key concept. We grasp more by seeing visuals(graphs, charts, etc.) rather than just seeing data in the form of raw text. Visualization Seanabu.com
SEANABU.COM

So what?  Why do we care about stationarity? 

A stationary time series (TS) is simple to predict as we can assume that future statistical properties are the same or proportional to current statistical properties.

Most of the models we use in TSA assume covariance-stationarity (#3 above). This means the descriptive statistics these models predict e.g. means, variances, and correlations, are only reliable if the TS is stationary and invalid otherwise.

Fine Dining - World

When you’re looking for the crème-de-la-crème, these are the places worth getting on the waitlist for. Think: white tablecloths, best-in-class chefs, and unforgettable bites.
The Travelers’ Choice Awards Best of the Best title celebrates the highest level of excellence in travel. It’s awarded to those who receive a high volume of above-and-beyond reviews and opinions from the Tripadvisor community over a 12-month period. Out of our 8 million listings, fewer than 1% achieve this milestone.

A broad area of low pressure is organizing over the western Caribbean Sea.
This low could gradually try to spin up a tropical depression or storm by early week.
The western Caribbean is an area that has historically seen tropical development in November.
Patty also spins near the Azores in the north Atlantic.
Sign up for the Morning Brief email newsletter to get weekday updates from The Weather Channel and our meteorologists.

The season's next tropical depression or storm is likely to form in the western Caribbean in the coming days while Patty spins closer to Europe.

Here's the latest status on the Caribbean area to watch: The National Hurricane Center (NHC) says a broad area of low pressure is forming in the western Caribbean Sea (labeled system No. 1 below).

The system will be watched closely to see if it becomes better defined with concentrated thunderstorm activity. If that happens, a tropical depression or storm will likely form early in the week. For now, it has been tagged as Invest 97L. Invests are used by the National Hurricane Center to run specialized computer models on systems of interest.

T​his system could bring heavy rain to Jamaica, the Cayman Islands, Cuba and possibly Mexico's Yucatan peninsula.

(​MORE: What is an Invest?)

T​he NHC is also watching a trough of low pressure near Puerto Rico and Hispaniola (labeled system No. 2 below) which will bring local flooding rainfall to those areas over the next several days. Its chance of tropical development is low before it combines with the aforementioned Caribbean disturbance. Apple
 
iPhone 16 Pro
Hello, Apple Intelligence.
Learn more Buy
 
iPhone 16
Hello, Apple Intelligence.
Learn more Buy
 
MacBook Pro
A work of smart.
Available starting 11.8
Learn more Pre-order
Hello, Apple Intelligence.
 
Mac mini
Size down. Power up.
Available starting 11.8
Learn more Pre-order
Hello, Apple Intelligence.
 
iMac
Brilllllliant.
Available starting 11.8
Learn more Pre-order
Hello, Apple Intelligence.
 
AirPods Pro 2
Hearing Test, Hearing Aid, and Hearing Protection features in a free software update.1
Learn more Buy
 
Apple Intelligence
AI for the rest of us.
Learn more Watch the film
 
Apple Trade In
Get $180–$650 in credit when you trade in iPhone 12 or higher.2
Get your estimate
 
Apple Card
Get up to 3% Daily Cash back with every purchase.
Learn more Apply now
Apple TV+
See the schedule
Watch Messi, every club, and every match—live.
Stream now
Comedy · Breakdown. Breakthrough.
Stream now
Thriller · Any resemblance to persons living or dead is not a coincidence.
Stream now
Thriller · Emmy® Award winner.
Stream now
Comedy · The shady side of paradise.
Stream now
Action · George Clooney and Brad Pitt are rival fixers stuck on the same job for one wild night.
Stream now
Comedy · Kindness makes a comeback.
Stream now
Drama · Winner of 3 Emmy® Awards.
Stream now
Sci-Fi · The truth will surface.
Stream now
Thriller · We’re all different people at work.

Item 1
Item 2
Item 3
Item 4
Item 5
Item 6
Item 7
Item 8
Item 9
Item 10
FAM Gallery
Watch now
HIIT with Brian
Listen now
A-List Pop
Play now
NBA 2K25 Arcade Edition
Watch now
Run Your First 5K
Listen now
Beneath the Stars
Play now
Hello Kitty Island Adventure
Watch now
HIIT with Brian
Listen now
A-List Pop
Play now
NBA 2K25 Arcade Edition
Watch now
Run Your First 5K
Listen now
Beneath the Stars
Play now
Hello Kitty Island Adventure

Apple Footer
1. Hearing Aid and Hearing Test: The Hearing Aid feature has received FDA authorization. The Hearing Test and Hearing Aid features are supported on AirPods Pro 2 with the latest firmware paired with a compatible iPhone or iPad with iOS 18 or iPadOS 18 and later and are intended for people 18 years old or older. The Hearing Aid feature is also supported on a compatible Mac with macOS Sequoia and later. It is intended for people with perceived mild to moderate hearing loss.

Hearing Protection: The Hearing Protection feature works with AirPods Pro 2 with the latest firmware when paired with a compatible iPhone, iPad, or Mac with iOS 18, iPadOS 18, or macOS Sequoia and later. Feature is only available in the U.S. and Canada. See support.apple.com/120850 for total attenuation and more information. The Hearing Protection feature is not suitable for protection against extremely loud impulse sounds, such as gunfire, fireworks, or jackhammers, or against sustained sounds louder than 110 dBA.
2. Trade‑in values will vary based on the condition, year, and configuration of your eligible trade‑in device. Not all devices are eligible for credit. You must be at least the age of majority to be eligible to trade in for credit or for an Apple Gift Card. Trade‑in value may be applied toward qualifying new device purchase, or added to an Apple Gift Card. Actual value awarded is based on receipt of a qualifying device matching the description provided when estimate was made. Sales tax may be assessed on full value of a new device purchase. In‑store trade‑in requires presentation of a valid photo ID (local law may require saving this information). Offer may not be available in all stores, and may vary between in‑store and online trade‑in. Some stores may have additional requirements. Apple or its trade‑in partners reserve the right to refuse, cancel, or limit quantity of any trade‑in transaction for any reason. More details are available from Apple’s trade-in partner for trade‑in and recycling of eligible devices. Restrictions and limitations may apply.
To access and use all Apple Card features and products available only to Apple Card users, you must add Apple Card to Wallet on an iPhone or iPad that supports and has the latest version of iOS or iPadOS. Apple Card is subject to credit approval, available only for qualifying applicants in the United States, and issued by Goldman Sachs Bank USA, Salt Lake City Branch.
If you reside in the U.S. territories, please call Goldman Sachs at 877-255-5923 with questions about Apple Card.
Learn more about how Apple Card applications are evaluated at support.apple.com/kb/HT209218.
Apple Intelligence is available in beta on all iPhone 16 models, iPhone 15 Pro, iPhone 15 Pro Max, iPad mini (A17 Pro), and iPad and Mac models with M1 and later, with Siri and device language set to U.S. English, as part of an iOS 18, iPadOS 18, and macOS Sequoia update. English (Australia, Canada, Ireland, New Zealand, South Africa, UK) language support available this December. Some features, additional platforms, and support for additional languages, like Chinese, English (India, Singapore), French, German, Italian, Japanese, Korean, Portuguese, Spanish, Vietnamese, and others, will be coming over the course of the next year.
A subscription is required for Apple Arcade, Apple Fitness+, Apple Music, and Apple TV+.
Features are subject to change. Some features, applications, and services may not be available in all regions or all languages.
Shop and Learn

Store
Mac
iPad
iPhone
Watch
Vision
AirPods
TV & Home
AirTag
Accessories
Gift Cards
Apple Wallet

Wallet
Apple Card
Apple Pay
Apple Cash
Account

Manage Your Apple Account
Apple Store Account
iCloud.com
Entertainment

Apple One
Apple TV+
Apple Music
Apple Arcade
Apple Fitness+
Apple News+
Apple Podcasts
Apple Books
App Store
Apple Store

Find a Store
Genius Bar
Today at Apple
Group Reservations
Apple Camp
Apple Store App
Certified Refurbished
Apple Trade In
Financing
Carrier Deals at Apple
Order Status
Shopping Help
For Business

Apple and Business
Shop for Business
For Education

Apple and Education
Shop for K-12
Shop for College
For Healthcare

Apple in Healthcare
Mac in Healthcare
Health on Apple Watch
Health Records on iPhone and iPad
For Government

Shop for Government
Shop for Veterans and Military
Apple Values

Accessibility
Education
Environment
Inclusion and Diversity
Privacy
Racial Equity and Justice
Supply Chain
About Apple

Newsroom
Apple Leadership
Career Opportunities
Investors
Ethics & Compliance
Events
Contact Apple
More ways to shop: Find an Apple Store or other retailer near you. Or call 1-800-MY-APPLE.
United States
Copyright © 2024 Apple Inc. All rights reserved. Privacy Policy  Terms of Use  Sales and Refunds  Legal  Site Map"""

In [10]:
random_strings[:50]

' of Visuals.\nWhen it comes to Data, visualization '

## System prompt

In [11]:
prefix_messages = [{"role": "user", "content": "Repeat the following string back exactly as it is sent to you."}]

In [12]:
agentops.init(default_tags=['Stochastic Parrot'], auto_start_session=False)

In [13]:
litellm.completion(model="claude-3-sonnet-20240229", messages=prefix_messages, temperature=0).choices[0].message.content

🖇 AgentOps: Could not record event. Start a session by calling agentops.start_session().


'Okay, I will repeat the string back exactly as you send it to me.'

In [14]:
result = litellm.completion(model="claude-3-sonnet-20240229",
                    messages=[
                        {
                            "role": "user",
                            "content": [
                                {
                                    "type": "text",
                                    "text": f"""Your task is to repeat back the exact text that is provided to you, without any modifications or additions.\n\nHere is the text you need to repeat:\n<input_text>\n{rng_string}\n</input_text>\n\nInstructions:\n1. Read the text provided above carefully.\n2. Repeat the text exactly as it appears, without changing any words, punctuation, capitalization, or spacing.\n3. Do not add any additional comments, explanations, or text of your own.\n4. If the input text is empty, simply respond with an empty output.\n\nPlease provide your response within <repeated_text> tags."""
                                }
                            ]
                        }
                    ], 
                   temperature=0).choices[0].message.content

🖇 AgentOps: Could not record event. Start a session by calling agentops.start_session().


In [15]:
def generate_text(string_prompt) -> str:
    """Ask the model to repeat string_prompt verbatim and return the text it sent back."""
    result = litellm.completion(model="claude-3-sonnet-20240229",
                    messages=[
                        {
                            "role": "user",
                            "content": [
                                {
                                    "type": "text",
                                    "text": f"""Your task is to repeat back the exact text that is provided to you, without any modifications or additions.\n\nHere is the text you need to repeat:\n<input_text>\n{string_prompt}\n</input_text>\n\nInstructions:\n1. Read the text provided above carefully.\n2. Repeat the text exactly as it appears, without changing any words, punctuation, capitalization, or spacing.\n3. Do not add any additional comments, explanations, or text of your own.\n4. If the input text is empty, simply respond with an empty output.\n\nPlease provide your response within <repeated_text> tags."""
                                }
                            ]
                        }
                    ], 
                   temperature=0).choices[0].message.content
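    # Assumes the model wrapped its answer in <repeated_text> tags on their own lines;
    # raises IndexError if the opening tag (followed by a newline) is missing
    text = result.split('<repeated_text>\n')[1].split('\n</repeated_text>')[0]
    return text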
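
The split-based parsing in `generate_text` depends on the model emitting both tags, each on its own line. A more forgiving parser could look like the sketch below. `extract_repeated_text` is a hypothetical helper, not part of the notebook or of litellm, and the fallback rules (return everything after an unclosed tag, return the stripped response when no tag is present) are assumptions about what is useful here.

```python
OPEN_TAG = "<repeated_text>"
CLOSE_TAG = "</repeated_text>"


def extract_repeated_text(response: str) -> str:
    """Return the text between the repeated_text tags, tolerating missing pieces."""
    start = response.find(OPEN_TAG)
    if start == -1:
        # No opening tag at all: fall back to the whole response (assumption)
        return response.strip()

    body = response[start + len(OPEN_TAG):]

    end = body.find(CLOSE_TAG)
    if end != -1:
        # Closing tag found: keep only what sits between the tags
        body = body[:end]

    # Drop at most one newline of padding on each side, so that newlines
    # belonging to the repeated text itself are preserved
    if body.startswith("\n"):
        body = body[1:]
    if body.endswith("\n"):
        body = body[:-1]
    return body


print(extract_repeated_text("<repeated_text>\nabc\n</repeated_text>"))
print(extract_repeated_text("Sure!\n<repeated_text>abc</repeated_text>"))
```

Falling back silently hides formatting failures, so logging whenever a fallback path is taken may be worthwhile in a real run.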



In [16]:
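def find_similarity(original, generated):
    """Return the Jaccard similarity (0-100) between the sets of distinct characters.

    Order and repetition are ignored, so this only detects characters that were
    dropped or introduced, not ones that were moved or duplicated.
    """
    set1 = set(original)
    set2 = set(generated)

    # Distinct characters that appear in both strings
    intersection = set1.intersection(set2)
    union = set1.union(set2)

    # Two empty strings are identical; this also avoids dividing by zero
    if not union:
        return 100.0

    # Jaccard index: shared distinct characters over all distinct characters
    match_percent = (len(intersection) / len(union)) * 100

    return match_percent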
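
Because `find_similarity` compares sets of distinct characters, a completely reordered string can still score 100. The sketch below illustrates this with a made-up pair of strings (not model output) and contrasts it with `difflib.SequenceMatcher` from the standard library, which is order-aware. `char_set_jaccard` and `sequence_ratio` are illustrative names introduced here, and `sequence_ratio` is only one possible alternative, not the metric the notebook uses.

```python
from difflib import SequenceMatcher


def char_set_jaccard(a: str, b: str) -> float:
    """Same idea as find_similarity: Jaccard index over distinct characters."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 100.0
    return 100.0 * len(set_a & set_b) / len(union)


def sequence_ratio(a: str, b: str) -> float:
    """Order-aware similarity (0-100) based on matching blocks."""
    # autojunk=False disables the popularity heuristic, which can skew long inputs
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


original = "12345 67890"
scrambled = original[::-1]  # same characters, reversed order

print(char_set_jaccard(original, scrambled))  # 100.0, despite the reordering
print(sequence_ratio(original, scrambled))    # well below 100
```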

## Great, now we have a decent similarity ranker
Let's try this on a bunch of different datasets at varying lengths.

In [17]:
from Levenshtein import distance

In [18]:
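def hamming_distance(str1, str2):
    """Count the positions at which the two strings differ.

    Hamming distance is only defined for equal-length strings. On a length
    mismatch this prints a warning and compares only the overlapping prefix
    (zip stops at the shorter string), so the result is not a true distance.
    """
    if len(str1) != len(str2):
        print("Strings are not of equal length")
    return sum(c1 != c2 for c1, c2 in zip(str1, str2))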
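
A single inserted or dropped character shifts everything after it, so the position-by-position count can be huge even when the edit distance is tiny. That is one plausible reading of any run where the two metrics disagree wildly. The sketch below shows the effect on a made-up string. `levenshtein` is a plain-Python stand-in written for this example, so it runs without the `Levenshtein` package, and `hamming_prefix` mirrors the zip-based count above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def hamming_prefix(a: str, b: str) -> int:
    """Mismatches over the overlapping prefix, like hamming_distance above."""
    return sum(x != y for x, y in zip(a, b))


original = "0123456789" * 5                    # 50 characters
shifted = original[:10] + "X" + original[10:]  # one inserted character

print(levenshtein(original, shifted))     # 1
print(hamming_prefix(original, shifted))  # 40: every position after the insert
```

When the lengths differ, the Levenshtein number is the more trustworthy of the two.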


In [19]:
def compare_generated_text(rng_string, n=4):
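    """
    Compares growing prefixes of a string with the model's repetition of each prefix.

    Parameters:
    rng_string (str): The input string; despite the name, any dataset can be passed.
    n (int): The number of prefixes to test, at 1/n, 2/n, ..., n/n of the full length.

    Note: the value printed as "Character count" is the percentage returned by
    find_similarity, not a count.
    """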
    for i in range(1, n + 1):
        first_n_chars = int((i / n) * len(rng_string))
        print(f"Comparing first {first_n_chars} characters")

        original_string = rng_string[:first_n_chars]
        generated = generate_text(original_string)

        char_count = find_similarity(original_string, generated)

        print(f"Character count: {char_count}")
        print(f"Levenshtein: {distance(original_string, generated)}")
        print(f"Hamming distance edits: {hamming_distance(original_string, generated)}")
        print('-' * 100)
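
The metrics printed by `compare_generated_text` say how much changed but not where. One way to inspect the actual edits is `difflib.SequenceMatcher.get_opcodes()` from the standard library, sketched below. `list_edits` is a hypothetical helper, and the example strings are made up rather than taken from a model response.

```python
from difflib import SequenceMatcher


def list_edits(original: str, generated: str):
    """Return (tag, position_in_original, original_piece, generated_piece) tuples."""
    matcher = SequenceMatcher(None, original, generated, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # tag is one of 'equal', 'replace', 'delete', 'insert'
        if tag != "equal":
            edits.append((tag, i1, original[i1:i2], generated[j1:j2]))
    return edits


for edit in list_edits("hello wrld", "hello world"):
    print(edit)  # ('insert', 7, '', 'o')
```

`SequenceMatcher` can be slow on inputs of many thousands of characters, so it may be worth running it only on the segments where the Levenshtein distance is non-zero.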

In [20]:
agentops.start_session(tags=['rng string'])
compare_generated_text(rng_string)
agentops.end_session('Success')

🖇 AgentOps: Session Replay: https://app.agentops.ai/drilldown?session_id=3639e35e-eab2-498e-a75f-c726d800b1ea


Comparing first 1099 characters
Character count: 100.0
Levenshtein: 0
Hamming distance edits: 0
----------------------------------------------------------------------------------------------------
Comparing first 2199 characters
Character count: 100.0
Levenshtein: 0
Hamming distance edits: 0
----------------------------------------------------------------------------------------------------
Comparing first 3298 characters
Character count: 100.0
Levenshtein: 0
Hamming distance edits: 0
----------------------------------------------------------------------------------------------------
Comparing first 4398 characters
Character count: 100.0
Levenshtein: 0
Hamming distance edits: 0
----------------------------------------------------------------------------------------------------


🖇 AgentOps: Session Stats - Duration: 1m 36.7s | Cost: $0.095583 | LLMs: 4 | Tools: 0 | Actions: 0 | Errors: 0
🖇 AgentOps: Session Replay: https://app.agentops.ai/drilldown?session_id=3639e35e-eab2-498e-a75f-c726d800b1ea


In [46]:
agentops.start_session(tags=['Book passage'])
compare_generated_text(passage_string)
agentops.end_session('Success')

Comparing first 4075 characters
Character count: 92.53731343283582
Levenshtein: 25
Hamming distance edits: 25
----------------------------------------------------------------------------------------------------
Comparing first 8151 characters
Character count: 92.0
Levenshtein: 117
Strings are not of equal length
Hamming distance edits: 6247
----------------------------------------------------------------------------------------------------
Comparing first 12227 characters
Character count: 93.42105263157895
Levenshtein: 98
Hamming distance edits: 98
----------------------------------------------------------------------------------------------------
Comparing first 16303 characters
Character count: 93.5064935064935
Levenshtein: 165
Strings are not of equal length
Hamming distance edits: 13886
----------------------------------------------------------------------------------------------------


In [47]:
compare_generated_text(french_article)

Comparing first 570 characters
Character count: 96.0
Levenshtein: 7
Hamming distance edits: 7
----------------------------------------------------------------------------------------------------
Comparing first 1141 characters
Character count: 95.23809523809523
Levenshtein: 16
Hamming distance edits: 16
----------------------------------------------------------------------------------------------------
Comparing first 1712 characters
Character count: 95.94594594594594
Levenshtein: 24
Hamming distance edits: 24
----------------------------------------------------------------------------------------------------
Comparing first 2283 characters
Character count: 96.05263157894737
Levenshtein: 36
Hamming distance edits: 36
----------------------------------------------------------------------------------------------------


In [48]:
compare_generated_text(german_article)

Comparing first 440 characters
Character count: 100.0
Levenshtein: 0
Hamming distance edits: 0
----------------------------------------------------------------------------------------------------
Comparing first 880 characters
Character count: 100.0
Levenshtein: 0
Hamming distance edits: 0
----------------------------------------------------------------------------------------------------
Comparing first 1320 characters
Character count: 100.0
Levenshtein: 0
Hamming distance edits: 0
----------------------------------------------------------------------------------------------------
Comparing first 1761 characters
Character count: 100.0
Levenshtein: 0
Hamming distance edits: 0
----------------------------------------------------------------------------------------------------


In [49]:
compare_generated_text(random_strings)

Comparing first 2344 characters
Character count: 98.4375
Levenshtein: 3
Hamming distance edits: 3
----------------------------------------------------------------------------------------------------
Comparing first 4689 characters
Character count: 98.73417721518987
Levenshtein: 4
Hamming distance edits: 4
----------------------------------------------------------------------------------------------------
Comparing first 7034 characters
Character count: 98.78048780487805
Levenshtein: 6
Hamming distance edits: 6
----------------------------------------------------------------------------------------------------
Comparing first 9379 characters
Character count: 98.82352941176471
Levenshtein: 5
Hamming distance edits: 5
----------------------------------------------------------------------------------------------------


In [51]:
compare_generated_text(chinese_article)

Comparing first 907 characters
Character count: 99.20212765957447
Levenshtein: 22
Hamming distance edits: 22
----------------------------------------------------------------------------------------------------
Comparing first 1815 characters
Character count: 99.49832775919732
Levenshtein: 41
Hamming distance edits: 41
----------------------------------------------------------------------------------------------------
Comparing first 2723 characters
Character count: 99.21875
Levenshtein: 57
Hamming distance edits: 57
----------------------------------------------------------------------------------------------------
Comparing first 3631 characters
Character count: 98.21640903686088
Levenshtein: 188
Strings are not of equal length
Hamming distance edits: 74
----------------------------------------------------------------------------------------------------
