Since build 1.0.38482, splitting a text into Spans is no longer deterministic. #100

Open
jude-fisher-data opened this issue Sep 5, 2023 · 0 comments
Labels
bug Something isn't working


Describe the bug
The Spans property of IDocument produces the list of spans within a document. This should be deterministic: splitting the same IDocument any number of times should produce the same result, and creating an IDocument from identical text should always result in the same Spans collection. This works correctly up to and including NuGet package version 1.0.38431. From v1.0.38482 through the current version, identical inputs produce different results from run to run.

To Reproduce

  • Using a build >= 1.0.38482, create an IDocument from any text with multiple sentences. (We used a 915-word, 14-sentence block.)
  • Access the Spans property and trace the spans created.
  • Create an IDocument from the same text again, then access and trace the spans once more.
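
The steps above amount to a repeatability check. A minimal, language-neutral sketch in Python follows, assuming the splitter is passed in as a plain function from text to a list of span strings. `first_unstable_run`, `stable_split` and `make_unstable_split` are hypothetical names used only for illustration; they are not part of the library, and a real check would call the IDocument Spans property in their place.

```python
from typing import Callable, List, Optional


def first_unstable_run(
    split: Callable[[str], List[str]], text: str, runs: int = 5
) -> Optional[int]:
    """Split `text` `runs` times and compare each result with the first.

    Returns the 1-based number of the first run whose spans differ from
    run 1, or None if every run produced the same spans.
    """
    baseline = split(text)
    for run in range(2, runs + 1):
        if split(text) != baseline:
            return run
    return None


def stable_split(text: str) -> List[str]:
    """Hypothetical stand-in for a deterministic splitter (naive split on '.')."""
    return [part.strip() for part in text.split(".") if part.strip()]


def make_unstable_split() -> Callable[[str], List[str]]:
    """Hypothetical stand-in that reverses span order on every second call."""
    calls = {"count": 0}

    def split(text: str) -> List[str]:
        calls["count"] += 1
        spans = stable_split(text)
        if calls["count"] % 2 == 0:
            spans.reverse()
        return spans

    return split


if __name__ == "__main__":
    sample = "First sentence. Second sentence. Third sentence."
    print(first_unstable_run(stable_split, sample))
    print(first_unstable_run(make_unstable_split(), sample))
```

A deterministic splitter should yield None here regardless of how many runs are requested.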

Expected behavior

  • For identical inputs, the output should be identical. Observed behaviour is that both the order and the boundaries of the spans vary considerably between runs.

Sample Outputs
(First few lines of identical text input - traced to Visual Studio Debug window. IDocument is created, then Spans property is accessed.)

**FAULTY (Build: 1.0.38482)**

RUN A:
09:05:41:328 What We Offer
09:05:41:328 Create more personal computing.
09:05:41:578 Reinvent productivity and business processes.
09:05:41:578 Build the intelligent cloud and intelligent edge platform.
09:05:41:578 To achieve our vision, our research and development efforts focus on three interconnected ambitions:
09:05:41:578 Founded in 1975, we develop and support software, services, devices, and
09:05:41:578 solutions that deliver new value for customers and help people and businesses realize their full potential.
09:05:41:578 We're committed to making the promise of AI real and doing it responsibly.
09:05:41:578 At Microsoft, we provide technology and resources to help our customers create a secure
09:05:41:578 Our work is guided by a core set of principles: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability.
09:05:41:578 , productive work environment.

RUN B:
09:05:42:082 What We Offer
09:05:42:082 Create more personal computing.
09:05:42:082 Build the intelligent cloud and intelligent edge platform.
09:05:42:082 Reinvent productivity and business processes.
09:05:42:082 To achieve our vision, our research and development efforts focus on three interconnected ambitions:
09:05:42:082 Founded in 1975, we develop and support software, services, devices, and solutions that deliver new value for customers and help people and businesses realize their full potential.
09:05:42:082 We offer an array of services, including cloud-based solutions that provide customers with software, services, platforms, and content, and we provide solution support and consulting services.
09:05:42:082 At Microsoft, we provide technology and resources to help our customers create a secure, productive work environment.
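
Because each trace line carries a timestamp, the two runs cannot be compared as raw text. A small sketch of one way to compare them, assuming the `HH:MM:SS:mmm` prefix format seen in these traces: strip the prefix, then report the first position where the span lists disagree. `strip_timestamps` and `first_divergence` are hypothetical helper names, and only the first four lines of each run are included.

```python
import re
from typing import List, Optional

# Matches the "09:05:41:328 " style prefix written by the debug trace.
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}:\d{3}\s+")


def strip_timestamps(trace: str) -> List[str]:
    """Return the span text of each non-empty trace line, without its timestamp."""
    return [
        TIMESTAMP.sub("", line.strip())
        for line in trace.splitlines()
        if line.strip()
    ]


def first_divergence(a: List[str], b: List[str]) -> Optional[int]:
    """Return the 0-based index of the first differing span, or None if equal."""
    for index, (left, right) in enumerate(zip(a, b)):
        if left != right:
            return index
    if len(a) != len(b):
        # One list is a strict prefix of the other.
        return min(len(a), len(b))
    return None


RUN_A = """\
09:05:41:328 What We Offer
09:05:41:328 Create more personal computing.
09:05:41:578 Reinvent productivity and business processes.
09:05:41:578 Build the intelligent cloud and intelligent edge platform.
"""

RUN_B = """\
09:05:42:082 What We Offer
09:05:42:082 Create more personal computing.
09:05:42:082 Build the intelligent cloud and intelligent edge platform.
09:05:42:082 Reinvent productivity and business processes.
"""

if __name__ == "__main__":
    spans_a = strip_timestamps(RUN_A)
    spans_b = strip_timestamps(RUN_B)
    index = first_divergence(spans_a, spans_b)
    print(index, spans_a[index], "|", spans_b[index])
```

For these excerpts the first mismatch is at index 2, where the "Reinvent" and "Build" spans appear in swapped order.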

**CORRECT (Build: 1.0.34831)**
Text is identical with each run:
09:14:11:865 What We Offer
09:14:11:865 Create more personal computing.
09:14:12:109 Build the intelligent cloud and intelligent edge platform.
09:14:12:109 Reinvent productivity and business processes.
09:14:12:109 To achieve our vision, our research and development efforts focus on three interconnected ambitions:
09:14:12:109 Founded in 1975, we develop and support software, services, devices, and solutions that deliver new value for customers and help people and businesses realize their full potential.
09:14:12:109 At Microsoft, we provide technology and resources to help our customers create a secure, productive work environment.
09:14:12:109 Our family of products plays a key role in the ways the world works, learns, and connects.
09:14:12:109 We're committed to making the promise of AI real and doing it responsibly.
09:14:12:109 We offer an array of services, including cloud-based solutions that provide customers with software, services, platforms, and content, and we provide solution support and consulting services.

@jude-fisher-data jude-fisher-data added the bug Something isn't working label Sep 5, 2023