[SPARKNLP-1291] Adding support for input string column on readers #14665
Please merge before #14668
Description
This change enables readers to accept a column of type String as input (in addition to existing types), so you can provide raw text directly rather than only files or external sources.
Motivation and Context
Before this patch, Spark NLP readers only supported inputs via file paths. If you already had a DataFrame with text content (say, from another pipeline or a preliminary load), you had to write it to disk just so the reader could ingest it. This added friction and overhead, especially in streaming or in-memory pipelines.
With this change, you can:
This enhancement broadens the usability of the readers and removes a common impedance mismatch in real-world ETL / NLP workflows.
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: