ConvertFrom-Yaml, ConvertTo-Yaml #3607

Open
DarwinJS opened this Issue Apr 20, 2017 · 36 comments

@DarwinJS
Contributor

DarwinJS commented Apr 20, 2017

Would be great to support Yaml natively.

This was also mentioned by @fabiendibot on #3046

It would also be nice if the cmdlets had the goal of cleanly handling conversion of objects that came from XML, as that seems like it would be a frequent use case. Maybe some good tests around this conversion?

@ArieHein


ArieHein commented Apr 20, 2017

We had a similar discussion from the DSC side.
Besides changing JSON-based configuration files, we wanted options for modifying XML-based, YAML-based, and INI-based files, with support for regex swaps from within text-manipulation cmdlets.

Lack of existing support in PS means we have to work hard to get such ability.
It has been on hold pending community contribution, but if it was baked into PS, it would make it much easier for the DSC part as well.

@lzybkr

Member

lzybkr commented Apr 20, 2017

When you say natively, do you mean like XML or JSON?

The current thinking is that YAML should not be baked into PowerShell at all, instead it should be a separate module that you can update without picking up a new version of PowerShell.

If YAML were baked into PowerShell like XML, that would be impossible (think [xml]"<a>b</a>")

If we went the JSON route, you'd have cmdlets to work with YAML - so not really baked into PowerShell, but you'd still have the drawbacks of needing to update PowerShell to get YAML updates.
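To make the two routes concrete, here is a minimal sketch using only what ships with PowerShell. The variable names are illustrative and nothing here touches YAML; it only shows how XML is a language-level cast while JSON goes through cmdlets.

```powershell
# XML is part of the language: the [xml] type accelerator casts a string
# to a System.Xml.XmlDocument, and child elements surface as properties.
$doc = [xml]'<a>b</a>'
$fromXml = $doc.a

# JSON is handled by cmdlets that ship with PowerShell, not by the language itself.
$obj = '{"a":"b"}' | ConvertFrom-Json
$fromJson = $obj.a

"$fromXml / $fromJson"
```

A YAML equivalent of the second form would be a pair of cmdlets, which is what the issue title asks for.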

@joeyaiello

Member

joeyaiello commented Apr 20, 2017

@lzybkr I know we said we didn't want to bring in a new library, but I think this is something we might need to reassess. Ideally, we should also ship the module on the Gallery, but I think a TON of modern scenarios require YAML now.

Maybe not in 6.0 timeframe, but we should talk about it.

@DarwinJS

Contributor

DarwinJS commented Apr 20, 2017

@ArieHein - I have some simple functions that save and retrieve a hash array to the registry. Only handle REG_SZ - but for a simple set of settings it is sufficient - let me know if you want a copy.

I misspoke when I said "native" - I primarily meant "built-in" - it wouldn't bother me if they were shipped-in script modules that could be updated.

@iSazonov

Collaborator

iSazonov commented Apr 21, 2017

Our first discussion #2109

@DarwinJS

Contributor

DarwinJS commented Apr 21, 2017

@iSazonov - ah yes I see!

I noticed the reference to AWS support of YAML on the thread - I have been converting some templates and have found this to be helpful: https://github.com/awslabs/aws-cfn-template-flip

@joeyaiello

Member

joeyaiello commented Apr 21, 2017

@iSazonov thanks for the pointer, I couldn't find it for some reason. Remember it well, though.

In re-reading that original thread, I think we should definitely implement the cmdlets at some point in the future, and ship them in the Gallery. Based on their quality, and people's perceived usefulness (along with some refactoring work we hope to do after 6.0.0), we can make the in-box vs. Gallery-only call.

@iSazonov

Collaborator

iSazonov commented Oct 8, 2017

@MattTunny


MattTunny commented Oct 25, 2017

yeah this would be awesome to have, ended up using https://github.com/awslabs/aws-cfn-template-flip to convert

@iSazonov

Collaborator

iSazonov commented Oct 25, 2017

@MattTunny Welcome to contribute! :-)

@josepmv


josepmv commented Dec 1, 2017

@SteveL-MSFT SteveL-MSFT added this to the 6.1.0-Consider milestone Dec 12, 2017

@Satak


Satak commented Mar 13, 2018

This should definitely be part of the native PS 6.1 library. So many things these days are in YAML.

@bergmeister

Contributor

bergmeister commented Apr 24, 2018

There are now psyaml and powershell-yaml modules on the PSGallery, but neither is even able to round-trip a YAML file from a VSTS build definition. I don't mind whether the module is baked into PowerShell or is a module from the PSGallery.

@BrucePay

Member

BrucePay commented Apr 24, 2018

I wonder if the core problem here is the clunky way we deploy modules. Today, you have to find, trust and install a module before you can use it. Compare this with the (apparently) slick way that JavaScript does var m = require('mymodule'). Maybe we should have some way to do what DSC does but for native PowerShell. In DSC, when a module is referenced in a configuration, it's automatically downloaded and installed on the target node with no manual effort. Making critical but non-core modules available that way should eliminate the "it should be part of core" arguments. And for nodes that are disconnected from the net, we could have a tool that bundles the dependencies in a script into an archive which is then deployed to the target. This is how the Azure DSC resource extension works - there is a tool that scans a script to figure out the required modules, then builds a zip file containing everything that is needed and publishes it to a blob. The Azure resource extension then pulls this blob, installs the modules and runs the script.

@bgshacklett


bgshacklett commented Apr 24, 2018

For something that is this important, I really don't ever want to depend on a third-party library unless I have some way of vendoring it. It's way too easy for third party developers to potentially break entire ecosystems (see https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/).

Broader issues aside, there is currently no good YAML module for PowerShell, as @bergmeister pointed out. This is a must for a language which is heavily focused towards automation. YAML based configuration files are hugely popular now and it's very hard to avoid them even if you don't have to contend with the opinions of a team to do so. Think of the reasoning behind including XML and JSON as core parts of the language. The case for YAML really isn't so different.

@BrucePay

Member

BrucePay commented Apr 24, 2018

@bgshacklett From what I've heard from the Puppet guys, there are just no good YAML parsers :-)

@iSazonov

Collaborator

iSazonov commented Apr 25, 2018

Is platyPS parser good enough?

@vors Is there simple way to reuse platyPS YAML parser in PowerShell Core repo?

@josepmv


josepmv commented Apr 25, 2018

I prefer the idea of a separate official module in the PowerShell Gallery, like @lzybkr says, because it would be possible to use it in older PowerShell versions and it could have its own releases. That would be like the sqlserver module. @BrucePay if there were a page in the PowerShell Gallery with Microsoft's own modules, it would be easier to find them and everybody would know that they can trust them.

But I would understand if it were baked into PowerShell like XML and JSON.

The important thing is that official ConvertFrom-YAML and ConvertTo-YAML functions exist, because YAML is a widely used format for configuration files and it shouldn't be a third-party module, as @bgshacklett pointed out.

I made a blog entry testing and comparing the two modules I've found to work with YAML files: PSYaml and powershell-yaml.

They have different behaviours because internally they're using different objects:

| module          | mappings          | sequences |
|-----------------|-------------------|-----------|
| PSYaml          | OrderedDictionary | Array     |
| powershell-yaml | Hashtable         | List      |
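Why the container types in the table matter can be seen with built-in PowerShell alone. The sketch below does not call either YAML module; the key names are made up, and treating "List" as a generic .NET list is an assumption about what the table means.

```powershell
# A plain hashtable makes no promise about the order in which keys enumerate.
$plain = @{}
$plain['version'] = '1.0'
$plain['image']   = 'windows'
$plain['build']   = 'off'

# [ordered] creates a System.Collections.Specialized.OrderedDictionary,
# which enumerates keys in insertion order.
$ordered = [ordered]@{}
$ordered['version'] = '1.0'
$ordered['image']   = 'windows'
$ordered['build']   = 'off'

$plainType   = $plain.GetType().Name
$orderedType = $ordered.GetType().Name
$orderedKeys = @($ordered.Keys) -join ','

# Sequences differ as well: a fixed-size array versus a growable list.
$array = @('a', 'b')
$list  = [System.Collections.Generic.List[object]]::new()
$list.Add('a')
$list.Add('b')
$list.Add('c')   # works on a list; calling Add on the array would throw

"$plainType vs $orderedType; ordered keys: $orderedKeys; list count: $($list.Count)"
```

Key order matters for round-tripping: a document read into an unordered mapping can come back out with its keys shuffled even if no value changed.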

I think we need a standard ConvertFrom-YAML and ConvertTo-YAML.

@gaelcolas


gaelcolas commented May 2, 2018

Actually, ConvertFrom-Yaml in powershell-yaml uses OrderedDictionary when converting with the -ordered parameter.
I've been using this module successfully for a while (in my Datum module for DSC Configuration data, and with kitchen yamls), but don't have a vsts build definition to test with.

Bear in mind that the right way to call it is: get-content -Raw MyFile.yml | ConvertFrom-Yaml -Ordered (people often miss the -Raw).
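What -Raw changes can be checked without any YAML module. This sketch writes a throwaway file to the temp directory (the path and file contents are illustrative) and compares what Get-Content sends down the pipeline with and without the switch.

```powershell
# Write a small two-line YAML file to a temporary location.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'MyFile.yml'
Set-Content -Path $path -Value 'environment:', '  my_var1: value1'

# Without -Raw, Get-Content emits one string per line.
$lines = Get-Content -Path $path

# With -Raw, it emits the whole file as a single string, line breaks included.
$raw = Get-Content -Path $path -Raw

$lineCount   = $lines.Count
$rawIsString = $raw -is [string]

Remove-Item -Path $path
"lines: $lineCount; raw is one string: $rawIsString"
```

Piping `$raw` into a converter hands it the document in one piece, indentation and line breaks intact, rather than relying on the cmdlet to stitch separate lines back together.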

I wonder why we'd need a Microsoft official module, putting even more overhead on MSFT and reinventing the wheel... Maybe trying to contribute to an existing one first, add tests to avoid regression, open issues to make sure the owner knows the problems is a better approach...
You know what happens when you're trying to create a standard out of the 99 existing implementations...

And yes it would be better outside the language, I agree that the dependency management could be better, bundling everything in PS is not a solution though.
The broad npm issue is also a failure in process. Fork and re-publish fixed it in no time, building apps out of latest version of the internet was the reason it broke so many live apps.

@markekraus

Collaborator

markekraus commented May 2, 2018

I agree with @gaelcolas I think this is better with everyone working with the owners of an existing community module to raise and ensure quality.

I'll just add that tests for such a project should include working with a large number of real-world YAML files for things like AppVeyor, Travis CI, VSTS, AWS CloudFormation, etc. In my own experience with YAML deserialization, I have had little success with one solution working universally and have ultimately had to reinvent the wheel several times. In that sense, I agree with @BrucePay "there are just no good YAML parsers".

@iSazonov

Collaborator

iSazonov commented May 2, 2018

We are talking about the platyPS module because it is already actively used in the PowerShell Help environment. I guess no one from MSFT can say how good this module is because of the Code of Conduct. They can either silently reject it or improve it.
And although we talked about this a long time ago, I don't see how we could use the components of this module here in a simple way.
Maybe @adityapatwardhan and @SteveL-MSFT will share their plans and timeline, especially as the new Help RFC is already in the experiment stage.

@SteveL-MSFT

Member

SteveL-MSFT commented May 2, 2018

My personal view is that I would rather see more community modules succeed and become de facto standard than requiring "official" modules from Msft.

@markekraus

Collaborator

markekraus commented May 2, 2018

@iSazonov It is one thing to have a solution that works for serializing/deserializing a well defined schema. It is quite another to have a solution that works in general with all schemas that are compliant YAML.

@iSazonov

Collaborator

iSazonov commented May 2, 2018

I understand the desire of MSFT to reuse community projects to cut costs. But the situation is, in fact, that MSFT cannot make use of very many community projects:

  • many have bad code and are not trusted
  • many projects are maintained by one person

MSFT published the PowerShell specification more than 10 years ago, but nobody ported it until MSFT did.
The OpenSSL Project has existed for many years, but nobody ported it to Windows until MSFT did.
MSFT revealed many thousands of API interfaces, but how many of them were ported to Unix?
Isn't it interesting that the company launched its own .NET Core project rather than reusing Mono?
PowerShell has been an open source project for a year and a half already, but I see that in this repository only one person from the community makes systematic code contributions, @markekraus, and only one person makes systematic analysis, @mklement0.
I don't think that we would get more contributions if we divided the project into parts.
I don't think the situation will change tomorrow. I wouldn't count on it.

@iSazonov

Collaborator

iSazonov commented May 2, 2018

@bergmeister

Contributor

bergmeister commented May 2, 2018

@iSazonov makes important points about support, trust and maintenance of 3rd party modules. Some 3rd party modules can become a success and mature, e.g. Pester.
However, one should not assume that a great YAML module will evolve on its own over the next years. The reality is that most modules are published by authors who solved a particular problem and did the good deed of publishing their generic base code. This is how we ended up with 2 modules that aim to solve the same problem. Ideally one would need to merge them to focus efforts, otherwise they are going to drift further apart in the future or just become stale, and soon there will be more modules published by other people.
The underlying problem of having a proper parser indicates that basic (and substantial in terms of effort) ground work is needed and required to have a good YAML module.
I am not a YAML expert, but is this just a problem of the loose language specification itself or specific interpretation by various systems like VSTS or AppVeyor or is this only the lack of a good parser?
I found it frustrating to write YAML in VSCode and only when running it in VSTS to get an error that the VSTS parser does not like it...

@DarwinJS

Contributor

DarwinJS commented May 2, 2018

To me this conversation is a case in point with open source's "code curation / architecture" problem.

Open source provides good seeding ideas and code bases - but if a serious architecture eye is not given to it when adopted as the most general solution - then it's 10 years of bug fixes for items that could have been taken care of in a decent design review.

In the true cases of @bergmeister "mature successes" it is often an active maintainer that has taken on the mission of generalizing the code base. But that can't be guaranteed to happen.

I think some of us are saying "YAML support is like support for writing files - it's core - it should be architected in the same way => with the intention to be the gold standard for that functionality"

It is the combination of 1) the semi-architected attribute of open source along with 2) the core nature of YAML that seems to make many of us urge for the highly architected approach we know the Microsoft PowerShell developers apply to their work. It is not necessarily a drifting from all the other cool things open source can indeed help us with.

@SteveL-MSFT

Member

SteveL-MSFT commented May 2, 2018

Very valid points on software maturity. I haven't looked closely at the two modules listed here, nor at yamldotnet to make any opinion. Something we can look at as we start planning for 6.2.0

@gaelcolas


gaelcolas commented May 2, 2018

Don't get me wrong, I do value the experience and systematic approach of the PowerShell team and MSFT developers, I just think it's wrong for them to try to fill all the gaps with a module of their own stamped MSFT... It does not scale (and we've seen the problem with DSC resources already).
Increasing the reliance on MSFT provided modules is fragile, and does not help grow the community, nor the diversity of the ecosystem.
I'm in favour of MSFT contributing to open source projects to share their experience and help improve practices and quality, while not creating a dependence on them (because you know, squirrels...!).
The MSFT as unique provider of approved things is an old model that they struggle already to educate on, and it is not helping the community to encourage this approach (i.e. I'll wait, or moan, at Microsoft for not solving the problem I have kind of attitude in the OSS ecosystem).

I agree YAML support is core, instead of the PS team re-writing from scratch, why not help existing maintainers of projects to improve, and give them an opportunity to merge projects and hear from them what it would take. A bit like an apprenticeship/mentorship from PS team on core functionality modules.
Just re-writing a new module sounds like an engineer's reaction to solve a problem which is not an engineering problem. Re-writing a YAML module is an easy engineering task for the PS Team, but would not (help to) fix the community maturity problem, nor give the right incentive.
Whether Yaml is the strategic item to tackle this is MSFT's call though :)

@markekraus

Collaborator

markekraus commented May 3, 2018

@bergmeister

I'll preface this with myself not being a YAML expert. I happened to do some research on this when I wanted to bake some AppVeyor like yaml configs into my own franken-pipeline. I looked at how a dozen or so C# projects were consuming YAML. Since the PowerShell projects use YamlDotNet, I can only assume it's no easier. Though I have at least toyed around with both PSYaml and powershell-yaml and have looked less closely at a few PowerShell projects which use them.

> I am not a YAML expert, but is this just a problem of the loose language specification itself or specific interpretation by various systems like VSTS or AppVeyor or is this only the lack of a good parser?

I suspect it's the nature of YAML being readable by humans at the possible expense of being more easily readable by machines. This readability-first paradigm extends into the way YAML authors write their YAML files. Though the resulting YAML is compliant under the YAML spec, it is parsed in such a way as to be unusable in code without using the deserialized object as an intermediary to an actually useful object.

That is to say, that 90% of the time the deserialization from YAML to an Object is not the issue, but the data design/architecture is. The other 10% of the time is parsing issues for which I can only chalk up to "YAML is hard to parse, man." However, the deserialized objects are often only slightly more useful than regex-ing what you are looking for....

As an example, the secure strings in AppVeyor.yml

environment:
  my_var1: value1
  my_var2: value2
  my_secure_var1:
    secure: FW3tJ3fMncxvs58/ifSP7w==

powershell-yaml and YamlDotNet do convert this to an object, but good luck using it without a bunch of logic. Once you have that logic, good for this schema, but what about another?
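The "bunch of logic" can be sketched as follows. To keep this runnable without any YAML module, the deserialized `environment` mapping is built by hand as a stand-in for what a converter would roughly return; the output property names (`Name`, `Value`, `IsSecure`) are hypothetical.

```powershell
# Hand-built stand-in for the deserialized 'environment' mapping above.
$environment = [ordered]@{
    my_var1        = 'value1'
    my_var2        = 'value2'
    my_secure_var1 = @{ secure = 'FW3tJ3fMncxvs58/ifSP7w==' }
}

# Each value is either a scalar or a nested mapping with a 'secure' key,
# so the consumer has to branch on the shape of every entry.
$variables = foreach ($name in $environment.Keys) {
    $value = $environment[$name]
    if ($value -is [System.Collections.IDictionary] -and $value.Contains('secure')) {
        [pscustomobject]@{ Name = $name; Value = $value['secure']; IsSecure = $true }
    }
    else {
        [pscustomobject]@{ Name = $name; Value = [string]$value; IsSecure = $false }
    }
}

$variables
```

The branching is specific to this one schema; a file that nests its values differently needs its own version of the loop.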

Some of these same data design problems plague JSON, but it is (in my experience and opinion) much easier to make models that can work around those shortcomings due to the more rigid nature of JSON. Trying to make models for any of the YAML deserializers mentioned in this thread is a nightmare if and where it is possible.

Granted, models are not a feature currently available in the JSON cmdlets, though I would really like to add it. If I had a say in the "official" YAML module/cmdlets I would put it down as a "must have" feature. It is a missed opportunity especially with the addition of PowerShell classes in v5.
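As a rough idea of what a model could look like with PowerShell v5 classes, here is a minimal sketch. The class and its property names are hypothetical, and the binding shown is the built-in hashtable-to-class cast, not a feature of any existing JSON or YAML cmdlet.

```powershell
# Hypothetical model for one AppVeyor-style environment variable.
class EnvironmentVariable {
    [string] $Name
    [string] $Value
    [bool]   $IsSecure
}

# PowerShell can cast a hashtable to a class that has a default constructor,
# setting each property whose name matches a key.
$model = [EnvironmentVariable]@{
    Name     = 'my_secure_var1'
    Value    = 'FW3tJ3fMncxvs58/ifSP7w=='
    IsSecure = $true
}

$model
```

The cast fails if the hashtable has a key the class lacks, so the deserialized data would still need reshaping before binding.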

IMO, Just getting YAML strings into an Object isn't good enough. That appears to be easy (90% of the time at least). The trick is getting YAML strings into useful objects. That requires some flexibility from the solution. But that flexibility must also be somewhat approachable and not require @IISResetMe and @lzybkr there to give you serialization advice....

To that effect, I haven't seen anything that works on a general scope. Projects adopt the solutions available, and then use their output as intermediaries for actually useful objects (leading to a bunch of wheel reinventing that probably should be baked in upstream). Or, the projects compromise YAML readability for ease of parsing from YAML to objects.

@iSazonov

Collaborator

iSazonov commented May 3, 2018

@gaelcolas

> I agree YAML support is core, instead of the PS team re-writing from scratch, why not help existing maintainers of projects to improve, and give them an opportunity to merge projects and hear from them what it would take

Ask yourself why MSFT started the .NET Core project many years later instead of continuing Mono.

MSFT is a community too. And, like any community, it has the same problems of interaction with other communities.

@DarwinJS

Contributor

DarwinJS commented May 3, 2018

For context, I am not implying any work be done from scratch - code could be adopted - but should then be scrutinized from a Systems Development architecture perspective before being improved. It could even be open source after that review and re-release.

My point is to have a significant architectural review and remediation from a team that thoroughly understands the nuances of core code that will be leveraged virtually everywhere.

@SteveL-MSFT SteveL-MSFT modified the milestones: 6.2.0-Consider, Future Jun 21, 2018

@dchennells


dchennells commented Oct 29, 2018

Another model always worth considering is acquire/contract/second. On this basis an effort is made to reach commercial terms with one or more community members/firms to recruit their services for a MSFT-led/facilitated development cycle to re-vamp and (in some fashion) integrate/connect the product(s). This was done successfully with Xamarin, which kicked the project to the .NET Foundation, licensed it under MIT, and recruited/contracted/involved key resources such as Miguel de Icaza and Nat Friedman via Xamarin. Some whine that this is open source treason. But it does create positive incentives for folks and small firms to conceive and develop projects that later could be fit for widespread adoption and integration into at least one major ecosystem. Certainly it's preferable to jumping straight to a blank-slate in-house redo that copies the whole concept and functionality and many of the ideas but jettisons the creators and (ostensibly) the code.

@vors

Collaborator

vors commented Nov 1, 2018

@iSazonov sorry for a late reply, no the platyPS yaml parser is no good: it only supports key value pairs. We also use YamlDotNet to generate yaml there.

@bgshacklett


bgshacklett commented Nov 3, 2018

Regarding the sentiment towards keeping this out of the core feature set: there's a very significant difference in how PowerShell handles dependencies compared to, say, Ruby, Python or Node.js.

Each of these languages has dependency management tools (bundler, pip, npm/yarn) which make the management of external dependencies easy and, more importantly, reproducible. Having something like a Gemfile/Gemfile.lock or package.json/package-lock.json [,yarn.lock] which makes for easy installation of all required packages and ensures that you are staying at a very specific patch level is a very significant distinction which is, in my opinion, what makes third-party libraries for something this fundamental feasible.

Perhaps there's something that could be done with NuGet to solve this issue, but I've never seen any articles describing dependency management strategies/patterns for PowerShell. Having the gallery is great, but if you've got to install all required packages manually it becomes unfeasible for any significant deployment.

edit:
So it seems like what I'm looking for may be available already: https://docs.microsoft.com/en-us/powershell/wmf/5.0/psget_moduledependency. I'll test this out as soon as I have a moment. If it works, I'll need to reconsider my position on whether this should be a core item or not. I'm still having difficulty reconciling it against the fact that JSON is a core functionality, but I suppose that it could be considered a "lowest common denominator".
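A module manifest is where such a dependency would be declared. The fragment below is a sketch only: the manifest name, root module and version numbers are placeholders, and whether installation then resolves the dependency automatically is the behavior the linked page describes, not something verified here.

```powershell
# MyAutomation.psd1: hypothetical manifest; names and versions are placeholders.
@{
    RootModule      = 'MyAutomation.psm1'
    ModuleVersion   = '1.0.0'

    # Declare the YAML module as a dependency. RequiredVersion pins one exact
    # version, which is the closest manifest-level analogue to a lock file.
    # '0.0.0' is a placeholder; substitute a version you have actually tested.
    RequiredModules = @(
        @{ ModuleName = 'powershell-yaml'; RequiredVersion = '0.0.0' }
    )
}
```

Using `ModuleVersion` instead of `RequiredVersion` in the dependency entry sets a minimum rather than an exact version, which trades reproducibility for easier updates.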

@DarwinJS

Contributor

DarwinJS commented Nov 3, 2018

@bgshacklett makes a super good point.
