Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Line issues where repos sharedbetween unix and windows machines #138

Closed
RichieBzzzt opened this issue Sep 28, 2020 · 11 comments
Closed

Line issues where repos sharedbetween unix and windows machines #138

RichieBzzzt opened this issue Sep 28, 2020 · 11 comments

Comments

@RichieBzzzt
Copy link
Contributor

keep getting into issues where pushed/pulled notebooks from workspaces pickup changes in repos where the only change has been line endings. This is because engineers use windows/linux/macos and git settings are different. Latest attempt to fix this involves fixing line ending on local repo and for any notebooks that already exist overwrite the contents instead of overwriting the files.

@RichieBzzzt
Copy link
Contributor Author

Opened #139 with potential fix.

@pmooij
Copy link

pmooij commented Oct 20, 2020

@RichieBzzzt as I commented with you latest commit your latest changes are causing us normalization issues we didn't had before.
I'd request you to undo the normalisation to LF and put back in the normalisation to CRLF.

also I don't see how #139 should fix the issue you're describing, as the normalisation happens in the RegEx Replace and therefore creating the file using New-Item vs. Set-Content shouldn't make a difference, right?

btw, great work on fixing the phantom header line!

@pmooij
Copy link

pmooij commented Oct 20, 2020

furthermore we're getting issues on Unicode characters
image

afaik Set-Content is implicitly using ANSI encoding, causing this Unicode mismatch
while New-Item is using UTF8, handling Unicode characters properly

@RichieBzzzt
Copy link
Contributor Author

@pmooij we are getting issues on our machines where users are using macos, ubuntu and windows. We have build that run tests on windows and ubuntu. I am testing and fixing issues I encounter from other users via pull requests to the module which they then confirm as fixed.

The whole reason we go through this merry dance of altering content is because the databricks api inserts a comment on the first row of every notebook, and we aim to remove it. Part of me would be much happier to accept this comment and and then we can remove most of this [magic].(

$Response = @()
$Response = Get-Content $tempLocalExportPath -Encoding UTF8
$NewResponse = $Response -ne "# Databricks notebook source"
Remove-Item $tempLocalExportPath
if ($Format -eq "SOURCE") {
$ResponseString = ($NewResponse.replace("[^`r]`n", "`n") -Join "`n")
}
if ((Test-Path -PathType Leaf -Path $LocalExportPath) -eq $true) {
Set-Content -path $LocalExportPath -value $ResponseString | Out-Null
}
else {
New-Item -force -path $LocalExportPath -value $ResponseString -type file | Out-Null
}
). Then we don't have to account for any line endings or the minutia of encoding, as frankly it has burnt up a lot of my time trying to make this work consistently across various OS and versions of PowerShell: ie Set-Content can be set to UTF8 but only in later versions of PowerShell, and some users are on corporate laptops that are stuck on 5.1. The disadvantage would be that EVERYONE using this module would be affected by that change first time of downloading notebooks, and then it will work as normal after that.

I'll have a chat with the maintainer, see what he thinks. But in the meantime I suggest you pin yourself to a version that works for you whilst we figure out a solution that works for everyone using the module, or someone else adds a PR that fixes all the issues before we get there.

@pmooij
Copy link

pmooij commented Oct 20, 2020

thanks for you reply Richie. don't get me wrong, I really appreciate all the effort you guys are putting into this module!
skipping the whole top comment line removal and the therewith associated line ending & encoding normalization hustle sounds good to me, sounds good to discuss with Simon.

regarding the disadvantage that EVERYONE would be hit by this change: that's something EVERYONE is experiencing with the move from CRLF to LF normalization as well, so to speak for myself: it wouldn't be a big issue for me to deal with once.

for now i'll add the normalization to CRLF on top in our own scripts so we can stay with the latest version of the module.

for the encoding I don't understand why you could use UTF8 on Get-Content but why you couldn't use UTF8 on Set-Content. I'd say this should still be in place given the current approach.

@RichieBzzzt
Copy link
Contributor Author

Ah it was this issue I was thinking about with the encoding not being available in PowerShell 5.1 So yes set-content with encoding would fix the encoding.

@RichieBzzzt
Copy link
Contributor Author

OK I've got a potential fix here but I'm going to do some manual testing on a couple of different OS's to double check it works.

@pmooij
Copy link

pmooij commented Oct 21, 2020

OK I've got a potential fix here but I'm going to do some manual testing on a couple of different OS's to double check it works.

using [Environment]::NewLine & [system.io.file]::WriteAllText() with the correct UTF8 encoding looks very neat

@RichieBzzzt
Copy link
Contributor Author

RichieBzzzt commented Oct 22, 2020

Although fix with #151 is merged, suggest we keep issue open for a while, see if anything unexpected happens.

@pmooij
Copy link

pmooij commented Oct 23, 2020

herewith I confirm that with #151 the both the line-endings & encoding are both as expected again, great!

compared to the previous approach it's also removing the trailing white line now, this does effects all notebooks once, but it seems cleaner this way.

@simondmorias
Copy link
Member

Awesome. I will upgrade the release from preview to public this weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants