Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to query with Oledb #1101

Merged
merged 24 commits into from Nov 12, 2021
Merged

Conversation

royashbrook
Copy link
Contributor

@royashbrook royashbrook commented Oct 29, 2021

Added basic function based on read-clipboard ps1 structure. Added some pester tests.

To run tests I switched into public and ran invoke-pester .\Read-OleDbData.Tests.ps1

Seems to be failing the build tests, but I didn't want to add it into the main project files before review.

@royashbrook royashbrook marked this pull request as ready for review October 29, 2021 17:31
@dfinke dfinke self-assigned this Oct 29, 2021
@dfinke
Copy link
Owner

dfinke commented Oct 29, 2021

Do these tests fail for you locally?

@royashbrook
Copy link
Contributor Author

royashbrook commented Oct 29, 2021

Do these tests fail for you locally?

@dfinke the pester tests don't fail for me for that new ps1. i didn't try running the full gambit on the project or anything yet as i figured you may want to take a look at it first. when i looked at the pipeline errors, they all looked like they were failing because the command wasn't setup yet in the project. see the screenshot below.

i can go ahead and move forward with resolving those, just didn't want to go nuts until you had an initial look at it.

image

here's a snip of running the tests successfully for just this command:

image

@dfinke
Copy link
Owner

dfinke commented Oct 29, 2021

Cool, yes, just checking. Proceed at the pace you are comfortable with. Thanks for the PR!

@royashbrook
Copy link
Contributor Author

These pass locally, but I need to add an exclusion for when the ace driver isn't there. I don't think it would be appropriate to add ace in as a dependency. More like something you should get a warning on if it's not there and you run it, and something the tests should check for and then just skip testing if it's not there.

@royashbrook
Copy link
Contributor Author

@dfinke all tests passing.

@royashbrook
Copy link
Contributor Author

royashbrook commented Nov 1, 2021

I went ahead and created a very basic module for this for the time being.

https://github.com/royashbrook/Read-OleDbData
https://www.powershellgallery.com/packages/Read-OleDbData/1.0.0.0

@dfinke
Copy link
Owner

dfinke commented Nov 1, 2021

Btw, I asked Jaykul, and he created a #importexcel channel on his PowerShell discord server. It's free, check it out. https://discord.com/invite/powershell

@royashbrook
Copy link
Contributor Author

Awesome! I joined, thank you.

@dfinke
Copy link
Owner

dfinke commented Nov 2, 2021

I like the way you used $ACEnotWorking to skip tests. Haha, I was writing a question here about running on Linux, then looked at the function and you handle it, terrific! I'll make more time to review when I can.

Thanks for the PR, looking forward to trying it all out.

@dfinke
Copy link
Owner

dfinke commented Nov 3, 2021

Thinking out loud.

It'd be cool to use Read-OleDbData out of the box like this. Thoughts?

Read-OleDbData ./test.xlsx 'select ROUND(F1) as [A1] from [sheet3$A1:A1]'

@royashbrook
Copy link
Contributor Author

Thinking out loud.

It'd be cool to use Read-OleDbData out of the box like this. Thoughts?

Read-OleDbData ./test.xlsx 'select ROUND(F1) as [A1] from [sheet3$A1:A1]'

I think so. Probably want to change the name to something like Read-XlsxUsingOleDb. Could include another command that way that just packages it up. The raw command could just be an internal really. Could also setup the params to work like that.

The unfortunate thing with no connection string is some of the ways of how oledb will treat excel depends on the connection string. So if you want headers (I'm not using them here) have to update the connection string, if you want to treat things as native types or not, more changes, etc.

I would say that as a broader use module, all of those are probably useful to someone. Whatever you think would be good for an mvp is good with me. It isn't too tough (I think) to support more use cases, especially the ones we mentioned.

@royashbrook
Copy link
Contributor Author

added a helper method with the simpler syntax. definitely makes the calls shorter! i figured adding a helper method for this would still give people the ability to do basically anything with the underlying command, but default to the more typical use case the way you noted.

PS C:\gh\ImportExcel> Read-XlsxUsingOleDb .\__tests__\ImportExcelTests\TestData1.xlsx "select * from [sheet1`$b3:b3]"

F1
--
 4

PS C:\gh\ImportExcel> cd .\__tests__\ImportExcelTests\                                                               
PS C:\gh\ImportExcel\__tests__\ImportExcelTests> Read-XlsxUsingOleDb TestData1.xlsx "select * from [sheet1`$b3:b3]"                             

F1
--
 4

PS C:\gh\ImportExcel\__tests__\ImportExcelTests> Read-XlsxUsingOleDb TestData1.xlsx "select 1" 

Expr1000
--------
       1

@dfinke
Copy link
Owner

dfinke commented Nov 3, 2021

"helper" functions work great. Especially when layering is in play. Read-XlsxUsingOleDb, I think the function names needs a nudge. Not sure what, mayge along the lines of Read-Excel... works with Import, and Export. I see why *UsingOleDb is needed. Wish Query was an accepted verb, Query-Excel feels like it is a great fit. A possible fallback is Invoke-ExcelQuery.

@royashbrook
Copy link
Contributor Author

royashbrook commented Nov 3, 2021

Yep, it's very literal at the moment. No reason that it has to be, especially since it's a helper. Can just redirect people to the underlying function in the help if they need it. And I don't know that it's really a requirement that folks using the method actually know that it's using oledb. It's easy enough to add to help for anyone who's curious. The Xlsx is similar because this won't work with a regular Xls file (I believe). But that may be redundant nowadays.

I use 'read' for this instead of 'invoke' just based on the stuff in the ms docs:

Use the Read verb to open a resource and extract information contained within.

Use the Invoke verb to perform synchronous operations, such as running a command and waiting for it to end.

That being said, Invoke is consistent with Invoke-Sqlcmd, so would definitely seem natural. The only issue there may be that someone may want to 'update' something. While technically that would work here, I probably wouldn't use a dataadapter and fill to a datatable with that and would maybe add some flags to use appropriate object types.

Can always note the use cases in the examples and if anyone asks, can add more functionality as needed. I personally only use this for reads as the writing has been a bit janky for my taste. In places where I write, I just use the regular ImportExcel cmdlets and avoid the oledb, or for some older jobs I still use epplus until I move them over to the module.

I am also totally fine with using a 'non-accepted' verb like Query-Excel, especially since it's a 'helper' function anyway. Or could also alias it. I have a utility function I use in a lot of my jobs that has a function name of Add-PrefixForLogging so it's nice and compliant with the right verbs, but I alias it as 'L' for logging and that's what I actually use. =P

I actually used to always use 'get' in most situations like this as it seemed more natural to me when reading the code, but eventually moved to read to be consistent with the guidelines.

@royashbrook
Copy link
Contributor Author

I think from the accepted verb option, Invoke-ExcelQuery is probably the most intuitive.

I added it as it's own function just to isolate it, but maybe it should just be an optional -Query param on Import-Excel? From a 'helper' standpoint, that seems like it may be more intuitive. I'm not sure how much demand there is for something like this, but if you think it would be useful, that may be a better place for it.

@dfinke
Copy link
Owner

dfinke commented Nov 4, 2021

Thanks. On adding -Query I like the idea, need to think it thru from a support standpoint. Could get lots of question on many levels, driver installs etc.

@royashbrook
Copy link
Contributor Author

Yeah, that was my thought as well. I figured my use was kind of an edge case and if a bunch of people asked about it, maybe later it would make sense. I think it's easy enough to leave the underlying helper stuff and then just later implement a helper switch if that made sense.

Oledb itself is not terribly descriptive/clear on some errors especially with the query formation. I suppose exhaustive examples could take care of that, but probably no way to cover every single instance and no real way to be sure people
Would read them.

I am game for whatever you think is best.

@dfinke
Copy link
Owner

dfinke commented Nov 4, 2021

Some Things

Good stuff!

  • I don't see Invoke-ExcelQuery in the psd1 in FunctionsToExport
  • Looks like on the rename, the call to Read-OleDbData got renamed to Invoke-ExcelQuery
  • Did these tests run for you locally, it should fail
  • What happens if folks don't have don't have the .NET installed with that driver?

Do the tests get skipped when the CI pipeline runs on Windows?

I made those first two change. I'd prefer you update the PR.

Then I was able to do the below. Very cool.

Import-Module ./importexcel.psd1 -force 

$xlFile = 'D:\mygit\ImportExcel\__tests__\Read-OleDbDataTests\Read-OleDbData.xlsx'

Get-ExcelSheetInfo $xlFile | ForEach-Object {
    Invoke-ExcelQuery $xlFile "select $($_.Index)"
}

@royashbrook
Copy link
Contributor Author

Thanks. On adding -Query I like the idea, need to think it thru from a support standpoint. Could get lots of question on many levels, driver installs etc.

i think the best approach on this, if it was added, would be to likely link to a readme of some kind or maybe a wiki page with a lot of details and examples. I mean you could put them in help, but maybe just some basic ones and then have a lot more on a page on the repo somewhere. that way someone could, presumably, submit a bug or feature request or ask for help on there. or maybe even refer them to discord or something.

I have never tried to setup a module where you can trigger the get-help -online stuff, but i'm thinking like that maybe.

current basic module I have installed locally looks like this:

╰─ get-help read-oledbdata                                                                                           ─╯

NAME
    Read-OleDbData

SYNTAX
    Read-OleDbData [[-ConnectionString] <string>] [[-SqlStatement] <string>]


ALIASES
    None


REMARKS
    None

but I think I'd want the remarks section to look more like a standard command, or at least have a link or something to follow like this:

REMARKS
    Get-Help cannot find the Help files for this cmdlet on this computer. It is displaying only partial help.
        -- To download and install Help files for the module that includes this cmdlet, use Update-Help.
        -- To view the Help topic for this cmdlet online, type: "Get-Help Copy-Item -Online" or
           go to https://go.microsoft.com/fwlink/?LinkID=2096990.

@royashbrook
Copy link
Contributor Author

Some Things

Good stuff!

  • I don't see Invoke-ExcelQuery in the psd1 in FunctionsToExport
  • Looks like on the rename, the call to Read-OleDbData got renamed to Invoke-ExcelQuery
  • Did these tests run for you locally, it should fail
  • What happens if folks don't have don't have the .NET installed with that driver?

Do the tests get skipped when the CI pipeline runs on Windows?

I made those first two change. I'd prefer you update the PR.

Then I was able to do the below. Very cool.

Import-Module ./importexcel.psd1 -force 

$xlFile = 'D:\mygit\ImportExcel\__tests__\Read-OleDbDataTests\Read-OleDbData.xlsx'

Get-ExcelSheetInfo $xlFile | ForEach-Object {
    Invoke-ExcelQuery $xlFile "select $($_.Index)"
}

we must have been typing at the same time as i submitted my other reply just as this one popped up. let me take care of these.

@royashbrook
Copy link
Contributor Author

royashbrook commented Nov 5, 2021

  • I don't see Invoke-ExcelQuery in the psd1 in FunctionsToExport

Fixed! I also added the underlying function to the exported functions.

  • Looks like on the rename, the call to Read-OleDbData got renamed to Invoke-ExcelQuery

Fixed!

  • Did these tests run for you locally, it should fail

Yes, and I thought that was kind of odd. I originally thought it was because I was running in a ps where I had run the ps1 originally, but after killing it, it was still fine. I poked around and found this in the ImportExcel.psm1:

foreach ($directory in @('Private', 'Public', 'Charting', 'InferData', 'Pivot')) {
    Get-ChildItem -Path "$PSScriptRoot\$directory\*.ps1" | ForEach-Object { . $_.FullName }
}

And since I was force loading the module up top, it seemed to reload these each time. I just snagged that top matter when I was poking around in the other Public methods looking for one to use as a sort of a template.

  • What happens if folks don't have don't have the .NET installed with that driver?

They'll just get the errors thrown out. If no oledb classes, they'll get some lib error from dotnet about a bad class. If no provider, they'll get a bad provider error I believe. I don't currently have any machines that are missing the dotnet libs or the provider so I am not 100% sure what the errors look like. They'll also get a variety of oledb errors if there is poorly crafted sql. I currently just let them come out as I prefer to see the errors. We can try and catch some of those and provide some guidance, but I just let them bubble out so the error is easier to search for. If I was going to catch them, it might be better to just give a link to a wiki or readme with frequent issues that could be updated without having to update the method itself.

Do the tests get skipped when the CI pipeline runs on Windows?

yes because there won't be any providers installed like that. it just does a basic test to open the file and if that doesn't work, it will set the flag re:ace working. you have to have the access 2010 runtime installed to use it, although i see there is now a 2016 version too:

It's been a few years, but I think I've only installed the 2010 one.

I kind of viewed this use as an edge case. If you think more people would use it then maybe it's worth doing some work to get the provider stuff in as part of module installation. Or maybe, at least, upon the first run of this command some checks and an auto install. I'm not 100% sure what that will look like as I only use this functionality on servers I control, so I just have the prereqs documented as part of the installation procedure.

whatever you think makes the most sense works for me.

edit: i should actually say on the CI skip test thing, I am assuming it is skipping those tests, but i don't actually know. i see it skipping and not running some tests in the results, but not sure if those are my tests. they pass locally and it should only skip 10 tests if ace isn't working. i see it skipping 16 in the results, not sure if you know what the norm should be.

image

@royashbrook
Copy link
Contributor Author

Good stuff, I'm playing with the Invoke-ExcelQuery. A few things:

  1. For the local skip, leave as is

👍

  1. I like the up front check. Easier for the user to know before they get errors

👍 Agreed. Will add something

  1. Auto installs need to be avoided. You need to prompt a user before installing etc

👍

  1. I'm ok with an auto install in the tests. I control that with the CI pipeline. Knowing if a function name was misspelled etc is good to catch in the tests before publishing the module

👍 Sounds good. I'll take a stab at it.

  1. Could you add a InvokeExcelQuery directory in the Examples folder and add an example

👍 Yep. I'll pull all of the other examples and tests in there and work under this name.

  1. Do you blog? This would be a good blog post. Something simple, not elaborate.

👍 Yep. Occasionally, anyway @ ashbrook.io =P Definitely!

Experimenting with this

Something along these lines would be good for the example and a post. The xl file is attached.

$q = 'select * from [sheet1$A:A]'
$q = 'select * from [sheet1$]'
$q = 'select * from [sheet1$A2:E11]'
$q = 'select F2,F5 from [sheet1$A2:E11]'
$q = 'select * from [sheet1$A2:E11] where F2 = "Grocery"'
$q = 'select F2 as [Category], F5 as [Discount], F5*2 as [DiscountPlus] from [sheet1$A2:E11]'

Invoke-ExcelQuery .\testOleDb.xlsx -Query $q | Format-Table

testOleDb.xlsx

Looks good to me. Will update appropriately.

@royashbrook
Copy link
Contributor Author

Good stuff, I'm playing with the Invoke-ExcelQuery. A few things:

  1. For the local skip, leave as is

i'm not sure how to adjust the tests to check for local only. i didn't see any tests that seemed to indicate that, but i didn't go through every one.

  1. I like the up front check. Easier for the user to know before they get errors
  • I left the readoledb one as is just as a check for linux/macos because you could technically use it for any oledb source. i could make this an internal method and then only support the Invoke-ExcelQuery command as public, but i figured i would just leave it as public and then just in case someone wanted to do something funny they could still use it.
  • i updated the invoke cmd to check for the provider and give a warning message before returning with the ms url for the ace install
  • i also updated the invoke command to have the same check for the system.data.oledb call which should throw an error if no oledb is installed or the system.data space isn't there.
  1. Auto installs need to be avoided. You need to prompt a user before installing etc

👍

  1. I'm ok with an auto install in the tests. I control that with the CI pipeline. Knowing if a function name was misspelled etc is good to catch in the tests before publishing the module

i couldn't find a direct link to download the exe from ms. i can download the exe and add it to the repo and see about installing it from there. i can take a stab at the install for ci. i haven't ever included an exe auto install in a pipeline, so you may see a million build messages as I mess around with things. if you have an idea on how to do it, let me know otherwise I'll play around with it. I found this thread which seems to have a similar situation and does have some commands someone suggested: https://stackoverflow.com/questions/65081902/how-to-install-a-odbc-driver-on-the-azure-integration-runtime-machine. But the commands are basically similar to what you have in the installpowershell script which is what i was going to use as a template, although I'm not exactly clear on how that is invoked =P

for now, i switched to the newer test check so there is a warning in the tests during discovery that will state why the tests are being skipped

  1. Could you add a InvokeExcelQuery directory in the Examples folder and add an example

👍 Added

  1. Do you blog? This would be a good blog post. Something simple, not elaborate.

👍 Once it's done. =P

@dfinke
Copy link
Owner

dfinke commented Nov 9, 2021

Thanks for all of that. I'd prefer errors when the tests run, but it's no worth the effort. Yeah, for the installpowershell script, I vaguely remember taking that out at one point 🤷‍♂️

Having the warning in the function is good enough. If more is needed down the road, it'll get addressed.

Again, thanks for all of this. I want to let it sit for a day or so, I'll come back to it and exercise it. See what pops out ;-)

@royashbrook
Copy link
Contributor Author

sounds good. could always do some negative tests. i haven't tried bubbling up or doing some global/script level var sets in tests, but i think that might do the trick. right now i'm setting that skip var. i think some testing would guarantee it for sure, but i think the pester tests are in an ordered list so could maybe set the $skip for later tests based on the earlier ones. i guess that doesn't buy much more than is happening here, but then it's done in the testing loop instead of in discovery.

there's got to be a way to add ace into the pipeline to test it. i didn't want to go nuts fiddling with it unless you thought it was worth it. i don't love that the testing basically gets skipped in CI, just not sure exactly how to load ace into that pipeline.

@dfinke
Copy link
Owner

dfinke commented Nov 9, 2021

Agreed. I have not tried installing anything other than ps modules in the test run. At this point, it is an MVP. If you'd like, create another PR for experimentation, the AZDO is free, because this is a public project.

@royashbrook
Copy link
Contributor Author

I'll leave that for after the blog posting. Once the PR is integrated, I need to update the importexcel in house and update some jobs to swap over to this function instead of the custom read-oledb function i have now. =)

@royashbrook
Copy link
Contributor Author

I'm seeing about getting this working in a sandbox instance. Seems to be good to go, but the installer is pretty large so seeing about sticking it in a repo on here just to host it, rather than including it. Not sure how well that will work, but would be nice to have the ability to not skip the tests. I can't see a way to get the downloader from ms directly in a script, and the exe didn't seem to really want to install silently. I was able to just extract the msi from the exe file and it seems to work properly.

I'm basically using the same method in your InstallPowershell.ps1 script. See below for output from win10 sandbox:

PS C:\Users\WDAGUtilityAccount> (New-Object system.data.oledb.oledbenumerator).GetElements().SOURCES_NAME
SQLOLEDB
MSDataShape
ADsDSOObject
Windows Search Data Source
MSDASQL
MSDASQL Enumerator
SQLOLEDB Enumerator
MSDAOSP
PS C:\Users\WDAGUtilityAccount> $MsiPath = (gci C:\Users\WDAGUtilityAccount\Desktop\Downloads\2016_AccessDatabaseEngine_X64\aceredist.msi).fullname
PS C:\Users\WDAGUtilityAccount> Start-Process 'msiexec.exe' -Wait -ArgumentList "/i $MsiPath /quiet"
PS C:\Users\WDAGUtilityAccount> (New-Object system.data.oledb.oledbenumerator).GetElements().SOURCES_NAME
SQLOLEDB
MSDataShape
Microsoft.ACE.OLEDB.12.0
Microsoft.ACE.OLEDB.16.0
ADsDSOObject
Windows Search Data Source
MSDASQL
MSDASQL Enumerator
SQLOLEDB Enumerator
MSDAOSP
``

@royashbrook royashbrook changed the title Add basic function, tests, and sample file Add ability to query with Oledb Nov 12, 2021
@royashbrook
Copy link
Contributor Author

I got it up on github using lfs, but after looking at the way that works, we'll run out of bandwidth pretty fast. Only 1GB for free and the install is almost 100mb. We could host it somewhere else, but I think the right way to do things is probably to have a custom image setup with the ace drivers installed.

I have always used a process similar to what we're already doing where I just use a base image and modify it. I have built some custom docker containers in the past for something like this, but I think that may all be a bit much for an MVP.

I'm going to just leave it as is for now unless you think otherwise. =)

@dfinke
Copy link
Owner

dfinke commented Nov 12, 2021

This is great! Thanks for adding it . I tweaked the example to output the $query. I'm going to merge and publish it. I have your github and twitter links, let me know of any other links to add, blog, linkedin etc.

Thanks again. Plus, great work. A pleasure to collaborate :-)

@royashbrook
Copy link
Contributor Author

Thank you, the feeling is mutual!

linkedin: https://linkedin.com/in/royashbrook
blog: https://ashbrook.io

=)

@dfinke dfinke merged commit 43b6ebf into dfinke:master Nov 12, 2021
@royashbrook royashbrook deleted the Read-OleDbData branch November 16, 2021 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants