New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Specific select filters to run for source table #316

Open
pvdvreede opened this Issue Dec 1, 2015 · 20 comments

Comments

Projects
None yet
2 participants
@pvdvreede

pvdvreede commented Dec 1, 2015

HI there,

It would be great if there was some way to specify extra select where clauses that are executed on the source tables, either globally or for specific tables.

We have a ms sql server db with some huge tables, and the db seems to hang when trying to run a select across the whole table. We dont even need all the data, it would be great to just select rows that have been updated in the last year for certain tables...

Thanks.

@dimitri dimitri added the WishList label Dec 2, 2015

@dimitri

This comment has been minimized.

Owner

dimitri commented Dec 2, 2015

In the MySQL support we have the MATERIALIZE VIEWS option, where we SELECT * FROM view basically and retrieve the view resultset as the data to load in PostgreSQL, rather than the data of the base tables... I would think that it would fit your use case pretty well, allowing you to do any SQL processing of the data, not just a WHERE clause.

The main limitation I have for MS SQL support (bug fixes and enhancement) is having a working test environment with a database to play with containing real data etc. If you are in a position to provide the pgloader projet with that it would help tremendously!

@pvdvreede

This comment has been minimized.

pvdvreede commented Dec 2, 2015

Thanks Dimitri,

That materialized views feature is exactly what I need! I understand from reading the info about it why you cant just do aribtrary SQL.

I tried to use the materialized view option against a MSSQL database and got the following parsing error, is that option available when used against MS SQL as opposed to MYSQL?

An unhandled error condition has been signalled: At

    MATERIALIZE VIEWS test1 AS
    ^ (Line 16, Column 3, Position 399)

Could not parse KW-EXCLUDING. Expected:

     the character Space
  or the character Tab
  or the character Newline
  or the string "--"
  or the string "/*"
  or the string "excluding", disregarding case

Reached via

     EXCLUDING-LIKE-IN-SCHEMA
  -> (AND EXCLUDING-IN-SCHEMA (* EXCLUDING-IN-SCHEMA))
  -> EXCLUDING-IN-SCHEMA
  -> (AND KW-EXCLUDING KW-TABLE KW-NAMES KW-LIKE FILTER-LIST-LIKE KW-IN
          KW-SCHEMA QUOTED-NAMESTRING)
  -> KW-EXCLUDING
  -> (AND IGNORE-WHITESPACE (~ "excluding") IGNORE-WHITESPACE)
  -> "excluding"

Could not parse KW-INCLUDING. Expected:

     the character Space
  or the character Tab
  or the character Newline
  or the string "--"
  or the string "/*"
  or the string "including", disregarding case
@dimitri

This comment has been minimized.

Owner

dimitri commented Dec 3, 2015

The option is now only available for MySQL, I will see what it takes to make it available for MS SQL. Always nervous about adding code I can't test myself...

@pvdvreede

This comment has been minimized.

pvdvreede commented Dec 3, 2015

Cool thanks!

Im not able to give you a database with data to test with, but if you do the work on a branch, I am more than happy to test it before you merge it to master? Just let me know on this issue and I can give it a go.

@dimitri

This comment has been minimized.

Owner

dimitri commented Dec 3, 2015

Will do that then, next week tho, early week-end here ;-)

@pvdvreede

This comment has been minimized.

pvdvreede commented Dec 11, 2015

Hi Dimitri,

Just checking in to see if you have had any luck adding the views option to SQL Server, or if there is any code you want me to test yet?

Thanks.

@dimitri

This comment has been minimized.

Owner

dimitri commented Dec 14, 2015

No luck yet, still on my list, might have to wait another couple of weeks. I intend on delivering the feature on a branch for you to test, and when it's all good, to merge to master!

dimitri added a commit that referenced this issue Dec 30, 2015

Implement PostgreSQL catalogs data structure.
In order to share more code in between the different source types,
finally have a go at the quite horrible mess of anonymous data
structures floating around.

Having a catalog and schema instances not only allows for code cleanup,
but will also allow to implement some bug fixes and wishlist items such
as mapping tables from a schema to another one.

Also, supporting database sources having a notion of "schema" (in
between "catalog" and "table") should get easier, including getting
on-par with MySQL in the MS SQL support (materialized views has been
asked for already).

See #320, #316, #224 for references and a notion of progress being made.

In passing, also clean up the copy-databases methods for database source
types, so that they all use a fetch-metadata generic function and a
prepare-pgsql-database and a complete-pgsql-database generic function.
Actually, a single method does the job here.

The responsibility of introspecting the source to populate the internal
catalog/schema representation is now held by the fetch-metadata generic
function, which in turn will call the specialized versions of
list-all-columns and friends implementations. Once the catalog has been
fetched, an explicit CAST call is then needed before we can continue.

Finally, the fields/columns/transforms slots in the copy objects are
still being used by the operative code, so the internal catalog
representation is only used up to starting the data copy step, where the
copy class instances are then all that's used.

This might be refactored again in a follow-up patch.
@dimitri

This comment has been minimized.

Owner

dimitri commented Dec 30, 2015

Could you test the current sources again? they don't contain the MATERIALIZE VIEWS option yet, but it is now quite easy to consider adding them. For that I will need to figure out how to list views from the MS SQL database... in MySQL we can use the same query for tables and views...

@pvdvreede

This comment has been minimized.

pvdvreede commented Dec 31, 2015

Thanks,

I have tried to run it on current master branch and am getting this error on boot:

An unhandled error condition has been signalled:
   :CREATE-SCHEMAS is not a known argument keyword.

Date/time: 2015-12-31-01:08An unhandled error condition has been signalled:
                              :CREATE-SCHEMAS is not a known argument keyword.

I ran the command: pgloader -v -d myconfig.load

It does not even start loading data - it is happening on boot up. Let me know if you need more info.

@pvdvreede

This comment has been minimized.

pvdvreede commented Dec 31, 2015

Also here is how I got all the views for a database:

USE <yourdb>
GO

select 
  TABLE_CATALOG -- Database name
  , TABLE_SCHEMA -- schema
  , TABLE_NAME -- this is actually the view name
  , VIEW_DEFINITION --  the create view statement
  , CHECK_OPTION
  , IS_UPDATABLE
from INFORMATION_SCHEMA.VIEWS

dimitri added a commit that referenced this issue Jan 1, 2016

Install the :create-schemas option back.
In the previous refactoring patch that option mistakenly went away,
although it is still needed for MS SQL and it is planned to make use of
it in the other source types too...

See #316 for reference.
@dimitri

This comment has been minimized.

Owner

dimitri commented Jan 1, 2016

This commit should fix the immediate problem, I fear some more will need an hand. If your next feedback contains another comparable mistake I will then fire my MS SQL testing environment (that tests only some schema definitions, but that's already that)...

About the views query, can you test the list-all-columns query as found at https://github.com/dimitri/pgloader/blob/master/src/sources/mssql/mssql-schema.lisp#L70 and modify it so that it works with views instead? I will then implement the MATERIALIZE-VIEWS option from there!

@pvdvreede

This comment has been minimized.

pvdvreede commented Jan 5, 2016

Thanks!

Re the query, I have played with that query, and it looks like all I had to do was filter the table_type to view so it seems like you just need to pass in :view to the function (or however lisp works as I dont understand it): https://github.com/dimitri/pgloader/blob/master/src/sources/mssql/mssql-schema.lisp#L72 and that should return all the columns for views.

@pvdvreede

This comment has been minimized.

pvdvreede commented Jan 5, 2016

With the new build on master I am now getting the following error when I boot up:

An unhandled error condition has been signalled:
   Undefined alien: "SSL_CTX_set_default_verify_dir"

Is this something I am doing wrong?

@dimitri

This comment has been minimized.

Owner

dimitri commented Jan 6, 2016

Looks like it failed to find and load libssl now?

@pvdvreede

This comment has been minimized.

pvdvreede commented Jan 6, 2016

It seems weird because it is the same docker image I am running it in when it was working before these changes. Here is the Dockerfile I am using, can you see any reason why it wouldnt work when libssl-dev is already installed?

FROM debian:jessie
RUN apt-get update
RUN apt-get install -y sbcl unzip libsqlite3-dev make curl gawk freetds-dev libzip-dev git libssl-dev
RUN mkdir -p /pgloader
WORKDIR /pgloader
RUN curl -L https://github.com/dimitri/pgloader/archive/master.tar.gz | tar -xvz
RUN cd /pgloader/pgloader-master && make pgloader
ENV PATH /pgloader/pgloader-master/build/bin:$PATH
COPY freetds.conf /etc/
ENV FREETDS /etc/freetds.conf
@dimitri

This comment has been minimized.

Owner

dimitri commented Jan 7, 2016

In the meantime I did add a Dockerfile in pgloader: https://github.com/dimitri/pgloader/blob/master/Dockerfile; feel free to use it! I think you're missing the runtime support for SSL from the package openssl.

@pvdvreede

This comment has been minimized.

pvdvreede commented Jan 8, 2016

hmmm, this is very weird, I have tried adding all your packages in your Dockerfile, including all the ssl ones, as well as updating my docker images caches and debian package caches and it still says:

An unhandled error condition has been signalled:
   Undefined alien: "SSL_CTX_set_default_verify_dir"

; 
; compilation unit aborted
;   caught 1 fatal ERROR condition

Could it be something in my config file, or my connections? Im not sure what else has changed since it was working...

FROM debian:jessie
RUN apt-get update
RUN apt-get install -y \
  wget curl make git bzip2 time libzip-dev libssl1.0.0 openssl \
  patch unzip libsqlite3-dev gawk freetds-dev sbcl
RUN mkdir -p /pgloader
WORKDIR /pgloader
RUN curl -L https://github.com/dimitri/pgloader/archive/master.tar.gz | tar -xvz
RUN cd /pgloader/pgloader-master && make pgloader
ENV PATH /pgloader/pgloader-master/build/bin:$PATH
COPY freetds.conf /etc/
ENV FREETDS /etc/freetds.conf
WORKDIR /app
@dimitri

This comment has been minimized.

Owner

dimitri commented Jan 8, 2016

I just built an image docker from your docker file, just commenting out the two lines about freetds.conf, and it works well here:

# docker run --rm --name pgloader pgloader:316 pgloader --version
pgloader version "3.3.0.50~devel"
compiled with SBCL 1.2.4.debian
@dimitri

This comment has been minimized.

Owner

dimitri commented Jan 10, 2016

See cl-plus-ssl/cl-plus-ssl#33 ; I guess the apt-get update just gave you a newer version of OpenSSL, and maybe the container/Dockerfile caching system makes it that the build-time and run-time versions of libssl1.0.0 are not the same?

@dimitri

This comment has been minimized.

Owner

dimitri commented Aug 7, 2016

Added the “sponsor needed” tag. If you want Materialize View option for MS SQL, consider sponsoring the project thanks to its Moral License available here: http://pgloader.io/pgloader-moral-license.html. Send me an email (see my github's profile) to see about the details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment