Datashape not Record Type? #580

Closed
mbyim opened this issue Sep 29, 2017 · 3 comments

mbyim commented Sep 29, 2017

I'm trying to use Odo to load a few CSV files into a sqlite3 database, one of which is over 30 GB in size. However, when I try to load the CSVs, the datashape is not recognized as a record type:

AssertionError: datashape must be Record type, got 0 * {msno: string, is_churn: int64}

I tried using discover and resource (as shown in the docs), but that didn't work, and it was unsure of the types (e.g. "?string"). So I then tried hardcoding the dshape like so:

odo(f, db, dshape='var * {msno: string, is_churn: int64}')

But this still didn't fix the assertion error from isrecord().

I also looked on Stack Overflow, but neither of the two related questions had been answered [1][2].

Apologies if this isn't the place for this, but I'm unsure where to turn for an answer after consulting the docs and Stack Overflow/Google.

[1] https://stackoverflow.com/questions/44598799/python-odo-sql-assertionerror-datashape-must-be-record-type-got-0
[2] https://stackoverflow.com/questions/46287776/using-odo-to-load-csv-postgres-on-aws

@makmanalp

@mbyim I hit this too, and the solution was to specify the table name in the URI for the destination, e.g. odo("hdfstore://data.h5::/country_partner_year", "sqlite:///test.sqlite::asdf").
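Applied to the CSV case from the original question, the same fix might look like the sketch below. The helper names `sqlite_table_uri` and `load_csv`, and the file and table names in the usage note, are hypothetical and only for illustration; the one thing taken from this thread is the `sqlite:///<file>::<table>` target form and the `dshape` keyword.

```python
def sqlite_table_uri(db_path, table):
    """Build an odo target URI that names the destination table.

    Hypothetical helper: it only enforces that a table name is present,
    since a bare 'sqlite:///file' target is what triggered the error above.
    """
    if not table:
        raise ValueError("a table name is required, e.g. 'sqlite:///test.sqlite::asdf'")
    return "sqlite:///{}::{}".format(db_path, table)


def load_csv(csv_path, db_path, table, **kwargs):
    """Load a CSV file into a named SQLite table via odo (assumes odo is installed)."""
    # Imported lazily so the URI helper above can be used without odo.
    from odo import odo
    return odo(csv_path, sqlite_table_uri(db_path, table), **kwargs)
```

A call would then be something like `load_csv("train.csv", "test.sqlite", "train", dshape="var * {msno: string, is_churn: int64}")`, where `train.csv` and the `train` table name are placeholders.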

For posterity, I did some digging and the error seems to be happening around here:

ds = kwargs.pop('dshape', None)

Here ds is popped from the dshape key in kwargs, but there appears to be no dshape key, only an expected_dshape one.

@makmanalp

Full backtrace:

In [16]: odo("hdfstore://data.h5::/country_partner_year", "sqlite:///test.sqlite", dshape=ds)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-16-7a5767b4c49e> in <module>()
----> 1 odo("hdfstore://data.h5::/country_partner_year", "sqlite:///test.sqlite", dshape=ds)

/nfs/projects_nobackup/c/cidgrowlab/Mali/intl-atlas-api/env/lib/python3.4/site-packages/odo/odo.py in odo(source, target, **kwargs)
     89     odo.append.append      - Add things onto existing things
     90     """
---> 91     return into(target, source, **kwargs)

/nfs/projects_nobackup/c/cidgrowlab/Mali/intl-atlas-api/env/lib/python3.4/site-packages/multipledispatch/dispatcher.py in __call__(self, *args, **kwargs)
    162             self._cache[types] = func
    163         try:
--> 164             return func(*args, **kwargs)
    165 
    166         except MDNotImplementedError:

/nfs/projects_nobackup/c/cidgrowlab/Mali/intl-atlas-api/env/lib/python3.4/site-packages/odo/into.py in wrapped(*args, **kwargs)
     41             raise TypeError('dshape argument is not an instance of DataShape')
     42         kwargs['dshape'] = dshape
---> 43         return f(*args, **kwargs)
     44     return wrapped
     45 

/nfs/projects_nobackup/c/cidgrowlab/Mali/intl-atlas-api/env/lib/python3.4/site-packages/odo/into.py in into_string_string(a, b, **kwargs)
    147 @validate
    148 def into_string_string(a, b, **kwargs):
--> 149     return into(a, resource(b, **kwargs), **kwargs)
    150 
    151 

/nfs/projects_nobackup/c/cidgrowlab/Mali/intl-atlas-api/env/lib/python3.4/site-packages/multipledispatch/dispatcher.py in __call__(self, *args, **kwargs)
    162             self._cache[types] = func
    163         try:
--> 164             return func(*args, **kwargs)
    165 
    166         except MDNotImplementedError:

/nfs/projects_nobackup/c/cidgrowlab/Mali/intl-atlas-api/env/lib/python3.4/site-packages/odo/into.py in wrapped(*args, **kwargs)
     41             raise TypeError('dshape argument is not an instance of DataShape')
     42         kwargs['dshape'] = dshape
---> 43         return f(*args, **kwargs)
     44     return wrapped
     45 

/nfs/projects_nobackup/c/cidgrowlab/Mali/intl-atlas-api/env/lib/python3.4/site-packages/odo/into.py in into_string(uri, b, dshape, **kwargs)
    140     resource_ds = 0 * dshape.subshape[0] if isdimension(dshape[0]) else dshape
    141 
--> 142     a = resource(uri, dshape=resource_ds, expected_dshape=dshape, **kwargs)
    143     return into(a, b, dshape=dshape, **kwargs)
    144 

/nfs/projects_nobackup/c/cidgrowlab/Mali/intl-atlas-api/env/lib/python3.4/site-packages/odo/regex.py in __call__(self, s, *args, **kwargs)
     89 
     90     def __call__(self, s, *args, **kwargs):
---> 91         return self.dispatch(s)(s, *args, **kwargs)
     92 
     93     @property

/nfs/projects_nobackup/c/cidgrowlab/Mali/intl-atlas-api/env/lib/python3.4/site-packages/odo/backends/sql.py in resource_sql(uri, *args, **kwargs)
    615     if ds:
    616         create_from_datashape(engine, ds, schema=schema,
--> 617                               foreign_keys=foreign_keys)
    618     return engine
    619 

/nfs/projects_nobackup/c/cidgrowlab/Mali/intl-atlas-api/env/lib/python3.4/site-packages/multipledispatch/dispatcher.py in __call__(self, *args, **kwargs)
    162             self._cache[types] = func
    163         try:
--> 164             return func(*args, **kwargs)
    165 
    166         except MDNotImplementedError:

/nfs/projects_nobackup/c/cidgrowlab/Mali/intl-atlas-api/env/lib/python3.4/site-packages/odo/backends/sql.py in create_from_datashape(engine, ds, schema, foreign_keys, primary_key, **kwargs)
    401 def create_from_datashape(engine, ds, schema=None, foreign_keys=None,
    402                           primary_key=None, **kwargs):
--> 403     assert isrecord(ds), 'datashape must be Record type, got %s' % ds
    404     metadata = metadata_of_engine(engine, schema=schema)
    405     for name, sub_ds in ds[0].dict.items():

AssertionError: datashape must be Record type, got 0 * {
  location_id: categorical[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...], type=int64, ordered=False],
  partner_id: categorical[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...], type=int64, ordered=False],
  year: categorical[[1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, ...], type=int64, ordered=False],
  import_value: float64,
  export_value: float64
  }

@mbyim

mbyim commented Oct 3, 2017

Wonderful, that worked for me! I wish I had noticed that detail. I also just realized that the answer is on one of the Stack Overflow questions as well. Thanks for the answer and explanation!

@mbyim mbyim closed this as completed Oct 3, 2017