My current use case is the following: I have (large) .ecsv files and I'd like to ingest them into a Postgres database in bulk using the Postgres COPY ... FROM <file> command. However, this requires knowing where the actual CSV data starts.
Since the ECSV format is natively supported by astropy, my first thought was to use the library to parse the file and access the header and the data.
This was not as straightforward as I thought, since the UniversalReadWriteMethods only provide an API adapted to astropy.Table, which I don't need.
I could solve my issue by doing the following:
    from astropy.io.ascii import Ecsv

    reader = Ecsv()
    with open("file.ecsv", "r") as fd:
        lines = fd.readlines()  # store all lines in a list
        reader.header.get_cols(lines)  # set the table and column metadata from the header
        reader.data.data_lines = reader.data.process_lines(lines)  # extract and store the lines representing the data
I can now access the header and the data like:
    reader.data.data_lines
    reader.header.cols
A clean API to access these objects, without having to deal with an astropy.Table, would be useful: it would allow using astropy purely as a parser for .ecsv files.
Nevertheless, I still face an issue. The files I deal with are quite large (several GB) and must be ingested as a stream, as in https://stackoverflow.com/a/51882751:
    csv_file_name = '/home/user/some_file.csv'
    sql = "COPY table_name FROM STDIN DELIMITER '|' CSV HEADER"
    cursor.copy_expert(sql, open(csv_file_name, "r"))
Therefore I am actually wondering if it would be possible to implement the following API:
    csv_filename = '/home/user/some_file.csv'
    sql = "COPY table_name FROM STDIN DELIMITER '|' CSV HEADER"
    reader = Ecsv()
    with open(csv_filename, "r") as fd:
        # scan until end of header (seek points to beginning of data)
        reader.header.get_cols(fd)  # --> this raises an issue
        # Ingest the data without storing entire file in memory
        cursor.copy_expert(sql, fd)
The first example (reading header and data through the reader) would then work in the same way:
    reader = Ecsv()
    with open("file.ecsv", "r") as fd:
        reader.header.get_cols(fd)  # --> this raises an issue
        reader.data.data_lines = reader.data.process_lines(fd)
The error raised is the following:
    astropy.io.ascii.core.InconsistentTableError: column names from ECSV header [<colnames>] do not match names from header line of CSV data [<first-data-row>]
It is not clear to me why this error is raised. The file pointer seems to be one line ahead.
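As a possible interim workaround that does not go through astropy at all, the stream can be positioned by hand. The following is only a sketch, assuming that every ECSV header line starts with `#` and that the first other non-blank line is the CSV column-name line. The helper name `seek_to_csv_header` and the sample content are made up for illustration.

```python
import io


def seek_to_csv_header(fd, comment="#"):
    """Advance a text stream past the leading ECSV comment block.

    Hypothetical helper: returns the skipped lines and leaves `fd`
    positioned at the first non-blank line not starting with `comment`.
    """
    header_lines = []
    while True:
        pos = fd.tell()        # remember where this line starts
        line = fd.readline()   # readline() keeps tell()/seek() usable
        if not line:           # end of file: no CSV part found
            break
        if line.strip() and not line.startswith(comment):
            fd.seek(pos)       # rewind to the start of the CSV header line
            break
        header_lines.append(line)
    return header_lines


# Small in-memory stand-in for a real .ecsv file.
sample = io.StringIO(
    "# %ECSV 1.0\n"
    "# ---\n"
    "# datatype:\n"
    "# - {name: a, datatype: int64}\n"
    "# - {name: b, datatype: int64}\n"
    "a b\n"
    "1 2\n"
    "3 4\n"
)
skipped = seek_to_csv_header(sample)
remaining = sample.read()  # what a consumer such as copy_expert would see
```

With a real file, the handle could be passed to `cursor.copy_expert(sql, fd)` after the call instead of calling `read()`. The `CSV HEADER` option would then skip the column-name line, and the `DELIMITER` would have to match the one used in the ECSV file.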
    # Read the first non-commented line of table and split to get the CSV
    # header column names. This is essentially what the Basic reader does.
    header_line = next(super().process_lines(raw_lines))
    header_names = next(self.splitter([header_line]))

    # Check for consistency of the ECSV vs. CSV header column names
    if header_names != self.names:
        raise core.InconsistentTableError('column names from ECSV header {} do not '
                                          'match names from header line of CSV data {}'
                                          .format(self.names, header_names))
It is not clear to me why header_line needs a next here, and this seems to be the reason the error is raised: with streamed data (raw_lines is a file object rather than a list), it picks up the first data line instead of the header line.
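One possible explanation is that a file object is a one-pass iterator, whereas a list restarts on every pass. The sketch below uses plain Python, not astropy's code, and assumes that header parsing first makes a pass over the lines that stops at the first non-comment line. The functions `read_comments` and `first_non_comment` are hypothetical stand-ins for those two passes.

```python
import io

text = "# header\na b\n1 2\n3 4\n"


def read_comments(lines):
    """Collect leading '#' lines; stop at the first line that is not a comment."""
    out = []
    for line in lines:
        if not line.startswith("#"):
            break  # this line has already been taken from the iterator
        out.append(line)
    return out


def first_non_comment(lines):
    """Return the first line that does not start with '#'."""
    return next(line for line in lines if not line.startswith("#"))


# A list restarts from the beginning on every pass.
as_list = text.splitlines(keepends=True)
read_comments(as_list)
from_list = first_non_comment(as_list)

# A file-like object continues where the previous pass stopped.
as_stream = io.StringIO(text)
read_comments(as_stream)
from_stream = first_non_comment(as_stream)

print(repr(from_list))    # the column-name line
print(repr(from_stream))  # the first data row
```

Under that assumption the `next(...)` call itself is not at fault: the first pass has already consumed the column-name line from the stream, so the second pass starts one line too late.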