Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BREAKING(encoding/csv): add return type to csv's parse and remove a parse func from args #1724

Merged

Conversation

mitch292
Copy link
Contributor

Resolves #881

Took from a comment in the issue by @dsherret and removed the ability to pass a parse arg to the function which gave it the responsibility of parsing and, optionally, data transformation. Let me know wha you think!

* This is executed on each entry of the header.
* This can be combined with the Parse function of the rows.
*/
parse?: (input: string) => unknown;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be reading too much into the comment thread in #881. To keep the logic consistent I removed the parse function from columns as well. Should this be done?

Now, if the parse is removed from column options maybe the function signature for columns?: string[] | ColumnOptions[] should just be simplified to columns?: string[] as below will evaluate to the same.

parse(data, { columns: ['a', 'b', 'c']});

parse(data, { columns: [{ name: 'a' }, { name: 'b' }, { name: 'c' } ] });

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of removing this

Copy link
Member

@kt3k kt3k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mitch292
Thanks for working on this! Left a few comments. I think we can do a little further better typing

* This is executed on each entry of the header.
* This can be combined with the Parse function of the rows.
*/
parse?: (input: string) => unknown;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of removing this

*/
export async function parse(
input: string | BufReader,
opt: ParseOptions = {
skipFirstRow: false,
},
): Promise<unknown[]> {
): Promise<string[][] | Record<string, unknown>[]> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we do overloading like the below, it would be able to return more correct type depending of the options:

export async function parse(
  input: string | BufReader
): Promise<string[][]>;
export async function parse(
  input: string | BufReader,
  opt: Omit<ParseOptions, "columns" | "skipFirstRow">,
): Promise<string[][]>;
export async function parse(
  input: string | BufReader,
  opt: Omit<ParseOptions, "columns"> & {
    columns: string[] | ColumnOptions[],
  },
): Promise<Record<string, unknown>[]>;
export async function parse(
  input: string | BufReader,
  opt: Omit<ParseOptions, "skipFirstRow"> & {
    skipFirstRow: true,
  },
): Promise<Record<string, unknown>[]>;
export async function parse(
  input: string | BufReader,
  opt: ParseOptions = {
    skipFirstRow: false,
  },
): Promise<string[][] | Record<string, unknown>[]> { ... }

@mitch292
Copy link
Contributor Author

Added some function overloads and updated the readme as well!

Copy link
Member

@dsherret dsherret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Yeah, people can just take the output and put it through their own parse function now in a way that works well with TypeScript. This is nicer than typing everything as unknown.

Copy link
Member

@kt3k kt3k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice work!

@kt3k kt3k added this to the 1.18 milestone Dec 23, 2021
@kt3k kt3k changed the title feat(encoding): Adds return type to csv's parse and remove a parse func from args BREAKING(encoding/csv): add return type to csv's parse and remove a parse func from args Dec 23, 2021
@kt3k kt3k merged commit 373ecdf into denoland:main Jan 20, 2022
@bx2
Copy link

bx2 commented Apr 17, 2022

Hi! This change renders the point of having column options pointless. We do allow to change column names (which is transforming the data) but we do not allow to parse the given column to, for instance, change the type of the field on the fly. I will give you an example - say you are parsing a file with monetary values passed in as micros:

YearsExperience,Salary
10, 23345213000

To get the decimal value you would divide the Salary field by 1000000. With the parse added per column, you can do all the basic transformations while loading the CSV, something like what pandas in python does (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html). Currently, you cannot do that anymore.

Also, having column options just to rename the fields but being required to always specify ALL the fields seems somewhat contrary what you are trying to accomplish.

Now, while I do understand that the basic parser should be just that and only parse, but then let's skip the entire concept of column options completely and we should update the docs for it which still claims that it can parse rows. I can prep a MR for that if you wish.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

std/encoding/csv.ts - Improve parse return type
4 participants