Extraneous spaces and other problems in CSV file #29

Closed
weaverba137 opened this issue Feb 28, 2020 · 5 comments

Comments

@weaverba137
Member

When I save a CSV file, it contains many extra spaces. My understanding is that this is not standard CSV. The extraneous spaces could also cause problems when converting to other formats.

TargetID , ExpID , Spec version , Redrock version , Redrock spectype , Redrock z , VI scanner , VI class , VI issue , VI z , VI spectype , VI comment
288230398549299571 , -1 , 0 , 0 , STAR , 0.000 , BAW , 4 , -- , -- , -- , Comment
288230398549299488 , -1 , 0 , 0 , STAR , 0.000 , BAW , 4 , -- , -- , -- , Comment
288230398549299583 , -1 , 0 , 0 , STAR , -0.000 , BAW , 4 , -- , -- , -- , Foo
288230398549299522 , -1 , 0 , 0 , STAR , 0.000 , BAW , 4 , -- , -- , -- ,  bar
288230398549299554 , -1 , 0 , 0 , STAR , -0.000 , BAW , 4 , -- , -- , -- ,  baz
288230398549299531 , -1 , 0 , 0 , STAR , 0.000 , BAW , 1 , R , 0.1 , GALAXY ,  this is a test
288230398549299577 , -1 , 0 , 0 , STAR , 0.000 , BAW , 4 , -- , -- , -- ,  foo
288230398549299551 , -1 , 0 , 0 , STAR , 0.000 , BAW , 4 , -- , -- , -- ,  foo
288230398549299465 , -1 , 0 , 0 , STAR , -0.000 , BAW , 3 , -- , -- , -- , foo bar
288230398549299470 , -1 , 0 , 0 , STAR , 0.000 , BAW , 4 , -- , -- , -- ,  foo

Also note that the redshift (Redrock z) in this example is not stored with enough significant digits.
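
To illustrate why the padding matters, here is a minimal sketch using Python's standard `csv` module. The shortened header, the helper `write_rows`, and the redshift value `z` are assumptions made for illustration; this is not the code that produces the VI files.

```python
import csv
import io

# One row in the spaced layout reported above (shortened to four columns).
legacy = (
    "TargetID , Redrock spectype , Redrock z , VI comment\n"
    "288230398549299571 , STAR , 0.000 , Comment\n"
)

# A plain csv.reader keeps the padding as part of every value,
# so raw_rows[1][1] is 'STAR' with a space on each side.
raw_rows = list(csv.reader(io.StringIO(legacy)))

# Stripping each field recovers the intended values.
clean_rows = [[field.strip() for field in row] for row in raw_rows]


def write_rows(rows):
    """Hypothetical helper: serialize rows with bare comma separators."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Formatting the redshift explicitly controls how many digits are stored.
z = 0.000123456  # made-up value, for illustration only
example = write_rows(
    [["TargetID", "Redrock_z"], ["288230398549299571", f"{z:.6f}"]]
)
```

Any reader that compares values exactly (for example `spectype == "STAR"`) would fail on the padded fields unless it strips them first.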

@armengau
Collaborator

armengau commented Mar 9, 2020

@weaverba137 Thanks for these issues/suggestions. I have not changed the CSV file format yet, since there are other priorities and people are currently VIing miniv data, but I will do so soon.

Regarding the CSV file format, here are two more issues/suggestions from Anand:

  • maybe add deltaChi2 in the output CSV file
  • avoid names with spaces, e.g.: "Spec version" -> "Spec_version" (for example in topcat it makes life easier).

@armengau
Collaborator

armengau commented May 8, 2020

Summary of possible modifications to VI csv file format, to be vetted by the VI leads:

  • rm extra spaces around comma separators: " , " --> ","
  • header: replace spaces in field names e.g.: "Spec version" -> "Spec_version" (Anand's suggestion, better for topcat)
  • rm "--" for fields with no record ? eg. "bla,,5.667," instead of "bla , -- , 5.667 , --"
  • VI comments: handle some special characters differently, so that comments are easier to read. Right now " is recorded as "" and , is recorded as ",". I don't think there is a single, clear CSV rule for that.
    • See emails in desi-data
    • prohibit (remove) non-ascii characters, convert lambda and angstrom.
    • commas , could be simply replaced by semicolons ;
    • the full VI comment field could be placed into """ """ ?
  • Possible other fields to be included:
    • rr template version (not the same as rr version)
    • rr deltachi2 (Anand) (which one?: the absolute one or the one relative to same-template fit)
    • metadata: night, tile ; other ?
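
The first three items in this list can be sketched as a small conversion from the current layout to the proposed one. The function names below are hypothetical, and the split on commas is deliberately naive: it assumes no comment contains a comma.

```python
def normalize_header(name):
    """Strip padding and replace inner spaces with underscores."""
    return name.strip().replace(" ", "_")


def normalize_value(value):
    """Strip padding and turn the '--' placeholder into an empty field."""
    value = value.strip()
    return "" if value == "--" else value


def convert_line(line, is_header=False):
    """Convert one legacy ' , '-separated line to the proposed layout.

    Naive on purpose: splits on every comma, so it is only valid for
    lines whose comments contain no commas.
    """
    normalize = normalize_header if is_header else normalize_value
    return ",".join(normalize(field) for field in line.split(","))


header = convert_line("TargetID , Spec version , VI z", is_header=True)
row = convert_line("288230398549299571 , 0 , --")
```

With these inputs, `header` becomes `TargetID,Spec_version,VI_z` and `row` ends with an empty last field instead of `--`.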

@weaverba137
Member Author

This sounds good, but I have one other suggestion. As far as I know, protecting individual commas with quotes is not standard. Instead, the entire field containing the comma should be quoted. For example, if I enter `this, then that`, it would be rendered in the CSV file as `...,123.4,"this, then that",-9999.0,...`.

Also, I think quotation marks could be protected by backslashes, though this should be tested. If I enter `this looks "OK" to me`, it would be rendered as `...,"this looks \"OK\" to me",...`.

Are people opening these CSV files with anything other than Python? If not, using a more advanced form of CSV might be the way to go here, such as Astropy's ECSV.
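
One way to test the quoting suggestions above is with Python's standard `csv` module. The sketch below is only an experiment, not the tool's implementation; `to_csv_line` and `from_csv_line` are hypothetical helpers. By default the writer quotes the whole field when it contains a comma and doubles embedded quotation marks; backslash escaping is available through the `doublequote=False` and `escapechar` options.

```python
import csv
import io


def to_csv_line(row, **writer_options):
    """Hypothetical helper: serialize one row and return it without the newline."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n", **writer_options).writerow(row)
    return buffer.getvalue().rstrip("\n")


def from_csv_line(line, **reader_options):
    """Hypothetical helper: parse one CSV line back into a list of fields."""
    return next(csv.reader([line], **reader_options))


row = ["123.4", "this, then that", 'this looks "OK" to me']

# Default dialect: the field with a comma is quoted as a whole,
# and embedded quotation marks are doubled.
default_line = to_csv_line(row)

# Backslash variant: embedded quotation marks are preceded by a backslash.
backslash_options = {"doublequote": False, "escapechar": "\\"}
backslash_line = to_csv_line(row, **backslash_options)

# Both variants round-trip when read back with matching options.
assert from_csv_line(default_line) == row
assert from_csv_line(backslash_line, **backslash_options) == row
```

The backslash form only round-trips when the reader is given the same options, so any other software opening the file would need to be configured accordingly.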

armengau added a commit that referenced this issue May 21, 2020
…ved. New fields NIGHT TILEID Template_version. TARGETID renamed (also in widget). Fields in header have underscores. VI comments: ',' replaced automatically by ';'. Non ascii chars replaced by '?' except for angstroms, alpha beta gamma delta lambda (replaced automatically by plain ascii equivalents).
@armengau
Collaborator

Changes implemented, currently in the 'disp-models' branch. From the git log:

  • Extra spaces and '--' removed.
  • New fields: NIGHT TILEID Template_version.
  • TARGETID renamed (also in widget).
  • Fields in header have underscores.
  • VI comments:
    • ',' replaced automatically by ';'.
    • Non ascii chars replaced by '?' except for angstroms, alpha beta gamma delta lambda (replaced automatically by plain ascii equivalents).
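
As a rough illustration of the comment rules listed above, the sketch below shows one way they could be applied. It is not the code from the commit: the function name `sanitize_comment` and the exact replacement strings in `ASCII_EQUIVALENTS` are assumptions.

```python
# Hypothetical mapping; the replacement strings actually used are not
# shown in this thread.
ASCII_EQUIVALENTS = {
    "\u00c5": "Angstrom",  # Latin capital letter A with ring above
    "\u212b": "Angstrom",  # angstrom sign
    "\u03b1": "alpha",
    "\u03b2": "beta",
    "\u03b3": "gamma",
    "\u03b4": "delta",
    "\u03bb": "lambda",
}


def sanitize_comment(comment):
    """Apply the comment rules: ',' -> ';', known symbols -> ASCII names,
    any other non-ASCII character -> '?'."""
    pieces = []
    for char in comment.replace(",", ";"):
        if char in ASCII_EQUIVALENTS:
            pieces.append(ASCII_EQUIVALENTS[char])
        elif ord(char) < 128:
            pieces.append(char)
        else:
            pieces.append("?")
    return "".join(pieces)


cleaned = sanitize_comment("H\u03b1 at 6563 \u00c5, broad")
```

Because commas never survive in the comment, the field can be written without any quoting, at the cost of not being able to recover the original punctuation.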

@armengau
Collaborator

Merged to master

2 participants