Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-detect file format better when importing BED or TSV bookmarks #4362

Merged
merged 2 commits into from
Apr 29, 2024

Conversation

cmdcolin
Copy link
Collaborator

this is an alternative to #4360 that retains the TSV format, but tries to make it more explicit on import what file type is being imported. this adds extra UI steps for the user, but (hopefully) the explicitness can potentially avoid issues

I came around to the idea of retaining the TSV because it is indeed a 'bulk format' that exports everything to a single file in a way that BED file does not

@carolinebridge
Copy link
Contributor

carolinebridge commented Apr 29, 2024

Presently, neither import is working for me on this branch. No errors in the console

@cmdcolin
Copy link
Collaborator Author

what do you think about the conceptual aspect of the PR? making the user explicitly choose file type?

@carolinebridge
Copy link
Contributor

Honestly, don't really think it's necessary; you hardly ever see file acceptance like this anymore...if we want more explicit file support we can check the extension on the supplied file and apply the separate functions to a TSV vs BED, and should probably give the user an error message if they give a bad file or if something went wrong.

The only difference between the TSV and BED functions is how they handle the assemblies -- I can see a desire to more cleanly handle these for code readability, but this could be done with an if statement.

I don't think there's a ton of confusion around the file differences, since they come from our system.

@cmdcolin
Copy link
Collaborator Author

cmdcolin commented Apr 29, 2024

The only difference between the TSV and BED functions is how they handle the assemblies -- I can see a desire to more cleanly handle these for code readability, but this could be done with an if statement.

this is sort of why i proposed the PR, because i think this is something that we can't necessarily handle it with an if statement.

the current code automatically imports column 5 as the assembly name, regardless of whether it is tsv or bed. bed is a generic format that exists here https://genome.ucsc.edu/FAQ/FAQformat.html#format1 and if a user imported a bed file not generated by jbrowse, then that column might be score, or some other thing. to me, this means that we should make the user explicitly say they are importing bed, instead of automatically importing column 5 as the assembly name

@carolinebridge
Copy link
Contributor

Perhaps I am not understanding fully --

  1. How is the user specifying their filetype via radio button any different than checking the file extension?
  2. How does specifying the filetype assure their BED is formatted "correctly" ?

@cmdcolin
Copy link
Collaborator Author

  1. It makes it explicit. We could certainly infer BED/TSV from the file extension (file extension isn't always reliable or even exists in the case of urls sometimes, but it can be good most of the time). but, the main thing is just by making the code path different for TSV, we just make it clear that column 5 is assemblyName only for TSV

  2. This PR does not check that it is formatted correctly, but by at least offering the choice, it avoids assuming column 5 is assemblyName. currently, column 5 is assumed to be assemblyName in both the BED and TSV case and this PR separates this so that BED never assumes assemblyName is a column. Note: Ensuring well formatted would be a good added step to look into, especially because when there is an error, it can crash the session even on reload due to the reloading of corrupted bookmarks from localstorage

@cmdcolin
Copy link
Collaborator Author

from on call discussion we'll try to keep the process of auto-detecting the file type from contents instead of explicitly asking user

@cmdcolin cmdcolin merged commit bdb492d into main Apr 29, 2024
10 checks passed
@cmdcolin cmdcolin deleted the new_import_form branch April 29, 2024 17:45
@cmdcolin cmdcolin changed the title Make user explicitly select if they are importing BED or TSV bookmarks Auto-detect file format better when importing BED or TSV bookmarks Apr 29, 2024
@cmdcolin cmdcolin added the bug Something isn't working label Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants