Selecting Arbitrary Nth Column in Syntax #17
I was working on the mostly trivial case of the FASTA index format (faidx), and, probably because it is so simple, I found a very nice way to select columns by the order in which they appear. The only requirement right now is that the file is tab-delimited.
What it does is match the first column up to the first tab, scope it, then push to
The third column is then selected, scoped and pushed to
etc. This push/pop cycle at each tab can be repeated for any number of columns, which means that .bed, .bedpe, .gtf, .sam, and possibly parts of .vcf can now be 'solved', since we know what type of data is supposed to be in each column.
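To make the idea concrete outside of any particular editor, here is a minimal Python sketch that emulates it: every tab closes the current field and moves to the next column's context, so a field's scope depends only on its position. The five contexts follow the .fai column order (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH); the scope names and the `scope_line` helper are hypothetical placeholders, not the ones in our syntax files. Extra columns fall into a hypothetical "invalid" scope.

```python
# Minimal sketch (hypothetical scope names): emulate a syntax engine that
# switches to the next column context at every tab, so a field's scope
# depends only on its column position.
from typing import List, Tuple

# One context per .fai column: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
FAIDX_CONTEXTS = [
    "entity.name.faidx",       # column 1: sequence name
    "constant.numeric.faidx",  # column 2: sequence length
    "constant.numeric.faidx",  # column 3: byte offset
    "comment.faidx",           # column 4: bases per line
    "comment.faidx",           # column 5: bytes per line
]
INVALID = "invalid.illegal.faidx"  # hypothetical scope for unexpected columns


def _scope(column: int, contexts: List[str]) -> str:
    """Scope for a 0-based column index; columns past the list are invalid."""
    return contexts[column] if column < len(contexts) else INVALID


def scope_line(line: str, contexts: List[str]) -> List[Tuple[str, str]]:
    """Return (field, scope) pairs for one tab-delimited line."""
    column = 0  # the active context; a counter stands in for the pop/push
    field = ""
    result = []
    for ch in line.rstrip("\n"):
        if ch == "\t":
            # Close the current field, then pop this context and push the next.
            result.append((field, _scope(column, contexts)))
            field = ""
            column += 1
        else:
            field += ch
    result.append((field, _scope(column, contexts)))  # last field has no tab
    return result


if __name__ == "__main__":
    for pair in scope_line("chr1\t248956422\t112\t70\t71\n", FAIDX_CONTEXTS):
        print(pair)
```

Note that an empty field (two tabs in a row) still advances the column, which is what a positional format needs; that is one of the edge cases worth checking against the real syntax files.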
Can anyone think of a reason that this won't work or will break on some edge case?
If not, we'll need to rework those syntaxes, as I think this is a more robust approach than trying to select each column by the range of data that could appear in it.
This is really cool.
I think it looks pretty robust as it is. Would it work for files with more than 5 columns, though? We might need to figure out how to encode the 5th column when there is a 6th column. Maybe something along the lines of '(?<=\t[\S]\t)[\S]\t'?
An even simpler version, with an open-ended scope for all columns greater than 5.
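For comparison, here is a small Python sketch of a different technique (not the lookbehind suggested above): anchor at the start of the line and skip exactly N-1 tab-terminated fields, which selects the Nth column regardless of field width. Python's `re` requires lookbehind to be fixed width, so a pattern like `[\S]` inside a lookbehind only ever matches single-character fields. The helpers `nth_column_pattern` and `tail_pattern` are hypothetical names; `tail_pattern` gives an open-ended span from a given column onward, e.g. columns 6 and beyond.

```python
# Minimal sketch: select the Nth tab-delimited column with a line-anchored
# regex instead of a lookbehind. Helper names are hypothetical.
import re


def nth_column_pattern(n: int) -> "re.Pattern[str]":
    """Compile a regex whose group 1 is the n-th (1-based) field."""
    if n < 1:
        raise ValueError("columns are 1-based")
    # Skip exactly n-1 fields (each followed by a tab), then capture the
    # n-th field up to the next tab or end of line.
    return re.compile(r"^(?:[^\t\n]*\t){%d}([^\t\n]*)" % (n - 1))


def tail_pattern(first: int) -> "re.Pattern[str]":
    """Compile a regex whose group 1 spans column `first` to end of line."""
    if first < 1:
        raise ValueError("columns are 1-based")
    # Same skip as above, but the capture keeps going across tabs.
    return re.compile(r"^(?:[^\t\n]*\t){%d}([^\n]*)" % (first - 1))


if __name__ == "__main__":
    line = "c1\tc2\tc3\tc4\tc5\tc6\tc7"
    print(nth_column_pattern(5).match(line).group(1))
    print(repr(tail_pattern(6).match(line).group(1)))
```

A real syntax engine consumes the line token by token rather than re-matching from `^` for every column, so this is only meant to show the counting logic, not a drop-in pattern for the Sublime, gedit, or Vim files.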
Robust Nth Column Selection
I think the same logic could be applied to the gedit and Vim syntaxes as well. Both have Match Start // Match End logic that can be extended in this way. If we figure this out soon, it will simplify our lives greatly.
Maybe read some syntax highlighting files for other complex languages (C, XML, etc.) to learn how other people solved similar problems.
I'd say let's not worry too much about all the color schemes just yet. This was based on bioMonokai for Sublime, which has a dark background. Gedit is based on Kate and has a light background, so it might not work there. The third column is simply the default 'numeric' color; the fourth and fifth are comment-colored.
We're going to have to formalize all the colors and/or settle on one dark and one light theme so they look the same across all the different programs. We can worry about that last; right now the highest priority is getting the syntax files to work reliably in all the different software.