working on csvslice doc
dannguyen committed Nov 9, 2020
1 parent e45b003 commit ef9813a
Showing 7 changed files with 492 additions and 46 deletions.
6 changes: 4 additions & 2 deletions TODOS.md
Original file line number Diff line number Diff line change
@@ -6,8 +6,10 @@

 ### csvslice

-- [ ] refresh usage memory
-- [ ] start on docs
+- do docs; this should be style and template for others
+- [ ] write usage examples
+- [ ] write "compared to"
+- [ ] mention performance issues


 ### csvsed
21 changes: 12 additions & 9 deletions csvmedkit/moreutils/csvslice.py
@@ -35,6 +35,7 @@ def add_arguments(self):
         )

     def calculate_slice_ranges(self) -> typeNoReturn:
+        # TODO: this is ugly spaghetti but it works
         self.slice_ranges: typeList[typeSequence]
         self.slice_lower_bound: typeUnion[
             int, float
@@ -54,18 +55,20 @@ def calculate_slice_ranges(self) -> typeNoReturn:
                 # just a regular index
                 indexes.append(int(rtxt))
             else:
-                i_start, i_end = [int(i) if i else 0 for i in rtxt.split("-")]
-                if i_start and i_end:
-                    if i_end <= i_start:
+                # todo: this could be really cleaned up
+                i_start, i_end = rtxt.split("-")
+                i_start = int(i_start)
+                i_end = None if i_end == "" else int(i_end)
+                if i_end is None:
+                    # implicitly, there is an i_start, even if it's 0
+                    # interpret '9-' as '9 and everything bigger'
+                    self.slice_lower_bound = min(self.slice_lower_bound, i_start)
+                else:
+                    # implicitly, i_start and i_end are both there
+                    if i_end < i_start:
                         raise InvalidRange(f"Invalid range specified: {rtxt}")
                     else:
                         intervals.append(range(i_start, i_end + 1))
-                elif i_end:
-                    # interpret '-42' as: everything from 0 to 42
-                    intervals.append(range(0, i_end + 1))
-                elif i_start:
-                    # interpret '9-' as '9 and everything bigger'
-                    self.slice_lower_bound = min(self.slice_lower_bound, i_start)

         self.slice_ranges = [sorted(indexes)] + intervals

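The range-parsing branch added in the hunk above can be read in isolation as a small pure function. The following is a minimal sketch under stated assumptions, not the project's code: `parse_intervals` is a hypothetical name, `InvalidRange` is a local stand-in for the project's exception of the same name, and the comma-splitting step is assumed because the hunk does not show how `rtxt` values are produced.

```python
from typing import List, Tuple, Union


class InvalidRange(Exception):
    """Local stand-in for the project's InvalidRange exception."""


def parse_intervals(text: str) -> Tuple[List[int], List[range], Union[int, float]]:
    """Parse a string such as '0,2-3,5-' into (indexes, ranges, lower_bound).

    Mirrors the branch logic in the hunk: plain indexes, inclusive
    'start-end' ranges, and open-ended 'start-' ranges.
    """
    indexes: List[int] = []
    intervals: List[range] = []
    lower_bound: Union[int, float] = float("inf")

    # Assumption: interval pieces are separated by commas.
    for rtxt in text.split(","):
        if "-" not in rtxt:
            # just a regular index
            indexes.append(int(rtxt))
            continue

        start_txt, end_txt = rtxt.split("-")
        i_start = int(start_txt)
        if end_txt == "":
            # '9-' means row 9 and everything after it
            lower_bound = min(lower_bound, i_start)
        else:
            i_end = int(end_txt)
            if i_end < i_start:
                raise InvalidRange(f"Invalid range specified: {rtxt}")
            # ranges are inclusive, hence the + 1
            intervals.append(range(i_start, i_end + 1))

    return sorted(indexes), intervals, lower_bound
```

Two behavioral differences from the removed lines follow from the hunk itself. A single-row range such as `3-3` is now accepted, because the comparison changed from `<=` to `<`. A value such as `-42`, which the removed branch read as rows 0 through 42, now reaches `int("")` and raises `ValueError` in this sketch; whether earlier validation in the real command rejects it first is not visible in the hunk.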
193 changes: 180 additions & 13 deletions docs/moreutils/csvslice.rst
@@ -2,9 +2,27 @@
 csvslice
 ********

-:command:`csvslice` TKTK
+:command:`csvslice` is a command for selecting rows by 0-based index and/or inclusive ranges.
+
+
+For example, given ``data.csv`` that contains::
+
+    id,val
+    a,0
+    b,1
+    c,2
+    d,3
+
+
+.. code-block:: shell
+
+    $ csvslice -i 0,2-3 data.csv
+    id,val
+    a,0
+    c,2
+    d,3


 TK TK TK Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod

 .. contents:: Table of contents
@@ -14,8 +32,10 @@ TK TK TK Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusm



-Usage reference
-===============
+Options and flags
+=================
+
+``csvslice`` has only one unique and required option: ``-i/--intervals``


 -i, --intervals <intervals>
@@ -39,19 +59,166 @@ Multiple interval values can be passed into ``-i/--intervals``, e.g.
-Note: this is a required option
+.. note:: This is a required option.




+Usage overview and examples
+===========================
+
+
+These examples refer to data as found in :download:`ids.csv </../examples/ids.csv>`
+
+
+Slicing individual rows by index
+--------------------------------
+
+You can specify rows to be sliced by 0-based index:
+
+
+.. code-block:: sh
+
+    csvslice -i 1 ids.csv
+    id,val
+    1,b
+
+You can also specify a series of individual indexes as a comma-delimited string:
+
+
+.. code-block:: sh
+
+    csvslice -i 0,5 ids.csv
+    id,val
+    0,a
+    5,f
+
+Slicing rows by range
+---------------------
+
+Rows can be specified by using a range syntax: ``start-end``
+
+The range is *inclusive*:
+
+
+.. code-block:: sh
+
+    $ csvslice -i 1-3 ids.csv
+    id,val
+    1,b
+    2,c
+    3,d
+
+Omitting the right-side *end* value returns an open range of values:
+
+.. code-block:: sh
+
+    $ csvslice -i 3- ids.csv
+    id,val
+    3,d
+    4,e
+    5,f
+
+Like indexes, a series of ranges can be specified as a comma-delimited string:
+
+
+.. code-block:: sh
+
+    $ csvslice -i 0-1,3- ids.csv
+    id,val
+    0,a
+    1,b
+    3,d
+    4,e
+    5,f
+
+And you can combine ranges with individual indexes:
+
+.. code-block:: sh
+
+    $ csvslice -i 0,2-3,5 ids.csv
+    id,val
+    0,a
+    2,c
+    3,d
+    5,f
+
+Errors and quirks
+-----------------
+
+
+Even though ``3-1`` is technically a valid range, ``csvslice`` will throw an error if the ``end`` value is smaller than the ``start`` value:
+
+.. code-block:: sh
+
+    $ csvslice -i 3-1 examples/ids.csv
+    InvalidRange: Invalid range specified: 3-1
+
+For the most part, though, ``csvslice`` will allow the user to pass in a messy or otherwise nonsensical value for ``-i/--intervals``.
+
+No matter what order you specify the indexes and ranges, it will always return rows in sequential order:
+
+.. code-block:: sh
+
+    $ csvslice -i 4,0,2 ids.csv
+    id,val
+    0,a
+    2,c
+    4,e
+
+.. code-block:: sh
+
+    $ csvslice -i 4,0-2,3 ids.csv
+    id,val
+    0,a
+    1,b
+    2,c
+    3,d
+    4,e
+
+If you pass in repeated indexes and/or overlapping ranges, ``csvslice`` will still only return the original, sequential data, i.e. it will *not* return duplicates of rows:
+
+.. code-block:: sh
+
+    $ csvslice -i 3,1,3,1,1 ids.csv
+    id,val
+    1,b
+    3,d
+
+.. code-block:: sh
+
+    $ csvslice -i 1,0-2,1-3 ids.csv
+    id,val
+    0,a
+    1,b
+    2,c
+    3,d
+
+.. TODOTODO
+.. And references to non-existent row indexes are also ignored:

-High level overview
-===================
+.. .. code-block:: sh

-Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
-tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
-quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
-consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
-cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
-proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+.. csvslice -i 5,42 ids.csv
+.. id,val
+.. 5,f
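The row-selection behavior documented in `csvslice.rst` above (0-based positions, inclusive ranges, open-ended ranges, file-order output, no duplicate rows) can be modeled in a few lines of Python. This is a sketch of the documented semantics only, not the command's implementation: `slice_rows` and `demo` are hypothetical names, and the sample data is copied from `examples/ids.csv` in this commit.

```python
import csv
import io
from typing import Iterable, Iterator, List, Sequence

# Contents of examples/ids.csv as added in this commit.
IDS_CSV = "id,val\n0,a\n1,b\n2,c\n3,d\n4,e\n5,f\n"


def slice_rows(
    rows: Iterable[List[str]],
    indexes: Iterable[int] = (),
    ranges: Sequence[range] = (),
    lower_bound: float = float("inf"),
) -> Iterator[List[str]]:
    """Yield data rows whose 0-based position was requested, in file order."""
    wanted = set(indexes)  # a set collapses repeated indexes
    for r in ranges:
        wanted.update(r)  # overlapping ranges collapse the same way
    for i, row in enumerate(rows):
        # lower_bound models an open range such as '3-'
        if i in wanted or i >= lower_bound:
            yield row


def demo(**kwargs) -> List[List[str]]:
    """Hypothetical helper: slice the sample data and keep the header row."""
    reader = csv.reader(io.StringIO(IDS_CSV))
    header = next(reader)  # the header is not counted as row 0
    return [header] + list(slice_rows(reader, **kwargs))


# Roughly what `-i 3,1,3,1,1` and `-i 0-1,3-` ask for in the docs above.
print(demo(indexes=[3, 1, 3, 1, 1]))
print(demo(ranges=[range(0, 2)], lower_bound=3))
```

Because selection is a membership test made while reading rows in order, the output order and the absence of duplicates fall out of the design rather than needing a sort or a dedupe pass. In this model an index past the end of the file, as in the commented-out `-i 5,42` example, simply never matches; the docs leave that case as a TODO, so treat it as an assumption about the real command.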
7 changes: 7 additions & 0 deletions examples/ids.csv
@@ -0,0 +1,7 @@
+id,val
+0,a
+1,b
+2,c
+3,d
+4,e
+5,f
1 change: 0 additions & 1 deletion tests/cmk/test_aggy.py
@@ -13,7 +13,6 @@


 class TestParseAggyString(TestCase):
-
     def test_basic(self):
         ag = Aggy.parse_aggy_string("count")
         assert isinstance(ag, Aggy)
73 changes: 64 additions & 9 deletions tests/cmk/test_mixed_arg_util.py
@@ -132,18 +132,73 @@ def test_opts_then_args(self, opts):

 class MixedParams(MixCase):
     """when the options and args are all over the place!!!"""
+
     mixed_params = [
-        [ "-F", "1", "A", "-c", "3,2", DEFAULT_PATH, ],
-        [ "-c", "3,2", "1", "A", DEFAULT_PATH, "-F", ],
-        [ '1', "-c", "3,2", "A", DEFAULT_PATH, "-F", ],
-        [ "1", "-c", "3,2", "A", "-F", DEFAULT_PATH, ],
-        [ "1", "-c", "3,2", "-F", "A", DEFAULT_PATH, ],
-        [ "1", "-F", "A", DEFAULT_PATH, "-c", "3,2", ],
-        [ "1", "A", "-c", "3,2", DEFAULT_PATH, "-F", ],
-        [ "1", "A", "-Fc", "3,2", DEFAULT_PATH, ],
+        [
+            "-F",
+            "1",
+            "A",
+            "-c",
+            "3,2",
+            DEFAULT_PATH,
+        ],
+        [
+            "-c",
+            "3,2",
+            "1",
+            "A",
+            DEFAULT_PATH,
+            "-F",
+        ],
+        [
+            "1",
+            "-c",
+            "3,2",
+            "A",
+            DEFAULT_PATH,
+            "-F",
+        ],
+        [
+            "1",
+            "-c",
+            "3,2",
+            "A",
+            "-F",
+            DEFAULT_PATH,
+        ],
+        [
+            "1",
+            "-c",
+            "3,2",
+            "-F",
+            "A",
+            DEFAULT_PATH,
+        ],
+        [
+            "1",
+            "-F",
+            "A",
+            DEFAULT_PATH,
+            "-c",
+            "3,2",
+        ],
+        [
+            "1",
+            "A",
+            "-c",
+            "3,2",
+            DEFAULT_PATH,
+            "-F",
+        ],
+        [
+            "1",
+            "A",
+            "-Fc",
+            "3,2",
+            DEFAULT_PATH,
+        ],
     ]

-
     @parameterized.expand(mixed_params)
     def test_input_file_path(self, *params):
         """when input_file is an actual file path/object"""
