Very simple, somewhat stoned tokenizer for teaching purposes. Current version 1.0, both for this repo and for the pip-installable version.

Behaviorally inspired by the early versions of the `easyinput` module; shares with it some similar aims, but not the aim of conceptual consistency with C/C++. A separate, different evolution of `easyinput` is `yogi`.
The usual incantation should work: `pip install pytokr` or, in case you already have an earlier `pytokr`, `pip install --upgrade pytokr` (maybe with either `sudo` or `--user` or within a virtual environment).

If that does not work, download or clone the repo, then put the `pytokr` folder where Python can see it from wherever you want to use it.
Finds items (simple tokens, white-space separated) in a string-based iterable such as `stdin` (default). Ends of line are counted as white space but are otherwise ignored.
Simplest usage is:

```python
from pytokr import item
```

Then call `item()` to keep retrieving white-space-separated items from `stdin`. In case no items remain, a custom `EndOfDataError` exception will be raised. Note that, as white space is ignored, including ends of line, in case only white space remains then the program is at end of data. The outcomes are `str`: casting them into `int` or `float` or whatever, if convenient, falls upon the caller.
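For instance, a minimal sketch that reads two items and adds them as integers (the variable names are made up here):

```python
from pytokr import item

# each call to item() yields the next white-space-separated token as a str
a = int(item())
b = int(item())
print(a + b)
```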
Of course you can assign to the function a different name at import time by using a standard `as` clause.
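For instance (the name `read` is an arbitrary choice):

```python
from pytokr import item as read

print(read())  # same function, just renamed at import time
```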
Alternatively, you may import an iterator on the whole contents of `stdin`:

```python
from pytokr import items
```

It is most naturally employed in a `for` loop:

```python
for itm in items():
```

Then, the iterator gracefully stops at end of data and does not raise the `EndOfDataError` exception. Again the renaming option applies, of course, and again ends of line are ignored as white space.
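For example, a minimal sketch that echoes every item on `stdin`, one per line (purely illustrative):

```python
from pytokr import items

# the loop ends quietly at end of data, no exception handling needed
for itm in items():
    print(itm)
```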
In case you import both, they will interact naturally: the individual `item()` function can be called inside a `for` loop on the iterator, provided there is still at least one item not yet read. That call advances the iteration, so the next item provided by the loop will be the one following those consumed inside the loop body. Briefly: both advance the same iterator.
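A sketch of that interaction, pairing up items (it assumes the input holds an even number of items):

```python
from pytokr import item, items

for first in items():
    second = item()   # consumes the item the loop would have seen next
    print(first, second)
```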
Alternatively, import the function that creates the reading functions:

```python
from pytokr import pytokr
```

Call then `pytokr` to obtain the tokenizer function; give it whatever name you see fit, say, `item`:

```python
item = pytokr()
```

If a different source of items is desired, say `source` (e.g. a file just `open`'ed or a list of strings), simply pass it on:

```python
item = pytokr(source)
```
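For instance, a sketch using a list of strings as the source:

```python
from pytokr import pytokr

item = pytokr(["10 20", "30"])  # any string-based iterable works as source
print(int(item()) + int(item()) + int(item()))  # prints 60
```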
In either case, a second output can be requested, namely, an iterator over the items, say you want to name it `items`:

```python
item, items = pytokr(iter = True)
```

(such a call would accept as well a `source` as first parameter). Then you can run `for itm in items():` or make up a `ls = list(items())` and, with some care, avoid the dependence on the `EndOfDataError` exception. Both combine naturally as explained above.
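For example, a sketch that grabs all items from an already opened file (the filename is hypothetical):

```python
from pytokr import pytokr

with open("data.txt") as source:           # hypothetical input file
    item, items = pytokr(source, iter = True)
    tokens = list(items())                 # all items, no EndOfDataError involved
print(tokens)
```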
Also `from pytokr import __version__` works as expected.
Based on Jutge problem P29448, Correct Dates (and removing spoilers):

```python
from pytokr import pytokr
item, items = pytokr(iter = True)
# alternative: from pytokr import item, items
for d in items():
    m, y = item(), item()
    if correct_date(int(d), int(m), int(y)):
        print("Correct Date")
    else:
        print("Incorrect Date")
```
The import of `item` and `items` has gone through several deprecation and undeprecation stages. They are currently undeprecated and can be used normally. Please upgrade to the most recent version of `pytokr` and check the descriptions above.
The function `make_tokr` from earlier versions stays deprecated. If employed on version 1.0 it will still work, but will print a deprecation message on `stderr`.