Like "head" or "tail" but picks 10 random lines



sampl is a command-line tool for randomly picking a number of lines from
large data files. By default, it picks 10 lines, like the Unix commands
"head" and "tail".

sampl is useful for getting an idea of what's in a data file without
worrying about whether the beginning of the file is representative of the
rest.

sampl is fast because it doesn't scan the entire file(s) but simply picks
lines found at random positions.


Building sampl requires a standard installation of OCaml.

 $ make
 $ make install  # Installation directory defaults to $HOME/bin.

PREFIX and BINDIR are supported, so if you want to install sampl in /usr/local,
just do:

 $ sudo make PREFIX=/usr/local install


To uninstall:

 $ make uninstall


"sampl" is typically used as a replacement for "head" on large data files
whose first records are often not representative of typical lines.

$ ./sampl --help
Usage: ./sampl [OPTIONS] FILE1 [FILE2 ...]

sampl picks 10 random lines from the input files without scanning 
the entire file.


  -i
          Keep outputting random lines indefinitely instead of stopping
          after 10 lines. See also -n.

  -n
          Specify sample size, i.e. the number of lines to pick.
          The default is 10 lines. See also -i.

          Specify seed of the random number generator.
          By default, the seed is initialized randomly from system resources.

          Print sampl version and exit.

  -help  Display this list of options
  --help  Display this list of options

Algorithm for picking a random line

1. A random byte position is chosen in the input files,
   just as if the files were concatenated as one big file.

2. The position is moved forward to the nearest beginning of line.

3. The line is read and printed to stdout.
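
The three steps above can be sketched in a few lines of Python. This is only
an illustration, not sampl's actual OCaml code: the function name
pick_random_line is made up for this example. The sketch also makes two
assumptions that the steps above do not specify: the line containing the
chosen byte is always skipped, and a position inside the last line of a file
wraps around to the first line of that same file.

```python
import os
import random


def pick_random_line(paths, rng=random):
    """Return one line (as bytes) found near a random byte position."""
    sizes = [os.path.getsize(p) for p in paths]
    total = sum(sizes)
    if total == 0:
        raise ValueError("input files are empty")

    # Step 1: choose a byte position as if the files were concatenated,
    # then translate it into (file, offset within that file).
    pos = rng.randrange(total)
    for path, size in zip(paths, sizes):
        if pos < size:
            break
        pos -= size

    with open(path, "rb") as f:
        f.seek(pos)
        # Step 2: move forward to the next beginning of line by
        # discarding the remainder of the line we landed in.
        f.readline()
        # Step 3: read the line that starts here.
        line = f.readline()
        if not line:
            # Assumption: past the last line, wrap to the start of the file.
            f.seek(0)
            line = f.readline()
    return line
```

Only the file sizes and one or two lines are read per pick, so the cost does
not depend on how large the files are.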

This algorithm favors lines that follow long lines: a random byte position
is more likely to fall inside a long line, and the line that gets printed is
the one that comes after it. Therefore, the sample is representative only if
the length of a line is statistically independent of any property of the
line that follows.
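
To make this bias concrete, the following Python sketch computes the chance
of each line being picked in a single draw. It is an illustration only, and
it assumes a simplified model: the line containing the random byte is always
skipped, and the last line wraps around to the first. The function name
pick_probabilities is hypothetical.

```python
from fractions import Fraction


def pick_probabilities(lines):
    """lines: list of bytes objects, each including its trailing newline.

    Line i is printed when the random byte falls inside line i-1, so its
    probability is the length of the preceding line over the total size.
    Index -1 wraps to the last line (an assumption of this sketch).
    """
    total = sum(len(line) for line in lines)
    return [Fraction(len(lines[i - 1]), total) for i in range(len(lines))]


lines = [b"short\n", b"a much longer line of text\n", b"x\n"]
for line, p in zip(lines, pick_probabilities(lines)):
    print(line, float(p))
```

In this example the one-character line "x" is by far the most likely pick,
simply because it follows the longest line.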

Also, the same line can be picked several times over the execution of the