ascii_plots

These are just some silly scripts which I like to have on my command line when I'm doing quick and dirty data analysis and can't be bothered to start R. They all read their data from a pipe, typically downstream of awk, cut, etc.

They all handle non-numeric data as NA.

Author

Daniel Zerbino

Use

For a quick demo, run:

> sh demo.sh

cor - correlation:

Reads two columns from stdin and prints their Pearson correlation.

> cut -f1,2 test.tsv | ./cor
0.987425
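
For readers who want to see what such a correlation involves, here is a minimal awk sketch. It is not the actual cor script: the function name pearson, the tab delimiter and the rule of skipping rows with a non-numeric field are all assumptions made for illustration.

```shell
# Hypothetical helper, NOT the real cor script.
# Reads two tab-separated columns on stdin and prints Pearson's r.
# Assumption: a row is skipped when either field is not a plain number.
pearson() {
    awk -F'\t' '
        function isnum(v) {
            return v ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/
        }
        isnum($1) && isnum($2) {
            n++
            sx += $1; sy += $2
            sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2
        }
        END {
            vx = n * sxx - sx * sx      # n^2 times the variance of x
            vy = n * syy - sy * sy      # n^2 times the variance of y
            if (n < 2 || vx <= 0 || vy <= 0) print "NA"
            else printf "%g\n", (n * sxy - sx * sy) / sqrt(vx * vy)
        }'
}
```

It would be called the same way as cor:

> cut -f1,2 test.tsv | pearson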

summary:

Reads a tab-delimited file from stdin, with or without headers (anything numeric is treated as data, anything else as NA), and prints basic stats for each column: position, header (or first value), min, mean, max and sum.

> cat test.tsv | ./summary
COL |           1 |          2 |          3 |          4 |
NAM |           A |          B |          C |          D |
MIN |           7 |          5 |          9 |          0 |
AVG |           3 |       1.75 |       3.75 |          0 |
MAX |           7 |          5 |          9 |          0 |
SUM |          12 |          7 |         15 |          0 |
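
As a rough illustration of per-column statistics with NA handling, the awk sketch below prints one line per column (position, min, mean, max, sum). It is not the summary script itself: the name colstats and the output layout are hypothetical, and it assumes the mean is taken over numeric cells only, so its numbers need not match the table above.

```shell
# Hypothetical helper, NOT the real summary script.
# Prints "column  min  mean  max  sum" for each tab-separated column on stdin.
# Assumption: non-numeric cells (headers included) are ignored, and the
# mean is taken over the numeric cells only.
colstats() {
    awk -F'\t' '
        function isnum(v) {
            return v ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/
        }
        {
            if (NF > ncol) ncol = NF
            for (i = 1; i <= NF; i++) {
                if (!isnum($i)) continue
                v = $i + 0
                if (!(i in n) || v < min[i]) min[i] = v
                if (!(i in n) || v > max[i]) max[i] = v
                n[i]++
                sum[i] += v
            }
        }
        END {
            for (i = 1; i <= ncol; i++) {
                if (i in n)
                    printf "%d\t%g\t%g\t%g\t%g\n", i, min[i], sum[i] / n[i], max[i], sum[i]
                else
                    printf "%d\tNA\tNA\tNA\tNA\n", i
            }
        }'
}
```

> cat test.tsv | colstats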

hist - histogram:

Either:

  • Takes in a single column of numbers and displays a histogram.
  • Takes in two columns of numbers and displays a weighted histogram, treating the first column as values and the second as weights.

The bin size is 1 by default, but can be specified as the first argument:

    > awk 'func r(){return sqrt(-2*log(rand()))*cos(6.2831853*rand())}BEGIN{for(i=0;i<10000;i++)s=s"\n"0.5*r();print s}' | ./hist 0.1
      -1.8 |     0.0001 |          1 | 
      -1.6 |     0.0004 |          4 | 
      -1.5 |     0.0005 |          5 | 
      -1.4 |     0.0007 |          7 | 
      -1.3 |     0.0018 |         18 | 
      -1.2 |      0.004 |         40 | **
      -1.1 |     0.0058 |         58 | **
        -1 |     0.0085 |         85 | ****
      -0.9 |     0.0126 |        126 | ******
      -0.8 |     0.0197 |        197 | **********
      -0.7 |     0.0285 |        285 | **************
      -0.6 |     0.0349 |        349 | *****************
      -0.5 |     0.0422 |        422 | *********************
      -0.4 |     0.0532 |        532 | ***************************
      -0.3 |     0.0634 |        634 | ********************************
      -0.2 |     0.0681 |        681 | **********************************
      -0.1 |     0.0756 |        756 | **************************************
         0 |     0.1557 |       1557 | ********************************************************************************
       0.1 |     0.0743 |        743 | **************************************
       0.2 |     0.0698 |        698 | ***********************************
       0.3 |     0.0628 |        628 | ********************************
       0.4 |     0.0546 |        546 | ****************************
       0.5 |      0.042 |        420 | *********************
       0.6 |     0.0351 |        351 | ******************
       0.7 |     0.0252 |        252 | ************
       0.8 |     0.0208 |        208 | **********
       0.9 |      0.014 |        140 | *******
         1 |     0.0104 |        104 | *****
       1.1 |     0.0065 |         65 | ***
       1.2 |     0.0035 |         35 | *
       1.3 |      0.002 |         20 | *
       1.4 |     0.0014 |         14 | 
       1.5 |     0.0009 |          9 | 
       1.6 |     0.0005 |          5 | 
       1.7 |     0.0001 |          1 | 
       1.8 |     0.0001 |          1 | 
       1.9 |     0.0002 |          2 | 
         2 |     0.0001 |          1 | 
TOTAL      |          1 |      10000 |
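
The binning step can be sketched in awk as below. This is not the hist script: the name bincount is hypothetical, and it assumes each value goes into the bin that starts at the nearest multiple of the bin size below it (floor binning), which may differ from what hist does around zero. An optional second column is used as a weight, as described above.

```shell
# Hypothetical helper, NOT the real hist script.
# usage: bincount [bin_size]      (bin_size must be positive, default 1)
# Prints "bin_start  fraction  count" per non-empty bin, sorted by bin.
# Assumption: floor binning; a numeric second column is used as a weight.
bincount() {
    awk -F'\t' -v w="${1:-1}" '
        function isnum(v) {
            return v ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/
        }
        isnum($1) {
            b = $1 / w
            k = int(b)              # int() truncates toward zero...
            if (k > b) k--          # ...so step down for negative values
            wt = (NF > 1 && isnum($2)) ? $2 : 1
            count[k] += wt
            total += wt
        }
        END {
            for (k in count)
                printf "%g\t%g\t%g\n", k * w, count[k] / total, count[k]
        }' | LC_ALL=C sort -n
}
```

> cut -f1 test.tsv | bincount 0.5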

bars:

Like histogram, but for categorical data:

> cut -f1 test.tsv | ./bars

	 1.0 |       0.25 |          1 | ********************************************************************************
	 4.0 |       0.25 |          1 | ********************************************************************************
	 7.0 |       0.25 |          1 | ********************************************************************************
	   A |       0.25 |          1 | ********************************************************************************
TOTAL            |          1 |          4 |
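
A minimal sketch of the counting and bar drawing, under stated assumptions: the name catbars is hypothetical, and the bar length rule (the largest category fills the full width, the others are scaled and rounded down) is only inferred from the sample outputs in this README, not taken from the bars script.

```shell
# Hypothetical helper, NOT the real bars script.
# usage: catbars [width]          (default width 80)
# Prints "category  fraction  count  bar" for the first column of stdin.
# Assumption: the largest category gets a bar of `width` stars and the
# others are scaled proportionally, rounding down.
catbars() {
    awk -F'\t' -v width="${1:-80}" '
        {
            count[$1]++
            total++
            if (count[$1] > max) max = count[$1]
        }
        END {
            for (k in count) {
                bar = ""
                stars = int(width * count[k] / max)
                for (i = 0; i < stars; i++) bar = bar "*"
                printf "%s\t%g\t%d\t%s\n", k, count[k] / total, count[k], bar
            }
        }' | LC_ALL=C sort
}
```

> cut -f1 test.tsv | catbars 40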

scatter:

Takes in two columns of numbers and displays a sketchy ASCII density plot.

> awk 'func r(){return sqrt(-2*log(rand()))*cos(6.2831853*rand())}BEGIN{for(i=0;i<10000;i++)s=s"\n"0.5*r()"\t"0.5*r();print s}' | ./scatter
---------------------------------------------------------------------------------------------------------------------- 2.00418
|                                       '              '                                                             |
|                                                    '                                                               |
|                                               '                     '                                              |
|                              '       '         '                                                  '                |
|                                          '  '     '   `  ''        `'   '    '                                     |
|                                ' '  '  '   ' '    ' `    '     '      '     '                                      |
|                          '     '     '''  '' '   '    ' ''          '  ' '             '                           |
|                              '     '    '' '   '   ,`' ,'`' ` '''   ` `' '  '            ''   '                    |
|                     '      '' ` `' ' '`'' '' '  `'   ''``''`''`'';' `,  ''  ''    `' '                             |
|                         ,     ' '''''  ````'' '`,`!' ,,;` `;`'> ``'```'``,'`  '`' `'   ' `                         |
|                    '   '' '''`,'`; ' '``',`````;!!,,;'`;,;''! ,;,!,;'';'`, '` `''  `        '    '                 |
|                   '    '    ,'`'`,,`,;',``,;;,`!~!;{,,!'!!!,!>!;`!~,,;'`,';`''`'' '''` ' `   '              '      |
|                 ' `'   ` `' ',;, `' ',`!~`!; !']~{-!{~~]>;!!-!{!;';!!~;;;' !'`;`','''``'     `  '                  |
|                  `''    '''';'',;,;'>>)>,)~;-{]|~-j~]t~)]{t~)~!->]-!>!;,!`>`,`, ,;;,'`',''                         |
|          '   ''' ''',' `, `!``;;;,~]];!!>]!)-{]vt|vj]n-~-{|j,)>-~n!]~{~!!>'>!;`!`,`` , ` '`   `        '           |
|               '' ' `' '`'',`;;;~!-{`~~|>{~{]v>{|)XX|v-~{otnjCtC)v;{tX)]->;)>>,`!],`;;  `;` ; '`      '      '      |
|          '  '  '`''  , '`',`!!,!)>!-~{{j|,j|-0njC||0U0CoXn|]o-tUjC{|U)|>-]->{~{!!,;' !`  ,`'  ''  ' ' ' ' '        |
|             '  `  `  ` !;- ;>,-;|>{~>!{]|-U]j]XUU00kjXkj00)|jtXjjttnv]n-`]{;{!~!~!-!';,''`'  '' ' ` '              |
|   '       '   '   ' ;,``'`'`,!>!]])-|t{t|{-U{)|ntCtCnkvkvqCXZUC&Z~0||)-!|{{|;>),]!~; ,`,`';'  ''' ''      '        |
|      ` '   '   ,`'`''``;'`', ,;{)]{{tn)]]{UvdjCZv#-jtC0tZtC|UZ0jUUUC{{0]|!>]{>~~~~{!;`,; ';;;' ' '  '             '|
|            '  '  ' ,''``;`~,!];;>!;!|-)]CZvt{UXZqCC$ZnZ|$nokXCUUkjXt]--X|t])>`]!->``;;`>;`'' ''                '   |
|      ''   '`''''' '', `,'>!)!!->)]|))|UtX)|tnvt{|nZvqU0nqdC0#{v)Uqnt|{--t{)])!;,>`-,,;,';`''``   ' `               |
|         '      ,' ,'``,;'';!,>>]~~~>~]>]q)|0vnvjjCZvqvnqtX0n)qttvv{X)]t~|]j!),]' !;`,``'``'' ''''   '              |
|           '  ' '` ````!,``'>>'{;>~;!;~{|j!)]nZtXnj|U0Udtd0njXvjj){nn>]]){{`~;>>- ,;!, `,'`,'     '       '         |
|   '  '             '''`'` ;`>,>;,;-,~~`-)>]t~t|{-n)t{{tnU]jXUv]n-~],;-t;~;!{!~>,`;,`,`;`'` '` `''      ''          |
|'        '    `'     '' '',,`,`;~;,,,;!-|~-tj-])v!>|]t--j)>Uv]>-~]~!;;!,~-,>'!',,'``''' ',`' ` ' '     '  '         |
|              '  `   `'>   ',;```!;~;|!~;~->>-,]]~>-;~]))]!>!-`-)-,]-{~,;,`;`',',`'; ,`,'`''  '   '                 |
|                     ' ' `''' ''`'',,,;;!!-,{`-];>~-,>-~~>;{!)];`;--,>`;!,;`;;`'`; ` ''''   '      `                |
|                  '        ' ' '''` `` `';`';`,;>!,!~!~,~-;>~;!!``!,>!`',!`,`'`,, '' ,', ''''                       |
|                '          '''  ' `'`; ''`;`;``> ';>;,!>'''>!>'`;;;;` `'' `' '`''     '           '                 |
|         '                `  ``'  `'`  ' '''`!`;`!'`,`'` ``;;'!` `! ,'`;',` '' ' '                                  |
|           '             ' ' ''   ' `  ,' ` `', ,'`''';'`'``''' ''''```'     `, '' `''   '                          |
|                            '     ''   ' ''   ' `' ''' ` ' `', '''' ' ` '   '''                '                    |
|                       ''               ' `   ' `   `'   ' '  `'      '         ' '                                 |
|                        '        '   '      , '   ' '     '             '           '                               |
|                                    '              ''       ' '      `                 '                            |
|                                                         '  '              '                                        |
|                                                                 '                                                  |
---------------------------------------------------------------------------------------------------------------------- -1.7106
-1.826500                                                                                                     1.910550
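
One way to build such a density plot is to bin the points into a character grid and pick a glyph per cell according to its count. The awk sketch below does that. It is not the scatter script: the name density, the default grid size and the six-glyph ramp are hypothetical choices, and the real script clearly uses a richer set of characters.

```shell
# Hypothetical helper, NOT the real scatter script.
# usage: density [cols] [rows]    (default 60 x 20)
# Assumptions: rows with a non-numeric field are skipped; each cell is
# drawn with a glyph from a short ramp, darker meaning more points.
density() {
    awk -F'\t' -v cols="${1:-60}" -v rows="${2:-20}" '
        function isnum(v) {
            return v ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/
        }
        isnum($1) && isnum($2) {
            n++
            x[n] = $1 + 0; y[n] = $2 + 0
            if (n == 1 || x[n] < xmin) xmin = x[n]
            if (n == 1 || x[n] > xmax) xmax = x[n]
            if (n == 1 || y[n] < ymin) ymin = y[n]
            if (n == 1 || y[n] > ymax) ymax = y[n]
        }
        END {
            ramp = " .:+*#"
            levels = length(ramp)
            # Count points per cell; row 0 is the top of the plot (largest y).
            for (i = 1; i <= n; i++) {
                c = (xmax > xmin) ? int((x[i] - xmin) / (xmax - xmin) * (cols - 1)) : 0
                r = (ymax > ymin) ? int((ymax - y[i]) / (ymax - ymin) * (rows - 1)) : 0
                cell[r, c]++
                if (cell[r, c] > peak) peak = cell[r, c]
            }
            for (r = 0; r < rows; r++) {
                line = ""
                for (c = 0; c < cols; c++) {
                    if (!((r, c) in cell)) lvl = 0
                    else if (peak > 1) lvl = 1 + int((cell[r, c] - 1) * (levels - 2) / (peak - 1))
                    else lvl = 1
                    line = line substr(ramp, lvl + 1, 1)
                }
                print "|" line "|"
            }
        }'
}
```

> cut -f1,2 test.tsv | density 40 15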

curve:

Draws a curve from a single column of numbers. [NOTE: requires scatter to be in the same directory]

> awk 'BEGIN{for(i=0;i<100;i++)s=s"\n"sin(i/10);print s}' | ./curve 
---------------------------------------------------------------------------------------------------------------------- 0.999574
|               $$$$$$ $                                                                 $$$$$ $                     |
|             $         $                                                             $ $       $$                   |
|           $$           $                                                           $            $                  |
|                         $                                                         $              $                 |
|          $               $                                                                        $                |
|         $                 $                                                      $                                 |
|        $                                                                        $                   $              |
|                             $                                                  $                     $             |
|      $                       $                                                                                     |
|     $                                                                        $                        $            |
|                               $                                                                        $           |
|    $                                                                        $                                      |
|                                $                                           $                            $          |
|   $                                                                                                                |
|  $                              $                                         $                              $         |
|                                  $                                                                                 |
| $                                                                        $                                 $       |
|                                    $                                                                               |
|$                                                                        $                                   $      |
|                                     $                                                                        $     |
|                                                                        $                                           |
|                                      $                               $                                        $    |
|                                                                                                                    |
|                                       $                             $                                          $   |
|                                        $                                                                           |
|                                                                    $                                            $  |
|                                         $                                                                         $|
|                                                                   $                                                |
|                                          $                       $                                                 |
|                                            $                                                                       |
|                                                                 $                                                  |
|                                             $                 $                                                    |
|                                              $               $                                                     |
|                                               $             $                                                      |
|                                                $           $                                                       |
|                                                 $         $                                                        |
|                                                   $$$ $$ $                                                         |
|                                                      $                                                             |
---------------------------------------------------------------------------------------------------------------------- -0.999923
2.000000                                                                                                    101.000000
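
Since curve depends on scatter, one plausible reading is that it pairs each value with its line number and hands the result to scatter. That would be consistent with the x axis above running from 2 to 101 (the awk command emits a leading blank line), but it is a guess, not a description of the actual script. The helper name number_lines is hypothetical.

```shell
# Hypothetical helper, NOT the real curve script.
# Prefixes each input line with its line number and a tab, turning a
# single column of values into (index, value) pairs.
number_lines() {
    awk '{ print NR "\t" $0 }'
}
```

> awk 'BEGIN{for(i=0;i<100;i++)print sin(i/10)}' | number_lines | ./scatter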

column_descriptions:

Extracts the header and a number of sample values from each column:

> cat test.tsv | ./column_descriptions
1	A	3 sampled numerical values: 4.000000 ± 2.449490 (total = 12.000000)
2	B	5, NA, 2
3	C	NA, 6, 9
4	D	NA
