# Introduction to the Linux command line

There's a huge amount of documentation on the Linux command line (a good starting point is [The Linux Documentation Project](https://tldp.org/index.html)); this notebook introduces only a small and specific subset.

The commands below are typically typed into a terminal, where they are interpreted by a shell such as bash, the default shell on most Linux distributions.

To run the commands inside this notebook, you need to install the bash kernel for Jupyter, for example like this:
```
pip install bash-kernel
python -m bash_kernel.install
```

```
export LC_ALL=C
```

The first command sets an __environment variable__ that controls the __locale__ of the programs started from this shell session. This ensures that your language and region settings do not interfere with how some tools, such as `sort`, behave in the examples below. __You need not understand this for now__, but these topics are worth getting familiar with:
- [locale](https://en.wikipedia.org/wiki/Locale_(computer_software))
- [environment variables](https://en.wikipedia.org/wiki/Environment_variable)
- [sorting and locale](https://stackoverflow.com/questions/28881/why-doesnt-sort-sort-the-same-on-every-machine)
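
This shows what `LC_ALL=C` changes in practice. In the C locale, `sort` compares raw byte values, so every uppercase ASCII letter sorts before every lowercase one. The four input words are made up for the demo. The order you would get in your own locale depends on your system settings.

```shell
# In the C locale, sort compares bytes: 'A' (65) < 'B' (66) < 'a' (97) < 'b' (98)
printf 'b\nB\na\nA\n' | LC_ALL=C sort
# prints A, B, a, b (one per line)
```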

Let's look at the contents of the data directory:

```
ls -l data
```

```
total 9492
-rw-rw-r-- 1 recski recski  597304 Aug 11 11:19 1984.txt
-rw-rw-r-- 1 recski recski  150503 Aug  4 17:32 alice.txt
-rw-rw-r-- 1 recski recski  163327 Aug 10 16:29 alice_de.txt
-rw-rw-r-- 1 recski recski  157697 Sep 14 11:48 alice_tok.txt
-rw-rw-r-- 1 recski recski    3355 Sep 14 11:48 ent_freqs.txt
-rw-rw-r-- 1 recski recski     936 Aug  5 10:53 foo
-rw-rw-r-- 1 recski recski  440318 Aug 11 10:18 lewis_carroll_wp.html
-rw-rw-r-- 1 recski recski     936 Sep 14 11:48 stopwords.txt
-rw-rw-r-- 1 recski recski 8188354 Sep 14 10:33 ta_restaurants_31EU.tsv
```


`ls` is a program for listing directory contents, and `-l` is an **option** that switches to the long listing format (permissions, owner, group, size and modification time of each entry). Most programs used in this notebook have dozens of options, and you shouldn't expect to learn all of them at once.
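
Short options can usually be combined behind a single dash, so `ls -l -A` and `ls -lA` are equivalent. The sketch below uses a throwaway directory created with `mktemp -d` (the file names are made up). It shows `-A`, which also lists hidden entries whose names start with a dot.

```shell
demo=$(mktemp -d)                       # throwaway directory for the demo
touch "$demo/visible.txt" "$demo/.hidden"

ls "$demo"       # only visible.txt: names starting with . are hidden by default
ls -A "$demo"    # .hidden and visible.txt
ls -lA "$demo"   # the same two entries, in the long format of -l
```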

You can get a summary of what each command does and which options it accepts with the `man` command:

```
man ls
```

```
LS(1)                            User Commands                           LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List  information  about  the FILEs (the current directory by default).
       Sort entries alphabetically if none of -cftuvSUX nor --sort  is  speci-
       fied.

       Mandatory  arguments  to  long  options are mandatory for short options
       too.

       -a, --all
              do not ignore entries starting with .

       -A, --almost-all
              do not list implied . and ..

       --author
              with -l, print the author of each file

       -b, --escape
              print C-style escapes for nongraphic characters

       --block-size=SIZE
              with  -l,  scale  sizes  by  SIZE  when  printing  them;   e.g.,
              '--block-size=M'; see SIZE format below

       -B, --ignore-backups
              do not list implied entries ending with ~

       -c     with -lt: s
```
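
Many GNU tools also print a shorter option summary with `--help`, which is handy when you only need to look up one option. This sketch assumes the GNU coreutils `ls` found on most Linux distributions. The exact help text varies between versions.

```shell
# Show only the help lines describing the -S and -h options of ls
ls --help | grep -e ' -S ' -e ' -h,'
```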

For example, the following command lists the directory contents sorted by file size, largest first (`-S`), and prints the sizes in human-readable units (`-h`):

```
ls -lSh data
```

```
total 9.3M
-rw-rw-r-- 1 recski recski 7.9M Sep 14 10:33 ta_restaurants_31EU.tsv
-rw-rw-r-- 1 recski recski 584K Aug 11 11:19 1984.txt
-rw-rw-r-- 1 recski recski 430K Aug 11 10:18 lewis_carroll_wp.html
-rw-rw-r-- 1 recski recski 160K Aug 10 16:29 alice_de.txt
-rw-rw-r-- 1 recski recski 155K Sep 14 11:48 alice_tok.txt
-rw-rw-r-- 1 recski recski 147K Aug  4 17:32 alice.txt
-rw-rw-r-- 1 recski recski 3.3K Sep 14 11:48 ent_freqs.txt
-rw-rw-r-- 1 recski recski  936 Aug  5 10:53 foo
-rw-rw-r-- 1 recski recski  936 Sep 14 11:48 stopwords.txt
```
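
To check what `-S` and `-h` do on files of known size, the sketch below creates two files of 10 and 2048 bytes in a scratch directory. The file names are made up.

```shell
demo=$(mktemp -d)
head -c 10 /dev/zero > "$demo/small.bin"     # 10 bytes
head -c 2048 /dev/zero > "$demo/large.bin"   # 2048 bytes = 2 KiB

ls -S "$demo"     # large.bin first, then small.bin
ls -lSh "$demo"   # -h shows the 2048-byte file as 2.0K
```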


### Day-to-day tasks with pipes

#### What Jupyter processes am I running?

```
ps aux | grep `whoami` | grep jupyter
```

```
recski     46729  0.2  0.5 267072 92728 pts/3    Rl+  07:34   1:03 /home/recski/miniconda3/envs/nlp_course/bin/python /home/recski/miniconda3/envs/nlp_course/bin/jupyter-notebook
recski     62174  0.3 13.6 4397212 2201044 ?     Ssl  11:26   0:32 /home/recski/miniconda3/envs/nlp_course/bin/python -m ipykernel_launcher -f /home/recski/.local/share/jupyter/runtime/kernel-88975581-bb2a-469a-97b6-e7c0225d5d4e.json
recski     63248  0.0  0.3 470240 48732 ?        Ssl  11:48   0:01 /home/recski/miniconda3/envs/sandbox/bin/python -m bash_kernel -f /home/recski/.local/share/jupyter/runtime/kernel-96590846-4d7c-4dad-a5d0-73566eb400b4.json
recski     71657  101  2.4 1177976 392496 ?      Rsl  14:11   0:28 /home/recski/miniconda3/envs/nlp_course/bin/python -m ipykernel_launcher -f /home/recski/.local/share/jupyter/runtime/kernel-b912ddd5-d48d-4918-8d4f-a13dd653ae95.json
recski     71701 30.0  0.3 470244 48780 ?        Rsl  14:11  
```
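
One side effect of the pipeline above is that the `grep jupyter` process itself can show up in the list, because its own command line contains the word. A common trick is to wrap one character of the pattern in brackets. The regular expression `[j]upyter` still matches "jupyter", but the literal text `[j]upyter` on grep's command line does not. The sketch assumes the procps version of `ps` shipped with most Linux distributions.

```shell
# List this user's processes (PID, elapsed time, full command) and keep the jupyter ones;
# the bracket trick keeps the grep process out of its own results
ps -u "$(whoami)" -o pid,etime,args | grep '[j]upyter'
```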

#### What directories are using up all the disk space?

```
du -h --max-depth=1 | sort -h
```

```
40K	./external
124K	./.ipynb_checkpoints
588K	./media
9.3M	./data
11M	.
```
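
The `-h` option of `sort` is what makes this pipeline work. It understands the size suffixes that `du -h` prints, whereas plain numeric sorting (`-n`) only looks at the leading number. Here is the difference, using the sizes from the output above:

```shell
# Human-numeric sort: K < M, so 124K comes before 9.3M
printf '9.3M\n40K\n11M\n124K\n' | LC_ALL=C sort -h    # 40K, 124K, 9.3M, 11M

# Plain numeric sort ignores the suffix and compares 9.3, 11, 40, 124
printf '9.3M\n40K\n11M\n124K\n' | LC_ALL=C sort -n    # 9.3M, 11M, 40K, 124K
```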


### Simple data processing with pipes

The file `data/ta_restaurants_31EU.tsv` contains basic information from TripAdvisor about restaurants in 31 European cities. It is based on a [Kaggle dataset](https://www.kaggle.com/damienbeneschi/krakow-ta-restaurans-data-raw).

#### What is in the data?

```
cat data/ta_restaurants_31EU.tsv | head
```

```
# created from original CSV at https://www.kaggle.com/damienbeneschi/krakow-ta-restaurans-data-raw
# fields: Name, City, Cuisine Style, Rating, Price Range, Number of Reviews
Martine of Martine's Table	Amsterdam	French|Dutch|European	5.0	$$ - $$$	136.0
De Silveren Spiegel	Amsterdam	Dutch|European|Vegetarian Friendly|Gluten Free Options	4.5	$$$$	812.0
La Rive	Amsterdam	Mediterranean|French|International|European|Vegetarian Friendly|Vegan Options	4.5	$$$$	567.0
Vinkeles	Amsterdam	French|European|International|Contemporary|Vegetarian Friendly|Vegan Options|Gluten Free Options	5.0	$$$$	564.0
Librije's Zusje Amsterdam	Amsterdam	Dutch|European|International|Vegetarian Friendly|Vegan Options|Gluten Free Options	4.5	$$$$	316.0
Ciel Bleu Restaurant	Amsterdam	Contemporary|International|Vegetarian Friendly|Vegan Options|Gluten Free Options	4.5	$$$$	745.0
Zaza's	Amsterdam	French|International|Mediterranean|European|Vegetarian Friendly|Vegan Options|Gluten Free Options	4.5	$$ - $$$	1455.0
Blue Pepp
```
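
Most of these tools can also read a file directly. `head data/ta_restaurants_31EU.tsv` prints the same as `cat data/ta_restaurants_31EU.tsv | head`, and `head -n N` changes the default of 10 lines. Before processing a TSV file, it is worth checking that every row has the same number of tab-separated fields. The sketch below does this with `awk` on a small made-up sample in the same format. The restaurant names and values are invented.

```shell
sample=$(mktemp)
# Invented sample: a comment line plus three tab-separated rows with 6 fields each
printf '# fields: Name, City, Cuisine Style, Rating, Price Range, Number of Reviews\n' > "$sample"
printf 'Cafe One\tVienna\tCafe\t4.5\t$\t12.0\n' >> "$sample"
printf 'Bistro Two\tParis\tFrench|European\t4.0\t$$ - $$$\t80.0\n' >> "$sample"
printf 'Bar Three\tVienna\tBar|Pub\t5.0\tunknown\t\n' >> "$sample"

head -n 2 "$sample"    # first two lines, no cat needed

# Count how many rows have each number of fields (NF); one output line means a consistent table
grep -v '^#' "$sample" | awk -F'\t' '{ print NF }' | sort | uniq -c
```

The empty last field of the third row still counts, so all three rows report 6 fields.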

#### How many restaurants? How many in Vienna? How many in each city?

```
cat data/ta_restaurants_31EU.tsv | grep -v '^#' | wc -l
```

```
125523
```
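
`grep` can also count on its own. `-c` prints the number of matching lines instead of the lines themselves, and combined with `-v` it counts the lines that do *not* match. Here it is on a made-up three-line file:

```shell
sample=$(mktemp)
printf '# a comment line\nfirst row\nsecond row\n' > "$sample"

grep -v '^#' "$sample" | wc -l    # 2
grep -vc '^#' "$sample"           # 2, without the extra wc step
```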


```
cat data/ta_restaurants_31EU.tsv | grep -v '^#' | grep 'Vienna' | wc -l
```

```
3732
```
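
This number is slightly higher than the 3724 restaurants listed for Vienna in the per-city counts below. `grep 'Vienna'` matches the word anywhere in a line, so a restaurant in another city with "Vienna" elsewhere in its row, for example in its name, is counted too. Comparing a single field avoids this, as the later `awk -F$'\t' '$2 == "Vienna"'` commands do. Here is the difference on two invented rows:

```shell
sample=$(mktemp)
# Two invented rows: only the first restaurant is actually in Vienna
printf 'Cafe One\tVienna\tCafe\n' > "$sample"
printf 'Vienna Grill\tLondon\tGrill\n' >> "$sample"

grep -c 'Vienna' "$sample"                      # 2: matches anywhere in the line
awk -F'\t' '$2 == "Vienna"' "$sample" | wc -l   # 1: compares only the City field
```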


```
cat data/ta_restaurants_31EU.tsv | grep -v '^#' | cut -f2 | sort | uniq -c | sort -nr
```

```
  18211 London
  14874 Paris
   9543 Madrid
   8425 Barcelona
   7078 Berlin
   6687 Milan
   5948 Rome
   4859 Prague
   3986 Lisbon
   3724 Vienna
   3434 Amsterdam
   3204 Brussels
   3131 Hamburg
   2995 Munich
   2930 Lyon
   2705 Stockholm
   2605 Budapest
   2352 Warsaw
   2109 Copenhagen
   2082 Dublin
   1938 Athens
   1865 Edinburgh
   1667 Zurich
   1580 Oporto
   1572 Geneva
   1354 Krakow
   1227 Helsinki
   1213 Oslo
   1067 Bratislava
    657 Luxembourg
    501 Ljubljana
```
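
`sort | uniq -c | sort -nr` is a very common idiom for counting values. The first `sort` is needed because `uniq` only merges *adjacent* identical lines. The final `sort -nr` orders the counts numerically, largest first. Here it is on made-up input:

```shell
# Without sorting, the two Rome lines are not adjacent, so uniq counts them separately
printf 'Rome\nParis\nRome\n' | uniq -c

# Sorting first groups identical lines; sort -nr then puts the largest count on top
printf 'Rome\nParis\nRome\n' | sort | uniq -c | sort -nr
```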


#### What are the top cuisines in the dataset?

```
cat data/ta_restaurants_31EU.tsv | grep -v '^#' | cut -f3 | tr '|' '\n' | sort | uniq -c | sort -nr
```

```
  32359 Vegetarian Friendly
  31350 unknown
  30226 European
  18427 Mediterranean
  17794 Italian
  13009 Vegan Options
  12120 Gluten Free Options
   9689 Bar
   9558 French
   9064 Asian
   8317 Pizza
   8220 Spanish
   7403 Pub
   7391 Cafe
   5111 Fast Food
   4836 British
   4807 International
   4483 Japanese
   4405 Seafood
   4160 Central European
   4052 American
   3690 Chinese
   3487 Sushi
   3296 Portuguese
   3117 Indian
   2339 Middle Eastern
   2300 Thai
   2259 Wine Bar
   2075 German
   1937 Czech
   1923 Greek
   1909 Healthy
   1868 Fusion
   1692 Steakhouse
   1676 Barbecue
   1672 Halal
   1653 Soups
   1619 Contemporary
   1592 Grill
   1574 Vietnamese
   1470 Eastern European
   1462 Gastropub
   1416 Turkish
   1387 Mexican
   1230 South American
   1183 Austrian
   1155 Delicatessen
   1122 Polish
   1093 Hungarian
   1038 Scandinavian
    989 Diner
    904 Lebanese
    888 Dutch
    877 Street Food
    857 Latin
    848 Belgian
    832 Irish
    744 Brew Pub
```
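
The Cuisine Style column holds several values separated by `|`, so the pipeline uses `tr '|' '\n'` to put each value on its own line before counting. Here is the effect on two invented rows:

```shell
# Each cuisine list becomes one cuisine per line (5 lines in total)
printf 'French|Dutch|European\nDutch|Vegetarian Friendly\n' | tr '|' '\n'

# ...which can then be counted with the usual idiom; Dutch appears twice
printf 'French|Dutch|European\nDutch|Vegetarian Friendly\n' | tr '|' '\n' | sort | uniq -c | sort -nr
```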

#### Which are the top-rated restaurants in Vienna?

```
cat data/ta_restaurants_31EU.tsv | grep -v '^#' | awk -F$'\t' '$2 == "Vienna"' | sort -t $'\t' -k4 -gr | head
```

```
wunderladen modecafe	Vienna	Cafe	5.0	$$ - $$$	13.0
s'Kellerstockl	Vienna	Wine Bar	5.0	$$ - $$$	4.0
restaurant Jiang	Vienna	Chinese	5.0	unknown	
repubblica del vino.vienna	Vienna	Italian	5.0	unknown	
kem’s Bar & Kitchenette	Vienna	Italian|Austrian|International|European|Middle Eastern	5.0	$$ - $$$	4.0
engels- die Bar	Vienna	Bar|Fast Food|Pub	5.0	$	8.0
daily Imbiss	Vienna	Indian|Asian|Vegetarian Friendly	5.0	$	42.0
Zypresse	Vienna	Turkish	5.0	$$ - $$$	11.0
Zuppa	Vienna	International|Mediterranean|Asian	5.0	$	3.0
Zum Suppentopf	Vienna	unknown	5.0	unknown	2.0
sort: write failed: 'standard output': Broken pipe
sort: write error
```
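
The two `sort:` lines at the end are error messages, not data. `head` exits after printing ten lines, so `sort` can no longer write the rest of its output. They do not change the result, and appending `2>/dev/null` to the `sort` command hides them. Note also that many restaurants share the top rating of 5.0. `sort` breaks such ties by comparing whole lines, reversed because of the global `-r`, which is why the names above come out in reverse byte order. The sketch below, on three invented rows, ranks more informatively: `-k4,4gr` limits the first key to the Rating field, and `-k6,6gr` orders equal ratings by Number of Reviews.

```shell
tab=$(printf '\t')    # a literal tab; in bash, $'\t' as used above is equivalent
sample=$(mktemp)
# Invented rows: Name, City, Cuisine Style, Rating, Price Range, Number of Reviews
printf 'Zum A\tVienna\tCafe\t5.0\t$\t2.0\n' > "$sample"
printf 'Bistro B\tVienna\tCafe\t5.0\t$\t42.0\n' >> "$sample"
printf 'Haus C\tVienna\tCafe\t4.5\t$\t900.0\n' >> "$sample"

# Highest rating first; among equal ratings, most reviews first
sort -t "$tab" -k4,4gr -k6,6gr "$sample" | cut -f1    # Bistro B, Zum A, Haus C
```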


#### OK, how about top-rated cheap restaurants?

```
cat data/ta_restaurants_31EU.tsv | grep -v '^#' | awk -F$'\t' '$2 == "Vienna"' | grep -v '\$\$' | sort -t $'\t' -k4 -gr | head
```

```
restaurant Jiang	Vienna	Chinese	5.0	unknown	
repubblica del vino.vienna	Vienna	Italian	5.0	unknown	
engels- die Bar	Vienna	Bar|Fast Food|Pub	5.0	$	8.0
daily Imbiss	Vienna	Indian|Asian|Vegetarian Friendly	5.0	$	42.0
Zuppa	Vienna	International|Mediterranean|Asian	5.0	$	3.0
Zum Suppentopf	Vienna	unknown	5.0	unknown	2.0
Zum Lieben Augustin	Vienna	unknown	5.0	unknown	4.0
Zum Johann	Vienna	Austrian|Cafe|European|Wine Bar	5.0	unknown	6.0
Zuckero	Vienna	unknown	5.0	unknown	7.0
Zuckerkringel eU	Vienna	unknown	5.0	unknown	2.0
sort: write failed: 'standard output': Broken pipe
sort: write error
```
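
`grep -v '\$\$'` only removes lines that contain `$$`. Restaurants whose price range is `unknown` are kept, which is why several results above have no price information. The next step adds `grep '\$'` to fix this. A stricter alternative compares the Price Range field (field 5) directly, so a `$` elsewhere in the line cannot affect it. Here it is on invented rows covering the three kinds of price values seen above:

```shell
sample=$(mktemp)
# Invented rows: one cheap, one expensive, one with an unknown price
printf 'Cheap\tVienna\tCafe\t4.5\t$\t10.0\n' > "$sample"
printf 'Pricey\tVienna\tCafe\t4.5\t$$ - $$$\t10.0\n' >> "$sample"
printf 'Mystery\tVienna\tCafe\t4.5\tunknown\t10.0\n' >> "$sample"

grep -v '\$\$' "$sample" | cut -f1            # Cheap and Mystery: unknown prices slip through
awk -F'\t' '$5 == "$"' "$sample" | cut -f1    # only Cheap
```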


#### How about top-rated, cheap, vegan restaurants?

```
cat data/ta_restaurants_31EU.tsv | grep -v '^#' | awk -F$'\t' '$2 == "Vienna"' | grep '\$' | grep -v '\$\$' | grep 'Vegan Options' | sort -t $'\t' -k4 -gr | head
```

```
Reformhaus Staudigl	Vienna	Healthy|Vegetarian Friendly|Vegan Options	5.0	$	8.0
Dem's Gourmet	Vienna	Indian|Asian|Vegetarian Friendly|Vegan Options|Halal	5.0	$	
Cha No Ma	Vienna	Japanese|Cafe|Asian|Vegetarian Friendly|Vegan Options	5.0	$	33.0
B.K Curry Indian Restaurant	Vienna	Indian|Asian|Vegetarian Friendly|Vegan Options|Halal	5.0	$	
neuDeli	Vienna	Indian|Mediterranean|Delicatessen|Vegetarian Friendly|Vegan Options	4.5	$	13.0
Wurstelstand Leo	Vienna	Street Food|German|Austrian|Fast Food|European|Vegan Options	4.5	$	57.0
Vietnam Bistro	Vienna	Asian|Vietnamese|Vegetarian Friendly|Vegan Options	4.5	$	57.0
Trzesniewski	Vienna	Austrian|European|Vegetarian Friendly|Vegan Options	4.5	$	1076.0
Tata Restaurant	Vienna	Asian|Vietnamese|Fusion|Vegetarian Friendly|Vegan Options	4.5	$	106.0
Shelanu	Vienna	Middle Eastern|Israeli|Vegetarian Friendly|Vegan Options|Kosher	4.5	$	2.0
```
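
The chain of `grep` calls can also be folded into a single `awk` filter that tests each field separately. It checks the city (field 2) and the price range (field 5), and uses a regular-expression match with `~` on the cuisine list (field 3). The sketch below is a stricter variant of the pipeline above, shown on invented rows:

```shell
sample=$(mktemp)
# Invented rows: only the first one is cheap, in Vienna, and offers vegan options
printf 'Green Cafe\tVienna\tCafe|Vegan Options\t4.5\t$\t30.0\n' > "$sample"
printf 'Meat Hall\tVienna\tSteakhouse\t4.5\t$\t50.0\n' >> "$sample"
printf 'Vegan Loft\tVienna\tVegan Options\t5.0\t$$ - $$$\t20.0\n' >> "$sample"

# Keep cheap ($) Vienna restaurants whose cuisine list mentions Vegan Options
awk -F'\t' '$2 == "Vienna" && $5 == "$" && $3 ~ /Vegan Options/' "$sample" | cut -f1    # Green Cafe
```

A review threshold like the one in the next step could be added to the same condition as `&& $6 >= 100`.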


#### Finally, drop those with fewer than 100 reviews

```
cat data/ta_restaurants_31EU.tsv | grep -v '^#' | awk -F$'\t' '$2 == "Vienna"' | grep '\$' | grep -v '\$\$' | grep 'Vegan Options' | awk -F$'\t' '$6 >= 100' | sort -t $'\t' -k4 -gr
```

```
Trzesniewski	Vienna	Austrian|European|Vegetarian Friendly|Vegan Options	4.5	$	1076.0
Tata Restaurant	Vienna	Asian|Vietnamese|Fusion|Vegetarian Friendly|Vegan Options	4.5	$	106.0
Schillinger's Swing Kitchen	Vienna	American|Fast Food|European|Vegetarian Friendly|Vegan Options|Gluten Free Options	4.5	$	304.0
Schachtelwirt	Vienna	Austrian|Fast Food|European|Central European|Vegetarian Friendly|Vegan Options|Gluten Free Options	4.5	$	463.0
Kolar	Vienna	European|Vegetarian Friendly|Vegan Options	4.5	$	166.0
Blueorange	Vienna	Cafe|Vegetarian Friendly|Vegan Options|Gluten Free Options	4.5	$	116.0
Nguyen's Pho House	Vienna	Asian|Vietnamese|Soups|Vegetarian Friendly|Vegan Options|Gluten Free Options	4.0	$	204.0
Hungry Guy	Vienna	International|Street Food|Middle Eastern|Vegetarian Friendly|Vegan Options|Gluten Free Options	4.0	$	356.0
Der Wiener Deewan	Vienna	Indian|Asian|Pakistani|Vegetarian Friendly|Vegan Options|Halal	4.0	$	238.0
pizza bizi	Vienna	Italian|Pizza|Fast Food|Vegetarian Friendly|Ve
```