## The unix system

* Prepending commands with !
* Files, directories and paths
* AWS, virtual machines and AMI's
* Bash scripts (just to know what they are)
* grep, wc, sort
* .bashrc, environmental variables, paths.

In [1]:
%cd ~

/home/orysya


## listing files
* The `ls` command lists the files in the current directory
* `ls -F` identifies each files as directory, executable, text
* `ls -lrta`: 
    * `l` (long) lists extensive information about each file
    * `rt` (reverse time): list files in reverse order in time.
    * `a` show hidden files (whose names start with .)

In [2]:
%cd ~
!ls

/home/orysya
anaconda2  Documents  examples.desktop	Pictures  Templates
Desktop    Downloads  Music		Public	  Videos


In [3]:
!ls -F

anaconda2/  Documents/	examples.desktop  Pictures/  Templates/
Desktop/    Downloads/	Music/		  Public/    Videos/


In [4]:
!ls -lrta

total 132
drwxr-xr-x  3 root   root   4096 Oct 13 11:10 ..
-rw-r--r--  1 orysya orysya  655 Oct 13 11:10 .profile
-rw-r--r--  1 orysya orysya 8980 Oct 13 11:10 examples.desktop
-rw-r--r--  1 orysya orysya  220 Oct 13 11:10 .bash_logout
-rw-r--r--  1 orysya orysya   25 Oct 13 11:16 .dmrc
drwxr-xr-x  2 orysya orysya 4096 Oct 13 11:16 Desktop
drwxr-xr-x  2 orysya orysya 4096 Oct 13 11:16 Videos
drwxr-xr-x  2 orysya orysya 4096 Oct 13 11:16 Templates
drwxr-xr-x  2 orysya orysya 4096 Oct 13 11:16 Public
drwxr-xr-x  2 orysya orysya 4096 Oct 13 11:16 Pictures
drwxr-xr-x  2 orysya orysya 4096 Oct 13 11:16 Music
drwx------  3 orysya orysya 4096 Oct 13 11:16 .local
drwx------ 14 orysya orysya 4096 Oct 13 11:28 .config
-rw-------  1 orysya orysya 2202 Oct 13 15:32 .xsession-errors.old
-rw-------  1 orysya orysya   51 Oct 13 15:32 .Xauthority
-rw-------  1 orysya orysya   82 Oct 13 15:32 .xsession-errors
drwx------  3 orysya orysya 4096 Oct 13 15:32 .gnupg
-rw-------  1 orysya or

In [9]:
## Find out type of file
!file anaconda2/bin/*

anaconda2/bin/2to3:                                         a /home/orysya/anaconda2/bin/python script, ASCII text executable
anaconda2/bin/activate:                                     Bourne-Again shell script, ASCII text executable
anaconda2/bin/activate-global-python-argcomplete:           a /home/orysya/anaconda2/bin/python script, ASCII text executable
anaconda2/bin/anaconda:                                     a /home/orysya/anaconda2/bin/python script, ASCII text executable
anaconda2/bin/anaconda-navigator:                           a /home/orysya/anaconda2/bin/python script, ASCII text executable
anaconda2/bin/asadmin:                                      a /home/orysya/anaconda2/bin/python script, ASCII text executable
anaconda2/bin/assistant-qt4:                                ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.9, stripped
anaconda2/bin/binstar:                                   

## Navigating file paths

In [6]:
# pwd identifies the current working directory
!pwd

/home/orysya


In [10]:
# /home/ubuntu is the home directory of the user "ubuntu" that is - you!
!ls -lrt /home/ubuntu

ls: cannot access '/home/ubuntu': No such file or directory


In [14]:
# A shorthand for the home directory of the current user is "~"
%cd ~

/home/orysya


In [16]:
# !cd causes unix to move to a different directory, but leaves python behind
!cd /home/orysya/Documents/
!pwd

/home/orysya


In [18]:
# To actually change the directory use the magic %cd, which actually moves us and reports where we landed
%cd ~/Documents/DSE/DSE200-notebooks/

/home/orysya/Documents/DSE/DSE200-notebooks


Some useful shorthands for navigating directories
* **~** home directory of current user
* **~john** home directory of the user "john"
* **.** the current directory
* **..** the parent directory of the current directory.


In [21]:
%cd ~/Documents/DSE/DSE200-notebooks/data
print "!ls -F"
!ls -F
print "!ls ../"
!ls ../

/home/orysya/Documents/DSE/DSE200-notebooks/data
!ls -F
economics/  SQL/	     ThinkStatsData/	 weather/
NLTK/	    thinkstatsdata/  twitter-diversity/
!ls ../
data			      day_3_models_and_learning    README.md
day_1_python_and_unix	      day_4_OOP_and_matplotlib
day_2_github_io_numpy_pandas  day_5_mining_the_Social_web


### Symbolic links

In [22]:
# "scripts" and "data" are not actual directories, rather they are symbolic links to directories
%cd ~/Documents/DSE/DSE200-notebooks/data
!ls *

/home/orysya/Documents/DSE/DSE200-notebooks/data
economics:
AAA.csv  CPILFESL.csv  GS10.csv  INDPRO.csv
BAA.csv  GDPC1.csv     GS1.csv	 UNRATE.csv

NLTK:
BrownNews.txt  Chopped

SQL:
data.csv  usa_lat_long.csv

thinkstatsdata:

ThinkStatsData:
2002FemPreg.dat  binaryFile  pickle0.pkl  pickle.pkl	picklez2.pkl
2002FemResp.dat  json.jsn    pickle1.pkl  picklez0.pkl	survey.pl
2002Male.dat	 json.pkl    pickle2.pkl  picklez1.pkl	survey.tmp

twitter-diversity:
README.md  tag-volume.csv  total-volume.csv

weather:
ALL.corrupted.csv  ghcnd-stations_buffered.txt	SAMPLE_TMAX.csv
ALL.head.csv	   ghcnd-stations.txt		TMAX.10.csv
data-source.txt    ghcnd-version.txt		TMAX.head.csv
ghcnd-readme.txt   make_covar.raw_256


In [25]:
# Symbolic links are created by the command "ln -s" 
# Here we create a link from the home directory to the directory DSE200/data/NLTK/Chopped
%cd ~
!ln -s Documents/DSE/DSE200-notebooks/data/NLTK/Chopped minced
!ls -l minced
# the unix command "cat" prints out the contents of a file.
!cat minced/F0

/home/orysya
ln: failed to create symbolic link 'minced': File exists
lrwxrwxrwx 1 orysya orysya 24 Oct 13 17:46 minced -> DSE200/data/NLTK/Chopped
cat: minced/F0: No such file or directory


In [26]:
%cd minced/
!pwd

[Errno 2] No such file or directory: 'minced/'
/home/orysya
/home/orysya


### Creating files and directories

In [27]:
# to create a directory, use `mkdir`
!mkdir ~/tmp
%cd ~/tmp

/home/orysya/tmp


In [28]:
# to create a file or update the time-stamp of the file use `touch`
for i in range(10):
    !touch file$i
!ls -l
#path into operating system from python uses $

total 0
-rw-rw-r-- 1 orysya orysya 0 Oct 13 17:48 file0
-rw-rw-r-- 1 orysya orysya 0 Oct 13 17:48 file1
-rw-rw-r-- 1 orysya orysya 0 Oct 13 17:48 file2
-rw-rw-r-- 1 orysya orysya 0 Oct 13 17:48 file3
-rw-rw-r-- 1 orysya orysya 0 Oct 13 17:48 file4
-rw-rw-r-- 1 orysya orysya 0 Oct 13 17:48 file5
-rw-rw-r-- 1 orysya orysya 0 Oct 13 17:48 file6
-rw-rw-r-- 1 orysya orysya 0 Oct 13 17:48 file7
-rw-rw-r-- 1 orysya orysya 0 Oct 13 17:48 file8
-rw-rw-r-- 1 orysya orysya 0 Oct 13 17:48 file9


### Moving and copying files

In [31]:
%cd ~/Documents/DSE/DSE200-notebooks/data/NLTK/Chopped/
!ls

/home/orysya/Documents/DSE/DSE200-notebooks/data/NLTK/Chopped
Caesar.txt  F120  F141	F164  F187  F209  F231	F42  F64  F85
F0	    F121  F142	F165  F189  F210  F232	F43  F65  F86
F1	    F122  F144	F166  F19   F211  F233	F44  F66  F87
F10	    F123  F145	F167  F190  F212  F24	F45  F67  F88
F100	    F124  F146	F168  F191  F213  F25	F46  F68  F9
F101	    F125  F147	F169  F192  F214  F26	F48  F69  F90
F102	    F126  F148	F170  F193  F215  F27	F49  F7   F91
F104	    F127  F149	F171  F194  F216  F28	F5   F70  F93
F106	    F128  F15	F172  F195  F218  F29	F50  F71  F94
F107	    F129  F151	F173  F196  F219  F3	F51  F72  F95
F108	    F13   F152	F174  F197  F22   F30	F52  F73  F96
F109	    F130  F153	F175  F198  F220  F31	F53  F74  F97
F11	    F131  F154	F176  F199  F221  F32	F54  F75  F99
F110	    F132  F155	F177  F2    F222  F33	F55  F76  kings.txt
F111	    F133  F156	F178  F20   F223  F34	F56  F77  Leonidas.txt
F112	    F134  F157	F179  F200  F224  F35	F57  F78  reconstructedF.txt

In [32]:
%cd ~/DSE200/DSE200-notebooks-2015/data/
#check out what was missed here

[Errno 2] No such file or directory: '/home/orysya/DSE200/DSE200-notebooks-2015/data/'
/home/orysya/Documents/DSE/DSE200-notebooks/data/NLTK/Chopped


In [33]:
!cat F138
#shows contents of the file

  982 1006 act `` to have these la


In [39]:
# cp copies a file to a new location, maintaining the original copy
!mkdir tmp
!cp F87 tmp   # copy a file to a new location, maintaining the name
!cp F87 tmp/newname # copy a file to a new location + name
!ls -l F87 tmp

-rwxr--r-- 1 orysya orysya   20 Sep 30 16:08 F87

tmp:
total 8
-rwxr--r-- 1 orysya orysya 20 Oct 13 17:51 F87
-rwxr--r-- 1 orysya orysya 20 Oct 13 17:51 newname


In [40]:
# you can also copy a whole directory and all it's subdirectories
!cp -r tmp newtmp
# -r is for recursive copy

In [41]:
# mv moves a file, or a whole directory, to a new location or a new name.
# it just manipulates pointers, so it is much faster than copy. (similar to 
# the difference between deep and shallow copy in python)
!mv newtmp tmp # move directory to a new location
!ls tmp/*

tmp/F87  tmp/newname

tmp/newtmp:
F87  newname


In [42]:
# mv file or directory to a new name (=rename)
!mv tmp/newname tmp/newername
!ls tmp/*

tmp/F87  tmp/newername

tmp/newtmp:
F87  newname


In [38]:
#cleanup
!rm -r tmp

### Removing files and directories

In [43]:
# to remove a file use the command `rm`
for i in range(1,10,2):
    !rm file$i
!ls -l

rm: cannot remove 'file1': No such file or directory
rm: cannot remove 'file3': No such file or directory
rm: cannot remove 'file5': No such file or directory
rm: cannot remove 'file7': No such file or directory
rm: cannot remove 'file9': No such file or directory
total 900
-rw-rw-rw- 1 orysya orysya  501 Oct 13 17:18 Caesar.txt
-rwxr--r-- 1 orysya orysya   24 Sep 30 16:08 F0
-rwxr--r-- 1 orysya orysya   15 Sep 30 16:08 F1
-rwxr--r-- 1 orysya orysya   22 Sep 30 16:08 F10
-rwxr--r-- 1 orysya orysya   14 Sep 30 16:08 F100
-rwxr--r-- 1 orysya orysya   16 Sep 30 16:08 F101
-rwxr--r-- 1 orysya orysya   25 Sep 30 16:08 F102
-rwxr--r-- 1 orysya orysya   20 Sep 30 16:08 F104
-rwxr--r-- 1 orysya orysya   28 Sep 30 16:08 F106
-rwxr--r-- 1 orysya orysya   14 Sep 30 16:08 F107
-rwxr--r-- 1 orysya orysya   16 Sep 30 16:08 F108
-rwxr--r-- 1 orysya orysya   19 Sep 30 16:08 F109
-rwxr--r-- 1 orysya orysya   20 Sep 30 16:08 F11
-rwxr--r-- 1 orysya orysya   14 Sep 30 16:08 F110
-rwxr--r-- 1 orysya orysy

In [None]:
# to remove an empry directory, use 'rmdir'
# If you want to remove a directory and everything that is in it use `rm -rf`. Note that this
# is an irreversible action, it is NOT like moving a file to the trash bin.
%cd ~
!rm -rf tmp
!ls -l tmp

### Groups and Unix File Permissions

It is often the case that a file should not be readable/writeable by all users of a machine (IE private data, system configuration).  
To enforce this there are a number of file properties which UNIX enforces.

Each user falls into one of three relationships with the file:

* **Owner** - The user who created the file and is able to modify permissions
* **Group** - The user is in a the user group assigned to the file (we won't talk about this much)
* **World** - Everybody else

Each file has three permissions for each of these user sets:

* **Read**  - The ability to view the file's contents
* **Write** - The ability to modify the file
* **Excecute** - The ability to run the file (if it is a script or program).  

Since there are three user sets and three permissions, there are 9 distinct true/false permissions which can be granted.  Thus each file has 9 bits to define these permissions.

#### Viewing permissions

To view permissions of a file use the -l option for ls

In [44]:
#First we create some files:
#Disregard the chmod command for now
!mkdir examples
%cd examples
!touch NoPermissions
!chmod 000 NoPermissions #user, group, world for each 0 
!touch AllRead
!chmod 444 AllRead 
!touch FullPermission
!chmod 777 FullPermission 
!touch OwnerOnly
!chmod 700 OwnerOnly 
!touch GroupOnly
!chmod 070 GroupOnly 
!touch WorldOnly
!chmod 007 WorldOnly 

#Now we list the permissions of the files
!ls -l

#Return to old working directory
%cd ../


/home/orysya/Documents/DSE/DSE200-notebooks/data/NLTK/Chopped/examples
total 0
-r--r--r-- 1 orysya orysya 0 Oct 13 17:52 AllRead
-rwxrwxrwx 1 orysya orysya 0 Oct 13 17:52 FullPermission
----rwx--- 1 orysya orysya 0 Oct 13 17:52 GroupOnly
---------- 1 orysya orysya 0 Oct 13 17:52 NoPermissions
-rwx------ 1 orysya orysya 0 Oct 13 17:52 OwnerOnly
-------rwx 1 orysya orysya 0 Oct 13 17:52 WorldOnly
/home/orysya/Documents/DSE/DSE200-notebooks/data/NLTK/Chopped


In the first column of the output you see dashes for ungranted permissions and letters (r, w, or x) for granted permissions.  

Lets break this down:

| Bit | Definition |
|---|------------------|
| 1 | Sticky Bit\* |
| 2 | Owner Read |
| 3 | Owner Write |
| 4 | Owner Execute |
| 5 | Group Read |
| 6 | Group Write |
| 7 | Group Execute |
| 8 | World Read |
| 9 | World Write |
| 10| World Execute |

\* The sticky bit is a special permission we won't be going into

Since you have three groups of three binary permissions, a common way to refer to permissions is via the octal representation of bits 2-10.  This yields a 3 digit octal number with the left most digit being owner permissions, middle digit group permissions, and right most bit the world permissions.

For example: -r--rw---x translates to 461 in octal

#### Changing Permissions

To change permissions, the owner of a file can use the command *chmod*.  The main use case is you specify the octal code of your desired permissions followed by the file name.  For examples of this look at the code we used to create the files above.

## manual pages

In [45]:
%man ls

## Exploring the computer

In [46]:
# Find out which version of Ubuntu you are running 
!lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.1 LTS
Release:	16.04
Codename:	xenial


In [47]:
# find out about the hardware
!cat /proc/cpuinfo

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 61
model name	: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
stepping	: 4
microcode	: 0x21
cpu MHz		: 2201.000
cache size	: 3072 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch epb fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap xsaveopt dtherm ida arat pln pts
bugs		:
bogomips	: 4402.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 42 bits physical, 48 bits virtual
power management:



In [48]:
# find out how much memory you are using
!free -m

              total        used        free      shared  buff/cache   available
Mem:            975         712          69          15         193          66
Swap:          1021         598         423


In [49]:
# find out how much disk space you are using
!df -h

Filesystem      Size  Used Avail Use% Mounted on
udev            469M     0  469M   0% /dev
tmpfs            98M  8.8M   89M   9% /run
/dev/sda1        19G  7.7G   10G  44% /
tmpfs           488M  216K  488M   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           488M     0  488M   0% /sys/fs/cgroup
/dev/sr0         71M   71M     0 100% /media/orysya/VMware Tools
tmpfs            98M  4.0K   98M   1% /run/user/108
tmpfs            98M   92K   98M   1% /run/user/1000


In [50]:
!pwd

/home/orysya/Documents/DSE/DSE200-notebooks/data/NLTK/Chopped


In [51]:
!ls -a .ipynb_checkpoints/
#to find the checkpoints taken
#can revert to checkpoints as needed

ls: cannot access '.ipynb_checkpoints/': No such file or directory


In [53]:
#find out which directories consume most of this disk space
%cd ~
!du -s * | sort -n
#Can also code nrt for sort by number in reverse time

/home/orysya
0	minced
4	Desktop
4	Music
4	Pictures
4	Public
4	Templates
4	tmp
4	Videos
12	examples.desktop
154640	Documents
408396	Downloads
1636880	anaconda2


In [54]:
# based on what we see here, we check the directory anaconda
!du -s anaconda2/*

1248	anaconda2/Examples
8	anaconda2/LICENSE.txt
60056	anaconda2/bin
2816	anaconda2/conda-meta
4	anaconda2/envs
52	anaconda2/etc
26056	anaconda2/include
1502604	anaconda2/lib
41804	anaconda2/pkgs
2204	anaconda2/share
8	anaconda2/ssl
12	anaconda2/var


## analyzing data
head, tail, more, grep, wc, sort, cut (awk)

In [56]:
%cd ~/Documents/DSE/DSE200-notebooks/data/ThinkStatsData/
!ls

/home/orysya/Documents/DSE/DSE200-notebooks/data/ThinkStatsData
2002FemPreg.dat  binaryFile  pickle0.pkl  pickle.pkl	picklez2.pkl
2002FemResp.dat  json.jsn    pickle1.pkl  picklez0.pkl	survey.pl
2002Male.dat	 json.pkl    pickle2.pkl  picklez1.pkl	survey.tmp


In [None]:
# print the number of lines, words and characters in each file
!wc *
#useful when loading files

In [7]:
# print the first 2 lines of a file
!head -2 2002FemPreg.dat

head: cannot open `2002FemPreg.dat' for reading: No such file or directory


In [8]:
## This list of tuples defines the names and locations of the elements.
fields=[
    ('caseid', 1, 12, int),
    ('nbrnaliv', 22, 22, int),
    ('babysex', 56, 56, int),
    ('birthwgt_lb', 57, 58, int),
    ('birthwgt_oz', 59, 60, int),
    ('prglength', 275, 276, int),
    ('outcome', 277, 277, int),
    ('birthord', 278, 279, int),
    ('agepreg', 284, 287, int),
    ('finalwgt', 423, 440, float),
]

In [9]:
## Lets transform it into a dictionary whose keys are the names of the field
fields_dict={name:(f,t,typ) for (name,f,t,typ) in fields}
fields_dict

{'agepreg': (284, 287, int),
 'babysex': (56, 56, int),
 'birthord': (278, 279, int),
 'birthwgt_lb': (57, 58, int),
 'birthwgt_oz': (59, 60, int),
 'caseid': (1, 12, int),
 'finalwgt': (423, 440, float),
 'nbrnaliv': (22, 22, int),
 'outcome': (277, 277, int),
 'prglength': (275, 276, int)}

In [10]:
# print the lines that contain a particular string
string='3116'
!grep $string 2002FemPreg.dat
#grep -> goes through line by line, if finds string then will print the line

grep: 2002FemPreg.dat: No such file or directory


In [11]:
#suppose we just want to know how many lines have this string inside them.
# this is our first use of pipes
#the output from grep serves as the input to wc
!wc 2002FemPreg.dat
!grep $string 2002FemPreg.dat | wc

wc: 2002FemPreg.dat: No such file or directory


      0       0       0


grep: 2002FemPreg.dat: No such file or directory


In [None]:
#cut is a command that cuts specific fields from from each line 
%man cut

In [12]:
# Extract from each line a specific field
field='babysex'
(fr,to,typ)=fields_dict[field]
Range=str(fr-1)+'-'+str(to)
print field,fr,to,Range
!cut -c $Range 2002FemPreg.dat | head -5

babysex 56 56 55-56


cut: 2002FemPreg.dat: No such file or directory


In [13]:
# lets sort these lines numerically, and look at the end, also known as the tail
!cut -c $Range 2002FemPreg.dat | sort -n | tail

cut: 2002FemPreg.dat: No such file or directory
-nThe system cannot find the file specified.



In [14]:
#count the number of times each value appears using uniq
!cut -c $Range 2002FemPreg.dat | sort -n | uniq -c

-nThe system cannot find the file specified.

cut: 2002FemPreg.dat: No such file or directory


In [None]:
# do the same thing but using an intermediary file
!cut -c $Range 2002FemPreg.dat > cut$Range
print 'head of cut',Range
!head cut$Range
!ls
print 'output from uniq'
!cat cut$Range | sort -n | uniq -c

## Environment variables

Environment variables are strings that define the set up of the session. Environment variables allow the user to avoid
retyping the same parameters over and over.

In [None]:
#view all of the currently defined environment variable
%env

In [15]:
#view the valuse of a particular variables 
!echo 'hello world!'
!echo $HOME $USER
#prints out values

'hello world!'
$HOME $USER


In [None]:
#The $ symbol is required as a prefix of the variable names
#When used inside ipython, this $ can be used to refer to any currently defined variable
i=25
!echo $i

In [16]:
#Particularly important are environment variables called "paths"
!env | grep -i path

PSMODULEPATH=C:\Users\Orysya\Documents\WindowsPowerShell\Modules;C:\Program Files (x86)\WindowsPowerShell\Modules;C:\WINDOWS\system32\WindowsPowerShell\v1.0\Modules\;C:\Program Files (x86)\Microsoft SQL Server\120\Tools\PowerShell\Modules\;c:\Program Files (x86)\Microsoft SQL Server\110\Tools\PowerShell\Modules\
HOMEPATH=\Users\Orysya
PATH=C:\Users\Orysya\Anaconda\Library\bin;C:\Users\Orysya\Anaconda\Library\bin;C:\Users\Orysya\AppData\Local\GitHub\PortableGit_284a859b0e6deba86edc624fef1e4db2aa8241a9\cmd;C:\Users\Orysya\AppData\Local\GitHub\PortableGit_284a859b0e6deba86edc624fef1e4db2aa8241a9\usr\bin;C:\Users\Orysya\AppData\Local\GitHub\PortableGit_284a859b0e6deba86edc624fef1e4db2aa8241a9\usr\share\git-tfs;C:\Users\Orysya\AppData\Local\Apps\2.0\TR9BQ6QV.4Y1\RO553A3D.HRE\gith..tion_317444273a93ac29_0003.0003_92b520eb113e6614;C:\Users\Orysya\AppData\Local\GitHub\lfs-amd64_1.3.1;C:\ProgramData\Oracle\Java\javapath;C:\Program Files (x86)\Intel\iCLS Client\;C:\Program Files\Intel\iCLS Clien

In [17]:
# The path defines where the system will look for commands and in what order.
# PATH tells the unix shell (bash) where to find the executables corresponding to commands
# while PYTHONPATH tells python from where to `import` packages.
# lets see where unix finds the command "sort"
!which sort
# Check on the variable PATH and you will see that /usr/bin is on it.

C:\WINDOWS\system32\sort.EXE


In [18]:
# You can also find all of the places along the path that have a definition relevant to sort
!whereis sort
# The last one is the manual page for sort which you can view using the command %man sort

C:\Windows\System32\sort.exe


#### Excercise 
find where the location of the commands `python`,`ipython` and `mail`

In [21]:
!which python

C:\Users\Orysya\Anaconda\python.EXE


## Wildcards and glob

We have seen the most used wild-card `*`, which matches any sequence of (non blank) characters.
For example `B*.py` will match any filename that starts with `B` and ends with `.py`.

Other useful wildcards are:

wild card | Description
--------|--------------------------------------------------------   
    `*`   |  An asterisk matches any number of characters in a filename, including none.
    `?`   |  The question mark matches any single character.
    `[ ]` |  Brackets enclose a set of characters, any one of which may match a single character at that position.
    `-`   |  A hyphen used within [ ] denotes a range of characters.
    `~`   |  A tilde at the beginning of a word expands to the name of your home directory.  If you append another user's login name to the character, it refers to that user's home directory.
    
**Here are some examples:**

1. **cat c* ** lists the contect of any file whose name begins with c including the file c, if it exists.
1. **ls *.c ** lists all files that have a .c extension.
1. **cp ../rmt?. ** copies every file in the parent directory that is four characters long and begins with rmt to the working directory. (The names will remain the same.)
1. **ls rmt[34567] ** lists every file that begins with rmt and has a 3, 4, 5, 6, or 7 at the end.
1. **ls rmt[3-7] ** does exactly the same thing as the previous example.
1. **ls ~ ** lists your home directory.
1. **ls ~hessen ** lists the home directory of the guy1 with the user id hessen.

#### within python, use glob()

You get the same functionality as wildcards by using the function `glob`, but instead of getting the result printed out, you get it as a list of strings.

In [26]:
!ls -d li* # do not descent into directories
from glob import glob
L=glob('*')
L
#will work outside of ipython notebook as well

ls: li*: Invalid argument
ls: #: No such file or directory
ls: do: No such file or directory
ls: not: No such file or directory
ls: descent: No such file or directory
ls: into: No such file or directory
ls: directories: No such file or directory


['1_pregnancy_length_analysis.ipynb',
 '2_python_basics.ipynb',
 '3_python_basics_excercises.ipynb',
 '4_unix,_sys,_and_os.ipynb',
 '5_unix_exercises.ipynb',
 '8queen.jpg',
 'LockFile.md']

## Loading and saving files

It is often useful to load short files into the notebook, alter them, and save them back into the file system. The magics `%load` and `%%writefile` are used to do that.


In [27]:
!ls

1_pregnancy_length_analysis.ipynb
2_python_basics.ipynb
3_python_basics_excercises.ipynb
4_unix,_sys,_and_os.ipynb
5_unix_exercises.ipynb
8queen.jpg
LockFile.md


In [28]:
# the magic %load, unlike using !cat, creates a new cell that can be executed inside the notebook
%load survey.pl
#can load code into ipython

ValueError: 'survey.pl' was not found in history, as a file, url, nor in the user namespace.

## Processes

A process is a sequence of commands that are executed in sequence, one after the other.
By using "time sharing" a single CPU can compute many processes at the same time, frequently switching from one process to the next. In a multi-core machine, there are several CPUs and so even more processes can execute at the same time.

In [29]:
# You can find out the current processes on your system using the command "top"
# without flags, the command will open a window that will constantly update and that also
# allows you to quit (or kill) processes. Here we use the flags to specify that top should only run once.
!top -b -n 1
#run top in terminal to determine what programs are running

'top' is not recognized as an internal or external command,
operable program or batch file.


## Pipes
We used pipes above to communicate between two or more unix commands.
We now discuss this in more detail.

Unix processes have three default input and output channels
* **stdin** the standard input channel - by default - the keyboard
* **stdout** the standard output channel - by default - the terminal.
* **stderr** the standard error channel - by default - the terminal.

Channels can be used to connect programs to each other and to connect programs and files. This is called **I/O redirection**.

Connecting a standard channel to files is done using the following symbols
(in bash, which is the standard shell in both ubuntu and os-x).

| command      | result |
|--------------|-----------------------------------------------------|
| < filename   | Redirect stdin to read from the file "filename" |
| > filename  | Redirect stdout to file "filename." |
| >>filename  | Redirect and append stdout to file "filename." |
| 1>filename   | Redirect stdout to file "filename." |
| 1>>filename  | Redirect and append stdout to file "filename." |
| 2>filename   | Redirect stderr to file "filename." |
| 2>>filename  | Redirect and append stderr to file "filename." |
| &>filename   | Redirect both stdout and stderr to file "filename." |


In [None]:
# One of the most basic unix command is cat
%cd ~
!cat < .bash_logout
#cat copies the standard input to standard output

In [None]:
# echo is another basic command, it pipes the string it gets as a parameter to std-out
# To create a file with some specific line we can use
!echo "MAS-DSE is the best" | cat > "MAS-FILE"
!ls MAS*
!cat MAS-FILE
!rm MAS-FILE

In [None]:
# some time we want to supress the error messages.
# To do that we redirect stderr to a fictitious file called /dev/null
# In addition, we can take the output and sort it (numerically, in reverse) according to the size
!du -s /Users/Orysya/Documents/DSE/*  | sort -nr
#dev/null sends to trash
#shows subdirectories under DSE

## Interacting with external programs through pipes

In [31]:
#the command top gives us a snapshot of the currently running processes
!top -l 1 -n 10

'top' is not recognized as an internal or external command,
operable program or batch file.


In [32]:
# We can run the program from within a python script (using ! requires running inside ipython)

# here we use python to find those processes that take a non-zero part of the memory.
import subprocess

output = subprocess.check_output(['top', '-l', '1', '-n','1'])
print 'Have %d bytes in output' % len(output)

lines=output.splitlines() # break output into lines
len(lines)
for line in lines:
    percent = line[47:50]
    try:
        p=float(percent)
    except:
        continue
    if p>0:
        print line,

WindowsError: [Error 2] The system cannot find the file specified

In [None]:
# If we use top in it's default non-batch form, the program crashes
output = subprocess.check_output(['top'])
print 'Have %d bytes in output' % len(output)


In [None]:
# it runs fine with !
!top

In [None]:
# what we need to do is run the process 'top' in the background - in parallel to the ipython session
# and print what the program outputs to stdout as it becomes available

# The slight problem is that this code has a bug and it does not work right, can you make it work?

import select
from time import sleep

def dataWaiting(source):
    " Check if data is waiting to be read "
    return select.select([source], [], [], 0) == ([source], [], [])

proc = subprocess.Popen(['top'], 
                        stdout=subprocess.PIPE,
                        shell=False
                        )

while True:
    sleep(0.1)
    try:
        for line in proc.stdout.readline():
            print line
    except:
        print 'exception'
        
#while True:
#    sleep(0.1)
#    if dataWaiting(proc.stdout):
#        print 'Data Available'
#        line=proc.stdout.readline()
#        print line,
#    else:
#        print 'not available'
