https://www.ibm.com/developerworks/ru/library/l-awk1/index.html

- `seq.fasta` is the input file
- the `'{ print }'` command with no arguments prints every line
- `{ }` groups the statements of an action block
- so `print` takes the file line by line and writes each line to stdout

In [13]:
%%bash
#!/bin/bash

awk '{ print }' seq.fasta

>seq1
aaaaaaaaaaa
>seq2
ccccccccccc
>seq3
ttttttttttt
>seq4
ggggggggggg


- `cat` gives the same result:

In [6]:
%%bash
#!/bin/bash

cat seq.fasta

>seq1
aaaaaaaaaaa
>seq2
ccccccccccc
>seq3
ttttttttttt
>seq4
ggggggggggg


- the `awk` variable `$0` holds the whole current line
- that is why `{ print }` and `{ print $0 }` give the same result
- **KEEP IN MIND!** the quotes around the curly braces must be plain single quotes (`'`)
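
A likely reason for the quoting rule above is shell expansion: inside double quotes the shell replaces `$0` before `awk` ever sees the program. The following is a minimal sketch of that effect. The variable names `single_prog`, `double_prog`, and `result` and the two sample lines are made up for illustration.

```bash
# What awk receives when the program is in single quotes:
# the shell leaves $0 alone, so awk sees its own $0 variable.
single_prog='{ print $0 }'

# What awk would receive with double quotes: the shell has already
# replaced $0 with the name of the running shell or script.
double_prog="{ print $0 }"

printf 'single quotes: %s\n' "$single_prog"
printf 'double quotes: %s\n' "$double_prog"

# Only the single-quoted program prints the input lines unchanged.
# The two input lines are made-up sample data.
result=$(printf '>seq1\naaaaaaaaaaa\n' | awk "$single_prog")
printf '%s\n' "$result"
```

What the double-quoted string looks like depends on how the shell was started, so it is only printed here and not passed to `awk`.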

In [12]:
%%bash
#!/bin/bash

awk '{ print }' seq.fasta
echo "========================"
awk '{ print $0 }' seq.fasta

>seq1
aaaaaaaaaaa
>seq2
ccccccccccc
>seq3
ttttttttttt
>seq4
ggggggggggg
>seq1
aaaaaaaaaaa
>seq2
ccccccccccc
>seq3
ttttttttttt
>seq4
ggggggggggg


- `awk` actually runs the script inside the curly braces once for every line of the input
- so if we give `print` an empty string, as in `'{ print "" }'`, it prints an empty line for every input line

In [14]:
%%bash
#!/bin/bash

awk '{ print "" }' seq.fasta











In [15]:
%%bash
#!/bin/bash

awk '{ print "executing for each line" }' seq.fasta

executing for each line
executing for each line
executing for each line
executing for each line
executing for each line
executing for each line
executing for each line
executing for each line
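
One way to make the once-per-line behaviour visible is to print the built-in record counter `NR` in front of each line. This is a small sketch on made-up sample data, not the `seq.fasta` used in these notes. The temporary file and the variable name `numbered` are assumptions for illustration.

```bash
# Made-up four-line sample in FASTA style.
tmp=$(mktemp)
printf '>seq1\naaaaaaaaaaa\n>seq2\nccccccccccc\n' > "$tmp"

# NR is awk's built-in count of the lines (records) read so far.
# The action block runs once per line, so NR goes up by one each time.
numbered=$(awk '{ print NR ": " $0 }' "$tmp")
printf '%s\n' "$numbered"

rm -f "$tmp"
```

With four input lines the block runs four times, in the same way that the eight lines of `seq.fasta` produced eight copies of the message above.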


### multiple fields 

- `-F` sets the field separator (a comma here)
- `$1` is the first field of each line of `test.csv`

In [28]:
%%bash
#!/bin/bash

awk -F"," '{ print $1 }' test.csv

PassengerId
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1

- to print several fields at once, list several `$N` field references in the `print` statement

In [33]:
%%bash
#!/bin/bash

awk -F"," '{ print $1 $10 }' test.csv

PassengerIdCabin
8927.8292
8937
8949.6875
8958.6625
89612.2875
8979.225
8987.6292
89929
9007.2292
90124.15
9027.8958
90326
90482.2667
90526
90661.175
90727.7208
90812.35
9097.225
9107.925
9117.225
91259.4
9133.1708
91431.6833
91561.3792
916262.375
91714.5
91861.9792
9197.225
92030.5
92121.6792
92226
92331.5
92420.575
92523.45
92657.75
9277.2292
9288.05
9298.6625
9309.5
93156.4958
93213.4167
93326.55
9347.85
93513
93652.5542
9377.925
93829.7
9397.75
94076.2917
94115.9
94260
94315.0333
94423
945263
94615.5792
94729.125
9487.8958
9497.65
95016.1
951262.375
9527.8958
95313.5
9547.75
9557.725
956262.375
95721
9587.8792
95942.4
96028.5375
961263
9627.75
9637.8958
9647.925
96527.7208
966211.5
967211.5
9688.05
96925.7
97013
9717.75
97215.2458
973221.7792
97426
9757.8958
97610.7083
97714.4542
9787.8792
9798.05
9807.75
98123
98213.9
9837.775
98452
9858.05
98626
9877.7958
98878.85
9897.925
9907.8542
9918.05
99255.4417
99326
9947.75
9957.775
9968.5167
99722.525
9987.8208
9997.75
10008.7125
100113


- but there is a problem: the fields are printed with no separator between them
- to fix it, put a string literal (a space here) between the fields:

In [35]:
%%bash
#!/bin/bash

awk -F"," '{ print $1 " " $10 }' test.csv



PassengerId Cabin
892 7.8292
893 7
894 9.6875
895 8.6625
896 12.2875
897 9.225
898 7.6292
899 29
900 7.2292
901 24.15
902 7.8958
903 26
904 82.2667
905 26
906 61.175
907 27.7208
908 12.35
909 7.225
910 7.925
911 7.225
912 59.4
913 3.1708
914 31.6833
915 61.3792
916 262.375
917 14.5
918 61.9792
919 7.225
920 30.5
921 21.6792
922 26
923 31.5
924 20.575
925 23.45
926 57.75
927 7.2292
928 8.05
929 8.6625
930 9.5
931 56.4958
932 13.4167
933 26.55
934 7.85
935 13
936 52.5542
937 7.925
938 29.7
939 7.75
940 76.2917
941 15.9
942 60
943 15.0333
944 23
945 263
946 15.5792
947 29.125
948 7.8958
949 7.65
950 16.1
951 262.375
952 7.8958
953 13.5
954 7.75
955 7.725
956 262.375
957 21
958 7.8792
959 42.4
960 28.5375
961 263
962 7.75
963 7.8958
964 7.925
965 27.7208
966 211.5
967 211.5
968 8.05
969 25.7
970 13
971 7.75
972 15.2458
973 221.7792
974 26
975 7.8958
976 10.7083
977 14.4542
978 7.8792
979 8.05
980 7.75
981 23
982 13.9
983 7.775
984 52
985 8.05
986 26
987 7.7958
988 78.85
989 7.925
990 7.854
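
Besides concatenating a literal space, `print` also accepts a comma between its arguments, which inserts the output field separator `OFS` (a single space by default). The sketch below shows both the default and a tab set with `-v OFS`. The three-column sample file and the variable names are made up for illustration.

```bash
# Made-up sample with three comma-separated columns.
tmp=$(mktemp)
printf 'PassengerId,Pclass,Fare\n892,3,7.8292\n' > "$tmp"

# A comma in print inserts OFS, which defaults to one space.
with_space=$(awk -F"," '{ print $1, $3 }' "$tmp")
printf '%s\n' "$with_space"

# Setting OFS changes what the comma inserts (a tab here).
with_tab=$(awk -F"," -v OFS="\t" '{ print $1, $3 }' "$tmp")
printf '%s\n' "$with_tab"

rm -f "$tmp"
```

The comma form keeps the separator in one place, which can be easier to change later than literals scattered through the `print` statement.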

In [37]:
%%bash
#!/bin/bash

awk -F"," '{ print "id : " $1 "\t\t" "in a cabin number : " $10 }' test.csv

id : PassengerId		in a cabin number : Cabin
id : 892		in a cabin number : 7.8292
id : 893		in a cabin number : 7
id : 894		in a cabin number : 9.6875
id : 895		in a cabin number : 8.6625
id : 896		in a cabin number : 12.2875
id : 897		in a cabin number : 9.225
id : 898		in a cabin number : 7.6292
id : 899		in a cabin number : 29
id : 900		in a cabin number : 7.2292
id : 901		in a cabin number : 24.15
id : 902		in a cabin number : 7.8958
id : 903		in a cabin number : 26
id : 904		in a cabin number : 82.2667
id : 905		in a cabin number : 26
id : 906		in a cabin number : 61.175
id : 907		in a cabin number : 27.7208
id : 908		in a cabin number : 12.35
id : 909		in a cabin number : 7.225
id : 910		in a cabin number : 7.925
id : 911		in a cabin number : 7.225
id : 912		in a cabin number : 59.4
id : 913		in a cabin number : 3.1708
id : 914		in a cabin number : 31.6833
id : 915		in a cabin number : 61.3792
id : 916		in a cabin number : 262.375
id : 917		in a cabin number : 14.5
id : 918		in a 
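
One thing worth checking in the output above: the header row shows `Cabin` for `$10`, but the data rows show values such as `7.8292`, which look like fares. The likely cause is that `-F","` also splits on the comma inside the quoted `Name` field, so every data row has one extra field. The sketch below counts fields with the built-in `NF` on the first two lines of `test.csv`, copied into a temporary file. The temporary file and the variable name `fields` are assumptions for illustration.

```bash
# First two lines of test.csv, copied into a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
EOF

# NF is the number of fields awk found on the current line.
# The header has 11 fields; the data row has 12, because the quoted
# name "Kelly, Mr. James" is split in two at its comma.
fields=$(awk -F"," '{ print NF, $10 }' "$tmp")
printf '%s\n' "$fields"

rm -f "$tmp"
```

So in the data rows `$10` holds the `Fare` column and the cabin value sits in `$11`. Splitting quoted CSV reliably needs more than `-F","`. GNU awk's `FPAT` variable or a dedicated CSV parser are possible options, not shown here.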

In [39]:
%%bash

awk -F"," '{ print $0 }' test.csv

PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47,1,0,363272,7,,S
894,2,"Myles, Mr. Thomas Francis",male,62,0,0,240276,9.6875,,Q
895,3,"Wirz, Mr. Albert",male,27,0,0,315154,8.6625,,S
896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22,1,1,3101298,12.2875,,S
897,3,"Svensson, Mr. Johan Cervin",male,14,0,0,7538,9.225,,S
898,3,"Connolly, Miss. Kate",female,30,0,0,330972,7.6292,,Q
899,2,"Caldwell, Mr. Albert Francis",male,26,1,1,248738,29,,S
900,3,"Abrahim, Mrs. Joseph (Sophie Halaut Easu)",female,18,0,0,2657,7.2292,,C
901,3,"Davies, Mr. John Samuel",male,21,2,0,A/4 48871,24.15,,S
902,3,"Ilieff, Mr. Ylio",male,,0,0,349220,7.8958,,S
903,1,"Jones, Mr. Charles Cresson",male,46,0,0,694,26,,S
904,1,"Snyder, Mrs. John Pillsbury (Nelle Stevenson)",female,23,1,0,21228,82.2667,B45,S
905,2,"Howard, Mr. Benjamin",male,63,1,0,24065,26,,S
906,1,"Chaffe