## curl
https://www.geeksforgeeks.org/curl-command-in-linux-with-examples/

## wget

# There are many other useful command line utilities.  Some common ones that you may find useful are detailed below.
https://www.kdnuggets.com/2018/03/top-12-essential-command-line-tools-data-scientists.html
https://www.kdnuggets.com/2022/06/20-basic-linux-commands-data-science-beginners.html

## cut
Cut allows us to slice out content from each line of a file. The default delimiter is the tab character, but we can specify another delimiter with the -d option. The -f option selects specific fields by number.
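As a minimal sketch of the options described above, the block below runs cut against a tiny file created on the fly, so it does not depend on any download. The file name `sample.csv`, its column names, and its values are invented for illustration.

```shell
# Work in a throwaway directory so nothing in the notebook folder is touched.
workdir=$(mktemp -d)

# A tiny comma-delimited file with a header row (invented sample data).
printf 'user,churches,resorts,beaches\nu1,0.0,1.5,3.2\nu2,0.5,2.0,4.1\n' > "$workdir/sample.csv"

# -d sets the delimiter and -f picks fields; a list such as 1,3 selects several at once.
first_and_third=$(cut -d ',' -f 1,3 "$workdir/sample.csv")
echo "$first_and_third"

# Skip the header with tail before cutting a single column.
resorts=$(tail -n +2 "$workdir/sample.csv" | cut -d ',' -f 3)
echo "$resorts"

# With no -d option, cut splits on tabs.
tabbed=$(printf 'a\tb\tc\n' | cut -f 2)
echo "$tabbed"

# Clean up the throwaway directory.
rm -r "$workdir"
```

In a notebook, the same commands can go in a `%%bash` cell.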

In [None]:
%%bash

curl -o 'google_ratings.csv' 'https://archive.ics.uci.edu/ml/machine-learning-databases/00485/google_review_ratings.csv'

In [None]:
!cat google_ratings.csv

In [None]:
!cut -d "," -f 9 google_ratings.csv

In [None]:
# If we want to pipe these data to another program, we might want to drop the header.
!awk "NR>1" google_ratings.csv | cut -d "," -f9

In [None]:
%%bash
# We can also skip the first row with tail -n +NUM, where NUM is the line on which output begins.
tail -n +2 google_ratings.csv | cut -d "," -f 9

## tr
The tr command can be used to translate and delete characters. Supported options include squeezing repeated characters, deletion, and replacement. The -s (squeeze) option is particularly useful when trying to cut columns from fixed-width files.

https://linuxhint.com/bash_tr_command/
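The three behaviours mentioned above (translate, delete, squeeze) can be sketched on short inline strings. The sample line below is invented and only imitates a space-padded record; it is not taken from the NOAA file.

```shell
# Translate: map every lowercase letter to its uppercase counterpart.
upper=$(printf 'tucson\n' | tr '[:lower:]' '[:upper:]')
echo "$upper"

# Delete: remove every digit from the input.
no_digits=$(printf 'AZ_Tucson_11_W\n' | tr -d '0-9')
echo "$no_digits"

# An invented record whose fields are separated by one or more spaces.
line='53131 20220101  2.622   -110.95'

# Without squeezing, the doubled space yields an empty third field.
unsqueezed=$(printf '%s\n' "$line" | cut -d ' ' -f 3)
echo "unsqueezed field 3: [$unsqueezed]"

# Squeeze: collapse each run of spaces to one, so cut sees a reliable
# single-character delimiter.
third=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 3)
echo "squeezed field 3: [$third]"
```

The empty result from the unsqueezed pipeline is the reason for running tr -s before cut on space-padded data.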

In [None]:
%%bash

curl -o 'tucson_daily.txt' 'https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2022/CRND0103-2022-AZ_Tucson_11_W.txt'


In [None]:
!head -n 5 tucson_daily.txt

In [None]:
# The first couple of fields are separated by a single space, but further along the values vary
# in length, so the delimiter becomes a variable number of space characters.
!cut -d ' ' -f 6 tucson_daily.txt

In [None]:
!tr -s " " < tucson_daily.txt | cut -d " " -f 6

In [None]:
# If we quickly scroll through the isolated column above, we'll see a missing-value indicator (-9999.0).
# Let's remove it from our vector using awk and the ! (not) operator.

!tr -s " " < tucson_daily.txt | cut -d " " -f 6 | awk "!/-9999.0/"
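One caveat about the filter above: `/-9999.0/` is a regular expression, so the unescaped dot matches any character and the pattern can match anywhere in a line. The sketch below compares it with an exact string comparison on the first field. The four values are invented for illustration, and `-999950` is a deliberately artificial look-alike.

```shell
# Invented sample column: two readings, the sentinel, and a value that
# merely resembles the sentinel.
values=$(printf '21.4\n-9999.0\n18.9\n-999950\n')

# Regex filter: the dot matches any character, so -999950 is dropped as well.
regex_kept=$(printf '%s\n' "$values" | awk '!/-9999.0/')

# Exact string comparison on the first field drops only the sentinel.
exact_kept=$(printf '%s\n' "$values" | awk '$1 != "-9999.0"')

echo "regex filter kept:"
echo "$regex_kept"
echo "exact filter kept:"
echo "$exact_kept"
```

On real temperature data the two filters will most likely agree; the exact comparison simply removes the dependence on regex semantics.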

## wc
With no options, wc returns the line count, word count, and byte count.
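A minimal sketch of the individual counts on a small file whose contents are known in advance. The file `sample.csv` and its rows are invented. Reading from standard input keeps the file name out of the output, and `tr -d ' '` strips the padding that some wc implementations add.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)

# Three lines of invented sample content.
printf 'name,city\nGrand Hotel,Tucson\nInn,Mesa\n' > "$workdir/sample.csv"

# -l counts newline characters, i.e. lines.
lines=$(wc -l < "$workdir/sample.csv" | tr -d ' ')

# -w counts whitespace-separated words; commas do not split words.
words=$(wc -w < "$workdir/sample.csv" | tr -d ' ')

# -c counts bytes.
bytes=$(wc -c < "$workdir/sample.csv" | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes"

# Clean up the throwaway directory.
rm -r "$workdir"
```

Because words are split on whitespace only, the word count of a CSV file says little about how many fields it holds.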

In [None]:
!wc hotel.csv

In [None]:
!wc -w hotel.csv

In [None]:
!wc -l hotel.csv

#### -L option for the length of the longest line

In [None]:
!wc -L hotel.csv

#### How many files are currently in the working directory


In [None]:
! ls | wc -l
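Note that `ls | wc -l` counts every visible entry, including subdirectories. If only regular files should be counted, one possible alternative is find with `-maxdepth 1 -type f`, sketched below in a throwaway directory with invented file names. Unlike ls, find also reports hidden files.

```shell
# Build a throwaway directory holding three files and one subdirectory.
workdir=$(mktemp -d)
touch "$workdir/a.csv" "$workdir/b.csv" "$workdir/c.txt"
mkdir "$workdir/archive"

# ls lists the subdirectory too, so this counts four entries.
entries=$(ls "$workdir" | wc -l | tr -d ' ')

# find limited to regular files at depth 1 counts only the three files.
files=$(find "$workdir" -maxdepth 1 -type f | wc -l | tr -d ' ')

echo "entries=$entries files=$files"

# Clean up the throwaway directory.
rm -r "$workdir"
```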

## unzip

In [None]:
%%bash

curl -O 'https://nces.ed.gov/ipeds/datacenter/data/HD2021.zip'
ls *.zip

In [None]:
!unzip HD2021.zip

In [None]:
!wc -l hd2021.csv