# Working with Pipes

### Introduction

In this lesson, we'll download a dataset and explore it using pipes and redirects.  Time to get started.

### Making Requests from the Shell

As we may know, we can use `curl` to make a request from the command line.  For example, go to the command line and run `curl https://www.espn.com/`.

> We should see the contents of the website printed in our terminal.

Next, try using `wget`.  First, if on a Mac, install `wget` with `brew install wget` (Windows instructions [are here](https://builtvisible.com/download-your-website-with-wget/)). Then, in the terminal, write `wget` followed by the same URL.

> In general, we'll use curl over `wget` as it has [slightly more functionality](https://daniel.haxx.se/docs/curl-vs-wget.html).

Ok, now let's use `curl` to download the Airbnb data from the following URL and, instead of just displaying the content in the shell, save it to `airbnb.csv` (use a redirect to do so).

In [1]:
url = "https://raw.githubusercontent.com/rajtulluri/Airbnb-Data-Exploratory-Analysis/master/AB_NYC_2019.csv"
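
One way to approach this step is `curl -s URL > FILE`. The sketch below keeps itself runnable offline by fetching a tiny stand-in file through a `file://` URL; the stand-in file and the name `airbnb_sample.csv` are assumptions made for illustration, not part of the lesson's data.

```shell
# Scratch directory so nothing in the current folder is touched.
workdir=$(mktemp -d)

# Stand-in for the remote file (hypothetical two-line CSV).
printf 'id,name\n2539,Clean & quiet apt home by the park\n' > "$workdir/source.csv"

# curl accepts file:// URLs, so the same command shape works offline.
url="file://$workdir/source.csv"

# -s hides the progress meter; > sends the response body to a file
# instead of printing it in the terminal.
curl -s "$url" > "$workdir/airbnb_sample.csv"

head -n 2 "$workdir/airbnb_sample.csv"
```

Substituting the GitHub URL above and `airbnb.csv` should work the same way. In the notebook, the equivalent cell would be `!curl -s $url > airbnb.csv`, since IPython expands Python variables in `!` commands.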

Next, display the first two lines to ensure it was done correctly.

In [7]:
!head -n 2 airbnb.csv

# id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
# 2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365

id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365


And count the number of lines in the file.

In [10]:
!wc -l airbnb.csv

# 49081

   49081 airbnb.csv


Ok, so close to 50000 records.

Now, let's count the number of lines where Brooklyn shows up.

In [13]:
!grep Brooklyn airbnb.csv | wc -l

# 20151

   20151
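
Keep in mind that `grep Brooklyn` matches the word anywhere in a line, not only in the `neighbourhood_group` column. The sketch below uses a hypothetical three-row sample to compare the piped count with a count restricted to one column via `awk`.

```shell
# Hypothetical sample: id,name,neighbourhood_group
sample=$(mktemp)
cat > "$sample" <<'EOF'
1,Sunny room,Brooklyn
2,Loft with Brooklyn Bridge view,Manhattan
3,Quiet studio,Queens
EOF

# grep prints every line containing "Brooklyn"; the pipe hands those
# lines to wc -l, which counts them. (tr strips padding some wc builds add.)
any_match=$(grep Brooklyn "$sample" | wc -l | tr -d ' ')

# awk -F, splits each line on commas, so we can test only the third column.
column_match=$(awk -F, '$3 == "Brooklyn"' "$sample" | wc -l | tr -d ' ')

echo "lines mentioning Brooklyn: $any_match"
echo "rows in the Brooklyn group: $column_match"
```

On this sample the first count is 2 and the second is 1, because the Manhattan listing mentions Brooklyn in its name.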


And the number of lines where Queens shows up.

In [21]:
!grep Queens airbnb.csv | wc -l

# 5675

    5675


Then take all of the lines matching Brooklyn and place them in a file called `brooklyn.csv`.

In [18]:
!grep Brooklyn airbnb.csv > brooklyn.csv

And take a look at the first two lines of the new file to confirm that the contents are in fact Brooklyn apartments.

In [19]:
!head -n 2 brooklyn.csv

2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194


Now append the apartments from Queens to the same csv file.

In [20]:
!grep Queens airbnb.csv >> brooklyn.csv

Now our csv file should have 25826 lines, the sum of the 20151 Brooklyn lines and the 5675 Queens lines that we counted earlier. Let's check the number of lines in `brooklyn.csv` now.

In [23]:
!wc -l brooklyn.csv

# 25826 brooklyn.csv

   25826 brooklyn.csv
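
The line count adds up because `>>` appends, whereas `>` would have replaced the file. A minimal sketch of the difference, using a throwaway temp file (the file and its contents are illustrative only):

```shell
out=$(mktemp)

# > truncates the file before writing, so only the latest output survives.
printf 'a\nb\n' > "$out"
printf 'c\n' > "$out"
after_overwrite=$(wc -l < "$out" | tr -d ' ')

# >> keeps the existing contents and adds the new lines at the end.
printf 'a\nb\n' > "$out"
printf 'c\n' >> "$out"
after_append=$(wc -l < "$out" | tr -d ' ')

echo "after >:  $after_overwrite line(s)"
echo "after >>: $after_append line(s)"
```

Here the overwrite leaves 1 line and the append leaves 3.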


Ok, that lines up with what we expected.  Finally, let's rename our file to `brooklyn_and_queens.csv`.
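
Renaming is usually done with `mv`. The sketch below runs in a scratch directory with a one-line placeholder file, both of which are assumptions for illustration; in the lesson's folder the command would be `mv brooklyn.csv brooklyn_and_queens.csv`.

```shell
workdir=$(mktemp -d)
printf 'row\n' > "$workdir/brooklyn.csv"

# mv renames the file; the old name no longer exists afterwards.
mv "$workdir/brooklyn.csv" "$workdir/brooklyn_and_queens.csv"

ls "$workdir"
```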

### Bonus: A Deeper Dive on Sorting

Let's wrap up by taking a deeper look at sorting.  If we want to *sort* our `airbnb.csv` file by, say, the name of the listing, how can we accomplish this?

In [31]:
!head -n 2 airbnb.csv 

id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365


Well, the difficult part is that the name is not listed until the second column.  It turns out that we can pass `sort` the `-k` flag followed by the number of the field we wish to sort by, counting from 1.

In [37]:
!sort -k1 -t "," airbnb.csv | head -n 3


 12 mins Manhattan",213781715,Anting,Brooklyn,Greenpoint,40.73543,-73.95454,Private room,119,1,0,,,33,179
 15 MinTimes Square",57049951,Eliahu,Manhattan,Harlem,40.81316,-73.95176,Private room,69,2,2,2018-10-22,0.10,9,365
sort: Broken pipe


> We [can ignore broken pipe errors](https://stackoverflow.com/questions/46202653/bash-error-in-sort-sort-write-failed-standard-output-broken-pipe).

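So let's unpack the above.  The `-t` flag specifies that our columns are delimited by a ",".  The `-k` flag picks the field where the sort key starts, and `sort` counts fields from 1, so `-k1` starts the key at the first column (the id) and, with no end field given, runs to the end of the line.  To key on the name alone, we would write `-k2,2`.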

> By default sort assumes a sequence of blank characters as the delimiter.
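
To make the field numbering concrete, here is a minimal sketch on a hypothetical three-column sample (`id,name,price`), sorting by the second field only.

```shell
# Hypothetical sample: id,name,price
sample=$(mktemp)
cat > "$sample" <<'EOF'
3,Cozy loft,120
1,Airy studio,95
2,Bright room,60
EOF

# -t "," sets the field separator; -k2,2 starts and ends the key at
# field 2, so only the name is compared.
by_name=$(sort -t "," -k2,2 "$sample")
printf '%s\n' "$by_name"
```

The rows come back in the order Airy studio, Bright room, Cozy loft, regardless of their ids.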

Now let's try to sort by the price.  The price is located in the 10th column.  But this time there is a twist: because price is a number, we need to tell `sort` to compare the field numerically.  We do so by adding `n` to the key flag, as in `-nk9` below.  Keep in mind that `sort` counts fields from 1, so `-k9` refers to the ninth comma-separated field.

In [43]:
!sort -t "," -nk9 airbnb.csv | tail -n 5

walk 2 train",20951849,Elaine,Brooklyn,Bedford-Stuyvesant,40.69138,-73.93217,Entire home/apt,350,1,5,2019-01-01,0.49,2,5
Loft .",4399103,James,Manhattan,Greenwich Village,40.72768,-73.99917,Entire home/apt,410,31,40,2019-07-01,3.20,2,10
only 17min to Central Park",161351021,V,Queens,Woodside,40.75593,-73.90268,Private room,425,1,11,2018-03-01,0.58,2,0
Hell’s Kitchens",35303743,Patricia,Manhattan,Upper West Side,40.76835,-73.98367,Private room,6500,30,0,,,1,97
3038614,"Lovely, Huge, Open, Quiet, Spacious, Sunny, A/C",14892152,Laura,Brooklyn,Bedford-Stuyvesant,40.68596,-73.95837,Entire home/apt,155,25,3,2015-04-02,0.05,1,364


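> In these lines the price follows the room type (e.g. `Entire home/apt`).  Because `sort` counts fields from 1 and splits on every comma, `-nk9` keys on whatever lands in the ninth field: in an ordinary row that is the room type (the price is field 10), while in rows whose names contain commas or span several lines it can be a price or a host id.  So the output is not ordered by price.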
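
A numeric sort on a single field can be sketched as follows, again on a hypothetical three-column sample (`id,name,price`) rather than the real file. The header is skipped with `tail -n +2` so that it is not sorted in with the data.

```shell
# Hypothetical sample with a header row.
sample=$(mktemp)
cat > "$sample" <<'EOF'
id,name,price
1,Airy studio,95
2,Bright room,1200
3,Cozy loft,120
EOF

# tail -n +2 drops the header line.
# -k3,3n compares field 3 as a number; without n, "1200" would sort
# before "95" because the comparison would go character by character.
by_price=$(tail -n +2 "$sample" | sort -t "," -k3,3n)
printf '%s\n' "$by_price"

# The last line is the most expensive listing.
most_expensive=$(printf '%s\n' "$by_price" | tail -n 1)
```

For `airbnb.csv`, where price is the tenth column, the key would be `-k10,10n`. Rows whose quoted names contain commas or line breaks would still be misread, since `sort` does not understand CSV quoting.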

### Summary

In this lesson, we practiced combining shell commands with pipes and redirects.  We first used `curl` with a redirect to fetch data from a web address and store it in a file.  Then we used a pipe to pass the lines matching Brooklyn or Queens to `wc -l` and count them.  Finally, we used redirects to write and then append those results to a file.