Getting Data Using curl
-----------------------

We now move to a more interesting topic: how to get data from Internet sources. For that, we will use a Unix command-line tool called `curl`. (Later in class, we will learn how to do the same in Python, but for quick testing, `curl` is the usual tool of choice.) We will also use a tool called `jq` to work with JSON output. (Don't worry; we will revisit both tools later in class.)

_curl and jq often do not come preinstalled, so the first time we use them, we need to install them. On a Debian/Ubuntu system (which uses `apt-get`), type:_

In [None]:
!sudo apt-get -y install curl
!sudo apt-get -y install jq
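Before running `apt-get`, you can check whether the tools are already installed. Here is a minimal sketch in Python using the standard-library function `shutil.which`, which looks up an executable on the `PATH`; `missing_tools` is a hypothetical helper name, not part of any library:

```python
import shutil

def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on the PATH."""
    # shutil.which returns the full path of an executable, or None if it is absent
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools(["curl", "jq"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("curl and jq are both available")
```

If the list is empty, the `apt-get` commands above can be skipped.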

### Get a file with curl

Let's start by retrieving a simple text file that we will use later in the class to illustrate how different shell commands work. The sample data file is hosted online (on Dropbox), and we can use `curl` to fetch it. The `-L` option tells `curl` to follow redirects, which Dropbox uses when serving shared files. Simply type:

In [None]:
!curl -L 'https://www.dropbox.com/s/w6sov31z68v5e8v/sample.txt?dl=1'

The columns in this tab-separated data correspond to [order id], [time of order], [user id], and [ordered item], similar to what you might encounter in practice. If you wish, you can copy and paste the output above into a text editor; make sure that each line ends with a newline after the ordered-item column (the column with alphabetic characters).
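To make the column layout concrete, here is a minimal sketch that splits tab-separated lines into named fields with Python's `csv` module. The field names and the two sample rows are hypothetical, made up for illustration only; they are not taken from the actual file:

```python
import csv
import io

# Hypothetical field names following the column description above
FIELDS = ["order_id", "time", "user_id", "item"]

def parse_orders(text):
    """Parse tab-separated order records into a list of dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # csv.reader yields an empty list for a blank line, so skip those
    return [dict(zip(FIELDS, row)) for row in reader if row]

# Made-up sample rows, for illustration only
sample = "1\t10:05\t42\tpizza\n2\t10:07\t17\tsalad\n"
for order in parse_orders(sample):
    print(order["user_id"], "ordered", order["item"])
```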

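To save the output to a file instead of printing it, we add the `-o [output file]` option to the command. (In the next session, we will also see how to use _output redirection_ to store the output in a file.) First, we print the current directory and create a `data` directory to hold the file: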

In [None]:
!pwd; mkdir -p data

In [None]:
!curl -L 'https://www.dropbox.com/s/w6sov31z68v5e8v/sample.txt?dl=1' -o data/sample.txt
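As a preview of doing this in Python, here is a minimal sketch of the same download using only the standard library (`urllib.request`). The helper name `download` is hypothetical; the usage line assumes the Dropbox link with `dl=1`, which asks Dropbox for the raw file rather than its preview page:

```python
import shutil
import urllib.request
from pathlib import Path

def download(url, path):
    """Save the body at `url` to `path`, roughly like `curl -L -s URL -o path`."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    # urlopen follows HTTP redirects by default, similar to curl's -L option
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path

# Usage (requires network access):
# download("https://www.dropbox.com/s/w6sov31z68v5e8v/sample.txt?dl=1", "data/sample.txt")
```

Unlike `curl`, this prints no progress statistics, so it behaves more like `curl -s`.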

In [None]:
!ls data/

This downloads the file into the `data` subdirectory of the current working directory (in the course environment, `/home/[USER]/notebooks/dealing_with_data/12-UNIX_Basics/data`), creating a new file called `sample.txt`. If we do not want to see the download progress statistics, we can add the `-s` (silent) option:

In [None]:
!curl -L -s 'https://www.dropbox.com/s/w6sov31z68v5e8v/sample.txt?dl=1' -o data/sample.txt

And let's clean up:

In [None]:
!rm data/sample.txt; rmdir data

### Access APIs with curl

Now let's use curl to access some real data. A key component of today's data ecosystem is the existence of web APIs: services that answer HTTP requests with data, often in a machine-readable format such as JSON, and that provide functionality for a wide variety of tasks.

#### What's the weather?

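Let's use the OpenWeather API to get the current weather for our location. (The details of the API calls are available at http://openweathermap.org/api.) Each request must include an API key, passed in the `appid` parameter; you can get your own key by signing up on the OpenWeather site.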

We access the API by sending a request with `curl` and piping the response into `jq`, which formats it nicely:

     !curl -s ... | jq .

Without `jq`, the raw response is hard for a human to read, but for a computer it is a perfectly legitimate answer. This format is called JSON (JavaScript Object Notation), and it is a compact and very common way to transfer data on the Internet.
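To see what `jq` is doing, here is the same idea in Python with the standard `json` module. The JSON string below is a made-up, heavily shortened stand-in for a weather response; we assume the real response contains fields named `name`, `main.temp`, and `weather[0].description`, so check the actual output before relying on them. Selecting a field in Python corresponds to a `jq` filter such as `.main.temp`:

```python
import json

# Made-up, shortened response for illustration (assumed shape, not real API output)
raw = '{"name": "New York", "main": {"temp": 280.5}, "weather": [{"description": "light rain"}]}'

data = json.loads(raw)             # JSON text -> Python dicts and lists
print(json.dumps(data, indent=2))  # pretty-print, similar to `jq .`

# Pick out individual fields, similar to `jq '.main.temp'` and `jq '.weather[0].description'`
print(data["main"]["temp"])
print(data["weather"][0]["description"])
```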


In [None]:
!curl -s "http://api.openweathermap.org/data/2.5/weather?\
appid=ffb7b9808e07c9135bdcc7d1e867253d\
&q=New%20York,NY,USA\
&mode=json" | jq .
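Writing `%20` for the space in `New York` by hand is error-prone. A minimal sketch of building the same query string with `urllib.parse.urlencode`, which percent-encodes each parameter for us; `weather_url` is a hypothetical helper and `YOUR_API_KEY` a placeholder for your own key:

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"

def weather_url(city, api_key, mode="json"):
    """Build a current-weather request URL with percent-encoded parameters."""
    params = {"appid": api_key, "q": city, "mode": mode}
    # quote_via=quote encodes the space as %20 (as in the curl command) and the comma as %2C
    return BASE_URL + "?" + urlencode(params, quote_via=quote)

# Replace YOUR_API_KEY with your own key
print(weather_url("New York,NY,USA", "YOUR_API_KEY"))
```

The printed URL can be pasted into a `curl` command. Since `%2C` is simply the percent-encoded form of a comma, the request should be equivalent to the hand-written one.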

#### Synonym finder

Finally, here is a web API that analyzes Wikipedia to find the different ways people refer to the same entity:

In [None]:
!curl -s "http://wikisynonyms.ipeirotis.com/api/Donald_Trump" | jq .
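The same call can be sketched in Python. This assumes, based only on the example above, that the API takes a Wikipedia-style title with underscores in place of spaces appended to the `/api/` path; `synonyms_url` and `fetch_json` are hypothetical helper names, and the structure of the returned JSON is not described here:

```python
import json
import urllib.request
from urllib.parse import quote

API_ROOT = "http://wikisynonyms.ipeirotis.com/api/"

def synonyms_url(entity):
    """Build the request URL (assumption: spaces become underscores, as in Wikipedia titles)."""
    return API_ROOT + quote(entity.replace(" ", "_"))

def fetch_json(url):
    """Fetch `url` and parse the body as JSON, like `curl -s URL | jq .` minus the printing."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)

# Usage (requires network access):
# print(json.dumps(fetch_json(synonyms_url("Donald Trump")), indent=2))
```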