Getting Data using CURL
-----------------------

We now move into a more interesting topic: how to get data from Internet sources. For that, we will use a Unix command-line tool called `curl`. (Later in class, we will learn how to do the same using Python, but for quick testing, curl is often the standard choice.) We will also use a tool called `jq` to work with JSON output. (Do not worry, we will revisit both of these later in class.)

_Often, curl and jq do not come preinstalled, so the first time we use them, we need to install them. To do so, simply type:_

In [None]:
!sudo apt-get -y install curl
!sudo apt-get -y install jq

Let's start by retrieving a simple text file, which we will use later in the class, to illustrate how different shell commands work. The sample data file is hosted online. You can use terminal commands to copy this remote file. Simply type:

In [None]:
!curl -L 'https://dl.dropboxusercontent.com/u/16006464/IPDS/sample.txt'

The columns in this tab-separated data correspond to [order id] [time of order] [user id] [ordered item], something similar to what might be encountered in practice. If you wish, you can copy-paste the data written above into a text editor, making sure there is a newline following each of the ordered item columns (the columns with alphabetic characters).
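To make the column layout concrete, here is a minimal Python sketch that splits such tab-separated lines into named fields. The two rows in `SAMPLE` and the field names in `FIELDS` are made up for illustration; they are not the real contents of `sample.txt`.

```python
import csv
import io

# Hypothetical rows in the layout described above (NOT the real sample.txt).
SAMPLE = "1\t2014-01-01 10:00\tu42\tcoffee\n2\t2014-01-01 10:05\tu17\ttea\n"

# Illustrative names for [order id] [time of order] [user id] [ordered item].
FIELDS = ["order_id", "order_time", "user_id", "item"]


def parse_orders(text):
    """Split tab-separated text into one dict per non-empty line."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [dict(zip(FIELDS, row)) for row in reader if row]


orders = parse_orders(SAMPLE)
print(orders[0]["item"])
```

Each line becomes a dictionary keyed by column name, which is usually easier to work with than positional indexes.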

To store the output in a file, we add the `-o [output file]` option to the command. (In the next session, we will also see how to use _output redirection_ to store the output in a file.)

In [None]:
!curl -L 'https://dl.dropboxusercontent.com/u/16006464/IPDS/sample.txt' -o data/sample.txt

In [None]:
!ls data/

This will pull the file to the directory `/home/ubuntu/data/`, creating a new file called `sample.txt`. If we do not want to see any statistics about the download, we can use the `-s` option:

In [None]:
!curl -s -L 'https://dl.dropboxusercontent.com/u/16006464/IPDS/sample.txt' -o data/sample.txt
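For comparison, here is a rough Python counterpart of `curl -s -L ... -o ...`, using only the standard library. The helper name `download` is our own choice for this sketch. `urllib` follows redirects by default and prints no progress statistics, so it behaves roughly like the `-L` and `-s` options together.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest):
    """Fetch `url` and write the response body to `dest`."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)  # make sure data/ exists
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # stream the body to disk
    return dest


# Usage (needs network access, so it is left commented out here):
# download("https://dl.dropboxusercontent.com/u/16006464/IPDS/sample.txt",
#          "data/sample.txt")
```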

Now, let's try to use curl to get access to some real data. A key component of today's data ecosystem is the existence of `Web APIs` which provide functionality for a variety of tasks.

#### Where am I?

For example, let's try to figure out programmatically the location of the computer where the IPython server is running. We can call the API by issuing the following command:



In [1]:
!curl -s "http://freegeoip.net/json/" | jq .

{
  "ip": "54.174.159.22",
  "country_code": "US",
  "country_name": "United States",
  "region_code": "VA",
  "region_name": "Virginia",
  "city": "Ashburn",
  "zip_code": "20149",
  "time_zone": "America/New_York",
  "latitude": 39.0481,
  "longitude": -77.4729,
  "metro_code": 511
}


While this does not look very friendly to a human, for a computer it is a perfectly legitimate answer. This format is called "JSON", and it is an efficient and very commonly used way to transfer data on the Internet today. The `| jq .` at the end of the command only controls presentation: it pretty-prints the JSON returned by curl.
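To see why JSON is convenient for a program, here is a small Python sketch that parses the response text from the output above and picks out a few fields. The text is pasted in as a string, so no network call is made.

```python
import json

# Response text copied from the output above.
response_text = """
{
  "ip": "54.174.159.22",
  "country_code": "US",
  "country_name": "United States",
  "region_code": "VA",
  "region_name": "Virginia",
  "city": "Ashburn",
  "zip_code": "20149",
  "time_zone": "America/New_York",
  "latitude": 39.0481,
  "longitude": -77.4729,
  "metro_code": 511
}
"""

location = json.loads(response_text)  # JSON object -> Python dict
summary = (
    f'{location["city"]}, {location["region_code"]} '
    f'({location["latitude"]}, {location["longitude"]})'
)
print(summary)  # Ashburn, VA (39.0481, -77.4729)
```

Strings arrive as Python strings and numbers as `int` or `float`, so no manual text parsing is needed.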

Now, let's examine a few more web APIs, just for fun:

#### What's the weather?

Let's use the OpenWeatherMap API to get the weather details in our location. (The details of the API calls are available at http://openweathermap.org/api.)

In [2]:
!curl -s "http://api.openweathermap.org/data/2.5/weather?\
&appid=ffb7b9808e07c9135bdcc7d1e867253d\
&q=New%20York,NY,USA\
&units=imperial\
&mode=json" | jq .

{
  "coord": {
    "lon": -74.01,
    "lat": 40.71
  },
  "weather": [
    {
      "id": 500,
      "main": "Rain",
      "description": "light rain",
      "icon": "10d"
    }
  ],
  "base": "stations",
  "main": {
    "temp": 39.59,
    "pressure": 1015.94,
    "humidity": 94,
    "temp_min": 39.59,
    "temp_max"

You will notice that we asked the service to return the data in JSON format. For that API, we can also ask for the data in a different format, called XML, which is more verbose. (We will get back to these formats later in the semester.)
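The request is just a URL with a query string, and switching formats only means changing the `mode` parameter. As a sketch, the same URL can be assembled in Python. The function name `weather_url` is our own, and `"YOUR_API_KEY"` is a placeholder rather than a working key.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"


def weather_url(city, mode, api_key):
    """Build the request URL; `mode` is "json" or "xml"."""
    params = {"q": city, "units": "imperial", "mode": mode, "appid": api_key}
    # quote_via=quote encodes spaces as %20 (the default would use "+");
    # safe="," leaves the commas as they appear in the curl command.
    return BASE_URL + "?" + urlencode(params, safe=",", quote_via=quote)


# "YOUR_API_KEY" is a placeholder, not a real key.
print(weather_url("New York,NY,USA", "xml", "YOUR_API_KEY"))
```

Note how the space in "New York" becomes `%20`, exactly as we typed it by hand in the curl command.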

In [3]:
!curl -s "http://api.openweathermap.org/data/2.5/weather?\
&q=New%20York,NY,USA\
&units=imperial\
&mode=xml\
&appid=ffb7b9808e07c9135bdcc7d1e867253d"

<current><city id="5128581" name="New York"><coord lon="-74.01" lat="40.71"></coord><country>US</country><sun rise="2017-01-23T12:12:46" set="2017-01-23T22:03:58"></sun></city><temperature value="39.59" min="39.59" max="39.59" unit="fahrenheit"></temperature><humidity value="94" unit="%"></humidity><pressure value="1015.94" unit="hPa"></pressure><wind><speed value="22.41" name="Severe Gale"></speed><gusts></gusts><direction value="69.5014" code="ENE" name="East-northeast"></direction></wind><clouds value="100" name="overcast clouds"></clouds><visibility></visibility><precipitation value="0.76" mode="rain" unit="3h"></precipitation><weather number="500" value="light rain" icon="10d"></weather><lastupdate value="2017-01-23T18:18:16"></lastupdate></current>
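XML is just as easy for a program to read as JSON, even if it is harder on the eyes. The sketch below parses a shortened copy of the response above with Python's standard `xml.etree.ElementTree` module. Several elements were dropped to keep the example short.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the XML response shown above.
xml_text = (
    '<current><city id="5128581" name="New York"><country>US</country></city>'
    '<temperature value="39.59" min="39.59" max="39.59" unit="fahrenheit">'
    '</temperature>'
    '<humidity value="94" unit="%"></humidity>'
    '<weather number="500" value="light rain" icon="10d"></weather></current>'
)

root = ET.fromstring(xml_text)  # root is the <current> element
city = root.find("city").get("name")
temperature = float(root.find("temperature").get("value"))
conditions = root.find("weather").get("value")
print(f"{city}: {temperature} F, {conditions}")  # New York: 39.59 F, light rain
```

In this response the values live in attributes (`value="39.59"`), which is why we call `.get(...)` on each element instead of reading its text.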

#### What's the sentiment?

Now let's try to use a web service to automatically analyze the sentiment of a piece of text. (The service comes from [IBM's Alchemy API](http://www.alchemyapi.com/api/sentiment/textc.html#textsentiment).)

In [4]:
!curl -s "http://access.alchemyapi.com/calls/text/TextGetTextSentiment" \
-d "outputMode=json" \
-d "apikey=4b46c7859a7be311b6f9389b12504e302cac0a55" \
-d "text=I did not dislike it. " | jq .

{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "totalTransactions": "1",
  "language": "english",
  "docSentiment": {
    "type": "neutral"
  }
}


#### And a few synonyms

And now just a demo of a web API that I created myself a few years back. It analyzes Wikipedia to figure out the different ways that people refer to the same entity.



In [6]:
!curl -s "http://wikisynonyms.ipeirotis.com/api/Donald_Trump" | jq .

{
  "http": 200,
  "message": "success",
  "terms": [
    {
      "term": "Donald Trump",
      "canonical": 1,
      "oskill": 0
    },
    {
      "term": "Donald John Trump",
      "canonical": 0,
      "oskill": 0
    },
    {
      "term": "The Donald",
      "canonical": 0,
      "oskill": 0
    },
    {
      "term": "Donald J. Trum

## Exercise

The following websites contain listings of many useful APIs:

* https://www.mashape.com 
* http://www.programmableweb.com/
* http://www.mashery.com/
* http://apigee.com/ 

Mashape is my personal favorite in terms of user-friendliness, and it also has examples expressed directly using curl, but the others are pretty nice as well. Your task: search through these websites and find a web API that does something that you like. Use curl to issue a web API call to this service.