# <font color='blue'>Federated Search</font>

This notebook demonstrates the World Modelers Federated Search API, which exposes the following endpoints:

- `/search`
- `/metadata/{data_location}/{id_value}`
- `/download_variables/{data_location}/{dataset_id}`
- `/download/{data_location}/{id_value}`
- `/search_concepts/{concept_name}`

## Usage:
A live version of the API is hosted at https://search.worldmodelers.com/. If you need login credentials, please e-mail travis@jataware.com.

To run the API locally, go to: [Federated Search](https://github.com/WorldModelers/federated-search) and follow the README instructions. 

## <font color='green'>Notes</font>
Throughout this notebook you will find notes about the usage for each endpoint and how it may differ depending on the target Datamart (ISI or NYU). Since each Datamart has its own functionality, this API abstracts these differences as much as possible, but there are instances where the differences should be noted by the end user.

### Federated Search Server:

In [1]:
# Standard library
import json
from io import StringIO
# Third-party
import pandas as pd
import requests

**Please update the below credentials:**

In [2]:
## Comment out the server you do not wish to use:

# To run API on localhost, see instructions at: https://github.com/WorldModelers/federated-search
#federated_url = 'http://localhost:8080'

# To run on remote server
username = 'INSERT_USERNAME_HERE'
password = 'INSERT_PASSWORD_HERE'
# No trailing slash: endpoint paths are appended below as f'{federated_url}/...'
federated_url = f'https://{username}:{password}@search.worldmodelers.com'
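
Embedding credentials directly in the URL can break if the username or password contains reserved characters such as `@`, `/`, or `:`. The following is a minimal sketch, assuming the server accepts credentials in the URL as this notebook does. `build_federated_url` is a hypothetical helper name and not part of the API:

```python
from urllib.parse import quote


def build_federated_url(username, password, host="search.worldmodelers.com"):
    """Return a base URL with percent-encoded credentials and no trailing slash."""
    # safe="" forces reserved characters such as "@" and "/" to be encoded.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"https://{user}:{pwd}@{host}"


# Placeholder credentials for illustration only.
example_url = build_federated_url("INSERT_USERNAME_HERE", "p@ss/word")
print(example_url)
```

The result omits the trailing slash so that paths such as `f'{federated_url}/search'` do not contain a double slash.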

## <font color='blue'>1. Search: </font>
### Search by keywords and/or apply time and geospatial filters
    
### endpoint: `/search`    

### <font color='green'>ISI/NYU Search Notes</font>

<b>data_location</b>: "ISI" or "NYU"

#### <b>geo</b>: Currently NYU only

  - bounding box (bbox):
    - Example: query_bbox
    - Latitude1/Longitude1 => Northwest point
    - Latitude2/Longitude2 => Southeast point
    - Latitude and Longitude:
      - North/East are positive
      - South/West are negative
    
  - place:
    - Example: query_place
    - Note: the "place" key replaces "bbox", and a single "area_name" key/value pair replaces the latitude/longitude values
    

#### <b>keywords</b>:
  - ISI requires keywords
    - Example: query_isi
  - NYU does not require keywords
  - Both ISI and NYU perform "OR" searches: ISI does so through its own API, while for NYU the Federated Search makes one API call per keyword.
  
#### <b>time</b>: NYU only
  - Enter a start date (ISO 8601)
  - Enter an end date (ISO 8601)
  - Example: query_bbox  
  
#### <b>Hybrid Searches</b>:
  - NYU supports hybrid searches; ISI currently does not
  - NYU queries can combine any of [keywords, geo (bbox or place), time]
  - Example: query_bbox  
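
The rules above can be captured in a small payload builder. This is only a sketch: `build_search_query` and `to_iso8601` are hypothetical helper names, the checks mirror the notes in this section rather than any server-side validation, and requiring both a start and an end date is an assumption.

```python
from datetime import datetime, timezone


def to_iso8601(value):
    """Format a datetime as an ISO 8601 UTC string; pass strings through unchanged."""
    if isinstance(value, datetime):
        # Assumption: naive datetimes are already in UTC.
        if value.tzinfo is not None:
            value = value.astimezone(timezone.utc)
        return value.strftime("%Y-%m-%dT%H:%M:%SZ")
    return value


def build_search_query(data_location, keywords=None, bbox=None, place=None,
                       start=None, end=None):
    """Assemble a /search payload following the ISI/NYU notes in this section."""
    if data_location not in ("ISI", "NYU"):
        raise ValueError('data_location must be "ISI" or "NYU"')
    has_filters = any(v is not None for v in (bbox, place, start, end))
    if data_location == "ISI":
        if not keywords:
            raise ValueError("ISI requires keywords")
        if has_filters:
            raise ValueError("geo and time filters are NYU only")
    if bbox is not None and place is not None:
        raise ValueError("use either bbox or place, not both")
    if (start is None) != (end is None):
        raise ValueError("provide both start and end, or neither")

    query = {"data_location": data_location}
    if bbox is not None:
        # bbox keys: latitude1, longitude1, latitude2, longitude2
        query["geo"] = {"type": "bbox", "value": {"bbox": dict(bbox)}}
    elif place is not None:
        query["geo"] = {"type": "place", "value": {"place": {"area_name": place}}}
    if keywords:
        query["keywords"] = list(keywords)
    if start is not None:
        query["time"] = {"start": to_iso8601(start), "end": to_iso8601(end)}
    return query


sketch_query = build_search_query(
    "NYU",
    keywords=["wfp"],
    bbox={"latitude1": 14.5, "latitude2": 32, "longitude1": 3, "longitude2": 46},
    start=datetime(2017, 1, 1),
    end=datetime(2020, 8, 31),
)
print(sketch_query)
```

The resulting dictionary can be passed as the `json=` argument of `requests.post` in the same way as the hand-written queries in this notebook.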

### Examples:

Query the NYU Datamart with a bounding box query:

In [3]:
query_bbox = {
  "data_location": "NYU",
  "geo": {
    "type": "bbox",
    "value": {
      "bbox": {
        "latitude1": 14.5,
        "latitude2": 32,
        "longitude1": 3,
        "longitude2": 46
      }
    }
  },
  "keywords": [
    "wfp"
  ],
  "time": {
    "end": "2020-08-31T00:00:00Z",
    "start": "2017-01-01T00:00:00Z"
  }
}

In [4]:
query = query_bbox

response = requests.post(f'{federated_url}/search',json=query)
print(json.dumps(response.json(), indent=2))

[
  {
    "data_location": "NYU Datamart",
    "dataset_id": "datamart.uaz-indicators.6ee20bafa79c5db792f0172a06e901ef",
    "description": "None",
    "name": "Net official flows from UN agencies (WDI)",
    "score": 6.463652
  },
  {
    "data_location": "NYU Datamart",
    "dataset_id": "datamart.upload.9b584d2bc04e41339d520404785e8d2c",
    "description": "WFP data for Ethiopia",
    "name": "World Food Prices",
    "score": 4.339799
  }
]


Query the NYU Datamart with a place name:

In [5]:
query_place = {
  "data_location": "NYU",
  "geo": {
    "type": "place",
    "value": {
      "place": {"area_name": "Ethiopia" }
    }
  },
  "keywords": [
    "maize", "wfp"
  ]
}

In [6]:
query = query_place

response = requests.post(f'{federated_url}/search',json=query)
print(json.dumps(response.json(), indent=2))

[
  {
    "data_location": "NYU Datamart",
    "dataset_id": "datamart.uaz-indicators.f7618fe2a85050c198929f37472ccd1d",
    "description": "None",
    "name": "Maize (white) - Retail (WFP)",
    "score": 10.560213
  }
]


Query the ISI Datamart with keywords:

In [7]:
query_isi = {
  "data_location": "ISI",
  "keywords": [
    "maize", "wfp"
  ]
}

In [8]:
query = query_isi

response = requests.post(f'{federated_url}/search',json=query)
print(json.dumps(response.json()[:2], indent=2))

[
  {
    "data_location": "ISI Datamart",
    "dataset_id": "UAZ",
    "description": "None",
    "name": " FAO: Biomass burned (dry matter), Maize[tonnes]",
    "score": 0.0607927,
    "variable_id": "VUAZ-311"
  },
  {
    "data_location": "ISI Datamart",
    "dataset_id": "UAZ",
    "description": "None",
    "name": " FAO: Direct emissions (CO2eq) (Crop residues), Maize[gigagrams]",
    "score": 0.0607927,
    "variable_id": "VUAZ-325"
  }
]


##  <font color='blue'>2. Obtain Metadata</font>
### Retrieve metadata for a known Datamart and dataset ID

### endpoint: `/metadata/{data_location}/{id_value}`


#### For ISI and NYU:

<b>data_location</b> = "ISI" or "NYU"

<b>dataset_id</b> is the ID of the dataset of interest.

<b>variable_id</b> is the ID of the variable of interest (only applies to ISI Datamart).


Note: `z_meta` is a catch-all for any other metadata associated with the `id_value` that is not specified in the schema.
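
A search result can be turned into a metadata URL directly. The sketch below assumes the result shape shown in the `/search` outputs in this notebook, where `data_location` reads "ISI Datamart" or "NYU Datamart" while the endpoint path expects "ISI" or "NYU". `metadata_url_for` is a hypothetical helper name:

```python
from urllib.parse import quote, urlencode


def metadata_url_for(base_url, hit):
    """Build a /metadata URL from one /search result (a dict)."""
    # "ISI Datamart" -> "ISI", "NYU Datamart" -> "NYU"
    data_location = hit["data_location"].split()[0]
    dataset_id = quote(hit["dataset_id"], safe="")
    url = f"{base_url.rstrip('/')}/metadata/{data_location}/{dataset_id}"
    # variable_id only appears on ISI results; send it as a query parameter.
    if hit.get("variable_id"):
        url += "?" + urlencode({"variable_id": hit["variable_id"]})
    return url


isi_hit = {"data_location": "ISI Datamart", "dataset_id": "UAZ", "variable_id": "VUAZ-311"}
nyu_hit = {
    "data_location": "NYU Datamart",
    "dataset_id": "datamart.uaz-indicators.f7618fe2a85050c198929f37472ccd1d",
}
print(metadata_url_for("http://localhost:8080", isi_hit))
print(metadata_url_for("http://localhost:8080", nyu_hit))
```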

Example of dataset level metadata from ISI Datamart:

In [9]:
# EXAMPLES: 
data_location = "ISI"
dataset_id = "WDI"

In [10]:
meta_url = f'{federated_url}/metadata/{data_location}/{dataset_id}'

In [11]:
response = requests.get(meta_url)
print(json.dumps(response.json(), indent=2))

{
  "data_location": "ISI",
  "dataset_id": "None",
  "description": "None",
  "name": "None",
  "source": "None",
  "spatial_resolution": "None",
  "temporal_resolution": "None",
  "variable_id": "None",
  "z_meta": {
    "corresponds_to_property": "None",
    "qualifier": "None"
  }
}
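
In the sample output above, missing fields come back as the string `"None"` rather than JSON `null`. If that is inconvenient downstream, a small recursive pass can convert them. This is a sketch based only on the outputs shown in this notebook, and `none_strings_to_none` is a hypothetical helper name:

```python
def none_strings_to_none(obj):
    """Recursively replace the exact string "None" with Python's None."""
    if isinstance(obj, dict):
        return {key: none_strings_to_none(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [none_strings_to_none(value) for value in obj]
    # Only exact matches are replaced; "None (UAZ)" is left untouched.
    return None if obj == "None" else obj


sample_meta = {
    "data_location": "ISI",
    "dataset_id": "None",
    "source": "None (UAZ)",
    "z_meta": {"corresponds_to_property": "None", "qualifier": "None"},
}
cleaned_meta = none_strings_to_none(sample_meta)
print(cleaned_meta)
```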


Example of metadata from NYU Datamart:

In [12]:
# EXAMPLES: 
data_location = "NYU"
dataset_id = "datamart.uaz-indicators.069de31ef57758da93ebde435df440a4"  

In [13]:
meta_url = f'{federated_url}/metadata/{data_location}/{dataset_id}'

In [14]:
response = requests.get(meta_url)
print(json.dumps(response.json(), indent=2))

{
  "data_location": "NYU",
  "dataset_id": "datamart.uaz-indicators.069de31ef57758da93ebde435df440a4",
  "description": "None",
  "name": "Average Harvested Weight at Maturity (Maize) (None)",
  "source": "None (UAZ)",
  "spatial_resolution": "Country",
  "temporal_resolution": "None",
  "z_meta": {
    "attribute_keywords": [
      "Country",
      "State",
      "County",
      "Year",
      "Month",
      "Average Harvested Weight at Maturity (Maize) (kg/ha)",
      "Average",
      "Harvested",
      "Weight",
      "at",
      "Maturity",
      "Maize",
      "kg",
      "ha",
      ""
    ],
    "columns": [
      {
        "admin_area_level": 0,
        "name": "Country",
        "num_distinct_values": 2,
        "plot": {
          "data": [
            {
              "bin": "Ethiopia",
              "count": 68
            },
            {
              "bin": "South Sudan",
              "count": 68
            }
          ],
          "type": "histogram_categorical"
      

Additionally, we can obtain variable level metadata from the ISI Datamart:

In [15]:
# EXAMPLES: 
data_location = "ISI"
dataset_id = "UAZ"
variable_id = "VUAZ-311"

In [16]:
meta_url = f'{federated_url}/metadata/{data_location}/{dataset_id}?variable_id={variable_id}'

In [17]:
response = requests.get(meta_url)
print(json.dumps(response.json(), indent=2))

{
  "data_location": "ISI",
  "dataset_id": "UAZ",
  "description": "FAO: Biomass burned (dry matter), Maize[tonnes]",
  "name": "FAO: Biomass burned (dry matter), Maize[tonnes]",
  "source": "None",
  "spatial_resolution": "None",
  "temporal_resolution": "None",
  "variable_id": "VUAZ-311",
  "z_meta": {
    "corresponds_to_property": "P2006020317",
    "qualifier": [
      {
        "identifier": "P585",
        "name": "point in time"
      },
      {
        "identifier": "P248",
        "name": "stated in"
      }
    ]
  }
}


## <font color='blue'>3. Download Datasets</font>
### Download datasets by dataset ID

### endpoint: `/download/{data_location}/{id_value}`

> Note: this is only relevant for downloading from the NYU Datamart. For downloading from ISI, use the `/download_variables` endpoint.

In [18]:
## Always NYU...
data_location = "NYU"

# Example:
dataset_id = "datamart.upload.9b584d2bc04e41339d520404785e8d2c"

In [19]:
nyu_download_url = f'{federated_url}/download/{data_location}/{dataset_id}'

In [20]:
# Skip the first row, then display the next 5 rows for ease of viewing...
response = requests.get(nyu_download_url)
df = pd.read_csv(StringIO(response.text))
df.drop(df.index[0]).head(5)

Unnamed: 0,date,cmname,unit,category,price,currency,country,admname,adm1id,mktname,mktid,cmid,ptid,umid,catid,sn,default
1,7/15/2005,Sorghum - Wholesale,100 KG,cereals and tubers,238,ETB,Ethiopia,Addis Ababa,1227,Addis Ababa,480.0,65,14.0,9.0,1,480_65_14_9,
2,8/15/2005,Sorghum - Wholesale,100 KG,cereals and tubers,250,ETB,Ethiopia,Addis Ababa,1227,Addis Ababa,480.0,65,14.0,9.0,1,480_65_14_9,
3,9/15/2005,Sorghum - Wholesale,100 KG,cereals and tubers,248,ETB,Ethiopia,Addis Ababa,1227,Addis Ababa,480.0,65,14.0,9.0,1,480_65_14_9,
4,10/15/2005,Sorghum - Wholesale,100 KG,cereals and tubers,233,ETB,Ethiopia,Addis Ababa,1227,Addis Ababa,480.0,65,14.0,9.0,1,480_65_14_9,
5,11/15/2005,Sorghum - Wholesale,100 KG,cereals and tubers,252,ETB,Ethiopia,Addis Ababa,1227,Addis Ababa,480.0,65,14.0,9.0,1,480_65_14_9,


## <font color='blue'>4. Download Variables </font>
### Download variables by variable ID

### endpoint: `/download_variables/{data_location}/{dataset_id}`

> Note: this is only relevant for the ISI Datamart. For downloading from the NYU Datamart, see the `/download` endpoint.

In [21]:
## Always ISI...
data_location = "ISI"

# Example
dataset_id = "WDI"
variable_ids = ["access_to_electricity_of_population","access_to_clean_fuels_and_technologies_for_cooking_of_population"]

In [22]:
isi_download_url = f'{federated_url}/download_variables/{data_location}/{dataset_id}'

In [23]:
response = requests.post(isi_download_url, json=variable_ids)
df = pd.read_csv(StringIO(response.text))
df.drop(df.index[0]).head(5)

Unnamed: 0.1,Unnamed: 0,dataset_id,variable_id,variable,main_subject,main_subject_id,value,value_unit,time,time_precision,country,admin1,admin2,admin3,region_coordinate,stated_in,stated_in_id,stated in
1,1,WDI,access_to_electricity_of_population,Access to electricity (% of population),Gabon,Q1000,76.34446,,2001-01-01T00:00:00Z,year,Gabon,,,,"POINT(11.5, -0.68333055555556)",,,WDI
2,2,WDI,access_to_electricity_of_population,Access to electricity (% of population),Gabon,Q1000,77.307663,,2002-01-01T00:00:00Z,year,Gabon,,,,"POINT(11.5, -0.68333055555556)",,,WDI
3,3,WDI,access_to_electricity_of_population,Access to electricity (% of population),Gabon,Q1000,78.251656,,2003-01-01T00:00:00Z,year,Gabon,,,,"POINT(11.5, -0.68333055555556)",,,WDI
4,4,WDI,access_to_electricity_of_population,Access to electricity (% of population),Gabon,Q1000,79.171516,,2004-01-01T00:00:00Z,year,Gabon,,,,"POINT(11.5, -0.68333055555556)",,,WDI
5,5,WDI,access_to_electricity_of_population,Access to electricity (% of population),Gabon,Q1000,81.6,,2005-01-01T00:00:00Z,year,Gabon,,,,"POINT(11.5, -0.68333055555556)",,,WDI
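
The `region_coordinate` column in the output above holds text such as `POINT(11.5, -0.68333055555556)`. A minimal sketch for turning it into numbers follows. `parse_point` is a hypothetical helper name, and reading the pair as (longitude, latitude) is an assumption based on the Gabon rows, not something the API documents here:

```python
import re

# Matches "POINT(x, y)" with optional whitespace and signed decimal numbers.
_POINT_RE = re.compile(
    r"^POINT\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)$"
)


def parse_point(text):
    """Return the two numbers in 'POINT(x, y)' as floats, or None if no match."""
    if not isinstance(text, str):
        return None  # e.g. NaN from an empty CSV cell
    match = _POINT_RE.match(text.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


gabon_point = parse_point("POINT(11.5, -0.68333055555556)")
print(gabon_point)
```

With pandas, something like `df["region_coordinate"].apply(parse_point)` would apply the same parsing to every row.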


### <font color='red'>Work in Progress... </font>

## <font color='blue'>5. Search Concepts </font>
### Search UAZ indicators for concept matches


### endpoint: `/search_concepts/{concept_name}`

> NOTE: This endpoint will be available once UAZ Concept Mapping Service integration is completed.