# Elasticsearch (ES) index basics

In [None]:
Metadata of a JSON document, as returned in a search hit:
      {
        "_index" : "orgs-index",
        "_type" : "_doc",
        "_id" : "1263",
        "_score" : 1.0,
        "_source" : {
          "address_line_1" : "7410 West Rawson Avenue",
          "city_municipality" : "Franklin",
          "country" : "United States",
          "ctep_id" : "WI137",
          "id" : "252328",
          "name" : "Wheaton Franciscan Healthcare-Saint Francis/Reiman Cancer Center",
          "state_province_territory" : "WI",
          "va_organization" : false
        }
      }
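Fields that start with an underscore are metadata added by ES. The stored document itself is under _source. The "id" field inside _source ("252328") is an ordinary document field and is unrelated to the metadata _id ("1263"). Below is a minimal Python sketch, using only the standard library, that separates the two parts of a hit. The helper name split_hit is hypothetical.

```python
import json

# The hit shown above, with _source shortened to two fields.
hit_json = """
{
  "_index": "orgs-index",
  "_type": "_doc",
  "_id": "1263",
  "_score": 1.0,
  "_source": {"id": "252328", "state_province_territory": "WI"}
}
"""


def split_hit(hit: dict) -> tuple:
    """Return (metadata, source): underscore-prefixed fields vs. the stored document."""
    metadata = {k: v for k, v in hit.items() if k.startswith("_") and k != "_source"}
    return metadata, hit["_source"]


meta, source = split_hit(json.loads(hit_json))
print(meta["_id"], source["id"])  # 1263 252328
```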

## create index

In [None]:
# create an index named "trials" with 1 primary shard and 1 replica
PUT /trials
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
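The same request can be built from Python with only the standard library. The sketch below makes some assumptions: the cluster is at http://localhost:9200 without authentication (ES_URL), and create_index_request is a hypothetical helper name. It only builds the request and does not send it.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no authentication


def create_index_request(index: str, shards: int = 1, replicas: int = 1) -> urllib.request.Request:
    """Build (but do not send) the equivalent of PUT /<index> with shard settings."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return urllib.request.Request(
        f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("trials")
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/trials
# urllib.request.urlopen(req) would send it to the cluster.
```

ES never places a replica on the same node as its primary. On a single-node cluster, the replica therefore stays unassigned and the index health is yellow. Setting number_of_replicas to 0 avoids this for local experiments.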


## delete index

In [None]:
DELETE /trials
# response:
# {
#     "acknowledged": true
# }
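The JSON shown above is ES's response, not a request body. The Python sketch below builds the same DELETE request and reads the acknowledged flag from a response body. ES_URL and both function names are assumptions made for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no authentication


def delete_index_request(index: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /<index>; this removes the index and all its documents."""
    return urllib.request.Request(f"{ES_URL}/{index}", method="DELETE")


def is_acknowledged(response_body: bytes) -> bool:
    """True if the response body reports "acknowledged": true."""
    return json.loads(response_body).get("acknowledged") is True


req = delete_index_request("trials")
print(req.get_method(), req.full_url)  # DELETE http://localhost:9200/trials
print(is_acknowledged(b'{"acknowledged": true}'))  # True
```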

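## insert data 
Old pattern:
POST <_index>/<_type>/<_id>
_index: name of the index the document belongs to
_type: the index's mapping type (deprecated)
_id: the document's id

New pattern (typeless, ES 7.0+):
POST <_index>/_create/<_id>   (create only; fails if the _id already exists)
POST <_index>/_doc/<_id>      (create, or replace an existing document)
POST <_index>/_doc            (ES generates the _id)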


1. Older versions identified a document by _index, _type, and _id together, and _type and _id were combined into an internal _uid field. _type was limited to one per index in 6.x, deprecated in 7.x, and removed in 8.0. If no _id is given when indexing, ES generates one automatically.
2. _type was meant to separate data logically. Several kinds of data could share one index, which kept the number of indices low, and a search could be restricted to a single _type.
3. Use _type only if all types have similar mappings, for example parent and child data. Otherwise, use separate indices.
4. Data is actually stored in the index's shards, even though we usually say "insert data into an index".
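Converting the old typed paths to the new typeless ones is mechanical. The small Python sketch below produces the paths listed above. The helper names doc_path and legacy_to_typeless are hypothetical.

```python
from typing import Optional


def doc_path(index: str, doc_id: Optional[str] = None, create_only: bool = False) -> str:
    """Typeless (7.0+) path for indexing a document.

    no id             -> /<index>/_doc          (ES generates the _id)
    id                -> /<index>/_doc/<id>     (create, or replace if it exists)
    id + create_only  -> /<index>/_create/<id>  (rejected with 409 if it exists)
    """
    if doc_id is None:
        if create_only:
            raise ValueError("_create needs an explicit _id")
        return f"/{index}/_doc"
    endpoint = "_create" if create_only else "_doc"
    return f"/{index}/{endpoint}/{doc_id}"


def legacy_to_typeless(path: str) -> str:
    """Rewrite an old /<index>/<type>/<id> path; the type name is simply dropped."""
    index, _type, doc_id = path.strip("/").split("/")
    return doc_path(index, doc_id)


print(legacy_to_typeless("/trials/trial/NCI-8765-00001"))  # /trials/_doc/NCI-8765-00001
```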

In [None]:
# old pattern with _type "trial" (deprecated in 7.x, removed in 8.0)
POST /trials/trial/NCI-8765-00001
{
  "nci_id": "NCI-8765-00001",
  "diseases": [],
  "is_lead_disease": true,
  "organizations":{
    "name": "cancer center",
    "id": "0244"
  }
}


In [None]:
# new pattern: no _type; the _doc endpoint is used instead
# no _id is given, so ES generates one automatically
POST /trials/_doc
{
  "nci_id": "NCI-8765-00001",
  "diseases": [],
  "is_lead_disease": true,
  "organizations":{
    "name": "cancer center",
    "id": "0244"
  }
}


#_id is NCI-8765-00001
POST /trials/_doc/NCI-8765-00001
{
  "nci_id": "NCI-8765-00001",
  "diseases": [],
  "is_lead_disease": true,
  "organizations":{
    "name": "cancer center",
    "id": "0244"
  }
}
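The two requests above can be sketched in Python as follows. This assumes a local cluster at ES_URL, and index_doc_request is a hypothetical helper name. The document id is percent-encoded so that it always stays a single path segment.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

ES_URL = "http://localhost:9200"  # assumption: local cluster, no authentication


def index_doc_request(index: str, doc: dict, doc_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) POST /<index>/_doc[/<id>] with the document as the JSON body."""
    path = f"/{index}/_doc"
    if doc_id is not None:
        # encode characters such as "/" so the id stays one path segment
        path += "/" + urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


trial = {
    "nci_id": "NCI-8765-00001",
    "diseases": [],
    "is_lead_disease": True,
    "organizations": {"name": "cancer center", "id": "0244"},
}
auto_id = index_doc_request("trials", trial)
explicit_id = index_doc_request("trials", trial, doc_id=trial["nci_id"])
print(auto_id.full_url)      # http://localhost:9200/trials/_doc
print(explicit_id.full_url)  # http://localhost:9200/trials/_doc/NCI-8765-00001
```

Sending the auto-id request twice stores two separate documents, each with its own generated _id. Sending the explicit-id request twice replaces the same document.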


## update data
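Documents in ES are immutable, so they cannot be edited in place. An update replaces the old document with a new one. Even a partial update through _update rebuilds and reindexes the whole document internally.
Each successful update increases _version by 1.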

In [None]:
# full update: PUT or POST with an existing _id replaces the whole document
PUT /trials/_doc/NCI-8765-00001
{
  "nci_id": "NCI-8765-00001",
  "diseases": [],
  "is_lead_disease": true,
  "organizations":{
    "name": "cancer center",
    "id": "0244"
  },
  "create_date": "20334"
}

POST /trials/_doc/NCI-8765-00001
{
  "nci_id": "NCI-8765-00001",
  "diseases": [],
  "is_lead_disease": true,
  "organizations":{
    "name": "cancer center",
    "id": "0244"
  },
  "create_date": "20334"
}
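To make the replace-on-write behaviour concrete, here is a toy in-memory Python model. It is not Elasticsearch and not how ES stores data. It only mimics the "created"/"updated" results and the _version counter described in this section. The ToyIndex class is hypothetical.

```python
class ToyIndex:
    """In-memory toy model, NOT Elasticsearch: indexing an existing _id replaces
    the whole document and bumps _version."""

    def __init__(self):
        self._docs = {}  # _id -> (version, source)

    def index(self, doc_id: str, source: dict) -> dict:
        old = self._docs.get(doc_id)
        version = 1 if old is None else old[0] + 1
        self._docs[doc_id] = (version, dict(source))  # old source is discarded, not merged
        return {"_id": doc_id, "_version": version,
                "result": "created" if old is None else "updated"}

    def get(self, doc_id: str) -> dict:
        return self._docs[doc_id][1]


idx = ToyIndex()
r1 = idx.index("NCI-8765-00001", {"nci_id": "NCI-8765-00001", "create_date": "20334"})
r2 = idx.index("NCI-8765-00001", {"nci_id": "NCI-8765-00001"})
print(r1["result"], r1["_version"])  # created 1
print(r2["result"], r2["_version"])  # updated 2
print(idx.get("NCI-8765-00001"))     # {'nci_id': 'NCI-8765-00001'}  (create_date is gone)
```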

In [None]:
# partial update: fields in "doc" are merged into the existing document
POST /trials/_update/NCI-8765-00001
{
  "doc":{
      "biomarkers":[
        {
          "evs_id": "C4002",
          "name": "neoplasm"
        }
      ]
  }
}
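Below is a Python sketch of the merge rule as we understand it from ES's update API. Nested objects are merged key by key, and any other value, arrays included, is replaced. merge_partial is a hypothetical name, and edge cases should be checked against your ES version.

```python
import copy


def merge_partial(source: dict, partial: dict) -> dict:
    """Sketch of combining a partial "doc" with the stored _source:
    nested objects are merged key by key; other values, arrays included, are replaced."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


stored = {
    "nci_id": "NCI-8765-00001",
    "diseases": [],
    "is_lead_disease": True,
    "organizations": {"name": "cancer center", "id": "0244"},
}
partial = {"biomarkers": [{"evs_id": "C4002", "name": "neoplasm"}]}
updated = merge_partial(stored, partial)
print(sorted(updated))
# ['biomarkers', 'diseases', 'is_lead_disease', 'nci_id', 'organizations']
```

If the merged result is identical to the stored document, ES by default skips the write and reports "result": "noop" without increasing _version. This behaviour is controlled by the detect_noop option.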



## delete data
When deletion succeeds, "result" in the response is "deleted". The document is only marked as deleted at first. ES removes it physically later, when the segments holding it are merged in the background.

In [None]:
DELETE /trials/_doc/NCI-8765-00002
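A Python sketch of the same call, plus a check of the "result" field. ES_URL and the function names are assumptions for illustration, and the response bodies in the usage are hand-written samples, not captured output.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no authentication


def delete_doc_request(index: str, doc_id: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /<index>/_doc/<id>."""
    return urllib.request.Request(f"{ES_URL}/{index}/_doc/{doc_id}", method="DELETE")


def delete_result(response_body: bytes) -> str:
    """Return "result": "deleted" on success, "not_found" if there was no such document."""
    return json.loads(response_body)["result"]


req = delete_doc_request("trials", "NCI-8765-00002")
print(req.get_method(), req.full_url)
# DELETE http://localhost:9200/trials/_doc/NCI-8765-00002
```

A missing document comes back with HTTP 404. urllib.request.urlopen raises this as urllib.error.HTTPError, and the JSON body can still be read with err.read().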

## get data
GET <_index>/_doc/<_id>   (old pattern: GET <_index>/<_type>/<_id>)

In [None]:
# get the document whose _id is NCI-8765-00001
GET /trials/_doc/NCI-8765-00001
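A get response carries a "found" flag next to the metadata, and _source is present only when the document exists. The small Python sketch below reads it. source_or_none is a hypothetical helper, and the two response bodies are hand-written samples (NCI-0000 is a made-up id).

```python
import json
from typing import Optional


def source_or_none(response_body: bytes) -> Optional[dict]:
    """Return _source if the get response says "found": true, else None."""
    doc = json.loads(response_body)
    return doc["_source"] if doc.get("found") else None


hit = b'{"_index": "trials", "_id": "NCI-8765-00001", "_version": 1, "found": true, "_source": {"nci_id": "NCI-8765-00001"}}'
miss = b'{"_index": "trials", "_id": "NCI-0000", "found": false}'
print(source_or_none(hit))   # {'nci_id': 'NCI-8765-00001'}
print(source_or_none(miss))  # None
```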

## get all data
GET /<_index>/_search
Returns 10 hits by default; set size (and from) in the request body to change this.

In [None]:
#get all data
GET /trials/_search
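To get more than the first 10 hits, page through the results with from (hits to skip) and size (hits per page). This is a Python sketch of the request body. search_body and its 0-based page parameter are hypothetical conventions.

```python
from typing import Optional


def search_body(page: int = 0, per_page: int = 10, query: Optional[dict] = None) -> dict:
    """Body for GET /<index>/_search returning one page of hits (page is 0-based)."""
    return {
        "from": page * per_page,   # number of hits to skip
        "size": per_page,          # number of hits to return (ES default: 10)
        "query": query if query is not None else {"match_all": {}},
    }


print(search_body(page=2))
# {'from': 20, 'size': 10, 'query': {'match_all': {}}}
```

from + size cannot exceed the index.max_result_window setting (10,000 by default). For deeper paging, ES provides search_after.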

In [None]:
# search with a condition: create_date must match "20334"
GET /trials/_search
{
  "query":{
    "match":{
      "create_date": "20334"
    }
  }
}
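To combine several conditions, the usual approach is a bool query with one clause per condition. The Python sketch below builds such a body. all_of is a hypothetical helper name, and the is_lead_disease condition is only an illustration.

```python
import json


def all_of(**conditions) -> dict:
    """Search body whose hits must match every field=value pair (bool query, must clauses)."""
    return {"query": {"bool": {"must": [{"match": {field: value}} for field, value in conditions.items()]}}}


body = all_of(create_date="20334", is_lead_disease=True)
print(json.dumps(body))
```

When relevance scoring is not needed, the same clauses can go under "filter" instead of "must".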


## check if data exists
check if a document exists:
Send a HEAD request and check the status code; the response has no body.
200 - OK: the document exists
404: the document does not exist, e.g. {"statusCode":404,"error":"Not Found","message":"404 - Not Found"}

In [None]:
# check whether the document exists
HEAD /trials/_doc/NCI-8765-00001
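The same check can be sketched in Python. It assumes a local cluster at ES_URL, and exists_request and doc_exists are hypothetical helper names. urlopen raises HTTPError for 404, so a missing document is handled in the except branch.

```python
import urllib.error
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no authentication


def exists_request(index: str, doc_id: str) -> urllib.request.Request:
    """Build (but do not send) HEAD /<index>/_doc/<id>."""
    return urllib.request.Request(f"{ES_URL}/{index}/_doc/{doc_id}", method="HEAD")


def doc_exists(index: str, doc_id: str) -> bool:
    """Send the HEAD request: 200 -> True, 404 -> False, any other error is re-raised."""
    try:
        with urllib.request.urlopen(exists_request(index, doc_id)) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


req = exists_request("trials", "NCI-8765-00001")
print(req.get_method(), req.full_url)
# doc_exists("trials", "NCI-8765-00001") needs a running cluster.
```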