## Exercise 2. Update Metadata on a Dataset
Let's use the [Metadata API](https://socratametadataapi.docs.apiary.io/#) to add more information to our dataset.

## Import Libraries

In [129]:
import json
import os
import pandas as pd
import requests

## Setup Authentication
- Enter either your Socrata username and password, or an API key ID and key secret in their place
- Enter the domain of the dataset, on which you have publisher or admin access
- Enter the dataset's unique ID

Let's update the dataset description for https://alicia.data.socrata.com/dataset/Arizona-Places-Median-Household-Income/9abs-ubh5 to something more meaningful.

In [130]:
user_name = os.environ['MY_SOCRATA_USERNAME']
password = os.environ['MY_SOCRATA_PASSWORD']
domain = 'alicia.data.socrata.com'
dataset_id = '9abs-ubh5'

# URL to metadata for any asset
meta_url = 'https://' + domain + '/api/views/metadata/v1/' + dataset_id

dataset_description = 'Median Household Income - American Community Survey 5 Year Estimates for all Arizona Places from 2011-2017'
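The setup notes mention API keys as an alternative to a username and password. A minimal sketch of how that choice could be made in one place is shown below. The function name `get_auth` and the environment variable names `MY_SOCRATA_KEY_ID` and `MY_SOCRATA_KEY_SECRET` are hypothetical and not part of this notebook; only `MY_SOCRATA_USERNAME` and `MY_SOCRATA_PASSWORD` come from the cell above.

```python
import os


def get_auth(env=os.environ):
    """Return a (user, secret) tuple suitable for requests' `auth=` argument.

    Prefers an API key pair when both parts are set, otherwise falls back
    to the username and password used elsewhere in this notebook.
    MY_SOCRATA_KEY_ID and MY_SOCRATA_KEY_SECRET are hypothetical names.
    """
    key_id = env.get('MY_SOCRATA_KEY_ID')
    key_secret = env.get('MY_SOCRATA_KEY_SECRET')
    if key_id and key_secret:
        return (key_id, key_secret)
    try:
        return (env['MY_SOCRATA_USERNAME'], env['MY_SOCRATA_PASSWORD'])
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError
        raise RuntimeError(f'Missing credential environment variable: {missing}') from None


# Example usage (not executed here because it depends on your environment):
# auth = get_auth()
# requests.get(meta_url, auth=auth)
```

Keeping the lookup in a single function means the later request cells would not need to change if you switch from a password to an API key.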

## Make HTTP request for existing metadata
- Include authentication, because the dataset is private by default

In [131]:
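# fetch the current metadata record for the dataset
req = requests.get(meta_url, auth=(user_name, password))

# the response body is a JSON document; print it as returned
meta = req.text
print(meta)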

{
  "id" : "9abs-ubh5",
  "name" : "Arizona Places Median Household Income",
  "attribution" : "Census",
  "attributionLink" : "https://www.census.gov/programs-surveys/acs/",
  "category" : "Demographics",
  "createdAt" : "2019-02-20T00:22:21+0000",
  "dataUpdatedAt" : "2019-02-20T00:22:38+0000",
  "dataUri" : "https://alicia.data.socrata.com/resource/9abs-ubh5",
  "description" : "Median Household Income - American Community Survey 5 Year Estimates for all Arizona Places from 2011-2017",
  "domain" : "alicia.data.socrata.com",
  "externalId" : null,
  "hideFromCatalog" : false,
  "hideFromDataJson" : false,
  "license" : null,
  "metadataUpdatedAt" : "2019-02-20T02:59:03+0000",
  "provenance" : "OFFICIAL",
  "updatedAt" : "2019-02-20T02:59:03+0000",
  "webUri" : "https://alicia.data.socrata.com/d/9abs-ubh5",
  "customFields" : null,
  "tags" : null
}
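The response is plain JSON text, so it can be parsed to pull out just the properties you plan to change. The following is a small sketch, assuming a response shaped like the one printed above; `summarize_metadata` is a hypothetical helper, and the sample string is a shortened copy of that output rather than a live API call.

```python
import json


def summarize_metadata(meta_text, fields=('name', 'category', 'description', 'tags')):
    """Parse a metadata response and keep only the listed fields.

    Fields missing from the response are reported as None.
    """
    record = json.loads(meta_text)
    return {field: record.get(field) for field in fields}


# Shortened copy of the response printed above, used instead of a live request
sample = (
    '{"id": "9abs-ubh5", '
    '"name": "Arizona Places Median Household Income", '
    '"category": "Demographics", '
    '"tags": null}'
)
print(summarize_metadata(sample))
```

In the notebook itself, the same helper could be called as `summarize_metadata(meta)` on the text returned by the request.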



## Make an HTTP PATCH request to update the dataset description
- Create the JSON payload and encode it as a string

In [132]:
# update the description with the PATCH method, which changes only the properties included in the payload
payload = {"description": dataset_description}
json_data = json.dumps(payload)

# send the encoded payload as the request body
req_update = requests.patch(meta_url, data=json_data, auth=(user_name, password))
meta_new = req_update.text
print(meta_new)

{
  "action" : "modify",
  "metadata" : {
    "id" : "9abs-ubh5",
    "name" : "Arizona Places Median Household Income",
    "attribution" : "Census",
    "attributionLink" : "https://www.census.gov/programs-surveys/acs/",
    "category" : "Demographics",
    "createdAt" : "2019-02-20T00:22:21+0000",
    "dataUpdatedAt" : "2019-02-20T00:22:38+0000",
    "dataUri" : "https://alicia.data.socrata.com/resource/9abs-ubh5",
    "description" : "Median Household Income - American Community Survey 5 Year Estimates for all Arizona Places from 2011-2017",
    "domain" : "alicia.data.socrata.com",
    "externalId" : null,
    "hideFromCatalog" : false,
    "hideFromDataJson" : false,
    "license" : null,
    "metadataUpdatedAt" : "2019-02-20T02:59:15+0000",
    "provenance" : "OFFICIAL",
    "updatedAt" : "2019-02-20T02:59:15+0000",
    "webUri" : "https://alicia.data.socrata.com/d/9abs-ubh5",
    "customFields" : null,
    "tags" : null
  }
}
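The PATCH response wraps the updated record in a `metadata` object, so one way to confirm that an update took effect is to compare it with the payload that was sent. This is a sketch under that assumption; `unapplied_fields` is a hypothetical helper, and the sample response is a trimmed stand-in for the output above.

```python
import json


def unapplied_fields(response_text, expected):
    """Return the fields whose value in the response differs from what was sent.

    The result maps each mismatched field to the value the server reported.
    An empty dict means every field in `expected` matches the response.
    """
    body = json.loads(response_text)
    current = body.get('metadata', {})
    return {
        key: current.get(key)
        for key, value in expected.items()
        if current.get(key) != value
    }


# Trimmed stand-in for a PATCH response; not a live API call
sample_response = '{"action": "modify", "metadata": {"id": "9abs-ubh5", "description": "new text"}}'
print(unapplied_fields(sample_response, {'description': 'new text'}))
```

In the notebook, this could be called as `unapplied_fields(meta_new, payload)` right after the request.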



## Update multiple metadata properties by iterating through a file
- `../data/dataset_metadata.xlsx` contains the dataset ID and other metadata properties for each dataset, in a layout suitable for bulk updates
- Replace NaNs with empty strings

Let's update https://alicia.data.socrata.com/dataset/Arizona-Places-Median-Household-Income/9abs-ubh5 with many properties.

In [133]:
metadata = pd.read_excel('../data/dataset_metadata.xlsx')
metadata = metadata.fillna('')
metadata.head()

| Unnamed: 0 | dataset_id | name | category | description | attribution | attributionLink | tags |
|---|---|---|---|---|---|---|---|
| 0 | 9abs-ubh5 | Arizona Places Median Household Income | Demographics | Median Household Income - American Community S... | Census | https://www.census.gov/programs-surveys/acs/ | |
| 1 | kyb6-5pf6 | Arizona Places Median Household Income | Business | Median Household Income - American Community S... | Census | https://www.census.gov/programs-surveys/acs/ | income |
| 2 | 7erx-nkcf | Arizona Places Median Household Income | Demographics | Median Household Income - American Community S... | | | income, household income, prosperity |
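In rows like these, the `tags` cell holds comma-separated keywords and the `attributionLink` cell may be blank. It can help to build the payload for one row in isolation and inspect it before sending any request. The sketch below uses a hypothetical helper, `build_payload`, and a plain dict standing in for a DataFrame row; the sample values are copied from the third row of the table, with the description left truncated as displayed.

```python
def build_payload(row):
    """Build a metadata payload from one row of the spreadsheet.

    `row` can be a dict or a pandas Series; blank cells are expected
    to be empty strings (as after `fillna('')`).
    """
    payload = {
        'name': row['name'],
        'description': row['description'],
        'category': row['category'],
        # Split on commas and trim whitespace; a blank cell gives an empty list
        'tags': [tag.strip() for tag in row['tags'].split(',') if tag.strip()],
    }
    # Leave the link out entirely when the cell is blank
    if row['attributionLink']:
        payload['attributionLink'] = row['attributionLink']
    return payload


# Stand-in for the third row of the table above
row = {
    'dataset_id': '7erx-nkcf',
    'name': 'Arizona Places Median Household Income',
    'category': 'Demographics',
    'description': 'Median Household Income - American Community S...',
    'attributionLink': '',
    'tags': 'income, household income, prosperity',
}
print(build_payload(row))
```

Trimming each keyword matters here because splitting `'income, household income, prosperity'` on commas alone would leave a leading space on the second and third tags.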


In [134]:
for idx, row in metadata.iterrows():
    dataset_id = row['dataset_id']
    meta_url = 'https://' + domain + '/api/views/metadata/v1/' + dataset_id

    # properties to update for this dataset
    payload = {
        'name': row['name'],
        'description': row['description'],
        'category': row['category'],
    }

    # if there are multiple keywords, split them by comma and trim whitespace;
    # a blank cell becomes an empty list
    payload['tags'] = [tag.strip() for tag in row['tags'].split(',') if tag.strip()]

    # empty links return a validation error, so send the link only when present
    if row['attributionLink'] != '':
        payload['attributionLink'] = row['attributionLink']

    # encode json
    json_data = json.dumps(payload)
    # print(json_data)

    req_update = requests.patch(meta_url, data=json_data, auth=(user_name, password))
    meta_new = req_update.text
    print(meta_new)

{
  "action" : "modify",
  "metadata" : {
    "id" : "9abs-ubh5",
    "name" : "Arizona Places Median Household Income",
    "attribution" : "Census",
    "attributionLink" : "https://www.census.gov/programs-surveys/acs/",
    "category" : "Demographics",
    "createdAt" : "2019-02-20T00:22:21+0000",
    "dataUpdatedAt" : "2019-02-20T00:22:38+0000",
    "dataUri" : "https://alicia.data.socrata.com/resource/9abs-ubh5",
    "description" : "Median Household Income - American Community Survey 5 Year Estimates for all Arizona Places from 2011-2017",
    "domain" : "alicia.data.socrata.com",
    "externalId" : null,
    "hideFromCatalog" : false,
    "hideFromDataJson" : false,
    "license" : null,
    "metadataUpdatedAt" : "2019-02-20T02:59:15+0000",
    "provenance" : "OFFICIAL",
    "updatedAt" : "2019-02-20T02:59:15+0000",
    "webUri" : "https://alicia.data.socrata.com/d/9abs-ubh5",
    "customFields" : null,
    "tags" : null
  }
}

{
  "action" : "modify",
  "metadata" : {
    "id" 