# Importing data from a hosted metadata table into Contentful

## Table of contents
1. [First steps](#first)
    1. [Import packages](#packages)
    2. [Get credentials](#credentials)
    3. [Connect to APIs](#apis)
    4. [Utils](#utils)
2. [Export data from AGOL to Contentful](#export)
    1. [Create content model for metadata in Contentful](#model)
    2. [Get hosted table as dataframe](#hosted)
    3. [Send dataframe information to Contentful content model](#content)
3. [Updating Contentful metadata](#update)
4. [Delete all entries and content type](#delete)

---
<a id='first'></a>
## First steps

<a id='packages'></a>
### Import packages

In [1]:
import contentful
import contentful_management
import pandas as pd
import numpy as np
from numpy import array
import geopandas as gpd
import arcgis
from arcgis.gis import GIS
import json
from arcgis.features import FeatureLayerCollection
import requests  # imported under its own name so it does not shadow the standard-library `re`
from copy import deepcopy

<a id='credentials'></a>
### Get credentials

In [2]:
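env_path = ".env"
env = {}
with open(env_path) as f:
    for line in f:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        # Split on the first "=" only, so values that contain "=" stay intact
        env_key, env_value = line.split("=", 1)
        env[env_key] = env_value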

aol_password = env['ARCGIS_GRETA_PASS']
aol_username = env['ARCGIS_GRETA_USER']

cnt_space = env['contentful_space'] # Space in contentful
cnt_token = env['contentful_token'] # This token is only for read-only purposes, it doesn't allow management
cnt_management = env['contentful_personal_token'] # This is the token needed for management purposes
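A missing key in the `.env` file only surfaces as a bare `KeyError` at the first lookup. The following is a minimal sketch of a check that reports every missing or empty credential at once. The helper name `missing_keys` and the `REQUIRED_KEYS` list are hypothetical, and the example values are made up for illustration.

```python
# Keys this notebook reads from the .env file
REQUIRED_KEYS = [
    'ARCGIS_GRETA_PASS',
    'ARCGIS_GRETA_USER',
    'contentful_space',
    'contentful_token',
    'contentful_personal_token',
]


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


# Hypothetical, incomplete credentials used only to show the check
example_env = {'ARCGIS_GRETA_USER': 'some_user', 'contentful_space': ''}
problems = missing_keys(example_env)
if problems:
    print('Missing credentials: ' + ', '.join(problems))
```

Running the check right after parsing the file would make a misconfigured environment fail early, before any API connection is attempted.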


<a id='apis'></a>
### Connect to APIs

**ESRI**

In [3]:
gis = GIS("https://eowilson.maps.arcgis.com", aol_username, aol_password)

**Contentful**

In [4]:
client = contentful_management.Client(cnt_management) # This allows managing

In [5]:
client2 = contentful.Client(cnt_space, cnt_token) # this only allows queries

<a id='utils'></a>
### Utils

In [6]:
# Convert an existing hosted table into a spatially enabled DataFrame
def getHTfromId(item_id):
    item = gis.content.get(item_id)
    flayer = item.tables[0]
    sdf = flayer.query().sdf
    return sdf

---
<a id='export'></a>
## Export data from AGOL metadata table to Contentful

<a id='model'></a>
### 1. Create content model for metadata in Contentful 
The first step is to create a content model in Contentful with the same fields as the ones we want to import from the table in ArcGIS Online (AGOL). We are calling it `Metadata_prod`; the code below refers to it by its content type ID, `metadataProd`. Make sure that each field's type (number, long text, short text...) matches the type of the corresponding field in the metadata table hosted in AGOL. Note that text fields in AGOL are stored as string fields. Once the content model is created, we can start importing data.
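To make the type matching concrete, here is a minimal sketch that derives Contentful field definitions from Esri-style field descriptions (dictionaries with `name`, `type` and, for strings, `length`). The mapping table, the `SHORT_TEXT_MAX` cutoff and the function name `contentful_fields` are assumptions made for illustration, not part of either API. The sketch also reuses the AGOL column names as field IDs, whereas this notebook's content model uses slightly different IDs for some fields (for example `globalId`).

```python
# Assumed mapping from Esri field types to Contentful field types
ESRI_TO_CONTENTFUL = {
    'esriFieldTypeString': 'Symbol',         # short text; long strings handled below
    'esriFieldTypeSmallInteger': 'Integer',
    'esriFieldTypeInteger': 'Integer',
    'esriFieldTypeOID': 'Integer',
    'esriFieldTypeSingle': 'Number',
    'esriFieldTypeDouble': 'Number',
    'esriFieldTypeDate': 'Date',
    'esriFieldTypeGlobalID': 'Symbol',
}

SHORT_TEXT_MAX = 255  # assumed conservative cutoff between short and long text


def contentful_fields(esri_fields):
    """Build Contentful field definitions from Esri-style field descriptions."""
    fields = []
    for field in esri_fields:
        # Unknown Esri types fall back to short text
        cf_type = ESRI_TO_CONTENTFUL.get(field['type'], 'Symbol')
        is_string = field['type'] == 'esriFieldTypeString'
        if is_string and field.get('length', 0) > SHORT_TEXT_MAX:
            cf_type = 'Text'  # long text
        fields.append({'id': field['name'], 'name': field['name'], 'type': cf_type})
    return fields


# Hypothetical field descriptions, loosely based on the columns of the hosted table
example = [
    {'name': 'layerSlug', 'type': 'esriFieldTypeString', 'length': 100},
    {'name': 'description', 'type': 'esriFieldTypeString', 'length': 5000},
    {'name': 'ObjectId3', 'type': 'esriFieldTypeInteger'},
]
for definition in contentful_fields(example):
    print(definition)
```

A list like this could serve as a checklist when building the model by hand, or as a starting point for creating the content type programmatically; that call is not shown or tested here.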

<a id='hosted'></a>
### 2. Get hosted table as dataframe

In [12]:
metadata = getHTfromId('ef369a73779d4a37b2252808afef98a7') # call table from AGOL using ID
metadata.head()

| | layerSlug | description | source | molLogo | hasAdditionalContent | title | ObjectId | GlobalID | ObjectId2 | ObjectId3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | urban_human_pressures | Shows areas where the land is used by urban ac... | (1) [Ellis, Erle C., et al., 2010](https://onl... | False | False | Urban pressures metadata | | 136e5a63-a0e7-4100-9741-2650b353e36e | 1 | 1 |
| 1 | irrigated_human_pressures | Shows areas where the land is used by irrigate... | (1) [Ellis, Erle C., et al., 2010](https://onl... | False | False | Irrigated agriculture metadata | | 0e9c36e0-fa1d-4f76-9ad2-a2beccdb18f0 | 2 | 2 |
| 2 | rainfed_human_pressures | Shows areas where the land is used by rainfed ... | (1) [Ellis, Erle C., et al., 2010](https://onl... | False | False | Rainfed agriculture metadata | | a7fbc232-6689-4a65-939a-089f7226e527 | 3 | 3 |
| 3 | rangeland_human_pressures | Shows areas where the land is used by rangelan... | (1) [Ellis, Erle C., et al., 2010](https://onl... | False | False | Rangeland metadata | | 84c1634d-9ce7-4e8e-bfbc-f44ce852c4cf | 4 | 4 |
| 4 | merged_land_human_pressures | Shows areas where there is high anthropogenic ... | (1) [Ellis, Erle C., et al., 2010](https://onl... | False | False | Land human pressures metadata | | 4d862350-7f89-4c5c-8b4b-cb1271e66b1e | 5 | 5 |


In [13]:
metadata.shape

(80, 10)

<a id='content'></a>
### 3. Send dataframe information to Contentful content model

In [14]:
metadata.columns

Index(['layerSlug', 'description', 'source', 'molLogo', 'hasAdditionalContent',
       'title', 'ObjectId', 'GlobalID', 'ObjectId2', 'ObjectId3'],
      dtype='object')

In [16]:
# Create one Contentful entry per row of the table and publish it

for index, row in metadata.iterrows():

    entry_attributes = {
        'content_type_id': 'metadataProd',
        'fields': {
            'layerSlug': {
                'en-US': row["layerSlug"]
            },
            'description': {
                'en-US': row['description']
            },
            'source':{
                'en-US': row['source']
            },
            'molLogo':{
                'en-US': row['molLogo']
            },
            'hasAdditionalContent':{
                'en-US': row['hasAdditionalContent']
            },
            'title':{
                'en-US': row['title']
            },
            'globalId':{
                'en-US': row['GlobalID']
            },
            'objectId3':{
                'en-US': row['ObjectId3']
            },
            'language':{
                'en-US': 'en'
            }
            
        }
    }
    
    new_entry = client.entries(cnt_space, 'master').create(
        'metadataProd{0}'.format(index),
        entry_attributes
    )

    new_entry.publish()  # publish the entry; otherwise it stays in the content type as a draft
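The attribute dictionary built inside the loop could be factored into a helper, which also gives a place to handle empty cells. The following is a minimal sketch, assuming that missing values arrive as `None` or `NaN` and that leaving such fields out of the entry is acceptable; `FIELD_MAP`, `clean` and `build_entry_attributes` are hypothetical names. The helper accepts anything that supports `row[column]`, such as a dictionary or a pandas row.

```python
import math

# Contentful field ID -> column name in the hosted table
FIELD_MAP = {
    'layerSlug': 'layerSlug',
    'description': 'description',
    'source': 'source',
    'molLogo': 'molLogo',
    'hasAdditionalContent': 'hasAdditionalContent',
    'title': 'title',
    'globalId': 'GlobalID',
    'objectId3': 'ObjectId3',
}


def clean(value):
    """Return None for missing values (None or NaN), else the value unchanged."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def build_entry_attributes(row, locale='en-US', language='en'):
    """Build the attributes for one metadataProd entry from a table row."""
    fields = {'language': {locale: language}}
    for field_id, column in FIELD_MAP.items():
        value = clean(row[column])
        if value is not None:  # assumption: empty cells are skipped, not sent
            fields[field_id] = {locale: value}
    return {'content_type_id': 'metadataProd', 'fields': fields}


# Hypothetical row with an empty GlobalID, as in the newer rows of the table
example_row = {
    'layerSlug': 'mammals-rarity-1km',
    'description': 'Each cell in this view measures 1 km x 1 km.',
    'source': 'Map of Life and supporting datasets.',
    'molLogo': True,
    'hasAdditionalContent': False,
    'title': 'Mammals rarity metadata',
    'GlobalID': float('nan'),
    'ObjectId3': 82,
}
print(sorted(build_entry_attributes(example_row)['fields']))
```

With such a helper, the same function could serve both the initial import and the later update step instead of repeating the dictionary literal.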
    

---
<a id='update'></a>
## Updating Contentful metadata
In this part of the notebook we identify new rows in the metadata_staging table and export them to Contentful as new entries. This way, we can update Contentful every time new data is added to the hosted table. Bear in mind, though, that this only accounts for new rows/entries. If the content of existing rows is updated in the hosted table, those changes won't be detected or propagated to Contentful automatically; changes to existing entries need to be made manually in Contentful.

Part of this section will be used to create a hosted notebook in AGOL that can run periodically to integrate new changes.

In [12]:
# To test the mismatch check between the hosted table and Contentful, first remove one entry from the Contentful content type
# content_type = client.content_types(cnt_space, 'master').find('metadataProd')
# entry = content_type.entries().find('metadataProd79')
# entry.unpublish()
# entry.delete()

In [7]:
# Check length of metadataProd content type in contentful
len(client2.entries({'content_type': 'metadataProd'}).items)

82

Because we are adding new entries to our metadataProd content type to hold the translations, there will be more entries than rows in the AGOL table: up to the number of original English entries multiplied by the number of languages. To identify new metadata in the AGOL table and add it to Contentful, we look only at the entries written in English in Contentful. If there are fewer of those than rows in the AGOL table, there is new metadata in AGOL that needs to be incorporated (and translated) in Contentful.

In [8]:
# Collect the IDs of the entries that have language "en" in the metadataProd content type
entries = client2.entries({'content_type': 'metadataProd'})
entries_en = [entry.id for entry in entries if entry.language == 'en']
len(entries_en)

80
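The counts above fit in a single response, but the Contentful Content Delivery API returns at most 100 entries per request by default, so the query will start to miss entries once translations push the total past that. The following is a minimal sketch of paging through results with `skip` and `limit`; the helper `fetch_all` and the offline stand-in `fake_page` are hypothetical names, and the fake data exists only so the example runs without credentials.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every item from a paginated source.

    fetch_page(skip, limit) must return a sequence with at most `limit` items.
    """
    items = []
    skip = 0
    while True:
        page = list(fetch_page(skip, page_size))
        items.extend(page)
        if len(page) < page_size:
            break  # a short (or empty) page means nothing is left
        skip += page_size
    return items


# Offline stand-in for the API: 250 fake entry IDs
fake_ids = ['metadataProd{0}'.format(i) for i in range(250)]


def fake_page(skip, limit):
    return fake_ids[skip:skip + limit]


all_ids = fetch_all(fake_page, page_size=100)
print(len(all_ids))  # 250
```

With the real client, `fetch_page` could be something like `lambda skip, limit: client2.entries({'content_type': 'metadataProd', 'skip': skip, 'limit': limit}).items`, though that variant is untested here.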

In [11]:
# Fetch the metadata_staging table and check its number of rows
metadata = getHTfromId('ef369a73779d4a37b2252808afef98a7')
len(metadata)

86

There are more rows in the hosted table than English entries in Contentful, so we need to create new entries for the new metadata.

In [12]:
# Give each row in the hosted table an ID that matches the entry IDs in Contentful
metadata['ID'] = [str(i) for i in range(len(metadata))]
metadata['ID2'] = 'metadataProd' + metadata['ID']
metadata.tail()

| | layerSlug | description | source | molLogo | hasAdditionalContent | title | ObjectId | GlobalID | ObjectId2 | ObjectId3 | ID | ID2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 81 | mammals-rarity-1km | Each cell in this view measures 1 km x 1 km. W... | Map of Life and supporting datasets. | True | False | Mammals rarity metadata | | | 82 | 82 | 81 | metadataProd81 |
| 82 | summer-birds-richness-1km | Breeding ranges of migratory birds. Each cell ... | Map of Life and supporting datasets. | True | False | Summer birds richness metadata | | | 83 | 83 | 82 | metadataProd82 |
| 83 | summer-birds-rarity-1km | Breeding ranges of migratory birds. Each cell ... | Map of Life and supporting datasets. | True | False | Summer birds rarity metadata | | | 84 | 84 | 83 | metadataProd83 |
| 84 | winter-birds-richness-1km | Non-breeding ranges of migratory birds. Each c... | Map of Life and supporting datasets. | True | False | Winter birds richness metadata | | | 85 | 85 | 84 | metadataProd84 |
| 85 | winter-birds-rarity-1km | Non-breeding ranges of migratory birds. Each c... | Map of Life and supporting datasets. | True | False | Winter birds rarity metadata | | | 86 | 86 | 85 | metadataProd85 |


In [13]:
# Create a list with the ID2 values in hosted table
originals = metadata['ID2'].tolist()

In [17]:
originals[0:5] # example of the values of the field ID2 

['metadataProd0',
 'metadataProd1',
 'metadataProd2',
 'metadataProd3',
 'metadataProd4']

In [18]:
entries_en[0:5] # example of the ID values for entries in English

['metadataProd79',
 'metadataProd78',
 'metadataProd77',
 'metadataProd76',
 'metadataProd75']

In [15]:
# Identify which IDs are in the hosted table but not in contentful
main_list = list(set(originals) - set(entries_en))
main_list # these are the rows that are included in hosted table but not in contentful

['metadataProd81',
 'metadataProd84',
 'metadataProd85',
 'metadataProd80',
 'metadataProd82',
 'metadataProd83']
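The set difference returns the missing IDs in arbitrary order, as the output above shows. If the order of the hosted table matters (for logging, say), an order-preserving variant is easy to write. This is a minimal sketch with a hypothetical helper name, `missing_ids`, and made-up example IDs.

```python
def missing_ids(table_ids, contentful_ids):
    """Return the IDs in table_ids that are absent from contentful_ids, in table order."""
    existing = set(contentful_ids)  # set lookup keeps the check fast
    return [entry_id for entry_id in table_ids if entry_id not in existing]


# Made-up IDs for illustration
table_ids = ['metadataProd{0}'.format(i) for i in range(6)]
published = ['metadataProd3', 'metadataProd1', 'metadataProd0']
print(missing_ids(table_ids, published))
# ['metadataProd2', 'metadataProd4', 'metadataProd5']
```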

In [19]:
# Create new dataframe with only the new rows
new_df = metadata[metadata['ID2'].isin(main_list)]
new_df


| | layerSlug | description | source | molLogo | hasAdditionalContent | title | ObjectId | GlobalID | ObjectId2 | ObjectId3 | ID | ID2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 80 | mammals-richness-1km | Each cell in this view measures 1 km x 1 km. W... | Map of Life and supporting datasets. | True | False | Mammals richness metadata | | | 81 | 81 | 80 | metadataProd80 |
| 81 | mammals-rarity-1km | Each cell in this view measures 1 km x 1 km. W... | Map of Life and supporting datasets. | True | False | Mammals rarity metadata | | | 82 | 82 | 81 | metadataProd81 |
| 82 | summer-birds-richness-1km | Breeding ranges of migratory birds. Each cell ... | Map of Life and supporting datasets. | True | False | Summer birds richness metadata | | | 83 | 83 | 82 | metadataProd82 |
| 83 | summer-birds-rarity-1km | Breeding ranges of migratory birds. Each cell ... | Map of Life and supporting datasets. | True | False | Summer birds rarity metadata | | | 84 | 84 | 83 | metadataProd83 |
| 84 | winter-birds-richness-1km | Non-breeding ranges of migratory birds. Each c... | Map of Life and supporting datasets. | True | False | Winter birds richness metadata | | | 85 | 85 | 84 | metadataProd84 |
| 85 | winter-birds-rarity-1km | Non-breeding ranges of migratory birds. Each c... | Map of Life and supporting datasets. | True | False | Winter birds rarity metadata | | | 86 | 86 | 85 | metadataProd85 |


In [21]:
# Export new data in hosted table (new_df) to contentful

if len(metadata) != len(entries_en):
    print("there is new metadata")
    
for index, row in new_df.iterrows():

    entry_attributes = {
        'content_type_id': 'metadataProd',
        'fields': {
            'layerSlug': {
                'en-US': row["layerSlug"]
            },
            'description': {
                'en-US': row['description']
            },
            'source':{
                'en-US': row['source']
            },
            'molLogo':{
                'en-US': row['molLogo']
            },
            'hasAdditionalContent':{
                'en-US': row['hasAdditionalContent']
            },
            'title':{
                'en-US': row['title']
            },
            'globalId':{
                'en-US': row['GlobalID']
            },
            'objectId3':{
                'en-US': row['ObjectId3']
            },
            'language':{
                'en-US': 'en'
            }
            
        }
    }
    
    new_entry = client.entries(cnt_space, 'master').create(
        'metadataProd{0}'.format(index),
        entry_attributes
    )
    new_entry.publish()


there is new metadata


----
<a id='delete'></a>
## Delete all entries and content type
Although you are unlikely to need to remove a content type, doing so requires first unpublishing and deleting all of its entries. This part of the notebook shows how to remove a content type we created for testing purposes, but we discourage running it unless you are sure of what you want to do.

In [6]:
# Fetch the content type to remove, together with its entries.
# The content type ID used here should match the one deleted in the final cell of this section.
content_type = client.content_types(cnt_space, 'master').find('metadataProd')
entries = client.entries(cnt_space, 'master').all()
entries_for_content_type = content_type.entries().all()

In [7]:
# Create an array with all entries
a = np.array(entries_for_content_type.items)

In [8]:
# All entries in the content type must be unpublished (set to "draft") before they can be deleted
for entry in a:
    entry.unpublish()

In [9]:
# Archive and delete all entries in content type
for entry in a:
    entry.archive()
    entry.delete()
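Since these steps are destructive, the two loops above could be wrapped in a helper that defaults to a dry run. The following is a minimal sketch with hypothetical names (`purge_entries`, `FakeEntry`). It assumes each entry exposes an `is_published` attribute, and it skips the archive step on the assumption that an unpublished entry can be deleted directly; neither assumption has been checked against the management client here.

```python
def purge_entries(entries, dry_run=True):
    """Unpublish (if needed) and delete entries; return the IDs handled."""
    handled = []
    for entry in entries:
        handled.append(entry.id)
        if dry_run:
            continue  # only report what would be removed
        if entry.is_published:  # assumed attribute, see the note above
            entry.unpublish()
        entry.delete()
    return handled


class FakeEntry:
    """Offline stand-in for a Contentful entry, for illustration only."""

    def __init__(self, entry_id, is_published):
        self.id = entry_id
        self.is_published = is_published
        self.calls = []

    def unpublish(self):
        self.calls.append('unpublish')
        self.is_published = False

    def delete(self):
        self.calls.append('delete')


fakes = [FakeEntry('metadataTest0', True), FakeEntry('metadataTest1', False)]
print(purge_entries(fakes))                 # dry run: no entry method is called
print(purge_entries(fakes, dry_run=False))  # actually unpublishes and deletes
print([fake.calls for fake in fakes])
```

Reviewing the dry-run output before passing `dry_run=False` gives one last chance to confirm that the right content type is being emptied.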

In [72]:
# Unpublish and delete content type (content type needs to be empty to delete it)
content_type = client.content_types(cnt_space, 'master').find('metadataTest')
content_type.unpublish()
content_type.delete()

<Response [204]>