<a href="https://colab.research.google.com/github/datakind/hxl-metadata-prediction/blob/main/generate-test-train-data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This notebook downloads and prepares training data for predicing Humanitarian Exchange Language (HXL) metadata tags on humanitarian datasets on the Humanitarian Data Exchange (HDX) platform.

This notebook downloads data provided by the HDX team from a google drive folder. The data was captured using an [HXL crawl process](https://github.com/HXLStandard/hdx-hashtag-crawler). It also downloads the [HXL core schema](https://data.humdata.org/dataset/hxl-core-schemas) which defines supported tags and attributes. Finally, the data is split into test and training, being careful to ensure data produced by similar processes and organizations is not present in both test and train.  



# Setup

See [README](README.md) from instructions.

In [None]:
#!pip install gdown==5.2.0
#!pip install pandas==2.2.2
#!pip install hdx-python-api==6.3.1
#!pip install openai==1.35.3

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [18]:
import sys
import os
import requests
import gdown
import tarfile
import pandas as pd
import re
from sklearn.model_selection import train_test_split

from hdx.utilities.easy_logging import setup_logging
from hdx.api.configuration import Configuration
from hdx.data.dataset import Dataset
import hxl
import json
import time
from openai import OpenAI
import numpy as np

if os.getenv("OPENAI_API_KEY") is None:
  from google.colab import userdata
  OPENAI_API_KEY =  userdata.get('OPENAI_API_KEY')
else:
  OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

client = OpenAI(
    api_key=OPENAI_API_KEY
)

# If using Colab, this is where Google drive gets mounted. Otherwise leave blank
GOOGLE_BASE_DIR = "/content/drive/MyDrive/Colab"

# Where to save local data files
LOCAL_DATA_DIR = f"{GOOGLE_BASE_DIR}/hxl-metadata-prediction/data/"

# Google drive location of HDXHashtag crawler data. Shared with HDX team
HXL_CRAWLER_DATA_GDRIVE="https://drive.google.com/file/d/1BDCuh0WVJWK1-1RMC-77cvh4H2Hep_ry/export?format=xlsx"
HXL_CRAWLER_DATA_FILE= LOCAL_DATA_DIR + "/hdx-hxl-output.tgz"

# This is the HXL schema sheet, search HDX to get this link
HXL_SCHEMA_RESOURCE_URL = "https://docs.google.com/spreadsheets/d/1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI/export?format=xlsx"
HXL_SCHEMA_LOCAL_FILE = LOCAL_DATA_DIR + "/hxl-core-schema.xlsx"

# Number of records in data excerpts
DATA_EXCERPT_SIZE = 10

# Data Summary LLM, used to summarize tabular data for use in prompts
#DATA_SUMMARY_LLM = "gpt-4o-mini"
DATA_SUMMARY_LLM = "gpt-3.5-turbo"

# Set this to a name of your choosing
HDX_USER_AGENT = "hxl-metadata-prediction"

input_options = hxl.input.InputOptions(http_headers={'User-Agent': HDX_USER_AGENT})

pd.set_option('display.max_colwidth', 200)
pd.set_option('display.max_rows', 200)

## Get HDX connection

We need a connection to HDX to extract excerpts of data for prompts.

In [13]:
def setup_hdx_connection(agent_name):
    try:
        Configuration.create(hdx_site="prod", user_agent=agent_name, hdx_read_only=True)
    except:
        print("Configuration already created, continuing ...")

# Note, if you run this twice you will get a 'Configuration already exists!' error, but it can be ignored
setup_hdx_connection(HDX_USER_AGENT)

Configuration already created, continuing ...


## Download HXL Core schema

We will download the original google sheet defining the HXL core schema, as found on [HDX](https://data.humdata.org/dataset/hxl-core-schemas).





In [19]:
gdown.download(HXL_SCHEMA_RESOURCE_URL, HXL_SCHEMA_LOCAL_FILE, quiet=False, fuzzy=True)

response = requests.get(HXL_SCHEMA_RESOURCE_URL)
with open(HXL_SCHEMA_LOCAL_FILE, 'wb') as f:
    f.write(response.content)

df= pd.read_excel(HXL_SCHEMA_LOCAL_FILE, sheet_name='Core hashtags')
hashtags_list = df['Hashtag'][1:].tolist()

df= pd.read_excel(HXL_SCHEMA_LOCAL_FILE, sheet_name='Core attributes')
attributes_list = df['Attribute'][1:].tolist()

# Remove rows with disallowed tags or attributes
APPROVED_HXL_SCHEMA = hashtags_list + attributes_list

print("Approved HXL schema ...")
print(APPROVED_HXL_SCHEMA)

Downloading...
From: https://docs.google.com/spreadsheets/d/1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI/export?format=xlsx
To: /content/drive/MyDrive/Colab/hxl-metadata-prediction/data/hxl-core-schema.xlsx
240kB [00:00, 581kB/s]


Approved HXL schema ...
['#access', '#activity', '#adm1', '#adm2', '#adm3', '#adm4', '#adm5', '#affected', '#beneficiary', '#capacity', '#cause', '#channel', '#contact', '#country', '#crisis', '#currency', '#date', '#delivery', '#description', '#event', '#frequency', '#geo', '#group', '#impact', '#indicator', '#inneed', '#item', '#loc', '#meta', '#modality', '#need', '#operations', '#org', '#output', '#population', '#reached', '#region', '#respondee', '#sector', '#service', '#severity', '#status', '#subsector', '#targeted', '#value', '+abducted', '+acronym', '+activity', '+adolescents', '+adults', '+approved', '+ar', '+bounds', '+budget', '+canceled', '+children', '+cluster', '+code', '+converted', '+coord', '+dest', '+displaced', '+elderly', '+elevation', '+email', '+en', '+end', '+es', '+f', '+fa', '+fr', '+funder', '+hh', '+i', '+id', '+idps', '+impl', '+incamp', '+ind', '+infants', '+infected', '+injured', '+killed', '+label', '+lat', '+lon', '+m', '+ms', '+name', '+noncamp', '+num

# Analysis

## Download HXL crawler data

This data was generated using the [HDX Hashtag Crawler](https://github.com/dividor/hdx-hashtag-crawler) over several days and saved to Google drive.

In [21]:
gdown.download(HXL_CRAWLER_DATA_GDRIVE, HXL_CRAWLER_DATA_FILE, quiet=False, fuzzy=True)
print(f"HXL core schema saved to {HXL_CRAWLER_DATA_FILE}")

Downloading...
From: https://drive.google.com/file/d/1BDCuh0WVJWK1-1RMC-77cvh4H2Hep_ry/export?format=xlsx
To: /content/drive/MyDrive/Colab/hxl-metadata-prediction/data/hdx-hxl-output.tgz
3.06kB [00:00, 3.60MB/s]

HXL core schema saved to /content/drive/MyDrive/Colab/hxl-metadata-prediction/data//hdx-hxl-output.tgz





## Identify Unique combinations of HXL tags we want for training data

The HXL crawler and reports data create a hash based on column names, referred to as a 'Resource pattern' by the HDX team. This is useful because a lot of HDX resources (tables) are generated by automatic pipeline, so rather than duplicating very similar data in our training set, we will focus on unique combinations of column headers. This is an indirect way of balancing the data.

In [29]:
df = pd.read_csv(LOCAL_DATA_DIR + "/output/hdx-expanded-hashed-stats.csv")

# We'll keep one row per column, Hashtag with Attributes has what we require
df.drop(columns=['Attribute', 'Hashtag'], inplace=True)
df.drop_duplicates(inplace=True)

# Remove HXL tags row in the metadata (we keep them for actual data)
df = df[1:]

print("Number of rows in data ...")
print(df.shape)

print("Unique data providers ...")
print(len(df["Data provider"].unique()))

print("Unique HDX resource ids ...")
print(len(df["HDX resource id"].unique()))

display(df.head())

  df = pd.read_csv(LOCAL_DATA_DIR + "/output/hdx-expanded-hashed-stats.csv")


Number of rows in data ...
(487297, 9)
Unique data providers ...
120
Unique HDX resource ids ...
43074


Unnamed: 0,Hashtag with Attributes,Text header,Locations,Data provider,HDX dataset id,HDX resource id,Date created,Unnamed: 9,Hash
1,#affected+hh,Total IDP HH,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,True,0x2cc7fd3129c0d18c
2,#affected+idp+ind,Total IDP IND,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,True,0x2cc7fd3129c0d18c
4,#affected+idp+male,Total IDP Male Ind,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,True,0x2cc7fd3129c0d18c
6,#affected+female+idp,Total IDP Female Ind,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,True,0x2cc7fd3129c0d18c
8,#affected+ind+returnees,Total Returnees,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,True,0x2cc7fd3129c0d18c


Let's use the column hash created by the crawler to find unique combinations of tags

In [30]:
hash_count = df.groupby('Hash').size()
display(hash_count)


Hash
0x100556db35012c6b       101
0x102125f4dd16c64     190286
0x10309a2e5e2722ba       509
0x105b36aac3c9192f       693
0x105c6ee3379af31c       595
                       ...  
0xf0a7e4d9104f069         12
0xf93f0051e52a4d          28
0xfa11ad9f842a37d         17
0xfe0e278de9d0a33          1
0xfe8777dcd878424         20
Length: 644, dtype: int64

In [31]:
hash_resources = df.groupby('Hash')['HDX resource id'].nunique().sort_values(ascending=False)

for col in ['HDX resource id', 'HDX dataset id', 'Data provider']:
    hash_resources  = hash_resources .reset_index()
    hash_resources [f"Unique {col}"] = hash_resources ['Hash'].map(df.groupby('Hash')[col].unique())

display(hash_resources )

hash_resources.to_excel(f"{LOCAL_DATA_DIR}/hxl_hash_resources.xlsx", index=False)

Unnamed: 0,level_0,index,Hash,HDX resource id,Unique HDX resource id,Unique HDX dataset id,Unique Data provider
0,0,0,0x102125f4dd16c64,13858,"[51b2e4ec-aca5-4b97-bbb7-c005175b682e, a6ef8040-3b15-47ae-9973-1dbc113673cf, f579cf0e-5535-4414-897f-2f8c05105180, 66c62464-017b-4aa6-845f-9ec2487acb82, 91e1cb98-353b-487e-a14d-b0eea783da6f, 65f38...","[who-data-for-south-sudan, who-data-for-montenegro, who-data-for-zimbabwe, who-data-for-zambia, who-data-for-yemen, who-data-for-viet-nam, who-data-for-venezuela-bolivarian-republic-of, who-data-f...",[world-health-organization]
1,1,1,0x428a8e37940223d8,9328,"[2e130cdf-c850-4533-b2f3-e961adbec48a, 9e160d82-691d-49a6-979b-0ff0dbb6b7a8, 002501bc-7efb-4335-b672-d045cd76bc5b, 0d0e0fc4-e4f1-49cd-ad30-42cc8dc08b74, f2c1ea93-d241-413b-8140-c009da88d912, 23e36...","[world-bank-combined-indicators-for-zimbabwe, world-bank-trade-indicators-for-zimbabwe, world-bank-external-debt-indicators-for-zimbabwe, world-bank-climate-change-indicators-for-zimbabwe, world-b...",[world-bank-group]
2,2,2,0x31fda1ef985b4a59,3425,"[9c751883-698a-4a2c-9475-ff828e9c11db, 791b69af-df57-4157-96c0-c0d8d308315e, b055f1f7-8cdc-4ff8-a412-9f54d6e56c41, 6ae8568f-449b-4b59-9ad2-79d7df94cd9c, 4f4c7462-6b13-4125-a88a-f8e9ac0c837b, abb79...","[dhs-data-for-sao-tome-and-principe, dhs-data-for-rwanda, dhs-data-for-philippines, dhs-data-for-peru, dhs-data-for-paraguay, dhs-data-for-papua-new-guinea, dhs-data-for-pakistan, dhs-data-for-nig...",[dhs]
3,3,3,0x19598575d3397e19,3232,"[ed7b5bd2-7818-4d7a-9ff0-8ba0d97bf7d5, 8f239e93-76c0-4287-a414-3d17a5e55344, 040efd19-3d71-4e0b-8939-f6b46c465868, 8db41d1c-af8f-4f26-9d40-5b80b1bf72e8, 6482dc75-0e79-40d9-aacf-cdcde3c368a6, c9981...","[dhs-subnational-data-for-sao-tome-and-principe, dhs-subnational-data-for-rwanda, dhs-subnational-data-for-philippines, dhs-subnational-data-for-peru, dhs-subnational-data-for-paraguay, dhs-subnat...",[dhs]
4,4,4,0x16d2b679132fea10,2147,"[295cd9e4-8464-43ee-ad17-47196991a1f7, 34337d16-017d-4d69-834c-a5e0fc21a549, b51b8c0e-494d-488b-98d5-a70fd9451b90, 75076d6a-8d3f-49e3-b4f4-7c889bc82806, 7e0b2b37-73b8-4c69-bb11-6ba954fa0cd9, 08ea0...","[unhcr-population-data-for-world, unhcr-population-data-for-zwe, unhcr-population-data-for-zmb, unhcr-population-data-for-zaf, unhcr-population-data-for-yem, unhcr-population-data-for-wsm, unhcr-p...",[unhcr]
...,...,...,...,...,...,...,...
639,639,639,0x22def8e1b7b0c742,1,[68328f42-9276-423e-80d0-fe89630804ff],[3w-december-2017],[ocha-ethiopia]
640,640,640,0x4b7012601f2de402,1,[63143067-46e0-4fb5-b131-d83ee45122ab],[ethiopia-settlements],[ocha-ethiopia]
641,641,641,0x4aefd35864aaa20a,1,[6ddcfbc9-fa06-4b14-b9a4-ce96d3fae65e],[base-acceso-internet-personas-entre-los-5-y-19-anos-2018],[immap]
642,642,642,0x4acf4e36d67d877,1,[903326f2-b372-4786-9973-87226cb15e41],[people-in-need-2008-2019],[ocha-fts]


In [32]:
# Extract a single resource_id for each hash
resource_ids = hash_resources['Unique HDX resource id'].apply(lambda x: x[0])
print(resource_ids)

0      51b2e4ec-aca5-4b97-bbb7-c005175b682e
1      2e130cdf-c850-4533-b2f3-e961adbec48a
2      9c751883-698a-4a2c-9475-ff828e9c11db
3      ed7b5bd2-7818-4d7a-9ff0-8ba0d97bf7d5
4      295cd9e4-8464-43ee-ad17-47196991a1f7
                       ...                 
639    68328f42-9276-423e-80d0-fe89630804ff
640    63143067-46e0-4fb5-b131-d83ee45122ab
641    6ddcfbc9-fa06-4b14-b9a4-ce96d3fae65e
642    903326f2-b372-4786-9973-87226cb15e41
643    8b0feea6-20e4-45dc-aaae-c8b6fbd5a9f4
Name: Unique HDX resource id, Length: 644, dtype: object


In [33]:
df_subset = df[df['HDX resource id'].isin(resource_ids)]

print("Column data subset to one resource ID per hash ...")
print(df_subset.shape)

print("Unique data providers ...")
print(len(df_subset["Data provider"].unique()))

print("Unique HDX resource ids ...")
print(len(df_subset["HDX resource id"].unique()))

Column data subset to one resource ID per hash ...
(7834, 9)
Unique data providers ...
119
Unique HDX resource ids ...
644


## Remove unsupported HXL tags and attributes

Next, we will remove HXL tags and attributes which are not officially supported in the HXL Core schema.

In [34]:
def filter_for_schema(text):
    #print(f"Tokens before: {text}")
    if " " in text:
        text = text.replace(" ","")

    tokens_raw = text.split("+")
    tokens = [tokens_raw[0]]
    for t in tokens_raw[1:]:
        tokens.append(f"+{t}")

    filtered = []
    for t in tokens:
        if t in APPROVED_HXL_SCHEMA:
            if t not in filtered:
                filtered.append(t)
    filtered = "".join(filtered)

    if len(filtered) > 0 and filtered[0] != '#':
        filtered = ""

    return filtered

def filter_disallowed_hxl(column_data, hxl_col = 'Hashtag with Attributes'):
    print("Before",column_data.shape)
    allowed = []
    disallowed = []
    for index, row in column_data.iterrows():
        if row[hxl_col] == filter_for_schema(row[hxl_col]):
            allowed.append(row)
        else:
            disallowed.append(row)
    allowed = pd.DataFrame(allowed)
    disallowed = pd.DataFrame(disallowed)
    print("After", allowed.shape)
    return allowed, disallowed

data, disallowed = filter_disallowed_hxl(df_subset)
print(data.shape)

display(disallowed)



Before (7834, 9)
After (3777, 9)
(3777, 9)


Unnamed: 0,Hashtag with Attributes,Text header,Locations,Data provider,HDX dataset id,HDX resource id,Date created,Unnamed: 9,Hash
2,#affected+idp+ind,Total IDP IND,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,true,0x2cc7fd3129c0d18c
4,#affected+idp+male,Total IDP Male Ind,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,true,0x2cc7fd3129c0d18c
6,#affected+female+idp,Total IDP Female Ind,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,true,0x2cc7fd3129c0d18c
256,#meta+appeal+type,atype,SSD,ifrc,ifrc-appeals-data-for-south-sudan,4110b824-3338-453f-ae5d-89ca80f5b147,2023-03-13,true,0x1d1434ee319a1be
260,#meta+appeal+id,code,SSD,ifrc,ifrc-appeals-data-for-south-sudan,4110b824-3338-453f-ae5d-89ca80f5b147,2023-03-13,true,0x1d1434ee319a1be
...,...,...,...,...,...,...,...,...,...
689570,#lat_deg,prevlat,VUT,brcmapsteam,cyclone-pam-path,a8ccd9d2-8328-487a-b04b-ca3f3f2e0ea3,2015-03-16,True,0x1d4a8deeb40f76ce
689571,#lon_deg,prevlon,VUT,brcmapsteam,cyclone-pam-path,a8ccd9d2-8328-487a-b04b-ca3f3f2e0ea3,2015-03-16,True,0x1d4a8deeb40f76ce
689572,#period_date,datelabel,VUT,brcmapsteam,cyclone-pam-path,a8ccd9d2-8328-487a-b04b-ca3f3f2e0ea3,2015-03-16,True,0x1d4a8deeb40f76ce
689573,#x_time,hours,VUT,brcmapsteam,cyclone-pam-path,a8ccd9d2-8328-487a-b04b-ca3f3f2e0ea3,2015-03-16,True,0x1d4a8deeb40f76ce


In [35]:
disallowed_sample = disallowed.sample(20)
display(disallowed_sample['Hashtag with Attributes'])

357128                              #meta+healthcare
625185    #indicator+armed_entry+health_facility+num
677347                             #meta+site+status
637143                         #indicador+ponderator
683127                                   #date+month
687071                       #indicator+offgrid+type
684270                      #reached+displaced+girls
356686                                    #value+chf
653484                         #activity+number+year
1242                             #date+end+impl+year
96166               #affected+assaulted+healthworker
519333                                  #inneed+gaza
469067                                  #reached+new
515426                            #sector+gbv+inneed
688397              #population+age0_4+male+refugees
625175                                   #group+perp
667704                              #meta+confidence
626952                               #adm1+i_en+name
689082                                #affecte

## Download data excerpts

Using our subset of resource_ids for each hash, extract column data excerpts.

In [42]:
def populate_data_excerpts(df_in):

  df = df_in.copy()

  df['Data excerpt'] = ''

  datasets_resources = df[['HDX dataset id', 'HDX resource id']].drop_duplicates()
  datasets_resources.reset_index(drop=True, inplace=True)

  print("For each resource, extract a data excerpt for each column ...")

  num_rows = datasets_resources.shape[0]
  for index, row in datasets_resources.iterrows():

      if index % 10 == 0:
          print(f"Processing {index} of {num_rows} ({(index/num_rows)*100:.2f}%) resources")

      dataset_id = row['HDX dataset id']
      resource_id = row['HDX resource id']
      try:
        dataset = Dataset.read_from_hdx(dataset_id)
        if dataset is None:
            print(f"Dataset {dataset_id} not found!")
            continue
      except Exception as e:
        print(f"Error reading dataset {dataset_id} ... {e}")
        continue

      resources = dataset.get_resources()
      for resource in resources:
          if resource['id'] == resource_id:
              print(f"    Accessing data for resource {resource_id}, {resource['name']}")
              try:
                  url, path = resource.download(LOCAL_DATA_DIR)
                  df.loc[df['HDX resource id'] == resource_id, 'File'] = path
                  df.loc[df['HDX resource id'] == resource_id, 'URL'] = url

                  with hxl.data(resource['url'], input_options) as source:
                      columns = [column.header for column in source.columns]
                      tags = [column.get_display_tag(sort_attributes=True) for column in source.columns]
                      data = {}
                      rowcount = 0
                      for row in source:
                          if rowcount > DATA_EXCERPT_SIZE:
                              break
                          i = 0
                          for colvalue in row:
                              colname = columns[i]
                              if colname not in data:
                                  data[colname] = [colvalue]
                              else:
                                  data[colname].append(colvalue)
                              i += 1
                          rowcount += 1

                      for col in columns:
                          if col in data:
                              #print(f"       Setting data excerpt for column {col} >> {data[col]} ...")
                              df.loc[(df['HDX resource id'] == resource_id) & (df['Text header'] == col), 'Data excerpt'] = str(data[col])

              except Exception as e:
                  print(f"Error accessing data for resource {resource_id}, {resource['name']} ... {e}")

  return df

df = data.copy()
df = populate_data_excerpts(df)
display(df)

df.to_csv(f"{LOCAL_DATA_DIR}/hxl_hash_resources_data.csv", index=False)

print(data.shape)
print(df.shape)

data = df.copy()


For each resource, extract a data excerpt for each column ...
Processing 0 of 612 (0.00%) resources
    Accessing data for resource 26ecc26f-74e7-46af-b450-8872dca0b63b, DRC - Baseline Assessment - M23 Crisis 13 - February 2024
    Accessing data for resource dbf9b4bd-1321-4846-b6f0-4654509d3626, admin1-summaries-earthquake.csv
    Accessing data for resource 4110b824-3338-453f-ae5d-89ca80f5b147, IFRC Appeals Data for South Sudan
    Accessing data for resource b4e5634e-42e0-4ca5-8893-8e15cad6b620, fts_requirements_funding_pse.csv
    Accessing data for resource 4854ed70-56fb-4792-8845-bdeb51418c66, fts_incoming_funding_vnm.csv
    Accessing data for resource 7b373ee7-af33-4140-92a0-83d2693eb74c, CERF Donor Contributions (HXLated).csv
    Accessing data for resource 636c8890-8747-442b-87c9-895133314dcb, idmc event data for WSM
    Accessing data for resource 3a53c43d-83bc-46e7-b66a-38ccca4b536c, AWSD_SD_security_incidents.csv
    Accessing data for resource 0908c976-8b62-4a7e-9a38-9750

ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:10:40 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=projectid
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: projectid
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:10:40 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=projectid
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: projectid


    Accessing data for resource 697c656a-684c-4ae1-9849-0db4d6b732ad, DTM Yemen Displacement Tracking  11 - 17 February 2024
    Accessing data for resource 2ca8e86c-461c-482f-ac5e-9cdb8871a4ad, Lebanon - IDP Tracking Dataset - ROUND 24- 15-02-2024 - HDX.xlsx
    Accessing data for resource c5ce40d6-07b1-4f36-955a-d6196436ff6b, EMDAT-country-profiles_2024_07_29.xlsx
    Accessing data for resource a1a73231-7a96-4501-9675-97cddf3c3893, Suite of Food Security Indicators for Zimbabwe
    Accessing data for resource 26e65696-95c3-4eac-8d18-c150526e34e0, Deflators data for Zimbabwe
    Accessing data for resource b81e33f7-8942-4113-b675-8fbffc8a714f, Exchange rates data for Holy See
    Accessing data for resource 6f659727-e3d8-4eef-bc2b-a3a836b65243, wpdx_water_points_hti
Processing 30 of 612 (4.90%) resources
    Accessing data for resource 6926dff7-658a-49e1-8d61-0ed8a983fbe1, ipc_global_national_long_latest.csv
    Accessing data for resource a6f9f9b8-d265-4099-9ead-5e65fd26c074, Mozamb

ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:11:54 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=name, #reporting+or+implementing+org?
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: name, #reporting+or+implementing+org?
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:11:54 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=name, #reporting+or+implementing+org?
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: name, #reporting+or+implementing+org?


    Accessing data for resource e77f3400-5d43-4a46-81a9-bf86e8958141, InterAction Member Data for Zimbabwe


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:11:59 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags='activtiy+working_group
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 'activtiy+working_group
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:11:59 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags='activtiy+working_group
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 'activtiy+working_group


    Accessing data for resource e58625dc-946c-47aa-a086-28d30cf4fe58, Nouvelles admissions CRENAS, CRENI et CRENAM Sep 2023
    Accessing data for resource 10a1749b-1484-496e-b505-b6a34a40ec4a, OCHA_SOM_Operational_Presence_3W_data_Oct2022.xlsx
    Accessing data for resource d7c53285-fc22-4468-82fa-da7f087b60ca, 2023 Consolidated 3W data April to 31 Dec_hxl
    Accessing data for resource 1bb1fff2-1a10-4024-ade6-d4a0dd499d8d, ECB_FX_USD-quote.csv
    Accessing data for resource 15b313fb-9852-468f-a1eb-a83323338cd8, zimbabwe-healthsites-csv-with-hxl-tags


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:13:02 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=x, y, osm_id, osm_type, completeness, addr_housenumber, addr_street, addr_postcode, addr_city, changeset_id, changeset_version, changeset_timestamp
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: x, y, osm_id, osm_type, completeness, addr_housenumber, addr_street, addr_postcode, addr_city, changeset_id, changeset_version, changeset_timestamp
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:13:02 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=x, y, osm_id, osm_type, completeness, addr_housenumber, addr_street, addr_postcode, addr_city, changeset_id, changeset_version, changeset_timestamp
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: x, y, osm_id, osm_type, completeness, addr_housenumber, addr_street, addr_postcode, addr_city, changeset_id, changeset_version, changeset_timestamp


    Accessing data for resource ca29f3a5-1e8e-4b57-bbb7-3e3469059845, zambia-healthsites-csv-with-hxl-tags


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:13:09 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=x, y, osm_id, osm_type, completeness, addr_housenumber, addr_street, addr_postcode, addr_city, changeset_id, changeset_version, changeset_timestamp, part_time_beds
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: x, y, osm_id, osm_type, completeness, addr_housenumber, addr_street, addr_postcode, addr_city, changeset_id, changeset_version, changeset_timestamp, part_time_beds
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:13:09 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=x, y, osm_id, osm_type, completeness, addr_housenumber, addr_street, addr_postcode, addr_city, changeset_id, changeset_version, changeset_timestamp, part_time_beds
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: x, y, osm_id, osm_type, completeness, addr_housenumber, addr_street, addr_postcode, addr_city, changeset_id, chang

Processing 50 of 612 (8.17%) resources
    Accessing data for resource 08f68c8c-2e64-47e6-8566-9663e8c2da28, tokelau-healthsites-csv-with-hxl-tags


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:13:14 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=osm_id, osm_type, completeness, addr_housenumber, addr_street, addr_postcode, addr_city, changeset_id, changeset_version, changeset_timestamp
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: osm_id, osm_type, completeness, addr_housenumber, addr_street, addr_postcode, addr_city, changeset_id, changeset_version, changeset_timestamp
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:13:14 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=osm_id, osm_type, completeness, addr_housenumber, addr_street, addr_postcode, addr_city, changeset_id, changeset_version, changeset_timestamp
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: osm_id, osm_type, completeness, addr_housenumber, addr_street, addr_postcode, addr_city, changeset_id, changeset_version, changeset_timestamp


    Accessing data for resource 63390977-a035-4aa6-be1c-3e520b4cbb3e, Human Development Indicators for Zimbabwe
    Accessing data for resource 62c167b6-2e99-494a-9cac-1630ef1b1369, Human Development Indicators for Somalia
    Accessing data for resource 227b85c5-8e3b-4815-af04-bb0e66e183a7, yem_pin_2024.xlsx
    Accessing data for resource dd12a3c8-b9e7-4b80-9835-f982dd0a4f76, afghanistan-natural-disaster-incidents-from-january-to-november-2023.xlsx
    Accessing data for resource bc1f0808-636a-4aed-a4c1-7af540bd4d4f, Conflict Data for Zimbabwe
    Accessing data for resource 3c4eb1db-3e72-4b85-a1d5-f0c7ffbf4582, DTM Chad Site and Village Assessment Round 21
    Accessing data for resource d743b4b4-35d7-4e4d-80b9-18807b5df0b8, DTM Burundi_21_27_Janvier_2024 Emergency Event Tracking.xlsx
Dataset mozambique-3ws-2023-jan-nov not found!
    Accessing data for resource ebf41b1a-5b24-422d-be52-77e0c4c7a704, List of airports in Myanmar (HXL tags)
Processing 60 of 612 (9.80%) resources
    Ac

ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:14:31 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=adm1 +code
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: adm1 +code
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:14:31 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=adm1 +code
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: adm1 +code


    Accessing data for resource 19440fa1-7bc7-4cf7-a4c3-afaef150df52, DTM Haiti - Artibonite, Centre , Grande Anse, Nippes, South, South-East and West  (December 2023) - Baseline Assessment - Round 5.1
    Accessing data for resource 49cb8fb5-bd40-4cc8-be58-330bc6c2585f, Uganda - Multi-Hazard Response/DRR Platform (December 2023)
Error accessing data for resource 49cb8fb5-bd40-4cc8-be58-330bc6c2585f, Uganda - Multi-Hazard Response/DRR Platform (December 2023) ... Download of https://data.humdata.org/dataset/acc482ab-7ef8-4135-aa20-e75e110ebf9e/resource/49cb8fb5-bd40-4cc8-be58-330bc6c2585f/download/dec-2023-uganda-multihazard-dataset_public-hdx.xlsx failed in retrieval of stream!
    Accessing data for resource 295cd9e4-8464-43ee-ad17-47196991a1f7, Demographics and locations of forcibly displaced and stateless persons (Global)
Processing 70 of 612 (11.44%) resources
    Accessing data for resource d3f24b62-82d6-4b12-97d1-66344615db94, Ethiopia Who is Doing What Where (4W) - December 202

ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:15:05 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=lat2, long
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: lat2, long
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:15:05 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=lat2, long
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: lat2, long


    Accessing data for resource a10ae1ac-533f-4b0a-bc5a-d04bad3ef132, UKR 2024 HNRP PiN Severity Targets Activities 20240119.xlsx
    Accessing data for resource e1953bbd-432e-4449-b417-dae7487663d0, DTM Cameroon Baseline Assessment Round 27


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:15:28 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=15
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 15
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:15:28 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=15
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 15


    Accessing data for resource ec934f6d-ceca-4f34-acd2-5d567fd88a84, DTM Mali Baseline Assessment Round 78
    Accessing data for resource 47902ccd-7d9d-4bfd-80b4-c3e4f5c38e20, Round 31 — Area Baseline Assessment (Raion level)


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:15:41 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=adm2+name+eng
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: adm2+name+eng
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:15:41 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=adm2+name+eng
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: adm2+name+eng


    Accessing data for resource 193408bd-1016-47c4-9560-7015cd938aac, Afghanistan COVID-19 Stats
    Accessing data for resource fae834e9-f3cc-42db-a964-7f4c127e5019, HTI_HNO_2024.xlsx
    Accessing data for resource 0943c5b5-06b3-43ff-8574-6fc5aae7c459, access-classification-september-2023_hxl
    Accessing data for resource ad7efeab-75f1-4039-a643-308cf86aefc5, CAR_Malnutrition


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:16:04 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+mam_under_6-59months, #indicator+mas_under_6-59months, #indicator+mcg_6-59 mois
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+mam_under_6-59months, #indicator+mas_under_6-59months, #indicator+mcg_6-59 mois
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:16:04 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+mam_under_6-59months, #indicator+mas_under_6-59months, #indicator+mcg_6-59 mois
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+mam_under_6-59months, #indicator+mas_under_6-59months, #indicator+mcg_6-59 mois


    Accessing data for resource 6157fadd-d04d-4747-88cd-77749628731b, COL_345W_Jan-Jun2021.xlsx
Processing 80 of 612 (13.07%) resources
    Accessing data for resource 16251491-9bc1-461a-9c45-110d3d3245ca, tcd_hpc2023_rev_pin-cible_20231006.xlsx
    Accessing data for resource e9ce2fb4-3527-4576-8aec-8be688a18db0, nigeria-acute-malnutrition-sam-and-mam-oct-2023.xlsx
    Accessing data for resource 09916f6a-d52a-46a7-91aa-35e7d593acf2, Myanmar_HPC_2024
    Accessing data for resource 88eda02e-bdf0-4fa8-b914-f6113d22b206, TCD_VIZ_Nombre_Retournés_Niveau_Admin2_20231217.xlsx
    Accessing data for resource 81a02eaf-f99f-43f4-b484-60e8494df81d, 2024_humanitarian_profile_10102023.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:16:39 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#inneed+host community
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #inneed+host community
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:16:39 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#inneed+host community
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #inneed+host community


    Accessing data for resource 9e82ad7e-470a-468a-a1fa-ab1bbc5ba80e, afg_hno_pin_2024.xlsx
    Accessing data for resource c9836eb9-85be-4051-b638-f0a23c5a7ebb, south-sudan-humanitarian-needs-and-response-plan-2024_hxl.csv
    Accessing data for resource 1a3206f3-2ac6-49e0-a70c-04206f651e92, mli-hno-2024


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:16:56 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#inneed+m+age>60, #inneed+f+age>60, #targeted+m+age>60, #targeted+f+age>60
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #inneed+m+age>60, #inneed+f+age>60, #targeted+m+age>60, #targeted+f+age>60
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:16:56 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#inneed+m+age>60, #inneed+f+age>60, #targeted+m+age>60, #targeted+f+age>60
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #inneed+m+age>60, #inneed+f+age>60, #targeted+m+age>60, #targeted+f+age>60


    Accessing data for resource 2c828f0f-565f-4f9d-b2ea-220126f49602, sdn_hnrp_2024_122123_hxl.csv
    Accessing data for resource 605ddbac-37d7-47c1-84a4-a863a4023f83, UNHCR_UKR_dataset
Processing 90 of 612 (14.71%) resources
    Accessing data for resource ca1d748a-63a8-48d9-8503-e0a85516be44, cod_hno_2024
    Accessing data for resource 793e66fe-4cdb-4076-b037-fb8c053239e2, global_pcodes.csv
    Accessing data for resource 237e1704-29c9-4a8f-9f91-9e3dac624d7e, 3W_All_Clusters_March_2022
    Accessing data for resource 45331aaa-e651-4a16-a8a0-92a51b3734c0, OxCGRT_CSV
    Accessing data for resource 38807c10-84a7-4aec-8a00-a4e0e052a553, haiti-healthsites_hdx
    Accessing data for resource 86f92eb8-504e-4685-bcc6-85121153c60e, DTM Somalia Baseline Assessment Round 2
    Accessing data for resource b9718d0d-4c2f-48ee-a283-350b97e892c5, somalia-2023-post-gu-acute-malnutrition-burden-and-prevalence-by-district-21-sep-2023.xlsx
    Accessing data for resource 796d90e7-23b6-499b-a16f-b20bc

ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:22:50 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=sal
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: sal
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:22:50 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=sal
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: sal


Error accessing data for resource e1129533-f1b8-4706-a79e-778ebaeedc88, VEN_5W_Jan-Oct2023.xlsx ... list index out of range
    Accessing data for resource d394406d-7d81-4821-a4bc-cae55119daab, 3W_TCD_Nov2023
    Accessing data for resource 09cef61d-b5fc-42c6-9cf5-99371e938b40, BFA_Innindation_2023
    Accessing data for resource 79abb4c0-ef91-43c0-bc2c-7a54179a6eb2, YEM_4W_Jan-Sept-2023.xlsx
    Accessing data for resource bfc5a2d0-23e7-46e7-bde4-1a3a54f45d8e, BFA_HNO_2024
Dataset mozambique_-humanitarian-needs not found!
Processing 110 of 612 (17.97%) resources
    Accessing data for resource 2114c854-d95c-4974-9ff0-2d59cd1d49fa, 2022 IRN SHCC Health Care Data.xlsx
    Accessing data for resource 28be64d3-adaf-4f61-887c-87a8b5d9c625, GHO 2024 Section 3.xlsx
    Accessing data for resource 501b929d-012b-4557-b5bf-2003fd4e3661, DTM Kenya — BA/MSLA - Samburu County — Round 1
Error accessing data for resource 501b929d-012b-4557-b5bf-2003fd4e3661, DTM Kenya — BA/MSLA - Samburu County — Ro

ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:33:50 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=* estimates in donetska, zaporizka, luhanska and khersonska oblasts (blue text) are likely under-represented due to limited coverage of government-controlled areas only, as well as the limited number of respondents reached through the random digit dial. the estimation for luhanska is taken into account only within the total population estimation.
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: * estimates in donetska, zaporizka, luhanska and khersonska oblasts (blue text) are likely under-represented due to limited coverage of government-controlled areas only, as well as the limited number of respondents reached through the random digit dial. the estimation for luhanska is taken into account only within the total population estimation.
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:33:50 [error    ] Skipping column(s) with malformed hashtag specs f

    Accessing data for resource 9026e57c-ad7b-4786-8841-b36264097f6c, 3W_Haiti_HDX_20230511
Processing 150 of 612 (24.51%) resources
    Accessing data for resource 9d396e61-e708-48ab-b51e-f50579fa169a, BDI_HNO_2023
    Accessing data for resource 6c15e318-635a-4de8-9b8c-58c7c4e8af97, DTM DRC BA Ituri Aug2023 R10


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:34:31 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#affected+ind+idp+
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #affected+ind+idp+
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:34:31 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#affected+ind+idp+
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #affected+ind+idp+


    Accessing data for resource 668dbc8d-442a-4a12-96d2-0682313b586b, Round 14 — DTM Ukraine Returnees Dataset
    Accessing data for resource c50a1711-f5f8-471b-9887-85d396f19ddd, DTM Burundi Baseline Assessment Round 73
    Accessing data for resource 53c40477-6596-4209-b6a8-d587aac0dfe8, suivi-du-bilan_2022_nutrition
    Accessing data for resource 6341ffbf-85a3-44ec-b3eb-7ff300174252, DTM South Sudan Baseline Assessment Round 14
    Accessing data for resource 5fa55cb0-326a-4629-a056-eaf37aeff9d9, DTM Libya Baseline Assessment Round 45
    Accessing data for resource c524f6bd-f023-4cf3-a52d-ed0e9be5242b, hno-nigeria-2023_hxl-tags.xlsx
    Accessing data for resource 25dca590-f12c-40e1-817a-9da1344bca5e, DTM Mozambique Site Assessment Round 22
    Accessing data for resource b1e2a1f7-bd73-42a7-9cde-6227a78d4b77, DTM Madagascar Baseline Assessment Round 5


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:35:37 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=0, 0, 0, 0
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 0, 0, 0, 0
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:35:37 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=0, 0, 0, 0
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 0, 0, 0, 0


Processing 160 of 612 (26.14%) resources
    Accessing data for resource 62a5dbd5-ecd5-476e-8cfc-6225331e1377, DTM DRC BA Kasai Central Dec2020 R7
    Accessing data for resource 46e11b83-f60a-4796-a027-e46483193fa6, DTM Lebanon — Migrant Presence Monitoring — Round 3
    Accessing data for resource 175fe47a-ab68-495d-8ac0-defce8e4924d, TCD_DATA_SMART2021
Error accessing data for resource 175fe47a-ab68-495d-8ac0-defce8e4924d, TCD_DATA_SMART2021 ... list index out of range
    Accessing data for resource ce85c4d9-3cea-4e59-833a-ff19e91b9006, NER_Sep_2023
    Accessing data for resource 3bc1cf5a-a46d-47a5-8f7f-93789f5db5b9, Health Facilities
    Accessing data for resource 0b8dd14b-a9b9-408f-8e9d-08e3dbc84ef7, somalia-drought-key-figures
    Accessing data for resource e1d706aa-f2cc-43b0-80f1-60280945371c, DTM CAR Baseline Assessment Round 19
    Accessing data for resource 2f20951f-aaf8-4c69-a3be-7ad116ca3569, Black Sea Initiative Vessel Movements
    Accessing data for resource 79c6524

ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:42:09 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#zone-specifique, #zone-specifique + code
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #zone-specifique, #zone-specifique + code
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:42:09 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#zone-specifique, #zone-specifique + code
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #zone-specifique, #zone-specifique + code


    Accessing data for resource 6ad4b44b-d9d4-4832-b39a-da7f1c912930, 3W_BDI_Mar2022
    Accessing data for resource 121d6cbe-92ff-4059-841f-26d8a8887ac9, global-3w-2023-06-08.xlsx
    Accessing data for resource 7106abfe-384a-4e26-86ab-eed828e91d63, 2022 SHCC Incident Data.xlsx
    Accessing data for resource 12fc93c2-39f6-4b8b-8f18-2a135e4d1a45, Burundi- Muyinga, Cankuzo, Makamba, Ruyigi, Rutana, Rumonge: Operational Presence
    Accessing data for resource cf461e2e-4ae2-439d-a6c6-ec0c5e2f5bad, Iraq populated places 2021 P-coded.xlsx
    Accessing data for resource 0130fc33-dd0a-4547-bf13-1e1514e4c134, DTM South Sudan Event Tracking - Jan-Dec 2022
Processing 210 of 612 (34.31%) resources
    Accessing data for resource b08f71f9-ba7e-4295-ba15-035a72f32fe6, 2016-2023 Attacks on Vaccination Campaigns.xlsx
    Accessing data for resource 8fd2b30c-6030-4daa-886c-0e3a6016025c, CAR_Cluster Education BD des écoles_08052023.xlsx
    Accessing data for resource 6332355d-ec41-4136-bb12-42bbb8b

ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:44:47 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639 3_bal, #indicator+lang+pct+iso639 3_brh, #indicator+lang+pct+iso639 3_prs, #indicator+lang+pct+iso639 3_eng, #indicator+lang+pct+iso639 3_hnd, #indicator+lang+pct+iso639 3_khw, #indicator+lang+pct+iso639 3_plk, #indicator+lang+pct+iso639 3_unknown, #indicator+lang+pct+iso639 3_pan, #indicator+lang+pct+iso639 3_pus, #indicator+lang+pct+iso639 3_skr, #indicator+lang+pct+iso639 3_plk, #indicator+lang+pct+iso639 3_snd, #indicator+lang+pct+iso639 3_urd
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639 3_bal, #indicator+lang+pct+iso639 3_brh, #indicator+lang+pct+iso639 3_prs, #indicator+lang+pct+iso639 3_eng, #indicator+lang+pct+iso639 3_hnd, #indicator+lang+pct+iso639 3_khw, #indicator+lang+pct+iso639 3_plk, #indicator+lang+pct+iso639 3_unknown, #indicator+lang+pct+iso639 3_pan, #indicator+lan

    Accessing data for resource 00084aa2-52c6-4332-8077-9fb4955d7820, PiN_VBG_2023_hdx.xlsx
Processing 230 of 612 (37.58%) resources
    Accessing data for resource a14afbc7-2b59-42e9-8642-abd9a7169ceb, afghanistan_conflict_displacements_2022.xlsx
    Accessing data for resource 31366c37-4803-4886-a608-768cd948cf7f, Palestine COVID-19 Cases by Governorate
    Accessing data for resource 57e50360-b021-4772-9bc5-542e0af00c2a, herams_nampula-db_qtr-2-2022_hxl.xlsx
    Accessing data for resource 235f892e-beea-4d90-8af3-19baf12ea5b9, hti_polbndl_rd_cnigs.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:45:51 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=# names
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: # names
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:45:51 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=# names
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: # names


    Accessing data for resource 92827be4-f6b3-4453-80cc-4c5c2e5d3d5f, DF_SITREP_COVID19.csv
    Accessing data for resource f8e6e090-fa9a-41df-b561-267d8728af38, Desnutrición Aguda_HDX_2022.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:46:00 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicador+low birth weight, #indicador+low birth weight
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicador+low birth weight, #indicador+low birth weight
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:46:00 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicador+low birth weight, #indicador+low birth weight
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicador+low birth weight, #indicador+low birth weight


    Accessing data for resource c17c4113-b885-4799-b24a-44562eb57947, PiN_Seguridad Alimentaria y Nutrición_2023.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:46:06 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicador+low birth weight, #indicador+low birth weight, #indicador+low birth weight, #indicador+low birth weight, #indicador+low birth weight, #indicador+food security, #indicador+food security, #indicador+food security, #indicador+food security, #indicador+food security, #inneed+afro-colombian
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicador+low birth weight, #indicador+low birth weight, #indicador+low birth weight, #indicador+low birth weight, #indicador+low birth weight, #indicador+food security, #indicador+food security, #indicador+food security, #indicador+food security, #indicador+food security, #inneed+afro-colombian
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:46:06 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicador+low birth weight, #indicador+low birth weight, #indicador+low bi

    Accessing data for resource 77b38c47-d802-4ae5-8445-73cd3132be22, pin_severity_hno_2023.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:46:12 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#severity + code, #severity +code +protection_mine-action, #inneed +ind +protection_mine-action
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #severity + code, #severity +code +protection_mine-action, #inneed +ind +protection_mine-action
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:46:12 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#severity + code, #severity +code +protection_mine-action, #inneed +ind +protection_mine-action
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #severity + code, #severity +code +protection_mine-action, #inneed +ind +protection_mine-action


    Accessing data for resource cfe49c66-8975-44e4-b8e0-594b5ea2c5ee, SYR_FSA_PIN_2023.xlsx
    Accessing data for resource 685d0768-17c5-4947-969d-4de446750704, SDG 4 Global and Thematic data
Processing 240 of 612 (39.22%) resources
    Accessing data for resource 1a856318-0c7f-4b9f-bc29-0c98a717b125, COVID-19 pandemic Attacks on Health Care in 2020.xlsx
    Accessing data for resource dedff8c3-065b-413d-9d01-2570e82d566b, flood-response-monitoring-matrix-august-2022-coded-woredas.xlsx
    Accessing data for resource 60ec36f5-59d9-40b6-983d-195b2e660276, CMR_EXNO_Data_InondationLC_MD_MT_V1.0_20221206.xlsx
    Accessing data for resource 918b0b51-30d3-4d46-b0c5-1688bdf27b8e, SSD_floods_301122.xlsx
    Accessing data for resource 63143067-46e0-4fb5-b131-d83ee45122ab, ETH_CapitalTowns 2021
    Accessing data for resource 0061b888-67c4-4798-83f5-ea44df8587e4, who-is-doing-what-and-where_nga_3w_jul_sept_2022.xlsx
    Accessing data for resource 65564e72-3052-4ed0-98cf-310896f7dbeb, ner_hno

ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:50:55 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639 3_arb, #indicator+lang+pct+iso639 3_unknown, #indicator+lang+pct+iso639 3_swh, #indicator+lang+pct+iso639 3_eng, #indicator+lang+pct+iso639 3_ita, #indicator+lang+pct+iso639 3_swh, #indicator+lang+pct+iso639 3_ymm, #indicator+lang+pct+iso639 3_xma, #indicator+lang+pct+iso639 3_som, #indicator+lang+pct+iso639 3_unknown
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639 3_arb, #indicator+lang+pct+iso639 3_unknown, #indicator+lang+pct+iso639 3_swh, #indicator+lang+pct+iso639 3_eng, #indicator+lang+pct+iso639 3_ita, #indicator+lang+pct+iso639 3_swh, #indicator+lang+pct+iso639 3_ymm, #indicator+lang+pct+iso639 3_xma, #indicator+lang+pct+iso639 3_som, #indicator+lang+pct+iso639 3_unknown
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:50:55 [error    ] Skipping column(s) with malformed hashtag specs funct

    Accessing data for resource 363cfc0e-74ce-4268-815c-8705a72ccd0d, afghanistan-3w-april-to-june-2022.xlsx
    Accessing data for resource dfa9bde5-9f6f-43fd-af4d-b43bd8d0e124, Ukraine Flash Appeal - March to December 2022 PIN_HDX.xlsx
    Accessing data for resource 25bcd859-16ff-449b-9b8e-707a724f5152, who-is-doing-what-and-where_nga_3w_apr_jun_2022.xlsx
Processing 270 of 612 (44.12%) resources
    Accessing data for resource 60d15653-37e7-433c-87a4-20826ef74456, HT_Climato-Hydro-Meteo_EMDAT_Data 20220623
    Accessing data for resource 8161df6c-77ef-4627-9f17-1f81e975923b, HT_Acces a l'eau par commune 2022
    Accessing data for resource a3634ad3-5375-4644-88b2-93693f1a7ab4, 220702_3W Typhoon Rai_Odette Consolidated HDX.xlsx
Error accessing data for resource a3634ad3-5375-4644-88b2-93693f1a7ab4, 220702_3W Typhoon Rai_Odette Consolidated HDX.xlsx ... list index out of range
    Accessing data for resource fc41924b-9ace-4a71-9a6a-031e24bf841c, 2021-HRP-Sectors-Response-Jan-Dec.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:52:18 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#affected+infected+new+24hrs, #affected+infected+test+24hrs
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #affected+infected+new+24hrs, #affected+infected+test+24hrs
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:52:18 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#affected+infected+new+24hrs, #affected+infected+test+24hrs
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #affected+infected+new+24hrs, #affected+infected+test+24hrs


    Accessing data for resource 8cad612d-0245-45ae-ad25-1e9c9d82eb8b, Somalia COVID-19 cases by location
    Accessing data for resource 18797eb3-7352-4f99-b117-ffcecbaeb3d9, Libya COVID-19 Cases by Location
Processing 280 of 612 (45.75%) resources
    Accessing data for resource 180183e8-d5fc-4e81-a8c3-29f80be2af6b, DTM Nigeria North Central & West Location Assessment R9
    Accessing data for resource cb439fdf-23b4-4116-9ff2-a28d6c341e0f, DTM South Sudan Site Assessment Round 11
    Accessing data for resource 615848f2-4f81-46ef-9cb8-aa50b11147e9, Haiti: Coronavirus (COVID-19) Subnational Cases
    Accessing data for resource 8e15cffe-2815-4cbb-acd3-9a315bbc4d84, Child_protection_area_of_responsibility_organizations_hxl.xlsx
    Accessing data for resource c9ea0e06-6357-4356-a8b6-acef2367d35e, DTM Nigeria North Central & West Baseline Assessment R9
    Accessing data for resource c68af038-870a-4ee9-82bf-3413b65b2b55, Deflators data for Turkey
    Accessing data for resource 329d810b-

ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:54:22 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=affected+migrants+hh, affected+migrants+ind
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: affected+migrants+hh, affected+migrants+ind
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:54:22 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=affected+migrants+hh, affected+migrants+ind
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: affected+migrants+hh, affected+migrants+ind


    Accessing data for resource 5e42b51e-8395-40ec-90dc-fa4d20d90307, DTM Zimbabwe Village Assessment — Matabeleland South and Masvingo Provinces (November 2021)


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:54:30 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=affected+migrants+hh
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: affected+migrants+hh
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:54:30 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=affected+migrants+hh
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: affected+migrants+hh


    Accessing data for resource d2f9cbaf-5fb2-454d-a59c-f9e1849f21b1, NGA_Subnational_Covid19_HXL_HERA.csv
    Accessing data for resource da6cba36-f81f-4bf7-9ad3-b04cf52bedf9, BFA_Subnational_Covid19_HXL_HERA.csv


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:54:49 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#affected+vaccinated+1dose, #affected+cumulative+vaccinated+1dose, #affected+vaccinated+healthworkers+1dose, #affected+cumulative+vaccinated+healthworkers+1dose, #affected+vaccinated+2doses, #affected+cumulative+vaccinated+2doses, #affected+vaccinated+healthworkers+2doses, #affected+cumulative+vaccinated+healthworkers+2doses
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #affected+vaccinated+1dose, #affected+cumulative+vaccinated+1dose, #affected+vaccinated+healthworkers+1dose, #affected+cumulative+vaccinated+healthworkers+1dose, #affected+vaccinated+2doses, #affected+cumulative+vaccinated+2doses, #affected+vaccinated+healthworkers+2doses, #affected+cumulative+vaccinated+healthworkers+2doses
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:54:49 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#affected+vaccina

    Accessing data for resource 550ff8d0-aac2-4a77-bfeb-a6da9dd8ecae, MRT_Subnational_Covid19_HXL_HERA.csv
    Accessing data for resource 928a4140-5472-458d-b23c-776a8469e62d, MLI_Subnational_Covid19_HXL_HERA.csv
    Accessing data for resource 03fcb424-4149-49c2-9a3f-84536870ad12, VE_lang_admin0.csv
Processing 300 of 612 (49.02%) resources
    Accessing data for resource 36bab3c0-106c-4735-baee-e21b6924d7a2, Proyectos_san_2022_hrp_hdx
    Accessing data for resource be7434d8-1cd2-4c18-b303-ad95053170f3, victimas_explotacion_sexual_comercial.xlsx
    Accessing data for resource a8d86cf4-53ff-4137-b7a7-361ab55b31df, violencia_sexual_desastres_naturales.xlsx
    Accessing data for resource 55505ce8-f3b9-4b3a-b3bb-d04a577de138, UKR_ 2022 HRP_Target SADD.xlsx
    Accessing data for resource 2537d8e4-c376-4fce-a18f-1f8b70551972, datos_brutos_personas_alcanzadas_vbg.xlsx
    Accessing data for resource 2b50a59a-c07c-4e95-a362-a050361bca52, PAK_Consolidated_4W_Q1_Q4
    Accessing data for re

ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:56:25 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicador+barriers to healtcare, #indicador+barriers to healtcare, #indicador+barriers to healtcare, #indicador+barriers to healtcare, #indicador+barriers to healtcare, #indicador+mortality to epidemiological, #indicador+mortality to epidemiological, #indicador+mortality to epidemiological, #indicador+mortality to epidemiological, #indicador+mortality to epidemiological, 05001, 72727.56735918422, 2
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicador+barriers to healtcare, #indicador+barriers to healtcare, #indicador+barriers to healtcare, #indicador+barriers to healtcare, #indicador+barriers to healtcare, #indicador+mortality to epidemiological, #indicador+mortality to epidemiological, #indicador+mortality to epidemiological, #indicador+mortality to epidemiological, #indicador+mortality to epidemiological, 05001, 72727.56735918422,

    Accessing data for resource 29dd1c53-b1e7-4756-b782-09d389b422ed, Education Cluster Activities Dataset Consolidated 2019.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:56:40 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=0, 0, 42
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 0, 0, 42
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:56:40 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=0, 0, 42
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 0, 0, 42


    Accessing data for resource e66d4346-5c48-4a2c-840a-f0ffad339318, Zim_3W_August_2021.xlsx
    Accessing data for resource e5c97513-8597-49f4-bd0a-754165f6439a, 200402-datos-dashboard-acceso-a-servicios.xlsx
    Accessing data for resource 1a8d10a9-4c6b-46b1-9360-81205784f268, datos-para-hpc-v2.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:57:02 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#value+waste1+2
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #value+waste1+2
ERROR:hxl.REMOTE_ACCESS:2024-07-30 17:57:02 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#value+waste1+2
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #value+waste1+2


    Accessing data for resource 9440386d-5ae0-4eba-b687-338a2ef79786, 2010-2019-consolidado-sivicap (1).xlsx
    Accessing data for resource 90d10e73-06bf-45a2-80f6-e43068709afe, 201812-indicadores-wash-extranjeros-censo (1).xlsx
    Accessing data for resource f06ecfb9-c5e5-427e-aa0c-365b8b485d37, base-de-datos-wash (1).xlsx
    Accessing data for resource a06af1f0-3942-4b9c-aabc-5a5bcd0e25e7, afghan-voluntary-repatriation-2021.xlsx
    Accessing data for resource 54cb50fa-12dd-4f1d-be5d-32f69d0a2a39, Base convalidaciones - HDX.xlsx
Processing 320 of 612 (52.29%) resources
    Accessing data for resource eb4fcf4a-2fe6-46cf-a1c7-2a2026a8809c, base-desercion-escolar.xlsx
    Accessing data for resource 6ddcfbc9-fa06-4b14-b9a4-ce96d3fae65e, conectividad-internet-5-a-19-anos.xlsx
    Accessing data for resource 05fe6261-e81d-4854-adaf-81475f9c4205, base-cobertura-educativa.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:00:23 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=coverage+percent
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: coverage+percent
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:00:23 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=coverage+percent
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: coverage+percent


    Accessing data for resource ca4c6d23-75e2-43cb-972f-95dbe5e7ebe1, instituciones_de_salud_en_colombia.xlsx
    Accessing data for resource 79149c89-b6b8-48cd-8808-c098956ce8d8, acciones-de-cooperacion-para-ninos-ninas-y-adolescentes-migrantes-venezolanos.xlsx
    Accessing data for resource 0f089229-482c-4548-96f7-7eb63f714b9f, 200630_datosinfografiasocios.xlsx
    Accessing data for resource f59fe73c-9f2c-489a-add7-4f50f27ebbd2, delito-sexual-ven-col.xlsx
    Accessing data for resource ac90b2d0-dee4-4230-8bae-e4016a316de4, apoyos-de-la-cooperacion-con-insumos-para-salud-ante-covid-19.xlsx
    Accessing data for resource 9229fd69-29b7-4b29-b806-1631b6a62e05, vih_sida-vf.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:01:47 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#affected+density+1000, #affected+density+1000
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #affected+density+1000, #affected+density+1000
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:01:47 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#affected+density+1000, #affected+density+1000
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #affected+density+1000, #affected+density+1000


    Accessing data for resource 8c7f094b-a78f-42dc-8ff9-0c8ca2be5771, SECOP_HDX.xlsx
Processing 330 of 612 (53.92%) resources
    Accessing data for resource f8c5a8e8-ccd7-4e92-9ed0-33500f6f1359, iraq_hno_2021_severity_district_20210603
    Accessing data for resource 5db9438b-3692-4a0e-b49f-b966d96c3b41, PIN_SAN_2022_HDX.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:02:05 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicador+low birth weight, #indicador+low birth weight, #indicador+low birth weight, #indicador+low birth weight, #indicador+low birth weight, #indicador+food security, #indicador+food security, #indicador+food security, #indicador+food security, #indicador+food security, #inneed+afro-colombian
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicador+low birth weight, #indicador+low birth weight, #indicador+low birth weight, #indicador+low birth weight, #indicador+low birth weight, #indicador+food security, #indicador+food security, #indicador+food security, #indicador+food security, #indicador+food security, #inneed+afro-colombian
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:02:05 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicador+low birth weight, #indicador+low birth weight, #indicador+low bi

    Accessing data for resource 1080cfda-cd4b-444a-817c-64a8bc61a1f2, Población objetivo de Salud_HDX.xlsx
    Accessing data for resource 578b6f44-ec18-424b-8505-9711f012a7bf, Proyectos del Clúster de Salud_2022_HDX.csv
    Accessing data for resource 7e2df627-c3b9-4b24-8030-72cfc8cbf80b, KP_Tribal_Districts_HF_Registry_v2_20180320.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:02:22 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#sr. no., #full name of health facility, #name of health facility, #number of beds, #
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #sr. no., #full name of health facility, #name of health facility, #number of beds, #
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:02:22 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#sr. no., #full name of health facility, #name of health facility, #number of beds, #
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #sr. no., #full name of health facility, #name of health facility, #number of beds, #


    Accessing data for resource 496f4397-99b0-4381-8c44-80584d1f8738, Pakistan IDPs by Area of Origin
    Accessing data for resource 7b1e5eff-b760-4df0-9a95-2503f0cecb79, LBY_POP_2021
    Accessing data for resource 5115e8c2-f4b4-4fa3-8d56-54c47c741f1d, PiN_WASH_2022_HDX.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:02:40 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#inneed+afro-colombian, #inneed+disability-condition
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #inneed+afro-colombian, #inneed+disability-condition
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:02:40 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#inneed+afro-colombian, #inneed+disability-condition
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #inneed+afro-colombian, #inneed+disability-condition


    Accessing data for resource c408b46d-3f45-46ac-abe0-7c697a73b0db, LBY_HNO_2022.xlsx
    Accessing data for resource 5e1bea70-be2f-4a17-95dc-1c4700cd75ba, Salud_financiamiento_FTS_HDX_2021.xlsx
Processing 340 of 612 (55.56%) resources
    Accessing data for resource 5722fb9c-6dd5-485f-87d6-84421b337288, UKR_Population Baseline_SADD_2022.xlsx
    Accessing data for resource fb31450c-b6e3-4660-a75f-44fcec5e0372, Kerela.xlsx
    Accessing data for resource cd541fbe-01f9-499d-8b88-41b84b9b596e, hdx_summary_stats_2021.csv
    Accessing data for resource d20b1642-e6de-4bc1-9315-81a32eb88c3a, PAK_HNO_2021.xlsx
    Accessing data for resource 92c7b3c3-c14f-4e8c-becc-b12f2f272967, Iraq Covid 19 subnational data
    Accessing data for resource 4a3fbee8-f82b-422c-95c0-ee68be073704, DATOS_Discapacidad_HDX.xlsx
    Accessing data for resource 745442ed-18e6-408c-8241-2db7702d7c21, SAN_Socios_2021_hdx.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:03:49 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#beneficiary+focus+indigenous+aro-colombian, #activity+crisis+national+covid-19
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #beneficiary+focus+indigenous+aro-colombian, #activity+crisis+national+covid-19
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:03:49 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#beneficiary+focus+indigenous+aro-colombian, #activity+crisis+national+covid-19
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #beneficiary+focus+indigenous+aro-colombian, #activity+crisis+national+covid-19


    Accessing data for resource 7ce27863-30e6-4de5-ba24-388855f4364b, Schoolsandtheircoordinates2020.xlsx
    Accessing data for resource dd330c8d-f092-4750-8405-66d6b546cd62, mapeo_actores_vbg.xlsx
    Accessing data for resource 5f144109-39c6-4b00-8a32-b983b957e65c, Partos_grupos etarios_2020.xlsx
Processing 350 of 612 (57.19%) resources
    Accessing data for resource 91794717-4e92-4f52-a483-3e4198f60270, PiN_VBG_HDX.xlsx
    Accessing data for resource 40ef29d7-8097-4261-be42-186705cb7473, Tasa_Afectados_Desastres_2019_2020.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:04:21 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#population+total+2019, #population+total+2020, #affected+2019, #affected+2020
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #population+total+2019, #population+total+2020, #affected+2019, #affected+2020
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:04:21 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#population+total+2019, #population+total+2020, #affected+2019, #affected+2020
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #population+total+2019, #population+total+2020, #affected+2019, #affected+2020


    Accessing data for resource c60fac4d-b3fb-48a7-9652-7d7e44b34a89, Health Facility List (with geo-codes).xlsx
    Accessing data for resource 03d86390-5a57-4a14-b0d8-2288fb7dac9d, presunto-delito-sexual-sexo-edad-septiembre.xlsx
    Accessing data for resource 4ef84622-0e2e-4251-999d-02f46c0d585c, BFA_Covid19_Citylevel_HXL_HERA.csv
    Accessing data for resource ba97f052-78bc-471d-89d3-fe48a19b8659, MLI_Covid19_Citylevel_HXL_HERA.csv
    Accessing data for resource 1c66a857-d11a-4519-baca-8f1f5e2ce2a4, 20210810_location_fce_rrrc_unhcr_population-registration_public.xlsx
    Accessing data for resource b1d1f96b-624b-4275-bdab-0fe670d11299, Iraq_Population_2021_CSO_Projection.xlsx
    Accessing data for resource 3f727fc9-9f88-4f22-934c-ce6c10db0aaa, DTM Mauritania Migrants Baseline Assessment Round 1
    Accessing data for resource ae821d86-cb34-4421-b1c9-db7708572038, DTM Haiti - EQ(2021) - Site Assessment - Round 2
Processing 360 of 612 (58.82%) resources
    Accessing data for res

ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:05:27 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639 3_len, #indicator+lang+pct+iso639 3_ccr, #indicator+lang+pct+iso639 3_ppl, #indicator+lang+pct+iso639 3_eng, #indicator+lang+pct+iso639 3_deu, #indicator+lang+pct+iso639 3_fra, #indicator+lang+pct+iso639 3_por, #indicator+lang+pct+iso639 3_ita, #indicator+lang+pct+iso639 3_unknown
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639 3_len, #indicator+lang+pct+iso639 3_ccr, #indicator+lang+pct+iso639 3_ppl, #indicator+lang+pct+iso639 3_eng, #indicator+lang+pct+iso639 3_deu, #indicator+lang+pct+iso639 3_fra, #indicator+lang+pct+iso639 3_por, #indicator+lang+pct+iso639 3_ita, #indicator+lang+pct+iso639 3_unknown
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:05:27 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639 3_len, #indicator+lang+pct

    Accessing data for resource 6f66c234-92c9-42fb-a16e-b1a4ca0b4bb4, IRV-2021.xlsx
    Accessing data for resource b54bf736-5f79-425b-bd94-38bb405d3593, Nigeria_3W_August_2021.xlsx
    Accessing data for resource e1b7dbce-14c4-4f20-be37-27aa0670e3d7, Somalia drought viz - UNHCR-PRMN-displacements
    Accessing data for resource 927ced8e-f1e0-4666-b259-af1403eb1a96, who-is-doing-what-and-where_nga_3w_august_2021.xlsx
    Accessing data for resource a7440a59-e4ef-4050-a155-aaf332e04577, ukraine_civilian casualties_2016-2021.xlsx
    Accessing data for resource e8c619e6-59b0-4e00-a912-76e18fe5b9f9, Nigeria Hospitals and Clinics_HXL.xlsx
    Accessing data for resource cefed0c9-230e-4313-bbe7-25e5dc441ff9, eth_agriculture_cluster_4w_march_june_2021
Processing 370 of 612 (60.46%) resources
    Accessing data for resource 37565fbc-5415-45a4-9f6c-b856653dae5d, Excess mortality during COVID-19 pandemic
    Accessing data for resource 997dc52d-0482-4773-9b8a-7e62d32e8c54, Daily-Update IDN-COVI

ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:06:55 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#affected+vaccinated+1dose, #affected+cumulative+vaccinated+1dose, #affected+vaccinated+2doses, #affected+cumulative+vaccinated+2doses
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #affected+vaccinated+1dose, #affected+cumulative+vaccinated+1dose, #affected+vaccinated+2doses, #affected+cumulative+vaccinated+2doses
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:06:55 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#affected+vaccinated+1dose, #affected+cumulative+vaccinated+1dose, #affected+vaccinated+2doses, #affected+cumulative+vaccinated+2doses
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #affected+vaccinated+1dose, #affected+cumulative+vaccinated+1dose, #affected+vaccinated+2doses, #affected+cumulative+vaccinated+2doses


    Accessing data for resource 79eadcbd-af15-4218-862a-40714f468c1b, Haiti 3W data
    Accessing data for resource a4ddd895-00ed-406b-ae28-c7886bade322, who-is-doing-what-and-where_nga_3w_jan_Jun_2021.xlsx
    Accessing data for resource 58b9b864-f794-4ee4-a3f2-58827f7c9504, DTM CAR Site Assessment Round 12
    Accessing data for resource 9b44a1a0-5d32-4969-9daa-afa5d2910052, 20210719_5w_cluster-template-cleaned-v6-quickchart


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:07:31 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=indicator+distributed, note, adm1_pcode, adm2_pcode, adm3_pcode_, wash activity group, ind, total reached, check org 1, check org 2, check donnors, check sites, activity check3, activity_indicator2
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: indicator+distributed, note, adm1_pcode, adm2_pcode, adm3_pcode_, wash activity group, ind, total reached, check org 1, check org 2, check donnors, check sites, activity check3, activity_indicator2
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:07:31 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=indicator+distributed, note, adm1_pcode, adm2_pcode, adm3_pcode_, wash activity group, ind, total reached, check org 1, check org 2, check donnors, check sites, activity check3, activity_indicator2
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: indicator+di

Processing 380 of 612 (62.09%) resources
    Accessing data for resource 0c65956d-906f-4b12-9721-eef5823a6503, EC_lang_admin0.csv


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:07:36 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639 3_unknown
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639 3_unknown
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:07:36 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639 3_unknown
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639 3_unknown


    Accessing data for resource a5905c0c-a404-4668-aa12-b4227d0d8792, 210318_OCHA 3W_COVID 19 Humanitarian Response.xlsx
    Accessing data for resource 28732ce4-3100-4ebb-b353-3eb8fed9a56b, Sudan_Floods_Affected_Localities_10Nov2020
    Accessing data for resource 63ef99fd-d2b5-4e56-8ecb-89f7da87a9a4, HTI_TARGET_HRP2021.xlsx
    Accessing data for resource f4909418-05cd-45ac-9a9b-0f11830c65d8, Intersectoral severity of needs - Ethiopia 2021 HNO.xlsx
Dataset education-in-emergency-eie-key-figures-2018-2019 not found!
    Accessing data for resource 57501c43-83a4-4e7c-b69f-b034f65b9cd2, SUDAN_HNO 2021_Baseline Data.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:08:09 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=vulr18, u5_per, plw
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: vulr18, u5_per, plw
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:08:09 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=vulr18, u5_per, plw
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: vulr18, u5_per, plw


    Accessing data for resource a000ceaa-70c9-4db7-8ee3-cb5c17130fae, BJ_lang_admin0.csv


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:08:15 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639 3_unknown
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639 3_unknown
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:08:15 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639 3_unknown
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639 3_unknown


    Accessing data for resource b5bda083-b26c-49a6-9aa4-f476f19f17d0, UA-IDPs (2021 HNO).xlsx
    Accessing data for resource 7e2ec0b7-c46b-49e3-ad2e-44ba077abd23, TGO_Covid19_Citylevel_HXL_HERA.csv
Processing 390 of 612 (63.73%) resources
    Accessing data for resource fae45d2c-ffae-4de8-b54e-2bca673e0f42, TGO_Subnational_Covid19_HXL_HERA.csv
    Accessing data for resource e1003a2f-280b-429c-9ad8-997e30642e42, Pakistan National Nutrition Survey 2018.xlsx
    Accessing data for resource 618c6f61-bd9c-448a-8913-5c96ba956af1, Zimbabwe_HNO.xlsx
    Accessing data for resource 4b9ab62b-8395-4eb0-93dc-8e253b0a9898, afghanistan-3w-operational-presence-january-to-march-2021.csv
    Accessing data for resource e21a03c5-0e7c-4a07-8292-43a26f92a48c, Yemen_CIMP - Civilian Structure
    Accessing data for resource ad496fd4-7b76-432a-894c-6870e8844e85, Yemen_CIMP - Number of casualities
    Accessing data for resource a10e7b0b-e14e-4536-9b82-eb93ac2f5485, Zimbabwe Baseline Population.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:09:07 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=15, 0, 15
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 15, 0, 15
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:09:07 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=15, 0, 15
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 15, 0, 15


    Accessing data for resource f0acef6a-a628-4fc5-b675-eb58783b2e2a, unrwa_pse_refugees_31dec2019.xlsx
    Accessing data for resource af6cae79-86e2-411a-9ac8-351dc3920864, 210316_3W_COVID 19 Humanitarian Response_RCCE.xlsx
    Accessing data for resource 9a4cef39-9ac5-4fff-945c-d9098816dac9, PE_lang_admin0.csv


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:09:27 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639 3_cni, #indicator+lang+pct+iso639 3_ayr, #indicator+lang+pct+iso639 3_que, #indicator+lang+pct+iso639 3_spa, #indicator+lang+pct+iso639 3_unknown_indigenous, #indicator+lang+pct+iso639 3_unknown
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639 3_cni, #indicator+lang+pct+iso639 3_ayr, #indicator+lang+pct+iso639 3_que, #indicator+lang+pct+iso639 3_spa, #indicator+lang+pct+iso639 3_unknown_indigenous, #indicator+lang+pct+iso639 3_unknown
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:09:27 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639 3_cni, #indicator+lang+pct+iso639 3_ayr, #indicator+lang+pct+iso639 3_que, #indicator+lang+pct+iso639 3_spa, #indicator+lang+pct+iso639 3_unknown_indigenous, #indicator+lang+pct+iso639 3_unknown
ERROR

Processing 400 of 612 (65.36%) resources
    Accessing data for resource b0db3d0b-1c6c-41ec-a4c6-bbc0bc22f27c, bfa_hno_2021.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:09:37 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=42, 15, 0, 15, 15, 0, 15, 15, 15
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 42, 15, 0, 15, 15, 0, 15, 15, 15
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:09:37 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=42, 15, 0, 15, 15, 0, 15, 15, 15
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 42, 15, 0, 15, 15, 0, 15, 15, 15


    Accessing data for resource de3d30de-016f-4b10-9711-9fbd1846bf92, Myanmar_HNO_HRP_2021-hxl.xlsx
    Accessing data for resource 4b95538b-0fc8-4870-8cfd-8f08a2d2f1d3, LIF_3w_DatafromNov2020HXL.xlsx
    Accessing data for resource 95a4d096-4480-48f7-85d4-310bf91c43c8, Burundi Cankuzo_FD.xlsx
    Accessing data for resource 28cc6655-2f83-43ba-8338-85d4bc6ac97f, 2016-2020 Air- and ground-launched explosive weapons affecting health facilities.xlsx
    Accessing data for resource efef2f04-b446-4963-bf27-cf9af9fc8d3a, SEN_Subnational_Covid19_HXL_HERA.csv
    Accessing data for resource c279e1ea-9e29-45e2-94b7-b51d389e8ab6, tcd_hrp_2021.xlsx
    Accessing data for resource 9fc0df81-bbb1-43f5-beb9-45b625fa3aa7, Site Assessment Round 2 - Pb
    Accessing data for resource bdfd19ae-e815-4935-95ce-9dcbddf4e40b, 06_IOM DTM Dataset Kenya_Round 1_20150530_public_0_0.xlsx
    Accessing data for resource 12ad71a7-0f51-4548-a621-f0a9d9a4e5e7, Ecuador_site_assessment_R7_Pb
Processing 410 of 612 (66.9

ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:10:53 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#geo=lon
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #geo=lon
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:10:53 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#geo=lon
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #geo=lon


    Accessing data for resource e9f78e16-ee45-4070-bdee-a2050e66ba20, afghanistan-3w-operational-presence-january-to-march-2020.csv
    Accessing data for resource d554c9dc-62c2-4ad4-8e1c-a04b4fd03f57, 2020 SHCC Health Care Nagorno-Karabakh Data.xlsx
Error reading dataset drc-congo-humanitarian-needs-overview ... Failed when trying to read: id=drc-congo-humanitarian-needs-overview! (POST)
    Accessing data for resource cf6f622e-a904-47f8-897e-a500507c70b4, south_sudan_2021_humanitarian_needs_overview.xlsx
    Accessing data for resource 7106f14e-61f5-41d6-bc0f-6432edafeebb, CAF_HRP_2021
Error reading dataset mozambique-hno ... Failed when trying to read: id=mozambique-hno! (POST)
    Accessing data for resource ce5c0b0b-4e91-438a-bc49-f08ac7f88b13, 210330_Typhoon Goni (Rolly) and Vamco (Ulysses)_3W response.xlsx
    Accessing data for resource 3ea2cad4-13db-4099-8998-76a5a56f0640, 2020-HRP-Sectors-Response-Jan-Dec-HXLV2.xlsx
Processing 420 of 612 (68.63%) resources
    Accessing data 

ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:12:05 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639-3_bam, #indicator+lang+pct+iso639-3_ful, #indicator+lang+pct+iso639-3_hmb, #indicator+lang+pct+iso639-3_snk, #indicator+lang+pct+iso639-3_kao, #indicator+lang+pct+iso639-3_myk, #indicator+lang+pct+iso639-3_unknown, #indicator+lang+pct+iso639-3_mey, #indicator+lang+pct+iso639-3_tmh, #indicator+lang+pct+iso639-3_rkm, #indicator+lang+pct+iso639-3_myk, #indicator+lang+pct+iso639-3_bxw, #indicator+lang+pct+iso639-3_bze, #indicator+lang+pct+iso639-3_ara, #indicator+lang+pct+iso639-3_unknown
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639-3_bam, #indicator+lang+pct+iso639-3_ful, #indicator+lang+pct+iso639-3_hmb, #indicator+lang+pct+iso639-3_snk, #indicator+lang+pct+iso639-3_kao, #indicator+lang+pct+iso639-3_myk, #indicator+lang+pct+iso639-3_unknown, #indicator+lang+pct+iso639-3_mey, #indicator

    Accessing data for resource 6e750e3b-28f8-4846-b512-a35450faa36f, 4W_BU_Kayanza.csv
    Accessing data for resource 1e6050b6-2bf6-4050-aed8-7157f2c3cc5c, DR Congo Covid 19 Subnational cases
    Accessing data for resource 2a816dc6-d079-4db8-b8fc-84458b006491, 28Jan21 - 5W & Assesment - OCHAIDN.xlsx
    Accessing data for resource 74f86547-b489-45d5-a613-15d56dff734d, 4W_BU_Kirundo.csv
Processing 430 of 612 (70.26%) resources
    Accessing data for resource 17f5ddef-ba47-47fa-8108-c564c5da5385, fieldsdata_4w_UG_BujumburaMarie.csv
    Accessing data for resource 6510f20d-f959-49dc-a544-b9dfa4c7561b, Ethiopia_Covid19_cases_HXL_HERA.csv


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:12:46 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=ethiopia
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: ethiopia
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:12:46 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=ethiopia
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: ethiopia


    Accessing data for resource 41a5a96e-bf08-4228-bcc8-9229cd7c7b4d, DTM Armenia Baseline Assessment Round 3
    Accessing data for resource bdab79f3-9f24-4dbb-99a2-637375ac97bd, Italy - Food Prices
    Accessing data for resource f77153af-56f7-49d2-abf0-8c4598eb46b4, cleaned_compiled_Nov_3ws.xlsx
    Accessing data for resource be59bc8b-11a2-4430-bbc2-dd7584d9cf06, DTM Guatemala Site Assessment ETA_IOTA R1
    Accessing data for resource 4f44557e-e5e8-4892-9984-247d0fbd1ea8, DTM Honduras Site Assessment  Hurricane ETA And IOTA Response R 1
    Accessing data for resource db57c14c-514d-4744-8079-0bbaa2879c45, ni_lang_v01_admin0.csv
    Accessing data for resource 866f4dcc-b1de-4714-ae12-b2c4a3b25b56, npm-site-assessment-round-16-dataset-20191010
    Accessing data for resource 5ee2111a-86bc-4ce2-b127-14f0bd4d5482, LCB_SnapShot_DataSets - key_figures.csv
Processing 440 of 612 (71.90%) resources
    Accessing data for resource 2ff8c88a-c7d9-4e2c-a624-fb9ea8c4c941, Bénin_Covid-19_Subnat

ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:15:39 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#affected+diseases_of_ respiratory_system
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #affected+diseases_of_ respiratory_system
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:15:39 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#affected+diseases_of_ respiratory_system
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #affected+diseases_of_ respiratory_system


Processing 460 of 612 (75.16%) resources
    Accessing data for resource 57a4d991-f760-472f-9312-868c10498bc1, co_lang_v01.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:15:47 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639-3_emp, #indicator+lang+pct+iso639-3_pbb, #indicator+lang+pct+iso639-3_bpb, #indicator+lang+pct+iso639-3_guc, #indicator+lang+pct+iso639-3_cto, #indicator+lang+pct+iso639-3_spa, #indicator+lang+pct+iso639-3_eng, #indicator+lang+pct+iso639-3_fra, #indicator+lang+pct+iso639-3_ita, #indicator+lang+pct+iso639-3_deu, #indicator+lang+pct+iso639-3_unknown
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639-3_emp, #indicator+lang+pct+iso639-3_pbb, #indicator+lang+pct+iso639-3_bpb, #indicator+lang+pct+iso639-3_guc, #indicator+lang+pct+iso639-3_cto, #indicator+lang+pct+iso639-3_spa, #indicator+lang+pct+iso639-3_eng, #indicator+lang+pct+iso639-3_fra, #indicator+lang+pct+iso639-3_ita, #indicator+lang+pct+iso639-3_deu, #indicator+lang+pct+iso639-3_unknown
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:15:47 [erro

    Accessing data for resource b08078d9-2e5c-4d32-a430-733e1e9e6ed2, mhu-jan-dec2019-idn-hxl.csv
    Accessing data for resource 6c7ee710-ea84-4cc9-981b-81c8f432bd04, kh_lang_v01.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:16:01 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_umh, #indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_krr, #indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_pnx, #indicator+lang+pct+iso639-3_brb, #indicator+lang+pct+iso639-3_umh, #indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_unknown
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_umh, #indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_krr, #indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_khm, #indicator+lang+pct+iso639-3_pnx, #indicator+lang+pct+iso639-3_brb, #indicator+lang+pc

    Accessing data for resource dd079132-bc41-4ac2-b4e7-92894277d7de, HPC 2020 sector PINs Targets final.xlsx
    Accessing data for resource e0da0453-eae1-412e-a84b-016c8e87b758, PiN, IDPs, Refugees & Returnees 2020 figures
    Accessing data for resource a6eea2b2-0ff8-41ae-b25e-087566f0f221, Imperial_COVID-19_Projections.xlsx
    Accessing data for resource 9e945b6f-c825-41a6-b1ab-eed971fd792d, sudan-people-reached-by-state-jan-dec-2018_hrp.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:16:22 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=4000000
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 4000000
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:16:22 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=4000000
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 4000000


    Accessing data for resource cdfcaf02-56ed-4dfe-914a-7a40ee5090ca, sudan-people-reached-by-locality-jan-dec-2019_hrp.xlsx
    Accessing data for resource 7153d0f0-56b4-4282-aabc-3a0695812c0e, us-states-hxl.csv
    Accessing data for resource 3661aacd-d9b7-4eb0-8167-681ac13d4042, th_lang_v01.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:16:43 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639-3_tha, #indicator+lang+pct+iso639-3_unknown, #indicator+lang+pct+iso639-3_mfa, #indicator+lang+pct+iso639-3_mya, #indicator+lang+pct+iso639-3_eng, #indicator+lang+pct+iso639-3_unknown
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639-3_tha, #indicator+lang+pct+iso639-3_unknown, #indicator+lang+pct+iso639-3_mfa, #indicator+lang+pct+iso639-3_mya, #indicator+lang+pct+iso639-3_eng, #indicator+lang+pct+iso639-3_unknown
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:16:43 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639-3_tha, #indicator+lang+pct+iso639-3_unknown, #indicator+lang+pct+iso639-3_mfa, #indicator+lang+pct+iso639-3_mya, #indicator+lang+pct+iso639-3_eng, #indicator+lang+pct+iso639-3_unknown
ERROR:hxl.model:Skipping column(s) wit

Processing 470 of 612 (76.80%) resources
    Accessing data for resource b5d618bd-f363-4240-8ce2-1536beda9299, Global Coordination Groups (Beta) CSV
    Accessing data for resource 3cac22e4-1e09-45b5-93d5-720771161b1e, DHS Quickstats Data for Turkey
    Accessing data for resource de4a6b23-45f6-46cc-bb68-36c59b043219, DHS Quickstats Data for Turkey
    Accessing data for resource 742b08fe-c53d-4538-81fc-b30a41a1280a, Distribution of Population by Sex, Number of Households, Land Area, Population Density and Sub County.xlsx
    Accessing data for resource 7267b407-3806-448c-8f78-f9b98cb8ed11, 2020_DPRK_N&P_Overview_Provisional_data.xlsx
    Accessing data for resource b9465fc1-15d4-462e-91df-2f2c39ee5cc3, Sudan Disease Outbreaks 2019.xlsx
    Accessing data for resource a2a23a36-cc7a-4c0c-aeb6-cb879578f8fe, spcf-31dec2019_rosea_hxl.xlsx
    Accessing data for resource e2781ffd-9989-4112-9154-6b8d29464b11, 200304_3W on NCDDS_EQ.xlsx
    Accessing data for resource 1e7e721e-e7bd-4508-888a-

ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:17:40 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=1
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 1
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:17:40 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=1
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 1


    Accessing data for resource 13ae3804-14ea-44a8-9711-f738e82afcd6, 200210_Taal Eruption 3W_Consolidated.xlsx
Processing 480 of 612 (78.43%) resources
    Accessing data for resource 8d45c271-3394-4860-961b-0231b91f8a09, MW_lang_V02.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:17:55 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+lang+pct+iso639-3_nya, #indicator+lang+pct+iso639-3_ngl, #indicator+lang+pct+iso639-3_eng, #indicator+lang+pct+iso639-3_lai, #indicator+lang+pct+iso639-3_ngo, #indicator+lang+pct+iso639-3_nyy, #indicator+lang+pct+iso639-3_por, #indicator+lang+pct+iso639-3_seh, #indicator+lang+pct+iso639-3_nse, #indicator+lang+pct+iso639-3_tum, #indicator+lang+pct+iso639-3_toh, #indicator+lang+pct+iso639-3_yao, #indicator+lang+pct+iso639-3_unknown, #indicator+lang+understand+pct+iso639-3_abc
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+lang+pct+iso639-3_nya, #indicator+lang+pct+iso639-3_ngl, #indicator+lang+pct+iso639-3_eng, #indicator+lang+pct+iso639-3_lai, #indicator+lang+pct+iso639-3_ngo, #indicator+lang+pct+iso639-3_nyy, #indicator+lang+pct+iso639-3_por, #indicator+lang+pct+iso639-3_seh, #indicator+lang+pct+iso639-3_nse, #indica

    Accessing data for resource 3ce474ea-8a18-4909-929f-131ac883acf8, WCA UN agencies location.xlsx
    Accessing data for resource b4fead08-f224-407a-9c7f-bd2f9983273d, afghanistan-3w-operational-presence-october-to-december-2019.csv
    Accessing data for resource f1aabdf9-2bbe-4b7f-b19e-eb38c07f915d, DTM Yemen Flow Monitoring - Jan19 to Dec19
    Accessing data for resource 15e24633-67f1-463b-ad41-9b89b9dbbd07, Philippines Mindanao Earthquakes Site Assessment R5
    Accessing data for resource d4e67ae7-a910-47c9-b283-304877c7e5eb, Humanitarian Response Plans
    Accessing data for resource 9963ba32-bd72-4c25-96cd-d28ad9cd7e75, DTM CAR Bangui Floods Oct19 Site Assessment
    Accessing data for resource e8a121b9-eb74-498f-91a4-8c6601eddc98, water quality data_v2.xlsx
    Accessing data for resource 038f595a-baaf-4fa2-9b4e-04f7f901ba42, DTM_Bahamas_MSLA_Data_Round_3
    Accessing data for resource eae0a3d3-a630-4412-ab2a-7b61046f57cc, Somalia OCHA - Monitoring Matrix 2017
Processing 49

ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:21:16 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=comments, org type verified?
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: comments, org type verified?
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:21:16 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=comments, org type verified?
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: comments, org type verified?


Processing 510 of 612 (83.33%) resources
    Accessing data for resource cb10bb18-bef3-4040-ac01-4008a2677100, DTM Burkina Faso Site Assessment Round 1
    Accessing data for resource fcc28514-36c3-4aab-91e9-1719ff0a33c2, Papua New SA Ulawun Valcona R2


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:21:27 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=0, 0
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 0, 0
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:21:27 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=0, 0
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: 0, 0


    Accessing data for resource d21f0711-2722-42ee-aa1c-6dda74287ff4, DTM Malawi Site Assessment Round 3
    Accessing data for resource 50fa214d-06bb-469c-8482-e448944c00dc, sanitation_access_opendefecation.xlsx
    Accessing data for resource 58542259-91b3-4174-a8fe-3192e3f6511e, yemen_hard_to_reach_districts_april_2019.xlsx
    Accessing data for resource e142446b-cfc1-40bf-9714-6dc8c0a97277, Base Desaparecidos(INML)EneroaAbril2019Elaborado08072019.xlsx
    Accessing data for resource d6339726-55f5-47b5-9e27-11f015ade281, CXB-Education-Sector-Facility-2018-06-24.xlsx
    Accessing data for resource c40c0d0f-562b-4912-98f4-8e629545e266, afghanistan-3w-operational-presence-january-to-march-2019.xlsx
    Accessing data for resource a1fba5e5-1cbc-4e24-b25f-6ece34766ef0, DTM Afghanistan IDPs Returnees Needs Assessment May-Jun-18
    Accessing data for resource cd02b8df-2cb9-4ca2-abba-ff73f31615a6, afg_casualties_2018.xlsx
Processing 520 of 612 (84.97%) resources
    Accessing data for re

ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:23:05 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=household_size, partenaires d'implementation, service financier, conditionality, date_ordre_end, mois
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: household_size, partenaires d'implementation, service financier, conditionality, date_ordre_end, mois
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:23:05 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=household_size, partenaires d'implementation, service financier, conditionality, date_ordre_end, mois
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: household_size, partenaires d'implementation, service financier, conditionality, date_ordre_end, mois


    Accessing data for resource 89dacb2d-9cb9-492a-9db3-2d997ca4267c, All sectors consolidated-ver2.xlsx
    Accessing data for resource 68328f42-9276-423e-80d0-fe89630804ff, 3w_hxl.xlsx
    Accessing data for resource dc003506-2ddf-4fdf-8c6d-0bb783af5390, COD_MLI.csv


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:23:28 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=shape_length, shape_area
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: shape_length, shape_area
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:23:28 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=shape_length, shape_area
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: shape_length, shape_area


Processing 530 of 612 (86.60%) resources
    Accessing data for resource cf031d00-b817-4f6f-ac58-6bf74a000cb6, Hurricane_Michael_Twitter_Data_Analysis_SummaryTable.xlsx
    Accessing data for resource 60247249-5427-4634-93f2-0068c1a5b1a1, Dashboard-reowa-sahel-2018.xlsx
    Accessing data for resource 05d6134c-1234-4857-982c-a9ccf42711d8, moz_cycloneidai_aerialsurvey_hxl.xlsx
    Accessing data for resource 66ecc982-bc91-45bb-b400-49295fd7f1e8, DTM Vanuatu Manaro Round 4 Displacement-Return Dataset (2019-02).xlsx
    Accessing data for resource 40b9b069-9695-457f-81d4-0b8d975f4cbb, DTM Yemen Area Assessment Round 37
    Accessing data for resource ecc21f95-cfed-47a9-ab5b-10e0f81fcd19, mhr_diffa_fevrier_2019.csv
    Accessing data for resource f6e65456-b147-45d3-bb89-848919aa0ada, MLI_DATA_HumanitarianAccessByCommuneSurveyData_20190314.xlsx
    Accessing data for resource ee6b4c0c-6d2a-4d87-9f73-4ad34e79c66c, Haiti Ciblage HRP 2019-2020.xlsx
    Accessing data for resource 226f8c83-1897

ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:26:14 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#population+rural+2018
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #population+rural+2018
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:26:14 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#population+rural+2018
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #population+rural+2018


    Accessing data for resource 3b4b4e12-860f-4223-8d98-b756c081ec96, homicides_venezuela_2017_1.xlsx
    Accessing data for resource d9e2e643-87b6-4aeb-a636-557b751df553, ebola-cases-and-deaths-national-may-2018-outbreak-equateur.csv
    Accessing data for resource f54a1612-aec3-41ad-88f0-231afaf7e150, Indicators_data_ZW
Error accessing data for resource f54a1612-aec3-41ad-88f0-231afaf7e150, Indicators_data_ZW ... list index out of range
    Accessing data for resource f9579315-7dd0-4155-aa74-d514258046ae, Crime Trends and Operations of Criminal Justice Systems (UN-CTS).csv
    Accessing data for resource 56a9595e-5a09-4b60-8cb2-f73492fe69af, Yemen Cholera Outbreak Epidemiology Data
    Accessing data for resource f8d48ac8-cef8-4404-8e37-bfe8f85bce89, idp_flowdata_july_2018_finalshare.xlsx
    Accessing data for resource 1c0cbace-7206-42fa-a84b-c5cf84e1a1de, 160704_5W_HDX.xlsx
    Accessing data for resource bb97c4ef-7222-45b6-934b-df0a07ba011b, unicef-esaro-regional-refugee-and-idp-d

ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:29:30 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#adm2+ code
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #adm2+ code
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:29:30 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#adm2+ code
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #adm2+ code


Processing 580 of 612 (94.77%) resources
    Accessing data for resource 25b04cb5-53c0-43f7-8b7e-898ff7b34c2b, Somalia flood - people affected per district
    Accessing data for resource fe1cbb23-fb92-4daf-8555-2862d289ae53, wca_Cases2015_RegCholeraPlatformWCA.xlsx
    Accessing data for resource 13708985-49a3-4d3b-aec5-315628ee2ec4, cmr_car_refugees_20171231_unhcr_hxl_v1.1.xlsx


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:29:46 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#population+refugees+female+age>60, population+refugees+male+age5_11, #population+refugees+male+age>60
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #population+refugees+female+age>60, population+refugees+male+age5_11, #population+refugees+male+age>60
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:29:46 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#population+refugees+female+age>60, population+refugees+male+age5_11, #population+refugees+male+age>60
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #population+refugees+female+age>60, population+refugees+male+age5_11, #population+refugees+male+age>60


    Accessing data for resource a1a8ac08-807d-4bfb-9d97-2415134f21f1, ocha_nigeria_ne_cash_activities_May-Nov 2017-distributed.xlsx
    Accessing data for resource c86ff8da-bf4b-4766-a43d-08f5d1fc5a15, Refugee Population in Afghanistan 2017
    Accessing data for resource 4d9585ae-f32b-4742-a399-06efe44b63e8, Personnes déplacées du Pool


ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:30:05 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+ secondary+school+f
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+ secondary+school+f
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:30:05 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#indicator+ secondary+school+f
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #indicator+ secondary+school+f


    Accessing data for resource e6d5a229-92d2-4f76-9a0b-2f26854e1012, Iraq 2015.csv
    Accessing data for resource d22dd1b6-2ff0-47ab-85c6-08aeb911a832, #HXL core vocabulary list
    Accessing data for resource a1246506-28f6-4a41-b6f3-80ebdce74556, Standby Task Force Situational Review of Aid Responders in Nepal - Final 2W Report on 513 Organizations Responding as of May 6, 2015  - Final Output (1).xlsx
    Accessing data for resource 2180251e-53dc-4ade-839d-2e390c0d406c, Mt_Agung
Processing 590 of 612 (96.41%) resources
    Accessing data for resource c7fb99a5-43ec-4b3f-b8db-935640c75aeb, assesment_data_crm_05april2017.xlsx
    Accessing data for resource 8df22107-ffd4-4c06-a701-16ff5d1bd2b4, Afghanistan_Conflict_Displacements_2016.xlsx
    Accessing data for resource f76b0dd8-2137-42a1-8a9b-4fc6dc38c007, Philippines 3w as of December 2016.xlsx
    Accessing data for resource 98189d1d-7b20-4a5f-891e-17b26ef49a75, DC_OP4_DANA.xlsx
    Accessing data for resource 2527ac5b-66fe-46f0-8b9

ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:31:04 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#sector?
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #sector?
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:31:04 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#sector?
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #sector?


    Accessing data for resource 3106b05f-e171-49ac-a353-d7b56aed918f, Lake_Chad_Basin_Appeal_Status_2017-02-27.csv
    Accessing data for resource 1aaa5281-de4f-4ffa-a5ef-37c754a86478, Lake_Chad_Basin_Displaced_2017-02-23.csv
    Accessing data for resource 020000e5-e7aa-4027-b58a-efe7396ff32e, Lake_Chad_Basin_Estimated_Population_2017-02-23.csv
    Accessing data for resource 48eea6e6-04d9-4c6a-ad38-80ff8365f601, 201612---Pakistan-4Ws-KP-FATA-2016-HXL (3).xlsx
    Accessing data for resource 218ab806-0651-451a-a731-e0ccb80fe1d9, 06_09_update zika_global cases_IFRC.xlsx
Processing 600 of 612 (98.04%) resources
    Accessing data for resource 968202f1-856a-4906-ae87-c730e9b1dd27, PHL_haima_houses_damaged_pcoded_ndrrmc_sitrep_9_20161025.csv
    Accessing data for resource 7211bf25-c9a1-4496-9bbb-a3e10efb2e4f, 15_09_zika_cases_export_Zika_Team.xlsx
    Accessing data for resource 920df0fe-411d-47ee-8004-11b2302fbe1c, 160909_5W 2.0_HDX.xlsx
    Accessing data for resource ac8d491f-338f-414

ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:32:10 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#sector#subsector
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #sector#subsector
ERROR:hxl.REMOTE_ACCESS:2024-07-30 18:32:10 [error    ] Skipping column(s) with malformed hashtag specs function=parse_list hastags=#sector#subsector
ERROR:hxl.model:Skipping column(s) with malformed hashtag specs: #sector#subsector


    Accessing data for resource 58069fa8-18b6-4132-a242-32216815a398, Ecuador Earthquake - April 2016 - Severity index
    Accessing data for resource 5c3a03d2-53de-4754-86b3-c7116d927155, Burundi 3W_formated.xlsx
    Accessing data for resource 62d4b11f-1a29-4753-a54f-c327f21f0b0b, 3W - 5 May - Final consolidated.xlsx
    Accessing data for resource f7daccff-bcc7-41cd-bee0-4133ce6b8988, Somalia NGO Consortium Data.xlsx
    Accessing data for resource 9cba28b4-d112-4850-96b4-81122af45f9b, 141121 LR Health Care Facilities.xlsx
Processing 610 of 612 (99.67%) resources
    Accessing data for resource f78dc606-04e2-4fb6-a7eb-9eb995c33f76, 1501 Sierra Leone Health Centers.xlsx
    Accessing data for resource 5d2531d6-c03a-449b-afdd-52c07d687679, Guinea health-facility master data


Unnamed: 0,Hashtag with Attributes,Text header,Locations,Data provider,HDX dataset id,HDX resource id,Date created,Unnamed: 9,Hash,Data excerpt,File,URL
1,#affected+hh,Total IDP HH,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,true,0x2cc7fd3129c0d18c,[319283],/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/DRC - Baseline Assessment - M23 Crisis 13 - February 20247.xlsx,https://data.humdata.org/dataset/3554c498-660a-45cb-ada5-86a1fbcd6056/resource/26ecc26f-74e7-46af-b450-8872dca0b63b/download/adc_27jan-12_feb_update_public_v2.xlsx
8,#affected+ind+returnees,Total Returnees,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,true,0x2cc7fd3129c0d18c,[587705],/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/DRC - Baseline Assessment - M23 Crisis 13 - February 20247.xlsx,https://data.humdata.org/dataset/3554c498-660a-45cb-ada5-86a1fbcd6056/resource/26ecc26f-74e7-46af-b450-8872dca0b63b/download/adc_27jan-12_feb_update_public_v2.xlsx
100,#country,country_name,AFG BFA BDI CMR CAF TCD COL COD ETH HTI MLI MOZ MMR NER NGA SOM SSD PSE SDN SYR UKR VEN YEM,eth-zurich-weather-and-climate-risks,climada-earthquake-dataset,dbf9b4bd-1321-4846-b6f0-4654509d3626,2024-02-23,,0x234606b299c1e43e,"['Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan']",/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/admin1-summaries-earthquake7.csv,https://data.humdata.org/dataset/744f4f0b-3172-4397-9609-5ec0b9d34fcb/resource/dbf9b4bd-1321-4846-b6f0-4654509d3626/download/admin1-summaries-earthquake.csv
101,#adm1+name,region_name,AFG BFA BDI CMR CAF TCD COL COD ETH HTI MLI MOZ MMR NER NGA SOM SSD PSE SDN SYR UKR VEN YEM,eth-zurich-weather-and-climate-risks,climada-earthquake-dataset,dbf9b4bd-1321-4846-b6f0-4654509d3626,2024-02-23,,0x234606b299c1e43e,,/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/admin1-summaries-earthquake7.csv,https://data.humdata.org/dataset/744f4f0b-3172-4397-9609-5ec0b9d34fcb/resource/dbf9b4bd-1321-4846-b6f0-4654509d3626/download/admin1-summaries-earthquake.csv
102,#geo+lat,latitude,AFG BFA BDI CMR CAF TCD COL COD ETH HTI MLI MOZ MMR NER NGA SOM SSD PSE SDN SYR UKR VEN YEM,eth-zurich-weather-and-climate-risks,climada-earthquake-dataset,dbf9b4bd-1321-4846-b6f0-4654509d3626,2024-02-23,,0x234606b299c1e43e,"['34.5527', '34.9568', '34.9619', '34.3033', '34.0121', '34.2743', '34.7693', '35.4474', '35.8025', '34.8046', '33.3211']",/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/admin1-summaries-earthquake7.csv,https://data.humdata.org/dataset/744f4f0b-3172-4397-9609-5ec0b9d34fcb/resource/dbf9b4bd-1321-4846-b6f0-4654509d3626/download/admin1-summaries-earthquake.csv
...,...,...,...,...,...,...,...,...,...,...,...,...
689531,#adm1,Province,SLE,standby-task-force,141121-sierra-leone-health-facilities,f78dc606-04e2-4fb6-a7eb-9eb995c33f76,2014-11-01,True,0x5c648c638f567754,"['Eastern', 'Eastern', 'Eastern', 'Eastern', 'Eastern', 'Eastern', 'Eastern', 'Eastern', 'Eastern', 'Eastern', 'Eastern']",/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/1501 Sierra Leone Health Centers1.xlsx,https://data.humdata.org/dataset/7453fb80-752b-4078-a892-d936f9846dab/resource/f78dc606-04e2-4fb6-a7eb-9eb995c33f76/download/1501-sierra-leone-health-centers.xlsx
689533,#adm2,District,SLE,standby-task-force,141121-sierra-leone-health-facilities,f78dc606-04e2-4fb6-a7eb-9eb995c33f76,2014-11-01,True,0x5c648c638f567754,"['Kenema', 'Kenema', 'Kenema', 'Kenema', 'Kenema', 'Kenema', 'Kenema', 'Kenema', 'Kenema', 'Kenema', 'Kenema']",/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/1501 Sierra Leone Health Centers1.xlsx,https://data.humdata.org/dataset/7453fb80-752b-4078-a892-d936f9846dab/resource/f78dc606-04e2-4fb6-a7eb-9eb995c33f76/download/1501-sierra-leone-health-centers.xlsx
689535,#adm3,Chiefdom,SLE,standby-task-force,141121-sierra-leone-health-facilities,f78dc606-04e2-4fb6-a7eb-9eb995c33f76,2014-11-01,True,0x5c648c638f567754,"['Dama', 'Dama', 'Dama', 'Dama', 'Dama', 'Dama', 'Dama', 'Dama', 'Dama', 'Dama', 'Dama']",/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/1501 Sierra Leone Health Centers1.xlsx,https://data.humdata.org/dataset/7453fb80-752b-4078-a892-d936f9846dab/resource/f78dc606-04e2-4fb6-a7eb-9eb995c33f76/download/1501-sierra-leone-health-centers.xlsx
689545,#adm1+name,Nom de la région,GIN,ipc-cluster-guinea,guinea-healthcare-master-data,5d2531d6-c03a-449b-afdd-52c07d687679,2015-09-03,True,0x47c11098111f7d90,"['Boke', 'Conakry', 'Faranah', 'Kankan', 'Kindia', 'Labe', 'Mamou', 'Nzerekore']",/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/Guinea health-facility master data1.google sheet,https://docs.google.com/spreadsheets/d/1x0MgLKLG3fxWBJ200VV5Fr67GqgSvYISefO-EYEp2wg/edit#gid=0


(3777, 9)
(3777, 12)


## Data cleaning

In [45]:
data = data[data['Data excerpt'].notnull()]
data = data[data['Data excerpt'].str.contains(r'[A-Za-z0-9]')]

print(data.shape)

(3336, 12)


## Tags distribution

In [46]:
data['tag'] = data['Hashtag with Attributes'].apply(lambda x: x.split('+')[0])
tag_counts_train = data['tag'].value_counts()

print(tag_counts_train)

tag
#adm1           522
#adm2           473
#affected       380
#country        284
#date           244
#org            228
#adm3           188
#inneed         136
#sector         109
#geo            106
#targeted        75
#loc             67
#activity        67
#status          59
#population      57
#indicator       55
#region          49
#meta            42
#reached         32
#adm4            32
#subsector       13
#event           13
#beneficiary     12
#cause           12
#value           11
#item            10
#severity         9
#output           8
#crisis           6
#currency         4
#service          4
#adm5             4
#contact          4
#access           4
#capacity         3
#impact           3
#description      3
#frequency        2
#group            2
#modality         2
#delivery         1
#operations       1
Name: count, dtype: int64


## Generate Data table descriptions using LLM
Though there are descriptions on HDX for datasets, there aren't always detailed descriptions for tables in CSV and Excel files. We will generate these using GPT-3.5-Turbo to include in the prompt as they provide valuable context when considering what HXL tags apply to a column

In [None]:
def generate_data_description(data, file_name):
    """
    Generate a short description of a dataset based on a summary of its content.

    Args:
        data (DataFrame): The input dataset for which a description needs to be generated.
        file_name (str): The name of the file.

    Returns:
        str: A short description of the dataset.
    """

    prompt = f"""
      This data file ...

      {file_name}

      Has data that looks like this ...

      {data.iloc[0:DATA_EXCERPT_SIZE].to_string()}

      Summarize this dataset
    """

    # Define conversation messages
    messages = [
        {"role": "system", "content": "You are a helpful assistant Summarizing data into one paragraph"},
        {"role": "user", "content": prompt}
    ]

    # Request a completion (description) from the OpenAI API
    try:
      response = client.chat.completions.create(
          model=DATA_SUMMARY_LLM,
          messages=messages,
          temperature=0,
          max_tokens=300,
          stop=["\n\n"]
      )
    except Exception as e:
      print(f"Error generating description for {file_name} ... {e}")
      return ""

    return response.choices[0].message.content


def generate_data_descriptions(data_in):

  data = data_in.copy()
  data["Data description"] = ''

  unique_resources = data['HDX resource id'].unique().shape[0]
  print(f"\n\nUnique resources: {unique_resources}\n")

  count = 0
  dataset_descriptions = {}
  for index, row in data.iterrows():
    resource_name = row['File'].replace(f"{LOCAL_DATA_DIR}/",'')
    resource_id = row['HDX resource id']
    if resource_id not in dataset_descriptions:
      # We need a try/except as not all downloads from HDX succeed
      try:
        if ".xlsx" in resource_name:
            df = pd.read_excel(resource_name)
        elif ".csv" in resource_name:
            df = pd.read_csv(resource_name)
        else:
            print(f"Unknown file type for {resource_name}")
            continue
      except Exception as e:
          print(f"Error reading {resource_name} ... {e}")
          continue

      dataset_descriptions[resource_id] = \
      generate_data_description(df, resource_name)

      print(f"Description: {dataset_descriptions[resource_id]}")

      data.loc[data['HDX resource id'] == resource_id, 'Data description'] = \
                dataset_descriptions[resource_id]

      count += 1
      if count % 10 == 0:
          print(f"Processed {round(count/unique_resources,2)*100}% resources ...")

  return data


df = generate_data_descriptions(data)
display(df)




Unique resources: 554

Description: The dataset from the file "DRC - Baseline Assessment - M23 Crisis 13 - February 20247.xlsx" contains information on the total number of internally displaced persons (IDPs) and returnees in the Democratic Republic of Congo. The data includes the total number of IDP households, total IDP individuals, total IDP males, total IDP females, and total returnees. For instance, there are 319,283 IDP households, 1,548,732 IDP individuals, 646,805 IDP males, 901,927 IDP females, and 587,705 returnees recorded in the dataset.
Description: The dataset contains earthquake data for various administrative regions within Afghanistan, including information such as country name, admin1 name, latitude, longitude, aggregation method, indicator name, and indicator value. Each row represents a different administrative region with corresponding earthquake magnitude values. The dataset appears to be structured with columns for specific data attributes and rows for individua

## Train/Test split

In this section we will create train and test datasets for fine tuning.

In [None]:
data = pd.read_csv(f"{LOCAL_DATA_DIR}/hxl_hash_resources_data.csv")
print(data.shape)
display(data)

(7834, 12)


Unnamed: 0,Hashtag with Attributes,Text header,Locations,Data provider,HDX dataset id,HDX resource id,Date created,Unnamed: 9,Hash,Data excerpt,File,URL
0,#affected+hh,Total IDP HH,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,True,0x2cc7fd3129c0d18c,[319283],/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/DRC - Baseline Assessment - M23 Crisis 13 - February 2024.xlsx,https://data.humdata.org/dataset/3554c498-660a-45cb-ada5-86a1fbcd6056/resource/26ecc26f-74e7-46af-b450-8872dca0b63b/download/adc_27jan-12_feb_update_public_v2.xlsx
1,#affected+idp+ind,Total IDP IND,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,True,0x2cc7fd3129c0d18c,[1548732],/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/DRC - Baseline Assessment - M23 Crisis 13 - February 2024.xlsx,https://data.humdata.org/dataset/3554c498-660a-45cb-ada5-86a1fbcd6056/resource/26ecc26f-74e7-46af-b450-8872dca0b63b/download/adc_27jan-12_feb_update_public_v2.xlsx
2,#affected+idp+male,Total IDP Male Ind,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,True,0x2cc7fd3129c0d18c,[646805],/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/DRC - Baseline Assessment - M23 Crisis 13 - February 2024.xlsx,https://data.humdata.org/dataset/3554c498-660a-45cb-ada5-86a1fbcd6056/resource/26ecc26f-74e7-46af-b450-8872dca0b63b/download/adc_27jan-12_feb_update_public_v2.xlsx
3,#affected+female+idp,Total IDP Female Ind,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,True,0x2cc7fd3129c0d18c,[901927],/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/DRC - Baseline Assessment - M23 Crisis 13 - February 2024.xlsx,https://data.humdata.org/dataset/3554c498-660a-45cb-ada5-86a1fbcd6056/resource/26ecc26f-74e7-46af-b450-8872dca0b63b/download/adc_27jan-12_feb_update_public_v2.xlsx
4,#affected+ind+returnees,Total Returnees,COD,international-organization-for-migration,drc-displacement-idps-returnees-m23-crisis-north-kivu-province-baseline-assessment-iom-dtm,26ecc26f-74e7-46af-b450-8872dca0b63b,2023-10-16,True,0x2cc7fd3129c0d18c,[587705],/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/DRC - Baseline Assessment - M23 Crisis 13 - February 2024.xlsx,https://data.humdata.org/dataset/3554c498-660a-45cb-ada5-86a1fbcd6056/resource/26ecc26f-74e7-46af-b450-8872dca0b63b/download/adc_27jan-12_feb_update_public_v2.xlsx
...,...,...,...,...,...,...,...,...,...,...,...,...
7829,#lat_deg,prevlat,VUT,brcmapsteam,cyclone-pam-path,a8ccd9d2-8328-487a-b04b-ca3f3f2e0ea3,2015-03-16,True,0x1d4a8deeb40f76ce,"['-8.5', '-8.5', '-8.4', '-9.8', '-10.6', '-11.1', '-11', '-11.2', '-11.5', '-11.9', '-12.6']",/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/Cyclone Pam Path.google sheet,https://docs.google.com/spreadsheets/d/1xFOPVLCKeVpLtM27loV3_zicG-xswOZk7SD_nAQ217Q/edit?usp=sharing
7830,#lon_deg,prevlon,VUT,brcmapsteam,cyclone-pam-path,a8ccd9d2-8328-487a-b04b-ca3f3f2e0ea3,2015-03-16,True,0x1d4a8deeb40f76ce,"['169.8', '169.8', '170.3', '170.5', '170.3', '170.1', '169.6', '169.7', '169.7', '170.1', '170.2']",/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/Cyclone Pam Path.google sheet,https://docs.google.com/spreadsheets/d/1xFOPVLCKeVpLtM27loV3_zicG-xswOZk7SD_nAQ217Q/edit?usp=sharing
7831,#period_date,datelabel,VUT,brcmapsteam,cyclone-pam-path,a8ccd9d2-8328-487a-b04b-ca3f3f2e0ea3,2015-03-16,True,0x1d4a8deeb40f76ce,"['09 Mar', '09 Mar', '10 Mar', '10 Mar', '10 Mar', '11 Mar', '11 Mar', '11 Mar', '11 Mar', '12 Mar', '12 Mar']",/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/Cyclone Pam Path.google sheet,https://docs.google.com/spreadsheets/d/1xFOPVLCKeVpLtM27loV3_zicG-xswOZk7SD_nAQ217Q/edit?usp=sharing
7832,#x_time,hours,VUT,brcmapsteam,cyclone-pam-path,a8ccd9d2-8328-487a-b04b-ca3f3f2e0ea3,2015-03-16,True,0x1d4a8deeb40f76ce,"['0', '6', '18', '30', '36', '42', '48', '54', '60', '66', '72']",/content/drive/MyDrive/Colab/hxl-metadata-prediction/data/Cyclone Pam Path.google sheet,https://docs.google.com/spreadsheets/d/1xFOPVLCKeVpLtM27loV3_zicG-xswOZk7SD_nAQ217Q/edit?usp=sharing


###  Split by data provider organization

On HDX, the hierarchy is ...

Organization > datasets > resources > tables

A random train/test split will result in data from files in a dataset being in both train and test, which would pollute the test set with very similar data to training. So we will split by organization

In [None]:
def split_data(column_data, provider_col, test_size=0.2, random_state=42):
    """
    Perform train-test split on datasets, print information, and return X_train and X_test.

    The split is done by organizations, to try and avoid the situation where an org provides
    similar data files. Also, we exclude orgs which are subsidiaries from the test set, eg ocha-*
    as presumably each subsid will provide similar data. The aim is that the test set is new.

    Parameters:
    - column_data (pd.DataFrame): DataFrame containing column data.
    - provider_col (string): Name of column holding data providers.
    - test_size (float): The proportion of the dataset to include in the test split.
    - random_state (int): Seed for random number generation.

    Returns:
    - pd.DataFrame, pd.DataFrame: X_train, X_test
    """

    orgs_df = column_data.groupby(provider_col)[provider_col].count().sort_values(ascending=False)
    orgs_df = column_data.groupby(provider_col)[provider_col].count().sort_values(ascending=False).reset_index(name='count')
    all_orgs = orgs_df[provider_col].unique()

    # Split orgs to get 'Parent', eg 'ocha-*' -> 'ocha'
    orgs_df['org_parent'] = orgs_df[provider_col].str.split('-').str[0]

    # Count occurrences of each 'org_parent'
    org_parent_counts = orgs_df['org_parent'].value_counts().reset_index(name='count')

    # Filter to keep only those occurring once
    org_parents_single_occurrence = org_parent_counts[org_parent_counts['count'] == 1]

    # Get the 'org_parent' values that occur only once
    single_occurrence_org_parents = org_parents_single_occurrence['org_parent'].tolist()

    # Filter the original DataFrame to keep rows where 'org_parent' occurs only once
    org_parents_unique = orgs_df[orgs_df['org_parent'].isin(single_occurrence_org_parents)]

    print("\nOrgs which don't seem to be subsidiaries ...\n")
    display(org_parents_unique)

    single_entities = list(org_parents_unique[provider_col].unique())

    # Remove 'hdx' from single_entities, not good for testing as it's the folks that made HXL! Also some monolithic orgs with very similar data
    single_entities = [x for x in single_entities if not x in ['hdx','ourairports', 'un-ocha']]

    single_entities.sort()

    # Sample single-subsid orgs
    sample_size = int(len(single_entities)*test_size) - 1
    np.random.seed(42)
    X_test_orgs = np.random.choice(single_entities, sample_size)
    X_train_orgs = list(set(all_orgs)-set(X_test_orgs))

    print(f"Train orgs: {X_train_orgs}")
    print(f"Test orgs: {X_test_orgs}")

    # Extract column rows for datasets in X_train_datasets
    X_train = column_data[column_data[provider_col].isin(X_train_orgs)]

    # Extract column rows for datasets in X_test_datasets
    X_test = column_data[~column_data[provider_col].isin(X_train_orgs)]

    train_orgs = X_train[provider_col].unique()
    train_orgs.sort()
    test_orgs = X_test[provider_col].unique()
    test_orgs.sort()

    print(f"\nTrain orgs: {train_orgs}")
    print(f"\nTest orgs: {test_orgs}")

    print(f"\nTrain column data: {X_train.shape}")
    print(f"Test column data: {X_test.shape}")

    return X_train, X_test


X_train, X_test = split_data(data, 'Data provider', test_size=0.2, random_state=42)


Orgs which don't seem to be subsidiaries ...



Unnamed: 0,Data provider,count,org_parent
1,immap,221,immap
2,hdx,139,hdx
5,hera-humanitarian-emergency-response-africa,100,hera
11,redhum,69,redhum
13,fieldsdata,53,fieldsdata
17,ifrc,49,ifrc
20,wfp,41,wfp
25,brcmapsteam,31,brcmapsteam
28,dhs,28,dhs
29,standby-task-force,28,standby


Train orgs: ['cbes', 'unicef-esaro', 'unicef-data', 'iati', 'ocha-philippines', 'gec', 'ocha-pakistan', 'ocha-rosea', 'ocha-fts', 'world-health-organization', 'sadc_rvaa', 'cirrolytix', 'ocha-colombia', 'cesvi', 'blavatnik-school-of-government-university-of-oxford', 'lacso', 'soswcaf', 'wfp', 'ocha-niger', 'jhucsse', 'cpaor', 'water-point-data-exchange', 'ocha-myanmar', 'unesco', 'cfp-rco-nepal', 'unicef-rdc', 'hdx', 'rcpwca', 'clear', 'crs-waro', 'srsgcc', 'ocha-mali', 'brcmapsteam', 'ocha-burkina', 'ipc-cluster-guinea', 'ocha-haiti', 'ourairports', 'moving-energy-initiative', 'ocha-eritrea', 'ocha-ethiopia', 'cerf', 'ocha-opt', 'ocha-sudan', 'unodc', 'qcri', 'ipc', 'ocha-rosc', 'ewipa', 'unhcr', 'ocha-south-sudan', 'libya-ingo-forum', 'hxl', 'kenya-national-bureau-of-statistics', 'ocha-burundi', 'inter-sector-coordination-group', 'dalberg', 'ocha-libya', 'ocha-ukraine', 'ocha-rowca', 'ocha-yemen', 'redhum', 'infoculture', 'ocha-indonesia', 'cred', 'ocha-car', 'ocha-afghanistan', 'som

### Create LLM fine-tuning prompt files

In [None]:
def create_prompt_file(X_train, prompt_col, filename):
    """
    Create a prompt file from a DataFrame.

    Args:
        X_train (pd.DataFrame): The DataFrame containing the prompts.
        prompt_col (str): The name of the column containing the prompts.
        filename (str): The name of the file to write the prompts to.
    """

    with open(filename, 'w') as f:
        for index, row in X_train.iterrows():
            f.write(row[prompt_col] + "\n")

    print(f"Prompts written to {filename}")


def generate_chat_prompt(dataset_name, resource_name, column_name, excerpt, \
                         hxl_tag=None, dataset_description="", add_response=True):
    """
    Generate a chat (eg for GPT-3.5-Turbo) fine tuning prompt for HXL tags given dataset, resource, column information.

    Parameters:
    - dataset_name (str): Name of the dataset.
    - resource_name (str): Name of the resource.
    - column_name (str): Name of the column.
    - excerpt (str): Examples or excerpt of the column.
    - hxl_tag (str, optional): HXL tags for the column. Default is None.
    - dataset_description (str, optional): Description of the dataset. Default is an empty string.
    - add_response (bool, optional): Whether to include the response in the prompt. Default is True.

    Returns:
    - dict: A dictionary containing the prompt and optional completion/response.
    """

    system_message = """
        You are an assistant that replies with HXL tags and attributes"
    """

    resource_name = resource_name.replace(f"{LOCAL_DATA_DIR}/",'')

    column_details = f"resource_name='{resource_name}'; " + \
                     f"dataset_description='{dataset_description}'; " + \
                     f"column_name:'{column_name}'; examples: {excerpt}"

    user_prompt = f"What are the HXL tags and attributes for a column with these details? {column_details}"

    prompt = {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_prompt},
        ]
    }

    if add_response:
        prompt["messages"].append({"role": "assistant", "content": hxl_tag})

    #prompt = json.dumps(prompt, index=4)

    return prompt


def generate_prompts(df,
                     heading_col='Text header',
                     resource_name_col='File', \
                     tag_col='Hashtag with Attributes', \
                     excerpt_col='Data excerpt', \
                     hxl_tag=False):
    """
    Generate a set of prompts for HXL tags from a DataFrame.

    Parameters:
    - df (DataFrame): Input DataFrame containing dataset, resource, column information.
    - hxl_tag (bool, optional): Whether to include HXL tags in the prompts. Default is False.
    - heading_col (str, optional): Name of the column containing column headers. Default is 'Text header'.
    - resource_name_col (str, optional): Name of the column containing resource names. Default is 'File'.
    - tag_col (str, optional): Name of the column containing HXL tags. Default is 'Hashtag with Attributes'.
    - excerpt_col (str, optional): Name of the column containing column data excerpts. Default is 'Data excerpt'.

    Returns:
    - str: A string containing JSON-formatted prompts.
    """


    prompts = []
    for index, row in df.iterrows():
        if row['HDX resource id'] in dataset_descriptions:
            dataset_description = dataset_descriptions[row['HDX resource id']]
        else:
            "No dataset description, skipping ..."
            continue

        prompt = generate_chat_prompt('',  # Dataset name
                    row[resource_name_col], \
                    row[heading_col], \
                    row[excerpt_col], \
                    hxl_tag=row[tag_col], \
                    dataset_description=dataset_description, \
                    add_response=True)

        prompt["Data description"] = dataset_description

        for field in ['HDX resource id', 'HDX dataset id', 'Data provider', \
                      'Date created', 'Locations', 'URL', 'Text header',\
                      'Data excerpt']:
            prompt[field] = row[field]

        prompts.append(prompt)

    return prompts

def save_prompts(prompts, filename):

    with open(filename, 'w') as f:
        for prompt in prompts:
            f.write(json.dumps(prompt) + "\n")

    print(f"Prompts written to {filename}")


def save_all_prompts():
    for dataset_type in ["train","test"]:
        if dataset_type == "train":
            data = X_train
        else:
            data = X_test

        prompt_file = f"{LOCAL_DATA_DIR}/hxl_chat_prompts_{dataset_type}.jsonl"
        prompts = generate_prompts(data, hxl_tag=True)

        print(f"\n\nSaving {len(prompts)} prompts to {prompt_file} ...")

        save_prompts(prompts, prompt_file)


save_all_prompts()




Unique resources: 482

Description: The dataset from the file "DRC - Baseline Assessment - M23 Crisis 13 - February 2024.xlsx" contains information on the total number of internally displaced persons (IDPs) and returnees in the Democratic Republic of Congo. The data includes the total number of IDP households, IDP individuals, male and female IDPs, and returnees. Specifically, there are 319,283 IDP households, 1,548,732 IDP individuals, with 646,805 males and 901,927 females, and 587,705 returnees.
Description: The dataset contains earthquake data for various administrative regions in Afghanistan, including country name, admin1 name, latitude, longitude, aggregation type, indicator name, and indicator value. The data includes maximum earthquake values recorded in different regions, with corresponding latitude and longitude coordinates. The dataset provides insights into the seismic activity in different administrative areas of Afghanistan.
Description: The dataset contains informatio

  df = pd.read_csv(resource_name)


Description: The dataset contains information on demographics and locations of forcibly displaced and stateless persons globally. It includes data on the year, country of origin and asylum, population type, location, urban/rural classification, accommodation type, and demographic breakdown by age and gender. The data provides details such as the number of females and males in different age groups, as well as the total population count for each entry. The dataset covers various countries and years, offering insights into the displacement and statelessness of individuals across different regions.
Description: The dataset contains information on projects in Ethiopia for December 2023, including details such as project status, donors, implementing agencies, locations at regional, zonal, and woreda levels, sector clusters, activities, and geographical coordinates. The data includes multiple entries for different projects, each specifying the status, donors, implementing agencies, and locati

  df = pd.read_csv(resource_name)


Description: The dataset stored in the file global_pcodes.csv contains information about locations, administrative levels, P-Codes, names, parent P-Codes, and valid from dates. The data includes details such as country codes, admin levels, specific P-Codes for different regions, names of locations, parent P-Codes, and effective dates. The dataset appears to focus on administrative divisions within a country, with each entry representing a different region or area within Afghanistan.
Description: The dataset titled "3W_All_Clusters_March_2022.xlsx" contains information on different clusters, organizations, organization types, regions, and districts. The data includes columns for Cluster, Organization, Org Type, Region, and District, with examples of entries such as CCCM, Agency for Technical Cooperation and Development (ACTED), INGO, and various regions and districts. The dataset appears to provide details on organizations operating within different clusters and regions, along with thei

  df = pd.read_csv(resource_name)


Description: The dataset contains information on various measures and indicators related to COVID-19 response for different countries and regions. It includes data such as government responses, containment measures, economic support, vaccination policies, confirmed cases and deaths, vaccination status, and stringency indices. The dataset is structured with columns representing different aspects of the response efforts, with rows corresponding to specific dates and locations.
Description: The dataset contains information on health institutions in Haiti, including details such as administrative divisions (adm1_fr, adm1_ht, adm2code, adm2_en, adm2_fr, adm3code, adm3_en, adm3_fr), institution names, categories, types, codes, latitude (LatDD), and longitude (LongDD). The institutions vary in ownership (public, private for-profit, private non-profit, and mixed) and include health centers, hospitals, and dispensaries. The dataset provides a comprehensive overview of healthcare facilities in H

  df = pd.read_csv(resource_name)


Description: The dataset contains transaction data from the file "transactions.csv" with columns including Month, Reporting org id, Reporting org name, Reporting org type, Sector, Recipient country, Humanitarian indicator, Strict indicator, Transaction type, Activity id, Net money, and Total money. The data includes information on transactions made by the AECID Spanish Agency for International Development Cooperation in various sectors and countries during January 2020. The transactions involve commitments with different values for net money and total money.
Description: The dataset from the file "GHO-mid-year-update-2023.xlsx" contains columns labeled "Unnamed: 0" and "Unnamed: 1" with various entries, including information such as page titles, export details, dates, and data sources. The dataset appears to have some missing values denoted by "NaN" entries. The data seems to pertain to Humanitarian Action 2023, with details on plans, dates, and sources.
Description: The dataset "Droug

  warn("""Cannot parse header or footer so it will be ignored""")


Description: The dataset contains information on populated places in Iraq, including administrative divisions such as country, governorate, district, and sub-district levels. Each entry includes details such as place names in English and Arabic, corresponding administrative codes, longitude, latitude, and estimated population. The dataset provides a comprehensive overview of various populated places in Iraq and their respective demographic information.
Description: The dataset contains information on events in South Sudan from January to December 2022, including details such as event dates, locations at various administrative levels, population demographics, displacement information, movement triggers, arrival details, needs assessments, and shelter conditions. The data includes assessments of affected populations, including IDPs and returnees, as well as information on household composition, age groups, gender breakdowns, and specific needs such as food, shelter, water, sanitation, he

  df = pd.read_csv(resource_name)


Description: The dataset contains information on COVID-19 vaccinations, including data such as location, ISO code, date, total vaccinations, people vaccinated, people fully vaccinated, total boosters, daily vaccinations, and various vaccination rates per hundred and per million. The data includes details on different countries and dates, with some entries showing specific vaccination numbers while others have missing values. The dataset provides a comprehensive overview of vaccination progress across different locations and time periods.
Description: The dataset contains information on various communes in Madagascar, specifically focusing on the impact of drought in the Gand Sud region in September 2022. The data includes details such as the evaluation date, commune type, region, district, household and individual statistics, reasons for displacement, destinations of displaced individuals, and returnee information. Each row represents a different commune, with data on the impact of dro

  df = pd.read_csv(resource_name)


Description: The dataset contains information about various schools, including their names, locations, populations of pupils in 2012 and 2015, ISCED levels, addresses, operators, coordinates, and other details. The data includes details such as school names, dates started, pupil populations, geographical information, and operator information. The dataset seems to focus on schools in the Tawi-tawi region of the Autonomous Region in Muslim Mindanao.
Description: The dataset contains information related to IDPs (Internally Displaced Persons) and returnees in Uganda, specifically focusing on different administrative levels such as Admin 0, Admin 1, Admin 2, Admin 3, and Admin 4. The data includes details like the snapshot date, survey date, administrative codes and names, total number of IDPs and returnees in households and individuals, area of origin of IDPs, and the type of displacement (e.g., natural disasters like floods). The dataset provides insights into the displacement situation i

  df = pd.read_csv(resource_name)


Description: The dataset titled "Excess mortality during COVID-19 pandemic" contains information on various countries, regions, periods, years, months, weeks, dates, deaths, expected deaths, excess deaths, and total excess deaths percentage. The data includes details such as the country, region, period, year, month, week, date, number of deaths, expected deaths, and excess deaths for each entry. The dataset appears to track excess mortality during the COVID-19 pandemic, with a focus on the percentage of total excess deaths.
Processed 56.00000000000001% resources ...
Description: The dataset contains daily updates on COVID-19 cases in Indonesia, with columns for Date, Cumulative_cases, Recovered_cases, Total_death, Patient_under_treatment, New_case_perDay, Recovered-cases_perDay, Death_cases_perDay, and Treatment_cases_perDay. The data starts from March 2, 2020, and includes information on the number of cumulative cases, recoveries, deaths, patients under treatment, and daily changes in

  warn(msg)


Description: The dataset provided is stored in an Excel file and contains information related to various fields such as reporting week, status of response, start and end dates, organizations involved, locations, sector/cluster activities, monitoring indicators, quantities, and demographic data. The data includes details on different activities conducted by the International Organization for Migration (IOM) in Tigray, including health-related services like mobile health and nutrition teams, mental health support, and treatment for severe acute malnutrition. The dataset also includes information on the number of consultations, individuals reached, and services provided.
Description: The dataset contains information on language data for Ecuador, with columns including the names of administrative regions, language codes, number of named languages, main language, main language share, population totals, gender breakdown, literacy rates, and metadata details. The data shows the distribution o

  warn(msg)


Description: The dataset contains information on various organizations, partners, clusters, sub-clusters, regions, provinces, cities/municipalities, barangays, evacuation sites, activities, and their statuses (ongoing, completed, planned). It includes details such as start and finish dates, remarks, region codes, province codes, municipal city codes, organization acronyms, and partner organization acronyms. The data showcases different activities undertaken by organizations like construction, distribution of kits, repair of facilities, and hygiene promotion sessions in response to Typhoon Goni (Rolly) and Vamco (Ulysses) in the Bicol Region.
Description: The dataset contains information on various sectors such as Nutrition, ES NFI, WaSH, Education, Child Protection, FSL, Health, GBV, Protection, RCF, and Total for different states or regions. The data includes details like targets, reached values, and specific sector information for each region. The dataset provides a comprehensive ove

  warn(msg)


Description: The dataset contains information on various programs and projects funded by different agencies, such as UNHCR and UNICEF, in Eritrea. The data includes details like program titles, outcomes, sectors, funding requirements, funding received in different quarters, and donor information. The programs cover a range of areas including health and nutrition, water and sanitation, education, environment, capacity development, food security, gender empowerment, and social protection. Each entry also includes information on pillars, government representatives, and categories. The dataset provides a comprehensive overview of the funding and activities related to refugee and other persons of concern in Eritrea.
Description: The dataset from the file "200304_3W on NCDDS_EQ.xlsx" contains information on the status of projects, with a total of 1653 projects. Of these, 1241 projects are completed, 281 are ongoing, and 131 are planned. The data is structured in two columns, with the first c

  warn(msg)


Description: The dataset contains information on various organizations, partners, clusters, sub-clusters, regions, provinces, cities/municipalities, barangays, evacuation sites, activities, and statuses related to food security, agriculture, and livelihood initiatives. Each entry includes details such as the start and finish dates, the number of families served, and the contents of food packs distributed. The data covers different regions in the Philippines, including Ilocos, Cordillera Administrative Region, and Cagayan Valley. The activities mentioned are primarily focused on distributing food packs containing rice, monggo, dried fish, biscuits, sardines, cooking oil, sugar, and salt, with additional fortified rice packs included in some cases.
Processed 79.0% resources ...
Description: The dataset contains information on various organizations, their types, partners, clusters, sub-clusters, regions, provinces, cities/municipalities, barangays, evacuation centers, activities, statuses

  warn(msg)


Description: The dataset contains information on sexual violence incidents reported in various countries in 2015, including the number of staff affected, incident type, survivor gender, and date. The data includes countries such as Afghanistan, Belgium, Colombia, the Democratic Republic of the Congo (DRC), and Ethiopia. The incident types range from unknown to aggressive sexual behavior, unwanted sexual comments, sexual assault, and attempted sexual assault, with female survivors being the predominant gender represented in the dataset.
Description: The dataset contains information about various camps set up in the aftermath of the Sulawesi Earthquake in Indonesia. It includes details such as camp names, locations, duration, ownership, facilities available, demographics of residents, health and hygiene conditions, food security, access to education and healthcare, livelihood impact, security measures, and access to basic necessities. The camps vary in terms of size, conditions, and serv

  warn(msg)


Description: The dataset from the file "160704_5W_HDX.xlsx" contains information about different organizations, their sectors, activities, locations (province, canton, and parish), and status of their projects. The data includes details such as organization name, sector, activity, province, canton, parish, and project status. The dataset seems to focus on organizations involved in activities related to water, sanitation, hygiene, and health in the Esmeraldas province of Ecuador. The status of the projects varies between "En Ejecución" (In Execution) and "Finalizado" (Finished).
Description: The dataset contains information on the number of internally displaced persons (IDPs) in various countries. The data includes countries such as Uganda, Ethiopia, Kenya, Tanzania, South Sudan, Rwanda, Burundi, Angola, and Zambia, with corresponding IDP numbers. Some countries have missing data for IDPs. Ethiopia has the highest number of IDPs with 2,800,000, followed by South Sudan with 1,900,000, an

  warn(msg)


Description: The dataset from the file "160516_5W_ForHDX.xlsx" contains information about various organizations, sectors, provinces, and beneficiaries in different regions. The data includes columns such as ID, Gob, Cod.2, Organización, Sector, Provincia, Cantón, and Total beneficiaries. It provides details about organizations like Agencia Adventista de Desarrollo y Recursos Asistenciales and Aldeas Infantiles SOS operating in provinces like Manabí and Esmeraldas. The dataset also includes information on the number of beneficiaries in specific areas.
Error generating description for /content/drive/MyDrive/Colab/hxl-metadata-prediction/data/Redhum-Ec 5w 2.0 Ronda 12 Versión 2061012-HDX.xlsx ... Error code: 400 - {'error': {'message': "Invalid 'messages[1].content': string too long. Expected a string with maximum length 1048576, but got a string with length 1983173 instead.", 'type': 'invalid_request_error', 'param': 'messages[1].content', 'code': 'string_above_max_length'}}
Description:

  warn(msg)


Description: The dataset contains information on various activities implemented in different districts and VDC wards, including details such as partner organizations, funding sources, activity types, sub-types, names, details, units, funding and activity statuses, planned and reached totals, start and end dates, and comments. The data includes activities like technical assistance, training, and construction demonstrations, with details on participants and trainees involved. The dataset covers multiple activities across different locations, with completion statuses ranging from ongoing to completed.
Description: The dataset contains information on various regions in Afghanistan, including province codes, district codes, operational and organizational presence in different sectors such as ESNFI, FSAC, health, nutrition, protection, and wash. Each region has data on the operational presence and capacity of different organizations in these sectors. The dataset provides details on the numbe

  warn(msg)


Description: The dataset contains information on regions, districts, communes, and fokontany in Madagascar, along with data on the number of affected individuals in terms of deaths, injuries, damaged houses, flooded houses, roofless houses, displaced persons, and displaced households. The data includes details on different demographic groups such as children under 5 years, pregnant women, persons with disabilities, and individuals over 60 years old. The dataset provides a breakdown of the impact of various events on different locations within the regions, including the number of affected individuals and households.
Description: The dataset contains information on conflict-induced displacements in Afghanistan in 2016, with data compiled by OCHA sub offices based on inter-agency assessments. The data includes details such as the date of displacement, province code and name of origin and displacement, as well as district code and name of origin. The dataset provides a snapshot of newly di

  warn(msg)


Description: The dataset contains information on various organizations, partners, clusters, sub-clusters, activity statuses, activity types, regions, provinces, cities/municipalities, barangays, evacuation sites, start and finish dates of activities in the Philippines as of December 2016. The data includes details such as organization names, partner organizations, types of activities (humanitarian or development), and the status of activities (ongoing, completed, planned). The dataset covers a range of activities including protection, mine action, DRRM projects, food security, agriculture, livelihood, and education, among others, implemented in different regions and provinces within the country.
Error reading /content/drive/MyDrive/Colab/hxl-metadata-prediction/data/DC_OP4_DANA.xlsx ... Excel file format cannot be determined, you must specify an engine manually.
Description: The dataset contains information on nutrition assistance in various countries in the Sahel region for the year 2

  warn(msg)


Description: The dataset contains information on organizations involved in activities related to water, sanitation, and hygiene (WASH) in the Esmeraldas province of Ecuador. The data includes details such as organization names, sectors, activities, quantities, units, locations (province, canton, parish), and status of the projects (e.g., in execution or completed). Organizations like OPS/OMS, Cruz Roja Ecuatoriana, and World Vision are mentioned, along with their specific WASH-related projects such as water treatment, distribution of hygiene kits, and water chlorination. The dataset provides a snapshot of ongoing and completed initiatives in the region.


  warn(msg)


Description: The dataset contains information on various organizations, sectors, types of activities, codes, quantities, units, locations (including provinces, cantons, and parishes), status of activities, and total number of beneficiaries. The data includes details such as organization names, sectors like Water, Sanitation, and Hygiene, specific activities undertaken by each organization, quantities of items distributed or actions taken, and the status of each activity (e.g., in execution or finalized). The dataset covers activities related to water purification, distribution of hygiene kits, construction of latrines, and other water, sanitation, and hygiene initiatives in provinces like Manabí and Esmeraldas in Ecuador.
Unknown file type for /content/drive/MyDrive/Colab/hxl-metadata-prediction/data/Ecuador Earthquake - April 2016 - Severity index.google sheet
Description: The dataset contains information on various projects in Burundi, including details such as reference numbers, dis

  warn(msg)


Description: The dataset titled "141121 LR Health Care Facilities.xlsx" contains information related to the deployment of the DHN/Standby Task Force for data collection of health facilities in Guinea, Liberia, and Sierra Leone. The data was collected, collated, and cleaned by Standby Task Force volunteers during September and October 2014. The dataset includes links to updated maps and is free for use for nonprofit humanitarian projects. Users are advised to check back regularly for new updates as the links may change. The dataset also includes a link to the Standby Task Force Maps portal for further information.
Processed 100.0% resources ...
Description: The dataset contains information on health centers in Sierra Leone, including details such as center ID, status, date opened, type of center, activity, location coordinates, address, and source of information. The data includes various health centers across different districts and chiefdoms in Sierra Leone, with details on their capa

  warn(msg)


Description: The dataset contains information on the density of HIV/AIDS incidence per 1000 cases by sex and municipality in Colombia. It includes data such as the number of cases, affected population, and density of incidence for different municipalities in Antioquia. The dataset also provides details on the source of the data and specifies that only live cases for the year 2017 were considered. The data is structured in columns with various unnamed headers, and it seems to be sourced from the SISPRO 2017 database.
Description: The dataset SECOP_HDX.xlsx contains information on public procurement contracts related to the Venezuelan refugee and migrant population. It includes variables describing entities involved in public contracts, contracts targeting the refugee and migrant population, and a description of the dataset. The geographical breakdown includes national, departmental, and in some cases municipal levels. The dataset covers the period from 2005 to 2021 and is updated as nee