# Get full metadata from CSV file - BEDMAP

This notebook shows how to retrieve rich, complete metadata starting from a Bedmap CSV file. It also shows how to convert the CSV file to NCCSV (NetCDF-compatible CSV), as an example of a FAIR implementation of the CSV format used for Bedmap.

With access to the CSV file alone, rich metadata can be obtained programmatically, without embedding a complex metadata structure in the file header.

## The data

The BEDMAP CSV files are available for download from the UK Polar Data Centre:

    BEDMAP1 CSV: https://doi.org/10.5285/f64815ec-4077-4432-9f55-0ce230f46029
    BEDMAP2 CSV: https://doi.org/10.5285/2fd95199-365e-4da1-ae26-3b6d48b3e6ac
    BEDMAP3 CSV: https://doi.org/10.5285/91523ff9-d621-46b3-87f7-ffb6efcd1847
    
## Import the modules

For this conversion, we need the pandas, json and urllib modules.


In [7]:
import pandas as pd
import json
import urllib.request

## Opening and reading the CSV file

For this exercise, we only need the short metadata block at the top of the CSV file, which is read here as the first 17 lines.

In [30]:
CSV_file = 'D:/BEDMAP/AWI_2015_GEA-DML_AIR_BM3.csv'
csv_metadata = pd.read_csv(CSV_file, nrows=17, sep = ': ', engine='python', header= None)
csv_metadata

Unnamed: 0,0,1
0,#project,Dronning Maud Land (GEA).
1,#time_coverage_start,2015
2,#time_coverage_end,2016
3,#creator_name,Alfred Wegener Institute; Bundesanstalt für Ge...
4,#institution,Alfred Wegener Institute; Bundesanstalt für Ge...
5,#funding,Bundesanstalt für Geowissenschaften; Alfred We...
6,#source,https://doi.pangaea.de/10.1594/PANGAEA.915475
7,#references,https://doi.org/10.1016/j.gr.2018.05.011
8,#platform,airborne radar.
9,#instrument,AWI EMR.
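
Reading a fixed number of rows works for this file, but it relies on knowing the header length in advance. As a minimal sketch, assuming every header line starts with # and the data table starts at the first line that does not, the header can also be collected without a hardcoded row count. The helper name read_header_metadata and the column names in the sample are hypothetical and only serve the illustration.

```python
import io

def read_header_metadata(lines):
    """Collect '#key: value' header lines into a dict, stopping at the first data line."""
    metadata = {}
    for line in lines:
        if not line.startswith('#'):
            break  # first non-comment line: the data table starts here
        # Split on the first colon only, so URLs in the value stay intact
        key, sep, value = line[1:].partition(':')
        if sep:  # ignore comment lines that carry no 'key: value' pair
            metadata[key.strip()] = value.strip()
    return metadata

# Short in-memory sample: three header lines taken from the output above,
# followed by a made-up column row (not the real Bedmap column names)
sample = io.StringIO(
    "#project: Dronning Maud Land (GEA).\n"
    "#source: https://doi.pangaea.de/10.1594/PANGAEA.915475\n"
    "#time_coverage_start: 2015\n"
    "column_a,column_b,column_c\n"
)
header = read_header_metadata(sample)
```

With a real file, the same function can be called on an open file handle, for example inside a with open(CSV_file, encoding='utf-8') as f: block.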


First, we strip the leading # character from the keys and set the first column (column 0 in our case) as the index.

In [31]:
csv_metadata[0] = csv_metadata[0].str.strip('#')
csv_metadata = csv_metadata.set_index(0)

The metadata from the CSV file are then converted to a dictionary for easier handling.

In [32]:
dict_metadata = csv_metadata.to_dict()[1]
dict_metadata

{'project': 'Dronning Maud Land (GEA).',
 'time_coverage_start': '2015',
 'time_coverage_end': '2016',
 'creator_name': 'Alfred Wegener Institute; Bundesanstalt für Geowissenschaften.',
 'institution': 'Alfred Wegener Institute; Bundesanstalt für Geowissenschaften.',
 'funding': 'Bundesanstalt für Geowissenschaften; Alfred Wegener Institute.',
 'source': 'https://doi.pangaea.de/10.1594/PANGAEA.915475',
 'references': 'https://doi.org/10.1016/j.gr.2018.05.011',
 'platform': 'airborne radar.',
 'instrument': 'AWI EMR.',
 'history': 'Incoherent processing',
 'electromagnetic wave speed in ice': '168.0 (meters/microseconds)',
 'firn correction': '0 (m)',
 'centre frequency': '150 (MHz)',
 'comment': 'Part of Bedmap3',
 'metadata_link': 'https://doi.org/10.5285/91523ff9-d621-46b3-87f7-ffb6efcd1847',
 'license': 'https://creativecommons.org/licenses/by/4.0/'}

Additional metadata can be retrieved from the DOI referenced in the metadata_link field.

In [35]:
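# Extract the bare DOI from the link. str.strip() removes a set of characters,
# not a prefix, so it can also eat trailing DOI characters; replace() is safer.
doi = dict_metadata['metadata_link'].replace('https://doi.org/', '', 1)

# Fetch the DataCite JSON record for this DOI
with urllib.request.urlopen("https://api.datacite.org/dois/application/vnd.datacite.datacite+json/" + doi) as url:
    DOI_metadata = json.load(url)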
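
The DOI lookup can also be wrapped in two small functions so that it is reusable across survey files. This is a sketch rather than part of the original notebook: the function names doi_from_link and fetch_doi_metadata and the timeout value are assumptions, while the DataCite endpoint is the one used above. The prefix is removed by slicing because str.strip() works on a set of characters, not on a prefix.

```python
import json
import urllib.request

DOI_RESOLVER = 'https://doi.org/'
DATACITE_JSON = 'https://api.datacite.org/dois/application/vnd.datacite.datacite+json/'

def doi_from_link(link):
    """Return the bare DOI from a 'https://doi.org/...' link (unchanged if no prefix)."""
    if link.startswith(DOI_RESOLVER):
        return link[len(DOI_RESOLVER):]
    return link

def fetch_doi_metadata(link, timeout=30):
    """Download the DataCite JSON record for a DOI link (needs network access)."""
    url = DATACITE_JSON + doi_from_link(link)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)

# Only the offline part is run here; fetch_doi_metadata() contacts api.datacite.org
doi = doi_from_link('https://doi.org/10.5285/91523ff9-d621-46b3-87f7-ffb6efcd1847')
```

In the notebook, the record would then be obtained with fetch_doi_metadata(dict_metadata['metadata_link']).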

In [36]:
DOI_metadata

{'id': 'https://doi.org/10.5285/91523ff9-d621-46b3-87f7-ffb6efcd1847',
 'doi': '10.5285/91523FF9-D621-46B3-87F7-FFB6EFCD1847',
 'url': 'https://data.bas.ac.uk/full-record.php?id=GB/NERC/BAS/PDC/01614',
 'types': {'ris': 'DATA',
  'bibtex': 'misc',
  'citeproc': 'dataset',
  'schemaOrg': 'Dataset',
  'resourceType': 'Dataset',
  'resourceTypeGeneral': 'Dataset'},
 'creators': [{'name': 'Fremand, Alice',
   'givenName': 'Alice',
   'familyName': 'Fremand',
   'affiliation': [{'name': 'British Antarctic Survey',
     'schemeUri': 'https://ror.org/',
     'affiliationIdentifier': 'https://ror.org/01rhff309',
     'affiliationIdentifierScheme': 'ROR'}],
   'nameIdentifiers': [{'schemeUri': 'https://orcid.org',
     'nameIdentifier': 'https://orcid.org/0000-0001-8272-0981',
     'nameIdentifierScheme': 'ORCID'}]},
  {'name': 'Fretwell, Peter',
   'givenName': 'Peter',
   'familyName': 'Fretwell',
   'affiliation': [{'name': 'British Antarctic Survey',
     'schemeUri': 'https://ror.org/',
  

It is now possible to add the relevant DOI metadata to the simple, survey-specific metadata. Field names differ from one standard to another; the example below maps the metadata to NetCDF-compliant attribute names.

## Converting the simple metadata to NetCDF compliant metadata

In [53]:
# The DataCite record was loaded into DOI_metadata above
dict_metadata['title'] = DOI_metadata['titles'][0]['title']
dict_metadata['summary'] = DOI_metadata['descriptions'][0]['description']
dict_metadata['publisher'] = DOI_metadata['publisher']
dict_metadata['geospatial_lat_min'] = DOI_metadata['geoLocations'][0]['geoLocationBox']['southBoundLatitude']
dict_metadata['geospatial_lat_max'] = DOI_metadata['geoLocations'][0]['geoLocationBox']['northBoundLatitude']
dict_metadata['geospatial_lon_min'] = DOI_metadata['geoLocations'][0]['geoLocationBox']['westBoundLongitude']
dict_metadata['geospatial_lon_max'] = DOI_metadata['geoLocations'][0]['geoLocationBox']['eastBoundLongitude']
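
Direct indexing raises a KeyError or IndexError as soon as a DOI record lacks a title, a description or a bounding box. A more defensive variant of the same mapping is sketched below. The function name datacite_to_attrs and the sample record are hypothetical, and handling a publisher given as an object with a name key is an assumption about how some DataCite records are serialised.

```python
# DataCite bounding-box keys mapped to the attribute names used in this notebook
BOX_TO_ATTR = {
    'southBoundLatitude': 'geospatial_lat_min',
    'northBoundLatitude': 'geospatial_lat_max',
    'westBoundLongitude': 'geospatial_lon_min',
    'eastBoundLongitude': 'geospatial_lon_max',
}

def datacite_to_attrs(record):
    """Map selected DataCite JSON fields to attribute names, skipping absent fields."""
    attrs = {}
    titles = record.get('titles') or []
    if titles and titles[0].get('title'):
        attrs['title'] = titles[0]['title']
    descriptions = record.get('descriptions') or []
    if descriptions and descriptions[0].get('description'):
        attrs['summary'] = descriptions[0]['description']
    publisher = record.get('publisher')
    if isinstance(publisher, dict):  # assumption: publisher may be {'name': ...}
        publisher = publisher.get('name')
    if publisher:
        attrs['publisher'] = publisher
    # Use the first geoLocation entry that actually carries a bounding box
    for location in record.get('geoLocations') or []:
        box = location.get('geoLocationBox')
        if box:
            for source_key, attr_key in BOX_TO_ATTR.items():
                if source_key in box:
                    attrs[attr_key] = box[source_key]
            break
    return attrs

# Made-up record with no description and a place-only first geoLocation
sample_record = {
    'titles': [{'title': 'Example survey'}],
    'publisher': {'name': 'Example Data Centre'},
    'geoLocations': [
        {'geoLocationPlace': 'Antarctica'},
        {'geoLocationBox': {'southBoundLatitude': -90, 'northBoundLatitude': -60,
                            'westBoundLongitude': -180, 'eastBoundLongitude': 180}},
    ],
}
attrs = datacite_to_attrs(sample_record)
```

The result can be merged into the survey metadata with dict_metadata.update(attrs), which leaves existing keys untouched when a field is missing from the DOI record.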


In [54]:
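# Keep every subject that does not contain a double quote and join them into one
# comma-separated string (assigning inside the loop kept only the last subject)
keywords = [s['subject'] for s in DOI_metadata['subjects'] if '"' not in s['subject']]
dict_metadata['keywords'] = ', '.join(keywords)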
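
Once the dictionary holds the enriched attributes, it can be serialised as the global-attribute block of an NCCSV file, the format mentioned at the start of this notebook. The sketch below is an assumption-laden illustration, not a complete converter: it writes one *GLOBAL* line per attribute with a Conventions line first, relies on the csv module for quoting, and replaces spaces in attribute names with underscores. The function name nccsv_global_lines, the Conventions string and the underscore rule are assumptions; the NCCSV specification has further rules on data types, variable attributes and the data section that are not covered here.

```python
import csv
import io

def nccsv_global_lines(attributes, conventions='COARDS, CF-1.6, ACDD-1.3, NCCSV-1.0'):
    """Serialise a dict as NCCSV-style '*GLOBAL*,name,value' lines (sketch only)."""
    buffer = io.StringIO()
    # csv.writer quotes values that contain commas or double quotes
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['*GLOBAL*', 'Conventions', conventions])
    for name, value in attributes.items():
        # Assumption: attribute names should not contain spaces
        writer.writerow(['*GLOBAL*', name.replace(' ', '_'), value])
    return buffer.getvalue()

# Two attributes taken from the dictionary shown earlier
example = {
    'project': 'Dronning Maud Land (GEA).',
    'centre frequency': '150 (MHz)',
}
nccsv_text = nccsv_global_lines(example)
```

Calling nccsv_global_lines(dict_metadata) produces the same kind of block for the full set of survey and DOI attributes.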

In [55]:
dict_metadata

{'project': 'Dronning Maud Land (GEA).',
 'time_coverage_start': '2015',
 'time_coverage_end': '2016',
 'creator_name': 'Alfred Wegener Institute; Bundesanstalt für Geowissenschaften.',
 'institution': 'Alfred Wegener Institute; Bundesanstalt für Geowissenschaften.',
 'funding': 'Bundesanstalt für Geowissenschaften; Alfred Wegener Institute.',
 'source': 'https://doi.pangaea.de/10.1594/PANGAEA.915475',
 'references': 'https://doi.org/10.1016/j.gr.2018.05.011',
 'platform': 'airborne radar.',
 'instrument': 'AWI EMR.',
 'history': 'Incoherent processing',
 'electromagnetic wave speed in ice': '168.0 (meters/microseconds)',
 'firn correction': '0 (m)',
 'centre frequency': '150 (MHz)',
 'comment': 'Part of Bedmap3',
 'metadata_link': 'https://doi.org/10.5285/91523ff9-d621-46b3-87f7-ffb6efcd1847',
 'license': 'https://creativecommons.org/licenses/by/4.0/',
 'abstract': "We present here the Bedmap3 ice thickness, bed and surface elevation standardised CSV data points that are used to creat