Set up the notebook

In [40]:
import os
import csv
import json

notebook_path = os.path.abspath("generate_metadata.ipynb")

Set the name of the CSV file you're working with (it should be in the same folder as this Python notebook):

In [59]:
infilename = 'global_land_temp_country_1995_2016.csv'  # replace with "{your file here}.csv"
infilepath = os.path.join(os.path.dirname(notebook_path), infilename)

Fill out the fields for the data card, except "fields" and "columns", which we'll get to later. A template is [here](https://hackmd.io/62se7jj-Qoycs__e6NjS2w).

In [None]:
data = {
    #ignore this
    'fields': [],
    #here we include the basic card information
    'card': {
        # a short description of the dataset
        'description': '', 
        # a link to the original source
        'source': '', 
        #date last updated (if possible)
        'last_updated': '', 
        #ignore this
        'columns':[], 
        'context': 
        {
            # who it was created by
            'created_by': '',
            # has it been cleaned/prepared for use
            'preparation': '', 
            # does it contain potential identifying/confidential information
            'confidentiality': '', 
            # does it contain information that can identify a subgroup of people (age, race, gender)
            'subgroup_identifiers': '', 
            # what are potential uses (e.g. what are some successful combinations of features)?
            'potential_uses': '', 
            # what should it not be used for?
            'potential_misuses': ''
        }
    }
}
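
Before moving on, it can help to check which card fields are still blank. The helper below is a minimal sketch, not part of the original notebook: `find_empty_fields` and `example_card` are hypothetical names, and the example values are made up for illustration.

```python
def find_empty_fields(card, prefix=''):
    """Return dotted names of string fields in the card that are still empty."""
    empty = []
    for key, value in card.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse into nested sections such as 'context'
            empty.extend(find_empty_fields(value, name + '.'))
        elif isinstance(value, str) and not value.strip():
            empty.append(name)
    return empty

# Hypothetical, partly filled card used only to demonstrate the helper
example_card = {
    'description': 'Example dataset',
    'source': '',
    'columns': [],  # lists are skipped; they are filled in later
    'context': {'created_by': 'Example author', 'preparation': ''},
}
print(find_empty_fields(example_card))
```

In this notebook you would call `find_empty_fields(data['card'])` after filling out the cell above.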


Now that we've done our basic setup, let's get to the columns. Run this code, which should display the columns available in the CSV file.

In [42]:
with open(infilepath, 'r', newline='') as infile:  # newline='' is what the csv module recommends
    reader = csv.reader(infile, delimiter=",")
    csv_list = list(map(tuple, reader))
    columns = csv_list[0]
    print(columns)

('Year', 'Day', 'Month', 'AverageTemperature', 'AverageTemperatureUncertainty', 'Country')
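
If you're unsure how to classify a column, a rough heuristic can give a starting point. The sketch below is an assumption rather than part of the original workflow: `suggest_types` is a hypothetical helper that marks a column as continuous when every non-empty value parses as a number, and the sample rows are invented. Treat its output as a suggestion only, since a numeric column (a month number, say) may still be better treated as categorical.

```python
import csv
import io

def suggest_types(rows):
    """Guess 'continuous' or 'categorical' for each column.

    rows[0] is the header; every other row is data.
    """
    header, body = rows[0], rows[1:]
    suggestions = {}
    for idx, name in enumerate(header):
        # Ignore missing values when deciding the type
        values = [row[idx] for row in body if idx < len(row) and row[idx] != '']
        try:
            for value in values:
                float(value)
            numeric = bool(values)  # a column with no values is not numeric
        except ValueError:
            numeric = False
        suggestions[name] = 'continuous' if numeric else 'categorical'
    return suggestions

# Invented sample data, shaped like the rows read by csv.reader
sample = io.StringIO(
    "Year,AverageTemperature,Country\n"
    "1995,4.2,Norway\n"
    "1996,,Norway\n"
)
rows = list(map(tuple, csv.reader(sample)))
suggested = suggest_types(rows)
print(suggested)
```

In this notebook the equivalent call would be `suggest_types(csv_list)`.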


Assign each column to a list of either continuous or categorical data, e.g.

```
continuous = ['temperature', 'score']
categorical = ['state', 'color']
```

In [68]:
continuous = ['Year', 'Day', 'AverageTemperature', 'AverageTemperatureUncertainty']
categorical = ['Country', 'Month']
# reset fields in case you're making changes
data['fields'] = []
for i in columns:
    if i in continuous:
        data['fields'].append({'type': 'continuous', 'id': i})
    elif i in categorical:
        data['fields'].append({'type': 'categorical', 'id': i})
    else:
        raise Exception("You forgot to set a type for %s" % i)

print('Set field information:', data['fields'])
# shallow copy: the column dicts are shared with data['fields']
data['card']['columns'] = data['fields'].copy()
print()
print('columns: ', columns)

    

Set field information: [{'type': 'continuous', 'id': 'Year'}, {'type': 'continuous', 'id': 'Day'}, {'type': 'categorical', 'id': 'Month'}, {'type': 'continuous', 'id': 'AverageTemperature'}, {'type': 'continuous', 'id': 'AverageTemperatureUncertainty'}, {'type': 'categorical', 'id': 'Country'}]

columns:  ('Year', 'Day', 'Month', 'AverageTemperature', 'AverageTemperatureUncertainty', 'Country')


We've set our columns for the interface. Now let's add descriptions as a list. For example, if our columns are `['Year', 'Temperature']`, our list might be `['The year the measurement was taken', 'The temperature in Celsius']`. List the descriptions in the same order the columns are printed above.

In [62]:
desc = ['The year the measurement was taken', 'The day the measurement was taken', 'The month the measurement was taken', 'The average temperature from that day in Celsius', 'The average temperature uncertainty in Celsius', 'The country where the measurement was taken']
# every column needs exactly one description, in column order
if len(desc) != len(data['card']['columns']):
    raise Exception("You need exactly one description for each column!")
for column, description in zip(data['card']['columns'], desc):
    column['description'] = description
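
One detail worth knowing: `list.copy()` is a shallow copy, so the dictionaries in `data['card']['columns']` are the same objects as those in `data['fields']`, and the descriptions added here show up in both lists. The sketch below illustrates this with a made-up one-column example. If the descriptions should only appear in the card columns, `copy.deepcopy` is one option; whether that matters depends on what the consuming interface expects, which this notebook does not state.

```python
import copy

# Made-up single-column example
fields = [{'type': 'continuous', 'id': 'Year'}]

shallow = fields.copy()        # new list, same dict objects
deep = copy.deepcopy(fields)   # new list, new dict objects

shallow[0]['description'] = 'The year the measurement was taken'

print('description' in fields[0])  # the original dict was changed too
print('description' in deep[0])    # the deep copy was not affected
```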


Take a look and make sure everything is right, and if you're confident, we can write our data to a JSON file.

In [60]:
print(infilename)
# splitext removes only the final extension, so names containing extra dots still work
with open(os.path.splitext(infilename)[0] + '.json', 'w') as outfile:
    json.dump(data, outfile)

global_land_temp_country_1995_2016.csv
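
To confirm the file was written correctly, you can read it back and compare it with the data in memory. The sketch below shows the round trip on a small invented dictionary in a temporary directory, so it runs on its own; `example` and the file name `example.json` are placeholders.

```python
import json
import os
import tempfile

# Invented stand-in for the `data` dictionary built in this notebook
example = {
    'fields': [{'type': 'continuous', 'id': 'Year'}],
    'card': {'description': 'Example dataset', 'columns': []},
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.json')
    with open(path, 'w') as outfile:
        json.dump(example, outfile, indent=2)  # indent makes the file easier to read
    with open(path) as infile:
        loaded = json.load(infile)

print(loaded == example)
```

In this notebook, the same check would load the `.json` file written above and compare the result with `data`.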
