# create_metadata

This notebook creates the CSVW metadata file for the interview CSV file.

## Setup

Imports the required packages.

Information on the `csvw_functions` package is available here: https://github.com/stevenkfirth/csvw_functions

In [5]:
import csvw_functions
import json

## Get embedded metadata

Reads the CSV file and extracts the information from the column headings to form an initial CSVW metadata document.

In [6]:
metadata_table_dict=\
    csvw_functions.get_embedded_metadata(
        'interview_data.csv',
        relative_path=True  # sets the 'url' table property to a path relative to the current working directory.
)
metadata_table_dict

{'@context': 'http://www.w3.org/ns/csvw',
 'tableSchema': {'columns': [{'titles': {'und': ['respondent_id']},
    'name': 'respondent_id'},
   {'titles': {'und': ['question_id']}, 'name': 'question_id'},
   {'titles': {'und': ['question_text']}, 'name': 'question_text'},
   {'titles': {'und': ['response_text']}, 'name': 'response_text'}]},
 'url': 'interview_data.csv'}
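
As a rough illustration of where this initial document comes from, the sketch below builds a similar structure from a CSV header row using only the standard library. It is not how `csvw_functions` is implemented. The function name `sketch_embedded_metadata` and the inline sample text are assumptions made for this example, and the sketch copies each title straight into `name`, whereas a conformant CSVW processor may also encode characters that are not allowed in names.

```python
import csv
import io


def sketch_embedded_metadata(csv_text, url):
    """Build a minimal CSVW-style table description from the header row.

    Hypothetical helper for illustration only; not part of csvw_functions.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    # 'und' is the language tag for titles whose language is undetermined.
    columns = [{'titles': {'und': [title]}, 'name': title} for title in header]
    return {
        '@context': 'http://www.w3.org/ns/csvw',
        'tableSchema': {'columns': columns},
        'url': url,
    }


# Placeholder CSV text: same headings as the notebook, made-up data row.
sample = "respondent_id,question_id,question_text,response_text\n1,1,Q,A\n"
sketch = sketch_embedded_metadata(sample, 'interview_data.csv')
column_names = [c['name'] for c in sketch['tableSchema']['columns']]
print(column_names)
```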

## Add new information to the metadata document

This section adds additional information to create a complete metadata document.

### Set properties on the Table object

Adds descriptive properties to the table: a title, a description, a location and a creator.

In [7]:
metadata_table_dict.update(
    {
        "dc:title": "Interview University staff",
        "dc:description": 
            "Fictional interview responses for a survey carried out on University staff in June 2022",
        "dc:location": "Loughborough University, LE11 3TU, UK",
        "dc:creator": "ABCE Open Research Team"
    }
)
metadata_table_dict

{'@context': 'http://www.w3.org/ns/csvw',
 'tableSchema': {'columns': [{'titles': {'und': ['respondent_id']},
    'name': 'respondent_id'},
   {'titles': {'und': ['question_id']}, 'name': 'question_id'},
   {'titles': {'und': ['question_text']}, 'name': 'question_text'},
   {'titles': {'und': ['response_text']}, 'name': 'response_text'}]},
 'url': 'interview_data.csv',
 'dc:title': 'Interview University staff',
 'dc:description': 'Fictional interview responses for a survey carried out on University staff in June 2022',
 'dc:location': 'Loughborough University, LE11 3TU, UK',
 'dc:creator': 'ABCE Open Research Team'}

### Update column names

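Normalises each column name: takes the first title, drops anything after a comma or an opening parenthesis, converts it to lowercase and replaces spaces with underscores. The headings in this CSV file already follow that format, so the names are unchanged here.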

In [8]:
for col_dict in metadata_table_dict['tableSchema']['columns']:
    col_dict['name']=col_dict['titles']['und'][0].split(',')[0].split('(')[0].lower().replace(' ','_')
metadata_table_dict

{'@context': 'http://www.w3.org/ns/csvw',
 'tableSchema': {'columns': [{'titles': {'und': ['respondent_id']},
    'name': 'respondent_id'},
   {'titles': {'und': ['question_id']}, 'name': 'question_id'},
   {'titles': {'und': ['question_text']}, 'name': 'question_text'},
   {'titles': {'und': ['response_text']}, 'name': 'response_text'}]},
 'url': 'interview_data.csv',
 'dc:title': 'Interview University staff',
 'dc:description': 'Fictional interview responses for a survey carried out on University staff in June 2022',
 'dc:location': 'Loughborough University, LE11 3TU, UK',
 'dc:creator': 'ABCE Open Research Team'}
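
Because the interview headings are already clean, the effect of the rule is easier to see on a messier heading. The sketch below applies the same chain of string operations to a made-up title; the heading `'Floor Area (m2), total'` and both function names are hypothetical. It also shows that a space left before the parenthesis turns into a trailing underscore, which a `strip()` call would avoid if that ever matters for another dataset.

```python
def notebook_rule(title):
    """Same chain of string operations as the notebook cell."""
    return title.split(',')[0].split('(')[0].lower().replace(' ', '_')


def stripped_rule(title):
    """Variation (an assumption, not in the notebook): trim whitespace first."""
    return title.split(',')[0].split('(')[0].strip().lower().replace(' ', '_')


print(notebook_rule('respondent_id'))           # respondent_id
print(notebook_rule('Floor Area (m2), total'))  # floor_area_
print(stripped_rule('Floor Area (m2), total'))  # floor_area
```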

### Add column labels, descriptions and datatypes

Adds a label, a description and a datatype to each column.

In [9]:
data={
    'respondent_id':{
        'rdfs:label': 'respondent_id',
        'dc:description': 'A unique integer identifier for the respondent.',
        'datatype':'integer'
    },
    'question_id':{
        'rdfs:label': 'question_id',
        'dc:description': 'A unique integer identifier for the question.',
        'datatype':'integer'
    },
    'question_text':{
        'rdfs:label': 'question_text',
        'dc:description': 'The text of the question.',
        "datatype": "string"
    },
    'response_text':{
        'rdfs:label': 'response_text',
        'dc:description': 'The text of the response.',
        "datatype": "string"
    }
}
for col_dict in metadata_table_dict['tableSchema']['columns']:
    for k,v in data[col_dict['name']].items():
        col_dict[k]=v
metadata_table_dict

{'@context': 'http://www.w3.org/ns/csvw',
 'tableSchema': {'columns': [{'titles': {'und': ['respondent_id']},
    'name': 'respondent_id',
    'rdfs:label': 'respondent_id',
    'dc:description': 'A unique integer identifier for the respondent.',
    'datatype': 'integer'},
   {'titles': {'und': ['question_id']},
    'name': 'question_id',
    'rdfs:label': 'question_id',
    'dc:description': 'A unique integer identifier for the question.',
    'datatype': 'integer'},
   {'titles': {'und': ['question_text']},
    'name': 'question_text',
    'rdfs:label': 'question_text',
    'dc:description': 'The text of the question.',
    'datatype': 'string'},
   {'titles': {'und': ['response_text']},
    'name': 'response_text',
    'rdfs:label': 'response_text',
    'dc:description': 'The text of the response.',
    'datatype': 'string'}]},
 'url': 'interview_data.csv',
 'dc:title': 'Interview University staff',
 'dc:description': 'Fictional interview responses for a survey carried out on Unive
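
The loop in the cell above raises a `KeyError` if a column in the metadata has no entry in `data`. Where that is a risk, for example after the CSV gains a new column, a more defensive merge along the following lines may help. This is only a sketch: the `columns` and `extra` values are small stand-ins for the real dictionaries, and the `notes` column is hypothetical.

```python
# Stand-ins for metadata_table_dict['tableSchema']['columns'] and `data`.
columns = [{'name': 'respondent_id'}, {'name': 'notes'}]
extra = {'respondent_id': {'datatype': 'integer'}}

missing = []  # column names with no extra information available
for col in columns:
    props = extra.get(col['name'])
    if props is None:
        missing.append(col['name'])
        continue
    col.update(props)  # same effect as assigning each key in turn

print(missing)  # ['notes']
```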

## Save the newly created metadata table object

In [10]:
with open('interview_data.csv-metadata.json','w') as f:
    json.dump(metadata_table_dict,f,indent=4)
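
A quick way to confirm that the file was written as intended is to read it back and compare it with the in-memory dictionary. The sketch below does this with a small stand-in dictionary and a temporary directory so that it runs anywhere; in the notebook the same comparison would be made against `metadata_table_dict` and the real file.

```python
import json
import os
import tempfile

# Small stand-in for metadata_table_dict.
metadata = {
    '@context': 'http://www.w3.org/ns/csvw',
    'url': 'interview_data.csv',
    'tableSchema': {'columns': [{'name': 'respondent_id', 'datatype': 'integer'}]},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'interview_data.csv-metadata.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(metadata, f, indent=4)
    with open(path, encoding='utf-8') as f:
        reloaded = json.load(f)

# True when nothing was lost or altered by the write and read.
round_trip_ok = reloaded == metadata
print(round_trip_ok)
```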

## Testing

To test the newly created metadata file, we can use the `csvw_functions` package to create an annotated table group object and check it for errors. We can also convert the data to JSON-LD to confirm that the conversion works.


In [11]:
annotated_table_group_dict=csvw_functions.create_annotated_table_group(
    'interview_data.csv-metadata.json'
)

*(No runtime errors)*

In [12]:
csvw_functions.get_errors(annotated_table_group_dict)

[]

*(No errors stored in the annotated table group object)*

In [14]:
json_ld=csvw_functions.create_json_ld(
    annotated_table_group_dict,
    mode='minimal'
)
json_ld

[{'respondent_id': 1,
  'question_id': 1,
  'question_text': 'What is your definition of sustainable construction?',
  'response_text': 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed quis metus vel sapien tincidunt congue in a sem. Curabitur blandit mauris velit, nec vestibulum odio laoreet quis. Vestibulum vestibulum ornare massa, eu lacinia nulla. Nam ut enim nibh. Maecenas vel pretium dolor. Duis vitae condimentum mi. Ut mollis enim sed mauris eleifend, id semper erat commodo. In hac habitasse platea dictumst. Maecenas ac tempus risus. Mauris eu dui condimentum, laoreet eros posuere, fringilla ipsum. Suspendisse facilisis lobortis metus, at fringilla ligula venenatis eget. Duis bibendum, sem sed ullamcorper finibus, ante massa commodo massa, ac hendrerit dui enim eu velit. Etiam porta massa justo, at volutpat augue interdum auctor.'},
 {'respondent_id': 1,
  'question_id': 2,
  'question_text': 'What do you think the barriers are?',
  'response_text': 'Ut in enim et e

*(No runtime errors. Conversion looks fine.)*
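
Beyond inspecting the output by eye, the converted rows can be checked programmatically. The sketch below assumes the minimal-mode output is a list of dictionaries shaped like the one shown above and verifies that each row has the four expected keys with the expected Python types. The helper `check_rows` and the two sample rows are hypothetical placeholders, not data from the interview file.

```python
EXPECTED_TYPES = {
    'respondent_id': int,
    'question_id': int,
    'question_text': str,
    'response_text': str,
}


def check_rows(rows):
    """Return a list of (row index, key, problem) tuples; empty if all is well."""
    problems = []
    for i, row in enumerate(rows):
        for key, expected in EXPECTED_TYPES.items():
            if key not in row:
                problems.append((i, key, 'missing'))
            elif not isinstance(row[key], expected):
                problems.append((i, key, 'wrong type'))
    return problems


# Placeholder rows: the second has a string id and no response text.
rows = [
    {'respondent_id': 1, 'question_id': 1,
     'question_text': 'placeholder question', 'response_text': 'placeholder response'},
    {'respondent_id': '2', 'question_id': 1,
     'question_text': 'placeholder question'},
]
problems = check_rows(rows)
print(problems)
```

In the notebook, `check_rows(json_ld)` returning an empty list would indicate that the declared `integer` and `string` datatypes were applied during conversion.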