# create_metadata

This notebook creates the CSVW metadata file for the questionnaire CSV file.

## Setup

Imports the packages used in this notebook.

Information on the `csvw_functions` package is available here: https://github.com/stevenkfirth/csvw_functions

In [1]:
import csvw_functions
import json

## Get embedded metadata

Reads the CSV file and extracts the information from the column headings to form an initial CSVW metadata document.

In [2]:
metadata_table_dict=\
    csvw_functions.get_embedded_metadata(
        'questionnaire_data.csv',
        relative_path=True  # sets the 'url' table property to a path relative to the current working directory.
)
metadata_table_dict

{'@context': 'http://www.w3.org/ns/csvw',
 'tableSchema': {'columns': [{'titles': {'und': ['person_id']},
    'name': 'person_id'},
   {'titles': {'und': ['first name']}, 'name': 'first%20name'},
   {'titles': {'und': ['last name']}, 'name': 'last%20name'},
   {'titles': {'und': ['age']}, 'name': 'age'},
   {'titles': {'und': ['gender']}, 'name': 'gender'},
   {'titles': {'und': ['occupation']}, 'name': 'occupation'}]},
 'url': 'questionnaire_data.csv'}
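As a rough illustration of what this step produces, the sketch below builds the same kind of structure from a header row using only the standard library. It is not the `csvw_functions` implementation: `embedded_metadata_sketch` is a hypothetical helper, and percent-encoding the titles with `urllib.parse.quote` is an assumption chosen to match the `first%20name` style of the names in the output above.

```python
import csv
import io
from urllib.parse import quote


def embedded_metadata_sketch(csv_text, url):
    """Build a minimal CSVW-style table description from a CSV header row."""
    # Hypothetical helper: only the first row of the CSV text is read.
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = [
        {
            'titles': {'und': [title]},     # 'und' = language undetermined
            'name': quote(title, safe=''),  # 'first name' -> 'first%20name'
        }
        for title in header
    ]
    return {
        '@context': 'http://www.w3.org/ns/csvw',
        'tableSchema': {'columns': columns},
        'url': url,
    }


# A shortened, made-up sample with the same style of headings.
sample_csv = 'person_id,first name,last name\n1,Andrew,Smith\n'
sketch = embedded_metadata_sketch(sample_csv, 'questionnaire_data.csv')
print([col['name'] for col in sketch['tableSchema']['columns']])
```

The percent-encoded names are valid but awkward to read, which is why they are replaced later in this notebook.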

## Add new information to the metadata document

This section adds further information to the initial metadata document to make it complete.

### Set properties on Table object

Adds descriptive properties for the table as a whole: a title, a description, a location and a creator.

In [3]:
metadata_table_dict.update(
    {
        "dc:title": "Questionnaire on University staff",
        "dc:description": 
            "Fictional questionnaire responses for a survey carried out on University staff in June 2022",
        "dc:location": "Loughborough University, LE11 3TU, UK",
        "dc:creator": "ABCE Open Research Team"
    }
)
metadata_table_dict

{'@context': 'http://www.w3.org/ns/csvw',
 'tableSchema': {'columns': [{'titles': {'und': ['person_id']},
    'name': 'person_id'},
   {'titles': {'und': ['first name']}, 'name': 'first%20name'},
   {'titles': {'und': ['last name']}, 'name': 'last%20name'},
   {'titles': {'und': ['age']}, 'name': 'age'},
   {'titles': {'und': ['gender']}, 'name': 'gender'},
   {'titles': {'und': ['occupation']}, 'name': 'occupation'}]},
 'url': 'questionnaire_data.csv',
 'dc:title': 'Questionnaire on University staff',
 'dc:description': 'Fictional questionnaire responses for a survey carried out on University staff in June 2022',
 'dc:location': 'Loughborough University, LE11 3TU, UK',
 'dc:creator': 'ABCE Open Research Team'}
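Keys such as `dc:title` are prefixed names: the part before the colon stands for a namespace IRI. The sketch below shows how such names expand, assuming the prefix definitions of the RDFa initial context that CSVW relies on. `PREFIXES` is a hand-copied subset and `expand_prefixed_name` is a hypothetical helper, not part of `csvw_functions`.

```python
# Hand-copied subset of the predefined prefixes (an assumption of this
# sketch; the full list is defined by the RDFa initial context).
PREFIXES = {
    'dc': 'http://purl.org/dc/terms/',
    'foaf': 'http://xmlns.com/foaf/0.1/',
    'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
}


def expand_prefixed_name(name):
    """Return the full IRI for a prefixed name, or the input unchanged."""
    prefix, sep, local = name.partition(':')
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    # Plain keys ('url') and full IRIs ('https://...') are left as they are.
    return name


for key in ('dc:title', 'dc:creator', 'url'):
    print(key, '->', expand_prefixed_name(key))
```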

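### Update column names

Replaces the percent-encoded default column names (for example `first%20name`) with lowercase, underscore-separated names derived from the column titles.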

In [4]:
for col_dict in metadata_table_dict['tableSchema']['columns']:
    col_dict['name']=col_dict['titles']['und'][0].split(',')[0].split('(')[0].lower().replace(' ','_')
metadata_table_dict

{'@context': 'http://www.w3.org/ns/csvw',
 'tableSchema': {'columns': [{'titles': {'und': ['person_id']},
    'name': 'person_id'},
   {'titles': {'und': ['first name']}, 'name': 'first_name'},
   {'titles': {'und': ['last name']}, 'name': 'last_name'},
   {'titles': {'und': ['age']}, 'name': 'age'},
   {'titles': {'und': ['gender']}, 'name': 'gender'},
   {'titles': {'und': ['occupation']}, 'name': 'occupation'}]},
 'url': 'questionnaire_data.csv',
 'dc:title': 'Questionnaire on University staff',
 'dc:description': 'Fictional questionnaire responses for a survey carried out on University staff in June 2022',
 'dc:location': 'Loughborough University, LE11 3TU, UK',
 'dc:creator': 'ABCE Open Research Team'}
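The one-line expression in the cell above can be easier to follow as a small function. The sketch below is one possible version: `normalise_column_name` is a hypothetical helper, the titles with units and commas are invented for illustration, and the `strip()` call is an addition that the notebook's expression does not have.

```python
def normalise_column_name(title):
    """Turn a column title into a lowercase, underscore-separated name."""
    base = title.split(',')[0]  # drop anything after the first comma
    base = base.split('(')[0]   # drop a parenthesised suffix such as units
    # strip() is an addition to the notebook's expression: without it a
    # title like 'Age (years)' would give 'age_' rather than 'age'.
    return base.strip().lower().replace(' ', '_')


# The first two titles come from the questionnaire; the last is made up.
for title in ('person_id', 'first name', 'Age (years), self-reported'):
    print(repr(title), '->', repr(normalise_column_name(title)))
```

For the six titles in this dataset the result is the same as in the cell above, since none of them contains a comma or a bracket.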

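### Add column descriptions, datatypes and units

Adds a label, a datatype and, where available, a description, a property URL and a unit to each column. The `data` dictionary is keyed by the column names set in the previous step.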

In [5]:
data={
    'person_id':{
        'rdfs:label': 'person_id',
        'dc:description': 'A unique integer identifier for the respondent.',
        'datatype':'integer'
    },
    'first_name':{
        'rdfs:label': 'first_name',
        'propertyUrl': 'foaf:firstName',
        'datatype':'string',
        'rdfs:comment': 'See http://xmlns.com/foaf/0.1/#term_firstName for definition'
        
    },
    'last_name':{
        'rdfs:label': 'last_name',
        'propertyUrl': 'foaf:family_Name',
        "datatype": "string",
        'rdfs:comment': 'See http://xmlns.com/foaf/0.1/#term_family_name for definition'
    },
    'age':{
        'rdfs:label': 'age',
        "propertyUrl": "https://dbpedia.org/ontology/age",
        "datatype": "integer",
        "http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure": {
            "@id": "http://qudt.org/vocab/unit/YR"
        },
        'https://schema.org/unitText': 'Years'
    },
    'gender':{
        'rdfs:label': 'gender',
        "propertyUrl": "foaf:gender",
        "datatype": "string",
        'rdfs:comment': 'See http://xmlns.com/foaf/0.1/#term_gender for definition'
    },
    'occupation':{
        'rdfs:label': 'occupation',
        "propertyUrl": "https://schema.org/jobTitle.",
        "datatype": 'string'
    }
}
for col_dict in metadata_table_dict['tableSchema']['columns']:
    for k,v in data[col_dict['name']].items():
        col_dict[k]=v
metadata_table_dict

{'@context': 'http://www.w3.org/ns/csvw',
 'tableSchema': {'columns': [{'titles': {'und': ['person_id']},
    'name': 'person_id',
    'rdfs:label': 'person_id',
    'dc:description': 'A unique integer identifier for the respondent.',
    'datatype': 'integer'},
   {'titles': {'und': ['first name']},
    'name': 'first_name',
    'rdfs:label': 'first_name',
    'propertyUrl': 'foaf:firstName',
    'datatype': 'string',
    'rdfs:comment': 'See http://xmlns.com/foaf/0.1/#term_firstName for definition'},
   {'titles': {'und': ['last name']},
    'name': 'last_name',
    'rdfs:label': 'last_name',
    'propertyUrl': 'foaf:family_Name',
    'datatype': 'string',
    'rdfs:comment': 'See http://xmlns.com/foaf/0.1/#term_family_name for definition'},
   {'titles': {'und': ['age']},
    'name': 'age',
    'rdfs:label': 'age',
    'propertyUrl': 'https://dbpedia.org/ontology/age',
    'datatype': 'integer',
    'http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure': {'@id': 'http://qudt.
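The merge loop in the cell above raises a bare `KeyError` if a column has no entry in `data`, and silently ignores entries in `data` that match no column. A minimal sketch of a stricter merge is given below; `add_column_annotations` is a hypothetical helper and the two-column example is a cut-down stand-in for the real metadata.

```python
def add_column_annotations(columns, annotations):
    """Merge per-column annotations into column descriptions, in place."""
    names = [col['name'] for col in columns]
    missing = [name for name in names if name not in annotations]
    unused = [name for name in annotations if name not in names]
    if missing or unused:
        # Fail before changing anything, and report both kinds of mismatch.
        raise ValueError(
            f'missing annotations: {missing}; unused annotations: {unused}'
        )
    for col in columns:
        col.update(annotations[col['name']])
    return columns


example_columns = [{'name': 'person_id'}, {'name': 'age'}]
example_annotations = {
    'person_id': {'datatype': 'integer'},
    'age': {'datatype': 'integer', 'https://schema.org/unitText': 'Years'},
}
print(add_column_annotations(example_columns, example_annotations))
```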

## Save the newly created metadata table object

In [6]:
with open('questionnaire_data.csv-metadata.json','w') as f:
    json.dump(metadata_table_dict,f,indent=4)
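The file name follows the CSVW convention of appending `-metadata.json` to the CSV file name, which is one of the default locations a CSVW processor checks for metadata. A quick way to confirm that a saved file is intact is to load it back and compare it with the dictionary in memory. The sketch below does this with a small stand-in dictionary and a temporary directory, both of which are assumptions made so that the example runs on its own.

```python
import json
import os
import tempfile

# Stand-in for metadata_table_dict so the example is self-contained.
example_metadata = {
    '@context': 'http://www.w3.org/ns/csvw',
    'url': 'questionnaire_data.csv',
    'dc:title': 'Questionnaire on University staff',
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'questionnaire_data.csv-metadata.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(example_metadata, f, indent=4)
    with open(path, encoding='utf-8') as f:
        reloaded = json.load(f)

# True when the saved file round-trips without loss.
print(reloaded == example_metadata)
```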

## Testing

To test the newly created metadata file, we can use the `csvw_functions` package to create an annotated table group object and check it for errors. We can also convert the data to JSON-LD to confirm that the conversion runs without problems.


In [7]:
annotated_table_group_dict=csvw_functions.create_annotated_table_group(
    'questionnaire_data.csv-metadata.json'
)

*(No runtime errors)*

In [8]:
csvw_functions.get_errors(annotated_table_group_dict)

[]

*(No errors stored in the annotated table group object)*

In [9]:
json_ld=csvw_functions.create_json_ld(
    annotated_table_group_dict,
    mode='minimal'
)
json_ld[0:2]

[{'person_id': 1,
  'foaf:firstName': 'Andrew',
  'foaf:family_Name': 'Smith',
  'https://dbpedia.org/ontology/age': 32,
  'foaf:gender': 'M',
  'https://schema.org/jobTitle.': 'Administrator'},
 {'person_id': 2,
  'foaf:firstName': 'Beth',
  'foaf:family_Name': 'Jones',
  'https://dbpedia.org/ontology/age': 45,
  'foaf:gender': 'F',
  'https://schema.org/jobTitle.': 'Reader'}]

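*(No runtime errors, and the conversion completes. The keys reproduce each column's `propertyUrl` verbatim, so `foaf:family_Name` (FOAF spells this term `family_name` or `familyName`) and the trailing full stop in `https://schema.org/jobTitle.` appear to be typos in the metadata that should be corrected before the file is published.)*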
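Because a successful conversion does not guarantee that the property URLs are well formed, a simple check of the output keys can be a useful extra test. The sketch below flags keys that contain whitespace or end in punctuation. `suspect_property_keys` is a hypothetical helper, the rule it applies is only a heuristic, and the record is copied from the first row of the output above.

```python
def suspect_property_keys(records):
    """Return the keys that contain whitespace or end in punctuation."""
    keys = sorted({key for record in records for key in record})
    return [
        key for key in keys
        if any(ch.isspace() for ch in key)
        or key.endswith(('.', ',', ';', ':'))
    ]


# First record of the minimal-mode output shown above.
records = [
    {
        'person_id': 1,
        'foaf:firstName': 'Andrew',
        'foaf:family_Name': 'Smith',
        'https://dbpedia.org/ontology/age': 32,
        'foaf:gender': 'M',
        'https://schema.org/jobTitle.': 'Administrator',
    }
]
print(suspect_property_keys(records))
```

A check of this kind cannot tell whether a term exists in its vocabulary, so prefixed names still need to be compared against the vocabulary's own documentation.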