# UK OMOP Dataset Table
Convert a CSV file created in Excel to JSON format for GitHub.
Also, export as Word table?

In [1]:
import pandas as pd
from IPython.display import JSON
import json

Import the CSV file:

In [2]:
source_file_name = 'Combined Organisation Dataset Table (v4).csv'

In [3]:
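# Forward-fill blank cells with the last value above them.
# (fillna(method='ffill') is deprecated in recent pandas; ffill() is the equivalent.)
table = pd.read_csv(source_file_name).ffill()
#  dtype={'Dataset on portal?': bool, 'Dataset on Gateway?': bool, 'Cohort Discovery?': bool}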
table.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105 entries, 0 to 104
Data columns (total 10 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   Organisation                 105 non-null    object
 1   Organisation link            105 non-null    object
 2   Organisation with Hyperlink  105 non-null    object
 3   Dataset on portal?           105 non-null    bool  
 4   Dataset on Gateway?          105 non-null    bool  
 5   Cohort Discovery?            105 non-null    bool  
 6   Data set                     105 non-null    object
 7   Care type                    105 non-null    object
 8   Health area                  105 non-null    object
 9   Repository Link              105 non-null    object
dtypes: bool(3), object(7)
memory usage: 6.2+ KB


Drop the columns that are not wanted in the JSON file.

We don't want `'Organisation with Hyperlink'` because it is saved as plain text. We don't want `'Repository Link'` because forward filling inserts incorrect values into its blank cells.

In [4]:
table = table.drop(columns=['Organisation with Hyperlink', 'Repository Link'])
table.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105 entries, 0 to 104
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Organisation         105 non-null    object
 1   Organisation link    105 non-null    object
 2   Dataset on portal?   105 non-null    bool  
 3   Dataset on Gateway?  105 non-null    bool  
 4   Cohort Discovery?    105 non-null    bool  
 5   Data set             105 non-null    object
 6   Care type            105 non-null    object
 7   Health area          105 non-null    object
dtypes: bool(3), object(5)
memory usage: 4.5+ KB
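
An alternative to dropping `'Repository Link'` would be to forward-fill only the columns whose blanks really mean "same as the row above". The sketch below is not part of the original workflow; the three-row CSV is made up and the URL is a placeholder, assuming that blank cells are read as missing values.

```python
import io

import pandas as pd

# Hypothetical miniature of the spreadsheet: blank cells are read as NaN.
csv_text = """Organisation,Data set,Repository Link
Org A,Dataset 1,https://example.org/repo-1
,Dataset 2,
Org B,Dataset 3,
"""

demo = pd.read_csv(io.StringIO(csv_text))

# Forward-fill only the column where a blank means "same as above".
demo["Organisation"] = demo["Organisation"].ffill()

# 'Repository Link' keeps its genuine gaps instead of inheriting a wrong link.
print(demo)
```

With this approach the second and third rows keep an empty repository link, while the second row still inherits `Org A` as its organisation.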


Export the edited (forward-filled, trimmed) table as CSV:

In [5]:
table.to_csv('Dataset_Table.csv', index=False)

Convert the tabular data to a JSON string, then parse that string into a Python `dict`:

In [6]:
json_str = table.to_json(orient='index')

In [7]:
json_dict = json.loads(json_str)
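
To see what `orient='index'` produces compared with another common layout, here is a small sketch on a made-up two-row frame (the organisation names are placeholders). With `orient='index'` each row becomes an object keyed by its row label as a string; with `orient='records'` the rows become a plain list.

```python
import json

import pandas as pd

# Made-up stand-in for the real table.
demo = pd.DataFrame(
    {
        "Organisation": ["Org A", "Org B"],
        "Cohort Discovery?": [True, False],
    }
)

# One object per row, keyed by the row label ("0", "1", ...).
by_index = json.loads(demo.to_json(orient="index"))

# A list of row objects, with no row labels.
as_records = json.loads(demo.to_json(orient="records"))

print(json.dumps(by_index, indent=4))
print(json.dumps(as_records, indent=4))
```

Which layout is preferable depends on the consumer of the file; the notebook uses `orient='index'`.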

Display the JSON data interactively:

In [8]:
JSON(json_dict)

<IPython.core.display.JSON object>

Pretty-print the JSON string to improve readability:

In [10]:
pretty = json.dumps(json_dict, indent=4)
print(pretty)

{
    "0": {
        "Organisation": "Akrivia Health",
        "Organisation link": "https://akriviahealth.com",
        "Dataset on portal?": true,
        "Dataset on Gateway?": false,
        "Cohort Discovery?": false,
        "Data set": "Akrivia (AKRDB)",
        "Care type": "Secondary",
        "Health area": "Mental health"
    },
    "1": {
        "Organisation": "Avon Longitudinal Study of Parents and Children",
        "Organisation link": "https://www.bristol.ac.uk/alspac/",
        "Dataset on portal?": false,
        "Dataset on Gateway?": false,
        "Cohort Discovery?": true,
        "Data set": "ALSPAC",
        "Care type": "Cohort",
        "Health area": "General"
    },
    "2": {
        "Organisation": "Barts Health NHS Trust",
        "Organisation link": "https://www.bartshealth.nhs.uk/",
        "Dataset on portal?": true,
        "Dataset on Gateway?": false,
        "Cohort Discovery?": false,
        "Data set": "Barts",
        "Care type": "Inpatient

Test the pretty-printed JSON for compatibility:

In [13]:
pretty_dict = json.loads(pretty)

In [14]:
pretty_dict == json_dict

True
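
The check above works because indentation changes only the text, not the parsed data. A minimal sketch using only the standard library and a made-up record illustrates this, along with one caveat: JSON object keys are always strings, so a `dict` with integer keys would not compare equal after a round trip.

```python
import json

# Made-up record standing in for json_dict.
record = {"0": {"Organisation": "Org A", "Dataset on portal?": True}}

compact = json.dumps(record)
pretty = json.dumps(record, indent=4)

# The two strings differ, but they parse to the same data.
same_text = compact == pretty
same_data = json.loads(compact) == json.loads(pretty) == record

# Caveat: integer keys are converted to strings when serialised.
int_keyed = {0: "a"}
round_tripped = json.loads(json.dumps(int_keyed))

print(same_text, same_data, round_tripped == int_keyed)
```

In the notebook `json_dict` was itself produced by `json.loads`, so its keys are already strings and the comparison is safe.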

Write the JSON string to a text file:

In [11]:
json_file_name = 'datasets.json'
with open(json_file_name, mode='wt') as json_file_obj:
    # chars = json_file_obj.write(json_str)
    chars = json_file_obj.write(pretty)
print(f'{chars} characters written to {json_file_name}')

37823 characters written to datasets.json
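
As a final sanity check, the written file can be read back and compared with the data in memory. The sketch below writes to a temporary directory so it does not touch the real `datasets.json`; the record is made up, and the use of `pathlib` and `tempfile` is an assumption for illustration rather than part of the original notebook.

```python
import json
import tempfile
from pathlib import Path

# Made-up stand-in for json_dict and its pretty-printed form.
data = {"0": {"Organisation": "Org A"}}
pretty = json.dumps(data, indent=4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "datasets.json"

    # write_text returns the number of characters written.
    chars = path.write_text(pretty, encoding="utf-8")

    # Parse the file straight from disk.
    with path.open(encoding="utf-8") as json_file_obj:
        reloaded = json.load(json_file_obj)

print(chars == len(pretty), reloaded == data)
```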
