# Nightingale Demo
This notebook extracts data from an online source, converts it into an SDML Dataset, and computes aggregate data from it. Both the original and the summary data are saved as SDML tables.

# Florence Nightingale's Dataset
Modern hospital care and nursing can be said to have begun with the English statistician and nurse [Florence Nightingale](https://en.wikipedia.org/wiki/Florence_Nightingale). During the [Crimean War](https://en.wikipedia.org/wiki/Crimean_War), Ms. Nightingale tended to wounded troops and pioneered modern hospital practices, primarily sanitation and disinfection. To show the efficacy of her techniques, she counted deaths due to disease, wounds, and other causes, showing that disease, rather than wounds, was the primary killer in the war. She also showed that after her reforms deaths dropped dramatically, and deaths due to disease dropped most of all.

The original dataset is taken from [Nightingale's Rose](https://github.com/datasets-io/nightingales-rose/), and is copyright the [Compute.io](https://github.com/compute-io) authors.

In [None]:
import pandas as pd
import json

Read the original data directly from GitHub.

In [None]:
nightingale = pd.read_json('https://raw.githubusercontent.com/datasets-io/nightingales-rose/master/lib/dataset.json')

Convert each date into a month index, numbered so that month 1 is April 1854 and month 24 is March 1856, then drop the now-redundant `date` column.

In [None]:
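# Month index: April 1854 -> 1, ..., March 1856 -> 24
dates = nightingale['date']
nightingale['month'] = (dates.dt.month - 3) + 12 * (dates.dt.year - 1854)
# Drop the date column by name rather than assuming it is the first column
nightingale = nightingale.drop(columns='date')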
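The month-index formula can be checked on its own. A minimal sketch, using a hypothetical `month_index` helper that applies the same arithmetic as the month computation above to plain `datetime.date` values:

```python
from datetime import date

def month_index(d):
    # Same arithmetic as the notebook: April 1854 -> 1, one step per calendar month
    return (d.month - 3) + 12 * (d.year - 1854)

print(month_index(date(1854, 4, 1)))  # 1  (first month of the series)
print(month_index(date(1855, 1, 1)))  # 10 (the index continues across the year boundary)
print(month_index(date(1856, 3, 1)))  # 24 (last month of the series)
```

Each calendar year adds 12 to the index and each month within a year adds 1, so consecutive months always receive consecutive indices.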

Compute the total deaths for each month, then use the totals to compute the percentage of that month's deaths due to each cause, truncated to two decimal places.

In [None]:
# Total deaths per month across the three causes
columns = ['disease', 'wounds', 'other']
nightingale['total'] = nightingale[columns].sum(axis=1)

def compute_pct(column_name):
    # Percentage of the month's deaths due to this cause, truncated to two decimals
    pct = (nightingale[column_name] / nightingale['total']) * 10000
    nightingale[f'{column_name}_pct'] = pct.astype(int) / 100

for column in columns:
    compute_pct(column)
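The `* 10000`, `astype(int)`, `/ 100` sequence truncates each percentage to two decimal places rather than rounding it. A small sketch on made-up counts (not Nightingale's figures) contrasts it with `Series.round(2)`:

```python
import pandas as pd

# Made-up counts for one month, chosen only to show truncation vs rounding
counts = pd.DataFrame({'disease': [2], 'wounds': [1], 'other': [0]})
counts['total'] = counts[['disease', 'wounds', 'other']].sum(axis=1)

share = counts['disease'] / counts['total']     # 2/3
truncated = (share * 10000).astype(int) / 100   # the notebook's approach
rounded = (share * 100).round(2)                # conventional rounding

print(truncated.iloc[0])  # 66.66
print(rounded.iloc[0])    # 66.67
```

Truncation can also drop a hundredth when the floating-point product lands just below a whole number, so `round(2)` may be preferable if the percentages are meant to be reported rather than only compared.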

Create a table for the original Nightingale data and save it. First, make the schema: one entry per column, each typed as a number.

In [None]:
schema = [{"name": column, "type": "number"} for column in nightingale.columns]
schema

Next, form the table in SDML format. This is a JSON dictionary with three keys: the table type (`"RowTable"`), the schema, and the rows, given as a list of the table's rows.

In [None]:
table = {
    "type": "RowTable",
    "schema": schema,
    "rows": nightingale.values.tolist()
}
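The schema and table steps can be folded into one reusable function. A minimal sketch with a hypothetical `to_row_table` helper, assuming (as this notebook does) that numeric columns map to the SDML type `"number"` and text columns to `"string"`; other SDML column types are not covered:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def to_row_table(df):
    # Infer one SDML column type per DataFrame column: numeric -> "number", else "string"
    schema = [
        {"name": col, "type": "number" if is_numeric_dtype(df[col]) else "string"}
        for col in df.columns
    ]
    # Rows are emitted in column order, so they line up with the schema
    return {"type": "RowTable", "schema": schema, "rows": df.values.tolist()}

# Made-up data with the summary table's column layout
example = pd.DataFrame({'month': [1, 2], 'deaths': [5, 7], 'cause': ['other', 'other']})
example_table = to_row_table(example)
print([c['type'] for c in example_table['schema']])  # ['number', 'number', 'string']
```

Applied to `merged`, this sketch would infer the same types as the hand-written `summary_schema` later in the notebook.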



Save the table to `scratch/nightingale.sdml`.

In [None]:
import os
os.makedirs('scratch', exist_ok=True)  # create the output directory if it is missing
with open('scratch/nightingale.sdml', 'w') as f:
    json.dump(table, f, indent=2)
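Reading a RowTable back into pandas only needs the column names from the schema. A minimal sketch with a hypothetical `read_row_table` helper; it reads from an in-memory string of made-up values so it runs anywhere, and it ignores the schema's types. For the saved file, pass `open('scratch/nightingale.sdml')` instead.

```python
import io
import json
import pandas as pd

def read_row_table(f):
    # Load the JSON document and rebuild a DataFrame with columns in schema order
    doc = json.load(f)
    if doc.get("type") != "RowTable":
        raise ValueError(f"expected a RowTable, got {doc.get('type')!r}")
    columns = [col["name"] for col in doc["schema"]]
    return pd.DataFrame(doc["rows"], columns=columns)

# Made-up two-column table used only to exercise the reader
sample = io.StringIO(json.dumps({
    "type": "RowTable",
    "schema": [{"name": "month", "type": "number"}, {"name": "total", "type": "number"}],
    "rows": [[1, 6], [2, 9]],
}))
restored = read_row_table(sample)
print(restored.shape)  # (2, 2)
```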

Compute the summary table. It has one row per (month, cause) pair, with columns (month, deaths, cause).

In [None]:
# Build a (month, deaths, cause) table for a single cause
def extract_cause_table(cause):
    return (nightingale[['month', cause]]
            .rename(columns={cause: 'deaths'})
            .assign(cause=cause))

# Form one table per cause, then stack them into a single long table
cause_tables = [extract_cause_table(column) for column in columns]
merged = pd.concat(cause_tables, ignore_index=True)
# `merged` supplies the rows of the summary table below
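The same long-format table can also be produced with `DataFrame.melt`, which unpivots the cause columns in a single call. A sketch on a small made-up frame that uses the notebook's column names (the names `wide` and `long_table` are illustrative):

```python
import pandas as pd

# Made-up wide data with the notebook's column layout, one row per month
wide = pd.DataFrame({'month': [1, 2], 'disease': [10, 20], 'wounds': [1, 2], 'other': [3, 4]})

long_table = wide.melt(id_vars='month', value_vars=['disease', 'wounds', 'other'],
                       var_name='cause', value_name='deaths')
# Reorder to match the summary schema: month, deaths, cause
long_table = long_table[['month', 'deaths', 'cause']]
print(long_table)
```

`melt` emits all rows for the first cause, then the second, and so on, so its row order matches stacking the per-cause tables one after another.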


In [None]:

summary_schema = [
    {'name': 'month', 'type': 'number'},
    {'name': 'deaths', 'type': 'number'},
    {'name': 'cause', 'type': 'string'}
]
summary_table = {
    "type": "RowTable",
    "schema": summary_schema,
    "rows": merged.values.tolist()
}

with open('scratch/nightingale-summary.sdml', 'w') as f:
    json.dump(summary_table, f, indent=2)
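Because the summary table only reshapes the per-cause counts, summing `deaths` by month should reproduce the `total` column. A minimal sketch of that check on made-up data, stacking one (month, deaths, cause) table per cause as the notebook does; the names `wide`, `long_table`, and `per_month` are illustrative:

```python
import pandas as pd

causes = ['disease', 'wounds', 'other']
wide = pd.DataFrame({'month': [1, 2], 'disease': [10, 20], 'wounds': [1, 2], 'other': [3, 4]})
wide['total'] = wide[causes].sum(axis=1)

# Long form: one (month, deaths, cause) table per cause, stacked
long_table = pd.concat(
    [wide[['month', c]].rename(columns={c: 'deaths'}).assign(cause=c) for c in causes],
    ignore_index=True,
)

# Re-aggregate by month and compare with the wide table's totals
per_month = long_table.groupby('month')['deaths'].sum()
print((per_month == wide.set_index('month')['total']).all())  # True
```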