## *DISCLAIMER*
<p style="font-size:16px; color:#117d30;">
 By accessing this code, you acknowledge the code is made available for presentation and demonstration purposes only and that the code: (1) is not subject to SOC 1 and SOC 2 compliance audits; (2) is not designed or intended to be a substitute for the professional advice, diagnosis, treatment, or judgment of a certified financial services professional; (3) is not designed, intended or made available as a medical device; and (4) is not designed or intended to be a substitute for professional medical advice, diagnosis, treatment or judgement. Do not use this code to replace, substitute, or provide professional financial advice or judgment, or to replace, substitute or provide medical advice, diagnosis, treatment or judgement. You are solely responsible for ensuring the regulatory, legal, and/or contractual compliance of any use of the code, including obtaining any authorizations or consents, and any solution you choose to build that incorporates this code in whole or in part.
</p>

## Important – Do not use in production, for demonstration purposes only – please review the legal notices before continuing
 License agreement: https://github.com/microsoft/Azure-Analytics-and-AI-Engagement/blob/main/HealthCare/License.md 


## Legal Notices
This presentation, demonstration, and demonstration model are for informational purposes only. Microsoft makes no warranties, express or implied, in this presentation, demonstration, and demonstration model. Nothing in this presentation, demonstration, or demonstration model modifies any of the terms and conditions of Microsoft’s written and signed agreements. This is not an offer and applicable terms and the information provided is subject to revision and may be changed at any time by Microsoft.

This presentation, demonstration, and/or demonstration model do not give you or your organization any license to any patents, trademarks, copyrights, or other intellectual property covering the subject matter in this presentation, demonstration, and demonstration model.

The information contained in this presentation, demonstration and demonstration model represents the current view of Microsoft on the issues discussed as of the date of presentation and/or demonstration, and the duration of your access to the demonstration model. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of presentation and/or demonstration and for the duration of your access to the demonstration model.

No Microsoft technology, nor any of its component technologies, including the demonstration model, is intended or made available: (1) as a medical device; (2) for the diagnosis of disease or other conditions, or in the cure, mitigation, treatment or prevention of a disease or other conditions; or (3) as a substitute for the professional clinical advice, opinion, or judgment of a treating healthcare professional. Partners or customers are responsible for ensuring the regulatory compliance of any solution they build using Microsoft technologies.

© 2020 Microsoft Corporation. All rights reserved


## Please do not run the notebook or click "Run all"
At the time of writing, the core limit is 200 cores per workspace. Depending on the number of concurrent users, running the notebook may fail with a "core capacity exceeded" or "maximum number of parallel jobs exceeded" error.
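
To make the limit concrete, the arithmetic can be sketched as below. Only the 200-core figure comes from this notebook. The function name, the session shape (two executors of 8 cores plus an 8-core driver), and the assumption that the driver counts toward the limit are hypothetical values chosen for illustration; the real numbers depend on how the Spark pool and sessions are configured.

```python
def max_concurrent_sessions(core_limit, executors, executor_cores, driver_cores):
    """Return (sessions that fit under the limit, cores used by one session)."""
    # Assumption: each session holds one driver plus all of its executors.
    cores_per_session = driver_cores + executors * executor_cores
    return core_limit // cores_per_session, cores_per_session


# 200 is the workspace limit quoted above; the session shape is hypothetical.
sessions, per_session = max_concurrent_sessions(
    core_limit=200, executors=2, executor_cores=8, driver_cores=8
)
print(f"{per_session} cores per session, at most {sessions} concurrent sessions")
```

Under these assumed sizes, only a handful of users can run the notebook at the same time before the workspace limit is reached, which is why a shared demo environment asks attendees not to execute the cells.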



In [3]:
%%pyspark
# Read the IoMT CSV from ADLS Gen2. No schema is given, so every column is loaded as a string.
df = spark.read.load('abfss://iomt-data@#STORAGE_ACCOUNT_NAME#.dfs.core.windows.net/healthcare-iomt.csv', format='csv', header=True)
display(df)

## Data Transformation


In [10]:
%%pyspark
# Collect the vitals and activity columns into a pandas DataFrame on the driver.
pd_df = df.select("PatientId", "PatientAge", "BodyTemperature",
                  "HeartRate", "BreathingRate", "numberOfSteps", "Calories").toPandas()
# groupby without an aggregation only builds a GroupBy object; pd_df is unchanged.
pd_df.groupby(['PatientId'])
print(pd_df)

                                  PatientId PatientAge  ... numberOfSteps Calories
0      0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          2685     2695
1      0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          4639     2922
2      0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...         11649     2399
3      0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          4504     2682
4      0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          5954     2636
...                                     ...        ...  ...           ...      ...
25915  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          1038     1546
25916  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          1526     2709
25917  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          1393     2294
25918  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          1592     1625
25919  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          1382     2280

[25920 rows x 7 columns]

In [15]:
%%pyspark
# Collect the same columns into a pandas DataFrame on the driver.
pd_df = df.select("PatientId", "PatientAge", "BodyTemperature",
                  "HeartRate", "BreathingRate", "numberOfSteps", "Calories").toPandas()
# The aggregated result is not assigned, so pd_df is unchanged
# and head(10) below shows the raw rows.
pd_df.groupby(['PatientId', 'BodyTemperature']).sum()

pd_df.head(10)

                              PatientId PatientAge  ... numberOfSteps Calories
0  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          2685     2695
1  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          4639     2922
2  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...         11649     2399
3  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          4504     2682
4  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          5954     2636
5  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          5577     2045
6  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          5389     2576
7  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          2654     2872
8  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          5365     2713
9  0058a52a-235c-11eb-be74-70b5e8b8edbb         42  ...          4740     2087

[10 rows x 7 columns]
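
Aggregating per patient with pandas needs two things that are easy to miss: numeric columns, and a variable to hold the result of `groupby`. The sketch below shows both on a tiny hand-made frame. The patient IDs and values are invented for illustration, and it assumes the columns arrive as strings, which is what a CSV read without a schema produces.

```python
import pandas as pd

# Hypothetical sample rows shaped like the IoMT columns; the values are
# invented and stored as strings to mimic a schema-less CSV read.
sample_df = pd.DataFrame({
    "PatientId": ["p1", "p1", "p2"],
    "numberOfSteps": ["2685", "4639", "1000"],
    "Calories": ["2695", "2922", "2000"],
})

# Cast the measurement columns to numbers before aggregating;
# summing strings would concatenate them instead of adding them.
numeric_cols = ["numberOfSteps", "Calories"]
sample_df[numeric_cols] = sample_df[numeric_cols].apply(pd.to_numeric)

# groupby(...).sum() returns a new DataFrame, so keep it in a variable.
totals = sample_df.groupby("PatientId")[numeric_cols].sum()
print(totals)
```

The same pattern would apply to `pd_df` with the full list of measurement columns, at which point `totals` rather than `pd_df` is the frame to display or write back.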

In [17]:
%%pyspark
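# Convert the pandas DataFrame back to a Spark DataFrame and preview it.
df = spark.createDataFrame(pd_df)
df.show(5)

# Write the data as a single CSV part file (coalesce(1)) under the iomtData folder.
(df
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("csv")  # built-in CSV source; "com.databricks.spark.csv" is its legacy name
 .save('abfss://iomt-data@#STORAGE_ACCOUNT_NAME#.dfs.core.windows.net/iomtData'))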


+--------------------+----------+---------------+---------+-------------+-------------+--------+
|           PatientId|PatientAge|BodyTemperature|HeartRate|BreathingRate|numberOfSteps|Calories|
+--------------------+----------+---------------+---------+-------------+-------------+--------+
|0058a52a-235c-11e...|        42|           97.5|      157|           47|         2685|    2695|
|0058a52a-235c-11e...|        42|           98.7|      147|          124|         4639|    2922|
|0058a52a-235c-11e...|        42|           97.4|       82|           97|        11649|    2399|
|0058a52a-235c-11e...|        42|           98.6|       59|          159|         4504|    2682|
|0058a52a-235c-11e...|        42|           97.1|      130|           34|         5954|    2636|
+--------------------+----------+---------------+---------+-------------+-------------+--------+
only showing top 5 rows