MS-Fabric-End-to-End-Project

Stroke Prediction: In this project, we predict whether a patient will have a stroke based on their comorbidities, work, and lifestyle. The project involves data ingestion and orchestration, data cleaning, data visualization, and a machine learning model (SVM). The following Microsoft Fabric workloads were used to deliver the project:

1. Data Engineering (Data Factory & Synapse Data Engineering)
2. Data Science
3. Data Analysis (Power BI)

Project Architecture

Inside a Fabric-enabled workspace, the following assets were created to deliver this project:

Data Engineering

The dataset for this project was extracted, transformed, and loaded into an existing lakehouse using Dataflow Gen2, which opens the Power Query Online editor. In Power Query, the lakehouse destination must be specified before publishing.

The published dataflow was then orchestrated with Data Factory. In the pipeline, a Dataflow activity was created and ran successfully. (A pipeline is not strictly required for this project, but it is useful when other pipeline activities, such as a Copy activity, are needed before the entire dataset is loaded into the lakehouse.) Tables in a lakehouse are stored in the Delta format.

The stroke dataset is now available in the lakehouse.

The lakehouse comes with a SQL endpoint that allows the dataset to be explored with SQL queries. At this stage, SQL views were created for the fact and dimension tables of the BI model.
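The idea of exposing fact and dimension views over a single ingested table can be sketched as follows. This is only an illustration, not the project's actual `data modeling.sql`: it uses Python's built-in `sqlite3` as a stand-in for the lakehouse SQL endpoint, and the table name `stroke`, the view names `dim_work_type` and `fact_patient`, the column subset, and the sample rows are all assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the lakehouse SQL endpoint.
conn = sqlite3.connect(":memory:")

# Hypothetical ingested table with a small subset of columns (assumed names).
conn.execute("""
    CREATE TABLE stroke (
        id INTEGER PRIMARY KEY,
        gender TEXT,
        age REAL,
        work_type TEXT,
        stroke INTEGER
    )
""")
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO stroke VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Male", 67.0, "Private", 1),
        (2, "Female", 61.0, "Self-employed", 1),
        (3, "Female", 35.0, "Private", 0),
        (4, "Male", 42.0, "Govt_job", 0),
    ],
)

# Dimension view: one row per distinct work type.
conn.execute("""
    CREATE VIEW dim_work_type AS
    SELECT DISTINCT work_type FROM stroke
""")

# Fact view: one row per patient, keyed to the dimension by work_type.
conn.execute("""
    CREATE VIEW fact_patient AS
    SELECT id AS patient_id, age, work_type, stroke AS had_stroke
    FROM stroke
""")

# A BI-style query joining the fact view to the dimension view.
rows = conn.execute("""
    SELECT d.work_type, COUNT(*) AS patients, SUM(f.had_stroke) AS strokes
    FROM fact_patient AS f
    JOIN dim_work_type AS d ON d.work_type = f.work_type
    GROUP BY d.work_type
    ORDER BY d.work_type
""").fetchall()

for work_type, patients, strokes in rows:
    print(work_type, patients, strokes)
```

Because views store no data of their own, the BI model reads from the same underlying table, so no copy of the dataset is made at this step.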

Exploratory Data Analysis and Visualization

A data model was created from the SQL views for the fact and dimension tables, and DAX measures were created for the visualization.

The visualization was created in Fabric; however, the visualization capabilities and experience there are not as seamless as in Power BI Desktop. Interestingly, I was able to connect to the SQL endpoint from Power BI Desktop using the SQL endpoint connection string. It then occurred to me that bringing the dataset into Power BI Desktop in Import mode would duplicate my dataset, and I do not want to use DirectQuery mode either because I have a small dataset. I will do the visualization in Power BI Desktop later, but this is worth mentioning since the idea of Fabric OneLake is to avoid duplicating data within a tenant.

Machine Learning Model (Support Vector Machine)

A Spark notebook was created and connected to the existing lakehouse to build the machine learning model. This is the first machine learning model I have built, leveraging the Fabric UI and ChatGPT to write the model code. The model accuracy is 95%. However, I don't yet know what to do with the results of this model. I will be collaborating with a data scientist friend to wrap up this project with a more insightful outcome.
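One possible starting point for interpreting the 95% accuracy is to compare it with a trivial baseline and to look at recall alongside accuracy. The sketch below is plain Python, not the notebook's code. It assumes strokes are rare in the data; the 5% positive share and the 1,000-patient sample are illustrative numbers, not figures measured from this project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means stroke."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual stroke cases that the model catches."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative labels only: 1,000 patients, 50 of them (5%) with a stroke.
y_true = [1] * 50 + [0] * 950

# Baseline "model" that always predicts no stroke.
y_baseline = [0] * 1000

print(f"baseline accuracy = {accuracy(y_true, y_baseline):.2f}")  # 0.95
print(f"baseline recall   = {recall(y_true, y_baseline):.2f}")    # 0.00
```

Under these assumptions, a model that never predicts a stroke also scores 95% accuracy while catching no stroke cases, so checking the SVM's recall on the stroke class would show whether it does better than this baseline.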

It was a very rewarding and exciting experience. I am thrilled by the possibilities of Fabric and cannot wait for Copilot to reach public preview. Link to the dataset: https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset
