
DonFrancis1/MS-Fabric-End-to-End-Project


MS-Fabric-End-to-End-Project

Stroke Prediction: This project predicts whether a patient is likely to have a stroke based on their comorbidities, work, and lifestyle. It covers data ingestion and orchestration, data cleaning, data visualization, and a machine learning model (SVM). The following Microsoft Fabric workloads were used to deliver the project:

1. Data Engineering (Data Factory & Synapse Data Engineering)
2. Data Science
3. Data Analysis (Power BI)

Project Architecture

[Image: project architecture diagram]

Inside a Fabric-enabled workspace, the following assets were created to deliver this project:

[Image: Fabric assets]

Data Engineering

The dataset for this project was extracted, transformed, and loaded into an existing lakehouse using Dataflow Gen2, which opens Power Query Online. In Power Query, the lakehouse destination must be specified before publishing.

[Image: dataflow]

The published dataflow was then orchestrated using Data Factory. In the pipeline, a dataflow activity was created and run successfully. (A pipeline is not strictly required for this project, but it is useful when other pipeline activities, such as a copy activity, are required before loading the entire dataset into the lakehouse.) Tables in a lakehouse are stored in the Delta format.

[Image: pipeline]

The stroke dataset is now available in the lakehouse:

[Image: stroke table in the lakehouse]

The lakehouse comes with a SQL endpoint that allows the dataset to be explored with SQL queries. At this stage, SQL views were created for the fact and dimension tables of the BI model.

[Image: SQL queries]
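The idea of exposing fact and dimension views over a single flat table can be sketched as follows. This is only an illustration, not the queries used in the project: it runs against an in-memory SQLite database rather than the Fabric SQL endpoint (which speaks T-SQL), the view names `dim_work_type` and `fact_stroke` are hypothetical, and the four sample rows are made up. The column names follow the Kaggle stroke dataset.

```python
import sqlite3

# In-memory stand-in for the lakehouse table (assumption: a trimmed-down
# version of the stroke table with made-up sample rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stroke (
    id INTEGER PRIMARY KEY,
    gender TEXT,
    age REAL,
    work_type TEXT,
    smoking_status TEXT,
    stroke INTEGER
);
INSERT INTO stroke VALUES
    (1, 'Male',   67.0, 'Private',       'formerly smoked', 1),
    (2, 'Female', 61.0, 'Self-employed', 'never smoked',    1),
    (3, 'Female', 35.0, 'Private',       'never smoked',    0),
    (4, 'Male',   12.0, 'children',      'Unknown',         0);

-- Hypothetical dimension view: one row per distinct work type.
CREATE VIEW dim_work_type AS
SELECT DISTINCT work_type FROM stroke;

-- Hypothetical fact view: one row per patient with the measures of interest.
CREATE VIEW fact_stroke AS
SELECT id, age, work_type, stroke FROM stroke;
""")

# Query the views the way a BI model would: join fact to dimension and aggregate.
rows = conn.execute("""
SELECT d.work_type, COUNT(*) AS patients, SUM(f.stroke) AS strokes
FROM fact_stroke AS f
JOIN dim_work_type AS d ON d.work_type = f.work_type
GROUP BY d.work_type
ORDER BY d.work_type
""").fetchall()

print(rows)
```

Because views store no data of their own, the BI model reads from the single copy of the table in the lakehouse.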

Exploratory Data Analysis and Visualization

A data model was created from the SQL views for the fact and dimension tables. DAX measures were created for the visualization.

[Image: data model]

The visualization was created in Fabric; however, the visualization capabilities and experience there are not as seamless as in Power BI Desktop. Interestingly, I was able to connect to the SQL endpoint from Power BI Desktop using the SQL endpoint connection string. It then occurred to me that bringing the dataset into Power BI Desktop in Import mode would duplicate it, and I did not want to use DirectQuery mode either, because the dataset is small. I will do the visualization in Power BI Desktop later, but this is worth mentioning since the idea of Fabric OneLake is to avoid duplicating data within a tenant.

[Images: viz 1, viz 2]

Machine Learning Model (Support Vector Machine)

A Spark notebook was created and connected to the existing lakehouse to build the machine learning model. This is the first machine learning model I have built, leveraging the Fabric UI and ChatGPT to write the model code. The model accuracy is 95%. However, I don't yet know what to do with the results of this model. I will be collaborating with a data scientist friend to wrap up this project with a more insightful outcome.
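One possible starting point for interpreting the 95% accuracy is to compare it with a trivial baseline. The Kaggle stroke dataset is heavily imbalanced (roughly 5% of rows are stroke cases; check this against your own copy), so a model that always predicts "no stroke" already scores about 95%. The sketch below uses plain Python and a made-up sample of 100 patients with 5 positives to show why recall is worth reporting alongside accuracy. The helper names are hypothetical and this is not the notebook code from the project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = stroke."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(y_true, y_pred):
    """Fraction of actual stroke cases that the model catches."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Assumed sample for illustration: 5 stroke cases out of 100 patients.
y_true = [1] * 5 + [0] * 95

# Trivial baseline: always predict "no stroke".
y_pred = [0] * 100

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)

print(f"accuracy: {baseline_accuracy:.2f}")
print(f"recall:   {baseline_recall:.2f}")
```

If the trained SVM's recall on the stroke class is close to zero, its accuracy mostly reflects the class balance rather than any ability to identify at-risk patients.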

[Images: notebook screenshots p1 to p5]

It was a very rewarding and exciting experience. I am thrilled by the possibilities of Fabric and cannot wait for Copilot to reach public preview. Link to dataset: https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset
