Covid-19 End to End Big Data and ML: from ingesting a stream to deploying an ML model in production

adipolak/covid-19-e2e-big-data-ml-system

COVID-19 Data Analytics - End to End Big Data and ML system

This is an example of how you can build your own COVID-19 end-to-end big data and ML system, from ingesting a stream to deploying an ML model in production. It leverages Kafka, Apache Spark, Spark MLlib, and cloud services to produce a machine learning model from big data.

** This doesn't include the CLI/Bash/PowerShell/YAML files for ops.

Prerequisites:

  1. Azure account
  2. Azure Event Hubs
  3. Azure Databricks with MLflow
  4. Azure Machine Learning
  5. Azure Key Vault
  6. Kubernetes environment / Azure Container Instances
  7. Cognitive Services - for enriching tweet data with sentiment


Architecture layers


ML life cycle from development to production

This simplified diagram demonstrates a machine learning life cycle, from development to production.

The main drivers for triggering a new machine learning training process are often based on monitoring and observability layers. Three main triggers are:

  • Data driven - we detect new variability in the data flowing through our systems.
  • Schedule driven - we want to release an updated machine learning model every x days.
  • Metrics driven - an error is detected. This depends heavily on the model itself and on our ability to detect wrong predictions/classifications for the use case.
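
The three triggers above can be sketched as a small check that a monitoring job might run. This is a minimal illustration and not code from this repository: `RetrainPolicy`, `data_drift_detected`, `retraining_triggers`, the mean-shift drift test, and all threshold values are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import List, Sequence


@dataclass
class RetrainPolicy:
    """Hypothetical thresholds; tune them for your own model and data."""
    max_model_age: timedelta = timedelta(days=7)  # the "every x days" schedule
    drift_threshold: float = 3.0                  # mean shift, in reference std devs
    max_error_rate: float = 0.2                   # tolerated share of wrong predictions


def data_drift_detected(reference: Sequence[float],
                        recent: Sequence[float],
                        threshold: float) -> bool:
    """Flag drift when the recent mean moves far from the reference mean.

    The shift is measured in standard deviations of the reference sample.
    This is a deliberately simple stand-in for a real drift test.
    """
    ref_std = pstdev(reference)
    shift = abs(mean(recent) - mean(reference))
    if ref_std == 0:
        # A constant reference feature: any change at all counts as drift.
        return shift > 0
    return shift / ref_std > threshold


def retraining_triggers(policy: RetrainPolicy,
                        now: datetime,
                        last_trained: datetime,
                        reference: Sequence[float],
                        recent: Sequence[float],
                        error_rate: float) -> List[str]:
    """Return the names of the triggers that fired (empty list: no retraining)."""
    reasons: List[str] = []
    if data_drift_detected(reference, recent, policy.drift_threshold):
        reasons.append("data")
    if now - last_trained >= policy.max_model_age:
        reasons.append("schedule")
    if error_rate > policy.max_error_rate:
        reasons.append("metrics")
    return reasons


if __name__ == "__main__":
    fired = retraining_triggers(
        RetrainPolicy(),
        now=datetime(2020, 4, 10),
        last_trained=datetime(2020, 4, 1),
        reference=[0.10, 0.20, 0.15, 0.18, 0.12],
        recent=[0.60, 0.70, 0.65],
        error_rate=0.05,
    )
    print(fired)
```

Returning the list of fired triggers, rather than a single boolean, keeps the reason for each retraining run available for logging in the monitoring layer.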


Q&A

If you have questions/concerns or would like to chat, contact us:
