This is the repository for the LinkedIn Learning course Data-Centric AI: Best Practices, Responsible AI, and more. The full course is available from LinkedIn Learning.
Machine learning typically focuses on producing effective models for a given dataset. In real-world applications, however, data is messy, and improving the model is not the only way to get better performance. Data-centric AI (DCAI) is an emerging science that studies techniques to improve datasets, which is often the best way to improve performance in practical ML applications. Data scientists have long done this manually through ad hoc trial and error and intuition. DCAI instead treats the improvement of data as a systematic engineering discipline. In this course, Aishwarya Srinivasan covers the data-centric principles that guide the shift from a model-centric approach to a data-centric paradigm in this new age of AI. Learn what DCAI is and the value it offers. Aishwarya covers the DCAI workflow; MLOps as part of DCAI; data validation and preprocessing; model validation; bias detection and mitigation; responsible AI; and more.
See the readme file in the main branch for updated instructions and information.
The code example in Chapters 6, 7, and 8 uses a GitHub Codespace to launch a Jupyter Notebook containing the code and the Maternal Health Risk dataset. To launch the notebook, click Code and then Codespaces.
The Maternal Health Risk dataset is from:
Ahmed, Marzia. (2023). Maternal Health Risk. UCI Machine Learning Repository. https://doi.org/10.24432/C5DP5D.
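As a rough illustration of the data validation step the course covers, the following stdlib-only Python sketch checks CSV text for the expected columns, missing or non-numeric values, implausible values, unknown labels, and exact duplicate rows, and reports the class balance. It is not the course notebook's code. The column names and label strings are assumed from the UCI dataset page; the plausibility ranges and the sample rows are hypothetical, made up for illustration only.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: column names and label strings as listed on the UCI dataset page;
# check them against the header of your own copy of the CSV.
NUMERIC_COLUMNS = ["Age", "SystolicBP", "DiastolicBP", "BS", "BodyTemp", "HeartRate"]
LABEL_COLUMN = "RiskLevel"
ALLOWED_LABELS = {"low risk", "mid risk", "high risk"}

# HYPOTHETICAL plausibility ranges for illustration only; not clinical thresholds.
PLAUSIBLE_RANGES = {
    "Age": (10, 70),
    "SystolicBP": (70, 200),
    "DiastolicBP": (40, 120),
    "BS": (3.0, 25.0),
    "BodyTemp": (95.0, 105.0),
    "HeartRate": (40, 120),
}


def validate(csv_text):
    """Check CSV text against the assumed schema.

    Returns (issues, label_counts): issues is a list of (kind, row_number, detail)
    tuples, and label_counts is a Counter of valid RiskLevel values.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing_cols = [c for c in NUMERIC_COLUMNS + [LABEL_COLUMN] if c not in header]
    if missing_cols:
        return [("schema", 1, f"missing columns: {missing_cols}")], Counter()

    issues, seen, labels = [], set(), Counter()
    for row_no, row in enumerate(reader, start=2):  # the header counts as row 1
        # Exact duplicate rows can silently overweight some examples.
        key = tuple(row[c] for c in NUMERIC_COLUMNS + [LABEL_COLUMN])
        if key in seen:
            issues.append(("duplicate", row_no, "same values as an earlier row"))
        seen.add(key)

        for col in NUMERIC_COLUMNS:
            raw = (row[col] or "").strip()  # short rows yield None
            if not raw:
                issues.append(("missing", row_no, col))
                continue
            try:
                value = float(raw)
            except ValueError:
                issues.append(("non_numeric", row_no, f"{col}={raw!r}"))
                continue
            low, high = PLAUSIBLE_RANGES[col]
            if not low <= value <= high:
                issues.append(("out_of_range", row_no, f"{col}={value}"))

        label = (row[LABEL_COLUMN] or "").strip()
        if label in ALLOWED_LABELS:
            labels[label] += 1
        else:
            issues.append(("bad_label", row_no, repr(label)))
    return issues, labels


# Made-up rows for illustration; NOT taken from the real dataset.
SAMPLE = """Age,SystolicBP,DiastolicBP,BS,BodyTemp,HeartRate,RiskLevel
31,118,76,7.2,98.2,74,low risk
38,145,95,12.4,98.6,88,high risk
27,110,72,6.8,99.1,79,mid risk
31,118,76,7.2,98.2,74,low risk
44,,82,8.1,98.4,81,mid risk
22,105,68,6.5,98.0,70,lowrisk
36,950,88,9.0,98.9,84,high risk
"""

if __name__ == "__main__":
    issues, labels = validate(SAMPLE)
    for kind, row_no, detail in issues:
        print(f"row {row_no}: {kind} ({detail})")
    print("class balance:", dict(labels))
```

On a real copy of the data you would read the file's text and pass it to `validate`. The sketch flags problems rather than dropping rows, so each fix (deduplicate, impute, correct a label) stays an explicit, reviewable decision.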
Microsoft Founders Hub: https://www.microsoft.com/en-us/startups
Data-Centric AI Resource Hub: https://datacentricai.org
## Blogs
What Are the Data-Centric AI Concepts behind GPT Models?: https://towardsdatascience.com/what-are-the-data-centric-ai-concepts-behind-gpt-models-a590071bb727
Why We Should Care About Bad Data: https://blog.thegovlab.org/why-we-should-care-about-bad-data
Exploring Data Cleaning Techniques with Python: https://www.kdnuggets.com/2023/04/exploring-data-cleaning-techniques-python.html
Cool Vendors in Data-Centric AI: https://www.gartner.com/en/documents/4022242
Data-Centric Strategies for Supply Chain Planning Improvements: https://www.gartner.com/en/supply-chain/insights/power-of-the-profession-blog/data-centric-strategies-for-supply-chain-planning-improvements
Solving the Data-Centric versus Model-Centric AI Governance Debate: https://www.collibra.com/us/en/blog/ai-governance-solving-the-data-centric-versus-model-centric-debate
MLOps 3.3: Data-Centric Approach for Machine Learning Modeling: https://towardsai.net/p/l/mlops-3-3-data-centric-approach-for-machine-learning-modelling
MLOps: Towards DevOps for Data-Centric AI: https://snorkel.ai/mlops-towards-devops-for-data-centric-ai-with-ce-zhang/
Machine Learning Explainability vs Interpretability: https://www.kdnuggets.com/2018/12/machine-learning-explainability-interpretability-ai.html
Tools for Explainability and Transparency: https://www.aiforpeople.org/tools-for-explainability-and-transparency/
Detect Data Drift on Datasets: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-monitor-datasets?view=azureml-api-1
Responsible AI Toolbox: https://github.com/microsoft/responsible-ai-toolbox
President Biden's Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence: https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
Data-Centric AI: Perspectives and Challenges: https://epubs.siam.org/doi/abs/10.1137/1.9781611977653.ch106
Active-Learning-as-a-Service: An Automatic and Efficient MLOps System for Data-Centric AI: https://arxiv.org/abs/2207.09109
Potential Impact of Data Centric AI on Society: https://technologyandsociety.org/potential-impact-of-data-centric-ai-on-society/
Explainable and Interpretable Anomaly Detection Models for Production Data: https://onepetro.org/SJ/article/27/01/349/473036
Data Acquisition
Azure Data Factory: https://azure.microsoft.com/en-us/products/data-factory
Azure Stream Analytics: https://azure.microsoft.com/en-us/products/stream-analytics
Data Preparation/Validation
Azure Data Studio: https://azure.microsoft.com/en-us/products/data-studio
Microsoft Power BI on Azure: https://azure.microsoft.com/en-us/products/power-bi
Model Development/Validation
Microsoft Machine Learning Studio: https://studio.azureml.net/
Model Training/Model Deployment
Azure Machine Learning: https://azure.microsoft.com/en-us/products/machine-learning
Model/Data Monitoring
Azure Monitor: https://azure.microsoft.com/en-us/products/monitor
Infrastructure and Management
Azure Network Watcher: https://learn.microsoft.com/en-us/azure/network-watcher/network-watcher-overview