
LinkedInLearning/data-centric-ai-4516168


Data-Centric AI: Best Practices, Responsible AI, and more

This is the repository for the LinkedIn Learning course Data-Centric AI: Best Practices, Responsible AI, and more. The full course is available from LinkedIn Learning.


Machine learning typically focuses on producing effective models for a given dataset. In real-world applications, however, data is messy, and improving models is not the only way to get better performance. Data-centric AI (DCAI) is an emerging science that studies techniques to improve datasets, which is often the best way to improve performance in practical ML applications. While data scientists have long practiced this manually through ad hoc trial and error and intuition, DCAI treats the improvement of data as a systematic engineering discipline. In this course, Aishwarya Srinivasan covers the data-centric principles that guide the shift from a model-centric approach to a data-centric paradigm. Learn what DCAI is and the value it offers. Aishwarya covers the DCAI workflow; MLOps as part of DCAI; data validation and preprocessing; model validation; bias detection and mitigation; responsible AI; and more.

See the readme file in the main branch for updated instructions and information.

Code Instructions

The code example in Chapters 6, 7, and 8 uses a GitHub Codespace to launch a Jupyter Notebook containing the code and the Maternal Health Risk dataset. To launch the notebook, click Code and then open the Codespaces tab.

The Maternal Health Risk dataset is from:

Ahmed, Marzia. (2023). Maternal Health Risk. UCI Machine Learning Repository. https://doi.org/10.24432/C5DP5D.
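
As a rough illustration of the data validation step the course covers, the sketch below runs a few basic checks (missing values, out-of-range values, unknown labels, duplicate rows) on a handful of made-up rows. It is not the course notebook. The column names are assumed to match the Maternal Health Risk dataset, and the sample rows, value ranges, and the `validate` helper are hypothetical and for illustration only.

```python
# Minimal data validation sketch (standard library only).
# Assumptions: column names mirror the Maternal Health Risk dataset;
# ROWS, RANGES, and validate() are hypothetical, not from the course notebook.

ROWS = [
    {"Age": 25, "SystolicBP": 130, "DiastolicBP": 80, "BS": 15.0,
     "BodyTemp": 98.0, "HeartRate": 86, "RiskLevel": "high risk"},
    {"Age": 16, "SystolicBP": 120, "DiastolicBP": 75, "BS": 7.9,
     "BodyTemp": 98.0, "HeartRate": 7, "RiskLevel": "low risk"},
    {"Age": 35, "SystolicBP": 140, "DiastolicBP": 90, "BS": None,
     "BodyTemp": 98.0, "HeartRate": 70, "RiskLevel": "high risk"},
    {"Age": 25, "SystolicBP": 130, "DiastolicBP": 80, "BS": 15.0,
     "BodyTemp": 98.0, "HeartRate": 86, "RiskLevel": "high risk"},
    {"Age": 30, "SystolicBP": 120, "DiastolicBP": 80, "BS": 6.9,
     "BodyTemp": 98.0, "HeartRate": 76, "RiskLevel": "medium"},
]

# Illustrative plausibility ranges only; these are not clinical thresholds.
RANGES = {
    "Age": (10, 70),
    "SystolicBP": (70, 200),
    "DiastolicBP": (40, 130),
    "BS": (3.0, 25.0),
    "BodyTemp": (95.0, 105.0),
    "HeartRate": (40, 200),
}
ALLOWED_LABELS = {"low risk", "mid risk", "high risk"}


def validate(rows):
    """Return a list of (row_index, column, problem) tuples."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        # Check each numeric column for missing or out-of-range values.
        for col, (lo, hi) in RANGES.items():
            value = row.get(col)
            if value is None:
                issues.append((i, col, "missing"))
            elif not lo <= value <= hi:
                issues.append((i, col, "out of range"))
        # Check that the target label is one of the expected classes.
        if row.get("RiskLevel") not in ALLOWED_LABELS:
            issues.append((i, "RiskLevel", "unknown label"))
        # Flag exact duplicates of an earlier row.
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key in seen:
            issues.append((i, "*", "duplicate row"))
        seen.add(key)
    return issues


if __name__ == "__main__":
    for issue in validate(ROWS):
        print(issue)
```

In a data-centric workflow, a report like this would typically be reviewed before any model training, so that suspicious rows are corrected or removed at the source rather than compensated for in the model.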

Resources and Additional Links

Microsoft Founders Hub: https://www.microsoft.com/en-us/startups

Data-Centric AI Resource Hub: https://datacentricai.org

Blogs

What Are the Data-Centric AI Concepts behind GPT Models?: https://towardsdatascience.com/what-are-the-data-centric-ai-concepts-behind-gpt-models-a590071bb727

Why We Should Care About Bad Data: https://blog.thegovlab.org/why-we-should-care-about-bad-data

Explore Data Cleaning Techniques with Python: https://www.kdnuggets.com/2023/04/exploring-data-cleaning-techniques-python.html

Cool Vendors in Data-Centric AI: https://www.gartner.com/en/documents/4022242

Data-Centric Strategies for Supply Chain Planning Improvements: https://www.gartner.com/en/supply-chain/insights/power-of-the-profession-blog/data-centric-strategies-for-supply-chain-planning-improvements

Solving the Data-Centric versus Model-Centric AI Governance Debate: https://www.collibra.com/us/en/blog/ai-governance-solving-the-data-centric-versus-model-centric-debate

MLOps 3.3: Data-Centric Approach for Machine Learning Modeling: https://towardsai.net/p/l/mlops-3-3-data-centric-approach-for-machine-learning-modelling

MLOps: Towards DevOps for Data-Centric AI: https://snorkel.ai/mlops-towards-devops-for-data-centric-ai-with-ce-zhang/

Machine Learning Explainability vs Interpretability: https://www.kdnuggets.com/2018/12/machine-learning-explainability-interpretability-ai.html

Tools for Explainability and Transparency: https://www.aiforpeople.org/tools-for-explainability-and-transparency/

Detect Data Drift on Datasets: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-monitor-datasets?view=azureml-api-1

Responsible AI Toolbox: https://github.com/microsoft/responsible-ai-toolbox

EU AI Act: https://www.europarl.europa.eu/news/en/headlines/society/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence

President Biden Executive Order: https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/

Research Papers

Data-Centric AI: Perspectives and Challenges: https://epubs.siam.org/doi/abs/10.1137/1.9781611977653.ch106

Active-Learning-as-a-Service: An Automatic and Efficient MLOps System for Data-Centric AI: https://arxiv.org/abs/2207.09109

Potential Impact of Data Centric AI on Society: https://technologyandsociety.org/potential-impact-of-data-centric-ai-on-society/

Explainable and Interpretable Anomaly Detection Models for Production Data: https://onepetro.org/SJ/article/27/01/349/473036

Tools

Data Acquisition

Azure Data Factory: https://azure.microsoft.com/en-us/products/data-factory

Azure Stream Analytics: https://azure.microsoft.com/en-us/products/stream-analytics

Data Preparation/Validation

Azure Data Studio: https://azure.microsoft.com/en-us/products/data-studio

Microsoft Power BI on Azure: https://azure.microsoft.com/en-us/products/power-bi

Model Development/Validation

Microsoft Machine Learning Studio: https://studio.azureml.net/

Model Training/Model Deployment

Azure Machine Learning: https://azure.microsoft.com/en-us/products/machine-learning

Model/Data Monitoring

Azure Monitor: https://azure.microsoft.com/en-us/products/monitor

Infrastructure and Management

Azure Network Watcher: https://learn.microsoft.com/en-us/azure/network-watcher/network-watcher-overview
