Bizovi/decision-making

Decision-making under uncertainty

This repository emerges out of teaching data science to students of various backgrounds and my practice in the industry. I aspire to contribute to the understanding of this complex landscape and teach people how to navigate it, how to develop valuable skills, and become more effective at problem-solving.

As outlined on the course website, we'll be contemplating in the library and engineering in the trenches, so here are lecture thumbnails, along with suggested practices and readings. I recommend starting your journey with the statistical fundamentals, as I re-contextualize them and build on top of them towards more sophisticated, but interpretable models that aid decision-makers. To learn more about the interdisciplinary approach to decision-making, read the course philosophy.

  • Module I: Business Decisions and Data Science
  • Module II: Probability and Statistics Fundamentals
  • Module III: A/B Testing and Experiment Design
  • Module IV: Bayesian Hierarchical Models
  • Module V: Machine Learning and Special Topics
  • Module VI: Full Stack Data Apps

The repository will go through many changes as we go through the journey together, but you can get a sneak-peek of what it's about in the /playground directory.

I: Business Decisions and Data Science

The slides for the first, short module are completed in /slides and will be moved and published on the course website repo.

Lecture 1: Introduction and Course Overview

  • First, I justify why what we really want is "decision science"
  • What is the course about and why should you care?
  • Conversations about industries, domains, and applications
  • Teaching approach, learning how to learn, and course philosophy
(Fig.1) - Learn what Pollock and Picasso have to do with statistics and ML
(Fig.2) - Learn how everything you learned before fits together into a coherent whole

Lecture 2: Business Context and Strategy

The second lecture is also conceptual, as we explore and articulate the hard choices businesses face. I then bring some clarity to the big, interdisciplinary picture of AI.

  • It is important to understand AI in the context of business decisions and strategy
  • Read here for the difference between Analytics, Statistics, and ML.
  • The lecture is filled with hard-learned lessons and multiple tools for figuring out a good strategy: both for the business and AI

Reading and Practice

(Py) Lab 1: Setting up the local environment

Reading and Practice

First, you have to be confident and comfortable with your local development tooling. Invest an hour in understanding conda and typing in the commands, and you will benefit for a decade!

(Py) Lab 2: The python ecosystem

  • Functional programming ideas in the context of numpy, pandas
  • The great and terrible matplotlib
(Fig.10) - Practicing the tools for modeling and operationalization of models
(Fig.11) - Getting comfortable with the idea of literate programming and learning the tools which make this whole zoo of technologies run harmoniously

II: Probability and Statistics Fundamentals

Lecture 3: The Probabilistic Multiverse

The third lecture is also conceptual, but in a more mathematical sense, as I attempt to build the bridge between reality and the language of uncertainty (probability theory).

(Fig.3) - How many people will show up to safari? notebook here
(Fig.4) - We discussed the importance of visual storytelling: relevance, persuasiveness, truthfulness, and aesthetics.
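
A question like the one in Fig.3 can be explored by simulation. What follows is a minimal sketch, not the course notebook: it assumes a hypothetical 20 bookings, with each person showing up independently with probability 0.85 (both numbers are invented for illustration), and estimates the PMF of the head count from repeated draws.

```python
import random
from collections import Counter

random.seed(42)  # reproducible draws

N_BOOKED = 20     # assumption: number of people who signed up
P_SHOW_UP = 0.85  # assumption: each person shows up independently with this probability
N_SIMS = 10_000   # number of simulated days


def simulate_attendance(n_booked, p_show_up):
    """One simulated day: count how many of the booked people show up."""
    return sum(random.random() < p_show_up for _ in range(n_booked))


draws = [simulate_attendance(N_BOOKED, P_SHOW_UP) for _ in range(N_SIMS)]
counts = Counter(draws)

# Empirical PMF: share of simulated days with each head count
pmf = {k: counts[k] / N_SIMS for k in sorted(counts)}
for k, p in pmf.items():
    print(f"{k:2d} people: {p:.3f}")

mean_attendance = sum(draws) / N_SIMS
print(f"average attendance: {mean_attendance:.2f}")
```

Under these assumptions the simulated average should land near 20 × 0.85 = 17, which is a quick sanity check on the simulation.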

Reading and Practice

  • Read about a few fundamental ideas and concepts in probability and why we need them here
  • To assess if you need a refresher over probability and statistics, look at this study guide

There are three amazing resources which you can use as reference and inspiration for introductory to intermediate probability and mathematical statistics. All three have recorded video lectures; the first two also come with a freely available book and code:

  • Statistics 110: Probability by Joe Blitzstein (Harvard), with R code. Great stories behind probabilities, numerous examples of applications, and accessible proofs.
  • Probability for Data Science by Stanley Chan (Purdue), with Python code. Amazing graphics, visualizations, accessible and extensive mathematical treatment.
  • Probability by Santosh Venkatesh (University of Pennsylvania), once available on Coursera, now on YouTube. Great real-world examples from numerous domains, gentle build-up towards more complicated concepts. Unfortunately, no code or book -- but you can combine this playlist with one of the above.

Lab 3: Probability DAGs and Simulations

If you have conda installed on Linux, macOS, or WSL2 on Windows, the easiest way to play around with the notebook is to recreate the environment from the yml file. Then, you can either create a kernel or connect to the environment from VS Code notebooks and start hacking.

# clone the repository and move into the playground directory
git clone https://github.com/bizovi/decision-making.git
cd decision-making/playground

# recreate and activate the conda environment
conda env create --file conda-env.yml
conda activate gpa-prob

# if using JupyterLab, register the environment as a kernel
python -m ipykernel install --user \
    --name="gpa-kernel" \
    --display-name="Kernel for Simulations"

# run the test suite and see if everything works as expected
python -m pytest

Lecture 4: Think like a Bayesian

Statistics is the art and science of changing your mind and actions in the face of evidence. We're going to declare our assumptions and apply Bayes' theorem to weigh the information from data against our prior beliefs.

We're still in the land of probability and generative models, but a step closer towards making inferences about parameters and latent quantities, in order to answer the research questions.
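
To make the weighing of data against prior beliefs concrete, here is a small sketch of Bayes' theorem for a diagnostic test. The prevalence, sensitivity, and false positive rate are illustrative assumptions and are not taken from the lecture; the function name is hypothetical as well.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    # Total probability of a positive test: true positives + false positives
    p_positive = (
        sensitivity * prevalence
        + false_positive_rate * (1 - prevalence)
    )
    return sensitivity * prevalence / p_positive


# Assumed numbers: 1% prevalence, 95% sensitivity, 5% false positive rate
p = posterior_given_positive(
    prevalence=0.01, sensitivity=0.95, false_positive_rate=0.05
)
print(f"P(disease | positive) = {p:.3f}")  # prints 0.161
```

Even with a fairly accurate test, the low prior (prevalence) keeps the posterior around 16% under these assumed numbers, because false positives from the large healthy group outnumber true positives.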

(Fig.5) - Bayes' Theorem and Rare Diseases. Inverse probabilities and conditioning. notebook here
(Fig.6) - How confident am I that the code has no bugs after x tests pass? Grids and point estimates
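
The grid idea behind Fig.6 can be sketched as follows. The model here is an assumption made for illustration and may differ from the one used in the lecture: each test independently fails with an unknown probability theta, we start from a flat prior over theta, and we observe x = 10 passing tests with no failures.

```python
N_TESTS_PASSED = 10  # assumption: x = 10 passing tests, no failures
GRID_SIZE = 1001

# Grid of candidate values for theta, the per-test failure probability
thetas = [i / (GRID_SIZE - 1) for i in range(GRID_SIZE)]

prior = [1.0] * GRID_SIZE                                 # assumption: flat prior
likelihood = [(1 - t) ** N_TESTS_PASSED for t in thetas]  # P(all x tests pass | theta)

# Bayes' theorem on the grid: posterior is proportional to prior * likelihood
unnormalized = [p * l for p, l in zip(prior, likelihood)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

# Two point estimates read off the grid
posterior_mean = sum(t * w for t, w in zip(thetas, posterior))
map_estimate = thetas[max(range(GRID_SIZE), key=posterior.__getitem__)]

print(f"posterior mean of theta: {posterior_mean:.3f}")
print(f"MAP estimate of theta:   {map_estimate:.3f}")
```

The two point estimates disagree: the most probable value is theta = 0, while the posterior mean stays close to 1/(x + 2), the exact answer for a flat prior. That gap is one reason to look at the whole distribution and not a single number.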

Lecture 5: Full Luxury Bayes and Large N

It's time we move away from point estimates, towards a full posterior distribution, which captures the uncertainty in our estimates and can be used to make predictions about observable quantities.

A few important ideas to add to your conceptual understanding:

  • Parameter (estimand), estimator, estimation
  • De Moivre: "The most dangerous equation": are U.S. schools too big?
  • What does a statistician want? Properties of estimators.
  • Most practical applications won't have an analytic solution, so we have to use a probabilistic programming language like pymc to draw samples from the posterior
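
De Moivre's equation from the list above says that the standard error of a mean is σ/√n, so small groups produce more extreme averages by chance alone. A minimal simulation sketch, with made-up numbers (scores with mean 50 and standard deviation 10, schools of 25 and 400 students, all schools identical in quality):

```python
import math
import random
import statistics

random.seed(0)  # reproducible draws

MU, SIGMA = 50.0, 10.0  # assumption: every student's score is Normal(50, 10)
N_SCHOOLS = 1000        # assumption: number of simulated schools per size


def school_means(n_students):
    """Average score of each simulated school with n_students students."""
    return [
        statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n_students)])
        for _ in range(N_SCHOOLS)
    ]


small = school_means(25)
large = school_means(400)

spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)

print(f"small schools: spread {spread_small:.2f}, theory {SIGMA / math.sqrt(25):.2f}")
print(f"large schools: spread {spread_large:.2f}, theory {SIGMA / math.sqrt(400):.2f}")
print(f"best small school: {max(small):.1f}, best large school: {max(large):.1f}")
```

In this setup the top of a ranking by average score tends to be dominated by small schools, even though no school is better than any other. The same holds for the bottom of the ranking.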
(Fig.7) - The greatest theorem never told, adapted and refactored from Cam Davidson-Pilon (upcoming!)
(Fig.8) - Conjugate priors and the idea of Bayesian updating. Full luxury Bayes: automatic sampling, thoughtful modeling
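
The conjugate updating idea in Fig.8 can be sketched with the Beta-Binomial model, one of the few cases where the posterior has an analytic form. This is a plain-Python stand-in, not PyMC code; the prior, the data batches, and the helper name `update_beta` are assumptions made for illustration.

```python
import random

random.seed(1)  # reproducible draws


def update_beta(a, b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus Binomial data gives a Beta posterior."""
    return a + successes, b + failures


a0, b0 = 1, 1  # assumption: flat Beta(1, 1) prior

# Bayesian updating: yesterday's posterior is today's prior
a1, b1 = update_beta(a0, b0, successes=7, failures=3)   # first batch of data
a2, b2 = update_beta(a1, b1, successes=12, failures=8)  # second batch of data

posterior_mean = a2 / (a2 + b2)

# Draw from the full posterior to summarize uncertainty, not just a point estimate
samples = sorted(random.betavariate(a2, b2) for _ in range(10_000))
lower, upper = samples[250], samples[9749]  # approximate central 95% interval

print(f"posterior: Beta({a2}, {b2}), mean {posterior_mean:.3f}")
print(f"approximate 95% credible interval: ({lower:.3f}, {upper:.3f})")
```

Updating in two batches gives the same posterior as updating once with all the data. For models without such a closed form, a probabilistic programming language draws the posterior samples for you.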

About

Artificial Intelligence and Cybernetics for Business Decision-Making under Uncertainty
