Comparison of Empirical Probability Distributions

Quick preview

  • Author: Sylvain Combettes
  • Dates: Oct. 2019 - Feb. 2020 (5 months)
  • Context: For my final-year project at Mines Nancy (one day per week), I carried out research for the CNRS, the largest governmental research organisation in France.
  • Topic: Comparison of empirical probability distributions. Application to the Choquet integral with stochastic inputs.
  • Methods: Integral probability metrics (e.g. Kantorovich metric), f-divergences (e.g. Kullback-Leibler).
  • Programming: Python.
  • Result: We empirically show that a new method for simulating the Choquet integral is "correct".
  • Links: [full 62-page report] [slides]

Abstract

This repository complements the report of my final-year project at École des Mines de Nancy. The end goal of the project is to compare two empirical probability distributions obtained from two different methods for computing the Choquet integral.

The first chapter is about the Choquet integral, a non-linear aggregation operator. We provide detailed explanations and examples so that someone new to the Choquet integral can get a good understanding of it.
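
To make the definition concrete, here is a minimal Python sketch of the discrete Choquet integral (not the repository's implementation); the two-criteria capacity `mu` and the input vector below are illustrative values chosen for the example, not data from the report.

```python
def choquet_integral(x, mu):
    """Discrete Choquet integral of an input vector x = (x_1, ..., x_n),
    x_i >= 0, with respect to a capacity mu: a dict mapping frozensets of
    criterion indices to [0, 1], with mu[empty set] = 0 and mu[full set] = 1."""
    n = len(x)
    # Sort criterion indices by increasing input value: x_(1) <= ... <= x_(n).
    order = sorted(range(n), key=lambda i: x[i])
    total, prev = 0.0, 0.0
    for k, i in enumerate(order):
        # A_(k): coalition of criteria whose value is >= x_(k).
        coalition = frozenset(order[k:])
        total += (x[i] - prev) * mu[coalition]
        prev = x[i]
    return total

# Illustrative 2-criteria capacity (hypothetical, for the example only).
mu = {frozenset(): 0.0, frozenset({0}): 0.3,
      frozenset({1}): 0.5, frozenset({0, 1}): 1.0}
print(choquet_integral([0.4, 0.9], mu))  # 0.4 * 1.0 + (0.9 - 0.4) * 0.5 = 0.65
```

The weight applied to each increment depends on which coalition of criteria exceeds that level, which is what makes the operator non-linear.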

The second chapter is about integral probability metrics (IPMs), a popular class of distance measures between probability distributions. In particular, we deal with the Kantorovich metric and the Dudley metric. We also study the empirical estimation of the Kantorovich metric and implement it in Python.
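
As an illustration of what such an estimator computes (this is a sketch, not necessarily the repository's code): for two one-dimensional samples of equal size, the empirical Kantorovich (Wasserstein-1) metric reduces to the mean absolute difference of the sorted samples, since the optimal coupling pairs the order statistics. The Gaussian samples below are illustrative.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def empirical_kantorovich(x, y):
    """Empirical Kantorovich (Wasserstein-1) metric between two 1-D samples
    of equal size: the optimal coupling pairs the order statistics, so the
    distance is the mean absolute difference of the sorted samples."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=10_000)
b = rng.normal(0.5, 1.0, size=10_000)
print(empirical_kantorovich(a, b))  # close to the true W1 distance of 0.5
print(wasserstein_distance(a, b))   # SciPy's estimator, as a cross-check
```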

The third chapter is about f-divergences, another family of distance measures between probability distributions. In particular, we deal with the Kullback-Leibler divergence, the Hellinger distance and the variational (total variation) distance. We also study the empirical estimation of these f-divergences and implement them in Python.
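
For intuition, the following sketch (again illustrative, not the repository's code) computes plug-in estimates of these three f-divergences from histogram density estimates on shared bins. Restricting the Kullback-Leibler sum to bins where both estimates are positive is a crude fix: the true divergence is infinite whenever p puts mass where q does not.

```python
import numpy as np

def histogram_f_divergences(x, y, bins=50):
    """Plug-in estimates of three f-divergences between the distributions of
    two 1-D samples, based on histogram estimates over shared bins."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    p, edges = np.histogram(x, bins=bins, range=(lo, hi))
    q, _ = np.histogram(y, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    # Kullback-Leibler divergence: KL(p || q) = sum_i p_i * log(p_i / q_i),
    # restricted here to bins where both estimates are positive (crude fix).
    m = (p > 0) & (q > 0)
    kl = np.sum(p[m] * np.log(p[m] / q[m]))
    # Hellinger distance: H(p, q) = sqrt((1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2).
    hellinger = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
    # Variational (total variation) distance: V(p, q) = (1/2) * sum_i |p_i - q_i|.
    variational = 0.5 * np.sum(np.abs(p - q))
    return kl, hellinger, variational

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=10_000)
b = rng.normal(0.5, 1.0, size=10_000)
print(histogram_f_divergences(a, b))
```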

The fourth (and last) chapter applies the previous results on IPMs and f-divergences to the data obtained from the two methods for computing the Choquet integral.

How to use this repository

We recommend reading the report before the notebooks, as most explanations are not duplicated in the notebooks. All notebooks are in Python 3. Following the structure of the report, the files should be read in this order:

  1. ipm-prerequisite.ipynb: an introductory notebook that helps to understand ipm.ipynb better
  2. ipm.ipynb: core programs that generated the simulations in chapter II about the Kantorovich metric (an IPM) and also a part of chapter IV
  3. f-divergences.ipynb: core programs that generated the simulations in chapter III about the f-divergences and also a part of chapter IV

More information is given at the beginning of the notebooks themselves.
