In [None]:
#| default_exp estimators

In [None]:
#| export
#| include: false
#| echo: false
from __future__ import annotations # allows multiple typing of arguments in Python versions prior to 3.10

In [None]:
#| include: false
#| echo: false
! [ -e /content ] && pip install -Uqq gingado nbdev # install or upgrade gingado on colab

In [None]:
#| include: false
%load_ext autoreload
%autoreload 2

In [None]:
#| include: false
from nbdev.showdoc import show_doc

Economists often want to use machine learning models for purposes that go beyond accurate prediction. For example:

- understanding the relationship between covariates and the outcome ("*coefficient-focused tasks*"), usually to demonstrate that a non-trivial effect of one variable on another exists.

- estimating a certain measure with certain desirable statistical and econometric properties ("*measure-focused tasks*"), where the object of interest is the predicted outcome of an adapted algorithm.

- identifying which covariates are related or not to a certain outcome ("*covariate-selection tasks*"), often to demonstrate the relevance of a certain theory.

- processing non-traditional data (e.g., text) for inclusion in a traditional econometric regression ("*covariate-processing tasks*"), which is especially useful in settings where measurable quantitative data is complemented by this other type of data.

The `gingado.estimators` module contains machine learning algorithms adapted to enable the types of analyses described above. More estimators can be expected over time.

# Covariate-selection tasks

## Clustering

Here the clustering algorithms themselves are the general-purpose methods, used unchanged. What `gingado` adds is a convenience layer that finds and retains the other variables in the same cluster as a variable of interest.

These variables are usually individuals or entities (countries, stocks, etc) in a larger population.

The `gingado` clustering routines can be used on their own or integrated as a step in a pipeline.

There are three levels of sophistication that users can choose from:

- using the off-the-shelf clustering routines provided by `gingado`, which were selected to be applicable across various use cases;

- selecting an existing clustering routine from the [`sklearn.cluster`](https://scikit-learn.org/stable/modules/clustering.html) module; or

- designing their own clustering algorithm.
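The three options can be sketched as follows. Judging from `FindCluster.fit`, the only requirement on the algorithm is that it exposes `fit` and stores one label per row in `labels_`. `MedianSplit` is a hypothetical toy clusterer written for this illustration; it is not part of `gingado` or scikit-learn, and the data is synthetic.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin
from sklearn.cluster import AffinityPropagation, KMeans


class MedianSplit(ClusterMixin, BaseEstimator):
    """Toy clusterer (hypothetical): split rows by whether their mean exceeds the median row mean."""

    def fit(self, X, y=None):
        row_means = np.asarray(X, dtype=float).mean(axis=1)
        # scikit-learn convention: one integer label per row, stored in `labels_`
        self.labels_ = (row_means > np.median(row_means)).astype(int)
        return self


# 1. the off-the-shelf default, 2. another scikit-learn algorithm, 3. a user-designed one
candidates = [
    AffinityPropagation(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
    MedianSplit(),
]

# Synthetic entities in rows: five centred on 0 and five centred on 5
rng = np.random.default_rng(0)
entities = np.vstack([
    rng.normal(0, 1, size=(5, 20)),
    rng.normal(5, 1, size=(5, 20)),
])

for alg in candidates:
    labels = alg.fit(entities).labels_
    print(type(alg).__name__, labels)
```

Any of these objects could then be passed to `FindCluster` through its `cluster_alg` argument.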

In [None]:
#| include: false
#| export

import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin
from sklearn.cluster import AffinityPropagation
from sklearn.utils.metaestimators import available_if

In [None]:
#| include: false
#| export

class FindCluster(BaseEstimator):
    "Retain only the columns of `X` that are in the same cluster as `y`."

    def __init__(
        self,
        cluster_alg:[BaseEstimator,ClusterMixin]=AffinityPropagation(), # An instance of the clustering algorithm to use
        random_state:int|None=None, # The random seed to be used by the algorithm, if relevant
    ):
        self.cluster_alg = cluster_alg
        self.random_state = random_state
        if hasattr(self.cluster_alg, "random_state"):
            self.cluster_alg.set_params(random_state=self.random_state)


    def fit(
        self,
        X, # The population of entities, organised in columns
        y # The entity of interest
    ):
        "Fit `FindCluster`"
        temp_y_colname = "gingado_ycol"

        X = X.copy() # work on a copy so the caller's DataFrame is not modified
        X[temp_y_colname] = y

        entities = X.columns
        y_mask = entities == temp_y_colname

        self.cluster_alg.fit(X.T)

        cluster = entities[self.cluster_alg.labels_ == self.cluster_alg.labels_[y_mask]]
        self.same_cluster_ = [e for e in cluster if e != temp_y_colname]
        return self

    def transform(
        self,
        X # The population of entities, organised in columns
    )->np.array: # Columns of `X` that are in the same cluster as `y`
        "Keep only the entities in `X` that belong to the same cluster as `y`"
        return X[self.same_cluster_]

    def fit_transform(
        self,
        X, # The population of entities, organised in columns
        y # The entity of interest
    )->np.array: # Columns of `X` that are in the same cluster as `y`
        "Fit a `FindCluster` object and keep only the entities in `X` that belong to the same cluster as `y`"
        self.fit(X, y)
        return self.transform(X)

In [None]:
show_doc(FindCluster)

---

### FindCluster

>      FindCluster ()

Retain only the columns of `X` that are in the same cluster as `y`.

In [None]:
show_doc(FindCluster.fit)

---

### FindCluster.fit

>      FindCluster.fit (X, y)

Fit `FindCluster`

|    | **Details** |
| -- | ----------- |
| X | The population of entities organised in columns |
| y | The entity of interest |

In [None]:
show_doc(FindCluster.transform)

---

### FindCluster.transform

>      FindCluster.transform ()

Returns version of `X` keeping only entities in the same cluster as `y`

In [None]:
show_doc(FindCluster.fit_transform)

---

### FindCluster.fit_transform

>      FindCluster.fit_transform (X, y)

The country-level dataset of @BARRO19941 is used to illustrate `FindCluster`. Let's use it to answer the following question: for a given country, which other countries are most similar to it according to the available data?

First, we import the data:

In [None]:
from gingado.datasets import load_BarroLee_1994

The data is organized by rows: each row is a different country, and the variables are organised in columns. 

The dataset is originally organised for a regression of GDP growth (here denoted `y`) on the covariates (`X`). This is not what we want to do in this case. So instead of keeping GDP growth as a separate variable, the next step is to include it in the `X` DataFrame as the column `gdp`.

In [None]:
X, y = load_BarroLee_1994()
X['gdp'] = y
X.head()

Unnamed: 0.1,Unnamed: 0,gdpsh465,bmp1l,freeop,freetar,h65,hm65,hf65,p65,pm65,pf65,s65,sm65,sf65,fert65,mort65,lifee065,gpop1,fert1,mort1,invsh41,geetot1,geerec1,gde1,govwb1,govsh41,gvxdxe41,high65,highm65,highf65,highc65,highcm65,highcf65,human65,humanm65,humanf65,hyr65,hyrm65,hyrf65,no65,nom65,nof65,pinstab1,pop65,worker65,pop1565,pop6565,sec65,secm65,secf65,secc65,seccm65,seccf65,syr65,syrm65,syrf65,teapri65,teasec65,ex1,im1,xr65,tot1,gdp
0,0,6.591674,0.2837,0.153491,0.043888,0.007,0.013,0.001,0.29,0.37,0.21,0.04,0.06,0.02,6.67,0.16,3.693867,0.0203,6.68,0.165,0.11898,0.0195,0.0176,0.019,0.0931,0.1158,0.07877,0.12,0.23,0.01,0.09,0.18,0.01,0.301,0.568,0.043,0.004,0.008,0.0,89.46,79.98,98.61,0.0,12359.0,0.3469,0.4441,0.027591,0.45,0.75,0.17,0.13,0.21,0.04,0.033,0.057,0.01,47.6,17.3,0.0729,0.0667,0.348,-0.014727,-0.024336
1,1,6.829794,0.6141,0.313509,0.061827,0.019,0.032,0.007,0.91,1.0,0.65,0.16,0.23,0.09,6.97,0.145,3.933784,0.0185,7.114,0.154,0.12048,0.0556,0.0369,0.019,0.1589,0.156,0.09999,0.7,1.18,0.2,0.63,1.04,0.2,0.706,1.138,0.257,0.027,0.045,0.008,89.1,82.35,96.1,0.02325,4630.0,0.2703,0.4474,0.035637,3.0,4.74,1.2,1.36,2.05,0.64,0.173,0.274,0.067,57.1,18.0,0.094,0.1438,0.525,0.00575,0.100473
2,2,8.895082,0.0,0.204244,0.009186,0.26,0.325,0.201,1.0,1.0,1.0,0.56,0.62,0.51,3.11,0.024,4.273884,0.0188,3.662,0.027,0.23098,0.0465,0.0365,0.04,0.1442,0.1367,0.06,16.67,17.95,15.41,4.5,5.7,3.31,8.317,8.249,8.384,0.424,0.473,0.375,1.4,1.4,1.4,0.0,19678.0,0.3874,0.3175,0.076685,36.74,33.5,39.95,15.68,13.19,18.14,2.573,2.478,2.667,26.5,20.7,0.1741,0.175,1.082,-0.01004,0.067051
3,3,7.565275,0.1997,0.248714,0.03627,0.061,0.07,0.051,1.0,1.0,1.0,0.24,0.22,0.31,6.26,0.072,4.168214,0.0345,6.83,0.085,0.12928,0.0375,0.035,0.011,0.1165,0.2018,0.15616,3.1,3.4,2.8,2.11,2.28,1.95,3.833,3.86,3.807,0.104,0.114,0.095,20.6,20.6,20.6,0.0,1482.0,0.3011,0.4671,0.031039,7.6,7.5,7.7,2.76,2.89,2.63,0.438,0.453,0.424,27.8,22.7,0.1265,0.1496,6.625,-0.002195,0.064089
4,4,7.162397,0.174,0.299252,0.037367,0.017,0.027,0.007,0.82,0.85,0.81,0.17,0.15,0.13,6.71,0.12,3.998201,0.031,6.816,0.131,0.07932,0.0257,0.0224,0.012,0.0971,0.169,0.13427,0.67,0.98,0.36,0.45,0.66,0.25,1.9,2.084,1.72,0.022,0.033,0.012,58.73,55.56,61.82,0.2,3006.0,0.3314,0.4561,0.026281,5.07,5.37,4.78,2.17,2.23,2.11,0.257,0.287,0.229,34.5,17.6,0.1211,0.1308,2.5,0.003283,0.02793


Now we remove the first column (an identifier) and transpose the DataFrame, so that countries are organized in columns.

Each country is identified by a number: 0, 1, ...

In [None]:
X = X.iloc[:, 1:]
countries = X.T
countries.columns = ['country_' + str(c) for c in countries.columns]
countries.head()

Unnamed: 0,country_0,country_1,country_2,country_3,country_4,country_5,country_6,country_7,country_8,country_9,country_10,country_11,country_12,country_13,country_14,country_15,country_16,country_17,country_18,country_19,country_20,country_21,country_22,country_23,country_24,country_25,country_26,country_27,country_28,country_29,country_30,country_31,country_32,country_33,country_34,country_35,country_36,country_37,country_38,country_39,...,country_50,country_51,country_52,country_53,country_54,country_55,country_56,country_57,country_58,country_59,country_60,country_61,country_62,country_63,country_64,country_65,country_66,country_67,country_68,country_69,country_70,country_71,country_72,country_73,country_74,country_75,country_76,country_77,country_78,country_79,country_80,country_81,country_82,country_83,country_84,country_85,country_86,country_87,country_88,country_89
gdpsh465,6.591674,6.829794,8.895082,7.565275,7.162397,7.21891,7.853605,7.70391,9.063463,8.15191,6.929517,7.237778,8.11582,7.271704,7.121252,6.977281,7.649693,8.056744,8.780941,6.287859,6.137727,8.12888,6.680855,7.177019,6.648985,6.879356,7.3473,6.725034,8.451053,8.602453,8.619027,8.733755,7.665753,7.998671,8.281977,8.627123,8.733111,8.144969,8.769973,8.632128,...,7.32185,6.783325,9.224933,7.880804,7.301148,7.448334,7.737616,8.184793,7.808323,9.229849,8.346168,7.30317,7.859027,7.998335,7.655864,7.675082,7.830028,8.498622,6.216606,8.414496,6.383507,8.782323,7.251345,7.511525,7.713785,6.728629,7.186144,8.326033,7.894691,7.17549,9.030974,8.995537,8.23483,8.332549,8.645586,8.991064,8.025189,9.030137,8.865312,8.912339
bmp1l,0.2837,0.6141,0.0,0.1997,0.174,0.0,0.0,0.2776,0.0,0.1484,0.0296,0.2151,0.4318,0.1689,0.1832,0.0962,0.0227,0.0208,0.2654,0.4207,0.1371,0.0,0.4713,0.0178,0.4762,0.2927,0.1017,0.0266,0.0,0.0,0.0,0.0,0.007,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.1432,0.5539,0.0,0.2927,0.1621,0.0,0.2231,0.0,0.1756,0.0,0.3199,0.3133,0.1222,1.6378,0.1345,0.0898,0.488,0.001,0.7557,0.0,0.3556,0.0,0.0516,0.1053,0.005,0.619,0.076,0.005,0.1062,0.0,0.0,0.0,0.0363,0.0,0.0,0.0,0.005,0.0,0.0,0.0
freeop,0.153491,0.313509,0.204244,0.248714,0.299252,0.258865,0.182525,0.215275,0.109614,0.110885,0.165784,0.078488,0.137482,0.164598,0.188016,0.204611,0.136287,0.197853,0.189867,0.130682,0.123818,0.16721,0.228424,0.18524,0.171181,0.179508,0.247626,0.179933,0.358556,0.416234,0.293138,0.30472,0.288405,0.345485,0.28844,0.371898,0.287587,0.235179,0.265778,0.282939,...,0.313509,0.157541,0.204244,0.248714,0.299252,0.258865,0.324171,0.182525,0.215275,0.109614,0.110885,0.165784,0.078488,0.137482,0.164598,0.188016,0.136287,0.189867,0.214345,0.374328,0.130682,0.16721,0.263813,0.228424,0.18524,0.171181,0.179508,0.321658,0.247626,0.179933,0.293138,0.30472,0.288405,0.345485,0.28844,0.371898,0.296437,0.265778,0.282939,0.150366
freetar,0.043888,0.061827,0.009186,0.03627,0.037367,0.02088,0.014385,0.029713,0.002171,0.028579,0.020115,0.011581,0.026547,0.044446,0.045678,0.077852,0.04673,0.037224,0.031747,0.109921,0.015897,0.003311,0.029328,0.015453,0.058937,0.035842,0.037392,0.046376,0.016468,0.014721,0.005517,0.011658,0.011589,0.006503,0.005995,0.014586,0.003998,0.009676,0.008629,0.005048,...,0.061827,0.026475,0.009186,0.03627,0.037367,0.02088,0.03266,0.014385,0.029713,0.002171,0.028579,0.020115,0.011581,0.026547,0.044446,0.045678,0.04673,0.031747,0.073495,0.0,0.109921,0.003311,0.045225,0.029328,0.015453,0.058937,0.035842,0.005106,0.037392,0.046376,0.005517,0.011658,0.011589,0.006503,0.005995,0.014586,0.013615,0.008629,0.005048,0.024377
h65,0.007,0.019,0.26,0.061,0.017,0.023,0.039,0.024,0.402,0.145,0.046,0.022,0.059,0.029,0.033,0.037,0.081,0.083,0.068,0.053,0.028,0.129,0.062,0.02,0.018,0.188,0.08,0.015,0.09,0.148,0.142,0.088,0.098,0.119,0.107,0.168,0.107,0.056,0.131,0.12,...,0.042,0.021,0.393,0.177,0.078,0.043,0.067,0.105,0.083,0.573,0.272,0.112,0.107,0.156,0.08,0.269,0.146,0.181,0.023,0.101,0.086,0.246,0.09,0.103,0.031,0.019,0.184,0.09,0.121,0.035,0.245,0.246,0.183,0.188,0.256,0.255,0.108,0.288,0.188,0.257


Suppose we are interested in country No 13. What other countries are similar to it?

First, country 13 needs to be carved out of the DataFrame with the other countries.

Second, we can now pass the larger DataFrame and country 13's data separately to an instance of `FindCluster`.

In [None]:
country_of_interest = countries.pop('country_13')


In [None]:
similar = FindCluster(AffinityPropagation(convergence_iter=5000))
similar

In [None]:
same_cluster = similar.fit_transform(X=countries, y=country_of_interest)

assert same_cluster.equals(similar.fit(X=countries, y=country_of_interest).transform(X=countries))

same_cluster



Unnamed: 0,country_2,country_9,country_41,country_48,country_49,country_52,country_60,country_64,country_66
gdpsh465,8.895082,8.151910,7.360740,6.469250,5.762051,9.224933,8.346168,7.655864,7.830028
bmp1l,0.000000,0.148400,0.418100,0.538800,0.600500,0.000000,0.319900,0.134500,0.488000
freeop,0.204244,0.110885,0.218471,0.153491,0.151848,0.204244,0.110885,0.164598,0.136287
freetar,0.009186,0.028579,0.027087,0.043888,0.024100,0.009186,0.028579,0.044446,0.046730
h65,0.260000,0.145000,0.032000,0.015000,0.002000,0.393000,0.272000,0.080000,0.146000
...,...,...,...,...,...,...,...,...,...
ex1,0.174100,0.052400,0.190500,0.069200,0.148400,0.255800,0.062500,0.052500,0.076400
im1,0.175000,0.052300,0.225700,0.074800,0.186400,0.241200,0.057800,0.057200,0.086600
xr65,1.082000,2.119000,3.949000,0.348000,7.367000,1.017000,36.603000,30.929000,40.500000
tot1,-0.010040,0.007584,0.205768,0.035226,0.007548,0.018636,0.014286,-0.004592,-0.007018


The default clustering algorithm used by `FindCluster` is affinity propagation [@frey2007clustering]. It is the algorithm of choice because it combines several desirable characteristics, in particular:

- the number of clusters is data-driven instead of set by the user,
- the number of entities in each cluster is also chosen by the model,
- all entities are part of a cluster, and
- each cluster might have a different number of entities.
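These properties can be seen in a small sketch on synthetic data. The blobs below are an assumption made for illustration and are unrelated to the Barro-Lee data. Note that no number of clusters is passed to the estimator.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Synthetic data (illustrative assumption): three groups of unequal size
X_blobs, _ = make_blobs(
    n_samples=[60, 25, 15],
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=0.8,
    random_state=0,
)

# There is no `n_clusters` argument: the number of clusters emerges from the data
# and from `preference`, which defaults to the median of the input similarities
ap = AffinityPropagation(random_state=0).fit(X_blobs)

cluster_ids, sizes = np.unique(ap.labels_, return_counts=True)
print("number of clusters found:", len(cluster_ids))
print("cluster sizes:", dict(zip(cluster_ids.tolist(), sizes.tolist())))
```

Lowering `preference` generally yields fewer, larger clusters, which is one way to tune the result without fixing the number of clusters directly.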

However, we may want to try different clustering algorithms. Let's compare the result above with the same analysis using DBSCAN [@ester1996density].

In [None]:
from sklearn.cluster import DBSCAN

In [None]:
similar_dbscan = FindCluster(cluster_alg=DBSCAN())
similar_dbscan

In [None]:
same_cluster_dbscan = similar_dbscan.fit_transform(X=countries, y=country_of_interest)

assert same_cluster_dbscan.equals(similar_dbscan.fit(X=countries, y=country_of_interest).transform(X=countries))

same_cluster_dbscan

Unnamed: 0,country_0,country_1,country_2,country_3,country_4,country_5,country_6,country_7,country_8,country_9,country_10,country_11,country_12,country_14,country_15,country_16,country_17,country_18,country_19,country_20,country_21,country_22,country_23,country_24,country_25,country_26,country_27,country_28,country_29,country_30,country_31,country_32,country_33,country_34,country_35,country_36,country_37,country_38,country_39,country_40,...,country_50,country_51,country_52,country_53,country_54,country_55,country_56,country_57,country_58,country_59,country_60,country_61,country_62,country_63,country_64,country_65,country_66,country_67,country_68,country_69,country_70,country_71,country_72,country_73,country_74,country_75,country_76,country_77,country_78,country_79,country_80,country_81,country_82,country_83,country_84,country_85,country_86,country_87,country_88,country_89
gdpsh465,6.591674,6.829794,8.895082,7.565275,7.162397,7.218910,7.853605,7.703910,9.063463,8.151910,6.929517,7.237778,8.115820,7.121252,6.977281,7.649693,8.056744,8.780941,6.287859,6.137727,8.128880,6.680855,7.177019,6.648985,6.879356,7.347300,6.725034,8.451053,8.602453,8.619027,8.733755,7.665753,7.998671,8.281977,8.627123,8.733111,8.144969,8.769973,8.632128,8.718991,...,7.321850,6.783325,9.224933,7.880804,7.301148,7.448334,7.737616,8.184793,7.808323,9.229849,8.346168,7.303170,7.859027,7.998335,7.655864,7.675082,7.830028,8.498622,6.216606,8.414496,6.383507,8.782323,7.251345,7.511525,7.713785,6.728629,7.186144,8.326033,7.894691,7.175490,9.030974,8.995537,8.234830,8.332549,8.645586,8.991064,8.025189,9.030137,8.865312,8.912339
bmp1l,0.283700,0.614100,0.000000,0.199700,0.174000,0.000000,0.000000,0.277600,0.000000,0.148400,0.029600,0.215100,0.431800,0.183200,0.096200,0.022700,0.020800,0.265400,0.420700,0.137100,0.000000,0.471300,0.017800,0.476200,0.292700,0.101700,0.026600,0.000000,0.000000,0.000000,0.000000,0.007000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.143200,0.553900,0.000000,0.292700,0.162100,0.000000,0.223100,0.000000,0.175600,0.000000,0.319900,0.313300,0.122200,1.637800,0.134500,0.089800,0.488000,0.001000,0.755700,0.000000,0.355600,0.000000,0.051600,0.105300,0.005000,0.619000,0.076000,0.005000,0.106200,0.000000,0.000000,0.000000,0.036300,0.000000,0.000000,0.000000,0.005000,0.000000,0.000000,0.000000
freeop,0.153491,0.313509,0.204244,0.248714,0.299252,0.258865,0.182525,0.215275,0.109614,0.110885,0.165784,0.078488,0.137482,0.188016,0.204611,0.136287,0.197853,0.189867,0.130682,0.123818,0.167210,0.228424,0.185240,0.171181,0.179508,0.247626,0.179933,0.358556,0.416234,0.293138,0.304720,0.288405,0.345485,0.288440,0.371898,0.287587,0.235179,0.265778,0.282939,0.150366,...,0.313509,0.157541,0.204244,0.248714,0.299252,0.258865,0.324171,0.182525,0.215275,0.109614,0.110885,0.165784,0.078488,0.137482,0.164598,0.188016,0.136287,0.189867,0.214345,0.374328,0.130682,0.167210,0.263813,0.228424,0.185240,0.171181,0.179508,0.321658,0.247626,0.179933,0.293138,0.304720,0.288405,0.345485,0.288440,0.371898,0.296437,0.265778,0.282939,0.150366
freetar,0.043888,0.061827,0.009186,0.036270,0.037367,0.020880,0.014385,0.029713,0.002171,0.028579,0.020115,0.011581,0.026547,0.045678,0.077852,0.046730,0.037224,0.031747,0.109921,0.015897,0.003311,0.029328,0.015453,0.058937,0.035842,0.037392,0.046376,0.016468,0.014721,0.005517,0.011658,0.011589,0.006503,0.005995,0.014586,0.003998,0.009676,0.008629,0.005048,0.024377,...,0.061827,0.026475,0.009186,0.036270,0.037367,0.020880,0.032660,0.014385,0.029713,0.002171,0.028579,0.020115,0.011581,0.026547,0.044446,0.045678,0.046730,0.031747,0.073495,0.000000,0.109921,0.003311,0.045225,0.029328,0.015453,0.058937,0.035842,0.005106,0.037392,0.046376,0.005517,0.011658,0.011589,0.006503,0.005995,0.014586,0.013615,0.008629,0.005048,0.024377
h65,0.007000,0.019000,0.260000,0.061000,0.017000,0.023000,0.039000,0.024000,0.402000,0.145000,0.046000,0.022000,0.059000,0.033000,0.037000,0.081000,0.083000,0.068000,0.053000,0.028000,0.129000,0.062000,0.020000,0.018000,0.188000,0.080000,0.015000,0.090000,0.148000,0.142000,0.088000,0.098000,0.119000,0.107000,0.168000,0.107000,0.056000,0.131000,0.120000,0.146000,...,0.042000,0.021000,0.393000,0.177000,0.078000,0.043000,0.067000,0.105000,0.083000,0.573000,0.272000,0.112000,0.107000,0.156000,0.080000,0.269000,0.146000,0.181000,0.023000,0.101000,0.086000,0.246000,0.090000,0.103000,0.031000,0.019000,0.184000,0.090000,0.121000,0.035000,0.245000,0.246000,0.183000,0.188000,0.256000,0.255000,0.108000,0.288000,0.188000,0.257000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ex1,0.072900,0.094000,0.174100,0.126500,0.121100,0.063400,0.034200,0.086400,0.059400,0.052400,0.056000,0.027000,0.080400,0.077500,0.066800,0.087200,0.055700,0.317800,0.020100,0.029800,0.057000,0.020600,0.229500,0.017800,0.069500,0.086000,0.055800,0.168700,0.262900,0.109900,0.147200,0.071200,0.219500,0.075200,0.281600,0.322900,0.048000,0.206400,0.151800,0.169400,...,0.134500,0.310500,0.255800,0.152600,0.114800,0.085500,0.235100,0.033500,0.100600,0.078300,0.062500,0.107100,0.035700,0.078300,0.052500,0.090600,0.076400,0.213100,0.023200,0.595800,0.018800,0.103200,0.073000,0.090300,0.192200,0.028100,0.070300,0.747000,0.079700,0.063600,0.166200,0.259700,0.104400,0.286600,0.129600,0.440700,0.166900,0.323800,0.184500,0.187600
im1,0.066700,0.143800,0.175000,0.149600,0.130800,0.076200,0.042800,0.093100,0.046000,0.052300,0.082600,0.027500,0.093000,0.078000,0.078700,0.093800,0.062400,0.158300,0.034100,0.029700,0.060900,0.061800,0.199000,0.063400,0.072800,0.089800,0.061300,0.163500,0.269800,0.102000,0.133200,0.130800,0.261400,0.084200,0.282700,0.344800,0.049200,0.205300,0.157600,0.168800,...,0.143600,0.254300,0.241200,0.197600,0.124400,0.088100,0.291000,0.044400,0.116800,0.070300,0.057800,0.102800,0.046600,0.084700,0.057200,0.095900,0.086600,0.143700,0.040700,0.581900,0.022200,0.095800,0.222700,0.122900,0.182100,0.045900,0.071600,0.848900,0.101800,0.072100,0.161700,0.228800,0.179600,0.350000,0.145800,0.425700,0.220100,0.313400,0.194000,0.200700
xr65,0.348000,0.525000,1.082000,6.625000,2.500000,1.000000,12.499000,7.000000,1.000000,2.119000,11.879000,1.938000,0.003000,18.476000,125.990000,26.800000,0.052000,4.500000,4.762000,4.125000,360.000000,265.690000,3.061000,4.762000,4.017000,3.177000,20.800000,26.000000,50.000000,4.937000,4.000000,30.000000,0.357000,625.000000,3.620000,7.143000,59.997000,5.173000,0.357000,0.719000,...,0.402000,0.643000,1.017000,8.570000,2.500000,1.000000,0.909000,12.500000,7.030000,1.000000,36.603000,20.000000,8.127000,4.911000,30.929000,25.000000,40.500000,4.285000,8.876000,4.935000,8.653000,296.800000,0.320000,484.000000,2.402000,9.900000,7.248000,2.371000,3.017000,20.379000,4.286000,2.460000,32.051000,0.452000,652.850000,2.529000,25.553000,4.152000,0.452000,0.886000
tot1,-0.014727,0.005750,-0.010040,-0.002195,0.003283,-0.001747,0.009092,0.011630,0.008169,0.007584,0.086032,0.007666,0.016968,-0.020322,0.028916,0.020228,0.013407,-0.024761,-0.021656,-0.054872,-0.054874,0.018194,-0.034733,-0.000222,0.033636,0.010162,-0.018514,0.010943,-0.001521,0.008913,0.035536,0.022097,0.004606,-0.009181,0.002372,0.007208,0.009702,-0.000185,0.011117,0.022920,...,0.154551,-0.156878,0.018636,-0.030185,-0.007018,-0.017594,0.099879,0.077787,-0.044204,-0.039824,0.014286,0.111198,0.006002,-0.127025,-0.004592,0.191066,-0.007018,0.168536,-0.084064,0.021808,-0.012443,-0.057094,0.128443,0.007257,0.030424,-0.012137,0.009640,0.051395,0.207492,0.018019,-0.006642,-0.003241,-0.034352,-0.001660,-0.046278,-0.011883,-0.039080,0.005175,-0.029551,-0.036482


As illustrated above, the results can be quite different. In this case, affinity propagation converged to more tightly defined clusters, while DBSCAN with its default settings returned a group that contains almost all other countries and is therefore not useful in this particular case.
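One possible reason for the DBSCAN result is offered here as an assumption, not something checked against the output above. With the default `eps=0.5` and unscaled inputs, DBSCAN can label every entity as noise (`-1`). Because `FindCluster.fit` compares labels for equality, entities that share the noise label are returned as if they formed a cluster. The sketch below reproduces this mechanism on synthetic data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# 20 synthetic entities (rows) with 5 features on a large scale; not the Barro-Lee data
entities = rng.normal(loc=0, scale=100, size=(20, 5))

labels = DBSCAN().fit(entities).labels_  # defaults: eps=0.5, min_samples=5
print(labels)  # every entry is -1, DBSCAN's label for noise

# Label equality, as in `FindCluster.fit`: every entity "matches" the last one
same_label = labels == labels[-1]
print(same_label.sum(), "of", len(labels), "entities share the label", labels[-1])
```

Tuning `eps` and `min_samples`, or standardising the features beforehand, are the usual ways to obtain more informative DBSCAN clusters.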

`FindCluster` can also be used as a step in a scikit-learn [`Pipeline`](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline). In this case, only the entities in the same cluster as the entity of interest continue on to the next steps of the estimation.
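A minimal sketch of this pattern follows. To keep the block self-contained it uses `SameClusterSelector`, a hypothetical stand-in that mirrors the logic of `FindCluster.fit`; in practice `FindCluster` itself would take its place. The data, the choice of `KMeans` and the final `LinearRegression` step are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class SameClusterSelector(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in mirroring `FindCluster`: keep the columns clustered with `y`."""

    def __init__(self, cluster_alg=None):
        self.cluster_alg = cluster_alg

    def fit(self, X, y):
        alg = self.cluster_alg
        if alg is None:
            alg = KMeans(n_clusters=2, n_init=10, random_state=0)
        data = X.copy()  # leave the caller's DataFrame untouched
        data["_target"] = y  # add the entity of interest as one more column
        labels = alg.fit(data.T.to_numpy()).labels_  # cluster entities, i.e. columns
        target_label = labels[data.columns == "_target"][0]
        self.same_cluster_ = [
            col for col, lab in zip(data.columns, labels)
            if lab == target_label and col != "_target"
        ]
        return self

    def transform(self, X):
        return X[self.same_cluster_]


# Synthetic entities: a0..a2 follow one common factor, b0..b2 follow another
rng = np.random.default_rng(0)
T = 60
factor_a = rng.normal(0, 1, T)
factor_b = rng.normal(10, 1, T)
X_demo = pd.DataFrame({
    **{f"a{i}": factor_a + rng.normal(0, 0.1, T) for i in range(3)},
    **{f"b{i}": factor_b + rng.normal(0, 0.1, T) for i in range(3)},
})
y_demo = pd.Series(factor_a + rng.normal(0, 0.1, T), name="entity_of_interest")

pipe = Pipeline([
    ("cluster", SameClusterSelector()),  # keeps only entities clustered with y
    ("ols", LinearRegression()),         # sees only the retained columns
])
pipe.fit(X_demo, y_demo)

print(pipe.named_steps["cluster"].same_cluster_)  # ['a0', 'a1', 'a2']
print(pipe.predict(X_demo)[:3])
```

The final estimator is fitted only on the retained columns, so entities outside the cluster of the entity of interest have no influence on the downstream estimation.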

# Measure-focused tasks

## Machine controls

> The machine learning version of the synthetic controls methodology [@abadie2021using]


# References