estimate_cate_by_2_models crashes when col_propensity is provided in the data and enable_ipw=False #1

farismosman · 2019-04-23T12:55:53Z

I am using the example provided in https://colab.research.google.com/github/Minyus/causallift/blob/master/examples/CausalLift_example.ipynb. However, I am trying to test a scenario where the propensity is computed beforehand. I initialize the CausalLift object as follows:

model = CausalLift(train_df, test_df,
                   enable_ipw=False,
                   random_state=0,
                   verbose=3,
                   col_treatment='treated',
                   col_propensity='likelihood',
                   col_outcome='outcome')

train_df, test_df = model.estimate_cate_by_2_models()

This code crashes with the following error message:

Traceback (most recent call last):
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 528, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/sklearn.py", line 703, in fit
    missing=self.missing, nthread=self.n_jobs)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/core.py", line 427, in __init__
    self.set_weight(weight)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/core.py", line 698, in set_weight
    self.set_float_info('weight', weight)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/core.py", line 592, in set_float_info
    c_data = c_array(ctypes.c_float, data)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/core.py", line 219, in c_array
    return (ctype * len(values))(*values)
TypeError: object of type 'float' has no len()
"""

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
~/projects/uplift/model.py in <module>
     31                    col_propensity='propensity',
     32                    col_outcome='Outcome')
---> 33 train_df, test_df = model.estimate_cate_by_2_models()
     34 # estimated_effect_df = model.estimate_recommendation_impact()

~/projects/uplift/env/lib/python3.6/site-packages/causallift/causal_lift.py in estimate_cate_by_2_models(self, verbose)
    194                                             enable_ipw=self.enable_ipw,
    195                                             uplift_model_params=self.uplift_model_params,
--> 196                                             cv=self.cv)
    197         model_for_untreated = ModelForUntreated(train_df_, test_df_,
    198                                                 random_state=self.random_state,

~/projects/uplift/env/lib/python3.6/site-packages/causallift/model_for_each.py in __init__(self, *args, **kwargs)
    224     def __init__(self, *args, **kwargs):
    225         kwargs.update(treatment_val=1.0)
--> 226         super().__init__(*args, **kwargs)
    227 
    228 

~/projects/uplift/env/lib/python3.6/site-packages/causallift/model_for_each.py in __init__(self, train_df_, test_df_, treatment_val, random_state, verbose, cols_features, col_treatment, col_outcome, col_propensity, col_recommendation, min_propensity, max_propensity, enable_ipw, uplift_model_params, cv)
    101                              params, cv=cv, return_train_score=False, n_jobs=-1)
    102 
--> 103         model.fit(X_train, y_train, sample_weight=sample_weight)
    104         if verbose >= 3:
    105             print('### Best parameters of the model trained using samples with observational Treatment: {} \n {}'.

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    720                 return results_container[0]
    721 
--> 722             self._run_search(evaluate_candidates)
    723 
    724         results = results_container[0]

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
   1189     def _run_search(self, evaluate_candidates):
   1190         """Search all candidates in param_grid"""
-> 1191         evaluate_candidates(ParameterGrid(self.param_grid))
   1192 
   1193 

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params)
    709                                for parameters, (train, test)
    710                                in product(candidate_params,
--> 711                                           cv.split(X, y, groups)))
    712 
    713                 all_candidate_params.extend(candidate_params)

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    928 
    929             with self._backend.retrieval_context():
--> 930                 self.retrieve()
    931             # Make sure that we get a last message telling us we are done
    932             elapsed_time = time.time() - self._start_time

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in retrieve(self)
    831             try:
    832                 if getattr(self._backend, 'supports_timeout', False):
--> 833                     self._output.extend(job.get(timeout=self.timeout))
    834                 else:
    835                     self._output.extend(job.get())

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    519         AsyncResults.get from multiprocessing."""
    520         try:
--> 521             return future.result(timeout=timeout)
    522         except LokyTimeoutError:
    523             raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

TypeError: object of type 'float' has no len()

If I understand the documentation correctly, enable_ipw should be set to False when the propensity is calculated beforehand. Assuming my object is initialized with the right parameters, I suspect that on line 76 of the model_for_each module, sample_weight is defined as a float (1.0), whereas it should be a NumPy array.
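To illustrate the suspected cause, here is a minimal sketch of why a scalar weight fails where a per-row array works. The bottom frame of the traceback calls len() on the weights, which a float does not support. The helper name uniform_sample_weight is hypothetical and is not part of CausalLift.

```python
import numpy as np


def uniform_sample_weight(n_rows):
    """Return one weight of 1.0 per row, which means no reweighting.

    Hypothetical helper for illustration only; not CausalLift's code.
    """
    return np.ones(n_rows, dtype=np.float32)


# A bare float has no length, which is what the traceback complains about.
scalar_weight = 1.0
try:
    len(scalar_weight)
    scalar_has_len = True
except TypeError:
    scalar_has_len = False

# An array with one entry per training row has a well-defined length.
array_weight = uniform_sample_weight(5)
print(scalar_has_len, len(array_weight))
```

Passing an array of ones is equivalent to unweighted training, so it keeps the "IPW disabled" behaviour while satisfying the expectation that the weights are a sequence.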

Minyus · 2019-04-29T02:54:28Z

Hi @farismosman,

Thank you for reporting the error.
Yes, sample_weight should be an array.
I fixed the error in v0.0.2.

To use a propensity computed beforehand, set enable_ipw=True and specify the name of the propensity column with col_propensity (apparently 'likelihood' in your case).

If you set enable_ipw=False, Inverse Probability Weighting is disabled.

To clarify, there are 3 ways to use CausalLift:

1. enable_ipw = True, using the propensity computed by CausalLift
   As in CausalLift_example.ipynb

2. enable_ipw = True, using a propensity computed beforehand
   Your case

3. enable_ipw = False
   If the data is from A/B testing or a Randomized Controlled Trial
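As a rough sketch of what the enable_ipw flag implies for the sample weights, the snippet below uses the textbook Inverse Probability Weighting formula: 1/p for treated rows and 1/(1-p) for untreated rows. This is an assumption for illustration, not a copy of CausalLift's implementation. The function name ipw_sample_weight and the clipping defaults 0.01 and 0.99 are hypothetical; only the parameter names min_propensity and max_propensity appear in the traceback above.

```python
import numpy as np


def ipw_sample_weight(treatment, propensity, enable_ipw=True,
                      min_propensity=0.01, max_propensity=0.99):
    """Per-row sample weights under the textbook IPW formula (illustrative only)."""
    treatment = np.asarray(treatment, dtype=float)
    if not enable_ipw:
        # Way 3 (A/B test or RCT): every row gets weight 1.0, as an array.
        return np.ones_like(treatment)
    # Ways 1 and 2: clip the propensity to avoid extreme weights (assumed bounds).
    p = np.clip(np.asarray(propensity, dtype=float), min_propensity, max_propensity)
    # Treated rows are weighted by 1/p, untreated rows by 1/(1-p).
    return np.where(treatment == 1.0, 1.0 / p, 1.0 / (1.0 - p))


treated = [1, 0, 1, 0]
likelihood = [0.5, 0.5, 0.8, 0.2]  # precomputed propensity column (way 2)

w_ipw = ipw_sample_weight(treated, likelihood, enable_ipw=True)
w_off = ipw_sample_weight(treated, likelihood, enable_ipw=False)
print(w_ipw, w_off)
```

In both branches the result is an array with one weight per row, which is the shape the model's fit call expects.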

Minyus self-assigned this Apr 28, 2019

Minyus added the bug (Something isn't working) and good first issue (Good for newcomers) labels Apr 28, 2019

Minyus added a commit that referenced this issue Apr 28, 2019

Fix "TypeError: object of type 'float' has no len()" that occurs if enable_ipw is set to False (Issue #1). Add "from IPython.display import display" so it can run in non-IPython environments.

e8f1396

Minyus closed this as completed May 15, 2019
