
estimate_cate_by_2_models crashes when col_propensity is provided in the data and enable_ipw=False #1

Closed
farismosman opened this issue Apr 23, 2019 · 1 comment
Labels
bug Something isn't working good first issue Good for newcomers

Comments

farismosman (Contributor) commented Apr 23, 2019

I am using the example provided in https://colab.research.google.com/github/Minyus/causallift/blob/master/examples/CausalLift_example.ipynb. However, I am trying to test a scenario where the propensity is computed beforehand. I initialize the CausalLift object as follows:

from causallift import CausalLift

model = CausalLift(train_df, test_df,
                   enable_ipw=False,
                   random_state=0,
                   verbose=3,
                   col_treatment='treated',
                   col_propensity='likelihood',
                   col_outcome='outcome')

train_df, test_df = model.estimate_cate_by_2_models()

This code crashes with the following error message:

Traceback (most recent call last):
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 528, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/sklearn.py", line 703, in fit
    missing=self.missing, nthread=self.n_jobs)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/core.py", line 427, in __init__
    self.set_weight(weight)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/core.py", line 698, in set_weight
    self.set_float_info('weight', weight)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/core.py", line 592, in set_float_info
    c_data = c_array(ctypes.c_float, data)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/core.py", line 219, in c_array
    return (ctype * len(values))(*values)
TypeError: object of type 'float' has no len()
"""

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
~/projects/uplift/model.py in <module>
     31                    col_propensity='propensity',
     32                    col_outcome='Outcome')
---> 33 train_df, test_df = model.estimate_cate_by_2_models()
     34 # estimated_effect_df = model.estimate_recommendation_impact()

~/projects/uplift/env/lib/python3.6/site-packages/causallift/causal_lift.py in estimate_cate_by_2_models(self, verbose)
    194                                             enable_ipw=self.enable_ipw,
    195                                             uplift_model_params=self.uplift_model_params,
--> 196                                             cv=self.cv)
    197         model_for_untreated = ModelForUntreated(train_df_, test_df_,
    198                                                 random_state=self.random_state,

~/projects/uplift/env/lib/python3.6/site-packages/causallift/model_for_each.py in __init__(self, *args, **kwargs)
    224     def __init__(self, *args, **kwargs):
    225         kwargs.update(treatment_val=1.0)
--> 226         super().__init__(*args, **kwargs)
    227 
    228 

~/projects/uplift/env/lib/python3.6/site-packages/causallift/model_for_each.py in __init__(self, train_df_, test_df_, treatment_val, random_state, verbose, cols_features, col_treatment, col_outcome, col_propensity, col_recommendation, min_propensity, max_propensity, enable_ipw, uplift_model_params, cv)
    101                              params, cv=cv, return_train_score=False, n_jobs=-1)
    102 
--> 103         model.fit(X_train, y_train, sample_weight=sample_weight)
    104         if verbose >= 3:
    105             print('### Best parameters of the model trained using samples with observational Treatment: {} \n {}'.

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    720                 return results_container[0]
    721 
--> 722             self._run_search(evaluate_candidates)
    723 
    724         results = results_container[0]

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
   1189     def _run_search(self, evaluate_candidates):
   1190         """Search all candidates in param_grid"""
-> 1191         evaluate_candidates(ParameterGrid(self.param_grid))
   1192 
   1193 

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params)
    709                                for parameters, (train, test)
    710                                in product(candidate_params,
--> 711                                           cv.split(X, y, groups)))
    712 
    713                 all_candidate_params.extend(candidate_params)

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    928 
    929             with self._backend.retrieval_context():
--> 930                 self.retrieve()
    931             # Make sure that we get a last message telling us we are done
    932             elapsed_time = time.time() - self._start_time

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in retrieve(self)
    831             try:
    832                 if getattr(self._backend, 'supports_timeout', False):
--> 833                     self._output.extend(job.get(timeout=self.timeout))
    834                 else:
    835                     self._output.extend(job.get())

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    519         AsyncResults.get from multiprocessing."""
    520         try:
--> 521             return future.result(timeout=timeout)
    522         except LokyTimeoutError:
    523             raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

TypeError: object of type 'float' has no len()

If I understand the documentation correctly, when the propensity is calculated beforehand, the flag enable_ipw should be set to False. Assuming my object is initialized with the right parameters, I suspect that line 76 of the model_for_each module defines sample_weight as a float (1.0) when it should be a NumPy array.
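A minimal reproduction of the suspected failure mode, assuming (as the traceback suggests) that XGBoost ends up calling `len()` on whatever is passed as `sample_weight`: a scalar `1.0` fails, while an array of ones with one entry per training sample gives the same unweighted fit. `make_unit_weights` and `check_len` are hypothetical helpers for illustration, not part of CausalLift or XGBoost.

```python
import numpy as np


def make_unit_weights(y_train):
    """Hypothetical helper: one weight of 1.0 per training sample.

    Equivalent to unweighted training, but shaped as an array so that
    code calling len() on the weights does not fail.
    """
    return np.ones(len(y_train), dtype=float)


def check_len(weight):
    # Mimics the failing step in the traceback: xgboost's c_array builds a
    # ctypes array with len(values), which raises TypeError for a bare float.
    return len(weight)


y_train = np.array([0, 1, 1, 0, 1])

try:
    check_len(1.0)  # scalar weight, as in the suspected buggy code path
except TypeError as err:
    print(err)  # object of type 'float' has no len()

weights = make_unit_weights(y_train)
print(check_len(weights))  # 5
```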

Minyus self-assigned this Apr 28, 2019
Minyus added bug Something isn't working good first issue Good for newcomers labels Apr 28, 2019
Minyus added a commit that referenced this issue Apr 28, 2019
…nable_ipw is set to False (Issue #1). Add "from IPython.display import display" so it can run in non-IPython environments.
Minyus (Owner) commented Apr 29, 2019

Hi @farismosman,

Thank you for reporting the error.
Yes, sample_weight should be an array.
I fixed the error in v0.0.2.

To use a propensity computed beforehand, set enable_ipw=True and specify the name of the propensity column with col_propensity ('likelihood' in your case, apparently).

If you set enable_ipw=False, Inverse Probability Weighting (IPW) is disabled.
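To make the distinction concrete, here is a minimal sketch of textbook inverse probability weighting applied to a precomputed propensity column; it is not CausalLift's actual code. It assumes the column holds P(treated | features), clips it to [min_propensity, max_propensity] (parameter names borrowed from the constructor signature visible in the traceback; the default values here are illustrative assumptions), and weights treated rows by 1/p and untreated rows by 1/(1 - p). With IPW disabled, every row gets weight 1.0, still returned as an array. `ipw_weights` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def ipw_weights(df, col_treatment, col_propensity, enable_ipw=True,
                min_propensity=0.01, max_propensity=0.99):
    """Hypothetical helper: one sample weight per row (textbook IPW sketch).

    Assumes df[col_propensity] is the estimated probability of treatment.
    """
    if not enable_ipw:
        # IPW disabled: every sample counts equally, but as an array, not a float.
        return np.ones(len(df), dtype=float)
    # Clip to avoid huge weights from propensities near 0 or 1.
    p = df[col_propensity].clip(min_propensity, max_propensity).to_numpy()
    treated = df[col_treatment].to_numpy() == 1
    # Treated rows are up-weighted when treatment was unlikely,
    # untreated rows when treatment was likely.
    return np.where(treated, 1.0 / p, 1.0 / (1.0 - p))


df = pd.DataFrame({
    "treated": [1, 0, 1, 0],
    "likelihood": [0.5, 0.5, 0.25, 0.75],
    "outcome": [1, 0, 0, 1],
})

print(ipw_weights(df, "treated", "likelihood"))                    # [2. 2. 4. 4.]
print(ipw_weights(df, "treated", "likelihood", enable_ipw=False))  # [1. 1. 1. 1.]
```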

To clarify, there are three ways to use CausalLift:

  1. enable_ipw=True, using the propensity computed by CausalLift

    As in CausalLift_example.ipynb

  2. enable_ipw=True, using a propensity computed beforehand

    Your case

  3. enable_ipw=False

    If the data comes from A/B testing or a randomized controlled trial (RCT)
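The three options can be summarized as a small decision helper. `causallift_kwargs` is hypothetical; only `enable_ipw` and `col_propensity` are constructor arguments taken from this thread, and the sketch assumes that leaving `col_propensity` unset (as in CausalLift_example.ipynb) lets CausalLift estimate the propensity itself.

```python
def causallift_kwargs(randomized, propensity_col=None):
    """Hypothetical helper mapping the three usage options to constructor kwargs."""
    if randomized:
        # Option 3: A/B test or RCT; treatment is assigned at random, so no IPW.
        return {"enable_ipw": False}
    if propensity_col is not None:
        # Option 2: IPW using a propensity column computed beforehand.
        return {"enable_ipw": True, "col_propensity": propensity_col}
    # Option 1: IPW using a propensity estimated by CausalLift.
    return {"enable_ipw": True}


print(causallift_kwargs(randomized=False, propensity_col="likelihood"))
# {'enable_ipw': True, 'col_propensity': 'likelihood'}
```

The result could then be unpacked into the constructor, e.g. `CausalLift(train_df, test_df, col_treatment='treated', col_outcome='outcome', **causallift_kwargs(False, 'likelihood'))`.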

Minyus closed this as completed May 15, 2019