# Model Comparison Lab

In this lab we will compare the performance of all the models we have learned about so far, using the car evaluation dataset.

## 1. Prepare the data

The [car evaluation dataset](https://archive.ics.uci.edu/ml/machine-learning-databases/car/) is in the assets/datasets folder. By now you should be very familiar with this dataset.

1. Load the data into a pandas dataframe
2. Encode the categorical features properly: define a map that preserves the scale (assigning smaller numbers to words indicating smaller quantities)
3. Separate features from target into X and y
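
As a minimal sketch of the encoding step, an ordinal map can be built from a list of levels written in increasing order. The helper names `make_ordinal_map` and `encode` are hypothetical and only illustrate the idea; they are not part of pandas. The explicit check matters because `Series.map` with a dict silently produces `NaN` for any level the dict does not cover.

```python
def make_ordinal_map(ordered_levels, start=1):
    """Map each level to an integer that follows the given order."""
    return {level: rank for rank, level in enumerate(ordered_levels, start)}


def encode(values, mapping):
    """Encode a sequence, failing loudly on any level the map does not cover."""
    missing = sorted(set(values) - set(mapping))
    if missing:
        raise KeyError("unmapped levels: %s" % missing)
    return [mapping[v] for v in values]


# Levels listed from the smallest to the largest quantity.
price_map = make_ordinal_map(["low", "med", "high", "vhigh"])
encoded = encode(["vhigh", "low", "med"], price_map)
print(price_map)
print(encoded)
```

Writing the levels in order keeps the scale explicit, so a typo such as a missing level shows up as an error instead of a missing value later in the pipeline.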

In [118]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.grid_search import GridSearchCV

In [119]:
df = pd.read_csv("../../assets/datasets/car.csv")
df.head()

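|   | buying | maint | doors | persons | lug_boot | safety | acceptability |
|---|--------|-------|-------|---------|----------|--------|---------------|
| 0 | vhigh | vhigh | 2 | 2 | small | low | unacc |
| 1 | vhigh | vhigh | 2 | 2 | small | med | unacc |
| 2 | vhigh | vhigh | 2 | 2 | small | high | unacc |
| 3 | vhigh | vhigh | 2 | 2 | med | low | unacc |
| 4 | vhigh | vhigh | 2 | 2 | med | med | unacc |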


In [120]:
print(df.persons.unique())
print(df.buying.unique())
print(df.doors.unique())
print(df.safety.unique())
print(df.acceptability.unique())
print(df.maint.unique())
print(df.lug_boot.unique())

['2' '4' 'more']
['vhigh' 'high' 'med' 'low']
['2' '3' '4' '5more']
['low' 'med' 'high']
['unacc' 'acc' 'vgood' 'good']
['vhigh' 'high' 'med' 'low']
['small' 'med' 'big']


In [121]:
map1 = {"low":1, 
        "med":2, 
        "high":3, 
        "vhigh":4}
map2 = {"small":1, 
        "med":2,
        "big":3}
map3 = {"unacc":1, 
        "acc":2, 
        "good":3, 
        "vgood":4}
map4 = {"2":2,
        "4":4,
        "more":5}
map5 = {"2":2, 
        "3":3, 
        "4":4,
        "5more": 5}

In [122]:
features = [c for c in df.columns if c != "acceptability"]
dfn = df.copy()

dfn.buying = df.buying.map(map1)
dfn.maint = df.maint.map(map1)
dfn.lug_boot = df.lug_boot.map(map2)
dfn.persons = df.persons.map(map4)
dfn.doors = df.doors.map(map5)
dfn.safety = df.safety.map(map1)
dfn.acceptability = df.acceptability.map(map3)

X = dfn[features]
y = dfn['acceptability']
X.head()


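|   | buying | maint | doors | persons | lug_boot | safety |
|---|--------|-------|-------|---------|----------|--------|
| 0 | 4 | 4 | 2 | 2 | 1 | 1 |
| 1 | 4 | 4 | 2 | 2 | 1 | 2 |
| 2 | 4 | 4 | 2 | 2 | 1 | 3 |
| 3 | 4 | 4 | 2 | 2 | 2 | 1 |
| 4 | 4 | 4 | 2 | 2 | 2 | 2 |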


In [123]:
dfn.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1728 entries, 0 to 1727
Data columns (total 7 columns):
buying           1728 non-null int64
maint            1728 non-null int64
doors            1728 non-null int64
persons          1728 non-null int64
lug_boot         1728 non-null int64
safety           1728 non-null int64
acceptability    1728 non-null int64
dtypes: int64(7)
memory usage: 94.6 KB


## 2. Useful preparation

Since we will compare several models, let's write a couple of helper functions.

1. Split X and y into a train and a test set, using a 30% test set and random state = 42
    - make sure that the data is shuffled and stratified
2. Define a function called `evaluate_model` that trains the model on the train set, tests it on the test set, and calculates:
    - accuracy score
    - confusion matrix
    - classification report
3. Initialize a global dictionary to store the various models for later retrieval


In [124]:
from sklearn.cross_validation import train_test_split, KFold
from sklearn.metrics import accuracy_score,precision_score,recall_score,confusion_matrix,classification_report

In [125]:
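# 70/30 split, stratified on the target so both sets keep the class proportions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)


def evaluate_model(model):
    # Fit on the train set, predict on the test set
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    a = accuracy_score(y_test, y_pred)

    cm = confusion_matrix(y_test, y_pred)
    cr = classification_report(y_test, y_pred)

    # print(...) with a single argument behaves the same on Python 2 and 3
    print(cm)
    print(cr)

    return a


# Global registry of models and their test accuracy
all_models = {}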

## 3.a KNN

Let's start with `KNeighborsClassifier`.

1. Initialize a KNN model
2. Evaluate its performance with the function you previously defined
3. Find the optimal value of K using grid search
    - Be careful about how you perform the cross validation in the grid search
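
One plausible reading of the warning above is that the rows of this file are listed in a systematic order (the first rows shown earlier are all `unacc`), so contiguous, unshuffled folds could end up with very different class mixes. The sketch below is a pure-Python illustration of stratified fold assignment. The function `stratified_folds` and the toy labels are hypothetical, and this is not scikit-learn's implementation, which provides `StratifiedKFold` for the same purpose.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds=3, seed=42):
    """Assign each row index to a fold, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


# Toy labels sorted by class, mimicking an ordered file (hypothetical data).
labels = ["unacc"] * 12 + ["acc"] * 6 + ["good"] * 3
folds = stratified_folds(labels)
for fold in folds:
    print(sorted(Counter(labels[i] for i in fold).items()))
```

With these toy labels every fold receives the same class counts, whereas three contiguous blocks of seven rows would leave the first fold with a single class.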

In [126]:
from sklearn.neighbors import KNeighborsClassifier
a = evaluate_model(KNeighborsClassifier())

[[354   9   0   0]
 [  8 107   0   0]
 [  0   9  11   1]
 [  0   2   0  18]]
             precision    recall  f1-score   support

          1       0.98      0.98      0.98       363
          2       0.84      0.93      0.88       115
          3       1.00      0.52      0.69        21
          4       0.95      0.90      0.92        20

avg / total       0.95      0.94      0.94       519
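
To make the report above easier to read, the following sketch recomputes accuracy, recall, and precision directly from the confusion matrix printed by this cell. It assumes the usual scikit-learn layout, with rows as true classes and columns as predicted classes; the variable names are illustrative only.

```python
# Confusion matrix copied from the output above: rows are true classes 1..4,
# columns are predicted classes 1..4.
cm = [
    [354, 9, 0, 0],
    [8, 107, 0, 0],
    [0, 9, 11, 1],
    [0, 2, 0, 18],
]

n_classes = len(cm)
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n_classes))
accuracy = correct / float(total)

# Recall: correct predictions divided by the row total (true count).
recall = [cm[i][i] / float(sum(cm[i])) for i in range(n_classes)]
# Precision: correct predictions divided by the column total (predicted count).
precision = [
    cm[i][i] / float(sum(cm[r][i] for r in range(n_classes)))
    for i in range(n_classes)
]

print("accuracy: %.4f" % accuracy)
print("recall: %s" % [round(r, 2) for r in recall])
print("precision: %s" % [round(p, 2) for p in precision])
```

The low recall for class 3 comes from its row: only 11 of its 21 test rows sit on the diagonal, and 9 of the rest are predicted as class 2.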



In [127]:
from sklearn.grid_search import GridSearchCV

params = {"n_neighbors" : range(2,60)}

gsknn = GridSearchCV(KNeighborsClassifier(),
                    params, n_jobs = -1,
                    cv = KFold(len(y), n_folds = 3, shuffle =True))

In [128]:
gsknn.fit(X,y)

GridSearchCV(cv=sklearn.cross_validation.KFold(n=1728, n_folds=3, shuffle=True, random_state=None),
       error_score='raise',
       estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform'),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'n_neighbors': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=0)

In [129]:
gsknn.best_params_

{'n_neighbors': 5}

In [130]:
gsknn.best_score_

0.94502314814814814

In [131]:
gsknn.best_estimator_

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')

In [132]:
a= evaluate_model(gsknn.best_estimator_)

[[354   9   0   0]
 [  8 107   0   0]
 [  0   9  11   1]
 [  0   2   0  18]]
             precision    recall  f1-score   support

          1       0.98      0.98      0.98       363
          2       0.84      0.93      0.88       115
          3       1.00      0.52      0.69        21
          4       0.95      0.90      0.92        20

avg / total       0.95      0.94      0.94       519



In [133]:
all_models["knn"] = {"model": gsknn.best_estimator_,
                    "score" : a}

In [134]:
all_models

{'knn': {'model': KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
             metric_params=None, n_jobs=1, n_neighbors=5, p=2,
             weights='uniform'), 'score': 0.94412331406551064}}

## 3.b Bagging + KNN

Now that we have found the optimal K, let's wrap `KNeighborsClassifier` in a BaggingClassifier and see if the score improves.

1. Wrap the KNN model in a Bagging Classifier
2. Evaluate performance
3. Do a grid search only on the bagging classifier params
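
As a rough illustration of what bagging does, the sketch below fits a base learner on several bootstrap resamples and combines the predictions by majority vote. It is a simplified, pure-Python stand-in: the 1-nearest-neighbor base learner, the function names, and the toy data are all hypothetical, and scikit-learn's `BaggingClassifier` differs in its details.

```python
import random
from collections import Counter


def nearest_neighbor_predict(train_X, train_y, point):
    """1-NN base learner: return the label of the closest training row."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, point))
    best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i]))
    return train_y[best]


def bagged_predict(train_X, train_y, point, n_estimators=10, seed=0):
    """Majority vote over base learners, each fit on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(train_X)
    votes = []
    for _ in range(n_estimators):
        # Bootstrap: draw n row indices with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        boot_X = [train_X[i] for i in sample]
        boot_y = [train_y[i] for i in sample]
        votes.append(nearest_neighbor_predict(boot_X, boot_y, point))
    return Counter(votes).most_common(1)[0][0]


# Two well-separated toy clusters (hypothetical data).
train_X = [(float(i), 0.0) for i in range(10)] + [(100.0 + i, 0.0) for i in range(10)]
train_y = ["unacc"] * 10 + ["acc"] * 10
print(bagged_predict(train_X, train_y, (2.0, 0.0)))
print(bagged_predict(train_X, train_y, (105.0, 0.0)))
```

Because each resample differs, the individual learners disagree near class boundaries, and the vote averages out part of that variance.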

In [135]:
from sklearn.ensemble import BaggingClassifier

In [136]:
import sklearn

In [137]:
sklearn.__version__

'0.17.1'

## Wrap the KNN model in a Bagging Classifier

In [138]:
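from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import  BaggingClassifier
from sklearn.grid_search import GridSearchCV
# StratifiedKFold is not imported in any earlier cell, so import it here
from sklearn.cross_validation import StratifiedKFold

cv = StratifiedKFold(y, n_folds=3, shuffle=True, random_state=41)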

knn = KNeighborsClassifier()
bgClass = BaggingClassifier(knn)
a = evaluate_model(bgClass)

[[358   5   0   0]
 [  9 106   0   0]
 [  0   6  13   2]
 [  0   2   1  17]]
             precision    recall  f1-score   support

          1       0.98      0.99      0.98       363
          2       0.89      0.92      0.91       115
          3       0.93      0.62      0.74        21
          4       0.89      0.85      0.87        20

avg / total       0.95      0.95      0.95       519



In [139]:
evaluate_model(bgClass)

[[356   7   0   0]
 [ 10 105   0   0]
 [  0   8  12   1]
 [  0   4   1  15]]
             precision    recall  f1-score   support

          1       0.97      0.98      0.98       363
          2       0.85      0.91      0.88       115
          3       0.92      0.57      0.71        21
          4       0.94      0.75      0.83        20

avg / total       0.94      0.94      0.94       519



0.94026974951830444

In [140]:
bgClass.fit(X,y)

BaggingClassifier(base_estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform'),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=10, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False)

## Do a grid search only on the bagging classifier params

In [144]:
bagging_params ={"n_estimators": [10,20],
                "max_samples":[0.7,1.0],
                "max_features":[0.7,1.0],
                "bootstrap_features":[True,False]}
gsbaggingknn = GridSearchCV(bgClass,
                    bagging_params, n_jobs = -1,
                    cv = KFold(len(y), n_folds = 3, shuffle =True))

In [145]:
gsbaggingknn 

GridSearchCV(cv=sklearn.cross_validation.KFold(n=1728, n_folds=3, shuffle=True, random_state=None),
       error_score='raise',
       estimator=BaggingClassifier(base_estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform'),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=10, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'n_estimators': [10, 20], 'max_samples': [0.7, 1.0], 'bootstrap_features': [True, False], 'max_features': [0.7, 1.0]},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=0)

In [148]:
gsbaggingknn.fit(X, y)

GridSearchCV(cv=sklearn.cross_validation.KFold(n=1728, n_folds=3, shuffle=True, random_state=None),
       error_score='raise',
       estimator=BaggingClassifier(base_estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform'),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=10, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'n_estimators': [10, 20], 'max_samples': [0.7, 1.0], 'bootstrap_features': [True, False], 'max_features': [0.7, 1.0]},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=0)

In [149]:
gsbaggingknn.best_params_

{'bootstrap_features': False,
 'max_features': 1.0,
 'max_samples': 1.0,
 'n_estimators': 10}
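
To get a feel for the cost of this search, the grid can be enumerated by hand. The sketch below uses only `itertools.product` and is an illustration of the counting, not how `GridSearchCV` is implemented internally; the variable names are chosen for this example.

```python
from itertools import product

bagging_params = {
    "n_estimators": [10, 20],
    "max_samples": [0.7, 1.0],
    "max_features": [0.7, 1.0],
    "bootstrap_features": [True, False],
}

names = sorted(bagging_params)
combinations = [
    dict(zip(names, values))
    for values in product(*(bagging_params[name] for name in names))
]

n_folds = 3
print(len(combinations))            # number of candidate settings
print(len(combinations) * n_folds)  # number of cross-validation fits
```

The best combination reported above is one of these candidates, and it coincides with the `BaggingClassifier` defaults visible in the estimator's printed representation.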

In [150]:
gsbaggingknn.best_score_

0.95428240740740744

In [151]:
gsbaggingknn.best_estimator_

BaggingClassifier(base_estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform'),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=10, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False)

In [152]:
a= evaluate_model(gsbaggingknn.best_estimator_)

[[357   6   0   0]
 [  7 106   2   0]
 [  0   5  15   1]
 [  0   3   0  17]]
             precision    recall  f1-score   support

          1       0.98      0.98      0.98       363
          2       0.88      0.92      0.90       115
          3       0.88      0.71      0.79        21
          4       0.94      0.85      0.89        20

avg / total       0.95      0.95      0.95       519



In [154]:
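# Note: this call fails. The most likely cause is that the grid search's KFold was built
# with len(y) (1728 rows), while evaluate_model fits on X_train only, so the fold
# indices run past the end of the training set.
evaluate_model(gsbaggingknn)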

JoblibIndexError: JoblibIndexError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
//anaconda/lib/python2.7/runpy.py in _run_module_as_main(mod_name='ipykernel.__main__', alter_argv=1)
    169     pkg_name = mod_name.rpartition('.')[0]
    170     main_globals = sys.modules["__main__"].__dict__
    171     if alter_argv:
    172         sys.argv[0] = fname
    173     return _run_code(code, main_globals, None,
--> 174                      "__main__", fname, loader, pkg_name)
        fname = '/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py'
        loader = <pkgutil.ImpLoader instance>
        pkg_name = 'ipykernel'
    175 
    176 def run_module(mod_name, init_globals=None,
    177                run_name=None, alter_sys=False):
    178     """Execute a module's code without importing it

...........................................................................
//anaconda/lib/python2.7/runpy.py in _run_code(code=<code object <module> at 0x1007ce1b0, file "/ana...2.7/site-packages/ipykernel/__main__.py", line 1>, run_globals={'__builtins__': <module '__builtin__' (built-in)>, '__doc__': None, '__file__': '/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py', '__loader__': <pkgutil.ImpLoader instance>, '__name__': '__main__', '__package__': 'ipykernel', 'app': <module 'ipykernel.kernelapp' from '//anaconda/lib/python2.7/site-packages/ipykernel/kernelapp.pyc'>}, init_globals=None, mod_name='__main__', mod_fname='/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py', mod_loader=<pkgutil.ImpLoader instance>, pkg_name='ipykernel')
     67         run_globals.update(init_globals)
     68     run_globals.update(__name__ = mod_name,
     69                        __file__ = mod_fname,
     70                        __loader__ = mod_loader,
     71                        __package__ = pkg_name)
---> 72     exec code in run_globals
        code = <code object <module> at 0x1007ce1b0, file "/ana...2.7/site-packages/ipykernel/__main__.py", line 1>
        run_globals = {'__builtins__': <module '__builtin__' (built-in)>, '__doc__': None, '__file__': '/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py', '__loader__': <pkgutil.ImpLoader instance>, '__name__': '__main__', '__package__': 'ipykernel', 'app': <module 'ipykernel.kernelapp' from '//anaconda/lib/python2.7/site-packages/ipykernel/kernelapp.pyc'>}
     73     return run_globals
     74 
     75 def _run_module_code(code, init_globals=None,
     76                     mod_name=None, mod_fname=None,

...........................................................................
/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py in <module>()
      1 
      2 
----> 3 
      4 if __name__ == '__main__':
      5     from ipykernel import kernelapp as app
      6     app.launch_new_instance()
      7 
      8 
      9 
     10 

...........................................................................
//anaconda/lib/python2.7/site-packages/traitlets/config/application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
    648 
    649         If a global instance already exists, this reinitializes and starts it
    650         """
    651         app = cls.instance(**kwargs)
    652         app.initialize(argv)
--> 653         app.start()
        app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
    654 
    655 #-----------------------------------------------------------------------------
    656 # utility functions, for convenience
    657 #-----------------------------------------------------------------------------

...........................................................................
//anaconda/lib/python2.7/site-packages/ipykernel/kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
    469             return self.subapp.start()
    470         if self.poller is not None:
    471             self.poller.start()
    472         self.kernel.start()
    473         try:
--> 474             ioloop.IOLoop.instance().start()
    475         except KeyboardInterrupt:
    476             pass
    477 
    478 launch_new_instance = IPKernelApp.launch_instance

...........................................................................
//anaconda/lib/python2.7/site-packages/zmq/eventloop/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    157             PollIOLoop.configure(ZMQIOLoop)
    158         return PollIOLoop.current(*args, **kwargs)
    159     
    160     def start(self):
    161         try:
--> 162             super(ZMQIOLoop, self).start()
        self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
    163         except ZMQError as e:
    164             if e.errno == ETERM:
    165                 # quietly return on ETERM
    166                 pass

...........................................................................
//anaconda/lib/python2.7/site-packages/tornado/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    882                 self._events.update(event_pairs)
    883                 while self._events:
    884                     fd, events = self._events.popitem()
    885                     try:
    886                         fd_obj, handler_func = self._handlers[fd]
--> 887                         handler_func(fd_obj, events)
        handler_func = <function null_wrapper>
        fd_obj = <zmq.sugar.socket.Socket object>
        events = 1
    888                     except (OSError, IOError) as e:
    889                         if errno_from_exception(e) == errno.EPIPE:
    890                             # Happens when the client closes the connection
    891                             pass

...........................................................................
//anaconda/lib/python2.7/site-packages/tornado/stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 1), **kwargs={})
    270         # Fast path when there are no active contexts.
    271         def null_wrapper(*args, **kwargs):
    272             try:
    273                 current_state = _state.contexts
    274                 _state.contexts = cap_contexts[0]
--> 275                 return fn(*args, **kwargs)
        args = (<zmq.sugar.socket.Socket object>, 1)
        kwargs = {}
    276             finally:
    277                 _state.contexts = current_state
    278         null_wrapper._wrapped = True
    279         return null_wrapper

...........................................................................
//anaconda/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=1)
    435             # dispatch events:
    436             if events & IOLoop.ERROR:
    437                 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
    438                 return
    439             if events & IOLoop.READ:
--> 440                 self._handle_recv()
        self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
    441                 if not self.socket:
    442                     return
    443             if events & IOLoop.WRITE:
    444                 self._handle_send()

...........................................................................
//anaconda/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
    467                 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
    468         else:
    469             if self._recv_callback:
    470                 callback = self._recv_callback
    471                 # self._recv_callback = None
--> 472                 self._run_callback(callback, msg)
        self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
        callback = <function null_wrapper>
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    473                 
    474         # self.update_state()
    475         
    476 

...........................................................................
//anaconda/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    409         close our socket."""
    410         try:
    411             # Use a NullContext to ensure that all StackContexts are run
    412             # inside our blanket exception handler rather than outside.
    413             with stack_context.NullContext():
--> 414                 callback(*args, **kwargs)
        callback = <function null_wrapper>
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    415         except:
    416             gen_log.error("Uncaught exception, closing connection.",
    417                           exc_info=True)
    418             # Close the socket on an uncaught exception from a user callback

...........................................................................
//anaconda/lib/python2.7/site-packages/tornado/stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    270         # Fast path when there are no active contexts.
    271         def null_wrapper(*args, **kwargs):
    272             try:
    273                 current_state = _state.contexts
    274                 _state.contexts = cap_contexts[0]
--> 275                 return fn(*args, **kwargs)
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    276             finally:
    277                 _state.contexts = current_state
    278         null_wrapper._wrapped = True
    279         return null_wrapper

...........................................................................
//anaconda/lib/python2.7/site-packages/ipykernel/kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
    271         if self.control_stream:
    272             self.control_stream.on_recv(self.dispatch_control, copy=False)
    273 
    274         def make_dispatcher(stream):
    275             def dispatcher(msg):
--> 276                 return self.dispatch_shell(stream, msg)
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    277             return dispatcher
    278 
    279         for s in self.shell_streams:
    280             s.on_recv(make_dispatcher(s), copy=False)

...........................................................................
//anaconda/lib/python2.7/site-packages/ipykernel/kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {u'allow_stdin': True, u'code': u'evaluate_model(gsbaggingknn)', u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': '2016-12-05T17:02:39.111325', u'msg_id': u'480A7FA9A730488982C7A15C7B9D309D', u'msg_type': u'execute_request', u'session': u'47E3ABF017EA497A815EC2C2412D35C9', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'480A7FA9A730488982C7A15C7B9D309D', 'msg_type': u'execute_request', 'parent_header': {}})
    223             self.log.error("UNKNOWN MESSAGE TYPE: %r", msg_type)
    224         else:
    225             self.log.debug("%s: %s", msg_type, msg)
    226             self.pre_handler_hook()
    227             try:
--> 228                 handler(stream, idents, msg)
        handler = <bound method IPythonKernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
        stream = <zmq.eventloop.zmqstream.ZMQStream object>
        idents = ['47E3ABF017EA497A815EC2C2412D35C9']
        msg = {'buffers': [], 'content': {u'allow_stdin': True, u'code': u'evaluate_model(gsbaggingknn)', u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': '2016-12-05T17:02:39.111325', u'msg_id': u'480A7FA9A730488982C7A15C7B9D309D', u'msg_type': u'execute_request', u'session': u'47E3ABF017EA497A815EC2C2412D35C9', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'480A7FA9A730488982C7A15C7B9D309D', 'msg_type': u'execute_request', 'parent_header': {}}
    229             except Exception:
    230                 self.log.error("Exception in message handler:", exc_info=True)
    231             finally:
    232                 self.post_handler_hook()

...........................................................................
//anaconda/lib/python2.7/site-packages/ipykernel/kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=['47E3ABF017EA497A815EC2C2412D35C9'], parent={'buffers': [], 'content': {u'allow_stdin': True, u'code': u'evaluate_model(gsbaggingknn)', u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': '2016-12-05T17:02:39.111325', u'msg_id': u'480A7FA9A730488982C7A15C7B9D309D', u'msg_type': u'execute_request', u'session': u'47E3ABF017EA497A815EC2C2412D35C9', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'480A7FA9A730488982C7A15C7B9D309D', 'msg_type': u'execute_request', 'parent_header': {}})
    385         if not silent:
    386             self.execution_count += 1
    387             self._publish_execute_input(code, parent, self.execution_count)
    388 
    389         reply_content = self.do_execute(code, silent, store_history,
--> 390                                         user_expressions, allow_stdin)
        user_expressions = {}
        allow_stdin = True
    391 
    392         # Flush output before sending the reply.
    393         sys.stdout.flush()
    394         sys.stderr.flush()

...........................................................................
//anaconda/lib/python2.7/site-packages/ipykernel/ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code=u'evaluate_model(gsbaggingknn)', silent=False, store_history=True, user_expressions={}, allow_stdin=True)
    191 
    192         self._forward_input(allow_stdin)
    193 
    194         reply_content = {}
    195         try:
--> 196             res = shell.run_cell(code, store_history=store_history, silent=silent)
        res = undefined
        shell.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = u'evaluate_model(gsbaggingknn)'
        store_history = True
        silent = False
    197         finally:
    198             self._restore_input()
    199 
    200         if res.error_before_exec is not None:

...........................................................................
//anaconda/lib/python2.7/site-packages/ipykernel/zmqshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, *args=(u'evaluate_model(gsbaggingknn)',), **kwargs={'silent': False, 'store_history': True})
    496             )
    497         self.payload_manager.write_payload(payload)
    498 
    499     def run_cell(self, *args, **kwargs):
    500         self._last_traceback = None
--> 501         return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
        self.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        args = (u'evaluate_model(gsbaggingknn)',)
        kwargs = {'silent': False, 'store_history': True}
    502 
    503     def _showtraceback(self, etype, evalue, stb):
    504         # try to preserve ordering of tracebacks and print statements
    505         sys.stdout.flush()

...........................................................................
//anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell=u'evaluate_model(gsbaggingknn)', store_history=True, silent=False, shell_futures=True)
   2712                 self.displayhook.exec_result = result
   2713 
   2714                 # Execute the user code
   2715                 interactivity = "none" if silent else self.ast_node_interactivity
   2716                 has_raised = self.run_ast_nodes(code_ast.body, cell_name,
-> 2717                    interactivity=interactivity, compiler=compiler, result=result)
        interactivity = 'last_expr'
        compiler = <IPython.core.compilerop.CachingCompiler instance>
   2718                 
   2719                 self.last_execution_succeeded = not has_raised
   2720 
   2721                 # Reset this so later displayed values do not modify the

...........................................................................
//anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py in run_ast_nodes(...)
-> 2827                 if self.run_code(code, result):

...........................................................................
//anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py in run_code(...)
-> 2881                 exec(code_obj, self.user_global_ns, self.user_ns)

...........................................................................
<ipython-input-154-0378286ac445> in <module>()
      6 evaluate_model(gsbaggingknn)

...........................................................................
<ipython-input-125-61d896662ccc> in evaluate_model(model=GridSearchCV(cv=sklearn.cross_validation.KFold(n...='2*n_jobs', refit=True, scoring=None, verbose=0))
      6     model.fit(X_train,y_train)

...........................................................................
//anaconda/lib/python2.7/site-packages/sklearn/grid_search.py in fit(...)
--> 804         return self._fit(X, y, ParameterGrid(self.param_grid))
        X = [1209 rows x 6 columns]
        self.param_grid = {'bootstrap_features': [True, False], 'max_features': [0.7, 1.0], 'max_samples': [0.7, 1.0], 'n_estimators': [10, 20]}

...........................................................................
//anaconda/lib/python2.7/site-packages/sklearn/grid_search.py in _fit(...)
--> 553                 for parameters in parameter_iterable

...........................................................................
//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object <genexpr>>)
--> 810             self.retrieve()

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
IndexError                                         Mon Dec  5 17:02:39 2016
PID: 1613                              Python 2.7.12: //anaconda/bin/python
...........................................................................
//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(...)
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _fit_and_score>
        kwargs = {'error_score': 'raise', 'return_parameters': True}

...........................................................................
//anaconda/lib/python2.7/site-packages/sklearn/cross_validation.py in _fit_and_score(...)
-> 1524     X_train, y_train = _safe_split(estimator, X, y, train)
        estimator = BaggingClassifier(base_estimator=KNeighborsClass...  random_state=None, verbose=0, warm_start=False)
        X = [1209 rows x 6 columns]
        train = array([   3,    6,    8, ..., 1724, 1725, 1726])

...........................................................................
//anaconda/lib/python2.7/site-packages/sklearn/cross_validation.py in _safe_split(...)
-> 1591             X_subset = safe_indexing(X, indices)
        indices = array([   3,    6,    8, ..., 1724, 1725, 1726])

...........................................................................
//anaconda/lib/python2.7/site-packages/sklearn/utils/__init__.py in safe_indexing(...)
--> 152             return X.iloc[indices]

...........................................................................
//anaconda/lib/python2.7/site-packages/pandas/core/indexing.py in __getitem__(...)
-> 1296             return self._getitem_axis(key, axis=0)

...........................................................................
//anaconda/lib/python2.7/site-packages/pandas/core/indexing.py in _getitem_axis(...)
-> 1599                 self._is_valid_list_like(key, axis)

...........................................................................
//anaconda/lib/python2.7/site-packages/pandas/core/indexing.py in _is_valid_list_like(self=<pandas.core.indexing._iLocIndexer object>, key=array([   3,    6,    8, ..., 1724, 1725, 1726]), axis=0)
   1537         if len(arr) and (arr.max() >= l or arr.min() < -l):
-> 1538             raise IndexError("positional indexers are out-of-bounds")

IndexError: positional indexers are out-of-bounds
___________________________________________________________________________
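
A note on the error above. The traceback reports fold indices running up to 1726 while the `X` handed to `fit` has only 1209 rows. The most likely cause is that the search object was created with `cv = KFold(len(y), ...)`, i.e. with folds sized for the full 1728-row dataset, and was then fitted on `X_train` inside `evaluate_model`. The sketch below reproduces that mismatch with the standard library only. `make_fold_indices` and `indices_fit` are hypothetical helpers written for this illustration; they stand in for the old-style `KFold(n, n_folds)` index generator and are not scikit-learn's implementation.

```python
import random

def make_fold_indices(n_samples, n_folds, seed=0):
    """Hypothetical stand-in for old-style KFold(n, n_folds, shuffle=True):
    returns (train, test) lists of positional indices in 0..n_samples-1."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[i::n_folds]) for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
        splits.append((train, test))
    return splits

def indices_fit(splits, n_rows):
    """True if every positional index is valid for a table with n_rows rows."""
    return all(max(train + test) < n_rows for train, test in splits)

n_full, n_train = 1728, 1209   # row counts reported in the traceback

# Folds sized for the full dataset do not fit the 1209-row training set
bad_splits = make_fold_indices(n_full, n_folds=3)
bad_ok = indices_fit(bad_splits, n_train)

# Folds sized for the training set do
good_splits = make_fold_indices(n_train, n_folds=3)
good_ok = indices_fit(good_splits, n_train)

print(bad_ok, good_ok)  # prints: False True
```

In the notebook, this suggests either building the folds from the data that is actually fitted (`KFold(len(y_train), ...)`) or passing an integer such as `cv=3` to `GridSearchCV`, so that the folds are derived from whatever `fit` receives.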

## 4. Logistic Regression

Let's see if logistic regression performs better.

1. Initialize LR and test on Train/Test set
- Find optimal params with Grid Search
- See if Bagging improves the score
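
To make the grid search step concrete: a grid search tries every combination of the listed parameter values and keeps the one with the best cross-validated score. The sketch below only enumerates such a grid, using the standard library. `C` and `penalty` are real `LogisticRegression` arguments, but the values shown are illustrative assumptions, and `expand_grid` is a hypothetical helper, not part of scikit-learn.

```python
from itertools import product

def expand_grid(param_grid):
    """Hypothetical helper: list every combination in a dict of value lists."""
    names = sorted(param_grid)
    return [dict(zip(names, values))
            for values in product(*(param_grid[n] for n in names))]

# Illustrative values only, not tuned for the car dataset
lr_grid = {"C": [0.01, 0.1, 1.0, 10.0], "penalty": ["l1", "l2"]}

candidates = expand_grid(lr_grid)
n_folds = 3
n_fits = len(candidates) * n_folds  # one fit per candidate per fold

print(len(candidates), n_fits)  # prints: 8 24
```

The number of fits grows multiplicatively with each parameter added to the grid, which is worth keeping in mind before wrapping a bagging ensemble in a large search.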

In [None]:
from sklearn.linear_model import LogisticRegression
a = evaluate_model(LogisticRegression())

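In [None]:
from sklearn.grid_search import GridSearchCV

# Grid over the regularization strength and the penalty type
# (both penalties are supported by the default liblinear solver)
params = {"C" : [0.001, 0.01, 0.1, 1.0, 10.0, 100.0],
          "penalty" : ["l1", "l2"]}

gslr = GridSearchCV(LogisticRegression(),
                    params, n_jobs = -1,
                    cv = KFold(len(y), n_folds = 3, shuffle = True))

In [None]:
gslr.fit(X,y)

In [None]:
gslr.best_params_

In [None]:
gslr.best_score_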

## 5. Decision Trees

Let's see if Decision Trees perform better.

1. Initialize DT and test on Train/Test set
- Find optimal params with Grid Search
- See if Bagging improves the score

In [None]:
from sklearn.cross_validation import cross_val_score, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, BaggingClassifier

cv = StratifiedKFold(y, n_folds=3, shuffle=True, random_state=41)

Now let's initialize a Decision Tree Classifier and evaluate its performance:

In [None]:
def score(model, name):
    # Mean and standard deviation of the accuracy across the CV folds
    s = cross_val_score(model, X, y, cv=cv, n_jobs=-1)
    print "{} Score:\t{:0.3} ± {:0.3}".format(name, s.mean().round(3), s.std().round(3))

dt = DecisionTreeClassifier(class_weight='balanced')
bdt = BaggingClassifier(DecisionTreeClassifier())
score(dt, "Decision Tree")
score(bdt, "Bagging DT")

## 6. Support Vector Machines

Let's see if SVMs perform better.

1. Initialize SVM and test on Train/Test set
- Find optimal params with Grid Search
- See if Bagging improves the score

## 7. Random Forest & Extra Trees

Let's see if Random Forest and Extra Trees perform better.

1. Initialize RF and ET and test on Train/Test set
- Find optimal params with Grid Search

In [None]:
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
bdt = BaggingClassifier(DecisionTreeClassifier())
rf = RandomForestClassifier(class_weight='balanced', n_jobs=-1)
et = ExtraTreesClassifier(class_weight='balanced', n_jobs=-1)

def score(model, name):
    s = cross_val_score(model, X, y, cv=cv, n_jobs=-1)
    print "{} Score:\t{:0.3} ± {:0.3}".format(name, s.mean().round(3), s.std().round(3))

score(dt, "Decision Tree")
score(bdt, "Bagging DT")
score(rf, "Random Forest")
score(et, "Extra Trees")



## 8. Model comparison

Let's compare the scores of the various models.

1. Plot a bar chart of the scores of the best models. Which model wins on the train/test split?
- Re-test all the models using 3-fold stratified, shuffled cross-validation
- Plot a bar chart with error bars of the average cross-validation scores. Is the winner the same?
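
The comparison steps above can be sketched as follows. This is a minimal outline, assuming the per-fold scores of each model have been collected in a dictionary. `summarize` and `plot_summary` are hypothetical helper names, and the numbers in `demo_scores` are placeholders for illustration, not results from this lab. `pstdev` is used because it matches the population standard deviation that NumPy's `s.std()` returns by default.

```python
from statistics import mean, pstdev

def summarize(cv_scores):
    """Map each model name to (mean, std) of its per-fold scores."""
    return {name: (mean(s), pstdev(s)) for name, s in cv_scores.items()}

def plot_summary(summary):
    """Bar chart of mean scores with one-standard-deviation error bars."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    names = sorted(summary, key=lambda n: summary[n][0], reverse=True)
    means = [summary[n][0] for n in names]
    stds = [summary[n][1] for n in names]
    positions = list(range(len(names)))
    plt.bar(positions, means, yerr=stds, capsize=4)
    plt.xticks(positions, names, rotation=45)
    plt.ylabel("accuracy")

# Placeholder numbers for illustration, NOT results from this lab
demo_scores = {"model_a": [0.5, 0.6, 0.7], "model_b": [0.8, 0.8, 0.8]}

summary = summarize(demo_scores)
winner = max(summary, key=lambda n: summary[n][0])
print(winner)  # prints: model_b
```

Calling `plot_summary(summary)` in the notebook draws the chart. When two error bars overlap, the difference between those models may not be meaningful, which is the point of the last question.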


## Bonus

We have encoded the data using a map that preserves the scale.
Would our results have changed if we had encoded the categorical data as binary variables with `pd.get_dummies` or `OneHotEncoder` instead?

1. Repeat the analysis for this scenario. Is it better?
- Experiment with other models or other parameters. Can you beat your classmates' best score?
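
To see what actually changes between the two encodings: the scale-preserving map turns `lug_boot` into a single ordered number, while one-hot encoding produces one binary column per category and drops the ordering. The sketch below shows both for a handful of values without pandas. `one_hot` is a hypothetical helper that imitates the `prefix_value` column naming of `pd.get_dummies`; it is an illustration, not the library's code.

```python
# Scale-preserving map used earlier in the lab for lug_boot
map2 = {"small": 1, "med": 2, "big": 3}

def one_hot(values, prefix):
    """Hypothetical helper: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return [{"{}_{}".format(prefix, c): int(v == c) for c in categories}
            for v in values]

lug_boot = ["small", "med", "big", "med"]

ordinal = [map2[v] for v in lug_boot]    # one ordered column
dummies = one_hot(lug_boot, "lug_boot")  # three binary columns

print(ordinal)  # prints: [1, 2, 3, 2]
for row in dummies:
    print(sorted(row.items()))
```

Under the ordinal map a distance-based model such as KNN treats "small" and "big" as twice as far apart as "small" and "med"; under one-hot encoding every pair of categories is equally distant. Comparing scores under both encodings shows how much the models rely on that ordering.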