# ML Pipeline 
按照如下的指导要求，搭建你的机器学习管道。
### 1. 导入与加载
- 导入 Python 库
- 使用 [`read_sql_table`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_table.html) 从数据库中加载数据集
- 定义特征变量X 和目标变量 Y

In [1]:
# import libraries
import pandas as pd
from sqlalchemy import create_engine
import re
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from nltk.stem import WordNetLemmatizer

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV

from sklearn.externals import joblib
import pickle

In [2]:
categories = ['related', 'request', 'offer', 'aid_related', 'medical_help', 'medical_products', 'search_and_rescue',
        'security', 'military', 'child_alone', 'water', 'food', 'shelter', 'clothing', 'money',
        'missing_people', 'refugees', 'death', 'other_aid', 'infrastructure_related', 'transport',
        'buildings', 'electricity', 'tools', 'hospitals', 'shops', 'aid_centers', 'other_infrastructure',
        'weather_related', 'floods', 'storm', 'fire', 'earthquake', 'cold', 'other_weather', 'direct_report']

In [3]:
# load data from database
engine = create_engine('sqlite:///DisasterResponse.db')
df = pd.read_sql_table('messages_categories', engine)
#X = df[['message']]
X = df.message.values
y = df[categories].values


In [4]:
X.shape

(26180,)

In [5]:
y.shape

(26180, 36)

In [6]:
# test
df = df[(df['related'] == 2)]
df.head()

Unnamed: 0,id,message,original,genre,related,request,offer,aid_related,medical_help,medical_products,...,aid_centers,other_infrastructure,weather_related,floods,storm,fire,earthquake,cold,other_weather,direct_report
117,146,Dans la zone de Saint Etienne la route de Jacm...,Nan zon st. etine rout jakmel la bloke se mize...,direct,2,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
218,263,. .. i with limited means. Certain patients co...,t avec des moyens limites. Certains patients v...,direct,2,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
304,373,The internet caf Net@le that's by the Dal road...,Cyber cafe net@le ki chita rout de dal tou pr ...,direct,2,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
459,565,"Bonsoir, on est a bon repos aprs la compagnie ...",Bonswa nou nan bon repo apri teleko nan wout t...,direct,2,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
575,700,URGENT CRECHE ORPHANAGE KAY TOUT TIMOUN CROIX ...,r et Salon Furterer. mwen se yon Cosmtologue. ...,direct,2,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### 2. 编写分词函数，开始处理文本

In [5]:
def tokenize(text):
    # Normalize text
    text = re.sub(r"[^a-zA-Z0-9]", " ", text)
    
    # # Tokenize text
    words = word_tokenize(text)
    
    # Remove stop words
    words = [w for w in words if w not in stopwords.words("english")]
    
    # reduce words to their stems
    # stemmed = [PorterStemmer().stem(w).lower().strip() for w in words]
    
    lemmatizer = WordNetLemmatizer()
    
    clean_tokens = []
    for tok in words:
        clean_tok = lemmatizer.lemmatize(tok).lower().strip()
        clean_tokens.append(clean_tok)
    
    return clean_tokens
    

### 3. 创建机器学习管道 
这个机器学习管道应该接收 `message` 列作输入，输出分类结果，分类结果属于该数据集中的 36 个类。你会发现 [MultiOutputClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html) 在预测多目标变量时很有用。

In [6]:
pipeline = Pipeline([
        ('vect', CountVectorizer(tokenizer=tokenize)),
        ('tfidf', TfidfTransformer()),
        ('clf', MultiOutputClassifier(KNeighborsClassifier()))
    ])

### 4. 训练管道
- 将数据分割成训练和测试集
- 训练管道

In [11]:
X_train.shape

(19635,)

In [12]:
y_train.shape

(19635, 36)

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y)


In [13]:
%%time
# train classifier
pipeline.fit(X_train, y_train)

CPU times: user 45.8 s, sys: 9.61 s, total: 55.4 s
Wall time: 57.9 s


Pipeline(memory=None,
     steps=[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip...ric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform'),
           n_jobs=1))])

In [14]:
%%time
# predict on test data
y_pred = pipeline.predict(X_test)

### 5. 测试模型
报告数据集中每个输出类别的 f1 得分、准确度和召回率。你可以对列进行遍历，并对每个元素调用 sklearn 的 `classification_report`。

In [77]:
y_test[:,0]

array([1, 1, 1, ..., 1, 1, 1])

In [15]:
for i in range(0,35):
    print("Categories:", categories[i])
    print(classification_report(y_test[:,i], y_pred[:,i]))

Categories: related
             precision    recall  f1-score   support

          0       0.67      0.11      0.19      1594
          1       0.82      0.07      0.13      4899
          2       0.01      0.96      0.02        52

avg / total       0.78      0.09      0.15      6545

Categories: request
             precision    recall  f1-score   support

          0       0.84      0.99      0.91      5446
          1       0.73      0.09      0.15      1099

avg / total       0.82      0.84      0.79      6545

Categories: offer
             precision    recall  f1-score   support

          0       1.00      1.00      1.00      6516
          1       0.00      0.00      0.00        29

avg / total       0.99      1.00      0.99      6545

Categories: aid_related
             precision    recall  f1-score   support

          0       0.60      0.99      0.75      3895
          1       0.72      0.04      0.08      2650

avg / total       0.65      0.61      0.48      6545

Categ

  'precision', 'predicted', average, warn_for)


### 6. 优化模型
使用网格搜索来找到最优的参数组合。 

In [12]:
pipeline.get_params()

{'clf': MultiOutputClassifier(estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
            metric_params=None, n_jobs=1, n_neighbors=5, p=2,
            weights='uniform'),
            n_jobs=1),
 'clf__estimator': KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
            metric_params=None, n_jobs=1, n_neighbors=5, p=2,
            weights='uniform'),
 'clf__estimator__algorithm': 'auto',
 'clf__estimator__leaf_size': 30,
 'clf__estimator__metric': 'minkowski',
 'clf__estimator__metric_params': None,
 'clf__estimator__n_jobs': 1,
 'clf__estimator__n_neighbors': 5,
 'clf__estimator__p': 2,
 'clf__estimator__weights': 'uniform',
 'clf__n_jobs': 1,
 'memory': None,
 'steps': [('vect',
   CountVectorizer(analyzer='word', binary=False, decode_error='strict',
           dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
           lowercase=True, max_df=1.0, max_features=None, min_df=1,
           ngram_range=(1, 1), 

In [7]:
def build_model():
    pipeline = Pipeline([
        ('vect', CountVectorizer(tokenizer=tokenize)),
        ('tfidf', TfidfTransformer()),
        ('clf', MultiOutputClassifier(KNeighborsClassifier()))
    ])
    
    parameters = {
        'vect__ngram_range': ((1, 1), (1, 2)),
        #'vect__max_df': (0.5, 0.75, 1.0),
        #'vect__max_features': (None, 5000, 10000),
        #'tfidf__use_idf': (True, False)
        #'clf__estimator__n_neighbors': [i for i in range(4,6)],
        #'clf__estimator__p': [i for i in range(1,5)]
    }
    
    cv = GridSearchCV(pipeline, param_grid = parameters, scoring='f1_weighted', n_jobs=-1)
    
    return cv



In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [9]:
%%time

model = build_model()
model.fit(X_train, y_train)

JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/runpy.py in _run_module_as_main(mod_name='ipykernel.__main__', alter_argv=1)
    179         sys.exit(msg)
    180     main_globals = sys.modules["__main__"].__dict__
    181     if alter_argv:
    182         sys.argv[0] = mod_spec.origin
    183     return _run_code(code, main_globals, None,
--> 184                      "__main__", mod_spec)
        mod_spec = ModuleSpec(name='ipykernel.__main__', loader=<_f...b/python3.5/site-packages/ipykernel/__main__.py')
    185 
    186 def run_module(mod_name, init_globals=None,
    187                run_name=None, alter_sys=False):
    188     """Execute a module's code without importing it

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/runpy.py in _run_code(code=<code object <module> at 0x103b780c0, file "/Use...3.5/site-packages/ipykernel/__main__.py", line 1>, run_globals={'__builtins__': <module 'builtins' (built-in)>, '__cached__': '/Users/vickieliu/anaconda/lib/python3.5/site-packages/ipykernel/__pycache__/__main__.cpython-35.pyc', '__doc__': None, '__file__': '/Users/vickieliu/anaconda/lib/python3.5/site-packages/ipykernel/__main__.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': 'ipykernel', '__spec__': ModuleSpec(name='ipykernel.__main__', loader=<_f...b/python3.5/site-packages/ipykernel/__main__.py'), 'app': <module 'ipykernel.kernelapp' from '/Users/vicki.../python3.5/site-packages/ipykernel/kernelapp.py'>}, init_globals=None, mod_name='__main__', mod_spec=ModuleSpec(name='ipykernel.__main__', loader=<_f...b/python3.5/site-packages/ipykernel/__main__.py'), pkg_name='ipykernel', script_name=None)
     80                        __cached__ = cached,
     81                        __doc__ = None,
     82                        __loader__ = loader,
     83                        __package__ = pkg_name,
     84                        __spec__ = mod_spec)
---> 85     exec(code, run_globals)
        code = <code object <module> at 0x103b780c0, file "/Use...3.5/site-packages/ipykernel/__main__.py", line 1>
        run_globals = {'__builtins__': <module 'builtins' (built-in)>, '__cached__': '/Users/vickieliu/anaconda/lib/python3.5/site-packages/ipykernel/__pycache__/__main__.cpython-35.pyc', '__doc__': None, '__file__': '/Users/vickieliu/anaconda/lib/python3.5/site-packages/ipykernel/__main__.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': 'ipykernel', '__spec__': ModuleSpec(name='ipykernel.__main__', loader=<_f...b/python3.5/site-packages/ipykernel/__main__.py'), 'app': <module 'ipykernel.kernelapp' from '/Users/vicki.../python3.5/site-packages/ipykernel/kernelapp.py'>}
     86     return run_globals
     87 
     88 def _run_module_code(code, init_globals=None,
     89                     mod_name=None, mod_spec=None,

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/ipykernel/__main__.py in <module>()
      1 if __name__ == '__main__':
      2     from ipykernel import kernelapp as app
----> 3     app.launch_new_instance()

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/traitlets/config/application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
    648 
    649         If a global instance already exists, this reinitializes and starts it
    650         """
    651         app = cls.instance(**kwargs)
    652         app.initialize(argv)
--> 653         app.start()
        app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
    654 
    655 #-----------------------------------------------------------------------------
    656 # utility functions, for convenience
    657 #-----------------------------------------------------------------------------

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/ipykernel/kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
    469             return self.subapp.start()
    470         if self.poller is not None:
    471             self.poller.start()
    472         self.kernel.start()
    473         try:
--> 474             ioloop.IOLoop.instance().start()
    475         except KeyboardInterrupt:
    476             pass
    477 
    478 launch_new_instance = IPKernelApp.launch_instance

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/zmq/eventloop/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    157             PollIOLoop.configure(ZMQIOLoop)
    158         return PollIOLoop.current(*args, **kwargs)
    159     
    160     def start(self):
    161         try:
--> 162             super(ZMQIOLoop, self).start()
        self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
    163         except ZMQError as e:
    164             if e.errno == ETERM:
    165                 # quietly return on ETERM
    166                 pass

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/tornado/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    882                 self._events.update(event_pairs)
    883                 while self._events:
    884                     fd, events = self._events.popitem()
    885                     try:
    886                         fd_obj, handler_func = self._handlers[fd]
--> 887                         handler_func(fd_obj, events)
        handler_func = <function wrap.<locals>.null_wrapper>
        fd_obj = <zmq.sugar.socket.Socket object>
        events = 1
    888                     except (OSError, IOError) as e:
    889                         if errno_from_exception(e) == errno.EPIPE:
    890                             # Happens when the client closes the connection
    891                             pass

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/tornado/stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 1), **kwargs={})
    270         # Fast path when there are no active contexts.
    271         def null_wrapper(*args, **kwargs):
    272             try:
    273                 current_state = _state.contexts
    274                 _state.contexts = cap_contexts[0]
--> 275                 return fn(*args, **kwargs)
        args = (<zmq.sugar.socket.Socket object>, 1)
        kwargs = {}
    276             finally:
    277                 _state.contexts = current_state
    278         null_wrapper._wrapped = True
    279         return null_wrapper

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=1)
    435             # dispatch events:
    436             if events & IOLoop.ERROR:
    437                 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
    438                 return
    439             if events & IOLoop.READ:
--> 440                 self._handle_recv()
        self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
    441                 if not self.socket:
    442                     return
    443             if events & IOLoop.WRITE:
    444                 self._handle_send()

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
    467                 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
    468         else:
    469             if self._recv_callback:
    470                 callback = self._recv_callback
    471                 # self._recv_callback = None
--> 472                 self._run_callback(callback, msg)
        self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
        callback = <function wrap.<locals>.null_wrapper>
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    473                 
    474         # self.update_state()
    475         
    476 

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function wrap.<locals>.null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    409         close our socket."""
    410         try:
    411             # Use a NullContext to ensure that all StackContexts are run
    412             # inside our blanket exception handler rather than outside.
    413             with stack_context.NullContext():
--> 414                 callback(*args, **kwargs)
        callback = <function wrap.<locals>.null_wrapper>
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    415         except:
    416             gen_log.error("Uncaught exception, closing connection.",
    417                           exc_info=True)
    418             # Close the socket on an uncaught exception from a user callback

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/tornado/stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    270         # Fast path when there are no active contexts.
    271         def null_wrapper(*args, **kwargs):
    272             try:
    273                 current_state = _state.contexts
    274                 _state.contexts = cap_contexts[0]
--> 275                 return fn(*args, **kwargs)
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    276             finally:
    277                 _state.contexts = current_state
    278         null_wrapper._wrapped = True
    279         return null_wrapper

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/ipykernel/kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
    271         if self.control_stream:
    272             self.control_stream.on_recv(self.dispatch_control, copy=False)
    273 
    274         def make_dispatcher(stream):
    275             def dispatcher(msg):
--> 276                 return self.dispatch_shell(stream, msg)
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    277             return dispatcher
    278 
    279         for s in self.shell_streams:
    280             s.on_recv(make_dispatcher(s), copy=False)

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/ipykernel/kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {'allow_stdin': True, 'code': '%%time\n\nmodel = build_model()\nmodel.fit(X_train, y_train)', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': '2019-12-15T23:29:08.543633', 'msg_id': '3AB31A0C56C347628CB5D4A6DBE9E97E', 'msg_type': 'execute_request', 'session': '8DE3ADEDCD244D4CA5BE5D4421A2F445', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '3AB31A0C56C347628CB5D4A6DBE9E97E', 'msg_type': 'execute_request', 'parent_header': {}})
    223             self.log.error("UNKNOWN MESSAGE TYPE: %r", msg_type)
    224         else:
    225             self.log.debug("%s: %s", msg_type, msg)
    226             self.pre_handler_hook()
    227             try:
--> 228                 handler(stream, idents, msg)
        handler = <bound method Kernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
        stream = <zmq.eventloop.zmqstream.ZMQStream object>
        idents = [b'8DE3ADEDCD244D4CA5BE5D4421A2F445']
        msg = {'buffers': [], 'content': {'allow_stdin': True, 'code': '%%time\n\nmodel = build_model()\nmodel.fit(X_train, y_train)', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': '2019-12-15T23:29:08.543633', 'msg_id': '3AB31A0C56C347628CB5D4A6DBE9E97E', 'msg_type': 'execute_request', 'session': '8DE3ADEDCD244D4CA5BE5D4421A2F445', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '3AB31A0C56C347628CB5D4A6DBE9E97E', 'msg_type': 'execute_request', 'parent_header': {}}
    229             except Exception:
    230                 self.log.error("Exception in message handler:", exc_info=True)
    231             finally:
    232                 self.post_handler_hook()

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/ipykernel/kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=[b'8DE3ADEDCD244D4CA5BE5D4421A2F445'], parent={'buffers': [], 'content': {'allow_stdin': True, 'code': '%%time\n\nmodel = build_model()\nmodel.fit(X_train, y_train)', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': '2019-12-15T23:29:08.543633', 'msg_id': '3AB31A0C56C347628CB5D4A6DBE9E97E', 'msg_type': 'execute_request', 'session': '8DE3ADEDCD244D4CA5BE5D4421A2F445', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '3AB31A0C56C347628CB5D4A6DBE9E97E', 'msg_type': 'execute_request', 'parent_header': {}})
    385         if not silent:
    386             self.execution_count += 1
    387             self._publish_execute_input(code, parent, self.execution_count)
    388 
    389         reply_content = self.do_execute(code, silent, store_history,
--> 390                                         user_expressions, allow_stdin)
        user_expressions = {}
        allow_stdin = True
    391 
    392         # Flush output before sending the reply.
    393         sys.stdout.flush()
    394         sys.stderr.flush()

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/ipykernel/ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code='%%time\n\nmodel = build_model()\nmodel.fit(X_train, y_train)', silent=False, store_history=True, user_expressions={}, allow_stdin=True)
    191 
    192         self._forward_input(allow_stdin)
    193 
    194         reply_content = {}
    195         try:
--> 196             res = shell.run_cell(code, store_history=store_history, silent=silent)
        res = undefined
        shell.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = '%%time\n\nmodel = build_model()\nmodel.fit(X_train, y_train)'
        store_history = True
        silent = False
    197         finally:
    198             self._restore_input()
    199 
    200         if res.error_before_exec is not None:

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/ipykernel/zmqshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, *args=('%%time\n\nmodel = build_model()\nmodel.fit(X_train, y_train)',), **kwargs={'silent': False, 'store_history': True})
    496             )
    497         self.payload_manager.write_payload(payload)
    498 
    499     def run_cell(self, *args, **kwargs):
    500         self._last_traceback = None
--> 501         return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
        self.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        args = ('%%time\n\nmodel = build_model()\nmodel.fit(X_train, y_train)',)
        kwargs = {'silent': False, 'store_history': True}
    502 
    503     def _showtraceback(self, etype, evalue, stb):
    504         # try to preserve ordering of tracebacks and print statements
    505         sys.stdout.flush()

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell='%%time\n\nmodel = build_model()\nmodel.fit(X_train, y_train)', store_history=True, silent=False, shell_futures=True)
   2712                 self.displayhook.exec_result = result
   2713 
   2714                 # Execute the user code
   2715                 interactivity = "none" if silent else self.ast_node_interactivity
   2716                 has_raised = self.run_ast_nodes(code_ast.body, cell_name,
-> 2717                    interactivity=interactivity, compiler=compiler, result=result)
        interactivity = 'last_expr'
        compiler = <IPython.core.compilerop.CachingCompiler object>
   2718                 
   2719                 self.last_execution_succeeded = not has_raised
   2720 
   2721                 # Reset this so later displayed values do not modify the

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_ast_nodes(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, nodelist=[<_ast.Expr object>], cell_name='<ipython-input-9-fcaada12a090>', interactivity='last', compiler=<IPython.core.compilerop.CachingCompiler object>, result=<ExecutionResult object at 1238d90b8, execution_..._before_exec=None error_in_exec=None result=None>)
   2822                     return True
   2823 
   2824             for i, node in enumerate(to_run_interactive):
   2825                 mod = ast.Interactive([node])
   2826                 code = compiler(mod, cell_name, "single")
-> 2827                 if self.run_code(code, result):
        self.run_code = <bound method InteractiveShell.run_code of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = <code object <module> at 0x1251a4ed0, file "<ipython-input-9-fcaada12a090>", line 1>
        result = <ExecutionResult object at 1238d90b8, execution_..._before_exec=None error_in_exec=None result=None>
   2828                     return True
   2829 
   2830             # Flush softspace
   2831             if softspace(sys.stdout, 0):

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_code(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, code_obj=<code object <module> at 0x1251a4ed0, file "<ipython-input-9-fcaada12a090>", line 1>, result=<ExecutionResult object at 1238d90b8, execution_..._before_exec=None error_in_exec=None result=None>)
   2876         outflag = 1  # happens in more places, so it's easier as default
   2877         try:
   2878             try:
   2879                 self.hooks.pre_run_code_hook()
   2880                 #rprint('Running code', repr(code_obj)) # dbg
-> 2881                 exec(code_obj, self.user_global_ns, self.user_ns)
        code_obj = <code object <module> at 0x1251a4ed0, file "<ipython-input-9-fcaada12a090>", line 1>
        self.user_global_ns = {'CountVectorizer': <class 'sklearn.feature_extraction.text.CountVectorizer'>, 'FeatureUnion': <class 'sklearn.pipeline.FeatureUnion'>, 'GridSearchCV': <class 'sklearn.model_selection._search.GridSearchCV'>, 'In': ['', '# import libraries\nimport pandas as pd\nfrom sqla...rom sklearn.externals import joblib\nimport pickle', "categories = ['related', 'request', 'offer', 'ai...quake', 'cold', 'other_weather', 'direct_report']", '# load data from database\nengine = create_engine...]\nX = df.message.values\ny = df[categories].values', 'X.shape', 'def tokenize(text):\n    # Normalize text\n    tex...pend(clean_tok)\n    \n    return clean_tokens\n    ', "def build_model():\n    pipeline = Pipeline([\n   ...ighted', n_jobs=-1, verbose=1)\n    \n    return cv", "def build_model():\n    pipeline = Pipeline([\n   ...ring='f1_weighted', n_jobs=-1)\n    \n    return cv", 'X_train, X_test, y_train, y_test = train_test_split(X, y)', r"get_ipython().run_cell_magic('time', '', '\nmodel = build_model()\nmodel.fit(X_train, y_train)')"], 'KNeighborsClassifier': <class 'sklearn.neighbors.classification.KNeighborsClassifier'>, 'MultiOutputClassifier': <class 'sklearn.multioutput.MultiOutputClassifier'>, 'Out': {4: (26180,)}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'PorterStemmer': <class 'nltk.stem.porter.PorterStemmer'>, 'TfidfTransformer': <class 'sklearn.feature_extraction.text.TfidfTransformer'>, ...}
        self.user_ns = {'CountVectorizer': <class 'sklearn.feature_extraction.text.CountVectorizer'>, 'FeatureUnion': <class 'sklearn.pipeline.FeatureUnion'>, 'GridSearchCV': <class 'sklearn.model_selection._search.GridSearchCV'>, 'In': ['', '# import libraries\nimport pandas as pd\nfrom sqla...rom sklearn.externals import joblib\nimport pickle', "categories = ['related', 'request', 'offer', 'ai...quake', 'cold', 'other_weather', 'direct_report']", '# load data from database\nengine = create_engine...]\nX = df.message.values\ny = df[categories].values', 'X.shape', 'def tokenize(text):\n    # Normalize text\n    tex...pend(clean_tok)\n    \n    return clean_tokens\n    ', "def build_model():\n    pipeline = Pipeline([\n   ...ighted', n_jobs=-1, verbose=1)\n    \n    return cv", "def build_model():\n    pipeline = Pipeline([\n   ...ring='f1_weighted', n_jobs=-1)\n    \n    return cv", 'X_train, X_test, y_train, y_test = train_test_split(X, y)', r"get_ipython().run_cell_magic('time', '', '\nmodel = build_model()\nmodel.fit(X_train, y_train)')"], 'KNeighborsClassifier': <class 'sklearn.neighbors.classification.KNeighborsClassifier'>, 'MultiOutputClassifier': <class 'sklearn.multioutput.MultiOutputClassifier'>, 'Out': {4: (26180,)}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'PorterStemmer': <class 'nltk.stem.porter.PorterStemmer'>, 'TfidfTransformer': <class 'sklearn.feature_extraction.text.TfidfTransformer'>, ...}
   2882             finally:
   2883                 # Reset our crash handler in place
   2884                 sys.excepthook = old_excepthook
   2885         except SystemExit as e:

...........................................................................
/Users/vickieliu/Documents/Git/DataScientist_Udacity/DSND_Disaster_Response_Pipelines/<ipython-input-9-fcaada12a090> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', '\nmodel = build_model()\nmodel.fit(X_train, y_train)')

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, magic_name='time', line='', cell='\nmodel = build_model()\nmodel.fit(X_train, y_train)')
   2110             # This will need to be updated if the internal calling logic gets
   2111             # refactored, or else we'll be expanding the wrong variables.
   2112             stack_depth = 2
   2113             magic_arg_s = self.var_expand(line, stack_depth)
   2114             with self.builtin_trap:
-> 2115                 result = fn(magic_arg_s, cell)
        result = undefined
        fn = <bound method ExecutionMagics.time of <IPython.core.magics.execution.ExecutionMagics object>>
        magic_arg_s = ''
        cell = '\nmodel = build_model()\nmodel.fit(X_train, y_train)'
   2116             return result
   2117 
   2118     def find_line_magic(self, magic_name):
   2119         """Find and return a line magic by name.

...........................................................................
/Users/vickieliu/Documents/Git/DataScientist_Udacity/DSND_Disaster_Response_Pipelines/<decorator-gen-59> in time(self=<IPython.core.magics.execution.ExecutionMagics object>, line='', cell='\nmodel = build_model()\nmodel.fit(X_train, y_train)', local_ns=None)

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f=<function ExecutionMagics.time>, *a=(<IPython.core.magics.execution.ExecutionMagics object>, '', '\nmodel = build_model()\nmodel.fit(X_train, y_train)', None), **k={})
    183     validate_type(magic_kind)
    184 
    185     # This is a closure to capture the magic_kind.  We could also use a class,
    186     # but it's overkill for just that one bit of state.
    187     def magic_deco(arg):
--> 188         call = lambda f, *a, **k: f(*a, **k)
        f = <function ExecutionMagics.time>
        a = (<IPython.core.magics.execution.ExecutionMagics object>, '', '\nmodel = build_model()\nmodel.fit(X_train, y_train)', None)
        k = {}
    189 
    190         if callable(arg):
    191             # "Naked" decorator call (just @foo, no args)
    192             func = arg

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in time(self=<IPython.core.magics.execution.ExecutionMagics object>, line='', cell='\nmodel = build_model()\nmodel.fit(X_train, y_train)', local_ns=None)
   1175             st = clock2()
   1176             out = eval(code, glob, local_ns)
   1177             end = clock2()
   1178         else:
   1179             st = clock2()
-> 1180             exec(code, glob, local_ns)
        code = <code object <module> at 0x1238de030, file "<timed exec>", line 2>
        glob = {'CountVectorizer': <class 'sklearn.feature_extraction.text.CountVectorizer'>, 'FeatureUnion': <class 'sklearn.pipeline.FeatureUnion'>, 'GridSearchCV': <class 'sklearn.model_selection._search.GridSearchCV'>, 'In': ['', '# import libraries\nimport pandas as pd\nfrom sqla...rom sklearn.externals import joblib\nimport pickle', "categories = ['related', 'request', 'offer', 'ai...quake', 'cold', 'other_weather', 'direct_report']", '# load data from database\nengine = create_engine...]\nX = df.message.values\ny = df[categories].values', 'X.shape', 'def tokenize(text):\n    # Normalize text\n    tex...pend(clean_tok)\n    \n    return clean_tokens\n    ', "def build_model():\n    pipeline = Pipeline([\n   ...ighted', n_jobs=-1, verbose=1)\n    \n    return cv", "def build_model():\n    pipeline = Pipeline([\n   ...ring='f1_weighted', n_jobs=-1)\n    \n    return cv", 'X_train, X_test, y_train, y_test = train_test_split(X, y)', r"get_ipython().run_cell_magic('time', '', '\nmodel = build_model()\nmodel.fit(X_train, y_train)')"], 'KNeighborsClassifier': <class 'sklearn.neighbors.classification.KNeighborsClassifier'>, 'MultiOutputClassifier': <class 'sklearn.multioutput.MultiOutputClassifier'>, 'Out': {4: (26180,)}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'PorterStemmer': <class 'nltk.stem.porter.PorterStemmer'>, 'TfidfTransformer': <class 'sklearn.feature_extraction.text.TfidfTransformer'>, ...}
        local_ns = None
   1181             end = clock2()
   1182             out = None
   1183         wall_end = wtime()
   1184         # Compute actual times and report

...........................................................................
/Users/vickieliu/Documents/Git/DataScientist_Udacity/DSND_Disaster_Response_Pipelines/<timed exec> in <module>()

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/sklearn/model_selection/_search.py in fit(self=GridSearchCV(cv=None, error_score='raise',
     ...re=True,
       scoring='f1_weighted', verbose=0), X=array(['Another quick trip to Santiago..',
     ...da , santa cruz (: ja colei , rs'], dtype=object), y=array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0,..., ..., 0, 0, 0],
       [2, 0, 0, ..., 0, 0, 0]]), groups=None, **fit_params={})
    633                                   return_train_score=self.return_train_score,
    634                                   return_n_test_samples=True,
    635                                   return_times=True, return_parameters=False,
    636                                   error_score=self.error_score)
    637           for parameters, (train, test) in product(candidate_params,
--> 638                                                    cv.split(X, y, groups)))
        cv.split = <bound method _BaseKFold.split of KFold(n_splits=3, random_state=None, shuffle=False)>
        X = array(['Another quick trip to Santiago..',
     ...da , santa cruz (: ja colei , rs'], dtype=object)
        y = array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0,..., ..., 0, 0, 0],
       [2, 0, 0, ..., 0, 0, 0]])
        groups = None
    639 
    640         # if one choose to see train score, "out" will contain train score info
    641         if self.return_train_score:
    642             (train_score_dicts, test_score_dicts, test_sample_counts, fit_time,

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object BaseSearchCV.fit.<locals>.<genexpr>>)
    784             if pre_dispatch == "all" or n_jobs == 1:
    785                 # The iterable was consumed all at once by the above for loop.
    786                 # No need to wait for async callbacks to trigger to
    787                 # consumption.
    788                 self._iterating = False
--> 789             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=-1)>
    790             # Make sure that we get a last message telling us we are done
    791             elapsed_time = time.time() - self._start_time
    792             self._print('Done %3i out of %3i | elapsed: %s finished',
    793                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Sun Dec 15 23:32:48 2019
PID: 59244               Python 3.5.2: /Users/vickieliu/anaconda/bin/python
...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function _fit_and_score>, (Pipeline(memory=None,
     steps=[('vect', Count...      weights='uniform'),
           n_jobs=1))]), array(['Another quick trip to Santiago..',
     ...da , santa cruz (: ja colei , rs'], dtype=object), memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [2, 0, 0, ..., 0, 0, 0]]), {'score': make_scorer(f1_score, pos_label=None, average=weighted)}, array([ 6545,  6546,  6547, ..., 19632, 19633, 19634]), array([   0,    1,    2, ..., 6542, 6543, 6544]), 0, {'vect__ngram_range': (1, 1)}), {'error_score': 'raise', 'fit_params': {}, 'return_n_test_samples': True, 'return_parameters': False, 'return_times': True, 'return_train_score': True})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _fit_and_score>
        args = (Pipeline(memory=None,
     steps=[('vect', Count...      weights='uniform'),
           n_jobs=1))]), array(['Another quick trip to Santiago..',
     ...da , santa cruz (: ja colei , rs'], dtype=object), memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [2, 0, 0, ..., 0, 0, 0]]), {'score': make_scorer(f1_score, pos_label=None, average=weighted)}, array([ 6545,  6546,  6547, ..., 19632, 19633, 19634]), array([   0,    1,    2, ..., 6542, 6543, 6544]), 0, {'vect__ngram_range': (1, 1)})
        kwargs = {'error_score': 'raise', 'fit_params': {}, 'return_n_test_samples': True, 'return_parameters': False, 'return_times': True, 'return_train_score': True}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator=Pipeline(memory=None,
     steps=[('vect', Count...      weights='uniform'),
           n_jobs=1))]), X=array(['Another quick trip to Santiago..',
     ...da , santa cruz (: ja colei , rs'], dtype=object), y=memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [2, 0, 0, ..., 0, 0, 0]]), scorer={'score': make_scorer(f1_score, pos_label=None, average=weighted)}, train=array([ 6545,  6546,  6547, ..., 19632, 19633, 19634]), test=array([   0,    1,    2, ..., 6542, 6543, 6544]), verbose=0, parameters={'vect__ngram_range': (1, 1)}, fit_params={}, return_train_score=True, return_parameters=False, return_n_test_samples=True, return_times=True, error_score='raise')
    462                              " make sure that it has been spelled correctly.)")
    463 
    464     else:
    465         fit_time = time.time() - start_time
    466         # _score will return dict if is_multimetric is True
--> 467         test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric)
        test_scores = {}
        estimator = Pipeline(memory=None,
     steps=[('vect', Count...      weights='uniform'),
           n_jobs=1))])
        X_test = array(['Another quick trip to Santiago..',
     ... Saddening news about earthquake'], dtype=object)
        y_test = memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0]])
        scorer = {'score': make_scorer(f1_score, pos_label=None, average=weighted)}
        is_multimetric = True
    468         score_time = time.time() - start_time - fit_time
    469         if return_train_score:
    470             train_scores = _score(estimator, X_train, y_train, scorer,
    471                                   is_multimetric)

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/sklearn/model_selection/_validation.py in _score(estimator=Pipeline(memory=None,
     steps=[('vect', Count...      weights='uniform'),
           n_jobs=1))]), X_test=array(['Another quick trip to Santiago..',
     ... Saddening news about earthquake'], dtype=object), y_test=memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0]]), scorer={'score': make_scorer(f1_score, pos_label=None, average=weighted)}, is_multimetric=True)
    497 
    498     Will return a single float if is_multimetric is False and a dict of floats,
    499     if is_multimetric is True
    500     """
    501     if is_multimetric:
--> 502         return _multimetric_score(estimator, X_test, y_test, scorer)
        estimator = Pipeline(memory=None,
     steps=[('vect', Count...      weights='uniform'),
           n_jobs=1))])
        X_test = array(['Another quick trip to Santiago..',
     ... Saddening news about earthquake'], dtype=object)
        y_test = memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0]])
        scorer = {'score': make_scorer(f1_score, pos_label=None, average=weighted)}
    503     else:
    504         if y_test is None:
    505             score = scorer(estimator, X_test)
    506         else:

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/sklearn/model_selection/_validation.py in _multimetric_score(estimator=Pipeline(memory=None,
     steps=[('vect', Count...      weights='uniform'),
           n_jobs=1))]), X_test=array(['Another quick trip to Santiago..',
     ... Saddening news about earthquake'], dtype=object), y_test=memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0]]), scorers={'score': make_scorer(f1_score, pos_label=None, average=weighted)})
    527 
    528     for name, scorer in scorers.items():
    529         if y_test is None:
    530             score = scorer(estimator, X_test)
    531         else:
--> 532             score = scorer(estimator, X_test, y_test)
        score = undefined
        scorer = make_scorer(f1_score, pos_label=None, average=weighted)
        estimator = Pipeline(memory=None,
     steps=[('vect', Count...      weights='uniform'),
           n_jobs=1))])
        X_test = array(['Another quick trip to Santiago..',
     ... Saddening news about earthquake'], dtype=object)
        y_test = memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0]])
    533 
    534         if hasattr(score, 'item'):
    535             try:
    536                 # e.g. unwrap memmapped scalars

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/sklearn/metrics/scorer.py in __call__(self=make_scorer(f1_score, pos_label=None, average=weighted), estimator=Pipeline(memory=None,
     steps=[('vect', Count...      weights='uniform'),
           n_jobs=1))]), X=array(['Another quick trip to Santiago..',
     ... Saddening news about earthquake'], dtype=object), y_true=memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0]]), sample_weight=None)
    103             return self._sign * self._score_func(y_true, y_pred,
    104                                                  sample_weight=sample_weight,
    105                                                  **self._kwargs)
    106         else:
    107             return self._sign * self._score_func(y_true, y_pred,
--> 108                                                  **self._kwargs)
        self._kwargs = {'average': 'weighted', 'pos_label': None}
    109 
    110 
    111 class _ProbaScorer(_BaseScorer):
    112     def __call__(self, clf, X, y, sample_weight=None):

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/sklearn/metrics/classification.py in f1_score(y_true=memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0]]), y_pred=array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0,..., ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0]]), labels=None, pos_label=None, average='weighted', sample_weight=None)
    709 
    710 
    711     """
    712     return fbeta_score(y_true, y_pred, 1, labels=labels,
    713                        pos_label=pos_label, average=average,
--> 714                        sample_weight=sample_weight)
        sample_weight = None
    715 
    716 
    717 def fbeta_score(y_true, y_pred, beta, labels=None, pos_label=1,
    718                 average='binary', sample_weight=None):

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/sklearn/metrics/classification.py in fbeta_score(y_true=memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0]]), y_pred=array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0,..., ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0]]), beta=1, labels=None, pos_label=None, average='weighted', sample_weight=None)
    823                                                  beta=beta,
    824                                                  labels=labels,
    825                                                  pos_label=pos_label,
    826                                                  average=average,
    827                                                  warn_for=('f-score',),
--> 828                                                  sample_weight=sample_weight)
        sample_weight = None
    829     return f
    830 
    831 
    832 def _prf_divide(numerator, denominator, metric, modifier, average, warn_for):

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/sklearn/metrics/classification.py in precision_recall_fscore_support(y_true=memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0]]), y_pred=array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0,..., ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0]]), beta=1, labels=None, pos_label=None, average='weighted', warn_for=('f-score',), sample_weight=None)
   1020         raise ValueError('average has to be one of ' +
   1021                          str(average_options))
   1022     if beta <= 0:
   1023         raise ValueError("beta should be >0 in the F-beta score")
   1024 
-> 1025     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
        y_type = undefined
        y_true = memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0]])
        y_pred = array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0,..., ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0]])
   1026     present_labels = unique_labels(y_true, y_pred)
   1027 
   1028     if average == 'binary':
   1029         if y_type == 'binary':

...........................................................................
/Users/vickieliu/anaconda/lib/python3.5/site-packages/sklearn/metrics/classification.py in _check_targets(y_true=memmap([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, ... ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0]]), y_pred=array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0,..., ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0]]))
     83     # We can't have more than one value on y_type => The set is no more needed
     84     y_type = y_type.pop()
     85 
     86     # No metrics support "multiclass-multioutput" format
     87     if (y_type not in ["binary", "multiclass", "multilabel-indicator"]):
---> 88         raise ValueError("{0} is not supported".format(y_type))
        y_type = 'multiclass-multioutput'
     89 
     90     if y_type in ["binary", "multiclass"]:
     91         y_true = column_or_1d(y_true)
     92         y_pred = column_or_1d(y_pred)

ValueError: multiclass-multioutput is not supported
___________________________________________________________________________

In [11]:
model.best_estimator_

AttributeError: 'GridSearchCV' object has no attribute 'best_estimator_'

In [None]:
cv.best_score_

In [None]:
cv.best_params_

In [None]:
y_pred = cv.predict(X_test)

### 7. 测试模型
打印微调后的模型的精确度、准确率和召回率。  

因为本项目主要关注代码质量、开发流程和管道技术，所有没有模型性能指标的最低要求。但是，微调模型提高精确度、准确率和召回率可以让你的项目脱颖而出——特别是让你的简历更出彩。

In [None]:
for i in range(0,35):
    print("Categories:", categories[i])
    print(classification_report(y_test[:,i], y_pred[:,i]))

### 8. 继续优化模型，比如：
* 尝试其他的机器学习算法
* 尝试除 TF-IDF 外其他的特征

### 9. 导出模型为 pickle file

In [None]:
# 保存至本地磁盘
with open('model.pkl', 'wb') as file:
    pickle.dump(clf, file)


In [None]:
# 从本地磁盘加载模型
with open('model.pkl', 'rb') as file:
    model_joblib = pickle.load(file)
# 加载出来的模型可以进行predict等功能
print(model_joblib.predict([[4, 6, 10]]))

### 10. Use this notebook to complete `train.py`
使用资源 (Resources)文件里附带的模板文件编写脚本，运行上述步骤，创建一个数据库，并基于用户指定的新数据集输出一个模型。