# SVM classifier

This notebook trains a Support Vector Machine (with a linear kernel) to identify relevant tweets (POS).

We use scikit-learn's implementation of SVM and its cross validation tools. http://scikit-learn.org/

## Installation

To install all of the python dependencies for this notbook in a virtual environment:

```bash
# create environment in directory named 'venv'
python -m venv venv
# or:
# virtualenv venv

# activate environment
source venv/bin/activate

# install dependencies
pip3 install -r requirements.txt
```

In [9]:
from class_utils import *
import pickle
import numpy as np

from nltk.tokenize.casual import casual_tokenize
from nltk import word_tokenize

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer, confusion_matrix, classification_report
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, fbeta_score
from sklearn.preprocessing import LabelBinarizer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, train_test_split

In [10]:
# globals
iteration="iter3a"
#model_filename = "models/best_svc_{}.pickle".format(iteration)

## Parse data sets

Here we parse data from our training files, and then randomly select a portion to be held out for evaluation. The training set is used to both train the SVM classifier and select parameters using k-fold cross validation.

The `parse_training_data()` function is provided in the external `class_utils.py` file.

In [11]:
# parse data from files
classes = ['NEG', 'POS']
docs, targets = parse_training_data(['NEG-{}.txt'.format(iteration), 'POS-{}.txt'.format(iteration)], classes)

# convert the targets array of strings to binary labels (0=NEG, 1=POS)
lb = LabelBinarizer(sparse_output=False)
lb.fit(classes)
bin_targets = lb.transform(targets).ravel()

# split data set into to training and evaluation sets
# X_test/y_test are held out and not used during the
# k-fold training and parameter search below
#
# The percentage of samples to hod out is determined by the `test_size`
# parameter
# for this iter2, the holdout is only going to be 10% 
X_train, X_test, y_train, y_test = train_test_split(
    docs, bin_targets, test_size=0.10, random_state=0)

## Create sklearn pipeline

Here we setup a scikit-learn pipeline to create vectors from our training sample vocabulary (`CountVectorizer`), normalize words based on frequency (`TfidfTransformer`), and train a SVM classifier (`SVC`). http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

We evaluate parameters based on th `fscore_prec` which is a weighted fscore which favors precision (beta < 1). We also calculate accuracy, precision, recall, and f1 scores for each of the k-fold training sessions.

Using a pipeline makes it easy to search a range of hyperparameters using sklearn's `GridSearchCV`. http://scikit-learn.org/stable/modules/grid_search.html

In [12]:
svc_pl = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SVC(kernel='linear')),
])

parameters = {
    'vect__preprocessor': [normalize_tweet],#[normalize_tweet, normalize_simple, None],
    'vect__max_df': np.linspace(0.3, 1.0, 10),  #'vect__tokenizer': [casual_tokenize, word_tokenize, None]
    'vect__stopwords': ['english', None],
    'vect__ngram_range': [(1, 1), (1, 2), (1,3)],  # largest n-gram
    'tfidf__use_idf':[(True, False)],# (True, False), #DEFAULT
}


# define the scores we want to calcualte during each k-fold training
fscore_prec = make_scorer(fbeta_score, beta=0)
scoring = {
    'accuracy': 'accuracy',
    'precision': 'precision',
    'recall': 'recall',
    'f1': 'f1',
    'fscore_prec': fscore_prec
}

# create the GridSearchCV object.
# by setting refit='fscore_prec', the model which maximizes that score
# will be selected and retrained on all training data.
svc_search = GridSearchCV(svc_pl, parameters, n_jobs=-1, verbose=1, scoring=scoring, refit='fscore_prec')

In [13]:
# Here we do the actual training
# Can take several minutes depending on the range of parameters given
# int he parameters dict above
svc_search.fit(X_train, y_train)

Fitting 3 folds for each of 60 candidates, totalling 180 fits


JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py in _run_module_as_main(mod_name='ipykernel_launcher', alter_argv=1)
    179         sys.exit(msg)
    180     main_globals = sys.modules["__main__"].__dict__
    181     if alter_argv:
    182         sys.argv[0] = mod_spec.origin
    183     return _run_code(code, main_globals, None,
--> 184                      "__main__", mod_spec)
        mod_spec = ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.5/site-packages/ipykernel_launcher.py')
    185 
    186 def run_module(mod_name, init_globals=None,
    187                run_name=None, alter_sys=False):
    188     """Execute a module's code without importing it

...........................................................................
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py in _run_code(code=<code object <module> at 0x103b468a0, file "/Use...3.5/site-packages/ipykernel_launcher.py", line 5>, run_globals={'__builtins__': <module 'builtins' (built-in)>, '__cached__': '/Users/amyburkhardt/Documents/NLP/venv/lib/pytho...ges/__pycache__/ipykernel_launcher.cpython-35.pyc', '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': '/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/ipykernel_launcher.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': '', '__spec__': ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.5/site-packages/ipykernel_launcher.py'), 'app': <module 'ipykernel.kernelapp' from '/Users/amybu.../python3.5/site-packages/ipykernel/kernelapp.py'>, 'sys': <module 'sys' (built-in)>}, init_globals=None, mod_name='__main__', mod_spec=ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.5/site-packages/ipykernel_launcher.py'), pkg_name='', script_name=None)
     80                        __cached__ = cached,
     81                        __doc__ = None,
     82                        __loader__ = loader,
     83                        __package__ = pkg_name,
     84                        __spec__ = mod_spec)
---> 85     exec(code, run_globals)
        code = <code object <module> at 0x103b468a0, file "/Use...3.5/site-packages/ipykernel_launcher.py", line 5>
        run_globals = {'__builtins__': <module 'builtins' (built-in)>, '__cached__': '/Users/amyburkhardt/Documents/NLP/venv/lib/pytho...ges/__pycache__/ipykernel_launcher.cpython-35.pyc', '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': '/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/ipykernel_launcher.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': '', '__spec__': ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.5/site-packages/ipykernel_launcher.py'), 'app': <module 'ipykernel.kernelapp' from '/Users/amybu.../python3.5/site-packages/ipykernel/kernelapp.py'>, 'sys': <module 'sys' (built-in)>}
     86     return run_globals
     87 
     88 def _run_module_code(code, init_globals=None,
     89                     mod_name=None, mod_spec=None,

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/ipykernel_launcher.py in <module>()
     11     # This is added back by InteractiveShellApp.init_path()
     12     if sys.path[0] == '':
     13         del sys.path[0]
     14 
     15     from ipykernel import kernelapp as app
---> 16     app.launch_new_instance()

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/traitlets/config/application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
    653 
    654         If a global instance already exists, this reinitializes and starts it
    655         """
    656         app = cls.instance(**kwargs)
    657         app.initialize(argv)
--> 658         app.start()
        app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
    659 
    660 #-----------------------------------------------------------------------------
    661 # utility functions, for convenience
    662 #-----------------------------------------------------------------------------

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/ipykernel/kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
    472             return self.subapp.start()
    473         if self.poller is not None:
    474             self.poller.start()
    475         self.kernel.start()
    476         try:
--> 477             ioloop.IOLoop.instance().start()
    478         except KeyboardInterrupt:
    479             pass
    480 
    481 launch_new_instance = IPKernelApp.launch_instance

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/zmq/eventloop/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    172             )
    173         return loop
    174     
    175     def start(self):
    176         try:
--> 177             super(ZMQIOLoop, self).start()
        self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
    178         except ZMQError as e:
    179             if e.errno == ETERM:
    180                 # quietly return on ETERM
    181                 pass

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/tornado/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    883                 self._events.update(event_pairs)
    884                 while self._events:
    885                     fd, events = self._events.popitem()
    886                     try:
    887                         fd_obj, handler_func = self._handlers[fd]
--> 888                         handler_func(fd_obj, events)
        handler_func = <function wrap.<locals>.null_wrapper>
        fd_obj = <zmq.sugar.socket.Socket object>
        events = 1
    889                     except (OSError, IOError) as e:
    890                         if errno_from_exception(e) == errno.EPIPE:
    891                             # Happens when the client closes the connection
    892                             pass

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/tornado/stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 1), **kwargs={})
    272         # Fast path when there are no active contexts.
    273         def null_wrapper(*args, **kwargs):
    274             try:
    275                 current_state = _state.contexts
    276                 _state.contexts = cap_contexts[0]
--> 277                 return fn(*args, **kwargs)
        args = (<zmq.sugar.socket.Socket object>, 1)
        kwargs = {}
    278             finally:
    279                 _state.contexts = current_state
    280         null_wrapper._wrapped = True
    281         return null_wrapper

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=1)
    435             # dispatch events:
    436             if events & IOLoop.ERROR:
    437                 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
    438                 return
    439             if events & IOLoop.READ:
--> 440                 self._handle_recv()
        self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
    441                 if not self.socket:
    442                     return
    443             if events & IOLoop.WRITE:
    444                 self._handle_send()

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
    467                 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
    468         else:
    469             if self._recv_callback:
    470                 callback = self._recv_callback
    471                 # self._recv_callback = None
--> 472                 self._run_callback(callback, msg)
        self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
        callback = <function wrap.<locals>.null_wrapper>
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    473                 
    474         # self.update_state()
    475         
    476 

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function wrap.<locals>.null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    409         close our socket."""
    410         try:
    411             # Use a NullContext to ensure that all StackContexts are run
    412             # inside our blanket exception handler rather than outside.
    413             with stack_context.NullContext():
--> 414                 callback(*args, **kwargs)
        callback = <function wrap.<locals>.null_wrapper>
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    415         except:
    416             gen_log.error("Uncaught exception, closing connection.",
    417                           exc_info=True)
    418             # Close the socket on an uncaught exception from a user callback

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/tornado/stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    272         # Fast path when there are no active contexts.
    273         def null_wrapper(*args, **kwargs):
    274             try:
    275                 current_state = _state.contexts
    276                 _state.contexts = cap_contexts[0]
--> 277                 return fn(*args, **kwargs)
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    278             finally:
    279                 _state.contexts = current_state
    280         null_wrapper._wrapped = True
    281         return null_wrapper

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/ipykernel/kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
    278         if self.control_stream:
    279             self.control_stream.on_recv(self.dispatch_control, copy=False)
    280 
    281         def make_dispatcher(stream):
    282             def dispatcher(msg):
--> 283                 return self.dispatch_shell(stream, msg)
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    284             return dispatcher
    285 
    286         for s in self.shell_streams:
    287             s.on_recv(make_dispatcher(s), copy=False)

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/ipykernel/kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {'allow_stdin': True, 'code': '# Here we do the actual training\n# Can take seve...eters dict above\nsvc_search.fit(X_train, y_train)', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2019, 3, 14, 22, 11, 32, 438795, tzinfo=tzutc()), 'msg_id': '42D72593F7144ED6823E1A7D35547E61', 'msg_type': 'execute_request', 'session': '7C7C135E47CD4E588E9888A52C8AAD09', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '42D72593F7144ED6823E1A7D35547E61', 'msg_type': 'execute_request', 'parent_header': {}})
    230             self.log.warn("Unknown message type: %r", msg_type)
    231         else:
    232             self.log.debug("%s: %s", msg_type, msg)
    233             self.pre_handler_hook()
    234             try:
--> 235                 handler(stream, idents, msg)
        handler = <bound method Kernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
        stream = <zmq.eventloop.zmqstream.ZMQStream object>
        idents = [b'7C7C135E47CD4E588E9888A52C8AAD09']
        msg = {'buffers': [], 'content': {'allow_stdin': True, 'code': '# Here we do the actual training\n# Can take seve...eters dict above\nsvc_search.fit(X_train, y_train)', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2019, 3, 14, 22, 11, 32, 438795, tzinfo=tzutc()), 'msg_id': '42D72593F7144ED6823E1A7D35547E61', 'msg_type': 'execute_request', 'session': '7C7C135E47CD4E588E9888A52C8AAD09', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '42D72593F7144ED6823E1A7D35547E61', 'msg_type': 'execute_request', 'parent_header': {}}
    236             except Exception:
    237                 self.log.error("Exception in message handler:", exc_info=True)
    238             finally:
    239                 self.post_handler_hook()

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/ipykernel/kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=[b'7C7C135E47CD4E588E9888A52C8AAD09'], parent={'buffers': [], 'content': {'allow_stdin': True, 'code': '# Here we do the actual training\n# Can take seve...eters dict above\nsvc_search.fit(X_train, y_train)', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2019, 3, 14, 22, 11, 32, 438795, tzinfo=tzutc()), 'msg_id': '42D72593F7144ED6823E1A7D35547E61', 'msg_type': 'execute_request', 'session': '7C7C135E47CD4E588E9888A52C8AAD09', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '42D72593F7144ED6823E1A7D35547E61', 'msg_type': 'execute_request', 'parent_header': {}})
    394         if not silent:
    395             self.execution_count += 1
    396             self._publish_execute_input(code, parent, self.execution_count)
    397 
    398         reply_content = self.do_execute(code, silent, store_history,
--> 399                                         user_expressions, allow_stdin)
        user_expressions = {}
        allow_stdin = True
    400 
    401         # Flush output before sending the reply.
    402         sys.stdout.flush()
    403         sys.stderr.flush()

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/ipykernel/ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code='# Here we do the actual training\n# Can take seve...eters dict above\nsvc_search.fit(X_train, y_train)', silent=False, store_history=True, user_expressions={}, allow_stdin=True)
    191 
    192         self._forward_input(allow_stdin)
    193 
    194         reply_content = {}
    195         try:
--> 196             res = shell.run_cell(code, store_history=store_history, silent=silent)
        res = undefined
        shell.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = '# Here we do the actual training\n# Can take seve...eters dict above\nsvc_search.fit(X_train, y_train)'
        store_history = True
        silent = False
    197         finally:
    198             self._restore_input()
    199 
    200         if res.error_before_exec is not None:

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/ipykernel/zmqshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, *args=('# Here we do the actual training\n# Can take seve...eters dict above\nsvc_search.fit(X_train, y_train)',), **kwargs={'silent': False, 'store_history': True})
    528             )
    529         self.payload_manager.write_payload(payload)
    530 
    531     def run_cell(self, *args, **kwargs):
    532         self._last_traceback = None
--> 533         return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
        self.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        args = ('# Here we do the actual training\n# Can take seve...eters dict above\nsvc_search.fit(X_train, y_train)',)
        kwargs = {'silent': False, 'store_history': True}
    534 
    535     def _showtraceback(self, etype, evalue, stb):
    536         # try to preserve ordering of tracebacks and print statements
    537         sys.stdout.flush()

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell='# Here we do the actual training\n# Can take seve...eters dict above\nsvc_search.fit(X_train, y_train)', store_history=True, silent=False, shell_futures=True)
   2693                 self.displayhook.exec_result = result
   2694 
   2695                 # Execute the user code
   2696                 interactivity = "none" if silent else self.ast_node_interactivity
   2697                 has_raised = self.run_ast_nodes(code_ast.body, cell_name,
-> 2698                    interactivity=interactivity, compiler=compiler, result=result)
        interactivity = 'last_expr'
        compiler = <IPython.core.compilerop.CachingCompiler object>
   2699                 
   2700                 self.last_execution_succeeded = not has_raised
   2701 
   2702                 # Reset this so later displayed values do not modify the

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_ast_nodes(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, nodelist=[<_ast.Expr object>], cell_name='<ipython-input-13-e89dda0d6910>', interactivity='last', compiler=<IPython.core.compilerop.CachingCompiler object>, result=<ExecutionResult object at 11778a828, execution_..._before_exec=None error_in_exec=None result=None>)
   2803                     return True
   2804 
   2805             for i, node in enumerate(to_run_interactive):
   2806                 mod = ast.Interactive([node])
   2807                 code = compiler(mod, cell_name, "single")
-> 2808                 if self.run_code(code, result):
        self.run_code = <bound method InteractiveShell.run_code of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = <code object <module> at 0x11779a930, file "<ipython-input-13-e89dda0d6910>", line 4>
        result = <ExecutionResult object at 11778a828, execution_..._before_exec=None error_in_exec=None result=None>
   2809                     return True
   2810 
   2811             # Flush softspace
   2812             if softspace(sys.stdout, 0):

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_code(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, code_obj=<code object <module> at 0x11779a930, file "<ipython-input-13-e89dda0d6910>", line 4>, result=<ExecutionResult object at 11778a828, execution_..._before_exec=None error_in_exec=None result=None>)
   2857         outflag = True  # happens in more places, so it's easier as default
   2858         try:
   2859             try:
   2860                 self.hooks.pre_run_code_hook()
   2861                 #rprint('Running code', repr(code_obj)) # dbg
-> 2862                 exec(code_obj, self.user_global_ns, self.user_ns)
        code_obj = <code object <module> at 0x11779a930, file "<ipython-input-13-e89dda0d6910>", line 4>
        self.user_global_ns = {'CountVectorizer': <class 'sklearn.feature_extraction.text.CountVectorizer'>, 'GridSearchCV': <class 'sklearn.model_selection._search.GridSearchCV'>, 'In': ['', 'from class_utils import *\nimport pickle\nimport n...l_selection import GridSearchCV, train_test_split', '# globals\niteration="iter3a"\n#model_filename = "models/best_svc_{}.pickle".format(iteration)', "# parse data from files\nclasses = ['NEG', 'POS']...ocs, bin_targets, test_size=0.10, random_state=0)", 'from class_utils import *\nimport pickle\nimport n...l_selection import GridSearchCV, train_test_split', '# globals\niteration="iter3a"\n#model_filename = "models/best_svc_{}.pickle".format(iteration)', "# parse data from files\nclasses = ['NEG', 'POS']...ocs, bin_targets, test_size=0.10, random_state=0)", "svc_pl = Pipeline([\n    ('vect', CountVectorizer... verbose=1, scoring=scoring, refit='fscore_prec')", '# Here we do the actual training\n# Can take seve...eters dict above\nsvc_search.fit(X_train, y_train)', 'from class_utils import *\nimport pickle\nimport n...l_selection import GridSearchCV, train_test_split', '# globals\niteration="iter3a"\n#model_filename = "models/best_svc_{}.pickle".format(iteration)', "# parse data from files\nclasses = ['NEG', 'POS']...ocs, bin_targets, test_size=0.10, random_state=0)", "svc_pl = Pipeline([\n    ('vect', CountVectorizer... verbose=1, scoring=scoring, refit='fscore_prec')", '# Here we do the actual training\n# Can take seve...eters dict above\nsvc_search.fit(X_train, y_train)'], 'LabelBinarizer': <class 'sklearn.preprocessing.label.LabelBinarizer'>, 'Out': {}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'SVC': <class 'sklearn.svm.classes.SVC'>, 'TfidfTransformer': <class 'sklearn.feature_extraction.text.TfidfTransformer'>, 'X_test': ["Parents: Have ?s about the #PARCC test we're tak....parcconline.org/frequently-asked-questions�__�_\n", "Is it true that #hawaii parents can't opt out of...er Balanced Assessments? @HIDOE808 #OptOut #SpEd\n", '@LamoureuxRob @KarenMageeNYSUT @sarbetter That i...y. We need 2 make #RefuseTheTest NYSUT statement\n', 'Proud of your hard work on the #PARCC today. Two...gies watch the time & show them what you can do!\n', '@IdahoCore #StopCommonCore #OptOut #CommonCore s...has a bipartisan push to destroy it. #Washington\n', '0.1% (27 students) of #jeffparish students opt o...parish_families_o.html#incart_story_package�__�_\n', "#NJ's #communitycolleges strongly back #PARCC sc...parcc-tests-for-student-placement-1.1294635�__�_\n", '@NYSAPE Before I start supporting #optout I want... want to be respectful 2 supportive admin thanks\n', 'Good luck to @eagleacademypcs students this week on the #PARCC. #dccharter students rock!\n', '@LPBroward #broward #optout join us in #optout #...ps://www.facebook.com/groups/OptOutBroward/�__�_\n', 'A different way to evaluate teachers http://goo....tout #calloutcuomo #allkidsneed #respectteachers\n', '#NYSUT says parents have right to opt students o...dents-out-of-state-testing-20150315�__�_ #optout\n', '@FollowCSA @UFT Parent Opt Out from NYS ELA & Ma...vement_gain.html#incart_river_mobile�__�_ @nysut\n', 'Tisch: #optout is a _��terrible mistake._ѝ http:...Ub3aEd�__�_ #ThingsMerrylSays #protectourschools\n', 'Proud of the @FHSClass_2018 for facing day 1 of ...positive attitude we expected of you. #ClassActs\n', 'Math Consultant: Smarter Balanced Math Tests Hav..._ via @educationweek #edchat #cpchat #CCSS #SBAC\n', 'RT @MassStand: New @teachplusMA report shows 72%...is a better test than MCAS http://bit.ly/1Iq73ah\n', 'Tomorrow our 6th graders begin #Parcc testing. G...t a healthy breakfast and get to school on time!\n', 'As Common Core Testing Is Ushered In Parents and Students Opt Out http://nyti.ms/1zutBQo\n', "Alright boys and girls let's #PARCC\n", ...], 'X_train': ['@NorwichBulletin compares #SBAC to CEA\'s testing... delays education reform" http://shar.es/1ggVAN"\n', '@OHEducation Those poor kids. Adm you should be ...gwash disguised as assessment. Parents~ #OptOut!\n', 'High Stakes Testing is a waste if time and money...iswatching @NYGovCuomopic.twitter.com/QoxhLdYpCl\n', 'My students are braver and stronger and wiser th...nt will be able to capture. #SBAC Day 1 is over.\n', '#CO parent: Don_��t _��opt out_�� of transparenc...://bit.ly/1boSZ5L�_ via @edu_post #PARCC #edcolo\n', "So my third grader won't get to see her #PARCC e... Maybe next year we will suggest a #PARCCWALKOUT\n", '#BastaYa! Hay unas preguntas que no tienen logic...youtube.com/watch?v=94LQQz18MbI&app=desktop�__�_\n', '@15Warrenton Fantastic! No punitive sit and stare! #optout #morethanascore #whyIrefuse\n', "#PARCC victimhood and why it's overblown. @starledger editorial: http://bit.ly/1FtPB6f #nj\n", 'Teacher: #PARCC devours resources and valuable l..._resources_and_valuable_l.html#incart_river�__�_\n', "@CanibeYOUNG @Stoptesting15 That's why we all need to #OptOut\n", 'NM @Education_Stuff: CURMUDGUCATION: PA: All Abo...�__�__ѝ #poverty #nm #OptOut #CommonCore #PARCC"\n', "@pearson Teachers & parents will unite against y...kes tests don't negate freedom of speech. #PARCC\n", 'Parents want that free public edu 4 their kids b...t Out #CommonCore testing http://nyti.ms/1wDojSo\n', 'Why parents should support #PARCC and #SBAC @Hop...ould-welcome-the-new-common-core-tests.html�__�_\n', 'Pride in your child or student should never be d...optout #morethanascorepic.twitter.com/GmP5Huyu1n\n', 'Pearson has not actually released any validity s...es on #PARCC" - Cassie Cresswell #TestingSeason"\n', 'Go NewJersey! @NJOptOut: First NJ Refuse PARCC b...__ѝ NewMexico #OptOut #CommonCore #PARCC #nmpol"\n', 'We are officially finished with the #PARCC Perfo...Thanks to everyone for your support. #PohatPride\n', 'hey @ISBEnews no provision for lots of things in...ur right as parents & students! @ILRaiseYourHand\n', ...], ...}
        self.user_ns = {'CountVectorizer': <class 'sklearn.feature_extraction.text.CountVectorizer'>, 'GridSearchCV': <class 'sklearn.model_selection._search.GridSearchCV'>, 'In': ['', 'from class_utils import *\nimport pickle\nimport n...l_selection import GridSearchCV, train_test_split', '# globals\niteration="iter3a"\n#model_filename = "models/best_svc_{}.pickle".format(iteration)', "# parse data from files\nclasses = ['NEG', 'POS']...ocs, bin_targets, test_size=0.10, random_state=0)", 'from class_utils import *\nimport pickle\nimport n...l_selection import GridSearchCV, train_test_split', '# globals\niteration="iter3a"\n#model_filename = "models/best_svc_{}.pickle".format(iteration)', "# parse data from files\nclasses = ['NEG', 'POS']...ocs, bin_targets, test_size=0.10, random_state=0)", "svc_pl = Pipeline([\n    ('vect', CountVectorizer... verbose=1, scoring=scoring, refit='fscore_prec')", '# Here we do the actual training\n# Can take seve...eters dict above\nsvc_search.fit(X_train, y_train)', 'from class_utils import *\nimport pickle\nimport n...l_selection import GridSearchCV, train_test_split', '# globals\niteration="iter3a"\n#model_filename = "models/best_svc_{}.pickle".format(iteration)', "# parse data from files\nclasses = ['NEG', 'POS']...ocs, bin_targets, test_size=0.10, random_state=0)", "svc_pl = Pipeline([\n    ('vect', CountVectorizer... verbose=1, scoring=scoring, refit='fscore_prec')", '# Here we do the actual training\n# Can take seve...eters dict above\nsvc_search.fit(X_train, y_train)'], 'LabelBinarizer': <class 'sklearn.preprocessing.label.LabelBinarizer'>, 'Out': {}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'SVC': <class 'sklearn.svm.classes.SVC'>, 'TfidfTransformer': <class 'sklearn.feature_extraction.text.TfidfTransformer'>, 'X_test': ["Parents: Have ?s about the #PARCC test we're tak....parcconline.org/frequently-asked-questions�__�_\n", "Is it true that #hawaii parents can't opt out of...er Balanced Assessments? @HIDOE808 #OptOut #SpEd\n", '@LamoureuxRob @KarenMageeNYSUT @sarbetter That i...y. We need 2 make #RefuseTheTest NYSUT statement\n', 'Proud of your hard work on the #PARCC today. Two...gies watch the time & show them what you can do!\n', '@IdahoCore #StopCommonCore #OptOut #CommonCore s...has a bipartisan push to destroy it. #Washington\n', '0.1% (27 students) of #jeffparish students opt o...parish_families_o.html#incart_story_package�__�_\n', "#NJ's #communitycolleges strongly back #PARCC sc...parcc-tests-for-student-placement-1.1294635�__�_\n", '@NYSAPE Before I start supporting #optout I want... want to be respectful 2 supportive admin thanks\n', 'Good luck to @eagleacademypcs students this week on the #PARCC. #dccharter students rock!\n', '@LPBroward #broward #optout join us in #optout #...ps://www.facebook.com/groups/OptOutBroward/�__�_\n', 'A different way to evaluate teachers http://goo....tout #calloutcuomo #allkidsneed #respectteachers\n', '#NYSUT says parents have right to opt students o...dents-out-of-state-testing-20150315�__�_ #optout\n', '@FollowCSA @UFT Parent Opt Out from NYS ELA & Ma...vement_gain.html#incart_river_mobile�__�_ @nysut\n', 'Tisch: #optout is a _��terrible mistake._ѝ http:...Ub3aEd�__�_ #ThingsMerrylSays #protectourschools\n', 'Proud of the @FHSClass_2018 for facing day 1 of ...positive attitude we expected of you. #ClassActs\n', 'Math Consultant: Smarter Balanced Math Tests Hav..._ via @educationweek #edchat #cpchat #CCSS #SBAC\n', 'RT @MassStand: New @teachplusMA report shows 72%...is a better test than MCAS http://bit.ly/1Iq73ah\n', 'Tomorrow our 6th graders begin #Parcc testing. G...t a healthy breakfast and get to school on time!\n', 'As Common Core Testing Is Ushered In Parents and Students Opt Out http://nyti.ms/1zutBQo\n', "Alright boys and girls let's #PARCC\n", ...], 'X_train': ['@NorwichBulletin compares #SBAC to CEA\'s testing... delays education reform" http://shar.es/1ggVAN"\n', '@OHEducation Those poor kids. Adm you should be ...gwash disguised as assessment. Parents~ #OptOut!\n', 'High Stakes Testing is a waste if time and money...iswatching @NYGovCuomopic.twitter.com/QoxhLdYpCl\n', 'My students are braver and stronger and wiser th...nt will be able to capture. #SBAC Day 1 is over.\n', '#CO parent: Don_��t _��opt out_�� of transparenc...://bit.ly/1boSZ5L�_ via @edu_post #PARCC #edcolo\n', "So my third grader won't get to see her #PARCC e... Maybe next year we will suggest a #PARCCWALKOUT\n", '#BastaYa! Hay unas preguntas que no tienen logic...youtube.com/watch?v=94LQQz18MbI&app=desktop�__�_\n', '@15Warrenton Fantastic! No punitive sit and stare! #optout #morethanascore #whyIrefuse\n', "#PARCC victimhood and why it's overblown. @starledger editorial: http://bit.ly/1FtPB6f #nj\n", 'Teacher: #PARCC devours resources and valuable l..._resources_and_valuable_l.html#incart_river�__�_\n', "@CanibeYOUNG @Stoptesting15 That's why we all need to #OptOut\n", 'NM @Education_Stuff: CURMUDGUCATION: PA: All Abo...�__�__ѝ #poverty #nm #OptOut #CommonCore #PARCC"\n', "@pearson Teachers & parents will unite against y...kes tests don't negate freedom of speech. #PARCC\n", 'Parents want that free public edu 4 their kids b...t Out #CommonCore testing http://nyti.ms/1wDojSo\n', 'Why parents should support #PARCC and #SBAC @Hop...ould-welcome-the-new-common-core-tests.html�__�_\n', 'Pride in your child or student should never be d...optout #morethanascorepic.twitter.com/GmP5Huyu1n\n', 'Pearson has not actually released any validity s...es on #PARCC" - Cassie Cresswell #TestingSeason"\n', 'Go NewJersey! @NJOptOut: First NJ Refuse PARCC b...__ѝ NewMexico #OptOut #CommonCore #PARCC #nmpol"\n', 'We are officially finished with the #PARCC Perfo...Thanks to everyone for your support. #PohatPride\n', 'hey @ISBEnews no provision for lots of things in...ur right as parents & students! @ILRaiseYourHand\n', ...], ...}
   2863             finally:
   2864                 # Reset our crash handler in place
   2865                 sys.excepthook = old_excepthook
   2866         except SystemExit as e:

...........................................................................
/Users/amyburkhardt/Documents/NLP/Code/Classifiers-for-qual/classifier/<ipython-input-13-e89dda0d6910> in <module>()
      1 # Here we do the actual training
      2 # Can take several minutes depending on the range of parameters given
      3 # int he parameters dict above
----> 4 svc_search.fit(X_train, y_train)

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/sklearn/model_selection/_search.py in fit(self=GridSearchCV(cv=None, error_score='raise',
     ...racy': 'accuracy', 'f1': 'f1'},
       verbose=1), X=['@NorwichBulletin compares #SBAC to CEA\'s testing... delays education reform" http://shar.es/1ggVAN"\n', '@OHEducation Those poor kids. Adm you should be ...gwash disguised as assessment. Parents~ #OptOut!\n', 'High Stakes Testing is a waste if time and money...iswatching @NYGovCuomopic.twitter.com/QoxhLdYpCl\n', 'My students are braver and stronger and wiser th...nt will be able to capture. #SBAC Day 1 is over.\n', '#CO parent: Don_��t _��opt out_�� of transparenc...://bit.ly/1boSZ5L�_ via @edu_post #PARCC #edcolo\n', "So my third grader won't get to see her #PARCC e... Maybe next year we will suggest a #PARCCWALKOUT\n", '#BastaYa! Hay unas preguntas que no tienen logic...youtube.com/watch?v=94LQQz18MbI&app=desktop�__�_\n', '@15Warrenton Fantastic! No punitive sit and stare! #optout #morethanascore #whyIrefuse\n', "#PARCC victimhood and why it's overblown. @starledger editorial: http://bit.ly/1FtPB6f #nj\n", 'Teacher: #PARCC devours resources and valuable l..._resources_and_valuable_l.html#incart_river�__�_\n', "@CanibeYOUNG @Stoptesting15 That's why we all need to #OptOut\n", 'NM @Education_Stuff: CURMUDGUCATION: PA: All Abo...�__�__ѝ #poverty #nm #OptOut #CommonCore #PARCC"\n', "@pearson Teachers & parents will unite against y...kes tests don't negate freedom of speech. #PARCC\n", 'Parents want that free public edu 4 their kids b...t Out #CommonCore testing http://nyti.ms/1wDojSo\n', 'Why parents should support #PARCC and #SBAC @Hop...ould-welcome-the-new-common-core-tests.html�__�_\n', 'Pride in your child or student should never be d...optout #morethanascorepic.twitter.com/GmP5Huyu1n\n', 'Pearson has not actually released any validity s...es on #PARCC" - Cassie Cresswell #TestingSeason"\n', 'Go NewJersey! @NJOptOut: First NJ Refuse PARCC b...__ѝ NewMexico #OptOut #CommonCore #PARCC #nmpol"\n', 'We are officially finished with the #PARCC Perfo...Thanks to everyone for your support. #PohatPride\n', 'hey @ISBEnews no provision for lots of things in...ur right as parents & students! @ILRaiseYourHand\n', ...], y=array([1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1,...1, 1, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 0, 1, 1]), groups=None, **fit_params={})
    634                                   return_train_score=self.return_train_score,
    635                                   return_n_test_samples=True,
    636                                   return_times=True, return_parameters=False,
    637                                   error_score=self.error_score)
    638           for parameters, (train, test) in product(candidate_params,
--> 639                                                    cv.split(X, y, groups)))
        cv.split = <bound method StratifiedKFold.split of Stratifie...ld(n_splits=3, random_state=None, shuffle=False)>
        X = ['@NorwichBulletin compares #SBAC to CEA\'s testing... delays education reform" http://shar.es/1ggVAN"\n', '@OHEducation Those poor kids. Adm you should be ...gwash disguised as assessment. Parents~ #OptOut!\n', 'High Stakes Testing is a waste if time and money...iswatching @NYGovCuomopic.twitter.com/QoxhLdYpCl\n', 'My students are braver and stronger and wiser th...nt will be able to capture. #SBAC Day 1 is over.\n', '#CO parent: Don_��t _��opt out_�� of transparenc...://bit.ly/1boSZ5L�_ via @edu_post #PARCC #edcolo\n', "So my third grader won't get to see her #PARCC e... Maybe next year we will suggest a #PARCCWALKOUT\n", '#BastaYa! Hay unas preguntas que no tienen logic...youtube.com/watch?v=94LQQz18MbI&app=desktop�__�_\n', '@15Warrenton Fantastic! No punitive sit and stare! #optout #morethanascore #whyIrefuse\n', "#PARCC victimhood and why it's overblown. @starledger editorial: http://bit.ly/1FtPB6f #nj\n", 'Teacher: #PARCC devours resources and valuable l..._resources_and_valuable_l.html#incart_river�__�_\n', "@CanibeYOUNG @Stoptesting15 That's why we all need to #OptOut\n", 'NM @Education_Stuff: CURMUDGUCATION: PA: All Abo...�__�__ѝ #poverty #nm #OptOut #CommonCore #PARCC"\n', "@pearson Teachers & parents will unite against y...kes tests don't negate freedom of speech. #PARCC\n", 'Parents want that free public edu 4 their kids b...t Out #CommonCore testing http://nyti.ms/1wDojSo\n', 'Why parents should support #PARCC and #SBAC @Hop...ould-welcome-the-new-common-core-tests.html�__�_\n', 'Pride in your child or student should never be d...optout #morethanascorepic.twitter.com/GmP5Huyu1n\n', 'Pearson has not actually released any validity s...es on #PARCC" - Cassie Cresswell #TestingSeason"\n', 'Go NewJersey! @NJOptOut: First NJ Refuse PARCC b...__ѝ NewMexico #OptOut #CommonCore #PARCC #nmpol"\n', 'We are officially finished with the #PARCC Perfo...Thanks to everyone for your support. #PohatPride\n', 'hey @ISBEnews no provision for lots of things in...ur right as parents & students! @ILRaiseYourHand\n', ...]
        y = array([1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1,...1, 1, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 0, 1, 1])
        groups = None
    640 
    641         # if one choose to see train score, "out" will contain train score info
    642         if self.return_train_score:
    643             (train_score_dicts, test_score_dicts, test_sample_counts, fit_time,

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object BaseSearchCV.fit.<locals>.<genexpr>>)
    784             if pre_dispatch == "all" or n_jobs == 1:
    785                 # The iterable was consumed all at once by the above for loop.
    786                 # No need to wait for async callbacks to trigger to
    787                 # consumption.
    788                 self._iterating = False
--> 789             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=-1)>
    790             # Make sure that we get a last message telling us we are done
    791             elapsed_time = time.time() - self._start_time
    792             self._print('Done %3i out of %3i | elapsed: %s finished',
    793                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Thu Mar 14 16:11:32 2019
PID: 33931 Python 3.5.2: /Users/amyburkhardt/Documents/NLP/venv/bin/python3
...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function _fit_and_score>, (Pipeline(memory=None,
     steps=[('vect', Count...e, shrinking=True,
  tol=0.001, verbose=False))]), ['@NorwichBulletin compares #SBAC to CEA\'s testing... delays education reform" http://shar.es/1ggVAN"\n', '@OHEducation Those poor kids. Adm you should be ...gwash disguised as assessment. Parents~ #OptOut!\n', 'High Stakes Testing is a waste if time and money...iswatching @NYGovCuomopic.twitter.com/QoxhLdYpCl\n', 'My students are braver and stronger and wiser th...nt will be able to capture. #SBAC Day 1 is over.\n', '#CO parent: Don_��t _��opt out_�� of transparenc...://bit.ly/1boSZ5L�_ via @edu_post #PARCC #edcolo\n', "So my third grader won't get to see her #PARCC e... Maybe next year we will suggest a #PARCCWALKOUT\n", '#BastaYa! Hay unas preguntas que no tienen logic...youtube.com/watch?v=94LQQz18MbI&app=desktop�__�_\n', '@15Warrenton Fantastic! No punitive sit and stare! #optout #morethanascore #whyIrefuse\n', "#PARCC victimhood and why it's overblown. @starledger editorial: http://bit.ly/1FtPB6f #nj\n", 'Teacher: #PARCC devours resources and valuable l..._resources_and_valuable_l.html#incart_river�__�_\n', "@CanibeYOUNG @Stoptesting15 That's why we all need to #OptOut\n", 'NM @Education_Stuff: CURMUDGUCATION: PA: All Abo...�__�__ѝ #poverty #nm #OptOut #CommonCore #PARCC"\n', "@pearson Teachers & parents will unite against y...kes tests don't negate freedom of speech. #PARCC\n", 'Parents want that free public edu 4 their kids b...t Out #CommonCore testing http://nyti.ms/1wDojSo\n', 'Why parents should support #PARCC and #SBAC @Hop...ould-welcome-the-new-common-core-tests.html�__�_\n', 'Pride in your child or student should never be d...optout #morethanascorepic.twitter.com/GmP5Huyu1n\n', 'Pearson has not actually released any validity s...es on #PARCC" - Cassie Cresswell #TestingSeason"\n', 'Go NewJersey! @NJOptOut: First NJ Refuse PARCC b...__ѝ NewMexico #OptOut #CommonCore #PARCC #nmpol"\n', 'We are officially finished with the #PARCC Perfo...Thanks to everyone for your support. #PohatPride\n', 'hey @ISBEnews no provision for lots of things in...ur right as parents & students! @ILRaiseYourHand\n', ...], array([1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1,...1, 1, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 0, 1, 1]), {'accuracy': make_scorer(accuracy_score), 'f1': make_scorer(f1_score), 'fscore_prec': make_scorer(fbeta_score, beta=0), 'precision': make_scorer(precision_score), 'recall': make_scorer(recall_score)}, array([198, 202, 204, 205, 206, 207, 208, 209, 2...94, 595, 596, 597, 598, 599, 600, 601, 602, 603]), array([  0,   1,   2,   3,   4,   5,   6,   7,  ..., 194,
       195, 196, 197, 199, 200, 201, 203]), 1, {'tfidf__use_idf': (True, False), 'vect__max_df': 0.29999999999999999, 'vect__ngram_range': (1, 1), 'vect__preprocessor': <function normalize_tweet>, 'vect__stopwords': 'english'}), {'error_score': 'raise', 'fit_params': {}, 'return_n_test_samples': True, 'return_parameters': False, 'return_times': True, 'return_train_score': 'warn'})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _fit_and_score>
        args = (Pipeline(memory=None,
     steps=[('vect', Count...e, shrinking=True,
  tol=0.001, verbose=False))]), ['@NorwichBulletin compares #SBAC to CEA\'s testing... delays education reform" http://shar.es/1ggVAN"\n', '@OHEducation Those poor kids. Adm you should be ...gwash disguised as assessment. Parents~ #OptOut!\n', 'High Stakes Testing is a waste if time and money...iswatching @NYGovCuomopic.twitter.com/QoxhLdYpCl\n', 'My students are braver and stronger and wiser th...nt will be able to capture. #SBAC Day 1 is over.\n', '#CO parent: Don_��t _��opt out_�� of transparenc...://bit.ly/1boSZ5L�_ via @edu_post #PARCC #edcolo\n', "So my third grader won't get to see her #PARCC e... Maybe next year we will suggest a #PARCCWALKOUT\n", '#BastaYa! Hay unas preguntas que no tienen logic...youtube.com/watch?v=94LQQz18MbI&app=desktop�__�_\n', '@15Warrenton Fantastic! No punitive sit and stare! #optout #morethanascore #whyIrefuse\n', "#PARCC victimhood and why it's overblown. @starledger editorial: http://bit.ly/1FtPB6f #nj\n", 'Teacher: #PARCC devours resources and valuable l..._resources_and_valuable_l.html#incart_river�__�_\n', "@CanibeYOUNG @Stoptesting15 That's why we all need to #OptOut\n", 'NM @Education_Stuff: CURMUDGUCATION: PA: All Abo...�__�__ѝ #poverty #nm #OptOut #CommonCore #PARCC"\n', "@pearson Teachers & parents will unite against y...kes tests don't negate freedom of speech. #PARCC\n", 'Parents want that free public edu 4 their kids b...t Out #CommonCore testing http://nyti.ms/1wDojSo\n', 'Why parents should support #PARCC and #SBAC @Hop...ould-welcome-the-new-common-core-tests.html�__�_\n', 'Pride in your child or student should never be d...optout #morethanascorepic.twitter.com/GmP5Huyu1n\n', 'Pearson has not actually released any validity s...es on #PARCC" - Cassie Cresswell #TestingSeason"\n', 'Go NewJersey! @NJOptOut: First NJ Refuse PARCC b...__ѝ NewMexico #OptOut #CommonCore #PARCC #nmpol"\n', 'We are officially finished with the #PARCC Perfo...Thanks to everyone for your support. #PohatPride\n', 'hey @ISBEnews no provision for lots of things in...ur right as parents & students! @ILRaiseYourHand\n', ...], array([1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1,...1, 1, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 0, 1, 1]), {'accuracy': make_scorer(accuracy_score), 'f1': make_scorer(f1_score), 'fscore_prec': make_scorer(fbeta_score, beta=0), 'precision': make_scorer(precision_score), 'recall': make_scorer(recall_score)}, array([198, 202, 204, 205, 206, 207, 208, 209, 2...94, 595, 596, 597, 598, 599, 600, 601, 602, 603]), array([  0,   1,   2,   3,   4,   5,   6,   7,  ..., 194,
       195, 196, 197, 199, 200, 201, 203]), 1, {'tfidf__use_idf': (True, False), 'vect__max_df': 0.29999999999999999, 'vect__ngram_range': (1, 1), 'vect__preprocessor': <function normalize_tweet>, 'vect__stopwords': 'english'})
        kwargs = {'error_score': 'raise', 'fit_params': {}, 'return_n_test_samples': True, 'return_parameters': False, 'return_times': True, 'return_train_score': 'warn'}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator=Pipeline(memory=None,
     steps=[('vect', Count...e, shrinking=True,
  tol=0.001, verbose=False))]), X=['@NorwichBulletin compares #SBAC to CEA\'s testing... delays education reform" http://shar.es/1ggVAN"\n', '@OHEducation Those poor kids. Adm you should be ...gwash disguised as assessment. Parents~ #OptOut!\n', 'High Stakes Testing is a waste if time and money...iswatching @NYGovCuomopic.twitter.com/QoxhLdYpCl\n', 'My students are braver and stronger and wiser th...nt will be able to capture. #SBAC Day 1 is over.\n', '#CO parent: Don_��t _��opt out_�� of transparenc...://bit.ly/1boSZ5L�_ via @edu_post #PARCC #edcolo\n', "So my third grader won't get to see her #PARCC e... Maybe next year we will suggest a #PARCCWALKOUT\n", '#BastaYa! Hay unas preguntas que no tienen logic...youtube.com/watch?v=94LQQz18MbI&app=desktop�__�_\n', '@15Warrenton Fantastic! No punitive sit and stare! #optout #morethanascore #whyIrefuse\n', "#PARCC victimhood and why it's overblown. @starledger editorial: http://bit.ly/1FtPB6f #nj\n", 'Teacher: #PARCC devours resources and valuable l..._resources_and_valuable_l.html#incart_river�__�_\n', "@CanibeYOUNG @Stoptesting15 That's why we all need to #OptOut\n", 'NM @Education_Stuff: CURMUDGUCATION: PA: All Abo...�__�__ѝ #poverty #nm #OptOut #CommonCore #PARCC"\n', "@pearson Teachers & parents will unite against y...kes tests don't negate freedom of speech. #PARCC\n", 'Parents want that free public edu 4 their kids b...t Out #CommonCore testing http://nyti.ms/1wDojSo\n', 'Why parents should support #PARCC and #SBAC @Hop...ould-welcome-the-new-common-core-tests.html�__�_\n', 'Pride in your child or student should never be d...optout #morethanascorepic.twitter.com/GmP5Huyu1n\n', 'Pearson has not actually released any validity s...es on #PARCC" - Cassie Cresswell #TestingSeason"\n', 'Go NewJersey! @NJOptOut: First NJ Refuse PARCC b...__ѝ NewMexico #OptOut #CommonCore #PARCC #nmpol"\n', 'We are officially finished with the #PARCC Perfo...Thanks to everyone for your support. #PohatPride\n', 'hey @ISBEnews no provision for lots of things in...ur right as parents & students! @ILRaiseYourHand\n', ...], y=array([1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1,...1, 1, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 0, 1, 1]), scorer={'accuracy': make_scorer(accuracy_score), 'f1': make_scorer(f1_score), 'fscore_prec': make_scorer(fbeta_score, beta=0), 'precision': make_scorer(precision_score), 'recall': make_scorer(recall_score)}, train=array([198, 202, 204, 205, 206, 207, 208, 209, 2...94, 595, 596, 597, 598, 599, 600, 601, 602, 603]), test=array([  0,   1,   2,   3,   4,   5,   6,   7,  ..., 194,
       195, 196, 197, 199, 200, 201, 203]), verbose=1, parameters={'tfidf__use_idf': (True, False), 'vect__max_df': 0.29999999999999999, 'vect__ngram_range': (1, 1), 'vect__preprocessor': <function normalize_tweet>, 'vect__stopwords': 'english'}, fit_params={}, return_train_score='warn', return_parameters=False, return_n_test_samples=True, return_times=True, error_score='raise')
    439                       for k, v in fit_params.items()])
    440 
    441     test_scores = {}
    442     train_scores = {}
    443     if parameters is not None:
--> 444         estimator.set_params(**parameters)
        estimator.set_params = <bound method Pipeline.set_params of Pipeline(me..., shrinking=True,
  tol=0.001, verbose=False))])>
        parameters = {'tfidf__use_idf': (True, False), 'vect__max_df': 0.29999999999999999, 'vect__ngram_range': (1, 1), 'vect__preprocessor': <function normalize_tweet>, 'vect__stopwords': 'english'}
    445 
    446     start_time = time.time()
    447 
    448     X_train, y_train = _safe_split(estimator, X, y, train)

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/sklearn/pipeline.py in set_params(self=Pipeline(memory=None,
     steps=[('vect', Count...e, shrinking=True,
  tol=0.001, verbose=False))]), **kwargs={'tfidf__use_idf': (True, False), 'vect__max_df': 0.29999999999999999, 'vect__ngram_range': (1, 1), 'vect__preprocessor': <function normalize_tweet>, 'vect__stopwords': 'english'})
    137 
    138         Returns
    139         -------
    140         self
    141         """
--> 142         self._set_params('steps', **kwargs)
        self._set_params = <bound method _BaseComposition._set_params of Pi..., shrinking=True,
  tol=0.001, verbose=False))])>
        kwargs = {'tfidf__use_idf': (True, False), 'vect__max_df': 0.29999999999999999, 'vect__ngram_range': (1, 1), 'vect__preprocessor': <function normalize_tweet>, 'vect__stopwords': 'english'}
    143         return self
    144 
    145     def _validate_steps(self):
    146         names, estimators = zip(*self.steps)

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/sklearn/utils/metaestimators.py in _set_params(self=Pipeline(memory=None,
     steps=[('vect', Count...e, shrinking=True,
  tol=0.001, verbose=False))]), attr='steps', **params={'tfidf__use_idf': (True, False), 'vect__max_df': 0.29999999999999999, 'vect__ngram_range': (1, 1), 'vect__preprocessor': <function normalize_tweet>, 'vect__stopwords': 'english'})
     44         names, _ = zip(*getattr(self, attr))
     45         for name in list(six.iterkeys(params)):
     46             if '__' not in name and name in names:
     47                 self._replace_estimator(attr, name, params.pop(name))
     48         # 3. Step parameters and other initilisation arguments
---> 49         super(_BaseComposition, self).set_params(**params)
        self.set_params = <bound method Pipeline.set_params of Pipeline(me..., shrinking=True,
  tol=0.001, verbose=False))])>
        params = {'tfidf__use_idf': (True, False), 'vect__max_df': 0.29999999999999999, 'vect__ngram_range': (1, 1), 'vect__preprocessor': <function normalize_tweet>, 'vect__stopwords': 'english'}
     50         return self
     51 
     52     def _replace_estimator(self, attr, name, new_val):
     53         # assumes `name` is a valid estimator name

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/sklearn/base.py in set_params(self=Pipeline(memory=None,
     steps=[('vect', Count...e, shrinking=True,
  tol=0.001, verbose=False))]), **params={'tfidf__use_idf': (True, False), 'vect__max_df': 0.29999999999999999, 'vect__ngram_range': (1, 1), 'vect__preprocessor': <function normalize_tweet>, 'vect__stopwords': 'english'})
    277                 nested_params[key][sub_key] = value
    278             else:
    279                 setattr(self, key, value)
    280 
    281         for key, sub_params in nested_params.items():
--> 282             valid_params[key].set_params(**sub_params)
        valid_params = {'clf': SVC(C=1.0, cache_size=200, class_weight=None, co...None, shrinking=True,
  tol=0.001, verbose=False), 'clf__C': 1.0, 'clf__cache_size': 200, 'clf__class_weight': None, 'clf__coef0': 0.0, 'clf__decision_function_shape': 'ovr', 'clf__degree': 3, 'clf__gamma': 'auto', 'clf__kernel': 'linear', 'clf__max_iter': -1, ...}
        key.set_params = undefined
        sub_params = {'max_df': 0.29999999999999999, 'ngram_range': (1, 1), 'preprocessor': <function normalize_tweet>, 'stopwords': 'english'}
    283 
    284         return self
    285 
    286     def __repr__(self):

...........................................................................
/Users/amyburkhardt/Documents/NLP/venv/lib/python3.5/site-packages/sklearn/base.py in set_params(self=CountVectorizer(analyzer='word', binary=False, d...\w+\\b',
        tokenizer=None, vocabulary=None), **params={'max_df': 0.29999999999999999, 'ngram_range': (1, 1), 'preprocessor': <function normalize_tweet>, 'stopwords': 'english'})
    269             key, delim, sub_key = key.partition('__')
    270             if key not in valid_params:
    271                 raise ValueError('Invalid parameter %s for estimator %s. '
    272                                  'Check the list of available parameters '
    273                                  'with `estimator.get_params().keys()`.' %
--> 274                                  (key, self))
        key = 'stopwords'
        self = CountVectorizer(analyzer='word', binary=False, d...\w+\\b',
        tokenizer=None, vocabulary=None)
    275 
    276             if delim:
    277                 nested_params[key][sub_key] = value
    278             else:

ValueError: Invalid parameter stopwords for estimator CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip_accents=None, token_pattern='(?u)\\b\\w\\w+\\b',
        tokenizer=None, vocabulary=None). Check the list of available parameters with `estimator.get_params().keys()`.
___________________________________________________________________________

In [None]:
# The parameters selected by the grid search
svc_search.best_params_

In [None]:
# print the average scores over the k training folds
fields = ['accuracy', 'precision', 'recall', 'f1', 'fscore_prec']

for f in fields:
    score = svc_search.cv_results_["mean_test_%s" % f][svc_search.best_index_]
    print("%s: %.3f" % (f, score))

## Results

We check how it works by running the best classifier from the grid search on our held out set.

In [None]:
# Get best model from grid search we ran in previous section
best_model = svc_search.best_estimator_

In [None]:
# use model to predict held out set (X_test) and print score table
# Note that in binary classification, accuracy is the same as the
# [mico averaged recall reported in the table
predictions = best_model.predict(X_test)
print(classification_report(y_test, predictions, target_names=classes))

In [None]:
# Print confusion matrix
# http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
print(confusion_matrix(y_test, predictions))

## Persist model

Take our best model, retrain it on entire training dataset (including the held out set used for evaluation above), and persist it to disk.

In [None]:
# retrain on all data
best_model.fit(docs, bin_targets)

In [None]:
# save to disk
with open(model_filename, 'wb') as f:
    pickle.dump(best_model, f)