# **TF-IDF** (+ Linear SVC classification) - Training
This notebook loads the fitted TF-IDF vectorizer and the trained linear support vector classifier, and uses them to recreate the same predictions.
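
The load-and-predict workflow described above can be sketched as follows. This is a minimal, self-contained illustration that assumes the fitted objects are saved with `pickle`, which the notebook imports. The toy tweets, labels and file names are hypothetical and are not the notebook's actual data or paths. The sketch fits a vectorizer and a classifier, saves both, reloads them and checks that the reloaded pair predicts exactly the same labels.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the real training and test tweets
train_tweets = ["i love this", "what a great day", "i hate this", "what an awful day"]
train_labels = [1, 1, -1, -1]
new_tweets = ["love this day", "awful, i hate it"]

# Fit the vectorizer and the classifier, then predict once
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_tweets)
clf = LinearSVC(random_state=0).fit(X_train, train_labels)
original_pred = clf.predict(vectorizer.transform(new_tweets))

# Save both fitted objects (hypothetical file names in a temporary folder)
out_dir = tempfile.mkdtemp()
vec_path = os.path.join(out_dir, "tfidf.pkl")
clf_path = os.path.join(out_dir, "svc.pkl")
with open(vec_path, "wb") as f:
    pickle.dump(vectorizer, f)
with open(clf_path, "wb") as f:
    pickle.dump(clf, f)

# Later, e.g. in a fresh session: reload both objects and predict again
with open(vec_path, "rb") as f:
    loaded_vectorizer = pickle.load(f)
with open(clf_path, "rb") as f:
    loaded_clf = pickle.load(f)
reloaded_pred = loaded_clf.predict(loaded_vectorizer.transform(new_tweets))

print((original_pred == reloaded_pred).all())  # True
```

The reloaded objects carry the fitted vocabulary, IDF weights and SVM coefficients, so no retraining is needed to reproduce the predictions.
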


## Sources
This notebook uses the scikit-learn library both to compute the TF-IDF representation, with `TfidfVectorizer` (https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html), and to classify tweets, with `LinearSVC` (https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html).

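
With scikit-learn's default settings (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`), the weight of term `t` in document `d` is `tf(t, d) * idf(t)` with `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, where `n` is the number of documents and `df(t)` the number of documents containing `t`; each document vector is then scaled to unit L2 norm. The sketch below recomputes this by hand on a hypothetical three-document corpus and compares it with `TfidfVectorizer`. It only illustrates the defaults: the notebook's vectorizer uses the custom tokenizer `tk`, and its other settings are not shown in this section.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus
docs = ["good good movie", "bad movie", "good plot"]

# Reference result from scikit-learn with default settings
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs).toarray()

# Manual computation using the same vocabulary (same column order)
counts = CountVectorizer(vocabulary=tfidf.vocabulary_).fit_transform(docs).toarray()
n_docs = counts.shape[0]
df = (counts > 0).sum(axis=0)               # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + df)) + 1   # smoothed inverse document frequency
weighted = counts * idf                     # raw term counts times idf
manual = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)  # unit L2 rows

print(np.allclose(X, manual))  # True
```
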
## Reproducibility
After running this notebook, you will obtain the model used for Submission **#109900** on AIcrowd, which scored:

| Accuracy | F1 |
|:---:|:---:|
| 86.2% | 86.5% |
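
The two reported metrics can be computed with `sklearn.metrics`, as in the sketch below. It assumes binary sentiment labels encoded as `-1`/`1` with F1 measured on the positive class; both are assumptions, since this section does not state how AIcrowd defines them. The labels are hypothetical and only show that F1 can exceed accuracy, as it does in the table.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical true and predicted labels (assumed encoding: -1 negative, 1 positive)
y_true = [1, 1, 1, -1, -1, -1, 1, -1]
y_pred = [1, 1, 1, -1, 1, 1, 1, -1]

acc = accuracy_score(y_true, y_pred)           # fraction of correct predictions
f1 = f1_score(y_true, y_pred, pos_label=1)     # harmonic mean of precision and recall
print(f"Accuracy: {acc:.1%}  F1: {f1:.1%}")    # Accuracy: 75.0%  F1: 80.0%
```

Here all 4 positive tweets are found (recall 1) but 2 of the 6 positive predictions are wrong (precision 2/3), giving F1 = 0.8 while accuracy is 6/8 = 0.75.
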

### Import modules and download dataset

```python
import os
import pickle

import wget

import helpers

root = 'data/'
os.makedirs(root, exist_ok=True)

seed = 0

# Pickle stores functions by module and name, not by value, so the tokenizer
# `tk` (and the preprocessing helpers it calls) must be defined in this session
# before the fitted vectorizer can be unpickled and used.
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.tokenize import TweetTokenizer
from preprocessing import (process_sentence, to_vec, split_hashtag,
                           remove_repeats, remove_informal_contractions)

# Token-level preprocessing steps applied after tokenization
preproc_pipeline = [to_vec(split_hashtag),
                    to_vec(remove_repeats)]


def tk(sent):
    """Tokenize a tweet and apply the preprocessing pipeline.

    Parameters
    ----------
    sent : str
        A tweet.

    Returns
    -------
    tokens : list of str
        The tokenized, preprocessed tweet.
    """
    tokens = TweetTokenizer().tokenize(sent)
    tokens = process_sentence(tokens, preproc_pipeline)
    return tokens


# Download the test set (only if it is not already on disk)
test_url = 'https://api.onedrive.com/v1.0/shares/u!aHR0cHM6Ly8xZHJ2Lm1zL3QvcyFBclREZ3U5ejdJT1ZqcDR5Q3hoWXM4T2FJd1JLenc_ZT1hSXh0/root/content'
test_filename = os.path.join(root, 'test.txt')
if not os.path.exists(test_filename):
    wget.download(test_url, test_filename)

# Each line is "<id>,<tweet>"; tweets may contain commas, so split on the first one only
test_tweets = []
with open(test_filename, encoding='utf-8') as f:
    for line in f:
        _, _, tweet = line.partition(',')
        test_tweets.append(tweet)
```
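
The tokenizer is redefined before unpickling because `pickle` stores a plain function as a reference (its module and name) rather than its code. A vectorizer fitted with a custom tokenizer can therefore only be unpickled where that name resolves again. The sketch below uses a hypothetical `simple_tokenizer` as a stand-in for `tk` and shows the round trip succeeding while the function exists, then failing once its name is removed. It assumes the code runs as a script (module `__main__`), which is also where a notebook defines `tk`.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer


def simple_tokenizer(text):
    # Hypothetical stand-in for the notebook's `tk`: lowercase, split on whitespace
    return text.lower().split()


# token_pattern=None because the custom tokenizer replaces the regex tokenizer
vectorizer = TfidfVectorizer(tokenizer=simple_tokenizer, token_pattern=None)
vectorizer.fit(["good movie", "bad movie"])
blob = pickle.dumps(vectorizer)  # stores a reference to __main__.simple_tokenizer

# Round trip works while the function is defined under the same name
restored = pickle.loads(blob)
print(sorted(restored.vocabulary_))  # ['bad', 'good', 'movie']

# Remove the name from the module namespace: unpickling can no longer find it
del simple_tokenizer
failed = False
try:
    pickle.loads(blob)
except (AttributeError, pickle.UnpicklingError) as err:
    failed = True
    print("unpickling failed:", err)
```
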


 70% [..................................................                      ] 67960832 / 96050024 70% [..................................................                      ] 67969024 / 96050024 70% [..................................................                      ] 67977216 / 96050024 70% [..................................................                      ] 67985408 / 96050024 70% [..................................................                      ] 67993600 / 96050024 70% [..................................................                      ] 68001792 / 96050024 70% [..................................................                      ] 68009984 / 96050024 70% [..................................................                      ] 68018176 / 96050024 70% [..................................................                      ] 68026368 / 96050024 70% [..................................................                      ] 68034560 / 96050024

 72% [....................................................                    ] 69640192 / 96050024 72% [....................................................                    ] 69648384 / 96050024 72% [....................................................                    ] 69656576 / 96050024 72% [....................................................                    ] 69664768 / 96050024 72% [....................................................                    ] 69672960 / 96050024 72% [....................................................                    ] 69681152 / 96050024 72% [....................................................                    ] 69689344 / 96050024 72% [....................................................                    ] 69697536 / 96050024 72% [....................................................                    ] 69705728 / 96050024 72% [....................................................                    ] 69713920 / 96050024

 74% [.....................................................                   ] 71696384 / 96050024 74% [.....................................................                   ] 71704576 / 96050024 74% [.....................................................                   ] 71712768 / 96050024 74% [.....................................................                   ] 71720960 / 96050024 74% [.....................................................                   ] 71729152 / 96050024 74% [.....................................................                   ] 71737344 / 96050024 74% [.....................................................                   ] 71745536 / 96050024 74% [.....................................................                   ] 71753728 / 96050024 74% [.....................................................                   ] 71761920 / 96050024 74% [.....................................................                   ] 71770112 / 96050024

 76% [.......................................................                 ] 73695232 / 96050024 76% [.......................................................                 ] 73703424 / 96050024 76% [.......................................................                 ] 73711616 / 96050024 76% [.......................................................                 ] 73719808 / 96050024 76% [.......................................................                 ] 73728000 / 96050024 76% [.......................................................                 ] 73736192 / 96050024 76% [.......................................................                 ] 73744384 / 96050024 76% [.......................................................                 ] 73752576 / 96050024 76% [.......................................................                 ] 73760768 / 96050024 76% [.......................................................                 ] 73768960 / 96050024

 78% [........................................................                ] 75399168 / 96050024 78% [........................................................                ] 75407360 / 96050024 78% [........................................................                ] 75415552 / 96050024 78% [........................................................                ] 75423744 / 96050024 78% [........................................................                ] 75431936 / 96050024 78% [........................................................                ] 75440128 / 96050024 78% [........................................................                ] 75448320 / 96050024 78% [........................................................                ] 75456512 / 96050024 78% [........................................................                ] 75464704 / 96050024 78% [........................................................                ] 75472896 / 96050024

 80% [..........................................................              ] 77553664 / 96050024 80% [..........................................................              ] 77561856 / 96050024 80% [..........................................................              ] 77570048 / 96050024 80% [..........................................................              ] 77578240 / 96050024 80% [..........................................................              ] 77586432 / 96050024 80% [..........................................................              ] 77594624 / 96050024 80% [..........................................................              ] 77602816 / 96050024 80% [..........................................................              ] 77611008 / 96050024 80% [..........................................................              ] 77619200 / 96050024 80% [..........................................................              ] 77627392 / 96050024

 82% [...........................................................             ] 79650816 / 96050024 82% [...........................................................             ] 79659008 / 96050024 82% [...........................................................             ] 79667200 / 96050024 82% [...........................................................             ] 79675392 / 96050024 82% [...........................................................             ] 79683584 / 96050024 82% [...........................................................             ] 79691776 / 96050024 82% [...........................................................             ] 79699968 / 96050024 82% [...........................................................             ] 79708160 / 96050024 82% [...........................................................             ] 79716352 / 96050024 83% [...........................................................             ] 79724544 / 96050024

 84% [.............................................................           ] 81436672 / 96050024 84% [.............................................................           ] 81444864 / 96050024 84% [.............................................................           ] 81453056 / 96050024 84% [.............................................................           ] 81461248 / 96050024 84% [.............................................................           ] 81469440 / 96050024 84% [.............................................................           ] 81477632 / 96050024 84% [.............................................................           ] 81485824 / 96050024 84% [.............................................................           ] 81494016 / 96050024 84% [.............................................................           ] 81502208 / 96050024 84% [.............................................................           ] 81510400 / 96050024

 86% [..............................................................          ] 83296256 / 96050024 86% [..............................................................          ] 83304448 / 96050024 86% [..............................................................          ] 83312640 / 96050024 86% [..............................................................          ] 83320832 / 96050024 86% [..............................................................          ] 83329024 / 96050024 86% [..............................................................          ] 83337216 / 96050024 86% [..............................................................          ] 83345408 / 96050024 86% [..............................................................          ] 83353600 / 96050024 86% [..............................................................          ] 83361792 / 96050024 86% [..............................................................          ] 83369984 / 96050024

 88% [................................................................        ] 85385216 / 96050024 88% [................................................................        ] 85393408 / 96050024 88% [................................................................        ] 85401600 / 96050024 88% [................................................................        ] 85409792 / 96050024 88% [................................................................        ] 85417984 / 96050024 88% [................................................................        ] 85426176 / 96050024 88% [................................................................        ] 85434368 / 96050024 88% [................................................................        ] 85442560 / 96050024 88% [................................................................        ] 85450752 / 96050024 88% [................................................................        ] 85458944 / 96050024

 90% [.................................................................       ] 87318528 / 96050024 90% [.................................................................       ] 87326720 / 96050024 90% [.................................................................       ] 87334912 / 96050024 90% [.................................................................       ] 87343104 / 96050024 90% [.................................................................       ] 87351296 / 96050024 90% [.................................................................       ] 87359488 / 96050024 90% [.................................................................       ] 87367680 / 96050024 90% [.................................................................       ] 87375872 / 96050024 90% [.................................................................       ] 87384064 / 96050024 90% [.................................................................       ] 87392256 / 96050024

 93% [...................................................................     ] 89513984 / 96050024 93% [...................................................................     ] 89522176 / 96050024 93% [...................................................................     ] 89530368 / 96050024 93% [...................................................................     ] 89538560 / 96050024 93% [...................................................................     ] 89546752 / 96050024 93% [...................................................................     ] 89554944 / 96050024 93% [...................................................................     ] 89563136 / 96050024 93% [...................................................................     ] 89571328 / 96050024 93% [...................................................................     ] 89579520 / 96050024 93% [...................................................................     ] 89587712 / 96050024

 96% [.....................................................................   ] 92659712 / 96050024 96% [.....................................................................   ] 92667904 / 96050024 96% [.....................................................................   ] 92676096 / 96050024 96% [.....................................................................   ] 92684288 / 96050024 96% [.....................................................................   ] 92692480 / 96050024 96% [.....................................................................   ] 92700672 / 96050024 96% [.....................................................................   ] 92708864 / 96050024 96% [.....................................................................   ] 92717056 / 96050024 96% [.....................................................................   ] 92725248 / 96050024 96% [.....................................................................   ] 92733440 / 96050024

100% [............................................................................] 817297 / 817297

In [7]:
# Load the fitted TF-IDF vectorizer.
# Unpickling needs the custom tokenizer `tk` (defined in the first cell) to exist in this session.
vectorizer_filename = root + 'tf-idf_fitted_vectorizer.pkl'
with open(vectorizer_filename, 'rb') as file:
    vect = pickle.load(file)

# Load the trained LinearSVC classifier
clf_filename = root + 'tf-idf_trained_linearSVC.pkl'
with open(clf_filename, 'rb') as file:
    clf = pickle.load(file)
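The cell above relies on `pickle` restoring both fitted objects exactly, including the vectorizer's reference to its custom tokenizer. The following is a minimal, self-contained sketch of that round trip. It uses a made-up toy corpus and a hypothetical whitespace tokenizer `toy_tokenizer` in place of `tk`. Neither is part of the original pipeline. The sketch checks that the reloaded vectorizer and classifier reproduce the original predictions:

```python
import pickle

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC


def toy_tokenizer(text):
    """Hypothetical stand-in for `tk`: lowercase whitespace tokenization."""
    return text.lower().split()


# Tiny made-up corpus; the -1/+1 label convention is an assumption here.
train_texts = ["i love this", "great day", "i hate this", "awful day"]
train_labels = np.array([1, 1, -1, -1])

# token_pattern=None because a custom tokenizer replaces the regex pattern.
vect = TfidfVectorizer(tokenizer=toy_tokenizer, token_pattern=None)
clf = LinearSVC(random_state=0)
clf.fit(vect.fit_transform(train_texts), train_labels)

# Serialize both fitted objects to bytes (instead of .pkl files).
vect_bytes = pickle.dumps(vect)
clf_bytes = pickle.dumps(clf)

# Reloading the vectorizer works only because `toy_tokenizer` is defined
# under the same name in this session.
vect_loaded = pickle.loads(vect_bytes)
clf_loaded = pickle.loads(clf_bytes)

# Predictions from the original and the reloaded objects should match.
test_texts = ["love this day", "hate this"]
original = clf.predict(vect.transform(test_texts))
reloaded = clf_loaded.predict(vect_loaded.transform(test_texts))
print(np.array_equal(original, reloaded))
```

`pickle` stores functions by module and name, not by value. As a result, loading the real vectorizer in a fresh session fails unless `tk` is defined or importable under the same name. This is presumably why the first cell redefines it before loading.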

In [8]:
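# Map the raw test tweets into the fitted TF-IDF feature space
X_test = vect.transform(test_tweets)

# Predict one label per test tweet and write the submission file
save_filename = 'submission_tfidf_predictions.csv'
predictions = clf.predict(X_test)
helpers.save_pred(save_filename, predictions)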
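This notebook does not show `helpers.save_pred`. As a rough sketch only, a writer for a two-column submission file might look like the code below. Several details are assumptions and do not come from the actual helper: the function name `save_predictions_csv`, the `Id,Prediction` header, and the default of sequential 1-based ids in test-file order.

```python
import csv
import os
import tempfile


def save_predictions_csv(filename, predictions, ids=None):
    """Hypothetical submission writer; not the real `helpers.save_pred`.

    Assumes one row per test tweet with an "Id,Prediction" header and,
    unless `ids` is given, ids running 1..n in test-file order.
    """
    if ids is None:
        ids = range(1, len(predictions) + 1)
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Prediction"])
        for tweet_id, label in zip(ids, predictions):
            # int() also converts NumPy integer labels returned by predict()
            writer.writerow([tweet_id, int(label)])


# Usage with made-up predictions, written to a temporary directory
out = os.path.join(tempfile.mkdtemp(), "example_predictions.csv")
save_predictions_csv(out, [1, -1, 1])
with open(out, newline="") as f:
    print(f.read())
```

Passing `ids` explicitly would be safer if the test file's ids are not sequential. The first cell parses each `index` but does not store it, so it would need to collect those values.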