# Dave's VA Gov's Race Test Set Accuracy from ULMFiT Twitter Model
Making NLP predictions on new datasets using fast.ai
### The functions were copied from Matthew SF Choo's article [HERE](https://scientistwhostayed.medium.com/making-nlp-predictions-on-new-datasets-using-fast-ai-4a9be5e07ba1)

In [2]:
# Install everything you need (uncomment the pip line on a fresh environment)
# !pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
from fastbook import *
from IPython.display import display,HTML
from fastai.text.all import *

In [3]:
from pathlib import Path
path = Path('/notebooks/clean/Tutorials')
path2 = Path('/notebooks/clean/va_gov')

Get Learner Func

In [4]:
def get_learner_for_inference(export_model_path: str):
    # load_learner restores an exported Learner (on the CPU by default)
    learn = load_learner(export_model_path)
    # Move the DataLoaders to the GPU; this assumes CUDA is available
    learn.dls.cuda()

    print(f'Learner stored on {learn.dls.device.type}')
    return learn

Get Preds Func

In [5]:
def get_preds_from_series(s: pd.Series, learner: Learner):
    # test_dl builds a new test DataLoader from the pd.Series of text inputs
    dl_test = learner.dls.test_dl(s)
    # get_preds returns (predictions, targets); only the predictions are used
    preds = learner.get_preds(dl=dl_test)
    # One row per input text, one column per class probability
    return preds[0].numpy()

Load your df for inference/prediction

In [6]:
df_test = pd.read_csv(path2/'va_test_set_250bl_250re_500rows_simple_label.csv')

Run it

In [7]:
# learn, dls_clas = get_learner_for_inference('/path/to/awd_lstm_fully_trained_export')
learn = get_learner_for_inference('/notebooks/clean/Tutorials/clas_va2_from_recall_best_1000_one_hundred_percent_accurate_ex.pkl')

df_test[['pterry', 'pneither', 'pglenn']] = get_preds_from_series(
    s=df_test.text,
    learner=learn,
)

Learner stored on cuda


In [8]:
df = df_test
df.to_csv('va_test_set_250r_250bl_500rows_labeled_ML_inf.csv')

# Model is 99% correct on what it trained on and 93.6% accurate on the test set of 500 hand-labeled tweets
Note: all test tweets had hashtags. This figure could be double-checked against an additional 250 tweets that have no hashtags.
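
One way to run that double check, once tweets without hashtags are added, is to report accuracy separately for the two groups. The sketch below is only an illustration on made-up rows: the helper name `accuracy_by_hashtag` is hypothetical, and it assumes a frame with the `text` and `is_correct` columns built later in this notebook.

```python
import pandas as pd

def accuracy_by_hashtag(df: pd.DataFrame) -> pd.Series:
    """Mean of is_correct, split by whether the tweet text contains a '#'."""
    has_hashtag = df["text"].str.contains("#", regex=False)
    group = has_hashtag.map({True: "with_hashtag", False: "without_hashtag"})
    return df["is_correct"].groupby(group.rename("group")).mean()

# Toy rows, made up for illustration (not taken from the real test set)
toy = pd.DataFrame({
    "text": ["vote early #vagov", "great rally today", "#winwithglenn", "no tags here"],
    "is_correct": [1, 0, 1, 1],
})
acc = accuracy_by_hashtag(toy)
print(acc)
```

A large gap between the two groups would suggest the model leans on the hashtags themselves rather than on the rest of the tweet.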

In [9]:
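# Label a prediction 0 (pro-Terry) when pterry > pglenn, otherwise 4 (pro-Glenn).
# Note that pneither is ignored, so every tweet gets one of the two labels.
conds = (df.pterry > df.pglenn)
df['plabel'] = np.where(conds, 0, 4)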
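
The comparison above never lets "neither" win. If a three-way label is wanted, one possible alternative is to take the most probable of the three columns. This is a sketch on made-up probabilities, not the method used for the 93.6% figure; the code `2` for "neither" and the names `CODES`, `PROB_COLS`, `label_from_argmax`, and `plabel3` are assumptions introduced here for illustration.

```python
import numpy as np
import pandas as pd

# 0 (pro-Terry) and 4 (pro-Glenn) come from the notebook;
# 2 for "neither" is a hypothetical placeholder code.
CODES = np.array([0, 2, 4])
PROB_COLS = ["pterry", "pneither", "pglenn"]

def label_from_argmax(df: pd.DataFrame) -> np.ndarray:
    """Return the code of the highest-probability class for each row."""
    winner = df[PROB_COLS].to_numpy().argmax(axis=1)
    return CODES[winner]

# Toy probabilities, made up for illustration
toy = pd.DataFrame({
    "pterry":   [0.7, 0.1, 0.2],
    "pneither": [0.2, 0.1, 0.6],
    "pglenn":   [0.1, 0.8, 0.2],
})
toy["plabel3"] = label_from_argmax(toy)
print(toy)
```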

In [10]:
# Convert df.label to integer type.
# Non-numeric labels are coerced to NaN; every NaN in the frame is then replaced with 0.
df['label'] = pd.to_numeric(df['label'],errors='coerce')
df = df.replace(np.nan, 0, regex=True)
df['label'] = df['label'].astype(int)
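
Because coerced labels end up as 0, which is also the pro-Terry code, it may be worth counting how many labels failed to parse before trusting the accuracy number. A minimal sketch on made-up labels (the variable names are illustrative only):

```python
import pandas as pd

# Made-up raw labels: two of them cannot be parsed as numbers
raw = pd.Series(["0", "4", "four", None, "4"])

as_num = pd.to_numeric(raw, errors="coerce")  # unparseable values become NaN
n_bad = int(as_num.isna().sum())              # how many labels were lost
clean = as_num.fillna(0).astype(int)          # same end result as the cell above

print(f"{n_bad} label(s) could not be parsed and were set to 0")
print(clean.tolist())
```

If the count is zero on the real test set, the conversion is harmless; otherwise those rows are being scored as if they were hand-labeled pro-Terry.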

In [11]:
# Create is_correct column: 1 where the predicted label matches the hand label, else 0
conds = (df.plabel == df.label)
df['is_correct'] = np.where(conds, 1,0)

### 93.6% Accuracy!!

In [12]:
# Check accuracy: the fraction of rows where the prediction matches the hand label
df.is_correct.sum()/len(df)

0.936
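
A single accuracy figure hides which side the errors fall on. Assuming the `label` and `plabel` columns built above, a confusion table and per-class accuracy could be sketched as follows; the rows here are made up, so the numbers are not the notebook's results.

```python
import pandas as pd

# Toy labels, made up for illustration (0 = pro-Terry, 4 = pro-Glenn)
toy = pd.DataFrame({
    "label":  [0, 0, 4, 4, 4],
    "plabel": [0, 4, 4, 4, 0],
})

# Rows are the hand labels, columns are the model's predicted labels
confusion = pd.crosstab(
    toy["label"], toy["plabel"],
    rownames=["actual"], colnames=["predicted"],
)

# Fraction of correct predictions within each hand-labeled class
per_class = (toy["label"] == toy["plabel"]).groupby(toy["label"]).mean()

print(confusion)
print(per_class)
```

For scale, 0.936 on a 500-row test set corresponds to 468 correct predictions and 32 misses.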

In [14]:
df.head(15)

Unnamed: 0,idx,label,text,id,timestamp,pterry,pneither,pglenn,plabel,is_correct
0,0,4,no word from working parents of school children affected by a decision to close schools in richmond on their “burnout” status. #winwithglenn https://t.co/dyafscuhew,1.4517554138425836e+18,2021-10-23 03:40:37,0.000543,2.106327e-06,0.999455,4,1
1,1,4,"@terrymcauliffe \nparents out of their kids education? \n\nendorsing crt?\n\nscooter braun did donate to dnc virginians! \n\ncomplicit with the clintons and other people in power?\n\nfailed ""green"" business? \n\n#vagov #vapol #twofacedterry #winwithglenn",1.4468911168471122e+18,2021-10-09 17:31:38,0.001112,9.618165e-06,0.998878,4,1
2,2,4,i’ve had the worst work day. i’ll be up all night. i didn’t even get to watch #themattwalshshow and i’m even wearing my new #sbg shirt. #sad but the princess is on her throne and there is live #truecrime to watch. #brianlaundrie is dead. btw #winwithglenn https://t.co/v16dibcgla,1.450934872638382e+18,2021-10-20 21:20:05,0.012246,7.121306e-10,0.987754,4,1
3,3,4,"mcauliffe wanted $25,000 to hear parent concerns\n\n...about @coalitionfortj insights on fairfax effort to end meritocracy-based admissions to tjhsst, the #1 high school in america \n\na $25,000 ‘donation’ to meet with parents - via zoom?! watch ⤵️\n\n#winwithglenn #vagov https://t.co/kdk0onn4jg",1.4546670121903964e+18,2021-10-31 04:30:16,0.001104,1.250228e-07,0.998896,4,1
4,4,4,#virginia democrats voted to allow schools to refrain from reporting sexual battery in 2020: #vagov #vapol #parentsmatter #winwithglenn #fairfax #loudouncounty #nova #swva #nrv https://t.co/pb2zvs8tgi,1.448912698733998e+18,2021-10-15 07:24:41,0.00056,2.347651e-07,0.99944,4,1
5,5,4,@lion_politics #winwithglenn,1.446912583991169e+18,2021-10-09 18:56:57,7e-06,8.902466e-09,0.999993,4,1
6,6,4,"#winwithglenn\namericans, ""alabama pastor who raped, impregnated 14-year-old won't face jail, and not registered sex offended""\n\nrepubnant leaders want to bring back legalized rape😡 https://t.co/lx9lvc4s1m",1.451700983965311e+18,2021-10-23 00:04:20,0.0919,1.454404e-05,0.908086,4,1
7,7,4,@lisamarieboothe #winwithglenn \n#vagov \n#parentsforyoungkin \n#parentsmatter \n\n🇺🇸♥️♥️♥️🇺🇸,1.45555284640179e+18,2021-11-02 15:10:16,0.001579,1.420811e-10,0.998421,4,1
8,8,4,i can feel the energy and momentum for glenn youngkin all the way here in michigan. praying for you and your team @glennyoungkin! #winwithglenn,1.455555197846106e+18,2021-11-02 15:19:36,0.005027,8.958877e-08,0.994973,4,1
9,9,4,"only 1⃣7⃣ days and 3⃣ more saturdays until election day, and we are making the most of it here in springfield. i talked to a voter who has been a democrat for a long time and is excited to #winwithglenn this november.\n\nwant to join us? feel free to dm me or reply to this post! https://t.co/asb6a6xpaj",1.4494938290910085e+18,2021-10-16 21:53:53,0.000422,1.57414e-05,0.999562,4,1
