<center>
<h1>Chapter Nine</h1>
</center>

<hr>

* Semi-deep dive into our first machine learning algorithm, Logistic Regression.

* Bring in idea of cross-validation.

* Start to look at "explainable" AI using LIME.

#I. Set-up our data and splits




##Set-up

First bring in your library.

In [None]:
github_name = 'smith'
repo_name = 'cis423'
source_file = 'library.py'
url = f'https://raw.githubusercontent.com/{github_name}/{repo_name}/main/{source_file}'
!rm $source_file
!wget $url
%run -i $source_file

In [None]:
#this should be in your library but just making sure
titanic_variance_based_split = 107
customer_variance_based_split = 113

In [None]:
url = 'https://raw.githubusercontent.com/fickas/asynch_models/refs/heads/main/datasets/titanic_trimmed.csv'

titanic_trimmed = pd.read_csv(url)

In [None]:
titanic_features = titanic_trimmed.drop(columns='Survived')
titanic_features.head()  #print first 5 rows of the table

In [None]:
labels = titanic_trimmed['Survived'].to_list()

In [None]:
X_train, X_test, y_train, y_test = train_test_split(titanic_features, labels, test_size=0.2, shuffle=True,
                                                    random_state=titanic_variance_based_split, stratify=labels)

In [None]:
X_train.head()

In [None]:
y_train[:10]

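##Remember we want to transform train and test separately

We fit the transformer on the training set only, then apply that fitted transformer to the test set. This avoids data leakage.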

In [None]:
%%capture
X_train_transformed = titanic_transformer.fit_transform(X_train, y_train)

In [None]:
X_train_transformed.head()

In [None]:
%%capture
X_test_transformed = titanic_transformer.transform(X_test)

In [None]:
X_test_transformed.head()
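To make the leakage point concrete, here is a minimal sketch that uses sklearn's `MinMaxScaler` as a stand-in for `titanic_transformer` (the toy numbers are made up, and the real pipeline does more than scaling). The scaler learns its min and max from the training rows only, then reuses them on the test row.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

#toy single-column data, made up for illustration
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[40.0]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  #learns min=10 and max=30 from train only
test_scaled = scaler.transform(test)        #reuses train's min and max, no refitting

print(train_scaled.ravel())  #[0.  0.5 1. ]
print(test_scaled.ravel())   #[1.5] - above 1 because 40 was never seen during fit
```

If we had called `fit_transform` on the test row as well, the test data would have shaped its own scaling, which is exactly the leak we are trying to avoid.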

## Convert to numpy

Many (but not all) of the machine learning algorithms we use will choke on a dataframe. Luckily, it is easy to convert to `numpy` matrix form, which they all accept. But note we lose the column names. We now just have a plain old matrix.

In [None]:
X_train_numpy = X_train_transformed.to_numpy()
X_test_numpy = X_test_transformed.to_numpy()
y_train_numpy = np.array(y_train)
y_test_numpy = np.array(y_test)

#Could add new step to pipeline but will not

I wanted to mention this pretty cool `sklearn` transformer. It is called the `FunctionTransformer`, and it is an empty vessel. You pass it the function you want called, and its `transform` method calls that function and returns the result.

Check it out below. I want to convert a dataframe (X) to a numpy matrix.

In [None]:
from sklearn.preprocessing import FunctionTransformer

#here is my custom function. Has to have this signature line.
def numpy_converter(X, y=None):
  assert isinstance(X, pd.core.frame.DataFrame)
  return X.to_numpy()

#Now I pass the function in
numpy_transformer = FunctionTransformer(numpy_converter)

# Plug into a pipeline
p = Pipeline(steps=[('numpy', numpy_transformer)])

p.transform(X_test_transformed)

I won't add it as a new pipeline step but I could.

#II. We will be using Logistic Regression

We have seen it before but now let's look at it in more detail.

When you google you may come across this first:

<img src='https://www.dropbox.com/s/oqhwz22r6gfbvvc/Screen%20Shot%202022-02-02%20at%202.04.45%20PM.png?raw=1'>

That is not the one we want! Confusing. The one we want is this one. Note it has CV added on to the end. More on that shortly.

<img src='https://www.dropbox.com/s/yln5e02q1gfiq9a/Screen%20Shot%202021-09-24%20at%2011.41.40%20AM.png?raw=1' height=200>

I count 17 parameters! All have default values. I'm going to suggest we use those default parameters. I have found them to work ok in the past. And frankly, it would likely take another chapter or two to delve into all of the options offered, e.g., see the [User Guide](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression). [And this article discusses the solvers](https://towardsdatascience.com/dont-sweat-the-solver-stuff-aea7cddc3451). So I'm going with defaults with two exceptions: (a) I'll set the `random_state` so we get repeatable results; (b) I'll raise the cap on solver iterations to 5000 using `max_iter`.

I'll also set `cv`  (`cv=5`). It stands for *Cross Validation* and I'll explain that a little later.

In [None]:
from sklearn.linear_model import LogisticRegressionCV
model = LogisticRegressionCV(cv=5, random_state=1, max_iter=5000)

##A brief word about the `Cs` parameter

I know I said I would skip most parameters, but this one is kind of key. It is part of what is called *Regularization* in regression problems. And it crops up in Neural Nets as well. The general idea is we are worried about weight outliers. If there is noise in our training set, we might end up with weights that are overfitted to that noise. Weight overfitting leads to large values for some weights.

For instance, if we end up with a large weight for `Married`, then we are saying `Married` is really important and is outweighing other columns, e.g., `Gender`. This could be true: `Married` is the key parameter and should carry a high weight. But more often it is noisiness in our training set that is making `Married` so important. This can give us poor accuracy on the test set.

So what the regression folks decided was to add a "Regularization function", `R`, to the loss. So we now have this:

<img src='https://www.dropbox.com/s/vus4cm5mmv7fw5c/Screen%20Shot%202021-09-27%20at%203.38.41%20PM.png?raw=1' height=50>

The general idea is that we want to penalize large weights with the new term on the right. In essence, we are now trying to minimize both the loss function (as usual) and that new term. And that new term will be large if some set of weights are large. Hence we will penalize large weights during gradient descent.

The `R` function itself can take on several forms; the most common uses the `L2` or *Euclidean norm*. You can see that is the default under the `penalty` parameter.

<img src='https://www.dropbox.com/s/9f7x156k5stb2k5/Screen%20Shot%202021-09-27%20at%203.45.21%20PM.png?raw=1' height=50>

So `||wi||`, the L2 norm, says square all `wi`, sum then take the square root. The R function then squares that  (so undo the square root in essence). I hope you can see that large weights will be magnified by squaring them. So I believe we have our function that will give us large values for large weights.
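Here is a small sketch of that penalty term, assuming `R` is simply `lambda` times the sum of squared weights as described above (some formulations add a constant factor such as 1/2, so treat the exact scaling as an assumption). The weights are made up.

```python
import numpy as np

def l2_penalty(weights, lam):
    #lam * (sum of squared weights), i.e., the squared L2 norm scaled by lambda
    return lam * np.sum(np.square(weights))

small_weights = np.array([0.5, -0.5, 0.5])
large_weights = np.array([0.5, -0.5, 3.0])  #one weight has grown large

print(l2_penalty(small_weights, lam=1.0))  #0.75
print(l2_penalty(large_weights, lam=1.0))  #9.5
print(l2_penalty(large_weights, lam=0.1))  #a smaller lambda shrinks the same penalty
```

Notice how a single large weight dominates the penalty once it is squared.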

That leaves that `lambda` term. It allows you to choose how much value to place on `R`. Maybe you are ok with large weights because you believe your data is not noisy and if `Married` should be key, let its weight be big. Then make lambda small. The smaller the lambda, the less weight size will factor into gradient descent. In other words, weights will not be penalized for growing large.

The cool thing is that the model will search for a lambda that makes the most sense for the data we have.
Not unlike Box Cox, the model will search through a set of lambda values for you. You get to choose how many using the
`Cs` parameter. The general idea is that the algorithm will try different values for `lambda`, building a separate model for each value. So it does a search for you, looking for the best value. With `Cs`, if you give an integer `n`, the algorithm will build a list of lambda candidates of size `n` with values it chooses for you: in a logarithmic scale between `1e-4` and `1e4`. You can see the default is `10`, so it will build a list of `10` candidates. And you will get `10` different models built, looking for the best value of lambda out of `10`.

The final twist is the values in `Cs` are `1/lambda`, so the inverse of lambda above or *inverse regularization strength*.

The big takeaway is what you are doing with this parameter. If you really trust that your  data is non-noisy, then you want to minimize `lambda` values. Or maximize `Cs` values given they are `1/lambda`. If you are worried your training data is noisy, then the opposite: increase `lambda` values (or decrease `Cs` values which is inverse).
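To see what those candidates look like, here is a sketch that rebuilds the grid with `np.logspace`, assuming an integer `Cs=10` expands to 10 log-spaced values between `1e-4` and `1e4` as described above. It also prints the matching `lambda` for each candidate.

```python
import numpy as np

#assumption: Cs=10 expands to 10 log-spaced values from 1e-4 to 1e4
Cs_candidates = np.logspace(-4, 4, 10)
lambdas = 1 / Cs_candidates  #Cs holds 1/lambda, so invert to get lambda back

for C, lam in zip(Cs_candidates, lambdas):
    print(f'C={C:<12.6g} lambda={lam:.6g}')
```

Small `C` values at the top of the list mean large `lambda` (heavy penalty on big weights); large `C` values at the bottom mean small `lambda` (weights are free to grow).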

Sorry, I think it is important to know given it is baked into this algorithm and linked strongly to regression, and deep learning with neural nets for that matter.

I am going to let the model do the search for us using 10 possible inverse-lambda values.

And BTW, I really like this set of slides for going through the topic in more detail: [slides](https://cmci.colorado.edu/classes/INFO-4604/files/slides-6_regularization.pdf).

###Another name for the L2 style of regularization is Ridge Regression

[Here is a short article if interested](https://medium.com/@minions.k/ridge-regression-l1-regularization-method-31b6bc03cbf).

##At this point we have the algorithm

We have done no training yet. That comes next. So this is the general process we will follow. First build model. Then train. Then test.

## Training options

We have a `model`. Let's look at the methods available.

<img src='https://www.dropbox.com/s/hvfscyjzdcwe4hy/Screen%20Shot%202021-09-24%20at%2011.21.32%20AM.png?raw=1' height=200>

The method of interest to us for training is `fit` which has 2 required arguments, `X` and `y` and an optional `sample_weight` argument. All of these are expected to be numpy arrays and all have the same length. We won't be weighting samples so can ignore the optional argument.

In [None]:
model.fit(X_train_numpy, y_train_numpy)

###The `model` is an object

Notice I did **not** do this:

<pre>
model = model.fit(...)
</pre>

The `model` is an object and when I call its methods, it keeps track of data internally. (`fit` does return the model itself, so the assignment would be harmless, just unnecessary.)

#III. Cross-validation
<img src='https://www.dropbox.com/s/9fcc1crlxp19ijt/major_section.png?raw=1' width='300'>

Remember when I set `cv=5`? I was asking for 5 "folds".
What the heck is a "fold"? And why 5 of them? I think it is easier to introduce the idea of cross-validation (CV) more broadly first. The general idea is that we do not want to wait until testing to see our results. We want to test during training! We don't touch the test set. It remains the gold-standard. Instead, we break off pieces of the training set as a "validation" set. It is basically a test set used during training. Here is a picture. While it is titled `Total dataset`, it should be `Total training dataset`.

<img src='https://i.imgur.com/9k60cVA.png'>

Why are there 5 rows, called Experiments in the diagram? We decided that we wanted to do this validation split 5 times. Where did 5 come from? Well, I set `cv=5`. Yet another hyperparameter.

How it works. I first divide up the training set into 5 folds. Each fold is just a slice of the training data. So I'll have 5 slices.

Next, I build the model and train on slices 2 through 5 conjoined into a training set, then test on slice 1. I remember the accuracy I got.

I repeat the process 4 more times, each time with a new model and a new training set and a new slice I use for testing.

At the end, I average all 5 accuracies and that is my final result.
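The steps above can be sketched by hand. This is only an illustration on synthetic data from `make_classification`, using plain `LogisticRegression` and `StratifiedKFold`; it is not a claim about how `LogisticRegressionCV` is implemented internally.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

#synthetic stand-in for the training set
X, y = make_classification(n_samples=200, n_features=6, random_state=1)

folds = StratifiedKFold(n_splits=5)
accuracies = []
for train_idx, val_idx in folds.split(X, y):
    fold_model = LogisticRegression(max_iter=5000)  #a new model per experiment
    fold_model.fit(X[train_idx], y[train_idx])      #train on the other 4 slices
    accuracies.append(fold_model.score(X[val_idx], y[val_idx]))  #validate on the held-out slice

print(accuracies)           #one accuracy per fold
print(np.mean(accuracies))  #the averaged result
```

Each row of the diagram corresponds to one pass through the loop.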

##Yet another hyperparameter: fold distribution

If I look under the description of the cv parameter, I see this:
<pre>
cv int or cross-validation generator, default=None

The default cross-validation generator used is Stratified K-Folds.
</pre>
What is `Stratified K-Folds`?
The way we take the folds is a choice. We could use a simple CV where I just slice things up sequentially into folds. A more sophisticated approach would try to choose folds that carry the same distribution as the entire set. This is called "stratified" k-fold. See below.

<img src='https://i.stack.imgur.com/B9CCp.png'>

This is what we get by default. Cool.

And it is quite similar to what `train_test_split` gives us for the target column.
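A quick way to convince yourself is to check the share of 1s in each fold. The sketch below uses made-up labels (30% positive) and dummy features; only the labels matter for how stratified folds are drawn.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

#made-up labels: 70 zeros and 30 ones, so 30% positive overall
y = np.array([0] * 70 + [1] * 30)
X = np.zeros((len(y), 1))  #features are irrelevant to how folds are drawn

fold_fractions = []
for _, val_idx in StratifiedKFold(n_splits=5).split(X, y):
    fold_fractions.append(y[val_idx].mean())  #share of 1s in this validation fold

print(fold_fractions)  #each fold should sit at or near 0.3
```

A plain sequential slicing of this same array would give some folds with no 1s at all, since all the 1s sit at the end.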

#IV. Back to the `Cs` parameter and cross-validation

The cool thing about `LogisticRegressionCV` is that it uses cross-validation to help search for the best `Cs` value. So for each fold, it tries each `Cs` value. Let's say we have a list of 10 alternatives for `Cs` - we let the model choose these 10 for us. Then for each fold, we will build 10 separate models, one for each `Cs` value, and record the accuracy of each of those models. If `cv=5`, then we will end up building 50 models: `5 x 10`, right? That gives us 10 accuracies per fold, one for each of the 10 models.

This table may help. The rows are folds and the columns are alternative `Cs` values.

Note we are switching over to accuracy here and away from loss or error.


In [None]:
scores_df = pd.DataFrame(model.scores_[1], columns=range(1,11))  #accuracies across folds and Cs values
scores_df


The model then chooses the `Cs` value that performs the best, on average, across the 5 folds. This value can be seen below.


In [None]:
scores_df.describe().T

It looks like column 7 is the best (this differs from the video).

In [None]:
model.Cs_  #the 10 alternatives that were tried - 2.15443469e+01 is the 7th

The first among the tied best is chosen.

In [None]:
model.C_  #final Cs chosen agrees with 7th as best

Remember this is the inverse-lambda value (`1/lambda`), so `lambda` itself is actually `0.046415888336814703`.

In [None]:
1/21.5443469
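The selection step can be reproduced by hand. This sketch fits a separate `LogisticRegressionCV` on synthetic data (so the numbers will not match the Titanic table), averages the per-fold accuracies for each `Cs` column, and takes the first column with the highest average. It assumes a binary problem where `scores_` is keyed by the class label `1`, as in the cell above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

#synthetic stand-in data
X, y = make_classification(n_samples=300, n_features=6, random_state=1)

cv_model = LogisticRegressionCV(cv=5, random_state=1, max_iter=5000)
cv_model.fit(X, y)

fold_by_C = cv_model.scores_[1]           #rows are folds, columns are Cs values
mean_per_C = fold_by_C.mean(axis=0)       #average accuracy per Cs value
best_column = int(np.argmax(mean_per_C))  #argmax returns the first of any ties

print(fold_by_C.shape)            #(5, 10)
print(cv_model.Cs_[best_column])  #the candidate with the best average
print(cv_model.C_)                #the value the model settled on
```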

###We can also see the expected accuracy

Remember this was the original purpose we stated for cross-validation: give a more realistic accuracy for the training set. From table above, we can see that is `0.736190`.

Is this the highest accuracy from the table? No. We have values close to `.8` on the first fold. The argument is that that is not realistic. It rests on a specific way we divided the data. What we want is the average across all folds. It is not unlike what we did with using variance to choose the train-test split. We do not want a "special" split that gives us fabulous results. We want a split that gives us the average variance. The argument is that by using special splits we are falling into magical thinking: we will do better than reality when we actually put our models into production with new and unseen data.

##We will switch from loss to accuracy

From now on, we will be looking at some form of accuracy. So instead of errors, what did we get correct.

The score method simply counts how many times we predicted correctly out of all predictions made. Simple accuracy, which we will see later is typically too crude a measure.


In [None]:
model.score(X_train_numpy, y_train_numpy)  #simple accuracy

We were correct `74%` of the time on the training set in predicting `Survived` value based on 6 feature values.

How about now on the test set.

In [None]:
model.score(X_test_numpy, y_test_numpy)

This is not bad. Typically, the test score is lower and often much lower than training. Having two scores roughly equal signals that we may be avoiding overfitting, a huge problem that we will discuss later.

##We can see the weights `wi`

In the last chapter we only had one feature column so only one weight. Now we have 6 feature columns so 6 weights:
<pre>
yraw = w1*f1 + w2*f2 ... + w6*f6 + b
</pre>
We can see the final weights produced for the features.

In [None]:
list(zip(X_train_transformed.columns.to_list(), model.coef_[0]))

Think about this for a minute. We know that negative values for `yraw` will turn into predictions of 0 (or perished) once sent through the `sigmoid` function. Vice versa for positive values.
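As a reminder of why the sign of `yraw` matters, here is a tiny sketch of the sigmoid function on three made-up `yraw` values. The `.5` cutoff in the code is an assumption for illustration.

```python
import numpy as np

def sigmoid(yraw):
    return 1 / (1 + np.exp(-yraw))

results = []
for yraw in [-3.0, 0.0, 3.0]:
    prob = sigmoid(yraw)
    prediction = 0 if prob <= .5 else 1  #at or below .5 maps to 0 (perished)
    results.append((yraw, round(float(prob), 3), prediction))

print(results)
```

Negative `yraw` lands below `.5`, positive `yraw` lands above it, and `yraw` of exactly 0 sits right on `.5`.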

We can see that one of the highest survival weights belongs to `Joined` at `3.1`. But `Age` has a moderate negative weight; in essence, the higher the Age the more likely to perish.

We will get back to this in a more principled way later.

#V. Probability output

Let's say I want to know a bit more about predictions.
Here is what we get with the `predict` method.

In [None]:
yhat = model.predict(X_test_numpy)
yhat[:10]

I can get accuracy on my own, right?

In [None]:
sum([a==b for a,b in zip(yhat, y_test_numpy)])/len(yhat)  #accuracy

##Let's take even more control

I am going to ask for the output from the sigmoid function, itself. It should be a value between 0 and 1. I am going to treat that as a probability that the yhat value is 1. Check it out.

In [None]:
yprob = model.predict_proba(X_test_numpy)  #output from sigmoid as pair (perished, survived)
yprob[:10]

I actually get a pair of values, the 0 prob and the 1 prob. I only want the 1 prob.

In [None]:
yprob = yprob[:,1]  #grab the 2nd (1) column
yprob[:10]

Now I will create the yhat array. I'll use a threshold of .5: less than or equal to that is perished (0) and greater is survived (1).

In [None]:
threshold = .5  #might want to explore different values - see below
yhat2 = [0 if v<=threshold else 1 for v in yprob]
all(yhat2==yhat)  #yhat is what we got from model.predict

In [None]:
sum([a==b for a,b in zip(yhat2, y_test_numpy)])/len(yhat2)  #accuracy on test set

#VI. We can explore that threshold!

You may be asking why go through all this trouble when we could just use the `predict` method to get predictions of `0` and `1`. The reason is that we may want to move that threshold of `.5`.

##The "value" of correctly predicting 1

We can think about this for the Titanic, although it is admittedly a bit academic. If we had a time machine and could take our logistic model back with us, we could interview passengers as they board. We could ask our model if it predicts they will survive. It seems there are 4 possibilities:

1. We predict they survive and they do survive, i.e., we predict 1  and it was a 1. Good for us (and them).

2. We predict they survive and they perish. Kind of bad. In fact, maybe the worst.

3. We predict they perish and they do perish. Again, good for us.

4. We predict they perish and they survive. So they don't board even though they could have and survived. Inconvenience.

We will look at these 4 cases more later. But for now think about the 2 errors we can make above. I postulate that we may want to make sure we avoid #2. And are much more accepting of #4. One way to do that is to increase the threshold. What if I increase from .5 to .8? We will likely make fewer #2 errors at the cost of making more #4 errors.

This whole idea of moving the threshold around has a formal basis in machine learning. We will get to it later.
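Here is a toy sketch of that trade-off. The probabilities and true outcomes are made up for illustration (they are not Titanic results); the function counts case #2 and case #4 errors at a given threshold.

```python
import numpy as np

#made-up survival probabilities and true outcomes, for illustration only
yprob_toy = np.array([.95, .85, .75, .65, .55, .45, .30, .10])
y_true_toy = np.array([1, 1, 0, 1, 0, 1, 0, 0])

def error_counts(threshold):
    yhat_toy = np.array([0 if v <= threshold else 1 for v in yprob_toy])
    case_2 = int(np.sum((yhat_toy == 1) & (y_true_toy == 0)))  #predicted survive, perished
    case_4 = int(np.sum((yhat_toy == 0) & (y_true_toy == 1)))  #predicted perish, survived
    return case_2, case_4

print(error_counts(.5))  #(2, 1)
print(error_counts(.8))  #(0, 2)
```

On this toy data, raising the threshold removes the #2 errors but adds a #4 error.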

<img src='https://www.dropbox.com/s/8x575mvbi1xumje/cash_line.png?raw=1' height=3 width=500><br>
<img src='https://www.gannett-cdn.com/-mm-/56cbeec8287997813f287995de67747ba5e101d5/c=9-0-1280-718/local/-/media/2018/02/15/Phoenix/Phoenix/636542954131413889-image.jpg'
height=50 align=center>


Compute the accuracy when we lower the threshold to `.2`, using the `yprob` we have already computed.

I get `0.4790874524714829`.


In [None]:
#reminder that yprob holds probability predictions for a 1 value
print(yprob[:10])

In [None]:
#compute new yhat3 using threshold of .2
threshold = .2
yhat3 = [0 if v<=threshold else 1 for v in yprob]  #same rule as before, new threshold


In [None]:

print(yhat3[:10])  #[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]   a lot more 1 values

In [None]:
sum([a==b for a,b in zip(yhat3, y_test_numpy)])/len(yhat3)  #accuracy 0.4790874524714829

##It went down

Here is a question: is every threshold other than .5 guaranteed to be worse?

That would be a big no. It depends on the dataset. We will see where we get better results with thresholds different than `.5`. We will also see that "results" are nuanced. We may wish to look beyond simple accuracy.

In following chapters we will explore the threshold value and how it impacts the goodness of our model's output.

#VII. Lime Explainer

We are at a spot that we can get ready to explain our predictions in our web server. We will still have to do more work in the server code, but we can capture some useful information now and save it to file.

I have chosen to demo explanation with Lime. I like Lime because it works with any model, e.g., LogisticRegression, XGBoost, etc. I dislike Lime because its explanations can be hard to align with the actual predictions coming from our models.

First let's get set up then discuss.

In [None]:
%%capture
!pip install lime

In [None]:
import lime
from lime import lime_tabular

In [None]:
feature_names  = X_train_transformed.columns.to_list()
print(feature_names)

###Set up the explainer before using in server

We only have to do this once here. Then save to file.

In [None]:
explainer = lime.lime_tabular.LimeTabularExplainer(X_train_numpy,
                    feature_names=feature_names,
                    training_labels=y_train_numpy,
                    class_names=[0,1], #Outcome values
                    verbose=True,
                    mode='classification')

##Saving

I'll leave it as a challenge to save the explainer to GitHub. We can later load it into our server.

In [None]:
!pip install dill
import dill as pickle
with open('lime_explainer.pkl', 'wb') as file:
    pickle.dump(explainer, file)


###You should now see this in your Colab folder

<img src='https://www.dropbox.com/scl/fi/7sbhuagywvjgdqe9tlriv/Screenshot-2025-01-16-at-10.26.01-AM.png?rlkey=9oce6z30a5virrvo6m02pqvy7&raw=1' height=200>

Download the pkl file to your computer then upload it to your GitHub repository. We will need it later.
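When it is time to read the file back in, loading mirrors saving. The sketch below round-trips a stand-in dictionary through a temporary file using the standard library `pickle`. The chapter imports `dill as pickle`, which exposes the same `dump` and `load` calls, so the same pattern should apply to `lime_explainer.pkl`. The stand-in object and file name are hypothetical.

```python
import os
import pickle
import tempfile

stand_in = {'feature_names': ['Age', 'Gender'], 'mode': 'classification'}  #hypothetical object

path = os.path.join(tempfile.mkdtemp(), 'stand_in.pkl')
with open(path, 'wb') as file:
    pickle.dump(stand_in, file)   #same call shape as saving the explainer

with open(path, 'rb') as file:    #note 'rb' for reading bytes
    restored = pickle.load(file)

print(restored == stand_in)  #True
```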

##Small example

Lime accepts a single row/sample as input and explains how it believes each feature contributes to a prediction of 0 or 1. My problem is with these explanations, and in particular the weights it shows us. Let's check it out.

I'll make up a new passenger/row/sample then ask for an explanation. Remember, this is to be used with unlabeled rows - we do not know what the real classification is.

In [None]:
#['Age', 'Gender', 'Class', 'Married', 'Fare', 'Joined']
new_row = np.array([.25, 0, 0, 0, .26, .4])
logreg_explanation = explainer.explain_instance(new_row, model.predict_proba, num_features=len(feature_names))

###Model agnostic

Notice I used `model.predict_proba` as a parameter. This can be any model we build! So it takes the model in as an argument.

###Probabilities

These are the probabilities that Lime comes up with for `new_row`. You should not view them as an alternative to an actual model output.

In [None]:
logreg_explanation.predict_proba  #perishing vs surviving - predicting perished at .93

##A graphical view

The real value of Lime to me is the middle chart. It is providing the "strength" of each feature in swaying the prediction one way or the other. These are not probabilities and do not add up to 1.

What I read from chart below is that the person being Male (0) was a big contributor to them not surviving. The person being Married (0<) has a small contribution to them surviving.

My takeaway is that I can see how each feature is contributing in relative strength but can't use it as absolute strength.

In [None]:
logreg_explanation.show_in_notebook()

##Here is what we will use on the server

When we build our actual web-server, I'll display this simpler list.

In [None]:
logreg_explanation.as_list()

You can see that values on left of chart have simply been made negative in the pair of values in the list form. In essence, the only feature "voting" for survival (i.e., is positive) is Married.  All other features have a negative (perished) vote. This is what we will show the user of our system for an explanation of how features affect the prediction a model produces.
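Since the list form is just pairs of a feature description and a signed weight, splitting the "votes" is straightforward. The pairs below are hypothetical and the numbers are made up; they only mimic the shape of what `as_list` returns.

```python
#hypothetical (description, weight) pairs in the shape as_list returns; numbers are made up
example_pairs = [('Gender <= 0.00', -0.31), ('Married <= 0.00', 0.04), ('Age <= 0.30', -0.02)]

votes_for_survival = [(name, w) for name, w in example_pairs if w > 0]
votes_for_perished = [(name, w) for name, w in example_pairs if w < 0]

print(votes_for_survival)  #[('Married <= 0.00', 0.04)]
print(votes_for_perished)  #[('Gender <= 0.00', -0.31), ('Age <= 0.30', -0.02)]
```

Something along these lines could be used on the server to show the user which features pushed toward each outcome.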

##chatGPT

For what it is worth, I gave chatGPT (4) a try to find how it would explain the peculiarities of Lime. See this link for my chat results: https://chat.openai.com/share/579f2751-e482-48cd-8ae4-ba868c16e042.

#Challenge 1
<img src='https://www.dropbox.com/s/3uyvp722kp5to2r/assignment.png?raw=1' width='300'>

Save `lime_explainer.pkl` (should be in your folder on left) on your GitHub site. You will need to read it in later.



#Challenge 2
<img src='https://www.dropbox.com/s/3uyvp722kp5to2r/assignment.png?raw=1' width='300'>

Let's define a data set-up function to avoid tedium of the steps we normally have to do manually. I'll give you the signature line.

It should return: `x_train_numpy, x_test_numpy, y_train_numpy,  y_test_numpy`



In [None]:
def dataset_setup(original_table, label_column_name:str, the_transformer, rs, ts=.2):
  #your code below


###Test it

Note you will need to have `titanic_transformer` defined for this test.

In [None]:
%%capture
x_train_numpy, x_test_numpy, y_train_numpy,  y_test_numpy = dataset_setup(titanic_trimmed, 'Survived',  titanic_transformer, titanic_variance_based_split)

In [None]:
x_train_numpy[:2]

<pre>
array([[ 0.78947368,  1.        ,  1.        ,  0.40075188,  0.        ,
        -0.26086957],
       [-1.31578947,  0.        ,  1.        ,  0.40075188,  0.        ,
         0.60869565]])
</pre>

In [None]:
print(y_train_numpy[:10])  #[0 0 1 1 0 1 0 1 1 0]

#Challenge 3
<img src='https://www.dropbox.com/s/3uyvp722kp5to2r/assignment.png?raw=1' width='300'>

Let's define two new setup functions that are specific to Titanic and Customer datasets. They "wrap" the more general function from Challenge 2.





In [None]:
def titanic_setup(titanic_table, transformer=titanic_transformer, rs=titanic_variance_based_split, ts=.2):


In [None]:
def customer_setup(customer_table, transformer=customer_transformer, rs=customer_variance_based_split, ts=.2):


##Go ahead and test Titanic

In [None]:
%%capture
x_train_numpy, x_test_numpy, y_train_numpy, y_test_numpy = titanic_setup(titanic_trimmed)

In [None]:
x_train_numpy[:2]

<pre>
array([[ 0.78947368,  1.        ,  1.        ,  0.40075188,  0.        ,
        -0.26086957],
       [-1.31578947,  0.        ,  1.        ,  0.40075188,  0.        ,
         0.60869565]])
</pre>

In [None]:
print(y_train_numpy[:10])  #[0 0 1 1 0 1 0 1 1 0]

##Now test Customers

In [None]:
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vQPM6PqZXgmAHfRYTcDZseyALRyVwkBtKEo_rtaKq_C7T0jycWxH6QVEzTzJCRA0m8Vz0k68eM9tDm-/pub?output=csv'

In [None]:
customers_df = pd.read_csv(url)
customers_trimmed = customers_df.drop(columns='ID')  #this is a useless column which we will drop early
customers_trimmed = customers_trimmed.drop_duplicates(ignore_index=True)  #get rid of any duplicates
customers_trimmed.head()

In [None]:
%%capture
x_train_numpy, x_test_numpy, y_train_numpy,  y_test_numpy = customer_setup(customers_trimmed)

In [None]:
x_train_numpy[:2]

<pre>
array([[ 0.4       ,  1.        , -0.08922509,  0.        ,  0.25708227,
        -0.21333333],
       [ 0.        ,  1.        , -0.25675277,  1.        ,  0.27826316,
        -0.6       ]])
</pre>

In [None]:
print(y_train_numpy[:10])  #[0 0 0 1 1 0 1 0 0 0]

#Challenge 4
<img src='https://www.dropbox.com/s/3uyvp722kp5to2r/assignment.png?raw=1' width='300'>

Add these 3 functions to your library. We can use them in following chapters.



#Challenge 5
<img src='https://www.dropbox.com/s/3uyvp722kp5to2r/assignment.png?raw=1' width='300'>

Go ahead and use `LogisticRegressionCV` to compute training and testing accuracy for the Customer dataset using a threshold of `.5`.



##Step 1. Train

In [None]:
from sklearn.linear_model import LogisticRegressionCV
model = LogisticRegressionCV(cv=5, random_state=1, max_iter=5000)

In [None]:
#training code



In [None]:
#How well do we predict on the training set?

model.score(x_train_numpy, y_train_numpy)  #0.8245838668373879  _ changes slightly on different runs, unclear why

##Step 2. Test

In [None]:
#produce probabilities

yprob = model.predict_proba(x_test_numpy)
yprob[:10]

Ditto. Slight changes on different runs.
<pre>
array([[0.88908012, 0.11091988],
       [0.84899054, 0.15100946],
       [0.9934116 , 0.0065884 ],
       [0.52382151, 0.47617849],
       [0.65730416, 0.34269584],
       [0.1207584 , 0.8792416 ],
       [0.59773494, 0.40226506],
       [0.12899046, 0.87100954],
       [0.34580037, 0.65419963],
       [0.71286386, 0.28713614]])
</pre>

In [None]:
#carve off 2nd column

yprob = yprob[:,1]  #grab the 2nd column
yprob[:5]  #array([0.11091988, 0.15100946, 0.0065884 , 0.47617849, 0.34269584])

Now create the yhat array using threshold of .5. Use it to compute overall accuracy.

In [None]:
#your code

threshold = .5
yhat = [0 if v<=threshold else 1 for v in yprob]
sum([a==b for a,b in zip(yhat, y_test_numpy)])/len(yhat)  #accuracy np.float64(0.8112244897959183)


Dropped a bit on test set.