# Segment 2 - Machine Learning Model Refinement 


#### Initial Draft
Utilizing a scrapped database for U.S.A.F. plane crashes during the Vietnam War, multiple supervised ML models were created to see if pilot fatalities could be predicted by the following 1) The ammunition or guns used to shoot down the plane "Defense type" 2) Mission Phase "Leaving, Returning, etc." 3) The pilot's aircraft "Aircraft Type" 4) The pilots home base "Base".  Other variables existed in the database, however many observations had null values.  Therefore, we were limited with the variables or features for our model.  

In light of our data types (All Categorical) and the lower amount of observations (1000-1500) an ensemble model appeared to be the best fit for our analysis.  To prove that, multiple sampling and and logistic regression analysis were run on the side.  

#### Refinement
To imporve our inital draft new features were added such normalized aicraft types and defense types.  Additional new variables were introduced from our database such as ejection seat(Y/N).  Lastly, the variables Base and Mission Phase were dropped in favor of numceric lognitude and latitude data.  Dropping these variables preserved enough data to meet the requirments of our analysis.  With the addition of these variables model accuracy improved from 60% to 84%.  




In [106]:
# dependencies

import pandas as pd
import psycopg2 as pg
from sqlalchemy import create_engine
from sklearn.preprocessing import StandardScaler

from imblearn.ensemble import BalancedRandomForestClassifier
from imblearn.ensemble import EasyEnsembleClassifier
from imblearn.over_sampling import RandomOverSampler
from sklearn.linear_model import LogisticRegression
from imblearn.combine import SMOTEENN
from imblearn.under_sampling import ClusterCentroids

from sklearn.metrics import balanced_accuracy_score
from sklearn.metrics import confusion_matrix
from imblearn.metrics import classification_report_imbalanced
from sklearn.model_selection import train_test_split

from collections import Counter

In [107]:

# Posgres Connection

engine = pg.connect("dbname='Capstone_Project' user='postgres' host='127.0.0.1' port='5432' password='samboest'")
usaf_df = pd.read_sql("select * from usaf_complete", con=engine)

# Confirming the dataframe was created.
usaf_df.head()

Unnamed: 0,Crash_Date,Crash_Time,Aircraft_SN,Aircraft_Type,Summarized_Name,Ejection_Seats,Base,Wing,Squadron,Mission_Type,...,Longitude,Defense_Type,Defense_Category,Pilot_Hit,Pilot_Rank,Pilot,Pilot_Egress,Pilot_Condition,Pilot_Recovered,Pilot_Status
0,1962-02-11,_,4315732,SC-47A,C-47 Skytrain,N,BHA,_,4400CCTS,,...,107.0,Gunfire (combat associated),Automatic Weapons,_,Capt,Kissam E. K.,Crash,_,_,KIA
1,1962-08-28,_,538376,T-28B,T-28 Trojan,N,_,_,_,,...,,Gunfire,Automatic Weapons,_,Capt,Simpson R. L.,_,_,_,KIA
2,1962-10-15,_,625909,U-10,U-10 Courier,N,_,_,_,,...,,Gunfire,Automatic Weapons,_,Capt,Booth H. W.,Crash,_,_,KIA
3,1962-10-16,_,538365,T-28B,T-28 Trojan,N,_,_,_,,...,,Gunfire,Automatic Weapons,_,Capt,Chambers B. L.,Ejection,Minor injuries,_,Recovered
4,1962-11-05,_,4435530,RB-26B,B-26 Invader,N,_,_,_,,...,,Gunfire,Automatic Weapons,_,Capt,Bennett R. D.,Crash,_,_,KIA


In [108]:
# iterating the columns
for col in usaf_df.columns:
    print(col)

Crash_Date
Crash_Time
Aircraft_SN
Aircraft_Type
Summarized_Name
Ejection_Seats
Base
Wing
Squadron
Mission_Type
Weapon
Target_Objective
Ceiling_Vis
Maneuver
Pass
Angle
Altitude
Airspeed
Mission_Phase
Where_Hit
Fire_Observed
Hit_Country
Loss_Country
Latitude
Longitude
Defense_Type
Defense_Category
Pilot_Hit
Pilot_Rank
Pilot
Pilot_Egress
Pilot_Condition
Pilot_Recovered
Pilot_Status


## Pre-Model Cleaning

In [128]:
# Capture capture variables for model. 

usaf_df_model = usaf_df.filter(['Crash_Date','Summarized_Name','Ejection_Seats','Latitude', 'Longitude', 'Pilot_Egress','Defense_Category', 'Pilot_Status'], axis=1)

usaf_df_model.head()

Unnamed: 0,Crash_Date,Summarized_Name,Ejection_Seats,Latitude,Longitude,Pilot_Egress,Defense_Category,Pilot_Status
0,1962-02-11,C-47 Skytrain,N,11.75,107.0,Crash,Automatic Weapons,KIA
1,1962-08-28,T-28 Trojan,N,,,_,Automatic Weapons,KIA
2,1962-10-15,U-10 Courier,N,,,Crash,Automatic Weapons,KIA
3,1962-10-16,T-28 Trojan,N,,,Ejection,Automatic Weapons,Recovered
4,1962-11-05,B-26 Invader,N,,,Crash,Automatic Weapons,KIA


In [129]:
usaf_df_model.dtypes

Crash_Date          datetime64[ns]
Summarized_Name             object
Ejection_Seats              object
Latitude                   float64
Longitude                  float64
Pilot_Egress                object
Defense_Category            object
Pilot_Status                object
dtype: object

In [130]:
# Remove rows that have string value "_".  For some reason I couldnt chanin the code to do all columns in one line

clean_nulls_df = usaf_df_model[usaf_df_model["Summarized_Name"].str.contains("_")==False]

clean_nulls_df = clean_nulls_df[clean_nulls_df["Ejection_Seats"].str.contains("_")==False]

clean_nulls_df = clean_nulls_df[clean_nulls_df["Defense_Category"].str.contains("_")==False]

clean_nulls_df = clean_nulls_df[clean_nulls_df["Pilot_Egress"].str.contains("_")==False]

usaf_cleaned_nulls = clean_nulls_df[clean_nulls_df["Pilot_Status"].str.contains("_")==False]

usaf_cleaned_nulls.head(20)

Unnamed: 0,Crash_Date,Summarized_Name,Ejection_Seats,Latitude,Longitude,Pilot_Egress,Defense_Category,Pilot_Status
0,1962-02-11,C-47 Skytrain,N,11.75,107.0,Crash,Automatic Weapons,KIA
2,1962-10-15,U-10 Courier,N,,,Crash,Automatic Weapons,KIA
3,1962-10-16,T-28 Trojan,N,,,Ejection,Automatic Weapons,Recovered
4,1962-11-05,B-26 Invader,N,,,Crash,Automatic Weapons,KIA
5,1963-02-03,B-26 Invader,N,,,Crash,Small Arms Fire,KIA
6,1963-02-06,B-26 Invader,N,,,Crash,Automatic Weapons,KIA
7,1963-04-08,B-26 Invader,N,,,Crash,Automatic Weapons,KIA
8,1963-06-27,T-28 Trojan,N,,,Crash,Automatic Weapons,KIA
9,1963-08-16,B-26 Invader,N,,,Crash,Automatic Weapons,KIA
11,1963-09-10,T-28 Trojan,N,,,Crash,AAA,Recovered


In [131]:
#Remove float nulls
usaf_cleaned_nulls = usaf_cleaned_nulls.dropna()

usaf_cleaned_nulls.head(5)

Unnamed: 0,Crash_Date,Summarized_Name,Ejection_Seats,Latitude,Longitude,Pilot_Egress,Defense_Category,Pilot_Status
0,1962-02-11,C-47 Skytrain,N,11.75,107.0,Crash,Automatic Weapons,KIA
29,1964-09-22,A-1 Skyraider,Y,9.5,105.583333,Crashlanded,AAA,KIA
30,1964-09-22,A-1 Skyraider,Y,9.5,105.583333,Ejection,AAA,Recovered
31,1964-09-26,A-1 Skyraider,Y,10.083333,106.066667,Crashlanded,Automatic Weapons,Recovered
32,1964-10-02,A-1 Skyraider,Y,9.566667,106.45,Crash,Automatic Weapons,KIA


In [133]:
#Dummy Coding

usaf_df_dummy = pd.get_dummies(usaf_cleaned_nulls, columns=["Summarized_Name", "Ejection_Seats", "Defense_Category", "Pilot_Egress"])

usaf_df_dummy.head()

Unnamed: 0,Crash_Date,Latitude,Longitude,Pilot_Status,Summarized_Name_A-1 Skyraider,Summarized_Name_A-26 Invader,Summarized_Name_A-37 Dragonfly,Summarized_Name_A-7 Corsair,Summarized_Name_AC-47 Gunship I,Summarized_Name_B-52 Stratofortress,...,Defense_Category_Mid-Air Collision,Defense_Category_Other,Defense_Category_SAM,Defense_Category_Small Arms Fire,Pilot_Egress_Crash,Pilot_Egress_Crashlanded,Pilot_Egress_Crld at AB,Pilot_Egress_Crld at UBN,Pilot_Egress_Crld at UDN,Pilot_Egress_Ejection
0,1962-02-11,11.75,107.0,KIA,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
29,1964-09-22,9.5,105.583333,KIA,1,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
30,1964-09-22,9.5,105.583333,Recovered,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
31,1964-09-26,10.083333,106.066667,Recovered,1,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
32,1964-10-02,9.566667,106.45,KIA,1,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0


In [134]:
# Get target labels

usaf_df_dummy["Pilot_Status"].value_counts()



Recovered                  575
KIA                        337
POW (returned)             148
MIA                         54
POW (died)                   6
POW                          3
KIA (chute failure)          3
Recoverd                     2
KIA (chute failed)           1
Recovered DaNang             1
POW (died in captivity)      1
Recovered (chute fail)       1
recovered                    1
KIA,body MIA,PJ abandnd      1
Name: Pilot_Status, dtype: int64

In [135]:
# Clean Target variable pilot status - Create Binomial with dictionary.map

recovered = {'Recovered': 1, 'KIA': 0,'POW (returned)': 1,'MIA': 0,'POW (died)': 0,'POW': 1,'KIA (chute failure)': 0,'Recoverd': 1, 'u': 0, 
    'KIA (chute failed)': 0, 'Recovered DaNang': 1, 'POW (died in captivity)':0, 'Recovered (chute fail)': 1,'recovered': 1,'KIA,body MIA,PJ abandnd': 1, 'POW (died in captivity)': 0}


usaf_df_dummy["Target"] = usaf_df_dummy["Pilot_Status"].map(recovered)

usaf_df_dummy["Target"].value_counts

<bound method IndexOpsMixin.value_counts of 0       0
29      0
30      1
31      1
32      0
       ..
1518    0
1520    1
1521    0
1522    1
1523    1
Name: Target, Length: 1134, dtype: int64>

In [136]:
usaf_df_dummy['year'] = pd.DatetimeIndex(usaf_df_dummy['Crash_Date']).year

## Model Testing

### Random Forest
Random forest was chosen as our prefered model type due to higher accuracy and the fact that the majority of our data is categorical.  The primary barrier to developing a highly accurate model our lack of data.  Pilot    

In [137]:
# Create feature Variables

Y = usaf_df_dummy["Target"]

x = usaf_df_dummy.drop(["Target", "Pilot_Status", "Crash_Date"], axis=1)

x.head()

Unnamed: 0,Latitude,Longitude,Summarized_Name_A-1 Skyraider,Summarized_Name_A-26 Invader,Summarized_Name_A-37 Dragonfly,Summarized_Name_A-7 Corsair,Summarized_Name_AC-47 Gunship I,Summarized_Name_B-52 Stratofortress,Summarized_Name_B-57 Canberra,Summarized_Name_B-66 Destroyer,...,Defense_Category_Other,Defense_Category_SAM,Defense_Category_Small Arms Fire,Pilot_Egress_Crash,Pilot_Egress_Crashlanded,Pilot_Egress_Crld at AB,Pilot_Egress_Crld at UBN,Pilot_Egress_Crld at UDN,Pilot_Egress_Ejection,year
0,11.75,107.0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,1962
29,9.5,105.583333,1,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,1964
30,9.5,105.583333,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,1964
31,10.083333,106.066667,1,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,1964
32,9.566667,106.45,1,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,1964


In [138]:
x.dtypes

Latitude                               float64
Longitude                              float64
Summarized_Name_A-1 Skyraider            uint8
Summarized_Name_A-26 Invader             uint8
Summarized_Name_A-37 Dragonfly           uint8
Summarized_Name_A-7 Corsair              uint8
Summarized_Name_AC-47 Gunship I          uint8
Summarized_Name_B-52 Stratofortress      uint8
Summarized_Name_B-57 Canberra            uint8
Summarized_Name_B-66 Destroyer           uint8
Summarized_Name_C-119 Flying Boxcar      uint8
Summarized_Name_C-123 Provider           uint8
Summarized_Name_C-130 Hercules           uint8
Summarized_Name_C-47 Skytrain            uint8
Summarized_Name_C-7 Caribou              uint8
Summarized_Name_F-100 Super Sabre        uint8
Summarized_Name_F-101 Voodoo             uint8
Summarized_Name_F-102 Delta Dagger       uint8
Summarized_Name_F-104 Starfighter        uint8
Summarized_Name_F-105 Thunderchief       uint8
Summarized_Name_F-111 Aardvark           uint8
Summarized_Na

In [139]:
Y.value_counts()

1    732
0    402
Name: Target, dtype: int64

In [140]:
X_train, X_test, y_train, y_test = train_test_split(x, Y, random_state=78)

In [141]:
# Creating StandardScaler instance
scaler = StandardScaler()

X_scaler = scaler.fit(X_train)

X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)

In [142]:
brfc = BalancedRandomForestClassifier(n_estimators=100, random_state=1)
modle=brfc.fit(X_train_scaled, y_train)

In [143]:
# Calculated the balanced accuracy score
y_pred = brfc.predict(X_test_scaled)
balanced_accuracy_score(y_test, y_pred)

0.8347654012393897

In [144]:
# Display the confusion matrix
confusion_matrix(y_test, y_pred)

array([[ 91,  20],
       [ 26, 147]], dtype=int64)

In [127]:
# Print the imbalanced classification report
print(classification_report_imbalanced(y_test, y_pred))

                   pre       rec       spe        f1       geo       iba       sup

          0       0.78      0.69      0.91      0.73      0.79      0.62        78
          1       0.86      0.91      0.69      0.89      0.79      0.64       166

avg / total       0.84      0.84      0.76      0.84      0.79      0.63       244



In [149]:
# send to data frame, I need to figure out how to clean this up and do it properly. 

classification_df = classification_report_imbalanced(y_test, y_pred, output_dict=True)

classification_df = pd.DataFrame(classification_df).transpose()

classification_df

Unnamed: 0,pre,rec,spe,f1,geo,iba,sup
0,0.777778,0.81982,0.849711,0.798246,0.834632,0.694528,111.0
1,0.88024,0.849711,0.81982,0.864706,0.834632,0.698692,173.0
avg_pre,0.840193,0.840193,0.840193,0.840193,0.840193,0.840193,0.840193
avg_rec,0.838028,0.838028,0.838028,0.838028,0.838028,0.838028,0.838028
avg_spe,0.831503,0.831503,0.831503,0.831503,0.831503,0.831503,0.831503
avg_f1,0.83873,0.83873,0.83873,0.83873,0.83873,0.83873,0.83873
avg_geo,0.834632,0.834632,0.834632,0.834632,0.834632,0.834632,0.834632
avg_iba,0.697064,0.697064,0.697064,0.697064,0.697064,0.697064,0.697064
total_support,284.0,284.0,284.0,284.0,284.0,284.0,284.0


In [151]:
#Posgres connection - Still trying to ficure out

#from sqlalchemy import create_engine
#from config import db_password

# Code to create connection to PostgreSQL database 
#db_string = f"postgresql://postgres:{db_password}@127.0.0.1:5432/Capstone_Project"

# Create database engine
#engine = create_engine(db_string)

# Saving usaf_df to SQL table
#classification_df.to_sql(name='classification_table', con=engine, if_exists='replace')


ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'numpy.int64'
[SQL: INSERT INTO classification_table (index, pre, rec, spe, f1, geo, iba, sup) VALUES (%(index)s, %(pre)s, %(rec)s, %(spe)s, %(f1)s, %(geo)s, %(iba)s, %(sup)s)]
[parameters: ({'index': 0, 'pre': 0.7777777777777778, 'rec': 0.8198198198198198, 'spe': 0.8497109826589595, 'f1': 0.7982456140350876, 'geo': 0.8346315981931131, 'iba': 0.6945276566927084, 'sup': 111.0}, {'index': 1, 'pre': 0.8802395209580839, 'rec': 0.8497109826589595, 'spe': 0.8198198198198198, 'f1': 0.8647058823529413, 'geo': 0.8346315981931131, 'iba': 0.6986921527120719, 'sup': 173.0}, {'index': 'avg_pre', 'pre': 0.8401928537291614, 'rec': 0.8401928537291614, 'spe': 0.8401928537291614, 'f1': 0.8401928537291614, 'geo': 0.8401928537291614, 'iba': 0.8401928537291614, 'sup': 0.8401928537291614}, {'index': 'avg_rec', 'pre': 0.8380281690140845, 'rec': 0.8380281690140845, 'spe': 0.8380281690140845, 'f1': 0.8380281690140845, 'geo': 0.8380281690140845, 'iba': 0.8380281690140845, 'sup': 0.8380281690140845}, {'index': 'avg_spe', 'pre': 0.8315026334646949, 'rec': 0.8315026334646949, 'spe': 0.8315026334646949, 'f1': 0.8315026334646949, 'geo': 0.8315026334646949, 'iba': 0.8315026334646949, 'sup': 0.8315026334646949}, {'index': 'avg_f1', 'pre': 0.8387302141019493, 'rec': 0.8387302141019493, 'spe': 0.8387302141019493, 'f1': 0.8387302141019493, 'geo': 0.8387302141019493, 'iba': 0.8387302141019493, 'sup': 0.8387302141019493}, {'index': 'avg_geo', 'pre': 0.8346315981931132, 'rec': 0.8346315981931132, 'spe': 0.8346315981931132, 'f1': 0.8346315981931132, 'geo': 0.8346315981931132, 'iba': 0.8346315981931132, 'sup': 0.8346315981931132}, {'index': 'avg_iba', 'pre': 0.6970644799721094, 'rec': 0.6970644799721094, 'spe': 0.6970644799721094, 'f1': 0.6970644799721094, 'geo': 0.6970644799721094, 'iba': 0.6970644799721094, 'sup': 0.6970644799721094}, {'index': 'total_support', 'pre': 284.0, 'rec': 284.0, 'spe': 284.0, 'f1': 284.0, 'geo': 284.0, 'iba': 284.0, 'sup': 284.0})]
(Background on this error at: https://sqlalche.me/e/14/f405)

### Easy Ensemble Classifier
Random forest was perfered over Easy Ensemble 

In [82]:

# Instantiate
easy = EasyEnsembleClassifier(n_estimators=100, random_state=1)

# Fit
easy.fit(X_train_scaled, y_train)

EasyEnsembleClassifier(n_estimators=100, random_state=1)

In [83]:
y_pred = easy.predict(X_test_scaled)
balanced_accuracy_score(y_test, y_pred)

0.5866027317640221

In [84]:
confusion_matrix(y_test, y_pred)

array([[ 61,  50],
       [ 70, 116]], dtype=int64)

In [85]:
print(classification_report_imbalanced(y_test, y_pred))

                   pre       rec       spe        f1       geo       iba       sup

          0       0.47      0.55      0.62      0.50      0.59      0.34       111
          1       0.70      0.62      0.55      0.66      0.59      0.35       186

avg / total       0.61      0.60      0.58      0.60      0.59      0.34       297



## Logistic Regression and Sampling Techniques

These models led to inferior results and will not be used to predict pilot survival

Random Oversampling

In [86]:
from imblearn.over_sampling import RandomOverSampler

ros = RandomOverSampler(random_state=1)
X_resampled, y_resampled = ros.fit_resample(X_train_scaled, y_train)

Counter(y_resampled)

Counter({1: 489, 0: 489})

In [87]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver='lbfgs', random_state=1)

model.fit(X_resampled, y_resampled)

LogisticRegression(random_state=1)

In [88]:
from sklearn.metrics import balanced_accuracy_score

y_pred = model.predict(X_test_scaled)
balanced_accuracy_score(y_test, y_pred)

0.550566695727986

In [89]:
confusion_matrix(y_test, y_pred)

array([[ 53,  58],
       [ 70, 116]], dtype=int64)

In [90]:
# Print the imbalanced classification report
from imblearn.metrics import classification_report_imbalanced

print(classification_report_imbalanced(y_test, y_pred))

                   pre       rec       spe        f1       geo       iba       sup

          0       0.43      0.48      0.62      0.45      0.55      0.29       111
          1       0.67      0.62      0.48      0.64      0.55      0.30       186

avg / total       0.58      0.57      0.53      0.57      0.55      0.30       297



SMOTE Oversampling

In [91]:
from imblearn.over_sampling import SMOTE

X_resampled, y_resampled = SMOTE(random_state=1, sampling_strategy=1.0).fit_resample(
    X_train_scaled, y_train
)
from collections import Counter

Counter(y_resampled)

Counter({1: 489, 0: 489})

In [92]:
# Train the Logistic Regression model using the resampled data
model = LogisticRegression(solver='lbfgs', random_state=1)
model.fit(X_resampled, y_resampled)

LogisticRegression(random_state=1)

In [93]:
# Calculated the balanced accuracy score
y_pred = model.predict(X_test_scaled)
balanced_accuracy_score(y_test, y_pred)

0.5677855274629469

In [94]:
# Display the confusion matrix
confusion_matrix(y_test, y_pred)

array([[ 61,  50],
       [ 77, 109]], dtype=int64)

In [95]:
# Print the imbalanced classification report
print(classification_report_imbalanced(y_test, y_pred))

                   pre       rec       spe        f1       geo       iba       sup

          0       0.44      0.55      0.59      0.49      0.57      0.32       111
          1       0.69      0.59      0.55      0.63      0.57      0.32       186

avg / total       0.59      0.57      0.56      0.58      0.57      0.32       297



In [96]:
# Resample the data using the ClusterCentroids resampler

from imblearn.under_sampling import ClusterCentroids

cluster = ClusterCentroids(random_state=1)
X_resampled, y_resampled = cluster.fit_resample(X_train_scaled, y_train)

Counter(y_resampled)

  self.estimator_.fit(_safe_indexing(X, target_class_indices))


Counter({0: 400, 1: 400})

In [97]:
# Train Model

model = LogisticRegression(solver='lbfgs', random_state=1)
model.fit(X_resampled, y_resampled)

LogisticRegression(random_state=1)

In [98]:
# Calculated the balanced accuracy score
y_pred = model.predict(X_test_scaled)
balanced_accuracy_score(y_test, y_pred)

0.550857308921825

In [99]:
# Display the confusion matrix
confusion_matrix(y_test, y_pred)

array([[65, 46],
       [90, 96]], dtype=int64)

In [100]:
# Print the imbalanced classification report
print(classification_report_imbalanced(y_test, y_pred))

                   pre       rec       spe        f1       geo       iba       sup

          0       0.42      0.59      0.52      0.49      0.55      0.30       111
          1       0.68      0.52      0.59      0.59      0.55      0.30       186

avg / total       0.58      0.54      0.56      0.55      0.55      0.30       297



### SMOTEEN

In [101]:
from imblearn.combine import SMOTEENN

smoteenn = SMOTEENN(random_state=1)
X_resampled, y_resampled = smoteenn.fit_resample(X_train_scaled, y_train)
Counter(y_resampled)

Counter({0: 136, 1: 92})

In [102]:
model = LogisticRegression(solver='lbfgs', random_state=1)
model.fit(X_resampled, y_resampled)

LogisticRegression(random_state=1)

In [103]:
# Calculated the balanced accuracy score
y_pred = model.predict(X_test)
balanced_accuracy_score(y_test, y_pred)

  f"X has feature names, but {self.__class__.__name__} was fitted without"


0.5602295844231329

In [104]:
# Display the confusion matrix
confusion_matrix(y_test, y_pred)

array([[ 82,  29],
       [115,  71]], dtype=int64)

In [105]:
# Print the imbalanced classification report
print(classification_report_imbalanced(y_test, y_pred))

                   pre       rec       spe        f1       geo       iba       sup

          0       0.42      0.74      0.38      0.53      0.53      0.29       111
          1       0.71      0.38      0.74      0.50      0.53      0.27       186

avg / total       0.60      0.52      0.61      0.51      0.53      0.28       297

