# Mineral Resource Data Exploration

### Goal

The main goal of this notebook is to train a machine learning model that predicts attributes of MRDS records, such as the ores and commodities found at a site, mainly from their location. The intention is to check whether the workflow runs end to end without errors, measure the resulting accuracy, and identify next steps.

### Sources

The CSV file used in this notebook comes from the [Mineral Resources Data System](https://mrdata.usgs.gov/mrds/) website. Specifically, the "CSV (single)" download was used because of its flattened contents and smaller file size.

After some exploratory work, `sklearn` was chosen over `keras` because of its simpler APIs and because its models can be trained without defining neural network layers, as `keras` would require.

### Analysis

To begin, the required packages were loaded and the CSV was read into a pandas DataFrame.

In [2]:
import pandas as pd
import numpy as np
import pickle
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

pd.set_option('display.max_columns', None)

In [3]:
mrds_csv = pd.read_csv('../data/mrds.csv')

In [4]:
mrds_csv.shape

(304632, 45)

In [5]:
mrds_csv.dtypes

dep_id          int64
url            object
mrds_id        object
mas_id         object
site_name      object
latitude      float64
longitude     float64
region         object
country        object
state          object
county         object
com_type       object
commod1        object
commod2        object
commod3        object
oper_type      object
dep_type       object
prod_size      object
dev_stat       object
ore            object
gangue         object
other_matl     object
orebody_fm     object
work_type      object
model          object
alteration     object
conc_proc      object
names          object
ore_ctrl       object
reporter       object
hrock_unit     object
hrock_type     object
arock_unit     object
arock_type     object
structure      object
tectonic       object
ref            object
yfp_ba         object
yr_fst_prd    float64
ylp_ba         object
yr_lst_prd    float64
dy_ba          object
disc_yr       float64
prod_yrs       object
discr          object
dtype: object

I noticed that some columns are numeric (`int64` and `float64`), so I wanted to check whether selecting only those columns and dropping rows with `NaN` values would still leave a useful amount of data.

In [6]:
mrds_csv[['dep_id','latitude','longitude','yr_fst_prd','yr_lst_prd','disc_yr']].dropna().shape

(2195, 6)

Doing this drops over 99% of the rows (only 2,195 of 304,632 remain), which would severely limit the results, so I did not pursue it. Instead, the subset chosen is all the rows that fall within the state of Colorado. If this subset gives a good prediction accuracy with the model, the idea would be to extend the approach to the rest of the data in the hope of getting similarly good accuracy.
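The missing-value check above can be generalized so the numeric columns do not have to be listed by hand. The sketch below uses a small made-up DataFrame (`toy` is a hypothetical stand-in for `mrds_csv`); `select_dtypes` picks every numeric column, and `notna().mean()` shows which columns are responsible for most of the dropped rows. On the real data, according to the dtypes listed above, `select_dtypes(include='number')` would pick the same six columns used in the check.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mrds_csv (the real frame has 45 columns).
toy = pd.DataFrame({
    'dep_id': [1, 2, 3, 4],
    'latitude': [38.58, 38.68, np.nan, 37.37],
    'yr_fst_prd': [1950.0, np.nan, np.nan, 1920.0],
    'disc_yr': [np.nan, 1901.0, np.nan, 1890.0],
    'ore': ['Carnotite', 'Galena', None, 'Pyrite'],
})

# Select every int/float column instead of listing them by hand.
numeric = toy.select_dtypes(include='number')

# Share of non-null values per column: low values point to the columns
# that cause most rows to be dropped.
print(numeric.notna().mean())

# Share of rows left after dropping any row with a missing numeric value.
kept = len(numeric.dropna()) / len(numeric)
print(f"rows kept: {kept:.2%}")
```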

### Ore prediction

For this test, the `ore` column is used as the target, with `latitude` and `longitude` as the input features. Since the ore types are delimited by commas, a new column was added that holds the split list of ores for each row.

In [7]:
df = (
    mrds_csv[['latitude','longitude','state','country','ore']]
    # assign() passes the whole DataFrame to the lambda, so `d` is a DataFrame, not a row
    .assign(ore_array=lambda d: d['ore'].str.split(pat=','))
    .query("state == 'Colorado'")
    .dropna()
)

In [8]:
df

| | latitude | longitude | state | country | ore | ore_array |
|---|---|---|---|---|---|---|
| 475 | 38.58332 | -108.80068 | Colorado | United States | Carnotite | [Carnotite] |
| 756 | 38.68331 | -109.03402 | Colorado | United States | Carnotite | [Carnotite] |
| 921 | 38.58332 | -108.91735 | Colorado | United States | Carnotite | [Carnotite] |
| 935 | 38.66666 | -108.93401 | Colorado | United States | Carnotite | [Carnotite] |
| 1085 | 38.14999 | -108.86735 | Colorado | United States | Carnotite, Uraninite | [Carnotite, Uraninite] |
| ... | ... | ... | ... | ... | ... | ... |
| 302137 | 37.37720 | -104.97750 | Colorado | United States | Galena, Sphalerite, Pyrite, Tetrahedrite | [Galena, Sphalerite, Pyrite, Tetrahedrite] |
| 302144 | 38.15417 | -105.40861 | Colorado | United States | Galena, Sphalerite | [Galena, Sphalerite] |
| 302158 | 37.48364 | -107.47734 | Colorado | United States | Chalcocite, Malachite, Azurite, Siderite | [Chalcocite, Malachite, Azurite, Siderite] |
| 302163 | 37.69124 | -105.51941 | Colorado | United States | Chalcopyrite | [Chalcopyrite] |


Now that there is a list of ores for each row, the `MultiLabelBinarizer` from `sklearn` can be used. It works much like one-hot encoding, except that a row can have several columns set to 1, one for each ore it contains. The output is a mostly-zero indicator matrix with one column per unique ore, which is the target format that `sklearn` classifiers accept for multi-label prediction. To ensure that the labels are unique, leading and trailing whitespace is stripped from each value (for example, `' Uraninite'` becomes `'Uraninite'`).

In [9]:
ore_array = df['ore_array'].tolist()
ore_array = [[x.strip() for x in l] for l in ore_array]
mlb = MultiLabelBinarizer()
mlb_arr = mlb.fit_transform(ore_array)
mlb_cols = mlb.classes_
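A small illustration of why the whitespace stripping matters, using made-up ore lists rather than the real data: splitting `"Carnotite, Uraninite"` on a comma leaves a leading space on the second ore, and `MultiLabelBinarizer` then treats `' Uraninite'` and `'Uraninite'` as two different classes.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up lists, as produced by splitting "Carnotite, Uraninite" on ','.
raw = [['Carnotite'], ['Carnotite', ' Uraninite'], ['Uraninite']]

# Without stripping, ' Uraninite' and 'Uraninite' become separate classes.
unstripped = MultiLabelBinarizer().fit(raw)
print(unstripped.classes_)

# After stripping, each ore maps to exactly one indicator column.
cleaned = [[ore.strip() for ore in ores] for ores in raw]
mlb = MultiLabelBinarizer()
indicators = mlb.fit_transform(cleaned)
print(mlb.classes_)
print(indicators)
```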

In [10]:
multi_label_df = pd.DataFrame(mlb_arr, columns=mlb_cols)

In [11]:
multi_label_df

,Acanthite,Alabandite,Albite,Allanite,Altaite,Alunite,Amalgam,Amazonstone,Amblygonite,Analcime,Anatase,Ancylite,Andalusite,Anglesite,Ankerite,Annabergite,Apatite,Aquamarine,Aragonite,Argentite,Arsenopyrite,Ash,Astrophyllite,Aurichalcite,Autunite,Azurite,Barite,Bastnaesite,Bentonite,Bertrandite,Beryl,Biotite,Bismuth,Bismuthinite,Bismutite,Bornite,Bournonite,Braunite,Brochantite,Bromyrite,Calamine,Calaverite,Calciovolborthite,Calcite,Carnotite,Cassiterite,Cerargyrite,Cerussite,Chalcanthite,Chalcedony,Chalcocite,Chalcopyrite,Chlorite,Chromite,Chrysoberyl,Chrysocolla,Cinnabar,Clay,Cleavelandite,Coal,Coffinite,Coloradoite,Columbite,Conichalcite,Copper,Corundum,Covellite,Cryolite,Cuprite,Dawsonite,Diaspore,Digenite,Dolomite,Electrum,Enargite,Erythrite,Euclase,Euxenite,Feldspar,Ferberite,Fergusonite,Ferrimolybdite,Fluorite,Fluorspar,Freibergite,Gadolinite,Gahnite,Galena,Garnet,Gilsonite,Gneiss,Goethite,Gold,Granite,Graphite,Gravel,Greenockite,Gypsum,Hausmannite,Helium,Hematite,Hemimorphite,Hessite,Huebnerite,Hydrozincite,Ilmenite,Ilsemannite,Iron,Jalpaite,Jamesonite,Jarosite,Jordisite,Kaolin,Kaolinite,Krennerite,Kyanite,Lead,Lepidolite,Limestone,Limonite,Magnesite,Magnetite,Malachite,Manganese Ox-Hydrous,Manganite,Marble,Marcasite,Marmatite,Melonite,Mercury,Miargyrite,Mica,Microcline,Microlite,Molybdenite,Molybdite,Monazite,Montroseite,Muscovite,Nahcolite,Niccolite,Ocher,Oligoclase,Orthoclase,Pascoite,Pearceite,Peat,Pegmatite,Pentlandite,Perovskite,Petzite,Phenacite,Phosgenite,Pitchblende,Plagioclase,Plumbojarosite,Polybasite,Powellite,Proustite,Psilomelane,Pumice,Pyrargyrite,Pyrite,Pyrochlore,Pyrolusite,Pyromorphite,Pyrrhotite,Quartz,Rhabdophane,Rhodochrosite,Rhodonite,Rhyolite,Rickardite,Riebeckite,Roscoelite,Rutile,Samarskite,Sand,Sand and Gravel,Sandstone,Scheelite,Scoria,Sericite,Siderite,Sillimanite,Silver,Slate,Smithsonite,Specularite,Sphalerite,Sphene,Spodumene,Stephanite,Stibnite,Stromeyerite,Sulfur,Sylvanite,Sylvite,Synchisite,Tantalite,Telluride,Tennantite,Tenorite,Tetradymite,Tetrahedrite,Thalenite,Thorianite,Thorite,Titaniferous Magnetitite,Topaz,Torbernite,Tourmaline,Travertine,Trona,Tuff,Turgite,Turquoise,Tyuyamunite,Uraninite,Uranophane,Vermiculite,Volborthite,Wad,Willemite,Wolframite,Wollastonite,Wulfenite,Wurtzite,Xenotime,Zinc,Zinkenite,Zippeite,Zircon
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4656,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4657,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4658,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4659,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Now that the labels have been encoded, the latitude and longitude columns are converted to a NumPy array so that they can be passed to the model.

In [12]:
X = df[['latitude','longitude']].to_numpy()

In [13]:
len(X)

4661

Now that the features and targets are ready, a train/test split is used to measure the model's accuracy. 33% of the data is held out for testing, and a seed value is set for reproducibility.

In [14]:
seed = 1234
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, multi_label_df.to_numpy(), test_size=test_size, random_state=seed)

The model used here is `KNeighborsClassifier`, which natively supports multi-label targets such as this indicator matrix.

In [15]:
classifier = KNeighborsClassifier(n_neighbors=3)
# train
classifier.fit(X_train, y_train)
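To make the multi-label behavior of `KNeighborsClassifier` more concrete, here is a small check on random toy data (the arrays are made up, not MRDS values). It assumes the default uniform neighbor weights and an odd `n_neighbors`; under those assumptions each label column is predicted by a majority vote of the neighbors, so a label is switched on when at least 2 of the 3 neighbors carry it.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Made-up stand-ins: 50 training points with 2 features and 4 binary labels.
X_toy_train = rng.uniform(size=(50, 2))
y_toy_train = rng.integers(0, 2, size=(50, 4))
X_toy_test = rng.uniform(size=(10, 2))

clf = KNeighborsClassifier(n_neighbors=3).fit(X_toy_train, y_toy_train)

# Indices of the 3 nearest training points for every test point.
_, neighbor_idx = clf.kneighbors(X_toy_test)

# y_toy_train[neighbor_idx] has shape (10, 3, 4); a label is predicted
# when at least 2 of the 3 neighbors carry it (mean above 0.5).
manual = (y_toy_train[neighbor_idx].mean(axis=1) > 0.5).astype(int)

print(np.array_equal(manual, clf.predict(X_toy_test)))
```

This also shows that every label is voted on independently, so the classifier can return combinations of ores that never appeared together in the training data.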

In [16]:
# predict
predictions = classifier.predict(X_test)
# accuracy
print("Accuracy = ", accuracy_score(y_test,predictions))

Accuracy =  0.2871994801819363


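As we can see, this accuracy is too low to be useful. Note that for multi-label targets, `accuracy_score` computes subset accuracy: a test row counts as correct only when its full set of ores is predicted exactly, so this score corresponds to about 442 of the 1,539 test rows. The low score may be because only two features were used; location should logically be informative, but that has not been demonstrated yet.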
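Because an exact-match score gives no credit for partially correct rows, metrics that score labels individually can complement it. The sketch below uses made-up indicator matrices (3 sites, 3 labels) to compare `accuracy_score` with `hamming_loss` and the sample-averaged `jaccard_score` and `f1_score` from `sklearn.metrics`; applying them to `y_test` and `predictions` would be a possible next step, not something measured here.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss, jaccard_score

# Made-up indicator matrices: 3 sites x 3 labels.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 0],   # misses one of the two true labels
                   [0, 1, 0],
                   [1, 1, 0]])

# Exact match per row: 2 of 3 rows are fully correct.
print(accuracy_score(y_true, y_pred))
# Fraction of individual label cells that are wrong: 1 of 9.
print(hamming_loss(y_true, y_pred))
# Overlap between true and predicted label sets, averaged over rows.
print(jaccard_score(y_true, y_pred, average='samples'))
print(f1_score(y_true, y_pred, average='samples'))
```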

### Commodity Prediction

Another target that could be used in this dataset is the commodities. The metadata found [here](https://mrdata.usgs.gov/metadata/mrds.html) indicates that there are 3 commodity columns, with the following attribute definitions:


* `COMMOD1` - Primary commodities. Guidance given to database editors is to include in this category commodities that have a strong effect on the economics of the project, and might be economically viable as the only commodity.
* `COMMOD2` - Secondary commodities. Guidance given to database editors is to include commodities that can be economically recovered but have little effect on the economic viability of the project.
* `COMMOD3` - Tertiary commodities. Guidance given to database editors is to list commodities that are economically interesting but not economically recoverable as of the date of the source information.

For now, the decision was made to combine all of the commodities found within these columns regardless of their importance. This was done to see if the model can predict more accurately with more labels to choose from.

A function was created to combine the columns while excluding `NaN` values from the output. Note that the metadata for these attributes indicates that the values are delimited by a hyphen, so the strings are split on a hyphen rather than on the usual comma or whitespace.

In [17]:
df =(
    mrds_csv[['latitude','longitude','state','com_type','commod1','commod2','commod3']]
    .query("state == 'Colorado'")
)

In [18]:
def combine_cols(row):
    arr =[]
    if not pd.isna(row['commod1']):
        arr.extend(str(row['commod1']).split('-'))
    if not pd.isna(row['commod2']):
        arr.extend(str(row['commod2']).split('-'))
    if not pd.isna(row['commod3']):
        arr.extend(str(row['commod3']).split('-'))
    return arr
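The same logic can be written as a loop over the column names, with the whitespace stripping folded in. This is a hypothetical variant (`combine_commod_cols` is not part of the original notebook), shown on a made-up two-row DataFrame; the hyphen delimiter follows the metadata note above.

```python
import numpy as np
import pandas as pd

def combine_commod_cols(row, cols=('commod1', 'commod2', 'commod3'), sep='-'):
    """Hypothetical variant of combine_cols: skip NaN values, split each
    remaining value on `sep`, and strip whitespace from every piece."""
    combined = []
    for col in cols:
        value = row[col]
        if pd.isna(value):
            continue
        combined.extend(part.strip() for part in str(value).split(sep))
    return combined

# Made-up rows; the second has no commodities at all.
toy = pd.DataFrame({
    'commod1': ['Gold-Silver', np.nan],
    'commod2': [np.nan, np.nan],
    'commod3': ['Copper', np.nan],
})
print(toy.apply(combine_commod_cols, axis=1).tolist())
```

A row with no commodities yields an empty list, which `MultiLabelBinarizer` encodes as a row of all zeros.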

Another feature related to the commodity columns is `com_type`. This column contains 4 categories: metallic (M), nonmetallic (N), both (B), and a fourth code (E). The metadata does not seem to describe the last category, but those rows are kept for now. These categories were converted to integer codes and added as a feature column for the model.

In [19]:
commod_X = (
    df[['latitude','longitude', 'com_type']]
    .assign(cat_codes = pd.Categorical(df['com_type']).codes)
    .drop(['com_type'], axis=1)
).to_numpy()
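It may help to see what `pd.Categorical(...).codes` produces. Categories are sorted alphabetically and missing values get the code `-1`, so in a distance-based model like KNN, B (code 0) looks closer to E (code 1) than to N (code 3), and a one-code gap weighs as much as one degree of latitude. The sketch below, on made-up values, also shows one-hot encoding with `pd.get_dummies` as a possible alternative that implies no ordering; whether it improves accuracy here is untested.

```python
import pandas as pd

# Made-up com_type values, including a missing one.
com_type = pd.Series(['M', 'N', 'B', 'E', None])

# Categories are sorted alphabetically; missing values get the code -1.
cat = pd.Categorical(com_type)
print(list(cat.categories))
print(cat.codes.tolist())

# One-hot alternative: one 0/1 column per category, no implied ordering.
# The missing value becomes a row of all zeros.
one_hot = pd.get_dummies(com_type, prefix='com_type').astype(int)
print(one_hot)
```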

The same steps used for the ore model are then applied. Stripping whitespace may not be necessary here as it was for the ores, but it was left in to make sure that all labels are unique.

In [20]:
commod_array = df.apply(combine_cols,axis=1).values.tolist()
commod_array = [[x.strip() for x in l] for l in commod_array]
commod_mlb = MultiLabelBinarizer()
commod_mlb_arr = commod_mlb.fit_transform(commod_array)
commod_cols = commod_mlb.classes_
commod_label_df = pd.DataFrame(commod_mlb_arr, columns=commod_cols)

Now that the multi-label matrix has been created, the data is split, the model is fit on the training data, and its accuracy is measured on the test data.

In [21]:
seed = 1234
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(commod_X, commod_label_df.to_numpy(), test_size=test_size, random_state=seed)

In [22]:
commod_classifier = KNeighborsClassifier(n_neighbors=3)
# train
commod_classifier.fit(X_train, y_train)
# predict
predictions = commod_classifier.predict(X_test)
# accuracy
print("Accuracy = ", accuracy_score(y_test,predictions))

Accuracy =  0.3810667123992454


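The accuracy improved from about 0.29 to about 0.38, but not by much. Since the target (commodities instead of ores) and the set of rows also changed between the two experiments, this comparison does not isolate the effect of the `com_type` feature; still, the added feature does not appear to be enough on its own. More tweaking is needed before these models can be considered useful: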

* There may be other columns with more predictive value than the ones used so far. More feature engineering is needed to see whether adding them helps.
* A different model could be used. The KNN classifier was chosen in the hope that latitude and longitude, being coordinates, would provide a meaningful distance between sites and their targets.
* Focusing on the primary commodities found at each location may give better results than combining every commodity regardless of how economically viable it is at that location.
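One refinement related to the distance idea: Euclidean distance on raw degrees treats one degree of longitude as equal to one degree of latitude, although at Colorado's latitudes (roughly 37° to 41° N) a degree of longitude is only about 75 to 80 percent as long. A sketch of KNN with great-circle (haversine) distance is below, on made-up coordinates and labels. It assumes only the two coordinate features, since the haversine metric in `sklearn` accepts exactly two columns, `[latitude, longitude]` in radians.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up sites as (latitude, longitude) in degrees, with 2 binary labels.
coords_deg = np.array([[39.00, -105.00],
                       [39.10, -105.10],
                       [38.00, -108.00],
                       [38.10, -108.10]])
labels = np.array([[1, 0],
                   [1, 0],
                   [0, 1],
                   [0, 1]])

# Haversine expects [latitude, longitude] in radians and returns distances
# on a unit sphere (multiply by about 6371 to get kilometers).
clf = KNeighborsClassifier(n_neighbors=2, metric='haversine', algorithm='ball_tree')
clf.fit(np.radians(coords_deg), labels)

queries = np.radians([[39.02, -105.02], [38.05, -108.05]])
print(clf.predict(queries))
```

Adding `com_type` or other features would need a different approach, such as a custom metric or a separate model per category.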

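Further analysis will be needed in the hope of improving the models' accuracy. Finally, the commodity classifier is saved to disk with `pickle` and reloaded, to confirm that the restored model produces the same accuracy.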

In [24]:
model_filepath = '../models/commod_classifier.pkl'
with open(model_filepath, 'wb') as f:
    pickle.dump(commod_classifier, f)

In [25]:
with open(model_filepath, 'rb') as f:
    pickled_model = pickle.load(f)

In [28]:
pickled_predictions = pickled_model.predict(X_test)
print("Accuracy = ", accuracy_score(y_test,pickled_predictions))

Accuracy =  0.3810667123992454
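The pickled classifier only returns 0/1 indicator rows, so turning its predictions back into commodity names would also need the fitted `commod_mlb`. The sketch below, on made-up data, pickles the model and the binarizer together and decodes a prediction with `MultiLabelBinarizer.inverse_transform`; the dictionary layout is an assumption, and an in-memory buffer stands in for a file on disk. As the Python documentation warns, unpickling can execute arbitrary code, so only load pickle files from a trusted source.

```python
import io
import pickle

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up training data: coordinates and commodity label lists.
X_toy = np.array([[39.0, -105.0], [39.1, -105.1], [38.0, -108.0], [38.1, -108.1]])
labels = [['Gold', 'Silver'], ['Gold', 'Silver'], ['Uranium'], ['Uranium']]

mlb = MultiLabelBinarizer()
y_toy = mlb.fit_transform(labels)
clf = KNeighborsClassifier(n_neighbors=1).fit(X_toy, y_toy)

# Save model and binarizer together so predictions can be decoded later.
buffer = io.BytesIO()
pickle.dump({'model': clf, 'mlb': mlb}, buffer)
buffer.seek(0)
bundle = pickle.load(buffer)  # only unpickle data from a trusted source

pred = bundle['model'].predict(np.array([[38.05, -108.02]]))
decoded = bundle['mlb'].inverse_transform(pred)
print(decoded)
```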
