![MLU Logo](data/MLU_Logo.png)

# <a name="0">Machine Learning Accelerator - Tabular Data - Lecture 1</a>


## Final Project 

In this notebook, we build an ML model to predict the __Time at Center__ field of our final project dataset.

1. <a href="#1">Read the datasets</a> (Given) 
2. <a href="#2">Train a model</a> (Implement)
    * <a href="#21">Exploratory Data Analysis</a>
    * <a href="#22">Select features to build the model</a>
    * <a href="#23">Data processing</a>
    * <a href="#24">Model training</a>
3. <a href="#3">Make predictions on the test dataset</a> (Implement)
4. <a href="#4">Write the test predictions to a CSV file</a> (Given)

__Austin Animal Center Dataset__:

In this exercise, we are working with pet adoption data from __Austin Animal Center__. We have two datasets, covering animal intakes and outcomes. The intake data is available [here](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm) and the outcome data [here](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238). 

In order to work with a single table, we joined the intake and outcome tables on the "Animal ID" column and created the training.csv, test_features.csv, and y_test.csv files. As with our review dataset, we did not consider animals with multiple entries to the facility, to keep things simple. The original datasets are available in the data/review folder: Austin_Animal_Center_Intakes.csv and Austin_Animal_Center_Outcomes.csv.
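The join described above can be sketched as follows. This is a minimal illustration, not the script that produced training.csv: the two tiny tables, the `DateTime` column name, and the rule that a stay of more than 30 days maps to 1 are assumptions made for the example.

```python
import pandas as pd

# Hypothetical miniature intake and outcome tables; only "Animal ID" comes
# from the text above, the "DateTime" column name is an assumption.
intakes = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A3", "A3"],
    "DateTime": pd.to_datetime(["2019-01-01", "2019-01-05", "2019-02-01", "2019-06-01"]),
})
outcomes = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A3", "A3"],
    "DateTime": pd.to_datetime(["2019-01-10", "2019-03-01", "2019-02-15", "2019-07-01"]),
})

# Keep only animals that entered the facility exactly once (A3 entered twice)
visits = intakes["Animal ID"].value_counts()
single_visit = visits[visits == 1].index
intakes = intakes[intakes["Animal ID"].isin(single_visit)]
outcomes = outcomes[outcomes["Animal ID"].isin(single_visit)]

# One row per animal with both the intake and the outcome timestamp
joined = intakes.merge(outcomes, on="Animal ID", suffixes=("_intake", "_outcome"))
days = (joined["DateTime_outcome"] - joined["DateTime_intake"]).dt.days

# Assumed labeling rule: 1 if the stay lasted more than 30 days, else 0
joined["Time_at_Center"] = (days > 30).astype(int)
print(joined[["Animal ID", "Time_at_Center"]])
```

Whether a stay of exactly 30 days counts as 0 or 1 is not stated in the dataset description, so the `> 30` comparison is a choice made here.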

__Dataset schema:__ 
- __Pet ID__ - Unique ID of the pet
- __Outcome Type__ - State of the pet at the time the outcome was recorded
- __Sex upon Outcome__ - Sex of the pet at outcome
- __Name__ - Name of the pet
- __Found Location__ - Location where the pet was found before entering the center
- __Intake Type__ - Circumstances that brought the pet to the center
- __Intake Condition__ - Health condition of the pet when it entered the center
- __Pet Type__ - Type of pet
- __Sex upon Intake__ - Sex of the pet when it entered the center
- __Breed__ - Breed of the pet
- __Color__ - Color of the pet
- __Age upon Intake Days__ - Age of the pet when it entered the center (days)
- __Time at Center__ - Time at center (0 = less than 30 days; 1 = more than 30 days). This is the value to predict. 


## 1. <a name="1">Read the datasets</a> (Given)
(<a href="#0">Go to top</a>)

Let's read the datasets into dataframes, using Pandas.

In [5]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings("ignore")
  
tr = pd.read_csv('training.csv')
tr.columns = tr.columns.str.strip()
te = pd.read_csv('test_features.csv')
te.columns = te.columns.str.strip()

print('The shape of the training dataset is:', tr.shape)
print('The shape of the test dataset is:', te.shape)
tr

The shape of the training dataset is: (71538, 13)
The shape of the test dataset is: (23846, 12)


|  | Pet ID | Outcome_Type | Sex_upon_Outcome | Name | Found_Location | Intake_Type | Intake_Condition | Pet_Type | Sex_upon_Intake | Breed | Color | Age_upon_Intake Days | Time_at_Center |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A745079 | Transfer | Unknown |  | 7920 Old Lockhart in Travis (TX) | Stray | Normal | Cat | Unknown | Domestic Shorthair Mix | Blue | 3 | 0 |
| 1 | A801765 | Transfer | Intact Female |  | 5006 Table Top in Austin (TX) | Stray | Normal | Cat | Intact Female | Domestic Shorthair | Brown Tabby/White | 28 | 0 |
| 2 | A667965 | Transfer | Neutered Male |  | 14100 Thermal Dr in Austin (TX) | Stray | Normal | Dog | Neutered Male | Chihuahua Shorthair Mix | Brown/Tan | 1825 | 0 |
| 3 | A687551 | Transfer | Intact Male |  | 5811 Cedardale Dr in Austin (TX) | Stray | Normal | Cat | Intact Male | Domestic Shorthair Mix | Brown Tabby | 28 | 0 |
| 4 | A773004 | Adoption | Neutered Male | *Boris | Highway 290 And Arterial A in Austin (TX) | Stray | Normal | Dog | Intact Male | Chihuahua Shorthair Mix | Tricolor/Cream | 365 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 71533 | A705211 | Euthanasia | Neutered Male | Charlie | Austin (TX) | Public Assist | Normal | Dog | Neutered Male | St. Bernard Smooth Coat Mix | White/Red | 730 | 0 |
| 71534 | A782455 | Return to Owner | Neutered Male | Arlo | 124 West Anderson Lane in Austin (TX) | Stray | Normal | Cat | Neutered Male | Maine Coon | Brown Tabby | 1825 | 0 |
| 71535 | A757270 | Died | Spayed Female |  | 3129 E 12Th St in Austin (TX) | Stray | Sick | Cat | Spayed Female | Domestic Shorthair Mix | Black | 3650 | 0 |
| 71536 | A737192 | Return to Owner | Neutered Male | Leo | 8701 Panadero Dr in Austin (TX) | Stray | Normal | Dog | Neutered Male | Miniature Poodle/Chihuahua Shorthair | White/Black | 365 | 0 |


## 2. <a name="2">Train a model</a> (Implement)
(<a href="#0">Go to top</a>)

 * <a href="#21">Exploratory Data Analysis</a>
 * <a href="#22">Select features to build the model</a>
 * <a href="#23">Data processing</a>
 * <a href="#24">Model training</a>

### 2.1 <a name="21">Exploratory Data Analysis</a> 
(<a href="#2">Go to Train a model</a>)

We look at the number of rows and columns, the values of the categorical columns and how often they occur, and the correlations between the encoded features and the target.

In [6]:
def convert(name):
    """Return the sorted unique values of column `name` in `tr`
    together with integer codes 0, 1, 2, ... for them."""
    target = list(tr[name].value_counts().sort_index().index)
    length = list(range(len(target)))
    return target, length

def convert_s(name):
    """Reduce column `name` of `tr` to its first word before any '/'
    (e.g. 'Brown Tabby/White' -> 'Brown'), overwrite the column in place,
    and return the sorted unique values with integer codes."""
    tr[name] = [value.split('/')[0].split(' ')[0] for value in tr[name]]
    result = list(tr[name].value_counts().sort_index().index)
    length = list(range(len(result)))
    return result, length

In [7]:
convert('Outcome_Type')

(['Adoption',
  'Died',
  'Disposal',
  'Euthanasia',
  'Missing',
  'Relocate',
  'Return to Owner',
  'Rto-Adopt',
  'Transfer'],
 [0, 1, 2, 3, 4, 5, 6, 7, 8])
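`convert` pairs each sorted category with its position in the list. pandas can produce the same sorted labels and integer codes in one call with `pd.factorize(..., sort=True)`; the sketch below uses a small made-up series of `Outcome_Type` values. One difference to keep in mind: `factorize` encodes missing values as -1, whereas `value_counts()` silently drops them.

```python
import pandas as pd

outcomes = pd.Series(["Transfer", "Adoption", "Died", "Adoption", "Transfer"])

# sort=True orders the categories alphabetically, matching the
# value_counts().sort_index() order that convert() produces
codes, uniques = pd.factorize(outcomes, sort=True)

print(list(uniques))    # the category labels
print(codes.tolist())   # the integer code of each row
```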

In [8]:
# Collect the distinct base colors: the first word before any '/' in each color value
temp = []
for i in tr['Color'].value_counts().index:
    i = i.split('/')[0]
    i = i.split(' ')[0]
    temp.append(i)
print(set(temp))

{'Lilac', 'Tan', 'Red', 'Fawn', 'Chocolate', 'Calico', 'Tricolor', 'Gray', 'Apricot', 'Blue', 'Black', 'Ruddy', 'Cream', 'Lynx', 'Flame', 'Sable', 'Torbie', 'Liver', 'Buff', 'Silver', 'White', 'Yellow', 'Tortie', 'Pink', 'Agouti', 'Gold', 'Orange', 'Brown', 'Seal', 'Green'}
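The loop above can also be written with pandas string methods, which apply the same split-and-take-the-first-word rule to a whole column at once. The sample colors are taken from the dataset preview; this vectorized form is a sketch, not code the notebook runs.

```python
import pandas as pd

colors = pd.Series(["Brown Tabby/White", "Blue", "Black/White", "Tricolor/Cream"])

# First '/'-separated part, then its first space-separated word
base_color = colors.str.split("/").str[0].str.split(" ").str[0]
print(sorted(set(base_color)))
```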


In [9]:
# Count the rows for each breed, most frequent first, as (breed, count) pairs
breed_counts = tr['Breed'].value_counts()
temp = list(breed_counts.items())
print(temp)
# drop_list = []
# for i in temp:
#     if i[1] < 1000:
#         drop_list.append(i[0])
# print(drop_list)
# tr['Color'].dropna(how='any', inplace=True)

[('Domestic Shorthair Mix', 20676), ('Domestic Shorthair', 3796), ('Pit Bull Mix', 3770), ('Chihuahua Shorthair Mix', 3708), ('Labrador Retriever Mix', 3631), ('Domestic Medium Hair Mix', 2060), ('German Shepherd Mix', 1537), ('Bat Mix', 1279), ('Domestic Longhair Mix', 1001), ('Bat', 969), ('Siamese Mix', 866), ('Australian Cattle Dog Mix', 808), ('Dachshund Mix', 623), ('Pit Bull', 569), ('Labrador Retriever', 544), ('Chihuahua Shorthair', 536), ('Border Collie Mix', 486), ('Miniature Poodle Mix', 481), ('Boxer Mix', 460), ('Domestic Medium Hair', 449), ('Raccoon Mix', 406), ('German Shepherd', 397), ('Yorkshire Terrier Mix', 395), ('Australian Shepherd Mix', 392), ('Rat Terrier Mix', 345), ('Great Pyrenees Mix', 339), ('Catahoula Mix', 336), ('Miniature Schnauzer Mix', 332), ('Chihuahua Longhair Mix', 327), ('Jack Russell Terrier Mix', 321), ('Beagle Mix', 316), ('Cairn Terrier Mix', 301), ('Siberian Husky Mix', 295), ('Shih Tzu Mix', 265), ('Staffordshire Mix', 263), ('Pointer Mix'
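The commented-out lines in the cell above collect breeds with fewer than 1000 rows. Instead of dropping those rows, one common option is to collapse rare categories into a single label before one-hot encoding. In this sketch the helper `group_rare`, the threshold, and the `'Rare'` label are illustrative choices; on real data, the list of frequent breeds should come from the training set and be reused unchanged for the test set.

```python
import pandas as pd

breeds = pd.Series(["Domestic Shorthair Mix"] * 5 + ["Pit Bull Mix"] * 3
                   + ["Maine Coon", "Siamese Mix"])

def group_rare(values: pd.Series, min_count: int, label: str = "Rare") -> pd.Series:
    """Replace categories that occur fewer than `min_count` times with `label`."""
    counts = values.value_counts()
    frequent = counts[counts >= min_count].index
    # where() keeps values for which the condition holds and substitutes the rest
    return values.where(values.isin(frequent), label)

grouped = group_rare(breeds, min_count=3)
print(grouped.value_counts())
```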

In [10]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings("ignore")

tr = pd.read_csv('training.csv')
tr.columns = tr.columns.str.strip()

# Drop the identifier and free-text columns, which are not used as features
tr.drop(['Pet ID', 'Name', 'Found_Location'], axis=1, inplace=True)

# Remove rows without a pet type, then rows whose pet type is 'Other'
# (calling dropna on the tr['Pet_Type'] column alone would not remove rows from tr)
tr = tr.dropna(subset=['Pet_Type'])
tr.drop(tr[tr['Pet_Type'] == 'Other'].index, inplace=True)
# tr['Pet Type'].value_counts()

# One-hot encode the categorical columns; each dummy column is named after
# the category value itself (e.g. 'Cat', 'Adoption', 'Stray')
columns = ['Pet_Type','Outcome_Type','Intake_Condition',
           'Intake_Type','Sex_upon_Intake']
for term in columns:
    tr = tr.join(pd.get_dummies(tr[term]))
    tr = tr.drop([term], axis=1)
tr.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 66975 entries, 0 to 71537
Data columns (total 39 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Sex_upon_Outcome      66974 non-null  object
 1   Breed                 66975 non-null  object
 2   Color                 66975 non-null  object
 3   Age_upon_Intake Days  66975 non-null  int64 
 4   Time_at_Center        66975 non-null  int64 
 5   Bird                  66975 non-null  uint8 
 6   Cat                   66975 non-null  uint8 
 7   Dog                   66975 non-null  uint8 
 8   Livestock             66975 non-null  uint8 
 9   Adoption              66975 non-null  uint8 
 10  Died                  66975 non-null  uint8 
 11  Disposal              66975 non-null  uint8 
 12  Euthanasia            66975 non-null  uint8 
 13  Missing               66975 non-null  uint8 
 14  Relocate              66975 non-null  uint8 
 15  Return to Owner       66975 non-null
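Joining `pd.get_dummies(tr[term])` one column at a time names each dummy after the category value alone, so two source columns that share a value (here `'Other'`, which appears in both `Pet_Type` and `Intake_Condition`) produce clashing names. The cell above avoids the clash only because the `'Other'` pet type is dropped first. The sketch below, on a made-up three-row frame, shows the clash and the alternative of calling `pd.get_dummies` with `columns=`, which prefixes every dummy with its source column name. Switching to that form would change the feature names used later in this notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "Pet_Type": ["Cat", "Dog", "Other"],
    "Intake_Condition": ["Normal", "Other", "Sick"],
})

# Unprefixed dummies from both columns contain an "Other" column,
# so the second join raises a ValueError about overlapping columns
collided = False
try:
    df.join(pd.get_dummies(df["Pet_Type"])).join(pd.get_dummies(df["Intake_Condition"]))
except ValueError as err:
    collided = True
    print("join failed:", err)

# With columns=..., each dummy is prefixed by its source column and the
# source columns themselves are replaced by their dummies
encoded = pd.get_dummies(df, columns=["Pet_Type", "Intake_Condition"])
print(list(encoded.columns))
```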

In [11]:
lists = ['Age_upon_Intake Days',
       'Time_at_Center', 'Bird', 'Cat', 'Dog', 'Livestock', 'Adoption', 'Died',
       'Disposal', 'Euthanasia', 'Missing', 'Relocate', 'Return to Owner',
       'Rto-Adopt', 'Transfer', 'Aged', 'Behavior', 'Feral', 'Injured',
       'Medical', 'Normal', 'Nursing', 'Other', 'Pregnant', 'Sick',
       'Abandoned', 'Euthanasia Request', 'Owner Surrender', 'Public Assist',
       'Stray', 'Wildlife', 'Intact Female', 'Intact Male', 'Neutered Male',
       'Spayed Female', 'Unknown']

In [17]:
# Drop any row that still has a missing value (e.g. the one missing Sex_upon_Outcome)
tr = tr.dropna()
# columns = ['Outcome_Type', 'Sex_upon_Outcome','Intake_Type',
#            'Intake_Condition', 'Pet_Type', 'Sex_upon_Intake',
#        'Age_upon_Intake Days', 'Time_at_Center'] 

# for i in columns:
#     tr[i].replace(
#         convert(i)[0],
#         convert(i)[1],
#         inplace = True
#     )
#     tr[i] = tr[i].astype(int)
# tr

# tr['Color'].replace(
#     convert_s('Color')[0],
#     convert_s('Color')[1],
#     inplace = True
# )

# tr['Breed'].replace(
#     convert_s('Breed')[0],
#     convert_s('Breed')[1],
#     inplace = True
# )
tr

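|  | Sex_upon_Outcome | Breed | Color | Age_upon_Intake Days | Time_at_Center | Bird | Cat | Dog | Livestock | Adoption | ... | Euthanasia Request | Owner Surrender | Public Assist | Stray | Wildlife | Intact Female | Intact Male | Neutered Male | Spayed Female | Unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Unknown | Domestic Shorthair Mix | Blue | 3 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | Intact Female | Domestic Shorthair | Brown Tabby/White | 28 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | Neutered Male | Chihuahua Shorthair Mix | Brown/Tan | 1825 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | Intact Male | Domestic Shorthair Mix | Brown Tabby | 28 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | Neutered Male | Chihuahua Shorthair Mix | Tricolor/Cream | 365 | 0 | 0 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 71533 | Neutered Male | St. Bernard Smooth Coat Mix | White/Red | 730 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 71534 | Neutered Male | Maine Coon | Brown Tabby | 1825 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 71535 | Spayed Female | Domestic Shorthair Mix | Black | 3650 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 71536 | Neutered Male | Miniature Poodle/Chihuahua Shorthair | White/Black | 365 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |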


In [18]:
# Implement here
# Summary of the raw test features (te.info() prints directly and returns None)
te.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23846 entries, 0 to 23845
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Pet ID                23846 non-null  object
 1   Outcome_Type          23846 non-null  object
 2   Sex_upon_Outcome      23846 non-null  object
 3   Name                  14733 non-null  object
 4   Found_Location        23846 non-null  object
 5   Intake_Type           23846 non-null  object
 6   Intake_Condition      23846 non-null  object
 7   Pet_Type              23846 non-null  object
 8   Sex_upon_Intake       23846 non-null  object
 9   Breed                 23846 non-null  object
 10  Color                 23846 non-null  object
 11  Age_upon_Intake Days  23846 non-null  int64 
dtypes: int64(1), object(11)
memory usage: 2.2+ MB
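Before encoding the test set, it is worth listing category values that never occur in the training data, since the model cannot learn anything about them. For example, the training rows with `Pet_Type == 'Other'` are dropped in section 2.1, yet the test preview in section 3 includes a pet of type `Other`. The helper `unseen_categories` below is a hypothetical sketch on toy frames; applied to the raw training and test frames it would list such values per column.

```python
import pandas as pd

train = pd.DataFrame({"Pet_Type": ["Cat", "Dog", "Bird"],
                      "Intake_Type": ["Stray", "Owner Surrender", "Stray"]})
test = pd.DataFrame({"Pet_Type": ["Cat", "Other"],
                     "Intake_Type": ["Stray", "Wildlife"]})

def unseen_categories(train: pd.DataFrame, test: pd.DataFrame, columns: list) -> dict:
    """Map each column to the sorted test values that never appear in training."""
    return {col: sorted(set(test[col].dropna()) - set(train[col].dropna()))
            for col in columns}

result = unseen_categories(train, test, ["Pet_Type", "Intake_Type"])
print(result)
```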


In [19]:
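# numeric_only=True skips the remaining text columns (required in pandas >= 2.0);
# a diverging colormap makes negative and positive correlations easy to tell apart
tr.corr(numeric_only=True).style.background_gradient(cmap='coolwarm')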

Unnamed: 0,Age_upon_Intake Days,Time_at_Center,Bird,Cat,Dog,Livestock,Adoption,Died,Disposal,Euthanasia,Missing,Relocate,Return to Owner,Rto-Adopt,Transfer,Aged,Behavior,Feral,Injured,Medical,Normal,Nursing,Other,Pregnant,Sick,Abandoned,Euthanasia Request,Owner Surrender,Public Assist,Stray,Wildlife,Intact Female,Intact Male,Neutered Male,Spayed Female,Unknown
Age_upon_Intake Days,1.0,-0.101249,-0.016004,-0.217446,0.219865,-0.005389,-0.166846,-0.01717,-0.00199,0.117204,-0.005316,0.004333,0.316167,0.036792,-0.112845,0.217195,3.7e-05,0.001238,0.057488,0.006911,-0.044835,-0.1157,0.02123,0.003024,0.04735,-0.00818,0.109643,0.113694,0.115975,-0.175894,-0.007639,-0.208435,-0.199683,0.358854,0.344304,-0.10767
Time_at_Center,-0.101249,1.0,-0.022873,0.1681,-0.164418,0.006671,0.25701,-0.016147,-0.010429,-0.047189,0.022064,0.001238,-0.095096,0.000589,-0.173398,-0.01295,-0.001232,-0.001649,-0.011678,-0.001588,-0.023555,0.070633,-0.005034,-0.008356,-0.006421,-0.005301,-0.014064,-0.037448,-0.043665,0.059984,-0.009211,0.053394,0.054084,-0.062167,-0.051063,-0.073055
Bird,-0.016004,-0.022873,1.0,-0.071854,-0.087712,-0.001113,-0.014367,0.022057,0.088419,0.072589,0.0064,0.119565,-0.022485,-0.005122,-0.011008,-0.005084,-0.000309,-0.002491,0.088596,-0.001235,-0.047299,-0.015155,0.017462,-0.002095,-0.007475,-0.002548,0.008095,-0.01284,0.059676,-0.056251,0.421771,-0.038012,-0.012594,-0.029045,-0.027186,0.189322
Cat,-0.217446,0.1681,-0.071854,1.0,-0.986877,-0.012528,-0.023864,0.054193,0.013681,0.031307,0.007035,-0.00708,-0.276079,-0.025636,0.211137,-0.042039,-0.003474,0.027914,0.014081,-0.006126,-0.07236,0.085412,-0.001279,-0.009817,0.042368,-0.009806,-0.02058,0.007494,-0.137283,0.074311,-0.030306,0.041025,-0.015045,-0.091822,-0.075405,0.180736
Dog,0.219865,-0.164418,-0.087712,-0.986877,1.0,-0.015293,0.026107,-0.057601,-0.027751,-0.042766,-0.008038,-0.011997,0.279136,0.026444,-0.209005,0.042819,0.003521,-0.027468,-0.028103,0.006321,0.07966,-0.082809,-0.00149,0.010148,-0.041048,0.010211,0.019281,-0.005346,0.127679,-0.065378,-0.036995,-0.035053,0.017145,0.096382,0.079775,-0.210876
Livestock,-0.005389,0.006671,-0.001113,-0.012528,-0.015293,1.0,0.000609,-0.001406,-0.000517,-0.00277,-0.000319,-0.000152,0.005983,-0.000893,-0.003561,-0.000886,-5.4e-05,-0.000434,-0.003246,-0.000215,0.005323,-0.002642,-0.000624,-0.000365,-0.002584,-0.000444,-0.000662,-0.003243,-0.00337,0.004877,-0.00047,0.005177,-0.003958,-0.001726,-0.00474,0.006423
Adoption,-0.166846,0.25701,-0.014367,-0.023864,0.026107,0.000609,1.0,-0.089408,-0.032864,-0.176182,-0.020262,-0.009685,-0.376711,-0.056793,-0.654341,-0.035012,0.004361,-0.016026,-0.076029,-0.002019,0.138151,-0.057777,-0.014094,0.000882,-0.085593,8.7e-05,-0.038316,0.10793,-0.146371,-0.008769,-0.028079,0.104067,0.087777,-0.100935,-0.052434,-0.198146
Died,-0.01717,-0.016147,0.022057,0.054193,-0.057601,-0.001406,-0.089408,1.0,-0.003742,-0.020062,-0.002307,-0.001103,-0.042897,-0.006467,-0.074511,0.003017,-0.00039,0.001656,0.065785,-0.00156,-0.102237,0.04456,-0.004518,-0.002645,0.063849,-0.003217,0.001507,-0.007662,-0.01001,0.010854,0.018799,-0.010296,0.003647,-0.018049,-0.013245,0.058768
Disposal,-0.00199,-0.010429,0.088419,0.013681,-0.027751,-0.000517,-0.032864,-0.003742,1.0,-0.007374,-0.000848,-0.000405,-0.015768,-0.002377,-0.027388,-0.00236,-0.000143,-0.001156,0.06798,-0.000573,-0.058381,-0.004831,-0.001661,-0.000972,0.031356,-0.001182,-0.001763,-0.010899,0.017496,-0.00245,0.034672,-0.013545,-0.002369,-0.008457,-0.006006,0.055039
Euthanasia,0.117204,-0.047189,0.072589,0.031307,-0.042766,-0.00277,-0.176182,-0.020062,-0.007374,1.0,-0.004546,-0.002173,-0.08453,-0.012744,-0.146827,0.055108,-0.000768,0.001324,0.271262,-0.003074,-0.268936,-0.023207,0.012061,-0.005213,0.167816,-0.003888,0.196287,-0.000378,-0.011872,-0.024444,0.113871,-0.031901,-0.016298,0.02451,-0.001841,0.071626
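With dozens of columns, the full correlation matrix is hard to scan. The sketch below pulls out only the target column and ranks the features by absolute correlation, on a small made-up frame (`numeric_only=True` needs pandas 1.5 or later). Keep in mind that Pearson correlation with a binary target captures only linear association, so a low value does not prove a feature is useless.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [10, 200, 30, 400, 20, 500],
    "Cat": [1, 0, 1, 0, 1, 0],
    "Breed": ["a", "b", "a", "b", "a", "b"],   # non-numeric, skipped by numeric_only
    "Time_at_Center": [1, 0, 1, 0, 0, 0],
})

# Correlation of every numeric feature with the target, strongest first
target_corr = (
    df.corr(numeric_only=True)["Time_at_Center"]
    .drop("Time_at_Center")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(target_corr)
```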


### 2.2 <a name="22">Select features to build the model</a> 
(<a href="#2">Go to Train a model</a>)


In [20]:
# Candidate raw feature sets tried during exploration. Only the last assignment
# takes effect, and the model is trained on the columns selected for X_train
# in the next cell rather than on this list.
# columns = ['Outcome_Type', 'Sex_upon_Outcome', 'Name','Intake_Type',
#            'Intake_Condition', 'Pet_Type', 'Sex_upon_Intake',
#            'Breed', 'Color', 'Age_upon_Intake Days', 'Time_at_Center']

# columns = ['Outcome_Type', 'Sex_upon_Outcome','Intake_Type',
#            'Intake_Condition', 'Pet_Type', 'Sex_upon_Intake',
#            'Breed', 'Color', 'Age_upon_Intake Days']

columns = ['Outcome_Type', 'Sex_upon_Outcome','Intake_Condition',
           'Intake_Type','Sex_upon_Intake','Age_upon_Intake Days']

In [21]:
# Implement here
from sklearn.model_selection import train_test_split

# Hold out 10% of the encoded training data; test_data serves as a validation set
train_data, test_data = train_test_split(tr, test_size=0.1, shuffle=True, random_state=23)

# Features: intake age plus the one-hot columns created in section 2.1
X_train = train_data[['Age_upon_Intake Days', 'Bird', 'Cat',
                      'Dog', 'Livestock', 'Adoption', 'Died',
                      'Disposal', 'Euthanasia', 'Missing', 'Relocate', 'Return to Owner',
                      'Rto-Adopt', 'Transfer', 'Aged', 'Behavior', 'Feral', 'Injured',
                      'Medical', 'Normal', 'Nursing', 'Other', 'Pregnant', 'Sick',
                      'Abandoned', 'Euthanasia Request', 'Owner Surrender', 'Public Assist',
                      'Stray', 'Wildlife', 'Intact Female', 'Intact Male', 'Neutered Male',
                      'Spayed Female', 'Unknown']].values
# X_train = train_data[
#     ['Sex_upon_Outcome','Intake_Type','Intake_Condition']
# ]
y_train = train_data['Time_at_Center'].tolist()
# numerical_features = ...
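The classification report in section 2.4 shows that the classes are imbalanced (54,735 rows of class 0 versus 5,541 of class 1 in the training split). Passing `stratify=` to `train_test_split` keeps that ratio the same in the training and validation parts. The sketch below uses synthetic labels with 10% positives rather than the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels: 10% positives, loosely mimicking Time_at_Center
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.array([1] * 100 + [0] * 900)

# stratify=y preserves the class proportions in both parts of the split
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.1, shuffle=True, random_state=23, stratify=y)

print(y_tr.mean(), y_val.mean())
```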

### 2.3 <a name="23">Data Processing</a> 
(<a href="#2">Go to Train a model</a>)


In [22]:
def convert_te(name):
    """Return the sorted unique values of column `name` in `te`
    together with integer codes 0, 1, 2, ... for them."""
    target = list(te[name].value_counts().sort_index().index)
    length = list(range(len(target)))
    return target, length

def convert_te_s(name):
    """Reduce column `name` of `te` to its first word before any '/'
    (e.g. 'Brown Tabby/White' -> 'Brown'), overwrite the column in place,
    and return the sorted unique values with integer codes."""
    te[name] = [value.split('/')[0].split(' ')[0] for value in te[name]]
    result = list(te[name].value_counts().sort_index().index)
    length = list(range(len(result)))
    return result, length

In [24]:
# Implement here
# columns = ['Outcome_Type', 'Sex_upon_Outcome','Intake_Type',
#            'Intake_Condition', 'Pet_Type', 'Sex_upon_Intake',
#        'Age_upon_Intake Days']
# for i in columns:
#     te[i].replace(
#         convert_te(i)[0],
#         convert_te(i)[1],
#         inplace = True
#     )
#     te[i] = te[i].astype(int)

# te['Color'].replace(
#     convert_te_s('Color')[0],
#     convert_te_s('Color')[1],
#     inplace = True
# )

# te['Breed'].replace(
#     convert_te_s('Breed')[0],
#     convert_te_s('Breed')[1],
#     inplace = True
# )

# te
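The commented-out encoders above derive their codes from the test set itself, so the same category could receive a different code in training and test. With one-hot encoding, the equivalent pitfall is ending up with a different set of columns. A minimal sketch, on toy frames, of aligning the test dummies to the training columns with `DataFrame.reindex(..., fill_value=0)`:

```python
import pandas as pd

train_raw = pd.DataFrame({"Pet_Type": ["Cat", "Dog", "Bird"], "Age": [1, 2, 3]})
test_raw = pd.DataFrame({"Pet_Type": ["Dog", "Other"], "Age": [4, 5]})

train_X = pd.get_dummies(train_raw, columns=["Pet_Type"])
test_X = pd.get_dummies(test_raw, columns=["Pet_Type"])

# Same columns in the same order as training: dummies missing from the test
# set are added as 0, and test-only dummies (here Pet_Type_Other) are dropped
test_X = test_X.reindex(columns=train_X.columns, fill_value=0)
print(test_X.astype(int))
```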


### 2.4 <a name="24">Model training</a> 
(<a href="#2">Go to Train a model</a>)


In [25]:
# Implement here
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, f1_score

# Impute missing values with the mean, scale each feature to [0, 1], then classify with 3-NN
classifier = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', MinMaxScaler()),
    ('estimator', KNeighborsClassifier(n_neighbors = 3))
])

classifier.fit(X_train, y_train)

# Use the fitted model to make predictions on the train dataset.
# As the training data passes through the pipeline, it is first imputed (with the
# training means), then scaled (with the training min/max), and finally classified.
# These scores are optimistic because each training row is typically one of its own
# nearest neighbors; tune parameters on the held-out validation split (test_data).
train_predictions = classifier.predict(X_train)

print('Model performance on the train set:')
print(confusion_matrix(y_train, train_predictions))
print(classification_report(y_train, train_predictions))
print("Train accuracy:", accuracy_score(y_train, train_predictions))

Model performance on the train set:
[[53066  1669]
 [ 2786  2755]]
              precision    recall  f1-score   support

           0       0.95      0.97      0.96     54735
           1       0.62      0.50      0.55      5541

    accuracy                           0.93     60276
   macro avg       0.79      0.73      0.76     60276
weighted avg       0.92      0.93      0.92     60276

Train accuracy: 0.9260899860641051
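The scores above are measured on the same rows the model was fit on, so they say little about unseen data. The sketch below chooses `n_neighbors` on a held-out validation split instead, scored by the F1 of the minority class. It uses synthetic data from `make_classification` as a stand-in for the encoded features, and the candidate values of k are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the encoded features, with about 10% positives
X, y = make_classification(n_samples=2000, n_features=8, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                            stratify=y, random_state=23)

scores = {}
for k in [1, 3, 5, 11, 21]:
    model = Pipeline([
        ("scaler", MinMaxScaler()),
        ("estimator", KNeighborsClassifier(n_neighbors=k)),
    ])
    model.fit(X_tr, y_tr)
    # F1 of the minority class (label 1) on rows the model has not seen
    scores[k] = f1_score(y_val, model.predict(X_val))

best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

F1 of class 1 is used here because, with roughly 90% of rows in class 0, plain accuracy stays high even for a model that rarely predicts 1.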


## 3. <a name="3">Make predictions on the test dataset</a> (Implement)
(<a href="#0">Go to top</a>)

Use the test set to make predictions with the trained model.
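One way to guarantee that the test set is encoded exactly like the training set is to put the encoding inside the pipeline. The sketch below swaps the manual `get_dummies` step for scikit-learn's `ColumnTransformer` with `OneHotEncoder(handle_unknown="ignore")`, so categories unseen in training (such as an `Other` pet type) become all-zero columns instead of errors. This is an alternative to the notebook's approach, shown on a tiny made-up frame.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

categorical = ["Pet_Type", "Intake_Condition"]
numeric = ["Age_upon_Intake Days"]

train = pd.DataFrame({
    "Pet_Type": ["Cat", "Dog", "Cat", "Dog", "Bird", "Cat"],
    "Intake_Condition": ["Normal", "Sick", "Normal", "Injured", "Normal", "Other"],
    "Age_upon_Intake Days": [30, 365, 60, 1825, 90, 28],
    "Time_at_Center": [1, 0, 1, 0, 0, 1],
})
test = pd.DataFrame({
    "Pet_Type": ["Other", "Dog"],          # "Other" never appears in training
    "Intake_Condition": ["Normal", "Sick"],
    "Age_upon_Intake Days": [730, 400],
})

preprocess = ColumnTransformer([
    # handle_unknown="ignore" encodes unseen categories as all zeros
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("scale", MinMaxScaler(), numeric),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("estimator", KNeighborsClassifier(n_neighbors=3)),
])

# The encoder and scaler are fit on the training rows only
model.fit(train[categorical + numeric], train["Time_at_Center"])
test_predictions = model.predict(test[categorical + numeric])
print(test_predictions)
```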

In [26]:
# Implement here

# test_predictions = ...

In [27]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings("ignore")
  
te = pd.read_csv('test_features.csv')
te.columns = te.columns.str.strip()
te

|  | Pet ID | Outcome_Type | Sex_upon_Outcome | Name | Found_Location | Intake_Type | Intake_Condition | Pet_Type | Sex_upon_Intake | Breed | Color | Age_upon_Intake Days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A782657 | Adoption | Spayed Female |  | 1911 Dear Run Drive in Austin (TX) | Stray | Normal | Dog | Intact Female | Labrador Retriever Mix | Black | 60 |
| 1 | A804622 | Adoption | Neutered Male |  | 702 Grand Canyon in Austin (TX) | Stray | Normal | Dog | Intact Male | Boxer/Anatol Shepherd | Brown/Tricolor | 60 |
| 2 | A786693 | Return to Owner | Neutered Male | Zeus | Austin (TX) | Public Assist | Normal | Dog | Neutered Male | Australian Cattle Dog/Pit Bull | Black/White | 3285 |
| 3 | A693330 | Adoption | Spayed Female | Hope | Levander Loop & Airport Blvd in Austin (TX) | Stray | Normal | Dog | Intact Female | Miniature Poodle | Gray | 1825 |
| 4 | A812431 | Adoption | Neutered Male |  | Austin (TX) | Owner Surrender | Injured | Cat | Intact Male | Domestic Shorthair | Blue/White | 210 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 23841 | A706720 | Adoption | Neutered Male | Nikko | Mc Callen Pass And Parmer in Austin (TX) | Stray | Normal | Dog | Neutered Male | Miniature Schnauzer Mix | Tan/Gray | 1460 |
| 23842 | A782751 | Adoption | Neutered Male |  | 18706 Blake Manor Rd in Manor (TX) | Stray | Normal | Dog | Intact Male | American Pit Bull Terrier Mix | Brown | 60 |
| 23843 | A768058 | Euthanasia | Unknown |  | 1701 Congress Avenue in Austin (TX) | Wildlife | Normal | Other | Unknown | Bat Mix | Black/Black | 730 |
| 23844 | A729326 | Adoption | Neutered Male | *Jester | 5017 W. 290 in Austin (TX) | Stray | Normal | Dog | Intact Male | Pointer Mix | Black/White | 730 |


In [30]:
# sex_upon_intake_dummy = sex_upon_intake_dummy.rename(columns={
#  'Intact Male':'Intake Intact Male',
#  'Intact Female':'Intake Intact Female',
#  'Neutered Female':'Intake Neutered Female',
#  'Spayed Female':'Intake Spayed Female'
# })

In [32]:
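import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings("ignore")

te = pd.read_csv('test_features.csv')
te.columns = te.columns.str.strip()


# te.drop(['Pet ID'], axis=1, inplace=True) #
# te.drop(['Name'], axis=1, inplace=True)
# te.drop(['Found_Location'], axis=1, inplace=True)

# Unlike the training data, no test rows are dropped: every test row needs a prediction.
# # te['Pet_Type'].dropna(how='any', inplace=True)
# te.drop(te[te['Pet_Type'] == 'Other'].index, inplace=True)
# tr['Pet Type'].value_counts()
columns = ['Pet_Type','Outcome_Type','Intake_Condition',
           'Intake_Type','Sex_upon_Intake']
for term in columns:
    dummies = pd.get_dummies(te[term])
    if term == 'Pet_Type':
        # Training rows with Pet_Type == 'Other' were dropped, so the model has no
        # pet-type 'Other' feature. Dropping that dummy here avoids the name clash
        # with the Intake_Condition 'Other' column (the ValueError "columns overlap
        # but no suffix specified"); such pets get 0 in every pet-type column.
        dummies = dummies.drop(columns='Other', errors='ignore')
    te = te.join(dummies)
    te = te.drop([term], axis=1)
te.columns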

In [None]:
# te = te.reindex(columns=['Outcome_Type', 'Sex_upon_Outcome','Intake_Condition',
#            'Intake_Type','Sex_upon_Intake','Age_upon_Intake Days'])

# Select the feature columns used for X_train, in the same order (te stays a DataFrame).
# This raises a KeyError if a dummy column is missing from the test set;
# the reindex version in the next cell fills such columns with 0 instead.
X_test = te[['Age_upon_Intake Days', 'Bird', 'Cat',
        'Dog', 'Livestock', 'Adoption', 'Died',
       'Disposal', 'Euthanasia', 'Missing', 'Relocate', 'Return to Owner',
       'Rto-Adopt', 'Transfer', 'Aged', 'Behavior', 'Feral', 'Injured',
       'Medical', 'Normal', 'Nursing', 'Other', 'Pregnant', 'Sick',
       'Abandoned', 'Euthanasia Request', 'Owner Surrender', 'Public Assist',
       'Stray', 'Wildlife', 'Intact Female', 'Intact Male', 'Neutered Male',
       'Spayed Female', 'Unknown']].values
test_predictions = classifier.predict(X_test)

In [None]:
# Should equal the number of rows in test_features.csv
len(test_predictions)

In [None]:
# Keep exactly the training feature columns, in training order; any dummy column
# that does not occur in the test set is added and filled with 0
te = te.reindex(columns=['Age_upon_Intake Days', 'Bird', 'Cat',
                      'Dog', 'Livestock', 'Adoption', 'Died',
       'Disposal', 'Euthanasia', 'Missing', 'Relocate', 'Return to Owner',
       'Rto-Adopt', 'Transfer', 'Aged', 'Behavior', 'Feral', 'Injured',
       'Medical', 'Normal', 'Nursing', 'Other', 'Pregnant', 'Sick',
       'Abandoned', 'Euthanasia Request', 'Owner Surrender', 'Public Assist',
       'Stray', 'Wildlife', 'Intact Female', 'Intact Male', 'Neutered Male',
       'Spayed Female', 'Unknown'], fill_value=0)
X_test = te.values
test_predictions = classifier.predict(X_test)
test_predictions

In [None]:
print(len(test_predictions))

In [None]:
# Ratio of predicted 1s (more than 30 days) to predicted 0s (less than 30 days)
test_predictions = list(test_predictions)
#print(test_predictions)
num_1 = test_predictions.count(1)
num_0 = test_predictions.count(0)
print(num_1/num_0)

In [None]:
print(test_predictions)

In [None]:
# temp = te['Outcome_Type'].value_counts().sort_index()
# target = []
# index = []
# num0 = 0
# for term in temp.index:
#     target.append(term)
#     index.append(num0)
#     num0 += 1
# #print(target, index)

# te.replace(
#     target,
#     index,
#     inplace = True
# )
# te
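The ratio printed earlier compares predicted 1s to predicted 0s. A slightly more direct sanity check is to compare the fraction of predicted 1s with the fraction of 1s in the training labels (about 9% in the training split, per the classification report in section 2.4); a large gap can hint at a preprocessing mismatch between training and test. The helper `positive_rate` and the example label lists below are hypothetical stand-ins for `y_train` and the test predictions.

```python
import numpy as np

def positive_rate(labels) -> float:
    """Fraction of labels equal to 1."""
    labels = np.asarray(labels)
    return float((labels == 1).mean())

# Hypothetical stand-ins for y_train and the test predictions
y_train_example = [0] * 90 + [1] * 10
predictions_example = np.array([0] * 95 + [1] * 5)

print("train positive rate:", positive_rate(y_train_example))
print("predicted positive rate:", positive_rate(predictions_example))
```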