# ML Review (Cont'd)

### Today we're going to spend time with a few different ML algorithms. We will implement a few of them and discuss some others, to get a feel for how the different variations work, what each algorithm requires, and which approach you may want to try when working with them.

## Supervised Learning

### Linear Regression
- Brief refresher on OLS regression and the basics of trying to fit a line to a series of points.
- implementation of the linear regression with SKLearn
- Ridge and Lasso regression

### Logistic Regression
- We'll talk about the difference between this and linear regression
- Discuss how this is a classifier as opposed to a regressor.
- implementation of logistic regression with SKLearn
- I'll point you towards some sources for model scoring for classification problems

### K Nearest Neighbors
- We'll talk about what KNN does in space and why it is important to scale data before passing it into KNN
- There is an interesting thing about KNN when it comes to fitting.

### Tree-Based Methods

#### Decision Trees

- How to think about a decision tree
- What is a weak learner?
- How do these help us as part of a model?

#### Bootstrapped / Aggregated Trees / Random Forest

- How can we use many different decision trees together to help us get better information?
- What is bootstrapping and how can it be used alongside tree-based models?
- How can we put all of this together to end up with a random forest of different sized trees/stumps?
- Feature Importance
- Brief aside on Boosted trees (XGBoost specifically)

### Support Vector Machines

- We'll talk about how SVMs try to weave a surface between different classes.

## Unsupervised Learning

### Clustering

- There are several types of clustering that are available
- K Means clustering is a very good place to start
- Will briefly talk about DPGMM
- Brief run-through of other types of clustering and how they work.

## Hyperparameter Tuning

- We will quickly discuss how grid search explores a space of possible hyperparameters and returns the best-scoring models.

## AutoML Demo
- Want to briefly demo how AutoML can be set up very quickly to run once you have cleaned up data.

In [1]:
### Bringing in our data + basic cleaning
# import tensorflow as tf
# import keras
import pandas as pd
import sklearn as sk
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

#Reading Data
df = pd.read_csv('fifa_player_data.csv')

#Dropping some problematic columns
df.pop('weakfoot')
df.pop('work')
df.pop('version')
df.set_index('player_num', inplace = True)

def cm_getter(height):
    return(float((str(height)).split('cm')[0]))

df.height = df.height.apply(cm_getter)

nonames = df.drop(['PKey', 'name'], axis = 1)

dummies = pd.get_dummies(nonames)

In [2]:
nonames

player_num,rating,position,price,skills,PACE,SHOT,PASS,DRIBBLE,DEFENSE,PHYSICAL,...,stand_tkl,stamina,strength,vision,volleys,finishing,composure,team,country,league
0,96,ST,6900000,5,97,95,81,95,45,76,...,44,81,85,81,96,98,89,Icons,Brazil,Icons
1,98,CAM,4800000,5,95,96,93,96,60,76,...,53,86,76,97,95,98,98,Icons,Brazil,Icons
2,90,CM,3500000,4,88,88,88,87,80,87,...,80,92,86,87,85,87,88,Icons,Holland,Icons
3,94,ST,3280000,5,90,93,81,90,35,79,...,31,88,79,82,87,94,95,Juventus,Portugal,Serie A TIM
4,95,CF,3100000,5,96,93,90,95,56,75,...,49,87,74,93,93,95,95,Icons,Brazil,Icons
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2755,76,CM,400,4,61,74,75,75,65,72,...,75,72,74,76,65,69,72,Philadelphia Union,Bosnia and Herzegovina,Major League Soccer
2756,75,CAM,400,4,70,71,73,77,61,63,...,61,69,56,73,62,68,68,Boavista FC,Portugal,Liga NOS
2757,75,LM,400,3,81,68,71,80,27,51,...,30,59,53,71,58,68,68,SV Zulte-Waregem,Belgium,Belgium Pro League
2758,75,RW,400,4,82,72,68,80,26,66,...,21,71,70,70,66,69,71,Heracles Almelo,Holland,Eredivisie


In [3]:
dummies

player_num,rating,price,skills,PACE,SHOT,PASS,DRIBBLE,DEFENSE,PHYSICAL,height,...,league_SAF,league_Saudi Professional League,league_Scottish Premiership,league_Serie A TIM,league_South African FL,league_Superliga,league_Süper Lig,league_Ukrayina Liha,league_Österreichische Fußball-Bundesliga,league_Česká Liga
0,96,6900000,5,97,95,81,95,45,76,183.0,...,0,0,0,0,0,0,0,0,0,0
1,98,4800000,5,95,96,93,96,60,76,173.0,...,0,0,0,0,0,0,0,0,0,0
2,90,3500000,4,88,88,88,87,80,87,191.0,...,0,0,0,0,0,0,0,0,0,0
3,94,3280000,5,90,93,81,90,35,79,187.0,...,0,0,0,1,0,0,0,0,0,0
4,95,3100000,5,96,93,90,95,56,75,173.0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2755,76,400,4,61,74,75,75,65,72,188.0,...,0,0,0,0,0,0,0,0,0,0
2756,75,400,4,70,71,73,77,61,63,171.0,...,0,0,0,0,0,0,0,0,0,0
2757,75,400,3,81,68,71,80,27,51,176.0,...,0,0,0,0,0,0,0,0,0,0
2758,75,400,4,82,72,68,80,26,66,181.0,...,0,0,0,0,0,0,0,0,0,0


In [4]:
y = dummies.rating.values.reshape(-1, 1)
X = dummies.drop('rating', axis = 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .3, train_size = .7)

X_scaler = StandardScaler().fit(X_train)
y_scaler = StandardScaler().fit(y_train)

X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
y_train_scaled = y_scaler.transform(y_train)
y_test_scaled = y_scaler.transform(y_test)

## Linear Regression

- Linear regression fits a line (or, with several features, a hyperplane) that minimizes the squared error between the observed points and the fitted values.
- Despite the broad availability of other algorithms and applied ML methods, you will still see linear regression used regularly because it is easy to explain and performs well in certain use cases.
- Regularization (Ridge and Lasso) penalizes large coefficients, which helps to prevent overfitting.
- https://medium.com/all-about-ml/lasso-and-ridge-regularization-a0df473386d5
- https://towardsdatascience.com/understanding-the-ols-method-for-simple-linear-regression-e0a4e8f692cc
- https://www.datacamp.com/community/tutorials/tutorial-ridge-lasso-elastic-net
- https://www.analyticsvidhya.com/blog/2016/01/ridge-lasso-regression-python-complete-tutorial/
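
To make the regularization bullet more concrete, here is a minimal sketch comparing plain OLS with Ridge and Lasso. It uses synthetic data from `make_regression` rather than the FIFA dataset, and the `alpha` values are arbitrary choices for illustration, not tuned recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data (an assumption for illustration): 10 features, only 3 informative.
X_demo, y_demo = make_regression(n_samples=200, n_features=10, n_informative=3,
                                 noise=5.0, random_state=0)

ols = LinearRegression().fit(X_demo, y_demo)
ridge = Ridge(alpha=10.0).fit(X_demo, y_demo)   # L2 penalty: shrinks coefficients toward 0
lasso = Lasso(alpha=1.0).fit(X_demo, y_demo)    # L1 penalty: can set coefficients to exactly 0

for name, m in [("OLS", ols), ("Ridge", ridge), ("Lasso", lasso)]:
    n_zero = int(np.sum(m.coef_ == 0))
    print(f"{name:5s} | L2 norm of coefs: {np.linalg.norm(m.coef_):8.2f} | exact zeros: {n_zero}")
```

Ridge tends to keep every feature with smaller weights, while Lasso can drop features entirely, which is why Lasso is sometimes used as a rough feature selector.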

In [5]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model

LinearRegression()

In [6]:
model.fit(X_train[['SHOT', 'PASS', 'DRIBBLE']], y_train)

LinearRegression()

In [7]:
print('Weight coefficients: ', model.coef_)
print('y-axis intercept: ', model.intercept_) 

Weight coefficients:  [[0.00576194 0.14775048 0.09241527]]
y-axis intercept:  [62.09415945]


In [8]:
predictions = model.predict(X_test[['SHOT', 'PASS', 'DRIBBLE']])

In [9]:
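# Mean signed error: positive and negative errors cancel, so this measures bias rather than typical error size
np.mean(predictions - y_test)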

0.08363958937813334

In [10]:
display(predictions[:10])
display(y_test[:10])
display((predictions - y_test)[:10])

array([[80.54832119],
       [79.20669753],
       [81.15529412],
       [78.5763314 ],
       [74.94622909],
       [80.99601977],
       [80.70148815],
       [79.27931855],
       [77.01771088],
       [81.43253993]])

array([[82],
       [75],
       [82],
       [76],
       [75],
       [79],
       [78],
       [79],
       [75],
       [78]])

array([[-1.45167881],
       [ 4.20669753],
       [-0.84470588],
       [ 2.5763314 ],
       [-0.05377091],
       [ 1.99601977],
       [ 2.70148815],
       [ 0.27931855],
       [ 2.01771088],
       [ 3.43253993]])
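
Because signed errors cancel each other out, a mean error near zero does not tell us how far off a typical prediction is. The sketch below computes mean absolute error (MAE) and root mean squared error (RMSE) for the ten predictions printed above, rounded to two decimals. The variable names are introduced here for illustration only.

```python
import numpy as np

# First ten predictions and true ratings from the output above (rounded to 2 decimals).
pred_sample = np.array([80.55, 79.21, 81.16, 78.58, 74.95, 81.00, 80.70, 79.28, 77.02, 81.43])
true_sample = np.array([82, 75, 82, 76, 75, 79, 78, 79, 75, 78], dtype=float)

errors = pred_sample - true_sample
mean_error = errors.mean()              # signed errors can cancel: measures bias
mae = np.abs(errors).mean()             # mean absolute error: typical size of a miss
rmse = np.sqrt((errors ** 2).mean())    # root mean squared error: penalizes large misses more

print(f"mean error: {mean_error:.3f}, MAE: {mae:.3f}, RMSE: {rmse:.3f}")
```

On the full test set, `sklearn.metrics` offers `mean_absolute_error`, `mean_squared_error`, and `r2_score` for the same purpose.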

### Logistic Regression
- This is similar in form to linear regression, but we now use it for classification. One fairly large change makes that possible: the linear combination of features is passed through the sigmoid function, which squashes it into a value between 0 and 1.
- We use the sigmoid because we want the probability that an observation belongs to one class rather than the other (multiclass problems extend the same idea).
- Any algorithm that outputs a class is called a classifier, as opposed to a regressor, which outputs a number.
- https://towardsdatascience.com/introduction-to-logistic-regression-66248243c148
- https://towardsdatascience.com/common-classification-model-evaluation-metrics-2ba0a7a7436e#:~:text=The%20F1%20score%20is%20calculated,good%20recall%20and%20precision%20values.
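
As a rough sketch of the sigmoid step described above: the linear part looks just like linear regression, and the sigmoid turns it into a probability that is then thresholded into a class. The weights, intercept, and input rows below are hypothetical values picked for illustration, not coefficients from a fitted model.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and intercept for a 2-feature model (illustrative values only).
w = np.array([1.5, -2.0])
b = 0.25

x_rows = np.array([[2.0, 0.5], [0.0, 1.0]])   # two example rows
z = x_rows @ w + b                             # linear part, same form as linear regression
proba = sigmoid(z)                             # estimated probability of class 1
labels = (proba >= 0.5).astype(int)            # threshold at 0.5 to get a class

print(proba, labels)
```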


In [11]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(max_iter = 100)

In [12]:
y = dummies['league_Serie A TIM'].values.reshape(-1, 1)
X = dummies.drop(['league_Serie A TIM'], axis = 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .3, train_size = .7)

# Scale the features only. The target is a 0/1 class label, so it is not standardized.
X_scaler = StandardScaler().fit(X_train)

X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)

In [13]:
clf.fit(X_train, y_train)

LogisticRegression()

In [14]:
clf.score(X_test, y_test)

0.8611111111111112

In [15]:
preds = clf.predict(X_test)

display(preds[:5])
display(y_test[:5])

array([0, 0, 0, 0, 0], dtype=uint8)

array([[0],
       [0],
       [0],
       [0],
       [0]], dtype=uint8)
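
Accuracy alone can be misleading when one class is rare, and a single league out of many may well be such a case (all five predictions shown above are 0). The sketch below uses made-up labels, not the notebook's data, to show how a model that always predicts 0 can still score high accuracy while having zero recall.

```python
import numpy as np

# Made-up labels for illustration: 2 positives out of 10, and a model that always predicts 0.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_hat = np.zeros_like(y_true)

tp = int(np.sum((y_hat == 1) & (y_true == 1)))   # true positives
fp = int(np.sum((y_hat == 1) & (y_true == 0)))   # false positives
fn = int(np.sum((y_hat == 0) & (y_true == 1)))   # false negatives
tn = int(np.sum((y_hat == 0) & (y_true == 0)))   # true negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For real models, `sklearn.metrics` provides `confusion_matrix` and `classification_report`, which report the same quantities per class.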

### K Nearest Neighbors Classification
- K Nearest Neighbors classifies a data point using the points closest to it. It finds the K nearest neighbors (K is chosen by the user) and outputs the class that is most common among them.
- The funny thing about KNN is that there is no real training phase: "fit" essentially just stores the training data (scikit-learn may also build a search index), and the distance calculations happen at prediction time.
- KNN is also interesting in that you can use different distance metrics (for example Euclidean or Manhattan) to compute it.
- https://towardsdatascience.com/importance-of-distance-metrics-in-machine-learning-modelling-e51395ffe60d
- https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761
- Scoring metrics for classification problems
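
To illustrate both points (prediction does the real work, and scaling matters), here is a small from-scratch sketch. It is not how scikit-learn implements KNN internally. The toy data and the helper `knn_predict` are made up for this example.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # The training data was only stored; the distance work happens here, at prediction time.
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy data (made up): feature 0 is on a small scale, feature 1 on a large scale.
X_toy = np.array([[1.0, 1000.0], [1.2, 5000.0], [9.0, 1100.0], [9.5, 5100.0]])
y_toy = np.array([0, 0, 1, 1])          # the class depends only on feature 0
x_new = np.array([8.8, 4900.0])

# Standardize with the training mean and standard deviation.
mu, sigma = X_toy.mean(axis=0), X_toy.std(axis=0)
X_toy_scaled = (X_toy - mu) / sigma
x_new_scaled = (x_new - mu) / sigma

pred_raw = knn_predict(X_toy, y_toy, x_new, k=1)
pred_scaled = knn_predict(X_toy_scaled, y_toy, x_new_scaled, k=1)
print("unscaled:", pred_raw, "scaled:", pred_scaled)
```

Without scaling, the large-scale feature dominates the distance and the nearest neighbor comes from the wrong class. After standardizing, both features contribute comparably.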

In [16]:
y = dummies['position_CB'].values.reshape(-1, 1)
X = dummies.drop(['position_CB'], axis = 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .3, train_size = .7)

# Scale the features only. The target is a 0/1 class label, so it is not standardized.
X_scaler = StandardScaler().fit(X_train)

X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)

In [17]:
from sklearn.neighbors import KNeighborsClassifier
train_scores = []
test_scores = []
for k in range(1, 20, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train_scaled, y_train)
    train_score = knn.score(X_train_scaled, y_train)
    test_score = knn.score(X_test_scaled, y_test)
    train_scores.append(train_score)
    test_scores.append(test_score)
    print(f"k: {k}, Train/Test Score: {train_score:.3f}/{test_score:.3f}")


k: 1, Train/Test Score: 1.000/0.880
k: 3, Train/Test Score: 0.943/0.873
k: 5, Train/Test Score: 0.913/0.867
k: 7, Train/Test Score: 0.898/0.871
k: 9, Train/Test Score: 0.889/0.873
k: 11, Train/Test Score: 0.887/0.871
k: 13, Train/Test Score: 0.896/0.882
k: 15, Train/Test Score: 0.903/0.888
k: 17, Train/Test Score: 0.909/0.893
k: 19, Train/Test Score: 0.917/0.905


## Tree-based methods (Random Forest)

- First, let's take a look at what decision trees are...
- https://medium.com/greyatom/decision-trees-a-simple-way-to-visualize-a-decision-dc506a403aeb
- The tree model that we just took a look at would make up only a small part of any tree-based model that we'd be likely to use out in the field.
- We can think about an individual tree in this example as a weak learner.
- We will also want to think about bootstrapping trees. What is the idea of bootstrapping in statistics?
- https://machinelearningmastery.com/bagging-and-random-forest-ensemble-algorithms-for-machine-learning/
- https://machinelearningmastery.com/a-gentle-introduction-to-the-bootstrap-method/
- We will also talk about aggregating trees: the idea of combining the predictions of several trees (by vote or by average) to get a more stable result than any single tree, as in a random forest, and to better understand how individual components affect the outcome.
- Aggregated tree models such as random forest also give us feature importance, a measure of how much each feature contributes to the model's splits.
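
As a quick sketch of the bootstrapping idea: each tree in a bagged ensemble is trained on a sample of rows drawn with replacement from the training set. The row count and seed below are arbitrary. For large samples, roughly 1/e (about 37%) of rows are expected to be left out of any one bootstrap sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 1000
row_ids = np.arange(n_rows)

# One bootstrap sample: draw n_rows indices *with replacement*.
boot_ids = rng.choice(row_ids, size=n_rows, replace=True)

# Rows never drawn are "out-of-bag" (OOB) for this sample.
oob_mask = ~np.isin(row_ids, boot_ids)
oob_fraction = oob_mask.mean()
print(f"unique rows in sample: {np.unique(boot_ids).size}, OOB fraction: {oob_fraction:.3f}")

# Bagging in outline: fit one tree per bootstrap sample, then aggregate
# (majority vote for classification, mean for regression).
```

A random forest adds one more source of randomness on top of this: each split considers only a random subset of the features.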

In [18]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

df_encoded = df.apply(le.fit_transform)

In [19]:
df_encoded

player_num,name,rating,position,price,skills,PACE,SHOT,PASS,DRIBBLE,DEFENSE,...,stamina,strength,vision,volleys,finishing,composure,team,country,league,PKey
0,1603,21,15,547,4,67,74,44,54,24,...,59,57,63,86,88,53,120,11,12,2238
1,1488,23,0,545,4,65,75,56,55,39,...,64,48,78,85,88,62,120,11,12,2066
2,1623,15,4,543,3,58,67,51,46,59,...,70,58,69,75,77,52,120,43,12,2265
3,339,19,15,541,4,60,72,44,49,14,...,66,51,64,77,84,59,127,68,29,464
4,1488,20,3,539,4,66,72,53,54,35,...,65,46,75,83,85,59,120,11,12,2065
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2755,711,1,4,0,3,31,53,38,34,44,...,50,46,58,55,59,36,176,10,22,986
2756,617,0,0,0,3,40,50,36,36,40,...,47,28,55,52,58,32,37,68,20,855
2757,1781,0,8,0,2,51,47,34,39,6,...,37,25,53,48,58,32,208,8,0,2485
2758,240,0,13,0,3,52,51,31,39,5,...,49,42,52,56,59,35,116,43,9,332


In [20]:
df.position.value_counts()

CB     486
ST     422
CM     368
CDM    223
GK     204
CAM    196
RB     163
LB     160
LM     151
RM     148
LW      90
RW      80
CF      48
LWB     11
RWB      9
LF       1
Name: position, dtype: int64

In [21]:
y = df_encoded['position'].values.reshape(-1, 1)
X = df_encoded.drop(['position'], axis = 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .3, train_size = .7)

In [22]:
from sklearn.ensemble import RandomForestClassifier
# Only the last assignment takes effect, so a single configuration is kept here
# (an earlier variant used n_estimators=300, max_depth=10).
rf = RandomForestClassifier(n_estimators = 100, max_depth=20, warm_start=True)

rf_res = rf.fit(X_train, y_train)
rf.score(X_test, y_test)

0.7934782608695652

In [23]:
display(rf.predict(X_test)[:10])
display(y_test[:10])

array([ 6,  2,  5,  6, 15, 15, 15, 11,  5,  2])

array([[ 6],
       [ 2],
       [ 5],
       [ 6],
       [15],
       [15],
       [ 3],
       [11],
       [ 5],
       [ 1]])

In [24]:
df_encoded.merge(df, left_index = True, right_index = True)[['position_y', 'position_x']].head(30)

player_num,position_y,position_x
0,ST,15
1,CAM,0
2,CM,4
3,ST,15
4,CF,3
5,LW,9
6,CM,4
7,CF,3
8,CAM,0
9,CAM,0
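
Feature importance can be read directly off a fitted forest through its `feature_importances_` attribute. The sketch below uses synthetic data and hypothetical column names (`f0` to `f7`) rather than the FIFA columns, so the numbers only illustrate the mechanics.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumption): with shuffle=False the 3 informative columns come first.
X_fi, y_fi = make_classification(n_samples=500, n_features=8, n_informative=3,
                                 n_redundant=0, shuffle=False, random_state=0)
feature_names = [f"f{i}" for i in range(X_fi.shape[1])]   # hypothetical column names

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_fi, y_fi)

# Impurity-based importances, normalized so that they sum to 1.
importances = forest.feature_importances_
ranked = sorted(zip(feature_names, importances), key=lambda t: t[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

For the model fitted above, the same idea would pair `rf.feature_importances_` with `X_train.columns`. Impurity-based importances can favor features with many distinct values, so `sklearn.inspection.permutation_importance` is worth considering as a cross-check.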


### XGBoost is a gradient-boosted tree algorithm that has seen a lot of use over the past few years and has often appeared in top-ranked Kaggle solutions.
- https://towardsdatascience.com/how-does-xgboost-work-748bc75c58aa
- It is important to note that XGBoost can be prone to overfitting, but with the correct hyperparameters it is an extremely useful algorithm.
- You can learn more about XGBoost, installation, and implementation rules at https://xgboost.readthedocs.io/en/latest/
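
XGBoost is a separate package, so the sketch below swaps in scikit-learn's `GradientBoostingClassifier` instead. It belongs to the same family (trees fitted one after another, each correcting the errors of the ensemble so far), but it is not XGBoost itself. The data is synthetic and the hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X_gb, y_gb = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=0)
Xg_train, Xg_test, yg_train, yg_test = train_test_split(X_gb, y_gb, test_size=0.3, random_state=0)

# Shallow trees, a small learning rate, and a capped number of rounds are
# common levers for limiting overfitting in boosted models.
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=2, random_state=0)
gb.fit(Xg_train, yg_train)

gb_train_acc = gb.score(Xg_train, yg_train)
gb_test_acc = gb.score(Xg_test, yg_test)
print(f"train accuracy: {gb_train_acc:.3f}, test accuracy: {gb_test_acc:.3f}")
```

A large gap between train and test accuracy is the usual sign that the boosted model is overfitting and the hyperparameters need tightening.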

### SVM
- https://en.wikipedia.org/wiki/Support_vector_machine
- https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47
- https://scikit-learn.org/stable/modules/svm.html
- An SVM with a kernel implicitly maps the data into a higher-dimensional space, which can help the algorithm tease out patterns in nonlinear data. We will talk a little bit about an example where a small cluster of points sits inside a ring of points.
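
The cluster-inside-a-ring example can be sketched with scikit-learn's `make_circles`. The dataset parameters are arbitrary choices for illustration. A linear kernel cannot separate the two classes with a straight line, while an RBF kernel can.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: an inner cluster (one class) surrounded by an outer ring (the other class).
X_c, y_c = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_c, y_c, test_size=0.3, random_state=0)

linear_acc = SVC(kernel="linear").fit(Xc_train, yc_train).score(Xc_test, yc_test)
rbf_acc = SVC(kernel="rbf").fit(Xc_train, yc_train).score(Xc_test, yc_test)

print(f"linear kernel accuracy: {linear_acc:.3f}")
print(f"RBF kernel accuracy:    {rbf_acc:.3f}")
```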


## Unsupervised learning
- We are going to take a look at clustering algorithms and discuss a little more about how they work.
- Although I'm not going to be implementing one live, they follow the model > fit > predict workflow that we are familiar with.
- https://machinelearningmastery.com/clustering-algorithms-with-python/
- https://hdbscan.readthedocs.io/en/latest/comparing_clustering_algorithms.htm
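
For reference, here is what the model > fit > predict workflow might look like for K Means. This is a sketch on synthetic blobs, with the number of clusters assumed to be known in advance, which is rarely the case on real data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data: three blobs in two dimensions. No labels are used for fitting.
X_blobs, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)
X_blobs_scaled = StandardScaler().fit_transform(X_blobs)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)   # model
kmeans.fit(X_blobs_scaled)                                  # fit
cluster_ids = kmeans.predict(X_blobs_scaled)                # predict

print("cluster sizes:", [int((cluster_ids == c).sum()) for c in range(3)])
print("inertia (within-cluster sum of squares):", round(kmeans.inertia_, 2))
```

Like KNN, K Means is distance-based, so scaling the features first is usually a good idea.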

### Grid Search
- Grid search allows us to search the hyperparameter space so that we can find the hyperparameters that produce the best model.
- We will take a look at an example of grid search and discuss what it does in a little more detail.
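
A minimal grid search sketch with scikit-learn's `GridSearchCV`, assuming a KNN classifier inside a scaling pipeline and synthetic data. The grid values are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_gs, y_gs = make_classification(n_samples=300, n_features=6, n_informative=4, random_state=0)

# Putting the scaler in the pipeline means it is re-fit on each training fold only.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {
    "knn__n_neighbors": [3, 7, 11],
    "knn__weights": ["uniform", "distance"],
}

# Every combination in the grid (3 x 2 = 6) is scored with 5-fold cross-validation.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_gs, y_gs)

print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

After fitting, `search.cv_results_` holds the scores for every combination, and `search.best_estimator_` is the pipeline refit on all of the data with the best parameters.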

### Beyond Grid Search
- The space beyond grid search is the AutoML space. This is what I have described in the past as the "big red button" that you can press: you present cleaned data (and in some cases you don't even need to clean the data) and the tool builds and compares models for you.
- This is where I would caution again against chalking everything up to a black-box model that you have no insight into. In most cases you will want to understand how to interpret these models.

### H2O AutoML Demo

In [None]:
#UNCOMMENT THIS SECTION (Didn't want to share system info)
#import h2o
#from h2o.automl import H2OAutoML
#h2o.init();

In [26]:
aml = H2OAutoML(max_models = 10, seed = 1)

In [27]:
X_train

player_num,name,rating,price,skills,PACE,SHOT,PASS,DRIBBLE,DEFENSE,PHYSICAL,...,stamina,strength,vision,volleys,finishing,composure,team,country,league,PKey
2412,1702,2,1,0,47,52,30,38,20,33,...,9,41,21,3,6,24,89,43,9,2375
246,1169,11,356,3,56,67,47,46,25,23,...,47,40,70,80,78,47,39,38,1,1627
220,227,14,369,1,36,39,44,34,70,41,...,66,54,63,51,38,58,120,31,12,315
833,220,5,63,3,59,52,37,44,23,26,...,53,36,57,66,65,37,165,13,21,305
552,1517,8,153,4,60,62,41,42,11,25,...,61,33,63,59,76,42,216,43,16,2104
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1194,157,5,9,2,48,48,39,35,55,31,...,63,40,48,59,46,34,106,68,16,222
36,1072,16,511,2,57,63,49,36,68,40,...,68,47,72,67,67,53,120,38,12,1489
411,599,11,245,4,49,56,46,48,10,16,...,38,33,68,71,67,49,86,34,1,826
403,726,9,237,3,56,64,39,44,21,26,...,63,35,62,65,75,45,241,53,24,1010


In [28]:
X_test

player_num,name,rating,price,skills,PACE,SHOT,PASS,DRIBBLE,DEFENSE,PHYSICAL,...,stamina,strength,vision,volleys,finishing,composure,team,country,league,PKey
1292,150,0,7,2,55,13,25,34,50,26,...,56,37,33,23,11,32,5,49,21,211
2562,371,2,0,2,25,40,32,29,54,38,...,57,50,53,38,45,41,71,19,16,504
1873,468,1,3,0,47,53,35,34,12,32,...,23,44,36,4,0,20,257,11,20,644
625,582,10,131,2,45,41,41,38,60,37,...,67,46,52,46,49,52,27,11,16,803
917,689,5,41,2,39,60,30,33,18,43,...,70,54,46,64,75,43,96,6,1,959
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1854,1480,1,3,2,51,48,33,36,32,26,...,53,41,48,42,62,31,254,78,16,2056
1769,63,1,4,2,47,41,33,39,7,22,...,51,40,58,38,54,34,21,61,24,86
169,40,14,419,1,54,69,38,37,27,42,...,65,57,55,80,82,49,120,31,12,53
1652,1247,3,5,2,47,25,32,34,56,32,...,65,40,42,31,23,36,127,49,29,1736


In [29]:
h2firstframe = h2o.H2OFrame(X_train)
h2secondframe = h2o.H2OFrame(X_test)

x = h2firstframe.columns
y = 'rating'

x.remove(y)

Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%


In [30]:
h2firstframe

name,rating,price,skills,PACE,SHOT,PASS,DRIBBLE,DEFENSE,PHYSICAL,height,overall,IGS,acceleration,aggression,agility,balance,control,crossing,curve,dribbling,heading,interceptions,jumping,l_pass,l_shot,marking,penalties,positioning,reactions,s_pass,fk_accuracy,shot_power,slide_tkl,sprint_speed,stand_tkl,stamina,strength,vision,volleys,finishing,composure,team,country,league,PKey
1702,2,1,0,47,52,30,38,20,33,37,90,55,14,12,5,21,21,8,8,9,10,3,34,36,9,1,15,0,17,17,9,26,4,8,3,9,41,21,3,6,24,89,43,9,2375
1169,11,356,3,56,67,47,46,25,23,20,134,694,60,35,61,58,71,68,81,78,39,42,42,56,78,39,76,78,26,69,76,77,29,59,27,47,40,70,80,78,47,39,38,1,1627
227,14,369,1,36,39,44,34,70,41,18,134,744,40,66,53,59,71,61,60,51,74,83,52,71,56,83,66,51,35,69,56,62,80,39,81,66,54,63,51,38,58,120,31,12,315
220,5,63,3,59,52,37,44,23,26,21,111,547,58,50,62,48,70,58,69,76,53,36,50,53,53,30,59,71,18,59,65,62,27,64,30,53,36,57,66,65,37,165,13,21,305
1517,8,153,4,60,62,41,42,11,25,14,111,563,62,46,61,56,66,69,65,74,48,17,53,57,73,25,64,69,23,59,66,69,17,64,14,61,33,63,59,76,42,216,43,16,2104
567,0,1,2,45,52,18,28,9,28,24,50,276,43,41,41,42,57,35,47,57,66,15,37,36,53,10,58,67,13,41,35,65,28,52,14,44,51,42,65,64,37,246,11,29,778
1467,2,2,3,43,53,32,39,15,24,26,76,386,47,45,57,44,64,54,68,72,62,12,40,43,63,40,48,68,15,55,46,59,9,46,14,40,42,55,57,67,41,198,25,29,2035
1656,13,334,1,43,44,35,32,68,41,22,133,670,44,67,42,43,62,59,63,59,67,77,61,58,65,81,51,48,24,63,52,71,76,47,80,56,57,40,61,41,47,84,34,16,2314
1602,14,516,4,50,64,51,49,17,35,20,136,709,54,54,59,61,76,79,78,81,38,15,43,66,76,29,78,78,32,68,78,70,22,53,32,53,54,73,70,75,57,120,11,12,2233
1539,0,1,1,15,24,21,7,54,28,26,20,225,23,65,28,42,57,39,24,15,66,62,48,40,31,60,36,29,2,54,55,59,68,13,69,28,47,28,34,26,32,195,78,16,2138




In [None]:
aml = H2OAutoML(max_models = 10, seed = 1)
aml.train(x=x, y=y, training_frame = h2firstframe)

AutoML progress: |████████

In [None]:
lb = aml.leaderboard
lb.head(rows = 10)