# Using shallow learning algorithms from ManufacturingNet
##### To learn more about ManufacturingNet, please visit: http://manufacturingnet.io/

In [43]:
import ManufacturingNet
import numpy as np

First, we import ManufacturingNet, which lets us experiment with several shallow learning models.

Note that all of the package's dependencies must also be installed in your environment.
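
One way to confirm this before running the rest of the notebook is to ask the interpreter which packages it can find. The sketch below uses only the standard library. The package list is an assumption for illustration and is not taken from ManufacturingNet's own requirements, so replace it with the dependencies the package actually lists.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names in `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Assumed list for illustration only; replace it with the package's real requirements.
required = ["numpy", "pandas", "sklearn", "xgboost"]
print("Missing packages:", missing_packages(required))
```

An empty list means every name in the list can be imported.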

##### Next, the dataset needs to be downloaded. The datasets module provides several curated datasets, and each one can be downloaded with two lines of code.

In [44]:
from ManufacturingNet import datasets

In [45]:
datasets.ThreeDPrintingData()

The requested dataset should now be downloaded and present in the working directory.

The 3D Printing dataset consists of several continuous and discrete parameters. We can perform classification or regression, depending on the desired output attribute. We first perform classification by predicting the material used from the input and measured parameters. We then perform regression on a different attribute of the data.


### Loading the dataset
Here, we use the pandas library to read and import the data, since it contains categorical attributes. If pandas is not installed in your environment, here is a useful reference: https://pandas.pydata.org/docs/getting_started/index.html

In [46]:
import pandas as pd

In [47]:
data = pd.read_csv("3D_printing_dataset/data.csv", sep = ",")

We then encode the two categorical attributes, infill pattern and material, as integers.

In [48]:
data.material = [0 if each == "abs" else 1 for each in data.material]
# abs = 0, pla = 1

data.infill_pattern = [0 if each == "grid" else 1 for each in data.infill_pattern]
# grid = 0, honeycomb = 1
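
Note that the list comprehensions above encode every value other than "abs" or "grid" as 1, so a misspelled or unexpected label would pass unnoticed. A minimal sketch of a stricter alternative is shown below, using an explicit mapping that raises an error on unknown labels. The toy DataFrame and the helper `encode_column` are hypothetical and stand in for the real CSV.

```python
import pandas as pd

# Toy frame standing in for the real CSV (values are made up for illustration).
toy = pd.DataFrame({
    "material": ["abs", "pla", "abs"],
    "infill_pattern": ["grid", "honeycomb", "grid"],
})


def encode_column(series, mapping):
    """Map labels to integers and fail loudly on labels missing from `mapping`."""
    encoded = series.map(mapping)          # labels not in `mapping` become NaN
    if encoded.isna().any():
        unknown = list(series[encoded.isna()].unique())
        raise ValueError(f"Unexpected labels: {unknown}")
    return encoded.astype(int)


toy["material"] = encode_column(toy["material"], {"abs": 0, "pla": 1})
toy["infill_pattern"] = encode_column(toy["infill_pattern"], {"grid": 0, "honeycomb": 1})
print(toy)
```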

### Classification

For classification, we need the input data and an output variable to be predicted. 
We then separate our x and y values from the pandas dataframe. The value we want to predict is the "material", and our input data will be all the columns except "material".

In [49]:
y_data = data.material.values
x_data = data.drop(["material"],axis=1).values


We first get a bird's-eye view of how some default classifiers perform on the data. Each classifier is run with default parameter values, and the metrics reported are the accuracy, the mean 5-fold cross-validation score, and the run time.

This gives users a quick look at how the available classifiers might perform on their data.
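
To make the three metrics concrete, here is a minimal sketch that computes them for a single classifier with scikit-learn directly. It runs on synthetic data and is only an illustration of what each column means. It is not ManufacturingNet's internal code, and the choice of LogisticRegression with max_iter=1000 is an assumption.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for x_data and y_data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

start = time.time()
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of correct test predictions
# Mean score over 5 train/validation splits of the full dataset.
cv_mean = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
elapsed = time.time() - start

print(f"accuracy={accuracy:.2f}  5-fold CV mean={cv_mean:.2f}  time={elapsed:.4f}s")
```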

In [50]:
from ManufacturingNet.models import AllClassificationModels

In [51]:
all_models = AllClassificationModels(x_data, y_data)
all_models.run()


= All Classifier Models Parameter Inputs =

Enable verbose logging (y/N)? y
verbose = True

What fraction of the dataset should be used for testing (0,1)? 0.3
test_size = 0.3

= End of inputs; press enter to continue. =



[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s finished
[Parallel(n_jobs=1)]: 

[LibSVM][LibSVM][LibSVM][LibSVM][LibSVM][LibSVM][LibSVM][LibSVM][LibSVM][LibSVM][LibSVM][LibSVM][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear]
= Results =

Model                Accuracy             5-Fold CV Mean       Time (seconds)      

LogisticRegression   1.0                  1.0                  0.03215217590332031 

RandomForest         1.0                  0.96                 0.16355037689208984 

SVC                  0.6                  0.6                  0.0059528350830078125

NuSVC                1.0                  0.9800000000000001   0.0059871673583984375

LinearSVC            1.0                  1.0                  0.003967761993408203

XGBClassifier        1.0                  1.0                  0.016940593719482422
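
The scikit-learn messages in the output above suggest scaling the data, and the kernel SVC scores well below the other models. A minimal sketch of standardizing features before an SVC is shown below, using scikit-learn directly on synthetic data. Whether scaling helps on the 3D printing data is not shown here, and the inflated first feature is an assumption meant to mimic columns measured in different units.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; one feature is blown up to mimic mixed units.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X[:, 0] *= 1000.0

raw_score = cross_val_score(SVC(), X, y, cv=5).mean()

# The pipeline refits the scaler on the training part of each fold.
scaled_model = make_pipeline(StandardScaler(), SVC())
scaled_score = cross_val_score(scaled_model, X, y, cv=5).mean()

print(f"SVC without scaling: {raw_score:.2f}")
print(f"SVC with scaling:    {scaled_score:.2f}")
```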






Users who want finer control over a particular classifier can import that classifier on its own and pass the data to it.

They can either keep the default parameters that are displayed or customize the parameters to suit their requirements.

In [52]:
from ManufacturingNet.models import RandomForest

rf_model = RandomForest(x_data, y_data)

rf_model.run_classifier()


= RandomForestClassifier Parameter Inputs =

Default values:
test_size = 0.25
cv = 5
graph_results = False
criterion = 'gini'
class_weight = None
n_estimators = 100
max_depth = None
min_samples_split = 2
min_samples_leaf = 1
min_weight_fraction_leaf = 0.0
max_features = 'auto'
max_leaf_nodes = None
min_impurity_decrease = 0.0
bootstrap = True
oob_score = False
n_jobs = None
random_state = None
verbose = 0
warm_start = False
ccp_alpha = 0.0
max_samples = None

Use default parameters (Y/n)? n

If you are unsure about a parameter, press enter to use its default value.
If you finish entering parameters early, enter 'q' to skip ahead.


What fraction of the dataset should be the testing set (0,1)? 0.3
test_size = 0.3

Use GridSearch to find the best hyperparameters (y/N)? n

Enter the number of folds for cross validation [2,): 4
cv = 4

Graph the ROC curve? Only binary classification is supported (y/N): n
graph_results = False

Enter a positive number of trees for the forest: 5
n_estimator
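
For comparison, the choices entered in the session above (a test fraction of 0.3, 4 cross-validation folds, and 5 trees) correspond roughly to the following scikit-learn calls. This is a sketch on synthetic data, not ManufacturingNet's API, and the random_state values are assumptions added for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for x_data and y_data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Values taken from the interactive session: test_size=0.3, cv=4, 5 trees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
forest = RandomForestClassifier(n_estimators=5, random_state=0)
forest.fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))
print("4-fold CV scores:", cross_val_score(forest, X, y, cv=4))
```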

### Regression

For regression, we need the input data and an output value to be predicted. 
We again separate our x and y values from the pandas dataframe. In this example, the value we want to predict is the "roughness", and our input data will be all the columns except "roughness".

In [53]:
y_data_lin = data.roughness.values
x_data_lin = data.drop(["roughness"],axis=1).values


We first get a bird's-eye view of how some default regression models perform on the data. Each model is run with default parameters, and the metrics reported are the R2 score and the time taken to run the algorithm.

This gives users a quick look at how the available regression models might perform on their data.
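
The R2 score compares a model's squared error with that of a baseline that always predicts the mean of the targets, so it can be negative when the model does worse than that baseline. A small worked sketch with NumPy follows. The helper name `r2_score_manual` and the toy values are hypothetical.

```python
import numpy as np


def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, where SS_tot uses the mean of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # model's squared error
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # mean baseline's squared error
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score_manual(y_true, y_true))                # perfect predictions: 1.0
print(r2_score_manual(y_true, [2.5, 2.5, 2.5, 2.5]))  # always the mean: 0.0
print(r2_score_manual(y_true, [4.0, 3.0, 2.0, 1.0]))  # worse than the mean: -3.0
```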

In [54]:
from ManufacturingNet.models import AllRegressionModels

models_reg = AllRegressionModels(x_data_lin, y_data_lin)
models_reg.run()


= All Regression Models Parameter Inputs =


Enable verbose logging (y/N)? y
verbose = True

What fraction of the dataset should be used for testing (0,1)? 0.3
test_size = 0.3

= End of inputs; press enter to continue. =

[LibSVM][LibSVM][LibLinear]
= Results =

Model                R2 Score             Time (seconds)      

LinearRegression     0.7839528897180778   0.0009965896606445312

RandomForest         0.7286256946765022   0.15658211708068848 

SVR                  -0.3089653305411606  0.001996278762817383

NuSVR                -0.3455457990858901  0.0009965896606445312

LinearSVR            -1.1582988832961658  0.003989458084106445

XGBRegressor         0.6152293548509122   0.023934364318847656




[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.0s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.0s finished


Users who want finer control over a particular regression model can import that model on its own and pass the data to it.

They can either keep the default parameters that are displayed or customize the parameters to suit their requirements.

In [55]:
from ManufacturingNet.models import LinRegression as LinReg


model_lin = LinReg(x_data_lin, y_data_lin)
model_lin.run()

print("MSE:", model_lin.get_mean_squared_error())
print("R2:", model_lin.get_r2_score())
print("R:", model_lin.get_r_score())


= LinRegression Parameter Inputs =

Default values:
test_size = 0.25
cv = 5
graph_results = False
fit_intercept = True
normalize = False
copy_X = True
n_jobs = None

Use default parameters (Y/n)? n

If you are unsure about a parameter, press enter to use its default value.
If you finish entering parameters early, enter 'q' to skip ahead.

What fraction of the dataset should be the testing set (0,1)? 0.3
test_size = 0.3

Enter the number of folds for cross validation [2,): 4
cv = 4

Include a y-intercept in the model (Y/n)? y
fit_intercept = True

Normalize the dataset (y/N)? y
normalize = True

Copy the dataset's features (Y/n)? y
copy_X = True

Enter a positive number of CPU cores to use: 1
n_jobs = 1

= End of inputs; press enter to continue. =


= LinRegression Results =

Coefficients:
 [ 1.43922399e+03  2.72533590e+00 -1.06612868e-01 -2.62514622e+00
  1.42579384e+01 -8.00135057e+00  5.53519193e-01  2.86363338e+02
 -1.60027011e+00  7.24157778e-01 -2.27783434e+01]

Intercept:       
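
The results above list one coefficient per input column plus an intercept. As an illustration of where these numbers come from, the sketch below fits ordinary least squares with NumPy on synthetic, noise-free data. It is not ManufacturingNet's code, and the toy target and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 5.0  # noise-free toy target

# fit_intercept = True corresponds to adding a column of ones to the inputs.
X_aug = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
coefficients, intercept = solution[:-1], solution[-1]

print("Coefficients:", coefficients)
print("Intercept:", intercept)
```

Because the toy target has no noise, the fit recovers the coefficients 2 and -3 and the intercept 5 up to floating-point error.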

This is how we can use ManufacturingNet to accomplish classification and regression tasks. 
We can first obtain a bird's-eye view of the performance of all the models that can be used with our data. If we want to tune a particular model for our data, we can customize the parameters of the model of our choice.