# Using shallow learning algorithms from ManufacturingNet
##### To learn more about ManufacturingNet, please visit: http://manufacturingnet.io/

In [1]:
import sys
sys.path.append('../')  # add parent directory to path

In [2]:
import ManufacturingNet
import numpy as np

First, we import ManufacturingNet, which provides several shallow learning models that we can experiment with.

Note that all of the package's dependencies must also be installed in your environment.

##### Next, the dataset needs to be downloaded. The `datasets` module offers several curated datasets, and each one can be downloaded with only two lines of code.

In [3]:
from ManufacturingNet import datasets

In [4]:
datasets.ThreeDPrintingData()

The dataset should now be downloaded and present in the working directory.

The 3D Printing dataset consists of several continuous and discrete parameters, so we can perform either classification or regression depending on which attribute we choose as the output. Below, we first perform classification by predicting the material used from the remaining input and measured parameters, and then perform regression on a different attribute, the roughness.


### Loading the dataset
Here, we use the pandas library to read the data, because it handles columns of mixed types, including the categorical attributes in this dataset. If pandas is not installed in your environment, here is a useful reference: https://pandas.pydata.org/docs/getting_started/index.html

In [5]:
import pandas as pd

In [6]:
data = pd.read_csv("3D_printing_dataset/data.csv", sep = ",")
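pandas infers a type for each column, which is why it copes with the mix of numeric and text columns in this CSV. The minimal sketch below runs on a few hypothetical rows (not the real data; the `layer_height` column name is an assumption) and shows one way to list the non-numeric, categorical columns before encoding them:

```python
import io

import pandas as pd

# Hypothetical toy rows standing in for 3D_printing_dataset/data.csv.
# "layer_height" is an assumed column name used only for illustration.
csv_text = """layer_height,infill_pattern,material,roughness
0.02,grid,abs,25
0.06,honeycomb,pla,32
0.10,grid,pla,40
"""

data = pd.read_csv(io.StringIO(csv_text), sep=",")

# Numeric columns are parsed as numbers; text columns keep a string/object dtype.
print(data.dtypes)

# Columns that are not numeric are the categorical attributes to encode.
categorical_cols = [c for c in data.columns if not pd.api.types.is_numeric_dtype(data[c])]
print(categorical_cols)
```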

We then encode the two categorical attributes, infill pattern and material, as integers.

In [7]:
data.material = [0 if each == "abs" else 1 for each in data.material]
# abs = 0, pla = 1

data.infill_pattern = [0 if each == "grid" else 1 for each in data.infill_pattern]
# grid = 0, honeycomb = 1
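The list comprehensions above map every value other than `"abs"` (or `"grid"`) to 1, so a misspelled or missing label would silently become 1. A stricter alternative, sketched here with `Series.map` on a hypothetical toy frame, uses the same codes but raises an error on any unexpected label:

```python
import pandas as pd

# Hypothetical toy frame; in the notebook, `data` comes from pd.read_csv.
data = pd.DataFrame({
    "material": ["abs", "pla", "pla", "abs"],
    "infill_pattern": ["grid", "honeycomb", "grid", "honeycomb"],
})

# Explicit mappings matching the encoding used above.
material_codes = {"abs": 0, "pla": 1}
infill_codes = {"grid": 0, "honeycomb": 1}


def encode(column: pd.Series, codes: dict) -> pd.Series:
    """Map category labels to integers, raising on any label not in `codes`."""
    encoded = column.map(codes)  # unknown labels become NaN
    unknown = column[encoded.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unexpected values in {column.name!r}: {list(unknown)}")
    return encoded.astype(int)


data["material"] = encode(data["material"], material_codes)
data["infill_pattern"] = encode(data["infill_pattern"], infill_codes)
print(data)
```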

### Classification

For classification, we need input data and an output variable to predict.
We therefore separate the x and y values in the pandas DataFrame: the value we want to predict is "material", and the input data consists of all the other columns.

In [8]:
y_data = data.material.values
x_data = data.drop(["material"], axis=1).values


We first get a bird's-eye view of how several classifiers perform on the data with their default parameters. Performance is measured by accuracy, 5-fold cross-validation score, and run time.

This gives users a quick sense of which classifiers are promising for their data.
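For readers who want to see what such a comparison involves, here is a minimal sketch written with scikit-learn directly. It is not ManufacturingNet's implementation: the three models, the synthetic data from `make_classification`, and the `random_state` values are assumptions chosen for illustration. `max_iter=1000` is set on logistic regression because its solver can stop at the iteration limit on unscaled data.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for x_data / y_data; replace with the arrays built above.
x_data, y_data = make_classification(n_samples=50, n_features=11, random_state=0)

# Assumed set of candidate models, all with (mostly) default parameters.
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}

x_train, x_test, y_train, y_test = train_test_split(
    x_data, y_data, test_size=0.2, random_state=0
)

results = {}
for name, model in candidates.items():
    start = time.perf_counter()
    model.fit(x_train, y_train)
    accuracy = model.score(x_test, y_test)  # accuracy on the held-out split
    cv_mean = cross_val_score(model, x_data, y_data, cv=5).mean()  # mean 5-fold CV score
    elapsed = time.perf_counter() - start  # fit + evaluation time
    results[name] = (accuracy, cv_mean, elapsed)
    print(f"{name:20s} acc={accuracy:.3f} cv={cv_mean:.3f} time={elapsed:.3f}s")
```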

In [9]:
from ManufacturingNet.models import AllClassificationModels

In [10]:
all_models = AllClassificationModels(x_data, y_data)
all_models.run()


= All Classifier Models Parameter Inputs =
verbose = True
test_size = 0.2

= End of inputs; press enter to continue. =



To tune a particular classifier, users can import it directly and pass it the data.

They can then either keep the default parameters displayed or customize them to suit their requirements.

In [11]:
from ManufacturingNet.models import RandomForest

rf_model = RandomForest(x_data, y_data)

rf_model.run_classifier()


= RandomForestClassifier Parameter Inputs =

Default values:
test_size = 0.25
cv = 5
graph_results = False
criterion = 'gini'
class_weight = None
n_estimators = 100
max_depth = None
min_samples_split = 2
min_samples_leaf = 1
min_weight_fraction_leaf = 0.0
max_features = 'auto'
max_leaf_nodes = None
min_impurity_decrease = 0.0
bootstrap = True
oob_score = False
n_jobs = None
random_state = None
verbose = 0
warm_start = False
ccp_alpha = 0.0
max_samples = None

= End of inputs; press enter to continue. =

= RandomForestClassifier Results =

Classes:
 [0 1]

Accuracy:            1.0                 

ROC AUC:             1.0                 

Cross Validation Scores: [1.  1.  1.  0.8 1. ]

Feature Importances: [0.03085996 0.0673289  0.08198325 0.02094606 0.38845729 0.05159277
 0.00707522 0.06071114 0.0664804  0.10127104 0.12329398]


Call predict_classifier() to make predictions for new data.

= End of results. =
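The feature importances above are printed without column names, because `x_data` was converted to a NumPy array with `.values`. The minimal sketch below uses scikit-learn's `RandomForestClassifier` directly (not ManufacturingNet's wrapper) to compute the same kinds of metrics and to pair each importance with its column name. The synthetic data and the `feature_i` names are placeholders; with the real data, `x` would be `data.drop(["material"], axis=1)` kept as a DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data; "feature_i" names are placeholders for data.columns.
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(60, 4)), columns=[f"feature_{i}" for i in range(4)])
y = (x["feature_2"] > 0).astype(int)  # the target depends only on feature_2

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(x_train, y_train)

accuracy = accuracy_score(y_test, model.predict(x_test))
roc_auc = roc_auc_score(y_test, model.predict_proba(x_test)[:, 1])  # probability of class 1
cv_scores = cross_val_score(model, x, y, cv=5)

# Pair each importance with its column name, most important first.
importances = pd.Series(model.feature_importances_, index=x.columns).sort_values(ascending=False)
print("Accuracy:", accuracy, "ROC AUC:", roc_auc)
print("CV scores:", cv_scores)
print(importances)
```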



### Regression

For regression, we need input data and a continuous output value to predict.
We again separate the x and y values in the pandas DataFrame: in this example, the value we want to predict is "roughness", and the input data consists of all the other columns.

In [12]:
y_data_lin = data.roughness.values
x_data_lin = data.drop(["roughness"], axis=1).values

We first get a bird's-eye view of how several regression models perform on the data with their default parameters. Performance is measured by the R² score and the time taken to run each algorithm.

This gives users a quick sense of which regression models are promising for their data.

In [13]:
from ManufacturingNet.models import AllRegressionModels

models_reg = AllRegressionModels(x_data_lin, y_data_lin)
models_reg.run()


= All Regression Models Parameter Inputs =

verbose = True
test_size = 0.2

= End of inputs; press enter to continue. =


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.1s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.0s finished


[LibSVM][LibSVM]*
optimization finished, #iter = 20
obj = -2938.211052, rho = -165.272761
nSV = 40, nBSV = 40
*
optimization finished, #iter = 18
epsilon = 65.172899
obj = -2345.500603, rho = -157.239148
nSV = 20, nBSV = 20
[LibLinear]....................................................................................................
optimization finished, #iter = 1000

Using -s 11 may be faster

Objective value = -2249.829485
nSV = 40

= Results =

Model                R2 Score             Time (seconds)      

LinearRegression     0.8495032715191071   0.011422872543334961

RandomForest         0.790419536435486    0.12665534019470215 

SVR                  -0.0024695127050318177 0.0007905960083007812

NuSVR                -0.010896841754949094 0.0004944801330566406

LinearSVR            -0.5892127676794754  0.001963376998901367

XGBRegressor         0.9062022279834505   2.391239881515503   
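In the table above, LinearRegression (0.85) and XGBRegressor (0.91) score well, while SVR, NuSVR and LinearSVR have negative R² scores. R² is defined as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true values from their mean. A score of 0 therefore matches a model that always predicts the mean, and a negative score means the model does worse than that. One plausible cause for the SVM-based models, not verified here, is that their default hyperparameters assume features on comparable scales. The small sketch below, on hypothetical values, computes R² by hand and checks it against scikit-learn's `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = np.array([100.0, 150.0, 200.0, 250.0])  # hypothetical targets

perfect = y_true.copy()                           # R² = 1
mean_only = np.full_like(y_true, y_true.mean())  # R² = 0: always predicts the mean
poor = np.array([250.0, 200.0, 150.0, 100.0])    # R² < 0: worse than the mean

for name, y_pred in [("perfect", perfect), ("mean_only", mean_only), ("poor", poor)]:
    print(name, r2(y_true, y_pred), r2_score(y_true, y_pred))
```

For the `poor` predictions, SS_res = 50000 and SS_tot = 12500, so R² = 1 − 4 = −3.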




To tune a particular regression model, users can import it directly and pass it the data.

They can then either keep the default parameters displayed or customize them to suit their requirements.

In [14]:
from ManufacturingNet.models import LinRegression as LinReg


model_lin = LinReg(x_data_lin, y_data_lin)
model_lin.run()

print("MSE:", model_lin.get_mean_squared_error())
print("R2:", model_lin.get_r2_score())
print("R:", model_lin.get_r_score())


= LinRegression Parameter Inputs =

Default values:
test_size = 0.25
cv = 5
graph_results = False
fit_intercept = True
normalize = False
copy_X = True
n_jobs = None

= End of inputs; press enter to continue. =

= LinRegression Results =

Coefficients:
 [ 1.39931120e+03  1.84748087e+00  1.06309991e-01 -1.12894328e+01
  1.26434657e+01 -5.52559352e-01  6.36071840e-01  2.64259648e+02
 -2.76279676e+00  1.17877621e+00 -3.52518138e+01]

Intercept:           -2747.1297188706626 

Mean Squared Error:  1496.9165446449076  

R2 Score:            0.8965442504291975  

R Score:             0.9468602063817011  

Cross Validation Scores:
 [-6.83994071  0.28585519  0.48941658  0.3642745  -0.75248048]


Call predict() to make predictions for new data.

= End of results. =

MSE: 1496.9165446449076
R2: 0.8965442504291975
R: 0.9468602063817011
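In the printed results, the R score equals the square root of the R² score (0.9468602…² ≈ 0.8965442). The sketch below reproduces the same three metrics with scikit-learn's `LinearRegression` on synthetic data. Treating "R score" as √R² is an assumption inferred from these numbers, not from ManufacturingNet's source, and it is only defined when R² ≥ 0.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for x_data_lin / y_data_lin.
rng = np.random.default_rng(0)
x = rng.normal(size=(80, 3))
y = 3.0 * x[:, 0] - 2.0 * x[:, 1] + rng.normal(scale=0.5, size=80)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(x_train, y_train)
y_pred = model.predict(x_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
r = np.sqrt(r2)  # assumption: "R score" taken as the square root of R²

print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print("MSE:", mse, "R2:", r2, "R:", r)
```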


This is how ManufacturingNet can be used for classification and regression tasks.
We first obtain a bird's-eye view of how all the available models perform on our data, and then, if needed, customize the parameters of the model of our choice.