# Using shallow learning algorithms from ManufacturingNet
##### To learn more about ManufacturingNet, please visit http://manufacturingnet.io/

In [2]:
import ManufacturingNet
import numpy as np

First, we import ManufacturingNet, which provides several shallow learning models to experiment with.

Note that all of the package's dependencies must also be installed in your environment.
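A quick way to confirm that dependencies are importable is to look them up without importing them. The sketch below uses only the standard library; the package list is a hypothetical example for illustration and is not taken from ManufacturingNet's requirements.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Hypothetical dependency list for illustration only; consult the package's
# own requirements for the authoritative one.
assumed_dependencies = ["numpy", "pandas", "sklearn", "xgboost"]

missing = missing_packages(assumed_dependencies)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All assumed dependencies are importable.")
```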

##### Next, the dataset needs to be downloaded. The datasets module provides several curated datasets, and only two lines of code are needed to download one.

In [3]:
from ManufacturingNet import datasets

In [4]:
datasets.ThreeDPrintingData()

Downloading...
From: https://drive.google.com/uc?id=1VhZcOgNOEw_Sciuww25XZdIuaqO90Nkj
To: /home/cmu/ManufacturingNet/tutorials/ThreeDPrintingData.zip
100%|██████████| 928/928 [00:00<00:00, 2.50MB/s]


The requested dataset should now be downloaded and present in the working directory.

The 3D Printing dataset consists of several continuous and discrete parameters. We can perform classification or regression, depending on which attribute we choose as the output. Here, we first perform classification by predicting the material used from the input and measured parameters. We then perform regression on a different attribute of the data.


### Loading the dataset
Here, we use the pandas library to read and import the data, since it contains categorical attributes. If pandas is not installed in your environment, here is a useful reference: https://pandas.pydata.org/docs/getting_started/index.html

In [5]:
import pandas as pd

In [6]:
data = pd.read_csv("3D_printing_dataset/data.csv", sep = ",")

We then encode the categorical attributes, infill pattern and material, as integers.

In [8]:
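# Encode the categorical columns as integers. Run this cell only once:
# a second run would map every already-encoded value to 1.
data.material = [0 if each == "abs" else 1 for each in data.material]
# abs = 0, pla = 1

data.infill_pattern = [0 if each == "grid" else 1 for each in data.infill_pattern]
# grid = 0, honeycomb = 1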
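The list comprehensions above are not safe to run twice: on a second pass no value equals "abs" or "grid" any more, so every entry becomes 1. A minimal sketch of an encoding that tolerates re-runs is shown here. `encode_labels` is a hypothetical helper written for this example, not a ManufacturingNet or pandas function.

```python
def encode_labels(values, mapping):
    """Map category names to integer codes; leave existing codes unchanged."""
    codes = set(mapping.values())
    encoded = []
    for value in values:
        if value in mapping:
            encoded.append(mapping[value])
        elif value in codes:
            encoded.append(value)  # already encoded, so a re-run is harmless
        else:
            raise ValueError(f"Unexpected category: {value!r}")
    return encoded


material_codes = {"abs": 0, "pla": 1}

once = encode_labels(["abs", "pla", "pla", "abs"], material_codes)
twice = encode_labels(once, material_codes)  # simulates running the cell again
naive_rerun = [0 if each == "abs" else 1 for each in once]

print(once)         # [0, 1, 1, 0]
print(twice)        # [0, 1, 1, 0]
print(naive_rerun)  # [1, 1, 1, 1]
```

With the dataframe from this tutorial, the same helper could be applied as `data.material = encode_labels(data.material, {"abs": 0, "pla": 1})`, assuming the column holds only those two material names.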

### Classification

For classification, we need the input data and an output variable to predict.
We separate the x and y values from the pandas dataframe. The value we want to predict is "material", and the input data is every column except "material".

In [9]:
y_data = data.material.values
x_data = data.drop(["material"],axis=1).values
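Before training, it can help to confirm that the label array contains more than one class, because several classifiers refuse to fit single-class data. The following is a minimal sketch using the standard library; `check_classes` is a hypothetical helper, and the label lists are made-up stand-ins for `y_data`.

```python
from collections import Counter


def check_classes(labels, minimum=2):
    """Count each label and fail early if fewer than `minimum` classes exist."""
    counts = Counter(labels)
    if len(counts) < minimum:
        raise ValueError(
            f"Expected at least {minimum} classes, found {len(counts)}: {dict(counts)}"
        )
    return counts


# Made-up labels standing in for y_data.
print(check_classes([0, 1, 1, 0, 1]))

try:
    check_classes([1, 1, 1, 1])
except ValueError as error:
    print("Problem with labels:", error)
```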


We first get a bird's-eye view of how several classifiers perform on the data with their default parameters. The metrics reported for each classifier are its accuracy, its mean 5-fold cross-validation score, and its run time.

This gives users a quick overview of which classifiers are likely to perform well on their data.
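To make the two score columns concrete, here is a minimal sketch of accuracy and k-fold cross-validation in plain Python. The "model" is a majority-class baseline chosen only to keep the example short; it is an assumption for illustration and not how ManufacturingNet evaluates its classifiers. The labels are made up.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(labels, k=5):
    """Score a majority-class baseline on each of k held-out folds."""
    scores = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train = [y for i, y in enumerate(labels) if i not in held_out]
        guess = Counter(train).most_common(1)[0][0]  # "fit" on the training part
        test = [labels[i] for i in test_idx]
        scores.append(accuracy(test, [guess] * len(test)))
    return scores


labels = [0, 1, 1, 1, 0, 1, 1, 0, 1, 1]  # made-up labels
scores = cross_validate(labels, k=5)
print(scores, sum(scores) / len(scores))

# With a single class, even this trivial baseline scores perfectly.
print(cross_validate([1] * 10, k=5))
```

The last line shows why perfect scores on single-class labels carry no information: any model that always predicts that class reaches 1.0.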

In [10]:
from ManufacturingNet.models import AllClassificationModels

In [11]:
all_models = AllClassificationModels(x_data, y_data)
all_models.run()


= All Classifier Models Parameter Inputs =
verbose = True
test_size = 0.25

= End of inputs; press enter to continue. =

LogisticRegression failed. Exception message:
This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1 



[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.1s finished


SVC failed. Exception message:
The number of classes has to be greater than one; got 1 class 


NuSVC failed. Exception message:
The number of classes has to be greater than one; got 1 class 


LinearSVc failed. Exception message:
This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1 


XGBClassifier failed. Exception message:
Invalid classes inferred from unique values of `y`.  Expected: [0], got [1] 


= Results =

Model                Accuracy             5-Fold CV Mean       Time (seconds)      

RandomForest         1.0                  1.0                  0.11263608932495117 

The following models failed to run:

LogisticRegression
SVC
NuSVC
LinearSVC
XGBClassifier





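Note that the labels in the run above contain a single class (1), which is why every model except RandomForest failed and why the perfect scores are not informative. One possible cause is running the encoding cell twice, since a second pass maps every already-encoded value to 1.

To tune a particular classifier in more detail, users can import that classifier directly and pass the data to it.

They can either keep the default parameters that are displayed or customize the parameters to suit their requirements.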

In [12]:
from ManufacturingNet.models import RandomForest

rf_model = RandomForest(x_data, y_data)

rf_model.run_classifier()


= RandomForestClassifier Parameter Inputs =

Default values:
test_size = 0.25
cv = 5
graph_results = False
criterion = 'gini'
class_weight = None
n_estimators = 100
max_depth = None
min_samples_split = 2
min_samples_leaf = 1
min_weight_fraction_leaf = 0.0
max_features = 'auto'
max_leaf_nodes = None
min_impurity_decrease = 0.0
bootstrap = True
oob_score = False
n_jobs = None
random_state = None
verbose = 0
warm_start = False
ccp_alpha = 0.0
max_samples = None

= End of inputs; press enter to continue. =

= RandomForestClassifier Results =

Classes:
 [1]

Accuracy:            1.0                 

Confusion Matrix:
 [[13]]

Cross Validation Scores: [1. 1. 1. 1. 1.]

Feature Importances: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


Call predict_classifier() to make predictions for new data.

= End of results. =
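The confusion matrix above counts 13 test samples. With test_size = 0.25, that count is consistent with a 50-row dataset if the test share is rounded up. Both the row count and the rounding rule are assumptions here, since neither is stated in the output. A minimal sketch of such a hold-out split, with `train_test_indices` as a hypothetical helper:

```python
import math
import random


def train_test_indices(n_samples, test_size=0.25, seed=0):
    """Shuffle the sample indices and hold out ceil(test_size * n) for testing."""
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


# Assumed dataset size of 50 rows, for illustration.
train_idx, test_idx = train_test_indices(50)
print(len(train_idx), len(test_idx))  # 37 13
```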



### Regression

For regression, we need the input data and a continuous output value to predict.
We again separate the x and y values from the pandas dataframe. In this example, the value we want to predict is "roughness", and the input data is every column except "roughness".

In [13]:
y_data_lin = data.roughness.values
x_data_lin = data.drop(["roughness"],axis=1).values

We first get a bird's-eye view of how several regression models perform on the data with their default parameters. The metrics reported for each model are its R2 score and the time taken to run the algorithm.

This gives users a quick overview of which regression models are likely to perform well on their data.
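As a reminder of what the R2 score measures, the sketch below computes it from its standard definition, one minus the ratio of the residual sum of squares to the total sum of squares. The values are made up for illustration. A score of 0 matches always predicting the mean, and a negative score means the model does worse than that.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [10.0, 20.0, 30.0, 40.0]  # made-up targets

print(r2_score(y_true, y_true))                    # perfect predictions: 1.0
print(r2_score(y_true, [25.0] * 4))                # always the mean: 0.0
print(r2_score(y_true, [40.0, 30.0, 20.0, 10.0]))  # reversed order: -3.0
```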

In [14]:
from ManufacturingNet.models import AllRegressionModels

models_reg = AllRegressionModels(x_data_lin, y_data_lin)
models_reg.run()


= All Regression Models Parameter Inputs =

verbose = True
test_size = 0.25

= End of inputs; press enter to continue. =
[LibSVM][LibSVM]*
optimization finished, #iter = 18
obj = -3025.054628, rho = -168.091317
nSV = 36, nBSV = 36
*
optimization finished, #iter = 16
epsilon = 86.186773
obj = -2440.444148, rho = -178.451713
nSV = 20, nBSV = 18
[LibLinear]....................................................................................................
optimization finished, #iter = 1000

Using -s 11 may be faster

Objective value = -2459.234403
nSV = 37

= Results =

Model                R2 Score             Time (seconds)      

LinearRegression     0.8693412643632428   0.14420318603515625 

RandomForest         0.8708424953956723   0.1092691421508789  

SVR                  -0.040819647472628784 0.02405858039855957 

NuSVR                -0.10098978747025633 0.00043845176696777344

LinearSVR            -0.3248304204246244  0.0016911029815673828

XGBRegressor         0.8434830286881

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.1s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.0s finished


To tune a particular regression model in more detail, users can import that model directly and pass the data to it.

They can either keep the default parameters that are displayed or customize the parameters to suit their requirements.

In [15]:
from ManufacturingNet.models import LinRegression as LinReg


model_lin = LinReg(x_data_lin, y_data_lin)
model_lin.run()

print("MSE:", model_lin.get_mean_squared_error())
print("R2:", model_lin.get_r2_score())
print("R:", model_lin.get_r_score())


= LinRegression Parameter Inputs =

Default values:
test_size = 0.25
cv = 5
graph_results = False
fit_intercept = True
normalize = False
copy_X = True
n_jobs = None

= End of inputs; press enter to continue. =

= LinRegression Results =

Coefficients:
 [ 1.50127922e+03  3.60458777e+00  3.16100838e-01 -2.74404943e-11
  1.81933303e+00 -6.01404671e-02  5.17822426e-01  6.25277607e-13
 -3.00702335e-01  5.27989686e-01 -3.76072117e+01]

Intercept:           -384.8244587422615  

Mean Squared Error:  1997.0339841582975  

R2 Score:            0.6040234380012761  

R Score:             0.7771894479477164  

Cross Validation Scores:
 [-16.34166016  -0.50854448   0.47085326  -0.06020198  -0.6218723 ]


Call predict() to make predictions for new data.

= End of results. =

MSE: 1997.0339841582975
R2: 0.6040234380012761
R: 0.7771894479477164
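The reported numbers can be put on a more interpretable footing with a few lines of arithmetic. The values below are copied from the output above. In this run the R score matches the square root of the R2 score; that is an observation about these numbers, not a confirmed statement about how the library computes it.

```python
import math
from statistics import mean

# Values copied from the LinRegression output above.
mse = 1997.0339841582975
r2 = 0.6040234380012761
r = 0.7771894479477164
cv_scores = [-16.34166016, -0.50854448, 0.47085326, -0.06020198, -0.6218723]

rmse = math.sqrt(mse)  # error expressed in the units of the target
print("RMSE:", rmse)
print("sqrt(R2):", math.sqrt(r2), "reported R:", r)

# The mean is dominated by the first fold's strongly negative score.
cv_mean = mean(cv_scores)
print("Mean cross-validation score:", cv_mean)
```

The cross-validation scores vary widely from fold to fold, so the single-split R2 score should be read with caution.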


This is how we can use ManufacturingNet to accomplish classification and regression tasks.
We first obtain a bird's-eye view of how all the available models perform on our data. If we then want to tune a particular model for our data, we can customize the parameters of the model of our choice.