# Machine Learning in Go
---

## What is Machine Learning? 
---
Machine learning is teaching a program to recognize patterns. Once learned, these patterns can be used to predict outcomes for similar, previously unseen data, often with high accuracy.

There are several steps to building a Machine Learning model. These steps can be accomplished in almost every programming language. Languages like Python and R that have strong mathematical tooling are popular in the Machine Learning community and many libraries have been created to facilitate building these models.

Go has several benefits for machine learning. It is statically and strongly typed, and it is easy to build and deploy. It also has built-in support for concurrency, which is important for processing large data sets.
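To make the concurrency point concrete, here is a minimal sketch of summing a column in parallel with goroutines. The function name `parallelSum` and the `workers` parameter are illustrative assumptions, not part of any library used later; the sample values are the first five `fWidth:` entries from the dataset shown further down.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum splits values into roughly equal chunks and sums each chunk
// in its own goroutine. The workers parameter is illustrative.
func parallelSum(values []float64, workers int) float64 {
	if workers < 1 {
		workers = 1
	}
	// One slot per goroutine, so no locking is needed.
	partial := make([]float64, workers)
	chunk := (len(values) + workers - 1) / workers

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= len(values) {
			break
		}
		end := start + chunk
		if end > len(values) {
			end = len(values)
		}
		wg.Add(1)
		go func(slot int, part []float64) {
			defer wg.Done()
			for _, v := range part {
				partial[slot] += v
			}
		}(w, values[start:end])
	}
	wg.Wait()

	// Combine the per-goroutine results.
	total := 0.0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	widths := []float64{16.0021, 11.7235, 136.031, 9.5728, 30.9205}
	fmt.Printf("sum of widths: %.4f\n", parallelSum(widths, 2))
}
```

For five values the goroutine overhead outweighs any gain; the pattern only pays off on much larger slices.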

The plan is to run through setting up, training, and testing a model in Go.

---

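The data used in this example is the MAGIC gamma telescope dataset from [datahub.io](https://datahub.io/machine-learning/magictelescope). Each row describes one shower image, and the `class:` column labels it as a gamma signal (`g`) or a hadronic background shower (`h`).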

## Understanding Data
---
Start by accessing your data. We use a dataframe to view the file because it gives a clean, table-like view of the data. The dataframes used here come from the [gota/dataframe](https://github.com/go-gota/gota) package. In this walkthrough the dataframe is used only to inspect the data; the plotting and modeling steps below read the CSV file again through their own types.

In [7]:
import (
    "bufio"
    "fmt"
    "os"

    "github.com/go-gota/gota/dataframe"
)

// access file
f, err := os.Open("data/magictelescope_csv.csv")
if err != nil {
    panic(err)
}
r := bufio.NewReader(f)
// parse data into the data frame type
df := dataframe.ReadCSV(r)
f.Close()
// ReadCSV reports parse failures through the Err field of the returned frame
if df.Err != nil {
    panic(df.Err)
}
fmt.Println(df)

[19020x12] DataFrame

    ID    fLength:   fWidth:    fSize:   fConc:   fConc1:  fAsym:     ...
 0: 1     28.796700  16.002100  2.644900 0.391800 0.198200 27.700400  ...
 1: 2     31.603600  11.723500  2.518500 0.530300 0.377300 26.272200  ...
 2: 3     162.052000 136.031000 4.061200 0.037400 0.018700 116.741000 ...
 3: 4     23.817200  9.572800   2.338500 0.614700 0.392200 27.210700  ...
 4: 5     75.136200  30.920500  3.161100 0.316800 0.183200 -5.527700  ...
 5: 6     51.624000  21.150200  2.908500 0.242000 0.134000 50.876100  ...
 6: 7     48.246800  17.356500  3.033200 0.252900 0.151500 8.573000   ...
 7: 8     26.789700  13.759500  2.552100 0.423600 0.217400 29.633900  ...
 8: 9     96.232700  46.516500  4.154000 0.077900 0.039000 110.355000 ...
 9: 10    46.761900  15.199300  2.578600 0.337700 0.191300 24.754800  ...
    ...   ...        ...        ...      ...      ...      ...        ...
    <int> <float>    <float>    <float>  <float>  <float>  <float>    ...

Not Showing: fM

Dataframes also support basic statistical operations, so we can get a clear understanding of what our data looks like and whether it needs further cleaning.

In [14]:
df.Describe()

[7x13] DataFrame

    column   ID           fLength:   fWidth:    fSize:   fConc:   fConc1:  ...
 0: mean     9510.500000  53.250154  22.180966  2.825017 0.380327 0.214657 ...
 1: stddev   5490.745396  42.364855  18.346056  0.472599 0.182813 0.110511 ...
 2: min      1.000000     4.283500   0.000000   1.941300 0.013100 0.000300 ...
 3: 25%      4755.000000  24.336000  11.863500  2.477100 0.235800 0.128400 ...
 4: 50%      9510.000000  37.146400  17.139200  2.739600 0.354100 0.196500 ...
 5: 75%      14265.000000 70.117500  24.739000  3.101600 0.503700 0.285200 ...
 6: max      19020.000000 334.177000 256.382000 5.323300 0.893000 0.675200 ...
    <string> <float>      <float>    <float>    <float>  <float>  <float>  ...

Not Showing: fAsym: <float>, fM3Long: <float>, fM3Trans: <float>, fAlpha: <float>,
fDist: <float>, class: <string>


In [15]:
fmt.Println(df.Select([]string{"ID", "fWidth:", "fSize:","class:"}))

[19020x4] DataFrame

    ID    fWidth:    fSize:   class:  
 0: 1     16.002100  2.644900 g       
 1: 2     11.723500  2.518500 g       
 2: 3     136.031000 4.061200 g       
 3: 4     9.572800   2.338500 g       
 4: 5     30.920500  3.161100 g       
 5: 6     21.150200  2.908500 g       
 6: 7     17.356500  3.033200 g       
 7: 8     13.759500  2.552100 g       
 8: 9     46.516500  4.154000 g       
 9: 10    15.199300  2.578600 g       
    ...   ...        ...      ...     
    <int> <float>    <float>  <string>




The last step in understanding our data is visualizing it. We use the [gonum/plot](https://github.com/gonum/plot) library to make a scatter plot of width versus size.

In [None]:
import (
    "bytes"
    "encoding/csv"
    "fmt"
    "os"
    "strconv"

    "gonum.org/v1/plot"
    "gonum.org/v1/plot/plotter"
    "gonum.org/v1/plot/vg"
)

// GammaImage holds one row of the CSV file.
type GammaImage struct {
    ID int
    FLength float64
    FWidth float64
    FSize float64
    FConc float64
    FConc1 float64 // matches the fConc1: column
    FAsym float64
    FM3Long float64
    FM3Trans float64
    FAlpha float64
    FDist float64
    Class string
}

// mustFloat parses s as a float64 and panics on malformed input.
// strconv.ParseFloat returns two values, so it cannot be called
// directly inside a struct literal.
func mustFloat(s string) float64 {
    v, err := strconv.ParseFloat(s, 64)
    if err != nil {
        panic(err)
    }
    return v
}

f, err := os.Open("data/magictelescope_csv.csv")
if err != nil {
    panic(err)
}
// read in file as csv
reader := csv.NewReader(f)
records, err := reader.ReadAll()
f.Close()
if err != nil {
    panic(err)
}
// reserve capacity only, so append does not leave zero-valued entries at the front
images := make([]GammaImage, 0, len(records))
// store data for making plot, skipping the header row
for idx, img := range records {
    if idx == 0 {
        continue
    }
    id, err := strconv.Atoi(img[0])
    if err != nil {
        panic(err)
    }
    images = append(images, GammaImage{
        ID: id,
        FLength: mustFloat(img[1]),
        FWidth: mustFloat(img[2]),
        FSize: mustFloat(img[3]),
        FConc: mustFloat(img[4]),
        FConc1: mustFloat(img[5]),
        FAsym: mustFloat(img[6]),
        FM3Long: mustFloat(img[7]),
        FM3Trans: mustFloat(img[8]),
        FAlpha: mustFloat(img[9]),
        FDist: mustFloat(img[10]),
        Class: img[11],
    })
}

In [16]:
// store width and size for plotting
pts := make(plotter.XYs, len(images))
for i, img := range images {
    pts[i].X = img.FWidth
    pts[i].Y = img.FSize
}
// make new scatter plot
scatter, err := plotter.NewScatter(pts)
if err != nil {
    panic(err)
}
// make plot formatter (newer gonum/plot releases return only *plot.Plot
// from plot.New; drop err if your version does)
p, err := plot.New()
if err != nil {
    panic(err)
}
// label plot
p.Title.Text = "Width vs Size"
p.X.Label.Text = "Width"
p.Y.Label.Text = "Size"
p.Add(scatter)
w, err := p.WriterTo(8*vg.Inch, 8*vg.Inch, "png")
if err != nil {
    panic(err)
}
// render the PNG straight into an in-memory buffer; a bufio.Writer here
// would need an explicit Flush before b.Bytes() is read
var b bytes.Buffer
if _, err := w.WriteTo(&b); err != nil {
    panic(err)
}
// display inside notebook
Display(display.PNG(b.Bytes()))

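The recorded run of this cell failed: the import and parsing cell above (`In [None]`) has no recorded execution, so `plotter` was undefined.

ERROR: repl.go:1:13: undefined "plotter" in plotter.XYs <*ast.SelectorExpr>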

## Build KNN Model
---
The k-nearest neighbors (KNN) model classifies a sample by measuring its distance to the labeled training points and assigning the most common label among the k closest ones. Here I am using the [golearn/knn](https://github.com/sjwhitworth/golearn) package.

In [20]:
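import (
    "fmt"

    "github.com/sjwhitworth/golearn/base"
    "github.com/sjwhitworth/golearn/evaluation"
    "github.com/sjwhitworth/golearn/knn"
)

// load the CSV; true means the first row holds the column names
dataCSV, err := base.ParseCSVToInstances("data/magictelescope_csv.csv", true)
if err != nil {
    panic(err)
}

// euclidean distance, k-d tree search, 2 neighbours
k := knn.NewKnnClassifier("euclidean", "kdtree", 2)

// Do a training-test split. Judging by the sizes printed below, 0.75 sends
// about 75% of the rows to testData; pass 0.25 for a 75/25 train/test split.
trainData, testData := base.InstancesTrainTestSplit(dataCSV, 0.75)
k.Fit(trainData)
// Size returns the number of columns first, then the number of rows
trainCols, trainRows := trainData.Size()
testCols, testRows := testData.Size()
fmt.Println(trainCols, trainRows, testCols, testRows)

// Calculates the Euclidean distance and returns the most popular label
predictions, err := k.Predict(testData)
if err != nil {
    panic(err)
}
fmt.Println(predictions)

// Prints precision/recall metrics
confusionMat, err := evaluation.GetConfusionMatrix(testData, predictions)
if err != nil {
    panic(fmt.Sprintf("Unable to get confusion matrix: %s", err.Error()))
}
fmt.Println(evaluation.GetSummary(confusionMat))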



12 4735 12 14285


ERROR: matrix: zero length in matrix dimension
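
The precision and recall figures that a confusion matrix summarizes can be worked out by hand. The following is a minimal sketch that uses made-up labels rather than output from the model above; the `confusion` type and its methods are hypothetical names and are not part of golearn.

```go
package main

import "fmt"

// confusion counts outcomes for one class treated as "positive".
type confusion struct {
	TP, FP, FN, TN int
}

// count compares predicted labels with true labels for the given positive
// class. It assumes both slices have the same length.
func count(actual, predicted []string, positive string) confusion {
	var c confusion
	for i := range actual {
		switch {
		case predicted[i] == positive && actual[i] == positive:
			c.TP++
		case predicted[i] == positive:
			c.FP++
		case actual[i] == positive:
			c.FN++
		default:
			c.TN++
		}
	}
	return c
}

// ratio returns num/den, or 0 when the denominator is zero.
func ratio(num, den int) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den)
}

// precision is the share of positive predictions that were correct.
func (c confusion) precision() float64 { return ratio(c.TP, c.TP+c.FP) }

// recall is the share of actual positives that were found.
func (c confusion) recall() float64 { return ratio(c.TP, c.TP+c.FN) }

// accuracy is the share of all predictions that were correct.
func (c confusion) accuracy() float64 { return ratio(c.TP+c.TN, c.TP+c.FP+c.FN+c.TN) }

func main() {
	// Made-up labels: g = gamma, h = hadron.
	actual := []string{"g", "g", "g", "g", "h", "h", "h", "h"}
	predicted := []string{"g", "g", "g", "h", "g", "h", "h", "h"}

	c := count(actual, predicted, "g")
	fmt.Printf("%+v\n", c)
	fmt.Printf("precision=%.2f recall=%.2f accuracy=%.2f\n",
		c.precision(), c.recall(), c.accuracy())
}
```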