# Machine Learning in Go
---

## What is Machine Learning? 
---
Machine learning is teaching a program to recognize patterns in data. Once the patterns have been learned, they can be used to predict outcomes for similar data the program has not seen before.

There are several steps to building a machine learning model, and they can be carried out in almost any programming language. Languages with strong mathematical tooling, such as Python and R, are popular in the machine learning community, and many libraries have been created for them to make building these models easier.

Go has several benefits for machine learning. It is statically typed, it is easy to build and deploy, and it has built-in support for concurrency, which helps when processing large data sets.
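Go's concurrency support is easy to show on a small scale. The following is a minimal sketch that assumes the work can be split into independent chunks. The hypothetical helper `concurrentSum` sums a slice across several goroutines. Each goroutine writes only to its own slot of a `partial` slice, so no locking is needed, and the partial results are combined after a `sync.WaitGroup` wait.

```go
package main

import (
	"fmt"
	"sync"
)

// concurrentSum (hypothetical helper) sums xs by splitting it into `workers`
// chunks and summing each chunk in its own goroutine.
func concurrentSum(xs []float64, workers int) float64 {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(xs) + workers - 1) / workers // ceiling division
	partial := make([]float64, workers)        // one slot per goroutine, so no mutex is needed
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= len(xs) {
			break // fewer chunks than workers
		}
		end := start + chunk
		if end > len(xs) {
			end = len(xs)
		}
		wg.Add(1)
		go func(w, start, end int) {
			defer wg.Done()
			for _, x := range xs[start:end] {
				partial[w] += x
			}
		}(w, start, end)
	}
	wg.Wait() // wait until every goroutine has finished its chunk
	total := 0.0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	data := make([]float64, 1000)
	for i := range data {
		data[i] = float64(i + 1)
	}
	fmt.Println(concurrentSum(data, 4)) // 500500
}
```

For a plain sum this is overkill. The same pattern pays off when each chunk needs expensive per-row work, such as parsing or distance calculations.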

The plan is to walk through setting up, training, and testing a model in Go.

---

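The data used in this example comes from [datahub.io](https://datahub.io/machine-learning/magictelescope). It is the MAGIC gamma telescope dataset. Each row holds parameters describing an image recorded by an atmospheric Cherenkov telescope and is labelled either `g` (gamma-ray shower, the signal) or `h` (hadronic shower, the background).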

## Understanding Data
---
Start by accessing your data. We use a dataframe to view the file because it gives a clean, table-like view of the data. The dataframes used here come from the [gota/dataframe](https://github.com/go-gota/gota) package. In this notebook the dataframe is only used to view and summarize the data. The plot and the model below read the CSV with other packages.

In [None]:
import (
    "bufio"
    "fmt"
    "os"

    "github.com/go-gota/gota/dataframe"
)

// open the CSV file
f, err := os.Open("data/magictelescope_csv.csv")
if err != nil {
    panic(err)
}
// parse the CSV into a gota DataFrame, then close the file
df := dataframe.ReadCSV(bufio.NewReader(f))
f.Close()
// ReadCSV reports parse problems through the DataFrame's Err field
if df.Err != nil {
    panic(df.Err)
}
fmt.Println(df)

Dataframes can also compute basic summary statistics. These give a clearer picture of what the data looks like and show whether it needs further cleaning.

In [None]:
df.Describe()

In [None]:
// view only these columns; the header names in this file end with a colon
fmt.Println(df.Select([]string{"ID", "fWidth:", "fSize:", "class:"}))

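The last step for understanding our data is visualizing it. We use the [gonum/plot](https://github.com/gonum/plot) library to make a scatter plot of width (`fWidth:`) against size (`fSize:`). Because the plot needs plain numeric values, the CSV is first read with `encoding/csv` into a slice of `GammaImage` structs.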

In [None]:
import (
    "bytes"
    "encoding/csv"
    "fmt"
    "os"
    "strconv"

    "gonum.org/v1/plot"
    "gonum.org/v1/plot/plotter"
    "gonum.org/v1/plot/vg"
)

type GammaImage struct {
    ID int
    FLength float64
    FWidth float64
    FSize float64
    FConc float64
    FConcl float64
    FAsym float64
    FM3Long float64
    FM3Trans float64
    FAlpha float64
    FDist float64
    Class string
}

// open the file and read every row as a CSV record
f, err := os.Open("data/magictelescope_csv.csv")
if err != nil {
    panic(err)
}
reader := csv.NewReader(f)
records, err := reader.ReadAll()
f.Close()
if err != nil {
    panic(err)
}
fmt.Println("rows (including header): ", len(records))

// length 0 with capacity for every data row: append then fills the slice
// without leaving zero-valued GammaImages at the front
images := make([]GammaImage, 0, len(records)-1)
// skip the header row and convert each record into a GammaImage
for _, img := range records[1:] {
    ID, err := strconv.Atoi(img[0])
    if err != nil {
        panic(err)
    }
    // columns 1-10 are the numeric features, in the same order as the struct fields
    var v [10]float64
    for j := range v {
        v[j], err = strconv.ParseFloat(img[j+1], 64)
        if err != nil {
            panic(err)
        }
    }
    images = append(images, GammaImage{
        ID:       ID,
        FLength:  v[0],
        FWidth:   v[1],
        FSize:    v[2],
        FConc:    v[3],
        FConcl:   v[4],
        FAsym:    v[5],
        FM3Long:  v[6],
        FM3Trans: v[7],
        FAlpha:   v[8],
        FDist:    v[9],
        Class:    img[11],
    })
}
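The parsing loop above preallocates the `images` slice before appending to it, and the way it preallocates matters. The sketch below shows why. Preallocating with `make([]T, n)` and then appending leaves `n` zero values at the front of the slice, and those would show up as extra points at (0, 0) in a plot. By contrast, `make([]T, 0, n)` only reserves capacity. The type `point` is a hypothetical stand-in for `GammaImage`.

```go
package main

import "fmt"

// point is a stand-in for a parsed row such as GammaImage.
type point struct{ X, Y float64 }

func main() {
	rows := []point{{1, 2}, {3, 4}, {5, 6}}

	// Length len(rows): append adds *after* the 3 zero values.
	wrong := make([]point, len(rows))
	for _, r := range rows {
		wrong = append(wrong, r)
	}

	// Length 0, capacity len(rows): append fills the reserved space.
	right := make([]point, 0, len(rows))
	for _, r := range rows {
		right = append(right, r)
	}

	fmt.Println(len(wrong), wrong[0]) // 6 {0 0}
	fmt.Println(len(right), right[0]) // 3 {1 2}
}
```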

In [None]:
// store width and size for plotting
pts := make(plotter.XYs, len(images))
for i, img := range images{
    pts[i].X = img.FWidth
    pts[i].Y = img.FSize
}
// make new scatter plot
scatter, err := plotter.NewScatter(pts)
if err != nil {
    fmt.Println(err)
}
// make plot formatter    
p, err := plot.New()
if err != nil {
    fmt.Println(err)
}
// label plot
p.Title.Text = "Width vs Size"
p.X.Label.Text = "Width"
p.Y.Label.Text = "Size"
p.Add(scatter)
w, err := p.WriterTo(8*vg.Inch, 8*vg.Inch, "png")
if err != nil{
    panic(err)
}
// display inside notebook
var b bytes.Buffer
writer := bufio.NewWriter(&b)
w.WriteTo(writer)
Display(display.PNG(b.Bytes()))

## Build KNN Model
---
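The k-nearest neighbors (KNN) algorithm classifies a point by finding the k training points closest to it, measured with a distance function such as Euclidean distance, and assigning the label that is most common among those neighbors. Here I am using the [golearn/knn](https://github.com/sjwhitworth/golearn) package.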
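To make the idea concrete, here is a from-scratch sketch of k-nearest-neighbour classification with Euclidean distance and a majority vote. It uses a brute-force search over every training point. golearn's `kdtree` option is a faster search structure and is not shown here. The names `Sample`, `euclidean`, and `predictKNN` are hypothetical. Ties in the vote go to the label that reaches the top count first when neighbours are visited from nearest to farthest, which is one of several reasonable tie-breaking rules.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Sample is a labelled feature vector (hypothetical type for this sketch).
type Sample struct {
	Features []float64
	Label    string
}

// euclidean returns the Euclidean distance between two equal-length vectors.
func euclidean(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// predictKNN labels query by a majority vote of its k nearest training samples,
// comparing against every training sample (brute force).
func predictKNN(train []Sample, query []float64, k int) string {
	type neighbour struct {
		dist  float64
		label string
	}
	ns := make([]neighbour, len(train))
	for i, s := range train {
		ns[i] = neighbour{euclidean(s.Features, query), s.Label}
	}
	// nearest first; SliceStable keeps input order for equal distances
	sort.SliceStable(ns, func(i, j int) bool { return ns[i].dist < ns[j].dist })
	if k > len(ns) {
		k = len(ns)
	}
	votes := make(map[string]int)
	best, bestVotes := "", 0
	for _, n := range ns[:k] {
		votes[n.label]++
		// strict > means a tie goes to the label that reached that count first
		if votes[n.label] > bestVotes {
			best, bestVotes = n.label, votes[n.label]
		}
	}
	return best
}

func main() {
	// Made-up 2-D points: two "g" points near the origin, two "h" points far away.
	train := []Sample{
		{[]float64{0, 0}, "g"},
		{[]float64{1, 0}, "g"},
		{[]float64{10, 10}, "h"},
		{[]float64{11, 10}, "h"},
	}
	fmt.Println(predictKNN(train, []float64{0.5, 0.5}, 3)) // g
	fmt.Println(predictKNN(train, []float64{10, 11}, 3))   // h
}
```

With two classes, an odd k avoids 1-1 style ties in the vote, so the tie-breaking rule matters less.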

In [None]:
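import (
    "fmt"

    "github.com/sjwhitworth/golearn/base"
    "github.com/sjwhitworth/golearn/evaluation"
    "github.com/sjwhitworth/golearn/knn"
)

// load the CSV (with a header row) into golearn instances; the last column is the class
dataCSV, err := base.ParseCSVToInstances("data/magictelescope_csv.csv", true)
if err != nil {
    panic(err)
}

// KNN classifier: Euclidean distance, k-d tree search, 2 neighbours
k := knn.NewKnnClassifier("euclidean", "kdtree", 2)

// Do a training-test split, then fit the model on the training set
trainData, testData := base.InstancesTrainTestSplit(dataCSV, 0.75)
k.Fit(trainData)

// Size() returns (attributes, rows); compare the row counts to check the split proportions
trainAttrs, trainRows := trainData.Size()
testAttrs, testRows := testData.Size()
fmt.Println("train:", trainAttrs, "attributes,", trainRows, "rows")
fmt.Println("test: ", testAttrs, "attributes,", testRows, "rows")

// predict each test row's label by majority vote among its nearest training rows
predictions, err := k.Predict(testData)
if err != nil {
    panic(err)
}
fmt.Println(predictions)

// build the confusion matrix and print precision/recall metrics
confusionMat, err := evaluation.GetConfusionMatrix(testData, predictions)
if err != nil {
    panic(fmt.Errorf("unable to get confusion matrix: %w", err))
}
fmt.Println(evaluation.GetSummary(confusionMat))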
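For a two-class problem like this one, the summary metrics come from four counts in the confusion matrix. The sketch below computes accuracy, precision, and recall by hand, treating `g` (gamma) as the positive class. That choice, the hypothetical `metrics` function, and the example counts are illustrative assumptions, not output from this notebook.

```go
package main

import "fmt"

// metrics computes accuracy, precision and recall from binary confusion-matrix
// counts: tp/fp/fn/tn = true positives, false positives, false negatives, true negatives.
// A metric whose denominator is zero is returned as 0.
func metrics(tp, fp, fn, tn int) (accuracy, precision, recall float64) {
	if total := tp + fp + fn + tn; total > 0 {
		accuracy = float64(tp+tn) / float64(total)
	}
	if tp+fp > 0 {
		precision = float64(tp) / float64(tp+fp) // of rows predicted positive, the share that were right
	}
	if tp+fn > 0 {
		recall = float64(tp) / float64(tp+fn) // of truly positive rows, the share that were found
	}
	return
}

func main() {
	// Hypothetical counts with "g" as the positive class.
	acc, prec, rec := metrics(80, 10, 20, 90)
	fmt.Printf("accuracy=%.2f precision=%.3f recall=%.2f\n", acc, prec, rec)
	// accuracy=0.85 precision=0.889 recall=0.80
}
```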

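KNN compares raw Euclidean distances, so features measured on large scales can dominate the vote. In this dataset, `fDist:` takes much larger values than `fConc:`. Since every non-class column appears to be read as an attribute, the row `ID`, which is an index rather than a measurement, is likely included as a feature unless it is removed first. A common remedy is min-max scaling, where each feature column is mapped to [0, 1] using the minimum and maximum of the training rows only. The following is a minimal standard-library sketch of that step. The `scaler` type, `fitMinMax`, and `scaler.transform` are hypothetical, and the sketch is not part of golearn.

```go
package main

import "fmt"

// scaler stores the per-column minimum and maximum learned from training rows.
type scaler struct {
	min, max []float64
}

// fitMinMax records each column's minimum and maximum over rows
// (assumed non-empty and all the same length).
func fitMinMax(rows [][]float64) scaler {
	s := scaler{
		min: append([]float64(nil), rows[0]...), // copies, so rows[0] is not modified
		max: append([]float64(nil), rows[0]...),
	}
	for _, r := range rows[1:] {
		for j, v := range r {
			if v < s.min[j] {
				s.min[j] = v
			}
			if v > s.max[j] {
				s.max[j] = v
			}
		}
	}
	return s
}

// transform maps each value to (v - min) / (max - min); constant columns become 0.
// Test rows can fall slightly outside [0, 1] because min/max come from training data.
func (s scaler) transform(row []float64) []float64 {
	out := make([]float64, len(row))
	for j, v := range row {
		if span := s.max[j] - s.min[j]; span != 0 {
			out[j] = (v - s.min[j]) / span
		}
	}
	return out
}

func main() {
	// Made-up training rows: column 0 on a small scale, column 1 on a large one.
	train := [][]float64{{0.2, 100}, {0.4, 300}, {0.6, 500}}
	s := fitMinMax(train)
	fmt.Println(s.transform([]float64{0.4, 300}))
}
```

Fitting on the training rows and reusing the same `scaler` for the test rows keeps information from the test set out of the model.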
