# Chicago Golang Meetup Demo #1

## What are Jupyter and gophernotes?

[Jupyter](http://jupyter.org/) is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

[gophernotes](https://github.com/gopherds/gophernotes) is a Go kernel for Jupyter that lets you program in Go interactively within a Jupyter notebook.

## This Demo Notebook

In this notebook we are going to:

1. Explore the GitHub data we gathered.
2. Visualize the GitHub data.
3. Make a prediction.

# 1. Exploring our GitHub Data

### 1.1 What does our data look like?

Let's use a "dataframes" package called Gota to parse our data and explore it:

In [None]:
import (
    "fmt"
    "io/ioutil"
    "github.com/kniren/gota/data-frame"
)

In [None]:
// Pull in the CSV data
csvData, err := ioutil.ReadFile("repodata.csv")
if err != nil {
    fmt.Println(err.Error())
}

// Load a dataframe from the CSV string.
// The types of the columns will be inferred.
repoDataFrame := df.ReadCSV(string(csvData))

In [None]:
// We can create and view subsets of our data.
// For example, let's get the first and last entries,
// and print out only the repo name, forks, stars, and issues.
filtered1 := repoDataFrame.Subset([]int{0,repoDataFrame.Nrow()-1}).Select("repo_name", "forks", "stars", "issues")
fmt.Println(filtered1)

In [None]:
// We can do cool things like see the
// names of repos with more than 30k stars.
filtered2 := repoDataFrame.Filter(df.F{"stars", ">", 30000}).Select("repo_name")
fmt.Println(filtered2)

### 1.2 Let's pick out only the data we are interested in.

Let's say that we are only interested in the repos and when they were created. We are going to select only those two columns and save them to a processed CSV file. Gota makes this very quick and easy.
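
For comparison, here is a minimal sketch of the same kind of column selection using only the standard library's `encoding/csv`. It is not how Gota works internally; the helper name `selectColumns` and the sample row are made up for illustration.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strings"
)

// selectColumns (hypothetical helper) reads CSV text, keeps only the named
// columns in the order given, and returns the result as CSV text.
func selectColumns(input string, names ...string) (string, error) {
	records, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return "", err
	}
	if len(records) == 0 {
		return "", fmt.Errorf("no header row")
	}

	// Map each header name to its column position.
	pos := make(map[string]int)
	for i, h := range records[0] {
		pos[h] = i
	}
	idx := make([]int, len(names))
	for i, n := range names {
		p, ok := pos[n]
		if !ok {
			return "", fmt.Errorf("column %q not found", n)
		}
		idx[i] = p
	}

	// Write out only the selected columns, header included.
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	for _, rec := range records {
		row := make([]string, len(idx))
		for i, p := range idx {
			row[i] = rec[p]
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Made-up sample data, not taken from repodata.csv.
	in := "repo_name,stars,created_date\na/b,10,2013-01-02 03:04:05\n"
	out, err := selectColumns(in, "repo_name", "created_date")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```

The dataframe version that follows does the same job in a couple of calls, which is the convenience Gota provides.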

In [None]:
// Read in the original data.
outputData := df.ReadCSV(string(csvData))

// Select out the columns of interest.
outputData = outputData.Select("repo_name", "created_date")

// Save this data to another file.
b, err := outputData.SaveCSV()
if err != nil {
    fmt.Println(err)
}
if err = ioutil.WriteFile("processed.csv", b, 0644); err != nil {
    fmt.Println(err)
}

# 2. Visualize our Data

Now that we have picked out the repo names and creation datetimes, let's create a time series of the number of Go repos created daily on GitHub. Then let's visualize that time series with github.com/gonum/plot.

### 2.1 Create the time series.

In [None]:
import (
    "fmt"
    "io/ioutil"
    "bytes"
    "encoding/csv"
    "time"
    "sort"
)

In [None]:
// Create a map to store the daily counts of created repos.
countMap := make(map[int]int)

// Read the processed data.
csvData, err := ioutil.ReadFile("processed.csv")
if err != nil {
    fmt.Println(err.Error())
}

// Extract the records from the data.
reader := csv.NewReader(bytes.NewReader(csvData))
reader.FieldsPerRecord = 2
records, err := reader.ReadAll()
if err != nil {
    fmt.Println(err)
}

// Create a map of daily created repos where the keys are the days and
// the values are the counts of created repos on that day.
startTime := time.Date(2013, time.January, 1, 0, 0, 0, 0, time.UTC)
layout := "2006-01-02 15:04:05"
for idx, each := range records {
    
    // Skip the header line
    if idx == 0 {
        continue
    }
    
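    // Parse the time, skipping records that are too short or malformed
    // so that a zero-valued time is never counted.
    if len(each[1]) < 19 {
        continue
    }
    t, err := time.Parse(layout, each[1][0:19])
    if err != nil {
        fmt.Println(err)
        continue
    }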
    
    // Increment the counter
    interval := int(t.Sub(startTime).Hours() / 24.0)
    countMap[interval]++
}

In [None]:
// Collect the day values so we can sort them, which is required for plotting.
var keys []int
for k := range countMap {
    keys = append(keys, k)
}

In [None]:
sort.Ints(keys)
var sortedCounts [][]int
for _, k := range keys {
    sortedCounts = append(sortedCounts, []int{k, countMap[k]})
}

In [None]:
sortedCounts[0:3]

### 2.2 Visualize the time series.

In [None]:
import (
    "github.com/gonum/plot"
    "github.com/gonum/plot/plotter"
    "github.com/gonum/plot/plotutil"
    "github.com/gonum/plot/vg"
)

In [None]:
// Prepare the points for plotting.
pts := make(plotter.XYs, len(sortedCounts))
for i, count := range sortedCounts {
    pts[i].X = float64(count[0])
    pts[i].Y = float64(count[1])
}

In [None]:
// Create a new plot.
p, err := plot.New()
if err != nil {
    fmt.Println(err)
}

// Label the new plot.
p.Title.Text = "Daily Counts of Go Repos Created"
p.X.Label.Text = "Days from Jan. 1, 2013"
p.Y.Label.Text = "Count"

// Add the prepared points to the plot.
if err = plotutil.AddLinePoints(p, "Counts", pts); err != nil {
    fmt.Println(err)
}

// Save the plot to a PNG file.
if err := p.Save(7*vg.Inch, 4*vg.Inch, "countseries.png"); err != nil {
    fmt.Println(err)
}

# 3. Make a Prediction

Now that we have parsed, organized, and visualized our data, let's make a prediction based on it. Because we have a daily time series of counts of Go repos created, we are going to try to predict how many Go repositories will be created on some future date using github.com/sajari/regression.
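
With a single input variable (days since the start date), a linear regression amounts to fitting a straight line by ordinary least squares. The sketch below shows that calculation with only the standard library, to make clear what kind of model is being fitted. It is not the implementation used by github.com/sajari/regression; the helper name `fitLine` and the sample points are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// fitLine (hypothetical helper) returns the slope and intercept of the
// ordinary least-squares line through the points (xs[i], ys[i]).
func fitLine(xs, ys []float64) (slope, intercept float64, err error) {
	if len(xs) != len(ys) || len(xs) < 2 {
		return 0, 0, errors.New("need at least two points and equal-length inputs")
	}
	n := float64(len(xs))

	// Compute the means of x and y.
	var sumX, sumY float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
	}
	meanX, meanY := sumX/n, sumY/n

	// Accumulate the covariance of x and y and the variance of x
	// (both unnormalized, since the 1/n factors cancel in the ratio).
	var sxy, sxx float64
	for i := range xs {
		dx := xs[i] - meanX
		sxy += dx * (ys[i] - meanY)
		sxx += dx * dx
	}
	if sxx == 0 {
		return 0, 0, errors.New("all x values are identical")
	}

	slope = sxy / sxx
	intercept = meanY - slope*meanX
	return slope, intercept, nil
}

func main() {
	// Made-up points lying exactly on y = 2x + 1.
	xs := []float64{0, 1, 2, 3}
	ys := []float64{1, 3, 5, 7}

	slope, intercept, err := fitLine(xs, ys)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Predict y at x = 10 from the fitted line.
	prediction := slope*10 + intercept
	fmt.Printf("slope=%.1f intercept=%.1f prediction=%.1f\n", slope, intercept, prediction)
	// prints: slope=2.0 intercept=1.0 prediction=21.0
}
```

A straight line is a coarse model for daily repo counts, so any prediction from it is best read as a rough trend estimate.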

In [None]:
import "github.com/sajari/regression"

In [None]:
// Fit the regression model.
var r regression.Regression
r.SetObserved("count of created Github repos")
r.SetVar(0, "days since Jan 1 2013")
for _, count := range sortedCounts {
    r.Train(regression.DataPoint(
        float64(count[1]),
        []float64{float64(count[0])}))
}

In [None]:
r.Run()

In [None]:
// Predict how many Go repos will be created today,
// taken here as day 1500 since Jan 1, 2013.
prediction, err := r.Predict([]float64{1500.0})
if err != nil {
    fmt.Println(err)
}

In [None]:
r

In [None]:
prediction