In [None]:
#r "nuget:AresLazarus.FsKaggle"
open FsKaggle

# Quick start

If you already have a Kaggle account and your <code>kaggle.json</code> file is under *~/.kaggle*, you only need to declare the dataset owner and the dataset name, then sit back while the requested dataset zip is downloaded to the current directory.

In [None]:
open FsKaggle

{ Owner = "dataset-owner"
  Dataset = "dataset-name"
  Request = Filename "dataset-file.csv" }
|> Kaggle.DownloadDatasetAsync  
|> Async.RunSynchronously

<style>.alert-content{ display:flex; align-items: center; }</style>
<div class="alert alert-info" style=".alert-content"><i class="fa fa-info-circle fa-2x" style="margin: .3em"></i> <span>To download the entire dataset as a zip use <code>Request = All</code></span></div>

# Extended options 

You can also call <code>Kaggle.DownloadDatasetAsync</code> using <code>FsKaggle.DownloadDatasetOptions</code>, which allows further customization:

In [None]:
open System.Threading

type DownloadDatasetOptions =
    { DatasetInfo: DatasetInfo
      Credentials: CredentialsSource
      DestinationFolder: string
      Overwrite: bool
      CancellationToken: CancellationToken option
      ReportingCallback: (ProgressData -> unit) option }

* **DatasetInfo**: The dataset options, see above.
* **Credentials**: Where to look for the API key info. Default is <code>Path "~/.kaggle/kaggle.json"</code>.
* **DestinationFolder**: Send incoming files to some other existing directory. Default is the current directory.
* **Overwrite**: Default is false. When set to true, any existing file with the same name is overwritten without confirmation.
* **CancellationToken**: In case you think you might need to manually stop the download.
* **ReportingCallback**: Implement a custom progress reporter by inserting your own handler. Default is <code>FsKaggle.Reporter.ProgressBar</code>.
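
As a rough sketch of how these fields combine, the snippet below builds an options record with a time-limited cancellation token and a non-default destination folder. It relies only on names that appear in this notebook (`Extended()`, `All`, the record fields). The five-minute timeout, the `Data` folder, the `download` helper and the placeholder owner and dataset names are assumptions for illustration, and how the library reports a cancelled download is not covered here.

```fsharp
#r "nuget:AresLazarus.FsKaggle"
open System
open System.IO
open System.Threading
open FsKaggle

// Hypothetical timeout: request cancellation if the download runs longer than five minutes.
let cts = new CancellationTokenSource(TimeSpan.FromMinutes 5.0)

// DestinationFolder has to exist already; CreateDirectory does nothing if it does.
Directory.CreateDirectory "Data" |> ignore

let info =
    { Owner = "dataset-owner"     // placeholder
      Dataset = "dataset-name"    // placeholder
      Request = All }

// Start from the defaults and override only the fields of interest.
let options =
    { info.Extended() with
        DestinationFolder = "Data"
        CancellationToken = Some cts.Token }

// Hypothetical helper wrapping the call, so nothing is downloaded until it is invoked.
let download () =
    options
    |> Kaggle.DownloadDatasetAsync
    |> Async.RunSynchronously
```

Calling `download ()` starts the transfer, and `cts.Cancel()` requests a stop before the timeout elapses.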

## Partially setting extended options
To set only what you need, get the default <code>DownloadDatasetOptions</code> record from any <code>DatasetInfo</code> instance and override individual fields with a copy-and-update record expression:

In [None]:
let datasetOptions = 
    { Owner = "owner"
      Dataset = "dataset"
      Request = DatasetFile.All }

{ datasetOptions.Extended() with Overwrite = true }

# Set up a custom progress tracker

(The default progress tracker is <code>FsKaggle.Reporter.ProgressBar</code>.)

Waiting for Kaggle data to travel over the net can be tedious, so consider using the reporting callback property to visualise your progress or even make a rough estimate of the time remaining.

All you need is a function that takes an FsKaggle.ProgressData argument, which is an F# record that looks like this:

<code>type ProgressData = { TimeStamp: DateTime; Notes: string; BytesRead: int64; TotalBytes: int64; BytesPerSecond: float }</code>
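
The time-remaining estimate mentioned above boils down to dividing the bytes still to be read by the current speed. A minimal, library-free sketch follows. The `ProgressSample` record is a hypothetical stand-in that copies the fields of `ProgressData` so the snippet runs on its own, and `percentDone` and `estimateRemaining` are illustrative names, not part of FsKaggle.

```fsharp
open System

// Hypothetical stand-in mirroring the fields of FsKaggle.ProgressData.
type ProgressSample =
    { TimeStamp: DateTime
      Notes: string
      BytesRead: int64
      TotalBytes: int64
      BytesPerSecond: float }

// Whole-number percentage; 0 when the total size is not positive.
let percentDone (p: ProgressSample) =
    if p.TotalBytes > 0L then int (100L * p.BytesRead / p.TotalBytes) else 0

// Remaining time = bytes left / current speed; None while the speed is not positive.
let estimateRemaining (p: ProgressSample) =
    if p.BytesPerSecond > 0.0 then
        Some (TimeSpan.FromSeconds(float (p.TotalBytes - p.BytesRead) / p.BytesPerSecond))
    else
        None

// Made-up sample: a quarter of 1,000,000 bytes read at 50,000 bytes per second.
let sample =
    { TimeStamp = DateTime.UtcNow
      Notes = ""
      BytesRead = 250000L
      TotalBytes = 1000000L
      BytesPerSecond = 50000.0 }

printfn "%d%% done" (percentDone sample)   // prints "25% done"
match estimateRemaining sample with
| Some t -> printfn "about %s remaining" (t.ToString("mm\\:ss"))   // prints "about 00:15 remaining"
| None -> printfn "remaining time unknown"
```

Returning an option instead of a raw `TimeSpan` keeps the zero-speed case explicit, so a callback can decide what to display before the first speed measurement arrives.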

## Example reporting function

In [None]:
open FsKaggle
open System

let report = ResizeArray<ProgressData>()
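let simpleProgressTracking data =
    // Keep every sample so the download speed can be plotted afterwards.
    report.Add data

    // Guard against a zero total size, which would otherwise divide by zero.
    let percent =
        if data.TotalBytes > 0L then int (100L * data.BytesRead / data.TotalBytes) else 0
    let bar = "[".PadRight(percent, '|') + "]".PadLeft(100 - percent, ' ')
    let status = sprintf "%d%% @ %.02fKB/s" percent (data.BytesPerSecond / 1024.0)
    // A zero speed would give an infinite estimate, which TimeSpan.FromSeconds rejects.
    let remainingTime =
        if data.BytesPerSecond > 0.0
        then TimeSpan.FromSeconds(float (data.TotalBytes - data.BytesRead) / data.BytesPerSecond)
        else TimeSpan.Zero

    // The trailing \r moves the cursor back so each report overwrites the previous line.
    Console.Write(sprintf "%s %s %s\r" bar status (remainingTime.ToString("mm\\:ss")))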

In [None]:
open System.IO

// CreateDirectory does nothing if the folder already exists, so no check is needed.
Directory.CreateDirectory "Data"
|> ignore

{ datasetOptions.Extended() with 
    DestinationFolder = "Data"    
    ReportingCallback = Some simpleProgressTracking }
|> Kaggle.DownloadDatasetAsync
|> Async.RunSynchronously

# Obligatory Jupyter notebook data plot

Oh so that's why we logged the progress reports...

In [None]:
open XPlot.Plotly

Scatter(
    name = "Bytes/sec",
    showlegend = true,
    x = (report |> Seq.map (fun r -> r.TimeStamp)),
    y = (report |> Seq.map (fun r -> r.BytesPerSecond)),
    fill = "tozeroy")
|> Chart.Plot    