<div style='margin: auto; width: 80%;'><h1 style='font-size: 55px; display: inline-block'>CSV to OpenML Helper</h1> <img style="float: left; height:80px; margin-right:10px;" src="https://raw.githubusercontent.com/PGijsbers/Talks/master/odsc/images/openml/dots.png"></div> <i class="fas fa-file-csv"></i>

This notebook helps you upload a CSV file to OpenML.
To use this notebook, run it cell-by-cell.
Whenever the text prompts you to do something, do that before continuing in the notebook.

If you experience issues using this notebook, or have further questions, please [click here](https://github.com/PGijsbers/csv-to-openml/issues/new) to open an issue on Github.

In [1]:
import re

import ipywidgets as widgets
from ipywidgets import interactive
from IPython.display import Markdown, display

import openml

In [2]:
from app_widgets import (file_upload_widget, publish_button, data_annotation_widget_collection,
                         metadata_widget, DatasetAnnotation)
from app_logic import csv_bytes_to_dataframe, create_openml_dataset

data = None
da = DatasetAnnotation()

In [3]:
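def on_file_uploaded(change):
    # The FileUpload widget's `data` trait holds the raw bytes of each uploaded file;
    # parse the first (and only) file into a DataFrame.
    global data
    data = csv_bytes_to_dataframe(change['new'][0])

# Re-parse the data whenever a new file is uploaded.
file_upload_widget.observe(on_file_uploaded, 'data')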
file_upload_widget


Please use the button above to select the CSV file you want to upload to OpenML.

In [None]:
Markdown(f"The selected file has {len(data)} rows and {len(data.columns)} columns. "
         "Below is a preview of the first rows of your csv file.")

In [None]:
data.head()
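The helper `csv_bytes_to_dataframe` comes from the notebook's `app_logic` module, which is not shown here. As an illustration only, a minimal sketch of such a conversion, assuming the upload is UTF-8 text whose delimiter is a comma, semicolon or tab; `csv_bytes_to_dataframe_sketch` is a hypothetical name, not the helper this notebook imports:

```python
import csv
import io

import pandas as pd


def csv_bytes_to_dataframe_sketch(raw: bytes) -> pd.DataFrame:
    """Hypothetical stand-in for app_logic.csv_bytes_to_dataframe."""
    # Assumption: the file is UTF-8. "utf-8-sig" also strips the byte-order
    # mark that some spreadsheet programs write at the start of the file.
    text = raw.decode("utf-8-sig")
    try:
        # Guess the delimiter from the first few kilobytes of the file.
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t")
        delimiter = dialect.delimiter
    except csv.Error:
        # Sniffing fails e.g. for single-column files; fall back to commas.
        delimiter = ","
    return pd.read_csv(io.StringIO(text), sep=delimiter)
```

Sniffing lets semicolon-separated exports load without extra input from the user; files in other encodings would need a different `decode` call.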

OpenML captures rich meta-data about uploaded datasets so that other users and programs can make better use of them.

It's crucial for OpenML to know the *type* of data in each column.
Each feature should be one of:

 - A numeric feature. Examples: `car price` or `tree height`
 - A string (text). Examples: `sales text` or `tree name`
 - A categorical feature (can only take one of a set of unique values). Examples: `car color` (red, blue, ...) or `evergreen` (yes, no).
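The inference rules used by `app_logic` are not shown in this notebook. A minimal sketch of one possible heuristic, assuming pandas dtypes and an arbitrary cut-off of 10 distinct values for categorical features; `infer_feature_type_sketch` and `max_categories` are hypothetical names:

```python
import pandas as pd


def infer_feature_type_sketch(column: pd.Series, max_categories: int = 10) -> str:
    """Hypothetical heuristic: returns "numeric", "categorical" or "string"."""
    # Integer and float dtypes become numeric features. Booleans are excluded
    # because pandas also reports them as numeric.
    if pd.api.types.is_numeric_dtype(column) and not pd.api.types.is_bool_dtype(column):
        return "numeric"
    # Few distinct values suggests a categorical feature (e.g. colors, yes/no).
    if column.nunique(dropna=True) <= max_categories:
        return "categorical"
    # Everything else is treated as free text.
    return "string"
```

A fixed cut-off is crude (a numeric code such as a zip code would still be labelled numeric), which is why the inferred types remain editable.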
 
Based on the data in your CSV file, we inferred the type of each column.
The table below lets you add or edit the feature meta-data that OpenML accepts:

In the **'Column Names'** column you will find the column names of your data.
You can edit the column names directly.

The **'Column Types'** column shows the type inferred from the data for each column, using the categories described above.
If the column type is not correct, please select the correct option from the dropdown menu.

The **'Example Values'** column simply shows some values of the column for easy reference.
This column should not be edited (editing it has no effect).

In the **'Ignore'** column, you can select the columns that should be ignored when creating models (e.g. identifiers or indexes).
If the dataset has no such columns, you can leave this column as it is.

In the **'ID'** column you can select the column that contains row ids, if such a column is present.
If the dataset has no row id column, you can leave this column as it is.
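The ID column is selected by hand, and the TODO list at the end of this notebook asks whether it could be inferred. A minimal sketch of such a guess, assuming a row id must be a complete, duplicate-free integer or string column; `guess_row_id_columns` is a hypothetical helper, not part of `app_widgets` or `app_logic`:

```python
import pandas as pd


def guess_row_id_columns(df: pd.DataFrame) -> list:
    """Hypothetical: list the columns that could serve as a row id."""
    candidates = []
    for name in df.columns:
        column = df[name]
        # Assumption: row ids are integers or strings, never floats or booleans.
        if not (pd.api.types.is_integer_dtype(column) or pd.api.types.is_string_dtype(column)):
            continue
        # Every row needs exactly one non-missing, distinct id.
        if column.notna().all() and column.is_unique:
            candidates.append(name)
    return candidates
```

This over-reports (any unique text column qualifies), so at most it could pre-select candidates for the user to confirm.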


Please check that the names and types are correct, and complete the 'Ignore' and 'ID' columns.

In [None]:
data_annotation_widget_collection(data, da)

Before you continue, double-check that the data, column names and data types look correct (if not, revisit the steps above).

*If you accidentally selected an ID or Target column but there should be none, please rerun the annotation cell above (`data_annotation_widget_collection(data, da)`). This will erase the Ignore, ID and Target data (but not names and types).*
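The TODO list at the end of this notebook notes that column names may not be identical and hints that they should not contain spaces. A minimal sketch of a check for those two problems; `find_column_name_problems` is hypothetical, and any further naming rules OpenML may enforce are not covered:

```python
from collections import Counter


def find_column_name_problems(names):
    """Hypothetical check: report duplicate names and names containing whitespace."""
    counts = Counter(names)
    # Names that occur more than once, sorted for a stable report.
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    # Names containing any whitespace character (space, tab, ...).
    whitespace = [name for name in names if any(ch.isspace() for ch in name)]
    return {"duplicates": duplicates, "whitespace": whitespace}
```

Running such a check after every rename would catch duplicates at the moment they appear rather than at publish time.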

## Meta-data

Thanks for bearing with us! All this extra information makes the dataset easier for others to find and understand. There are just a few more things we'd like to know:

In [None]:
metadata_widget(da)
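The TODO list names `name` and `description` as required meta-data fields that should be checked before publishing. A minimal sketch of such a check, assuming the meta-data is available as a plain dict; `missing_required_metadata` and `REQUIRED_FIELDS` are hypothetical and may not match how `DatasetAnnotation` stores its fields:

```python
REQUIRED_FIELDS = ("name", "description")  # hypothetical, taken from the TODO list


def missing_required_metadata(metadata: dict) -> list:
    """Return the required fields that are absent, None, or only whitespace."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(metadata.get(field) or "").strip()
    ]
```

An empty result would mean the publish button could be enabled; otherwise the missing field names could be shown to the user.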

## Uploading to OpenML
The following few code cells process your input and format the data for uploading.

In [None]:
oml_dataset = create_openml_dataset(data, da)
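`create_openml_dataset` lives in `app_logic` and is not shown. One plausible shape for it is to gather the keyword arguments of `openml.datasets.create_dataset` from openml-python. A minimal sketch, assuming the annotations are available as a plain dict with hypothetical keys (`types`, `ignore`, `row_id`, `target`, ...); it only builds the arguments, so it runs without network access. Passing `attributes="auto"` asks openml-python to derive attribute types from the DataFrame dtypes, which is why categorical columns are converted to the pandas `category` dtype first:

```python
import pandas as pd


def build_create_dataset_kwargs(df: pd.DataFrame, annotation: dict) -> dict:
    """Hypothetical: `annotation` is a plain dict standing in for DatasetAnnotation."""
    # Work on a copy so the previewed DataFrame stays unchanged.
    data = df.copy()
    for column, kind in annotation.get("types", {}).items():
        if kind == "categorical":
            data[column] = data[column].astype("category")
    # Keyword names follow openml.datasets.create_dataset; values are hypothetical.
    return dict(
        name=annotation["name"],
        description=annotation["description"],
        creator=annotation.get("creator"),
        contributor=annotation.get("contributor"),
        collection_date=annotation.get("collection_date"),
        language=annotation.get("language"),
        licence=annotation.get("licence"),
        attributes="auto",
        data=data,
        default_target_attribute=annotation.get("target"),
        ignore_attribute=annotation.get("ignore", []),
        citation=annotation.get("citation"),
        row_id_attribute=annotation.get("row_id"),
    )

# With openml installed: dataset = openml.datasets.create_dataset(**kwargs)
```

The returned object would then be uploaded with its `publish()` method, as the publish cell below does with `oml_dataset`.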

In [None]:
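# For testing only: connect to the OpenML test server using its example configuration.
# Remove this line to upload to the main OpenML server instead.
openml.config.start_using_configuration_for_example()
# openml.config.apikey = ''  # alternatively, set your own API key here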

In [None]:
def publish(_):
    oml_dataset.publish()
    display(oml_dataset)
    publish_button.disabled = True
    publish_button.description = 'Published!'

publish_button.on_click(publish)

In [None]:
text_and_input = None
if not openml.config.apikey:  # the key may be '' or None when it is not configured
    key_text = widgets.Output()
    # Keep the Markdown flush-left: lines indented by four spaces would
    # otherwise be rendered as a code block instead of formatted text.
    need_api_key_text = (
        "We noticed you have not configured an API key for OpenML yet.\n"
        "To find your API key, log in on the [OpenML website](https://openml.org) "
        "([register](https://www.openml.org/register) if needed), "
        "go to your account page (click the avatar image on the top right) "
        'and click "API Authentication".'
    )
    with key_text:
        display(Markdown(need_api_key_text))

    def set_openml_apikey(key):
        openml.config.apikey = key
        # Only enable publishing once the input looks like an API key (32 lowercase hex characters).
        if re.fullmatch(r'[a-f0-9]{32}', key):
            text_and_input.close()
            publish_button.disabled = False

    # interactive() renders a text box because the default value is a string.
    key_input = interactive(set_openml_apikey, key='')
    key_input.kwargs_widgets[0].description = 'API Key:'

    text_and_input = widgets.VBox([key_text, key_input])
else:
    publish_button.disabled = False

Running the following cell displays the button that publishes your dataset to OpenML!
If your authentication is not (correctly) configured, follow the instructions shown above the button to enable it.

In [None]:
if text_and_input is not None:
    display(text_and_input)
display(publish_button)

---
### Thank you very much for sharing your dataset and contributing to a world of Open Science!

-----
#### Please Ignore Anything Below

TODOs:

- [ ] Format text as Markdown for API Key message (maybe have separate cell output markdown widget).
- [x] Infer Row ID attribute and/or allow user to set this.
    - [ ] try infer?
- [x] Set default target attribute
    - [ ] try infer?
- [ ] Perform checks for **required** meta-data fields before publish (name, description)
- [ ] Bug - Column names may not be identical **at any point**.
- [ ] Input Checking - Perform xsd checks (e.g. no space in column name)