If you experience issues using this notebook, or have further questions, please [click here](https://github.com/PGijsbers/csv-to-openml/issues/new) to open an issue on GitHub.

TODOs:

[ ] Make bare bones first

[ ] Format text as Markdown for API Key message (Maybe have separate cell output markdown widget).

[ ] Feedback on loaded file: column names, number columns, instances.

[ ] Infer Column Types

[ ] Allow user to modify column types.

[ ] OpenML Logo :)


In [1]:
from __future__ import print_function
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
from IPython.display import Markdown, display

import csv
import io
import re
import openml
import pandas as pd

In [2]:
# UI components that will be rendered in this notebook:
upload_widget = widgets.FileUpload(
    accept='.csv',
    multiple=False,
    description='Select a csv file'
)

publish_button = widgets.Button(
    description='Publish dataset',
    disabled=True,
    button_style='', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Click me',
    icon='check',
    visible=False
)

Please select the CSV file you want to upload to OpenML:

In [3]:
data = None

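def on_file_uploaded(input_):
    """Parse the uploaded CSV into the global `data` DataFrame."""
    global data
    file_content = io.StringIO(upload_widget.data[0].decode())

    # Let csv.Sniffer guess from the first 1024 characters whether row one is a header.
    has_header = csv.Sniffer().has_header(file_content.read(1024))
    file_content.seek(0)

    data = pd.read_csv(file_content, header=0 if has_header else None)
    # `visible` is not a widget trait in current ipywidgets; visibility lives on the layout.
    publish_button.layout.visibility = 'visible'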

upload_widget.observe(on_file_uploaded, 'data')
upload_widget

FileUpload(value={}, accept='.csv', description='Select a csv file')
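
The upload handler relies on `csv.Sniffer().has_header` to decide whether the first row holds column names. A minimal standalone sketch of that check follows, using only the standard library. The two sample strings and the helper name `split_header` are made up for illustration, and the heuristic can misjudge files whose header and data rows look alike.

```python
import csv
import io


def split_header(text, sample_size=1024):
    """Return (header, rows); header is None when the first row looks like data."""
    # Sniffer votes per column: a header cell that does not match the type
    # (or string length) of the values below it counts towards "has header".
    has_header = csv.Sniffer().has_header(text[:sample_size])
    rows = list(csv.reader(io.StringIO(text)))
    if has_header:
        return rows[0], rows[1:]
    return None, rows


# Hypothetical samples, not taken from a real upload.
with_header = (
    "A,B,C,D,Flower\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
# The same data with the first line removed.
without_header = with_header.split("\n", 1)[1]

for sample in (with_header, without_header):
    header, rows = split_header(sample)
    print("header:", header, "| data rows:", len(rows))
```

When no header is detected, the notebook passes `header=None` to `pd.read_csv`, so pandas numbers the columns instead of consuming the first data row as names.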

In [11]:
Markdown(f"The selected file has {len(data)} rows and {len(data.columns)} columns. "
         "Below is a preview of the first rows of your csv file.")

The selected file has 150 rows and 5 columns. Below is a preview of the first rows of your csv file.

In [6]:
data.head()

     A    B    C    D       Flower
0  5.1  3.5  1.4  0.2  Iris-setosa
1  4.9  3.0  1.4  0.2  Iris-setosa
2  4.7  3.2  1.3  0.2  Iris-setosa
3  4.6  3.1  1.5  0.2  Iris-setosa
4  5.0  3.6  1.4  0.2  Iris-setosa


    def __init__(self, name, description, format=None,
                 data_format='arff', dataset_id=None, version=None,
                 creator=None, contributor=None, collection_date=None,
                 upload_date=None, language=None, licence=None,
                 url=None, default_target_attribute=None,
                 row_id_attribute=None, ignore_attribute=None,
                 version_label=None, citation=None, tag=None,
                 visibility=None, original_data_url=None,
                 paper_url=None, update_comment=None,
                 md5_checksum=None, data_file=None, features=None,
                 qualities=None, dataset=None):

Column Names

It's crucial for OpenML to know the *type* of data in each column.
Each feature should be one of:

 - A numeric feature. Examples: `car price` or `tree height`
 - A string (text). Examples: `sales text` or `tree name`
 - A nominal feature (can only take one of a set of unique values). Examples: `car color` (red, blue, ...) or `evergreen` (yes, no).
 
Based on the data found in your CSV file, we inferred the type of each column.
The results are shown below.
Please check that the types are correct, and fix any mistakes.
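
How such an inference might work can be sketched as follows. This is only a heuristic, not the notebook's implementation: the function name `infer_column_type` and the `max_nominal_values` cut-off are hypothetical choices made for illustration, and the example columns simply reuse the feature names from the list above.

```python
def infer_column_type(values, max_nominal_values=10):
    """Guess 'numeric', 'nominal' or 'string' for the values of one column.

    `max_nominal_values` is an assumed cut-off, not a rule taken from OpenML.
    """
    values = list(values)

    def is_number(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    # A column counts as numeric only if every value parses as a number.
    if values and all(is_number(value) for value in values):
        return 'numeric'
    # Few distinct values suggest a fixed set of categories.
    if len(set(values)) <= max_nominal_values:
        return 'nominal'
    # Everything else is treated as free text.
    return 'string'


# Made-up example columns, one per feature type.
columns = {
    'car price': ['12000', '8500.5', '30100'],
    'car color': ['red', 'blue', 'red'],
    'sales text': ['Low mileage, one owner', 'Needs new tires', 'Like new'],
}
inferred = {
    name: infer_column_type(values, max_nominal_values=2)
    for name, values in columns.items()
}
print(inferred)
```

With the uploaded DataFrame, the same function could be applied per column, for example `{name: infer_column_type(data[name]) for name in data.columns}`. Because any cut-off will misclassify some columns, the guesses should stay editable by the user.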

In [None]:
openml.config.apikey = ''

In [None]:
if openml.config.apikey == '':
    key_text = widgets.Output()    
    # Built without leading indentation: Markdown renders lines indented by four spaces as a code block.
    need_api_key_text = (
        "We noticed you have not configured an API key for OpenML yet. "
        "To find your API key, log in on the [OpenML website](https://openml.org) "
        "([register](https://www.openml.org/register) if needed), "
        "go to your account page (click the avatar image on the top right) "
        'and click "API Authentication".'
    )
    with key_text:
        display(Markdown(need_api_key_text))

    def set_openml_apikey(key):
        openml.config.apikey = key
        # Enable publishing only while the key is well formed (32 lowercase hex characters).
        publish_button.disabled = re.fullmatch(r'[a-f0-9]{32}', key) is None

    key_input = interactive(set_openml_apikey, key='')
    key_input.kwargs_widgets[0].description = 'API Key:'

    text_and_input = widgets.VBox([key_text, key_input])
    # show 'need_api_key_text'
    display(text_and_input)
else:
    publish_button.disabled = False

In [None]:
display(publish_button)