feat(dataset): Implement dataset support #10

Fangop · 2020-09-07T03:42:01Z

Hi, I've been thinking that the example data we use for testing pystoned could become a feature.

This is inspired by sklearn, which ships toy datasets so that users can more easily understand how its models are used and what they offer.
These toy datasets are part of why sklearn is so widely used: they make it easy for beginners to get started.

This PR simplifies loading the example datasets.
Original:

import pandas as pd
import numpy as np

url = 'https://raw.githubusercontent.com/ds2010/pyStoNED-Tutorials/master/Data/firms.csv'
df = pd.read_csv(url, on_bad_lines='skip')  # error_bad_lines was removed in pandas 2.0
df.head(5)

# output
y = df['Energy']

# inputs
x1 = df['OPEX']
x1 = np.asmatrix(x1).T
x2 = df['CAPEX']
x2 = np.asmatrix(x2).T
x = np.concatenate((x1, x2), axis=1)

With this PR:

from pystoned import dataset

x, y = dataset.firm(['OPEX', 'CAPEX'], 'Energy')
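To make the idea concrete, here is a minimal sketch of the column-selection step such a loader could perform. The helper name `select_xy` and its signature are assumptions for illustration only, not the actual pystoned code, and the demo frame holds placeholder values rather than the real firms.csv data.

```python
import numpy as np
import pandas as pd


def select_xy(df, x_select, y_select):
    """Return (x, y) arrays for the chosen input and output columns.

    Hypothetical helper: name and signature are assumptions,
    not the pystoned implementation.
    """
    missing = [c for c in [*x_select, y_select] if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found in dataset: {missing}")
    x = df[list(x_select)].to_numpy()  # shape (n_rows, n_inputs)
    y = df[y_select].to_numpy()        # shape (n_rows,)
    return x, y


# Placeholder values only, NOT the real firms.csv data.
demo = pd.DataFrame({
    "OPEX": [1.0, 2.0, 3.0],
    "CAPEX": [4.0, 5.0, 6.0],
    "Energy": [7.0, 8.0, 9.0],
})
x, y = select_xy(demo, ["OPEX", "CAPEX"], "Energy")
```

Checking the column names up front means a typo such as 'Opex' fails with a clear message instead of a pandas indexing error deep inside the loader.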

This PR is not finished yet.

Please give me some information about the datasets, so that I can:

make sure the datasets are used in a sensible way
give users a brief introduction to each dataset
etc.

Thanks for your review. Do not merge yet!

ds2010 · 2020-09-07T06:16:30Z

Thanks for the good proposal. I had the same idea, and I want to do it after we finish refactoring all the functions.
Usually, in an R package, we add the data to the package directly and load it with code like the following:

data(dataset)
y <- with(dataset, cbind(variable_y))
x <- with(dataset, cbind(variable_x1, variable_x2, variable_x3))

So we can also add these two example datasets (Finnish electricity distribution firms and OECD countries) to our package and then expose them as toy datasets, as you suggested.
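On the Python side, bundling could look roughly like the sketch below: the CSV files sit in a data directory and a small loader reads them by name. The function `load_dataset`, the directory layout, and the file name are all assumptions for illustration. The demo writes a throwaway file with placeholder values, not the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_dataset(name, data_dir):
    """Read <data_dir>/<name>.csv into a DataFrame.

    Hypothetical helper. Inside a package, data_dir would typically be
    something like Path(__file__).parent / "data" (an assumed layout).
    """
    path = Path(data_dir) / f"{name}.csv"
    if not path.is_file():
        raise FileNotFoundError(f"no bundled dataset named {name!r}")
    return pd.read_csv(path)


# Demo with a temporary directory and placeholder values (not real data).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "firms.csv").write_text("OPEX,CAPEX,Energy\n1,2,3\n4,5,6\n")
    firms = load_dataset("firms", tmp)
```

Shipping the files with the package, as in the R workflow, also removes the network dependency of reading from a raw GitHub URL.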

What do you think?

Fangop · 2020-09-07T07:34:25Z

It would be good to have the other datasets too.
Maybe we can look at how other packages provide datasets to their users.
Following the conventions Python users are used to will make it more user-friendly.

I'll reopen this PR after the refactoring is finished.
Thanks!

ds2010 · 2020-09-07T07:43:38Z

OK. No problem. We can do it later.

feat(dataset): Implement dataset support

e8db9c4

Fangop closed this Sep 7, 2020
