feat(dataset): Implement dataset support #10

Closed
wants to merge 1 commit into from

Collaborator

Fangop commented Sep 7, 2020

Hi, I have been thinking that the example data we use for testing pystoned could become a feature of the package.

This is inspired by sklearn, which ships toy datasets so that users can quickly see how its models are used and what they offer.
I think the toy datasets are one reason sklearn is so widely used: they make it easy for beginners to get started.

This PR reduces the amount of code needed to use the datasets.
Original:

import pandas as pd
import numpy as np

url = 'https://raw.githubusercontent.com/ds2010/pyStoNED-Tutorials/master/Data/firms.csv'
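# on_bad_lines='skip' (pandas >= 1.3) replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv(url, on_bad_lines='skip')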
df.head(5)

# output
y = df['Energy']

# inputs
x1 = df['OPEX']
x1 = np.asmatrix(x1).T
x2 = df['CAPEX']
x2 = np.asmatrix(x2).T
x = np.concatenate((x1, x2), axis=1)

With this PR:

from pystoned import dataset

x, y = dataset.firm(['OPEX', 'CAPEX'], 'Energy')

This PR is not finished yet.

Please give me some information about the datasets, so that I can:

  • make sure the datasets are used in a sensible way
  • give users a brief introduction to each dataset
  • etc.

Thanks for your review. Please do not merge yet!

Owner

ds2010 commented Sep 7, 2020

Thanks for the good proposal. I had the same idea, and I want to do it after we finish refactoring all the functions.
In R packages, the data is usually bundled with the package itself and loaded with code like the following:

data(dataset)
y <- with(dataset, cbind(variable_y))
x <- with(dataset, cbind(variable_x1, variable_x2, variable_x3))

So we could also add these two example datasets (Finnish electricity distribution firms and OECD countries) to our package and then expose them as toy datasets, as you suggested.

What do you think?
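
A Python counterpart to the R pattern above might look like the sketch below. It is only an illustration under stated assumptions: `read_bundled_csv`, the `data` directory layout, and the file name `firms.csv` in the demo are hypothetical, and the single demo row is a made-up placeholder. In a real package the directory would typically sit next to the module (for example `Path(__file__).parent / "data"`); a temporary directory stands in for it here so the sketch runs on its own.

```python
import csv
import tempfile
from pathlib import Path


def read_bundled_csv(data_dir, name):
    """Hypothetical helper: read <data_dir>/<name>.csv into a list of dicts."""
    path = Path(data_dir) / f"{name}.csv"
    with path.open(newline="") as fh:
        return list(csv.DictReader(fh))


# Demo: a temporary directory plays the role of the package's data folder.
with tempfile.TemporaryDirectory() as tmp:
    # Made-up placeholder row, not real dataset values.
    (Path(tmp) / "firms.csv").write_text("OPEX,Energy\n1.0,10.0\n")
    rows = read_bundled_csv(tmp, "firms")
```

Shipping the CSV inside the package would remove the network dependency that the URL-based example has.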

Collaborator Author

Fangop commented Sep 7, 2020

It would be good to have the other datasets as well.
Maybe we can look at how other packages provide datasets to their users.
Following the conventions Python users are used to would make this more user-friendly.

I'll reopen this PR after we finish refactoring.
Thanks!

Fangop closed this Sep 7, 2020
Owner

ds2010 commented Sep 7, 2020

OK. No problem. We can do it later.


2 participants