Download datasets from online so unit tests can run locally #408
Codecov Report
@@ Coverage Diff @@
## master #408 +/- ##
==========================================
+ Coverage 97.39% 97.47% +0.07%
==========================================
Files 103 105 +2
Lines 3229 3287 +58
==========================================
+ Hits 3145 3204 +59
+ Misses 84 83 -1
Copyright (c) 2012-2020, Michael L. Waskom
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the project nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Honestly, I'm not exactly sure if this is how to distribute w/ license? 🤔🤔🤔🤔
This is good -- maybe add a note on #317?
I think we should have one "licenses" directory somewhere, perhaps just licenses/. Let's do that to start? Do we have other licenses in the repo yet, and if so, where do we keep them?
Also, what is this license for?
Oh I see, it's a license for the dataset. Ok, I guess this is a fine place for the license.
Hm, I currently have the license folder as closely linked to the dataset as possible, so here I have the /data folder and then a licenses subfolder. What do you think about this vs a root /licenses folder for all datasets?
Is the fraud transactions data generated? 2.5mb is pretty good :) but if it's generated perhaps we could just truncate it.
Looks great! 👍
@dsherry Yup, it's generated! 2.5mb is the size of it, zipped. I think the original .csv has 99,000+ rows and we could truncate it but not sure if it's necessary? :o Could be nice to just have a larger dataset on hand, if not simply for testing purposes. Users currently have the ability to load in any number of rows they'd like using the
Sounds good! In that case, I think we should file a separate ticket to track permanently reducing the size of this dataset. There we can either generate a new one and upload it, or have the unit tests generate that data in real time. This PR is a great start though! Keeping installation as lightweight as possible is helpful to keep in mind.
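For reference, the "have the unit tests generate that data" option could look roughly like the sketch below. This is only an illustration under stated assumptions: the column names (`transaction_id`, `amount`, `fraud`), the 10% fraud rate, and both function names are made up here and are not taken from the project's actual fraud dataset or test suite.

```python
import csv
import io
import random


def generate_transactions(n_rows, seed=0):
    """Build n_rows of synthetic transaction records.

    The schema here is hypothetical, chosen only for illustration.
    A fixed seed keeps the generated data reproducible across test runs.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_rows):
        rows.append({
            "transaction_id": i,
            "amount": round(rng.uniform(1.0, 500.0), 2),
            "fraud": rng.random() < 0.1,  # assumed ~10% positive class
        })
    return rows


def to_csv_text(rows):
    """Serialize the rows to CSV text in memory, so no file needs to ship."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Seeding the generator means two test runs see identical rows, which keeps assertions stable while avoiding a multi-megabyte file in the package.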
Filed #424, closing this!
Closes #342.
titanic.csv and tips.csv datasets in tests/data folder (also uploaded license for tips.csv in tests/data/licenses folder)
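A rough sketch of how a test might read one of these local files instead of downloading it is shown below. It assumes the `tests/data` layout described above; the helper name `load_local_csv` and its `n_rows` parameter are hypothetical and are not the project's actual loading API.

```python
import csv
from itertools import islice
from pathlib import Path

# Assumed location of the bundled datasets, relative to the repo root.
DATA_DIR = Path("tests") / "data"


def load_local_csv(filename, n_rows=None, data_dir=DATA_DIR):
    """Read a CSV from the local data folder as a list of dict rows.

    n_rows limits how many data rows are read; None reads the whole file.
    """
    path = Path(data_dir) / filename
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        # islice stops early, so large files are not read in full
        return list(islice(reader, n_rows))
```

A test could then call something like `load_local_csv("titanic.csv", n_rows=50)` and run without network access.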