AsmaaAlrefae/Dataset-for-BestSellers-Books-In-Bookdepository


Project 3: Sharing Dataset on Kaggle

Description

The datasets we have used in the course so far are either model/toy datasets or were collected in conditions far removed from local relevance. With your newly learned API and web scraping skills, this project challenges you to create a dataset around a topic, problem, or theme of your choice, clean it, document it properly, and submit it to the Kaggle dataset repository. Curating and sharing a dataset is an integral part of your skills and practice as a data scientist and should not be overlooked!

For project 3, your goal is three-fold:

  1. Define a domain, issue, and problem that you are interested in (preferably with local/regional relevance).
  2. Collect, clean, and submit the data to the Kaggle datasets repository under the course's organization.
  3. Publicly share the data collection code and a starter kernel associated with the published dataset.

Readings

To help you get started, please read this blog post by Kaggle.

For some good examples, tutorials, and steps to publish your dataset, read this page.

For inspiration on how a company successfully published a dataset on Kaggle, read this story.

For more information and documentation on the Kaggle datasets platform, see this page.


Requirements

  • Gather and prepare your data using an API or web scraping. A ready-made dataset is NOT allowed.
  • Make your data accessible and readable by using common open file formats like CSV.
  • Take the time to describe your dataset thoroughly.
  • Pick a clear, open license ensuring your dataset is reusable.
  • Publish a kernel on your dataset to help others learn how they can work with the data. The kernel should show features of the dataset with a plot or two to showcase some variables in the dataset. Also raise potentially interesting questions that could be answered using this dataset.
  • Put your data collection and cleaning scripts in a repo.
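
As a rough illustration of the "clean it and save it as CSV" requirements above, here is a minimal sketch using only the Python standard library. The record fields (`title`, `author`, `price`) and the function names `clean_record` and `records_to_csv` are hypothetical and chosen for illustration only; your own scraped data will likely need different fields and cleaning rules.

```python
import csv
import io


def clean_record(raw):
    """Normalize one scraped record (hypothetical fields: title, author, price)."""
    # Collapse runs of whitespace left over from HTML extraction.
    title = " ".join(raw.get("title", "").split())
    author = " ".join(raw.get("author", "").split())
    # Keep only digits and the decimal point, e.g. "$12.50" -> "12.50".
    price_text = raw.get("price", "").replace(",", "")
    digits = "".join(ch for ch in price_text if ch.isdigit() or ch == ".")
    try:
        price = float(digits) if digits else None
    except ValueError:
        price = None  # unparseable price is stored as a missing value
    return {"title": title, "author": author, "price": price}


def records_to_csv(records, out):
    """Write cleaned, de-duplicated records to a file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=["title", "author", "price"])
    writer.writeheader()
    seen = set()
    for raw in records:
        row = clean_record(raw)
        key = (row["title"], row["author"])
        # Skip rows without a title and rows already written.
        if not row["title"] or key in seen:
            continue
        seen.add(key)
        writer.writerow(row)


if __name__ == "__main__":
    # Made-up sample rows standing in for scraped data.
    sample = [
        {"title": "  A  Book ", "author": "Jane  Doe", "price": "$12.50"},
        {"title": "A Book", "author": "Jane Doe", "price": "$12.50"},
    ]
    buffer = io.StringIO()
    records_to_csv(sample, buffer)
    print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with `open("books.csv", "w", newline="", encoding="utf-8")`; the file name here is only an example.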

Submission

  • The link to the dataset (and the associated starter kernel) on Kaggle.
  • The link to the repo which includes the scripts you used to collect and clean the data.
