This repository is for training future data scientists in an "industry-like" environment.
- Fork this repository.
- Read about the challenge and download the data.
- Write code to solve the problem.
- Use branches (don't work on the GitHub master branch).
- Export the notebook to a Python script, and push both the notebook and the script to GitHub.
- When you have good results, create a pull request.
- I will comment on the changes.
- We iterate on the comments until we're good to move forward to the next challenge.
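
The export step in the workflow above is usually done with `jupyter nbconvert --to script <notebook>.ipynb`. To show what that step amounts to, here is a minimal standard-library sketch. It assumes the notebook is stored in the nbformat v4 JSON layout (a top-level `cells` list whose entries have `cell_type` and `source`); the function name `notebook_to_script` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def notebook_to_script(notebook_path, script_path):
    """Write the code cells of a notebook to a plain .py file.

    Hypothetical helper. Assumes the nbformat v4 layout: a top-level
    "cells" list whose entries carry "cell_type" and "source".
    Returns the number of code cells written.
    """
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # "source" may be stored as a list of lines or as a single string.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():  # ignore empty cells
            chunks.append(source.rstrip("\n"))
    # Separate cells with one blank line and end the file with a newline.
    Path(script_path).write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
    return len(chunks)
```

This sketch does not handle IPython magics such as `%matplotlib inline`, so review the generated script before pushing it.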
https://www.kaggle.com/c/titanic Get above 85% accuracy.
- The idea is to write good code which theoretically could be used for future deployments.
- This project is about training, not just results.
- Work with branches, not on master in GitHub.
- Use Python, Jupyter, and Turi.
- Always start by splitting the data into three parts: train, validation, and test. You can use the test dataset only once, to prevent overfitting.
- The example code already has issues in it. Good luck!
- Try to commit every small change to GitHub, instead of big uploads of a lot of code.
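
The three-way split mentioned in the list above can be sketched as follows. This is a minimal illustration in plain Python, not the repository's own code: the function name `three_way_split`, the 70/15/15 default proportions, and the fixed seed are all assumptions chosen for the example. With Turi or pandas you would use that library's own splitting utilities instead.

```python
import random


def three_way_split(rows, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle rows and split them into (train, validation, test).

    Hypothetical helper; the default fractions are illustrative only.
    A fixed seed keeps the split reproducible between runs.
    """
    if val_fraction < 0 or test_fraction < 0:
        raise ValueError("fractions must be non-negative")
    if val_fraction + test_fraction >= 1:
        raise ValueError("validation and test fractions must leave room for training data")

    shuffled = list(rows)  # copy so the caller's data is not modified
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


if __name__ == "__main__":
    train, validation, test = three_way_split(range(100))
    print(len(train), len(validation), len(test))
```

Fit models on `train`, compare and tune them on `validation`, and score `test` a single time at the very end.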