#### Data Science Workflow

See slides: https://gitpitch.com/symeneses/data_science_projects#/

Data science is typically an iterative process, both in training and comparing models and in the overall, long-term improvement of a model.

* Define metrics - both standard metrics (e.g. accuracy) and business metrics (e.g. sales)
* Formulate a hypothesis (this might also come after data collection if it is unclear what data is available in the company)
* Collect data
* Clean / process data
* Engineer features
* Establish a baseline; this serves as the benchmark
    * E.g. always predict the same sales as yesterday, or use a linear combination of weekday weighting and temperature
    * For classification: might always predict the majority class
* Select model
* Tune parameters
* Train model
* Compare to baseline

#### Project Structure
* We need the following folders: _docs_, _tests_, _data_, _output_, _models_, _source_
* The easiest way is to use the cookiecutter template (see next section for how to do this)
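
If you do not want to use the template, the folders listed above can also be created by hand. A minimal sketch, assuming a throwaway location (`project_dir` is a hypothetical path, not part of the original setup):

```shell
#!/bin/sh
set -eu

# Hypothetical location: a temporary directory so nothing real is touched.
project_dir="$(mktemp -d)/my_project"

# Create the six folders named in the list above.
for folder in docs tests data output models source; do
    mkdir -p "$project_dir/$folder"
done

# Show the resulting skeleton.
ls "$project_dir"
```

Note that the cookiecutter template produces its own, more detailed layout, so the folder names may differ from this hand-made version.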


#### Setting up a Project
* Open a command line and change to C:\Users\Christian\git
* Execute ```cookiecutter https://github.com/drivendata/cookiecutter-data-science``` and enter the following:
    * project: git_exercise_project
    * repo: git_exercise
* Go to GitHub and create a new repo _git_exercise_
* Go to git bash and then to /c/users/christian/git/git_exercise
* Initialize repo with ```git init```
* Link to remote with ```git remote add origin https://github.com/chriswegmann/git_exercise.git```
* Add all files with ```git add .```
* Do initial commit of all staged files with ```git commit -m 'initial commit'```
* Push repo to GitHub with ```git push -u origin master```
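
The git part of these steps can be tried out offline. The sketch below makes a few assumptions that are not in the steps above: a local bare repository stands in for the empty GitHub repo, a single README replaces the cookiecutter output, and the user name and email are placeholders.

```shell
#!/bin/sh
set -eu

# Assumption: a local bare repository stands in for the empty GitHub repo.
workdir="$(mktemp -d)"
git init --quiet --bare "$workdir/remote.git"

# Stand-in for the cookiecutter output: a project folder with one file.
mkdir "$workdir/git_exercise"
cd "$workdir/git_exercise"
echo "# git_exercise" > README.md

git init --quiet
# Make sure the first branch is called master, whatever the local default is.
git symbolic-ref HEAD refs/heads/master
# Placeholder identity, set for this repo only so the commit succeeds.
git config user.name "Example User"
git config user.email "user@example.com"

# Same sequence as in the list: link remote, stage, commit, push.
git remote add origin "$workdir/remote.git"
git add .
git commit --quiet -m 'initial commit'
git push --quiet -u origin master

# Inspect the result.
git remote -v
git status --short --branch
git log --oneline
```

With a real GitHub repo, only the URL passed to ```git remote add origin``` changes.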

Other useful commands:
* ```git remote -v``` shows the remote(s)
* ```git status``` shows the untracked, modified and staged files
* ```git log``` shows the commits

#### Creating / Merging / Deleting Branches
* Create and check out new branch with ```git checkout -b new_branch```
* Go to folder and create new file (or modify an existing file)
* Go to git bash and stage new / modified files with ```git add [file]```
* Commit staged files with ```git commit -m 'first changes'```
* Switch to master branch with ```git checkout master```
* Merge new branch to master branch with ```git merge new_branch```
* Delete merged branch with ```git branch -d new_branch```
* Push repo to GitHub with ```git push origin master```
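
The branch steps above can be run end to end as a script. This is a sketch under assumptions: it uses a throwaway repo with one commit on master, a hypothetical file `notes.txt` and a placeholder identity, and it leaves out the final push because no remote is configured.

```shell
#!/bin/sh
set -eu

# Assumption: a throwaway repo with one commit on master to branch from.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m 'initial commit'

# Create the branch, modify a file, stage and commit.
git checkout --quiet -b new_branch
echo "second line" >> notes.txt
git add notes.txt
git commit --quiet -m 'first changes'

# Back on master: merge the branch, then delete it.
git checkout --quiet master
git merge --quiet new_branch
git branch -d new_branch

git log --oneline
```

Because master has no new commits of its own, the merge here is a fast-forward; ```git branch -d``` only succeeds because the branch is already merged.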

#### Pull Requests
* Go to GitHub and fork the repository you want to contribute to
* Go to git bash and then to /c/users/christian/git
* Clone repository locally with ```git clone https://github.com/chriswegmann/git-initialization-best-practices```
* Go to newly created directory (the clone is already a git repo, so ```git init``` is not needed here)
* Add remote of originally forked repo with ```git remote add originally_forked https://github.com/LoKemper/git-initialization-best-practices```
* Check if we now have two remotes with ```git remote -v```
* Check out a new branch with ```git checkout -b branch_for_pr```
* Add / modify file
* Stage and commit (see above)
* Push branch to origin with ```git push origin branch_for_pr```
* Switch to master with ```git checkout master```
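* Delete local branch with ```git branch -D branch_for_pr``` (```-D``` forces the deletion, as the branch is not merged into master; the pushed branch stays on GitHub)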
* Go to GitHub (https://github.com/chriswegmann/git-initialization-best-practices/pull/new/branch_for_pr) and create a new pull request in the web interface
* To see the change in the originally forked repo once the pull request has been merged, go to that repo's folder in git bash and run ```git pull origin master``` again - voilà, the new / modified file is in the local folder.
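
The local part of this flow (everything except the GitHub web interface) can be rehearsed offline. In the sketch below, two local bare repos play the original repo and the fork; the paths, the file `change.txt` and the identity are assumptions for illustration only.

```shell
#!/bin/sh
set -eu

workdir="$(mktemp -d)"
cd "$workdir"

# Assumption: two local bare repos play the original repo and your fork of it.
git init --quiet seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed -c user.name="Example User" -c user.email="user@example.com" \
    commit --quiet --allow-empty -m 'initial commit'
git clone --quiet --bare "$workdir/seed" "$workdir/original.git"
git clone --quiet --bare "$workdir/original.git" "$workdir/fork.git"

# Clone the fork; the clone already has "origin" pointing at it.
git clone --quiet "$workdir/fork.git" "$workdir/local"
cd "$workdir/local"
git config user.name "Example User"
git config user.email "user@example.com"

# Second remote pointing at the originally forked repo.
git remote add originally_forked "$workdir/original.git"
git remote -v

# Work on a dedicated branch and push it to the fork.
git checkout --quiet -b branch_for_pr
echo "proposed change" > change.txt
git add change.txt
git commit --quiet -m 'first changes'
git push --quiet origin branch_for_pr
```

After the push, the branch exists in the fork but not in the original; the pull request on GitHub is what proposes moving it across.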

#### Update a Forked Repo with Changes from Original
* Starting position: we have forked a repo, and the owner of the original repo has since made a change. Our fork is now one commit behind, and we want to incorporate this change.
* Go to git bash, clone my forked repo with ```git clone https://github.com/chriswegmann/RedBadgePack``` and go to the newly created directory
* Add the original repo as an upstream remote with ```git remote add upstream https://github.com/Nachzeher/RedBadgePack```
* Fetch the data from the upstream remote using ```git fetch upstream```
* Update the local master with the fetched data from the upstream (original) repo with ```git merge upstream/master``` (alternatively, ```git pull upstream master``` fetches and merges in one step)
* Push to forked repo with ```git push```
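
The whole update can be simulated locally. This sketch assumes local repos in place of the two GitHub repos, empty placeholder commits and a placeholder identity; it merges with ```git merge upstream/master``` after the fetch.

```shell
#!/bin/sh
set -eu

workdir="$(mktemp -d)"
cd "$workdir"

# Assumption: local repos stand in for the original repo and the fork.
git init --quiet original
cd original
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git commit --quiet --allow-empty -m 'initial commit'
git clone --quiet --bare "$workdir/original" "$workdir/fork.git"

# The owner of the original repo adds a commit; the fork is now one behind.
git commit --quiet --allow-empty -m 'change by owner'

# Clone the fork, add the original as "upstream", fetch and merge.
git clone --quiet "$workdir/fork.git" "$workdir/local"
cd "$workdir/local"
git remote add upstream "$workdir/original"
git fetch --quiet upstream
git merge --quiet upstream/master

# Publish the updated master to the fork.
git push --quiet origin master

git log --oneline
```

Since the local master has no commits of its own, the merge is a fast-forward and needs no merge commit.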