# Stage 2: Where do I go next?

Have you ever wondered:

* Where did I leave off?
* What needs to get done next?
* Where do I find that again?

If so, this stage of the Quest is for you.

## Reproducibility Issues
* (WHERE-WAS-THAT): Difficulty finding where you put that file or that work? `My-Notebook-Copy-4.ipynb` maybe? or was it `My-Notebook-Copy-2.ipynb`? `ouput.png`?
* (MY-DOG-ATE-MY-HOMEWORK): Anything not checked in to a repo. Your computer died and took everything with it.
* (GIT-NIGHTMARE): A git snotball of a mess. Complicated merge conflicts. Mostly caused by underlying git workflow issues.


### Default Better Principles
On most data science projects, we don't need all of software engineering, but some of it really comes in handy. Especially when it comes to saving things for future retrieval, and avoiding the awkward "my computer ate my homework" moments. Git+GitHub works great for this. So do default project organization schemes. And intentional default workflows.

* **Leave a bread crumb trail with revision control**: Always save your work. This is the way to do it with code. Even better...
* **Use a collaborative git+GitHub workflow**: Tools like GitHub exist to make multi-user git workflows not suck. In fact, with some basic conventions in place with your collaborators, it can make working together on code great.
* **Use a default project/repo organization**: Putting the same kinds of things in the same places across all of your projects means that there's one less thing to think about.


## The Easydata Way: `git-workflow.md` and the default project organization

In our experience through many workshops, one of the biggest barriers to using multi-user git is the anxiety about what and how to do things. Where did things go when I changed branches? How do I resolve merge conflicts? How do I use a workflow that helps to resolve the nastiest merge conflicts?

Yes, it can be tricky, but it doesn't have to be once you get used to a few best practices and habits with git, and once you follow a shared multi-user git workflow.

Here's how we do it (also found in `reference/easydata/git-configuration.md`).

## Git Configuration
When sharing a git repo with a small team, your code usually lives in at least 3 different places:

* "local" refers to any git checkout on a local machine (or JupyterHub instance). This is where you work most of the time.
* `upstream` refers to the shared Easydata repo on github.com, i.e. the **team repo**.
* `origin` refers to your **personal fork** of the shared Easydata repo. It also lives on github.com.

### Create a Personal Fork

We strongly recommend you make all your edits on a personal fork of this repo. Here's how to create such a fork:

* On GitHub or GitLab, press the Fork button in the top right corner.
* On Bitbucket, press the "+" icon on the left and choose **Fork this Repo**.

### Local, `origin`, and `upstream`
git calls `upstream` (the **team repo**) and `origin` (your **personal fork** of the team repo) "remotes". Here's how to set them up.

Create a local git checkout by cloning your personal fork:
```bash
git clone git@github.com:<your_git_handle>/easydata-tutorial.git
```
Add the team (shared) repo as a remote named `upstream`:
```bash
cd easydata-tutorial
git remote add upstream git@github.com:<upstream-repo>/easydata-tutorial.git
```

You can verify that these remotes are configured correctly by typing:

```
$ git remote -v
origin	git@github.com:<your_git_handle>/easydata-tutorial.git (fetch)
origin	git@github.com:<your_git_handle>/easydata-tutorial.git (push)
upstream	git@github.com:<upstream-repo>/easydata-tutorial.git (fetch)
upstream	git@github.com:<upstream-repo>/easydata-tutorial.git (push)
```
Or, if you use HTTPS-based authentication, the output will look like this instead:
```
origin	https://github.com/<your_git_handle>/easydata-tutorial.git (fetch)
origin	https://github.com/<your_git_handle>/easydata-tutorial.git (push)
upstream	https://github.com/<upstream-repo>/easydata-tutorial.git (fetch)
upstream	https://github.com/<upstream-repo>/easydata-tutorial.git (push)
```
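
If you'd rather check this from a script than by eye, here is a minimal sketch. The function name `check_remotes` is made up for illustration and is not part of Easydata; it only assumes that your remotes are named `origin` and `upstream`, as described above, and that your git is recent enough to have `git remote get-url`.

```bash
#!/usr/bin/env bash
# check_remotes: hypothetical helper (not part of Easydata).
# Reports whether the current checkout has both an `origin`
# and an `upstream` remote, and prints the URL of each one found.
check_remotes() {
    local status=0 name url
    for name in origin upstream; do
        if url=$(git remote get-url "$name" 2>/dev/null); then
            echo "ok: $name -> $url"
        else
            echo "missing: $name" >&2
            status=1
        fi
    done
    # Non-zero exit status if either remote is missing.
    return "$status"
}
```

Source the function (or paste it into your shell), then run `check_remotes` from anywhere inside your checkout. A non-zero exit status means at least one of the two remotes still needs to be added.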

### Do Your Work in Branches
To make life easiest, we recommend you do all your development **in branches**, and use your main branch **only** for tracking changes in the shared `upstream/main`. This combination makes it much easier to stay up to date with changes in the shared project repo, and to submit Pull/Merge Requests (PRs) against the upstream project repository should you want to share your code or data.
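
To get a feel for this without touching a real repo, here is a sandbox sketch of the branch-based pattern. It is an illustration under assumptions, not the Easydata cheat sheet itself: local bare repos (`team.git`, `fork.git`) stand in for the team repo and your fork on github.com, and the branch name `my-feature`, the file `notes.md`, and the demo identity are all made up.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity and directory so nothing touches your real repos.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
sandbox=$(mktemp -d)
cd "$sandbox"

# 1. Stand-in team repo with a single commit on main.
git init --quiet seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "# demo" > seed/README.md
git -C seed add README.md
git -C seed commit --quiet -m "Initial commit"
git clone --quiet --bare seed team.git

# 2. Stand-in personal fork, cloned locally, with the team repo as `upstream`.
git clone --quiet --bare team.git fork.git
git clone --quiet fork.git work
cd work
git remote add upstream "$sandbox/team.git"

# 3. Do the work on a branch, never on main, and push it to the fork.
git checkout --quiet -b my-feature
echo "notes" > notes.md
git add notes.md
git commit --quiet -m "Add notes"
git push --quiet origin my-feature

# 4. main only ever fast-forwards to upstream/main.
git checkout --quiet main
git fetch --quiet upstream
git merge --quiet --ff-only upstream/main

# Show the local branches; the sandbox is left on disk for you to poke at.
git branch --list
echo "Sandbox left in $sandbox (delete it when you are done)"
```

Because `main` never receives your own commits, the `--ff-only` merge in step 4 cannot produce a merge conflict: either `main` fast-forwards to `upstream/main`, or git refuses and tells you that something was committed to `main` by mistake.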

### A Useful Git Workflow
Once you've got your local checkout and your `origin` and `upstream` remotes configured, you can follow the instructions in this handy [Git Workflow Cheat Sheet](../reference/easydata/git-workflow.md) to keep your working copy of the repo in sync with the others.


For now, take a peek at the [Git Workflow Cheat Sheet](../reference/easydata/git-workflow.md). No need to look too carefully; just have a browse and make sure you know where to find it.

## Project Organization
One of the features we love about originally basing Easydata on cookiecutter-datascience is the thoughtful project organization structure and description that it automagically includes as part of the README file. And while we've made our own tweaks, we haven't strayed far from the original. Every Easydata repo comes with a default project organization so you never ever have to think about where to find something.

**Next stop**: explore the (mostly) default Easydata project organization at the bottom of the `README.md` for this repo. If you look closely, you'll find the next task on your Quest for Reproducibility.
