# Reproducibility, the BASH console and Git

---

## Reproducibility: What is it, What does it mean, Why it's important

The concept of reproducibility is just that: **is your work reproducible**? Reproducing methods, analyses and results is important not only in scientific research but also in day-to-day activities. It is a major factor in determining whether **research results are consistent and valid**. If your work is reproducible, then **valid experiments and results build authentic discoveries that are more reliable** in your field and help develop a robust literature for future research. Beyond making your work accurate and reliable, reproducibility can also **encourage community involvement and participation**. _**Remember, if your foundation is shaky, then so is all the work built upon it.**_

<p align="center"> 
    <img src="./Figures/Reproducibility.jpeg"   width="596.667px"  height="459.661px" /> 
</p>

There are different types of reproducibility:

1. **Scientific** Reproducibility \- **Can your results be reproduced from your methods?**
2. **Computational** Reproducibility \- **Can your results be reproduced from your code?**
3. **Statistical** Reproducibility \- **Are your methods robust enough to reproduce your results?**

While talk of reproducibility is all good and dandy, it is not always achievable for everyone. There are significant barriers to reproducibility, including:

- **Costs**
- The structure of dissemination does not currently fully support reproducibility
- **Lack of training and incentives**
- **Takes time, held to a higher standard**
- Privacy and security
- Differences between fields

Your job as a scientist and a researcher is to do your best to make your work reproducible. **If that is beyond your means, do not hesitate to say so.** Science is about being open and transparent.

There are many tools and conventions to help with your reproducibility journey. These include a proper workflow and file organization system, as well as Git for version control.



## Workflows

One way to ensure reproducibility is to have a well-organized project workflow. A common approach is to organize your files by **project, i.e. a project-oriented workflow**. Here, each project that you are working on is **self-contained in its own folder**, with its **different data types in different subfolders**. For example, your raw data should be in a different folder than your processed data, and documents should be in yet another folder.

<p align="center"> 
    <img src="./Figures/File Organization.png"   width="596.667px"  height="459.661px" /> 
    <br>
    <em> Note the differences between the first pane and the other two. In the first pane, different file types sit in the same folder, which makes it difficult to find and filter through projects. In the second pane, each project is sorted into its own folder, with its own subdirectories based on analysis or function. </em>
</p>
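
You can create a project-oriented layout like the one in the second pane from the command line in one go. The sketch below uses a hypothetical project called `Salt_Tolerance` with `data/raw`, `data/processed`, `scripts`, `figures` and `docs` subfolders. These folder names are only examples. Everything is created inside a throwaway temporary folder made by `mktemp -d`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway location so the demo does not touch your real files
base=$(mktemp -d)
project="$base/Projects/Salt_Tolerance"   # hypothetical project name

# mkdir -p creates any missing parent folders; the {a,b,...} braces expand
# into one path per folder name
mkdir -p "$project"/{data/raw,data/processed,scripts,figures,docs}

# Raw data goes in data/raw; anything derived from it goes in data/processed
echo "sample,growth" > "$project/data/raw/chlamy_sample.csv"

# Show every folder in the project, sorted
find "$project" -type d | sort
```

Using underscores instead of spaces in folder names means you will not have to quote these paths later.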



## File Paths

Having a default file structure can also help with your own mental organization of your file paths. There are two types of file paths: **absolute and relative**. 

### Absolute File Paths

Absolute file paths **start at the root directory and name every directory on the way to your file**. For example, the absolute file path of the CSV in the figure above would look something like this: `/Users/yeshoda/Desktop/Projects/Salt Tolerance/Chlamy sample - Sheet1.csv`. Here `Chlamy sample - Sheet1.csv` is the name of my file, which is located in the `Salt Tolerance/` folder. This folder is in the `Projects/` folder, which is on my `Desktop/` (also a folder), within my user folder (`yeshoda/`), within the `Users/` folder in the **root** (`/`). The leading `/` is the root directory, and each following `/` **separates one directory (also known as a folder) from the next**. Absolute file paths are useful for pinpointing exactly where a file lives. However, **they break easily**. For example, if I moved `Projects/` to `Documents/` rather than `Desktop/`, my path would **no longer work and would cause an error**.
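
The sketch below recreates the folders from this example inside a temporary folder. That folder stands in for `/Users/yeshoda`, so the exact paths are hypothetical. It shows an absolute path breaking once `Projects/` moves from `Desktop/` to `Documents/`. Because the folder and file names contain spaces, the path has to be wrapped in quotes.

```bash
#!/usr/bin/env bash
set -euo pipefail

home_dir=$(mktemp -d)   # stands in for /Users/yeshoda (hypothetical)
mkdir -p "$home_dir/Desktop/Projects/Salt Tolerance"
echo "sample,growth" > "$home_dir/Desktop/Projects/Salt Tolerance/Chlamy sample - Sheet1.csv"

# Absolute path: every folder from the root (/) down to the file.
# Quotes keep the spaces from splitting the path into several words.
abs_path="$home_dir/Desktop/Projects/Salt Tolerance/Chlamy sample - Sheet1.csv"
if [ -f "$abs_path" ]; then echo "before move: found"; fi

# Move Projects/ from Desktop/ to Documents/
mkdir -p "$home_dir/Documents"
mv "$home_dir/Desktop/Projects" "$home_dir/Documents/"

# The old absolute path now points to a location that no longer exists
if [ -f "$abs_path" ]; then echo "after move: found"; else echo "after move: path is broken"; fi
```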

### Relative File Paths

Relative file paths are **relative to the directory you are currently working in**. The current directory is written as `./` and the parent directory as `../`. For example, if I am in the `Figures/` directory and want to go to the `References/` directory, I can type `cd ../References/`. Here I'm using the `cd` command, which stands for **change directory**, to move up into the parent folder `Salt Tolerance/` (`../`) and then into `References/`. The benefit of relative paths is that if I moved `Projects/` to `Documents/` rather than `Desktop/`, **my path would still work**, because it never mentions `Desktop/`.
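
This sketch performs the same move using a relative path. The layout is again recreated in a temporary folder. `References/refs.txt` is a hypothetical file, added only so there is something to read.

```bash
#!/usr/bin/env bash
set -euo pipefail

home_dir=$(mktemp -d)   # stands in for your user folder (hypothetical)
mkdir -p "$home_dir/Desktop/Projects/Salt Tolerance/Figures" \
         "$home_dir/Desktop/Projects/Salt Tolerance/References"
echo "example reference" > "$home_dir/Desktop/Projects/Salt Tolerance/References/refs.txt"

# Work from inside Figures/ and reach References/ through the parent folder (../)
cd "$home_dir/Desktop/Projects/Salt Tolerance/Figures"
cat ../References/refs.txt

# Move Projects/ from Desktop/ to Documents/, then work from the new Figures/
mkdir -p "$home_dir/Documents"
mv "$home_dir/Desktop/Projects" "$home_dir/Documents/"
cd "$home_dir/Documents/Projects/Salt Tolerance/Figures"

# The relative path never mentioned Desktop/, so it still works
cat ../References/refs.txt
```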

**There is no right or wrong file path format**! **Choose the one that works for you and for the type of project that you have.** If you know the hierarchy of your folders, relative paths might serve better. However, if you do not know the hierarchy but know where your file is in general then an absolute path might serve better. 

<p align="center"> 
    <img src="./Figures/File_Paths.jpeg"   width="596.667px"  height="459.661px" /> 
    <br>
    <em> devopsschool </em>
</p>

Just like variable names, **it helps to name your folders something meaningful.** Folders called `1/` and `2/` do not do much to help you when you need to find your files.



## BASH - Quick Recap

Before we dive into Git, let's refresh ourselves with some commonly used BASH commands. BASH (the Bourne-Again SHell) is a widely used Unix shell and command language, and you can use it in your computer's terminal. For note-keeping, we can also run BASH inside a Jupyter (Python) notebook! We just need to tell the notebook: hey! This cell is going to run BASH, so don't run it with Python! **We can do this with a cell magic command, denoted by `%%`.** Writing `%%bash` tells the notebook to run the entire cell in BASH. **It must be the first line in the cell, or it will not work.** Each `%%bash` cell runs in a fresh shell that starts in the notebook's folder. A `cd` in one cell therefore does not carry over to the next, which is why most of the Git cells below begin by moving into the lesson folder. Let's try it out:


In [1]:
%%bash
# %%bash on the first line turns this whole cell into a BASH cell
echo "Hello from BASH!"

Hello from BASH!


Amazing! Now let's go ahead and play with some commonly used commands:



In [2]:
%%bash
pwd # print working directory. Gives you the absolute file path of where you are
ls # list. Lists the files in your directory
ls -l # long list. Also shows permissions, owner, size and modification date of each file
mkdir temp/ # make directory. This makes a new folder
cd temp/ # change directory. Moves "you" into the folder/directory you specify
echo "Hello there!" > hello.txt # create hello.txt containing one line of text
# vim hello.txt # in a terminal you could instead write the file in the vim text editor:
# press i to enter insert mode and start typing. Once done, press esc, then type :wq to save and quit
less hello.txt # view the file contents (in a terminal, press q to quit less)
cat hello.txt # print the contents of the file to standard output
mv hello.txt ../ # move hello.txt into the parent directory
mv ../hello.txt ./hello_there.txt # move hello.txt back into the current directory and rename it to hello_there.txt
cp hello_there.txt hello_there_copy.txt # copy. Make a copy of the file
rm hello_there.txt # remove. Delete the file

cd ../ # go to the parent directory
rm -r temp/ # delete the temp folder and all its contents

/Users/princess/Downloads/10_reproducibility
BINF5503_Week10_Class
Figures
Master_week10_module5_Reproducibility.html.html
Master_week10_module5_Reproducibility.html.ipynb
Master_week10_module5_Reproducibility.html.pdf
STUDENT_week10_module5_Reproducibility.html.ipynb
STUDENT_week10_module5_Reproducibility.html_Master.ipynb
UofTCoders_git_lesson_old
UofT_Coders_Master_Reproducibility.ipynb
week10_git_lesson
total 3208
drwxr-xr-x@ 14 princess  staff      448 14 Nov 16:01 BINF5503_Week10_Class
drwxr-xr-x@ 15 princess  staff      480 14 Nov 16:42 Figures
-rw-rw-r--@  1 princess  staff    83583 13 Mar  2024 Master_week10_module5_Reproducibility.html.html
-rw-rw-r--@  1 princess  staff    35238  2 Apr  2024 Master_week10_module5_Reproducibility.html.ipynb
-rw-rw-r--@  1 princess  staff  1397902 13 Mar  2024 Master_week10_module5_Reproducibility.html.pdf
-rw-rw-r--@  1 princess  staff    29500  2 Apr  2024 STUDENT_week10_module5_Reproducibility.html.ipynb
-rw-rw-r--@  1 princess  staff    24

Hello there!
Hello there!


## Git

Now for the program of the hour! Git is a version control system that tracks changes to files. While it is most commonly associated with code, **it can be used for any file**, including documents, figures and presentations. It lets you edit your files freely while still being able to **revert to a previous version if the need arises**. Used in tandem with **GitHub**, a website that hosts Git repositories online, it is a great way **to collaborate on projects with others**. Note that you can use Git without a GitHub account, but GitHub is built on Git, so you need Git to work with GitHub repositories from your computer.

<p align="center"> 
    <img src="./Figures/Version_control.jpeg"   width="400px"  height="300px" /> 
</p>

### Configure git global settings on your local computer

Before we start, we need to tell Git who we are, so that each commit records who made it. We can do this with the `git config` command:



In [3]:
%%bash

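# first tell git your email. Use the email from your GitHub account (replace the example below with your own)
git config --global user.email "y.harrypaul@yahoo.com"

# Next your name (again, use your own)
git config --global user.name "Yeshoda Harry-Paul"

# Make main, rather than master, the default branch name for new repositories.
# This needs Git 2.28 or newer; otherwise new repositories start on master, as the outputs below still do
git config --global init.defaultBranch main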
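
To check what Git has recorded, you can also use `git config` to read settings back. Git keeps `--global` settings in `~/.gitconfig`. So that this sketch does not overwrite your real settings, it first points `HOME` at a throwaway folder. The name and email are placeholders.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Point HOME at an empty temporary folder so your real ~/.gitconfig is untouched
export HOME
HOME=$(mktemp -d)
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL   # make sure Git uses $HOME/.gitconfig

git config --global user.email "user@example.com"   # placeholder email
git config --global user.name "Example User"        # placeholder name
git config --global init.defaultBranch main

# Read one setting back
git config --global user.name

# List everything in the global config file
git config --global --list
```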

### Local git repo

Let's make a local git repository. A repository **\(or repo for short\)** is a storage area for a project containing all the files for the project and the history of the changes made to it. **This repo will be on your local computer.** 


In [4]:
%%bash 
# Make a new folder that we want to use:
mkdir UofTCoders_git_lesson
cd UofTCoders_git_lesson
pwd
ls

/Users/princess/Downloads/10_reproducibility/UofTCoders_git_lesson


In [5]:
%%bash 
# Initialize the git repository
cd UofTCoders_git_lesson
git init
ls -la # we see the new hidden .git folder
ls -la .git

Initialized empty Git repository in /Users/princess/Downloads/10_reproducibility/UofTCoders_git_lesson/.git/
total 0
drwxr-xr-x   3 princess  staff   96 14 Nov 17:00 .
drwxr-xr-x@ 23 princess  staff  736 14 Nov 17:00 ..
drwxr-xr-x   9 princess  staff  288 14 Nov 17:00 .git
total 24
drwxr-xr-x   9 princess  staff  288 14 Nov 17:00 .
drwxr-xr-x   3 princess  staff   96 14 Nov 17:00 ..
-rw-r--r--   1 princess  staff   23 14 Nov 17:00 HEAD
-rw-r--r--   1 princess  staff  137 14 Nov 17:00 config
-rw-r--r--   1 princess  staff   73 14 Nov 17:00 description
drwxr-xr-x  14 princess  staff  448 14 Nov 17:00 hooks
drwxr-xr-x   3 princess  staff   96 14 Nov 17:00 info
drwxr-xr-x   4 princess  staff  128 14 Nov 17:00 objects
drwxr-xr-x   4 princess  staff  128 14 Nov 17:00 refs


In [6]:
%%bash
cd UofTCoders_git_lesson
# Now that we have our git repo, lets add a file to it
touch first_file.txt
ls

# Let's see how git is doing by checking its status
git status

# git has noticed some untracked changes

first_file.txt
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	first_file.txt

nothing added to commit but untracked files present (use "git add" to track)


### Hold your horses! The staging environment, the commit and you

Before we get going on our git train, it's important to distinguish between your working directory, the staging area and a commit. A commit is **a snapshot recording the changes you've made since the last commit**. Commits are what allow you to jump between the different stages of your project. However, Git does not commit on its own: **you need to tell Git what you want to commit**. To do this, we first **add files to the staging area** with the command `git add`. Once you've put together a nice package of changes in the staging area, you can **commit them to your repo**, which is stored in the hidden `.git` folder. Let's see what this looks like.

<p align="center"> 
    <img src="./Figures/git_staging_area.png"   width="700px"  height="400px" /> 
</p>



In [7]:
%%bash
cd UofTCoders_git_lesson

# Now let's add our file to the staging environment
git add first_file.txt
git status # We see now that our file isn't committed but it is about to be

On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   first_file.txt



In [8]:
%%bash
cd UofTCoders_git_lesson

# Now let's commit our file. We have the option to also add a message using the -m flag. It is always beneficial to write something useful so that when you are looking back between your versions it is easy to find what you're looking for 
git commit -m "Created empty file first_file.txt" first_file.txt
git status # The working tree is now clean: our file is committed

[master (root-commit) 5061b3f] Created empty file first_file.txt
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 first_file.txt
On branch master
nothing to commit, working tree clean
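
You can watch a change move through the working directory, the staging area and a commit with `git status --short`, which prints a two-letter code for each file. The sketch below runs in a throwaway repository, not the lesson folder. The name and email are placeholders set only for that repository.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)            # throwaway repo, separate from UofTCoders_git_lesson
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, this repo only
git config user.email "user@example.com"

echo "hello" > notes.txt
git status --short   # "?? notes.txt": untracked, only in the working directory

git add notes.txt
git status --short   # "A  notes.txt": added to the staging area

git commit -q -m "Add notes.txt"
git status --short   # prints nothing: the change is stored in the .git folder
git log --oneline    # one commit
```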


In [9]:
%%bash
cd UofTCoders_git_lesson

# Great! You just created your first commit! Let's see what happens when we edit our file to add some information to it!
# vim is interactive, so it cannot run inside a notebook cell. Run these steps in a terminal instead:

# vim first_file.txt # add a line of text, then press esc and type :wq to save and quit

# git status # see that the file has been modified

# git diff first_file.txt # what changed since the last commit?

# git add first_file.txt # stage our changes
# git status

# git commit -m "Added some filler text to file"
# git status
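
If you would rather not leave the notebook, you can run the commented steps above non-interactively by appending text with `echo ... >>` instead of opening vim. This sketch does so in a throwaway repository, so the lesson repo's history shown by `git log` next is unchanged. The filler text is arbitrary.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
touch first_file.txt
git add first_file.txt
git commit -q -m "Created empty file first_file.txt"

# >> appends to the file (a single > would overwrite it)
echo "Some filler text" >> first_file.txt

git status --short            # " M first_file.txt": modified, not yet staged
git diff first_file.txt       # the new line appears with a leading +
git add first_file.txt        # stage the change
git diff --staged             # staged changes, compared against the last commit
git commit -q -m "Added some filler text to file"
git log --oneline             # two commits, newest first
```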

In [10]:
%%bash
cd UofTCoders_git_lesson

#let's see what we've done so far
git log

commit 5061b3f9de53b181a14f495c6f7ee45c07e2d763
Author: Yeshoda Harry-Paul <y.harrypaul@yahoo.com>
Date:   Thu Nov 14 17:01:02 2024 -0500

    Created empty file first_file.txt


### Branching in Git

Branching is a great way to work on a new idea, fix bugs or develop features **without affecting your `main` branch**. Branches are usually **temporary**: they let you edit and experiment without breaking your original codebase, while your `main` branch stays as it is. For example, say I had a website and wanted to add a new webpage. I can branch off my main branch, create the webpage, and merge it back once it is complete. This way, my website stays functional while I work on the new page. (In the outputs below, the main branch is still called `master`.)

<p align="center"> 
    <img src="./Figures/branches.png"   width="700px"  height="400px" /> 
</p>

Creating a branch has two steps: **1) making the branch and 2) moving to that branch**. Both steps can be combined into one, but let's take them one at a time. First, let's make a new branch called `new_branch`. Just like commits and variable names, it helps to give branches meaningful names so that you know what each branch is for.


In [11]:
%%bash
cd UofTCoders_git_lesson

git branch new_branch
git log
git status # notice that while we made a branch our status has not changed
git branch --list

# move to that branch
git checkout new_branch
git branch --list

git log
git status # now our branch has changed

commit 5061b3f9de53b181a14f495c6f7ee45c07e2d763
Author: Yeshoda Harry-Paul <y.harrypaul@yahoo.com>
Date:   Thu Nov 14 17:01:02 2024 -0500

    Created empty file first_file.txt
On branch master
nothing to commit, working tree clean
* master
  new_branch
  master
* new_branch
commit 5061b3f9de53b181a14f495c6f7ee45c07e2d763
Author: Yeshoda Harry-Paul <y.harrypaul@yahoo.com>
Date:   Thu Nov 14 17:01:02 2024 -0500

    Created empty file first_file.txt
On branch new_branch
nothing to commit, working tree clean


Switched to branch 'new_branch'
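
The two steps can be combined: `git checkout -b` creates a branch and switches to it at once. Git 2.23 and newer also offer `git switch -c` for the same job. The sketch uses a throwaway repository, and `another_branch` is just an example name.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # call the first branch main on any Git version
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"   # branches need at least one commit

# Two steps, as in the cell above
git branch new_branch
git checkout -q new_branch

# One step: create another_branch and move onto it
git checkout -q -b another_branch

git branch --list                  # the * marks the branch you are on
git rev-parse --abbrev-ref HEAD    # prints the current branch name: another_branch
```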


In [12]:
%%bash
cd UofTCoders_git_lesson

# Let's make some changes to our files

# edit first_file.txt (with vim, in a terminal; this edit was skipped in the run shown below)

cp first_file.txt second_file.txt # make a copy called second_file.txt
git status

# edit second_file.txt (also skipped below)

git commit -a -m "Added another line of text about the weather to file" # -a stages and commits changes to files git already tracks (like first_file.txt) but never untracked files like second_file.txt. Because first_file.txt was not edited here, there was nothing to commit

# Let's add and commit the other files
git add * # the shell expands * to every (non-hidden) file in the current folder, so you don't have to type them all individually
git commit -m "adding second file"

# now let's merge our branch back into the main one
# First head back to the main branch (still called master here)
git checkout master
ls

git branch --list

# Merge your branch
git merge new_branch

# Delete the branch since we no longer need it
git branch -d new_branch

git status

On branch new_branch
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	second_file.txt

nothing added to commit but untracked files present (use "git add" to track)
On branch new_branch
Untracked files:
	second_file.txt

nothing added to commit but untracked files present
[new_branch 9a7d541] adding second file
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 second_file.txt
first_file.txt
* master
  new_branch
Updating 5061b3f..9a7d541
Fast-forward
 second_file.txt | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 second_file.txt
Deleted branch new_branch (was 9a7d541).
On branch master
nothing to commit, working tree clean


Switched to branch 'master'
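
In the run above the edit steps were skipped, so only `second_file.txt` ended up on the branch. The sketch below does the whole loop with real edits in a throwaway repository, using `echo` in place of vim. The weather text is just filler. It also shows that `-a` stages only files Git already tracks, so a brand-new file still needs `git add`.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # call the first branch main on any Git version
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
touch first_file.txt
git add first_file.txt
git commit -q -m "Created empty file first_file.txt"

# Do the work on a branch
git checkout -q -b new_branch
echo "It is sunny today" >> first_file.txt
git commit -q -a -m "Added a line about the weather"   # -a stages the tracked, modified first_file.txt
cp first_file.txt second_file.txt                      # untracked: -a would skip it
git add second_file.txt
git commit -q -m "adding second file"

# Go back to main and bring the branch's commits in
git checkout -q main
git merge -q new_branch     # fast-forward, since main has no new commits of its own
git branch -d new_branch    # -d (unlike -D) refuses to delete unmerged work

ls                          # first_file.txt  second_file.txt
git log --oneline           # three commits
```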


### Going back to a previous commit

Sometimes we had versions of our code that worked better or documents that were worded more eloquently, but we no longer have them as we've modified the same file. With Git, **as long as the file was committed** we can revert to that version. Let's play around with it:


In [13]:
%%bash
cd UofTCoders_git_lesson

# Make a temporary file and commit it
touch temp.txt
git add temp.txt
git commit -m "adding temporary file to delete"

# Delete the file locally
rm temp.txt
ls # Check that our file is deleted
git status # while we deleted the file locally, it is still in our repo
# git rm temp.txt # would stage the deletion so it can be committed
# git restore --staged temp.txt # would unstage that deletion again (the file stays deleted locally)
git restore temp.txt # bring the file back from the last commit
git commit -m "deleted temporary file" # nothing to commit: restoring the file undid the deletion

# Make some minor changes
touch temp2.txt
git add temp2.txt 
git commit -m "added temp2.txt"

## Revert back to a previous commit
# Get the commit ID (the IDs below are examples; copy one from your own git log)
# git log --oneline
# git revert --no-commit 79c4a61..HEAD # undo every commit after 79c4a61; --no-commit lets you check the result before committing
# ls
# git commit -m "reverting back to other commit with temp.txt"

## Alternatively, go back but keep the changes. Unlike git revert, this removes the later commits from the history
# git reset --soft 8e4a7e0 # move the branch back; changes from the removed commits stay staged
# git restore --staged <file> # unstage a file
# git restore <file> # discard a file's changes in the working directory
# git commit -m "message"
# git log
# git log

[master 3168bd8] adding temporary file to delete
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 temp.txt
first_file.txt
second_file.txt
On branch master
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    temp.txt

no changes added to commit (use "git add" and/or "git commit -a")
On branch master
nothing to commit, working tree clean
[master b3f5db5] added temp2.txt
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 temp2.txt
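
The revert and reset steps above are commented out because they need commit IDs from your own history. The sketch below runs both in a throwaway repository. Instead of copying an ID from `git log`, it captures one with `git rev-parse HEAD`. `git revert` undoes later commits by adding a new commit. `git reset --soft` moves the branch back and leaves the undone changes staged.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

touch temp.txt
git add temp.txt
git commit -q -m "adding temp.txt"
good=$(git rev-parse HEAD)                 # the commit we want to go back to

touch temp2.txt
git add temp2.txt
git commit -q -m "added temp2.txt"

# git revert: undo every commit after $good by adding a NEW commit.
# Nothing is erased from the history, so this is safe for pushed work.
git revert --no-commit "$good"..HEAD
git commit -q -m "reverting back to the commit with only temp.txt"
ls                      # temp.txt
git log --oneline       # three commits, the revert on top

# git reset --soft: move the branch back one commit (dropping the revert commit
# from the history) but keep its change, the deletion of temp2.txt, staged
git reset --soft HEAD~1
git status --short      # "D  temp2.txt"
git log --oneline       # two commits
```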


## Integrating GitHub

Now that we have some of the basics under our belt, let's push something to GitHub. First, we need to create a personal access token so that we can push our work to a repository on GitHub.

Click on your profile picture on GitHub and navigate to `Settings`. Once there, scroll all the way down to `Developer settings`. Next, navigate to `Personal access tokens` > `Tokens (classic)` > `Generate new token`. Select the permissions (scopes) you want this token to have; the `repo` scope covers pushing to your repositories. Once you generate your token, **KEEP IT SOMEWHERE SAFE**. You will not be able to view it again.

<p align="center"> 
    <img src="./Figures/Personal Access Tokens.png"   width="300px"  height="300px" /> 
</p>


Now that we have our token, we can create a new repo on GitHub. We can do this by logging in, pressing the plus button (create new) and selecting "New repository".

<p align="center"> 
    <img src="./Figures/GitHub Repo.png"   width="700px"  height="400px" /> 
</p>

Next we can go ahead and name our repository and add a brief description. Once that is done go ahead and press "Create repository". 

<p align="center"> 
    <img src="./Figures/Naming_repo.png"   width="700px"  height="400px" /> 
</p>

GitHub gives us the option of pushing an existing repository from the command line. Let's push our `UofTCoders_git_lesson` folder to GitHub (replace the URL below with the one GitHub shows for your own repository):

In [None]:
%%bash
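cd UofTCoders_git_lesson

git remote add origin https://github.com/YeshodaHP/UofTCoders_Git_Lesson_2024.git # link the local repo to the GitHub repo, nicknamed origin
git branch -M main # rename the current branch (master) to main
git push -u origin main # push main to GitHub and remember origin/main as its upstream

# This will prompt you for your GitHub username and password: paste your personal access token as the password
# To generate a token go to Settings > Developer settings > Personal access tokens > Generate new token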
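
To practise `remote add`, `push -u` and `push` without a GitHub account or a token, you can use a bare repository on your own disk as a stand-in for the GitHub repo. In the sketch below the folder names are hypothetical, and the remote URL is a local path rather than an `https://github.com/...` address. The commands are otherwise the ones from the cells in this section.

```bash
#!/usr/bin/env bash
set -euo pipefail

base=$(mktemp -d)
# A bare repo has no working files, just the history, much like the empty repo on GitHub
git init -q --bare "$base/remote.git"

git init -q "$base/local"
cd "$base/local"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello" > first_file.txt
git add first_file.txt
git commit -q -m "Created first_file.txt"

git remote add origin "$base/remote.git"   # a local path instead of a GitHub URL
git branch -M main                         # rename the current branch to main
git push -q -u origin main                 # -u makes origin/main this branch's upstream

# Later work only needs add, commit and a plain git push
echo "pushed from the command line" > github.txt
git add github.txt
git commit -q -m "Adding new github.txt file"
git status -sb       # "## main...origin/main [ahead 1]": one commit not pushed yet
git push -q
git status -sb       # "## main...origin/main": in sync again
```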

Great! Now we can see our files and commits on GitHub! Note that from now on, to place our changes on GitHub (the remote repository), we have to `push` them there. Let's try creating a file and pushing it to GitHub.

<p align="center"> 
    <img src="./Figures/git_workflow.png"   width="700px"  height="400px" /> 
</p>

Once the cell below has run, you should be able to see your new file in your repo on GitHub!

In [None]:
%%bash
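cd UofTCoders_git_lesson

# Make a new file, add and commit (in a terminal you could write it with vim github.txt instead)
echo "This file was pushed from the command line" > github.txt
git add github.txt
git commit -m "Adding new github.txt file"

# now push it to GitHub
git push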

## Collaborative Repos
Great! Now that we've pushed files to GitHub, let's use it collaboratively. There are two common collaborative workflows:
1. The Shared Repository Workflow
	- Everyone with push access clones the same repository, works on their own branch, and then merges their branch into `main`
2. The Fork and Pull Workflow
	- Contributors do not need push access to the project's repository. Each contributor forks (copies) the repository to their own GitHub account, pushes changes to their fork, and opens a pull request (PR). The project owner reviews the PR and decides whether to merge the changes
    
Let's see how the shared repo workflow looks!

### Shared Repo
Let's try creating a file, named after you, that contains something fun about you, and push it to Yeshoda's GitHub repo (you will need to have been added as a collaborator). First, clone her repo at https://github.com/YeshodaHP/UofTCoders_Git_Lesson_2024.git. Next, create a branch with your name, create your file, commit and push!

Remember to always pull before you start working, so that your local copy has the most recent commits!

In [None]:
%%bash
git clone https://github.com/YeshodaHP/UofTCoders_Git_Lesson_2024.git # clone the repo
cd UofTCoders_Git_Lesson_2024 # move into the cloned repo

git fetch # fetch new work done by others; it does not merge it into your branch

git merge # merge the fetched work into your current branch

git pull # fetch + merge in one step. Make sure you commit your own work before pulling

# Create a branch named after you (replace Yeshoda_HP) and add a new file with your name and something fun about you!
git branch Yeshoda_HP
git checkout Yeshoda_HP
echo "Something fun about me" > Yeshoda_HP.txt # or write it in a terminal with vim Yeshoda_HP.txt
git add Yeshoda_HP.txt
git commit -m "Yeshoda is adding her file"
git push --set-upstream origin Yeshoda_HP

# go back to the main branch and delete your local branch (it stays on GitHub)
git checkout main
git branch -d Yeshoda_HP
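
You can also rehearse the shared repository workflow entirely on your own computer. In this sketch a bare repository plays the shared GitHub repo, and two clones play the owner and a student. The names, emails and file contents are hypothetical. On GitHub, the last step (merging the student's branch) would more often happen through a pull request.

```bash
#!/usr/bin/env bash
set -euo pipefail

base=$(mktemp -d)
git init -q --bare "$base/shared.git"                               # stand-in for the shared GitHub repo
git --git-dir="$base/shared.git" symbolic-ref HEAD refs/heads/main  # clones will start on main

# The owner puts a first commit on main
git clone -q "$base/shared.git" "$base/owner"
cd "$base/owner"
git symbolic-ref HEAD refs/heads/main      # commit onto main in this (still empty) clone
git config user.name "Owner"; git config user.email "owner@example.com"
echo "Class repo" > README.md
git add README.md
git commit -q -m "Add README"
git push -q -u origin main

# A student clones, works on their own branch and pushes it
git clone -q "$base/shared.git" "$base/student"
cd "$base/student"
git config user.name "Student"; git config user.email "student@example.com"
git checkout -q -b Student_Name
echo "I enjoy hiking" > Student_Name.txt
git add Student_Name.txt
git commit -q -m "Student is adding their file"
git push -q --set-upstream origin Student_Name

# The owner fetches (nothing is merged yet), then merges the branch into main
cd "$base/owner"
git fetch -q
git branch -r                          # origin/Student_Name is now listed
git merge -q origin/Student_Name
git push -q
ls                                     # README.md  Student_Name.txt
```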

## Recap
Amazing job! Today we learned about:
- Reproducibility and why it is important
- File hierarchies
- The BASH console and some common commands
- An intro to Git, where we learned:
    - How to make a local repository
    - How to commit changes
    - How to create branches
    - How to revert to previous commits
- GitHub:
    - How to connect and clone
    - How to collaborate with other users
    
We did a lot of Git today, and we know there is a lot of jargon surrounding Git, so here is a brief recap:

| Term      | Description |
| ----------- | ----------- |
| Add | Adding your changes to the “Staging” area|
| Branch | A parallel version of a repository where changes can be made that do not affect the main repository. New branches are often used to test changes or new ideas, which can later be merged with the base branch|
| Checkout | Moving from one branch (or commit) to another|
| Clone | Make a copy of a GitHub repo on your local computer|
| Commit | Takes the files in your staging area and stores their changes permanently in your Git repository|
| Fetch | Downloads new commits from the remote repo without merging them into your local files|
| HEAD | A pointer to the commit you currently have checked out, usually the latest commit on your current branch. There is only one HEAD at a time. Using checkout on a specific commit, rather than a branch, leaves you with a "detached HEAD"|
| Merge | Incorporate changes into the branch you are on|
| Pull | A combination of fetch and merge. This fetches the files from the remote repo and merges them with your local files|
| Pull Request | Term used in collaboration. You “issue a pull request” to the owner of the upstream repo asking them to pull your changes into their repo (accept your work)|
| Push | Sync your file changes to the remote repo on GitHub|

As always if you have any questions feel free to email both Yeshoda and Vicki! Happy coding, see you next week!
