# Project Structure, Git, and GitHub

## Definitions

### Git
- Git is a version control tool that tracks changes in your project.
- without git, every version of the project ends up as a separate file.
    - ex: project_file_v1, project_file_v2, project_file_v3, etc.
- with git, each version is saved as a snapshot of the project (called a commit) instead of a new file.
    - ex: commit 1, commit 2, commit 3, etc.
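
To make the snapshot idea concrete, here is a minimal sketch that commits two versions of one file in a throwaway repository and then reads the older version back from history. The file name `notes.txt`, the commit messages, and the identity values are placeholders chosen for illustration.

```bash
# Work in a throwaway directory so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
# Placeholder identity, needed only so the commits can be created.
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "commit 1"

echo "version 2" > notes.txt
git add notes.txt
git commit -q -m "commit 2"

# One file on disk, two snapshots in history.
commit_count="$(git rev-list --count HEAD)"
old_version="$(git show HEAD~1:notes.txt)"
echo "commits: $commit_count"
echo "previous snapshot contains: $old_version"
```

The working folder only ever holds the latest `notes.txt`; the earlier content stays reachable through the commit that recorded it.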

### Repository 
- to track a project's files and their versions (changes) using git, a git repository must be initialized in the project folder
- when a repo is initialized, a hidden `.git` folder is created inside the project folder; it stores the commit history and the list of staged files
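
A quick way to see the hidden folder, sketched in a temporary directory (the directory itself is just a stand-in for a real project folder):

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
# The listing now includes the hidden .git folder that holds the repository data.
ls -a
# Ask Git where the repository data lives, relative to the current folder.
git_dir="$(git rev-parse --git-dir)"
echo "$git_dir"
```

Deleting that folder removes the history and turns the project back into a plain directory, so it is best left alone.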

### Staged Files
- any file in a repo can be tracked; however, not all files need to be tracked (ex: raw data)
- in a new repo all files are untracked (git is not tracking them yet)
- git starts tracking a file when it is staged
- note (un-staged files): git is meant for tracking changes to code; adding data files will bloat the repo
- note: GitHub limits files uploaded through the browser to 25 MB and rejects pushed files larger than 100 MB; git itself has no fixed size limit for a local repo, but large repos are slow to clone and push, so stage only the necessary files
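
Before staging, it can help to list large files that probably should stay out of the repo. A minimal sketch, assuming a threshold of 50 MB (an arbitrary value picked for illustration, written in bytes so the `find` call stays portable) and a dummy file `raw_data.bin` that exists only for this demonstration:

```bash
project="$(mktemp -d)"
cd "$project"
echo "print('hello')" > script.py
# Create a 60 MB dummy data file (sparse, so it is quick to make).
dd if=/dev/zero of=raw_data.bin bs=1024 count=0 seek=61440 2>/dev/null

# List files larger than the threshold, skipping Git's own .git folder.
threshold_bytes=$((50 * 1024 * 1024))
large_files="$(find . -type f -size +"${threshold_bytes}"c ! -path './.git/*')"
echo "$large_files"
```

Anything this prints is a candidate for the ignore list rather than for `git add`.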

### Git Ignore 
- to avoid staging certain files or file types, list them in a `.gitignore` file; git skips every untracked file that matches a listed pattern when staging

### Commits
- after staging all the files that need to be tracked, we can commit; after each later change to a file, stage it again and create a new commit
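
One detail worth seeing in practice: a commit records only what was staged. The sketch below changes two files but stages just one of them; the file names and contents are made up for the example.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "a = 1" > file_1.py
echo "b = 1" > file_2.py
git add file_1.py file_2.py
git commit -q -m "add both files"

# Change both files, but stage only file_1.py.
echo "a = 20" > file_1.py
echo "b = 20" > file_2.py
git add file_1.py
git commit -q -m "update file_1"

# file_2.py is still listed as modified because its change was not staged.
pending="$(git status --short)"
echo "$pending"
```

Running `git add file_2.py` followed by another commit would record the remaining change.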

### Push Repo
- when creating a commit, the project snapshot is saved locally on the system
- to share the project online, we can publish the local repo to a new remote repository on GitHub
- to update the online repo with the latest commits, we commit locally and then push to GitHub
- note: you must have a GitHub account and authenticate when pushing; GitHub no longer accepts the account password for git operations, so use a personal access token (over HTTPS) or an SSH key
- note: when pushing a local repo to an existing GitHub repo, the GitHub repo URL must be specified
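
The local-versus-remote split can be tried without a GitHub account. In this sketch a local bare repository stands in for the GitHub repo (an assumption made so the example runs offline); with a real project, the path passed to `git remote add` would be the repo URL instead.

```bash
work="$(mktemp -d)"
# A local bare repository plays the role of the GitHub repo in this sketch.
git init -q --bare "$work/remote.git"

git init -q "$work/project"
cd "$work/project"
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"
echo "# demo" > README.md
git add README.md
git commit -q -m "first commit"

# Register the remote under the alias "origin", then push the current branch.
git remote add origin "$work/remote.git"
branch="$(git symbolic-ref --short HEAD)"
git push -q -u origin "$branch"

# After the push, the remote branch points at the same commit as the local one.
local_commit="$(git rev-parse HEAD)"
remote_commit="$(git --git-dir="$work/remote.git" rev-parse "refs/heads/$branch")"
echo "local:  $local_commit"
echo "remote: $remote_commit"
```

Until the push runs, the commit exists only in the local repo; afterwards both sides report the same commit hash.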


## Project structure
- every project should have a consistent file structure so that it is easy to navigate
- future improvement: read the Cookiecutter Data Science docs and watch some videos
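
As a starting point, a layout can be scaffolded with a few commands. The folder names below are hypothetical and only illustrate the idea of a fixed structure; they are not taken from any particular template.

```bash
project="$(mktemp -d)/my_project"
# Hypothetical folder names; adapt them to the project.
mkdir -p "$project/data" "$project/notebooks" "$project/src" "$project/reports"
touch "$project/README.md" "$project/.gitignore"
# Keep bulky raw data out of version control.
echo "data/" >> "$project/.gitignore"
ls -A "$project"
```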

## References
- [youtube full git tutorial](https://www.youtube.com/watch?v=fQLK8Ib_SKk&list=RDCMUCW5YeuERMmlnqo4oq8vwUpg&start_radio=1)
- [gitignore video](https://www.youtube.com/watch?v=L8l89nUFggU)
- [git reset article](https://devconnected.com/how-to-unstage-files-on-git/)
- [git commit and push timestamp](https://youtu.be/fQLK8Ib_SKk?t=503)

## Project Workflow
1. navigate to the project folder
```bash
$ cd 'example path/project folder'
```
2. initialize the project repository 
```bash
$ git init
```
3. check files (the command displays staged and un-staged files)
```bash
$ git status
```
4. create a gitignore file 
```bash
$ touch .gitignore
```
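5. add the files / file extensions / folders to ignore and their exceptions (ex: you can ignore all files with the extension .txt except the ones you list as exceptions)
- example file (`#` starts a comment only at the beginning of a line, so each comment sits on its own line):
```txt
# ignore all .txt files
*.txt
# except file_text.txt (don't ignore it)
!file_text.txt
# ignore all files in folder_A
folder_A/
# ignore the api_key file (already matched by *.txt; listed to be explicit)
api_key.txt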
```
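
To check that ignore rules behave as intended, the sketch below recreates the rules from the example in a temporary repo, with each comment on its own line because `.gitignore` treats `#` as a comment marker only at the start of a line. The extra files (`notes.txt`, `main.py`, `folder_A/data.csv`) are invented for the test.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
mkdir folder_A
touch notes.txt file_text.txt api_key.txt main.py folder_A/data.csv

cat > .gitignore <<'EOF'
# ignore all .txt files
*.txt
# except file_text.txt
!file_text.txt
# ignore everything in folder_A
folder_A/
EOF

# Stage everything, then list what Git actually picked up.
git add .
staged="$(git ls-files)"
echo "$staged"

# check-ignore exits with status 0 when a path is ignored.
git check-ignore -q notes.txt && echo "notes.txt is ignored"
```

Only `.gitignore`, `file_text.txt`, and `main.py` end up staged; the other `.txt` files and the contents of `folder_A` are skipped.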
6. stage files
- stage multiple files
```bash
$ git add file_1.py file_2.py
```
- stage all files
```bash
$ git add .
```
- un-stage a file
```bash
$ git reset -- file_name
```
- un-stage all files
```bash
$ git reset
```
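
The effect of staging and un-staging shows up in the two-letter codes of `git status --short`. A minimal sketch in a temporary repo, using made-up file contents:

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"
echo "print('v1')" > file_1.py
git add file_1.py
git commit -q -m "initial commit"

echo "print('new')" > file_2.py
before="$(git status --short)"      # "?? file_2.py": untracked

git add file_2.py
staged="$(git status --short)"      # "A  file_2.py": staged for the next commit

git reset -q -- file_2.py
after="$(git status --short)"       # back to "?? file_2.py"

printf '%s\n%s\n%s\n' "$before" "$staged" "$after"
```

Un-staging does not delete or modify the file itself; it only removes it from the next commit.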
7. check staged files
```bash
$ git status
```
8. commit
```bash
$ git commit -m "commit message"
```
9. check the commit history
```bash
$ git log
```
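
For longer histories, a condensed view is often easier to scan. This sketch builds three placeholder commits in a temporary repo and prints them in compact form:

```bash
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"
for n in 1 2 3; do
    echo "change $n" > notes.txt
    git add notes.txt
    git commit -q -m "commit $n"
done

# One line per commit, newest first: abbreviated hash plus message.
git log --oneline
# Only the message of the most recent commit.
latest="$(git log -1 --format=%s)"
echo "$latest"
```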

10. add the remote repo URL under an alias (by convention the alias is "origin")
```bash
$ git remote add origin repo_url
```
11. push to the remote repo (replace `master` with your branch name if it differs, ex: `main`)
```bash
$ git push origin master
```