<a href="https://colab.research.google.com/github/aaubs/ds-master/blob/main/notebooks/M1_Colab_GitHub_Drive_Kaggle.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##Step 1: Use a Colab notebook as a shell
- Visit the Google Colaboratory website.
- Click the **New Notebook** button. A blank notebook is created and opened.

##Step 2: Mount Google Drive to Google Colab Notebook
Run the code below to mount your Google Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


<img src="https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/M1_Colab_GitHub_1.png" alt="Image" width="400" height="250"/>
<img src="https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/M1_Colab_GitHub_2.png" alt="Image" width="400" height="250"/>
<img src="https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/M1_Colab_GitHub_3.png" alt="Image" width="400" height="250"/>
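Before moving on, it can help to confirm that the mount actually succeeded. The following is a minimal sketch, not part of the original notebook: the helper name `drive_is_mounted` is hypothetical, and it only assumes that a mounted Drive exposes a `MyDrive` folder under the mount point, as the paths in this tutorial suggest.

```python
import os


def drive_is_mounted(mount_point="/content/drive"):
    """Return True if a MyDrive folder is visible under the mount point.

    Hypothetical helper: it only checks the folder layout used in this
    tutorial and does not talk to Google Drive itself.
    """
    return os.path.isdir(os.path.join(mount_point, "MyDrive"))


# Prints True inside Colab after a successful drive.mount(...), False otherwise.
print(drive_is_mounted())
```

If this prints `False` in Colab, rerun the mount cell before continuing.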


##Step 3: Change present working directory
The ```%cd``` command below sets the present working directory to:
> /content/drive/MyDrive/GitHub

In [None]:
%cd /content/drive/MyDrive/GitHub/

/content/drive/MyDrive/GitHub


> Note: Your Google Drive’s Home directory is at: /content/drive/MyDrive/
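The ```%cd``` command fails if the ```GitHub``` folder does not exist yet. A small sketch of how the folder could be created safely beforehand is shown here; the helper name `ensure_workdir` is an assumption made for illustration, and `exist_ok=True` avoids the "File exists" error that a plain ```mkdir``` raises on a rerun.

```python
import os


def ensure_workdir(drive_home, folder="GitHub"):
    """Create <drive_home>/<folder> if it is missing and return its path.

    Hypothetical helper; the folder name "GitHub" follows this tutorial.
    """
    path = os.path.join(drive_home, folder)
    # exist_ok=True makes the call safe to rerun on an existing folder.
    os.makedirs(path, exist_ok=True)
    return path


# Usage in Colab (after mounting Drive):
# workdir = ensure_workdir("/content/drive/MyDrive")
# %cd $workdir
```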



##Step 4: Generate GitHub Access Token
Now it's time to generate your GitHub token, which is used to authenticate with GitHub when you push to or clone a repository.

1. Visit the [GitHub](https://github.com/settings/profile) website and log in to your account.
2. Go to **Settings**, navigate to **Developer settings**, and then click **Personal access tokens**.
3. Click the **Generate new token** button in the top right corner of the page.
4. Select the **repo** checkbox in the **Select scopes** section.
5. Click the **Generate token** button at the bottom of the page.
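A token works like a password, so it is safer not to type it into a notebook cell that gets saved or shared. The sketch below shows one possible way to load it at runtime. The function name `load_github_token` and the environment variable name `GITHUB_TOKEN` are assumptions for illustration, not something the original notebook uses.

```python
import os
from getpass import getpass


def load_github_token(env_var="GITHUB_TOKEN", prompt=getpass):
    """Return the token from an environment variable, else ask for it.

    env_var is a hypothetical variable name; getpass hides the typed
    value so it does not end up in the notebook output.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        token = prompt("GitHub token: ").strip()
    if not token:
        raise ValueError("no GitHub token provided")
    return token


# Usage: git_token = load_github_token()
```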

From here, there are two possible scenarios:

*   A. Create **a new git repository** from scratch
*   B. Clone **an existing git repository** from GitHub



##Step 5.A: Create a new Git repository

Follow the steps below to create a new Git repository from scratch directly in your Google Drive.

####Step 5.A.1: Initialize a new Git repository
Initialize the repository using ```git init <directory>```. In this tutorial, we use a repository named **titanic**.
- Change your working directory to the created repository.
- List the files and folders using the ```ls``` command.

<img src="https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/M1_Colab_GitHub_git_1.png" alt="Image" width="600" height="250"/>

In [None]:
%pwd

'/content/drive/MyDrive/GitHub'

In [None]:
%mkdir titanic

mkdir: cannot create directory ‘titanic’: File exists


In [None]:
%ls

titanic/


In [None]:
!git init titanic/

Reinitialized existing Git repository in /content/drive/MyDrive/GitHub/titanic/.git/


In [None]:
%cd titanic/

/content/drive/MyDrive/GitHub/titanic


In [None]:
%ls -a

.git/  .ipynb_checkpoints/  titanic.csv  titanic_notebook.ipynb


####Step 5.A.2: Working with Git repository
- It’s time to add files and folders to our working directory.

To download the dataset from Kaggle, follow the steps below carefully.



***Step 1: Create your Kaggle API Token:***
- Go to **Your Profile** and click **Edit Profile**.
- Scroll down to the **API** section and click the **Create New API Token** button.
- A file named ```kaggle.json``` is downloaded, containing your username and token key.


<img src="https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/M1_Kaggle_GitHub_1.png" alt="Image" width="500" height="150"/>

***Step 2: Upload kaggle.json to Google Drive***
- Create a folder in Google Drive (in this example, ```Kaggle```) where the Kaggle datasets will be stored.
- Upload the downloaded ```kaggle.json``` file to that folder.

<img src="https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/M1_Kaggle_GitHub_2.png" alt="Image" width="400" height="200"/>

***Step 3: Configure Kaggle***

The code below sets the Kaggle configuration directory to the folder that contains ```kaggle.json```.
> Note: The code below uses the folder ```/content/drive/MyDrive/API_Tokens```. If you uploaded ```kaggle.json``` to a different folder (for example ```/content/drive/MyDrive/Kaggle```), use that path instead.

In [None]:
import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content/drive/MyDrive/API_Tokens"
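If the path is wrong, the ```kaggle``` command fails later with an authentication error, so a quick check at this point can save time. The sketch below is illustrative only: `check_kaggle_config` is a hypothetical helper, and it assumes that ```kaggle.json``` is a JSON object holding a `username` and a `key` field, matching the description of the file in Step 1.

```python
import json
import os


def check_kaggle_config(config_dir):
    """Return True if config_dir holds a kaggle.json with both credentials.

    Hypothetical helper: it only inspects the file, it does not contact Kaggle.
    """
    path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(path):
        return False
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except json.JSONDecodeError:
        return False
    # Both fields must be present and non-empty.
    return isinstance(data, dict) and all(data.get(k) for k in ("username", "key"))


# Usage: check_kaggle_config(os.environ["KAGGLE_CONFIG_DIR"])
```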

***Step 4: Download the Kaggle datasets***

You can now download either a regular dataset or a competition dataset. Depending on your requirements, follow the steps below:
- Go to the [Kaggle dataset page](https://www.kaggle.com/datasets/heptapod/titanic) and click **Copy API Command** as shown:




<img src="https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/M1_Kaggle_GitHub_3.png" alt="Image" width="500" height="200"/>

- Your API command will look like ```kaggle datasets download -d <username>/<datasets>``` or ```kaggle datasets download -d <datasets>```


In [None]:
%cd titanic

[Errno 2] No such file or directory: 'titanic'
/content/drive/MyDrive/GitHub/titanic


In [None]:
!pwd

/content/drive/MyDrive/GitHub/titanic


In [None]:
!kaggle datasets download -d heptapod/titanic --unzip

Downloading titanic.zip to /content/drive/MyDrive/GitHub/titanic
  0% 0.00/10.8k [00:00<?, ?B/s]
100% 10.8k/10.8k [00:00<00:00, 1.73MB/s]


> Note: Datasets are downloaded as a zip file. Without further options you would have to ```unzip``` the file manually; the ```--unzip``` flag extracts the archive right after the download and deletes the zip file.
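For reference, the manual route mentioned in the note (extract the archive, then delete it) can be sketched with Python's standard `zipfile` module. The helper name `unzip_and_remove` is hypothetical; the sketch mimics the effect described above and is not how the Kaggle CLI is implemented.

```python
import os
import zipfile


def unzip_and_remove(zip_path, dest_dir="."):
    """Extract zip_path into dest_dir, delete the archive, return member names."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    # Remove the archive only after extraction has finished without errors.
    os.remove(zip_path)
    return names


# Usage after a download without --unzip:
# unzip_and_remove("titanic.zip")
```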

In [None]:
# Check the downloaded files with the ls command
!ls

titanic.csv  titanic_notebook.ipynb  train_and_test2.csv


In [None]:
!mv train_and_test2.csv titanic.csv

In [None]:
!touch titanic_notebook.ipynb

In [None]:
!ls

titanic.csv  titanic_notebook.ipynb


- Use ```git status``` to view the state of the working directory and the staging area.
- Use ```git add``` to add changes in the working directory to the staging area.
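To make the difference between the working directory and the staging area concrete, the sketch below sorts the lines of ```git status --porcelain``` into staged, unstaged, and untracked paths. The function `summarize_status` and the sample text are made up for illustration; in the short format, the first character of each line describes the staging area and the second describes the working directory.

```python
def summarize_status(porcelain_text):
    """Group paths from `git status --porcelain` output by state.

    Hypothetical helper for illustration; it handles the common one-path
    lines and does not treat renames or quoted paths specially.
    """
    staged, unstaged, untracked = [], [], []
    for line in porcelain_text.splitlines():
        if len(line) < 4:
            continue
        index_flag, worktree_flag, path = line[0], line[1], line[3:]
        if index_flag == "?":
            untracked.append(path)   # "??" means git does not track the file yet
            continue
        if index_flag == "!":
            continue                 # "!!" marks ignored files
        if index_flag != " ":
            staged.append(path)      # change recorded in the staging area
        if worktree_flag != " ":
            unstaged.append(path)    # change present only in the working directory
    return {"staged": staged, "unstaged": unstaged, "untracked": untracked}


# Made-up sample: one staged file, one modified file, one untracked file.
sample = "A  titanic.csv\n M titanic_notebook.ipynb\n?? notes.txt\n"
print(summarize_status(sample))
```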

In [None]:
!git status

On branch master
Your branch is up to date with 'm1_mlecosys/master'.

nothing to commit, working tree clean


In [None]:
!ls -a

.git  .ipynb_checkpoints  titanic.csv  titanic_notebook.ipynb


In [None]:
!git add --all

In [None]:
!git add .

In [None]:
!git status

On branch master
Your branch is up to date with 'm1_mlecosys/master'.

nothing to commit, working tree clean


After adding the files and folders you need, commit your work using ```git commit -m "message"```.

In [None]:
!git config --global user.email "hamid.bekam@gmail.com"
!git config --global user.name "HamidBekamiri"


- Define a set of variables based on the URL of your GitHub repository:

> ```https://github.com/<username>/<repository>```
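If you already have the repository URL at hand, the two values can be read off it programmatically. This is a small sketch using the standard library; `split_repo_url` is a hypothetical helper and assumes the URL follows the pattern shown above.

```python
from urllib.parse import urlparse


def split_repo_url(url):
    """Return (username, repository) from a GitHub repository URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"not a repository URL: {url!r}")
    username, repository = parts[0], parts[1]
    # Clone URLs often end in ".git"; drop the suffix if present.
    if repository.endswith(".git"):
        repository = repository[: -len(".git")]
    return username, repository


username, repository = split_repo_url("https://github.com/HamidBekamiri/titanic")
print(username, repository)
```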

In [None]:
!git commit -m "M1 - ML Ecosystem"

On branch master
Your branch is up to date with 'm1_mlecosys/master'.

nothing to commit, working tree clean


In [None]:
username = "HamidBekamiri"
repository = "titanic"
git_token = "your_github_token"

Add a ```remote``` to your repository using the variables above:

In [None]:
!git remote add m1_mlecosys https://{git_token}@github.com/{username}/{repository}.git
!git remote -v

error: remote m1_mlecosys already exists.
m1_mlecosys	https://<your_github_token>@github.com/HamidBekamiri/titanic.git (fetch)
m1_mlecosys	https://<your_github_token>@github.com/HamidBekamiri/titanic.git (push)
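Note that ```git remote -v``` prints the remote URL in full, including the token embedded in it, so this output should not be shared or committed. The sketch below shows how the URL is assembled from the three variables and how a token could be hidden before printing. Both function names are hypothetical and exist only for illustration.

```python
import re


def build_remote_url(username, repository, token):
    """Assemble the token-authenticated HTTPS URL used for the remote."""
    return f"https://{token}@github.com/{username}/{repository}.git"


def mask_token(text):
    """Replace any credential placed before '@' in an https URL with ***."""
    return re.sub(r"(https://)[^@/\s]+@", r"\1***@", text)


url = build_remote_url("HamidBekamiri", "titanic", "your_github_token")
print(mask_token(url))
```

In a notebook, the output of ```git remote -v``` could be captured and passed through `mask_token` before it is displayed.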


Push your commits using the ```git push``` command:

In [None]:
!git push -u m1_mlecosys master

Branch 'master' set up to track remote branch 'master' from 'm1_mlecosys'.
Everything up-to-date


In [None]:
!git branch

* master


##Step 5.B: Clone an existing GitHub repository

Follow the steps below to clone an existing Git repository from GitHub into your Google Drive:
- Go to the GitHub repository you want to clone.
- Click the **Code** button and copy the URL as shown:

<img src="https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/M1_GitHub_1.png" alt="Image" width="500" height="250"/>

> You’ll need your GitHub access token before cloning your GitHub repository. Also define the same set of variables (username and repository name) from your GitHub account.

In [None]:
username = "HamidBekamiri"
repository = "titanic"
git_token = "your_github_token"

In [None]:
!git clone https://{git_token}@github.com/{username}/{repository}

Cloning into 'titanic'...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 4 (delta 0), reused 4 (delta 0), pack-reused 0
Receiving objects: 100% (4/4), 10.42 KiB | 1.16 MiB/s, done.


In [None]:
!pwd

/content/drive/MyDrive/GitHub/titanic


In [None]:
!touch titanic/test.csv

In [None]:
!git init

Reinitialized existing Git repository in /content/drive/MyDrive/GitHub/titanic/.git/


In [None]:
!git status

On branch master
Your branch is up to date with 'm1_mlecosys/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	titanic/

nothing added to commit but untracked files present (use "git add" to track)


In [None]:
!git add .

hint: You've added another git repository inside your current repository.
hint: Clones of the outer repository will not contain the contents of
hint: the embedded repository and will not know how to obtain it.
hint: If you meant to add a submodule, use:
hint: 
hint: 	git submodule add <url> titanic
hint: 
hint: If you added this path by mistake, you can remove it from the
hint: index with:
hint: 
hint: 	git rm --cached titanic
hint: 
hint: See "git help submodule" for more information.


In [None]:
!git commit -m "Add a new file"

[master e5a4dea] Add a new file
 1 file changed, 1 insertion(+)
 create mode 160000 titanic
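The mode ```160000``` in the output above means that ```titanic``` was recorded as a link to another repository rather than as ordinary files. This happened because the clone was made from inside an existing repository, which is what the hint printed by ```git add``` warns about. A minimal sketch of how such nested repositories could be detected before running ```git add``` follows; `find_nested_repos` is a hypothetical helper that only looks for ```.git``` directories.

```python
import os


def find_nested_repos(root):
    """Return subfolders of root (relative paths) that contain their own .git."""
    root = os.path.abspath(root)
    nested = []
    for current, dirnames, _ in os.walk(root):
        if current == root:
            # Skip the .git folder of the outer repository itself.
            if ".git" in dirnames:
                dirnames.remove(".git")
            continue
        if ".git" in dirnames:
            nested.append(os.path.relpath(current, root))
            dirnames[:] = []  # no need to look inside the nested repository
    return sorted(nested)


# Usage: find_nested_repos(".") run from the outer repository folder.
```

An empty list suggests that ```git add .``` will stage regular files only.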


In [None]:
!git push

Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 2 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 306 bytes | 102.00 KiB/s, done.
Total 2 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.com/HamidBekamiri/titanic.git
   a7dac57..e5a4dea  master -> master
