# Notebook objectives



* Fetch data from Kaggle
* Add dataset to the project
* Push generated file to GitHub repo



---

# **Connection between: Colab Session and your GitHub Repo**

### Insert your **credentials**

* These variables exist only for the lifetime of the session. Once the session terminates, their contents are lost.

In [1]:
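from getpass import getpass
import os

# getpass hides what you type, so credentials are not echoed in the notebook output.
print("* Type in and hit Enter")
UserName = getpass('GitHub User Name: ')
UserEmail = getpass('GitHub User E-mail: ')
RepoName = getpass('GitHub Repository Name: ')
# GitHub no longer accepts account passwords for Git operations over HTTPS;
# enter a personal access token at this prompt instead.
UserPwd = getpass('GitHub Account Password: ')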

* Type in and hit Enter
GitHub User Name: ··········
GitHub User E-mail: ··········
GitHub Repository Name: ··········
GitHub Account Password: ··········


---

### **Clone** your GitHub Repo to your current Colab session

* This gives you access to your project's files.

In [2]:
! git clone https://github.com/{UserName}/{RepoName}.git

print("\n")
%cd /content/{RepoName}
print(f"\n\n* Current session directory is:  {os.getcwd()}")
print(f"* You may refresh the session folder to access {RepoName} folder.")

Cloning into 'WalkthroughProject1'...
remote: Enumerating objects: 256, done.
remote: Counting objects: 100% (256/256), done.
remote: Compressing objects: 100% (197/197), done.
remote: Total 256 (delta 118), reused 90 (delta 18), pack-reused 0
Receiving objects: 100% (256/256), 3.74 MiB | 6.59 MiB/s, done.
Resolving deltas: 100% (118/118), done.


/content/WalkthroughProject1


* Current session directory is:  /content/WalkthroughProject1
* You may refresh the session folder to access WalkthroughProject1 folder.


---

### **Connect** this Colab session to your GitHub Repo

* This lets you push files generated in this session to your repo when needed.

In [3]:
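# Identify the author of the commits made from this session.
!git config --global user.email {UserEmail}
!git config --global user.name {UserName}
# Replace the remote created by git clone with one that embeds the credentials,
# so that git push does not prompt for them.
!git remote rm origin
!git remote add origin https://{UserName}:{UserPwd}@github.com/{UserName}/{RepoName}.git
print(f"\n\n * The current Colab Session is connected to the following GitHub repo: {UserName}/{RepoName}")
print(" * You can now push new files to the repo.")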



 * The current Colab Session is connected to the following GitHub repo: FernandoRocha88/WalkthroughProject1
 * You can now push new files to the repo.
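
The remote URL in the cell above embeds the credentials as typed, so a password or token containing characters such as `@` or `/` would yield a malformed URL. A minimal sketch of percent-encoding the credentials first, assuming a hypothetical helper named `build_remote_url` that is not part of the original notebook:

```python
from urllib.parse import quote


def build_remote_url(user_name: str, secret: str, repo_name: str) -> str:
    """Return an HTTPS remote URL with percent-encoded credentials.

    `build_remote_url` is a hypothetical helper for illustration only.
    """
    # safe="" makes quote() also encode "/", so no reserved character in
    # the credentials can be mistaken for part of the URL structure.
    user = quote(user_name, safe="")
    token = quote(secret, safe="")
    return f"https://{user}:{token}@github.com/{user}/{repo_name}.git"
```

The returned string could then be stored in a variable and passed to `git remote add origin` in place of the hand-built URL.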


---

### **Push** generated/new files from this Session to GitHub repo

* Git commit

In [12]:
CommitMsg = "added-dataset"
!git add .
# Quote the message so the commit still works if it contains spaces.
!git commit -m "{CommitMsg}"

[main 002b4d7] added-dataset
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename inputs/datasets/{weatherAUS1.csv => weatherAUS.csv} (100%)


* Git Push

In [13]:
!git push origin main

Counting objects: 4, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (4/4), 351 bytes | 351.00 KiB/s, done.
Total 4 (delta 1), reused 2 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To https://github.com/FernandoRocha88/WalkthroughProject1.git
   03caedf..002b4d7  main -> main
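
The add, commit and push steps can also be driven from plain Python rather than `!` shell escapes. The sketch below is an illustration under stated assumptions: `git_push_commands` and `run_commands` are hypothetical helper names, and the branch is assumed to be `main` as in the cell above. Passing each command as a list means the commit message reaches git as a single argument, even when it contains spaces.

```python
import subprocess


def git_push_commands(commit_msg: str, branch: str = "main") -> list:
    """Build the argument lists for git add, commit and push.

    Hypothetical helper for illustration only.
    """
    return [
        ["git", "add", "."],
        ["git", "commit", "-m", commit_msg],
        ["git", "push", "origin", branch],
    ]


def run_commands(commands: list) -> None:
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
```

Calling `run_commands(git_push_commands("added dataset"))` from inside the cloned repo folder would then run the three steps in sequence.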


---

### **Delete** Cloned Repo from current Session

In [None]:
%cd /content
!rm -rf {RepoName}
print(f"\n * Please refresh session folder to validate that {RepoName} folder was removed from this session.")

---

# Fetch data from Kaggle

* Make sure the kaggle package is installed. In a Colab session it normally is. If it is not, run the following command in a code cell: **! pip install -q kaggle**

In [4]:
pip show kaggle

Name: kaggle
Version: 1.5.12
Summary: Kaggle API
Home-page: https://github.com/Kaggle/kaggle-api
Author: Kaggle
Author-email: support@kaggle.com
License: Apache 2.0
Location: /usr/local/lib/python3.7/dist-packages
Requires: tqdm, python-slugify, requests, certifi, python-dateutil, six, urllib3
Required-by: 
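
The same check can be made from Python, which is convenient when a later cell should react to a missing package. This is a minimal sketch assuming Python 3.8 or newer (for `importlib.metadata`); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent.

    Hypothetical helper for illustration only.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For example, `installed_version("kaggle")` returns `None` when the package is missing, in which case the install command above can be run.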


---

* You first need to download a **JSON file (authentication token)** from Kaggle to your machine.
* The process is:
  1. From the site header, click on your user profile picture, then on “My Account” in the dropdown menu. This takes you to your account settings. Scroll down to the section of the page labelled API.
  2. Click “Expire API Token” to invalidate any previous tokens.
  3. Click the “Create New API Token” button. This generates a fresh authentication token and downloads a kaggle.json file to your machine.
  

* If you run into any difficulty, see the "Authentication" section of the [Kaggle API documentation](https://www.kaggle.com/docs/api).



* In the end, you should have this file saved locally on your machine. **Please make sure this file is named kaggle.json**


* Upload your kaggle.json file to this Colab session.
* Once you run the cell below, click on "Choose Files", then find your kaggle.json file and select it.

In [5]:
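import os
from google.colab import files

# Opens a file picker and saves the selected file to the current directory.
files.upload()

# Tell the Kaggle CLI to look for kaggle.json in the current directory.
os.environ['KAGGLE_CONFIG_DIR'] = os.getcwd()
# Restrict the token file to owner read/write only.
! chmod 600 kaggle.json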

Saving kaggle.json to kaggle.json
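
Before calling the Kaggle CLI, it can help to confirm that the uploaded file looks like a token and is not readable by other users. The sketch below assumes the token file is a JSON object with `username` and `key` entries; `check_kaggle_token` is a hypothetical helper name.

```python
import json
import os
import stat


def check_kaggle_token(path: str = "kaggle.json") -> bool:
    """Return True if the file has both credential fields and is private.

    Hypothetical helper; assumes the token is a JSON object with
    "username" and "key" entries.
    """
    with open(path, encoding="utf-8") as fh:
        token = json.load(fh)
    has_fields = isinstance(token, dict) and {"username", "key"} <= token.keys()
    # Private means no permission bits for group or others (e.g. chmod 600).
    mode = stat.S_IMODE(os.stat(path).st_mode)
    is_private = (mode & 0o077) == 0
    return has_fields and is_private
```

A `False` result suggests either that the wrong file was uploaded or that the `chmod 600` step has not been run yet.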


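* Get the dataset path from the Kaggle URL. When you are viewing the dataset on Kaggle, take the part of the URL that follows https://www.kaggle.com/, in the form owner/dataset-name, and assign it to KaggleDatasetPath. If the URL contains a `datasets/` segment, leave it out.
* Set your destination folder in DestinationFolder.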

In [6]:
KaggleDatasetPath = "jsphyg/weather-dataset-rattle-package"
DestinationFolder = "inputs/datasets"
!kaggle datasets download -d {KaggleDatasetPath} -p {DestinationFolder}

Downloading weather-dataset-rattle-package.zip to inputs/datasets
  0% 0.00/3.83M [00:00<?, ?B/s]
100% 3.83M/3.83M [00:00<00:00, 62.7MB/s]
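
The dataset path used above can also be derived from the page URL instead of being copied by hand. This is a minimal sketch: `kaggle_dataset_path` is a hypothetical helper, and it assumes the owner and dataset name are the first two path segments, optionally preceded by a `datasets` segment.

```python
from urllib.parse import urlparse


def kaggle_dataset_path(url: str) -> str:
    """Return the "owner/dataset-name" part of a Kaggle dataset URL.

    Hypothetical helper; assumes the owner and dataset name are the first
    two path segments, after an optional leading "datasets" segment.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if segments and segments[0] == "datasets":
        segments = segments[1:]
    if len(segments) < 2:
        raise ValueError(f"not a dataset URL: {url}")
    return "/".join(segments[:2])
```

The result can be assigned to `KaggleDatasetPath` before running the download command.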


* Unzip the downloaded file, delete the zip file and delete kaggle.json file

In [8]:
!unzip {DestinationFolder}/*.zip -d {DestinationFolder} \
  && rm {DestinationFolder}/*.zip \
  && rm kaggle.json

Archive:  inputs/datasets/weather-dataset-rattle-package.zip
  inflating: inputs/datasets/weatherAUS.csv  
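
The unzip-and-clean-up step can also be done with the Python standard library, which avoids relying on shell globbing. This is a minimal sketch; `unzip_and_remove` is a hypothetical helper name, and unlike the cell above it does not delete kaggle.json.

```python
import zipfile
from pathlib import Path


def unzip_and_remove(folder: str) -> list:
    """Extract every .zip in `folder` into it, then delete the archives.

    Hypothetical helper; returns the sorted names of the extracted members.
    """
    extracted = []
    for archive in sorted(Path(folder).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)
            extracted.extend(zf.namelist())
        # Delete the archive only after extraction has succeeded.
        archive.unlink()
    return sorted(extracted)
```

Calling `unzip_and_remove(DestinationFolder)` would extract the downloaded archive in place and remove the zip file.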


* Well done! You can now push the changes to your GitHub repo using the Git commands (git add, git commit, git push).
* The code for this is in the section "Connection between: Colab Session and your GitHub Repo".

---

* Once you have **pushed all the files** to the repo, you may save/push the notebook changes to the repo.