# Notebook objectives



* Fetch data from Kaggle
* Add dataset to the project
* Push the generated files to the GitHub repo



---

# **Connection between: Colab Session and your GitHub Repo**

### Insert your **credentials**

* These variables exist only for the lifetime of this session. Once the session terminates, their values are lost.

In [None]:
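from getpass import getpass
import os
print("* Type in and hit Enter")
# getpass hides what you type; the values live only in this session's memory
UserName = getpass('GitHub User Name: ')
UserEmail = getpass('GitHub User E-mail: ')
RepoName = getpass('GitHub Repository Name: ')
# GitHub no longer accepts account passwords for Git over HTTPS,
# so enter a personal access token (PAT) here instead
UserPwd = getpass('GitHub Personal Access Token: ')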

---

### **Clone** your GitHub Repo to your current Colab session

* This gives you access to your project's files.

In [None]:
! git clone https://github.com/{UserName}/{RepoName}.git

print("\n")
%cd /content/{RepoName}
print(f"\n\n* Current session directory is:  {os.getcwd()}")
print(f"* You may refresh the session folder to access {RepoName} folder.")
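`git clone` refuses to clone into a destination directory that already exists and is not empty, so re-running the clone cell in the same session fails. A minimal sketch of a guard, assuming the repo is cloned under `/content`; `needs_clone` is a hypothetical helper, not part of this notebook:

```python
from pathlib import Path


def needs_clone(base_dir: str, repo_name: str) -> bool:
    """Return True if base_dir/repo_name does not yet contain a Git checkout."""
    # A previous clone leaves a .git folder inside the repo directory
    return not (Path(base_dir) / repo_name / ".git").exists()


# Possible usage in this notebook (assumes RepoName was set earlier):
# if needs_clone("/content", RepoName):
#     ! git clone https://github.com/{UserName}/{RepoName}.git
```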

---

### **Connect** this Colab session to your GitHub Repo

* This lets you push files generated in this session to your repo when needed.

In [None]:
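# Identity recorded in the commits you make from this session
!git config --global user.email {UserEmail}
!git config --global user.name {UserName}
# Replace the default remote with one that embeds your credentials,
# so that git push does not prompt for them
!git remote rm origin
!git remote add origin https://{UserName}:{UserPwd}@github.com/{UserName}/{RepoName}.git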
print(f"\n\n * The current Colab Session is connected to the following GitHub repo: {UserName}/{RepoName}")
print(" * You can now push new files to the repo.")
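The remote URL above embeds the credentials directly, so a password or token containing reserved characters such as `@`, `:` or `/` would corrupt the URL. A minimal sketch, assuming you want to percent-encode the credentials before building the URL; `build_remote_url` is a hypothetical helper:

```python
from urllib.parse import quote


def build_remote_url(user: str, secret: str, repo: str) -> str:
    """Build an HTTPS GitHub remote URL with percent-encoded credentials."""
    # safe="" makes quote() escape "/" as well, so every reserved character is encoded
    user_enc = quote(user, safe="")
    secret_enc = quote(secret, safe="")
    return f"https://{user_enc}:{secret_enc}@github.com/{user}/{repo}.git"


# Possible usage in this notebook:
# RemoteUrl = build_remote_url(UserName, UserPwd, RepoName)
# !git remote add origin {RemoteUrl}
```

Either way, `git remote add` stores the URL, credential included, in plain text in the clone's `.git/config`, so deleting the cloned repo at the end of the session also removes it.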

---

### **Push** generated/new files from this Session to GitHub repo

* Git commit

In [None]:
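CommitMsg = "added-dataset"
# Stage every new or modified file in the repo
!git add .
# Quote the message so that one containing spaces is passed as a single argument
!git commit -m "{CommitMsg}"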
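In an `!` line, IPython substitutes `{CommitMsg}` into the shell command verbatim, so a message containing spaces or quote characters has to be quoted for the shell. A minimal sketch using Python's standard `shlex.quote`; the message text is only an example:

```python
import shlex

CommitMsg = "added weather dataset"
# shlex.quote wraps the message in single quotes when it contains spaces or
# other shell metacharacters, so git receives it as one argument
QuotedMsg = shlex.quote(CommitMsg)
print(QuotedMsg)  # 'added weather dataset'

# In a Colab cell you could then run:
# !git commit -m {QuotedMsg}
```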

* Git Push

In [None]:
!git push origin main

---

### **Delete** Cloned Repo from current Session

In [None]:
%cd /content
!rm -rf {RepoName}
print(f"\n * Refresh the session folder to confirm that the {RepoName} folder was removed from this session.")
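The same cleanup can be done in Python. A minimal sketch using `shutil.rmtree`, assuming the clone lives under `/content` and that you have already changed out of it (as the `%cd /content` line does); `remove_repo` is a hypothetical helper:

```python
import shutil
from pathlib import Path


def remove_repo(base_dir: str, repo_name: str) -> bool:
    """Delete base_dir/repo_name if it exists; return True if something was removed."""
    repo_path = Path(base_dir) / repo_name
    if repo_path.is_dir():
        # Recursively deletes the working tree, including the .git folder
        shutil.rmtree(repo_path)
        return True
    return False


# Possible usage in this notebook:
# remove_repo("/content", RepoName)
```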

---

# Fetch data from Kaggle

* Make sure the kaggle package is installed. It is normally preinstalled in Colab; if it is not, run the following command in a code cell: **! pip install -q kaggle**

In [None]:
!pip show kaggle
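The same check can be done from Python with the standard-library `importlib.metadata` module (Python 3.8+). A minimal sketch; `package_version` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def package_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


kaggle_version = package_version("kaggle")
if kaggle_version is None:
    print("kaggle is not installed; run: ! pip install -q kaggle")
else:
    print(f"kaggle {kaggle_version} is installed")
```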

---

* First, download a **JSON file (authentication token)** from Kaggle to your machine.
* The process is:
  1. From the site header, click on your user profile picture, then on “My Account” in the dropdown menu. This takes you to your account settings. Scroll down to the section of the page labelled **API**.
  2. Click “Expire API Token” to revoke any previous tokens.
  3. Click the “Create New API Token” button. This generates a fresh authentication token and downloads a kaggle.json file to your machine.
  

* If you run into any difficulty, see the "Authentication" section of the [Kaggle API documentation](https://www.kaggle.com/docs/api).



* You should now have this file saved locally on your machine. **Make sure the file is named kaggle.json**


* Upload your kaggle.json file to this Colab session.
* When you run the cell below, click "Choose Files", then find and select your kaggle.json file.

In [None]:
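from google.colab import files
# Opens a file picker; the selected file is saved to the current directory
files.upload()

import os
# Tell the Kaggle CLI to look for kaggle.json in the current directory
os.environ['KAGGLE_CONFIG_DIR'] = os.getcwd()
# Make the token readable and writable by its owner only
! chmod 600 kaggle.json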
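The cell above relies on the Kaggle CLI finding `kaggle.json` in `KAGGLE_CONFIG_DIR`. A minimal sketch of the same preparation in plain Python, which also checks that the file holds the `username` and `key` fields a Kaggle API token contains; `prepare_kaggle_config` is a hypothetical helper:

```python
import json
import os
import stat
from pathlib import Path


def prepare_kaggle_config(token_path: str) -> Path:
    """Validate a kaggle.json token, restrict its permissions and point the CLI at it."""
    path = Path(token_path).resolve()
    with open(path) as fh:
        token = json.load(fh)
    # A Kaggle API token is a JSON object with "username" and "key" fields
    if not isinstance(token, dict):
        raise ValueError("kaggle.json must contain a JSON object")
    missing = {"username", "key"} - set(token)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")
    # Equivalent of `chmod 600`: owner read/write only
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    # The Kaggle CLI looks for kaggle.json inside KAGGLE_CONFIG_DIR
    os.environ["KAGGLE_CONFIG_DIR"] = str(path.parent)
    return path


# Possible usage after files.upload():
# prepare_kaggle_config("kaggle.json")
```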

* Get the dataset path from the Kaggle URL: while viewing the dataset on Kaggle, copy the `owner/dataset-name` part at the end of the URL into KaggleDatasetPath (for example, `jsphyg/weather-dataset-rattle-package`).
* Set your destination folder.
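A minimal sketch of extracting the `owner/dataset-name` part from a dataset URL, assuming URLs of the form `https://www.kaggle.com/datasets/<owner>/<dataset-name>`; the helper also accepts links without the `datasets/` segment and ignores anything after the dataset name. `kaggle_dataset_path` is a hypothetical helper:

```python
from urllib.parse import urlparse


def kaggle_dataset_path(url: str) -> str:
    """Return 'owner/dataset-name' from a Kaggle dataset URL."""
    # Split the URL path into non-empty segments, ignoring any query string
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Drop a leading "datasets" segment if the URL contains one
    if parts and parts[0] == "datasets":
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError(f"Cannot find owner/dataset in URL: {url}")
    return f"{parts[0]}/{parts[1]}"


# Possible usage:
# KaggleDatasetPath = kaggle_dataset_path("https://www.kaggle.com/datasets/jsphyg/weather-dataset-rattle-package")
```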

In [None]:
KaggleDatasetPath = "jsphyg/weather-dataset-rattle-package"
DestinationFolder = "inputs/datasets"
# Download the dataset archive (a .zip file) into DestinationFolder
!kaggle datasets download -d {KaggleDatasetPath} -p {DestinationFolder}

* Unzip the downloaded file, then delete the zip file and the kaggle.json file.

In [None]:
!unzip {DestinationFolder}/*.zip -d {DestinationFolder} \
  && rm {DestinationFolder}/*.zip \
  && rm kaggle.json
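The shell cell above works when exactly one archive matches; with several matches, `unzip` would treat the extra archive names as members to extract from the first one. A minimal sketch of the extraction step with Python's standard `zipfile` module that handles any number of archives; `extract_and_remove_zips` is a hypothetical helper, and deleting kaggle.json is still up to you:

```python
import zipfile
from pathlib import Path
from typing import List


def extract_and_remove_zips(folder: str) -> List[str]:
    """Extract every .zip archive in folder into folder, then delete the archives."""
    extracted: List[str] = []
    for archive in sorted(Path(folder).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)
            extracted.extend(zf.namelist())
        # Delete the archive only after it was extracted without errors
        archive.unlink()
    return extracted


# Possible usage in this notebook:
# extract_and_remove_zips(DestinationFolder)
```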

* Well done! You can now push the changes to your GitHub Repo using the Git commands (git add, git commit, git push).
* The code for this is in the section "Connection between: Colab Session and your GitHub Repo".

---

* Once you have **pushed all the files** to the Repo, you can also save/push the notebook changes to the Repo.

# Get spatial data

* Australian city coordinates come from https://simplemaps.com/data/au-cities; the code below reads them from `inputs/datasets/au.csv`.

In [None]:
import pandas as pd
# Path assumes the repo was cloned as /content/WalkthroughProject1
df = pd.read_csv("/content/WalkthroughProject1/inputs/datasets/weatherAUS.csv")
df.head()

In [None]:
df_spatial = (pd.read_csv("/content/WalkthroughProject1/inputs/datasets/au.csv")
              .filter(['city', 'lat', 'lng', 'admin_name'])
              )
# df_spatial.rename(mapper={"city":"Location"},inplace=True,axis=1)

In [None]:
df_spatial.head()

In [None]:
# Left join: keep every weather row and attach the city columns where the names match
df = (df
      .merge(right=df_spatial, how='left', left_on='Location', right_on='city')
      # .drop(['city'], axis=1)
      )


In [None]:
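# List weather locations that have no matching city in the spatial data
spatial_cities = set(df_spatial['city'].unique())  # build the lookup set once
for city_df in sorted(df['Location'].unique()):
  if city_df not in spatial_cities:
    print(f"{city_df} not in spatial df")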
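pandas can also report unmatched rows directly: `DataFrame.merge` accepts `indicator=True`, which adds a `_merge` column whose value is `left_only` for rows that found no match on the right. A minimal sketch on made-up toy frames standing in for `df` and `df_spatial` (the values are illustrative, not taken from the real files):

```python
import pandas as pd

# Toy stand-ins for the weather and city data; values are illustrative only
weather = pd.DataFrame({"Location": ["Albury", "Sydney", "Hobart"]})
cities = pd.DataFrame({"city": ["Albury", "Sydney"],
                       "lat": [-36.08, -33.87],
                       "lng": [146.92, 151.21]})

merged = weather.merge(cities, how="left", left_on="Location",
                       right_on="city", indicator=True)
# Rows present only in the left frame found no matching city
unmatched = sorted(merged.loc[merged["_merge"] == "left_only", "Location"].unique())
print(unmatched)  # ['Hobart']
```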