<a href="https://colab.research.google.com/github/PavanDaniele/child-mind-institute-problematic-internet-use-challenge/blob/main/KaggleIntegrationSetup.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to set up the environment
This is a brief, basic introduction to setting up the Google Colab environment for a Kaggle challenge. It uses Kaggle's API to connect your Kaggle profile and download the dataset.

Tip: Remember to change the competition name. This example uses 'Child Mind Institute — Problematic Internet Use'.

**Step 1: (Optional)** Link Google Drive to Colab so that your work persists in your Drive between sessions

In [None]:
from google.colab import drive
drive.mount('/content/drive')
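
If the notebook may also run without Drive mounted, a small helper can decide where to save outputs. This is only a sketch under assumptions: `choose_output_dir` and the `kaggle_outputs` folder name are hypothetical, and `/content/drive/MyDrive` is the path where Colab exposes your Drive after `drive.mount('/content/drive')`.

```python
from pathlib import Path


def choose_output_dir(drive_root="/content/drive/MyDrive",
                      fallback="/content",
                      subdir="kaggle_outputs"):
    """Return an output folder, preferring Google Drive when it is mounted.

    `subdir` is a hypothetical folder name; change it to suit your project.
    """
    root = Path(drive_root)
    # Drive is only visible at this path after drive.mount() has succeeded
    base = root if root.is_dir() else Path(fallback)
    out = base / subdir
    out.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    return out
```

Calling `choose_output_dir()` in Colab would then return a folder inside Drive when Step 1 was run, and a folder on the temporary Colab disk otherwise.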

**Step 2: Install and configure Kaggle's API** <br>
First, download your kaggle.json file from Kaggle (go to Account > API > Create New API Token).<br>
Second, upload kaggle.json to configure the Kaggle API. <br>
**_NB:_** Never share your _kaggle.json_ file. It contains your personal credentials.

Install the Kaggle package and authenticate:

In [None]:
!pip install kaggle # to install kaggle on the colab environment
from google.colab import files
files.upload()  # Upload kaggle.json here

**Step 3:** Place 'kaggle.json' in ~/.kaggle/, where the Kaggle API looks for it, and restrict its permissions. Do not save this file on GitHub, since it contains your credentials.

In [None]:
!mkdir -p ~/.kaggle  # Create a hidden directory named '.kaggle' in the user's home directory if it doesn't already exist
!cp kaggle.json ~/.kaggle/  # Copy the 'kaggle.json' file to the newly created '.kaggle' directory
!chmod 600 ~/.kaggle/kaggle.json  # Change file permissions to make 'kaggle.json' readable and writable only by the file owner
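
The three shell commands above can also be written in plain Python, which makes it easy to check the file before copying it. The sketch below is an illustration, not part of the Kaggle tooling: `install_kaggle_token` is a hypothetical helper, and it assumes the token is a JSON object with `username` and `key` fields, as in the file Kaggle generates.

```python
import json
import os
import shutil
from pathlib import Path


def install_kaggle_token(src="kaggle.json", kaggle_dir="~/.kaggle"):
    """Copy a Kaggle API token into place and make it owner-only (mode 600)."""
    src_path = Path(src)
    # Fail early if the file does not look like a Kaggle token
    token = json.loads(src_path.read_text())
    missing = {"username", "key"} - set(token)
    if missing:
        raise ValueError(f"{src_path} is missing fields: {sorted(missing)}")

    dest_dir = Path(kaggle_dir).expanduser()
    dest_dir.mkdir(parents=True, exist_ok=True)  # like: mkdir -p ~/.kaggle
    dest = dest_dir / "kaggle.json"
    shutil.copyfile(src_path, dest)              # like: cp kaggle.json ~/.kaggle/
    os.chmod(dest, 0o600)                        # like: chmod 600 ~/.kaggle/kaggle.json
    return dest
```

After uploading the file in Step 2, `install_kaggle_token()` with its defaults would perform the same steps as the cell above.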

**Step 3.1:** Verify Kaggle API with competition list (optional)

In [None]:
!kaggle competitions list # shows all competitions along with details

**Step 3.2:** Download all relevant files for that competition

In [None]:
!kaggle competitions download -c child-mind-institute-problematic-internet-use # This will download all relevant files for that competition directly into your Colab environment
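
Since the competition name has to change for every challenge, it can help to keep it in one variable and build the command from it. This is a minimal sketch: `build_download_command` and `download_competition` are hypothetical helper names, while `-c` (competition) and `-p` (download folder) are options of the Kaggle CLI.

```python
import subprocess


def build_download_command(competition, dest="/content"):
    """Build the Kaggle CLI call for one competition as an argument list."""
    # -c selects the competition, -p sets the folder the zip is saved to
    return ["kaggle", "competitions", "download", "-c", competition, "-p", dest]


def download_competition(competition, dest="/content"):
    """Run the download; raises CalledProcessError if the CLI reports a failure."""
    subprocess.run(build_download_command(competition, dest), check=True)


COMPETITION = "child-mind-institute-problematic-internet-use"
print(" ".join(build_download_command(COMPETITION)))
# prints: kaggle competitions download -c child-mind-institute-problematic-internet-use -p /content
```

To actually download, call `download_competition(COMPETITION)`. Note that Kaggle rejects the download until you have joined the competition and accepted its rules on the website.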

OK, you have now downloaded all relevant files for your competition!

**Step 4: Unzip the files**<br> First, unzip the downloaded file to access train.csv and the other files. <br>Second, load train.csv into a DataFrame for data exploration and processing.

In [None]:
import zipfile
import os
import pandas as pd

## FIRST
# Unzip the dataset
with zipfile.ZipFile('/content/child-mind-institute-problematic-internet-use.zip', 'r') as zip_ref:
    zip_ref.extractall('/content/child_mind_institute')

# Check extracted files (print explicitly: only the last expression of a cell is displayed automatically)
print(os.listdir('/content/child_mind_institute'))


## SECOND
# Load train.csv
train_df = pd.read_csv('/content/child_mind_institute/train.csv')
train_df.head()  # Display the first few rows of train_df
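
Re-running the cell above extracts the archive again each time. One possible refinement, sketched here with a hypothetical helper named `extract_once`, is to skip extraction when the target folder already has content and to return the list of files found, including those in subfolders.

```python
import zipfile
from pathlib import Path


def extract_once(zip_path, dest_dir):
    """Extract zip_path into dest_dir unless dest_dir already has content.

    Returns the sorted relative paths of all files under dest_dir.
    """
    dest = Path(dest_dir)
    if not dest.is_dir() or not any(dest.iterdir()):
        dest.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path, "r") as zip_ref:
            zip_ref.extractall(dest)
    # rglob also walks subfolders, which os.listdir does not
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file())
```

With the paths used in this notebook, the call would be `extract_once('/content/child-mind-institute-problematic-internet-use.zip', '/content/child_mind_institute')`. The emptiness check is deliberately simple: a partially extracted folder counts as done, so delete the folder if an extraction was interrupted.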



Now you can work. Good Luck!

# How to commit and push properly

To work within the child-mind-institute-challenge folder of your GitHub repository and push updates consistently, you can follow these steps. This way, you won’t need to push a separate file every time but can commit all changes at once.

**Step 1:** I create a new branch in my repository for each Kaggle competition I participate in.<br>For this challenge, the branch is called: "child-mind-institute-problematic-internet-use-challenge".

**Step 2:** Clone the Repository

In [None]:
# Clone the repository
!git clone https://github.com/PavanDaniele/child-mind-institute-problematic-internet-use-challenge.git
%cd child-mind-institute-problematic-internet-use-challenge


In [None]:
!ls # to see files and directory in the current directory
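
A fresh clone checks out the repository's default branch, so the branch from Step 1 still has to be checked out before pushing to it. The following sketch shows one way to do that from Python. `switch_to_branch` is a hypothetical helper, and the `run` parameter exists only so the logic can be tried without a real repository.

```python
import subprocess

BRANCH = "child-mind-institute-problematic-internet-use-challenge"


def switch_to_branch(branch=BRANCH, run=subprocess.run):
    """Check out `branch`, creating it locally if it does not exist yet."""
    # Succeeds when the branch exists locally or on the cloned remote
    result = run(["git", "checkout", branch])
    if result.returncode != 0:
        # Otherwise create the branch from the current commit
        result = run(["git", "checkout", "-b", branch])
    return result.returncode == 0
```

Run `switch_to_branch()` once after the `%cd` into the repository. A new Colab session typically has no Git identity either, so `git commit` may first require `git config user.name` and `git config user.email` to be set.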

**Step 3.1:** Commit and push to the new branch. Every time you want to save changes to your code on GitHub, run the commands below.

In [None]:
# Check the status of the repository
!git status

# Stage all changes
!git add .

# Commit changes
!git commit -m "Add ProjectDWM.ipynb and setup environment instructions for Child Mind Institute Challenge"

# Push to the new branch
!git push origin child-mind-institute-problematic-internet-use-challenge


Or you can simply run one of these functions:

In [None]:
# Function to commit changes only
def git_commit(commit_message="Update"):
    !git add .
    !git commit -m "{commit_message}"
# Usage example: git_commit("message bla bla bla")

# Function to push committed changes on a specific branch
def git_push():
    !git push origin child-mind-institute-problematic-internet-use-challenge
# Usage example: git_push()

# Function to commit and push directly
def git_commit_and_push(commit_message="Update"):
    !git add .
    !git commit -m "{commit_message}"
    !git push origin child-mind-institute-problematic-internet-use-challenge
# Usage example: git_commit_and_push("message bla bla bla")

Each time you make a change and want to save progress, call one of these functions with a new commit message.
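
The functions above rely on Colab's `!` syntax, which passes the message through a shell, so a commit message containing double quotes can break the command. A possible alternative, sketched here with hypothetical names (`commit_and_push_commands`, `git_commit_and_push_safe`), is to call Git through `subprocess` with the message as a separate argument.

```python
import subprocess

BRANCH = "child-mind-institute-problematic-internet-use-challenge"


def commit_and_push_commands(message="Update", branch=BRANCH, remote="origin"):
    """Return the Git commands for one save, each as an argument list."""
    return [
        ["git", "add", "."],
        # The message is a single list element, so quotes in it need no escaping
        ["git", "commit", "-m", message],
        ["git", "push", remote, branch],
    ]


def git_commit_and_push_safe(message="Update", branch=BRANCH, run=subprocess.run):
    """Run the commands in order and stop at the first one that fails."""
    for cmd in commit_and_push_commands(message, branch):
        if run(cmd).returncode != 0:
            return False
    return True
```

Usage would look like `git_commit_and_push_safe('Fix "train" loading')`. Because `git commit` exits with an error when there is nothing to commit, this version skips the push in that case and returns `False`.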

**Step 3.2:** Instead of running the Git commands manually, you can use the Colab menu: *File → Save a copy in GitHub*.<br>This commits and pushes the notebook in one step.<br>
**_NB:_** Remember to save the file in the right repository and branch!

**TIPS:** Add Checkpoints to Your Notebook

- Use headers and markdown notes to clearly define each stage of your project (e.g., Data Exploration, Preprocessing, Model Building, Evaluation). This will improve the structure of your notebook and make it easier to follow.
- These checkpoints will also help when exporting the notebook as a report, saving you time in organizing your documentation.
- For consistent formatting, consider using numbered sections (e.g., 1. Data Exploration, 2. Data Cleaning, etc.).
