# 🛠️ Hackathon Foundational Setup

This guide will walk you through almost everything you need to do **on Day 1** of the Data Analytics with AI Hackathon. Follow the steps in order, and don’t hesitate to reach out in your team chat if you hit any snags!

---

## 1. GitHub Repository & Permissions

1. **Create your repo from the template**  
   - Go to: [Code Institute Analytics Template](https://github.com/Code-Institute-Org/data-analytics-template)  
   - Click **Use this template** → **Create a new repository**  
   - Name it (e.g., `healthcare-insurance-dashboard`).

2. **Enforce branch protection**
   - Go to **Settings → Branches → Add rule**
   - Apply the rule to the `main` branch and check:
     - ✅ Require a pull request before merging
     - ✅ Require review from Code Owners (this needs a `CODEOWNERS` file in the repo)
     - ✅ Do **not** allow bypassing the above settings
   - Leave **Allow force pushes** unchecked

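3. **Make the repo public & add collaborators**
   - To change visibility, go to **Settings → General → Danger Zone → Change repository visibility**
   - To invite teammates, go to **Settings → Collaborators** (shown as **Manage access** in older versions of the GitHub UI), or share the repo link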

4. **Each collaborator forks the repo**
   - Clone your fork:
     ```bash
     git clone https://github.com/<your-username>/healthcare-insurance-dashboard.git
     ```

---

## 2. Local Project Setup

### 🗂️ Recommended Folder Structure

```plaintext
healthcare-insurance-dashboard/
├── .github/
├── data/
│   ├── raw/                          # original dataset from Kaggle
│   └── processed/                    # cleaned data used for modeling
├── jupyter notebooks/                # Jupyter notebooks for EDA, modeling
│   ├── step1_etl_pipeline/           # step-by-step ETL pipeline
│   └── step2_data_visualization/     # consolidated visualization snippets
├── src/                              # custom scripts for ETL, analysis
│   ├── etl.py
│   └── visuals.py
├── .gitignore
├── requirements.txt
└── README.md
```

### 🧪 Virtual Environment Setup

```bash
# Remove any existing venv
rm -rf venv .venv

# Create and activate a new venv
python3 -m venv .venv
source .venv/bin/activate        # Windows: source .venv/Scripts/activate

# Install packages
pip install -r requirements.txt
pip install numpy plotly nbformat

# Save the environment
pip freeze > requirements.txt

# Deactivate when done
deactivate
```

✅ Ensure `.venv/` is included in your `.gitignore` file.
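
After activating the environment, it can help to confirm that Python is really running from `.venv` before installing packages. The snippet below is a minimal sketch using only the standard library; the function name `in_virtualenv` is a hypothetical helper, not part of the template.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    status = "active" if in_virtualenv() else "NOT active"
    print(f"Virtual environment is {status}: {sys.prefix}")
```

If the script reports that the environment is not active, re-run the `source` command above in the same terminal before calling `pip install`.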

---

## 3. 🛠️ Project Management with GitHub Kanban Board

### 🛠️ Create a Project Board

1. Go to your GitHub repo → **Projects** tab → click **New Project**
2. Choose **Board view**
3. Name it (e.g., `Hackathon Sprint`)
4. Add columns:
   - `To Do`
   - `In Progress`
   - `Review`
   - `Done`

### 🧾 Create Issues & Link Tasks

1. Go to **Issues** → click **New Issue**
2. Use a naming format like:

   ```markdown
   [EDA] Initial exploration of raw dataset
   ```

3. Assign it to a teammate
4. Link it to the board via the Projects section in the issue sidebar

### ✅ Best Practices

- One task = one issue
- Use labels like: `EDA`, `ETL`, `Visualization`
- Link commits to issues using `#issue-number`:

  ```bash
  git commit -m "📊 Added EDA summary chart #7"
  ```
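
To illustrate the `#issue-number` convention, here is a small sketch that pulls issue numbers out of a commit message. It could be used, for example, to check that a message references an issue before committing. The names `ISSUE_REF` and `issue_numbers` are hypothetical and chosen for this example only.

```python
import re

# Matches GitHub-style issue references such as "#7" or "#123".
# The lookbehind skips "#" characters glued to a preceding word.
ISSUE_REF = re.compile(r"(?<!\w)#(\d+)\b")


def issue_numbers(commit_message: str) -> list[int]:
    """Return the issue numbers referenced in a commit message."""
    return [int(num) for num in ISSUE_REF.findall(commit_message)]


if __name__ == "__main__":
    print(issue_numbers("📊 Added EDA summary chart #7"))
```

An empty list means the commit message does not reference any issue.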

---


## 4. ⚙️ Kaggle API & Data Download

### ⚙️ Setup

```bash
# Install the Kaggle Python package
pip install kaggle

# Create the .kaggle directory in your home folder
mkdir -p ~/.kaggle

# Move the downloaded Kaggle API key (kaggle.json) to the .kaggle directory
mv ~/Downloads/kaggle.json ~/.kaggle/

# Restrict the key file so only your user can read and write it
chmod 600 ~/.kaggle/kaggle.json
```

### 📦 Example Download

```bash
kaggle datasets download -d sakshigoyal7/credit-card-customers -p data/raw --unzip
```
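
If the download fails with a credentials error, a quick check of the key file can narrow things down. The sketch below is a standard-library helper (the name `check_kaggle_credentials` is hypothetical) that verifies the file exists and is not readable by other users. It assumes a POSIX system such as macOS or Linux; permission bits behave differently on Windows.

```python
import stat
from pathlib import Path


def check_kaggle_credentials(
    path: Path = Path.home() / ".kaggle" / "kaggle.json",
) -> list[str]:
    """Return a list of problems found with the Kaggle API key file."""
    if not path.is_file():
        return [f"{path} not found"]
    mode = stat.S_IMODE(path.stat().st_mode)
    # Any group/other permission bit means other users can access the key.
    if mode & 0o077:
        return [f"{path} has mode {oct(mode)}; run: chmod 600 {path}"]
    return []


if __name__ == "__main__":
    for problem in check_kaggle_credentials() or ["Kaggle key looks fine"]:
        print(problem)
```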

---


## 5. 🧪 Data Loading & Saving

```python
import pandas as pd

# Load from raw folder
df = pd.read_csv('data/raw/BankChurners.csv')

# Example transformation: drop a leftover index column if the file has one
df_cleaned = df.drop(columns=['Unnamed: 0'], errors='ignore')

# Save to processed folder
df_cleaned.to_csv('data/processed/cleaned.csv', index=False)
```
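
The paths above are relative to the project root, so they break when a notebook runs from inside a subfolder. One possible workaround, sketched here with the standard library only, is to locate the root by walking upward until a known file is found. The helper name `find_project_root` and the choice of `requirements.txt` as the marker are assumptions made for this example.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "requirements.txt") -> Path:
    """Walk upward from `start` until a folder containing `marker` is found."""
    start = start.resolve()
    for folder in (start, *start.parents):
        if (folder / marker).exists():
            return folder
    raise FileNotFoundError(f"No {marker} found in or above {start}")


# Hypothetical usage from a notebook inside a subfolder:
# root = find_project_root(Path.cwd())
# raw_csv = root / "data" / "raw" / "BankChurners.csv"
# processed_csv = root / "data" / "processed" / "cleaned.csv"
```

Building paths from the root this way keeps the same code working in notebooks and in scripts under `src/`.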


## 6. 🤝 Collaboration Methods

### 🔗 A. VS Code Live Share (Recommended)
1. Install the Live Share extension from the VS Code Marketplace

2. Click the Live Share button in the bottom bar

3. Sign in using GitHub or Microsoft

4. Share the session link in your team chat

5. Collaborate live on notebooks and scripts

### 🌿 B. Branching Workflow (Optional)

```bash
git checkout -b etl-pipeline     # Create a new branch
# Make your changes
git add .
git commit -m "added ETL pipeline"
git push origin etl-pipeline
```

- Open a Pull Request on GitHub

- Request a review

- Merge once approved

---

## 7. ✅ Good Coding Practices
- Use meaningful variable names

- Write modular, reusable functions

- Add docstrings and inline comments

- Use Copilot or other AI tools responsibly

- Maintain a clear folder structure: data/, src/, notebooks/

- Use relative imports inside scripts

- Move heavy logic from notebooks to .py files

- Follow PEP 8 and use tools like flake8 or pylint
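
As a rough illustration of several points above (a small reusable function, a docstring, an inline comment), here is a sketch of a helper that could live in `src/etl.py`. The function name `clean_column_names` and its cleaning rules are hypothetical and not taken from the template.

```python
import re


def clean_column_names(names):
    """Convert column labels to lowercase snake_case.

    Parameters
    ----------
    names : iterable of str
        Original column labels, e.g. ``df.columns``.

    Returns
    -------
    list of str
        Labels with surrounding whitespace removed, runs of
        non-alphanumeric characters replaced by "_", and lowercased.
    """
    cleaned = []
    for name in names:
        # Collapse anything that is not a letter or digit into "_".
        label = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip())
        cleaned.append(label.strip("_").lower())
    return cleaned
```

In a notebook this could be applied with `df.columns = clean_column_names(df.columns)`, which keeps the notebook cell short and the logic testable on its own.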

---

## 8. 🔁 Commit & Push Workflow

### For Collaborators

```bash
git checkout -b <feature-branch>        # Create your feature branch
git add .                               # Stage changes
git commit -m "🧼 Cleaned data"          # Use a clear message
git push origin <feature-branch>        # Push to GitHub
```


### 🤝 For Project Owner

1. Go to the **Pull Requests** tab on GitHub  
2. Review the PR  
3. Add comments if needed or **merge** when approved  
4. Delete the branch after merging

---

## 🚀 Git Workflow for Team Collaboration

Using a consistent Git workflow improves collaboration and reduces confusion. Here's a clean process to follow:

---

### 1. ✨ Create a New Branch

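```bash
# Use initials and a feature description to name branches clearly
# (e.g., sb_dashboard-refactor).
git checkout -b sb_feature_name
```

### 2. 📃 Develop and Commit

```bash
git add .
git commit -m "Added feature XYZ with improvements"
git push origin sb_feature_name
```

Make small, meaningful commits as you work to maintain a readable history.

### 3. 🔄 Keep Your Branch Updated

```bash
git fetch origin                 # Download the latest remote changes
git rebase origin/main           # Replay your commits on top of main (or: git merge origin/main)
```

### 4. 🧹 Squash Commits Before Merging

```bash
git rebase -i origin/main        # Mark commits as "squash" in the editor
git push --force-with-lease origin sb_feature_name   # Needed if the branch was already pushed
```

### 5. 🔗 Merge into Main

```bash
git checkout main
git pull origin main             # Make sure local main is current
git merge sb_feature_name
git push origin main
```

After merging, push your updated main branch to the remote repository. If `main` is protected as described in Section 1, a direct push is rejected; open a Pull Request from your branch instead.

### 6. 🗑️ Cleanup

```bash
git push origin --delete sb_feature_name   # Delete the remote branch
git branch -d sb_feature_name              # Delete the local branch
```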


### 💻 Bonus: Use VS Code Git Integration

- **Switch branches:** Click the branch name in the bottom-left corner
- **Stage & commit:** Open the Source Control view (Ctrl+Shift+G, also on macOS)
- **Fetch/Push:** Open the Command Palette (Ctrl+Shift+P / Cmd+Shift+P) and search for `Git: Fetch` or `Git: Push`


---

## Credits

📘 Git Cheatsheet: https://github.com/banerjixplores/Git-Cheatsheet (forked from Vasi's repo)

📊 Data Analytics Template: https://github.com/Code-Institute-Org/data-analytics-template

📂 Healthcare Dataset on Kaggle: https://www.kaggle.com/datasets/willianoliveiragibin/healthcare-insurance