# Project: Portfolio - Final Project

**Instructions for Students:**

Please carefully follow these steps to complete and submit your assignment:

1. **Completing the Assignment**: You are required to work on and complete all tasks in the provided assignment. Be disciplined and ensure that you thoroughly engage with each task.
   
2. **Creating a Google Drive Folder**: If you don't already have a folder for collecting assignments, create a new one in your Google Drive. This will be a repository for all your completed assignment files, helping you keep your work organized and easy to access.
   
3. **Uploading Completed Assignment**: Upon completion of your assignment, upload all necessary files, including code, reports, and related documents, into that Google Drive folder. Save the folder link in the 'Student Identity' section and also pass it as the last parameter of the provided `submit` function.
   
4. **Sharing Folder Link**: You're required to share the link to your assignment Google Drive folder. This is crucial for the submission and evaluation of your assignment.
   
5. **Setting Permission to Public**: Please make sure your **Google Drive folder is set to public**. This allows your instructor to access your solutions and assess your work correctly.

Adhering to these procedures will facilitate a smooth assignment process for you and the reviewers.

**Description:**

Welcome to your final portfolio project assignment for AI Bootcamp. This is your chance to put all the skills and knowledge you've learned throughout the bootcamp into action by creating a real-world AI application.

You have the freedom to create any application or model, whether text-based, image-based, voice-based, or multimodal.

To get you started, here are some ideas:

1. **Sentiment Analysis Application:** Develop an application that can determine sentiment (positive, negative, neutral) from text data like reviews or social media posts. You can use Natural Language Processing (NLP) libraries like NLTK or TextBlob, or more advanced pre-trained models from the transformers library by Hugging Face, for your sentiment analysis model.

2. **Chatbot:** Design a chatbot serving a specific purpose such as customer service for a certain industry, a personal fitness coach, or a study helper. Tools like ChatterBot or Dialogflow can assist in designing conversational agents.

3. **Predictive Text Application:** Develop a model that suggests the next word or sentence similar to predictive text on smartphone keyboards. You could use the transformers library by Hugging Face, which includes pre-trained models like GPT-2.

4. **Image Classification Application:** Create a model to distinguish between different types of flowers or fruits. For this type of image classification task, pre-trained models like ResNet or VGG from PyTorch or TensorFlow can be utilized.

5. **News Article Classifier:** Develop a text classification model that categorizes news articles into predefined categories. NLTK, spaCy, and scikit-learn are valuable libraries for text pre-processing, feature extraction, and building classification models.

6. **Recommendation System:** Create a simplified recommendation system. For instance, a book or movie recommender based on user preferences. Python's Surprise library can assist in building effective recommendation systems.

7. **Plant Disease Detection:** Develop a model to identify diseases in plants using leaf images. This project requires a good understanding of convolutional neural networks (CNNs) and image processing. PyTorch, TensorFlow, and OpenCV are all great tools to use.

8. **Facial Expression Recognition:** Develop a model to classify human facial expressions. This involves complex feature extraction and classification algorithms. You might want to leverage deep learning libraries like TensorFlow or PyTorch, along with OpenCV for processing facial images.

9. **Chest X-Ray Interpretation:** Develop a model to detect abnormalities in chest X-ray images. This task may require understanding of specific features in such images. Again, TensorFlow and PyTorch for deep learning, and libraries like scikit-image or PIL for image processing, could be of use.

10. **Food Classification:** Develop a model to classify a variety of foods such as local Indonesian food. Pre-trained models like ResNet or VGG from PyTorch or TensorFlow can be a good starting point.

11. **Traffic Sign Recognition:** Design a model to recognize different traffic signs. This project has real-world applicability in self-driving car technology. Once more, you might utilize PyTorch or TensorFlow for the deep learning aspect, and OpenCV for image processing tasks.
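
As a concrete starting point for idea 5, here is a minimal sketch of a bag-of-words Naive Bayes text classifier that uses only the Python standard library. The `tokenize` helper, the `TinyNaiveBayes` class, and the four training sentences are illustrative assumptions made up for this example; a real project would more likely rely on scikit-learn or a pre-trained transformer and a proper dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior, then add one log likelihood per token
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, only to show the mechanics
train_texts = [
    "the team won the football match",
    "the striker scored two goals",
    "the central bank raised interest rates",
    "stock markets fell after the earnings report",
]
train_labels = ["sports", "sports", "business", "business"]

model = TinyNaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("the bank reported strong earnings")
print(prediction)
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the add-one smoothing keeps unseen words from zeroing out a class.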

**Submission:**

Please upload both your model and application to Hugging Face or your own GitHub account for submission.

**Presentation:**

You are required to create a presentation to showcase your project, including the following details:

- The objective of your model.
- A comprehensive description of your model.
- The specific metrics used to measure your model's effectiveness.
- A brief overview of the dataset used, including its source, pre-processing steps, and any insights.
- An explanation of the methodology used in developing the model.
- A discussion on challenges faced, how they were handled, and your learnings from those.
- Suggestions for potential future improvements to the model.
- A functioning link to a demo of your model in action.
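
For the metrics item above, the usual classification metrics can be computed directly from a model's predictions. The sketch below assumes a binary task; the function name `binary_metrics` and the `y_true` / `y_pred` values are made up for illustration and are not results from any project. Libraries such as scikit-learn offer equivalent, better-tested functions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every division so empty or one-sided inputs return 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can mislead on imbalanced data, which is why precision, recall, and F1 are worth reporting alongside it.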

**Grading:**

Submissions will be manually graded, with a select few given the opportunity to present their projects in front of a panel of judges. This will provide valuable feedback, further enhancing your project and expanding your knowledge base.

Remember, consistent practice is the key to mastering these concepts. Apply your knowledge, ask questions when in doubt, and above all, enjoy the process. Best of luck to you all!


In [None]:
# @title #### Student Identity
student_id = "REA6HXRRQ" # @param {type:"string"}
name = "Ratih Dewi Setyo Jati" # @param {type:"string"}
drive_link = "https://drive.google.com/drive/folders/1Vrjou0HK9_7a2qA6abCbjXJA6WRDqzBU?usp=sharing"  # @param {type:"string"}
assignment_id = "00_portfolio_project"

## Installation and Import `rggrader` Package

In [None]:
%pip install rggrader
from rggrader import submit_image
from rggrader import submit







## Working Space

In [None]:
!pip install PyMuPDF gradio google-generativeai PyPDF2 pandas openpyxl

import PyPDF2
import gradio as gr
from google.colab import drive
import google.generativeai as genai
import re
import os
from datetime import datetime
import pandas as pd
from io import BytesIO

# Global variable that stores the log
chat_log = []

def add_to_log(message):
    chat_log.append(message)
    print(message)

# Read the text of every PDF in a folder
def extract_text_from_folder(folder_path):
    all_text = ""

    if not os.path.exists(folder_path):
        add_to_log(f" Folder tidak ditemukan: {folder_path}")
        return None

    if not os.path.isdir(folder_path):
        add_to_log(f" Path bukan folder: {folder_path}")
        return None

    add_to_log(f" Isi folder: {os.listdir(folder_path)}")
    pdf_files = [f for f in os.listdir(folder_path) if f.lower().endswith('.pdf')]

    if not pdf_files:
        add_to_log(" Tidak ada file PDF di folder ini")
        return None

    add_to_log(f" Akan membaca {len(pdf_files)} file PDF: {pdf_files}")

    for pdf_file in pdf_files:
        file_path = os.path.join(folder_path, pdf_file)
        try:
            with open(file_path, 'rb') as file:
                reader = PyPDF2.PdfReader(file)
                add_to_log(f" Membaca {pdf_file} ({len(reader.pages)} halaman)")

                for page in reader.pages:
                    all_text += page.extract_text() + "\n---\n"

            add_to_log(f" Berhasil membaca {pdf_file}")
        except Exception as e:
            add_to_log(f" Gagal membaca {pdf_file}: {str(e)}")

    return all_text if all_text else None

# Parse the holiday data into national holidays and joint leave days
def process_holiday_data(text):
    national_holidays = []
    joint_holidays = []

    try:
        text = text.replace('\r', ' ').replace('\n', ' ')
        text = ' '.join(text.split())

        add_to_log(f"Teks normalisasi (500 karakter): {text[:500]}...")

        parts = re.split(r'1\.\s*Hari\s*libur\s*nasional|2\.\s*Cuti\s*bersama', text, flags=re.IGNORECASE)

        if len(parts) < 3:
            raise ValueError("Format dokumen tidak dikenali")

        national_section = parts[1]
        national_pattern = re.compile(
            r'(?P<no>\d+)\s+(?P<tanggal>\d{1,2}\s+[A-Za-z]+\s*(?:\-\s*\d{1,2}\s+[A-Za-z]+)?)\s+'
            r'(?P<hari>[A-Za-z]+(?:\s*\-\s*[A-Za-z]+)?)\s+'
            r'(?P<keterangan>.+?)(?=\s*\d+\s+\d{1,2}\s+[A-Za-z]+|$)',
            re.IGNORECASE
        )

        for match in national_pattern.finditer(national_section):
            national_holidays.append({
                'tanggal': match.group('tanggal').strip(),
                'hari': match.group('hari').strip(),
                'keterangan': match.group('keterangan').strip()
            })

        joint_section = parts[2]
        joint_pattern = re.compile(
            r'(?P<no>\d+)\s+(?P<tanggal>\d{1,2}\s*[A-Za-z]+\s*(?:,\s*\d{1,2}\s*[A-Za-z]+\s*(?:dan\s*\d{1,2}\s*[A-Za-z]+)?)?)\s+'
            r'(?P<hari>[A-Za-z]+)\s+'
            r'(?P<keterangan>.+?)(?=\s*\d+\s+\d{1,2}\s*[A-Za-z]+|$)',
            re.IGNORECASE
        )

        for match in joint_pattern.finditer(joint_section):
            joint_holidays.append({
                'tanggal': match.group('tanggal').strip(),
                'hari': match.group('hari').strip(),
                'keterangan': match.group('keterangan').strip()
            })

        add_to_log(f" Libur nasional ditemukan: {len(national_holidays)}")
        add_to_log(f" Cuti bersama ditemukan: {len(joint_holidays)}")

        return national_holidays, joint_holidays
    except Exception as e:
        add_to_log(f" Error proses data: {str(e)}")
        return [], []

def find_holidays_by_month(month, national_holidays, joint_holidays):
    month_map = {
        'januari': ['januari', 'jan'], 'februari': ['februari', 'feb'],
        'maret': ['maret', 'mar'], 'april': ['april', 'apr'],
        'mei': ['mei'], 'juni': ['juni', 'jun'],
        'juli': ['juli', 'jul'], 'agustus': ['agustus', 'agu', 'aug'],
        'september': ['september', 'sep'], 'oktober': ['oktober', 'okt', 'oct'],
        'november': ['november', 'nov'], 'desember': ['desember', 'des', 'dec']
    }

    target_months = month_map.get(month.lower(), [month.lower()])
    result = []

    for holiday in national_holidays:
        holiday_month = re.search(r'(\d{1,2}\s+)([A-Za-z]+)', holiday['tanggal'])
        if holiday_month and any(m in holiday_month.group(2).lower() for m in target_months):
            result.append(f" Libur Nasional: {holiday['tanggal']} ({holiday['hari']}) - {holiday['keterangan']}")

    for holiday in joint_holidays:
        holiday_month = re.search(r'(\d{1,2}\s*)([A-Za-z]+)', holiday['tanggal'])
        if holiday_month and any(m in holiday_month.group(2).lower() for m in target_months):
            result.append(f" Cuti Bersama: {holiday['tanggal']} ({holiday['hari']}) - {holiday['keterangan']}")

    return "\n".join(result) if result else f"Tidak ada libur di bulan {month.capitalize()}"

# Read the team schedule Excel file
def process_schedule_file(file_path):
    try:
        xls = pd.ExcelFile(file_path)
        sheets = {}

        for sheet_name in xls.sheet_names:
            df = pd.read_excel(xls, sheet_name=sheet_name, header=None)
            sheets[sheet_name] = df

        return sheets
    except Exception as e:
        add_to_log(f" Gagal membaca file Excel: {str(e)}")
        return None

def get_all_personnel_names(schedule_data):
    """Collect all personnel names from the schedule data."""
    personnel_names = set()
    for sheet_name, df in schedule_data.items():
        for idx, row in df.iterrows():
            if isinstance(row[0], str) and row[0].strip():
                if idx > 3:  # Assumes rows 0-3 are header rows
                    name = row[0].split()[0]
                    personnel_names.add(name.strip().title())
    return sorted(list(personnel_names))

def get_person_schedule(person_name, month, schedule_data):
    try:
        month = month.lower()[:3]  # Normalize to the first three letters

        # 1. Find the sheet whose name contains the month
        target_sheet = None
        for sheet_name, df in schedule_data.items():
            if month in sheet_name.lower():
                target_sheet = df
                add_to_log(f" Menggunakan sheet: {sheet_name}")
                break

        # If none is found, look for a sheet matching other name patterns
        if target_sheet is None:
            for sheet_name, df in schedule_data.items():
                if any(m in sheet_name.lower() for m in ['schedule', 'jadwal', 'tabel']):
                    target_sheet = df
                    add_to_log(f" Menggunakan sheet alternatif: {sheet_name}")
                    break

        # If still not found, fall back to the first sheet
        if target_sheet is None:
            target_sheet = next(iter(schedule_data.values()))
            add_to_log(f" Menggunakan sheet pertama: {list(schedule_data.keys())[0]}")

        # 2. Find the person using a more flexible match
        person_row = None
        for idx, row in target_sheet.iterrows():
            if isinstance(row[0], str) and person_name.lower() in row[0].lower().replace(" ", ""):
                person_row = row
                add_to_log(f" Ditemukan baris untuk {person_name} di baris {idx}")
                break

        if person_row is None:
            available_names = get_all_personnel_names(schedule_data)
            return f"Personel {person_name} tidak ditemukan. Yang tersedia: {', '.join(available_names)}"

        # 3. Locate the date row: the first row that contains day-of-month numbers
        date_row_idx = None
        for row_idx in range(len(target_sheet)):
            row = target_sheet.iloc[row_idx]
            if any(isinstance(cell, (int, float)) and 1 <= cell <= 31 for cell in row[1:] if pd.notna(cell)):
                date_row_idx = row_idx
                add_to_log(f" Baris tanggal ditemukan di baris {date_row_idx}")
                break

        # 4. Collect the schedule entries
        schedule_entries = []
        if date_row_idx is not None:
            date_row = target_sheet.iloc[date_row_idx]
            for col in range(1, len(person_row)):
                if col < len(date_row) and pd.notna(date_row[col]):
                    try:
                        date_val = int(float(date_row[col]))
                        shift = person_row[col] if col < len(person_row) and pd.notna(person_row[col]) else None
                        if shift:
                            # Look up the day name in the row below the dates
                            day_name = target_sheet.iloc[date_row_idx+1, col] if (date_row_idx+1) < len(target_sheet) else ''
                            if pd.isna(day_name):
                                day_name = ''  # Avoid printing "nan" for empty cells
                            schedule_entries.append((date_val, str(shift).strip(), str(day_name).strip()))
                    except (TypeError, ValueError):
                        continue  # Skip cells that cannot be read as a date number

        # 5. Format the output using the requested month (the year stays fixed at 2025)
        month_label = month.capitalize()
        result = f" Jadwal {person_name.title()} - {month_label} 2025:\n"
        if not schedule_entries:
            result += "Tidak ada jadwal yang ditemukan\n"
            # Debug info
            result += f"\nDebug Info:\n- Sheet: {list(schedule_data.keys())}\n"
            result += f"- Baris tanggal: {'ditemukan' if date_row_idx is not None else 'tidak ditemukan'}\n"
        else:
            for date, shift, day_name in sorted(schedule_entries, key=lambda x: x[0]):
                result += f"{date:02d} {month_label}: {shift}"
                if day_name:
                    result += f" ({day_name})"
                result += "\n"

        # 6. Count the total for each shift type
        # (this step previously sat after a return statement and never ran)
        shift_counts = {
            'Shift 1 (P)': 0,
            'Shift 2 (S)': 0,
            'Shift 3 (M)': 0,
            'Leave (L)': 0,
            'Cuti (C)': 0,
            'Lainnya': 0
        }

        for _, shift, _ in schedule_entries:
            shift = shift.upper()
            if 'P' in shift:
                shift_counts['Shift 1 (P)'] += 1
            if 'S' in shift:
                shift_counts['Shift 2 (S)'] += 1
            if 'M' in shift:
                shift_counts['Shift 3 (M)'] += 1
            if shift == 'L':
                shift_counts['Leave (L)'] += 1
            if shift == 'C':
                shift_counts['Cuti (C)'] += 1
            if shift not in ['P', 'S', 'M', 'L', 'C']:
                shift_counts['Lainnya'] += 1

        if schedule_entries:
            result += "\nTotal:\n"
            for shift_type, count in shift_counts.items():
                if count > 0:
                    result += f"- {shift_type}: {count} hari\n"

        return result

    except Exception as e:
        add_to_log(f" Error: {str(e)}")
        return f"Terjadi error saat memproses jadwal {person_name}"

def get_personnel_by_date(date, month, schedule_data):
    try:
        target_sheet = None
        month = month.lower()[:3] if month else 'apr'

        for sheet_name, df in schedule_data.items():
            if month in sheet_name.lower()[:3]:
                target_sheet = df
                break

        if target_sheet is None:
            target_sheet = next(iter(schedule_data.values()))

        date_col = None
        header_row = 3

        for col in range(len(target_sheet.columns)):
            if header_row < len(target_sheet):
                date_cell = target_sheet.iloc[header_row, col]
                if pd.notna(date_cell) and isinstance(date_cell, (int, float)) and int(float(date_cell)) == date:
                    date_col = col
                    break

        if date_col is None:
            return f"Tidak menemukan jadwal untuk tanggal {date} {month.capitalize()}"

        personnel = []

        for idx, row in target_sheet.iterrows():
            if isinstance(row[0], str) and idx > header_row:
                person_name = row[0].split()[0]
                shift = row[date_col] if date_col < len(row) and pd.notna(row[date_col]) else None
                if shift:
                    personnel.append((person_name, shift))

        result = f"👥 Personel pada {date} {month.capitalize()} 2025:\n"
        for person, shift in personnel:
            result += f"- {person}: {shift}\n"

        shift_types = set([s for _, s in personnel])
        result += "\nShift yang tersedia:\n"
        for shift in shift_types:
            count = len([p for p, s in personnel if s == shift])
            result += f"- {shift}: {count} orang\n"

        return result
    except Exception as e:
        add_to_log(f" Error mencari personel: {str(e)}")
        return f"Terjadi error saat mencari personel untuk tanggal {date} {month.capitalize()}"

def main():
    api_key = input("Masukkan Google Gemini API Key: ")
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel('gemini-pro')

    drive.mount('/content/drive')

    folder_path = '/content/drive/MyDrive/Final Project Bootcamp/chatbot_docs'

    combined_text = extract_text_from_folder(folder_path)

    national_holidays, joint_holidays = [], []
    if combined_text:
        national_holidays, joint_holidays = process_holiday_data(combined_text)

    schedule_data = None
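    # Only list the folder if it exists, so a missing folder does not raise here
    excel_files = []
    if os.path.isdir(folder_path):
        excel_files = [f for f in os.listdir(folder_path) if f.lower().endswith(('.xlsx', '.xls'))]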

    if excel_files:
        excel_file = excel_files[0]
        excel_path = os.path.join(folder_path, excel_file)
        schedule_data = process_schedule_file(excel_path)

    def chatbot(question, history=None):
        add_to_log(f"\n Pertanyaan: {question}")

        try:
            personnel_list = get_all_personnel_names(schedule_data) if schedule_data else []

            personel_dicari = None
            for name in personnel_list:
                if name.lower() in question.lower():
                    personel_dicari = name
                    break

            tanggal_dicari = None
            bulan_dicari = None
            date_match = re.search(r'(\d{1,2})\s*(januari|jan|februari|feb|maret|mar|april|apr|mei|juni|jun|juli|jul|agustus|agu|aug|september|sep|oktober|okt|oct|november|nov|desember|des|dec)',
                                 question.lower())
            if date_match:
                tanggal_dicari = int(date_match.group(1))
                bulan_dicari = date_match.group(2)

            # Pick up a month name without a day, even when a person was named
            if not bulan_dicari:
                for bulan in ['januari', 'februari', 'maret', 'april', 'mei', 'juni',
                             'juli', 'agustus', 'september', 'oktober', 'november', 'desember']:
                    if bulan in question.lower():
                        bulan_dicari = bulan
                        break

            if personel_dicari and schedule_data:
                bulan_pertanyaan = bulan_dicari if bulan_dicari else 'april'
                response = get_person_schedule(personel_dicari, bulan_pertanyaan, schedule_data)
                add_to_log(f" Jawaban dari jadwal tim: {response[:200]}...")

            elif tanggal_dicari and schedule_data:
                response = get_personnel_by_date(tanggal_dicari, bulan_dicari, schedule_data)
                add_to_log(f" Jawaban dari jadwal tim: {response[:200]}...")

            elif bulan_dicari and (national_holidays or joint_holidays):
                response = find_holidays_by_month(bulan_dicari, national_holidays, joint_holidays)
                add_to_log(f" Jawaban dari data libur: {response}")

            else:
                response = model.generate_content(question).text
                add_to_log(f" Jawaban dari Gemini: {response}")

            return response, "\n".join(chat_log[-5:])

        except Exception as e:
            error_msg = f" Error: {str(e)}"
            add_to_log(error_msg)
            return error_msg, "\n".join(chat_log[-5:])

    with gr.Blocks() as demo:
        gr.Markdown("##  Chatbot Jadwal Tim & Libur Nasional")
        gr.Markdown("Tanyakan tentang jadwal tim atau hari libur dan cuti bersama tahun 2025")

        with gr.Row():
            with gr.Column():
                question = gr.Textbox(label="Tanya tentang jadwal atau libur",
                                    placeholder="Contoh: Tampilkan jadwal Ratih untuk April 2025")
                submit_btn = gr.Button("Kirim")

            with gr.Column():
                output = gr.Textbox(label="Jawaban")
                console = gr.Textbox(label="Console Log", interactive=False, lines=5)

        submit_btn.click(
            fn=chatbot,
            inputs=question,
            outputs=[output, console]
        )

        question.submit(
            fn=chatbot,
            inputs=question,
            outputs=[output, console]
        )

    demo.launch(share=True)

if __name__ == "__main__":
    main()
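
To see what kind of input `process_holiday_data` expects, the sketch below applies the same national-holiday pattern to a short hand-written sample. The sample string is an assumption about how a flattened PDF table might read after whitespace normalization (row number, date, day name, description); it is not taken from the actual PDF, and real extracted text may differ.

```python
import re

# Same pattern as national_pattern in process_holiday_data
national_pattern = re.compile(
    r'(?P<no>\d+)\s+(?P<tanggal>\d{1,2}\s+[A-Za-z]+\s*(?:\-\s*\d{1,2}\s+[A-Za-z]+)?)\s+'
    r'(?P<hari>[A-Za-z]+(?:\s*\-\s*[A-Za-z]+)?)\s+'
    r'(?P<keterangan>.+?)(?=\s*\d+\s+\d{1,2}\s+[A-Za-z]+|$)',
    re.IGNORECASE
)

# Hypothetical sample: two table rows flattened onto one line
sample = (
    "1 1 Januari Rabu Tahun Baru 2025 Masehi "
    "2 27 Januari Senin Isra Mikraj Nabi Muhammad SAW"
)

# Each match yields (date, day name, description)
rows = [
    (m.group('tanggal').strip(), m.group('hari').strip(), m.group('keterangan').strip())
    for m in national_pattern.finditer(sample)
]
for row in rows:
    print(row)
```

The lookahead ends a description where the next "row number, day, month name" sequence begins. A description that itself contains a number followed by a one- or two-digit number and a word would be cut short, so real documents may need a stricter pattern.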


## Submit Notebook

In [None]:
portfolio_link = "https://github.com/arrdsj/DocuBot"
presentation_link = "https://docs.google.com/presentation/d/1NNsNIM0Q6AC_zIojQHBdWvnOkkOHs_77/edit?usp=sharing&ouid=110604046736752212553&rtpof=true&sd=true"

question_id = "01_portfolio_link"
submit(student_id, name, assignment_id, str(portfolio_link), question_id, drive_link)

question_id = "02_presentation_link"
submit(student_id, name, assignment_id, str(presentation_link), question_id, drive_link)

'Assignment successfully submitted'