# CHUNGKHOAN AI - Google Colab Research Pipeline

## **Research Thesis: Stock Price Prediction Using Deep Learning**

---

## Overview
This notebook runs the complete pipeline and produces every result the thesis needs:
- Data collection from the vnstock API
- Baseline models (Naive, SMA, EMA, ARIMA)
- Deep learning models (TCN-Residual, GRU, LSTM)
- Ensemble learning
- Backtesting and evaluation
- LaTeX report generation for the thesis
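
For orientation, the three simplest baselines in the list can be sketched in a few lines. This is only an illustrative sketch: the function names, the `window` and `span` parameters, and the sample prices are assumptions, and the project's own `run_baselines.py` may define these baselines differently.

```python
from typing import Sequence


def naive_forecast(prices: Sequence[float]) -> float:
    """Naive baseline: the next price equals the last observed price."""
    return prices[-1]


def sma_forecast(prices: Sequence[float], window: int) -> float:
    """Simple moving average of the last `window` prices."""
    recent = prices[-window:]
    return sum(recent) / len(recent)


def ema_forecast(prices: Sequence[float], span: int) -> float:
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    ema = prices[0]
    for price in prices[1:]:
        ema = alpha * price + (1 - alpha) * ema
    return ema


# Made-up prices, for illustration only
prices = [10.0, 11.0, 12.0, 13.0]
naive = naive_forecast(prices)
sma = sma_forecast(prices, window=2)
ema = ema_forecast(prices, span=3)
print(naive, sma, ema)
```

Any deep learning model is worth keeping only if it beats forecasts this cheap on the same test period.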

## **New improvements (scaling and loss function fixes)**
- ✅ Fixed: Targets (y) are no longer scaled; they stay as raw log-returns
- ✅ Fixed: Huber loss delta reduced from 1.0 to 0.1 to suit small log-returns
- ✅ Improved: The model should be more sensitive to small price movements
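
The effect of the delta change can be seen with a plain-Python version of the Huber loss. This is a sketch of the standard definition, not the project's training code (which is not shown in this notebook), and the residual values are made up.

```python
def huber(residual: float, delta: float) -> float:
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    abs_r = abs(residual)
    if abs_r <= delta:
        return 0.5 * residual ** 2
    return delta * (abs_r - 0.5 * delta)


def huber_grad_magnitude(residual: float, delta: float) -> float:
    """Size of the gradient w.r.t. the residual: |r| inside, capped at delta outside."""
    return min(abs(residual), delta)


# Illustrative residuals on the log-return scale (assumed values)
for r in (0.005, 0.02, 0.15):
    print(r, huber(r, delta=1.0), huber(r, delta=0.1))
```

With `delta=1.0`, residuals on the log-return scale never leave the quadratic region, so the loss is just half the squared error. With `delta=0.1`, the quadratic region still covers small residuals, while larger ones are penalized linearly and their gradient is capped. In TensorFlow the built-in equivalent would be `tf.keras.losses.Huber(delta=0.1)`; whether the training script uses that class is an assumption.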

---

## Step 1: Set up the environment

In [None]:
# Check the GPU
!nvidia-smi

# Install the required packages
!pip install -q vnstock pandas numpy pandas-ta scikit-learn matplotlib tensorflow statsmodels PyYAML tqdm scipy

# Clone the repository (if needed)
# !git clone https://github.com/your-repo/chungkhoan-ai.git
# %cd chungkhoan-ai

print("Environment setup completed!")
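
Because `pip install -q` suppresses most output, a quick import check can confirm the installation before the longer steps run. The sketch below is an optional addition, not part of the original pipeline; the helper name `missing_modules` is hypothetical. Note that some import names differ from the pip package names.

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages installed above
# (pandas-ta -> pandas_ta, scikit-learn -> sklearn, PyYAML -> yaml)
REQUIRED_MODULES = [
    "vnstock", "pandas", "numpy", "pandas_ta", "sklearn",
    "matplotlib", "tensorflow", "statsmodels", "yaml", "tqdm", "scipy",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```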

## Step 2: Collect the data

In [None]:
# Mount Google Drive (optional)
from google.colab import drive
drive.mount('/content/drive')

# Change the working directory (adjust the path to match your Drive layout)
%cd /content/drive/MyDrive/ChungKhoanAI

# Collect the data (falls back to sample data automatically on a 403 error)
!python src/collect_vnstock.py --tickers FPT HPG VNM VNINDEX --start 2015-01-01 --end 2025-08-28

print("Data collection completed!")
print("Note: if you see 'Creating sample data', the API was blocked and sample data is being used")

## Step 3: Prepare the dataset

In [None]:
# Prepare the dataset for the baseline comparison
!python src/prepare_dataset.py --config configs/config_baseline.yaml

print("Dataset preparation completed!")
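
Since the targets are log-returns, the core of dataset preparation can be pictured as follows. This is a minimal sketch under assumptions: the function names, the `lookback` parameter, and the closing prices are illustrative, and the real `src/prepare_dataset.py` (driven by the YAML config) likely adds features and scaling that are not shown here.

```python
import math
from typing import List, Sequence, Tuple


def log_returns(prices: Sequence[float]) -> List[float]:
    """r_t = ln(P_t / P_{t-1}); the result is one element shorter than `prices`."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def make_windows(
    series: Sequence[float], lookback: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `lookback` values with the value that follows it."""
    X, y = [], []
    for end in range(lookback, len(series)):
        X.append(list(series[end - lookback:end]))
        y.append(series[end])  # target stays an unscaled log-return
    return X, y


closes = [100.0, 101.0, 99.0, 102.0, 103.0]  # made-up closing prices
returns = log_returns(closes)
X, y = make_windows(returns, lookback=2)
print(len(returns), len(X), len(y))
```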

## Step 4: Baseline Models

In [None]:
# Run the baseline models
!python run_baselines.py --tickers FPT HPG VNM

print("Baseline models completed!")

## Step 5: Deep Learning Models

In [None]:
# Train the TCN models (optimized for a T4 GPU, roughly 5-7 minutes)
!python src/train.py --config configs/config_baseline.yaml

print("Deep learning training completed!")
print("Optimized for T4 15GB GPU with memory management")
print("NOTE: scaling is fixed (targets are not scaled) and the loss function uses delta=0.1")


## Step 6: Ensemble Models

In [None]:
# Create the ensemble predictions
!python src/ensemble.py --config configs/config_baseline.yaml

print("Ensemble creation completed!")
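
The notebook does not say how `src/ensemble.py` combines the models. One common scheme, shown here purely as an assumed illustration, weights each model by the inverse of its validation error and averages the predictions. All names and numbers below are hypothetical.

```python
from typing import Dict, List


def inverse_error_weights(val_errors: Dict[str, float]) -> Dict[str, float]:
    """Weight each model by 1/error, normalized so the weights sum to 1."""
    inverse = {name: 1.0 / err for name, err in val_errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(predictions: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of the per-model predictions at each time step."""
    n_steps = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * preds[t] for name, preds in predictions.items())
        for t in range(n_steps)
    ]


# Made-up validation errors and log-return predictions, for illustration only
val_rmse = {"tcn": 0.01, "gru": 0.02}
weights = inverse_error_weights(val_rmse)
preds = {"tcn": [0.003, -0.001], "gru": [0.006, 0.002]}
combined = combine(preds, weights)
print(weights)
print(combined)
```

The weights must come from a validation split rather than the test period, otherwise the ensemble's reported performance is biased upward.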

## Step 7: Comparison and analysis

In [None]:
# Compare the baseline models against the deep learning models
!python compare_baselines.py

print("Comparison analysis completed!")
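
Which metrics `compare_baselines.py` reports is not stated in this notebook. Typical choices for return forecasts are RMSE, MAE, and directional accuracy; the sketch below shows how they might be computed, with hypothetical function names and made-up values.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def directional_accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of steps where prediction and outcome agree on 'up' (> 0) vs 'not up'."""
    hits = sum((t > 0) == (p > 0) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Made-up log-returns, for illustration only
y_true = [0.01, -0.02, 0.005, -0.01]
y_pred = [0.02, -0.01, -0.005, -0.02]
print(rmse(y_true, y_pred), mae(y_true, y_pred), directional_accuracy(y_true, y_pred))
```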

## Step 8: Generate the thesis reports

In [None]:
# Generate the research summary
import sys
sys.path.append('src')
from research_summary import generate_research_report
generate_research_report()

print("Research reports generated!")

## Step 9: Check the results

In [None]:
# Display a summary of the results
print("RESEARCH COMPLETION SUMMARY")
print("=" * 60)

# Read and display the summary
with open('reports/research_summary.txt', 'r', encoding='utf-8') as f:
    content = f.read()
    print(content[:1000])  # Show the first 1000 characters
    print("...")
    print("\nFull summary available at: reports/research_summary.txt")
    print("LaTeX tables available at: reports/latex_tables/")
    print("Detailed results available at: reports/")

print("Ready for thesis writing!")

## Step 10: Download the results

In [None]:
# Create a zip file for download
import shutil

# Zip the entire reports folder
shutil.make_archive('thesis_results', 'zip', 'reports')

print("Results zipped as: thesis_results.zip")
print("Download this file to your machine to use it in the thesis")
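
The cell above only creates the archive. If the notebook runs in Colab, the download can also be triggered from code with `google.colab.files.download`. The sketch below wraps both steps; the helper names `zip_directory` and `download_if_colab` are hypothetical, and the download call is skipped outside Colab.

```python
import shutil
from pathlib import Path


def zip_directory(source_dir: str, archive_stem: str) -> str:
    """Zip `source_dir` into `<archive_stem>.zip` and return the archive path."""
    if not Path(source_dir).is_dir():
        raise FileNotFoundError(f"Directory not found: {source_dir}")
    return shutil.make_archive(archive_stem, "zip", source_dir)


def download_if_colab(path: str) -> bool:
    """Start a browser download when running inside Colab; otherwise do nothing."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return False
    files.download(path)
    return True


# Only runs when a reports folder exists in the working directory
if Path("reports").is_dir():
    archive = zip_directory("reports", "thesis_results")
    download_if_colab(archive)
```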