# Library Installation and Setup

## Overview
This notebook installs all the libraries required by the Khmer text classification project. Run it before any of the other notebooks in this project.

## Libraries Installed
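- **Core Libraries**: pandas, numpy, matplotlib, seaborn
- **Machine Learning Libraries**: scikit-learn, scipy
- **NLP Libraries**: gensim, khmernltk (published on PyPI as `khmer-nltk`)
- **FastText**: `fasttext`, for word embeddings
- **Utility Libraries**: tqdm, joblib, streamlit, PyPDF2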

## Installation Order
1. **System packages** (if needed)
2. **Core Python packages**
3. **NLP and ML packages**
4. **Khmer-specific packages**
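The installation order above can be sketched as an ordered mapping from stage to packages, with one pip call per stage so that pip resolves each group's dependencies together. This is a minimal sketch, not the notebook's own code: the stage names, `INSTALL_STAGES`, and `build_pip_commands` are hypothetical, and system packages are left out because pip cannot install them.

```python
import sys

# Hypothetical stage grouping that mirrors the installation order above.
# Dicts keep insertion order (Python 3.7+), so stages run top to bottom.
INSTALL_STAGES = {
    "core": ["numpy", "pandas", "matplotlib", "seaborn", "tqdm", "joblib"],
    "nlp_ml": ["scikit-learn", "scipy", "gensim", "fasttext"],
    "khmer": ["khmer-nltk"],
}


def build_pip_commands(stages, python=sys.executable):
    """Return one `python -m pip install --upgrade ...` command list per stage."""
    return [
        [python, "-m", "pip", "install", "--upgrade", *packages]
        for packages in stages.values()
    ]


# Print the commands instead of running them
for cmd in build_pip_commands(INSTALL_STAGES):
    print(" ".join(cmd))
```

Passing each command to `subprocess.run(cmd, check=True)` would execute the stages in order and stop at the first one that fails.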

---

In [1]:
# Check Python version and pip
import sys
import subprocess

print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")

# Check if pip is available
try:
    result = subprocess.run([sys.executable, "-m", "pip", "--version"], 
                          capture_output=True, text=True, check=True)
    print(f"pip version: {result.stdout.strip()}")
except subprocess.CalledProcessError:
    print("pip not found - installing pip...")
    subprocess.check_call([sys.executable, "-m", "ensurepip", "--upgrade"])

Python version: 3.9.6 (default, Nov 11 2024, 03:15:38) 
[Clang 16.0.0 (clang-1600.0.26.6)]
Python executable: /Library/Developer/CommandLineTools/usr/bin/python3
pip version: pip 25.0.1 from /Users/socheata/Library/Python/3.9/lib/python/site-packages/pip (python 3.9)

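The cell runs pip as `sys.executable -m pip`, so that pip always belongs to the kernel's interpreter; a bare `pip` typed in a terminal may not. The `(python X.Y)` suffix in `pip --version` output makes this easy to check. A minimal sketch; `pip_python_version` and `pip_matches_interpreter` are hypothetical helpers, and the sample string is the output shown above.

```python
import re
import sys


def pip_python_version(pip_version_output):
    """Extract the 'X.Y' Python version that `pip --version` reports, or None."""
    match = re.search(r"\(python (\d+\.\d+)\)", pip_version_output)
    return match.group(1) if match else None


def pip_matches_interpreter(pip_version_output, version_info=sys.version_info):
    """True if pip reports the same major.minor version as this interpreter."""
    expected = f"{version_info[0]}.{version_info[1]}"
    return pip_python_version(pip_version_output) == expected


sample = "pip 25.0.1 from /Users/socheata/Library/Python/3.9/lib/python/site-packages/pip (python 3.9)"
print(pip_python_version(sample))                  # 3.9
print(pip_matches_interpreter(sample, (3, 9, 6)))  # True
```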

In [2]:
import importlib
import subprocess
import sys

# Install (or upgrade) a package with pip; report failures instead of raising
def install_package(package_name, extra_args=None):
    """Install a package with pip; return True on success, False otherwise."""
    cmd = [sys.executable, "-m", "pip", "install", "--upgrade", package_name]
    if extra_args:
        cmd.extend(extra_args)

    print(f"Installing {package_name}...")
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as e:
        # e.g. the interpreter itself could not be launched
        print(f"❌ Exception installing {package_name}: {e}")
        return False

    if result.returncode == 0:
        print(f"✅ {package_name} installed successfully")
        return True
    print(f"❌ Failed to install {package_name}")
    print(f"Error: {result.stderr}")
    return False

def check_package(package_name, import_name=None):
    """Return True if the package is importable as `import_name` (defaults to package_name)."""
    if import_name is None:
        import_name = package_name

    # Let the import system see packages installed earlier in this session
    importlib.invalidate_caches()
    try:
        importlib.import_module(import_name)
        print(f"✅ {package_name} is available")
        return True
    except ImportError:
        print(f"❌ {package_name} not available")
        return False

# Function to install all required packages
def install_all_packages():
    """Install all required packages for the project"""
    packages = [
        "numpy",
        "pandas",
        "matplotlib",
        "seaborn",
        "tqdm",
        "joblib",
        "scikit-learn",
        "scipy",
        "gensim",
        "fasttext",
        "khmer-nltk",
        "streamlit",
        "PyPDF2"
    ]

    for package in packages:
        install_package(package)

# Install everything in one pass; the cells below repeat the same installs group by group
install_all_packages()

Installing numpy...
✅ numpy installed successfully
Installing pandas...
✅ pandas installed successfully
Installing matplotlib...
✅ matplotlib installed successfully
Installing seaborn...
✅ seaborn installed successfully
Installing tqdm...
✅ tqdm installed successfully
Installing joblib...
✅ joblib installed successfully
Installing scikit-learn...
✅ scikit-learn installed successfully
Installing scipy...
✅ scipy installed successfully
Installing gensim...
✅ gensim installed successfully
Installing fasttext...

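`install_package` prints failures, but the loop in `install_all_packages` keeps going and nothing summarises what failed. A minimal sketch of a variant that returns the failed names, assuming the same pip invocation; `install_packages` and its `runner` parameter are hypothetical (the parameter exists only so the logic can be exercised without network access).

```python
import subprocess
import sys


def install_packages(packages, runner=subprocess.run, python=sys.executable):
    """Install each package with pip and return the list of names that failed."""
    failed = []
    for package in packages:
        cmd = [python, "-m", "pip", "install", "--upgrade", package]
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(package)
            # Show only the tail of stderr; pip errors can be very long
            print(f"❌ {package}: {result.stderr.strip()[-500:]}")
        else:
            print(f"✅ {package}")
    return failed
```

In the notebook, `failed = install_packages(packages)` followed by `if failed: raise RuntimeError(failed)` would stop execution before later cells depend on a missing package.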
In [3]:
# Install core scientific libraries
print("=== Installing Core Libraries ===")

core_packages = [
    "numpy",
    "pandas",
    "matplotlib",
    "seaborn",
    "tqdm",
    "joblib"
]

for package in core_packages:
    install_package(package)

=== Installing Core Libraries ===
Installing numpy...
✅ numpy installed successfully
Installing pandas...
✅ pandas installed successfully
Installing matplotlib...
✅ matplotlib installed successfully
Installing seaborn...
✅ seaborn installed successfully
Installing tqdm...
✅ tqdm installed successfully
Installing joblib...
✅ joblib installed successfully


In [4]:
# Install machine learning libraries
print("\n=== Installing Machine Learning Libraries ===")

ml_packages = [
    "scikit-learn",
    "scipy"
]

for package in ml_packages:
    install_package(package)


=== Installing Machine Learning Libraries ===
Installing scikit-learn...
✅ scikit-learn installed successfully
Installing scipy...
✅ scipy installed successfully


In [5]:
# Install NLP libraries
print("\n=== Installing NLP Libraries ===")

nlp_packages = [
    "gensim",
    "fasttext"
]

for package in nlp_packages:
    install_package(package)


=== Installing NLP Libraries ===
Installing gensim...
✅ gensim installed successfully
Installing fasttext...
✅ fasttext installed successfully


In [6]:
# Install Khmer NLP toolkit
print("\n=== Installing Khmer NLP Toolkit ===")

# Install khmernltk
install_package("khmer-nltk")

# Alternative installation methods if the above fails
if not check_package("khmernltk"):
    print("Trying alternative installation for khmernltk...")
    try:
        # Try installing from GitHub
        subprocess.check_call([
            sys.executable, "-m", "pip", "install",
            "git+https://github.com/VietHoang1512/khmer-nltk"
        ])
        print("✅ khmernltk installed from GitHub")
    except subprocess.CalledProcessError:
        print("❌ Failed to install khmernltk from GitHub")
        print("You may need to install manually or check dependencies")


=== Installing Khmer NLP Toolkit ===
Installing khmer-nltk...
✅ khmer-nltk installed successfully
✅ khmernltk is available

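The fallback above tries PyPI first and then GitHub. The same idea generalises to an ordered list of pip requirement strings. A minimal sketch with a hypothetical helper `install_first_available`, reusing the PyPI name and GitHub URL from the cell; like the cell, it looks only at pip's exit status, so `check_package` is still needed to confirm that the import works.

```python
import subprocess
import sys

# Requirement strings tried in order: the PyPI name, then the GitHub repository
KHMER_NLTK_SOURCES = [
    "khmer-nltk",
    "git+https://github.com/VietHoang1512/khmer-nltk",
]


def install_first_available(sources, runner=subprocess.run, python=sys.executable):
    """Try each pip requirement in order; return the first that installs, else None."""
    for source in sources:
        result = runner([python, "-m", "pip", "install", source],
                        capture_output=True, text=True)
        if result.returncode == 0:
            return source
    return None
```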

In [7]:
# Verify all installations
print("\n=== Verifying Installations ===")

packages_to_check = [
    ("numpy", "numpy"),
    ("pandas", "pandas"),
    ("matplotlib", "matplotlib"),
    ("seaborn", "seaborn"),
    ("scikit-learn", "sklearn"),
    ("gensim", "gensim"),
    ("khmernltk", "khmernltk"),
    ("tqdm", "tqdm"),
    ("joblib", "joblib")
]

all_installed = True
for package_name, import_name in packages_to_check:
    if not check_package(package_name, import_name):
        all_installed = False

if all_installed:
    print("\n🎉 All packages installed successfully!")
else:
    print("\n⚠️ Some packages failed to install. Please check the errors above.")


=== Verifying Installations ===
✅ numpy is available
✅ pandas is available
✅ matplotlib is available
✅ seaborn is available
✅ scikit-learn is available
✅ gensim is available
✅ khmernltk is available
✅ tqdm is available
✅ joblib is available

🎉 All packages installed successfully!

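The verification list omits `scipy`, `fasttext`, `streamlit`, and `PyPDF2`, which `install_all_packages` also installs. One way to keep the two lists in sync is a single mapping from PyPI distribution name to import name. A sketch, assuming the package set from `install_all_packages`; `IMPORT_NAMES` and `missing_modules` are hypothetical names. It uses `importlib.util.find_spec`, which locates a module without executing it.

```python
import importlib.util

# PyPI distribution name -> top-level import name.
# Only scikit-learn and khmer-nltk import under a different name.
IMPORT_NAMES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "tqdm": "tqdm",
    "joblib": "joblib",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "gensim": "gensim",
    "fasttext": "fasttext",
    "khmer-nltk": "khmernltk",
    "streamlit": "streamlit",
    "PyPDF2": "PyPDF2",
}


def missing_modules(import_names):
    """Return the distributions whose top-level module cannot be located.

    find_spec does not run the module, so it is cheap for heavy packages,
    but it will not catch a package that is present yet fails at import time.
    """
    return [dist for dist, module in import_names.items()
            if importlib.util.find_spec(module) is None]
```

`missing_modules(IMPORT_NAMES)` returns an empty list when every package can be found.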



In [8]:
# Test core functionality
print("\n=== Testing Core Functionality ===")

try:
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.model_selection import train_test_split
    from gensim.models.fasttext import load_facebook_model
    import khmernltk
    
    print("✅ All core imports successful!")
    
    # Test basic functionality
    test_array = np.array([1, 2, 3])
    test_df = pd.DataFrame({'test': [1, 2, 3]})
    
    print(f"✅ NumPy test: {test_array.mean()}")
    print(f"✅ Pandas test: {len(test_df)} rows")
    print("✅ Basic functionality test passed!")
    
except Exception as e:
    print(f"❌ Import or functionality test failed: {str(e)}")


=== Testing Core Functionality ===
✅ All core imports successful!
✅ NumPy test: 2.0
✅ Pandas test: 3 rows
✅ Basic functionality test passed!

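The tests above confirm that imports work but not which versions were installed, and every install uses `--upgrade`, so rerunning the notebook can pull newer releases. Recording exact versions makes the environment easier to reproduce. A sketch using the standard-library `importlib.metadata` (Python 3.8+); the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = version(dist)
        except PackageNotFoundError:
            versions[dist] = None
    return versions


def as_pins(versions):
    """Format installed distributions as `name==version` lines, skipping missing ones."""
    return "\n".join(f"{name}=={ver}" for name, ver in versions.items() if ver)
```

For example, `print(as_pins(installed_versions(["numpy", "pandas", "khmer-nltk"])))` prints one `name==version` line per installed distribution, which can be saved as a pinned requirements file.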

## Installation Complete!

If all tests passed, you can now proceed to use the other notebooks:

1. **`2_Data Preprocessing.ipynb`** - Data loading and preprocessing
2. **`3A_Model Development.ipynb`** - TF-IDF based models
3. **`4A_FastText_Model_Development.ipynb`** - FastText based models

### Troubleshooting

If any package failed to install:

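1. **Check your Python environment**: make sure the notebook kernel runs the interpreter you expect (the first cell prints it as `Python executable`).
2. **Update pip**: `python -m pip install --upgrade pip`, using that same interpreter.
3. **Try conda**: if you use Anaconda/Miniconda, run `conda install package_name`; packages that conda does not provide can still be installed with pip inside the activated environment.
4. **Install manually**: run the terminal commands below one at a time so the full pip error for the failing package is visible.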

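For the first troubleshooting step, the snippet below reports what kind of environment the kernel is running in. The first cell's output shows `/Library/Developer/CommandLineTools/usr/bin/python3` with pip under `~/Library/Python/3.9`, i.e. the macOS Command Line Tools interpreter installing into the user site-packages directory, which this sketch would most likely report as `system`. The `conda-meta` directory test is a common heuristic rather than an official API, and `describe_environment` is a hypothetical helper.

```python
import os
import site
import sys


def describe_environment():
    """Summarise which interpreter and environment the kernel is running in."""
    # Heuristic: conda environments (including base) contain a conda-meta directory
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        kind = "conda"
    # In a venv, sys.prefix points at the venv while base_prefix points at the base install
    elif sys.prefix != sys.base_prefix:
        kind = "venv"
    else:
        kind = "system"
    return {
        "kind": kind,
        "executable": sys.executable,
        "prefix": sys.prefix,
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


print(describe_environment())
```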
### Manual Installation Commands

If the notebook installation fails, run these commands in a terminal, using the same Python environment as the notebook kernel:

```bash
pip install numpy pandas matplotlib seaborn
pip install scikit-learn scipy
pip install gensim fasttext
pip install khmer-nltk
pip install tqdm joblib streamlit PyPDF2
```
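The same package set can also live in a `requirements.txt` file and be installed with `pip install -r requirements.txt`. A sketch that writes one; the `REQUIREMENTS` list mirrors `install_all_packages`, while the file name and `write_requirements` helper are assumptions. Versions are left unpinned, matching the notebook.

```python
from pathlib import Path

# Same packages as install_all_packages; note the PyPI name is khmer-nltk
REQUIREMENTS = [
    "numpy", "pandas", "matplotlib", "seaborn",
    "scikit-learn", "scipy",
    "gensim", "fasttext",
    "khmer-nltk",
    "tqdm", "joblib",
    "streamlit", "PyPDF2",
]


def write_requirements(packages, path="requirements.txt"):
    """Write one requirement per line and return the file path."""
    target = Path(path)
    target.write_text("\n".join(packages) + "\n", encoding="utf-8")
    return target
```

Calling `write_requirements(REQUIREMENTS)` creates `requirements.txt` in the current directory.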

---