# Automatically collecting contract URLs and downloading PDF files

This notebook automates collecting contract URLs from `https://xarid.uzex.uz` and downloading the PDF file that belongs to each one. The code is split into small functions, and each function handles the errors it expects (a missing modal window, a timeout, a download that never finishes). The sections below explain the purpose and order of each step.

## Goal
- Collect the detail-page URLs of at most 10 lots from the site's contract list.
- For each URL, download the contract PDF and save it under a unique name.

## Requirements
- **Python 3.9+** environment.
- **Selenium** package: `pip install selenium`.
- **ChromeDriver**: a version that matches the installed Chrome browser. (Selenium 4.6 and later can fetch a matching driver automatically through Selenium Manager.)
- **Folder layout**: downloaded PDF files are saved to the `Contracts/xariduz` folder.
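
The requirements above can be checked before any browser is started. The following is a minimal sketch, not part of the original notebook: `check_environment` is a hypothetical helper name, and it only verifies the Python version, whether the `selenium` package is importable, and that the download folder exists.

```python
import importlib.util
import sys
from pathlib import Path


def check_environment(download_dir="Contracts/xariduz", min_python=(3, 9)):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    # find_spec returns None when the package cannot be found
    if importlib.util.find_spec("selenium") is None:
        problems.append("selenium is not installed (pip install selenium)")
    # Create the download folder up front so Chrome has somewhere to write
    Path(download_dir).mkdir(parents=True, exist_ok=True)
    return problems
```

Calling `check_environment()` in the first cell and printing the result gives an early, readable failure instead of a Selenium stack trace later on. It does not check the Chrome or ChromeDriver versions.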

## Overall sequence of the process
1. Configure Chrome WebDriver and set the browser options.
2. Open the site and close the modal window if one appears.
3. Collect the detail-page URLs from the lot list.
4. For each URL, download the contract PDF and save it under a unique name.
5. Clean up resources (close the browser).
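
The notebook's functions guarantee the last step with `try`/`finally`. The same guarantee can be sketched as a reusable context manager. `managed_driver` is a hypothetical name that does not appear in the notebook; it assumes only that the object returned by `factory` has a `quit()` method, as Selenium drivers do.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory, *args, **kwargs):
    """Create a driver with `factory` and always call quit() on exit."""
    driver = factory(*args, **kwargs)
    try:
        yield driver
    finally:
        # Runs on normal exit and when the with-block raises
        driver.quit()
```

With the `setup_chrome_driver` function from step 1, usage would look like `with managed_driver(setup_chrome_driver, headless=True) as driver: driver.get(list_url)`.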


## Step 1: Configuring Chrome WebDriver

This section configures Chrome WebDriver for Selenium. It enables headless mode (running without a browser window) and adds the preferences needed to save PDF files to disk.


In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, ElementNotInteractableException, NoSuchElementException

def setup_chrome_driver(headless=True, download_dir=None):
    """Configure Chrome WebDriver and apply the custom options."""
    chrome_opts = webdriver.ChromeOptions()
    if headless:
        chrome_opts.add_argument('--headless=new')
        chrome_opts.add_argument('--disable-gpu')
    chrome_opts.add_argument('--window-size=1920,1080')
    chrome_opts.add_argument('--no-sandbox')
    chrome_opts.add_argument('--disable-dev-shm-usage')
    chrome_opts.add_argument(
        'user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
    )
    if download_dir:
        prefs = {
            'download.default_directory': str(download_dir),
            'download.prompt_for_download': False,
            'download.directory_upgrade': True,
            'plugins.always_open_pdf_externally': True
        }
        chrome_opts.add_experimental_option('prefs', prefs)
    return webdriver.Chrome(options=chrome_opts)
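
The download preferences can also be built in a separate helper, which makes them easy to inspect without starting a browser. This is a sketch under one assumption: that Chrome wants an absolute path for `download.default_directory`. The notebook itself passes `save_path.resolve()` in step 4, which is consistent with that, but it is worth confirming for your Chrome version. `build_download_prefs` is a hypothetical name.

```python
from pathlib import Path


def build_download_prefs(download_dir):
    """Build the Chrome preference dict for silent PDF downloads."""
    # resolve() turns a relative folder such as 'Contracts/xariduz'
    # into an absolute path
    absolute_dir = Path(download_dir).resolve()
    return {
        'download.default_directory': str(absolute_dir),
        'download.prompt_for_download': False,
        'download.directory_upgrade': True,
        # Download PDFs instead of opening them in Chrome's built-in viewer
        'plugins.always_open_pdf_externally': True,
    }
```

The returned dict can be passed straight to `chrome_opts.add_experimental_option('prefs', ...)`.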


## Step 2: Closing the modal window

A helper function that closes modal windows the site may open (for example, notices or advertisements). It clicks a button inside the modal labelled 'Bekor', 'Close' or similar.


In [None]:
def handle_modal(driver, timeout=5):
    """Close the modal window on the site, if one is present."""
    try:
        modal = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '.modal-content'))
        )
        btn_xpath = (
            ".//button[contains(., 'Отмена') or contains(., 'Bekor') or contains(., 'Oʻtmen') "
            "or contains(., 'Close') or contains(., 'Yopish')]"
        )
        cancel_btn = modal.find_element(By.XPATH, btn_xpath)
        WebDriverWait(driver, timeout).until(lambda d: cancel_btn.is_displayed() and cancel_btn.is_enabled())
        try:
            cancel_btn.click()
        except ElementNotInteractableException:
            driver.execute_script('arguments[0].click();', cancel_btn)
        WebDriverWait(driver, timeout).until(EC.staleness_of(modal))
    except (TimeoutException, NoSuchElementException):
        pass  # No modal (or no matching button) appeared, so carry on
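
The button XPath in `handle_modal` is a chain of `contains(., ...)` tests joined with `or`. As an illustration, the same expression can be generated from a list of labels. `build_button_xpath` is a hypothetical helper, and it assumes the labels contain no double quote, because XPath 1.0 string literals have no escape sequence for one.

```python
def build_button_xpath(labels):
    """Return a relative XPath matching a <button> whose text contains any label."""
    if not labels:
        raise ValueError("at least one label is required")
    for label in labels:
        # Reject labels that would break the double-quoted XPath literal
        if '"' in label:
            raise ValueError(f"label must not contain a double quote: {label!r}")
    conditions = " or ".join(f'contains(., "{label}")' for label in labels)
    return f".//button[{conditions}]"
```

The leading `.//` keeps the search relative to the element it is called on, so `modal.find_element(By.XPATH, build_button_xpath(['Bekor', 'Close']))` only looks inside the modal.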


## Step 3: Collecting the contract URLs

This function collects, from the site's lot list, the URL of each lot's detail page. URLs are collected for at most 10 lots.


In [None]:
import time

def get_detail_page_urls(list_url, max_items=10, headless=True, timeout=15):
    """Collect the URLs of the contract detail pages."""
    driver = setup_chrome_driver(headless=headless)
    try:
        driver.get(list_url)
        handle_modal(driver)

        # Wait for the lot list to load
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '.lot-list .lot-item'))
        )
        # Keep only the count: element references found here would go stale
        # once driver.back() reloads the list page
        item_count = min(max_items, len(driver.find_elements(By.CSS_SELECTOR, '.lot-list .lot-item')))
        detail_urls = []
        original_window = driver.current_window_handle

        for idx in range(item_count):
            try:
                # Re-locate the items on every pass so the reference is fresh
                items = driver.find_elements(By.CSS_SELECTOR, '.lot-list .lot-item')
                if idx >= len(items):
                    break
                btn = items[idx].find_element(By.CSS_SELECTOR, 'a.btn.btn-lg.btn-primary')
                before_windows = set(driver.window_handles)
                btn.click()

                # Wait for a new window to open or for the URL to change
                WebDriverWait(driver, timeout).until(
                    lambda d: len(d.window_handles) != len(before_windows) or d.current_url != list_url
                )
                new_windows = set(driver.window_handles) - before_windows

                if new_windows:
                    # The detail page opened in a new tab: read its URL, then close it
                    driver.switch_to.window(new_windows.pop())
                    WebDriverWait(driver, timeout).until(lambda d: d.current_url and d.current_url != 'about:blank')
                    detail_urls.append(driver.current_url)
                    driver.close()
                    driver.switch_to.window(original_window)
                else:
                    # The detail page opened in the same tab: read its URL, then go back
                    WebDriverWait(driver, timeout).until(EC.url_changes(list_url))
                    detail_urls.append(driver.current_url)
                    driver.back()
                    WebDriverWait(driver, timeout).until(
                        EC.presence_of_element_located((By.CSS_SELECTOR, '.lot-list .lot-item'))
                    )
                time.sleep(0.5)  # Short pause to avoid overloading the server
            except Exception as e:
                print(f"Lot #{idx+1}: an error occurred. Reason: {e}")
                continue

        return detail_urls

    finally:
        driver.quit()
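
The collected list may contain duplicates or stray entries such as `about:blank` if a tab is read too early. A small post-processing step can guard against that. The sketch below is an addition, not part of the notebook: `clean_detail_urls` is a hypothetical name, and the assumption that every detail page is served from the host `xarid.uzex.uz` comes only from the list URL used in this notebook.

```python
from urllib.parse import urlparse


def clean_detail_urls(urls, expected_host="xarid.uzex.uz"):
    """Drop duplicates and off-site URLs while keeping the original order."""
    seen = set()
    cleaned = []
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # skips about:blank, javascript: and similar
        if parsed.hostname != expected_host:
            continue  # skips links that leave the site
        if url in seen:
            continue  # skips URLs already collected
        seen.add(url)
        cleaned.append(url)
    return cleaned
```

It would be applied as `urls = clean_detail_urls(get_detail_page_urls(list_url))`.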


## Step 4: Downloading the PDF files

For each contract URL, this function downloads the PDF file and saves it under a unique name in the `Contracts/xariduz` folder.


In [None]:
from pathlib import Path
import uuid
from datetime import datetime

def download_contract_pdf(detail_url, save_folder='Contracts/xariduz', headless=True, timeout=30):
    """Download the contract PDF and save it under a unique name."""
    save_path = Path(save_folder)
    save_path.mkdir(parents=True, exist_ok=True)

    driver = setup_chrome_driver(headless=headless, download_dir=save_path.resolve())
    try:
        driver.get(detail_url)
        handle_modal(driver)

        # Wait for the download button to become clickable
        btn = WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable((By.XPATH, "//button[contains(normalize-space(.), 'Faylni yuklab olish')]"))
        )

        # Remember the files that already exist
        before = set(save_path.glob('*.pdf'))
        before_tmp = set(save_path.glob('*.crdownload'))
        btn.click()

        # Poll until a new PDF appears and no new temporary .crdownload file remains
        deadline = time.time() + timeout
        new_file = None
        while time.time() < deadline:
            new_pdfs = set(save_path.glob('*.pdf')) - before
            in_progress = set(save_path.glob('*.crdownload')) - before_tmp
            if new_pdfs and not in_progress:
                new_file = max(new_pdfs, key=lambda f: f.stat().st_ctime)
                break
            time.sleep(0.5)  # Download not finished yet

        if not new_file:
            raise RuntimeError(f"No new PDF file appeared within {timeout} seconds.")

        # Rename the file to a unique name
        ts = datetime.now().strftime('%Y%m%d_%H%M%S')
        uid = uuid.uuid4().hex
        new_name = f'contract_{ts}_{uid}.pdf'
        dest = save_path / new_name
        new_file.rename(dest)
        return dest

    finally:
        driver.quit()
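
A file with a `.pdf` extension is not necessarily a PDF. A server could, for instance, return an HTML error page under that name (a possibility assumed here, not something observed on this site). PDF files begin with the bytes `%PDF-`, so a cheap sanity check is possible. `looks_like_pdf` is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True when the file starts with the PDF signature '%PDF-'."""
    with Path(path).open('rb') as fh:
        return fh.read(5) == b'%PDF-'
```

It could be called on the path returned by `download_contract_pdf` to flag suspicious files. It checks only the first five bytes, not whether the rest of the document is valid.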


## Step 5: Main process

This section combines the functions above to collect the URLs and download the PDF files. As a demonstration, PDFs are downloaded for the first two URLs only.


In [None]:
# Main process
list_url = 'https://xarid.uzex.uz/purchase/e-direct-purchase/list'
print('Collecting URLs...')
urls = get_detail_page_urls(list_url, max_items=10)
print('URLs found:', urls)

print('\nDownloading PDF files...')
for url in urls[:2]:  # Only the first two URLs
    try:
        print(f'Downloading: {url}')
        path = download_contract_pdf(url)
        print(f' → PDF saved: {path}')
    except Exception as e:
        print(f'Error for {url}: {e}')
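
The loop above prints each result as it goes. When more than a couple of URLs are processed, it can help to collect the outcomes and review them at the end. The sketch below is one way to do that; `download_all` is a hypothetical name, and `download` stands for any callable that takes a URL and returns the saved path, such as `download_contract_pdf`.

```python
def download_all(urls, download, limit=None):
    """Call `download(url)` for each URL; return (saved, failed) dicts."""
    saved, failed = {}, {}
    # urls[:None] is the whole list, so limit=None means "no limit"
    for url in urls[:limit]:
        try:
            saved[url] = download(url)
        except Exception as exc:
            # Keep going, but remember why this URL failed
            failed[url] = str(exc)
    return saved, failed
```

For example, `saved, failed = download_all(urls, download_contract_pdf, limit=2)` mirrors the loop above, and `failed` can then be retried or logged.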


## Summary

This notebook has the following strengths:
- **Modularity**: each step lives in its own function, which makes the code easier to reuse and understand.
- **Error handling**: missing modal windows, page-load timeouts and failed downloads are caught, so a failure on one lot or URL does not stop the run.
- **Server load**: a short `time.sleep(0.5)` pause between lots keeps the load on the server low.
- **Readability**: each step is accompanied by a written explanation.
- **File naming**: PDF files are saved under unique names (`contract_YYYYMMDD_HHMMSS_UUID.pdf`).

