# Web Scraping Song Data from sendthesong.xyz with Selenium

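This notebook demonstrates how to scrape song data from https://sendthesong.xyz/browse. The cell below pulls the data directly from the site's JSON API with `requests`; the notes at the end cover the Selenium approach for dynamically loaded content.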


In [20]:
import requests
import pandas as pd
import time

# --- CONFIGURATION ---
API_URL = "https://api.sendthesong.xyz/api/posts"
SEARCH_QUERY = 'adit'  # Replace with your own search query
TARGET_DATA_COUNT = 300
# Requested page size. The run below received 15 records per page,
# so the server appears to cap or ignore this value.
LIMIT_PER_PAGE = 50

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Referer': 'https://sendthesong.xyz/'
}

# --- SCRAPING ---
all_songs_data = []
current_page = 1

print(f"[*] Starting scrape for query: '{SEARCH_QUERY}' with a target of {TARGET_DATA_COUNT} records.")

while len(all_songs_data) < TARGET_DATA_COUNT:
    params = {
        'q': SEARCH_QUERY,
        'page': current_page,
        'limit': LIMIT_PER_PAGE
    }

    print(f"[*] Fetching page {current_page} ({len(all_songs_data)}/{TARGET_DATA_COUNT})...")

    try:
        # A timeout keeps the loop from hanging forever on a stalled connection
        response = requests.get(API_URL, params=params, headers=HEADERS, timeout=10)

        if response.status_code == 200:
            data = response.json()

            # ASSUMPTION: the list of songs is under the 'data' key.
            # ADJUST THIS KEY ('data') IF YOUR JSON STRUCTURE DIFFERS.
            # Other possibilities: data['posts'], data['results'], etc.
            new_songs = data.get('data', [])

            if not new_songs:
                print("[!] Page contains no new data or the last page was reached. Stopping.")
                break

            all_songs_data.extend(new_songs)
            current_page += 1

            # Pause 1 second between requests to avoid overloading the server (good practice)
            time.sleep(1)

        else:
            print(f"[!] Failed to fetch page {current_page}. Status code: {response.status_code}. Stopping.")
            break

    except requests.exceptions.RequestException as e:
        print(f"[!] Connection error: {e}. Stopping.")
        break

print(f"\n[+] Total records collected: {len(all_songs_data)}")

# --- CONVERT & EXPORT TO CSV ---
if all_songs_data:
    try:
        # Convert the list of dictionaries into a pandas DataFrame
        df = pd.DataFrame(all_songs_data)

        # Build the output file name
        output_filename = f'hasil_scraping_{SEARCH_QUERY.replace(" ", "_")}.csv'

        # Save the DataFrame to a CSV file
        # index=False keeps the DataFrame index out of the file
        df.to_csv(output_filename, index=False, encoding='utf-8')

        print(f"\n[SUCCESS] Data exported to file: '{output_filename}'")

    except Exception as e:
        print(f"\n[!] Error while converting to CSV: {e}")
else:
    print("\n[!] No data to export.")

[*] Starting scrape for query: 'adit' with a target of 300 records.
[*] Fetching page 1 (0/300)...
[*] Fetching page 2 (15/300)...
[*] Fetching page 3 (30/300)...
[*] Fetching page 4 (45/300)...
[*] Fetching page 5 (60/300)...
[*] Fetching page 6 (75/300)...
[*] Fetching page 7 (90/300)...
[*] Fetching page 8 (105/300)...
[*] Fetching page 9 (120/300)...
[*] Fetching page 10 (135/300)...
[*] Fetching page 11 (150/300)...
[*] Fetching page 12 (165/300)...
[*] Fetching
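
The output above shows 15 records arriving per page even though `limit` was set to 50, so reaching 300 records takes 20 requests rather than 6. The pagination loop can also be pulled into a reusable function, sketched below under some assumptions: `collect_records`, `fetch_page`, and `fake_fetch_page` are hypothetical names that do not appear in the original cell, and the fake fetcher only stands in for the API call so the loop can be exercised offline.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def collect_records(
    fetch_page: Callable[[int], List[Record]],
    target: int,
    max_pages: int = 100,
) -> List[Record]:
    """Call fetch_page(1), fetch_page(2), ... until `target` records are
    collected, a page comes back empty, or `max_pages` is reached."""
    records: List[Record] = []
    page = 1
    while len(records) < target and page <= max_pages:
        batch = fetch_page(page)
        if not batch:  # an empty page means there is nothing left to fetch
            break
        records.extend(batch)
        page += 1
    # The last page can overshoot the target, so trim the excess.
    return records[:target]


def fake_fetch_page(page: int, page_size: int = 15, total: int = 40) -> List[Record]:
    """Hypothetical stand-in for the real API call: serves `total` fake
    records, `page_size` per page, and an empty list past the end."""
    start = (page - 1) * page_size
    stop = min(start + page_size, total)
    return [{"id": i} for i in range(start, stop)]


songs = collect_records(fake_fetch_page, target=35)
print(len(songs))
```

With the real API, `fetch_page` would wrap the `requests.get` call from the cell above and return `data.get('data', [])`. The `max_pages` guard is an addition of this sketch; it bounds the loop if the server keeps returning non-empty pages.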

# Notes

- Increased the wait time for dynamic content to load to 20 seconds.
- Added logs to print the page source and count of song cards for debugging.
- If the data still does not appear, inspect the website's structure for changes or use browser developer tools to debug.
- Added a check to verify if song cards are found. If not, the page source is logged for debugging.
- Ensure the CSS selector matches the website's structure. Use browser developer tools to inspect elements.
- If the issue persists, the website may have additional dynamic loading mechanisms.
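
The wait-then-check step described in these notes can be sketched as follows. This is an illustration under stated assumptions, not the notebook's Selenium code: `wait_for_cards` is a hypothetical helper, `.song-card` is a placeholder selector that has to be replaced with one taken from the live page, and the helper relies only on the driver's `find_elements` method and `page_source` attribute, so the stub driver at the bottom can exercise it without a browser. In Selenium itself, `WebDriverWait` combined with `expected_conditions.presence_of_all_elements_located` is the usual way to express the same wait.

```python
import time


def wait_for_cards(driver, selector=".song-card", timeout=20.0, poll=0.5):
    """Poll the page until at least one element matches `selector`.

    Returns the matching elements, or an empty list after `timeout`
    seconds, in which case the start of the page source is printed
    for debugging.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        cards = driver.find_elements("css selector", selector)
        if cards:
            print(f"[+] Found {len(cards)} song cards.")
            return cards
        if time.monotonic() >= deadline:
            break
        time.sleep(poll)
    print("[!] No song cards found. Page source starts with:")
    print(driver.page_source[:500])
    return []


class StubDriver:
    """Minimal stand-in for a WebDriver: cards appear on the third poll."""

    page_source = "<html><body>loading...</body></html>"

    def __init__(self):
        self.calls = 0

    def find_elements(self, by, value):
        self.calls += 1
        return ["card-1", "card-2"] if self.calls >= 3 else []


cards = wait_for_cards(StubDriver(), timeout=2.0, poll=0.01)
```

Passing a real Selenium driver instead of `StubDriver()` should work the same way, since only `find_elements` and `page_source` are used.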
