feat: subcategory auto-discovery, fyndhörna filter, CSV price drop toggle#103

Merged
blixten85 merged 4 commits into main from claude/ux-improvements on Apr 28, 2026

Conversation

@blixten85
Owner

Summary

  • Simplified URL handling: The URL field is back to a single input. A new "Auto-discover subcategories" checkbox controls whether the scraper follows category links automatically; the inet.se, komplett, and webhallen templates have it pre-checked
  • Fyndhörna filter: New exclude_link_pattern field per config. Product links containing the string are ignored entirely. The inet.se template sets /produkt/x as the default, which filters out all open-box items that caused false price drops (e.g. WD Black showed -50% when the real drop was 18%)
  • CSV price drop toggle: "Price drops" checkbox next to the Export CSV button. When it is on, the columns Was (SEK) and Drop % are added to the export, based on price_history
  • Bug fix: urljoin in extract_product used config['base_url'] as the base, which is wrong after multi-URL support was added. Changed to page.url
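
To illustrate the urljoin fix, here is a minimal sketch of resolving a product href against the URL of the page it was found on. The helper name resolve_product_link and the example.com URLs are hypothetical and not taken from the repository; the only point is that the base passed to urljoin is a single, real page URL rather than a config field that may hold several lines.

```python
from urllib.parse import urljoin


def resolve_product_link(page_url: str, href: str) -> str:
    """Resolve an href against the URL of the page it was scraped from.

    Hypothetical helper: in the PR, the equivalent call lives inside
    extract_product and uses page.url as the base.
    """
    return urljoin(page_url, href)


# Hypothetical values for illustration only.
page_url = "https://example.com/category/b?page=2"

# A root-relative href keeps the host of the current page.
print(resolve_product_link(page_url, "/produkt/123/widget"))

# A path-relative href is resolved against the current page's directory.
print(resolve_product_link(page_url, "item-7"))
```

A multi-line base_url is not a single valid URL, so using it as the base gives no reliable answer to "which page is this link relative to"; the page URL always does.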

What happens to existing inet.se category configs?

They still work. But after the merge you can:

  1. Remove all "Inet.se - X" configs
  2. Add a new "Inet.se" via the template button; auto-discovery finds all categories

Test plan

  • Add Inet.se via template → verify that dcl scraper logs categories automatically without you having listed them
  • Verify that /produkt/x items do not appear in the product list
  • Export CSV without the toggle → 3 columns; with the toggle → 5 columns including price drops
  • Existing single-URL configs work as before
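
The 3-column versus 5-column check could be exercised with something like the sketch below. It is not the exporter from this PR: the three base column names, the row dictionary keys, and the drop formula (previous price minus current price, as a percentage of the previous price) are all assumptions made for illustration. Only the Was (SEK) and Drop % headers come from the description above.

```python
import csv
from io import StringIO


def rows_to_csv(rows, include_drops=False):
    """Render rows as CSV text, optionally with price-drop columns."""
    # Assumed base columns; the real export may use different names.
    header = ["Name", "Price (SEK)", "URL"]
    if include_drops:
        header += ["Was (SEK)", "Drop %"]

    buf = StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    for row in rows:
        line = [row["name"], row["price"], row["url"]]
        if include_drops:
            # "previous_price" is a hypothetical key standing in for
            # whatever price_history yields for this product.
            was = row.get("previous_price")
            if was and was > row["price"]:
                drop_pct = round((was - row["price"]) / was * 100, 1)
                line += [was, drop_pct]
            else:
                line += ["", ""]  # no drop recorded for this product
        writer.writerow(line)
    return buf.getvalue()


# Hypothetical sample row.
sample = [{"name": "WD Black", "price": 820, "url": "https://example.com/p/1",
           "previous_price": 1000}]
print(rows_to_csv(sample))
print(rows_to_csv(sample, include_drops=True))
```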

🤖 Generated with Claude Code

…ggle

- Revert URL input from textarea back to single field — subcategory
  pagination now handles category discovery automatically
- Add "Auto-discover subcategories" checkbox to add form; inet.se/komplett/
  webhallen templates default to subcategory mode with pagination_selector
- Add exclude_link_pattern per config: product URLs containing the pattern
  are silently skipped (inet.se template defaults to /produkt/x to exclude
  open-box fyndhörna items that caused false -50% drop readings)
- Fix urljoin in extract_product: use page.url instead of config base_url
  (base_url may contain multiple lines after multi-URL feature)
- CSV export: add include_drops query param; when enabled adds Was/Drop%
  columns using price_history; UI shows "Price drops" toggle next to Export

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
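
As a rough illustration of the exclude_link_pattern behaviour described in this commit, the sketch below drops every product URL that contains the pattern as a plain substring. The function name filter_product_links and the example.com URLs are hypothetical; the commit message only states that URLs "containing the pattern" are skipped, so substring matching is assumed here rather than regex matching.

```python
def filter_product_links(links, exclude_link_pattern=""):
    """Return the links that do not contain exclude_link_pattern.

    An empty pattern disables filtering, so configs without the field
    keep every link.
    """
    if not exclude_link_pattern:
        return list(links)
    return [url for url in links if exclude_link_pattern not in url]


# Hypothetical URLs for illustration only.
links = [
    "https://example.com/produkt/1234567/some-ssd",
    "https://example.com/produkt/x7654321/some-ssd-open-box",
]
print(filter_product_links(links, "/produkt/x"))
```
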
Comment thread scraper/scraper.py Fixed
blixten85 and others added 2 commits April 28, 2026 05:54
Replace partial SQL string concatenation with a static lookup table of
four fully-prewritten queries keyed on (include_drops, has_site_name).
No user input ever touches the SQL string itself; site_name is still
passed as a parameterized %s value.

Fixes CodeQL alert py/sql-injection (alert #37).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
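
The lookup-table idea in this commit can be sketched as follows. The table and column names in the SQL strings and the helper select_export_query are hypothetical placeholders, not the queries in scraper/scraper.py. What the sketch aims to show is that the key is built only from two booleans, every query is a fixed literal, and site_name travels separately as a bound parameter.

```python
# Four fully prewritten queries keyed on (include_drops, has_site_name).
# Schema names below are assumptions for illustration.
EXPORT_QUERIES = {
    (False, False): "SELECT name, price, url FROM products ORDER BY name",
    (False, True): (
        "SELECT name, price, url FROM products "
        "WHERE site_name = %s ORDER BY name"
    ),
    (True, False): (
        "SELECT name, price, url, previous_price FROM products_with_history "
        "ORDER BY name"
    ),
    (True, True): (
        "SELECT name, price, url, previous_price FROM products_with_history "
        "WHERE site_name = %s ORDER BY name"
    ),
}


def select_export_query(include_drops, site_name=None):
    """Pick a static query and build the parameter tuple for it."""
    has_site_name = site_name is not None
    query = EXPORT_QUERIES[(bool(include_drops), has_site_name)]
    params = (site_name,) if has_site_name else ()
    return query, params


# The returned pair is meant to be passed on as cur.execute(query, params).
query, params = select_export_query(True, "x'; DROP TABLE products; --")
print(query)
print(params)
```

Because the user-supplied value never reaches string formatting, a hostile site_name ends up only in the parameter tuple and cannot change the SQL text.
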
Previously the pagination_selector was silently set to '' for any site
not matching a known template name, making subcategory auto-discovery
non-functional for custom sites.

- Show/hide 'Category link selector' input when subcategory checkbox is toggled
- loadTemplate populates the field from the template definition
- addConfig reads directly from the input instead of template lookup
- Move csv/StringIO/Response imports to top-level (style fix)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread scraper/scraper.py Fixed
site_name (URL param) no longer influences which query is selected —
export_site_csv always uses the site-specific variant keyed on
(include_drops, True), while export_all_csv uses (include_drops, False).
site_name only appears in the parameterized tuple passed to cur.execute.

Removes _build_export_query helper (no longer needed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@blixten85 blixten85 merged commit 32a9964 into main Apr 28, 2026
9 checks passed
@blixten85 blixten85 deleted the claude/ux-improvements branch April 28, 2026 06:34

2 participants