Add Microsoft Careers site fetcher #19

Merged
celloopa merged 1 commit into main from feature/issue-18-microsoft-fetcher
Jan 16, 2026

@celloopa
Owner

Summary

Closes #18

  • Added specialized extractor for careers.microsoft.com and apply.careers.microsoft.com URLs
  • Parses job data from the Next.js __NEXT_DATA__ JSON embedded in the page
  • Extracts title, description, qualifications, responsibilities, location, and employment type
  • Falls back to meta tags when JSON parsing fails
  • Validates data to reject numeric company names and empty positions
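The JSON-first, meta-tag-second flow described above could look roughly like the sketch below. It is not the code from internal/fetch/fetcher.go: the helper names findNextData and titleFromPage, the props.pageProps.job.title path, and the use of og:title as the fallback tag are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"regexp"
)

// nextDataRe matches the script tag Next.js uses to embed page state.
var nextDataRe = regexp.MustCompile(`(?s)<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>`)

// ogTitleRe matches an Open Graph title tag (assumes property comes before content).
var ogTitleRe = regexp.MustCompile(`<meta[^>]*property="og:title"[^>]*content="([^"]*)"`)

// findNextData (hypothetical helper) decodes the __NEXT_DATA__ payload,
// returning an error when the script tag is absent or the JSON is malformed.
func findNextData(html string) (map[string]any, error) {
	m := nextDataRe.FindStringSubmatch(html)
	if m == nil {
		return nil, errors.New("no __NEXT_DATA__ script found")
	}
	var data map[string]any
	if err := json.Unmarshal([]byte(m[1]), &data); err != nil {
		return nil, err
	}
	return data, nil
}

// titleFromPage (hypothetical helper) prefers the JSON payload and falls
// back to the og:title meta tag. The JSON path used here is an assumption.
func titleFromPage(html string) string {
	if data, err := findNextData(html); err == nil {
		if props, ok := data["props"].(map[string]any); ok {
			if pageProps, ok := props["pageProps"].(map[string]any); ok {
				if job, ok := pageProps["job"].(map[string]any); ok {
					if title, ok := job["title"].(string); ok && title != "" {
						return title
					}
				}
			}
		}
	}
	if m := ogTitleRe.FindStringSubmatch(html); m != nil {
		return m[1]
	}
	return ""
}
```

Checked type assertions at every level keep a missing or reshaped key from panicking, so any failure along the JSON path simply drops through to the meta-tag fallback.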

Changes

  • internal/fetch/fetcher.go - Added extractMicrosoft() and helper functions:
    • parseMicrosoftNextData() - Navigates JSON structure
    • extractMicrosoftJobData() - Extracts job fields
    • validateMicrosoftExtraction() - Data validation
    • isNumeric() - Helper for validation
  • internal/fetch/fetcher_test.go - Added 7 new tests
  • README.md - Added Microsoft Careers to supported job boards
  • CHANGELOG.md - Added feature to Unreleased section
  • PROGRESS.md - Updated task status and added completion log
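For readers unfamiliar with the validation step, here is a minimal sketch of what a numeric-rejection check might look like. Only the name isNumeric comes from this PR; its signature and behaviour, the jobFields struct, and validateExtraction are hypothetical stand-ins, not the merged implementation.

```go
package main

import (
	"errors"
	"strings"
	"unicode"
)

// jobFields is a hypothetical container for the two validated values.
type jobFields struct {
	Company  string
	Position string
}

// isNumeric reports whether s is non-empty and made up only of digits.
// The behaviour shown here is assumed, not taken from the PR diff.
func isNumeric(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if !unicode.IsDigit(r) {
			return false
		}
	}
	return true
}

// validateExtraction (hypothetical) rejects empty positions and
// company names that are purely numeric, such as a stray job ID.
func validateExtraction(j jobFields) error {
	if strings.TrimSpace(j.Position) == "" {
		return errors.New("empty position")
	}
	if isNumeric(strings.TrimSpace(j.Company)) {
		return errors.New("numeric company name")
	}
	return nil
}
```

A check like this presumably guards against a scraped ID landing in the company field; returning an error lets the caller decide whether to retry with another extraction strategy.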

Test plan

  • All existing tests pass (go test ./...)
  • New tests cover:
    • Full __NEXT_DATA__ JSON parsing
    • Meta tag fallback when no JSON
    • Alternative JSON structures (jobDetail vs job)
    • Array handling for qualifications/responsibilities
    • Data validation (numeric rejection)
    • apply.careers.microsoft.com subdomain support
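The alternative-structure and array-handling cases listed above could be exercised against helpers along these lines. This is an illustrative sketch only: pickJob and joinField are hypothetical names, and checking jobDetail before job, as well as joining array items with newlines, are assumptions rather than behaviour confirmed by the PR.

```go
package main

import "strings"

// pickJob (hypothetical) returns the job object stored under "jobDetail"
// or, failing that, "job". The lookup order is an assumption.
func pickJob(pageProps map[string]any) (map[string]any, bool) {
	for _, key := range []string{"jobDetail", "job"} {
		if obj, ok := pageProps[key].(map[string]any); ok {
			return obj, true
		}
	}
	return nil, false
}

// joinField (hypothetical) renders a field that may arrive either as a
// single string or as an array of strings, e.g. qualifications.
func joinField(v any) string {
	switch t := v.(type) {
	case string:
		return strings.TrimSpace(t)
	case []any:
		parts := make([]string, 0, len(t))
		for _, item := range t {
			// Skip non-string and blank entries rather than failing.
			if s, ok := item.(string); ok && strings.TrimSpace(s) != "" {
				parts = append(parts, strings.TrimSpace(s))
			}
		}
		return strings.Join(parts, "\n")
	default:
		return ""
	}
}
```

Because encoding/json decodes JSON arrays into []any when the target is an untyped map, the type switch covers both shapes without a second unmarshalling pass.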

🤖 Generated with Claude Code

- Add extractMicrosoft() function to parse __NEXT_DATA__ JSON
- Support careers.microsoft.com and apply.careers.microsoft.com
- Extract title, description, qualifications, responsibilities, location
- Fall back to meta tags when JSON parsing fails
- Add validation to reject numeric company names and empty positions
- Add 7 new tests for comprehensive coverage
- Update README with Microsoft Careers in supported job boards
- Update CHANGELOG with feature details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@celloopa celloopa merged commit e47542f into main Jan 16, 2026
@celloopa celloopa deleted the feature/issue-18-microsoft-fetcher branch January 16, 2026 17:36