
Conversation

@Abssdghi Abssdghi commented Nov 24, 2025

Description

This PR adds a brand-new Apple Music Web Scraper capable of scraping:

  • Songs
  • Albums
  • Playlists
  • Artists
  • Music videos
  • Rooms
  • Full search results

It parses Apple Music’s internal serialized-server-data JSON structure and converts it into a clean Python output.
This feature did not exist in the repository before and significantly expands the Scrappers/Social Media category.
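
The core idea above can be sketched as follows. This is a minimal, illustrative version of the parsing approach (the `serialized-server-data` tag id comes from the PR description; the function name and sample payload are hypothetical):

```python
import json

from bs4 import BeautifulSoup


def extract_serialized_data(html):
    """Decode the 'serialized-server-data' JSON blob embedded in an
    Apple Music page's HTML. Returns None if the tag is missing
    (layout change or blocked request)."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("script", {"id": "serialized-server-data"})
    if tag is None:
        return None
    return json.loads(tag.string)


# Offline demo with a miniature stand-in for a real page:
sample = (
    '<html><body>'
    '<script id="serialized-server-data" type="application/json">'
    '[{"data": {"sections": [{"id": "track-list", "items": []}]}}]'
    '</script></body></html>'
)
data = extract_serialized_data(sample)
```

On a real page, `data[0]['data']['sections']` then holds the per-section items the scraper iterates over.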

What’s Included:

  • apple_music_scraper.py – Main scraper logic
  • utils.py – Helper methods (cover resolver, URL converter, etc.)
  • README.md – Full documentation + examples
  • requirements.txt – clean dependency list (requests, beautifulsoup4)

Fixes #none

No existing issue was referenced; this is a brand-new standalone feature.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Documentation Update

Checklist:

  • My code follows the style guidelines (Clean Code) of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have created a helpful and easy-to-understand README.md
  • My documentation follows the Template for README.md
  • I have added the project metadata in the PR template.
  • I have created the requirements.txt file if needed.

Project Metadata

Category:

  • Calculators
  • AI/ML
  • Scrappers
  • Social_Media
  • PDF
  • Image_Processing
  • Video_Processing
  • Games
  • Networking
  • OS_Utilities
  • Automation
  • Cryptography
  • Computer_Vision
  • Fun
  • Others

Title: Apple Music Web Scraper

Folder: Apple-Music-Scraper

Requirements: requirements.txt

Script: apple_music_scraper.py

Arguments: none

Contributor: abssdghi

Description:
A powerful, fully featured Apple Music scraper that extracts songs, albums, playlists, videos, artist pages, and full search results using Apple Music’s internal structured JSON data.

Summary by Sourcery

Add a new Apple Music web scraper module that extracts structured metadata from various Apple Music web pages using their embedded serialized JSON data.

New Features:

  • Provide scraping functions for Apple Music songs, albums, playlists, artists, music videos, rooms, and search results, returning structured Python dictionaries and URL lists.
  • Add utility helpers for generating full artwork URLs, converting album track URLs to direct song URLs, and collecting all singles and EPs for an artist.
  • Document the Apple Music scraper usage, capabilities, and example workflows in a dedicated README for the new module.
  • Declare bs4 and requests as dependencies for the Apple Music scraping functionality in a requirements file.

sourcery-ai bot commented Nov 24, 2025

Reviewer's Guide

Adds a new Apple Music web scraping module that parses Apple Music’s serialized-server-data and JSON-LD blocks to provide structured song, album, playlist, artist, video, room, and search results, supported by shared utilities for artwork URL formatting, URL conversion, and fetching singles/EPs, along with documentation and dependencies.

Sequence diagram for the Apple Music search-to-latest-song workflow

sequenceDiagram
    actor User
    participant Client as Client Script
    participant Search as main.search()
    participant ArtistScrape as main.artist_scrape()
    participant AlbumScrape as main.album_scrape()
    participant GetCover as utils.get_cover()
    participant GetAllSingles as utils.get_all_singles()
    participant AppleWeb as Apple Music Web

    User ->> Client: Call search('night tapes')
    Client ->> Search: search(keyword)
    Search ->> AppleWeb: GET https://music.apple.com/us/search?term=keyword
    AppleWeb -->> Search: HTML with 'serialized-server-data' script
    Search ->> Search: Parse HTML with BeautifulSoup
    Search ->> Search: json.loads(serialized-server-data)
    Search ->> GetCover: Build artwork URL for each result
    GetCover -->> Search: Formatted artwork URL
    Search -->> Client: Structured search results dict

    Client ->> ArtistScrape: artist_scrape(artist_url)
    ArtistScrape ->> AppleWeb: GET artist page HTML
    AppleWeb -->> ArtistScrape: HTML with 'serialized-server-data'
    ArtistScrape ->> ArtistScrape: Parse and extract sections (detail, latest, top, etc.)
    ArtistScrape ->> GetCover: Build artist artwork URL
    GetCover -->> ArtistScrape: Formatted artwork URL
    ArtistScrape ->> GetAllSingles: get_all_singles(artist_url)
    GetAllSingles ->> AppleWeb: GET artist/see-all?section=singles
    AppleWeb -->> GetAllSingles: HTML with singles section
    GetAllSingles ->> GetAllSingles: Parse serialized-server-data and items
    GetAllSingles -->> ArtistScrape: List of singles and EP URLs
    ArtistScrape -->> Client: Artist metadata dict (including 'latest' URL)

    Client ->> AlbumScrape: album_scrape(latest_song_album_url)
    AlbumScrape ->> AppleWeb: GET album page HTML
    AppleWeb -->> AlbumScrape: HTML with 'serialized-server-data'
    AlbumScrape ->> AlbumScrape: Parse sections (album-detail, track-list, etc.)
    AlbumScrape ->> GetCover: Build album artwork URL
    GetCover -->> AlbumScrape: Formatted artwork URL
    AlbumScrape -->> Client: Album metadata dict (title, image, songs, more, similar)

    Client -->> User: Display latest song title and cover art

Class diagram for the new Apple Music scraper and utilities

classDiagram
    class MainScraper {
        +room_scrape(link="https://music.apple.com/us/room/6748797380") list~str~
        +playlist_scrape(link="https://music.apple.com/us/playlist/new-music-daily/pl.2b0e6e332fdf4b7a91164da3162127b5") list~str~
        +search(keyword="sasha sloan") dict
        +song_scrape(url="https://music.apple.com/us/song/california/1821538031") dict
        +album_scrape(url="https://music.apple.com/us/album/1965/1817707266?i=1817707585") dict
        +video_scrape(url="https://music.apple.com/us/music-video/gucci-mane-visualizer/1810547026") dict
        +artist_scrape(url="https://music.apple.com/us/artist/king-princess/1349968534") dict
    }

    class Utils {
        +get_cover(url, width, height, format="jpg", crop_option="") str
        +convert_album_to_song_url(album_url) str
        +get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534") list~str~
    }

    MainScraper ..> Utils : "uses 'get_cover' for artwork URLs"
    MainScraper ..> Utils : "uses 'convert_album_to_song_url' in room_scrape, playlist_scrape, album_scrape"
    MainScraper ..> Utils : "uses 'get_all_singles' inside artist_scrape"

Flow diagram for generic Apple Music page scraping using serialized-server-data

flowchart TD
    A["Start scraping function (song_scrape, album_scrape, video_scrape, artist_scrape, room_scrape, playlist_scrape, search)"] --> B["Build target Apple Music URL (page-specific)"]
    B --> C["Set headers with 'User-Agent: Mozilla/5.0'"]
    C --> D["requests.get(URL, headers=headers)"]
    D --> E["Parse HTML with BeautifulSoup"]
    E --> F{"Find script tag with id 'serialized-server-data'?"}
    F -->|"Yes"| G["Extract script text and load JSON via json.loads"]
    F -->|"No"| Z["Return empty or partial result (error or structure change)"]
    G --> H["Access our_json[0]['data']['sections']"]
    H --> I{"Select relevant sections by 'id' pattern (e.g., 'track-list', 'artist-detail', 'music-video-header')"}
    I --> J["Iterate over 'items' collections to gather URLs, titles, subtitles, descriptors"]
    J --> K{"Artwork present in item?"}
    K -->|"Yes"| L["Call utils.get_cover() to expand artwork URL with width, height, format, crop"]
    K -->|"No"| M["Set artwork field to empty string"]
    L --> N["Attach formatted artwork URL to result object"]
    M --> N
    N --> O{"Needs additional JSON-LD (preview or video URL)?"}
    O -->|"Yes"| P["Find JSON-LD script (e.g., id 'schema:song' or 'schema:music-video') and json.loads"]
    O -->|"No"| R["Skip JSON-LD step"]
    P --> Q["Extract preview or video content URL and add to result"]
    Q --> S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"]
    R --> S
    S --> T["Return JSON-like Python structure to caller"]
    Z --> T

File-Level Changes

Change Details Files
Introduce a main Apple Music scraping module that exposes high-level scraping functions for different Apple Music entities.
  • Implement room_scrape and playlist_scrape to extract track URLs from room and playlist pages by parsing serialized-server-data sections and converting album-track URLs to song URLs.
  • Implement search to query Apple Music’s search endpoint, parse sectioned results (artists, albums, songs, playlists, videos), and normalize them into structured dictionaries including optional artwork URLs.
  • Implement song_scrape to extract detailed song metadata (title, artwork, album/artist info, preview URL, and related songs) using serialized-server-data and schema:song JSON-LD.
  • Implement album_scrape to collect album metadata, track song URLs, description, artist info, related albums, videos, and “more by artist” sections using multiple identified sections within serialized-server-data.
  • Implement video_scrape to fetch music-video metadata, artwork, artist info, direct video URL, and related content via serialized-server-data sections and schema:music-video JSON-LD.
  • Implement artist_scrape to aggregate rich artist data including latest release, top songs, albums, singles/EPs, playlists, videos, similar artists, appearances, and bio fields from multiple serialized-server-data sections, delegating singles/EP retrieval to a helper.
Apple-Music-Scraper/main.py
Add shared utilities to support artwork URL resolution, URL normalization, and singles retrieval for artists.
  • Implement get_cover to transform Apple Music artwork template URLs by replacing width, height, format, and crop placeholders with concrete values.
  • Implement convert_album_to_song_url to derive canonical song URLs from album track URLs by reading the i query parameter and reconstructing the path as a /song/ URL.
  • Implement get_all_singles to fetch and parse the artist’s singles section via the /see-all?section=singles endpoint and return all single/EP URLs.
Apple-Music-Scraper/utils.py
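
A sketch of the two URL utilities described above. The `{w}x{h}{c}.{f}` placeholder names in `get_cover` are an assumption about the artwork template format; `convert_album_to_song_url` follows the path/query reconstruction shown in the review's code context:

```python
from urllib.parse import urlparse, parse_qs


def get_cover(template_url, width, height, fmt="jpg", crop=""):
    # Fill in the artwork template's size/format placeholders
    # (placeholder names here are illustrative, not verified).
    return (template_url.replace("{w}", str(width))
                        .replace("{h}", str(height))
                        .replace("{c}", crop)
                        .replace("{f}", fmt))


def convert_album_to_song_url(album_url):
    # An album track URL looks like
    # https://music.apple.com/us/album/<title>/<album-id>?i=<song-id>;
    # read the 'i' query parameter and rebuild a /song/ path.
    parsed = urlparse(album_url)
    song_id = parse_qs(parsed.query)["i"][0]
    _, country, _, title, _ = parsed.path.split("/")
    return f"https://music.apple.com/{country}/song/{title}/{song_id}"
```

For example, the sample album URL from this PR, `https://music.apple.com/us/album/1965/1817707266?i=1817707585`, maps to a `/us/song/1965/1817707585` URL.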
Document the new Apple Music scraper module and declare its external dependencies.
  • Create README describing scraper purpose, capabilities, setup, and usage example, including explanation of serialized-server-data parsing and JSON-shaped outputs.
  • Add requirements.txt listing bs4 and requests as the only dependencies for the scraper.
Apple-Music-Scraper/README.md
Apple-Music-Scraper/requirements.txt

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@sourcery-ai sourcery-ai bot left a comment
Hey there - I've reviewed your changes - here's some feedback:

  • There are many try/except: pass blocks throughout the scraper; it would be more robust to catch specific exceptions (e.g., KeyError, IndexError, JSONDecodeError) and optionally log or default values so that real failures aren’t silently swallowed.
  • HTTP requests currently don’t specify timeouts or handle network-level errors; consider adding a shared request helper with a reasonable timeout and basic error handling/retries to avoid hanging or crashing on transient network issues.
  • In several places when extracting artwork (e.g., in search() for artists/albums/songs/playlists/videos), you access i[0]['artwork'] instead of i['artwork'], which is likely a typo and causes exceptions that are then swallowed—clean this up so artwork URLs are reliably parsed.
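
For the second point, a shared request helper might look like the sketch below (the function name, defaults, and retry count are illustrative, not part of this PR):

```python
import requests


def fetch(url, timeout=10, retries=2):
    """Hypothetical shared request helper: one place to set the
    User-Agent, a timeout, and a small retry loop so transient
    network errors don't hang or crash the scraper."""
    headers = {"User-Agent": "Mozilla/5.0"}
    last_err = None
    for _ in range(retries + 1):
        try:
            rspn = requests.get(url, headers=headers, timeout=timeout)
            rspn.raise_for_status()
            return rspn.text
        except requests.RequestException as err:
            last_err = err
    raise last_err
```

Each scraping function could then call `fetch(url)` instead of `requests.get(url)` directly.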
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- There are many `try/except: pass` blocks throughout the scraper; it would be more robust to catch specific exceptions (e.g., `KeyError`, `IndexError`, `JSONDecodeError`) and optionally log or default values so that real failures aren’t silently swallowed.
- HTTP requests currently don’t specify timeouts or handle network-level errors; consider adding a shared request helper with a reasonable timeout and basic error handling/retries to avoid hanging or crashing on transient network issues.
- In several places when extracting artwork (e.g., in `search()` for artists/albums/songs/playlists/videos), you access `i[0]['artwork']` instead of `i['artwork']`, which is likely a typo and causes exceptions that are then swallowed—clean this up so artwork URLs are reliably parsed.

## Individual Comments

### Comment 1
<location> `Apple-Music-Scraper/main.py:123` </location>
<code_context>
+        'playlists':[],
+        'videos':[]
+    }
+    link = "https://music.apple.com/us/search?term="+keyword
+    
+    headers = {
</code_context>

<issue_to_address>
**issue:** The search keyword should be URL-encoded before being concatenated into the query string.

Directly concatenating the raw keyword will break searches for values with spaces, &, +, or non-ASCII characters. Use proper URL encoding, e.g. `urllib.parse.quote_plus(keyword)` and `f"https://music.apple.com/us/search?term={quote_plus(keyword)}"`, so the search works for arbitrary input.
</issue_to_address>
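
A minimal sketch of the suggested fix (the helper name is illustrative):

```python
from urllib.parse import quote_plus


def build_search_url(keyword):
    # Encode the user-supplied keyword so spaces, '&', '+' and
    # non-ASCII characters survive the query string intact.
    return f"https://music.apple.com/us/search?term={quote_plus(keyword)}"
```

For instance, `build_search_url("night tapes")` yields `...?term=night+tapes`, and an `&` in the keyword is encoded as `%26` instead of terminating the parameter.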

### Comment 2
<location> `Apple-Music-Scraper/main.py:155-156` </location>
<code_context>
+            try:                
+                image_url = i['artwork']['dictionary']['url']
+                image_width = i['artwork']['dictionary']['width']
+                image_height = i[0]['artwork']['dictionary']['height']
+                artwork = get_cover(image_url, image_width, image_height)
+            except:
+                artwork = ""
</code_context>

<issue_to_address>
**issue (bug_risk):** Artwork height is indexed via `i[0]` instead of `i`, which is likely a bug and will raise at runtime.

Since `i` is the item dict from `for i in artists['items']:`, `i[0]` will fail (TypeError/KeyError) and be swallowed by the bare `except`, causing artwork to be dropped even when present. The same issue appears in the albums, songs, playlists, and videos sections. Accessing `i['artwork']['dictionary']['height']` (like width) avoids this failure and preserves artwork where available.
</issue_to_address>

### Comment 3
<location> `Apple-Music-Scraper/main.py:395-403` </location>
<code_context>
+    for i in sections:
+        if "album-detail" in i['id']:
+            album_detail_index = index
+        elif "track-list " in i['id']:
+            track_list_index = index
+        elif "video" in i['id']:
+            video_index = index
+        elif "more" in i['id']:
+            more_index = index
+        elif "you-might-also-like" in i['id']:
+            similar_index = index
+        elif "track-list-section" in i['id']:
+            track_list_section_index = index
+        index+=1
</code_context>

<issue_to_address>
**issue (bug_risk):** The `"track-list "` check includes a trailing space, which likely prevents matching the intended section.

Because of that trailing space, `"track-list " in i['id']` will likely never match, so `track_list_index` may never be set and the later `sections[track_list_index]` access will always fall into the `except` path. Consider matching `"track-list"` instead, and preferably use a stricter check like equality or `startswith` rather than a substring search to make this more robust.
</issue_to_address>
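
One way to make the matching stricter, sketched as a small helper (the helper itself is not in the PR; the prefixes mirror the section ids named above, with the longer `track-list-section` checked before `track-list` so a substring never shadows it):

```python
def index_sections(sections):
    """Map each expected section-id prefix to its index using
    startswith rather than substring containment."""
    wanted = ("album-detail", "track-list-section", "track-list",
              "video", "more", "you-might-also-like")
    indexes = {}
    for idx, section in enumerate(sections):
        for prefix in wanted:
            if section["id"].startswith(prefix) and prefix not in indexes:
                indexes[prefix] = idx
                break  # first matching prefix wins for this section
    return indexes
```

The scrape functions could then look up `indexes.get("track-list")` and handle a missing key explicitly instead of relying on a bare `except`.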

### Comment 4
<location> `Apple-Music-Scraper/main.py:147-156` </location>
<code_context>
+        elif "music_video" in i['id']:
+            videos = i
+
+    try:
+        artists_result = []
+        
+        for i in artists['items']:
+            artist = i['title']
+            try:                
+                image_url = i['artwork']['dictionary']['url']
+                image_width = i['artwork']['dictionary']['width']
+                image_height = i[0]['artwork']['dictionary']['height']
+                artwork = get_cover(image_url, image_width, image_height)
+            except:
+                artwork = ""
+
+            url = i['contentDescriptor']['url']
+            artists_result.append({'title':artist, 'url':url, 'image':artwork})
+        result['artists'] = artists_result
+        
+    except:
+        pass
+
+
</code_context>

<issue_to_address>
**suggestion (bug_risk):** The widespread use of bare `except:` blocks hides real errors and makes debugging difficult.

Several blocks here wrap large sections of logic in `try: ... except: pass`. This will hide real programming errors (e.g., `KeyError`, `TypeError`, `NameError`), not just missing optional fields, and can silently degrade the scraper’s output. Please catch only the specific exceptions you expect (e.g., `KeyError` / `IndexError` for missing fields), or refactor into smaller helpers with targeted error handling to keep failures visible while still allowing for genuinely optional data.

Suggested implementation:

```python
    # Build artists result list with targeted error handling so that missing
    # optional fields don't hide real programming errors.
    artists_result = []

    # Safely get the items list; if artists is None or not a dict, fall back to empty.
    try:
        artist_items = artists.get("items", []) if artists is not None else []
    except AttributeError:
        artist_items = []

    for item in artist_items:
        # Title is expected to be present; if not, skip this item rather than
        # hiding a KeyError in a broad try/except.
        try:
            artist_title = item["title"]
        except (KeyError, TypeError):
            continue

        # Artwork is optional; if any of the nested keys are missing or the
        # structure is unexpected, fall back to an empty artwork string.
        artwork = ""
        try:
            artwork_dict = item["artwork"]["dictionary"]
            image_url = artwork_dict["url"]
            image_width = artwork_dict["width"]
            image_height = artwork_dict["height"]
            artwork = get_cover(image_url, image_width, image_height)
        except (KeyError, TypeError):
            artwork = ""

        # URL is required for a valid artist entry; if missing, skip the item.
        try:
            url = item["contentDescriptor"]["url"]
        except (KeyError, TypeError):
            continue

        artists_result.append(
            {
                "title": artist_title,
                "url": url,
                "image": artwork,
            }
        )

    if artists_result:
        result["artists"] = artists_result

```

- If there are other similar blocks for albums, songs, playlists, or videos that use broad `try: ... except: pass`, they should be updated in the same style:
  - Remove outer broad `try/except` that swallows all exceptions.
  - Use small, focused `try/except` blocks around genuinely optional fields (e.g., nested artwork dictionaries) and catch only `KeyError`, `TypeError`, or `AttributeError` as appropriate.
  - For required fields (such as URLs or titles needed to build a valid result entry), either let the exception surface or `continue`/skip that specific item while still logging or handling it explicitly if the codebase has a logging convention.
</issue_to_address>

### Comment 5
<location> `Apple-Music-Scraper/utils.py:79-88` </location>
<code_context>
+
+    return f"https://music.apple.com/{country}/song/{title}/{song_id}"
+
+def get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534"):
+    """
+    Fetch all singles & EP URLs from an Apple Music artist page.
+
+    Parameters
+    ----------
+    url : str, optional
+        Base artist page URL. Defaults to the sample King Princess artist link.
+
+    Returns
+    -------
+    list[str]
+        A list of Apple Music URLs for all singles & EPs for the artist.
+
+    Notes
+    -----
+    - Apple Music loads singles under the `/see-all?section=singles` endpoint.
+    - This function retrieves the serialized server data, parses the `items` section,
+      and extracts the correct song/EP URLs.
+    - Used internally by `artist_scrape()`.
+    """
+    result = []
+    url = url+"/see-all?section=singles"
+    
+    headers = {
</code_context>

<issue_to_address>
**nitpick:** Simple string concatenation for `url` can produce malformed URLs if the base has a trailing slash.

If the caller passes an artist URL with a trailing slash (e.g. `.../1349968534/`), this becomes `.../1349968534//see-all?section=singles`. To avoid malformed URLs, either strip any trailing slash before appending the path segment or use `urllib.parse.urljoin`.
</issue_to_address>
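
The simplest form of the suggested fix (helper name illustrative):

```python
def singles_url(artist_url):
    # Strip any trailing slash before appending the path segment so
    # '.../1349968534/' does not become '.../1349968534//see-all...'.
    return artist_url.rstrip("/") + "/see-all?section=singles"
```

`urllib.parse.urljoin` would work too, but it drops the last path segment unless the base ends with `/`, so the explicit `rstrip` is less surprising here.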

### Comment 6
<location> `Apple-Music-Scraper/main.py:123` </location>
<code_context>
def search(keyword="sasha sloan"):
    """
    Search Apple Music for artists, songs, albums, playlists and videos.

    Parameters
    ----------
    keyword : str, optional
        Search query to send to Apple Music. Defaults to "sasha sloan".

    Returns
    -------
    dict
        Structured JSON-like dictionary containing search results:
        - artists
        - albums
        - songs
        - playlists
        - videos

    Notes
    -----
    Scrapes `serialized-server-data` to access Apple Music's internal search structure.
    """
    result = {
        'artists':[],
        'albums':[],
        'songs':[],
        'playlists':[],
        'videos':[]
    }
    link = "https://music.apple.com/us/search?term="+keyword

    headers = {
        "User-Agent": "Mozilla/5.0"
    }

    rspn = requests.get(link, headers=headers)
    soup = BeautifulSoup(rspn.text, "html.parser")
    items = soup.find('script', {'id': 'serialized-server-data'})
    our_json = json.loads(items.text)
    sections = our_json[0]['data']['sections']

    for i in sections:
        if "artist" in i['id']:
            artists = i
        elif "album" in i['id']:
            albums = i
        elif "song" in i['id']:
            songs = i
        elif "playlist" in i['id']:
            playlists = i
        elif "music_video" in i['id']:
            videos = i

    try:
        artists_result = []

        for i in artists['items']:
            artist = i['title']
            try:                
                image_url = i['artwork']['dictionary']['url']
                image_width = i['artwork']['dictionary']['width']
                image_height = i[0]['artwork']['dictionary']['height']
                artwork = get_cover(image_url, image_width, image_height)
            except:
                artwork = ""

            url = i['contentDescriptor']['url']
            artists_result.append({'title':artist, 'url':url, 'image':artwork})
        result['artists'] = artists_result

    except:
        pass


    try:
        albums_result = []

        for i in albums['items']:
            song = i['titleLinks'][0]['title']
            artist = i['subtitleLinks'][0]['title']
            try:
                image_url = i['artwork']['dictionary']['url']
                image_width = i['artwork']['dictionary']['width']
                image_height = i[0]['artwork']['dictionary']['height']
                artwork = get_cover(image_url, image_width, image_height)
            except:
                artwork = ""

            url = i['contentDescriptor']['url']
            albums_result.append({'title':song, 'artist':artist, 'url':url, 'image':artwork})
        result['albums'] = albums_result

    except:
        pass


    try:
        songs_result = []

        for i in songs['items']:
            song = i['title']
            artist = i['subtitleLinks'][0]['title']
            try:
                image_url = i['artwork']['dictionary']['url']
                image_width = i['artwork']['dictionary']['width']
                image_height = i[0]['artwork']['dictionary']['height']
                artwork = get_cover(image_url, image_width, image_height)
            except:
                artwork = ""

            url = i['contentDescriptor']['url']
            songs_result.append({'title':song, 'artist':artist, 'url':url, 'image':artwork})
        result['songs'] = songs_result
    except:
        pass



    try:
        playlists_result = []

        for i in playlists['items']:
            song = i['titleLinks'][0]['title']
            artist = i['subtitleLinks'][0]['title']
            try:
                image_url = i['artwork']['dictionary']['url']
                image_width = i['artwork']['dictionary']['width']
                image_height = i[0]['artwork']['dictionary']['height']
                artwork = get_cover(image_url, image_width, image_height)
            except:
                artwork = ""

            url = i['contentDescriptor']['url']
            playlists_result.append({'title':song, 'artist':artist, 'url':url, 'image':artwork})
        result['playlists'] = playlists_result
    except:
        pass


    try:
        videos_results = []

        for i in videos['items']:
            song = i['titleLinks'][0]['title']
            artist = i['subtitleLinks'][0]['title']
            try:
                image_url = i['artwork']['dictionary']['url']
                image_width = i['artwork']['dictionary']['width']
                image_height = i[0]['artwork']['dictionary']['height']
                artwork = get_cover(image_url, image_width, image_height)
            except:
                artwork = ""

            url = i['contentDescriptor']['url']
            videos_results.append({'title':song, 'artist':artist, 'url':url, 'image':artwork})
        result['videos'] = videos_results
    except:
        pass

    return result

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
- Extract duplicate code into function ([`extract-duplicate-method`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/extract-duplicate-method/))
- Use `except Exception:` rather than bare `except:` [×6] ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
</issue_to_address>

### Comment 7
<location> `Apple-Music-Scraper/main.py:256` </location>
<code_context>
def song_scrape(url="https://music.apple.com/us/song/california/1821538031"):
    """
    Scrape a single Apple Music song page and extract metadata.

    Parameters
    ----------
    url : str, optional
        URL of the Apple Music song. Defaults to sample link.

    Returns
    -------
    dict
        Dictionary containing:
        - title
        - image (full resolution)
        - kind (song type)
        - album info (title + URL)
        - artist info (title + URL)
        - preview-url
        - list of more songs

    Notes
    -----
    Uses the `schema:song` JSON-LD tag to extract preview URL.
    """
    result = {
        'title':'',
        'image':'',
        'kind':'',
        'album': {
            'title':'',
            'url':''
        },
        'artist': {
            'title':'',
            'url':''
        },
        'more':[],
        'preview-url':''
    }

    rspn = requests.get(url)
    soup = BeautifulSoup(rspn.text, "html.parser")
    items = soup.find('script', {'id': 'serialized-server-data'})
    our_json = json.loads(items.text)

    song_details = our_json[0]['data']['sections'][0] 

    result['title'] = song_details['items'][0]['title']

    image_url = song_details['items'][0]['artwork']['dictionary']['url']
    image_width = song_details['items'][0]['artwork']['dictionary']['width']
    image_height = song_details['items'][0]['artwork']['dictionary']['height']

    result['image'] = get_cover(image_url, image_width, image_height)

    result['kind'] = song_details['presentation']['kind']
    result['album']['title'] = song_details['items'][0]['album']
    result['album']['url'] = song_details['items'][0]['albumLinks'][0]['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
    result['artist']['title'] = song_details['items'][0]['artists']
    result['artist']['url'] = song_details['items'][0]['artistLinks'][0]['segue']['actionMetrics']['data'][0]['fields']['actionUrl']

    json_tag = soup.find("script", {"id": "schema:song", "type": "application/ld+json"})
    data = json.loads(json_tag.string)

    preview_url = data['audio']['audio']['contentUrl']
    result['preview-url'] = preview_url

    more_songs = our_json[0]['data']['sections'][-1]['items']

    more_songs_list = []

    for i in more_songs:
        more_songs_list.append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])

    result['more'] = more_songs_list

    return result

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Convert for loop into list comprehension ([`list-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/list-comprehension/))
- Move assignment closer to its usage within a block [×2] ([`move-assign-in-block`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/move-assign-in-block/))
- Merge dictionary assignment with declaration [×2] ([`merge-dict-assign`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/merge-dict-assign/))
</issue_to_address>
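A minimal sketch of these suggestions applied to the quoted loop. `more_songs` here is a hand-built stand-in for the item dicts parsed from `serialized-server-data`; the nested keys mirror the quoted code:

```python
# Stand-in for the parsed JSON items from the song page.
more_songs = [
    {"segue": {"actionMetrics": {"data": [{"fields": {"actionUrl": "https://music.apple.com/us/song/a"}}]}}},
    {"segue": {"actionMetrics": {"data": [{"fields": {"actionUrl": "https://music.apple.com/us/song/b"}}]}}},
]

# list-comprehension: replaces the append loop with a single expression,
# which also removes the need for the separate `more_songs_list = []` line.
more_songs_list = [
    i["segue"]["actionMetrics"]["data"][0]["fields"]["actionUrl"]
    for i in more_songs
]
```

The same shape applies to the other append loops flagged in later comments.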

### Comment 8
<location> `Apple-Music-Scraper/main.py:334` </location>
<code_context>
def album_scrape(url="https://music.apple.com/us/album/1965/1817707266?i=1817707585"):
    """
    Scrape an Apple Music album page and extract metadata, songs, related albums, videos, etc.

    Parameters
    ----------
    url : str, optional
        URL of the Apple Music album. Defaults to example album.

    Returns
    -------
    dict
        Dictionary containing:
        - title
        - image
        - caption/description
        - artist info
        - song URLs
        - album info text
        - more songs (same artist)
        - similar (recommended) albums
        - videos related to the album

    Notes
    -----
    Extracts multiple sections such as:
    - album-detail
    - track-list
    - similar albums
    - more by artist
    - album videos
    """
    result = {
        'title':'',
        'image':'',
        'caption':'',
        'artist': {
            'title':'',
            'url':''
        },
        'songs':[],
        'info':'',
        'more':[],
        'similar':[],
        'videos':[]
    }

    headers = {
        "User-Agent": "Mozilla/5.0"
    }

    rspn = requests.get(url, headers=headers)
    soup = BeautifulSoup(rspn.text, "html.parser")
    items = soup.find('script', {'id': 'serialized-server-data'})
    our_json = json.loads(items.text)
    sections = our_json[0]['data']['sections']

    index=0
    for i in sections:
        if "album-detail" in i['id']:
            album_detail_index = index
        elif "track-list " in i['id']:
            track_list_index = index
        elif "video" in i['id']:
            video_index = index
        elif "more" in i['id']:
            more_index = index
        elif "you-might-also-like" in i['id']:
            similar_index = index
        elif "track-list-section" in i['id']:
            track_list_section_index = index
        index+=1

    try:
        result['title'] = sections[album_detail_index]['items'][0]['title']
    except:
        pass

    try:
        image_url = sections[album_detail_index]['items'][0]['artwork']['dictionary']['url']
        image_width = sections[album_detail_index]['items'][0]['artwork']['dictionary']['width']
        image_height = sections[album_detail_index]['items'][0]['artwork']['dictionary']['height']
        result['image'] = get_cover(image_url, image_width, image_height)
    except:
        pass

    try:
        result['caption'] = sections[album_detail_index]['items'][0]['modalPresentationDescriptor']['paragraphText']
    except:
        pass

    try:
        result['artist']['title'] = sections[album_detail_index]['items'][0]['subtitleLinks'][0]['title']
        result['artist']['url'] = sections[album_detail_index]['items'][0]['subtitleLinks'][0]['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
    except:
        pass

    try:
        album_songs = sections[track_list_index]['items']
        for i in album_songs:
            result['songs'].append(convert_album_to_song_url(i['contentDescriptor']['url']))
    except:
        pass

    try:
        result['info'] = sections[track_list_section_index]['items'][0]['description']
        more_songs = sections[more_index]['items']
        for i in more_songs:
            result['more'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
    except:
        pass

    try:
        similar_songs = sections[similar_index]['items']
        for i in similar_songs:
            result['similar'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
    except:
        pass

    try:
        videos = sections[video_index]['items']
        for i in videos:
            result['videos'].append(i['contentDescriptor']['url'])
    except:
        pass

    return result

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Replace manual loop counter with call to enumerate ([`convert-to-enumerate`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/convert-to-enumerate/))
- Use `except Exception:` rather than bare `except:` [×8] ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
- Low code quality found in album\_scrape - 21% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))

<br/><details><summary>Explanation</summary>

The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into
  their own functions. This is the most important thing you can do - ideally a
  function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
  sits together within the function rather than being scattered.</details>
</issue_to_address>
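A sketch of the first two suggestions combined: `enumerate` replaces the manual `index` counter, and `except Exception:` narrows the bare `except:`. The section ids are illustrative stand-ins for the real `serialized-server-data` ids:

```python
# Stand-in for our_json[0]['data']['sections'] from the album page.
sections = [
    {"id": "album-detail-123"},
    {"id": "track-list-456"},
    {"id": "you-might-also-like-789"},
]

# convert-to-enumerate: no manual `index = 0` / `index += 1` bookkeeping.
indices = {}
for index, section in enumerate(sections):
    if "album-detail" in section["id"]:
        indices["album_detail"] = index
    elif "you-might-also-like" in section["id"]:
        indices["similar"] = index
    elif "track-list" in section["id"]:
        indices["track_list"] = index

# do-not-use-bare-except: Exception still catches KeyError/IndexError
# from missing sections, but no longer swallows KeyboardInterrupt.
try:
    title = sections[indices["album_detail"]]["id"]
except Exception:
    title = ""
```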

### Comment 9
<location> `Apple-Music-Scraper/main.py:518` </location>
<code_context>
def video_scrape(url="https://music.apple.com/us/music-video/gucci-mane-visualizer/1810547026"):
    """
    Scrape Apple Music music-video page and extract metadata + video file URL.

    Parameters
    ----------
    url : str, optional
        URL of the Apple Music music-video. Defaults to example.

    Returns
    -------
    dict
        {
            title,
            image,
            artist: {title, url},
            video-url,
            more (same artist),
            similar (same genre)
        }

    Notes
    -----
    Uses JSON-LD block `schema:music-video` to extract the direct video content URL.
    """
    result = {
        'title': '',
        'image': '',
        'artist': {
            'title': '',
            'url': ''
        },
        'video-url': '',
        'more': [],
        'similar':[]
    }

    headers = {
        "User-Agent": "Mozilla/5.0"
    }

    res = requests.get(url, headers=headers)
    soup = BeautifulSoup(res.text, "html.parser")
    items = soup.find('script', {'id': 'serialized-server-data'})
    our_json = json.loads(items.text)

    sections = our_json[0]['data']['sections']

    for i in sections:
        if "music-video-header" in i['id']:
            music_video_header = i
        elif "more-by-artist" in i['id']:
            more = i
        elif "more-in-genre" in i['id']:
            similar = i

    try:
        result['title'] = music_video_header['items'][0]['title']
    except:
        pass

    try:
        image_url = music_video_header['items'][0]['artwork']['dictionary']['url']
        image_width = music_video_header['items'][0]['artwork']['dictionary']['width']
        image_height = music_video_header['items'][0]['artwork']['dictionary']['height']
        result['image'] = get_cover(image_url, image_width, image_height)
    except:
        pass

    try:
        result['artist']['title'] = music_video_header['items'][0]['subtitleLinks'][0]['title']
        result['artist']['url'] = music_video_header['items'][0]['subtitleLinks'][0]['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
    except:
        pass

    try:
        json_tag = soup.find("script", {"id": "schema:music-video", "type": "application/ld+json"})
        data = json.loads(json_tag.string)
        result['video-url'] = data['video']['contentUrl']
    except:
        pass

    try:
        for i in more['items']:
            result['more'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
    except:
        pass

    try:
        for i in similar['items']:
            result['similar'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
    except:
        pass

    return result

</code_context>

<issue_to_address>
**issue (code-quality):** Use `except Exception:` rather than bare `except:` [×6] ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
</issue_to_address>
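A small sketch of the fix. A bare `except:` also swallows `SystemExit` and `KeyboardInterrupt`, which makes a long-running scraper hard to stop with Ctrl+C; `except Exception:` still absorbs the `KeyError`/`IndexError`/`TypeError` raised by missing JSON keys. The helper name `first_action_url` is illustrative, not from the original code:

```python
def first_action_url(section):
    """Return the first actionUrl in a section dict, or '' if absent."""
    try:
        return section["items"][0]["segue"]["actionMetrics"]["data"][0]["fields"]["actionUrl"]
    except Exception:  # KeyError/IndexError/TypeError from missing keys
        return ""
```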

### Comment 10
<location> `Apple-Music-Scraper/main.py:558` </location>
<code_context>
def artist_scrape(url="https://music.apple.com/us/artist/king-princess/1349968534"):
    """
    Scrape an Apple Music artist page and extract all available metadata.

    Parameters
    ----------
    url : str, optional
        Apple Music artist page URL. Defaults to King Princess sample link.

    Returns
    -------
    dict
        Dictionary containing:
        - title
        - image
        - latest release URL
        - list of top songs
        - all albums
        - singles & EPs
        - playlists
        - videos
        - similar artists
        - appears on
        - more-to-see (videos)
        - more-to-hear (songs)
        - about text
        - extra info (bio subtitle)

    Notes
    -----
    This is the most complex scraper and extracts ~12 different sections 
    from the artist page.
    """
    result = {
        'title':'',
        'image':'',
        'latest':'',
        'top':[],
        'albums':[],
        'singles_and_EP':[],
        'playlists':[],
        'videos':[],
        'similar':[],
        'appears_on':[],
        'more_to_see':[],
        'more_to_hear':[],
        'about':'',
        'info':'',
    }

    headers = {
        "User-Agent": "Mozilla/5.0"
    }

    rspn = requests.get(url, headers=headers)
    soup = BeautifulSoup(rspn.text, "html.parser")
    items = soup.find('script', {'id': 'serialized-server-data'})
    our_json = json.loads(items.text)

    sections = our_json[0]['data']['sections']

    for i in sections:
        if "artist-detail-header-section" in i['id']:
            artist_detail = i
        elif "latest-release-and-top-songs" in i['id']:
            latest_and_top = i
        elif "full-albums" in i['id']:
            albums = i
        elif "playlists" in i['id']:
            playlists = i
        elif "music-videos" in i['id']:
            videos = i
        elif "singles" in i['id']:
            singles = i
        elif "appears-on" in i['id']:
            appears_on = i
        elif "more-to-see" in i['id']:
            more_to_see = i
        elif "more-to-hear" in i['id']:
            more_to_hear = i
        elif "artist-bio" in i['id']:
            bio = i
        elif "similar-artists" in i['id']:
            similar = i

    try:
        result['title'] = artist_detail['items'][0]['title']
    except:
        pass

    try:
        image_url = artist_detail['items'][0]['artwork']['dictionary']['url']
        image_width = artist_detail['items'][0]['artwork']['dictionary']['width']
        image_height = artist_detail['items'][0]['artwork']['dictionary']['height']
        result['image'] = get_cover(image_url, image_width, image_height)
    except:
        pass

    try:
        result['latest'] = latest_and_top['pinnedLeadingItem']['item']['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
    except:
        pass

    try:
        for i in latest_and_top['items']:
            result['top'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
    except:
        pass

    try:
        for i in albums['items']:
            result['albums'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
    except:
        pass

    try:
        result['singles_and_EP'] = get_all_singles(url)
    except:
        pass

    try:
        for i in playlists['items']:
            result['playlists'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
    except:
        pass

    try:
        for i in videos['items']:
            result['videos'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
    except:
        pass

    try:
        for i in similar['items']:
            result['similar'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
    except:
        pass

    try:
        for i in appears_on['items']:
            result['appears_on'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
    except:
        pass

    try:
        for i in more_to_see['items']:
            result['more_to_see'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
    except:
        pass

    try:
        for i in more_to_hear['items']:
            result['more_to_hear'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
    except:
        pass

    try:
        result['about'] = bio['items'][0]['modalPresentationDescriptor']['paragraphText']
    except:
        pass

    try:
        result['info'] = bio['items'][0]['modalPresentationDescriptor']['headerSubtitle']
    except:
        pass

    return result

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use `except Exception:` rather than bare `except:` [×14] ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
- Low code quality found in artist\_scrape - 10% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))

<br/><details><summary>Explanation</summary>

The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into
  their own functions. This is the most important thing you can do - ideally a
  function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
  sits together within the function rather than being scattered.</details>
</issue_to_address>
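One way to act on both findings at once is to collapse the dozen near-identical `try`/`for`/`append` blocks into a single helper. `collect_urls` and `sections_by_key` are illustrative names, not part of the original code:

```python
def collect_urls(section):
    """Extract actionUrls from one section dict, tolerating missing keys."""
    if not section:
        return []
    urls = []
    for item in section.get("items", []):
        try:
            urls.append(item["segue"]["actionMetrics"]["data"][0]["fields"]["actionUrl"])
        except Exception:  # skip malformed items instead of aborting the section
            continue
    return urls

# Usage: each artist-page section maps to one result key, replacing the
# repeated try blocks with a single comprehension.
sections_by_key = {
    "albums": {"items": [{"segue": {"actionMetrics": {"data": [{"fields": {"actionUrl": "u1"}}]}}}]},
    "videos": None,  # section absent from the page
}
result = {name: collect_urls(sec) for name, sec in sections_by_key.items()}
```

This drops the function to a handful of lines per section and removes the catch-all `except`/`pass` pairs that currently hide real parsing errors.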

### Comment 11
<location> `Apple-Music-Scraper/utils.py:80` </location>
<code_context>
def get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534"):
    """
    Fetch all singles & EP URLs from an Apple Music artist page.

    Parameters
    ----------
    url : str, optional
        Base artist page URL. Defaults to the sample King Princess artist link.

    Returns
    -------
    list[str]
        A list of Apple Music URLs for all singles & EPs for the artist.

    Notes
    -----
    - Apple Music loads singles under the `/see-all?section=singles` endpoint.
    - This function retrieves the serialized server data, parses the `items` section,
      and extracts the correct song/EP URLs.
    - Used internally by `artist_scrape()`.
    """
    result = []
    url = url+"/see-all?section=singles"

    headers = {
        "User-Agent": "Mozilla/5.0"
    }

    res = requests.get(url, headers=headers)
    soup = BeautifulSoup(res.text, "html.parser")
    items = soup.find('script', {'id': 'serialized-server-data'})
    our_json = json.loads(items.text)

    sections = our_json[0]['data']['sections'][0]['items']

    for i in sections:
        result.append((i['segue']['actionMetrics']['data'][0]['fields']['actionUrl']))

    return result
</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Move assignment closer to its usage within a block ([`move-assign-in-block`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/move-assign-in-block/))
- Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
- Convert for loop into list comprehension ([`list-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/list-comprehension/))
- Inline variable that is immediately returned ([`inline-immediately-returned-variable`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/inline-immediately-returned-variable/))
</issue_to_address>
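A sketch of `get_all_singles` with all four suggestions applied. The network fetch is omitted; `sections` stands in for the parsed `items` list, and `get_all_singles_urls` is an illustrative name:

```python
def get_all_singles_urls(sections):
    # list-comprehension + inline-immediately-returned-variable:
    # build and return the list in one expression.
    return [
        item["segue"]["actionMetrics"]["data"][0]["fields"]["actionUrl"]
        for item in sections
    ]

# use-fstring-for-concatenation + move-assign-in-block:
# build the URL where it is used, with an f-string.
base_url = "https://music.apple.com/us/artist/king-princess/1349968534"
see_all_url = f"{base_url}/see-all?section=singles"
```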

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

'playlists':[],
'videos':[]
}
link = "https://music.apple.com/us/search?term="+keyword

issue: The search keyword should be URL-encoded before being concatenated into the query string.

Directly concatenating the raw keyword will break searches for values with spaces, &, +, or non-ASCII characters. Use proper URL encoding, e.g. urllib.parse.quote_plus(keyword) and f"https://music.apple.com/us/search?term={quote_plus(keyword)}", so the search works for arbitrary input.
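A minimal sketch of the suggested fix, using a sample keyword that contains all three problem characters:

```python
from urllib.parse import quote_plus

# quote_plus encodes spaces as '+', '&' as '%26', and '+' as '%2B',
# so arbitrary search terms survive the query string intact.
keyword = "drum & bass 90+"
link = f"https://music.apple.com/us/search?term={quote_plus(keyword)}"
```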

Comment on lines 395 to 403
elif "track-list " in i['id']:
track_list_index = index
elif "video" in i['id']:
video_index = index
elif "more" in i['id']:
more_index = index
elif "you-might-also-like" in i['id']:
similar_index = index
elif "track-list-section" in i['id']:

issue (bug_risk): The "track-list " check includes a trailing space, which likely prevents matching the intended section.

Because of that trailing space, "track-list " in i['id'] will likely never match, so track_list_index may never be set and the later sections[track_list_index] access will always fall into the except path. Consider matching "track-list" instead, and preferably use a stricter check like equality or startswith rather than a substring search to make this more robust.
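A sketch of the stricter matching. Note that plain substring checks are doubly fragile here: `"track-list"` is also a substring of `"track-list-section"`, so `startswith` with the more specific prefix tested first keeps the two sections apart. The ids are illustrative:

```python
section_ids = ["album-detail-1", "track-list-2", "track-list-section-3"]

track_list_index = None
track_list_section_index = None
for index, sid in enumerate(section_ids):
    # Test the more specific prefix first, or "track-list-section-3"
    # would be claimed by the generic "track-list" branch.
    if sid.startswith("track-list-section"):
        track_list_section_index = index
    elif sid.startswith("track-list"):
        track_list_index = index
```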

Comment on lines +79 to +88
def get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534"):
"""
Fetch all singles & EP URLs from an Apple Music artist page.

Parameters
----------
url : str, optional
Base artist page URL. Defaults to the sample King Princess artist link.

Returns

nitpick: Simple string concatenation for url can produce malformed URLs if the base has a trailing slash.

If the caller passes an artist URL with a trailing slash (e.g. .../1349968534/), this becomes .../1349968534//see-all?section=singles. To avoid malformed URLs, either strip any trailing slash before appending the path segment or use urllib.parse.urljoin.
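A sketch of the normalisation. One caveat: `urljoin` would *replace* the last path segment when the base lacks a trailing slash (`.../king-princess/1349968534` + `see-all?...` drops the id), so stripping the slash before concatenating is the simpler safe fix here. `singles_url` is an illustrative helper name:

```python
def singles_url(artist_url):
    # rstrip('/') makes the result identical whether or not the caller's
    # URL ends with a slash.
    return f"{artist_url.rstrip('/')}/see-all?section=singles"
```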
