Add Apple-Music-Scraper Python Script #3265
base: main
Conversation
Reviewer's Guide

Adds a new Apple Music web scraping module that parses Apple Music’s serialized-server-data and JSON-LD blocks to provide structured song, album, playlist, artist, video, room, and search results, supported by shared utilities for artwork URL formatting, URL conversion, and fetching singles/EPs, along with documentation and dependencies.

Sequence diagram for the Apple Music search-to-latest-song workflow
sequenceDiagram
actor "User" as User
participant "Client Script" as Client
participant "main.search()" as Search
participant "main.artist_scrape()" as ArtistScrape
participant "main.album_scrape()" as AlbumScrape
participant "utils.get_cover()" as GetCover
participant "utils.get_all_singles()" as GetAllSingles
participant "Apple Music Web" as AppleWeb
"User" ->> "Client Script": "Call search('night tapes')"
"Client Script" ->> "main.search()": "search(keyword)"
"main.search()" ->> "Apple Music Web": "GET https://music.apple.com/us/search?term=keyword"
"Apple Music Web" -->> "main.search()": "HTML with 'serialized-server-data' script"
"main.search()" ->> "main.search()": "Parse HTML with BeautifulSoup"
"main.search()" ->> "main.search()": "json.loads(serialized-server-data)"
"main.search()" ->> "utils.get_cover()": "Build artwork URL for each result"
"utils.get_cover()" -->> "main.search()": "Formatted artwork URL"
"main.search()" -->> "Client Script": "Structured search results dict"
"Client Script" ->> "main.artist_scrape()": "artist_scrape(artist_url)"
"main.artist_scrape()" ->> "Apple Music Web": "GET artist page HTML"
"Apple Music Web" -->> "main.artist_scrape()": "HTML with 'serialized-server-data'"
"main.artist_scrape()" ->> "main.artist_scrape()": "Parse and extract sections (detail, latest, top, etc.)"
"main.artist_scrape()" ->> "utils.get_cover()": "Build artist artwork URL"
"utils.get_cover()" -->> "main.artist_scrape()": "Formatted artwork URL"
"main.artist_scrape()" ->> "utils.get_all_singles()": "get_all_singles(artist_url)"
"utils.get_all_singles()" ->> "Apple Music Web": "GET artist/see-all?section=singles"
"Apple Music Web" -->> "utils.get_all_singles()": "HTML with singles section"
"utils.get_all_singles()" ->> "utils.get_all_singles()": "Parse serialized-server-data and items"
"utils.get_all_singles()" -->> "main.artist_scrape()": "List of singles and EP URLs"
"main.artist_scrape()" -->> "Client Script": "Artist metadata dict (including 'latest' URL)"
"Client Script" ->> "main.album_scrape()": "album_scrape(latest_song_album_url)"
"main.album_scrape()" ->> "Apple Music Web": "GET album page HTML"
"Apple Music Web" -->> "main.album_scrape()": "HTML with 'serialized-server-data'"
"main.album_scrape()" ->> "main.album_scrape()": "Parse sections (album-detail, track-list, etc.)"
"main.album_scrape()" ->> "utils.get_cover()": "Build album artwork URL"
"utils.get_cover()" -->> "main.album_scrape()": "Formatted artwork URL"
"main.album_scrape()" -->> "Client Script": "Album metadata dict (title, image, songs, more, similar)"
"Client Script" -->> "User": "Display latest song title and cover art"
Class diagram for the new Apple Music scraper and utilities
classDiagram
class MainScraper {
+room_scrape(link="https://music.apple.com/us/room/6748797380") list~str~
+playlist_scrape(link="https://music.apple.com/us/playlist/new-music-daily/pl.2b0e6e332fdf4b7a91164da3162127b5") list~str~
+search(keyword="sasha sloan") dict
+song_scrape(url="https://music.apple.com/us/song/california/1821538031") dict
+album_scrape(url="https://music.apple.com/us/album/1965/1817707266?i=1817707585") dict
+video_scrape(url="https://music.apple.com/us/music-video/gucci-mane-visualizer/1810547026") dict
+artist_scrape(url="https://music.apple.com/us/artist/king-princess/1349968534") dict
}
class Utils {
+get_cover(url, width, height, format="jpg", crop_option="") str
+convert_album_to_song_url(album_url) str
+get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534") list~str~
}
MainScraper ..> Utils : "uses 'get_cover' for artwork URLs"
MainScraper ..> Utils : "uses 'convert_album_to_song_url' in room_scrape, playlist_scrape, album_scrape"
MainScraper ..> Utils : "uses 'get_all_singles' inside artist_scrape"
Flow diagram for generic Apple Music page scraping using serialized-server-data
flowchart TD
A["Start scraping function (song_scrape, album_scrape, video_scrape, artist_scrape, room_scrape, playlist_scrape, search)"] --> B["Build target Apple Music URL (page-specific)"]
B["Build target Apple Music URL (page-specific)"] --> C["Set headers with 'User-Agent: Mozilla/5.0'"]
C["Set headers with 'User-Agent: Mozilla/5.0'"] --> D["requests.get(URL, headers=headers)"]
D["requests.get(URL, headers=headers)"] --> E["Parse HTML with BeautifulSoup"]
E["Parse HTML with BeautifulSoup"] --> F{"Find script tag with id 'serialized-server-data'?"}
F{"Find script tag with id 'serialized-server-data'?"} -->|"Yes"| G["Extract script text and load JSON via json.loads"]
F{"Find script tag with id 'serialized-server-data'?"} -->|"No"| Z["Return empty or partial result (error or structure change)"]
G["Extract script text and load JSON via json.loads"] --> H["Access our_json[0]['data']['sections']"]
H["Access our_json[0]['data']['sections']"] --> I{"Select relevant sections by 'id' pattern (e.g., 'track-list', 'artist-detail', 'music-video-header')"}
I{"Select relevant sections by 'id' pattern (e.g., 'track-list', 'artist-detail', 'music-video-header')"} --> J["Iterate over 'items' collections to gather URLs, titles, subtitles, descriptors"]
J["Iterate over 'items' collections to gather URLs, titles, subtitles, descriptors"] --> K{"Artwork present in item?"}
K{"Artwork present in item?"} -->|"Yes"| L["Call utils.get_cover() to expand artwork URL with width, height, format, crop"]
K{"Artwork present in item?"} -->|"No"| M["Set artwork field to empty string"]
L["Call utils.get_cover() to expand artwork URL with width, height, format, crop"] --> N["Attach formatted artwork URL to result object"]
M["Set artwork field to empty string"] --> N["Attach formatted artwork URL to result object"]
N["Attach formatted artwork URL to result object"] --> O{"Needs additional JSON-LD (preview or video URL)?"}
O{"Needs additional JSON-LD (preview or video URL)?"} -->|"Yes"| P["Find JSON-LD script (e.g., id 'schema:song' or 'schema:music-video') and json.loads"]
O{"Needs additional JSON-LD (preview or video URL)?"} -->|"No"| R["Skip JSON-LD step"]
P["Find JSON-LD script (e.g., id 'schema:song' or 'schema:music-video') and json.loads"] --> Q["Extract preview or video content URL and add to result"]
Q["Extract preview or video content URL and add to result"] --> S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"]
R["Skip JSON-LD step"] --> S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"]
S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"] --> T["Return JSON-like Python structure to caller"]
Z["Return empty or partial result (error or structure change)"] --> T["Return JSON-like Python structure to caller"]
Hey there - I've reviewed your changes - here's some feedback:
- There are many `try/except: pass` blocks throughout the scraper; it would be more robust to catch specific exceptions (e.g., `KeyError`, `IndexError`, `JSONDecodeError`) and optionally log or default values so that real failures aren’t silently swallowed.
- HTTP requests currently don’t specify timeouts or handle network-level errors; consider adding a shared request helper with a reasonable timeout and basic error handling/retries to avoid hanging or crashing on transient network issues.
- In several places when extracting artwork (e.g., in `search()` for artists/albums/songs/playlists/videos), you access `i[0]['artwork']` instead of `i['artwork']`, which is likely a typo and causes exceptions that are then swallowed; clean this up so artwork URLs are reliably parsed.
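The shared request helper suggested above could look like the following sketch. The names `make_session` and `fetch` are hypothetical (not part of the PR), and the retry/timeout values are illustrative defaults:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}

def make_session(total_retries=3, backoff=0.5):
    """Build a requests.Session that retries transient HTTP failures."""
    session = requests.Session()
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,
        status_forcelist=(429, 500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session

def fetch(url, session=None, timeout=10):
    """GET a page with a timeout; raises requests.RequestException on failure."""
    session = session or make_session()
    response = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Every scraper function could then call `fetch(url)` instead of its own `requests.get`, so hangs and transient 5xx responses are handled in one place.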
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There are many `try/except: pass` blocks throughout the scraper; it would be more robust to catch specific exceptions (e.g., `KeyError`, `IndexError`, `JSONDecodeError`) and optionally log or default values so that real failures aren’t silently swallowed.
- HTTP requests currently don’t specify timeouts or handle network-level errors; consider adding a shared request helper with a reasonable timeout and basic error handling/retries to avoid hanging or crashing on transient network issues.
- In several places when extracting artwork (e.g., in `search()` for artists/albums/songs/playlists/videos), you access `i[0]['artwork']` instead of `i['artwork']`, which is likely a typo and causes exceptions that are then swallowed—clean this up so artwork URLs are reliably parsed.
## Individual Comments
### Comment 1
<location> `Apple-Music-Scraper/main.py:123` </location>
<code_context>
+ 'playlists':[],
+ 'videos':[]
+ }
+ link = "https://music.apple.com/us/search?term="+keyword
+
+ headers = {
</code_context>
<issue_to_address>
**issue:** The search keyword should be URL-encoded before being concatenated into the query string.
Directly concatenating the raw keyword will break searches for values with spaces, &, +, or non-ASCII characters. Use proper URL encoding, e.g. `urllib.parse.quote_plus(keyword)` and `f"https://music.apple.com/us/search?term={quote_plus(keyword)}"`, so the search works for arbitrary input.
</issue_to_address>
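A minimal sketch of the encoded version (the helper name `build_search_url` is hypothetical):

```python
from urllib.parse import quote_plus

def build_search_url(keyword):
    # quote_plus turns spaces into '+' and percent-encodes &, +, and non-ASCII
    return f"https://music.apple.com/us/search?term={quote_plus(keyword)}"

print(build_search_url("night tapes & friends"))
# -> https://music.apple.com/us/search?term=night+tapes+%26+friends
```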
### Comment 2
<location> `Apple-Music-Scraper/main.py:155-156` </location>
<code_context>
+ try:
+ image_url = i['artwork']['dictionary']['url']
+ image_width = i['artwork']['dictionary']['width']
+ image_height = i[0]['artwork']['dictionary']['height']
+ artwork = get_cover(image_url, image_width, image_height)
+ except:
+ artwork = ""
</code_context>
<issue_to_address>
**issue (bug_risk):** Artwork height is indexed via `i[0]` instead of `i`, which is likely a bug and will raise at runtime.
Since `i` is the item dict from `for i in artists['items']:`, `i[0]` will fail (TypeError/KeyError) and be swallowed by the bare `except`, causing artwork to be dropped even when present. The same issue appears in the albums, songs, playlists, and videos sections. Accessing `i['artwork']['dictionary']['height']` (like width) avoids this failure and preserves artwork where available.
</issue_to_address>
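Since the same lookup repeats in all five sections, the fix could be isolated in one helper (a hypothetical `extract_artwork`, not in the PR), taking the existing `get_cover` as a parameter:

```python
def extract_artwork(item, get_cover):
    """Return the formatted artwork URL for one search item, or '' if absent."""
    try:
        art = item['artwork']['dictionary']
        # all three fields come from `item` itself -- never `item[0]`
        return get_cover(art['url'], art['width'], art['height'])
    except (KeyError, TypeError):
        return ""
```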
### Comment 3
<location> `Apple-Music-Scraper/main.py:395-403` </location>
<code_context>
+ for i in sections:
+ if "album-detail" in i['id']:
+ album_detail_index = index
+ elif "track-list " in i['id']:
+ track_list_index = index
+ elif "video" in i['id']:
+ video_index = index
+ elif "more" in i['id']:
+ more_index = index
+ elif "you-might-also-like" in i['id']:
+ similar_index = index
+ elif "track-list-section" in i['id']:
+ track_list_section_index = index
+ index+=1
</code_context>
<issue_to_address>
**issue (bug_risk):** The `"track-list "` check includes a trailing space, which likely prevents matching the intended section.
Because of that trailing space, `"track-list " in i['id']` will likely never match, so `track_list_index` may never be set and the later `sections[track_list_index]` access will always fall into the `except` path. Consider matching `"track-list"` instead, and preferably use a stricter check like equality or `startswith` rather than a substring search to make this more robust.
</issue_to_address>
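A sketch of the stricter matching (the helper name `index_sections` is hypothetical); note the longer id must be checked before the shorter prefix:

```python
def index_sections(sections):
    """Map known section kinds to their list index using prefix matching."""
    indexes = {}
    for idx, section in enumerate(sections):
        sid = section.get('id', '')
        # check 'track-list-section' first so it is not mistaken
        # for the plain 'track-list' section
        if sid.startswith('track-list-section'):
            indexes['track_list_section'] = idx
        elif sid.startswith('track-list'):
            indexes['track_list'] = idx
        elif 'album-detail' in sid:
            indexes['album_detail'] = idx
    return indexes
```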
### Comment 4
<location> `Apple-Music-Scraper/main.py:147-156` </location>
<code_context>
+ elif "music_video" in i['id']:
+ videos = i
+
+ try:
+ artists_result = []
+
+ for i in artists['items']:
+ artist = i['title']
+ try:
+ image_url = i['artwork']['dictionary']['url']
+ image_width = i['artwork']['dictionary']['width']
+ image_height = i[0]['artwork']['dictionary']['height']
+ artwork = get_cover(image_url, image_width, image_height)
+ except:
+ artwork = ""
+
+ url = i['contentDescriptor']['url']
+ artists_result.append({'title':artist, 'url':url, 'image':artwork})
+ result['artists'] = artists_result
+
+ except:
+ pass
+
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The widespread use of bare `except:` blocks hides real errors and makes debugging difficult.
Several blocks here wrap large sections of logic in `try: ... except: pass`. This will hide real programming errors (e.g., `KeyError`, `TypeError`, `NameError`), not just missing optional fields, and can silently degrade the scraper’s output. Please catch only the specific exceptions you expect (e.g., `KeyError` / `IndexError` for missing fields), or refactor into smaller helpers with targeted error handling to keep failures visible while still allowing for genuinely optional data.
Suggested implementation:
```python
# Build artists result list with targeted error handling so that missing
# optional fields don't hide real programming errors.
artists_result = []
# Safely get the items list; if artists is None or not a dict, fall back to empty.
try:
artist_items = artists.get("items", []) if artists is not None else []
except AttributeError:
artist_items = []
for item in artist_items:
# Title is expected to be present; if not, skip this item rather than
# hiding a KeyError in a broad try/except.
try:
artist_title = item["title"]
except (KeyError, TypeError):
continue
# Artwork is optional; if any of the nested keys are missing or the
# structure is unexpected, fall back to an empty artwork string.
artwork = ""
try:
artwork_dict = item["artwork"]["dictionary"]
image_url = artwork_dict["url"]
image_width = artwork_dict["width"]
image_height = artwork_dict["height"]
artwork = get_cover(image_url, image_width, image_height)
except (KeyError, TypeError):
artwork = ""
# URL is required for a valid artist entry; if missing, skip the item.
try:
url = item["contentDescriptor"]["url"]
except (KeyError, TypeError):
continue
artists_result.append(
{
"title": artist_title,
"url": url,
"image": artwork,
}
)
if artists_result:
result["artists"] = artists_result
```
- If there are other similar blocks for albums, songs, playlists, or videos that use broad `try: ... except: pass`, they should be updated in the same style:
- Remove outer broad `try/except` that swallows all exceptions.
- Use small, focused `try/except` blocks around genuinely optional fields (e.g., nested artwork dictionaries) and catch only `KeyError`, `TypeError`, or `AttributeError` as appropriate.
- For required fields (such as URLs or titles needed to build a valid result entry), either let the exception surface or `continue`/skip that specific item while still logging or handling it explicitly if the codebase has a logging convention.
</issue_to_address>
### Comment 5
<location> `Apple-Music-Scraper/utils.py:79-88` </location>
<code_context>
+
+ return f"https://music.apple.com/{country}/song/{title}/{song_id}"
+
+def get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534"):
+ """
+ Fetch all singles & EP URLs from an Apple Music artist page.
+
+ Parameters
+ ----------
+ url : str, optional
+ Base artist page URL. Defaults to the sample King Princess artist link.
+
+ Returns
+ -------
+ list[str]
+ A list of Apple Music URLs for all singles & EPs for the artist.
+
+ Notes
+ -----
+ - Apple Music loads singles under the `/see-all?section=singles` endpoint.
+ - This function retrieves the serialized server data, parses the `items` section,
+ and extracts the correct song/EP URLs.
+ - Used internally by `artist_scrape()`.
+ """
+ result = []
+ url = url+"/see-all?section=singles"
+
+ headers = {
</code_context>
<issue_to_address>
**nitpick:** Simple string concatenation for `url` can produce malformed URLs if the base has a trailing slash.
If the caller passes an artist URL with a trailing slash (e.g. `.../1349968534/`), this becomes `.../1349968534//see-all?section=singles`. To avoid malformed URLs, either strip any trailing slash before appending the path segment or use `urllib.parse.urljoin`.
</issue_to_address>
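A one-line sketch of the slash-safe version (hypothetical helper name):

```python
def singles_url(artist_url):
    # rstrip('/') guards against a trailing slash producing '...//see-all'
    return artist_url.rstrip('/') + "/see-all?section=singles"
```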
### Comment 6
<location> `Apple-Music-Scraper/main.py:123` </location>
<code_context>
def search(keyword="sasha sloan"):
"""
Search Apple Music for artists, songs, albums, playlists and videos.
Parameters
----------
keyword : str, optional
Search query to send to Apple Music. Defaults to "sasha sloan".
Returns
-------
dict
Structured JSON-like dictionary containing search results:
- artists
- albums
- songs
- playlists
- videos
Notes
-----
Scrapes `serialized-server-data` to access Apple Music's internal search structure.
"""
result = {
'artists':[],
'albums':[],
'songs':[],
'playlists':[],
'videos':[]
}
link = "https://music.apple.com/us/search?term="+keyword
headers = {
"User-Agent": "Mozilla/5.0"
}
rspn = requests.get(link, headers=headers)
soup = BeautifulSoup(rspn.text, "html.parser")
items = soup.find('script', {'id': 'serialized-server-data'})
our_json = json.loads(items.text)
sections = our_json[0]['data']['sections']
for i in sections:
if "artist" in i['id']:
artists = i
elif "album" in i['id']:
albums = i
elif "song" in i['id']:
songs = i
elif "playlist" in i['id']:
playlists = i
elif "music_video" in i['id']:
videos = i
try:
artists_result = []
for i in artists['items']:
artist = i['title']
try:
image_url = i['artwork']['dictionary']['url']
image_width = i['artwork']['dictionary']['width']
image_height = i[0]['artwork']['dictionary']['height']
artwork = get_cover(image_url, image_width, image_height)
except:
artwork = ""
url = i['contentDescriptor']['url']
artists_result.append({'title':artist, 'url':url, 'image':artwork})
result['artists'] = artists_result
except:
pass
try:
albums_result = []
for i in albums['items']:
song = i['titleLinks'][0]['title']
artist = i['subtitleLinks'][0]['title']
try:
image_url = i['artwork']['dictionary']['url']
image_width = i['artwork']['dictionary']['width']
image_height = i[0]['artwork']['dictionary']['height']
artwork = get_cover(image_url, image_width, image_height)
except:
artwork = ""
url = i['contentDescriptor']['url']
albums_result.append({'title':song, 'artist':artist, 'url':url, 'image':artwork})
result['albums'] = albums_result
except:
pass
try:
songs_result = []
for i in songs['items']:
song = i['title']
artist = i['subtitleLinks'][0]['title']
try:
image_url = i['artwork']['dictionary']['url']
image_width = i['artwork']['dictionary']['width']
image_height = i[0]['artwork']['dictionary']['height']
artwork = get_cover(image_url, image_width, image_height)
except:
artwork = ""
url = i['contentDescriptor']['url']
songs_result.append({'title':song, 'artist':artist, 'url':url, 'image':artwork})
result['songs'] = songs_result
except:
pass
try:
playlists_result = []
for i in playlists['items']:
song = i['titleLinks'][0]['title']
artist = i['subtitleLinks'][0]['title']
try:
image_url = i['artwork']['dictionary']['url']
image_width = i['artwork']['dictionary']['width']
image_height = i[0]['artwork']['dictionary']['height']
artwork = get_cover(image_url, image_width, image_height)
except:
artwork = ""
url = i['contentDescriptor']['url']
playlists_result.append({'title':song, 'artist':artist, 'url':url, 'image':artwork})
result['playlists'] = playlists_result
except:
pass
try:
videos_results = []
for i in videos['items']:
song = i['titleLinks'][0]['title']
artist = i['subtitleLinks'][0]['title']
try:
image_url = i['artwork']['dictionary']['url']
image_width = i['artwork']['dictionary']['width']
image_height = i[0]['artwork']['dictionary']['height']
artwork = get_cover(image_url, image_width, image_height)
except:
artwork = ""
url = i['contentDescriptor']['url']
videos_results.append({'title':song, 'artist':artist, 'url':url, 'image':artwork})
result['videos'] = videos_results
except:
pass
return result
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
- Extract duplicate code into function ([`extract-duplicate-method`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/extract-duplicate-method/))
- Use `except Exception:` rather than bare `except:` [×6] ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
</issue_to_address>
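The duplicated per-section loops flagged above could collapse into one helper. This is a sketch only: `parse_section` is a hypothetical name, and it assumes titles fall back to `titleLinks` for album/playlist/video items as in the quoted code:

```python
def parse_section(section, get_cover):
    """Build [{'title', 'url', 'image'}, ...] from one search section."""
    results = []
    for item in section.get('items', []):
        try:
            title = item.get('title') or item['titleLinks'][0]['title']
            url = item['contentDescriptor']['url']
        except (KeyError, IndexError, TypeError):
            continue  # skip malformed items instead of aborting the section
        try:
            art = item['artwork']['dictionary']
            image = get_cover(art['url'], art['width'], art['height'])
        except (KeyError, TypeError):
            image = ""  # artwork is genuinely optional
        results.append({'title': title, 'url': url, 'image': image})
    return results
```

`search()` would then reduce to one `parse_section` call per section, removing both the duplication and the five bare `except:` wrappers.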
### Comment 7
<location> `Apple-Music-Scraper/main.py:256` </location>
<code_context>
def song_scrape(url="https://music.apple.com/us/song/california/1821538031"):
"""
Scrape a single Apple Music song page and extract metadata.
Parameters
----------
url : str, optional
URL of the Apple Music song. Defaults to sample link.
Returns
-------
dict
Dictionary containing:
- title
- image (full resolution)
- kind (song type)
- album info (title + URL)
- artist info (title + URL)
- preview-url
- list of more songs
Notes
-----
Uses the `schema:song` JSON-LD tag to extract preview URL.
"""
result = {
'title':'',
'image':'',
'kind':'',
'album': {
'title':'',
'url':''
},
'artist': {
'title':'',
'url':''
},
'more':[],
'preview-url':''
}
rspn = requests.get(url)
soup = BeautifulSoup(rspn.text, "html.parser")
items = soup.find('script', {'id': 'serialized-server-data'})
our_json = json.loads(items.text)
song_details = our_json[0]['data']['sections'][0]
result['title'] = song_details['items'][0]['title']
image_url = song_details['items'][0]['artwork']['dictionary']['url']
image_width = song_details['items'][0]['artwork']['dictionary']['width']
image_height = song_details['items'][0]['artwork']['dictionary']['height']
result['image'] = get_cover(image_url, image_width, image_height)
result['kind'] = song_details['presentation']['kind']
result['album']['title'] = song_details['items'][0]['album']
result['album']['url'] = song_details['items'][0]['albumLinks'][0]['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
result['artist']['title'] = song_details['items'][0]['artists']
result['artist']['url'] = song_details['items'][0]['artistLinks'][0]['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
json_tag = soup.find("script", {"id": "schema:song", "type": "application/ld+json"})
data = json.loads(json_tag.string)
preview_url = data['audio']['audio']['contentUrl']
result['preview-url'] = preview_url
more_songs = our_json[0]['data']['sections'][-1]['items']
more_songs_list = []
for i in more_songs:
more_songs_list.append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
result['more'] = more_songs_list
return result
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Convert for loop into list comprehension ([`list-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/list-comprehension/))
- Move assignment closer to its usage within a block [×2] ([`move-assign-in-block`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/move-assign-in-block/))
- Merge dictionary assignment with declaration [×2] ([`merge-dict-assign`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/merge-dict-assign/))
</issue_to_address>
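The `more_songs` loop the list-comprehension rule points at maps directly onto a comprehension; wrapping it in a small function (hypothetical name `action_urls`) also makes the deep key path reusable across the other scrapers:

```python
def action_urls(items):
    """Extract the actionUrl from each item's segue metrics block."""
    return [
        item['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
        for item in items
    ]
```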
### Comment 8
<location> `Apple-Music-Scraper/main.py:334` </location>
<code_context>
def album_scrape(url="https://music.apple.com/us/album/1965/1817707266?i=1817707585"):
"""
Scrape an Apple Music album page and extract metadata, songs, related albums, videos, etc.
Parameters
----------
url : str, optional
URL of the Apple Music album. Defaults to example album.
Returns
-------
dict
Dictionary containing:
- title
- image
- caption/description
- artist info
- song URLs
- album info text
- more songs (same artist)
- similar (recommended) albums
- videos related to the album
Notes
-----
Extracts multiple sections such as:
- album-detail
- track-list
- similar albums
- more by artist
- album videos
"""
result = {
'title':'',
'image':'',
'caption':'',
'artist': {
'title':'',
'url':''
},
'songs':[],
'info':'',
'more':[],
'similar':[],
'videos':[]
}
headers = {
"User-Agent": "Mozilla/5.0"
}
rspn = requests.get(url, headers=headers)
soup = BeautifulSoup(rspn.text, "html.parser")
items = soup.find('script', {'id': 'serialized-server-data'})
our_json = json.loads(items.text)
sections = our_json[0]['data']['sections']
index=0
for i in sections:
if "album-detail" in i['id']:
album_detail_index = index
elif "track-list " in i['id']:
track_list_index = index
elif "video" in i['id']:
video_index = index
elif "more" in i['id']:
more_index = index
elif "you-might-also-like" in i['id']:
similar_index = index
elif "track-list-section" in i['id']:
track_list_section_index = index
index+=1
try:
result['title'] = sections[album_detail_index]['items'][0]['title']
except:
pass
try:
image_url = sections[album_detail_index]['items'][0]['artwork']['dictionary']['url']
image_width = sections[album_detail_index]['items'][0]['artwork']['dictionary']['width']
image_height = sections[album_detail_index]['items'][0]['artwork']['dictionary']['height']
result['image'] = get_cover(image_url, image_width, image_height)
except:
pass
try:
result['caption'] = sections[album_detail_index]['items'][0]['modalPresentationDescriptor']['paragraphText']
except:
pass
try:
result['artist']['title'] = sections[album_detail_index]['items'][0]['subtitleLinks'][0]['title']
result['artist']['url'] = sections[album_detail_index]['items'][0]['subtitleLinks'][0]['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
except:
pass
try:
album_songs = sections[track_list_index]['items']
for i in album_songs:
result['songs'].append(convert_album_to_song_url(i['contentDescriptor']['url']))
except:
pass
try:
result['info'] = sections[track_list_section_index]['items'][0]['description']
more_songs = sections[more_index]['items']
for i in more_songs:
result['more'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
similar_songs = sections[similar_index]['items']
for i in similar_songs:
result['similar'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
videos = sections[video_index]['items']
for i in videos:
result['videos'].append(i['contentDescriptor']['url'])
except:
pass
return result
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Replace manual loop counter with call to enumerate ([`convert-to-enumerate`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/convert-to-enumerate/))
- Use `except Exception:` rather than bare `except:` [×8] ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
- Low code quality found in album\_scrape - 21% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
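The manual `index = 0 ... index += 1` counter in `album_scrape` maps directly onto `enumerate`. A minimal sketch of the pattern (the helper name `locate` is hypothetical):

```python
def locate(sections, needle):
    """Return the index of the first section whose id contains `needle`, or None."""
    for index, section in enumerate(sections):  # replaces the manual counter
        if needle in section['id']:
            return index
    return None
```

Returning `None` for a missing section also lets the caller branch explicitly instead of relying on an unset variable and a bare `except`.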
### Comment 9
<location> `Apple-Music-Scraper/main.py:518` </location>
<code_context>
def video_scrape(url="https://music.apple.com/us/music-video/gucci-mane-visualizer/1810547026"):
"""
Scrape Apple Music music-video page and extract metadata + video file URL.
Parameters
----------
url : str, optional
URL of the Apple Music music-video. Defaults to example.
Returns
-------
dict
{
title,
image,
artist: {title, url},
video-url,
more (same artist),
similar (same genre)
}
Notes
-----
Uses JSON-LD block `schema:music-video` to extract the direct video content URL.
"""
result = {
'title': '',
'image': '',
'artist': {
'title': '',
'url': ''
},
'video-url': '',
'more': [],
'similar':[]
}
headers = {
"User-Agent": "Mozilla/5.0"
}
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
items = soup.find('script', {'id': 'serialized-server-data'})
our_json = json.loads(items.text)
sections = our_json[0]['data']['sections']
for i in sections:
if "music-video-header" in i['id']:
music_video_header = i
elif "more-by-artist" in i['id']:
more = i
elif "more-in-genre" in i['id']:
similar = i
try:
result['title'] = music_video_header['items'][0]['title']
except:
pass
try:
image_url = music_video_header['items'][0]['artwork']['dictionary']['url']
image_width = music_video_header['items'][0]['artwork']['dictionary']['width']
image_height = music_video_header['items'][0]['artwork']['dictionary']['height']
result['image'] = get_cover(image_url, image_width, image_height)
except:
pass
try:
result['artist']['title'] = music_video_header['items'][0]['subtitleLinks'][0]['title']
result['artist']['url'] = music_video_header['items'][0]['subtitleLinks'][0]['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
except:
pass
try:
json_tag = soup.find("script", {"id": "schema:music-video", "type": "application/ld+json"})
data = json.loads(json_tag.string)
result['video-url'] = data['video']['contentUrl']
except:
pass
try:
for i in more['items']:
result['more'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in similar['items']:
result['similar'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
return result
</code_context>
<issue_to_address>
**issue (code-quality):** Use `except Exception:` rather than bare `except:` [×6] ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
</issue_to_address>
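One way to replace the six bare `except:` blocks in `video_scrape` (and the similar ones elsewhere) is a lookup helper that only swallows the expected structural errors; `dig` is a hypothetical name, not part of the PR:

```python
def dig(obj, *keys, default=None):
    """Walk nested dicts/lists, returning `default` if any key/index is missing."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default  # only structural misses; real bugs still raise
    return obj
```

For example, `result['title'] = dig(music_video_header, 'items', 0, 'title', default='')` replaces a whole try/except block while still letting `NameError` or other genuine bugs surface.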
### Comment 10
<location> `Apple-Music-Scraper/main.py:558` </location>
<code_context>
def artist_scrape(url="https://music.apple.com/us/artist/king-princess/1349968534"):
"""
Scrape an Apple Music artist page and extract all available metadata.
Parameters
----------
url : str, optional
Apple Music artist page URL. Defaults to King Princess sample link.
Returns
-------
dict
Dictionary containing:
- title
- image
- latest release URL
- list of top songs
- all albums
- singles & EPs
- playlists
- videos
- similar artists
- appears on
- more-to-see (videos)
- more-to-hear (songs)
- about text
- extra info (bio subtitle)
Notes
-----
This is the most complex scraper and extracts ~12 different sections
from the artist page.
"""
result = {
'title':'',
'image':'',
'latest':'',
'top':[],
'albums':[],
'singles_and_EP':[],
'playlists':[],
'videos':[],
'similar':[],
'appears_on':[],
'more_to_see':[],
'more_to_hear':[],
'about':'',
'info':'',
}
headers = {
"User-Agent": "Mozilla/5.0"
}
rspn = requests.get(url, headers=headers)
soup = BeautifulSoup(rspn.text, "html.parser")
items = soup.find('script', {'id': 'serialized-server-data'})
our_json = json.loads(items.text)
sections = our_json[0]['data']['sections']
for i in sections:
if "artist-detail-header-section" in i['id']:
artist_detail = i
elif "latest-release-and-top-songs" in i['id']:
latest_and_top = i
elif "full-albums" in i['id']:
albums = i
elif "playlists" in i['id']:
playlists = i
elif "music-videos" in i['id']:
videos = i
elif "singles" in i['id']:
singles = i
elif "appears-on" in i['id']:
appears_on = i
elif "more-to-see" in i['id']:
more_to_see = i
elif "more-to-hear" in i['id']:
more_to_hear = i
elif "artist-bio" in i['id']:
bio = i
elif "similar-artists" in i['id']:
similar = i
try:
result['title'] = artist_detail['items'][0]['title']
except:
pass
try:
image_url = artist_detail['items'][0]['artwork']['dictionary']['url']
image_width = artist_detail['items'][0]['artwork']['dictionary']['width']
image_height = artist_detail['items'][0]['artwork']['dictionary']['height']
result['image'] = get_cover(image_url, image_width, image_height)
except:
pass
try:
result['latest'] = latest_and_top['pinnedLeadingItem']['item']['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
except:
pass
try:
for i in latest_and_top['items']:
result['top'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in albums['items']:
result['albums'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
result['singles_and_EP'] = get_all_singles(url)
except:
pass
try:
for i in playlists['items']:
result['playlists'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in videos['items']:
result['videos'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in similar['items']:
result['similar'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in appears_on['items']:
result['appears_on'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in more_to_see['items']:
result['more_to_see'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
for i in more_to_hear['items']:
result['more_to_hear'].append(i['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
except:
pass
try:
result['about'] = bio['items'][0]['modalPresentationDescriptor']['paragraphText']
except:
pass
try:
result['info'] = bio['items'][0]['modalPresentationDescriptor']['headerSubtitle']
except:
pass
return result
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use `except Exception:` rather than bare `except:` [×14] ([`do-not-use-bare-except`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/do-not-use-bare-except/))
- Low code quality found in `artist_scrape` - 10% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
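One way to address both findings at once, sketched with hypothetical helper names: pull the dozen near-identical `try`/`for` blocks out of `artist_scrape` into a helper that catches `Exception` (never a bare `except:`) and drive it from a marker-to-key mapping. The marker substrings below come from the `elif` chain in the snippet above.

```python
def _action_urls(section):
    """Collect actionUrl values from one section, tolerating missing fields."""
    urls = []
    for item in (section or {}).get('items', []):
        try:
            urls.append(item['segue']['actionMetrics']['data'][0]['fields']['actionUrl'])
        except Exception:
            pass
    return urls

# Section-id markers (from the elif chain above) mapped to result keys.
SECTION_MARKERS = {
    'top': 'latest-release-and-top-songs',
    'albums': 'full-albums',
    'playlists': 'playlists',
    'videos': 'music-videos',
    'similar': 'similar-artists',
    'appears_on': 'appears-on',
    'more_to_see': 'more-to-see',
    'more_to_hear': 'more-to-hear',
}

def collect_url_sections(sections):
    """Replace ~40 lines of repeated try/for blocks with one loop."""
    result = {}
    for key, marker in SECTION_MARKERS.items():
        section = next((s for s in sections if marker in s.get('id', '')), None)
        result[key] = _action_urls(section)
    return result

# Tiny synthetic demo: only an albums section is present.
demo_sections = [{
    'id': 'full-albums-abc',
    'items': [{'segue': {'actionMetrics': {'data': [{'fields': {'actionUrl': 'https://music.apple.com/us/album/x/1'}}]}}}],
}]
demo = collect_url_sections(demo_sections)
```

`artist_scrape` would then only need to handle the non-list fields (title, image, latest, about, info) itself, bringing it well under the complexity threshold.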
### Comment 11
<location> `Apple-Music-Scraper/utils.py:80` </location>
<code_context>
def get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534"):
"""
Fetch all singles & EP URLs from an Apple Music artist page.
Parameters
----------
url : str, optional
Base artist page URL. Defaults to the sample King Princess artist link.
Returns
-------
list[str]
A list of Apple Music URLs for all singles & EPs for the artist.
Notes
-----
- Apple Music loads singles under the `/see-all?section=singles` endpoint.
- This function retrieves the serialized server data, parses the `items` section,
and extracts the correct song/EP URLs.
- Used internally by `artist_scrape()`.
"""
result = []
url = url+"/see-all?section=singles"
headers = {
"User-Agent": "Mozilla/5.0"
}
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
items = soup.find('script', {'id': 'serialized-server-data'})
our_json = json.loads(items.text)
sections = our_json[0]['data']['sections'][0]['items']
for i in sections:
result.append((i['segue']['actionMetrics']['data'][0]['fields']['actionUrl']))
return result
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Move assignment closer to its usage within a block ([`move-assign-in-block`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/move-assign-in-block/))
- Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
- Convert for loop into list comprehension ([`list-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/list-comprehension/))
- Inline variable that is immediately returned ([`inline-immediately-returned-variable`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/inline-immediately-returned-variable/))
</issue_to_address>
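Two of the refactorings above (list comprehension, inline return) can be sketched on the extraction step in isolation; `extract_single_urls` is a hypothetical helper name, and the input shape mirrors the `items` section parsed in the snippet.

```python
def extract_single_urls(items):
    # List comprehension replaces the append loop, and the expression is
    # returned directly instead of via an intermediate `result` variable.
    return [
        item['segue']['actionMetrics']['data'][0]['fields']['actionUrl']
        for item in items
    ]

# Synthetic items shaped like the serialized-server-data section.
items = [
    {'segue': {'actionMetrics': {'data': [{'fields': {'actionUrl': 'https://music.apple.com/us/album/a/1'}}]}}},
    {'segue': {'actionMetrics': {'data': [{'fields': {'actionUrl': 'https://music.apple.com/us/album/b/2'}}]}}},
]
urls = extract_single_urls(items)
```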
Apple-Music-Scraper/main.py (Outdated)

    'playlists':[],
    'videos':[]
}
link = "https://music.apple.com/us/search?term="+keyword
**issue:** The search keyword should be URL-encoded before being concatenated into the query string.
Directly concatenating the raw keyword will break searches for values with spaces, `&`, `+`, or non-ASCII characters. Use proper URL encoding, e.g. `urllib.parse.quote_plus(keyword)` and `f"https://music.apple.com/us/search?term={quote_plus(keyword)}"`, so the search works for arbitrary input.
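A minimal sketch of that fix (the function name is illustrative):

```python
from urllib.parse import quote_plus

def search_url(keyword):
    """Build the search URL with the keyword safely percent-encoded."""
    return f"https://music.apple.com/us/search?term={quote_plus(keyword)}"

# Spaces become '+', and '&' becomes '%26', so the whole term
# survives as a single query-string value.
url = search_url("night tapes & friends")
```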
Apple-Music-Scraper/main.py (Outdated)

    elif "track-list " in i['id']:
        track_list_index = index
    elif "video" in i['id']:
        video_index = index
    elif "more" in i['id']:
        more_index = index
    elif "you-might-also-like" in i['id']:
        similar_index = index
    elif "track-list-section" in i['id']:
**issue (bug_risk):** The `"track-list "` check includes a trailing space, which likely prevents matching the intended section.
Because of that trailing space, `"track-list " in i['id']` will likely never match, so `track_list_index` may never be set and the later `sections[track_list_index]` access will always fall into the `except` path. Consider matching `"track-list"` instead, and preferably use a stricter check like equality or `startswith` rather than a substring search to make this more robust.
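The mismatch is easy to demonstrate with a hypothetical section id shaped like the ones the scraper branches on:

```python
def classify_section(section_id):
    """Bucket a section id using prefix checks rather than substring search."""
    if section_id.startswith("track-list"):   # matches with or without a suffix
        return "track_list"
    if section_id.startswith("video"):
        return "video"
    return "other"

# The buggy check: the trailing space means it never matches this id.
buggy_hit = "track-list " in "track-list-section-1"
fixed = classify_section("track-list-section-1")
```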
def get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534"):
    """
    Fetch all singles & EP URLs from an Apple Music artist page.

    Parameters
    ----------
    url : str, optional
        Base artist page URL. Defaults to the sample King Princess artist link.

    Returns
**nitpick:** Simple string concatenation for `url` can produce malformed URLs if the base has a trailing slash.
If the caller passes an artist URL with a trailing slash (e.g. `.../1349968534/`), this becomes `.../1349968534//see-all?section=singles`. To avoid malformed URLs, either strip any trailing slash before appending the path segment or use `urllib.parse.urljoin`.
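A small sketch of the normalization (the helper name is illustrative):

```python
def singles_endpoint(artist_url):
    """Append the see-all path with exactly one slash between segments."""
    return f"{artist_url.rstrip('/')}/see-all?section=singles"

base = "https://music.apple.com/us/artist/king-princess/1349968534"
# With or without a trailing slash on the base, the result is identical.
a = singles_endpoint(base)
b = singles_endpoint(base + "/")
```

Note that `urllib.parse.urljoin(base, "see-all?section=singles")` would replace the final path segment unless the base already ends with a slash, which is why the `rstrip` form is used here.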
Description
This PR adds a brand-new Apple Music Web Scraper capable of scraping songs, albums, playlists, artists, videos, rooms, and full search results.
It parses Apple Music’s internal `serialized-server-data` JSON structure and converts it into clean Python output. This feature did NOT exist in the repository before and expands the Scrapping/Social Media category significantly.
What’s Included:
- `apple_music_scraper.py` – Main scraper logic
- `utils.py` – Helper methods (cover resolver, URL converter, etc.)
- `README.md` – Full documentation + examples
- `requirements.txt` – Clean dependency list (`requests`, `beautifulsoup4`)

Fixes #none
No existing issue was referenced; this is a brand-new standalone feature.
Type of change
Checklist:
README.mdTemplate for README.mdrequirements.txtfile if needed.Project Metadata
Category:
Title: Apple Music Web Scraper
Folder: `Apple-Music-Scraper`
Requirements: `requirements.txt`
Script: `apple_music_scraper.py`
Arguments: none
Contributor: abssdghi
Description:
A powerful and fully-featured Apple Music scraper that extracts songs, albums, playlists, videos, artist pages, and full search results using Apple Music’s internal structured JSON data.
Summary by Sourcery
Add a new Apple Music web scraper module that extracts structured metadata from various Apple Music web pages using their embedded serialized JSON data.
New Features: