## The following code iterates through a seller's inventory and matches it against your wantlist in a more thorough way:
1. Fetch the master IDs from your wantlist via the API
2. Fetch the seller's inventory (you specify the seller)
3. Fetch all releases (versions) of every master release belonging to the matched artists
4. Match wantlist and seller inventory by checking every single release version
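
The four steps above boil down to set operations. The following is a minimal sketch on made-up data, not the notebook's actual API calls: `versions_by_master` is a hypothetical lookup standing in for the versions fetched from Discogs, and all IDs are invented.

```python
# Toy data standing in for real API responses (all values are made up).
wantlist_master_ids = {100, 200}

# Seller listings as (artist, release_id, listing_id) tuples.
seller_listings = {
    ("Artist A", 11, 9001),
    ("Artist B", 25, 9002),
    ("Artist C", 31, 9003),
}

# Hypothetical lookup: master ID -> release IDs of all its versions.
versions_by_master = {100: {10, 11, 12}, 200: {20, 21}}

# Union of every version of every wanted master.
wanted_release_ids = set().union(
    *(versions_by_master[m] for m in wantlist_master_ids)
)

# A listing matches when its release is any version of a wanted master.
matches = sorted(
    (release_id, listing_id)
    for _, release_id, listing_id in seller_listings
    if release_id in wanted_release_ids
)
print(matches)  # [(11, 9001)]
```

Matching on every version rather than on a single release ID is what lets a different pressing of a wanted record still count as a hit.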

## How to execute the code:
1. Clone this repository
2. Log in to discogs.com and get your personal API token
3. In the Jupyter notebook, add the API token and the seller's username (the name of the shop you want to fetch data from) to the related variables in the YOUR INPUT section below
4. Execute the notebook and wait for your results!

### Limitations: Discogs rate-limits API calls (about 60 authenticated requests per minute), so big sellers might take a few minutes to load
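
One simple way to stay under a rate limit is to space requests evenly. The sketch below is an illustration only and is not part of the notebook: `MinIntervalThrottle` is a hypothetical helper, and the limit of 60 calls per minute is an assumption that mirrors the default value used in the cells further down.

```python
import threading
import time


class MinIntervalThrottle:
    """Hypothetical helper: space calls so at most `calls_per_minute` happen per minute."""

    def __init__(self, calls_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / calls_per_minute
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_call = None
        self._lock = threading.Lock()  # worker threads take turns

    def wait(self):
        """Block until at least `min_interval` seconds have passed since the last call."""
        with self._lock:
            now = self._clock()
            if self._last_call is not None:
                remaining = self.min_interval - (now - self._last_call)
                if remaining > 0:
                    self._sleep(remaining)
                    now = self._clock()
            self._last_call = now


# Assumed limit of 60 calls per minute, i.e. one call per second.
throttle = MinIntervalThrottle(calls_per_minute=60)
```

Calling `throttle.wait()` right before each `requests.get(...)` would keep the request rate at or below the assumed limit, at the cost of a fixed pace even when plenty of quota is left.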

## YOUR INPUT

In [8]:
# Replace with your Discogs API token
my_user_token = 'YOUR_API_TOKEN'

# Replace with the actual seller's username
seller_username = "YOUR_SELLER_USERNAME"

## Step 0: User Initialization + Seller Username

In [9]:
!pip install python3-discogs-client
import discogs_client
from concurrent.futures import ThreadPoolExecutor, as_completed
from collections import defaultdict
import requests
import time
import concurrent.futures

In [10]:
d = discogs_client.Client('discogs application project', user_token=my_user_token)

me = d.identity()
wantlist = me.wantlist

# Set headers with authentication
headers = {
    'Authorization': f'Discogs token={my_user_token}',
    'User-Agent': 'Discogs-Inventory-App'
}


## Step 1: Fetching MasterID+Artist from Wantlist

In [11]:
# Function to process a release from the wantlist
def process_release(item):
    release = item.release  # Get the release object
    master_id = None
    artists = set()
    
    # Check and collect master ID if it exists
    if release.master:
        master_id = release.master.id
    
    # Collect artist names
    artist_names = []
    for artist in release.artists:
        # Access the name and join fields directly
        if hasattr(artist, 'join') and artist.join:
            artist_names.append(f"{artist.name} {artist.join}")
        else:
            artist_names.append(artist.name)
    
    # Combine artist names into a single string and add to the set
    artists.add(" ".join(artist_names))
    
    return master_id, artists

# Collecting unique master IDs and artists concurrently
def fetch_wantlist_data_concurrently(wantlist):
    wantlist_master_ids = set()
    wantlist_artists = set()

    # Use ThreadPoolExecutor to process releases concurrently
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        # Submit all releases to the thread pool
        future_to_release = {executor.submit(process_release, item): item for item in wantlist}
        
        # Process the results as they complete
        for future in concurrent.futures.as_completed(future_to_release):
            try:
                master_id, artists = future.result()

                # Add master ID to the set if it exists
                if master_id:
                    wantlist_master_ids.add(master_id)

                # Add all artist names to the set
                wantlist_artists.update(artists)

            except Exception as exc:
                print(f"Error processing release: {exc}")

    return wantlist_master_ids, wantlist_artists

wantlist_master_ids, wantlist_artists = fetch_wantlist_data_concurrently(me.wantlist)

print("# of unique Master IDs in your wantlist:", len(wantlist_master_ids))
print("# of unique Artists in your wantlist:", len(wantlist_artists))

# of unique Master IDs in your wantlist: 159
# of unique Artists in your wantlist: 140
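
To make the artist-name logic of `process_release` easier to follow, here is a small offline sketch. It uses `SimpleNamespace` objects as stand-ins for the client's artist objects, and `full_artist_name` is a hypothetical helper name, not something defined in the notebook.

```python
from types import SimpleNamespace


def full_artist_name(artists):
    """Join artist names using each artist's optional `join` field."""
    parts = []
    for artist in artists:
        join = getattr(artist, "join", None)
        parts.append(f"{artist.name} {join}" if join else artist.name)
    return " ".join(parts)


# Stand-ins for release.artists (made-up names).
duo = [
    SimpleNamespace(name="Artist A", join="&"),
    SimpleNamespace(name="Artist B", join=""),
]
solo = [SimpleNamespace(name="Artist C")]

print(full_artist_name(duo))   # Artist A & Artist B
print(full_artist_name(solo))  # Artist C
```

Because the later steps compare these strings for exact equality with the seller's `artist` field, any difference in spacing or join characters would presumably prevent a match.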


## Step 2: Fetching Release+Artist from Seller

In [12]:
time.sleep(3)

items_per_page = 100  # Max number of items per page to be fetched

def fetch_inventory_page(page, seller_username, items_per_page, headers):
    """Fetch a single page of inventory from the Discogs API."""
    url = f"https://api.discogs.com/users/{seller_username}/inventory?per_page={items_per_page}&page={page}&sort=artist&sort_order=asc"
    response = requests.get(url, headers=headers)
    
    if response.status_code == 200:
        remaining_requests = int(response.headers.get('X-Discogs-Ratelimit-Remaining', 60))
        reset_time = int(response.headers.get('X-Discogs-Ratelimit-Reset', 60))
        return response.json(), remaining_requests, reset_time
    else:
        print(f"Error fetching page {page}: {response.status_code}, {response.text}")
        return None, 0, 60  # Default values in case of error

def adaptive_sleep(remaining_requests, reset_time):
    """Calculate adaptive sleep time based on remaining requests and reset time."""
    if remaining_requests > 0:
        return reset_time / remaining_requests
    else:
        return reset_time

# Set to store unique artist names and release IDs
unique_artists = set()
unique_releases = set()
artists_and_releases_and_items = set()

# Progress tracking
total_checked_items = 0  # To keep track of how many items we've processed


# First, get the first page to find the total number of items
inventory_data, remaining_requests, reset_time = fetch_inventory_page(1, seller_username, items_per_page, headers)
if inventory_data:
    total_items = inventory_data['pagination']['items']
    total_pages = inventory_data['pagination']['pages']
    print(f"Total items: {total_items}, Total pages: {total_pages}")
    
    # Add artists and releases from the first page
    for item in inventory_data['listings']:
        artist_name = item["release"].get("artist")
        release_id = item["release"].get("id")
        item_id = item["id"]
        if artist_name and release_id:
            unique_artists.add(artist_name)
            unique_releases.add(release_id)
            artists_and_releases_and_items.add((artist_name, release_id, item_id))
        
        total_checked_items += 1  # Update total checked items after processing

    # Print the progress
    print(f"Checked {total_checked_items} items out of {total_items}")

    # Fetch the rest of the pages with adaptive sleep and limiting concurrency
    pages_to_process = total_pages - 1  # Already fetched page 1
    current_page = 2

    with ThreadPoolExecutor(max_workers=3) as executor:
        while pages_to_process > 0:
            # Determine the number of pages to fetch in this batch.
            # Always fetch at least one page so the loop cannot stall when remaining_requests is 0.
            pages_in_batch = max(1, min(remaining_requests, pages_to_process))
            
            futures = [executor.submit(fetch_inventory_page, page, seller_username, items_per_page, headers)
                       for page in range(current_page, current_page + pages_in_batch)]
            
            for future in as_completed(futures):
                inventory_data, remaining_requests, reset_time = future.result()
                if inventory_data and 'listings' in inventory_data:
                    for item in inventory_data['listings']:
                        artist_name = item["release"].get("artist")
                        release_id = item["release"].get("id")
                        item_id = item["id"]
                        if artist_name and release_id:
                            unique_artists.add(artist_name)
                            unique_releases.add(release_id)
                            artists_and_releases_and_items.add((artist_name, release_id, item_id))
                        
                        total_checked_items += 1  # Update total checked items after processing
                        
                    # Print the progress after processing each page
                    print(f"Checked {total_checked_items} items out of {total_items}")

            # Move to the next set of pages
            current_page += pages_in_batch
            pages_to_process -= pages_in_batch

            # Apply adaptive sleep or wait for rate limit reset
            if remaining_requests == 0:
                print(f"Rate limit reached, waiting for {reset_time} seconds...")
                time.sleep(reset_time)
            else:
                sleep_time = adaptive_sleep(remaining_requests, reset_time)
                print(f"Sleeping for {sleep_time:.2f} seconds to avoid hitting rate limit...")
                time.sleep(sleep_time)

#print(f"\nUnique artists found in seller's inventory: {sorted(unique_artists)}")
print(f"Total unique artists: {len(unique_artists)}")
print(f"Total unique releases: {len(unique_releases)}")

Total items: 373, Total pages: 4
Checked 100 items out of 373
Checked 200 items out of 373
Checked 273 items out of 373
Checked 373 items out of 373
Sleeping for 1.28 seconds to avoid hitting rate limit...
Total unique artists: 324
Total unique releases: 366
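
The cell above parses the first page and the remaining pages with the same loop body. As a sketch of how that parsing could be factored out, the hypothetical helper `extract_listings` below works on a made-up page dictionary that uses only the fields the notebook reads (`listings`, `id`, `release.artist`, `release.id`).

```python
def extract_listings(page_data):
    """Return a set of (artist, release_id, listing_id) tuples from one inventory page."""
    rows = set()
    for item in (page_data or {}).get("listings", []):
        release = item.get("release", {})
        artist_name = release.get("artist")
        release_id = release.get("id")
        # Skip listings that lack an artist or a release ID, as the notebook does.
        if artist_name and release_id:
            rows.add((artist_name, release_id, item["id"]))
    return rows


# Made-up page with two listings of the same release and one incomplete entry.
sample_page = {
    "listings": [
        {"id": 9001, "release": {"artist": "Artist A", "id": 11}},
        {"id": 9002, "release": {"artist": "Artist A", "id": 11}},
        {"id": 9003, "release": {"id": 12}},  # no artist, so it is skipped
    ]
}

rows = extract_listings(sample_page)
print(sorted(rows))  # [('Artist A', 11, 9001), ('Artist A', 11, 9002)]
```

Keeping the listing ID in the tuple is what allows two copies of the same release to show up as separate offers later on.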


## Step 3: Fetching MasterID of ReleaseIDs of each matched Artist

In [13]:
time.sleep(3)
# Find artists that appear both in the wantlist and in the seller's inventory (artists_and_releases_and_items)
matched_artists = {artist for artist, _, _ in artists_and_releases_and_items if artist in wantlist_artists}

# Initialize a dictionary to store each matched artist's release IDs and master IDs
artist_to_release_master_ids = defaultdict(lambda: {"release_ids": set(), "master_ids": set()})

# Step 3: Loop through the wantlist and check for matching artists
for item in wantlist:
    release = item.release  # Get the release object
    
    # Check if any artist (with union logic) in the release matches the matched artists
    artist_names = []
    for artist in release.artists:
        # Build the full artist name with the "join" field
        if hasattr(artist, 'join') and artist.join:
            artist_names.append(f"{artist.name} {artist.join}")
        else:
            artist_names.append(artist.name)

    # Create a single string of all artist names in the release
    full_artist_name = " ".join(artist_names)

    # Check if the full artist name matches any in matched_artists
    if full_artist_name in matched_artists:
        # Add the release ID
        artist_to_release_master_ids[full_artist_name]["release_ids"].add(release.id)

        # Add the master ID if it exists
        if release.master:
            artist_to_release_master_ids[full_artist_name]["master_ids"].add(release.master.id)

# Print the results
print("Matched artists with release and master IDs:")
for artist, ids in artist_to_release_master_ids.items():
    release_ids_list = ", ".join(map(str, ids["release_ids"]))
    master_ids_list = ", ".join(map(str, ids["master_ids"]))
    print(f"Artist: {artist}, Release IDs: {release_ids_list}, Master IDs: {master_ids_list}")

# Store distinct master IDs across all matched artists
distinct_master_ids = set()
for ids in artist_to_release_master_ids.values():
    distinct_master_ids.update(ids["master_ids"])

# Output distinct master IDs
print("\nDistinct Master IDs:", sorted(distinct_master_ids))
print("Total distinct Master IDs:", len(distinct_master_ids))

Matched artists with release and master IDs:
Artist: Steely Dan, Release IDs: 9405826, 11462532, 2152836, 24158342, 1804939, 5805967, 8144785, 5893905, 14622867, 15422995, 16251417, 9670809, 4982428, 1710246, 13905960, 5657387, 6281645, 12955952, 15764402, 10969782, 12184375, 6956855, 16785210, 27481410, 9520450, 14482375, 1727687, 17719243, 8813003, 3228622, 4969680, 6538322, 14140630, 8887259, 2469981, 4319198, 21255265, 23386859, 3237233, 26262644, 5470198, 5098615, 1327484, 2470015, Master IDs: 17100
Artist: Various, Release IDs: 26877272, 23640356, 7524613, 6213832, 9062153, 11000744, 478443, 13845486, 12238166, 14229974, 9409848, 3286399, Master IDs: 3188130, 1279206, 1688587, 206358, 1883259, 1736701
Artist: Rank 1, Release IDs: 35648, 2824125, Master IDs: 73289
Artist: Unknown Artist, Release IDs: 15837425, 15815070, 11306517, 13751577, Master IDs: 
Artist: Sade, Release IDs: 1548070, Master IDs: 43936
Artist: Oxia, Release IDs: 6732, Master IDs: 1008401
Artist: COEO, Release I

## Step 4: Fetch ReleaseID for each matched MasterID + FINAL MATCH

In [14]:
time.sleep(5)
def fetch_master_versions(master_id, headers, per_page=100):
    """Fetch all release versions for a given master_id using pagination."""
    page = 1
    more_pages = True
    master_release_ids = set()  # Local set to collect each master’s release IDs

    while more_pages:
        # Construct the URL with pagination
        url = f"https://api.discogs.com/masters/{master_id}/versions?page={page}&per_page={per_page}&format=Vinyl"
        
        # Make the API request
        response = requests.get(url, headers=headers)
        
        if response.status_code == 200:
            versions_data = response.json()
            
            # Check if there are no more items (empty page)
            if not versions_data['versions']:
                more_pages = False
            else:
                # Iterate through the versions in the response
                for version in versions_data['versions']:
                    release_id = version.get('id')
                    if release_id:
                        master_release_ids.add(release_id)
                
                # Move to the next page
                page += 1

            # Implement adaptive sleep based on rate limit headers
            remaining_requests = int(response.headers.get('X-Discogs-Ratelimit-Remaining', 60))
            reset_time = int(response.headers.get('X-Discogs-Ratelimit-Reset', 60))  # Default reset time to 60 seconds
            if remaining_requests == 0:
                #print(f"Rate limit reached for master {master_id}. Waiting for {reset_time} seconds.")
                time.sleep(reset_time)
            else:
                sleep_time = max(1, reset_time / remaining_requests)  # At least 1-second sleep
                #print(f"Sleeping for {sleep_time:.2f} seconds between requests.")
                time.sleep(sleep_time)

        else:
            print(f"Error fetching page {page} for master {master_id}: {response.status_code}, {response.text}")
            more_pages = False  # Stop the loop if there's an error

    return master_release_ids

def fetch_all_master_versions_concurrently(master_ids, headers):
    """Fetch release IDs for all master IDs concurrently with rate limit control."""
    all_masters_release_ids = set()
    
    # Use ThreadPoolExecutor to fetch versions for each master concurrently
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        future_to_master_id = {executor.submit(fetch_master_versions, master_id, headers): master_id for master_id in master_ids}
        
        # Process the results as they complete
        for future in concurrent.futures.as_completed(future_to_master_id):
            master_id = future_to_master_id[future]
            try:
                master_release_ids = future.result()
                all_masters_release_ids.update(master_release_ids)
                print(f"Fetched {len(master_release_ids)} release IDs for Master ID {master_id}")
            except Exception as exc:
                print(f"Error fetching releases for Master ID {master_id}: {exc}")

    return all_masters_release_ids

# Fetch all master release IDs concurrently with rate limit control
all_masters_release_ids = fetch_all_master_versions_concurrently(distinct_master_ids, headers)


matching_release_data = []  # Tuples of (release_id, seller_id); seller_id is the seller's marketplace listing (item) ID

for artist, release_id, seller_id in artists_and_releases_and_items:
    if release_id in all_masters_release_ids:
        matching_release_data.append((release_id, seller_id))

# Print matching release IDs and seller IDs
print("Matching Releases with Seller IDs:")
for release_id, seller_id in matching_release_data:
    print(f"Release ID: {release_id}, Seller ID: {seller_id}")
    print(f"https://www.discogs.com/sell/item/{seller_id}")

# Output the count of matches
print(f"Total matching releases: {len(matching_release_data)}")

Fetched 2 release IDs for Master ID 3188130
Fetched 1 release IDs for Master ID 1279206
Fetched 83 release IDs for Master ID 43936
Fetched 1 release IDs for Master ID 3104646
Fetched 2 release IDs for Master ID 956552
Fetched 15 release IDs for Master ID 73289
Fetched 1 release IDs for Master ID 1688587
Fetched 46 release IDs for Master ID 17100
Fetched 18 release IDs for Master ID 72875
Fetched 1 release IDs for Master ID 1899759
Fetched 2 release IDs for Master ID 1008401
Fetched 2 release IDs for Master ID 1242491
Fetched 1 release IDs for Master ID 206358
Fetched 30 release IDs for Master ID 84982
Fetched 1 release IDs for Master ID 1534205
Fetched 0 release IDs for Master ID 1736701
Fetched 1 release IDs for Master ID 1883259
Matching Releases with Seller IDs:
Release ID: 14538, Seller ID: 3333581403
https://www.discogs.com/sell/item/3333581403
Release ID: 2222162, Seller ID: 3333769800
https://www.discogs.com/sell/item/3333769800
Total matching releases: 2
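
The final match can also be grouped per release, which may help when a seller lists several copies of the same version. This is a sketch on made-up data: `listing_urls_by_release` is a hypothetical helper, and only the URL pattern is taken from the cell above.

```python
from collections import defaultdict


def listing_urls_by_release(listings, wanted_release_ids):
    """Map each wanted release ID to the marketplace URLs of its listings."""
    urls = defaultdict(list)
    # Sort for a stable, reproducible order of URLs.
    for _, release_id, listing_id in sorted(listings):
        if release_id in wanted_release_ids:
            urls[release_id].append(
                f"https://www.discogs.com/sell/item/{listing_id}"
            )
    return dict(urls)


# Made-up seller listings as (artist, release_id, listing_id) tuples.
listings = {
    ("Artist A", 11, 9002),
    ("Artist A", 11, 9001),
    ("Artist B", 25, 9003),
}

result = listing_urls_by_release(listings, wanted_release_ids={11, 12})
for release_id, links in result.items():
    print(release_id, links)
```

With this toy input, only release 11 is reported, with both of its listing URLs.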
