get_track_audio_features() causes an HTTP error using a for-loop ( >100 tracks) | Interval loop causes an error, too #130

Open
KewKalustian opened this issue May 16, 2021 · 3 comments

KewKalustian commented May 16, 2021

Hi,

When I try to load audio features for more than 100 tracks (3115 in my case) using for-loops, I get the following errors:

For-loop 1)

# Extracting Spotify IDs from a given "data.frame" to retrieve the audio features.
ID <- unique(df$track_id)

#  Sequence according to the length of the ID vector
temp <- seq(length(ID))

# Interval of 100 IDs
chunk <- 100

# Overcoming the obstacle that the "get_track_audio_features" function can
# only retrieve audio features for 100 tracks at once.

for(i in ceiling(temp/chunk)){


Feats <- get_track_audio_features(ID[((i-1)*(chunk+1)) : min(temp,(i*chunk))], token) 

}

As reference: https://stackoverflow.com/questions/36104466/looping-through-a-list-in-chunks

Yields:

Error in get_track_audio_features(ID[((i - 1) * (chunk + 1)):min(temp, : length(ids) <= 100 is not TRUE

I also tried for(i in seq(1,length(temp),chunk)) in place of for(i in ceiling(temp/chunk)), and min(length(temp),(i*chunk)) in place of min(temp,(i*chunk)).
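The assertion failure in For-loop 1 can be traced to the index arithmetic. The sketch below uses a stand-in vector of 3115 integers rather than real track IDs. It shows that for i = 2 the index expression expands to 101:1, a descending run of 101 positions, because min(temp, ...) takes the minimum over the whole temp vector and therefore always returns 1. It then computes non-overlapping chunk bounds as one possible alternative; the names starts, ends and chunk_lengths are illustrative and do not come from the original post.

```r
# Stand-in for the ID vector: 3115 placeholder values (not real Spotify IDs)
ID <- seq_len(3115)
temp <- seq(length(ID))
chunk <- 100

# The index expression from For-loop 1, evaluated for i = 2
i <- 2
bad_index <- ((i - 1) * (chunk + 1)):min(temp, (i * chunk))
length(bad_index)  # 101 positions (101 down to 1), one more than the limit

# One way to get non-overlapping bounds of at most `chunk` positions each
starts <- seq(1, length(ID), by = chunk)
ends <- pmin(starts + chunk - 1, length(ID))

# Length of every chunk: 31 full chunks of 100 plus a final chunk of 15
chunk_lengths <- ends - starts + 1
```

With these bounds, the k-th request would use ID[starts[k]:ends[k]], which never exceeds 100 elements.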

The most intuitive loop causes an HTTP error:

For-loop 2)

for(i in temp){

Feats <- get_track_audio_features(ID[i], token)

}

Yields:

Request failed [429]. Retrying in 4 seconds...
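Status 429 is the HTTP code for "Too Many Requests". For-loop 2 issues one request per ID (3115 in total) and also overwrites Feats on every pass. A minimal sketch of batching instead: it assumes the IDs are split into groups of at most 100 and that the per-batch results are row-bound at the end. fetch_in_batches and fake_fetch are hypothetical names; fake_fetch stands in for the API call so that the sketch runs offline.

```r
# Hypothetical helper: call `fetch` once per batch of at most `batch_size` ids
# and stack the returned data frames into one.
fetch_in_batches <- function(ids, fetch, batch_size = 100) {
  # Group positions 1..n into consecutive batches: 1,1,...,2,2,...
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- lapply(batches, fetch)
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

# Stand-in for the API call (assumption: it returns one row per id)
n_calls <- 0
fake_fetch <- function(batch) {
  n_calls <<- n_calls + 1
  stopifnot(length(batch) <= 100)
  data.frame(id = batch, stringsAsFactors = FALSE)
}

ids <- sprintf("id%04d", 1:3115)  # placeholder ids, not real Spotify IDs
feats <- fetch_in_batches(ids, fake_fetch)
# 32 calls instead of 3115, one row per id
```

With spotifyr, the call would presumably look like fetch_in_batches(ID, function(b) get_track_audio_features(b, token)), mirroring the positional call used in this thread.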

So my question is this: What am I doing wrong and how can I fix this issue?

Thanks for any helpful advice!

@KewKalustian
Author

OK, just for the record: I solved those issues with a function that, inside a while-loop, samples 100 IDs (without replacement) from the ID vector and adds their audio features to a tibble. The sampled IDs are then dropped from the vector so that only IDs not yet sampled remain. If fewer than 100 IDs remain, all of them are taken. In the end, the function returns a tibble with the audio features and track IDs of all tracks.
It took only a few seconds to retrieve the audio features of over 3000 tracks, without any HTTP errors or Sys.sleep() calls. If you are interested in using that function, feel free to drop me a message; I am happy to share it.

@antaldaniel
Collaborator

Can you please post a reproducible example of what exactly causes the error, i.e., code that runs into the actual HTTP error?

@KewKalustian
Author

KewKalustian commented Jul 22, 2021

First of all, sorry for the delayed answer.

Here I am attaching my solution and the attempts that do not work, using actual IDs from a current study.

Once the respective Client ID and the Client Secret are entered, the following script should run on its own and should yield the errors in question. With this, I hope to comply with your request.

remove(list = ls(all = T)); gc(T,T,T)

################
### Packages ###
################
if (!require(pacman))
  install.packages("pacman", repos = "http://cran.us.r-project.org")

pacman::p_load("tidyverse", "magrittr", "spotifyr")


Charts_w_IDs <- read_csv("https://raw.githubusercontent.com/KewKalustian/Spotify_COVID-19_DACH/main/Datasets/Overall/DACH_tracks.csv")


# ############### #
# Works perfectly #
# ############### #

# Extracting the Spotify IDs from a given data.frame (e.g., "Charts_w_IDs")
# to retrieve the audio features.

id <- unique(Charts_w_IDs$track_id)

set.seed(1, sample.kind = "Rounding")

id_s <- sample(id, replace = F)

# Developer ID

Sys.setenv(SPOTIFY_CLIENT_ID = "PLEASE ENTER HERE THE CLIENT ID")

# Developer secret

Sys.setenv(SPOTIFY_CLIENT_SECRET = "PLEASE ENTER HERE THE CLIENT SECRET")

# Generating an access token to use Spotify’s API

token <- get_spotify_access_token(Sys.getenv("SPOTIFY_CLIENT_ID"), 
                                  Sys.getenv("SPOTIFY_CLIENT_SECRET"))

Feat_scraper <- function(x) {
  # suppressing warnings (this changes a global option for the session)
  base::options(warn =-1) 
  # assigning length of an ID vector to a proxy object   
  entire <- length(x)
  # setting seed for reproducibility
  set.seed(1, sample.kind = "Rounding")
  # assigning 100 sampled IDs to a vector to account for Spotify's limit
  v1a <- as.character(sample(x, 100, replace = F))
  # assigning a tibble with features of those 100 IDs. This tibble will be 
  # extended below.
  tib <- spotifyr::get_track_audio_features(v1a, token)
  # dropping IDs whose features are already in the tibble
  if (any(x %in% tib$id) == T) {x = x[which(!x %in% tib$id)]}
  # creating a while loop that runs as long as the tibble has no more rows
  # than the length of the entire object
  while (nrow(tib) <= entire) {
    # setting seed for reproducibility
    set.seed(42, sample.kind = "Rounding")
    # assigning 100 sampled IDs from the new IDs from above to a base vector 
    # according to Spotify's limit as long as the object IDs are greater
    # than 100. If the remaining IDs are less than 100, these remaining IDs 
    # will be sampled.
    v1b <- as.character(sample(x, ifelse(length(x) > 100, 100, length(x)),
                               replace = F)) 
    # extending the tibble from above to create a complete tibble with all 
    # retrieved audio features of all track IDs of the object in question
    tib %<>% full_join(spotifyr::get_track_audio_features(v1b,token),
                       by = c("danceability", "energy", "key", "loudness", 
                              "mode", "speechiness", "acousticness", 
                              "instrumentalness", "liveness", 
                              "valence", "tempo", "type", "id", "uri",
                              "track_href", "analysis_url", "duration_ms",
                              "time_signature"))
    # dropping IDs whose features are already in the tibble
    if (any(x %in% tib$id) == T) {x = x[which(!x %in% tib$id)]}
    # If the rows of the tibble are equal to the length of the entire object
    # in question…,
    if (nrow(tib) == entire) 
      #…break the loop.
      break
  }
  # outputting the entire tibble
  return(tib)
}

start <- Sys.time()
Feats <- Feat_scraper(id_s)
end <- Sys.time()

process <- end-start
print(process)


# #################### #
# Does not work at all #
# #################### #

temp <- seq_along(id_s)

# Overcoming the obstacle that the "get_track_audio_features"-function can
# only retrieve audio features for 100 tracks at once.

Feats_vers1 <-  function(i){get_track_audio_features(id_s[i], token) }

init_time <- Sys.time()

Feats_ERROR <- map_df(temp, Feats_vers1)

end_time <- Sys.time()

# Getting error: Request failed [429]. Retrying in 2 seconds...

# ################################################ #
# A for-loop does not work either; same HTTP error #
# ################################################ #


for (i in temp) {
  
Feats_ERROR2 <- get_track_audio_features(id_s[i], token)
  
}

# Getting error: Request failed [429]. Retrying in 5 seconds...

All in all, I could work around this issue quite well with the Feat_scraper function above. However, I am wondering whether get_track_audio_features is intended to be used this way when features for more than 100 IDs are to be retrieved. I am curious to know what you think of my solution.
