Commit: R and python files

paulbradshaw committed Jun 22, 2017
1 parent 787a5a0 commit 747ef85
Showing 5 changed files with 544 additions and 0 deletions.
135 changes: 135 additions & 0 deletions SpotifyNetworkAnalysis.Rmd
@@ -0,0 +1,135 @@
---
title: "analysing spotify data"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Grabbing the data

The data has already been scraped from the API using a Python script in Quickcode.io. The data can be queried using SQL, with results provided in JSON. The URL for the result of `select * from swdata` (select all data) is [https://premium.scraperwiki.com/1c6u0ci/n4n1dvnblw9ieyh/sql/?q=select%20*%0Afrom%20swdata%0A](https://premium.scraperwiki.com/1c6u0ci/n4n1dvnblw9ieyh/sql/?q=select%20*%0Afrom%20swdata%0A).

To grab that we first need the `jsonlite` package. Then we can use the `fromJSON` function to fetch it and store it in an object:

```{r}
library("jsonlite")
spotifydata <- fromJSON("https://premium.scraperwiki.com/1c6u0ci/n4n1dvnblw9ieyh/sql/?q=select%20*%0Afrom%20swdata%0A")
```
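
An optional check (not in the original) confirms what came back: here `fromJSON` returns a data frame with one row per festival artist.

```{r}
#Optional check: one row per artist, with columns such as name, genres and relatedartists
str(spotifydata)
```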

## Turn artist-as-row into relationship-as-row

To create a network analysis you need a row for every *relationship*. Because each artist has up to 20 related artists, that means up to 20 relationships (rows) for each artist, not just one.

So we need a new data frame. You create a data frame in R by combining vectors of the same length: one vector for the festival artists, and a matching vector for the related artists that correspond to them, as in the toy example below.
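
As a toy illustration (made-up names, not the festival data), two equal-length vectors combine into a data frame like this:

```{r}
#Toy example: each row pairs an artist with one related artist
data.frame(artist = c("A", "A", "B"), related = c("B", "C", "A"))
```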

Let's grab those related artists, then, and put them in a new object:

```{r}
relateds <- strsplit(spotifydata$relatedartists, ',')
```

This will split each cell on the commas (so each related artist is stored as a separate item), and then store each collection of artists (a vector) in a list.

```{r}
#Grab the first item in the list, which is a vector
relateds[[1]]
```

You can dig deeper into a list by adding another index:

```{r}
#Grab the fourth item in the first vector in the list
relateds[[1]][4]
```

We need our artists in a vector too:

```{r}
artists <- spotifydata$name
```


## Looping through the list

Now we need to loop through all those vectors and flatten them into a single vector, *mirrored* by a second vector of the artists each one is connected to.

Here's a basic loop:

```{r}
for (i in 1:20){
  print(i)
}
```

The loop we're going to write looks like this:

```{r}
#create an empty vector to store the artists
col1 <- c()
#create another to store the relateds
col2 <- c()
#run the outer loop 305 times, once per festival artist
for (artist in 1:305){
  #within each of those 305 times, run an inner loop 20 times, once per related artist
  for (related in 1:20){
    #if you want to see what's happening, uncomment this:
    #print(paste(artists[artist],":", relateds[[artist]][related]))
    #add the artist to the col1 vector. The same artist is added 20 times, once for each related artist grabbed
    col1 <- c(col1,artists[artist])
    #add the related to the col2 vector
    col2 <- c(col2,relateds[[artist]][related])
  }
}
```
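
At this point the two mirrored vectors could be combined into a name-based edge list, in the same way the ID-based version is saved further down (a sketch only; the original commit saves just the ID version):

```{r}
#Sketch: combine the two mirrored vectors into an edge list of artist names
relationships <- data.frame(col1, col2)
head(relationships)
```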

It turns out there are some examples of artists being referred to in different ways in different places (e.g. Klaxons vs The Klaxons), so it's better to use the artist IDs instead.

This code is the same as above, but uses ID codes instead of names. It also doesn't bother creating a separate list to store the split IDs: instead the strings are split within the loop. As a result it takes longer to run, so it is more efficient to create them separately as before (see the sketch after this block):

```{r}
col1b <- c()
col2b <- c()
for (artist in 1:305){
  for (related in 1:20){
    #print(paste(artists[artist],":", relateds[[artist]][related]))
    col1b <- c(col1b,spotifydata$artistid[artist])
    #note the strsplit() here: splitting inside the loop on every pass is what makes this slower
    col2b <- c(col2b,strsplit(spotifydata$relatedartistsid,",")[[artist]][related])
  }
}
```
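
Here is a sketch of that more efficient variant, splitting the ID strings once before the loop (the `relatedsid` name is illustrative, not from the original):

```{r}
#Split the related-artist IDs once, outside the loop
relatedsid <- strsplit(spotifydata$relatedartistsid, ",")
col1b <- c()
col2b <- c()
for (artist in 1:305){
  for (related in 1:20){
    col1b <- c(col1b, spotifydata$artistid[artist])
    #index into the pre-split list rather than splitting on every pass
    col2b <- c(col2b, relatedsid[[artist]][related])
  }
}
```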

Then to save:

```{r}
relationshipids <- data.frame(col1b,col2b)
write.csv(relationshipids,"relationshipids.csv")
```

## Creating a lookup table

We can adapt the process above to create a lookup table of all the ID codes too:

```{r}
relateds <- strsplit(spotifydata$relatedartists,",")
relatedids <- strsplit(spotifydata$relatedartistsid,",")
relatedpops <- strsplit(spotifydata$relatedartistspop,",")
colid <- c()
colname <- c()
colpop <- c()
for (artist in 1:305){
  for (related in 1:20){
    #print(paste(artists[artist],":", relateds[[artist]][related]))
    colid <- c(colid,relatedids[[artist]][related])
    colname <- c(colname,relateds[[artist]][related])
    colpop <- c(colpop,relatedpops[[artist]][related])
  }
}
# Now to save as a data frame
artistidlookup <- data.frame(colid,colname,colpop)
write.csv(artistidlookup,"artistidlookup.csv")
```
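
One optional extra step (not in the original): because the same related artist can appear under many festival artists, the lookup table will contain duplicate rows, and `unique()` collapses them.

```{r}
#Optional: collapse duplicate rows so each related artist appears only once
artistidlookup <- unique(artistidlookup)
```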



83 changes: 83 additions & 0 deletions analysingSpotifyGenre.Rmd
@@ -0,0 +1,83 @@
---
title: "analysing spotify data"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Grabbing the data

The data has already been scraped from the API using a Python script in Quickcode.io. The data can be queried using SQL, with results provided in JSON. The URL for the result of `select * from swdata` (select all data) is [https://premium.scraperwiki.com/1c6u0ci/n4n1dvnblw9ieyh/sql/?q=select%20*%0Afrom%20swdata%0A](https://premium.scraperwiki.com/1c6u0ci/n4n1dvnblw9ieyh/sql/?q=select%20*%0Afrom%20swdata%0A).

To grab that we first need the `jsonlite` package. Then we can use the `fromJSON` function to fetch it and store it in an object:

```{r}
library("jsonlite")
spotifydata <- fromJSON("https://premium.scraperwiki.com/1c6u0ci/n4n1dvnblw9ieyh/sql/?q=select%20*%0Afrom%20swdata%0A")
```

## Exporting genres as CSV

To analyse the genres, I'm following the steps outlined in a [blog post by John Victor Anderson](http://johnvictoranderson.org/?p=115).

First, we need to export the column of keywords:

```{r}
write(spotifydata$genres, 'genresastext.txt')
```

Now we re-import that data as a character object using `scan`:

```{r}
genres <- scan('genresastext.txt', what="char", sep=",")
# We convert all text to lower case to prevent case-sensitivity issues with counting
genres <- tolower(genres)
```

We now need to put this through a series of conversions before we can generate a table:

```{r}
#create a list object by splitting 'genres' on comma
genres.split <- strsplit(genres, ",")
#create a vector object from that
genresvec <- unlist(genres.split)
#create a table from the vector
genrestable <- table(genresvec)
```
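
An optional check (not in the original) shows the most common genres before exporting:

```{r}
#Optional check: the most common genres, largest counts first
head(sort(genrestable, decreasing = TRUE))
```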

That table is enough to create a CSV from:

```{r}
write.csv(genrestable, 'genrecount.csv')
```

## Repeating but with appearances factored in

The above process gives us the most popular genres counted by artist. But if we want to reflect the popularity of genres at festivals, we need to factor in the fact that some (more popular) artists have appeared more than once.

To do this we've taken the list of appearances in Excel and used VLOOKUP to match those with genres, correcting for misspellings and other inconsistencies along the way. The resulting CSV file is then re-imported for our analysis:

```{r}
spotifydata2 <- read.csv("spotifygenres_x_appearances.csv")
```
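
For reference, the VLOOKUP step could also be done in R with `merge`, which joins rows on a shared column. A toy sketch with made-up data (not the actual files or column names):

```{r}
#Toy sketch: merge() behaves like VLOOKUP, joining on the shared 'artist' column
appearances <- data.frame(artist = c("A", "A", "B"))
genrelookup <- data.frame(artist = c("A", "B"), genre = c("rock", "pop"))
merge(appearances, genrelookup, by = "artist")
```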

Then we follow the same steps as above, but with that as the basis:

```{r}
#For some reason the write() command generates a list of numbers this time, whereas write.csv() doesn't, so we use write.csv() and ignore the index column it creates
write.csv(spotifydata2$genres.on.Spotify, 'genresastext2.txt')
genres2 <- scan('genresastext2.txt', what="char", sep=",")
# We convert all text to lower case to prevent case-sensitivity issues with counting
genres2 <- tolower(genres2)
#create a list object by splitting 'genres' on comma
genres.split2 <- strsplit(genres2, ",")
#create a vector object from that
genresvec2 <- unlist(genres.split2)
#create a table from the vector
genrestable2 <- table(genresvec2)
write.csv(genrestable2, 'genrecount_x_appearances.csv')
```


13 changes: 13 additions & 0 deletions spotify.Rproj
@@ -0,0 +1,13 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX
69 changes: 69 additions & 0 deletions spotifyscraper.py
@@ -0,0 +1,69 @@
#!/usr/bin/env python
import urllib, json
import scraperwiki

#Set up the token - this only lasts for an hour after which you will need to generate a new one
token = 'BQA5m6oNalpO2oVIZ1ogAklMWzj_Kkk3oJBCn8fasnADDacSi7JmME830Cx3NCl5I7lSGClpBSUwkG3NblvPDw'

baseurl = 'https://api.spotify.com/v1/artists/'
#These artist IDs have been compiled through a combination of scraping and manual search
artists = ['711MCceyCBcFnzjGY4Q7Un','3QP0XPDwbvGivqDAaJ5f5G','4dpARuHxo51G3z768sgnrY','7Ey4PD4MYsKc5I2dolUwbH','3XHO7cRUPCLOr6jwp8vsx5','3FTxQTEzrX6tcJYSlsdUle','4kwxTgCKMipBKhSnEstNKj','4fxp616ALtFWnXfwxnjLzW','3kjuyTCjPG1WMFCiyc5IuB','7Ln80lUS6He07XvHI8qqHH','2evydP72Z45DouM4uMGsIE','2ziB7fzrXBoh1HUPS6sVFn','0nmQIMXWTXfhgOBdNzhGOs','1vCWHaC5f2uS3yhpwWbIA6','7gRhy3MIPHQo5CXYfWaw9I','244fcyNSuyhbRlMGfMbYrO','4YrKBkKSVeqDamzBPWVnSJ','7EQ0qTo7fWT7DPxmxtSYEc','6l77PmL5iuEEcYjGl8K6s7','56ZTgzPBDge0OvCGgMO3OY','03r4iKL2g2442PT9n2UKsx','6pmxr66tMAePxzOLfjGNcX','4I2BJf80C0skQpp1sQmA0h','79fyBJJSUvWw4263rXYDM0','5schNIzWdI9gJ1QRK8SBnc','6vWDO969PvNqNYHIOW5v0m','1km0R7wy712AzLkA1WjKET','1h8YIw9HLr6E8gdXVDRbVJ','7lzordPuZEXxwt9aoVZYmG','5IDs1CK15HegSAhGEbSYXo','5M52tdBnJaKSvOpJGz8mfZ','5keeQyPKYRxUCKDMECTXG3','6FBDaR13swtiWwGhX1WQsP','3MM8mtgFzaEJsqbjZBSsHJ','4tpUmLEVLCGFr93o8hFFIB','7MhMgCo0Bl0Kukl93PZbYS','3pTE9iaJTkWns3mxpNQlJV','4LEiUm1SRbFMgfqnQTwUbQ','58lV9VcRSjABbAbfWS6skp','2BWfZGPtsjRlRp7JTDqI45','3eqjTLE0HfPfh78zjh6TqT','0du5cEVh5yTK9QJze8zA0C','3Z02hBLubJxuFJfhacLSDc','5RNFFojXkPRmlJZIwXeKQC','1OmdWpAh1pucAuZPzJaxIJ','7CajNmpbOovFoOoasH2HaY','7JKYaxqsOWhaHjsG9AVJ7y','1anyVhU62p31KFi8MEzkbf','5fScAXreYFnuqwOgBsJgSd','3jNkaOXasoc7RsxdchvEVq','1GhPHrq36VKCY3ucVaZCfo','3Ebn7mKYzD0L3DaUB1gNJZ','4gzpq5DPGxSnKTe4SA8HAU','2Z7gV3uEh1ckIaBzTUCE6R','0vEsuISMWAKNctLlUAhSZC','7ohlPA8dRBtCf92zaZCaaB','2AV6XDIs32ofIJhkkDevjm','4P0dddbxPil35MNN9G2MEX','4tZwfgrHOc3mvqYlEYSvVi','14r9dR01KeBLFfylVSKCZQ','0O98jlCaPzvsoei6U5jfEL','77tT1kLj6mCWtFNqiOmP9H','0oSGxfWSnnOXhD2fKuz2Gy','20vuBdFblWUo2FCOvUzusB','7J2lZBANizgPNfUzux31PV','1Cs0zKBU1kc0i8ypK3B9ai','2CIMQHirSU0MQqyYHq0eOx','6H1RjVyNruCmrBEWRbD0VZ','762310PdDnwsDxAQxzQkfX','0gusqTJKxtU1UTmNRMHZcv','3TVXtAsR1Inumwj472S9r4','0lZoBs4Pzo7R89JM9lxwoT','0fgYKF9Avljex0L9Wt5b8Z','6eUKZXaKkcviH0Ku9w2n3V','0TJB3EE2efClsYIDQ8V2Jk','2BGRfQgtzikz1pzAD0kaEn','7dGJo4pcD2V6oG8kP0tJRR','1uQWmt1OhuHGRKmZ2ZcL6p','6GbCJZrI318Ybm8mY36Of5','5T4UKHhr4HGIC0VzdZQtAE','4Y7tXHSEejGu1vQ9bwDdXW','2kGBy2WHvF0VdZyqiVCkDT','0ZZr6Y49NZWRJc0uCwqpMR','4EVpmkEwrLYEg6jIsiPMIb','08GQAI4eElDnROBrJRGE0X','1moxjboGR7GNWYIMWsRjgG','6FQqZYVfTNQ1pCqfkwVFEa','7jy3rLJdDQY21OgRLCZ9sD','0XNa1vTidXlvJ2gHSsRi4A','3mZqziCJj4pq3P2VBpmK6p','4AbDWrmJPSOeIbT2Ou60ik','6S0GHTqz5sxK5f9HtLXn9q','6V6WCgi7waF55bJmylC4H5','5BKsn7SCN2XmbF7apdCpRS','3AA28KZvwAUcZuOKwyblJQ','2f9ZiYA2ic1r1voObUimdd','3W4xM5XYtUp4ifYYPVKVdk','7oPftvlwr6VrsViSDV7fJY','2Jv5eshHtLycR6R8KQCdc4','67tgMwUfnmqzYsNAtnP6YJ','3qm84nBOXUEQ2vnTfUTTFC','339DNkQkuhHKEcHw6oK8f0','2jK54ZlZhTF1TxygsVeR05','6IDifUtaIPK4yuAiq5W2iG','37uLId6Z5ZXCx19vuruvv5','3WaJSfKnzc65VDgmj2zU8B','6mdiAmATAx73kdxrNrnlao','5lkiCO9UQ8B23dZ1o0UV4m','4EzkuveR9pLvDVFNx6foYD','7KMqksf0UMdyA0UCf4R3ux','3LpLGlgRS1IKPPwElnpW35','3XxxEq6BREC57nCWXbQZ7o','6J7biCazzYhU3gM9j1wfid','3nFkdlSjzX9mRTtwJOzDYB','1OwarW4LEHnoep20ixRA0y','4gn6f5jaOO75s0oF7ozqGG','3pFCERyEiP5xeN2EsPXhjI','6eLbRJP12OhyvUv4ntto4e','6igfLpd8s6DBBAuwebRUuo','1uNFoZAHBGtllmzznpCI3s','31TPClRtHm23RisEBtV3X7','0LbLWjaweRbO4FDKYlbfNt','5K4W6rqBFWDnAN6FQUkS6x','11wRdbnoYqRddKBrpHt4Ue','7K4k5g1ie2qHIH42UMNO7n','53A0W3U0s8diEn9RhXQhVz','73a6pNH4YtLNgDbPQwXveo','5dKj3B0vI9B2sYxJx9Yfvz','2qk9voo8llSGYcZ6xrBzKx','2h93pZq0e7k5yf4dywlkpM','07XSN3sPlIlB2L2XNcTwJw','1ajKVVeguChWXvyDr7L8rv','3VNITwohbvU5Wuy5PC6dsI','0dmPX6ovclgOy8WWJaFEUU','23fqKkggKUBHNkbKtXEls4','2Lhs0asnFQiLuntn3s8p78','066X20Nz7iquqkkCW6Jxy6','72hqBMsw7x5jnfxxwkii8L','5gznATMVO85ZcLTkE9ULU7','0L9xkvBPcEp1nrhDrodxc5','13saZpZnCDWOI9D4IJhp1f','6XyY86QOPPrYVGvF9ch6wz','7KnaZr690xW0sCihF9Z8oP','60ht0hWRy1yjUDfNsLuHuP','3lcbKPLl0ci2mKRdcP5Etf','0QJIPDAEDILuo8AIq3pMuU','3Sz7ZnJQBIHsXLUSo0OQtM','738wLrAtLtCtFOLvQBXOXp','2uH0RyPcX7fnCcT90HFDQX','6wH6iStAh4KIaWfuhf0NYM','6FXMGgJwohJLUSr5nVlf9X','77oD8X9qLXZhpbCjv53l5n','2ye2Wgw4gimLv2eAKyk1NB','2N4isf5pypyuDVpBofqEN8','3OsRAKCvk37zwYcnzRf5XF','1yAwtBaoHLEDWAnWR87hBT','34UhPkLbtFKRq3nmfFgejG','3iTsJGG39nMg9YiolUgLMQ','3gd8FJtBJtkRxdfbTu19U2','12Chz98pHFMPJEknJQMWvI','7FBcuc1gsnv6Y1nwFtNRCb','5YjEVrNMrIRw2xGbjTN6Ti','20qISvAhX20dpIbOOzGK3q','6v8FB84lnmJs434UJf2Mrm','0yNLKJebCb8Aueb54LYya3','0pf1lcBxh6HiiHQAIzhTI5','4UXJsSlnKd7ltsrHebV79Q','0hCNtLu0JehylgoiP8L4Gh','7sjttK1WcZeyLPn3IsQ62L','0UTbeMH3r0QJB50wYA4VTE','2DaxqgrOhkeH0fpeiQq2f4','7wJ9NwdRWtN92NunmXuwBk','7x5rK9BClDQ8wmCkYAGsQp','4STHEaNw4mPZ2tzheohgXB','2CvCyf1gEVhI0mX6aFXmVI','7Lf3LOZp3U3u2f6cWMd3AH','1w5Kfo2jwwIPruYS2UWh56','2ycnb8Er79LoH2AsR5ldjh','7C4sUpWGlTy7IANjruj02I','5psTfOBPmc8UquYcnS22KE','6zvul52xwTWzilBZl6BUbT','7qlh1IM1XMeQXA9ukp59au','6liAMWkVf5LH7YR9yfFy1Y','3wury2nd8idV4GecUg5xze','0O0lrN34wrcuBenkqlEDZe','36E7oYfz3LLRto6l2WmDcD','1dfeR4HaWDbWqFHLkxsg1d','4pejUc4iciQfgdX6OKulQn','4Z8W4fKeB5YxbusRsdQVPb','2d0hyoQ5ynDBnkvAbJKORj','6wWVKhxIU2cEi0K81v7HvP','450iujbtN6XgiA9pv6fVZz','0L8ExT028jH3ddEcZwqJJ5','4KWTAlx2RvbpseOGMEmROg','1HGTHrRQkw0BtevSo1jucU','5pKCCKE2ajJHZ9KAiaK11H','1OwarW4LEHnoep20ixRA0y','2y8Jo9CKhJvtfeKOsYzRdT','3lHXm91pKLq9Sxi6CoRKWu','3fhOTtm0LBJ3Ojn4hIljLo','4WN5naL3ofxrVBgFpguzKo','3CQIn7N5CuRDP8wEI7FiDA','2qc41rNTtdLK0tV3mJn2Pm','3Y10boYzeuFCJ4Qgp53w6o','2wpJOPmf1TIOzrB9mzHifd','5GtMEZEeFFsuHY8ad4kOxv','6OVkHZQP8QoBYqr1ejCGDv','7ooOn6bokl4mGV4CEaUz6A','6UUrUCIZtQeOf8tC0WuzRy','6hN9F0iuULZYWXppob22Aj','4sD9znwiVFx9cgRPZ42aQ1','2p1fiYHYiXz9qi0JJyxBzN','5HlXA01kcjssYDT7EoqUJF','05fG473iIaoy82BF1aGhL8','5m8H6zSadhu1j9Yi04VLqD','7hJcb9fa4alzcOq3EaNPoG','3rIZMv9rysU7JkLzEaC5Jp','2sIx6SmAMw9IBySG3Uj0jf','6Jrj26oAY96EEC2lqC6fua','7bcbShaqKdcyjnmv4Ix8j6','4gIdjgLlvgEOz7MexDZzpM','21UJ7PRWb3Etgsu99f8yo8','2UBTfUoLI07iRqGeUrwhZh','7guDJrEfX3qb6FEbdPA5qi','6PHIK3kjWggLtVygsOtpqS','4MXUO7sVCaFgFjoTI5ox5c','7rZNSLWMjTbwdLNskFbzFf','0FOcXqJgJ1oq9XfzYTDZmZ','3X0tJzVYoWlfjLYI0Ridsw','5eAWCfyUhZtHHtBdNk56l1','3dBVyJ7JuOMt4GE9607Qin','5INjqkS1o8h1imAzPqGZBb','5JsdVATHNPE0XdMFMRoSuf','3mIj9lX2MWuHmhNCA7LSCW','3gdbcIdNypBsYNu3iiCjtN','5krkohEVJYw0qoB5VWwxaC','1yxSLGMDHlW21z4YXirZDS','7mnBLXK823vNxN3UWB7Gfz','40oYPr305MsT2lsiXr9fX9','17U2ImH5IyYMvjkCfPhMHT','7bu3H8JO7d0UbMoVzbo70s','5r1bdqzhgRoHC3YcCV6N5a','27pyLBNdWVJBkbykvbf3TW','16eRpMNXSQ15wuJoeqguaB','6iy8nrBbtL57i4eUttHTww','3jc496ljiyrS3ECrD7QiqL','1aX2dmV8XoHYCOQRxjPESG','4PsjwNjuhm45waas7wa1xJ','0C0XlULifJtAgn6ZNCW2eu','1TrwMxRrrlk0hZxJkw4jUF','4fSPtBgFPZzygkY6MehwQ7','0vW8z9pZMGCcRtGPGtyqiB','2cCUtGK9sDU2EoElnk0GNB','5LfGQac0EIXyAN8aUwmNAQ','7FPkZue0zzjHaOPJb4WCw3','2kreKea2n96dXjcyAU9j5N','5NGO30tJxFlKixkPSgXcFE','0GByy3DcfbQwDvXGCWmzv9','4k1ELeJKT1ISyDv8JivPpB','22bE4uQ6baNwSHPVcDxLCe','1u7kkVrr14iBvrpYnZILJR','40Yq4vzPs9VNUrIBG5Jr2i','3yY2gUcIsjMr8hjo51PoJ8','1lYT0A0LV5DUfxr6doRP3d','4GvOygVQquMaPm8oAc0vXi','0epOFNiUfyON9EYx7Tpr6V','4BntNFyiN3VGG4hhRRZt9d','0Ak6DLKHtpR6TEEnmcorKA','2cGwlqi3k18jFpUyTrsR84','6g0mn3tzAds6aVeUYRsryU','1Xyo4u8uXC1ZmMpatF05PJ','4F84IBURUo98rz4r61KF70','67ea9eGLXYMsO2eYQRui3w','0Ya43ZKWHTKkAbkoJJkwIB','2qV7axHq9Jk7QqFcB3f05A','5jVeqi3PNaTOajfvBa4uFn','2ILGhwWQ0X2HH7iJyH0LWW','4tX2TplrkIP4v05BNC903e','0wHrpPuQ3Qea6UXAc06ocM','2yEwvVSSSUkcLeSTNyHKh8','3bUwxJgNakzYKkqAVgZLlh','536BYVgOnRky0xjsPT96zl','51Blml2LZPmy7TTiAg47vQ','69MEO1AADKg1IZrq2XLzo5'
,'5BvJzeQpmsdsFp4HGUYUEx','44NX2ffIYHr6D4n7RaZF7A','162DCkd8aDKwvjBb74Gu8b','2QoU3awHVdcHS8LrZEKvSM','4zrFO6P7G6EZry0pfxMfKT','2U6gqwyl9F33YxawnFrZG7','5hAhrnb0Ch4ODwWu4tsbpi','77zlytAFjPFjUKda8TNIDY','09hVIj6vWgoCDtT03h8ZCa','0Xf8oDAJYd2D0k3NLI19OV','1G9G7WwrXka3Z1r7aIDjI7','1GhPHrq36VKCY3ucVaZCfo','1PXHzxRDiLnjqNrRn2Xbsa','2wIVse2owClT7go1WT98tk','3G3Gdm0ZRAOxLrbyjfhii5','3iOvXCl6edW5Um0fXEBRXy','3PhoLpVuITZKcymswpck5b','3Rsr4Z96O6U3lToOiV3zBh','3vbKDsSS70ZX9D2OcvbZmS','6Q192DXotxtaysaqNPy5yR','7MqnCTCAX6SsIYYdJCQj9B','7w29UYBi0qsHi5RTcv3lmA']
print 'there are ', len(artists), ' artists'
record = {}

for artist in artists:
    print artist
    #Form a URL by combining different parts
    fullurl = baseurl+str(artist)+'?access_token='+token
    print 'scraping', fullurl
    response = urllib.urlopen(fullurl)
    data = json.loads(response.read())
    print data['name']
    print data['genres']
    #create an empty string - we'll then add each genre to turn a list into a single comma separated string
    genres = ''
    for genre in data['genres']:
        genres = genres+','+genre
    record['genres'] = genres[1:]
    record['name'] = data['name']
    record['popularity'] = data['popularity']
    record['type'] = data['type']
    record['followers'] = data['followers']['total']
    record['fullurl'] = fullurl
    record['artistid'] = artist
    #Now grab the top tracks
    tracksurl = baseurl+artist+'/top-tracks?country=GB&access_token='+token
    print 'scraping tracks: ', tracksurl
    response = urllib.urlopen(tracksurl)
    data = json.loads(response.read())
    #for tracks we have the same problem as genres above, so we create 3 empty strings to store the name, id, and popularity
    tracks = ""
    trackpops = ''
    trackids = ''
    for track in data['tracks']:
        print track['name']
        tracks = tracks+','+track['name']
        trackpops = trackpops+','+str(track['popularity'])
        trackids = trackids+','+track['id']
    record['tracks'] = tracks[1:]
    record['trackpops'] = trackpops[1:]
    record['trackids'] = trackids[1:]
    #Now onto related artists
    relatedurl = baseurl+artist+'/related-artists?access_token='+token
    print 'scraping related artists: ', relatedurl
    response = urllib.urlopen(relatedurl)
    data = json.loads(response.read())
    relatedartists = ''
    relatedartistspop = ''
    relatedartistsid = ''
    print data['artists']
    #use a different loop variable here so we don't overwrite the outer 'artist'
    for relatedartist in data['artists']:
        relatedartists = relatedartists+','+relatedartist['name']
        relatedartistspop = relatedartistspop+','+str(relatedartist['popularity'])
        relatedartistsid = relatedartistsid+','+relatedartist['id']
    record['relatedartists'] = relatedartists[1:]
    record['relatedartistspop'] = relatedartistspop[1:]
    record['relatedartistsid'] = relatedartistsid[1:]
    #save to the Quickcode/ScraperWiki datastore, using artistid as the unique key
    scraperwiki.sqlite.save(['artistid'], record)

