Bandcamp scraper: extracts data from the Bandcamp website

borbiuk/band-spider

band-play logo

Band Spider

band-spider is a web scraper for Bandcamp. It stores information about albums, tracks, their tags, labels, and the accounts that purchased them.

🛠️ Installation

Clone the repository and install dependencies:

  git clone https://github.com/borbiuk/band-spider.git \
  && cd band-spider \
  && npm i

🚀 Run

📁 Run the app using a file as the initial data source

NOTE: The DB will be created automatically.

Fill the accounts.txt file with links to Bandcamp accounts and run:

node run file:accounts

Or fill the items.txt file with links to Bandcamp albums/tracks and run:

node run file:items
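
The exact layout of accounts.txt and items.txt is not spelled out here. As a minimal sketch, assuming one link per line, the TypeScript below shows how the contents of such a file could be cleaned up before scraping. The function name parseLinks and its rules (skip blank lines, ignore anything that is not an http(s) URL, drop duplicates) are illustrative assumptions, not band-spider's actual behavior, and the URLs are placeholders.

```typescript
// Hypothetical helper, not part of band-spider.
// Assumes the input file holds one link per line.
function parseLinks(text: string): string[] {
  const links: string[] = [];
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    // Skip blank lines and anything that does not look like an http(s) URL.
    if (!/^https?:\/\/\S+$/i.test(line)) continue;
    // Keep only the first occurrence of each link.
    if (links.indexOf(line) === -1) links.push(line);
  }
  return links;
}

// Placeholder content standing in for accounts.txt or items.txt.
const sample = [
  'https://bandcamp.com/example-user',
  '',
  'not a link',
  'https://bandcamp.com/example-user',
  'https://example-artist.bandcamp.com/album/example-album',
].join('\n');

// Logs the two distinct links, in their original order.
console.log(parseLinks(sample));
```

Filtering and de-duplicating up front means a stray blank line or a repeated link in the file would not trigger an extra page visit.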

💾 Run the app using the DB as the data source

To start processing saved accounts:

node run db:accounts

To start processing saved albums/tracks:

node run db:items

🕵️ Debug Mode

By default, the browser runs in headless mode. To run it with a GUI, add the debug: prefix to the command:

  • node run debug:file:accounts
  • node run debug:file:items
  • node run debug:db:accounts
  • node run debug:db:items
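
The commands above follow the pattern [debug:]<source>:<target>, where the source is file or db and the target is accounts or items. As a rough sketch of that grammar, the TypeScript below parses such an argument into a small options object. The names RunMode and parseRunMode are hypothetical and are not taken from the band-spider source; the real entry point may handle its arguments differently.

```typescript
// Hypothetical types and names, used only to illustrate the command grammar.
type Source = 'file' | 'db';
type Target = 'accounts' | 'items';

interface RunMode {
  headless: boolean; // false when the `debug:` prefix is present
  source: Source;
  target: Target;
}

function parseRunMode(arg: string): RunMode {
  const parts = arg.split(':');
  const debug = parts[0] === 'debug';
  // Drop the optional `debug` prefix before reading source and target.
  const rest = debug ? parts.slice(1) : parts;
  if (rest.length !== 2) {
    throw new Error('Expected [debug:]<file|db>:<accounts|items>, got: ' + arg);
  }
  const source = rest[0];
  const target = rest[1];
  if (source !== 'file' && source !== 'db') {
    throw new Error('Unknown data source in: ' + arg);
  }
  if (target !== 'accounts' && target !== 'items') {
    throw new Error('Unknown target in: ' + arg);
  }
  return { headless: !debug, source, target };
}

console.log(parseRunMode('file:accounts'));  // headless run from accounts.txt
console.log(parseRunMode('debug:db:items')); // GUI run from saved items
```

Rejecting unknown sources and targets early surfaces a mistyped command before any browser is launched.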

📀 How and what data we store

database schema

🐤 What data should be added to parsing

  1. Item image (a new image column in the items table)
  2. Wishlist relation (a new isWishlist column in the item-to-tag table)
  3. Account item index (a new index column in the item-to-account table)
  4. Labels (new labels and item-to-label tables)