This repository has been archived by the owner on Sep 7, 2022. It is now read-only.

notpeter/crunchbase-data


Crunchbase Data As CSV

This data was extracted from the December 4, 2015 Crunchbase Data Export.

This repository includes unofficial CSV exports derived from the individual worksheets of crunchbase_export.xlsx. I previously munged the data by hand in Excel, but have since moved the dirty work to Python: openpyxl reads the XLSX file and unicodecsv writes the CSVs.
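The read/write flow can be sketched roughly as follows. This is a hypothetical illustration, not the code in crunchbase-csv.py. It assumes each worksheet has already been loaded as a list of row tuples. With openpyxl, `load_workbook(path, read_only=True)` and `ws.iter_rows(values_only=True)` would supply those rows, and `ws.title` would supply the name. The sketch uses Python 3's built-in `csv` module in place of unicodecsv, which mattered mainly under Python 2. The helper name `export_sheets`, the analysis sheet name `"Analysis"`, and the output file naming are all assumptions.

```python
import csv
import os


def export_sheets(sheets, out_dir, skip=("Analysis",)):
    """Write one CSV file per worksheet and return the written paths.

    `sheets` maps a worksheet name to a list of row tuples. Hypothetical
    helper; the skipped sheet name "Analysis" is an assumption.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name, rows in sheets.items():
        if name in skip:  # e.g. the analysis page
            continue
        # Assumed naming scheme: "Funding Rounds" -> funding_rounds.csv
        path = os.path.join(out_dir, name.lower().replace(" ", "_") + ".csv")
        # newline="" lets the csv module control line endings itself.
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            # csv.writer writes None (openpyxl's empty cell) as "".
            writer.writerows(rows)
        written.append(path)
    return written
```

Passing sheet data as a plain mapping keeps the writer independent of openpyxl, so the same function works whether rows come from a workbook or from test fixtures.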

The Excel workbook is transformed as follows:

  • One CSV file per worksheet
  • Skip the analysis page and empty columns
  • Remove redundant reduced precision date columns (month, quarter, year)
  • Remove dates missing a year (year 1000 is just wrong)
  • Remove trailing blank rows
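The per-sheet clean-up steps above might look something like this sketch. It is not the actual implementation, and it rests on three assumptions. First, the reduced-precision date columns are identified by header names ending in `_month`, `_quarter`, or `_year`. Second, a date "missing a year" shows up as a date value whose year is 1000, and it is blanked rather than having its row dropped. Third, an "empty column" has a blank header and no data. The function names `is_blank` and `clean_sheet` are hypothetical.

```python
import datetime

# Assumed header suffixes for the redundant reduced-precision date columns.
REDUCED_PRECISION_SUFFIXES = ("_month", "_quarter", "_year")


def is_blank(value):
    """True for empty cells: None or whitespace-only strings."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def clean_sheet(rows):
    """Clean one worksheet given as row tuples, header row first.

    Hypothetical sketch of the README's steps, not crunchbase-csv.py itself.
    """
    rows = [list(r) for r in rows]
    # Remove trailing blank rows (blank rows mid-sheet are left alone).
    while rows and all(is_blank(v) for v in rows[-1]):
        rows.pop()
    if not rows:
        return []
    header, data = rows[0], rows[1:]

    keep = []
    for i, name in enumerate(header):
        column = [r[i] if i < len(r) else None for r in data]
        if is_blank(name) and all(is_blank(v) for v in column):
            continue  # skip an empty column
        if isinstance(name, str) and name.endswith(REDUCED_PRECISION_SUFFIXES):
            continue  # skip a redundant month/quarter/year column
        keep.append(i)

    cleaned = []
    for r in rows:
        out = []
        for i in keep:
            v = r[i] if i < len(r) else None
            # Assumed placeholder: year 1000 means the year is unknown.
            if isinstance(v, datetime.date) and v.year == 1000:
                v = None
            out.append(v)
        cleaned.append(out)
    return cleaned
```

Because `datetime.datetime` subclasses `datetime.date`, the single `isinstance` check covers both the date and datetime values that openpyxl may return.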

Usage

virtualenv .venv
source .venv/bin/activate
pip install -r requirements.txt
python crunchbase-csv.py crunchbase_export.xlsx

License

Use of this data is governed by the CrunchBase Terms of Service and Licensing Policy.

This data dump is provided for non-commercial use under the Creative Commons Attribution-NonCommercial (CC-BY-NC) license. Any commercial use requires a separate license from CrunchBase.

crunchbase-csv.py is Copyright (c) Peter Tripp and is made available under the terms of the MIT License.