Conda Package Download Data
This repository describes the conda package download data provided by Anaconda, Inc. It includes package download counts starting from Jan. 2017 for the following download sources:
- Anaconda Distribution: The default channels hosted on `repo.anaconda.com` (and historically on
- Select Anaconda.org channels: Currently this includes the `conda-forge` and `bioconda` channels.
The download data is provided as a record for every unique combination of:
- Download source: `anaconda` for Anaconda Distribution, `conda-forge` for the conda-forge channel on Anaconda.org, and `bioconda` for the bioconda channel on Anaconda.org.
- `time`: UTC time, binned by hour
- `pkg_name`: Package name (Ex:
- `pkg_version`: Package version (Ex:
- `pkg_platform`: One of
- `pkg_python`: Python version required by the package, if any (Ex:
- `counts`: Number of downloads for this combination of attributes
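To make the record layout concrete, the sketch below mirrors the listed attributes in a small Python dataclass and shows two common operations: mapping a timestamp to its hourly UTC bin, and summing `counts` per package across all other attributes. It is a minimal illustration, not part of the published tooling: `DownloadRecord`, `hour_bin`, and `downloads_by_package` are hypothetical names, the download-source attribute is left out, and naive timestamps are assumed to already be in UTC.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class DownloadRecord:
    """Hypothetical in-memory mirror of one row of the download data.

    The attribute identifying the download source is omitted here.
    """

    time: datetime  # hourly bin, in UTC
    pkg_name: str
    pkg_version: str
    pkg_platform: str
    pkg_python: Optional[str]  # None when no Python version is required
    counts: int


def hour_bin(ts: datetime) -> datetime:
    """Return the UTC hourly bin that a timestamp falls into."""
    if ts.tzinfo is None:
        # Assumption: naive timestamps are already expressed in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


def downloads_by_package(records: Iterable[DownloadRecord]) -> Counter:
    """Total `counts` per package, summed over versions, platforms, and hours."""
    totals: Counter = Counter()
    for rec in records:
        totals[rec.pkg_name] += rec.counts
    return totals


if __name__ == "__main__":
    # Toy values for illustration only; not taken from the real dataset.
    hour = hour_bin(datetime(2019, 12, 1, 14, 37, tzinfo=timezone.utc))
    rows = [
        DownloadRecord(hour, "pkg-a", "1.0", "plat-x", "3.7", 5),
        DownloadRecord(hour, "pkg-a", "1.1", "plat-y", "3.7", 2),
        DownloadRecord(hour, "pkg-b", "0.1", "plat-x", None, 4),
    ]
    print(downloads_by_package(rows).most_common())  # [('pkg-a', 7), ('pkg-b', 4)]
```

Because every record is already unique per combination of attributes, totals for any coarser grouping (per package, per version, per day) are plain sums of `counts`.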
The storage format is Parquet, one file per day, with SNAPPY compression. Files are hosted on S3, with the naming convention:
To simplify using the dataset, we have also created an Intake catalog file, which you can load either directly from the repository, if you have the required packages (including `intake` and `python-snappy`) installed:
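```python
import intake

# Open the catalog directly from the GitHub repository
cat = intake.open_catalog('https://raw.githubusercontent.com/ContinuumIO/anaconda-package-data/master/catalog/anaconda_package_data.yaml')
# Load December 2019 as a lazy Dask DataFrame
monthly = cat.anaconda_package_data_by_month(year=2019, month=12).to_dask()
```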
Or you can install the data package directly with conda, which will also fetch the required dependencies:
```shell
conda install -c intake anaconda-package-data
```
And then the data source will appear in the global catalog of your conda environment:
```python
import intake

# The installed data package is registered in Intake's global catalog, intake.cat
monthly = intake.cat.anaconda_package_data_by_month(year=2019, month=12).to_dask()
```
To minimize bandwidth usage, these catalogs are configured so that Intake caches data locally on your system on first use.
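Both catalogs expose `anaconda_package_data_by_month`, which loads one month at a time. To analyze a longer period, one option is to request each month and concatenate the results. The sketch below assumes every requested month is present in the catalog; `month_range` and `load_months` are hypothetical helper names, and combining the per-month frames with `dask.dataframe.concat` is our own choice rather than something the catalog does for you.

```python
from typing import Iterator, Tuple

YearMonth = Tuple[int, int]


def month_range(start: YearMonth, end: YearMonth) -> Iterator[YearMonth]:
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def load_months(cat, start: YearMonth, end: YearMonth):
    """Hypothetical helper: load a span of months as one lazy Dask DataFrame.

    `cat` is any catalog exposing `anaconda_package_data_by_month`, such as
    the one opened from the repository URL or `intake.cat`.
    """
    import dask.dataframe as dd  # imported lazily so month_range works without Dask

    frames = [
        cat.anaconda_package_data_by_month(year=y, month=m).to_dask()
        for y, m in month_range(start, end)
    ]
    return dd.concat(frames)
```

For example, with the conda package installed, `load_months(intake.cat, (2019, 10), (2019, 12))` would return October through December 2019 as a single Dask DataFrame. Since the catalogs cache data locally, each month is downloaded only the first time it is requested.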
There are some known gaps in the dataset, and Anaconda.org data doesn't appear in the dataset until April 2017. See `KNOWN_ISSUES.md` for more details.
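Because coverage starts at different dates for different sources, a month with no rows for a channel does not necessarily mean zero downloads. The sketch below encodes only the start dates stated in this README (January 2017 for the Anaconda Distribution, April 2017 for the Anaconda.org channels), keyed by the source labels `anaconda`, `conda-forge`, and `bioconda`. It does not model the other gaps described in `KNOWN_ISSUES.md`, and `FIRST_MONTH` and `has_coverage` are hypothetical names.

```python
from typing import Dict, Tuple

# Earliest (year, month) with data for each download source, per this README.
# Other known gaps (see KNOWN_ISSUES.md) are NOT represented here.
FIRST_MONTH: Dict[str, Tuple[int, int]] = {
    "anaconda": (2017, 1),
    "conda-forge": (2017, 4),
    "bioconda": (2017, 4),
}


def has_coverage(source: str, year: int, month: int) -> bool:
    """Return True if `source` is expected to have data for the given month."""
    if source not in FIRST_MONTH:
        raise ValueError(f"unknown download source: {source!r}")
    # Tuples compare lexicographically, so (year, month) ordering works directly.
    return (year, month) >= FIRST_MONTH[source]
```

When comparing trends across sources, you might drop any month for which `has_coverage` is False for one of the sources being compared, so that missing coverage is not read as a drop in downloads.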
This data will be updated approximately monthly. Note that we may revise historical data if processing issues are discovered, or to add data (such as new Anaconda.org channels). We will update the change log when new or revised data is posted.
This dataset is licensed under a Creative Commons Attribution 4.0 International License. We are offering this data to help the community understand the usage of conda packages, but with no warranty. If you use this data, please acknowledge Anaconda as the source and link back to this GitHub repository.
If you have questions or find problems in the data, please open an issue on this repository. Thanks!