# Anaconda Package Data Quickstart

This notebook gives you easy access to package download stats. See the [README](https://github.com/ContinuumIO/anaconda-package-data/blob/master/README.md#quickstart) for setup instructions.

## Settings

First, set the parameters for the data you want. The parameters and their allowed values are listed below. *year* and *month* must be integers; all other settings must be strings. If you get zero total downloads, check that your settings use values that actually exist in the data.

#### Required

*year*, *month* - the year (e.g., 2024) and month (e.g., 4, for April) of interest. The data will be given for that month, with a graph showing daily downloads within that month. Single-digit months must not be preceded by a zero (use 4, not 04).

#### Optional

If any of the settings below is not given, the data for all of its values will be included.

- *data_source* - the channel for which you want download stats, as a string. Must be one of: "conda-forge", "anaconda", "bioconda", "nvidia", "rapidsai-nightly", "rapidsai", "pyviz", "rdkit", "plotly", "pytorch".

- *pkg_name* - the name of the conda package for which you want download statistics. Examples: "pandas", "numpy", "scikit-learn".

- *pkg_version* - the version of interest. Must be the _full_ version as listed via `conda search`.

- *pkg_platform* - the target platform of interest. Examples: "linux-64", "linux-aarch64", "win-64".

- *pkg_python* - the Python version of interest. Only major and minor versions are handled (e.g., "3.11").

In [1]:
year=2024
month=5
data_source="anaconda"
pkg_name="numpy"
#pkg_version="2.2.2"
#pkg_platform="linux-64"
#pkg_python="3.11"
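
Because wrong setting values can silently produce zero downloads, it may help to check them before loading any data. The following is a minimal sketch based only on the rules listed above; `check_settings` and `ALLOWED_DATA_SOURCES` are hypothetical names introduced here and are not part of anaconda-package-data.

```python
# Allowed channels, copied from the data_source description above.
ALLOWED_DATA_SOURCES = {
    "conda-forge", "anaconda", "bioconda", "nvidia", "rapidsai-nightly",
    "rapidsai", "pyviz", "rdkit", "plotly", "pytorch",
}


def check_settings(year, month, data_source=None):
    """Return a list of problems found in the settings (empty if none).

    Hypothetical helper for illustration only.
    """
    problems = []

    def is_int(value):
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    for name, value in (("year", year), ("month", month)):
        if not is_int(value):
            problems.append(f"{name} must be an integer, got {value!r}")
    if is_int(month) and not 1 <= month <= 12:
        problems.append(f"month must be between 1 and 12, got {month}")
    if data_source is not None and data_source not in ALLOWED_DATA_SOURCES:
        problems.append(f"unknown data_source: {data_source!r}")
    return problems
```

Calling `check_settings(year, month, data_source)` right after the settings cell returns an empty list when nothing looks wrong.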

## Setup

In [7]:
import hvplot.pandas
import intake
# import numpy as np
import pandas as pd

# Get the data
# cat = intake.open_catalog('https://raw.githubusercontent.com/ContinuumIO/anaconda-package-data/master/catalog/anaconda_package_data.yaml')
cat = intake.open_catalog('catalog/anaconda_package_data.yaml')


## Download and process monthly data

In [3]:
df = cat.anaconda_package_data_by_month(year=year, month=month).read()

# Subset for data source and package
if not len(df):
    print("Data not available for given month. Note that data is only updated at the end of each month.")
else:
    numpy_df = df[(df["pkg_name"] == pkg_name) & (df["data_source"] == data_source)].reset_index(drop=True)
    # Reduce the timestamps to year-month strings, e.g. "2024-05"
    numpy_df['time'] = pd.to_datetime(numpy_df['time']).dt.strftime('%Y-%m')
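
The cell above filters only on *pkg_name* and *data_source*. One way the optional settings could be applied as well is sketched below, assuming each setting name matches a column in the data (as `pkg_name`, `data_source`, `pkg_platform`, and `pkg_python` do in this notebook). `filter_downloads` is a hypothetical helper, and the small `demo` frame is made up purely to show the behavior.

```python
import pandas as pd


def filter_downloads(df, **settings):
    """Keep rows matching every setting whose value is not None.

    Hypothetical helper: each keyword must be a column name in df.
    """
    mask = pd.Series(True, index=df.index, dtype=bool)
    for column, value in settings.items():
        if value is not None:  # None means "include all values"
            mask &= df[column] == value
    return df[mask].reset_index(drop=True)


# Tiny made-up frame, only to illustrate the filtering.
demo = pd.DataFrame({
    "pkg_name": ["numpy", "numpy", "pandas"],
    "pkg_platform": ["linux-64", "win-64", "linux-64"],
    "counts": [10, 5, 7],
})
subset = filter_downloads(demo, pkg_name="numpy", pkg_platform=None)
```

In the notebook this might be called as `filter_downloads(df, pkg_name=pkg_name, data_source=data_source, pkg_platform=pkg_platform)`, passing only the settings that have been defined.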

### Total downloads

In [4]:
numpy_df['counts'].sum()

1717601

### Plot monthly downloads by platform and Python version

In [5]:
# numpy_df.hvplot('pkg_version','counts')
numpy_df.hvplot.bar(
    x='pkg_platform',
    y='counts',
    by='pkg_python',
    stacked=True,
    height=600,
    line_color="grey",
    line_width=0.1,
    rot=90,
)

## Download and process yearly data

In [8]:
df_raw = cat.anaconda_package_data_by_year(year=year).read()

FileNotFoundError: anaconda-package-data/conda/hourly/2024/*/2024-*-*.parquet
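
If the yearly source is not available, as in the error above, one possible workaround is to assemble the year from the monthly data instead. The sketch below assumes the monthly source still works as it did earlier in this notebook. `read_year_from_months` and its `read_month` argument are hypothetical names introduced for illustration; `read_month` can be any callable that takes `(year, month)` and returns a DataFrame.

```python
import pandas as pd


def read_year_from_months(read_month, year):
    """Concatenate the available monthly frames for one year.

    Hypothetical helper: read_month(year, month) must return a DataFrame.
    """
    frames = []
    for month in range(1, 13):
        try:
            frame = read_month(year, month)
        except FileNotFoundError:
            continue  # no file for this month, skip it
        if len(frame):  # months without data come back empty
            frames.append(frame)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the catalog used above, the reader could be `lambda y, m: cat.anaconda_package_data_by_month(year=y, month=m).read()`, so that `df_raw = read_year_from_months(reader, year)` collects whichever months are available.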