# Package usage

This notebook counts, for each mainline and each variant, how many npm packages depend on it and how many projects (GitHub repositories) depend on it. It requires `variants.csv.gz`, `dependencies.csv.gz` and `repo_deps.csv.gz`, and writes the counts to `dependents.csv.gz`.

```python
import pandas
import matplotlib

%matplotlib inline
```

```python
df_variants = pandas.read_csv('../data/variants.csv.gz')
df_variants
```

|  | mainline | mainline_repo | mainline_repoid | variant | variant_repo | variant_repoid |
|---|---|---|---|---|---|---|
| 0 | wheat | creationix/wheat | 162291 | 11zwheat | sun11/wheat | 49882 |
| 1 | wheat | creationix/wheat | 162291 | barley | frodare/barley | 124697 |
| 2 | keypair | juliangruber/keypair | 110982 | akeypair | quartzjer/akeypair | 86500 |
| 3 | keypair | juliangruber/keypair | 110982 | jh-keypair | johnhaley81/keypair | 805497 |
| 4 | sasl-digest-md5 | jaredhanson/js-sasl-digest-md5 | 149511 | alt-sasl-digest-md5 | legastero/js-sasl-digest-md5 | 86665 |
| ... | ... | ... | ... | ... | ... | ... |
| 12808 | dot-values | bajankristof/dot-values | 34049409 | dot-values2 | bluelovers/dot-values | 41256794 |
| 12809 | kompression | tuananh/kompression | 30312975 | @nivinjoseph/kompression | nivinjoseph/kompression | 41256967 |
| 12810 | contentful-typescript-codegen | intercom/contentful-typescript-codegen | 39168489 | @zeusdeux/contentful-typescript-codegen | zeusdeux/contentful-typescript-codegen | 41257476 |
| 12811 | prometheus-gc-stats | SimenB/node-prometheus-gc-stats | 13589391 | prometheus-gc-stats2 | acifani/node-prometheus-gc-stats | 41257504 |


```python
df_dependencies = pandas.read_csv('../data/dependencies.csv.gz')
df_dependencies
```

|  | source | version | kind | target | constraint |
|---|---|---|---|---|---|
| 0 | 0815 | 0.1.0 | runtime | cli-color | >= 0.2.1 |
| 1 | 0815 | 0.1.0 | runtime | mu2 | >= 0.5.17 |
| 2 | 0815 | 0.1.1 | runtime | cli-color | >= 0.2.1 |
| 3 | 0815 | 0.1.1 | runtime | mu2 | >= 0.5.17 |
| 4 | 0815 | 0.1.2 | runtime | cli-color | >= 0.2.1 |
| ... | ... | ... | ... | ... | ... |
| 9785760 | webpack-bundle-size-limit-plugin | 0.0.6 | Development | eslint-config-google | ^0.14.0 |
| 9785761 | ynk | 0.0.1 | Development | benchmark | ^2.1.4 |
| 9785762 | ss_react_ts_ui | 1.0.2 | Development | source-map-loader | ^0.2.4 |
| 9785763 | ss_react_ts_ui | 1.0.3 | Development | source-map-loader | ^0.2.4 |


```python
df_repodeps = pandas.read_csv('../data/repo_deps.csv.gz')
df_repodeps
```

|  | host | repository | repoid | kind | target | constraint |
|---|---|---|---|---|---|---|
| 0 | GitHub | brianmhunt/knockout-modal | 1 | development | gulp | ^3.8.8 |
| 1 | GitHub | brianmhunt/knockout-modal | 1 | development | gulp-autoprefixer | ^1.0.0 |
| 2 | GitHub | brianmhunt/knockout-modal | 1 | development | gulp-bump | ^0.1.11 |
| 3 | GitHub | brianmhunt/knockout-modal | 1 | development | gulp-connect | ^2.0.6 |
| 4 | GitHub | brianmhunt/knockout-modal | 1 | development | gulp-filter | ^1.0.2 |
| ... | ... | ... | ... | ... | ... | ... |
| 10395553 | GitHub | tOke3i/alumniSPBU | 20842227 | development | gulp-cssnano | ^2.1.2 |
| 10395554 | GitHub | tOke3i/alumniSPBU | 20842227 | development | gulp-image | 2.7.2 |
| 10395555 | GitHub | tOke3i/alumniSPBU | 20842227 | development | gulp-minify | 0.0.14 |
| 10395556 | GitHub | tOke3i/alumniSPBU | 20842227 | development | gulp-plumber | 1.1.0 |


### Counting the dependents

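```python
# For each target package, count the distinct npm packages (left) and the
# distinct GitHub repositories (right) that depend on it.
dependents = pandas.merge(
    left=(
        df_dependencies
        [['source', 'target']]
        .drop_duplicates()            # a package lists a dependency in many versions
        .groupby('target', sort=False)
        .count()                      # distinct dependent packages per target
    ),
    right=(
        df_repodeps
        [['repoid', 'target']]
        .drop_duplicates()            # one row per (repository, target) pair
        .groupby('target', sort=False)
        .count()                      # distinct dependent repositories per target
    ),
    how='outer',                      # keep targets used only by packages or only by projects
    on='target',                      # 'target' is the index level on both sides
).rename(columns={
    'source': 'packages', 'repoid': 'projects',
}).fillna(0).astype(int)              # a target missing on one side has 0 dependents there
```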
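To make the counting logic concrete, here is a minimal sketch on tiny hand-made frames; the package names and repository ids are hypothetical. It applies the same `drop_duplicates` / `groupby` / `count` / outer `merge` steps as the cell above: a package that depends on a target in several of its versions is counted once, a repository that lists a target under two kinds is counted once, and a target that appears on only one side gets 0 on the other.

```python
import pandas

# Hypothetical npm dependency records: pkg-a depends on lib-x in two versions.
deps = pandas.DataFrame({
    'source': ['pkg-a', 'pkg-a', 'pkg-b'],
    'version': ['1.0.0', '1.1.0', '0.1.0'],
    'target': ['lib-x', 'lib-x', 'lib-x'],
})
# Hypothetical repository dependencies: repo 1 lists lib-y under two kinds.
repodeps = pandas.DataFrame({
    'repoid': [1, 1, 2],
    'kind': ['runtime', 'development', 'runtime'],
    'target': ['lib-y', 'lib-y', 'lib-x'],
})

def count_dependents(deps, repodeps):
    """Count distinct dependent packages and projects per target (sketch)."""
    packages = (
        deps[['source', 'target']]
        .drop_duplicates()               # one row per (package, target) pair
        .groupby('target', sort=False)
        .count()                         # distinct sources per target
    )
    projects = (
        repodeps[['repoid', 'target']]
        .drop_duplicates()               # one row per (repository, target) pair
        .groupby('target', sort=False)
        .count()                         # distinct repositories per target
    )
    # 'target' is the index level on both sides, so `on` can refer to it.
    merged = pandas.merge(packages, projects, how='outer', on='target')
    # Targets missing on one side get NaN from the outer merge; count them as 0.
    return (
        merged.rename(columns={'source': 'packages', 'repoid': 'projects'})
        .fillna(0)
        .astype(int)
    )

counts = count_dependents(deps, repodeps)
print(counts)
```

Here `lib-x` has two dependent packages and one dependent project, while `lib-y` has no dependent package and one dependent project.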

```python
dependents
```

| target | packages | projects |
|---|---|---|
| cli-color | 1998 | 2228 |
| mu2 | 118 | 159 |
| openid | 44 | 173 |
| deep-eql | 283 | 855 |
| mongoskin | 178 | 1044 |
| ... | ... | ... |
| react-native-localsearch | 0 | 1 |
| @angular-redux/router | 0 | 1 |
| @angular-redux/core | 0 | 1 |
| rx-http | 0 | 1 |


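We restrict these counts to the packages under consideration, i.e., every mainline and every variant listed in `df_variants`. A package that no package or project depends on gets a count of 0.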

```python
# All package names under consideration: mainlines and variants, each listed once.
variants = pandas.concat([df_variants['mainline'], df_variants['variant']]).drop_duplicates()
```
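A mainline with several variants occurs once per variant in `df_variants`, so the combined list needs `drop_duplicates`. The toy sketch below uses hypothetical names (`base`, `fork-1`, `fork-2`). It relies on `pandas.concat`, which also works on pandas 2.x, where `Series.append` is no longer available.

```python
import pandas

# Hypothetical mainline/variant pairs: 'base' has two variants.
pairs = pandas.DataFrame({
    'mainline': ['base', 'base'],
    'variant': ['fork-1', 'fork-2'],
})
# Stack both columns into one Series, then keep the first occurrence of each name.
names = pandas.concat([pairs['mainline'], pairs['variant']]).drop_duplicates()
print(names.tolist())  # → ['base', 'fork-1', 'fork-2']
```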

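```python
# Keep only mainlines and variants. Packages without any dependent are absent
# from `dependents`, so reindex yields NaN for them, which becomes 0.
df_dependents = (
    dependents
    .reindex(variants)
    .fillna(0)
    .astype(int)
).rename_axis(index='variant')
```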
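`reindex` performs the restriction in one step: targets outside the list are dropped, and listed packages with no recorded dependents appear as NaN rows that `fillna(0)` turns into zero counts. A toy sketch with hypothetical names and counts:

```python
import pandas

# Hypothetical dependent counts indexed by target package.
counts = pandas.DataFrame(
    {'packages': [2, 5], 'projects': [1, 3]},
    index=pandas.Index(['fork-1', 'unrelated'], name='target'),
)
# Hypothetical list of packages under consideration.
names = pandas.Series(['base', 'fork-1'])

restricted = (
    counts
    .reindex(names)   # keep only listed packages; unknown ones become NaN rows
    .fillna(0)        # a package with no dependents has 0 of each
    .astype(int)
).rename_axis(index='variant')
print(restricted)
```

`unrelated` disappears because it is not in the list, and `base` shows 0 packages and 0 projects.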

```python
df_dependents.to_csv('../data/dependents.csv.gz', compression='gzip', index=True)
```
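A later notebook can load these counts back with the package name as index. Passing `index_col='variant'` matters here: without it, `read_csv` returns the names as an ordinary column alongside a fresh integer index. Below is a minimal round-trip sketch. To avoid touching the real `../data/dependents.csv.gz`, it writes a small hypothetical frame to a temporary directory.

```python
import os
import tempfile

import pandas

# Hypothetical counts shaped like df_dependents.
df = pandas.DataFrame(
    {'packages': [0, 2], 'projects': [0, 1]},
    index=pandas.Index(['base', 'fork-1'], name='variant'),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'dependents.csv.gz')
    df.to_csv(path, compression='gzip', index=True)
    # Gzip compression is inferred from the extension; index_col restores the index.
    loaded = pandas.read_csv(path, index_col='variant')

print(loaded.equals(df))  # → True
```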