Statistics to better understand how Python is used and written

Motivation

This package aims to provide statistics that help us better understand how Python is used and written.

A package maintainer might ask:

  • Can certain functions be deprecated?
  • How are my users using my package in tests vs. source vs. notebooks?
  • What should I include in tutorials?
  • Are new features being adopted?

Python Core Maintainers might ask:

  • What are the most and least used stdlib modules?
  • Is the community moving away from one module?
  • Let's inform PEPs with actual statistics!

This work exposes a queryable SQLite database as a web API via Datasette.
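A minimal sketch of what querying that web API from Python could look like. Datasette accepts read-only SQL through the `sql` query parameter on a database's `.json` endpoint, and `_shape=array` asks for the rows as a list of JSON objects. The host (`http://localhost:8001`, Datasette's default port), the database name `inspect`, and any table names you put in the query are assumptions for illustration; check `scripts/serve.sh` for the values this project actually uses.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed defaults: 8001 is Datasette's default port, but the database
# name "inspect" is hypothetical and not taken from this repository.
DATASETTE_URL = "http://localhost:8001"
DATABASE = "inspect"


def datasette_query_url(sql: str, base: str = DATASETTE_URL, db: str = DATABASE) -> str:
    """Build a Datasette JSON API URL that runs `sql` and returns rows as objects."""
    params = urlencode({"sql": sql, "_shape": "array"})
    return f"{base}/{db}.json?{params}"


def fetch_rows(sql: str, base: str = DATASETTE_URL, db: str = DATABASE) -> list:
    """Run `sql` against a running Datasette instance and return the decoded rows."""
    with urlopen(datasette_query_url(sql, base, db)) as response:
        return json.load(response)
```

Building the URL separately from fetching it keeps the query construction testable without a running server; `fetch_rows` only works once Datasette is serving the database.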

NOTE: This dataset is currently extremely biased, as we are parsing only the top 4,000 repositories for a few scientific libraries listed in data/whitelist. It is not a representative sample of the Python ecosystem, nor even of the entire scientific Python ecosystem. Further work is needed to make this dataset less biased.

Interesting Questions

As with any project that provides large datasets, interpretation is even more important than the data itself. Here we provide some guiding questions.

Workflow

This package's components expose a SQLite database via Datasette. Originally, the package provided CSV files with API usage statistics for packages, but precomputed files cannot anticipate every question users may have. Instead, a SQL interface lets users ask custom questions of the (currently) 6 GB database.
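As an illustration of the kind of custom question a SQL interface allows, here is a minimal sketch using Python's built-in `sqlite3` module to answer "which modules are imported by the most repositories?". The `imports` table and its columns are hypothetical and the rows are toy data; the real database's schema is not described here and may differ.

```python
import sqlite3

# Hypothetical schema for illustration only; the real database's tables
# and columns may differ.
SCHEMA = """
CREATE TABLE imports (
    repository TEXT NOT NULL,  -- e.g. "owner/project"
    path       TEXT NOT NULL,  -- file inside the repository
    module     TEXT NOT NULL   -- imported top-level module name
);
"""


def most_imported_modules(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Return (module, distinct repositories importing it), most popular first."""
    query = """
        SELECT module, COUNT(DISTINCT repository) AS repositories
        FROM imports
        GROUP BY module
        ORDER BY repositories DESC, module
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def demo() -> list:
    """Load toy rows (not real data) into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO imports VALUES (?, ?, ?)",
        [
            ("a/one", "main.py", "os"),
            ("a/one", "util.py", "os"),  # same repository, counted once
            ("b/two", "app.py", "os"),
            ("b/two", "app.py", "json"),
        ],
    )
    result = most_imported_modules(conn)
    conn.close()
    return result
```

Counting distinct repositories rather than raw rows keeps one heavily vendored project from dominating the ranking, which matters given the sampling bias noted above.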

The scripts involved in this work are:

  1. Assemble a list of important repositories/projects that depend on libraries such as numpy, scipy, requests, and tensorflow. This work would not be possible without libraries.io. Script: scripts/librariesio.sh
  2. Construct the database by inspecting the source code and AST of every Python file and notebook in the repositories. Script: scripts/inspect.sh
  3. Expose the SQLite database via Datasette. Script: scripts/serve.sh
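Step 2 can be illustrated with a minimal sketch built on Python's standard `ast` and `json` modules: it counts the top-level modules imported by a source file, and extracts the code cells from a Jupyter notebook (nbformat 4 JSON) so they can be parsed the same way. This is not the project's actual inspector; the function names are hypothetical, and the way it drops IPython magics (`%...`, `!...`) before parsing is a simple heuristic assumed for the example.

```python
import ast
import json
from collections import Counter


def count_imports(source: str) -> Counter:
    """Count top-level module names imported by a piece of Python source."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import numpy.linalg" is counted as "numpy"
            for alias in node.names:
                counts[alias.name.split(".")[0]] += 1
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from X import Y"; relative imports (level > 0) are skipped
            counts[node.module.split(".")[0]] += 1
    return counts


def notebook_source(notebook_json: str) -> str:
    """Concatenate the code cells of an nbformat 4 notebook into one string."""
    nb = json.loads(notebook_json)
    cells = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # nbformat stores source as either a string or a list of lines
        text = "".join(src) if isinstance(src, list) else src
        # Heuristic: drop IPython magics and shell escapes, which are not valid Python
        kept = [line for line in text.splitlines() if not line.lstrip().startswith(("%", "!"))]
        cells.append("\n".join(kept))
    return "\n".join(cells)
```

For example, `count_imports(notebook_source(raw))` gives per-notebook import counts that could then be written into a table like the ones queried through Datasette.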

Tests

The tests depend on pytest and are a great demonstration of what python-api-inspect can capture. To run them:

pytest