A Python toolkit for loading, analyzing, and extracting insights from Stack
Overflow's annual developer survey datasets. This package provides both a
command-line interface and a programmatic API for working with survey data.
This package requires Python 3.7 or higher.

    pip install so_survey

The so_survey package provides a command-line interface for common operations:

    so_survey catalog
    so_survey load survey_2022 --head 10
    so_survey subset survey_2022 --column Language --values "Python" \
        --output filtered_data.csv
    so_survey stats survey_2022 --columns YearsCodePro Salary

The package is organized into several modules:
- loader.py: Handles loading survey data from CSV files into pandas
  DataFrames with appropriate data type inference.
- catalog.py: Manages dataset discovery and provides utilities for listing
  and accessing available datasets.
- subset.py: Offers functionality for filtering and creating subsets of
  survey data based on column values or ranges.
- stats.py: Implements statistical functions for analyzing survey data,
  including descriptive statistics.
- cli.py: Provides a command-line interface with commands for interacting
  with the survey data.
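
The package's programmatic API is not documented in this README, so the sketch below does not use it. Instead it shows, with plain pandas, roughly what the loading, subsetting, and statistics steps described above amount to. The function names (`load_survey`, `subset_by_values`, `describe_columns`) and the sample rows are hypothetical and exist only for illustration; the column names are borrowed from the CLI examples.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for a survey CSV file (not real survey data).
SAMPLE_CSV = """Language,YearsCodePro,Salary
Python,5,90000
Rust,3,85000
Python,10,120000
Go,7,110000
"""


def load_survey(source, head=None):
    """Read a survey CSV into a DataFrame, letting pandas infer column types."""
    df = pd.read_csv(source)
    return df if head is None else df.head(head)


def subset_by_values(df, column, values):
    """Keep only the rows whose `column` entry is one of `values`."""
    return df[df[column].isin(values)]


def describe_columns(df, columns):
    """Return descriptive statistics for the selected columns."""
    return df[columns].describe()


# Hypothetical helpers, chained in the same order as the CLI examples.
survey = load_survey(io.StringIO(SAMPLE_CSV))
python_rows = subset_by_values(survey, "Language", ["Python"])
summary = describe_columns(python_rows, ["YearsCodePro", "Salary"])
print(summary)
```

`load_survey` accepts a file path as well as a file-like object, since it passes its argument straight to `pandas.read_csv`.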
Contributions to the Stack Overflow Survey Analysis Toolkit are welcome!
- Clone the repository:

      git clone https://github.com/so-survey/so-survey.git
      cd so-survey

- Install development dependencies:

      pip install -e ".[dev]"

- Run tests:

      pytest

- Lint code:

      flake8 .

- Type checking:

      mypy .

Before submitting a pull request, please ensure that:
- Your code passes all tests
- Your code passes flake8 linting
- Your code passes mypy type checking
- You've added tests for any new functionality
- You've updated documentation as needed
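
The three automated checks in this list can be run in one go. The helper below is a hypothetical convenience script, not something the repository is known to ship. It assumes pytest, flake8, and mypy are installed in the active environment (for example via the development dependencies) and invokes each one with `python -m`.

```python
import subprocess
import sys

# The three checks named in the contribution steps, each run as `python -m <tool>`.
CHECKS = [
    ["pytest"],
    ["flake8", "."],
    ["mypy", "."],
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each check in order and return the names of those that failed."""
    failed = []
    for args in checks:
        # `sys.executable -m` picks up the tools installed in the current environment.
        result = runner([sys.executable, "-m", *args])
        if result.returncode != 0:
            failed.append(args[0])
    return failed
```

Calling `run_checks()` from the repository root returns an empty list when all three checks pass. The `runner` parameter exists only so the function can be exercised without launching the real tools.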
This project is licensed under the MIT License - see the LICENSE file for details.