datopy

datopy (da-toh-pie) is a Python library for people who work with unstructured data, providing a simple workflow for building data models and ETL (extract, transform, load) pipelines.

This package also includes utilities for:

Data retrieval (web scraping and API-based data retrieval)
Input/output processes (loading and inspecting data)
Jupyter Notebook workflows

Note

This project is under active development.

Getting Started

Installation

To use datopy, first install it using pip:

$ pip install "git+https://github.com/bainmatt/datopy.git#egg=datopy"
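After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of datopy.

```python
from importlib.util import find_spec


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    # find_spec returns None when no top-level module of that name is found.
    return find_spec(package_name) is not None


if is_installed("datopy"):
    print("datopy is available in this environment")
else:
    print("datopy was not found; check that the install ran in this environment")
```

If the check fails inside a notebook, the kernel is probably using a different environment from the one pip installed into.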

Cloning

Step 1. Clone the repo:

$ git clone https://github.com/bainmatt/datopy.git
$ cd datopy

Step 2. Install dependencies:

$ conda env create -f environment.yml
$ conda activate datopy

Development

WIP. This section will cover typing, testing, documentation, CI/CD, and conventions for developers.

Usage

Dataset inspection

API reference: :mod:`datopy.inspection`

Produce multiple parallel, informative displays of Pandas data frames and NumPy arrays for data exploration and inspection.

>>> import numpy as np
>>> import pandas as pd
>>> from datopy.inspection import display, make_df

>>> df1 = make_df('AB', [1, 2])
>>> df2 = make_df('AB', [3, 4])
>>> display('df1', 'df2', 'pd.concat([df1, df2])', globs=globals(), bold=False)

df1
--- (2, 2) ---
   A   B
1  A1  B1
2  A2  B2


df2
--- (2, 2) ---
   A   B
3  A3  B3
4  A4  B4


pd.concat([df1, df2])
--- (4, 2) ---
   A   B
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4
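For readers without datopy installed, the frames in this example can be approximated with plain pandas. This is a sketch under stated assumptions: `make_df_sketch` is a hypothetical stand-in whose behavior is inferred only from the output shown above (each cell is the column label followed by the index label), and the print loop merely imitates the name, shape, and frame layout. It is not datopy's implementation of `make_df` or `display`.

```python
import pandas as pd


def make_df_sketch(cols, index):
    """Hypothetical stand-in: cells are '<column><index>' strings."""
    # Iterating over a string such as 'AB' yields one column label per character.
    data = {col: [f"{col}{i}" for i in index] for col in cols}
    return pd.DataFrame(data, index=index)


df1 = make_df_sketch("AB", [1, 2])
df2 = make_df_sketch("AB", [3, 4])
combined = pd.concat([df1, df2])

# Print each frame with its name and shape, loosely mirroring the layout above.
for name, frame in [("df1", df1), ("df2", df2), ("pd.concat([df1, df2])", combined)]:
    print(name)
    print(f"--- {frame.shape} ---")
    print(frame)
    print()
```

Unlike this loop, `display` takes expression strings and a `globs` mapping, as in the call above, so the labels and the evaluated objects come from the same source.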

Metadata scraping

API reference: :mod:`datopy._media_scrape`

Retrieve media-related data from Spotify, IMDb, and Wikipedia.

WIP. More usage examples to come.

Acknowledgements

This package is powered by:

mypy type checking

pytest unit testing

Flake8 linting

Sphinx documentation

numpydoc docstrings

PyData theming

Read the Docs hosting

GitHub Actions continuous integration

PyPI packaging

Pydantic data validation

License

This project is licensed under the MIT License.

Contact

Project Link: https://github.com/bainmatt/datopy
