Command Line tool for scraping data from CTSDE's EdSight database

CTData EdSight Scraping Tool

Click-based CLI for scraping CTSDE EdSight

  • Free software: GPL v3.0

Features

  • Command line tool for exploring CT SDE's EdSight database
  • Option to download all datasets as a batch operation

Installation

To install, use pip as follows:

pip install git+ssh://git@github.com/CT-Data-Collaborative/ctdata-edsight-cli.git#edsight

How to use

There are a few utility commands to help identify which datasets you might want and what variables they make available.

To see a list of datasets, issue the following command: edsight datasets.

To see variables associated with a dataset, issue the following command: edsight info -d [DATASET].

There are a few assumptions made regarding downloading.

  1. You're using this because you want to download a complete dataset.
  2. You might want the data at either the district or school level.

Consequently, the download commands are fairly minimal in terms of options. All variable combinations for a given geography will be downloaded, and file names will reflect the variables they contain. State values are also downloaded and included when fetching either the district or school files. This results in some duplication if you want both, but we felt it was more appropriate to always include the state data.

If you want just one dataset, use: edsight fetch.

You'll need to provide the dataset and a target directory for where the data should be saved.

edsight fetch -d 'Chronic Absenteeism' -g District -o ./tmp

will download district-level Chronic Absenteeism data and save the files in the ./tmp directory. The default is to fetch the district data, so you can get away with just

edsight fetch -d 'Chronic Absenteeism' -o ./tmp

and only use the -g flag when you want school data. NOTE: The -g/--geography flag will be deprecated in an upcoming release and replaced with a -s/--school flag to simplify this specification.
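If you want both the district and school files for one dataset, the two fetch calls can be scripted. The following is a minimal sketch, not part of this package: the helper names build_fetch_command and fetch_both_levels are hypothetical, and the geography value "School" is an assumption inferred from the "District" example above. Only the -d, -g and -o flags shown in this README are used.

```python
import shlex
import subprocess


def build_fetch_command(dataset, out_dir, geography=None):
    """Return the argument list for a single `edsight fetch` call.

    When geography is None the -g flag is omitted, so the tool falls
    back to its default (district data, per this README).
    """
    cmd = ["edsight", "fetch", "-d", dataset]
    if geography is not None:
        cmd += ["-g", geography]
    cmd += ["-o", out_dir]
    return cmd


def fetch_both_levels(dataset, out_dir, run=False):
    """Build (and optionally run) fetch commands for both geographies.

    "School" is an assumed value for -g; check the CLI's help output
    for the exact spelling it accepts.
    """
    commands = [
        build_fetch_command(dataset, out_dir, geography)
        for geography in ("District", "School")
    ]
    for cmd in commands:
        # Print a shell-quoted form so the command can be copied and pasted.
        print(shlex.join(cmd))
        if run:
            # check=True raises CalledProcessError if edsight exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

Calling fetch_both_levels('Chronic Absenteeism', './tmp') only prints the two commands; pass run=True to execute them, which requires edsight to be installed and on your PATH.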

If you want the whole EdSight catalog, use

edsight fetch_catalog -o TARGET_DIR

This will trigger a lengthy download process, so make sure this is what you want to do. Subdirectories will automatically be created for each dataset geography.
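Because the catalog download is lengthy, it can help to check afterwards what actually landed on disk. The sketch below is a hypothetical helper, not part of this package: it assumes only that the tool writes files into subdirectories of TARGET_DIR, as described above, and makes no assumption about how those subdirectories or files are named.

```python
from pathlib import Path


def summarize_downloads(target_dir):
    """Map each immediate subdirectory of target_dir to its file count.

    Files in nested subdirectories are counted toward their top-level
    parent; files sitting directly in target_dir are ignored.
    """
    root = Path(target_dir)
    summary = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[sub.name] = sum(1 for f in sub.rglob("*") if f.is_file())
    return summary
```

A subdirectory that reports zero files is a reasonable hint that a download was interrupted and that dataset should be fetched again.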

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.