Boulder Valley School District Mentoring Project
This is a reverse chronological TODO list. Please feel free to add notes/comments.
Suchit! Here are two tasks for the next week or so.
1. Test the metadata ingest and note any errors.
Most importantly, I've released a prototype of the metadata ingest system. Can you break it? That is, try cloning the repository and running through the notebook on your system, keeping note of which cells throw error messages. At our next check-in I will ask for a brief summary of the errors you ran into and how you were able to circumvent them. (For "culture", you might read How to Report Bugs Effectively by Simon Tatham. His standards are much higher than mine.)
Right away, I should note that you'll need to install a few dependencies with pip, including python-magic and maybe others. (I'm quite ignorant when it comes to Python environments and dependency management: asking you to install packages "by hand" is not the right way to do things.)
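Before opening the notebook, a quick check like the following can tell you which dependencies are still missing. This is only a sketch: the `REQUIRED` table and the `missing_packages` helper are made up for illustration, and python-magic (which is imported as `magic`) is the only package the note above actually names. Finding the module does not guarantee that the import succeeds, since python-magic also relies on the libmagic system library.

```python
import importlib.util

# Hypothetical table: pip package name -> module name used by `import`.
# Only python-magic is named in the note above; extend it as you find others.
REQUIRED = {"python-magic": "magic"}


def missing_packages(required):
    """Return the pip names whose modules cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing; try: pip install " + " ".join(missing))
    else:
        print("All listed dependencies were found.")
```

Any package this reports as missing is worth writing down for the bug report, along with the exact error message the notebook gave you.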
The goals for this task are
- to draft a simple bug report,
- to download a sample set of images,
- to create a json roster of the images in the
- to rename the images according to their
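To get a feel for what a roster might look like before running the real notebook, here is a minimal sketch. It is not the ingest system's own code: the record fields (`filename`, `size_bytes`, `sha256`), the list of image suffixes, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import json
from pathlib import Path

# Assumed set of image file extensions; adjust to match your sample images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def build_roster(image_dir):
    """Return a list of dicts, one per image file found directly in image_dir."""
    roster = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            roster.append(
                {
                    "filename": path.name,
                    "size_bytes": path.stat().st_size,
                    # A content hash makes byte-identical files easy to spot.
                    "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
                }
            )
    return roster


def write_roster(image_dir, roster_path):
    """Build the roster for image_dir and save it as a json list of dictionaries."""
    roster = build_roster(image_dir)
    Path(roster_path).write_text(json.dumps(roster, indent=2))
    return roster
```

The output is a list of dictionaries, which is the same shape of json file that pandas can load into a DataFrame.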
2. Do a quick refresher on how to munge data with pandas.
I've reorganized the notebooks in this repository according to my favorite stable naming convention for literature, which is
because such files can be stored in the same directory without too much namespace conflict, and it makes citing one's work later on a bit easier.
Here are the notebooks we've worked through so far:
2015-dlab-introduction-workshop.ipynb (this version is my working copy)
Can you do a brief review of the pandas operations that are displayed in both of these notebooks?
Concurrently, let's take a look at a similar, more recent, tutorial from Chris Fonnesbeck (referenced in the DLab tutorial):
The Lunacek tutorials (i.e., the files matching 2019-lunacek*.ipynb) have power grid data from NREL, which might be of interest to you if you are comfortable with the three notebooks above.
The goals for this task are to be proficient at indexing, slicing, reshaping, copying, and otherwise mutating
- numpy arrays and
- pandas DataFrames.
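As a self-check for these goals, here is a small sketch that touches each operation once. The array values and column names are arbitrary; the point is to predict each result before you run the cell, especially which assignments change the original array and which do not.

```python
import numpy as np
import pandas as pd

a = np.arange(12)        # 1-D array holding 0, 1, ..., 11
m = a.reshape(3, 4)      # reshaping: 3 rows x 4 columns (here a view on a)
row = m[1]               # indexing: the second row
block = m[:2, 1:3]       # slicing: first two rows, columns 1 and 2

c = m.copy()             # copying: c owns its own data
c[0, 0] = 99             # mutating the copy leaves m and a untouched
m[0, 0] = -1             # mutating the view also changes a[0]

df = pd.DataFrame(m, columns=["w", "x", "y", "z"])
first_x = df.loc[0, "x"]     # label-based indexing
tail = df.iloc[1:, :2]       # position-based slicing
# Reshaping from wide to long: one row per (column, value) pair.
long_df = df.melt(var_name="column", value_name="value")
```

If `a[0]` being `-1` at the end surprises you, that is a good question to bring to our next check-in: it is the difference between a view and a copy.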
I have a supervision meeting on Thursday 2020-01-09. After it, I will decide whether our next step is to work towards an ingest of image metadata into MySQL with pandas or towards image file manipulation and classification with numpy/scipy.
The first goal should be to have enough proficiency with pandas to be able to load DataFrames from .json files that are
- lists of dictionaries
where the keys of the dictionaries are unique field names, such as "document.id_within_archive" and "archive.host_country".
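A minimal sketch of that first goal follows. The two field names come from the note above, but the sample values and the helper names (`records_to_frame`, `frame_from_json_file`) are invented for illustration and do not come from the real metadata.

```python
import json
from pathlib import Path

import pandas as pd


def records_to_frame(json_text):
    """Parse json text that holds a list of dictionaries into a DataFrame."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a json list of dictionaries")
    # Each dictionary becomes a row; each unique key becomes a column.
    return pd.DataFrame(records)


def frame_from_json_file(path):
    """Same as records_to_frame, but reading the json text from a file."""
    return records_to_frame(Path(path).read_text())


# Made-up sample records using the field names mentioned above.
sample = """
[
  {"document.id_within_archive": "A-001", "archive.host_country": "UK"},
  {"document.id_within_archive": "B-017", "archive.host_country": "FR"}
]
"""
df = records_to_frame(sample)
```

Because the keys contain dots, select columns with brackets, as in `df["archive.host_country"]`, rather than with attribute access.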
The second goal should be to have enough proficiency with exiftool to be able to distinguish between
- two non-identical images.
A third goal is to be able to recognize
- two identical images with the same "ImageUniqueID"
where "ImageUniqueID" can be read by exiftool, a la exiftool -ImageUniqueID image.jpg, and is given by a 32-character hexadecimal string.
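If you would like to call exiftool from Python rather than from the terminal, the sketch below is one way to do it. It assumes exiftool is installed and on your PATH, and it uses exiftool's `-s3` option so that only the tag value is printed. The function names are hypothetical. Keep in mind that a matching "ImageUniqueID" is evidence that two files are the same image, not proof that they are byte-for-byte identical.

```python
import re
import shutil
import subprocess

# A 32-character hexadecimal string, as described above.
HEX32 = re.compile(r"[0-9a-fA-F]{32}")


def looks_like_image_unique_id(value):
    """Return True if value is exactly 32 hexadecimal characters."""
    return HEX32.fullmatch(value.strip()) is not None


def read_image_unique_id(image_path):
    """Return the ImageUniqueID reported by exiftool, or None if the tag is absent."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on PATH")
    result = subprocess.run(
        ["exiftool", "-s3", "-ImageUniqueID", str(image_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    value = result.stdout.strip()
    return value or None


def same_unique_id(path_a, path_b):
    """Return True only if both images carry the same, non-missing ImageUniqueID."""
    id_a = read_image_unique_id(path_a)
    id_b = read_image_unique_id(path_b)
    return id_a is not None and id_a == id_b
```

Images without the tag return `None`, so `same_unique_id` treats them as "unknown" instead of as a match.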
Hi Suchit! Please excuse me for taking a few weeks to follow up since our last video chat. Here is a short list of technical tasks (leading up to an introduction to Numpy in Python) for you to complete in, say, the next 14 days.
set up version control
- Install git for your operating system.
- Apply for a github student developer account (this takes only a day or two).
- Fork this repository to your own github account, then clone your fork of this repository onto your operating system.
set up a scientific computing environment
- Install JupyterLab for your operating system.
- Open the kuleshov tutorial by running jupyter-notebook in a terminal, in the directory that holds your fork of this git repository on your operating system. The Jupyter Notebook provides a graphical user interface in your browser, from which you can start the tutorial.
- Work through a few sections of that tutorial (at least until you get to the definition of arrays).
- Please come up with a list of 5 to 10 questions about numpy and python for our next videochat.
ask for help by opening git issues
Thanks! Please let me know if you have any questions about these instructions.
- You can either email me or, preferably,
- open an issue in this git repo.
- familiarize yourself with NCAR https://en.wikipedia.org/wiki/National_Center_for_Atmospheric_Research
- familiarize yourself with CISL https://www2.cisl.ucar.edu/org/about
- watch Philip Brohan's CISL seminar talk https://www.youtube.com/watch?v=O98ha2c4vGs
- watch my SIParCS seminar talk https://www.youtube.com/watch?v=wURNDpfBS4Q