Overview and Software Setup
This repository contains the teaching resources that we will use over the fellowship. It supplements, and draws heavily on, the DSSG Hitchhiker's Guide, an invaluable resource for any DSSG project. This repository is tailored specifically to the tutorials and classes we will give during the 2018 summer fellowship in Lisbon.
- Nuno Brás - lead technical mentor
- Qiwei Han - technical mentor
- William Grimes - junior technical mentor
- Iñigo Martínez de Rituerto de Troya - infrastructure and technical support
- João Fonseca - infrastructure support
Technical mentors’ role:
- Project mentor/consultant on technical side
- Core infrastructure maintenance (data, computing resources)
- Technical training/support
Ask us anything technical. We will do our best to help you work through difficulties, or direct you to the right person when necessary.
Local software setup for tutorials and projects
- SSH (PuTTY for Windows)
- Git (for version control)
- psql (PostgreSQL command line interface)
- IDE / text editor (Atom, Sublime, Vim, Visual Studio, PyCharm, Spyder, ...)
- miniconda or pip/virtualenv with Python 3.6
- Python Packages
For more detail on the software setup have a look here.
Try it out!
You should give all installed software a quick spin to check that it installed correctly. For your Python packages, try to import them: type python in your shell and, once you are in your Python session, try for example import matplotlib, and so on. (You can quit with exit().) Also try ipython and jupyter notebook in your terminal, and see if you get any errors.
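The import check described above can also be run in one go. The following is a minimal sketch, not an official setup script: the function name `check_imports` and the package list are illustrative assumptions, so swap in whichever packages you actually installed.

```python
import importlib


def check_imports(package_names):
    """Try to import each package; map its name to None (ok) or an error message."""
    results = {}
    for name in package_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as error:
            results[name] = str(error)
    return results


if __name__ == "__main__":
    # Hypothetical package list for illustration; edit it to match your setup.
    packages = ["numpy", "pandas", "matplotlib"]
    for name, error in check_imports(packages).items():
        status = "ok" if error is None else "FAILED ({})".format(error)
        print("{:<12} {}".format(name, status))
```

Any package reported as FAILED is missing from the environment that is currently active, so check that you activated the right conda or virtualenv environment before reinstalling.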
Working in the cloud
Project work over the summer will be done in a cloud computing environment, where each project will have a separate server (an AWS EC2 instance) as its main server for large-scale data processing tasks, and a database to store the data securely. This is advantageous because data is maintained in one place, teams can collaborate easily, and you have access to scalable computing resources.
Good news: DSSG is supported by the Amazon Web Services Cloud Credits for Research program and Microsoft Azure for Research awards! Amazon Web Services (AWS): https://aws.amazon.com/
Each fellow will be assigned a user account that allows them to make use of AWS services.
- Have all software installed, running, and tested locally
- Have a GitHub account created
- Join the two DSSG GitHub organisations:
- Try to SSH into the training instance using your saved private key
ssh -i ~/path/to/pemfile.pem email@example.com
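If the connection is refused because of the key file, a common cause on Linux and macOS is that OpenSSH rejects private keys that other users can read. The sketch below checks for that case. It is only an illustration: the helper name `key_permissions_ok` is hypothetical, the path is the placeholder from the command above, and it does not apply to PuTTY on Windows.

```python
import os
import stat


def key_permissions_ok(key_path):
    """Return True if no group/other permission bits are set on the key file."""
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    # Any bit granted to group or others makes the key "too open" for OpenSSH.
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    # Placeholder path copied from the ssh command above; replace with your own.
    key_path = os.path.expanduser("~/path/to/pemfile.pem")
    if not os.path.exists(key_path):
        print("Key file not found:", key_path)
    elif key_permissions_ok(key_path):
        print("Key permissions look fine.")
    else:
        print("Key is accessible to other users; restrict it to the owner.")
```

Restricting the file to owner read/write (mode 600, for example with chmod 600 on the key file) is the usual fix.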
This summer we will go through a set of modules that will help you start and/or grow as Data Scientists for Social Good. Each session is briefly identified in square brackets in the calendar, like
[All] [TERMINAL] 1 command line basics
This reads as: everyone attends; the module is TERMINAL; it is the first lesson, on command line basics.
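To make the label format concrete, here is a small sketch that splits a calendar entry into its parts. The field names (`audience`, `module`, `lesson`, `title`) and the function `parse_session_label` are assumptions chosen for illustration, inferred from the single example above rather than from any official calendar specification.

```python
import re

# Pattern inferred from the example "[All] [TERMINAL] 1 command line basics".
SESSION_LABEL = re.compile(
    r"^\[(?P<audience>[^\]]+)\]\s*"   # who should attend, e.g. All
    r"\[(?P<module>[^\]]+)\]\s*"      # module name, e.g. TERMINAL
    r"(?P<lesson>\d+)\s+"             # lesson number within the module
    r"(?P<title>.+)$"                 # free-text lesson title
)


def parse_session_label(label):
    """Split a calendar label into audience, module, lesson number and title."""
    match = SESSION_LABEL.match(label.strip())
    if match is None:
        raise ValueError("unrecognised session label: {!r}".format(label))
    fields = match.groupdict()
    fields["lesson"] = int(fields["lesson"])
    return fields


if __name__ == "__main__":
    print(parse_session_label("[All] [TERMINAL] 1 command line basics"))
```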
1. Terminal Module
A set of lessons that introduces you to the terminal and helps you be productive while working in it. This is especially important when working on virtual machines. The sessions are:
| Session | Week |
|---|---|
| Command line basics | w1 |
| Software versioning with git | w1 |
| SSH and the cloud | w1 |
2. SQL Module
A set of lessons to help you master basic and advanced SQL (PostgreSQL), along with some relevant database concepts.
3. Python Module
A module to bring everyone up to the same level. We work with things like:
- Dictionaries and other structures
- Functions, Classes and Objects, numpy, matplotlib
- Python Code best practices
| Session | Week |
|---|---|
| Python beyond scripting | w2 |
4. Data Module
Handling data with pandas: feature extraction, transformation, and selection.
5. Machine Learning Module
From a general introduction to machine learning concepts to a set of algorithms adapted to your problems.
| Session | Week |
|---|---|
| Quantitative Social Science | w4 |
| ML models 1 | w4 |
| ML models 2 | w6 |
6. ETL Module
How to bring data science solutions to production architectures: workflows and data streaming, data warehouses and data lakes.
| Session | Week |
|---|---|
| csvtodb and other simple data handling | w2 |
| DAGs and other workflow systems | w5 |
| Module | Session | Week |
|---|---|---|
| TERMINAL | Command line basics | w1 |
| TERMINAL | Software versioning with git | w1 |
| TERMINAL | SSH and the cloud | w1 |
| PYTHON | Python beyond scripting | w2 |
| ETL | csvtodb and other simple data handling | w2 |
| ML | Quantitative Social Science | w4 |
| ML | ML models 1 | w4 |
| ETL | DAGs and other workflow systems | w5 |
| ML | ML models 2 | w6 |
Extra: Special Sessions
- These sessions are meant to fill knowledge gaps in specific areas that could make a big difference to some groups.
- They are not compulsory.
- They can even be given by fellows!
Here are some examples: