## Welcome!

Welcome to the 2nd winter [Collaboratory](http://collaboratory.columbia.edu/) **Data Science Bootcamp** for Columbia University PhD students and postdoctoral scholars. 

In this course, we will cover _selected_ basic and advanced data science and machine learning techniques, with a focus on `Python` skills. It includes online learning material, introductory lectures, hands-on laboratory experiences, and guided capstone projects.

The Data Science Bootcamp is jointly founded by Columbia University's [Data Science Institute](http://collaboratory.columbia.edu/) and [Columbia Entrepreneurship](https://entrepreneurship.columbia.edu/). The [Collaboratory@Columbia](http://collaboratory.columbia.edu/) is a university-wide program dedicated to supporting collaborative curricular innovations designed to ensure that all Columbia University students receive the education and training they need to succeed in today’s data-rich world.

All of the materials developed in this bootcamp are posted on [GitHub](https://github.com/DS-BootCamp-Collaboratory-Columbia/AY2017-2018-Winter).

## IPython notebooks

In this course, we will work with [IPython](http://ipython.org/) notebooks. This is the first notebook you should run: it checks the computing environment of your computer (your Python installation and the versions of your Python packages).

For additional help, please see us during the breaks at the bootcamp, when we can offer support to participants who have issues setting up their computers.

If your computer has unresolved issues during lab sessions, please do *NOT* try to resolve them during the lab session. Instead, join a peer and participate in the lab activity together. We can help you with your computer later.

From the menu above, select `Kernel>Restart & Run all`.

In [1]:
# Loading required modules
from __future__ import print_function
from distutils.version import LooseVersion as Version
import sys

# Formatting output tags
OK = '\x1b[42m[ OK ]\x1b[0m'
FAIL = "\x1b[41m[FAIL]\x1b[0m"

try:
    import importlib
except ImportError:
    print(FAIL, "Python version 3.4 (or 2.7) is required,"
                " but %s is installed." % sys.version)

In [2]:
# first check the python version
print('Using python in', sys.prefix)
print(sys.version)
pyversion = Version(sys.version)
if pyversion >= "3":
    if pyversion < "3.4":
        print(FAIL, "Python version 3.4 (or 2.7) is required,"
                    " but %s is installed." % sys.version)
elif pyversion >= "2":
    if pyversion < "2.7":
        print(FAIL, "Python version 2.7 is required,"
                    " but %s is installed." % sys.version)
else:
    print(FAIL, "Unknown Python version: %s" % sys.version)

print()

Using python in /Users/tz33/anaconda
2.7.13 |Anaconda custom (x86_64)| (default, Dec 20 2016, 23:05:08) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]



In [3]:
# Import a package and check that its version meets the minimum requirement
def import_version(pkg, min_ver, fail_msg=""):
    mod = None
    try:
        mod = importlib.import_module(pkg)
        if pkg in {'PIL'}:
            ver = mod.VERSION
        else:
            ver = mod.__version__
        if Version(ver) < min_ver:
            print(FAIL, "%s version %s or higher required, but %s installed."
                  % (pkg, min_ver, ver))
        else:
            print(OK, '%s version %s' % (pkg, ver))
    except ImportError:
        print(FAIL, '%s not installed. %s' % (pkg, fail_msg))
    return mod
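
The check above compares versions with a version-aware class instead of comparing the raw strings. The sketch below illustrates why that matters. The helper `version_tuple` is hypothetical and is not what this notebook uses (the notebook uses `LooseVersion`); it only handles purely numeric components, which is enough to show the idea.

```python
import re

def version_tuple(version):
    """Turn a version string such as '0.18.1' into a tuple of ints, (0, 18, 1).

    Hypothetical helper for illustration only; it ignores suffixes like 'rc1'.
    """
    parts = []
    for piece in version.split("."):
        # Take the leading digits of each component; stop at the first
        # component that does not start with a digit.
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

# Plain string comparison gets this wrong: the character '9' sorts after '1'.
print("0.9" < "0.18")                                # False
# Component-wise comparison gets it right: 9 < 18.
print(version_tuple("0.9") < version_tuple("0.18"))  # True
```

This is why a requirement such as `'sklearn': "0.18"` is compared through a version object rather than as text.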

In [4]:
requirements = {'numpy': "1.6.1", 'scipy': "0.9", 'matplotlib': "1.0",
                'IPython': "3.0", 'sklearn': "0.18", 'pandas': "0.18",
                'PIL': "1.1.7", 'nltk': "3.1", 'seaborn': "0.8.1", 
                'ipywidgets': "4.1.1"}

# now the dependencies
for lib, required_version in list(requirements.items()):
    import_version(lib, required_version)

[42m[ OK ][0m ipywidgets version 7.0.5
[42m[ OK ][0m scipy version 1.0.0
[42m[ OK ][0m PIL version 1.1.7
[42m[ OK ][0m seaborn version 0.8.1
[42m[ OK ][0m IPython version 4.1.2
[42m[ OK ][0m nltk version 3.1
[42m[ OK ][0m numpy version 1.9.3
[42m[ OK ][0m pandas version 0.22.0
[42m[ OK ][0m matplotlib version 2.1.1
[42m[ OK ][0m sklearn version 0.19.1


If no FAIL is returned, congratulations, you are ready to go. 

If you see any FAILs, follow the steps below that match the items that failed.

+ If the Python version check returns FAIL, download and install the [Anaconda Python distribution](https://www.anaconda.com/download/?lang=en-us#linuxQ) from the link provided.
+ If some of your Python packages returned FAIL, you need to install or update them. Follow the [instructions on managing your Python packages](https://conda.io/docs/user-guide/tasks/manage-pkgs.html#installing-packages).
+ After you are done, select `Kernel>Restart & Run all` from the menu to rerun this notebook and make sure no FAIL is reported.
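
As a rough aid for the package step above, the sketch below finds which packages cannot be imported and builds a matching `conda install` command. The helper names (`missing_packages`, `conda_install_command`) and the `CONDA_NAMES` mapping are assumptions made for illustration and are not part of the bootcamp materials. The sketch only detects packages that are missing, not ones that are installed but outdated.

```python
import importlib

# Import name -> conda package name, listed only where the two differ.
CONDA_NAMES = {"sklearn": "scikit-learn", "PIL": "pillow", "IPython": "ipython"}

def missing_packages(import_names):
    """Return the import names that cannot be imported in this environment."""
    missing = []
    for name in import_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

def conda_install_command(import_names):
    """Build a `conda install` command line for the given import names."""
    pkgs = [CONDA_NAMES.get(name, name) for name in import_names]
    return "conda install " + " ".join(pkgs)

to_fix = missing_packages(["numpy", "sklearn", "PIL", "nltk"])
if to_fix:
    # Copy the printed command into a terminal to install the packages.
    print(conda_install_command(to_fix))
else:
    print("Nothing to install.")
```

The printed command is meant to be run in a terminal, not inside the notebook; rerun the notebook afterwards to confirm the checks pass.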

Please do not forget to prepare before each session by going over the assigned videos and reading material. You should watch all the videos corresponding to each day _before_ attending the session on that day.