# Let's Get Started

**Objectives**: Today we are going to set up our development and access environments and introduce you to the various tools and resources we will be using in class. This will include:
  
* Setting up your environment with Anaconda, the free Python distribution for data science we will be using
* Reviewing how to use GitHub to install these Jupyter notebooks on the PC your team will be using to complete your assignments
* Setting up any accounts you will need to access cloud-based analytic platforms or data sources like Twitter, Amazon AWS, or Google BigQuery
* Starting with our first data analysis of the classic "iris dataset"

When you first start the class, you will most likely be viewing a static version of this page on GitHub. Once you follow the directions below, you will have Python and our required libraries running on your computer. You can then download the "live" version of this page, which will allow you to run code and complete your assignment for the week, as detailed below.
      

## Setting Up Your Environment

To complete the exercises we have planned, you need to install the programming language Python and a variety of scientific and analytics-related [packages](http://docs.continuum.io/anaconda/pkg-docs "packages"). This section will walk you through the install.

1. Download and install Anaconda for your operating system: https://www.continuum.io/downloads

2. During install, make sure to select the "Add Anaconda to my PATH environment variable" option

3. Open a command prompt or terminal (Google how to do so if you are unsure)

4. Run the command <code>conda --version</code> in the command prompt or terminal and confirm that it prints a version number. If it does not, something went wrong during the install and you will need to repeat step 1.

5. Run the following command based on your operating system:
  1. Windows: <code>conda install --channel https://conda.anaconda.org/ProfessorEaston tweepy</code>
  2. Mac: <code>conda install --channel https://conda.anaconda.org/zed tweepy</code>
  3. Linux: <code>conda install --channel https://conda.anaconda.org/ActivisionGameScience tweepy</code>
  
6. Check that Python installed correctly by entering <code>ipython qtconsole</code> at the command line.

If you installed Python correctly, you should see an image like this:

<img src="https://raw.githubusercontent.com/azbones/big_data/master/images/week1-qtconsole.png">

Check that everything else installed correctly by copying the code from the code block below, pasting it into your IPython console, and pressing Return.

If all your packages were installed correctly, you should see **"Good to go on (your computer's name)!"**

In [None]:
import pkg_resources
from distutils.version import StrictVersion
import socket

def check_version_number(module, minimum):
    # Look up the installed version; a failed lookup means the module is missing.
    try:
        module_version = pkg_resources.get_distribution(module).version
    except Exception:
        print("You are missing the {} module.".format(module))
        return False

    # Compare the installed version against the minimum the class requires.
    if StrictVersion(module_version) < StrictVersion(minimum):
        print("Your version of {0} is too old at {1}! Need at least version {2}...".format(module, module_version, minimum))
        return False

    return True

# Check every required module so that each problem gets reported.
success = check_version_number('pandas', '0.14.0')
success = check_version_number('boto', '2.29.1') and success
success = check_version_number('tweepy', '3.4.0') and success

if success:
    print('Good to go on {}!'.format(socket.gethostname()))
else:
    print('Validation failed. You have missing or outdated modules. Please go through the install procedures again.')
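
The check above depends on <code>pkg_resources</code> and <code>distutils</code>, and <code>distutils</code> was removed from the standard library in Python 3.12. If you find yourself on a newer Python, the same idea can be sketched with the standard library's <code>importlib.metadata</code> (Python 3.8 or later). This is a minimal sketch under those assumptions, not the class validation script: the helpers <code>parse_version</code> and <code>meets_minimum</code> are hypothetical names, and the parsing only understands plain dotted numbers such as 0.14.0.

```python
import re
from importlib import metadata  # assumption: Python 3.8 or later


def parse_version(text):
    """Turn a dotted version such as '0.14.0' into a tuple of ints.

    Hypothetical helper: parsing stops at the first piece that does not
    start with a digit, so suffixes like 'rc1' are ignored.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '0.14' and '0.14.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_version_number(module, minimum):
    """Same contract as the check above: print a message and return a bool."""
    try:
        installed = metadata.version(module)
    except metadata.PackageNotFoundError:
        print("You are missing the {} module.".format(module))
        return False
    if not meets_minimum(installed, minimum):
        print("Your version of {0} is too old at {1}! Need at least version {2}...".format(
            module, installed, minimum))
        return False
    return True


print(check_version_number("pandas", "0.14.0"))
```

Comparing tuples of integers avoids the string-comparison trap in which "0.9.0" would sort after "0.14.0".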

## Jupyter Notebooks

**Getting the Notebooks**

We are using Jupyter notebooks (http://jupyter.org/) to facilitate the technical portion of our class. Jupyter notebooks allow us to have a web-based, interactive Python environment where we can run code, present visualizations, and provide explanatory text. We developed these notebooks using the version control system Git (https://git-scm.com/) and store them centrally at GitHub (http://github.com). The notebooks are publicly **viewable** in our repository at https://github.com/azbones/big_data. In order to run our sample code, build your own code for the assignments, answer questions in the notebook, and ultimately turn in your assignment, you will need to download the notebooks to your computer and install them. There are two ways to do this:

* **Easy Way**: click on the "download zip" button, download the zip file, and then unzip in the directory where you will be doing your work.

* **More Elite Way**: install Git on your computer and clone the GitHub repository into the directory where you will be doing your work. A review of Git is beyond the scope of this class, but you can learn more about it and GitHub here: https://help.github.com/categories/bootcamp/ By the way, because we consider our notebooks open source, you can contribute to making them better by opening a pull request, which we will evaluate and perhaps include in the source code going forward. Learn about pull requests here: https://help.github.com/articles/creating-a-pull-request/

**Running Python in the Notebooks**

Once you have the repository from GitHub on your local computer, open your console and navigate to that directory, which should be named "big_data". Next, enter the following command:

````
ipython notebook
````

If everything works as intended, this command should start a local webserver and then direct your default web browser to the root Jupyter directory. Now, you should be able to navigate to "week_1" and launch the Jupyter notebook called "Week 1- Getting Ready- Environments, Python, Jupyter, pandas, and Credentials". 

While the web page you launch from your local machine will look identical to the one in GitHub, it is fundamentally different. Now instead of a static web page, you will have an interactive page with all the power of Python!

To test your notebook, select the code block above so that there is a black outline around it and then select "Cell" and "Run" from the Jupyter menu. You should see the output of this code right below the code block.

Now try the same with the code block below:
      

In [None]:
import this

In the last code block, you imported a library which then printed some style hints for programming in Python. In the next code block, replace "string" with "Hello World!" and run that cell. You can also run code by using:

* <code>Alt-Enter</code> runs the current cell and inserts a new one below.
* <code>Ctrl-Enter</code> runs the current cell and enters command mode.

In [None]:
print("string")

As you can see, any code block is a live, interactive version of Python. More specifically, each notebook page is an instance of Python. So, if you save a variable in one code block, it is available in any other code block on the page. Try the following code to see this in action.

In [6]:
# This is an example of assigning a variable in one code block
# Note that running this code will not produce any output
my_var = 3.141592654

In [7]:
# Here in a different code block, if the variable was assigned, 
# it is available in the python "namespace" in this notebook.
print(my_var)

3.141592654


Wondering what a "namespace" is? Put simply, a namespace is the system that Python uses to keep track of the different objects you build up in your program. Objects can include variables like <code>my_var</code> or the various modules (libraries of code you import to use in your program). IPython provides a nice function for seeing what is in the namespace of your notebook or console (if you are using Qtconsole). Run the IPython-specific command <code>whos</code> in the code block below.

In [13]:
# whos magic function
whos

Variable               Type        Data/Info
--------------------------------------------
StrictVersion          classobj    distutils.version.StrictVersion
check_version_number   function    <function check_version_n<...>er at 0x0000000003D5C0B8>
my_var                 float       3.141592654
pkg_resources          module      <module 'pkg_resources' f<...>g_resources\__init__.py'>
socket                 module      <module 'socket' from 'C:<...>Anaconda\lib\socket.pyc'>
success                bool        True
this                   module      <module 'this' from 'C:\U<...>n\Anaconda\lib\this.pyc'>
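
For comparison, a rough plain-Python analogue of this listing can be built from the built-in <code>globals()</code> function, which returns the current namespace as a dictionary. This is only a sketch of the idea, not how IPython implements <code>whos</code>; the helper name <code>describe_namespace</code> is hypothetical.

```python
def describe_namespace(namespace):
    """Return sorted (name, type name) pairs for the public names in a namespace.

    Hypothetical helper: names starting with an underscore are skipped,
    which hides Python's own bookkeeping entries such as __name__.
    """
    rows = []
    for name in sorted(namespace):
        if name.startswith("_"):
            continue
        rows.append((name, type(namespace[name]).__name__))
    return rows


my_var = 3.141592654

# globals() is the dictionary behind the current module's namespace.
for name, type_name in describe_namespace(globals()):
    print("{:<25}{}".format(name, type_name))
```

Run in a fresh session, the listing includes <code>my_var</code> as a float alongside the helper function itself, mirroring the first two columns of the <code>whos</code> table.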


<code>whos</code> is technically a "magic" function, an IPython feature that makes command-line-style functions available in an interactive Python session. If we were using the base Python language and ran a script with <code>whos</code> as a single line of code, the script would fail with a <code>NameError</code> because Python would treat it as an undefined variable. There are a variety of other useful magics in IPython, including the standard Linux-like file and directory operations:

* <code>ls</code>: list information about the files in the current directory
* <code>pwd</code>: print the working directory
* <code>cd</code>: change directory

Try running some of these in the code block below. 

In [19]:
# Enter and run some file and directory commands here.
# You can always change and rerun code in any block.
ls
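
Because these magics exist only inside IPython, a plain Python script needs another route to the same information. The standard library's <code>os</code> module offers rough equivalents, sketched below. The helper name <code>list_directory</code> is hypothetical, and the output will differ from the magics (for example, <code>os.listdir</code> returns bare names rather than a formatted listing).

```python
import os


def list_directory(path="."):
    """Rough analogue of ls: the sorted names of the entries in a directory."""
    return sorted(os.listdir(path))


start = os.getcwd()        # like pwd: the current working directory
print(start)
print(list_directory())    # like ls: entries in the current directory

os.chdir(os.pardir)        # like "cd ..": move up to the parent directory
print(os.getcwd())

os.chdir(start)            # return to where we started
```

Saving the starting directory and changing back at the end keeps later cells or scripts from being surprised by a different working directory.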

There are many resources on the Internet that will provide you with more information about Jupyter notebooks and how to use them. The official docs are excellent and a great place to start: http://ipython.readthedocs.org/en/stable/interactive/tutorial.html