# WORKSHOP 1

> Start by doing what's necessary; then do what's possible; and suddenly you are doing the impossible.  ~ Francis of Assisi (1182 - 1226)

## Tools
In this workshop we're going to learn a little about the tools and language we'll use repeatedly through the summer.  First, the tool chain we're going to work with:

| Tool | Purpose |
|------|---------|
|Python|This is the language we'll be using this summer.  Let's make sure it is installed and ready to use.  If you went through the [installation](../README.md) tutorial, you should be good to go. |
|PyCharm|This is our editor (one of many you could use).  It is fully featured and pretty friendly once you get the hang of it.  PyCharm is excellent for writing longer, more complex code in a project-like environment.  Let's see that everything is OK with the [installation of PyCharm](../README.md). |
|Github|[Github](https://github.com) is an excellent place to share, find and contribute to open source projects of all kinds.  There are millions of repositories representing everything from scientific projects to programming languages, web servers and websites.  Watch out ... you can get lost on there! |
|Jupyter Notebooks|We'll affectionately use NB as shorthand for Jupyter Notebooks.  This tool is the backbone of data exploration in Python these days.  With support for so many interesting things, you'll soon find out that you can do a lot of great things and share them with others when you're done. Support for NB is OK in PyCharm, but we'll mostly work with it directly in the browser, on a server that we'll start from within PyCharm.|



## The Python Language
Python is the language of choice these days for many of the data science and analytics tasks we're faced with on a day-to-day basis.  There is a basic assumption that you have had _some_ exposure to at least _one_ programming language, but if you need to brush up on some basic concepts, check out the [resources page](../resources.md) for more places to learn about Python.

To get started, let's review the [Python Primer](primer_python.md) that will go over the **very basics** of the language and get your feet wet with what it is all about and what it can do.



## What Python is about ...
First and foremost, Python is about readability.  You will soon find out that what makes Python interesting is _not_ its syntax.  Python is about expressing your ideas in the most direct way you can, and this often runs counter to languages you may have worked with before.  If it looks that simple in Python, accept that truth and move on.  You will soon accept many truths of simplicity that you have been ignoring ... and you just might fall for Python as many, many others have.

## Basic Data Types

### Integers and Floats
As with almost all languages, support for numbers (integers, floats and others) is basic to the operation of the language.  In Python, you'll need to know about **integers**:


In [6]:
# this is an integer
silly_integer = 1

another_silly_integer = 129834

When you need a little more precision, you will want to work with **floating point** numbers.  So if you need a decimal in your number, try


In [11]:
# this is a float
silly_float = 1.01

# a trailing decimal point is enough to make a float
another_silly_float = 1.

# and another ...
more_interesting_float = 78.23479478099

print silly_float
print silly_integer
print another_silly_float

1.01
1
1.0


As you would expect, addition, subtraction and other math operations work intuitively:

In [13]:
a = 1
b = 1.25
c = a + b
print c # ==> 2.25 as expected

c = a / b
print c # ==> 0.8 as expected

2.25
0.8
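
One detail worth a quick sketch is division between two integers. The workshop cells use Python 2 `print` statements, and in Python 2 `/` between two integers drops the remainder, while in Python 3 it returns a float. The example below is an illustration only (the variable names are made up here) and sticks to forms that behave the same in both versions, using `print(...)` with a single argument.

```python
# Mixing an integer and a float produces a float.
a = 1
b = 1.25
total = a + b        # 2.25

# Floor division (//) always discards the fractional part.
whole = 7 // 2       # 3

# Making one operand a float keeps the fractional part.
exact = 7 / 2.0      # 3.5

# type() reports what kind of number you are holding.
a_is_int = type(a) is int
total_is_float = type(total) is float

print(total)
print(whole)
print(exact)
print(a_is_int)
print(total_is_float)
```

If you want the fractional part, make sure at least one side of `/` is a float.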


### Strings
Strings are another crucial type built in to almost all languages.  In Python, they work as expected:

In [15]:
first_name = "Bob"
last_name = "Jones"

print first_name
print last_name

Bob
Jones


To take two strings and concatenate them (merge 2 or more strings together end-to-end):

In [16]:
fullname = first_name + last_name
print fullname # ==> "BobJones"

fullname = first_name + " " + last_name
print fullname # ==> "Bob Jones"

BobJones
Bob Jones
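
A common stumbling block is that `+` only joins strings to strings. The following is a minimal sketch (the `age` value is hypothetical) showing `str()` for converting a number first, and `join()` as an alternative when there are several pieces.

```python
first_name = "Bob"
last_name = "Jones"
age = 42  # hypothetical value, for illustration only

# "+" only joins strings, so convert the number first.
label = first_name + " is " + str(age)

# join() glues a list of strings together with a separator.
fullname = " ".join([first_name, last_name])

print(label)     # Bob is 42
print(fullname)  # Bob Jones
```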


Later we will learn about some other useful operations on strings.


## Data Structures
As a Pythonista padawan, there are three data structures you will need to know well:

* tuples,
* lists, and
* dictionaries.

### Tuples
A tuple is an ordered "list" of items, but unlike a list it can't be changed once created (this is referred to as being "immutable").  So it's not "really" a list, but "list" gives you a point of reference.

In [20]:
# this is an empty tuple
tpl = ()

# a tuple can't be added to after creation; make a new one with what you want
tpl1 = (1, 2, 1)

# you can also make a new tuple from existing tuples
tpl2 = (4, 5)
tpl3 = tpl1 + tpl2 # ==> (1, 2, 1, 4, 5)

print tpl3

(1, 2, 1, 4, 5)
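
To make "immutable" concrete, here is a small sketch of what happens when you try to change a tuple in place. The variable names are illustrative only.

```python
tpl = (1, 2, 1)

# Reading by index works just like a list.
first = tpl[0]

# Assigning to an index is not allowed and raises a TypeError.
try:
    tpl[0] = 99
    changed = True
except TypeError:
    changed = False

# "Changing" a tuple really means building a new one.
longer = tpl + (4,)   # the trailing comma makes a one-item tuple

print(first)    # 1
print(changed)  # False
print(longer)   # (1, 2, 1, 4)
```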


Tuples can contain whatever you'd like: numbers, strings, other objects (lists, dicts, other tuples), etc.  And you can mix and match.


In [25]:
# a tuple of strings
tpl1 = ("you", "are", "here")

tpl2 = (1, "you", (0, 0, 0))

tpl3 = (1, ["fred", "bob"], (3, 2))

### Lists
Lists are fun.  They are sort of like tuples, except you can do a lot more with them and you can change them (they are referred to as "mutable").

In [28]:
# this is an empty list
lst = []

# this is a list with a single item
lst = [1]

# this is a list with several items
lst = [1, 2, 3, 4]

# but a list can contain just about anything ... including other lists
lst = [[1, 2, 3],[1],[4, [5, 6], 7]]
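
As a brief sketch of what "mutable" buys you, the example below changes a list in place and then reaches into a nested list. It uses only built-in list methods; the variable names are illustrative.

```python
lst = [1, 2, 3, 4]

lst.append(5)      # add to the end
lst[0] = 10        # replace an item in place
last = lst.pop()   # remove and return the last item

nested = [[1, 2, 3], [1], [4, [5, 6], 7]]
# third item -> its second item -> that list's first item
inner = nested[2][1][0]

print(lst)    # [10, 2, 3, 4]
print(last)   # 5
print(inner)  # 5
```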

### Dictionaries

Dictionaries are even more fun - they map a _key_ to a _value_ ... and are not unlike a _real_ dictionary, except that in a Python dictionary each key can appear only once.  Thus, _unlike_ a real dictionary, the word "book" will only have a single entry.  Furthermore, the dictionary value can be nearly anything - another dictionary, list, etc.


#### Initialization
There are a variety of ways to make dictionaries:


In [32]:
# this is an empty dictionary
dct = {}

# this is a dictionary with a single key as a string and an integer value
dct = {'book': 0}

# here is another way to make a dictionary with a list of tuples
dct = dict([('book', 0)])

# and a really nice way to make a dictionary
dct = dict(a=1, b=[1, 2], c={'book': 0}, d={'cat': 'dog'})

print dct

{'a': 1, 'c': {'book': 0}, 'b': [1, 2], 'd': {'cat': 'dog'}}
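
The "one entry per key" rule is easiest to see by assigning to the same key twice. This is a minimal sketch; the keys and values are made up for illustration.

```python
dct = {'book': 0}

# Assigning to an existing key replaces its value.
dct['book'] = 1

# Assigning to a new key adds an entry.
dct['pen'] = 2

# A repeated key in a literal keeps only the last value.
dup = {'book': 0, 'book': 5}

print(len(dct))     # 2
print(dct['book'])  # 1
print(dup)          # {'book': 5}
```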


#### Data access
Accessing a dictionary is easy: just reference the key whose value you'd like to access:


In [34]:
dct = dict(a=1, b=[1, 2], c={'book': 0}, d={'cat': 'dog'})

print dct['d'] # ==> {'cat': 'dog'}
print dct['a'] # ==> 1

# there is also the get() method
print dct.get('a') # ==> 1
print dct.get('p') # ==> None

{'cat': 'dog'}
1
1
None


**A note about the get() method:**  One thing I wish I had used more often in the beginning was `dict.get()`.  When accessing a value by its key, if you try to access a key that does not exist, you will get a `KeyError` ... this is actually a good thing in some contexts, but in others, you may just want to know whether the key exists without catching and handling the exception.  `get()` does this beautifully: it returns `None` for a missing key.  Additionally, if you want a value other than the default `None` to be returned, pass that value as the second parameter:

In [36]:
dct['p'] # ==> KeyError: 'p'

KeyError: 'p'

In [37]:
dct.get('p', -1) # ==> -1 instead of the default None

-1
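
One place where `get()` with a default tends to pay off is tallying. The sketch below assumes a small, made-up list of words and counts them. It also shows `in`, which tests for a key without raising an exception.

```python
dct = dict(a=1, b=[1, 2])

# "in" tests for a key without raising an exception.
has_a = 'a' in dct
has_p = 'p' in dct

# With a default of 0, a word seen for the first time starts at 0.
words = ["cat", "dog", "cat"]   # hypothetical sample data
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print(has_a)          # True
print(has_p)          # False
print(counts['cat'])  # 2
print(counts['dog'])  # 1
```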

## Operating over data structures
I am assuming here (for now) that you have a basic understanding of looping over things (at least abstractly).  Let's say, for example, you are interested in adding up a collection of numbers.  The first step is to visit each number in turn.

Let's try:

In [38]:
# starting with a tuple of numbers
ton = (1, 2, 3, 4, 5)

What you want to do now is just print the numbers in that tuple:

In [40]:
for number in ton:
    print number

1
2
3
4
5


How about a list:

In [41]:
lon = [1, 2, 3, 4, 5]
for number in lon:
    print number

1
2
3
4
5
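
Coming back to the original goal of adding the numbers, here is one way it could look: a running total built up inside the loop, next to the built-in `sum()`, which gives the same result. The name `total` is just an illustration.

```python
lon = [1, 2, 3, 4, 5]
ton = (1, 2, 3, 4, 5)

# Accumulate a running total by hand.
total = 0
for number in lon:
    total = total + number

# The built-in sum() does the same and also accepts tuples.
builtin_total = sum(ton)

print(total)          # 15
print(builtin_total)  # 15
```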


Dictionaries ... anyone?

In [50]:
dct = {'a': 1, 'b': 2, 'c': 3}
for d in dct:
    print d

a
c
b


These are the **keys**, but what about the **values**?

When using a simple loop over the whole dictionary, you can use the dictionary method `iteritems()` and build the loop like this instead:

In [54]:
for d, val in dct.iteritems():
    print d, val
    
print 
print "Printing the square of the value in the dictionary key:"
for d, val in dct.iteritems():
    print d, val**2

a 1
c 3
b 2

Printing the square of the value in the dictionary key:
a 1
c 9
b 4
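
Notice that the keys above did not come out in alphabetical order. If order matters, one option is to sort the pairs first. The sketch below uses `items()`, which is available in both Python 2 and Python 3 (`iteritems()` exists only in Python 2), together with the built-in `sorted()`.

```python
dct = {'a': 1, 'b': 2, 'c': 3}

# items() yields (key, value) pairs; sorted() orders them by key.
pairs = sorted(dct.items())

# Build a new dictionary holding the square of each value.
squares = {}
for key, val in pairs:
    squares[key] = val ** 2

values = sorted(dct.values())

print(pairs)         # [('a', 1), ('b', 2), ('c', 3)]
print(values)        # [1, 2, 3]
print(squares['c'])  # 9
```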


### Python String Basics

A _string_ in Python is defined much like that of most other languages.

In [57]:
s = "This is a string"

another_string = "And this is another string"

### Concatenating strings

Concatenating strings is very easy - just use `+` between the two (or more) strings you want to concatenate.  The result is something like this:

In [58]:
print s + '!'
print s + s
print s + another_string
print s + '! ' + another_string + "."

This is a string!
This is a stringThis is a string
This is a stringAnd this is another string
This is a string! And this is another string.


### Accessing string characters

Strings can be accessed much like lists in the sense that you can obtain a single character by accessing the zero-based index of that character:

In [59]:
print s[0]
print s[1]
print s[2]

T
h
i


### Useful string operations
Sometimes you might want to get the **length of a string**.  Simply use `len` to do that.

Other times you might want to get a subset of the string (substring), say the first few characters.  Try using the `[start_index:end_index]` syntax to do that.  A nice feature of Python is you can use negative indices to work from the _end_ of the string.  Also, if you leave the `start_index` empty, it is assumed to be the first character (0th index).  Similarly, if you leave the `end_index` empty, the slice runs through the last character.  Note that the character at `end_index` itself is not included: `s[0:4]` covers indices 0 through 3.

Let's see some examples:

In [60]:
# length
print len(s)
print len(another_string)
print len(s + another_string)

# substring selection
print s[:4]
print s[-4:]

print s[0:10]
print s[-8:-1]

16
26
42
This
ring
This is a 
a strin
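
The slicing rules can be checked with a few small comparisons. This is a sketch using the same string `s` as above. The other variable names are illustrative.

```python
s = "This is a string"

# The end index is exclusive: s[0:4] covers indices 0, 1, 2 and 3.
first_word = s[0:4]

# Omitting an index means "from the start" or "through the end",
# so the two halves always add back up to the whole string.
rejoined = s[:4] + s[4:]

# A negative index counts back from the end.
last_char = s[-1]
last_word = s[-6:]

# The length of a slice is end_index - start_index.
middle = s[5:7]

print(first_word)     # This
print(rejoined == s)  # True
print(last_char)      # g
print(last_word)      # string
print(len(middle))    # 2
```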


### Iterating over a string
Iterating over a string works much like iterating over a list: `for` is your tool.

In [61]:
for c in s:
    print c

T
h
i
s
 
i
s
 
a
 
s
t
r
i
n
g


Let's just print the `s` characters.

In [62]:
for c in s:
    if c == 's':
        print c

s
s
s
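
If you only need to know how many there are, or where they sit, two built-ins can help. The sketch below uses `str.count()` and `enumerate()`. The names `how_many` and `positions` are made up for this example.

```python
s = "This is a string"

# count() reports how many times a character (or substring) occurs.
how_many = s.count('s')

# enumerate() pairs each character with its zero-based index.
positions = []
for index, c in enumerate(s):
    if c == 's':
        positions.append(index)

print(how_many)   # 3
print(positions)  # [3, 6, 10]
```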


### Finding substrings: The easy way

You won't have to write your own substring searching functions in Python.  One last time saver: checking whether a string contains a substring can be done with `in`:

In [63]:
print 'c' in s
print 's' in s
print 'is' in s
print 'is ' in s
print 'ring' in s
print 's is a' in s

False
True
True
True
True
True
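
While `in` answers "is it there?", the string method `find()` answers "where?". A short sketch follows, also showing `startswith()` and `endswith()`. The variable names are illustrative.

```python
s = "This is a string"

# find() returns the index of the first match, or -1 if there is none.
first_is = s.find('is')
missing = s.find('xyz')

# startswith() / endswith() test the two ends of the string.
starts = s.startswith('This')
ends = s.endswith('ring')

print(first_is)  # 2
print(missing)   # -1
print(starts)    # True
print(ends)      # True
```

Note that the first `is` found is the one inside `This`, which is why the index is 2 rather than 5.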


You can continue to play ... and learn more about strings, but this should get you started.

## Github in 10 minutes or less
[Github](https://www.github.com) is a system / web application / platform that provides a beautiful way to store, view, manage, share and revise code.  While the primary content on Github is _running_ software, whether that be in C, Java, HTML, Python or the many hundreds of other languages in popular use today, it can also be a place for traditional text files (data, papers) and other binary data (Word/Excel documents, images, etc.).  Github's strength is its ability to facilitate software collaboration, and today it is the premier place on the web for large scale open-source, collaborative software projects.  Though open and public projects are Github's forte, it also lets you host private projects and decide later whether it is appropriate to make such projects public or not.

### Revision Control in 60 seconds
Github is built on top of what is called **git**, which is a __revision control system__.  The core concept behind such systems is that they manage your text-based code files and allow you to keep track of the changes (revisions) to those files.  In a system like **git** you can invite others to work on your code as well, which can allow more than one person to make contributions to the code.  What is even more useful is that such changes can be tracked across a project so that those contributions and changes can be seen, reviewed, modified and otherwise managed.  Thus, when someone (even you) makes a change to a file (or files) and makes those changes known to the revision control system, others with access to those files can not only see those changes, but integrate them into their own.

Git provides a great deal of functionality which might appear to be complex and overwhelming.  Most of the functionality you will need in the basic daily use case is very narrow, so don't be discouraged by the depth and complexity of git's documentation.

### What's Git all about?
In revision control systems your code is typically stored in a **repository** and that repository most often lives and is managed on a remote server.  This is done for a variety of reasons, one of which is to provide others access to your code.  Git has the advantage that your code and its revisions are controlled locally, and only when you're ready are those changes put onto the remote server.  The relationship between Git and Github is that Github provides a nice visual shell over Git and also serves as the remote git server, so you don't have to think about it when you want to share your code with others (or have a remote copy of it).

You will always need to install git on your local system, but you don't ever _have to have_ a remote server.  Without one, though, the code in such a local project will never be seen by, or synchronized with (backed up on), a remote server, making it difficult (though not technically impossible) to share with others in the general case.

### Projects <=> Repositories
In git your code lives in a repository.  This is simply the location / directory for your files and it lives (initially) local to your file system.  You can create a repository anywhere on your file system, but it is often a good practice to keep git repositories in a common location when it makes sense to do so.  Furthermore, there is no limit to the number of repositories or the number of files under the control of a repository.

You can consider a repository as a project containing a single focus of interest.  For example, you might think of your repository as containing a single complete software application you're building.  Similarly, you might think of it as a place for all the files of a single course over a semester, or a single topic of interest.  Your concept of "project" isn't really restricted, though there are some best practices.

In Github, as in git, the anchor of a project is the repository.

## A Primer on Text-based Data Files

For starters, many of the files you'll be working with through the summer (and beyond) are in some text format or another.  The most popular (generally) are CSV and TSV (comma-separated and tab-separated values, respectively).  NetCDF is a very common binary format used in the Geosciences, and we'll touch on it if we have time.  Python makes doing basic operations on text files very straightforward, hence its popularity as a text processing language.
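
As a small taste, here is a sketch of parsing comma- and tab-separated text with Python's standard `csv` module. The station names and temperatures are made-up sample data, and the lines are held in a list rather than read from a real file so the example is self-contained.

```python
import csv

# Hypothetical sample lines, standing in for the contents of a CSV file.
lines = [
    "station,temp_c",
    "A,12.5",
    "B,9.0",
]

# csv.reader accepts any iterable of strings, such as a list or an open file.
rows = list(csv.reader(lines))
header = rows[0]
records = rows[1:]

# Every field arrives as a string, so convert numbers explicitly.
temps = [float(record[1]) for record in records]

# Tab-separated data only needs a different delimiter.
tsv_rows = list(csv.reader(["A\t12.5"], delimiter="\t"))

print(header)    # ['station', 'temp_c']
print(temps)     # [12.5, 9.0]
print(tsv_rows)  # [['A', '12.5']]
```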

We'll get started with a review and shallow dive [into processing text files in Python](primer_files.md).