# Overview

This lesson introduces Python as an environment for data analysis and visualization. The materials are based on the Data Carpentry [Python for Ecologists lesson](http://www.datacarpentry.org/python-ecology-lesson/). However, the lesson focuses on general analysis and visualization of tabular data and is not specific to ecologists or ecological data. As Data Carpentry explains:

> Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain.

**At the end of this lesson, you will be able to:**

- Read and write basic Python code;
- Import and export tabular data with Python;
- Subset and filter tabular data;
- Understand different data types and data formats;
- Understand pandas Data Frames and how they help organize tabular data;
- Devise and interpret data processing workflows;
- Automate your data cleaning and analysis with Python;
- Visualize your data using matplotlib and pandas;
- Connect to a SQLite database using Python.

This lesson will not prepare you to use Python as a general purpose programming language; there are some parts of the language we won't have time to cover. However, at the end of this lesson, you will have a good grasp of Python syntax and be well-prepared to learn the rest of the language, if you desire to do so. **Even without seeing all of the Python programming language, you will be prepared to analyze and visualize data in Python using pandas and matplotlib.**

## License

As with [the Data Carpentry ecology lesson](http://www.datacarpentry.org/python-ecology-lesson/license/), this lesson is licensed for open use under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).

# Introduction to the Python Programming Language

Python is a general purpose programming language that allows for the rapid development of scientific workflows. Python's main advantages are:

- It is open-source software, supported by the [Python Software Foundation](https://www.python.org/psf/);
- It is available on all platforms, including Windows, Mac OS X, and GNU/Linux;
- It can be used to program any kind of task (it is a *general purpose* language);
- It supports multiple *programming paradigms* (a fancy term computer scientists use to describe the different ways people like to design software);
- **Most importantly, it has a large and diverse community of users who share Python code they've already written to do a wide variety of things.**

## The Python Interpreter

The only language that computers really understand is machine language, or binary: ones and zeros. Anything we tell computers to do has to be translated to binary for computers to execute.

Python is what we call an *interpreted language.* This means that Python code is translated into instructions the computer can execute as it is being read. This distinguishes Python from languages like C or C++, which have to be *compiled* to machine code *before* they are run. The details aren't important to us; **what is important is that we can use Python in two ways:**

- We can use the Python interpreter in **interactive mode;**
- Or, we can execute Python code that is stored in a text file, called a script.

Let's try out the Python interpreter.

In [1]:
print('Hello, world!')

Hello, world!


Alternatively, we could save that one line of Python code to a text file with a `*.py` file extension and then execute that file.
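
As a minimal sketch, a script is just the same kind of code saved in a text file. The file name `hello.py` and the variable name `message` are made up for illustration. Assuming Python is installed and available in your terminal, running `python hello.py` would display the greeting.

```python
# Contents of a hypothetical script file named hello.py
message = 'Hello, world!'

# In a script, print() is needed to display a value
print(message)
```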

## Python Data Types

### Strings, Integers, and Floats

In [4]:
text = 'Data Carpentry' # A character string
number = 42 # An integer number
pi = 3.14159265 # A floating-point number or "float"

Here, we've assigned data to **variables** using the **assignment operator** or equal sign. **The process of assignment takes a value and stores it under a name that we make up.** This way, we can use that stored value again by calling its name.
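
The short sketch below illustrates assignment a little further. It reuses the three variables from the cell above; the names `doubled` and `counter` are made up for this example. The built-in `type()` function reports what kind of value a name refers to.

```python
text = 'Data Carpentry'
number = 42
pi = 3.14159265

# type() reports what kind of value a name currently refers to
print(type(text))    # <class 'str'>
print(type(number))  # <class 'int'>
print(type(pi))      # <class 'float'>

# A stored value can be reused in a new expression
doubled = number * 2
print(doubled)  # 84

# Assigning to an existing name replaces the value stored under it
counter = 1
counter = counter + 1
print(counter)  # 2
```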

In [5]:
number

42

Note that to recover a variable's stored value, we simply type the name of the variable and hit `Enter`. (This only works in interactive mode; if we wrote a script and want it to print out a value, we have to use the `print()` function.)

Variable names can only include letters, the underscore, and numbers. However, variable names cannot start with numbers.

In [7]:
my_variable = 'some text'

### Operators

We can perform mathematical calculations in Python using the basic operators `+`, `-`, `*`, `/`, `**` (exponentiation), and `%` (modulo).

In [8]:
2 + 2

4

In [9]:
6 * 7

42

In [12]:
5 ** 2

25

In [22]:
13 % 5 # "13 modulo 5" -- The result is the remainder, 3

3
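
Division deserves a closer look. The sketch below assumes Python 3 and introduces one operator not shown above, `//` (floor division); the variable names are made up for illustration.

```python
true_division = 7 / 2    # 3.5, the / operator always gives a float in Python 3
floor_division = 7 // 2  # 3, floor division drops the remainder
remainder = 7 % 2        # 1, the remainder that floor division dropped
print(true_division, floor_division, remainder)

# Multiplication and division are evaluated before addition and subtraction;
# parentheses change the order
without_parentheses = 2 + 3 * 4  # 14
with_parentheses = (2 + 3) * 4   # 20
print(without_parentheses, with_parentheses)
```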

We can also use **comparison** and **logical operators.** These operators return Boolean values; that is, they determine or describe whether something is `True` or `False`.

In [23]:
3 > 4

False

In [26]:
5 == 5

True

In [24]:
True and True # Is it True *and* True?

True

In [25]:
True or False # Chooses the "truth-y" value between the two

True

`True` and `False`, with the first letter capitalized, are special values in Python that mean just what they say.
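
Comparison and logical operators are usually combined to ask a question about data. The example below is a small sketch using a made-up `temperature` value; it also uses `not` and `!=`, which do not appear in the cells above.

```python
temperature = 25  # a made-up measurement

is_warm = temperature > 20
is_mild = temperature > 20 and temperature < 30
is_extreme = temperature < 0 or temperature > 40

print(is_warm)            # True
print(is_mild)            # True
print(is_extreme)         # False
print(not is_extreme)     # True, not flips a Boolean value
print(temperature != 25)  # False, != means "not equal to"
```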

## Sequences

Much of Python's expressive power and flexibility comes from the way it handles **sequences.** A sequence could be a sequence of characters in a text string or a sequence of numbers.

A **list** is Python's built-in data structure for handling general, ordered sequences. Each element can be accessed by its index. **Note that, in Python, we start counting from zero, not from one.**

In [27]:
numbers = [1, 2, 3]
numbers[0]

1

Square brackets are used to **index** or **slice** a sequence: a single index returns one element, and a range of indices returns several. Above, we asked for the first (the zeroth) element of the `numbers` sequence.
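
Square brackets can also take a range of indices or a negative index. The following sketch uses a made-up list called `letters` so that `numbers` is left unchanged.

```python
letters = ['a', 'b', 'c', 'd']

first = letters[0]         # 'a'
last = letters[-1]         # 'd', negative indices count from the end
first_two = letters[0:2]   # ['a', 'b'], the stop index 2 is not included
from_second = letters[1:]  # ['b', 'c', 'd'], omitting the stop runs to the end

print(first, last)
print(first_two)
print(from_second)

# Text strings are sequences too, so the same syntax applies
word = 'Python'
print(word[0:2])  # Py
```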

A `for` loop is a useful way of accessing the elements of a sequence one at a time:

In [28]:
for number in numbers:
    print(number)

1
2
3


**Indentation is very important in Python.** Note that the second line in the above example is indented. This is Python's way of marking a block of code. It's standard to indent by 4 spaces.
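
To see how indentation decides what belongs to a loop, consider this small sketch. The `weights` values are invented for the example.

```python
weights = [12.5, 8.0, 20.5]  # made-up values

total = 0
for weight in weights:
    # Indented lines belong to the loop and run once per element
    total = total + weight
    print('Running total:', total)

# This line is not indented, so it runs once, after the loop finishes
print('Final total:', total)
```

Moving the last `print()` call into the indented block would make it run on every pass through the loop instead of once at the end.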

To add elements to the end of a list, we can use the `append()` method:

In [29]:
numbers.append(4)
print(numbers)

[1, 2, 3, 4]


Note that there is no output associated with the `append()` method; the `numbers` sequence is modified in place so we don't need to assign the result to a variable.
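
A common slip is to assign the result of `append()` to a variable anyway. The sketch below shows what that gives, and contrasts it with the `+` operator, which builds a new list. The names `result` and `longer` are made up for illustration.

```python
numbers = [1, 2, 3]

# append() changes the list itself and returns None
result = numbers.append(4)
print(result)   # None
print(numbers)  # [1, 2, 3, 4]

# Adding lists with + builds a new list and leaves the original untouched
longer = numbers + [5]
print(longer)   # [1, 2, 3, 4, 5]
print(numbers)  # [1, 2, 3, 4]
```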

**Methods are a way to interact with an object in Python.** We can invoke a method using the dot, followed by the method name and a list of arguments in parentheses. To find out what methods are available for an object, we can use the built-in `help()` function.

In [30]:
help(numbers)

Help on list object:

class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /

We can also access a list of methods using `dir`.

In [31]:
dir(numbers)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']
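
The names surrounded by double underscores are special methods that Python uses internally; the ones at the end of the list are the ones we typically call ourselves. As a rough sketch, we can filter the output of `dir()` to show only those and then try a few. The list comprehension syntax used here is not covered above, and `public_methods` is a made-up name.

```python
numbers = [1, 2, 3, 4]

# Keep only the names that do not start with an underscore
public_methods = [name for name in dir(numbers) if not name.startswith('_')]
print(public_methods)

# A few of these methods in action
print(numbers.count(2))  # 1, how many times 2 appears
print(numbers.index(3))  # 2, the position of the first 3
numbers.reverse()        # reverses the list in place
print(numbers)           # [4, 3, 2, 1]
```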

### Tuples

A tuple is similar to a list in that it's an ordered sequence of elements. However, tuples cannot be changed once created; they are "immutable." Tuples are created by placing comma-separated values inside parentheses.

In [33]:
a_tuple = (1,2,3)
another_tuple = ('rabbit', 'mongoose', 'platypus')
still_a_tuple = (1,)

# Note that lists use square brackets
a_list = [1,2,3]
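
Reading from a tuple works much like reading from a list. The sketch below shows indexing, why the trailing comma in `(1,)` matters, and unpacking a tuple into separate names. The names `just_a_number`, `first`, `second`, and `third` are made up for illustration.

```python
another_tuple = ('rabbit', 'mongoose', 'platypus')

# Tuples are indexed and sliced just like lists
print(another_tuple[0])    # rabbit
print(another_tuple[1:])   # ('mongoose', 'platypus')
print(len(another_tuple))  # 3

# The trailing comma, not the parentheses, makes a one-element tuple
still_a_tuple = (1,)
just_a_number = (1)
print(len(still_a_tuple))  # 1
print(just_a_number + 1)   # 2

# A tuple can be unpacked into separate names in one step
first, second, third = another_tuple
print(second)  # mongoose
```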

### Challenge: Tuples and Lists

1. What happens when you type `a_tuple[2] = 5` versus `a_list[1] = 5`? And why?
2. Type `type(a_tuple)` into Python; what is the object's type?