# Overview

This lesson introduces Python as an environment for reproducible scientific data analysis and programming. The materials are based on the Software Carpentry [Programming with Python lesson](http://swcarpentry.github.io/python-novice-inflammation/).

**At the end of this lesson, you will be able to:**

- Read and write basic Python code;
- Import and export tabular data with Python;
- Subset and filter tabular data;
- Understand different data types and data formats;
- Understand pandas Data Frames and how they help organize tabular data;
- Devise and interpret data processing workflows;
- Automate your data cleaning and analysis with Python;
- Visualize your data using matplotlib and pandas;
- Connect to a SQLite database using Python.

This lesson will introduce Python as a *general purpose programming language.* Python is a great programming language to use for a wide variety of applications, including:

- Natural language processing or text analysis;
- Web development and web publishing;
- Web scraping or other unstructured data mining;
- Image processing;
- Spatial data analysis;
- (And many others.)

## License

As with [the Software Carpentry lesson](http://swcarpentry.github.io/python-novice-inflammation/license/), this lesson is licensed for open use under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).

# Introduction to Python

Python is a general purpose programming language that allows for the rapid development of scientific workflows. Python's main advantages are:

- It is open-source software, supported by the [Python Software Foundation](https://www.python.org/psf/);
- It is available on all platforms, including Windows, Mac OS X, and GNU/Linux;
- It can be used to program any kind of task (it is a *general purpose* language);
- It supports multiple *programming paradigms* (a fancy term computer scientists use to describe the different ways people like to design software);
- **Most importantly, it has a large and diverse community of users who share Python code they've already written to do a wide variety of things.**

## The Python Interpreter

The only language that computers really understand is machine language, or binary: ones and zeros. Anything we tell computers to do has to be translated to binary for computers to execute.

Python is what we call an *interpreted language.* This means that a program called the Python interpreter translates our code into instructions the computer can execute as it reads the code. This distinguishes Python from languages like C, C++, or Java, which have to be *compiled* in a separate step *before* they are run. The details aren't important to us; **what is important is that we can use Python in two ways:**

- We can use the Python interpreter in **interactive mode;**
- Or, we can execute Python code that is stored in a text file, called a script.

### Jupyter Notebook

For this lesson, we'll be using the Python interpreter that is embedded in Jupyter Notebook. Jupyter Notebook is a fancy, browser-based environment for **literate programming,** the combination of Python scripts with rich text for telling a story about the task you set out to do with Python. This is a powerful way to collect the code, the analysis, the context, and the results in a single place.

The Python interpreter we'll interact with in Jupyter Notebook is the same interpreter we could use from the command line. To launch Jupyter Notebook:

- In GNU/Linux or Mac OS X, launch the Terminal, type `jupyter notebook`, and press ENTER.
- In Windows, launch the Command Prompt, type `jupyter notebook`, and press ENTER.

Let's try out the Python interpreter.

In [1]:
print('Hello, world!')

Hello, world!


Alternatively, we could save that one line of Python code to a text file with a `*.py` extension and then execute that file. We'll see that towards the end of this lesson.

## First Steps with Python

In *interactive mode,* the Python interpreter does three things for us, in order:

1. Reads our input;
2. Evaluates or executes the input command, if it can;
3. Prints the output for us to see, then waits for the next input.

This is called a **read, evaluate, print loop (REPL).** Let's try it out.

In [2]:
2 * 5

10

We can use Python as a fancy calculator, like any programming language.
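To make the calculator idea concrete, here is a short sketch of Python's basic arithmetic operators. The numbers are arbitrary and chosen only for illustration; `print()` is used so that every result is shown, not just the last one in the cell.

```python
# Basic arithmetic operators in Python 3.
print(2 + 5)    # addition: prints 7
print(7 - 2)    # subtraction: prints 5
print(7 / 2)    # division always gives a decimal number: prints 3.5
print(7 // 2)   # floor division drops the fractional part: prints 3
print(7 % 2)    # modulo gives the remainder: prints 1
print(2 ** 5)   # exponentiation (2 to the power of 5): prints 32
```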

When we perform calculations with Python, or run any Python statement that produces output, if we don't explicitly save that output somewhere, then we can't access it again. Python prints the output to the screen, but it doesn't keep a record of the output.

**In order to save the output of an arbitrary Python statement, we have to assign that output to a variable. We do this using the assignment operator, the equal sign (`=`):**

In [3]:
number = 2 * 5

Notice there is no output associated with running this command. That's because the output we saw earlier has instead been saved to the *variable* named `number`.

If we want to retrieve this output, we can ask Python for the value associated with the variable named `number`.

In [4]:
number

10

As we saw earlier, we can also use the **function** `print()` to explicitly print the value to the screen.

In [5]:
print(number)

10
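Once a value has a name, that name can stand in for the value in later statements. The short sketch below reuses `number` from above; the second variable name, `doubled`, is only illustrative and not part of the lesson.

```python
number = 2 * 5          # the same assignment as earlier: number is 10
doubled = number * 2    # a variable can be used wherever its value could be
print(doubled)          # prints 20

number = number + 1     # assigning again replaces the old value
print(number)           # prints 11
print(doubled)          # still prints 20: it was computed from the old value
```

Note that `doubled` does not change when `number` is reassigned. Python stored the result of `number * 2` at the moment that line ran, not a link back to `number`.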


## Importing Libraries

What are some tasks you're hoping to complete with Python? Alternatively, what kinds of things have you done in other programming languages?

**When you're thinking of starting a new computer-aided analysis or building a new software tool, there's always the possibility that someone else has created just the piece of software you need to get your job done faster.** Because Python is a popular, general-purpose, and open-source programming language with a long history, there's a wealth of completed software tools out there written in Python for you to use. Each of these software *libraries* extends the basic functionality of Python to let you do new and better things.
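As a small illustration of how a library extends Python, the sketch below imports `math`, a module that ships with Python's standard library. The choice of `math` is an assumption made for this example and is not necessarily the library the lesson goes on to use.

```python
import math  # make the functions and constants in the math module available

root = math.sqrt(16)       # square root, which plain Python has no built-in function for
print(root)                # prints 4.0

rounded_down = math.floor(2.7)
print(rounded_down)        # prints 2

print(math.pi)             # the module also provides constants such as pi
```

After the `import` statement, everything the library provides is reached through its name followed by a dot, as in `math.sqrt`.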

**The Python Package Index (PyPI)** is the place to start when you're looking for a piece of Python software to use. We'll talk about that later.

For now, we'll...