## Why SciPy?

The NumPy and SciPy libraries make up the core of the Scientific Python ecosystem.
The SciPy software library implements a set of functions for processing scientific data, such as statistics, signal processing, image processing, and function optimization.
SciPy is built on top of NumPy, the Python numerical array computation library.
Building on NumPy and SciPy, an entire ecosystem of apps and libraries has grown dramatically over the past few years, spanning a broad spectrum of disciplines that includes astronomy, biology, meteorology and climate science, and materials science, among others.

This growth shows no sign of abating.
In 2014, Thomas Robitaille and Chris Beaumont
[documented](https://nbviewer.jupyter.org/github/ChrisBeaumont/adass_proceedings/blob/master/Mining%20acknowledgments%20in%20ADS.ipynb)
Python's growing use in astronomy. Here's what we found when we [updated](https://gist.github.com/jni/3339985a016572f178d3c2f18e27ec0d) their
plot in the second half of 2016:

![Python in astronomy](../figures/python-in-astronomy.png)

It is clear that SciPy and related libraries will be driving much of scientific data analysis for years to come.

As another example, the
[Software Carpentry organization](http://software-carpentry.org/), which
teaches computational skills to scientists, most often using Python, currently
cannot keep up with demand.

### What is the SciPy Ecosystem?

> "SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering."
>
> — http://www.scipy.org/

The SciPy ecosystem is a loosely defined collection of Python packages.
In Elegant SciPy we will meet many of its main players:

* **NumPy** is the foundation of scientific computing in Python. It
provides efficient numeric arrays and wide support for numerical computation, including linear algebra, random numbers, and Fourier transforms.
NumPy's killer feature is its "N-dimensional array", or `ndarray`.
These data structures store numeric values efficiently and define a grid in any number of dimensions (more about this later).
http://www.numpy.org/
* **SciPy**, the library,
is a collection of efficient numerical algorithms for domains such as signal processing, integration, optimization, and statistics.
These are wrapped in user-friendly interfaces.
http://www.scipy.org/scipylib/index.html
* **Matplotlib**
is a powerful package for plotting in two dimensions (and basic 3D). It draws its name from its MATLAB-inspired syntax.
http://matplotlib.org/
* **IPython**
is an interactive interface for Python, which allows you to quickly interact with your data and test ideas.
https://ipython.org/
* The **Jupyter**
notebook runs in your browser and allows the construction of rich documents that combine code, text, mathematical expressions, and interactive widgets[^literate_computing].
In fact, to produce this book, the text is converted to Jupyter notebooks and executed (that way, we know that all the examples execute correctly).
Jupyter started out as an IPython extension, but now supports multiple languages, including Cython, Julia, R, Octave, Bash, Perl and Ruby.
http://jupyter.org
* **pandas**
provides fast, columnar data structures in an easy-to-use package.
It is particularly suited to working with labelled data sets such as tables or relational databases, and for managing time series data and sliding windows.
Pandas also has some handy data analysis tools for data parsing and cleaning, aggregation, and plotting.
http://pandas.pydata.org/
* **scikit-learn**
provides a unified interface to machine learning algorithms.
http://scikit-learn.org/
* **scikit-image**
provides image analysis tools that integrate cleanly with the rest of the SciPy ecosystem.
http://scikit-image.org/

[^literate_computing]: "Literate computing" and computational reproducibility: IPython in the age of data-driven journalism, http://blog.fperez.org/2013/04/literate-computing-and-computational.html

There are many other Python packages that form part of the SciPy ecosystem, and we will see some of them too.
Although this book will focus on NumPy and SciPy,
it is the many surrounding packages that make Python a powerhouse for
scientific computing.

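To make the `ndarray` description in the list above more concrete, here is a minimal sketch. The variable names and values are made up for illustration, and the only calls used are standard NumPy functions (`np.array`, `np.arange`, `np.zeros`, `np.linalg.solve`).

```python
import numpy as np

# A 1-D array: a vector of three values.
vector = np.array([1.0, 2.0, 3.0])

# A 2-D array: the integers 0 to 5 arranged on a 2 x 3 grid.
grid = np.arange(6).reshape(2, 3)

# A 3-D array of zeros, for example a stack of 4 tiny images of 2 x 3 pixels.
stack = np.zeros((4, 2, 3))

# Every array knows its number of dimensions and its shape.
print(vector.ndim, grid.ndim, stack.ndim)
print(grid.shape, stack.shape)

# Arithmetic applies to every element at once, with no explicit loop.
doubled = grid * 2

# Linear algebra: solve the small system A @ x = b.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)
print(x)
```

The same array type is what the other packages in the ecosystem pass around, which is a large part of why they combine so easily.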
## SciPy Ecosystem and Community

SciPy is a major library with a lot of functionality.
Together with NumPy, it is one of Python's killer apps.
It has launched a vast number of related libraries that build on this functionality, many of which you'll encounter throughout this book.

The creators of these libraries, and many of their users, gather at many events and conferences around the world.
These include the yearly SciPy conference in Austin (USA), EuroSciPy, SciPy India, PyData and others.
We highly recommend attending one of these, and meeting the authors of the best scientific software in the Python world.
If you can't get there, or simply want a taste of these conferences,
many publish their talks online (see, e.g., [https://www.youtube.com/user/EnthoughtMedia/playlists](https://www.youtube.com/user/EnthoughtMedia/playlists)).

### Free and open-source software (FOSS)

The SciPy community embraces open source software development.
The source code for nearly all SciPy libraries is freely available for anyone to read, edit, and reuse.

If you want others to use your code, one of the best ways to achieve this is to make it free and open.
If you use closed source software, but it doesn't do exactly what you want to achieve, you're out of luck.
You can email the developer and ask them to add a new feature (this often doesn't work!), or write new software yourself.
If the code is open source, you can easily add or modify its functionality using the skills you learn from this book.

Similarly, if you find a bug in a piece of software, having access to the source code can make things a lot easier for both the user and the developer.
Even if you don't quite understand the code, you can usually get a lot further along in diagnosing the problem, and help the developer with fixing it.
It is usually a learning experience for everyone!

#### Open Source, Open Science

In scientific programming, all of the above scenarios are extremely common and important: scientific software often builds on previous work, or modifies it in interesting ways.
And, because of the pace of scientific publishing and progress, much code is not thoroughly tested before release, resulting in minor or major bugs.

Another great reason for making code open source is to promote reproducible research.
Many of us have had the experience of reading a really cool paper, and then downloading the code to try it out on our own data,
only to find that the executable isn't compiled for our system. Or we can't work out how to run it.
Or it has bugs, is missing features, or produces unexpected results.
By making scientific software open source, we not only increase the quality of that software, but we make it possible to see exactly how the science was done.
What assumptions were made, and even hard-coded?
Open source helps to solve many of these issues.
It also enables other scientists to build on the code of their peers, fostering new collaborations and speeding up scientific progress.

#### Open Source Licenses

If you want others to use your code, then you *must* license it.
If you don't license your code, it is closed by default.
Even if you publish your code (for example by placing it in a public GitHub repository), without a software license, no one is allowed to use, edit, or redistribute your code.

When choosing among the many license options, you must first decide what you want to allow people to do with your code.
Do you want people to be able to sell your code for profit?
Or sell software that uses your code?
Or do you want to restrict your code to be used only in free software?

There are two broad categories of FOSS license:

* Permissive
* Copy-left

A permissive license means that you are giving anyone the right to use, edit, and redistribute your code in any way that they like.
This includes using your code as part of commercial software.
Some popular choices in this category include the MIT and BSD licenses.
The SciPy community has adopted the New BSD License (also called "Modified BSD" or "3-clause BSD").
Using such a license means receiving many code contributions from a wide array of people, including many in industry and start-ups.

Copy-left licenses also allow others to use, edit, and redistribute your code.
These licenses, however, also prescribe that derived code must be distributed under a copy-left license.
In this way, copy-left licenses restrict what users can do with the code.

The most popular copy-left license is the GNU General Public License, or GPL.
The main disadvantage to using a copy-left license is that you are often putting your code off-limits to any potential users or contributors from the private sector.
And this could include your future self!
This can substantially reduce your user base and thus the success of your software.
In science, this could mean fewer citations.

For more help choosing a license, see the [Choose a License website](http://choosealicense.com/).
For licensing in a scientific context, we recommend this blog post by Jake VanderPlas, Director of Research in the Physical Sciences at the University of Washington, and all-around SciPy superstar:
http://www.astrobetter.com/the-whys-and-hows-of-licensing-scientific-code/.
In fact we quote Jake here, to drive home the key points of software licensing:

> ...if you only take three pieces of information away from the article, let
> them be these:
>
> 1. Always license your code.  Unlicensed code is closed code, so any open license is better than none (but see #2).
> 2. Always use a GPL-compatible license. GPL-compatible licenses ensure broad compatibility for your code, and include GPL, new BSD, MIT, and others (but see #3).
> 3. Always use a permissive, BSD-style license. A permissive license such as new BSD or MIT is preferable to a copyleft license such as GPL or LGPL.
>
> — Jake VanderPlas, *[The Whys and Hows of Licensing Scientific
> Code](http://www.astrobetter.com/the-whys-and-hows-of-licensing-scientific-code/)*

All the code in this book is available under the 3-clause BSD license.
Where we have sourced code snippets from other people, the code was generally under a permissive open license of some variety (although not necessarily BSD).

For your own code, we recommend that you follow the practices of your
community. In Scientific Python, this means 3-clause BSD, while the R language
community, for example, has adopted the GPL license.

### GitHub: Taking Coding Social

We've talked a little about releasing your source code under an open source license.
This will hopefully result in huge numbers of people downloading your code, using it, fixing bugs and adding new features.
Where will you put your code so people can find it?
How will those bug fixes and features get back into your code?
How will you keep track of all the issues and changes?
You can imagine how this could get out of control quite quickly.

Enter GitHub.

GitHub (https://github.com/) is a website for hosting, sharing and developing code.
It is based on the Git version control software (http://git-scm.com/).
There are some great resources to learn to use GitHub, such as [Introducing GitHub](http://shop.oreilly.com/product/0636920033059.do) by Peter Bell and Brent Beer.
The vast majority of projects in the SciPy ecosystem are hosted on GitHub, so it is certainly worth learning to use it!

GitHub has had a massive effect on open source contributions.
It did this by allowing users to publish code and collaborate for free.
Anyone can come along and create a copy (called a *fork*) of the code and edit it to their heart's content.
They can eventually contribute those changes back into the original by creating a *pull request*.
There are some nice features like managing issues and change requests, as well as who can directly edit your code.
You can even keep track of edits, contributors and other fun stats.
There are a whole bunch of other great GitHub features, but we will leave many of them for you to discover and some for you to read about in later chapters.
In essence, GitHub has democratized software development. It has substantially reduced the barrier to entry.

![The impact of GitHub (Used with permission of the author, Jake VanderPlas)](https://jakevdp.github.io/figures/author_count.png)

### Make your Mark on the SciPy Ecosystem

As you gain more experience with SciPy and start using it for your research, you may find that a particular package is lacking a feature you need, or you think that you can do something more efficiently, or perhaps find a bug.
When you reach this point, it's time to start contributing to the SciPy Ecosystem.

We strongly encourage you to try doing this.
The community lives because people are willing to share their code and improve existing code.
And, if we each contribute a little bit, together we build a lot.
But, beyond any altruistic reasons for contributing, there are some very practical personal benefits.
By engaging with the community you will become a better coder.
Any code you contribute will be reviewed by others and you will receive feedback.
As a side effect, you will learn how to use Git and GitHub, which are very useful tools for maintaining and sharing your own code.
You may even find that interacting with the SciPy community provides you with a broader scientific network, and surprising career opportunities.

We want you to think about being more than just a SciPy user.
You are joining a community, and your work will make it a better place for all scientific coders.

### A Touch of Whimsy with your Py

In case you were worried that the SciPy community might be an imposing place for the newcomer, remember that it is made up of people like you: scientists, usually with a great sense of humor.

In the land of Python, it is inevitable that you find some Monty Python references.
The package Airspeed Velocity (http://spacetelescope.github.io/asv/using.html) measures your software's speed (more on this later), and references the line, "what is the airspeed velocity of an unladen swallow?" from *Monty Python and the Holy Grail*.

Another amusingly titled package is "Sux", which allows you to use Python 2 packages from Python 3.
The name is a play on "six" (a package that lets you use Python 3 syntax in Python 2), as pronounced with a New Zealand accent.
Sux syntax makes it less frustrating to use Python 2-only packages after you've moved to Python 3:

```
import sux
p = sux.to_use('my_py2_package')
```


In general, Python library names can be a riot, and we hope you'll enjoy your time coming up with some!

## Getting Help

Our first step when we get stuck is to Google the task that we are trying to achieve,
or the error message that we got.
This generally leads us to [Stack Overflow](http://stackoverflow.com/),
an excellent question and answer site for programming.
If you don't find what you're looking for immediately, try generalizing your search terms to find someone who is having a similar problem.

Sometimes, you might actually be the first person to have this specific question (this is particularly likely when you are using a brand new package), but not all is lost!
As mentioned above, the SciPy community is a friendly bunch, and can be found scattered around various parts of the interwebs.
Your next port of call is to Google "`<library name> mailing list`", and find
an email list to ask for help.
Library authors and power users read these regularly, and are very welcoming to newcomers.
Note that it is common etiquette to *subscribe* to the list before posting.
If you don't, it usually means someone will have to manually check that your email isn't spam before allowing it to be posted to the list.
It may seem annoying to join yet another mailing list, but we highly recommend it: it is a great place to learn!

## Installing Python

Throughout this book we’re going to assume that you have Python 3.6 (or later) and have all the required SciPy packages installed.
We list all of the requirements and the versions we used in the **environment.yml** file packaged with the data for this book.
The easiest way to get all of these components is to install conda, a tool for managing Python environments (http://conda.pydata.org/miniconda.html).
You can then pass that environment.yml to conda to install the right versions of everything in one go.

```
conda env create --name elegant-scipy -f path/to/environment.yml
# With conda 4.4 or later, "conda activate elegant-scipy" is the preferred form.
source activate elegant-scipy
```


See the book's [GitHub repository](https://github.com/elegant-scipy/elegant-scipy) for more details.

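Once the environment is active, a quick sanity check can confirm that the interpreter and a few core packages are in place. The sketch below is illustrative only: the `REQUIRED_PACKAGES` list and the `missing_packages` helper are hypothetical names chosen for this example, and the authoritative list of requirements remains the **environment.yml** file.

```python
import importlib.util
import sys

# Hypothetical short list for illustration; environment.yml is the real source.
REQUIRED_PACKAGES = ["numpy", "scipy", "matplotlib", "pandas"]


def missing_packages(names):
    """Return the top-level package names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The book assumes Python 3.6 or later.
python_ok = sys.version_info >= (3, 6)
missing = missing_packages(REQUIRED_PACKAGES)

print("Python 3.6 or later:", python_ok)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, recreating the environment from environment.yml is usually simpler than installing packages one at a time.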
### Accessing the book materials

All of the code and data shown in this book are available on our GitHub repository: [https://github.com/elegant-scipy/elegant-scipy](https://github.com/elegant-scipy/elegant-scipy).
In the README file in that repository, you will find instructions to build Jupyter notebooks from the markdown source files, which you can then run interactively, using the data included in the repo.

## Diving in

We've brought together some of the most elegant code offered up by the SciPy community.
Along the way we are going to explore some real-world scientific problems that SciPy solves.
This book is also a glimpse into a welcoming, collaborative scientific coding community that wants you to join in.

Welcome to Elegant SciPy.