### Why Python?

The programming language used in this course is Python. Since Python is not yet widely used by economists, statisticians, and social scientists, this deserves some explanation. The short answer is that Python is now clearly the tool of choice, both in research and production, for machine learning and natural language processing, which form the core of modern data science and the focus of this handbook. You can see evidence of this in the papers from the top machine learning and language processing conferences, such as ICLR, NIPS, and [ACL](http://aclweb.org/anthology/): where tools are mentioned, the overwhelming choice is Python. You can see this too in the tools being taught at the top universities, from Stanford and MIT to Carnegie Mellon and Harvard. Python is not merely a research tool, either. If you look at the tools released by the top companies in data analysis, including Google, Facebook, Amazon, and Microsoft, all of these are built first and foremost for Python (see [TensorFlow](https://www.tensorflow.org/), [PyTorch](http://pytorch.org/), [MXNet](https://mxnet.apache.org/), [CNTK](https://github.com/Microsoft/CNTK), and many others).

If you need easy access to advanced statistical tests and easy visualizations, there are still plenty of great reasons to use R, but for machine learning and natural language processing Python is much better, and it's a much better designed programming language too. I think once you take the time to learn it, you will come to the same conclusion that Google, Stanford, MIT, and many others have: for machine learning, natural language processing, and many other techniques of modern data science, Python is the best language currently available.

For further reading see:
* [Python now the most popular introductory language at top universities](https://cacm.acm.org/blogs/blog-cacm/176450-python-is-now-the-most-popular-introductory-teaching-language-at-top-u-s-universities/fulltext)
* [The Most Popular Language for Machine Learning and Data Science](https://www.kdnuggets.com/2017/01/most-popular-language-machine-learning-data-science.html)
* [The Incredible Growth of Python](https://stackoverflow.blog/2017/09/06/incredible-growth-python/)



### Installing Python

The easiest way to get started with Python is to install the Anaconda distribution from https://www.anaconda.com/download/. I recommend the most recent version of Python (3.6 or higher). Unless your computer is ancient, you will want the 64-bit installer. All of the software is free.
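Once Anaconda is installed, a quick way to confirm which interpreter you ended up with is to ask Python itself. The following is a minimal sketch using only the standard library. The variable names `is_recent` and `is_64bit` are my own, chosen for illustration.

```python
import platform
import sys

# Report the version of the running interpreter, e.g. "3.6.4".
print("Python version:", platform.python_version())

# True if the interpreter is Python 3.6 or higher, as recommended above.
is_recent = sys.version_info >= (3, 6)

# On a 64-bit interpreter, sys.maxsize is larger than 2**32.
is_64bit = sys.maxsize > 2**32

print("Python 3.6 or higher:", is_recent)
print("64-bit interpreter:", is_64bit)
```

You can paste these lines into any Python console. If either check prints `False`, it is worth revisiting the installer you downloaded.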

The advantage of installing the Python Anaconda distribution is that, in addition to the Python language interpreter itself, you also get many of the libraries we will use extensively, including pandas, numpy, and scikit-learn. You can install these libraries separately if you prefer, but it is much easier to get everything bundled together.
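If you want to confirm that the bundled libraries are actually available, one option is the sketch below. It uses the standard library's `importlib.util.find_spec` to look each library up without fully importing it. The helper name `check_libraries` is hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util


def check_libraries(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names of the libraries mentioned above
# (scikit-learn is imported as "sklearn").
status = check_libraries(["pandas", "numpy", "sklearn"])

for name, found in status.items():
    print(name, "is installed" if found else "is NOT installed")
```

With a working Anaconda installation, all three should be reported as installed.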

Once you have downloaded and installed Anaconda, it's time to start your first program. I will describe the process using the Spyder IDE, which is included with Anaconda, but more advanced users should feel free to use whatever development environment they prefer. A Python program is ultimately just a text file, so any text editor will do.

### Introduction to Spyder

If you're on Windows and you have already installed Anaconda, you should be able to find a shortcut to Spyder under `All Programs >> Anaconda >> Spyder`. When you first launch it, you might get several prompts. You can just click "Ok" and move on. After the prompts, it should look something like this: 

<img src="..\images\spyder_default.png">

On the left we have the **Editor**, which is just a smart text editor where we can create and edit Python programs. To run the Python program defined in the text editor we just click the green triangle at the top.

In the bottom right we have the **Console**. This is where all Python code is executed. You can also write Python code interactively in the console. It is often useful to try out new ideas in the console before putting them in your Python program. 

In the top right we have a sort of **Helper** box. Perhaps its most useful feature is the Variable explorer, which shows you the variables currently defined in the console.

### First program

Let's write our first very simple program. All it does is define two variables, x and y, print out "Hey, I can add!", and then print out x + y. After running the program by clicking the green triangle at the top, we should see the following:

<img src="..\images\first_spyder_program.png">
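In case the screenshot is hard to read, here is a minimal text sketch of a program matching the description above. The specific values assigned to x and y are placeholders that I chose for illustration; the numbers in the screenshot may differ.

```python
# Define two variables. These values are placeholders, not
# necessarily the ones shown in the screenshot.
x = 2
y = 3

# Print a message, then print the sum of the two variables.
print("Hey, I can add!")
print(x + y)
```

With these placeholder values, the console shows `Hey, I can add!` followed by `5`.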


All the variables defined by our program now exist in the console, and are also shown in the Variable explorer. We can now examine and interact with these variables in the console. For example, we can create and print a new variable, z:

<img src="..\images\spyder_console.png">
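As a text sketch of that kind of console interaction, the lines below create z from x and y and print it. The choice of multiplication and the values of x and y are assumptions made for illustration; the screenshot may show something different.

```python
# In the Spyder console, x and y would already exist after running the program.
# They are redefined here, with placeholder values, only so that this
# snippet runs on its own.
x = 2
y = 3

# Create a new variable from the existing ones and print it.
z = x * y
print(z)
```

With these placeholder values, the console prints `6`, and z appears in the Variable explorer alongside x and y.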

Finally, of course, we can save our Python program for another day by clicking `File >> Save as`. You should make a habit of saving all your programming in a file somewhere. This provides documentation of what you have done and often serves as a valuable resource in the future when you find you need to do something similar again.

One last point. Although Spyder comes with built-in help tools, often the best resources are online. The standard programming workflow is to have Spyder (or another editor) open in one window and a web browser open to online documentation in one or more additional windows.

You will find that virtually every question you encounter about Python or a Python library has already been answered somewhere on the internet. This should be the first place you look whenever you need Python assistance.