# Session 1: Getting Started


# Programming and Geography 


## The Rise (Again) of Computational Geography

We live in a world transformed by big (geo)data: from Facebook likes and satellites, to travel cards and drones, the process of collecting and analysing data about the world around us is becoming very, very cheap. Twenty years ago, gathering data about the human and physical environment was expensive, but now a lot of it is generated as the ‘exhaust’ of day-to-day activity: tapping on to the bus or train, taking photos (whether from a satellite, drone, or disposable camera), making phone calls, using our credit cards, and surfing the web. And that's before you start looking at the Terabytes of data being generated by satellites, air quality and river flow sensors, and other Earth Observation Systems! 

As the costs of capturing, curating, and processing these data sets fall, the discipline of geography is changing. You face a world in which many of the defining career options for geographers with basic quantitative skills will either no longer exist, or will have been seriously de-skilled. So much can now be done through a web browser (e.g. [CartoDB](https://carto.com)) that specifying ‘Knowledge of ArcGIS’ is becoming superfluous; not because geo-analysis jobs are no longer in demand or no longer done -- in fact, they are more vital than ever -- but because the market for these skills has split in two: expensive, specialist software is being superseded by simple, non-specialist web-based tools on the ‘basic’ side, and by customised code on the ‘advanced’ side. 

## What's the Difference?

### Not Just Quantitative Geography

Computational approaches -- which is to say, approaches to geography using code -- differ in important ways from the quantitative skills commonly taught in ‘methods’ classes: computational geography is underpinned by algorithms that employ concepts such as _iteration_ and _recursion_, and we use these to tackle everything from a data processing problem to an entire research question. For example, Alex Singleton’s OpenAtlas (available for free from the [Consumer Data Research Centre](https://data.cdrc.ac.uk/product/cdrc-2011-census-open-atlas)) contains 134,567 maps. Alex designed and wrote a script to _iterate_ over the Census areas (i.e. to ‘visit’ each area in turn when creating a map), and to _recurse_ into smaller sub-regions from larger regions (i.e. to keep drilling down into smaller and smaller geographies) in order to generate maps at, literally, every conceivable scale. Then he let the computer do the ‘boring bit’ of actually creating each and every map. 
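
As a minimal sketch of what _iterating_ and _recursing_ look like in code (this is not Alex's actual script: the `regions` dictionary, the place hierarchy, and the `make_map` function are all hypothetical stand-ins, and 'making a map' here just means recording a name):

```python
# Hypothetical nested geography: each region maps to its sub-regions.
regions = {
    "England": {
        "London": {"Camden": {}, "Hackney": {}},
        "Manchester": {},
    },
    "Scotland": {"Edinburgh": {}},
}

maps_made = []


def make_map(name):
    # Stand-in for the real work of drawing a map.
    maps_made.append(name)


def map_all(areas):
    # Iterate: 'visit' each area in turn.
    for name, sub_areas in areas.items():
        make_map(name)
        # Recurse: drill down into the smaller geographies.
        map_all(sub_areas)


map_all(regions)
print(maps_made)
```

Running this records seven 'maps', one for every area at every level, and adding more areas to `regions` would not require any change to `map_all`.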

### Thinking Algorithmically

Thinking _algorithmically_ requires students – and professionals – to deal with abstraction: we don’t want to define how each analysis should work, or how each map should look; rather, we want to specify a set of rules about how to select and display data on a map, and then let the computer make them all for us. In this way of working it’s not really any more work to create 500 maps than it is to create 5 because we’ve been able to tell the computer how to make maps in a way that it can ‘understand’ or, more accurately, apply. But learning to think this way is _hard work_: I usually don't actually know exactly what my maps are going to look like until _after_ I've made them. Often, I'll find that the first time around they don't show quite what I want, or that what I thought would be interesting, wasn't. But the difference from the 'normal' way of working is that I make a few tweaks to my code and then just run the code again. And again... as many times as I need to in order to get what I want.

### The Open Source Ethos

And then I can take that code and apply it to a new problem. Or a new case study. I can post it online and let others build off of my work. Giving away my code might seem like a bad idea, but think about this: in a world of exciting research questions, are you going to be able to tackle every one? And your own work _already_ builds off of code that other people gave away... perhaps you should give back to the community? Not just because it's a good thing to do, but because people will learn who you are. They might be in a position to offer you a job, or they might approach you as a collaborator, or they might point someone else with an interesting opportunity in your direction because you have built a reputation as a contributor.

### Further Reading 

A big gap is opening up between the stuff that can be done by pushing buttons (which no longer even really requires geographical training) and the 'cutting edge'. There are many pieces that argue this case, but here are a few to start with:

* [Why the Future of Geography is Cheap](http://www.rgs.org/NR/rdonlyres/9A5CB6C8-CDE5-47AA-9577-0C7FA7765987/0/WhytheFutureofGeographyisCheap.pdf)
* [GIS Jobs of Today](http://www.directionsmag.com/entry/gis-jobs-of-today-should-you-have-programming-skills/473296): should you have programming skills?

# Why Learn to Code?



## Why _you_ should learn to program

There are many good reasons for geographers to learn to code, but let's start with some good _general_ reasons why _you_ should learn to program a computer even if you never use it to make a map or complete a bit of spatial analysis: 

[![Why You Should Learn How to Code](http://img.youtube.com/vi/UD2xoiCGTDo/0.jpg)](https://youtu.be/UD2xoiCGTDo)

And here is a useful perspective on whether or not learning to code is hard:

[![Why You Should Learn How to Code](http://img.youtube.com/vi/k7Txbdvzx90/0.jpg)](https://youtu.be/k7Txbdvzx90)

Perhaps the best point here is that 'making money' is (often) a nice outcome of learning to code, but having a passion for what you want to _do_ with code is what's going to get you through the learning curve. That said, you also need to be realistic: becoming a professional programmer is something that happens over many years, so you probably won't just take a couple of classes and then go out into the world saying "I'm a programmer."

And, no, you do _not_ need to know advanced maths in order to learn how to code: you need to be able to think logically and to reframe your problems in ways that align _with_ the computer.

## The Benefits of Coding?

In a practical context we think that the benefits of learning to code fall into three categories:
1. **Flexibility**: a computer can often apply the _same_ analytical process to a completely different data set (_e.g._ rainfall in the UK vs rainfall in the US) with minimal effort compared to trying to do each step manually in, say, Excel or SPSS. For students it comes down to this: if you discover a newer, better data set halfway through your dissertation and want to use this for your analysis instead of the old, inaccurate data, it's a lot easier and faster to update your analyses if you have used code to do the analysis to date!
2. **Reproducibility**: in recent years it has become clear that a lot of research cannot be reproduced. In other words, when one scientist tries to duplicate what someone else did in order to check something out (as is important in the scientific method), they find that the results don't line up. So here is a second example of why you should code your data analysis for a dissertation: you've just finished your analysis when someone points out that you made a mistake with the data right back at the beginning; redoing all of that in Excel or SPSS would be a nightmare, but with code it can be as easy as changing one line and hitting 'Run'!
3. **Scalability**: a computer doesn't care if you throw 10 lines or 10 billion lines at it; the only thing that changes is how long it takes to get an answer. In other words, if your code 'works' on a subset of your data it should also work on your entire data set no matter how big it is. This is also a good way to develop code: rather than try to read in the whole data set in one go while you're still trying to understand it, take a few rows and make sure you're handling _those_ ones correctly (and that what you see squares with what you were told) before expanding to larger and larger subsets.
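
To illustrate the 'start with a few rows' advice, here is a rough sketch that uses only Python's standard library. The data, the column names, and the `total_rain` function are all made up for illustration; in practice the rows would come from a (much larger) file on disk.

```python
import csv
import io
from itertools import islice

# Made-up data standing in for a large file on disk.
raw = io.StringIO(
    "city,rain_mm\n"
    "London,2.5\n"
    "Manchester,7.0\n"
    "Edinburgh,4.0\n"
    "Cardiff,6.5\n"
)


def total_rain(rows):
    # The same code works whether there are 2 rows or 2 million.
    return sum(float(row["rain_mm"]) for row in rows)


# Develop and check the code on a small subset first...
first_two = list(islice(csv.DictReader(raw), 2))
subset_total = total_rain(first_two)
print(subset_total)

# ...then rewind to the start and run it on everything.
raw.seek(0)
full_total = total_rain(csv.DictReader(raw))
print(full_total)
```

The point is that `total_rain` never changes: only the number of rows handed to it does.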

Often, the payoff for coding the answer to a problem instead of just clicking through the options in SPSS or Arc can seem a long way away. It's like learning a new language: you spend a lot of time asking directions to the train station or whether someone had a nice breakfast before you can start work on the novel or the business case. But the payoff _is_ there if you stick with it!

## The 3 virtues of a programmer

Another useful idea comes from [Larry Wall](https://en.wikipedia.org/wiki/Larry_Wall) (the man with the strong 'tache game below!), who created a programming language called Perl. Larry said that programmers had three virtues: Laziness, Hubris, and Impatience. 

<img src="http://cdn.quotationof.com/images/larry-wall-1.jpg" width="250">

Some of the reasons that these are virtues in programming (but not in your studies!) are as follows:

1. **Laziness** makes you want to put in the effort _now_ to reduce the amount of effort you'll have to put in _later_. So it might take a lot of work to produce a map of _one_ US State automatically using code, but as long as your _data_ is good, once you've worked out how to do it for one state you've also figured out how to do it for _all 50_!
2. **Hubris** makes you want to write code that other people won't want to "say bad things about". In the course we'll get into what makes 'good' code in more detail, but the short version is: it's efficient, it's easy to read, and it's clever.
3. **Impatience** is about wanting the answer _now_ and looking for ways to get there as quickly as possible. That actually means that you first look to see if and how other people have solved similar problems before starting work on your own code. Rather than reinventing the wheel, we try to stand on the shoulders of giants.

**_Hint: you'll also see a lot of laziness when you start trying to write code. Programmers don't like writing `remove` when they could just write `rm`, nor do they like writing `define` when they could just write `def`. Keep an eye out for these abbreviations as they can be pretty daunting at first._**

### The 3 false virtues

Larry also pointed out that these virtues had three mirror-image false virtues:

1. **False laziness** happens when you leave something working but half-finished and, most likely, about to break. When you start using [StackOverflow](http://www.stackoverflow.com/) you may find that it makes it easy to copy+paste answers into your notebook and then you can glue it together messily. This isn't the same as _understanding_ and _adapting_ the solution that you found online to _your_ problem, so it's false laziness. To really develop a learning mindset, [don't copy+paste code, type it out](https://medium.freecodecamp.com/the-benefits-of-typing-instead-of-copying-54ed734ad849#.es5mw1j0z). 
2. **False hubris** is thinking that no one else's code is 'good enough' for you. Sometimes copy+paste is false laziness, but refusing to recognise when copy+paste (or importing a library, more on this later) _is_ the right thing to do is false hubris.
3. **False impatience** is getting started on coding your answer to a problem when you don't yet understand what the problem actually _is_. One thing that a lot of programmers do is half-listen to what someone has asked them to do and then go haring off without sitting down to make any kind of plan. It's like writing an essay without having done the readings. Nudge, nudge.

There's a lot more thinking on this here: http://blog.teamtreehouse.com/the-programmers-virtues

## Being a 'good' programmer

The best way to be a 'good' programmer is to know when the computer can help you and when it will just get in the way. A computer cannot 'solve' a problem for you, but it _can_ help you to find the answer when you've told it what to look for and what rules to use in that search. A computer can only do _exactly_ what you tell it to do, so if you don't know what to do then the computer won't either.

One of the founders of computing, [Charles Babbage](https://en.wikiquote.org/wiki/Charles_Babbage#Passages_from_the_Life_of_a_Philosopher_.281864.29), had this to say:

> On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
> _Passages from the Life of a Philosopher (1864), ch. 5 "Difference Engine No. 1"_

Modern programmers call this: garbage in, garbage out. GIGO, for short.

**_The single most important thing that you can learn is how to think abstractly about solving a problem in a way that you can communicate to a computer._** 

What we mean is this: the real power of the computer isn't figuring out how to add `1, 2, 3, 4` together and calculate the mean, it's figuring out how to add _any possible set of numbers_ together and get the computer to work out the mean. That's what we mean about abstraction: it's not solving the problem _once_, it's solving any set of related problems _at the same time_! 

These notebooks will get you started down that path, but remember: you're not stupid if you don't know how to explain things to the computer so that it can help you find the answer. You're still learning the basics of how to communicate with computers. There are, however, two things that _are_ silly: the first is expecting to be able to run before you can walk; the second is copying and pasting answers without trying to understand _why_ they are answers.

# Learning a (New) Language


## Mathematics is a Language, so is Code

There are obviously many ways that you can calculate the mean (also known as the _average_ if your maths is a little rusty): in your head, using pencil and paper, on a calculator, in Excel... and, of course, using code! For a small set of simple numbers, using your brain is going to be a lot faster than typing it into a calculator or computer.

_Quick_, what's the mean of: `1, 2, 3, 4`?

### Now try doing this with code!

In the area immediately below this sentence you should see something like "<span style="color:blue">In [ ]</span>". On the right of this is an empty box into which you can type computer code. Do you remember how to calculate the mean using a set of numbers and a calculator? That's all we're doing now, it's just that we're doing it from a keyboard instead of a keypad. 

Type an 'equation' to calculate the mean of the four numbers above on the empty line right above this sentence and then click the 'play' button on the tool bar at the top of the window to run your first piece of Python code! If everything has gone well then you should see something like "<span style="color:red">Out [ ]</span>" appear.

_HINT: the 'play' button is the sideways-pointing triangle with a bar; it's usually right underneath the 'Cell' menu item. Or, you can also type Ctrl+Return (that's the Control button and the Return button simultaneously) to run the code when you've got your cursor in the code box._

_HINT: Your equation should include the four numbers above with some `+` symbols and `(` and `)`, a `/` symbol, and another number._

Did you get 2.5?

### When computers beat brains (or calculators)

What makes a computer potentially _better_ than a calculator (or your brain) is that a computer isn't daunted by having to count lots of numbers and it doesn't need you to input each number individually! The computer can also do things like: 

- find out the amount of rain that fell in London, Manchester, and Edinburgh yesterday from an online weather service;
- work out the average rainfall for these three cities; and then 
- work out the standard deviation for rainfall.

And it can do all of this in a matter of milliseconds! It can also do the same for 3,000 cities just as easily; sure, it'll take a little bit longer, but it's the _same basic code_. 

In other words code is [scalable](https://en.wikipedia.org/wiki/Scalability) in a way that brains and calculators are not and that is a crucial difference.
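
The rainfall steps listed above can be sketched in a few lines using Python's built-in `statistics` module. The rainfall figures here are made up, and the step of fetching them from an online weather service is left out; a real script would replace the hard-coded `rainfall_mm` dictionary with that lookup.

```python
import statistics

# Made-up rainfall figures (mm); a real script would fetch these
# from an online weather service instead.
rainfall_mm = {"London": 2.0, "Manchester": 7.0, "Edinburgh": 3.0}

values = list(rainfall_mm.values())
mean_rain = statistics.mean(values)
sd_rain = statistics.pstdev(values)  # population standard deviation

print(mean_rain)
print(round(sd_rain, 2))
```

With these three made-up values the mean is `4.0` and the standard deviation is roughly `2.16`; adding another 2,997 cities to the dictionary would not change a single line of the calculation.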

Here's a trivial example of when computers start to get better and faster than brains:

```python
(23495.23 + 9238832.657 + 2 + 12921)/4
```

_Remember_: click in the cell and then hit `Ctrl+Enter` to 'run' the cell and get the answer.

# Programming in Python



## About Jupyter Notebooks

There's no reason you'd know it yet, but the web page you're looking at is also known as a 'notebook' -- it's why you can 'run' code as part of the web page. Check out this example by clicking in the box (next to the <span style="color:blue;font-family:monospace">In [ ]</span>) and then either hitting the 'run' button or pressing Ctrl and Return at the same time.

```python
print('Hello world')
```

If all has gone well you should have seen `Hello world` appear on a line all on its own. That was code running in a notebook. Because of their history, some people will call these "IPython notebooks", others will call them "Jupyter notebooks", and some will just stick with "notebooks". They are all the same thing. Here's proof that this is actual code:

```python
import sys
print(sys.version)
```

We are using Python version 3. Some applications still use version 2.7, but official support for version 2.7 [ended](https://pythonclock.org/) on 1 January 2020.

You don't need to understand all of that output; the point is that this is Python and we can do anything in a notebook that we would in a program.

However, rather than throw you in at the deep end with examples taken from computer science classes, for Code Camp we've tried to give you _geographical_ examples whenever possible in the hopes that the early examples will seem a _little_ less abstract and a _little_ more relevant to _your_ needs. Of course, the early examples are also very basic so the payoff might not be obvious right away, but trust us: if you stick with it you will start to change your thinking about geography as a discipline and about the power of computers to transform _everything_.

## Computer Languages

In these notebooks we will be using the Python programming language. As with human languages, there are _many_ [programming languages](http://www.computerhope.com/jargon/p/proglang.htm) in the world, each with their own advantages and disadvantages, and each with their own vocabulary (allowed words) and grammar (syntax).

### Python vs R

Alongside Python, the other language that is often mentioned by people doing data-led research is [R](https://www.r-project.org). It's the _other_ one that many of your lecturers and a lot of other scientists use in a lot of their work.  There's [a great deal of debate about the relative merits of Python and R](https://www.quora.com/Which-is-better-for-data-analysis-R-or-Python), but for our purposes _both_ Python and R can help us to undertake geographical analysis. That is, in fact, the premise of this entire course! 

So why have we chosen to use Python here? Of the two languages, we think that Python has some specific advantages:

1. It was designed with readability in mind (it was strongly influenced by ABC, a language created for teaching), so its syntax is easier for a human to 'parse' than R's.
2. It is more _like_ other languages, so it's more readily transferable if you need to learn another language. Think of it as learning Italian, which also makes it easier to learn Spanish and French.
3. It is the one most-used as part of a _geographical workflow_ – what we mean by this is that you can find Python buried inside of ESRI's ArcGIS and the open-source QGIS applications, and it also sits behind (or talks to) a number of other tools that allow us to work flexibly and scalably with geo-data. 
4. It is easier to operationalise – Python offers more services/tools to enable you to turn something from a 'hack' ([which doesn't mean what you think it means](https://en.wikipedia.org/wiki/Hacks_at_the_Massachusetts_Institute_of_Technology)) into a 'service'.

However, if you have been told R is the way to go then don't worry, the concepts covered here still translate. And many of the contributors to these notebooks use both languages... it just depends on the problem.

### Python what?
Python was invented by [Guido van Rossum](https://en.wikipedia.org/wiki/Guido_van_Rossum) in the late 1980s. He served as the language's 'benevolent dictator' until stepping down in 2018, and he (and some other very smart people) worked to ensure that the language continues to meet the basic goals of:
* Being very easy to read (syntax)
* Using plain-English for many functions and operators (allowed words)
* Having a comprehensive style guide: [PEP 8](https://www.python.org/dev/peps/pep-0008/) (syntax)
* Having no unnecessary special formatting characters (syntax _and_ allowed words)

So while Python is not the language that enables the computer to make calculations the fastest (C and C++ are faster), nor is it the safest (you wouldn't use it to fly a rocket to Mars), it _is_ a very readable, learnable and maintainable language.

So if you want to learn to code, to do 'data science', or build a business, Python is a great choice.

The points above are also made in [Python In A Nutshell](http://mbrochh.github.io/python-101/#/6/1) by [Martin Brochhaus](https://github.com/mbrochh) which you may find interesting and useful to accompany your learning of Python.

#### Three takes on Python
The images below are links to three videos pitched in quite different ways at the advantages of Python, all of which touch on issues we'll be dealing with later... so watch the videos (even if they're a bit silly in places)!

[![Video: a take on Python (1 of 3)](http://img.youtube.com/vi/aXKVOLwpDg8/0.jpg)](http://www.youtube.com/watch?v=aXKVOLwpDg8)

[![Video: a take on Python (2 of 3)](http://img.youtube.com/vi/Hn4FbT4wMms/0.jpg)](http://www.youtube.com/watch?v=Hn4FbT4wMms)

[![Video: a take on Python (3 of 3)](http://img.youtube.com/vi/G8brQdClo9s/0.jpg)](http://www.youtube.com/watch?v=G8brQdClo9s)

# Thinking Like a Computer



## What _is_ a computer?

At its most basic, a computer is a programmable device for performing calculations.

_This_ is a kind of computer.
<img src="https://kingsgeocomputation.files.wordpress.com/2016/08/oldschool.png" width="300">

As is _this_.
<img src="https://kingsgeocomputation.files.wordpress.com/2016/08/modern.png" width="300">

If you've never really got to grips with what is happening inside a computer, then this TED-Ed video would be a good way to get started because it helps to explain the basics of things like I/O and what actually happens when you click with the mouse on a button. In fact, you will see that we've used code to import the YouTube video in a way that requires me to do very little work, and this is one of the strengths of programming: someone else created the code to embed a YouTube video into an IPython notebook (which is what this web page is) and all I need to do is know how to ask that code to find the video on the YouTube web site. Everything else happens automatically.

### What's Going on Inside Your Computer?

Let's find out through some videos – we've tried to pick ones that encompass a range of styles and levels, so we hope you'll find something that 'speaks to you' in here. If not, well Google and YouTube are your friends: we won't pretend to have all the answers and you might find by searching something that is right on your level. [This first video](http://www.youtube.com/watch?v=AkFi90lZmXA) is about what's going on inside your computer.

[![What's Going On in There?](http://img.youtube.com/vi/AkFi90lZmXA/0.jpg)](http://www.youtube.com/watch?v=AkFi90lZmXA)

### How a Computer Adds Numbers

[This next video](http://www.youtube.com/watch?v=VBDoT8o4q00) is a little more technical and we don't really expect you to remember it, but it touches on a lot of really important concepts: binary numbers, Boolean logic, and how these basic building blocks are assembled into much more complex processes like adding numbers or, ultimately, manipulating data.

[![How a Computer Adds Numbers](http://img.youtube.com/vi/VBDoT8o4q00/0.jpg)](http://www.youtube.com/watch?v=VBDoT8o4q00)

The really important thing to get from this last video is that computers are chaining together long sets of simple operations which always basically work out to 1 or 0, which is the same as True or False. This is [Boolean logic](http://computer.howstuffworks.com/boolean.htm) and we're going to be doing a lot more with it later in this set of sessions, but you should always keep in mind that a huge set of calculations are going on in your computer in an order specified by a set of _rules_: do 'A', then do 'B', then... When these rules become sufficiently complex they are called algorithms. And when they get so complicated that they are not easy to write down as a set of logical outputs, it's often easier to express them in a more human-readable form... which is why we have [programming languages](http://www.computerhope.com/jargon/p/proglang.htm).
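
To see how True/False operations can add numbers, here is a small sketch of a 'half adder', the standard building block for adding two single binary digits. The function name and layout are our own choices for illustration, not something taken from the video.

```python
def half_adder(a, b):
    # Add two single bits using only Boolean operations.
    total = a != b   # XOR: True if exactly one input is True
    carry = a and b  # AND: True only if both inputs are True
    return carry, total


# Try every combination of two bits (False is 0, True is 1).
for a in (False, True):
    for b in (False, True):
        carry, total = half_adder(a, b)
        print(int(a), "+", int(b), "=", int(carry), int(total))
```

The last line printed is `1 + 1 = 1 0`, which is the number two written in binary; real processors chain together many circuits like this to add much bigger numbers.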

But remember: finding the average of a set of numbers involves an algorithm (which, in turn, in a digital computer is based on lots of logical operations involving 1s and 0s). And calculating the probability that the lecturer won't show up to the first lecture also involves an algorithm, it's just that it's a much more complicated one unless you take matters into your own hands and arrange for an accident...

## Computers: good or bad?

What are computers good at?
- Doing the same thing over and over
- Doing _exactly_ what they are told to do

What are computers currently still bad at?
- Generating knowledge
- Being creative

There is a long-standing contest, called the Turing Test in honour of [the famous computer pioneer](https://en.wikipedia.org/wiki/Alan_Turing), that demonstrates this difference rather nicely: a computer passes the Turing Test if it can fool a person into thinking that they're talking to another person. Some people have claimed that if a computer can _really_ pass the Turing Test by keeping up a conversation of indefinite length on any range of topics then we'll have to declare that machines have become full AIs (Artificial Intelligences). To put it another way: if it sounds like a human and responds like a human... then is it a human?

Perhaps fortunately for us, although computers are getting a lot better at holding up their end of the conversation they still seem to have a hard time fooling anyone for very long. In contrast, bigger and better computers have now beaten the best humans at Chess and Go, and are being used to help us understand earthquakes and climate change on a huge scale. Here, computers can do billions -- or trillions -- of calculations a second to work out that if 'A' happens then 'B' is the next most likely thing to happen, and so on and so on.

The difference is that games like Go and Chess have well-understood rules as (ultimately) do natural processes like climate change and earthquakes. Chess is 'easier' for a computer than Go because a big enough computer can search through an enormous number of possible chess moves and pick the best one it finds, whereas Go has so many more possibilities that the computer has to make 'choices' based on incomplete information. Earthquakes have even more 'rules', but as far as we know they still follow _some_ set of rules dictated by physics and chemistry. 

People, however, don't use the same unchanging rules in conversation. Yes, conversations have norms, unless you're using an online comment forum where it's normal to start a conversation by asking someone if they're an idiot, but people don't just 'play games' within the rules, they actually play with the rules themselves in a way that computers find very, very hard to follow. Think of sarcasm: you say one thing but it means exactly the opposite. And if it's delivered deadpan then sometimes even people have trouble knowing if you're being sincere!

That's why AI of the sort you might have seen in _2001_ or _Blade Runner_ has been twenty years away for the last sixty years! Recently, computers have been getting better and better at doing really difficult things, but it's usually still in a narrow area where we understand the rules and we normally need to spend a lot of time training the computer. 

#### More About the Turing Test
Turing, A. (1950), 'Computing Machinery and Intelligence', _Mind_ LIX (236): 433–460,
doi: [10.1093/mind/LIX.236.433](http://dx.doi.org/10.1093/mind/LIX.236.433), ISSN 0026-4423

## Further Reading:

- A **must** read: [The Hard Way is Easier](http://learnpythonthehardway.org/book/intro.html)
- Two easy and accessible videos to start wrapping your head around programming (although they are not Python-centric): [1](https://www.youtube.com/watch?v=qUVWM2Q4vAU) and [2](https://www.youtube.com/watch?v=AImF__7FyzM)

## The Answer

There was only one coding question in this notebook so there is only one answer. Try the following code:

```python
print((1 + 2 + 3 + 4) / 4)
```

You should see the answer `2.5`.


### Credits!

#### Contributors:
The following individuals have contributed to these teaching materials: James Millington (https://github.com/jamesdamillington), Jon Reades (https://github.com/jreades), Michele Ferretti (https://github.com/miccferr)

#### Licence
The content and structure of this teaching project itself is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0/) and the contributing source code is licensed under [The MIT License](https://opensource.org/licenses/mit-license.php).

#### Acknowledgements:
Supported by the [Royal Geographical Society](https://www.rgs.org/HomePage.htm) (with the Institute of British Geographers) with a Ray Y Gildea Jr Award.

#### Potential Dependencies:
This notebook may depend on the following libraries: None
