<a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/1_Introduction_to_Geospatial_Data/GY5021_Getting_Started_Seriously_With_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F1_Introduction_to_Geospatial_Data%2FGY5021_Getting_Started_Seriously_With_Python.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>

<img src="https://raw.githubusercontent.com/bamacgabhann/GY5021/2024/PD_logo.png" align=center alt="UL Geography logo"/>

# Getting Started Seriously With Python
### Closed and Open Source, Version Control, Virtual Environments, Package Management, and IDEs

If you think that Python is the right tool for you, or at least that you'd be interested in trying to use it yourself beyond this course, there are some things which would be useful for you to know before you start. These are things which I wish someone had told *me* before I started!

You don't need to memorise all this now - or even ever. Just refer back to what you need, when you need it. Even understanding all of this now would be a huge challenge - some of it, you'll not really understand until you've been using things for a while. But if you just blindly follow what's here while you get started, you'll come to understand it, and even if you then change what you're doing, you'll still be better off than if you didn't.

I'm not saying everything here is the right way to do things. There's no _right_ way to do things, but there are _consistent_ ways to do things which will make things a little easier.

## 1. Version Control

Where are you working? Office desktop? Laptop? Maybe you've had an idea at home, and want to update something on a home PC. Or you're at a conference, and working on a laptop. Sure, you can use OneDrive or Dropbox, but admit it - how many times have you ended up with:  

```
document.docx
document2.docx
document3.docx
document3.rev1.docx
document_final.docx
document_final2.docx
```

There is a better way. 

<img src="https://github.githubassets.com/assets/GitHub-Mark-ea2971cee799.png" style="height:50px" alt="GitHub invertocat logo"> <img src="https://github.githubassets.com/assets/GitHub-Logo-ee398b662d42.png" style="height:30px" alt="GitHub logo">

```Git``` is a cross-platform *version control* manager. By *version control*, what I mean is that it keeps track of changes to a file: letting you see who made changes and when, allowing you to undo changes and go back to previous versions when necessary, giving you the power to control who can make changes, with the option to review changes before merging them into the file.

GitHub is an online platform built around git, giving you an online remote repository for your work. 

Using GitHub means that you don't need multiple versions of files anymore. There'll be the remote version of the file in the online repository, and local versions on any computer you or your collaborators are using to work on it. Any edits are *pushed* up to the remote version, where you (or whoever owns it) can review all the suggested changes before incorporating them into the file. 

GitHub is THE standard for the vast majority of open source projects, not just in Python but also in other languages including R, Julia, C, and more. Virtually any open source code you'll ever use will have a GitHub repository, where you can see all the code, and even contribute to it yourself if you want.

Register for an account at https://github.com/, and for each project, set up a GitHub repository to track your files. I also strongly recommend the GitHub Desktop app if you're on Windows, even if you're using WSL (Windows Subsystem for Linux - which I also recommend if you're a Linux user).

Git distinguishes between a *remote* repository, i.e. the copy hosted on the online platform, and your *local* repository, on your computer. When you make changes to a file on your computer, you can add the changed files to the staging area:

```git add changed_file```

When you're satisfied, you *commit* those changes locally

```git commit -m "this message explains what changes I made in this version, to keep track of what changes were made when"```

And *push* those changes to the remote repository

```git push```

You can also work on more than one computer - your office desktop may be where you started, but if you want to do some work on a laptop, simply *clone* the repository to the other computer:

```git clone https://github.com/username/repositoryname.git```

Then when you go back to your desktop, which now has older versions, you don't need to rename everything ```document.old``` - you just pull the changes from the remote:

```git pull```

So, you've used git, and now you have a nice working version of a project, everything is fine and you don't want to mess with that. But you're still curious about adding a new feature, you'd like to give this idea a try. How should you do that, just work on your desktop repo until you're happy to commit, or is there a better way? Why, yes there is! You simply create a new branch

```git branch new-branch``` 

and check it out

```git checkout new-branch```

Then work away at playing around. If you don't get it working, no harm! Just delete the branch. But if you do, push the branch to the remote, and then open a *pull request* to merge the changes into your main branch. 

And this allows you to collaborate with other people as well - because they can do that too. Your research collaborators can *clone* your repository, work away on even the same branch as you, and open a pull request to merge any changes.

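Even better, you don't actually have to know the other people to do this. You can clone *any* public repository, and do whatever you want in your version. Want to contribute to some open source software? Since you usually won't have permission to push to someone else's repository, *fork* the repo on GitHub (make your own copy of it under your account), clone your fork, check out a new branch, make your changes, and then open a pull request - and see if the owners accept it. 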

This Notebook was written on a desktop and two different laptops, all using the same remote github repository.
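The add, commit, and branch steps described in this section can be tried out safely in a throwaway folder. The sketch below is one way to do that from Python, assuming ```git``` is installed and on your PATH. The file name, commit messages, and author details are all made up for illustration, and nothing is pushed to any remote.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in the folder `cwd` and return its text output."""
    command = [
        "git",
        # Throwaway author details, so the demo does not depend on your git config.
        "-c", "user.name=Example Author",
        "-c", "user.email=author@example.com",
        *args,
    ]
    result = subprocess.run(
        command, cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def demo_history():
    """Commit once on the default branch and once on a new branch.

    Returns the number of commits visible from each branch, or None if
    git is not installed.
    """
    if shutil.which("git") is None:
        return None

    with tempfile.TemporaryDirectory() as folder:
        notes = Path(folder) / "notes.txt"

        git("init", cwd=folder)
        notes.write_text("First draft\n")
        git("add", "notes.txt", cwd=folder)
        git("commit", "-m", "Add first draft of notes", cwd=folder)

        # The default branch may be called main or master, so ask git for its name.
        main_branch = git("rev-parse", "--abbrev-ref", "HEAD", cwd=folder)

        git("branch", "new-branch", cwd=folder)
        git("checkout", "new-branch", cwd=folder)
        notes.write_text("First draft\nAn experimental idea\n")
        git("add", "notes.txt", cwd=folder)
        git("commit", "-m", "Try out an experimental idea", cwd=folder)
        on_new_branch = len(git("log", "--oneline", cwd=folder).splitlines())

        # Back on the original branch, the experimental commit is not in the history.
        git("checkout", main_branch, cwd=folder)
        on_main = len(git("log", "--oneline", cwd=folder).splitlines())

        return {"main": on_main, "new-branch": on_new_branch}


print(demo_history())
```

The experimental commit only exists on ```new-branch``` until it is merged, which is why trying out an idea on a branch is low risk.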


## 2. Virtual Environments

The Python language itself is fairly small. It really just contains the instructions needed to define the basic kinds of data, work with text, do basic maths, and work with protocols like HTTP for using the internet and similar.

What makes Python so very useful isn't just this fundamental core of the language - it's that thousands of people have written packages in Python to do other things - from slightly more specific maths, all the way to advanced machine learning and AI tasks. Most of these are open source, and you can install them and use them in your projects. The official repository for Python packages is [PyPI](https://pypi.org/), the Python Package Index, which at the time of writing contains 509,528 projects.

However, this adds a complication. The core Python language, together with the Standard Library of modules that ships with it, is updated annually. At the time of writing we're on version 3.12, released in October 2023. Version 3.13 is available as a development version now, while new features are being added, and is scheduled for official release in October 2024. 3.14 is expected in October 2025. And so on.

However, all the open source packages published by other groups and individuals can't all follow such a strict annual update timeline. Most are written by unpaid volunteer contributors, who will add and update whenever they can find the time, but there's only so many hours in a day, and people have their own lives going on as well. This means that packages are normally updated on a completely different schedule to the language itself, and to each other.

So, sometimes you'll have a project using Package A for Python 3.10 which relies on Package B; and another project using Package C, which relies on an earlier version of Package A for Python 3.6 and hasn't been updated yet. 
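You can see these version and dependency relationships for yourself. The sketch below uses ```importlib.metadata``` from the Standard Library (Python 3.8 or later) to look up which version of a package is installed in the current environment, and what that package declares as its own requirements. ```pip``` is only used here as an example name, and it may not be installed everywhere.

```python
from importlib import metadata


def describe_package(name):
    """Return the installed version and declared requirements of a package."""
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        return None  # not installed in the environment running this code
    # requires() returns None when a package declares no dependencies.
    requirements = metadata.requires(name) or []
    return {"name": name, "version": version, "requires": requirements}


print(describe_package("pip"))
print(describe_package("a-package-that-is-not-installed"))
```

Each entry in the requirements list is a string naming another package, often with a version range attached. Those ranges are where the conflicts come from.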

To avoid these conflicts, we can set up what are called _Virtual Environments_ for each project. A virtual environment is basically a folder on your computer which contains a specific version of Python, and the specific versions of all the packages you need for that project. When you activate that environment to work on it, you're essentially just telling your computer "use the version of Python, and the versions of the packages, which are in this particular folder". 

Doing this means you can keep your work on different projects separate, and won't get caught out by problems caused by needing different things for different projects.
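To make the "it's basically a folder" idea concrete, here is a minimal sketch using the ```venv``` module that ships with Python. This is not one of the tools I recommend below, just the simplest built-in way to illustrate the concept. The folder name ```demo-env``` is made up, and the environment is created in a temporary location and deleted again afterwards.

```python
import tempfile
import venv
from pathlib import Path


def create_demo_environment():
    """Create a throwaway virtual environment and return its settings."""
    with tempfile.TemporaryDirectory() as folder:
        env_dir = Path(folder) / "demo-env"
        # with_pip=False keeps this quick; no packages are installed.
        venv.create(env_dir, with_pip=False)

        # pyvenv.cfg is the small text file that marks a folder as an environment.
        settings = {}
        for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
            key, separator, value = line.partition("=")
            if separator:
                settings[key.strip()] = value.strip()
        return settings


for key, value in create_demo_environment().items():
    print(f"{key} = {value}")
```

The printed settings record which Python installation the environment was built from. Activating an environment amounts to telling your terminal to use the interpreter and packages inside that folder.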

There are several different tools to create virtual environments in Python, and which one suits you may depend on which operating system you are using. These also often double as....

## 3. Package Managers

Handling the conflicts between dependencies of various packages could be a nightmare, but fortunately you don't have to go and find the right versions of packages on PyPI yourself. There are a number of tools for package management, which check the compatibility of all the packages you need for a project and automatically install the right versions. These are often the same tools you'll use to create virtual environments, since the two needs overlap significantly.
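As a toy illustration of the kind of check a package manager performs, the sketch below picks the newest release of a single package that falls inside an allowed version range. The version numbers are invented, and the function names are my own. Real tools such as conda or poetry solve a much harder problem, since they have to satisfy the constraints of every package in the project at once.

```python
def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def newest_compatible(available, minimum, below):
    """Return the newest version with minimum <= version < below, or None."""
    low, high = parse_version(minimum), parse_version(below)
    candidates = [v for v in available if low <= parse_version(v) < high]
    return max(candidates, key=parse_version) if candidates else None


# Hypothetical releases of some package, and a requirement of ">=1.0,<2.0".
releases = ["0.9.5", "1.2.0", "1.9.3", "1.10.0", "2.0.1"]
print(newest_compatible(releases, "1.0", "2.0"))  # → 1.10.0
```

Note that the versions are compared as numbers rather than as text: compared as plain strings, "1.9.3" would wrongly sort above "1.10.0".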

Which tool is the best for you will depend heavily on your personal preferences, needs, and workflow. Unfortunately, you don't usually know what those will be until you've been using Python for a while, but in order to use Python for a while first you need to have tools for virtual environments and package management. So you need to choose something before you're ready, and of course changing later can be awkward. 

I have actually very recently changed my workflow myself, and I'm still getting used to it. Until recently, I was using conda.

![Conda logo](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Conda_logo.svg/497px-Conda_logo.svg.png)

Conda is a cross-platform all-in-one tool to handle virtual environments and packages. On Windows, you can install [Anaconda Navigator](https://www.anaconda.com/download), and use it to create new environments, install packages in those environments, and open an IDE (see below) - Jupyter, PyCharm, or Spyder. 

You can also use the command line - cmd or Powershell in Windows, or whatever terminal you're using in Linux:

Create a new environment with

```conda create --name my-new-environment```

Activate it with

```conda activate my-new-environment```

and install packages with

```conda install package-name```

There is one quirk to using Anaconda. I said that PyPI is the official repository for Python packages - but conda doesn't use it. Instead, conda has its own repository, which it refers to as the ```defaults``` channel. But the ```defaults``` channel only has selected versions of more mainstream packages. Many of the packages you might need for geospatial analysis aren't there. 

Instead, there is a more open channel to which anyone can publish packages, called ```conda-forge```. Many geospatial python packages can be found on conda-forge. 

When conda works, it's very simple. The problem is that conda-forge has so many packages that working out the dependencies, so that you can install a compatible set of packages for a project, can take a long time. Now, don't get me wrong - for many people, it still works just fine. But, for me, for the particular things I'd be using, this has reached epic nightmare proportions which have caused me to get over my otherwise pathological resistance to change, and try a completely different workflow.

If you are using Windows, then unless you're very comfortable with using the command line to type commands, I would probably still recommend using Anaconda. You'll need to add conda-forge as a channel, and then re-add the default channel to make sure the default channel has higher priority, otherwise you'll end up with installations taking forever. If you're getting more into it, it will probably be worth installing the mamba package manager as well - but only go there once you're getting quite familiar with everything, don't overcomplicate things too soon.

I will tell you what my current workflow is, but bear in mind that I normally use Linux for everything I do with Python. There are Windows versions for most of what I use (and I found [this blog](https://endjin.com/blog/2023/03/how-to-setup-python-pyenv-poetry-on-windows) which shows how to set up much the same as what I do but on Windows), but I would recommend sticking to Anaconda unless you're comfortable with the command line.

If you are comfortable with the command line, or are very serious about developing coding skills, I would actually recommend skipping the Windows command line and [installing the Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/install) - which allows you to run Linux on a Windows computer, at the same time. But this will come with a LOT of things to learn, so again, only if you're really serious about it or already use Linux.

I am now using ```pyenv``` for virtual environments 

[![pyenv logo](https://avatars.githubusercontent.com/u/16530698?s=200&v=4)](https://github.com/pyenv/pyenv)

and ```poetry``` for package management. 

[![poetry logo](https://avatars.githubusercontent.com/u/48722593?s=48&v=4)](https://python-poetry.org/)

After installing these, my workflow now is:

I first used pyenv to install a python version, and set it as the global default:

```pyenv install 3.11.6```  
```pyenv global 3.11.6```  

Using that version, I installed Jupyter Lab (we'll get to this in a sec) using pip, the default python package management tool and installer:

```pip install jupyterlab```  

Poetry can manage virtual environments, but I prefer to use pyenv, so I run

```poetry config virtualenvs.create false```

to allow me to use pyenv. Then for each project, I create a new virtual environment with pyenv. 

```pyenv virtualenv 3.11.7 my-new-project```

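3.11.7 here is the version of python to use in the project; it can be whatever version you want, as long as you have already installed it with pyenv. Note that the ```virtualenv``` and ```activate``` commands come from the pyenv-virtualenv plugin, so that needs to be installed too.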

```pyenv activate my-new-project```  

Poetry will respect and use the pyenv environment, so it can handle all the package management for the environment:

```poetry new my-new-project``` 

This will create a folder with the default structure and configuration files needed. Or, if you already have a project, inside the project folder, use:

```poetry init```  

Then to install packages, simply:

```poetry add package1-name package2-name```  
```poetry add --group dev black flake8 ipykernel```  

I add those three in particular - ```black``` for code formatting, ```flake8``` for linting (i.e. checking style and syntax), and ```ipykernel``` so I can use just my base instance of Jupyter Lab rather than needing a separate instance for each project. To finish this part, I run:

```python -m ipykernel install --user --name my-new-project```

When I want to work on the project in Jupyter, I can then open the default system version of Jupyter Lab, and it will have an option to select the python version from my project virtual environment to use. This just means I don't have to install Jupyter separately for every project, and can work on multiple projects in one Jupyter instance.
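A quick way to confirm that a notebook really is using the project environment is to ask Python where it is running from. This is a small sketch using only the ```sys``` module; run it in a notebook cell after selecting the kernel. The function name is my own, and the paths it prints will differ on every machine.

```python
import sys


def interpreter_report():
    """Describe the Python interpreter that is running this code."""
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "environment": sys.prefix,
        # For environments built with venv or virtualenv these two differ;
        # conda environments are not detected by this check.
        "in_virtual_environment": sys.prefix != sys.base_prefix,
    }


for key, value in interpreter_report().items():
    print(f"{key}: {value}")
```

If the executable path points into your project's environment folder rather than the global python installation, the kernel is set up as intended.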

Then I create a new empty github repository for the project, and associate it with my new project folder by running

```git init```  
```git remote add origin https://github.com/username/repository.git```

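To back up the directory to the remote repository, the process is to add the contents, commit, and then push. The first push needs to tell git which remote branch to use; I've assumed the branch is called ```main``` here, so swap in your own branch name if it differs.

```git add .```  
```git commit -m "message explaining what's in the commit"```  
```git push -u origin main```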

Now you're set up, and ready to work.

## 4. IDEs

But work in what? We need a program that we can edit python code in - really, that could be any text editor, since we're just editing text files.

But we can do better than Notepad.

Here, by far the most common recommendation among python developers is VS Code (https://code.visualstudio.com/), from Microsoft. Some will recommend PyCharm (https://www.jetbrains.com/pycharm/), from JetBrains. I use PyCharm for some work, but often also use the open source IDE Spyder (https://www.spyder-ide.org/). 

These IDEs are what you need when you're writing a full module, a piece of code intended to be run and executed as a whole.

However, often it will be extremely useful to see the results of individual lines or short segments, and for that, we have Jupyter Notebooks. Like this one!

Jupyter Notebooks run in a browser window. They can include cells of text, and cells of code - and you can run the cells of code individually, with the results displayed inline. 

Installing Jupyter Lab in my base environment, and adding the ipykernel from each project to my base environment, means I can use one Jupyter installation and run notebooks from various projects in it.

If you've installed python on a laptop, you can do this for this module. But there are also online platforms which let you use Jupyter Notebooks online, such as Binder and Google Colaboratory. I've included links so that the notebooks for this workshop can be run in Binder or Colab - so you're probably reading this on one of those.

You can also fork the entire repository, and open the notebooks directly in Colab or Binder from your own GitHub account.

## 5. For this module

You can click the links to open a Binder instance, where you can run these notebooks in Jupyter Notebooks; or a Colab instance, where you can run the notebooks in Google's version of Jupyter.

If you are comfortable with the command line, and want to work offline on your own computer, you can clone the GitHub repository. You should be able to use the command line to run ```pyenv virtualenv 3.11.6 GY5021```, activate it with ```pyenv activate GY5021```, and then run ```poetry install```; or else run ```conda create -n GY5021 --file requirements.txt```, depending on your preferences, to install all the packages and libraries to be used, plus their dependencies. If you're not comfortable using the command line, I recommend sticking to Colab or Binder.
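After installing, you can check from Python that the packages are visible in the environment you're using. The package names below are only examples, since this notebook doesn't list the module's requirements, so substitute the ones from the repository's own configuration files. The function name is my own.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


# Hypothetical list: replace with the packages the module actually uses.
wanted = ["geopandas", "shapely", "matplotlib"]
missing = missing_packages(wanted)
print("Missing:", ", ".join(missing) if missing else "nothing")
```

Bear in mind that the name you import is not always the same as the name you install, so check the package's documentation if something you know you installed shows up as missing.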

___

Week 1 Notebooks: 

1. Geospatial Software and Programming Languages <a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/1_Introduction_to_Geospatial_Data/GY5021_1_Geospatial_Software_and_Programming_Languages.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F1_Introduction_to_Geospatial_Data%2FGY5021_1_Geospatial_Software_and_Programming_Languages.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>

2. Data Types <a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/1_Introduction_to_Geospatial_Data/GY5021_2_Data_Types.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F1_Introduction_to_Geospatial_Data%2FGY5021_2_Data_Types.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>

3. Vector Data <a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/1_Introduction_to_Geospatial_Data/GY5021_3_Vector_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F1_Introduction_to_Geospatial_Data%2FGY5021_3_Vector_Data.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>

4. Attribute Data <a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/1_Introduction_to_Geospatial_Data/GY5021_4_Attribute_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F1_Introduction_to_Geospatial_Data%2FGY5021_4_Attribute_Data.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>

5. Coordinate Reference Systems <a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/1_Introduction_to_Geospatial_Data/GY5021_5_Coordinate_Reference_Systems.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F1_Introduction_to_Geospatial_Data%2FGY5021_5_Coordinate_Reference_Systems.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>

6. Geospatial Data Files <a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/1_Introduction_to_Geospatial_Data/GY5021_6_Geospatial_Data_Files.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F1_Introduction_to_Geospatial_Data%2FGY5021_6_Geospatial_Data_Files.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>

7. Vector Geoprocessing <a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/1_Introduction_to_Geospatial_Data/GY5021_7_Vector_Geoprocessing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F1_Introduction_to_Geospatial_Data%2FGY5021_7_Vector_Geoprocessing.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>

Additional:

- The Python Language <a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/1_Introduction_to_Geospatial_Data/GY5021_The_Python_Language.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F1_Introduction_to_Geospatial_Data%2FGY5021_The_Python_Language.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>

- Getting Started Seriously With Python <a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/1_Introduction_to_Geospatial_Data/GY5021_Getting_Started_Seriously_With_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F1_Introduction_to_Geospatial_Data%2FGY5021_Getting_Started_Seriously_With_Python.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>