# Getting started with Python on your own machine

## Installing Python for offline use

If you want to work on these course materials without an active internet connection, we recommend that you install the full Anaconda distribution (Python 3.8) on your own machine. It sets up your Python environment together with a number of frequently used packages that you'll use during this course. A guide on installing Anaconda can be found here: https://docs.anaconda.com/anaconda/install/. NB: You don't have to install the optional extras, such as the PyCharm editor.

For more installation instructions and options, take a look at: https://github.com/bloemj/2024-coding-the-humanities/blob/master/setup.md. 


If you completed all the steps and you have Python and Jupyter notebooks installed, open this file again as a notebook and continue with the content below. Good luck and have fun! 🎉

To obtain the class materials for offline use, download them from GitHub. Go to the course repository: https://github.com/bloemj/2024-coding-the-humanities, click on the big green Code button, and choose Download ZIP. Extract this ZIP file somewhere on your laptop and navigate to that location with Jupyter to open the notebooks.

# Hello World

This notebook contains some code to allow you to check if everything runs as intended.

[Jupyter notebooks](https://jupyter.org) contain cells of Python code, or text written in [markdown](https://www.markdownguide.org/getting-started/). This cell, for instance, contains text written in markdown syntax. You can edit it by double-clicking on it. You can create new cells using the "+" (top right bar), and you can run a markdown cell to render the markdown syntax it contains and see the formatted result.

The other type of cell contains Python code and needs to be executed. You can do this either by clicking on the cell and then on the play button at the top of the window, or by pressing `shift + ENTER`. Try this with the next cell, and you'll see the result of this first line of Python.

**For a more extended revision of these materials, see http://www.karsdorp.io/python-course (Chapter 1).**

In [1]:
# It is customary for your first program to print Hello World! This is how you do it in Python.

print("Hello World!")

Hello World!


In [None]:
# You can comment your code using '#'. What you write afterwards won't be interpreted as code.
# This comes in handy if you want to comment on smaller bits of your code. Or if you want to
# add a TODO for yourself to remind you that some code needs to be added or revised.

The code you write is executed from a certain *working directory* on your machine (we will see more when doing input/output). 

You can access your current working directory by using a *package* (bundle of Python code which does something for you) part of the so-called Python standard library: `os` (a package to interact with the operating system).

In [None]:
import os # we first import the package

In [None]:
os.getcwd() # we then can use some of its functionalities. In this case, we get the current working directory (cwd)
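To see why the working directory matters, here is a minimal sketch of how a relative file name is resolved against it. The file name `notes.txt` is made up for illustration and the file does not need to exist.

```python
import os

cwd = os.getcwd()  # the current working directory, as a string

# "notes.txt" is a hypothetical file name; the file does not need to exist.
# A relative path like this one is interpreted relative to the working directory.
full_path = os.path.abspath("notes.txt")

print(cwd)
print(full_path)
```

The second line printed should be the working directory followed by the file name, which is where Python would look for (or create) that file.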

## Python versions

![You can also do images in markdown!](https://www.python.org/static/img/python-logo@2x.png)

It is important that you run a version of Python that still receives security updates. Currently (Spring 2024), this means Python 3.8 or higher. You can see all current versions and their support dates on the [Python website](https://www.python.org/downloads/). Furthermore, every Python version adds functionality and sometimes changes existing functionality, so if you use a different version, you may not always get the same results.

If you recently installed Python on your machine through [Anaconda](https://www.anaconda.com/products/individual#), you're most likely running version 3.9!

This course was mainly designed for version 3.8, but as the differences are minor, both 3.9 and 3.8 can be used.

Let's check the Python version you are using by importing the `sys` package. Try running the next cell and see its output.

In [2]:
import sys

print(sys.executable)  # the path where the Python executable is located
print(sys.version)  # its version
print(sys.version_info)

/usr/bin/python3
3.7.12 (default, Sep 10 2021, 00:21:48) 
[GCC 7.5.0]
sys.version_info(major=3, minor=7, micro=12, releaselevel='final', serial=0)


You have now printed the version of Python that you have installed.
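If you would rather let Python do the comparison for you, the following is a minimal sketch. The threshold of 3.8 is taken from the text above; the names `MINIMUM_VERSION` and `is_supported` are just illustrative choices.

```python
import sys

# Minimum version mentioned in this notebook (Spring 2024)
MINIMUM_VERSION = (3, 8)

# sys.version_info behaves like a tuple, so it can be compared to (major, minor)
is_supported = sys.version_info >= MINIMUM_VERSION

if is_supported:
    print("Your Python version is recent enough for this course.")
else:
    print("Your Python version is older than 3.8; consider upgrading.")
```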

You can also check the version of a package via its attribute `__version__`. A common package for working with tabular data is `pandas` (more on this package later). You can import the package and make it available under another name (a shorthand) by doing:

In [3]:
import pandas as pd  # now 'pd' is the shorthand for the 'pandas' package

NB: Is this raising an error? Look further down for a (possible) explanation!

Now the `pandas` package can be called by typing `pd`. The version number of packages is usually stored in a _magic attribute_ or a _dunder_ (=double underscore) called `__version__`. 

In [4]:
pd.__version__

'1.3.5'

The code above displayed something without using the `print()` function. Let's do the same, but this time by using the `print()` function.

In [5]:
print(pd.__version__)

1.3.5


Can you spot the difference? Why do you think this is? What kind of datatype do you think the version number is? And what kind of datatype can be printed on your screen? We'll go over these differences and the involved datatypes during the first lecture and seminar. 

If you want to know more about a (built-in) function of Python, you can check its manual online. The information on the `print()` function can be found in the manual for [built-in functions](https://docs.python.org/3.8/library/functions.html#print).

More on datatypes later on. 
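As an aside on version numbers: the `__version__` attribute is a convention rather than a rule, so some packages do not define it. A possible alternative, sketched below, is to ask the standard library's `importlib.metadata` module (available from Python 3.8). The helper name `installed_version` and the second package name are made up for illustration.

```python
from importlib import metadata  # part of the standard library since Python 3.8


def installed_version(package_name):
    """Return the installed version of a package as a string, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("pandas"))
# The next package name is made up, so this prints None
print(installed_version("a-package-that-does-not-exist"))
```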

### Exercise
Try printing your own name using the `print()` function. 

In [None]:
# TODO: print your own name


In [None]:
# TODO: print your own name and your age on one line


If all of the above cells were executed without any errors, you're clear to go! 

However, if you did get an error, you should start debugging. Most of the time, the errors returned by Python are quite meaningful. Perhaps you got this message when trying to import the `pandas` package:

```python
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-26-981caee58ba7> in <module>
----> 1 import pandas as pd

ModuleNotFoundError: No module named 'pandas'
``` 

If you go over this error message, you can see:

1. The type of error, in this example `ModuleNotFoundError` with some extra explanation
2. The location in your code where the error occurred or was _raised_, indicated with the ----> arrow

In this case, you do not have this (external) package installed in your Python installation. Have you installed the full Anaconda package? You can resolve this error by installing the package from Python's package index ([PyPI](https://pypi.org/)), which is like a store for Python packages you can use in your code. 
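The two parts of the error message described above can also be inspected from code. The following is a minimal sketch that triggers the same kind of error on purpose; the module name is made up, so the import is expected to fail on any machine.

```python
try:
    # This module name is hypothetical, so the import is expected to fail
    import a_package_that_does_not_exist
except ModuleNotFoundError as error:
    error_type = type(error).__name__  # the type of error
    error_message = str(error)         # the extra explanation
    print(error_type)
    print(error_message)
```

Because the error is caught by `except`, the cell does not stop with a traceback; it prints the error type and the explanation instead.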

To install the `pandas` package (if missing) in Google Colab, run in a cell:

```python
!pip install pandas
```
The exclamation mark tells the notebook cell that this is not a Python command, but a command for the operating system (a command line script).

If you are working on your own machine with Anaconda, this should work in most cases (it will make sure that it installs into the correct Python kernel if you have multiple):

```python
!{sys.executable} -m pip install pandas
```

Or to update the `pandas` package you already have installed:

```python
!{sys.executable} -m pip install pandas -U
```
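To get a feeling for what `{sys.executable}` stands for in the commands above, this small sketch builds the install command as text and prints it. It only constructs a string; nothing is installed when you run it.

```python
import sys

# Build the command as text only; nothing is installed by running this cell
command = f"{sys.executable} -m pip install pandas"
print(command)
```

The printed line starts with the path of the Python executable that runs your notebook, which is why `pip` installs into that same Python installation.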

Try this in the cell below!



In [None]:
# Try either installing or updating (if there is an update) your pandas package
# your code here


Do note that in Anaconda, it is usually better to install packages directly through Anaconda and not use pip. This is done on the command line (outside of Python):

```
conda install pandas
```

If you face other errors, then Google (or DuckDuckGo etc.) is your friend. You'll see tons of questions on Python-related problems on websites such as Stack Overflow. It's tempting to simply copy and paste a coding pattern from there into your own code. But if you do, make sure you fully understand what is going on. Also, in assignments in this course, we ask you to:
1. Specify a URL or source of the website/book you got your copied code from
2. Explain in a _short_ text or through comments by line what the copied code is doing

This will be repeated during the lectures.

However, if you're still stuck, you can open a discussion in our [Canvas course](https://canvas.uva.nl/courses/37320/discussion_topics). You're also very much invited to engage in threads on the discussion board of others and help them out. Debugging, solving, and explaining these coding puzzles for sure makes you a better programmer!

# Level of the course
The code below does some basic things using Python. If you think you have already mastered the 'Python basics' shown below, then get in touch with us for some more challenging exercises!

If not, do not worry, we will cover these things in the first two classes. Our course is aimed at beginners.

### Variables and basic operations

In [None]:
a = 2
b = a

In [None]:
# Or, assign two variables at the same time
c, d = 10, 20

In [None]:
c

In [None]:
b += c

In [None]:
# Just typing a variable name in the Python interpreter (= terminal/shell/cell) also returns/prints its value
a

In [None]:
# Now, what's the value of b?
b

In [None]:
# Why the double equals sign? How is this different from the above a = b ? 
a == b

In [None]:
# Because the ≠ sign is hard to find on your keyboard
a != b

In [None]:
s = "Hello World!"

print(s)

In [None]:
s[-1]

In [None]:
s[:5]

In [None]:
s[6:]

In [None]:
s[6:-1]

In [None]:
s

In [None]:
words = ["A", "list", "of", "strings"]
words

In [None]:
letters = list(s) # Names highlighted in green (such as list) are Python built-ins: avoid using them as variable names
letters

If you have accidentally bound a value to the name of a built-in function of Python, you can undo this by restarting your 'kernel' in Jupyter Notebook. Click `Kernel` and then `Restart` in the bar at the top of the screen. This makes Python lose its memory of previously declared variables. It also means that you must re-run all cells if you need their executions and outcomes.
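To illustrate what goes wrong, here is a minimal sketch that overwrites the built-in name `list` on purpose. It also shows a lighter alternative to restarting: deleting your own variable with `del`. This assumes the code runs at the top level of a notebook cell or script, not inside a function.

```python
# Bind a value to the name of the built-in function `list` (not recommended!)
list = ["oops"]

try:
    list("abc")  # `list` now refers to our own list, which cannot be called
except TypeError as error:
    print("Calling list() failed:", error)

# Deleting our own variable makes the built-in function visible again
del list
restored = list("abc")
print(restored)
```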

In [None]:
# Sets are unordered collections of unique elements
unique_letters = set(letters)
unique_letters

In [None]:
# Every value has a certain data type.
# Python is flexible: a variable can refer to a value of any type
# If you need to know the data type, you can check it explicitly with type()

type(s)

In [None]:
print("If you forgot the value of variable 'a':", a)
type(a)

In [None]:
type(2.3)

In [None]:
type("Hello")

In [None]:
type(letters)

In [None]:
type(unique_letters)

#### Exercise

1. Create variables of each type: integer, float, text, list, and set. 
2. Try using mathematical operators such as `+ - * / **` on the numerical datatypes (integer and float)
3. Print their value as a string

In [None]:
# Your code here

Hint: You can insert more cells by going to `Insert` and then `Insert Cell Above/Below` in this Jupyter Notebook.