# Introduction to Python for working with Digital Collections 
 
This series of notebooks aims to provide an introduction to things that can be achieved fairly easily with Python. The examples are framed around work with digital collections. 

## Aims 
The aims of the notebooks/lesson are primarily:
- to help you decide whether it might be worth your time learning *some* Python
- to give you some ideas about *how* to approach learning Python for working with collection items

## Non aims (is that a thing!?) 
- to teach you Python in a few hours
- to teach you to become a software engineer. The way you use Python to tackle practical problems in your work might not always follow software engineering best practices. These practices are still worth learning, but not at the expense of getting useful things done.

## What the notebooks cover 

- Why Python? 
- Some Python syntax
- Working with Python and pandas to do some simple work with the BL newspaper titles list
- How to assess Python packages
- Debugging issues
- Working in notebooks

# Why Python?

- There are many programming languages you could potentially choose to learn. Some of the languages are widely used and underpin a broad range of software (and hardware) you use on a day-to-day basis. 
- Other languages are used primarily in one very specific field, either because they work particularly well for that task or because of legacy systems (you can earn big 💲 as an IT contractor in banking if you know COBOL, a language first developed in the 60s that still powers many banking systems but that few people now know)

There are some reasons why Python could be a good choice as a programming language to learn:

## Syntax 

The syntax of a programming language is the set of rules for combining symbols into valid instructions, i.e. how you write commands for the computer. Let's look at an example that prints ```Hello, world!```

### In Python: 

```python 
print('Hello, world!')
```

### In C:

```C
#include <stdio.h>

int main(void)
{
    printf("Hello, world!\n"); /* print the text followed by a newline */
    return 0;
}
```

Python needs much less code to do the same thing. This is not unique to Python: some programming languages are 'high level', others are 'low level'. A high-level language usually has syntax that is easier to understand. C can be considered a lower-level language, which often requires you to be more explicit about how you want the machine to do something, whilst a higher-level language takes care of more of those details for you.

Another toy example of Python syntax 

### Python 

```python

files = ['2019_12_10_visitor_stats.csv', '2019_10_10_visitor_stats.csv',
         '2020_01_01_visitor_stats.csv', 'draft_proposal.docx']
for file in files:
    if '2019' in file:
        print(file)
```

With a bit of luck you'll be able to follow what is happening here even if you don't know Python yet. One of the nice things about Python is that it is often possible to understand what some Python code does even if you couldn't write the same code yourself.
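Rather than just printing the matching names, you will often want to keep them for later. The sketch below collects the matches into a new list. `files_2019` is just an illustrative name. It uses `str.startswith()`, which only matches names that *begin* with '2019', so a hypothetical file called `draft_2019_notes.docx` would not slip through.

```python
files = ['2019_12_10_visitor_stats.csv', '2019_10_10_visitor_stats.csv',
         '2020_01_01_visitor_stats.csv', 'draft_proposal.docx']

# Collect the file names that start with '2019' into a new list
files_2019 = []
for file in files:
    if file.startswith('2019'):
        files_2019.append(file)

print(files_2019)
```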

#### 🤫 You will read more code than you write 

Your typing speed is unlikely to be the barrier to how quickly you can code (at least for some time). Instead you will often be trying to work out how to do things in Python using a combination of your code and code written by other people. Being able to understand the syntax of code will make it much easier for you to work out what code you can borrow and adapt from other people.

## Adoption 

Python is widely used which means you will find plenty of resources for using it. It also makes it more likely that other people you work with may know some (or a lot) of it too. 

Python is also used widely across a broad range of domains (research and industry) and for a broad range of applications (web development, data science, networking, databases...)

## Libraries 
Python has loads of software libraries. These make a whole range of things easier to do in Python, from [tools for working with MARC](https://pypi.org/project/pymarc/) to [astronomy](https://www.astropy.org/).

## Why not Python?

There are downsides to Python too. Some reasons you may not want to use Python:
- if you want to focus on 'front-end' web development, you will want to learn JavaScript and HTML/CSS (this doesn't mean you can't use Python to build websites)
- if performance/speed is critical for what you are doing: there are many ways to speed up Python code, but if absolute performance matters you may want to consider another language. This is unlikely to be an issue in a heritage context (it matters more in fields such as algorithmic trading)
- your colleagues all use another language (it might be more useful to learn that language so you can get help from them)
- you want to use other languages (there are plenty of other languages to learn 😀)

# Let's do something!

# Jupyter notebook 
To run a cell, click on it and press Shift ⇧ + Enter.

In [None]:
# our example use case

## Variables

- Variables are the way Python stores values
- Variables are 'assigned' using ```=```
- For example ```name = "Daniel"``` or ```visitor_num = 100```
- Variables can hold many different types of values

In [1]:
name = "Daniel"

# Exercise 🎓 
Try to assign your name to the ```your_name``` variable

In [None]:
your_name = "daniel"

### Using our variables
We can now use these variables:

In [None]:
print('My name is ' + name + ', it is nice to meet you ' + your_name)
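The same sentence can be built with an f-string (available since Python 3.6). An f-string places variables inside `{}` instead of joining the pieces with `+`. Here is a small sketch that reuses the variable names from the cells above:

```python
name = "Daniel"
your_name = "daniel"

# Each {} is replaced with the value of the variable inside it
greeting = f'My name is {name}, it is nice to meet you {your_name}'
print(greeting)
```

Many people find this easier to read than a long chain of `+`, especially when mixing text and several variables.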

## Create URL variable 

Now that we know how to assign variables, we can use one to store the URL that points to the dataset we want to download.

In [4]:
url = 'https://bl.oar.bl.uk/fail_uploads/download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b48244'

Why the quotes? Let's see what happens without them:

In [None]:
url = https://bl.oar.bl.uk/fail_uploads/download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b48244

### String variables 
This didn't work because we were trying to assign a [string](https://docs.python.org/3/library/stdtypes.html#textseq) but didn't use the correct syntax for one.

We tell Python that something is a string by wrapping it in ```' ' ``` or ```" "```.
We'll come back to this in a little more detail below. For now, if we want to store a text-based variable, we should wrap it in single or double quotation marks.
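Single and double quotes create exactly the same kind of string. Having both is handy when the text itself contains a quote mark. The `title` value below is a made-up example:

```python
single = 'Daniel'
double = "Daniel"
print(single == double)  # True: both are the same string

# Use double quotes when the text contains an apostrophe
title = "Daniel's notebook"
print(title)
print(type(title))  # <class 'str'>
```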

# 🧙‍♂️ A note on errors and debugging 
Error messages are a standard part of programming. Treat them as feedback from the program about how to fix things rather than as a big problem. Jupyter notebooks are great for this because you can easily split your code into separate cells, which makes it easier to isolate where things have gone wrong. For this reason I've left the error messages in. Sometimes you will work through a tutorial where everything works perfectly, and as soon as you try to change a single thing everything breaks. It's important to become familiar with working through errors, so we'll do this together as part of the lesson.

### Naming variables 
There are some rules for how we name variables. Some you will need to memorise. For others, Python will give you helpful feedback telling you there is a problem.

#### Capitalization matters
Let's see what happens if we reuse the ```url``` variable name from above, but in capitals.

In [None]:
# Same variable name with capitals
URL = 'google.com'
URL

Let's check our previous variable:

In [None]:
url

Uh oh! ```URL``` and ```url``` are two different variables, which could get confusing. We should be careful. As a general rule, Python variable names are written in lower case.

#### Meaningful names
Try to use variable names that are meaningful, balancing being expressive with being concise.

In [None]:
n = 'Daniel'
name = 'Daniel'
first_name = 'Daniel'

All of these will work, but the second and third options are clearer. Try to be helpful to other people reading your code.

#### Some words are 'keywords' which are reserved by Python
Some variable names won't work because they are reserved by Python. If we try these names we'll get an error. 

In [None]:
# naming: reserved
class = 'https://bl.oar.bl.uk/fail_uploads/download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b48244'

Oops, we get a ```SyntaxError```. We can see a list of these keywords below, but don't worry about memorising them.

In [None]:
import keyword
print(keyword.kwlist)
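If you only want to check a single name, the standard library `keyword` module also provides `iskeyword()`. It returns `True` or `False`:

```python
import keyword

print(keyword.iskeyword('class'))  # True: reserved, can't be used as a variable name
print(keyword.iskeyword('url'))    # False: fine to use
```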

### Debugging in Jupyter

In the above example, when we tried to give a variable the name ```class```, it was highlighted in a different colour (dark green in the default Jupyter notebook theme). This gives us a clue that the word has a special meaning in Python.

In [None]:
# See how keywords are highlighted in dark green 
try doggie for cat 

# Comments 
You will already have seen that we can include a comment using ```#```. Everything after the ```#``` on that line is ignored by Python, so you can use it to explain to other people what you are trying to do.

### Printing

Printing can be done with the print function. When you get started with Python you will print things a lot. It can be very useful to print variables as you manipulate them to check how things look. 

You can print a string (or other things) directly or print a variable

In [None]:
print("string")

In [None]:
print(url)

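### Notebook specific behaviour
In a notebook, if the last line of a cell is a variable (or any other expression), its value is displayed even without ```print()```.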

In [None]:
url
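When a notebook displays a string this way, it wraps it in quotes. That is because it shows the *representation* of the value (what `repr()` returns), whereas `print()` shows the text itself. Here is a small sketch you can run in any Python session to see the difference:

```python
value = 'Daniel'
print(value)        # Daniel
print(repr(value))  # 'Daniel' (the quoted form a notebook displays)
```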

# Functions
Functions perform tasks in a program. We already saw an example with ```print()```.
Some functions are built into Python, for example ```len()```, which gives the length of something:

In [None]:
# Functions on variables 
len(url)

Python functions are called using ```()```. Often you need to pass one or more 'arguments' to a function inside the brackets (though not always). These tell the function either how you want it to work or what you want it to work on.
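Here are a few more built-in functions, each called with one or more arguments inside the brackets. For example, `round()` takes the number to round and, optionally, the number of decimal places:

```python
print(len('Daniel'))      # 6: the number of characters
print(round(3.14159, 2))  # 3.14: rounded to 2 decimal places
print(max(3, 10, 7))      # 10: the largest of the arguments
```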

# Methods
Closely related to functions are methods. These look a little different: they are attached to a value or variable with a dot, for example ```url.split('/')```.

In [5]:
print(url) # function 
url.split('/') # method 

https://bl.oar.bl.uk/fail_uploads/download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b48244


['https:',
 '',
 'bl.oar.bl.uk',
 'fail_uploads',
 'download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b48244']
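Like functions, many methods take arguments. The sketch below tries a few more string methods on the same `url`:

```python
url = 'https://bl.oar.bl.uk/fail_uploads/download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b48244'

print(url.startswith('https'))       # True: does the string begin with 'https'?
print(url.count('bl'))               # 2: how many times 'bl' appears
print(url.replace('https', 'http'))  # a new string with 'https' swapped for 'http'
```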

# 🤔 functions or methods?
We won't worry too much about the nuances of each of these. As you become more familiar with Python you will learn more about how these are different. Let's take a look at some more examples.  

In [6]:
url.upper()

'HTTPS://BL.OAR.BL.UK/FAIL_UPLOADS/DOWNLOAD_FILE?FILESET_ID=67B25F41-A682-4C1F-BF42-550E06B48244'

In [None]:
# hit tab to see available methods 
url.

# Exercise 🎓 
- Create a variable that stores your name in lower case 
- print out your name using this variable 
- now see if you can use a method on your variable to capitalize your name

In [7]:
# your solution here 

# Python Data Types 
Python has several different built-in types. We won't spend too long on these, but it's useful to become a little familiar with some of the important points. Let's see what happens if we try to add 1 to our url.

In [9]:
url + 1

TypeError: can only concatenate str (not "int") to str

What do you think went wrong? 

In [10]:
type(url)

str

In [11]:
type(1)

int

As the error message above says, we can't concatenate a string (our url) and an int (1).

# What types are there? 
We have already seen a few:

- int: a whole number
- str: strings (text) 

We also have 
- float: a number with a decimal 

In [16]:
type(1.0)

float
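Unlike a string and an int, an int and a float *can* be added together. Python gives the result as a float:

```python
total = 1 + 1.0
print(total)        # 2.0
print(type(total))  # <class 'float'>
```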

Let's try combining our url with 1 again, this time writing the 1 as a string.

In [19]:
url + "1"

'https://bl.oar.bl.uk/fail_uploads/download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b482441'

This time it works because we are using ```" "``` to indicate that 1 is a string.

We can also sometimes convert between types using built-in Python functions:

In [20]:
# changing types 
str(1)

'1'

In [21]:
int('1')

1

This won't always work. 

In [22]:
int('daniel')

ValueError: invalid literal for int() with base 10: 'daniel'
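When a conversion might fail, you can catch the `ValueError` with `try`/`except` instead of letting it stop your code. This can happen, for example, with numbers typed into a spreadsheet column. Below is a minimal sketch; `to_int_or_none` is a hypothetical helper name, not part of Python:

```python
def to_int_or_none(text):
    """Convert text to an int, or return None if it isn't a whole number."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for text like 'daniel'
        return None

print(to_int_or_none('1'))       # 1
print(to_int_or_none('daniel'))  # None
```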

# Making changes to a variable 
We've seen a few examples of manipulating variables. What happens to these changes?

In [24]:
# huh? no change to url 
url 

'https://bl.oar.bl.uk/fail_uploads/download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b48244'

In [25]:
url.upper()

'HTTPS://BL.OAR.BL.UK/FAIL_UPLOADS/DOWNLOAD_FILE?FILESET_ID=67B25F41-A682-4C1F-BF42-550E06B48244'

In [26]:
url

'https://bl.oar.bl.uk/fail_uploads/download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b48244'

In these examples we haven't stored our changes anywhere. String methods like ```upper()``` return a new string and leave the original unchanged. We either need to create a new variable to store the modified value or 're-assign' our variable with the changes.

In [27]:
# Store the changed value in a new variable
new_url = url + "_new"
new_url

'https://bl.oar.bl.uk/fail_uploads/download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b48244_new'

We still have the original url:

In [28]:
url

'https://bl.oar.bl.uk/fail_uploads/download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b48244'
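To actually change a variable, assign the result back to the same name. The sketch below uses a separate, made-up `title` variable so that `url` stays untouched:

```python
title = 'daily news'
title.upper()          # returns a new string, but we don't keep it
print(title)           # daily news: unchanged

title = title.upper()  # re-assign: title now refers to the upper-case string
print(title)           # DAILY NEWS
```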

In [None]:
# Indexing 
url[1]



In [None]:
#huh? indexing from 0 
url[0]

In [None]:
#index range
url[0:5]
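In a range like `url[0:5]`, the start position is included but the end position is not. You therefore get the characters at positions 0, 1, 2, 3 and 4. Here is the same idea on a shorter string:

```python
word = 'newspaper'
print(word[0])    # n: the first character is at position 0
print(word[0:4])  # news: positions 0 to 3, the end position 4 is excluded
print(word[4:])   # paper: leaving out the end means 'to the end'
```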

In [None]:
# Challenge
# use an index range to get 'bl' out of url


In [None]:
# Negative index: -1 counts from the end, and the end of a range is excluded
url[8:-1]
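Negative numbers count backwards from the end of the string, so `-1` is the last character. Because the end of a range is excluded, `url[8:-1]` stops just before the last character. For example:

```python
word = 'newspaper'
print(word[-1])    # r: the last character
print(word[-5:])   # paper: the last five characters
print(word[0:-1])  # newspape: everything except the last character
```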

In [None]:
# method and index
url.split('/')[-1]

In [None]:
# method and index
# shouting 
url.split('/')[-1].upper()

In [None]:
# functions 
def print_url(url):
    print(url)

In [None]:
print_url(url)

In [29]:
# functions 
def print_dl_from_url(url):
    split_url = url.split('/')
    download_file = split_url[-1]
    print(download_file)

In [30]:
print_dl_from_url(url)

download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b48244


In [None]:
download_file = print_dl_from_url(url)

In [None]:
#huh? 
download_file

In [None]:
print(download_file)

In [None]:
# testing in a notebook: check what the function gives back
assert print_dl_from_url(url) is None  # it prints a value but returns None

In [None]:
# refactor: return the value instead of printing it
def get_dl_from_url(url):
    split_url = url.split('/')
    download_file = split_url[-1]
    return download_file

In [None]:
download_file = get_dl_from_url(url)

In [None]:
download_file

In [None]:
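# Wahoo! download_file now holds the value the function returned
# Warning: cells can be run in any order, so a variable's value depends on
# which cells you ran most recently, not on their order on the page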
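The same pattern (split, index, return) can pull out other parts of the URL. Below is a minimal sketch. `get_fileset_id` is a hypothetical name, and the code assumes the ID is everything after the last `=` in the URL:

```python
def get_fileset_id(url):
    """Return the text after the last '=' in url (assumed to be the fileset ID)."""
    return url.split('=')[-1]

url = 'https://bl.oar.bl.uk/fail_uploads/download_file?fileset_id=67b25f41-a682-4c1f-bf42-550e06b48244'
fileset_id = get_fileset_id(url)
print(fileset_id)  # 67b25f41-a682-4c1f-bf42-550e06b48244
```

Because the function returns its result rather than printing it, `fileset_id` can be used in later cells.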