# Session 3:
## Strings and APIs
Andreas Bjerre-Nielsen

## Agenda

1. [Strings: manipulation, combination etc.](#Strings)
2. [Containers - key based](#Containers---key-based)
3. [Interacting with the web](#Interacting-with-the-web)
4. [Loading and saving files](#Loading-and-saving-files)

# Strings


## Strings recap

*What are strings? What do they consist of?*

- Strings are sequences of characters
    - Characters can be whitespace.
- Two common character encodings
    - `ascii` covers only English characters
    - `utf` (e.g. `utf-8`) also covers other European and Asian characters; Python 3 strings are Unicode, so they can hold both

## String concatenation

*How can I combine strings?*

Strings can be added together:

In [11]:
s1 = 'police'
s2 = 'officer'
s1 + ' ' + s2

s = '\n'

print(s.join([s1, s2, 'arrests']))

police
officer
arrests


## String changing case

*Can I alter the sentence-case of strings?*

- Yes, using the string methods `upper`, `lower`, `capitalize`. Example:

In [5]:
s1.upper()

'POLICE'
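
The other two methods work the same way. A minimal sketch, where the variable `s` is made up for illustration:

```python
s = 'police OFFICER'

# Each method returns a new string; the original is left unchanged
lowered = s.lower()            # every character in lower case
capitalized = s.capitalize()   # first character upper case, the rest lower case

print(lowered)
print(capitalized)
```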

## Substrings (1)
*How can I check if a substring is contained in the string?*

- in/not in

In [13]:
'pol' not in s1

False

## Substrings (2)
*How can I replace a specific substring?*

- replace

In [15]:
s1.replace('po', 'ma')

'malice'

## Substrings (3)
*Can I also access a string via indices? (in the sequence of characters)*

- sequence form - slicing/indexing

In [19]:
s1[2:5]

'lic'

## Strings quiz
*Which Python object do strings remind you of?*

- Lists work like strings.
  - Concatenation (`+`, `*`) works the same way.
  - We check if an element/character is contained with `in`.
  - We can slice and use indices for both.
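
A small side-by-side sketch of these similarities. The names `word` and `letters` are only illustrative, and the last line points out one difference that is easy to overlook:

```python
word = 'police'
letters = ['p', 'o', 'l', 'i', 'c', 'e']

# Concatenation and repetition
word_twice = word * 2            # the string repeated twice
letters_plus = letters + ['!']   # a new list with one extra element

# Membership tests
l_in_word = 'l' in word
l_in_letters = 'l' in letters

# Slicing picks out the same positions in both
word_slice = word[2:5]
letters_slice = letters[2:5]

# One difference: a list can be changed in place, a string cannot
letters[0] = 'P'
```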

## More about strings 

There are many things about strings which we have not covered:

- Methods for splitting or combining strings etc.
- [String formatting](http://www.python-course.eu/python3_formatted_output.php) is exceptionally useful, e.g. for making URLs, printing etc.
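
As a rough illustration of both points, the sketch below splits and joins a sentence and then builds a URL with string formatting. The variable names are assumptions made for the example:

```python
sentence = 'police officer arrests'

# Splitting gives a list of substrings; joining is the reverse operation
words = sentence.split(' ')
joined = '-'.join(words)

# Two equivalent ways of formatting a value into a URL
user = 'abjer'
url_fstring = f'https://api.github.com/users/{user}/repos'
url_format = 'https://api.github.com/users/{}/repos'.format(user)

print(words)
print(joined)
print(url_fstring)
```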


# Containers - key based

## Containers recap

*What are containers? Which have we seen?*

-

## Dictionaries (1)

*How can we make a container which is accessed by arbitrary keys?*

By using a dictionary, `dict`. Try executing the code below:

In [25]:
my_dict = {'Andreas': 'Economist',
           'Snorre': ['BSc sociology', 'MSc Sociology'],
           'Ulf': 'Engineer'}

print(my_dict['Snorre'][1])

MSc Sociology


## Dictionaries (2)

Dictionaries can also be constructed from two associated lists. These are tied together with the `zip` function. Try the following code:

In [32]:
?zip

In [33]:
keys = ['a', 'b', 'c']
values = [1, 2, 3]  # one value per key; zip drops keys without a partner
key_value_pairs = list(zip(keys, values))

my_dict2 = dict(key_value_pairs)
my_dict2['a']

1
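
A short sketch of how `zip` pairs the lists up, including what happens when they differ in length. The names `pairs`, `full` and `short` are only for illustration:

```python
keys = ['a', 'b', 'c']

# zip pairs elements by position
pairs = list(zip(keys, [1, 2, 3]))
full = dict(pairs)

# zip stops at the shortest input, so 'c' is silently dropped here
short = dict(zip(keys, [1, 2]))

# Looping over key-value pairs of a dictionary
for key, value in full.items():
    print(key, value)
```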

## Storing containers

*Does there exist a file format for easy storage of containers?*

Yes, the JSON file format.

- At its base, a JSON document is a list or a dict.
- It also looks like one, e.g.
    - `'{"a":1,"b":1}'`

## Storing containers (2)

*Why is JSON so useful?*

- Extreme flexibility:
    - Can hold any list or dictionary of any depth, as long as it contains only numbers (float, int), strings, booleans and `null` (Python's `None`).
- Standard format that looks almost exactly like Python lists and dicts.
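
To make the resemblance concrete, here is a minimal round trip with the standard `json` module. The dictionary `data` is a made-up example:

```python
import json

# A nested container holding only JSON-compatible values
data = {'a': 1, 'b': [1.5, 'text', True, None]}

# Python object -> JSON string
json_str = json.dumps(data)
print(json_str)

# JSON string -> Python object
restored = json.loads(json_str)
```

Note the small differences in the printed string: JSON writes `true` and `null` where Python writes `True` and `None`.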

# Interacting with the web

## The web protocol
*What is `http` and where is it used?*

- `http` stands for HyperText Transfer Protocol.
- `http` is good for transmitting the data when a webpage is visited:
   - the visiting client sends request for URL or object;
   - the server returns relevant data if active.

## The web protocol (2)
*Should we care about `http`?*

- In this course we don't care explicitly about `http`. 
- We use a Python module called `requests` as an `http` interface.
- However... Some useful advice - you should **always**:
  - use the encrypted version, `https`;
  - use authenticated connection, i.e. private login, whenever possible.

## Markup language (1)
*What is `html` and where is it used?*

- HyperText Markup Language
- `html` is a language for communicating what a webpage looks like and how it behaves.
  - That is, `html` contains: content, design, available actions.

## Markup language (2)
*Should we care about `html`?*

- Yes, `html` is often where the interesting data can be found.
- Sometimes, we are lucky, and instead of `html` we get a JSON in return. 
- Getting data from `html` will be the topic of the subsequent scraping sessions.

## Web APIs (1)
*So when do we get lucky, i.e. when is `html` not important?*

- When we get an Application Programming Interface, i.e. `API`
- What does this mean?
  - We send an API query 
  - We get an API response with data back in return, typically as JSON.

## Web APIs (2)
*So is data free? As in free lunch?*

- Most commercial APIs require authentication and have limited free usage
  - e.g. Google Maps, various weather services
- If no authentication is required the API may be rate limited.
  - This means only a certain number of requests can be handled per second or per hour from a given IP address.
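
One simple way to stay below such a limit is to pause between requests. The sketch below only illustrates the idea: `fetch` is a hypothetical stand-in for a real request, and `pause_seconds` is an assumed value that should be set according to the API's documentation.

```python
import time

def fetch(url):
    """Hypothetical stand-in for a real request, e.g. with the requests module."""
    return 'response for ' + url

urls = ['https://api.github.com/users/abjer/repos',
        'https://api.github.com/users/abjer/followers']

pause_seconds = 0.1  # assumed value, not taken from any API documentation

results = []
for u in urls:
    results.append(fetch(u))
    time.sleep(pause_seconds)  # wait before sending the next query
```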

## Web APIs (3)
*So how do we make the URLs?*

- An `API` query is a URL consisting of:
  - Server URL, e.g. `https://api.github.com`
  - Endpoint path, `/users/abjer/repos`
  - Query parameters (optional), appended after a `?` as `key=value` pairs separated by `&`
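
The three parts can be put together as in the sketch below, which uses `urlencode` from the standard library. The parameters `per_page` and `sort` are assumed examples; check the API's documentation for the ones it actually accepts.

```python
from urllib.parse import urlencode

server_url = 'https://api.github.com'
endpoint_path = '/users/abjer/repos'

# Assumed example parameters, chosen only for illustration
params = {'per_page': 5, 'sort': 'updated'}

# urlencode turns the dict into 'key=value' pairs joined by '&'
query = urlencode(params)
url = server_url + endpoint_path + '?' + query
print(url)
```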

## Web APIs in Python (1)
*How do we make a simple query?*

In [72]:
server_url = 'https://api.github.com/'
endpoint_path = 'users/abjer/repos'
url = server_url + endpoint_path
print(url)

https://api.github.com/users/abjer/repos


## Web APIs in Python (2)
*How can we send a query with the `requests` module?*

In [73]:
import requests # import the module

response = requests.get(url) # submit query with `get` and save response 
response.ok


True

## Web APIs in Python (3)
*How do we extract something from the response?*

In [74]:
print(len(response.text))
print(response.text[:500])

33035
[{"id":111244798,"node_id":"MDEwOlJlcG9zaXRvcnkxMTEyNDQ3OTg=","name":"abjer.github.io","full_name":"abjer/abjer.github.io","owner":{"login":"abjer","id":6363844,"node_id":"MDQ6VXNlcjYzNjM4NDQ=","avatar_url":"https://avatars3.githubusercontent.com/u/6363844?v=4","gravatar_id":"","url":"https://api.github.com/users/abjer","html_url":"https://github.com/abjer","followers_url":"https://api.github.com/users/abjer/followers","following_url":"https://api.github.com/users/abjer/following{/other_user}","


## Web APIs in Python (4)
*Can we get something more meaningful or structured?*

In [75]:
response_json = response.json()
response_json

[{'archive_url': 'https://api.github.com/repos/abjer/abjer.github.io/{archive_format}{/ref}',
  'archived': False,
  'assignees_url': 'https://api.github.com/repos/abjer/abjer.github.io/assignees{/user}',
  'blobs_url': 'https://api.github.com/repos/abjer/abjer.github.io/git/blobs{/sha}',
  'branches_url': 'https://api.github.com/repos/abjer/abjer.github.io/branches{/branch}',
  'clone_url': 'https://github.com/abjer/abjer.github.io.git',
  'collaborators_url': 'https://api.github.com/repos/abjer/abjer.github.io/collaborators{/collaborator}',
  'comments_url': 'https://api.github.com/repos/abjer/abjer.github.io/comments{/number}',
  'commits_url': 'https://api.github.com/repos/abjer/abjer.github.io/commits{/sha}',
  'compare_url': 'https://api.github.com/repos/abjer/abjer.github.io/compare/{base}...{head}',
  'contents_url': 'https://api.github.com/repos/abjer/abjer.github.io/contents/{+path}',
  'contributors_url': 'https://api.github.com/repos/abjer/abjer.github.io/contributors',
  '

## Web APIs in Python (5)
*And how can we see it even more clearly?*

In [76]:
import pprint
pprint.pprint(response.json())

[{'archive_url': 'https://api.github.com/repos/abjer/abjer.github.io/{archive_format}{/ref}',
  'archived': False,
  'assignees_url': 'https://api.github.com/repos/abjer/abjer.github.io/assignees{/user}',
  'blobs_url': 'https://api.github.com/repos/abjer/abjer.github.io/git/blobs{/sha}',
  'branches_url': 'https://api.github.com/repos/abjer/abjer.github.io/branches{/branch}',
  'clone_url': 'https://github.com/abjer/abjer.github.io.git',
  'collaborators_url': 'https://api.github.com/repos/abjer/abjer.github.io/collaborators{/collaborator}',
  'comments_url': 'https://api.github.com/repos/abjer/abjer.github.io/comments{/number}',
  'commits_url': 'https://api.github.com/repos/abjer/abjer.github.io/commits{/sha}',
  'compare_url': 'https://api.github.com/repos/abjer/abjer.github.io/compare/{base}...{head}',
  'contents_url': 'https://api.github.com/repos/abjer/abjer.github.io/contents/{+path}',
  'contributors_url': 'https://api.github.com/repos/abjer/abjer.github.io/contributors',
  '
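
Once parsed, the response is an ordinary list of dictionaries, so the usual indexing applies. The sketch below uses a shortened, partly hypothetical stand-in for the real response (the second repository is invented) so that it runs without a network connection:

```python
# Shortened stand-in for `response.json()`; 'example-repo' is made up
sample_response = [
    {'name': 'abjer.github.io', 'archived': False,
     'owner': {'login': 'abjer'}},
    {'name': 'example-repo', 'archived': True,
     'owner': {'login': 'abjer'}},
]

# Index into the list, then look up keys in the dictionary
first_repo = sample_response[0]
first_name = first_repo['name']
owner_login = first_repo['owner']['login']  # nested dictionary

# Collect one field from every element
repo_names = [repo['name'] for repo in sample_response]

# Keep only repositories that are not archived
active_names = [repo['name'] for repo in sample_response
                if not repo['archived']]
```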

# Loading and saving files

How to do input-output (IO) operations in Python 

## Text files
*How can we save a string as a text file?*

In [61]:
my_str = '\nThis is important...'

with open('my_file.txt', 'a') as f:
    f.write(my_str)

*How can we load a string from a text file?*

In [62]:
with open('my_file.txt', 'r') as f:    
    my_str_load = f.read()
print(my_str_load)

This is important...
This is important...
This is important...
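
The repeated lines in the output come from the mode `'a'`, which appends to the file each time the cell is run. The sketch below contrasts it with `'w'`, which overwrites. It writes to a temporary folder, and the file name `demo.txt` is an assumption for the example:

```python
import os
import tempfile

# Work in a temporary folder so no existing file is touched
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')

with open(path, 'w') as f:   # 'w' creates the file or overwrites it
    f.write('first')

with open(path, 'a') as f:   # 'a' adds to the end of the file
    f.write('\nsecond')

with open(path, 'r') as f:
    after_append = f.read()

with open(path, 'w') as f:   # overwriting discards the old content
    f.write('third')

with open(path, 'r') as f:
    after_overwrite = f.read()

print(after_append)
print(after_overwrite)
```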


## JSON files
*How can we save a JSON file?*

The trick is to convert the JSON object (i.e. the list or dict) to a string. This can be done with `dumps` in the module `json`:

In [64]:
import json

with open('my_file.json', 'w') as f:
    response_json_str = json.dumps(response_json)
    f.write(response_json_str)

We can convert a JSON string back to a list or dict with `loads`.
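
A minimal round trip might look as follows. The sketch uses `json.dump` and `json.load`, which write to and read from the file object directly, alongside `dumps`/`loads`. The list `data` and the file name `demo.json` are assumptions for the example:

```python
import json
import os
import tempfile

# Small made-up container standing in for a parsed API response
data = [{'name': 'abjer.github.io', 'archived': False}]

path = os.path.join(tempfile.mkdtemp(), 'demo.json')

# Save: dump writes the JSON string to the file in one step
with open(path, 'w') as f:
    json.dump(data, f)

# Load: load reads the file and parses it back into Python objects
with open(path, 'r') as f:
    loaded = json.load(f)

# The string-based functions give the same result
from_string = json.loads(json.dumps(data))
```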

## File handling
*How can we remove a file?*

The module `os` can do a lot of file handling tasks:

In [68]:
import os
# os.remove('my_file.json')  # uncomment to delete the file
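
A sketch of a few common file handling tasks with `os`. It works on a temporary file so that nothing of value is deleted; the file name `demo.txt` is made up:

```python
import os
import tempfile

# Create a temporary folder with one file in it
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')
with open(path, 'w') as f:
    f.write('temporary')

existed_before = os.path.exists(path)   # check whether the file exists
files_before = os.listdir(tmp_dir)      # list the folder's content

# Remove the file only if it is there; os.remove raises an error otherwise
if os.path.exists(path):
    os.remove(path)

existed_after = os.path.exists(path)
os.rmdir(tmp_dir)                       # remove the now empty folder
```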

# Exam projects
- You decide
- Show us tools you have used in this course
- Check out [the exam post](https://abjer.github.io/sds/post/exam/) and the [practical info](https://abjer.github.io/sds/page/practical/)

# The end
[Return to agenda](#Agenda)