## Week 1

## API introduction

This first section is an example of using an API in Python.

Taken from the course material.

In [1]:
# Python package imports
import urllib.request
import json

In [2]:
# Lecture example
url = 'http://example.com/'
response = urllib.request.urlopen(url)
data = response.read()      # a `bytes` object
text = data.decode('utf-8')

In [3]:
baseurl = "https://en.wikipedia.org/w/api.php?"
action = "action=query"
title = "titles=Rembrandt"
content = "prop=revisions&rvprop=content"
dataformat ="format=json"

query = "{}{}&{}&{}&{}".format(baseurl, action, content, title, dataformat)
print(query)

https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&titles=Rembrandt&format=json


In [4]:
wikiresponse = urllib.request.urlopen(query)
wikidata = wikiresponse.read()
wikitext = wikidata.decode('utf-8')

In [5]:
%%capture 
# The line above suppresses the output of the cell
json.loads(wikitext)

### Exercises Part 1

#### 1.1 Explain in your own words: What is the difference between the html page and the wiki-source?

#### _HTML page_
The HTML page is the rendered representation of the article. HTML defines the placement and interaction of text and media on a web page, and this formatting makes the content easier to read in a browser. For text analysis, however, the HTML/CSS markup is noise: we only want the textual content.

#### _Wikisource_
The wiki-source (wikitext) is the article's underlying content: the markup that is rendered into the HTML page. It is more compact and more consistently structured than the HTML, which makes the text easier and cleaner to retrieve.
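
To make the contrast concrete, here is a minimal sketch using two short hand-written snippets (illustrative only, not real Wikipedia responses) that carry the same sentence, once as HTML and once as wikitext. `TextExtractor` is a hypothetical helper built on the standard library's `html.parser.HTMLParser`; it shows that the HTML needs a parsing step before the text is usable, while the wikitext is already close to plain text.

```python
from html.parser import HTMLParser

# Hand-written samples for illustration only; not actual Wikipedia output.
html_snippet = (
    '<p><b>Cosmic latte</b> is the average color of the '
    '<a href="/wiki/Universe">universe</a>.</p>'
)
wiki_snippet = "'''Cosmic latte''' is the average color of the [[universe]]."


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document, dropping all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called by the parser for every run of text between tags.
        self.parts.append(data)


extractor = TextExtractor()
extractor.feed(html_snippet)
extractor.close()
plain_text = "".join(extractor.parts)

print(plain_text)    # text recovered from the HTML
print(wiki_snippet)  # the wikitext needs no tag stripping
```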

#### 1.2 What are the various parameters you can set for a query of the Wikipedia API?

Wikipedia provides users with documentation for using its API.

One of the available actions is **query**.

How to use queries, and which parameters they support, is described in the following documentation pages:

https://www.mediawiki.org/wiki/API:Main_page

https://www.mediawiki.org/w/api.php?action=help&modules=query

https://www.mediawiki.org/wiki/API:Query
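
As a small illustration of how such parameters end up in a URL, the sketch below builds the same kind of query as the lecture example, but from a dictionary passed to the standard library's `urllib.parse.urlencode`. The parameter names are the ones already used in this notebook; the title `Alex (parrot)` is chosen here only because it contains characters that need escaping.

```python
from urllib.parse import urlencode

baseurl = "https://en.wikipedia.org/w/api.php"

# Same parameters as in the lecture example, kept as key/value pairs.
params = {
    "action": "query",       # which API action to run
    "prop": "revisions",     # which page property to fetch
    "rvprop": "content",     # which part of the revision to return
    "titles": "Alex (parrot)",
    "format": "json",        # output format
}

# urlencode joins the pairs with '&' and escapes spaces and parentheses.
query = "{}?{}".format(baseurl, urlencode(params))
print(query)
```

Compared with formatting the string by hand, this keeps each parameter on its own line and avoids having to escape special characters manually.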

#### 1.3 Write your own little notebook to download Wikipedia pages based on the video above. Download the source for your 4 favorite Wikipedia pages.

##### 1.3.1

Send and receive HTTP request/response via URL.

Then read the response content and print it.

In [6]:
url = 'https://en.wikipedia.org/wiki/Cosmic_latte'
response = urllib.request.urlopen(url)
data = response.read()
text = data.decode('utf-8')
print(text)

<!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-1 vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vector-toc-available" lang="en" dir="ltr">
<head>
<meta charset="UTF-8">
<title>Cosmic latte - Wikipedia</title>
<script>(function(){var className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feat

Create query based on Wikipedia's API parameters.

In [7]:
baseurl = "https://en.wikipedia.org/w/api.php?" # Specify endpoint - English Wikipedia APIs
action = "action=query" # Specify action - query
title = "titles=Cosmic_latte" # query-specific parameter - titles
content = "prop=revisions&rvprop=content" # query-specific properties for page and revision
dataformat ="format=json" # Specify output format of data - JSON

# Format the url string based on convention, i.e. '&'
query = "{}{}&{}&{}&{}".format(baseurl, action, content, title, dataformat)
print(query)

https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&titles=Cosmic_latte&format=json


Use the query string to send an HTTP request via URL.

Load the JSON response and print it.

In [8]:
wiki_response = urllib.request.urlopen(query)
wiki_data = wiki_response.read()
wiki_text = wiki_data.decode('utf-8')

# Print JSON response
json_response = json.loads(wiki_text)
json_response

{'batchcomplete': '',
  'revisions': {'*': 'Because "rvslots" was not specified, a legacy format has been used for the output. This format is deprecated, and in the future the new format will always be used.'}},
 'query': {'normalized': [{'from': 'Cosmic_latte', 'to': 'Cosmic latte'}],
  'pages': {'1643492': {'pageid': 1643492,
    'ns': 0,
    'title': 'Cosmic latte',
    'revisions': [{'contentformat': 'text/x-wiki',
      'contentmodel': 'wikitext',
      '*': '{{Short description|Average color of the universe}}\n{{Use dmy dates|date=September 2016}}\n{{infobox color\n| title=Cosmic latte\n| hex=FFF8E7\n| source=[https://web.archive.org/web/20060104173304/http://www.pha.jhu.edu:80/~kgb/cosspec/ JHU]\n| variations=yes\n| variationstitle=<hr><span style="font-weight:normal !important">Due to flawed calculations, the average color of the universe was originally thought to be turquoise.<ref name="Wired 2002" /></span>\n| variation1=Cosmic spectrum green\n| variation1color=#9CFFCE\n| isc

The JSON response is parsed into a Python _dictionary_ with nested fields that we can walk through. If we are interested in specific _keys_, we can pull them out of the object.

In [9]:
json_response.keys()



We can then traverse the JSON via its keys:

In [10]:
json_response['query'].keys()

dict_keys(['normalized', 'pages'])

In [11]:
json_response['query']['pages'].keys()

dict_keys(['1643492'])

In [12]:
json_response['query']['pages']['1643492'].keys()

dict_keys(['pageid', 'ns', 'title', 'revisions'])

In [13]:
json_response['query']['pages']['1643492']['revisions'].keys()

AttributeError: 'list' object has no attribute 'keys'

In [None]:
# Check datatype of python object
type(json_response['query']['pages']['1643492']['revisions'])

Note that the value stored under the key 'revisions' is a list, not a dictionary, so it has no keys of its own. To traverse a list we use indexing instead.
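
The dict-versus-list distinction can also be checked for a whole response at once rather than one level at a time. The sketch below is one possible way to do that: `describe` is a hypothetical helper (not part of `json`), and `sample` is a shortened, hand-written stand-in with the same nesting as the response in this notebook.

```python
def describe(obj, depth=0):
    """Return one line per key, showing the type of each nested value."""
    indent = "  " * depth
    lines = []
    if isinstance(obj, dict):
        # Dictionaries are traversed by key.
        for key, value in obj.items():
            lines.append(f"{indent}{key!r}: {type(value).__name__}")
            lines.extend(describe(value, depth + 1))
    elif isinstance(obj, list) and obj:
        # Lists are traversed by index; only the first item is inspected.
        lines.append(f"{indent}[0]: {type(obj[0]).__name__}")
        lines.extend(describe(obj[0], depth + 1))
    return lines


# Hand-written stand-in for a parsed API response (content shortened).
sample = {
    "query": {
        "pages": {
            "1643492": {
                "title": "Cosmic latte",
                "revisions": [{"contentmodel": "wikitext", "*": "(wikitext here)"}],
            }
        }
    }
}

print("\n".join(describe(sample)))
```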

In [14]:
json_response['query']['pages']['1643492']['revisions'][0].keys()

dict_keys(['contentformat', 'contentmodel', '*'])

In [15]:
type(json_response['query']['pages']['1643492']['revisions'][0]['*'])

str

In [16]:
json_response['query']['pages']['1643492']['revisions'][0]['*']

'{{Short description|Average color of the universe}}\n{{Use dmy dates|date=September 2016}}\n{{infobox color\n| title=Cosmic latte\n| hex=FFF8E7\n| source=[https://web.archive.org/web/20060104173304/http://www.pha.jhu.edu:80/~kgb/cosspec/ JHU]\n| variations=yes\n| variationstitle=<hr><span style="font-weight:normal !important">Due to flawed calculations, the average color of the universe was originally thought to be turquoise.<ref name="Wired 2002" /></span>\n| variation1=Cosmic spectrum green\n| variation1color=#9CFFCE\n| isccname=Pale yellow green\n}}\n\n\'\'\'Cosmic latte\'\'\' is the average color of the galaxies of the [[universe]] as perceived from the Earth, found by a team of [[astronomer]]s from [[Johns Hopkins University]] (JHU). In 2002, [[Karl Glazebrook]] and Ivan Baldry determined that the average color of the universe was a [[green]]ish white, but they soon corrected their analysis in a 2003 paper in which they reported that their survey of the light from over 200,000 [[
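
The traversal above hard-codes the page ID `'1643492'`, which is different for every article. A minimal sketch of a reusable alternative follows; `extract_wikitext` is a hypothetical helper, it assumes the legacy response layout shown in this notebook, and `sample_response` is a shortened hand-written stand-in for a real response.

```python
def extract_wikitext(api_response):
    """Return the wikitext of the first page in a legacy-format query response."""
    pages = api_response["query"]["pages"]
    # `pages` is keyed by page ID, so take the first value
    # instead of hard-coding the ID.
    page = next(iter(pages.values()))
    # 'revisions' is a list; its first item holds the content under '*'.
    return page["revisions"][0]["*"]


# Hand-written stand-in for a real response (content shortened).
sample_response = {
    "query": {
        "pages": {
            "1643492": {
                "pageid": 1643492,
                "title": "Cosmic latte",
                "revisions": [
                    {"contentmodel": "wikitext", "*": "'''Cosmic latte''' is ..."}
                ],
            }
        }
    }
}

print(extract_wikitext(sample_response))
```

With a real response this would be called as `extract_wikitext(json_response)`. A page without a `revisions` entry would raise a `KeyError` here, so a more careful script would want to handle that case.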

#### Julia

Send and receive HTTP request/response via URL.

Then read the response content and print it.

In [None]:
url = 'https://en.wikipedia.org/wiki/Clever_Hans'
response = urllib.request.urlopen(url)
data = response.read()      # a `bytes` object
text = data.decode('utf-8')

#print(text)

In [None]:
baseurl = "https://en.wikipedia.org/w/api.php?"
action = "action=query"
title = "titles=Clever_Hans"
content = "prop=revisions&rvprop=content"
dataformat ="format=json"

query = "{}{}&{}&{}&{}".format(baseurl, action, content, title, dataformat)
print(query)

In [None]:
wikiresponse = urllib.request.urlopen(query)
wikidata = wikiresponse.read()
wikitext = wikidata.decode('utf-8')

In [None]:
json.loads(wikitext)

In [None]:
#len(json.loads(wikitext)["query"]["pages"]["456590"]["revisions"])

json.loads(wikitext)["query"]["pages"]["456590"]["revisions"][0]["*"]


In [None]:
for page_title in ["Batyr", "Alex_(parrot)", "Viki_(chimpanzee)"]:
    baseurl = "https://en.wikipedia.org/w/api.php?"
    action = "action=query"
    # Use a separate name so the loop variable is not overwritten
    title = f"titles={page_title}"
    content = "prop=revisions&rvprop=content"
    dataformat = "format=json"

    query = "{}{}&{}&{}&{}".format(baseurl, action, content, title, dataformat)
    wikiresponse = urllib.request.urlopen(query)
    wikidata = wikiresponse.read()
    wikitext = wikidata.decode('utf-8')
    print(json.loads(wikitext))
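
An alternative to sending one request per page: as far as the MediaWiki documentation describes it, the `titles` parameter accepts several titles separated by `|`, so the three pages can be requested in a single query. The sketch below only builds that URL and shows how the combined response could be split per page. `wikitext_by_title` is a hypothetical helper, and `sample_response` is hand-written with placeholder page IDs and an assumed shape for a missing page, so it should be checked against a real response.

```python
from urllib.parse import urlencode

titles = ["Batyr", "Alex_(parrot)", "Viki_(chimpanzee)"]

params = {
    "action": "query",
    "prop": "revisions",
    "rvprop": "content",
    "titles": "|".join(titles),  # several titles in one request
    "format": "json",
}
batch_query = "https://en.wikipedia.org/w/api.php?" + urlencode(params)
print(batch_query)


def wikitext_by_title(api_response):
    """Map each page title in a legacy-format response to its wikitext."""
    result = {}
    for page in api_response["query"]["pages"].values():
        revisions = page.get("revisions")
        if revisions:  # skip pages that came back without content
            result[page["title"]] = revisions[0]["*"]
    return result


# Hand-written stand-in; page IDs and texts are placeholders.
sample_response = {
    "query": {
        "pages": {
            "101": {"title": "Batyr", "revisions": [{"*": "text about Batyr"}]},
            "102": {"title": "Alex (parrot)", "revisions": [{"*": "text about Alex"}]},
            "-1": {"title": "No such page", "missing": ""},
        }
    }
}

print(wikitext_by_title(sample_response))
```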

#### Clara

In [None]:
url = 'https://en.wikipedia.org/wiki/Wow!_signal'
response = urllib.request.urlopen(url)
data = response.read()
text = data.decode('utf-8')

In [None]:
baseurl = "https://en.wikipedia.org/w/api.php?" # Specify endpoint - English Wikipedia API
action = "action=query" # Specify action - query
title = "titles=Wow!_signal" # query-specific parameter - titles
content = "prop=revisions&rvprop=content" # query-specific properties for page and revision
dataformat ="format=json" # Specify output format of data - JSON

query = "{}{}&{}&{}&{}".format(baseurl, action, content, title, dataformat)
print(query)

### Exercises Part 2

#### List three different real networks and state the nodes and links for each of them.

In [None]:
print("J")

In [None]:
print("C")

1. A family tree is a form of network in which the nodes are the family members (for instance a parent and a child) and the link is the relationship between the two. The bigger the family, the bigger the network.

2. Network protocols form a network: the protocols are the nodes and the transmissions between them are the links.
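
As a small illustration of the nodes-and-links idea, the sketch below stores a family tree as a list of links and derives the nodes and each node's degree from it. The names are hypothetical, and plain Python is used here rather than a graph library.

```python
# Hypothetical family used purely for illustration.
links = [
    ("Anna", "Ben"),   # Anna is a parent of Ben
    ("Anna", "Carl"),  # Anna is a parent of Carl
    ("Ben", "Dan"),    # Ben is a parent of Dan
]

# Nodes are every person that appears in at least one link.
nodes = sorted({person for link in links for person in link})

# Adjacency list: for each node, the set of nodes it is linked to.
# The links are treated as undirected here.
neighbors = {node: set() for node in nodes}
for parent, child in links:
    neighbors[parent].add(child)
    neighbors[child].add(parent)

# The degree of a node is its number of links.
degree = {node: len(adjacent) for node, adjacent in neighbors.items()}

print(nodes)
print(degree)
```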

In [None]:
print("A")

#### Tell us of the network you are personally most interested in (a fourth one). Address the following questions:

#### In your view what would be the area where network science could have the biggest impact in the next decade? Explain your answer - and base it on the text in the book.

### Exercises Part 3

#### Go to the NetworkX project's tutorial page. The goal of this exercise is to create your own Notebook that contains the entire tutorial. You're free to add your own (e.g. shorter) comments in place of the ones in the official tutorial - and change the code to make it your own wherever it makes sense.
