# Obtaining, parsing and structuring static HTML websites

In this notebook we will learn how to scrape basic static, i.e. non-interactive, HTML-based websites. We will
- obtain the raw HTML content using the `requests` module
- convert the raw HTML into a format that is easier to search, or parse, using the `BeautifulSoup` module
- learn how to identify the elements of interest in the raw HTML using the browser's inspect functionality and the CSS SelectorGadget
- construct a table, or dataframe, with the popular data analysis library `pandas` and store the output locally in a standard spreadsheet format

1. Open the Anaconda Prompt and install the module `requests`

In [2]:
import requests

In [3]:
seed = 'https://www.uni-potsdam.de/de/'

2. What data type is the object `seed`? How can you check?

In [4]:
type(seed)

str

3. Are you allowed to scrape this URL? Hint: Check the site's `robots.txt`
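
A minimal sketch of such a check, using `urllib.robotparser` from Python's standard library. The rules in `sample_rules` and the `example.org` URLs are made up for illustration and are not the university's actual `robots.txt`. To check a real site, `RobotFileParser.set_url()` followed by `read()` would download and parse the file instead.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, user_agent="*"):
    """Return True if `user_agent` may fetch `url` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules that are already in memory
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, for illustration only
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

allowed_home = is_allowed(sample_rules, "https://www.example.org/de/")
allowed_private = is_allowed(sample_rules, "https://www.example.org/private/page.html")
print(allowed_home, allowed_private)
```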


4. Was the request successful? How can you check the status? Hint: Check the available methods by using Jupyter's auto-complete functionality, i.e. type a dot at the end of the object you're investigating followed by <kbd>Tab</kbd>

In [5]:
html = requests.get(seed)
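
One possible way to answer the question, sketched under the assumption that you only need a yes/no answer plus the numeric code. The responses are built by hand so that the example runs without a network connection; `describe_status` is a hypothetical helper, not part of `requests`. On the real `html` object you would inspect `html.status_code` or `html.ok` directly.

```python
import requests


def describe_status(response):
    """Summarise whether a response indicates success."""
    try:
        response.raise_for_status()  # raises HTTPError for 4xx and 5xx codes
    except requests.HTTPError:
        return f"failed with status {response.status_code}"
    return f"succeeded with status {response.status_code}"


# Hand-built responses, for illustration only
ok_response = requests.Response()
ok_response.status_code = 200

missing_response = requests.Response()
missing_response.status_code = 404

print(describe_status(ok_response))
print(describe_status(missing_response))
```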

5. Which attribute is most informative with respect to the actual content? How many characters long is the raw HTML file?

In [6]:
len(html.text)

64905

6. Display the first 518 characters of the `html` object.

In [7]:
html.text[:518]

'<!DOCTYPE html><html dir="ltr" lang="de-DE"><head><meta charset="utf-8"><!-- benaja - web solutions (www.benaja-websolutions.com) Markus Meier, Roland Brandt und Tobias Gaertner GbR This website is powered by TYPO3 - inspiring people to share! TYPO3 is a free open source Content Management Framework initially created by Kasper Skaarhoj and licensed under GNU/GPL. TYPO3 is copyright 1998-2021 of Kasper Skaarhoj. Extensions are copyright of their respective owners. Information and contribution at https://typo3.org/'

7. Display meta information about the HTTP response, e.g. its date. Note that you can specify the `User-Agent` header that the server receives; the server may use it to return a representation of the website that is optimised for the client, e.g. desktop vs. mobile. If you do not specify it, `requests` sends its default value (`python-requests/<version>`), which identifies your request as coming from a script. A web browser, by contrast, typically transmits much more, e.g. details about your operating system and preferred language, and the server always sees your IP address.

In [14]:
html.headers

{'Date': 'Thu, 15 Apr 2021 12:25:44 GMT', 'Server': 'Apache/2.4.29 (Ubuntu)', 'Vary': 'Accept-Encoding', 'Last-Modified': 'Thu, 15 Apr 2021 12:25:33 GMT', 'Accept-Ranges': 'bytes', 'Content-Length': '11841', 'Cache-Control': 'max-age=0', 'Expires': 'Thu, 15 Apr 2021 12:25:44 GMT', 'X-UA-Compatible': 'IE=edge', 'X-Content-Type-Options': 'nosniff', 'Content-Encoding': 'gzip', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive', 'Content-Type': 'text/html; charset=utf-8'}
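
The following sketch shows how a custom `User-Agent` could be attached to a request. The identification string and the e-mail address are placeholders, not values from this course. The request is only prepared, not sent, so the outgoing headers can be inspected offline.

```python
import requests

# Hypothetical identification string; replace with your own project and contact details
custom_headers = {"User-Agent": "uni-course-scraper/0.1 (contact: me@example.org)"}

# Build the request without sending it, so the outgoing headers can be inspected
prepared = requests.Request(
    "GET", "https://www.uni-potsdam.de/de/", headers=custom_headers
).prepare()
print(prepared.headers["User-Agent"])

# Default User-Agent that `requests` sends when none is specified
default_agent = requests.utils.default_user_agent()
print(default_agent)
```

To actually send the request with these headers, pass them along: `requests.get(seed, headers=custom_headers)`.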

The cell below saves the HTML object's text attribute in HTML format locally.

In [15]:
with open('Uni_Potsdam.html', 'w', encoding = 'utf-8') as f:
    f.write(html.text)

8. Install the module `BeautifulSoup` via `pip install beautifulsoup4`

In [8]:
from bs4 import BeautifulSoup

In [9]:
soup = BeautifulSoup(html.text, "html.parser")

9. Search the BeautifulSoup object `soup` for all hyperlinks. Hint: In an HTML document, hyperlinks to other pages are marked up as `a` (anchor) elements of the form `<a href="..." ...>text</a>`. Hint: Use `soup`'s method `find_all()`, passing the elements' tag name as the input argument. What object type is the output? Can you iterate over it? How many `a` elements are contained in the HTML file?

In [10]:
soup.find_all('a')

[<a href="https://www.uni-potsdam.de/de/up-entdecken/" target="_top">Übersicht</a>,
 <a href="https://www.uni-potsdam.de/de/up-entdecken/upaktuell/uebersicht" target="_top">Aktuelle Themen</a>,
 <a href="https://www.uni-potsdam.de/de/up-entdecken/upkompakt/uebersicht" target="_top">UP kompakt</a>,
 <a href="https://www.uni-potsdam.de/de/up-entdecken/up-vor-ort/uebersicht" target="_top">UP vor Ort</a>,
 <a href="https://www.uni-potsdam.de/de/up-entdecken/up-erleben/uebersicht" target="_top">UP erleben</a>,
 <a href="https://www.uni-potsdam.de/de/up-entdecken/up-im-portraet/uebersicht" target="_top">UP im Porträt</a>,
 <a href="/de/fakultaeten/uebersicht">Übersicht</a>,
 <a href="https://www.uni-potsdam.de/de/jura/" target="_top">Juristische Fakultät</a>,
 <a href="https://www.uni-potsdam.de/de/philfak/" target="_top">Philosophische Fakultät</a>,
 <a href="https://www.uni-potsdam.de/de/humfak/" target="_top">Humanwissenschaftliche Fakultät</a>,
 <a href="https://www.uni-potsdam.de/de/wis
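
To explore the questions about type and iteration without the long output, here is a small sketch on a made-up HTML snippet (`sample_html` is not taken from the university's page). It suggests that `find_all()` returns a `ResultSet`, which behaves like a regular Python list.

```python
from bs4 import BeautifulSoup

# Made-up miniature document, for illustration only
sample_html = '<p><a href="https://example.org/a">A</a> <a href="/b">B</a></p>'
sample_soup = BeautifulSoup(sample_html, "html.parser")

anchors = sample_soup.find_all("a")

print(type(anchors).__name__)     # name of the returned container class
print(isinstance(anchors, list))  # a ResultSet is a subclass of list
print(len(anchors))               # number of <a> elements found

# Iterating works exactly as with any other list
hrefs = [anchor.get("href") for anchor in anchors]
print(hrefs)
```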

10. Convert the BeautifulSoup object into a "plain" Python list object containing the elements' **text** attributes by iterating over it. Hint: Instantiate an empty `list` object, write a for-loop and `append` each element's text to the list object. You may also remove any unwanted whitespace by using the string method `strip`.

In [30]:
soup.find_all('a')[0].text

'Übersicht'

In [29]:
link_list = []

for link in soup.find_all('a'):
    link_list.append(link.text.strip())

link_list

['Übersicht',
 'Aktuelle Themen',
 'UP kompakt',
 'UP vor Ort',
 'UP erleben',
 'UP im Porträt',
 'Übersicht',
 'Juristische Fakultät',
 'Philosophische Fakultät',
 'Humanwissenschaftliche Fakultät',
 'Wirtschafts- und Sozialwissenschaftliche Fakultät',
 'Mathematisch-Naturwissenschaftliche Fakultät',
 'Digital Engineering Fakultät',
 'Fakultät für Gesundheitswissenschaften',
 'Übersicht',
 'Organe und Gremien',
 'Universitätsleitung und Verwaltung',
 'Zentrale und wissenschaftliche Einrichtungen',
 'Bibliothek',
 'Weitere Einrichtungen',
 'Übersicht',
 'Profil International',
 'Service an der UP',
 'Ins Ausland',
 'Aus dem Ausland',
 'Projekte International',
 'Übersicht',
 'Partnerkreis Industrie und Wirtschaft',
 'Services für Unternehmen',
 'Gründung und Transfer',
 'Fördern und Stiften',
 'Weiterbildung',
 'English',
 'Studieren an der UP',
 'Studienangebot',
 'Bewerbung und Immatrikulation',
 'Studium konkret',
 'Beratungs- und Serviceeinrichtungen',
 'Termine und Fristen',
 'For

#### Pro tip
Instead of explicitly writing a for-loop to extract specific values from a collection of objects, you can use Python's built-in `map` function together with a `lambda` expression as a one-liner.

In [11]:
results_list = list(map(lambda x: x.text.strip(), soup.find_all('a')))
results_list

['Übersicht',
 'Aktuelle Themen',
 'UP kompakt',
 'UP vor Ort',
 'UP erleben',
 'UP im Porträt',
 'Übersicht',
 'Juristische Fakultät',
 'Philosophische Fakultät',
 'Humanwissenschaftliche Fakultät',
 'Wirtschafts- und Sozialwissenschaftliche Fakultät',
 'Mathematisch-Naturwissenschaftliche Fakultät',
 'Digital Engineering Fakultät',
 'Fakultät für Gesundheitswissenschaften',
 'Übersicht',
 'Organe und Gremien',
 'Universitätsleitung und Verwaltung',
 'Zentrale und wissenschaftliche Einrichtungen',
 'Bibliothek',
 'Weitere Einrichtungen',
 'Übersicht',
 'Profil International',
 'Service an der UP',
 'Ins Ausland',
 'Aus dem Ausland',
 'Projekte International',
 'Übersicht',
 'Partnerkreis Industrie und Wirtschaft',
 'Services für Unternehmen',
 'Gründung und Transfer',
 'Fördern und Stiften',
 'Weiterbildung',
 'English',
 'Studieren an der UP',
 'Studienangebot',
 'Bewerbung und Immatrikulation',
 'Studium konkret',
 'Beratungs- und Serviceeinrichtungen',
 'Termine und Fristen',
 'For
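
A list comprehension is a common alternative to `map` with a `lambda`. The sketch below compares both on a made-up snippet (`sample_soup` is illustrative only); which form reads better is largely a matter of taste.

```python
from bs4 import BeautifulSoup

# Made-up miniature document with untidy whitespace around the link texts
sample_soup = BeautifulSoup(
    '<a href="/x"> First </a><a href="/y">Second\n</a>', "html.parser"
)

# Variant 1: map with a lambda expression, as in the cell above
via_map = list(map(lambda x: x.text.strip(), sample_soup.find_all("a")))

# Variant 2: the equivalent list comprehension
via_comprehension = [a.text.strip() for a in sample_soup.find_all("a")]

print(via_map)
print(via_comprehension)
```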

11. Identify the element whose text attribute's value is equal to "alle Artikel". Return the element's position (`index`) within the list.

In [12]:
index = results_list.index('alle Artikel')
index

220
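
Note that `list.index` returns only the first match and raises a `ValueError` when the text is absent, which can happen if the page layout changes. A minimal sketch of a more forgiving lookup follows; `find_index` and the sample list are hypothetical.

```python
def find_index(items, target):
    """Return the position of the first exact match, or None if there is none."""
    try:
        return items.index(target)
    except ValueError:
        return None


# Made-up link texts, for illustration only
sample_texts = ["Übersicht", "Studienangebot", "alle Artikel", "alle Artikel"]

first_match = find_index(sample_texts, "alle Artikel")
no_match = find_index(sample_texts, "does not exist")
print(first_match, no_match)
```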

12. Obtain the value of this element's `href` attribute. It should be a URL pointing to the page where the news of Universität Potsdam is collected.

In [13]:
new_seed = soup.find_all('a')[index].get('href')
print(new_seed)

https://www.uni-potsdam.de/nachrichten.html


13. Write a function which takes a string (e.g. a URL) as input and returns a readily parsable `BeautifulSoup` object.

In [14]:
def URL_to_BS(url):
    html = requests.get(url)
    soup = BeautifulSoup(html.text, 'html.parser')
    return soup

In [15]:
news_soup = URL_to_BS(new_seed)

In [16]:
news_soup

<!DOCTYPE html>
<html dir="ltr" lang="de-DE"><head><meta charset="utf-8"/><!-- benaja - web solutions (www.benaja-websolutions.com) Markus Meier, Roland Brandt und Tobias Gaertner GbR This website is powered by TYPO3 - inspiring people to share! TYPO3 is a free open source Content Management Framework initially created by Kasper Skaarhoj and licensed under GNU/GPL. TYPO3 is copyright 1998-2021 of Kasper Skaarhoj. Extensions are copyright of their respective owners. Information and contribution at https://typo3.org/ --><meta content="IE=edge" http-equiv="x-ua-compatible"><meta content="TYPO3 CMS" name="generator"/><meta content="width=device-width, initial-scale=1.0" name="viewport"/><meta content="Silvana Grabowski" name="author"/><meta content="Neues aus der UP" property="og:title"/><meta content="summary" name="twitter:card"/><link href="/typo3temp/assets/compressed/merged-fc201a8dd7df23f7869e0d7219366c48-min.css.gzip?1583314121" media="print" rel="stylesheet" type="text/css"/><link 

14. Open the `new_seed` URL in your browser and enable the CSS SelectorGadget. Click the box containing the first article; the other, similar boxes should be highlighted as well. Copy the identified CSS selector and search the `news_soup` object again, this time for the elements matching that selector (use `.select()` instead of `find_all()`). Store the matching elements in a list. You can achieve all of this in one line of code. How many items does this list contain?

In [17]:
css_element = '.up-news-list-item'

In [18]:
news_soup.select(css_element)

[<div class="up-news-list-item" itemscope="itemscope" itemtype="http://schema.org/Article"><div class="up-news-list-item-image"><a href="/de/nachrichten/detail/2021-04-15-studio-days-2021-digitale-studienorientierungswoche-der-staatlichen-brandenburgischen" title="Studi’O Days 2021 – digitale Studienorientierungswoche der staatlichen brandenburgischen Hochschulen"><div class="news-img-wrap"><a href="/de/nachrichten/detail/2021-04-15-studio-days-2021-digitale-studienorientierungswoche-der-staatlichen-brandenburgischen" title="Studi’O Days 2021 – digitale Studienorientierungswoche der staatlichen brandenburgischen Hochschulen"><img alt="Matthias Friel, Netzwerk Studienorientierung Brandenburg" height="140" src="/fileadmin/_processed_/c/2/csm_2021-04_Cover_StudiODays_PM_ce48bf3905.jpg" width="270"/></a></div></a></div><div class="up-news-list-item-text"><a href="/de/nachrichten/detail/2021-04-15-studio-days-2021-digitale-studienorientierungswoche-der-staatlichen-brandenburgischen" title="

In [19]:
news_list = list(news_soup.select(css_element))
len(news_list)

10

15. Split the list's elements into their hyperlinks (`href`) and text attributes' values.

In [20]:
news_list[0].findChild('a')['href']

'/de/nachrichten/detail/2021-04-15-studio-days-2021-digitale-studienorientierungswoche-der-staatlichen-brandenburgischen'

In [21]:
news_list[0].findChild('a')['title']

'Studi’O Days 2021 – digitale Studienorientierungswoche der staatlichen brandenburgischen Hochschulen'

In [22]:
link_list = []
title_list = []

for link_num in range(len(news_list)):
    sub_link  = news_list[link_num].findChild('a')['href']
    sub_title = news_list[link_num].findChild('a')['title']
    
    if type(sub_link) is str and 'www' not in sub_link:
        link_list.append('https://www.uni-potsdam.de' + sub_link)
        title_list.append(sub_title)
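
The `'www' not in sub_link` test is a heuristic for spotting relative links. As an alternative sketch, `urljoin` from the standard library resolves relative links against a base URL and leaves absolute links untouched. Note the difference in behaviour: the loop above skips absolute links, whereas this variant keeps them. The first path in `sample_hrefs` is a made-up example.

```python
from urllib.parse import urljoin

base_url = "https://www.uni-potsdam.de"

# One relative (hypothetical) and one absolute link
sample_hrefs = [
    "/de/nachrichten/detail/example-article",
    "https://www.uni-potsdam.de/de/up-entdecken/",
]

# Relative links are completed, absolute links are returned unchanged
absolute_links = [urljoin(base_url, href) for href in sample_hrefs]
print(absolute_links)
```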

In [23]:
link_list

['https://www.uni-potsdam.de/de/nachrichten/detail/2021-04-15-studio-days-2021-digitale-studienorientierungswoche-der-staatlichen-brandenburgischen',
 'https://www.uni-potsdam.de/de/nachrichten/detail/2021-04-15-hoppla-jetzt-kommt-koppla-die-revolution-fuers-handwerk',
 'https://www.uni-potsdam.de/de/nachrichten/detail/2021-04-13-science-fiction-experimentelle-kooperation-von-germanistik-und-food4future',
 'https://www.uni-potsdam.de/de/nachrichten/detail/2021-04-13-wir-waren-exotisch-der-linguist-gisbert-fanselow-und-der-psychologe-reinhold-klie',
 'https://www.uni-potsdam.de/de/nachrichten/detail/2021-04-12-eine-bruecke-in-den-arbeitsmarkt-warum-betriebswirtin-kristina-nistor-noch-einmal-zur-uni-g',
 'https://www.uni-potsdam.de/de/nachrichten/detail/2021-04-08-die-klima-uhr-tickt-der-wirtschaftswissenschaftler-matthias-kalkuhl-erforscht-wie-die-k',
 'https://www.uni-potsdam.de/de/nachrichten/detail/2021-04-06-viele-schluessel-zum-erfolg-wie-sich-das-zessko-vom-sprachen-zum-kompetenzz

In [24]:
title_list

['Studi’O Days 2021 – digitale Studienorientierungswoche der staatlichen brandenburgischen Hochschulen',
 'Hoppla! Jetzt kommt Koppla! – Die Revolution fürs Handwerk',
 'Science / Fiction – Experimentelle Kooperation von Germanistik und food4future',
 '„Wir waren exotisch!“ – Der Linguist Gisbert Fanselow und der Psychologe Reinhold Kliegl im Gespräch über löchrige Dächer, schwierige Anfänge und die richtigen Forschungsfragen',
 'Eine Brücke in den Arbeitsmarkt: Warum Betriebswirtin Kristina Nistor noch einmal zur Uni ging',
 'Die Klima-Uhr tickt – Der Wirtschaftswissenschaftler Matthias Kalkuhl erforscht, wie die Klimawende gelingen kann',
 'Viele Schlüssel zum Erfolg – Wie sich das Zessko vom Sprachen- zum Kompetenzzentrum entwickelte',
 'Im Interview: Sven Dinklage – Im Einsatz als Liaison-Officer für die UP in Brasilien',
 '„Studier was Vernünftiges!“ – „SciVisTo“-Gründerin Franziska Schwarz über einen ungewöhnlichen Weg in die Selbstständigkeit',
 '„Ich würde meine Verteidigung au

In [25]:
lot = list(zip(title_list, link_list))
news_dict = dict(lot)

In [26]:
news_dict

{'Studi’O Days 2021 – digitale Studienorientierungswoche der staatlichen brandenburgischen Hochschulen': 'https://www.uni-potsdam.de/de/nachrichten/detail/2021-04-15-studio-days-2021-digitale-studienorientierungswoche-der-staatlichen-brandenburgischen',
 'Hoppla! Jetzt kommt Koppla! – Die Revolution fürs Handwerk': 'https://www.uni-potsdam.de/de/nachrichten/detail/2021-04-15-hoppla-jetzt-kommt-koppla-die-revolution-fuers-handwerk',
 'Science / Fiction – Experimentelle Kooperation von Germanistik und food4future': 'https://www.uni-potsdam.de/de/nachrichten/detail/2021-04-13-science-fiction-experimentelle-kooperation-von-germanistik-und-food4future',
 '„Wir waren exotisch!“ – Der Linguist Gisbert Fanselow und der Psychologe Reinhold Kliegl im Gespräch über löchrige Dächer, schwierige Anfänge und die richtigen Forschungsfragen': 'https://www.uni-potsdam.de/de/nachrichten/detail/2021-04-13-wir-waren-exotisch-der-linguist-gisbert-fanselow-und-der-psychologe-reinhold-klie',
 'Eine Brücke in 

## Pagination
You have probably realised that the articles presented on the first news page are not the University of Potsdam's entire collection. Your goal is to retrieve all articles that are available on the university's website, and you can do so by applying your new knowledge repeatedly.

16. Figure out how many pages of article content there are in total. You can do this manually, e.g. by inspecting the URL as you page through the collection in your browser, or programmatically by writing a `while` loop that continues until some condition, such as the status code returned by your request, is violated. Make sure to include a short pause (1 second) between requests so that you do not overload the server, which in some cases could lead to a temporary ban of your device.

In [27]:
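# Collect the links of all articles across the paginated news overview
import time

articles_links = []

counter = 0

# Probe the first news page to initialise the loop condition
test_seed = 'https://www.uni-potsdam.de/de/nachrichten/'
test_html = requests.get(test_seed)
status = test_html.status_code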

while status == 200:
    
    print('Scraping page ' + str(counter) + '.')
    
    if counter < 1:
        
        seed = 'https://www.uni-potsdam.de/de/nachrichten/'
        
        html = requests.get(seed)
        
        status = html.status_code
        
        soup = BeautifulSoup(html.text, "html.parser")
        
        news_list = list(map(lambda x: x, soup.select('.up-news-list-item')))
        
        link_list = []
        title_list = []
        
        for link_num in range(len(news_list)):
    
            sub_link = news_list[link_num].findChild("a")['href']
            sub_title = news_list[link_num].findChild("a")['title']
    
            if type(sub_link) is str and 'www' not in sub_link:
        
                link_list.append('https://www.uni-potsdam.de' + sub_link)
                title_list.append(sub_title)
        
        articles_links.extend(link_list)
        
    elif counter >= 1:
        
        seed = 'https://www.uni-potsdam.de/de/nachrichten/page-{}'.format(str(counter+1))
        
        html = requests.get(seed)
        
        status = html.status_code
        
        soup = BeautifulSoup(html.text, "html.parser")
        
        news_list = list(map(lambda x: x, soup.select('.up-news-list-item')))
        
        link_list = []
        title_list = []

        for link_num in range(len(news_list)):
    
            sub_link = news_list[link_num].findChild("a")['href']
            sub_title = news_list[link_num].findChild("a")['title']
    
            if type(sub_link) is str and 'www' not in sub_link:
        
                link_list.append('https://www.uni-potsdam.de' + sub_link)
                title_list.append(sub_title)
        
        articles_links.extend(link_list)
        
    counter += 1
    
    time.sleep(1)

Scraping page 0.
Scraping page 1.
Scraping page 2.
Scraping page 3.
Scraping page 4.
Scraping page 5.
Scraping page 6.
Scraping page 7.
Scraping page 8.
Scraping page 9.
Scraping page 10.
Scraping page 11.
Scraping page 12.
Scraping page 13.
Scraping page 14.
Scraping page 15.
Scraping page 16.
Scraping page 17.
Scraping page 18.
Scraping page 19.
Scraping page 20.
Scraping page 21.
Scraping page 22.
Scraping page 23.
Scraping page 24.
Scraping page 25.
Scraping page 26.
Scraping page 27.
Scraping page 28.
Scraping page 29.
Scraping page 30.
Scraping page 31.
Scraping page 32.
Scraping page 33.
Scraping page 34.
Scraping page 35.
Scraping page 36.
Scraping page 37.
Scraping page 38.
Scraping page 39.
Scraping page 40.
Scraping page 41.
Scraping page 42.
Scraping page 43.
Scraping page 44.
Scraping page 45.
Scraping page 46.
Scraping page 47.
Scraping page 48.
Scraping page 49.
Scraping page 50.
Scraping page 51.
Scraping page 52.
Scraping page 53.
Scraping page 54.
Scraping page 55.
Sc
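
The two branches of the loop above differ only in the URL they request, so the logic can be condensed. The following is a sketch rather than a drop-in replacement: the download is passed in as a function (`fetch_page`, hypothetical) so that the example runs offline against a made-up `fake_site`. Stopping on an empty page and the `max_pages` cap are added assumptions, in case the server answers with status 200 for pages that hold no articles.

```python
import time


def page_url(page_number, base="https://www.uni-potsdam.de/de/nachrichten/"):
    """Page 1 is the base URL; later pages follow the 'page-N' pattern used above."""
    return base if page_number == 1 else f"{base}page-{page_number}"


def collect_links(fetch_page, pause=1.0, max_pages=1000):
    """Call fetch_page(url) -> (status_code, links) until a page fails or is empty."""
    collected = []
    for page_number in range(1, max_pages + 1):
        status, links = fetch_page(page_url(page_number))
        if status != 200 or not links:
            break
        collected.extend(links)
        time.sleep(pause)  # short pause so the server is not overloaded
    return collected


# Stand-in for the real network call (hypothetical data)
fake_site = {
    page_url(1): ["/a", "/b"],
    page_url(2): ["/c"],
}


def fake_fetch(url):
    """Return (200, links) for known pages and (404, []) otherwise."""
    if url in fake_site:
        return 200, fake_site[url]
    return 404, []


links = collect_links(fake_fetch, pause=0)
print(links)
```

In the notebook, `fetch_page` would wrap `requests.get` and the `.select('.up-news-list-item')` extraction from the cell above.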

In [34]:
with open('articles_links.txt', 'w') as output:
    
    output.writelines("%s\n" % line for line in articles_links)

17. Read in the text file you stored in step 16 and split the list of hyperlinks into 4 evenly sized chunks. Iterate over each chunk and, within it, over each hyperlink. In each iteration, obtain the HTML, parse it and identify the publication date, the contact, the contact's email address, the image's hyperlink/reference and the length of the main text body. Note that some, or even all, of these elements may not be available. Define an appropriate data type for each field and, in each iteration, append the fields **as a dictionary** to a list.

In [44]:
import json

with open('articles_links.txt', 'r') as inputfile:
    urls = inputfile.read()

In [46]:
urls_list = urls.split('\n')[:-1]
urls_list[1]

'https://www.uni-potsdam.de/de/nachrichten/detail/2021-04-15-hoppla-jetzt-kommt-koppla-die-revolution-fuers-handwerk'

In [47]:
size = len(urls_list)/4
size

250.0
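
Because `/` returns a float, the value above cannot be used directly as a slice boundary; integer division (`//`) or `divmod` avoids that. Below is one possible chunking helper, sketched on made-up URLs. `split_into_chunks` is a hypothetical name, and it also handles lengths that are not divisible by 4.

```python
def split_into_chunks(items, n_chunks=4):
    """Split `items` into `n_chunks` consecutive chunks whose sizes differ by at most one."""
    base_size, remainder = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `remainder` chunks receive one extra element
        end = start + base_size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


# Made-up URLs, for illustration only
sample_urls = [f"https://www.example.org/article-{i}" for i in range(10)]
chunks = split_into_chunks(sample_urls)
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)
```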

In [52]:
html = requests.get(urls_list[0]).text
html

'<!DOCTYPE html><html dir="ltr" lang="de-DE"><head><meta charset="utf-8"><!-- benaja - web solutions (www.benaja-websolutions.com) Markus Meier, Roland Brandt und Tobias Gaertner GbR This website is powered by TYPO3 - inspiring people to share! TYPO3 is a free open source Content Management Framework initially created by Kasper Skaarhoj and licensed under GNU/GPL. TYPO3 is copyright 1998-2021 of Kasper Skaarhoj. Extensions are copyright of their respective owners. Information and contribution at https://typo3.org/ --><meta http-equiv="x-ua-compatible" content="IE=edge"/><meta name="generator" content="TYPO3 CMS"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><meta name="author" content="Silvana Grabowski"/><meta property="og:title" content="Detail"/><meta name="twitter:card" content="summary"/><link rel="stylesheet" type="text/css" href="/typo3temp/assets/compressed/merged-fc201a8dd7df23f7869e0d7219366c48-min.css.gzip?1583314121" media="print"><link rel="styles

In [82]:
date = BeautifulSoup(html, "html.parser").select(".up-news-single-date")
date

[<div class="up-news-single-date"><time datetime="2021-04-15"> 15.04.2021 </time></div>]

In [90]:
date = date[0].text.strip()
date

'15.04.2021'

In [113]:
contact = BeautifulSoup(html, "html.parser").select('.up-news-single-contact-item a')[0].text.strip()
contact

'Christian Mödebeck-Bagrowski'

In [122]:
mailc = BeautifulSoup(html, "html.parser").find_all('meta')
mailc

[<meta charset="utf-8"/>,
 <meta content="IE=edge" http-equiv="x-ua-compatible"><meta content="TYPO3 CMS" name="generator"/><meta content="width=device-width, initial-scale=1.0" name="viewport"/><meta content="Silvana Grabowski" name="author"/><meta content="Detail" property="og:title"/><meta content="summary" name="twitter:card"/><link href="/typo3temp/assets/compressed/merged-fc201a8dd7df23f7869e0d7219366c48-min.css.gzip?1583314121" media="print" rel="stylesheet" type="text/css"/><link href="/typo3temp/assets/compressed/merged-ac5e367df0d2daf0f288930ccdf26749-min.css.gzip?1613975705" media="screen" rel="stylesheet" type="text/css"/><script src="/typo3temp/assets/compressed/jquery-2.2.4.min-min.js.gzip?1583314121" type="text/javascript"></script><script src="/typo3temp/assets/compressed/merged-8dc6a9a84127a40ca23fee40ecace56f-min.js.gzip?1583314133" type="text/javascript"></script><script type="text/javascript">function returnAt() { document.write('&#64;'); }function returnDot() { doc

In [114]:
picture = BeautifulSoup(html, "html.parser").select('#up_news_single_media_slider img')[0]['src']
picture

'/fileadmin/_processed_/c/2/csm_2021-04_Cover_StudiODays_PM_3031cc67a0.jpg'

In [119]:
textlen = BeautifulSoup(html, "html.parser").select('.up-news-single-leftbox')[0].text
textlen = len(textlen)
textlen

3497
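
The cells above parse the same HTML several times and raise an `IndexError` as soon as an element is missing. A possible way to combine them is sketched below: the page is parsed once and `select_one` returns `None` for absent elements. The selectors are the ones used above; `extract_article_fields` and the miniature `sample_html` are hypothetical. The contact's email address is left out because the cells above do not establish where it is stored.

```python
from bs4 import BeautifulSoup


def extract_article_fields(html):
    """Parse once and return the article's fields as a dictionary (None if missing)."""
    article = BeautifulSoup(html, "html.parser")

    def text_or_none(selector):
        element = article.select_one(selector)
        return element.text.strip() if element is not None else None

    image = article.select_one("#up_news_single_media_slider img")
    body = article.select_one(".up-news-single-leftbox")
    return {
        "publication_date": text_or_none(".up-news-single-date"),
        "contact": text_or_none(".up-news-single-contact-item a"),
        "image_src": image.get("src") if image is not None else None,
        "text_length": len(body.text) if body is not None else 0,
    }


# Made-up miniature page: it has a date and a text body but no contact and no image
sample_html = (
    '<div class="up-news-single-date"><time> 15.04.2021 </time></div>'
    '<div class="up-news-single-leftbox">Hello</div>'
)
record = extract_article_fields(sample_html)
print(record)
```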

## Asynchronous HTTP requests

18. Install the libraries `aiohttp` and `tqdm`. `asyncio` is part of Python's standard library and does not need to be installed.

In [None]:
import asyncio
import aiohttp
import bs4
import tqdm
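
To illustrate the idea behind asynchronous requests without touching the network, here is a sketch that uses only `asyncio`. The download is simulated with `asyncio.sleep`; in a real scraper an `aiohttp` request would take its place. `fetch_length`, `fetch_all` and the URLs are hypothetical.

```python
import asyncio


async def fetch_length(url, semaphore):
    """Stand-in for a real download: waits briefly and returns a fake result."""
    async with semaphore:  # limit the number of simultaneous requests
        await asyncio.sleep(0.01)  # placeholder for the network round trip
        return url, len(url)


async def fetch_all(urls, max_concurrent=5):
    """Run all downloads concurrently, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [fetch_length(url, semaphore) for url in urls]
    return await asyncio.gather(*tasks)  # results keep the order of `urls`


# Made-up URLs, for illustration only
sample_urls = [f"https://www.example.org/article-{i}" for i in range(8)]
results = asyncio.run(fetch_all(sample_urls))
print(len(results))
```

Inside a Jupyter notebook an event loop is already running, so there you would write `results = await fetch_all(sample_urls)` instead of calling `asyncio.run`.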

19. Find the missing link that appears in `articles_links_r` but not in `results_list` using a list comprehension.
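
A minimal sketch of such a comparison on made-up data. The `sample_` lists stand in for `articles_links_r` and `results_list`. Converting the second list to a set first keeps the membership test fast for long lists.

```python
# Made-up stand-ins for `articles_links_r` and `results_list`
sample_articles_links_r = [
    "https://www.example.org/a",
    "https://www.example.org/b",
    "https://www.example.org/c",
]
sample_results_list = [
    "https://www.example.org/a",
    "https://www.example.org/c",
]

# Set membership is checked in constant time on average
seen = set(sample_results_list)
missing = [link for link in sample_articles_links_r if link not in seen]
print(missing)
```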

20. Install the `pandas` library.

In [None]:
import pandas as pd

21. Convert the `publication_date` into a `pandas` `datetime` object and plot a time series of published articles on a daily basis. Bonus: Aggregate the time series into monthly frequency. In which month-year were most articles published?
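
One possible approach, sketched on made-up records. It assumes the dates are stored as strings in the `DD.MM.YYYY` format seen on the article pages and that the column is called `publication_date`.

```python
import pandas as pd

# Made-up records, for illustration only
records = [
    {"publication_date": "15.04.2021"},
    {"publication_date": "15.04.2021"},
    {"publication_date": "13.04.2021"},
    {"publication_date": "30.03.2021"},
]
df = pd.DataFrame(records)

# Convert the strings into pandas datetime values
df["publication_date"] = pd.to_datetime(df["publication_date"], format="%d.%m.%Y")

# Number of articles per day and per month
daily_counts = df.groupby("publication_date").size()
monthly_counts = df.groupby(df["publication_date"].dt.to_period("M")).size()

# Month with the most published articles
busiest_month = monthly_counts.idxmax()
print(busiest_month)
```

Calling `daily_counts.plot()` would then draw the daily time series, provided `matplotlib` is installed.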

22. Install the library `matplotlib`.

In [None]:
import matplotlib
import matplotlib.pyplot as plt

23. Install the libraries `cufflinks` and `plotly`.

In [None]:
import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

24. Install the `chart-studio` library.

In [None]:
import chart_studio
import chart_studio.plotly as py
import plotly.graph_objs as go

25. Log in to [Plotly Chart Studio](https://chart-studio.plotly.com/Auth/login/#/) and obtain your `Username` and `API key`. Store them both line-by-line in a .py file, e.g. name it "plotly_config.py".
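
For illustration, the file could look like the sketch below. The variable names match those read in the next cell (`plotly_config.Username` and `plotly_config.api_key`); the values are placeholders that you replace with your own credentials.

```python
# Contents of plotly_config.py (placeholder values, replace with your own credentials)
Username = "your-username"
api_key = "your-api-key"
```

Keep this file out of any shared or public repository, since it contains your personal API key.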

In [None]:
import plotly_config

chart_studio.tools.set_credentials_file(username=plotly_config.Username, api_key=plotly_config.api_key)