# Collecting Data from the Web - APIs (Google Books, ...)

# take info c260F machine learning in education for data scraping
# use AskOski to find classes to take

We're going to first explore a simple web API, the Google Books API, to perform some searches for books and see what metadata we get in return. Although many APIs require a key in order to access the data, we can perform Google Books searches without one. The guide to the Google Books API, as well as a lot of other useful information, can be found [here](https://developers.google.com/books/).

If you are interested in Twitter data, check out [this Tweepy tutorial](https://www.earthdatascience.org/courses/earth-analytics-python/using-apis-natural-language-processing-twitter/get-and-use-twitter-data-in-python/).

First we'll import the [`requests`](http://docs.python-requests.org/en/master/) library. `requests` is a widely used library for making HTTP requests in Python. We'll use it to make a `get` request to the API endpoint.

## Google: w3schools.com
XML tutorial - data storage 

JSON intro - data storage 

HTML intro - how site looks

Python try/except


In [1]:
# !pip install requests
import requests

Calling an API just means building a unique URL. We always need a base URL, or endpoint, to which we can add the parameters specific to our request. Let's assign the base Google Books endpoint to a variable called `books_url`. We know this URL from the documentation linked above.

In [2]:
books_url = 'https://www.googleapis.com/books/v1/volumes?'  # base endpoint of the Google Books API

We can start off with a very simple search to see what the results look like. Then we'll move on to adding more filters and parameters. Let's assign our query to a variable `query`.

In [3]:
query = "data science social science"

To incorporate this into our query we can make a dictionary called `parameters`. We'll pass these parameters to the `get` method. The `'q'` stands for 'query', and whatever value we assign to that is what we're searching for, just as if we typed it into the Google search bar.

In [4]:
parameters = {'q': query}

We'll pass two arguments to the `get` method of the `requests` library: the URL and the parameters we want. It returns a response object.

In [5]:
r = requests.get(books_url, params=parameters)  # the parameters hold our search terms

Printing the `url` property, we can see that this function converted the URL into the proper format to include our search terms.

In [6]:
print(r.url)

https://www.googleapis.com/books/v1/volumes?q=data+science+social+science
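
To see what happened to the `parameters` dictionary, the same encoding can be reproduced with the standard library. This is only an illustrative sketch using `urllib.parse.urlencode`; it is not a description of how `requests` works internally, and the names `query_string` and `built_url` are made up for this example.

```python
from urllib.parse import urlencode

books_url = 'https://www.googleapis.com/books/v1/volumes?'
parameters = {'q': 'data science social science'}

# urlencode turns the dictionary into a query string; spaces become '+'
query_string = urlencode(parameters)

# the base URL already ends in '?', so we can simply append the query string
built_url = books_url + query_string

print(built_url)
```

The printed URL can be compared with `r.url` above. In practice it is simpler to let `requests` do this through the `params` argument.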


To see our results, we simply use the response object's `json` method, which returns the data as a Python dictionary that we can navigate. Take a minute or two to navigate through the results in order to understand their structure and remember your JSON training. 

In [7]:
results = r.json()
results

{'kind': 'books#volumes',
 'totalItems': 2642,
 'items': [{'kind': 'books#volume',
   'id': 'zqiKDQAAQBAJ',
   'etag': 'ocEEvh0Oq8E',
   'selfLink': 'https://www.googleapis.com/books/v1/volumes/zqiKDQAAQBAJ',
   'volumeInfo': {'title': 'Big Data and Social Science',
    'subtitle': 'A Practical Guide to Methods and Tools',
    'authors': ['Ian Foster',
     'Rayid Ghani',
     'Ron S. Jarmin',
     'Frauke Kreuter',
     'Julia Lane'],
    'publisher': 'CRC Press',
    'publishedDate': '2016-08-10',
    'description': "Both Traditional Students and Working Professionals Acquire the Skills to Analyze Social Problems. Big Data and Social Science: A Practical Guide to Methods and Tools shows how to apply data science to real-world problems in both research and the practice. The book provides practical guidance on combining methods and tools from computer science, statistics, and social science. This concrete approach is illustrated throughout using an important national problem, the quant

You probably figured out that the books are found under the `'items'` key, and the most important information for each one is under the `'volumeInfo'` key. Let's take a look at the first result.

In [8]:
results['items']

[{'kind': 'books#volume',
  'id': 'zqiKDQAAQBAJ',
  'etag': 'ocEEvh0Oq8E',
  'selfLink': 'https://www.googleapis.com/books/v1/volumes/zqiKDQAAQBAJ',
  'volumeInfo': {'title': 'Big Data and Social Science',
   'subtitle': 'A Practical Guide to Methods and Tools',
   'authors': ['Ian Foster',
    'Rayid Ghani',
    'Ron S. Jarmin',
    'Frauke Kreuter',
    'Julia Lane'],
   'publisher': 'CRC Press',
   'publishedDate': '2016-08-10',
   'description': "Both Traditional Students and Working Professionals Acquire the Skills to Analyze Social Problems. Big Data and Social Science: A Practical Guide to Methods and Tools shows how to apply data science to real-world problems in both research and the practice. The book provides practical guidance on combining methods and tools from computer science, statistics, and social science. This concrete approach is illustrated throughout using an important national problem, the quantitative study of innovation. The text draws on the expertise of promin

In [9]:
results['items'][0]['volumeInfo']

{'title': 'Big Data and Social Science',
 'subtitle': 'A Practical Guide to Methods and Tools',
 'authors': ['Ian Foster',
  'Rayid Ghani',
  'Ron S. Jarmin',
  'Frauke Kreuter',
  'Julia Lane'],
 'publisher': 'CRC Press',
 'publishedDate': '2016-08-10',
 'description': "Both Traditional Students and Working Professionals Acquire the Skills to Analyze Social Problems. Big Data and Social Science: A Practical Guide to Methods and Tools shows how to apply data science to real-world problems in both research and the practice. The book provides practical guidance on combining methods and tools from computer science, statistics, and social science. This concrete approach is illustrated throughout using an important national problem, the quantitative study of innovation. The text draws on the expertise of prominent leaders in statistics, the social sciences, data science, and computer science to teach students how to use modern social science research principles as well as the best analytica

# Challenge 1

There's a lot of information in the results, but we probably don't want all of it. Suppose that for each volume in the results, we only want to extract 1) the title, 2) the author(s), 3) the publication date, and 4) the description. Write a function `parse_results` that takes the `results` object as an argument and returns a list of dictionaries. Each dictionary within the list corresponds to a book, and has an `authors` key, a `title` key, a `published_date` key, and a `description` key.

The outline of the code is written for you below; your job is to fill in the blanks. Often when scraping you'll need to build in `try` and `except` blocks because web data will rarely be 'tidy' data.

In [10]:
results["items"][2]["volumeInfo"]

{'title': 'Data Science and Social Research',
 'subtitle': 'Epistemology, Methods, Technology and Applications',
 'authors': ['N. Carlo Lauro',
  'Enrica Amaturo',
  'Maria Gabriella Grassia',
  'Biagio Aragona',
  'Marina Marino'],
 'publisher': 'Springer',
 'publishedDate': '2017-11-17',
 'description': 'This edited volume lays the groundwork for Social Data Science, addressing epistemological issues, methods, technologies, software and applications of data science in the social sciences. It presents data science techniques for the collection, analysis and use of both online and offline new (big) data in social research and related applications. Among others, the individual contributions cover topics like social media, learning analytics, clustering, statistical literacy, recurrence analysis and network analysis. Data science is a multidisciplinary approach based mainly on the methods of statistics and computer science, and its aim is to develop appropriate methodologies for forecast
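
Before filling in the function, the `try`/`except` pattern mentioned above can be tried on a small record. The sketch below uses a hand-made dictionary (`sample_book` is invented for illustration, not real API output) that is missing the `'authors'` and `'description'` keys, and shows two ways of falling back to `'NA'`.

```python
# A hand-made record that mimics the shape of one item, with no 'authors' key
sample_book = {"volumeInfo": {"title": "An Example Title",
                              "publishedDate": "2017-11-17"}}

info = sample_book["volumeInfo"]

# Option 1: try/except, catching only the error we expect (a missing key)
try:
    authors = ",".join(info["authors"])
except KeyError:
    authors = "NA"

# Option 2: dict.get with a default value
description = info.get("description", "NA")

print(authors, description, info["publishedDate"])
```

Catching `KeyError` specifically, rather than using a bare `except`, means that other mistakes (such as a misspelled variable name) still raise an error instead of being silently turned into `'NA'`.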

In [11]:
## YOUR CODE HERE

def parse_results(results):

    results_list = []

    for book in results['items']:

        title = book["volumeInfo"]["title"]

        # some books don't have authors, dates, or a description,
        # so a missing key falls back to 'NA'
        try:
            authors = ",".join(book["volumeInfo"]["authors"])
        except KeyError:
            authors = 'NA'
        
        try:
            published_date = book["volumeInfo"]["publishedDate"]
        except KeyError:
            published_date = 'NA'

        try:
            description = book["volumeInfo"]["description"]
        except KeyError:
            description = "NA"

        results_dict = {'title': title,
                        'authors': authors,
                        'description': description,
                        'published_date': published_date}
        
        results_list.append(results_dict)
        
    return results_list



In [12]:
results_final = parse_results(results)
print("number of books =", len(results_final))
print()
print(results_final)


number of books = 10

[{'title': 'Big Data and Social Science', 'authors': 'Ian Foster,Rayid Ghani,Ron S. Jarmin,Frauke Kreuter,Julia Lane', 'description': "Both Traditional Students and Working Professionals Acquire the Skills to Analyze Social Problems. Big Data and Social Science: A Practical Guide to Methods and Tools shows how to apply data science to real-world problems in both research and the practice. The book provides practical guidance on combining methods and tools from computer science, statistics, and social science. This concrete approach is illustrated throughout using an important national problem, the quantitative study of innovation. The text draws on the expertise of prominent leaders in statistics, the social sciences, data science, and computer science to teach students how to use modern social science research principles as well as the best analytical and computational tools. It uses a real-world challenge to introduce how these tools are used to identify and capture appropriate data, apply data science models and tools 

# Challenge 2

Now let's explore the API using more parameters. You may have noticed that our query only gave us 10 books, but there are probably more than 10 books written about data science and the social sciences. To adjust our search, we need to add in the `maxResults` parameter and the `startIndex` parameter. We can do that by adding these as keys to the `parameters` dictionary, and then run our request again. To read about these parameters, see the [documentation](https://developers.google.com/books/docs/v1/using#api_params).

In [13]:
parameters = {'q': query,
          'maxResults': 20,
          'startIndex': 0}

r = requests.get(books_url, params=parameters)

print(r.url)

results_final = r.json()

print()

print(parse_results(results_final))

https://www.googleapis.com/books/v1/volumes?q=data+science+social+science&maxResults=20&startIndex=0

[{'title': 'Big Data and Social Science', 'authors': 'Ian Foster,Rayid Ghani,Ron S. Jarmin,Frauke Kreuter,Julia Lane', 'description': "Both Traditional Students and Working Professionals Acquire the Skills to Analyze Social Problems. Big Data and Social Science: A Practical Guide to Methods and Tools shows how to apply data science to real-world problems in both research and the practice. The book provides practical guidance on combining methods and tools from computer science, statistics, and social science. This concrete approach is illustrated throughout using an important national problem, the quantitative study of innovation. The text draws on the expertise of prominent leaders in statistics, the social sciences, data science, and computer science to teach students how to use modern social science research principles as well as the best analytical and computational tools. It uses a real-world challenge to introduce how these tools are used

Now write a for loop to collect the first 100 results into `all_results`. But make sure you use `time.sleep` at the end of each loop! Python is so fast that if you write a for loop without pausing between calls you can overload someone's server, or get yourself (temporarily) banned:

In [14]:
## YOUR CODE HERE

import time

parameters = {'q': query, 'maxResults': 20, 'startIndex': 0}  # 20 results per page for 5 pages 

all_results = []
for i in range(5):
    print("collecting page " + str(i + 1))
    
    r = requests.get(books_url, params=parameters)
    results = r.json()
    parsed = parse_results(results)
    all_results.extend(parsed) # just extends the existing list 
    
    time.sleep(1) # very important to not overload API
    parameters['startIndex'] += parameters["maxResults"]
    


collecting page 1
collecting page 2
collecting page 3
collecting page 4
collecting page 5


In [15]:
print(len(all_results))


100
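
The paging logic in the loop above can be checked without touching the network. The sketch below is one possible way to factor it out; `collect_pages` and `fake_fetch` are hypothetical names invented for this example, and `fake_fetch` simply stands in for the real API call.

```python
import time

def collect_pages(fetch_page, n_pages, page_size, pause=1.0):
    """Call fetch_page(start_index, page_size) once per page and pool the results."""
    collected = []
    for page in range(n_pages):
        start_index = page * page_size  # 0, 20, 40, ... for page_size=20
        collected.extend(fetch_page(start_index, page_size))
        time.sleep(pause)  # pause between calls to be polite to the server
    return collected

# A stand-in for the real API call: it just returns consecutive integers
def fake_fetch(start_index, page_size):
    return list(range(start_index, start_index + page_size))

# pause=0 only because no real server is involved in this demo
demo = collect_pages(fake_fetch, n_pages=5, page_size=20, pause=0)
print(len(demo))
```

With the real API, `fetch_page` would wrap the `requests.get` call and `parse_results`, and the pause should stay at a second or more.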


Now we can write this data to a CSV.

In [16]:
import csv

keys = all_results[0].keys()  # keys for first book 

# newline='' stops the csv module from writing blank rows on Windows
with open('books_search.csv', 'w', newline='', encoding='utf-8') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(all_results) 
    
# now read as a dataframe 
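
To check that a file written this way can be read back, here is a minimal round-trip sketch using only the standard library. The rows in `sample_rows` and the file name `books_demo.csv` are made up for illustration; they only mimic the keys that `parse_results` produces.

```python
import csv
import os
import tempfile

# Made-up rows with the same keys that parse_results produces
sample_rows = [
    {"title": "Book A", "authors": "NA",
     "description": "Line one, with a comma", "published_date": "2016"},
    {"title": "Book B", "authors": "X,Y",
     "description": "NA", "published_date": "2017-11-17"},
]

# Write to a temporary folder so the demo does not overwrite real files
path = os.path.join(tempfile.mkdtemp(), "books_demo.csv")

with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(sample_rows[0].keys()))
    writer.writeheader()
    writer.writerows(sample_rows)

# Read the file back: each row becomes a dictionary keyed by the header
with open(path, newline="", encoding="utf-8") as f:
    rows_back = list(csv.DictReader(f))

print(rows_back == sample_rows)
```

If pandas is installed, `pandas.read_csv(path)` would load the same file as a DataFrame instead.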

# Google Geocoding API (now requires a billing account with a credit card)

We can apply what you learned above to the [Google geocoding API](https://developers.google.com/maps/documentation/geocoding/start). If you send the API a geographic identifier, such as a street address or a latitude/longitude pair, it returns the geographic information associated with that place. Let's start by getting countries from coordinates. The function `get_country` below takes a latitude/longitude pair and returns the country.

First, install gmaps and googlemaps:

In [17]:
# !pip install gmaps
# !pip install -U googlemaps

In [18]:
import googlemaps
from datetime import datetime
import gmaps
import gmaps.datasets

# Getting an API key 

Now, you must obtain an API key for connecting to the Google Maps server: 

0. Go to https://developers.google.com/maps/documentation/
1. Click "Sign in" and login with your UC Berkeley credentials
2. Type "Google Maps Platform" in the search box
3. Click "Get Started" 
4. Check the boxes for Maps (and Routes and Places if you choose)
5. Give your project a name
6. Click Set Billing Account (click Skip if you are asked to enter any info)
7. Click "Next" when it says it will enable your APIs
8. If successful, you should see a box that says "You're all set!" with a clipboard icon to copy your API key. 
9. Paste your API key into the cell below
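
Rather than pasting the key directly into a notebook that might be shared, one option is to read it from an environment variable. This is a sketch under assumptions: the helper `load_api_key` and the variable name `GOOGLE_MAPS_API_KEY` are hypothetical choices for this example, not something the Google libraries require.

```python
import os

def load_api_key(env=os.environ, name="GOOGLE_MAPS_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise ValueError("No API key found; set " + name + " first")
    return key

# Demo with a plain dictionary standing in for the real environment
demo_key = load_api_key({"GOOGLE_MAPS_API_KEY": "not-a-real-key"})
print(len(demo_key))
```

The returned string could then be passed as the key in the cells below.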

In [19]:
gmaps.configure(api_key="")

In [20]:
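# Note: this rebinds the name `gmaps` from the imported module to a googlemaps client.
# With an empty key the client is never created, which explains the errors shown below.
gmaps = googlemaps.Client(key="")  # paste your API key here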
geocode_result = gmaps.geocode("356 Barrows Hall, Berkeley CA 94720")
geocode_result

ValueError: Must provide API key or enterprise credentials when creating client.

In [21]:
reverse_geocode_result = gmaps.reverse_geocode((37.8686690197085, -122.2597019802915))
reverse_geocode_result

AttributeError: module 'gmaps' has no attribute 'reverse_geocode'

In [22]:
# Request directions via public transit
now = datetime.now()
directions_result = gmaps.directions(origin = "356 Barrows Hall, Berkeley CA 94720",
                                     destination = "1600 Amphitheatre Parkway, Mountain View, CA",
                                     mode="transit",
                                     departure_time=now)
directions_result

TypeError: 'module' object is not callable

# Reverse geocoding

You can also reverse geocode any latitude/longitude pair, for example to get a country name:

In [23]:
lat = 37.8686690197085
long = -122.2597019802915
results = gmaps.reverse_geocode((lat, long))
[x['formatted_address'] for x in results if 'country' in x['types']]

AttributeError: module 'gmaps' has no attribute 'reverse_geocode'

In [24]:
def get_country(lat, long):
    results = gmaps.reverse_geocode((lat, long))
    country = [x['formatted_address'] for x in results if 'country' in x['types']]
    country = country[0] if country else 'Unknown'

    return country
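
Since the calls above need a working API key, the filtering step inside `get_country` can be tried on a hand-made list instead. The data in `sample_results` is invented for illustration and only mimics the two fields the function reads (`'formatted_address'` and `'types'`); `pick_country` is a hypothetical helper that repeats the same logic without the API call.

```python
# Hand-made stand-in for a reverse geocoding response (not real API output)
sample_results = [
    {"formatted_address": "Berkeley, CA, USA",
     "types": ["locality", "political"]},
    {"formatted_address": "California, USA",
     "types": ["administrative_area_level_1", "political"]},
    {"formatted_address": "United States",
     "types": ["country", "political"]},
]

def pick_country(results):
    # keep only the entries tagged as a country
    matches = [x["formatted_address"] for x in results if "country" in x["types"]]
    # an empty list means no country was found (e.g. a point in the ocean)
    return matches[0] if matches else "Unknown"

print(pick_country(sample_results))
print(pick_country([]))
```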

In [25]:
def get_countries(country_list):
    # loop over the argument, not the global `countries` variable
    return [get_country(lat, long) for (lat, long) in country_list]

In [26]:
countries = [(41.90, 12.49),(35.86, 104.19), (0,0), (37.86, -122.25)]
get_countries(countries)

AttributeError: module 'gmaps' has no attribute 'reverse_geocode'

# Collecting Data from the Web - Web scraping with *BeautifulSoup* - Wikipedia

We're going to scrape some information from Wikipedia, which has a simple page layout with a consistent template.

For web scraping we're going to need two libraries. The first is [requests](http://docs.python-requests.org/en/master/), which we used in the API notebook. In addition to that, we're going to use [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/). `BeautifulSoup` is what we use to actually navigate and parse the page that we're scraping. We'll import the `time` library too. This will allow us to `time.sleep(5)` so that we don't overload anyone's servers.

In [27]:
# !pip install beautifulsoup4 html5lib

In [28]:
import requests
from bs4 import BeautifulSoup
import time

First we use `requests` to make a GET request to the page. Let's see what's on the "[Data science](https://en.wikipedia.org/wiki/Data_science)" Wikipedia page:

In [29]:
r = requests.get('https://en.wikipedia.org/wiki/Data_science')
r

<Response [200]>

Then we can read the contents of the server's response and assign it to a new variable, just as we did for APIs, only here we don't want to call the `.json` method because it's HTML. Unfortunately, there's no `.html` method in the requests library, but `BeautifulSoup` will help us there! So let's first get the string:

In [30]:
source = r.text
source

'\n<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>Data science - Wikipedia</title>\n<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"b7f8496f-862b-4644-a449-4bc1d9f6fa67","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Data_science","wgTitle":"Data science","wgCurRevisionId":958726853,"wgRevisionId":958726853,"wgArticleId":35458904,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["CS1 maint: others","CS1 maint: date and year","Use dmy dates from December 2012","Information science","Computer occupations","Computational fields of study","Data

Now we use `BeautifulSoup` to convert it into a **`soup` object** that makes navigating the HTML tree much easier.

In [31]:
soup = BeautifulSoup(source, 'html5lib')

Now that we have created this soup object, we can use the `prettify` method to look at the HTML, and even get a slice of it. Let's take a look at what we have:

In [32]:
print(soup.prettify()[:1000])

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Data science - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"b7f8496f-862b-4644-a449-4bc1d9f6fa67","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Data_science","wgTitle":"Data science","wgCurRevisionId":958726853,"wgRevisionId":958726853,"wgArticleId":35458904,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["CS1 maint: others","CS1 maint: date and year","Use dmy dates from December 2012","Information science","Computer occupations","Computational fields of st

At this point, we're in a similar position to what we saw with XML, except now it's slightly more complicated. `BeautifulSoup` has a number of functions to find things on a page. Like other webscraping tools, `BeautifulSoup` lets you find elements by their:

1. HTML tags
2. HTML Attributes
3. CSS Selectors


Let's search first for **HTML tags**. 

The function `find_all` searches the `soup` tree to find all the elements with a particular HTML tag, and returns a list of all those elements.

In [33]:
soup.find_all("a")  # a tag is for links 

[<a id="top"></a>,
 <a class="mw-jump-link" href="#mw-head">Jump to navigation</a>,
 <a class="mw-jump-link" href="#p-search">Jump to search</a>,
 <a href="/wiki/Information_science" title="Information science">information science</a>,
 <a href="/wiki/Machine_learning" title="Machine learning">Machine learning</a>,
 <a href="/wiki/Data_mining" title="Data mining">data mining</a>,
 <a class="image" href="/wiki/File:Kernel_Machine.svg"><img alt="Kernel Machine.svg" data-file-height="233" data-file-width="512" decoding="async" height="100" src="//upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Kernel_Machine.svg/220px-Kernel_Machine.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Kernel_Machine.svg/330px-Kernel_Machine.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Kernel_Machine.svg/440px-Kernel_Machine.svg.png 2x" width="220"/></a>,
 <a href="/wiki/Statistical_classification" title="Statistical classification">Classification</a>,
 <a href="/wiki

Since the `find_all()` method is used so frequently, there is a shortcut for it. You can just treat the soup object itself as a function, and pass it the tag you're looking for as an argument.

So `soup.find_all('a')` is the same as `soup('a')`:

In [34]:
soup.find_all('a') == soup('a')

True

You probably noticed that `soup('a')` returned a lot of elements, most of which we might not want. One way to narrow down our search is to specify that we're only looking for elements that have a certain CSS class. For this we can use the `select()` method, which takes a CSS selector: the tag and the CSS class separated by a period. The following selector grabs the navigation box in the upper right; the commented-out line extends it to select just the links inside that box:

In [35]:
# soup.select("table.vertical-navbox.nowraplinks.plainlist a")
soup.select("table.vertical-navbox")

[<table class="vertical-navbox nowraplinks" style="float:right;clear:right;width:22.0em;margin:0 0 1.0em 1.0em;background:#f9f9f9;border:1px solid #aaa;padding:0.2em;border-spacing:0.4em 0;text-align:center;line-height:1.4em;font-size:88%"><tbody><tr><th style="padding:0.2em 0.4em 0.2em;font-size:145%;line-height:1.2em"><a href="/wiki/Machine_learning" title="Machine learning">Machine learning</a> and<br/><a href="/wiki/Data_mining" title="Data mining">data mining</a></th></tr><tr><td style="padding:0.2em 0 0.4em;padding:0.25em 0.25em 0.75em;"><a class="image" href="/wiki/File:Kernel_Machine.svg"><img alt="Kernel Machine.svg" data-file-height="233" data-file-width="512" decoding="async" height="100" src="//upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Kernel_Machine.svg/220px-Kernel_Machine.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Kernel_Machine.svg/330px-Kernel_Machine.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Kernel_Machine.svg/
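
The difference between searching by tag and searching by CSS selector is easier to see on a tiny page. The HTML below is a made-up snippet for illustration (not the real Wikipedia markup), and it is parsed with Python's built-in `html.parser` so that nothing beyond `beautifulsoup4` is needed.

```python
from bs4 import BeautifulSoup

# A made-up page: two links inside a navbox table, one link outside it
html = """
<table class="vertical-navbox nowraplinks">
  <tr><td><a href="/wiki/Machine_learning">Machine learning</a></td></tr>
  <tr><td><a href="/wiki/Data_mining">data mining</a></td></tr>
</table>
<p><a href="/wiki/Statistics">Statistics</a></p>
"""

demo_soup = BeautifulSoup(html, "html.parser")

# every <a> element on the page
all_links = demo_soup.find_all("a")

# only <a> elements nested inside a <table> with the class vertical-navbox
navbox_links = demo_soup.select("table.vertical-navbox a")

print(len(all_links), len(navbox_links))
print([a["href"] for a in navbox_links])
```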

If you're looking for a quick crash course in developer tools, check out this [YouTube video](https://www.youtube.com/watch?v=FQKvro1Wz-E).

[![YouTube video: crash course in developer tools](https://img.youtube.com/vi/FQKvro1Wz-E/0.jpg)](https://www.youtube.com/watch?v=FQKvro1Wz-E)

# Find a citation

Let's find all the places in the text where there is a citation, along with the references themselves. Using the `select()` method, find all the elements in the page that belong to the `reference-text` class.

---

Once we identify elements, we want to access the information in a certain element. This usually means two things:

1. Text
2. Attributes

Getting the text inside an element is easy. All we have to do is use the `text` member of a `tag` object. Let's look at the first citation:

In [None]:
first_citation = soup.select("span.reference-text")[0]
first_citation

In [None]:
# check out its type
type(first_citation)

It's a tag! Which means it has a `text` member:

In [None]:
first_citation.text

That gives us the text of the citation. But we can also dig deeper into the tag to get other information that's contained there.

If we want to get the link to this citation, we just have to navigate to it. We can again find whatever `a` elements are in this tag, just like we did for the soup object as a whole.

In [None]:
# Find the "a" elements
print(first_citation("a"))

Again this returns a list. In this case the link is located in the first item. We can get that easily.

In [None]:
# Get the first one
print(first_citation("a")[0])

This object is also a tag. Now let's use the `attrs` attribute to see the tag's attributes.

In [None]:
first_citation("a")[0].attrs

You'll notice that it looks a lot like a dictionary. And we can index it as such. Since we want the link, we can use the `href` attribute like a dictionary key to get the corresponding value.

In [None]:
print(first_citation("a")[0]['href'])

## Challenge 

Let's get all the links contained in the references and add them to a list. We've provided a lot of the code for you; your job is to fill in the blanks. Since not all of the references include a link, we have dealt with that for you already with the first line in the body of the `for` loop.

In [None]:
# YOUR CODE HERE

# make accumulator list
refs_list = []

# start at the endnotes
references = soup.select("span.reference-text")

# loop through references
for ref in references:
    if ref("a") != []:  # ignore the references without links
        
        a_element = ref("a")[0]
        link = _____
        
        refs_list._____(_____)

        
# get rid of links to wiki articles
refs_list = [ref for ref in refs_list if not ref.startswith('/wiki')]
refs_list
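
For comparison once you have tried the challenge, here is one possible approach, run on a small made-up snippet rather than the live page. The HTML and the names `demo_soup` and `demo_refs` are invented for this sketch; the real reference list is much longer and its markup may differ.

```python
from bs4 import BeautifulSoup

# Made-up references: an external link, no link at all, and an internal wiki link
html = """
<ol>
  <li><span class="reference-text"><a href="https://example.org/paper">A paper</a></span></li>
  <li><span class="reference-text">A book with no link</span></li>
  <li><span class="reference-text"><a href="/wiki/Some_article">An internal link</a></span></li>
</ol>
"""
demo_soup = BeautifulSoup(html, "html.parser")

demo_refs = []
for ref in demo_soup.select("span.reference-text"):
    links = ref("a")
    if links:  # skip the references without links
        # index the first <a> tag like a dictionary to get its href
        demo_refs.append(links[0]["href"])

# drop links that point to other wiki articles
demo_refs = [ref for ref in demo_refs if not ref.startswith("/wiki")]
print(demo_refs)
```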

If we wanted to collect all the information from these websites, we'd just loop through calling `requests.get` on each URL. If you wanted to get more citation links from other wiki pages, all you have to do is start the GET request above with a different Wikipedia page. You could loop through a whole list of page URLs if you want.

# Tables

Now let's try working with data in tabular format. We'll scrape the data, then we'll save it to a CSV file so that it can easily be processed.

First we'll make a GET request as usual and then convert it to a soup object.

In [None]:
r = requests.get("https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)")
source = r.text
soup = BeautifulSoup(source, 'html5lib')

If you look at the page, you'll see there are three tables, but we're just going to use the first one. The HTML element we want is `table`, and its classes are `wikitable` and `sortable`.

In [None]:
tables = soup.find_all("table", class_="wikitable sortable")

imf = tables[0] # international monetary fund
imf

First we'll break it down into rows using the `tr` element.

In [None]:
rows = imf("tr")

The first row is the header, so we can get rid of that.

In [None]:
rows = rows[1:]

In [None]:
print(rows[1]) #united states

We can use the `td` element to get the individual cells.

In [None]:
rows[1]("td")

## Challenge

Using what you've learned so far, parse the cells in each row to get the name of each country and its GDP. Then add to the `gdp` dictionary with the country as the key and its GDP as the value. With `BeautifulSoup` there is often more than one way to get the information you need, so feel free to experiment.

In [None]:
# YOUR CODE HERE

gdp = {}

for country in rows:
    
    cells = country("td") # get cells in each row
    
    name = _____  # get country name
    gdp_value = _____  # get country gdp

    gdp[name] = gdp_value
    
gdp  # display the whole dictionary, not just the last value
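
The same row-and-cell logic can be practised on a tiny table. Everything in this sketch is an assumption made for illustration: the HTML is made up, the country name is taken from the second cell and the GDP from the third, and the real Wikipedia table may order its columns differently.

```python
from bs4 import BeautifulSoup

# A made-up table with a header row and two data rows
html = """
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Country</th><th>GDP</th></tr>
  <tr><td>1</td><td><a href="/wiki/A">Country A</a></td><td>1,234,567</td></tr>
  <tr><td>2</td><td><a href="/wiki/B">Country B</a></td><td>89,000</td></tr>
</table>
"""
demo_soup = BeautifulSoup(html, "html.parser")

demo_table = demo_soup.find_all("table", class_="wikitable")[0]
demo_rows = demo_table("tr")[1:]  # skip the header row

demo_gdp = {}
for row in demo_rows:
    cells = row("td")
    name = cells[1].text.strip()
    # remove the thousands separators before converting to a number
    value = int(cells[2].text.strip().replace(",", ""))
    demo_gdp[name] = value

print(demo_gdp)
```

Converting the GDP to an integer is optional; keeping the original string is also reasonable if the values will be cleaned later.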

Now we can write this data to a CSV.

In [None]:
import csv

header = ['Country', 'GDP']

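# newline='' stops the csv module from writing blank rows on Windows
with open('gdp.csv', 'w', newline='', encoding='utf-8') as output_file:
    csv_out = csv.writer(output_file)
    csv_out.writerow(header)
    # one row per country: name, then GDP
    for country, value in gdp.items():
        csv_out.writerow([country, value])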