# DSC 80: Lab 06

### Due Date: Tuesday May 11th, 11:59 PM

## Instructions
Much like in DSC 10, this Jupyter Notebook contains the statements of the problems and provides code and markdown cells to display your answers to the problems. Unlike in DSC 10, the notebook is *only* for displaying a readable version of your final answers. The coding work will be developed in an accompanying `lab*.py` file that will be imported into the current notebook.

Labs and programming assignments will be graded in (at most) two ways:
1. The functions and classes in the accompanying python file will be tested (a la DSC 20),
2. The notebook will be graded (for graphs and free response questions).

**Do not change the function names in the `*.py` file**
- The functions in the `*.py` file are how your assignment is graded, and they are graded by their name. The dictionary at the end of the file (`GRADED FUNCTIONS`) contains the "grading list". The final function in the file allows your doctests to check that all the necessary functions exist.
- If you changed something you weren't supposed to, just use git to revert!

**Tips for working in the Notebook**:
- The notebooks present the questions to you and give you a place to present your results for later review.
- The notebooks for *lab assignments* are not graded (only the `.py` file is).
- Notebooks for PAs will serve as a final report for the assignment, and contain conclusions and answers to open-ended questions that are graded.
- The notebook serves as a nice environment for 'pre-development' and experimentation before designing your function in your `.py` file.

**Tips for developing in the .py file**:
- Do not change the function names in the starter code; grading is done using these function names.
- Do not change the docstrings in the functions. These are there to tell you if your work is on the right track!
- You are encouraged to write your own additional functions to solve the lab! 
    - Developing in python usually consists of larger files, with many short functions.
    - You may write your other functions in an additional `.py` file that you import in `lab**.py` (much like we do in the notebook).
- Always document your code!

### Importing code from `lab**.py`

* We import our `.py` file that's contained in the same directory as this notebook.
* We use the `autoreload` notebook extension to make changes to our `lab**.py` file immediately available in our notebook. Without this extension, we would need to restart the notebook kernel to see any changes to `lab**.py` in the notebook.
    - `autoreload` is necessary because Python caches imported modules in `sys.modules`. After the first import, subsequent imports of `lab**` reuse the module object that is already loaded instead of re-reading `lab**.py`, so edits to the file are not seen until the module is reloaded.
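
As a rough illustration of what `autoreload` saves you from doing by hand, the sketch below writes a throwaway module (the name `demo_lab` is made up for this example), imports it, edits it on disk, and then calls it again. Under these assumptions, a repeated `import` keeps the old code, and only an explicit `importlib.reload` picks up the edit.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical stand-in for lab**.py, created in a temporary directory.
tmp_dir = tempfile.mkdtemp()
module_path = pathlib.Path(tmp_dir) / "demo_lab.py"
module_path.write_text("def answer():\n    return 1\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_lab
first = demo_lab.answer()

# Edit the file on disk, as you would while developing.
module_path.write_text("def answer():\n    return 2  # edited version\n")

import demo_lab  # a repeated import reuses the module that is already loaded
second = demo_lab.answer()

importlib.reload(demo_lab)  # re-executes the edited source file
third = demo_lab.answer()

print(first, second, third)
```

With `%autoreload 2` active, the notebook performs the equivalent of the `reload` step for you before each cell runs.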

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import lab06 as lab

In [3]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import glob
import os
import time

In [4]:
import requests
import bs4
import datetime

# Basic HTML tags practice

**Question 1**

Create a very basic `html` file that satisfies the following properties:

1. Has `<head>` and `<body>` tags.
2. Has a title
3. Inside the body tags:
    * At least two headers
    * At least three images:
        * At least one image must be a local file;
        * At least one image must be loaded from an online source;
        * At least one image must have alternative text that is shown when the image cannot be displayed.
    * At least three references (hyperlinks) to different web pages;
    * At least one table with two columns.
    
        
   
4. Save your work as `lab06_1.html` in the same directory as `lab06.py`. Make sure it loads in a browser, and do not forget to submit it.
5. **Do not forget to submit all data files needed to display your page.**

**Note:** You can toy with (basic) HTML in the cells of a notebook, using either a "markdown cell" or by using the `IPython.display.HTML` function. However, be sure to open your saved file in a browser to be sure the page displays properly!

**Note:** If you work within Jupyter Notebook, you can later copy your text into a text editor and save it with the .html extension.
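
One way to sanity-check a page against the list above is to count its tags. The sketch below uses only the standard library's `html.parser`. The class name `TagCounter`, the helper `summarize`, and the sample fragment are all made up for illustration, and the fragment is deliberately not a complete answer to the question.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count opening tags and images that carry a non-empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.images_with_alt = 0

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1
        if tag == "img" and dict(attrs).get("alt"):
            self.images_with_alt += 1


def summarize(html_text):
    """Return (tag counts, number of images with alt text) for an HTML string."""
    parser = TagCounter()
    parser.feed(html_text)
    return parser.counts, parser.images_with_alt


# Made-up fragment for the demo; it is NOT a complete answer to Question 1.
sample = """
<html>
  <head><title>Demo</title></head>
  <body>
    <h1>Heading</h1>
    <img src="local.png" alt="shown if the image fails to load">
    <a href="https://example.com">a link</a>
    <table><tr><th>A</th><th>B</th></tr></table>
  </body>
</html>
"""

counts, images_with_alt = summarize(sample)
print(counts["img"], counts["a"], counts["table"], images_with_alt)
```

To check your own file, you could pass `open('lab06_1.html', encoding='utf-8').read()` to `summarize`. This only counts tags; it does not replace opening the page in a browser.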

In [5]:
def question1():
    """
    NOTE: You do NOT need to do anything with this function.

    The function for this question makes sure you
    have a correctly named HTML file in the right
    place. Note: This does NOT check if the supplementary files
    needed for your page are there!

    >>> os.path.exists('lab06_1.html')
    True
    """

    # Don't change this function body!
    # No python required; create the HTML file.

    return

In [6]:
os.path.exists('lab06_1.html')

True

In [7]:
import IPython
IPython.display.HTML('lab06_1.html')

First Name,Last Name
Joe,Mama
John,Deer
Jane,Doe


# Scraping an Online Bookstore


**Question 2**

Browse through the following fake on-line bookstore: http://books.toscrape.com/. This website is meant for toying with scraping.

Scrape the website, collecting data on all books that have **at least a four-star rating**, with a price **under £50** and belong to the book categories you want. You should collect the data in a dataframe as below (if you get an encoding error on your prices columns, like you see in the table below, don't worry about it):
<img src="data/bookdata.png">


Do this using the following steps:
1. Create a function `extract_book_links` that takes in the content of a book-listing page (a string of html), and returns a list of urls of book-detail pages that satisfy the requirements on "*at least* a four-star rating, and prices are *under* £50". 


*Note:* Your function should take under 180 seconds to run through the entire bookstore.

*Note:* Don't worry about type casting (i.e. changing the number of reviews to an int).
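
The filter in step 1 comes down to turning a rating word and a price string into values that can be compared. Below is a minimal sketch of that conversion, independent of any HTML parsing. The names `RATING_VALUES`, `parse_price`, and `keep_book` are hypothetical helpers, not part of the starter code.

```python
import re

# Rating words as they appear in class attributes such as "star-rating Four".
RATING_VALUES = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def parse_price(price_text):
    """Return the numeric part of a price string such as '£52.33' or 'Â£52.33'."""
    match = re.search(r"\d+(?:\.\d+)?", price_text)
    if match is None:
        raise ValueError(f"no number found in {price_text!r}")
    return float(match.group())


def keep_book(rating_word, price_text, min_stars=4, max_price=50.0):
    """True when the rating is at least min_stars and the price is strictly below max_price."""
    stars = RATING_VALUES[rating_word.lower()]
    return stars >= min_stars and parse_price(price_text) < max_price


print(keep_book("Four", "£47.82"), keep_book("Five", "£52.33"))
```

Searching for the number, rather than stripping known prefixes, keeps the helper working whether or not the pound sign was decoded correctly.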

In [8]:
fp = os.path.join('data', 'products.html')
text = open(fp, encoding='utf-8').read()
#text

In [9]:
#from bs4 import BeautifulSoup
books = bs4.BeautifulSoup(text).find_all('article',attrs={'class':'product_pod'})
#print(soup.text)

In [10]:
#soup#.find('div')
books[0].find('p',attrs={'class':'price_color'}).text

'£52.33'

In [11]:
def extract_book_links(text):
    """
    :Example:
    >>> fp = os.path.join('data', 'products.html')
    >>> out = extract_book_links(open(fp, encoding='utf-8').read())
    >>> url = 'scarlet-the-lunar-chronicles-2_218/index.html'
    >>> out[1] == url
    True
    """
    ratings = ['four', 'five']
    urls = []
    books = bs4.BeautifulSoup(text, features="lxml").find_all('article', attrs={'class': 'product_pod'})
    for book in books:
        # The rating is the second class of the first <p>, e.g. ['star-rating', 'Four'].
        rating = book.find('p').attrs['class'][1].lower()
        # Drop the pound sign (and any mis-decoded 'Â' prefix) before converting.
        price = float(book.find('p', attrs={'class': 'price_color'}).text.strip('Â£'))
        if rating in ratings and price < 50:
            urls.append(book.find('a').attrs['href'].replace('catalogue/', ''))
    return urls

In [12]:
fp = os.path.join('data', 'products.html')
out = extract_book_links(open(fp, encoding='utf-8').read())
out

['seven-brief-lessons-on-physics_219/index.html',
 'scarlet-the-lunar-chronicles-2_218/index.html',
 'saga-volume-3-saga-collected-editions-3_216/index.html',
 'running-with-scissors_215/index.html',
 'rise-of-the-rocket-girls-the-women-who-propelled-us-from-missiles-to-the-moon-to-mars_213/index.html',
 'ready-player-one_209/index.html']

In [13]:
url = 'scarlet-the-lunar-chronicles-2_218/index.html'
out[1] == url

True

2. Create a function `get_product_info` that takes in the content of a book-detail page (a string of html) and a variable `categories`, a list of the book categories you want. If the book belongs to one of those categories, return a dictionary corresponding to a row in the dataframe in the image above (where the keys are the column names and the values are the row values); otherwise, skip the book by returning `None`.
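
The product table on a book-detail page pairs each `<th>` label with a `<td>` value, which maps naturally onto dictionary keys and values. The sketch below shows that pairing with only the standard library so that it runs without extra packages; the class name `TableRowParser` is made up, and the sample rows are a shortened copy of the Frankenstein table. With BeautifulSoup, the same pairs could be read from each row via `row.find('th')` and `row.find('td')`.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect {header text: cell text} pairs from <th>/<td> cells."""

    def __init__(self):
        super().__init__()
        self.rows = {}
        self._cell = None    # 'th' or 'td' while inside a cell, else None
        self._header = None  # most recent <th> text waiting for its <td>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag
            self._buffer = []

    def handle_data(self, data):
        if self._cell is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        text = "".join(self._buffer).strip()
        if tag == "th":
            self._header = text
        elif self._header is not None:
            self.rows[self._header] = text
            self._header = None
        self._cell = None


sample_table = """
<table class="table table-striped">
<tr><th>UPC</th><td>a492f49a3e2b6a71</td></tr>
<tr><th>Product Type</th><td>Books</td></tr>
<tr>
<th>Availability</th>
<td>In stock (1 available)</td>
</tr>
<tr>
<th>Number of reviews</th>
<td>0</td>
</tr>
</table>
"""

parser = TableRowParser()
parser.feed(sample_table)
rows = parser.rows
print(rows)
```

Keying the values by their header text, instead of by row position, means the result does not depend on the order of the rows in the table.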

In [14]:
fp = os.path.join('data', 'Frankenstein.html')
categories = ['Default']

book = bs4.BeautifulSoup(open(fp, encoding='utf-8').read())
cat = book.find('ul',attrs={'class':'breadcrumb'}).find_all('a')[2].text
table_entries = book.find('table',attrs={'class':'table table-striped'}).find_all('tr')
#get_entry = lambda x: table_entries[x].find('tr').text
table_entries

[<tr>
 <th>UPC</th><td>a492f49a3e2b6a71</td>
 </tr>,
 <tr>
 <th>Product Type</th><td>Books</td>
 </tr>,
 <tr>
 <th>Price (excl. tax)</th><td>£38.00</td>
 </tr>,
 <tr>
 <th>Price (incl. tax)</th><td>£38.00</td>
 </tr>,
 <tr>
 <th>Tax</th><td>£0.00</td>
 </tr>,
 <tr>
 <th>Availability</th>
 <td>In stock (1 available)</td>
 </tr>,
 <tr>
 <th>Number of reviews</th>
 <td>0</td>
 </tr>]

In [15]:
book.find('article', attrs={'class':'product_page'}).find_all('p')[3].text

"Mary Shelley began writing Frankenstein when she was only eighteen. At once a Gothic thriller, a passionate romance, and a cautionary tale about the dangers of science, Frankenstein tells the story of committed science student Victor Frankenstein. Obsessed with discovering the cause of generation and life and bestowing animation upon lifeless matter, Frankenstein assembles Mary Shelley began writing Frankenstein when she was only eighteen. At once a Gothic thriller, a passionate romance, and a cautionary tale about the dangers of science, Frankenstein tells the story of committed science student Victor Frankenstein. Obsessed with discovering the cause of generation and life and bestowing animation upon lifeless matter, Frankenstein assembles a human being from stolen body parts but; upon bringing it to life, he recoils in horror at the creature's hideousness. Tormented by isolation and loneliness, the once-innocent creature turns to evil and unleashes a campaign of murderous revenge a

In [16]:
main = book.find('div', attrs={'class':'col-sm-6 product_main'})
#main

In [17]:
main.find_all('p')[2].attrs['class']#[1]

['star-rating', 'Two']

In [18]:
main.find('h1').text

'Frankenstein'

In [19]:
#book

In [20]:
book.find('ul',attrs={'class':'breadcrumb'}).find_all('a')[2].text

'Default'

In [21]:
def get_product_info(text, categories):
    """
    :Example:
    >>> fp = os.path.join('data', 'Frankenstein.html')
    >>> out = get_product_info(open(fp, encoding='utf-8').read(), ['Default'])
    >>> isinstance(out, dict)
    True
    >>> 'Category' in out.keys()
    True
    >>> out['Rating']
    'Two'
    """
    
    book = bs4.BeautifulSoup(text,features="lxml")
    cat = book.find('ul',attrs={'class':'breadcrumb'}).find_all('a')[2].text
    
    if cat not in categories:
        return None

    # Each table row holds one <th> label and one <td> value; keep the values in row order.
    table_entries = book.find('table', attrs={'class': 'table table-striped'}).find_all('tr')
    entries = [row.find('td').text for row in table_entries]

    description = book.find('article', attrs={'class': 'product_page'}).find_all('p')[3].text

    main = book.find('div', attrs={'class': 'col-sm-6 product_main'})
    title = main.find('h1').text
    rating = main.find_all('p')[2].attrs['class'][1]

    return {'Availability': entries[5], 'Category': cat, 'Description': description,
            'Number of reviews': entries[6], 'Price (excl. tax)': entries[2], 'Price (incl. tax)': entries[3],
            'Product Type': entries[1], 'Rating': rating, 'Tax': entries[4], 'Title': title, 'UPC': entries[0]}

In [22]:
fp = os.path.join('data', 'Frankenstein.html')
out = get_product_info(open(fp, encoding='utf-8').read(), ['Default'])
out

{'Availability': 'In stock (1 available)',
 'Category': 'Default',
 'Description': "Mary Shelley began writing Frankenstein when she was only eighteen. At once a Gothic thriller, a passionate romance, and a cautionary tale about the dangers of science, Frankenstein tells the story of committed science student Victor Frankenstein. Obsessed with discovering the cause of generation and life and bestowing animation upon lifeless matter, Frankenstein assembles Mary Shelley began writing Frankenstein when she was only eighteen. At once a Gothic thriller, a passionate romance, and a cautionary tale about the dangers of science, Frankenstein tells the story of committed science student Victor Frankenstein. Obsessed with discovering the cause of generation and life and bestowing animation upon lifeless matter, Frankenstein assembles a human being from stolen body parts but; upon bringing it to life, he recoils in horror at the creature's hideousness. Tormented by isolation and loneliness, the o

In [23]:
isinstance(out, dict)

True

In [24]:
'Category' in out.keys()

True

In [25]:
out['Rating']

'Two'

3. Create a function `scrape_books` that takes two arguments: `k`, the number of pages of the bookstore to scrape (the first `k` pages, as determined by starting at the url above and clicking on the 'next' button), and `categories`, a list of the book categories you want. It returns a dataframe of books like the one in the picture above. (Note: make sure the books returned satisfy the requirements set in part 1 about rating and price.)
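
The control flow of step 3 can be sketched separately from the network calls by passing the fetching and parsing steps in as functions. Everything below is illustrative: `listing_urls`, `collect_rows`, and the fake site are hypothetical, and the `page-{i}.html` URL pattern is taken from the solution code in this notebook rather than from any documentation.

```python
BASE = "http://books.toscrape.com/catalogue/"


def listing_urls(k):
    """URLs of the first k book-listing pages."""
    return [f"{BASE}page-{i}.html" for i in range(1, k + 1)]


def collect_rows(k, fetch, extract_links, parse_book):
    """Visit k listing pages and gather one dict per book that parse_book accepts."""
    rows = []
    for url in listing_urls(k):
        for relative_link in extract_links(fetch(url)):
            row = parse_book(fetch(BASE + relative_link))
            if row is not None:  # None means "not in a wanted category"
                rows.append(row)
    return rows


# Offline stand-ins so the loop can be exercised without network access.
fake_site = {
    BASE + "page-1.html": "a_1/index.html b_2/index.html",
    BASE + "a_1/index.html": "Mystery|Book A",
    BASE + "b_2/index.html": "Poetry|Book B",
}


def fake_parse(text):
    category, title = text.split("|")
    if category != "Mystery":
        return None
    return {"Category": category, "Title": title}


rows = collect_rows(1, fake_site.__getitem__, str.split, fake_parse)
print(rows)
```

In the real function, `fetch` could be `lambda url: requests.get(url).text`, `extract_links` could be `extract_book_links`, and `parse_book` could wrap `get_product_info` with the chosen categories. The collected list of dictionaries can then be turned into a dataframe in a single `pd.DataFrame(rows)` call.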

In [59]:
def scrape_books(k, categories):
    """
    :param k: number of book-listing pages to scrape.
    :returns: a dataframe of information on (certain) books
    on the k pages (as described in the question).

    :Example:
    >>> out = scrape_books(1, ['Mystery'])
    >>> out.shape
    (1, 11)
    >>> out['Rating'][0] == 'Four'
    True
    >>> out['Title'][0] == 'Sharp Objects'
    True
    """
    base = 'http://books.toscrape.com/catalogue/'
    rows = []
    for i in range(1, k + 1):
        page = requests.get(base + 'page-{}.html'.format(i)).text
        for book in extract_book_links(page):
            book_page = requests.get(base + book).text
            book_dict = get_product_info(book_page, categories)
            if book_dict is not None:
                rows.append(book_dict)
    # DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows.
    df = pd.DataFrame(rows)
    return df.reindex(sorted(df.columns), axis=1)

In [60]:
out = scrape_books(1, ['Mystery'])
out

Unnamed: 0,Availability,Category,Description,Number of Reviews,Price (excl. tax),Price (incl. tax),Product Type,Rating,Tax,Title,UPC
0,In stock (20 available),Mystery,"WICKED above her hipbone, GIRL across her hear...",0,Â£47.82,Â£47.82,Books,Four,Â£0.00,Sharp Objects,e00eb4fd7b871a48


In [61]:
out = scrape_books(1, ['Mystery'])
out

Unnamed: 0,Availability,Category,Description,Number of Reviews,Price (excl. tax),Price (incl. tax),Product Type,Rating,Tax,Title,UPC
0,In stock (20 available),Mystery,"WICKED above her hipbone, GIRL across her hear...",0,Â£47.82,Â£47.82,Books,Four,Â£0.00,Sharp Objects,e00eb4fd7b871a48


In [28]:
out.shape

(1, 11)

In [29]:
out['Rating'][0] == 'Four'

True

In [30]:
out['Title'][0] == 'Sharp Objects'

True

# API Requests
**Question 3**

You trade stocks as a hobby. As an avid pandas coder, you figured it is best to calculate some statistics by pulling data from a public API (https://financialmodelingprep.com/developer/docs/#Stock-Historical-Price). Specifically, "Historical price with change and volume interval".

Some definitions (these are the ones you need to know):
- open: The opening price of a stock at the beginning of a trading day
- close: The closing price of a stock at the end of a trading day
- volume: The total number of shares being traded in a day
- percent change: difference in price with respect to the original price (in percentages)


1. Create a function `stock_history` which takes in the stock code (`ticker`) as a string and `year` and `month` as integers, and returns a dataframe which has the price history for that stock in that month (include all columns).

2. Create a function `stock_stats` that takes in the output dataframe from `stock_history` and outputs the stock price change as a percentage and a rough total transaction volume **in billion dollars** for that month. Assume that on average, shares are traded at the midpoint price of high and low for that day. Return these two values as a tuple in a readable format: keep 2 decimal places for both values and add a plus or minus sign at the front of the percent change.
$$ \text{Total Transaction Volume (in dollars)} = \text{Volume (number of shares traded)} \times \text{Price} $$

*Example*: If \\$BYND opens at \\$80 and closes at \\$120 with a volume of 1 million, its percent change for the day is $(\$120-\$80) \div \$80 = +50.00\%$. And the estimated total transaction volume is: $(\$80+\$120) / 2 \times 10^6 = 0.10\text{B}$.
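
The arithmetic in the example above can be checked with a few lines of Python. The helper name `format_stats` is made up for this sketch, and it treats the example's \\$80 and \\$120 as both the open/close prices and the day's low/high, as the example does.

```python
def format_stats(open_price, close_price, low, high, volume):
    """Return (percent change, dollar volume in billions) as formatted strings."""
    percent = (close_price - open_price) / open_price * 100
    dollars = (low + high) / 2 * volume  # shares traded at the low/high midpoint
    # The '+' flag prints an explicit sign for both gains and losses.
    return f"{percent:+.2f}%", f"{dollars / 1e9:.2f}B"


stats = format_stats(open_price=80, close_price=120, low=80, high=120, volume=1_000_000)
print(stats)  # ('+50.00%', '0.10B')
```

For a whole month, the percent change uses the first open and the last close, and the dollar volume is the sum of the daily estimates.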


Hint: [pd.date_range](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html)

*Note:* Make sure you read the API documentation if you get stuck!

*Note 2:* In order to make successful requests, you will need an API key. In order to get one, you will need to sign up to the website. Once signed up, you can use the API key that comes with the free plan. It has a limit of 250 requests per day, which should be more than enough. In the code below, replace `your_key` when making requests.
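
A month's first and last day are all a request for that month needs, and December is the case that naive `month + 1` arithmetic gets wrong. The sketch below computes the bounds with the standard library and builds a request URL. The helper names are hypothetical, and the endpoint path and the `from`, `to`, and `apikey` query parameters are assumptions copied from the solution code in this notebook, so check them against the API documentation.

```python
import calendar
import datetime
from urllib.parse import urlencode


def month_bounds(year, month):
    """Return the first and last calendar day of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    return datetime.date(year, month, 1), datetime.date(year, month, last_day)


def history_url(ticker, year, month, key="your_key"):
    """Build the request URL; endpoint and parameter names are assumptions."""
    start, end = month_bounds(year, month)
    query = urlencode({"from": start.isoformat(), "to": end.isoformat(), "apikey": key})
    return f"https://financialmodelingprep.com/api/v3/historical-price-full/{ticker}?{query}"


print(month_bounds(2019, 12))
print(history_url("BYND", 2019, 6))
```

`calendar.monthrange` also accounts for leap years, so February needs no special case.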

In [31]:
import json

In [32]:
#stock_endpoint = 'https://financialmodelingprep.com/api/v3/historical-price-full/AAPL?apikey=your_key'

In [33]:
#response = requests.get("https://financialmodelingprep.com/developer/docs/#Stock-Historical-Price")
#response = requests.get(stock_endpoint)
#print(response.status_code)
#print(response.content)

In [34]:

#json_load = json.loads(response.content)
#df = pd.json_normalize(json_load['historical'])
#df['date'] = pd.to_datetime(df['date'])
#df

In [35]:
#start = datetime.datetime(2019, 6,1)
#end = datetime.datetime(2019, 7,1)
#date_range = pd.date_range(start,end)[:-1].to_pydatetime()
#date_range

In [36]:
#df[df['date'].isin(date_range)]

In [37]:
# def stock_history(ticker, year, month):
#     """
#     Given a stock code and month, return the stock price details for that month
#     as a dataframe

#     >>> history = stock_history('BYND', 2019, 6)
#     >>> history.shape == (20, 13)
#     True
#     >>> history.label.iloc[-1]
#     'June 03, 19'
#     """
#     key = 'your_key'
#     stock_endpoint = 'https://financialmodelingprep.com/api/v3/historical-price-full/{}?apikey={}'.format(ticker,key)
#     response = requests.get(stock_endpoint)
    
#     json_load = json.loads(response.content)
#     pd.json_normalize(json_load['historical'])
#     json_load = json.loads(response.content)
    
#     df = pd.json_normalize(json_load['historical'])
#     df['date'] = pd.to_datetime(df['date'])
    
#     start = datetime.datetime(year, month, 1)
#     #print(start)
#     end = datetime.datetime(year, month+1, 1)
#     #print(end)
#     date_range = pd.date_range(start,end)[:-1].to_pydatetime()
    
#     return df[df['date'].isin(date_range)]

In [40]:
def stock_history(ticker, year, month):
    """
    Given a stock code and month, return the stock price details for that month
    as a dataframe

    >>> history = stock_history('BYND', 2019, 6)
    >>> history.shape == (20, 13)
    True
    >>> history.label.iloc[-1]
    'June 03, 19'
    """
    # First and last calendar day of the month; pd.Period also handles December correctly.
    period = pd.Period(year=year, month=month, freq='M')
    start = period.start_time.strftime('%Y-%m-%d')
    end = period.end_time.strftime('%Y-%m-%d')
    key = 'your_key'  # replace with your own API key
    stock_endpoint = (
        'https://financialmodelingprep.com/api/v3/historical-price-full/'
        f'{ticker}?from={start}&to={end}&apikey={key}'
    )
    response = requests.get(stock_endpoint).json()
    return pd.DataFrame(response['historical'])

In [51]:
history = stock_history('BYND', 2019, 6)
# history.sort_values('date')

In [42]:
history.shape == (20, 13)

True

In [43]:
history.label.iloc[-1]

'June 03, 19'

In [44]:
str(-10)

'-10'

In [47]:
def stock_stats(history):
    """
    Given a stock's trade history, return the percent change and transactions
    in billion dollars.

    >>> history = stock_history('BYND', 2019, 6)
    >>> stats = stock_stats(history)
    >>> len(stats[0]), len(stats[1])
    (7, 6)
    >>> float(stats[0][1:-1]) > 30
    True
    >>> float(stats[1][:-1]) > 1
    True
    """
    df = history.sort_values('date')

    # Percent change from the first open to the last close of the month.
    pc = (df.iloc[-1]['close'] - df.iloc[0]['open']) / df.iloc[0]['open'] * 100
    # The '+' format flag prints the sign for both gains and losses.
    pc = f"{pc:+.2f}%"

    # Daily dollar volume, estimated at the midpoint of the day's low and high.
    ttv = ((df['low'] + df['high']) / 2 * df['volume']).sum() / 1000000000
    ttv = f"{ttv:.2f}B"
    return pc, ttv

In [48]:
history = stock_history('BYND', 2019, 6)
stats = stock_stats(history)
stats

('+54.29%', '33.64B')

In [49]:
(stock_history('BYND', 2019, 6).iloc[0]['low'] + stock_history('BYND', 2019, 6).iloc[0]['high']) / 2 * stock_history('BYND', 2019, 6).iloc[0]['volume'] / 1000000000

1.2009893994459

In [50]:
dummy = stock_history('BYND', 2019, 6)
tv1 = dummy.apply(lambda row : (row.low + row.high) / 2 * row.volume / 1000000000, axis=1)#.sum() 
tv1.sum()

33.63798630907855

In [None]:
#dummy2 = stock_history('BYND', 2019, 6)
#dummy2 = dummy2.assign(mid=(dummy2['high'] + dummy2['low'])/2,)
#tv2 = (dummy2['mid']*dummy2['volume']) / 1000000000
#tv2

In [None]:
#print(tv1.sum())
#print(tv2.sum())

In [None]:
len(stats[0]), len(stats[1])

In [None]:
float(stats[0][1:-1]) > 30

In [None]:
float(stats[1][:-1]) > 1

# Comment Threads

**Question 4**

As a hacker, you get your daily dose of tech news on [Hacker News](https://news.ycombinator.com/). The problem is that you don't have internet access on your phone during your morning commute to work, so you want to save the comment threads of interesting stories beforehand in a flat file format such as CSV. You find their API documentation (https://github.com/HackerNews/API) and implement the following task:

1. Write a function `get_comments` that takes `storyid` as a parameter and returns a dataframe of all the comments below the news story. You can ignore 'dead' comments (you will know it when you see it). **Make sure the order of the comments in your dataframe is from top to bottom just as you see on the website**. You are allowed to use loops in this function. Additional requirement: write at least one helper method.

You only want the following information for the comments:
1. `id`: the unique ids
2. `by`: the author of the comment
3. `parent`: who (also in unique ids) they are replying to
4. `text`: the actual comment
5. `time`: when the comment was created (as a pandas datetime)
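
The API reports `time` as a Unix timestamp in seconds, so it has to be converted before it is readable. Here is a minimal standard-library sketch of that conversion; the helper name `to_utc_datetime` and the sample timestamp are made up for illustration. In pandas, `pd.to_datetime(column, unit='s')` does the same for a whole column.

```python
from datetime import datetime, timezone


def to_utc_datetime(unix_seconds):
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


stamp = to_utc_datetime(1540944000)
print(stamp.isoformat())  # 2018-10-31T00:00:00+00:00
```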

Hints:
1. Use depth-first-search when traversing the comments tree.
2. https://docs.python.org/3/tutorial/datastructures.html#using-lists-as-stacks.
3. Check the size of your dataframe against the story's `descendants` attribute (number of comments).
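
The first two hints can be combined into an iterative depth-first traversal that uses a list as a stack. The sketch below runs on a small in-memory thread instead of the live API. The item ids, the `fake_items` dictionary, and the function name `flatten_thread` are invented for the example, and it assumes that replies to a dead comment should still be kept.

```python
def flatten_thread(story_id, fetch_item):
    """Return comment ids in page order (depth-first, top to bottom), skipping dead ones."""
    story = fetch_item(story_id)
    # Push kids in reverse so the first kid is popped (visited) first.
    stack = list(reversed(story.get("kids", [])))
    ordered = []
    while stack:
        item = fetch_item(stack.pop())
        if item is None:
            continue
        if not item.get("dead"):
            ordered.append(item["id"])
        # A comment's replies appear directly below it, before its siblings.
        stack.extend(reversed(item.get("kids", [])))
    return ordered


# Made-up thread: story 1 has comments 2 and 5; 2 has replies 3 and 4; 4 is dead with reply 6.
fake_items = {
    1: {"id": 1, "kids": [2, 5]},
    2: {"id": 2, "kids": [3, 4]},
    3: {"id": 3},
    4: {"id": 4, "dead": True, "kids": [6]},
    5: {"id": 5},
    6: {"id": 6},
}

order = flatten_thread(1, fake_items.get)
print(order)  # [2, 3, 6, 5]
```

Against the real API, `fetch_item` could be a function that requests `https://hacker-news.firebaseio.com/v0/item/{id}.json` and returns the parsed JSON. A recursive helper gives the same order; the explicit stack simply avoids deep recursion on long threads.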

In [None]:
#news_endpoint = "https://hacker-news.firebaseio.com/v0/item/18344932.json"
#load = requests.get(news_endpoint).json()
#load

In [None]:
#load['kids']

In [None]:
#L = {x for x in load}
#df = pd.read_json(load)
#pd.DataFrame(load)

In [None]:
#load = requests.get(news_endpoint)
#pd.read_json(load.content, orient='columns')

In [None]:
#df_dict = {'id':[], 'by':[], 'parent':[], 'text':[], 'time':[]}
#comment_df = pd.DataFrame(df_dict)
#comment_df

In [None]:
#requests.get("https://hacker-news.firebaseio.com/v0/item/18348631.json").json()

In [None]:
#"https://hacker-news.firebaseio.com/v0/item/{}.json".format(18348631)

In [None]:
# def get_kids(visited,loaded,kids):
#     if kids not in visited:
#         visited.add(kid)
#         if 'kid' in visited['kids']
#         for kid in loaded['kids']:
#             get_kids(visited,loaded,kid)

In [None]:
# story_endpoint = "https://hacker-news.firebaseio.com/v0/item/{}.json".format('18344932')
# load = requests.get(story_endpoint).json()
# story_df = pd.DataFrame(load).sort_values(by='time')
# story_df

In [None]:
def kids_dfs(comment_id, cols):
    """Depth-first helper: return a comment and all of its descendants in page order."""
    link = "https://hacker-news.firebaseio.com/v0/item/{}.json".format(comment_id)
    load = requests.get(link).json()

    # Dead comments are left out, but their replies are still visited.
    frames = []
    if 'dead' not in load:
        frames.append(pd.DataFrame([{col: load.get(col) for col in cols}]))
    for kid in load.get('kids', []):
        frames.append(kids_dfs(kid, cols))

    if not frames:
        return pd.DataFrame(columns=cols)
    return pd.concat(frames, ignore_index=True)

In [None]:
def get_comments(storyid):
    """
    Returns a dataframe of all the comments below a news story
    >>> out = get_comments(18344932)
    >>> out.shape
    (18, 5)
    >>> out.loc[5, 'by']
    'RobAtticus'
    >>> out.loc[5, 'time'].day
    31
    """
    story_endpoint = "https://hacker-news.firebaseio.com/v0/item/{}.json".format(storyid)
    load = requests.get(story_endpoint).json()

    cols = ['id', 'by', 'parent', 'text', 'time']
    # Top-level comments are listed under 'kids' in the order shown on the page.
    frames = [kids_dfs(comment_id, cols) for comment_id in load.get('kids', [])]
    if not frames:
        return pd.DataFrame(columns=cols)
    comment_df = pd.concat(frames, ignore_index=True)

    comment_df['time'] = pd.to_datetime(comment_df['time'], unit='s')
    comment_df = comment_df.astype({'id': 'int', 'parent': 'int'})

    return comment_df

In [None]:
out = get_comments(18344932)
out

In [None]:
out.shape

In [None]:
out.loc[5, 'by']

In [None]:
out.loc[5, 'time'].day

## Congratulations! You're done!

* Submit the lab on Gradescope