# Pandas version history

This document is based on the [tutorial by calmcode.io](https://calmcode.io/gazpacho/introduction.html)

Here we're reading html from the page supplied by the url.

In [14]:
url = "https://pypi.org/project/pandas/#history"

from gazpacho import get

html = get(url)

We use function find of object Soupt to get specific tags and/or classes of objects from the html doc.

In [15]:
from gazpacho import Soup

soup = Soup(html)
cards = soup.find("a", {"class": "card"})

Parameter partial allows us to either select by full or partial tag/class.

In [16]:
cards[0].find("p", {"class": "release__version"}, partial=False)

<p class="release__version">
                  1.4.1
                </p>

In [17]:
cards[0].find("p", {"class": "release__version"}, partial=True)

[<p class="release__version">
                   1.4.1
                 </p>,
 <p class="release__version-date">
   <time datetime="2022-02-12T11:21:13+0000" data-controller="localized-time" data-localized-time-relative="true" data-localized-time-show-time="false">
   Feb 12, 2022
 </time>
 </p>]

We use attrs function to get the attributes of the tag, instead of the text within the tag.

In [18]:
# this is a dictionary
cards[0].find("time").attrs

{'datetime': '2022-02-12T11:21:13+0000',
 'data-controller': 'localized-time',
 'data-localized-time-relative': 'true',
 'data-localized-time-show-time': 'false'}

We're specifically interested in the datetime attribute.

In [19]:
# this is the information we're interested in
cards[0].find("time").attrs["datetime"]

'2022-02-12T11:21:13+0000'

Full process to get the list of dictionaries.

In [24]:
from gazpacho import get, Soup

url = "https://pypi.org/project/pandas/#history"

html = get(url)
soup = Soup(html)
cards = soup.find("a", {"class": "card"})


def parse_card(card):
    version = card.find("p", {"class": "release__version"}, partial=False).text
    timestamp = card.find("time").attrs["datetime"]
    return {"version": version, "timestamp": timestamp}


data_list = [parse_card(c) for c in cards]

We can use pandas to process that list of dictionaries into a convenient format.

In [25]:
import pandas as pd

(
    pd.DataFrame(data_list).assign(
        timestamp=lambda d: pd.to_datetime(d["timestamp"])
    )
)

Unnamed: 0,version,timestamp
0,1.4.1,2022-02-12 11:21:13+00:00
1,1.4.0,2022-01-22 14:47:00+00:00
2,1.4.0rc0,2022-01-06 11:01:13+00:00
3,1.3.5,2021-12-12 14:30:49+00:00
4,1.3.4,2021-10-17 16:42:57+00:00
...,...,...
80,0.4.1,2011-09-26 01:22:28+00:00
81,0.4.0,2011-09-12 19:41:11+00:00
82,0.3.0,2011-02-20 01:00:04+00:00
83,0.2,2010-05-18 13:14:26+00:00
