# Scraping patent data from USPTO

We will scrape patent data from the [USPTO](http://patft.uspto.gov/netahtml/PTO/index.html) website.

As an example, we use patent US10618288B2.

In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [4]:
# Full-text record page for patent number 10618288
url = 'http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&p=1&u=/netahtml/PTO/srchnum.html&r=1&f=G&l=50&d=PALL&s1=10618288.PN.'
r = requests.get(url)
html_doc = r.text
# Create a BeautifulSoup object from the HTML
# (no parser is named here, so bs4 picks the best one installed)
soup = BeautifulSoup(html_doc)
pretty_soup = soup.prettify()  # Re-indent the parse tree for readable printing
print(pretty_soup)

<html>
 <head>
  <base target="_top"/>
  <title>
   United States Patent: 10618288
  </title>
 </head>
 <body bgcolor="#FFFFFF">
  <a name="top">
  </a>
  <center>
   <img alt="[US Patent &amp; Trademark Office, Patent Full Text and Image Database]" src="/netaicon/PTO/patfthdr.gif"/>
   <br/>
   <table>
    <tr>
     <td align="center">
      <a href="/netahtml/PTO/index.html">
       <img alt="[Home]" border="0" src="/netaicon/PTO/home.gif" valign="middle"/>
      </a>
      <a href="/netahtml/PTO/search-bool.html">
       <img alt="[Boolean Search]" border="0" src="/netaicon/PTO/boolean.gif" valign="middle"/>
      </a>
      <a href="/netahtml/PTO/search-adv.htm">
       <img alt="[Manual Search]" border="0" src="/netaicon/PTO/manual.gif" valign="middle"/>
      </a>
      <a href="/netahtml/PTO/srchnum.htm">
       <img alt="[Number Search]" border="0" src="/netaicon/PTO/number.gif" valign="middle"/>
      </a>
      <a href="/netahtml/PTO/help/help.htm">
       <img alt="[Help]" b

In [7]:
soup.head.title.text

'United States Patent: 10618288'

In [13]:
tables = soup.body.find_all('table')

b_tables = tables[2].find_all('b')

for b in b_tables:
    print(b.text)

United States Patent 
10,618,288


         Ender
,   et al.

     April 14, 2020



In [46]:
patent = b_tables[1].text
pub_date = b_tables[4].text
print("Patent Number: " + str(patent))
print("Publication Date: " + str(pub_date))

Patent Number: 10,618,288
Publication Date: 
     April 14, 2020
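The publication date above still carries the newlines and indentation from the HTML, and the patent number contains thousands separators. A minimal sketch of how these strings could be normalised is shown here. The helper names `clean_patent_number` and `parse_us_date` are our own, not part of the notebook, and the sample strings are copied from the output above. Note that `%B` matches English month names only under an English (or default C) locale.

```python
from datetime import datetime

# Sample values copied from the cell output above, stray whitespace included.
raw_patent = "10,618,288"
raw_pub_date = "\n     April 14, 2020\n"


def clean_patent_number(text):
    """Strip surrounding whitespace and remove the thousands separators."""
    return text.strip().replace(",", "")


def parse_us_date(text):
    """Parse a date written like 'April 14, 2020' into a datetime.date."""
    return datetime.strptime(text.strip(), "%B %d, %Y").date()


print("Patent Number:", clean_patent_number(raw_patent))
print("Publication Date:", parse_us_date(raw_pub_date).isoformat())
```

In the notebook itself the same helpers could be applied to `b_tables[1].text` and `b_tables[4].text`.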



In [56]:
title = soup.body.find_all('font')
title = title[3].text
print("Title: " + str(title))

Title: Shroud for a printhead assembly



In [57]:
abstract = soup.p.text
print("Abstract: \n" + str(abstract))

Abstract: 
 In one example, a shroud to protect a group of printheads in a printhead
     assembly includes a body having a first notched end where an upstream
     part of the body extends past a downstream part of the body and a second
     notched end opposite the first end where a downstream part of the body
     extends past an upstream part of the body. A first group of openings in
     the body is aligned across the upstream part of the body and a second
     group of openings in the body is aligned across the downstream part of
     the body. Each opening is to surround an exposed part of a printhead when
     the shroud is installed on the printhead assembly.



In [67]:
tr = tables[3].find_all('tr')
inventors = tr[0].find_all('b')
print("Inventors: ")
for inventor in inventors:
    print(inventor.text)

Inventors: 
Ender; Ronald J.
, Dowell; Daniel D.
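The inventor strings come back as `Last; First` and every entry after the first starts with a leftover comma. The following sketch, assuming that format holds for other patents too, turns them into readable names. `parse_inventor` is a hypothetical helper, and the sample list simply repeats the two strings printed above.

```python
def parse_inventor(raw):
    """Split 'Last; First M.' (optionally prefixed by ', ') into (first, last)."""
    cleaned = raw.strip().lstrip(",").strip()
    last, _, first = cleaned.partition(";")
    return first.strip(), last.strip()


# The two strings printed by the cell above.
raw_inventors = ["Ender; Ronald J.", ", Dowell; Daniel D."]

names = []
for raw in raw_inventors:
    first, last = parse_inventor(raw)
    # Skip empty parts so a name without ';' does not gain a leading space.
    names.append(" ".join(part for part in (first, last) if part))

print(names)
```

In the notebook, `raw_inventors` would be `[b.text for b in inventors]`.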


In [68]:
assignee = tr[1].b.text
print("Assignee: " + str(assignee))

Assignee: Hewlett-Packard Development Company, L.P.


In [73]:
file = tr[7].b.text
print("Filing Date: " + str(file))

Filing Date: April 24, 2018
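With the bibliographic fields extracted, it can be convenient to gather them into one record per patent. The sketch below uses only the standard library and writes the record as CSV text. The field names and the `records_to_csv` helper are our own choices, and the values are copied from the outputs above. The same list of dicts could also be passed to `pd.DataFrame`.

```python
import csv
import io

# One record per patent; values copied from the outputs above.
record = {
    "patent_number": "10,618,288",
    "publication_date": "April 14, 2020",
    "title": "Shroud for a printhead assembly",
    "inventors": "Ender; Ronald J. | Dowell; Daniel D.",
    "assignee": "Hewlett-Packard Development Company, L.P.",
    "filing_date": "April 24, 2018",
}


def records_to_csv(records):
    """Serialise a non-empty list of dicts with identical keys to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = records_to_csv([record])
print(csv_text)
```

Fields that contain commas, such as the assignee, are quoted automatically by the `csv` module.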


In [130]:
coma = soup.coma.text
coma

'Dierker & Kavanaugh PC\n\n\nParent Case Text\n\nCROSS REFERENCE TO RELATED APPLICATIONS\n This is a continuation of U.S. patent application Ser. No. 15/311,902\n     filed Nov. 17, 2016 which is itself a Section 371 national entry of\n     international patent application no. PCT/US2014/040330 filed May 30,\n     2014, each of which is incorporated herein by reference in its entirety.\n         \nClaims  What is claimed is:  1.  A shroud to protect a group of printheads in a printhead assembly, the shroud comprising: a stretched S shaped body characterized by elongated, parallel first and second\nsides staggered relative to one another such that one end of each side protrudes past one end of other side;  multiple openings in the body through which the printheads are exposed when the shroud is installed on the printhead assembly, the openings\narranged into a first group aligned across the first side of the body and a second group aligned across the second side of the body parallel to 

In [144]:
p_coma = soup.coma.prettify()

In [151]:
claim = p_coma.split("<br/>")

In [152]:
claim

['<coma>\n Dierker &amp; Kavanaugh PC\n ',
 '\n <hr/>\n <center>\n  <b>\n   <i>\n    Parent Case Text\n   </i>\n  </b>\n </center>\n <hr/>\n ',
 '\n ',
 '\n CROSS REFERENCE TO RELATED APPLICATIONS\n ',
 '\n ',
 '\n This is a continuation of U.S. patent application Ser. No. 15/311,902\n     filed Nov. 17, 2016 which is itself a Section 371 national entry of\n     international patent application no. PCT/US2014/040330 filed May 30,\n     2014, each of which is incorporated herein by reference in its entirety.\n <hr/>\n <center>\n  <b>\n   <i>\n    Claims\n   </i>\n  </b>\n </center>\n <hr/>\n ',
 '\n ',
 '\n What is claimed is:\n ',
 '\n ',
 '\n 1.  A shroud to protect a group of printheads in a printhead assembly, the shroud comprising: a stretched S shaped body characterized by elongated, parallel first and second\nsides staggered relative to one another such that one end of each side protrudes past one end of other side;  multiple openings in the body through which the printheads are 

In [179]:
claims = claim[7:34]
for i in claims:
    print(i)


 What is claimed is:
 

 

 1.  A shroud to protect a group of printheads in a printhead assembly, the shroud comprising: a stretched S shaped body characterized by elongated, parallel first and second
sides staggered relative to one another such that one end of each side protrudes past one end of other side;  multiple openings in the body through which the printheads are exposed when the shroud is installed on the printhead assembly, the openings
arranged into a first group aligned across the first side of the body and a second group aligned across the second side of the body parallel to the openings in the first group;  and a continuous elongated first ridge across an exterior surface of the
body next to the openings in the first group, the first ridge upstream from and completely spanning the openings in the first group.
 

 

 2.  The shroud of claim 1, where the openings are arranged in a staggered configuration in which: an end of each opening in the first group overlaps an end 
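The slice `claim[7:34]` only fits this particular patent, because the number of `<br/>` fragments differs from document to document. As a hedged alternative, the claims text can be split wherever a line starts with a claim number followed by a period. The sketch below assumes that numbering pattern, as seen in the output above. `split_claims` is a hypothetical helper, and the sample text is an abbreviated, illustrative stand-in for the real claims.

```python
import re

# Abbreviated, illustrative sample in the same layout as the output above.
sample = (
    "What is claimed is:\n"
    " \n"
    " 1.  A shroud to protect a group of printheads in a printhead assembly,\n"
    "the shroud comprising: (remainder omitted)\n"
    " \n"
    " 2.  The shroud of claim 1, where the openings are arranged in a\n"
    "staggered configuration (remainder omitted)\n"
)

# A claim starts at the beginning of a line: optional blanks, a number, a period.
CLAIM_START = re.compile(r"^[ \t]*(\d+)\.\s+", re.MULTILINE)


def split_claims(text):
    """Return {claim_number: claim_text} with whitespace collapsed."""
    matches = list(CLAIM_START.finditer(text))
    claims = {}
    for i, match in enumerate(matches):
        # Each claim runs until the next claim start, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end]
        claims[int(match.group(1))] = " ".join(body.split())
    return claims


for number, text in split_claims(sample).items():
    print(number, text)
```

A wrapped line inside a claim that happens to begin with a number and a period would be mistaken for a new claim, so the result is worth a spot check.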

In [176]:
desc = claim[35:96]
for i in desc:
    print(i)


 BACKGROUND
 

 

 In some inkjet printers, a stationary media wide printhead assembly, commonly called a print bar, is used to print on paper or other print media moved past the print bar.
 

 

 DRAWINGS
 

 

 FIG. 1 is a block diagram illustrating an inkjet printer in which examples of a new printhead assembly shroud may be implemented.
 

 

 FIG. 2 illustrates a modular print bar implementing one example of a protective shroud such as might be used in the printer of FIG. 1.
 

 

 FIG. 3 is a perspective view of one of the printhead assembly modules in the print bar shown in FIG. 2.
 

 

 FIG. 4 is an exploded view of the printhead assembly module shown in FIG. 3.
 

 

 FIG. 5 is a close up view showing the topography of the shroud in the module of FIGS. 3 and 4 in more detail.
 

 

 FIG. 6 is a section along the line 6-6 in FIG. 5.
 

 

 FIG. 7 is a detail view from FIG. 6.
 

 

 FIG. 8 is a section view along the line 8-8 in FIG. 5.
 

 

 FIG. 9 is a side view illustrati

In [178]:
a_tags = soup.find_all('a')

for link in a_tags:
    print("http://patft.uspto.gov/" + str(link.get('href')))

http://patft.uspto.gov/None
http://patft.uspto.gov//netahtml/PTO/index.html
http://patft.uspto.gov//netahtml/PTO/search-bool.html
http://patft.uspto.gov//netahtml/PTO/search-adv.htm
http://patft.uspto.gov//netahtml/PTO/srchnum.htm
http://patft.uspto.gov//netahtml/PTO/help/help.htm
http://patft.uspto.gov/#bottom
http://patft.uspto.gov/https://certifiedcopycenter.uspto.gov/other/patft/view.html?backUrl1=http%3A//patft.uspto.gov/netacgi/nph-Parser?Sect1%3DPTO1%26Sect2%3DHITOFF%26p%3D1%26u%3D%2Fnetahtml%2FPTO%2Fsrchnum.html%26r%3D1%26f%3DG%26l%3D50%26d%3DPALL%26s1%3D10618288.PN.%26OS%3D&backLabel1=Back%20to%20Document%3A%2010618288
http://patft.uspto.gov/https://certifiedcopycenter.uspto.gov/other/patft/order.html?docNumber=10618288&backUrl1=http%3A//patft.uspto.gov/netacgi/nph-Parser?Sect1%3DPTO1%26Sect2%3DHITOFF%26p%3D1%26u%3D%2Fnetahtml%2FPTO%2Fsrchnum.html%26r%3D1%26f%3DG%26l%3D50%26d%3DPALL%26s1%3D10618288.PN.%26OS%3D&backLabel1=Back%20to%20Document%3A%2010618288
http://patft.uspto.go
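The output above shows three problems with plain string concatenation: anchors without an `href` become `.../None`, site-relative paths get a double slash, and links that are already absolute get the host prepended a second time. A minimal sketch using the standard library's `urljoin` avoids all three. The `absolute_links` helper and the shortened sample `href` values are our own, chosen to mirror the cases in the output.

```python
from urllib.parse import urljoin

# Address of the page the links were found on (query string omitted here).
BASE_URL = "http://patft.uspto.gov/netacgi/nph-Parser"

# Shortened examples of the href values seen above.
hrefs = [
    None,                                   # <a name="top"> has no href
    "/netahtml/PTO/index.html",             # site-relative path
    "#bottom",                              # fragment on the same page
    "https://certifiedcopycenter.uspto.gov/other/patft/view.html",  # absolute
]


def absolute_links(base, hrefs):
    """Resolve each href against base, skipping anchors without an href."""
    return [urljoin(base, href) for href in hrefs if href]


for link in absolute_links(BASE_URL, hrefs):
    print(link)
```

In the notebook, `hrefs` would be `[a.get('href') for a in a_tags]` and `BASE_URL` would be the `url` that was requested.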