We saw that some articles are available on the Seeking Alpha site.
We can request a summary of news articles about a particular company from this site, as in the following example.

In [27]:
import requests
url="https://seekingalpha.com/api/sa/combined/GOOGL.xml"
req=requests.get(url)
req.status_code

200

In [29]:
req.text

'<?xml version="1.0"?>\n<rss xmlns:sa="https://seekingalpha.com/api/1.0" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">\n  <channel>\n    <title>Alphabet, Inc. Cl A - News and Analysis on Seeking Alpha</title>\n    <link>https://seekingalpha.com</link>\n    <description>&#xA9; seekingalpha.com. Use of this feed is limited to personal, non-commercial use and is governed by Seeking Alpha\'s Terms of Use (https://seekingalpha.com/page/terms-of-use). Publishing this feed for public or commercial use and/or misrepresentation by a third party is prohibited.</description>\n    <item>\n      <title>EU deciding on Google privacy scope</title>\n      <link>https://seekingalpha.com/symbol/GOOGL/news?source=feed_symbol_GOOGL</link>\n      <guid isPermaLink="false">https://seekingalpha.com/MarketCurrent:3421782</guid>\n      <pubDate>Wed, 09 Jan 2019 10:14:48 -0500</pubDate>\n      <sa:author_name>Brandy Betz</sa:author_name>\n      <media:thumbnail url=""/>\n      <sa:picture/>\n      

We see that the file returned is an XML file. XML is another commonly used standard format for storing structured information as plain text.
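
A status code of 200 does not by itself guarantee that the body is well-formed XML. The following is a minimal sketch of one way to check, using only the standard library. The helper name `parse_xml_or_none` and both sample strings are made up for illustration; they are not part of the Seeking Alpha feed.

```python
import xml.etree.ElementTree as ET

def parse_xml_or_none(text):
    """Return the root element if text is well-formed XML, otherwise None."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        # Raised when the text cannot be parsed as XML (e.g. unclosed tags).
        return None

# Hypothetical inputs: one well-formed document and one with unclosed tags.
good = parse_xml_or_none("<rss version='2.0'><channel/></rss>")
bad = parse_xml_or_none("<html><body>Not closed")

print(good is not None)
print(bad is None)
```

In the notebook, the same helper could be applied to `req.text` before any further processing.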

Here is a link to a nice tutorial:

    https://www.tutorialspoint.com/xml/index.htm
    
    
A key takeaway from the tutorial is that XML has a tree structure with nodes that have 
-  tags
-  attributes, and 
-  text.
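
To make the three parts concrete, here is a small sketch that parses a made-up XML string (not taken from the feed) and prints the tag, attributes, and text of a node. The element names `library` and `book` are purely illustrative.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document used only for illustration.
doc = '<library city="Boston"><book year="2001">Example Title</book></library>'

root = ET.fromstring(doc)   # root node of the tree
book = root[0]              # first (and only) child of the root

# Tag and attributes of the root node.
print(root.tag, root.attrib)
# Tag, attributes, and text of the child node.
print(book.tag, book.attrib, book.text)
```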

Python provides a nice package for navigating through the tree. Before demonstrating that capability, we can open the file in a text editor or an XML previewer and look at its structure.

When looking at the file in a text editor, the newline characters help. Let's write the file out and have a look at it.


In [30]:
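# The with statement closes the file even if the write fails.
# UTF-8 is the default encoding for XML without an encoding declaration.
with open("GOOGLE.xml", "w", encoding="utf-8") as fout:
    fout.write(req.text)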
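
Once a file is on disk, it can also be parsed directly from the file rather than from a string. The sketch below assumes nothing about the real feed: it writes a tiny made-up document to a temporary directory and reads it back with `ET.parse`, which returns a tree object whose root is obtained with `getroot()`.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Made-up document standing in for a saved feed.
sample = "<rss version='2.0'><channel><title>Example feed</title></channel></rss>"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.xml")
    with open(path, "w", encoding="utf-8") as fout:
        fout.write(sample)

    # ET.parse reads from a file; ET.fromstring reads from a string.
    tree = ET.parse(path)
    root_from_file = tree.getroot()

print(root_from_file.tag)
```

With the file written above, the equivalent call would be `ET.parse("GOOGLE.xml").getroot()`.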

Now we try a Python package for inspecting the file. The following code gives us the root node of the tree. From the output below we see that the tag is a string, the attributes are returned as a dictionary, and the text is a string.

In [43]:
import xml.etree.ElementTree as ET
root = ET.fromstring(req.text)
print(type(root.tag))
print(type(root.attrib))
print(type(root.text))
print("root tag = " + root.tag)
print("root attrib = " + str(root.attrib))
print("root text" + root.text)

<class 'str'>
<class 'dict'>
<class 'str'>
root tag = rss
root attrib = {'version': '2.0'}
root text
  


Next we look at the children of the root node. How many children does this node have?

In [48]:
print(len(root))

1


We can extract a child of a node using its index.

In [56]:
child=root[0]

How many children does this child have?

In [57]:
len(child)

33

We can iterate over the children. (Nodes are iterable.)

In [85]:
for ch in child:
    print(ch.tag)
    print(len(ch))

title
0
link
0
description
0
item
9
item
11
item
8
item
9
item
8
item
9
item
9
item
9
item
9
item
13
item
28
item
9
item
10
item
12
item
9
item
10
item
10
item
13
item
10
item
9
item
11
item
10
item
11
item
9
item
9
item
9
item
9
item
9
item
13
item
10


We see that the first 3 nodes have no children. We can view the text associated with these nodes.

In [88]:
for i in range(3):
    print("text = " + child[i].text)


text = Alphabet, Inc. Cl A - News and Analysis on Seeking Alpha
text = https://seekingalpha.com
text = © seekingalpha.com. Use of this feed is limited to personal, non-commercial use and is governed by Seeking Alpha's Terms of Use (https://seekingalpha.com/page/terms-of-use). Publishing this feed for public or commercial use and/or misrepresentation by a third party is prohibited.
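
Indexing by position works, but it depends on the order of the children. As an alternative sketch, nodes can be looked up by tag name with `find`, `findtext`, and `findall`. The sample document below is made up and only mimics the overall shape of the feed (a `channel` holding a `title`, a `link`, and several `item` nodes).

```python
import xml.etree.ElementTree as ET

# Made-up sample that mimics the overall shape of the feed.
sample = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>https://example.com</link>
    <item><title>First headline</title></item>
    <item><title>Second headline</title></item>
  </channel>
</rss>"""

channel = ET.fromstring(sample).find("channel")   # first child with this tag
feed_title = channel.findtext("title")            # text of the first <title> child
items = channel.findall("item")                   # list of all <item> children
headlines = [it.findtext("title") for it in items]

print(feed_title)
print(len(items))
print(headlines)
```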


Now we have a look at an item.

In [93]:
item=child[3]
len(item)

9

In [99]:
for elem in item:
    print("tag = " + elem.tag)
    # elem.text is None for empty elements, so only print it when present.
    if elem.text is not None:
        print("text = " + elem.text)

tag = title
text = EU deciding on Google privacy scope
tag = link
text = https://seekingalpha.com/symbol/GOOGL/news?source=feed_symbol_GOOGL
tag = guid
text = https://seekingalpha.com/MarketCurrent:3421782
tag = pubDate
text = Wed, 09 Jan 2019 10:14:48 -0500
tag = {https://seekingalpha.com/api/1.0}author_name
text = Brandy Betz
tag = {http://search.yahoo.com/mrss/}thumbnail
tag = {https://seekingalpha.com/api/1.0}picture
tag = {https://seekingalpha.com/api/1.0}stock
text = 
        
tag = {https://seekingalpha.com/api/1.0}stock
text = 
        

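
The tags in curly braces come from the XML namespaces declared on the `rss` element: ElementTree expands a prefix such as `sa:` into `{namespace URI}`. The following sketch shows one way to work with such tags, by passing a prefix-to-URI dictionary to the lookup methods or by stripping the namespace part of the tag. The sample document and its values are made up; only the `sa` namespace URI is copied from the feed shown earlier.

```python
import xml.etree.ElementTree as ET

# Made-up sample; only the sa namespace URI is taken from the feed above.
sample = """<rss xmlns:sa="https://seekingalpha.com/api/1.0" version="2.0">
  <channel>
    <item>
      <title>Example headline</title>
      <sa:author_name>Example Author</sa:author_name>
    </item>
  </channel>
</rss>"""

# Map the prefix used in our queries to the namespace URI.
ns = {"sa": "https://seekingalpha.com/api/1.0"}

item = ET.fromstring(sample).find("channel/item")

# Look up a namespaced child using the prefix mapping.
author = item.findtext("sa:author_name", namespaces=ns)

# Drop the "{uri}" part to get the local tag names.
local_names = [elem.tag.split("}", 1)[-1] for elem in item]

print(author)
print(local_names)
```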