<h1>Examples</h1>

<strong>Example #1: Where’s The Beef?!</strong>

Acquiring data from the internet is often not as straightforward as dropping a URL into a script. In some cases, to get the data you want, you have to pass in URL parameters. Below is a good example of this.

We are going to download some beef data from the USDA website. We are going to pull down an XML file, but the file is not sitting on the server as a static file. It is built on request by a report-generating application on the USDA server.

Normally, users get their data through the web application’s GUI. But we need to automate things, so the strategy is to use the GUI once and take note of the query string that gets built in the URL. Then you make Python variables out of any URL parameters. Now you can bypass the GUI and automate the process of downloading the data.
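To see which parameters the GUI built, you can take the copied URL apart in Python. The sketch below assumes you copied a URL like the one used in this example from your browser's address bar. The variable names are only for illustration.

```python
import json
from urllib.parse import urlsplit, parse_qs

# URL copied after running the report once in the GUI (same URL as in this example)
copied_url = ('https://mpr.datamart.ams.usda.gov/ws/report/v1/beef/LM_XB459?'
              'filter={%22filters%22:[{%22fieldName%22:%22Report%20date%22,'
              '%22operatorType%22:%22GREATER%22,%22values%22:[%2209/01/2020%22]}]}')

# Split the URL into its parts and decode the query string
parts = urlsplit(copied_url)
params = parse_qs(parts.query)

# The single "filter" parameter holds percent-encoded JSON
report_filter = json.loads(params['filter'][0])

for item in report_filter['filters']:
    print(item['fieldName'], item['operatorType'], item['values'])
```

Anything that shows up in the decoded filter, such as the report date, is a candidate for a Python variable.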

In [None]:
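import os
import urllib.request

# The filter below asks for reports with a report date greater than this value
report_date = '09/01/2020'

# Fall back to the current working directory if script_dir is not already defined
if 'script_dir' not in globals():
    script_dir = os.getcwd()
data_directory = 'data'
example_directory = 'HTTPAndFTPExample'
file_name = 'BeefReport.xml'

target_directory = os.path.join(script_dir, data_directory, example_directory)
target_path = os.path.join(target_directory, file_name)

# Create the target folder if it does not exist yet
os.makedirs(target_directory, exist_ok=True)

# The filter parameter is the percent-encoded JSON taken from the GUI's query string
url = 'https://mpr.datamart.ams.usda.gov/ws/report/v1/beef/LM_XB459?'
url = url + 'filter={%22filters%22:[{%22fieldName%22:%22Report%20date%22,'
url = url + '%22operatorType%22:%22GREATER%22,%22values%22:[%22' + report_date + '%22]}]}'

# Download the report and write the raw bytes to disk
with urllib.request.urlopen(url) as source_file:
    with open(target_path, 'wb') as target_file:
        target_file.write(source_file.read())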
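If you would rather not hand-assemble the percent-encoded string, one alternative is to describe the filter as a Python dictionary and let the standard library encode it. This is a sketch, not the method used in this example: build_report_url is a hypothetical helper, and it assumes the server accepts a fully percent-encoded filter value (braces, brackets, and colons encoded too), which decodes to the same JSON.

```python
import json
from urllib.parse import quote

BASE_URL = 'https://mpr.datamart.ams.usda.gov/ws/report/v1/beef/LM_XB459'

def build_report_url(report_date, base_url=BASE_URL):
    """Build the report URL for a given date (hypothetical helper)."""
    # Same filter structure as the query string captured from the GUI
    report_filter = {
        'filters': [
            {
                'fieldName': 'Report date',
                'operatorType': 'GREATER',
                'values': [report_date],
            }
        ]
    }
    # Compact JSON, then percent-encode it (quote leaves '/' unencoded by default)
    encoded_filter = quote(json.dumps(report_filter, separators=(',', ':')))
    return base_url + '?filter=' + encoded_filter

url = build_report_url('09/01/2020')
print(url)
```

With the date as a function argument, pulling a report for a different date is a one-line change.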

<strong>Example #2: Download Files Like You’re CIA</strong>

We are going to use a test SFTP server run by Rebex. The connection settings for the server can be found at <a href="https://test.rebex.net/" target="_blank">https://test.rebex.net</a>.

If you go to that site and look up the SFTP protocol, you will find the following settings.

host: test.rebex.net

username: demo

password: password

port: 22

Plug those settings into your SFTP tool and log into the server. Navigate around and see if you can find an interesting file to download. For this example, we are going to keep it simple and just grab the readme.txt file from the root folder.

This is a simplified example for clarity. Below you will see the line where we set cnopts.hostkeys = None. This turns off host key checking, which leaves you open to something called a "man-in-the-middle" attack. In a real scenario, you would verify the server's identity with something called a host key. We will tackle that in the solutions section.
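To give a feel for what a host key is, here is a small standard-library sketch that computes the SHA256 fingerprint of a key stored in known_hosts format. It is only an illustration of the idea, not how pysftp checks keys: fingerprint_sha256 is a hypothetical helper, the entry below uses a made-up host and made-up key bytes, and hashed host names or marker prefixes are not handled.

```python
import base64
import hashlib

def fingerprint_sha256(known_hosts_line):
    """Return an OpenSSH-style SHA256 fingerprint for one known_hosts entry."""
    # A plain known_hosts entry looks like: "<host> <key type> <base64 key> [comment]"
    host, key_type, key_b64 = known_hosts_line.split()[:3]
    # Hash the decoded key bytes, then base64-encode the digest without padding
    digest = hashlib.sha256(base64.b64decode(key_b64)).digest()
    return 'SHA256:' + base64.b64encode(digest).decode('ascii').rstrip('=')

# Placeholder entry: the host and key material are made up for illustration only
fake_key = base64.b64encode(b'not a real key').decode('ascii')
example_line = 'sftp.example.com ssh-ed25519 ' + fake_key

print(fingerprint_sha256(example_line))
```

The point of a host key is that the fingerprint you compute for the server you reach should match one you obtained from a trusted source. If it does not, something may be sitting between you and the real server.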

In [None]:
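import os
import pysftp

# Fall back to the current working directory if script_dir is not already defined
if 'script_dir' not in globals():
    script_dir = os.getcwd()
data_directory = 'data'
example_directory = 'HTTPAndFTPExample'
file_name = 'readme.txt'
known_host = 'known_host'

# Connection settings published for the test server
host = 'test.rebex.net'
username = 'demo'
password = 'password'
port = 22

target_directory = os.path.join(script_dir, data_directory, example_directory)
target_path = os.path.join(target_directory, file_name)
# Not used in this simplified example, which skips host key checking
known_host_path = os.path.join(target_directory, known_host)

# Create the target folder if it does not exist yet
os.makedirs(target_directory, exist_ok=True)

# WARNING: disables host key checking. Fine for this demo, unsafe for real work.
cnopts = pysftp.CnOpts()
cnopts.hostkeys = None

# Connect, then copy readme.txt from the server's root folder to target_path
with pysftp.Connection(host=host, username=username, password=password, port=port, cnopts=cnopts) as sftp:
    sftp.get(file_name, target_path)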

Copyright © 2020, Mass Street Analytics, LLC. All Rights Reserved.