**Working with JSON data**

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. 

JSON is built on two structures:
    
1. A collection of name/value pairs - you are already familiar with this concept through working with Python dictionaries
2. An ordered list of values - you are already familiar with this concept through working with Python lists
    
More information on JSON is available at https://www.json.org/
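To illustrate the two structures, the sketch below parses a small, made-up JSON string with `json.loads` from Python's standard `json` module. The field names are hypothetical, chosen only to resemble the tweet data used later. It shows that a JSON object becomes a Python dictionary and a JSON array becomes a Python list, and that `json.dumps` converts them back into a JSON string.

```python
import json

# A made-up JSON document: an array (ordered list) of objects (name/value pairs)
json_text = '[{"screen_name": "example_user", "retweets": 3}, {"screen_name": "another_user", "retweets": 0}]'

# Parse the JSON string into Python objects
records = json.loads(json_text)

print(type(records))               # <class 'list'>  (JSON array  -> Python list)
print(type(records[0]))            # <class 'dict'>  (JSON object -> Python dict)
print(records[0]["screen_name"])   # example_user
print(records[1]["retweets"] + 1)  # 1  (JSON numbers become Python numbers)

# Going the other way: json.dumps turns a Python dict or list back into a JSON string
print(json.dumps(records[0]))      # {"screen_name": "example_user", "retweets": 3}
```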

The data for this tutorial came from Tweets of Congress daily archives: https://freegovinfo.info/node/tag/twitter-data and https://alexlitel.github.io/congresstweets/

In [28]:
import csv

# The os module provides a portable way of using operating system dependent functionality. 
# For example, Windows and Mac operating systems use different path notations for files. 
# On a Mac, you may see a path to a file or a folder listed as /var/www/html/folder
# On a Windows computer, the same path would be expressed as C:\var\www\html\folder
# The os module hides these differences from the programmer, as long as paths are built with its
# functions (such as os.path.join), which makes programs written in Python more operating system-independent
import os


# The json library can parse JSON from strings or files. 
# The library parses JSON into a Python dictionary or list. 
# It can also convert Python dictionaries or lists into JSON strings.
# http://docs.python-guide.org/en/latest/scenarios/json/
import json
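Besides strings, the `json` module reads and writes files with `json.load` and `json.dump`, the file-based counterparts of `json.loads` and `json.dumps`. The sketch below uses made-up records and a throwaway temporary folder (not the course data) to write a list of dictionaries to disk and read it back. Each file is opened in a `with` block, so it is closed automatically even if an error occurs.

```python
import json
import os
import tempfile

# Made-up records shaped loosely like the tweet data (hypothetical values)
tweets = [
    {"id": "1", "screen_name": "example_user", "text": "Hello"},
    {"id": "2", "screen_name": "another_user", "text": "Hi there 🏒"},
]

# Work inside a temporary folder so nothing is left behind on disk
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "sample.json")

    # json.dump writes the list as JSON text; ensure_ascii=False keeps the emoji readable
    with open(file_path, "w", encoding="utf-8") as out_file:
        json.dump(tweets, out_file, ensure_ascii=False, indent=2)

    # json.load parses the file back into a list of dictionaries
    with open(file_path, "r", encoding="utf-8") as in_file:
        loaded = json.load(in_file)

print(loaded == tweets)          # True
print(loaded[1]["screen_name"])  # another_user
```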

In [29]:
# As a first step, we need to get the current working directory.
# To access files that reside somewhere on a hard drive, we need a starting point. The current
# working directory is the folder that Python was started from; in a Jupyter notebook it is usually
# the folder that contains the notebook (this file). From that directory, we can build a relative path
# to the subfolder that we need and to the files located in that subfolder.
# For more information on relative vs. absolute paths review the following resources:
# https://en.wikipedia.org/wiki/Path_(computing)
# http://resources.esri.com/help/9.3/ArcGISengine/java/Gp_ToolRef/sharing_tools_and_toolboxes/pathnames_explained_colon_absolute_relative_unc_and_url.htm


# Get current working directory
cwd = os.getcwd()
print(cwd)

# Name of the subfolder (relative to the working directory) that contains our data files
data_subfolder = "congress_tweets"

# Combine the working directory name with the subfolder name. os.path.join inserts the correct
# separator for the current operating system: a backslash (\) on Windows and a forward slash (/)
# on a Mac or Linux computer. Hard-coding "\\" (an escaped backslash, because the backslash is
# a special character in Python strings) would only work on Windows.
folder_path = os.path.join(cwd, data_subfolder)
print(folder_path)


C:\Users\dmb72\Box Sync\TEACHING\2017-2018\Python for Data Management and Analytics\Python Materials\06 - Reading and processing data from the web
C:\Users\dmb72\Box Sync\TEACHING\2017-2018\Python for Data Management and Analytics\Python Materials\06 - Reading and processing data from the web\congress_tweets
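To see how `os.path.join` and `os.walk` behave without depending on the course's folder layout, the following sketch builds a small throwaway folder tree and collects the paths of the `.json` files in it. The function name `find_json_files` and the folder and file names are hypothetical, made up for this example. Because every path is built with `os.path.join`, the same code should run unchanged on Windows, Mac, and Linux.

```python
import os
import tempfile

def find_json_files(folder_path):
    """Hypothetical helper: return the sorted full paths of all .json files under folder_path."""
    json_paths = []
    # os.walk yields (current folder, names of its subfolders, names of its files)
    for root_folder, subfolders, files in os.walk(folder_path):
        for file_name in files:
            if file_name.endswith(".json"):
                # os.path.join uses "\" on Windows and "/" on Mac or Linux
                json_paths.append(os.path.join(root_folder, file_name))
    return sorted(json_paths)

# Build a small, hypothetical folder tree to try the function on
with tempfile.TemporaryDirectory() as tmp_dir:
    data_folder = os.path.join(tmp_dir, "congress_tweets")
    os.makedirs(os.path.join(data_folder, "older"))
    for relative_name in ["2018-01-24.json", "notes.txt", os.path.join("older", "2018-01-23.json")]:
        with open(os.path.join(data_folder, relative_name), "w", encoding="utf-8") as f:
            f.write("[]")  # an empty JSON array
    found = find_json_files(data_folder)
    # Show paths relative to the data folder so the output does not depend on tmp_dir
    relative_found = [os.path.relpath(p, data_folder) for p in found]

print(relative_found)
```

Note that `notes.txt` is skipped and that `os.walk` also descends into the `older` subfolder.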


In [30]:
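# Now we need to iterate through the list of files in our data subfolder.
# os.walk visits folder_path and every folder below it, yielding the folder's path,
# the names of its subfolders, and the names of its files.
for root_folder, subfolders, files in os.walk(folder_path):
    for file_name in files:
        # Skip anything that is not a JSON file (e.g., hidden system files)
        if not file_name.endswith(".json"):
            continue
        file_path = os.path.join(root_folder, file_name)
        # "with" closes the file automatically when the block ends
        with open(file_path, 'r', encoding="utf-8") as file:
            # Each file holds a JSON array, which is loaded as a list of dictionaries
            tweet_data = json.load(file)
        print(file_path)
        # tweet_data[1:10] prints the 2nd through 10th tweets (list indices start at 0);
        # use tweet_data[:10] to start from the first tweet
        for tweet in tweet_data[1:10]:
            print("______________________________")
            print("ID: " + tweet["id"])
            print("Screen Name: " + tweet["screen_name"])
            print("User ID: " + tweet["user_id"])
            print("Time: " + tweet["time"])
            print("Text: " + tweet["text"])
            print("Source: " + tweet["source"])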

C:\Users\dmb72\Box Sync\TEACHING\2017-2018\Python for Data Management and Analytics\Python Materials\06 - Reading and processing data from the web\congress_tweets\2018-01-24.json
______________________________
ID: 956037956392890368
Screen Name: DeanHeller
User ID: 41363507
Time: 2018-01-24T00:36:25-05:00
Text: RT @ChloeNews3LV GO KNIGHTS GO!!! 🏒✨ @GoldenKnights http://pbs.twimg.com/media/DUSCE1rW0AAzhlX.jpg
Source: Twitter for iPhone
______________________________
ID: 956036850765922304
Screen Name: MurrayCampaign
User ID: 158470209
Time: 2018-01-24T00:32:02-05:00
Text: Endlessly proud of the record-breaking, glass-ceiling smashing number of women who are making their voices heard and running for public office in 2018.
https://twitter.com/emilyslist/status/954829135511142400 QT @emilyslist A record number of women are running for office in 2018, and the movement won't end there. We're upending the status quo. https://www.thecut.com/2018/01/women-candidates-2018-elections.html?utm_camp