** Working with JSON data **

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is a text format that is completely language independent, but it uses conventions familiar to programmers of the C family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others.
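
As a small illustration of the format, the sketch below parses a JSON string into Python objects with the standard-library `json` module and then serializes it back to text. The sample string and the name `raw` are made up for illustration and are not part of the course data.

```python
import json

# A made-up JSON document (illustrative only, not taken from the data files)
raw = '{"screen_name": "bob", "text": "Hello", "retweets": 3, "tags": ["a", "b"]}'

# json.loads turns JSON text into Python objects: JSON objects become dicts,
# arrays become lists, strings become str, and numbers become int or float
record = json.loads(raw)
print(type(record).__name__)
print(record["screen_name"], record["retweets"], record["tags"])

# json.dumps goes the other way, from Python objects back to JSON text
round_trip = json.dumps(record)
print(json.loads(round_trip) == record)
```

Because JSON objects map onto dictionaries, a parsed tweet can be read with ordinary key lookups such as `record["screen_name"]`.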

** Challenge 1 **

Write a function that handles the JSON parsing for a single tweet and converts it into the values of one comma-separated values (CSV) line.

1. This function should have the following signature:

        def parse_json_tweet(tweet)

2. The tweet argument will be a single tweet extracted from the JSON data file.
3. The function will return a Python list with all the values contained in the tweet. For example, if the tweet looks like

        {
            "id": "956039366593470464",
            "screen_name": "bob",
            "user_id": "377609596",
            "time": "2018-01-24T00:42:02-05:00",
            "text": "Hello",
            "source": "Twitter for iPhone"
        }

    the output should look like

        ["956039366593470464", "bob", "377609596", "2018-01-24T00:42:02-05:00", "Hello", "Twitter for iPhone"]
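
To see how such a list becomes one CSV line, here is a minimal sketch that writes the example list to an in-memory buffer with `csv.writer`. The buffer and the names `buffer`, `csv_line`, and `quoted_line` are assumptions used only for this illustration; a real script would write to a file instead.

```python
import csv
import io

# The example list from the challenge description
tweet_as_list = ["956039366593470464", "bob", "377609596",
                 "2018-01-24T00:42:02-05:00", "Hello", "Twitter for iPhone"]

# Write one row to an in-memory text buffer instead of a file
buffer = io.StringIO()
csv.writer(buffer).writerow(tweet_as_list)
csv_line = buffer.getvalue()
print(csv_line)

# A value that itself contains a comma is quoted automatically,
# which is the main reason to prefer csv.writer over ",".join(...)
buffer2 = io.StringIO()
csv.writer(buffer2).writerow(["bob", "Hello, world"])
quoted_line = buffer2.getvalue()
print(quoted_line)
```

Tweet text often contains commas, quotes, and line breaks, so letting the `csv` module handle the quoting is safer than joining the values by hand.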
        

** Challenge 2 **

Convert ALL provided JSON files into tabular format and save the result as a CSV file.

In [10]:
import csv
import os
import json

In [11]:
# Get current working directory
cwd = os.getcwd()
print(cwd)

# Determine the path (location on the hard drive) of the subfolder that contains our data files
data_subfolder = "congress_tweets"

# Combine the working directory name with the subfolder name.
folder_path = cwd + "/" + data_subfolder 



/Users/dmitriyb/Box Sync/TEACHING/2017-2018/Python for Data Management and Analytics/Python Materials/06 - Reading and processing data from the web
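
Joining path pieces with `"/"` works on macOS and Linux. A slightly more portable way to build the same path, sketched here as an alternative rather than as the notebook's method, is `os.path.join`, which inserts the separator that fits the operating system.

```python
import os

# Same inputs as in the cell above
cwd = os.getcwd()
data_subfolder = "congress_tweets"

# os.path.join picks the right separator for the current operating system
folder_path = os.path.join(cwd, data_subfolder)
print(folder_path)

# Checking that the folder exists before walking it gives an early, clear
# signal if the notebook was started from a different working directory
print(os.path.isdir(folder_path))
```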


In [12]:
# Define function for converting tweet dictionaries to lists
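def parse_json_tweet(tweet):
    # "tal" is an abbreviation that stands for "tweet as list"
    tal = []
    # Convert the IDs to strings before appending "_" (the trailing underscore
    # presumably keeps spreadsheet software from reading long IDs as numbers)
    tal.append(str(tweet["id"]) + "_")
    tal.append(tweet["screen_name"])
    tal.append(str(tweet["user_id"]) + "_")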
    tal.append(tweet["time"])
    tal.append(tweet["text"])
    tal.append(tweet["source"])
    
    return tal
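
A more compact variant is sketched below. It assumes every tweet uses the same six keys as the Challenge 1 example and returns the values unchanged, so its output matches the list shown there. The names `FIELDS` and `parse_json_tweet_compact` are hypothetical and exist only for this illustration.

```python
import json

# Keys to extract, in output order (assumed from the Challenge 1 example)
FIELDS = ["id", "screen_name", "user_id", "time", "text", "source"]

def parse_json_tweet_compact(tweet, fields=FIELDS):
    """Return the tweet's values as a list of strings, one per field.

    A key that is missing from the tweet yields an empty string
    instead of raising a KeyError.
    """
    return [str(tweet.get(field, "")) for field in fields]

# The example tweet from Challenge 1, parsed from JSON text
sample = json.loads("""
{
    "id": "956039366593470464",
    "screen_name": "bob",
    "user_id": "377609596",
    "time": "2018-01-24T00:42:02-05:00",
    "text": "Hello",
    "source": "Twitter for iPhone"
}
""")

result = parse_json_tweet_compact(sample)
print(result)
```

Keeping the field names in one list makes it easy to reorder the columns or to reuse the same list as a header row.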
    
    


In [14]:
# Name of the output file
out_filename = "tweet_output.csv"

# We'll save the file in the working directory
# newline="" lets the csv module control line endings (avoids blank rows on Windows)
out_file = open(cwd + "/" + out_filename, "w", encoding="utf-8", newline="")

# Create a csv writer object
writer = csv.writer(out_file)

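# Tally of how many tweets came from each source (device or app)
device = {}

# Now we need to iterate through the list of files in our data subfolder
for root_folder, subfolders, files in os.walk(folder_path):
    for file_name in files:
        file_path = os.path.join(root_folder, file_name)
        # The with statement closes each input file once it has been read
        with open(file_path, 'r', encoding="utf-8") as file:
            tweet_data = json.load(file)

        # Only the tweets at positions 1 through 9 of each file are processed here;
        # loop over tweet_data itself to convert every tweet
        for tweet in tweet_data[1:10]:
            tweet_as_list = parse_json_tweet(tweet)
            writer.writerow(tweet_as_list)
            # Look the source up in the tally itself, not in the tweet's keys
            if tweet["source"] in device:
                device[tweet["source"]] += 1
            else:
                device[tweet["source"]] = 1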

out_file.close()
print("DONE")

print(device)

DONE
{'Twitter for iPhone': 1, 'Sprout Social': 1, 'Tweetbot for iΟS': 1, 'Twitter Web Client': 1, 'Twitter for iPad': 1, 'TweetDeck': 1, 'Twitter for Android': 1, 'Facebook': 1}
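
The per-source tally can also be built with `collections.Counter` from the standard library, which removes the need for the explicit `if`/`else`. The sketch below uses a few made-up tweets (only the `source` key is filled in), so the counts it prints are illustrative and are not results from the congressional tweet data.

```python
from collections import Counter

# Made-up tweets for illustration; real tweets carry more keys than "source"
tweets = [
    {"source": "Twitter for iPhone"},
    {"source": "TweetDeck"},
    {"source": "Twitter for iPhone"},
]

# Counter starts every unseen key at zero, so no membership test is needed
device = Counter(tweet["source"] for tweet in tweets)

# most_common() lists the sources from most to least frequent
print(device.most_common())
```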
