# Data Analysis In Base Python Using JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for people to read and write and easy for programs to parse.

In this lesson, we'll cover how to load, analyze, and manipulate JSON data in Python without external libraries such as `pandas`.

## Objectives

* Understand JSON structure and load JSON data in Python
* Perform basic data analysis and transformation using Python's built-in features
* Save results back to JSON format


## JSON Structure

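JSON represents data as objects (key-value pairs), arrays (ordered lists of values), and arbitrary nestings of the two. In Python, a JSON object corresponds to a `dict` and a JSON array to a `list`.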
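To make the structure concrete, here is a minimal sketch of how each JSON value type arrives in Python once parsed with the standard-library `json` module (introduced in the loading section of this lesson). The `sample` string and its fields are made up for illustration only.

```python
import json

# Hypothetical sample document, used only to illustrate the type mapping
sample = '{"name": "Sylvia", "age": 24, "skills": ["sql", "python"], "manager": null, "remote": true}'

parsed = json.loads(sample)

# JSON object -> dict, array -> list, string -> str,
# integer number -> int, true/false -> bool, null -> None
for key, value in parsed.items():
    print(key, type(value).__name__)
```

Nested objects and arrays follow the same rules at every level, which is why nested data can be reached with chained indexing.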

In [1]:
# example: a Python dictionary with the same shape as a JSON object

employees = {
    "employees": [
        {'id': 9178, 'name': 'Sylvia', 'age': 24, 'department': 'tech & data'},
        {'id': 8124, 'name': 'Peter', 'age': 34, 'department': 'engineering'},
        {'id': 2279, 'name': 'Teresia', 'age': 26, 'department': 'finance'},
    ]
}

In [2]:
# Teresia's department
t_department = employees['employees'][2]['department']
t_department

'finance'
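One detail worth noting: the dictionary above is written with single quotes, which is valid Python but not valid JSON, since JSON strings must use double quotes. The sketch below, using two made-up strings, shows how the `json` module reacts to each style.

```python
import json

py_style = "{'id': 9178, 'name': 'Sylvia'}"    # single quotes: Python syntax, not JSON
json_style = '{"id": 9178, "name": "Sylvia"}'  # double quotes: valid JSON

try:
    json.loads(py_style)
except json.JSONDecodeError as err:
    # err.msg holds the parser's description of the problem
    print("not valid JSON:", err.msg)

print(json.loads(json_style))
```

Catching `json.JSONDecodeError` is a reasonable way to handle input that may not be well-formed.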

## Loading JSON Data in Python

* Using the `json` module

In [3]:
import json 

# example JSON string

employees = '''
{
    "employees": [
        {"id": 9178, "name": "Sylvia", "age": 24, "department": "tech & data"},
        {"id": 8124, "name": "Peter", "age": 34, "department": "engineering"},
        {"id": 2279, "name": "Teresia", "age": 26, "department": "finance"}
    ]
}
'''  

# parse the JSON string into a Python dictionary
data_dict = json.loads(employees)

print(type(data_dict))  # a bare expression in the middle of a cell is not displayed, so print it
data_dict

<class 'dict'>
{'employees': [{'id': 9178,
   'name': 'Sylvia',
   'age': 24,
   'department': 'tech & data'},
  {'id': 8124, 'name': 'Peter', 'age': 34, 'department': 'engineering'},
  {'id': 2279, 'name': 'Teresia', 'age': 26, 'department': 'finance'}]}

* `dump()` - Converts a Python object (e.g., a list or dict) into JSON format and writes it to a file

In [4]:
# save the dict above as a JSON file
with open('employees.json', 'w') as json_file:
    json.dump(data_dict, json_file, indent=4) # indent=4 makes the JSON file more readable

* `load()` - Reads JSON data from a file and converts it into a Python object

In [5]:
# read the saved json data from a file 
with open('employees.json', 'r') as json_file:
    data = json.load(json_file)

In [6]:
data

{'employees': [{'id': 9178,
   'name': 'Sylvia',
   'age': 24,
   'department': 'tech & data'},
  {'id': 8124, 'name': 'Peter', 'age': 34, 'department': 'engineering'},
  {'id': 2279, 'name': 'Teresia', 'age': 26, 'department': 'finance'}]}

* `loads()` differs from `load()`: it parses JSON held in a string, while `load()` reads JSON from an open file object. Likewise, `dumps()` returns a JSON string, while `dump()` writes to a file.
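The difference between the string-based and file-based functions can be sketched side by side. In this illustration, `io.StringIO` stands in for a real file so nothing is written to disk, and `record` is a made-up sample.

```python
import io
import json

record = {"id": 9178, "name": "Sylvia"}

# dumps()/loads() work with strings held in memory
text = json.dumps(record)
from_string = json.loads(text)

# dump()/load() work with file-like objects; StringIO plays the role of a file here
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)  # rewind to the start before reading
from_file = json.load(buffer)

print(from_string == from_file == record)
```

A handy way to remember the pairing: the trailing "s" in `loads` and `dumps` stands for "string".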

## Exploring and Analyzing Data

In [7]:
# access all employees

employees = data['employees']

employees

[{'id': 9178, 'name': 'Sylvia', 'age': 24, 'department': 'tech & data'},
 {'id': 8124, 'name': 'Peter', 'age': 34, 'department': 'engineering'},
 {'id': 2279, 'name': 'Teresia', 'age': 26, 'department': 'finance'}]

In [8]:
type(employees)

list

In [9]:
type(data['employees'])

list

In [10]:
# calculate average age
ages = [employee['age'] for employee in data['employees']] # list comprehension
print(ages)
avg_age = sum(ages) / len(ages)
print(avg_age)

[24, 34, 26]
28.0
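The same pattern extends to other summary values. As a small sketch, the built-in `min()` and `max()` accept a `key` function, so they can return the whole record rather than just the age. The list `sample_employees` is a copy of the lesson's records under a separate name, chosen so it does not overwrite the notebook's variables.

```python
# Copy of the lesson's records under a separate, illustrative name
sample_employees = [
    {'id': 9178, 'name': 'Sylvia', 'age': 24, 'department': 'tech & data'},
    {'id': 8124, 'name': 'Peter', 'age': 34, 'department': 'engineering'},
    {'id': 2279, 'name': 'Teresia', 'age': 26, 'department': 'finance'},
]

sample_ages = [emp['age'] for emp in sample_employees]

# key= tells min()/max() which field to compare, and the full dict is returned
youngest = min(sample_employees, key=lambda emp: emp['age'])
oldest = max(sample_employees, key=lambda emp: emp['age'])

print(min(sample_ages), max(sample_ages))
print(youngest['name'], oldest['name'])
```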


In [11]:
num = [x for x in range(5)]
print(num)

[0, 1, 2, 3, 4]


In [12]:
num = []
for x in range(5):
    num.append(x)
num

[0, 1, 2, 3, 4]

In [13]:
# filter employees by department 

tech_data_employees = [emp for emp in employees if emp['department'] == 'tech & data']
print(tech_data_employees)

[{'id': 9178, 'name': 'Sylvia', 'age': 24, 'department': 'tech & data'}]


In [14]:
# filter without list comprehension
tech_data_employees = []

for emp in employees:
    if emp['department'] == 'tech & data':
        tech_data_employees.append(emp)

tech_data_employees

[{'id': 9178, 'name': 'Sylvia', 'age': 24, 'department': 'tech & data'}]

In [15]:
# sort employees by age 
sort_employees = sorted(employees, key=lambda x: x['age'])
sort_employees

[{'id': 9178, 'name': 'Sylvia', 'age': 24, 'department': 'tech & data'},
 {'id': 2279, 'name': 'Teresia', 'age': 26, 'department': 'finance'},
 {'id': 8124, 'name': 'Peter', 'age': 34, 'department': 'engineering'}]
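Two variations on the sort above may be useful. This sketch swaps the lambda for `operator.itemgetter`, which is an equivalent way to pull a field out of each dict, and uses `reverse=True` for descending order. The name `sample_employees` is illustrative and holds a copy of the lesson's records.

```python
from operator import itemgetter

# Copy of the lesson's records under a separate, illustrative name
sample_employees = [
    {'id': 9178, 'name': 'Sylvia', 'age': 24, 'department': 'tech & data'},
    {'id': 8124, 'name': 'Peter', 'age': 34, 'department': 'engineering'},
    {'id': 2279, 'name': 'Teresia', 'age': 26, 'department': 'finance'},
]

# oldest first
by_age_desc = sorted(sample_employees, key=itemgetter('age'), reverse=True)
print([emp['name'] for emp in by_age_desc])

# alphabetical by name
by_name = sorted(sample_employees, key=itemgetter('name'))
print([emp['name'] for emp in by_name])
```

Note that `sorted()` returns a new list and leaves the original order untouched; `list.sort()` would sort in place instead.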

In [16]:
# inspect the built-in list type and its methods (output truncated)
help(list)

Help on class list in module builtins:

class list(object)
 |  list(iterable=(), /)
 |
 |  Built-in mutable sequence.
 |
 |  If no argument is given, the constructor creates a new empty list.
 |  The argument must be an iterable if specified.
 |
 |  Methods defined here:
 |
 |  __add__(self, value, /)
 |      Return self+value.
 |
 |  __contains__(self, key, /)
 |      Return bool(key in self).
 |
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __ge__(self, value, /)
 |      Return self>=value.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __getitem__(self, index, /)
 |      Return self[index].
 |
 |  __gt__(self, value, /)
 |      Return self>value.
 |
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate sign

In [17]:
# add a new employee

new_employee = {'id': 2025, 'name': 'Beryl', 'age': 28, 'department': 'engineering'}
data['employees'].append(new_employee)
data

{'employees': [{'id': 9178,
   'name': 'Sylvia',
   'age': 24,
   'department': 'tech & data'},
  {'id': 8124, 'name': 'Peter', 'age': 34, 'department': 'engineering'},
  {'id': 2279, 'name': 'Teresia', 'age': 26, 'department': 'finance'},
  {'id': 2025, 'name': 'Beryl', 'age': 28, 'department': 'engineering'}]}

In [18]:
names = {
    'first_name': 'John',
    'second_name': 'Kamau'
}

# change second_name to Maina
names['second_name'] = 'Maina'

names

{'first_name': 'John', 'second_name': 'Maina'}

In [19]:
# update Sylvia's age

for emp in data['employees']:
    if emp['name'] == 'Sylvia':
        emp['age'] = 27

data

{'employees': [{'id': 9178,
   'name': 'Sylvia',
   'age': 27,
   'department': 'tech & data'},
  {'id': 8124, 'name': 'Peter', 'age': 34, 'department': 'engineering'},
  {'id': 2279, 'name': 'Teresia', 'age': 26, 'department': 'finance'},
  {'id': 2025, 'name': 'Beryl', 'age': 28, 'department': 'engineering'}]}

In [20]:
# remove Teresia, who has resigned, by keeping every other record
data['employees'] = [emp for emp in data['employees'] if emp['name'] != 'Teresia']

data['employees']

[{'id': 9178, 'name': 'Sylvia', 'age': 27, 'department': 'tech & data'},
 {'id': 8124, 'name': 'Peter', 'age': 34, 'department': 'engineering'},
 {'id': 2025, 'name': 'Beryl', 'age': 28, 'department': 'engineering'}]

## Knock yourself out 

In [21]:
# further reading: the standard-library collections module
from collections import defaultdict

# group by department 
department_groups = defaultdict(list)
for emp in data['employees']:
    department_groups[emp['department']].append(emp)

print(dict(department_groups))

{'tech & data': [{'id': 9178, 'name': 'Sylvia', 'age': 27, 'department': 'tech & data'}], 'engineering': [{'id': 8124, 'name': 'Peter', 'age': 34, 'department': 'engineering'}, {'id': 2025, 'name': 'Beryl', 'age': 28, 'department': 'engineering'}]}
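To close the loop on the last objective, the modified records can be written back to a JSON file with `json.dump()`. This is a minimal sketch: `updated` reproduces the state of the data at the end of the lesson, and the file name `employees_updated.json` plus the temporary directory are assumptions chosen so the example does not overwrite any real file.

```python
import json
import os
import tempfile

# State of the records at the end of the lesson
updated = {
    'employees': [
        {'id': 9178, 'name': 'Sylvia', 'age': 27, 'department': 'tech & data'},
        {'id': 8124, 'name': 'Peter', 'age': 34, 'department': 'engineering'},
        {'id': 2025, 'name': 'Beryl', 'age': 28, 'department': 'engineering'},
    ]
}

# Hypothetical output path inside a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), 'employees_updated.json')

# write the updated records to disk
with open(out_path, 'w') as json_file:
    json.dump(updated, json_file, indent=4)

# read them back to confirm nothing was lost in the round trip
with open(out_path) as json_file:
    reloaded = json.load(json_file)

print(reloaded == updated)
```

In the notebook itself, passing `data` and a path such as `'employees.json'` to the same calls would save the live results.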
