Comparing and Contrasting Different Data Structures, Formats, and Markup Languages

Here is a practical Python application and real-world discussion for each of the following data formats: JSON, XML, CSV, YAML, and Pickle.

JSON (JavaScript Object Notation)

Practical Python Application:

In [18]:
import json

# Creating a Python dictionary
data = {
    "Group_name": "Data Dynamos",
    "role": "Data Engineer",
    "experience": 3,
    "skills": ["Python", "SQL", "Data Visualization"],
}

# Convert dictionary to JSON string
json_data = json.dumps(data, indent=4)
print("JSON Data:\n", json_data)

# Save JSON data to a file
with open("data.json", "w") as file:
    json.dump(data, file, indent=4)
    print("\nData saved to 'data.json'.")

# Reading JSON data from the file
with open("data.json", "r") as file:
    loaded_data = json.load(file)
    print("\nLoaded Data from file:")
    print(loaded_data)

# Accessing specific data
print("\nAccessing Specific Data:")
print("Group_Name:", loaded_data["Group_name"])
print("Skills:", loaded_data["skills"])

JSON Data:
 {
    "Group_name": "Data Dynamos",
    "role": "Data Engineer",
    "experience": 3,
    "skills": [
        "Python",
        "SQL",
        "Data Visualization"
    ]
}

Data saved to 'data.json'.

Loaded Data from file:
{'Group_name': 'Data Dynamos', 'role': 'Data Engineer', 'experience': 3, 'skills': ['Python', 'SQL', 'Data Visualization']}

Accessing Specific Data:
Group_Name: Data Dynamos
Skills: ['Python', 'SQL', 'Data Visualization']
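
The loaded dictionary above is identical to the original, but that does not hold for every Python value. The following is a minimal sketch, assuming a hypothetical `record` dictionary that is not part of the original notebook, of two conversions JSON applies during a round trip: tuples come back as lists, and integer keys come back as strings.

```python
import json

# Hypothetical record (not from the notebook) with a tuple value and an integer key
record = {"skills": ("Python", "SQL"), 1: "first"}

# Serialize to a JSON string, then parse it back into Python objects
round_tripped = json.loads(json.dumps(record))

print(round_tripped)
print(round_tripped == record)
```

The comparison prints False because JSON has no tuple type and only allows string keys, so the restored dictionary differs from the one that was written.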


XML (eXtensible Markup Language)

In [19]:
import xml.etree.ElementTree as ET

# Your XML data
xml_data = '''<?xml version="1.0" encoding="UTF-8"?>
<employees>
  <employee id="001">
    <name>Khensani Kubayi</name>
    <position>Data Engineer Intern</position>
    <salary>10000</salary>
  </employee>
  <employee id="002">
    <name>Lerato Motaung</name>
    <position>Cyber Security Intern</position>
    <salary>12000</salary>
  </employee>
  <employee id="003">
    <name>Nozipo Ntiyo</name>
    <position>Java Developer</position>
    <salary>30000</salary>
  </employee>
  <employee id="004">
    <name>Shamaine Mukithi</name>
    <position>Computer Systems Analyst</position>
    <salary>40000</salary>
  </employee>
  <employee id="005">
    <name>Duncan Nukeri</name>
    <position>BI Developer</position>
    <salary>20000</salary>
  </employee>
  <employee id="006">
    <name>Nomcebo Mkhwanazi</name>
    <position>IT Manager</position>
    <salary>450000</salary>
  </employee>
</employees>'''

# Parse the XML data
root = ET.fromstring(xml_data)

# Iterate through each employee and print their details
for employee in root.findall('employee'):
    emp_id = employee.get('id')
    name = employee.find('name').text
    position = employee.find('position').text
    salary = employee.find('salary').text
    print(f"Employee ID: {emp_id}")
    print(f"Name: {name}")
    print(f"Position: {position}")
    print(f"Salary: {salary}\n")
    print("-" * 30)  # Separating the data


Employee ID: 001
Name: Khensani Kubayi
Position: Data Engineer Intern
Salary: 10000

------------------------------
Employee ID: 002
Name: Lerato Motaung
Position: Cyber Security Intern
Salary: 12000

------------------------------
Employee ID: 003
Name: Nozipo Ntiyo
Position: Java Developer
Salary: 30000

------------------------------
Employee ID: 004
Name: Shamaine Mukithi
Position: Computer Systems Analyst
Salary: 40000

------------------------------
Employee ID: 005
Name: Duncan Nukeri
Position: BI Developer
Salary: 20000

------------------------------
Employee ID: 006
Name: Nomcebo Mkhwanazi
Position: IT Manager
Salary: 450000

------------------------------
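
Every value read from the XML above, including the salary, is a string. The following is a minimal sketch, using a shortened copy of the employee data (two of the six records) and assuming the salaries should be treated as whole numbers, of converting element text before doing arithmetic on it.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the employee data above (two of the six records)
xml_data = """<employees>
  <employee id="001">
    <name>Khensani Kubayi</name>
    <salary>10000</salary>
  </employee>
  <employee id="002">
    <name>Lerato Motaung</name>
    <salary>12000</salary>
  </employee>
</employees>"""

root = ET.fromstring(xml_data)

# Element text is always a string, so convert it before doing arithmetic
salaries = {
    emp.findtext("name"): int(emp.findtext("salary"))
    for emp in root.findall("employee")
}

total = sum(salaries.values())
print(salaries)
print("Total salary:", total)
```

Here `findtext` is used as a shorthand for `find(...).text`; both return the element's text content.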


CSV (Comma-Separated Values)

Practical Python Application:

In [20]:
import csv

# Writing to CSV
data = [
    ["Full Name", "Employee_ID", "Position", "Company Name"],
    ["Lerato Motaung", 3912, "Data Analyst", "Standard Bank"],
    ["Duncan Nukeri", 8546, "Data Engineer", "FNB"],
    ["Shamaine Mukithi", 8715, "Computer System Analyst", "Nedbank"],
    ["Khensani Kubayi", 2658, "Database Administrator", "Absa"],
    ["Nozipho Ntiyo", 4565, "Data Scientist", "Capitec"],
    ["Nomcebo Mkhwanazi", 2923, "Data Engineer", "Absa"],
]
with open("data.csv", mode="w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(data)

# Reading from CSV
with open("data.csv", mode="r", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

['Full Name', 'Employee_ID', 'Position', 'Company Name']
['Lerato Motaung', '3912', 'Data Analyst', 'Standard Bank']
['Duncan Nukeri', '8546', 'Data Engineer', 'FNB']
['Shamaine Mukithi', '8715', 'Computer System Analyst', 'Nedbank']
['Khensani Kubayi', '2658', 'Database Administrator', 'Absa']
['Nozipho Ntiyo', '4565', 'Data Scientist', 'Capitec']
['Nomcebo Mkhwanazi', '2923', 'Data Engineer', 'Absa']
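
The output above shows that the employee IDs were written as integers but read back as strings, because CSV stores everything as text. The following is a minimal sketch, using an in-memory stand-in for `data.csv` with two of the rows, of reading records by column name with `csv.DictReader` and restoring the integer type explicitly.

```python
import csv
import io

# In-memory stand-in for data.csv, using two rows from the example above
csv_text = (
    "Full Name,Employee_ID,Position,Company Name\n"
    "Lerato Motaung,3912,Data Analyst,Standard Bank\n"
    "Duncan Nukeri,8546,Data Engineer,FNB\n"
)

# DictReader uses the header row as the dictionary keys for each record
reader = csv.DictReader(io.StringIO(csv_text))

employees = []
for row in reader:
    # CSV carries no type information, so convert the ID back to an integer
    row["Employee_ID"] = int(row["Employee_ID"])
    employees.append(row)

print(employees[0]["Full Name"], employees[0]["Employee_ID"])
```

Reading by column name keeps the code working if the column order in the file changes, which positional indexing into each row would not.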


YAML (YAML Ain't Markup Language)

In [21]:
pip install pyyaml

Note: you may need to restart the kernel to use updated packages.


In [22]:
pip show pyyaml

Name: PyYAML
Version: 6.0.1
Summary: YAML parser and emitter for Python
Home-page: https://pyyaml.org/
Author: Kirill Simonov
Author-email: xi@resolvent.net
License: MIT
Location: C:\Users\SEBALAMAKGOLO3\anaconda4\Lib\site-packages
Requires: 
Required-by: anaconda-client, astropy, bokeh, conda-build, conda-repo-cli, cookiecutter, dask, distributed, intake, jupyter-events, pyaml
Note: you may need to restart the kernel to use updated packages.


In [23]:
import yaml

# Dictionary representing personal details and professional experience
profile_info = {
    'Full Name': 'Duncan Nukeri',
    'Job Title': 'Data Engineer',
    'Country of Residence': 'South Africa',
    'Age in Years': '25',
    
    # Work experience across multiple platforms
    'Professional Background': {
        'GitHub Profile': 'Software Engineer',
        'Google Workspace': 'Technical Engineer',
        'LinkedIn Profile': 'Data Analyst'
    },

    # Programming languages and markup skills
    'Skills': {
        'Markup Languages': ['HTML'],
        'Programming Languages': ['Python', 'JavaScript', 'Golang']
    }
}

# Dump the dictionary into a YAML formatted string
yaml_string = yaml.dump(profile_info, default_flow_style=False)

# Print the YAML formatted string
print(yaml_string)


Age in Years: '25'
Country of Residence: South Africa
Full Name: Duncan Nukeri
Job Title: Data Engineer
Professional Background:
  GitHub Profile: Software Engineer
  Google Workspace: Technical Engineer
  LinkedIn Profile: Data Analyst
Skills:
  Markup Languages:
  - HTML
  Programming Languages:
  - Python
  - JavaScript
  - Golang
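
The example above only writes YAML. The following is a minimal sketch of the reverse direction, assuming PyYAML is installed and using a shortened, hand-written YAML document in the same style as the output above. It uses `yaml.safe_load`, which builds only plain Python types.

```python
import yaml

# Small YAML document in the same style as the output above
yaml_text = """
Full Name: Duncan Nukeri
Age in Years: '25'
Skills:
  Programming Languages:
  - Python
  - JavaScript
  - Golang
"""

# safe_load builds only plain Python types (dict, list, str, int, ...)
profile = yaml.safe_load(yaml_text)

print(type(profile["Age in Years"]).__name__)
print(profile["Skills"]["Programming Languages"])
```

Because the age is quoted in the YAML text, it is loaded as a string rather than an integer, which matches how it was defined in the dictionary.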



Pickle (Python Object Serialization)

In [16]:

import pickle

# Sample records used to compute the average age
people = [
    {"name": "Cebo", "age": 22},
    {"name": "Lerato", "age": 18},
    {"name": "Nozipo", "age": 28},
    {"name": "Khensani", "age": 23},
    {"name": "Shamaine", "age": 29},
    {"name": "Duncan", "age": 25},
]

# Calculate the average age
average_age = sum(person["age"] for person in people) / len(people)

# Serialize the result (average_age)
with open("average_age.pkl", "wb") as file:
    pickle.dump(average_age, file)

print("Average age has been pickled.")


Average age has been pickled.


In [17]:
import pickle

# Load the pickled result (average age)
with open("average_age.pkl", "rb") as file:
    loaded_average_age = pickle.load(file)

print("Loaded average age:", loaded_average_age)


Loaded average age: 24.166666666666668
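
Pickle is not limited to single numbers. The following is a minimal sketch, using two of the records from the example above, of pickling a whole list of dictionaries in memory with `pickle.dumps` and `pickle.loads` instead of writing to a file. Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.

```python
import pickle

# Two of the records from the example above
people = [
    {"name": "Cebo", "age": 22},
    {"name": "Lerato", "age": 18},
]

# dumps/loads work on bytes in memory instead of a file
payload = pickle.dumps(people)
restored = pickle.loads(payload)

print(type(payload).__name__)
print(restored == people)
print(restored is people)
```

The restored list is equal to the original but is a separate object, so changing one does not affect the other.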
