# Types of Data & Formats

This notebook gives hands-on examples of structured and semi-structured data, outlines how unstructured data differs, and shows how to load and inspect two common formats, CSV and JSON. It also demonstrates how pandas represents nominal and ordinal categorical variables.

## Working with CSV (structured data)

In [None]:
import pandas as pd

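# Build a small structured table: every row has the same named, typed columns,
# which is exactly what a CSV file holds (the write/read round trip is in the last cell)
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 32, 37, 29],
    'city': ['NY', 'LA', 'NY', 'SF'],
})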
people
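Because CSV is plain text, every value arrives as text and pandas has to infer column types when reading. A minimal sketch, assuming a hypothetical CSV string read from memory via `io.StringIO` instead of a file on disk, shows the inferred dtypes:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """name,age,city
Alice,25,NY
Bob,32,LA
"""

# read_csv accepts any file-like object; io.StringIO wraps a string as one
df = pd.read_csv(io.StringIO(csv_text))

# 'age' is inferred as an integer column; text columns come back as object
# (or as a string dtype in newer pandas versions)
print(df.dtypes)
```

The same call works unchanged on a real path such as `pd.read_csv('people.csv')`.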

## Working with JSON (semi-structured)

In [None]:
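import json

import pandas as pd

# Example semi-structured JSON payload: a list of orders, each with a nested list of items.
# json.loads parses the raw JSON text into Python lists and dicts.
raw = '''[
  {"order_id": 1, "customer": "Alice", "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]},
  {"order_id": 2, "customer": "Bob", "items": [{"sku": "C3", "qty": 1}]}
]'''
data = json.loads(raw)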

# Flatten: one row per item (record_path), with the parent order's fields copied onto each row (meta)
orders = pd.json_normalize(data, record_path=['items'], meta=['order_id', 'customer'])
orders
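`record_path` is meant for nested *lists*. When the nesting is a plain object instead, calling `pd.json_normalize` without `record_path` flattens it into dot-separated column names (the default `sep='.'`). A short sketch on hypothetical records:

```python
import pandas as pd

# Hypothetical records where 'customer' is a nested object rather than a list
records = [
    {'order_id': 1, 'customer': {'name': 'Alice', 'city': 'NY'}},
    {'order_id': 2, 'customer': {'name': 'Bob', 'city': 'LA'}},
]

# Without record_path, nested dicts become columns such as 'customer.name'
flat = pd.json_normalize(records)
print(flat.columns.tolist())
print(flat)
```

Each record still maps to one row; only the nested keys are spread across extra columns.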

## Nominal vs Ordinal (categorical handling)

In [None]:
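# Nominal example: city has no natural order; categories are simply listed in sorted order
people['city_cat'] = people['city'].astype('category')
print(people['city_cat'].cat.categories)  # print(): only a cell's last expression is displayed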

# Ordinal example: size with order small<medium<large
size_df = pd.DataFrame({'size':['small','medium','large','medium']})
size_type = pd.CategoricalDtype(categories=['small','medium','large'], ordered=True)
size_df['size_ord'] = size_df['size'].astype(size_type)
size_df.sort_values('size_ord')  # sorts by category order (small, medium, large), not alphabetically
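The practical difference between the two shows up in comparisons. A minimal sketch, reusing the same category values as above on fresh Series: an ordered categorical supports `<`, `>=` and friends according to the declared order, while an unordered (nominal) one only allows equality checks and raises `TypeError` otherwise.

```python
import pandas as pd

# Ordinal: comparisons follow the declared category order
size_type = pd.CategoricalDtype(categories=['small', 'medium', 'large'], ordered=True)
sizes = pd.Series(['small', 'medium', 'large', 'medium'], dtype=size_type)
at_least_medium = sizes >= 'medium'   # element-wise, using category order
print(at_least_medium.tolist())
print(sizes.cat.codes.tolist())       # integer codes follow the declared order

# Nominal: an unordered categorical rejects ordering comparisons
cities = pd.Series(['NY', 'LA', 'NY', 'SF'], dtype='category')
rejected = False
try:
    cities < 'NY'
except TypeError as err:
    rejected = True
    print('Nominal comparison rejected:', err)
```

Declaring the order explicitly is what makes `'small' < 'large'` meaningful; alphabetically, `'large'` would sort first.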

## Notes on Unstructured Data
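Unstructured data (images, free text, audio) has no predefined schema of rows and columns, so it needs specialized processing before it can be analyzed as a table. Text is typically tokenized with NLP libraries; images are processed with computer vision libraries. Later lessons work through examples.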
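Real NLP libraries tokenize far more carefully, but a naive regex tokenizer is enough to show the core idea: turning free text into a structured table of counts. This is a minimal sketch using only `re`, `collections.Counter` and pandas; the `tokenize` helper and the sample `reviews` are hypothetical.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical free-text snippets (unstructured: no fixed columns)
reviews = [
    "Great product, fast shipping!",
    "Shipping was slow, but the product is great.",
]

def tokenize(text):
    # Naive tokenizer: lowercase, then keep runs of letters, digits and apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())

# Count tokens across all reviews, turning the text into a structured table
counts = Counter(tok for review in reviews for tok in tokenize(review))
token_table = pd.DataFrame(counts.most_common(), columns=['token', 'count'])
print(token_table.head())
```

Note that punctuation and case are discarded, so "Great" and "great." count as the same token.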

## Writing and Reading CSV

In [None]:
# Write the 'people' table to CSV and read it back; index=False keeps the
# row index from being saved as an extra unnamed column
csv_file = 'types_of_data_people.csv'
people.to_csv(csv_file, index=False)
print('Wrote', csv_file)
pd.read_csv(csv_file).head()
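CSV stores only text, so pandas-specific dtypes such as `category` do not survive a round trip. This matters here because `people` gained a `city_cat` column in the categorical section, and it comes back from the file as plain text. A minimal sketch on a hypothetical two-row frame, using an in-memory buffer rather than a file:

```python
import io

import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'city': ['NY', 'LA']})
df['city'] = df['city'].astype('category')

# Write to an in-memory buffer instead of a file on disk
buf = io.StringIO()
df.to_csv(buf, index=False)

# Read back once without hints, once with an explicit dtype for 'city'
plain = pd.read_csv(io.StringIO(buf.getvalue()))
typed = pd.read_csv(io.StringIO(buf.getvalue()), dtype={'city': 'category'})

print(plain['city'].dtype)  # a text dtype: the category information is gone
print(typed['city'].dtype)  # category again
```

For ordinal columns, passing `dtype='category'` infers unordered categories from the data; pass a `pd.CategoricalDtype(..., ordered=True)` instance instead to restore the intended order.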