#### Part 12: JSON and Excel Operations in Pandas

In this notebook, we'll explore:
- Working with JSON data in pandas
- Different JSON orientation options
- Date handling in JSON
- Working with Excel files

##### Setup
First, let's import the necessary libraries:

In [1]:
import pandas as pd
import numpy as np
from io import StringIO

##### 1. JSON Operations

### 1.1 Basic JSON Conversion

Let's start by creating a DataFrame and converting it to JSON:

In [2]:
dfj = pd.DataFrame(np.random.randn(5, 2), columns=list('AB'))

json = dfj.to_json()

json

'{"A":{"0":-0.205772632,"1":-3.1348958547,"2":2.1438206862,"3":-0.3667757566,"4":-1.0010306659},"B":{"0":-0.7191712093,"1":-0.5040079348,"2":0.2484554929,"3":0.8361432408,"4":-0.5172125147}}'
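To check that the JSON string carries the whole frame, it can be parsed back with `pd.read_json`. The following is a minimal sketch, not part of the original notebook: the names `df`, `json_str`, and `restored` are illustrative, and the string is wrapped in `StringIO` on the assumption that a recent pandas version is in use, where a file-like object is preferred over a literal JSON string.

```python
from io import StringIO

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), columns=list("AB"))
json_str = df.to_json()  # the default orient for a DataFrame is "columns"

# Parse the JSON text back into a DataFrame.
restored = pd.read_json(StringIO(json_str))

# to_json rounds floats to 10 decimal places by default (double_precision=10),
# so compare with a tolerance rather than testing exact equality.
print(np.allclose(df.to_numpy(), restored.to_numpy(), atol=1e-9))
```

If exact float round-tripping matters, the `double_precision` argument of `to_json` can be raised, at the cost of a longer string.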

### 1.2 Orient Options

The `orient` parameter controls the layout of the resulting JSON string or file. Let's create a DataFrame and a Series to demonstrate each option:

In [3]:
dfjo = pd.DataFrame(dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
                   columns=list('ABC'), index=list('xyz'))

dfjo

   A  B  C
x  1  4  7
y  2  5  8
z  3  6  9


In [4]:
sjo = pd.Series(dict(x=15, y=16, z=17), name='D')

sjo

x    15
y    16
z    17
Name: D, dtype: int64

#### Column Oriented (default for DataFrame)
Serializes the data as nested JSON objects with column labels acting as the primary index:

In [5]:
dfjo.to_json(orient="columns")

'{"A":{"x":1,"y":2,"z":3},"B":{"x":4,"y":5,"z":6},"C":{"x":7,"y":8,"z":9}}'

#### Index Oriented (default for Series)
Similar to column oriented, but the index labels are now the outer keys and the column labels the inner keys:

In [6]:
dfjo.to_json(orient="index")

'{"x":{"A":1,"B":4,"C":7},"y":{"A":2,"B":5,"C":8},"z":{"A":3,"B":6,"C":9}}'

In [7]:
sjo.to_json(orient="index")

'{"x":15,"y":16,"z":17}'
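One way to see how the two orientations relate is to parse both strings with the standard `json` module and compare the nesting. This is only an illustrative sketch; the variable names are made up for the example, and the frame is rebuilt here so the block runs on its own.

```python
import json

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=list("xyz"))

by_column = json.loads(df.to_json(orient="columns"))
by_index = json.loads(df.to_json(orient="index"))

# The same cell is reached with the keys in opposite order.
print(by_column["A"]["x"])  # 1
print(by_index["x"]["A"])   # 1

# Index orientation matches column orientation of the transposed frame.
transposed = json.loads(df.T.to_json(orient="columns"))
print(by_index == transposed)  # True
```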

#### Record Oriented
Serializes the data to a JSON array of column -> value records. Index labels are not included:

In [8]:
dfjo.to_json(orient="records")

'[{"A":1,"B":4,"C":7},{"A":2,"B":5,"C":8},{"A":3,"B":6,"C":9}]'

In [9]:
sjo.to_json(orient="records")

'[15,16,17]'
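Because the record layout drops the index, reading such a string back yields a fresh integer index. The sketch below illustrates that, along with one possible workaround: moving the index into an ordinary column before serializing. The column name `label` is a hypothetical choice for this example.

```python
import json
from io import StringIO

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=list("xyz"))

# Round trip through orient="records": the x/y/z labels are gone.
records_json = df.to_json(orient="records")
restored = pd.read_json(StringIO(records_json), orient="records")
print(list(restored.index))  # [0, 1, 2]

# Keep the labels by turning the index into a regular column first.
with_labels = df.rename_axis("label").reset_index().to_json(orient="records")
first_record = json.loads(with_labels)[0]
print(first_record)
```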

#### Value Oriented
A bare-bones option that serializes to nested JSON arrays of values only. Column and index labels are not included:

In [10]:
dfjo.to_json(orient="values")

'[[1,4,7],[2,5,8],[3,6,9]]'
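Since the value layout carries no labels at all, the consumer has to supply them again. A small sketch of that, assuming the labels are known on the receiving side (here they are simply reused from the original frame):

```python
import json

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=list("xyz"))

values_json = df.to_json(orient="values")
rows = json.loads(values_json)  # a plain list of row lists

# Reattach the labels by hand when rebuilding the frame.
rebuilt = pd.DataFrame(rows, index=df.index, columns=df.columns)
print(rebuilt)
```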

#### Split Oriented
Serializes to a JSON object containing separate entries for values, index and columns. Name is also included for Series:

In [11]:
dfjo.to_json(orient="split")

'{"columns":["A","B","C"],"index":["x","y","z"],"data":[[1,4,7],[2,5,8],[3,6,9]]}'

In [12]:
sjo.to_json(orient="split")

'{"name":"D","index":["x","y","z"],"data":[15,16,17]}'
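The split layout keeps index, columns, and (for a Series) the name, so it is a reasonable candidate when the data has to be read back unchanged. The following sketch shows a round trip for both object types; it assumes `pd.read_json` is given the same `orient`, and `typ="series"` for the Series case.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=list("xyz"))
s = pd.Series({"x": 15, "y": 16, "z": 17}, name="D")

# DataFrame round trip: index and column labels survive.
df_back = pd.read_json(StringIO(df.to_json(orient="split")), orient="split")
print(list(df_back.index), list(df_back.columns))

# Series round trip: typ="series" asks read_json for a Series, name included.
s_back = pd.read_json(StringIO(s.to_json(orient="split")),
                      orient="split", typ="series")
print(s_back.name, s_back.tolist())
```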

### 1.3 Date Handling

#### Writing in ISO date format:

In [14]:
dfd = pd.DataFrame(np.random.randn(5, 2), columns=list('AB'))

dfd['date'] = pd.Timestamp('20130101')

dfd = dfd.sort_index(axis=1, ascending=False)
json = dfd.to_json(date_format='iso')

json

'{"date":{"0":"2013-01-01T00:00:00.000","1":"2013-01-01T00:00:00.000","2":"2013-01-01T00:00:00.000","3":"2013-01-01T00:00:00.000","4":"2013-01-01T00:00:00.000"},"B":{"0":2.1314730275,"1":0.1370649203,"2":-0.2528304693,"3":0.1083873157,"4":-0.5145385258},"A":{"0":-0.0542646546,"1":1.4443554679,"2":-0.0130010471,"3":-2.0431126313,"4":-1.4408423854}}'
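ISO strings are readable, but in the JSON text they are still just strings, so it helps to see how they become timestamps again on the way back in. The sketch below passes `convert_dates=["date"]` explicitly to make the intent visible; the variable names are illustrative only.

```python
from io import StringIO

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 2), columns=list("AB"))
df["date"] = pd.Timestamp("20130101")

iso_json = df.to_json(date_format="iso")

# Ask read_json to parse the "date" column back into datetimes.
restored = pd.read_json(StringIO(iso_json), convert_dates=["date"])

print(restored["date"].dtype)
print(restored["date"].dt.year.tolist())  # [2013, 2013, 2013]
```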

#### Writing in ISO date format, with microseconds:

In [15]:
json = dfd.to_json(date_format='iso', date_unit='us')
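The effect of `date_unit` is easiest to see by serializing the same timestamp at several precisions. This is a small illustrative sketch using a one-row frame; the loop and the `stamps` dictionary are not part of the original notebook.

```python
import json

import pandas as pd

df = pd.DataFrame({"date": [pd.Timestamp("2013-01-01")]})

stamps = {}
for unit in ["s", "ms", "us", "ns"]:
    text = df.to_json(date_format="iso", date_unit=unit)
    # Column orientation: outer key is the column, inner key the index label.
    stamps[unit] = json.loads(text)["date"]["0"]
    print(unit, stamps[unit])
```

Each step from `'s'` to `'ns'` adds three more fractional digits to the seconds field.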

##### 2. Excel Operations

### 2.1 Reading Excel Files

There are multiple ways to read Excel files in pandas:

In [16]:
# This is a code example - you would need an actual Excel file to run this
# Using the ExcelFile class
'''
data = {}
with pd.ExcelFile('path_to_file.xls') as xls:
    data['Sheet1'] = pd.read_excel(xls, 'Sheet1', index_col=None,
                                   na_values=['NA'])
    data['Sheet2'] = pd.read_excel(xls, 'Sheet2', index_col=None,
                                   na_values=['NA'])
'''

In [17]:
# Equivalent using the read_excel function
'''
data = pd.read_excel('path_to_file.xls', ['Sheet1', 'Sheet2'],
                     index_col=None, na_values=['NA'])
'''

### 2.2 Using xlrd.book.Book Object

In [18]:
# ExcelFile can also be called with an xlrd.book.Book object
'''
import xlrd
xlrd_book = xlrd.open_workbook('path_to_file.xls', on_demand=True)
with pd.ExcelFile(xlrd_book) as xls:
    df1 = pd.read_excel(xls, 'Sheet1')
    df2 = pd.read_excel(xls, 'Sheet2')
'''

### 2.3 Specifying Sheets

The `sheet_name` argument allows specifying the sheet or sheets to read:
- The default value is 0, which reads the first sheet
- Pass a string to refer to the name of a particular sheet
- Pass an integer to refer to the index of a sheet (0-based)
- Pass a list of strings or integers to return a dictionary of specified sheets
- Pass None to return a dictionary of all available sheets
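The options listed above can be tried without an existing workbook by writing a small one to a temporary directory first. This is a sketch under a few assumptions: an Excel engine such as `openpyxl` is installed for `.xlsx` files, and the file name `demo.xlsx` and the sample data are made up for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"  # hypothetical file name

    # Write a small two-sheet workbook to read back.
    with pd.ExcelWriter(path) as writer:
        pd.DataFrame({"id": [1, 2], "score": [0.5, 1.5]}).to_excel(
            writer, sheet_name="Sheet1", index=False)
        pd.DataFrame({"id": [3], "score": [2.5]}).to_excel(
            writer, sheet_name="Sheet2", index=False)

    by_name = pd.read_excel(path, sheet_name="Sheet1")       # one DataFrame
    by_position = pd.read_excel(path, sheet_name=1)          # second sheet
    several = pd.read_excel(path, sheet_name=["Sheet1", 1])  # dict of DataFrames
    everything = pd.read_excel(path, sheet_name=None)        # dict of all sheets

    # ExcelFile exposes the sheet names without reading any data.
    with pd.ExcelFile(path) as xls:
        names = xls.sheet_names

print(type(by_name).__name__, type(several).__name__)
print(list(several), list(everything), names)
```

Note that when a list is passed, the dictionary keys are the specifiers as given, so a sheet requested by position is keyed by its integer.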

In [19]:
# Examples (these are code examples - you would need actual Excel files)
'''
# Returns a DataFrame
pd.read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

# Using the sheet index
pd.read_excel('path_to_file.xls', 0, index_col=None, na_values=['NA'])

# Using all default values
pd.read_excel('path_to_file.xls')

# Using None to get all sheets
pd.read_excel('path_to_file.xls', sheet_name=None)

# Using a list to get multiple sheets
pd.read_excel('path_to_file.xls', sheet_name=['Sheet1', 3])
'''

### 2.4 Reading a MultiIndex

`read_excel` can read a MultiIndex index by passing a list of columns to `index_col` and a MultiIndex column by passing a list of rows to `header`.
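A runnable version of the `index_col` case can be sketched with a temporary file. It assumes an Excel engine such as `openpyxl` is installed; the file name is hypothetical, and the level names `lvl1` and `lvl2` are set up front so the round trip can be checked for them as well.

```python
import tempfile
from pathlib import Path

import pandas as pd

index = pd.MultiIndex.from_product([["a", "b"], ["c", "d"]],
                                   names=["lvl1", "lvl2"])
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]}, index=index)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "multiindex.xlsx"  # hypothetical file name
    df.to_excel(path)
    # The first two spreadsheet columns hold the two index levels.
    restored = pd.read_excel(path, index_col=[0, 1])

print(list(restored.index.names))
print(restored.index.tolist())
```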

In [20]:
# Example of creating and reading a MultiIndex DataFrame with Excel
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]},
                 index=pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']]))

df

     a  b
a c  1  5
  d  2  6
b c  3  7
  d  4  8


In [21]:
# This would write to an Excel file and then read it back
'''
df.to_excel('path_to_file.xlsx')
df = pd.read_excel('path_to_file.xlsx', index_col=[0, 1])
'''

If the index has level names, they will be parsed as well:

In [22]:
df.index = df.index.set_names(['lvl1', 'lvl2'])
df

           a  b
lvl1 lvl2      
a    c     1  5
     d     2  6
b    c     3  7
     d     4  8


In [23]:
# This would write to an Excel file and then read it back
'''
df.to_excel('path_to_file.xlsx')
df = pd.read_excel('path_to_file.xlsx', index_col=[0, 1])
'''