<font size="+3"><strong>Working with the JSON File</strong></font>

In this project, we’ll look at corporate bankruptcies in Taiwan. To do that, we need to load data
that’s been stored in a JSON file, explore it, and turn it into a DataFrame that we’ll use to
train our model.

In [None]:
# Import libraries
import json
import pandas as pd

# Prepare Data

## Open

In [None]:
# Load data into dataframe `df` with `read_json` method
df = pd.read_json("taiwan-bankruptcy-data.json")
df.head()

ValueError: ignored

In [None]:
# Open file and load JSON as a dictionary
with open("taiwan-bankruptcy-data.json", "r") as read_file:
    taiwan_data = json.load(read_file)
print(type(taiwan_data))

<class 'dict'>


## Explore

In [None]:
# Print `taiwan_data` keys
taiwan_data.keys()

dict_keys(['schema', 'data', 'metadata'])

`schema` tells us how the data is structured, `metadata` tells us where the data comes from, and
`data` is the data itself. Now let’s take a look at the values. Remember, each value in a
dictionary is the object stored under its key.

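To make the three-key layout concrete, here is a minimal sketch on a toy dictionary. The contents of `sample` are made up for illustration (the real `schema` and `metadata` values are not shown in this notebook); only the key names come from the file.

```python
# Hypothetical miniature of the file's structure. Only the three top-level
# key names come from the real file; every value here is invented.
sample = {
    "schema": {"fields": [{"name": "id"}, {"name": "feat_1"}, {"name": "bankrupt"}]},
    "data": [
        {"id": 1, "feat_1": 0.37, "bankrupt": True},
        {"id": 2, "feat_1": 0.46, "bankrupt": False},
    ],
    "metadata": {"title": "toy example"},
}

# For each top-level key, record the type of its value and how many items it holds
summary = {key: (type(value).__name__, len(value)) for key, value in sample.items()}

for key, (type_name, size) in summary.items():
    print(f"{key}: {type_name} with {size} item(s)")
```

Running the same comprehension on `taiwan_data` should give a quick overview of what sits under each key before digging into individual records.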

In [None]:
# Explore the values associated with the keys in `taiwan_data`
print(taiwan_data.keys())
print(len(taiwan_data["data"]))
print(len(taiwan_data["data"][0]))

dict_keys(['schema', 'data', 'metadata'])
6819
97


In [None]:
# Calculate number of companies in the dataset
len(taiwan_data["data"])

6819

In [None]:
# Calculate number of features associated with `company_1`
len(taiwan_data["data"][0])

97

In [None]:
# Iterate through companies to check that they all have the same number of features
for item in taiwan_data["data"]:
    if len(item) != 97:
        print("ALERT!!")
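The loop above compares lengths against a hard-coded 97. As a hedged alternative, the sketch below compares each record's keys with those of the first record and reports where they differ. The helper name `find_inconsistent_records` and the toy records are hypothetical, not part of the original notebook.

```python
def find_inconsistent_records(records):
    """Return (position, sorted keys) for records whose keys differ from the first record's."""
    if not records:
        return []
    expected_keys = set(records[0])
    return [
        (position, sorted(record))
        for position, record in enumerate(records)
        if set(record) != expected_keys
    ]


# Toy records for illustration only; the third one is missing "feat_1"
toy_records = [
    {"id": 1, "feat_1": 0.37, "bankrupt": True},
    {"id": 2, "feat_1": 0.46, "bankrupt": False},
    {"id": 3, "bankrupt": False},
]

problems = find_inconsistent_records(toy_records)
print(problems)
```

Comparing key sets is stricter than comparing lengths: two records with the same number of fields but different field names would pass the length check and still be flagged here. On the real data, the call would be `find_inconsistent_records(taiwan_data["data"])`.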

In [None]:
# Create a DataFrame `df` that contains all the companies in the dataset, indexed by "id"
df = pd.DataFrame.from_dict(taiwan_data["data"]).set_index("id")
print(df.shape)
df.head()

(6819, 96)


| id | feat_1 | feat_2 | feat_3 | feat_4 | feat_5 | feat_6 | feat_7 | feat_8 | feat_9 | feat_10 | ... | feat_87 | feat_88 | feat_89 | feat_90 | feat_91 | feat_92 | feat_93 | feat_94 | feat_95 | bankrupt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.370594 | 0.424389 | 0.40575 | 0.601457 | 0.601457 | 0.998969 | 0.796887 | 0.808809 | 0.302646 | 0.780985 | ... | 0.009219 | 0.622879 | 0.601453 | 0.82789 | 0.290202 | 0.026601 | 0.56405 | 1 | 0.016469 | True |
| 2 | 0.464291 | 0.538214 | 0.51673 | 0.610235 | 0.610235 | 0.998946 | 0.79738 | 0.809301 | 0.303556 | 0.781506 | ... | 0.008323 | 0.623652 | 0.610237 | 0.839969 | 0.283846 | 0.264577 | 0.570175 | 1 | 0.020794 | True |
| 3 | 0.426071 | 0.499019 | 0.472295 | 0.60145 | 0.601364 | 0.998857 | 0.796403 | 0.808388 | 0.302035 | 0.780284 | ... | 0.040003 | 0.623841 | 0.601449 | 0.836774 | 0.290189 | 0.026555 | 0.563706 | 1 | 0.016474 | True |
| 4 | 0.399844 | 0.451265 | 0.457733 | 0.583541 | 0.583541 | 0.9987 | 0.796967 | 0.808966 | 0.30335 | 0.781241 | ... | 0.003252 | 0.622929 | 0.583538 | 0.834697 | 0.281721 | 0.026697 | 0.564663 | 1 | 0.023982 | True |
| 5 | 0.465022 | 0.538432 | 0.522298 | 0.598783 | 0.598783 | 0.998973 | 0.797366 | 0.809304 | 0.303475 | 0.78155 | ... | 0.003878 | 0.623521 | 0.598782 | 0.839973 | 0.278514 | 0.024752 | 0.575617 | 1 | 0.03549 | True |


## Import

In [None]:
def wrangle(filename):
    """
    Load a decompressed JSON file and return a tidy DataFrame.

    Parameters
    ----------
    filename : str
        The name of a decompressed JSON file

    Returns
    -------
    df : pd.DataFrame
        One row per company, indexed by "id"
    """
    # Open the decompressed file and load it into a dict
    with open(filename, "r") as f:
        data = json.load(f)

    # Turn the list of records under "data" into a DataFrame
    df = pd.DataFrame.from_dict(data["data"]).set_index("id")

    return df
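One way to sanity-check a function like `wrangle` without the full dataset is to run it on a tiny file with the same three-key layout. The sketch below uses a condensed version of the function and a made-up toy file (`toy-bankruptcy-data.json` and its contents are hypothetical) written to a temporary directory.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def wrangle(filename):
    """Condensed version of the notebook's function: JSON file in, DataFrame out."""
    with open(filename, "r") as f:
        data = json.load(f)
    return pd.DataFrame(data["data"]).set_index("id")


# Invented toy content that mimics the schema/data/metadata layout
toy = {
    "schema": {},
    "data": [
        {"id": 1, "feat_1": 0.37, "bankrupt": True},
        {"id": 2, "feat_1": 0.46, "bankrupt": False},
    ],
    "metadata": {},
}

# Write the toy file to a temporary directory and load it back through wrangle
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = Path(tmp_dir) / "toy-bankruptcy-data.json"
    toy_path.write_text(json.dumps(toy))
    toy_df = wrangle(toy_path)

print(toy_df.shape)
print(toy_df.index.name)
```

With two records of three fields each, the result should have two rows and two columns, since `id` moves from a column into the index. The same reasoning explains why 97 fields per company become 96 columns in the full dataset.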

In [None]:
# Read the data into dataframe `df`
df = wrangle("taiwan-bankruptcy-data.json")
print(df.shape)
df.head()

(6819, 96)


| id | feat_1 | feat_2 | feat_3 | feat_4 | feat_5 | feat_6 | feat_7 | feat_8 | feat_9 | feat_10 | ... | feat_87 | feat_88 | feat_89 | feat_90 | feat_91 | feat_92 | feat_93 | feat_94 | feat_95 | bankrupt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.370594 | 0.424389 | 0.40575 | 0.601457 | 0.601457 | 0.998969 | 0.796887 | 0.808809 | 0.302646 | 0.780985 | ... | 0.009219 | 0.622879 | 0.601453 | 0.82789 | 0.290202 | 0.026601 | 0.56405 | 1 | 0.016469 | True |
| 2 | 0.464291 | 0.538214 | 0.51673 | 0.610235 | 0.610235 | 0.998946 | 0.79738 | 0.809301 | 0.303556 | 0.781506 | ... | 0.008323 | 0.623652 | 0.610237 | 0.839969 | 0.283846 | 0.264577 | 0.570175 | 1 | 0.020794 | True |
| 3 | 0.426071 | 0.499019 | 0.472295 | 0.60145 | 0.601364 | 0.998857 | 0.796403 | 0.808388 | 0.302035 | 0.780284 | ... | 0.040003 | 0.623841 | 0.601449 | 0.836774 | 0.290189 | 0.026555 | 0.563706 | 1 | 0.016474 | True |
| 4 | 0.399844 | 0.451265 | 0.457733 | 0.583541 | 0.583541 | 0.9987 | 0.796967 | 0.808966 | 0.30335 | 0.781241 | ... | 0.003252 | 0.622929 | 0.583538 | 0.834697 | 0.281721 | 0.026697 | 0.564663 | 1 | 0.023982 | True |
| 5 | 0.465022 | 0.538432 | 0.522298 | 0.598783 | 0.598783 | 0.998973 | 0.797366 | 0.809304 | 0.303475 | 0.78155 | ... | 0.003878 | 0.623521 | 0.598782 | 0.839973 | 0.278514 | 0.024752 | 0.575617 | 1 | 0.03549 | True |
