# Let's clean our data.
Our data is in a ```data.json``` file, so we load it with the ```read_json()``` function from ```pandas```, which gives us a dataframe to clean.


In [39]:
import json
import pandas as pd
import numpy as np

In [25]:
df = pd.read_json('data.json')
df.head(10)

| | titles | locations | prices |
|---|---|---|---|
| 0 | [5 Bed House with En Suite in Kamakis, 6 Bed H... | [Kamakis, Ruiru, Lamu Town, Lamu, lamu, 0105, ... | [18,500,000, 24,000,000, 195,000,000, 22,000,0... |


**There is a problem**

Our df has a single row, and each column holds a whole list of titles, locations or prices instead of one item per row.
This is because the titles, locations and prices lists are placed inside a single dictionary in ```scrape.py```:
```python

properties = []
titles = []
locations = []
prices = []

# other code

properties.append(
    {
        "titles": titles,
        "locations": locations,
        "prices": prices
    }
)

```
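
To see why this structure yields a single row, here is a minimal sketch with made-up toy values (the three listings below are placeholders, not rows from ```data.json```). It assumes pandas 1.3 or newer, since exploding several columns at once is not available in earlier versions.

```python
import pandas as pd

# Toy stand-in for what scrape.py builds: one dict whose values are whole lists.
properties = [
    {
        "titles": ["House A", "House B", "House C"],
        "locations": ["Ruiru", "Lamu", "Kilimani"],
        "prices": ["18,500,000", "24,000,000", "22,000,000"],
    }
]

# One dict in the outer list means one row; each cell holds an entire list.
nested = pd.DataFrame(properties)
print(nested.shape)

# When the lists have equal lengths, exploding all three columns together
# produces one row per listing.
flat = nested.explode(["titles", "locations", "prices"], ignore_index=True)
print(flat)
```

The explode step only works here because all three toy lists have the same length.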

So I'll need to explode the lists into individual rows. But when I tried, I got a ValueError saying the element counts of titles, locations and prices do not match, so let's check the length of each list in ```data.json```.

In [26]:
with open('data.json', 'r') as f:
    data = json.load(f)

In [27]:
len(data[0]["titles"])

2458

In [28]:
len(data[0]["locations"])

2458

In [34]:
len(data[0]["prices"])

2155
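
The three checks above can also be done in one pass. This is a small sketch on toy data; the ```data``` variable here is a made-up stand-in with the same shape as the parsed ```data.json```, not the real file.

```python
# Toy stand-in for the parsed JSON: a list holding one dict of lists.
data = [
    {
        "titles": ["House A", "House B", "House C"],
        "locations": ["Ruiru", "Lamu", "Kilimani"],
        "prices": ["18,500,000", "24,000,000"],  # one price missing
    }
]

# Collect the length of every list in a single dictionary.
lengths = {key: len(values) for key, values in data[0].items()}
print(lengths)

# More than one distinct length means the lists cannot be exploded together.
mismatched = len(set(lengths.values())) > 1
print(mismatched)
```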

**Problem identified.**

So the problem is that the ```prices``` list is shorter than both the ```titles``` and ```locations``` lists: 2155 prices against 2458 titles and locations.
I'll try to fill the ```prices``` column out with ```NaN``` so it lines up with the other two columns, then explode the lists into individual rows.

In [33]:
df.head(10)
df["prices"] = df["prices"].reindex(df.index, fill_value=np.nan)
df.head(10)

| | titles | locations | prices |
|---|---|---|---|
| 0 | [5 Bed House with En Suite in Kamakis, 6 Bed H... | [Kamakis, Ruiru, Lamu Town, Lamu, lamu, 0105, ... | [18,500,000, 24,000,000, 195,000,000, 22,000,0... |


ValueError: columns must have matching element counts
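
This ```ValueError``` is the message pandas raises when a multi-column ```explode``` is given lists of different lengths. The ```reindex``` call does not help because ```df.index``` only contains the single row ```0```, so it leaves the list stored inside that cell untouched. One possible way around it, sketched below on toy data, is to pad the lists themselves with ```NaN``` before building the dataframe. The ```record``` dictionary is a hypothetical stand-in for ```data[0]```, and padding at the end assumes the missing prices belong to the last listings, which may not be true for the real scrape.

```python
import numpy as np
import pandas as pd

# Toy stand-in for data[0]: the prices list is shorter than the others.
record = {
    "titles": ["House A", "House B", "House C"],
    "locations": ["Ruiru", "Lamu", "Kilimani"],
    "prices": ["18,500,000", "24,000,000"],
}

# Pad every list with NaN up to the length of the longest one.
target = max(len(values) for values in record.values())
padded = {
    key: values + [np.nan] * (target - len(values))
    for key, values in record.items()
}

# Equal-length lists can go straight into the constructor: one row per listing.
flat = pd.DataFrame(padded)
print(flat)
```

If the gaps are scattered through the list rather than sitting at the end, padding like this pairs prices with the wrong listings, so it is worth checking a few rows against the source before trusting the result.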