# 1.2 - Intro Pandas (Panel Data)

**[Documentation](https://pandas.pydata.org/docs/reference/index.html#api)**

**[Source code](https://github.com/pandas-dev/pandas)**


![pandas](images/pandas.png)


Pandas is a Python library specialized in handling and analyzing structured data.


The main features of this library are:

+ It defines new data structures based on NumPy arrays, with additional functionality.
+ It makes it easy to read and write files in CSV and Excel formats, as well as SQL databases.
+ It allows access to data through indexes or names for rows and columns.
+ It offers methods to reorder, split, and combine datasets.
+ It supports working with time series.
+ It performs all of these operations efficiently.


**Pandas data types**

Pandas provides two different data structures:

+ Series: a one-dimensional structure.
+ DataFrame: a two-dimensional structure (tables).

These structures are built on top of NumPy arrays and add new functionality.
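To make the relationship with NumPy concrete, here is a minimal sketch. The names `arr`, `s`, and `df_demo` are illustrative and not part of the notebook; `to_numpy()` is used to get back the underlying values.

```python
import numpy as np
import pandas as pd

arr = np.array([1.5, 2.5, 3.5])

# A Series wraps a one-dimensional array and adds an index.
s = pd.Series(arr, index=["x", "y", "z"])

# A DataFrame wraps a two-dimensional array and adds row and column labels.
df_demo = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

# to_numpy() returns the values as a plain NumPy array.
print(type(s.to_numpy()))
print(df_demo.to_numpy().shape)
```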

In [1]:
import numpy as np

In [2]:
import pandas as pd

### Series

A Series is similar to a one-dimensional array. It is homogeneous, meaning all of its elements must be of the same type, and its size is immutable: the length cannot be changed, although the contents can.

It has an index that associates a name with each element of the series, and elements are accessed through that index.

In [3]:
lst=[(5.3*x)**2 for x in range(5)]

lst

[0.0, 28.09, 112.36, 252.80999999999995, 449.44]

In [4]:
serie=pd.Series(lst)

serie

0      0.00
1     28.09
2    112.36
3    252.81
4    449.44
dtype: float64

In [5]:
serie.head(2)  # head: first elements of the series (5 by default)

0     0.00
1    28.09
dtype: float64

In [6]:
serie.tail()  # tail: last elements of the series (5 by default)

0      0.00
1     28.09
2    112.36
3    252.81
4    449.44
dtype: float64

In [7]:
serie.index=['a', 'b', 'c', 'd', 'e']

serie

a      0.00
b     28.09
c    112.36
d    252.81
e    449.44
dtype: float64
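Once the index carries labels, elements can be read by label or by position. The following is a small sketch using a shortened, hypothetical copy of the series (`s` is not a name from the notebook):

```python
import pandas as pd

# Hypothetical short series mirroring the first values shown above.
s = pd.Series([0.0, 28.09, 112.36], index=["a", "b", "c"])

by_label = s["b"]             # label-based access
by_loc = s.loc["b"]           # explicit label-based access
by_position = s.iloc[1]       # position-based access
label_slice = s.loc["a":"b"]  # label slices include both endpoints

print(by_label, by_loc, by_position)
print(label_slice)
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the index labels are themselves integers.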

### DataFrame

A DataFrame object defines a dataset structured as a table in which each column is a Series object, so all values in a column share the same type, while the rows are records that can contain values of different types.

A DataFrame has two indexes, one for the rows and one for the columns, and its elements can be accessed by row and column names.
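As a rough illustration of the two indexes, the sketch below builds a tiny frame with made-up data (`people`, its labels, and its values are all hypothetical) and reads a single cell by name and by position:

```python
import pandas as pd

# Hypothetical data: row labels are names, column labels are attributes.
people = pd.DataFrame(
    {"age": [31, 45], "city": ["Madrid", "Lima"]},
    index=["ana", "luis"],
)

cell = people.loc["luis", "city"]  # access by row and column label
same_cell = people.iloc[1, 1]      # same cell by integer position
row = people.loc["ana"]            # a whole row, returned as a Series

print(cell, same_cell)
print(row)
```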

In [8]:
columnas=['col1', 'col2', 'col3', 'col4', 'col5']

data=np.random.random((10, 5))

df=pd.DataFrame(data, columns=columnas)

df

Unnamed: 0,col1,col2,col3,col4,col5
0,0.197014,0.195568,0.476033,0.420288,0.967903
1,0.488386,0.156688,0.710609,0.126989,0.335465
2,0.403296,0.041905,0.316433,0.191518,0.182222
3,0.101404,0.675866,0.78586,0.790726,0.533501
4,0.679483,0.17851,0.251299,0.478212,0.555522
5,0.705982,0.83225,0.439943,0.479732,0.873714
6,0.209907,0.549749,0.576421,0.874932,0.439795
7,0.992395,0.870327,0.943329,0.191747,0.703613
8,0.077877,0.286402,0.227941,0.391256,0.974356
9,0.460582,0.469207,0.339793,0.473824,0.064767


In [9]:
df['col2']

0    0.195568
1    0.156688
2    0.041905
3    0.675866
4    0.178510
5    0.832250
6    0.549749
7    0.870327
8    0.286402
9    0.469207
Name: col2, dtype: float64

In [10]:
df.col2

0    0.195568
1    0.156688
2    0.041905
3    0.675866
4    0.178510
5    0.832250
6    0.549749
7    0.870327
8    0.286402
9    0.469207
Name: col2, dtype: float64

In [11]:
df[['col3', 'col5']]  # select several columns at once

Unnamed: 0,col3,col5
0,0.476033,0.967903
1,0.710609,0.335465
2,0.316433,0.182222
3,0.78586,0.533501
4,0.251299,0.555522
5,0.439943,0.873714
6,0.576421,0.439795
7,0.943329,0.703613
8,0.227941,0.974356
9,0.339793,0.064767


In [12]:
df['col10']=df.col1 * df.col2 + 3*df.col5

df

Unnamed: 0,col1,col2,col3,col4,col5,col10
0,0.197014,0.195568,0.476033,0.420288,0.967903,2.942238
1,0.488386,0.156688,0.710609,0.126989,0.335465,1.082919
2,0.403296,0.041905,0.316433,0.191518,0.182222,0.563567
3,0.101404,0.675866,0.78586,0.790726,0.533501,1.66904
4,0.679483,0.17851,0.251299,0.478212,0.555522,1.78786
5,0.705982,0.83225,0.439943,0.479732,0.873714,3.208696
6,0.209907,0.549749,0.576421,0.874932,0.439795,1.43478
7,0.992395,0.870327,0.943329,0.191747,0.703613,2.974547
8,0.077877,0.286402,0.227941,0.391256,0.974356,2.945371
9,0.460582,0.469207,0.339793,0.473824,0.064767,0.41041


In [13]:
df['col10'] = df['col10'] / 10  # bracket syntax is the safer way to assign to a column

df

Unnamed: 0,col1,col2,col3,col4,col5,col10
0,0.197014,0.195568,0.476033,0.420288,0.967903,0.294224
1,0.488386,0.156688,0.710609,0.126989,0.335465,0.108292
2,0.403296,0.041905,0.316433,0.191518,0.182222,0.056357
3,0.101404,0.675866,0.78586,0.790726,0.533501,0.166904
4,0.679483,0.17851,0.251299,0.478212,0.555522,0.178786
5,0.705982,0.83225,0.439943,0.479732,0.873714,0.32087
6,0.209907,0.549749,0.576421,0.874932,0.439795,0.143478
7,0.992395,0.870327,0.943329,0.191747,0.703613,0.297455
8,0.077877,0.286402,0.227941,0.391256,0.974356,0.294537
9,0.460582,0.469207,0.339793,0.473824,0.064767,0.041041


In [14]:
lst=[3*x for x in range(8)]

lst.append(None)   # append null values
lst.append(None)

lst

[0, 3, 6, 9, 12, 15, 18, 21, None, None]

In [15]:
df['lista']=lst

df

Unnamed: 0,col1,col2,col3,col4,col5,col10,lista
0,0.197014,0.195568,0.476033,0.420288,0.967903,0.294224,0.0
1,0.488386,0.156688,0.710609,0.126989,0.335465,0.108292,3.0
2,0.403296,0.041905,0.316433,0.191518,0.182222,0.056357,6.0
3,0.101404,0.675866,0.78586,0.790726,0.533501,0.166904,9.0
4,0.679483,0.17851,0.251299,0.478212,0.555522,0.178786,12.0
5,0.705982,0.83225,0.439943,0.479732,0.873714,0.32087,15.0
6,0.209907,0.549749,0.576421,0.874932,0.439795,0.143478,18.0
7,0.992395,0.870327,0.943329,0.191747,0.703613,0.297455,21.0
8,0.077877,0.286402,0.227941,0.391256,0.974356,0.294537,
9,0.460582,0.469207,0.339793,0.473824,0.064767,0.041041,


In [16]:
df.dtypes   # data type of each column

col1     float64
col2     float64
col3     float64
col4     float64
col5     float64
col10    float64
lista    float64
dtype: object

In [17]:
df.info()  # summary of the DataFrame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   col1    10 non-null     float64
 1   col2    10 non-null     float64
 2   col3    10 non-null     float64
 3   col4    10 non-null     float64
 4   col5    10 non-null     float64
 5   col10   10 non-null     float64
 6   lista   8 non-null      float64
dtypes: float64(7)
memory usage: 688.0 bytes


In [18]:
df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   col1    10 non-null     float64
 1   col2    10 non-null     float64
 2   col3    10 non-null     float64
 3   col4    10 non-null     float64
 4   col5    10 non-null     float64
 5   col10   10 non-null     float64
 6   lista   8 non-null      float64
dtypes: float64(7)
memory usage: 688.0 bytes
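Both calls above report the same size because every column is `float64`. The `deep` option only makes a difference for `object` columns, where it also counts the Python objects being referenced. A minimal sketch with hypothetical frames `words` and `nums`:

```python
import pandas as pd

# Hypothetical frame with an object column of Python strings.
words = pd.DataFrame(
    {"word": pd.Series(["hola", "adios", "pandas"] * 100, dtype=object)}
)
shallow = words.memory_usage(deep=False)["word"]  # only the array of pointers
deep = words.memory_usage(deep=True)["word"]      # also the string objects
print(shallow, deep)

# For a purely numeric column both measurements agree.
nums = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
print(nums.memory_usage(deep=False)["x"], nums.memory_usage(deep=True)["x"])
```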


In [19]:
df=df.fillna('hola')

df

Unnamed: 0,col1,col2,col3,col4,col5,col10,lista
0,0.197014,0.195568,0.476033,0.420288,0.967903,0.294224,0.0
1,0.488386,0.156688,0.710609,0.126989,0.335465,0.108292,3.0
2,0.403296,0.041905,0.316433,0.191518,0.182222,0.056357,6.0
3,0.101404,0.675866,0.78586,0.790726,0.533501,0.166904,9.0
4,0.679483,0.17851,0.251299,0.478212,0.555522,0.178786,12.0
5,0.705982,0.83225,0.439943,0.479732,0.873714,0.32087,15.0
6,0.209907,0.549749,0.576421,0.874932,0.439795,0.143478,18.0
7,0.992395,0.870327,0.943329,0.191747,0.703613,0.297455,21.0
8,0.077877,0.286402,0.227941,0.391256,0.974356,0.294537,hola
9,0.460582,0.469207,0.339793,0.473824,0.064767,0.041041,hola


In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   col1    10 non-null     float64
 1   col2    10 non-null     float64
 2   col3    10 non-null     float64
 3   col4    10 non-null     float64
 4   col5    10 non-null     float64
 5   col10   10 non-null     float64
 6   lista   10 non-null     object 
dtypes: float64(6), object(1)
memory usage: 688.0+ bytes
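Filling with a string turned `lista` into an `object` column, as the output above shows. If the column should stay numeric, one option is to fill with a number instead. The sketch below assumes the column mean is an acceptable fill value, which is a choice made here for illustration and not something the notebook prescribes:

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing values.
col = pd.Series([0.0, 3.0, np.nan, np.nan])

n_missing = col.isna().sum()          # count the nulls before filling
filled_num = col.fillna(col.mean())   # mean() skips NaN by default

print(n_missing)
print(filled_num.dtype)
print(filled_num.tolist())
```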


In [21]:
# build a DataFrame from a list of lists (each inner list is a row)

lst_lst=[
    [325235, 'hola', 6346363],
    [32523, 'adios', 43634646],
    [12323, 'oooôooo', 665454]
]

columnas=['num', 'palabra', 'num2']

In [22]:
df_lst=pd.DataFrame(lst_lst, columns=columnas)

df_lst

Unnamed: 0,num,palabra,num2
0,325235,hola,6346363
1,32523,adios,43634646
2,12323,oooôooo,665454


In [23]:
# build a DataFrame from a dict (each key becomes a column)

dictio={'casa': lst_lst[0], 'oficina': lst_lst[1], 'clase': lst_lst[2]}

dictio

{'casa': [325235, 'hola', 6346363],
 'oficina': [32523, 'adios', 43634646],
 'clase': [12323, 'oooôooo', 665454]}

In [24]:
df_dictio=pd.DataFrame(dictio)

df_dictio

Unnamed: 0,casa,oficina,clase
0,325235,32523,12323
1,hola,adios,oooôooo
2,6346363,43634646,665454


In [25]:
df_dictio=pd.DataFrame.from_dict(dictio)

df_dictio

Unnamed: 0,casa,oficina,clase
0,325235,32523,12323
1,hola,adios,oooôooo
2,6346363,43634646,665454
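In both outputs above the dict keys became columns. If the keys should become row labels instead, `from_dict` accepts `orient="index"`. A short sketch reusing two of the entries (the variable names `records` and `by_rows` are illustrative):

```python
import pandas as pd

records = {
    "casa": [325235, "hola", 6346363],
    "oficina": [32523, "adios", 43634646],
}

# orient="index" treats each key as a row label; columns names the fields.
by_rows = pd.DataFrame.from_dict(
    records, orient="index", columns=["num", "palabra", "num2"]
)
print(by_rows)
```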


### Operations


In [26]:
df

Unnamed: 0,col1,col2,col3,col4,col5,col10,lista
0,0.197014,0.195568,0.476033,0.420288,0.967903,0.294224,0.0
1,0.488386,0.156688,0.710609,0.126989,0.335465,0.108292,3.0
2,0.403296,0.041905,0.316433,0.191518,0.182222,0.056357,6.0
3,0.101404,0.675866,0.78586,0.790726,0.533501,0.166904,9.0
4,0.679483,0.17851,0.251299,0.478212,0.555522,0.178786,12.0
5,0.705982,0.83225,0.439943,0.479732,0.873714,0.32087,15.0
6,0.209907,0.549749,0.576421,0.874932,0.439795,0.143478,18.0
7,0.992395,0.870327,0.943329,0.191747,0.703613,0.297455,21.0
8,0.077877,0.286402,0.227941,0.391256,0.974356,0.294537,hola
9,0.460582,0.469207,0.339793,0.473824,0.064767,0.041041,hola


In [27]:
df.transpose()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
col1,0.197014,0.488386,0.403296,0.101404,0.679483,0.705982,0.209907,0.992395,0.077877,0.460582
col2,0.195568,0.156688,0.041905,0.675866,0.17851,0.83225,0.549749,0.870327,0.286402,0.469207
col3,0.476033,0.710609,0.316433,0.78586,0.251299,0.439943,0.576421,0.943329,0.227941,0.339793
col4,0.420288,0.126989,0.191518,0.790726,0.478212,0.479732,0.874932,0.191747,0.391256,0.473824
col5,0.967903,0.335465,0.182222,0.533501,0.555522,0.873714,0.439795,0.703613,0.974356,0.064767
col10,0.294224,0.108292,0.056357,0.166904,0.178786,0.32087,0.143478,0.297455,0.294537,0.041041
lista,0.0,3.0,6.0,9.0,12.0,15.0,18.0,21.0,hola,hola


In [28]:
df.T  # transpose (shorthand for df.transpose())

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
col1,0.197014,0.488386,0.403296,0.101404,0.679483,0.705982,0.209907,0.992395,0.077877,0.460582
col2,0.195568,0.156688,0.041905,0.675866,0.17851,0.83225,0.549749,0.870327,0.286402,0.469207
col3,0.476033,0.710609,0.316433,0.78586,0.251299,0.439943,0.576421,0.943329,0.227941,0.339793
col4,0.420288,0.126989,0.191518,0.790726,0.478212,0.479732,0.874932,0.191747,0.391256,0.473824
col5,0.967903,0.335465,0.182222,0.533501,0.555522,0.873714,0.439795,0.703613,0.974356,0.064767
col10,0.294224,0.108292,0.056357,0.166904,0.178786,0.32087,0.143478,0.297455,0.294537,0.041041
lista,0.0,3.0,6.0,9.0,12.0,15.0,18.0,21.0,hola,hola


In [29]:
df.sum(numeric_only=True)  # sum of each column; numeric_only skips the object column 'lista'

col1     4.316325
col2     4.256474
col3     5.067661
col4     4.419224
col5     5.630858
col10    1.901943
dtype: float64

In [30]:
df.std(numeric_only=True)  # standard deviation of each numeric column

col1     0.297100
col2     0.297391
col3     0.242076
col4     0.245047
col5     0.318081
col10    0.105416
dtype: float64

In [31]:
df.var(numeric_only=True)  # variance of each numeric column

col1     0.088268
col2     0.088441
col3     0.058601
col4     0.060048
col5     0.101176
col10    0.011113
dtype: float64

In [32]:
df.mean(numeric_only=True)  # mean of each numeric column

col1     0.431632
col2     0.425647
col3     0.506766
col4     0.441922
col5     0.563086
col10    0.190194
dtype: float64

In [33]:
df.median(numeric_only=True)  # median of each numeric column

col1     0.431939
col2     0.377805
col3     0.457988
col4     0.447056
col5     0.544512
col10    0.172845
dtype: float64

In [34]:
df.mode()

Unnamed: 0,col1,col2,col3,col4,col5,col10,lista
0,0.077877,0.041905,0.227941,0.126989,0.064767,0.041041,hola
1,0.101404,0.156688,0.251299,0.191518,0.182222,0.056357,
2,0.197014,0.17851,0.316433,0.191747,0.335465,0.108292,
3,0.209907,0.195568,0.339793,0.391256,0.439795,0.143478,
4,0.403296,0.286402,0.439943,0.420288,0.533501,0.166904,
5,0.460582,0.469207,0.476033,0.473824,0.555522,0.178786,
6,0.488386,0.549749,0.576421,0.478212,0.703613,0.294224,
7,0.679483,0.675866,0.710609,0.479732,0.873714,0.294537,
8,0.705982,0.83225,0.78586,0.790726,0.967903,0.297455,
9,0.992395,0.870327,0.943329,0.874932,0.974356,0.32087,


In [35]:
df.max(numeric_only=True)  # maximum of each numeric column

col1     0.992395
col2     0.870327
col3     0.943329
col4     0.874932
col5     0.974356
col10    0.320870
dtype: float64

In [36]:
df.min(numeric_only=True)  # minimum of each numeric column

col1     0.077877
col2     0.041905
col3     0.227941
col4     0.126989
col5     0.064767
col10    0.041041
dtype: float64

In [37]:
df.min(axis=0, numeric_only=True)  # axis=0: one result per column (the default)

col1     0.077877
col2     0.041905
col3     0.227941
col4     0.126989
col5     0.064767
col10    0.041041
dtype: float64

In [38]:
df.min(axis=1, numeric_only=True)  # axis=1: one result per row

0    0.195568
1    0.108292
2    0.041905
3    0.101404
4    0.178510
5    0.320870
6    0.143478
7    0.191747
8    0.077877
9    0.041041
dtype: float64
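The `axis` argument is easier to follow on a frame small enough to check by hand. A minimal sketch with a hypothetical 2x2 frame named `grid`:

```python
import pandas as pd

grid = pd.DataFrame({"a": [1, 5], "b": [4, 2]})

col_min = grid.min(axis=0)  # collapses the rows: one value per column
row_min = grid.min(axis=1)  # collapses the columns: one value per row

# The string aliases "index" and "columns" are equivalent to 0 and 1.
row_min_alias = grid.min(axis="columns")

print(col_min)
print(row_min)
```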

### Importing files

+ CSV
+ XLSX
+ XLS
+ JSON
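The readers below depend on files under `../data/`. For a version that runs without any file, the sketch here reads CSV text from an in-memory buffer; the contents of `csv_text` are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV content using ';' as the separator.
csv_text = "id;name;price\n1;lamp;9.5\n2;chair;30.0\n"

# read_csv accepts any file-like object, not only a path.
df_demo = pd.read_csv(io.StringIO(csv_text), sep=";")

# to_csv returns a string when no path is given.
out = df_demo.to_csv(index=False)

print(df_demo)
print(out)
```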

In [52]:
# csv

df_csv=pd.read_csv('../data/vehicles_messy.csv', sep=',')

df_csv.head()


Unnamed: 0,barrels08,barrelsA08,charge120,charge240,city08,city08U,cityA08,cityA08U,cityCD,cityE,...,mfrCode,c240Dscr,charge240b,c240bDscr,createdOn,modifiedOn,startStop,phevCity,phevHwy,phevComb
0,15.695714,0.0,0.0,0.0,19,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0
1,29.964545,0.0,0.0,0.0,9,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0
2,12.207778,0.0,0.0,0.0,23,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0
3,29.964545,0.0,0.0,0.0,10,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0
4,17.347895,0.0,0.0,0.0,17,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0


In [42]:
# pandas needs xlrd to read .xls files and openpyxl to read .xlsx files
#!pip install xlrd
#!pip install openpyxl



In [43]:
# xlsx

df_xlsx=pd.read_excel('../data/Online Retail.xlsx')

df_xlsx.head()

Unnamed: 0,InvoiceNo,InvoiceDate,StockCode,Description,Quantity,UnitPrice,Revenue,CustomerID,Country
0,536365,2010-12-01 08:26:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,6,2.55,15.3,17850,United Kingdom
1,536373,2010-12-01 09:02:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,6,2.55,15.3,17850,United Kingdom
2,536375,2010-12-01 09:32:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,6,2.55,15.3,17850,United Kingdom
3,536390,2010-12-01 10:19:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,64,2.55,163.2,17511,United Kingdom
4,536394,2010-12-01 10:39:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,32,2.55,81.6,13408,United Kingdom


In [45]:
df_xlsx.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 396034 entries, 0 to 396033
Data columns (total 9 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   InvoiceNo    396034 non-null  int64         
 1   InvoiceDate  396034 non-null  datetime64[ns]
 2   StockCode    396034 non-null  object        
 3   Description  396034 non-null  object        
 4   Quantity     396034 non-null  int64         
 5   UnitPrice    396034 non-null  float64       
 6   Revenue      396034 non-null  float64       
 7   CustomerID   396034 non-null  int64         
 8   Country      396034 non-null  object        
dtypes: datetime64[ns](1), float64(2), int64(3), object(3)
memory usage: 90.7 MB


In [46]:
# xls

df_xls=pd.read_excel('../data/Sensor Data.xls')  # reads the first sheet by default

df_xls.head()

Unnamed: 0,Input 1,Input 2,Input 3,Input 4,Input 5,Input 6,Input 7,Input 8,Input 9,Input 10,Input 11,Input 12,output1,output2,class
0,1.473,2.311,3.179,2.666,0.2795,0.2771,0.2234,0.1855,0.2539,1.138,1.111,4.712,1,1,one
1,1.46,2.377,3.214,2.92,0.2527,0.3064,0.02563,0.1965,0.3027,1.213,1.027,5.463,1,1,one
2,1.552,2.164,3.064,2.745,0.282,0.21,0.1721,0.1929,0.21,1.221,1.058,5.332,1,1,one
3,1.605,2.228,3.149,2.834,0.2917,0.3613,0.2087,0.1294,0.2734,1.144,1.062,4.829,1,1,one
4,1.534,2.114,3.309,2.976,0.21,0.2502,0.2258,0.177,0.2039,1.254,1.112,5.734,1,1,one


In [47]:
df_xls_1=pd.read_excel('../data/Sensor Data.xls', sheet_name='Sheet1')  # first sheet, by name

df_xls_1.head()

Unnamed: 0,Input 1,Input 2,Input 3,Input 4,Input 5,Input 6,Input 7,Input 8,Input 9,Input 10,Input 11,Input 12,output1,output2,class
0,1.473,2.311,3.179,2.666,0.2795,0.2771,0.2234,0.1855,0.2539,1.138,1.111,4.712,1,1,one
1,1.46,2.377,3.214,2.92,0.2527,0.3064,0.02563,0.1965,0.3027,1.213,1.027,5.463,1,1,one
2,1.552,2.164,3.064,2.745,0.282,0.21,0.1721,0.1929,0.21,1.221,1.058,5.332,1,1,one
3,1.605,2.228,3.149,2.834,0.2917,0.3613,0.2087,0.1294,0.2734,1.144,1.062,4.829,1,1,one
4,1.534,2.114,3.309,2.976,0.21,0.2502,0.2258,0.177,0.2039,1.254,1.112,5.734,1,1,one


In [48]:
df_xls_2=pd.read_excel('../data/Sensor Data.xls', sheet_name='Sheet2')  # second sheet

df_xls_2.head()

Unnamed: 0,Sensor Data
0,The data source as well as the exact nature of...
1,Each data instance contains 12 real-valued inp...
2,represents a sensor designed to detect the pre...
3,"of substances. As an alternative, the sensor r..."
4,Substance 1 is represented by the value 'one' ...


In [49]:
df_xls_3=pd.read_excel('../data/Sensor Data.xls', sheet_name='Sheet3')  # third sheet

df_xls_3.head()

Unnamed: 0,hola


In [50]:
df_xls_3.info()

<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   hola    0 non-null      object
dtypes: object(1)
memory usage: 0.0+ bytes


In [53]:
# json

df_json=pd.read_json('../data/companies.json', lines=True, orient='records')

df_json.head()

Unnamed: 0,_id,name,permalink,crunchbase_url,homepage_url,blog_url,blog_feed_url,twitter_username,category_code,number_of_employees,...,offices,milestones,video_embeds,screenshots,external_links,partners,deadpooled_month,deadpooled_day,deadpooled_url,ipo
0,{'$oid': '52cdef7c4bab8bd675297d8a'},Wetpaint,abc2,http://www.crunchbase.com/company/wetpaint,http://wetpaint-inc.com,http://digitalquarters.net/,http://digitalquarters.net/feed/,BachelrWetpaint,web,47.0,...,"[{'description': '', 'address1': '710 - 2nd Av...","[{'id': 5869, 'description': 'Wetpaint named i...",[],"[{'available_sizes': [[[150, 86], 'assets/imag...",[{'external_url': 'http://www.geekwire.com/201...,[],,,,
1,{'$oid': '52cdef7c4bab8bd675297d8b'},AdventNet,abc3,http://www.crunchbase.com/company/adventnet,http://adventnet.com,,,manageengine,enterprise,600.0,...,"[{'description': 'Headquarters', 'address1': '...",[],[],"[{'available_sizes': [[[150, 94], 'assets/imag...",[],[],,,,
2,{'$oid': '52cdef7c4bab8bd675297d8c'},Zoho,abc4,http://www.crunchbase.com/company/zoho,http://zoho.com,http://blogs.zoho.com/,http://blogs.zoho.com/feed,zoho,software,1600.0,...,"[{'description': 'Headquarters', 'address1': '...","[{'id': 388, 'description': 'Zoho Reaches 2 Mi...","[{'embed_code': '<object width=""430"" height=""2...",[],[{'external_url': 'http://www.online-tech-tips...,[],,,,
3,{'$oid': '52cdef7c4bab8bd675297d8d'},Digg,digg,http://www.crunchbase.com/company/digg,http://www.digg.com,http://blog.digg.com/,http://blog.digg.com/?feed=rss2,digg,news,60.0,...,"[{'description': None, 'address1': '135 Missis...","[{'id': 9588, 'description': 'Another Digg Exe...","[{'embed_code': '<embed src=""http://blip.tv/pl...","[{'available_sizes': [[[117, 150], 'assets/ima...",[{'external_url': 'http://www.sociableblog.com...,[],,,,
4,{'$oid': '52cdef7c4bab8bd675297d8e'},Facebook,facebook,http://www.crunchbase.com/company/facebook,http://facebook.com,http://blog.facebook.com,http://blog.facebook.com/atom.php,facebook,social,5299.0,...,"[{'description': 'Headquarters', 'address1': '...","[{'id': 108, 'description': 'Facebook adds com...",[],"[{'available_sizes': [[[150, 68], 'assets/imag...",[{'external_url': 'http://latimesblogs.latimes...,[],,,,"{'valuation_amount': 104000000000, 'valuation_..."


In [55]:
df_json.offices[0]

[{'description': '',
  'address1': '710 - 2nd Avenue',
  'address2': 'Suite 1100',
  'zip_code': '98104',
  'city': 'Seattle',
  'state_code': 'WA',
  'country_code': 'USA',
  'latitude': 47.603122,
  'longitude': -122.333253},
 {'description': '',
  'address1': '270 Lafayette Street',
  'address2': 'Suite 505',
  'zip_code': '10012',
  'city': 'New York',
  'state_code': 'NY',
  'country_code': 'USA',
  'latitude': 40.7237306,
  'longitude': -73.9964312}]
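Each `offices` cell holds a list of dicts, so the column is nested. One possible way to flatten it into one row per office is `pd.json_normalize`. The sketch below uses a hand-written sample: the first record copies a few fields from the output above, while `Example Co` and its office are hypothetical.

```python
import pandas as pd

# Small stand-in for the records in companies.json.
companies = [
    {
        "name": "Wetpaint",
        "offices": [
            {"city": "Seattle", "state_code": "WA"},
            {"city": "New York", "state_code": "NY"},
        ],
    },
    {
        "name": "Example Co",  # hypothetical record
        "offices": [{"city": "Austin", "state_code": "TX"}],
    },
]

# record_path points at the nested list; meta carries parent fields along.
offices = pd.json_normalize(companies, record_path="offices", meta=["name"])
print(offices)
```

Each office becomes its own row, with the company name repeated alongside it.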