# 1.2 - Intro Pandas (Panel Data)

**[Documentation](https://pandas.pydata.org/docs/reference/index.html#api)**

**[Source code](https://github.com/pandas-dev/pandas)**


![pandas](images/pandas.png)


Pandas is a Python library for manipulating and analyzing structured data.


The main features of this library are:

+ It defines new data structures built on NumPy arrays, with additional functionality.
+ It reads and writes CSV files, Excel files, and SQL databases with little effort.
+ It gives access to data through indexes or names for rows and columns.
+ It provides methods for reordering, splitting, and combining datasets.
+ It supports time series.
+ It performs these operations efficiently, since most of them run as vectorized operations over the underlying arrays.


**Pandas data types**
Pandas provides two main data structures:

+ Series: a one-dimensional structure.
+ DataFrame: a two-dimensional structure (a table).

Both are built on top of NumPy arrays and add new functionality.

In [1]:
%pip install pandas

Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd

In [3]:
import numpy as np

### Series

A Series is similar to a one-dimensional array. It is homogeneous, meaning all its elements share the same type, and its size is immutable: the length cannot be changed, although the contents can.

It has an index that associates a label with each element of the Series, and elements are accessed through that label.

In [11]:
lst=[(3.4 + i)**2 for i in range(5)]   # plain Python list

lst

[11.559999999999999,
 19.360000000000003,
 29.160000000000004,
 40.96000000000001,
 54.760000000000005]

In [12]:
serie=pd.Series(lst)   # pandas Series

serie

0    11.56
1    19.36
2    29.16
3    40.96
4    54.76
dtype: float64

In [13]:
serie.head(2)  # first 2 elements (head() returns 5 by default)

0    11.56
1    19.36
dtype: float64

In [14]:
serie.tail(2)  # last 2 elements (tail() returns 5 by default)

3    40.96
4    54.76
dtype: float64

In [15]:
type(serie)

pandas.core.series.Series

In [16]:
serie.index

RangeIndex(start=0, stop=5, step=1)

In [17]:
serie.index=['a', 'b', 'c', 'd', 'e']

serie

a    11.56
b    19.36
c    29.16
d    40.96
e    54.76
dtype: float64
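A minimal sketch of what the index provides, assuming pandas is imported as `pd`. The name `demo` and its values are illustrative only, not part of the notebook's data. `.loc` selects by label, `.iloc` selects by position, and mixing value types forces a single common dtype.

```python
import pandas as pd

# Hypothetical Series used only for illustration
demo = pd.Series([11.56, 19.36, 29.16], index=['a', 'b', 'c'])

by_label = demo.loc['b']      # access through the index label
by_position = demo.iloc[1]    # access through the integer position

# A Series holds one dtype: mixing ints and floats upcasts everything to float64
mixed = pd.Series([1, 2.5, 3])

print(by_label, by_position, mixed.dtype)
```

Using `.loc` and `.iloc` explicitly avoids ambiguity once the index is no longer the default `RangeIndex`.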

### DataFrame

A DataFrame object is a dataset structured as a table in which every column is a Series, so all the values in a column share the same type, while each row is a record that can contain values of different types.

A DataFrame has two indexes, one for the rows and one for the columns, and its elements can be accessed through the row and column names.

In [18]:
columnas=['col1', 'col2', 'col3', 'col4', 'col5']

array=np.random.random((10, 5))

array

array([[0.84062666, 0.67006031, 0.38984766, 0.0107845 , 0.7947874 ],
       [0.49363968, 0.88770516, 0.03976694, 0.08735892, 0.22075188],
       [0.03737202, 0.6178582 , 0.31077542, 0.52270968, 0.67003806],
       [0.82506156, 0.45749756, 0.38034438, 0.45218132, 0.33363719],
       [0.17248669, 0.28098904, 0.99006379, 0.76127358, 0.44990164],
       [0.27082852, 0.58212226, 0.37081601, 0.38820873, 0.52360916],
       [0.5941512 , 0.09894564, 0.70235181, 0.86079993, 0.55953106],
       [0.11687177, 0.75172642, 0.20284161, 0.38540921, 0.10215713],
       [0.6865005 , 0.32619684, 0.69111007, 0.04429775, 0.34772003],
       [0.40592088, 0.48714884, 0.63962809, 0.79078766, 0.80664676]])

In [22]:
df=pd.DataFrame(array, columns=columnas)   # DataFrame (table) built from the array

df.head()

Unnamed: 0,col1,col2,col3,col4,col5
0,0.840627,0.67006,0.389848,0.010784,0.794787
1,0.49364,0.887705,0.039767,0.087359,0.220752
2,0.037372,0.617858,0.310775,0.52271,0.670038
3,0.825062,0.457498,0.380344,0.452181,0.333637
4,0.172487,0.280989,0.990064,0.761274,0.449902


In [23]:
df['col2']   # select a column by name

0    0.670060
1    0.887705
2    0.617858
3    0.457498
4    0.280989
5    0.582122
6    0.098946
7    0.751726
8    0.326197
9    0.487149
Name: col2, dtype: float64

In [24]:
df.col2

0    0.670060
1    0.887705
2    0.617858
3    0.457498
4    0.280989
5    0.582122
6    0.098946
7    0.751726
8    0.326197
9    0.487149
Name: col2, dtype: float64

In [25]:
df[['col2', 'col4']]

Unnamed: 0,col2,col4
0,0.67006,0.010784
1,0.887705,0.087359
2,0.617858,0.52271
3,0.457498,0.452181
4,0.280989,0.761274
5,0.582122,0.388209
6,0.098946,0.8608
7,0.751726,0.385409
8,0.326197,0.044298
9,0.487149,0.790788


In [26]:
df['col10']=df.col1 * df.col2 - df.col4

df.head()

Unnamed: 0,col1,col2,col3,col4,col5,col10
0,0.840627,0.67006,0.389848,0.010784,0.794787,0.552486
1,0.49364,0.887705,0.039767,0.087359,0.220752,0.350848
2,0.037372,0.617858,0.310775,0.52271,0.670038,-0.499619
3,0.825062,0.457498,0.380344,0.452181,0.333637,-0.074718
4,0.172487,0.280989,0.990064,0.761274,0.449902,-0.712807


In [34]:
df['ceros']=0.

df.head()

Unnamed: 0,col1,col2,col3,col4,col5,col10,ceros
0,0.840627,0.67006,0.389848,0.010784,0.794787,0.552486,0.0
1,0.49364,0.887705,0.039767,0.087359,0.220752,0.350848,0.0
2,0.037372,0.617858,0.310775,0.52271,0.670038,-0.499619,0.0
3,0.825062,0.457498,0.380344,0.452181,0.333637,-0.074718,0.0
4,0.172487,0.280989,0.990064,0.761274,0.449902,-0.712807,0.0


In [38]:
df.col10=df.col3 / df.col1 * 2

df.head()

Unnamed: 0,col1,col2,col3,col4,col5,col10,ceros
0,0.840627,0.67006,0.389848,0.010784,0.794787,0.927517,0.0
1,0.49364,0.887705,0.039767,0.087359,0.220752,0.161117,0.0
2,0.037372,0.617858,0.310775,0.52271,0.670038,16.631448,0.0
3,0.825062,0.457498,0.380344,0.452181,0.333637,0.921978,0.0
4,0.172487,0.280989,0.990064,0.761274,0.449902,11.479886,0.0
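Attribute-style assignment such as `df.col10 = ...` only works here because `col10` already exists. As a sketch with a hypothetical frame `demo`, assigning to a name that is not yet a column sets a plain Python attribute instead of creating a column, so bracket notation is the safer habit for new columns.

```python
import warnings
import pandas as pd

demo = pd.DataFrame({'col1': [1.0, 2.0], 'col3': [3.0, 8.0]})  # illustrative data

# Bracket assignment creates the column
demo['ratio'] = demo.col3 / demo.col1 * 2

# Attribute assignment to a new name does NOT create a column
with warnings.catch_warnings():
    warnings.simplefilter('ignore')   # pandas warns about this pattern
    demo.other = [0, 0]

print(list(demo.columns))
```

Attribute access also fails for labels that are not valid Python identifiers or that clash with DataFrame methods, which is another reason to prefer `df['name']`.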


In [40]:
lst=[3*i for i in range(8)]

lst.append(None)
lst.append(None)

lst

[0, 3, 6, 9, 12, 15, 18, 21, None, None]

In [42]:
df['lista']=lst

df.tail()

Unnamed: 0,col1,col2,col3,col4,col5,col10,ceros,lista
5,0.270829,0.582122,0.370816,0.388209,0.523609,2.738382,0.0,15.0
6,0.594151,0.098946,0.702352,0.8608,0.559531,2.364219,0.0,18.0
7,0.116872,0.751726,0.202842,0.385409,0.102157,3.471182,0.0,21.0
8,0.686501,0.326197,0.69111,0.044298,0.34772,2.013429,0.0,
9,0.405921,0.487149,0.639628,0.790788,0.806647,3.151491,0.0,


In [43]:
df.dtypes   # data type of each column

col1     float64
col2     float64
col3     float64
col4     float64
col5     float64
col10    float64
ceros    float64
lista    float64
dtype: object
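The `lista` column reports `float64` even though the list held integers, because the two `None` entries are stored as `NaN`, which is a float. A small sketch on a hypothetical list follows. The nullable `Int64` dtype shown second is an optional alternative, not something the notebook uses.

```python
import pandas as pd

values = [0, 3, 6, None]            # illustrative data

as_float = pd.Series(values)        # None becomes NaN, ints are upcast to float64
as_nullable = pd.Series(values, dtype='Int64')  # nullable integer dtype keeps ints

print(as_float.dtype, as_nullable.dtype)
print(as_float.isna().sum())        # number of missing values
```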

In [44]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   col1    10 non-null     float64
 1   col2    10 non-null     float64
 2   col3    10 non-null     float64
 3   col4    10 non-null     float64
 4   col5    10 non-null     float64
 5   col10   10 non-null     float64
 6   ceros   10 non-null     float64
 7   lista   8 non-null      float64
dtypes: float64(8)
memory usage: 768.0 bytes


In [48]:
df=df.fillna('hola')

df.fillna('hola', inplace=True)  # inplace=True modifies df directly instead of returning a new object

In [49]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   col1    10 non-null     float64
 1   col2    10 non-null     float64
 2   col3    10 non-null     float64
 3   col4    10 non-null     float64
 4   col5    10 non-null     float64
 5   col10   10 non-null     float64
 6   ceros   10 non-null     float64
 7   lista   10 non-null     object 
dtypes: float64(7), object(1)
memory usage: 768.0+ bytes
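Filling the gaps with a string is what turned `lista` from `float64` into `object`: the column now mixes floats and text, so numeric operations on it are no longer reliable. A sketch on a hypothetical Series, comparing a text fill with a numeric fill:

```python
import pandas as pd

col = pd.Series([15.0, 18.0, None])       # illustrative data

filled_text = col.fillna('hola')          # mixed floats and str -> object dtype
filled_number = col.fillna(0.0)           # stays float64

print(filled_text.dtype, filled_number.dtype)
print(filled_number.sum())
```

Choosing a fill value of the column's own type keeps the dtype and the ability to aggregate.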


In [52]:
type('0.0')

str

In [55]:
# build a DataFrame from a list of lists

lst_lst=[[687261, 'hola', 4728364], 
         [83546, 'adios', 58943], 
         [321, 'oo^oo']]


columnas=['num', 'palabra', 'otro_num']

In [56]:
df_lst=pd.DataFrame(lst_lst, columns=columnas)

df_lst

Unnamed: 0,num,palabra,otro_num
0,687261,hola,4728364.0
1,83546,adios,58943.0
2,321,oo^oo,


In [57]:
df_lst.fillna(0., inplace=True)

df_lst

Unnamed: 0,num,palabra,otro_num
0,687261,hola,4728364.0
1,83546,adios,58943.0
2,321,oo^oo,0.0


In [60]:
# build a DataFrame from a dictionary

dictio={'casa':lst_lst[0], 'oficina':lst_lst[1], 'numero':lst_lst[2]+[0.]}  # all lists must have the same length

dictio

{'casa': [687261, 'hola', 4728364],
 'oficina': [83546, 'adios', 58943],
 'numero': [321, 'oo^oo', 0.0]}

In [61]:
df_dictio=pd.DataFrame(dictio)

df_dictio

Unnamed: 0,casa,oficina,numero
0,687261,83546,321
1,hola,adios,oo^oo
2,4728364,58943,0.0
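The two constructors read the same lists in different orientations: a list of lists is taken row by row, while a dictionary is taken column by column. A sketch with hypothetical data (`rows`, `by_rows`, and `by_cols` are illustrative names):

```python
import pandas as pd

rows = [[1, 'x'], [2, 'y']]                       # illustrative data

by_rows = pd.DataFrame(rows, columns=['num', 'word'])   # each inner list is a row
by_cols = pd.DataFrame({'first': rows[0], 'second': rows[1]})  # each list is a column

print(by_rows.dtypes)
print(by_cols.dtypes)   # mixed int/str columns end up as object
```

This is why `df_dictio` above has mixed-type columns: each original row became a column.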


In [62]:
df.index

RangeIndex(start=0, stop=10, step=1)

In [65]:
df_dictio.columns=['a', 'b', 'c']

df_dictio

Unnamed: 0,a,b,c
0,687261,83546,321
1,hola,adios,oo^oo
2,4728364,58943,0.0


In [70]:
df_dictio.columns=['a', 'b', 'a']

df_dictio

Unnamed: 0,a,b,a.1
0,687261,83546,321
1,hola,adios,oo^oo
2,4728364,58943,0.0


In [72]:
df_dictio['a'].sum()

Series([], dtype: float64)
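With a repeated label, `df_dictio['a']` no longer returns a single Series: it returns a DataFrame holding every column named `a`, which explains the unexpected result of the sum. A sketch on a hypothetical frame `dup`, including a quick way to detect the problem:

```python
import pandas as pd

dup = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'a'])  # illustrative

selected = dup['a']                 # a DataFrame with both 'a' columns, not a Series
print(type(selected).__name__, selected.shape)

print(dup.columns.is_unique)        # False: a quick check before relying on labels
print(dup.columns[dup.columns.duplicated()].tolist())   # the repeated labels
```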

### Operations


In [73]:
df

Unnamed: 0,col1,col2,col3,col4,col5,col10,ceros,lista
0,0.840627,0.67006,0.389848,0.010784,0.794787,0.927517,0.0,0.0
1,0.49364,0.887705,0.039767,0.087359,0.220752,0.161117,0.0,3.0
2,0.037372,0.617858,0.310775,0.52271,0.670038,16.631448,0.0,6.0
3,0.825062,0.457498,0.380344,0.452181,0.333637,0.921978,0.0,9.0
4,0.172487,0.280989,0.990064,0.761274,0.449902,11.479886,0.0,12.0
5,0.270829,0.582122,0.370816,0.388209,0.523609,2.738382,0.0,15.0
6,0.594151,0.098946,0.702352,0.8608,0.559531,2.364219,0.0,18.0
7,0.116872,0.751726,0.202842,0.385409,0.102157,3.471182,0.0,21.0
8,0.686501,0.326197,0.69111,0.044298,0.34772,2.013429,0.0,hola
9,0.405921,0.487149,0.639628,0.790788,0.806647,3.151491,0.0,hola


In [74]:
df.transpose()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
col1,0.840627,0.49364,0.037372,0.825062,0.172487,0.270829,0.594151,0.116872,0.686501,0.405921
col2,0.67006,0.887705,0.617858,0.457498,0.280989,0.582122,0.098946,0.751726,0.326197,0.487149
col3,0.389848,0.039767,0.310775,0.380344,0.990064,0.370816,0.702352,0.202842,0.69111,0.639628
col4,0.010784,0.087359,0.52271,0.452181,0.761274,0.388209,0.8608,0.385409,0.044298,0.790788
col5,0.794787,0.220752,0.670038,0.333637,0.449902,0.523609,0.559531,0.102157,0.34772,0.806647
col10,0.927517,0.161117,16.631448,0.921978,11.479886,2.738382,2.364219,3.471182,2.013429,3.151491
ceros,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
lista,0.0,3.0,6.0,9.0,12.0,15.0,18.0,21.0,hola,hola


In [78]:
df.T.index

Index(['col1', 'col2', 'col3', 'col4', 'col5', 'col10', 'ceros', 'lista'], dtype='object')

In [80]:
df.sum()

col1      4.443459
col2      5.160250
col3      4.717546
col4      4.303811
col5      4.808780
col10    43.860651
ceros     0.000000
dtype: float64
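The result above leaves out `lista` because the pandas version used to run this notebook dropped non-numeric columns from reductions. Recent pandas releases (2.0 and later) are stricter, so passing `numeric_only=True` explicitly is the safer habit. A sketch on a hypothetical frame:

```python
import pandas as pd

demo = pd.DataFrame({
    'col1': [0.5, 1.5, 2.0],
    'lista': [0.0, 3.0, 'hola'],    # object column mixing floats and a string
})

totals = demo.sum(numeric_only=True)   # reduce only the numeric columns
print(totals)
```

The same keyword is accepted by `mean`, `std`, `var`, `median`, `max`, and `min`.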

In [81]:
df.std()

col1     0.291329
col2     0.236093
col3     0.280518
col4     0.312735
col5     0.235684
col10    5.342457
ceros    0.000000
dtype: float64

In [82]:
df.var()

col1      0.084872
col2      0.055740
col3      0.078690
col4      0.097803
col5      0.055547
col10    28.541850
ceros     0.000000
dtype: float64

In [83]:
df.mean()

col1     0.444346
col2     0.516025
col3     0.471755
col4     0.430381
col5     0.480878
col10    4.386065
ceros    0.000000
dtype: float64

In [84]:
df.mode()

Unnamed: 0,col1,col2,col3,col4,col5,col10,ceros,lista
0,0.037372,0.098946,0.039767,0.010784,0.102157,0.161117,0.0,hola
1,0.116872,0.280989,0.202842,0.044298,0.220752,0.921978,,
2,0.172487,0.326197,0.310775,0.087359,0.333637,0.927517,,
3,0.270829,0.457498,0.370816,0.385409,0.34772,2.013429,,
4,0.405921,0.487149,0.380344,0.388209,0.449902,2.364219,,
5,0.49364,0.582122,0.389848,0.452181,0.523609,2.738382,,
6,0.594151,0.617858,0.639628,0.52271,0.559531,3.151491,,
7,0.686501,0.67006,0.69111,0.761274,0.670038,3.471182,,
8,0.825062,0.751726,0.702352,0.790788,0.794787,11.479886,,
9,0.840627,0.887705,0.990064,0.8608,0.806647,16.631448,,
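`mode()` returns a DataFrame rather than a single row because a column can have several modes. With random floats every value appears once, so every value is a mode, which is why the output above is simply each column sorted. A sketch on two hypothetical Series:

```python
import pandas as pd

repeated = pd.Series([3, 1, 3, 2])   # 3 appears twice
unique = pd.Series([3, 1, 2])        # no repeats: every value is a mode

print(repeated.mode().tolist())
print(unique.mode().tolist())
```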


In [85]:
df.median()

col1     0.449780
col2     0.534636
col3     0.385096
col4     0.420195
col5     0.486755
col10    2.551301
ceros    0.000000
dtype: float64

In [86]:
df.max()

col1      0.840627
col2      0.887705
col3      0.990064
col4      0.860800
col5      0.806647
col10    16.631448
ceros     0.000000
dtype: float64

In [87]:
df.max(axis=0)

col1      0.840627
col2      0.887705
col3      0.990064
col4      0.860800
col5      0.806647
col10    16.631448
ceros     0.000000
dtype: float64

In [88]:
df.max(axis=1)  # row-wise: one maximum per row

0     0.927517
1     0.887705
2    16.631448
3     0.921978
4    11.479886
5     2.738382
6     2.364219
7     3.471182
8     2.013429
9     3.151491
dtype: float64
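The `axis` argument names the dimension that gets collapsed: `axis=0` (the default) runs down the rows and yields one value per column, while `axis=1` runs across the columns and yields one value per row. A sketch on a hypothetical 2x2 frame:

```python
import pandas as pd

demo = pd.DataFrame({'x': [1, 5], 'y': [4, 2]})   # illustrative data

per_column = demo.max(axis=0)   # collapses the rows: one value per column
per_row = demo.max(axis=1)      # collapses the columns: one value per row

print(per_column.tolist())
print(per_row.tolist())
```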

In [89]:
df.min()

col1     0.037372
col2     0.098946
col3     0.039767
col4     0.010784
col5     0.102157
col10    0.161117
ceros    0.000000
dtype: float64

### Importing files

+ CSV
+ XLSX
+ XLS
+ JSON
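
Each format has its own reader (`pd.read_csv`, `pd.read_excel`, `pd.read_json`), and all of them return a DataFrame. As a self-contained sketch that does not depend on the files in `../data`, the CSV reader can also parse an in-memory buffer. The text below is made-up sample data.

```python
import io
import pandas as pd

csv_text = "make,year,city08\nFord,1985,19\nAudi,1993,23\n"   # made-up sample

sample = pd.read_csv(io.StringIO(csv_text))   # same call as with a file path

print(sample.shape)
print(sample.dtypes)
```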

In [90]:
# csv

df_csv=pd.read_csv('../data/vehicles_messy.csv')

df_csv.head()

Unnamed: 0,barrels08,barrelsA08,charge120,charge240,city08,city08U,cityA08,cityA08U,cityCD,cityE,...,mfrCode,c240Dscr,charge240b,c240bDscr,createdOn,modifiedOn,startStop,phevCity,phevHwy,phevComb
0,15.695714,0.0,0.0,0.0,19,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0
1,29.964545,0.0,0.0,0.0,9,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0
2,12.207778,0.0,0.0,0.0,23,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0
3,29.964545,0.0,0.0,0.0,10,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0
4,17.347895,0.0,0.0,0.0,17,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0


In [91]:
df_csv.columns

Index(['barrels08', 'barrelsA08', 'charge120', 'charge240', 'city08',
       'city08U', 'cityA08', 'cityA08U', 'cityCD', 'cityE', 'cityUF', 'co2',
       'co2A', 'co2TailpipeAGpm', 'co2TailpipeGpm', 'comb08', 'comb08U',
       'combA08', 'combA08U', 'combE', 'combinedCD', 'combinedUF', 'cylinders',
       'displ', 'drive', 'engId', 'eng_dscr', 'feScore', 'fuelCost08',
       'fuelCostA08', 'fuelType', 'fuelType1', 'ghgScore', 'ghgScoreA',
       'highway08', 'highway08U', 'highwayA08', 'highwayA08U', 'highwayCD',
       'highwayE', 'highwayUF', 'hlv', 'hpv', 'id', 'lv2', 'lv4', 'make',
       'model', 'mpgData', 'phevBlended', 'pv2', 'pv4', 'range', 'rangeCity',
       'rangeCityA', 'rangeHwy', 'rangeHwyA', 'trany', 'UCity', 'UCityA',
       'UHighway', 'UHighwayA', 'VClass', 'year', 'youSaveSpend', 'guzzler',
       'trans_dscr', 'tCharger', 'sCharger', 'atvType', 'fuelType2', 'rangeA',
       'evMotor', 'mfrCode', 'c240Dscr', 'charge240b', 'c240bDscr',
       'createdOn', 'modifiedOn

In [92]:
!pip install openpyxl
!pip install xlrd



In [94]:
# xlsx

df_xlsx=pd.read_excel('../data/Online Retail.xlsx')

df_xlsx.head()

Unnamed: 0,InvoiceNo,InvoiceDate,StockCode,Description,Quantity,UnitPrice,Revenue,CustomerID,Country
0,536365,2010-12-01 08:26:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,6,2.55,15.3,17850,United Kingdom
1,536373,2010-12-01 09:02:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,6,2.55,15.3,17850,United Kingdom
2,536375,2010-12-01 09:32:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,6,2.55,15.3,17850,United Kingdom
3,536390,2010-12-01 10:19:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,64,2.55,163.2,17511,United Kingdom
4,536394,2010-12-01 10:39:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,32,2.55,81.6,13408,United Kingdom


In [96]:
# xls

df_xls=pd.read_excel(r'../data/Sensor Data.xls')   # reads the first sheet by default

df_xls.head()

Unnamed: 0,Input 1,Input 2,Input 3,Input 4,Input 5,Input 6,Input 7,Input 8,Input 9,Input 10,Input 11,Input 12,output1,output2,class
0,1.473,2.311,3.179,2.666,0.2795,0.2771,0.2234,0.1855,0.2539,1.138,1.111,4.712,1,1,one
1,1.46,2.377,3.214,2.92,0.2527,0.3064,0.02563,0.1965,0.3027,1.213,1.027,5.463,1,1,one
2,1.552,2.164,3.064,2.745,0.282,0.21,0.1721,0.1929,0.21,1.221,1.058,5.332,1,1,one
3,1.605,2.228,3.149,2.834,0.2917,0.3613,0.2087,0.1294,0.2734,1.144,1.062,4.829,1,1,one
4,1.534,2.114,3.309,2.976,0.21,0.2502,0.2258,0.177,0.2039,1.254,1.112,5.734,1,1,one


In [97]:
df_xls=pd.read_excel(r'../data/Sensor Data.xls', 'Sheet1')   # first sheet, selected by name

df_xls.head()

Unnamed: 0,Input 1,Input 2,Input 3,Input 4,Input 5,Input 6,Input 7,Input 8,Input 9,Input 10,Input 11,Input 12,output1,output2,class
0,1.473,2.311,3.179,2.666,0.2795,0.2771,0.2234,0.1855,0.2539,1.138,1.111,4.712,1,1,one
1,1.46,2.377,3.214,2.92,0.2527,0.3064,0.02563,0.1965,0.3027,1.213,1.027,5.463,1,1,one
2,1.552,2.164,3.064,2.745,0.282,0.21,0.1721,0.1929,0.21,1.221,1.058,5.332,1,1,one
3,1.605,2.228,3.149,2.834,0.2917,0.3613,0.2087,0.1294,0.2734,1.144,1.062,4.829,1,1,one
4,1.534,2.114,3.309,2.976,0.21,0.2502,0.2258,0.177,0.2039,1.254,1.112,5.734,1,1,one


In [99]:
df_xls=pd.read_excel(r'../data/Sensor Data.xls', 'Sheet2')  

df_xls.head()

Unnamed: 0,Sensor Data
0,The data source as well as the exact nature of...
1,Each data instance contains 12 real-valued inp...
2,represents a sensor designed to detect the pre...
3,"of substances. As an alternative, the sensor r..."
4,


In [100]:
df_xls=pd.read_excel(r'../data/Sensor Data.xls', 1)  

df_xls.head()

Unnamed: 0,Sensor Data
0,The data source as well as the exact nature of...
1,Each data instance contains 12 real-valued inp...
2,represents a sensor designed to detect the pre...
3,"of substances. As an alternative, the sensor r..."
4,


In [101]:
df_xls=pd.read_excel(r'../data/Sensor Data.xls', 'Sheet3')  

df_xls.head()

Unnamed: 0,hola


In [104]:
# json

df_json=pd.read_json('../data/companies.json', orient='records', lines=True)

df_json

Unnamed: 0,_id,name,permalink,crunchbase_url,homepage_url,blog_url,blog_feed_url,twitter_username,category_code,number_of_employees,...,offices,milestones,video_embeds,screenshots,external_links,partners,deadpooled_month,deadpooled_day,deadpooled_url,ipo
0,{'$oid': '52cdef7c4bab8bd675297d8a'},Wetpaint,abc2,http://www.crunchbase.com/company/wetpaint,http://wetpaint-inc.com,http://digitalquarters.net/,http://digitalquarters.net/feed/,BachelrWetpaint,web,47.0,...,"[{'description': '', 'address1': '710 - 2nd Av...","[{'id': 5869, 'description': 'Wetpaint named i...",[],"[{'available_sizes': [[[150, 86], 'assets/imag...",[{'external_url': 'http://www.geekwire.com/201...,[],,,,
1,{'$oid': '52cdef7c4bab8bd675297d8b'},AdventNet,abc3,http://www.crunchbase.com/company/adventnet,http://adventnet.com,,,manageengine,enterprise,600.0,...,"[{'description': 'Headquarters', 'address1': '...",[],[],"[{'available_sizes': [[[150, 94], 'assets/imag...",[],[],,,,
2,{'$oid': '52cdef7c4bab8bd675297d8c'},Zoho,abc4,http://www.crunchbase.com/company/zoho,http://zoho.com,http://blogs.zoho.com/,http://blogs.zoho.com/feed,zoho,software,1600.0,...,"[{'description': 'Headquarters', 'address1': '...","[{'id': 388, 'description': 'Zoho Reaches 2 Mi...","[{'embed_code': '<object width=""430"" height=""2...",[],[{'external_url': 'http://www.online-tech-tips...,[],,,,
3,{'$oid': '52cdef7c4bab8bd675297d8d'},Digg,digg,http://www.crunchbase.com/company/digg,http://www.digg.com,http://blog.digg.com/,http://blog.digg.com/?feed=rss2,digg,news,60.0,...,"[{'description': None, 'address1': '135 Missis...","[{'id': 9588, 'description': 'Another Digg Exe...","[{'embed_code': '<embed src=""http://blip.tv/pl...","[{'available_sizes': [[[117, 150], 'assets/ima...",[{'external_url': 'http://www.sociableblog.com...,[],,,,
4,{'$oid': '52cdef7c4bab8bd675297d8e'},Facebook,facebook,http://www.crunchbase.com/company/facebook,http://facebook.com,http://blog.facebook.com,http://blog.facebook.com/atom.php,facebook,social,5299.0,...,"[{'description': 'Headquarters', 'address1': '...","[{'id': 108, 'description': 'Facebook adds com...",[],"[{'available_sizes': [[[150, 68], 'assets/imag...",[{'external_url': 'http://latimesblogs.latimes...,[],,,,"{'valuation_amount': 104000000000, 'valuation_..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18796,{'$oid': '52cdef7f4bab8bd67529c6f6'},Adhunk,adhunk,http://www.crunchbase.com/company/adhunk,http://www.adhunk.com,http://blog.adhunk.com,http://blog.adhunk.com/feed,,advertising,3.0,...,"[{'description': 'Indian Office', 'address1': ...",[],[],[],[{'external_url': 'http://www.hubpages.in/inte...,[],,,,
18797,{'$oid': '52cdef7f4bab8bd67529c6f7'},AfterLogic,afterlogic,http://www.crunchbase.com/company/afterlogic,http://www.afterlogic.com,,,afterlogic,software,,...,"[{'description': 'Livingston', 'address1': 'P....",[],[],"[{'available_sizes': [[[150, 137], 'assets/ima...",[],[],,,,
18798,{'$oid': '52cdef7f4bab8bd67529c6f8'},goBookmaker,gobookmaker,http://www.crunchbase.com/company/gobookmaker,http://www.gobookmaker.com,http://blog.gobookmaker.com,,gobookmaker,web,,...,[],[],[],"[{'available_sizes': [[[150, 80], 'assets/imag...",[],[],,,,
18799,{'$oid': '52cdef7f4bab8bd67529c6f9'},EnteGreat Solutions,entegreat-solutions,http://www.crunchbase.com/company/entegreat-so...,,,,,software,,...,"[{'description': '', 'address1': '', 'address2...",[],[],[],[],[],,,,


In [105]:
df_json.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18801 entries, 0 to 18800
Data columns (total 42 columns):
 #   Column               Non-Null Count  Dtype              
---  ------               --------------  -----              
 0   _id                  18801 non-null  object             
 1   name                 18801 non-null  object             
 2   permalink            18801 non-null  object             
 3   crunchbase_url       18801 non-null  object             
 4   homepage_url         16895 non-null  object             
 5   blog_url             16890 non-null  object             
 6   blog_feed_url        16757 non-null  object             
 7   twitter_username     11383 non-null  object             
 8   category_code        16050 non-null  object             
 9   number_of_employees  8889 non-null   float64            
 10  founded_year         13136 non-null  float64            
 11  founded_month        7898 non-null   float64            
 12  founded_day       
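
`lines=True` tells `read_json` that the file holds one JSON object per line (JSON Lines) rather than a single JSON array. A sketch on an in-memory buffer with made-up records follows. Nested values such as the `_id` dictionaries stay as Python objects inside an `object` column.

```python
import io
import pandas as pd

# Two made-up records, one JSON object per line
jsonl = (
    '{"_id": {"$oid": "aaa"}, "name": "Alpha", "number_of_employees": 47}\n'
    '{"_id": {"$oid": "bbb"}, "name": "Beta", "number_of_employees": null}\n'
)

records = pd.read_json(io.StringIO(jsonl), lines=True)

print(records.shape)
print(records['number_of_employees'].dtype)   # float64, because of the missing value
print(type(records.loc[0, '_id']).__name__)   # dict
```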

In [110]:
df_xlsx['Quantity * UnitPrice']=df_xlsx.Quantity * df_xlsx.UnitPrice


df_xlsx.head()

Unnamed: 0,InvoiceNo,InvoiceDate,StockCode,Description,Quantity,UnitPrice,Revenue,CustomerID,Country,suma,suma_desc,Quantity * UnitPrice
0,536365,2010-12-01 08:26:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,6,2.55,15.3,17850,United Kingdom,15.3,Quantity * UnitPrice,15.3
1,536373,2010-12-01 09:02:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,6,2.55,15.3,17850,United Kingdom,15.3,Quantity * UnitPrice,15.3
2,536375,2010-12-01 09:32:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,6,2.55,15.3,17850,United Kingdom,15.3,Quantity * UnitPrice,15.3
3,536390,2010-12-01 10:19:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,64,2.55,163.2,17511,United Kingdom,163.2,Quantity * UnitPrice,163.2
4,536394,2010-12-01 10:39:00,85123A,CREAM HANGING HEART T-LIGHT HOLDER,32,2.55,81.6,13408,United Kingdom,81.6,Quantity * UnitPrice,81.6


In [111]:
df_xlsx.columns

Index(['InvoiceNo', 'InvoiceDate', 'StockCode', 'Description', 'Quantity',
       'UnitPrice', 'Revenue', 'CustomerID', 'Country', 'suma', 'suma_desc',
       'Quantity * UnitPrice'],
      dtype='object')

In [115]:
lst=[43354353, '395478639584739', 4, 'hola que tal', 4, 3.78, 56, 17678, 'EIRE', 890, 'tr']

In [116]:
dictio=dict(zip(df_xlsx.columns, lst))

dictio

{'InvoiceNo': 43354353,
 'InvoiceDate': '395478639584739',
 'StockCode': 4,
 'Description': 'hola que tal',
 'Quantity': 4,
 'UnitPrice': 3.78,
 'Revenue': 56,
 'CustomerID': 17678,
 'Country': 'EIRE',
 'suma': 890,
 'suma_desc': 'tr'}

In [120]:
list(zip(df_xlsx.columns, lst))

[('InvoiceNo', 43354353),
 ('InvoiceDate', '395478639584739'),
 ('StockCode', 4),
 ('Description', 'hola que tal'),
 ('Quantity', 4),
 ('UnitPrice', 3.78),
 ('Revenue', 56),
 ('CustomerID', 17678),
 ('Country', 'EIRE'),
 ('suma', 890),
 ('suma_desc', 'tr')]
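
Note that `df_xlsx.columns` has 12 labels while `lst` has 11 values, so the last column (`Quantity * UnitPrice`) is missing from the result: `zip` stops at the shorter input without raising. A sketch with hypothetical names showing the behaviour and a simple length check:

```python
columns = ['InvoiceNo', 'Quantity', 'UnitPrice']   # illustrative labels
values = [536365, 6]                               # one value short

row = dict(zip(columns, values))    # zip silently stops at the shorter input
print(row)

missing = columns[len(values):]     # labels that received no value
print(missing)
```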

In [125]:
df.index=[f'a_{i}' for i in range(10)]

df.head()

Unnamed: 0,col1,col2,col3,col4,col5,col10,ceros,lista
a_0,0.840627,0.67006,0.389848,0.010784,0.794787,0.927517,0.0,0.0
a_1,0.49364,0.887705,0.039767,0.087359,0.220752,0.161117,0.0,3.0
a_2,0.037372,0.617858,0.310775,0.52271,0.670038,16.631448,0.0,6.0
a_3,0.825062,0.457498,0.380344,0.452181,0.333637,0.921978,0.0,9.0
a_4,0.172487,0.280989,0.990064,0.761274,0.449902,11.479886,0.0,12.0


In [129]:
df.col1['a_1']

0.493639676076048

In [130]:
df[0]

KeyError: 0

In [131]:
df[0]=0

In [132]:
df[0]

a_0    0
a_1    0
a_2    0
a_3    0
a_4    0
a_5    0
a_6    0
a_7    0
a_8    0
a_9    0
Name: 0, dtype: int64

In [133]:
df['0']

KeyError: '0'
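
Column labels are matched exactly, including their type, so the integer `0` and the string `'0'` are different columns. A sketch on a hypothetical frame:

```python
import pandas as pd

demo = pd.DataFrame({'col1': [0.1, 0.2]})   # illustrative data
demo[0] = 0        # integer label
demo['0'] = 1      # string label: a separate column

print(demo.columns.tolist())
print(0 in demo.columns, '0' in demo.columns, 1 in demo.columns)
```

Keeping all column labels as strings avoids this kind of lookup surprise.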

In [135]:
len(df.col1)

10

In [136]:
df.col1.iloc[-1]   # last element by position; the index now holds string labels

0.4059208800584643

In [137]:
if 1:
    print('hola')

hola


In [138]:
df['1']=1

In [140]:
if df['1'].iloc[0]:   # first element by position
    print('aqui')

aqui


In [141]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a_0 to a_9
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   col1    10 non-null     float64
 1   col2    10 non-null     float64
 2   col3    10 non-null     float64
 3   col4    10 non-null     float64
 4   col5    10 non-null     float64
 5   col10   10 non-null     float64
 6   ceros   10 non-null     float64
 7   lista   10 non-null     object 
 8   0       10 non-null     int64  
 9   1       10 non-null     int64  
dtypes: float64(7), int64(2), object(1)
memory usage: 1.2+ KB


In [142]:
df['bool']=True


df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a_0 to a_9
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   col1    10 non-null     float64
 1   col2    10 non-null     float64
 2   col3    10 non-null     float64
 3   col4    10 non-null     float64
 4   col5    10 non-null     float64
 5   col10   10 non-null     float64
 6   ceros   10 non-null     float64
 7   lista   10 non-null     object 
 8   0       10 non-null     int64  
 9   1       10 non-null     int64  
 10  bool    10 non-null     bool   
dtypes: bool(1), float64(7), int64(2), object(1)
memory usage: 1.2+ KB


In [146]:
ls

1.1 - Numpy.ipynb         1.2 - Intro Pandas.ipynb  images/
