# Pandas

https://pandas.pydata.org/

`conda install pandas`

pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for practical, real-world data analysis in Python. It also has the broader goal of becoming the most powerful and flexible open-source data analysis and manipulation tool available in any language. It is already well on its way toward this goal.

pandas is well suited for many different kinds of data:

- Tabular data with heterogeneously typed columns, as in an SQL table or Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
- Any other form of observational or statistical data set; the data need not be labeled at all to be placed into a pandas data structure

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, DataFrame provides everything that R’s data.frame provides and much more. pandas is built on top of NumPy and is intended to integrate well with many other third-party libraries in a scientific computing environment.
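
As a minimal sketch of the two structures (the names `prices` and `inventory` and their values are invented for illustration), a Series can be seen as a labeled one-dimensional array, and a DataFrame as a set of columns sharing one index:

```python
import numpy as np
import pandas as pd

# A Series is a labeled 1-dimensional array (hypothetical example data).
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "pear", "plum"], name="price")

# A DataFrame is a 2-dimensional table whose columns may have different dtypes.
inventory = pd.DataFrame(
    {"price": prices, "stock": [10, 0, 25]},
    index=["apple", "pear", "plum"],
)

# The underlying values are exposed as NumPy arrays.
price_values = prices.to_numpy()
print(type(price_values).__name__)
print(inventory.dtypes)
```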

Here are just a few of the things that pandas does well:

- Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
- Size mutability: columns can be inserted into and deleted from DataFrame and higher-dimensional objects
- Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
- Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
- Easy conversion of ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
- Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
- Intuitive merging and joining of data sets
- Flexible reshaping and pivoting of data sets
- Hierarchical labeling of axes (possible to have multiple labels per tick)
- Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving / loading data from the ultrafast HDF5 format
- Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.
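
A few of the features listed above can be sketched in a handful of lines. The data and the names `a`, `b`, and `sales` are made up for illustration; the example shows label alignment, missing-value handling, and a group-by aggregation:

```python
import numpy as np
import pandas as pd

# Two Series with partially overlapping labels (made-up values).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Arithmetic aligns on labels; "x" has no partner, so the result is NaN there.
total = a + b

# Missing values can be filled explicitly.
filled = total.fillna(0.0)

# Split-apply-combine: group rows by a key and aggregate each group.
sales = pd.DataFrame({"region": ["north", "south", "north"], "amount": [5, 7, 11]})
per_region = sales.groupby("region")["amount"].sum()

print(total)
print(per_region)
```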

## Series and DataFrames

The best way to think about the pandas data structures is as flexible containers for lower dimensional data. For example, DataFrame is a container for Series, and Series is a container for scalars. We would like to be able to insert and remove objects from these containers in a dictionary-like fashion.
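
The dictionary-like behavior can be sketched as follows; `frame` and its column names are hypothetical:

```python
import pandas as pd

# Hypothetical data: each column of the DataFrame is a Series.
frame = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Insert a new column with dictionary-style assignment.
frame["c"] = frame["a"] * frame["b"]

# Retrieve a column as a Series, then remove one with pop(), as with a dict.
col_b = frame["b"]
removed = frame.pop("a")

print(list(frame.columns))
print("a" in frame)
```

Membership tests with `in` check the column labels, which mirrors how `in` checks the keys of a dictionary.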

Also, we would like sensible default behaviors for the common API functions which take into account the typical orientation of time series and cross-sectional data sets. When using ndarrays to store 2- and 3-dimensional data, a burden is placed on the user to consider the orientation of the data set when writing functions; axes are considered more or less equivalent (except when C- or Fortran-contiguousness matters for performance). In pandas, the axes are intended to lend more semantic meaning to the data; i.e., for a particular data set there is likely to be a “right” way to orient the data. The goal, then, is to reduce the amount of mental effort required to code up data transformations in downstream functions.
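
To illustrate the point about orientation, here is a small sketch with invented sensor readings. With a bare ndarray the meaning of each axis lives in the programmer's head; with a DataFrame the labels carry it:

```python
import numpy as np
import pandas as pd

# Hypothetical readings: rows are days, columns are sensors.
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# With a bare ndarray the reader must remember which axis is which.
ndarray_means = values.mean(axis=0)

# With a DataFrame the labels document the orientation.
readings = pd.DataFrame(
    values,
    index=pd.date_range("2013-01-01", periods=3),
    columns=["sensor_a", "sensor_b"],
)
per_sensor = readings.mean()      # one value per column label
per_day = readings.mean(axis=1)   # one value per row label

print(per_sensor)
print(per_day)
```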

## SQL and Pandas

Both work with "tabular" data, but they are not the same thing.

![Basic DataFrame](images/pandas-basic.png "Basic DataFrame")

SQL usually refers to a DBMS (database management system) that implements the relational model. A DBMS is used to store data and maintain its integrity over long periods of time. It also offers a language for querying and analysing the data that resides in it.

![Relational Model](images/relational.jpg "Relational Model")

pandas is a data analysis library; it is not designed to store data or to guarantee its integrity.

![MultiIndex DataFrame](images/pandas-multindex.png "MultiIndex DataFrame")
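
The contrast can be sketched with Python's built-in `sqlite3` module. The `users` table and its contents are hypothetical; the point is only that the database rejects a row that violates a constraint, while a DataFrame accepts the same duplicate without complaint:

```python
import sqlite3
import pandas as pd

# In-memory SQLite database with an integrity constraint (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "ana"), (2, "bo")])

# The DBMS rejects a row that violates the primary key.
try:
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'duplicate')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# pandas reads the query result into a DataFrame for analysis.
users = pd.read_sql_query("SELECT id, name FROM users ORDER BY id", conn)

# A DataFrame enforces no such constraint: a duplicate id is accepted.
with_duplicate = pd.concat([users, users.iloc[[0]]], ignore_index=True)
conn.close()

print(rejected, len(users), len(with_duplicate))
```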

## 10 minutes to pandas

https://pandas.pydata.org/pandas-docs/version/0.25/getting_started/10min.html

In [0]:
import numpy as np
import pandas as pd

In [2]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

In [3]:
dates = pd.date_range('20130101', periods=6)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [4]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df

,A,B,C,D
2013-01-01,0.139915,0.686243,-0.198489,0.672779
2013-01-02,-0.090755,0.219519,-0.137148,-0.321809
2013-01-03,-1.618753,-1.175474,0.492026,-0.348725
2013-01-04,0.372716,-0.907985,-0.531178,-0.320207
2013-01-05,-0.972631,2.091317,2.423233,1.506462
2013-01-06,0.375061,0.340456,-0.788131,-0.502219


In [5]:
df2 = pd.DataFrame({'A': 1.,
  'B': pd.Timestamp('20130102'),
  'C': pd.Series(1, index=list(range(4)), dtype='float32'),
  'D': np.array([3] * 4, dtype='int32'),
  'E': pd.Categorical(["test", "train", "test", "train"]),
  'F': 'foo'})

df2

,A,B,C,D,E,F
0,1.0,2013-01-02,1.0,3,test,foo
1,1.0,2013-01-02,1.0,3,train,foo
2,1.0,2013-01-02,1.0,3,test,foo
3,1.0,2013-01-02,1.0,3,train,foo


In [7]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object

In [9]:
df.head(2)
df.tail(3)

,A,B,C,D
2013-01-04,0.372716,-0.907985,-0.531178,-0.320207
2013-01-05,-0.972631,2.091317,2.423233,1.506462
2013-01-06,0.375061,0.340456,-0.788131,-0.502219


In [10]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [11]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [12]:
df.to_numpy()

array([[ 0.13991516,  0.68624321, -0.19848899,  0.67277927],
       [-0.09075509,  0.21951905, -0.1371479 , -0.32180871],
       [-1.61875287, -1.17547433,  0.49202553, -0.34872546],
       [ 0.37271558, -0.90798474, -0.53117832, -0.32020718],
       [-0.97263137,  2.0913167 ,  2.42323316,  1.5064622 ],
       [ 0.3750613 ,  0.34045642, -0.78813144, -0.50221875]])

In [13]:
df.describe()

,A,B,C,D
count,6.0,6.0,6.0,6.0
mean,-0.299075,0.209013,0.210052,0.11438
std,0.816958,1.179211,1.167127,0.802931
min,-1.618753,-1.175474,-0.788131,-0.502219
25%,-0.752162,-0.626109,-0.448006,-0.341996
50%,0.02458,0.279988,-0.167818,-0.321008
75%,0.314515,0.599797,0.334732,0.424533
max,0.375061,2.091317,2.423233,1.506462


In [14]:
df.T

,2013-01-01,2013-01-02,2013-01-03,2013-01-04,2013-01-05,2013-01-06
A,0.139915,-0.090755,-1.618753,0.372716,-0.972631,0.375061
B,0.686243,0.219519,-1.175474,-0.907985,2.091317,0.340456
C,-0.198489,-0.137148,0.492026,-0.531178,2.423233,-0.788131
D,0.672779,-0.321809,-0.348725,-0.320207,1.506462,-0.502219


In [15]:
df.sort_index(axis=1, ascending=False)

,D,C,B,A
2013-01-01,0.672779,-0.198489,0.686243,0.139915
2013-01-02,-0.321809,-0.137148,0.219519,-0.090755
2013-01-03,-0.348725,0.492026,-1.175474,-1.618753
2013-01-04,-0.320207,-0.531178,-0.907985,0.372716
2013-01-05,1.506462,2.423233,2.091317,-0.972631
2013-01-06,-0.502219,-0.788131,0.340456,0.375061


In [16]:
df.sort_values(by='B')

,A,B,C,D
2013-01-03,-1.618753,-1.175474,0.492026,-0.348725
2013-01-04,0.372716,-0.907985,-0.531178,-0.320207
2013-01-02,-0.090755,0.219519,-0.137148,-0.321809
2013-01-06,0.375061,0.340456,-0.788131,-0.502219
2013-01-01,0.139915,0.686243,-0.198489,0.672779
2013-01-05,-0.972631,2.091317,2.423233,1.506462


In [17]:
df['A']

2013-01-01    0.139915
2013-01-02   -0.090755
2013-01-03   -1.618753
2013-01-04    0.372716
2013-01-05   -0.972631
2013-01-06    0.375061
Freq: D, Name: A, dtype: float64

In [18]:
df[['A','B']]

,A,B
2013-01-01,0.139915,0.686243
2013-01-02,-0.090755,0.219519
2013-01-03,-1.618753,-1.175474
2013-01-04,0.372716,-0.907985
2013-01-05,-0.972631,2.091317
2013-01-06,0.375061,0.340456


In [19]:
df[0:3]

,A,B,C,D
2013-01-01,0.139915,0.686243,-0.198489,0.672779
2013-01-02,-0.090755,0.219519,-0.137148,-0.321809
2013-01-03,-1.618753,-1.175474,0.492026,-0.348725


In [20]:
df['20130102':'20130104']

,A,B,C,D
2013-01-02,-0.090755,0.219519,-0.137148,-0.321809
2013-01-03,-1.618753,-1.175474,0.492026,-0.348725
2013-01-04,0.372716,-0.907985,-0.531178,-0.320207


In [21]:
df.loc[dates[0]]

A    0.139915
B    0.686243
C   -0.198489
D    0.672779
Name: 2013-01-01 00:00:00, dtype: float64

In [22]:
df.loc[:,['A','B']]

,A,B
2013-01-01,0.139915,0.686243
2013-01-02,-0.090755,0.219519
2013-01-03,-1.618753,-1.175474
2013-01-04,0.372716,-0.907985
2013-01-05,-0.972631,2.091317
2013-01-06,0.375061,0.340456


In [23]:
df.loc['20130102':'20130104',['A','B']]

,A,B
2013-01-02,-0.090755,0.219519
2013-01-03,-1.618753,-1.175474
2013-01-04,0.372716,-0.907985


In [24]:
df.loc['20130102',['A','B']]

A   -0.090755
B    0.219519
Name: 2013-01-02 00:00:00, dtype: float64

In [25]:
df.loc[dates[0],'A']

0.13991515817821326

In [26]:
df.at[dates[0],'A']

0.13991515817821326

In [27]:
df.iloc[3]

A    0.372716
B   -0.907985
C   -0.531178
D   -0.320207
Name: 2013-01-04 00:00:00, dtype: float64

In [28]:
df.iloc[3:5,0:2]

,A,B
2013-01-04,0.372716,-0.907985
2013-01-05,-0.972631,2.091317


In [30]:
df.iloc[[1,2,4],[0,2]]

,A,C
2013-01-02,-0.090755,-0.137148
2013-01-03,-1.618753,0.492026
2013-01-05,-0.972631,2.423233


In [31]:
df.iloc[1:3,:]

,A,B,C,D
2013-01-02,-0.090755,0.219519,-0.137148,-0.321809
2013-01-03,-1.618753,-1.175474,0.492026,-0.348725


In [32]:
df.iloc[:,1:3]

,B,C
2013-01-01,0.686243,-0.198489
2013-01-02,0.219519,-0.137148
2013-01-03,-1.175474,0.492026
2013-01-04,-0.907985,-0.531178
2013-01-05,2.091317,2.423233
2013-01-06,0.340456,-0.788131


In [33]:
df.iloc[1,1]

0.21951905097793029

In [34]:
df.iat[1,1]

0.21951905097793029

In [35]:
df[df.A > 0]

,A,B,C,D
2013-01-01,0.139915,0.686243,-0.198489,0.672779
2013-01-04,0.372716,-0.907985,-0.531178,-0.320207
2013-01-06,0.375061,0.340456,-0.788131,-0.502219


In [36]:
df[df > 0]

,A,B,C,D
2013-01-01,0.139915,0.686243,,0.672779
2013-01-02,,0.219519,,
2013-01-03,,,0.492026,
2013-01-04,0.372716,,,
2013-01-05,,2.091317,2.423233,1.506462
2013-01-06,0.375061,0.340456,,


In [39]:
df2 = df.copy()

df2['E'] = ['one','one','two','three','four','three']

df2[df2['E'].isin(['two','four'])]

,A,B,C,D,E
2013-01-03,-1.618753,-1.175474,0.492026,-0.348725,two
2013-01-05,-0.972631,2.091317,2.423233,1.506462,four


In [0]:
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))

df['F'] = s1

In [41]:
df.at[dates[0],'A'] = 0
df.iat[0,1] = 0
df.loc[:,'D'] = np.array([5] * len(df))

df

,A,B,C,D,F
2013-01-01,0.0,0.0,-0.198489,5,
2013-01-02,-0.090755,0.219519,-0.137148,5,1.0
2013-01-03,-1.618753,-1.175474,0.492026,5,2.0
2013-01-04,0.372716,-0.907985,-0.531178,5,3.0
2013-01-05,-0.972631,2.091317,2.423233,5,4.0
2013-01-06,0.375061,0.340456,-0.788131,5,5.0


In [43]:
df2 = df.copy()

df2[df2 > 0] = -df2

df2

,A,B,C,D,F
2013-01-01,0.0,0.0,-0.198489,-5,
2013-01-02,-0.090755,-0.219519,-0.137148,-5,-1.0
2013-01-03,-1.618753,-1.175474,-0.492026,-5,-2.0
2013-01-04,-0.372716,-0.907985,-0.531178,-5,-3.0
2013-01-05,-0.972631,-2.091317,-2.423233,-5,-4.0
2013-01-06,-0.375061,-0.340456,-0.788131,-5,-5.0


In [44]:
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
df1.loc[dates[0]:dates[1], 'E'] = 1
df1

,A,B,C,D,F,E
2013-01-01,0.0,0.0,-0.198489,5,,1.0
2013-01-02,-0.090755,0.219519,-0.137148,5,1.0,1.0
2013-01-03,-1.618753,-1.175474,0.492026,5,2.0,
2013-01-04,0.372716,-0.907985,-0.531178,5,3.0,


In [46]:
df1.dropna(how='any')

,A,B,C,D,F,E
2013-01-02,-0.090755,0.219519,-0.137148,5,1.0,1.0


In [47]:
df1.fillna(value=5)

,A,B,C,D,F,E
2013-01-01,0.0,0.0,-0.198489,5,5.0,1.0
2013-01-02,-0.090755,0.219519,-0.137148,5,1.0,1.0
2013-01-03,-1.618753,-1.175474,0.492026,5,2.0,5.0
2013-01-04,0.372716,-0.907985,-0.531178,5,3.0,5.0


In [48]:
df1.isna()

,A,B,C,D,F,E
2013-01-01,False,False,False,False,True,False
2013-01-02,False,False,False,False,False,False
2013-01-03,False,False,False,False,False,True
2013-01-04,False,False,False,False,False,True


In [49]:
df.mean()

A   -0.322394
B    0.094639
C    0.210052
D    5.000000
F    3.000000
dtype: float64

In [50]:
df.mean(axis=1)

2013-01-01    1.200378
2013-01-02    1.198323
2013-01-03    0.939560
2013-01-04    1.386711
2013-01-05    2.508384
2013-01-06    1.985477
Freq: D, dtype: float64

In [51]:
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
s

2013-01-01    NaN
2013-01-02    NaN
2013-01-03    1.0
2013-01-04    3.0
2013-01-05    5.0
2013-01-06    NaN
Freq: D, dtype: float64

In [52]:
df.sub(s, axis='index')

,A,B,C,D,F
2013-01-01,,,,,
2013-01-02,,,,,
2013-01-03,-2.618753,-2.175474,-0.507974,4.0,1.0
2013-01-04,-2.627284,-3.907985,-3.531178,2.0,0.0
2013-01-05,-5.972631,-2.908683,-2.576767,0.0,-1.0
2013-01-06,,,,,


In [53]:
df.apply(np.cumsum)

,A,B,C,D,F
2013-01-01,0.0,0.0,-0.198489,5,
2013-01-02,-0.090755,0.219519,-0.335637,10,1.0
2013-01-03,-1.709508,-0.955955,0.156389,15,3.0
2013-01-04,-1.336792,-1.86394,-0.37479,20,6.0
2013-01-05,-2.309424,0.227377,2.048443,25,10.0
2013-01-06,-1.934362,0.567833,1.260312,30,15.0


In [54]:
df.apply(lambda x: x.max() - x.min())

A    1.993814
B    3.266791
C    3.211365
D    0.000000
F    4.000000
dtype: float64

In [55]:
s = pd.Series(np.random.randint(0, 7, size=10))
s.value_counts()

5    3
3    3
2    2
4    1
1    1
dtype: int64

In [56]:
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
s.str.lower()

0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object

In [57]:
df = pd.DataFrame(np.random.randn(10, 4))
pieces = [df[:3], df[3:7], df[7:]]
pd.concat(pieces)

,0,1,2,3
0,0.682866,0.446478,1.082094,-0.898303
1,-0.482725,-1.024041,0.603116,0.831035
2,0.735248,-0.941045,1.540577,0.175588
3,0.092921,-0.100323,-0.148897,0.557623
4,0.254199,-1.807379,-0.607507,-0.578282
5,0.437963,0.728015,-0.550241,0.316439
6,-0.123062,0.875085,0.371756,-0.194436
7,0.380964,-1.15707,-2.216568,-1.549414
8,-1.383344,-1.006823,-0.145955,-0.224505
9,0.697538,-1.127854,0.996044,-0.602882


In [58]:
left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})

pd.merge(left, right, on='key')

,key,lval,rval
0,foo,1,4
1,foo,1,5
2,foo,2,4
3,foo,2,5


In [59]:
left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})

pd.merge(left, right, on='key')

,key,lval,rval
0,foo,1,4
1,bar,2,5
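
The two merges above differ only in key multiplicity: each matching pair of rows yields one output row, so two `foo` rows on each side give four rows. As a hedged sketch, `pd.merge` accepts a `validate` argument that raises `pandas.errors.MergeError` when the keys are not unique in the way requested. The frames below repeat the duplicate-key example:

```python
import pandas as pd

left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})

# Duplicate keys on both sides: 2 x 2 = 4 rows.
many_to_many = pd.merge(left, right, on="key")

# Ask pandas to check that keys are unique on both sides.
try:
    pd.merge(left, right, on="key", validate="one_to_one")
    unique_keys = True
except pd.errors.MergeError:
    unique_keys = False

print(len(many_to_many), unique_keys)
```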


In [60]:
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
s = df.iloc[3]
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([df, s.to_frame().T], ignore_index=True)

,A,B,C,D
0,0.030909,-1.167385,0.146032,0.468421
1,-0.825665,-0.190736,0.499751,-0.953032
2,-2.408368,0.274544,0.954296,1.019537
3,0.941413,0.047111,1.276208,-1.032197
4,1.820532,-0.097585,-0.321179,-1.172483
5,0.792138,-0.407578,0.715356,0.517043
6,-0.379774,0.467724,-0.436161,-0.17923
7,-1.461953,0.280565,-0.862792,-1.644154
8,0.941413,0.047111,1.276208,-1.032197


In [61]:
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
  'foo', 'bar', 'foo', 'foo'],
  'B': ['one', 'one', 'two', 'three',
  'two', 'two', 'one', 'three'],
  'C': np.random.randn(8),
  'D': np.random.randn(8)})

df

,A,B,C,D
0,foo,one,0.691127,-0.454007
1,bar,one,0.686744,0.46103
2,foo,two,0.564423,-1.314168
3,bar,three,-0.183554,2.202208
4,foo,two,0.635668,-0.696705
5,bar,two,1.407759,0.444731
6,foo,one,-1.94902,0.072809
7,foo,three,0.333329,-1.732348


In [62]:
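# Select the numeric columns; in pandas 2.x, summing column B would concatenate its strings
df.groupby('A')[['C', 'D']].sum()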

A,C,D
bar,1.91095,3.107969
foo,0.275526,-4.124418


In [63]:
df.groupby(['A', 'B']).sum()

A,B,C,D
bar,one,0.686744,0.46103
bar,three,-0.183554,2.202208
bar,two,1.407759,0.444731
foo,one,-1.257893,-0.381198
foo,three,0.333329,-1.732348
foo,two,1.200091,-2.010873


In [65]:
tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',
  'foo', 'foo', 'qux', 'qux'],
  ['one', 'two', 'one', 'two',
  'one', 'two', 'one', 'two']]))

tuples

[('bar', 'one'),
 ('bar', 'two'),
 ('baz', 'one'),
 ('baz', 'two'),
 ('foo', 'one'),
 ('foo', 'two'),
 ('qux', 'one'),
 ('qux', 'two')]

In [67]:
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index

MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

In [68]:
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
df

first,second,A,B
bar,one,0.432961,1.338315
bar,two,-3.447433,0.113752
baz,one,-0.146996,-0.743702
baz,two,-1.589996,-0.09607
foo,one,1.034848,0.976384
foo,two,-0.802327,-0.541221
qux,one,0.08678,0.038293
qux,two,-0.637964,0.894517


In [70]:
df2 = df[:4]
df2

first,second,A,B
bar,one,0.432961,1.338315
bar,two,-3.447433,0.113752
baz,one,-0.146996,-0.743702
baz,two,-1.589996,-0.09607


In [72]:
stacked = df2.stack()
stacked

first  second   
bar    one     A    0.432961
               B    1.338315
       two     A   -3.447433
               B    0.113752
baz    one     A   -0.146996
               B   -0.743702
       two     A   -1.589996
               B   -0.096070
dtype: float64

In [73]:
stacked.unstack()

first,second,A,B
bar,one,0.432961,1.338315
bar,two,-3.447433,0.113752
baz,one,-0.146996,-0.743702
baz,two,-1.589996,-0.09607


In [74]:
stacked.unstack(1)

,second,one,two
first,,,
bar,A,0.432961,-3.447433
bar,B,1.338315,0.113752
baz,A,-0.146996,-1.589996
baz,B,-0.743702,-0.09607


In [76]:
stacked.unstack(0)

,first,bar,baz
second,,,
one,A,0.432961,-0.146996
one,B,1.338315,-0.743702
two,A,-3.447433,-1.589996
two,B,0.113752,-0.09607


In [77]:
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
  'B': ['A', 'B', 'C'] * 4,
  'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
  'D': np.random.randn(12),
  'E': np.random.randn(12)})

df

,A,B,C,D,E
0,one,A,foo,-1.971724,1.462562
1,one,B,foo,-0.091007,-1.616504
2,two,C,foo,0.668178,-0.232335
3,three,A,bar,0.620718,0.263847
4,one,B,bar,-0.694171,1.105894
5,one,C,bar,-0.238895,-1.557922
6,two,A,foo,-0.422354,-1.646734
7,three,B,foo,-0.151648,-1.094908
8,one,C,foo,0.866489,-0.106508
9,one,A,bar,1.053695,0.46036


In [78]:
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])

,C,bar,foo
A,B,,
one,A,1.053695,-1.971724
one,B,-0.694171,-0.091007
one,C,-0.238895,0.866489
three,A,0.620718,
three,B,,-0.151648
three,C,-0.030789,
two,A,,-0.422354
two,B,-2.030844,
two,C,,0.668178
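
The empty cells in the pivot table above are combinations of index and column values that never occur in the data. When several rows fall into the same cell, `pivot_table` averages them by default, and a different aggregation can be requested with `aggfunc`. A minimal sketch, using a small invented frame called `small`:

```python
import numpy as np
import pandas as pd

# Small hypothetical table with a repeated (A, C) combination.
small = pd.DataFrame({
    "A": ["one", "one", "two"],
    "C": ["foo", "foo", "bar"],
    "D": [1.0, 3.0, 5.0],
})

# By default pivot_table averages the values that fall in the same cell.
mean_table = pd.pivot_table(small, values="D", index="A", columns="C")

# Another aggregation can be requested with aggfunc.
sum_table = pd.pivot_table(small, values="D", index="A", columns="C", aggfunc="sum")

print(mean_table)
print(sum_table)
```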


## Essential basic functionality

https://pandas.pydata.org/pandas-docs/version/0.25/getting_started/basics.html

## Data structures

https://pandas.pydata.org/pandas-docs/version/0.25/getting_started/dsintro.html

## Reference

- loc[], iloc[], at[], iat[]
- head(), tail()
- describe()
- sort_index(axis=N, ascending=True)
- sort_values(by=N)
- to_numpy()
- pd.date_range()
- .str
- groupby()
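
One detail of the indexers listed above is worth a short sketch: label-based slicing with `loc` includes both endpoints, while position-based slicing with `iloc` excludes the stop position. The frame below uses `np.arange` values chosen only for illustration:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# Label-based slicing with loc includes both endpoints.
by_label = df.loc["20130102":"20130104", ["A", "B"]]

# Position-based slicing with iloc excludes the stop position.
by_position = df.iloc[1:3, 0:2]

# at and iat access a single scalar by label and by position respectively.
same_cell = df.at[dates[1], "B"] == df.iat[1, 1]

print(by_label.shape, by_position.shape, same_cell)
```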

## Data statistics with pandas

Reproduce the `Populations` exercise from NumPy, this time using pandas data structures and functions.