# 1. Import Pandas

**History of development**

In 2008, pandas development began at AQR Capital Management. By the end of 2009 it had been open sourced, and is actively supported today by a community of like-minded individuals around the world who contribute their valuable time and energy to help make open source pandas possible. Thank you to all of our contributors.

Since 2015, pandas is a NumFOCUS sponsored project. This will help ensure the success of development of pandas as a world-class open-source project.

Timeline:

- 2008: Development of pandas started
- 2009: pandas becomes open source
- 2012: First edition of Python for Data Analysis is published
- 2015: pandas becomes a NumFOCUS sponsored project
- 2018: First in-person core developer sprint

Source: https://pandas.pydata.org/about/


In [1]:
import pandas as pd
import numpy as np

In [2]:
pd.__version__

'0.24.1'

In [4]:
# !pip install pandas==0.24.1

## Set Options

See https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html for the full list of options.

In [4]:
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 8)
pd.set_option('display.max_rows', 8)
pd.set_option('display.width', 80)
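
The `pd.set_option` calls above change display settings for the rest of the session. When a setting is only needed for one cell, a temporary override may be more convenient. The following is a minimal sketch using `pd.get_option`, `pd.option_context`, and `pd.reset_option`; the values 4 and 3 are arbitrary and chosen only for illustration.

```python
import pandas as pd

# Read the current value of an option.
rows_before = pd.get_option('display.max_rows')

# Temporarily override options inside a with-block; the previous
# values are restored automatically when the block exits.
with pd.option_context('display.max_rows', 4, 'display.max_columns', 3):
    rows_inside = pd.get_option('display.max_rows')

rows_after = pd.get_option('display.max_rows')

# Put a single option back to the library default.
pd.reset_option('display.max_rows')
```

Here `rows_inside` holds the temporary value, while `rows_after` matches `rows_before` again.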

## DataFrame

In [15]:
A = [0,1,4,7,1,2,5,7,9,7,2]
B = [4,1,4,6,7,9,9,4,7,1,2]
C = [3,6,4,9,9,8,8,7,1,2,1]
E = np.arange(200, 320, len(A))  # step is len(A) = 11, which yields 11 values: 200, 211, ..., 310
F = np.random.normal(size=len(A))

In [6]:
A

[0, 1, 4, 7, 1, 2, 5, 7, 9, 7, 2]

In [7]:
B

[4, 1, 4, 6, 7, 9, 9, 4, 7, 1, 2]

In [17]:
C

[3, 6, 4, 9, 9, 8, 8, 7, 1, 2, 1]

In [18]:
E

array([200, 211, 222, 233, 244, 255, 266, 277, 288, 299, 310])

In [19]:
F

array([ 0.84378096, -0.7484033 ,  0.59281569, -0.50760537, -1.7809486 ,
        1.38689134, -0.95548732, -1.01982397, -2.40097725,  0.26924331,
       -0.28387666])

In [20]:
data_gen = pd.DataFrame({'A1' : A, 'B1' : B, 'C1' : C, 'D1' : E, 'E1': F})

In [21]:
data_gen

    A1  B1  C1   D1        E1
0    0   4   3  200  0.843781
1    1   1   6  211 -0.748403
2    4   4   4  222  0.592816
3    7   6   9  233 -0.507605
4    1   7   9  244 -1.780949
5    2   9   8  255  1.386891
6    5   9   8  266 -0.955487
7    7   4   7  277 -1.019824
8    9   7   1  288 -2.400977
9    7   1   2  299  0.269243
10   2   2   1  310 -0.283877


In [22]:
type(data_gen)

pandas.core.frame.DataFrame
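
Building a DataFrame from a dict of lists and arrays requires every column to have the same length. The sketch below rebuilds two of the columns used above and checks their size. Note that `np.arange(200, 320, len(A))` passes `len(A)` as the step, not as the number of elements; it produces 11 values only because of the chosen start and stop.

```python
import numpy as np
import pandas as pd

A = [0, 1, 4, 7, 1, 2, 5, 7, 9, 7, 2]
E = np.arange(200, 320, len(A))   # step of 11 -> 200, 211, ..., 310

# Both columns have 11 entries, so the constructor accepts them.
df = pd.DataFrame({'A1': A, 'D1': E})

n_rows, n_cols = df.shape         # (rows, columns)
column_names = list(df.columns)   # column labels in insertion order
```

If the lengths differed, the constructor would raise a `ValueError` instead of padding the shorter column.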

In [26]:
## To_Array: as_matrix() is deprecated; use to_numpy() (pandas >= 0.24) or .values
data_gen.to_numpy()

array([[ 0.00000000e+00,  4.00000000e+00,  3.00000000e+00,
         2.00000000e+02,  8.43780964e-01],
       [ 1.00000000e+00,  1.00000000e+00,  6.00000000e+00,
         2.11000000e+02, -7.48403304e-01],
       [ 4.00000000e+00,  4.00000000e+00,  4.00000000e+00,
         2.22000000e+02,  5.92815691e-01],
       [ 7.00000000e+00,  6.00000000e+00,  9.00000000e+00,
         2.33000000e+02, -5.07605373e-01],
       [ 1.00000000e+00,  7.00000000e+00,  9.00000000e+00,
         2.44000000e+02, -1.78094860e+00],
       [ 2.00000000e+00,  9.00000000e+00,  8.00000000e+00,
         2.55000000e+02,  1.38689134e+00],
       [ 5.00000000e+00,  9.00000000e+00,  8.00000000e+00,
         2.66000000e+02, -9.55487324e-01],
       [ 7.00000000e+00,  4.00000000e+00,  7.00000000e+00,
         2.77000000e+02, -1.01982397e+00],
       [ 9.00000000e+00,  7.00000000e+00,  1.00000000e+00,
         2.88000000e+02, -2.40097725e+00],
       [ 7.00000000e+00,  1.00000000e+00,  2.00000000e+00,
         2.99000000e+02
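
The array above contains only floats even though four of the five columns hold integers. A NumPy array has a single dtype, so converting a frame with mixed column types upcasts everything to a common type. A minimal sketch on a hypothetical two-column frame (the column names `ints` and `floats` are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with one integer and one float column.
df = pd.DataFrame({'ints': [1, 2, 3], 'floats': [0.5, 1.5, 2.5]})

mixed = df.to_numpy()                 # integers are upcast to float
ints_only = df[['ints']].to_numpy()   # a single integer column stays integer
```

Selecting columns of one type before converting is one way to avoid the upcast.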

## Series

In [29]:
A

[0, 1, 4, 7, 1, 2, 5, 7, 9, 7, 2]

In [28]:
series_gen = pd.Series(A)
series_gen2 = pd.Series(F)

In [30]:
series_gen

0     0
1     1
2     4
3     7
4     1
5     2
6     5
7     7
8     9
9     7
10    2
dtype: int64

In [31]:
series_gen2

0     0.843781
1    -0.748403
2     0.592816
3    -0.507605
4    -1.780949
5     1.386891
6    -0.955487
7    -1.019824
8    -2.400977
9     0.269243
10   -0.283877
dtype: float64

In [32]:
type(series_gen)

pandas.core.series.Series

In [34]:
new_index = ['A'+str(i+1) for i in range(len(A))]
new_index

['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'A10', 'A11']

In [35]:
series_gen2 = pd.Series(A, index = new_index)

In [36]:
series_gen2

A1     0
A2     1
A3     4
A4     7
A5     1
A6     2
A7     5
A8     7
A9     9
A10    7
A11    2
dtype: int64
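
With a custom index, elements can be looked up either by label or by position. The sketch below rebuilds the same labelled Series under the hypothetical name `s` and shows both access styles.

```python
import pandas as pd

A = [0, 1, 4, 7, 1, 2, 5, 7, 9, 7, 2]
labels = ['A' + str(i + 1) for i in range(len(A))]
s = pd.Series(A, index=labels)

by_label = s.loc['A4']            # label-based lookup
by_position = s.iloc[3]           # position-based lookup of the same element
label_slice = s.loc['A2':'A4']    # label slices include both endpoints
```

Unlike ordinary Python slicing, a label slice with `.loc` includes the end label.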

In [41]:
## generate series of dates
# dates = pd.date_range('2016-04-01', '2016-04-06')
index_dates = pd.date_range('2016-04-01', periods=11,freq='D')
index_dates

DatetimeIndex(['2016-04-01', '2016-04-02', '2016-04-03', '2016-04-04',
               '2016-04-05', '2016-04-06', '2016-04-07', '2016-04-08',
               '2016-04-09', '2016-04-10', '2016-04-11'],
              dtype='datetime64[ns]', freq='D')

In [43]:
series_gen3 = pd.Series(A, index = index_dates)
series_gen3

2016-04-01    0
2016-04-02    1
2016-04-03    4
2016-04-04    7
2016-04-05    1
2016-04-06    2
2016-04-07    5
2016-04-08    7
2016-04-09    9
2016-04-10    7
2016-04-11    2
Freq: D, dtype: int64
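
A `DatetimeIndex` makes it possible to select by date. The sketch below rebuilds the same date-indexed Series under the hypothetical name `ts`, then picks one day and a three-day window.

```python
import pandas as pd

A = [0, 1, 4, 7, 1, 2, 5, 7, 9, 7, 2]
dates = pd.date_range('2016-04-01', periods=len(A), freq='D')
ts = pd.Series(A, index=dates)

one_day = ts.loc[pd.Timestamp('2016-04-04')]   # value for a single date
window = ts.loc['2016-04-03':'2016-04-05']     # date strings work as slice bounds
```

As with other label slices, both the start and the end date are included in `window`.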

In [44]:
## Reverse: recover the underlying NumPy array from the Series
series_gen3.values

array([0, 1, 4, 7, 1, 2, 5, 7, 9, 7, 2])

In [45]:
## Notes: single brackets return a Series; double brackets return a DataFrame

In [46]:
data_gen

    A1  B1  C1   D1        E1
0    0   4   3  200  0.843781
1    1   1   6  211 -0.748403
2    4   4   4  222  0.592816
3    7   6   9  233 -0.507605
4    1   7   9  244 -1.780949
5    2   9   8  255  1.386891
6    5   9   8  266 -0.955487
7    7   4   7  277 -1.019824
8    9   7   1  288 -2.400977
9    7   1   2  299  0.269243
10   2   2   1  310 -0.283877


In [49]:
# type(data_gen['A1'])
data_gen['A1']

0     0
1     1
2     4
3     7
4     1
5     2
6     5
7     7
8     9
9     7
10    2
Name: A1, dtype: int64

In [51]:
# type(data_gen[['A1']])
data_gen[['A1']]

    A1
0    0
1    1
2    4
3    7
4    1
5    2
6    5
7    7
8    9
9    7
10   2
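
The two selections above return different types, which is what the commented-out `type(...)` calls are meant to check. A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical small frame, used only to illustrate the two selection styles.
df = pd.DataFrame({'A1': [0, 1, 4], 'B1': [4, 1, 4]})

col_series = df['A1']     # single brackets with one label -> Series
col_frame = df[['A1']]    # a list of labels -> DataFrame with one column

series_shape = col_series.shape   # one-dimensional
frame_shape = col_frame.shape     # two-dimensional, a single column
```

The double-bracket form is useful when later code expects a two-dimensional object even for a single column.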
