# 1. Import Pandas

**History of development**

In 2008, pandas development began at AQR Capital Management. By the end of 2009 it had been open sourced, and is actively supported today by a community of like-minded individuals around the world who contribute their valuable time and energy to help make open source pandas possible. Thank you to all of our contributors.

Since 2015, pandas has been a NumFOCUS sponsored project, which helps ensure its continued development as a world-class open source project.

Timeline

- 2008: Development of pandas started
- 2009: pandas becomes open source
- 2012: First edition of Python for Data Analysis is published
- 2015: pandas becomes a NumFOCUS sponsored project
- 2018: First in-person core developer sprint

Source: https://pandas.pydata.org/about/


In [1]:
import pandas as pd
import numpy as np

In [2]:
pd.__version__

'1.2.4'

In [3]:
# To pin a specific version, uncomment (no spaces around ==):
# !pip install pandas==0.24.1

## Set Options

See https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html for the full list of options.

In [4]:
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 8)
pd.set_option('display.max_rows', 8)
pd.set_option('display.width', 80)
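
The settings above are global and stay in effect for the rest of the session. As a minimal sketch (the value `4` is only an illustration), an option can also be read with `pd.get_option`, overridden temporarily with `pd.option_context`, and returned to its default with `pd.reset_option`:

```python
import pandas as pd

# Read the current value of an option.
previous = pd.get_option("display.max_rows")

# Override it temporarily; the previous value is restored when the block exits.
with pd.option_context("display.max_rows", 4):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")

# Return a single option to the library default.
pd.reset_option("display.max_rows")
```

`option_context` is convenient when only one cell needs a different display, since nothing has to be undone by hand afterwards.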

## DataFrame

In [6]:
A = [0,1,4,7,1,2,5,7,9,7,2]
B = [4,1,4,6,7,9,9,4,7,1,2]
C = [3,6,4,9,9,8,8,7,1,2,1]
E = np.arange(200, 320, len(A))  # step of len(A) = 11; this range happens to give exactly 11 values
F = np.random.normal(size=len(A))

In [7]:
E

array([200, 211, 222, 233, 244, 255, 266, 277, 288, 299, 310])
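
`np.arange(start, stop, step)` produces `ceil((stop - start) / step)` values, so `E` has the same length as `A` only because of the particular bounds chosen. The sketch below checks that count and shows one possible alternative (an assumption, not part of the original notebook) that ties the length to `n` explicitly:

```python
import math

import numpy as np

n = 11  # len(A) in the notebook

E = np.arange(200, 320, n)

# Number of values arange generates for these bounds and this step.
expected_len = math.ceil((320 - 200) / n)

# Alternative: build exactly n values, spaced n apart, starting at 200.
E_explicit = 200 + n * np.arange(n)
```

The explicit form keeps working if the bounds or the list length change, whereas the `arange` call would silently produce a different number of elements.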

In [8]:
F

array([-0.11261821, -1.63410876,  0.03535861,  0.03272837, -0.20964675,
        2.04186186,  0.72563255, -0.67784043, -0.46252845,  1.55554675,
       -1.25643314])
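
The values of `F` change on every run because no seed is set. If reproducible numbers are wanted, one option is NumPy's `Generator` API; in this sketch the seed `42` is arbitrary and not taken from the notebook:

```python
import numpy as np

# Two generators created with the same seed yield identical draws.
rng_a = np.random.default_rng(42)  # 42 is an arbitrary example seed
rng_b = np.random.default_rng(42)

first = rng_a.normal(size=11)
second = rng_b.normal(size=11)

same = np.array_equal(first, second)
```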

In [9]:
data_gen = pd.DataFrame({'A1' : A, 'B1' : B, 'C1' : C, 'D1' : E, 'E1': F})

In [10]:
data_gen

    A1  B1  C1   D1        E1
0    0   4   3  200 -0.112618
1    1   1   6  211 -1.634109
2    4   4   4  222  0.035359
3    7   6   9  233  0.032728
..  ..  ..  ..  ...       ...
7    7   4   7  277 -0.677840
8    9   7   1  288 -0.462528
9    7   1   2  299  1.555547
10   2   2   1  310 -1.256433

[11 rows x 5 columns]

In [11]:
type(data_gen)

pandas.core.frame.DataFrame

In [36]:
data_gen.columns

Index(['A1', 'B1', 'C1', 'D1', 'E1'], dtype='object')

In [14]:
# Convert the DataFrame to a NumPy array (df.to_numpy() is the recommended equivalent of .values)
data_gen.values

array([[ 0.00000000e+00,  4.00000000e+00,  3.00000000e+00,
         2.00000000e+02, -1.12618214e-01],
       [ 1.00000000e+00,  1.00000000e+00,  6.00000000e+00,
         2.11000000e+02, -1.63410876e+00],
       [ 4.00000000e+00,  4.00000000e+00,  4.00000000e+00,
         2.22000000e+02,  3.53586063e-02],
       [ 7.00000000e+00,  6.00000000e+00,  9.00000000e+00,
         2.33000000e+02,  3.27283736e-02],
       [ 1.00000000e+00,  7.00000000e+00,  9.00000000e+00,
         2.44000000e+02, -2.09646750e-01],
       [ 2.00000000e+00,  9.00000000e+00,  8.00000000e+00,
         2.55000000e+02,  2.04186186e+00],
       [ 5.00000000e+00,  9.00000000e+00,  8.00000000e+00,
         2.66000000e+02,  7.25632548e-01],
       [ 7.00000000e+00,  4.00000000e+00,  7.00000000e+00,
         2.77000000e+02, -6.77840426e-01],
       [ 9.00000000e+00,  7.00000000e+00,  1.00000000e+00,
         2.88000000e+02, -4.62528450e-01],
       [ 7.00000000e+00,  1.00000000e+00,  2.00000000e+00,
         2.99000000e+02
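
The integers in the output above are printed as floats because a NumPy array holds a single dtype, so mixed integer and float columns are upcast to a common type. A small sketch with a hypothetical three-row frame illustrates this using `to_numpy()`:

```python
import pandas as pd

# Hypothetical miniature frame: one integer column, one float column.
df = pd.DataFrame({"A1": [0, 1, 4], "E1": [-0.5, 1.5, 2.0]})

# Mixed int/float columns are upcast to one common float dtype.
arr = df.to_numpy()

# Selecting only the integer column keeps an integer dtype.
ints_only = df[["A1"]].to_numpy()
```

Selecting the columns of interest before converting avoids the upcast when the integer values need to stay integers.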

## Series

In [19]:
A

[0, 1, 4, 7, 1, 2, 5, 7, 9, 7, 2]

In [20]:
series_gen = pd.Series(A)
series_gen2 = pd.Series(F)

In [21]:
series_gen

0     0
1     1
2     4
3     7
     ..
7     7
8     9
9     7
10    2
Length: 11, dtype: int64

In [22]:
series_gen2

0    -0.112618
1    -1.634109
2     0.035359
3     0.032728
        ...   
7    -0.677840
8    -0.462528
9     1.555547
10   -1.256433
Length: 11, dtype: float64

In [23]:
type(series_gen)

pandas.core.series.Series

In [26]:
new_i = ['A' + str(i) for i in range(len(A))]
new_i

['A0', 'A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'A10']

In [24]:
# Note: this rebinds series_gen2, replacing the float Series created from F earlier
series_gen2 = pd.Series(A, index=new_i)

In [27]:
series_gen2

A0     0
A1     1
A2     4
A3     7
      ..
A7     7
A8     9
A9     7
A10    2
Length: 11, dtype: int64
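
With a custom index, values can be retrieved either by label or by position. The sketch below rebuilds the same Series under the assumed name `s` and compares `.loc` (labels) with `.iloc` (positions):

```python
import pandas as pd

A = [0, 1, 4, 7, 1, 2, 5, 7, 9, 7, 2]
labels = ["A" + str(i) for i in range(len(A))]
s = pd.Series(A, index=labels)

by_label = s.loc["A3"]          # look up by index label
by_position = s.iloc[3]         # look up by integer position
label_slice = s.loc["A1":"A3"]  # label slices include both endpoints
```

Note that the label slice returns three elements (`A1`, `A2`, `A3`), unlike a positional slice, which excludes its end.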

In [29]:
# Generate a daily date range to use as the index (one date per element of A)
# dates = pd.date_range('2016-04-01', '2016-04-06')
dates = pd.date_range('2016-04-01', periods=len(A), freq='D')

series_gen3 = pd.Series(A, index = dates)

In [30]:
series_gen3

2016-04-01    0
2016-04-02    1
2016-04-03    4
2016-04-04    7
             ..
2016-04-08    7
2016-04-09    9
2016-04-10    7
2016-04-11    2
Freq: D, Length: 11, dtype: int64
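
A date index allows selection with date strings. As a brief sketch (the Series is rebuilt here under the assumed name `s`), a single day or an inclusive range of days can be picked with `.loc`:

```python
import pandas as pd

A = [0, 1, 4, 7, 1, 2, 5, 7, 9, 7, 2]
dates = pd.date_range("2016-04-01", periods=len(A), freq="D")
s = pd.Series(A, index=dates)

# Select a single day by its date string.
one_day = s.loc["2016-04-03"]

# Select a range of days; both endpoints are included.
window = s.loc["2016-04-08":"2016-04-10"]
```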

In [31]:
# Reverse direction: recover the underlying NumPy array from the Series
series_gen3.values

array([0, 1, 4, 7, 1, 2, 5, 7, 9, 7, 2], dtype=int64)

In [32]:
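# Notes: selecting with a single label returns a Series; selecting with a list of labels returns a DataFrame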

In [33]:
type(data_gen['A1'])

pandas.core.series.Series

In [34]:
type(data_gen[['A1']])

pandas.core.frame.DataFrame
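
The difference between the two selections also shows up in the shape of the result. A minimal sketch with a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical miniature frame for illustration.
df = pd.DataFrame({"A1": [0, 1, 4], "B1": [4, 1, 4]})

col = df["A1"]    # single label: one-dimensional Series
sub = df[["A1"]]  # list of labels: two-dimensional DataFrame with one column

col_shape = col.shape  # (3,)
sub_shape = sub.shape  # (3, 1)
```

The double-bracket form is useful when later code expects a DataFrame, for example when further columns will be added to the selection.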