# Pandas Introduction

In [2]:
import numpy as np
import pandas as pd

## Object creation

Create a Series by passing a list of values; pandas supplies a default integer index:

In [3]:
s = pd.Series([1, 3, 4, np.nan, 6, 8])
print(s)

0    1.0
1    3.0
2    4.0
3    NaN
4    6.0
5    8.0
dtype: float64
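
The output above reports float64 even though most of the values were written as integers. NaN is a floating-point value, so pandas upcasts the whole Series to hold it. A minimal sketch of that behavior; the names `ints` and `with_nan` are illustrative and not part of the original notebook:

```python
import numpy as np
import pandas as pd

# A list of plain integers keeps an integer dtype.
ints = pd.Series([1, 3, 4])

# Adding np.nan (a float) upcasts the whole Series to float64.
with_nan = pd.Series([1, 3, 4, np.nan])

print(ints.dtype)
print(with_nan.dtype)
print(with_nan.isna().sum())  # number of missing values
```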


Create a DataFrame by passing a NumPy array, with a datetime index built using date_range() and labeled columns:

In [6]:
dates = pd.date_range("20130101", periods=6)
print(dates)
df = pd.DataFrame(np.random.rand(6, 4), index=dates, columns=list("ABCD"))
print(df)

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
                   A         B         C         D
2013-01-01  0.376441  0.480929  0.434533  0.394856
2013-01-02  0.469542  0.067968  0.221564  0.203295
2013-01-03  0.387729  0.421222  0.651962  0.695903
2013-01-04  0.657929  0.487649  0.487543  0.095084
2013-01-05  0.920227  0.133245  0.654263  0.772975
2013-01-06  0.862264  0.782905  0.204774  0.883591


Create a DataFrame by passing a dictionary of objects that can be converted into a series-like structure:

In [13]:
df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",
})
print(df2)
print('-'*40)
print(df2.dtypes)

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
----------------------------------------
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object


## Viewing data

Use DataFrame.head() and DataFrame.tail() to view the top and bottom rows, respectively:

In [19]:
print(df.head(3))
print('-'*50)
print(df.tail(3))

                   A         B         C         D
2013-01-01  0.376441  0.480929  0.434533  0.394856
2013-01-02  0.469542  0.067968  0.221564  0.203295
2013-01-03  0.387729  0.421222  0.651962  0.695903
--------------------------------------------------
                   A         B         C         D
2013-01-04  0.657929  0.487649  0.487543  0.095084
2013-01-05  0.920227  0.133245  0.654263  0.772975
2013-01-06  0.862264  0.782905  0.204774  0.883591
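
Both methods take an optional row count and return five rows when it is omitted. A short sketch that rebuilds a frame shaped like df from the section above (the random values will differ from the printed ones); `first_five` and `last_two` are illustrative names:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.rand(6, 4), index=dates, columns=list("ABCD"))

# With no argument, head() returns the first five rows.
first_five = df.head()

# An explicit count selects that many rows from the end.
last_two = df.tail(2)

print(first_five.shape)
print(last_two.index[0])
```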


Display the index or the columns of df:

In [21]:
print(df.index)
print('-'*50)
print(df.columns)

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
--------------------------------------------------
Index(['A', 'B', 'C', 'D'], dtype='object')


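__Note:__ NumPy arrays have one dtype for the entire array, while pandas DataFrames have one dtype per column. DataFrame.to_numpy() therefore has to pick a single NumPy dtype that can hold every column; when the columns differ, this may end up being object.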

In [22]:
df.to_numpy()

array([[0.37644104, 0.48092886, 0.43453273, 0.39485648],
       [0.46954191, 0.06796759, 0.22156389, 0.20329495],
       [0.38772899, 0.42122243, 0.65196227, 0.69590268],
       [0.65792919, 0.48764857, 0.48754336, 0.09508366],
       [0.92022724, 0.13324476, 0.65426299, 0.77297508],
       [0.86226375, 0.78290457, 0.20477423, 0.88359086]])

__Note:__ DataFrame.to_numpy() does not include the index or column labels in the output.

In [23]:
df2.to_numpy()

array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)
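
The difference between the two results can be checked directly on the returned arrays. A minimal sketch using two small hypothetical frames (`homogeneous` and `mixed` are not part of the original notebook):

```python
import numpy as np
import pandas as pd

# All columns are float64, so the array keeps that dtype.
homogeneous = pd.DataFrame(np.random.rand(3, 2), columns=["A", "B"])

# A float column next to a string column has no common numeric dtype.
mixed = pd.DataFrame({"A": [1.0, 2.0], "F": ["foo", "bar"]})

arr_h = homogeneous.to_numpy()
arr_m = mixed.to_numpy()

print(arr_h.dtype)  # single numeric dtype for the whole array
print(arr_m.dtype)  # falls back to object
print(arr_h.shape)  # values only: no index or column labels
```

The shape matches the number of rows and columns of the frame, which confirms that the labels are dropped; they remain available through the frame's index and columns attributes.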