# Pandas


In [1]:
import numpy as np 
import pandas as pd

## Basic data structures in pandas
pandas provides two main classes for handling data:
1. `Series`: a one-dimensional labeled array holding data of any type, such as integers, strings, or Python objects.

2. `DataFrame`: a two-dimensional data structure that holds data like a two-dimensional array or a table with rows and columns.
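
As a minimal sketch of how the two classes relate, the cell below builds a small illustrative frame (the names `frame`, `x`, and `y` are made up for this example) and shows that selecting one column yields a `Series` sharing the frame's index:

```python
import pandas as pd

# Hypothetical data, used only to illustrate the two structures.
frame = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# Selecting a single column of a DataFrame returns a Series.
column = frame["x"]

print(type(frame).__name__, frame.ndim)    # DataFrame 2
print(type(column).__name__, column.ndim)  # Series 1

# The Series carries the same row labels as the frame it came from.
print(column.index.equals(frame.index))    # True
```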

## Object creation 

Creating a `Series` by passing a list of values, letting pandas create a default `RangeIndex`:

In [2]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])

In [3]:
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
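
The output above shows `float64` even though most of the values were written as integers. A small sketch of why, assuming default dtype inference: `np.nan` is a floating-point value, so a list containing it is stored as floats. The variable names here are illustrative.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 3, 5])               # no missing values
with_nan = pd.Series([1, 3, 5, np.nan])   # one missing value

print(ints.dtype)       # an integer dtype
print(with_nan.dtype)   # float64

# In both cases pandas supplies a default RangeIndex starting at 0.
print(isinstance(with_nan.index, pd.RangeIndex))  # True
```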

Creating a `DataFrame` by passing a NumPy array with a datetime index using `date_range()` and labeled columns:


In [4]:
dates = pd.date_range("20130101", periods=6)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [6]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df

                   A         B         C         D
2013-01-01 -0.206450  0.690361  0.108728  0.869593
2013-01-02  0.268097 -1.159317 -1.152265  0.176963
2013-01-03 -0.188976  1.239838  0.032640 -0.807762
2013-01-04 -0.429091 -1.730086 -0.047493 -0.896067
2013-01-05 -1.391268 -1.873927  1.786262  0.307292
2013-01-06  0.647881 -1.922196 -1.313862  1.701660


Creating a `DataFrame` by passing a dictionary of objects where the keys are the column labels and the values are the column values.

In [7]:
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)
df2

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
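
In the frame above, scalar entries such as `1.0` and `"foo"` are repeated on every row. The sketch below isolates that behaviour with a smaller, made-up dictionary (`id`, `flag`, and `label` are illustrative names): the `Series` determines the index, and the scalars are broadcast to match its length.

```python
import pandas as pd

small = pd.DataFrame(
    {
        "id": pd.Series([10, 20, 30]),  # sets the index: 0, 1, 2
        "flag": True,                   # scalar, repeated on each row
        "label": "foo",                 # scalar, repeated on each row
    }
)

print(small.shape)              # (3, 3)
print(small["label"].tolist())  # ['foo', 'foo', 'foo']
```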


The columns of the resulting `DataFrame` have different `dtypes`:

In [8]:
df2.dtypes

A          float64
B    datetime64[s]
C          float32
D            int32
E         category
F           object
dtype: object

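In IPython, typing `df2.` and pressing the Tab key lists the column names and public attributes available for completion. `<TAB>` stands for a keypress, not Python syntax, so running `df2.<TAB>` as code raises a `SyntaxError`.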

## Viewing data 
Use `DataFrame.head()` and `DataFrame.tail()` to view the top and bottom rows of the frame respectively:

In [10]:
df.head()

                   A         B         C         D
2013-01-01 -0.206450  0.690361  0.108728  0.869593
2013-01-02  0.268097 -1.159317 -1.152265  0.176963
2013-01-03 -0.188976  1.239838  0.032640 -0.807762
2013-01-04 -0.429091 -1.730086 -0.047493 -0.896067
2013-01-05 -1.391268 -1.873927  1.786262  0.307292


In [11]:
df.tail(3)

                   A         B         C         D
2013-01-04 -0.429091 -1.730086 -0.047493 -0.896067
2013-01-05 -1.391268 -1.873927  1.786262  0.307292
2013-01-06  0.647881 -1.922196 -1.313862  1.701660
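
The two calls above differ in how many rows they return: `head()` was called without an argument and `tail(3)` with one. A short sketch on a made-up ten-row frame (the column name `n` is illustrative) shows the default of five rows and the effect of passing a count:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"n": np.arange(10)})  # rows hold 0 through 9

print(len(frame.head()))            # 5, the default row count
print(len(frame.tail(3)))           # 3
print(frame.tail(3)["n"].tolist())  # [7, 8, 9]
```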


Display the `DataFrame.index` or `DataFrame.columns`:

In [12]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [13]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

Return a NumPy representation of the underlying data, without the index or column labels, with `DataFrame.to_numpy()`:

In [14]:
df.to_numpy()

array([[-0.20645037,  0.69036108,  0.10872841,  0.8695928 ],
       [ 0.26809654, -1.15931736, -1.15226495,  0.17696346],
       [-0.18897579,  1.23983816,  0.03264006, -0.80776166],
       [-0.42909127, -1.73008606, -0.04749342, -0.89606729],
       [-1.39126757, -1.87392667,  1.78626213,  0.30729246],
       [ 0.64788127, -1.92219621, -1.31386153,  1.70166007]])
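
The array above is `float64` because every column of `df` holds floats. A NumPy array has a single dtype for the whole array, so a frame whose columns differ in type is converted to a common one. The sketch below uses two small made-up frames (`uniform` and `mixed` are illustrative names) to show the difference:

```python
import pandas as pd

uniform = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
mixed = pd.DataFrame({"a": [1.0, 2.0], "b": ["x", "y"]})

# All-float columns convert to a float array.
print(uniform.to_numpy().dtype)  # float64

# Floats and strings have no common numeric type, so the result is object.
print(mixed.to_numpy().dtype)    # object

# Row and column labels are dropped; only the 2 x 2 values remain.
print(uniform.to_numpy().shape)  # (2, 2)
```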